Missing data are found in almost every dataset. Simply put, data are missing when information that is present in variables for some cases is not present for others. Missing data can pose a problem for the analysis of the data and for drawing appropriate inferences. With less data, one’s inferences are less powerful, and if the data are missing in patterns that relate to what is being measured in the study, then inferences can be biased as a result. How you deal with missing data depends on the type of missingness, how much of the data are missing, and the reasons for the missingness. The first task is to assess the amount of missing data and how it is distributed across the variables in your dataset. Secondly, you need to understand the reasons for each type of missingness: is it planned, a refusal, or non-substantive? Thirdly, you will need to decide how to prepare the data for analysis.
From the point of view of this introductory guide, the main options are, broadly, either to discard any case with missing values on any of the variables used in a set of analyses that produce statistical estimates or, alternatively, to use all the cases available for each estimate, so that a different subset of cases may be used for each analysis. In both approaches, cases without data on the variable of interest are not used to derive estimates related to that variable. The first approach is known as “complete-case analysis” or “listwise deletion”, and the second as “available-case analysis” or “pairwise deletion”. Different statistical packages treat missing values slightly differently for different computations and procedures, but the principles remain the same.
Figures 10–13 show information about coding and response frequencies for each of the five variables of different types in the dataset. Figure 10 shows information for the variable ABCVer, which identifies which version of the questionnaire (A, B, or C) each respondent received. In addition to the three versions, coded 1, 2, and 3, there are other values that could be regarded as missing. The codes −2 and −1 denote missing data by design. In this case, they would mean that a respondent was not allocated to A, B, or C. The count column shows, though, that no values are coded this way, meaning that everyone was assigned to one of the three versions of the questionnaire. The codes 8 and 9 indicate “Don’t know” and “Refusal”. These values are coded in the dataset, but they are not relevant for this variable. The interviewer completes ABCVer when giving the self-completion questionnaire to the respondent. Unsurprisingly, there are no instances of these values in the dataset. For this variable, then, there are no actual missing observations, even though the pre-coded dataset allows for various categories of potential missing values.
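The ABCVer check described in this section can be sketched in Python using only the standard library. It tabulates every pre-coded value, including the missing-value codes, and confirms that none of the missing-value codes actually occur. This is a minimal illustration under stated assumptions:

- The responses are hypothetical.
- The helper names `frequency_table` and `recode_missing` are our own.
- The code-to-label mapping is paraphrased from the description of −2, −1, 8 and 9 rather than taken from the dataset's codebook.

```python
from collections import Counter

# Pre-coded values for ABCVer as described in the text. The labels are a
# paraphrase (assumed); the text does not distinguish the meanings of -2 and -1.
VALID_CODES = {1: "Version A", 2: "Version B", 3: "Version C"}
MISSING_CODES = {
    -2: "Missing by design",
    -1: "Missing by design",
    8: "Don't know",
    9: "Refusal",
}


def frequency_table(values):
    """Return (code, label, count) for every pre-coded value, listing
    missing-value codes even when their count is zero."""
    counts = Counter(values)
    unexpected = set(counts) - set(VALID_CODES) - set(MISSING_CODES)
    if unexpected:
        raise ValueError(f"unexpected codes: {sorted(unexpected)}")
    all_codes = {**VALID_CODES, **MISSING_CODES}
    # Counter returns 0 for codes that never occur.
    return [(code, label, counts[code]) for code, label in all_codes.items()]


def recode_missing(values):
    """Replace every missing-value code with None so later steps can skip it."""
    return [None if v in MISSING_CODES else v for v in values]


# Hypothetical responses for illustration; not the real dataset.
abcver = [1, 2, 3, 1, 2, 3, 3, 1]
for code, label, count in frequency_table(abcver):
    print(f"{code:>3}  {label:<18} {count}")
recoded = recode_missing(abcver)
print("Actual missing observations:", sum(v is None for v in recoded))
```

Listing the zero-count missing-value codes explicitly is what makes "no actual missing observations" visible, rather than leaving it implied by their absence.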
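The first task in dealing with missing data is to assess how much data are missing and how the missingness is distributed across variables. In practice, this amounts to two counts:

- the missing values for each variable;
- the cases that are complete on all the variables of interest.

The following Python sketch assumes that missing-value codes have already been recoded to `None`. The variable names `age`, `income` and `trust`, their values, and the function names are all hypothetical.

```python
# Hypothetical cases (one dict per respondent); None marks a missing value.
cases = [
    {"age": 34, "income": 2100, "trust": 4},
    {"age": 51, "income": None, "trust": 2},
    {"age": None, "income": 1800, "trust": None},
    {"age": 27, "income": 2500, "trust": 5},
]
variables = ["age", "income", "trust"]


def missing_summary(cases, variables):
    """Map each variable to (number missing, percentage missing)."""
    n = len(cases)
    summary = {}
    for var in variables:
        missing = sum(1 for case in cases if case.get(var) is None)
        summary[var] = (missing, 100.0 * missing / n)
    return summary


def count_complete_cases(cases, variables):
    """Count cases that have a value on every listed variable."""
    return sum(
        all(case.get(var) is not None for var in variables) for case in cases
    )


for var, (missing, pct) in missing_summary(cases, variables).items():
    print(f"{var:<8} missing: {missing} ({pct:.0f}%)")
print("Complete cases:", count_complete_cases(cases, variables), "of", len(cases))
```

In this toy example each variable is missing for only one case in four. Even so, only two of the four cases are complete on all three variables. This is why the distribution of missing values matters as well as their amount.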
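The two broad options for handling missing values can be contrasted on a small example. This Python sketch uses only the standard library and computes a mean for each variable in two ways:

- by listwise deletion (complete-case analysis);
- by available-case (pairwise) deletion.

The data and function names are hypothetical. Statistical packages implement these options in their own ways, so this is an illustration of the principle rather than any package's procedure.

```python
from statistics import mean

# Hypothetical cases (one dict per respondent); None marks a missing value.
cases = [
    {"age": 34, "income": 2100, "trust": 4},
    {"age": 51, "income": None, "trust": 2},
    {"age": None, "income": 1800, "trust": None},
    {"age": 27, "income": 2500, "trust": 5},
]
variables = ["age", "income", "trust"]


def listwise_means(cases, variables):
    """Complete-case analysis: keep only cases observed on every variable,
    then estimate each mean from that single common subset."""
    complete = [c for c in cases if all(c.get(v) is not None for v in variables)]
    if not complete:
        raise ValueError("no complete cases")
    return {v: (mean(c[v] for c in complete), len(complete)) for v in variables}


def available_case_means(cases, variables):
    """Available-case analysis: estimate each mean from every case observed
    on that variable, so the subset of cases can differ per estimate."""
    result = {}
    for v in variables:
        observed = [c[v] for c in cases if c.get(v) is not None]
        result[v] = (mean(observed), len(observed))
    return result


for label, estimates in [
    ("Listwise", listwise_means(cases, variables)),
    ("Available-case", available_case_means(cases, variables)),
]:
    for v, (m, n) in estimates.items():
        print(f"{label:<15} {v:<7} mean={m:.2f} n={n}")
```

The two approaches use the data differently:

- Listwise deletion bases every mean on the same two cases.
- The available-case approach uses three cases for each variable, though not always the same three.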