Missing data
What is missing data?
Missing data are values that were expected but were not recorded.
In survey data, any variable (column) or case (row) may have missing values.
In SPSS:
- For numeric variables, missing data is shown by default as a full stop (period) in the cell; this is called system-missing. Alternatively, a specific value, such as -1 or 99, can be used to indicate missing data. Such a value needs to be specified as missing in Variable View for each variable that uses it.
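- For string variables, missing data is usually indicated by a blank cell. Note that SPSS treats a blank string as a valid value unless it is declared as missing for that variable.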
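As an alternative to Variable View, missing-data codes can also be declared with command syntax. A minimal sketch, assuming a numeric variable named VAR00001 (a placeholder name) in which -1 and 99 were used as missing-data codes:

```spss
* Declare -1 and 99 as user-missing values for VAR00001.
MISSING VALUES VAR00001 (-1, 99).
```

Once declared, these values are excluded from most analyses by default, as system-missing values are.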
Why is some data missing?
Some reasons include:
- Participant chose not to answer
- Participant answered but it is illegible
- Data entry person didn't enter the data
- Data analyst accidentally or intentionally removed the data
Dealing with missing data
The presence of missing data should be identified through data screening.
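One way to screen for missing data is to request the valid and missing counts for each variable. A minimal sketch, assuming variables named VAR00001 to VAR00003 (placeholder names):

```spss
* Report the number of valid and missing cases for each variable.
* NOTABLE suppresses the full frequency tables.
FREQUENCIES VARIABLES=VAR00001 VAR00002 VAR00003
  /FORMAT=NOTABLE.
```

The Statistics table in the output lists N Valid and N Missing for each variable.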
Strategies for dealing with missing data should be decided prior to data analysis.
One strategy for dealing with missing data is listwise deletion. This means that all cases with even a single missing value (on the variables in an analysis) are excluded from that analysis, e.g.:
DESCRIPTIVES VARIABLES=VAR00001 /STATISTICS=MEAN STDDEV MIN MAX /MISSING=LISTWISE.
In other words, to be used in the analysis, a case must have no missing data.
Alternatively, missing data can be dealt with by pairwise deletion. This means that each statistic is calculated from all cases with valid data on the variables involved, so cases with some missing data still contribute wherever they can.
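Pairwise treatment is easiest to see in a correlation matrix, where each coefficient can be based on a different number of cases. A minimal sketch, assuming three numeric variables with placeholder names:

```spss
* Each correlation uses all cases with valid data on that pair of variables.
CORRELATIONS
  /VARIABLES=VAR00001 VAR00002 VAR00003
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
```

The N reported for each pair of variables may differ. Replacing PAIRWISE with LISTWISE restricts every coefficient to the same set of complete cases.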
Other approaches involve imputation. This involves predicting or "filling in" the missing data. The simplest form of imputation is mean replacement (i.e., replace the missing data with the mean score for other cases for the same variable). More sophisticated imputation uses regression-based prediction (using scores on other related variables to predict the missing value).
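Mean replacement can be sketched with the RMV (Replace Missing Values) command. VAR00001_imp is a hypothetical name for the new variable; the original variable is left unchanged:

```spss
* Create VAR00001_imp with missing values replaced by the mean of VAR00001.
RMV /VAR00001_imp=SMEAN(VAR00001).
```

Keeping the original variable makes it possible to compare results with and without imputation.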
See also
- Missing data (Wikipedia)
External links
- Missing data treatment (Howell)
- Statistical analysis with missing data (Little & Rubin, 2002)