Abstract
-
Mixed models are regularly used in the analysis of clustered data, but have only recently
been used for imputation of missing data. In household surveys where multiple people are
selected from each household, imputation of missing values should preserve the structure
pertaining to people within households and should not artificially change the apparent
intracluster correlation (ICC). This paper focuses on the use of multilevel models for
imputation of missing data in household surveys. In particular, the performance of a best
linear unbiased predictor for both stochastic and deterministic imputation using a linear
mixed model is compared with imputation based on a single-level linear model, both with
and without information about household respondents.
The evaluation is carried out in the context of imputing hourly wage rates in the
Household, Income and Labour Dynamics in Australia (HILDA) Survey. Nonresponse is generated
under various assumptions about the missingness mechanism for persons and households,
and with low, moderate and high intra-household correlation to assess the benefits of the
multilevel imputation model under different conditions. The mixed model and the single-level
model with information about the household respondent lead to clear improvements when
the ICC is moderate or high, and when the missingness is informative.