Multiple imputation of missing income data in the National Health Interview Survey
Schenker, N., T. Raghunathan, P.L. Chiu, D.M. Makuc, G.Y. Zhang, and A.J. Cohen. 2006. "Multiple imputation of missing income data in the National Health Interview Survey." Journal of the American Statistical Association, 101(475): 924-933.
The National Health Interview Survey (NHIS) provides a rich source of data for studying relationships between income and health and for monitoring health and health care for persons at different income levels. However, the nonresponse rates are high for two key items: total family income in the previous calendar year and personal earnings from employment in the previous calendar year. To handle the missing data on family income and personal earnings in the NHIS, multiple imputation of these items, along with employment status and ratio of family income to the federal poverty threshold (derived from the imputed values of family income), has been performed for the survey years 1997-2004. (There are plans to continue this work for years beyond 2004 as well.) Files of the imputed values, as well as documentation, are available at the NHIS website (http://www.cdc.gov/nchs/nhis.htm). This article describes the approach used in the multiple-imputation project and evaluates the methods through analyses of the multiply imputed data. The analyses suggest that imputation corrects for biases that occur in estimates based on the data without imputation, and that multiple imputation results in gains in efficiency as well.
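The abstract does not say how results from the multiply imputed files are combined. A common approach for data of this kind is Rubin's combining rules: run the same analysis on each completed data set, average the point estimates, and add the between-imputation variance (inflated by 1 + 1/m) to the average within-imputation variance. The sketch below illustrates those rules only. The function name `combine_rubin` and its arguments are hypothetical, and nothing here is taken from the NHIS documentation.

```python
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Pool per-imputation results with Rubin's combining rules.

    estimates: point estimates, one per completed (imputed) data set.
    variances: squared standard errors of those estimates, same order.
    Returns (pooled_estimate, total_variance, between_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    pooled_estimate = mean(estimates)       # average of the m point estimates
    within_variance = mean(variances)       # average sampling variance
    between_variance = variance(estimates)  # sample variance, divisor m - 1

    # The (1 + 1/m) factor accounts for using a finite number of imputations.
    total_variance = within_variance + (1 + 1 / m) * between_variance
    return pooled_estimate, total_variance, between_variance
```

As a made-up illustration (not NHIS results), five estimates of 10, 12, 11, 13, and 9, each with sampling variance 4, pool to an estimate of 11 with between-imputation variance 2.5 and total variance of about 7. The square root of the total variance is the standard error that reflects the uncertainty due to the missing data.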