A hot-deck multiple imputation procedure for gaps in longitudinal data on recurrent events

Publication Abstract

Little, R.J., M. Yosef, K.C. Cain, B. Nan, and Sioban D. Harlow. 2008. "A hot-deck multiple imputation procedure for gaps in longitudinal data on recurrent events." Statistics in Medicine, 27(1): 103-120.

Hot-deck imputation offers advantages in reflecting salient features of data distributions in missing-data problems, but previous implementations have lacked the appeal associated with modern Bayesian statistical-computing techniques. We outline a strategy of iterative hot-deck multiple imputation with distance-based donor selection. With distance defined as a monotonic function of the difference in predictive means between cases, donors are chosen with probability inversely proportional to their distance from the donee. This method retains the implementation ease of ad hoc techniques, while incorporating the desirable features of Bayesian approaches. Special cases of our method include nearest-neighbor imputation and a simple random hot-deck. Iterating the procedure provides an analogy to Markov Chain Monte Carlo methods and is intended to mitigate dependence on starting values. Results from imputing missing values in a longitudinal depression treatment trial as well as a simulation study are presented. We evaluate how different definitions of distance, choices of starting values, the order in which variables are chosen for imputation, and the number of iterations impact inferences. We show that our measure of distance controls the tradeoff between bias and variance of our estimates. We find that inferences from the depression treatment trial are not sensitive to most definitions of distance. In addition, while differences exist between 1 iteration and 10 iterations, there are no meaningful differences between inferences based on 10 iterations and those based on 500 iterations. The choice of starting value did not have an impact on inferences but the order in which the variables were chosen for imputation was significant even after iteration.
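The donor-selection rule described in the abstract can be illustrated with a short sketch. It is not the authors' implementation: the function names, the `power` exponent, and the small `eps` offset are assumptions introduced here for illustration. Distance is taken as the absolute difference in predictive means, and a donor's selection probability is proportional to 1 / distance**power. Under this assumed parameterization, `power=0` gives a simple random hot-deck and a very large `power` approaches nearest-neighbor imputation.

```python
import math
import random


def donor_probabilities(donee_mean, donor_means, power=1.0, eps=1e-12):
    """Return selection probabilities proportional to 1 / distance**power.

    Distance is the absolute difference between the donee's and each donor's
    predictive mean. `power` and `eps` are illustrative assumptions, not
    parameters taken from the paper; `eps` only guards against log(0).
    """
    # Work in log space so that large exponents do not underflow or overflow.
    log_weights = [-power * math.log(abs(m - donee_mean) + eps) for m in donor_means]
    top = max(log_weights)
    weights = [math.exp(lw - top) for lw in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def hot_deck_draw(donee_mean, donor_means, donor_values, power=1.0, rng=None):
    """Draw one observed donor value to fill the donee's missing value."""
    rng = rng or random.Random()
    probs = donor_probabilities(donee_mean, donor_means, power)
    return rng.choices(donor_values, weights=probs, k=1)[0]
```

Repeating such draws to build several completed data sets, and cycling over the variables with missing values for a number of iterations, would correspond to the multiple-imputation and iterative aspects of the strategy. Those steps, and the model that produces the predictive means, are left out of this sketch.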

DOI: 10.1002/sim.3001

Country of focus: United States of America.
