Mon, Jan 23, 2017 at noon:
H. Luke Shaefer
Increasingly, investigators append census-based socioeconomic characteristics of residential areas to individual records to surmount the problem of inadequate socioeconomic information in health data sets. Little empirical attention has been given to the validity of this approach. The authors analyze samples from two nationally representative data sets, the 1985 Panel Study of Income Dynamics and the 1988 National Maternal and Infant Health Survey, each linked to 1970 and 1980 United States Census data. They investigate whether statistical power is sensitive to the timing of census data collection or to the level of aggregation of the census data; whether different census items are conceptually distinct; and whether using multiple aggregate measures in health outcome equations improves prediction compared to a single aggregate measure. The authors find little difference in estimates when using 1970 rather than 1980 Census data, or zip code-level rather than tract-level variables. However, aggregate variables are highly multicollinear, and associations of health outcomes with aggregate measures are substantially weaker than with microlevel measures. The authors conclude that aggregate measures cannot be interpreted as if they were microlevel variables, nor should a specific aggregate measure be interpreted as representing the effects of what it is labeled.