Data Services Knowledge Base:
Data Management: Sample Design Questions

  1. I have funny results for the distribution of education using 1950 IPUMS data. What am I doing wrong?
  2. Can one interpolate between the 2000 census and the 5-year ACS (2005-2009) to get characteristics of census tracts? In other words, can the 5-year data be thought of as a snapshot for 2010?
  3. We've had some interest on campus in developing a data archive for data generated by research performed by the college's students and faculty. We're working on developing procedures for mitigating disclosure risk for human subjects data, but we have not found published guidance on what is an acceptable disclosure risk level. My statistical consultant informs me that it is mathematically impossible to get the disclosure risk (as measured by mu-Argus) down to 0, but nobody seems to be able to tell me what level of disclosure risk other archives consider appropriate, what level the U.S. government uses when preparing its microdata products, etc.
  4. How can an intercensal estimate change? I have an estimate from a P-25 report for July 1, 1977 and it does not agree with what is on the Census Bureau estimation web site.
  5. What proportion of American Community Survey (ACS) interviews end up in the ACS microdata samples?
  6. What types of institutions are considered "group quarters" (GQ) in the 2006 ACS? How will the inclusion of GQ affect comparisons with previous ACS years?
  7. Where can I find out more information on the quality of the data used in the American Community Survey?
  8. I am using data from the American Community Survey (ACS) for Monroe and Lenawee counties in Michigan. The tables from American FactFinder have margins of error for all the cells. However, if I am combining the two counties, is there a way to calculate new margins of error based on this larger population?
  9. I want to combine the 2005 and 2006 ACS microdata. What do I do to the weights?
  10. Why do the counts from SF1 and SF3 differ?
  11. Why are there zero weights in the 1990 public use microdata file for the U.S. census?
  12. I am working with the National Comorbidity Survey Replication (NCS-R) and I want to know the exact sampling method, universe eligibility, etc. for the following items: TB15L (self-report of tobacco as causation of emotional problems); SC21, SC22, SC23 (variables related to depression); DA31B_101 (religious preference); DA40 (age of mother when you were born); PEA52 (personality question: "I often feel empty inside").
  13. The PSID changed its sample in 1997, reducing the number of original families that are included in the survey. How does this affect the weights?
  14. Has anyone combined the 1999-2002 data with the 2003-2004 NHANES data? I'm using the 4-year weights for 1999-2002, but I'm not sure what to do with the 2-year weights on the 2003-2004 data; there don't appear to be updates with 6-year weights. How do I handle the weights when combining 2003-2004 with the earlier data?
  15. How does one handle weights in a longitudinal analysis with differential sample attrition over time? I am using selected years of the NLSY.
  16. The PSID is not based on a simple random sample. What variables should I use for complex sample survey variance estimation?
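
As a rough illustration of the arithmetic behind question 8, one widely used approximation for the margin of error (MOE) of a sum of published estimates is the square root of the sum of the squared MOEs. The sketch below assumes the two county estimates are independent and share the same confidence level; the function name and all numbers are hypothetical and are not taken from any ACS table.

```python
import math


def moe_of_sum(moes):
    """Approximate MOE of a sum of estimates: sqrt(MOE_1^2 + MOE_2^2 + ...).

    Assumes the component estimates are independent and that every MOE
    was published at the same confidence level.
    """
    return math.sqrt(sum(m * m for m in moes))


# Hypothetical cell values for the two counties (not real ACS figures).
monroe_estimate, monroe_moe = 1200, 300
lenawee_estimate, lenawee_moe = 800, 400

combined_estimate = monroe_estimate + lenawee_estimate
combined_moe = moe_of_sum([monroe_moe, lenawee_moe])

print(combined_estimate, combined_moe)  # 2000 500.0
```

The approximation ignores any covariance between the component estimates, so it is best treated as a first pass rather than an exact result, and it tends to degrade as more estimates are combined.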