Data Services Knowledge Base

Q:  

I am having trouble merging person records to the household record. There are around 5,000 person records, but when I finish my merge, I end up with almost 15,000 observations.

Here's my code:

use "W:\youth.dta"
merge hhid using "W:\hhld.dta"

A:  

A couple of things.

First, with the merge syntax in your code, you need to sort each file by the merge variable (hhid) before you attempt a merge. Second, it helps to look at the results of your merge. It is not unusual for some records to go unmatched. Stata keeps the unmatched records from both files, so the merged file ends up with more observations than the file you started with.
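As a rough sketch of those two steps, using the file names and the hhid variable from the question: the name hhld_sorted.dta is a hypothetical output file, chosen only so the original household file is not overwritten.

```stata
* Sort the household file by the merge key and save a sorted copy
* (hhld_sorted.dta is a hypothetical file name)
use "W:\hhld.dta", clear
sort hhid
save "W:\hhld_sorted.dta", replace

* Sort the person file by the same key, then merge
use "W:\youth.dta", clear
sort hhid
merge hhid using "W:\hhld_sorted.dta"

* Look at the results of the merge:
*   _merge == 1  record found only in the person file
*   _merge == 2  record found only in the household file
*   _merge == 3  record found in both files
tabulate _merge
```

If most of the extra observations show up as _merge == 2, they are households that have no person record in the youth file.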

For instance, let's say I had a health survey in the US and had the zip code of the respondents.

If I wanted to match zip code characteristics to the respondents I could end up with 3 situations:

1 - respondent info; no zip (due to invalid respondent zip code)
2 - no respondent info; zip code info (not all zip codes are in sample)
3 - respondent info; zip code info

If there are 500 respondents, you would expect the merged file to have 500 observations. However, that holds only if you delete the extraneous zip codes (#2). You will often want to know how many #1 cases there are, and you need to be careful about deleting them.
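The three situations could be counted along these lines. This is only a sketch: the file names respondents.dta and zipchars.dta and the variable zip are hypothetical, and it assumes Stata 11 or later, where merge m:1 is available and the files do not need to be sorted first.

```stata
* Hypothetical files: respondents.dta (500 people) and
* zipchars.dta (one record per zip code)
use "respondents.dta", clear
merge m:1 zip using "zipchars.dta"

* Count each situation before deleting anything
tabulate _merge

* Situation 1: respondents whose zip code found no match
count if _merge == 1

* Situation 2: zip codes with no respondents; dropping them
* brings the file back to one record per respondent
drop if _merge == 2
count
```

With the respondent file as the master file, the _merge codes 1, 2, and 3 happen to line up with the three situations listed above.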

Anyway, here's a link to a discussion of merging from the Carolina Population Center. Be sure to sort before you merge, and then look at the results of your merge.

http://www.cpc.unc.edu/research/tools/data_analysis/statatutorial

One last point: it might be easier to 'look at' your data if you keep just a few variables from the person file and a few variables from the household file. That lets you see whether the merge is working. Alternatively, summary statistics on your variables will also tell you whether the merge is working.

In the zip code example above, you'll get missing data on the respondent info if the zip code didn't match any respondents, and you'll get missing data on the zip code characteristics if the respondent's zip code didn't match the zip code characteristics file.
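Applied to the files in the question, a pared-down check might look something like this. The variables age and hhsize and the file name hhld_small.dta are hypothetical stand-ins; substitute a few variables you actually have in each file.

```stata
* Keep the merge key plus one household variable (hhsize is hypothetical)
use "W:\hhld.dta", clear
keep hhid hhsize
sort hhid
save "W:\hhld_small.dta", replace

* Keep the merge key plus one person variable (age is hypothetical)
use "W:\youth.dta", clear
keep hhid age
sort hhid
merge hhid using "W:\hhld_small.dta"

* Eyeball the first few records, then check counts and missing data
list hhid age hhsize _merge in 1/20
summarize age hhsize
tabulate _merge
```

In the summarize output, a count for age that is well below the total number of observations would point to household records that matched no person record.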
