Data Services Knowledge Base


I am having trouble merging person records to the household record. There are around 5,000 person records, but when I finish my merge, I end up with almost 15,000 observations.

Here's my code:

use "W:\youth.dta"
merge hhid using "W:\hhld.dta"


A couple of things.

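First, you need to sort each file by the merge variable (hhid here) before you attempt a merge. Second, it helps to look at the results of your merge. It is not unusual for some records to have no match in the other file. When that happens, the unmatched records from the second file are added as extra observations, so the merged file ends up larger than the file you started with.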
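As a rough sketch of those two steps, using the same style of merge command as in the question: sort both files on hhid, merge, then tabulate the _merge variable that Stata creates. The file name hhld_sorted.dta is hypothetical, chosen only so the original household file is not overwritten, and the sketch assumes hhid identifies each household once in the household file.

```stata
* Sort the household file on the merge key and save a sorted copy.
* "W:\hhld_sorted.dta" is a hypothetical name for that copy.
use "W:\hhld.dta", clear
sort hhid
save "W:\hhld_sorted.dta", replace

* Sort the person file on the same key, then merge.
use "W:\youth.dta", clear
sort hhid
merge hhid using "W:\hhld_sorted.dta"

* _merge records where each observation came from:
*   1 = person file only, 2 = household file only, 3 = both
tabulate _merge
```

If you are on Stata 11 or later, the newer form merge m:1 hhid using "W:\hhld.dta" should do the sorting for you, but it is still worth tabulating _merge afterwards.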

For instance, let's say I had a health survey in the US and had the zip code of the respondents.

If I wanted to match zip code characteristics to the respondents I could end up with 3 situations:

1 - respondent info; no zip (due to invalid respondent zip code)
2 - no respondent info; zip code info (not all zip codes are in sample)
3 - respondent info; zip code info

If there are 500 respondents, you would expect the final merged file to have 500 observations. However, that is only true once you delete the extraneous zip codes (#2). You often want to know how many #1 cases there are, and you need to be careful about deleting them, since they are real respondents who simply lack zip code information.
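In Stata, the three situations map onto the values of _merge, assuming the respondent (or person) file was the one in memory when the merge was run. A minimal sketch for the person/household merge in the question, run right after the merge:

```stata
* Assumes the merge has just been run with the person file in memory,
* so the _merge variable exists.

* Situation #1: person records with no matching household record.
* Count them and list their ids before deciding what to do with them.
count if _merge == 1
list hhid if _merge == 1

* Situation #2: household records with no person in the file.
* These are the extra observations that inflate the merged file.
drop if _merge == 2

* The count should now be back to the number of person records.
count
```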

Anyway, here's a link to a discussion of merging from the Carolina Population Center. Be sure to sort before you merge and then look at the results of your merge.


One last point: it might be easier to 'look at' your data if you keep just a few variables from the person file and a few from the household file. That lets you see whether the merge is working. Alternatively, summary statistics on your variables will also tell you whether the merge worked.

In the example above, you'll get missing data on the respondent info if the zip code didn't match with any respondents and you'll get missing data on the zip code characteristics if the respondent's zip code didn't match with the zip code characteristics file.
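One way to do this check in Stata is sketched below. The variable names age (from the person file) and hhincome (from the household file) are made up for illustration, so substitute variables that actually exist in your files. The sketch assumes the merged data, including _merge, is in memory.

```stata
* Keep the merge key, one person variable, one household variable,
* and the merge indicator. age and hhincome are hypothetical names.
keep hhid age hhincome _merge

* Eyeball the first few records.
list in 1/20

* Summary statistics by merge result. Where _merge is 1, the household
* variable should have no observations; where _merge is 2, the person
* variable should have none.
bysort _merge: summarize age hhincome
```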
