Data Services Knowledge Base

Q: I am merging two data sets and end up with too few variables. What is going on?

A: The most likely problem is that the two files share variable names other than the merge key. When the same name appears in both files, only one copy survives the merge. It is usually best to add a prefix or suffix to such variable names before merging so that you can tell which items come from which file. For instance, p_hght might represent the person’s self-reported height and m_hght might represent height from a medical record.

In SAS, the duplicate variable from the first file named in the MERGE statement is dropped, and the copy from the later file is kept. The following page provides more detail about how SAS handles duplicate variables in merges:

http://www.psc.isr.umich.edu/dis/data/prgmlib/sas/merge1.html

Stata handles duplicate variable names in the reverse way from SAS. In other words, it keeps the variable from the file already in memory (the master file) and drops the duplicate from the second file named, the ‘using’ data file.

  use temps
  merge id using tempm

In this case, the height variable from the temporary medical file (tempm) would be dropped and the height variable from the self-reported file would be kept.

  use tempm
  merge id using temps

In this case, the height variable from the temporary self-report file (temps) would be dropped and the height variable from the medical file would be kept.
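
To keep both height variables, one option is to rename them before merging so the names no longer collide. The following is only a sketch under several assumptions: the shared variable is assumed to be called hght in both files, the names p_hght, m_hght, and temps_renamed are made up for illustration, id is assumed to identify each observation uniquely in both files, and the merge 1:1 syntax requires Stata 11 or later (the examples above use the older syntax).

  * Rename the self-reported height and save a copy (hght is an assumed name)
  use temps, clear
  rename hght p_hght
  save temps_renamed, replace

  * Rename the medical-record height, then merge on id
  use tempm, clear
  rename hght m_hght
  merge 1:1 id using temps_renamed

After this merge, the combined file should contain both p_hght and m_hght, because neither name appears in both files.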
