I am merging two data sets and end up with too few variables. What is going on?
The most likely problem is that the two files contain variables with the same name. When that happens, only one copy of each duplicated variable survives the merge. It is best to add a prefix or suffix to variable names so that you can tell which items come from which file. For instance, p_hght might represent the person’s self-reported height and m_hght might represent height from a medical record.
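The renaming advice above can be sketched in Stata as follows. This is only an illustration under stated assumptions: the two small data sets, the shared variable name hght, and the id values are all made up, and the `merge 1:1` syntax assumes Stata 11 or later.

```stata
* Hypothetical medical-record file: id plus a height variable named hght.
clear
input id hght
1 170
2 165
3 180
end
rename hght m_hght          // prefix m_ marks the medical-record height
tempfile tempm
save `tempm'

* Hypothetical self-report file with the same variable name, hght.
clear
input id hght
1 172
2 165
3 178
end
rename hght p_hght          // prefix p_ marks the self-reported height

* The names no longer collide, so both height variables survive the merge.
merge 1:1 id using `tempm'
list id p_hght m_hght _merge
```

Renaming before the merge, rather than after, matters here because once the files are merged the duplicate copy is already gone.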
In SAS, the duplicate variable is dropped from the first file in the MERGE statement, so the values from the file listed later are the ones kept.
Stata handles duplicate variable names in the opposite way from SAS. It keeps the variable from the file already in memory (the ‘master’ file) and drops the duplicate from the second file named, the ‘using’ data file.
    use temps
    merge id using tempm
In this case, the height variable from the temporary medical file (tempm) would be dropped and the height variable from the self-reported file would be kept.
    use tempm
    merge id using temps
In this case, the height variable from the temporary self-report file (temps) would be dropped and the height variable from the medical file would be kept.
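To check which copy of a duplicated variable Stata keeps, a small test along these lines may help. It is a sketch, not the archive's own procedure: the data values are invented, both files are assumed to share the name hght, and the `merge 1:1` syntax assumes Stata 11 or later (the older `merge id using` form shown above also requires both files to be sorted by id).

```stata
* Hypothetical using file (medical record) with a height variable hght.
clear
input id hght
1 170
2 165
end
tempfile tempm
save `tempm'

* Hypothetical master file (self-report) with the same variable name.
clear
input id hght
1 172
2 166
end

* The self-report data are in memory, so they are the master file.
merge 1:1 id using `tempm'

* By default the master values of hght are kept and the using values dropped.
list id hght _merge
assert hght == 172 if id == 1
assert hght == 166 if id == 2
```

Swapping which file is loaded first and which is named after `using` reverses which set of heights survives.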