Merging data files with duplicate variable names

If you merge two files that contain variables with the same name, the merged file keeps only one variable with that name, and values read from the file listed later in the merge statement replace those from the file listed first. This happens even if the two variables hold different values.

To prevent this, check the variable names in both data files before merging. Rename any variable whose name appears in both files but whose values differ.

The following example uses two data files that both contain a variable named 'black'. In file 'a', a new variable, bdummy, is created as a copy of black so that nothing is lost when the files are merged.

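/* File a: individual-level records */
data a;
  infile 'a.dat';
  input id state race black age sex earnings;
  bdummy = black;   /* keep a copy of a's black under a new name */
run;
/* File b: state-level aggregates, also containing a variable black */
data b;
  infile 'aggreg.dat';
  input state white black asian namer other;
run;
/* A merge with a by statement requires both files sorted by state */
proc sort data=a; by state; run;
proc sort data=b; by state; run;
data c;
  merge a b;
  by state;
run;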
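
Another way to follow the renaming advice is the RENAME= data set option, which renames the variable as the file is read by the merge. The sketch below is only an illustration: the data values are made up, the inputs are reduced to a few variables, and the new name st_black is a hypothetical choice.

```sas
/* Hypothetical individual-level records (values are made up) */
data a;
  input id state black;
  datalines;
1 1 0
2 1 1
3 2 0
;
run;

/* Hypothetical state-level aggregates (values are made up) */
data b;
  input state white black;
  datalines;
1 500 120
2 800 60
;
run;

/* The inputs above are already in state order; sort real data first. */
/* b's black is renamed on the way in, so it cannot collide with a's. */
data c;
  merge a b(rename=(black=st_black));
  by state;
run;

proc print data=c;
run;
```

With this approach, c contains both black (from a) and st_black (from b), and neither input file is changed.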

Note that the file order is not determined by the order in which the data sets are created in the program. It is determined by the order in which they are listed in the merge statement.

data merge1;
  merge a b;
  by id;
run;

data merge2;
  merge b a;
  by id;
run;

In the first example, 'a' is the first data file and 'b' is the second. In the second example, the reverse is true.
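
To see what file order does to a duplicate variable, here is a minimal sketch with two small made-up data sets that share the variable x. The data set names mirror the examples above, but the variable x and all values are assumptions for illustration only.

```sas
/* Two made-up files with one row per id and a shared variable x */
data a;
  input id x;
  datalines;
1 10
2 20
;
run;

data b;
  input id x;
  datalines;
1 100
2 200
;
run;

/* b is listed last, so x in merge1 holds b's values */
data merge1;
  merge a b;
  by id;
run;

/* a is listed last, so x in merge2 holds a's values */
data merge2;
  merge b a;
  by id;
run;

proc print data=merge1;
run;
proc print data=merge2;
run;
```

Both outputs have the same two variables, id and x, but the values of x differ depending on which file was listed last.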

The SAS log reports the number of variables in each data set, so a careful reading can reveal duplicate variable names across data files. When two files are merged by a single variable, the merged file should contain one fewer variable than the two input files combined, because the shared by variable is counted only once. If more than one variable is missing, you have a duplicate variable problem.
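
Rather than counting variables by hand, you can ask SAS to list the names that two data sets have in common before merging. The sketch below assumes both data sets are in the WORK library and uses small made-up inputs. It also sets the MSGLEVEL=I system option, which asks SAS to write additional informational messages to the log, including one when a merge overwrites a variable.

```sas
/* Made-up inputs that share state (the by variable) and black */
data a;
  input id state black;
  datalines;
1 1 0
2 2 1
;
run;

data b;
  input state white black;
  datalines;
1 500 120
2 800 60
;
run;

/* Request extra informational messages in the log */
options msglevel=i;

/* List variable names that appear in both WORK.A and WORK.B. */
/* Any name listed other than the by variable is a duplicate. */
proc sql;
  select x.name
    from dictionary.columns as x,
         dictionary.columns as y
    where x.libname = 'WORK' and x.memname = 'A'
      and y.libname = 'WORK' and y.memname = 'B'
      and upcase(x.name) = upcase(y.name);
quit;
```

Here the query lists state and black. Since state is the by variable, black is the name that needs to be renamed or copied before the merge.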