Merging data files with duplicate variable names

If you merge two files that contain variables with the same name, SAS keeps a single variable with that name in the merged file, and the values from the first file are overwritten by the values from the second. This happens even if the two variables hold different values.

To prevent this, check the variable names in both data files before merging. Rename any variable whose name appears in both files but whose values differ.

The following example uses two data files that both contain a variable named 'black'. In the first file, a new variable, bdummy, is created as a copy of black so that nothing is lost when the files are merged.

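* Individual-level file: copy black to bdummy so its values survive the merge;
data a;
infile 'a.dat';
input id state race black age sex earnings;
bdummy = black;
run;
* State-level aggregate file, which also has a variable named black;
data b;
infile 'aggreg.dat';
input state white black asian namer other;
run;
* A merge with a BY statement expects both files sorted by the BY variable;
proc sort data=a; by state; run;
proc sort data=b; by state; run;
data c;
merge a b; by state;
run;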
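
As an alternative to creating a copy such as bdummy, the RENAME= data set option can give each file's variable its own name at merge time. The following is a minimal sketch, assuming a and b are the data sets from the example above and are already sorted by state. The names black_ind and black_agg are hypothetical and chosen only for illustration.

```sas
* Sketch only: black_ind and black_agg are hypothetical names;
* Assumes a and b are already sorted by state;
data c;
merge a(rename=(black=black_ind))
      b(rename=(black=black_agg));
by state;
run;
```

With both variables renamed, the merged file keeps the individual-level and the state-level values side by side, and neither overwrites the other.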

Note that file order is determined by the order in which the files are listed in the merge statement, not by the order in which SAS reads them.

data merge1;
merge a b; by state;
run;

data merge2;
merge b a; by state;
run;

In the first example, 'a' is the first data file and 'b' is the second. In the second example, the reverse is true.
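
The effect of file order can be checked on a small example. The sketch below is illustrative only: the data sets x and y, the variable score, and all values are made up, and each file has exactly one observation per id (a one-to-one match).

```sas
* Two toy files that share the variable score, with different values;
data x;
input id score;
datalines;
1 10
2 20
;
run;
data y;
input id score;
datalines;
1 99
2 98
;
run;
* y is listed last, so score in xy comes from y (99 and 98);
data xy;
merge x y; by id;
run;
* x is listed last, so score in yx comes from x (10 and 20);
data yx;
merge y x; by id;
run;
proc print data=xy; run;
proc print data=yx; run;
```

This simple pattern holds for a one-to-one match. When one file has several observations per BY value, the outcome for a shared variable can differ from observation to observation, so renaming the variable remains the safer choice.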

Reading your log file carefully can reveal duplicate variable names across data files. When the files are merged by a single variable, the merged file should contain one fewer variable than the two input data sets combined, because the BY variable is counted only once. If the merged file is short by more than one variable, you have a duplicate variable problem.
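
Rather than counting variables by hand, the shared names can be listed before merging. The following is a rough sketch, assuming the two data sets are named a and b and are stored in the WORK library. It queries the DICTIONARY.COLUMNS table that PROC SQL makes available.

```sas
* List variable names that appear in both WORK.A and WORK.B;
* Library and member names are stored in uppercase in the dictionary table;
proc sql;
select x.name
from dictionary.columns as x, dictionary.columns as y
where x.libname = 'WORK' and x.memname = 'A'
  and y.libname = 'WORK' and y.memname = 'B'
  and upcase(x.name) = upcase(y.name);
quit;
```

Any name in the result other than the BY variable is a candidate for renaming before the merge.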