In-house Data Utilities Overview

The following programs help data users examine their data before analysis, extract files for analysis, run quick analyses, and handle tapes under UNIX.

Below is a list of the available data utility programs, each with a brief description. A complete write-up of each program is available in subsequent archive notes.

extract
This program rectangularizes hierarchical data sets. It lets users select variables and apply simple exclusions.
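To illustrate what "rectangularizing" means, here is a minimal Python sketch, not extract itself. It assumes a hypothetical fixed-width layout: column 1 holds the record type ("H" for household, "P" for person), household records carry an ID in columns 2-6 and a state code in columns 7-8, and person records carry age in columns 2-3 and sex in column 4. Each person record is written out with the fields of the household record that precedes it, so every output row has the same shape. The function name `rectangularize`, the kept variables, and the `exclude` predicate (a stand-in for simple exclusions) are all hypothetical.

```python
# Hypothetical record layout, for illustration only (not extract's actual format):
#   column 1        record type: "H" = household, "P" = person
#   household rec:  columns 2-6 = household ID, columns 7-8 = state code
#   person rec:     columns 2-3 = age, column 4 = sex


def rectangularize(lines, exclude=None):
    """Flatten hierarchical records into one rectangular row per person.

    Each person row carries the fields of the household record preceding it.
    `exclude` is an optional predicate on a finished row; rows for which it
    returns True are dropped (a stand-in for simple exclusions).
    """
    rows = []
    household = None
    for line in lines:
        line = line.rstrip("\n")
        rtype = line[:1]
        if rtype == "H":
            household = (line[1:6], line[6:8])  # (household ID, state code)
        elif rtype == "P":
            if household is None:
                raise ValueError("person record found before any household record")
            person = (line[1:3], line[3:4])  # (age, sex)
            row = household + person
            if exclude is None or not exclude(row):
                rows.append(row)
    return rows


sample = ["H0000134", "P421", "P402", "H0000206", "P071"]
for row in rectangularize(sample):
    print("".join(row))
```

Run on the sample, this prints three person-level records (0000134421, 0000134402, 0000206071), each beginning with its household's ID and state code.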
simsel
This program selects all records that meet a user-specified criterion. A typical use is pulling all records at the county summary level from the Summary Tape Files (STF) of the U.S. censuses.
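The kind of selection simsel performs can be sketched in a few lines of Python, assuming the criterion is "the field in columns M..N equals a given value". The layout below (summary level in columns 1-3, state in 4-5, county in 6-8, a count in 9-15) and the sample values are made up for illustration, and county records are assumed to carry the summary-level code 050. The function `select_records` is hypothetical; simsel's own criterion syntax is described in its write-up, not here.

```python
# Hypothetical fixed-width layout, for illustration only:
#   columns 1-3 summary level, columns 4-5 state, columns 6-8 county,
#   columns 9-15 a count. The values below are made up.


def select_records(lines, start, end, value):
    """Return the records whose columns start..end (1-based, inclusive) equal value."""
    return [line for line in lines if line.rstrip("\n")[start - 1:end] == value]


records = [
    "040340000001000",  # state-level record
    "050340010000100",  # county-level record
    "050340030000200",  # county-level record
    "140340010000010",  # tract-level record
]
for rec in select_records(records, 1, 3, "050"):
    print(rec)
```

Only the two records whose first three columns read 050 are printed.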
codecount
This program produces frequencies for a data set on a column-by-column basis. Its main purpose is to check that the data contain reasonable values. If a variable of interest occupies a single column (such as sex), the program reports the actual counts for males and females. However, if the variable spans two columns (such as age) or more (such as SMSA codes), the program reports counts column by column rather than variable by variable.
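The column-by-column behaviour, including the caveat about multi-column variables, can be reproduced with a short Python sketch built on `collections.Counter`. This is an illustration of the idea rather than codecount's implementation; the function name `column_frequencies` and the layout (sex in column 1, two-digit age in columns 2-3) are assumptions.

```python
from collections import Counter


def column_frequencies(lines):
    """Tally the characters found in each column position (1-based) of the data."""
    counts = {}
    for line in lines:
        for pos, ch in enumerate(line.rstrip("\n"), start=1):
            counts.setdefault(pos, Counter())[ch] += 1
    return counts


# Hypothetical layout: column 1 = sex, columns 2-3 = age (two digits).
sample = ["134", "207", "141"]
freqs = column_frequencies(sample)
for pos in sorted(freqs):
    print(pos, dict(sorted(freqs[pos].items())))
```

Column 1 gives usable counts for sex (two 1s, one 2), but the ages 34, 07, and 41 appear only as separate digit tallies in columns 2 and 3, which is exactly the limitation described above.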
column
This program pulls off selected columns of data. It is meant for viewing the contents of selected columns rather than as an extraction tool, because it cannot pull off several column ranges at once, such as columns 6-7, 11-13, 49-50, and 54-57. If you need to do this, use selflat.
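Viewing a single contiguous column range amounts to slicing each record. The Python sketch below assumes 1-based, inclusive column numbers, the convention used in the text; the function name `pull_columns` is hypothetical and the sketch does not reproduce column's command-line interface.

```python
def pull_columns(lines, start, end):
    """Return columns start..end (1-based, inclusive) of every record."""
    return [line.rstrip("\n")[start - 1:end] for line in lines]


sample = ["ABCDEFGHIJ", "0123456789"]
for piece in pull_columns(sample, 3, 5):
    print(piece)
```

This prints CDE and 234. Note that Python slicing past the end of a short record yields a shorter (possibly empty) string rather than an error.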
checklen
This program describes the structure of a data set. It reports the number of records, the maximum line length, and the minimum line length.
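The three figures checklen reports can be computed with a short Python sketch such as the one below. The function name `describe_lengths` is hypothetical, and trailing newlines are assumed not to count toward a record's length.

```python
def describe_lengths(lines):
    """Return (number of records, maximum line length, minimum line length)."""
    lengths = [len(line.rstrip("\n")) for line in lines]
    if not lengths:
        return 0, 0, 0
    return len(lengths), max(lengths), min(lengths)


sample = ["ABC\n", "ABCDE\n", "AB\n"]
print(describe_lengths(sample))  # (3, 5, 2)
```

Because an open text file iterates line by line, `describe_lengths(f)` also works on a file object `f`. For a file of fixed-length records, the maximum and minimum should be equal; a difference is a quick sign of truncated or malformed records.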
divide
This program divides a data set into as many as 99 subfiles. Its main purpose is to let users write a large data set from UNIX to a multi-volume external device.
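A minimal Python sketch of the splitting step follows. Only the 99-subfile limit comes from the description above; the even split into contiguous chunks, the two-digit file names (part01, part02, ...), and the function names `split_records` and `write_parts` are assumptions made for illustration. Records are assumed to keep their trailing newlines so that each subfile can be written back verbatim.

```python
from pathlib import Path

MAX_PARTS = 99  # upper limit on subfiles, as stated for divide


def split_records(lines, parts):
    """Split lines into `parts` contiguous chunks whose sizes differ by at most one."""
    if not 1 <= parts <= MAX_PARTS:
        raise ValueError(f"parts must be between 1 and {MAX_PARTS}")
    base, extra = divmod(len(lines), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks get one more
        chunks.append(lines[start:start + size])
        start += size
    return chunks


def write_parts(lines, parts, prefix="part"):
    """Write each chunk to prefix01, prefix02, ... and return the paths written."""
    paths = []
    for i, chunk in enumerate(split_records(lines, parts), start=1):
        path = Path(f"{prefix}{i:02d}")
        path.write_text("".join(chunk))
        paths.append(path)
    return paths


sample = [f"record {n}\n" for n in range(10)]
print([len(c) for c in split_records(sample, 4)])  # [3, 3, 2, 2]
```

Concatenating the subfiles in order restores the original data set, which is what matters when the pieces are later read back from separate volumes.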