In-house Data Utilities Overview
The following programs let data users examine their data
before analysis, extract a file for analysis, run quick
analyses of data, and handle tapes under UNIX.
The available data utility programs are listed below, each
with a brief description. A complete write-up of each program
is available in subsequent archive notes.
-
extract
- This program rectangularizes hierarchical data sets. It
lets the user select particular variables and apply simple
exclusions.
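As a rough illustration of what rectangularizing means (this is
not the extract program itself), the Python sketch below assumes
a hypothetical layout in which column 1 holds a record type, "H"
for a household record and "P" for a person record, and each
person record follows the household it belongs to. The function
name, the record types, and the sample data are all invented for
the example.

```python
def rectangularize(lines, parent_type="H", child_type="P"):
    """Return one flat record per child line, prefixed by its parent record.

    Assumption (hypothetical layout): column 1 holds the record type and
    every child record follows the parent record it belongs to.
    """
    flat = []
    parent = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(parent_type):
            # Remember the most recent parent record.
            parent = line
        elif line.startswith(child_type) and parent is not None:
            # Emit parent and child side by side as one rectangular row.
            flat.append(parent + line)
    return flat


# Invented sample: two households with two and one person records.
sample = ["H001NY", "P0134M", "P0231F", "H002CA", "P0145F"]
for row in rectangularize(sample):
    print(row)
```

Each output row repeats the household fields next to one person's
fields, so every row has the same layout and can be read by a
package that expects a flat file.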
-
simsel
- This program allows for the selection of all records that
meet a user-specified criterion. A typical use would be to
pull all the county summary level data from summary tape
files (STF) from the U.S. censuses.
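The kind of selection described for simsel can be sketched in
Python as follows. This is only an illustration under stated
assumptions, not the program's actual interface: the function
name, the column positions, and the code value "050" are
placeholders and do not reflect the real STF record layout.

```python
def select_records(lines, start, end, value):
    """Keep lines whose columns start..end (1-based, inclusive) equal value."""
    return [line for line in lines if line[start - 1:end] == value]


# Invented sample: a hypothetical summary-level code in columns 1-3.
sample = ["040NEW YORK", "050ALBANY", "050BRONX", "160ALBANY CITY"]
counties = select_records(sample, 1, 3, "050")
print(counties)
```

Only the records whose code field matches the requested value are
kept; all other records are dropped unchanged.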
-
codecount
- This program provides frequencies of a data set on a
column-by-column basis. The main purpose of this program is
to make sure that reasonable values are in the data. If a
variable you are interested in is one column (such as sex),
the program will provide the actual counts for males and
females. However, if the variable takes up two columns (such
as age) or more (SMSA codes), the program shows the counts on
a column-by-column basis rather than a variable-by-variable
basis.
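The column-by-column behavior described for codecount can be
sketched in Python as below. This is a minimal illustration, not
the program's own code; the function name and the sample layout
(sex in column 1, age in columns 2-3) are assumptions made for
the example.

```python
from collections import Counter


def column_counts(lines):
    """Return a list of Counters, one per column (index 0 is column 1)."""
    counts = []
    for line in lines:
        line = line.rstrip("\n")
        # Grow the list whenever a longer line is seen.
        while len(counts) < len(line):
            counts.append(Counter())
        for i, ch in enumerate(line):
            counts[i][ch] += 1
    return counts


# Invented sample: column 1 = sex code, columns 2-3 = age.
sample = ["134", "231", "127", "209"]
for col, counter in enumerate(column_counts(sample), start=1):
    print(col, sorted(counter.items()))
```

Column 1 yields usable counts for each sex code directly. Age,
however, is tallied as a tens digit in column 2 and a units digit
in column 3, which is the limitation the description points out
for multi-column variables.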
-
column
- This program allows users to pull off a selected range of
columns from the data. It should be used to inspect the
contents of those columns rather than as an extraction tool,
because it cannot pull off several separate column ranges
(for example, columns 6-7, 11-13, 49-50, and 54-57) in a
single run. If you need to do this, use selflat.
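Viewing a single range of columns, as column does, amounts to
slicing each line. The Python sketch below is an illustration
only; the function name and the sample records are hypothetical
and say nothing about how the program is invoked.

```python
def show_columns(lines, start, end):
    """Return columns start..end (1-based, inclusive) of every line."""
    return [line.rstrip("\n")[start - 1:end] for line in lines]


# Invented sample: columns 9-10 hold a two-digit age.
sample = ["0001234M34NY", "0001235F27CA"]
print(show_columns(sample, 9, 10))
```

The sketch takes one contiguous range per call, mirroring the
restriction noted in the description.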
-
checklen
- This program provides a description of the structure of a
data set. The parameters it provides are number of records,
maximum line length, and minimum line length.
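The three parameters that checklen reports can be computed with
a few lines of Python. The sketch below is a hedged illustration,
not the program itself; the function name and sample records are
invented.

```python
def describe_lengths(lines):
    """Return (number of records, maximum line length, minimum line length)."""
    # Line terminators are not counted as part of the record.
    lengths = [len(line.rstrip("\n")) for line in lines]
    if not lengths:
        return 0, 0, 0
    return len(lengths), max(lengths), min(lengths)


# Invented sample: the second record is shorter than the others.
sample = ["0001234M34NY\n", "0001235F27\n", "0001236M41TX\n"]
records, longest, shortest = describe_lengths(sample)
print(records, longest, shortest)
```

When a file is expected to have fixed-length records, a maximum
and minimum that differ may point to truncated or damaged lines
worth inspecting before analysis.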
-
divide
- This program takes a data set as input and divides it
into a maximum of 99 subfiles. Its main purpose is to allow
one to write out a large data set from UNIX to a multi-volume
external device.
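The idea behind divide can be sketched in Python as follows. How
the real program chooses the size of each subfile is not stated
here, so the sketch assumes a fixed number of lines per subfile;
the function name and that parameter are hypothetical. Only the
limit of 99 subfiles comes from the description above.

```python
MAX_PARTS = 99  # limit stated in the program description


def divide(lines, lines_per_part):
    """Split a list of lines into consecutive chunks of lines_per_part lines.

    Assumption: subfiles are sized by line count; the last chunk may be
    shorter. Raises ValueError if more than MAX_PARTS chunks are needed.
    """
    if lines_per_part < 1:
        raise ValueError("lines_per_part must be at least 1")
    parts = [lines[i:i + lines_per_part]
             for i in range(0, len(lines), lines_per_part)]
    if len(parts) > MAX_PARTS:
        raise ValueError("input would need more than 99 subfiles")
    return parts


# Invented sample: ten records split four at a time.
sample = ["rec%03d" % n for n in range(10)]
parts = divide(sample, 4)
print([len(p) for p in parts])
```

Each chunk could then be written to its own file and copied to a
separate volume, and concatenating the chunks in order restores
the original data set.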