Creating County-Level Statistics from Public Use Microdata Areas (PUMAs)

Normally, one would use tabular data from the American Community Survey (ACS) to produce county-based statistics. However, the data are subject to population thresholds. The annual data are released only for geographic units with a population of 65,000 or more. For three-year data, the threshold is 20,000. The five-year data have no population threshold, but they are not very timely. This means that a file based on annual or even three-year data excludes much of the country.

In addition, some items are not available in the tabular data. For instance, something as simple as the percent aged 60 and over among the white population is not available in the tabular data. Likewise, the percent of ACS questionnaires that were returned by mail is only available from the microdata files.

One solution is to produce PUMA-based statistics and then use a PUMA to County crosswalk to generate pseudo county-based statistics. We provide two PUMA to County widgets. One is based on output generated from American FactFinder (aff2c); the other expects output from PDQ-Explore (p2c).

PUMA to County Crosswalk [aff2c] | PUMA to County Crosswalk [p2c]

One could always illustrate statistics via PUMAs, but these are not familiar to the public. In addition, urban counties made up of many PUMAs show up as dark blobs. For instance, Los Angeles County, California is composed of 35 PUMAs, and there is no way to show that detail on a US-based map.

To the right is a blank PUMA-based map of the United States. Notice all the gray splotches and the lack of identifiable county outlines. This will not make a very nice thematic map, although it would allow one to publish annual characteristics.

PUMAs are units of geography that contain at least 100,000 people. In urban areas, a single county is composed of multiple PUMAs; in less populous areas, a single PUMA is composed of multiple counties. In a bit over 5 percent of cases, a county and a PUMA share exactly the same definition, at least based on the current definitions. A little over 25 percent of counties have populations of 65,000 or more, and a little over 40 percent of counties fall below 20,000. This means that around 60 percent of counties meet the requirements for the 3-year data release. The left-out counties are concentrated in the Great Plains, although all but 5 states (and the District of Columbia) have at least one county that is dependent on 5-year data.

Illustration of Creating Pseudo County-Based Statistics from PUMAs using data from California

This is an illustration of producing annual data for all California counties from PUMAs. For background on California, see the spreadsheet or map that shows the sizes of California counties, based on Census estimates. These estimates are what the Census Bureau uses to determine the ACS samples. The counties are shaded based on their population sizes: green (population >= 65,000); yellow (population between 20,000 and 64,999); and red (population < 20,000).
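The shading rule can be written down as a small lookup. The following is a minimal sketch in Python, assuming only the thresholds quoted above; the function name `acs_release_tier` and the sample populations are hypothetical and are not taken from the spreadsheet.

```python
def acs_release_tier(population: int) -> tuple[str, str]:
    """Return (map color, shortest ACS release) for a county of this size.

    Thresholds follow the text: 65,000+ for annual data, 20,000+ for
    three-year data, and no threshold for five-year data.
    """
    if population >= 65_000:
        return ("green", "1-year")
    if population >= 20_000:
        return ("yellow", "3-year")
    return ("red", "5-year")


# Hypothetical populations, chosen only to exercise each branch.
for pop in (250_000, 64_999, 20_000, 3_500):
    color, release = acs_release_tier(pop)
    print(f"{pop:>8,}  {color:<6}  {release}")
```

Counties in the yellow and red tiers are the ones for which a PUMA-based annual estimate has to stand in for a published one.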

Step 1:
Download a table from American FactFinder and export it as a CSV database file. In this example, the data are for 2008, limited to California, and based on Detailed Table B25002.

Step 2:
Use the PUMA to County widget (aff2c) to generate pseudo county-based statistics. Read and follow its instructions for use. For help, here is the input file used in this example. The original file downloaded from American FactFinder is here.

Below is a spreadsheet that shows the output generated from the PUMA to County crosswalk. Vacancy rates based on PUMA geography are in column G and county geography in column L.
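For readers who want to see the idea behind such a crosswalk, here is a minimal Python sketch. It is not the aff2c widget's implementation, which is not described on this page. It assumes that each county is mapped to the PUMA or PUMAs that cover it, that counts are pooled across those PUMAs, and that the vacancy rate is vacant units divided by total units. The PUMA codes, county names, and counts are all hypothetical.

```python
# Hypothetical PUMA-level counts: PUMA code -> (total housing units, vacant units).
puma_counts = {
    "P1": (40_000, 8_000),  # one PUMA that covers two small counties
    "P2": (50_000, 2_000),  # two PUMAs that together make up one large county
    "P3": (30_000, 3_000),
}

# Hypothetical crosswalk: county -> PUMAs that cover it.
county_to_pumas = {
    "Small A": ["P1"],
    "Small B": ["P1"],
    "Large C": ["P2", "P3"],
}


def pseudo_county_vacancy(puma_counts, county_to_pumas):
    """Pool PUMA counts per county and return vacancy rates in percent."""
    rates = {}
    for county, pumas in county_to_pumas.items():
        total = sum(puma_counts[p][0] for p in pumas)
        vacant = sum(puma_counts[p][1] for p in pumas)
        rates[county] = round(100 * vacant / total, 2)
    return rates


rates = pseudo_county_vacancy(puma_counts, county_to_pumas)
for county, rate in rates.items():
    print(f"{county}: {rate:.2f}")
```

Under these assumptions, counties that share a PUMA necessarily receive the same rate, while a county split across several PUMAs receives a rate pooled over all of them.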

Notice that there are missing values in column L for all counties with populations of less than 65,000. On occasion, there can also be discrepancies between the actual county vacancy rates and the PUMA to County rates.

For instance, Nevada County has a vacancy rate of 17.21, but in the PUMA to County calculation its rate is 21.64. That is because Nevada County is too small to be in a PUMA by itself. Its PUMA is composed of Nevada, Plumas and Sierra Counties, and all three end up with the same vacancy rate using this method. The user has to decide whether to use the known value when available (e.g., for Nevada) and the PUMA to County value when nothing else is available (e.g., for Plumas and Sierra).
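One way to apply that decision consistently is a simple fallback rule. The sketch below uses the three values quoted above; the function name `best_available` and the source labels are hypothetical, and `None` stands for a county with no published single-year rate.

```python
# Rates quoted in the text; None marks a county without a published annual value.
published = {"Nevada": 17.21, "Plumas": None, "Sierra": None}
pseudo = {"Nevada": 21.64, "Plumas": 21.64, "Sierra": 21.64}


def best_available(published, pseudo):
    """Prefer the published county rate; otherwise use the PUMA to County rate."""
    result = {}
    for county, fallback in pseudo.items():
        actual = published.get(county)
        if actual is not None:
            result[county] = (actual, "published")
        else:
            result[county] = (fallback, "puma_to_county")
    return result


for county, (rate, source) in best_available(published, pseudo).items():
    print(f"{county}: {rate:.2f} ({source})")
```

Keeping the source label next to each value makes it easy to flag, on a map or in a table, which counties carry a pseudo rate.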

The same thing happens for Lake and Mendocino Counties. They share a PUMA and so end up with the same PUMA to County value (20.55), but in this case both are also large enough to have a single-year value (30.60 and 11.46, respectively).
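Where both values exist, the size of the discrepancy can be checked directly. This is a minimal sketch using the Lake and Mendocino figures quoted above; the 5-point threshold is an arbitrary assumption for illustration, not a recommendation from this page.

```python
# (published single-year rate, PUMA to County rate), as quoted in the text.
rates = {
    "Lake": (30.60, 20.55),
    "Mendocino": (11.46, 20.55),
}

THRESHOLD = 5.0  # hypothetical cutoff, in percentage points


def discrepancies(rates, threshold):
    """Return counties whose pseudo rate differs from the published rate by more than threshold."""
    flagged = {}
    for county, (actual, pseudo_rate) in rates.items():
        diff = round(pseudo_rate - actual, 2)
        if abs(diff) > threshold:
            flagged[county] = diff
    return flagged


flagged = discrepancies(rates, THRESHOLD)
for county, diff in flagged.items():
    print(f"{county}: pseudo rate differs by {diff:+.2f} points")
```

Here the shared PUMA value understates Lake's published rate and overstates Mendocino's, which is what one would expect when two dissimilar counties are pooled.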

Comparison Maps: Vacancy rates in California, 2008 based on PUMAs or Counties

[Maps: Vacancy for CA PUMAs; Vacancy for CA Counties]

