Data Quality Issues with the American Community Survey (ACS)

The American Community Survey is a replacement for the census long-form. One major difference between the census long-form and the ACS data is that ACS data are released on an annual basis rather than once every 10 years. This means that users get fresh socioeconomic data every year, including for small geographies like census tracts. The data for geographic units with populations under 65,000 are based on multi-year estimates. 1

However, there are some data quality issues with the ACS data. The ACS data are noisier than data from the census long-form. This means that users should be prepared to either collapse cells or combine geographies to improve data metrics such as margins of error and coefficients of variation. The PSC ACS Aggregator Tool can help with these procedures.


The census long-form data are sample data (approximately 1 in 6 in 2000). And yet, the Census Bureau never published margins of error with the tabular data from the decennial census. The table below is a marital status spreadsheet for Delaware counties. The sheets provide links to data from Census 2000, ACS 2006-2010, and ACS 2010.

Table 1. Sex by Marital Status for the Population 15 Years and Over for Delaware Counties: Census 2000 Summary File 3 (SF 3) - Sample Data; ACS 2006-2010; and ACS 2010 [Excel File]

Notice that the estimates from Census 2000 (the first sheet) do not include any margins of error. There are 13,092 never married men in Kent County. Not 13,092 +/- 300.

However, the ACS data have always included margins of error. The following sheet is an update to the Census 2000 version of marital status in Delaware counties, with the addition of margins of error to the count estimates. The number of never married men in Kent County based on a rolling sample from 2006-2010 is reported as 18,789 +/- 598.

The ACS has always included margins of error because the original sample design for the ACS was going to yield higher margins of error than the census long-form data. The Census Bureau wanted to make sure that users were aware that the data had uncertainty. In fact, every table includes a footnote about the degree of uncertainty in the data.2

There are several reasons for the increased margins of error in the ACS as compared to the long-form census data, which we will describe in some detail later. The first reason is differences in the sampling ratio/sample size across the two samples. The ACS fares worse in this comparison. The second explanation is the mixed-mode sample design in the ACS, which uses sampling in the in-person follow-up. This was primarily a cost consideration as the in-person interviews are approximately 10 times more expensive than the mail/telephone modes (Griffin, 2009). And, finally, the ACS does not have population controls for smaller geographies like census tracts and zip codes. This makes the calibration of the weights somewhat problematic. [See Census Bureau, 2011b & Velkoff, 2008 for further discussion of population controls and the ACS.]

Sample Woes
The original ACS sample was approximately 2.9 million addresses a year, or about a 1-in-40 sampling fraction (Census Bureau, 2012a). Cumulating to a 5-year rolling sample yields a 1-in-8 sample, compared to a 1-in-6 sample for the 2000 census long-form. This translates to a margin of error that will be 1.25 times larger for the ACS (Navarro, 2010). Exacerbating the sample size issue, the sample was never quite as large as planned. In addition, the US population grew 9.7% over the decade, making the sampling fraction even smaller (Census Bureau, 2011a).

Mixed Mode
The non-response follow-up differs between the ACS and the Census. The census is an enumeration of all households, and every household has to be contacted, although after 6-8 visits the Census Bureau uses proxy reports from neighbors.

The ACS uses a mixed-mode methodology (Census Bureau, 2009). The original sample receives a mail questionnaire (just like the decennial census methodology). However, the following month non-responding households are followed up via telephone (for those households where a phone number can be matched to the address). The following month approximately a 1 in 3 sample of the non-respondents is drawn. This sample is weighted to represent all of the non-respondents.
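The follow-up subsampling step can be illustrated with a minimal Python sketch. This is not the Census Bureau's weighting procedure, which involves several further adjustments; the function name, the household labels, and the count of 900 non-respondents are all hypothetical and chosen only to make the arithmetic visible.

```python
import random

def draw_followup_sample(nonrespondents, fraction=1/3, seed=0):
    """Draw a simple random subsample of non-respondents and attach a weight.

    Hypothetical helper: each sampled household gets a weight equal to the
    inverse of the realized sampling fraction, so the subsample "stands in"
    for all of the non-respondents.
    """
    rng = random.Random(seed)
    k = round(len(nonrespondents) * fraction)
    if k == 0:
        raise ValueError("too few non-respondents to subsample")
    sampled = rng.sample(nonrespondents, k)
    weight = len(nonrespondents) / k
    return [(household, weight) for household in sampled]

# Made-up list of 900 households that did not respond by mail or phone.
nonrespondents = [f"household_{i}" for i in range(900)]
sample = draw_followup_sample(nonrespondents)

# 300 in-person cases, each carrying a weight of 3, represent all 900.
weighted_total = sum(weight for _, weight in sample)
print(len(sample), weighted_total)
```

The weighted total matches the full non-respondent count, but it rests on only a third as many actual interviews, which is one way to see why this design adds to the margins of error.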

Figure 1. Mail Participation Rates: ACS, 2006-2008.

Household non-response is not even across geographies. While the mail response rate is approximately 51 percent for the US, it ranges from less than 10 percent in some counties to close to 80 percent in others. [See Figure 1].3 The low-response areas will have larger margins of error than the high-response areas, all other things equal.

Population Controls
In the Decennial Census, there were population controls for census tract data (e.g., the short-form). For the ACS, there are population controls for states and counties, but not for census tracts. This increases the size of the standard errors by 15 to 25 percent (Starsinic, 2005). The Census Bureau has a research program underway to get a better handle on the sampling frame at small geographies.

The ACS was designed to produce estimates with a coefficient of variation (CV) about 1.33 times the CV of corresponding long-form estimates. The most current results show that the CV is about 1.75 times the CV of the corresponding long-form estimates (Navarro, 2010).

Census Bureau Solutions

The Census Bureau is well aware of these issues and how they impinge on data quality. First, the sample size has been increased to 3.45 million households a year. Second, the Bureau has increased the follow-up sampling fraction from 1 in 3 to something larger for low-response areas and for smaller census tracts (Alexander and Scanniello, 2011). Of course, this means that high-response areas and larger census tracts will likely have smaller sampling fractions.

On an educational front, the Census Bureau has provided user-friendly guides to the ACS with the Compass Series. These include a technical appendix that provides easy-to-follow guidance on the relevant statistical concepts a user needs to know (margins of error, standard errors, coefficient of variation, statistical significance, etc.). In addition, there are clear explanations for how to combine and collapse cells and recalibrate the data metrics (e.g., margins of error, coefficient of variation, etc.). However, this could prove tedious to users, so we provide an ACS Aggregation Tool based on the Compass Series appendices.

How noisy are data in the ACS?

The short answer is: pretty noisy. The following presents two solutions to noisy data: (a) collapse cells; and (b) combine geographies.

Collapse Cells
If we go back to sheet 2 in Table 1, which shows marital status by sex for Delaware counties using 5-year data, the results are fairly robust. The two smallest counties in Delaware are Kent and Sussex, and the coefficient of variation (CV) for the number of divorced males is 5.7 and 3.8, respectively. The Census Bureau recommends that the CV be less than 10. [A reminder: the formulas for calculating the coefficient of variation can be found on page A-13 of the technical appendix.]
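As a rough illustration of the CV calculation, the sketch below derives a standard error from a published 90 percent margin of error and expresses it as a percentage of the estimate. It assumes the usual 1.645 multiplier for 90 percent margins of error; the function names are our own. The inputs are the never-married men in Kent County from the 2006-2010 sheet (18,789 +/- 598).

```python
Z_90 = 1.645  # multiplier assumed for published 90 percent margins of error

def standard_error(moe, z=Z_90):
    """Convert a margin of error back to a standard error."""
    return moe / z

def coefficient_of_variation(estimate, moe, z=Z_90):
    """Return the CV as a percentage: 100 * SE / estimate."""
    if estimate == 0:
        raise ValueError("CV is undefined for an estimate of zero")
    return 100.0 * standard_error(moe, z) / estimate

# Never-married men in Kent County, ACS 2006-2010: 18,789 +/- 598
cv = coefficient_of_variation(18789, 598)
print(round(cv, 1))  # 1.9
```

A CV of about 1.9 is comfortably below the recommended threshold of 10, which is what one would expect for a large cell in 5-year county data.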

While the 5-year data for the smaller Delaware counties are robust, most users prefer to use 1-year data for county-based data because it is fresher. If we pull the same table for 2010 instead of the 5-year data (2006-2010), many of the results for Kent and Sussex Counties are not really usable. And these are not tiny counties with populations of 5,000, or census tracts with even smaller population counts.

The coefficient of variation for the number separated among males is 24.7 in Kent County and 28.9 in Sussex County, well above the recommended CV of 10. Even if we combine males and females - effectively doubling the sample - the CV still is not within the recommended values. The CVs for the separated population (males + females) for Kent and Sussex Counties are 18.8 and 18.7, respectively.

Combine Geographies
Another example of noisy data is the smaller units of geography, such as census tracts and zip codes. These data are only available as 5-year samples, and yet the estimates for any one census tract are likely to be quite noisy. For example, a look at the number of female-headed families in the Capitol Hill population area in Seattle [Table 2] shows that most of the census tracts have high CVs for the poverty estimates. For instance, the CV for female-headed families in the first census tract, 006200, is 28.1.

Table 2. Selected Household Characteristics for the Capitol Hill sub-area: Seattle, WA [Table B11001, ACS 2006-2010] [Excel File]

However, if we combine these 11 census tracts into a "neighborhood" the metrics are much improved, albeit still above 10. The CV for the number of female-headed families is 15.7.
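Collapsing cells and combining geographies come down to the same arithmetic: add the estimates, and approximate the margin of error of the sum as the square root of the sum of the squared margins of error. The sketch below shows that calculation. The three (estimate, margin of error) pairs are made up for illustration and are not taken from Table 2; the function names are our own. The formula is an approximation that ignores any covariance between the pieces being combined, so treat the result as a guide rather than an exact figure.

```python
import math

Z_90 = 1.645  # multiplier assumed for published 90 percent margins of error

def aggregate(cells):
    """Combine (estimate, moe) pairs into one estimate and an approximate MOE."""
    cells = list(cells)
    total = sum(estimate for estimate, _ in cells)
    moe = math.sqrt(sum(m ** 2 for _, m in cells))
    return total, moe

def cv_percent(estimate, moe, z=Z_90):
    """Return the CV as a percentage: 100 * (moe / z) / estimate."""
    return 100.0 * (moe / z) / estimate

# Hypothetical tract-level counts with their margins of error.
tracts = [(40, 30), (55, 35), (25, 22)]

total, moe = aggregate(tracts)
combined_cv = cv_percent(total, moe)
tract_cvs = [cv_percent(estimate, m) for estimate, m in tracts]

print(total, round(moe, 1), round(combined_cv, 1))
print([round(cv, 1) for cv in tract_cvs])
```

In this made-up case the combined CV is lower than the CV of any single tract, though still above 10, which mirrors the pattern described for the Capitol Hill tracts.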


To reiterate, the mantra for using ACS tabular data is that one must examine the data metrics carefully before drawing conclusions. If the data are too noisy, one should collapse cells or combine geographies. Use the ACS Aggregator for either of these procedures.

One other solution is to move from 1-year to 3-year or 5-year data. This works for geographies of 20,000+ but will not work for smaller geographies like census tracts or zip codes, which are only available as 5-year data.

Other Technical Questions?

Contact Lisa Neidert with further questions.


1. [See Data Products Release Schedule].

2. Footnote that accompanies every ACS table: Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see Accuracy of the Data). The effect of nonsampling error is not represented in these tables.

3. This map was created from public-use microdata. The fraction is the unweighted number of households who answered "mail" on response-mode divided by the unweighted sample in that PUMA. The PUMA-based data were converted to counties, based on a PUMA to County crosswalk.


Alexander, Trent and Nicole Scanniello. 2011. "American Community Survey Program Updates." Paper presented at the annual meetings of the Association of Public Data Users (APDU), Washington, D.C. (September 22, 2011).

Census Bureau website. 2012a. American Community Survey: Methodology/Sample Size - Data

Census Bureau website. 2012b. American Community Survey: Handbooks for Data Users

Census Bureau. 2011a. "Intercensal estimates of the Resident Population by Sex and Age for the United States: April 1, 2000 to July 1, 2010." [xls].

Census Bureau, Population Division. 2011b. "American Community Survey Research Note: Change in Population Controls." [September 22, 2011]

Census Bureau. 2009. "Design and Methodology: American Community Survey." U.S. Government Printing Office, Washington, D.C. April 2009.

Griffin, Deborah. 2009. "What the American Community Survey Can Tell us About Mixed Mode Surveys." Presented at the Institute for Social Research, University of Michigan (February 3, 2009).

Navarro, Freddie. 2010. "The ACS: Fulfilling its Promise to Data Users." Presented at the annual meetings of the Association of Public Data Users (APDU), Washington, DC (September 21, 2010).

Starsinic, Michael. 2005. "American Community Survey: Improving Reliability for Small Area Estimates." Proceedings of the 2005 Joint Statistical Meetings, pp. 3592-3599.

Velkoff, Victoria. 2008. "The Use of Population Estimates as Controls to the American Community Survey: An Evaluation." Census Bureau: Population Division.
