Archive for the 'Data' Category

What’s the Matter with Polling?

What is the Matter with Polling?
Cliff Zukin | New York Times
June 20, 2015

This article focuses on political polling – and predictions from political polls, but much of the content is relevant to other sorts of telephone-based opinion surveys, many of which are used by social scientists: Survey of Consumers, Pew, Gallup, etc.

The article focuses on (a) the move from landlines to cellphones; (b) the growing non-response rate; (c) costs; and (d) sample metrics, e.g., representativeness.

The decline in landline phones makes telephone surveys more expensive since cellphones cannot be reached through automatic dialers. The landline vs. cellphone distribution comes from the National Health Interview Survey. Here’s a recent summary of the data. The article summarizes this as: “About 10 years ago. . . . about 6 percent of the public used only cellphones. The N.H.I.S. estimate for the first half of 2014 found that this had grown to 43 percent, with another 17 percent ‘mostly’ using cellphones. In other words, a landline-only sample conducted for the 2014 elections would miss about three-fifths of the American public, almost three times as many as it would have missed in 2008.”
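The "three-fifths" figure is just the sum of the two N.H.I.S. shares quoted above. Here is a minimal Python sketch of that arithmetic. It assumes, as the article does, that both cell-only and cell-mostly respondents would be missed by a landline-only sample. The constant and function names are made up for illustration.

```python
from fractions import Fraction

# Percentages quoted in the article (N.H.I.S., first half of 2014).
CELL_ONLY_PCT = 43    # use only cellphones
CELL_MOSTLY_PCT = 17  # "mostly" use cellphones


def missed_share(cell_only_pct, cell_mostly_pct):
    """Share of the public a landline-only sample would miss.

    Assumption (the article's framing): cell-mostly respondents are
    counted as missed, along with cell-only respondents.
    """
    return Fraction(cell_only_pct + cell_mostly_pct, 100)


share = missed_share(CELL_ONLY_PCT, CELL_MOSTLY_PCT)
print(share)  # 3/5
```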

The other issue for polling is the growing non-response rate.

When I first started doing telephone surveys in New Jersey in the late 1970s, we considered an 80 percent response rate acceptable, and even then we worried if the 20 percent we missed were different in attitudes and behaviors than the 80 percent we got. Enter answering machines and other technologies. By 1997, Pew’s response rate was 36 percent, and the decline has accelerated. By 2014 the response rate had fallen to 8 percent.
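To see what those response rates could mean for dialing effort, here is a rough Python sketch. It makes a simplifying assumption that is not in the article: each sampled number independently yields a completed interview with probability equal to the response rate, so the expected number of sampled numbers per completed interview is 1 / rate. The function name is hypothetical, and real response-rate definitions are more involved than this.

```python
# Response rates quoted above: the author's New Jersey surveys in the
# late 1970s, and Pew's rates for 1997 and 2014.
RESPONSE_RATES = {
    "late 1970s": 0.80,
    "1997": 0.36,
    "2014": 0.08,
}


def expected_numbers_per_complete(response_rate):
    """Expected sampled numbers per completed interview.

    Assumption: each number completes independently with probability
    `response_rate`, so the expectation is 1 / response_rate.
    """
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return 1 / response_rate


for period, rate in RESPONSE_RATES.items():
    needed = expected_numbers_per_complete(rate)
    print(f"{period}: roughly {needed:.2f} numbers per completed interview")
```

Under that assumption, the drop from 80 percent to 8 percent is a tenfold increase in the numbers needed per completed interview, before accounting for the extra cost of hand-dialing cellphones.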

Non-response makes surveys more expensive: there are more numbers to call to find a respondent, and many of them must be dialed by hand if it is a cellphone universe. Most important, though, is the representativeness of the sample that the survey ends up with. So far, surveys based on probability samples still seem to be representative, at least when sample characteristics are compared to gold-standard benchmarks like the American Community Survey (ACS). Participation in the ACS is mandatory, although for the last several years Republicans in the House have tried to remove this requirement. Canada did away with the mandatory response requirement for its census, with disastrous results. The following is a compilation of posts related to the mandatory response requirement in the US and Canada: [Older Posts]

Measuring Race . . . Again

The following is a collection of news stories on how the Census Bureau is planning to collect data on race. It is misleading to say that the Census Bureau will not collect data on race. Instead of asking about Hispanic Origin and Race, the Census Bureau is likely to ask about “categories” that describe the person.

And, a new category might be “Middle Eastern or North African.”

The Census Bureau collects data on all sorts of topics, but the Office of Management and Budget (OMB) makes the final call on how the concept is measured by the Federal Statistical System. Links to the Census Bureau’s submission to OMB and a report based on internal research follow a nice summary by Pew.

Census considers new approach to asking about race – by not using the term at all
D’Vera Cohn | Pew Research Center
June 18, 2015

2010 Census Race and Hispanic Origin Alternative Questionnaire Experiment
from the 2010 Census Program for Evaluations and Experiments
Feb 28, 2013

National Content Test
Submission for OMB Review | Federal Register
May 22, 2015

Educational data vulnerable to ‘privacy’ legislation

When Guarding Student Data Endangers Valuable Research
Susan Dynarski | New York Times (Upshot Blog)
June 13, 2015

University of Michigan public policy professor Susan Dynarski warns researchers about pending legislation that would curtail the sharing of educational data with researchers:

In response to such concerns, some pending legislation would scale back the authority of schools, districts and states to share student data with third parties, including researchers. Perhaps the most stringent of these proposals, sponsored by Senator David Vitter, a Louisiana Republican, would effectively end the analysis of student data by outside social scientists. This legislation would have banned recent prominent research documenting the benefits of smaller classes, the value of excellent teachers and the varied performance of charter schools.

Below is a summary of Vitter’s proposed legislation from his office:

Vitter Introduces Student Privacy Protection Act
David Vitter (R-LA) | From David’s Desk
May 14, 2015

Using Grid Maps to Visualize Data

Danny DeBelius of NPR’s Visuals Team discusses how geographic data is represented on maps and ways to make the visualization more accurate. The visualization they have landed on is the Hex-Tile map.

[Image: hexagon map]

H/T Flowing Data, which shows other ways of producing this kind of map, including sheep and Darth Vaders.

How to Ask for Datasets

Christian Kreibich at medium.com provides some helpful tips for asking other researchers to share their data.

I’m a systems researcher. I work with data, plenty of it. Over the past decade I have sent lots of data inquiries, and have received dozens. Judging by the latter it’s safe to say that people often go about this poorly, so I’d like to give a bit of advice regarding how to formulate inquiries to other researchers. But before we start, a few clarifications. This article is dataset-centric, but the concerns apply similarly to resources such as algorithms, methods, or code. Also, I assume you have done your background research and already know whom to ask. This is not a guide for finding useful stuff. Finally, the following is by no means a complete guide on how to collaborate with other researchers, but it might provide some tips regarding how to start such a collaboration.

H/T Flowing Data

American Community Survey (ACS) Data Products Survey

The American Community Survey Office is conducting a survey to gather feedback on its products:

The ACS data products consist of tabulated products, such as aggregated estimates found in detailed tables or data profiles in the American FactFinder, and the Public Use Microdata Sample (PUMS) Files. We need your feedback in order to provide relevant and timely data products that are easy to access and use.

Please take a moment to complete this survey. Your responses will help us evaluate the ACS data products and dissemination and find ways to improve them. Please respond no later than May 29, 2015.

We estimate the survey will take 15 minutes to complete.

H/T Data Detectives

Big Data in 1848

In 1848, newspaper magnate and Representative Horace Greeley used open records to compare the mileage reimbursements of his fellow representatives to the postal routes (which should have been the shortest routes between districts and the U.S. capital). He found that several, including Abraham Lincoln, had overcharged significantly.

See Scott Klein’s story at ProPublica.

H/T Wonkblog

ACS Median Earnings by Detailed Occupation

The U.S. Census Bureau released 2013 Earnings by Sex and Detailed Occupation tables from the American Community Survey. Other tables include Sex, Race, and Hispanic Origin by Occupation: 2012 and Median Earnings of College Graduates by Field of Bachelor’s Degree and Occupation: 2012.

All table packages are here.

H/T Data Detectives

Big Government and Big Data

Ben Casselman of FiveThirtyEight examines the legal, bureaucratic and practical impediments the U.S. government faces in collecting and disseminating data about U.S. citizens.

When the government wants to know how many people are unemployed, it calls people and asks them whether they’re working. When it wants to know how quickly prices are rising, it sends researchers to stores to check price tags. And when it wants to know how much consumers are spending, it mails forms to thousands of retailers asking about their sales.

“Big data” may have revolutionized industries from advertising to transportation, but many of our most vital economic statistics are still based on methods that are decidedly, well, small.

Read the full article

PAA President Ruggles wants you to write a letter

This is an excellent summary of the consequences of the demise of the 3-year ACS tabular products. Please follow through and contact the relevant government officials:

ACS 3-Year Summary Products: Please take action to save the ACS 3-year data products
Steve Ruggles | PAA President and Director of the Minnesota Population Center
March 4, 2015