Archive for the 'Methodology' Category

The Antidote for “Anecdata”: A Little Science Can Separate Data Privacy Facts from Folklore
Daniel Barth-Jones | Info/Law Blog [Harvard]
November 21, 2014
This is a great piece showing, once again, that most of the publicity about re-identification of data is overblown:

The 11 in 173 million risk demonstrated for this celebrity ride re-identification (or 1 in 15,743,614) is truly infinitesimal. To put this in perspective, this risk is over 1,000 times smaller than one’s lifetime risk of being hit by lightning. With proper de-identification applied and the cryptographic hash problem fixed in any future data releases, this spooky specter of celebrity cyber-stalking using TLC taxi data is likely to vanish as soon as one turns on the lights.
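
For readers who want to check the arithmetic behind that comparison, here is a minimal back-of-the-envelope sketch in Python; the roughly 1-in-15,000 lifetime lightning-strike odds are an assumed, commonly cited National Weather Service ballpark figure, not a number taken from the post.

    # Back-of-the-envelope check of the risk figures quoted above.
    trips = 173_000_000        # approximate number of trips in the dataset
    reidentified = 11          # celebrity rides re-identified in the demo

    risk_per_trip = reidentified / trips
    print(f"Re-identification risk: about 1 in {1 / risk_per_trip:,.0f}")

    # Assumed lifetime lightning-strike odds (commonly cited NWS ballpark).
    lightning_lifetime_risk = 1 / 15_000
    ratio = lightning_lifetime_risk / risk_per_trip
    print(f"Roughly {ratio:,.0f} times smaller than the lightning risk")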

This blog post is in reaction to the release of NYC taxi medallion data, which were improperly anonymized. A previous blog post described the data.

Here is the piece that sensationalizes the possibility of re-identification, based on famous people who ride cabs.
Riding with the Stars: Passenger Privacy in the NYC Taxicab Dataset
Anthony Tockar | Neustar Blog
September 15, 2014

Big Data: NYC Taxi Cab Trips

This is a big data resource, and more. Check out the reaction to the bad anonymization here.

20GB of uncompressed data comprising more than 173 million individual trips. Each trip record includes the pickup and dropoff location and time, an anonymized hack license number and medallion number (i.e., the taxi’s unique ID number, e.g., 3F38), and other metadata.
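
For anyone who wants to poke at the files, here is a minimal pandas sketch for reading one chunk; the file name and the column names used below are assumptions based on the description above, so check them against the actual CSV headers.

    # Minimal sketch of reading one chunk of the trip data with pandas.
    # The file name and column names are assumptions based on the description
    # above; verify them against the actual CSV headers before relying on this.
    import pandas as pd

    cols = ["medallion", "hack_license", "pickup_datetime", "dropoff_datetime",
            "pickup_longitude", "pickup_latitude",
            "dropoff_longitude", "dropoff_latitude"]

    # 20GB will not fit comfortably in memory, so read in pieces.
    reader = pd.read_csv("trip_data_1.csv", usecols=cols,
                         parse_dates=["pickup_datetime", "dropoff_datetime"],
                         chunksize=1_000_000)

    for chunk in reader:
        # Peek at trips per (anonymized) medallion in the first chunk only.
        print(chunk["medallion"].value_counts().head())
        break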

Before the link to the data, here’s an analysis based on similar data:
Why New Yorkers Can’t Find a Taxi When It Rains
Eric Jaffe | City Lab Blog
October 20, 2014
Provides a nice synopsis of some research using taxi cab rides. Read it for the links to the formal research papers.

New York City Taxi Cab Trips [in small chunks]

FOILing NYC’s Taxi Trip Data
Chris Whong | personal website of an Urbanist, Mapmaker, Data Junkie
March 18, 2014
A synopsis of how he got the data via a FOIL (Freedom of Information Law) request, and a link to the data on rides/fares as single files instead of the chunked version above.

And here is the story of how the taxicab medallion IDs were improperly anonymized:

Poorly anonymized logs reveal NYC cab drivers’ detailed whereabouts
Dan Goodin | ars technica
June 23, 2014

On Taxis and Rainbows: Lessons from NYC’s improperly anonymized taxi logs
Vijay Pandurangan | Medium blog
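
The core problem described in the pieces above is that the medallion and hack license numbers were “anonymized” with an unsalted hash (reportedly MD5), and the space of possible values is so small that every hash can be reversed by simply enumerating the candidates. Here is a minimal sketch of the idea; the medallion format used (digit, letter, two digits, e.g. 5D86) is only one of the real formats and is used purely for illustration.

    # Minimal sketch of why unsalted hashing of a tiny ID space is reversible:
    # hash every possible ID once and build a reverse-lookup table.
    # The format below (digit + letter + two digits, e.g. "5D86") is only one
    # of the real medallion formats, used here purely for illustration.
    import hashlib
    from itertools import product
    from string import ascii_uppercase, digits

    def md5_hex(s: str) -> str:
        return hashlib.md5(s.encode()).hexdigest()

    lookup = {
        md5_hex(d1 + letter + d2 + d3): d1 + letter + d2 + d3
        for d1, letter, d2, d3 in product(digits, ascii_uppercase, digits, digits)
    }
    print(len(lookup))  # only 26,000 candidates for this format

    # Any "anonymized" hash from the released data can now be looked up directly.
    example_hash = md5_hex("5D86")   # stands in for a hash seen in the dataset
    print(lookup[example_hash])      # -> "5D86"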

Getting A More Accurate Count Of Arab Americans

The PSC Infoblog has reported on this earlier here, but this is still of interest.

The U.S. Census Is Trying To Get A More Accurate Count Of Arab Americans
Ben Casselman | fivethirtyeight.com Blog
November 24, 2014

Note that this article mentions that the Census Bureau did a special tabulation for Homeland Security to provide counts of Arab populations by geography (place and zip code).

Some Arabs have expressed reluctance to identify themselves on a government form, especially after the Census Bureau shared detailed data on the Arab-American population with the Department of Homeland Security in the early 2000s.

These “detailed tabulations” referenced above were public-use tables from American FactFinder. Here’s the original FOIA request from the Electronic Privacy Information Center:

FOIA request: Department of Homeland Security Obtained Data on Arab Americans From Census Bureau [Source: EPIC]

Here is the example for Places, drawn from DP-2. Here’s the example for Zip Codes, drawn from Tables PCT16 and PCT17.

Measuring Race in the Census: It’s Fluid

Researchers at the Census Bureau and the Minnesota Population Center matched census responses across the 2000 and 2010 Censuses and examined changes in the race/Hispanic-origin responses. They found that the biggest change was Hispanics switching their “Other Race” choice in 2000 to “White” in 2010. The groups least likely to change responses were those who identified as non-Hispanic white, black, or Asian in 2000.

America’s Churning Races: Race and Ethnic Response Changes between Census 2000 and the 2010 Census
Center for Administrative Records Research and Applications & Minnesota Population Center
CARRA Working Paper #2014-09
August 8, 2014
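
The mechanics of the comparison amount to a transition table of matched responses. Here is a minimal sketch; the tiny DataFrame and the column names (race_2000, race_2010) are invented for illustration and are not the paper’s actual variables.

    # Minimal sketch of the kind of transition table examined in the paper:
    # cross-tabulate each matched person's 2000 response against their 2010
    # response. The data and column names below are invented for illustration.
    import pandas as pd

    matched = pd.DataFrame({
        "race_2000": ["Hispanic, Other Race", "Hispanic, Other Race",
                      "Non-Hispanic White", "Non-Hispanic Black"],
        "race_2010": ["Hispanic, White", "Hispanic, Other Race",
                      "Non-Hispanic White", "Non-Hispanic Black"],
    })

    # Rows: 2000 response; columns: 2010 response; cells: counts of people.
    print(pd.crosstab(matched["race_2000"], matched["race_2010"]))

    # Share of each 2000 group that gave the same response again in 2010.
    same = matched["race_2000"] == matched["race_2010"]
    print(same.groupby(matched["race_2000"]).mean())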

This paper is nicely summarized here, based on the PAA version:

Millions of Americans changed their racial or ethnic identity from one census to the next
D’Vera Cohn | Pew Research Center
May 5, 2014

As a reminder, the Census Bureau has spent the better part of the past year looking to change how it collects data on race and ethnicity.

U.S. Census looking at big changes in how it asks about race and ethnicity
D’Vera Cohn | Pew
March 14, 2014

2010 Census Race and Hispanic Origin Alternative Questionnaire Experiment [website]
2010 Census Race and Hispanic Origin Alternative Questionnaire Experiment [report]

Federal Register Notice: Some ACS questions on the chopping block

You can comment on the Census Bureau’s plans to remove some questions from the American Community Survey (marriage history and field of study in college) via the Federal Register:

link to Federal Register Notice

A working paper by Kennedy and Ruggles provides some talking points on the marriage history question: “Breaking up is Hard to Count . . .” And quite a number of researchers who study the STEM population, including the migration of STEM workers, ought to be interested in the field-of-study question.

It is interesting to note that questions thought to be vulnerable (flush toilet, time leaving for work, income, and mental/emotional disability) emerged unscathed. For historical perspective, it is worth reviewing an April 2014 summary of these touchy questions.

Here is a summary of how the Census Bureau came up with the questions to be eliminated. It comes down to a grid that crosses whether a question is mandated or required against its respondent burden and cost:

American Community Survey (ACS) Content Review
Gary Chappell | Census Bureau
October 9, 2014

Other helpful links are on the ACS Content Review website.

Political Science: A self-inflicted wound?

Stanford University and Dartmouth have sent an open apology letter to the state of Montana for a voting experiment conducted by political scientists at their respective institutions. The study had IRB approval, at least from Dartmouth. It used a database of ideological scores, based on campaign donors, to identify the political leanings of judges. See this Upshot article on Crowdpac, the start-up company that developed the database. Montana is quite irritated by the use of the state of Montana seal on the mailer. Did that get through the IRB?

The letters:

Senator Jon Tester’s letter to Stanford & Dartmouth | The apology letter

Messing with Montana: Get-out-the-Vote Experiment Raises Ethics Questions
Melissa R. Michelson | The New West (blog of the Western Political Science Association)
October 25, 2014

Today, the Internet exploded with news about and reactions to a get-out-the-vote field experiment fielded by three political science professors that may have broken Montana state law and, at a minimum, called into question the ethics of conducting experiments that might impact election results.

Professors’ Research Project Stirs Political Outrage in Montana
Derek Willis | NY Times
October 28, 2014

Universities say they regret sending Montana voters election mailers criticized for being misleading
Hunter Schwarz | The Washington Post
October 29, 2014

Bayesian Statistics in the New York Times

F. D. Flam | New York Times

The method was invented in the 18th century by an English Presbyterian minister named Thomas Bayes — by some accounts to calculate the probability of God’s existence. In this century, Bayesian statistics has grown vastly more useful because of the kind of advanced computing power that didn’t exist even 20 years ago…

…Now Bayesian statistics are rippling through everything from physics to cancer research, ecology to psychology. Enthusiasts say they are allowing scientists to solve problems that would have been considered impossible just 20 years ago. And lately, they have been thrust into an intense debate over the reliability of research results.

Read the full article
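
For readers who want a concrete picture of what a Bayesian calculation looks like, here is a minimal sketch of the textbook beta-binomial update; it is not taken from the article, and the prior and data values are invented for illustration.

    # Minimal sketch of a Bayesian update (not from the article): estimate an
    # unknown success probability from a Beta prior and binomial data.
    # Prior Beta(a, b) plus k successes in n trials gives the posterior
    # Beta(a + k, b + n - k).

    a, b = 1.0, 1.0    # uniform prior: no strong belief either way
    k, n = 7, 10       # invented data: 7 successes in 10 trials

    post_a, post_b = a + k, b + (n - k)
    posterior_mean = post_a / (post_a + post_b)

    print(f"Posterior: Beta({post_a:.0f}, {post_b:.0f})")
    print(f"Posterior mean of the success probability: {posterior_mean:.2f}")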

Breaking News: Protect the 2020 Census

Senator David Vitter (R-LA) has filed an amendment to the FY2015 Commerce, Justice, and Science Appropriations bill (H.R. 4660), being considered now in the U.S. Senate, that would prohibit the Census Bureau from spending funds on the 2020 Census unless it includes questions regarding U.S. citizenship and immigration status. [Source: APDU]

Below are links to talking points related to this amendment:
Vitter Amendment Talking Points [from the Census Project]

Vitter Census Amendment to Require Questions about Illegal Aliens [Press Release, David Vitter]

Supreme Court rebuffs Louisiana’s 2010 Census Suit [PSC Info Blog]

Apportionment Resources: Legal [PSC Info Blog]

Another rule from NIH: Demographers might get to ignore this one

NIH wants the routine sex bias in basic research to end. This mostly applies to pre-clinical work, e.g., laboratory animals such as mice, and cell cultures. An earlier directive from NIH required clinical trials to include women and minorities.

Policy: NIH to balance sex in cell and animal studies
Janine A. Clayton and Francis S. Collins | Nature
May 14, 2014
html | pdf

Labs Are Told to Start Including a Neglected Variable: Females
Roni Rabin | New York Times
May 15, 2014

Related History:
Monitoring Adherence to the NIH Policy on the Inclusion of Women and Minorities in Clinical Research
National Institutes of Health (NIH) | 2013

Research on Health Disparities: Incarceration Matters

High Incarceration Rates among Black Men Enrolled in Clinical Studies may Compromise Ability to Identify Disparities
Emily Wang, et al. | Health Affairs
May 13, 2014
html | pdf

This is a nice note, which examines the selectivity introduced into studies when participants, primarily black men, are lost to follow-up due to incarceration. The paper discusses a suggested change to the IRB regulations on studying prisoners, which would help address this selectivity issue. The Vox article below discusses the history of the IRB rules, since that history would not be common knowledge among a more general reader pool.
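
To make the selection problem concrete, here is a minimal hypothetical simulation; all of the rates are invented and are not from the paper. When loss to follow-up is concentrated among the participants with the worst outcomes in one group, the disparity measured among those who remain is understated.

    # Minimal hypothetical simulation of the selection problem; every rate
    # below is invented, not taken from the paper. If loss to follow-up
    # (e.g., through incarceration) is concentrated among the participants
    # with the worst outcomes in one group, the observed disparity shrinks.
    import random

    random.seed(0)
    N = 100_000

    def simulate(true_rate, loss_if_poor, loss_if_ok):
        outcomes, kept = [], []
        for _ in range(N):
            poor = random.random() < true_rate
            lost = random.random() < (loss_if_poor if poor else loss_if_ok)
            outcomes.append(poor)
            if not lost:
                kept.append(poor)
        return sum(outcomes) / N, sum(kept) / len(kept)

    # Group A: lower true rate of a poor outcome, loss unrelated to outcome.
    true_a, obs_a = simulate(true_rate=0.10, loss_if_poor=0.02, loss_if_ok=0.02)
    # Group B: higher true rate, and loss concentrated among the sickest.
    true_b, obs_b = simulate(true_rate=0.20, loss_if_poor=0.30, loss_if_ok=0.05)

    print(f"True disparity:     {true_b - true_a:.3f}")
    print(f"Observed disparity: {obs_b - obs_a:.3f}")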

Doctors can’t research the health of black men, because they keep getting sent to prison
Dara Lind | Vox
May 13, 2014