Archive for the 'Data' Category

Open Data Executive Order

This news was quite exciting for the data community and it should be for researchers as well. See the tweets in reaction to this:

The Reaction on Twitter

Here’s the Executive Order:

Open Data Policy – Managing Information as An Asset
May 9, 2013

And a White House blog post about it:

Landmark Steps to Liberate Open Data
Todd Park and Steve VanRoekel | White House Blog
May 9, 2013

And here’s the Project Open Data website:

Project Open Data

This event also got some coverage in the popular news. Here are the pros and cons, via the Wall Street Journal:

‘Open Data’ Brings Potential And Perils for Government
Ben Rooney | Wall Street Journal
May 9, 2013

Here’s the rest of the coverage in the popular press:
[Link]

Measuring Marriage & Divorce among Same-Sex Couples

For Gays, Breaking Up Is Hard to Do – or Measure
Carl Bialik | Wall Street Journal [print column]
May 3, 2013
This article touches on both the personal and the aggregate. The personal stories are of couples unable to get a divorce because they live in states that do not recognize same-sex marriages. On the aggregate side, states have not modified divorce forms to collect data on same-sex couples.

Same-Sex Divorce Stats Lag
Carl Bialik | Wall Street Journal [blog]
May 3, 2013
This version provides links to sources of marriage and divorce statistics. European countries do collect data on these events, but so far do not have enough dissolutions to calculate robust rates. An NIH-funded study is following a cohort of couples who were married in Vermont.

Decennial Census Data on Same Sex Couples
Census Bureau
May 2013
The Census Bureau has a website with links to technical papers, data, etc. on same-sex couples, as measured by this agency from 1990 onward.

Census Bureau: Flaws in Same-Sex Couple Data
D’Vera Cohn | Pew: Social and Demographic Trends
September 27, 2011
The Census Bureau announced today that more than one-in-four same-sex couples counted in the 2010 Census was likely an opposite-sex couple, and identified a confusing questionnaire as a likely culprit. The bureau released a new set of “preferred” same-sex counts, including its first tally ever of same-sex spouses counted in the census.

How Accurate Are Counts of Same-Sex Couples?
D’Vera Cohn | Pew: Social and Demographic Trends
August 25, 2011
This is a nice brief on the obstacles to accuracy in measuring same-sex couples in census data. And, it illustrates the efforts that the Census Bureau makes in measuring concepts in an era of rapid social change.

NetMob 2013 Conference

This conference is an illustration of how mobile phone data are being used to plan bus routes and to map SES cleavages in neighborhoods, etc. At the PAA conference this year, one of the papers in the social media section used cell phone data on a very small scale.

Conference paper
Conference Site
Program (pdf)
Book of abstracts
D4D – The D4D book (122 MB). This book contains copies of all the submissions to the D4D challenge that have been selected for NetMob. It is a large file (850 pages).

Nature: Challenges in irreproducible research

The following post is from a recent issue of Nature, which highlights the importance of replication and open data for science. However, some of the examples might apply more to medicine or biology than to population science. Lest readers think that this issue doesn’t apply to demographers, here’s a tweet from Justin Wolfers advertising a piece in Bloomberg Business on the importance of replication for the field of economics. His motivation is the recent dust-up over an error in a famous paper by Reinhart and Rogoff [See PSC-Info], but the discussion is much broader than that example.

tweet

[Link to Stevenson/Wolfers Replication article]

INTRODUCTION TO SPECIAL NATURE ISSUE
No research paper can ever be considered to be the final word, and the replication and corroboration of research results is key to the scientific process. In studying complex entities, especially animals and human beings, the complexity of the system and of the techniques can all too easily lead to results that seem robust in the lab, and valid to editors and referees of journals, but which do not stand the test of further studies. Nature has published a series of articles about the worrying extent to which research results have been found wanting in this respect. The editors of Nature and the Nature life sciences research journals have also taken substantive steps to put our own houses in order, in improving the transparency and robustness of what we publish. Journals, research laboratories and institutions and funders all have an interest in tackling issues of irreproducibility. We hope that the articles contained in this collection will help.

EDITORIAL

Reducing our irreproducibility
(25 April 2013)

Further confirmation needed
A new mechanism for independently replicating research findings is one of several changes required to improve the quality of the biomedical literature.
Nature Biotechnology 30, 806 ( 10 September 2012 )

Error Prone
Biologists must realize the pitfalls of work on massive amounts of data.
Nature 487, 406 ( 26 July 2012 )

Must Try Harder
Too many sloppy mistakes are creeping into scientific papers. Lab heads must look more rigorously at the data — and at themselves.
Nature 483, 509 ( 29 March 2012 )

NEWS AND ANALYSIS

Independent labs to verify high-profile papers
Monya Baker
Nature News ( 14 August 2012 )

Power Failure: Why small sample size undermines the reliability of neuroscience
Katherine S. Button, John P. A. Ioannidis et al.
Nature Reviews Neuroscience 14, 365-376 ( 15 April 2013 )

Replication studies: Bad copy
Ed Yong
Nature 485, 298-300 ( 17 May 2012 )

Reliability of ‘new drug target’ claims called into question
Asher Mullard
Nature Reviews Drug Discovery 10, 643-644 ( September 2011 )

COMMENT

If a job is worth doing, it is worth doing twice
Jonathan F. Russell
Nature 496, 7 ( 04 April 2013 )

Methods: Face up to false positives
Daniel MacArthur
Nature 487, 427-429 ( 26 July 2012 )

Drug development: Raise standards for preclinical cancer research
C. Glenn Begley & Lee M. Ellis
Nature 483, 531-533 ( 29 March 2012 )

Believe it or not: how much can we rely on published data on potential drug targets?
Florian Prinz, Thomas Schlange & Khusru Asadullah
Nature Reviews Drug Discovery 10, 712 ( September 2011 )

Tackling the widespread and critical impact of batch effects in high-throughput data
Jeffrey T. Leek, Robert B. Scharpf et al.
Nature Reviews Genetics 11, 733-739 ( October 2010 )

PERSPECTIVES AND REVIEWS

Research methods: know when your numbers are significant
David L. Vaux
Nature 492, 180-181 ( 13 December 2012 )

A call for transparent reporting to optimize the predictive value of preclinical research
Story C. Landis, Susan G. Amara et al.
Nature 490, 187-191 ( 11 October 2012 )

Next-generation sequencing data interpretation: enhancing reproducibility and accessibility
Anton Nekrutenko & James Taylor
Nature Reviews Genetics 13, 667-672 ( September 2012 )

The case for open computer programs
Darrel C. Ince, Leslie Hatton & John Graham-Cumming
Nature 482, 485-488 ( 23 February 2012 )

Reuse of public genome-wide gene expression data
Johan Rung & Alvis Brazma
Nature Reviews Genetics 14, 89-99 ( February 2013 )

ACS to drop “Number of Times Married” question

This notice is from a Minnesota Population Center data alert:

Dear IPUMS User,

I am writing to alert you that the Census Bureau is planning to drop the question on “number of times married” from the American Community Survey. For those of us who study family demography, this change would be a major loss. The times married question is not only vital for understanding blended families, it is also necessary for basic studies of nuptiality and marital instability. A recent working paper by Sheela Kennedy and myself demonstrated that the ACS is the only reliable source currently available for national divorce statistics. Without the number of times married, however, the divorce data will be badly compromised; for example, it will be impossible to construct a life table for first marriages, or to estimate the percentage of people who have ever divorced.

The news of this plan appears in the Federal Register in a single sentence at the end of an otherwise harmless notice of request for comments. If you believe as I do that this change would significantly harm the nation’s statistical infrastructure, you should make your feelings known to the responsible OMB desk officer, Dr. Brian Harris-Kojetin. He can be reached at (202) 395-7245 or by email at bharrisk@omb.eop.gov. The deadline for comments is May 16.

Thank you,

Steven Ruggles
Regents Professor
Director, Minnesota Population Center

Research from The Data Privacy Lab

Respondent re-identification is a big worry for data projects that want to share their data. Some recent cases illustrate that it can occur, and is occurring, with genetic data. But sometimes the case is overstated. Here is an illustration with a case that hit the press with great fanfare.

First, the fun stuff: see if you are unique. The following link has you type in your gender, exact date of birth, and 5-digit ZIP code. The latter two do not meet HIPAA guidelines:

Next are several links. The first are the coverage of the re-identification in the press (Forbes, The Scientist, & xxxx), followed by the researchers’ version of the story (Sweeney). Next is a rebuttal, which reminds readers that administrative matches, e.g., to voter registration records, are not as ubiquitous as some claim. There is also a link to an article by Barth-Jones in which he discusses the famous case of the re-identification of Governor William Weld, which led to many of the HIPAA rules.

Harvard Professor Re-Identifies Anonymous Volunteers In DNA Study
Adam Tanner | Forbes
April 24, 2013

Participants in Personal Genome Project Identified by Privacy Experts
MIT Technology Review
May 1, 2013

“Anonymous” Genomes Identified
Dan Cossins | The Scientist
May 3, 2013

Identifying Participants in the Personal Genome Project by Name
Latanya Sweeney, Akua Abu, Julia Winn | Data Privacy Lab

Reporting Fail: The Reidentification of Personal Genome Project Participants
Jane Yakowitz Bambauer | Info/Law [Harvard Law Blogs]
May 1, 2013

The ‘Re-Identification’ of Governor William Weld’s Medical Information: A Critical Re-Examination of Health Data Identification Risks and Privacy Protections, Then and Now
Daniel C. Barth-Jones | Social Science Research Network (SSRN)
June 4, 2012
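
The “are you unique?” check described above comes down to counting how many people share the same combination of gender, date of birth, and ZIP code. The following is only a minimal sketch of that idea, not the Data Privacy Lab’s method: the records are made up, and the field names (gender, birth_date, zip5) and the function uniqueness_report are hypothetical. The coarsened version (birth year and 3-digit ZIP) is an illustrative assumption about how less exact values reduce uniqueness, not a statement of what HIPAA requires.

```python
from collections import Counter


def uniqueness_report(records, keys=("gender", "birth_date", "zip5")):
    """Count how many records are the only one with their combination of keys."""
    # Tally each combination of quasi-identifiers across all records.
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    # A record is "unique" when nobody else shares its combination.
    n_unique = sum(1 for count in combos.values() if count == 1)
    return {
        "n_records": len(records),
        "n_unique": n_unique,
        "share_unique": n_unique / len(records) if records else 0.0,
    }


# Made-up records for illustration only.
people = [
    {"gender": "F", "birth_date": "1970-03-14", "zip5": "48104"},
    {"gender": "F", "birth_date": "1970-03-14", "zip5": "48104"},
    {"gender": "M", "birth_date": "1982-11-02", "zip5": "48104"},
    {"gender": "M", "birth_date": "1982-05-20", "zip5": "48105"},
    {"gender": "F", "birth_date": "1955-07-30", "zip5": "48109"},
]

# Exact date of birth and 5-digit ZIP: most of these records stand alone.
report = uniqueness_report(people)

# Coarsen to birth year and 3-digit ZIP, then count again.
coarsened = [
    {**p, "birth_year": p["birth_date"][:4], "zip3": p["zip5"][:3]}
    for p in people
]
coarse = uniqueness_report(coarsened, keys=("gender", "birth_year", "zip3"))

print(report)
print(coarse)
```

With these toy records, three of the five people are unique on the exact values, and only one remains unique after coarsening. Whether a unique combination can actually be matched to a name depends on what outside lists exist, which is the point of the rebuttal above.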

“it’s an Alice in Wonderland moment”

A post last week covered the House Republican "Census Reform Act of 2013." Here is some useful commentary from the national press.

GOP Census Bill Would Eliminate America's Economic Indicators
Michael McAuliffe | Huffington Post
May 1, 2013
This post has the best quotes:

Indeed, the government would not be able to produce any of the major economic indices that move markets every month, said multiple statistics experts, who were aghast at the proposal.

"They simply wouldn't exist. We won't have an unemployment rate," said Ken Prewitt, the former director of the U.S. Census who is now a professor of public affairs at Columbia University.

"I don't know how the market reacts if there is suddenly no unemployment rate at the start of the month," Prewitt said. "How does the market react if we don't have a GDP [gross domestic product]?"

"Do they understand that these data that the Census Bureau collects are fundamental to everything else that's done?" asked Maurine Haver, founder of business research firm Haver Analytics and a past president of the National Association for Business Economics. "They think the country doesn't need to know how many people are unemployed, either?"

"Independent observers had a hard time wrapping their minds around the legislation."

"It's hard to take this seriously because they're really saying also they don't want GDP. They want no facts about what's going on in the U.S. economy," said Haver. "It's so fundamental to a free society that we have this kind of information, I can't fathom where they're coming from. I really can't."

"It's so unimaginable. It would be like saying we don't need policemen anymore, we don't need firemen anymore," said Prewitt. "To say suddenly we don't need statistical information about the American economy, or American society, or American demography, or American trade, or whatever -- it's an Alice in Wonderland moment."

"Just as the House effort to stop the ACS went nowhere in the Senate last year, the current bill looks similarly likely to die there."

But supporters of the Census Bureau and of government-backed science are acutely aware that pieces of such measures have a way of getting attached to higher-priority legislation. In March, a measure from Sen. Tom Coburn (R-Okla.) that bars the National Science Foundation from doing political science research this year slid through the Senate attached to legislation to keep the government running.

And Duncan's bill comes as Congress has already proposed slashing the Census budget 13 percent below the president's request, and the bureau lacks a director to complain. There is also no secretary or deputy secretary at the Commerce Department, which oversees the bureau and would generally advocate its cause in Congress.

A new GOP bill would prevent the government from collecting economic data
Dylan Matthews | Washington Post Wonk Blog
May 1, 2013
One should never read comments, but this one by ottoparts sums up this proposed bill succinctly: "The Party of Stupid strikes again"

Some selected quotes from the article:

"In what’s becoming a biennial tradition, another House Republican wants to cut the Census down to size. Rep. Jeff Duncan (R-S.C.) is rolling out the Census Reform Act this week, having formally introduced it April 18."

"It’s hard to overstate the loss of knowledge that this bill would bring about."

And, the best is an image that explains the "WHY" of this bill:


Congress: No more unemployment data for U.S.
Dan Primack | CNN Money
May 1, 2013
"Republican representatives want to gut the way we collect national economic data."

FORTUNE -- Bummed out by the latest unemployment or GDP report? Don't worry, Congress wants to help you out. Not by adding jobs or increasing productivity, but by eliminating the government surveys that help calculate such statistics in the first place.

Republicans introduce census reform bill that would end unemployment estimates
Stephen C. Webster | The Raw Story blog
May 1, 2013
"If the Census Reform Act of 2013 (PDF) becomes law, all data-gathering efforts at the U.S. Census Bureau except for the once-a-decade census mandated by the Constitution would come to an end."

"Republicans in the House tried and failed to kill the ACS last year. That sentiment appears to have returned in Duncan’s new bill, albeit in a much broader fashion."

Best Way to Deal With Unemployment? Don’t Track It.
Holly Scott | HardHatters Blog
May 1, 2013

Risk factor for a stroke? Living in the stroke-belt as a teen

This study is based on a cohort study most demographers are probably not familiar with, “The Reasons for Geographic and Racial Differences in Stroke study.” It is a relatively large study with residential histories of panel participants. If you are interested in finding out more about these data, here’s a link to the researcher portal to the project website.

Maybe this should be replicated and extended with the PSID, as it covers a longer time period. Stroke mortality patterns have also shifted, according to Casper ML, Wing S, Anda RF, Knowles M, Pollard RA (May 1995). “The shifting stroke belt. Changes in the geographic pattern of stroke mortality in the United States, 1962 to 1988”. Stroke 26 (5): 755–60. PMID 7740562.

Teenage Years in the Stroke Belt
Nicholas Bakalar | The New York Times
April 29, 2013

Effect of duration and age at exposure to the Stroke Belt on incident stroke in adulthood
Virginia Howard et al. | Neurology
April 29, 2013
Abstract | pdf

Special Issue on Survey Non-response

Introduction: New Challenges to Social Measurement
Douglas S. Massey and Roger Tourangeau
Abstract | PDF

Facing the Nonresponse Challenge
Frauke Kreuter
Abstract | PDF

Explaining Rising Nonresponse Rates in Cross-Sectional Surveys
J. Michael Brick and Douglas Williams
Abstract | PDF

Response Rates in National Panel Surveys
Robert F. Schoeni, Frank Stafford, Katherine A. McGonagle, and Patricia Andreski
Abstract | PDF

Consequences of Survey Nonresponse
Andy Peytchev
Abstract | PDF

The Use and Effects of Incentives in Surveys
Eleanor Singer and Cong Ye
Abstract | PDF

Paradata for Nonresponse Adjustment
Kristen Olson
Abstract | PDF

Can Administrative Records Be Used to Reduce Nonresponse Bias?
John L. Czajka
Abstract | PDF

An Assessment of the Multi-level Integrated Database Approach
Tom W. Smith and Jibum Kim
Abstract | PDF

Where Do We Go from Here? Nonresponse and Social Measurement
Douglas S. Massey and Roger Tourangeau
Abstract | PDF

The Twitter paper from PAA’s “social media” session

Using Twitter for Demographic and Social Science Research: Tools for Data Collection
T. McCormick, H. Lee, N. Cesare and A. Shojaie | CSSS/University of Washington
April 8, 2013
This is a proof-of-concept paper. The researchers searched through tweets for phrases that indicated an intention to “not vote” in the 2012 election. They used Amazon’s Mechanical Turk to code the apparent age, gender, and race shown in the profile pictures of their sample.
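
To give a rough sense of the phrase-search step, here is a small sketch that flags tweets containing a “not voting” phrase. It is not the authors’ code: the patterns, the sample tweets, and the function flag_not_voting are hypothetical, and the paper’s actual phrase list and collection tools may differ. It also assumes the tweet text has already been downloaded and uses plain ASCII apostrophes.

```python
import re

# Hypothetical phrases signalling an intention not to vote.
# These are illustrative assumptions, not the phrases used in the paper.
NOT_VOTING_PATTERNS = [
    r"\b(?:i'?m|i am) not (?:going to vote|voting)\b",
    r"\b(?:i )?(?:won'?t|will not) (?:be voting|vote)\b",
    r"\bnot gonna vote\b",
]
PATTERN = re.compile("|".join(NOT_VOTING_PATTERNS), re.IGNORECASE)


def flag_not_voting(tweets):
    """Return the tweets whose text matches any of the phrases above."""
    return [text for text in tweets if PATTERN.search(text)]


# Made-up tweets for illustration only.
sample = [
    "I'm not voting this year, none of them represent me",
    "Just voted! Long line but worth it",
    "honestly i won't vote in 2012",
    "Not gonna vote, what's the point",
]

flagged = flag_not_voting(sample)
for text in flagged:
    print(text)
```

A filter like this will pick up jokes, quotes, and retweets along with sincere statements, so flagged tweets would still need a human check before being treated as a sample.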

Folks interested in other examples of “wild data” like Google searches, Twitter, etc. should check these posts:

Wild Data: Expanding Social Science Research
Big Data: Google Flu
Using Wild Data to Estimate International Migration