Archive for the 'Data' Category

Lessons from North of the Border

Why a Voluntary ACS Could Wipe Some States off of the Map
Terri Ann Lowenthal | The Census Project Blog
May 17, 2013

This is a great recap of the disaster Canada has on its hands with its voluntary National Household Survey. It is relevant for the US because Congressional Republicans want to allow people to ‘just say no’ to all or part of the American Community Survey. Lowenthal also reminds readers of the history of the marriage question in the US Census, including the possible deletion of the “times married” question.

The PSC-Info blog has several links to recent ACS/Census funding news:

ACS to drop “Number of Times Married” question

“it’s an Alice in Wonderland moment” or “GOP Census Bill would Eliminate America’s Economic Indicators”

The Census Reform Act of 2013

The ACS Faces More Battles

SENATE: The Census Bureau has already written the reports; read them.

Nerd Alert: Dictionary of Numbers

For those of you who try to incorporate quantitative reasoning in your teaching, here’s a nice resource:

Dictionary of numbers: putting numbers in human terms
This is a Google Chrome extension that tries to make sense of numbers you encounter on the web by giving you a description of that number in human terms. Because “8 million people” means nothing, but “population of New York City” means everything.

And, here’s a blog post about it from the nerd-friendly xkcd site – a webcomic of romance, sarcasm, math, and language:

Dictionary of Numbers
May 15, 2013
Opening paragraph:

I don’t like large numbers without context. Phrases like “they called for a $21 billion budget cut” or “the probe will travel 60 billion miles” or “a 150,000-ton ship ran aground” don’t mean very much to me on their own. Is that a large ship? Does 60 billion miles take you outside the Solar System? How much is $21 billion compared to the overall budget?
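For the curious, the basic idea can be sketched in a few lines of Python. This is only an illustration, not how the extension actually works: the `REFERENCES` table and the `humanize` function are hypothetical. The New York City figure is the one quoted above, and the other two entries are rough, assumed magnitudes.

```python
import math

# Hypothetical reference table of (value, label) pairs.
# Only the New York City figure comes from the description above;
# the other entries are rough, assumed magnitudes for illustration.
REFERENCES = [
    (8_000_000, "population of New York City"),
    (300_000_000, "rough population of the United States"),
    (7_000_000_000, "rough population of the world"),
]


def humanize(count):
    """Describe a positive count relative to the closest reference.

    "Closest" is measured on a log scale, so 16 million is matched to
    8 million (a factor of 2) rather than to 300 million.
    """
    value, label = min(
        REFERENCES,
        key=lambda ref: abs(math.log10(ref[0]) - math.log10(count)),
    )
    return f"about {count / value:.1f} x the {label}"


print(humanize(16_000_000))
```

A real tool would need a far larger reference table and would have to handle units (dollars, miles, tons), which this sketch ignores.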

International Migration Statistics for the US

Here are several links from the Census Bureau related to international migration in the US.

International Migration is Projected to become Primary Driver of U.S. Population Growth for the first time in Nearly Two Centuries
Census Bureau
May 15, 2013
This link goes to an overview page. To the right are links to detailed tables and graphs showing migration and natural increase and population by age group.

Estimating Net International Migration for 2010 Demographic Analysis: An Overview of Methods and Results
Renuka Bhaskar, et al. | Census Bureau
February 2013
This working paper is relevant for Demographic Analysis – a technique used to understand the age, sex, and racial composition of a population and how it has changed over time via births, deaths, and migration. Here is a link to the Demographic Analysis site at the Census Bureau.

The Foreign Born [Census Bureau website]
This includes links to an infographic on America’s foreign born in the last 50 years (part of which is included below), data from the American Community Survey on home ownership, STEM degrees, newly arrived, and region-specific reports. There is also a 2010 tables package from the Current Population Survey.

graph of foreign born over time

[Link to complete infographic]

Open Data Executive Order

This news was quite exciting for the data community and it should be for researchers as well. See the tweets in reaction to this:

The Reaction on Twitter

Here’s the Executive Order:

Open Data Policy – Managing Information as An Asset
May 9, 2013

And a White House blog post about it:

Landmark Steps to Liberate Open Data
Todd Park and Steve VanRoekel | White House Blog
May 9, 2013

And here’s the Project Open Data website:

Project Open Data

This event also got some coverage in the popular news. Here are the pros and cons via the Wall Street Journal:

‘Open Data’ Brings Potential And Perils for Government
Ben Rooney | Wall Street Journal
May 9, 2013

Here’s the rest of the coverage in the popular press:
[Link]

Measuring Marriage & Divorce among Same-Sex Couples

For Gays, Breaking Up Is Hard to Do – or Measure
Carl Bialik | Wall Street Journal [print column]
May 3, 2013
This article touches on the personal and on the aggregate. The personal stories are of couples unable to get a divorce because they live in states that do not recognize same-sex marriages. On the aggregate side, states have not modified divorce forms to collect data on same-sex couples.

Same-Sex Divorce Stats Lag
Carl Bialik | Wall Street Journal [blog]
May 3, 2013
This version provides links to sources of marriage and divorce statistics. European countries do collect data on these events, but so far do not have enough dissolutions to calculate robust rates. An NIH-funded study is following a cohort of couples who were married in Vermont.

Decennial Census Data on Same Sex Couples
Census Bureau
May 2013
The Census Bureau has a website with links to technical papers, data, etc. on same-sex couples from 1990 onward, as measured by this agency.

Census Bureau: Flaws in Same-Sex Couple Data
D’Vera Cohn | Pew: Social and Demographic Trends
September 27, 2011
The Census Bureau announced today that more than one-in-four same-sex couples counted in the 2010 Census was likely an opposite-sex couple, and identified a confusing questionnaire as a likely culprit. The bureau released a new set of “preferred” same-sex counts, including its first tally ever of same-sex spouses counted in the census.

How Accurate Are Counts of Same-Sex Couples?
D’Vera Cohn | Pew: Social and Demographic Trends
August 25, 2011
This is a nice brief on the obstacles to accuracy in measuring same-sex couples in census data. And, it illustrates the efforts that the Census Bureau makes in measuring concepts in an era of rapid social change.

Cell phone data for research

The following conference was based on the use of cell phone data for research – mostly involving mobility, but also group differences in work/residential location. Demographers are starting to use this data source. We link below to a paper in the social media session at PAA 2013.

Net Mobility Conference 2013

Conference Program (pdf)

Submissions to D4D challenge (122 MB). This book contains copies of all the submissions to the D4D challenge that have been selected for NetMob. It is a large file (850 pages).

Winning Paper
African Bus Routes Redrawn Using Cell-Phone Data
David Talbot | MIT Technology Review
April 30, 2013

Paper from PAA 2013 Social Media Session
New Approaches to Human Mobility: Using Mobile Phones for Demographic Research
John Palmer, et al.
April 11-13, 2013

Nature: Replication, replication, replication

This issue of Nature is a compilation of replication articles across several issues of Nature. They highlight the importance of replication and open data for science. However, some of the examples might apply more to medicine or biology than population science. Lest readers think that this issue doesn’t apply to demographers, here’s a tweet from Justin Wolfers, advertising a piece in Bloomberg Business on the importance of replication for the field of economics. His motivation is the recent dust-up due to an error in a famous paper by Reinhart and Rogoff [See PSC-Info], but the discussion is much broader than that example.

tweet

[Link to Stevenson/Wolfers Replication article]

INTRODUCTION TO SPECIAL NATURE ISSUE
No research paper can ever be considered to be the final word, and the replication and corroboration of research results is key to the scientific process. In studying complex entities, especially animals and human beings, the complexity of the system and of the techniques can all too easily lead to results that seem robust in the lab, and valid to editors and referees of journals, but which do not stand the test of further studies. Nature has published a series of articles about the worrying extent to which research results have been found wanting in this respect. The editors of Nature and the Nature life sciences research journals have also taken substantive steps to put our own houses in order, in improving the transparency and robustness of what we publish. Journals, research laboratories and institutions and funders all have an interest in tackling issues of irreproducibility. We hope that the articles contained in this collection will help.

Reducing our irreproducibility (April 25, 2013)
[Editorial]

Further confirmation needed (September 10, 2012)
A new mechanism for independently replicating research findings is one of several changes required to improve the quality of the biomedical literature.
Nature Biotechnology 30, 806
[Editorial]

Error Prone (July 26, 2012)
Biologists must realize the pitfalls of work on massive amounts of data.
Nature 487, 406
[Editorial]

Must Try Harder (March 29, 2012)
Too many sloppy mistakes are creeping into scientific papers. Lab heads must look more rigorously at the data — and at themselves.
Nature 483, 509
[Editorial]

NEWS AND ANALYSIS

Independent labs to verify high-profile papers (August 14, 2012)
Monya Baker
Nature News

Power Failure: Why small sample size undermines the reliability of neuroscience (April 15, 2013)
Katherine S. Button, John P. A. Ioannidis et al.
Nature Reviews Neuroscience 14, 365-376

Replication studies: Bad copy (May 17, 2012)
Ed Yong
Nature 485, 298-300

Reliability of ‘new drug target’ claims called into question (September 2011)
Asher Mullard
Nature Reviews Drug Discovery 10, 643-644

COMMENT

If a job is worth doing, it is worth doing twice (April 4, 2013)
Jonathan F. Russell
Nature 496, 7

Methods: Face up to false positives (July 26, 2012)
Daniel MacArthur
Nature 487, 427-429

Drug development: Raise standards for preclinical cancer research (March 29, 2012)
C. Glenn Begley & Lee M. Ellis
Nature 483, 531-533

Believe it or not: how much can we rely on published data on potential drug targets? (September 2011)
Florian Prinz, Thomas Schlange & Khusru Asadullah
Nature Reviews Drug Discovery 10, 712

Tackling the widespread and critical impact of batch effects in high-throughput data (October 2010)
Jeffrey T. Leek, Robert B. Scharpf et al.
Nature Reviews Genetics 11, 733-739

PERSPECTIVES AND REVIEWS

Research methods: know when your numbers are significant (December 13, 2012)
David L. Vaux
Nature 492, 180-181

A call for transparent reporting to optimize the predictive value of preclinical research (October 11, 2012)
Story C. Landis, Susan G. Amara et al.
Nature 490, 187-191

Next-generation sequencing data interpretation: enhancing reproducibility and accessibility (September 2012)
Anton Nekrutenko & James Taylor
Nature Reviews Genetics 13, 667-672

The case for open computer programs (February 23, 2012)
Darrel C. Ince, Leslie Hatton & John Graham-Cumming
Nature 482, 485-488

Reuse of public genome-wide gene expression data (February 2013)
Johan Rung & Alvis Brazma
Nature Reviews Genetics 14, 89-99

ACS to drop “Number of Times Married” question

This notice is from a Minnesota Population Center data alert:

Dear IPUMS User,

I am writing to alert you that the Census Bureau is planning to drop the question on “number of times married” from the American Community Survey. For those of us who study family demography, this change would be a major loss. The times married question is not only vital for understanding blended families, it is also necessary for basic studies of nuptiality and marital instability. A recent working paper by Sheela Kennedy and myself demonstrated that the ACS is the only reliable source currently available for national divorce statistics. Without the number of times married, however, the divorce data will be badly compromised; for example, it will be impossible to construct a life table for first marriages, or to estimate the percentage of people who have ever divorced.

The news of this plan appears in the Federal Register in a single sentence at the end of an otherwise harmless notice of request for comments. If you believe as I do that this change would significantly harm the nation’s statistical infrastructure, you should make your feelings known to the responsible OMB desk officer, Dr. Brian Harris-Kojetin. He can be reached at (202) 395-7245 or by email at bharrisk@omb.eop.gov. The deadline for comments is May 16.

Thank you,

Steven Ruggles
Regents Professor
Director, Minnesota Population Center

Research from The Data Privacy Lab

Respondent re-identification is a big worry for data projects that want to share their data. Some recent cases illustrate that it can occur, and is occurring, with genetic data. But sometimes the case is overstated. Here is an illustration with a case that hit the press with great fanfare.

First, the fun stuff. See if you are unique. The following link has you type in your gender, exact date of birth, and 5-digit zip code. The latter two do not meet HIPAA guidelines:

Next are several links. The first few are the coverage of re-identification in the press (Forbes, MIT Technology Review, and The Scientist), followed by the researcher’s version of the story (Sweeney). Next is a rebuttal, which reminds readers that administrative matches, e.g., to voter registration records, are not as ubiquitous as some claim. There is also a link to an article by Barth-Jones in which he discusses the famous case of the re-identification of Governor William Weld, which led to many of the HIPAA rules.

Harvard Professor Re-Identifies Anonymous Volunteers In DNA Study
Adam Tanner | Forbes
April 24, 2013

Participants in Personal Genome Project Identified by Privacy Experts
MIT Technology Review
May 1, 2013

“Anonymous” Genomes Identified
Dan Cossins | The Scientist
May 3, 2013

Identifying Participants in the Personal Genome Project by Name
Latanya Sweeney, Akua Abu, Julia Winn | Data Privacy Lab

Reporting Fail: The Reidentification of Personal Genome Project Participants
Jane Yakowitz Bambauer | Info/Law [Harvard Law Blogs]
May 1, 2013

The ‘Re-Identification’ of Governor William Weld’s Medical Information: A Critical Re-Examination of Health Data Identification Risks and Privacy Protections, Then and Now
Daniel C. Barth-Jones | Social Science Research Network (SSRN)
June 4, 2012
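The "are you unique" check mentioned above boils down to counting how many records share the same combination of gender, date of birth, and zip code. Here is a minimal Python sketch of that count. It is not the Data Privacy Lab's tool: the `unique_share` function, the field names, and the toy records are all made up for illustration.

```python
from collections import Counter


def unique_share(records, keys=("gender", "birth_date", "zip5")):
    """Return the share of records whose combination of quasi-identifiers
    (the fields named in `keys`) appears exactly once in the data."""
    combos = [tuple(rec[k] for k in keys) for rec in records]
    counts = Counter(combos)
    unique = sum(1 for combo in combos if counts[combo] == 1)
    return unique / len(combos) if combos else 0.0


# Made-up toy records, not real people.
people = [
    {"gender": "F", "birth_date": "1970-01-01", "zip5": "48104"},
    {"gender": "F", "birth_date": "1970-01-01", "zip5": "48104"},
    {"gender": "M", "birth_date": "1985-06-15", "zip5": "48104"},
    {"gender": "F", "birth_date": "1992-11-30", "zip5": "48109"},
]

print(unique_share(people))  # 0.5: two of the four records are unique
```

Being unique in a dataset is not the same as being re-identified. As the rebuttal above stresses, someone still needs a named file (such as a voter list) to match against, and that file has to cover the people in question.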

“it’s an Alice in Wonderland moment”

A post last week covered the House Republican "Census Reform Act of 2013." Here is some useful commentary from the national press.

GOP Census Bill Would Eliminate America's Economic Indicators
Michael McAuliffe | Huffington Post
May 1, 2013
This post has the best quotes:

Indeed, the government would not be able to produce any of the major economic indices that move markets every month, said multiple statistics experts, who were aghast at the proposal.

"They simply wouldn't exist. We won't have an unemployment rate," said Ken Prewitt, the former director of the U.S. Census who is now a professor of public affairs at Columbia University.

"I don't know how the market reacts if there is suddenly no unemployment rate at the start of the month," Prewitt said. "How does the market react if we don't have a GDP [gross domestic product]?"

"Do they understand that these data that the Census Bureau collects are fundamental to everything else that's done?" asked Maurine Haver, founder of business research firm Haver Analytics and a past president of the National Association for Business Economics. "They think the country doesn't need to know how many people are unemployed, either?"

"Independent observers had a hard time wrapping their minds around the legislation."

"It's hard to take this seriously because they're really saying also they don't want GDP. They want no facts about what's going on in the U.S. economy," said Haver. "It's so fundamental to a free society that we have this kind of information, I can't fathom where they're coming from. I really can't."

"It's so unimaginable. It would be like saying we don't need policemen anymore, we don't need firemen anymore," said Prewitt. "To say suddenly we don't need statistical information about the American economy, or American society, or American demography, or American trade, or whatever -- it's an Alice in Wonderland moment."

"Just as the House effort to stop the ACS went nowhere in the Senate last year, the current bill looks similarly likely to die there."

But supporters of the Census Bureau and of government-backed science are acutely aware that pieces of such measures have a way of getting attached to higher-priority legislation. In March, a measure from Sen. Tom Coburn (R-Okla.) that bars the National Science Foundation from doing political science research this year slid through the Senate attached to legislation to keep the government running.

And Duncan's bill comes as Congress has already proposed slashing the Census budget 13 percent below the president's request, and the bureau lacks a director to complain. There is also no secretary or deputy secretary at the Commerce Department, which oversees the bureau and would generally advocate its cause in Congress.

A new GOP bill would prevent the government from collecting economic data
Dylan Matthews | Washington Post Wonk Blog
May 1, 2013
One should never read comments, but this one by ottoparts sums this proposed bill succinctly: "The Party of Stupid strikes again"

Some selected quotes from the article:

"In what’s becoming a biennial tradition, another House Republican wants to cut the Census down to size. Rep. Jeff Duncan (R-S.C.) is rolling out the Census Reform Act this week, having formally introduced it April 18."

"It’s hard to overstate the loss of knowledge that this bill would bring about."

And, the best is an image that explains the "WHY" of this bill:


Congress: No more unemployment data for U.S.
Dan Primack | CNN Money
May 1, 2013
"Republican representatives want to gut the way we collect national economic data."

FORTUNE -- Bummed out by the latest unemployment or GDP report? Don't worry, Congress wants to help you out. Not by adding jobs or increasing productivity, but by eliminating the government surveys that help calculate such statistics in the first place.

Republicans introduce census reform bill that would end unemployment estimates
Stephen C. Webster | The Raw Story blog
May 1, 2013
"If the Census Reform Act of 2013 (PDF) becomes law, all data-gathering efforts at the U.S. Census Bureau except for the once-a-decade census mandated by the Constitution would come to an end."

"Republicans in the House tried and failed to kill the ACS last year. That sentiment appears to have returned in Duncan’s new bill, albeit in a much broader fashion."

Best Way to Deal With Unemployment? Don’t Track It.
Holly Scott | HardHatters Blog
May 1, 2013