Archive for the 'Methodology' Category


Canada’s “NSF” Problem

House Republicans are trying to implement serious changes to the evaluation and funding of NSF science [here and here].

Canada is perhaps a bit further down this road. Here’s the latest on the decision to fund research that has industry applications rather than basic science.

When science goes silent
Jonathan Gatehouse | MacLean’s
May 3, 2013
This article touches on the shift in funding from basic science to applied science, but it is more in line with an earlier post on the muzzling of environmental scientists.

National Research Council move shifts feds’ science role
Canadian Press | CBC News
May 7, 2013
‘Job-neutral’ restructuring to make agency streamlined, efficient and functional, president says

The Harper government is telling the National Research Council to focus more on practical, commercial science and less on fundamental science that may not have obvious business applications.

The government says the council traditionally was a supporter of business, but has wandered from that in recent years — and will now get back to working on practical applications for industries.

Some folks disagree with this shift:

In a statement, the executive director of the Canadian Association of University Teachers said the government is “killing the goose that laid the golden egg.”

“By transforming the NRC into a ‘business-driven, industry-relevant’ organization, you are denying its ability to support basic research,” said Jim Turk.

“At the same time, you are cutting support to basic research in the universities.”

And is this part of the Tory ‘war on science’? [more coverage on this]

NDP science critic Kennedy Stewart called the shift in direction for the NRC “short-sighted” and said it could actually hurt economic growth in the long run, because it scales back the kind of fundamental research that can lead to scientific breakthroughs.

Research Council to focus on commercially viable projects, rather than science for science’s sake
Jessica Hume | Sun News
May 7, 2013
Two quotes say it all:

The government of Canada believes there is a place for curiosity-driven, fundamental scientific research, but the National Research Council is not that place.

“Scientific discovery is not valuable unless it has commercial value,” John McDougall, president of the NRC, said in announcing the shift of the NRC’s research focus away from discovery science and toward only that research the government deems “commercially viable.”

Nature: Replication, replication, replication

This issue of Nature compiles replication articles from several issues of Nature journals. They highlight the importance of replication and open data for science, although some of the examples apply more to medicine or biology than to population science. Lest readers think this issue doesn’t apply to demographers, here’s a tweet from Justin Wolfers advertising a piece in Bloomberg Business on the importance of replication for the field of economics. His motivation is the recent dust-up over an error in a famous paper by Reinhart and Rogoff [See PSC-Info], but the discussion is much broader than that example.


[Link to Stevenson/Wolfers Replication article]

No research paper can ever be considered to be the final word, and the replication and corroboration of research results is key to the scientific process. In studying complex entities, especially animals and human beings, the complexity of the system and of the techniques can all too easily lead to results that seem robust in the lab, and valid to editors and referees of journals, but which do not stand the test of further studies. Nature has published a series of articles about the worrying extent to which research results have been found wanting in this respect. The editors of Nature and the Nature life sciences research journals have also taken substantive steps to put our own houses in order, in improving the transparency and robustness of what we publish. Journals, research laboratories and institutions and funders all have an interest in tackling issues of irreproducibility. We hope that the articles contained in this collection will help.

Reducing our irreproducibility
(April 25, 2013)

Further confirmation needed
A new mechanism for independently replicating research findings is one of several changes required to improve the quality of the biomedical literature.
Nature Biotechnology 30, 806
(September 10, 2012)

Error Prone
Biologists must realize the pitfalls of work on massive amounts of data.
Nature 487, 406
(July 26, 2012)

Must Try Harder
Too many sloppy mistakes are creeping into scientific papers. Lab heads must look more rigorously at the data — and at themselves.
Nature 483, 509
(March 29, 2012)


Independent labs to verify high-profile papers
Monya Baker
Nature News
(August 14, 2012)

Power Failure: Why small sample size undermines the reliability of neuroscience
Katherine S. Button, John P. A. Ioannidis et al.
Nature Reviews Neuroscience 14, 365-376
(April 15, 2013)

Replication studies: Bad copy
Ed Yong
Nature 485, 298-300
(May 17, 2012)

Reliability of ‘new drug target’ claims called into question
Asher Mullard
Nature Reviews Drug Discovery 10, 643-644
(September 2011)


If a job is worth doing, it is worth doing twice
Jonathan F. Russell
Nature 496, 7
(April 4, 2013)

Methods: Face up to false positives
Daniel MacArthur
Nature 487, 427-429
(July 26, 2012)

Drug development: Raise standards for preclinical cancer research
C. Glenn Begley & Lee M. Ellis
Nature 483, 531-533
(March 29, 2012)

Believe it or not: how much can we rely on published data on potential drug targets?
Florian Prinz, Thomas Schlange & Khusru Asadullah
Nature Reviews Drug Discovery 10, 712
(September 2011)

Tackling the widespread and critical impact of batch effects in high-throughput data
Jeffrey T. Leek, Robert B. Scharpf et al.
Nature Reviews Genetics 11, 733-739
(October 2010)


Research methods: know when your numbers are significant
David L. Vaux
Nature 492, 180-181
(December 13, 2012)

A call for transparent reporting to optimize the predictive value of preclinical research
Story C. Landis, Susan G. Amara et al.
Nature 490, 187-191
(October 11, 2012)

Next-generation sequencing data interpretation: enhancing reproducibility and accessibility
Anton Nekrutenko & James Taylor
Nature Reviews Genetics 13, 667-672
(September 2012)

The case for open computer programs
Darrel C. Ince, Leslie Hatton & John Graham-Cumming
Nature 482, 485-488
(February 23, 2012)

Reuse of public genome-wide gene expression data
Johan Rung & Alvis Brazma
Nature Reviews Genetics 14, 89-99
(February 2013)

Research from The Data Privacy Lab

Respondent re-identification is a big worry for data projects that want to share their data, and some recent cases illustrate that it can and does occur with genetic data. But sometimes the risk is overstated. Here is an illustration, using a case that hit the press with great fanfare.

First, the fun stuff: see if you are unique. The following link has you type in your gender, exact date of birth, and your 5-digit ZIP code. The latter two do not meet HIPAA guidelines:

Next are several links. The first few cover the re-identification story in the press (Forbes, The Scientist, & xxxx), followed by the researchers’ version of the story (Sweeney). Next is a rebuttal, which reminds readers that administrative matches, e.g., to voter registration records, are not as ubiquitous as some claim. There is also a link to an article by Barth-Jones discussing the famous re-identification of Governor William Weld’s medical records, which led to much of the HIPAA rules.
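To make the uniqueness argument concrete, here is a minimal sketch (with made-up records, not the Personal Genome Project data) of how one counts the share of people whose gender + birth date + ZIP combination is unique in a file:

```python
from collections import Counter

def unique_fraction(records):
    """Fraction of records whose (gender, birth date, ZIP) combination
    appears exactly once -- i.e., is potentially re-identifiable."""
    counts = Counter(records)
    singles = sum(1 for r in records if counts[r] == 1)
    return singles / len(records)

# Toy population: the first two people share a quasi-identifier
# combination; the other two are unique in the file.
people = [
    ("F", "1960-07-31", "02138"),
    ("F", "1960-07-31", "02138"),
    ("M", "1955-01-02", "48104"),
    ("F", "1982-11-15", "48104"),
]
print(unique_fraction(people))  # 0.5
```

Sweeney’s well-known result is that this fraction is large for the US population as a whole, which is why these three fields together fail HIPAA’s Safe Harbor standard.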

Harvard Professor Re-Identifies Anonymous Volunteers In DNA Study
Adam Tanner | Forbes
April 24, 2013

Participants in Personal Genome Project Identified by Privacy Experts
MIT Technology Review
May 1, 2013

“Anonymous” Genomes Identified
Dan Cossins | The Scientist
May 3, 2013

Identifying Participants in the Personal Genome Project by Name
Latanya Sweeney, Akua Abu, Julia Winn | Data Privacy Lab

Reporting Fail: The Reidentification of Personal Genome Project Participants
Jane Yakowitz Bambauer | Info/Law [Harvard Law Blogs]
May 1, 2013

The ‘Re-Identification’ of Governor William Weld’s Medical Information: A Critical Re-Examination of Health Data Identification Risks and Privacy Protections, Then and Now
Daniel C. Barth-Jones | Social Science Research Network (SSRN)
June 4, 2012

Special Issue on Survey Non-response

Introduction: New Challenges to Social Measurement
Douglas S. Massey and Roger Tourangeau
Abstract | PDF

Facing the Nonresponse Challenge
Frauke Kreuter
Abstract | PDF

Explaining Rising Nonresponse Rates in Cross-Sectional Surveys
J. Michael Brick and Douglas Williams
Abstract | PDF

Response Rates in National Panel Surveys
Robert F. Schoeni, Frank Stafford, Katherine A. McGonagle, and Patricia Andreski
Abstract | PDF

Consequences of Survey Nonresponse
Andy Peytchev
Abstract | PDF

The Use and Effects of Incentives in Surveys
Eleanor Singer and Cong Ye
Abstract | PDF

Paradata for Nonresponse Adjustment
Kristen Olson
Abstract | PDF

Can Administrative Records Be Used to Reduce Nonresponse Bias?
John L. Czajka
Abstract | PDF

An Assessment of the Multi-level Integrated Database Approach
Tom W. Smith and Jibum Kim
Abstract | PDF

Where Do We Go from Here? Nonresponse and Social Measurement
Douglas S. Massey and Roger Tourangeau
Abstract | PDF

The Twitter paper from PAA’s “social media” session

Using Twitter for Demographic and Social Science Research:
Tools for Data Collection

T. McCormick, H. Lee, N. Cesare and A. Shojaie | CSSS/University of Washington
April 8, 2013
This is a proof-of-concept paper. The researchers searched tweets for phrases indicating an intention to “not vote” in the 2012 election, then used Amazon’s Mechanical Turk to code the profile pictures of their sample for demographic characteristics (age, gender, race).
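A minimal sketch of the kind of keyword filter involved; the phrases below are hypothetical stand-ins, not the search terms McCormick et al. actually used:

```python
import re

# Hypothetical "intention not to vote" phrases, in the spirit of the
# paper's approach; the actual study's search terms may differ.
PATTERNS = [
    re.compile(r"\bnot\s+(?:going\s+to|gonna)\s+vote\b", re.I),
    re.compile(r"\bwon'?t\s+(?:be\s+)?voting\b", re.I),
    re.compile(r"\bnot\s+voting\b", re.I),
]

def flags_nonvoting(tweet):
    """True if the tweet matches any non-voting-intention phrase."""
    return any(p.search(tweet) for p in PATTERNS)

tweets = [
    "I'm not gonna vote this year, what's the point",
    "Everyone should vote on Tuesday!",
    "won't be voting in 2012 #fedup",
]
sample = [t for t in tweets if flags_nonvoting(t)]
print(len(sample))  # 2
```

In the actual study the matched accounts would then be sent to Mechanical Turk workers for demographic coding.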

Folks interested in other examples of “wild data” like Google searches, Twitter, etc. should check these posts:

Wild Data: Expanding Social Science Research
Big Data: Google Flu
Using Wild Data to Estimate International Migration

The Unauthorized Immigrant Population: Two Technical Exercises

This blog entry has two nice technical pieces. The first describes how Pew Hispanic (and others) estimate the undocumented population in the US. The second is a life-table exercise showing how many of the undocumented population will die waiting for citizenship, assuming a 13-year wait time.

Unauthorized Immigrants: How Pew Research Counts Them and What We Know About Them
Interview with Jeff Passel | Pew Hispanic
April 17, 2013
In this interview, Passel describes how he estimates the undocumented population in the US – including other characteristics of this population, e.g., occupation, current residence, family composition – using data from the Current Population Survey.

As a note, most of the reports Pew Hispanic writes on the undocumented population have an appendix providing a more technical description of the methodology. See page 25 of the following report for an example: Cohn, D’Vera and Jeff Passel. 2011. “Unauthorized Immigrant Population: National and State Trends, 2010.” Pew Hispanic: February 1, 2011.
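The core of Passel’s approach is a residual estimate: a survey-based count of the foreign-born minus administrative counts of legal residents. In toy numbers (not Pew’s):

```python
# Toy numbers for illustration only; Pew's actual estimates also adjust
# for survey undercount and assign legal status probabilistically.
foreign_born_total = 40_000_000   # survey-based estimate (e.g., from the CPS)
legal_residents = 28_500_000      # naturalized citizens, permanent residents,
                                  # refugees, and legal temporary migrants
                                  # (from administrative data)

# The residual is the estimate of the unauthorized population.
unauthorized_estimate = foreign_born_total - legal_residents
print(unauthorized_estimate)  # 11500000
```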

The life-table exercise is from Philip Cohen’s Family Inequality blog.

How many people should die waiting for citizenship? 319,462?
Philip Cohen | Family Inequality Blog
April 24, 2013

This is a life-table exercise: Cohen takes the current age distribution of the undocumented population in the US and applies a life table for Hispanics to those numbers. He describes his assumptions and invites folks to recalibrate the figures.

Note that Cohen takes a dig at Reinhart and Rogoff [previous PSC Infoblog entry] by making his spreadsheet available. And, he notes “If you don’t like the way Excel does the maths, by all means, fix it in R.”
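The mechanics of the exercise can be sketched as follows. The age groups, counts, and death rates here are invented for illustration; Cohen’s spreadsheet uses the actual age distribution of the unauthorized population and a Hispanic life table:

```python
# Illustrative only: these age groups, counts, and annual death rates
# are made up, not Cohen's actual inputs.
WAIT_YEARS = 13

# (age group label, population count, approximate annual death rate)
cohort = [
    ("18-34", 5_000_000, 0.001),
    ("35-54", 4_000_000, 0.003),
    ("55+",   1_000_000, 0.015),
]

def expected_deaths(cohort, years):
    """Expected deaths during the wait, assuming a constant annual death
    rate within each age group (a simplification of a real life table,
    which would age people across groups as the years pass)."""
    total = 0.0
    for _, n, q in cohort:
        survivors = n * (1 - q) ** years   # still alive after the wait
        total += n - survivors
    return total

print(round(expected_deaths(cohort, WAIT_YEARS)))
```

Swapping in the real age distribution and age-specific rates is exactly the recalibration Cohen invites.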

Living Apart Together: Data & Research

Living Apart Together: Uncoupling Intimacy and Co-Residence
S. Duncan, M. Phillips, S. Roseneil, J. Carter & M. Stoilova | NatCen Social Research Policy Brief
Winter 2013
Major conclusions from the research are (a) some “singles” are in LAT relationships; (b) living alone doesn’t always mean being alone; and (c) intimacy doesn’t always imply co-residence.

Note, a similar policy brief for the Canadian LAT population is in an earlier PSC-Info blog entry.

The Census Reform Act of 2013

This proposed legislation [H.R. 1638] is truly radical. It would eliminate every survey conducted by the Census Bureau (the Economic Census, the Census of Governments, the Census of Agriculture, and the never-conducted mid-decade census) and would limit the decennial census to a population count.

In short:

(a) Notwithstanding any other provision of law–
(1) the Secretary may not conduct any survey, sampling, or other questionnaire, and may only conduct a decennial census of population as authorized under section 141; and
(2) any form used by the Secretary in such a decennial census may only collect information necessary for the tabulation of total population by States
(b) Repeal of Survey, Questionnaire, or Sampling Authority- Sections 182, 193, and 195 of title 13, United States Code, are repealed.

The Census Project Blog discusses this in more detail:
What We Don’t Know Can’t Hurt Us (Right?)
Teri Ann Lowenthal | The Census Project Blog
April 23, 2013

If Congress only wants a head-count census, will it fund a ‘mandatory population register’? This is something New Zealand is considering:

National Census Could be Scrapped
National News | TVNZ
April 23, 2013

Microsoft Excel: The Ruiner of Global Economies?

This is a series of articles on the news that a widely cited and influential paper by Carmen Reinhart and Ken Rogoff contained an Excel error, which led it to overstate the association between debt and growth. There are other, more fundamental problems with the paper – see the comments by economists below.

From a training viewpoint, it is worth noting that the error was discovered by a graduate student working on a class assignment: find a famous study and replicate it.

This entry has four sections: (a) the student; (b) comments by other economists; (c) replication & programming; and (d) coverage from the press.
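The flavor of the spreadsheet error is easy to reproduce. In the actual paper an AVERAGE() formula omitted five countries’ rows; the growth figures below are made up, but the effect of a truncated range is the same:

```python
# Hypothetical growth figures for illustration; these are not the
# Reinhart-Rogoff data, only a demonstration of the range bug.
growth_by_country = {
    "Australia": 3.8, "Austria": 0.9, "Belgium": 2.6, "Canada": 2.2,
    "Denmark": 1.1, "Finland": -1.0, "France": 2.3, "Germany": 0.9,
}

values = list(growth_by_country.values())

full_mean = sum(values) / len(values)   # what was intended
truncated_mean = sum(values[:5]) / 5    # the range stops too early

print(round(full_mean, 2), round(truncated_mean, 2))  # 1.6 2.12
```

A script makes the averaged range explicit and checkable; in a spreadsheet, a cell range that silently excludes rows looks identical to a correct one.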

The Story of the Student
Meet the 28-Year-Old Grad Student Who Just Shook the Global Austerity Movement
Kevin Roose | The New York Magazine
April 18, 2013

How a student took on eminent economists on debt issue – and won
Edward Krudy | Reuters
April 18, 2013

‘They Said at First That They Hadn’t Made a Spreadsheet Error, When They Had’
Peter Monaghan | Chronicle of Higher Education
April 24, 2013
My favorite Q & A from this interview with Thomas Herndon is:
Q. This is more than a spreadsheet error, then?

A. Yes. The Excel error wasn’t the biggest error. It just got everyone talking about this. It was an emperor-has-no-clothes moment.

Comments/Analysis by Economists
Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff
Thomas Herndon, Michael Ash and Robert Pollin | Political Economy Research Institute
April 15, 2013
An easier-to-read PDF of the paper is available; the link above includes the data, code, etc.

Researchers Finally Replicated Reinhart-Rogoff, and There Are Serious Problems
Michael Konczal | Next New Deal (blog of the Roosevelt Institute)
April 16, 2013

Reinhart and Rogoff are wrong about austerity
Robert Pollin and Michael Ash | Financial Times
April 17, 2013

Reinhart/Rogoff and Growth in a Time Before Debt
Arindrajit Dube | Next New Deal (blog of the Roosevelt Institute)
April 17, 2013

Reinhart, Rogoff, and How the Macroeconomic Sausage Is Made
Justin Fox | Harvard Business Review
April 17, 2013

The Excel Depression
Paul Krugman | New York Times
April 19, 2013

Replication & Programming
The Mysterious Powers of Microsoft Excel
Colm O’Regan | BBC News Magazine
April 20, 2013

What the Reinhart & Rogoff Debacle Really Shows: Verifying Empirical Results Needs to be Routine
Victoria Stodden | The Monkey Cage Blog
April 19, 2013

What Reinhart-Rogoff Means for the Replication Debate
Political Science Replication Blog
April 19, 2013

Microsoft Excel: The ruiner of global economies?
Peter Bright | Ars Technica
April 16, 2013
This piece describes the Excel error, but also discusses other issues with the paper, including the interesting tidbit that the original Reinhart-Rogoff paper was published in the American Economic Review proceedings issue (May), which is not peer reviewed.

Two clever economists have looked to see if researchers pad their resumes by hiding their AER proceedings publications. The University of Michigan economics department was included in their sample.

Research: Bad math rampant in family budgets and Harvard studies
Jeremy Olshan | Wall Street Journal (Market Watch blog)
April 17, 2013
88% of spreadsheets have errors

On the accuracy of statistical procedures in Microsoft Excel 2007
B.D. McCullough and David A. Heiser | Computational Statistics and Data Analysis
March 2008
These authors criticize Excel as a statistical analysis tool because of its failures with statistical distributions, random number generation, and the NIST StRD (Statistical Reference Datasets) benchmark tests. I suspect most users of Excel stick to the simpler tools (summation, products, etc.), but on occasion faculty have used Excel as a rudimentary statistical analysis package.
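One classic failure of this kind (illustrative of the genre, not necessarily Excel 2007’s exact behavior) is the numerically unstable one-pass variance formula, which loses precision when the data carry a large constant offset:

```python
import statistics

# Data with a large constant offset; the true sample variance of the
# underlying values {1, 2, 3, 4, 5} is exactly 2.5.
data = [1e8 + i for i in (1.0, 2.0, 3.0, 4.0, 5.0)]
n = len(data)

# Numerically unstable "textbook" one-pass formula, of the kind older
# spreadsheet versions were criticized for using: the two large sums
# nearly cancel, destroying the low-order digits.
naive_var = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

# Stable computation (Python's statistics module works with exact
# rationals internally):
stable_var = statistics.variance(data)

print(naive_var, stable_var)
```

The one-pass result comes out visibly wrong here, which is the kind of behavior the NIST StRD tests are designed to catch.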

What We Know about Spreadsheet Errors
Raymond Panko | Journal of End User Computing
May 2008

Come to Jesus Slides: Use Script-Based Analysis, not Excel
Matt Frost | Charlottesville, Virginia
The author recommends R, or more specifically RStudio, but his point applies to any script-based statistical package.
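Frost’s point can be illustrated with a toy script (the data are made up): the whole pipeline, from raw data to summary, is explicit and re-runnable, unlike a chain of manual cell edits:

```python
import csv
import io
import statistics

# Stand-in for reading a real CSV file from disk.
raw = io.StringIO("country,growth\nA,3.8\nB,0.9\nC,2.6\nD,-1.0\n")

# Every step (load -> parse -> summarize) lives in version-controllable
# code, so a reviewer can rerun the analysis end to end.
rows = list(csv.DictReader(raw))
growth = [float(r["growth"]) for r in rows]

print(statistics.mean(growth))
```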

The Press
Too many to link to for the moment, but here’s a sampling:
[Search Link]

Essay: Linking, Exploring and Understanding Population Health Data

This is a nice data essay by former PSC trainee Michael Bader. He discusses multiple sources of data one might use to understand population health. I especially like his point about the need to archive neighborhood conditions; after all, neighborhoods change. But he also touches on the range of data available for analysis, from focus groups to big data.

Linking, Exploring and Understanding Population Health Data
Michael Bader | Human Capital Blog (RWJ)
June 25 2012

The opening paragraph deserves a highlight, but read the entire entry. It is worth it:

Data are the sustenance of population health research, and like the food that sustains us, it comes in many forms, shapes and sizes. Also like food, it’s best appreciated in combination. A single data source in the absence of context is unfulfilling; but combining datasets that are rich with information and contours — now that’s a meal!