Archive for the 'Methodology' Category

The Census Reform Act of 2013

This proposed legislation [H.R. 1638] is radical. It would eliminate every survey conducted by the Census Bureau, including the Economic Census, the Census of Governments, the Census of Agriculture, and a mid-decade census that does not currently exist. It would also limit the decennial census to a population count.

In short:

(a) Notwithstanding any other provision of law–
(1) the Secretary may not conduct any survey, sampling, or other questionnaire, and may only conduct a decennial census of population as authorized under section 141; and
(2) any form used by the Secretary in such a decennial census may only collect information necessary for the tabulation of total population by States
(b) Repeal of Survey, Questionnaire, or Sampling Authority- Sections 182, 193, and 195 of title 13, United States Code, are repealed.

The Census Project Blog discusses this in more detail:
What We Don’t Know Can’t Hurt Us (Right?)
Teri Ann Lowenthal | The Census Project Blog
April 23, 2013

If Congress wants only a head-count census, will it fund a ‘mandatory population register’? This is something New Zealand is considering:

National Census Could be Scrapped
National News | TVNZ
April 23, 2013

Microsoft Excel: The Ruiner of Global Economies?

This is a series of articles on the news that a well-cited and influential paper by Carmen Reinhart and Ken Rogoff contained an Excel error, which led the authors to overstate the association between debt and growth. The paper also has other, more fundamental problems; see the comments by economists below.

From a training viewpoint, it is relevant to note that this was discovered by a graduate student, working on a class assignment: find a famous study and replicate it.

This entry has four sections: (a) the student; (b) comments by other economists; (c) replication & programming; and (d) coverage from the press.

The Story of the Student
Meet the 28-Year-Old Grad Student Who Just Shook the Global Austerity Movement
Kevin Roose | New York Magazine
April 18, 2013

How a student took on eminent economists on debt issue – and won
Edward Krudy | Reuters
April 18, 2013

‘They Said at First That They Hadn’t Made a Spreadsheet Error, When They Had’
Peter Monaghan | Chronicle of Higher Education
April 24, 2013
My favorite Q & A from this interview with Thomas Herndon is:
Q. This is more than a spreadsheet error, then?

A. Yes. The Excel error wasn’t the biggest error. It just got everyone talking about this. It was an emperor-has-no-clothes moment.

Comments/Analysis by Economists
Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff
Thomas Herndon, Michael Ash and Robert Pollin | Political Economy Research Institute
April 15, 2013
easier to read pdf of paper, but link above includes data, code, etc.

Researchers Finally Replicated Reinhart-Rogoff, and There Are Serious Problems
Michael Konczal | Next New Deal (blog of the Roosevelt Institute)
April 16, 2013

Reinhart and Rogoff are wrong about austerity
Robert Pollin and Michael Ash | Financial Times
April 17, 2013

Reinhart/Rogoff and Growth in a Time Before Debt
Arindrajit Dube | Next New Deal (blog of the Roosevelt Institute)
April 17, 2013

Reinhart, Rogoff, and How the Macroeconomic Sausage Is Made
Justin Fox | Harvard Business Review
April 17, 2013

The Excel Depression
Paul Krugman | New York Times
April 19, 2013

Replication & Programming
The Mysterious Powers of Microsoft Excel
Colm O’Regan | BBC News Magazine
April 20, 2013

What the Reinhart & Rogoff Debacle Really Shows: Verifying Empirical Results Needs to be Routine
Victoria Stodden | The Monkey Cage Blog
April 19, 2013

What Reinhart-Rogoff Means for the Replication Debate
Political Science Replication Blog
April 19, 2013

Microsoft Excel: The ruiner of global economies?
Peter Bright | Ars Technica
April 16, 2013
This piece describes the Excel error, but also discusses other issues with the paper, including the interesting tidbit that the original Reinhart-Rogoff paper was published in the American Economic Review proceedings issue (May), which is not peer reviewed.

Two clever economists have looked to see if researchers pad their resumes by hiding their AER proceedings publications. The University of Michigan economics department was included in their sample.

Research: Bad math rampant in family budgets and Harvard studies
Jeremy Olshan | Wall Street Journal (Market Watch blog)
April 17, 2013
88% of spreadsheets have errors

On the accuracy of statistical procedures in Microsoft Excel 2007
B.D. McCullough and David A. Heiser | Computational Statistics and Data Analysis
March 2008
These authors argue against using Excel for statistical analysis because of its failures in statistical distributions, random number generation, and the NIST StRD (Statistical Reference Datasets) tests. I suspect most users of Excel stick to the simpler tools (summation, product, etc.), but on occasion faculty have used Excel as a rudimentary statistical analysis tool.

What We Know about Spreadsheet Errors
Raymond Panko | Journal of End User Computing
May 2008

Come to Jesus Slides: Use Script-Based Analysis, not Excel
Matt Frost | Charlottesville, Virginia
The author recommends R, or more specifically RStudio, but his point applies to any script-based statistical package.
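
To make the case for script-based analysis concrete, here is a minimal Python sketch. The growth rates and the function name `average_growth` are made up for illustration; this is not the Reinhart-Rogoff data or anyone's actual code. It shows how a script can fail loudly when rows go missing, whereas a spreadsheet formula whose range stops short will quietly return a different average.

```python
from statistics import mean

# Hypothetical growth rates (percent) for five countries.
# Illustration only; these are NOT the Reinhart-Rogoff figures.
growth_by_country = {
    "A": 2.0,
    "B": 3.0,
    "C": -1.0,
    "D": 4.0,
    "E": 2.0,
}


def average_growth(rates, expected_n):
    """Average every row, raising an error if the row count is unexpected."""
    values = list(rates.values())
    if len(values) != expected_n:
        raise ValueError(f"expected {expected_n} rows, got {len(values)}")
    return mean(values)


# The script averages all five rows and checks that all five are present.
full_average = average_growth(growth_by_country, expected_n=5)

# Mimic a formula range that stops two rows early: the rows are dropped
# silently and the average changes with no warning.
truncated = dict(list(growth_by_country.items())[:3])
truncated_average = mean(truncated.values())
```

Passing `truncated` to `average_growth` with `expected_n=5` raises a `ValueError`, which is the kind of check a spreadsheet range does not give you for free.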

The Press
Too many to link to for the moment, but here’s a sampling:
[Search Link]

Essay: Linking, Exploring and Understanding Population Health Data

This is a nice data essay by former PSC trainee Michael Bader. He discusses multiple sources of data that one might use to understand population health. I especially like his point about the need to archive neighborhood conditions – after all neighborhoods change. But he also touches on the range of data available for analysis from focus groups to big data.

Linking, Exploring and Understanding Population Health Data
Michael Bader | Human Capital Blog (RWJ)
June 25, 2012

The opening paragraph deserves a highlight, but read the entire entry. It is worth it:

Data are the sustenance of population health research, and like the food that sustains us, it comes in many forms, shapes and sizes. Also like food, it’s best appreciated in combination. A single data source in the absence of context is unfulfilling; but combining datasets that are rich with information and contours — now that’s a meal!

Demographic Potential: revisiting an old technique/methodology

The potential demography: a tool for evaluating differences among countries in the European Union
Gian Carlo Blangiardo and Stefania M. L. Rimoldi
Genus: Journal of Population Sciences*
Spring 2013

*This journal has just become an open access journal: http://www.genus-journal.org/

The ACS Faces More Battles

The source for this entry is “The Option of Ignorance: Gutting the ACS Puts Democracy at Risk” from The Census Project Blog. http://bit.ly/YGHp86

The funding for the American Community Survey (ACS) will be covered by the 2013 Continuing Resolution, H.R. 933. However, two bills have been introduced, one in the House (H.R. 1078) and one in the Senate (S. 530), to make the ACS voluntary.

The House bill provides a Constitutional Statement of Authority, e.g., the Fourth Amendment. Note that one of the co-sponsors of this bill is Tim Walberg from the 7th Congressional District, i.e., just west of Ann Arbor.

This blog has several posts on the shortsighted reasoning behind this proposal. The Census Bureau has also researched the issue: a voluntary ACS would be more expensive and would produce less reliable data.

The links are highlighted below:

SENATE: The Census Bureau has already written the reports; read them.
Oh Canada! Look Before you Leap
More on the Idea of a Voluntary ACS
Small Government Folks and the Federal Statistical System

SENATE: The Census Bureau has already written the reports; read them.

New Congress, Old Attacks on the Census
By Jason Jordan | APA Director of Policy and Government Affairs
March 15, 2013

The House and Senate are working in earnest now to pass a new Continuing Resolution to provide funding for the rest of the fiscal year and avoid a potential government shutdown. Fortunately, neither the House nor the Senate versions of the extension include language on ACS. However, the Senate version does ask the Census Bureau to submit a report on ACS, including an analysis of the costs and benefits of a voluntary ACS.

The Census Bureau has evaluated a voluntary American Community Survey (ACS). This was done at the behest of Congress back in 2003. New reports were posted on the Census Bureau website in 2011. The Senate (and House) needs to read them.

Comparison of the American Community Survey Voluntary versus Mandatory Estimates
Alfredo Navarro, Karen King, and Michael Starsinic | Census Bureau
September 2011

Quality Measures Associated with a Voluntary American Community Survey
Deborah H. Griffin and David Raglin | Census Bureau
August 2011

Cost and Workload Implications of a Voluntary American Community Survey
Deborah Griffin | Census Bureau
June 2011

Underestimating Alcohol Consumption

How is alcohol consumption affected if we account for under-reporting? A hypothetical scenario
Sadie Boniface, Nicola Shelton | European Journal of Public Health
February 26, 2013
These researchers compared alcohol consumption reported in survey data with published figures on alcohol sales and found that consumption is under-reported in England, a result consistent with other studies.

This was mostly posted as an impetus to others to think of additional ways to get at this under-reporting problem. And, luckily the time period does not include the Olympics, which might have involved lots of tourists.
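
The comparison described above boils down to a coverage ratio: survey-reported consumption divided by consumption implied by sales. Below is a minimal Python sketch of that arithmetic. The function names and the numbers are hypothetical, and the uniform scale-up is a simplifying assumption for illustration, not necessarily the adjustment the authors used.

```python
def coverage_ratio(survey_units_per_person, sales_units_per_person):
    """Share of sales-implied consumption that the survey accounts for."""
    if sales_units_per_person <= 0:
        raise ValueError("sales-implied consumption must be positive")
    return survey_units_per_person / sales_units_per_person


def scale_up(reported_units, coverage):
    """Inflate a reported amount, assuming everyone under-reports equally."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    return reported_units / coverage


# Hypothetical figures: the survey implies 5 units per person per week,
# while sales data imply 10 units per person per week.
coverage = coverage_ratio(5.0, 10.0)

# Under the uniform assumption, a respondent reporting 12 units a week
# would be adjusted upward by the inverse of the coverage ratio.
adjusted = scale_up(12.0, coverage)
```

The uniform assumption is the weakest part of this sketch; under-reporting may well vary by age, sex, or drinking level, which is one reason other approaches to the problem are worth thinking about.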

Research on Black First Names

The first paper, by former PSC post-doc Trevon Logan and co-authors, shows that blacks had distinctive names in the early 20th century, so this is not a new phenomenon. The authors used historical census data as well as data from death certificates. The second paper explores whether searches involving ‘black’ names result in different ads being displayed via a Google search. Interestingly, one of the black names is ‘Trevon.’ Likewise, one of the female black names is ‘Latanya,’ which is the author’s first name. The final paper is probably familiar to most: does having a black name make a difference in interview callbacks?

Distinctively Black Names in the American Past
Lisa Cook, Trevon Logan, and John Parman | NBER (Working paper 18802)
February 2013

Discrimination in Online Ad Delivery
Latanya Sweeney | Harvard University [working paper posted on arXiv.org]
January 2013

Are Emily and Greg More Employable than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination
Marianne Bertrand and Sendhil Mullainathan | NBER (working paper 9873)
July 2003

Data Privacy: Some Articles on Failed Anonymization

January 28th is Data Privacy Day, so this post provides a few articles and news reports on failed anonymization as well as a guide to HIPAA audits concerning publicly identifiable health information.

HIPAA Audit Tips – Know What De-Identification of PHI Really Means
January 28, 2013

The ‘Re-Identification’ of Governor William Weld’s Medical Information: A Critical Re-Examination of Health Data Identification Risks and Privacy Protections, Then and Now
Daniel C. Barth-Jones | Social Science Research Network
June 4, 2012

Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization
Paul Ohm | UCLA Law Review
August 2009

Identifying Personal Genomes by Surname Inference
Melissa Gymrek, Amy L. McGuire, David Golan, Eran Halperin and Yaniv Erlich | Science
January 18, 2013

The Complexities of Genomic Identifiability
Laura L. Rodriguez, Lisa D. Brooks, Judith H. Greenberg and Eric D. Green | Science
January 18, 2013

Here are some articles that explain the issues to the educated public. The first article has a reference to UM football. See if you can find it:

Emperor’s New Short Tandem Repeats
John Wilbanks | del-fi.org (personal website)
January 17, 2013

Scientists Demonstrate how Hackers can unlock your Genetic Secrets
Alan Boyle, Science Editor | NBC News
January 17, 2013

Big Data: Google Flu

Google Flu Trends uses aggregated Google search data to estimate current flu activity in near real-time, which can be compared with the counts of confirmed cases from CDC epidemiologists. Bob Groves mentioned this site in his 50th Anniversary talk at PSC as an example of “wild” data, which can be merged with or compared to data collected via traditional methods. [See this post for more examples of wild data in social science research.]

Google Flu Trends | United States
This site allows one to see trends over time for the US, e.g., how this year compares to the pattern in previous years. One can also get reports for specific states or metro areas. Note that this is the methodology used in an Economics dissertation on the underperformance of Obama in selected states.

Google Flu Trends | World
This shows the same results, but for the entire world. The first thing that is clear is the striking difference in flu trends between the Northern and Southern hemispheres (totally expected). One can select a specific country, including the United States, and examine the flu trends for that country.

Google Dengue Trends | World
This shows the results of an aggregated search for dengue fever. The first thing that is apparent is that dengue fever is not a search that folks in the US or England make or make often enough to register.

250,000 Social Media Users in U.S. Said They Got the Flu
Chris Taylor | Mashable
January 16, 2013
If you mentioned you had the flu on either Twitter or Facebook, your post got analyzed by Crimson Hexagon, a firm that does sentiment analysis.

Below are several articles that describe the methodology and usefulness of these big data techniques for disease surveillance purposes:

Detecting influenza epidemics using search engine query data
Ginsberg, Jeremy, et al. | Nature
February 2009

Using Web Search Query Data to Monitor Dengue Epidemics: A New Model for Neglected Tropical Disease Surveillance
Chan, Emily, et al. | PlosOne
May 2011

The following articles are nice examples to use in a class, illustrating the concept, without going into the details of the above articles:

Unless You Live in Takoma Park, Beverly Hills, or Reno, You’re Probably Going to Get the Flu
Henry Grabar | The Atlantic Cities
January 10, 2013

The Year’s Flu Season is the Worst in a Long Time, Google GIF Edition
Alexis Madrigal | The Atlantic
January 9, 2013