Author Archive for lisan

Blast from the Past

Sunday’s Washington Post had an article on the divergent amounts spent on the elderly versus children. This was the theme of Sam Preston’s 1984 PAA Presidential address:

Feds spend $7 on elderly for every $1 on kids
Ezra Klein | Washington Post (WonkBlog)
February 15, 2013
The bulk of this article is based on a report from the Urban Institute.

Kids’ Share 2012: Report on Federal Expenditures on Children Through 2011
Julia Isaacs et al. | The Urban Institute

Children and the Elderly: Divergent Paths for America’s Dependents
Sam Preston | Demography
November 1984

Data Privacy: Some Articles on Failed Anonymization

January 28th is Data Privacy Day, so this post provides a few articles and news reports on failed anonymization, as well as a guide to HIPAA audits concerning publicly identifiable health information.

HIPAA Audit Tips – Know What De-Identification of PHI Really Means
January 28, 2013

The ‘Re-Identification’ of Governor William Weld’s Medical Information: A Critical Re-Examination of Health Data Identification Risks and Privacy Protections, Then and Now
Daniel C. Barth-Jones | Social Science Research Network
June 4, 2012

Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization
Paul Ohm | UCLA Law Review
August 2009

Identifying Personal Genomes by Surname Inference
Melissa Gymrek, Amy L. McGuire, David Golan, Eran Halperin and Yaniv Erlich | Science
January 18, 2013

The Complexities of Genomic Identifiability
Laura L. Rodriguez, Lisa D. Brooks, Judith H. Greenberg and Eric D. Green | Science
January 18, 2013

And some articles that explain the issues to the educated public. The first article has a reference to UM football. See if you can find it:

Emperor’s New Short Tandem Repeats
John Wilbanks | del-fi.org (personal website)
January 17, 2013

Scientists Demonstrate how Hackers can unlock your Genetic Secrets
Alan Boyle, Science Editor | NBC News
January 17, 2013

Big Data Reveals Job Change

Task Specialization in U.S. Cities from 1880-2000
Guy Michaels, Ferdinand Rauch, Stephen J. Redding | NBER Working Paper 18715
January 2013

In this study, economists Guy Michaels, Ferdinand Rauch, and Stephen J. Redding analyze the verbs used to describe jobs in the U.S. Dictionary of Occupational Titles over a 120-year period. They do this by geographic area, correlating their findings with the spread of telephone service and transportation networks. They discover “a systematic reallocation of employment over time towards interactive occupations, which involve tasks described by verbs that appear in thesaurus categories concerned with thought, communication and inter-social activity.”
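To make the idea of counting verbs by category concrete, here is a minimal sketch. It is not the authors' code: the verb-to-category lookup and the job descriptions are invented for illustration, and the paper's actual thesaurus categories are not reproduced.

```python
from collections import Counter

# Hypothetical verb-to-category lookup (an assumption for this sketch).
# The paper relies on thesaurus categories, which are not reproduced here.
VERB_CATEGORY = {
    "advise": "interactive",
    "negotiate": "interactive",
    "teach": "interactive",
    "lift": "manual",
    "weld": "manual",
    "assemble": "manual",
}


def category_shares(descriptions):
    """Return the share of recognized verbs that fall in each category."""
    counts = Counter()
    for text in descriptions:
        for word in text.lower().split():
            # Strip trailing punctuation before looking the word up.
            category = VERB_CATEGORY.get(word.strip(".,;:"))
            if category is not None:
                counts[category] += 1
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}


# Invented job descriptions for two periods (not taken from the
# Dictionary of Occupational Titles).
early = ["Lift and weld steel plates.", "Assemble engine parts; lift crates."]
late = ["Advise clients and negotiate contracts.", "Teach trainees; assemble reports."]

early_shares = category_shares(early)
late_shares = category_shares(late)
print(early_shares)
print(late_shares)
```

Comparing the shares across periods (and, in the real study, across cities) is what shows the shift toward interactive tasks.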

Tip from @TrendCop via Twitter

An Exercise in Inefficiency: Sequestration/Threat of Sequestration

This is a collection of recent news articles on sequestration. For the most part, the articles concern NIH funding, although a few discuss all federal agencies. An earlier post links to agency-by-agency cuts as well as UM-specific information.

Sequestration cuts no longer the ‘bad policy’ bogeyman for Congress
Jeremy Herb | The Hill
January 29, 2013
This news source focuses on Congress and the Federal government. It concludes “With the sequestration deadline a little over four weeks away, there appears to be little momentum in Congress or the White House to stop the cuts.”

Threats of automatic cuts costly to federal agencies
Lisa Rein | Washington Post
January 27, 2013

Paul Ryan Insists Republicans are ready to let the Sequester Happen
Suzy Khimm | Wonk Blog (Washington Post)
January 27, 2013

Ryan: No Sequestration had Romney and I Won
Pema Levy | Talking Points Memo
January 27, 2013

Sequestration means mass furloughs in April
Stephen Losey | Federal Times
January 25, 2013

NIH Director Francis Collins: Medical Research at Risk
Paige Winfield Cunningham | Politico
January 16, 2013

Sequestering Science
Michael D. Purugganan | Huffington Post
January 16, 2013

Big Data: Google Flu

Google Flu Trends uses aggregated Google search data to estimate current flu activity in near real time; the estimates can be compared with the counts of confirmed cases reported by CDC epidemiologists. Bob Groves mentioned this site in his 50th Anniversary talk at PSC as an example of “wild” data, which can be merged with or compared to data collected via traditional methods. [See this post for more examples of wild data in social science research.]
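As a rough illustration of how search activity can be turned into a flu estimate, the sketch below fits a straight line between the log-odds of a flu-related query share and the log-odds of the share of doctor visits for flu-like illness. This is a simplification in the spirit of the Ginsberg et al. Nature paper listed further down, not Google's actual pipeline, and every number in it is made up.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit: map log-odds back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Made-up weekly values, for illustration only:
# share of searches that are flu-related, and share of doctor visits
# that are for flu-like illness in the same weeks.
query_share = [0.010, 0.015, 0.022, 0.030, 0.041]
visit_share = [0.012, 0.018, 0.025, 0.036, 0.050]

a, b = fit_line([logit(q) for q in query_share],
                [logit(v) for v in visit_share])


def estimate_visit_share(q):
    """Estimate the visit share for a new week from its query share."""
    return inv_logit(a + b * logit(q))


print(estimate_visit_share(0.025))
```

Once the line is fitted on past weeks, a new week's query share is available almost immediately, which is why the estimate can run ahead of the officially confirmed counts.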

Google Flu Trends | United States
This site allows one to see trends over time for the US, e.g., how this year compares to the pattern in previous years. One can also get reports for specific states or metro areas. Note that this is the methodology used in an Economics dissertation on the underperformance of Obama in selected states.

Google Flu Trends | World
This shows the same results, but based on the entire world. The first thing that is clear is the striking difference in flu trends between the Northern and Southern hemispheres (totally expected). One can select a specific country, including the United States, and examine the flu trends for that country.

Google Dengue Trends | World
This shows the results of an aggregated search for dengue fever. The first thing that is apparent is that dengue fever is not a search that folks in the US or England make or make often enough to register.

250,000 Social Media Users in U.S. Said They Got the Flu
Chris Taylor | Mashable
January 16, 2013
If you mentioned you had the flu on either Twitter or Facebook, your post got analyzed by Crimson Hexagon, a firm that does sentiment analysis.

Below are several articles that describe the methodology and usefulness of these big data techniques for disease surveillance purposes:

Detecting influenza epidemics using search engine query data
Ginsberg, Jeremy, et al. | Nature
February 2009

Using Web Search Query Data to Monitor Dengue Epidemics: A New Model for Neglected Tropical Disease Surveillance
Chan, Emily, et al. | PLoS ONE
May 2011

The following articles are nice examples to use in a class, illustrating the concept, without going into the details of the above articles:

Unless You Live in Takoma Park, Beverly Hills, or Reno, You’re Probably Going to Get the Flu
Henry Grabar | The Atlantic Cities
January 10, 2013

The Year’s Flu Season is the Worst in a Long Time, Google GIF Edition
Alexis Madrigal | The Atlantic
January 9, 2013

How to calculate life expectancy & why it matters

Social Security: It’s Worse Than You Think
Gary King and Samir Soneji | New York Times
January 5, 2013

This Opinion piece in the Sunday Times is a summary of a Demography article where the authors argue that the Social Security Administration underestimates how long Americans live. That means the trust fund will run out two years earlier than the government has predicted.
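For readers who want to see the arithmetic behind a life expectancy figure, here is a minimal life-table sketch. It is not the method used by the authors or by the Social Security Administration: the age intervals and death probabilities are invented, and it assumes that people who die within an interval live through half of it on average.

```python
def life_expectancy_at_birth(intervals):
    """Expected years lived per newborn.

    intervals: list of (width_in_years, death_probability) pairs ordered by
    age. The last interval should have death probability 1.0 so that the
    whole cohort is accounted for.
    """
    alive = 1.0          # share of the birth cohort still alive
    person_years = 0.0   # years lived per newborn, accumulated by interval
    for width, q in intervals:
        deaths = alive * q
        survivors = alive - deaths
        # Survivors live the full interval; those who die are assumed
        # to live half of it (a simplifying assumption).
        person_years += width * survivors + 0.5 * width * deaths
        alive = survivors
    return person_years


# Invented two-interval schedules: ages 0-65 and 65-100.
baseline = [(65, 0.2), (35, 1.0)]
lower_mortality = [(65, 0.1), (35, 1.0)]

e0_baseline = life_expectancy_at_birth(baseline)
e0_lower = life_expectancy_at_birth(lower_mortality)
print(e0_baseline, e0_lower)
```

The comparison shows why the assumed mortality schedule matters: if fewer people die before retirement age than projected, more people draw benefits, and for longer.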

Here’s the link to the original article in Demography:

Statistical Security for Social Security
Samir Soneji and Gary King | Demography
August 2012
[html] | [pdf]

This article has quite a few references to former PSC post-doc, John Wilmoth, who has also written on mortality projections and life expectancy.

Finally, for one more set of mortality estimates and methodology, see the Census Bureau’s National Population Projections web page. They have always projected higher life expectancy than the Social Security Administration, but their methodology is not the same as these authors’:

2012 National Population Projections
Census Bureau
December 12, 2012

Is the NIH Funding Model Efficient?

Money and Science: To He that Hath
The Economist
December 8th, 2012
This analysis is based on the publication and funding records of the authors of the most highly cited biomedical papers, and it concludes that NIH may not support the best researchers. The link to the original article, in Nature, is provided below.

Research grants: Conform and be funded
Joshua M. Nicholson & John P.A. Ioannidis | Nature
December 5, 2012
Tag line: Too many US authors of the most innovative and influential papers in the life sciences do not receive NIH funding.

And, perhaps related to the above, NIH is considering anonymity for grant applicants:

NIH Considers Anonymity for Grant Applicants
Paul Basken | The Chronicle of Higher Education
December 10, 2012

And, be careful about unattributed text. Some federal agencies are using software to detect unattributed copying in research proposals. See below.

Plagiarism in Grant Proposals
Karen M. Markin | The Chronicle of Higher Education
December 10, 2012

Why Don’t Parents Name Their Daughters Mary Anymore?

Why Don’t Parents Name Their Daughters Mary Anymore?
Philip Cohen | The Atlantic
December 12, 2012

This article is by Philip Cohen, a professor at the University of Maryland. The Atlantic has picked up his blog, Family Inequality, where he posts short but scholarly snippets.

This piece illustrates the decline in the name Mary via the Social Security Administration’s names database. Cohen posits that this is due to a rise in the cultural value of individuality. Accordingly, people value names that are not common, perhaps even unique. A repercussion of this is that there were only 21,695 baby girls named Sophia (the most popular name in 2011), whereas back in 1961 there were 47,655 girls named Mary.

Using Wild Data to Estimate International Migration

A previous post described several studies based on non-survey data, which inform demographic events. The following is another very creative example:

You are where you e-mail: using e-mail data to estimate international migration rates
Emilio Zagheni and Ingmar Weber | Max Planck Institute & Yahoo! Research
Proceedings of the 3rd Annual ACM Web Science Conference [Pages 348-351]
June 22-24, 2012

Wild Data: Expanding Social Science Resources

Most researchers use survey data, but more and more researchers are using “wild” data, which is defined as data not produced for research purposes. In fact, several PSC researchers are part of an NSF/Census project, which explores the usefulness of “wild” data ranging from administrative data (Social Security death index, Social Security earnings data) to data harvested from the web.

Below are several examples of informative posts based on web-based data:

The New Secessionists: Plotting whitehouse.gov secession petitions
Neal Caren | Big Data blog
November 14, 2012

This post shows the origin of each of the signers of the wave of secession petitions on the whitehouse.gov website via a county-based map. It also includes an explanation of how this was done. Many of the posts on Caren’s Big Data blog are excellent tutorials for the fundamentals of quantitative text analysis for social scientists.

It is also useful to refer to the history of secession petitions in the US, provided here:

10 facts about Secession
Kevin Robillard | Politico
November 14, 2012

The second example of an application of wild data comes from a post about ‘mapping racist tweets’ based on content on Twitter immediately after Obama was re-elected to his second term:

Mapping Racist Tweets in Response to President Obama’s Re-election
floatingsheep.org
November 8, 2012

Note that a Harvard Ph.D. student used Google search data to study the underperformance of Obama in 2008, which he attributed to racial animus.

The Effects of Racial Animus on a Black Presidential Candidate: Using Google Search Data to Find What Surveys Miss
Seth Stephens-Davidowitz | Harvard
June 9, 2012

The popular press version of this is here:

Can Google Predict the Impact of Racism on a Presidential Election?
Garance Franke-Ruta | The Atlantic
June 11, 2012

And finally, the Google NGram project has useful data for researchers. Here’s an article from the Economist where the “data is” vs “data are” question is examined. More pertinent to researchers might be the evolution of “Negro man” to “colored man” to “African-American man” in common usage.

Data or datum?
K.N.C. | The Economist
July 13, 2012

And here’s the link to the Google Ngram Viewer. Of course, you’ll want to have access to the data: the raw data is available for download.