Archive for the 'Data' Category

Intrade is no more, but we have some of its data

The popular election-betting website Intrade has ceased operations. Below are links to archives of Intrade trades, as well as a few articles discussing the site and its history. Intrade had been on shaky ground since US residents were prohibited from placing bets, after the Commodity Futures Trading Commission accused the company of offering contracts outside of traditional exchanges with no regulatory oversight.

Archived Trades via Twitter
Intrade Archive via Panos Ipeirotis (@ipeirotis) [stopped collecting Financials in 2012]
Intrade Archive: Archive Team at Internet Archive via @textfiles

Online Betting Site Intrade Is Shut After Audit Queries
Mark Scott | NY Times
March 12, 2013

RIP Intrade: The last, best hope for pundit accountability
Neil Irwin | Wonkblog, Washington Post
March 11, 2013

Even without Intrade, Billions will be Bet on 2016 Race
Nate Silver | FiveThirtyEight Blog, NY Times
March 11, 2013

Living Apart Together

Canada has led the way in North America in gathering data on non-traditional living arrangements. The following is a report, based on the Canadian General Social Survey, on the living apart together population. Unlike the US GSS, the Canadian GSS is collected by a federal entity, Statistics Canada.

Living Apart Together
Martin Turcotte | Statistics Canada
March 2013

Short version | Full report

Synopsis: A number of people are in a stable relationship but do not live together, and are known as non-cohabiting or ‘living apart together’ (LAT) couples. How many people are in such a situation? Are they transitioning towards a different kind of life together or making a deliberate lifestyle choice?

Questions
The following is a link to the items in the questionnaire that are used to determine LAT status. Warning, the link is pretty slow:

[LAT items from GSS (Canadian) questionnaire]

The Sister Study: Breast Cancer

The Sister Study
From 2004 to 2009, more than 50,000 women across the US and Puerto Rico, who were between ages 35–74 and whose sister had breast cancer, joined this landmark research effort to find causes of breast cancer. Because of their shared environment, genes, and experiences, studying sisters provides a greater chance of identifying risk factors that may help us find ways to prevent breast cancer.

The Sister Study is currently tracking the health of women in the cohort. Participants complete health updates each year, as well as detailed questionnaires about health and experiences every two-to-three years. Research in the Sister Study focuses on causes of breast cancer and other health issues in women, as well as on factors that influence quality of life and outcomes after a breast cancer diagnosis.

Data Access
Access to the data is not completely open, but there is a process for access. Click on the above link for instructions.

Research on Black First Names

The first paper, co-authored by former PSC post-doc Trevon Logan, shows that blacks had distinctive names in the early 20th century, so the phenomenon is not new. He and his co-authors used historical census data as well as data from death certificates. The second paper explores whether searches involving ‘black’ names result in different ads being displayed via a Google search. Interestingly, one of the black names is ‘Trevon.’ Likewise, one of the female black names is ‘Latanya,’ which is the author’s first name. The final paper is probably familiar to most: does having a black name make a difference in interview callbacks?

Distinctively Black Names in the American Past
Lisa Cook, Trevon Logan, and John Parman | NBER (Working paper 18802)
February 2013

Discrimination in Online Ad Delivery
Latanya Sweeney | Harvard University [working paper posted on arxiv.org]
January 2013

Are Emily and Greg More Employable than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination
Marianne Bertrand and Sendhil Mullainathan | NBER (working paper 9873)
July 2003

Update on Sequestration Cuts for Statistical Agencies

Sequestration Cuts’ Impact on Statistical Agencies
Steve Pierson | The Census Project Blog
February 19, 2014

This is up-to-date information on selected statistical agencies, primarily the Census Bureau and Bureau of Labor Statistics. However, at the end of the post are links to impacts on NIH and NSF.

Data Privacy: Some Articles on Failed Anonymization

January 28th is Data Privacy Day, so this post provides a few articles and news reports on failed anonymization, as well as a Guide to HIPAA Audits concerning publicly identifiable health information.

HIPAA Audit Tips – Know What De-Identification of PHI Really Means
January 28, 2013

The ‘Re-Identification’ of Governor William Weld’s Medical Information: A Critical Re-Examination of Health Data Identification Risks and Privacy Protections, Then and Now
Daniel C. Barth-Jones | Social Science Research Network
June 4, 2012

Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization
Paul Ohm | UCLA Law Review
August 2009

Identifying Personal Genomes by Surname Inference
Melissa Gymrek, Amy L. McGuire, David Golan, Eran Halperin and Yaniv Erlich | Science
January 18, 2013

The Complexities of Genomic Identifiability
Laura L. Rodriguez, Lisa D. Brooks, Judith H. Greenberg and Eric D. Green | Science
January 18, 2013

And some articles that explain the issues to the educated public. The first article has a reference to UM football. See if you can find it:

Emperor’s New Short Tandem Repeats
John Wilbanks | del-fi.org (personal website)
January 17, 2013

Scientists Demonstrate how Hackers can unlock your Genetic Secrets
Alan Boyle, Science Editor | NBC News
January 17, 2013

Big Data: Google Flu

Google Flu Trends uses aggregated Google search data to estimate current flu activity in near real time, as compared to the results from confirmed cases via CDC epidemiologists. Bob Groves mentioned this site in his 50th Anniversary talk at PSC as an example of “wild” data, which can be merged with or compared to data collected via traditional methods. [See this post for more examples of wild data in social science research.]

Google Flu Trends | United States
This site allows one to see trends over time for the US, e.g., how this year compares to the pattern in previous years. One can also get reports for specific states or metro areas. Note that this is the methodology used in an Economics dissertation on the underperformance of Obama in selected states.

Google Flu Trends | World
This shows the same results, but based on the entire world. The first thing that is clear is the striking difference in flu trends between the Northern and Southern hemispheres (totally expected). One can select a specific country, including the United States, and examine the flu trends for that country.

Google Dengue Trends | World
This shows the results of an aggregated search for dengue fever. The first thing that is apparent is that folks in the US or England either do not search for dengue fever or do not search for it often enough to register.

250,000 Social Media Users in U.S. Said They Got the Flu
Chris Taylor | Mashable
January 16, 2013
If you mentioned you had the flu on either Twitter or Facebook, your post got analyzed by Crimson Hexagon, a firm that does sentiment analysis.

Below are several articles that describe the methodology and usefulness of these big data techniques for disease surveillance purposes:

Detecting influenza epidemics using search engine query data
Ginsberg, Jeremy, et al. | Nature
February 2009

Using Web Search Query Data to Monitor Dengue Epidemics: A New Model for Neglected Tropical Disease Surveillance
Chan, Emily, et al. | PLoS ONE
May 2011

The following articles are nice examples to use in a class, illustrating the concept, without going into the details of the above articles:

Unless You Live in Takoma Park, Beverly Hills, or Reno, You’re Probably Going to Get the Flu
Henry Grabar | The Atlantic Cities
January 10, 2013

The Year’s Flu Season is the Worst in a Long Time, Google GIF Edition
Alexis Madrigal | The Atlantic
January 9, 2013

Is the NIH Funding Model Efficient?

Money and Science: To He that Hath
The Economist
December 8th, 2012
This analysis is based on the publication and funding records of the authors of the most highly cited biomedical papers and concludes that NIH may not support the best researchers. The link to the original article, in Nature, is provided below.

Research grants: Conform and be funded
Joshua M. Nicholson & John P.A. Ioannidis | Nature
December 5, 2012
Tag line: Too many US authors of the most innovative and influential papers in the life sciences do not receive NIH funding.

And, perhaps related to the above, NIH is considering anonymity for grant applicants:

NIH Considers Anonymity for Grant Applicants
Paul Basken | The Chronicle of Higher Education
December 10, 2012

And, be careful about unattributed text. Some federal agencies are using software to detect unattributed copying in research proposals. See below.

Plagiarism in Grant Proposals
Karen M. Markin | The Chronicle of Higher Education
December 10, 2012

Why Don’t Parents Name Their Daughters Mary Anymore?

Why Don’t Parents Name Their Daughters Mary Anymore?
Philip Cohen | The Atlantic
December 12, 2012

This article is by Philip Cohen, a professor at the University of Maryland. The Atlantic has picked up his blog, Family Inequality, where he posts short but scholarly snippets.

This piece illustrates the decline in the name Mary via the Social Security Administration’s names database. He posits that this is due to a rise in the cultural value of individuality. Accordingly, people value names that are not common, perhaps even unique. A repercussion of this is that there were only 21,695 baby girls named Sophia (the most popular name in 2011), whereas back in 1961 there were 47,655 girls named Mary.
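To put the two counts side by side, here is a quick back-of-the-envelope calculation. It is only a sketch: it uses nothing but the two figures quoted above, the variable names are made up for illustration, and it compares raw counts rather than shares of all births in each year (which would require totals not given here).

```python
# Counts quoted in the paragraph above (as reported from the SSA names database).
mary_1961 = 47_655    # girls named Mary in 1961
sophia_2011 = 21_695  # girls named Sophia, the most popular name in 2011

# How many Marys were there per Sophia?
ratio = mary_1961 / sophia_2011

# Percentage by which the 2011 count falls short of the 1961 count.
drop_pct = (mary_1961 - sophia_2011) / mary_1961 * 100

print(f"Marys in 1961 per Sophia in 2011: {ratio:.2f}")
print(f"2011 count is {drop_pct:.1f}% lower than the 1961 count")
```

In other words, the 1961 count for Mary is a bit more than twice the 2011 count for Sophia.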

Using Wild Data to Estimate International Migration

A previous post described several studies based on non-survey data, which inform demographic events. The following is another very creative example:

You are where you e-mail: using e-mail data to estimate international migration rates
Emilio Zagheni and Ingmar Weber | Max Planck Institute & Yahoo! Research
Proceedings of the 3rd Annual ACM Web Science Conference [Pages 348-351]
June 22-24, 2012