Wild Data: Expanding Social Science Resources

Most researchers rely on survey data, but a growing number are using “wild” data: data not produced for research purposes. Several PSC researchers are part of an NSF/Census project that explores the usefulness of “wild” data, ranging from administrative data (the Social Security death index, Social Security earnings data) to data harvested from the web.

Below are several examples of informative posts that draw on web-based data:

The New Secessionists: Plotting whitehouse.gov secession petitions
Neal Caren | Big Data blog
November 14, 2012

This post uses a county-based map to show where the signers of the wave of secession petitions on the whitehouse.gov website came from. It also explains how the map was made. Many of the posts on Caren’s Big Data blog are excellent tutorials on the fundamentals of quantitative text analysis for social scientists.

It is also useful to refer to the history of secession petitions in the US, provided here:

10 facts about Secession
Kevin Robillard | Politico
November 14, 2012

The second example of wild data in use comes from a post on ‘mapping racist tweets’, based on Twitter content posted immediately after Obama was re-elected:

Mapping Racist Tweets in Response to President Obama’s Re-election
November 8, 2012

Note that a Harvard Ph.D. student used Google search data to study Obama’s underperformance in 2008, which he attributed to racial animus.

The Effects of Racial Animus on a Black Presidential Candidate: Using Google Search Data to Find What Surveys Miss
Seth Stephens-Davidowitz | Harvard
June 9, 2012

The popular press version of this is here:

Can Google Predict the Impact of Racism on a Presidential Election?
Garance Franke-Ruta | The Atlantic
June 11, 2012

And finally, the Google Ngram project has useful data for researchers. Here’s an article from The Economist that examines the “data is” vs. “data are” question. More pertinent to researchers might be the evolution from “Negro man” to “colored man” to “African-American man” in common usage.

Data or datum?
K.N.C. | The Economist
July 13, 2012

And here’s the link to the Google Ngram Viewer. Of course, you’ll want access to the data itself: the raw data is available for download.
