New York City Census FactFinder

The NYC Planning Department’s American Community Survey update to the NYC Census FactFinder application has been released. It is now possible to get 2009-2013 ACS profiles for Neighborhood Tabulation Areas and user-defined census tract aggregations, in addition to demographic profiles from the 2000 and 2010 censuses.

Cleaning Data and Extracting Data from PDFs

FlowingData points out two useful data resources:

  • Marc Bellemare, an associate professor in the Department of Applied Economics at the University of Minnesota, provides practical tips on cleaning data.
  • Tabula, which is available for both Windows and Mac, has been updated to make it easier to extract data from PDF documents.

Using Probability in Criminal Sentencing

FlowingData extracts a statistics lesson on probability from a piece in FiveThirtyEight about risk assessment and criminal sentencing.

New Book From the Oxford Poverty & Human Development Initiative

The Oxford Poverty & Human Development Initiative published a book on Multidimensional Poverty Measurement & Analysis:

Multidimensional poverty measurement and analysis is evolving rapidly. Quite recently, a particular counting approach to multidimensional poverty measurement, developed by Sabina Alkire and James Foster, has created considerable interest. Notably, it has informed the publication of the Global Multidimensional Poverty Index (MPI) estimates in the Human Development Reports of the United Nations Development Programme since 2010, and the release of national poverty measures in Chile, Mexico, Colombia, Bhutan and the Philippines. The academic response has been similarly swift, with related articles published in both theoretical and applied journals.

The high and insistent demand for in-depth and precise accounts of multidimensional poverty measurement motivates this book, which is aimed at graduate students in quantitative social sciences, researchers of poverty measurement, and technical staff in governments and international agencies who create multidimensional poverty measures.

Draft chapters are available online.

Poll Results and Response Rates

Scott Keeter, Pew Research Center’s director of survey research, discusses declining response rates and what they mean for survey reliability.

Big data and smart cities

Urban Demographics posted a presentation by Rob Kitchin based on his paper “The real-time city? Big data and smart urbanism” (gated version; working paper version).


‘Smart cities’ is a term that has gained traction in academia, business and government to describe cities that, on the one hand, are increasingly composed of and monitored by pervasive and ubiquitous computing and, on the other, whose economy and governance is being driven by innovation, creativity and entrepreneurship, enacted by smart people. This paper focuses on the former and, drawing on a number of examples, details how cities are being instrumented with digital devices and infrastructure that produce ‘big data’. Such data, smart city advocates argue enables real-time analysis of city life, new modes of urban governance, and provides the raw material for envisioning and enacting more efficient, sustainable, competitive, productive, open and transparent cities. The final section of the paper provides a critical reflection on the implications of big data and smart urbanism, examining five emerging concerns: the politics of big urban data, technocratic governance and city development, corporatisation of city governance and technological lock-ins, buggy, brittle and hackable cities, and the panoptic city.

Sage Stats

The University of Michigan Library has just acquired access to Sage Stats, a resource for local area statistics:

Sage Stats features data series on U.S. states, counties, cities, and metropolitan areas. Topics covered include the economy, education, crime, government finance, health, population, religion, social welfare, and transportation. Some series go back more than 20 years. Sage Stats makes it easy to download data, compare indicators or create simple visualizations of local area data.

Access is available to the Ann Arbor, Flint and Dearborn campuses at

Apple ResearchKit: New Frontiers in Data Collection & Informed Consent

Apple’s ResearchKit allows researchers to develop an iPhone app, which interested respondents can download from the App Store. The respondent goes through an online consent form and then responds to questions, performs tasks (such as walking), etc. Some of the diagnostic tools are based on previously developed apps from Apple’s HealthKit.

As of now, apps have been developed for collecting data for research projects on asthma; cardiovascular disease; diabetes; Parkinson’s; and mind, body, and wellness after breast cancer. There is also an app for a population-based study of the LGBTQ population.

Here is a description of the informed consent process for these iPhone apps:
Participant-Centered Consent Toolkit

Listed below are a few press releases associated with the Pride Study, the population-based study of the LGBTQ population. Following those posts are some more general critiques of this way of gathering data. The post from the Verge is probably the most critical, raising issues of “on the internet no one knows you are a dog” and of gaming the consent process (lying about eligibility for the study). On the plus side, participants should be easier to sign up, and the pool won’t be limited to those who live close to research hospitals. Here is an excerpt from Business Insider on the reaction to the app launch for the Stanford heart study:

It’s really incredible … in the first 24 hours of research kit we’ve had 11,000 people sign up for a study in cardiovascular disease through Stanford University’s app. And, to put that in perspective – Stanford has told us that it would have taken normally 50 medical centers an entire year to sign up that many participants. So, this is – research kit is an absolute game changer.

The participant pool is limited to iPhone users (there is no Android version of these apps), although some studies, such as the Pride Study, will have a web interface.

What’s the Matter with Polling?

This article focuses on political polling and on predictions from political polls, but much of the content is relevant to other sorts of telephone-based opinion surveys, many of which are used by social scientists: Survey of Consumers, Pew, Gallup, etc.

The article focuses on (a) the move from landlines to cellphones; (b) the growing non-response rate; (c) costs; and (d) sample metrics, e.g., representativeness.

The decline in landline phones makes telephone surveys more expensive since cell phones cannot be reached through automatic dialers. The landline phone vs cellphone distribution comes from the National Health Interview Survey. Here’s a recent summary of the data. The article summarizes this as “About 10 years ago. . . . about 6 percent of the public used only cellphones. The N.H.I.S. estimate for the first half of 2014 found that this had grown to 43 percent, with another 17 percent “mostly” using cellphones. In other words, a landline-only sample conducted for the 2014 elections would miss about three-fifths of the American public, almost three times as many as it would have missed in 2008.”
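The “three-fifths” figure in the quote is simple addition of the two N.H.I.S. shares. Here is a minimal sketch of that arithmetic, assuming (as the article’s framing does) that people who “mostly” use cellphones are effectively unreachable by a landline-only sample; the function and constant names are made up for illustration.

```python
from fractions import Fraction

# Shares quoted in the article from the N.H.I.S. estimate for the first half of 2014.
CELL_ONLY = Fraction(43, 100)
CELL_MOSTLY = Fraction(17, 100)


def share_missed_by_landline_sample(cell_only, cell_mostly):
    """Share of the public a landline-only sample would miss.

    Assumption: 'mostly cellphone' users are counted as missed,
    which is how the quoted article arrives at its total.
    """
    return cell_only + cell_mostly


missed = share_missed_by_landline_sample(CELL_ONLY, CELL_MOSTLY)
print(f"Missed share: {float(missed):.0%} ({missed})")
```

Treating the “mostly” group as fully missed is the strong assumption here; counting only the cell-only group would put the figure at 43 percent.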

The other issue for polling is the growing non-response rate.

When I first started doing telephone surveys in New Jersey in the late 1970s, we considered an 80 percent response rate acceptable, and even then we worried if the 20 percent we missed were different in attitudes and behaviors than the 80 percent we got. Enter answering machines and other technologies. By 1997, Pew’s response rate was 36 percent, and the decline has accelerated. By 2014 the response rate had fallen to 8 percent.
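One rough way to see why falling response rates raise costs: if each sampled number independently yields a completed interview with probability equal to the response rate, the expected number of numbers worked per completed interview is the reciprocal of that rate. This is a back-of-the-envelope sketch, not the article’s own calculation; real response-rate definitions are more involved, and the function name is hypothetical.

```python
def expected_numbers_per_complete(response_rate):
    """Expected sampled numbers per completed interview.

    Simplifying assumption: every number has the same, independent
    chance of producing a complete (mean of a geometric distribution).
    """
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return 1 / response_rate


# Response rates mentioned in the quote above.
quoted_rates = {
    "late 1970s (New Jersey, 'acceptable')": 0.80,
    "1997 (Pew)": 0.36,
    "2014": 0.08,
}

for period, rate in quoted_rates.items():
    needed = expected_numbers_per_complete(rate)
    print(f"{period}: about {needed:.2f} numbers per completed interview")
```

Under this simplification, the drop from 80 percent to 8 percent means roughly ten times as many numbers per completed interview, before accounting for the extra cost of hand-dialing cellphones.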

Non-response makes surveys more expensive: there are more numbers to call to find a respondent, and many of them must be dialed by hand in a cellphone universe. Most important is the representativeness of the sample that the survey ends up with. So far, surveys based on probability samples still seem to be representative, at least when sample characteristics are compared with gold-standard benchmarks like the American Community Survey (ACS). Participation in the ACS is mandatory, although for the last several years Republicans in the House have tried to remove this requirement. Canada did away with the mandatory requirement for its census, with disastrous results. The following is a compilation of posts related to the mandatory response requirement in the US and Canada: [Older Posts]

Measuring Race . . . Again

The following is a collection of news stories on how the Census Bureau is planning to collect data on race. It is misleading to say that the Census Bureau will not collect data on race. Instead of asking about Hispanic Origin and Race, the Census Bureau is likely to ask about “categories” that describe the person.

A new category might be “Middle Eastern or North African.”

The Census Bureau collects data on all sorts of topics, but the Office of Management and Budget (OMB) makes the final call on how the concept is measured by the Federal Statistical System. Links to the Census Bureau’s submission to OMB and a report based on internal research follow a nice summary by Pew.

2010 Census Race and Hispanic Origin Alternative Questionnaire Experiment
from the 2010 Census Program for Evaluations and Experiments
Feb 28, 2013

