How One 19-Year-Old Illinois Man Is Distorting National Polling Averages
New York Times | Nate Cohn @Nate_Cohn
October 12, 2016
[Link to NYT article]
This is a nice illustration of the decisions pollsters make when they weight their respondents. The author disagrees with the decisions of the USC/LA Times pollsters, but applauds them for transparency:
It’s worth noting that this analysis is possible only because the poll is extremely and admirably transparent: It has published a data set and the documentation necessary to replicate the survey.
The article has multiple illustrations of what the trend of national Trump support would have been with different weighting decisions. Check it out.
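To see why one respondent can move a weighted estimate, here is a minimal sketch with made-up numbers. The responses, the weights, and the cap of 5 are all hypothetical and are not taken from the poll's data set. Capping ("trimming") weights is just one common adjustment, used here only to show how sensitive the estimate is to the weighting decision.

```python
# Hypothetical sample of 10 respondents: 1 = supports the candidate, 0 = does not.
responses = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
# Hypothetical weights: the first respondent is weighted 30 times the others.
weights = [30.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]


def weighted_share(responses, weights):
    """Weighted proportion of respondents who support the candidate."""
    return sum(r * w for r, w in zip(responses, weights)) / sum(weights)


def trim_weights(weights, cap):
    """Cap every weight at `cap` (a simple form of weight trimming)."""
    return [min(w, cap) for w in weights]


unweighted = weighted_share(responses, [1.0] * len(responses))
untrimmed = weighted_share(responses, weights)
trimmed = weighted_share(responses, trim_weights(weights, cap=5.0))

print(f"unweighted: {unweighted:.3f}")
print(f"untrimmed weights: {untrimmed:.3f}")
print(f"weights capped at 5: {trimmed:.3f}")
```

In this toy sample the single heavily weighted supporter pulls the estimate far above the unweighted share, and capping the weight pulls it most of the way back.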
Gary King, Director of the Institute for Quantitative Social Science at Harvard University, spoke at a recent Michigan Institute for Data Science (MIDAS) symposium. Below are links to the slides and a video of the presentation.
Slides | Video
For those who don’t want to watch the entire presentation, here are links to specific papers and/or software he mentions in the presentation.
Automated Text Analysis
VA: Verbal Autopsy [software]
Evaluating U.S. Social Security Administration Forecasts
Learning Catalytics [commercial start-up]
Crimson Hexagon: Social Media Insights [commercial start-up]
Perusall [commercial start-up, e-book platform to increase student engagement]
And, it might be more productive to just go through King’s personal website to find the content yourself. The above is just a fraction of his productivity.
The Census Bureau gathered data on fertility by asking a “children ever born” question in the decennial census from 1940 to 1990. The 2000 Census did not ask a fertility question at all. With the advent of the American Community Survey, fertility was covered again, but with a different question: it asks whether a woman has given birth to a child in the past year. This allows researchers to compute a total fertility rate, which performs reasonably well against the measure produced from the vital statistics system. And, given that geography is no longer readily available in the natality detail files, this is a welcome solution. The main drawback to the ACS question is that its 12-month reference period generally does not match the calendar year that the vital statistics system is based on. Only the December respondents are referencing a January to December calendar year. See the Background section below for a further discussion of this.
Recently, however, the Census Bureau noticed some anomalies in the data for selected areas and determined that some interviewers had been sloppy and asked “Have you given birth” rather than “Have you given birth in the last year.” Many more women will answer yes to the former, which inflates the numerator. This is a good illustration of how much effort the Census Bureau puts into producing accurate and robust statistics.
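As a rough sketch of the arithmetic, a total fertility rate can be built from a births-in-the-past-year question by summing age-specific fertility rates across five-year age groups and multiplying by the width of the groups. The age groups and all counts below are hypothetical and are not ACS estimates. The second calculation shows how spurious “yes” answers in one age group inflate the result.

```python
# Hypothetical five-year age groups and counts (not ACS figures).
AGE_GROUPS = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49"]
women = [1000, 1000, 1000, 1000, 1000, 1000, 1000]
births_past_year = [20, 80, 110, 100, 50, 10, 1]


def total_fertility_rate(births, women, width=5):
    """Sum age-specific fertility rates and scale by the age-group width.

    The result is the number of births per woman implied by the current
    age-specific rates.
    """
    asfr = [b / w for b, w in zip(births, women)]
    return width * sum(asfr)


tfr = total_fertility_rate(births_past_year, women)

# Assume 30 women aged 35-39 answered "yes" to "Have you given birth"
# even though their births were not in the past year (hypothetical).
inflated_births = list(births_past_year)
inflated_births[AGE_GROUPS.index("35-39")] += 30
tfr_inflated = total_fertility_rate(inflated_births, women)

print(f"TFR: {tfr:.3f}")
print(f"TFR with inflated numerator: {tfr_inflated:.3f}")
```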
Addressing Data Collection Errors in the Fertility Question in the American Community Survey
Tavia Simmons | Census Bureau
In recent years, a few geographic areas in the American Community Survey (ACS) data had unusually high percentages of women reported as giving birth in the past year, quite unlike what was seen in previous years for those areas. This paper describes the issue that was discovered, and the measures taken to address it.
Indicators of Marriage and Fertility in the United States from the American Community Survey: 2000 to 2004
T. Johnson and J. Dye | Census Bureau
Slides 23 to 26 discuss and illustrate how the ACS and Vital Statistics estimates diverge from each other.
This is a nice tool for getting net migration reports based on IRS tax return data. Note that because these data are based on tax returns, one can also tell whether, on average, a state is losing/gaining wealthier residents. One can generate reports for counties by state or for states. The former is really tedious because one has to generate the county reports one by one.
Counties | States
And here’s the link to raw data for those who find widgets tedious. Note that the site has nice explanations for the methodology, including changes over time in how these files are created: SOI Tax Stats – Migration Data
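For those working with the raw files, the two quantities mentioned above (net migration and whether movers in or out are wealthier on average) reduce to simple sums over flow records. The sketch below uses a made-up record layout of origin, destination, number of returns, and adjusted gross income in thousands of dollars. The layout and all figures are hypothetical, so check the SOI documentation for the actual file structure.

```python
# Hypothetical flow records:
# (origin_state, destination_state, returns, agi_thousands)
flows = [
    ("MI", "FL", 1200, 90000),
    ("FL", "MI", 800, 40000),
    ("MI", "TX", 500, 30000),
    ("TX", "MI", 700, 35000),
]


def net_migration(flows, state):
    """Summarize inflows and outflows of tax returns for one state."""
    in_returns = sum(n for o, d, n, agi in flows if d == state)
    out_returns = sum(n for o, d, n, agi in flows if o == state)
    in_agi = sum(agi for o, d, n, agi in flows if d == state)
    out_agi = sum(agi for o, d, n, agi in flows if o == state)
    return {
        "net_returns": in_returns - out_returns,
        # Average AGI per return, in thousands; None if there were no movers.
        "avg_agi_in": in_agi / in_returns if in_returns else None,
        "avg_agi_out": out_agi / out_returns if out_returns else None,
    }


mi = net_migration(flows, "MI")
print(mi)
```

In this toy example the state has a net loss of returns, and the average income of those leaving is higher than that of those arriving, which is the "losing wealthier residents" pattern.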
And, do you want to know how to make something like the map above? Here’s a link from Flowing Data on how to make a similar map based on five years of county-to-county IRS data:
Article | How To Guide
This is a report on the NCI/SEER web portal on a way to create residential histories of respondents/decedents for epidemiological research. The report (below) details how three commercial vendors were able to match the residential history of a small sample of federal government employees. Also available are the algorithms and software to reconcile conflicting addresses. Interested folks might want to browse other tools/papers on the NCI Geographic Information Systems and Science for Cancer Control website. https://gis.cancer.gov/index.html
NCI/SEER Residential History Project
David Stinchcomb and Allison Roeser | Westat
SAS residential history generation programs [3 programs]
[Summary] [Link to programs]
Even though NIH and NSF both have data sharing requirements, there is clearly some resistance to it. The best example is an editorial from the New England Journal of Medicine. Secondary data users are characterized as “research parasites.”
A rebuttal comes from a Science editorial with the title #IAmAResearchParasite.
Dan L. Longo and Jeffrey Drazen | N Engl J Med
January 21, 2016
Marcia McNutt | Science
March 4, 2016
The drop in birth rates from 2007 through 2013 has been well documented. However, it is also important to examine total rates of pregnancy and other pregnancy outcomes (abortion and fetal loss) to provide a comprehensive picture of current reproductive trends. This NCHS Health E-Stat uses data from 2010 to update a previous NCHS report on pregnancy rates. Data on pregnancy outcomes by age and race and Hispanic origin are presented.
2010 Pregnancy Rates Among U.S. Women
Sally C. Curtin, Joyce Abma [NCHS] and Kathryn Kost [Guttmacher Institute]
html | pdf
Monday’s Supreme Court case centered on data. The plaintiffs in the case, Evenwel v. Abbott, argue that representation in Texas legislative districts ought to be based on voters rather than the total population. Currently, most states use total population for redistricting purposes, and this comes from the decennial census. The decennial census does not have a citizenship question, but the replacement for the Census long form, the American Community Survey (ACS), does.
The former directors of the Census Bureau filed an amicus brief against the idea of using the eligible voter population (e.g., citizens 18+ years of age). A group of applied demographers also filed an amicus brief, noting that this was quite possible using the ACS. Note that Sonia Sotomayor does not think the ACS is adequate, but that is because she misunderstands the data:
As is typical with cases involving data and social science research, there are lots of supplementary links:
The Washington Post [10 or so opinions from the Opinion | In Theory section]
‘One Person One Vote’: A Primer
Washington Post | Opinion : In Theory
[10 or so opinions and comments]
Argument preview: How to measure “one person, one vote”
Lyle Denniston | SCOTUSblog
December 1, 2015
The Threat to Representation for Children and Non-Citizens: An Analysis of the Potential Impact of Evenwel v. Abbott on Redistricting
Andrew Beveridge | Social Explorer
December 2, 2015
Supreme Court is skeptical of challenge to Texas district lines
Maria Recio | Sacramento Bee
December 8, 2015
This is the source of the Sotomayor quote
“. . . Dueling Affirmative Action Empiricism” [this is actually from Fisher v. Texas, but is included here as evidence of the Supreme Court using social science research]
OHRP has released its notice of proposed rulemaking, which makes significant changes to the Common Rule.
Federal Register: Federal Policy for the Protection of Human Subjects
Comments are accepted up until 12/07/2015 at 11:59 PM EST
If you need to get up to speed, the National Academy of Sciences published a book in 2014 on the first release of changes to the Common Rule. It is available online, as a PDF, or as a book.
The Canadian election campaign period is much shorter than that of the US. The Canadian election will take place on October 19, 2015, and campaigning started on August 2 of this year.
Another difference with the US is the types of issues that candidates are discussing – specifically science policy and the long-form census. Will these be issues in the US? Doubtful, but let’s watch the debates and see.
Below is recent coverage in the Canadian press about the long-form census and science policy being issues, at least among the NDP and Liberals:
Reviving the Census Debate
Donovan Vincent | The Star
September 12, 2015
The Liberals and the NDP have said they want to bring back the long-form census the Conservatives killed in 2010. Could it become an election issue?
Researchers try to make science a federal election issue
Julie Ireton | CBC News
September 3, 2015
Here is a running list of organizations that were against/in favor of the Harper government’s cancellation of the mandatory long-form census.
Here is previous coverage in this blog about Canada’s war on science and follies with their census.