Gary King, Director of the Institute for Quantitative Social Science at Harvard University, spoke at a recent Michigan Institute for Data Science (MIDAS) symposium. Below are links to the slides and a video of the presentation.
Slides | Video
For those who don’t want to watch the entire presentation, here are links to specific papers and/or software he mentions in the presentation.
Automated Text Analysis
VA: Verbal Autopsy [software]
Evaluating U.S. Social Security Administration Forecasts
Learning Catalytics [commercial start-up]
Crimson Hexagon: Social Media Insights [commercial start-up]
Perusall [commercial start-up, e-book platform to increase student engagement]
It might be more productive, though, to browse King's personal website and find the content yourself; the list above is just a fraction of his output.
Nathan Yau at Flowing Data has an interactive graphic showing the growth of obesity rates by state, year (since 1985) and gender.
A post by Sunmoo Yoon on the NIH OBSSR blog looks at the potential of data mining to offer insights into predictors of physical activity among older urban adults:
Only two out of ten older adults meet the national guidelines for physical activity in the United States. Little is known about interrelationships of many socio-ecological factors to improve physical activity behavior among Hispanic older adults. As we move towards a precision medicine approach, we need innovative strategies to discover precisely tailored targets and accurate interventions. Data mining has the potential to offer such insights.
From 1940 to 1990, the Census Bureau collected fertility data in the decennial census by asking a “children ever born” question. The 2000 Census did not ask a fertility question at all. The American Community Survey (ACS) restored coverage, but with a different question: whether a woman had given birth to a child in the past year. This allows researchers to compute a total fertility rate, and the result compares reasonably well with the measure produced from the vital statistics system. Because the natality detail files no longer readily include geography, this is a welcome solution. The main drawback of the ACS question is that its reference period does not match the calendar year on which the vital statistics system is based: only respondents interviewed in December are referencing a January-to-December calendar year. See the Background section below for further discussion of this.
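As a rough illustration of how a total fertility rate can be built from ACS-style responses, the Python sketch below divides the number of women reporting a birth in the past year by the number of women in each five-year age group, then sums those age-specific rates and multiplies by five (the width of each group). The age groups and counts are made-up placeholders, not ACS estimates, and the sketch ignores survey weighting and the reference-period mismatch described above. Note also that the ACS item records whether a woman gave birth, not how many children she had, so a woman who had twins counts once.

```python
"""Sketch: total fertility rate (TFR) from ACS-style 'birth in the past year' counts.

All counts below are hypothetical placeholders, not real ACS estimates.
"""

# Hypothetical counts by five-year age group:
# (women in the age group, women who reported a birth in the past year)
AGE_GROUPS = {
    "15-19": (100_000, 2_000),
    "20-24": (100_000, 8_000),
    "25-29": (100_000, 10_000),
    "30-34": (100_000, 10_000),
    "35-39": (100_000, 5_000),
    "40-44": (100_000, 1_000),
    "45-49": (100_000, 0),
}


def age_specific_rates(groups):
    """Births per woman per year for each age group."""
    return {age: births / women for age, (women, births) in groups.items()}


def total_fertility_rate(groups, width=5):
    """TFR = width * sum of age-specific rates (each group spans `width` years)."""
    return width * sum(age_specific_rates(groups).values())


if __name__ == "__main__":
    for age, rate in age_specific_rates(AGE_GROUPS).items():
        print(f"{age}: {rate:.3f}")
    print(f"TFR: {total_fertility_rate(AGE_GROUPS):.2f}")
```

With these placeholder counts the age-specific rates sum to 0.36, so the TFR comes out at about 1.8 births per woman.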
Recently, however, the Census Bureau noticed anomalies in the data for selected areas and determined that some interviewers had been sloppy, asking “Have you given birth?” rather than “Have you given birth in the last year?” Many more women will answer yes to the former, which inflates the numerator. This is a good illustration of how much effort the Census Bureau puts into producing accurate and robust statistics.
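To see why dropping the reference period matters so much, here is a simple, hypothetical mixture model in Python: if some share of interviews use the lifetime wording, those respondents answer yes at roughly the “ever given birth” rate instead of the past-year rate. The rates and shares are invented for illustration and are not taken from the Census Bureau paper.

```python
"""Sketch: how a dropped reference period can inflate a 'birth in past year' rate.

Hypothetical model: a share `p_wrong` of interviews ask "Have you given birth?"
(lifetime wording), so those women answer yes at the lifetime rate instead of
the past-year rate. All input values are illustrative, not Census figures.
"""


def observed_birth_rate(past_year_rate, ever_birth_rate, p_wrong):
    """Weighted mix of correctly and incorrectly worded interviews."""
    if not 0.0 <= p_wrong <= 1.0:
        raise ValueError("p_wrong must be between 0 and 1")
    return (1 - p_wrong) * past_year_rate + p_wrong * ever_birth_rate


if __name__ == "__main__":
    true_rate = 0.06  # hypothetical share of women with a birth in the past year
    ever_rate = 0.60  # hypothetical share of women who have ever given birth
    for p in (0.0, 0.05, 0.10):
        rate = observed_birth_rate(true_rate, ever_rate, p)
        print(f"p_wrong={p:.2f}: observed {rate:.3f} ({rate / true_rate:.2f}x true)")
```

With these invented inputs, mis-asking just 10% of interviews raises the measured rate from 0.060 to 0.114, nearly double, which is the kind of local spike that stands out against earlier years.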
Addressing Data Collection Errors in the Fertility Question in the American Community Survey
Tavia Simmons | Census Bureau
In recent years, a few geographic areas in the American Community Survey (ACS) data had unusually high percentages of women reported as giving birth in the past year, quite unlike what was seen in previous years for those areas. This paper describes the issue that was discovered, and the measures taken to address it.
Indicators of Marriage and Fertility in the United States from the American Community Survey: 2000 to 2004
T. Johnson and J. Dye | Census Bureau
Slides 23 to 26 discuss and illustrate how the ACS and Vital Statistics estimates diverge from each other.
This is a nice tool for generating net migration reports from IRS tax return data. Because the data are based on tax returns, one can also tell whether, on average, a state is losing or gaining wealthier residents. Reports can be generated for states or for the counties within a state; the county option is really tedious because each county report has to be generated one by one.
Counties | States
And here’s the link to the raw data for those who find widgets tedious. The site has good explanations of the methodology, including how the way these files are created has changed over time: SOI Tax Stats – Migration Data
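For those working with the raw files, here is a minimal Python sketch of the two comparisons mentioned above: net migration as inflow minus outflow, and average adjusted gross income (AGI) per return for movers in versus movers out. The `Flow` class, its field names, and the numbers are hypothetical; the sketch assumes you have already totaled the inflow and outflow records for one state and that AGI is stored in thousands of dollars, so check the SOI documentation for the actual column names and units.

```python
"""Sketch: net migration and mover income from IRS-style inflow/outflow totals.

The Flow fields and all figures are hypothetical, not actual SOI data.
"""
from dataclasses import dataclass


@dataclass
class Flow:
    returns: int          # number of tax returns (roughly, households)
    individuals: int      # number of exemptions (roughly, people)
    agi_thousands: float  # aggregate adjusted gross income, assumed in $1,000s


def net_migration(inflow: Flow, outflow: Flow) -> int:
    """People moving in minus people moving out."""
    return inflow.individuals - outflow.individuals


def mean_agi_per_return(flow: Flow) -> float:
    """Average AGI per return, in dollars."""
    return flow.agi_thousands * 1000 / flow.returns


if __name__ == "__main__":
    inflow = Flow(returns=10_000, individuals=20_000, agi_thousands=600_000)
    outflow = Flow(returns=12_000, individuals=25_000, agi_thousands=900_000)
    print("Net migration (people):", net_migration(inflow, outflow))
    print(f"Mean AGI, movers in:  ${mean_agi_per_return(inflow):,.0f}")
    print(f"Mean AGI, movers out: ${mean_agi_per_return(outflow):,.0f}")
```

In this made-up example the state loses 5,000 people on net, and the households leaving have a higher average AGI ($75,000) than those arriving ($60,000), which is the kind of pattern the tax-based data let you spot.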
Want to make something like the map above? Here is Flowing Data's guide to making a similar map from five years of county-to-county IRS data:
Article | How To Guide
As of 9/30/2016, Easy Stats will no longer be available. To access data from the American Community Survey, use American FactFinder or QuickFacts. You can provide feedback here.
According to Data Detectives, “The retirement of the application is a result of a CEDSCI data tools assessment from earlier this year. The assessment looked at consolidating data tools to eliminate redundancy and also streamline our data dissemination offerings on Census.gov.”
This is a report, posted on the NCI/SEER web portal, on a way to create residential histories of respondents and decedents for epidemiological research. The report (below) details how three commercial vendors were able to match the residential histories of a small sample of federal government employees. Also available are the algorithms and software used to reconcile conflicting addresses. Interested readers might also want to browse other tools and papers on the NCI Geographical Information Systems and Science for Cancer Control website. https://gis.cancer.gov/index.html
NCI/SEER Residential History Project
David Stinchcomb and Allison Roeser | Westat
SAS residential history generation programs [3 programs]
[Summary] [Link to programs]
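The SAS programs linked above are the place to look for the project's actual reconciliation rules. Purely as a generic illustration of the problem (not the Westat/NCI algorithm), here is a Python sketch that merges address histories from several vendors by majority vote for each year, breaks ties using a vendor priority order, and collapses consecutive years at the same address into spans. The vendor names, addresses, and the voting rule are hypothetical, and the sketch assumes addresses have already been standardized so that identical addresses compare equal as strings.

```python
"""Sketch: reconcile conflicting residential histories from several vendors.

Generic majority-vote illustration with hypothetical data; this is NOT the
NCI/Westat reconciliation algorithm. Addresses are assumed pre-standardized,
and every vendor must appear in `vendor_priority`.
"""
from collections import Counter


def address_for_year(records, year, vendor_priority):
    """Pick the address most vendors report for `year`; ties go to the
    address reported by the highest-priority (earliest-listed) vendor."""
    votes = Counter()
    best_rank = {}
    for vendor, spans in records.items():
        rank = vendor_priority.index(vendor)
        for address, first, last in spans:
            if first <= year <= last:
                votes[address] += 1
                best_rank[address] = min(rank, best_rank.get(address, rank))
    if not votes:
        return None
    return min(votes, key=lambda a: (-votes[a], best_rank[a]))


def reconcile(records, vendor_priority):
    """Return one history as a list of (address, first_year, last_year) spans."""
    years = sorted({
        year
        for spans in records.values()
        for _, first, last in spans
        for year in range(first, last + 1)
    })
    history = []
    for year in years:
        address = address_for_year(records, year, vendor_priority)
        prev = history[-1] if history else None
        if prev and prev[0] == address and prev[2] == year - 1:
            history[-1] = (address, prev[1], year)  # extend the current span
        else:
            history.append((address, year, year))   # start a new span
    return history


if __name__ == "__main__":
    records = {  # hypothetical vendor name -> [(address, first_year, last_year)]
        "vendor_a": [("12 Oak St", 2000, 2004), ("9 Elm Ave", 2005, 2010)],
        "vendor_b": [("12 Oak St", 2000, 2005), ("9 Elm Ave", 2006, 2010)],
        "vendor_c": [("12 Oak St", 2001, 2003), ("40 Pine Rd", 2004, 2010)],
    }
    for span in reconcile(records, ["vendor_a", "vendor_b", "vendor_c"]):
        print(span)
```

In this toy example the three vendors each report a different address for 2005, so the tie goes to vendor_a, and the reconciled history is 12 Oak St for 2000-2004 and 9 Elm Ave for 2005-2010. Gaps and differences in date precision between vendors are ignored here.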
Nathan Yau of Flowing Data has 5 tips for learning to code for visualization: “being able to code your own visualization carries its own benefits like flexibility, speed, and complete customization.”
Stata is holding three 2-day sessions for new users. Sessions are $950 with a 15% discount for group enrollments of three or more.
Become intimately familiar with all three components of Stata: data management, analysis, and graphics. This two-day course is aimed at both new Stata users and those who wish to learn techniques for efficient day-to-day use of Stata. Upon completion of the course, you will be able to use Stata efficiently for basic analyses and graphics. You will be able to do this in a reproducible manner, making collaborative changes and follow-up analyses much simpler. Finally, you will be able to make your datasets self-explanatory to your co-workers and yourself when using them in the future.
The May 24-25 and June 20-21 sessions are in Washington, DC, and the October 24-25 session is in Las Vegas.
Go to this site for more training courses.