Archive for the 'Data' Category

Using Grid Maps to Visualize Data

Danny DeBelius of NPR’s Visuals Team discusses how geographic data is represented on maps and how to make such visualizations more accurate. The approach the team settled on is the hex-tile map.

image of Hexagon Map

H/T Flowing Data, which shows other ways of producing this kind of map, including sheep and Darth Vaders.

How to Ask for Datasets

Christian Kreibich at medium.com provides some helpful tips for asking other researchers to share their data.

I’m a systems researcher. I work with data, plenty of it. Over the past decade I have sent lots of data inquiries, and have received dozens. Judging by the latter it’s safe to say that people often go about this poorly, so I’d like to give a bit of advice regarding how to formulate inquiries to other researchers. But before we start, a few clarifications. This article is dataset-centric, but the concerns apply similarly to resources such as algorithms, methods, or code. Also, I assume you have done your background research and already know whom to ask. This is not a guide for finding useful stuff. Finally, the following is by no means a complete guide on how to collaborate with other researchers, but it might provide some tips regarding how to start such a collaboration.

H/T Flowing Data

American Community Survey (ACS) Data Products Survey

The American Community Survey Office is conducting a survey to gather feedback on its products:

The ACS data products consist of tabulated products, such as aggregated estimates found in detailed tables or data profiles in the American FactFinder, and the Public Use Microdata Sample (PUMS) Files. We need your feedback in order to provide relevant and timely data products that are easy to access and use.

Please take a moment to complete this survey. Your responses will help us evaluate the ACS data products and dissemination and find ways to improve them. Please respond no later than May 29, 2015.

We estimate the survey will take 15 minutes to complete.

H/T Data Detectives

Big Data in 1848

In 1848, newspaper magnate and Representative Horace Greeley used open records to compare the mileage reimbursements of his fellow representatives to the postal routes (which should have been the shortest routes between districts and the U.S. capital). He found that several, including Abraham Lincoln, had overcharged significantly.

See Scott Klein’s story at ProPublica.

H/T Wonkblog

ACS Median Earnings by Detailed Occupation

The U.S. Census Bureau released 2013 Earnings by Sex and Detailed Occupation tables from the American Community Survey. Other tables include Sex, Race, and Hispanic Origin by Occupation: 2012 and Median Earnings of College Graduates by Field of Bachelor’s Degree and Occupation: 2012.

All table packages are here.

H/T Data Detectives

Big Government and Big Data

Ben Casselman of FiveThirtyEight examines the legal, bureaucratic and practical impediments the U.S. government faces in collecting and disseminating data about U.S. citizens.

When the government wants to know how many people are unemployed, it calls people and asks them whether they’re working. When it wants to know how quickly prices are rising, it sends researchers to stores to check price tags. And when it wants to know how much consumers are spending, it mails forms to thousands of retailers asking about their sales.

“Big data” may have revolutionized industries from advertising to transportation, but many of our most vital economic statistics are still based on methods that are decidedly, well, small.

Read the full article

PAA President Ruggles wants you to write a letter

This is an excellent summary of the consequences of the demise of the 3-year ACS tabular products. Please follow through and contact the relevant government officials:

ACS 3-Year Summary Products: Please take action to save the ACS 3-year data products
Steve Ruggles | PAA President and Director of the Minnesota Population Center
March 4, 2015

Another take on gentrification

Gentrification in America Report
Mike Maciag | Governing
February 2015
This resource is city-specific and provides both counts and maps of gentrified census tracts for the 50 largest cities. To be eligible to gentrify, a census tract’s median household income and median home value both had to be in the bottom 40th percentile of all tracts within a metro area at the beginning of the decade. Gentrified tracts are the eligible tracts that then recorded increases in the top third percentile for both measures when compared to all other tracts in the metro area.

Methodology
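The two-step test described above (eligibility first, then growth) can be sketched in a few lines of Python. This is only an illustration of the rule as summarized here, not Governing's actual code: the `Tract` class, the field names, the use of percent change to measure "increases," and the exact percentile-rank definition are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Tract:
    tract_id: str
    income_start: float      # median household income, start of decade
    home_value_start: float  # median home value, start of decade
    income_end: float        # same measures at the end of the period
    home_value_end: float


def percentile_rank(values, x):
    """Share of `values` that are less than or equal to `x` (0 to 1)."""
    return sum(v <= x for v in values) / len(values)


def pct_change(start, end):
    """Relative growth; measuring growth as percent change is an assumption."""
    return (end - start) / start


def classify_tracts(tracts, eligible_cutoff=0.40, top_share=1 / 3):
    """Label every tract in one metro area, comparing it to all the others."""
    incomes = [t.income_start for t in tracts]
    values = [t.home_value_start for t in tracts]
    income_growth = [pct_change(t.income_start, t.income_end) for t in tracts]
    value_growth = [pct_change(t.home_value_start, t.home_value_end) for t in tracts]

    result = {}
    for t, ig, vg in zip(tracts, income_growth, value_growth):
        # Step 1: both starting measures in the bottom 40th percentile.
        eligible = (
            percentile_rank(incomes, t.income_start) <= eligible_cutoff
            and percentile_rank(values, t.home_value_start) <= eligible_cutoff
        )
        # Step 2: growth in both measures in the top third of the metro area.
        fast_growth = (
            percentile_rank(income_growth, ig) > 1 - top_share
            and percentile_rank(value_growth, vg) > 1 - top_share
        )
        if not eligible:
            result[t.tract_id] = "not eligible"
        elif fast_growth:
            result[t.tract_id] = "gentrified"
        else:
            result[t.tract_id] = "eligible, did not gentrify"
    return result


# Made-up numbers for a five-tract metro area, for illustration only.
metro = [
    Tract("A", 20_000, 80_000, 30_000, 140_000),
    Tract("B", 25_000, 90_000, 26_000, 92_000),
    Tract("C", 50_000, 200_000, 55_000, 220_000),
    Tract("D", 70_000, 300_000, 110_000, 540_000),
    Tract("E", 90_000, 400_000, 95_000, 410_000),
]
print(classify_tracts(metro))
```

With these invented figures, tract A is labeled gentrified, tract B is eligible but did not gentrify, and tract D is not eligible despite its fast growth, because it did not start in the bottom 40th percentile.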

And more broadly, this resource has a special issue on gentrification:

The G-Word: A Special Series on Gentrification
The titles in this series are:
Do Cities Need Kids?
The Neighborhood Has Gentrified, But Where’s the Grocery Store?
Just Green Enough
Gentrification’s Not So Black and White After All
The Downsides of a Neighborhood ‘Turnaround’
Some Cities Are Spurring the End of Sprawl
Keeping Cities from Becoming “Child-Free Zones”
From Vacant to Vibrant: Cincinnati’s Urban Transformation
Can Cities Change the Face of Biking?

President Obama Appoints the First U.S. Chief Data Scientist

President Obama recently appointed Dr. DJ Patil as Deputy U.S. CTO for Data Policy and Chief Data Scientist.

Read Patil’s Memo to the American People from February 20 and watch his address, Data Science: Where are We Going? with an introduction by President Obama.

The Hedonometer Index

This is an index of happiness created from tweets. The index provides a daily score, which can be toggled to exclude weekends, Mondays, etc.

Hedonometer Index

This is an excellent resource because its creators describe how the index is calculated and which words it uses, provide an API, and link to articles based on the index. It is valuable even if you do not care about happiness, because it provides a template for many other uses of data from Twitter.

Instructions [Documentation of index via video or written – click on links]
Words [Words used in index, ranks, etc.]
Blog [The Computational Story Lab. . . mostly related to happiness]
Press [press coverage]
Papers [refereed papers by research team]
Talks [maybe you need a clip for a lecture]
API [lots of examples]
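To make the "template" idea concrete, here is a minimal sketch of one common way to score text with a word list: a frequency-weighted average of per-word scores. It is not necessarily how the index's creators calculate it (see their Instructions and Papers for that), and the word scores below are invented for illustration rather than taken from their word list.

```python
import re
from collections import Counter

# Hypothetical scores on a 1 (sad) to 9 (happy) scale; NOT the real word list.
WORD_SCORES = {"love": 8.4, "happy": 8.3, "rain": 5.0, "sad": 2.4, "hate": 2.3}


def happiness_score(texts, scores=WORD_SCORES):
    """Frequency-weighted average score of the scored words in `texts`.

    Returns None when none of the texts contains a scored word.
    """
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in scores:
                counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(scores[w] * n for w, n in counts.items()) / total


# Example: score one (made-up) day's worth of short messages.
day = ["I love love this", "so sad"]
print(happiness_score(day))
```

Grouping messages by date and calling `happiness_score` once per group would give a daily series; dropping particular weekdays before scoring would mimic the kind of toggles the index offers.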