Quantitative Text Analysis: Michael vs Jacob

Most of the data demographers use are numeric and are easily handled with statistical packages. Text data, such as Google NGrams or names from the Social Security Names Database, are more commonly analyzed using Python.

CSCAR and ARC are sponsoring free Python training Friday, November 8th. Space is limited.

In case you miss the workshop, here’s a link to some Big Data Tutorials by Neal Caren at the University of North Carolina, Chapel Hill.

And, to get back to the title of this blog entry, below are three data visualizations on names. The first two are the most common name by state & gender from 1960 to 2010.

Click on the images to activate the gifs.

[Animated maps: us_map_gnames (girls' names) and us_map_bnames (boys' names)]

Notice that for the girls, Lisa dominated the US in 1965, which means I was born 10+ years too early to have that name. And for the boys, watch the epic battle between Michael and Jacob. Also note that Jose was the dominant male name in Texas in 1996. Arizona has also had two Hispanic names (Jose and Angel) on top in the recent past.
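For anyone who wants to try this kind of tally in Python, here is a minimal sketch of how the "most common name by state and gender" could be computed. It assumes the records come as comma-separated rows of state, sex, year, name, count. The sample rows and their counts are invented for illustration, and the function name top_names is hypothetical. This is not necessarily how the maps above were built.

```python
import csv
import io

# Invented sample rows in an assumed layout: state, sex, year, name, count.
# The counts are made up for illustration and are not real data.
SAMPLE = """\
TX,M,1996,Jose,2500
TX,M,1996,Michael,2400
TX,F,1996,Emily,1900
TX,F,1996,Ashley,2000
AZ,M,1996,Michael,600
AZ,M,1996,Jose,550
"""


def top_names(lines):
    """Return {(state, sex, year): (name, count)} for the top name in each group."""
    best = {}
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        state, sex, year, name, count = row
        key = (state, sex, int(year))
        count = int(count)
        # Keep the name with the largest count seen so far for this group.
        if key not in best or count > best[key][1]:
            best[key] = (name, count)
    return best


winners = top_names(io.StringIO(SAMPLE))
for key in sorted(winners):
    print(key, winners[key])
```

To run this on a real file, pass an open file handle in place of the io.StringIO object, after checking that the columns match the layout assumed here.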

The third data visualization explores unisex names:
[Image: unisex names]
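One simple way to look for unisex names yourself is to compute, for each name, the share of babies given that name who were girls, and then rank names by how close that share is to an even split. The sketch below assumes rows of name, sex, count. The counts are invented, the function name female_share is hypothetical, and this measure is an assumption for illustration rather than the method used in the visualization.

```python
from collections import defaultdict

# Invented national counts as (name, sex, count). Not real data.
ROWS = [
    ("Jessie", "F", 480), ("Jessie", "M", 520),
    ("Riley", "F", 700), ("Riley", "M", 300),
    ("Michael", "F", 10), ("Michael", "M", 990),
]


def female_share(rows):
    """Return {name: fraction of that name's total count recorded as female}."""
    totals = defaultdict(lambda: {"F": 0, "M": 0})
    for name, sex, count in rows:
        totals[name][sex] += count
    return {name: t["F"] / (t["F"] + t["M"]) for name, t in totals.items()}


shares = female_share(ROWS)

# Names closest to a 50/50 split come first.
ranked = sorted(shares, key=lambda name: abs(shares[name] - 0.5))
for name in ranked:
    print(name, round(shares[name], 2))
```

A fuller analysis would probably track this share year by year, since a name can be evenly split in one decade and lopsided in the next.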

Finally, think of these names as data. We also have a link to research on black first names, as well as a post on the declining popularity of Mary.

Resources:
Big Data Tutorials, Neal Caren (University of North Carolina, Chapel Hill).

Google NGrams Viewer

Google NGrams Data
Note: we have already downloaded quite a bit of this. See Data Service before you download another copy.

Social Security Names Database

A Wondrous GIF Shows the Most Popular Baby Names for Girls Since 1960
Rebecca Rosen | The Atlantic
October 18, 2013

America’s Most Popular Boys’ Names Since 1960, in 1 Spectacular GIF
Megan Garber | The Atlantic
October 24, 2013

The most unisex names in US history
Data Underload | FlowingData Blog
September 25, 2013
