PDHP Workshops Series:
Principles of Text Analysis
Patrick van Kessel (Pew Research Center)
Wednesday, 11/18/2020, 9:00am to 1:00pm
PDHP resumes its 2020 workshop series on Nov. 18 with a workshop entitled Principles of Text Analysis, presented by Patrick van Kessel, senior data scientist at Pew Research Center. This half-day workshop is geared toward data analysts who work with unstructured text data (e.g., open-ended survey responses or web-curated text). It provides a tutorial on cleaning, processing, and analyzing data from text-based sources using state-of-the-art text analytics techniques, primarily in Python, with some examples also provided in R. Experience with either language is recommended but not required.
Topics include:
* Preprocessing and cleaning messy text data
* Feature extraction using TF-IDF vectorization
* Text analytics techniques including topic modelling and unsupervised clustering methods
* Software demonstration featuring the scikit-learn library for Python
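
For readers who want a feel for the first two topics before attending, here is a minimal sketch of text cleaning followed by TF-IDF weighting. It is not taken from the workshop materials: the sample responses, the stopword list, and the helper names `clean` and `tfidf` are all invented for illustration, and it uses only the Python standard library and a plain `tf * ln(N / df)` weighting rather than scikit-learn.

```python
import math
import re
from collections import Counter

# Hypothetical open-ended survey responses, invented for illustration.
responses = [
    "I LOVE the new park!!! Great for kids.",
    "The park is too crowded & noisy...",
    "Taxes are too high; roads need repair.",
]

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "is", "i", "for", "are", "too", "a", "and"}


def clean(text):
    """Lowercase, replace non-letters with spaces, split, drop stopwords."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * ln(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


tokens = [clean(r) for r in responses]
vectors = tfidf(tokens)

# Show the highest-weighted terms in each response.
for doc_weights in vectors:
    top = sorted(doc_weights.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print([(term, round(weight, 3)) for term, weight in top])
```

In this sketch, a term that appears in several responses (such as "park") gets a lower weight than a term unique to one response. scikit-learn's `TfidfVectorizer` applies a smoothed IDF and L2 normalization by default, so its numbers will differ from the ones produced here.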
Patrick van Kessel is a senior data scientist at Pew Research Center, specializing in computational social science research and methodology. He is the author of studies that have used natural language processing and machine learning to measure negative political discourse and news sharing behavior by members of Congress on social media, and is involved in the ongoing development of best practices for the application of data science methods across the Center. Van Kessel received his master's degree in social science from the University of Chicago, where he focused on open-ended survey research and text analytics. He holds bachelor's degrees in economics and political science from the University of Texas at Austin. Prior to joining Pew Research Center, he worked at NORC at the University of Chicago as a data scientist and technical advisor on a variety of research projects related to health, criminal justice and education.