An Introduction to Entity Resolution with Rebecca Steorts
Rebecca C. Steorts (Duke University, Statistical Science)
Wednesday, 7/10/2019. ARCHIVED EVENT
Location: 1430 ISR-Thompson
The PDHP workshop series resumes with the first in a multi-part series of workshops on record linkage topics and techniques in social research.
Please join Assistant Professor Rebecca C. Steorts, PhD, of Duke University's Department of Statistical Science, as she presents An Introduction to Entity Resolution, a half-day workshop geared toward statisticians, data scientists, population researchers, and computational social scientists of all experience levels. This hands-on workshop will cover both the theory and practice of probabilistic entity resolution and will demonstrate state-of-the-art techniques using R software and Apache Spark.
• Overview and introduction to entity resolution
• Entity resolution fundamentals (record linkage, de-duplication, blocking, and computational gains)
• Entity resolution evaluation metrics (including precision, reduction ratio, and robustness to tuning parameters)
• Bayesian entity resolution models (including both parametric and nonparametric Bayesian mixture models)
• Hands-on demonstration of state-of-the-art R packages (using blink) and computational gains (using Apache Spark)
Dr. Rebecca C. Steorts is Assistant Professor in the Department of Statistical Science at Duke University and affiliated faculty in Computer Science, Biostatistics and Bioinformatics, the Information Initiative at Duke (iiD), and the Social Science Research Institute. She also holds a Schedule A appointment at the U.S. Census Bureau.
Steorts's main research focus is entity resolution (also known as record linkage or de-duplication), where the goal is to remove duplicated information from large, noisy databases in the absence of unique identifiers.