Jason Owen-Smith

Creating a Data Quality Control Framework for Producing New Personnel-Based S&E Indicators

Research Project Description
Jason Owen-Smith, Jinseok Kim

We will develop an Automated and Stratified Entity Disambiguation (ASED) framework to resolve name ambiguity in large bibliographic data. First, we increase disambiguation accuracy by using stratified segmentation of entity instances and supervised machine learning trained on automatically labeled data. Second, we demonstrate the value of disambiguated data at scale by examining the involvement of U.S. science & engineering (S&E) researchers in international collaboration and citation networks across the entire Web of Science corpus. We propose counterfactual analyses and impact simulations that compare model validity and research findings from the same data disambiguated using different methods. The proposed approach to disambiguating names and estimating the impact of ambiguity will contribute to sociology and management research that uses ambiguous data to understand what makes scientists and nations innovative and productive, and to computer & information science by improving entity disambiguation and unstructured record linkage. The tools will be shared for reuse and improvement by scholars and integrated into a data and code platform open to the research community for rigorous knowledge discovery from promising but messy data on S&E.

National Science Foundation

Funding Period: 9/1/2019 to 8/31/2021
