Jason Owen-Smith

Creating a Data Quality Control Framework for Producing New Personnel-Based S&E Indicators

Research Project Description
Jason Owen-Smith, Jinseok Kim

We will develop an Automated and Stratified Entity Disambiguation (ASED) framework to resolve name ambiguity in large bibliographic data. First, we aim to increase disambiguation accuracy by combining stratified segmentation of entity instances with supervised machine learning trained on automatically labeled data. Second, we will demonstrate the value of disambiguated data at scale by examining the involvement of U.S. science & engineering (S&E) researchers in international collaboration and citation networks across the entire Web of Science corpus. We propose counterfactual analyses and impact simulations that compare model validity and research findings obtained from the same data disambiguated with different methods. Our approach to disambiguating names and estimating the impact of ambiguity will contribute to sociology and management research on what makes scientists and nations innovative and productive, where such research must rely on ambiguous data, and to computer & information science research on entity disambiguation and unstructured record linkage. The tools will be shared for reuse and improvement by scholars and integrated into a data and code platform open to the research community for rigorous knowledge discovery from promising but messy S&E data.
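As a rough illustration only, the two ideas named above (segmenting name instances into strata and labeling training pairs automatically) might be sketched in Python as follows. The record fields, the blocking key (surname plus first initial), and the labeling rules (a shared email marks a match, two different full given names mark a non-match) are all assumptions made for this example; the description above does not specify how ASED defines its strata or its labels.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical author-name instances; all fields and values are invented.
records = [
    {"id": 1, "name": "Lee, Min", "email": "m@example.edu"},
    {"id": 2, "name": "Lee, M.", "email": "m@example.edu"},
    {"id": 3, "name": "Lee, Mark", "email": None},
    {"id": 4, "name": "Park, Soo", "email": None},
]


def parse_name(name):
    """Split 'Surname, Given' into lowercase (surname, given) without trailing dots."""
    surname, _, given = name.partition(",")
    return surname.strip().lower(), given.strip().rstrip(".").lower()


def build_blocks(recs):
    """Assumed stratification: group instances by surname and first initial."""
    blocks = defaultdict(list)
    for rec in recs:
        surname, given = parse_name(rec["name"])
        blocks[(surname, given[:1])].append(rec)
    return blocks


def auto_label(block):
    """Assumed labeling rules; pairs that satisfy neither rule stay unlabeled."""
    labels = {}
    for a, b in combinations(block, 2):
        given_a = parse_name(a["name"])[1]
        given_b = parse_name(b["name"])[1]
        if a["email"] and a["email"] == b["email"]:
            labels[(a["id"], b["id"])] = 1  # same email: treat as same person
        elif len(given_a) > 1 and len(given_b) > 1 and given_a != given_b:
            labels[(a["id"], b["id"])] = 0  # different full given names
    return labels


blocks = build_blocks(records)
labels = {}
for block in blocks.values():
    labels.update(auto_label(block))
```

Pairs labeled this way could then serve as training data for a supervised pairwise classifier applied within each block. Pairs are compared only inside a block, which keeps the number of comparisons manageable on a corpus-scale dataset.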

National Science Foundation

Funding Period: 9/1/2019 to 8/31/2021
