Jason Owen-Smith

Creating a Data Quality Control Framework for Producing New Personnel-Based S&E Indicators

Research Project Description
Jason Owen-Smith, Jinseok Kim

We will develop an Automated and Stratified Entity Disambiguation (ASED) framework to resolve name ambiguity in large bibliographic data. First, we will increase disambiguation accuracy by using stratified segmentation of entity instances and supervised machine learning trained on automatically labeled data. Second, we will demonstrate the value of disambiguated data at scale by examining the involvement of U.S. science & engineering (S&E) researchers in international collaboration and citation networks across the entire Web of Science corpus. We propose counterfactual analyses and impact simulations that compare model validity and research findings when the same data are disambiguated using different methods. Our approach to disambiguating names and estimating the impact of ambiguity will contribute to sociology and management research on what makes scientists and nations innovative and productive, where conclusions must be drawn from ambiguous data, and to computer and information science research on entity disambiguation and unstructured record linkage. The tools will be shared for reuse and improvement by scholars and integrated into a data and code platform open to the research community for rigorous knowledge discovery from promising but messy data on S&E.
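As a rough illustration of what stratified segmentation and automatic labeling could look like, the sketch below groups author-name instances into strata by surname and first initial, then marks pairs within a stratum as likely matches when they share an email address. This is not the ASED implementation: the record fields (`id`, `name`, `email`), the blocking key, and the shared-email labeling rule are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def block_key(name):
    """Return (surname, first initial) for a 'Surname, Given' string.

    Hypothetical blocking key, chosen only for illustration.
    """
    surname, _, given = name.partition(",")
    return (surname.strip().lower(), given.strip()[:1].lower())


def stratify(records):
    """Group name instances into strata that share a blocking key."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec["name"])].append(rec)
    return dict(blocks)


def auto_label_pairs(block):
    """Label pairs within one stratum without manual annotation.

    Assumed rule: a pair is a positive (1) when both records carry the
    same non-empty email; every other pair is left unlabeled (None).
    """
    labeled = []
    for a, b in combinations(block, 2):
        same_email = bool(a["email"]) and a["email"] == b["email"]
        labeled.append((a["id"], b["id"], 1 if same_email else None))
    return labeled
```

Positive pairs produced this way could serve as training data for a supervised classifier that then scores the unlabeled pairs in each stratum. Restricting comparisons to a stratum keeps the number of pairs manageable on a corpus-sized dataset.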

National Science Foundation

Funding Period: 9/1/2019 to 8/31/2021
