Collaborative Research: Framework for Integrative Data Equity Systems
Data science has already had a significant impact on science and engineering, and on society at large. Much greater impact is expected in the near future. Often this impact comes from our being able to solve problems we were previously unable to solve, or to solve them faster or more accurately. However, data science can sometimes lead to incorrect, inappropriate, or undesired outcomes, such as the introduction of unintended bias and the leakage of private information. This project will establish the framework for a National Institute that minimizes these bad outcomes through development of a methodology to enable data equity, in terms of what (and who) is included in the datasets used for analysis, which datasets are available for analysis, and who has access to them. Techniques developed in this project will be applied and evaluated on a broad variety of datasets, in multiple application scenarios.
Intellectual Merit: This project addresses several technically challenging problems that are critical to responsibility in integrative data science. These include: (1) To use data from multiple sources, we first have to obtain agreement from those sources to share data. Often, privacy concerns will prevent such sharing. This project develops synthetic data creation methods to overcome these concerns. (2) Data in any source may have skews that render it unsuitable for the intended analysis. Integrating multiple datasets can exacerbate such skews. This project introduces a skew-correction tensor that can be applied broadly to address such problems. (3) To trust analytical results, it is critical to be able to explain the analysis steps performed and the sources of data analyzed. This project develops the necessary provenance methods.
Broader Impact: Irresponsible data science is most likely to harm underrepresented minorities, women, and people from weaker socio-economic groups. This is true even for most science and engineering problems. For example, a transportation engineer's analysis can lead to bus routes that have a profound impact on the economic growth of a neighborhood. This institute will complement other HDR-DIRSE institutes and work with them to achieve more responsible data science. Our work will impact the conduct of data science in mobility, a domain of great social importance. The core technical results will also be applicable in many other domains of science and engineering. The project also supports expertise development, network-building, teaching, outreach, and diversity.
Keywords: data ethics, societal impacts of data science, fairness, diversity, transparency, data protection
National Science Foundation
Funding Period: 9/1/2019 to 8/31/2021