MIT Libraries

2022 MIT Prize for Open Data

Photo by Bryce Vickmark.

The MIT School of Science and the MIT Libraries presented the inaugural MIT Prize for Open Data in 2022. The following winners and honorable mentions were selected from more than 70 nominees representing all five schools and several research centers across MIT.

Recipients were honored at the “Open Data @ MIT” event on October 28, 2022, in Hayden Library, featuring remarks from School of Science Dean Nergis Mavalvala and MIT Libraries Director Chris Bourg, award presentations, and short talks by the winners. Read more in MIT News.



  • Yunsie Chung
    Graduate student, Department of Chemical Engineering
    SolProp, the largest open-source dataset with temperature-dependent solubility values of organic compounds
  • Matthew Groh, graduate student, MIT Media Lab; Caleb Harris, MEng, MIT Media Lab; Luis Soenksen, postdoc, MIT; Felix Lau, research engineer, Scale AI; Rachel Han, software engineer, Scale AI; Aerin Kim, manager, Scale AI; Arash Koochek, dermatologist, Banner Health; Omar Badri, dermatologist, Northeast Dermatology Associates
    Fitzpatrick 17k dataset, an open dataset of 16,577 images of skin disease annotated with skin condition and skin tone

  • Tom Pollard, research scientist; Benjamin Moody, programmer analyst; Li-Wei Lehman, research scientist; Brian Gow, technical associate; Chen Xie, engineer; Jesse Raffa, research scientist; Dana Moukheiber, technical associate; Lama Moukheiber; Ken Paik, research scientist; Leo Celi, principal research scientist; Alistair Johnson, research scientist; Roger Mark, Distinguished Professor of Health Sciences and Technology; Laboratory for Computational Physiology, Institute for Medical Engineering & Science
    PhysioNet, a data-sharing platform that supports thousands of clinical and machine-learning research studies each year and allows researchers to share sensitive resources that could not be shared through typical data-sharing platforms

  • Joseph Replogle
    Graduate student, Whitehead Institute
    Genome-wide Perturb-seq dataset, the largest publicly available single-cell transcriptional dataset collected to date
  • Pedro Reynolds-Cuéllar, graduate student, MIT Media Lab/ACT; Diana Duarte, co-founder, Diversa; the Diversa team; and the Retos network of community partners and universities
    Retos, an open-data platform for documenting and sharing local innovations from under-resourced settings, which has also helped match hundreds of university students with challenges posed by rural collectives
  • Maanas Sharma
    Undergraduate student
    States of Emergency, a nationwide project analyzing and grading the responses of prison systems to COVID-19 using data scraped from public databases and manually collected data

  • Djuna von Maydell
    Graduate student, Department of Brain and Cognitive Sciences
    First publicly available dataset of single-cell gene expression from post-mortem human brain tissue of patients who are carriers of APOE4, the major Alzheimer’s disease risk gene

  • Raechel Walker, graduate researcher; Olivia Dias, undergraduate researcher; Zeynep Yalcin, undergraduate researcher; Lina Henriquez, undergraduate researcher; Sophia Brady, undergraduate researcher; Matt Taylor, senior research team; Cynthia Breazeal, director, Personal Robots group, MIT Media Lab
    Data Activism Curriculum for high school students in the Mayor’s Summer Youth Employment Program (Cambridge); through activities that used data science and open data to challenge power inequalities such as racism, students learned to recognize and mitigate systemic inequality and to advocate for people disproportionately affected by it

  • Suyeol Yun
    Graduate student, Department of Political Science
    DeepWTO, a project creating open data for legal natural language processing (NLP) research from World Trade Organization (WTO) cases
  • Jonathan Zheng
    Graduate student, Department of Chemical Engineering
    An open IUPAC dataset of acid dissociation constants (pKa values), the physicochemical properties that govern how acidic a chemical is in solution, transformed from verified data previously locked in print into FAIR (findable, accessible, interoperable, and reusable) data

Honorable Mentions


Committee Co-Chairs

  • Chris Bourg, Director, MIT Libraries
  • Rebecca Saxe, Associate Dean of Science, School of Science (SoS)

Committee Members

  • Michael Bishop, School of Science Events Planner
  • Iain Cheeseman, Herman and Margaret Sokol Professor of Biology, SoS and Whitehead
  • Fotini Christia, Ford International Professor in the Social Sciences, School of Humanities, Arts, and Social Sciences (SHASS) and Institute for Data, Systems, and Society (IDSS)
  • Katharine Dunn, Scholarly Communications Librarian, MIT Libraries
  • Satrajit Ghosh, McGovern Institute, SoS, and Director of Data Models and Integration, ReproNim
  • Nick Lindsay, Director of Journals and Open Access, MIT Press
  • Amy Nurnberger, Program Head, Data Management Services, MIT Libraries
  • Jack Payette, graduate student, Earth and Planetary Sciences, SoS
  • Dave Rand, Erwin H. Schell Professor and Professor of Management Science and Brain and Cognitive Sciences, Sloan School of Management
  • Devavrat Shah, Andrew (1956) and Erna Viterbi Professor of EECS, School of Engineering and IDSS
  • Virginia Spanoudaki, Scientific Director, Preclinical Imaging and Testing facility, Koch Institute, SoS
  • Greg Wagner, Research Scientist, Earth and Planetary Sciences, SoS