Congratulations to the inaugural winners of the MIT Prize for Open Data!
The MIT School of Science and the MIT Libraries present the inaugural MIT Prize for Open Data to highlight the value of open data at MIT and to encourage the next generation of researchers. The following winners and honorable mentions were selected from more than 70 nominees representing all five schools and several research centers across MIT. 


Open Data @ MIT Event
An event was held on October 28 in Hayden Library to highlight the value of open data at MIT and to celebrate the winners of the inaugural MIT Prize for Open Data. The program featured remarks from School of Science Dean Nergis Mavalvala and MIT Libraries Director Chris Bourg, award presentations, and short talks from prize winners. The event also included a reception, featuring refreshments and an opportunity to meet the recipients and other open data advocates and practitioners from across campus. Read more in MIT News.



  • Yunsie Chung
    Graduate student, Department of Chemical Engineering
    SolProp, the largest open-source dataset with temperature-dependent solubility values of organic compounds
  • Matthew Groh, graduate student, MIT Media Lab; Caleb Harris, MEng, MIT Media Lab; Luis Soenksen, postdoc, MIT; Felix Lau, research engineer, Scale AI; Rachel Han, software engineer, Scale AI; Aerin Kim, manager, Scale AI; Arash Koochek, dermatologist, Banner Health; Omar Badri, dermatologist, Northeast Dermatology Associates
    Fitzpatrick 17k dataset, an open dataset consisting of 16,577 images of skin disease alongside skin disease and skin tone annotations

  • Tom Pollard, research scientist; Benjamin Moody, programmer analyst; Li-Wei Lehman, research scientist; Brian Gow, technical associate; Chen Xie, engineer; Dana Moukheiber, technical associate; Lama Moukheiber; Ken Paik, research scientist; Leo Celi, principal research scientist; Alistair Johnson, research scientist; Roger Mark, Distinguished Professor of Health Sciences and Technology; Laboratory for Computational Physiology, Institute for Medical Engineering & Science
    PhysioNet, a data-sharing platform that enables thousands of clinical and machine-learning research studies each year and allows researchers to share sensitive resources that could not be shared through typical data-sharing platforms

  • Joseph Replogle
    Graduate student, Whitehead Institute
    Genome-wide Perturb-seq dataset, the largest publicly available, single-cell transcriptional dataset collected to date
  • Pedro Reynolds-Cuéllar, graduate student, MIT Media Lab/ACT; Diana Duarte, co-founder at Diversa; the Diversa team; and the Retos network of community partners and universities
    Retos, an open-data platform for detailed documentation and sharing of local innovations from under-resourced settings, which also helps match hundreds of university students with challenges from rural collectives
  • Maanas Sharma
    Undergraduate student
    States of Emergency, a nationwide project analyzing and grading the responses of prison systems to COVID-19 using data scraped from public databases and manually collected data

  • Djuna von Maydell
    Graduate student, Department of Brain and Cognitive Sciences
    First publicly available dataset of single-cell gene expression from post-mortem human brain tissue of patients who are carriers of APOE4, the major Alzheimer’s disease risk gene

  • Raechel Walker, graduate researcher; Olivia Dias, undergraduate researcher; Zeynep Yalcin, undergraduate researcher; Lina Henriquez, undergraduate researcher; Sophia Brady, undergraduate researcher; Matt Taylor, senior research team; Cynthia Breazeal, director, Personal Robots group, MIT Media Lab
    Data Activism Curriculum for high school students through the Mayor’s Summer Youth Employment Program (Cambridge). Activities involved using data science and open data to challenge power inequalities, such as racism, and students learned how to use data science to recognize, mitigate, and advocate for people who are disproportionately impacted by systemic inequality

  • Suyeol Yun
    Graduate student, Department of Political Science
    DeepWTO, a project creating open data for use in legal NLP (Natural Language Processing) research using cases from the World Trade Organization (WTO)
  • Jonathan Zheng
    Graduate student, Department of Chemical Engineering
    An open IUPAC dataset for acid dissociation constants, or “pKas,” physicochemical properties that govern how acidic a chemical is in a solution, transformed into FAIR (findable, accessible, interoperable and reusable) data from verified data locked in print

Honorable Mentions

