Celebrating Open Data

New prize program recognizes MIT researchers who make data openly accessible and reusable

Open Data Prize winners stand as a group with their arms around each other

Photo by Bryce Vickmark.

The inaugural MIT Prize for Open Data, which came with a $2,500 cash award, went to 10 individual and group research projects last fall. Presented jointly by the School of Science and the MIT Libraries, the prize recognizes MIT-affiliated researchers who make their data openly accessible and reusable by others. The prize winners and 16 honorable mention recipients were honored at an event held October 28 at Hayden Library.

“By launching an MIT-wide prize and event, we aimed to create visibility for the scholars who create, use, and advocate for open data,” says Rebecca Saxe, associate dean of the School of Science and John W. Jarve (1978) Professor of Brain and Cognitive Sciences. “Highlighting this research and creating opportunities for networking would also help open-data advocates across campus find each other.”

Winners and honorable mentions were chosen from more than 70 nominees, representing all five schools, the MIT Schwarzman College of Computing, and several research centers across MIT. A committee composed of faculty, staff, and a graduate student made the selections.

2022 Winners, MIT Prize for Open Data

Yunsie Chung, graduate student in the Department of Chemical Engineering, won for SolProp, the largest open-source dataset with temperature-dependent solubility values of organic compounds.

Matthew Groh, graduate student, MIT Media Lab, accepted on behalf of the team behind the Fitzpatrick 17k dataset, an open dataset of nearly 17,000 images of skin disease, each annotated with skin disease and skin tone labels.

Tom Pollard, research scientist at the Institute for Medical Engineering and Science, accepted on behalf of the PhysioNet team. This data-sharing platform enables thousands of clinical and machine-learning research studies each year and allows researchers to share sensitive resources that could not be shared through typical data-sharing platforms.

Joseph Replogle, graduate student with the Whitehead Institute for Biomedical Research, was recognized for the Genome-wide Perturb-seq dataset, the largest publicly available single-cell transcriptional dataset collected to date.

Pedro Reynolds-Cuéllar, graduate student with the MIT Media Lab/Art, Culture, and Technology, and Diana Duarte, co-founder at Diversa, won for Retos, an open data platform for detailed documentation and sharing of local innovations from under-resourced settings.

Maanas Sharma, an undergraduate student, led States of Emergency, a nationwide project analyzing and grading the responses of prison systems to Covid-19 using data scraped from public databases and manually collected data.

Djuna von Maydell, graduate student in the Department of Brain and Cognitive Sciences, created the first publicly available dataset of single-cell gene expression from postmortem human brain tissue of patients who are carriers of APOE4, the major Alzheimer’s disease risk gene.

Raechel Walker, graduate researcher in the MIT Media Lab, and her collaborators created a Data Activism Curriculum for high school students through the Mayor’s Summer Youth Employment Program in Cambridge, Massachusetts. Students learned how to use data science to recognize and mitigate systemic inequality and to advocate for the people it disproportionately affects.

Suyeol Yun, graduate student in the Department of Political Science, was recognized for DeepWTO, a project creating open data for use in legal natural language processing research using cases from the World Trade Organization.

Jonathan Zheng, graduate student in the Department of Chemical Engineering, won for an open IUPAC dataset for acid dissociation constants, or “pKas,” physicochemical properties that govern how acidic a chemical is in a solution.