
Photo by Bryce Vickmark.
The fourth annual MIT Prize for Open Data, which included a $2,500 cash prize, was recently awarded to seven individual and group research projects. Presented jointly by the School of Science and the MIT Libraries, the prize highlights the value of open data — research data that is openly accessible and reusable — at the Institute.
The prize winners and 10 honorable mention recipients were honored at the Open Data @ MIT event held Oct. 21 at Hayden Library. MIT President Sally Kornbluth opened the event by offering her congratulations to the winners, noting their creativity and determination and the wide range of projects being celebrated.
“I was excited to see how many of these projects support priorities we’ve identified for MIT, high-impact areas where we have a responsibility to harness our collective efforts and make a real difference – from climate and health care to generative AI and manufacturing,” said Kornbluth.

The event also featured remarks by Rebecca Saxe, associate dean of the School of Science and the John W. Jarve (1978) Professor of Brain and Cognitive Sciences, who co-chairs the prize committee. Saxe noted the continued high caliber of nominations in the fourth year of the program. Winners were chosen from more than 65 nominees, representing 30 different academic departments, labs, centers, and institutes.
The MIT Prize for Open Data was launched in 2022, spearheaded by Saxe and Chris Bourg, director of MIT Libraries. It recognizes MIT-affiliated researchers who use or share open data, create infrastructure for open data sharing, or theorize about open data. Nominations were solicited from across the Institute, with a focus on trainees: undergraduate and graduate students, postdocs, and research staff. A committee composed of faculty, staff, and graduate students made the selections.
The 2025 winners include:
- Lucas Attia, a graduate student in chemical engineering, won for fastsolv, along with collaborators Jackson Burns, graduate student; Patrick S. Doyle, Robert T. Haslam (1911) Professor in Chemical Engineering; and William H. Green, Hoyt Hottel Professor in Chemical Engineering. The team leveraged nearly 50,000 published experiments to develop fastsolv, an open-source deep learning model for organic solubility prediction. Fastsolv is freely available online and has been used by scientists more than 9,000 times since publication.
- Timur Cinay, a graduate student in earth, atmospheric, and planetary sciences, won for the Galapagos Emissions Monitoring Station, a first-of-its-kind continuous dataset that monitors ocean emissions of the greenhouse gas nitrous oxide and is freely and openly available to researchers worldwide.
- Edgar Costa, a research scientist in the Department of Mathematics, was recognized for the L-functions and modular forms database (LMFDB). LMFDB is a database of mathematical objects arising in number theory and arithmetic geometry that illustrates some of the mathematical connections predicted by the Langlands program.
- Danika Eamer, postdoctoral Impact Fellow, MIT Climate & Sustainability Consortium, presented on behalf of the team behind the Geospatial Trucking Industry Decarbonization Explorer (Geo-TIDE). Geo-TIDE is an open data platform that synthesizes fragmented public datasets into more than 400 curated, cloud-hosted geospatial layers for freight decarbonization planning. By making these high-value datasets openly available through Zenodo and Amazon Web Services, and pairing them with open-source code and documented methods, Geo-TIDE enables fleets, policymakers, and researchers to translate complex data into actionable strategies for zero-emission trucking. The team also includes Micah Borrero, PhD student, University of Michigan; Brooke Bao, undergraduate student, Wellesley College/Dartmouth College; Helena De Figueiredo Valente, undergraduate student, MIT; Amber Wu, undergraduate student, Wellesley College; Brilant Kasami, software consultant; and Viktoriia Tkachuk, UX/UI designer.
- Austin Saragih, PhD candidate, MIT Center for Transportation and Logistics, and Willem Guter, research engineer, MIT CAVE and MIT Intelligent Logistics Systems, presented on behalf of the SCGraph team. SCGraph is an open-source Python package that transforms scattered open transportation datasets into clean, ready-to-use geographic networks for research and real-world analysis. With more than 3,300 monthly downloads and adoption in multiple research projects, it shows how open data can be creatively synthesized into tools with broad impact. The team also includes Connor Makowski, research associate, CAVE Lab and MIT MicroMasters SCx; Tim Russell, research engineer, MIT CAVE and MIT Humanitarian Supply Chain Lab; Arne Heinold, Assistant Professor for Transportation, Kühne Logistics University; and Spyridon Lekkakos, Professor of Supply Chain Management, MIT-Zaragoza International Logistics Program.
- Nada Tarkhan, graduate student in architecture, and Paolo Giani, postdoctoral associate, EAPS, won for their project, “Extreme-Aware Meteorological Years: Open Weather Data for Climate-Resilient Building Simulations.” The project introduces open-source Representative and Future Meteorological Years (RMYs and FRMYs), novel weather file formats that embed extreme events into building simulation workflows using anomaly detection and climate model emulators. Designed for scalability and resilience planning, these files enable realistic assessments of overheating, peak loads, and future risk across diverse global locations.
- Jonathan Zheng, graduate student, chemical engineering, and collaborators Ivo Leito, professor of analytical chemistry, University of Tartu, Estonia, and William H. Green, Hoyt Hottel Professor in Chemical Engineering, won for their project, “Widespread misinterpretation of pKa terminology for zwitterionic compounds and its consequences.” Due to an unfortunate misinterpretation of chemical data, a widely used biochemical dataset, ChEMBL, contains many incorrect values, negatively affecting its applications, including drug design and organic chemistry. This work explained the reasons for the error, examined the downstream repercussions, and made recommendations for data curation to avoid these issues in the future.
A complete list of winning projects and honorable mentions, including links to the research data, is available on the MIT Libraries’ Open Data website.