MIT Prize for Open Data
To highlight the value of open data at MIT, and to encourage the next generation of researchers, the MIT School of Science and the MIT Libraries present the MIT Prize for Open Data.
Congratulations to the recipients of the 2025 MIT Prize for Open Data!
The following winners and honorable mentions were selected from more than 60 nominees representing 30 different departments, labs, centers, and institutes across MIT. Join us as we honor them at the Open Data @ MIT event held Oct. 21 at Hayden Library.
Winners
- Lucas Attia, graduate student, Chemical Engineering; Jackson Burns, graduate student, Chemical Engineering; Patrick S. Doyle, Robert T. Haslam (1911) Professor in Chemical Engineering; and William H. Green, Hoyt Hottel Professor in Chemical Engineering
Fastsolv
The team leveraged nearly 50,000 published experiments to develop fastsolv, an open-source deep learning model for organic solubility prediction. Fastsolv is freely available online and has been called by scientific users more than 9,000 times since publication.
- Timur Cinay, graduate student, Earth, Atmospheric, and Planetary Sciences
Galapagos Emissions Monitoring Station
A first-of-its-kind continuous dataset monitoring ocean emissions of the greenhouse gas nitrous oxide, made completely free and openly available to all researchers globally.
- Edgar Costa, research scientist, Mathematics
The L-functions and modular forms database (LMFDB)
The LMFDB is a database of mathematical objects arising in number theory and arithmetic geometry that illustrates some of the mathematical connections predicted by the Langlands program.
- Danika Eamer, postdoctoral Impact Fellow, MIT Climate & Sustainability Consortium
Geospatial Trucking Industry Decarbonization Explorer (Geo-TIDE)
Geo-TIDE is an open data platform that synthesizes fragmented public datasets into more than 400 curated, cloud-hosted geospatial layers for freight decarbonization planning. By making these high-value datasets openly available through Zenodo and Amazon Web Services, and pairing them with open-source code and documented methods, Geo-TIDE enables fleets, policymakers, and researchers to translate complex data into actionable strategies for zero-emission trucking.
- Connor Makowski, research associate, Computational Analytics, Visualization & Education (CAVE) Lab and MIT MicroMasters SCx; Tim Russell, research engineer, MIT CAVE and MIT Humanitarian Supply Chain Lab; Willem Guter, research engineer, MIT CAVE and MIT Intelligent Logistics Systems; Austin Saragih, PhD candidate, MIT Center for Transportation and Logistics; Arne Heinold, Assistant Professor for Transportation, Kühne Logistics University; and Spyridon Lekkakos, Professor of Supply Chain Management, MIT-Zaragoza International Logistics Program
SCGraph
SCGraph is an open-source Python package that transforms scattered open transportation datasets into clean, ready-to-use geographic networks for research and real-world analysis. With over 3.3k monthly downloads and adoption in multiple research projects, it shows how open data can be creatively synthesized into tools with broad impact.
- Nada Tarkhan, graduate student, Architecture; and Paolo Giani, postdoctoral associate, Earth, Atmospheric, and Planetary Sciences
Extreme-Aware Meteorological Years: Open Weather Data for Climate-Resilient Building Simulations
This project introduces open-source Representative and Future Meteorological Years (RMYs and FRMYs), novel weather file formats that embed extreme events into building simulation workflows using anomaly detection and climate model emulators. Designed for global scalability and resilience planning, they enable realistic assessments of overheating, peak loads, and future risk across diverse locations.
- Jonathan Zheng, graduate student, Chemical Engineering; Ivo Leito, professor of analytical chemistry, University of Tartu, Estonia; and William H. Green, Hoyt Hottel Professor in Chemical Engineering
Widespread misinterpretation of pKa terminology for zwitterionic compounds and its consequences
Due to an unfortunate misinterpretation of chemical data, ChEMBL, a widely used biochemical dataset, contains many incorrect values, which negatively affects applications including drug design and organic chemistry. This work explains the reasons for the error, examines the downstream repercussions, and makes recommendations for data curation to avoid these issues in the future.
Honorable Mentions
- Jeroen Audenaert
Multimodal Universe: Enabling Large-Scale Machine Learning with 100TBs of Astronomical Scientific Data
- CAVE App team: Matthias Winkenbach, Tim Russell, Connor Makowski, Luis Vazquez, Willem Guter, Alice Zhao, Ella Wang
CAVE App
- Yu-Chen (Janice) Chen
Reviving ALEPH: Modern, Validated Open Data from CERN’s LEP for New QCD Tests
- Evan Collins
LNPDB (Lipid Nanoparticle Database)
- Matteo Di Bernard
Brieflow: An Integrated Computational Pipeline for High-Throughput Analysis of Optical Pooled Screening Data
- Lelia Hampton
Targeted urban afforestation can substantially reduce income-based heat disparities in U.S. cities (Zenodo; GitHub)
- Margaret Hughes, Cassandra Overney
Voice to Vision
- Sarah Mokhtar, Caitlin Mueller
PRISM: A Multi-modal Dataset for Learning-based Building Performance Modeling
- William Parker
Space Debris as a Sensor for Earth’s Upper Atmosphere
- Ci Xue
The GOTHAM Project: Open-Sourcing Interstellar Chemistry (Observation dataset; molecular census dataset)
2025 Committee
Committee Co-Chairs
- Chris Bourg, Director, MIT Libraries
- Rebecca Saxe, Associate Dean of Science, School of Science (SoS)
Committee Members
- Awad Abdelhamid, assistant director of research, Urban Mobility and Transit Labs
- Paul Berube, research scientist, Civil and Environmental Engineering
- Jerik Cruz, graduate student, Political Science
- Yifu Ding, postdoctoral research associate, MIT Energy Initiative
- Steve Flavell, Associate Professor, Picower Institute for Learning & Memory and Department of Brain and Cognitive Sciences
- Satrajit Ghosh, Director of the Open Data in Neuroscience Initiative, McGovern Institute, and Director of Data Models and Integration, ReproNim
- Rafael Jaramillo, Thomas Lord Career Development Professor, Associate Professor of Materials Science and Engineering
- Stuart Levine, Director, MIT BioMicro Center
- Peace Ossom, Director of Research Data Services, MIT Libraries
- Tom Pollard, research scientist, Laboratory for Computational Physiology
- Sadie Roosa, Collections Strategist for Repository Services, MIT Libraries
- Virginia Spanoudaki, Scientific Director, Preclinical Imaging and Testing, Koch Institute
Co-sponsored by the MIT School of Science and MIT Libraries