Honoring Researchers Across MIT

Third annual MIT Prize for Open Data awarded to 10 research projects

MIT Prize for Open Data honorees pose for a group picture

Photo by Bryce Vickmark.

The third annual MIT Prize for Open Data, which included a $2,500 cash prize, was awarded in October. Presented jointly by the MIT Libraries and the School of Science, the prize highlights the value of open data — research data that is openly accessible and reusable — at the Institute. The prize program was launched in 2022, spearheaded by Chris Bourg and Rebecca Saxe, associate dean of the School of Science and the John W. Jarve (1978) Professor of Brain and Cognitive Sciences. It recognizes MIT-affiliated researchers who use or share open data, create infrastructure for open data sharing, or theorize about open data.

“This year, we noted a number of submissions touching on strategic priorities for MIT, like artificial intelligence,” said Bourg. “There were also a number of projects that relate in some way to climate change, democracy, and human health.”

The 2024 awards were presented at a celebratory event held during International Open Access Week. Winners gave five-minute presentations on their projects and the role that open data plays in their research. The program also included remarks from Nergis Mavalvala, dean of the School of Science and the Curtis (1963) and Kathleen Marble Professor of Astrophysics, who noted how open data drives research forward, including her own work detecting gravitational waves. “People who make data usable by others are not celebrated enough,” she said.

Winners were chosen from more than 70 nominees, representing 25 different departments, labs, and centers across the Institute.

  • Awad Abdelhalim, assistant director of research, Urban Mobility and Transit Labs, won for the KhartouMap Initiative, along with collaborators Ilham Ali and Abubakr Ziedan. KhartouMap is the first to fully map Khartoum’s semi-formal public transit system and provide open data on transit routes, usage, and opportunities for improvement.
  • Faisal AlNasser, PhD candidate in Civil and Environmental Engineering, and Dara Entekhabi, Bacardi and Stockholm Water Foundations Professor, were recognized for the DustSCAN Dust Plumes Dataset, the first open-source collection tracking mineral dust plumes using satellite data across the global “Dust Belt.”
  • The team behind Cast Vote Records: A Database of Ballots from the 2020 U.S. Election downloaded publicly available, unstandardized cast vote records from the 2020 U.S. general election, standardized them into a multi-state database, and extensively compared their totals to certified election results.
  • Undergraduate student Lily Chen won for FactPICO, a novel and open benchmark for factuality evaluation of plain language summarization of medical evidence. It includes 345 LLM-generated summaries of randomized controlled trial abstracts, as well as fine-grained medical expert factuality assessments based on a PICO evaluation framework.
  • Also recognized was the team that created a dataset of the Operating Station Heat Rate for 806 Indian Coal Plant Units Using Machine Learning. Considering factors including water stress, coal price, coal age, and power capacity, the group used machine learning to build the station heat rate dataset, which offers more comprehensive coverage than previous databases.
  • Mohamed Elrefaie, graduate student in Mechanical Engineering, Faez Ahmed, d’Arbeloff Career Development Assistant Professor of Mechanical Engineering, and collaborators Angela Dai and Florin Morar won for DrivAerNet. It provides a comprehensive, large-scale multimodal car dataset with high-fidelity CFD simulations and deep learning benchmarks, enabling advanced aerodynamic analysis and design optimization.
  • Hannah Jacobs, PhD candidate in Biology, won for her project, “Widespread naturally variable human exons aid genetic interpretation,” detecting naturally variable human exons in publicly available RNA sequencing data to aid in understanding of health and disease.
  • Members of the MIT BioMicro Center, including Charlie Demurjian, Taisha Joseph, and Director Stuart Levine, were recognized for the Data Management and Analysis Core for the MIT Superfund Research Program. They created infrastructure that handles thousands of datasets to enable effective sharing through open access.
  • Joachim Schaeffer, a visiting graduate student at the MIT Energy Initiative, won for a large lithium-ion battery field dataset. It is the first openly available dataset of batteries that failed in the field and enables further research into battery health monitoring and fault detection, which is important for battery safety.
  • Yosuke Tanigawa, a research scientist in the Computer Science & Artificial Intelligence Lab, developed inclusive polygenic scores, the first methodology applicable to everyone across the continuum of genetic ancestry, for genetic prediction of disease risks.

Learn more about the winning projects, as well as honorable mentions, and see links to all the projects’ research data, at libraries.mit.edu/opendata.