Peter Fisher, Thomas A. Frank (1977) Professor of Physics, led MIT’s Department of Physics for eight years before serving as the inaugural director of the Office of Research Computing and Data (ORCD), which launched in 2022. Bibliotech asked him to share his perspective on meeting MIT’s needs, the challenges of sharing research data, and how the Libraries can play a role.
ORCD launched last year to provide resources across MIT. What are the Institute’s most critical needs when it comes to research computing and data?
The Institute has taken the posture that in the future, everybody will need to use computers extensively for their research and their teaching. We handle the research part of that, and starting the Schwarzman College was also part of it. The critical need is to provide shared resources that meet MIT researchers' needs for computing and data.
The thing that's really grown as fast as, if not faster than, the computing part is the data storage part. This is where there's a lot of coupling with the Libraries, because libraries are all about storing and curating data. Also, along with data come a lot of restrictions on how data is stored. People use patient records, medical data, student records, and things like that in their research, and you have to store that differently than data from a particle physics experiment or an electronics lab.
An urgent need that we're working on now is a flexible, secure data storage system that we're calling the "ORCD Enclave." The other major thing is that our computing infrastructure needs renewal. So those are our two most critical needs.
ORCD has talked about the MIT Libraries as a key partner in this work. What expertise do the Libraries bring to this collaboration that you think is crucial?
This really touches on data a lot, because our researchers frequently acquire or buy data. And data comes with data use agreements, which set forth how you’ll use the data in research, how you’ll store it, how you’ll ensure its security. And the Libraries have a huge amount of expertise in formulating and implementing data use agreements. Early on, we identified that as someplace where we’re going to need a lot of help, and the Libraries are, in fact, eager to help with this. [Digital repository] DSpace@MIT is a really good model, because that’s a place where there’s an enormous amount of heterogeneous information stored. So there’s kind of a porous line between the Libraries and our shared storage systems.
From your perspective as a researcher and as the head of ORCD, what value do you see in making research data publicly accessible?
There is, of course, the moral good. It's always good to share. People at other institutions, particularly institutions that are not in a position to generate data themselves, can be part of the research enterprise. There's also the legal obligation that we now have to make research data publicly available. We at ORCD have it easy, because we provide the technological base, but the people who have to store the data have to do it in a way that's actually useful. It does no good to hand someone a big data file without giving them tools to access that data, sort it, and extract parts of it.
There's enormous value in making data available; the question is, how do you provide publicly available data that preserves people's privacy and has security around it to people who want access but may be in another country with a very different ability to access it? There are a lot of conflicting requirements for data storage that don't exist for computing; you're not sharing computing around the world the way you share data.
What have you learned about the MIT Libraries through this new partnership with ORCD?
I think the Libraries are kind of a crown jewel at MIT, and I'd like to see them honored more. I've always been fascinated with DSpace; you can go in there and poke around and find all kinds of things. The Libraries seem to be forward-leaning about digital curation, and that's something that I strongly support. ORCD is just a year old, and we're still finding our way, but I do want to find more and better ways to partner with the Libraries on data.