Happy World Digital Preservation Day!

This World Digital Preservation Day, MIT Libraries is looking back at some of the progress we've made since launching our Comprehensive Digital Preservation Service in 2020. One major update was adding new preservation storage locations so that we can preserve digitized content in addition to our born-digital collections.

The work of our imaging, digital archiving, and digital preservation teams overlapped, especially on another workflow improvement: transcribing and captioning our digitized audio and video content. We began with a pilot program to build a transcription and captioning phase into our digitization workflow. The pilot had three aims: to comply with regulations, to meet our goals for making content more accessible, and to confirm that we can sustain this ongoing commitment with our current resources.

With the pilot completed and the workflows implemented, the imaging workflow was producing more complex digital assets for us to manage. We decided to preserve transcripts and captions, and make them accessible, as files separate from the audio or video itself (sidecar files), which meant incorporating those sidecar files into the preservation package structure. We therefore had to decide how to arrange and describe the AV, text, and other files that all belong to the same asset. In the end, we chose to store the transcript and caption files as metadata files. This will make them easier to edit in our preservation system in the future while still preserving them alongside the AV files.
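The arrangement described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the preservation system's actual behavior: the `objects/` and `metadata/` folder names, the extension lists, the `plan_package` function, and the rule that a sidecar belongs to the AV file sharing its file-name stem are all hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical file-type rules; a real workflow would follow its own format policy.
AV_EXTENSIONS = {".wav", ".mp4", ".mov", ".mkv"}
SIDECAR_EXTENSIONS = {".vtt", ".srt", ".txt"}


def plan_package(asset_id, filenames):
    """Plan a package for one asset (hypothetical layout).

    AV files go under objects/; transcript and caption sidecars go under
    metadata/. Each sidecar is linked to the AV file that shares its stem,
    so the relationship is recorded in the package rather than left to memory.
    """
    layout = {}
    av_by_stem = {}
    sidecars = []
    for name in filenames:
        path = PurePosixPath(name)
        suffix = path.suffix.lower()
        if suffix in AV_EXTENSIONS:
            layout[name] = str(PurePosixPath(asset_id, "objects", name))
            av_by_stem[path.stem] = name
        elif suffix in SIDECAR_EXTENSIONS:
            layout[name] = str(PurePosixPath(asset_id, "metadata", name))
            sidecars.append(path)
        else:
            raise ValueError(f"unrecognized file type: {name}")

    # Record which AV file each sidecar describes (None if there is no match).
    relationships = {str(s): av_by_stem.get(s.stem) for s in sidecars}
    return layout, relationships


layout, relationships = plan_package(
    "asset-0001", ["interview.wav", "interview.vtt", "interview.txt"]
)
for source, target in layout.items():
    print(f"{source} -> {target}")
print(relationships)
```

Here the audio file lands in `asset-0001/objects/`, while the WebVTT captions and plain-text transcript land in `asset-0001/metadata/`, each mapped back to `interview.wav`. Keeping that mapping explicit is what lets someone retrieving the asset later see how the files relate without prior knowledge of the workflow.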

As we lay the foundation for larger AV digitization projects to preserve at-risk media, we are looking into other ways to improve our captions and transcripts. We're reviewing guidelines such as those for Embedded Metadata in WebVTT Files, and we're working out how to caption and transcribe recordings that switch between multiple languages.

Making careful decisions about how to structure our preservation packages not only makes our audio and video files accessible to a wider audience today, but also means that our service will be able to provide more accessible dissemination packages to users in the future. A future staff member won't need to know in advance that we store the transcripts and captions alongside the AV file in preservation storage. Anyone retrieving the asset from the preservation service will receive all of the files that make up the asset, along with the information needed to understand the relationship between the AV files and their transcripts and captions.