MIT Libraries

Data Management and Publishing

 

File Formats for Long-Term Access

The file format in which you keep your data is a primary factor in whether you and others will be able to use the data in the future.

As technology continually changes, researchers should plan for both hardware and software obsolescence. How will your data be read if the software used to produce them becomes unavailable?

Formats more likely to be accessible in the future are:

  • Non-proprietary
  • Based on an open, documented standard
  • In common use by the research community
  • Encoded in a standard representation (ASCII, Unicode)
  • Unencrypted
  • Uncompressed

Consider migrating your data into a format with the above characteristics, in addition to keeping a copy in the original software format.

Examples of preferred format choices:

  • PDF/A, not Word
  • ASCII, not Excel
  • MPEG-4, not QuickTime
  • TIFF or JPEG2000, not GIF or JPG
  • XML or RDF, not RDBMS
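
As an illustration of the last item, the following is a minimal sketch of exporting one database table to XML, using only the Python standard library. It assumes the data live in a SQLite database and that the columns hold text or numbers; the function name table_to_xml, the element names (table, row, field), and the sample table are hypothetical choices made for this example, not part of any repository's requirements. Other database systems would need their own connection library.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table):
    """Return the rows of `table` as a UTF-8 encoded XML document (bytes).

    Hypothetical helper for illustration; the element layout is an assumption.
    """
    # Table names cannot be bound as SQL parameters, so accept only plain identifiers.
    if not table.isidentifier():
        raise ValueError(f"unsupported table name: {table!r}")

    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]

    root = ET.Element("table", name=table)
    for values in cursor:
        row = ET.SubElement(root, "row")
        for column, value in zip(columns, values):
            # Store the column name as an attribute so any column name is valid XML.
            field = ET.SubElement(row, "field", name=column)
            if value is None:
                field.set("null", "true")  # keep NULL distinct from an empty string
            else:
                field.text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Made-up sample data, used only to show the call.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (id INTEGER, site TEXT, depth_m REAL)")
    conn.execute("INSERT INTO samples VALUES (1, 'Site A', 2.5)")
    conn.execute("INSERT INTO samples VALUES (2, 'Site B', NULL)")
    with open("samples.xml", "wb") as handle:
        handle.write(table_to_xml(conn, "samples"))
```

The resulting file is plain, unencrypted, uncompressed text that can be read without the database software. The original database file would still be kept alongside it, as recommended above.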

For examples of how data archives treat different file formats, see the DSpace Format Support Policy or the UK Data Archive page on data formats and software. Note that not all repositories are able to migrate data files to newer file formats for preservation.

 

This page was last updated on Thursday, 16-Jul-2009 08:02:27 EDT

For advice on a data management project, contact:

data-management@mit.edu

Anne Graham
Civil and Environmental Engineering Librarian

Katherine McNeill
Data Services and Economics Librarian

Amy Stout
Computer Science Librarian

Lisa Sweeney
Head of GIS Services

 
