MIT Libraries

File formats for long-term access

As technology changes, researchers should plan for both hardware and software obsolescence and consider the longevity of their file format choices to ensure long-term readability and access.

File formats more likely to be accessible in the future have the following characteristics:

  • Non-proprietary
  • Open, documented standard
  • Common usage by research community
  • Standard representation (ASCII, Unicode)
  • Unencrypted
  • Uncompressed
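
As an illustration of the "standard representation" characteristic, here is a minimal Python sketch that checks whether a file's bytes decode cleanly as ASCII or UTF-8. The function name `detect_standard_encoding` is hypothetical, and the check is only a heuristic: some binary files can also happen to decode as valid UTF-8.

```python
from typing import Optional


def detect_standard_encoding(data: bytes) -> Optional[str]:
    """Return 'ascii' or 'utf-8' if the bytes decode cleanly, else None."""
    # Try the stricter encoding first: every ASCII file is also valid UTF-8.
    for encoding in ("ascii", "utf-8"):
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return None
```

To check a file on disk, read it in binary mode (`open(path, "rb").read()`) and pass the result to the function. A return value of `None` suggests the file is binary or uses a different encoding.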

Examples of preferred file format choices include:

  • ODF or LaTeX or TXT, not Word
  • ASCII, not Excel
  • MPEG-4, not QuickTime
  • TIFF or JPEG2000, not GIF or JPG
  • XML or RDF, not RDBMS

Consider migrating your data into a format with the above characteristics, in addition to keeping a copy in the original software format. Note that in some cases, migrating data to an open format may cause loss of data or metadata.
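
One way such a migration might look, for the "XML or RDF, not RDBMS" case, is sketched below in Python using only the standard library. It assumes the data lives in a SQLite database. The function name `table_to_xml` and the element names (`table`, `row`, `field`) are hypothetical choices, not a standard schema. The sketch also shows how loss can occur: column types are not recorded, and every value is written as text.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn: sqlite3.Connection, table: str) -> ET.Element:
    """Export every row of one SQLite table to a simple XML tree."""
    # The table name is interpolated into SQL, so accept only known tables.
    known = {
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    }
    if table not in known:
        raise ValueError(f"unknown table: {table!r}")

    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]

    root = ET.Element("table", name=table)
    for row in cursor:
        row_element = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            field = ET.SubElement(row_element, "field", name=column)
            if value is None:
                # Mark NULLs explicitly so they differ from empty strings.
                field.set("null", "true")
            else:
                # Values become text; the original column type is not kept.
                field.text = str(value)
    return root
```

Calling `ET.ElementTree(table_to_xml(conn, "samples")).write("samples.xml", encoding="utf-8", xml_declaration=True)` would write the result to a file, where `samples` is a placeholder table name. The original database file should be kept alongside the export.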

If you deposit your data in a repository, your files may be migrated to newer formats so that they remain usable by future researchers.