MIT Libraries


Documentation & metadata

Documentation, also known as metadata, helps you understand and describe your data, code, and research materials in detail, and also helps other researchers find, use, and properly cite what you’ve done. Metadata is generally created at both the project level and the data/code level. Project-level metadata describes the “who, what, where, when, how and why” of the materials, which provides context for understanding why the materials were collected or created and how they were used. Data/code-level metadata describes the attributes of the dataset, code, software, model, etc. at a granular level (e.g., data standards).

It is good practice to use an established metadata standard when possible, preferably one recognized within your discipline. Metadata standards are predefined guidelines that dictate the structure and format of metadata, which promotes consistency, interoperability, and effective data management. Various standards are available for particular file formats and disciplines, as cataloged in the Metadata Standards Catalog and FAIRsharing.org.

General guidelines are provided below. For more details, see materials from our workshop on file organization. For help in documenting your data, email data-management@mit.edu.

Important things to do while you collect or create your data

  • Document while you work by using an electronic lab notebook (ELN) and README files.
  • Make a note of all file names and formats associated with the project, how the data is organized, how the data was generated (including any equipment or software used), and information about how the data has been altered or processed.
  • Include an explanation of codes, abbreviations, or variables used in the data or in the file naming structure.
  • Keep notes about where you got the data so that you and others can find it.
  • If you know the repository where you are planning to deposit, check their metadata requirements.
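
The file inventory and glossary described in the list above can be partly automated. The following is a minimal Python sketch, not a prescribed tool: the function names (build_inventory, render_readme), the README layout, and the choice to record relative path, extension, and size are all assumptions made for illustration. It lists every file under a project folder and renders a plain-text README body that also explains the codes and abbreviations you supply.

```python
from pathlib import Path


def build_inventory(root):
    """Return one record per file under root: relative path, format, size in bytes."""
    root = Path(root)
    records = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            records.append({
                "file": path.relative_to(root).as_posix(),
                # Use the file extension as a rough stand-in for the format.
                "format": path.suffix.lstrip(".").lower() or "unknown",
                "bytes": path.stat().st_size,
            })
    return records


def render_readme(title, inventory, codes):
    """Render a plain-text README body from the inventory and a glossary of codes."""
    lines = [title, "", "Files"]
    for rec in inventory:
        lines.append(f"  {rec['file']} ({rec['format']}, {rec['bytes']} bytes)")
    lines += ["", "Codes and abbreviations"]
    for code, meaning in sorted(codes.items()):
        lines.append(f"  {code}: {meaning}")
    return "\n".join(lines) + "\n"
```

A typical use would be to call render_readme("My project", build_inventory("path/to/project"), {"NA": "not measured"}) and save the result as README.txt, then add by hand the parts a script cannot know, such as how the data was generated and processed.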

Things to document about your data and research materials

Title
Name of the dataset, code, or research project that produced it

Creator
Names of the organization or people who created the materials and contact information (e.g., physical addresses [for organizations], email addresses, websites, contact webforms)

Identifier
Unique identifier for the data. Preferably, this is a unique persistent identifier (e.g., DOI, handle), but an internal project reference number is better than nothing!

Citation
How the material should be cited.

Dates
Key dates associated with the data, including project start and end dates, data modification date, data release date, and time period covered by the data

Abstract / Subject
Brief summary of the dataset, providing an overview of its contents, purpose, and how it was collected. Keywords or phrases describing the subject or content of the data.

Funders / Project sponsors
Organizations or agencies who funded the research

Rights / Protections / Licenses
Any known intellectual property rights held for the data. Any protections or conditions that define how a dataset can be used, shared, modified, or redistributed. Legal agreement that defines how others can use the dataset (see https://choosealicense.com).

Language
Language(s) of the intellectual content of the resource, when applicable

Location
Where the data relates to a physical location, record information about its spatial coverage

Methodology

  • Data origin – Whether the data is experimental or observational, raw or derived, and whether it consists of physical collections, models, images, etc.
  • File format – The structured method or convention used to organize, present, and store data.
  • Methods – How the data was generated, including equipment or software used, the experimental protocol, and anything else you might include in a lab notebook.
  • Data standards – Documented agreements on representation, format, definition, structure, tagging, transmission, manipulation, use, documentation, and management of data. If using a well-documented data standard, include links to supporting documentation (see FAIRsharing.org). Standards may include controlled vocabularies, terminologies, and ontologies; common data elements (CDEs); or common data models (CDMs).
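
One way to keep track of the items above is to hold them in a single machine-readable record and check which ones are still undocumented. The Python sketch below is illustrative only: the lowercase key names, the missing_fields helper, the ISO 8601 date strings, and every value in the example record are placeholders chosen for this sketch, not part of any metadata standard. If your discipline or repository has a standard, use its field names instead.

```python
import json

# Keys mirror the headings above; the lowercase spelling is this sketch's own convention.
FIELDS = ["title", "creator", "identifier", "citation", "dates", "abstract",
          "subject", "funders", "rights", "language", "location", "methodology"]


def missing_fields(record):
    """Return the listed fields that are absent or empty in record, in list order."""
    return [field for field in FIELDS if not record.get(field)]


# Placeholder values only; replace them with your project's details.
record = {
    "title": "Example sensor readings (placeholder)",
    "creator": [{"name": "Example Lab", "email": "lab@example.org"}],
    "identifier": "internal-ref-0001",  # a DOI or handle is preferable when available
    "dates": {"project_start": "2024-01-15", "project_end": "2024-06-30"},
    "language": ["en"],
    "methodology": {"file_format": "CSV", "data_origin": "observational"},
}

print(json.dumps(record, indent=2))
print("Still to document:", missing_fields(record))
```

Saving the JSON output next to the data gives you and others a single place to look, and the list of missing fields serves as a reminder of what to fill in before deposit.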
