{"id":48,"date":"2014-05-29T19:44:13","date_gmt":"2014-05-29T19:44:13","guid":{"rendered":"http:\/\/libraries-dev.mit.edu\/data-management\/?page_id=48"},"modified":"2026-03-12T13:41:40","modified_gmt":"2026-03-12T13:41:40","slug":"documentation","status":"publish","type":"page","link":"https:\/\/libraries.mit.edu\/data-management\/store\/documentation\/","title":{"rendered":"Documentation &amp; metadata"},"content":{"rendered":"<p>Metadata and data documentation help you understand and describe your data, code, and research materials in detail, and also help other researchers find, use, and properly cite what you&#8217;ve done. Documentation is generally created at two levels: the project level and the data\/code level. Documentation at the <b>project level<\/b> describes the \u201cwho, what, where, when, how and why\u201d of the materials, which provides context for understanding why the materials were collected or created and how they were used. Documentation at the <b>data\/code level<\/b> describes the attributes of the dataset, code, software, model, etc. at a granular level (e.g., data standards).<\/p>\n<p>It is good practice to use an established metadata standard when possible, preferably one recognized within your discipline. Metadata standards are predefined guidelines that dictate the structure and format of metadata; by ensuring that data is described consistently, they support interoperability and effective data management. Various\u00a0standards are available for particular file formats and disciplines, as cataloged in the <a href=\"https:\/\/rdamsc.bath.ac.uk\/\">Metadata Standards Catalog<\/a> and <a href=\"http:\/\/fairsharing.org\">FAIRsharing.org<\/a>.<\/p>\n<p>General guidelines are provided below. 
For more details, <a title=\"Workshops\" href=\"\/data-management\/services\/workshops\/\">see materials from our workshop on file organization<\/a>.\u00a0For help in documenting your data, email\u00a0<a href=\"mailto:data-management@mit.edu\">data-management@mit.edu<\/a>.<\/p>\n<h2>Important things to do while you collect or create\u00a0your data<\/h2>\n<ul>\n<li>Document while you work by using an <a href=\"https:\/\/libraries.mit.edu\/data-management\/store\/electronic-lab-notebooks\/\">electronic lab notebook (ELN)<\/a> and readME files.<\/li>\n<li>Record all file names and formats associated with the project, how the data is organized, how it was generated (including any equipment or software used), and how it has been altered or processed.<\/li>\n<li>Include an explanation of codes, abbreviations, or variables used in the data or in the file naming structure.<\/li>\n<li>Keep notes about where you obtained the data so that you and others can find it again.<\/li>\n<li>If you <a href=\"https:\/\/libraries.mit.edu\/data-management\/share\/find-repository\/\">know the repository<\/a> where you plan to deposit your materials, check its metadata requirements.<\/li>\n<\/ul>\n<h2>Things to document about your data and research materials<\/h2>\n<p><strong>Title<br \/>\n<\/strong>Name of the dataset, code, or research project that produced it<\/p>\n<p><strong>Creator<br \/>\n<\/strong>Names of the organizations or people who created the materials, along with contact information (e.g., physical addresses [for organizations], email addresses, websites, contact webforms)<\/p>\n<p><strong>Identifier<br \/>\n<\/strong>Unique identifier for the data. 
Preferably, this is a persistent identifier (e.g., DOI, handle), but an internal project reference number is better than nothing!<\/p>\n<p><strong>Citation<br \/>\n<\/strong>How the material should be cited.<\/p>\n<p><strong>Dates<\/strong><br \/>\nKey dates associated with the data, including project start and end dates, data modification date, data release date, and the time period covered by the data<\/p>\n<p><strong>Abstract \/ Subject<\/strong><br \/>\nBrief summary of the dataset, providing an overview of its contents, its purpose, and how it was collected. Keywords or phrases describing the subject or content of the data.<\/p>\n<p><strong>Funders \/ Project sponsors<br \/>\n<\/strong>Organizations or agencies that funded the research<\/p>\n<p><strong>Rights \/ Protections \/ Licenses<br \/>\n<\/strong>Any known intellectual property rights held for the data; any protections or conditions that define how the dataset can be used, shared, modified, or redistributed; and the license, a legal agreement that defines how others can use the dataset (see <a href=\"https:\/\/choosealicense.com\">https:\/\/choosealicense.com<\/a>).<\/p>\n<p><strong>Language<br \/>\n<\/strong>Language(s) of the intellectual content of the resource, when applicable<\/p>\n<p><strong>Location<br \/>\n<\/strong>Where the data relates to a physical location, record information about its spatial coverage<\/p>\n<p><strong>Methodology<\/strong><\/p>\n<ul>\n<li><b>Data origin<\/b> &#8211; Overview of the kind of data and where it came from (e.g., experimental or observational, raw or derived, physical collections, models, images)<\/li>\n<li><b>File format<\/b> &#8211; The structured method or convention used to organize, present, and store the data.<\/li>\n<li><strong>Methods<\/strong> &#8211; How the data was generated, including equipment or software used, the experimental protocol, and other details you might record in a lab notebook<\/li>\n<li><a href=\"https:\/\/web.archive.org\/web\/20250705130051\/https:\/\/www.nichd.nih.gov\/about\/org\/od\/odss\/data_standards\"><b>Data
standards<\/b><\/a> &#8211; Documented agreements on the representation, format, definition, structure, tagging, transmission, manipulation, use, documentation, and management of data. If you use a well-documented data standard, include links to its supporting documentation (see <a href=\"https:\/\/fairsharing.org\/search?fairsharingRegistry=Standard\">FAIRsharing.org<\/a>). Standards may include controlled vocabularies, terminologies, and ontologies; common data elements (CDEs); or common data models (CDMs).<\/li>\n<\/ul>\n<p><strong>AI use<\/strong><\/p>\n<p>As more LLM\/AI-based tools become available in research and scholarship, it is important to acknowledge and document their use, both as part of an accurate methodology and to support reproducibility. The <a href=\"https:\/\/aidframework.org\/\">Artificial Intelligence Disclosure (AID) &#8211; Statement Builder<\/a> provides a framework to document where and how these tools were used in your research and scholarship.<\/p>\n<h2>Need more help?<\/h2>\n<ul>\n<li>Cornell&#8217;s Research Data Management Service Group has a <a href=\"https:\/\/data.research.cornell.edu\/data-management\/sharing\/readme\/\">Guide to writing &#8220;readme&#8221; style metadata<\/a><\/li>\n<li>The Social Science Data Editors released <a href=\"https:\/\/doi.org\/10.5281\/zenodo.4319999\">A template README for social science replication packages<\/a><\/li>\n<li>Harvard&#8217;s guide on <a href=\"https:\/\/datamanagement.hms.harvard.edu\/collect-analyze\/documentation-metadata\/data-dictionary\">Data Dictionaries<\/a><\/li>\n<li>ICPSR&#8217;s guide on <a href=\"https:\/\/www.icpsr.umich.edu\/sites\/icpsr\/posts\/shared\/what-is-a-codebook\">Codebooks<\/a><\/li>\n<li><a href=\"https:\/\/www.dropbox.com\/scl\/fo\/rsateysruutn8z9qhivkz\/AKXLWhNW80QG3CYPGYi2ZSs?rlkey=fkajluftkzyc8amqj9lkt9apw&amp;st=sz6jzwrc&amp;dl=0\">readME templates<\/a> from our workshop on file organization<\/li>\n<li>Documenting data that you are reusing (secondary 
data) or that you are backing up from some other location? Please see our <a href=\"https:\/\/www.dropbox.com\/scl\/fi\/ij4u7v26s01932pfcd7hr\/Template_SECONDARY_DATASET_Readme.txt?rlkey=cmz5u7e2evdb4nytlsmrrllaq&amp;dl=0\">readME template for secondary data<\/a>.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Metadata and data documentation help you understand and describe your data, code, and research materials in detail, and also help other researchers find, use, and properly cite what you&#8217;ve done. Documentation is generally created at both project- and data\/code-levels. Documentation at the project-level describes the \u201cwho, what, where, when, how and why\u201d of the materials, which provides context for understanding why the materials were collected or created and how they were used. Documentation at the data\/code-level describes the attributes of the dataset, code, software, model, etc. at a granular level (e.g., data standards). It is good practice to use an 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":16,"menu_order":6,"comment_status":"open","ping_status":"open","template":"templates\/page.php","meta":{"footnotes":""},"class_list":["post-48","page","type-page","status-publish","hentry"],"acf":[],"_links":{"self":[{"href":"https:\/\/libraries.mit.edu\/data-management\/wp-json\/wp\/v2\/pages\/48","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/libraries.mit.edu\/data-management\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/libraries.mit.edu\/data-management\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/libraries.mit.edu\/data-management\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/libraries.mit.edu\/data-management\/wp-json\/wp\/v2\/comments?post=48"}],"version-history":[{"count":15,"href":"https:\/\/libraries.mit.edu\/data-management\/wp-json\/wp\/v2\/pages\/48\/revisions"}],"predecessor-version":[{"id":1061,"href":"https:\/\/libraries.mit.edu\/data-management\/wp-json\/wp\/v2\/pages\/48\/revisions\/1061"}],"up":[{"embeddable":true,"href":"https:\/\/libraries.mit.edu\/data-management\/wp-json\/wp\/v2\/pages\/16"}],"wp:attachment":[{"href":"https:\/\/libraries.mit.edu\/data-management\/wp-json\/wp\/v2\/media?parent=48"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}