Documentation and Metadata
For your data to be used properly by you, your colleagues, and other
researchers in the future, it must be documented. Data documentation
(also known as metadata) enables you to understand your data in detail
and enables other researchers to find, use, and properly cite your data.
It is critical to start documenting your data at the very beginning
of your research project, even before data collection begins. Doing so
makes documentation easier and reduces the likelihood that you will
forget aspects of your data later in the project.
Researchers can choose among various metadata standards, often tailored
to a particular file format or discipline. One such standard is DDI (the
Data Documentation Initiative), designed to document numeric data files.
For further help in documenting your data, contact data-management@mit.edu
or the MIT Libraries' Metadata Services Unit.
The following are general guidelines for the aspects of your project
and data that you should document, regardless of your discipline. At a
minimum, store this documentation in a readme.txt file or the equivalent,
together with the data. You can also reference a published article that
contains some of this information.
| Element | What to document |
|---|---|
| Title | Name of the dataset or of the research project that produced it |
| Creator | Names and addresses of the organizations or people who created the data |
| Identifier | Number used to identify the data, even if it is just an internal project reference number |
| Subject | Keywords or phrases describing the subject or content of the data |
| Funders | Organizations or agencies that funded the research |
| Rights | Any known intellectual property rights held for the data |
| Access information | Where and how other researchers can access your data |
| Language | Language(s) of the intellectual content of the resource, when applicable |
| Dates | Key dates associated with the data, including project start and end dates; release date; time period covered by the data; and other dates associated with the data lifespan, e.g., maintenance cycle, update schedule |
| Location | Spatial coverage of the data, where the data relates to a physical location |
| Methodology | How the data was generated, including equipment or software used, experimental protocol, and other details you might record in a lab notebook |
| Data processing | Any information on how the data has been altered or processed, recorded as you go |
| Sources | Citations to the material from which the data was derived, including details of where the source data is held and how it was accessed |
| List of file names | All data files associated with the project, with their names and file extensions (e.g., 'NWPalaceTR.WRL', 'stone.mov') |
| File formats | Format(s) of the data, e.g., FITS, SPSS, HTML, JPEG, and any software required to read the data |
| File structure | Organization of the data file(s) and the layout of the variables, when applicable |
| Variable list | List of variables in the data files, when applicable |
| Code lists | Explanation of codes or abbreviations used in the file names or in the variables of the data files (e.g., '999 indicates a missing value in the data') |
| Versions | A date/time stamp for each file and a separate ID for each version (see organizing your files) |
| Checksums | Checksums for testing whether your files have changed over time (see backups) |
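The readme.txt and checksum guidance in the table above can be partly automated. The following is a minimal sketch, assuming Python 3.9+ and a single directory that holds all of the project's data files. The function names (`write_readme`, `sha256_of`, `changed_files`), the `[TO BE COMPLETED]` placeholder, the tab-separated file list, and the choice of SHA-256 are illustrative assumptions, not an MIT Libraries tool or a required format. It writes a readme.txt skeleton covering the elements above, lists every data file with a UTC modification timestamp and a checksum, and can later re-hash the files to report any whose contents have changed.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical subset of the documentation elements from the table; values are supplied by you.
FIELDS = ["Title", "Creator", "Identifier", "Subject", "Funders", "Rights",
          "Access information", "Language", "Dates", "Location", "Methodology",
          "Data processing", "Sources", "File formats", "File structure",
          "Variable list", "Code lists"]


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_readme(data_dir: Path, values: dict[str, str]) -> Path:
    """Hypothetical helper: write data_dir/readme.txt with metadata fields, file list, checksums."""
    lines = []
    for field in FIELDS:
        # Unfilled fields stay visible as placeholders instead of being silently omitted.
        lines.append(f"{field}: {values.get(field, '[TO BE COMPLETED]')}")
    lines.append("")
    lines.append("List of file names (modified UTC, SHA-256):")
    readme = data_dir / "readme.txt"
    # Every file under data_dir except the readme itself, in a stable sorted order.
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file() and p != readme):
        modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        rel = path.relative_to(data_dir).as_posix()
        lines.append(f"{rel}\t{modified:%Y-%m-%dT%H:%M:%SZ}\t{sha256_of(path)}")
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme


def changed_files(data_dir: Path) -> list[str]:
    """Hypothetical helper: re-hash each file listed in readme.txt; return missing or changed ones."""
    changed = []
    for line in (data_dir / "readme.txt").read_text(encoding="utf-8").splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # metadata fields, blank lines, and the header have no tab-separated columns
        rel, _modified, recorded = parts
        path = data_dir / rel
        if not path.is_file() or sha256_of(path) != recorded:
            changed.append(rel)
    return changed


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "stone.mov").write_bytes(b"example bytes")
        write_readme(d, {"Title": "Example dataset",
                         "Code lists": "999 indicates a missing value in the data"})
        print(changed_files(d))  # [] : nothing has changed yet
        (d / "stone.mov").write_bytes(b"edited bytes")
        print(changed_files(d))  # ['stone.mov'] : the checksum no longer matches
```

Recording the checksum next to each file name keeps the readme.txt self-describing: anyone holding the data and its readme can verify the files without extra tooling, using any SHA-256 utility.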
For advice on a data management project, contact:
data-management@mit.edu
Courtney Crummett
Bioinformatics and Biosciences Librarian
Anne Graham
Civil and Environmental Engineering, Building Technology Librarian
Katherine McNeill
Social Science Data Services & Economics Librarian
Daniel Sheehan
Senior GIS Specialist
Amy Stout
Electrical Engineering and Computer Science Librarian