Organizing Your Files
File Version Control
Keeping track of versions of documents and datasets is critical. Strategies include:
- Directory Structure Naming Conventions
- File Naming conventions
Always record every change to a file no matter how small. Discard obsolete versions after making backups.
Directory Structure Naming Conventions
When organizing files, the name of the top-level folder should include the project title, a unique identifier, and the date (year).
The substructure should have a clear, documented naming convention; for example,
each run of an experiment, each version of a dataset, and/or each person in
the group.
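As a rough illustration of the convention above, the following Python sketch builds a top-level folder name from a project title, an identifier, and a year, then creates one subfolder per experiment run. The function names, the underscore separator, the `run-001` pattern, and the sample project are assumptions made for this example, not a prescribed standard; adapt them to whatever convention your group documents.

```python
from pathlib import Path
import re
import tempfile


def slug(text: str) -> str:
    """Lowercase the text and replace runs of non-alphanumerics with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def top_level_name(title: str, identifier: str, year: int) -> str:
    """Combine project title, unique identifier, and year into one folder name."""
    return f"{slug(title)}_{slug(identifier)}_{year}"


def create_project_tree(root: Path, title: str, identifier: str,
                        year: int, runs: int) -> Path:
    """Create the top-level folder and one zero-padded subfolder per run."""
    project = root / top_level_name(title, identifier, year)
    project.mkdir(parents=True, exist_ok=True)
    for n in range(1, runs + 1):
        # Zero padding keeps the folders in order when sorted by name.
        (project / f"run-{n:03d}").mkdir(exist_ok=True)
    return project


if __name__ == "__main__":
    # Hypothetical project, created in a temporary directory for the demo.
    with tempfile.TemporaryDirectory() as tmp:
        project = create_project_tree(
            Path(tmp), "River Sediment Survey", "RSS-01", 2012, runs=3
        )
        print(project.name)
        print(sorted(p.name for p in project.iterdir()))
```

The same pattern could produce one subfolder per dataset version or per group member instead of per run.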
File Naming Conventions
- Reserve the 3-letter file extension for application-specific codes, for example, formats like .wrl, .mov, and .tif.
- Identify the activity or project in the file name.
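One way to apply these two rules consistently is to generate file names rather than type them by hand. The sketch below is only an illustration: `build_file_name`, the underscore separator, and the sample values are hypothetical choices, and it assumes the extension is the one your application expects for the format.

```python
import re


def build_file_name(project: str, activity: str, extension: str) -> str:
    """Return '<project>_<activity>.<ext>', keeping the extension for the format only."""
    ext = extension.lower().lstrip(".")
    if not re.fullmatch(r"[a-z0-9]+", ext):
        raise ValueError(f"unexpected file extension: {extension!r}")

    # Replace spaces and punctuation so the name is safe on most file systems.
    parts = [re.sub(r"[^A-Za-z0-9]+", "-", p).strip("-") for p in (project, activity)]
    if not all(parts):
        raise ValueError("project and activity must each contain letters or digits")

    return "_".join(parts) + "." + ext


if __name__ == "__main__":
    # Hypothetical project and activity names.
    print(build_file_name("Wind Tunnel", "calibration", ".TIF"))
```

Rejecting an empty project or activity keeps every generated name traceable to the work it belongs to.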
File Renaming
Free tools can help you rename files in bulk.
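If no dedicated tool is at hand, a short script can do simple bulk renaming. The following Python sketch, using only the standard library, adds a project prefix to every file in a folder that lacks it. The function names and the prefix are assumptions for illustration. The renames are planned first so you can review them before anything changes on disk.

```python
from pathlib import Path


def plan_renames(folder: Path, prefix: str) -> list[tuple[Path, Path]]:
    """Return (old, new) path pairs for files missing the prefix; renames nothing."""
    plan = []
    for path in sorted(folder.iterdir()):
        if path.is_file() and not path.name.startswith(prefix):
            plan.append((path, path.with_name(prefix + path.name)))
    return plan


def apply_renames(plan: list[tuple[Path, Path]]) -> None:
    """Carry out a reviewed plan, refusing to overwrite an existing file."""
    for old, new in plan:
        if new.exists():
            raise FileExistsError(f"{new} already exists")
        old.rename(new)


if __name__ == "__main__":
    # Dry run on the current directory with a hypothetical prefix.
    for old, new in plan_renames(Path("."), "projectx_"):
        print(f"{old.name} -> {new.name}")
```

Back up the folder before calling `apply_renames`, since a rename is not recorded anywhere unless you log it yourself.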
File Naming Conventions for Specific Disciplines
Many disciplines publish their own file naming recommendations.
Data Identifiers for Sharing Your Data
The information at the beginning of this page will help you organize your
datasets for your own use. But you'll want to consider using a more sophisticated
naming scheme if you want to share or cite your data. You'll want to put your datasets
where other people can access them, and give your datasets identifiers that
can be referenced easily.
Data identifiers must be globally unique and persistent. That is to say, they must not be repeated elsewhere and they must not change over time.
There are many different schemes:
- PURL -- A PURL is a Persistent Uniform Resource Locator. Functionally, a PURL is a URL. However, instead of pointing directly to the location of an Internet resource, a PURL points to an intermediate resolution service. The PURL resolution service associates the PURL with the actual URL and returns that URL to the client.
- DOI -- A DOI (Digital Object Identifier) is a name (not a location) for an entity on digital networks. It provides a system for persistent and actionable identification and interoperable exchange of managed information on digital networks.
- ACCESSION -- Accession numbers used by the National Center for Biotechnology Information (NCBI) are unique and citable.
- InChI -- The IUPAC International Chemical Identifier (InChI™) is a non-proprietary identifier for chemical substances that can be used in printed and electronic data sources, thus enabling easier linking of diverse data compilations.
- URI -- A Uniform Resource Identifier (URI) consists of a string of characters used to identify or name a resource on the Internet. Such identification enables interaction with representations of the resource over a network, typically the World Wide Web, using specific protocols.
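The indirection that makes identifiers such as PURLs persistent can be sketched in a few lines of Python. This is a toy, in-memory model and not the implementation of any real resolution service: the `Resolver` class, its method names, and the example.org addresses are all assumptions for illustration. The point it shows is that the identifier stays fixed while the location it resolves to can be updated.

```python
class Resolver:
    """Toy resolution service: maps a persistent identifier to its current URL."""

    def __init__(self) -> None:
        self._table: dict[str, str] = {}

    def register(self, identifier: str, url: str) -> None:
        """Add a new identifier; reusing one would break uniqueness."""
        if identifier in self._table:
            raise ValueError(f"{identifier} is already registered")
        self._table[identifier] = url

    def relocate(self, identifier: str, new_url: str) -> None:
        """Point an existing identifier at a new location; the identifier is unchanged."""
        if identifier not in self._table:
            raise KeyError(identifier)
        self._table[identifier] = new_url

    def resolve(self, identifier: str) -> str:
        """Return the URL currently associated with the identifier."""
        return self._table[identifier]


if __name__ == "__main__":
    resolver = Resolver()
    # Hypothetical identifier and locations.
    resolver.register("dataset/0001", "https://old-server.example.org/data/0001.csv")
    resolver.relocate("dataset/0001", "https://new-server.example.org/archive/0001.csv")
    print(resolver.resolve("dataset/0001"))
```

Anyone who cited `dataset/0001` before the move still reaches the data afterwards, which is the property a bare URL cannot guarantee.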
For advice on a data management project, contact:
data-management@mit.edu
Courtney Crummett
Bioinformatics and Biosciences Librarian
Anne Graham
Civil and Environmental Engineering, Building Technology Librarian
Katherine McNeill
Social Science Data Services & Economics Librarian
Daniel Sheehan
Senior GIS Specialist
Amy Stout
Electrical Engineering and Computer Science Librarian