Data planning checklist

Evaluate your data needs and make a plan before you begin your research, then revisit it throughout the research life cycle, so that your data stay usable now and remain preserved and accessible in the long term. Use this checklist to evaluate and plan for your data needs.

  1. What type of data will be produced?
    Get a clear picture of what your data will look like. Is it numerical data, image data, text sequences, or modeling data? The type of data will affect your decisions about formats, organization, backups, and more.
  2. How much data will be produced, and at what growth rate?
    Once you know what kind of data you’re producing, you can assess how fast it will grow. How will the data be collected, and for how long? How often will it change? The answers affect how you organize the data and how much versioning you’ll need.
  3. Who will use it now and later? 
    Understanding your audience will help you organize your data and decide where you might share it.
  4. Who controls it? 
    Do you have the right to manage this data, or is that the responsibility of the PI, student, lab, MIT or funding agency?
  5. How long should it be retained?
    3-5 years, 10 years, or forever? Not all data needs to be retained indefinitely. Determine what’s important to keep, and make sure your long-term management and storage plans for those datasets are clear.
  6. Are there tools or software needed to create/process/visualize the data?
  7. Are there any funding or journal requirements for sharing or planning?
    Funding agencies may have a data sharing policy or require a data management plan in the proposal. Many journals require that published articles be accompanied by the underlying research data.
  8. Are you keeping good project and data documentation?
    What directory and file naming convention will be used? What file formats?
  9. What’s your storage and backup strategy?
    What would happen if the data got lost or became unusable? Is the data reproducible?
  10. Will the data be shared? Where?
    Are there special privacy or security requirements (for example, personal or high-security data)? Is there an ontology or other community standard for data sharing or integration?
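To put rough numbers on question 2 (how much data, and at what growth rate), a back-of-the-envelope projection can help you size storage early. The sketch below is a minimal illustration, not a planning tool: it assumes data arrive once per day at a rate that either stays constant or grows by a fixed fraction each day, and that every stored copy (primary plus backups) takes the same space. The function name `projected_storage_gb` and the example figures are hypothetical.

```python
def projected_storage_gb(gb_per_day, days, copies=1, daily_growth=0.0):
    """Rough total storage (GB) needed after `days` of data collection.

    gb_per_day   -- data produced on the first day (your own estimate)
    days         -- how long collection will run
    copies       -- stored copies, e.g. 1 primary + 2 backups = 3
    daily_growth -- fractional increase in the daily rate (0.0 = constant rate)
    """
    total = 0.0
    rate = gb_per_day
    for _ in range(days):
        total += rate             # add this day's output
        rate *= 1 + daily_growth  # production rate for the next day
    return total * copies


# Hypothetical project: 2 GB/day for one year, kept as a primary copy plus two backups.
print(projected_storage_gb(2.0, 365, copies=3))  # 2190.0
```

Multiplying by `copies` at the end keeps the backup question (question 9) visible in the estimate: three copies of the same collection need three times the space.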
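For question 8 (directory and file naming conventions), writing the convention down as code makes it easier to apply consistently and to check existing files against it. The convention used here is purely hypothetical, not a standard: an uppercase project code, the collection date in ISO 8601 form, a short lowercase description, and a zero-padded version number, joined by underscores with no spaces. The names `make_filename`, `is_valid_filename`, and `NAME_PATTERN` are illustrative.

```python
import re
from datetime import date

# Hypothetical convention: PROJECT_YYYY-MM-DD_description_vNN.ext
NAME_PATTERN = re.compile(r"[A-Z0-9]+_\d{4}-\d{2}-\d{2}_[a-z0-9-]+_v\d{2,}\.[A-Za-z0-9]+")


def make_filename(project, collected, description, version, ext):
    """Build a file name that follows the convention above."""
    # Lowercase the description and collapse spaces/punctuation into single hyphens
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{project.upper()}_{collected.isoformat()}_{slug}_v{version:02d}.{ext.lstrip('.')}"


def is_valid_filename(name):
    """Check an existing file name against the convention."""
    return NAME_PATTERN.fullmatch(name) is not None


name = make_filename("abc", date(2024, 3, 5), "Sensor readings (raw)", 1, "csv")
print(name)                                    # ABC_2024-03-05_sensor-readings-raw_v01.csv
print(is_valid_filename(name))                 # True
print(is_valid_filename("final data v2.csv"))  # False
```

Putting the date in year-month-day order means an alphabetical directory listing is also a chronological one, and zero-padding the version number keeps `v02` sorting before `v10`.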