Evaluate Your Data Needs
What type of data will be produced?
Gather a clear picture of what your data will look like. Is it, for example, numerical data, image data, text sequences or modeling data? Knowing exactly what kind of data you have will inform many of the decisions you need to make about storage, backups and more. Image data, for instance, requires a lot of storage space, so you'll want to decide which of your images, if not all, to retain and where such large datasets can be housed. Your research center or university may be able to help you back up your data. However, if you are storing images, you may quickly exceed your institution's backup limit for individual laboratories or groups.
How much of it, and at what growth rate?
Once you know what kind of data you're producing, you'll be able to assess the growth rate. For example, are you gathering data by hand or using sophisticated instrumentation that is able to capture a lot of data at once? Will there be more data as time goes on? If so, you will need to plan for the growth. What amounts to enough storage this year may not be sufficient for next year.
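To make the growth question concrete, here is a minimal sketch of a storage projection. It assumes data volume grows by a fixed percentage each year (compound growth), which is only one possible model. The function names and all figures (2 TB today, 50% annual growth, a 5 TB allocation) are hypothetical placeholders, so substitute your own estimates.

```python
def project_storage(current_tb, annual_growth_rate, years):
    """Projected storage need in TB for year 0 through `years`.

    Assumes compound growth: each year's volume is the previous
    year's volume multiplied by (1 + annual_growth_rate).
    """
    return [current_tb * (1 + annual_growth_rate) ** y for y in range(years + 1)]


def first_year_exceeding(current_tb, annual_growth_rate, capacity_tb, horizon=50):
    """First year in which the projected need exceeds capacity_tb.

    Returns None if capacity is not exceeded within `horizon` years.
    """
    projection = project_storage(current_tb, annual_growth_rate, horizon)
    for year, need_tb in enumerate(projection):
        if need_tb > capacity_tb:
            return year
    return None


# Hypothetical example: 2 TB today, growing 50% per year, 5 TB allocated.
for year, need_tb in enumerate(project_storage(2.0, 0.5, 3)):
    print(f"Year {year}: {need_tb:.2f} TB")
print("Allocation exceeded in year:", first_year_exceeding(2.0, 0.5, 5.0))
```

With these placeholder figures, the 5 TB allocation is outgrown in year 3, which illustrates why storage that is sufficient this year may not be sufficient a few years later.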
Will it change frequently?
The answer to this question impacts how you organize the data as well as the level of versioning you will need to undertake. Keeping track of rapidly changing datasets can be a challenge, so it's imperative you begin with a plan that will carry you through the data management process.
Who is it for?
Who is your audience for the data? How will they use the data? The answer to this question will tell you how to structure the data and where to distribute it, among other things.
Who controls it (PI, student, lab, MIT, funder)?
Before you spend a lot of time figuring out how you're going to store the data, name it, and so on, you need to know whether you have the authority to control it.
How long should it be retained? (e.g. 3-5 years, 10-20 years, permanently)
Not all data needs to be retained indefinitely. Figure out what's important to keep and make sure your plan for those datasets is solid.