MIT Libraries

Social Science Data Services
Finding and Managing Data for Research

 
 

Data Research Process

The following outlines the steps of the data research process, questions to ask yourself at different stages, and resources available to you at MIT.

  1. Define the Research Question
  2. Define the Type of Data Needed
    • What variables and measures will be needed?
    • Do you need a study that follows change over time (a time series) or one that is repeated with the same subjects (a longitudinal or panel study)?
    • At what level of geography do you need the data (e.g., city, state, or country)?
    • Do you want to conduct geospatial analysis? If yes, see the GIS Laboratory at MIT.
    • In what form do you need the information? Data falls along a spectrum from data for information (e.g., a reference book such as the Statistical Abstract of the United States) to data for research (e.g., a large survey data set such as the General Social Survey).
  3. Identify Potential Data Sources
  4. Determine Usefulness of the Data: Once you have identified potential data sources, assess whether the data will help answer your research question and fit within the bounds of your project.
    • Consider: sample design, method of collection, quality of the data, measures and units of analysis, variables, file structure, and validity of the source.
    • For raw data files, you need the codebook: the documentation of the contents and structure of the data file. It generally contains a description of the research project and methods, the data collection instrument, and a data dictionary listing the variables, their locations in the file, and their coded values.
    • See understanding data files.
    • See data access, which also provides access to data documentation.
  5. Access the Data
  6. Format the Data
    • If the data is in the form of easily retrievable statistics, no further analysis may be needed. If you have a raw data file, you will need to format and analyze it.
    • Import the data into the preferred software package (e.g. Stata, SAS, SPSS)
      • Setup files let you import the data without creating a data dictionary by hand. See the ICPSR page on SPSS setup files.
      • If the data is available in a format compatible with another software package, you can use transfer software (e.g. Stat/Transfer) to convert it to your preferred package; see software availability.
    • Extract variables of interest (e.g. if you are interested in economic but not lifestyle variables).
    • Subset observations of interest (e.g. if you are interested in only respondents from a certain age group).
    • Note: some older data sets may be available in print only, so decide whether entering the data manually is worth the effort.
    • See working with data files.
  7. Analyze the Data
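
To make step 6 concrete, here is a minimal Python sketch of how a codebook's data dictionary can drive the import of a raw fixed-width file, followed by extracting variables and subsetting observations. It is an illustration only, not a procedure prescribed by this guide: the variable names, column locations, coded values, and records are all invented for the example, and a statistical package such as Stata, SAS, or SPSS would normally do this work through a setup file.

```python
# Hypothetical data dictionary, as a codebook might list it:
# variable name -> (start column, end column), 1-indexed and inclusive.
DATA_DICTIONARY = {
    "resp_id": (1, 4),
    "age": (5, 6),
    "income": (7, 12),
    "hobby": (13, 13),
}

# Hypothetical coded values for one categorical variable.
HOBBY_CODES = {"1": "reading", "2": "sports", "9": "no answer"}

# Invented raw records in fixed-width layout (not real survey data).
RAW_LINES = [
    "000134 520001",
    "000267 310002",
    "000329 480009",
]


def parse_record(line, dictionary):
    """Slice one fixed-width line into a dict of variable name -> text value."""
    return {
        name: line[start - 1:end].strip()
        for name, (start, end) in dictionary.items()
    }


def extract_variables(records, names):
    """Keep only the variables of interest in each record."""
    return [{name: rec[name] for name in names} for rec in records]


def subset_observations(records, min_age, max_age):
    """Keep only records whose age falls within the inclusive range."""
    return [rec for rec in records if min_age <= int(rec["age"]) <= max_age]


# Import: apply the data dictionary to every raw line.
records = [parse_record(line, DATA_DICTIONARY) for line in RAW_LINES]

# Extract: keep economic variables, drop the lifestyle variable.
economic = extract_variables(records, ["resp_id", "age", "income"])

# Subset: keep respondents in one age group.
working_age = subset_observations(economic, 25, 64)

for rec in working_age:
    print(rec)
```

The coded values matter when you read the results: in this invented layout, looking up a respondent's `hobby` code in `HOBBY_CODES` turns `"9"` into `"no answer"`, which is the kind of translation the codebook exists to support.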

 

Katherine McNeill, Social Science Data Services and Economics Librarian, mcneillh@mit.edu
Massachusetts Institute of Technology
77 Massachusetts Avenue, Cambridge, MA 02139-4397 USA