Data Research Process
The following outlines the steps of the data research process, questions
to ask yourself at different stages, and resources available to you at
Format the Data
- Define the Research Question
- Define the Type of Data Needed
- What variables and measures will be needed?
- Are you looking for a study that takes place over time and/or is repeated with the same subjects?
- At what level of geography do you need the data (e.g. city, state)?
- Do you want to conduct geospatial analysis? If yes, see the GIS
Laboratory at MIT.
- In what form do you need the information? Data can be categorized along a spectrum
from data for information (e.g. a reference book such as the
Statistical Abstract of the United States) to data for research (e.g.
a large survey data set such as the General Social Survey).
- Identify Potential Data Sources
- Determine Usefulness of the Data
- Once you have identified potential data sources, assess whether
the data will help answer your research question and fit within the
bounds of your project.
- Consider: sample design, method of collection, quality of the data,
measures and units of analysis, variables, file structure,
and validity of the source.
- For raw data files, you need the codebook, which documents the
contents and structure of the data file. It generally contains a
description of the research project and methods, the data collection
instrument, and a data dictionary listing the variables, their locations
in the file, and their coded values.
- See understanding data files.
- See data access, which also provides access to data documentation.
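
To make the role of the data dictionary concrete, the following is a minimal sketch of how variable locations and coded values from a codebook could be applied to a raw fixed-width file. Everything here is hypothetical and for illustration only: the variable names, the column positions (assumed to be 1-based and inclusive), the coded values, and the sample records do not come from any real data set.

```python
# Hypothetical data dictionary, as it might be transcribed from a codebook:
# variable name -> (start column, end column, coded values or None).
# Column positions are assumed to be 1-based and inclusive.
DATA_DICTIONARY = {
    "respondent_id": (1, 4, None),
    "age": (5, 6, None),
    "sex": (7, 7, {"1": "male", "2": "female", "9": "no answer"}),
}

# Made-up raw records: 4 characters of id, 2 of age, 1 of coded sex.
RAW_LINES = ["0001342", "0002571", "0003299"]


def parse_record(line, dictionary):
    """Slice one fixed-width record into named values and decode any codes."""
    record = {}
    for name, (start, end, labels) in dictionary.items():
        raw = line[start - 1:end].strip()  # convert 1-based columns to a slice
        # Fall back to the raw code if the codebook has no label for it.
        record[name] = labels.get(raw, raw) if labels else raw
    return record


records = [parse_record(line, DATA_DICTIONARY) for line in RAW_LINES]
for record in records:
    print(record)
```

Without the codebook, a record such as `0001342` is just a string of digits; the dictionary is what says where each variable starts and ends and what a code like `2` means.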
- Access the Data
Analyze the Data
- If the data is in the form of easily retrievable statistics, no further
analysis may be needed. If you have a raw data file, you will need
to format and analyze it.
- Import the data into your preferred software package (e.g. Stata).
- Setup files enable easier importation
without manual creation of a data dictionary. See the
on SPSS setup files.
- If the data is available in a format compatible with another software
package, you can use transfer software (e.g. Stat/Transfer) to convert
it to your preferred package; see software
- Extract variables of interest (e.g. if you are interested
in economic but not lifestyle variables).
- Subset observations of interest (e.g. if you are interested only in
respondents from a certain age group).
- Note: some older data sets may be available in print only, so decide
whether it is worth entering the data manually.
- See working with data files.
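
The import, extract, and subset steps above can be sketched in a few lines. This is an illustration using only the Python standard library, not a recommendation over a statistical package such as Stata or SPSS; the sample data, the variable names, and the age bounds are all invented for the example.

```python
import csv
import io

# Hypothetical survey extract; in practice this would be a file on disk.
RAW_CSV = """id,age,income,employment_status,favorite_hobby
1,23,31000,employed,hiking
2,41,58000,employed,chess
3,67,22000,retired,gardening
4,35,0,unemployed,cooking
"""

# Variables of interest: economic measures, not lifestyle ones.
ECONOMIC_VARIABLES = ["id", "age", "income", "employment_status"]


def load_rows(text):
    """Import: read delimited text into a list of dicts keyed by variable name."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_variables(rows, variables):
    """Extract: keep only the named variables (columns)."""
    return [{name: row[name] for name in variables} for row in rows]


def subset_by_age(rows, minimum, maximum):
    """Subset: keep only observations (rows) within an inclusive age range."""
    return [row for row in rows if minimum <= int(row["age"]) <= maximum]


rows = load_rows(RAW_CSV)
economic = extract_variables(rows, ECONOMIC_VARIABLES)
working_age = subset_by_age(economic, 25, 64)

for row in working_age:
    print(row)
```

Extracting variables narrows the columns and subsetting narrows the rows; doing both before analysis keeps the working file small and focused on the research question.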