MIT Libraries


Metadata Reference Guide

 
----
A guide to metadata by the Metadata Advisory Group of the MIT Libraries

Data Documentation Initiative (DDI)

http://www.icpsr.umich.edu/DDI/

Definition:
The Data Documentation Initiative (DDI) is an effort to establish an international criterion and methodology for the content, presentation, transport, and preservation of metadata about datasets in the social and behavioral sciences.

In the social sciences, metadata about datasets are often called codebooks. Previously, systems that searched for social science datasets could search on only a few fields (e.g., name, author, study number, and abstract). A researcher had to examine the codebook manually to find the detailed and important information about the study (such as variables, methodology, and structure of the data) that determines the usefulness of the data.

With the achievements of the DDI, codebooks can now be created in a uniform, highly structured format that is easily and precisely searchable on the Web, that lends itself well to simultaneous use of multiple datasets, and that will significantly improve the content and usability of metadata. Further, this specification may have far-reaching implications for improvement of the entire process of data collection, dissemination, and analysis. The specification takes the form of a Document Type Definition (DTD) for the eXtensible Markup Language (XML).
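As a rough illustration of what "precisely searchable" can mean in practice, the sketch below searches variable labels in a small marked-up codebook. It is a minimal example under stated assumptions, not part of the DDI specification: the element and attribute names (dataDscr, var, name, labl) should be checked against the DDI Tag Library, and the sample variables and the find_variables helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical codebook fragment. The element and attribute names
# (dataDscr, var, name, labl) are assumptions to be checked against
# the DDI Tag Library; the variables themselves are invented.
CODEBOOK = """<codeBook>
  <stdyDscr/>
  <dataDscr>
    <var name="AGE"><labl>Age of respondent in years</labl></var>
    <var name="EDUC"><labl>Highest level of education completed</labl></var>
    <var name="INCOME"><labl>Household income before taxes</labl></var>
  </dataDscr>
</codeBook>"""


def find_variables(root, term):
    """Return (name, label) pairs whose label contains term, ignoring case."""
    matches = []
    for var in root.iter("var"):
        label = (var.findtext("labl") or "").strip()
        if term.lower() in label.lower():
            matches.append((var.get("name"), label))
    return matches


root = ET.fromstring(CODEBOOK)
print(find_variables(root, "income"))
# prints [('INCOME', 'Household income before taxes')]
```

Because the variable descriptions are structured markup rather than free text, a search can target labels specifically instead of scanning the whole codebook.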

Constituency:
The primary users of the DDI are researchers who produce social and behavioral science data, archivists and librarians who store and provide access to it, and end users.

Using DDI, codebooks have the potential to structure and support the entire data collection, distribution, and analysis process throughout the social and behavioral sciences (including in experiments). In addition to easier searching, many see the DDI as enabling a new mode of doing comparative and other research that uses multiple datasets, a growing trend throughout the social and behavioral sciences. Users will be able to document a complex dataset to statistical packages through its codebook, rather than have to go through conversion processes. Thus, many see the DDI as offering not only data producers and data archivists but also data users a new power and flexibility to do their work and to do it effectively and efficiently.

Evolution:
The DDI Committee leads the project. Initially with support from ICPSR and then with support from an NSF grant and considerable in-kind contributions of staff time from institutions all over the world, the group has created what is known as a Document Type Definition (DTD) for the "markup" of social science codebooks. The DTD employs the eXtensible Markup Language (XML), which is a simplified subset of a more general markup language, SGML. ICPSR and the Roper Center for Public Opinion Research have jointly received a collaborative grant from the National Science Foundation to continue the activities of the Data Documentation Initiative (DDI) for an additional year, through 2003.

Version 1 of the DDI DTD, along with a Tag Library providing instructions and examples in using the DTD, was formally published on March 24, 2000, and is now available for use by the social science research community on the DDI Web site at http://www.icpsr.umich.edu/DDI. The site also provides sample marked-up codebooks, suggestions for markup tools and software, and additional information about the initiative.

The DDI Committee includes representatives from a variety of groups from around the world, all engaged in the social science research enterprise. The DDI Committee meets once or twice a year, often at the meetings of ICPSR or the International Association of Social Science Information Service and Technology (IASSIST). The Committee first met at IASSIST in May of 1995 and periodically over the next few years to develop and refine the codebook elements to appear in the DTD.

The beta test of the DDI DTD began in March 1999 and continued until August 3. The test sites were the Centre for Comparative European Survey Data (CCESD); Danish Data Archive; The Data Archive; Harvard-MIT Data Center; NIWI-Steinmetz Archive; Norwegian Social Science Data Services (NSD); University of California, Berkeley; University of Giessen; University of Ljubljana; University of Michigan; University of Minnesota; University of Warsaw; and University of Wisconsin-Madison. At the conclusion of the beta test, a list of changes suggested by the testers was compiled; Version 1 of the DTD, published March 24, 2000, incorporates these changes.

In subsequent meetings, the DDI Committee continued to revise the standard and develop new draft versions (currently draft Version 1.02.1) to address the description of aggregate and tabular data, geography, ISO harmonization, weights, and XML schemas.

Content:
The codebook DTD can be found at:
http://www.icpsr.umich.edu/DDI/CODEBOOK/index.html
The DDI Tag Library can be found at: http://www.icpsr.umich.edu/DDI/CODEBOOK/codedtd.html (Version 1 (Final) with modifications for Version 1.01 included and highlighted).

Following are the five main sections of the Document Type Definition (DTD) for social science data documentation developed by the Data Documentation Initiative (DDI) Committee. These are the highest-level components of any document that will be marked up in compliance with this DTD.

1. Document Description
Items describing the marked-up document itself as well as its source documents (citation, title, etc.)
Element -- optional, not repeatable.

2. Study Description
Items describing the overall data collection (title, citation, methodology, study scope, data access, etc.)
Element -- required, repeatable.

3. Data Files Description
Items relating to the format, size, and structure of the data files
Element -- optional, repeatable.

4. Variables Description
Items relating to variables in the data collection
Element -- optional, repeatable.

5. Other Study-Related Materials
Other study-related material not included in the other sections (bibliography, separate questionnaire file, etc.)
Element -- optional, repeatable.
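The required/repeatable rules listed above can be sketched as a simple check on a marked-up document. This is a minimal illustration under stated assumptions rather than a substitute for validating against the DTD itself: the element names (codeBook, docDscr, stdyDscr, fileDscr, dataDscr, otherMat) should be checked against the DDI Tag Library, the check_top_level helper is hypothetical, and the check counts sections only; it does not verify their order or contents.

```python
import xml.etree.ElementTree as ET

# Top-level sections as (assumed element name, required, repeatable),
# following the list above. The element names are assumptions to be
# checked against the DDI Tag Library.
SECTIONS = [
    ("docDscr", False, False),   # 1. Document Description
    ("stdyDscr", True, True),    # 2. Study Description
    ("fileDscr", False, True),   # 3. Data Files Description
    ("dataDscr", False, True),   # 4. Variables Description
    ("otherMat", False, True),   # 5. Other Study-Related Materials
]


def check_top_level(root):
    """Return a list of problems with the top-level section counts."""
    problems = []
    for name, required, repeatable in SECTIONS:
        count = len(root.findall(name))  # direct children only
        if required and count == 0:
            problems.append(f"{name}: required but missing")
        if not repeatable and count > 1:
            problems.append(f"{name}: not repeatable but appears {count} times")
    return problems


# Invented sample document with empty sections.
SAMPLE = """<codeBook>
  <docDscr/>
  <stdyDscr/>
  <fileDscr/>
  <fileDscr/>
  <dataDscr/>
</codeBook>"""

root = ET.fromstring(SAMPLE)
print(check_top_level(root))  # prints []
```

A document with no Study Description, or with two Document Descriptions, would produce one message per violated rule.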


Encoding:
XML, see: http://www.icpsr.umich.edu/DDI/INFO/index.html

MIT Libraries’ Expert:
Katherine McNeill-Harman, Data Services Reference Librarian

Examples of projects using DDI:
Networked Social Science Tools and Resources (NESSTAR): http://www.nesstar.org/
NESSTAR strives to provide a "seamless interface" between the user and the data and its documentation, through integrated data discovery, usage, and dissemination tools.

ICPSR: http://www.icpsr.umich.edu
ICPSR has marked up all of the study descriptions in its catalog of holdings according to the DDI specification.

The Council of European Social Science Data Archives (CESSDA):
http://www.nsd.uib.no/cessda/
CESSDA is moving its Integrated Data Catalog to DDI format.

Harvard-MIT Virtual Data Center (VDC):
http://www.thedata.org/
The VDC is an operational, open-source, digital library to enable the sharing of quantitative research data, and the development of distributed virtual collections of data and documentation. The VDC imports and exports DDI-compliant documentation.

Survey Documentation and Analysis:
http://sda.berkeley.edu
SDA is a set of programs, developed and maintained by the Computer-assisted Survey Methods Program (CSM) at the University of California, Berkeley, for the documentation and Web-based analysis of survey data. SDA imports and exports DDI-compliant documentation.

Census Bureau's DataFerrett:
http://dataferrett.census.gov/TheDataWeb/index.html
DataFerrett supports metadata searches across surveys, on-the-fly variable recoding, complex tabulations, and graphics. DataFerrett is working to promote interoperability with the DDI format.

California Digital Library's "Counting California" Project:
http://countingcalifornia.cdlib.org/
This initiative provides a single interface that facilitates access to a wide range of social and economic data on California. Documentation underlying "Counting California" was tagged using Version 1 of the DDI specification.

National Historical Geographic Information System:
http://www.nhgis.org/
This project, based at the University of Minnesota, involves harmonizing all extant electronic census summary data and converting documentation to the DDI format.

Workshops/Meetings and Conferences:
Members of the DDI Committee frequently present on the standard at meetings of ICPSR, IASSIST, and other organizations. For more information see
http://www.icpsr.umich.edu/DDI/ORG/committee-act.html.

Future Directions:
Among the areas that the committee is exploring are the following:

• Establishing a repository of marked-up codebooks
• Exploring XML Schema language and RDF to facilitate machine processing of marked-up documents
• Developing controlled vocabularies for as many attributes as possible
• Extending the DTD to encompass elements describing the Computer-Assisted Interviewing (CAI) process
• Extending the DTD to handle complex file types
• Enabling the DTD to document "families" of datasets as well as different but related datasets
• Creating style sheets for the presentation of DDI-compliant codebooks
• Developing interactive metadata entry software
• Adding crosswalks to other bibliographic schemes like Dublin Core, GILS, and MARC
• Incorporating standards for metadata about spatial data
• Identifying an ultimate "home" for the DTD where it will be maintained and revised when necessary
• Ensuring interoperability across data distribution systems
• Developing tools to assist data producers and archives in marking up technical documentation according to the DDI specification

The new NSF grant will enable participants to work on Version 1.1 of the specification, to conduct training at meetings of professional associations so that researchers and others become familiar with the project and are encouraged to adopt this method for documenting social science datasets, and to provide a set of best practices that demonstrate optimal usage.

Reading list:
Home Page (primary source of information for this handout):
http://www.icpsr.umich.edu/DDI/

About the Organization:
http://www.icpsr.umich.edu/DDI/ORG/index.html

Codebook DTD:
http://www.icpsr.umich.edu/DDI/CODEBOOK/index.html

Papers, Presentations, Reports:
http://www.icpsr.umich.edu/DDI/PAPERS/index.html

XML “cover page” on DDI:
http://www.oasis-open.org/cover/ddi.html

See also pages referenced above.

This page was last updated on 11/06/07

webmaster@libraries.mit.edu