A guide to metadata by the Metadata Advisory
Group of the MIT Libraries
Data Documentation Initiative (DDI)
http://www.icpsr.umich.edu/DDI/
Definition:
The Data Documentation Initiative (DDI) is an effort to establish
an international criterion and methodology for the content,
presentation, transport, and preservation of metadata about
datasets in the social and behavioral sciences.
In the social sciences, metadata about datasets are often
called codebooks. Previously, systems that searched for social
science datasets could search on fairly limited fields (i.e., name,
author, study number, and abstract). One would need to manually
examine the codebook to find detailed, and important,
information about the study (such as variables, methodology,
and structure of the data) that determines the usefulness of the
data to the researcher.
With the achievements of the DDI, codebooks can now be created
in a uniform, highly structured format that is easily and
precisely searchable on the Web, that lends itself well to
simultaneous use of multiple datasets, and that will significantly
improve the content and usability of metadata. Further, this
specification may have far-reaching implications for improvement of
the entire process of data collection, dissemination, and analysis.
The specification takes the form of a Document Type Definition (DTD)
that employs the eXtensible Markup Language (XML).
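As a rough illustration of what searching a structured codebook can
look like, the following Python sketch looks up variables by keyword
in a small, made-up codebook fragment. The element names used here
(codeBook, stdyDscr, dataDscr, var, labl) are assumed to match the
DDI codebook DTD and should be checked against the Tag Library; the
study title, variable names, and the find_variables helper are
hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical codebook fragment. Element names are assumed to follow
# the DDI codebook DTD; verify them against the DDI Tag Library.
SAMPLE_CODEBOOK = """\
<codeBook>
  <stdyDscr>
    <citation><titlStmt><titl>Hypothetical Household Survey</titl></titlStmt></citation>
  </stdyDscr>
  <dataDscr>
    <var name="AGE"><labl>Age of respondent in years</labl></var>
    <var name="INCOME"><labl>Total household income</labl></var>
    <var name="EDUC"><labl>Highest level of education completed</labl></var>
  </dataDscr>
</codeBook>
"""


def find_variables(codebook_xml, keyword):
    """Return (name, label) pairs for variables whose label contains keyword."""
    root = ET.fromstring(codebook_xml)
    needle = keyword.lower()
    matches = []
    # Walk every <var> element, wherever it sits in the document.
    for var in root.iter("var"):
        label = (var.findtext("labl") or "").strip()
        if needle in label.lower():
            matches.append((var.get("name"), label))
    return matches


if __name__ == "__main__":
    for name, label in find_variables(SAMPLE_CODEBOOK, "income"):
        print(name, "-", label)
```

Because the variable-level information is marked up rather than
buried in free text, the search runs against variable labels
directly instead of being limited to study-level fields such as
title or abstract.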
Constituency:
The primary users of the DDI are researchers who produce
social and behavioral science data, archivists and librarians
who store and provide access to it, and end users.
Using DDI, codebooks have the potential to structure and
support the entire data collection, distribution, and analysis
process throughout the social and behavioral sciences (including
in experiments). In addition to easier searching, many see the
DDI as enabling a new mode of doing comparative and other
research that uses multiple datasets, a growing trend throughout
the social and behavioral sciences. Users will be able to document
a complex dataset to statistical packages through its codebook,
rather than having to go through conversion processes. Thus,
many see the DDI as offering not only data producers and
data archivists but also data users new power and flexibility
to do their work effectively and efficiently.
Evolution:
The DDI Committee leads the project. Initially with support
from ICPSR and then with support from an NSF grant and considerable
in-kind contributions of staff time from institutions all
over the world, the group has created what is known as a Document
Type Definition (DTD) for the "markup" of social science
codebooks. The DTD employs the eXtensible Markup Language
(XML), which is a dialect of a more general markup language,
SGML. ICPSR and the Roper Center for Public Opinion Research
jointly have received a collaborative grant from the National
Science Foundation to continue the activities of the Data Documentation
Initiative (DDI) for an additional year through 2003.
Version 1 of the DDI DTD, along with a Tag Library providing
instructions and examples in using the DTD, was formally
published on March 24, 2000, and is now available for use by the
social science research community on the DDI Web site at
http://www.icpsr.umich.edu/DDI.
The site also provides sample marked-up codebooks, suggestions
for markup tools and software, and additional information
about the initiative.
The DDI Committee includes representatives from a variety
of different groups from around the world, all engaged in
the social science research enterprise. The DDI Committee meets
regularly, once or twice a year, often at the meetings of
ICPSR or the International Association of Social Science
Information Service and Technology (IASSIST). The Committee first
met at IASSIST in May of 1995 and met periodically over the next
few years to develop and refine the codebook elements to appear in
the DTD.
The beta-test of the DDI DTD began in March 1999 and continued
until August 3. It included the following test sites: Centre
for Comparative European Survey Data (CCESD); Danish Data
Archive; The Data Archive; Harvard-MIT Data Center; NIWI-Steinmetz
Archive; Norwegian Social Science Data Services (NSD); University
of California, Berkeley; University of Giessen; University of
Ljubljana; University of Michigan; University of Minnesota;
University of Warsaw; and University of Wisconsin-Madison.
At the conclusion of the beta-test, a list of changes suggested
by the testers was compiled; version 1 of the DTD, published
March 24, 2000, incorporates these changes.
In subsequent meetings, the DDI Committee continued to revise
and develop new draft versions of the standard (now up to
Version 1.02.1, draft), trying to address description of
aggregate and tabular data, geography in the DTD, ISO
harmonization, weights, and XML schemas.
Content:
The codebook DTD can be found at:
http://www.icpsr.umich.edu/DDI/CODEBOOK/index.html
The DDI Tag Library can be found at:
http://www.icpsr.umich.edu/DDI/CODEBOOK/codedtd.html
(Version 1 (Final), with modifications for Version 1.01 included
and highlighted).
Following are the five main sections of the Document Type
Definition (DTD) for social science data documentation developed
by the Data Documentation Initiative (DDI) Committee. These are
the highest-level components of any document that will be
marked up in compliance with this DTD.
1. Document Description
Items describing the marked-up document itself as well as
its source documents (citation, title, etc.)
Element -- optional, not repeatable.
2. Study Description
Items describing the overall data collection (title, citation,
methodology, study scope, data access, etc.)
Element -- required, repeatable.
3. Data Files Description
Items relating to the format, size, and structure of the
data files
Element -- optional, repeatable.
4. Variables Description
Items relating to variables in the data collection
Element -- optional, repeatable.
5. Other Study-Related Materials
Other study-related material not included in the other sections
(bibliography, separate questionnaire file, etc.)
Element -- optional, repeatable.
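The occurrence rules listed above (required or optional, repeatable
or not) can be checked mechanically. The following Python sketch
does so for the five top-level sections. It is only an illustration:
the element names (codeBook, docDscr, stdyDscr, fileDscr, dataDscr,
otherMat) are assumed to correspond to the five sections and should
be confirmed against the DTD, and the check_top_level helper is
hypothetical. It does not check element order or content, which a
validating XML parser working from the DTD itself would cover.

```python
import xml.etree.ElementTree as ET

# (element name, minimum occurrences, maximum occurrences or None if
# repeatable). Names are assumed; the rules mirror the list above.
SECTION_RULES = [
    ("docDscr", 0, 1),      # Document Description: optional, not repeatable
    ("stdyDscr", 1, None),  # Study Description: required, repeatable
    ("fileDscr", 0, None),  # Data Files Description: optional, repeatable
    ("dataDscr", 0, None),  # Variables Description: optional, repeatable
    ("otherMat", 0, None),  # Other Study-Related Materials: optional, repeatable
]


def check_top_level(codebook_xml):
    """Return a list of occurrence-rule violations (empty if none)."""
    root = ET.fromstring(codebook_xml)
    problems = []
    for name, minimum, maximum in SECTION_RULES:
        # Count only direct children of the root element.
        count = len(root.findall(name))
        if count < minimum:
            problems.append(f"{name}: found {count}, need at least {minimum}")
        if maximum is not None and count > maximum:
            problems.append(f"{name}: found {count}, at most {maximum} allowed")
    return problems


if __name__ == "__main__":
    print(check_top_level("<codeBook><stdyDscr/></codeBook>"))
    print(check_top_level("<codeBook><docDscr/><docDscr/></codeBook>"))
```

A document containing only a Study Description passes, since it is
the one required section; a document with two Document Descriptions
and no Study Description is reported twice.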
Encoding:
XML, see: http://www.icpsr.umich.edu/DDI/INFO/index.html
MIT Libraries’ Expert:
Katherine McNeill-Harman, Data Services Reference Librarian
Examples of projects using DDI:
Networked Social Science Tools and Resources (NESSTAR): http://www.nesstar.org/
NESSTAR strives to provide a "seamless interface" between
the user and the data and its documentation, through integrated
data discovery, usage, and dissemination tools.
ICPSR: http://www.icpsr.umich.edu
ICPSR has marked up all of the study descriptions in its
catalog of holdings according to the DDI specification.
The Council of European Social Science Data Archives (CESSDA):
http://www.nsd.uib.no/cessda/
CESSDA is moving its Integrated Data Catalog to DDI format.
Harvard-MIT Virtual Data Center (VDC):
http://www.thedata.org/
The VDC is an operational, open-source, digital library to
enable the sharing of quantitative research data, and the
development of distributed virtual collections of data and
documentation.
The VDC imports and exports DDI-compliant documentation.
Survey Documentation and Analysis:
http://sda.berkeley.edu
SDA is a set of programs, developed and maintained by the
Computer-assisted Survey Methods Program (CSM) at the University
of California, Berkeley, for the documentation and Web-based
analysis of survey data.
SDA imports and exports DDI-compliant documentation.
Census Bureau's DataFerrett:
http://dataferrett.census.gov/TheDataWeb/index.html
DataFerrett supports metadata searches across surveys, on-the-fly
variable recoding, complex tabulations, and graphics. DataFerrett
is working to promote interoperability with the DDI format.
California Digital Library's "Counting California" Project:
http://countingcalifornia.cdlib.org/
This initiative provides a single interface that facilitates
access to a wide range of social and economic data on California.
Documentation underlying "Counting California" was
tagged using Version 1 of the DDI specification.
National Historical Geographic Information System:
http://www.nhgis.org/
This project, based at the University of Minnesota, involves
harmonizing all extant electronic census summary data and
converting documentation to the DDI format.
Workshops/Meetings and Conferences:
Members of the DDI Committee frequently present on the standard
at meetings of ICPSR, IASSIST, and other organizations. For
more information see
http://www.icpsr.umich.edu/DDI/ORG/committee-act.html.
Future Directions:
Among the areas that the committee is exploring are the following:
• Establishing a repository of marked-up codebooks
• Exploring XML Schema language and RDF to facilitate machine
  processing of marked-up documents
• Developing controlled vocabularies for as many attributes as
  possible
• Extending the DTD to encompass elements describing the
  Computer-Assisted Interviewing (CAI) process
• Extending the DTD to handle complex file types
• Enabling the DTD to document "families" of datasets
  as well as different but related datasets
• Creating style sheets for the presentation of DDI-compliant
  codebooks
• Developing interactive metadata entry software
• Adding crosswalks to other bibliographic schemes like Dublin
  Core, GILS, and MARC
• Incorporating standards for metadata about spatial data
• Identifying an ultimate "home" for the DTD where
  it will be maintained and revised when necessary
• Ensuring interoperability across data distribution systems
• Developing tools to assist data producers and archives in marking
  up technical documentation according to the DDI specification
The new NSF grant will enable participants to work on Version
1.1 of the specification, conduct training at meetings of
professional associations to familiarize researchers and
others with the project and to encourage them to adopt this method
for documenting social science datasets, and provide a set of best
practices to demonstrate optimal usage.
Reading list:
Home Page (primary source of information for this handout):
http://www.icpsr.umich.edu/DDI/
About the Organization:
http://www.icpsr.umich.edu/DDI/ORG/index.html
Codebook DTD
http://www.icpsr.umich.edu/DDI/CODEBOOK/index.html
Papers, Presentations, Reports:
http://www.icpsr.umich.edu/DDI/PAPERS/index.html
XML “cover page” on DDI
http://www.oasis-open.org/cover/ddi.html
See also pages referenced above.