MIT Libraries

Metadata Reference Guide

 
----
A guide to metadata by the Metadata Advisory Group of the MIT Libraries

TEI (Text Encoding Initiative) Metadata

Definition:

Text Encoding Initiative: defines a general-purpose scheme that makes it possible to encode different views of a text. “Grew out of technology based textual analysis applications employed by Humanities scholars”[1], e.g., tracing the use of the word ‘love’ in poems of a given genre within a specific historical period. The focus has been on text capture (putting into electronic form a text that already exists in another medium) rather than text creation (where no other copy of the text exists). [2] Assumes that texts, and works on texts, share a common core of textual features.
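
The kind of analysis mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the Guidelines: the verse lines are invented, the function name count_word is hypothetical, and the sample assumes the common TEI verse markup in which line groups are tagged <lg> and verse lines <l>.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample text; <lg> (line group) and <l> (verse line) are assumed
# to be the verse elements used by the encoder.
POEMS = """\
<body>
  <lg type="stanza">
    <l>My love is like a quiet stream,</l>
    <l>that loves the shade of summer trees;</l>
  </lg>
  <lg type="stanza">
    <l>no glove, no cloak, no borrowed dream</l>
    <l>can warm me as my Love's voice frees.</l>
  </lg>
</body>
"""


def count_word(xml_text, word):
    """Count whole-word, case-insensitive occurrences of `word` in <l> elements."""
    root = ET.fromstring(xml_text)
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    total = 0
    for line in root.iter("l"):
        # itertext() also collects text nested inside child elements of the line.
        total += len(pattern.findall("".join(line.itertext())))
    return total


if __name__ == "__main__":
    print(count_word(POEMS, "love"))
```

Because the match is on whole words, "loves" and "glove" are not counted. A real study would also need to decide how to treat inflected forms and variant spellings.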

Constituency:

Originally a joint project of:

• Association for Computers and the Humanities
• Association for Computational Linguistics
• Association for Literary and Linguistic Computing

TEI addresses many of the needs of the “language technology community which is amassing substantial multi-lingual, multi-modal corpora of spoken and written texts and lexicons in order to advance research in human language, understanding, production, and translation.” [3]

History of use:

Begun in 1987 as an international project for the encoding of electronic textual materials. Planning conference held at Vassar College in 1987 led to an agreement on basic design goals. From the initial stages, there has been a relationship between TEI and MARC bibliographic records. The TEI Header was based on ISBD and intended to supply information suitable to create a catalog record. Similar to MARC, TEI makes a distinction between required, recommended, and optional encoding practices and provides a mechanism for user-defined extensions to the scheme.

E-text centers and MARC communities have fostered communication. Library of Congress MARC to SGML crosswalk located on LC’s web site: http://lcweb.loc.gov/marc/marcsgml.html

• 1990: First draft version of TEI Header and Guidelines was distributed.

• May 1994: “Guidelines for Electronic Text Encoding and Interchange” (TEI P3) was issued. The Guidelines provide conventions for describing the physical and logical structures of text types for research in language technology, computational linguistics, and the humanities.

• June 1998: TEI and XML in Digital Libraries workshop sponsored by LC and the Digital Library Federation charged a working group to recommend some best practices for TEI header content.

• June 2001 revision: TEI P4 disseminated to provide equal support for XML and SGML applications. Next revision expected early 2002.

Prerequisites:

TEI is an interchange format independent of application.

Progress towards standardization:

Joint project of the Associations resulted in an extensible SGML ‘document type definition’. TEI Guidelines published. TEI continues to develop and maintain encoding standards. Has a specific mark-up syntax as well as a large, well-defined tag set, but few tags are mandatory. Similar to MARC in specifying an input standard for each tag.

Responsibility divided between two committees: Committee on Text Representation and Committee on Text Interpretation and Analysis. Standard feature: the TEI Header, which provides bibliographic history, provenance information, and information about the text and its creation (encoder, file size, file availability, encoding practices).

TEI/MARC best practices for TEI Headers distributed by University of Michigan:
http://www-personal.umich.edu/~jaheim/teiguide.html

Encoding:

SGML (ISO 8879) and ISO 646 (7-bit character set standard). Encodings for different views of text; alternative encodings for the same text features; mechanisms for user-defined extensions to the scheme. The Guidelines make it possible to encode many different views of the text, simultaneously if necessary. TEI Guidelines are not prescriptive: few features are mandatory, but the Guidelines define a core set of tags. Extensible. The focus is on the capture of text that already exists in another medium rather than text creation.

TEI Header is a set of descriptions prefixed to a TEI encoded document that specifies four components:

• file description (a full bibliographic description),

• encoding description (level of detail of the analysis; the aim or purpose for which an electronic file was encoded; editorial principles and practices used during the encoding of the text),

• text profile (classificatory and contextual information such as the text’s subject matter; the languages and sublanguages used, the situation in which it was produced, the participants and their setting),

• revision history (history of changes during the electronic file’s development).

The Header contains bibliographic information supporting resource discovery, and data management portions supporting use of the resource.
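
As a rough illustration of the four components listed above, the following Python sketch parses a skeletal header and reports which components are present. The element names (teiHeader, fileDesc, encodingDesc, profileDesc, revisionDesc) are those used in the TEI Guidelines; the titles, names, and dates are invented, attributes are omitted, the sample is not validated against the TEI DTD, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample header; content is illustrative only and unvalidated.
TEI_HEADER = """\
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Collected poems: an electronic edition</title>
      <author>Example, Author</author>
    </titleStmt>
    <publicationStmt>
      <p>Distributed by a hypothetical electronic text center.</p>
    </publicationStmt>
    <sourceDesc>
      <p>Transcribed from a printed edition.</p>
    </sourceDesc>
  </fileDesc>
  <encodingDesc>
    <p>Spelling has not been regularized; line breaks follow the source.</p>
  </encodingDesc>
  <profileDesc>
    <langUsage>
      <language>English</language>
    </langUsage>
  </profileDesc>
  <revisionDesc>
    <change>
      <date>2001-06-15</date>
      <respStmt><name>A. Encoder</name><resp>encoder</resp></respStmt>
      <item>Initial markup of the text.</item>
    </change>
  </revisionDesc>
</teiHeader>
"""

# Element name -> the label used for that component in this guide.
COMPONENTS = {
    "fileDesc": "file description",
    "encodingDesc": "encoding description",
    "profileDesc": "text profile",
    "revisionDesc": "revision history",
}


def header_components(xml_text):
    """Return the tags of the header's top-level children, in document order."""
    return [child.tag for child in ET.fromstring(xml_text)]


def missing_components(xml_text):
    """Return the component element names that the header does not contain."""
    present = set(header_components(xml_text))
    return [name for name in COMPONENTS if name not in present]


if __name__ == "__main__":
    for tag in header_components(TEI_HEADER):
        print(tag, "=", COMPONENTS.get(tag, "unrecognized"))
```

A header that lacks some components is not necessarily invalid: in the Guidelines only the file description is mandatory, so missing_components is better read as a completeness report than as a validity check.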

If TEI Header is similar to the information contained in a MARC record, why didn’t the scholarly community simply use MARC?

Workflow is the primary answer. The TEI drafters envisioned that the individuals who marked up the electronic texts would create the metadata for them and should not be expected to know cataloging rules. The Header was nevertheless deliberately designed to give a trained cataloger the information necessary to create a good catalog record. The difference is that the rules for obtaining and representing the content are not prescribed and, consequently, catalogers find that the data is usable only to the extent that the encoder followed cataloging rules.[4]
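
To make the cataloger's side of this workflow concrete, here is a minimal Python sketch that pulls a few file description elements out of a header and pairs them with MARC fields. The mapping is an illustrative assumption (author to 100 $a, title to 245 $a, publisher and date to 260 $b and $c); it is not the Library of Congress crosswalk and not a mapping prescribed by the TEI Guidelines. The sample header and the name header_to_marc_like are likewise invented.

```python
import xml.etree.ElementTree as ET

# Invented sample; only the file description is shown.
SAMPLE_HEADER = """\
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Collected poems: an electronic edition</title>
      <author>Example, Author</author>
    </titleStmt>
    <publicationStmt>
      <publisher>Hypothetical Electronic Text Center</publisher>
      <date>2001</date>
    </publicationStmt>
    <sourceDesc>
      <p>Transcribed from a printed edition.</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>
"""

# (path within the header, MARC tag, subfield code): an assumed mapping
# chosen for illustration only.
CROSSWALK = [
    ("fileDesc/titleStmt/author", "100", "a"),
    ("fileDesc/titleStmt/title", "245", "a"),
    ("fileDesc/publicationStmt/publisher", "260", "b"),
    ("fileDesc/publicationStmt/date", "260", "c"),
]


def header_to_marc_like(xml_text):
    """Return (tag, subfield, value) triples for header elements that are present."""
    root = ET.fromstring(xml_text)
    fields = []
    for path, tag, code in CROSSWALK:
        element = root.find(path)
        if element is not None and element.text and element.text.strip():
            # The value is copied verbatim; no cataloging rules are applied.
            fields.append((tag, code, element.text.strip()))
    return fields


if __name__ == "__main__":
    for tag, code, value in header_to_marc_like(SAMPLE_HEADER):
        print(f"{tag} ${code} {value}")
```

The sketch copies whatever the encoder typed, which is exactly the limitation described above: the result is only as good a catalog record as the encoder's adherence to cataloging rules allows.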

Over time, the progression of the TEI header has been towards greater consistency and compatibility with traditional library cataloging and greater syntactical congruence with MARC. It is conceivable that the TEI header will evolve such that it would carry detailed encoding, profile and revision information, but would point to a MARC record that would contain the bibliographic description.

The TEI Header supports a number of field categories which cannot be captured in MARC. For example, the change history section provides a structure for logging changes made to an electronic text, including the date, the responsible party and the nature of the change. The source description (sourceDesc) within the file description (fileDesc) allows for a detailed and richly content-designated description, particularly for non-print sources. The encoding description (encodingDesc) provides for a lengthy and detailed description of the encoding of the electronic file, including data about the project, the purpose for which the file was created, the editorial decisions that were made, and the transcription practices that were used.
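
The change history described above can be read back as a simple log. The Python sketch below assumes each change is recorded as a date, a responsibility statement (respStmt containing a name), and an item describing the change, which is the layout assumed here for headers of this period; later versions of the Guidelines may record the same information differently. The entries are invented and the name change_log is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented revision history; the <date>/<respStmt>/<item> layout is an assumption.
REVISION_DESC = """\
<revisionDesc>
  <change>
    <date>2001-06-15</date>
    <respStmt><name>A. Encoder</name><resp>encoder</resp></respStmt>
    <item>Initial markup of the text.</item>
  </change>
  <change>
    <date>2001-07-02</date>
    <respStmt><name>B. Proofreader</name><resp>proofreader</resp></respStmt>
    <item>Corrected transcription errors in the first chapter.</item>
  </change>
</revisionDesc>
"""


def change_log(xml_text):
    """Return one dict per <change>: date, responsible party, nature of the change."""
    root = ET.fromstring(xml_text)
    entries = []
    for change in root.findall("change"):
        entries.append({
            # findtext returns the default when the element is absent.
            "date": change.findtext("date", default="").strip(),
            "responsible": change.findtext("respStmt/name", default="").strip(),
            "change": change.findtext("item", default="").strip(),
        })
    return entries


if __name__ == "__main__":
    for entry in change_log(REVISION_DESC):
        print(entry["date"], "|", entry["responsible"], "|", entry["change"])
```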

Unlike METS, the TEI Guidelines do not specify a particular approach to the problem of fidelity to the source text and recoverability of the original, but they do provide for encoding the typographic and linguistic characteristics of the text rather than a detailed mark-up of the layout or fine distinctions of the manuscript.

TEI does not restrict combining objective and subjective information in the encoding. The Guidelines provide a means of encoding both the representation of the text and its interpretation and analysis.

Implementations:
(see http://www.tei-c.org/applications/index.html)

• University of Michigan Digital Library Program
• Making of America Project
• Women Writers Online: Brown University
• Electronic Text Center: University of Virginia

Useful links:

• TEI home page: http://www.tei-c.org/
• TEI Guidelines: http://www.tei-c.org/P4X/
• International metadata initiative: http://lcweb.loc.gov/catdir/bibcontrol/caplan_paper.html
• Guidelines for electronic encoding and interchange: http://www.hti.umich.edu/t/tei/
• Teach yourself TEI: http://www.tei-c.org/Tutorials/
• Projects using the TEI: http://www.tei-c.org/Applications/

 



[1] OCLC Systems and Services, v. 17, no. 3, p. 117.
[2] Guidelines for Electronic Text Encoding and Interchange (TEI P3), p. 2
[3] TEI Guidelines as posted on the University of Illinois at Chicago web site: http://www.uic.edu/orgs/tei/p3/
[4] Priscilla Caplan, “International Metadata Initiatives: Lessons in Bibliographic Control”, p. 2 Conference on Bibliographic Control in the New Millennium (Library of Congress) http://lcweb.loc.gov/catdir/bibcontrol/caplan_paper.html

 

This page was last updated on 11/06/07
