Planning for an Archive of Dynamic E-Journals at MIT

A proposal for the Andrew W. Mellon Foundation

13 October 2000

The MIT Libraries applaud the interest of the Trustees of the Andrew W. Mellon Foundation in the challenge of maintaining our scholarly literature for future generations. At MIT, we are interested in tackling the archiving of a specific subset of the new scholarly literature, a medium we think will constitute the next generation of e-journal publishing, a medium we will call "dynamic e-journals." Dynamic e-journals are scholarly web sites which aim to share discoveries and insights, but do not feel bound by the conventions of "issues" and "articles" that have become standard in print. We believe that the dynamic e-journals currently published represent the leading edge of a broad range of dynamic content which we must learn to capture for future scholars.

It must be said that developing a model, let alone a mechanism, for archiving dynamic e-journals is an ambitious undertaking, particularly when publishers and libraries are still caught in the transition between paper and electronic publication, trying to decide whether to maintain solid footing in both worlds, or leap, possibly irretrievably, into the electronic sphere. Dynamic e-journals - whose number will inevitably grow - force us even closer to making the leap into the digital arena, since they have no print counterpart. We want to anticipate the growth of this new, truly digital mode of scholarship, to be ready with a model and a method to capture these uniquely valuable entities: the first dynamic e-journals. Libraries were not ready with an archiving solution for the first generation of e-journals, but if we move now we can be ready for the second.

This move will not be easy, for the issues involved in archiving dynamic e-journals are complex; indeed, while the technical, procedural, and philosophical issues spawned by the need to preserve electronic journals in their traditional form are significant and daunting, they are dwarfed by the complexities of preserving dynamic e-journals. Because dynamic e-journals leave behind the conventions of print, they explode the archiving challenges. That it took some time for e-journals to reach this point is not surprising. E-journals have followed the pattern common in cases of technological change, in that the first technological innovations mimicked the previous, print-based technology: the first e-journals bundled articles into issues and often were presented as digital images of print pages. E-journals then evolved into a hybrid model which shared features of both the old and the new technologies: they were characterized by online structures resembling print in organization, but with HTML-based interlinkages. We now find ourselves grappling for the first time with a form that fully leverages the functionality of the new technology.

Dynamic e-journals, with all their complexity, are our future; we want to meet the archiving challenges they represent so that we can be sure we are ready to make the newest form of e-journal dependably available to future scholars. Ignoring the archiving challenges and focusing exclusively on simpler problems first will mean that the very earliest examples of a unique form of scholarly communication may not be available to future scholars. This would be a great loss not only to scholars in the disciplines covered by the dynamic e-journals, but also to future historians and sociologists, and scholars in related disciplines.

Dynamic e-journals

On the crest of this second wave of e-journals, MIT Press officially launched CogNet at the end of September 2000 (http://cognet.mit.edu). MIT Press believes CogNet is "the future of journals publishing." CogNet strives to create a community for researchers in cognitive and brain sciences. It offers a gateway for scholars' research, teaching, and community needs by acting as a central repository for important traditional resources in cognitive and brain sciences (including books, journals, conference proceedings, and reference works); forming dynamic partnerships with all of the participants in the information chain for cognitive and brain sciences, including scholars, professional societies, academic departments, and other publishers; and providing customized services, all in a dynamic, constantly updated environment. Professor Terrence J. Sejnowski of the Salk Institute and the University of California, San Diego, has said that CogNet "is an important new venture that brings together many different resources for the cognitive and brain sciences. By bringing information directly to researchers and students and allowing rapid dissemination of important new research directions, CogNet is bypassing the slow and expensive efforts in the commercial publishing world. A new model for how scientific publishing will look in the next century is already being tested today in CogNet."

And the MIT Press is not alone. Other publishers have also begun to invent this second wave of e-journals, including Columbia University Press with EarthScape and CIAO (Columbia International Affairs Online). Columbia EarthScape is billed as "an online resource on the global environment." It offers continuously updated information on research and education, including a mix of traditional resources such as conferences, journals, and books, and resources "born digital," such as databases, datasets, and a new quarterly publication, "Earth Affairs Magazine," as well as sample syllabi and classroom models. It provides, like CogNet, a central point for access to relevant links and to a multiplicity of resources. Columbia EarthScape has set the goal of "transforming the way researchers, teachers, students, and decision makers get critical information in the Earth sciences and environmental policy." According to their website, the service "creates, selects, and links the widest range of Earth-systems resources available online."

Columbia University Press has also launched CIAO, which, according to the web site (http://www.ciaonet.org/), "is designed to be the most comprehensive source for theory and research in international affairs. It publishes a wide range of scholarship from 1991 on that includes working papers from university research institutes, occasional papers series from [non-governmental organizations], foundation-funded research projects, and proceedings from conferences." Like its dynamic e-journal counterparts, it aims to offer a dynamically updated, wide-ranging, centralized access point to a variety of resources in a clearly-defined subject area.

In a similar effort, the American Association for the Advancement of Science (AAAS) has collaborated with the Stanford University Libraries and The Center for Resource Economics/Island Press to develop a "Knowledge Environment" to support research in signal transduction. The Signal Transduction Knowledge Environment (STKE) (http://www.stke.org) is intended to "systematize the consensus knowledge within a scientific domain and to facilitate users' access to that knowledge." The AAAS' idea is that in a Knowledge Environment like the STKE, "access occurs through searching, browsing, and current awareness features combined with user-friendly graphical interfaces. KEs combine primary and review literature with more dispersed sources of 'how-to', 'what-is', and 'where-to' knowledge. Specific electronic tools that facilitate entry of information into the underlying databases are also being developed as part of the concept." Like CogNet, the STKE is intended to harness the full potential of the internet to improve access to current information and the speed and breadth of scholarly exchange. Like CogNet, it is a hybrid, offering access to some traditional forms, but in a new dynamic context that is experimental and evolving.

Vanderbilt University is considering a new publishing venture that could evolve into a dynamic e-journal. Vanderbilt's initiative would involve partnering with scholarly societies and universities to host a site supporting the field of archaeology. According to Paul Gherman, University Librarian, the site would include a preprint server, current journals in the field, monographs, core texts, reports of excavations, maps, and images of artifacts.

These examples make it clear that publishers are beginning to embrace the idea that the web and new media open up new mechanisms for attracting authors and building relationships with their customers far beyond what was available to them in print. Dynamic e-journals are among the first of these new mechanisms, encouraging exchange between readers and authors, and fostering the development of a community of interest around the core of the journal. They depend less on the bundle of articles called an issue, and more on discipline-based contexts which evolve out of the discussions of articles and other submissions.

Dynamic e-journals build community through this creation of context and the potential for dynamic engagement. They fully harness the potential power of the digital environment for scholarly communication, capturing discussions that occur at professional meetings and field trips, weekly seminars, and lab visits for a geographically scattered audience. In so doing, the dynamic e-journal seems, ironically, to bring us full circle from the earliest days of research and scholarship, when small groups of interested researchers sent their papers to each other, forming communities to exchange ideas. This method worked when the number of scholars was limited. Dynamic e-journals graft this same sense of intense, focused community into today's vastly more complex and larger scholarly arena, offering widespread groups of scholars a forum for an immediate exchange, a central source for information, and allowing for contributions ranging from informal to formal. Dynamic e-journals have the power to offer, potentially, both the potency of real-time exchange and a forum for in-depth inquiry.

We have chosen to retain the term "journal" in our description of this new online publication form, because we believe that the essence of the scholarly journal lies not in the container-specific definition of journal as something published in discrete units over time, but rather in terms of the purpose -- the essence -- of what we have known as the scholarly journal. This purpose has been to record, distribute, and provide quality control for scholarly research. The particular package, or container, which was based on the constraints of print publication, was the familiar numbered journal. Shedding the demands of the print container, the soul of the journal is now freed to take on new shape. Dynamic e-journals are the new shape. We expect them to serve as the heart of the system of scholarly communication in the twenty-first century, as the traditional journal did (and still does, for now) in its time.

The existing dynamic e-journals make it clear that journals have already begun evolving into substantially different entities that do more than just distribute standard research articles that have been peer reviewed. New types of content, tools, and services have started to emerge, such as: posting of articles that have not yet been peer reviewed, for open review; publication of articles as completed, prior to publication in issues (being done currently by the American Chemical Society); working papers; articles of longer length than standard research articles; negative results; datasets; computational models; and aggregations of content from a number of different journals into new collections. Additional services -- such as letters to the editors, links to abstracting and indexing services, and material from other journals, bibliographies, indexes, related links, etc. -- will be bundled with the content in order to keep the journal, albeit in a more dynamic form, as the central tool for publication of new research.

It could be argued that at this stage in the development of dynamic e-journals, the new "container" is primarily an amalgam of traditional publication forms, with some added functionality. We believe that even if dynamic e-journals currently contain pieces borrowed from the print world and from traditional forms, such as key reference works, they will evolve beyond these forms. It seems inevitable that once freed from the constraints of print, dynamic e-journals will begin to tap the potential of the digital world and will become less and less like the print entities they may currently include. Even as aggregations, dynamic e-journals constitute a new "species" of publication: with the added "born digital" elements as well as added functionality and opportunity to comment and interact with the material, with the connectivity and context offered by dynamic e-journals, the whole is bigger than the sum of its parts. And, above all, it is important to begin resolving the archiving issues that are inherent in such multifaceted and dynamic tools while they are still evolving.

Archiving Dynamic e-journals

Though dynamic e-journals bring benefits to authors and readers that cannot be achieved in print, their shape-shifting nature presents unique demands for anyone trying to capture their content. As noted above, the challenge is especially complex for dynamic e-journals. This is true for several reasons:

Given these challenges, working through the "Minimum criteria" included in the call for proposals, and determining how to meet these with regard to dynamic e-journals, would be a major part of the planning we would engage in over the next year. We would need to work on a series of questions, including: How could we define our archival mission? What would be the appropriate deposits? How much context will be necessary for presenting the content of a dynamic e-journal to future scholars? What elements merit archiving, according to what means? What auditing techniques and agents would test our fulfillment of mission? With whom could we network to ensure broader preservation of this information? Are there significantly different copyright implications due to the dynamic nature of these e-journals?

MIT Strength and Timing

Archiving this content demands a close working relationship with the publisher and possibly even new techniques that a publisher could use to push archival content out to an archive site as it is generated within the dynamic site. As we come to understand the strategies that work for gathering this content, we imagine we could define them in ways that would allow publishers to work with more remote partners. But to develop the strategies will certainly take an established base of trust.

The MIT Libraries and the MIT Press have a relationship we believe could sustain such an effort. (This relationship is strengthened by the fact that both the Libraries and the Press report to the same individual, Ann Wolpert, Director of Libraries.) We expect that what we establish as fair ground rules for archiving of dynamic e-journals could be used as a template for building similar relationships with other publishers, particularly university presses. We would test this by reaching out to a few other publishers, such as Columbia University Press, to explore the realities of expanding the model developed at MIT.

A year of working out the appropriate relationships and technical details will also put the MIT Libraries in a good position to leverage the digital repository infrastructure we are building as part of our DSpace project to house the archived e-journal content. Since we will be depending on this infrastructure as the long-term home of our own digital output, we would welcome the opportunity to submit the process to the auditing required to satisfy both ourselves and our partners that it serves as a safe repository for dynamic e-journals.

The MIT Libraries also participate in the Northeast Research Libraries consortium (NERL). Other NERL institutions are engaged in similar e-journal archiving projects. We plan to share information about strategies and tools with them in a supportive environment that emphasizes mutual learning. In addition, we will use the body of NERL institutions as a forum on the issue of acceptable assurance of reliability over time.

Planning Process

We feel that this project would be best managed by a full-time project planner who could coordinate efforts across the libraries and with our publisher partners. This planner would be responsible for three broad activities:

(1) surveying the existing e-journal environment and negotiating partnerships with appropriate publishers of dynamic e-journals;

(2) exploring the strategies and technologies we might apply to archiving dynamic e-journals, understanding the legal issues, and convening meetings of experts to illuminate the challenges; and

(3) developing a specific plan for building a sustainable dynamic e-journal archive at MIT and nurturing investment in the plan by key stakeholders.

While the project planner and other staff of the libraries involved in this effort would certainly be welcome to experiment with prototypes of the archiving process, our real focus for the planning year would be on the partnerships, strategies, and plans which have to be put in place for a serious development project to get under way in future years. From a raw technology perspective, we hope to be able to take advantage of the DSpace repository infrastructure (http://web.mit.edu/dspace) so that it becomes a home for e-journal archives as well. We also bring to the project our limited experience collecting a mirror of the three existing e-journal-only publications of the MIT Press.

Our working relationship with MIT Press is quite strong, but this project will require broader participation than the Press alone can provide. Our survey effort will seek to identify other publishers developing second-wave dynamic e-journals. Ideally, we would identify two other publishers with whom to forge relationships. We will also want to make sure that our strategy is planned with reference to the direction other NERL institutions pursue on this topic. We hope that over time archives can interoperate sufficiently to serve as fail-safe repositories for one another and auditors of each other's practice.

Work Plan

Quarter 1: Our activity will focus on recruiting a Project Planner, negotiating for publishing partners, analyzing the technical and legal challenges in detail, and thoroughly reviewing prior applicable work. Along with partnering with the MIT Press, we anticipate approaching both Columbia and Stanford as partners for the full archiving project, and identifying experts in related, applicable areas, e.g., archiving dynamic information such as web sites and listservs.

Quarter 2: We will bring together partners and experts for the first of two workshops exploring the complexities of the issues undertaken, and identifying the key technical and legal hurdles which need to be addressed. The Project Planner will develop and lead teams of staff from MIT and its partners in developing the technical specifications required for success, based on the Minimum Criteria for an Archival Repository of Digital Scholarly Journals.

Quarter 3: This quarter will provide the necessary time to explore, test, and refine the technical specifications developed previously. It will also allow us to determine whether the DSpace project provides a suitable infrastructure for supporting the dynamic e-journal archive, and whether other applicable work, e.g., LOCKSS, can be used.

Quarter 4: We will reconvene our partners and necessary experts to explore the work done, and provide ample time for discussion, review, and exploration of outstanding issues. Based on the outcome of this final two-day workshop, we will make any necessary revisions to the proposal.

Budget

I. Qualified staff will be essential to the success of this proposal. A full-time Project Planner is a requirement. The Project Director will be Carol Fleishauer, Associate Director for Collections Services. She will be the key liaison from the Libraries, providing guidance to the Project Planner and coordinating the participation of other MIT Libraries staff with expertise in licensing, e-journals, preservation, technical, and user issues.

II. Equipment and software will be necessary for the Project Planner.

III. Travel by the Project Planner to partnering organizations will be required to build understanding of the dynamic e-journal environments produced by each of the publishers. In addition we anticipate that solicited partners will need to visit the MIT team on occasion.

IV. Workshops will be important activities to facilitate useful discussion and analysis of the technical, legal, and social challenges associated with this project. We propose two workshops, each lasting two days. The first will help define the specific issues that make dynamic e-journals different from normal e-journals. The second will help review the progress made in the planning year and ensure that the final proposal is relevant and feasible. The workshops are intended to bring together key players from the "dynamic journal" publishing world with key technical people who have experience in (or are experts in) archiving dynamic information of various types (web sites, listservs, publications). Outcomes of the workshops would be to: define some of the problems inherent in archiving dynamic journals; develop possible strategies for approaching these problems; create connections between MIT Libraries and key players that will result in longer-term dialogues; develop archiving "awareness" within the publishing community; brainstorm possible solutions.

V. Funds will cover normal workshop expenses, e.g., travel, food, stipends, and consulting fees for legal and other experts.

VI. Communication charges will cover expenses incurred by the Project Planner.

VII. Other charges are nominal costs for related administrative expenses.

Outcomes

The primary product of this grant will be an MIT proposal for a full Electronic Journal Archiving Project. By the time we submit such a proposal, we also expect to have an explicit list of publishers who have signed an appropriate document indicating their readiness to pursue the project with us. Finally, we will have clarified the technical strategy we plan to use to capture such dynamic e-journals.

Contacts

We have a small team of staff working on this project at MIT. The main contacts for this proposal are:

Eric Celeste, efc@mit.edu, 617-253-8184

Carol Fleishauer, fleish@mit.edu, 617-253-5962

Teresa Ehling, ehling@mitpress.mit.edu, 617-253-1672