MIT Libraries
Annual Report FY 2007-2008

Technology Planning and Administration

The past year was once again characterized by big changes in the global information technology landscape that affect many aspects of the MIT Libraries' business.  We saw continued innovation in Web search engine functionality, significant progress in scanning historic print collections (e.g., the Google Books Library Project), and major new open-source software initiatives in the library and related domains, ranging from personal products (e.g., the Zotero bibliography management software) to enterprise systems (e.g., a Duke University initiative to design an open-source research library management system).  Service-Oriented Architecture (SOA) and Software as a Service (SaaS) models of software development have become routine, further dating our existing large-scale business systems.  "Cloud computing" infrastructure and services, from large-scale storage to High-Performance Computing facilities, are rapidly expanding the options available to libraries in ways that remain largely unexplored.  At the same time, our existing technology experiments - for example, institutional repositories for digital scholarship exemplified by DSpace, or next-generation search and browse tools such as those created by the Libraries' SIMILE Project - are reaching a new level of maturity in both the technology and the library services that they are beginning to support.  And the need for science and humanities "cyberinfrastructure" at research institutions is now so well documented and widely accepted that major government funding agencies, including the NSF and NEH, are providing significant funding to begin to build that infrastructure.

Staying ahead of all this change at all times is no longer possible, but becoming more flexible and responsive to it is.  While the MIT Libraries have made significant incremental progress on many technology fronts this year, our biggest effort was a reorganization of the Libraries' technology staff to improve our capacity to deal with technology initiatives in the coming years.  Prior to 2008, the technology staff of the MIT Libraries was concentrated in the Technology Directorate and considerable effort was spent to communicate and coordinate between the technology experts and the rest of the Libraries' staff, with varying success.  In 2008 we reorganized these staff, recognizing the pervasive nature of technology use and innovation throughout the Libraries, by creating three new groups: Technology Operations, based in the new Information Resources directorate; Technology Services, based in the Public Services area; and Technology Research and Development, which remains in the Technology directorate.

Technology Operations is primarily responsible for providing high-quality, reliable and secure support and maintenance for all of the MIT Libraries' production systems and technology-based services.  In particular, it provides hardware and software infrastructure (servers, desktops and network), systems administration and integration, and workflow automation to support all units of the MIT Libraries.  As our main technology support operation, this department is now part of Information Resources, the directorate that encompasses other major library operations including acquisitions, cataloging and serials processing, collection management and the archives.  Bringing together these large operations departments allows more flexibility in staffing arrangements as priorities shift between managing print and digital materials, and as we implement new types of library management systems that are more responsive to constantly changing workflows.

Technology Services includes the staff focused on the Libraries' technology-based services.  This currently includes the DSpace (i.e., Digital Library) Product Manager and the Web Manager and Usability Specialist, but the group can now grow to include product and service managers for a range of the Libraries' technology-based offerings.  Located in Public Services, these staff now have improved access to the MIT faculty and students, as well as the librarians who work with them every day, so they can focus our products and services more effectively based on direct user feedback.

Finally, the Technology Research and Development group is in the Technology directorate and focuses on research into new technology-based systems and services, and on professional software development for computer systems commissioned by the Libraries.  Staff in this group have significant technical analysis, project management, and software development expertise, and without the need to support operational systems they can focus on meeting ambitious project schedules.  In this area we created a new Software Development Group and appointed its new director, Richard Rodgers, at the close of FY08.  This group will provide software development services for both the Libraries' internal systems (e.g., the library catalog or electronic resource management system) and the grant projects requiring in-house development.

With this new distributed, cross-organizational model of technology expertise and responsibility we can begin to streamline work to identify new opportunities and priorities (Technology Services), develop the new systems they require (Technology R&D), or implement commercial offerings and provide focused production support for deployed services (Technology Operations).  Staff growth can occur quickly across this spectrum as new needs are identified or priorities change.  Technology staff have clearer priorities for their work and are empowered to stay focused on those priorities.  And the entire staff will have a clearer sense of who is responsible for each aspect of technology-based services, whom to talk with about a new idea, and how our process for innovation functions. 

Technology Operations for FY08

This past year the Libraries' technical infrastructure and production systems operation was managed by the Systems and Technology Services department, now part of Information Resources.  They are responsible for managing the computing equipment, systems and services that support the work of the Libraries' staff and users.  Their mission is to provide an excellent and stable production environment, and to plan and implement improvements that provide benefits for the immediate future.

New initiatives

  • The long-awaited new search interface for licensed electronic resources, called Vera Multi-Search, was launched as a beta in the fall of 2007, and will move into full production during FY09.  The Multi-Search project has been particularly exciting as an application of "design thinking," representing a new way for the Libraries to envision and work on a project for public access to our resources.
  • Major progress was made this past year in establishing the DOME, the MIT Libraries' digital library collection.  The DOME is built with the DSpace software platform but covers a different range of collections than the current DSpace@MIT service, primarily digitized library and archives collections.  Several projects were begun in FY08 to populate the DOME, and a large number of digital images from the Rotch Art and Architecture Library are now available directly or via Stellar, MIT's course management system.
  • In July 2007, an OCLC WorldCat Study Team was formed to explore the feasibility of a WorldCat Local implementation at MIT, as a possible replacement for, or supplement to, the Barton Online Public Access Catalog system now in place.  The group recommended implementing a six-month pilot of WorldCat Local as a "beta" system.   As we end FY08, the team is conducting initial testing of our local implementation and plans to make the beta available to the public and conduct the usability tests in the fall semester.
  • Working with the Institute Archives, staff tested and evaluated the Archivist's Toolkit, an open-source software system developed by UCSD and NYU for creating and managing finding aids for archives collections.  A recommendation to adopt the system was approved in FY08 and will be implemented in FY09, allowing the Institute Archives to move their extensive paper finding aid collection online and into modern management practices.  Once the finding aids are in the Archivist's Toolkit they can be exported in a variety of formats to support online public access to this valuable data.   
  • Progress on implementing Verde, a commercial electronic resource management (ERM) application developed by Ex Libris in partnership with the MIT and Harvard Libraries, continued with some delays this year.  Work was done to clean up discrepancies between our journal holdings data in the Ex Libris SFX Knowledgebase and the same data in Vera, our locally-developed ERM system, to support a future migration to Verde.
  • During FY08 the Libraries also began work with IS&T on a new e-authorization system: a set of Web Services for the MIT Roles database (representing faculty, staff, students and other members of the MIT community) that will enable fine-grained authorization of clients' access to the Libraries' licensed electronic resources.  There was also significant progress in the implementation of Touchstone, MIT's Shibboleth-based single sign-on service, for Aleph and ILLiad.  We hope that FY09 will see the full implementation of these systems to bring us into better compliance with our license terms and conditions.

Support for production systems

Infrastructure

The MIT Libraries continue to maintain most of their computer systems in-house, with a combination of UNIX and Microsoft Windows servers and related hardware.  We leverage IS&T services when possible, and in FY08 we converted server backups to the IS&T-supported TSM backup system, achieving significant savings over local tape backups.  This year we consolidated several servers and migrated to the i386 Linux platform and iSCSI disks for UNIX-based applications, so that we now have an inexpensive COTS hardware environment.  All networked files are stored on the shared SAN storage hardware for improved access and better reliability.  Our Windows hardware was also modernized and streamlined.  As a result we have achieved significant savings with improved service, and have a solid plan in place to maintain these systems for the next 3-5 years.  Finally, we experienced no significant down time for any production systems, with the exception of the aging, FileMaker-based Vera application, which has since been upgraded to new software to resolve the problem.

Electronic Resource Management

Our custom electronic resource management system, Vera, was subjected to increasing numbers of attacks by network robots during the summer of 2007, causing service disruptions for the better part of FY08.  In May the system was upgraded to a current version of the FileMaker software by an outside consultant to eliminate the problem.  As our dependence on this application increases we intend to migrate both the public access component and the business back end component to new systems during FY09. 

Barton Integrated Library System

The Barton system, based on the Ex Libris Aleph product, continues to be the mainstay of the MIT Libraries' technology-based systems.  It supports almost all of our normal business processes (acquisitions, cataloging, serials processing, circulation, etc.) as well as providing public access to the catalog of our print and many of our digital collections.  In FY08 the major change to Barton was the addition of a new 'BookPage' service that allows patrons to request local delivery of library books.  This service is described in more detail elsewhere in this report.

DSpace@MIT

During FY08 there were two major upgrades to the DSpace software platform, leading up to a version 1.5 rollout that will be completed in early FY09.  This version includes significant amounts of software developed by MIT staff to move the platform towards the new architecture for DSpace that was specified in FY07.  It has greatly improved modularity and scalability in a number of areas, as well as a new UI framework, Manakin, originally developed at Texas A&M.  The system was migrated to new hardware, stabilized, and now holds almost 30,000 items of Open Access research content, including more than 21,000 digital MIT theses. 

Support for staff and public computing

  • This year, most of the Libraries' public computers were migrated to the centrally-supported Windows domain.  This allows more centralized maintenance of these workstations and has contributed to a safer and more comfortable environment for the public.
  • The process for department heads to request new computer hardware and software was extensively revised this year, to simplify routine purchases and require more analysis of service implications and priorities for major purchases.
  • In FY08 we concluded a major planning effort for a Libraries-wide Windows Vista rollout to be implemented in FY09.  This included creating an inventory database and significant hardware purchases to accommodate the new operating system.

Technology Research and Development for FY08

The MIT Libraries' Digital Library Research Group continues to work on a number of grant-funded projects that address different aspects of technology related to knowledge management and digital curation.

Ongoing Research

PLEDGE

During FY08 the Group concluded a multi-year project funded by the National Archives and Records Administration and the NSF called PLEDGE (PoLicy Enforcement in Distributed Grid Environments).  The PLEDGE Project was a collaboration between the MIT Libraries and the San Diego Supercomputer Center to investigate how digital content management (or curation) policies affect digital research archives at every level, and how those policies should be captured, encoded, enforced, and shared across preservation environments.  The project integrated the DSpace digital archive system with SDSC's Storage Resource Broker (SRB) and later iRODS (a new rules-based preservation system), and the Harvard DataVerse archive for statistical datasets.  By using MIT's digital collections and existing policies as examples, we were able to test a range of archival activities and how to automate and audit them at large scale in a distributed, networked environment.  A set of real-world policies drawn from DSpace@MIT was modeled into an RDF ontology called Rei, and the DSpace system was modified to capture, store, and transmit relevant policies to third-party systems like iRODS as part of a distributed data management strategy.  While the project concluded successfully, there is much work left to be done in this area, and a new proposal was recently submitted to HP Labs to continue the work with new partners and new funding.

SIMILE

FY08 saw the start of the final year of the SIMILE Project, funded by the Mellon Foundation in 2005.  SIMILE is a long-term collaboration of the MIT Libraries with MIT's CSAIL and the W3C to develop next-generation metadata discovery (i.e., search and browse), navigation, and display tools based on Semantic Web technology standards such as RDF (a universal data model that supports Web-scale data integration and interoperability).  The SIMILE mission has been to tackle the problem of Web-scale data integration and interoperability by working on all parts of the problem simultaneously - tools to capture, process, search, browse, visualize, and navigate data - and for all levels of scale, from small personal collections to enterprise-sized collections such as libraries maintain.
This was a remarkably productive year for the project, including the migration of several of its more successful tools to independent open-source software communities hosted at Google Code.  These included:

  • Timeline, a Web widget for visualizing time-based events, like Google Maps for time-based information.
  • Timeplot, another widget for plotting time series and laying time-based events over them.
  • Welkin, a graph-based RDF visualizer.
  • Gadget, an XML inspector designed to create useful summaries of vast pools of XML data.
  • Exhibit, a Web application that allows users to create interactive data-rich Web pages without ever touching a database or a web server, or doing any programming.

Other products of the SIMILE Project that were improved in FY08 include the 'Longwell' RDF faceted browsing engine, a data schema translation Web Service called 'Babel', an extension to the popular Thunderbird email client called 'Seek' that allows users to do faceted browsing of their email, and a tool called 'Solvent' that lets users scrape ordinary Web pages to produce RDF data from them.
These products are now used in hundreds of Web sites, from the New York Times and the BBC, to major open-source software projects like 'Zotero' (a bibliography management tool that extends the popular Firefox Web browser), to individual historians working with small collections of incredibly rich scholarly data.  The MIT Libraries are beginning to use this technology both for data management projects (e.g., the FACADE Project described below) and to pilot new services that will allow MIT faculty to manage their data more effectively.
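
To illustrate the idea behind RDF-based data integration and faceted browsing described in this section, the following is a minimal sketch in Python.  It is not code from Longwell or any other SIMILE tool: the item identifiers, property names, and the functions merge, facet_counts and select are all hypothetical, and plain tuples stand in for real RDF triples.  It only shows why a shared (subject, property, value) data model makes combining sources simple, and what counting and filtering by facets means.

```python
from collections import Counter, defaultdict

# Hypothetical (subject, property, value) triples from two independent sources.
catalog_triples = [
    ("item:1", "title", "Stata Center drawings"),
    ("item:1", "type", "image"),
    ("item:2", "title", "Thesis on RDF"),
    ("item:2", "type", "thesis"),
]
repository_triples = [
    ("item:2", "year", "2007"),
    ("item:3", "title", "Timeline demo"),
    ("item:3", "type", "image"),
    ("item:3", "year", "2007"),
]


def merge(*sources):
    """Combine sources by set union; possible because all share one data model."""
    merged = set()
    for source in sources:
        merged.update(source)
    return merged


def facet_counts(triples, prop):
    """Count how many triples carry each value of the given property."""
    counts = Counter(value for _, p, value in triples if p == prop)
    return dict(counts)


def select(triples, **facets):
    """Return the sorted subjects that match every requested facet value."""
    by_subject = defaultdict(set)
    for subject, prop, value in triples:
        by_subject[subject].add((prop, value))
    return sorted(
        subject
        for subject, pairs in by_subject.items()
        if all((prop, value) in pairs for prop, value in facets.items())
    )


graph = merge(catalog_triples, repository_triples)
print(facet_counts(graph, "type"))
print(select(graph, type="image", year="2007"))
```

In this sketch the year recorded for item:2 by one source is attached to the title and type recorded by the other without any schema mapping, and selecting on two facets at once narrows the result to the items that satisfy both.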

FACADE

During FY08 the Libraries' FACADE Project, funded by the Institute of Museum and Library Services in 2006, completed its first year of activities.  FACADE (Future-proofing Architectural Computer-Aided Design) is a multi-year project to work with the School of Architecture, and Professor William Mitchell in particular, on the challenges and opportunities of collecting and preserving digital CAD models and related material for important architecture of the twenty-first century.  Our flagship project is the MIT Stata Center, which was designed by Frank O. Gehry using a state-of-the-art 3D CAD system called CATIA.  The records of that building include 3D CAD models, 2D CAD drawings, and myriad digital files related to the project from its initial design to its final reality.  For architects and architectural historians of the future, having access to archives of this type will, of course, be critical, and no other research library anywhere is tackling this problem.

In FY08 we acquired two new architectural collections: the United States Institute of Peace, designed by Moshe Safdie Associates and now under construction in Washington, D.C., and the Caltrans District 7 Headquarters building in Los Angeles, designed by Thom Mayne of Morphosis.  Both projects made extensive use of 3D CAD modeling tools and produced tens of thousands of digital artifacts that we will process, organize, archive, and make available for research access via a new prototype system that we developed using DSpace and various SIMILE tools.  To create the prototype we developed a new RDF ontology for architecture projects - the Project Information Model or PIM ontology - and wrote software to process the archives.  We also did significant research on preservation strategies for digital 3D CAD formats, both proprietary and open standard-based, which will represent a major contribution to the field of digital preservation.  Finally, we changed the DSpace digital archive platform to integrate with various external services that track digital file formats (e.g., the PRONOM registry of the UK National Archives and the Global Digital Format Registry under development by Harvard University and OCLC).

Other Strategic Initiatives

Cyberinfrastructure

A continuing area of intense interest in the Library domain is the emerging cyberinfrastructure for scientific and humanities research computing.  The past year saw the conclusion of deliberation by an ARL e-science task force, in which MIT participated, and the publication of its recommendations to the community.  It's safe to say that the MIT Libraries are at the forefront of cyberinfrastructure planning, for MIT in particular and the library community in general.  During FY08 we participated in numerous planning and strategy meetings on the subject and prepared an NSF grant proposal for a new, extremely ambitious 'DataNet' program to build exemplar national and global data research infrastructure organizations that provide unique opportunities to communities of researchers to advance science and/or engineering research and learning.  MIT's proposal was among the final five selected for full review, but was ultimately unsuccessful in that round and will be revised and resubmitted in FY09.  Whatever the outcome of the NSF proposal, the Libraries have determined that there is an unmet need at MIT to provide expert support to researchers in data management and long-term archiving, and that the Libraries can and should play a major role in providing that service.

The DOME Roadmap

The Libraries also continue to respond to the shifting environment around Open Access publishing and new services that will attract more open research content into our care so that we can leverage it in a variety of ways.  The DSpace Product Manager position, which was still new at the outset of FY08, has provided us with a roadmap for the DSpace service offering, including both 'traditional' content (i.e., Open Access scholarly publications and related research data) and digital library content (i.e., the scanned images in the DOME).  The roadmap combines data sources and new services that will help us position the Libraries and leverage our investment in the DSpace technology platform.  As an example of a new service that leverages technology investment and tests new lines of service, during FY08 we began designing a service called 'Citeline' that will support faculty bibliography capture, publishing, aggregating, and mining.  Citeline relies on both DSpace and some of the SIMILE Project's tools to accomplish this, and we have found that with these tools now in place we can more easily envision new services, prototype and pilot them, evaluate and assess them, and launch them as appropriate - the beginnings of a true rapid innovation model.

Digital Preservation

Protecting and preserving digital collections, whether born-digital or digitized from analog media, remains a central concern for the MIT Libraries.  Digital research content is very fragile and expensive to manage and preserve over time, and MIT (like most other major research libraries) has only begun to build the infrastructure, expertise and strategies for dealing with this now-pervasive material.  One concept we explored in FY08 was the creation of a Center for Digital Permanence to house a variety of activities in that area, from cutting-edge research like FACADE to new large-scale digital preservation operations on our existing collections.  We developed a white paper for the concept and began to share it with interested donors.  While the details are still in development, the creation of such a Center would allow the Libraries to build large-scale production preservation operations and conduct the necessary research and development to determine the right path forward.

The DSpace™ Platform

The DSpace open-source software platform continued to be in heavy use at MIT, and at approximately 400 other research universities and organizations worldwide.  In late FY07 we established a new 501(c)(3) organization - the DSpace Foundation - and hired its founding Director, Michele Kimpton, to lead the community of DSpace adopters to the next level of technology development and shared, cross-institutional services.  FY08 saw the solidification of the new organization and its board of directors, to the point where the MIT Libraries' formal role in managing the DSpace community has almost disappeared: the Foundation is based within MIT and the Libraries' Director chairs its Board of Directors, and we continue to have two DSpace software committers on staff, but the day-to-day responsibilities for managing the software and community have now shifted to the Foundation, and it is in excellent hands.  At the end of FY08 a new Technology Director - Brad McLean - was recruited to help with the DSpace 2.0 development initiative and to coordinate development efforts across the community.  With his recruitment the staff of the Foundation is now complete, and they are well on their way to full independence in the next few years.  The DSpace 2.0 development work has been funded by the UK JISC agency and that work will begin in the fall of FY09.  The last major event in the DSpace community during FY08 was the announcement of a formal collaboration with the Fedora community (another major open-source software repository platform in the academic sector) and its foundation, the Fedora Commons.  We feel that this collaboration will help both platforms and the entire research university community to advance further and faster in exploring and exploiting these new technologies.

Conclusion

As we begin FY09 we see the MIT Libraries better positioned to respond to the still rapidly accelerating changes in the technological landscape affecting us and our community of patrons.  We have a rational technology staffing model that supports both opportunistic and planned growth across the system with appropriate oversight.  We have more focused technological skill sets that play to our strengths and can be responsibly sustained.  We have increased expertise in senior management around technology planning and capacity, and this will really begin to manifest itself in FY09.  We have a strong and focused technology research agenda around digital content management and long-term digital preservation that builds on excellent relationships with senior MIT faculty in complementary areas.  And we continue to maintain a track record of good system performance as well as an excellent technology support ethic.  The MIT community, including the staff of the MIT Libraries, continues to place unprecedented demand on the capabilities and capacity of the Libraries' technology systems and staff, but we are responding well, and keeping the Libraries relevant and distinctive within MIT and around the world.

MacKenzie Smith
Associate Director for Technology


This page was last updated on 07/16/09