Engineering the Future of the Past

Blog about MIT Libraries' Digital Archives Work

Explaining Digital Forensics for Collecting Institutions

As a Charter Member of the BitCurator Consortium, the MIT Libraries is involved in outreach and education about how digital forensics can be used for understanding digital material and making it accessible.  As a collecting repository, the Institute Archives and Special Collections department of the Libraries has been using and testing digital forensics tools and methods for the past three years.

Kari Smith, Digital Archivist, is a member of the BitCurator Consortium Executive Board and is on the Advisory Board for the BitCurator Access project.  Through her BitCurator Community work, Smith has recently participated as an instructor along with Nancy McGovern and Cal Lee for an advanced Digital Preservation Management Workshop focused on Management Considerations for Digital Forensics: Implementing Tools and Techniques into Digital Preservation Workflows.

Smith will be presenting a three-hour hands-on workshop on using the BitCurator environment for the MIT IAP winter session on Friday, January 22, 2016.  Register for IAP Digital Forensics 1/22/16

Presenting Digital Archives and Preservation Tools and Systems!

Image: Jørgen Stamp, Creative Commons Attribution 2.5 Denmark license.

In an effort to inform my colleagues and keep myself up to date about the tools and systems we are using or assessing for our digital archives collections, I have been giving open presentations to MIT Libraries staff.

The first presentation was an overview of our needs, ecosystem, and a brief summary of each of the seven tools/systems to discuss.  Here is the presentation file:   DATools_MIT_overview_pt1

Other presentations covered the basics of ArchivesSpace, Archivematica, Access to Memory (AtoM), and BitCurator, and how we are thinking about where they fit in our Digital Archives Ecosystem.

Resources for archiving your personal digital data

Today I had the pleasure of contributing to the Radio Boston program, hosted by Meghna Chakrabarti and with Bina Venkataraman as her guest.

For information about how to preserve your digital data, follow up with these resources:

Personal Archiving at the Library of Congress
The Signal, blog at the Library of Congress

Taking time each year to go through your digital files and decide what to keep and what to delete can be easier if you do it during Preservation Week.  Check for programs, workshops, and activities happening in your local area.  Preservation Week is usually the last week in April; in 2016 it will be April 24-30.

In April, for Preservation Week 2015 at MIT Libraries, Jessica Venlet, Fellow for Digital Archives, and I hosted an activity, Connect the Clues! Exploring Personal Archives.  Read her Connect the Clues! blog post about the activity, which links to the presentation slides and resources handout. Check out these tips for archiving your content from social media sites, steps for finding, selecting, storing, and managing your digital files, and some information on legal and digital estate planning: Tips for Personal Digital Archiving, ver3.

Thanks to Radio Boston for hosting a great segment on a topic that affects all creators and users of digital data!

Providing Reference to Disk Images

This month I received a reference request that involved a digital collection we acquired some months back.  It’s not an unusual type of reference question but I thought I’d blog about it as a use case for digital archives. The request came from the office of origin so we provided the reference directly.

The requestor wanted to know if the collection contained correspondence about a specific topic.  The date range was 2006-2008 and perhaps also 2012-2013.  Happily, I had three distinctive names that I could use as search terms.  Also happily, the collection is a hybrid of print and digital material.  My colleague searched the print archives and did not locate the material.  Step two, the digital reference search, fell to me.

The material that I was searching was the data files from the donor's laptop. Because the material is under restriction by policy, we had not done much processing beyond triage and understanding the basic nature of the content for our collection description. For the original acquisition, we had captured the data files using Guymager as an E01 disk image.

So, the steps for answering the reference request involved:

  1. Using BitCurator tools to process the E01 disk image.
    1. Running BulkExtractor and then Fiwalk to find the “features” of the data on the disk.  This took a day.
    2. Attempting to run Fiwalk and Reports from BitCurator.  This took a day and stalled my computer (BitCurator running in VirtualBox timed out).
  2. Switching to Autopsy directly on my Windows machine to reprocess the E01 disk files and search for the specific terms using Autopsy’s feature extraction tool.
    1. This took a day but was successful.
    2. I was able to preview the files located by the search and see that almost all the hits were valid returns and would help to answer the reference request.
    3. Using the Export feature, I selected the relevant hits and exported a set of files for later.
  3. Using Autopsy, I also located the PST and OST files on the disk image and exported them to my temporary reference storage area on my off-line computer.
  4. Using Mozilla Thunderbird, I planned to search the email and attachments for more files that might help answer the reference request.
  5. In order to search the email files, I needed to transform the PST and OST files into the non-proprietary MBOX format.  This took about 2 hours.
  6. Once I had the files in MBOX format, I set up a temporary User Account and then following instructions on how to open MBOX files in Thunderbird off-line, I put the MBOX files into the appropriate User – Appdata – folders and restarted Thunderbird.
  7. Thunderbird took a while to parse the email – there were 2 years of messages and attachments in the main “Inbox” file.  I also loaded the off-line email folder’s “Personal” file (with 10 years of selected and filed messages and attachments).  This took the better part of a day.
  8. Once I had access to the email messages in Thunderbird (think Eudora Reader if you are not familiar with Thunderbird), I was able to repeat my search using my reference terms and located another several dozen hits that included attachments.
  9. Finally, after I had copied the messages / attachments into my temporary reference question storage folder, I went through and de-duped the results from the 5 search sets.
  10. In order to provide the search results, we decided to provide read-only PDF versions to the requestor – all the data files were amenable to that kind of transformation; none were audio, video, spreadsheets with formulas, etc.  We did notice that many of the early files (from 2005-2007) had “auto update” dates that were changing when opened for export to PDF.
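
The email search and de-duplication in steps 8 and 9 could in principle be scripted. The following is a minimal Python sketch under stated assumptions, not the process described above (which relied on Thunderbird's search and manual review). It assumes the PST/OST files have already been converted to MBOX, and it uses only the standard-library `mailbox` and `hashlib` modules. The function names `search_mbox` and `dedupe_by_hash` are hypothetical, chosen for illustration. The search covers headers and text parts only; binary attachments would still need a separate tool.

```python
import hashlib
import mailbox
from pathlib import Path


def search_mbox(mbox_path, terms):
    """Return (subject, date) pairs for messages mentioning any search term.

    Matching is case-insensitive and looks at the Subject/From/To headers
    and any text/* parts. Binary attachments are not inspected.
    """
    wanted = [term.lower() for term in terms]
    hits = []
    # create=False: fail loudly rather than silently creating an empty mailbox.
    box = mailbox.mbox(str(mbox_path), create=False)
    try:
        for message in box:
            chunks = [str(message.get(name, "")) for name in ("Subject", "From", "To")]
            for part in message.walk():
                if part.get_content_maintype() == "text":
                    payload = part.get_payload(decode=True) or b""
                    chunks.append(payload.decode("utf-8", errors="replace"))
            haystack = "\n".join(chunks).lower()
            if any(term in haystack for term in wanted):
                hits.append((str(message.get("Subject", "")), str(message.get("Date", ""))))
    finally:
        box.close()
    return hits


def dedupe_by_hash(paths):
    """Keep one file per distinct content, identified by SHA-256 digest.

    Files are visited in sorted path order, so the first path (alphabetically)
    is the one kept for each set of byte-identical files.
    """
    kept = {}
    for path in sorted(Path(p) for p in paths):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        kept.setdefault(digest, path)
    return list(kept.values())
```

A run over several exported result sets would call `search_mbox` once per MBOX file with the reference terms, then pass every exported file path to `dedupe_by_hash`. Note that byte-level hashing only catches exact duplicates; two copies of a document whose "auto update" date fields have changed would hash differently and would still need manual comparison.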

How does this compare with your process for providing reference to data that is captured as a disk image?  I’d love to get your feedback.  Contact me via the link on this page.

Engaging with Digital Archives Open Source Communities

Image: Jørgen Stamp, Creative Commons Attribution 2.5 Denmark license.

In addition to the work we are doing here at the MIT Libraries, we are also engaged with the global community of archivists and preservation folks developing open source tools for digital archives, digital forensics, and digital preservation.  Here are a few of the ways in which we are participating in communities to enhance, improve, and consider future integration and development of these tools.

ArchivesSpace Community

MIT Libraries is a Charter Member of the ArchivesSpace Community with Tom Rosko, Head of Institute Archives and Special Collections, as our official representative. ArchivesSpace is the next-generation archives information management application. Kari Smith is also a member of the Technical Advisory Committee (TAC) and the Chair of the TAC documentation sub-group.  These groups assist with prioritizing and improving the technical development and technical documentation for the software system.  Just recently, she was appointed to the cross-council Features Prioritization team, which includes development and program management staff from ArchivesSpace as well as a few members from the User Advisory Committee and the TAC.

BitCurator Community and BitCurator Access

As the only digital forensics environment and tool set created specifically for collecting institutions, the BitCurator environment and BitCurator Access projects are critical for digital archives.  We participated in the BitCurator-in-a-Box assessment prior to the release of ver 1.0 and also hosted Cal Lee and Kam Woods at the Digital Sustainability Lab and for a presentation to MIT Libraries staff and regional colleagues.  Kari participates in the BitCurator Consortium monthly teleconferences, during which users of BitCurator tools discuss their use of the environment and their thoughts about enhancements to it.  She was also recently appointed to the BitCurator Access Advisory Board.  BitCurator Access software tools will assist collecting institutions (libraries, archives, and museums) in providing web-based and local access to born-digital materials held on disk images.  Additionally, the Andrew W. Mellon-funded project led by Cal Lee at UNC Chapel Hill will explore transforming and using digital forensics metadata in collecting environments, and redaction of file items, metadata, and hidden data from disk images.

Archivematica users community

Although not a formal sub-group of the work by Artefactual Systems on the Archivematica digital preservation system, we are engaged with the intersections and integration of using Archivematica with BitCurator and ArchivesSpace.  Kari and Nancy McGovern, Head of Digital Preservation and Curation at MIT Libraries, are working on projects and with other users of Archivematica to discuss case studies and explore how data objects and metadata can travel between the software systems for optimal use and automation, and to meet our archives and preservation requirements. Much of this work is happening under the aegis of the Digital Sustainability Lab at the Libraries.
