Programme

CUTTING EDGE // Common Formats IJ lounge
Paper

Tuesday 17 September 2019, 13:30 - 15:00

Detailed Programme

13:30 - 14:00

Recovering '90s Data Tapes - Experiences From the KB Web Archaeology project

Johan van der Knijff (KB | National Library of the Netherlands)

The recovery of digital data from tape formats from the mid to late '90s is not well covered by existing digital preservation and forensics literature. This paper addresses this knowledge gap with a discussion of the hardware and software that can be used to read such tapes. It introduces tapeimgr, a user-friendly software application that allows one to read tapes in a format-agnostic manner. It also presents workflows that integrate the discussed hardware and software components. It then shows how these workflows were used to recover the contents of a set of DDS-1, DDS-3 and DLT-IV tapes from the mid to late '90s. These tapes contain the source data of a number of "lost" web sites that the National Library of the Netherlands (KB) is planning to reconstruct at a later stage as part of its ongoing Web Archaeology project.

14:00 - 14:30

Extensive Extensions: Exploring File Extensions in Library of Congress Collections

Trevor Owens (Library of Congress)

Through four decades of digital initiatives and collecting programs, the U.S. Library of Congress has built up a sizable digital collection. In support of long-term management of this digital content, in 2018 staff worked to review information about file extensions of content in the permanent digital collection through analysis of data in the institution’s primary digital content inventory system. This paper reports on the results of the analysis and how these findings will inform the development of digital content management policy and practice at the institution.

14:30 - 14:45

What is the Standard Format for Digitized Audio? Approaches for Storing Complex Audio Objects

Nick Krabbenhoeft (New York Public Library)

The best practices for representing analog audio with digital bitstreams are relatively clear. Sample the signal with 24 bits of resolution at 96 kHz. The standards for storing the data are less clear, especially for media with complex configurations of faces, regions, and streams. Whether accomplished through metadata and/or file format, the strategy chosen to represent the complexity of the original media has long-term preservation implications. Best practice guides rarely document these edge cases, and informal discussions with practitioners have revealed a wide range of practices. This paper aims to outline the specific challenges of representing complex audio objects after digitization and potential approaches that can be adopted by the community.

14:45 - 15:00

The Case For A Standard That’s Old News: Recommendation of PDF/A for Digitized Newspaper Preservation

Anna Oates (Federal Reserve Bank of St. Louis) and William Schlaack (University of Illinois at Urbana-Champaign)

Since 2004, the Library of Congress, a beholden stakeholder in the risk assessment of and consideration for file formats, has supported the preservation of and access to digitized historic newspapers through the National Digital Newspaper Program (NDNP), a distributed, mass digitization program. This paper evaluates the implementation and validation of PDF as specified for NDNP, explores the benefits of PDF/A, and analyzes the adverse effects for digital preservation as realized in current digitization workflows.