Programme

NEW HORIZONS // Web Archiving Room at the Top
Paper

Tuesday 17 September 2019, 13:30 - 15:00

Detailed Programme

13:30 - 14:00

Who is asking? Humans and Machines experience a different Scholarly Web

Martin Klein (Los Alamos National Laboratory)

Libraries and archives are motivated to capture and archive scholarly resources on the web. However, the dynamic nature of the web, together with frequent changes on the side of scholarly publishing platforms, forces crawling engineers to continuously update their archiving frameworks. In this paper we report on a comparative study of how scholarly publishers respond to common HTTP requests that resemble the typical behavior of machines, such as web crawlers, and of humans. Our findings confirm that the scholarly web responds differently to machine behavior than to human behavior. This work aims to inform crawling engineers and archivists tasked with capturing the scholarly web of these differences and to help guide them toward appropriate tools.
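As an illustration of the kind of comparison described in this abstract, the following minimal Python sketch sends the same GET request under two header profiles and reports which response fields differ. It is not the study's actual method: the profile names, the header values (including the `example-crawler/0.1` user agent) and the helper functions `build_request`, `summarize` and `differing_fields` are all assumptions made for this example.

```python
import urllib.error
import urllib.request

# Hypothetical header profiles. The abstract does not say which headers
# or clients the study used; these values are illustrative assumptions.
PROFILES = {
    "machine": {"User-Agent": "example-crawler/0.1"},
    "human": {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) "
                      "Gecko/20100101 Firefox/68.0",
        "Accept": "text/html,application/xhtml+xml,"
                  "application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
    },
}


def build_request(url, profile):
    """Build a GET request carrying the headers of the named profile."""
    return urllib.request.Request(url, headers=PROFILES[profile], method="GET")


def summarize(url, profile, timeout=10):
    """Fetch url under a profile and return a few comparable response fields."""
    request = build_request(url, profile)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return {
                "status": response.status,
                "final_url": response.geturl(),
                "content_type": response.headers.get("Content-Type"),
            }
    except urllib.error.HTTPError as error:
        # Error responses (e.g. 403) are still informative for the comparison.
        return {
            "status": error.code,
            "final_url": error.geturl(),
            "content_type": error.headers.get("Content-Type"),
        }


def differing_fields(first, second):
    """Return the sorted names of fields whose values differ."""
    return sorted(k for k in first.keys() | second.keys()
                  if first.get(k) != second.get(k))
```

Calling `differing_fields(summarize(url, "machine"), summarize(url, "human"))` for a publisher landing page would then list the fields, such as the status code or the final URL after redirects, on which the two request styles received different answers.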

14:00 - 14:15

Saving Data Journalism: Using ReproZip-Web to Capture Dynamic Websites for Future Reuse

Vicky Steeves (New York University)

Dynamic and interactive Web applications are increasingly used to convey news and stories to people all around the world, but their technological complexity makes them hard to archive and preserve, and as a result they are being lost. We present ReproZip-Web, an open-source prototype aimed at saving these news applications from extinction. ReproZip-Web leverages ReproZip, a computational reproducibility tool, and Webrecorder, a tool for recording Web resources, to automatically and transparently capture and replay dynamic Websites. The prototype creates a bundle that contains all the information needed to reproduce a news application, and its lightweight nature makes it ideal for distribution and preservation. We will present our ongoing work on the prototype and discuss some use cases and avenues for future development.

14:15 - 14:30

Data Stewards and Digital Preservation in Everyday Research Practice

Esther Plomp (TU Delft)

Data Stewards at TU Delft promote digital preservation by incorporating preservation actions into daily research routines. Typical activities include requirement scoping, tool selection and policy drafting, all of which are tailored to a specific group, project or faculty. Here we discuss the situation of the Data Stewards within the university and give examples of preservation work, including the creation of data repositories and a trial of Webrecorder.