Programme

Machine Learning For Big Text: A Tutorial On Using Predictive Coding Tools To Process Large Archival Datasets
Studio
Tutorial

Monday 16 September 2019, 09:30 - 12:30

Detailed Programme

Machine Learning For Big Text: A Tutorial On Using Predictive Coding Tools To Process Large Archival Datasets

Brent West (University of Illinois) and Joanne Kaczmarek (University of Illinois)

Big datasets can be a rich source of history, yet they pose many challenges to archivists. The varied formats and sheer volume of files make them difficult to acquire and process, and sensitive content must be identified before materials are made publicly available. These challenges inhibit access for research purposes and often dissuade archivists from acquiring big datasets. Predictive coding can alleviate these challenges by using supervised machine learning to augment appraisal decisions, identify and prioritize sensitive content for review and redaction, and generate descriptive metadata of themes and trends. Drawing on the authors’ previous work processing Capstone email, the tutorial introduces innovative and effective practices for the digital preservation of large textual datasets at scale and provides hands-on experience with specific tools.