Recovered Memories: Bringing the Air Force Archive Into the Digital Age

Author/Creator

Author/Creator ORCID

Date

2023-05-31

Department

University of Baltimore. Yale Gordon College of Arts and Sciences

Program

University of Baltimore. Doctor of Science in Information and Interaction Design

Citation of Original Publication

Rights

Attribution-NonCommercial-NoDerivs 3.0 United States
This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by The University of Baltimore for non-commercial research and educational purposes.

Abstract

Recruiting unpaid volunteers through a “crowdsourcing” technique has become a near-ubiquitous tactic of libraries, archives, and other institutions seeking to textually digitize their analog holdings. Little research has examined which 1) demographic characteristics of those volunteers, 2) their familiarity with the topic, 3) their motivation, and 4) the process they use correlate with higher performance in that task. Recovered Memories investigates nine such variables, both individually and in combination. Optical character recognition (OCR) technology is one automated method for converting text, but it has proven unsatisfactory for creating web content or e-books, mining data, creating data for artificial intelligence (AI) and machine learning (ML) software, and even some search functions. This paper theorized that some of the variables studied would correlate with higher performance. This research project examined the efficacy of a custom-built application (www.airforcehistory.net) for gathering data. One hundred and twelve historic documents from the Air Force Historical Research Agency’s archive were used in the examination to measure participants’ performance. Despite a relatively small sample size (n=50) and the lack of control endemic to field research, the participant variables of ‘familiarity with U.S. history in Vietnam (PV5),’ ‘process choice (PV8),’ ‘age (PV2),’ ‘group affiliation (PV7),’ and ‘familiarity with U.S. Air Force operations (PV6)’ were related. Multiple regression showed three factors correlated with better performance: gender, familiarity with U.S. history in Vietnam, and process choice. The author argues that process choice, specifically OCR correction rather than copying/transcription, results in the best performance and that this finding might be generalizable. Given the interest at the federal level in textually digitizing the holdings of military archives, this study has strong implications for policy and practice.