Creating Data and Tools for Open Cultural Analysis Activities: TORCHLITE and Beyond
At the invitation of Prof. Dr. Christin Katharina Kreutz, tandem professor for "Data Science in the Humanities", Prof. Stephen Downie will visit the Herder-Institut on 29 April 2026 and give a talk from 3 to 4 p.m. The talk will be held in hybrid format, with the in-person session in the lecture hall of the Herder-Institut.
Abstract:
The HathiTrust Research Center (HTRC) provides analytic access to 19 million volumes found in the HathiTrust Digital Library (HTDL). Roughly 10 million of the volumes in the collection are under copyright restrictions and cannot be freely shared with scholars. In order to provide more open access to HathiTrust’s materials, the HTRC has released its Extracted Features (EF) 2.5 Dataset, which contains over 3 trillion unigram tokens drawn from the 6 billion pages in the corpus. This talk provides an update on the HTRC’s ongoing “Tools for Open Research and Computation with HathiTrust: Leveraging Intelligent Text Extraction” (TORCHLITE) project. Funded by the National Endowment for the Humanities (NEH), TORCHLITE strives to create easy-to-use text analysis tools, dashboards and application programming interfaces (APIs) to facilitate open cultural analytics research using the uniquely valuable HTDL data. The talk will highlight the motivations, challenges, and accomplishments of the TORCHLITE project to date, along with its next steps, which envision the creation of an international consortium of similar groups, tentatively called the “International Consortium for Open Cultural Analytics,” designed to encourage Extracted Features access to otherwise closed collections. The talk will conclude with a conversation about the upcoming sunsetting of HTRC and the role that EF will play in continuing the work of the HTRC team and other digital humanities scholars.
J. Stephen Downie is a professor and the Executive Associate Dean at the School of Information Sciences, University of Illinois at Urbana-Champaign. He is also the Illinois Co-Director of the HathiTrust Research Center. Professor Downie conducts work in Digital Libraries, Digital Humanities and Music Information Retrieval. He holds degrees from the University of Western Ontario, including a BA (music theory and composition), a Master of Library and Information Science (MLIS), and a PhD in Library and Information Science.