SoTE: Data Mining

Boston College Attribution

This page was copied and adapted from the Boston College Libraries Text & Data Mining Guide under a Creative Commons Attribution 4.0 License. Our thanks to Boston College for developing this excellent resource and sharing it under the license!

HathiTrust Research Center (HTRC)

The HathiTrust Research Center (HTRC) is the research arm of HathiTrust. It facilitates scholarly research using the large-scale HathiTrust Digital Library by providing mechanisms for researchers to access content in HathiTrust and study it using computational tools for text analysis.

Entire Collection Piloted for TDM

The HathiTrust Research Center has expanded its services to support computational research on the entire HathiTrust collection, one of the world’s largest digital libraries. The collection holds over 14 million digitized volumes, including more than 7 million books, 725,000 US federal government documents, and 350,000 serial publications. Previously, the HathiTrust Research Center supported analysis of only the public domain subset of the collection. Researchers will now be able to explore the entire collection and run an algorithm against all 14 million volumes. The change is being piloted in 2016 and is expected to be more widely available in 2017.

[HathiTrust Press Release]

Create an account

Most HTRC services require an account to log in and use the tools. To register, go to the Portal and choose "Sign up" from the menu. Anyone with an email address from a nonprofit institution of higher education may register, including those whose institutions are not HathiTrust members.

Tools

The HTRC has created a suite of tools that allow researchers to perform text analysis on content in the HathiTrust Digital Library. These tools include:

Documentation

HTRC provides extensive documentation on the tools, including instructional videos, tutorials, presentations, examples, and Getting Started FAQs.