The Case of Case Law

By Marian Moszoro & Henry E. Smith

Case law research used to require analytical thinking and a prodigious memory to link a large number of court cases and opinions. This is changing, at least in the latter respect. In recent decades, law publishers have started digitizing and distributing their libraries, first on CDs and, most recently, over Internet-based application programming interfaces (APIs). The best-known case law publishers in the US are LexisNexis and Westlaw, whose platforms also include basic search and analytical tools. These commercial platforms are expensive for individuals, and not all graduate schools can afford access to them.

In 2010, two scholars from Berkeley launched CourtListener, a non-profit, wiki-style legal search engine that provides access to millions of cases and legal opinions. Its API includes advanced Boolean search capabilities (e.g., operators for intersection, union, negation, phrase search, grouped queries, fielded search, wildcard, fuzzy and proximity search, ranges, and field boosting). The project is growing rapidly, and legal scholars would do well to follow it.
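To make the operator list concrete, here is a minimal sketch of how such a query might be composed. It assumes a Lucene-style query syntax (AND, OR, NOT, quoted phrases, `field:value`, `*`, `~`, `^`), which the article does not specify; the helper names, the `court` field, and the value `scotus` are hypothetical, and the service's own documentation is the authority on the syntax it actually accepts. The code only builds a query string and does not contact any API.

```python
# Sketch only: composes a Lucene-style query string (assumed syntax).
# All helper names and the "court"/"scotus" field and value are hypothetical.

def phrase(text):
    """Phrase search: quote the text so it is matched as an exact phrase."""
    return '"' + text.replace('"', '\\"') + '"'

def fielded(field, term):
    """Fielded search: restrict a term to one indexed field."""
    return f"{field}:{term}"

def all_of(*terms):
    """Intersection: every term must match (grouped with parentheses)."""
    return "(" + " AND ".join(terms) + ")"

def any_of(*terms):
    """Union: at least one term must match (grouped with parentheses)."""
    return "(" + " OR ".join(terms) + ")"

def negate(term):
    """Negation: exclude documents matching the term."""
    return f"NOT {term}"

def near(text, distance):
    """Proximity: the words of the phrase within `distance` positions."""
    return f"{phrase(text)}~{distance}"

def boost(term, weight):
    """Boosting: weight a term more heavily in relevance ranking."""
    return f"{term}^{weight}"

query = all_of(
    any_of(phrase("adverse possession"), "easement*"),  # phrase OR wildcard
    negate(fielded("court", "scotus")),                 # hypothetical field
    near("property rule", 5),
)
print(query)
```

Building the query from small functions keeps each operator visible and makes grouping explicit, which matters once intersections and unions are nested.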


A “Distinguished” Endeavor: Ravel Law

In 2014, Harvard Law School partnered with Ravel Law, a VC-backed startup from California, to digitize the case law collection of the HLS Library, the world’s largest academic law library. The project involves scanning and OCR-ing (i.e., applying optical character recognition to) 43,000 volumes of US case law: about 40 million pages containing 8 to 10 million state and federal cases. Ravel Law will then categorize these documents and make them accessible first to the Harvard community and afterwards to all researchers, while retaining the rights to their commercial use.

Currently, nine states are covered, accounting for about one quarter of all cases: Arkansas, California, Delaware, Massachusetts, Montana, New York, Oregon, Texas, and Washington. Given a pace of 45,000 pages scanned per week (with the yield varying with the quality and condition of the paper, machine downtime, and available staff), the team estimates that it will take another 18 months to scan all volumes, with their digitization following at a six-month lag.

Apart from JDs, Ravel Law’s team includes a dozen PhDs in data science and linguistics. Their algorithms index not only case names, topics, citations, and jurisdictions, but also companies’ CIKs (i.e., Central Index Keys), patent numbers, and judges’ names. This feature will make it easy to link cases to other datasets, sparking a new era of empirical research in law and economics.
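As an illustration of what such linking could look like, the following sketch joins case records to firm-level data on the CIK. All records, field names, and values are made up for the example; nothing here reflects Ravel Law's actual data format. The one external fact relied on is that a CIK is a number of up to ten digits that is often written zero-padded, so keys are normalized before matching.

```python
# Sketch only: joins hypothetical case records to hypothetical firm data by CIK.

def normalize_cik(raw):
    """Return the CIK as a 10-character, zero-padded string."""
    return str(int(str(raw).strip())).zfill(10)

# Made-up case index entries (field names are assumptions).
cases = [
    {"case_id": "case-001", "cik": "1234", "judge": "Judge A"},
    {"case_id": "case-002", "cik": "0000005678", "judge": "Judge B"},
    {"case_id": "case-003", "cik": "9999", "judge": "Judge A"},
]

# Made-up firm dataset, keyed by normalized CIK.
firms = {
    "0000001234": {"firm_name": "Example Corp", "industry": "Manufacturing"},
    "0000005678": {"firm_name": "Sample Inc", "industry": "Software"},
}

def link_cases_to_firms(cases, firms):
    """Inner join: keep only cases whose CIK appears in the firm dataset."""
    linked = []
    for case in cases:
        cik = normalize_cik(case["cik"])
        firm = firms.get(cik)
        if firm is not None:
            linked.append({**case, "cik": cik, **firm})
    return linked

linked = link_cases_to_firms(cases, firms)
for row in linked:
    print(row["case_id"], row["firm_name"], row["judge"])
```

Once cases carry a shared identifier, the same join extends to any firm-level dataset keyed the same way, for example financial statements or patent portfolios.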

Ravel Law is also planning to develop advanced textual analysis tools. Some sophisticated trivia will be available at one’s fingertips. For example, do you know what the most distinguished word used by lawyers is? Not surprisingly, “distinguished!”


Prospective Research

Larger samples and a wider scope of indexing, along with textual analysis tools, will open up a new spectrum of legal research. To name a few topics:

  • Rise and evolution of doctrines (down to the personal level of influence, i.e., which judge influenced whom)
  • Legal federalism
  • Scope of regulation and consumer protection
  • Limits of patent protection
  • Corporate governance adaptation


Changes in the Legal Services Industry 

Just as Uber and Airbnb harnessed digital technologies and innovative business models to disrupt the local transportation and lodging industries, respectively, we conjecture that CourtListener, Ravel Law, and similar ventures are going to revolutionize the legal services industry. For example, law firms presently bill by the hour, and a significant fraction of that time is dedicated to case search and data analysis. Instead of requiring days of work by specialized staff, it will take an intern or paralegal a few moments to reach the same result: a selection of precedents with a suggested line of argumentation.

Speculating further on future developments, one can imagine that simple legal opinions would be commoditized and complex ones unraveled and, given sufficient competition, would become significantly cheaper. The legal profession in general would be in less demand, and legal talent would focus on narrow issues and leading cases. Maybe we would even witness a regulation or competition case involving the providers of legal cases themselves, which machines would resolve on their own.


More about HLS Library and Ravel Law: