Visualizing American case law

Nikita Rokotyan
Published in Interacta
5 min read · Oct 22, 2020

The Seamless Web: How to visualize a corpus of 4 million legal precedents and make the American judicial system look clear

In 2017 we were asked to design and develop a better way of searching through case law. The result was an experimental online search tool that I would like to tell you about today.

The American legal system is based on precedent: during a trial, the court relies on previous decisions in similar cases, while also taking into account the era and the local socio-economic situation. As a result, each new legal case refers to earlier ones.

Over three centuries, more than 6 million cases have come to court, and they are all interconnected. There have been numerous attempts to apply digital tools to this body of law, but they have mostly relied on linear text search. Such search already makes lawyers’ work easier, but it does not reveal the relationships and hierarchy between cases.

These relations can be represented as a graph in which some cases are more authoritative than others, depending on a number of factors. Hence the idea of visualizing the dataset as a whole.

Basic solution

We set out to visualize the system of American precedents and make it interactive. The source data comes from the Free Law Project library (https://free.law/), which consists of case texts, so the primary way into it is text search.

Search input providing suggestions as you type

We decided to build on the metaphor of a web and present search results as networks of documents that refer to each other. Documents are shown as dots whose size reflects the significance of the case. Large clusters form subnetworks, which in turn connect into a common network. There are two display modes: Network Mode shows all the relations at once (using an interactive force-layout simulation from D3.js), while Timeline Mode shows the distribution of cases over time.

Search results displayed as a network (Network Mode)
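To make the idea concrete, here is a minimal sketch of laying out such a citation network with a force-directed algorithm. It is written in Python using networkx’s spring_layout (a force-directed layout analogous in spirit to the D3.js force simulation the live tool runs in the browser); the case IDs and citations are invented for the example.

```python
import networkx as nx

# Toy citation network; each edge points from a citing case to the cited case.
# The case IDs are invented for illustration.
citations = [
    ("case_B", "case_A"),
    ("case_C", "case_A"),
    ("case_D", "case_B"),
    ("case_E", "case_A"),
]
graph = nx.DiGraph(citations)

# Dot size reflects significance: here, simply how often a case is cited.
sizes = {case: 10 + 20 * graph.in_degree(case) for case in graph.nodes}

# Force-directed layout, similar in spirit to D3's force simulation:
# connected cases are pulled together, unrelated ones pushed apart.
positions = nx.spring_layout(graph, seed=42)

for case, (x, y) in positions.items():
    print(f"{case}: x={x:+.2f}, y={y:+.2f}, size={sizes[case]}")
```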

The project covers almost 4 million cases. Every case has been added to an Elasticsearch index and processed to extract citations to other cases, the most quoted sentences, key terms, and key paragraphs. When you run a search, Elasticsearch first finds the relevant cases; our code then builds a network with detected communities (using the Louvain method for community detection) and sends it to your browser for visualization.
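As a rough illustration of this pipeline, here is a minimal Python sketch that fetches relevant cases, builds a citation subgraph, detects communities with the Louvain method, and records each node’s importance as the number of citations it receives. The index name (`cases`), the `text`, `name`, and `cites` fields, and the use of the Elasticsearch 8.x Python client are assumptions for the example, not the project’s actual schema.

```python
import networkx as nx
from elasticsearch import Elasticsearch
from networkx.algorithms.community import louvain_communities


def build_case_network(query: str, size: int = 200) -> nx.DiGraph:
    """Fetch relevant cases from Elasticsearch and build a citation network."""
    es = Elasticsearch("http://localhost:9200")

    # 1. Let Elasticsearch rank the cases relevant to the query.
    hits = es.search(
        index="cases",                     # assumed index name
        query={"match": {"text": query}},  # assumed full-text field
        size=size,
    )["hits"]["hits"]

    # 2. Build a citation subgraph from citations already extracted per case
    #    (stored here in a hypothetical "cites" field).
    relevant = {hit["_id"] for hit in hits}
    graph = nx.DiGraph()
    for hit in hits:
        case_id = hit["_id"]
        graph.add_node(case_id, name=hit["_source"].get("name", case_id))
        for cited in hit["_source"].get("cites", []):
            if cited in relevant:
                graph.add_edge(case_id, cited)

    # 3. Detect clusters with the Louvain method, and record node importance
    #    as the number of citations a case receives within the subgraph.
    for i, community in enumerate(louvain_communities(graph.to_undirected(), seed=42)):
        for case_id in community:
            graph.nodes[case_id]["community"] = i
    for case_id in graph.nodes:
        graph.nodes[case_id]["importance"] = graph.in_degree(case_id)

    return graph


# Example (assumes a local Elasticsearch instance with the index described above):
# network = build_case_network("freedom of speech")
```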

Case network with nodes distributed over time (Timeline Mode).
Case Insights Panel

On the right side of the UI we display the Case Insights and Network Insights panels. They enable quick navigation between cases and a deeper dive into the network. The panel shows the most significant cases relevant to your search, key terms, fragments from influential cases, and the most cited phrases. Any of these items can become the next step of a search. There is also a Community Insights tab, which gathers the same information for each cluster. You can keep surfing from case to case until you have collected all the relevant links and quotes.

As a result, the connections between court judgments became visible. More importantly, the visualization has become a useful tool for lawyers: instead of hunting for cross-references in the texts, it is enough to run a search query, and the system returns a network of related cases ranked by significance and citation rate.

Search drill-down

Text Processing Engine

Natural language processing, a field that combines linguistics and statistics, underlies this project. To rate cases and single out terms, we set up the system to estimate the importance of whole texts, quotes, and individual words.

Case Reader highlighting key and most quoted paragraphs
  • The importance of a case is determined by the number of links to it: the more other cases refer to it, the more important it is.
  • It took a month for the algorithm that finds the most cited fragments to process the entire database. Besides exact quotes, it had to account for possible errors, typos, and inexact quotations. In brief, the method works as follows: the text is split into passages, passages into paragraphs, paragraphs into sentences, and sentences into parts of sentences. Each part is then checked for references to it across the 4 million other cases. After that, another algorithm based on fuzzy-logic principles looks for “similarities” among these quotes in order to assign them greater relevance (a sketch of this kind of matching follows this list). The result is a text processing engine that makes searching through legal texts as simple as possible: all the relations are now instantly visible in a clear form.
  • Important terms are identified on a similar principle. The algorithm checks how distinctive a word is within an individual text relative to the entire corpus: if a term appears frequently in a particular text but rarely across the whole database, it is marked as important. This is essentially TF-IDF weighting (see the second sketch after this list).
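Here is a minimal sketch of the kind of fuzzy quote matching described above, using the rapidfuzz library. The splitting and the similarity threshold are simplified, and the sentences are invented for the example, so this illustrates the principle rather than the project’s actual engine.

```python
from rapidfuzz import fuzz


def count_fuzzy_citations(sentence: str, other_texts: list[str], threshold: int = 85) -> int:
    """Count how many other case texts quote the sentence, allowing typos
    and slight rewording (fuzzy rather than exact matching)."""
    count = 0
    for text in other_texts:
        # partial_ratio scores the best-matching substring of the longer text.
        if fuzz.partial_ratio(sentence.lower(), text.lower()) >= threshold:
            count += 1
    return count


# Invented example: a slightly misquoted sentence still counts as a citation.
sentence = "The law is a seamless web."
corpus = [
    "As the court noted, the law is a seamles web of interconnected rulings.",
    "This opinion concerns an unrelated procedural question.",
]
print(count_fuzzy_citations(sentence, corpus))  # -> 1
```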
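And a sketch of the term-importance idea, which boils down to TF-IDF weighting. It uses scikit-learn’s TfidfVectorizer on a few invented texts standing in for the real corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented mini-corpus standing in for the 4 million case texts.
docs = [
    "The defendant appealed the ruling on riparian water rights.",
    "The defendant appealed the ruling on contract damages.",
    "The court dismissed the appeal for lack of jurisdiction.",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf = vectorizer.fit_transform(docs)
terms = vectorizer.get_feature_names_out()

# Terms that are frequent in document 0 but rare in the rest of the corpus
# (e.g. "riparian") get the highest scores and are marked as important.
scores = tfidf[0].toarray().ravel()
top = sorted(zip(terms, scores), key=lambda pair: pair[1], reverse=True)[:3]
print(top)
```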

Wrapping up

Researchers of the American judicial system describe it as a “seamless web”, which gave the project its name. We visualized a huge database that previously existed only as linear text sequences and created an experimental yet useful tool for researchers and professionals.

Visualization can make non-obvious phenomena obvious:

  • reveal the importance of a particular case,
  • tell a story with a timeline,
  • trace the evolution of court decisions on a particular topic, and so on.

For us, this vividly illustrates the main purpose of data visualization: using design and algorithms to create tools for work and decision-making.

While the project is highly experimental and has a bunch of flaws, you’re welcome to give it a try at https://theseamlessweb.com

Work in progress: Designing the visualization

Dev Team: Nikita Rokotyan, Darren Reid, Olya Stukova
Written by: Tina Garnik, Nikita Rokotyan
