AI in the legal industry — a case study on predicting judgments through deep learning

Nikhil Chandna
12 min read · Apr 8, 2020


[Image: the Statue of Justice]

Throughout history, people have relied on courts, juries, kings and queens to deliver justice. Today, the ability of courts to provide fair and swift justice to citizens is not only one of the most important obligations of any country but also a primary pillar of democracy.

India, which prides itself on being the world's largest democracy, remains wanting when it comes to its judicial system. It struggles with one of the world's largest backlogs of pending cases, which can be attributed to a shortage of judges and to legal loopholes that delay judgments. It is estimated that India has only 19 judges for every million citizens.

What is this case study about?

In this case study, we trained and built a deep learning-based model that predicts judgments issued by the Supreme Court of India with considerable accuracy. Our aim is to set the ball rolling for an advanced ML solution that can augment the decision-making capability of judges, help speed up the legal process, and bring relief to millions.

As for the structure of this article, we'll briefly cover the benefits of Machine Learning (ML) in the legal domain and the technology we used to predict decisions, then discuss in detail the process we adopted, from gathering relevant data to building a deep learning model. We'll also outline the next steps for improving the model's accuracy. Lastly, you'll find our contact information for questions, suggestions, or collaboration.

Why Machine Learning?

Machine learning has already begun to revolutionize various industries, and we see no reason why it can't help India's legal system. Its power to analyze vast amounts of data, learn patterns, and predict outcomes can come in especially handy in the legal sector. The latest advancements in machine learning, especially in the subfield of Natural Language Processing (NLP), make it all the more relevant. There are already solutions and bots on the market that use NLP to 'smart search' existing judgments and rulings to help with the preparation of a new case.

Why not traditional ML?

Models that predict legal outcomes have been built in abundance over the last few years, but they tend to rely on traditional machine learning algorithms with a heavy stress on feature engineering; for example, the ruling history of judges, the court an appeal originates from, and the sections invoked. An additional layer of contextual understanding can certainly help here, which is why we decided to give deep learning a go.

Benefits of using ML in legal

Accelerated decision process and outcomes

Instant verification of input data

Unbiased point of view

Ability to quickly showcase historical cases with similar patterns

Easier detection of possible corruption by flagging cases with high variance between human and AI decisions

Areas in legal that already use ML

Document Review and Legal Research

Contract Management

Client Due Diligence

Predicting Legal Outcomes

It is important to note that we are still far from the day when an algorithm can decisively understand the facts of a case and objectively identify the offence based on the Indian Penal Code (IPC) sections mentioned in the charge sheet, while simultaneously maintaining a detailed understanding of the world around it.

Which ML?

For gathering and preprocessing data we used Python 3. All models were developed with fastai, a deep learning library built on top of PyTorch that makes advanced deep learning algorithms straightforward to run. We found fastai extremely useful for this exercise; with its stress on *transfer learning and its 'practical before theory' approach, fastai has almost revolutionized the way one can use deep learning by making it highly accessible and easy to use.

*Transfer learning: fastai provides built-in models that have already been trained on substantial amounts of data. This allows users to start from a pretrained model and use new data to update already meaningful weight values rather than starting from scratch (random values). This saves a lot of time and effort and yields good accuracy fairly early in the training cycle.
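As a flavour of what this looks like in code, here is a minimal fastai v1 sketch; data_lm stands in for a text DataBunch built later in the article, and the call is illustrative rather than our exact training code:

```python
from fastai.text import language_model_learner, AWD_LSTM

# pretrained=True (the default) initializes the network with weights
# already learned on a large corpus, so training merely fine-tunes them
# instead of starting from random values.
learn = language_model_learner(data_lm, AWD_LSTM, pretrained=True)
```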

The Process

[Figure: the flow of the process through its stages]

Gather and format data

Create data labels (training data)

Create a TextDataBunch

Build language learning model

Save encoder from language model

Build final prediction model using encoder

Gather and Format data

The first step is to collect and store the publicly available judgment data. Thanks to a relatively recent digitization drive by the Indian government, all Supreme Court judgments are available in electronic form on a government website. Every judgment is available as a PDF file and contains information such as the case number, the names of the presiding judges on the bench, details of the accused and respondent parties, the history of the case through the lower courts, and the facts of the case, together with the final judgment of the Supreme Court, which falls under one of two possibilities: the appeal is either "Allowed" or "Dismissed".

We used a simple Python scraper to collect four years' worth of data and converted it into CSV format, which can then easily be read into a data frame in a Python notebook. Our input data frame consists of more than 3,600 unique Supreme Court judgments.
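As a rough sketch of this step, the loop below reads each downloaded judgment PDF and writes one row per case to a CSV; the folder layout, the choice of PyPDF2 (its pre-3.0 API), and the column names are assumptions for illustration, not the exact scraper used here:

```python
import csv
from pathlib import Path

from PyPDF2 import PdfFileReader  # any PDF text extractor would do

rows = []
for pdf_path in Path("judgments").glob("*.pdf"):
    with open(pdf_path, "rb") as f:
        reader = PdfFileReader(f)
        # Collect the text of every page of the judgment.
        pages = [reader.getPage(i).extractText() for i in range(reader.numPages)]
    rows.append({"case_file": pdf_path.name, "text": "\n".join(pages)})

with open("judgments.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["case_file", "text"])
    writer.writeheader()
    writer.writerows(rows)
```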

Create data labels

In a typical machine learning setup, we would have input data complete with labels (for our data, 'allowed' and 'dismissed'), which can be used as training and validation datasets for the algorithm.

Collecting the facts, or the input data: Ideally, this would be the same information that was presented to the Supreme Court at the onset of the trial, consisting of the full history and facts of the case. In our case, though, there was no way of getting hold of that exact input data, so we used the facts mentioned in the PDF judgment of each case as a proxy. This is the most logical and simple way forward, as multiple lawyers have corroborated: the facts mentioned in the final judgment are almost identical to the information provided to the court initially.

The first page of the PDF file contains information about the case number, important dates, details of the accused and respondents, the names of the lawyers representing the parties, and the names of the presiding judges. From there the facts of the case begin, together with a detailed timeline and the outcome of the judgment from the lower court; the decision of the court is mentioned at the end. In some countries the structure of a judgment is strictly defined and follows a given standard, but we found no definitive protocol being followed in the writing of judgments in India. The only thing we can be certain of is that the court's operative order (the final decision) can be found on the last two pages.

Labelling the facts, or the input data: Once we had extracted the text from the PDFs into a data frame, we added a column called labels containing the outcome of the appeal. We had already established, using regex matching, that the last two pages contain the final judgment. We then extracted from those pages whether the appeal was allowed or dismissed, thereby creating a label for each record. This removes the costly requirement of creating labels manually and removes restrictions on the size of our input. We are sufficiently sure that this way of labelling is mostly accurate; we haven't checked each label manually, but it is an assumption that the *evidence permits.

*(If you require further details of the evidence, please feel free to check the actual code and data present in the GitHub repository, a link to which is provided at the end of this case study.)

[Image: snapshot of the data]

To complete our input data frame, the last step was to remove the last two pages of each judgment from the input text, as these contain the court's final decision on whether to allow the appeal. If left in, the algorithm could simply learn to read the last two pages and predict the decision. The final data frame contains the complete judgment sans the last two pages, along with a label corresponding to each input; a sketch of the labelling and truncation steps follows.
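A minimal sketch of both steps, assuming each judgment is available as a list of page strings; the regex patterns and column names are illustrative, and the actual patterns live in the repository linked at the end:

```python
import re
import pandas as pd

def label_and_truncate(pages):
    """Read the label off the last two pages, then drop those pages."""
    operative_order = " ".join(pages[-2:]).lower()
    if re.search(r"\ballowed\b", operative_order):
        label = "allowed"
    elif re.search(r"\bdismissed\b", operative_order):
        label = "dismissed"
    else:
        label = None  # ambiguous orders can be filtered out later
    # Remove the last two pages so the model cannot read the decision.
    return " ".join(pages[:-2]), label

# pages_per_case: a list of judgments, each a list of page strings
records = [label_and_truncate(pages) for pages in pages_per_case]
df = pd.DataFrame(records, columns=["text", "label"]).dropna()
```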

Creating a TextDataBunch

For this notebook, the CSV file obtained from scraping was converted into a data frame and then into a DataBunch, a specialized form of data storage specific to fastai. It is highly convenient to use, as it carries out multiple operations in the background. We used a specialized form of DataBunch called TextDataBunch, meant for applications where the input data is text. Going over every background operation that happens while creating a TextDataBunch is not possible here, but we explain a few important ones below.

Creation of training and validation datasets: The dataset is divided into a training set and a validation set. While creating the TextDataBunch we passed a value of 0.2 for the valid_pct parameter, which split the dataset in an 80:20 ratio, keeping 80% of it as the training set and using the remaining 20% as the validation set.
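In code, this step looks roughly as follows in fastai v1 (the file and column names are illustrative); we reuse this data_lm object in the language learning stage:

```python
from fastai.text import TextLMDataBunch

# valid_pct=0.2 holds out a random 20% of the judgments for validation.
data_lm = TextLMDataBunch.from_csv(
    "data", "judgments.csv",
    text_cols="text", label_cols="label",
    valid_pct=0.2,
)
```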

Tokenization: This is a relatively simple process in which the text is broken into words, or tokens. It can be done by splitting the text on spaces, but evidence has shown that this is not the best approach; it is better to also split on punctuation and contractions. Fastai performs all these actions and goes a step further, cleaning the data by removing any instances of HTML code. The whole text is also converted to lowercase so that capitalized forms of the same token are not counted twice. Fastai provides an option to look at the tokenized data, which reveals the execution of the steps mentioned above as well as other useful information. For example, one can notice that every apostrophe-'s' is chunked into a single token, and that several special tokens are created to replace unknown tokens or to mark separate text fields.
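The default tokenizer can also be run directly to see these rules in action; the sample sentence and the output shown in the comment are illustrative:

```python
from fastai.text import Tokenizer

tok = Tokenizer()  # spaCy-based tokenizer plus fastai's cleanup rules
print(tok.process_all(["The Appellant's counsel argued the point."]))
# e.g. [['xxmaj', 'the', 'xxmaj', 'appellant', "'s", 'counsel',
#        'argued', 'the', 'point', '.']]
```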

Numericalization: Another operation carried out in the background by fastai during the creation of a TextDataBunch is numericalization. This is one of its most important tasks and one of the main reasons we use a TextDataBunch rather than fastai's normal DataBunch for processing text. When fastai reads an image dataset, the data already consists of numbers representing the pixels of each image; with text, we need to convert the text into tokens and those tokens into numbers before universal approximation can be applied. This step, known as numericalization, converts all tokens to integer values. To maintain good performance and reduce the chance of creating a highly sparse vector, fastai by default only keeps tokens that appear at least twice, while capping the vocabulary at 60,000 tokens. Although these defaults can be altered, we used them as-is for this solution.
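Continuing the sketch above, the resulting vocabulary can be inspected on the data_lm object; the 60,000-token cap and the minimum frequency of two correspond to fastai v1's max_vocab and min_freq parameters:

```python
# Each kept token maps to an integer id and back. By default fastai keeps
# at most 60,000 tokens (max_vocab=60000) and drops tokens that appear
# fewer than twice (min_freq=2); both can be overridden when the
# DataBunch is created.
print(len(data_lm.vocab.itos))    # vocabulary size
print(data_lm.vocab.itos[:10])    # integer id -> token
print(data_lm.vocab.stoi["the"])  # token -> integer id
```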

[Image: snapshot of the preprocessed data]

Language learning model

A language learning model strives to predict the next word in a sentence, and in doing so gains a basic understanding of the text and its context while also learning the language itself. Predicting the next word in a sentence is no easy task and requires fairly good contextual knowledge and general understanding. After training this model, we saved its encoder and fed it into our final RNN model, which predicted decisions. More details on the final model follow later in this article.

Additionally, fastai ships with a pretrained language model built into the library, so we didn't have to train our model from scratch, saving a lot of time and computing power. For our case, we used a pretrained language learning model trained on a Wikipedia-derived dataset called Wikitext-103.

We created our new language model with the pretrained Wikitext-103 weights as initial values; this model tries to predict the next word in the judgment data. When reading the new DataBunch we provided, the model did not start from zero: its parameters were not set to random values, as they would be in a model trained from scratch, but to the values learned from the Wikipedia text. This let us reach higher accuracies quickly. During training, our model displayed an accuracy of 46%, meaning it predicts the next word of a judgment correctly 46% of the time, which is very good by industry standards.
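A condensed sketch of this stage, assuming the data_lm DataBunch built earlier; the epoch count and learning rate are illustrative, not our exact values:

```python
from fastai.text import language_model_learner, AWD_LSTM

# The AWD-LSTM is initialized with Wikitext-103 weights (pretrained=True
# is the default), so training only fine-tunes it on the judgment text.
learn_lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn_lm.fit_one_cycle(1, 1e-2)

# Save the fine-tuned encoder for the downstream classifier.
learn_lm.save_encoder("judgment_enc")
```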

[Image: language model accuracy]

Another convenient feature of fastai is the ability to unfreeze only the last few layers of the model and train just those. This lets us refine the existing pretrained model to align it better with our new data without disrupting the earlier trained parameters, saving a lot of training time.
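In fastai v1 this is done with freeze_to; a brief illustrative example (the layer-group index and learning rates are assumptions):

```python
# Unfreeze and train only the last two layer groups; earlier layers
# keep their pretrained weights untouched.
learn_lm.freeze_to(-2)
learn_lm.fit_one_cycle(1, slice(1e-3, 1e-2))
```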

To further increase the accuracy of the language learning model, and to deepen its legal knowledge, we augmented our existing input of judgments with the Constitution of India and the IPC acts and sections. This was done by simply scraping these two documents into data frames and feeding them to the model during the language learning phase. The reasoning is straightforward: the arguments in a case refer to multiple IPC acts and sections and to the rights of citizens laid out in the Constitution of India, so a model that has knowledge of these documents will better understand the associations and predict the next word with greater accuracy and confidence.
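Conceptually, this augmentation is just a concatenation of corpora before the language-model DataBunch is built; a minimal sketch with illustrative file names:

```python
import pandas as pd

judgments = pd.read_csv("judgments.csv")
constitution = pd.read_csv("constitution.csv")   # scraped Constitution text
ipc = pd.read_csv("ipc_sections.csv")            # scraped IPC sections

# The language model only needs raw text, so the corpora can be stacked.
lm_corpus = pd.concat(
    [judgments[["text"]], constitution[["text"]], ipc[["text"]]],
    ignore_index=True,
)
lm_corpus.to_csv("lm_corpus.csv", index=False)
```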

Classifier model (RNN)

Once the language model was trained, we saved its encoder. In the next phase of training we used this encoder to train our final prediction model: an RNN-type classification model (AWD-LSTM), a kind preferred in NLP applications. RNN stands for recurrent neural network; readers can easily find plenty of material online if they wish to brush up on any of the concepts mentioned in this article.
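A sketch of this final stage in fastai v1; the classifier's DataBunch must reuse the language model's vocabulary, and the hyperparameters shown are illustrative:

```python
from fastai.text import TextClasDataBunch, text_classifier_learner, AWD_LSTM

# Build a classification DataBunch that shares the language model's vocab.
data_clas = TextClasDataBunch.from_csv(
    "data", "judgments.csv",
    text_cols="text", label_cols="label",
    valid_pct=0.2, vocab=data_lm.vocab,
)

learn_clas = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn_clas.load_encoder("judgment_enc")  # the encoder saved earlier

# Train the classifier head first, then gradually unfreeze deeper layers.
learn_clas.fit_one_cycle(1, 2e-2)
learn_clas.freeze_to(-2)
learn_clas.fit_one_cycle(1, slice(1e-2 / (2.6 ** 4), 1e-2))
learn_clas.unfreeze()
learn_clas.fit_one_cycle(2, slice(1e-3 / (2.6 ** 4), 1e-3))

# Predict the outcome of a previously unseen case from its facts.
pred_class, pred_idx, probs = learn_clas.predict("facts of a new case ...")
```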

This model achieved an accuracy of 76%, meaning that, given the details of a previously unseen case as input, it predicted the court's decision correctly roughly 76 times out of 100.

We are confident that the accuracy can be increased substantially by tweaking the model's hyperparameters and by trying out a range of learning rates.
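fastai v1 ships a learning-rate finder that makes this search systematic; a brief example on the classifier learner:

```python
# Sweep learning rates and plot loss against them to pick a good range.
learn_clas.lr_find()
learn_clas.recorder.plot()
```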

[Image: final model snapshot]

Conclusion and outcome

Through our model, we have demonstrated that ML and deep learning can be successfully applied to the legal domain to enhance the efficiency of the judicial process. The outcomes of this case study show that building a solution to assist judges in deciding a case is a definite possibility:

Automates the ingestion of huge volumes of legal data and the production of structured information and rules

Significantly reduces a judge's effort in reviewing the hundreds of documents and materials from similar cases

Successfully augments the analysis of a case, helping judges arrive at decisions faster

By further increasing the accuracy, we can start testing this hypothesis in the lower courts, where most of the backlog sits

This is an exciting opportunity for legal and technology folk to work together and develop a system that benefits society as a whole.

Get in touch

Any feedback or suggestions will be greatly appreciated. If anyone would like to collaborate on the next steps and take this forward, please feel free to get in touch.

Nikhil.chandna71@gmail.com

https://www.linkedIn.com/in/nik

Special thanks to Debabrata Roy for building the scraping code used in this solution.

The code can be examined at the following link on GitHub.

Disclaimer — Views expressed in this article are personal, and in no way representative of my employer or any other organization I may be affiliated with.
