New Technologies and the Law: The Impact of Artificial Intelligence on the Practice of Law

Rémy Bonnaffé
Apr 10, 2018


I. Introduction

Legal practice is increasingly confronted with artificial intelligence. But what is artificial intelligence, and how can it help legal practitioners do their jobs better? This paper will discuss both of these questions. The aim of this paper is to provide an overview of the broad area of artificial intelligence and highlight its relevance specifically in relation to law and the legal services industry. Before we move on, two caveats are in order. First, this paper is written for an audience with no prior experience in the field of artificial intelligence and no technical background, but that nevertheless wishes to better understand artificial intelligence and some of its underlying technologies. Second, the field is so vast that it is very challenging to provide a comprehensive overview of artificial intelligence. In addition, since the intended audience is non-technical, some of the complexities have been simplified in the interest of readability and brevity. As noted, this paper serves as an introduction to the field of artificial intelligence applied to the practice of law.

Part II of this paper will introduce the reader to the driving forces behind artificial intelligence’s recent popularity, and will answer the question: why is artificial intelligence currently such a hot topic? Part III will dive into the terminology of ‘artificial intelligence’ and other concepts which are frequently used in conjunction with it, including ‘machine learning’. That chapter will reveal that artificial intelligence is an umbrella term and that there are multiple subsets of artificial intelligence or ‘techniques’ to achieve artificial intelligence. The remainder of this paper will highlight the three most important techniques for the practice of law: Part IV covers expert systems, Part V will discuss machine learning and Part VI will analyze natural language processing. The reader will notice that machine learning has been given the greatest emphasis, as it is a fundamental building block for understanding natural language processing. Finally, Part VII will tie everything together and will provide concluding remarks on the relevance of artificial intelligence for the practice of law.

II. Driving Forces Behind Artificial Intelligence’s Recent Popularity

More and more actors in the legal services industry are adopting artificial intelligence technologies. For example, recently it was announced that Davis Polk and Latham & Watkins, two top US law firms, adopted artificial intelligence software to conduct contract review[1]. Law schools are also increasingly offering courses in the area of artificial intelligence[2]. What is driving this change? There are two types of drivers: those linked to the technology in general and those linked to the legal services industry specifically.

1. Technological Drivers

Multiple technological drivers can be identified, all of them intimately linked to one another. These drivers all relate to the fact that technology is advancing at an exponential pace. These exponential advancements can be illustrated by referring to Moore’s Law. Moore’s Law is the observation that processing power doubles approximately every two years, while the cost of such power halves[3]. Since processing power is an important ingredient for artificial intelligence (certain applications of artificial intelligence need large amounts of processing power), the progression in processing power has a direct impact on the adoption of artificial intelligence.

In addition, while Moore’s Law relates to processing power, it is also being mirrored in other aspects of technology[4]. Other examples include the explosion of data and the increasing adoption of cloud computing[5]. As will be explained further, artificial intelligence, and particularly machine learning, relies to a substantial extent on data. As a result, the more data we collect as a society, the more opportunities there are for machine learning algorithms to use that data to make predictions. The explosion of data in recent years has therefore also had a positive effect on the adoption of artificial intelligence[6]. Finally, cloud computing allows artificial intelligence software to be run on computers in the cloud, meaning that legal practitioners can subscribe to services which offer artificial intelligence software without the need to invest in expensive hardware located on the practitioner’s premises.

2. Drivers Specifically Relating to the Practice of Law

Two drivers can be identified which relate specifically to the practice of law and result in the adoption of artificial intelligence, amongst other novel technologies, within the legal services industry. First, the ‘more-for-less’ challenge has pushed legal service providers to adopt more efficient ways to deliver their services. The more-for-less challenge, as coined by Richard Susskind, means that consumers of legal services need increasingly more legal services but at a lower cost[7]. Those consumers include in-house lawyers, but also small businesses with no in-house lawyers or individual citizens. As we will see in Part VI, artificial intelligence applications such as those used in the e-discovery process can drastically reduce the number of hours a lawyer needs to spend on certain tasks, and in turn provide more legal services at a lower cost.

A second driver which relates to the practice of law is the liberalization of the legal services industry. The legal services industry is characterized by regulatory protection intended to safeguard both the public and the legal profession. Two examples of such rules include rules on the unauthorized practice of law and the non-lawyer ownership of law firms[8]. These rules make it harder for other players, including those that may have better access to artificial intelligence applications, to enter the legal services industry. However, these protective rules are starting to be challenged in some parts of the world. In the United Kingdom, for example, the Legal Services Act 2007 was introduced which “permits the setting up of new types of businesses called ‘alternative business structures’, so that non-lawyers can own and run legal business”[9].

III. Defining Artificial Intelligence and Affiliated Terminology

As noted in the previous chapter, artificial intelligence is a hot topic. It has become increasingly difficult not to come across the concept of artificial intelligence in recent literature, news articles and even legal literature[10]. Searching for “artificial intelligence” on the website of the Wall Street Journal, for example, yields 138 results for the five-year period between 2007 and 2012. Searching for the same term for the five-year period between 2012 and 2017 yields 2387 results. In 2016, Google DeepMind researchers published a paper in Nature reporting how their computer program, called AlphaGo, was able to beat the best European Go player five games to zero[11]. That was notable, even though IBM’s Deep Blue had defeated the reigning world chess champion almost two decades earlier[12]. The reason is that the 2,500-year old game of Go contains vastly more possible moves than chess, making it exponentially more complex[13].

Even though coverage of artificial intelligence has skyrocketed, it remains difficult to pinpoint what exactly it entails. This blurriness is reinforced because the term is often used alongside, or interchangeably with, concepts such as machine learning and deep learning. To understand how artificial intelligence may impact the legal practice, and how it relates to the concepts enumerated earlier, a definition is important. While there are many definitions of artificial intelligence and there is a lack of a precise, universally accepted definition of AI[14], the easiest definition of artificial intelligence is a machine or a computer that is able to “perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision making, and translation between languages”[15] or “a set of techniques aimed at approximating some aspect of human or animal cognition using machines”[16]. Both definitions are convenient because they allow the reader to readily identify which technologies may be classified as having some form of artificial intelligence.

An attentive reader might object to such definitions, claiming that many technologies nowadays may be qualified as having artificial intelligence. A good example is a simple pocket calculator, which performs calculations much quicker than a human. These technologies, while arguably exhibiting some form of artificial intelligence, are not the type of advanced technologies that artificial intelligence is associated with today, such as autonomous cars. This objection is valid and is the result of two things: the fact that artificial intelligence as a research area has existed since 1956, when John McCarthy initially coined the term, and something called the “AI effect”[17]. The AI effect describes the process whereby artificial intelligence “brings a new technology into the common fold, [and] people become accustomed to this technology, it stops being considered [to be artificial intelligence], and newer technology emerges”[18]. In other words, artificial intelligence can be seen as a scale on which certain technologies may be plotted. The scale evolves from technologies which society considers straightforward (such as the pocket calculator) to more advanced technologies (such as the autonomous car). Research on artificial intelligence, paired with its commercialization, has resulted in dramatic advancements in the field, which in turn expand the scale further.

As noted earlier, artificial intelligence is often used in conjunction with, or interchangeably with, concepts such as machine learning and deep learning; some authors even use ‘artificial intelligence’ when they really mean machine learning. This is because artificial intelligence has to be seen as an umbrella term encompassing a wide range of research. There are multiple ‘techniques’ or ‘cognitive technologies’ (also known as ‘cognitive computing’) which emanate from the field of artificial intelligence[19]. Machine learning and deep learning are techniques to achieve artificial intelligence. Michael Mills identifies seven such techniques: machine learning, natural language processing, expert systems, vision, speech, planning and robotics (see Figure 1)[20]. Each of these techniques aims to make it possible for the computer to perform a task which is normally reserved for humans. For vision and speech, that is recognizing images and objects and understanding the spoken word. Machine learning aims to ‘teach’ a computer rules by using data and examples, rather than explicit programming. Natural language processing relates, in turn, to the understanding and translation of natural language[21].

Figure 1 Michael Mills’ Seven Techniques

In the remainder of this paper, we will discuss the three techniques that are most relevant for the legal practice: expert systems, machine learning and natural language processing. It is the latter two techniques that have improved drastically in the last few years and that now make it increasingly possible to apply artificial intelligence to the legal profession.

IV. Expert Systems

Before starting with machine learning and natural language processing, it is important to take a moment to understand expert systems. The popularity of expert systems boomed around the 1980s, when “perhaps half of the Fortune 500 were developing or maintaining” expert systems[22]. This popularity also extended to legal practice[23]. Expert systems may therefore be seen as one of the early ‘booms’ in the effort to apply artificial intelligence to the practice of law.

Expert systems are “computer programs that have been constructed (with the assistance of human experts) in such a way that they are capable of functioning at the standard of (and sometimes even at a higher standard than) experts in given fields”[24]. This is achieved by encoding the expertise of a human expert, and the reasoning behind that expertise, as if-then rules. The expert system will prompt the user with certain questions. Based on the input, the expert system will use the logical steps, programmed by the expert, to come to a certain outcome. As a result, any other person can use the expert system to solve a particular problem, and the expert system will explain to the user why a particular solution is the best one[25].
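To make this concrete, the following minimal sketch (in Python) shows how an expert system encodes expertise as if-then rules and explains its conclusion. The rules and questions are invented, heavily simplified illustrations, not an actual legal expert system.

```python
# Minimal expert-system sketch: hand-coded if-then rules plus an explanation of the outcome.
# The rules below are purely illustrative and not real legal advice.

def bail_recommendation(answers):
    """Apply hard-coded expert rules to the user's answers and explain the result."""
    if answers["violent_offense"]:
        return "deny bail", "Rule 1: violent offenses are presumed too risky for release."
    if answers["prior_failures_to_appear"] > 1:
        return "deny bail", "Rule 2: repeated failures to appear indicate flight risk."
    if answers["ties_to_community"]:
        return "grant bail", "Rule 3: non-violent defendant with community ties may be released."
    return "refer to judge", "No rule applied; the system cannot decide this case."

# The system would normally prompt the user with questions; here the answers are given directly.
answers = {"violent_offense": False, "prior_failures_to_appear": 0, "ties_to_community": True}
outcome, reason = bail_recommendation(answers)
print(outcome, "-", reason)
```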

Expert systems have, however, not been able to disrupt the legal practice. The most important limitations of expert systems, which tempered their ability to fundamentally replace human experts such as lawyers, included the difficulty of capturing the experts’ tacit knowledge and, more importantly, the cost and complexity of creating and maintaining such systems[26]. These limitations have led the legal industry to direct its hopes to machine learning and natural language processing. It should be noted, however, that expert systems may still be useful in specific areas of law where these limitations are less relevant. Areas of law which involve specific and detailed regulations may still be suitable for incorporation into a legal expert system, since there is less room for ‘tacit knowledge’ and it is relatively easy to create the appropriate if-then rules because these can often be found explicitly in the regulation. Good examples of areas of law governed by such regulations include tax law and securities law (for example, the Regulation D exemptions).

V. Machine Learning

1. Defining Machine Learning and its Benefits

Expert systems translate human expertise into an algorithm. An algorithm is a “sequence of instructions that are carried out to transform the input to the output”[27]. Using expert systems still assumes that the expert in question knows the rules, since the human expert is required to program the rules into the algorithm. It is, however, not always possible to know all the rules. In addition, in some cases it may also simply be too cumbersome to determine all the rules. Imagine, for example, spam e-mail (the most commonly used example to illustrate machine learning). Which rules determine whether an e-mail is spam or not? Trying to figure out these rules is vastly time-consuming (if possible at all), and the expert system would still not be very accurate since the human expert would need to accommodate all possible situations. In addition, what constitutes spam may change over time or may vary from individual to individual.

That is where machine learning comes in. Machine learning makes it possible to replace knowledge with data. Machine learning refers to “the ability of computer systems to improve their performance by exposure to data without the need to follow explicitly programmed instructions”[28]. Machine learning systems effectively discover patterns in data and create algorithms based on such patterns. As a result, no human expert needs to expressly program the rules into the system. This is useful because, as explained, the expert might not know those rules (the patterns) or might take a long time to find them.

2. Supervised Learning or How Machine Learning ‘Learns’

Now that we know that machine learning systems discover patterns in data, the subsequent question is: how does it find these patterns? In order for the computer to learn, it needs feedback. Feedback may be the result of three distinguishable learning models: supervised learning, reinforcement learning and unsupervised learning[29]. In this section, we will focus on supervised learning.

The (currently) most important way for a computer to get feedback is through supervised learning. Supervised learning works by feeding the computer two datasets: the training set and a “new” dataset. The training set is distinguishable from the new dataset because the training set contains labeled data. Labeling data means that all the instances in the dataset are labeled with the desired outcome[30]. This means, for example, that all 10,000 e-mails in our dataset are labeled as either spam or not spam. The training set effectively gives the computer the ‘right answers’.

The computer will subsequently go through all instances of the dataset and search for patterns which allow it to predict the correct outcome on new data. The computer will use a statistical model and will adjust the parameters of this model in order to improve its accuracy. This process is repetitive and incremental in nature. These parameters may be certain attributes of the data instances, such as, for the spam example, country of origin, presence of certain words or number of recipients. Once the computer has ‘learned’ the most accurate model and parameters (which it can test because it has the training set), the computer is ready to apply its model to the new dataset. Generally, the more training data a computer receives, the more robust its learning model will be.
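As a minimal sketch of this workflow, the snippet below uses the scikit-learn library and invented spam-style features (number of recipients and presence of the words “free money”); the feature values, labels and the choice of logistic regression as the statistical model are all assumptions made purely for illustration.

```python
# Supervised learning sketch: fit a statistical model on labeled examples, then predict on new data.
from sklearn.linear_model import LogisticRegression

# Training set: each row is [number of recipients, contains "free money" (1/0)],
# each label is 1 for spam and 0 for not spam. All values are invented for illustration.
X_train = [[500, 1], [3, 0], [250, 1], [1, 0], [40, 0], [900, 1]]
y_train = [1, 0, 1, 0, 0, 1]

model = LogisticRegression()
model.fit(X_train, y_train)   # the 'learning' step: parameters are adjusted to fit the labels

# The "new" dataset: unlabeled e-mails the model has never seen.
X_new = [[700, 1], [2, 0]]
print(model.predict(X_new))   # e.g. [1 0]: first e-mail predicted spam, second not
```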

In order to determine how well a particular model predicts, it is necessary to evaluate the model empirically. To evaluate the model, k-fold cross validation may be used. This procedure sets aside part of the training data before the training of the model is started (for example, 20% of the training set). Once the model has been trained on the remainder, it is applied to this held-out subset of the training data in order to evaluate its performance; in k-fold cross validation, this is repeated k times, each time holding out a different part of the data. This is possible because the training data is labeled and the computer can therefore self-assess its performance.
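As a rough sketch, and reusing the same kind of invented toy data, scikit-learn can run such a cross validation in a few lines; with cv=5 the data is split into five folds and each fold is held out once.

```python
# k-fold cross validation sketch: evaluate a model on successive held-out folds of the training data.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy labeled training data (invented for illustration), as in the previous sketch.
X = [[500, 1], [3, 0], [250, 1], [1, 0], [40, 0], [900, 1], [20, 0], [600, 1], [5, 0], [350, 1]]
y = [1, 0, 1, 0, 0, 1, 0, 1, 0, 1]

# Five folds: train on four fifths of the data, evaluate on the remaining fifth, repeat five times.
scores = cross_val_score(LogisticRegression(), X, y, cv=5, scoring="accuracy")
print(scores, scores.mean())
```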

Four metrics are key in relation to this evaluation, and they are built on true and false positives and negatives. True negatives (TN) are the total number of negative cases that were predicted negative. True positives (TP) are the total number of positive cases that were predicted positive. False negatives (FN) are the total number of positive cases that were predicted negative. Finally, false positives (FP) are the total number of negative cases that were predicted positive. The four key metrics are the following[31]:

- Accuracy: the ratio of correct case predictions over all case predictions. (TN + TP)/(TN + TP + FN + FP)

- Precision: the ratio of the number of positive case predictions that are correct over the total number of positive case predictions. (TP)/(TP + FP)

- Recall: the ratio of positive case predictions that are correct over the number of cases that were positive. (TP)/(TP + FN)

- F1-score or F1 measure: the harmonic mean of precision and recall, where both measures are treated as equally important. 2 × (Precision × Recall)/(Precision + Recall). (A short worked computation of these metrics follows below.)
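To make these formulas concrete, here is a short worked computation using invented confusion-matrix counts.

```python
# Compute the four evaluation metrics from invented counts of true/false positives/negatives.
TP, TN, FP, FN = 80, 90, 10, 20

accuracy = (TN + TP) / (TN + TP + FN + FP)              # 0.85
precision = TP / (TP + FP)                              # ~0.889
recall = TP / (TP + FN)                                 # 0.80
f1 = 2 * precision * recall / (precision + recall)      # harmonic mean, ~0.842

print(accuracy, precision, recall, f1)
```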

Amongst the four, accuracy is often the most important metric for signaling the overall performance and predictive ability of a machine learning model. It is important to note in this framework that it is not realistic to expect 100% accuracy from a machine learning model for complex use cases. In addition, it is not uncommon, for example, for people to have a bias against machine learning models and in favor of human experts, thinking that humans have a better ability to predict outcomes. However, when human experts and machine learning models are evaluated on the same task, it may very well be that the machine learning model is more accurate. For example, Katz et al. (2014) were able to create a machine learning model which was more accurate in predicting Supreme Court outcomes than human legal experts. The machine learning model predicted 69.7% of the case outcomes and 70.9% of the individual Justices’ votes over a 60-year period, compared to 59% and 67.9% respectively for human legal experts[32]. The performance is not drastically better, but it is enough to show that there is no reason to have a bias against machine learning models.

3. Decision trees

One example of a model which a computer may use to discover patterns in data is a decision tree. It is one of the simplest and most successful models for machine learning[33]. A decision tree is somewhat similar to an expert system in the way that it also uses if-then statements. The difference, of course, is that the computer itself incrementally adjusts the parameters and thereby constructs the if-then statements that best fit the data. The decision tree may be visualized as multiple nodes, each branching to new nodes (hence the name decision tree, with the terminal nodes being the ‘leaves’ of the tree). Every node in the tree is an if-then statement linked to a particular attribute or question[34].

For example: if an e-mail has more than 500 recipients, then it is spam. If it has fewer than 500 recipients, the computer will move down to a new node (a new ‘test’). By combining all the nodes, the computer is able to bring together a (sometimes complex) set of rules to determine the outcome based on the data that is provided. For example, if an e-mail contains the words “free money” (node 1), originates from country X (node 2) and has 100 recipients (node 3), there is a 90% chance that the e-mail is spam.
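A minimal sketch of such a tree, again using scikit-learn and the same kind of invented spam features (presence of “free money”, country of origin, number of recipients), might look as follows; the printed output is the learned if-then structure described above.

```python
# Decision tree sketch: the algorithm derives the if-then nodes itself from labeled examples.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features per e-mail: [contains "free money" (1/0), originates from country X (1/0), recipients].
# Feature values and labels (1 = spam, 0 = not spam) are invented for illustration.
X = [[1, 1, 100], [0, 0, 2], [1, 0, 600], [0, 1, 3], [0, 0, 40], [1, 1, 250]]
y = [1, 0, 1, 0, 0, 1]

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Print the learned if-then rules, node by node, and classify a new e-mail.
print(export_text(tree, feature_names=["free_money", "country_x", "recipients"]))
print(tree.predict([[1, 1, 100]]))   # e.g. [1]: predicted spam
```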

The ‘smaller’ (the lower the number of nodes) the decision tree, the easier it is to interpret the results. A decision tree with more than 100 nodes may be very complicated for a human expert to understand. In any case, a decision tree model will always order the nodes in such a way that the first node is the most important if-then statement[35]. That means that the first node (or ‘rule’) has the strongest ability to distinguish the data. It should be noted, however, that the decision tree model will not aim to create nodes which are 100% accurate, meaning that it may create certain if-then rules which do not necessarily fit all the data. The model will only try to create nodes which are as ‘accurate’ as possible, whereby accuracy is defined as the ratio of correct case predictions over all case predictions[36].

A good example of a decision tree model may be found in Figure 2, which illustrates how we can create a decision tree from previous decisions on releasing a defendant on bail for a small set of instances. Note that the first node in the illustration, whether the crime related to drugs, is the most important node since it predicts four out of the seven data instances. The subsequent nodes create more granularity and allow for the prediction of the cases where no drugs are involved.

Figure 2 Bail decision data (a) from which decision tree is created (b) [37]

4. Generalization and Overfitting

One important problem in relation to machine learning in general[38], but for decision trees in particular[39], is the issue of ‘overfitting’, or a failure to generalize. A decision tree model will create as many nodes as possible to fit as much of the data as it can into an if-then framework. Every subsequent node classifies fewer instances. Without any limitations, this may result in hundreds of nodes, whereby the lowest-level nodes are simply there to classify, for example, two instances. This may be reasonable for the illustration given in Figure 2 (where there are only seven instances), but if the dataset consists of thousands of instances, creating a complicated rule (complicated because every node is a cumulative if-then rule) to classify two instances may not be worthwhile. In addition, it is likely that these two instances have separate attributes (and are therefore being classified separately by a node) simply because of the “idiosyncrasies or biases”[40] of the dataset. As a result, we should be wary of creating a rule for these random instances, which are not necessarily representative of the total dataset.

That is in essence what overfitting is: a machine learning algorithm, such as a decision tree model, will “seize on any pattern it can find in the input”[41] and the model has “so many extra terms that it fits the random variations in data rather than real patterns”[42]. There are two ways to deal with overfitting, and both methods may be used simultaneously. First, it is possible to use decision tree ‘pruning’. Pruning means removing the nodes that are not relevant. In order to determine whether a node is relevant or not, a statistical significance test may be used. This means, in essence, that we look at the data and determine how much it deviates from a “perfect absence of pattern”[43]. Second, a decision tree ‘forest’ can be used. That means that multiple decision trees are built and combined, which makes it possible to determine which nodes in each individual tree are not relevant. The forest subsequently aggregates the individual trees into one final result[44].
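Both remedies can be sketched in a few lines with scikit-learn, reusing the invented spam data from the earlier sketch. Note that this library implements pruning through depth limits and a cost-complexity penalty rather than the significance test mentioned above, and that its random forest aggregates trees by voting; both are assumptions of this sketch rather than the only possible approach.

```python
# Two ways to fight overfitting: restrict/prune a single tree, or aggregate a forest of trees.
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X = [[1, 1, 100], [0, 0, 2], [1, 0, 600], [0, 1, 3], [0, 0, 40], [1, 1, 250]]
y = [1, 0, 1, 0, 0, 1]   # invented spam labels, as before

# Pruning/limiting: a shallow tree with a cost-complexity penalty grows fewer, more general nodes.
pruned_tree = DecisionTreeClassifier(max_depth=2, ccp_alpha=0.01, random_state=0).fit(X, y)

# Forest: many trees trained on random subsets of the data, whose predictions are aggregated by vote.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

print(pruned_tree.predict([[0, 1, 900]]), forest.predict([[0, 1, 900]]))
```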

5. The Machine Learning Model Doesn’t Really Learn but Uses Proxies Instead

What does ‘learning’ mean? We defined artificial intelligence as the ability of computer systems to perform tasks normally requiring human intelligence. In the framework of machine learning, the task that is performed by the computer system is the process of learning. In other words, we are trying to replicate a cognitive function of humans[45]. It is important to note, however, that the current capabilities of machine learning (and artificial intelligence as a whole) only approximate that cognitive function. That is because artificial intelligence systems such as machine learning models do not really ‘learn’ the same way that a human does. A machine learning model will use heuristics and proxies, including statistical correlations derived from the patterns in the data, to produce a result similar to the one a human would reach when solving the particular issue. What a machine learning model is unable to do is understand abstract concepts in relation to the tasks it is assigned to do[46]. Take the example of translation tools such as Google Translate, which is also based on machine learning. These systems do not understand the words that are being translated. They use data and statistics to determine how each word should be translated. Other good examples are the recommendation algorithms of Netflix and Amazon. Amazon’s algorithm doesn’t really know that the books Nineteen Eighty-Four and Brave New World are similar, but it knows from purchasing data that customers are likely to purchase both books and therefore assumes (as a proxy) that the books have some degree of similarity and may be recommended to customers who purchase either book.

As a result, the model cannot generalize its ‘learning’ the same way a human would do[47]. The generalized learning as described here may be qualified as artificial general intelligence. However, artificial intelligence is not yet sufficiently developed to be labeled as having general intelligence. Instead, artificial intelligence can only do very specific tasks. Its strength is currently restricted to ‘narrow’ intelligence, also called artificial narrow intelligence[48].

VI. Natural Language Processing

Up until now we have mainly focused on applying machine learning to what is called ‘structured data’, meaning data comprised of clearly defined data types whose patterns make them easily searchable. Figure 2 is a good example of structured data, whereby the data is neatly organized in a table with rows and columns. Unstructured data is essentially everything that is not structured. The most important example of unstructured data is text. In order to apply artificial intelligence to textual data, natural language processing may be used. Natural language processing is the “subfield of computer science concerned with using computational techniques to learn, understand, and produce human language content”[49].

Applying machine learning to structured data is extremely useful, including in the framework of the legal profession. The illustration in Figure 2 demonstrates that we can use structured data to develop predictions, for example in relation to case outcomes. Nevertheless, the core data of the legal profession is frequently textual. Lawyers work with legal documents all the time: court decisions, court documents, contracts and statutory provisions. As a result, in order to fully utilize the power of artificial intelligence and machine learning in the context of the legal practice, we want to apply machine learning to textual data. This chapter will discuss how machine learning models may be used in the framework of textual data.

1. Applying Machine Learning to Textual Data

In order to apply machine learning to textual data, multiple steps need to be taken (beyond collecting the textual data in the first place)[50]. In the remainder of this section, we will discuss two important steps in an attempt to simplify the process. The first step is to ‘clean’ the data and, more particularly, to ‘normalize’ and ‘tokenize’ the text. The end goal when applying machine learning to textual data is essentially identical to that for structured data: we want to discover patterns in the data. For textual data, that means that we want to extract statistical correlations between words and their frequency.

In order to properly discover patterns in textual data, we need to ‘normalize’ certain words. Normalization involves “converting words to lower case and stemming them to their uninflected roots in order to eliminate superficial variations”[51]. No human has any issue recognizing that “Running”, “running”, “ran” and “run” are all variations of one and the same verb. Without any normalization, however, these words are completely different for a computer. Normalization will convert all these words to “run”. In addition, normalization will also remove ‘stop words’, which are words that occur with high frequency but carry little informational value. Examples are function words (of, the, and) and pronouns (them, who, that). Finally, tokenization will convert single words (unigrams) or multiple words (n-grams) to ‘tokens’: “Please, drink the tea” will become ‘please’ ‘drink’ ‘tea’ (the ‘the’ being removed as a stop word).
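A rough sketch of these cleaning steps, using simple lower-casing, a small hand-written stop-word list and the Porter stemmer from the NLTK library, might look like this; the exact stems produced are an implementation detail and may differ slightly from the examples above.

```python
# Text normalization sketch: lower-case, drop stop words, and stem the remaining tokens.
from nltk.stem import PorterStemmer

STOP_WORDS = {"of", "the", "and", "them", "who", "that"}   # tiny hand-written stop-word list
stemmer = PorterStemmer()

def normalize(text):
    tokens = text.lower().replace(",", "").replace(".", "").split()   # naive tokenization
    return [stemmer.stem(t) for t in tokens if t not in STOP_WORDS]   # drop stop words, stem the rest

print(normalize("Please, drink the tea"))   # 'the' removed as a stop word, remaining tokens stemmed
print(normalize("Running and running"))     # both variants reduced to 'run'
```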

The previous example illustrated unigrams, where one word is one token. N-grams allow a token to include two (‘bigrams’), three (‘trigrams’) or more words. N-grams are important because they better capture the complexity of “information representation with lexical, syntactic and semantic rules at different levels”[52]. For example, let us imagine we want to discover the sentiment of the following phrase: “I am not good at tennis, I am very bad at it”. Using unigrams (‘I’ ‘am’ ‘not’ ‘good’ ‘at’ ‘tennis’ ‘I’ ‘am’ ‘very’ ‘bad’ ‘at’ ‘it’), we would capture both ‘good’ and ‘bad’, which effectively cancel each other out. Instead, if we use bigrams (‘I am’ ‘am not’ ‘not good’ ‘good at’ ‘at tennis’ ‘tennis I’ ‘I am’ ‘am very’ ‘very bad’ ‘bad at’ ‘at it’), we would capture both ‘not good’ and ‘very bad’. These have a similar meaning, and it is therefore much easier to capture the sentiment of such a phrase.
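Generating unigrams and bigrams from a token list takes only a few lines; the sketch below uses the sentiment sentence from the text.

```python
# n-gram sketch: turn a list of tokens into unigrams, bigrams, or longer n-grams.
def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "I am not good at tennis I am very bad at it".lower().split()
print(ngrams(tokens, 1))   # unigrams: ['i', 'am', 'not', 'good', ...]
print(ngrams(tokens, 2))   # bigrams: ['i am', 'am not', 'not good', 'good at', ...]
```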

Now that we have made it easier to determine the frequency of and relationship between the words in our textual data, the next step is to structure this data in a way that allows us to apply machine learning. In order to structure the textual data, a document-term matrix is used. This is a spreadsheet-like document in which the ‘instances’ are organized in rows and the unigrams or other n-grams (such as bigrams) in columns[53]. What these ‘instances’ are depends on the objective of the machine learning model. If we want to compare multiple documents and determine whether there is any similarity, the instances will be the documents. If we want to compare multiple paragraphs within one single document, the instances will be paragraphs, and so on. Figure 3 illustrates a document-term matrix showing five documents (labelled D1 through D5); a small construction sketch follows the figure.

Figure 3 Unigram document-term matrix [54]
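As a small sketch of how such a matrix can be built, the snippet below applies scikit-learn’s CountVectorizer to three invented one-line ‘documents’; each row of the printed table is a document and each column a unigram count.

```python
# Document-term matrix sketch: rows are documents, columns are term counts.
from sklearn.feature_extraction.text import CountVectorizer
import pandas as pd

documents = [                                  # invented mini-documents for illustration
    "the tenant shall pay rent",
    "the employee shall not compete",
    "rent is due on the first day",
]

vectorizer = CountVectorizer()                 # add ngram_range=(1, 2) to include bigrams as well
matrix = vectorizer.fit_transform(documents)   # sparse document-term matrix

# Display as a table: one row per document (D1-D3), one column per term.
print(pd.DataFrame(matrix.toarray(),
                   columns=vectorizer.get_feature_names_out(),
                   index=["D1", "D2", "D3"]))
```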

After having created a document-term matrix, we have effectively transformed unstructured data to structured data and it becomes possible to apply statistical models in order to discover or ‘learn’ insights, similar to what was discussed in the previous chapter on machine learning. Here too, we can choose to use supervised or unsupervised learning.

2. Natural Language Processing for Technology Assisted Review and e-Discovery

Now that we better understand how artificial intelligence, and machine learning in particular, relate to textual data, we can look at how these techniques are used in the legal practice. An increasing number of commercial solutions are being developed which leverage the power of natural language processing, either to help lawyers better perform their jobs or to go beyond the lawyer altogether and provide legal services directly to consumers and businesses. One of the most prominent applications of artificial intelligence, machine learning and natural language processing is tools that automate the discovery process in litigation or government investigations, also known as ‘e-discovery’[55]. More and more law firms are now also implementing natural language processing tools in the transactional framework, in order to more efficiently review large batches of agreements (within or outside of the context of an M&A due diligence)[56]. In both cases, lawyers are conducting what is called “technology assisted review” or TAR in order to classify documents. In the e-discovery context, documents are classified as relevant or non-relevant. In the contract analysis context, agreements may be automatically classified by type (such as ‘lease agreement’ or ‘employment agreement’), or as having or lacking a particular clause (such as a ‘change of control’ provision). In the remainder of this section, we will focus on how artificial intelligence is being leveraged in the framework of e-discovery, since “taken as a whole, e-discovery represents perhaps the most mature incursion of technology into the practice of law”[57].

Pretrial discovery is the process in which parties request access to information from opponents and third parties in order to assemble evidence for trial. E-discovery relates to the process of discovery where the information sought is in electronic format. Since large lawsuits usually involve millions of documents[58], of which only a small subset is relevant for the case, lawyers have sought to use artificial intelligence to make the process of going through all the documents to find the ‘relevant’ ones more efficient. As a result, the practice of ‘predictive coding’, or the application of machine learning in the discovery practice, was born. It is now firmly established that using TAR, and predictive coding in particular, is not only more effective but also much cheaper than human review[59]. TAR is now so common and accepted by the judiciary that one judge even declared it to be “black letter law”[60].

The TAR process, using predictive coding, is usually initiated with a ‘seed’ set of documents. This is essentially the ‘training data’ discussed earlier in the framework of supervised learning. The seed set is an initial batch of documents which is manually coded by the human expert according to whether she considers each document to be relevant and issue-related[61]. Once the seed set of documents has been manually coded, the machine learning process can start. The machine learning system will create a model based on the seed set and, based on that model, will identify additional documents which may be relevant for the human reviewers. The system may present documents to the human reviewers using ‘uncertainty sampling’, which selects the documents about which the system is least certain[62]. Finally, the human reviewers will provide feedback to the machine learning system on the relevance of the documents the system is returning. As a result, the machine learning system receives valuable feedback and is able to update its model, making the process iterative[63].
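A highly simplified sketch of one iteration of this loop is shown below, assuming a hypothetical corpus of short document texts and using scikit-learn’s TF-IDF vectorizer and logistic regression as the classifier; commercial predictive-coding tools are, of course, far more sophisticated.

```python
# Predictive-coding sketch: train on a coded seed set, then surface the most uncertain documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical seed set coded by the reviewing attorney (1 = relevant, 0 = not relevant).
seed_texts = ["merger negotiation with acme", "lunch menu for friday",
              "acme acquisition price terms", "office parking reminder"]
seed_labels = [1, 0, 1, 0]
unreviewed = ["draft acquisition agreement", "parking lot closed monday", "price discussion notes"]

vectorizer = TfidfVectorizer()
model = LogisticRegression()

# One iteration of the loop; in practice this repeats as reviewers code more documents.
X_seed = vectorizer.fit_transform(seed_texts)
model.fit(X_seed, seed_labels)

probabilities = model.predict_proba(vectorizer.transform(unreviewed))[:, 1]   # P(relevant) per document
# Uncertainty sampling: surface the document whose predicted probability is closest to 0.5.
most_uncertain = min(range(len(unreviewed)), key=lambda i: abs(probabilities[i] - 0.5))
print(unreviewed[most_uncertain], probabilities[most_uncertain])
```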

Figure 4 Prioritization using predictive coding [64]

The efficiency gains from such a process can be illustrated by Figure 4. The shaded cells represent the relevant documents. The white cells represent non-relevant documents. The first bar represents conducting the discovery process at random, where no particular prioritization is used. The second bar represents the ideal scenario, in which the machine learning model is able to prioritize all the relevant documents before any of the non-relevant documents. As noted earlier, expecting the machine learning model to have an accuracy of 100% is not realistic, which is why the third and last bar is closer to reality. In the last bar, the machine learning model successfully prioritized all but one relevant document in the first half of the review process.

VII. The Relevance of Machine Learning and Natural Language Processing for the Legal Practice

Artificial intelligence is rapidly evolving. Two technologies within the broad area of artificial intelligence that are particularly interesting for the practice of law are machine learning and natural language processing. The last remaining question is whether these technologies are sufficiently advanced to replace legal practitioners, including lawyers. The answer is: it depends. Whether or not artificial intelligence will be able to replace legal practitioners depends on the specific task that the practitioner is conducting. The key to understanding the impact of artificial intelligence on the legal profession is therefore to understand which tasks are being performed by practitioners. In this framework, it is important to understand that the work of the legal practitioner can be decomposed.

For example, a litigation lawyer may perform any of the following tasks: document review, legal research, project management, litigation support, (electronic) disclosure, strategy, tactics, negotiation and advocacy[65]. As we have seen in the previous chapter, document review can be greatly enhanced by using artificial intelligence, more particularly predictive coding. That does not mean, however, that the litigation lawyer’s job is at risk. The lawyer remains crucial in the litigation process for the numerous tasks which cannot yet be replaced by artificial intelligence software, including strategy, tactics, negotiation and advocacy.

In conclusion, the question is not whether artificial intelligence will replace legal practitioners. Rather, the question is how we can maximize the efficiencies as the result of legal practitioners implementing artificial intelligence to better serve their clients. As Daniel Katz writes: “the equation is simple: Humans + Machines > Humans or Machines”[66].

[1]Richard Tromans, White Shoe Firm Davis Polk Picks Kira In AI Market Milestone, Artificial Lawyer (March 15, 2018), https://www.artificiallawyer.com/2017/11/29/white-shoe-firm-davis-polk-picks-kira-in-ai-market-milestone/.

[2]Columbia Law introduces a pilot “J-Term,” a series of one-week electives, Columbia Law School Blog (March 15, 2018), http://www.law.columbia.edu/news/2017/12/electives-winter-term.

[3]Richard Susskind, Tomorrow’s Lawyers 11 (2013).

[4]Id.

[5]Joanna Goodman, Robots in Law: How Artificial Intelligence is Transforming Legal Services 7 (ARK Group, 2016).

[6]Id.

[7]Richard Susskind, Tomorrow’s Lawyers 4 (2013).

[8]Ron Dolin, Adaptive Innovation: Innovator’s Dilemma in Big Law (April 15, 2015) 5. Available at https://ssrn.com/abstract=2593621.

[9]Richard Susskind, Tomorrow’s Lawyers 6 (2013).

[10]Jonathan Macey and Joshua Mitts, Finding Order in the Morass: Three Real Justifications for Piercing the Corporate Veil, 100 Cornell L. Rev. 99 (2014).

[11]David Silver et al., Mastering the game of Go with deep neural networks and tree search, 529 Nature 484 (2016).

[12]Michael Mills, Artificial Intelligence in Law: The State of Play 2016, Thomson Reuters, 2016.

[13]Cade Metz, In a Huge Breakthrough, Google’s AI Beats a Top Player at the Game of Go, Wired (Feb. 15, 2018), https://www.wired.com/2016/01/in-a-huge-breakthrough-googles-ai-beats-a-top-player-at-the-game-of-go/.

[14]Stuart Russell & Peter Norvig, Artificial Intelligence: A Modern Approach 2 (3d ed. 2010).

[15]Joanna Goodman, Robots in Law: How Artificial Intelligence is Transforming Legal Services 4 (ARK Group, 2016).

[16]Calo, Ryan, Artificial Intelligence Policy: A Primer and Roadmap (August 8, 2017). Available at https://ssrn.com/abstract=3015350.

[17]David Schatsky & Craig Muraskin, Demystifying Artificial Intelligence, Deloitte University Press, 2014.

[18]Artificial Intelligence and Life in 2030, accessed February 22, 2018, https://ai100.stanford.edu/sites/default/files/ai_100_report_0831fnl.pdf.

[19]Calo, Ryan, Artificial Intelligence Policy: A Primer and Roadmap (August 8, 2017) 5. Available at https://ssrn.com/abstract=3015350.

[20]Michael Mills, Artificial Intelligence in Law: The State of Play 2016 3, Thomson Reuters, 2016.

[21]Richard Susskind, Expert Systems in Law: a Jurisprudential Approach to Artificial Intelligence and Legal Reasoning, 49 The Modern L. Rev. 168, 172 (1986).

[22]David Schatsky & Craig Muraskin, Demystifying Artificial Intelligence, Deloitte University Press, 2014 4.

[23]Richard Susskind, Artificial Intelligence, Expert Systems and Law, The Denning Law Journal, 1990.

[24]Richard Susskind, Expert Systems in Law: a Jurisprudential Approach to Artificial Intelligence and Legal Reasoning, 49 The Modern L. Rev. 168, 172 (1986).

[25]Beth Enslow, The Payoff from Expert Systems, Across the Board, 1989 54.

[26]David Schatsky & Craig Muraskin, Demystifying Artificial Intelligence, Deloitte University Press, 2014 4.

[27]Ethem Alpaydin, Machine Learning 16 (2016).

[28]David Schatsky & Craig Muraskin, Demystifying Artificial Intelligence, Deloitte University Press, 2014 6.

[29]Ethem Alpaydin, Machine Learning 694–695 (2016).

[30]Surden, Harry, Machine Learning and Law, 89 Washington Law Review 87, 93 (2014).

[31]Kevin Ashley, Artificial Intelligence and Legal Analytics 114 (2017).

[32]Daniel Katz, Michael Bommarito II and Josh Blackman, A General Approach for Predicting the Behavior of the Supreme Court of the United States, https://ssrn.com/abstract=2463244 (2014).

[33]Ethem Alpaydin, Machine Learning 77 (2016) and Stuart Russell & Peter Norvig, Artificial Intelligence: A Modern Approach 697 (3d ed. 2010).

[34]Kevin Ashley, Artificial Intelligence and Legal Analytics 110 (2017).

[35]Stuart Russell & Peter Norvig, Artificial Intelligence: A Modern Approach 700 (3d ed. 2010).

[36]Kevin Ashley, Artificial Intelligence and Legal Analytics 114 (2017).

[37]Id. at 110.

[38]Surden, Harry, Machine Learning and Law, 89 Washington Law Review 87, 106 (2014).

[39]Kevin Ashley, Artificial Intelligence and Legal Analytics 113 (2017).

[40]Surden, Harry, Machine Learning and Law, 89 Washington Law Review 87, 106 (2014).

[41]Stuart Russell & Peter Norvig, Artificial Intelligence: A Modern Approach 705 (3d ed. 2010).

[42]Kevin Ashley, Artificial Intelligence and Legal Analytics 113 (2017).

[43]Id.

[44]Id.

[45]Ethem Alpaydin, Machine Learning 18 (2016).

[46]Surden, Harry, Machine Learning and Law, 89 Washington Law Review 87, 95 (2014).

[47]Rodney Brooks, The Seven Deadly Sins of AI Predictions, MIT Technology Review (March 8, 2018), https://www.technologyreview.com/s/609048/the-seven-deadly-sins-of-ai-predictions/.

[48]Jeff Margolies, Rajeev Ronanki, David Steier, Tech Trends 2018: The Symphonic Enterprise, Deloitte Insights, 2018.

[49]Julia Hirschberg and Christopher Manning, Advances in Natural Language Processing, 349 Science 261, 261 (2015).

[50]Eric Talley, Is The Future of Law a Driverless Car? Assessing How the Data Analytics Revolution Will Transform Legal Practice, https://ssrn.com/abstract=3064926 11 (2017).

[51]Kevin Ashley, Artificial Intelligence and Legal Analytics 236 (2017).

[52]Ethem Alpaydin, Machine Learning 69 (2016).

[53]Kevin Ashley, Artificial Intelligence and Legal Analytics 238 (2017).

[54]Eric Talley, Is The Future of Law a Driverless Car? Assessing How the Data Analytics Revolution Will Transform Legal Practice, https://ssrn.com/abstract=3064926 13 (2017)

[55]David Schatsky & Craig Muraskin, Demystifying Artificial Intelligence, Deloitte University Press, 2014, 7.

[56]Richard Tromans, White Shoe Firm Davis Polk Picks Kira In AI Market Milestone, Artificial Lawyer (March 8, 2018), https://www.artificiallawyer.com/2017/11/29/white-shoe-firm-davis-polk-picks-kira-in-ai-market-milestone/.

[57]Daniel Katz, Quantitative Legal Prediction — or — How I learned to Stop Worrying and Start Preparing for the Data-Driven Future of the Legal Services Industry, 62 Emory Law Journal 909, 945 (2013).

[58]Kevin Ashley, Artificial Intelligence and Legal Analytics239 (2017).

[59]Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, 16 RICH. J.L. & TECH. 11 (2011).

[60]John Tredennick, TAR for Smart People 2.0, 25 (2016).

[61]Kevin Ashley, Artificial Intelligence and Legal Analytics 241 (2017).

[62]Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, 16 RICH. J.L. & TECH. 11, 12 (2011).

[63]Kevin Ashley, Artificial Intelligence and Legal Analytics 241 (2017).

[64]John Tredennick, TAR for Smart People 2.0, 18–19 (2016).

[65]Richard Susskind, Tomorrow’s Lawyers 30–32 (2013).

[66]Daniel Katz, Quantitative Legal Prediction — or — How I learned to Stop Worrying and Start Preparing for the Data-Driven Future of the Legal Services Industry, 62 Emory Law Journal 909, 929 (2013).
