Designing for Machine Learning — Part 1 

Cluster Analysis in Kira — A WIP Case Study

How does Kira use machine learning & clustering?

If you are unfamiliar with machine learning, check out this clear and simple breakdown.

How do we use machine learning?

Machine learning (ML) is the foundation of our app. It helps our users get the information they need from their documents. Users can identify language used to refer to topics (a.k.a. fields), which the ML then uses to pull similar information from the docs they upload (a.k.a. extractions). They can also train the ML models to be more specific, so that they extract more relevant info. Most of our users are lawyers who use the app to conduct due diligence review. Instead of reading thousands of docs, they get a per-document summary to scan. This saves them time and money, and catches potentially risky info that might otherwise be missed.

How do we use clustering?

For the sake of brevity:

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).

To steal an analogy from our ML researcher (S/O to Adam Roegiest!): imagine you have a giant box of produce. Clustering would sort the vegetables and fruit into categories (e.g., apples vs. oranges). This is done based on parameters you define (e.g., round, red, 3-inch diameter); the granularity depends on how specific you are with your parameters.
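To make the analogy concrete, here is a minimal sketch in Python (with entirely made-up feature values) of how a generic clustering algorithm such as k-means could sort that produce box:

    # A toy version of the produce analogy, using scikit-learn's KMeans.
    # The feature values (roundness, redness, diameter) are invented for illustration.
    import numpy as np
    from sklearn.cluster import KMeans

    # Each row is one item: [roundness 0-1, redness 0-1, diameter in inches]
    produce = np.array([
        [0.90, 0.8, 3.0],   # apple
        [0.92, 0.9, 3.1],   # apple
        [0.95, 0.2, 3.0],   # orange
        [0.93, 0.1, 2.9],   # orange
        [0.10, 0.1, 7.0],   # cucumber
    ])

    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(produce)
    print(labels)  # e.g., [0 0 1 1 2]: apples, oranges, and the cucumber split apart

The more parameters you add (stem length, skin texture, and so on), the finer the distinctions the algorithm can draw.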

At the moment, Kira uses clustering in two ways:

  1. To group documents
  2. To group field extractions

This case study will focus on the second — how we group field extractions — and why our users find this useful. 

What is field extraction clustering?

Currently in Kira, field extractions are clustered (or grouped) together based on syntactic (e.g., the date range of a field) and semantic (language) similarities. Anything outside of those groups is marked as an outlier.

This GIF shows clustering in action; in the Governing Law field (old language is “Provision”), there are four clusters, including one outlier. The numbers refer to the number of extractions associated with each cluster. Clicking on each cluster reveals all the extractions associated with it, including links to the documents they originate from.
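Kira’s actual models are proprietary, but as a rough illustration of the idea, a generic pipeline could vectorize each extraction’s language and let a density-based algorithm such as DBSCAN group the similar ones, labelling whatever fits nowhere as an outlier (-1). A minimal sketch, with invented governing-law clauses:

    # Illustrative only; this is not Kira's actual model. Extractions are
    # vectorized by their language, and DBSCAN marks poor fits as -1 (outliers).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import DBSCAN

    extractions = [
        "This Agreement shall be governed by the laws of New York.",
        "This Agreement is governed by the laws of the State of New York.",
        "This Agreement shall be governed by the laws of New York, USA.",
        "Disputes shall be settled exclusively by binding arbitration in Geneva.",
    ]

    vectors = TfidfVectorizer().fit_transform(extractions)
    labels = DBSCAN(eps=0.6, min_samples=2, metric="cosine").fit_predict(vectors)
    print(labels)  # e.g., [0 0 0 -1]: the arbitration clause is the outlier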

What do Kira users want from extraction clustering? 

From our user research — including user interviews, support tickets, and feature suggestions from customers — we understand that users rely on this feature primarily to find outliers. Identifying outliers means they can quickly assess which language in their documents is most risky. If the language in an extraction varies wildly from that in all other documents (i.e., it is an outlier), it is likely risky and needs to be reviewed for the level of risk it may pose to, for example, a merger. This would then be noted in a final report to their clients.

Users also use clustering as an alternative project review workflow, during their second round of review. They conduct project review across field extractions, instead of going document by document. Clustering makes this process more structured by identifying similarities and differences across documents, for the same field (e.g., Governing Law, in the GIF above).

Evaluating the current clustering feature

The design team at Kira investigated what did and didn’t work for users with this feature, as it is currently designed. Some feedback:

  • The clustering ML actually worked, which users liked, but:
  • There was no way to identify individual clusters. Because the only labelled cluster was the outlier group, it was confusing when two (or more) clusters had the same number.
  • The bubble animation was fun, but confusing. When unnamed bubbles bounced around and changed location, it was hard to keep track of which ones you had already viewed. The animation was supposed to indicate how near or far clusters were from each other, a function that got lost in translation.
  • There was no way to change the range of variation. Users wanted to fine-tune the cluster analysis to suit their needs for more or less exact matches.
  • It would be helpful to mark a certain extraction as the “standard” to which all others would be compared. This would allow users to identify language that was more accurate for their cases.
  • Users wanted to see how extractions in the same cluster differed from the standard.

From this, we developed the following user stories:

As a senior due diligence reviewer, I want to identify/review outliers so that I can quickly assess risk in a project.
As a due diligence reviewer, I want to identify a standard extraction so that I can be sure the system is comparing all other extractions in that cluster to my preferred language/standard.
As a due diligence reviewer, I want to see how the nonstandard extractions in a cluster vary from the standard, so that I can evaluate if the differences are minor or not.
As a due diligence reviewer, I want to easily identify the difference between certain clusters so that I can keep track of what I have been reviewing/viewing.
As a due diligence reviewer, I want to adjust the granularity of the clustering so that I can control how precise the clusters are, depending on my needs for different projects.

Redesigning clustering

Moving away from bubbles

As a due diligence reviewer, I want to easily identify the difference between certain clusters so that I can keep track of what I have been reviewing/viewing.
As a senior due diligence reviewer, I want to identify/review outliers so that I can quickly assess risk in a project.

We addressed these user stories and the issues with the bubbles in one go. The outliers were grouped with a purple background and positioned prominently at the beginning of the list, so they were (hopefully) the first thing a user saw. Clusters were named (e.g., Cluster A, B, etc.) and ordered from largest to smallest to help differentiate them. I also included a preview of the standard extraction (e.g., Variant A).

All outliers and clusters for the field “Change of Control”.
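The list ordering itself is straightforward. A hypothetical sketch of the logic (the cluster sizes here are made up):

    # Hypothetical ordering logic: outliers first, then clusters named
    # A, B, C, ... from largest to smallest.
    import string

    cluster_sizes = [12, 40, 7]   # made-up extraction counts per cluster
    outlier_count = 3

    print(f"Outliers ({outlier_count})")
    for letter, size in zip(string.ascii_uppercase, sorted(cluster_sizes, reverse=True)):
        print(f"Cluster {letter} ({size})")
    # Outliers (3), Cluster A (40), Cluster B (12), Cluster C (7)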

Displaying more information about extractions

Let’s imagine a user clicked into Cluster A in Change of Control (see mockup above). What they would ideally see is a list of all the variants in that cluster, including a clearly marked standard and an indication of the variation between extractions, as per these user stories:

As a due diligence reviewer, I want to identify a standard extraction so that I can be sure the system is comparing all other extractions in that cluster to my preferred language/standard.
As a due diligence reviewer, I want to see how the nonstandard extractions in a cluster vary from the standard, so that I can evaluate if the differences are minor or not.

In the component design for variants below, users can now:

  • Identify which variation the ML has determined to be the standard (the star icon) and choose a new standard, if desired.
  • See the difference between variants with red-line highlighting and strike-outs, similar to reviewing in MS Word (a rough diffing sketch follows below).
Standard variation (left); red-lining (right). The farther from the standard, the lighter the background colour of the variant.
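The red-lining itself does not require anything exotic. As a rough sketch, a word-level diff against the standard (here via Python’s difflib, purely illustrative) can produce both the strike-outs and the highlights, and the same comparison yields a similarity score that could drive the background shading:

    # Rough sketch: word-level red-lining of a variant against the standard,
    # using Python's difflib. Clause text is invented for illustration.
    import difflib

    standard = "This Agreement shall be governed by the laws of New York".split()
    variant  = "This Agreement is governed by the laws of Ontario".split()

    matcher = difflib.SequenceMatcher(a=standard, b=variant)
    for op, a1, a2, b1, b2 in matcher.get_opcodes():
        if op == "equal":
            print(" ".join(standard[a1:a2]), end=" ")
        else:
            if a1 < a2:  # words only in the standard: strike out
                print("~~" + " ".join(standard[a1:a2]) + "~~", end=" ")
            if b1 < b2:  # words only in the variant: highlight
                print("[" + " ".join(variant[b1:b2]) + "]", end=" ")

    # The lower the ratio, the lighter the variant's background could be.
    print(f"\nsimilarity = {matcher.ratio():.2f}")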

Suggesting an alternative workflow

I thought this redesign would also be a good opportunity to suggest an alternative workflow to document-by-document review. I could address those users doing second-round review by giving them the space to review field by field and:

  • Review all the field extractions in each cluster on a single page.
  • Answer questions related to the field (determined by the user elsewhere in the app), and batch-apply that answer to all extractions in that cluster.

In addition, the ML could then learn what the answers for future variations may be, based on user behaviour (e.g., batch applying answers to all variations). 

For example, if users knew the field “Change of Control” might be risky based on their own assessment, they could review all of its extractions without having to go document by document.

A few notes on the interactivity of this design: users can quickly jump between clusters using the left-hand navigation, which indicates O (outliers), A (Cluster A), B (Cluster B), etc. Also, changing the standard variation will alter the view on the page below — the chosen variation would go to the top, the colour gradient would change, and the red-lining would adjust to reflect the new standard.

User configurations

Finally, users asked for the ability to configure clustering:

As a due diligence reviewer, I want to adjust the granularity of the clustering so that I can control how precise the clusters are, depending on my needs for different projects.

To do this, I designed this slider:

I had many labelling challenges. Initially I considered labelling the extreme points on the slider “smaller” versus “larger”, but after consulting our ML researcher I realized that changing the granularity did not mean the actual number of clusters for a field would be smaller or larger. For example, a field could have three clusters of 20 documents each. Asking the slider to make the clusters “smaller” may not actually work if all three clusters contained exact matches of language and syntax.

I also toyed with “more accurate” versus “less accurate”, but we both agreed that this was not representative of how the technology worked. A cluster wasn’t “less” or “more” accurate, it was just a reflection of how exact or approximate the language and syntax was, and that level of exactness could matter more or less for different users.

In the design above, the finest granularity would make every extraction an outlier (unless extractions are exact matches). However, if all the matches in an entire project were exact, the coarsest granularity would not necessarily produce fewer clusters than the finest.
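Under the hood, one plausible way a slider like this could work is by mapping each position to a distance threshold for the clustering. A hypothetical sketch (not Kira’s implementation):

    # Hypothetical mapping of slider positions to a distance threshold:
    # finer granularity means tighter clusters and more outliers.
    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    # One-dimensional stand-ins for extraction similarity
    points = np.array([[0.0], [0.05], [0.1], [1.0], [1.05], [5.0]])

    for threshold in (0.2, 2.0):  # "finest" vs. "coarsest" slider positions
        model = AgglomerativeClustering(n_clusters=None, distance_threshold=threshold)
        print(f"threshold={threshold}: {model.fit_predict(points)}")
    # threshold=0.2 yields three groups (5.0 stands alone, an outlier)
    # threshold=2.0 merges the two tight groups into one

Note that, exactly as described above, if every extraction were an exact match, the two extremes would produce the same clusters.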

Pulling it all together

In the final workflow, this is what it would look like if you, as the user, wanted to see an analysis of All Fields (left), then decided to explore the clusters inside the field of Change of Control (right):

In Designing for Machine Learning — Part 2 I explore fun and interactive charts we’re considering for clustering, and how to get users to configure the charts they want. Give it a read & thanks for getting this far!