Can Artificial Intelligence understand patents?

Legalicity
Aug 13

Background

The legal profession isn’t known to be on the bleeding edge of technology, but the success of Artificial Intelligence in other industries hasn’t gone unnoticed.

For example, many intellectual property (IP) offices around the world have undertaken AI initiatives to help with tasks like patent classification and prior art search.

The World Intellectual Property Organization considers AI to be vital to the future administration of IP:

“[W]e are not going to be able to deal, as a world, with this volume of data and this complexity of data without applications based on Artificial Intelligence.”

With over 3 million new patent applications filed globally each year, the problem is compounding.

In September 2018, the US Patent and Trademark Office (USPTO) issued a Request for Information (RFI) titled Challenge to Improve Patent Search With Artificial Intelligence.

The RFI characterized prior art search as looking for a needle that does not yet exist in an ever-growing haystack.

It also stressed that language matters, because terms evolve over time and innovation spurs new words and phrases beyond common vocabulary.

The RFI concluded that patent examiners need better results, not more results.

Given the nature of these problems, it’s clear that they won’t be solved by traditional methods alone.

Words without meaning

Boolean search for patents is outdated and becoming less effective as the volume of prior art grows.

You have to pick the right keywords and enter them into a database one by one, or in different combinations.

It’s a tedious process of trial-and-error, and it puts you at the mercy of the nuance and variability inherent in patent language. No matter how good your keyword selection is, you still need to review a potentially large number of irrelevant results before you can focus on your top picks.

What if, instead, you could instantly get a list of the closest prior art by typing in a description of your idea?

This approach is known as semantic search.

Rather than finding literal keyword matches, a semantic search extracts meaning from the text as a whole; it can read entire paragraphs of input, like an Abstract or a set of Claims.

This eliminates the need to craft elaborate search queries, and lets you spend more time looking at potential winners instead of discarding likely losers.
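To make the contrast concrete, here is a minimal sketch of the two search styles. It is not the method behind our tool: the function names are hypothetical, and plain word-count vectors stand in for the learned representations a real semantic search would use, so this toy still only rewards shared words. What it does show is the difference in interface: a Boolean search needs hand-picked keywords and returns a yes or no, while a similarity search takes a whole description and returns a ranked list.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def boolean_match(query_terms, document):
    """Boolean AND search: every keyword must appear literally."""
    words = set(tokenize(document))
    return all(term.lower() in words for term in query_terms)


def cosine_similarity(text_a, text_b):
    """Cosine similarity between word-count vectors (a toy stand-in
    for learned text representations; an assumption of this sketch)."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(description, documents):
    """Return (score, document) pairs, most similar first."""
    scored = [(cosine_similarity(description, doc), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Made-up documents, for illustration only.
docs = [
    "A drone with four rotors and a camera for aerial photography",
    "A method for brewing coffee using a pressurized filter",
]
ranked = rank_by_similarity(
    "An unmanned aerial vehicle with rotors carrying a camera", docs
)
```

In this toy data, the keyword query `["UAV", "camera"]` misses the first document because "UAV" never appears in it, while the full-sentence description still ranks that document first.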

For a semantic search to work, however, automation isn’t enough. We already know that computers can process millions of documents in seconds — much more than humanly possible.

But “processing” should include an “understanding” of what the input is about.

To understand a piece of text is to perceive the intended meaning of the words; to be able to make inferences from the information received.

Understanding is proved by performing a task that tests the reader’s interpretation of the text.

So to prove that a computer program can understand patents, it must be given a task to complete.

There are two obvious candidates: (1) classification by technology area; and (2) prior art search.

Both tasks involve judging the extent to which one piece of text is similar to another in terms of their subject matter. It’s the kind of complex reasoning that we associate with human intelligence.

Putting AI to the test

Achieving expert-level decision-making with Artificial Intelligence has been our goal at Legalicity.

In the course of our research, we developed a tool that can accomplish tasks (1) and (2) above with a measurable degree of success.

In other words, we have empirical results that demonstrate how well our AI performs relative to trained human experts.

According to our logic so far, this data should make it possible to answer the question at hand: can AI really understand patents?

The following is a quick summary of our findings; a separate overview of the methodology will be available soon.

To learn more about the product, please visit NLPatent.com.

(1) Patent classification

The Cooperative Patent Classification (CPC) system provides a framework for organizing patents by technology area. It’s arranged as a tree structure with a total of five levels:

section > class > subclass > group > subgroup

At the bottom, there are tens of thousands of subgroups.

Inevitably, some of them overlap, which is why patents are often assigned to multiple categories.

We decided to focus on the primary CPC subclass for any given patent.
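As an illustration of the five levels, the sketch below splits a CPC symbol such as "H04L 9/32" into its parts and reads off the subclass. The `parse_cpc` helper and the `CpcSymbol` type are hypothetical names made up for this example, and the pattern assumes the common "letter, two digits, letter, main group, slash, subgroup" layout.

```python
import re
from typing import NamedTuple

# Assumed layout: section letter, two class digits, subclass letter,
# main group number, slash, subgroup number (e.g. "H04L 9/32").
CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


class CpcSymbol(NamedTuple):
    section: str
    cpc_class: str
    subclass: str
    group: str
    subgroup: str


def parse_cpc(symbol):
    """Split a CPC symbol into its five hierarchy levels."""
    match = CPC_PATTERN.match(symbol.strip().upper())
    if match is None:
        raise ValueError(f"Not a recognizable CPC symbol: {symbol!r}")
    section, class_digits, subclass_letter, main_group, sub = match.groups()
    cpc_class = section + class_digits
    subclass = cpc_class + subclass_letter
    return CpcSymbol(
        section=section,
        cpc_class=cpc_class,
        subclass=subclass,
        group=f"{subclass} {main_group}/00",  # main groups end in /00
        subgroup=f"{subclass} {main_group}/{sub}",
    )


parsed = parse_cpc("H04L 9/32")
```

Here `parsed.subclass` is "H04L", the level our classification test targets.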

We tested whether our AI could identify the correct subclass, out of hundreds of possibilities, after reading only a portion of the patent.

Here are the results from a sample of 1,000 randomly selected patents:

The AI was able to put the correct subclass in first place over 50% of the time; in the top 5 up to 84% of the time; and in the top 10 up to 94% of the time.
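Figures like these are usually called top-k accuracy. The sketch below shows one way such a number can be computed, assuming each patent comes with a ranked list of predicted subclasses and one correct answer. The function name and the sample data are made up for illustration and are not our evaluation code or results.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of patents whose true subclass is among the first k guesses."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("Need one ranked list per true label")
    if not true_labels:
        raise ValueError("Need at least one patent to score")
    hits = sum(
        1
        for ranking, truth in zip(ranked_predictions, true_labels)
        if truth in ranking[:k]
    )
    return hits / len(true_labels)


# Toy data: four patents, each with three ranked guesses (illustrative only).
predictions = [
    ["H04L", "G06F", "H04W"],
    ["A61K", "C07D", "A61P"],
    ["G06F", "H04L", "G06Q"],
    ["B60W", "G05D", "B62D"],
]
truths = ["H04L", "C07D", "G06Q", "F16H"]

top1 = top_k_accuracy(predictions, truths, k=1)
top3 = top_k_accuracy(predictions, truths, k=3)
```

With this toy data, one of four patents is correct in first place and three of four are correct within the top three, so accuracy can only stay the same or rise as k grows.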

(2) Prior art search

Perhaps the “ultimate” test of the AI is whether it can recognize which prior art is relevant and which isn’t — an incredibly difficult task, even for humans.

We looked at actual prior art cited by USPTO patent examiners to see if it appeared in the top 50, 100, and 200 results generated by our AI.

We focused on prior art cited for Section 102 (novelty), because it consists of a small number (usually one or two) of the most similar documents identified by the examiner for each patent.

Here are the results from a sample of 800 randomly selected patents:

On average, the AI found 30–50% of the prior art cited for a patent, and it found every single cited document for a patent 25–40% of the time.
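The two numbers measure different things: the first is an average share of cited documents found, and the second is how often nothing was missed. The sketch below shows one way to compute both for a given cutoff, assuming each search is a ranked result list paired with the set of documents the examiner cited. The function names and document IDs are hypothetical, and the data is not from our sample.

```python
def recall_at_k(ranked_results, cited, k):
    """Share of the cited documents that appear in the first k results."""
    cited = set(cited)
    if not cited:
        raise ValueError("Need at least one cited document")
    found = cited & set(ranked_results[:k])
    return len(found) / len(cited)


def summarize(searches, k):
    """Return (mean recall, share of searches that found every cited document)."""
    if not searches:
        raise ValueError("Need at least one search to summarize")
    recalls = [recall_at_k(results, cited, k) for results, cited in searches]
    mean_recall = sum(recalls) / len(recalls)
    complete_rate = sum(1 for r in recalls if r == 1.0) / len(recalls)
    return mean_recall, complete_rate


# Toy data: (ranked results, documents cited by the examiner).
searches = [
    (["d1", "d2", "d3", "d4"], {"d2"}),
    (["d5", "d6", "d7", "d8"], {"d5", "d8"}),
    (["d9", "d10", "d11", "d12"], {"d20"}),
]

mean_recall, complete_rate = summarize(searches, k=2)
```

In this toy data with a cutoff of two results, the first search finds its only citation, the second finds one of two, and the third finds none, so the average recall is one half while only one search in three is complete.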

Conclusion

These results show that, at least one time out of four, AI can find all the prior art that a patent examiner cited to assess the novelty of an invention.

Given how much effort and expertise is currently required to do this job, the idea of completely automating it, even some of the time, is remarkable.

(Not to mention that humans make mistakes and occasionally miss important prior art, which could be avoided with the help of AI.)

As for the question at hand: we may never know whether AI can truly understand patents, but we do know that it can extract meaning from text and reason about it in a way that few human beings can, so the future looks promising.

Yaroslav Riabinin is an intellectual property lawyer and co-founder of Legalicity, a Toronto-based legal technology startup.

Get in touch with him at yaroslav@legali.city.
