Can Artificial Intelligence understand patents?

Aug 13, 2019 · 4 min read


Many intellectual property (IP) offices around the world have already undertaken AI initiatives to help with tasks like patent classification and prior art search.

The World Intellectual Property Organization considers AI to be vital to the future administration of IP:

“[W]e are not going to be able to deal, as a world, with this volume of data and this complexity of data without applications based on Artificial Intelligence.”

With over 3 million new patent applications filed globally each year, the problem is compounding.

In September 2018, the US Patent and Trademark Office (USPTO) issued a Request for Information (RFI) titled Challenge to Improve Patent Search With Artificial Intelligence.

The RFI characterized prior art search as looking for a needle that does not yet exist in an ever-growing haystack.

It also stressed that language matters, because terms evolve over time and innovation spurs new words and phrases beyond common vocabulary.

The RFI concluded that patent examiners need better results, not more results.

Words without meaning

Boolean search for patents is outdated and becoming less effective as the volume of prior art grows.

You have to pick the right keywords and enter them into a database one by one, or try different combinations.

It’s a tedious process of trial-and-error, and it puts you at the mercy of the nuance and variability inherent in patent language.

No matter how good your keyword selection is, you still need to review a potentially vast number of irrelevant results before you can focus on your top picks.

A different approach known as semantic search lets you find the closest prior art by typing in a description of your idea.

Rather than finding literal keyword matches, a semantic search extracts meaning from the text as a whole; it can read paragraphs of input, like an Abstract or a set of Claims.

This eliminates the need to craft elaborate search queries, and lets you spend more time looking at potential winners instead of discarding likely losers.
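To make the contrast concrete, here is a minimal sketch, not the tool described in this article. It compares a literal Boolean AND match with ranking by TF-IDF cosine similarity over the whole query text. TF-IDF is still a word-overlap method and only a simple stand-in for the learned representations a real semantic search would use. The three document texts, their ids, and all function names are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus; the texts are invented for illustration.
DOCS = {
    "A": "A fastener comprising a threaded shaft and a head for joining two panels.",
    "B": "A screw with a helical ridge that holds wooden boards together.",
    "C": "A method for encrypting network packets using a rotating key.",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def boolean_and(query_terms, docs):
    """Return ids of documents that contain every query term literally."""
    terms = [t.lower() for t in query_terms]
    matches = []
    for doc_id, text in docs.items():
        words = set(tokenize(text))
        if all(t in words for t in terms):
            matches.append(doc_id)
    return matches

def tfidf_vectors(texts):
    """Build one TF-IDF weighted term vector (a dict) per text."""
    tokenized = [tokenize(t) for t in texts]
    n = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()}
    return [{term: tf * idf[term] for term, tf in Counter(tokens).items()}
            for tokens in tokenized]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_by_similarity(query, docs):
    """Rank every document against a free-text query, best match first."""
    ids = list(docs)
    vectors = tfidf_vectors([docs[i] for i in ids] + [query])
    query_vec = vectors[-1]
    scored = [(i, cosine(query_vec, vec)) for i, vec in zip(ids, vectors[:-1])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With this toy data, `boolean_and(["screw", "panels"], DOCS)` returns an empty list because no single document uses both words, while `rank_by_similarity("A screw for joining wooden panels together", DOCS)` places both fastener documents above the unrelated encryption document.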

Computers can process millions of documents in seconds — much more than humanly possible.

The real challenge is understanding what each document is about.

But how do we know whether a computer program understands what it reads?

One approach is to give it a task to see how well it performs.

We can look at the following patent-related tasks:

  1. Classifying an invention into one or more technology categories; and
  2. Finding existing documents that disclose similar subject matter (i.e., prior art search).

Putting AI to the test

At Legalicity, we developed a tool that performs both of the tasks above with a measurable degree of success: we’ve tested how well our AI performs relative to trained human experts.

The following is a quick summary of our findings.

To learn more about the product and how it’s being used by some of the largest law firms, technology companies, and research institutions, please visit

1. Patent classification

The Cooperative Patent Classification (CPC) system provides a framework for organizing patents by technology area. It’s arranged as a tree structure with a total of five levels: section, class, subclass, main group, and subgroup.


At the bottom, there are tens of thousands of subgroups.

Inevitably, some of them overlap, which is why patents are often assigned to multiple categories.

We focused on the primary CPC subclass for any given patent and tested whether our AI could identify the correct subclass, out of hundreds of possibilities, after reading only a portion of the patent.

Here are the results from a sample of 1,000 randomly selected patents:

The AI was able to put the correct subclass in 1st place over 50% of the time; in the top 5 up to 84% of the time; and in the top 10 up to 94% of the time.
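For readers who want to see how figures like these are computed, here is a minimal sketch of top-k accuracy: the share of patents whose correct subclass appears among the first k ranked predictions. The function name, the ranked lists, and the "true" labels below are invented for illustration and are not our test data.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of cases where the true label is among the first k predictions."""
    if not true_labels or len(ranked_predictions) != len(true_labels):
        raise ValueError("need one non-empty ranked list per true label")
    hits = sum(1 for ranked, truth in zip(ranked_predictions, true_labels)
               if truth in ranked[:k])
    return hits / len(true_labels)

# Hypothetical ranked subclass predictions for four patents, best guess first.
predictions = [
    ["G06F", "H04L", "G06N"],
    ["H04W", "H04L", "G06F"],
    ["A61K", "C07D", "A61P"],
    ["B60R", "B62D", "F16B"],
]
# Hypothetical primary subclass actually assigned to each patent.
actual = ["G06F", "H04L", "A61P", "G01N"]

top1 = top_k_accuracy(predictions, actual, 1)  # 0.25
top2 = top_k_accuracy(predictions, actual, 2)  # 0.5
top3 = top_k_accuracy(predictions, actual, 3)  # 0.75
```

Widening k can only keep or raise the score, which is why the top-5 and top-10 figures are higher than the 1st-place figure.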

2. Prior art search

The “ultimate” test is whether the AI can recognize which prior art is relevant and which isn’t — a very difficult task even for humans.

We looked at actual prior art cited by USPTO patent examiners to see if it appeared in the top 50, 100, and 200 results generated by our AI.

We focused on prior art cited for Section 102 (novelty), because it consists of a small number — usually just one or two — of the most similar documents identified by the examiner.

Here are the results from a sample of 800 randomly selected patents:

The AI found, on average, 30–50% of the cited prior art, and found every cited document for a given patent 25–40% of the time.
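The two figures above measure different things: the average share of cited documents found, and how often all of them were found. A minimal sketch of both calculations follows, assuming each test case pairs a ranked result list with the examiner's cited documents. The document ids, function names, and data are hypothetical.

```python
def recall_at_k(results, cited, k):
    """Share of the cited documents that appear in the top k results."""
    top = set(results[:k])
    return len(top & set(cited)) / len(cited)

def summarize(searches, k):
    """Return (average recall at k, share of searches that found every cited document)."""
    recalls = [recall_at_k(results, cited, k) for results, cited in searches]
    average = sum(recalls) / len(recalls)
    complete = sum(1 for r in recalls if r == 1.0) / len(recalls)
    return average, complete

# Hypothetical test cases: (ranked search results, documents cited by the examiner).
searches = [
    (["P1", "P2", "P3", "P4"], ["P2"]),
    (["P5", "P6", "P7", "P8"], ["P5", "P9"]),
    (["P1", "P3", "P5", "P7"], ["P8"]),
    (["P2", "P4", "P6", "P8"], ["P4", "P6"]),
]

average_recall, complete_rate = summarize(searches, k=3)  # (0.625, 0.5)
```

In this toy example, the top 3 results contain 62.5% of the cited documents on average, and every cited document is found in half of the searches. A search that finds only one of two cited documents raises the first number but not the second.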


To put that in perspective, the AI found every document the examiner needed to evaluate the novelty of an invention at least one out of every four times, and did it instantly.

The level of expertise required to find a needle in a haystack of that size suggests at least some degree of understanding on the part of the AI.

In practical terms, it also means we now have a tool that can free up people’s time to focus on more important things, and potentially discover prior art that may be overlooked by human experts.

Yaroslav Riabinin is an intellectual property lawyer and co-founder of Legalicity, a Toronto-based legal technology startup.

Get in touch with him at

Thank you for reading!

Please like, subscribe, follow us on Twitter, add Yaroslav and Stephanie on LinkedIn, and let us know your thoughts!

Photo by Alex Knight on Unsplash
