Published in Chau | IP

Drafting Quality AI Claims

Drafting a quality AI patent poses a number of challenges, and the most important question of all is how to draft a quality AI claim. Writing a quality claim has several dimensions, but I want to focus specifically on how to characterize an AI invention.

A Framework for Understanding AI Inventions

As a baseline, I think it is useful to have a framework for thinking about different types of novelty in an AI invention. I like to think in terms of four basic categories. Not every invention has novel features in every category, but many of them have more than one unique element, and it is important to understand them from a few different angles.

Keep in mind that one thing I don’t focus on here is the particular parameters (e.g., the node weights of a neural network) or hyper-parameters (e.g., the learning rate or the gamma). These are important aspects of most AI systems, but they are much harder to capture in a patent claim. The fact that they are so difficult to comprehend means that often, although not always, they are even omitted from the patent specification.

Perhaps one reason people think claiming AI is difficult is that they focus on this difficulty in comprehending the parameters. In many cases, the engineers themselves don’t really know what the parameters of their system are, or what they represent. But this doesn’t mean you can’t patent AI. It just means that in general it is better to focus on the four categories described below. That doesn’t mean it is impossible to draft a patent claim that focuses on the model parameters, but it is usually not the most promising approach. With that said, let us turn to the more useful categories for capturing the novelty of an AI invention:

I. Application

Perhaps the first question to ask about an AI system is what it is trying to accomplish. In other words, what is the output of the system, and how is that output used to solve a problem?

For example, the output of an image classification system can be a binary classification (i.e., an answer to a yes or no question), a probability vector representing the relevance of multiple categories, or an image mask that provides information about each pixel.
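To make the three output forms concrete, here is a minimal sketch in plain Python. The function names, the 0.5 threshold, and the use of sigmoid and softmax to turn raw scores into outputs are my own assumptions for illustration; a real system might produce these outputs in other ways.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a raw score to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binary_output(score: float, threshold: float = 0.5) -> bool:
    """Yes/no answer derived from a single raw score (threshold is assumed)."""
    return sigmoid(score) >= threshold


def probability_vector(scores: list[float]) -> list[float]:
    """Softmax: one probability per category, summing to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def image_mask(pixel_scores: list[list[float]], threshold: float = 0.5) -> list[list[int]]:
    """Per-pixel 0/1 label for a 2-D grid of raw scores."""
    return [[1 if sigmoid(s) >= threshold else 0 for s in row] for row in pixel_scores]
```

The same underlying scores could feed any of the three functions. What differs is the shape of the answer, and that shape is often what ties the model to a particular application.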

How can this output be used to solve a real-world problem, or impact the physical environment? Does applying the output of the machine learning model depend on framing a physical problem using a particular mathematical model (e.g., a Markov decision process)?

II. Data

Another way that AI systems differ from each other is in what input they use, and how they convert that input into usable features. Some machine learning models use raw data, while others extract intermediate features from the data before performing their designated function.

Some models use a single type of data, while others rely on data of different modalities (e.g., image data and text data). Some models even develop multiple types of features from the same input data (e.g., a feature pyramid network that generates feature maps with different resolutions).
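As a rough illustration of getting several feature maps at different resolutions from one input, the sketch below repeatedly halves a 2-D grid by average pooling. This is a deliberate simplification: an actual feature pyramid network uses learned convolutional layers rather than fixed pooling, and the function names here are hypothetical.

```python
def average_pool_2x(grid: list[list[float]]) -> list[list[float]]:
    """Halve the resolution by averaging each 2x2 block.

    A trailing odd row or column is dropped.
    """
    rows, cols = len(grid), len(grid[0])
    return [
        [
            (grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4.0
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def multi_resolution_maps(grid: list[list[float]], levels: int) -> list[list[list[float]]]:
    """Return `levels` maps, from full resolution down to the coarsest."""
    maps = [grid]
    for _ in range(levels - 1):
        maps.append(average_pool_2x(maps[-1]))
    return maps


# Usage: a 4x4 input yields 4x4, 2x2, and 1x1 maps.
image = [[0.0, 1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0, 7.0],
         [8.0, 9.0, 10.0, 11.0],
         [12.0, 13.0, 14.0, 15.0]]
pyramid = multi_resolution_maps(image, levels=3)
```

For claim drafting, the useful observation is that the downstream model receives several views of the same input, and a claim can recite that relationship without reciting any particular pooling or convolution.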

In most cases, the type of data that is used is intimately connected with the kind of problem that is being solved, but in some cases the invention is related more to one than the other.

III. Architecture

The next question to ask is what kind of AI architecture is used to convert the input into the output.

The AI architecture can describe how parameters or network nodes are related in an individual layer (e.g., the differences between SVM, MLP, CNN, RNN, LSTM, self-attention, etc.). Architecture can also describe how different layers are connected in a multi-layer network (e.g., LeNet, AlexNet, VGG, ResNet).

IV. Training

The final question to ask yourself when trying to understand an AI invention is how the model is trained. Is the training supervised? Unsupervised? Reinforcement learning? What does the training data look like? What kind of loss function is used? Are there multiple training stages that use different loss functions?

Model training can also be unique in other ways, such as using a dropout function in a unique way. In most cases, the kind of training that is used is deeply connected to the application, data, and architecture of the model. However, as we shall see below, sometimes the novelty is almost entirely in the training process itself.

Drafting an AI Claim

With this framework in mind, we now turn to the actual task of drafting an AI patent claim. In some cases, you might be working with an invention involving a completely novel architecture. But in many cases, AI inventions apply some known techniques (and often to an existing problem) in an inventive way. How do we capture this inventiveness without seeming obvious?

One way to do this is to 1) identify something that is unique to the patent in two or more of the above categories, and then 2) draft a claim that captures how the unique aspects are related.

Here is an example claim from a patent entitled “Neural network for keyboard input decoding” (emphasis added):

A computing device comprising:

at least one processor; and

at least one module, operable by the at least one processor to:

output, for display at an output device operatively coupled to the computing device, a graphical keyboard;

receive an indication of a gesture detected at a location of a presence-sensitive input device, wherein the location of the presence-sensitive input device corresponds to a location of the output device that outputs the graphical keyboard;

determine, based on a neural network processing at least one spatial feature of the gesture, at least one character string, wherein the at least one spatial feature indicates at least one physical property of the gesture and the neural network comprises a Long Short Term Memory network; and

output, for display at the output device, the at least one character string determined based on the neural network processing of the at least one spatial feature of the gesture using the neural network.

This claim is for a machine learning model that decodes gestures on a graphical keyboard into character strings. The claim has elements from three of the four categories mentioned above:

Application: identifying character strings

Data: spatial features are identified based on data from a “presence-sensitive input device”

Architecture: an LSTM

This is a pretty good claim. The patent is granted, so it has that going for it. And there is no reason a claim has to involve all four elements. In fact, it is probably better to focus on 2–3 categories at a time. If you really have novelty in all four elements you might want to break them up into different independent claims (or even separate patents). Also, novelty in training is perhaps the most difficult to detect, so it might not be the first choice if there are other novel elements to focus on.

However, I do have one complaint. I would like the claim to capture a little bit more about the relation between the spatial features and the LSTM. Why is an LSTM a good choice of architecture for this problem? The reason is that LSTM networks are often useful in handling time series data, and gestures are in fact a time series of spatial data.

However, neither claim 1 nor any other claim mentions this. It is discussed at length in the specification, but I think that to properly capture this invention, the concept of a time series should have been mentioned somewhere in the claim set. In other words, the concept of a time series is like a connecting concept that bridges the gap between aspects of novelty and helps tell the story for why LSTM isn’t just an arbitrary choice.
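To show what I mean by a gesture being a time series, here is a small sketch. It is not taken from the patent: the `(timestamp, x, y)` sample layout, the choice of `(dx, dy, speed)` as spatial features, and the untrained single-unit LSTM weights are all assumptions I made up for illustration. The gate equations are the standard LSTM ones.

```python
import math

# A touch sample is (timestamp_seconds, x, y); this layout is an assumption.
Sample = tuple[float, float, float]


def gesture_to_time_series(samples: list[Sample]) -> list[tuple[float, float, float]]:
    """Turn raw touch samples into time-ordered (dx, dy, speed) steps."""
    steps = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dx, dy, dt = x1 - x0, y1 - y0, t1 - t0
        speed = math.hypot(dx, dy) / dt if dt > 0 else 0.0
        steps.append((dx, dy, speed))
    return steps


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Hypothetical, untrained weights for a single-unit LSTM. Each gate has
# three input weights, one recurrent weight, and a bias.
GATES = {
    "input": ((0.5, -0.3, 0.01), 0.1, 0.0),
    "forget": ((0.2, 0.4, 0.02), -0.2, 1.0),
    "output": ((-0.1, 0.3, 0.01), 0.3, 0.0),
    "candidate": ((0.4, 0.1, -0.02), 0.2, 0.0),
}


def pre_activation(gate: str, x: tuple[float, float, float], h: float) -> float:
    """Weighted sum of the current step and the previous hidden state."""
    w, u, b = GATES[gate]
    return sum(wi * xi for wi, xi in zip(w, x)) + u * h + b


def lstm_step(x: tuple[float, float, float], h: float, c: float) -> tuple[float, float]:
    """One LSTM update: gates decide what to forget, add, and expose."""
    i = sigmoid(pre_activation("input", x, h))
    f = sigmoid(pre_activation("forget", x, h))
    o = sigmoid(pre_activation("output", x, h))
    g = math.tanh(pre_activation("candidate", x, h))
    c_next = f * c + i * g
    h_next = o * math.tanh(c_next)
    return h_next, c_next


def encode_gesture(samples: list[Sample]) -> float:
    """Feed the gesture step by step, carrying state across time."""
    h, c = 0.0, 0.0
    for step in gesture_to_time_series(samples):
        h, c = lstm_step(step, h, c)
    return h
```

Because the hidden and cell states are carried from one step to the next, the result generally depends on the order of the samples and not just on which points were touched. That dependence on order is the connecting concept I would have liked to see somewhere in the claim set.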

Now, this doesn’t necessarily mean that I want claim 1 to be any longer. A basic rule of thumb is that a claim with more words is harder to enforce, because you have to show that an infringer has implemented every single word. Counter-intuitively, getting a claim allowed that doesn’t capture all the novel aspects is sort of a good thing. But leaving out important connections between the unique elements of an invention can make it much more difficult to get an allowed patent.

So here’s another claim for comparison, from a patent entitled “System and method for addressing overfitting in a neural network”:

A computer-implemented method comprising:

obtaining a plurality of training cases; and

training a neural network having a plurality of layers on the plurality of training cases, each of the layers including one or more feature detectors, each of the feature detectors having a corresponding set of weights, and a subset of the feature detectors being associated with respective probabilities of being disabled during processing of each of the training cases, wherein training the neural network on the plurality of training cases comprises, for each of the training cases respectively:

determining one or more feature detectors to disable during processing of the training case, comprising determining whether to disable each of the feature detectors in the subset based on the respective probability associated with the feature detector,

disabling the one or more feature detectors in accordance with the determining, and

processing the training case using the neural network with the one or more feature detectors disabled to generate a predicted output for the training case.

Here the invention has novel elements from two categories:

Architecture: multiple layers that each include a feature detector

Training: disable feature detectors during training

This example is much more abstract than the last one. We don’t really know what the training data is, or what it will be used for (because it is a very general technique that can be used for a lot of things). But, unlike the last one, the connection between the architecture and the training process is very clear. Each layer has one or more feature detectors, and we disable some of them during training. If we didn’t have multiple feature detectors, we couldn’t disable some of them during training.
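The three steps recited in the claim (determine which feature detectors to disable, disable them, process the training case) can be sketched roughly as follows. This is my own simplified reading, not the patentee's implementation: it uses a single layer where the claim recites a plurality of layers, a ReLU activation I chose arbitrarily, and hypothetical function names, and it stops at the predicted output without showing any weight update.

```python
import random


def choose_disabled(disable_probs: dict[int, float], rng: random.Random) -> set[int]:
    """Decide, per detector in the subset, whether to disable it for this case."""
    return {idx for idx, p in disable_probs.items() if rng.random() < p}


def forward(inputs: list[float], weights: list[list[float]], disabled: set[int]) -> list[float]:
    """Run one layer of feature detectors; disabled detectors output 0."""
    outputs = []
    for idx, w in enumerate(weights):
        if idx in disabled:
            outputs.append(0.0)
        else:
            # ReLU activation is an assumption made for this sketch.
            outputs.append(max(0.0, sum(wi * xi for wi, xi in zip(w, inputs))))
    return outputs


def process_training_cases(
    cases: list[list[float]],
    weights: list[list[float]],
    disable_probs: dict[int, float],
    seed: int = 0,
) -> list[tuple[set[int], list[float]]]:
    """For each training case: determine, disable, then process."""
    rng = random.Random(seed)
    results = []
    for case in cases:
        disabled = choose_disabled(disable_probs, rng)
        results.append((disabled, forward(case, weights, disabled)))
    return results
```

Note that `disable_probs` only covers a subset of the detectors, mirroring the claim language about "a subset of the feature detectors being associated with respective probabilities." Detectors outside that subset are never disabled in this sketch.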

So there you have it. When drafting an AI claim, start by understanding the invention from multiple angles, and then try to draft a claim that captures how the different aspects are related.
