Quantum Natural Language Processing

Natural language processing (NLP) is an interdisciplinary subfield of artificial intelligence concerned with how computers process natural language. As computers have become increasingly capable, NLP has become a fixture of the online world, with applications from machine translation to speech recognition; even so, there is plenty of room for improvement in the field.

This article introduces the NLP model known as DisCo and its integration with quantum computing to produce an efficient text classifier.

An Introduction to DisCo

Classical NLP has historically used what is called the vector space model for representing the meanings of words. In this model, a basis is selected (say, the 2000 most common words in a given corpus), and each word is encoded as a normalized vector describing how often it appears near each of the basis words; if the inner product of the vectors for two words is close to one, the two words have a similar meaning. This approach is termed distributional semantics.
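As a rough sketch of the idea (the basis words, co-occurrence counts, and vocabulary below are made up purely for illustration), the model reduces to building normalized co-occurrence vectors and comparing them with an inner product:

    import numpy as np

    # Toy vector space model: each word is described by how often it co-occurs
    # with a small set of basis words (a stand-in for the ~2000 most common
    # words), then normalized so that inner products lie between 0 and 1.
    basis = ["run", "bark", "meow", "engine"]
    cooccurrence = {
        "dog": np.array([6.0, 8.0, 1.0, 0.0]),
        "cat": np.array([6.0, 1.0, 8.0, 0.0]),
        "car": np.array([2.0, 0.0, 0.0, 9.0]),
    }

    def meaning_vector(word):
        counts = cooccurrence[word]
        return counts / np.linalg.norm(counts)

    def similarity(word1, word2):
        return float(meaning_vector(word1) @ meaning_vector(word2))

    print(similarity("dog", "cat"))   # higher: both co-occur with "run"
    print(similarity("dog", "car"))   # lower: little shared context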

Distributional semantics alone, however, cannot determine the meaning of phrases or sentences. This is the job of compositional semantics, which combines the meanings of the component words with the sentence’s grammatical structure to determine the meaning of the whole sentence. Combining the two approaches yields “distributional compositional semantics”, or DisCo.

The combined model assigns each grammatical type a tensor product space based on how it interacts with other grammatical types, then composes the word vectors to obtain a vector representing the meaning of the sentence. A transitive verb, for instance, takes two nouns as arguments (subject and object) to form a sentence, so its vector lives in the tensor space 𝒩 ⊗ 𝒮 ⊗ 𝒩, where 𝒩 is the meaning space for nouns and 𝒮 is the meaning space for sentences. DisCo also provides a number of linear maps that connect words in a grammatical structure; especially key here is the cap, the primary means of making such connections.

A diagrammatic representation of the DisCo model, as used in¹. Left: examples of words. Right: examples of linear maps, including the cap (bottom right), a linear map that sums across the entire set of basis vectors of 𝒩.
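Concretely, the cap can be thought of as the row vector that pairs up matching basis vectors of 𝒩; applied to the product of two noun vectors it simply returns their inner product. A minimal sketch (toy dimension, illustrative names):

    import numpy as np

    dim_n = 4                                    # toy dimension for the noun space N
    v, w = np.random.rand(dim_n), np.random.rand(dim_n)

    # The cap on N is the map N (x) N -> scalars sending e_i (x) e_j to 1 when
    # i = j and to 0 otherwise, i.e. a sum over matching pairs of basis vectors.
    cap = np.eye(dim_n).reshape(1, -1)

    # Applied to the tensor product of two noun vectors, it yields their inner product.
    assert np.isclose((cap @ np.kron(v, w))[0], v @ w)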

The framework has the following algorithm for producing the representation of a sentence:

  1. Compute the tensor product of the meaning vectors of every word in the sentence, in order.
  2. Construct a linear map representing the grammatical reduction of the sentence, using the tensor product of the component linear maps.
  3. Apply the map to the word product.
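The sketch below walks through these three steps for a toy “subject verb object” sentence (dimensions and variable names are illustrative) and checks the result against a direct index contraction; the explicit word product already needs dim_n⁴ · dim_s numbers, which hints at why the classical approach becomes expensive:

    import numpy as np

    # Toy dimensions; a realistic model would be far larger.
    dim_n, dim_s = 3, 2
    subject = np.random.rand(dim_n)
    verb    = np.random.rand(dim_n, dim_s, dim_n)      # transitive verb in N (x) S (x) N
    obj     = np.random.rand(dim_n)

    # Step 1: tensor product of every word's meaning vector, in order.
    word_product = np.kron(np.kron(subject, verb.ravel()), obj)

    # Step 2: the grammatical reduction as a tensor product of component maps.
    # For "subject verb object" it is cap_N (x) id_S (x) cap_N, with the cap on N
    # written as the row vector that sums over matching pairs of basis vectors.
    cap_n = np.eye(dim_n).reshape(1, -1)
    reduction = np.kron(np.kron(cap_n, np.eye(dim_s)), cap_n)

    # Step 3: apply the reduction to the word product to get the sentence vector.
    sentence = reduction @ word_product

    # Sanity check against a direct contraction of the shared indices.
    assert np.allclose(sentence, np.einsum("i,isj,j->s", subject, verb, obj))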

The algorithm presented here is a promising method of incorporating grammatical structure within NLP vector space models, but it is also highly computationally expensive in the classical realm due to the large tensor product spaces used in calculations. However, by incorporating quantum computing, this overhead can be reduced.

The remainder of this article will discuss one such implementation of DisCo (also known as DisCoCat) proposed by Zeng & Coecke¹.

NISQ + DisCoCat = QNLP

Using a quantum system can drastically reduce the amount of space needed to store vectors.

Consider a basis of 2000 words for noun meaning vectors. Then (assuming the sentence meaning space is no larger than the noun meaning space) representing a single transitive verb (two noun arguments and a sentence) requires storing N = 2000³ = 8 × 10⁹ numbers, one per basis vector of 𝒩 ⊗ 𝒮 ⊗ 𝒩. A quantum system reduces this overhead exponentially: the same verb fits in the amplitudes of only ⌈log₂(N)⌉ = 33 qubits. The effect becomes even more pronounced as the number of basis words grows.
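The arithmetic behind these figures is easy to check directly (a small sketch; it assumes, as above, that the sentence space is no larger than the noun space):

    import math

    n_basis = 2000
    classical_entries = n_basis ** 3                   # one number per basis vector of N (x) S (x) N
    qubits = math.ceil(math.log2(classical_entries))   # amplitudes of this many qubits span the same space

    print(classical_entries)   # 8000000000
    print(qubits)              # 33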

A comparison between the storage needs for classical and quantum systems running DisCo.

In addition, since quantum systems already compose via tensor products, they are naturally suited to a model that relies so heavily on them.

Sentence similarity quantum algorithm

One particular task in computational linguistics asks computers to classify sentences by topic. In terms of vectors: given a vector s and a set U of M sets of vectors, the computer should assign s to the set containing its nearest neighbor. In the simplest case, each of the M sets contains a single vector, and the problem reduces to determining which of these vectors s is closest to. This is an instance of what is known as the closest vector problem.

Solving the closest vector problem

Also known as the nearest-neighbor problem, the closest vector problem is of particular interest to computational semantics because semantic similarity corresponds to a high inner product between normalized meaning vectors, i.e. a small distance between them. Much like the sentence representation discussed earlier in the article, the closest vector problem can become prohibitively expensive for classical computers on a sufficiently large dataset.
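For reference, the brute-force classical approach (a minimal sketch with made-up data and names) simply scans every candidate, so its cost grows linearly with the number of candidate vectors:

    import numpy as np

    def classify(s, references):
        """Assign s to the class of its nearest reference vector.

        references holds one vector per class (the simplest case described above);
        closeness is the inner product of normalized vectors, so the nearest
        neighbor is the reference with the largest overlap.
        """
        s = s / np.linalg.norm(s)
        overlaps = [s @ (r / np.linalg.norm(r)) for r in references]
        return int(np.argmax(overlaps))

    # Made-up usage: 3 topics, 5-dimensional meaning vectors.
    rng = np.random.default_rng(0)
    topics = [rng.random(5) for _ in range(3)]
    print(classify(topics[1] + 0.05 * rng.random(5), topics))   # should print 1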

The quantum algorithm used to solve this problem comes in three steps, and essentially combines Grover’s search algorithm with some preprocessing³:

A diagram describing the quantum nearest-neighbor algorithm.

The first step consults oracles to prepare the states; the second uses quantum amplitude estimation to obtain distance estimates without destroying those states, so that in the third step Grover’s algorithm (in the form of a minimum-finding routine) can locate the closest vector.
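The quantum details do not fit in a short code sample, but the data flow can be mocked up classically. In the sketch below, everything is an illustrative stand-in: the “oracle” is an ordinary lookup, amplitude estimation is replaced by an exact distance plus a little noise, and the Grover-based minimum finding is replaced by a plain argmin:

    import numpy as np

    def oracle(j, references):
        # Stand-in for step 1: the quantum oracles prepare states encoding the
        # candidate vectors; classically we simply look the vector up.
        return references[j]

    def estimate_distance(s, v, rng, noise=0.01):
        # Stand-in for step 2: amplitude estimation returns each distance to some
        # additive precision; here, the exact distance plus a little noise.
        return np.linalg.norm(s - v) + noise * rng.standard_normal()

    def find_closest(s, references, rng):
        # Stand-in for step 3: Grover-based minimum finding would locate the
        # smallest estimate in roughly sqrt(M) oracle queries; classically we
        # scan all M estimates and take the argmin.
        estimates = [estimate_distance(s, oracle(j, references), rng)
                     for j in range(len(references))]
        return int(np.argmin(estimates))

    rng = np.random.default_rng(1)
    refs = [rng.random(4) for _ in range(8)]
    print(find_closest(refs[5] * 1.01, refs, rng))   # should print 5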

Classical algorithms for the nearest-neighbor problem take O(MN) time for M sets of N-dimensional vectors. Because it incorporates Grover’s algorithm, the quantum algorithm achieves a quadratic speedup, assuming the input vectors are sparse (their entries are mostly zero) and the runtime is dominated by oracle queries.

Solving sentence similarity

Using the quantum algorithm described above, we can compare the sentence vector with a number of reference vectors to classify it. However, the runtime is now dominated by the derivation of the sentence vector itself, which requires working in the large tensor product spaces described earlier. To recover the quadratic speedup, we can reframe the sentences: instead of computing the tensor product of every word in the sentence and applying a linear map to obtain s, we redefine s and U so that this calculation is never needed.

A sample reframing. Note the lack of caps in the bottom diagrams.

Note that s and the vectors in U are now just tensor products of existing word vectors, which can be formed with little overhead; the cost of the similarity computation is thus reduced to that of a single inner product. This eliminates the sentence vector as a significant source of runtime complexity, restoring the desired quadratic speedup.
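One way to see why this is cheap is a basic property of tensor products: the inner product of two product states factorizes into a product of the smaller inner products, so the large vectors never need to be formed explicitly. A toy check (dimensions and names are illustrative, not taken from the paper):

    import numpy as np

    rng = np.random.default_rng(2)
    dim_n, dim_s = 3, 2

    # Two "sentences", each stored only as its component word vectors/tensors.
    subj1, verb1, obj1 = rng.random(dim_n), rng.random(dim_n * dim_s * dim_n), rng.random(dim_n)
    subj2, verb2, obj2 = rng.random(dim_n), rng.random(dim_n * dim_s * dim_n), rng.random(dim_n)

    # Forming the full product states is expensive...
    big1 = np.kron(np.kron(subj1, verb1), obj1)
    big2 = np.kron(np.kron(subj2, verb2), obj2)

    # ...but their inner product factorizes into three small inner products, so
    # the similarity can be computed from the word-level data directly.
    assert np.isclose(big1 @ big2, (subj1 @ subj2) * (verb1 @ verb2) * (obj1 @ obj2))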

Conclusion

Natural language processing is not an easy task for computers, not least because it is resource-intensive in both time and space; as such, the quadratic reduction in complexity afforded by applying quantum computing to the field is rather significant.

However, while classical NLP and quantum computing are both areas of research with active experimental arms, quantum NLP, like the rest of quantum machine learning, remains primarily theoretical. No experimental results have been reported for this framework yet, and it may not be practical to obtain them for years.

As one final interesting note: DisCo was directly inspired by the mathematics of quantum theory², which makes its application in a quantum extension of natural language processing all the more fitting.

Sources

¹ https://arxiv.org/pdf/1608.01406.pdf

² https://arxiv.org/pdf/1003.4394.pdf

³ https://arxiv.org/pdf/1401.2142.pdf
