Quantum Natural Language Processing

Konstantinos Meichanetzidis
Published in Cambridge Quantum · Apr 7, 2020

“We did it! On an actual quantum computer!”

by Bob Coecke, Giovanni de Felice, Konstantinos Meichanetzidis, Alexis Toumi

Sentences as networks. A sentence is not just a “bag of words”,¹ but rather a kind of network in which words interact in a particular fashion. Some 10 years ago one of the authors of this article (BC), together with two colleagues, Mehrnoosh Sadrzadeh and Steve Clark, started to draw these networks. This led to a graphical representation of how the meanings of the words are combined to build the meaning of a sentence as a whole, as opposed to treating the sentence as a structureless “bag” containing the meanings of individual words.

These results have subsequently become widely known, and at the time were also a cover-heading feature in New Scientist.²

The “drawn” networks look like this:

In order to understand better how these networks operate, let’s consider a slightly simpler example:

The idea here is that the boxes represent the meanings of words and that the wires are channels through which these meanings can be transmitted. So, in the example above, the subject Alice and the object Bob are both sent to the verb hates and together they then make up the meaning of the sentence. This flow of words in sentences can, in fact, be traced back to work originally started in the 1950s by Chomsky and Lambek among others, which unified the grammatical structures of essentially all languages within a single mathematical structure. In particular, the network of a sentence’s meaning-flow is constructed according to a compositional mathematical model of meaning (semantics).¹⁶
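To make this meaning-flow concrete, here is a minimal sketch in plain Python/NumPy, not the actual implementation: noun meanings are vectors, a transitive verb is a tensor with a subject wire, a sentence wire, and an object wire, and the wires of the diagram become tensor contractions. All dimensions and entries below are made up for illustration.

```python
import numpy as np

# Hypothetical 2-dimensional noun space N and 2-dimensional sentence space S.
alice = np.array([1.0, 0.0])   # meaning state of "Alice" in N
bob = np.array([0.0, 1.0])     # meaning state of "Bob" in N
hates = np.zeros((2, 2, 2))    # a transitive verb: a tensor in N (x) S (x) N
hates[0, 1, 1] = 1.0           # made-up entry: "Alice hates Bob" -> sentence state [0, 1]
hates[1, 1, 0] = 1.0           # made-up entry: "Bob hates Alice" -> [0, 1]

# The "cups" of the diagram contract the subject and object states
# with the verb's two noun wires, leaving only the sentence wire.
sentence = np.einsum('i,isj,j->s', alice, hates, bob)
print(sentence)  # the meaning of "Alice hates Bob" as a vector in S
```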

Language is “quantum native”. One particularly interesting aspect of this graphical framework for linguistics was that the networks were inherited from previous work that provided quantum theory with an entirely network-like language.³ This pioneering work was accumulated eventually in a 900-page textbook written by BC and Aleks Kissinger.⁴

In summary, a direct correspondence was established between the meanings of words and quantum states on the one hand, and between grammatical structures and quantum measurements on the other. This is illustrated below:

Obviously this led to the question: can one make quantum computers handle natural language? This was first proposed in a paper by Will Zeng and BC in 2016.⁵ The paper created a new paradigm for natural language processing (NLP) in a quantum computing context. However, there were some significant challenges to the proposal. Most importantly, there weren’t any sufficiently capable quantum computers that could implement the NLP tasks proposed. Additionally, it was assumed that one could encode word meanings on the quantum computer using quantum random access memory (QRAM), which to this day, despite theoretical progress and experimental proposals, remains a distant possibility.

Our new attempt. The idea of making quantum computers process natural language is not only incredibly cool but also a very natural thing to do, for the reasons indicated above. More researchers started to take an interest, and very recently Intel supported a basic attempt to develop some of the ideas contained in the Zeng-Coecke paper on their quantum simulator.⁶

The first conference on Quantum Natural Language Processing, or “QNLP” as we’ve called it, took place in Oxford in December 2019,¹² where we presented a simulation of our experiment¹⁴ (all talks can be found online¹³). In the few months since the conference, we have been examining ways to use existing NISQ devices, in this instance one of IBM’s quantum devices. The networks as depicted above can’t be interpreted directly by IBM’s machine, which instead needs something in the form of a “quantum circuit”. Natural language, now with a ‘quantum circuit skeleton’, looks like this:

In this form, we demonstrate that QNLP can be implemented on NISQ devices, and we expect it to work increasingly well as these devices scale in size and performance. It is worth noting that in this picture, as well as in our experiments, we used the ZX-language for drawing quantum circuits, which was developed by BC and CQC’s Ross Duncan⁷ and is again part of the same network language of quantum theory that works very well together with QNLP.

Critically, our solution provides a way forward in the absence of QRAM. A full and detailed explanation is not possible in this forum, but by employing quantum machine learning we do not directly encode the meanings of words; instead, we construct a framework in which quantum states and processes learn their meanings directly from text. In quantum machine learning, by way of analogy with classical machine learning, we use quantum circuits¹⁵ instead of classical neural networks in order to learn patterns from data. Interestingly, neural network architectures are the state of the art in classical NLP, but the majority of methods do not take advantage of grammatical structure. In contrast, our approach to QNLP naturally accommodates both grammar and meaning.

Using our framework, once the meanings of words and phrases are encoded as quantum states and processes, we are able to prepare quantum states that encode the meaning of grammatical sentences on quantum hardware. When we pose a question constructed from the vocabulary and grammar the quantum computer has learned, it returns the answer.

Naturally, we next turned our attention toward the design and execution of a non-trivial experiment, not least because our design is predicated on the program being scalable. This means that the dimension of the meaning space grows exponentially with the number of qubits available, whilst the size of the circuits dictated by the grammar does not grow too large with the size of the sentence.

Interestingly, we note further that QNLP on NISQ devices is a novel playground for quantifying the scalability of quantum machine learning algorithms and experimenting with various quantum meaning spaces. Being able to accommodate large chunks of text is crucial if we aspire to graduate from sentence to text, and we believe we have taken the first step towards this vision.

The technical details. Our experimental workflow is as follows. Let G be the grammar category, which is the mathematical model where grammar diagrams are generated. As stated above, a grammar diagram (or network) encodes the information flow of word-meanings in a grammatical sentence.

In more detail, a diagram is none other than a grammatical and syntactic parsing of a sentence according to a specified grammar model. This diagram is then instantiated as a quantum circuit that lives in the category QCirc(θ). Next, the meaning of the words is encoded in quantum states in such quantum circuits. Specifically, any state can be prepared from a classical reference state, and so by “state” we refer to the circuit (or process) that prepares it. Then the composition of words in a sentence corresponds to composition of circuits representing words. This results in a circuit that prepares a state encoding the meaning of a sentence.

Importantly, the circuits are parameterised by a set θ. In other words, these ‘grammatical quantum circuits’ are a family spanned by θ. By allowing circuits to depend on parameters we create the semantic space in which the meanings of the words, and consequently whole sentences, are encoded.
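As a minimal sketch of such a family, assuming the simplest possible choice of a single qubit and one Ry rotation (real word circuits use more qubits and gates), a word state prepared from the reference state |0⟩ and spanned by a parameter θ looks like:

```python
import numpy as np

def word_state(theta):
    """Prepare a one-qubit word state Ry(theta)|0> from the reference state |0>.

    This is the simplest member of the family QCirc(theta): a single
    parameterised rotation. Varying theta sweeps out the semantic space.
    """
    ry = np.array([[np.cos(theta / 2), -np.sin(theta / 2)],
                   [np.sin(theta / 2),  np.cos(theta / 2)]])
    zero = np.array([1.0, 0.0])  # classical reference state |0>
    return ry @ zero

print(word_state(0.0))    # theta = 0 gives |0>
print(word_state(np.pi))  # theta = pi gives |1>
```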

Once a quantum circuit is created from a sentence, it needs to be evaluated in order to compute the meaning. We may choose to perform this evaluation on a classical computer, where we employ state-of-the-art methods for performing the costly task of multiplying exponentially big matrices. On the other hand, we may choose to implement the circuit on a quantum computer.

The parameterised quantum circuit of a subject-verb-object sentence like ‘Alice hates Bob’ looks like this:

Each of the parts of speech (subject, verb, object) is a quantum circuit defined as a function of some parameters. For example, there are sets of parameter values θ₀, θ₁, θ₂ such that:

subj(θ₀) = Alice, verb(θ₁) = hates, obj(θ₂) = Bob

The values are determined empirically by a text corpus and are then used to answer questions about the corpus.
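Schematically, the trained vocabulary then amounts to a lookup table from words to their learned parameter values; the numbers below are hypothetical stand-ins:

```python
# Hypothetical learned values; in the experiment these come from training on the corpus.
learned = {
    'Alice': [0.31],        # parameters plugged into subj(θ₀)
    'hates': [1.42, 0.77],  # parameters plugged into verb(θ₁)
    'Bob':   [2.85],        # parameters plugged into obj(θ₂)
}
```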

In order to ensure that our experiment can be executed effectively on near-term NISQ devices, but at the same time be complex enough to be interesting, we chose a vocabulary of a few words, for example

{Alice, Bob, loves, hates, rich, cute}

and generated not some, but all grammatical sentences from their combinations. From these sentences, we created their corresponding parameterised circuits. Moreover, we interpret the grammar diagrams such that the semantic space is one-dimensional, i.e. just a number indicating the truth-value of the sentence. This number is obtained by evaluating the quantum circuit C(θ) corresponding to the sentence:

0 < |C(θ)|² < 1

A value close to 1 represents “true” and a value close to 0 indicates “false”.
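As a toy classical simulation of this evaluation (the verb tensor entries and parameter values are made up, and the sentence space is one-dimensional as described above, so the whole sentence reduces to a single number):

```python
import numpy as np

def word_state(theta):
    # One-qubit word state Ry(theta)|0>, as in the earlier sketch.
    return np.array([np.cos(theta / 2), np.sin(theta / 2)])

# Hypothetical learned parameters and a made-up verb tensor; since the
# sentence wire is one-dimensional, the verb is just an N (x) N matrix.
alice = word_state(0.31)
bob = word_state(2.85)
hates = np.array([[0.1, 0.9],
                  [0.2, 0.1]])

amplitude = np.einsum('i,ij,j->', alice, hates, bob)
truth_value = abs(amplitude) ** 2   # plays the role of |C(θ)|²
print(truth_value, '->', 'true' if truth_value > 0.5 else 'false')
```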

The labeled toy corpus would look like:

K={(Alice loves Bob, false), (Bob is cute, true), (Alice is rich, true), …}

Now that we have our corpus of sentences and ‘truth universe’, we split the corpus K = R ∪ E into a training set R and a test set E. Sentences in the training set R are used to do supervised quantum machine learning in order to learn the parameters that result in the correct measurement of the truth-labels. In this way, the parameters for the circuits that prepare the meaning states for nouns {Alice, Bob}, verbs {is, loves, hates}, and adjectives {rich, cute}, are learned.

The scheme for learning the parameters for words in sentences in the training set is as follows. The circuit of a sentence in the training set is evaluated for the current set of parameters on the quantum computer. By sampling measurement outcomes we estimate |C(θ)|². This number is read by a classical computer, which checks how well it matches the desired truth label of the sentence. If there is a mismatch, our system updates the parameters so that the quantum computer may evaluate an updated circuit. We iterate this procedure until the parameters converge and all truth labels are reproduced for the sentences in the training set. Note that finding the optimal sequence of updates is, in general, a hard optimization problem, so it is important that our quantum semantic space is well designed and allows the learning to be as tractable as possible. This design feature is critical.
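A classical toy simulation of this training loop might look as follows; the finite-difference gradient, learning rate, and one-parameter ‘sentence’ are our illustrative choices, not the optimiser actually used in the experiment (on hardware, evaluate would sample the quantum circuit instead):

```python
import numpy as np

def evaluate(theta):
    """Toy stand-in for running a sentence circuit and estimating |C(theta)|^2."""
    return np.cos(theta[0] / 2) ** 2    # a one-parameter 'sentence'

def train(label, theta, lr=0.1, eps=1e-3, steps=200):
    """Gradient descent on the squared error between |C(theta)|^2 and the truth label."""
    for _ in range(steps):
        # Estimate the gradient by finite differences, one parameter at a time.
        grad = np.zeros_like(theta)
        for i in range(len(theta)):
            shifted = theta.copy()
            shifted[i] += eps
            grad[i] = (evaluate(shifted) - evaluate(theta)) / eps
        # Update so the circuit's value moves towards the desired label.
        theta = theta - lr * 2 * (evaluate(theta) - label) * grad
    return theta

theta = train(label=1.0, theta=np.array([2.0]))
print(evaluate(theta))   # close to the desired label 1.0
```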

After training, sentences in E are used to estimate how well the truth labels of new sentences, i.e. not in R, are inferred. These new sentences share the same vocabulary as the sentences used for training, but they are grammatically and semantically different.

With this learning framework, we can now ask questions of the quantum computer, as long as the questions are grammatical sentences expressed in terms of the vocabulary and grammar used during training. We are pleased to add that in our experiment, questions can, in fact, be posed as compositions of already learned sentences. For example, we can use the relative pronoun {who}, which we model in terms of CNOT gates, and ask:

Does Bob who is cute love Alice who is rich?

Consequently, this amounts to evaluating a bigger quantum circuit than the one that the model has been trained on. However, because the model was trained on the same grammar and vocabulary as used to express the question, we get the expected truth label, false in this case, by running the quantum circuit for:

Bob who is cute loves Alice who is rich

In particular, the truth label is obtained by estimating the value of the circuit, depicted below, by sampling measurement outcomes for a large number of runs of the quantum circuit corresponding to the above sentence:

The transformation of the question-sentence to a statement-sentence that can be mapped to a quantum circuit and evaluated on a quantum machine is done at the level of grammar diagrams by suitable transformations.⁸
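For intuition about modelling {who} with CNOT gates: a CNOT with a fresh |0⟩ ancilla copies computational-basis states, so a noun’s wire can feed both the relative clause and the main clause. A toy sketch of this copying (simplified to basis states; our illustration, not the full construction):

```python
import numpy as np

# The CNOT gate on two qubits (control, target) in the computational basis.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

zero = np.array([1.0, 0.0])
bob = np.array([0.0, 1.0])   # toy basis-state meaning of "Bob"

# CNOT |bob>|0> copies the basis state onto the ancilla wire, so "Bob"
# can flow both into "who is cute" and into the main verb "loves".
copied = CNOT @ np.kron(bob, zero)
print(copied.reshape(2, 2))  # the state |1>|1>: both wires now carry "Bob"
```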

A critical enabling feature of our experiment is effective and efficient compiling and optimization. In order to run a circuit on a quantum device, it needs to be compiled. Compiling entails morphing the circuit so that quantum operations are expressed in terms of device-native operations, as well as accommodating the quantum processor’s restricted connectivity. For this task, we used CQC’s quantum software development platform, t|ket>.⁹
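As a rough illustration of this step, assuming the pytket package together with the pytket-qiskit extension (and a stand-in circuit rather than one of our sentence circuits), compiling and running looks like:

```python
from pytket import Circuit
from pytket.extensions.qiskit import AerBackend  # assumes pytket-qiskit is installed

# A stand-in two-qubit circuit; in the experiment this would be the
# parameterised circuit of a sentence.
circ = Circuit(2)
circ.H(0)
circ.CX(0, 1)
circ.measure_all()

backend = AerBackend()
# Compilation rewrites the circuit into the backend's native gate set
# and routes it onto the device's connectivity.
compiled = backend.get_compiled_circuit(circ)
handle = backend.process_circuit(compiled, n_shots=1024)
counts = backend.get_result(handle).get_counts()
print(counts)   # sampled measurement outcomes, used to estimate |C(θ)|²
```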

GitHub repository. The experiments described above have been made available in the following repository:

https://github.com/oxford-quantum-group/discopy/blob/ab2b356bd3cad1dfb55ca6606d6c4b4181fe590c/notebooks/qnlp-experiment.ipynb

Where do we go next? There are a number of ways in which this experiment can be modified and/or generalized.

Firstly, one could vary the kind of hardware one uses, for example ion traps or optics instead of superconducting qubits. Because our program is implemented via CQC’s hardware-agnostic t|ket>, this development can take place fairly easily.

Secondly, one could also vary the computational model, for example measurement-based quantum computing (MBQC) instead of circuits. We expect to achieve this in due course, by exploiting the tight relationship between the ZX-language and MBQC.¹⁰

Thirdly, and importantly, rather than being restricted to single sentences as was the case in this demonstration, we will process larger text.¹¹ We will provide further information in subsequent articles and papers as the team completes new experiments.

Fourthly, we could embark on other tasks besides question-answering, such as language generation, summarization, etc.

Finally, of course, when hardware becomes more powerful we can simply scale up the size of the meaning spaces and the complexity of the tasks, which is clearly our overall objective. With quantum volume accelerating across a variety of quantum computing platforms, ranging from superconducting and ion-trap devices to photonic quantum processors, we look forward to providing further information in the coming period.

Notes and References

¹ Here, “bag” is a widely used technical term, referring to the fact that in many NLP applications grammatical structure has been entirely ignored for many years.
https://en.wikipedia.org/wiki/Bag-of-words_model

² https://www.newscientist.com/article/mg20827903-200-quantum-links-let-computers-understand-language/
http://www.cs.ox.ac.uk/people/bob.coecke/NewScientist.pdf

³ https://arxiv.org/abs/quant-ph/0510032

⁴ https://www.cambridge.org/pqp

⁵ https://arxiv.org/abs/1608.01406

⁶ https://newsroom.intel.ie/news-releases/intel-to-support-the-irish-centre-for-high-end-computing-on-new-collaborative-quantum-computing-project/#gs.z02awn

⁷ https://arxiv.org/abs/0906.4725

⁸ https://arxiv.org/abs/1905.07408

⁹ https://arxiv.org/abs/2003.10611

¹⁰ https://arxiv.org/abs/1904.03478

¹¹ https://arxiv.org/abs/1203.6242

¹² http://www.cs.ox.ac.uk/QNLP2019/

¹³ https://www.youtube.com/channel/UCD5Pa5pKMRrkUSKGwjBxrSQ/videos

¹⁴ https://www.youtube.com/watch?v=3UKqpL7Z0Uc

¹⁵ https://arxiv.org/abs/1906.07682

¹⁶ https://arxiv.org/abs/1003.4394
