Our quest for finding the universality of language

Published in Quantinuum · 10 min read · Jan 27, 2023

By Bob Coecke, Vincent Wang-Mascianica, Jonathon Liu

Why are there so many languages, and not just one?

Within one language, why are there different ways of saying exactly the same thing, and not just one way?

In fact, what does it even mean to say the ‘same thing’, across languages, or within one language?

In different languages, the ‘same thing’ may appear very different, due to different orderings of corresponding words and different grammar. Even within the same language, there are choices to be made regarding things such as the order in which words are used, punctuation, and choosing between using many short sentences or one very long sentence.

The way we use language therefore seems to contain redundancies.

Understanding these redundancies matters, and it raises a more interesting question: “What remains if we free language from those redundancies?”

This short note suggests an answer to these questions. We do this by proposing what is truly universal beneath language, across different languages, and across different styles of writing and expressing the same point in a single language. We also explore why for us humans, language is unnecessarily complicated.

We also venture into the quantum world, explaining why we are working on these matters at Quantinuum.

This blogpost accompanies the release of a long technical paper published on the arXiv entitled “Distilling Text into Circuits”.¹

1. Language and meaning

Language is worth exploring and understanding for many reasons, some of which have preoccupied scientists and theoreticians for a very long time. The recent frenzy around the launch of ChatGPT simply highlights the many and varied reasons that natural language processing captures our imaginations, and also highlights why developing ‘human like’ technologies to understand language is so very hard. Even ChatGPT, for all its evident advances, is still a very long distance away from being ‘human like’ in terms of basic things such as understanding. We are a long way from a technology understanding something basic such as ‘knowing’ what the word ‘red’ means.

Or are we?

Coming back to our own very basic grasp of how languages work, we know that there are some things we humans can only understand by what we might describe as ‘doing’. The best examples of ‘doing’ could be in pottery or creating art or playing a sport where repetition helps us to get better.

However, when we stop to think more carefully about how humans learn, we realise very quickly that many of the important things we know and skills we possess are taught using language. Each and every day we use language to instruct, cooperate, investigate, promise, and entertain. Language has allowed our most important ideas to survive beyond their original thinkers, so that future generations benefit from all the lessons of the past. The whole of literature in a sense is a testimony to this fact.

A theory of language is in fact a theory of everything that can be talked about!

So, let’s look at how human language works.

In general, languages are complicated and diverse, but we can start with a simple observation: we can only say one word at a time. All written natural languages live on a line, and that line only breaks up when we reach the end of the page.

However, the things we might want to speak about and point to in the world or in our minds may live in richer spaces than a mere line. A painting for example lives on a two-dimensional canvas, a film in addition involves a flow of time, a piece of music has several instruments played at the same time, and who knows how many dimensions our general thoughts even need, given that we are able to think about many things at the same time.

The mystery of language is clearly very profound: for example, using language to describe a movie or a thought seems as surprising as using an orchestra to ‘perform’ a photograph. How is this possible at all?

A first stepping-stone towards understanding and answering these questions dates back some 15 years, in research that aimed to combine meaning (as it is now understood in large language models like GPT-3) with grammar (the well-mathematicised part of language that people had mostly been studying until then), and to do so in a manner that mimics how we humans combine the two.

In very simple terms, in that research we placed words on a line as usual, and drew the grammatical connections between them as wires that connect different words. This leads naturally to diagrams like this:

The inspiration for these diagrams came from a novel formalism that we were developing for quantum mechanics by exploiting the correspondence between existing language theory and a new quantum formalism, which can be found in a more technical book² or a very accessible one³. Since then, these diagrams have led to the application of quantum natural language processing on quantum hardware. You can read about all of this in our early blog posts⁴ ⁵ or in a much more recent IBM blog post⁶, or you can listen to a talk that was recently hosted by IBM⁷.

The realisation that language lives on a line explained some of the variety we see in natural languages. For example, in the sentence “Alice hates Bob”, there are two actors, Alice and Bob, and there is a hatred relationship between them. Since “Alice hates Bob” and “Bob hates Alice” have different meanings, we need to know who is doing the hating, and who is being hated. In English we put the hater on the left of the verb ‘hates’, and the hated one on the right. In Japanese, the hater and hated one are both on the left of ‘hates’, and in Tagalog (the main language of the Philippines) they are both on the right. In fact, for every one of the 6 possible orders of ‘hates’, hater, and hated one, there is some language in the world that uses that order. Hence there is no best order for these words, and people have essentially made ad hoc decisions that eventually became conventions and allowed us to understand each other.
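To make the counting concrete, here is a minimal Python sketch that lists all six ways of placing the hater, the verb, and the hated one on a line. It is only an illustration: the `roles` dictionary and the `linearise` helper are names we made up for this example, and the English words stand in for the words of whichever language uses each order.

```python
from itertools import permutations

# The three ingredients of the sentence, labelled by the role they play.
roles = {"subject": "Alice", "verb": "hates", "object": "Bob"}


def linearise(order):
    """Place the words on a line in the given order of roles."""
    return " ".join(roles[role] for role in order)


# Every possible ordering of the three roles.
orders = list(permutations(roles))

for order in orders:
    # Short label such as "SVO" (English) or "SOV" (Japanese).
    label = "".join("O" if role == "object" else role[0].upper() for role in order)
    print(label, "->", linearise(order))
```

Each printed line carries the same three pieces of information. Only the convention for laying them out differs.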

2. Text as circuits

In 2020 the work took a somewhat different twist when we started to represent language in terms of circuits⁸, rather than as words on a line. We did this because it enabled sentences to be further composed into a larger text. After all, we don’t communicate in single sentences. Literature and language are made up of long strings of individual sentences.

In order to illustrate what we mean, in the previous theory, we could have two sentences that both involve Bob, but there was no good way to put the two together:

Let’s instead think of actors like Alice and Bob as wires in a circuit, and a verb as a circuit-gate acting on those wires. That is, we view “Alice hates Bob” as a circuit where the meanings of the actors Alice and Bob live in the wires, and they are related by the verb ‘hates’:

Bob shows up on two different wires, and those Bob wires can now be easily composed, to show a story unfolding about Alice, Bob, and beer, how Alice hates Bob, and Bob therefore goes for an unhealthy solution to his problem, and the story progresses further:
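For readers who like code, the idea of wires and gates can be sketched in a few lines of Python. This is a toy data structure of our own for illustration, not the formal definition from the paper: `Gate`, `TextCircuit`, and `then` are hypothetical names, and “Bob drinks beer” is our assumed wording for the second sentence of the story.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Gate:
    """A verb acting on named actor wires, listed in role order."""
    word: str
    wires: tuple


@dataclass
class TextCircuit:
    """A sequence of gates; wires are shared whenever names match."""
    gates: list = field(default_factory=list)

    def wires(self):
        """All actor wires in the circuit, in order of first appearance."""
        seen = []
        for gate in self.gates:
            for wire in gate.wires:
                if wire not in seen:
                    seen.append(wire)
        return seen

    def then(self, other):
        """Compose two circuits: the gates of `other` follow ours."""
        return TextCircuit(self.gates + other.gates)


# Two one-sentence circuits that both mention Bob.
sentence_1 = TextCircuit([Gate("hates", ("Alice", "Bob"))])
sentence_2 = TextCircuit([Gate("drinks", ("Bob", "beer"))])  # assumed sentence

# Composing them joins the two Bob wires into one.
story = sentence_1.then(sentence_2)
print(story.wires())
for gate in story.gates:
    print(gate.word, "acts on", gate.wires)
```

Because the Bob wire is shared, the second gate acts on whatever the first gate has already done to Bob, which is how a story can unfold along the wires.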

3. Inter- and intra-language independence

However, as so often happens with deep research, the most interesting consequences of moving from single sentences to strings of sentences were unintended. In science the unexpected is often the most interesting.

Let’s explore what happened.

Firstly, for more complicated sentences, the circuits look much simpler. For example, the complicated sentence:

as a circuit becomes:

How is that possible? How were we able to simplify the picture so much without apparently losing any meaning?

The answer to that question is that the first picture has a lot of extra data, which can either be language- or style-specific, such as the order of words in a sentence, or differences in punctuation, uses of relative pronouns, etc. All this data gets distilled away in the circuit, leaving only what we referred to as the ‘same thing’ at the very beginning of this note.

That is, different texts, even if they look different, mean the same thing when they become the same text circuit:

*Examples of different sentences or parts of text resulting in the same language circuit*
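The simplest case of this distillation, word order, can be sketched in Python. The `to_gate` function below is a hypothetical helper written for this note, far simpler than the algorithm in the paper: it assumes three-word sentences and is told which convention (such as 'SVO') the sentence follows.

```python
def to_gate(sentence, order):
    """Turn a three-word sentence into a (verb, subject, object) gate.

    `order` is a string such as 'SVO' saying which role each word
    position plays. This is a toy assumption, not a real parser.
    """
    words = sentence.split()
    by_role = dict(zip(order, words))
    return (by_role["V"], by_role["S"], by_role["O"])


# The same statement laid out on a line under three different conventions.
english_style = to_gate("Alice hates Bob", "SVO")
japanese_style = to_gate("Alice Bob hates", "SOV")
verb_first_style = to_gate("hates Alice Bob", "VSO")

# All three lines distil to one and the same gate.
print(english_style == japanese_style == verb_first_style)
```

The word order is needed to read the line, but once the roles are known it can be thrown away: what remains is a verb gate and the two wires it acts on.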

Remarkably we found that we can also reverse the method for converting text into circuits, to get a method that converts any circuit into many possible texts that all mean the same thing. Think about this for a moment — there is literally no limit to the number of things we can express in language and yet this infinity of expression is interchangeable.

This discovery in fact works for any circuit made up of word gates we might imagine. In the normal grammar for words on a line, there are many rules that determine whether a sentence is good or bad — for example, the word order in “hates Alice Bob” is bad in English — and as we explained above, these rules differ between languages. But unlike words on a line, all circuits are good, no matter what order they appear in!

What is really going on here? Is this the universality in language that we were seeking?

In fact, all these observations are evidence that we have found a representation that is independent of the constraints that make languages superficially different.

We can claim that the circuits are the true representation of language meaning. However, we poor humans can’t talk to each other using those circuits, so we were forced to place words on a line, sequentially. This comes with many choices, and hence, different languages and different styles emerged over time.

4. What we did in the paper

Our story so far is about language and circuits in diagrams, and though it looks very pretty, we also have to be sure that the story we are telling is mathematically rigorous. For us, this means that the circuits are always well-behaved, and don’t suddenly become naughty in a surprising way.

Since we relate the structure of circuits to the structure of language, we first build a grammar for a good chunk of English that behaves like a mathematical skeleton for natural language, capturing how different words interact, refer to, and modify each other. We crafted it such that the structure precisely captures just enough to be able to derive the corresponding text circuits.

The grammar looks something like this:

It’s important to get a mathematical skeleton for text because circuits are also mathematical objects, and we can only use mathematics to relate mathematical things. We then relate text and circuits via a translation algorithm: a recipe to transform hybrid grammar into circuits and back with rules so foolproof (and complete) that anyone, or anything, including machines, can turn text into circuits and back, as long as they follow the rules.

Ultimately, we show that all of the very cool observations we made about language and circuits are always true. Once we have a recipe that relates text and circuits, we show mathematically that any English text can be turned into a unique circuit, and also that all circuits can be obtained in this way. Latterly we have built upon this foundation to go further in other work, where text circuits are truly independent across different languages, such as English and Urdu⁹.

5. Where does quantum come in?

When we were testing our earlier theory on quantum hardware in 2020 (which we have written about previously⁴ ⁵), we turned language into circuits, which are the structure of programs for quantum computers.

We ended up with something like this¹⁰:

We were already turning language into circuits, and unknowingly distilling away all the extra data that language usually has. We know that these circuits are really hard to simulate on classical computers, due on the one hand to the complex relationships between nouns that occur in text, and on the other hand to how word-meanings are encoded as vectors. Look no further than the enormous cost and complexity of ChatGPT for evidence of this. It was a very small step for us to realise that written and spoken text is a promising field for exploring quantum advantage as quantum computers start to scale in capacity and performance.

It is our understanding and belief that the advantage of using quantum computers for natural language processing is in the same league as the advantage we are expecting for chemistry and materials development. In other words, NLP is quantum native — hence “Q” NLP.

Stay tuned!