AMI Residency Part 1 : Exploring (word) space, projecting meaning onto noise, learnt vs human bias.
Women have vaginas, men have nostrils
This is the second of a series of articles [1, 2, 3], on research I’ve done over the past few years that I’m only just getting round to writing about. In May/June 2016 I was a resident at Google’s Artists & Machine Intelligence program. I explored a few related topics, with separate outputs. So I’ll be writing about it in two posts. This is the first of those.
Exploring latent space is a very fascinating thing. I previously wrote about it in depth here. To cut a very long story short, a latent space is a high dimensional space in which a single point represents a single instance, or sample, of data. Most importantly, we try to construct this latent space such that ideally it captures some kind of semantic relationships, e.g. so that we can perform geometric operations on points to transform them, or move in certain directions to meaningfully manipulate the data, e.g. to add glasses to a face, or make a face smile etc.
Some aspects of Machine Learning (ML) can be thought of as learning functions which map from our input and output domains (e.g. raw pixels) to this latent space. And piping data through a deep neural network can be thought of as a journey through multiple dimensions and transformations in space (and time).
Word embeddings (or word vectors) are the same concept applied to words. One can take the vocabulary of a language (e.g. 20,000 words, 50,000 words, 100,000 words etc) and plot them as points in an arbitrary high dimensional space.
There are a few established algorithms that do this; notably, Word2Vec and GloVe do it quite well. These are learning algorithms that go through a huge corpus of text (e.g. 100 billion words), and they learn how to position words in a high dimensional space (e.g. 300D) such that there are complex, meaningful, spatial relationships between the words. It’s not just that words which are related in meaning are close to each other, but the directions and distances in which they are organised mean something.
We can famously do maths operations on words. You can see in the left image (NB this is a projection from the original 300D latent space, to 3D, to 2D for visualisation purposes) that the vector from the word ‘king’ to ‘queen’ is very similar to the vector from ‘man’ to ‘woman’. Or in fact the vector from ‘man’ to ‘king’ is very similar to the vector from ‘woman’ to ‘queen’. So we can actually add the vector from ‘man’ to ‘king’ (i.e. ‘king’ - ‘man’) to the vector for ‘woman’, and we end up with a new point in this 300D latent space. If we search for the closest word to this point, we’ll find that it is ‘queen’ (actually this is not entirely accurate, more on this later).
We can write this as the famous word2vec example:
king - man + woman = queen
This is also known as a Word Analogy, and can be written as
man : king :: woman : [queen]
(read as : “man is to king as woman is to ?” and the model returns ‘queen’)
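To make the arithmetic concrete, here is a minimal sketch using tiny hand-made 2D vectors. The VECTORS table and its values are invented for illustration and are not taken from word2vec; in this toy the result lands exactly on ‘queen’, which a real 300D model would not do.

```python
import math

# Toy 2-D "embeddings" (hypothetical values): axis 0 ~ royalty, axis 1 ~ gender.
VECTORS = {
    "king":  [1.0,  1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0,  1.0],
    "woman": [0.0, -1.0],
    "apple": [0.2,  0.1],
}

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def analogy_point(a, b, c):
    """Return the point b - a + c, i.e. 'a is to b as c is to ?'."""
    return [vb - va + vc
            for va, vb, vc in zip(VECTORS[a], VECTORS[b], VECTORS[c])]

def nearest(point, topn=3):
    """Rank every word in the toy vocabulary by similarity to the point."""
    ranked = sorted(VECTORS, key=lambda w: cosine(point, VECTORS[w]), reverse=True)
    return ranked[:topn]

point = analogy_point("man", "king", "woman")  # king - man + woman
print(nearest(point))  # ['queen', 'woman', 'apple'] with these toy values
```

Cosine similarity is used here as the notion of "closest"; that is a common choice for word vectors, but it is an assumption of this sketch rather than something stated above.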
Similarly, the vector from ‘walking’ to ‘walked’ is very similar to the vector from ‘swimming’ to ‘swam’. The model seems to learn tenses as well. Also country-capital relationships, e.g. compare the vectors from ‘Spain’ to ‘Madrid’ and ‘Italy’ to ‘Rome’ etc.
These relationships are not explicitly enforced by humans during training. They are learnt, in an unsupervised manner. The learning algorithm just reads through the text, and picks up patterns on how words are arranged in sentences and phrases, and from that it infers how the vectors should be assigned.
Also remember that the diagrams above are projections into 3D space (projected into 2D space), so that we can view them. In reality this latent space is 300D, so it captures far more relationships in directions that we can’t even begin to imagine. And god knows what else it learns.
So I wrote a few twitter bots to explore this space.
This bot performs random mathematical operations on random words and tweets the results.
It first picks 2 to 4 completely random words from a vocabulary of 100K words (actually it’s 53K words, reasons explained in the src. Note that it’s estimated that educated native English speakers have a vocabulary of around 20K-30K words). The bot plots these words in a high dimensional latent space (using the famous word2vec model trained by Mikolov et al on 100 billion words of Google news). It then performs random arithmetic operations (addition or subtraction) on these vectors. This results in a new location in the high dimensional space. The bot then returns the closest words.
E.g. ‘human’ - ‘god’ = ‘animal’ means that the bot has randomly picked the words ‘human’ and ‘god’, and randomly decided to perform a subtraction. It subtracts the vector for ‘god’ from the vector for ‘human’, and finds and tweets the closest word to that point, in this case ‘animal’ (actually it tweets the top five closest words; here I just hand-picked some of my favourite results).
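The procedure described above can be sketched roughly as follows. This is not the bot's actual source: the vocabulary is a handful of words, make_toy_vectors is a hypothetical stand-in that assigns random vectors instead of loading the trained model, and skipping the query words in the results is an assumption.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def make_toy_vectors(words, dims=8, seed=0):
    """Stand-in for a trained model: random vectors, so results carry no meaning."""
    rng = random.Random(seed)
    return {w: [rng.gauss(0, 1) for _ in range(dims)] for w in words}

def random_equation(vectors, rng, topn=5):
    """Pick 2 to 4 distinct random words, combine them with random + or -,
    and return the expression plus the closest remaining words."""
    words = rng.sample(sorted(vectors), rng.randint(2, 4))
    ops = [rng.choice("+-") for _ in words[1:]]
    point = list(vectors[words[0]])
    for op, w in zip(ops, words[1:]):
        sign = 1.0 if op == "+" else -1.0
        point = [p + sign * v for p, v in zip(point, vectors[w])]
    # Assumption: the query words themselves are left out of the results.
    candidates = [w for w in vectors if w not in words]
    candidates.sort(key=lambda w: cosine(point, vectors[w]), reverse=True)
    expr = words[0] + "".join(f" {op} {w}" for op, w in zip(ops, words[1:]))
    return expr, candidates[:topn]

WORDS = ["human", "god", "animal", "nature", "dynamics",
         "twitter", "bot", "memes", "sex", "love"]
vectors = make_toy_vectors(WORDS)
expr, results = random_equation(vectors, random.Random())
print(expr, "=", ", ".join(results))
```

Because the toy vectors are random, the printed "equations" are pure noise; swapping in real embeddings is what would give the output its structure.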
Above you can see some fully genuine, untampered results. But I should point out that there are hundreds (if not thousands?) of results, and I cherry-picked a few of my favourites. (I haven’t actually thoroughly looked through them all, there might be much more interesting ones).
Initially I was curating and trying to impose rules on what words the bot should pick from, so that the results would be more ‘sensible’ and interesting. But in doing so, I was actually limiting the ability of the bot to find more ‘creative’ (and arguably more interesting, or unexpected) results. So I removed any constraints that I had imposed, and let the bot explore the space a lot more freely. It now produces things which are a bit more nonsensical, and sometimes a lot harder to make sense of.
And in fact this is what this project ended up being about.
It’s not about what the model tells us, but what we look for and see in the outcome.
Tons of examples can be found on twitter. Below are a few I selected. Some of the first few examples are probably quite easy to interpret.
human - god = animal
This is an interesting one. It could be interpreted as: “if we don’t have / believe in god, we will descend to the level of primitive animals” or alternatively: “what sets humans apart from other animals, is that we were created in the image of god”. Or maybe: “humans are just animals, that have invented religions and beliefs in god” etc.
There are probably many other ways of interpreting this, and I’d be curious to hear some other ideas. But the truth is, I don’t think it means any of those things. Because there is no one behind this, saying it, to give it any meaning. It’s just noise, shaped by a filter, and then we project whatever we want onto it. It’s just a starting point for us to shape into what we desire, consciously or unconsciously.
Some might disagree and say that the model has learnt from the massive corpus of text that it has trained on, and that this artifact produced by the model carries the meanings embedded in the corpus. This is of course true to some degree, and can be verified with the examples given earlier, such as king-man+woman=queen, or walking-walked+swam=swimming. Surely it’s not a coincidence that the model is returning such meaningful results in those cases?
It does seem like the model has learnt something. But when we start to push the boundaries of the model, we have to resist the temptation of jumping to conclusions as to what the model has learnt vs what is just ‘semi-random’ results, with our brain completing the rest of the picture. I’m not suggesting that there is a cut-off point as to when the model stops making sense and starts generating random results. It’s more of a spectrum. The more we sway away from what the model is ‘comfortable’ with (i.e. has seen in abundance during training, has learnt and is able to generalise), the more noise is injected into the output, and potentially the more fertile the output for our biased interpretations.
I will expand on this in more detail a bit later. But first some more examples.
nature - god = dynamics
I particularly like this one. I interpret it as “without the need for a god, nature is just the laws of physics”. But it’s also very plausible that the word ‘dynamics’ just happens to be close to ‘nature’ and ‘god’, along with a bunch of other words that I don’t find that interesting or relevant. But you might. (NB I find ‘nature’ - ‘god’ = ‘engaging’ also quite interesting).
twitter + bot = memes
I couldn’t believe this one when I saw it. It almost needs no explanation. “bots on twitter become memes”. Too good to be true.
sex - love = intercourse, masturbation, prostitution, rape
This is a powerful one. I interpret it as “Sex without love is just intercourse”, or “prostitution is sex without love”, or “rape involves sex and hate (as the opposite of love)”. These results are very interesting. But again, it should not be assumed that the model is learning this particular interpretation from the training data. In all likelihood, all of these words are somewhere in the vicinity of ‘sex’ and/or ‘love’, since they are all related words. And yes perhaps these words do lie in a particular direction of ‘love’ or ‘sex’. But there is a difference between a bunch of words being laid out in space, and the sentence “sex without love is intercourse or prostitution…”. The latter is my interpretation of the spatial layout.
authorities - philosophy = police, governments
I have to push my creativity to be able to make sense of this one. I ask myself “If we think of philosophy as the act of thinking, and being logical or critical, then perhaps this sentence says that police and governments are authorities that don’t think, and are not logical?”. Or in other words “what kinds of authorities do not think, and are illogical? Police and governments”.
beard - justified - space + doctrine = theology, preacher
This one pushes the limits of my creativity even further. But I can still find meaning if I try hard. E.g. Let’s assume that a beard traditionally and stereotypically signifies wisdom. Imagine a beard, that is not justified — i.e. it pretends to signify wisdom, but actually it doesn’t. In fact, this particular beard also replaces space (which I liberally assume to represent the ‘universe’, ‘knowledge’, ‘science’) with doctrine. Where might we find such a beard, pretending to be wise, but replacing science with doctrine? In theology of course, e.g. a preacher.
Of course this is me trying quite hard to fit a square peg into a round hole, trying to make sense of this ‘semi-random’ sentence which the model has spat out. I wouldn’t be surprised if somebody was able to interpret this sentence to mean the exact opposite to how I chose to interpret it.
Projecting meaning onto noise
Nevertheless, I find these results endlessly fascinating. Not because I think the model has such a strong understanding of the English language, but because it acts as a kind of ‘meaning filter’.
What goes into the model is completely random (i.e. the words and arithmetic operations that the bot chooses). Or to be a bit more precise with my language, think of it as noise with a uniform distribution, white noise.
There probably isn’t much material here for you to make sense of and write stories around? It’s pretty much a blank slate.
The model effectively applies a filter to that noise, bends it, shapes it, and out comes a new type of noise.
It’s still ‘random’, but with a more specific distribution, a bit more of a structure.
In more general terms, I see these latent spaces as ways of constructing Rorschach-style inkblots for different domains, e.g. words, images, sounds, text etc. Random numbers or processes (i.e. white noise) go into the model, and more ‘structured random’ results come out. They’re still ‘random’, but with enough of a structure for us to be able to see things in, and project meaning onto.
And this is what we do in our everyday lives anyway. Everything that we see, hear, feel, read, touch is meaningless in the scale of the universe. The letters on this page which you’re reading don’t mean anything to the world. They’re just ‘random’ squiggles, if you were to say, ask my dog. But they do have a particular distribution (of brightness across space), such that as these squiggles travel through your visual cortex, and signals propagate up and down higher levels of cognition, you start to build an idea of what they might mean.
That’s what we do, we take in particular distributions of noise, and we project meaning onto them. Sometimes — like in the case of these squiggles on this page — arguably there is an intended meaning embedded in the artifact. This is a meaning imposed by an author, like me, the producer of the artifact — embedded using a shared language, method of communication and context (e.g. the Latin alphabet, English language, talking about AI, cognition, semantics, semiotics etc.). In this case, hopefully you’ll interpret the meaning embedded as I had intended it. You’ll use that as a starting point, and then combined with the beliefs that you already hold within yourself, you’ll take away a message that is hopefully somewhat aligned with my intended meaning. But of course perhaps not, it’s very easy to get into disagreements due to ambiguous communications. Just ask Richard Dawkins.
But we still manage to find meaning even in places where there isn’t always an intended, embedded meaning, or in fact no author to begin with. Like when we see faces in clouds, or holy folks in toast. Or even like the many different stories invented by the many different cultures of the world, upon looking at the bright dots decorating the night sky.
We’re wired to project whatever internal baggage we might have onto anything that is remotely able to carry it. And for good reason too.
And that’s what I loved about Deepdream when it first came out last year. Not that it produced trippy puppy-slugs and bird-lizards. But that it took noise, and distorted it just enough so that we would start projecting meaning onto it, to detect and interpret puppy-like, slug-like, bird-like features — just like the algorithm itself did.
I wrote a longer post on deepdream in context of this train of thought (here), and a summary of the relevant bit is as follows:
When we look at these deepdream generated images, we say “oh it’s a puppy-slug, or a bird-lizard”. But actually, there’s no such thing. There are no birds or lizards or puppies or slugs in these images. There are only bird-*like*, puppy-*like*, slug-*like* features. The artificial neural network vaguely recognises those features in the original image, the corresponding artificial neurons fire, but weakly, and somewhere deep in latent space. The deepdream algorithm modifies the images to amplify those firings. And then *we* look at these images, and certain activity in our brain registers those same bird-like, puppy-like, slug-like features. But still there are no birds or puppies here. *We* complete that recognition process by projecting those meanings back onto what is essentially noise, with a specific distribution. And I’d argue that’s really the essence of our whole existence: making sense of particular distributions of noise.
Learnt vs human bias
father : doctor :: mother : [nurse]
One result in particular was widely shared and made the headlines. When presented with “doctor-father+mother” (i.e. “father is to doctor as mother is to ?”), the model apparently returns ‘nurse’. If true, this is very clear evidence of strong gender bias in the model, learnt from the training data (in this case, 100 billion words of Google News).
Unfortunately however, this is not true.
In reality, when we perform an operation like “king-man+woman”, the closest word to the end point is not always ‘queen’. It’s quite likely to be ‘king’. In fact, in all of the operations above, usually the closest word is one of the original words that was in the input query (i.e. king, man or woman). So when we perform these operations, we manually remove (i.e. filter out, ignore) the input words from the results that the model returns. In the case of “doctor-father+mother” the model actually does return ‘doctor’ as the closest word, with ‘nurse’ being 2nd closest. In fact the top five words from the model are doctor, nurse, doctors, physician, dentist (you can try it here).
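The effect of that filtering step can be illustrated with a small sketch. The vectors below are hand-made toy values, deliberately chosen so that the ordering mirrors the one described above (‘doctor’ first, ‘nurse’ second); they are not taken from the real model, and the analogy function is a hypothetical helper.

```python
import math

# Hand-made toy vectors (hypothetical values): axes ~ [medical, gender, parent].
VECTORS = {
    "doctor":    [1.0,  0.1, 0.0],
    "nurse":     [0.6, -0.6, 0.0],
    "physician": [1.0,  0.3, 0.0],
    "father":    [0.0,  0.2, 1.0],
    "mother":    [0.0, -0.2, 1.0],
}

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def analogy(a, b, c, filter_inputs, topn=5):
    """'a is to b as c is to ?': rank words by similarity to b - a + c.
    With filter_inputs=True the three query words are dropped from the ranking."""
    point = [vb - va + vc
             for va, vb, vc in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    words = [w for w in VECTORS if not (filter_inputs and w in (a, b, c))]
    words.sort(key=lambda w: cosine(point, VECTORS[w]), reverse=True)
    return words[:topn]

# Unfiltered: the query word 'doctor' is still the nearest neighbour.
print(analogy("father", "doctor", "mother", filter_inputs=False))
# Filtered: 'doctor' is removed, so 'nurse' is promoted to first place.
print(analogy("father", "doctor", "mother", filter_inputs=True))
```

The two calls use the same point in space; only the post-processing differs, which is the whole distinction between the reported result and what the model returns.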
The authors of the papers explicitly state that “[word embeddings] exhibit hidden biases inherent in the dataset they are trained on … and return biased solutions … such as father:doctor :: mother:nurse”. This sentiment is expressed many times throughout both papers, and in fact learnt bias is the basis of the research.
Whereas in fact, the model hasn’t learnt this gender bias. The model is returning ‘doctor’, but literally due to human user error, that result is ignored and ‘nurse’ is reported in the paper as the model’s top output.
I don’t know if the authors of the paper are actually aware of this or not. Perhaps they are and they’re being devious, purposely reporting incorrect results. But I’d like to think not. I hope, and I think, that it’s an innocent mistake, and they aren’t actually working with the model directly, but they’re using a 3rd party interface to the model. This 3rd party interface is doing the filtering, and the authors aren’t even aware of it (e.g. here is an unfiltered online interface to the model, and here is a filtered online interface — select the English Google News model — or I have some python and C++ code here to play with the model directly).
Nevertheless, this research went viral in the news and social media and was shared widely (especially this particular result), including in places like MIT Technology Review. The interesting aspect of this is that while fake news is indeed a big problem, we ‘intellectual critical thinkers’ generally like to attribute it to ‘the other side’: to the Daily Mail, Breitbart and the alt-right on Facebook (now more topical than ever). So why is MIT Technology Review reporting ‘fake news’? Why is everyone sharing it on Twitter and Facebook? Aren’t the folks looking for bias and unfairness in machine learning models supposed to be the critical thinkers, the ‘good guys’?
This is a big topic for another post, but it’s the motivation for the following project so I’ll touch upon it briefly.
It seems the human bias in interpreting results might be stronger than any bias that might be embedded in the experiment or model. And no one is immune to this (including me of course, which will be inherent in the perspective of this article).
It seems the authors of those papers wanted to find bias in the word2vec model so they did, without really questioning how or why they were getting those results.
It seems MIT Technology Review wanted to report bias on the language model so they did, without questioning the research. After all, why should they question it? The results were in a paper! (NB. a paper on arxiv is not peer-reviewed, anyone can post on there and it should have no authority. And a paper at a workshop is not held to the same level of scrutiny as a conference or journal).
It seems everyone who shared these articles on Twitter and Facebook wanted to share stories about learnt gender bias in ML models, so they did. After all, why should they question MIT Technology Review, or researchers at Boston University or Microsoft Research?
And most crucially, the questions being asked in the papers are important questions and should be asked and discussed, and I praise the authors for doing so (in fact, the next project and this post might not have happened if it weren’t for them — and they perform many other studies in the papers which are definitely worth a read).
Nevertheless, I find it fascinating how we can let our guard down, and be less critical of our allies (of stories, narratives and ‘evidence’) when they are aligned with causes that we support and believe in. It’s almost as if we’re willing to relax our criteria for critical assessment, and to forgo a little bit of truth if it’s for ‘a good cause’ (this is something that comes up a lot in my thinking, and I wrote briefly about it here, with a classic example here).
And I’m not suggesting that there aren’t learnt biases in the model. In fact there are, without a doubt, learnt biases in the model. There is almost always bias in a model; that’s why the field of statistics was born to begin with! (i.e. to study, and try to minimise, this bias. I wrote about a brief history of machine and statistical bias here).
It’s just that “doctor-father+mother=nurse” is not an example of it in this case. If anything, it’s evidence of human bias in interpreting, reporting and sharing the results.
So I started thinking about how one could explore gender bias in the model.
Everything I’ve said up to this point — projecting meaning onto noise, and learnt vs human bias — was motivation for this twitter bot.
This bot is similar to the previous one, but it’s more about exploring societal biases (particularly gender) that the model might have learnt from the training data. It looks for random word analogies with ‘man’ and ‘woman’, and runs them both ways.
E.g. if unfiltered, “man : doctor :: woman : ?” returns ‘doctor’, which isn’t very interesting: no bias is evident, and we don’t gain much insight about the model or data. If filtered, we get ‘nurse’, which is interesting, but doesn’t say much on its own, i.e. this cannot be interpreted as the model claiming “man is to doctor as woman is to nurse” (see previous section).
However, if we reverse ‘man’ and ‘woman’, and also run “woman : doctor :: man : ?” and filter the results, we get ‘physician’. Now that’s interesting, I think. While the top (unfiltered) result for both “man : doctor” and “woman : doctor” is still ‘doctor’, the second top result for woman is ‘nurse’, while the second top result for man is ‘physician’. This is clearly a bias embedded in the model, learnt from the training data. I wonder what else is in there?
So this bot explores word embeddings in this manner. It picks a completely random word, takes the vector from ‘man’ to that word, adds it to ‘woman’, and returns the results. It also takes the vector from ‘woman’ to that word, adds it to ‘man’, and returns the results. In both cases it returns the top four results, and filters out the input query words to save space.
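A rough sketch of running an analogy both ways is below. Again, this is not the bot's actual code: the vectors are hand-made toy values picked to mirror the ‘doctor’ example from the text, and both_ways is a hypothetical helper name.

```python
import math

# Hand-made toy vectors (hypothetical values): axes ~ [medical, gender, person].
VECTORS = {
    "man":       [0.0,  0.2, 1.0],
    "woman":     [0.0, -0.2, 1.0],
    "doctor":    [1.0,  0.1, 0.0],
    "nurse":     [0.6, -0.6, 0.0],
    "physician": [1.0,  0.3, 0.0],
}

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def analogy(a, b, c, topn=4):
    """'a is to b as c is to ?', with the query words filtered out."""
    point = [vb - va + vc
             for va, vb, vc in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    candidates.sort(key=lambda w: cosine(point, VECTORS[w]), reverse=True)
    return candidates[:topn]

def both_ways(word, topn=4):
    """Run the same word through both directions of the man/woman analogy."""
    return {
        "woman": analogy("man", word, "woman", topn),  # man : word :: woman : ?
        "man": analogy("woman", word, "man", topn),    # woman : word :: man : ?
    }

print(both_ways("doctor"))
```

With these toy values the ‘woman’ side ranks ‘nurse’ first and the ‘man’ side ranks ‘physician’ first. That only demonstrates the mechanics of comparing the two directions; it says nothing about what the real model has learnt.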
It’s not very scientific, just a casual exploration. But actually, as well as exploring learnt bias in the model, it’s also exploring human bias in our interpretations. Just like in the case of @wordofmath bot, I find it interesting to see how we try to project meaning onto the results. Since the bot picks a truly random word (i.e. white noise, uniform distribution), the results are often quite hard to interpret. And as before, we read what we want to read from this structured noise.
If the random word is ‘requested’, why does a woman ‘consent’ or ‘demand’, while a man ‘instructs’, or ‘agrees’? Is it me or does it seem like the man’s words have more of a positive connotation? Does that say anything about the training data? or am I reading too much into it? Does it say anything about the model? Or does it say more about me and the way I think? How would I have interpreted these results 10 years ago compared to how I do today? How will I interpret them in 10 years?
If the random word is ‘likes’, a woman ‘adores’ or ‘enjoys’, while a man ‘relishes’ or ‘knows’. How come a woman ‘enjoys’ while a man ‘knows’? Does that mean anything? Or could it be as inconsequential as being caused by floating-point rounding errors?
If the random word is ‘characters’, woman is ‘heroines’ or ‘actresses’ while man is ‘villains’ or ‘monsters’. Surely that isn’t a coincidence.
If the random word is ‘cars’, the only word that is different is ‘sedan’ for woman, and ‘vans’ for man. I guess it is more common to have male van drivers than female. Clearly this is not random, and the model has learnt something. But what exactly has it learnt? Is it right to use this example to justify other results?
This one is very interesting: if the random word is ‘chewed’, the only different word is ‘eaten’ for woman and ‘gobbled’ for man. Personally speaking, ‘gobbled’ is a pretty accurate description of how I eat.
Women are more likely to be associated with ‘advocacy’ or ‘charities’, while men are more likely to be associated with a ‘team’ or ‘club’. Again, this sounds believable.
‘social’ issues relating to women include ‘gender’, ‘mothers’, ‘welfare’, while for men it’s ‘sociological’, ‘youth’, and ‘intellectual’.
In response to ‘eyelids’, women have ‘vagina’ and ‘cheeks’, while men have ‘nostrils’ and ‘forehead’.
Woman is to ‘tub’ or ‘tray’, as man is to ‘bucket’ or ‘colander’. What does that mean or imply? I’m sure one could use this as a starting point for an essay if they put their mind to it, as I did with the “beard-justified-space+doctrine = theology, preacher” example.
The model will return a result, no matter how ridiculous the question you ask. Again I’m reminded of one of my favourite quotes (which I also include in my post on the history of statistical bias):
[on Babbage’s calculating machines]
“On two occasions I have been asked [by members of Parliament],
‘Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?’
I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.”
— Charles Babbage (1791–1871), “Passages from the Life of a Philosopher”, 1864
There’s no doubt that word embeddings can indeed learn to spatially position words in high dimensions in such a way as to capture some kind of meaningful relationships. This will include biases embedded in the training data. And when using such models for critical decision making, any discrimination caused by such biases is likely to have very negative consequences, especially for those who are already at some kind of disadvantage.
But mixed in with that, often these outputs are so delicately positioned between ‘random’ and ‘structured’ that sometimes it’s very difficult to know whether there is actually any meaning behind them, i.e. are the biases embedded in the model, or are we just projecting what we want to see, exposing our own biases in interpreting the results? Sometimes projecting too much meaning onto the output of a model can be a bit like seeing Jesus’s face in a piece of toast, and being convinced that it’s a message from God.
And in some cases this isn’t always a bad thing. I find the idea fascinating, to use ML models and latent spaces as meaning-filters, to interrogate our own biases and perceptions — to take uniform distributions of noise (i.e. totally random, i.e. white noise), and bend them into slightly more structured noise, like parametric Rorschach inkblot generators, for various different domains.
And then we can use the produced artifacts as starting points, as seeds that flower in our imagination, that we see things in, project meaning onto, create stories and invent narratives around, as we have done for millions of years.
NB. These ideas of “projecting meaning onto noise” and other self-serving biases of course go well beyond interpreting the outputs of machine learning models, to arguably all aspects of cognition and in fact life — including even some of the extreme social and political polarisation we’re seeing today. I’ll undoubtedly be working on these themes more in the near future.
In fact, whenever I think of the term “what does it mean?” I cannot help but think of yosemitebear62’s double rainbow video, and his efforts to project meaning onto this magnificent phenomenon:
and even more exemplified in the last 30 seconds of his explanation:
In addition to my ongoing research in this field as part of my PhD, this work was supported by a residency at Google’s Artists and Machine Intelligence Program. In that capacity I’d like to thank Kenric McDowell, Mike Tyka, Andrea Held, Blaise Aguera y Arcas and many others for the support, inspiring conversations and suggestions. The work and ideas I talk about here were also inspired by many many others, but I’d like to give a particular shout out to Allison Parrish and Ross Goodwin.