Machine learning, artificial intelligence and neural networks have come to govern many aspects of everyday life without being noticed, and they are becoming a decisive mechanism in crucial matters. Machine learning based systems are being deployed in an increasing number of settings, from autonomous cars and personalized news feeds to medical diagnostics and crime potential analysis, which affects our culture, our perception and our relationship with machines. They reflect us by generating ethical discussions while learning to function “correctly” through human input. At the same time, this reveals the need to reconsider our tools and methods for debugging, repairing and calibrating. Can an artificial intelligence know what’s best for us? Or will it mislead us because of human error in the dataset? How can we test the sanity of an artificial intelligence?
As I started working on this topic, a sporadic conversation with Bager Akbay made me realize how interconnected it is with many issues that otherwise seem unrelated. Besides the exhibited work, I tried to portray my state of mind during this research and to make the process transparent by displaying, as part of the exhibition, the texts that I have been reading and learning from and the recent developments that I have followed.
When Google released Deep Dream, a modified version of its object recognition AI that manipulates parts of an image to increase their resemblance to what the AI thinks they are, I used it for fortune telling from Turkish coffee (an ancient tradition still practiced by Turks the world over). Through the algorithm, I tried to read the symbols hidden in the trails of coffee sediment left at the bottom of the cup. Unfortunately, puppy-slugs don’t have an equivalent in a Turkish coffee context.
In reference to this work, we started wondering whether it would be possible to psychoanalyze an AI using the same method as the Rorschach inkblot test.
To begin with, I should elaborate on the nature of AI. The majority of the technology that we interact with on a daily basis is deterministic, which means that its functionality is based on specific conditions and states. The main difference of artificial intelligence (AI) or machine learning (ML) based systems is that they are stochastic. This is a big advantage with problems that don’t have a mathematical model, or that might even be undefined. Memo Akten’s example explains this eloquently:
There are many other physical features which are not directly measurable. E.g. gladiator-y-ness (how much of a Hollywood gladiator someone looks like).
This isn’t how strong someone is (i.e. what they can benchpress), or how tall they are, or how much they weigh. But how gladiator-y they look. We humans can look at someone and say person X is more gladiator-y-er than person Y, but less gladiator-y-er than person Z. This is a real (but subjective) feature which exists, but it isn’t directly measurable. It’s a latent feature (i.e. hidden).
The ability to do arithmetic on a semantic level through latent features makes each new ML based application quite surprising. Analyzing the relational vectors between words and images, or applying this structure to thought vectors, shows that we are in fact witnessing something that was once considered science fiction.
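As a minimal sketch of what arithmetic on a semantic level can look like, the snippet below runs the well-known king - man + woman analogy on a handful of toy vectors. The three dimensions and all the numbers are hand-made assumptions for illustration; a real system would learn hundreds of latent dimensions from text rather than having them written down.

```python
import math

# Toy 3-dimensional "embeddings", invented for illustration only.
# Hypothetical axes: (royalty, maleness, femaleness).
VECTORS = {
    "king":  (0.9, 0.9, 0.1),
    "queen": (0.9, 0.1, 0.9),
    "man":   (0.1, 0.9, 0.1),
    "woman": (0.1, 0.1, 0.9),
    "apple": (0.0, 0.2, 0.2),
}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, vectors=VECTORS):
    """Return the word closest to vec(a) - vec(b) + vec(c).

    The three input words are excluded from the candidates, as is
    common practice when evaluating word analogies.
    """
    target = tuple(
        x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])
    )
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("king", "man", "woman"))
```

With these toy numbers the nearest remaining word is "queen". In a trained model the result is only approximately right, and it depends entirely on the text the model was trained on.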
The systems being deployed are as interesting and diverse as the experimental ones. AI systems have started to be deployed in various fields: from personalized news feed collectors (which are less visible and thus don’t create much discussion) to a beauty contest jury or a search engine that finds the porn that most resembles an uploaded image (which generate more discussion), and on to more crucial areas such as autonomous cars, risk assessment and medical diagnostics.
As with every emerging technology, there are many issues with the integration of AI into our daily lives. In the beauty contest judged by an AI, most of the 44 “winners” were white, and only one was black. The AI based risk assessment tools that are being used in courts rate white people as lower risk than black people, even when it is the first offense for the black person and a repeat offense for the white person. These mistakes are not something we can overlook.
Personally speaking, one of the great things about ML is that it reflects humanity. Because ML systems are trained on human-produced data, observing them shows us who we really are. Our cognitive biases seep into the algorithm through the data we produce. It is worth noting, though, that this kind of bias is not specific to ML. The “Shirley cards” that were used for calibrating the chemicals in photographic film resulted in a period in which black people could not be seen properly in photographs (I can’t even imagine not having any childhood photos), which didn’t take long to correct. The AI that creates analogies such as king-man+woman=queen was put through an “implicit association test”. The results reveal the societal values of the period in which the data was produced. This makes the model out of date and out of step with today’s values.
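To make the idea of testing word vectors for implicit associations a little more concrete, here is a rough sketch. It is not the procedure used in the study mentioned above: the association score (a difference of mean cosine similarities), the word lists and the two-dimensional vectors are all assumptions, and the numbers are made up to show the effect rather than measured from any real model.

```python
import math

# Hand-made 2-dimensional toy vectors, invented for illustration only.
# They are deliberately skewed to mimic a bias that a model could
# absorb from its training text.
EMB = {
    "engineer": (0.8, 0.2),
    "nurse":    (0.2, 0.8),
    "he":       (1.0, 0.0),
    "him":      (0.9, 0.1),
    "she":      (0.0, 1.0),
    "her":      (0.1, 0.9),
}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def association(word, attrs_a, attrs_b, emb=EMB):
    """Mean similarity of `word` to set A minus its mean similarity to set B.

    A positive score means the word sits closer to set A,
    a negative score means it sits closer to set B.
    """
    mean_a = sum(cosine(emb[word], emb[w]) for w in attrs_a) / len(attrs_a)
    mean_b = sum(cosine(emb[word], emb[w]) for w in attrs_b) / len(attrs_b)
    return mean_a - mean_b


for word in ("engineer", "nurse"):
    score = association(word, ["he", "him"], ["she", "her"])
    print(word, round(score, 3))
```

In this toy data "engineer" gets a positive score (closer to "he" and "him") and "nurse" a negative one. Run against vectors learned from real text, the same kind of measurement would show which associations the data carried into the model.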
But how can we correct these flaws? Current tools for debugging, repairing and calibration are not suitable for AI. In fact, from a deterministic perspective, the software is working properly. “A close up of a person holding a pair of scissors” aims to expand this discussion. Being able to get results from an AI on a test as human-specific as the Rorschach raises many questions.
Rorschach, who in fact wanted to be a painter, showed drawings to the patients in the hospital where he worked and discovered a pattern in the answers of schizophrenic patients. For a test of this kind, the recent image captioning AIs were suitable subjects.
I tried different image captioning AIs to examine what they see in the Rorschach inkblots. The only AI that answered in enough detail was “DenseCap”, and its answers were analyzed by an expert. The name of the exhibition came from my early sketches, in which the tenth card was first manipulated by Deep Dream and then captioned by “NeuralTalk2”. It was a good reference to the way I feel about our approach to technology as absolutely neutral, whereas it is obvious how unprepared we are.
Think about your autonomous car being neurotic. If the training data is gathered from İstanbulite drivers, I don’t think it will be an improvement in our lives. The decision that an autonomous car makes in the moment of inevitable loss is a common example. It raises questions that we would rather avoid, or cannot answer. No one wants to choose between sacrificing 2 versus 5 people, as in the classic trolley problem applied to cars. On the other hand, we are at the point where we have to provide an answer. To seek that answer, MIT created Moral Machine, a platform that gathers human perspectives on moral decisions. But can this platform arrive at a single answer? Or, as the results change each day, will they become software updates? Shall we approach this problem globally, or can different localities have different perspectives on it, and does this mean that if I cross a border, my car will function differently? Does my personal opinion matter? What if it is not a car, but an AI boss that is telling me how to work? Can there be a single correct way of doing things? Doesn’t this diminish diversity, which is the basis of evolution? These are just a sample of the questions that come to my mind.
It is inevitable for ML systems to fail, as it is for all kinds of technology. But whose fault will it be? As Microsoft’s Twitter bot Tay learned what to talk about from Twitter users, it turned into a fascist in less than a day. Neither Microsoft nor the programmers were responsible for this; it was the users’ fault. Can this mean that users will be responsible for the mistakes of ML systems? Meanwhile, granting AI systems personal rights to settle liability issues is being discussed in recent law journals.
A similar topic is the copyright of ML systems’ outputs. The number of songs, poems and scripts generated by AI is increasing every day. Who is going to own the copyright of this intellectual property is still unclear. In the age of digital production an artwork can be copied digitally while the idea behind it remains unique, whereas AI systems can produce similar artworks, which diminishes this uniqueness. As is the case with all technological improvements, it is us who should change our point of view.
This is why it is impossible for an individual or an institution to resolve these issues. Having open academic and public discussion and creating an archive might be a way. I will archive and exhibit all the sources that fed me through this process. Even though the Rorschach inkblots have been in the public domain since 2009, the success of the test relies on seeing them for the first time, so I decided not to display the inkblots as part of the exhibition. Instead, I will use the Deep Dream sketches, along with captions of the original images.
Lastly, all this work is possible because the people who are developing cutting-edge ML systems release them as open source (thanks to Andrej Karpathy and Justin Johnson). I should also mention Kyle McDonald, Memo Akten and Gene Kogan as sources of information and inspiration. Last but not least, I thank Assist. Prof. İrem Erdem Atak for the analysis, Assist. Prof. Bahar Tanyaş, and Bager Akbay for his full support through this whole process.
You can read the analysis here: https://medium.com/@kocosman/evaluation-of-rorschach-test-results-of-artificial-intelligence-aeba8193c52f#.hlgvas2f4