For as long as I can remember, I have had an absolute fear of writing. In primary school, my parents were told that my spelling and grammar were ‘worrisome’. At the age of 11, I participated in a game show on national television, where I had to complete a word puzzle. I made such a fool of myself that the directors eventually decided not to broadcast all my blunders, but just enough for me to be mocked at school. Two years later, I was diagnosed with dyslexia, and that same year my English teacher told my parents I would probably never be able to speak or write English. You could say language has never been my closest friend.
Fortunately, I was born at the perfect time. When using Word, I’ve always been guided by the wavy red and blue underlines. Without them, this article would probably not even exist. Computers have become remarkably good at understanding and processing language as a result of dramatic improvements in Artificial Intelligence (AI) over the last decade. Most of these improvements stay under the radar because we often don’t even realize when the technology is being used. When we fill in an online complaint form or chat with customer service, our language is most probably interpreted by a computer, and the answer is (at least partially) produced by an AI model.
GPT-2, developed by OpenAI, is the most recent AI model for natural language processing that illustrates how much this technology has improved. The system quickly sparked controversy because the researchers decided not to make the full model publicly available, making it impossible to independently test its capabilities. The reason given for withholding it was the fear of the model being used for malicious purposes. The developers did, however, publish a more limited version: 117 million parameters instead of 1,542 million, and 768 dimensions instead of 1,600. It is hard to grasp what these differences imply, but remembering how math class suddenly became much harder when we moved from two dimensions to three in primary school may give us a good hint.
Here is how it works: the model is trained on 40GB of data collected from millions of webpages and predicts the words that follow a sequence of input words. I am currently writing my thesis (coincidentally on this particular subject), and I decided it was time to find out whether I could use GPT-2 to assist me. I used the first sentence of my thesis abstract as the input and let the system run ten consecutive times, which produced ten completely different continuations. Remember that the system had never seen the rest of my thesis. Most of the results were garbage, but one answer left me almost speechless.
“Automatic speech recognition (ASR) has experienced a boost in technology development and technology adoption across different markets, from the first digit recognizer to the omnipresent voice assistants ready to support us in our daily lives.”
“At the center of the latest technology we are seeing is Deep Learning and how we can leverage AI to improve our personal and business interactions.
As technology has expanded in the world of business, the scope and size of services we require has become greater and we’ve seen a surge in customer demand for greater control over our personal and business lives. Deep Learning and machine learning have allowed us to create the next generation of personal assistants, enabling us to better anticipate the needs of our customers, create new products based on the needs of existing customers, and use AI to perform more powerful and predictive actions when given the information they need from us. This is one area where advances in Deep Learning, Machine Learning and AI have improved by leaps and bounds over time. In this blogpost on Google Cloud Vision I’ll look at one of the recent breakthroughs in this area, Google Now. Google Now is a product that is used in Google products such as Gmail and Chrome to anticipate important needs based on the information that you provide in real time. Google Now recognises the information provided using Deep Learning”
The answer provided by GPT-2 is full of mistakes, but I was absolutely stunned by how much of the generated abstract actually touched on essential parts of my thesis. Yes, Deep Learning is at the center of this new technology. Yes, we have seen a recent surge in customer demand.
And last but not least, my thesis discusses the importance of Google Now as the starting point of this emerging demand. And yes, I did indeed find this on a blog post from Google. It is awe-inspiring how the model can (or at least appears to) understand the context of a complicated sentence and build an abstract that, slightly tweaked, could be used in my thesis.
For comparison, the original abstract of my thesis:
“Automatic Speech Recognition (ASR) has experienced a boost in technology development and technology adoption across different markets, from the first digit recognizer to the omnipresent voice assistants ready to support us in our daily lives. As man-to-machine interactions through speech are increasing, an important question arises: are those voice assistants just a hype or are we really on the verge of a technological breakthrough? This question is addressed by identifying how the focal technology has transformed over the years, by first analyzing the development of technology components, substitutes and complements that have formed bottlenecks in the development of speech recognition enabled applications, and subsequently by analyzing the pace of technology substitution empirically to build predictions. The empirical data confirm a positive impact of recent advances in machine learning and language modelling on technology performance and market demand, showing a double-boom cycle in patenting activity. This thesis demonstrates that the youngest phase of increasing ASR technology development is a result of growing market demand and technology development of essential components and complements.”
So, is it time to let AI write your thesis? No. The nine other results from my test showed that the system fully understands neither language nor context: the generated text lacked structure and was often completely off topic. And although we are not able to test it, we may assume that even the complete model with 1,542 million parameters will still fall short in many situations. So there is some luck involved in generating usable text, but it would be naive to dismiss the technology.
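For readers curious about the mechanics, the next-word prediction loop at the heart of my experiment can be sketched with a deliberately simplified toy. The bigram table below stands in for GPT-2’s 1.5-billion-parameter network, and the tiny corpus and function names are purely illustrative; the real model learns far richer statistics from 40GB of text, but the generate-one-word-at-a-time loop is conceptually the same.

```python
import random
from collections import defaultdict

# A tiny "training corpus" (illustrative; GPT-2 used 40GB of web text).
corpus = (
    "automatic speech recognition has experienced a boost in technology "
    "development and technology adoption across different markets"
).split()

# Record which words were observed to follow each word.
follows = defaultdict(list)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current].append(nxt)

def generate(start, length=8, seed=0):
    """Sample a continuation one word at a time, like GPT-2 but with
    observed bigram frequencies instead of a neural network."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(length):
        candidates = follows.get(words[-1])
        if not candidates:  # no observed successor: stop generating
            break
        words.append(rng.choice(candidates))
    return " ".join(words)

print(generate("technology"))
```

Because sampling is random, running the loop ten times yields ten different continuations, which is exactly why most of my ten GPT-2 outputs were garbage while one happened to be remarkably on topic.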
AI technology, and more specifically natural language understanding, will have major consequences for many jobs in the near future. Take journalism: in a breaking-news situation, it is not hard to imagine an AI model like GPT-2, trained on specific data (only news articles), generating an article that covers the full story in a matter of minutes or even seconds. Or during a natural disaster, computers may be able to process some of the incoming emergency calls.
Of course, this is one of those examples of AI where it is easy to take an apocalyptic view of the consequences: in brief, more fake news and fewer jobs. I do not hold such a view. Good AI can be used to fight bad AI; the same models that write fake news can, for instance, be used to detect it. However, it is crucial that politicians understand the impact AI will have on all dimensions of society and act proactively to guide the technology’s development in the right direction.