Conversations with AI — The manual

Published in

ArtInUX

11 min read · Apr 14, 2021

It feels like it was not that long ago that conversational AI was in its infancy. Chatbots were very basic and had to fall back to human agents pretty quickly, and you were never really fooled into thinking they were human anyway, despite the names and avatars the companies gave them. Today, conversational AI is at the forefront of the industry, having infiltrated our lives with chatbots that have since learnt to help us find the best deals on car insurance, household appliances that talk back to us and smart assistants that (sometimes) predict our next move so well, it’s even a little creepy. With CAI, NLP, NLU and NLG becoming commonplace jargon, what do they all mean and how does it all work?

What is natural language?

In order to understand NLP, we must first understand what natural language is. Natural language can be visual, audible, written, or a combination of all three. It is made up of sentences, words, letters and characters that vary across cultures. Take, for example, the three languages I am most comfortable with: English has 26 letters, while Latvian and Russian have 33 each. These alphabets are considered average in length and aren’t outliers.

According to Ethnologue.com, there are 7,117 spoken languages around the world. The number of letters in alphabets varies, from 12 (Rotokas, a language indigenous to Papua New Guinea) to 74 (Khmer, the language of Cambodia). But it doesn’t end there: there are just under 50,000 Kanji and just over 50,000 Hanzi characters.

Other characteristics of natural languages, aside from the diverse vocabulary, include syntax (the grammar rules), semantics (the intended meaning) and linguistic ontology (the relationship between words, sentences and phrases). And as if that wasn’t enough, natural language also contains words with different, often ambiguous meanings, different accents, wordplay, slurs and errors, context and mispronunciation. Then there’s the matter of working out where one word or sentence ends and another begins, as well as variations in the speed of utterance.

And have you heard of homophones and homonyms? To best explain what those are, recall a particular American boy band from the mid-to-late 90s.

Here, let me jog your memory:

Well, when I first heard that song, I thought it was about the peculiarities of the English language. After all, to me, the line went like this “buy, by, bye”.

That is because the word ‘bye’ is both a homophone (it sounds the same as ‘buy’ and ‘by’ but differs in spelling and meaning) and a homonym (one spelling and pronunciation with more than one meaning).
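To see why this trips up a machine that only hears sounds, here is a minimal Python sketch. The tiny pronunciation table and its phonetic keys are made up for illustration (a real speech system would rely on a full pronouncing dictionary), but it shows how one sound can map to several spellings:

```python
from collections import defaultdict

# Hypothetical, hand-written table: word -> rough phonetic key.
# The keys are invented for this sketch, not a real phonetic alphabet.
PRONUNCIATIONS = {
    "buy": "BAY",
    "by": "BAY",
    "bye": "BAY",
    "two": "TOO",
    "too": "TOO",
    "to": "TOO",
    "song": "SONG",
}


def group_homophones(pronunciations):
    """Group words that sound the same but are spelled differently."""
    by_sound = defaultdict(list)
    for word, sound in pronunciations.items():
        by_sound[sound].append(word)
    # Keep only the sounds that have more than one spelling.
    return {
        sound: sorted(words)
        for sound, words in by_sound.items()
        if len(words) > 1
    }


HOMOPHONES = group_homophones(PRONUNCIATIONS)
print(HOMOPHONES)
```

A system that hears the sound alone has to use the surrounding words to decide which of those spellings was meant.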

And then there’s humour. Have you ever come across a joke from another culture and wondered how that’s funny? Have you ever struggled to identify puns? (Bonus points if you can count the number of linguistic puns in this article.) Humour is a universal phenomenon, but it differs across cultures and languages. This raises the question: can humour be taught to a machine?

And, we all, without fail, always understand sarcasm, right? (/sarcasm)

To say that a natural language is complex is an understatement.

With all of this, you quickly begin to understand that language is by far one of the most complex things we have invented. It is hard enough to understand for a human, so what, you might think, are machines doing in this space?

Well, turns out — quite a lot.

What is NLP?

NLP, or Natural Language Processing, is a sub-field of AI that strives to give computers the ability to understand human language. It’s not perfect and these machines still have a way to go, but they are getting better every day at interpreting the nuances of human speech. At its core, NLP consists of Natural Language Understanding (NLU) and Natural Language Generation (NLG).

NLU

In short, Natural Language Understanding is the process of analysing language input to determine its meaning. This is where homonyms and homophones come into play. This part also deals with the aforementioned syntax, semantics and ontology.
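As a rough illustration of what "determining the meaning" can look like, here is a toy Python sketch that maps an utterance to an intent and a device. The intent names, keyword lists and device list are all assumptions made up for this example. Real NLU systems use trained statistical models rather than keyword matching:

```python
import re

# Hypothetical intents and trigger words, chosen only for illustration.
INTENT_KEYWORDS = {
    "turn_on": {"on", "start"},
    "turn_off": {"off", "stop"},
}
# Hypothetical list of devices this toy assistant knows about.
KNOWN_DEVICES = {"lights", "coffee machine"}


def understand(utterance):
    """Return a toy reading of an utterance as an intent plus a device."""
    text = utterance.lower()
    tokens = set(re.findall(r"[a-z]+", text))
    # The first intent whose trigger words overlap with the tokens wins.
    intent = next(
        (name for name, words in INTENT_KEYWORDS.items() if tokens & words),
        "unknown",
    )
    # The first known device mentioned anywhere in the text wins.
    device = next((d for d in sorted(KNOWN_DEVICES) if d in text), None)
    return {"intent": intent, "device": device}


print(understand("Hey Google, turn on the coffee machine"))
print(understand("Alexa, lights oooooooph!"))
```

The second call finds the device but not the intent, which is a small taste of why accents and unusual pronunciations are hard: anything the rules did not anticipate falls through to "unknown".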

NLG

You can think of Natural Language Generation as the machine’s ability to respond in a natural language. NLG outputs are predominantly text-based but, just as language inputs can arrive as voice or text, NLG outputs can be converted into voice if desired.
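The simplest form of generation is filling in a template. The Python sketch below assumes a made-up "meaning" structure (an intent and a device) and made-up response wording. Modern NLG systems generate text with trained language models instead, so treat this as the most basic possible illustration:

```python
# Hypothetical response templates, one per intent; the wording is invented.
TEMPLATES = {
    "turn_on": "Okay, turning on the {device}.",
    "turn_off": "Okay, turning off the {device}.",
}
FALLBACK = "I'm sorry, I didn't quite get that!"


def generate(meaning):
    """Turn a structured meaning (intent + device) into a sentence."""
    template = TEMPLATES.get(meaning.get("intent"))
    if template is None or not meaning.get("device"):
        # Nothing sensible to say, so ask the user to try again.
        return FALLBACK
    return template.format(device=meaning["device"])


print(generate({"intent": "turn_off", "device": "lights"}))
print(generate({"intent": "unknown", "device": "lights"}))
```

The resulting string could then be shown on a screen or handed to a text-to-speech engine, which is the "translated into voice" step.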

While NLU deals with processing inputs, NLG is responsible for the outputs. A brilliant example of both at work was an experiment run by ROSS Intelligence. ROSS built a legal text summariser (and simplifier?): you input a large, complex document, say, an NDA contract. The NLU side processes the text to distil the meaning, and the NLG side is then able to summarise and simplify the content. See it for yourself:

Ross summarising and simplifying NDA. source

This is a long process in which words are picked apart to determine their meaning, both in isolation and in context. It’s the ability to analyse a string of text, whether typed on a screen or transcribed from a voice input channel, and to respond to a human in a coherent, logical and user-friendly (i.e. natural) manner.
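To make the idea of summarising a little more concrete, here is a deliberately naive Python sketch. It is not how ROSS did it. It is a classic word-frequency approach that only selects existing sentences and generates no new wording, and the stop-word list and sample text are assumptions made up for this example:

```python
import re
from collections import Counter

# Tiny, hand-picked stop-word list; an assumption made for this sketch.
STOP_WORDS = {
    "the", "a", "an", "of", "to", "and", "or", "in",
    "is", "be", "that", "this", "for", "by", "not", "any",
}


def content_words(text):
    """Lower-case the text and drop the stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarise(text, max_sentences=1):
    """Keep the sentences whose words are most frequent in the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    frequency = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences do not win by length alone.
        return sum(frequency[t] for t in tokens) / max(len(tokens), 1)

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Present the chosen sentences in their original order.
    return " ".join(s for s in sentences if s in top)


NDA_TEXT = (
    "The recipient must keep the confidential information secret. "
    "The recipient must not share confidential information with others. "
    "This agreement is signed in London."
)
print(summarise(NDA_TEXT))
```

Even this crude scoring tends to surface the sentence that repeats the document's key terms, but it has no understanding of meaning at all, which is exactly the gap NLU and NLG aim to close.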

🎉 Congratulations! You have graduated with the basics (the very, very basics) in NLP. If you have made it this far, we might as well continue and take it up a notch with Conversational AI.

What is conversational AI?

CAI or Conversational AI refers to technologies that we, the users, can talk to. The most common examples are chatbots and voice assistants.

Chatbots

The term “chatbot” appeared in the 1990s, but the concept and (basic) technology have been around since the 50s. A chatbot, or its earlier form chatterbot (chat or chatter = conversation, bot = robot), is a machine’s attempt at the Turing test. One of the earliest attempts was ELIZA, an early NLP program created in the mid-1960s at the MIT Artificial Intelligence Laboratory by Joseph Weizenbaum.
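ELIZA worked by matching patterns in what the user typed and echoing parts of it back. The Python sketch below is written in that spirit, but the rules are made up for illustration and are not Weizenbaum's original script:

```python
import re

# Hypothetical rules in the spirit of ELIZA: (pattern, reply template).
# These are invented examples, not the original ELIZA script.
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bbecause\b", re.IGNORECASE), "Is that the real reason?"),
]
DEFAULT_REPLY = "Please tell me more."


def reply(message):
    """Answer with the first rule whose pattern matches the message."""
    cleaned = message.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(cleaned)
        if match:
            # Echo the captured part of the user's message back to them.
            return template.format(*match.groups())
    return DEFAULT_REPLY


print(reply("I need a holiday."))
print(reply("Hello"))
```

Every reply has to be anticipated by a hand-written rule, which is why anything unexpected falls straight through to the default response.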

A lot of time has passed since the inception of ELIZA, and since then, it has become apparent that not only are chatbots here to stay, they are more popular than ever. Back in 2016, chatbot adoption was minimal, but even then, a survey by Oracle found that as many as 80% of surveyed businesses were looking to implement chatbots by 2020. Now, fast forward to 2021, and we have accepted them as the norm. So much so that eMarketer estimated that a quarter of all mankind had used a chatbot by the end of 2019. A quarter of the globe!

Leveraging Natural Language Processing in Chatbots is a relatively recent trend. The idea of using artificial intelligence and machine learning to understand human communications has proven itself popular, with applications like Facebook Messenger, Apple’s Siri, Amazon Echo, and more. In the (not so distant) past, building a chatbot meant spending months designing rules and compiling every possible answer to every possible question. But NLP has made leaps and bounds of progress, and with deep learning, it is now possible to build a more powerful conversational AI in just a matter of hours, not weeks (or months).

Voice

Similarly to chatbots, voice UI allows us, the users, to interact with technology through the medium of our voice. Whereas with a chatbot, you would write what you wish to communicate, with Voice agents, you would say it. Think Google Assistant, Alexa or Siri, to name a few.

I speak for myself only, but I imagine there are many out there like me — I talk to my devices. My morning starts with “Hey Google, turn on the coffee machine” and ends with “Hey Google, lights off”. And countless other interactions in between. And the more voice-enabled technology I adopt, the more I want it. I have found myself wanting to ask the door to let me in when my hands are full and my keys are nowhere to be found.

Every once in a while, I find myself giggling when Google or Alexa misunderstands my partner’s strong accent.

I’m sorry, I didn’t quite get that!

Apparently, Google and Amazon’s engineers have not catered for inputs such as: “Alexa, lights oooooooph!” and other strong accent variations. And we’re not the only ones noticing this.

Data by Statista, data viz by @kristinedottech

In a 2020 global survey investigating the main barriers to the adoption of voice technology, 73% of all survey respondents named accuracy as the leading barrier, with 66% stating accent or dialect related recognition issues as the second biggest.

Voice-assisted devices for all?

If you think this is niche and not worth talking about, think again! NLP is a lucrative business.

Data by Statista (December 2018 forecasts), data viz by @kristinedottech

With the global revenues from NLP markets projected to exceed $43 billion USD by 2025, you can expect most modern appliances to be striking up a casual conversation with you soon.

Similar to the appetite for chatbots, more than 90% of surveyed businesses either plan to make or have already made significant investments in voice tech, and as many as 94% plan to increase their investment within the year.

In fact, it seems that voice devices have spoken loud and clear (how many puns is that now?). Businesswire.com estimated that in 2020 there were 4.2 billion voice assistants, and that by 2024 this figure will double to over 8.4 billion globally. To put this in context, the UN predicts the global population for the same year will be less than that, at 8.1 billion. Just think about that for a moment.

The lead singers

While there is a lot of competition in this space, there are certainly those that dominate: the usual suspects, Apple, Amazon and Google. But the distribution isn’t equal here.

As of last year, Amazon and Google’s voices dominate, and they have left the rest on mute. With the number of smart home devices supported by Amazon’s Alexa surpassing the 100,000 mark in 2020 and Google Assistant servicing half of that, at 50,000 devices globally, you really can expect your fridge to chat to you soon.

As you recall, at the start of this article, we covered that language is complex, and it has many variables. So how do the big three compare in terms of the accuracy of NLP abilities? Let’s take a closer look.

Better than expected, if I were to go by my own subjective experience.

But for this technology to continue growing and being adopted, the service providers must focus not only on what they provide to the end-user but also on how they address the main concerns. Based on a survey of 7000 participants, 52% stated that their main concern was the safety of personal data, followed by 41% stating privacy around unsolicited listening/recording.

Language as a superpower

If all of this feels like a bit too much to take in, don’t worry. I assume you, the reader, are a human, and as such, much if not all of the above comes naturally to you. Think of it as your superpower (really, this stuff is hard!). I am suddenly having flashbacks to my Bachelor’s. That was fun! (sarcasm?😅 )

Conclusion

So, where are we now in our fast-evolving world, and where is this technology heading? The future of conversational AI is bright, but it’s important to remember that this evolution is a marathon, not a sprint. Conversational AI has come a long way in a very short period of time, and just because you aren’t using it now doesn’t mean you’re out of the race. You can be on top of the next wave of innovation by leveraging the tools and techniques within the industry. There is still so much work to do. However, such rapid advances in technology give businesses a significant opportunity to influence the outcomes of conversations and deliver new levels of innovation and engagement to both customers and brands.

The key to being successful in today’s society and the world is to understand the user and understand how we communicate with them. Artificial Intelligence and NLP are helping to automate and process communication as a language so we can be more effective and efficient in our communication with customers.

P.S.

Parting thoughts

As someone who studied topics like language acquisition, sociolinguistics, syntax and semantics for my BA, covering things like language and meaning and how our brains learn the concept of language, I was particularly interested in understanding how this process compares to how machines ‘acquire’ language. While I didn’t go down the route of comparison, I did find there to be a lot of similarities. Perhaps this is because it’s the ‘easiest’ way to learn something like language. Or it could also be that this is the only way we know to learn it and teach it.

Brief history of NLP

While doing my research for this article, I put together this timeline of major milestones of NLP. If you found this topic to be interesting, think of it as further reading and signposting of where to go next in this rabbit hole. These are not all of the milestones, just the ones I found most interesting.

1939: speech synthesis, Bell Labs, the Voder and the “she saw me” experiment (if you only have a few minutes, this is the one to watch)
1950: Turing test
1952: speech recognition, Bell Labs, Audrey
1960s: text, MIT, ELIZA
1962: speech recognition, IBM, 16 words
1971: speech recognition, DARPA and Harpy from Carnegie Mellon University, 1,000+ words
2006: NLP, IBM Watson (winning Jeopardy! against the best human players in February 2011)
2011: NLU + NLG, Apple, Siri
2014: NLU + NLG, Microsoft with Cortana and Amazon with Alexa
2016: NLU + NLG, Google Assistant
2017: text, Facebook, Bob + Alice
2020: OpenAI’s GPT-3. This stuff is mind-blowing.