Natural Language Processing: How Computers Learn to Write
For roughly 150,000 years, humans have been creatively coming up with new ways to communicate with one another. More recently, we have developed complex languages that require entire systems of rules to make any kind of sense.
One of the most popular languages, English (second only to Mandarin by some counts), is now being used to train machine learning models to create their own text!
What is Natural Language Processing? 🤔
To put it simply, NLP is a branch of Artificial Intelligence, within the field of machine learning, that allows a computer to understand, manipulate, analyze, and even generate human language.
Natural Language Processing (or NLP) is currently focused mostly on mirroring the English language, whose rules (sentence structure, grammar, syntax, and so on) allow for proper communication.
There are also several forms of NLP that work to translate sentences from other languages into English, or into another language entirely. Although we aren't going to discuss that kind of NLP here, it's important to understand the different applications NLP has.
Today's Applications: 📈
There are many examples of NLP that we use today, and many that you might not even know about.
1. GOOGLE! Yes, Google. When you type something into the search bar, press enter, and get billions of results in less than a second, much of that is powered by NLP. These machine learning models allow Google to parse your query and match it against countless sites, news articles, and more.
2. Another interesting way Google has integrated NLP is in speech recognition; two examples of speech tools in this space are Google Web Speech and Vocalware.
Although today’s applications are very intriguing, I want to speak more about what the future of NLP can become and how we’re slowly but surely making progress in this field.
Future/Progressing Applications: 🤖
OpenAI has created one of the most fascinating NLP models to date!
It is a text-generating ML model named GPT-2, short for Generative Pre-trained Transformer 2.
The second version of this model is roughly ten times larger than its predecessor, "GPT", and was trained on roughly ten times as much data. GPT-2 is said to have one of the best training setups because, instead of being limited to specific domains (particular websites or online books), it was trained on 8 million web pages.
The model performs surprisingly well on tasks like question answering, reading comprehension, summarization, and translation without being trained for any of them specifically: GPT-2 adapts to the text you give it as a prompt. Its only training objective is to figure out which word comes next, not what the whole sentence should be.
OpenAI claims that its model is “like a chameleon” and that it adapts to the conditioning of the text that is given to it.
How does it work?
*Before I move on, I just want to quickly mention that the full GPT-2 model is not available for public use because of concerns about malicious use, although a smaller version (GPT-2 small) is.*
First, I’ll start with an example:
When we input this sentence into the model:
The dog on the ship ran 🐶
The GPT-2 small model generated the phrase:
“The dog on the ship ran off, and the dog was found by the crew.”
If you choose to change the word dog to motor:
The motor on the ship ran 🚢
The model has now created the phrase:
“The motor on the ship ran at a speed of about 100 miles per hour.”
- This example demonstrates that GPT-2 small understands the difference between a dog running and a motor running. (You can try this yourself with the sketch below.)
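If you want to reproduce this experiment, here is a minimal sketch using the Hugging Face transformers library to prompt the publicly released small model. This is just one way to run it, not the original demo's code, and since generation is sampled your outputs will differ:

```python
# pip install transformers torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# "gpt2" is the publicly released small version of the model.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

for prompt in ["The dog on the ship ran", "The motor on the ship ran"]:
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    # Sample a short continuation; top-k sampling keeps it coherent.
    output = model.generate(
        input_ids,
        max_length=20,
        do_sample=True,
        top_k=40,
        pad_token_id=tokenizer.eos_token_id,
    )
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```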
You may be asking yourself, how does this work?
Before answering that, we should talk about two very important terms that help explain how a computer can actually "read." TF-IDF stands for Term Frequency-Inverse Document Frequency, and it measures how often a term or word occurs and how important it is. Let's go through both parts:
TF: Term Frequency measures how frequently a term occurs in a document. Since every document is a different length, we divide the number of times the term appears by the total number of terms in the document to find the term frequency.
TF(t) = (Number of times term t appears in a document) / (Total number of terms in the document)
IDF: Inverse Document Frequency measures how important a term is. Certain terms such as "is", "of", and "that" may appear many times but carry little importance, so we use the following equation to weigh a term's "importance" by how rarely it appears across documents:
IDF(t) = log_e(Total number of documents / Number of documents with term t in it).
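To make these two formulas concrete, here is a minimal sketch in plain Python that computes TF-IDF exactly as defined above. The three-document corpus is made up purely for illustration:

```python
import math

# A tiny made-up corpus; each "document" is just a list of words.
docs = [
    "the dog on the ship ran".split(),
    "the motor on the ship ran fast".split(),
    "the dog barked at the motor".split(),
]

def tf(term, doc):
    # TF(t) = times t appears in the document / total terms in the document
    return doc.count(term) / len(doc)

def idf(term, docs):
    # IDF(t) = log_e(total documents / documents containing t)
    containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / containing)

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

# "the" appears in every document, so its IDF (and TF-IDF) is zero;
# "dog" is rarer, so it scores higher in the documents that contain it.
print(tf_idf("the", docs[0], docs))  # 0.0, since log(3/3) = 0
print(tf_idf("dog", docs[0], docs))  # > 0
```

Notice how the common word "the" gets a score of zero while "dog" stands out: that is exactly the "importance" weighting described above.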
Going back to GPT-2:
GPT-2 is what you call an attention model, which means that when predicting the next word, it looks back over the words that came before it and weighs how relevant each one is.
When the model is guessing the next word after ran, it pays close attention to the word dog. Through these attention patterns, GPT-2 picks up on linguistic properties that let it judge which words can plausibly come next.
GPT-2 small has 12 layers, and each layer has 12 independent "attention" mechanisms, called "heads"; this means there are 12 × 12 = 144 distinct attention patterns being used to work out the correct next word.
Each "head" focuses on something different: one might track the word immediately before, another a related word earlier in the sentence, and so on. Together, they learn which word can be used to complete the sentence.
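To give a feel for what a single head computes, here is a minimal sketch of scaled dot-product attention in NumPy. The random vectors below are stand-ins for learned word representations, so the resulting weights are arbitrary; in the trained model, the head predicting the word after "ran" would put real weight on "dog" or "motor":

```python
import numpy as np

np.random.seed(0)
words = ["The", "dog", "on", "the", "ship", "ran"]
d = 8  # toy vector size; GPT-2's real heads use 64 dimensions

# Stand-ins for each word's learned query, key, and value vectors.
Q = np.random.randn(len(words), d)
K = np.random.randn(len(words), d)
V = np.random.randn(len(words), d)

# Scaled dot-product attention: each word scores its match with every word.
scores = Q @ K.T / np.sqrt(d)

# Causal mask: a word may only attend to itself and earlier words.
mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
scores[mask] = -np.inf

# Softmax turns the scores into attention weights that sum to 1 per word.
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
context = weights @ V  # each word's output is a weighted mix of values

# The last row shows how much "ran" attends to each earlier word.
for word, w in zip(words, weights[-1]):
    print(f"{word:>5}: {w:.2f}")
```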
The GPT-2 model is based on the transformer architecture, which was first introduced in Google's paper "Attention Is All You Need."
GPT-2 works through what is called "auto-regression": after each token is produced, it is appended to the sequence of inputs, and that grown sequence becomes the input for the next prediction. This process repeats until the text is complete.
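Auto-regression is easiest to see as a loop: predict, append, repeat. Here is a minimal sketch where a made-up lookup table of likely next words stands in for GPT-2 (the real model scores every word in its vocabulary using attention over the whole sequence, not just the last word, but the loop is the same):

```python
# A toy stand-in "model": maps the last word to a likely next word.
toy_next_word = {
    "The": "motor", "motor": "on", "on": "the",
    "the": "ship", "ship": "ran", "ran": "fast",
}

def generate(prompt, steps):
    tokens = prompt.split()
    for _ in range(steps):
        # 1. Predict the next token from the sequence so far.
        next_token = toy_next_word.get(tokens[-1])
        if next_token is None:
            break
        # 2. Append the prediction to the sequence of inputs...
        tokens.append(next_token)
        # 3. ...and repeat: the grown sequence is the next step's input.
    return " ".join(tokens)

print(generate("The motor", steps=5))
# The motor on the ship ran fast
```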
What does all this mean? 😲
One of the most popular examples of the full power of GPT-2 can be seen in the sample stories OpenAI published alongside the model's announcement, such as its entirely invented news report about unicorns discovered in the Andes.
Knowing that a story like that is completely fake is honestly kind of scary, but mind-blowing as well. It is hard to tell exactly what this specific Natural Language Processing model is capable of, but what we do know is that it's going to make a huge impact, and it has taught us a lot about how increasingly possible it is, every day, for a computer to seem "human."
As the Chinese Room thought experiment suggests: we will measure the intelligence of a computer or machine by how well it can convince us that it is smart, not by how smart the machine actually is.
Key Takeaways: 🤯
- Google is making huge advances when it comes to NLP technologies
- OpenAI's GPT-2 ML model is one of the best text generators out there
- The full GPT-2 model is so advanced that it was withheld from public release because it may be dangerous
- GPT-2 learns from a huge dataset of web text (8 million pages) to generate sentences
- NLP research is currently focused mostly on the English language
Social Media: 👤
If you enjoyed this article or have any questions or concerns, please contact me at dexteralxbarahona@gmail.com
Connect with me on LinkedIn at https://www.linkedin.com/in/dexter-barahona-723314194
Instagram: DexterBarahona