Simply Put, AI Is Probability and Statistics

Sam Bobo
Speaking Artificially
7 min read · Aug 22, 2023
Imagined by Bing Image Creator powered by DALL-E

Nearly every conversation I’ve held with people (friends, family, neighbors, new acquaintances) around the topic of Artificial Intelligence, upon their learning that I’ve spent my entire professional career in the space, gravitates toward the topic du jour, Generative AI (specifically the acclaimed ChatGPT), and the question lingering in every knowledge worker’s mind: “Will AI replace my job?”

Simply put — NO!

Understanding why Artificial Intelligence will not replace our jobs requires a fundamental understanding of Artificial Intelligence technology. Let me start:

Note: As always, my focus remains on Conversational / Natural Language Processing-based capabilities

Before delving into Natural Language Processing, we first need to understand how Machine Learning works more broadly. To get there, I encourage readers to close their eyes and recall sitting in their high school mathematics classroom. Imagine your teacher at the chalkboard or whiteboard reviewing the concept of linear regression. They draw an x-axis and a y-axis, place a large quantity of data-point dots on the graph, and intentionally do so in a manner that lets the class easily identify some relationship. The teacher then draws a line roughly through the series of data points and voilà, a line of best fit. Thereafter, that line gets a function, y = f(x), and now, for any new value on the x-axis, we can estimate the value of y.

Achieving that line of best fit, mathematically, is no easy feat. There is a lot of trial and error involved in placing the line in exactly the correct spot. Why? Well, shouldn’t the line make predictions as close as possible to what is observed in reality? Getting there means iterating repeatedly to minimize that error.

Machine Learning, at its core, is exactly that. Annotated data (annotated meaning that the x and y inputs are clearly defined) is fed into the model, forming what is called the “ground truth.” A computer then iterates many times through a process called an algorithm, which draws the line of best fit, measures the error, and keeps going until it reaches the lowest error it can find. That output is called a model.
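
To make that trial-and-error loop concrete, here is a minimal sketch using gradient descent, one common fitting algorithm; the data points and learning rate below are invented purely for illustration.

```python
import numpy as np

# Toy "ground truth": annotated (x, y) pairs that roughly follow y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1, 11.0])

# Start with a guess for the line y = m*x + b
m, b = 0.0, 0.0
learning_rate = 0.01

# The "trial and error" loop: measure the error, nudge the line, repeat
for step in range(5000):
    predictions = m * x + b
    error = predictions - y                    # how far off we are
    # Gradient of the mean squared error with respect to m and b
    m -= learning_rate * 2 * np.mean(error * x)
    b -= learning_rate * 2 * np.mean(error)

print(f"Learned line: y = {m:.2f}x + {b:.2f}")  # lands close to y = 2x + 1
```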

Enough with the mathematics class; let’s apply our lesson to Machine Learning. I will draw the analogy: instead of random (x, y) coordinates as data points, we will consider our x-input to be a sentence and our y-output to be an intent. Input enough sample sentences into the “graph,” tag each with its intended output (the intent), and a theoretical line of best fit can be drawn, except that instead of numerical coordinates, it runs on natural language. Now we’ve created a Natural Language Understanding model.

We can then use this Natural Language Understanding model to classify other statements someone makes, say in a chatbot scenario, and infer what the end user is requesting.
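
As a rough illustration (not any particular vendor’s implementation), here is a minimal sketch of such an intent classifier using scikit-learn; the sample sentences and intent labels are invented.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Annotated "ground truth": x-input sentences, y-output intents (invented)
sentences = [
    "What is my account balance?",
    "How much money do I have?",
    "I want to transfer money to savings",
    "Move $50 to my checking account",
]
intents = ["check_balance", "check_balance", "transfer_funds", "transfer_funds"]

# Turn sentences into numbers and fit a classifier: the NLU "line of best fit"
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(sentences, intents)

# Classify a new statement, as a chatbot would
print(model.predict(["Could you show me my balance?"]))  # ['check_balance']
```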

So we’ve covered some fundamentals of Machine Learning and applied that knowledge to glean some insight into Natural Language Processing. However, the explanation above was about numbers, not language. So how does a computer understand human language?

Simply put, probability and statistics!

Early in Natural Language Processing, a common approach to modeling language was the n-gram. The name carries a bit of mathematical notation: the n represents a number. Take a sentence:

“The dog jumped over the moon.”

A computer would start to learn language by first splitting the sentence into bi-grams, or sets of two adjacent words, as tokens: “The dog” “dog jumped” “jumped over” “over the” “the moon.” The sentence could also be modeled as tri-grams, or sets of three words: “The dog jumped” “dog jumped over” “jumped over the” “over the moon.”

In effect, the computer is building a corpus of knowledge and trying to glean insight into the probability of words following one another. In our example, if the computer sees “dog,” what is the most likely next word or set of words?
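
Here is a short sketch of extracting those n-grams and counting what follows a word; the `ngrams` helper below is a hand-rolled function written for this example, not a standard library call.

```python
from collections import Counter

sentence = "The dog jumped over the moon"
words = sentence.split()

def ngrams(tokens, n):
    """Slide a window of size n across the tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(ngrams(words, 2))  # bi-grams: ('The', 'dog'), ('dog', 'jumped'), ...
print(ngrams(words, 3))  # tri-grams: ('The', 'dog', 'jumped'), ...

# From bi-gram counts, estimate which word most likely follows "dog"
followers = Counter(b for a, b in ngrams(words, 2) if a == "dog")
print(followers.most_common(1))  # [('jumped', 1)] in this tiny corpus
```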

The above technique of labeling inputs and outputs to form a model is called, in the machine learning world, supervised machine learning. The name “supervised” indicates that a human trained the system by providing examples of inputs and their corresponding outputs. This paradigm of machine learning and natural language processing was the basis for much of the first two eras of Conversational AI.

Contrast supervised machine learning with unsupervised machine learning, a method whereby the computer algorithmically “learns” and draws patterns within the data without the aid of a human labeling the intended outcome as ground truth.

Take, for example, the concept of cluster analysis used within marketing. The idea behind cluster analysis is to unveil a set of n groupings, or clusters, whose members share many similarities a marketer can capitalize on.

Take a group of 100 people, selected randomly off the street across a diverse set of cities, ranging from urban to rural, and spanning all walks of life: religious preferences, sexual preferences, gender identities, race, country of origin, and so on. The objective is simple: separate the 100 people into three distinct groups whereby each group shares a common set of features. The hard part: the features are not defined. Think about this as a human. How would you go about creating those clusters? Would you try birthdays? Would you cluster by height? The problem is daunting, right?

With computers, iterations can happen in fractions of a second, and moreover, the computer can uncover relationships among people that humans could not see (or would take a significantly long time to uncover). That has always been the holy grail and promise of Artificial Intelligence: feed a computer a million images of breast cancer and a million without, so it can detect anomalies humans could not see, and earlier than normal, increasing the odds of survival. I digress.
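
As a minimal sketch of that clustering exercise, here is k-means from scikit-learn; the three features (age, height, commute time) and the randomly generated people are invented stand-ins for real survey data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented feature vectors for 100 people: [age, height_cm, commute_minutes]
rng = np.random.default_rng(seed=0)
people = rng.normal(loc=[40, 170, 30], scale=[12, 10, 15], size=(100, 3))

# Ask for three groups; no labels, no human-defined features of interest
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(people)

# Each person is assigned to one of three clusters the computer discovered
print(kmeans.labels_[:10])      # e.g. [2 0 1 0 ...]
print(kmeans.cluster_centers_)  # the "average person" of each group
```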

Over time, compute power and memory increased drastically, in parallel with researchers and computer scientists developing algorithms that utilized that compute more efficiently, completing the work more quickly.

Enter Large Language Models, or “Foundation Models” and Generative AI.

Imagine, instead of feeding a computer data about 100 people, the computer ingests a library of books from a local municipality. Nah, let’s strive bigger: all libraries across the country. Nah, even larger: the entire internet! While exaggerated for illustrative purposes, the point stands. Large Language Models are built by “reading” a massive amount of information from the internet, say the entire set of Wikipedia entries, and start to form relationships among words, similar to the n-grams described earlier. The technique is called vectorization, and it is outside the scope of this blog post; rather, I hope readers understand that it simply takes words and represents them in a manner a computer can ingest.
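
As a toy illustration of that representation, the sketch below compares word vectors with cosine similarity; the four-dimensional vectors here are made up by hand, whereas a real model learns hundreds of dimensions from massive corpora.

```python
import numpy as np

# Made-up vectors; nearby directions stand in for related meanings
vectors = {
    "dog":  np.array([0.90, 0.10, 0.80, 0.20]),
    "cat":  np.array([0.85, 0.15, 0.75, 0.25]),
    "moon": np.array([0.10, 0.90, 0.20, 0.80]),
}

def cosine_similarity(a, b):
    """Close to 1.0 means similar direction; near 0 means unrelated."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(vectors["dog"], vectors["cat"]))   # high: related
print(cosine_similarity(vectors["dog"], vectors["moon"]))  # lower: unrelated
```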

Using an architecture called the “transformer,” built on attention mechanisms, in conjunction with the boom of cloud computing and the creation and deployment of Graphics Processing Units (GPUs), Generative AI emerged into existence.

Effectively, humans developed a method of unsupervised machine learning that allows computers to “understand” natural language. When ingesting trillions of sentences, the computer can naturally “interpret” and draw connections among words.

While I will be oversimplifying here and not getting into the details and nuances of Generative AI, the point remains the same:

IT’S ALL PROBABILITY AND STATISTICS!

The computer might have some inkling of the concept of a “chair,” for example, and can define it. On a larger scale, the model can generate a cover letter for a potential applicant because, within its training data, it has seen a number of cover letters and can use probability and statistics to predict future words in a similar format, its underlying model being so vast.
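
Stripped to its core, that generation step is repeated sampling of the next word from a probability distribution. Here is a toy sketch with invented probabilities for the word following “Dear” in a cover letter:

```python
import random

# Invented probabilities for the next word after "Dear"
next_word_probs = {
    "Hiring": 0.60,
    "Sir": 0.20,
    "Ms.": 0.15,
    "Madam": 0.05,
}

# Generation is just repeated weighted sampling from distributions like this
words, weights = zip(*next_word_probs.items())
print(random.choices(words, weights=weights, k=1)[0])  # usually "Hiring"
```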

Yes, there is a level of allure and magic happening under the hood, mathematically and computationally, that is extremely fascinating; and yes, those not privy to the inner workings of Artificial Intelligence can view this as scary. However, we need to root ourselves in this process of probability and statistics.

Generative AI and other Artificial Intelligence systems, like all technologies, will displace some jobs; however, they will also create new ones. Moreover, AI will augment human intelligence and many knowledge-worker jobs, so long as those workers are equipped to partner with AI systems. Continue to use Generative AI as a tool to aid in solving the Blank Canvas Problem, and iterate thereafter. Humans should always be in control of the final output and should not blindly rely on the output of a generative model. After all, the computer understands the “concept” of a mortgage pre-approval letter, but not to the level of an expert with finesse and craft.

I am a massive advocate for Artificial Intelligence and am in awe of the space and its development. My goal is to convince you that AI is a tool, not a replacement, and while “scary” at times, it’s simply probability and statistics! Thank you for reading!

Sam Bobo
Speaking Artificially

Product Manager of Artificial Intelligence, Conversational AI, and Enterprise Transformation | Former IBM Watson | https://www.linkedin.com/in/sambobo/