Large Language Model Mathematics Explained Through a Road Trip

Thierry Damiba
Dec 12, 2023

--

[Image: a metaphorical cross-country journey from New York City to Los Angeles, representing the mathematics of LLMs]

Understanding the mathematics of Large Language Models (LLMs) can be likened to planning a cross-country journey from New York City to Los Angeles. We can apply this analogy to key mathematical concepts, transforming them into familiar ideas.

On my first day exploring LLMs, I concentrated on their mathematical foundations: Linear Algebra, Statistics, Probability, and Calculus. These key mathematical concepts interweave seamlessly and build on one another. Learning the math of LLMs is like learning letters before words when you first learn to read and write. We begin with Linear Algebra, representing data as vectors and lines, and end with Calculus, calculating the area beneath curves.

Rather than just memorizing formulas and symbols, I prefer to conceptualize these mathematical principles through familiar analogies. I liken the essential mathematical concepts to planning a cross-country road trip. Adopting this approach offers a more instinctive grasp of the fundamental ideas underpinning the mathematics of LLMs.

Linear Algebra: Charting the Course

Linear algebra in LLMs can be likened to plotting a journey on a map. Consider the map as a two-dimensional plane, with the East-West axis as the x-axis and the North-South axis as the y-axis. In linear algebra, "i-hat" and "j-hat" are the unit vectors for horizontal and vertical movement, like traveling one mile east or one mile north.

Using New York City as a starting point in our analogy, linear algebra enables us to draw a line to Los Angeles. We can also draw a line to any city in the world, symbolizing paths or connections. In LLMs, linear algebra illustrates how data travels through a neural network. Each layer of the network acts like a city stop, altering the data as it moves through, similar to how a route changes across different cities. Just as a trip from NYC to LA involves stopping at numerous cities, data passes through many network layers before yielding the final output.
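To make the "city stops" idea concrete, here is a minimal sketch in plain Python (no libraries, with invented numbers) of a data vector passing through two layers, each represented as a matrix. The layer names and values are purely illustrative:

```python
# A toy "trip" for a data vector: each layer (city stop) is a matrix
# that transforms the vector as it passes through.

def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector (a list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Starting point: a 2-D data vector, like coordinates on our map.
data = [1.0, 2.0]

# Two hypothetical "city stops": each 2x2 matrix reshapes the data.
layer_nyc_to_chicago = [[0.5, 1.0],
                        [1.0, 0.0]]
layer_chicago_to_la = [[2.0, 0.0],
                       [0.0, 2.0]]

# The trip: the data passes through each layer in turn.
after_chicago = matvec(layer_nyc_to_chicago, data)     # [2.5, 1.0]
after_la = matvec(layer_chicago_to_la, after_chicago)  # [5.0, 2.0]
print(after_la)
```

Real networks also apply a nonlinear function between stops, but the core movement of data through layers is exactly this kind of repeated matrix-vector multiplication.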

Probability and Statistics: Anticipating Variables

On a road trip, we must anticipate variables such as traffic, weather, and bathroom breaks. A traveler might estimate a higher likelihood of encountering traffic during rush hour on a major highway. Probability and statistics in LLMs involve anticipating and accounting for the variability and uncertainty in language. These models use probabilistic methods to predict the likelihood of certain words or phrases occurring in a given context.

In language models, "knob" could indicate a door handle in one context and a difficult person in another. If the word "turn" precedes "knob" in the text, the model can assign a higher likelihood that "knob" refers to a door handle.
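A tiny bigram model shows the mechanics of this kind of prediction. The sketch below (with a made-up three-sentence corpus) counts word pairs and estimates the probability of a word given the one before it:

```python
# A toy bigram model: count word pairs in a tiny invented corpus, then
# estimate P(next word | previous word) from the counts.
from collections import Counter

corpus = ("turn the knob to open the door . "
          "he is a knob sometimes . "
          "turn the knob gently .").split()

# Count how often each word follows each preceding word.
pair_counts = Counter(zip(corpus, corpus[1:]))
prev_counts = Counter(corpus[:-1])

def prob(prev, word):
    """Estimate P(word | prev) from the bigram counts."""
    return pair_counts[(prev, word)] / prev_counts[prev]

# In this corpus, "knob" follows "the" in 2 of the 3 occurrences of "the".
print(prob("the", "knob"))  # 0.666...
```

Modern LLMs replace raw counts with learned neural probabilities over far longer contexts, but the underlying question is the same: given what came before, how likely is each possible next word?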

Calculus: Fuel Management Strategy

Calculus is akin to planning our fuel stops efficiently. It’s about calculating when and how often to refuel — a task that requires understanding rates of change and optimization. How far are we going, how much fuel capacity does our car have, and how much of our drive will be in cities vs highways?

In LLMs, calculus drives optimization: during training, derivatives tell the model how much to adjust its parameters at each step, much like deciding when and how much to refuel. If we don't stop enough, we risk running out of gas on the road and having to call AAA to bail us out. If we stop too often, the trip takes far longer than it should. Tuning this process carefully helps avoid both underfitting (not learning enough from the data) and overfitting (memorizing limited data instead of generalizing).
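The refueling trade-off maps directly onto the learning rate in gradient descent. Here is a minimal sketch (a hand-written example on a simple bowl-shaped loss, not the author's method) showing that a moderate step size converges while an oversized one overshoots:

```python
# Gradient descent on the simple loss (x - 3)**2, whose minimum is at x = 3.
# The learning rate is like the distance between fuel stops: too small and
# the trip takes forever, too large and we blow past every gas station.

def gradient(x):
    """Derivative of (x - 3)**2 with respect to x."""
    return 2 * (x - 3)

def descend(start, learning_rate, steps):
    x = start
    for _ in range(steps):
        x = x - learning_rate * gradient(x)
    return x

# A moderate learning rate converges toward the minimum at x = 3.
print(descend(0.0, 0.1, 100))

# An overly large learning rate overshoots farther on every step
# and diverges instead of settling at the minimum.
print(descend(0.0, 1.5, 10))
```

Training an actual LLM applies this same idea across billions of parameters at once, with the gradients computed automatically by backpropagation.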

A Journey Through Data Landscapes

The road trip analogy underscores the multifaceted nature of the math behind LLMs. It's not a straight-line journey; it involves navigating complex mathematical landscapes, where linear algebra lays out the path, probability and statistics handle the uncertainties, and calculus ensures optimal progression.

--


Thierry Damiba

Learning in Public about LLMs on modest hardware & Investigating how LLMs help and hurt emerging markets