How Tensors Advance Human Technology
An exploration of mathematics past and present
What is a tensor?
Simply put, a tensor is a multi-dimensional array of numbers, a generalization of vectors and matrices. More generally, tensor analysis is a set of mathematical tools used to quantify and model a diversity of systems in physics, psychology, evolutionary biology, and modern artificial intelligence.
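To make the "array of numbers" picture concrete, here is a minimal sketch that represents tensors of increasing order as nested Python lists. The helper name `tensor_shape` is made up for this example and assumes non-empty, rectangular nesting.

```python
def tensor_shape(t):
    """Return the shape of a rectangular, non-empty nested list as a tuple."""
    shape = []
    while isinstance(t, list):
        shape.append(len(t))
        t = t[0]  # descend one level of nesting
    return tuple(shape)

scalar = 5.0                                   # order 0: a single number
vector = [1.0, 2.0, 3.0]                       # order 1: a list of numbers
matrix = [[1.0, 0.0], [0.0, 1.0]]              # order 2: rows and columns
cube = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]  # order 3

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    shape = tensor_shape(t)
    print(name, "order:", len(shape), "shape:", shape)
```

The order of a tensor is simply how many indices you need to pick out one number.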
Like many, I was curious to understand more about what tensors have to do with machine learning.
How are tensors used in modern approaches to artificial intelligence and machine learning?
Brief History of Tensor Analysis
I first came across tensors in the form of continuum mechanics while studying engineering.
Let me tell you a story about mathematics: an exploration of the meaning and origins of tensors.
This story covers some of the fascinating applications of tensor analysis, ones that have helped shape and revolutionize humanity’s technological progress.
In the field of solid mechanics, tensors are used to quantify forces throughout a material.
Consider, for instance, a piece of cylindrical chalk used to write on blackboards in school. Imagine pulling, bending, and twisting this solid and predicting when and how it would break.
Applying a combination of external forces on a simple object (like chalk) creates an array of stresses throughout the material.
At each point within the object, these stresses can be described by a set of 9 stress components.
Arranged as a 3-by-3 array, these components form a tensor known as the Cauchy stress tensor, which models the stresses at every point throughout a solid.
This type of analysis yields some curious consequences.
For instance, it turns out that if you twist a piece of chalk (apply a purely torsional force) until it breaks, it will consistently snap along a 45-degree angle.
Bending the chalk, by contrast, will break it along a 90-degree angle.
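The two break angles can be sketched with a small calculation. This assumes chalk is brittle and cracks perpendicular to the direction of largest tension, which is a common simplification rather than something derived here. It also looks only at the 2-by-2 in-plane part of the stress tensor at a point on the chalk's surface. The function name `principal_tension` and the unit stress values are hypothetical.

```python
import math

def principal_tension(sigma_x, sigma_y, tau_xy):
    """Largest principal stress and its direction in degrees from the x-axis.

    Uses the in-plane stress tensor [[sigma_x, tau_xy], [tau_xy, sigma_y]],
    with the x-axis running along the length of the chalk.
    """
    mean = (sigma_x + sigma_y) / 2.0
    radius = math.hypot((sigma_x - sigma_y) / 2.0, tau_xy)
    angle = 0.5 * math.degrees(math.atan2(2.0 * tau_xy, sigma_x - sigma_y))
    return mean + radius, angle

# Twisting: pure shear on the surface, no normal stress.
twist_stress, twist_angle = principal_tension(0.0, 0.0, 1.0)
# Bending: the stretched side sees tension along the axis only.
bend_stress, bend_angle = principal_tension(1.0, 0.0, 0.0)

# Under the brittle-fracture assumption, the crack runs perpendicular
# to the direction of largest tension.
twist_crack = 90.0 - abs(twist_angle)
bend_crack = 90.0 - abs(bend_angle)
print("twist: crack angle to the axis =", twist_crack)
print("bend:  crack angle to the axis =", bend_crack)
```

Under these assumptions the twist case gives a crack at about 45 degrees to the chalk's axis, and the bending case gives a crack straight across it.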
Materials such as chalk and metals often break predictably depending on:
- the organization of their molecules, and
- the (external and internal) forces applied to the object
Stresses, like those in the chalk example, can be modeled by tensors in an area of mathematics called continuum mechanics.
Understanding when things will break helps us design things that won’t break unexpectedly.
Advancements in Technology
The development of mathematics to model physical forces has allowed humans to make tremendous leaps in technological progress.
Consider that for most of civilization, the tallest structures created by humans were piles of stones: pyramids and eventually cathedrals.
The tensor mathematics used to model stresses in materials such as steel was developed in the 1820s by Augustin-Louis Cauchy. These tensor equations helped advance humanity’s engineering from tall piles of stones to the Eiffel Tower, and far beyond.
In fact, for his contributions, Cauchy’s name is one of the seventy-two names of scientists, engineers, and mathematicians engraved near the base of the Eiffel Tower.
The consequences are staggering when you consider that every time you drive over a large bridge or ride a tall elevator, your life relies (at least in some small way) on these equations of mathematical physics and continuum mechanics. Engineering design calculations apply principles of tensors to ensure that the roof doesn’t fall in on our heads.
Theory of Relativity
After the technological progress of the nineteenth century, another mathematician, Hermann Minkowski, took this direction of mathematics further by describing Einstein’s theory of Special Relativity in terms of tensors on a four-dimensional manifold known as spacetime.
In General Relativity, the Riemann curvature tensor associates a tensor with each point in spacetime, describing how spacetime bends due to gravity.
Tensors also appear in the study of electromagnetism — Minkowski simplified Maxwell’s equations (four vector calculus equations) into two tensor field equations.
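For reference, one common way of writing those two tensor equations is shown below. The notation is an assumption on my part, since sign and unit conventions vary between textbooks: here $F^{\mu\nu}$ is the electromagnetic field tensor, $J^\nu$ is the four-current, and SI units are used.

```latex
% Equations with sources: Gauss's law and the Ampere-Maxwell law
\partial_\mu F^{\mu\nu} = \mu_0 J^\nu

% Source-free equations: Gauss's law for magnetism and Faraday's law
\partial_\alpha F_{\beta\gamma} + \partial_\beta F_{\gamma\alpha} + \partial_\gamma F_{\alpha\beta} = 0
```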
There are plenty of other examples of tensor mathematics, ranging from fluid mechanics to quantum mechanics.
It’s fascinating to think that tensors appear in both the mathematics used to model spacetime (time-dilation due to gravity) as well as electromagnetism. This has laid the foundations for further leaps in technology, which have allowed humans to create AM/FM radio, satellite communications (such as GPS) and hopefully (one day) interstellar space travel.
So what do tensors mean in the context of AI and machine learning, particularly deep learning?
Tensor mathematics comes up in a class of machine learning models that involve hidden variables.
In these models, the latent (hidden) state of the data cannot be observed directly; instead, its effects are observed indirectly in correlated variables.
For instance, let’s consider Gaussian mixture models — where random samples are drawn from several normal distributions (or bell curves) with unknown parameters. The challenge becomes estimating the parameters of the probability distributions that generated the data samples.
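As a rough illustration of how such data arises, the sketch below draws samples from a mixture of two normal distributions. The function name `sample_mixture` and all parameter values are made up for this example. The hidden variable is the component index `k`, which is used to draw each sample and then thrown away.

```python
import random

def sample_mixture(n, weights, means, stds, seed=0):
    """Draw n samples from a 1-D Gaussian mixture (illustrative parameters)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Hidden step: pick which bell curve generates this sample.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        # Observed step: only the drawn value is recorded, not k.
        samples.append(rng.gauss(means[k], stds[k]))
    return samples

data = sample_mixture(5000, weights=[0.4, 0.6], means=[-2.0, 3.0], stds=[1.0, 0.5])
sample_mean = sum(data) / len(data)
print("number of samples:", len(data))
print("sample mean:", round(sample_mean, 3))
```

The estimation problem is the reverse direction: given only `data`, recover the weights, means, and standard deviations.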
Traditionally we would solve for these unknown parameters using iterative approaches such as the expectation–maximization algorithm. Taking a random initial estimate of the parameters, the algorithm would iterate repeatedly to converge on a solution.
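Here is a minimal sketch of that iterative idea for a mixture of two one-dimensional Gaussians. It is an illustration under simplifying assumptions, not a production implementation: the data is synthetic, the names are hypothetical, and instead of a random starting point it starts from the smallest and largest samples so that the run is reproducible.

```python
import math
import random

def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def em_two_gaussians(data, iterations=50):
    """Expectation-maximization for a two-component 1-D Gaussian mixture."""
    means = [min(data), max(data)]  # crude, deterministic initial guess
    stds = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: how responsible is each component for each sample?
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], stds[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: re-estimate parameters from the weighted samples.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            stds[k] = max(math.sqrt(var), 1e-3)  # guard against collapse
    return weights, means, stds

# Synthetic data: 40% from N(-2, 1) and 60% from N(3, 1).
rng = random.Random(0)
data = [rng.gauss(-2.0, 1.0) if rng.random() < 0.4 else rng.gauss(3.0, 1.0)
        for _ in range(2000)]

weights, means, stds = em_two_gaussians(data)
print("weights:", [round(w, 2) for w in weights])
print("means:  ", [round(m, 2) for m in means])
print("stds:   ", [round(s, 2) for s in stds])
```

Every iteration touches every sample for every component, which hints at why the cost grows quickly as models get larger.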
However, for problems with many contributing factors, these iterative techniques become prohibitively expensive to compute.
Enter tensor decomposition.
The modern approach to estimating these unknown parameters, known as the method of moments, was originally developed by a British biostatistician, Karl Pearson, who outlined the technique in his 1894 paper, Contributions to the Mathematical Theory of Evolution.
Pearson recognized that certain processes in nature were the result of multiple random factors. He remarks that the statistical distributions of various “biological, sociological, and economic measurements” are not the result of a single Gaussian; rather, “It may happen that we have a mixture of 2,3,…,n homogeneous groups, each of which deviates about its own mean symmetrically and in a manner represented with sufficient accuracy by the normal curve.”
What does this have to do with artificial intelligence? It turns out that Pearson was on the right path over 120 years ago. Solving for unknown statistical parameters using the method of moments approach has led to breakthroughs in computation of various models in AI research.
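To give a flavor of the method of moments, here is a deliberately simplified sketch. It is not Pearson's original calculation, which had more unknowns and is considerably harder. It assumes a mixture of two Gaussians with equal weights and a known variance of 1, so only the two means are unknown. The function name `moments_two_means` is hypothetical.

```python
import math
import random

def moments_two_means(data):
    """Recover two means from the first two sample moments.

    Assumes an equal-weight mixture of N(mu_a, 1) and N(mu_b, 1), so that
      E[x]   = (mu_a + mu_b) / 2
      E[x^2] = 1 + (mu_a^2 + mu_b^2) / 2
    """
    n = len(data)
    m1 = sum(data) / n
    m2 = sum(x * x for x in data) / n
    total = 2.0 * m1                           # mu_a + mu_b
    squares = 2.0 * (m2 - 1.0)                 # mu_a^2 + mu_b^2
    product = (total ** 2 - squares) / 2.0     # mu_a * mu_b
    # mu_a and mu_b are the roots of t^2 - total * t + product = 0.
    disc = max(total ** 2 - 4.0 * product, 0.0)
    root = math.sqrt(disc)
    return (total - root) / 2.0, (total + root) / 2.0

rng = random.Random(1)
data = [rng.gauss(rng.choice([-2.0, 3.0]), 1.0) for _ in range(20000)]

low, high = moments_two_means(data)
print("estimated means:", round(low, 2), round(high, 2))
```

Notice there is no loop over guesses: the data is summarized once into a few moments, and the parameters come from solving equations in those moments.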
A key research paper on the topic, Tensor Decompositions for Learning Latent Variable Models, was published in 2014. One of the authors, Anandkumar, wrote the following in a discussion on Quora:
The key idea is to consider the tensors which are derived from multivariate moments of the observed data.
(Think of multivariate moments as a statistical fingerprint that helps untangle how the random data was created, and thereby estimate the hidden variables.)
We show that decomposing these tensors allows us to consistently learn the parameters of a wide range of latent variable models …
It is crucial to use higher order moments (and therefore tensor analysis), since it can be shown that for many of these models, just pairwise moments (and therefore matrix analysis) is not sufficient to learn the model parameters.
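The difference between pairwise and higher-order moments can be sketched directly. In the example below, the second moments of 3-dimensional data form a 3-by-3 matrix, while the third moments form a 3-by-3-by-3 tensor. The function names and the random data are assumptions made for illustration only.

```python
import random

def second_moment(data):
    """Matrix of pairwise moments: M2[i][j] = average of x_i * x_j."""
    n, d = len(data), len(data[0])
    return [[sum(x[i] * x[j] for x in data) / n
             for j in range(d)] for i in range(d)]

def third_moment(data):
    """Order-3 tensor of moments: M3[i][j][k] = average of x_i * x_j * x_k."""
    n, d = len(data), len(data[0])
    return [[[sum(x[i] * x[j] * x[k] for x in data) / n
              for k in range(d)] for j in range(d)] for i in range(d)]

rng = random.Random(0)
data = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(500)]

M2 = second_moment(data)
M3 = third_moment(data)
print("entries in the moment matrix:", len(M2) * len(M2[0]))
print("entries in the moment tensor:", len(M3) * len(M3[0]) * len(M3[0][0]))
# The tensor is symmetric: reordering the indices gives the same entry,
# up to floating-point rounding.
print(M3[0][1][2], M3[2][1][0])
```

The extra index is what gives the third-order moments room to carry information that the matrix cannot.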
Standard iterative approaches, such as the expectation–maximization algorithm mentioned earlier, tend to be computationally prohibitive. The breakthrough of these statistical tensor analysis techniques lies in their computational efficiency.
The process of estimating model parameters is reduced to a problem of extracting a decomposition of a symmetric tensor derived from the moments. Furthermore, these decomposition problems tend to be amenable to efficient methods, such as gradient descent and the power iteration method.
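Here is a minimal sketch of the power iteration idea applied to a small symmetric tensor. It assumes the tensor is built exactly from three orthonormal vectors with positive weights, which is an idealized setup; tensors estimated from real data are noisy and typically need preprocessing first. All names are hypothetical, and a fixed starting vector stands in for the random restarts a practical version would use.

```python
import math

def build_tensor(lams, vecs):
    """T = sum over components of lam * (v outer v outer v)."""
    d = len(vecs[0])
    T = [[[0.0] * d for _ in range(d)] for _ in range(d)]
    for lam, v in zip(lams, vecs):
        for i in range(d):
            for j in range(d):
                for k in range(d):
                    T[i][j][k] += lam * v[i] * v[j] * v[k]
    return T

def contract(T, u):
    """Apply the tensor to u twice, leaving one free index: T(I, u, u)."""
    d = len(u)
    return [sum(T[i][j][k] * u[j] * u[k] for j in range(d) for k in range(d))
            for i in range(d)]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def power_iteration(T, start, steps=30):
    u = normalize(start)
    for _ in range(steps):
        u = normalize(contract(T, u))
    lam = sum(a * b for a, b in zip(u, contract(T, u)))  # T(u, u, u)
    return lam, u

def decompose(T, start, rank):
    """Extract components one at a time, subtracting each from the tensor."""
    d = len(start)
    components = []
    for _ in range(rank):
        lam, u = power_iteration(T, start)
        components.append((lam, u))
        # Deflation: remove the recovered component before the next pass.
        T = [[[T[i][j][k] - lam * u[i] * u[j] * u[k] for k in range(d)]
              for j in range(d)] for i in range(d)]
    return components

s = 1.0 / math.sqrt(2.0)
true_vecs = [[s, s, 0.0], [s, -s, 0.0], [0.0, 0.0, 1.0]]
T = build_tensor([3.0, 2.0, 1.0], true_vecs)

components = decompose(T, start=[1.0, 0.2, 0.1], rank=3)
for lam, u in components:
    print(round(lam, 4), [round(x, 4) for x in u])
```

Each pass uses nothing more than repeated multiplication and normalization, which is part of why these methods scale well.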
Consequently, these computationally efficient methods have led to tremendous advancements in the field of machine learning, enabling researchers to build much larger models in order to tackle far more complex problems.
Admittedly, current machine learning applications pale in comparison to the complexity of a single human brain. However, with the exponential pace of improvements in technology and advancements in algorithmic techniques, some believe that a technological singularity may occur within decades.
Recent years have already yielded exciting results in fields such as image recognition by detecting diabetic retinopathy in human eyes, brain computer interfacing (BCI) by identifying hand motions from EEG scans, and a plethora of applications in voice recognition, facial recognition, text analysis, and so forth.
Google Research had a remarkable result where an unsupervised learning algorithm, trained on still frames from unlabeled YouTube videos, learned the visual concept of a cat. They managed to scale their computation across 16,000 CPU cores to train a model with over 1 billion connections!
The exciting feature of these algorithms (beyond the ability to identify cats) is their scalability. Rather than having computation limited to a single sequential calculation, modern approaches allow the computation to be shared among many, many processors.
What applications will arrive with these mathematical and computational advancements? What are the limitations of tensor decomposition?
In many ways, this feels like an Eiffel Tower moment in history, where all past efforts to build large-scale AI have been the equivalent of stacking large piles of stones. Now we could be developing the tools to make the next technological leap forward.
Links & Additional Resources
- Tensor Decompositions for Learning Latent Variable Models (A. Anandkumar, R. Ge, D. Hsu, S.M. Kakade, M. Telgarsky) http://arxiv.org/abs/1210.7559
- Tensor Methods in Machine Learning (R. Ge)
- Pearson’s Polynomial (M. Hardt)
- Contributions to the Mathematical Theory of Evolution (K. Pearson)
- Tensors: Stress, Strain and Elasticity
- How is tensor analysis applied to machine learning (A. Anandkumar)
- Tight Bounds for Learning a Mixture of Two Gaussians (M. Hardt, E. Price) http://arxiv.org/abs/1404.4997
- Analyzing Tensor Power Method Dynamics in Overcomplete Regime (A. Anandkumar, R. Ge, M. Janzamin) https://arxiv.org/abs/1411.1488