Visualizing Higher Dimensions
This blog has offered some recent musings on the dynamics of high dimensions, particularly those related to the loss manifold of a neural network, which in modern practice may have parameter counts in the billions. Describing most high-dimensional geometries is still an open problem for mathematicians. In machine learning, navigating a loss manifold means following the gradient signal through a series of update steps (gradient descent, with the gradients computed by backpropagation). This only reveals the geometry along a curved one-dimensional path tracking toward a minimum, not anything in the aggregate. We have therefore looked elsewhere, trying to reason about meta properties of high-dimensional loss manifold geometry that can be inferred independently of fine-grained details.
Hyperspheres, also known as n-spheres, are something of a special case among high-dimensional geometries. There are established formulas for metrics like surface area and volume in arbitrary dimensions, and other properties, such as optimal packing densities, are gradually being settled in particular lower dimensions. Hypercubes are also better understood than most shapes, and relating the geometry of spheres to that of cubes can help illustrate some of the distinctions that arise. (A note on naming: in this essay an n-sphere is a sphere living in n-dimensional space, so a circle is a 2-sphere. Mathematicians usually index by the dimension of the surface itself, under which the same circle is a 1-sphere.)
The purpose of this essay is to survey some of the fundamentals of high-dimensional geometry known today, and we’ll see that the list of known knowns is still somewhat short. Hopefully this review will help a reader begin to build some intuition for what it means to talk about geometry in high dimensions. That intuition will likely be limited, given how many gaps in our descriptions remain. We’ll give extra attention to hyperspheres because of their known properties, but a reader should keep in mind that some properties of spheres may not generalize to other geometries, or vice versa. The n-sphere and n-cube are only a starting point.
Part 1 — Visualize
People are visual creatures, and when reasoning about high dimensions, one of the first impulses will likely be to imagine a visualization using metaphors from our shared three-dimensional surroundings. Unfortunately, some elements of higher dimensions just don’t translate directly to 3D. Consider the progression from a 0D point to a 1D line, a 2D square, and a 3D cube, and how each tier increases the number of vertices (1/2/4/8), edges (0/1/4/12), and faces (0/0/1/6). Some of these counts climb nonlinearly (the vertex count doubles with each added dimension), so any attempt to visualize the progression may quickly become lost in a maze of features. As used here, a vertex is a point where edges intersect, an edge is a line where planes intersect, and a face is a plane where 3-dimensional elements intersect. We can reasonably infer that a similar progression of lower-dimensional features emerges at higher dimensions. For example, a tesseract (a 4D hypercube) has a set of three-dimensional cubes playing the role that flat faces play for an ordinary cube (8 of them, in fact), and a 5-cube has a set of tesseract features (10 of them). However, since visualizing much beyond a tesseract is itself a challenge, that observation may not be helpful for our purposes.
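The vertex, edge, and face counts above follow a standard closed form. An n-cube has C(n, k)·2^(n−k) faces of dimension k: choose which k coordinates are free to vary, then pin each of the remaining n − k coordinates to one of its two values. The Python sketch below is our own illustration, and the helper name `cube_faces` is hypothetical. It reproduces the 1/2/4/8, 0/1/4/12, and 0/0/1/6 sequences, the tesseract’s 8 cubic cells, and the 5-cube’s 10 tesseract cells.

```python
from math import comb

def cube_faces(n: int, k: int) -> int:
    """Number of k-dimensional faces of an n-dimensional hypercube.

    Choose which k of the n coordinates are free to vary (comb(n, k) ways),
    then pin each of the other n - k coordinates to one of its two values.
    """
    if k > n:
        return 0  # a cube has no faces of higher dimension than itself
    return comb(n, k) * 2 ** (n - k)

for n in range(6):
    counts = [cube_faces(n, k) for k in range(4)]
    print(f"{n}-cube: vertices={counts[0]}, edges={counts[1]}, "
          f"2D faces={counts[2]}, 3D cells={counts[3]}")

print("tesseract cubic cells:", cube_faces(4, 3))
print("5-cube tesseract cells:", cube_faces(5, 4))
```

Because the vertex term 2^n doubles with every added dimension, the maze of features grows exponentially rather than merely nonlinearly.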
Another way to visualize hypercubes is to set aside their fixed geometric arrangement and treat them as graphs instead: a set of vertices and edges that can be laid out in any fashion, ignoring faces and other higher-dimensional elements. The Wolfram Language [Wolfram Research, 2022] has a helpful HypercubeGraph function that does exactly this. It lets us visualize at least the graph representation in higher dimensions, although, as we’ll see, such visualizations quickly get lost in a forest of features.
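Readers without a Wolfram license can build the same graph by hand. Label each vertex of the n-cube with an n-bit binary number, and connect two vertices whenever their labels differ in exactly one bit. This Python sketch is our own construction and is not the HypercubeGraph implementation. The helper name `hypercube_edges` is hypothetical, and the code only lists vertices and edges, leaving the layout to whatever plotting tool one prefers.

```python
def hypercube_edges(n: int) -> list[tuple[int, int]]:
    """Edges of the n-cube graph, with vertices labeled 0 .. 2**n - 1.

    Two vertices are adjacent exactly when their binary labels differ in one
    bit, i.e. when one label is the other with a single bit flipped.
    """
    edges = []
    for u in range(2 ** n):
        for bit in range(n):
            v = u ^ (1 << bit)  # flip one coordinate
            if u < v:  # record each undirected edge only once
                edges.append((u, v))
    return edges

for n in range(1, 11):
    print(n, "vertices:", 2 ** n, "edges:", len(hypercube_edges(n)))
```

At n = 10 there are already 1,024 vertices and 5,120 edges (n·2^(n−1) in general), which is why the drawings become a forest so quickly.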
Of course, mathematics can sidestep these visualization obstacles by treating high-dimensional points as tensors. That opens the door to the linear algebra operations between points that machine learning practice has gotten so good at. A matrix is a special case of a tensor (a rank-2 tensor). As we climb to higher dimensions, a geometry may sometimes have an inherent structure that suits representation with more exotic tensor shapes and aggregated operations (e.g. the parallel RGB channel planes of a convolutional network). Even so, we can always flatten the parameterization into a single large matrix, with each parameter occupying one entry. Any structure lost from the exotic tensor shapes is then realized instead as implicit constraints on degrees of freedom, or as joint dependencies among the surrounding variables. (If we want to retain those dependencies explicitly, tensor networks allow surrounding tensors to be contracted to a common matrix in a suitable manner.) A hypercube needs no exotic tensors; a matrix of point coordinates is sufficient.
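As a small illustration of the matrix framing, the 2^n vertices of an n-cube can be stacked as the rows of a 2^n × n matrix V. We choose entries of ±1 here, i.e. a cube of side 2 centered at the origin. The Gram matrix G = V·Vᵀ then gives every pairwise squared distance at once, through ‖u − v‖² = ‖u‖² + ‖v‖² − 2⟨u, v⟩. This is a sketch with hypothetical helper names, using plain Python lists rather than a linear algebra library so that it runs anywhere.

```python
import itertools
import math

def cube_vertex_matrix(n: int) -> list[list[int]]:
    """Rows are the 2**n vertices of the n-cube with coordinates in {-1, +1}."""
    return [list(row) for row in itertools.product((-1, 1), repeat=n)]

def gram(V: list[list[int]]) -> list[list[int]]:
    """G = V @ V.T, the matrix of inner products between vertices."""
    return [[sum(a * b for a, b in zip(u, v)) for v in V] for u in V]

def squared_distances(V: list[list[int]]) -> list[list[int]]:
    """Pairwise squared distances from the Gram matrix: |u|^2 + |v|^2 - 2<u, v>."""
    G = gram(V)
    return [[G[i][i] + G[j][j] - 2 * G[i][j] for j in range(len(V))]
            for i in range(len(V))]

for n in (2, 3, 4, 8):
    D = squared_distances(cube_vertex_matrix(n))
    distinct = sorted({math.sqrt(d) for row in D for d in row if d > 0})
    print(f"n={n}: shortest (edge) = {distinct[0]:.3f}, "
          f"longest (diagonal) = {distinct[-1]:.3f}")
```

Edges keep length 2 in every dimension, while the long diagonal grows as 2√n. This is one simple way the matrix view exposes how a hypercube stretches out toward its corners.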
The matrix representation can also be useful for the graph framing. A hypercube graph can be converted to an “adjacency matrix”: a sparse matrix with one row and one column per vertex, whose entry is 1 when the two vertices share an edge and 0 otherwise.
On its own, an adjacency matrix can be difficult to interpret. Translating it further into a matrix plot may help identify trends. One interesting property that becomes apparent from [Fig 4] is that the number of neighbors of each vertex climbs linearly with dimension (each vertex of an n-cube has exactly n neighbors). Further, the pattern of interconnections has its own form of emergence, with each dimension’s pattern extending the one before it.
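The adjacency matrix and a crude text “matrix plot” take only a few lines of Python. In this sketch of our own, with hypothetical helper names, vertices are labeled 0 … 2^n − 1 by their binary coordinates, so entry (u, v) is 1 exactly when u and v differ in one bit. The code also checks two of the trends visible in such plots. First, every row sums to n. Second, the matrix for dimension n can be built from two copies of the matrix for n − 1, joined by identity blocks in the pattern [[A, I], [I, A]].

```python
def adjacency_matrix(n: int) -> list[list[int]]:
    """Adjacency matrix of the n-cube graph; vertices are labeled 0 .. 2**n - 1."""
    size = 2 ** n
    # u ^ v has exactly one set bit when u and v differ in one coordinate.
    return [[1 if bin(u ^ v).count("1") == 1 else 0 for v in range(size)]
            for u in range(size)]

def block_recursive(n: int) -> list[list[int]]:
    """Build the same matrix recursively: A_n = [[A_{n-1}, I], [I, A_{n-1}]]."""
    if n == 0:
        return [[0]]  # a single vertex with no edges
    prev = block_recursive(n - 1)
    half = len(prev)
    ident = [[1 if i == j else 0 for j in range(half)] for i in range(half)]
    top = [prev[i] + ident[i] for i in range(half)]
    bottom = [ident[i] + prev[i] for i in range(half)]
    return top + bottom

def text_plot(A: list[list[int]]) -> None:
    """Print '#' for adjacent pairs and '.' otherwise, a poor man's matrix plot."""
    for row in A:
        print("".join("#" if x else "." for x in row))

A3 = adjacency_matrix(3)
text_plot(A3)
print("row sums:", {sum(row) for row in A3})
print("recursive form matches:", A3 == block_recursive(3))
```

The recursive form is one concrete reading of the “emergence” in the plots: each added dimension copies the previous pattern twice and wires the copies together vertex by vertex.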
Most efforts to accurately visualize the full geometry of the hypercube have therefore limited their focus to climbing a single rung on the dimensional ladder, inspecting the tesseract as an extension into the 4th dimension. By limiting our focus to this one added dimension, we can project slices down to a lower dimension [Noll, 1967], a bit like how you cast a shadow onto the sidewalk when you walk outside in the afternoon. This kind of projection can retain a surprising amount of detail as 3D slices of a 4D configuration, rotating through 4D space, are projected down to a 2D image. It is probably most helpful as a sequence of frames through measured rotations, or better yet as a movie (e.g. [Leios Labs, 2016]) with the time axis matched to progress around a rotation.
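A minimal version of such a projection pipeline can be sketched in Python. It rotates the tesseract’s 16 vertices within the x–w plane, then applies a perspective division twice, 4D → 3D → 2D. This is a generic perspective projection chosen for illustration, not Noll’s exact procedure. The viewer distances d = 3 and d = 5 are arbitrary assumptions, picked only to exceed any coordinate being divided out, and the helper names are hypothetical. At zero rotation, the w = +1 and w = −1 halves of the tesseract land as a larger and a smaller cube, giving the familiar cube-within-a-cube picture.

```python
import itertools
import math

def tesseract_vertices() -> list[tuple[float, ...]]:
    """The 16 vertices of a tesseract with coordinates in {-1, +1}^4."""
    return list(itertools.product((-1.0, 1.0), repeat=4))

def rotate_xw(p, theta: float):
    """Rotate a 4D point by angle theta within the x-w plane (y and z unchanged)."""
    x, y, z, w = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * w, y, z, s * x + c * w)

def perspective(p, d: float):
    """Drop the last coordinate by perspective division.

    The viewer sits at distance d along the dropped axis, so points nearer the
    viewer (larger last coordinate) are drawn larger. Requires d > last coordinate.
    """
    *rest, last = p
    scale = d / (d - last)
    return tuple(scale * c for c in rest)

def frame(theta: float) -> list[tuple[float, ...]]:
    """2D image of the rotated tesseract: 4D -> 3D -> 2D."""
    pts3 = [perspective(rotate_xw(v, theta), d=3.0) for v in tesseract_vertices()]
    return [perspective(p, d=5.0) for p in pts3]

# At theta = 0 the w = +1 cell lands on a cube of half-width 1.5 and the
# w = -1 cell on a cube of half-width 0.75: the familiar cube within a cube.
sizes = sorted({max(abs(c) for c in perspective(v, d=3.0)) for v in tesseract_vertices()})
print(sizes)  # [0.75, 1.5]

# A "movie": 32 frames covering one full turn in the x-w plane, ready to plot.
movie = [frame(step * 2 * math.pi / 32) for step in range(32)]
print(len(movie), "frames of", len(movie[0]), "projected 2D points")
```

Plotting each frame’s points, joined along the tesseract’s edges, gives the kind of rotating animation referenced above.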
Disappointingly, we have not found the natural extension to be very interpretable when presented in a similar fashion: projecting slices of a 5D hypercube (a cube-within-a-cube within a cube-within-a-cube) through 4D → 3D → 2D.
Armed with this background in projection, we can now turn from hypercubes to hyperspheres of dimension greater than 3, again focusing on the 4-dimensional case for interpretability. Here we’re handicapped, though, because a sphere has no vertices or edges to anchor the image. One workaround specific to the hypersphere in 4D is the Hopf fibration [Hopf, 1931]. (That hypersphere is a 4-sphere in this essay’s naming, though mathematicians call it the 3-sphere, after the dimension of its surface.) The fibration maps the 4-sphere onto an ordinary 3D sphere so that each point on the ordinary sphere corresponds to an entire circle, or “fiber,” on the 4-sphere. After a stereographic projection into 3D space, the 4-sphere can be drawn as a collection of these circles filling out space. Here is a set of points on a 3D sphere [Fig 6] adjacent to the corresponding fibers of the 4-sphere [Fig 7], each produced with the Hopf library [Walczyk, 2022]. Clearly hyperspheres have non-trivial geometries.
Perhaps a more illuminating visualization aid for the 4-sphere that we found in our explorations was an animation [Belmonte, 2022]. It is built on the Hopf map, the map underlying the fibration, which relates single points on an ordinary 3D sphere to circles on the corresponding 4-sphere. The animation starts with points along the sphere’s equator and then rotates the inspection points around various axes. This reveals torus-like aspects of the shape contributed by the 4th dimension [Fig 8]: the fibers above any circle of latitude together sweep out a torus.
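The Hopf map itself has a compact formula. Treat 4D space as pairs of complex numbers. A point (z1, z2) on the 4-sphere, with |z1|² + |z2|² = 1, is then sent to (2·Re(z1·z̄2), 2·Im(z1·z̄2), |z1|² − |z2|²) on the ordinary sphere. Multiplying both coordinates by the same unit complex number e^(it) leaves that image unchanged, which is exactly why each point of the ordinary sphere pulls back to a whole circle. The Python sketch below is our own illustration of this standard formula, not code from the Hopf library or the Belmonte animation, and the helper names `hopf` and `fiber` are hypothetical.

```python
import cmath
import math

def hopf(z1: complex, z2: complex) -> tuple[float, float, float]:
    """Hopf map: unit sphere in C^2 (i.e. 4D space) -> unit sphere in 3D space."""
    w = z1 * z2.conjugate()
    return (2 * w.real, 2 * w.imag, abs(z1) ** 2 - abs(z2) ** 2)

def fiber(p: tuple[float, float, float], samples: int = 12):
    """Sample the circle of points in C^2 that hopf() sends to the unit vector p."""
    x, y, z = p
    if math.isclose(z, -1.0):
        z1, z2 = 0j, 1 + 0j  # south pole: its fiber is the circle with z1 = 0
    else:
        r = math.sqrt((1 + z) / 2)  # chosen so that |z1|^2 = (1 + z) / 2
        z1, z2 = complex(r, 0), complex(x, -y) / (2 * r)
    # Rotating both coordinates by the same phase traces out the fiber.
    return [(cmath.exp(1j * t) * z1, cmath.exp(1j * t) * z2)
            for t in (2 * math.pi * k / samples for k in range(samples))]

p = (0.6, 0.0, 0.8)
for z1, z2 in fiber(p)[:3]:
    print([round(c, 6) for c in hopf(z1, z2)], round(abs(z1) ** 2 + abs(z2) ** 2, 6))
```

Points on a circle of latitude (fixed z) give fibers that all share the same |z1| = √((1 + z)/2). Those fibers therefore lie together on a torus inside the 4-sphere, which matches the torus-like structure in the animation.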
These demonstrations of hypercubes and hyperspheres reveal distinctions beyond trading vertices and edges for curves. As we’ve previously noted, the unit hypersphere’s volume and surface area both converge toward zero at high dimensions. This is not the case for the unit hypercube. Its volume remains 1 in every dimension (1^n = 1), while its surface area, made up of 2n unit facets, grows linearly as 2n, with no peak comparable to the hypersphere’s.
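The volume and surface-area behavior can be checked numerically. A radius-r ball in n dimensions has volume V_n(r) = π^(n/2)·r^n / Γ(n/2 + 1), and its surface area is the derivative with respect to r, n·V_n(r)/r. The unit cube has volume 1^n = 1 and surface area 2n, since it has 2n facets that are each unit (n − 1)-cubes. This sketch uses only the Python standard library, and the helper names are our own.

```python
import math

def ball_volume(n: int, r: float = 1.0) -> float:
    """Volume of the n-dimensional ball of radius r."""
    return math.pi ** (n / 2) * r ** n / math.gamma(n / 2 + 1)

def sphere_area(n: int, r: float = 1.0) -> float:
    """Surface area of that ball's boundary: d/dr of the volume, n * V_n(r) / r."""
    return n * ball_volume(n, r) / r

def cube_area(n: int) -> float:
    """Surface area of the unit n-cube: 2n facets, each a unit (n-1)-cube."""
    return 2.0 * n

dims = range(1, 41)
print("unit-ball volume peaks at n =", max(dims, key=ball_volume))
print("unit-sphere area peaks at n =", max(dims, key=sphere_area))
for n in (1, 2, 3, 5, 7, 10, 20, 40):
    print(f"n={n:2d}  ball V={ball_volume(n):.4g}  sphere A={sphere_area(n):.4g}  "
          f"cube V=1  cube A={cube_area(n):g}")
```

By these formulas, the unit ball’s volume peaks at n = 5 and its surface area at n = 7, after which both decay toward zero. The unit cube’s surface area simply keeps growing linearly.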
The lectures of [Hamming, 1996] noted that for hyperspheres in high dimensions, the majority of the volume lies near the surface of the shape. If we are drawing parallels between a neural network loss manifold and a high-dimensional hypersphere, this suggests that a gradient update step will likely tend to adjust a majority, or at least a sizable portion, of the weights collectively rather than singular weights in isolation. The volume concentrating near the surface while the total trends toward zero may seem at odds with what we noted previously, that in the infinite-dimensional case all of the mass of the hypersphere is concentrated in the center. The two can be reconciled if “center” is read relative to an enclosing hypercube: the cube’s own volume migrates out to its corners, while the inscribed hypersphere shrinks to a vanishing central core.
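Hamming’s observation follows from a scaling argument. A ball of radius 1 − ε has (1 − ε)^n times the volume of the unit ball. The fraction of the volume within ε of the surface is therefore 1 − (1 − ε)^n, which tends to 1 as n grows for any fixed ε > 0. The following quick Python check uses ε = 0.01, chosen arbitrarily, and a hypothetical helper name.

```python
def shell_fraction(n: int, eps: float) -> float:
    """Fraction of an n-ball's volume lying within eps (relative) of its surface."""
    return 1.0 - (1.0 - eps) ** n

for n in (3, 10, 100, 1_000, 10_000):
    print(f"n={n:>6}: {shell_fraction(n, 0.01):.4%} of the volume is in the outer 1%")
```

In 3 dimensions the outer 1% shell holds about 3% of the volume. By 1,000 dimensions it holds essentially all of it.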
As an aside, one could speculatively read a neural network’s interpolation threshold, mirroring the hypersphere’s volumetric peak and decline through dimensional adjustment, as an empirical echo of the Poincaré conjecture [Poincaré, 1904]. The conjecture has been generalized to the topological statement that every closed n-manifold homotopy-equivalent to the n-sphere is homeomorphic to it [Weisstein, 2022]. Grigori Perelman proved it (2002–2003) by way of Thurston’s geometrization conjecture [Thurston, 1982], so any such empirical evidence would align with an established result.
Part 2 — Imagine
Hypercubes and spheres aren’t the only shapes possible in high dimensions. Their study has probably been popular at least partly because they are the two simplest examples of smooth versus faceted geometries. Something similar holds in machine learning, where the tanh and ReLU activation functions represent two different regimes, smooth and piecewise linear (the latter of which, as an aside, helped open the door to deep learning). Even though tanh and ReLU are different regimes, we previously noted that a 3-layer ReLU network is capable of modeling a function generated by a shallower network with a smooth activation. Similar equivalency questions are also studied in high-dimensional geometry for figures like spheres and cubes.
One way to consider geometries of exotic variety is by asking how tightly copies of a shape can be packed together. The artist M.C. Escher (and, later, the mathematical physicist Roger Penrose) was known for tiling surfaces with interlocking figures fit together like puzzle pieces, like the white and black birds crossing paths on the horizon above fading cropland squares in Escher’s Day and Night. A natural question to ask in higher dimensions is which shapes can interlock to fill space completely. For hypercubes the answer is yes. Because every face has the same shape and every edge the same length, copies of an n-cube stacked on an integer grid fill n-dimensional space with no gaps or overlaps, the higher-dimensional analogue of a checkerboard.
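The hypercube tiling can be spot-checked numerically. Every point x of n-dimensional space lies in exactly one half-open unit cell [k1, k1 + 1) × … × [kn, kn + 1) with an integer corner k, namely k = floor(x), so the cells leave no gaps and have no overlaps. The Python sketch below samples random points and counts how many nearby cells contain each one. It is a sanity check of that argument, not a proof, and the helper names are hypothetical.

```python
import itertools
import math
import random

def in_cell(point, corner) -> bool:
    """True if point lies in the half-open unit cube with the given integer corner."""
    return all(k <= x < k + 1 for x, k in zip(point, corner))

def cells_containing(point) -> list[tuple[int, ...]]:
    """All cells among the 3**n candidates around floor(point) that contain point."""
    base = [math.floor(x) for x in point]
    candidates = itertools.product(*[(b - 1, b, b + 1) for b in base])
    return [c for c in candidates if in_cell(point, c)]

random.seed(0)
for n in (2, 3, 5):
    counts = {len(cells_containing([random.uniform(-10, 10) for _ in range(n)]))
              for _ in range(1_000)}
    print(f"n={n}: number of cells containing each sampled point -> {counts}")
```

Using half-open cells is the design choice that makes the count exactly one. Closed cells would double-count points lying on shared faces.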
High-dimensional shapes with this property of identical regular faces on every side, like the five Platonic solids among the dice you may have rolled playing Dungeons & Dragons in middle school, are rare enough to have been catalogued dimension by dimension. A very interesting pattern emerges when counting how many such regular convex shapes exist in each dimension.
The 1D line segment is a trivial case, and upon inspection so is the 2D case, with infinitely many regular polygons (one for every number of sides). But the count climbs from 5 regular shapes in 3D to 6 in 4D and then settles at exactly 3 for every dimension from 5 upward. That suggests some kind of fundamental uniformity across high dimensions that is not present in the ones we experience. It turns out that the 3 shapes found in the higher dimensions are always the same shapes extended through dimensions, much as we extended a square to a cube to a tesseract. They are the simplex (every vertex connected to every other), the hypercube (the same one we already know), and the cross-polytope. The cross-polytope is the hypercube’s dual, derived by swapping the roles of vertices and facets: it has one vertex for each of the cube’s facets and one facet for each of the cube’s vertices.
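The three families can be compared through their face counts. By standard formulas, the n-simplex has C(n+1, k+1) faces of dimension k, the n-cube has C(n, k)·2^(n−k), and the cross-polytope has C(n, k+1)·2^(k+1). The Python sketch below checks two facts related to the passage above. The cross-polytope’s counts are the cube’s counts in reverse, since vertices swap with facets and so on down the line. All three families also satisfy Euler’s relation for convex polytopes: the alternating sum of face counts from vertices up to facets equals 1 − (−1)^n. The helper names are our own.

```python
from math import comb

def simplex_f_vector(n: int) -> list[int]:
    """Face counts [vertices, edges, ..., facets] of the n-simplex."""
    return [comb(n + 1, k + 1) for k in range(n)]

def cube_f_vector(n: int) -> list[int]:
    """Face counts [vertices, edges, ..., facets] of the n-cube."""
    return [comb(n, k) * 2 ** (n - k) for k in range(n)]

def cross_f_vector(n: int) -> list[int]:
    """Face counts [vertices, edges, ..., facets] of the n-dimensional cross-polytope."""
    return [comb(n, k + 1) * 2 ** (k + 1) for k in range(n)]

def euler_ok(faces: list[int]) -> bool:
    """Euler's relation for a convex n-polytope: sum of (-1)^k f_k equals 1 - (-1)^n."""
    n = len(faces)
    return sum((-1) ** k * f for k, f in enumerate(faces)) == 1 - (-1) ** n

for n in range(2, 8):
    s, c, x = simplex_f_vector(n), cube_f_vector(n), cross_f_vector(n)
    assert c == x[::-1], "the cube and cross-polytope should be duals"
    assert euler_ok(s) and euler_ok(c) and euler_ok(x)
    print(f"n={n}: simplex={s} cube={c} cross={x}")
```

In 5 dimensions, for instance, the cube has 32 vertices and 10 facets, while the cross-polytope has 10 vertices and 32 facets.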
Hyperspheres have also been studied for packing applications, with the recognition that perfect stacking is impossible without some gaps between neighbors. This is easy to see in the 2D case, where the spheres are circles. [Fig 11] shows that 2-spheres can be stacked in a square grid pattern, and that the hexagonal packing offers a more efficient arrangement with smaller gaps. Its 3D analogue is the arrangement you’ll see when buying oranges at the grocery store.
The question of optimal packing density has long been studied. It is closely related to the question of how many neighboring spheres can touch a central one (the kissing number), since more touching neighbors tends to mean smaller gaps. [Fig 11] shows a grid packing in which a central 2-sphere touches 4 neighbors, and the optimal hexagonal packing in which it touches 6. Optimal values are known in low dimensions, but in the intermediate range we may know only bounds on the optimal density, and in high dimensions it remains a mystery. A high-profile result proved that the E8 lattice arrangement, in which each sphere touches 240 neighbors, gives the densest possible packing in 8D [Viazovska, 2016]. This was soon followed by a proof that the Leech lattice arrangement, with 196,560 touching neighbors per sphere, is densest in 24D [Cohn, 2016]. (The kissing numbers 240 and 196,560 had themselves been established as optimal decades earlier.) The 24-dimensional Leech lattice is now of high interest in the exciting world of theoretical mathematics.
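The 2D comparison, and the density figures for 8 and 24 dimensions, can be reproduced in a few lines. For unit circles, the square grid places centers at the points (2i, 2j), and the hexagonal packing uses the lattice generated by (2, 0) and (1, √3). Counting lattice points at distance 2 from the origin gives each arrangement’s number of touching neighbors. Dividing one circle’s area by the area of a lattice cell gives the density. The 8D and 24D densities, π^4/384 for the E8 lattice and π^12/12! for the Leech lattice, are the known values for those packings, quoted here rather than derived. The helper names are hypothetical.

```python
import math

def kissing_count(b1, b2, radius=1.0, reach=3):
    """Count lattice points at distance exactly 2 * radius from the origin.

    b1, b2 are the 2D lattice basis vectors; small integer combinations suffice
    to find every nearest neighbor for the lattices used below.
    """
    count = 0
    for i in range(-reach, reach + 1):
        for j in range(-reach, reach + 1):
            x = i * b1[0] + j * b2[0]
            y = i * b1[1] + j * b2[1]
            if (i, j) != (0, 0) and math.isclose(math.hypot(x, y), 2 * radius):
                count += 1
    return count

def density_2d(b1, b2, radius=1.0):
    """Fraction of the plane covered: circle area / fundamental cell area."""
    cell = abs(b1[0] * b2[1] - b1[1] * b2[0])
    return math.pi * radius ** 2 / cell

square = ((2.0, 0.0), (0.0, 2.0))
hexagonal = ((2.0, 0.0), (1.0, math.sqrt(3.0)))
for name, (b1, b2) in (("square", square), ("hexagonal", hexagonal)):
    print(f"{name:9}: neighbors={kissing_count(b1, b2)}, density={density_2d(b1, b2):.4f}")

e8 = math.pi ** 4 / 384
leech = math.pi ** 12 / math.factorial(12)
print(f"E8 (8D):     density={e8:.4f}, gaps={1 - e8:.2%}")
print(f"Leech (24D): density={leech:.6f}, gaps={1 - leech:.2%}")
```

The square grid covers about 78.5% of the plane and the hexagonal packing about 90.7%. E8 covers roughly a quarter of 8D space, and the Leech lattice less than 0.2% of 24D space.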
A really interesting property of hyperspheres that sphere packing helps demonstrate, and one that might even be relevant to neural networks, is that increasing dimension appears to produce spiky geometries. For example, the optimal 8D packing noted above leaves gaps of around 75% of the available space, but by the 24-dimensional case the gaps are closer to 99.8%. One thought experiment illustrates this spikiness. Take an n-cube packed with 2^n identical n-spheres, one nestled into each corner so that it touches the cube’s faces and its neighboring spheres. Then fit a smaller sphere at the center of the cube, just touching each of the surrounding hyperspheres. This is easy to picture in the 2D or 3D case.
The interesting finding of this thought experiment comes as we add dimensions. The center sphere starts out smaller than those surrounding it, matches their size at 4 dimensions, and grows larger after that. From 10 dimensions onward it actually protrudes beyond the boundaries of the hypercube [Hamming, 1996].
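The thought experiment has a clean closed form under the standard setup we assume here: a cube of side 4 centered at the origin, with 2^n unit spheres centered at the points (±1, …, ±1), one per corner. Each unit sphere’s center lies √n from the origin, so the central sphere that just touches all of them has radius √n − 1. That radius equals 1 (matching its neighbors) at n = 4 and reaches the cube’s half-width of 2 at n = 9. From n = 10 on it exceeds that half-width, poking out through the faces. The helper name below is hypothetical.

```python
import math

def center_radius(n: int) -> float:
    """Radius of the central sphere touching 2**n unit spheres centered at (+-1, ..., +-1)."""
    return math.sqrt(n) - 1.0

HALF_WIDTH = 2.0  # the enclosing cube spans [-2, 2] in every coordinate

for n in (2, 3, 4, 5, 9, 10, 100):
    r = center_radius(n)
    if r < 1:
        status = "smaller than its neighbors"
    elif r > HALF_WIDTH:
        status = "protrudes from the cube"
    elif r == HALF_WIDTH:
        status = "touches the cube's faces"
    else:
        status = "at least as large as its neighbors"
    print(f"n={n:3d}: center radius = {r:.3f} ({status})")
```

At n = 100 the central sphere has radius 9, more than four times the cube’s half-width. The “inside” sphere reaches far beyond a box whose corners still hold the original unit spheres.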
We speculate that this spikiness could be relevant to a neural network loss manifold. As a training path approaches a minimum, the set of surrounding weight updates that still reduce loss contracts, which could have the effect of the path entering a kind of tendril in the loss manifold. The surrounding volume of this tendril may shrink with increasing overparameterization.
Trying to visualize the literal geometry of loss manifolds with billions of parameters is a fool’s errand. We can barely understand 5 dimensions, let alone 50. It helps to consider things in an abstract sense. To find shapes in nature of the scale needed for a metaphor, we would need more stalks of grain than there are in a vast field, but we can still use this for our purpose. Consider that each parameter in a neural network is like a stalk of grain ready for harvest: 1-dimensional from a distance, but once you account for the correlations and constraints of the surrounding stalks, perhaps jagged, with a head of seeds near the global minimum. A wind turbine rises far above the field, slowly rotating with a slight whooshing sound, the stalks swaying gently in the breeze, the electric hum of insects scurrying by, the farmer and his son making their way slowly towards home, one step at a time, where dinner and smiling faces will be waiting.
Belmonte, Nico. Hopf Fibrations. (2022) http://philogb.github.io/page/hopf/#
Cohn, H., Kumar, A., Miller, S. D., Radchenko, D., and Viazovska, M., The sphere packing problem in dimension 24, arXiv e-prints, (2016). https://arxiv.org/abs/1603.06518
Hamming, Richard W. The Art of Doing Science and Engineering: Learning to Learn. Stripe Press (1996)
Hopf, H. Über die Abbildungen der dreidimensionalen Sphäre auf die Kugelfläche. Math. Ann. 104, 637–665 (1931). https://doi.org/10.1007/BF01457962
Leios Labs. Understanding 4D — The Tesseract. YouTube, (2016). https://youtu.be/iGO12Z5Lw8s
Noll, A. Michael. A Computer Technique for Displaying n-Dimensional Hyperobjects. Communications of the ACM, (1967). http://reprints.gravitywaves.com/VIP/ViewExtraSpaceDims/Noll-1967_AComputerTechniqueForDisplayingNDimensionalHyperObjects.pdf
Poincaré, H. Cinquième complément à l’analysis situs. Rend. Circ. Mat. Palermo 18, 45–110, 1904. Reprinted in Oeuvres, Tome VI. Paris, (1953), p. 498.
Thurston, W. P. Three-Dimensional Manifolds, Kleinian Groups and Hyperbolic Geometry. Bull. Amer. Math. Soc. 6, 357–381, (1982).
Viazovska, M., The sphere packing problem in dimension 8, arXiv e-prints, (2016). https://arxiv.org/abs/1603.04246
Walczyk, Michael and Barker, Ben. Hopf. Github Repository, (2022). https://github.com/mwalczyk/hopf
Weisstein, Eric W. Poincaré Conjecture. From MathWorld — A Wolfram Web Resource, (2022). https://mathworld.wolfram.com/PoincareConjecture.html
Wolfram Research, Inc. (www.wolfram.com), Wolfram Programming Lab, Champaign, IL (2022).
The author reviewed several additional media accounts that contributed to reinforcing these discussions:
Ciechanowski, Bartosz. Tesseract. Blog (2019). https://ciechanow.ski/tesseract/
Fort, Stanislav. A high-dimensional sphere spilling out of a high-dimensional cube despite exponentially many constraints. Blog, (2022). https://stanislavfort.github.io/blog/sphere-spilling-out/
Houston-Edwards, Kelsey. A Breakthrough in Higher Dimensional Spheres | Infinite Series | PBS Digital Studios. Youtube, (2016). https://youtu.be/ciM6wigZK0w
Jones, Garrett. Higher Dimensions Introduction. website (2003). http://hi.gher.space/classic/sitemap.htm
Klarreich, Erica. Sphere Packing Solved in Higher Dimensions. Quanta Magazine, (2016). https://www.quantamagazine.org/sphere-packing-solved-in-higher-dimensions-20160330
Lamb, Evelyn. Why You Should Care about High-Dimensional Sphere Packing. Scientific American, (2016). https://blogs.scientificamerican.com/roots-of-unity/why-you-should-care-about-high-dimensional-sphere-packing/
Numberphile. Perfect Shapes in Higher Dimensions — Numberphile. YouTube, (2016). https://youtu.be/2s4TqVAbfz4
Star, Zach. The things you’ll find in higher dimensions. YouTube, (2019). https://youtu.be/dr2sIoD7eeU
Books that were referenced here or otherwise inspired this essay:
The Art of Doing Science and Engineering — Richard Hamming
Flatland — Edwin Abbott