Geometric Deep Learning

by Gueorgui Mihaylov, PhD

Trusted Data Science @ Haleon · Nov 6, 2023

A more radical geometric programme for deep learning

Deep neural networks (NN) play an increasingly important role in a variety of applications. A well-recognised fact about NNs is that relevant characteristics of their nature and performance are emergent, i.e., they depend on the collective behaviour of the network and cannot be modelled, inferred, or understood in a strictly reductionist way from the properties of individual neurons or connections. Modelling, understanding and quantifying the complexity of systems is an active field of research. Shock wave propagation, phase transitions, vitrification processes, and explosive signal amplification are examples of collective large-scale manifestations of the emergent properties of a complex system. Deep NNs are very complex adaptive systems. They have proven to be very powerful tools, but the intrinsic nature of their emergent properties makes verification, task validation, analysis, and interpretability of their functioning very difficult, and their performance often deviates significantly from expectations. Developing our mathematical understanding and ability to model, quantify and control the emergent properties of deep NNs is a challenging and fascinating task.

In this brief article, I would like to highlight an interesting parallel between a geometric theory of complex adaptive systems and recent developments in geometric deep learning, and to propose some ideas for future research.

Emergent behaviour generated by local and global obstructions that can be modelled by geometric tools

A theoretical paradigm which successfully models and reproduces emergent phenomena, even in systems characterised by simple interactions between agents, is based on a generalised concept of geometric frustration. In other words, emergent behaviour is associated with local contradictions between the dynamics of agents, or with difficulties for the agents to synchronise their states or to find stable local equilibrium configurations. For example, the theory of spin glasses offers many interesting and highly nontrivial phenomena of this type.
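As a minimal, self-contained illustration of geometric frustration (a standard textbook toy example, not anything specific to the theory discussed here): three Ising spins coupled antiferromagnetically on a triangle cannot satisfy all three bonds at once, so the minimum-energy configuration is degenerate and always leaves one bond frustrated.

```python
from itertools import product

# Antiferromagnetic Ising triangle: E = sum of s_i * s_j over the bonds (J = 1),
# so every bond "prefers" its two spins to be opposite.
bonds = [(0, 1), (1, 2), (0, 2)]

def energy(spins):
    return sum(spins[i] * spins[j] for i, j in bonds)

states = list(product([-1, 1], repeat=3))
ground = min(energy(s) for s in states)

# The fully satisfied value -3 is unreachable; the best achievable energy is -1,
# shared by six degenerate configurations.
print("ground-state energy:", ground)
print("ground states:", [s for s in states if energy(s) == ground])
```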

Differential geometry studies the properties of smooth manifolds and geometric structures defined on them. The theory of fibre bundles offers a rich set of tools to model and quantify the relevant obstructions for point-wise phenomena to be extended to local phenomena, and for local phenomena to be extended to global ones. Local usually refers to phenomena limited to one coordinate chart on a differentiable manifold. In the world of fibre bundles, local phenomena that hold in a coordinate-compatible way are captured and measured by the concept of integrability of geometric structures. Global phenomena concern the global structure of the manifold. Fibre bundles on smooth manifolds provide a context for useful generalisations of the concept of vector-valued, tensor-valued, or something-else-valued functions. The fibres of principal bundles are (Lie) groups, and sections of principal bundles are generally only locally defined: if a principal bundle admits a global smooth section, then it is trivial. Similarly, vector (tensor) bundles might not admit global smooth non-vanishing sections (have you tried to comb a sphere?). Local and global obstructions to the integrability and triviality of fibre bundles have been extensively studied, modelled, classified, and quantified in the context of this theory. For example, the deviation of parallel transport on a manifold from being trivial is measured by a local tensor obstruction called curvature. Local obstructions for symplectic or complex geometries to be extended in a coordinate-compatible way beyond a single tangent space on a smooth manifold are measured by another tensor called the intrinsic torsion, and so on. (Torsion and curvature tensors can be interpreted in a unified way as components of higher-order integrability obstructions within the theory of jets, which are fibre bundles of a “higher order”, but this is a much longer story.)
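For readers who want the standard formulas behind these statements (textbook definitions for an affine connection \nabla, quoted only as a reminder; the intrinsic torsion of a G-structure is a refinement of the same idea):

R(X,Y)Z = \nabla_X \nabla_Y Z - \nabla_Y \nabla_X Z - \nabla_{[X,Y]} Z, \qquad T(X,Y) = \nabla_X Y - \nabla_Y X - [X,Y].

The curvature R measures the failure of parallel transport around infinitesimal loops to be trivial, while the torsion T measures the failure of \nabla to close infinitesimal parallelograms.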

Analogously, there are topological obstructions for local phenomena to be extended globally. The theory of characteristic classes has been developed to capture, describe, and quantify the topological obstructions for a bundle to be trivial (more technically, a characteristic class associates to each bundle over a manifold a cohomology class of the base manifold).
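As a standard example of how such classes are built (the Chern-Weil recipe, quoted here for orientation): for a complex vector bundle E with curvature 2-form F of any connection,

c(E) = \det\!\left( I + \frac{i}{2\pi} F \right) = 1 + c_1(E) + c_2(E) + \dots, \qquad c_1(E) = \frac{i}{2\pi}\,\operatorname{tr} F,

and the resulting de Rham cohomology classes do not depend on the chosen connection; if they are non-zero, the bundle cannot be trivial.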

Gauge theories — where geometric obstructions assume physical meaning

The geometric theory described above is extremely rich and has been developed in a continuous and productive interaction and exchange of ideas with theoretical physics. Fundamental theories in physics up to the Standard Model are indeed gauge theories. Gauge theory means that the configuration space of the theory is modelled as a principal bundle with a certain structure Lie group, defined over a differentiable base manifold (space-time), together with a set of associated tensor bundles. The sections of the associated bundles represent the particle fields and the interaction fields of the theory.

The structure Lie group manifests as a set of symmetries of the theory, or in other words a set of transformations under which the action (a functional that determines the dynamics via extremisation) remains invariant. Interaction fields are introduced, in a certain sense, as “corrections” of the derivation operators: covariant derivatives on the bundles are defined by means of a field called a connection, which ultimately determines the parallel transport operation on the base manifold.
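In the usual local notation (standard conventions; signs may differ between references): for a matter field \psi transforming in a representation of the structure group,

D_\mu \psi = \partial_\mu \psi + A_\mu \psi, \qquad \psi \mapsto g\,\psi, \qquad A_\mu \mapsto g A_\mu g^{-1} - (\partial_\mu g)\, g^{-1},

so that D_\mu \psi transforms exactly like \psi itself; the connection A is precisely the correction that makes differentiation compatible with the local symmetry.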

Very relevantly, the actions of many successful gauge theories (quantum electrodynamics, Yang-Mills theory, quantum chromodynamics, the Standard Model, and teleparallel gravity) are constructed precisely from the local obstructions to the integrability of the underlying geometric structures. Nontrivial dynamics is generated by the integrability obstructions! Curvature tensors play the role of field strength tensors in electrodynamics and Yang-Mills theory, whereas in teleparallel gravity the dynamics is associated with torsion.
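Schematically, and up to normalisation conventions, the Yang-Mills case reads

F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu + [A_\mu, A_\nu], \qquad S_{\mathrm{YM}}[A] = -\frac{1}{4} \int F^{a}_{\mu\nu} F^{a\,\mu\nu}\, d^4x,

i.e., the action is built entirely out of the curvature of the connection, the local integrability obstruction.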

Analogously, in topological field theories, topological invariants of space-time give rise to physical observables. In particular, the propagators (correlation functions) of the field are topological invariants. A particularly inspiring example in this space (for the purposes of our geometric view on complexity and Graph NNs) is the Chern-Simons theory. This is a quantum field theory which exploits the Chern-Weil construction of the so-called secondary characteristic (de Rham cohomology) classes by means of connections and curvatures. This is very relevant because topological invariants and global obstructions to the triviality of principal bundles and their associated vector bundles are expressed by means of local obstructions to integrability. Again, in this case, topological invariants generate physical observables (correlation functions of gauge-invariant operators).
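For reference, the Chern-Simons action on a 3-manifold M (standard form) is

S_{\mathrm{CS}}[A] = \frac{k}{4\pi} \int_M \operatorname{tr}\!\left( A \wedge dA + \tfrac{2}{3}\, A \wedge A \wedge A \right),

whose extrema are flat connections and whose Wilson-loop expectation values yield topological (knot and link) invariants.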

Fredholm’s theory allows transferring concepts from a continuous to a discrete world

Real-world systems are usually composed of a discrete set of agents and interactions. Similarly, NNs are discrete systems composed of a finite set of connected neurons. A way to transfer the continuous and differentiable constructions described above into a discrete context is offered by Fredholm's theory. Physical theories can be formulated in terms of integral equations. For example, the evolution over time of an initial configuration of a dynamical system can be obtained by computing the convolution of that initial condition with an integral kernel traditionally called a Green's function. Many differential operators relevant to physics have an integral operator counterpart, implemented as a convolution with a suitable integral kernel. Similarly, the propagator operators in (quantum) field theories are expressed as convolutions with suitable integral kernels. Convolutions with integral kernels express the transformations of sections of the relevant associated bundles in the interacting theories.
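The prototypical example of this evolution-as-convolution picture is the heat equation: the state at time t is obtained from the initial condition u_0 by

u(x, t) = \int G(x, y; t)\, u_0(y)\, dy, \qquad G(x, y; t) = (4\pi t)^{-d/2} \exp\!\left( -\frac{|x - y|^2}{4t} \right) \ \text{on } \mathbb{R}^d,

where the Green's function G plays the role of the integral kernel (and of the propagator in the field-theoretic language used below).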

Integration is a discretisation-friendly operation, in the sense that integrals and convolutions with integral kernels can be approximated (but more importantly conceptualised) by finite sums. This is explained for example in [2,6]. Interesting and relevant work in the manifold learning space has been done to show how discrete approximations of the Laplace-Beltrami operator on graphs (see [2,3]) converge to the Laplace-Beltrami operator on the underlying smooth manifolds both in local and spectral terms.
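A minimal numerical sketch of this convergence idea, in the spirit of the diffusion-maps construction; the point cloud, the bandwidth eps and the normalisation below are illustrative choices, not the specific constructions of the cited papers.

```python
import numpy as np

# Toy "underlying manifold": n points sampled from the unit circle in R^2.
rng = np.random.default_rng(0)
n, eps = 400, 0.05
theta = rng.uniform(0.0, 2.0 * np.pi, n)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)

# Gaussian integral kernel evaluated on the sample: K_ij = exp(-|x_i - x_j|^2 / eps).
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-D2 / eps)

# Random-walk normalisation: (I - D^{-1} K) / eps is a discrete approximation of the
# Laplace-Beltrami operator on the circle (up to a kernel-dependent constant factor).
L = (np.eye(n) - K / K.sum(axis=1, keepdims=True)) / eps

# The low-lying spectrum should mirror 0, 1, 1, 4, 4, ... (the circle's Laplacian
# eigenvalues), again up to the same overall constant.
eigvals = np.sort(np.linalg.eigvals(L).real)
print(np.round(eigvals[:6], 2))
```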

In this world of functionals, energies, actions and integral kernels, certain geometric properties of the discrete kernels can capture relevant geometric characteristics of the underlying smooth manifold. For example, in [7, 23] a correspondence between a class of discrete integral kernels that realise the Laplace-Beltrami operator and a class of Riemannian metrics on the underlying manifold has been described.

A geometric theory of complex adaptive systems

In the last few years, I have been working on a geometric (gauge) theory of complex adaptive systems that implements these ideas by means of the following elements:

  • A system approach, i.e., modelling a system in terms of agents and interactions,
  • Local interactions of multiple agents modelled by higher-order simplices; simplicial complexes contain a much richer topological structure,
  • A suitable concept of locality and a suitable concept of non-trivial fibre bundles over a simplicial complex (fibre bundles over a discrete set of points are trivial),
  • A material vector (tensor) field that describes the state of the agents; the field approach puts emphasis on collective modes and degrees of freedom,
  • A rigorous and consistent construction of connections and curvature,
  • A relevant and consistent construction of characteristic classes.

Real-world relevance

I have presented some of these ideas in a series of seminars, conference tutorials and lectures (for example, one of the lectures at the 2020 LMS-IMA Joint Meeting on Topological methods in Data, among others). My colleague and friend Professor Sergio Cacciatori and I are just one step away from publishing our ULTIMATE paper that formalises and summarises this work. Quite relevantly, some of these ideas have been successfully implemented in interesting solutions to complex real-world problems such as optimisation of global multi-echelon supply chains, manufacturing efficiency, and distribution and logistics optimisation. We presented some elements, particularly relevant in an industrial context, in our keynote speech “AI and Complex industrial Systems — a strategic view” at the 2021 London Data Science Festival.

Geometric deep learning and gauge theories

Geometric deep learning has emerged as an interesting and impactful area of research in very recent years. Graph NNs are very popular architectures exploited for many different applications from drug discovery to capturing, learning and modelling the behaviour of complex biological and industrial systems.

Graph NNs are the starting point and the major inspiration for the theory of geometric deep learning; however, a specific effort has been made to expand the theory (see for example [1] and other publications by the same authors) and to prove that most known deep neural network architectures can be included in the same formalism. Geometric deep learning emphasises the importance of symmetries, i.e., invariances under the action of relevant groups [1,3,9]. In this geometric programme for deep learning (inspired by Klein's Erlangen Programme), geometry is described and captured by detecting invariants. Fredholm's theory, spectral analysis and other manifold learning techniques have been rediscovered in the context of geometric deep learning [10,19,20,21]. Furthermore, a gauge theory formalism that implements local symmetries and gauge transformations has been successfully developed [1,15,16,17,18,19]. The equivariance properties of convolutional kernels that can guarantee the global or local (gauge) invariance of the network have been studied and formalised. Parallel transport has been included in the convolution operation [16,19], but less work has been done on the formal definition and analysis of the mathematical properties of the interaction gauge field, the description of the field strength tensors, propagators, actions, etc.
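The requirement at the heart of these constructions, stated abstractly (this is the standard equivariance condition, not anything specific to this article), is that a layer \Phi intertwines the group actions on its input and output feature spaces,

\Phi\big(\rho_{\mathrm{in}}(g)\, f\big) = \rho_{\mathrm{out}}(g)\, \Phi(f) \qquad \text{for all } g \in G,

and in the gauge (local-symmetry) setting the same condition is imposed for position-dependent transformations g(x), which constrains the admissible convolution kernels and is what brings parallel transport into the convolution.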

We can summarise the mathematical aspects of geometric deep learning in the following (theoretical physicist/differential geometer-friendly) way. Consider a graph and a relevant (Lie) group of symmetries, which we take as the structure group. The states of the vertices of the graph are described by some vector or tensor quantity (a material field sampled at a finite number of points). We consider a representation of the structure group on the state space of the vertices. The edges of the graph are supposed to carry the interaction gauge field, which (in the discrete world of Graph NNs) means that the state of an edge needs to implement the parallel transport over a finite segment of curve. The state of an edge is therefore characterised by an element of a suitable linear representation of the structure group on the material field space. A configuration of the theory is given by a configuration of the material field and the interaction gauge field. A feature is a section of an associated vector bundle (a remarkable coincidence of the definitions in [2] and [16]). An initial configuration of the system is supplied to the ingestion layer of the Graph NN. This configuration is then transformed through the layers of the deep Graph NN in analogy with evolution over time. The evolution between layers is performed by means of a convolution with an equivariant learnable integral kernel, and the domain of the convolution is determined by the graph connectivity structure, which remains unmodified between Graph NN layers.
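The following is a toy numpy sketch of this picture, with SO(2) standing in for the structure group; the graph, the edge rotations and the fixed "learnable" kernels W_self, W_neigh are hypothetical placeholders, not a reference implementation of any of the cited architectures.

```python
import numpy as np

def rotation(angle):
    """An SO(2) group element: the parallel transport carried by a directed edge."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

# Material field: a 2-dimensional feature vector on each of three vertices.
vertex_state = {0: np.array([1.0, 0.0]),
                1: np.array([0.0, 1.0]),
                2: np.array([0.5, -0.5])}

# Interaction gauge field: each directed edge carries a transport matrix.
edge_transport = {(1, 0): rotation(0.3), (2, 0): rotation(-1.2),
                  (0, 1): rotation(-0.3), (2, 1): rotation(0.7),
                  (0, 2): rotation(1.2), (1, 2): rotation(-0.7)}

W_self = 0.5 * np.eye(2)    # "learnable" kernels, fixed here for illustration
W_neigh = 0.5 * np.eye(2)

def propagate(states, transports):
    """One layer: transport neighbour features along the edges, then convolve."""
    new = {}
    for v, h in states.items():
        msg = np.zeros_like(h)
        for (u, w), g in transports.items():
            if w == v:
                msg += g @ states[u]        # parallel transport along u -> v
        new[v] = np.tanh(W_self @ h + W_neigh @ msg)
    return new

print(propagate(vertex_state, edge_transport))
```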

The fully interacting theory is characterised by the message-passing architecture, in which the evolved state of an edge (the gauge field) depends on the states of both the vertices it connects. Most commonly this evolution is represented by learnable affine functions of the states of both vertices, followed by a suitably selected activation function.
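A corresponding sketch of the edge (gauge field) update in this message-passing picture; the dimensions, weights and activation are again placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# "Learnable" parameters of the affine edge update (fixed random values here).
W_u, W_v, b = rng.normal(size=(4, 2)), rng.normal(size=(4, 2)), rng.normal(size=4)

def update_edge(h_u, h_v):
    """New edge state as an affine function of both endpoint states plus an activation."""
    return np.tanh(W_u @ h_u + W_v @ h_v + b)

print(update_edge(np.array([1.0, 0.0]), np.array([0.0, 1.0])))
```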

An interesting philosophical question is whether equivariant convolution always needs to be implemented explicitly in the architecture, or whether a gauge symmetry can emerge, i.e., whether equivariant kernels can be learned spontaneously.

A new geometric programme for deep learning

In this context, a deep Graph NN (but recall most architectures can be formalised in these terms) is a realisation of the spacetime propagator of a gauge theory.

I believe crystallising this interpretation is interesting and relevant in its own right, but it should not be particularly surprising. The fact that NNs have been used to learn integral flows or evolution operators associated with systems of ordinary or partial differential equations [13] confirms the validity of these considerations.

Once the propagator over spacetime has been learned, the behaviour of local and global obstructions can be studied and analysed for different initial configurations of the material field and the gauge field. The behaviour of field strength tensors (curvatures or torsions) and of potentially non-trivial characteristic classes (topological invariant observables of the theory) is expected to capture, model and help quantify the complexity and the emergent characteristics of the neural network. The more formal construction of ideas such as non-trivial principal bundles on discrete sets is expected to provide the right theoretical framework for this analysis.
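One concrete way such an analysis could look in the discrete setting (a hedged sketch, with a hypothetical graph and hand-picked transport matrices): compose the parallel-transport elements carried by the edges around closed loops; the deviation of the resulting holonomy from the identity is a discrete curvature-like obstruction.

```python
import numpy as np

def rotation(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

# Hypothetical transport matrices on the directed edges of a small graph.
edge_transport = {(0, 1): rotation(0.4), (1, 2): rotation(0.1), (2, 0): rotation(-0.3)}

def transport(u, v):
    """Transport from u to v; traversing an edge backwards uses the inverse element."""
    if (u, v) in edge_transport:
        return edge_transport[(u, v)]
    return np.linalg.inv(edge_transport[(v, u)])

def holonomy(loop):
    """Compose the transports around a closed loop of vertices."""
    g = np.eye(2)
    for u, v in zip(loop, loop[1:] + loop[:1]):
        g = transport(u, v) @ g
    return g

# A non-zero deviation from the identity signals a non-flat (curved) configuration.
print(np.linalg.norm(holonomy([0, 1, 2]) - np.eye(2)))
```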

Our concept of non-trivial principal bundles on a discrete manifold involves the construction of simplicial complexes where tensor objects of different ranks are defined on simplices of different dimensions. Simplicial complexes can encode much richer topological structures (beyond the standard known graph characteristics like connectivity, centrality, node degree distribution, etc.). Simplices are used to describe local multi-object interactions, so this concept needs to be meaningfully adapted in this context. For example, the multi-object interactions between the vertices in the graph structure propagated over time can be based on criteria related to the simultaneous activation of neurons in the architecture. Activation bounds and the simultaneous activation of neurons have been described, for example, in [8]. Recall that properties of an underlying geometry of the “spacetime” can be encoded by the learned convolution kernels.
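A sketch of one possible construction along these lines (the data, the threshold and the clique-complex rule are assumptions for illustration, not a prescription): record binary activation patterns over a batch of inputs and promote groups of neurons that fire together sufficiently often to higher-dimensional simplices.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
# activations[s, i] = 1 if neuron i is active on sample s (placeholder data).
activations = (rng.random((200, 6)) > 0.6).astype(int)

# Pairwise co-activation frequencies.
coact = activations.T @ activations / len(activations)
threshold = 0.12

# 1-simplices: pairs of neurons that co-activate often enough.
edges = {frozenset(p) for p in combinations(range(6), 2) if coact[p] >= threshold}

# 2-simplices: triples all of whose pairs are edges (clique-complex rule).
triangles = {frozenset(t) for t in combinations(range(6), 3)
             if all(frozenset(p) in edges for p in combinations(t, 2))}

print(len(edges), "edges,", len(triangles), "triangles")
```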

These considerations regard the structure of the Graph NN (the propagator of the interacting gauge theory), so this is different from the recently developed concept of simplicial neural networks. NNs that can ingest, process, and propagate through different layers the combinatorial and topological information encoded by simplicial complexes, by means of a (further) generalised concept of convolution based on the simplicial realisation of the Laplace-Beltrami operator, have been described in [11,12,14]. The techniques developed in this area (for example, those based on the spectral structure of the simplicial Laplacian) and the further generalisation of the convolution operation are potentially very useful in the context of this gauge theory of neural networks.
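For reference, the simplicial (Hodge) Laplacians these architectures convolve with are assembled from boundary matrices, L_k = B_k^T B_k + B_{k+1} B_{k+1}^T; a minimal sketch for a filled triangle with one extra edge attached (a standard construction, independent of any particular paper):

```python
import numpy as np

# Complex: vertices {0,1,2,3}; edges (0,1), (0,2), (1,2), (2,3); filled triangle (0,1,2).
# B1: boundary of edges (vertices x edges); B2: boundary of triangles (edges x triangles).
B1 = np.array([[-1, -1,  0,  0],
               [ 1,  0, -1,  0],
               [ 0,  1,  1, -1],
               [ 0,  0,  0,  1]])
B2 = np.array([[ 1],    # edge (0,1)
               [-1],    # edge (0,2)
               [ 1],    # edge (1,2)
               [ 0]])   # edge (2,3)

L0 = B1 @ B1.T                  # graph Laplacian (0-th Hodge Laplacian)
L1 = B1.T @ B1 + B2 @ B2.T      # 1-st Hodge Laplacian, acting on edge signals

# dim ker L_k equals the k-th Betti number: one connected component, no 1-dimensional
# holes (the only cycle is filled by the triangle).
print(np.sum(np.abs(np.linalg.eigvalsh(L0)) < 1e-9),
      np.sum(np.abs(np.linalg.eigvalsh(L1)) < 1e-9))
```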

Conclusion

I believe that the gauge formalism in geometric deep learning has great potential, which is still underexploited. A theory that models and quantifies complexity/emergent phenomena through the rich and powerful set of tools provided by differential geometry could efficiently capture behaviours of NNs, which are not fully understood or explained at this stage. Just imagine the power of a statement like “the XYZ NN is characterised by specific limitations on its expressivity or verifiability determined by a non-trivial characteristic class C…”!

Bibliography

[1] M. Bronstein, J. Bruna, T. Cohen, P. Veličković, Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges, arXiv:2104.13478 (2021)

[2] G. Mihaylov, M. Spallanzani, Emergent behaviour in a system of industrial plants detected via manifold learning, International Journal of Prognostics and Health Management, special issue on big data analytics, ISSN 2153-2648, 2016 030 (2016)

[3] B. Aslan, D. Platt, D. Sheard, Group invariant machine learning by fundamental domain projections, Proceedings of the 1st NeurIPS Workshop on Symmetry and Geometry in Neural Representations, PMLR 197:181–218, (2023).

[4] A. Singer, HT Wu, Spectral convergence of the connection Laplacian from random samples, Information and Inference: A Journal of the IMA 6, 58–123 (2017)

[5] M. Belkin, P. Niyogi, Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, Neural Computation, 15, 1373 (2003)

[6] G. Puglisi, G. Mihaylov, G. Panopoulou, D. Poletti, J. Errard, P. Puglisi, G.Vianello, Improved Galactic Foreground Removal for B-Modes Detection with Clustering Methods, Monthly Notices of the Royal Astronomical Society Main Journal (2022), Oxford University Press, Volume 511, Issue 2, April 2022, Pages 2052–2074 (2022)

[7] T. Berry, T. Sauer, Local Kernels and the Geometric Structure of Data Applied and Computational Harmonic Analysis Volume 40, Issue 3, Pages 439–469 (2016)

[8] E. Botoeva, P. Kouvaros, J. Kronqvist, A. Lomuscio, R. Misener, Efficient Verification of ReLU-based Neural Networks via Dependency Analysis, Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, No. 4 (2020)

[9] D. Yarotsky, Universal approximations of invariant maps by neural networks, Constructive Approximation 55(1):1–68 (2020)

[10] M. Finzi, S. Stanton, P. Izmailov, and A. G. Wilson. Generalizing convolutional neural networks for equivariance to Lie groups on arbitrary continuous data. In ICML, (2020)

[11] S. Ebli, M. Defferrard, G. Spreemann, Simplicial Neural Networks, DOI: 10.5281/zenodo.4309827 (2020)

[12] M. Yang, E. Isufi, Convolutional Learning on Simplicial Complexes. arXiv:2301.11163 (2023)

[13] Lizuo Liu, Wei Cai, DeepPropNet — A Recursive Deep Propagator Neural Network for Learning Evolution PDE Operators arXiv:2202.13429v1 (2022)

[14] C. Bodnar, F. Frasca, YG Wang, N. Otter, G. Montufar, P. Liò, and M. Bronstein. Weisfeiler and Lehman go topological: Message passing simplicial networks. arXiv:2103.03212 (2021)

[15] T. Cohen and M. Welling. Group equivariant convolutional networks. In ICML (2016).

[16] T. Cohen, M. Weiler, B. Kicanaoglu, M. Welling. Gauge equivariant convolutional networks and the icosahedral CNN. In ICML (2019)

[17] M. Hutchinson, C. Le Lan, S. Zaidi, E. Dupont, YW Teh, H. Kim. LieTransformer: Equivariant self-attention for Lie groups. arXiv:2012.10885 (2020)

[18] H. Maron, H. Ben-Hamu, N. Shamir Y. Lipman. Invariant and equivariant graph networks. arXiv:1812.09902, (2018)

[19] J. Masci, D. Boscaini, M. Bronstein, P. Vandergheynst. Geodesic convolutional neural networks on Riemannian manifolds. In CVPR Workshops (2015).

[20] S. Mei, T. Misiakiewicz, and A. Montanari. Learning with invariances in random features and kernel models. arXiv:2102.13219, 2021.

[21] F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, M. Bronstein. Geometric deep learning on graphs and manifolds using mixture model CNNs. In CVPR, (2017)

[22] J. Wood and J. Shawe-Taylor. Representation theory and invariant neural networks. Discrete Applied Mathematics, 69(1–2):33–60 (1996)

[23] W. Zeng, R. Guo, F. Luo, X. Gu. Discrete heat kernel determines discrete Riemannian metric. Graphical Models, 74(4):121–129, (2012)
