What Is the New Neural Network Architecture? Kolmogorov-Arnold Networks (KANs) Explained

Zul Ahmed
8 min read · May 3, 2024


A research paper released just days ago introduces a novel neural network architecture called Kolmogorov-Arnold Networks (KANs). This new approach, inspired by the Kolmogorov-Arnold representation theorem, promises significant improvements in accuracy and interpretability compared to traditional Multi-Layer Perceptrons (MLPs). Let’s dive into what KANs are, how they work, and the potential implications of this exciting development.

Neural network graph illustration (credit: IconWorks)

What are Kolmogorov-Arnold Networks?

Kolmogorov-Arnold Networks are a new type of neural network that takes a fundamentally different approach to learning than MLPs. While MLPs have fixed activation functions on nodes (or “neurons”), KANs have learnable activation functions on edges (or “weights”). This seemingly simple change has profound effects on the network’s performance and interpretability.

In a KAN, each weight parameter is replaced by a univariate function, typically parameterized as a spline. As a result, KANs have no linear weights at all. The nodes in a KAN simply sum the incoming signals without applying any non-linearities.
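To make the contrast concrete, here is a minimal sketch of the two views in plain Python. It is purely illustrative and not the paper’s implementation: it uses Gaussian bumps as the basis functions rather than the B-splines used in the paper, and all names are made up for the example.

```python
import numpy as np

# MLP view: a connection between two neurons is a single scalar weight.
def mlp_edge(x, w):
    # A fixed linear map; the non-linearity is applied later, at the node.
    return w * x

# KAN view (sketch): a connection is a learnable univariate function phi(x).
# Here phi is a linear combination of fixed basis functions; only the
# coefficients are learned. (The paper uses B-splines; Gaussian bumps are
# used here just to keep the sketch dependency-free.)
def kan_edge(x, coeffs, centers, width=0.5):
    basis = np.exp(-((x - centers) / width) ** 2)  # shape (num_basis,)
    return float(np.dot(coeffs, basis))            # phi(x), a scalar

centers = np.linspace(-2, 2, 8)      # fixed grid of basis centers
coeffs = np.random.randn(8) * 0.1    # this edge's learnable parameters

x = 0.7
print(mlp_edge(x, w=1.3))            # a straight line through the origin
print(kan_edge(x, coeffs, centers))  # an arbitrary learnable curve
```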

Figure comparing a KAN with an MLP (Multi-Layer Perceptron)
First figure from the KAN paper

You can see from this figure that, in the shallow-model panel, the activation functions sit on the edges and are learnable.

How do they work?

Simple

At its core, a KAN learns both the compositional structure (external degrees of freedom) and the univariate functions (internal degrees of freedom) of a given problem. This allows KANs to not only learn features, like MLPs, but also to optimize these learned features to great accuracy.

KANs leverage the strengths of both splines and MLPs while avoiding their weaknesses. Splines are accurate for low-dimensional functions and can easily adjust locally, but suffer from the curse of dimensionality. MLPs, on the other hand, are better at exploiting compositional structures, but struggle to optimize univariate functions. By combining the two approaches, KANs can learn and accurately represent complex functions more effectively than either splines or MLPs alone.

Expanded

KANs use an architecture that learns both the compositional structure (external degrees of freedom) and the univariate functions (internal degrees of freedom) of a given problem. This unique approach allows KANs to learn features like Multi-Layer Perceptrons (MLPs) while also optimizing those learned features to achieve high accuracy. Let’s take a closer look at how KANs accomplish this.

Compositional Structure Learning (External Degrees of Freedom)

KANs, like MLPs, can learn the compositional structure of a problem. In other words, they can identify and learn the relationships between different input features and how they contribute to the output. This is achieved through the network’s architecture, which consists of layers of nodes connected by edges.

In a KAN, the nodes are responsible for summing the incoming signals without applying any non-linearities. The edges, on the other hand, contain learnable activation functions, which are typically parameterized as splines. This architecture allows the network to learn the optimal composition of these activation functions to model the underlying structure of the problem.
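Below is a rough sketch of what one such layer could look like. It is not the paper’s implementation; it just mirrors the structure described above: every edge evaluates its own learned univariate function, and each output node simply sums its incoming edges without any further non-linearity (Gaussian bumps again stand in for the paper’s B-splines).

```python
import numpy as np

def kan_layer(x, coeffs, centers, width=0.5):
    """One KAN-style layer (illustrative sketch).

    x       : (n_in,)                 input activations
    coeffs  : (n_out, n_in, n_basis)  learnable coefficients, one curve per edge
    centers : (n_basis,)              fixed grid shared by all edges
    returns : (n_out,)                each output node is a plain sum of its
                                      incoming edge functions phi_ji(x_i)
    """
    # Evaluate every basis function at every input: shape (n_in, n_basis)
    basis = np.exp(-((x[:, None] - centers[None, :]) / width) ** 2)
    # phi[j, i] = phi_ji(x_i): each edge applies its own learned 1-D function
    phi = np.einsum("jib,ib->ji", coeffs, basis)
    # Nodes only sum; there is no fixed activation applied at the node.
    return phi.sum(axis=1)

# Tiny usage example: a layer with 3 inputs and 2 outputs
rng = np.random.default_rng(0)
centers = np.linspace(-2, 2, 8)
coeffs = rng.normal(scale=0.1, size=(2, 3, 8))
print(kan_layer(np.array([0.3, -1.2, 0.5]), coeffs, centers))  # shape (2,)
```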

By learning the compositional structure, KANs can effectively handle high-dimensional problems and exploit the inherent relationships between input features. This capability is similar to that of MLPs, which can also learn complex feature interactions through their layered architecture.

Univariate Function Optimization (Internal Degrees of Freedom)

What sets KANs apart from MLPs is their ability to optimize univariate functions to a high degree of accuracy. In a KAN, each edge contains a learnable activation function, which is a univariate function parameterized as a spline. Splines are piecewise polynomial functions that can closely approximate complex univariate functions.

During training, KANs optimize these spline activation functions to best fit the target function. The spline parameterization allows for local adjustments, meaning that the network can fine-tune the activation functions in specific regions of the input space without affecting other regions. This local adaptability is a key advantage of splines over global activation functions like sigmoids or ReLUs, which are commonly used in MLPs.

By optimizing the univariate functions, KANs can achieve high accuracy in modeling complex, non-linear relationships between inputs and outputs. This is particularly useful for problems with low-dimensional input spaces, where splines can excel.
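This locality is easy to see with an off-the-shelf B-spline: nudging a single spline coefficient changes the curve only on a small sub-interval, whereas changing a weight that feeds a sigmoid or ReLU shifts the activation everywhere. A small illustration (the grid, degree, and perturbed coefficient below are arbitrary choices for the demo):

```python
import numpy as np
from scipy.interpolate import BSpline

# A cubic B-spline on a uniform grid over [0, 1]
k = 3
knots = np.concatenate(([0.0] * k, np.linspace(0, 1, 11), [1.0] * k))
n_coeffs = len(knots) - k - 1
coeffs = np.random.randn(n_coeffs)
spline = BSpline(knots, coeffs, k)

# Perturb a single coefficient, as one training step on one edge might
perturbed = coeffs.copy()
perturbed[6] += 1.0
spline_new = BSpline(knots, perturbed, k)

# The change is confined to the support of that one basis function
xs = np.linspace(0, 1, 200)
diff = np.abs(spline_new(xs) - spline(xs))
changed = xs[diff > 1e-12]
print(changed.min(), changed.max())  # roughly the interval (0.3, 0.7)
```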

Combining Strengths of Splines and MLPs

KANs leverage the strengths of both splines and MLPs while avoiding their weaknesses. Splines are highly accurate for low-dimensional functions and can easily adapt locally, but they suffer from the curse of dimensionality. As the number of input dimensions increases, the number of spline parameters required to maintain accuracy grows exponentially, making splines impractical for high-dimensional problems.

On the other hand, MLPs are better suited for high-dimensional problems due to their ability to learn compositional structures. However, MLPs struggle to optimize univariate functions effectively, as their activation functions are typically fixed and global.

KANs overcome these limitations by combining the compositional structure learning of MLPs with the univariate function optimization of splines. The network’s architecture allows it to learn complex feature interactions like an MLP, while the spline activation functions enable accurate modeling of univariate relationships.

By integrating these two approaches, KANs can learn and represent complex, high-dimensional functions more effectively than either splines or MLPs alone. This synergy enables KANs to achieve state-of-the-art performance on a wide range of problems, from data fitting to solving partial differential equations.

In summary, Kolmogorov-Arnold Networks work by learning both the compositional structure and the univariate functions of a problem. The network’s architecture, with nodes for summing signals and edges containing learnable spline activation functions, allows KANs to combine the strengths of splines and MLPs. By optimizing the univariate functions and learning the compositional structure, KANs can accurately model complex, high-dimensional functions, making them a powerful and versatile tool for various machine learning tasks.
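If you want to experiment yourself, the authors also released a companion library, pykan. A minimal usage sketch, modeled on the examples in the paper’s repository at the time of writing (the exact function names and arguments may differ in newer versions), looks roughly like this:

```python
# pip install pykan   (requires a working PyTorch install)
import torch
from kan import KAN, create_dataset

# A 2-input, 1-output KAN with 5 hidden nodes, grid size 5, cubic splines (k=3)
model = KAN(width=[2, 5, 1], grid=5, k=3)

# Toy target from the paper's examples: f(x, y) = exp(sin(pi * x) + y^2)
f = lambda x: torch.exp(torch.sin(torch.pi * x[:, [0]]) + x[:, [1]] ** 2)
dataset = create_dataset(f, n_var=2)

# Train with LBFGS, then plot the learned edge activations for inspection
model.train(dataset, opt="LBFGS", steps=20)
model.plot()
```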

Implications and Potential Applications

The introduction of Kolmogorov-Arnold Networks has several exciting implications:

1. Improved accuracy: KANs have demonstrated comparable or better accuracy than much larger MLPs in tasks such as data fitting and solving partial differential equations (PDEs). This suggests that KANs could lead to more efficient and accurate models in various domains.

2. Enhanced interpretability: KANs are designed to be more interpretable than MLPs. The learnable activation functions can be visualized and interacted with, allowing users to gain insights into the model’s internal workings. This interpretability could be particularly valuable in fields like healthcare, where understanding a model’s decision-making process is crucial.

How Kolmogorov-Arnold Networks Could Revolutionize Large Language Models

Generative AI, particularly in the form of large language models (LLMs), is currently the most exciting and rapidly advancing frontier in artificial intelligence. LLMs like GPT-3, PaLM, and Chinchilla have demonstrated remarkable capabilities in natural language understanding, generation, and even reasoning. However, these models still face challenges in terms of efficiency, interpretability, and the ability to learn from fewer examples. This is where Kolmogorov-Arnold Networks (KANs) could make a significant impact, potentially outperforming existing neural network architectures.

  1. Improving LLM efficiency: LLMs are notoriously large, with billions or even trillions of parameters. This makes them computationally expensive to train and deploy. KANs have shown the ability to achieve comparable or better performance than much larger MLPs in various tasks. In contrast, other architectures like Transformers and convolutional neural networks (CNNs) can also be computationally intensive. By incorporating KAN architectures into LLMs, it may be possible to create more compact and efficient models without sacrificing performance, potentially surpassing the efficiency of existing architectures.
  2. Enhancing interpretability: One of the main criticisms of LLMs is their lack of interpretability. It can be difficult to understand how these models arrive at their outputs, which raises concerns about bias, fairness, and trustworthiness. While some architectures like decision trees and rule-based systems are more interpretable, they often lack the performance of deep learning models. KANs, with their learnable activation functions and more interpretable structure, could help address this issue. By integrating KANs into LLMs, researchers could gain more insights into how the models process and generate language, potentially leading to more transparent and explainable AI systems that outperform other interpretable architectures.
  3. Few-shot learning: While LLMs have shown impressive few-shot learning capabilities, they still require substantial amounts of data and compute to achieve optimal performance. Other architectures like Siamese networks and metric learning approaches have been used for few-shot learning, but they may not scale as well to complex language tasks. KANs’ ability to learn both compositional structure and univariate functions more efficiently could help LLMs learn from fewer examples, potentially outperforming existing few-shot learning approaches in the language domain.
  4. Knowledge representation and reasoning: LLMs have demonstrated some ability to store and retrieve knowledge, as well as perform basic reasoning tasks. However, their ability to represent and manipulate complex, structured knowledge is still limited. Graph neural networks (GNNs) and knowledge graphs have been used to represent structured knowledge, but integrating them with language models remains challenging. KANs’ more interpretable and modular structure could potentially help LLMs better represent and reason over structured knowledge, offering a more seamless integration of knowledge representation and language modeling compared to existing approaches.
  5. Multimodal learning: While LLMs primarily focus on text data, there is growing interest in multimodal models that can process and generate multiple types of data, such as images, audio, and video. Architectures like Vision Transformers and Multimodal Transformers have shown promise in this area, but they can be computationally intensive and may not fully exploit the unique characteristics of each modality. KANs’ ability to learn compositional structures and optimize learned features could make them well-suited for multimodal learning tasks, potentially leading to more efficient and effective multimodal models compared to existing architectures.

The integration of Kolmogorov-Arnold Networks into large language models could lead to significant advancements in generative AI, potentially outperforming existing neural network architectures in terms of efficiency, interpretability, few-shot learning, knowledge representation, and multimodal learning. As research in this area progresses, we can expect to see new and exciting applications of LLMs powered by KAN architectures, pushing the boundaries of what is possible with generative AI and setting new standards for performance and capabilities in the field.

So What’s The Catch?

“Currently, the biggest bottleneck of KANs lies in its slow training. KANs are usually 10x slower than MLPs, given the same number of parameters. We should be honest that we did not try hard to optimize KANs’ efficiency though, so we deem KANs’ slow training more as an engineering problem to be improved in the future rather than a fundamental limitation. If one wants to train a model fast, one should use MLPs. In other cases, however, KANs should be comparable or better than MLPs, which makes them worth trying. The decision tree in Figure 6.1 can help decide when to use a KAN. In short, if you care about interpretability and/or accuracy, and slow training is not a major concern, we suggest trying KANs.”

This is the main drawback. It’s a quote I pulled directly from the paper, and I find it interesting that the main limitation is training speed rather than accuracy.

When should I use them?

Decision chart from the paper: deciding when to use KANs vs. MLPs

Conclusion

Kolmogorov-Arnold Networks represent a significant step forward in neural network architecture. By combining the strengths of splines and MLPs, KANs offer improved accuracy and interpretability compared to traditional approaches. As research into KANs continues, we can expect to see further improvements and applications across various domains. This exciting development opens up new opportunities for advancing machine learning models and their use in scientific discovery.

You can view the original paper via the embedded link here.
