A Simplified Explanation Of The New Kolmogorov-Arnold Network (KAN) from MIT

Isaak Kamau
2 min read · May 1, 2024
A screenshot from: https://arxiv.org/abs/2404.19756

Exploring the Next Frontier in AI: The Kolmogorov-Arnold Network (KAN)

In the ever-evolving landscape of artificial intelligence, a new architecture is making waves, promising a revolution in how we understand and construct neural networks. Dubbed the Kolmogorov-Arnold Network (KAN), this innovative framework from MIT is poised to transform traditional models with its unique approach.

The Traditional Foundation: Multi-Layer Perceptrons (MLPs)

To appreciate the significance of KAN, it’s worth revisiting the traditional backbone of most AI applications — the Multi-Layer Perceptron (MLP). An MLP structures its computation as a stack of layered transformations, each of which can be written as:

f(x) = σ (W * x + B)

Where:
- σ denotes the activation function (like ReLU or sigmoid) introducing non-linearity,
- W symbolizes tunable weights defining connection strengths,
- B represents bias,
- x is the input.

Courtesy: https://twitter.com/ZimingLiu11

In other words, each layer multiplies the inputs by a weight matrix, adds a bias, and passes the result through an activation function. Training these networks comes down to optimizing W (and B) so the network performs well on a specific task.
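As a minimal illustration of that forward pass (the weights and inputs below are arbitrary example values, not from any trained model):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def mlp_layer(x, W, b):
    # One MLP layer: weighted sum of the inputs, plus a bias, then a fixed activation.
    return relu(W @ x + b)

# Toy example: 3 inputs -> 2 outputs.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))   # tunable weights
b = np.zeros(2)               # bias
x = np.array([0.5, -1.0, 2.0])
print(mlp_layer(x, W, b))
```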

Enter the Kolmogorov-Arnold Network (KAN)

KAN introduces a radical shift from the MLP paradigm by redefining the role and placement of activation functions. Unlike the fixed, non-learnable activation functions in MLPs, KAN places learnable univariate functions on the network’s edges, so they play the role of both the weights and the activations and adapt as part of the learning process.

Consider this simplified representation:

f(x1, x2) = Φ2(φ2,1(φ1,1(x1) + φ1,2(x2)))

Here:
- x1 and x2 are inputs,
- φ1,1 and φ1,2 are learnable univariate functions applied to each input; their outputs are summed and then passed through further learnable univariate functions, φ2,1 and Φ2, in the subsequent layers.
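To make that structure concrete, here is a toy, illustrative sketch of the two-input example above. It assumes each univariate function is a weighted sum of fixed Gaussian basis functions (the actual paper uses B-splines plus a base activation), and the coefficients are random placeholders rather than trained values:

```python
import numpy as np

def univariate(x, coeffs, centers, width=1.0):
    # A stand-in learnable univariate function: a weighted sum of Gaussian bumps.
    # The coefficients are the part that would be trained in a real KAN.
    basis = np.exp(-((x - centers) / width) ** 2)
    return float(coeffs @ basis)

centers = np.linspace(-2, 2, 5)
rng = np.random.default_rng(0)
# One coefficient vector per function: phi_1,1, phi_1,2, phi_2,1, Phi_2 (random here).
c11, c12, c21, C2 = (rng.normal(size=5) for _ in range(4))

def kan_forward(x1, x2):
    inner = univariate(x1, c11, centers) + univariate(x2, c12, centers)  # phi_1,1(x1) + phi_1,2(x2)
    return univariate(univariate(inner, c21, centers), C2, centers)      # Phi_2(phi_2,1(...))

print(kan_forward(0.3, -0.7))
```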

Pioneering Changes in Neural Network Architecture

KAN not only tweaks but overhauls network operations, making them more intuitive and efficient by:

- Activation at the edges: moving the activation functions onto the edges (connections) rather than into the neuron’s core, potentially altering learning dynamics and improving interpretability.
- Modular non-linearity: applying the non-linearity to each input before summing, allowing features to be treated individually and giving potentially more precise control over how each input influences the output. (A toy contrast between the two placements is sketched after this list.)
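To highlight where the non-linearity sits in each case, here is a short, non-authoritative comparison. The edge functions below are fixed stand-ins; in an actual KAN they would be learnable splines:

```python
import numpy as np

def mlp_node(x, w, b):
    # MLP: sum the weighted inputs first, then apply one fixed activation at the node.
    return np.tanh(w @ x + b)

def kan_node(x, edge_funcs):
    # KAN-style: apply a univariate function on each edge first, then sum at the node.
    return sum(f(xi) for f, xi in zip(edge_funcs, x))

x = np.array([0.5, -1.0])
print(mlp_node(x, np.array([0.8, -0.3]), 0.1))
print(kan_node(x, [np.tanh, np.sin]))  # stand-in edge functions for illustration
```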

This innovative architecture could lead to networks that are not just slightly better but fundamentally more capable of handling complex, dynamic tasks.

For more details on this groundbreaking work, see the original paper: https://arxiv.org/abs/2404.19756

Connect:

If you like this topic and you want to support me:

  1. Clap my article 10 times; that will help me out.👏
  2. Follow me on Medium and LinkedIn to get my latest articles 🫶
