Kolmogorov-Arnold Networks (KANs) vs. Multi-Layer Perceptrons (MLPs) — A Comparison

Siddharth Sudhakar · Accredian · May 20, 2024

The field of Artificial Intelligence (AI) is constantly evolving, with researchers pushing the boundaries of what’s possible. A new type of neural network architecture called Kolmogorov-Arnold Networks (KANs) has recently emerged. If you’re looking to understand what KANs are or the intuition behind them, a separate Medium article explores the concepts behind KANs in more depth.

This article will explore how KANs differ from traditional neural networks, also known as multi-layer perceptrons (MLPs). We’ll also examine the challenges that KANs must overcome to replace traditional neural networks.

Table of Contents

  1. Introduction
  2. Advantages of KANs over MLPs
  3. Comparing KANs and MLPs Through Code
  4. Limitations of KANs
  5. Conclusion

Introduction

Kolmogorov Arnold Networks

Kolmogorov-Arnold Networks are inspired by the Kolmogorov-Arnold Representation Theorem, a mathematical result which states that any continuous multivariate function can be represented as a finite composition of continuous univariate functions and addition.
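Formally, for any continuous function f of n variables on the unit cube, the theorem guarantees a representation of the form:

$$f(x_1, \ldots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right)$$

where each φ_{q,p} and Φ_q is a continuous function of a single variable. In other words, addition is the only truly multivariate operation required.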

KANs use this idea by replacing the traditional linear weights in neural networks with more flexible, spline-parametrized univariate functions. This design choice allows KANs to adapt their activation patterns dynamically.
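To make this concrete, here is a minimal illustrative sketch (not pykan’s actual implementation) of what a single KAN “edge” looks like: a univariate function parametrized as a linear combination of B-spline basis functions, whose coefficients are the trainable parameters.

import numpy as np
from scipy.interpolate import BSpline

# Illustrative sketch only: one KAN "edge" as a learnable univariate function,
# modeled here as a cubic B-spline whose coefficients are the trainable weights.
k = 3                                          # spline degree (cubic)
grid = np.linspace(-1, 1, 7)                   # grid points over the input range
knots = np.concatenate(([grid[0]] * k, grid, [grid[-1]] * k))
coeffs = np.random.randn(len(knots) - k - 1)   # the trainable parameters
edge_fn = BSpline(knots, coeffs, k)

x = np.linspace(-1, 1, 5)
print(edge_fn(x))                              # the edge's output at these inputs

Training a KAN amounts to adjusting coefficients like these on every edge, rather than adjusting scalar weights.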

Overview of KAN’s model architecture. Image from the research paper

Multi-Layer Perceptron

Multi-layer perceptrons, often simply referred to as “neural networks,” have been the workhorses of machine learning for decades. They are structured like a chain of interconnected nodes, or “neurons,” organized into layers.

Each connection between these neurons carries a weight that determines the strength of the signal passing through. By adjusting these weights during training, MLPs can learn to recognize patterns in data and make predictions.

Overview of MLP’s model architecture. Image from the research paper

The activation functions in MLPs are fixed and live on the nodes, rather than on the edges as in KANs.
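The research paper summarizes this contrast compactly: an MLP alternates learned linear maps with fixed nonlinearities, while a KAN is simply a composition of layers of learned univariate functions:

$$\mathrm{MLP}(\mathbf{x}) = (W_3 \circ \sigma \circ W_2 \circ \sigma \circ W_1)(\mathbf{x}), \qquad \mathrm{KAN}(\mathbf{x}) = (\Phi_3 \circ \Phi_2 \circ \Phi_1)(\mathbf{x})$$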

Advantages of KANs Over MLPs

Efficiency

KANs are generally more parameter-efficient than MLPs: in the paper’s experiments, they achieve comparable or better accuracy with far fewer parameters. This is particularly evident in function-fitting (regression) tasks and in solving partial differential equations (PDEs), where small KANs significantly outperform much larger MLPs.
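As a rough accounting from the paper: a depth-L, width-N MLP has

$$O(N^2 L) \text{ parameters}, \qquad \text{versus} \qquad O(N^2 L (G + k)) \text{ for a comparable KAN}$$

where G is the spline grid size and k the spline order. A KAN layer is therefore more expensive per connection, but the paper’s experiments suggest KANs often need a much smaller N (and L) to reach the same accuracy, yielding a net saving.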

Interpretability

KANs offer better interpretability because of their use of splines. The activation functions on edges can be visualized and understood more intuitively than the fixed, node-based activations in MLPs. This makes KANs particularly useful in scientific applications where understanding the model’s behavior is crucial.

Avoiding Catastrophic Forgetting

KANs have shown an ability to avoid catastrophic forgetting, a common issue in neural networks where learning new information can cause the model to forget previously learned information. This is due to the local plasticity of spline-based activations, which only affect nearby spline coefficients and leave far-away coefficients intact.
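A small, self-contained sketch of this locality (illustrative only, using SciPy rather than pykan): perturbing one spline coefficient changes the function only where that coefficient’s basis function is nonzero.

import numpy as np
from scipy.interpolate import BSpline

# "Local plasticity" in miniature: updating one B-spline coefficient changes
# the function only near that coefficient's support, leaving the rest intact.
k = 3
grid = np.linspace(-1, 1, 9)
knots = np.concatenate(([grid[0]] * k, grid, [grid[-1]] * k))
c = np.zeros(len(knots) - k - 1)
f_before = BSpline(knots, c.copy(), k)

c[2] = 1.0                                # "learn" something near x = -1
f_after = BSpline(knots, c, k)

print(f_after(-0.9) - f_before(-0.9))     # changes near the update
print(f_after(0.9) - f_before(0.9))       # ~0 far away: old knowledge intact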

MLP vs. KAN. Image from the research paper

The research paper includes a decision tree that can assist in determining when to use a KAN. In summary, if interpretability and/or accuracy are important, and slow training is not a significant concern, the authors recommend considering KANs.

Should I use KANs or MLPs? Image from the research paper

Comparing KANs and MLPs Through Code

First, let’s take a look at a KAN classifier for the make_moons dataset in scikit-learn, which consists of points belonging to two interleaving half circles. We’ll begin by creating the dataset.

We need to import the necessary libraries and create the dataset using the following code:

import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
import torch
import numpy as np

# Generate separate train and test splits of the two-moons dataset
dataset = {}
train_input, train_label = make_moons(n_samples=1000, shuffle=True, noise=0.1, random_state=None)
test_input, test_label = make_moons(n_samples=1000, shuffle=True, noise=0.1, random_state=None)

# pykan expects the data as a dictionary of torch tensors
dataset['train_input'] = torch.from_numpy(train_input)
dataset['test_input'] = torch.from_numpy(test_input)
dataset['train_label'] = torch.from_numpy(train_label)
dataset['test_label'] = torch.from_numpy(test_label)

# Visualize the training data, colored by class label
X = dataset['train_input']
y = dataset['train_label']
plt.scatter(X[:, 0], X[:, 1], c=y[:])

The dataset we have created has been visualized below.

Before writing code for the KAN model, we need to install the pykan library:

!pip install pykan

Now that the library is installed, let’s import and define the model.

from kan import KAN

# Two input features, two output classes; grid=3 spline intervals, k=3 (cubic splines)
model = KAN(width=[2,2], grid=3, k=3)

def train_acc():
    return torch.mean((torch.argmax(model(dataset['train_input']),
                       dim=1) == dataset['train_label']).float())

def test_acc():
    return torch.mean((torch.argmax(model(dataset['test_input']),
                       dim=1) == dataset['test_label']).float())

# Train with LBFGS for 20 steps, tracking both accuracies
results = model.train(dataset, opt="LBFGS", steps=20,
                      metrics=(train_acc, test_acc),
                      loss_fn=torch.nn.CrossEntropyLoss())

Upon executing the code block above, the following output is generated:

train loss: 4.71e-10 | test loss: 1.77e-07 | reg: 1.54e+02 : 100%|██| 20/20 [00:04<00:00,  4.10it/s]

Let’s now fix each learned spline activation to the best-fitting symbolic function from a small library of candidates.

# Candidate symbolic functions to match against each learned spline
lib = ['x','x^2','x^3','x^4','exp','log','sqrt','tanh','sin','abs']
model.auto_symbolic(lib=lib)

Upon executing the code block above, the following output is generated:

fixing (0,0,0) with sin, r2=0.9497780818149338
fixing (0,0,1) with sin, r2=0.9264025193127776
fixing (0,1,0) with sqrt, r2=0.9948629426193356
fixing (0,1,1) with sqrt, r2=0.9948216854724679

With every activation fixed, we can extract the symbolic formulas for the two output logits.

formula1, formula2 = model.symbolic_formula()[0]
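The returned objects are SymPy expressions, so you can print them to inspect the closed-form model the KAN has recovered (the exact expressions vary from run to run):

print(formula1)
print(formula2)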

Let’s examine how accurate these formulas are by computing the training and test accuracies.

def acc(formula1, formula2, X, y):
    batch = X.shape[0]
    correct = 0
    for i in range(batch):
        # Evaluate both symbolic logits for the i-th sample
        logit1 = np.array(formula1.subs('x_1', X[i,0])
                          .subs('x_2', X[i,1])).astype(np.float64)
        logit2 = np.array(formula2.subs('x_1', X[i,0])
                          .subs('x_2', X[i,1])).astype(np.float64)
        # Predict class 1 when the second logit is larger
        correct += (logit2 > logit1) == y[i]
    return correct / batch

# Print accuracy of the symbolic formulas
print('train acc of the formula:', acc(formula1, formula2,
                                       dataset['train_input'],
                                       dataset['train_label']))

print('test acc of the formula:', acc(formula1, formula2,
                                      dataset['test_input'],
                                      dataset['test_label']))

Upon running this code block, the following accuracies are obtained:

train acc of the formula: tensor(0.9990)
test acc of the formula: tensor(1.)

That completes the KAN model. Now, let’s create an MLP Classifier for this dataset using scikit-learn’s MLPClassifier model. First, we need to import the necessary libraries.

from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

Next, we extract the data from the dataset we created earlier and convert the tensors back into NumPy arrays with .numpy(), since scikit-learn works with arrays rather than tensors.

# Extract data
X_train = dataset['train_input'].numpy()
y_train = dataset['train_label'].numpy()
X_test = dataset['test_input'].numpy()
y_test = dataset['test_label'].numpy()

We also scale the features, which generally helps the MLP train well.

# Scale the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Now, we train our neural network. Here, alpha is the strength of the L2 regularization term and max_iter caps the number of training iterations; we keep scikit-learn’s default architecture of a single hidden layer with 100 neurons. After defining the model, we fit it to the data.

# Train the MLPClassifier
clf = MLPClassifier(alpha=1, max_iter=1000)
clf.fit(X_train, y_train)

With training complete, let’s review the training and test accuracies to evaluate the MLP’s performance.

# Calculate training and test accuracies
train_accuracy = clf.score(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
# Print the accuracies
print(f"Training Accuracy: {train_accuracy:.2f}")
print(f"Test Accuracy: {test_accuracy:.2f}")

We get the following output:

Training Accuracy: 0.98
Test Accuracy: 0.98

The KAN performs slightly better here, although on such a simple dataset the difference is small. Comparing KANs and MLPs on other standard benchmark datasets would give a fuller picture of when KAN models are advantageous.

Limitations of KANs

The new architecture comes with several limitations, some of which are listed below:

  1. Slow Training: KANs typically train about ten times slower than MLPs with the same number of parameters. This inefficiency is considered an engineering problem rather than a fundamental limitation, suggesting that there is room for optimization in the future.
  2. Curse of Dimensionality: While splines used in KANs are accurate for low-dimensional functions, they struggle with high-dimensional functions due to the curse of dimensionality. This is because splines cannot efficiently exploit compositional structures, which is a significant limitation compared to MLPs.
  3. Generalization and Realistic Setups: While KANs have shown promising results in avoiding catastrophic forgetting and being interpretable, it remains unclear whether they can generalize to more realistic and complex setups.

Conclusion

Kolmogorov-Arnold Networks are a novel neural network architecture with the potential to address the limitations of traditional multi-layer perceptrons (MLPs). Their efficiency, interpretability, and resilience to catastrophic forgetting make them a promising alternative in specific applications.

However, many challenges need to be addressed before they can replace traditional neural networks entirely. As research progresses, overcoming these hurdles could solidify KANs’ place as a valuable tool in the ever-evolving field of artificial intelligence.
