Quantum Support Vector Machine (QSVM)

--

Classical Support Vector Machine

Classical Support Vector Machine (Source: Wikipedia)

Support Vector Machine (SVM) is a well-known Machine Learning method used to classify data into labels. Developed in the ’60s, the idea behind SVM is to find the hyperplane that maximizes the width of the ‘street’ separating the classes in the data. The figure above shows the case of two features. For three features, SVM tries to find the best plane, and for four or more features, the best hyperplane.

To perform the optimization, the ML model minimizes ‖w‖ subject to the following constraint:

yᵢ(w ⋅ xᵢ − b) ≥ 1, for all i

in which yᵢ is the label (i.e. −1 or 1), w is the normal vector to the hyperplane, xᵢ is the feature vector, and b is the bias.

One of the main benefits of SVM is that, unlike many ML methods that rely on the Gradient Descent Algorithm to fit the model, the support vector machine does not get stuck in local optima, since training it is a convex optimization problem. This alone makes SVM one of the strongest classification models available to us.

However, by its nature, the Support Vector Machine depends on how the data is distributed to find a good separating hyperplane. When no such hyperplane exists, SVM needs another ingredient to make the classification possible: the kernel.

2-D Data hard to classify

Kernel

Example of kernel

A kernel corresponds to a feature map that takes the data points and maps them into a new space in which the dataset becomes easier to classify. It can work, for instance, by adding new dimensions to the data (as in the image above) or by redistributing the dataset within the given domain.

Although computing the mapping itself may be very expensive, SVM does not need the mapped vectors explicitly: the optimization only involves dot products between pairs of mapped data points, which is exactly what the kernel function k(xᵢ, xⱼ) = ϕ(xᵢ) ⋅ ϕ(xⱼ) returns. This ‘kernel trick’ makes kernels extremely useful for classifying complex datasets.
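For reference, the textbook dual form of the SVM problem makes this explicit:

maximize Σᵢ αᵢ − ½ Σᵢ Σⱼ αᵢ αⱼ yᵢ yⱼ k(xᵢ, xⱼ)
subject to αᵢ ≥ 0 and Σᵢ αᵢ yᵢ = 0

The data appears only inside k(xᵢ, xⱼ), so the mapped vectors ϕ(xᵢ) never need to be computed explicitly.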

Example of redistributing data in the given domain (Source: Wikipedia)

There are many kernel functions in use. Sklearn, a popular machine learning library, ships with 3 particularly useful ones (a quick comparison follows the list):

  • Polynomial → k(xᵢ, xⱼ) = (xᵢ ⋅ xⱼ)ᵈ, linear if d = 1
  • Radial Basis Function (RBF) → k(xᵢ, xⱼ) = exp(−γ‖xᵢ − xⱼ‖²), for γ > 0
  • Sigmoid → k(xᵢ, xⱼ) = tanh(κ xᵢ ⋅ xⱼ + c), for κ > 0 and c < 0
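As a minimal sketch, here is how the three kernels compare in Sklearn on a toy dataset (the dataset and parameters are illustrative, not from the experiments below):

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy data: two concentric circles, impossible to separate with a line
X, y = make_circles(n_samples=200, noise=0.1, factor=0.4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Compare the three built-in kernels on the same split
for kernel in ["poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))
```

On data like this, RBF typically wins, which is why it is the baseline used later in this article.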

Even with kernel mapping, however, finding a suitable feature map can become so computationally expensive and inefficient that using SVM stops being practical. For such cases, we can apply what is known as a quantum kernel.

Quantum Kernel

The idea behind quantum kernels is to take advantage of quantum mechanics to achieve better results while mapping the feature vector. The general recipe is: encode the classical data into qubits, apply quantum operations (such as creating superpositions and performing rotations on the Bloch sphere), and then compute the inner product of the resulting states.
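In the fidelity-based formulation commonly used in the literature, if |ϕ(x)⟩ denotes the state that encodes the data point x, the kernel entry is simply the overlap of the encoded states:

k(xᵢ, xⱼ) = |⟨ϕ(xᵢ)|ϕ(xⱼ)⟩|²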

Several proposals for such kernels have been made. For instance, Aram Harrow et al. proposed the following kernel circuit:

Quantum Kernel

Here, U is a unitary gate parameterized by the feature vector. For a classical feature function ϕ it takes the form

U_ϕ(x) = exp( i Σ_{S ⊆ [n]} ϕ_S(x) Π_{i ∈ S} Zᵢ )

in which S ranges over the subsets of the n qubit indices.

Complex as they may be, quantum kernels must perform operations that are not efficiently available to a classical machine in order to outperform one. In the case above, the circuit combines Hadamard gates with Pauli-Z operations to obtain such an advantage.

Currently, Qiskit has 2 ready-made feature maps available, plus a more general one that lets the user configure the circuit that encodes the classical data.

ZFeatureMap

The idea of ZFeatureMap is that, for each dimension of the feature vector, there is a qubit that goes through a Hadamard gate followed by a phase rotation parameterized by the corresponding feature value.

ZFeatureMap circuit

As shown, the circuit is ‘linear’ in the sense that no information crosses over between qubits (i.e. there are no CNOT gates), so the resulting state contains no entanglement.
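A minimal sketch of how one might build and inspect this circuit (assuming ZFeatureMap from qiskit.circuit.library; feature_dimension and reps are illustrative):

```python
from qiskit.circuit.library import ZFeatureMap

# One qubit per feature; each qubit gets H followed by a data-dependent phase rotation
feature_map = ZFeatureMap(feature_dimension=2, reps=2)
print(feature_map.decompose().draw())
```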

ZZFeatureMap

The ZZFeatureMap adds entanglement to the quantum system in the following way.

ZZFeatureMap circuit (Source: Medium)
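A corresponding sketch, under the same assumptions as above (the entanglement argument controls which qubit pairs are coupled):

```python
from qiskit.circuit.library import ZZFeatureMap

# Entangling feature map: CNOTs couple the qubits in addition to the phase rotations
feature_map = ZZFeatureMap(feature_dimension=2, reps=2, entanglement="linear")
print(feature_map.decompose().draw())
```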

PauliFeatureMap

The idea behind PauliFeatureMap is to use Pauli operators – X, Y, or Z, and tensor products of them – as generators of rotations, so that the state rotates by a data-dependent amount on the Bloch sphere.

PauliFeatureMap circuit with Z, Y, and ZZ operators (Source: Medium)
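A sketch reproducing the configuration in the figure (assuming Qiskit's PauliFeatureMap class; the paulis argument chooses the rotation generators):

```python
from qiskit.circuit.library import PauliFeatureMap

# Rotations generated by Z, Y, and ZZ Pauli terms, as in the figure above
feature_map = PauliFeatureMap(feature_dimension=2, reps=1, paulis=["Z", "Y", "ZZ"])
print(feature_map.decompose().draw())
```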

Applying Quantum Kernels in ML

Now the next step is to apply these ideas to datasets. Qiskit already provides datasets suitable for benchmarking QSVM against regular SVM. The experiments here used the data displayed in the first section of this article (‘2-D data hard to classify’).

  1. First, we are going to import Qiskit and set up our dataset:
Code to get data
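Since the original code appears as an image, here is a sketch of an equivalent setup, assuming the ad_hoc_data helper from qiskit-machine-learning (older Qiskit releases shipped it under qiskit.ml.datasets, and the sample sizes below are illustrative):

```python
from qiskit_machine_learning.datasets import ad_hoc_data

# 2-feature ad hoc dataset, the same family as the figure in the first section
train_features, train_labels, test_features, test_labels = ad_hoc_data(
    training_size=20,  # samples per class for training
    test_size=10,      # samples per class for testing
    n=2,               # number of features / qubits
    gap=0.3,           # separation gap between the classes
    one_hot=False,     # integer labels instead of one-hot vectors
)
```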

2. Then, we are going to get the backend running for our experiment in Qiskit and define our feature map to set up our kernel:
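As a hedged sketch: the backend/quantum-instance API from the time of writing has since been reworked, so here is how the same kernel looks with the current FidelityQuantumKernel (which runs on a local simulator by default):

```python
from qiskit.circuit.library import ZZFeatureMap
from qiskit_machine_learning.kernels import FidelityQuantumKernel

# Entangling feature map on 2 qubits defines the data encoding
feature_map = ZZFeatureMap(feature_dimension=2, reps=2, entanglement="linear")

# Fidelity-based quantum kernel: k(x, y) = |<phi(x)|phi(y)>|^2
quantum_kernel = FidelityQuantumKernel(feature_map=feature_map)
```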

3. Now we run the experiments:

QSVM (1st cell); Classical SVM w/ Quantum Kernel (2nd cell); Classical SVM w/ RBF kernel (3rd cell)
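Sketches of the three cells (QSVC is the current replacement for the deprecated QSVM class; variables carry over from the previous snippets):

```python
from sklearn.svm import SVC
from qiskit_machine_learning.algorithms import QSVC

# Cell 1: QSVM, i.e. an SVC driven end to end by the quantum kernel
qsvc = QSVC(quantum_kernel=quantum_kernel)
qsvc.fit(train_features, train_labels)
print("QSVC accuracy:", qsvc.score(test_features, test_labels))

# Cell 2: classical SVC with the quantum kernel passed in as a callable
svc_qk = SVC(kernel=quantum_kernel.evaluate)
svc_qk.fit(train_features, train_labels)
print("SVC + quantum kernel accuracy:", svc_qk.score(test_features, test_labels))

# Cell 3: classical SVC with the RBF kernel as the baseline
svc_rbf = SVC(kernel="rbf")
svc_rbf.fit(train_features, train_labels)
print("SVC + RBF accuracy:", svc_rbf.score(test_features, test_labels))
```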

Results I

Although the quantum kernels took roughly 4 minutes per run in total, they correctly classified all 40 data points, while the classical counterpart got only 55% of its predictions right, performing about as well as flipping a coin. Much of the difference in time can be attributed to the overhead of running the experiments through the backend, among other factors, though this was not profiled in detail.

New dataset – ‘Success’ in Tennis

To test it again, I created models to predict success in tennis from a player’s age and interest in the sport. The dataset was taken from Kaggle and is useful for this experiment because it has only 2 features (I tried to run datasets with 3 or more features but hit several problems).

Tennis dataset – yellow points are success cases (1) and purple points are failure cases (0)

Although its labels are easier to separate, the tennis dataset proved hard for the quantum kernels to learn, while the classical SVM got it nearly all right. Let’s take a look at the code:

  1. & 2. are similar to the previous experiment:
Code for the dataset. Note the change of feature map
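A sketch of the setup; the CSV file name and column names are hypothetical placeholders for the Kaggle data, and the new feature map is assumed here to be ZFeatureMap:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from qiskit.circuit.library import ZFeatureMap
from qiskit_machine_learning.kernels import FidelityQuantumKernel

# Hypothetical file and column names for the Kaggle tennis dataset
df = pd.read_csv("tennis.csv")
X = df[["age", "interest"]].to_numpy()
y = df["success"].to_numpy()
train_features, test_features, train_labels, test_labels = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# The change of feature map noted in the caption (assumed to be ZFeatureMap)
feature_map = ZFeatureMap(feature_dimension=2, reps=2)
quantum_kernel = FidelityQuantumKernel(feature_map=feature_map)
```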

3. Here, we try the very same code:

QSVM (1st cell); Classical SVM w/ Quantum Kernel (2nd cell); Classical SVM with RBF as the kernel (3rd cell)

Results II

As shown above, the quantum kernel was heavily outperformed by the classical SVM. While the Sklearn model reached an accuracy of 90.9% with the Radial Basis Function kernel, the Qiskit QSVM reached only 52.5% accuracy.

Final Thoughts

Undoubtedly, Quantum Machine Learning will improve current prediction models. In particular, the Support Vector Machine has proved to be a very effective method for obtaining better results than classical algorithms on certain datasets.

However, current implementations of the Quantum Support Vector Machine face some hardships one should pay attention to. While quantum kernels are better in some cases, they still lag behind in optimization time. This will most likely improve as better quantum computers are built and new optimization algorithms are introduced to the quantum computing world.

Secondly, as of now, Qiskit has some limitations concerning the number of features in the feature vector. This seems to be a temporary problem, as the IBM team works to improve the algorithm and make it accessible to larger datasets.

Most importantly, however, quantum kernels may have an optimal dataset type to work with. As seen above, QSVM performed well where a regular SVM could not get good results. On the other hand, when the data was mathematically simple, quantum kernels had a huge problem dealing with it. This should not be read as a failure, but rather as a niche in which quantum computing can thrive. I shall explore more of that in the future.

Along those lines, we have seen several different types of quantum kernels. While some standard ones are available, it remains to be explored which kernels are best for a given dataset, or in general.

Appendix – Non-hybrid Models

As of today, Qiskit implements an algorithm that is not fully quantum. Most of the algorithm, as described above, relies on the quantum kernel mapping, but then hands back to its classical counterpart. While this is effective, there are already proposals for a fully quantum Support Vector Machine procedure.

A. Kariya and B. Behera proposed a circuit that fully implements SVM on a quantum computer. The general outline can be seen below:

General circuit for QSVM

For each part, the authors also proposed a circuit, yielding the following general circuit:

General circuit, in which F is the kernel matrix
