Training Quantum Kernels for Machine Learning Using Qiskit

Published in Qiskit · May 11, 2022 · 7 min read

By Caleb Johnson, IBM Quantum Developer

In recent years, machine learning has rapidly grown from a promising area of research to an economically significant, fast-growing industry. There is widespread interest in developing new techniques for creating better-performing models while using fewer resources, and quantum kernel training is a natural extension of this effort.

Developing quantum kernel-based techniques does not necessarily require fault tolerance, and these techniques can be tested on near-term devices. Additionally, quantum kernel estimation has the potential to provide exponential quantum speedup via the use of quantum-enhanced feature spaces [1]. In this blog, we will review what kernels are in a machine learning context, discuss what distinguishes quantum kernels from their classical counterparts, and finally, learn how to use Qiskit to estimate and train a quantum kernel for a machine learning task.

What are quantum kernels and what do they do?

In machine learning, kernels are often used in conjunction with linear methods to handle linearly inseparable data, that is, data that no straight line (or flat plane) can split into its classes. Kernel methods allow one to map the input data to an implicit, higher-dimensional feature space, in which the problem may become easier to solve, without having to explicitly compute the feature mapping. For example, a kernel used with a support vector machine may make the data more linearly separable in the feature space, leading to improved classification accuracy.

To illustrate this point, consider the one-dimensional data visualized in the image below. This data contains two classes, and it is clear that these classes cannot be perfectly separated by a linear model (i.e., there is no single line we can draw that separates all the 0 labels from the 1 labels). We will need to use a kernel method to map this data to a higher-dimensional space where it can be separated by a linear model.

This data cannot be separated into two groups by one line.

The mapping we will choose to transform this data adds a second feature, x₂ = x₁², to each sample x₁.

As seen below, mapping our one-dimensional data into this new two-dimensional space makes the data perfectly separable by a line. In practice, kernel methods gain this benefit implicitly: a kernel function supplies the inner products between mapped samples without ever constructing the mapping. This is the “kernel trick”.
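To make this concrete, here is a minimal sketch (with made-up sample values) of the explicit version of this mapping, fitting a linear SVM on the lifted data:

import numpy as np
from sklearn.svm import SVC

# Made-up 1D samples: class 0 in the middle, class 1 on the outside
x1 = np.array([-2.0, -1.5, -0.25, 0.0, 0.5, 1.75, 2.5])
y = np.array([1, 1, 0, 0, 0, 1, 1])

# Lift to 2D with the added feature x2 = x1**2
X = np.column_stack([x1, x1**2])

# A linear SVM now separates the lifted data perfectly
clf = SVC(kernel="linear").fit(X, y)
print(clf.score(X, y))  # 1.0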

What is a quantum kernel?

Like their classical counterparts, quantum kernels are similarity measures between pairs of data samples encoded in a high-dimensional feature space. The distinction is that quantum kernels use quantum circuits as feature maps. This allows for the exploitation of an exponentially large quantum state space during kernel matrix calculation without having to store the exponentially large vector representations of that space in computer memory.
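As a concrete illustration, here is a minimal sketch, using statevector simulation and arbitrary sample values, of the quantity a quantum kernel computes for a single pair of samples:

import numpy as np
from qiskit.circuit.library import ZZFeatureMap
from qiskit.quantum_info import Statevector

# Two arbitrary two-dimensional samples (illustrative values only)
x_a = np.array([0.1, 0.7])
x_b = np.array([0.4, 0.3])

fm = ZZFeatureMap(feature_dimension=2)

# |φ(x)⟩ = U(x)|00⟩ for each sample
state_a = Statevector(fm.assign_parameters(x_a))
state_b = Statevector(fm.assign_parameters(x_b))

# Kernel entry: the squared overlap |⟨φ(x_b)|φ(x_a)⟩|²
print(np.abs(state_b.inner(state_a)) ** 2)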

Are quantum kernels better than normal kernels?

Not necessarily. We know that a quantum advantage can only be obtained if the quantum kernel circuit is hard to estimate classically [2]. In general, finding a good quantum kernel for a given dataset can be challenging.

How do I choose a quantum kernel?

Designing a quantum kernel is a process that strongly depends on the learning problem at hand. In general, we cannot suggest an optimal feature mapping with no prior knowledge of the learning problem.

Sometimes structure in the data can inform the selection of a kernel [1], [3]; whereas other times kernels may be chosen in an ad hoc manner. Another option is to train a parametrized quantum kernel on a labeled dataset using Qiskit.

How do I train and evaluate a quantum kernel using Qiskit?

We can perform quantum kernel training in a few steps using the qiskit-machine-learning package. For the remainder of this blog post, we will walk through the steps to perform kernel training for a binary classification task. Keep in mind, however, that Qiskit’s quantum kernel training interface can also be used for a variety of other tasks, as it allows users to specify custom inputs (i.e. loss functions, optimizers, etc.).

Now let’s see how it works!

1. First, we will visualize some scattered two-dimensional data with accompanying binary class labels. It is clear from the plot that this data cannot be perfectly separated by a linear classifier alone. We will later see how quantum kernels can be used to map this data onto a space where a linear classifier can perfectly separate it. (A hypothetical stand-in for this dataset is sketched below.)
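The dataset from the original plot is not reproduced here; the following is a hypothetical stand-in with the same character (two classes in two dimensions that no single line can separate), and it defines the X_train, y_train, X_test, and y_test used in the later steps:

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

# Two concentric classes: not linearly separable in the plane
X, y = make_circles(n_samples=40, noise=0.05, factor=0.4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)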

2. Next, we will design the feature map, which will map the classical input data into the quantum state space. For this blog post, we will use a modified version of the ZZFeatureMap from qiskit-terra, with a trainable rotation gate prepended to each qubit in the circuit.

Note: There is nothing to suggest that the addition of this particular trainable parameter improves the ZZFeatureMap in any way; we use it here simply to demonstrate how to train quantum kernel parameters using Qiskit.

from qiskit import QuantumCircuit
from qiskit.circuit import Parameter
from qiskit.circuit.library import ZZFeatureMap

NUM_QUBITS = 2

# Specify a single training parameter to share across both qubits
training_params = [Parameter("θ")]

# Create the feature map
zzfm = ZZFeatureMap(feature_dimension=NUM_QUBITS)

# Prepend a trainable Ry gate to the beginning of each qubit
feature_map = QuantumCircuit(NUM_QUBITS)
feature_map.ry(training_params[0], 0)
feature_map.ry(training_params[0], 1)
feature_map = feature_map.compose(zzfm)

print(f"training_params: {training_params}")
feature_map.draw()
--------------------------------------------------------------------
OUT:
training_params: [Parameter(θ)]
     ┌───────┐┌──────────────────────────┐
q_0: ┤ Ry(θ) ├┤0                         ├
     ├───────┤│  ZZFeatureMap(x[0],x[1]) │
q_1: ┤ Ry(θ) ├┤1                         ├
     └───────┘└──────────────────────────┘

3. Once we have designed a feature map, we can set up a QuantumKernel object and specify which of its feature map parameters should be trained via the user_parameters field.

from qiskit_machine_learning.kernels import QuantumKernel

qk = QuantumKernel(feature_map, user_parameters=training_params)

4. Next, we instantiate a QuantumKernelTrainer object and specify the quantum kernel, loss function, and classical optimizer to be used during training. The SVCLoss object we use in this example implements a loss function corresponding to a weighted kernel alignment for support vector classification [3]. Remember, the loss essentially calculates how far off we are from the optimal answer.

During each call to the SVCLoss object from the optimizer, the updated training parameters are assigned to the quantum kernel, and it is used to train a new linear support vector classifier (SVC). The SVC’s dual coefficients and support vectors are then used to calculate the value of the loss over the updated parameters.
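Conceptually, each loss evaluation looks something like this simplified sketch (an illustration of the idea, not the library's actual implementation; svc_loss_sketch is a hypothetical helper):

import numpy as np
from sklearn.svm import SVC

def svc_loss_sketch(kernel_matrix, labels, C=1.0):
    # Fit an SVC directly on the precomputed kernel matrix
    svc = SVC(kernel="precomputed", C=C)
    svc.fit(kernel_matrix, labels)
    # dual_coef_ holds the signed dual coefficients α_i * y_i
    dual = svc.dual_coef_[0]
    support = svc.support_
    k_sv = kernel_matrix[np.ix_(support, support)]
    # SVM dual objective: Σ α_i − ½ Σ α_i α_j y_i y_j K(x_i, x_j)
    return np.sum(np.abs(dual)) - 0.5 * dual @ k_sv @ dual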

import numpy as np
from qiskit.algorithms.optimizers import SPSA
from qiskit_machine_learning.utils.loss_functions import SVCLoss
from qiskit_machine_learning.kernels.algorithms import QuantumKernelTrainer

# Use SPSA to optimize the user parameters
spsa_opt = SPSA(maxiter=20, learning_rate=0.03, perturbation=0.01)

# Set up the loss object and pass a parameter to the underlying sklearn.svm.SVC
loss_func = SVCLoss(C=1.0)

# Instantiate the QuantumKernelTrainer with the fields required for training
qkt = QuantumKernelTrainer(
    quantum_kernel=qk,
    loss=loss_func,
    optimizer=spsa_opt,
    initial_point=[np.pi / 2],
)

5. Finally, we train the quantum kernel on the data visualized above and retrieve the optimized kernel matrix.

# Train the quantum kernel and retrieve the optimized kernel
results = qkt.fit(X_train, y_train)
optimized_kernel = results.quantum_kernel
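The result object also records the optimizer's outcome; assuming the attribute names of qiskit-machine-learning's variational result classes, it can be inspected like so:

# Attribute names assumed from qiskit-machine-learning's result objects
print(results.optimal_point)  # optimized value of θ
print(results.optimal_value)  # final loss value reached by SPSA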

6. Since we are considering a binary classification task, we can evaluate the effectiveness of the kernel on the test set using the QSVC model from qiskit-machine-learning. We will check the balanced accuracy, visualize the decision boundary created by the quantum kernel, and visualize the convergence of the training loss.

Note: The mlxtend package was used to generate the QSVC boundary image, but for the sake of brevity, we chose not to include the code used to generate the image.

from sklearn import metrics
from qiskit_machine_learning.algorithms import QSVC

# Use QSVC to evaluate the kernel we learned
qsvc = QSVC(quantum_kernel=optimized_kernel)

# Fit the QSVC
qsvc.fit(X_train, y_train)

# Predict the labels
y_pred = qsvc.predict(X_test)

# Evaluate the balanced accuracy on the test set
accuracy = metrics.balanced_accuracy_score(y_true=y_test, y_pred=y_pred)
print(f"Accuracy: {accuracy}")
--------------------------------------------------------------------
OUT:
Accuracy: 1.0

When we inspect the decision boundary created by the quantum kernel overlaid with the test samples (above), we see that the quantum kernel was very effective at providing a boundary between samples of different classes.
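For reference, a plot along those lines can be generated with mlxtend's plot_decision_regions; this is a sketch of one way to do it, not the exact code behind the figure:

import matplotlib.pyplot as plt
from mlxtend.plotting import plot_decision_regions

# Draw the QSVC decision regions over the test samples
plot_decision_regions(X_test, y_test, clf=qsvc)
plt.title("QSVC decision boundary")
plt.show()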

In the graph below, we can see the loss converging as the model approaches the optimal parameter value.

How do I learn more about quantum kernel training?

The best place to learn more about quantum kernel training is the quantum-kernel-training prototype repository in the qiskit-community GitHub organization. We encourage everyone to send questions or comments to the repository's discussion board, and the prototype team will respond as quickly as we can. We look forward to hearing from you!

References

[1] Liu et al., “A rigorous and robust quantum speed-up in supervised machine learning”

[2] Havlíček et al., “Supervised learning with quantum-enhanced feature spaces”

[3] Glick et al., “Covariant quantum kernels for data with group structure”
