Building a Quantum Variational Classifier Using Real-World Data

Published in Qiskit · 12 min read · Apr 9, 2021

Interested in getting started with Qiskit Machine Learning? Learn more here.

Hello world, I’m Rodney Osodo, an undergraduate student at Jomo Kenyatta University of Agriculture and Technology in Kenya. I’ve been interested in quantum computing for a while now, and I’m excited to share what I learned from my most recent experience with it.

For my Quantum Open Source Foundation project, I built a quantum variational classifier using a heart attack dataset. The purpose of this project was to help me gain insight into the actual construction of a quantum model applied to real data. By sharing these insights, I hope to help you understand the dynamics of quantum machine learning that I came to grasp while working on this project.

Before I get started, it’s important to note that quantum machine learning is in its early stages, and this blog does not discuss, or claim to have found, any advantages in using these methods over classical machine learning methods. However, it was an exciting way to get started experimenting with this field, and I hope it will inspire you to experiment as well.

My project plan was to:

  1. Explore a specific dataset and preprocess it. For this project, we decided to use heart attack data as our baseline: from a medical standpoint, heart disease is the leading cause of death, while from a computational standpoint, the dataset is small enough to fit easily on today’s quantum computers. We also used the iris and wine datasets for validation, included at the bottom of this blog.
  2. Create a quantum neural network (AKA variational classifier) by combining a feature map, variational circuit, and measurement component (don’t worry, I will explain what these components mean in detail).
  3. Explore different types of optimizers and feature maps, as well as different depths for the feature maps and the variational circuit.
  4. Explain my observations based on the best 10 model configurations.
  5. Try to understand why these models performed the best and see if they are able to generalize well on new data.

Exploratory Data Analysis

Exploratory Data Analysis (EDA) refers to the critical process of performing initial investigations on a dataset to discover patterns, spot anomalies, test hypotheses and check assumptions. It is usually good practice in data science to explore the data before getting your hands dirty and starting to build models. I won’t dwell too much on this section, since I’d like to focus more on QML than on the data!

We also shuffled the data to introduce some randomness, removed less relevant features, and normalized the data with sklearn.preprocessing.MinMaxScaler to the range −2π to 2π. This ensures we utilize the Hilbert space appropriately, as we will be encoding the data into quantum states via rotation angles. We split the data into a training set for building the model and a test set for evaluating it, keeping the test set at 30% of the total dataset, which is fairly standard in classical machine learning.
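As a minimal sketch of this preprocessing step (with placeholder data standing in for the actual heart attack features; the real feature selection lives in the notebook):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Placeholder data standing in for the selected heart attack features.
X = np.random.rand(300, 4)
y = np.random.randint(0, 2, size=300)

# Scale features into [-2*pi, 2*pi] so they can serve as rotation angles.
scaler = MinMaxScaler(feature_range=(-2 * np.pi, 2 * np.pi))
X_scaled = scaler.fit_transform(X)

# Shuffle and hold out 30% of the data for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.3, shuffle=True, random_state=42
)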

Explaining Variational Quantum Classifiers

From this point onward, we will be using the Qiskit framework to do our quantum computing. A typical quantum machine learning model is composed of two parts: a classical part for pre- and post-processing data, and a quantum part for harnessing the power of quantum mechanics to perform certain calculations more easily, such as, perhaps, solving systems of linear equations. One of the main motivations for quantum machine learning is that it is difficult to train very large machine learning models on huge datasets. The hope is that one day, features of quantum computing can be used as resources.

A quantum neural network has many definitions in literature, but can broadly be thought of as a kind of variational quantum circuit, or one with variable parameters. With this, we can define a quantum neural network as a variational quantum circuit that can be optimized by training the parameters of the quantum circuit, which are qubit rotations, and the measurement of this circuit will approximate the quantity of interest — i.e. the label for the machine learning task.

We hope that any part of this process is better on a quantum computer. To pursue the task of classification using quantum machine learning, we construct a hybrid neural network based on a quantum variational classifier. We hope one day that quantum variational classifiers will have an advantage over certain classical models through a higher effective dimension and faster training ability.

Given a dataset of patient information, can we predict, based on the training data, whether a patient is likely to have a heart attack? This is a binary classification problem, with a real input vector x and a binary output y in {0,1}. We want to build a quantum circuit whose output is a quantum state that depends on both the input data and a set of trainable parameters.

Process

We build this quantum state by designing a quantum circuit that behaves similarly to a traditional machine learning algorithm. The quantum machine learning algorithm contains a circuit which depends on a set of parameters that, through training, we will optimize to reduce the value of a loss (aka cost) function.

In general, there are three steps to this type of quantum machine learning model: state preparation, model circuit, and measurement.

Step 1: Data encoding/state preparation

We begin by performing certain operations that will help us work the classical data into quantum circuits. One of the steps is called data embedding, which is the representation of classical data as a quantum state in Hilbert space via a quantum feature map — similar to a classical feature map, in that it helps us translate our data into a different space, in this case quantum states, so that we can input it into the algorithm. We are producing a quantum circuit in which the parameters depend on the input data, which for our case is the classical heart attack data. Recall that a variational quantum circuit depends on parameters that can be optimized by classical methods.

For embedding, we take a classical data point x and encode it by applying a set of parameterized gates in the quantum circuit, where the gate operations depend on the value of x, hence encoding our classical data xᵢ into quantum states |ϕ(xᵢ)⟩.

In this analysis, we use three different types of feature maps precoded in the Qiskit circuit library, namely ZZFeatureMap, ZFeatureMap and PauliFeatureMap. We varied the depths of these feature maps (1, 2, 4) in order to compare the different models’ performance. Increasing a feature map’s depth repeats the encoding circuit and introduces more entanglement into the model.

[Figures: circuit diagrams for the Pauli, ZZ and Z feature maps]
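As a concrete sketch, here is how these feature maps can be instantiated from the Qiskit circuit library (a feature dimension of 4 matches our data; the depth value below is illustrative):

from qiskit.circuit.library import PauliFeatureMap, ZFeatureMap, ZZFeatureMap

num_features = 4  # the number of qubits follows the input dimension of the data

# reps is the feature map depth: each repetition re-encodes the data and,
# for the entangling maps, adds another layer of entanglement.
zz_map = ZZFeatureMap(feature_dimension=num_features, reps=2)
z_map = ZFeatureMap(feature_dimension=num_features, reps=2)
pauli_map = PauliFeatureMap(feature_dimension=num_features, reps=2, paulis=["Z", "ZZ"])
print(zz_map.decompose().draw())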

Once we’ve applied our feature map, a quantum computer can analyze the input data in this feature space, and a classifier can find a hyperplane to separate the data.

Step 2: Model Circuit

The second step is the model circuit, or, strictly speaking, the classifier. We create a parameterized unitary operator U(w) such that:

|ψ(x; w)⟩ = U(w) |ϕ(x)⟩

The model circuit is constructed from gates that evolve the input state. The circuit is based on unitary operations and depends on external parameters that will be adjustable. Given a prepared state |ψ⟩, the model circuit U(w) maps |ψ⟩ to another vector |ψ′⟩ = U(w)|ψ⟩, where U(w) consists of a series of unitary gates.

We used the RealAmplitudes variational circuit from Qiskit for this. Increasing the depth of the variational circuit introduces more trainable parameters into the model.

[Figure: the RealAmplitudes variational circuit]
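A small sketch of instantiating this variational form:

from qiskit.circuit.library import RealAmplitudes

# reps sets the variational depth: each repetition adds a layer of Ry
# rotations plus CX entanglers, so deeper circuits carry more parameters.
ansatz = RealAmplitudes(num_qubits=4, reps=1)
print(ansatz.num_parameters)  # 4 qubits x (1 + 1) rotation layers = 8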

Step 3: Measurement

The final step is the measurement step, which estimates the probability of belonging to a class by performing certain measurements. It’s the equivalent of sampling multiple times from the distribution of possible computational basis states and obtaining an expectation value.

For demonstration purposes, I made some design choices: the final circuit uses a ZZFeatureMap with a depth of 1 and the RealAmplitudes variational form with a depth of 1, to give a simple illustration of how the full model works.

[Figure: the overall circuit, combining the feature map and the variational form]
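A sketch of composing this full model circuit in Qiskit:

from qiskit.circuit.library import RealAmplitudes, ZZFeatureMap

feature_map = ZZFeatureMap(feature_dimension=4, reps=1)  # state preparation
ansatz = RealAmplitudes(num_qubits=4, reps=1)            # model circuit
circuit = feature_map.compose(ansatz)  # feature map first, then the ansatz
circuit.measure_all()                  # measurement step
print(circuit.draw())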

Training

As alluded to above, during training we aim to find the parameter values that optimize a given loss function. We can perform optimization on a quantum model much as we do on a classical neural network: in both cases, we perform a forward pass of the model and calculate a loss function. We can then update our trainable parameters using gradient-based optimization methods, since the gradient of a quantum circuit can be computed. During training we use the mean squared error (MSE) as the loss function, which captures the distance between our predictions and the truth.
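As a minimal sketch (the function name is mine), the MSE loss over the predicted ‘yes’ probabilities looks like this:

import numpy as np

def mse_loss(predictions, labels):
    # Mean squared error between predicted class probabilities and the
    # true binary labels (0 or 1); lower is better.
    return float(np.mean((np.asarray(predictions) - np.asarray(labels)) ** 2))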

We will train our model using the ADAM, COBYLA and SPSA optimizers. I’ve included a brief explanation of these optimizers below, but I encourage you to read a bit further on their pros and cons.

The code can be found here. Please note that Qiskit has received an update, so some of the imports in my code will need to be replaced; for example, all from qiskit.aqua.components.optimizers import ... statements need to be changed to from qiskit.algorithms.optimizers import ... in the notebooks.
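With the updated import path just mentioned, the three optimizers can be instantiated roughly like this:

from qiskit.algorithms.optimizers import ADAM, COBYLA, SPSA

# 50 iterations each, matching the training budget used in the experiments.
optimizers = {
    "ADAM": ADAM(maxiter=50),
    "COBYLA": COBYLA(maxiter=50),
    "SPSA": SPSA(maxiter=50),
}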

Implementation

1. We initialize our circuit in the zero state (all qubits in state zero).

2. We use a feature map such as ZZFeatureMap, ZFeatureMap or PauliFeatureMap, choosing the number of qubits based on the input dimension of the data and how many repetitions (i.e. the circuit depth) we want. We use depths of 1, 3, 5.

3. We choose RealAmplitudes as the variational form and specify the number of qubits as well as how many repetitions we want. We use depths of 1, 2, 4 to obtain models with an increasing number of trainable parameters.

4. We then combine our feature map with the variational circuit; here, a ZZFeatureMap and RealAmplitudes, both with a depth of 1.

5. We create a function that associates the parameters of the feature map with the data and the parameters of the variational circuit with the trainable weights passed in. This ensures that, in Qiskit, the right variables in the circuit are associated with the right quantities (see the sketch after this list).

6. We create another function that checks the parity of the bitstring passed. If the parity is even, it returns a ‘yes’ label, and if the parity is odd, it returns a ‘no’ label. We chose this because we have two classes and a parity check returns either true or false for a given bitstring. There are other methods too: for three classes, you might convert the bitstring to a number and pass it through an activation function, or perhaps interpret the expectation values of a circuit as probabilities. The important thing to note is that there are multiple ways to assign labels from the output of a quantum circuit, and you need to justify why or how you do this. In our case, the parity idea was originally motivated in this very nice paper (https://arxiv.org/abs/1804.11326), and the details are contained therein.

7. Now we create a function that returns the probability distribution over the model classes. After measuring the quantum circuit multiple times (i.e. with multiple shots), we aggregate the probabilities associated with ‘yes’ and ‘no’ respectively, to get probabilities for each label.

8. Finally, we create a function that classifies our data. It takes in data and parameters. For every data point in the dataset, we assign the data to the feature map’s parameters and the weights to the variational circuit’s parameters. We then evolve our system and store the quantum circuit, so as to run all the circuits at once at the end. We measure each circuit and return the probabilities for the class labels based on the measured bitstrings.
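Here is a rough, self-contained sketch of steps 5 through 7 (the helper names are mine, not from the original notebook):

from qiskit.circuit.library import RealAmplitudes, ZZFeatureMap

feature_map = ZZFeatureMap(feature_dimension=4, reps=1)
ansatz = RealAmplitudes(num_qubits=4, reps=1)
circuit = feature_map.compose(ansatz)
circuit.measure_all()

def bind_circuit(x, weights):
    # Step 5: tie the feature map parameters to the data point x and the
    # variational parameters to the trainable weights.
    binding = dict(zip(feature_map.parameters, x))
    binding.update(zip(ansatz.parameters, weights))
    return circuit.bind_parameters(binding)

def parity_label(bitstring):
    # Step 6: an even number of 1s maps to 'yes', an odd number to 'no'.
    return "yes" if bitstring.count("1") % 2 == 0 else "no"

def class_probabilities(counts):
    # Step 7: aggregate measurement counts (bitstring -> frequency, e.g.
    # from result.get_counts()) into a probability for each class label.
    shots = sum(counts.values())
    probs = {"yes": 0.0, "no": 0.0}
    for bitstring, count in counts.items():
        probs[parity_label(bitstring)] += count / shots
    return probs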

Results

Data classification was performed using the implemented version of VQC in IBM’s framework, executed on the provider’s simulator. Please note that Qiskit has since been upgraded to 0.25, which phases out the Qiskit elements. You can learn more here.

qiskit==0.23.1
qiskit-aer==0.7.1
qiskit-aqua==0.8.1
qiskit-ibmq-provider==0.11.1
qiskit-ignis==0.5.1
qiskit-terra==0.16.1

Every combination in the experiments was executed with 1024 shots, using the implemented versions of the optimizers. We conducted tests with different feature maps and depths, the RealAmplitudes variational form with differing depths, and different optimizers in Qiskit. In each case, we compared the loss values after 50 training iterations on the training data. Our best model configs were:

[Table: the best model configurations, ranked by training loss]

From the results, the ZFeatureMap with a depth of 2, the RealAmplitudes variational form with a depth of 5, and the SPSA optimizer achieved the lowest cost. The results also suggest that the ZFeatureMap generally led to the lowest cost values. But does this mean that the ZFeatureMap typically performs better in general?

Questions

1. Does increasing the variational form depth increase convergence?

Increasing the depth of the variational form does not seem to substantially improve the convergence of any of these models. Note that a deeper variational form means more trainable parameters in the model. One would naively expect more parameters to capture more intricate relationships in the data, but perhaps these models are simply too small to exploit any advantage from higher parameterization.

2. Does increasing feature map depth increase convergence?

For ZZFeatureMap with ADAM (maxiter=50) and PauliFeatureMap with ADAM (maxiter=50), increasing the feature map depth does improve the convergence of model training. The other model configurations don't change significantly (in some, increasing the feature map depth actually reduces convergence almost linearly; why this happens could make for an interesting research project!).

3. How do the models generalize on different datasets?

As a final experiment, we benchmarked these results on the iris and wine datasets, two popular datasets in classical machine learning with the same dimension as the heart attack data, so we can again use 4 qubits to model them. This time, the best model configs were:

Iris dataset: [Table: best model configurations on the iris dataset]

Wine dataset: [Table: best model configurations on the wine dataset]

Discussion

This time, our best model configs are totally different! What’s fascinating about this is that the dataset used seems to demand a particular model structure. This makes sense intuitively, right? Because the first step in these quantum machine learning models is to load the data and encode it into a quantum state. If we use different data, perhaps there is a different (or more optimal) data encoding strategy depending on the kind of data you have.

Another thing that surprised me, especially coming from a classical ML background, is the performance of the SPSA optimizer. I would have thought something more state-of-the-art, like ADAM, would be the clear winner. This was not the case at all. It would be cool to understand why SPSA seems to be so well suited to optimizing these quantum models.

A final remark is that we only looked at the loss values on training data. Ultimately, we would also like to see whether any of these quantum models are good at generalization. A model is said to generalize well if it performs well on new data it has never seen before; a proxy for this is usually the error on test data. By taking the best configs here and checking their performance on test sets, we could gauge how well these toy models perform and generalize, which would be pretty interesting even in these small examples!

We are now (sadly!) at the finishing line. We have come so far and there are still many more open questions to uncover. If you are interested in any of this work, please feel free to reach out, and maybe we could collaborate on something cool! Hopefully, you have understood the pipeline of training a quantum machine learning algorithm using real world data. Thank you for reading these posts and thanks to Amira Abbas for mentoring me through the QOSF program. Until next time :)

References

  1. https://en.wikipedia.org/wiki/Quantum_machine_learning
  2. https://medium.com/xanaduai/analyzing-data-in-infinite-dimensional-spaces-4887717be3d2
  3. https://arxiv.org/abs/1412.6980
  4. https://docs.scipy.org/doc/scipy/reference/optimize.minimize-cobyla.html
  5. https://www.jhuapl.edu/spsa/
  6. Ventura, Dan, and Tony Martinez. “Quantum associative memory.” Information Sciences 124.1–4 (2000): 273–296.
  7. M. Schuld and N. Killoran, Phys. Rev. Lett. 122, 040504 (2019)
  8. A. Abbas et al. “The power of quantum neural networks.” arXiv preprint arXiv:2011.00027 (2020).
