Quantum Munging

Hashtag #QTML2021

Nicholas Teague
From the Diaries of John Henry
8 min read · Nov 13, 2021


Enjoyed a whirlwind of a week playing catch up in the emerging field of quantum machine learning by way of the QTML research conference, aka Quantum Techniques in Machine Learning. One thing I have learned from my forays into research conferences is that the currency of these venues is contributions — whether to theory, experimental findings, or otherwise. You will have a better experience if you are attempting to contribute to the state of the art. Spectators stay home. It is in that vein that I would like to share a few takeaways.

This blog has covered several aspects of this field over the last few years, ranging from fundamentals of quantum computers, quantum machine learning, quantum supremacy, and learning libraries, to a few other tidbits here and there. I offer these links with the caveat that this field is rapidly progressing, and there has been no shortage of developments in the time since these essays were written. This is an understatement.

In earlier generations of quantum machine learning, one of the primary channels of investigation was associated with attempting to speed up classical machine learning with quantum subroutines, like performing linear algebra operations with the HHL (Harrow-Hassidim-Lloyd) algorithm or parameter tuning with Grover’s search algorithm. Many of the newer conventions of quantum machine learning have instead sought to replace the underlying targets of training — trading the tuning of weights in networks of classical perceptrons for the tuning of qubit circuits built from parameterized gates. You’ll see terms thrown around like PQC (parameterized quantum circuit) or QNN (quantum neural network). I believe the distinction between the two is subtle, and the terms are sometimes used interchangeably. Both conventions have a training phase that tunes parameterized gates based on data properties channeled as an input to the circuit. Perhaps one way to think about it is that a QNN may encompass one or more PQC modules integrated with classical machine learning modules, all collectively trained from a common gradient signal fed from a loss function evaluation in a manner similar to backpropagation. These conventions differ from quantum kernel learning, which is a non-parametric approach to QML.
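To make the parameterized circuit picture a little more concrete, here is a minimal sketch of a toy PQC simulated with plain NumPy statevectors rather than any particular quantum library: two qubits, a data-encoding rotation layer, an entangling CNOT, a trainable rotation layer, and a Pauli-Z expectation value serving as the model output. The circuit layout, the toy regression target, and the crude finite-difference training loop are all illustrative assumptions on my part, not a recipe from any of the talks.

```python
# A toy parameterized quantum circuit (PQC) simulated with NumPy statevectors.
# Two qubits: data features enter as RY rotation angles, a CNOT entangles them,
# trainable RY gates follow, and the model output is the expectation of Pauli-Z
# on the first qubit. All names are illustrative, not a specific library API.
import numpy as np

def ry(theta):
    """Single-qubit rotation about the Y axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

CNOT = np.array([[1., 0., 0., 0.],
                 [0., 1., 0., 0.],
                 [0., 0., 0., 1.],
                 [0., 0., 1., 0.]])
Z0 = np.kron(np.diag([1., -1.]), np.eye(2))   # Pauli-Z on the first qubit

def pqc_expectation(x, params):
    """Encode two features as rotations, apply trainable rotations, return <Z_0>."""
    state = np.array([1., 0., 0., 0.])                       # start in |00>
    state = np.kron(ry(x[0]), ry(x[1])) @ state              # data encoding layer
    state = CNOT @ state                                     # entangling gate
    state = np.kron(ry(params[0]), ry(params[1])) @ state    # trainable layer
    return float(state @ Z0 @ state)

# Crude training loop: finite-difference gradient of a squared loss on a toy target.
rng = np.random.default_rng(0)
X = rng.uniform(0, np.pi, size=(20, 2))
y = np.cos(X[:, 0])                                          # toy regression target
params = rng.uniform(0, 2 * np.pi, size=2)
lr, eps = 0.2, 1e-4

def loss(p):
    return np.mean([(pqc_expectation(x, p) - t) ** 2 for x, t in zip(X, y)])

for step in range(100):
    grad = np.array([(loss(params + eps * np.eye(2)[j]) -
                      loss(params - eps * np.eye(2)[j])) / (2 * eps) for j in range(2)])
    params -= lr * grad
    if step % 20 == 0:
        print(f"step {step:3d}  loss {loss(params):.4f}")
```

In practice the gradient would come from something like the parameter-shift rule evaluated on hardware or a simulator, but the shape of the loop (encode, entangle, measure, update) is the same.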

Jonas M. Kübler, Simon Buchholz, Bernhard Schölkopf. The Inductive Bias of Quantum Kernels. (2021) arxiv:2106.03747

You may be familiar with kernel methods from classical learning architectures like support vector machines — which were a precursor to modern paradigms of deep learning. In a kernel learner, training basically learns how to translate the feature space into a form with more coherent groupings between classification targets, such that classes may become more easily linearly separable. In quantum kernel methods, a similar feature map translation takes place, however with the much larger Hilbert space available to quantum states the translations may have benefits over classical kernels. Consider that researchers at Google have demonstrated that even for classical learning, a quantum kernel can be used as a form of preprocessing to embed classical features into a Hilbert space followed by a projection back to classical space for improved grouping characteristics.

Hsin-Yuan Huang, Michael Broughton, Masoud Mohseni, Ryan Babbush, Sergio Boixo, Hartmut Neven, Jarrod R. McClean. Power of data in quantum machine learning. (2020) arxiv:2011.01938
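As a rough illustration of the kernel picture (and not the projected quantum kernel from the paper above), the sketch below angle-encodes two features per sample into a two-qubit statevector with NumPy, takes kernel entries as squared state overlaps, and hands the precomputed Gram matrix to scikit-learn's SVC. The encoding choice, the toy labels, and the helper names are my own assumptions for demonstration.

```python
# A toy "quantum kernel": angle-encode each two-feature sample into a two-qubit
# statevector, take kernel entries as squared state overlaps |<psi(x)|psi(x')>|^2,
# and hand the precomputed Gram matrix to a classical SVM. This is a simplified
# fidelity kernel for illustration, not the projected kernel of the cited paper.
import numpy as np
from sklearn.svm import SVC

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def feature_state(x):
    """Map a 2-feature sample to a 4-dim statevector via RY rotations on two qubits."""
    return np.kron(ry(x[0]), ry(x[1])) @ np.array([1., 0., 0., 0.])

def quantum_kernel(A, B):
    """Gram matrix of squared overlaps between encoded statevectors."""
    SA = np.array([feature_state(a) for a in A])
    SB = np.array([feature_state(b) for b in B])
    return np.abs(SA @ SB.T) ** 2

# Toy dataset: label by whether both features fall on the same side of pi/2.
rng = np.random.default_rng(1)
X_train = rng.uniform(0, np.pi, size=(60, 2))
y_train = ((X_train[:, 0] > np.pi / 2) == (X_train[:, 1] > np.pi / 2)).astype(int)
X_test = rng.uniform(0, np.pi, size=(20, 2))
y_test = ((X_test[:, 0] > np.pi / 2) == (X_test[:, 1] > np.pi / 2)).astype(int)

clf = SVC(kernel="precomputed")
clf.fit(quantum_kernel(X_train, X_train), y_train)
print("test accuracy:", clf.score(quantum_kernel(X_test, X_train), y_test))
```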

There is a little bit of noise surrounding the question of where quantum learning can concretely outperform classical learning. It’s important to keep in mind that quantum algorithms of sufficiently compact scale can in some cases be simulated on classical hardware — such as with the aid of tensor networks. Classical learning paradigms can often be applied as a replacement for quantum learning paradigms, even when evaluating quantum data, albeit often with a higher burden on the quantity of training data needed to reach parity. The sophistication of classical learning has had much longer to progress, and it doesn’t have to overcome the limitations of current NISQ-era quantum hardware (“noisy intermediate-scale quantum”): until hardware reaches sufficient performance and scale for fault tolerance to be achieved through quantum error correction, each gate application will introduce some degree of noise into a quantum algorithm. That being said, one of the researchers noted a concrete proof that, when evaluating quantum data, even though quantum learning doesn’t have an advantage over classical learning for average prediction error, quantum learning may have an exponential advantage for worst-case prediction error, owing to classical learning’s exposure to the uncertainty of measurements.

Hsin-Yuan Huang, Richard Kueng, Giacomo Torlai, Victor V. Albert, John Preskill. Provably efficient machine learning for quantum many-body problems. (2021) arxiv:2106.12627

When tuning the parameterized gates of a quantum circuit through training, there is an obstacle unique to the quantum realm known as “barren plateaus”, which refers to the loss of gradient signal within certain regions of the optimization landscape. I didn’t see this stated explicitly, but the impression I got was that barren plateaus are often associated with circuit and parameter initializations as opposed to intermediate stages of training. Samson Wang noted that when encountering barren plateaus, the number of shots required to resolve a gradient signal can climb exponentially. (A shot refers to one round of quantum circuit initialization, gate applications, and measurement, and is often the pricing basis for cloud vendors.) In a talk by Zoe Holmes it was noted that barren plateaus can arise from circuits that are either too expressive or too entangled, and that randomness can be a barrier to trainability (suggesting that some conventions used for weight initializations in classical learning, like He initialization, may be less suitable for parameterized gates). In quantum learning, alternatives to random initialization may be applied through heuristics like reusing parameters from similar data sets. One of the researchers offered an extension to such heuristics, noting that in many settings broad categories of applications can be grouped by initialization characteristics independent of circuit size, such that even if the improved initializations are derived on a smaller representative ansatz (a term referring to the structure of a parameterized quantum circuit), those same initialization characteristics can be extended for use in larger scale applications.

Frederic Sauvage, Sukin Sim, Alexander A. Kunitsa, William A. Simon, Marta Mauri, Alejandro Perdomo-Ortiz. FLIP: A flexible initializer for arbitrarily-sized parametrized quantum circuits. (2021) arxiv:2103.08572
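To give a feel for the shot-count point, here is a toy single-parameter analogue (my own construction, not from the talks): the expectation of Pauli-Z for the state RY(theta)|0> is cos(theta), its parameter-shift gradient is -sin(theta), and when that gradient is small the statistical error from a finite number of measurement shots swamps it, so the shots needed to resolve even the sign grow roughly as one over the gradient squared.

```python
# Why flat gradients inflate shot counts: estimate the parameter-shift gradient of
# <Z> = cos(theta) for a single RY rotation from a finite number of measurement shots.
# When the true gradient is tiny, the statistical error (~ 1/sqrt(shots)) swamps it,
# so resolving the gradient takes on the order of 1/gradient^2 shots. Toy model only;
# a genuine barren plateau concerns flatness across a many-parameter landscape.
import numpy as np

rng = np.random.default_rng(0)

def measured_expectation(theta, shots):
    """Sample <Z> for the state RY(theta)|0>: outcome +1 with probability (1+cos)/2."""
    p_plus = (1 + np.cos(theta)) / 2
    n_plus = rng.binomial(shots, p_plus)
    return (2 * n_plus - shots) / shots

def parameter_shift_gradient(theta, shots):
    """Parameter-shift rule: d<Z>/dtheta = (<Z>(theta + pi/2) - <Z>(theta - pi/2)) / 2."""
    return (measured_expectation(theta + np.pi / 2, shots)
            - measured_expectation(theta - np.pi / 2, shots)) / 2

for theta in [1.0, 3.13]:                 # steep region vs. nearly flat region
    true_grad = -np.sin(theta)
    for shots in [100, 10_000, 1_000_000]:
        estimates = [parameter_shift_gradient(theta, shots) for _ in range(20)]
        print(f"theta={theta:4.2f}  true grad={true_grad:+.4f}  shots={shots:>9,}"
              f"  est mean={np.mean(estimates):+.4f}  est std={np.std(estimates):.4f}")
```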

Just as classical learning can be categorized into applications like classification / regression, reinforcement learning, generative learning, and unsupervised learning, quantum machine learning has potential applicability in each of these subdomains. Elton Zhu noted that quantum computers are particularly well suited for generative models in comparison to discriminative models, offering that generative models may be less sensitive to the noise inherent in NISQ hardware. Roger Melko demonstrated the use of quantum generative models for simulations of nature (eat your heart out, Richard Feynman). In some cases the paradigms can be combined: Daochen Wang offered a quantum algorithm for reinforcement learning with a generative model. One of the reasons quantum learning may have benefits over classical stems from its ability to produce probability distributions that are difficult to simulate classically, owing to the expressivity arising from quantum properties like “contextuality and complementarity” (referring to fundamental properties of quantum mechanics associated with measurements and orthogonality principles, hat tip Niels Bohr). One of the researchers offered that for applications in sequence models (which in mainstream practice might be conducted with recurrent networks or transformers), this type of expressivity advantage can elevate the performance of hidden-Markov-style sequence models in comparison to their purely classical counterparts.

Eric R. Anschuetz, Xun Gao. Quantum Advantage in Basis-Enhanced Neural Sequence Models. (2021) Extended abstract

Of particular interest to this author were the discussions surrounding data encodings in the context of quantum learning. Aikaterini Gratsea noted that there are works suggesting that the encoding of classical data fed into a QNN is more important than the network architecture itself. Samuel Chen described common forms of data encoding, including “amplitude encoding”, in which data is encoded through Pauli rotations (rotations generated by the fundamental Pauli gates X, Y, and Z), and “variational encoding”, in which input numbers are used as quantum rotation angles. Elton Zhu suggested that for some kinds of features, a preceding data mapping into “copula space” could be easier for a quantum computer to learn. Perhaps the most relevant talk was by Matthias Caro, who sought to benchmark the influence of data encodings on parameterized quantum circuit model complexity, which directly impacts training error and generalization error. Caro focused on encodings giving rise to “generalized trigonometric polynomials” (GTPs), the function class produced by parameterized quantum circuits in which the data encoding gates are intermingled with the tunable parameterized gates.

Matthias C. Caro, Elies Gil-Fuster, Johannes Jakob Meyer, Jens Eisert, Ryan Sweke. Encoding-dependent generalization bounds for parametrized quantum circuits. (2021) arxiv:2106.03880
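For reference, here is a small NumPy sketch of two encoding styles as they are commonly described in the literature: amplitude encoding in the sense of normalizing a feature vector into the amplitudes of a statevector, and angle (rotation) encoding in which each feature sets a rotation angle on its own qubit. Terminology varies between talks and papers, and the helper names here are just illustrative.

```python
# Two common classical-data encodings illustrated with NumPy statevectors:
# (1) amplitude encoding: a length-2^n feature vector is normalized into the
#     amplitudes of an n-qubit state;
# (2) angle (rotation) encoding: each feature becomes the rotation angle of an
#     RY gate on its own qubit, combined as a tensor product state.
# Terminology varies across the literature; helper names here are illustrative.
import numpy as np

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def amplitude_encode(features):
    """Normalize a feature vector (length a power of two) into state amplitudes."""
    amps = np.asarray(features, dtype=float)
    assert np.log2(amps.size).is_integer(), "pad the features to a power-of-two length"
    return amps / np.linalg.norm(amps)

def angle_encode(features):
    """One qubit per feature: apply RY(feature) to |0> and take the tensor product."""
    state = np.array([1.0])
    for x in features:
        state = np.kron(state, ry(x) @ np.array([1.0, 0.0]))
    return state

x = [0.4, 1.1, 2.0, 0.7]
print("amplitude encoding (2 qubits):", np.round(amplitude_encode(x), 3))
print("angle encoding (4 qubits), 16 amplitudes:", np.round(angle_encode(x), 3))
```

Note the trade-off: amplitude encoding packs many features into few qubits but is costly to prepare on hardware, while angle encoding is cheap per gate but spends one qubit per feature.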

Caro cited a work in reference to generalized trigonometric polynomials which was a good read, and I’ll offer some further clarifications even though this one wasn’t included in the conference proceedings. A helpful way to think about what such rotation-based encodings represent is that each data-encoding qubit rotation — which may be applied sequentially or in parallel — encodes one sine wave, which basically translates to one frequency in a Fourier series. The intermingled parameterized gates encode the Fourier coefficients. This suggests that time series learning and signal processing, which naturally benefit from Fourier representations, are particularly well suited for quantum machine learning. The authors of that work note that even with the ability to implement very wide and deep quantum circuits that are intractable to simulate classically, the expressivity of the quantum model is fundamentally limited by the data encoding strategy. Classical pre-processing of the data, such as feature engineering that creates additional features, can give even small models more expressivity by enriching the accessible frequency spectrum.

Maria Schuld, Ryan Sweke, Johannes Jakob Meyer. The effect of data encoding on the expressive power of variational quantum machine learning models. (2021) arxiv:2008.08605
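The Fourier picture is easy to verify numerically. The sketch below (my own construction in the spirit of the cited work, with illustrative helper names) builds a single-qubit model in which the data-encoding gate RZ(x) is repeated L times between random trainable rotations, then recovers the frequency spectrum of the resulting function with an FFT; the non-negligible frequencies come out as 0 through L, one new frequency per encoding repetition.

```python
# Numerical check of the Fourier-series picture: a single-qubit model whose data
# encoding gate RZ(x) is repeated L times, interleaved with random trainable RY
# rotations, produces an output f(x) = <Z> that is a truncated Fourier series with
# integer frequencies 0 through L. We recover the spectrum with an FFT over x.
import numpy as np

def rz(theta):
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

Z = np.diag([1.0 + 0j, -1.0 + 0j])

def model_output(x, trainable_angles):
    """Alternate trainable RY gates with repeated RZ(x) encoding gates, return <Z>."""
    state = np.array([1.0 + 0j, 0.0 + 0j])
    state = ry(trainable_angles[0]) @ state
    for w in trainable_angles[1:]:
        state = rz(x) @ state        # data re-uploading: each repeat adds one frequency
        state = ry(w) @ state
    return (state.conj() @ Z @ state).real

rng = np.random.default_rng(0)
for L in [1, 2, 3]:                  # number of encoding repetitions
    angles = rng.uniform(0, 2 * np.pi, size=L + 1)
    xs = np.linspace(0, 2 * np.pi, 256, endpoint=False)
    fs = np.array([model_output(x, angles) for x in xs])
    spectrum = np.abs(np.fft.rfft(fs)) / len(xs)
    support = np.nonzero(spectrum > 1e-8)[0]
    print(f"L={L}: non-negligible frequencies at {support.tolist()}")   # frequencies 0..L
```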

The author of this essay is not formally integrated into the quantum computing ecosystem. I am a founder of a startup offering solutions for classical pre-processing. Automunge is a Python library that automates the preparation of tabular data for ML, and may also serve as a platform for engineering pipelines of univariate transformations fit to properties of a training set. We have an extensive internal library of data encoding options, which may be mixed into sets with custom defined operations. We have built-in support for auto-ML derived missing data infill. We have several options for encoding numeric features into qubit-friendly representations via our “qbt1” family of transforms, which translate numeric features in a dataframe into sets of binary digits, with push-button inversion support to recover the prior numeric forms. We believe classical learning differs from quantum learning in that injected noise may actually be more of a help than a hindrance, with several potential benefits like data augmentation, bias mitigation, and non-determinism.
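For a flavor of the general idea (and explicitly not the qbt1 implementation itself), here is a hypothetical pandas/NumPy sketch that scales a numeric column, quantizes it to a fixed number of bits, expands it into one binary digit column per bit, and inverts the digits back to approximate numeric values. All function names and details are assumptions for illustration.

```python
# A hypothetical sketch (not the actual qbt1 implementation): scale a numeric column
# to [0, 1), quantize to a fixed number of bits, expand into one binary digit column
# per bit, and invert the digits back to approximate (mid-bin) numeric values.
import numpy as np
import pandas as pd

def to_binary_digits(series, bits=6):
    """Return binary digit columns plus the (min, max) span needed for inversion."""
    lo, hi = float(series.min()), float(series.max())
    scaled = (series.to_numpy(dtype=float) - lo) / (hi - lo + 1e-12)   # scale to [0, 1)
    levels = np.minimum((scaled * 2 ** bits).astype(int), 2 ** bits - 1)
    digits = pd.DataFrame(
        {f"{series.name}_bit{b}": (levels >> b) & 1 for b in reversed(range(bits))},
        index=series.index,
    )
    return digits, (lo, hi)

def from_binary_digits(digits, span):
    """Invert the digit columns back to approximate numeric values."""
    lo, hi = span
    bits = digits.shape[1]
    weights = 2 ** np.arange(bits - 1, -1, -1)                         # MSB first
    levels = digits.to_numpy() @ weights
    return pd.Series(lo + (levels + 0.5) / 2 ** bits * (hi - lo), index=digits.index)

df = pd.DataFrame({"feature": [0.3, 1.7, 2.2, 5.0, 3.1]})
digits, span = to_binary_digits(df["feature"], bits=6)
print(digits)
print(from_binary_digits(digits, span).round(2))
```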

For further reading, please check out A Table of Contents, Book Recommendations, and Music Recommendations. For more on Automunge: automunge.com

Preservation Hall Jazz Band — Keep Your Head Up
