Submitted by avialex t3_xzq6bs in MachineLearning

I was looking into QML (specifically quantum neural networks) this past week, checking out a few of the Python frameworks like Qiskit and PennyLane, and I've run up against what seems like a fundamental problem with QNNs right now. Because of the no-cloning theorem, you cannot save the intermediate quantum states. Because you cannot save state, you cannot do backprop. And because you can't do backprop, you have to run the model twice for each parameter, shifting it to estimate the derivative (whether you're using finite differences or the parameter-shift rule). So classical NNs are O(1) per input, one forward pass and one backprop, while QNNs are O(n) in the number of parameters!!
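To make that cost concrete, here's a minimal sketch of the parameter-shift rule on a toy PennyLane circuit (the one-qubit circuit and parameter values are placeholders, not from any real model). Note that every partial derivative costs two full circuit executions:

```python
import numpy as np
import pennylane as qml

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def circuit(params):
    # Toy variational circuit: one Pauli rotation per trainable parameter.
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=0)
    return qml.expval(qml.PauliZ(0))

def parameter_shift_grad(f, params, shift=np.pi / 2):
    """Gradient via the parameter-shift rule for Pauli rotations.
    Cost: 2 circuit executions per parameter, 2n in total."""
    grad = np.zeros_like(params)
    for i in range(len(params)):
        plus, minus = params.copy(), params.copy()
        plus[i] += shift
        minus[i] -= shift
        grad[i] = (f(plus) - f(minus)) / 2  # exact for RX/RY/RZ gates
    return grad

params = np.array([0.1, 0.4])
print(parameter_shift_grad(circuit, params))  # 4 circuit runs for 2 params
```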

So, that seems completely insane. Leaving aside the engineering problems and assuming we could just build an n-scale quantum system with no decoherence noise, a "small" classical model like ResNet18 has 11,000,000 trainable parameters. That's 22 million forward passes just to do one training step for the QNN equivalent. My thought is, there must be some advantage I'm missing. Like, the representational capacity of a coherent quantum system must be enormous compared to classical. But all the examples I've seen of equivalent-size QNNs and NNs show pretty much equivalent performance.
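The back-of-the-envelope count, per single training input (the 11M figure is the usual ResNet18 ballpark):

```python
n_params = 11_000_000             # ResNet18-scale trainable parameter count
runs_backprop = 2                 # one forward + one backward pass per input
runs_param_shift = 2 * n_params   # two circuit evaluations per parameter
print(f"{runs_param_shift:,} circuit runs vs. {runs_backprop} passes")
# 22,000,000 circuit runs vs. 2 passes
```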

What's up then? Is there an advantage at scale that we haven't realized yet? Or are they hoping to find some technique of finding the gradients that takes much less time? Am I just missing something?

12

Comments


pmirallesr t1_irnmvpg wrote

You're thinking about implementing classical ML (backprop) on a quantum computer. Proponents of quantum ML look for alternative ways of "machine learning": either not calculating gradients at all, or trying to exploit the properties of quantum mechanics to "learn better". If it all sounds very fuzzy, it's because it is.
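For a concrete example of the "not calculating gradients" camp: SPSA, which is commonly used to train variational circuits, estimates a descent direction from just two loss evaluations per step, independent of the number of parameters. A minimal sketch, with made-up gain constants and a toy quadratic standing in for the circuit loss:

```python
import numpy as np

def spsa_minimize(loss, params, n_steps=500, a=0.2, c=0.2,
                  rng=np.random.default_rng(0)):
    """Simultaneous Perturbation Stochastic Approximation: perturb ALL
    parameters at once with a random +/-1 mask, so each step needs only
    2 loss evaluations no matter how many parameters there are."""
    for k in range(n_steps):
        ak = a / (k + 1) ** 0.602           # standard SPSA gain schedules
        ck = c / (k + 1) ** 0.101
        delta = rng.choice([-1.0, 1.0], size=params.shape)
        diff = loss(params + ck * delta) - loss(params - ck * delta)
        params = params - ak * (diff / (2 * ck)) * delta  # 1/delta_i == delta_i
    return params

# Toy quadratic loss standing in for an expensive quantum circuit.
print(spsa_minimize(lambda p: np.sum((p - 1.0) ** 2), np.zeros(5)))
# converges toward [1. 1. 1. 1. 1.]
```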

29

avialex OP t1_irnnm6a wrote

They certainly are looking, but at the same time gradient calculation is fundamental to how quantum neural networks are implemented right now, and QNNs are a relatively active area of study. I don't think we can dismiss the work in the field as it stands, because it's all built on the foundation of gradient descent. AFAIK no one has yet found a better way to train a QNN, even on quantum data. I could be wrong.

4

anomalousraccoon t1_irno8g5 wrote

You need a quantum algorithm to take advantage of a quantum computer; it's pointless to run a classical algorithm like gradient descent on one. I don't know of any quantum algorithms that currently exist for ML, but a very well-known one for prime factorization is Shor's algorithm.

6

gosh-darnit- t1_irnpn8b wrote

This is my understanding as well. Quantum computing offers a different computing paradigm, which opens up new possibilities. There's little point in thinking about algorithms that work well under the current computing paradigm.

Unfortunately I know too little of QML to give specific examples of what new opportunities it might provide.

4

avialex OP t1_irnpnry wrote

Quantum NNs are quantum algorithms, are they not? Or are you thinking of hybrid nets where only a few neurons are quantum?

edit: OK, I see, you're saying GD is the problem; we need a quantum algorithm to train QNNs. I would definitely agree, but as it stands I don't think there is one?

−1

Less-Article1309 t1_irnsaed wrote

Eh, who cares about quantum models, too much decoherence and noise. D-Wave's adiabatic quantum computer is showing a lot of promise in the optimization arena. All that needs to happen is for a quantum method to replace SGD for finding good classical ANN minima; if quantum annealing of ANN weights becomes a reality, there would no longer be any need for lengthy and costly GPU training.
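For what an annealer actually consumes, here's a minimal sketch of posing a toy problem as a QUBO with D-Wave's dimod library, solved with the brute-force reference solver (the coefficients are made up, and encoding continuous ANN weights into binary variables like this is exactly the open problem):

```python
import dimod

# Toy QUBO: minimize x0 + x1 + 2*x0*x1 - 2*x2 over binary variables.
# Arbitrary placeholder coefficients, not a real ANN weight encoding.
bqm = dimod.BinaryQuadraticModel(
    {"x0": 1.0, "x1": 1.0, "x2": -2.0},  # linear terms
    {("x0", "x1"): 2.0},                 # quadratic couplings
    0.0,                                 # constant offset
    dimod.BINARY,
)

# Brute-force reference solver; a D-Wave annealer would instead sample
# low-energy states of the same BQM.
sampleset = dimod.ExactSolver().sample(bqm)
print(sampleset.first.sample, sampleset.first.energy)
# {'x0': 0, 'x1': 0, 'x2': 1} -2.0
```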

6

chatsagnik t1_irpvukn wrote

Theoretically, QNNs don't have to rely exclusively on backpropagation to train. See https://pennylane.ai/qml/glossary/parameter_shift.html to understand the parameter-shift rule. If you're willing to do a deeper dive, check out this paper on quantum GANs, https://arxiv.org/abs/2105.00080, to get another sense of how the classical and quantum worlds diverge.

You also have to understand that quantum data is not the same as classical data, so your O(1) and O(n) aren't really comparable in this argument. For more context, read quantum state preparation papers and data loading papers (just google). You could also

Will any of this ever be feasible? Probably. Billions of dollars are being sunk into QEC and hardware R&D, and some of the sharpest minds are working on it. Even if nothing works out, we'll at least know a new way to do math :D

2

bran-bar t1_irqcnh2 wrote

I spent some (not a lot of) time with quantum machine learning algorithms, and I agree with the many comments basically saying that you're taking a not-so-sophisticated projection from classical ML onto the quantum realm. Here is an example of an algorithm where you adjust all parameters in O(1) per training example: https://arxiv.org/abs/1804.11326.

2

graphicteadatasci t1_irqq0jr wrote

But you don't have to do backprop to train a neural network. Even without anything quantum you could do simulated annealing. It's just that SGD is fast and effective.
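As a toy illustration, a bare-bones simulated-annealing loop over a flat weight vector (cooling schedule, step scale, and the quadratic stand-in loss are arbitrary choices, not tuned for a real network):

```python
import numpy as np

def simulated_annealing(loss, w, n_steps=20_000, t0=1.0, step=0.05,
                        rng=np.random.default_rng(0)):
    """Bare-bones simulated annealing over a flat weight vector."""
    cur_loss = loss(w)
    best_w, best_loss = w.copy(), cur_loss
    for k in range(n_steps):
        t = t0 * (1.0 - k / n_steps) + 1e-9         # linear cooling to ~0
        cand = w + rng.normal(scale=step, size=w.shape)
        cand_loss = loss(cand)
        # Always accept improvements; accept uphill moves with
        # Boltzmann probability exp(-delta / T).
        if cand_loss < cur_loss or rng.random() < np.exp((cur_loss - cand_loss) / t):
            w, cur_loss = cand, cand_loss
            if cur_loss < best_loss:
                best_w, best_loss = w.copy(), cur_loss
    return best_w, best_loss

# Quadratic stand-in for a network's training loss.
w_opt, l_opt = simulated_annealing(lambda w: np.sum((w - 3.0) ** 2), np.zeros(10))
print(l_opt)  # small, with w_opt near 3.0 everywhere
```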

1

Less-Article1309 t1_irrjhfe wrote

There are plenty of other optimization methods out there, simulated annealing for example. SGD just lends itself well to the massively parallel architecture of Nvidia GPUs; that's the only reason it's so prevalent in the industry.

1