Viewing a single comment thread. View all comments

Ambitious_Smile_981 t1_iwdrmam wrote

I don't see the problem of differentiating inverted and non-inverted architectures, as they are both generative models. The difference lies in what you are generating. In one case, you generate the label, and give as prior information the image, in the other, you generate the image giving the label as prior information.
Both have their advantages and disadvantages, but I don't see why the 'inverted' one is not interesting.

As of the BP = PC literature, I think that showing that by simply introducing a temporal scheduling for the weight updates of PC, we are able to obtain exact BP is interesting. I agree that this variation of PC loses all the advantages that PC has over BP, but it is still important to know that it is possible to derive exact backprop from a variational free energy.

1

BerenMillidge t1_iy814ur wrote

Hi, author of some of the papers linked here. Broadly, Maizeq is right to distinguish between FPA-PC and ‘standard PC’ (the ‘inverted vs generative direction of the PC net is a different orthogonal direction). The equivalence between PC and BP only holds exactly in the case with the FPA (or some equivalent set of assumptions — for instance in the original Whittington paper they use the precision ratio tending to 0. Of course all of these limits are in some sense extreme and eliminate some (but not all) of the major advantages of PC (in some sense this was inevitable since if they exactly equal BP then they must very roughly have the same advantages/disadvantages as it). The way to view these works, at least as I have come to view them, is as a idealised exploration of a specific limit of PC. In recent work (https://arxiv.org/pdf/2206.02629), we expand on this limit idea and show that all current EBM approximations to BP, such as PC, Equilibrium-prop and Contrastive Hebbian learning, can be expressed as a single ‘infinitesimal inference limit’.

Overall I disagree that the work in this vein is particularly misleading, although this is a subjective assessment. It is upfront about the assumptions you need to make to obtain equivalence to backprop, as well as how this departs from standard PC.

Of course, from a neuroscientific perspective, this limit is perhaps not the most realistic and so we are also exploring the ML performance of more ‘standard’ PC versions which are more biologically plausible and which don’t approximate backdrop (, as well as specifically understanding the special advantages and disadvantages of these algorithms. For instance, in a recent paper -- https://www.biorxiv.org/content/biorxiv/early/2022/05/18/2022.05.17.492325.full.pdf --, we propose a new understanding of standard PC as ‘prospective configuration’ and demonstrate how this version of PC can outperform backdrop in a number of its properties. We also have a more theoretical analysis of standard PC (https://arxiv.org/pdf/2207.12316) where we show that although it differs from backdrop, it can also converge to minima of a supervised loss function, and has close links to target-propagation and hence Gauss-Newton optimization. Our groups have also explored other potential advantages of PC over BP including the ability for it to learn arbitrary recurrent computation graphs (https://arxiv.org/pdf/2201.13180), the fact that you can significantly speed it up with incremental variants, and that you can get PC to perform a mix of iterative and amortised inference https://arxiv.org/pdf/2204.02169.

In terms of the hardware, I have also looked into this a little, and my feeling is that while PC has better parallelism properties than PC, it is unlikely to outperform BP on a GPU due to the need to iteratively perform the inference phase while BP just has a sequential forward and backward. GPUs are now getting very highly optimised for the exacts style of computations needed in BP for large scale ANNs. PC does possess a much higher degree of parallelism and locality than BP and on a sufficiently distributed architecture may eventually prove better, especially once we start building proper ‘neuromorphic’ processor-in-memory architectures. However this seems likely to be many years away. I haven’t read much about Erlang so I’m not sure if it possesses the degree of necessary parallelism. One possibility is that Erlang with Pc might allow you to move to a different point on the Pareto frontier of having lots of CPUs and developing learning algorithms comparable in performance with doing BP on a single GPU. I haven’t run any fermi-style estimates of whether this is feasible or not. We have some calculations about this in a forthcoming paper but this is on a highly abstract computation model of ‘parallel matrix multiplications’ and I haven’t figured out what the actual equivalent calculations for realistic hardware would look like.

2