Submitted by Moose_a_Lini t3_yjwvav in MachineLearning

The specific application is for orbital cameras: networks can be trained on Earth and then sent to orbital FPGAs for use in image recognition systems. Both the Earth-based training system and the orbital FPGA have plenty of computational power, so there is no real need for reduction there, but transmission bandwidth is extremely limited.
For context, I'm trying to find a PhD topic. I have a strong background in FPGAs, space-borne imaging systems and comms, but I'm a machine learning noob (currently furiously trying to get my head around it).
I may not be using the right terminology, but my searches haven't turned up anything. (It may also be the case that, for some information-theoretic reason, pruning is already the optimal solution to this problem.)

Any suggestions of papers, pointers in a direction or any other related tidbits would be highly appreciated.

22

Comments

bernhard-lehner t1_iuqc992 wrote

It would help if you explained what exactly you want to transmit: the model, results, gradients, ...? Btw, how would pruning not reduce the computational demand?

3

buildbot_win t1_iuqf15z wrote

Oh hey, this is a neat justification for this type of pruning; I did my master's on this topic! The technique we came up with was called Dropback: instead of zeroing pruned weights, you reset them to their initial values, combined with a pseudo-RNG so you can re-initialize weights on the fly deterministically. You only had to track a few million parameters out of tens of millions to achieve similar accuracy. https://mlsys.org/Conferences/2019/doc/2019/135.pdf
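
A rough sketch of the reset-to-seeded-init mechanism (not the paper's exact selection rule, and the function names here are made up for illustration): only the weights that drifted furthest from their seeded initialization are transmitted, everything else is regenerated from the seed.

```python
import torch

def dropback_compress(weight: torch.Tensor, seed: int, keep: int):
    """Keep only the `keep` weights that moved furthest from their seeded
    init; everything else can be regenerated from the seed alone."""
    gen = torch.Generator().manual_seed(seed)
    init = torch.randn(weight.shape, generator=gen)   # deterministic initial values
    drift = (weight - init).abs().flatten()
    idx = torch.topk(drift, keep).indices             # indices of the tracked weights
    return idx, weight.flatten()[idx]                 # all you transmit (plus the seed)

def dropback_decompress(shape, seed: int, idx, vals):
    gen = torch.Generator().manual_seed(seed)
    w = torch.randn(shape, generator=gen).flatten()   # re-derive the untracked weights
    w[idx] = vals                                     # overwrite the tracked ones
    return w.reshape(shape)

# toy round trip
w = torch.randn(64, 64) * 2.0
idx, vals = dropback_compress(w, seed=0, keep=256)
w_hat = dropback_decompress(w.shape, 0, idx, vals)
```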

13

Ulfgardleo t1_iuqhizr wrote

You might have misunderstood the message. OP asked whether there is work on reducing the number of weights without compromising network strength, since the bottleneck is not compute but transmitting the model.

1

garridoq t1_iuqkmbo wrote

Recurrent Parameter Generators (https://arxiv.org/abs/2107.07110) could be interesting for you. The idea is not to prune the architecture, but instead to use a limited bank of parameters from which the network's parameters are generated.
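
A minimal sketch of the general idea (not the paper's exact construction; the class name and slicing scheme are made up): every layer slices its weights out of one shared bank, so only the bank needs to be transmitted.

```python
import torch
import torch.nn as nn

class BankGeneratedLinear(nn.Module):
    """A linear layer whose weight matrix is sliced out of a shared
    parameter bank instead of being stored separately."""
    def __init__(self, bank: nn.Parameter, offset: int, in_f: int, out_f: int):
        super().__init__()
        self.bank = bank                               # shared across all layers
        self.offset, self.in_f, self.out_f = offset, in_f, out_f

    def forward(self, x):
        n = self.in_f * self.out_f
        # wrap around the bank if the layer needs more weights than remain
        idx = (torch.arange(n) + self.offset) % self.bank.numel()
        w = self.bank[idx].view(self.out_f, self.in_f)
        return x @ w.t()

bank = nn.Parameter(torch.randn(4096) * 0.02)          # the only tensor you'd transmit
layers = [BankGeneratedLinear(bank, i * 1000, 128, 128) for i in range(4)]
x = torch.randn(2, 128)
for layer in layers:
    x = torch.relu(layer(x))
```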

2

haowanr t1_iuqpgyp wrote

For example, if you prune using the built-in unstructured pruning methods in PyTorch, it will not lead to faster inference, because by default PyTorch does not leverage the sparsity.
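
To illustrate with standard torch.nn.utils.prune usage: after unstructured pruning the weight tensor is still stored densely, just with zeros, so a plain forward pass does the same amount of work; the zeros only help transmission once you store them sparsely.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(1024, 1024)
prune.l1_unstructured(layer, name="weight", amount=0.9)  # zero out 90% by magnitude
prune.remove(layer, "weight")                            # make the pruning permanent

sparsity = (layer.weight == 0).float().mean()
print(f"sparsity: {sparsity:.2f}")   # ~0.90, but the tensor is still dense
print(layer.weight.shape)            # torch.Size([1024, 1024]) -- same storage as before

# for transmission, a sparse representation only stores ~10% of values plus indices
sparse_w = layer.weight.to_sparse()
```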

1

tastycake4me t1_iuqpt7c wrote

One of the most popular and well-studied methods for reducing neural network weights is knowledge distillation.
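
A minimal sketch of the standard distillation loss (temperature-softened KL between teacher and student logits, blended with the usual hard-label loss), just to show the idea of training a small "student" to mimic a large "teacher":

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend a soft loss against the teacher with the usual hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                  # rescale so gradients stay comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# toy usage
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```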

3

xrailgun t1_iur0bva wrote

Maybe 'Weight Agnostic Neural Networks' will help

1

dI-_-I t1_iur5z7j wrote

No, because it makes more sense to start with a too-large network and reduce compute to a reasonable level than to keep compute constant.

1

ElongatedMuskrat122 t1_iurk0kv wrote

Might be a bit overkill, but instead of weight reduction, have you considered changing the architecture completely? I can't find it, so maybe someone can link it below, but I read an article on using reinforcement learning to decide on an architecture for a neural network. Theirs was designed purely for accuracy, but you could take a similar approach while also adding a term to minimize the overall size of the network.

Like I said, it's not really pruning so much as overall size reduction.
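
A hedged sketch of what such a size-aware search objective could look like (the function name, budget, and weighting are invented for illustration, not taken from the article mentioned above):

```python
def nas_reward(val_accuracy: float, num_params: int,
               param_budget: int = 1_000_000, size_weight: float = 0.3) -> float:
    """Score a candidate architecture: reward accuracy, penalize exceeding
    a parameter budget (hypothetical weighting, tune for your search)."""
    size_penalty = max(0.0, num_params / param_budget - 1.0)
    return val_accuracy - size_weight * size_penalty

# e.g. a 95%-accurate 5M-parameter net can lose to a 93%-accurate 0.8M one
print(nas_reward(0.95, 5_000_000))   # 0.95 - 0.3*4.0 = -0.25
print(nas_reward(0.93, 800_000))     # 0.93 - 0.0    =  0.93
```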

1

ReginaldIII t1_iurmviz wrote

This is just describing pruning, though; the whole purpose of better pruning methods is to reduce size without compromising performance on the intended task.

If you are embedding the weights of a model in an FPGA, then the size of the FPGA is your bottleneck; it's unlikely to be your bandwidth talking to the ground, because FPGAs just aren't that big, relatively speaking.

Yes, ground comms is a factor, but realistically: (a) how often are you going to be flashing new models onto your orbital systems, relative to (b) how much inference you are going to be doing with any one given model, and (c) how much data you'll then need to beam back down to collect those inferences?

Is the upload of the model weights really the dominant factor here?

By all means, strive to make the model as small as possible. But there's nothing special about the edge device being in orbit compared to it being on Earth but hard to access.

8

LetterRip t1_iusmrac wrote

With bitsandbytes' LLM.int8() you can quantize most weights in large models, keep a small subset at full precision, and get equivalent output. You could then also use a lookup table to compress the weights further.
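
A rough sketch of the lookup-table idea (plain 1D k-means codebook quantization, not bitsandbytes' actual internals): each weight is replaced by a small index into a shared codebook, so you transmit the indices plus the codebook.

```python
import torch

def codebook_quantize(w: torch.Tensor, k: int = 256, iters: int = 20):
    """Cluster weights into k centroids (8-bit indices when k=256)."""
    flat = w.flatten()
    # initialize centroids evenly over the weight range
    centroids = torch.linspace(flat.min().item(), flat.max().item(), k)
    for _ in range(iters):                      # plain k-means in 1D
        idx = (flat[:, None] - centroids[None, :]).abs().argmin(dim=1)
        for c in range(k):
            mask = idx == c
            if mask.any():
                centroids[c] = flat[mask].mean()
    return idx.to(torch.uint8), centroids       # transmit these two

def codebook_dequantize(idx, centroids, shape):
    return centroids[idx.long()].reshape(shape)

w = torch.randn(128, 128)
idx, cb = codebook_quantize(w, k=256)
w_hat = codebook_dequantize(idx, cb, w.shape)
print((w - w_hat).abs().max())   # small reconstruction error, 4x fewer bits than fp32
```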

2

_Arsenie_Boca_ t1_iusvc0e wrote

Parameter sharing across layers would achieve just that. In the ALBERT paper the authors show that repeating a layer multiple times actually leads to performance similar to having separate parameter matrices. I haven't heard a lot about this technique, but I assume that's because people mostly care about speed, which this does not improve (while it is a good match for your use case).
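
A minimal sketch of cross-layer sharing (not ALBERT's full architecture; the module below is a toy MLP): one block's parameters are reused at every depth, so only a single block's worth of weights needs to be transmitted.

```python
import torch
import torch.nn as nn

class SharedDepthMLP(nn.Module):
    """Applies the same block `depth` times; parameters are stored once."""
    def __init__(self, dim: int = 256, depth: int = 12):
        super().__init__()
        self.block = nn.Sequential(              # the only parameters in the model
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        self.depth = depth

    def forward(self, x):
        for _ in range(self.depth):              # reuse, instead of 12 separate blocks
            x = x + self.block(x)                # residual keeps repeated application stable
        return x

model = SharedDepthMLP()
n_params = sum(p.numel() for p in model.parameters())
print(n_params)   # same transmit cost whether depth is 1 or 12
```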

2

Zealousideal_Low1287 t1_iutmq2q wrote

As in, you care about compression for the sake of communication and not computation?

2

burn_1298 t1_iuutosp wrote

It sounds like you just need straight compression: not neural-network compression, but ordinary compression of the stored numbers. That's going to be 10x or 100x better than whatever pruning will do for you. There is research in the field of compressing models for storage or transmission.
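
For example, generic lossless compression of the serialized weights (zlib here purely as an illustration; raw float32 weights compress poorly, so in practice you'd quantize first and then compress):

```python
import io
import zlib

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

buf = io.BytesIO()
torch.save(model.state_dict(), buf)           # serialize the weights
raw = buf.getvalue()
packed = zlib.compress(raw, level=9)          # generic lossless compression
print(len(raw), len(packed))                  # float32 noise barely shrinks...

# ...a crude int8 quantization first makes the stream far more compressible
q = {k: (v * 127).round().to(torch.int8) for k, v in model.state_dict().items()}
buf2 = io.BytesIO()
torch.save(q, buf2)
print(len(zlib.compress(buf2.getvalue(), 9)))
```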

2

bernhard-lehner t1_iuv03xs wrote

These are exactly the questions one needs to ask before even starting. I have seen it numerous times that people work on something that might be interesting but is utterly useless at the end of the day.

1

mad_alim t1_iuwfsz0 wrote

If I understood correctly, you want to train a model on the Earth-based training system and transmit the whole model to an FPGA in space for inference (usage).

The main paper I remember reading on this is Deep Compression: https://arxiv.org/abs/1510.00149
(they propose techniques, including quantization, to reduce model size while maintaining the same accuracy).

In general, for embedded applications (edge AI, low power, etc.) it is very common to quantize ANNs, both weights and activations, going from 32-bit floating point to 8 bits or lower.
(It is so common that there is: https://www.tensorflow.org/lite.)
What particularly interests you is weight quantization (because the weights are the biggest part you'll transmit), so I'd recommend reading more on quantization.
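
A minimal sketch of post-training weight quantization (simple per-tensor affine quantization, not TFLite's exact scheme): you transmit int8 values plus a scale and zero point, roughly a 4x reduction versus float32.

```python
import torch

def quantize_per_tensor(w: torch.Tensor):
    """Affine-quantize a weight tensor to 8 bits: q = round(w/scale) + zero_point."""
    w_min, w_max = w.min().item(), w.max().item()
    scale = (w_max - w_min) / 255.0
    zero_point = round(-w_min / scale)
    q = torch.clamp(torch.round(w / scale) + zero_point, 0, 255).to(torch.uint8)
    return q, scale, zero_point               # this is all that gets transmitted

def dequantize(q, scale, zero_point):
    return (q.float() - zero_point) * scale

w = torch.randn(256, 256)
q, s, zp = quantize_per_tensor(w)
err = (w - dequantize(q, s, zp)).abs().max()
print(f"bytes: {w.numel() * 4} -> {q.numel()}, max error: {err:.4f}")
```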

Another thing to consider is the architecture itself, which determines how many parameters you have in the first place.
In particular, convolutional layers reuse a small shared kernel (e.g. 3x3 weights) across many positions, whereas dense layers are essentially a single matrix multiplication with n_in*n_out weights.
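
To make that concrete, a quick parameter count with standard PyTorch layers (the layer sizes are arbitrary examples):

```python
import torch.nn as nn

conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)   # small shared 3x3 kernels
dense = nn.Linear(4096, 4096)                        # one full n_in x n_out matrix

conv_params = sum(p.numel() for p in conv.parameters())    # 64*64*3*3 + 64 = 36,928
dense_params = sum(p.numel() for p in dense.parameters())  # 4096*4096 + 4096 = 16,781,312
print(conv_params, dense_params)
```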
(Keep in mind that compression is just a tangent to my research topic and my main education was not in CS/ML, so I might be missing relevant topics.)

2