Comments

kompootor t1_jdpvjj4 wrote

In short, based on what you are describing, an LLM is a terrible tool for compressing its training data compared to virtually any other reasonable compression technique one could think of, by any metric.

When you talk about compression, you're generally talking about raw data that you run through an algorithm to compress it into a more manageable form, and then through another algorithm to recover the raw data, either with some amount of lossiness or losslessly. AI models can do that, sure, but they are not designed to be data structures for storage and retrieval. In a simplified ANN model, the network takes the new training data it is given, and in adjusting its weights it can now interpolate between this new data and previous training data. The tradeoff is that asking the model to recall a specific piece of old training data may now produce an even fuzzier, less faithful output, while the model can be asked about hypothetical data between what it's been trained on. (I'll have to find a good intro guide to a simple ANN model that illustrates this with diagrams.) None of this gets into space, time, or resource efficiency, but those are all guaranteed to be worse than a dedicated compression algorithm in any practical application as well.
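
A minimal sketch of that roundtrip in Python, using the standard zlib module (a lossless compressor; the sample data here is made up):

```python
import zlib

raw = b"the quick brown fox jumps over the lazy dog " * 100

# Compress: raw data -> a more manageable form
packed = zlib.compress(raw)

# Decompress: recover the raw data exactly (lossless)
restored = zlib.decompress(packed)

assert restored == raw
print(f"{len(raw)} bytes -> {len(packed)} bytes")
```

An LLM gives you nothing like that assert guarantee -- there is no decompression step that reproduces the training data byte for byte.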

It may help to look at a broad overview of how data compression works in general. There are ANN/AI algorithms for compression -- they use a predictive network to essentially tune an existing deterministic compression algorithm, optimizing it for the data being compressed. That is nothing like taking an ANN such as a large language model and locating the compressed data entirely in the ANN's weights.
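
Very roughly, those schemes let a predictive model assign a probability to each next symbol, and an entropy coder then spends about -log2(p) bits on it, so better predictions mean smaller output. A toy sketch (the `predict` interface and the uniform predictor are invented for illustration):

```python
import math

def ideal_code_length(symbols, predict):
    """Total bits an ideal entropy coder would spend, given that
    `predict` returns P(next symbol | history) -- Shannon's -log2(p)."""
    bits = 0.0
    for i, s in enumerate(symbols):
        p = predict(symbols[:i], s)  # model's probability for this symbol
        bits += -math.log2(p)
    return bits

# A deliberately dumb predictor: uniform over a 4-symbol alphabet,
# costing a flat 2 bits per symbol. A trained network would assign
# higher probabilities to likely symbols and shrink the total.
uniform = lambda history, symbol: 0.25
print(ideal_code_length("abab", uniform))  # 8.0
```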

I don't know if this helps -- I can try to clarify stuff or provide some better articles if you like.

4

adfoucart t1_jdq3jy5 wrote

The parameters don't store the training data. They store a mapping between inputs (for LLMs: sequences of words) and predicted outputs (the next word in the sequence). If there is not a lot of training data, this mapping may let you recall specific data points from the training set (e.g. if you start a sentence from the data set, the model will predict the rest). But that's not the desired behaviour (such a model is said to "overfit" the data).
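
As a toy illustration of that input-to-next-word mapping (a bigram counter standing in, very crudely, for an LLM; the one-sentence "training set" is invented):

```python
from collections import defaultdict

def train_bigram(text):
    """Map each word to the words observed to follow it."""
    nxt = defaultdict(list)
    words = text.split()
    for a, b in zip(words, words[1:]):
        nxt[a].append(b)
    return nxt

# With a single tiny training sentence, the 'model' has overfit:
# given the first word, it parrots the data set back verbatim.
model = train_bigram("to be or not to be")
word, out = "to", ["to"]
for _ in range(5):
    word = model[word][0]  # the memorised continuation
    out.append(word)
print(" ".join(out))  # "to be or not to be"
```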

If there is enough data, then the mapping no longer "recalls" any particular data point. It instead encodes relationships between patterns in the inputs and in the outputs. But those relationships "summarize" many data points.

So for instance when an LLM completes "Napoléon was born on" with "August 15, 1769", it's not recalling one specific piece of information, but using a pattern detected from the many inputs that put those sequences of words (or similar sequences) together.

So it's not really accurate to talk about "compression" here. Or, rather, LLMs compress text in the same sense that a linear regression "compresses" the information of a point cloud...
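
To make that analogy concrete (an invented point cloud; numpy's polyfit plays the role of the regression):

```python
import numpy as np

# 1000 noisy points along a line
x = np.linspace(0, 10, 1000)
y = 3.0 * x + 1.0 + np.random.normal(0, 0.5, size=x.size)

# "Compress" them into just two parameters: slope and intercept
slope, intercept = np.polyfit(x, y, 1)

# "Decompressing" reconstructs the trend, never the original points
y_hat = slope * x + intercept
print(f"1000 points -> 2 parameters, mean error {np.mean(np.abs(y - y_hat)):.3f}")
```

The two parameters summarise the cloud, but no amount of cleverness gets the individual points back -- which is the sense in which an LLM "compresses" its training text.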

2

askscience-ModTeam t1_jdq4j0c wrote

Thank you for your submission! Unfortunately, your submission has been removed for the following reason(s):

  • This question is based on fundamentally flawed premises. Please conduct some background research and revise your question if you wish to resubmit.

  • Deep learning models are not compression methods.

1

samyall OP t1_jdqacnn wrote

I really like your last point there. That is a good analogy.

I guess my question boils down to "how to think about information in a trained model". What I am wondering is whether a model can carry more information than its raw size. I think it may be able to, conceptually, because the relationships between neurons carry information but aren't reflected in the file size of a model.

So, just as a regression represents a point cloud, could we vectorise a book or a movie (if that was what we wanted)?
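
If it helps, one existing (and very lossy) version of "vectorising" text is a bag-of-words vector; a minimal sketch with scikit-learn's TfidfVectorizer, using two invented snippets as stand-ins for books:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

books = [
    "call me ishmael some years ago never mind how long precisely",
    "it was the best of times it was the worst of times",
]

# Each "book" becomes one fixed-length vector of word weights.
# Like the regression, this is a summary: the original text
# cannot be recovered from the vector.
vec = TfidfVectorizer()
X = vec.fit_transform(books)
print(X.shape)  # (2 books, vocabulary-size features)
```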

1