Crazy-Space5384 t1_iseeqex wrote

But they limit the size of the decompressor executable so that it cannot contain a priori knowledge about the text corpus. Meaning you can't include a pretrained network…

2

MTGTraner t1_isem4w5 wrote

That seems fair, no? Otherwise, you could just deploy an overfitted model!

7

BrotherAmazing t1_isfrx3g wrote

And you might need 'N' decompressors on your PC for 'N' files, and the size of those decompressors might be so large that it starts to outweigh the savings from compression. I mean, a decompressor that already knows the text can magically "decompress" an empty file of size 0 into the original text. lol

1

midasp OP t1_isf183v wrote

It doesn't really matter. With just a language model trained on general-use English (or whatever human language is in the corpus), it should still be able to transform each sentence or paragraph into a short encoding.

1

Crazy-Space5384 t1_isf1r74 wrote

But so does traditional data compression. So it remains to be proven that an ML model gets closer to the entropy limit, given that the model must be transferred alongside the encoded text because of the size restriction on the decompressor binary.

1

midasp OP t1_isf72lo wrote

I'm sorry, I should have clarified that I have no interest in the Hutter Prize or its rules, nor is this about getting close to the entropy limit.

My idea is more about the transmitter and receiver already having mutually shared information (stored within the ML model). In such a situation, the transmitter can reduce the amount of information that needs to be transmitted because it does not have to transmit that mutually shared information. The receiver will be able to combine the transmitted information with its shared information to rebuild the original message.

I should not have used the term "image compression"; that was an error on my part and I apologize if it led to any confusion. It is only "compression" in the sense that we are transmitting less information, rather than transmitting the message in its entirety or pushing the limits of information compression.
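To make the shared-information idea concrete, here is a toy sketch (entirely hypothetical, not the OP's actual system). A shared list of common sentences stands in for the mutually held ML model; the transmitter sends only an index into that shared knowledge plus whatever residual the receiver cannot reconstruct on its own:

```python
# Toy sketch: transmitter and receiver hold the same "model" -- here
# just a hardcoded list of sentences standing in for the shared
# knowledge inside a trained language model.
SHARED_MODEL = [
    "the quick brown fox jumps over the lazy dog",
    "hello world",
    "how are you today",
]

def encode(message: str) -> tuple[int, str]:
    """Transmit only what the receiver cannot already reconstruct."""
    for i, known in enumerate(SHARED_MODEL):
        if message.startswith(known):
            # Send an index into shared knowledge plus the residual suffix.
            return i, message[len(known):]
    return -1, message  # no shared prefix: fall back to sending everything

def decode(index: int, residual: str) -> str:
    """Rebuild the original message from the index and the residual."""
    if index == -1:
        return residual
    return SHARED_MODEL[index] + residual

msg = "hello world, it is sunny"
index, residual = encode(msg)
assert decode(index, residual) == msg  # round-trip recovers the message
```

The point of the sketch is only that the transmitted payload (an index plus a short residual) can be much smaller than the message, precisely because both ends already share the bulk of the information.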

0