
ThrowThisShitAway10 t1_isdsqd5 wrote

Yes, of course. A lot of compression is moving towards AI-based methods because they can achieve much better compression ratios.

There is actually an explicit connection between AI and compression. Some believe that sufficiently advanced text compression is equivalent to the AGI problem. There's even a prize with 500,000 euros in total funding for anyone who can make progress in this domain: https://en.wikipedia.org/wiki/Hutter_Prize

11

WikiSummarizerBot t1_isdsrcg wrote

Hutter Prize

>The Hutter Prize is a cash prize funded by Marcus Hutter which rewards data compression improvements on a specific 1 GB English text file, with the goal of encouraging research in artificial intelligence (AI). Launched in 2006, the prize awards 5000 euros for each one percent improvement (with 500,000 euros total funding) in the compressed size of the file enwik9, which is the larger of two files used in the Large Text Compression Benchmark; enwik9 consists of the first 1,000,000,000 characters of a specific version of English Wikipedia. The ongoing competition is organized by Hutter, Matt Mahoney, and Jim Bowery.


9

Crazy-Space5384 t1_iseeqex wrote

But they limit the size of the decompressor executable so that it cannot contain a priori knowledge about the text corpus. Meaning you can't include a pretrained network…

2

MTGTraner t1_isem4w5 wrote

That seems fair, no? Otherwise, you could just deploy an overfitted model!

7

BrotherAmazing t1_isfrx3g wrote

And you might need ‘N’ decompressors on your PC for ‘N’ files, and the size of those decompressors might be so large that it starts to outweigh the savings in compression. I mean, a decompressor that knows what the text is can magically “decompress” a file of size 0 with nothing in it to the original text. lol
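To put rough numbers on that, here is a minimal Python sketch of the accounting. It assumes the decoder logic itself has some fixed size (`DECODER_LOGIC_SIZE` is a made-up figure, not a real measurement) and that a "memorizing" decompressor stores its text in zlib-compressed form, which is the best case for it. The function names and sample files are hypothetical.

```python
import zlib

# Hypothetical size of the decoder logic itself, in bytes (illustrative assumption).
DECODER_LOGIC_SIZE = 50_000


def generic_cost(files):
    """One shared decompressor plus one zlib payload per file."""
    return DECODER_LOGIC_SIZE + sum(len(zlib.compress(f, 9)) for f in files)


def memorizing_cost(files):
    """One decompressor per file, each embedding its own (compressed) text.

    The transmitted payload for each file is empty, so it adds 0 bytes.
    """
    return sum(DECODER_LOGIC_SIZE + len(zlib.compress(f, 9)) for f in files)


# Five made-up text files.
files = [
    (f"document {i}: " + "some repetitive English text. " * 100).encode()
    for i in range(5)
]

print("generic   :", generic_cost(files))
print("memorizing:", memorizing_cost(files))
```

Under these assumptions the zero-byte payload buys nothing: the text just moves into the decompressor, and every extra file costs another copy of the decoder logic.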

1

midasp OP t1_isf183v wrote

It doesn't really matter. With just a language model trained on general-use English (or whatever human language is in the corpus), it should still be able to transform each sentence or paragraph into a short encoding.

1

Crazy-Space5384 t1_isf1r74 wrote

But so does traditional data compression. So it remains to be shown that an ML model gets closer to the entropy limit, given that the model must be transferred alongside the encoded text because of the size restriction on the decompressor binary.

1

midasp OP t1_isf72lo wrote

I'm sorry, I should have clarified that I have no interest in the Hutter Prize or its rules, nor is this about getting close to the entropy limit.

My idea is more about the transmitter and receiver already having mutually shared information (stored within the ML model). In such a situation, the transmitter can reduce the amount of information that needs to be transmitted because it does not have to transmit that mutually shared information. The receiver will be able to combine the transmitted information with its shared information to rebuild the original message.
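As a rough analogy only, the same idea can be sketched in Python with zlib's preset dictionary feature, where a byte string held by both sides stands in for the shared ML model. This is not a learned model and not necessarily what the idea above would look like in practice; `SHARED`, `send`, and `receive` are hypothetical names chosen for illustration.

```python
import zlib
from typing import Optional

# Hypothetical shared knowledge: both transmitter and receiver hold this in advance.
SHARED = (
    b"The Hutter Prize rewards data compression improvements "
    b"on a specific English text file. "
)


def send(message: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Compress a message, optionally priming the compressor with shared bytes."""
    if zdict is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(message) + comp.flush()


def receive(payload: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Rebuild the message; the receiver must hold the same shared bytes."""
    if zdict is None:
        decomp = zlib.decompressobj()
    else:
        decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(payload) + decomp.flush()


message = (
    b"The Hutter Prize rewards data compression improvements "
    b"on a specific English text file. It was launched in 2006."
)

plain = send(message)
primed = send(message, SHARED)

print("without shared info:", len(plain), "bytes")
print("with shared info   :", len(primed), "bytes")
```

The primed payload is smaller because the part of the message that overlaps with `SHARED` is sent as a back-reference rather than as literal text, and the receiver can only decode it if it holds the same `SHARED` bytes.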

I should not have used the term "image compression"; that was an error on my part, and I apologize if it led to any confusion. It is only "compression" in the sense that we are transmitting less information than the message in its entirety, not in the sense of pushing the limits of information compression.

0