ThePerson654321

ThePerson654321 OP t1_jbk8kxy wrote

I'm basically just referring to the claims by the developer. He makes it sound extraordinary:

> best of RNN and transformer, great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.

> Inference is very fast (only matrix-vector multiplications, no matrix-matrix multiplications) even on CPUs, so you can even run a LLM on your phone.

The most extraordinary claim I got stuck on was "infinite" ctx_len. One of the biggest limitations of transformers today is imo their context length. Having an "infinite" ctx_len definitely feels like something DeepMind, OpenAI etc. would want to investigate?
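
To unpack why those claims go together: a recurrent model carries its entire context in a fixed-size state, so each generated token only needs a few matrix-vector products and memory does not grow with sequence length. A minimal sketch (a generic toy recurrence, not the actual RWKV time-mixing; the names and sizes here are made up for illustration):

```python
import numpy as np

# Toy RNN-style decoding: the whole "context" lives in a fixed-size state
# vector, so per-token work is matrix-vector only and memory stays constant.
# (Generic recurrence for illustration - NOT the actual RWKV formulas.)

d = 8                                           # hypothetical hidden size
rng = np.random.default_rng(0)
W_state = rng.normal(scale=0.1, size=(d, d))    # recurrence weights
W_in = rng.normal(scale=0.1, size=(d, d))       # input weights

def step(state, x):
    # One decoding step: matrix-vector products, no attention over past tokens.
    return np.tanh(W_state @ state + W_in @ x)

state = np.zeros(d)
for t in range(100_000):         # "infinite" ctx_len in principle:
    x = rng.normal(size=d)       # stand-in for a token embedding
    state = step(state, x)       # the state stays size d no matter how long we run
```

A transformer, by contrast, has to keep and attend over a key/value cache that grows with every token, which is where the practical context-length limit comes from.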


I definitely agree that there might be an incompatibility with the already existing transformer-specific infrastructure.

But thanks for your answer. It might be one or more of the following:

  1. The larger organizations haven't noticed/cared about it yet
  2. I overestimate how good it is (from the developer's description)
  3. It has some unknown flaw that's not obvious to me and not stated in the repository's README.
  4. All the existing infrastructure is tailored for transformers and is not compatible with RWKV

At least we'll see in time.

0

ThePerson654321 OP t1_jbjz508 wrote

> I emailed them that RWKV exactly met their desire for a way to train RNNs 'on the whole internet' in a reasonable time. So prior to a month ago they didn't know it existed or happened to meet their use case.

That surprises me considering his RWKV repos have thousands of stars on GitHub.

I'm curious about what they responded with. What did they say?

> There was no evidence it was going to be interesting. There are lots of ideas that work on small models that don't work on larger models.

According to his claims (especially infinite ctx_len) it definitely was interesting. That it scales was pretty obvious even at 7B.


But your argument is basically just that no large organization has noticed it yet.

My guess is that it actually has some unknown problem/limitation that makes it inferior to the transformer architecture.

We'll just have to wait. Hopefully you are right but I doubt it.

1

ThePerson654321 OP t1_jbjisn7 wrote

  1. Sure. RWKV 7B came out 7 months ago, but the concept has been promoted by the developer for much longer. Compared to, say, DALL-E 2 (which has exploded and only came out 9 months ago), it still feels like some organization would have picked up RWKV if it were as useful as the developer claims.

  2. This might actually be a problem. But the code is public, so it shouldn't be that difficult to understand.

  3. Not necessarily. Google, OpenAI, DeepMind etc. test things that don't work out all the time.

  4. Does not matter. If your idea is truly good, you will get attention sooner or later anyway.


I don't buy the argument that it's too new or hard to understand. Some researcher at, for example, DeepMind would have been able to understand it.

I personally have two potential explanations for my question:

  1. It does not work as well as the developer claims, or it has some other flaw that makes it hard to scale, for example (time will be the judge of this)
  2. The community is just really slow to embrace it, for some unknown reason.

I am leaning towards the first one.

5

ThePerson654321 t1_jabdk7a wrote

I agree! It's sad to see that Pong, the game that started it all, isn't taken seriously anymore. It deserves respect for paving the way for the entire gaming industry and being a damn good game. The mechanics are elegant, and it rewards skill and practice. We've become too obsessed with flashy graphics and complex mechanics, forgetting that sometimes the simplest things can be the most enjoyable. Let's remind people that Pong is a classic game that deserves to be celebrated and remembered.

1