MysteryInc152 t1_jeecbeq wrote

I didn't downvote you, but it's probably because you're being obtuse. Anyway, whatever. If you don't want to take the evidence in plain sight, then don't. The baseline human comparisons are right there. Frankly, it's not my problem if you're suspicious of the results and not bilingual enough to test them yourself. It's not really my business whether you believe me or not.


MysteryInc152 t1_jee5zba wrote

It's not cherry picked lol.

Wild how everyone will just use that word even when they've clearly not tested the model themselves. I'm just telling you what anyone who's actually used these models for translation will tell you.


MysteryInc152 t1_jeanb01 wrote

>LLM trained on a multi-lingual corpus can be prompted to translate but they are far inferior to actual translation models.

No lol. You would know this if you'd ever actually tried to translate with GPT-4 and the like. They're far superior to the current SOTA.


MysteryInc152 t1_jdrpjd4 wrote

Sorry I'm hijacking the top comment so people will hopefully see this.

Humans learn language and concepts through sentences, and in most cases semantic understanding can be built up just fine this way. It doesn't work quite the same way for math.

When I look at an arbitrary set of numbers, I have no idea whether they are prime, or what their factors are, because the numbers themselves don't carry much semantic content. Understanding whether they are those things requires stopping to perform some specific analysis on them, using internalized rules acquired through a specialized learning process. Humans don't learn math just by talking to one another about it; they actually have to do it in order to internalize it.

In other words, mathematics or arithmetic is not highly encoded in language.
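To make the point concrete, here's a minimal sketch (my own illustration, not from the comment): primality can't be "read off" a number's surface form the way meaning can often be read off a sentence; you have to execute a procedure.

```python
def is_prime(n: int) -> bool:
    # Trial division: primality only emerges from computation,
    # not from the number's appearance as a token string.
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

# 7919 and 7917 look nearly identical as strings of digits,
# but only analysis reveals 7919 is prime while 7917 = 3 * 7 * 13 * 29.
print(is_prime(7919), is_prime(7917))
```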

The encouraging thing is that this does improve with scale: GPT-4 is much, much better than 3.5.


MysteryInc152 t1_jdj8x5e wrote

>they mentioned an image takes 30 seconds to "comprehend" by the model...

Wait, really? Can you link a source or something? There's no reason a native implementation should take that long.

Now i'm wondering if they're just doing something like this -


MysteryInc152 t1_jd3v3kp wrote

There are foundation models that do these kinds of things. You can connect them to a language model to get the kind of effect you're thinking about.

Visual ChatGPT -


MysteryInc152 t1_jcrnqc8 wrote

You can try training ChatGLM: 6B parameters, initially trained on 1T English/Chinese tokens, and completely open source. However, it has already been fine-tuned and had RLHF, and that was optimized for Chinese Q&A, so it could use some English work.

Another option is RWKV. There are 7B and 14B models (I'd go with the 14B; it's the better of the two) fine-tuned to a context length of 8196 tokens. He plans on increasing the context further too.


MysteryInc152 OP t1_jcputc0 wrote

It uses relative positional encoding, so long context is possible in theory, but because it was trained on 2048 tokens of context, performance gradually declines beyond that. Fine-tuning for more context wouldn't be impossible, though.

You can run it with FP16 (13 GB RAM), 8-bit (10 GB), or 4-bit (6 GB) quantization.
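The FP16 figure suggests roughly 6.5B parameters (13 GB / 2 bytes per weight); that parameter count is my inference, not stated in the comment. A quick back-of-envelope check of weights-only memory at each precision:

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    # Weights-only estimate; real usage adds overhead for
    # activations, the KV cache, and non-quantized layers,
    # which is why reported figures run a bit higher.
    return n_params * bits_per_param / 8 / 1e9

n = 6.5e9  # assumed parameter count, inferred from 13 GB at FP16
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(n, bits):.1f} GB")
```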