PassingTumbleweed t1_jbri1kj wrote

I won't repeat what other comments said but there are interesting architectures like H-Transformer that have lower asymptotic complexity and scale to longer sequences than the original Transformer. It's also worth noting that in practice the MLP cost may actually dominate the self-attention cost or vice versa, depending on the sequence length and model size.
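As a rough back-of-the-envelope illustration of that trade-off (the constants and the 4x MLP hidden ratio below are common conventions, not from any specific paper):

```python
def per_layer_macs(n, d, mlp_ratio=4):
    """Approximate multiply-accumulates for one Transformer layer with
    sequence length n and model dim d, ignoring softmax, norms, and biases."""
    attn_proj = 4 * n * d * d          # Q, K, V, and output projections
    attn_scores = 2 * n * n * d        # QK^T and attention-weighted values
    mlp = 2 * n * d * (mlp_ratio * d)  # two linear layers, hidden = mlp_ratio * d
    return attn_proj + attn_scores, mlp

# At n=512, d=1024 the MLP is the bigger term; at n=32768 the n^2
# attention term dominates.
attn_short, mlp_short = per_layer_macs(512, 1024)
attn_long, mlp_long = per_layer_macs(32768, 1024)
```

Comparing just the quadratic attention term against the MLP, the crossover is around n ≈ 4d under these conventions, which is why long-context work focuses on the quadratic term.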


PassingTumbleweed t1_j7mlwls wrote

I'm not aware of any comparison. Maybe it doesn't matter that much?

PaLI feeds embeddings from the Vision Transformer to the LM after a linear projection layer. It allows backpropagation through the ViT's weights, so the image encoding can be learned for the task. The ability to tune the embeddings in an end-to-end fashion might be an important consideration.
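In PyTorch terms, that wiring looks roughly like the sketch below (hypothetical and heavily simplified: the dimensions are illustrative, and an `nn.Linear` stands in for the full ViT encoder):

```python
import torch
import torch.nn as nn

vit = nn.Linear(768, 768)           # stand-in for the Vision Transformer encoder
proj = nn.Linear(768, 1024)         # linear projection into the LM embedding dim
text_emb = nn.Embedding(32000, 1024)

patches = torch.randn(2, 196, 768)            # batch of image patch features
tokens = torch.randint(0, 32000, (2, 16))     # batch of text token ids

img_emb = proj(vit(patches))                              # (2, 196, 1024)
lm_input = torch.cat([img_emb, text_emb(tokens)], dim=1)  # (2, 212, 1024)

# Because the projection and the encoder sit in the same graph, the LM loss
# backpropagates into the image encoder's weights.
lm_input.sum().backward()
```

The key point is that nothing is frozen or detached, so gradients reach `vit` just like any other parameter.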


PassingTumbleweed t1_j77s2wr wrote

What's weird is that they say it applies only to right-wing people watching right-wing content (not left-wing people watching right- or left-wing content). There's an asymmetry to it.


PassingTumbleweed t1_j75hr9k wrote

I wonder what it means that they could predict political alignment based on the sensorimotor cortex. That's the part responsible for tactile sensing and controlling movement, if I'm understanding correctly. Are there other activities that activate those parts in a similar way?


PassingTumbleweed OP t1_j6tz0wq wrote

I agree everyone should take predictions with a huge grain of salt (obviously some clever person might find a way to make Open-ChatGPT on mobile... we can only hope). However, this does seem like a conversation worth having, since LLMs appear to have a massive impact across many areas at once. I already find a lot of the insights here interesting!


PassingTumbleweed t1_j62anc3 wrote

You could solve the problem you describe at the tokenization level without moving away from Unicode, which is more about how text is encoded for storage and transmission purposes.

For example, let's say you still represent your text as Unicode at rest, but you have a tokenizer that budgets its vocab space such that the average number of tokens per sentence is the same across languages (or whatever your fairness criterion is).
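As a toy sketch of that budgeting idea (the greedy longest-match tokenizer and the word-level candidate lists below are stand-ins for a real BPE/unigram trainer, not a serious tokenizer):

```python
from collections import Counter

def tokenize(sentence, vocab):
    """Greedy longest-match against the vocab, falling back to single chars."""
    tokens, i = [], 0
    while i < len(sentence):
        for j in range(len(sentence), i, -1):
            if sentence[i:j] in vocab or j == i + 1:
                tokens.append(sentence[i:j])
                i = j
                break
    return tokens

def fertility(corpus, vocab):
    """Average number of tokens per sentence."""
    return sum(len(tokenize(s, vocab)) for s in corpus) / len(corpus)

def budget_vocab(corpora, total_budget):
    """Greedily give each vocab slot to the language with the highest
    tokens-per-sentence, pushing fertility toward equality."""
    vocabs = {lang: set() for lang in corpora}
    candidates = {
        lang: [w for w, _ in Counter(
            w for s in corpora[lang] for w in s.split()).most_common()]
        for lang in corpora
    }
    for _ in range(total_budget):
        worst = max(corpora, key=lambda l: fertility(corpora[l], vocabs[l]))
        if candidates[worst]:
            vocabs[worst].add(candidates[worst].pop(0))
    return vocabs
```

The underlying text stays plain Unicode throughout; only the token vocabulary is allocated with the fairness criterion in mind.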


PassingTumbleweed t1_j3gq662 wrote

Not every physicist can afford a particle accelerator, but that doesn't stop them from researching particle physics.

ChatGPT makes basic reasoning errors that even a child wouldn't make, which makes me think this is a weakness of the current approach. Maybe "more data" is not the solution to this problem. This is one direction I would consider.