PassingTumbleweed t1_jeh0p1j wrote
Reply to comment by KD_A in [P] CAPPr: use OpenAI or HuggingFace models to easily do zero-shot text classification by KD_A
I'm curious to get your thoughts on a simple example where you have three classes: cat, dog, and bird. What happens if the top-1 prediction is "eagle"? Does that probability mass get discarded? Because it should probably go into the bird category.
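Here's a toy sketch of what I mean (the numbers are made up):

```python
# Hypothetical next-token probabilities from the model (numbers made up):
full_dist = {"eagle": 0.40, "bird": 0.25, "cat": 0.20, "dog": 0.15}
classes = ["cat", "dog", "bird"]

# Restricting to the class tokens and renormalizing throws away the 0.40
# on "eagle" and spreads it across all three classes proportionally...
total = sum(full_dist[c] for c in classes)
print({c: full_dist[c] / total for c in classes})
# {'cat': 0.333, 'dog': 0.25, 'bird': 0.417}

# ...whereas arguably all of the "eagle" mass belongs to "bird".
```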
PassingTumbleweed t1_jegvhb5 wrote
Reply to comment by KD_A in [P] CAPPr: use OpenAI or HuggingFace models to easily do zero-shot text classification by KD_A
What I was thinking is that some kind of hierarchical LLM taxonomy might be interesting, where you can re-jigger the conditional probability tree onto any arbitrary vocab of token sequences.
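To make that concrete, here's a rough sketch of scoring an arbitrary token sequence with the chain rule (model choice and prompts are just placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")   # placeholder model choice
model = AutoModelForCausalLM.from_pretrained("gpt2")

def completion_log_prob(prompt: str, completion: str) -> float:
    """Chain rule: log P(completion | prompt) = sum_i log P(x_i | prompt, x_{<i})."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    completion_ids = tok(completion, return_tensors="pt").input_ids
    ids = torch.cat([prompt_ids, completion_ids], dim=1)
    with torch.no_grad():
        log_probs = model(ids).logits.log_softmax(dim=-1)
    total = 0.0
    for i in range(completion_ids.shape[1]):
        pos = prompt_ids.shape[1] + i          # position of completion token i
        total += log_probs[0, pos - 1, completion_ids[0, i]].item()
    return total

# Any vocab of token sequences works, single- or multi-token
# (leading space so the tokenization matches GPT-2's conventions):
for label in [" bird", " bird of prey"]:
    print(label, completion_log_prob("The animal is a", label))
```

Once you can score any sequence this way, regrouping the scores under a hierarchy (eagle under bird, etc.) is just bookkeeping on top.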
PassingTumbleweed t1_jegonam wrote
Reply to comment by KD_A in [P] CAPPr: use OpenAI or HuggingFace models to easily do zero-shot text classification by KD_A
Cool! I wonder if you've thought about synonyms. It seems like there might be a lot of cases where classes with more synonyms (or even pluralization, e.g. bird vs. birds) are at a disadvantage.
PassingTumbleweed t1_jegfgxg wrote
Reply to comment by KD_A in [P] CAPPr: use OpenAI or HuggingFace models to easily do zero-shot text classification by KD_A
If you assumed the classes are exactly one token long and equally common, then you could use the probability distribution $P(x_i \mid x_{1:i-1})$, exactly as returned by GPT APIs. Is that correct? And the rest of your work is to account for those two assumptions not being true?
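In code, under those two assumptions, I'd imagine something like this (a sketch with GPT-2 and a leading-space trick, not your actual implementation):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")   # placeholder model choice
model = AutoModelForCausalLM.from_pretrained("gpt2")

classes = ["cat", "dog", "bird"]
prompt = "The pet chirped in its cage. The pet is a"

# Leading space so each class name maps to a single GPT-2 token;
# this is exactly the one-token assumption.
ids = [tok.encode(" " + c) for c in classes]
assert all(len(i) == 1 for i in ids)

with torch.no_grad():
    logits = model(tok(prompt, return_tensors="pt").input_ids).logits

# P(x_i | x_{1:i-1}) for the very next token, read off directly:
next_token_probs = logits[0, -1].softmax(dim=-1)
for c, (i,) in zip(classes, ids):
    print(c, next_token_probs[i].item())
```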
PassingTumbleweed t1_jegde4t wrote
Reply to comment by KD_A in [P] CAPPr: use OpenAI or HuggingFace models to easily do zero-shot text classification by KD_A
Nice ty!
PassingTumbleweed t1_jeg11bt wrote
Reply to [P] CAPPr: use OpenAI or HuggingFace models to easily do zero-shot text classification by KD_A
Thanks for sharing! Can you explain the internals a bit more? How do you convert the user input into GPT prompt(s) and how do you turn the response(s) into a probability distribution?
PassingTumbleweed t1_jbri1kj wrote
Reply to [D] What's the Time and Space Complexity of Transformer Models Inference? by Smooth-Earth-9897
I won't repeat what other comments said, but there are interesting architectures like H-Transformer that have lower asymptotic complexity and scale to longer sequences than the original Transformer. It's also worth noting that in practice the MLP cost may dominate the self-attention cost or vice versa, depending on the sequence length and model size.
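As a rough back-of-envelope (standard FLOP approximations, illustrative dimensions):

```python
def attention_flops(n, d):
    """Rough per-layer FLOPs: Q/K/V/output projections plus the n x n attention map."""
    return 8 * n * d * d + 4 * n * n * d   # ~2 FLOPs per multiply-add

def mlp_flops(n, d, expansion=4):
    """Rough per-layer FLOPs for the d -> expansion*d -> d feed-forward block."""
    return 4 * n * d * expansion * d

d = 1024
for n in (512, 2048, 8192):
    print(f"n={n}: attention/MLP cost ratio = {attention_flops(n, d) / mlp_flops(n, d):.2f}")
# n=512:  ratio ~0.62 (MLP dominates)
# n=2048: ratio ~1.00
# n=8192: ratio ~2.50 (the n^2 attention term takes over)
```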
PassingTumbleweed t1_ja6w9ai wrote
It's weird to read this when RLHF has been one of the key components of ChatGPT and friends.
PassingTumbleweed t1_j7mlwls wrote
Reply to comment by _Arsenie_Boca_ in [D] Papers that inject embeddings into LMs by _Arsenie_Boca_
I'm not aware of any comparison. Maybe it doesn't matter that much?
PaLI feeds embeddings from the Vision Transformer to the LM after a linear projection layer. It allows backpropagation through the ViT's weights, so the image encoding can be learned for the task. The ability to tune the embeddings in an end-to-end fashion might be an important consideration.
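Roughly the wiring, as I understand it (a sketch, not PaLI's actual code; the module and argument names are made up):

```python
import torch
import torch.nn as nn

class VisionToLM(nn.Module):
    """Sketch of the pattern: ViT patch embeddings -> linear projection -> LM.

    `vit` and `lm` stand in for pretrained models; all names here are
    illustrative assumptions.
    """
    def __init__(self, vit: nn.Module, lm: nn.Module, vit_dim: int, lm_dim: int):
        super().__init__()
        self.vit, self.lm = vit, lm
        self.proj = nn.Linear(vit_dim, lm_dim)   # the linear projection layer

    def forward(self, image, text_embeds):
        # Note: no torch.no_grad() around the ViT, so gradients flow back
        # into its weights and the image encoding is tuned end to end.
        image_embeds = self.proj(self.vit(image))   # (batch, patches, lm_dim)
        return self.lm(torch.cat([image_embeds, text_embeds], dim=1))
```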
PassingTumbleweed t1_j7lt1o5 wrote
Any LM with multimodal input? PaLI?
PassingTumbleweed t1_j77s2wr wrote
Reply to comment by Jetztinberlin in Political views can be predicted by differences in brain activity. Study says political differences don’t just emerge when it comes to how we interpret reality around us; our brains actually ‘see’ different things depending on our politics. by mossadnik
What's weird is that they say it applies only to right-wing people watching right-wing content (not left-wing people watching right- or left-wing content). There's an asymmetry to it.
PassingTumbleweed t1_j75hr9k wrote
Reply to Political views can be predicted by differences in brain activity. Study says political differences don’t just emerge when it comes to how we interpret reality around us; our brains actually ‘see’ different things depending on our politics. by mossadnik
I wonder what it means that they could predict political alignment based on the sensorimotor cortex. That's the part responsible for tactile sensing and controlling movement, if I'm understanding correctly. Are there other activities that activate those parts in a similar way?
PassingTumbleweed OP t1_j6tz0wq wrote
Reply to comment by Screye in [D] What does a DL role look like in ten years? by PassingTumbleweed
I agree everyone should take predictions with a huge grain of salt (obviously some clever person might find a way to make Open-ChatGPT on mobile... we can only hope). However, this does seem like a conversation worth having, since LLMs appear to be having a massive impact across many areas at once. Already I find a lot of the insights here interesting!
Submitted by PassingTumbleweed t3_10qzlhw in MachineLearning
PassingTumbleweed t1_j62bzdk wrote
Reply to comment by madmax_br5 in [D] Moving away from Unicode for more equal token representation across global languages? by madmax_br5
You can totally do that. There are tricks to reduce memory usage, too, such as the embedding factorization used in ALBERT.
The best part is, none of these options are precluded by Unicode. Unicode in fact has nothing to do with it!
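For example, the ALBERT-style factorization I mentioned is just this (sizes are illustrative):

```python
import torch.nn as nn

V, H, E = 250_000, 4096, 128    # big vocab, model dim, small embedding dim

naive = nn.Embedding(V, H)       # 250k x 4096 ~ 1.02B params
factored = nn.Sequential(
    nn.Embedding(V, E),          # 250k x 128 ~ 32M params
    nn.Linear(E, H, bias=False), # 128 x 4096 ~ 0.5M params
)
# ~30x fewer embedding parameters, which is what makes a much larger
# (e.g. more multilingual) vocab affordable.
```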
PassingTumbleweed t1_j62anc3 wrote
Reply to [D] Moving away from Unicode for more equal token representation across global languages? by madmax_br5
You could solve the problem you describe at the tokenization level without moving away from Unicode, which is more about how text is encoded for storage and transmission purposes.
For example, let's say you still represent your text as Unicode at rest, but you have a tokenizer that budgets its vocab space such that the average number of tokens per sentence is the same across languages (or whatever your fairness criterion is).
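A quick sketch of how you might measure that criterion (the tokenizer and sentences are placeholders; in practice you'd use a parallel corpus like FLORES):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # any tokenizer under evaluation

# Placeholder parallel sentences, one per language:
samples = {
    "English": ["The weather is nice today."],
    "Japanese": ["今日はいい天気ですね。"],
    "Hindi": ["आज मौसम अच्छा है।"],
}

for lang, sents in samples.items():
    avg = sum(len(tok.encode(s)) for s in sents) / len(sents)
    print(f"{lang}: {avg:.1f} tokens/sentence")

# If the averages diverge wildly, re-budget vocab space (e.g. retrain the
# tokenizer on language-balanced data) until they roughly match.
```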
PassingTumbleweed t1_j46sco1 wrote
Reply to comment by Raphaelll_ in [R] Is there any research on allowing Transformers to spent more compute on more difficult to predict tokens? by Chemont
That depends on what you mean. I don't think any of the LLMs use it, but it has some citations and follow-up literature.
PassingTumbleweed t1_j432cw4 wrote
Reply to [D] Can someone point to research on determining usefulness of samples/datasets for training ML models? by HFSeven
You need to clarify what you mean by "useful learning". Performance on some downstream task? You may be interested in meta-learning.
PassingTumbleweed t1_j41pibv wrote
Reply to [R] Is there any research on allowing Transformers to spent more compute on more difficult to predict tokens? by Chemont
Yes. This thread made me think of Universal Transformers, which have dynamic halting and have been around for a while now: https://openreview.net/forum?id=HyzdRiR9Y7
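The mechanism, very roughly (a heavily simplified sketch of ACT-style halting, not the paper's exact scheme):

```python
import torch
import torch.nn as nn

class DynamicHaltingEncoder(nn.Module):
    """One shared layer applied repeatedly; each token accumulates a halting
    probability and stops being refined once it crosses a threshold."""
    def __init__(self, d_model=256, max_steps=8, threshold=0.99):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.halt = nn.Linear(d_model, 1)
        self.max_steps, self.threshold = max_steps, threshold

    def forward(self, x):                                   # x: (batch, seq, d)
        halted = torch.zeros(x.shape[:2], device=x.device)  # cumulative halt prob
        for _ in range(self.max_steps):
            running = (halted < self.threshold).float()     # (batch, seq)
            halted = halted + running * torch.sigmoid(self.halt(x)).squeeze(-1)
            # Only still-running tokens get another round of compute:
            # "easy" tokens halt early, "hard" tokens keep getting refined.
            x = torch.where(running.unsqueeze(-1).bool(), self.layer(x), x)
            if running.sum() == 0:
                break
        return x
```

The real ACT scheme also weights the per-step outputs by the halting probabilities and adds a ponder cost to the loss; this just shows the control flow.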
PassingTumbleweed t1_j3gq662 wrote
Not every physicist can afford a particle accelerator, but that doesn't stop them from researching particle physics.
ChatGPT makes basic reasoning errors that even a child wouldn't make, which makes me think this is a weakness of the current approach. Maybe "more data" is not the solution to this problem. This is one direction I would consider.
PassingTumbleweed t1_jeh1248 wrote
Reply to comment by KD_A in [P] CAPPr: use OpenAI or HuggingFace models to easily do zero-shot text classification by KD_A
One thing I've seen with these LLMs is that you can prompt them with the classes in a sort of multiple-choice style. It would be interesting to experiment with whether this can stabilize the outputs and reduce the number of out-of-vocabulary predictions you get.
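Something like this (the wording is just an example):

```python
classes = ["cat", "dog", "bird"]
text = "It spread its wings and soared over the lake."

letters = "ABC"
options = "\n".join(f"{l}. {c}" for l, c in zip(letters, classes))
prompt = (
    f"Text: {text}\n\n"
    f"Which one of the following best describes the animal?\n{options}\n\n"
    "Answer with a single letter:"
)
print(prompt)
# Constraining the answer to A/B/C gives the model no room to emit an
# out-of-vocabulary label like "eagle"; you could then read off the
# probabilities of just the letter tokens.
```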