currentscurrents t1_jdmzphs wrote on March 25, 2023 at 4:39 PM

Reply to comment by gamerx88 in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700

That's true, but only for the given compute budget used in training.

Right now we're really limited by compute power, while training data is cheap. Chinchilla and LLaMA are intentionally trading more data for less compute. Larger models still perform better than smaller ones given the same amount of data.

In the long run I expect this will flip; computers will get very fast and data will be the limiting factor.

currentscurrents t1_jdmyjrb wrote on March 25, 2023 at 4:31 PM

Reply to comment by Crystal-Ammunition in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700

Bigger models are more sample efficient, so it should need less data.

But - didn't the Chinchilla paper say bigger models need more data? Yes, but that's only true because right now compute is the limiting factor. They're intentionally trading off more data for less model size.

As computers get faster and models bigger, data will increasingly become the limiting factor, and people will trade off in the opposite direction instead.

currentscurrents t1_jdjc1hl wrote on March 24, 2023 at 8:19 PM

Reply to [R] Artificial muses: Generative Artificial Intelligence Chatbots Have Risen to Human-Level Creativity by blabboy

I don't think this is a good test because these questions allow you to trade off knowledge for creativity, and LLMs have vast internet knowledge. It's easy to find listicles with creative uses for all of the objects in the test.

Now, this applies to human creativity too! If you ask me for an alternative use for a pair of jeans, I might say that you could cut them up and braid them into a rug. This isn't my creative idea; I just happen to know there's a hobbyist community that does that.

I think in order to test creativity you need constraints. It's not enough to find uses for jeans, you need to find uses for jeans that solve a specific problem.

currentscurrents t1_jdj9tsl wrote on March 24, 2023 at 8:04 PM

Reply to comment by Imnimo in [R] Artificial muses: Generative Artificial Intelligence Chatbots Have Risen to Human-Level Creativity by blabboy

Oh, definitely. I just checked ChatGPT and it's both aware of the existence of the test and can generate example question/answer pairs. This is a general problem when applying human psychology tests to LLMs.

It does help that this test is open-ended and has no right answer. You can always come up with new objects to ask about.

currentscurrents t1_jdj5p8d wrote on March 24, 2023 at 7:37 PM

Reply to comment by Extension-Mastodon67 in [P] ChatGPT with GPT-2: A minimum example of aligning language models with RLHF similar to ChatGPT by liyanjia92

gpt2-medium is only 355M parameters so don't expect quality.

Even the fullsize GPT-2 is smaller than the smallest variants of most modern language models.

currentscurrents t1_jdft0hp wrote on March 24, 2023 at 1:59 AM

Reply to [D] "Sparks of Artificial General Intelligence: Early experiments with GPT-4" contained unredacted comments by QQII

>They seem to refer to this model as text-only, contradicting to the known fact that GPT-4 is multi-modal.

I noticed this in the original paper as well.

This probably means that they implemented multimodality the same way Palm-E did; starting with a pretrained LLM.

currentscurrents t1_jdf547h wrote on March 23, 2023 at 11:07 PM

Reply to comment by frequenttimetraveler in [N] ChatGPT plugins by Singularian2501

I expect it's more likely that people will run their own chatbots with proprietary content. (Even if just built on top of the GPT API)

For example you might have a news chatbot that knows the news and has up-to-date information not available to ChatGPT. And you'd pay a monthly subscription to the news company for it, not to OpenAI.

currentscurrents t1_jdaqd09 wrote on March 23, 2023 at 1:12 AM

Reply to comment by linverlan in [D] Do you have a free and unlimited chat that specializes only in teaching programming or computing in general? by Carrasco_Santo

Google search uses BERT, you're just calling a language model via an API.

currentscurrents t1_jdaq9xo wrote on March 23, 2023 at 1:11 AM

Reply to comment by darkshenron in [D] Do you have a free and unlimited chat that specializes only in teaching programming or computing in general? by Carrasco_Santo

Right, but you're still loading the full GPT4 to do that.

The idea is that domain-specific chatbots might have better performance at a given model size. You can see this with StableDiffusion models, the ones trained on just a few styles have much higher quality than the base model - but only for those styles.

This is basically the idea behind mixture of experts.

currentscurrents t1_jdag0js wrote on March 22, 2023 at 11:58 PM

Reply to comment by SorrowInCoreOfWin in [R] Introducing SIFT: A New Family of Sparse Iso-FLOP Transformations to Improve the Accuracy of Computer Vision and Language Models by CS-fan-101

We're really running out of acronyms at this point.

currentscurrents t1_jd1c52o wrote on March 21, 2023 at 2:47 AM

Reply to comment by VodkaHaze in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph

Doesn't look like they sell in individual quantities right now but I welcome any competition in the space!

currentscurrents t1_jd10ab5 wrote on March 21, 2023 at 1:18 AM

Reply to comment by pier4r in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph

Llamma.cpp uses the neural engine, so does StableDiffusion. And the speed is not that far off from VRAM, actually.

>Memory bandwidth is increased to 800GB/s, more than 10x the latest PC desktop chip, and M1 Ultra can be configured with 128GB of unified memory.

By comparison, the Nvidia 4090 is clocking in at ~1000GB/s

Apple is clearly positioning their devices for AI.

currentscurrents t1_jd0f76v wrote on March 20, 2023 at 10:47 PM

Reply to comment by Educational-Net303 in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph

I mean of course not, nobody would make such a game right now because there are no >24GB cards to run it on.

currentscurrents t1_jd007aa wrote on March 20, 2023 at 9:05 PM

Reply to comment by satireplusplus in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph

Right. And even once you have enough VRAM, memory bandwidth limits the speed more than tensor core bandwidth.

They could pack more tensor cores in there if they wanted to, they just wouldn't be able to fill them with data fast enough.

currentscurrents t1_jczuqo8 wrote on March 20, 2023 at 8:29 PM

Reply to comment by gybemeister in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph

Just price. They have the same amount of VRAM. The 4090 is faster of course.

currentscurrents t1_jczods2 wrote on March 20, 2023 at 7:49 PM

Reply to comment by UnusualClimberBear in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph

I'm hoping that non-Vonn-Neumann chips will scale up in the next few years. There's some you can buy today but they're small:

>NDP200 is designed natively run deep neural networks (DNN) on a variety of architectures, such as CNN, RNN, and fully connected networks, and it performs vision processing with highly accurate inference at under 1mW.

>Up to 896k neural parameters in 8bit mode, 1.6M parameters in 4bit mode, and 7M+ In 1bit mode

An arduino idles at about 10mw, for comparison.

The idea is that if you're not shuffling the entire network weights across the memory bus every inference cycle, you save ludicrous amounts of time and energy. Someday, we'll use this kind of tech to run LLMs on our phones.

currentscurrents t1_jczkbue wrote on March 20, 2023 at 7:23 PM

Reply to comment by 2muchnet42day in [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph

Honestly, they already cost more than I can afford to spend on a side project.

I'm just gonna have to wait and hope that AMD gets their act together on AI support.

currentscurrents t1_jczjxbb wrote on March 20, 2023 at 7:20 PM

Reply to comment by londons_explorer in [P] TherapistGPT by SmackMyPitchHup

Data is really hard to get because of privacy regulations too.

There are millions of brain MRI scans sitting in hospital databases but nobody can use them without individually asking each patient. Most published datasets are only a couple dozen scans, and plenty are N=1.

currentscurrents t1_jcziz0q wrote on March 20, 2023 at 7:14 PM

Reply to [Project] Alpaca-30B: Facebook's 30b parameter LLaMa fine-tuned on the Alpaca dataset by imgonnarelph

I'm gonna end up buying a bunch of 24GB 3090s at this rate.

currentscurrents t1_jcqzjil wrote on March 18, 2023 at 9:57 PM

Reply to [D] LLama model 65B - pay per prompt by MBle

I haven't heard of anybody running LLama as a paid API service. I think doing so might violate the license terms against commercial use.

>(or any other) model

OpenAI has a ChatGPT API that costs pennies per request. Anthropic also recently announced one for their Claude language model but I have not tried it.

currentscurrents t1_jchn22q wrote on March 16, 2023 at 10:03 PM

Reply to comment by Dankmemexplorer in [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch by korec1234

Computers have gotten literally 100 million times faster within my lifetime. I'm not even that old!

currentscurrents t1_jch9ulc wrote on March 16, 2023 at 8:37 PM

Reply to comment by sam__izdat in Modern language models refute Chomsky’s approach to language [R] by No_Draft4778

Oh, it is clearly structured. Words and phrases and sentences are all forms of structure and we're using them right now.

What it doesn't have is formal structure; it cannot be fully defined by any set of rules. This is why you can't build a rules-based parser that understands english and have to use an 800GB language model instead.

>shared across essentially every language and dialect

Noam Chomsky thinks this, but the idea of a universal grammar is controversial in modern linguistics.

currentscurrents t1_jch3nic wrote on March 16, 2023 at 7:56 PM

Reply to comment by sam__izdat in Modern language models refute Chomsky’s approach to language [R] by No_Draft4778

So why do you think it is a structured formal thing?

currentscurrents t1_jcgtqwz wrote on March 16, 2023 at 6:54 PM

Reply to comment by impossiblefork in [P] nanoT5 - Inspired by Jonas Geiping's Cramming and Andrej Karpathy's nanoGPT, we fill the gap of a repository for pre-training T5-style "LLMs" under a limited budget in PyTorch by korec1234

...for a toy-sized 250M parameter language model, yes.

currentscurrents t1_jcfu9l8 wrote on March 16, 2023 at 3:11 PM

Reply to comment by sam__izdat in Modern language models refute Chomsky’s approach to language [R] by No_Draft4778

That's why it's a natural language instead of a formal language.