Submitted by jaqws t3_10dljs6 in MachineLearning
LetGoAndBeReal t1_j4mihya wrote
Reply to comment by avocadoughnut in [D] Fine-tuning open source models on specific tasks to compete with ChatGPT? by jaqws
I looked through their repo, but I'm not understanding something: what foundation model do they plan to use, and where/how will the model be run?
avocadoughnut t1_j4n5sp8 wrote
From what I've heard, they want a model small enough to run on consumer hardware. I don't think that's currently possible (probably not enough knowledge capacity). But I haven't heard that a final decision has been made on this yet. The most important part of the project at the moment is crowdsourcing good data.
LetGoAndBeReal t1_j4n6rfa wrote
Wow, that seems awfully ambitious given that GPT-3.5 requires something like 700GB of RAM, and given the apparent unlikelihood that SoTA model sizes will get smaller anytime soon. Interesting project to watch, though.
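For context, 700GB is roughly what a weights-only estimate gives at fp32 for a 175B-parameter model (an assumption here; GPT-3.5's actual size isn't public). A minimal sketch of the arithmetic, ignoring activations and KV cache:

```python
# Rough weights-only memory estimate for serving a dense transformer.
# 175e9 parameters is an assumption (GPT-3 scale); GPT-3.5's size is not public.

def weights_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory to hold the parameters alone, ignoring activations and KV cache."""
    return n_params * bytes_per_param / 1e9

n = 175e9
print(f"fp32: {weights_memory_gb(n, 4):,.0f} GB")  # 700 GB
print(f"fp16: {weights_memory_gb(n, 2):,.0f} GB")  # 350 GB
print(f"int8: {weights_memory_gb(n, 1):,.0f} GB")  # 175 GB
```

Quantization helps, but even int8 leaves a model of that scale far beyond consumer hardware.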
avocadoughnut t1_j4n8bp2 wrote
Well, there are projects like WebGPT (by OpenAI) that make use of external knowledge sources. I personally think that's the future of these models: moderated databases of documents. The knowledge is much more interpretable and modifiable that way.
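The pattern is roughly: embed the documents, retrieve the nearest ones to the query, and prepend them to the prompt, so the knowledge lives in an editable database rather than in the weights. A toy sketch — the `embed` placeholder stands in for a real embedding model, and the documents are made up:

```python
import numpy as np

# Toy document store standing in for a moderated external knowledge base.
documents = [
    "The Eiffel Tower is in Paris.",
    "Python was created by Guido van Rossum.",
    "Mitochondria are the powerhouse of the cell.",
]

def embed(text: str) -> np.ndarray:
    # Placeholder: a real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query (cosine similarity)."""
    q = embed(query)
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    return [documents[i] for i in np.argsort(sims)[::-1][:k]]

# The retrieved passages get prepended to the model's prompt, so fixing a
# wrong fact means editing a document, not retraining the model.
context = retrieve("Who invented Python?")
```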
MegavirusOfDoom t1_j4oelbd wrote
Less than 500MB of the training data is used for learning code; 690GB is used for culture, geography, history, fiction and non-fiction... 2GB for cats, 2GB for bread, horses, dogs, cheese, wine, Italy, France, politics, television, music, Japan, Africa. Less than 1% of the training is on science and technology, i.e. roughly 300MB biology, 200MB chemistry, 100MB physics, 400MB maths...
yahma t1_j4owot0 wrote
That may be the size of the datasets, but it's hard to say how many parameters would be needed for an LLM that's really good at explaining code.
MegavirusOfDoom t1_j4pfdi1 wrote
Then we'd have to crawl all of Stack Exchange, all of Wikipedia, and a terabyte of programming books... This "generalist NLP" is for article writing, for poetry.
I'm a big fan of teaching ChatGPT how to interpret graphs and the origin of lines, recording them in a vector engine coupled with the NLP. For a coding engine, I believe the NLP should be paired with a compiler, just as a maths-specialized NLP should have a MATLAB-type engine behind it — as in the sketch below.
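Something like a generate-compile-retry loop; a minimal sketch, where `generate_code` is a placeholder for the actual model call:

```python
import subprocess
import tempfile

def generate_code(prompt: str, error: str | None = None) -> str:
    """Placeholder for the model call; a real system would include `error`
    (the interpreter's feedback) in the prompt for the retry."""
    raise NotImplementedError("plug in your model here")

def run_python(source: str) -> str | None:
    """Execute the candidate code; return the error text, or None on success."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
    result = subprocess.run(["python", f.name],
                            capture_output=True, text=True, timeout=10)
    if result.returncode == 0:
        return None
    return result.stderr  # compiler/interpreter feedback for the next attempt

def solve(prompt: str, max_attempts: int = 3) -> str:
    error = None
    for _ in range(max_attempts):
        code = generate_code(prompt, error)
        error = run_python(code)
        if error is None:
            return code  # executed cleanly
    raise RuntimeError("no working program found")
```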
throwaway2676 t1_j4q8zuh wrote
Well, can you just run it from an SSD, but more slowly?
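In principle, yes — memory-mapping is the usual way: the OS pages weights in from disk on demand instead of holding everything in RAM, at the cost of SSD bandwidth on every forward pass. A minimal sketch (the filename and shape are made up):

```python
import numpy as np

# Memory-map a (hypothetical) weight file instead of loading it into RAM.
# mode="r" means nothing is read from disk until a tensor is actually touched.
weights = np.memmap("layer_00.bin", dtype=np.float16, mode="r",
                    shape=(12288, 12288))

x = np.ones(12288, dtype=np.float16)
y = weights @ x  # the OS pages in ~300MB from SSD here, not at load time
```

The catch is that every generated token has to stream all the weights past the processor, so a 700GB model on a fast NVMe drive works out to on the order of a minute or more per token.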