Submitted by 00001746 t3_1244q71 in MachineLearning
Professional-Gap-243 t1_jdz27jp wrote
I think about this the same way I think about operating systems. Yes, you can build your own OS from scratch, but more often than not you just use Windows or Linux. And if you need something custom, it is often sufficient to set up your own Linux distro.
To me, LLMs are in a similar situation. Most of the time it doesn't really make sense to build your own LLM from scratch, just like it wouldn't make sense to build your own OS. This doesn't mean that there is no room for building new LLMs, though.
In this analogy, GPT is like Windows (closed, controlled by a corporation), and I think the ML community now needs to focus on building an open-source alternative that could stand toe to toe with it.
Otherwise, the space becomes monopolistic/oligopolistic, with large corps running the show (just like before Linux came around).
EvilMegaDroid t1_je0d2a0 wrote
There are many open-source projects that could, in theory, do better than ChatGPT.
The issue? You have to spend millions of dollars on the data to feed them.
Open-source LLMs are useless by themselves; the data is the important part.
Google, Microsoft, etc. can feed them their own data and they still spend millions of dollars. Imagine how much it would cost the average Joe to buy that data, plus the operating costs.
I doubt there will ever be an open-source ChatGPT that just works.
Zealousideal-Ice9957 t1_je5vo1c wrote
You should have a look at the OpenAssistant initiative by LAION. Their human-assisted data collection process is said to be of very good quality compared to the underpaid crowdworker-based one used by OpenAI.
EvilMegaDroid t1_je6s99k wrote
Good idea, though I'm kinda skeptical that enough users would complete tasks for it to gather enough data.
Not impossible, though; there are huge open-source projects, so who knows.
Zealousideal-Ice9957 t1_jebdm73 wrote
They just completed the data collection a few days ago, and they claim the prompts are of really high quality thanks to a strict filtering algorithm and the community's eagerness to create a better open-source alternative to OAI.
EvilMegaDroid t1_jec89t6 wrote
That would be insane (I mean, as noted, it was not impossible, given that people have come together on big open-source projects like Linux, mpv, etc.).
I checked it out for a while but got confused: is everyone supposed to be able to access the data? Because I could not.
HerculeanSubmarine t1_jeaeqow wrote
For Alpaca LoRA, getting the dataset from GPT-3 cost pretty much nothing.
GPT4All was fine-tuned on a dataset of about 430k examples that cost $100 in OpenAI API fees.