Submitted by 00001746 t3_1244q71 in MachineLearning

Hi all,

I recently read this reddit post about a 2D modeler experiencing an existential crisis about their job being disrupted by Midjourney (HN discussion here). I can't help but feel the same as someone who has been working in the applied ML space for the past few years.

Despite my background in "classical" ML, I'm feeling some anxiety about the rapid pace of LLM development and a fear of missing out / being left behind.

I'd love to get involved again in ML research apart from my day job, but one of the biggest obstacles is the fact that most foundational LLM research requires huge compute more than anything else [1]. I understand that there are some directions in distributing compute (https://petals.ml) and in adapting existing models with low-rank fine-tuning (https://arxiv.org/abs/2106.09685).

I thought I might not be the only one being humbled by the recent advances in ChatGPT, etc. and wanted to hear how other people feel / are getting involved.

--

[1] I can't help but be reminded of Sutton's description of the "bitter lesson" of modern AI research: "breakthrough progress eventually arrives by an opposing approach based on scaling computation... eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach."

304

Comments

rshah4 t1_jdy0mjg wrote

I wouldn't get worried about training these models from scratch. Very few people are going to need those skills. My suggestion is to focus on learning how to use these models (prompting, chained prompting a la LangChain) and then maybe fine-tuning. Fine-tuning these models is going to be key, and people are just now starting to make those techniques widely usable. I just finished a video on using PEFT to fine-tune an LLM with LoRA. So don't stress, it's very early and the tools are just starting to become easier to use.
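
For a rough idea of what that looks like, here is a minimal sketch with the Hugging Face PEFT library; the base model, rank, and other hyperparameters below are placeholder choices, not recommendations:

```python
# Minimal LoRA fine-tuning setup with PEFT. Only the low-rank adapter
# weights are trained; the base model stays frozen.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

base = "facebook/opt-350m"  # placeholder: any causal LM on the Hub
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,            # rank of the low-rank update matrices
    lora_alpha=16,  # scaling factor applied to the update
    lora_dropout=0.05,
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of all weights

# From here, train as usual, e.g. with transformers.Trainer on your dataset.
```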

186

antonivs t1_jdyp1zw wrote

> I wouldn't get worried about training these models from scratch. Very few people are going to need those skills.

Not sure about that, unless you also mean that there are relatively few ML developers in general.

After the ChatGPT fuss began, one of our developers trained a GPT model on a couple of different subsets of our company's data, using one of the open source GPT packages, which is obviously behind GPT-3, 3.5, or 4. He got very good results though, to the point that we're working on productizing it. Not every model needs to be trained on internet-sized corpuses.

49

Craksy t1_jdywiwi wrote

Well, that doesn't really contradict the previous comment. They did mention fine-tuning as an exception. GPT even stands for Generalized Pretrained Transformer. I'm sure some people like to draw hard lines between transfer learning/specialisation/fine-tuning (different task or just different data), but at any rate, what you're describing can hardly be considered "training from scratch".

Indeed, very few will need to be able to train models at that scale. In fact, that was the whole motivation behind GPT. Training LLMs from scratch consumes a tremendous amount of resources, and 99% of that work goes into building a foundation that happens to generalize very well across many different tasks.

39

kalakau t1_jdzb1jx wrote

> Generalized Pretrained Transformer

this is pedantic but it's actually Generative PT

28

Craksy t1_jdzbgzj wrote

Not at all.
While it doesn't mean the world for the point I was trying to make, it does change the meaning quite a bit.

Thank you for the correction

34

antonivs t1_je1cuw1 wrote

My description may have been misleading. They did the pretraining in this case. The training corpus wasn't natural language, it was a large set of executable definitions written in a company DSL, created by customers via a web UI.

4

Craksy t1_je3tzt3 wrote

Aah, got you. My bad. Well, I suppose most people mainly think of NLP in these kinds of contexts. That's where my mind went, anyway.

Training from scratch on a DSL is indeed an entirely different scale of problem (assuming it's not some enormous, complex DSL that relies heavily on context and thousands of years of culture to make sense of).

Sounds very interesting though. If you're allowed to share more information, I'd love to hear about it

3

antonivs t1_je82r3j wrote

Well, I do need to be a bit vague. The main DSL has about 50 instructions corresponding to actions to be performed. There's also another different sub-DSL, with about 25 instructions, to represent key features of the domain model, that allows particular scenarios to be defined and then recognized when executing.

Both DSLs are almost entirely linear and declarative, so there's no nested structure, and the only control flow is a conditional branch instruction in the top-level DSL, to support conditional execution and looping. The UI essentially acts as a wizard, so that users don't have to deal with low-level detail.

There are various ideas for the GPT model, including suggesting instructions when creating a program, self-healing when something breaks, and finally generating programs from scratch based on data that we happen to already collect anyway.

NLP will probably end up being part of it as well - for that, we'd probably use the fine-tuning approach with an existing language model as you suggested.

2

abnormal_human t1_jdywyac wrote

I'm in the midst of a similar project. It also doesn't require massively expensive compute because for domain specific tasks, you often don't need models with gajillions of parameters to achieve business-interesting results.

4

happycube t1_jdzq0v4 wrote

nanoGPT is good for this sort of from-scratch training; there's an updated version of the classic char-RNN Shakespeare model in the repo.
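
The char-level data prep is simple enough to sketch; this is in the spirit of nanoGPT's shakespeare_char prepare script, though the paths and the 90/10 split here are assumptions rather than the repo's exact code:

```python
# Character-level dataset prep: map each distinct character to an integer id
# and dump train/val splits as raw uint16 binaries for fast memory-mapped reads.
import numpy as np

text = open("input.txt", encoding="utf-8").read()  # e.g. the Shakespeare corpus

chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
ids = np.array([stoi[ch] for ch in text], dtype=np.uint16)

n = int(0.9 * len(ids))  # 90/10 train/val split
ids[:n].tofile("train.bin")
ids[n:].tofile("val.bin")
print(f"vocab size: {len(chars)}, train tokens: {n}, val tokens: {len(ids) - n}")
```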

3

antonivs t1_je0pfza wrote

Thanks! I actually don't know exactly what this guy used, I'll have to check.

1

Qpylon t1_jdzmiaq wrote

I’m curious, is this for your company wiki or something? Was considering trying that with our documentation etc.

2

antonivs t1_je0pb85 wrote

Our product involves a domain-specific language, which customers typically interface to via a web UI, to control the behavior of execution. The first model this guy trained involved generating that DSL so customers could enter a natural language request and avoid having to go through a multi-step GUI flow.

They've tried using it for docs too; that worked well.

2

dancingnightly t1_je0o082 wrote

The benefit of fine-tuning or training your own text model (e.g., in the olden days, on BERT), now through the OpenAI API, versus just using contextual semantic search is shrinking day by day... especially with the extended context window of GPT-4.

If you want something in-house, fine-tuning GPT-J or so could be the way to go, but it's definitely not the career direction I'd take.
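
For anyone who hasn't seen the semantic search approach, a bare-bones sketch; the embedding model, documents, and prompt format are just one plausible setup:

```python
# Retrieval instead of fine-tuning: embed documents once, find the closest
# one to the query, and paste it into the prompt as context.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = [
    "Refunds are processed within 5 business days.",
    "Enterprise plans include SSO and audit logs.",
    "The API rate limit is 100 requests per minute.",
]
doc_vecs = model.encode(docs, normalize_embeddings=True)

query = "How fast do refunds happen?"
q_vec = model.encode([query], normalize_embeddings=True)[0]

# With normalized vectors, the dot product equals cosine similarity.
best = docs[int(np.argmax(doc_vecs @ q_vec))]
prompt = f"Answer using this context:\n{best}\n\nQuestion: {query}"
print(prompt)  # this prompt would then go to a large-context model
```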

2

antonivs t1_je1d8o0 wrote

The training corpus size here is in the multi-TB range, so probably isn't going to work with the OpenAI API currently, from what I understand.

1

SlowThePath t1_je2buak wrote

No models are trained on internet-sized corpuses. That would take an infinite amount of time, I would think.

0

antonivs t1_je7ws1v wrote

I was referring to what the OpenAI GPT models are trained on. For GPT-3, that involved about 45 TB of text data, part of which was Common Crawl, a multi-petabyte corpus obtained from 8 years of web crawling.

On top of that, 16% of its corpus was books, totaling about 67 billion tokens.

2

SlowThePath t1_je7xmaz wrote

Definitely not denying that it was trained on a massive amount of data, because it was, but calling it internet-sized is not accurate. I guess you were speaking in hyperbole and I just didn't read it that way. I know what you mean.

1

machineko t1_je05orp wrote

I agree. While these giant centralized models are all over the news, there are ways to make smaller models much more efficient (e.g., LoRA, mentioned above). And in the process of working with these techniques, we can perhaps discover new methods and architectures.

We are working on an open-source project focused on making fine-tuning for LLMs simple, fast, and efficient: https://github.com/stochasticai/xturing.

OP, we still have a ton of stuff we want to try out to make fine-tuning faster and more compute/memory efficient, if you are interested in contributing.

6

dimem16 t1_jdy7aja wrote

Thanks for your insight. Could you share the link to the video please?

5

modernzen t1_je6xujz wrote

Totally agree with this. Something like ChatGPT is overkill for most use cases and comes at a cost of both money (using the API) and latency. Clever prompting and fine-tuning can let you build free, fast models that are tailored towards your specific problem at hand.

1

nxqv t1_jdxx53i wrote

I don't know a whole lot about LLMs because I'm new to the field, but I sure do know about FOMO. I recently felt a lot of FOMO about having missed opportunities to put myself on a path toward graduate school and AI research years ago.

What you need to do is put a name to the fear. Dig deep and understand your feelings better.

What is it you're afraid of missing out on exactly?

Untold riches? Researchers don't really make any more or less money than people in other computer science jobs. And most billionaires aren't following some predetermined path.

Fame? Clout? We can't all be Sam Altman or Yann LeCun or Eliezer Yudkowsky or whoever. Besides, most of the things you see these types of guys say or do in public is only tangentially related to the day to day experience of actually being them.

Impact? I've recently come to realize that a craving for "impact" is often rooted in a desire for one of these other things, or in some sort of egotistical belief or other deep-seated psychological matter, like seeking someone's approval. In reality, you could be the guy who cures cancer and most regular people would only think about you for half a second, your peers could be jealous freaks, and people could still find some tiny little reason to turn on you if they really wanted to. You could easily die knowing you did something amazing for the world and nobody cared but you. Are you the type of person who would be okay with that?

Edit: the "Impact" part was controversial so I'd like to add:

> don't lose sight of the forest because of a tree. We're talking about impact in the context of FOMO - if you feel that level of anxiety and rush about potentially missing out on the ability to make an impact because others are already making the impact you want to make, it's more likely to be ego-driven than genuine altruism

The ability to work on something cool or trendy? There's SO MANY new technologies out there you can path towards a career in. And there will continue to be something cool to do for as long as humanity exists.

Something else?

For each one of these, you can come up with convincing counterarguments for either why it's not real or why you can just find a similar opportunity doing many other things.

And let's be real for a second, if this technology really is going to take knowledge workers' jobs, researchers are probably on the chopping block too.

97

[deleted] t1_jdy83bi wrote

[deleted]

46

ginsunuva t1_jdyu8d2 wrote

Some things don't need impacting, and yet people need to force an impact (which may worsen things) to satisfy their ego, which usually soon goes back to needing more satisfaction once they realize the issue is psychological and always relative to the current situation. Not always of course, duh, but sometimes. I usually attribute it to OCD fixated on a fear of death without "legacy."

5

nxqv t1_je006pc wrote

Yeah "legacy" is another one of those ego-loaded words that doesn't always mean what it looks like it means.

0

Impallion t1_je07in1 wrote

I completely agree and of the things that u/nxqv listed, I think impact is the thing that most everyday people want and fear they will no longer have, more so than fame, riches, clout etc. It's totally natural to want the things you spend effort on to have impact.

Now what I'm more interested in is the question of how much impact is enough to make you feel satisfied, and I think this is where the FOMO starts to set in for people. People want to have a "large" impact: making company-wide differences, influencing large swaths of people. I think the fear is that in the face of a ChatGPT, your little model or little application can only reach a handful of others.

Extrapolate current trends and you might think, oh well AI applications are just going to get bigger and bigger. Midjourney 5 or SuperChatGPT-12 are going to be so insanely capable that we will have no more use for human writing, human art, human music, human programming. There will simply be no more room for my work to EVER have a big impact in the future. (Maybe this change is also similar to how the scientific greats back in the day could discover big theorems like Einstein's relativity, but nowadays you need to hyper-specialize in academia to produce results for your tiny corner)

My solution is that we need to dig a little deeper. What does it mean to be human? What does it mean to live a good meaningful life? If your answer to that is that a good life worth living is one where you impact on the order of thousands or millions of humans, then yes we might be shifting away from that possibility. But humans are built for connection, and I think we will need to look inwards and realize that we don't need to influence thousands to experience that connection. You can make a little model or application that affects hundreds. You can write a song just for your friends and family. You can paint a piece of art that just hangs on your wall and gets a single compliment. To me that is already human connection, and is just as meaningful as making a large model that drives the next Google/Meta forward.

2

nxqv t1_je0cw14 wrote

>People want to have a "large" impact - making company-wide differences, influence large swaths of people. I think the fear is that in the face of a ChatGPT, your little model or little application can only reach a handful of others.

Yes, it's this idea of wanting to make "as large of an impact as possible" that I was starting to chip away at. A lot of people - myself often included - feel dismayed when we think about our work only impacting a tiny corner of the world. It feels like you're "settling for less." But when you finish that thought, it sounds more like "settling for less than what I'm capable of" which has a lot to unpack.

And for the record, I think it's okay to want to make a big splash to satisfy your own ego. I wasn't trying to say that it's immoral. I just think it's important to understand that you're in that position and unpack how you got there. Mindfulness is the way to combat FOMO, as well as all sorts of other negative emotions.

>My solution is that we need to dig a little deeper. What does it mean to be human? What does it mean to live a good meaningful life? If your answer to that is that a good life worth living is one where you impact on the order of thousands or millions of humans, then yes we might be shifting away from that possibility. But humans are built for connection, and I think we will need to look inwards and realize that we don't need to influence thousands to experience that connection. You can make a little model or application that affects hundreds. You can write a song just for your friends and family. You can paint a piece of art that just hangs on your wall and gets a single compliment. To me that is already human connection, and is just as meaningful as making a large model that drives the next Google/Meta forward.

Yes yes yes.

2

nxqv t1_je00ks4 wrote

Also, don't lose sight of the forest because of a tree. We're talking about impact in the context of FOMO - if you feel that level of anxiety and rush about potentially missing out on the ability to make an impact because others are already making the impact you want to make, it's more likely to be ego-driven than genuine altruism

1

ghostfaceschiller t1_jdyerkp wrote

> Yan LeCun

That dude is becoming straight-up unhinged on Twitter

24

spiritus_dei t1_jdz6pml wrote

If he's the standard of "success" then based on Twitter that's something you may want to reconsider. Jürgen Schmidhuber comes in a close second.

5

visarga t1_jdzu6az wrote

Let the critics critique; it's better to have an adversarial take on everything. When you take a survey, you get better calibration that way.

He's angry about the forced Galactica retraction, followed by ChatGPT's success. Both models had hallucination issues, but his model was not tolerated well by the public.

4

nxqv t1_jdyjvxe wrote

Yeah it's really somethin

3

Alternative_Staff431 t1_je5r9j7 wrote

I thought so too, but I actually genuinely appreciate what he says. His POV is valuable, and his recent posts aren't really that bad.

0

MootVerick t1_jdyj5x3 wrote

If ai can do research better than us, we are basically at singularity.

13

spiritus_dei t1_jdz7rmz wrote

I think this is the best formulation of the question I've seen, "Can you imagine any job that a really bright human could do that a superintelligent synthetic AI couldn't do better?"

Everyone loves to default to the horse and buggy example and they always ignore the horse. Are programmers and researchers the blacksmiths or are they the horses?

It's at least 50/50 that we're all the horses. That doesn't mean that horses have no value, but we don't see horses doing the work they once did in every major city prior to their displacement by automobiles.

We also hear the familiar refrain: "AI will create all of these new jobs that none of us can imagine." Really? Jobs that superintelligent AIs won't be able to do? It reminds me of a mixed metaphor. These two ideas are just not compatible.

Either they hit a brick wall with scaling, or we will all be dealing with a new paradigm where we either remain humans (horses) or accept the reality that to participate in the new world you become a cyborg. I don't know if that's possible, but it may be the only path to "keep up," and it's not a guarantee, since we'd have to convert biological matter to silicon.

And who wants to give up their humanity to basically become an AI? My guess is the number of people will shock me if that ever becomes a possibility.

I'm fine with retirement and remaining an obsolete human doing work that isn't required for the fun of it. I don't play tennis because I am going to play at Wimbledon or even beat anyone good - I play it because I enjoy it. I think that will be the barometer if there isn't a hard limit on scaling.

This was foretold decades ago by Hans Moravec and others. I didn't think it was possible in my lifetime until ChatGPT. I'm still processing it.

13

starfries t1_jdyx0xh wrote

I feel like Eliezer Yudkowsky proves that everyone can be Eliezer Yudkowsky, going from a crazy guy with a Harry Potter fanfic and a blog to being mentioned in your post alongside those other two names.

5

sdmat t1_jdyyqwe wrote

Does it? How many other fanfic writer -> well known researcher trajectories come to mind?

5

starfries t1_jdyz458 wrote

No, I mean you don't need anything special or to follow a conventional path.

1

sdmat t1_jdz0h51 wrote

I mean no personal offense, but it's strange to see someone generalizing from an extreme outlier in a machine learning sub.

5

starfries t1_jdz0q2b wrote

That's not what I meant, so no offense taken.

2

landongarrison t1_jdz5ao3 wrote

This was an incredibly well thought out comment. Should be at the top.

2

ObiWanCanShowMe t1_jdyh4wm wrote

Utilizing the models and all the upcoming amazing things is going to be 10x more valuable than getting your hands dirty trying to make one on your own.

You won't get replaced by AI, you will get replaced by someone who knows how to use the AI.

58

deepneuralnetwork t1_jdyis5x wrote

Came to the same conclusion after using GPT-4. It's kind of like a mediocre magic wand that still ends up making my job easier. It's not perfect by any means, but I've gotten way more value out of it already than the $20 I've paid into it so far.

31

RedditLovingSun t1_jdziqh1 wrote

Me too. I've used it to help me read books, study for tests, complete some small side projects, etc. Wish there were a list or subreddit somewhere for people to share the ways they've gotten value out of it so far.

10

SlowThePath t1_je2cknq wrote

The thing about magic is that it is only magic in the beginning. Eventually it becomes commonplace and it is no longer "magic" anymore. Right now it feels like magic to me though too.

2

keepthepace t1_jdzvbww wrote

> You won't get replaced by AI, you will get replaced by someone who knows how to use the AI.

I wonder why this is any comfort. This is just a rephrasing of "your skillset is obsolete, your profession that used to pay you a salary is now worth a 15 USD/month subscription service"

The person "who knows how to use AI" is not necessarily a skilled AI specialist. It could simply be your typical client.

The current AI wave should be the trigger to reconsider the place we give to work in our lives. Many jobs are being automated, and no, this is not like the previous industrialization waves.

Workers used to be replaced by expensive machines. It took time to install things and prepare the infrastructure for the transition, and it required other workers to do maintenance.

This wave replaces people instantly with an online service that requires zero infrastructure (for the user), costs a fraction of a wage and gives almost instant results.

Yes, progress that suppresses jobs tends to create new jobs as well, but there is no mechanism that guarantees any symmetry between those two quantities. When you think about the AI wave, it is clear that jobs will be removed faster than they are created, and that the skillsets from the removed jobs do not translate well to the hypothetical jobs created.

23

lqstuart t1_je0jvt7 wrote

100%, I think the US really REALLY needs to figure out a universal basic income soon, and they aren't going to do it and life is going to suck

8

keepthepace t1_je0u1de wrote

The US is not the only country in the world; maybe they won't be the first one on this thing.

9

Necessary-Meringue-1 t1_je2r12k wrote

Large scale automation has been happening for over 200 years (and beyond) and so far it hasn't translated to productivity gains being handed down to workers, so I'm not holding my breath.

4

slaweks t1_je8220m wrote

Really? Average worker life has not improved over the last 200 years?

1

Necessary-Meringue-1 t1_je84su5 wrote

Of course it has, but those are hard fought gains that are primarily results of WWI, WWII, and the early phases of the Cold War, not productivity gains.

There is no natural law that says productivity gains get handed down. Just compare the years 1950-1970 in the US, when life for the average worker improved greatly, to the 1980s onward, since when we've been on a downward trend. There were steady productivity gains throughout all of that.

2

belikeron t1_jdy7jrv wrote

I mean, that's true, but it's not worth losing sleep over either. Yes, a disruptive technology based on scalability will always make decades of research look like a waste of time to the layperson.

It also would be impossible without the insights gained from those decades of research. It is the same with galactic travel.

The crew of the first mission to the nearest star will not be the first ones to get there. We will have a colony waiting for them to arrive at their objectively slow almost-light speed. The technology the later colonists used to get there in 20 minutes wouldn't have happened without all of the advances made just to get that first lemon into space.

That's my two cents.

54

cheddacheese148 t1_jdyo5w5 wrote

Ignoring literally everything else about what you said, it’s insanely cool to think about the first colonists in another solar system being the like 10th group to make the journey. If this isn’t already a movie, it needs to be!

22

sdmat t1_jdyzb37 wrote

Not a movie, but it's definitely SF:

> "Far Centaurus" (1944) by A. E. van Vogt: This classic science fiction story tells the tale of a group of colonists who embark on a centuries-long voyage to the distant star system Centaurus. Upon arrival, they discover that Earth has developed faster-than-light travel during their journey, and a thriving human civilization already exists in Centaurus.
>
> "The Songs of Distant Earth" (1986) by Arthur C. Clarke: The novel features the crew of a slower-than-light colony ship, Magellan, who arrive at their destination planet Thalassa, only to discover that faster-than-light ships have already colonized other planets in the meantime. The story explores the consequences of different levels of technology and adaptation for the human settlers.
>
> "Tau Zero" (1970) by Poul Anderson: In this novel, a group of colonists aboard the starship Leonora Christine set out to explore a distant star system. During their journey, they encounter a series of technical malfunctions that cause their ship to accelerate uncontrollably. As a result, they experience time dilation, and the rest of the universe rapidly advances around them. They must navigate their own obsolescence and search for a new home as other expeditions overtake them.

Being able to find anything with a few vague words about its content is one of my favourite GPT-4 capabilities!

28

belikeron t1_jdz9h8x wrote

I prefer my version where they match their speed, knock on the window like Matthew McConaughey and say, "You losers getting in? We're going colonizing!"

2

nmfisher t1_jdyeyit wrote

IMO the area most ripe for the picking is distilling larger pretrained models into smaller, task-specific ones. Think extracting a 30MB LM from LLaMA that is limited to financial terminology.

There's still a huge amount of untapped potential.
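
To make the idea concrete, here is a toy sketch of the classic distillation loss, where a small student is trained to match the teacher's softened output distribution. The two linear layers are stand-ins; a real setup would pair a large pretrained LM as the frozen teacher with a small task-specific student:

```python
# Knowledge distillation in miniature: KL divergence between temperature-
# softened teacher and student distributions, scaled by T^2 (Hinton et al., 2015).
import torch
import torch.nn.functional as F

teacher = torch.nn.Linear(64, 1000)  # stand-in for a frozen large model head
student = torch.nn.Linear(64, 1000)  # stand-in for the small model being trained
T = 2.0  # temperature: softens both distributions

x = torch.randn(8, 64)  # a batch of (fake) input features
with torch.no_grad():
    teacher_probs = F.softmax(teacher(x) / T, dim=-1)
student_log_probs = F.log_softmax(student(x) / T, dim=-1)

loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * T * T
loss.backward()  # then step an optimizer on the student's parameters only
```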

40

WarAndGeese t1_jdyi94w wrote

You are thinking about it backwards. This stuff is happening now and you are a part of it. You are among the last people who could be said to be "missing out"; you are in the centre of it as it is happening.

26

Spziokles t1_jdyyies wrote

Came to say this. Compare yourself with someone who enters the field in two years, or two months. Heck, we all witness what difference even two weeks currently make.

Will they find a job? Will they have a hard time? If your worries are true, then it should be even harder for them. Which means, you have an advantage having this head start.

I guess we can also safely expect the demand for all skill levels around ML to increase the more it impacts our societies and economies. Yes, we might need fewer people for a single task, but the number of tasks will grow. I do not worry for either the new or the old folks.

7

CriticalTemperature1 t1_jdyubo2 wrote

Unfortunately, the nature of this field is "the bitter lesson": scale trumps everything in machine learning, so unfortunately/fortunately we are getting interested in language models at a point when the scale is so large that it is impossible to make an impact on them unless you run your own $xxM company.

However, there are several interesting research avenues you can take:

  1. Improve small models with RLHF + fast implementations for a specific task (e.g. llama.cpp)
  2. Chaining models together with APIs to solve a real human problem (see the sketch after this list)
  3. Adding multimodal inputs to smaller LLMs
  4. Building platforms to make it easy to train and serve LLMs for many use cases
  5. Analyzing prompts and understanding how to make the most of the biggest LLMs
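
As a sketch of idea 2, here is what a two-step chain can look like; the task and prompts are made up, and it uses the openai package's chat API (pre-1.0 style, current as of this thread):

```python
# Chaining: the output of one model call becomes the input of the next.
import openai  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str) -> str:
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

ticket = "CSV export has failed with a 504 error every day since Tuesday."
summary = ask(f"Summarize this support ticket in one sentence:\n{ticket}")
routing = ask(f"Given this summary, which team should handle it and why?\n{summary}")
print(routing)
```
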
25

visarga t1_jdztq3o wrote

In short, build around LLMs and with LLMs, but don't compete directly with them.

4

SlowThePath t1_je2d9oi wrote

Yeah, I don't see any startup being able to acquire the resources and time to catch up, let alone compete or surpass. Unless they come up with some very novel new magic secret sauce, which seems extremely unlikely.

1

Professional-Gap-243 t1_jdz27jp wrote

The way I think about this is the way I think about OSes. Yes, you can build your own OS from scratch, but more often than not you just use Windows or Linux. And if you need something custom, it is often sufficient to set up your own Linux distro.

To me, LLMs are in a similar situation. It doesn't really make sense to build your own LLM from scratch most of the time, just like it wouldn't to build your own OS. This doesn't mean that there is no space for building new LLMs, though.

GPT in this example is like Windows (closed, controlled by a corporation), and I think the ML community now needs to focus on building an open-source alternative that could stand toe to toe with it.

Otherwise the space becomes monopolistic/oligopolistic with large corps running the show (just like before Linux came around).

15

EvilMegaDroid t1_je0d2a0 wrote

There are many open-source projects which in theory can do better than ChatGPT.

The issue? You have to spend millions of dollars on the data to feed it.

Open-source LLMs on their own are useless; the data is the important part.

Google, Microsoft, etc. can feed them their own data and they still spend millions of dollars. Imagine how much it would cost the normal Joe to buy that data, on top of the operating cost.

I doubt there will ever be an open-source ChatGPT that just works.

5

Zealousideal-Ice9957 t1_je5vo1c wrote

You'd better have a look at the OpenAssistant initiative from LAION; their human-assisted data collection process is said to be of very good quality compared to the underpaid crowdworker-based one used by OpenAI.

2

EvilMegaDroid t1_je6s99k wrote

Good idea. I'm kinda skeptical whether enough users would complete tasks for it to get enough data.

Not impossible though, there are huge open source projects so who knows.

2

Zealousideal-Ice9957 t1_jebdm73 wrote

They just completed the data collection a few days ago, and they claim the prompts are of really high quality due to a strict filtering algorithm and the propensity of the community to create a better open-source alternative to OAI.

1

EvilMegaDroid t1_jec89t6 wrote

That would be insane (I mean, as noted, not impossible, given that people have come together to improve big open-source projects like Linux, mpv, etc.).

I checked it out for a while but got confused. Is everyone supposed to be able to access the data? Because I could not.

1

HerculeanSubmarine t1_jeaeqow wrote

Alpaca LoRA cost pretty much nothing to get the dataset from GPT-3

GPT4All was fine-tuned using a 430k dataset that cost $100 in OpenAI API fees

1

WarAndGeese t1_jdyi9nm wrote

I think a lot of people have falsely bought into the concept that their identity is their job, because there is such a material incentive for that to be the case.

Also note that people seem to like drama, so they egg on and encourage posts about people being upset or emotional, whereas those cases aren't that representative, and they are themselves exaggerated for the sake of that drama.

14

Necessary-Meringue-1 t1_je2qurd wrote

>I think a lot of people have falsely bought the concept that their identity is their job, because there is such material incentive for that to be the case.

This is easily said and while true, this kind of sentiment seems to neglect the fact that we live in an economic system where you need a job to survive if you're not independently wealthy.

And for that question it does make a big difference whether you are a 200k/year ML engineer, or a $20/hr LLM prompter.

3

FinancialElephant t1_jdzdm7x wrote

I was way more impressed by MuZero when it came out. I feel crazy for not being that impressed by these LLMs. I do think they are changing the world, but I don't see this as some huge advancement in ML so much as an advanced ingestion and regurgitation machine. All the "intelligence" is downstream from the humans who generated the data.

Honestly, I think the reason it made a huge splash is that the RLHF fine-tuning made the models especially good at fooling humans. It feels like more of a hack than a big step in AI. My biggest worry is that people will expect too much out of these things, too soon. There seems to be a lot of fear and exuberance going around.

11

light24bulbs t1_jdybyi9 wrote

I'm just going in deep. These are FUN

8

kaisear t1_jdyxdbq wrote

I feel the anxiety, too. At a deeper level, AGI will replace most of the jobs. Elon Musk says CEOs will be replaced before machine learning engineers. Society and the economy will need a structural change. Don't worry. We (humans) are all in the same boat.

5

slaweks t1_je82s9s wrote

"CEOs will be replaced before machine learning engineers" - that's very naive :-)

2

WildlifePhysics t1_jdyzon6 wrote

There are a lot of ways to get involved in advancing research beyond generating your own foundational models. Do what others don't.

5

tysam_and_co t1_je8wno6 wrote

This will be one of the key things that I think will keep people up and going in their respective subfields. Perhaps not ironically at all, DougDoug, a popular YouTube creator, has a great video on how to be a content creator that I find pretty exceptional and well targeted to the current state of ML research. It includes some unique strategies for picking a unique fusion of interests that only the person doing the research can compete well in (while still contributing somewhat to the field), if one assumes that the research or software created is content that people might want.

It's helped me as a researcher in producing research, I've recommended it to a number of people, and he just won a decently sized award recognizing the way he's doing it. It may not seem at all related, but it has been one of the best guidebooks for me so far, and it has not failed me quite yet.

1

djc1000 t1_jdys06n wrote

Totally agree with you. I was able to do interesting work when you could do that on a 10k budget. Now almost everyone is boxed out. It’s incredibly frustrating.

4

---AI--- t1_jdysc3x wrote

But this just isn't true. You can train GPT-3-level transformers for like $600.

1

fiftyfourseventeen t1_jdz6eu7 wrote

The only way you are training your own GPT-3-level model for $600 is by spending $300 on a gun, $300 renting a U-Haul, and heisting a datacenter.

Edit: maybe cheap out on the gun and truck; can't forget about the electricity costs of your newly acquired H100s.

6

fiftyfourseventeen t1_je0u1oj wrote

You can't compare a LoRA to training a model lol

2

utopiah t1_je0zqae wrote

Well, I just did, so please explain why not; I'm genuinely trying to learn. I'd also be curious if you have a list of trained models compared by cost. I've only seen some CO2-equivalent order-of-magnitude comparisons, not rough price estimates, so that would help me get a better intuition, as you seem to know more about this.

That being said, the point was that you don't necessarily need to train anything from scratch or buy anything to have useful results; you can rent per hour on the cloud and refine existing work, no?

0

fiftyfourseventeen t1_je1gprd wrote

If you just want to change the output of a model to look more like something else in its training data, sure. LoRA trains the attention layers (technically it trains a separate model, but it can be merged into the attention layers), so it doesn't necessarily add anything NEW per se, but rather focuses on things the model has already learned. For example, if you were to try to make a model work well with a language not in its training data, LoRA is not going to work very well. However, if you wanted to make the model give things in a dialogue-like situation (as is the case with Alpaca), it can work, because the model has already seen dialogue before, so the LoRA makes it "focus" on creating dialogue.

You can get useful results with just LoRA, which is nice. If you want to experiment with architecture improvements or large-scale fine-tunes / training from scratch, you are out of luck unless you have millions of dollars.

I'd say the biggest limitation of LoRA is that your model for the most part already has to "know" everything that you are trying to do. It's not a good solution for adding new information to the model (e.g., training it on information after 2021 to make it more up to date); that has to be a full fine-tune, which is a lot more expensive.
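
To make the "merged into the attention layers" part concrete, here is the merge in miniature, following the scaling used in the LoRA paper; the dimensions are arbitrary:

```python
# A LoRA update is two skinny matrices whose product is added onto the
# frozen weight, so the adapter can be folded in with no inference cost.
import torch

d, r = 512, 8   # hidden size and LoRA rank (r << d)
alpha = 16      # LoRA scaling hyperparameter

W = torch.randn(d, d)  # frozen pretrained attention projection
A = torch.randn(r, d)  # trained LoRA factor (down-projection)
B = torch.zeros(d, r)  # trained LoRA factor (up-projection, initialized to zero)

W_merged = W + (alpha / r) * (B @ A)
print(W_merged.shape)  # same shape as W: torch.Size([512, 512])
```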

As for the cost, I honestly don't know, because these companies don't like to make data like that public. We don't even know for sure what hardware GPT-3 was trained on, although it was likely V100s, and then A100s for GPT-3.5 and 4. I think people calculated the least they could have spent on training was around $4.5 million for GPT-3 and $1.6 million for LLaMA. That doesn't even include all the work that went into building an absolutely massive dataset and paying employees to figure out how to do distributed training across tens of thousands of nodes with multiple GPUs each.

2

friuns t1_jdz8ef6 wrote

I feel you on the FOMO with LLMs. It's like we're all aboard a speeding train, right? Don't stress too much, though! Remember, innovation is a collective journey, and there's always room for exploration, even with limited resources. Keep an eye on new techniques, distillation, and distributed compute - the ML world is full of opportunities to hop in and make a difference! Let's embrace the excitement and keep learning together!

4

braindead_in t1_jdz7okb wrote

I'm getting geeky instead of being anxious. Computers are never gonna be the same again. LLMs are giving me the same vibes I got after writing my first program.

UBI is coming anyways.

3

boonhet t1_je28xyl wrote

>UBI is coming anyways.

I do hope you have good weaponry, because UBI will have to be fought for. Trillionaires aren't going to be giving up their assets for the lulz.

3

keepthepace t1_jdzvxl2 wrote

Maybe I am stubborn, but I haven't totally digested the "bitter lesson" and I am not sure I agree with its inevitability. Transformers did not appear magically out of nowhere; they were a solution to RNNs' vanishing gradient problem. AlphaGo had to be put into a Monte Carlo tree search to do anything good, and it is hard not to feel that LLMs' grounding issues may be a problem to solve with architecture changes rather than scale.
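
For readers who haven't seen the vanishing gradient argument spelled out, the one-line version (standard notation, not specific to any particular model):

```latex
% Backprop through T steps of an RNN multiplies T Jacobians; if their norms
% stay below 1, the gradient shrinks exponentially with distance.
\[
\frac{\partial h_T}{\partial h_0}
  = \prod_{t=1}^{T} \frac{\partial h_t}{\partial h_{t-1}},
\qquad
\left\lVert \frac{\partial h_T}{\partial h_0} \right\rVert \le \gamma^{T} \to 0
\quad \text{if every } \left\lVert \frac{\partial h_t}{\partial h_{t-1}} \right\rVert \le \gamma < 1.
\]
```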

3

CommunismDoesntWork t1_je1bklp wrote

Maybe figure out how to train an LLM with far less data and much faster?

3

LanchestersLaw t1_je22xzv wrote

Something I've seen a lot of on Reddit, which you can get a slice of, is that now that GPT is out, people build apps that have GPT do some thing automatically, with varying degrees of success, from dating bots to medical diagnosis tools.

3

Cherubin0 t1_jdz7s1i wrote

The so-called bitter lesson just shows that the research is still at the start. Of course line fitting gets better the more data points you have. We are still just line fitting.

2

mkffl t1_je53xf1 wrote

What impact has GPT delivered, apart from some interest from the general population in generative models (which is not insignificant)? Not much, so there's potentially a lot of work needed to turn it into something useful, and I would focus on this.

1

BawkSoup t1_jdykxs6 wrote

FOMO? This is peak 1st world problems.

It's work, man. Do your passions on your own time. Or start your own company.

−7