currentscurrents t1_j2cm36p wrote

TL;DR they want to take another language model (Google’s PaLM) and do Reinforcement Learning with Human Feedback (RLHF) on it like OpenAI did for ChatGPT.

At this point they haven't actually done it yet, since they need both compute power and human volunteers to do the training:

>Human volunteers will be employed to rank those responses from best to worst, using the rankings to create a reward model that takes the original model’s responses and sorts them in order of preference, filtering for the top answers to a given prompt.
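
For a rough idea of how rankings become a training signal: one common approach (the one described for InstructGPT, and only assumed here to match what this project intends) is to split each ranking into pairs and penalize the reward model whenever it scores the lower-ranked response above the higher-ranked one. A minimal sketch in plain Python, with made-up response names and reward values:

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations() preserves input order, so the first element of each
    # pair is always the one the volunteer ranked higher.
    return list(combinations(ranked_responses, 2))


def pairwise_loss(reward_preferred, reward_rejected):
    """Compute -log(sigmoid(r_preferred - r_rejected)) in a numerically stable form."""
    diff = reward_preferred - reward_rejected
    # -log(sigmoid(d)) == softplus(-d) == max(-d, 0) + log1p(exp(-|d|))
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# Hypothetical ranking and hypothetical reward-model outputs, for illustration only.
ranked = ["response A", "response B", "response C"]
scores = {"response A": 1.2, "response B": 0.3, "response C": -0.5}

pairs = ranking_to_pairs(ranked)
mean_loss = sum(pairwise_loss(scores[w], scores[l]) for w, l in pairs) / len(pairs)
print(pairs)
print(round(mean_loss, 4))
```

The loss shrinks as the reward model's scores agree more strongly with the human ordering, which is what lets a ranking stand in for an explicit numeric label.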

>However, the process of aligning this model with what users want to accomplish with ChatGPT is both costly and time-consuming, as PaLM has a massive 540 billion parameters. Note that the cost of developing a text-generating model with only 1.5 billion parameters can reach up to $1.6 million.

Since it has 540b parameters, you will still need a GPU cluster to run it.
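
A back-of-the-envelope check on why a cluster is needed, counting only the weights and assuming 2 bytes per parameter (16-bit) or 4 bytes (32-bit). Activations, optimizer state, and any caching are ignored, so real requirements would be higher:

```python
def weight_memory_gb(num_params, bytes_per_param=2):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


palm_params = 540e9  # parameter count quoted in the article

print(weight_memory_gb(palm_params))     # 16-bit weights
print(weight_memory_gb(palm_params, 4))  # 32-bit weights
```

At 16 bits that is roughly a terabyte for the weights alone, far more than any single GPU holds.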


Ok_Reference_7489 t1_j2e73fe wrote

>At this point they haven't actually done it yet

There is no "they" there. This is just some random crypto guy's blog who clearly does not know what he is talking about.


currentscurrents t1_j2ef37r wrote

Right, he's not the developer - it's just an article about the project.


Ok_Reference_7489 t1_j2eg79x wrote

There is no project.


FruityWelsh t1_j2covdi wrote

it'll be interesting if something like can help with this. The human reinforcement and getting gpu processing parts that is.


lucidrage t1_j2e7pgv wrote

Just Blockchain it and use the rewards tokens for api consumption


Glycerine t1_j2ddrg8 wrote

This went viral pretty quickly. I'm pretty sure that was posted on reddit only a few days ago about going open source with the project:

I starred it this week at ~50 stars, now it's 3.3k

It looks really exciting, but yes it's not easy to run. Knowing I'm underpowered for most ML work I still gave it a shot on my AMD 4.0Ghz - 32GB ram - 1080GTX.

The moment I knew it was out of reach to process wikipedia:

training:   0% 36/100000 [1:01:05<2581:58:40, 92.98s/it]training loss: 2.976300001144409

That shows it took 1 hour to reach iteration 36 (of 100K), which works out to about 3.5 months (24/7) of training...

Secondly it's not built for daily driving yet, the source is still in dev mode and needs an intermediate Python dev to execute it - just due to the implementation after the training step.

It would be fun to have a slow input system, or some documentation on how to load super thin datasets as an example. A finished model I can run immediately would be awesome - but I guess that's what the other team are doing.

The future of talky speaky machines is getting very exciting; I can't wait to see what happens two more papers down the line... I'm 101% looking forward to my speaky toaster!!!


comefromspace t1_j2dgqhz wrote

> The moment I knew it was out of reach to process wikipedia:

training:   0%| 274/100000 [10:06<55:51:29,  2.02s/it] training loss: 1.4352326393127441

on GTX1650
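
The two progress bars in this thread can be compared by redoing the estimate by hand: remaining steps times seconds per step. A small sketch using only the numbers printed in the logs (tqdm smooths its rate, so its own ETA will differ slightly):

```python
def remaining_hours(total_steps, done_steps, seconds_per_step):
    """Estimate the hours left, assuming the per-step time stays constant."""
    return (total_steps - done_steps) * seconds_per_step / 3600


slow = remaining_hours(100_000, 36, 92.98)   # first log, 92.98 s/it
fast = remaining_hours(100_000, 274, 2.02)   # second log, 2.02 s/it

print(f"slow run: {slow:.0f} h, about {slow / 24:.0f} days")
print(f"fast run: {fast:.0f} h, about {fast / 24:.1f} days")
print(f"speed ratio: {92.98 / 2.02:.0f}x")
```

The gap between the two runs is about a factor of 46 in time per step.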


Disastrous_Elk_6375 t1_j2e6d6d wrote

> 92.98s/it

Are your CPUs fully used when training? You might want to check if this is running on GPU or not, those numbers are generally found on CPU training.


Glycerine t1_j2frh3o wrote

You're right it's poor. All 8 CPUs hit 100%.

As an update though:

I made a bunch of changes: reduced the dataset to 5 lines from wikipedia, reduced the PaLM size to about 25% of the original, and reduced the epoch times to 8.

It's phenomenal. Within < 30 minutes and a bunch of poking it can easily generate sensible sentences.

I dropped it onto a Lambda A100 GPU instance - it's silly fast


As an example: I trained the model on 5 sentences, with an optimal length of ~128 chars. I ask for a word and see what it constructs.

The goal here is to see if it produces sensible sentences from real words:

With a known word the response is fairly stable:

'example, wrote of violence as a necessary and some'
>>> qu('example')
'example, wrote of violence as a necessary and some'
>>> qu('example', 20)
'example, wrote of vi'
>>> qu('example', 10)
'example, w'
>>> qu('example', 50)
'example, wrote of violence as a necessary and some'

untrained words produce some interesting results. Prior to the <100 epochs of training it was saying nonsense:

tensor(0.0431, grad_fn=<NllLoss2DBackward0>)
>>> qu('when')
'whent he wher a arevo-pociaty on indiviolent resis'
>>> qu('when')
'whent he refuted Nechaev).  Other anarchists, some'
>>> qu('but')
'but. how a free society might be brought about.  H'
>>> qu('but')
'but.  The there is also ofowerat; there is no [[co'

Disastrous_Elk_6375 t1_j2ft5zo wrote

> You're right it's poor. All 8 CPUs hit 100%.

Yeah, you're probably not using the gpu. Make sure that your pytorch & cuda stuff are compatible and properly installed. To test, go into a python session, and do
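
The snippet itself did not survive in this copy of the thread. A check along these lines is the usual way to do it, assuming a standard PyTorch install; the `cuda_available` wrapper is a hypothetical helper added here so the check also runs where PyTorch is missing:

```python
import importlib.util


def cuda_available():
    """Return torch.cuda.is_available(), or None if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None  # PyTorch missing entirely, so nothing can run on the GPU
    import torch
    return torch.cuda.is_available()


print(cuda_available())
```

In an interactive session with PyTorch already installed, this reduces to `import torch` followed by `torch.cuda.is_available()`.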


If the output is false it will train on CPU.


Ronny_Jotten t1_j2esha7 wrote

This is clickbait, there's nothing to see here. Wang, among others, has been working on putting together some code as a kind of "proof of concept" that could do RLHF on top of PaLM. Actually doing that on the scale of ChatGPT, i.e. implementing a large, trained, working system, is a completely different story.

The readme includes this:

> This repository has gone viral without my permission. Next time, if you are promoting my unfinished repositories (notice the work in progress flag) for twitter engagement or eyeballs, at least (1) do your research or (2) be totally transparent with your readers about the capacity of the repository without resorting to clickbait. (1) I was not the first, CarperAI had been working on RLHF months before, link below. (2) There is no trained model. This is just the ship and overall map. We still need millions of dollars of compute + data to sail to the correct point in high dimensional parameter space. Even then, you need professional sailors (like Robin Rombach of Stable Diffusion fame) to actually guide the ship through turbulent times to that point.


lucidraisin t1_j2exfpq wrote

my repositories are more than proof of concept. they have led to the training of significant models, Stable Diffusion among them.

but still it is deceptive to the average person to tell them that chatgpt replication is imminent. good code is just a prerequisite to begin the journey. it will take data, compute, adventurers to actually set sail, and in the case of chatgpt, a complicated process of gathering human feedback (I will do my best to lower the activation energy by building a simple and concise app that covers all cases, assuming RLHF does not get outdated by another technique)


Ronny_Jotten t1_j2fcaq1 wrote

> my repositories are more than proof of concept. they have led to the training of significant models, Stable Diffusion among them.

Sure, but I didn't say anything about your other repositories. I said that this particular repository is a proof of concept, in the sense that it demonstrates working code that could serve in the development of a future open-source ChatGPT-like system, but such a system, as you say, is not imminent. It's great that you're working towards it though!


lucidraisin t1_j2ff2cy wrote

right right, more work remains to be done after the new years. we will get there


3deal t1_j2d5iuj wrote

System requirement: 4x RTX 4090


ThatInternetGuy t1_j2d5nkm wrote

170GB VRAM minimum.

So that's 8x RTX 4090.
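
The card count follows from rounding up, taking the 170GB figure above at face value and 24GB of VRAM per RTX 4090. This ignores per-card overhead and whether the cards can actually share the model:

```python
import math


def cards_needed(required_vram_gb, vram_per_card_gb):
    """Smallest number of identical cards whose combined VRAM covers the requirement."""
    return math.ceil(required_vram_gb / vram_per_card_gb)


print(cards_needed(170, 24))  # RTX 4090: 24 GB each
print(4 * 24)                 # total VRAM of a 4-card setup, in GB
```

Seven cards would give 168GB, just short of the target, which is why the count lands on eight.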


3deal t1_j2d8bj5 wrote

I mean, for a startup it is not very expensive for all the benefit it gives.


Disastrous_Elk_6375 t1_j2de4o2 wrote

Can 4090s pool their VRAM? I always thought that LLMs need GPUs from the A/V series so that they can pool memory. Am I wrong in thinking that?


zaptrem t1_j2e2lvb wrote

You can do pipeline parallelism via FairScale and HF Accelerate on any identical (and sometimes non identical) GPUs.
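
To illustrate the idea only (this is not the FairScale or Accelerate API, and `split_into_stages` is a hypothetical helper): pipeline parallelism puts contiguous blocks of layers on different GPUs, so each card holds only its own slice of the weights and passes activations on to the next. A toy sketch of the layer assignment:

```python
def split_into_stages(num_layers, num_devices):
    """Assign contiguous layer ranges to devices, as evenly as possible."""
    base, extra = divmod(num_layers, num_devices)
    stages, start = [], 0
    for device in range(num_devices):
        # The first `extra` devices take one additional layer each.
        size = base + (1 if device < extra else 0)
        stages.append(range(start, start + size))
        start += size
    return stages


for device, layers in enumerate(split_into_stages(10, 4)):
    print(f"gpu {device}: layers {list(layers)}")
```

Real libraries also balance by memory rather than layer count and overlap micro-batches across stages, which this sketch leaves out.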


sheeplearning t1_j2dz5eg wrote

I don't believe this sub allows clickbait content like this. Mods?


IdainaKatarite t1_j2eew0u wrote

Serious answer, you could look into getting compute power from CoreWeave. Obviously, something like this is crucial to humanity. (Compare Stable Diffusion to everything else). If we let Big Tech control AI Alignment, then very soon its version of reality will be the dominant one (this is the Letter Agencies' wet dream).

This could potentially be one of the most important battles of our lifetime.


Mikatron3000 t1_j2e07hz wrote

Now if only this was distributed in some decentralized fashion.


Mcfrlnd38 t1_j2ep0s3 wrote

I'm sure the p2p squad will get their hands on it heheh


legocuber t1_j2eyv3m wrote

This is kind of clickbait. Cool that they reproduced some of it, but 90% of that had existed since OpenAI released the source code for InstructGPT's architecture. The real limitation is data and compute, which this repo doesn't really provide... What is really required is a huge open-source RLHF dataset (like ImageNet but for human instructions)


Jean-Porte t1_j2fcs81 wrote

Mom, can we have chatGPT ?

No we have chatGPT at home

ChatGPT at home:


PrinceOfLies0 t1_j2d5h95 wrote

If the cost of efficiently running a trained LLM locally comes down to ~ 100k, it would probably be a worthwhile investment for me. Definitely something to look out for and potentially contribute to. Exciting times :)


EthansWay007 t1_j2eykpr wrote

I’m new to the world of programming (mostly Python). What does this becoming open source mean? Can you view its API?


rawzone t1_j2f45tb wrote

wth.!? A pretty decently written short "mainstream" article on a pretty complex tech subject.