currentscurrents t1_j2cm36p wrote on December 31, 2022 at 6:26 AM

TL;DR they want to take another language model (Google’s PaLM) and do Reinforcement Learning with Human Feedback (RLHF) on it like OpenAI did for ChatGPT.

At this point they haven't actually done it yet, since they need both compute power and human volunteers to do the training:

>Human volunteers will be employed to rank those responses from best to worst, using the rankings to create a reward model that takes the original model’s responses and sorts them in order of preference, filtering for the top answers to a given prompt.

>However, the process of aligning this model with what users want to accomplish with ChatGPT is both costly and time-consuming, as PaLM has a massive 540 billion parameters. Note that the cost of developing a text-generating model with only 1.5 billion parameters can reach up to $1.6 million.
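The ranking step in the first quote is usually turned into a training signal by splitting each best-to-worst ranking into pairwise comparisons. The reward model is then trained with a pairwise loss, −log σ(r_preferred − r_rejected), as in the InstructGPT paper. The sketch below illustrates that idea under those assumptions. The function names and the toy scores are hypothetical, and none of it is taken from the article or from the PaLM-rlhf-pytorch repo.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked):
    """Split a best-to-worst ranking into (preferred, rejected) pairs.

    K ranked responses give K*(K-1)/2 comparisons.
    """
    return list(combinations(ranked, 2))


def pairwise_loss(r_preferred, r_rejected):
    """-log(sigmoid(r_preferred - r_rejected)), computed without overflow."""
    d = r_preferred - r_rejected
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))


def ranking_loss(ranked, reward):
    """Mean pairwise loss over every comparison in one human ranking."""
    pairs = ranking_to_pairs(ranked)
    return sum(pairwise_loss(reward(w), reward(l)) for w, l in pairs) / len(pairs)


# Hypothetical reward-model scores for three responses to one prompt.
scores = {"response A": 2.0, "response B": 0.5, "response C": -1.0}
human_ranking = ["response A", "response B", "response C"]  # best to worst

print(len(ranking_to_pairs(human_ranking)))  # 3
print(ranking_loss(human_ranking, scores.get))        # small: scores agree with the humans
print(ranking_loss(human_ranking[::-1], scores.get))  # large: scores disagree
```

Training pushes the reward model's scores toward the human ordering by minimizing this loss. The RL step then fine-tunes the language model against that learned reward.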

Since it has 540b parameters, you will still need a GPU cluster to run it.
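To put "GPU cluster" in numbers, the back-of-the-envelope below computes how much memory 540B parameters take at common precisions and how many cards that implies. It counts only the weights and assumes a perfect split across GPUs. Activations, KV cache, and (for training) gradients and optimizer state are ignored, so real requirements are higher. The 80 GB and 24 GB figures are the A100 80GB and RTX 4090 memory sizes.

```python
import math


def weight_memory_gb(num_params, bytes_per_param):
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def gpus_needed(total_gb, gb_per_gpu):
    """Minimum GPU count, assuming the weights split perfectly across cards."""
    return math.ceil(total_gb / gb_per_gpu)


PALM_PARAMS = 540e9  # parameter count quoted in the article

for label, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    gb = weight_memory_gb(PALM_PARAMS, nbytes)
    print(f"{label:>9}: {gb:,.0f} GB of weights, "
          f">= {gpus_needed(gb, 80)} x 80 GB A100, "
          f">= {gpus_needed(gb, 24)} x 24 GB RTX 4090")
```

In half precision, that is about 1,080 GB of weights, or at least 14 A100s, before any working memory is counted.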

Ok_Reference_7489 t1_j2e73fe wrote on December 31, 2022 at 4:34 PM

>At this point they haven't actually done it yet

There is no "they" there. This is just some random crypto guy's blog who clearly does not know what he is talking about.

currentscurrents t1_j2ef37r wrote on December 31, 2022 at 5:28 PM

Right, he's not the developer - it's just an article about the project.

Ok_Reference_7489 t1_j2eg79x wrote on December 31, 2022 at 5:36 PM

There is no project.

currentscurrents t1_j2ege8h wrote on December 31, 2022 at 5:37 PM

https://github.com/lucidrains/PaLM-rlhf-pytorch

Ok_Reference_7489 t1_j2ehw9g wrote on December 31, 2022 at 5:47 PM

LucidDrains "implements" all kinds of papers. He has more than 200 such repos. But, as far as I know, he never actually tries to reproduce the results in the paper or run at any kind of scale. Note that in the README he points people to other projects.

FruityWelsh t1_j2covdi wrote on December 31, 2022 at 6:57 AM

It'll be interesting to see whether something like petal.ml can help with this, specifically with the human reinforcement and GPU processing parts.

lucidrage t1_j2e7pgv wrote on December 31, 2022 at 4:39 PM

Just blockchain it and use the reward tokens for API consumption.

Whiteboinebony t1_j2evf1c wrote on December 31, 2022 at 7:18 PM

How would you prevent people from giving bad responses?

Southern-Trip-1102 t1_j2ffir3 wrote on December 31, 2022 at 9:44 PM

As long as the net responses are good, shouldn't it still work, albeit less efficiently? (Not talking about the blockchain.)
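One common way to make "as long as the net responses are good" concrete is to have several people label each comparison and keep the majority vote. A minority of bad-faith raters then gets outvoted, and raters who keep disagreeing with the majority can be flagged. This is a minimal sketch of that idea with hypothetical rater names and votes. It is not how this project (or OpenAI) is known to handle it.

```python
from collections import Counter, defaultdict


def majority_label(votes):
    """votes: dict rater -> 'A' or 'B' for one comparison. Returns None on a tie."""
    counts = Counter(votes.values())
    if counts["A"] == counts["B"]:
        return None
    return "A" if counts["A"] > counts["B"] else "B"


def rater_agreement(items):
    """Fraction of non-tied comparisons in which each rater matched the majority."""
    hits, seen = defaultdict(int), defaultdict(int)
    for votes in items:
        label = majority_label(votes)
        if label is None:
            continue  # ties say nothing about who is right
        for rater, vote in votes.items():
            seen[rater] += 1
            hits[rater] += vote == label
    return {rater: hits[rater] / seen[rater] for rater in seen}


# Hypothetical data: two honest raters and one who answers at random.
items = [
    {"r1": "A", "r2": "A", "troll": "B"},
    {"r1": "B", "r2": "B", "troll": "A"},
    {"r1": "A", "r2": "A", "troll": "A"},
]
print([majority_label(v) for v in items])  # ['A', 'B', 'A']
print(rater_agreement(items))              # troll agrees only 1/3 of the time
```

With enough overlap between raters, low-agreement accounts can be down-weighted or dropped before their labels reach the reward model.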

[deleted] t1_j2eyxu3 wrote on December 31, 2022 at 7:43 PM

[deleted]

[deleted] t1_j2f8o4v wrote on December 31, 2022 at 8:53 PM

[deleted]

[deleted] t1_j2dzkvw wrote on December 31, 2022 at 3:41 PM

[deleted]

Glycerine t1_j2ddrg8 wrote on December 31, 2022 at 12:27 PM

This went viral pretty quickly. I'm pretty sure the project was posted on Reddit only a few days ago, announcing that it had gone open source: https://github.com/lucidrains/PaLM-rlhf-pytorch

https://old.reddit.com/r/artificial/comments/zy6swx/palm_with_rlhf_is_now_opensource/

I starred it this week at ~50 stars; now it's at 3.3k.

It looks really exciting, but yes, it's not easy to run. Knowing I'm underpowered for most ML work, I still gave it a shot on my AMD 4.0 GHz CPU, 32 GB RAM, and GTX 1080.

The moment I knew it was out of reach to process wikipedia:

training:   0% 36/100000 [1:01:05<2581:58:40, 92.98s/it] training loss: 2.976300001144409

That shows it took about 1 hour to reach step 36 (of 100K). At 92.98 s per step, the remaining ~2,582 hours work out to roughly 108 days, about 3.5 months of 24/7 training...
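The projection comes straight from the tqdm progress line, which prints steps done/total, then [elapsed<remaining, seconds per iteration]. The helper below parses that format so the two logs in this thread can be compared: this one, and the GTX 1650 log posted in a reply below. The regex is written for these two lines and is an assumption about tqdm's default layout, not a general tqdm parser.

```python
import re


def parse_hms(text):
    """Convert tqdm-style 'H:MM:SS' or 'MM:SS' to seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def eta_from_tqdm(line):
    """Extract done/total steps, elapsed and remaining time, and s/it from a tqdm line."""
    m = re.search(r"(\d+)/(\d+) \[([\d:]+)<([\d:]+),\s*([\d.]+)s/it\]", line)
    done, total, elapsed, remaining, rate = m.groups()
    return {
        "done": int(done),
        "total": int(total),
        "elapsed_s": parse_hms(elapsed),
        "remaining_s": parse_hms(remaining),
        "s_per_it": float(rate),
    }


logs = {
    "GTX 1080 rig": "training:   0% 36/100000 [1:01:05<2581:58:40, 92.98s/it]",
    "GTX 1650 rig": "training:   0%| 274/100000 [10:06<55:51:29,  2.02s/it]",
}
for name, line in logs.items():
    info = eta_from_tqdm(line)
    days = info["remaining_s"] / 86400
    print(f"{name}: {info['s_per_it']} s/it, about {days:.1f} days left")
```

This gives roughly 107.6 days versus roughly 2.3 days. A ~46x gap per step (92.98 / 2.02) on a nominally faster card is a strong hint that the first run was not using the GPU.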

Secondly, it's not built for daily driving yet: the source is still in development and needs an intermediate Python developer to run it, mainly because of what you have to implement yourself after the training step.

It would be fun to have a slow input system, or some documentation showing how to load very small datasets as an example. A finished model I can run immediately would be awesome, but I guess that's what the other team is doing.

The future of talky speaky machines is getting very exciting; I can't wait to see what happens two more papers down the line... I'm 101% looking forward to my speaky toaster!!!

comefromspace t1_j2dgqhz wrote on December 31, 2022 at 1:01 PM

> The moment I knew it was out of reach to process wikipedia:

training:   0%| 274/100000 [10:06<55:51:29,  2.02s/it] training loss: 1.4352326393127441

on a GTX 1650

Disastrous_Elk_6375 t1_j2e6d6d wrote on December 31, 2022 at 4:29 PM

> 92.98s/it

Are your CPUs fully used when training? You might want to check whether this is running on the GPU at all; numbers like that are typical of CPU training.

Glycerine t1_j2frh3o wrote on December 31, 2022 at 11:13 PM

You're right, it's poor. All 8 CPU cores hit 100%.

As an update though:

I made a bunch of changes: I reduced the dataset to 5 lines from Wikipedia, reduced the PaLM size to about 25% of the original, and reduced the epoch times to 8.

It's phenomenal. In under 30 minutes, and with a bunch of poking, it can easily generate sensible sentences.

I dropped it onto a Lambda GPU A100 instance; it's silly fast.
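The post doesn't say which settings were changed to get the model to ~25% of its original size. As a rough guide, a GPT-style decoder has about 12·d² weights per layer (attention plus a 4x-wide feed-forward). Halving the width d therefore cuts the block weights to about a quarter, while halving the depth only halves them. This uses that textbook approximation, not an exact count for lucidrains' PaLM code (which uses its own attention and feed-forward variants). The dim/depth/vocab values are hypothetical.

```python
def approx_params(dim, depth, vocab_size):
    """Rough decoder-only transformer size.

    12 * depth * dim^2 for the blocks (~4*dim^2 attention + ~8*dim^2
    feed-forward per layer) plus a vocab_size * dim embedding table.
    Biases and norm layers are ignored.
    """
    return 12 * depth * dim ** 2 + vocab_size * dim


# Hypothetical small character-level config (256 byte values as the vocabulary).
base = approx_params(dim=512, depth=8, vocab_size=256)
half_width = approx_params(dim=256, depth=8, vocab_size=256)
half_depth = approx_params(dim=512, depth=4, vocab_size=256)

print(f"base:       {base:,}")
print(f"half width: {half_width:,} ({half_width / base:.0%} of base)")
print(f"half depth: {half_depth:,} ({half_depth / base:.0%} of base)")
```

Because the weights grow with the square of the width, shrinking dim is the quickest way to get a model that trains in minutes on a single card.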

Edit:

As an example, I trained the model on 5 sentences, with an optimal length of ~128 chars. I give it a word and see what it constructs.

The goal here is to see if it produces sensible sentences from real words:

With a known word the response is fairly stable:

>>> qu('example')
'example, wrote of violence as a necessary and some'
>>> qu('example')
'example, wrote of violence as a necessary and some'
>>> qu('example', 20)
'example, wrote of vi'
>>> qu('example', 10)
'example, w'
>>> qu('example', 50)
'example, wrote of violence as a necessary and some'

Untrained words produce some interesting results. Prior to the <100 epochs of training, it was saying nonsense:

tensor(0.0431, grad_fn=<NllLoss2DBackward0>)
>>> qu('when')
'whent he wher a arevo-pociaty on indiviolent resis'
>>> qu('when')
'whent he refuted Nechaev).  Other anarchists, some'
>>> qu('but')
'but. how a free society might be brought about.  H'
>>> qu('but')
'but.  The there is also ofowerat; there is no [[co'

Disastrous_Elk_6375 t1_j2ft5zo wrote on December 31, 2022 at 11:26 PM

> You're right it's poor. All 8 CPU's hit 100%.

Yeah, you're probably not using the GPU. Make sure your PyTorch and CUDA installs are compatible and properly installed. To test, open a Python session and run:


import torch
torch.cuda.is_available()

If the output is False, it will train on the CPU.
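A slightly more defensive version of that check, as a sketch. It picks the device once, says why it fell back to the CPU, and doesn't crash if PyTorch isn't installed at all. It relies only on `torch.cuda.is_available()`, `torch.__version__` and `torch.version.cuda`, the last of which is `None` on CPU-only builds of PyTorch. The function name `pick_device` is hypothetical.

```python
def pick_device():
    """Return 'cuda' when PyTorch can see a CUDA GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; nothing can run on the GPU.")
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # A CPU-only wheel reports torch.version.cuda as None.
    print(f"No CUDA device visible (torch {torch.__version__}, "
          f"built with CUDA {torch.version.cuda}); training will run on CPU.")
    return "cpu"


device = pick_device()
print("training on", device)
```

If it reports `built with CUDA None`, the installed wheel is CPU-only, and reinstalling a CUDA build of PyTorch is the usual fix. The model and each batch then need to be moved with `.to(device)` for the GPU to actually be used.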

Ronny_Jotten t1_j2esha7 wrote on December 31, 2022 at 6:58 PM

This is clickbait, there's nothing to see here. Wang, among others, has been working on putting together some code as a kind of "proof of concept" that could do RLHF on top of PaLM. Actually doing that on the scale of ChatGPT, i.e. implementing a large, trained, working system, is a completely different story.

The readme includes this:

> This repository has gone viral without my permission. Next time, if you are promoting my unfinished repositories (notice the work in progress flag) for twitter engagement or eyeballs, at least (1) do your research or (2) be totally transparent with your readers about the capacity of the repository without resorting to clickbait. (1) I was not the first, CarperAI had been working on RLHF months before, link below. (2) There is no trained model. This is just the ship and overall map. We still need millions of dollars of compute + data to sail to the correct point in high dimensional parameter space. Even then, you need professional sailors (like Robin Rombach of Stable Diffusion fame) to actually guide the ship through turbulent times to that point.

lucidraisin t1_j2exfpq wrote on December 31, 2022 at 7:32 PM

my repositories are more than proof of concept. they have led to the training of significant models, Stable Diffusion among them.

but still it is deceptive to the average person to tell them that chatgpt replication is imminent. good code is just a prerequisite to begin the journey. it will take data, compute, adventurers to actually set sail, and in the case of chatgpt, a complicated process of gathering human feedback (I will do my best to lower the activation energy by building a simple and concise app that covers all cases, assuming RLHF does not get outdated by another technique)

Ronny_Jotten t1_j2fcaq1 wrote on December 31, 2022 at 9:20 PM

> my repositories are more than proof of concept. they have led to the training of significant models, Stable Diffusion among them.

Sure, but I didn't say anything about your other repositories. I said that this particular repository is a proof of concept, in the sense that it demonstrates working code that could serve in the development of a future open-source ChatGPT-like system, but such a system, as you say, is not imminent. It's great that you're working towards it though!

lucidraisin t1_j2ff2cy wrote on December 31, 2022 at 9:40 PM

right right, more work remains to be done after the new years. we will get there

[deleted] t1_j2fevju wrote on December 31, 2022 at 9:39 PM

[deleted]

Dendriform1491 t1_j2cwdur wrote on December 31, 2022 at 8:32 AM

If visual pollution was a website.

3deal t1_j2d5iuj wrote on December 31, 2022 at 10:38 AM

System requirement: 4x RTX 4090

ThatInternetGuy t1_j2d5nkm wrote on December 31, 2022 at 10:40 AM

170GB VRAM minimum.

So that's 8x RTX 4090.

3deal t1_j2d8bj5 wrote on December 31, 2022 at 11:17 AM

I mean, for a startup it is not very expensive for all the benefit it gives.

Disastrous_Elk_6375 t1_j2de4o2 wrote on December 31, 2022 at 12:32 PM

Can 4090s pool their VRAM? I always thought that LLMs need GPUs from the A/V series so that they can pool memory. Am I wrong in thinking that?

zaptrem t1_j2e2lvb wrote on December 31, 2022 at 4:03 PM

You can do pipeline parallelism via FairScale and HF Accelerate on any identical (and sometimes non-identical) GPUs.
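Pipeline parallelism doesn't merge the cards' VRAM into one memory space. It places consecutive layers on different GPUs and passes activations from one card to the next, so each GPU only needs room for its own slice. The sketch below shows just that placement step, as a greedy contiguous split. It is a conceptual illustration, not FairScale's or Accelerate's API, and the per-layer sizes are hypothetical.

```python
def partition_layers(layer_gb, gpu_gb):
    """Greedily assign consecutive layers to GPUs without exceeding each GPU's memory.

    layer_gb: memory per layer, in model order.
    gpu_gb:   usable memory per GPU.
    Returns one list of layer indices per GPU used (a pipeline stage each).
    """
    stages, used = [[]], 0.0
    for i, size in enumerate(layer_gb):
        if used + size > gpu_gb[len(stages) - 1]:
            if not stages[-1]:
                raise ValueError(f"layer {i} ({size} GB) does not fit on one GPU")
            if len(stages) == len(gpu_gb):
                raise ValueError("model does not fit on the given GPUs")
            stages.append([])  # current GPU is full: start the next stage
            used = 0.0
            if size > gpu_gb[len(stages) - 1]:
                raise ValueError(f"layer {i} ({size} GB) does not fit on one GPU")
        stages[-1].append(i)
        used += size
    return stages


# Hypothetical: 8 layers of 10 GB each on four 24 GB cards.
print(partition_layers([10] * 8, [24] * 4))  # [[0, 1], [2, 3], [4, 5], [6, 7]]
```

Real frameworks also have to budget for activations and schedule micro-batches so the GPUs aren't idle waiting on each other, which this placement sketch ignores.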

ThatInternetGuy t1_j2deqmr wrote on December 31, 2022 at 12:39 PM

Need to deploy the inference model with Colossal AI.

sheeplearning t1_j2dz5eg wrote on December 31, 2022 at 3:38 PM

I didn't think this sub allowed clickbait content like this. Mods?

IdainaKatarite t1_j2eew0u wrote on December 31, 2022 at 5:27 PM

Serious answer, you could look into getting compute power from CoreWeave. Obviously, something like this is crucial to humanity. (Compare Stable Diffusion to everything else). If we let Big Tech control AI Alignment, then very soon its version of reality will be the dominant one (this is the Letter Agencies' wet dream).

This could potentially be one of the most important battles of our lifetime.

Mikatron3000 t1_j2e07hz wrote on December 31, 2022 at 3:46 PM

Now if only this was distributed in some decentralized fashion.

Mcfrlnd38 t1_j2ep0s3 wrote on December 31, 2022 at 6:34 PM

I'm sure the p2p squad will get their hands on it heheh

legocuber t1_j2eyv3m wrote on December 31, 2022 at 7:42 PM

This is kind of clickbait. Cool that they reproduced some of it, but 90% of that had existed since OpenAI released the source code for InstructGPT's architecture. The real limitation is data and compute, which this repo doesn't really provide... What is really required is a huge open-source RLHF dataset (like ImageNet but for human instructions)

Jean-Porte t1_j2fcs81 wrote on December 31, 2022 at 9:24 PM

Mom, can we have ChatGPT?

No, we have ChatGPT at home.

ChatGPT at home:

PrinceOfLies0 t1_j2d5h95 wrote on December 31, 2022 at 10:38 AM

If the cost of efficiently running a trained LLM locally comes down to ~100k, it would probably be a worthwhile investment for me. Definitely something to look out for and potentially contribute to. Exciting times :)

EthansWay007 t1_j2eykpr wrote on December 31, 2022 at 7:40 PM

I’m new to the world of programming (mostly Python). What does this becoming open source mean? That you can view the API for it?

rawzone t1_j2f45tb wrote on December 31, 2022 at 8:20 PM

WTH!? A pretty decently written, short "mainstream" article on a pretty complex tech subject.
