Miguel7501 t1_j7tfkpj wrote

It doesn't learn anything from a single thread in pre-training. This happened either because the data selection was bad or because humans fed it wrong information during the reinforcement learning period.

That reinforcement learning period isn't over, though. OpenAI is collecting a metric fuckton of data through the public demo and uses a lot of it to improve the model, so any feedback you give will make it into the model very quickly.
