
massimosclaw2 t1_ix2w5bc wrote

Not quite. I think there's value to this technique, but it's still constrained by the probability of what GPT thinks an AI would say, based on all the instances of similar text in the data it consumed, which is not quite the same thing.

1

visarga t1_ix2wyw6 wrote

There is also prompt tuning, which fine-tunes only a few token embeddings while keeping the model itself frozen. This changes the problem from finding that elusive prompt to collecting a few labeled examples and fine-tuning the prompt. A minimal sketch of the idea is below.
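
Here's roughly what that looks like with a frozen GPT-2 from Hugging Face `transformers`; the prompt length, learning rate, and training step are illustrative choices, not prescribed values:

```python
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel

# Sketch: a small matrix of learnable "soft prompt" embeddings is
# prepended to the input while every weight of the LM stays frozen.
model = GPT2LMHeadModel.from_pretrained("gpt2")
for p in model.parameters():
    p.requires_grad = False

n_prompt = 20                                 # illustrative prompt length
soft_prompt = nn.Parameter(torch.randn(n_prompt, model.config.n_embd) * 0.02)
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)

def train_step(input_ids, labels):
    tok_emb = model.transformer.wte(input_ids)            # (B, T, D)
    prompt = soft_prompt.unsqueeze(0).expand(input_ids.size(0), -1, -1)
    inputs_embeds = torch.cat([prompt, tok_emb], dim=1)   # (B, P+T, D)
    # -100 over the prompt positions so they contribute no loss
    pad = torch.full(prompt.shape[:2], -100, dtype=labels.dtype)
    out = model(inputs_embeds=inputs_embeds,
                labels=torch.cat([pad, labels], dim=1))
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```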

Another approach is to use an LLM to generate prompts and filter them by evaluation (see the sketch below). This has also been used to generate step-by-step reasoning traces for datasets that only have input-output pairs; you then train another model on the examples plus the chain of thought for a big jump in accuracy.
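
As a sketch of the generate-and-filter loop (the `llm` call is a stand-in for whatever completion API you use, and the meta-prompt wording is made up):

```python
def llm(text, n=1):
    """Placeholder for an LLM completion call; returns n sampled strings."""
    raise NotImplementedError

def accuracy(prompt, dev_set):
    # Score a candidate prompt by exact match on a small labeled dev set.
    hits = sum(llm(f"{prompt}\nInput: {x}\nOutput:")[0].strip() == y
               for x, y in dev_set)
    return hits / len(dev_set)

def search_prompt(dev_set, n_candidates=20):
    examples = "\n".join(f"{x} -> {y}" for x, y in dev_set[:5])
    meta = ("Here are input-output pairs:\n" + examples +
            "\nWrite an instruction that maps each input to its output.")
    candidates = llm(meta, n=n_candidates)
    return max(candidates, key=lambda p: accuracy(p, dev_set))
```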

There's a relevant paper here: *Large Language Models Can Self-Improve*. They find that

> fine-tuning on reasoning is critical for self-improvement

I would add that sometimes you can evaluate a result, for example when generating math or code, and then learn from the validated outputs of the network. That is basically what AlphaZero used to reach superhuman level without supervision, but it requires a kind of simulator: a game engine, a Python interpreter, or a symbolic math engine. A sketch of that loop follows.
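
For the code case, the loop might look like this (reusing the `llm` placeholder from above; the `solution` entry-point name and the test format are assumptions):

```python
def run_tests(code, tests):
    # The "simulator" here is just a Python interpreter executing the
    # candidate and checking it against known (args, expected) pairs.
    scope = {}
    try:
        exec(code, scope)
        f = scope["solution"]               # assumed entry-point name
        return all(f(*args) == expected for args, expected in tests)
    except Exception:
        return False

def collect_validated(problem, tests, n_samples=8):
    prompt = f"# Write a function `solution` that solves:\n# {problem}\n"
    # Keep only generations that pass every test; these validated
    # (prompt, code) pairs become fine-tuning data for the next round.
    return [(prompt, code) for code in llm(prompt, n=n_samples)
            if run_tests(code, tests)]
```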

1