visarga t1_ix2wyw6 wrote

There is also prompt-tuning, which fine-tunes only a few soft token embeddings while keeping the model itself frozen. This changes the problem from finding that elusive prompt by hand to collecting a few labeled examples and fine-tuning the prompt on them.
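A minimal PyTorch sketch of the idea, assuming a Hugging Face style causal LM that exposes `get_input_embeddings()` and accepts `inputs_embeds` (the wrapper class and the prompt length are illustrative, not from any particular library):

```python
import torch
import torch.nn as nn

class PromptTuningWrapper(nn.Module):
    """Prepend trainable soft-prompt embeddings to the input; the base LM stays frozen."""
    def __init__(self, base_model, num_prompt_tokens=20):
        super().__init__()
        self.base_model = base_model
        for p in self.base_model.parameters():
            p.requires_grad = False  # freeze the whole LM
        embed_dim = base_model.get_input_embeddings().embedding_dim
        # the only trainable parameters: a handful of "virtual token" embeddings
        self.soft_prompt = nn.Parameter(torch.randn(num_prompt_tokens, embed_dim) * 0.02)

    def forward(self, input_ids, attention_mask, labels=None):
        tok_embeds = self.base_model.get_input_embeddings()(input_ids)
        batch = input_ids.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        inputs_embeds = torch.cat([prompt, tok_embeds], dim=1)
        prompt_mask = torch.ones(batch, prompt.size(1),
                                 dtype=attention_mask.dtype,
                                 device=attention_mask.device)
        attention_mask = torch.cat([prompt_mask, attention_mask], dim=1)
        if labels is not None:
            # don't compute loss on the virtual tokens
            ignore = torch.full((batch, prompt.size(1)), -100,
                                dtype=labels.dtype, device=labels.device)
            labels = torch.cat([ignore, labels], dim=1)
        return self.base_model(inputs_embeds=inputs_embeds,
                               attention_mask=attention_mask,
                               labels=labels)
```

Only `soft_prompt` carries gradients, so the optimizer touches a few thousand parameters instead of billions.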

Another approach is to use an LLM to generate prompts and filter them by evaluation. The same trick has been used to generate step-by-step reasoning traces for datasets that only have input-output pairs; you then train another model on the examples plus chain of thought for a big jump in accuracy.
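A rough sketch of the filtering step, assuming some `generate` function standing in for whatever sampling API you use, and an illustrative "Answer: ..." convention for the final line of each trace:

```python
import re

def extract_answer(text):
    """Pull the final answer out of a generated reasoning trace.
    Assumes the trace ends with a line like 'Answer: 42' (illustrative convention)."""
    m = re.search(r"Answer:\s*(.+)", text)
    return m.group(1).strip() if m else None

def build_cot_dataset(pairs, generate, samples_per_question=8):
    """pairs: list of (question, gold_answer); generate: any LLM sampling function.
    Keep only traces whose final answer matches the gold label, turning plain
    input-output pairs into (input, chain-of-thought) training examples."""
    dataset = []
    for question, gold in pairs:
        prompt = f"{question}\nLet's think step by step."
        for _ in range(samples_per_question):
            trace = generate(prompt)
            if extract_answer(trace) == str(gold).strip():
                dataset.append({"input": question, "target": trace})
    return dataset
```

The gold answer acts as the filter, so even a noisy generator can produce a clean chain-of-thought training set.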

There's a relevant paper here, *Large Language Models Can Self-Improve*. The authors find that

> fine-tuning on reasoning is critical for self-improvement

I would add that sometimes you can evaluate a result automatically, for example when generating math or code. Then you can learn from the validated outputs of the network. It's basically what let AlphaZero reach superhuman level without supervision, but it requires a kind of simulator: a game engine, a Python interpreter, or a symbolic math engine.
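For code generation, the "simulator" is just the interpreter. A minimal sketch of execution-based filtering (no sandboxing here, which you would want in practice):

```python
import os
import subprocess
import sys
import tempfile

def passes_tests(candidate_code, test_code, timeout=5):
    """Run a model-generated solution against unit tests in a subprocess.
    Returns True only if the combined program exits cleanly within the timeout."""
    program = candidate_code + "\n\n" + test_code
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)

# keep only the generations the interpreter verifies; those become new training data
# verified = [c for c in candidates if passes_tests(c, tests)]
```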
