visarga t1_ix2vuys wrote on November 20, 2022 at 9:14 AM

#588,348

You mean like this? You just prepend "The following is a conversation with [a very intelligent AI | a human expert]" to the prompt. In image generation the equivalent trick is to add artist names to the prompt ("in the style of X and Y"); these are also called "style phrases" or "vitamin phrases".
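A minimal sketch of both tricks as plain string construction. The `Q:`/`A:` layout and the helper names `build_prompt` and `add_style_phrase` are illustrative assumptions, not a tested recipe for any particular model.

```python
from typing import Sequence

def build_prompt(persona: str, question: str) -> str:
    """Prepend a persona framing line; the Q:/A: layout is an assumed convention."""
    header = f"The following is a conversation with {persona}."
    return f"{header}\n\nQ: {question}\nA:"

def add_style_phrase(image_prompt: str, artists: Sequence[str]) -> str:
    """Append an 'in the style of X and Y' phrase to an image-generation prompt."""
    return f"{image_prompt}, in the style of {' and '.join(artists)}"
```

For example, `build_prompt("a very intelligent AI", "Why is the sky blue?")` yields the framed prompt you would send to the model instead of the bare question.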

Dall-E 2 was tweaked in a similar way to produce more diverse results: when asked for a photo of a CEO or another occupation, it would add various race and gender keywords to the prompt. People were generally upset about having their prompts modified, but prepending a modifier by default might still be useful in some cases.

If you want to extract a specific style or ability from a model more precisely, you can fine-tune it on a small dataset, probably fewer than 1,000 examples. This is easy to do with the cloud APIs, though not as easy as prompting.
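A minimal sketch of preparing such a small dataset, assuming the provider accepts a JSON Lines file of prompt/completion pairs (a format some cloud fine-tuning APIs have used; exact field names and separator conventions vary, so check your provider's documentation). The helper `to_jsonl` and the 1,000-example cap are illustrative.

```python
import json

def to_jsonl(pairs, max_examples=1000):
    """Serialize (prompt, completion) pairs as JSON Lines, skipping blanks and duplicates."""
    seen = set()
    lines = []
    for prompt, completion in pairs:
        record = (prompt.strip(), completion.strip())
        if not all(record) or record in seen:
            continue  # drop empty or repeated examples
        seen.add(record)
        lines.append(json.dumps({"prompt": record[0], "completion": record[1]}, ensure_ascii=False))
        if len(lines) >= max_examples:
            break  # a small dataset is the point; cap it
    return "\n".join(lines) + "\n" if lines else ""
```

You would then write the result to disk (for example with `pathlib.Path("train.jsonl").write_text(to_jsonl(pairs), encoding="utf-8")`) and upload it through the provider's fine-tuning interface.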

massimosclaw2 t1_ix2w5bc wrote on November 20, 2022 at 9:19 AM

#588,365

Replying to visarga (#588,348)

Not quite. I think there’s value in this technique, but it’s still constrained by the probability of what GPT thinks an AI would say, based on all the similar texts in the data it was trained on, which is not quite the same thing.

visarga t1_ix2wyw6 wrote on November 20, 2022 at 9:31 AM

#588,432

Replying to massimosclaw2 (#588,365)

There is also prompt tuning, which fine-tunes only a few token embeddings while keeping the model itself frozen. This changes the problem from finding that elusive prompt to collecting a few labeled examples and fine-tuning the prompt.
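A minimal pure-Python sketch of the idea, assuming a toy "frozen model" invented for illustration (two-dimensional embeddings where dimension 0 encodes sentiment and dimension 1 encodes topic, plus a fixed read-out) in place of a real transformer. In real prompt tuning the learned vectors are prepended to the input of a frozen LLM and gradients flow back through it; here the only point is that the soft-prompt vectors are the sole trained parameters.

```python
import copy
import math

# Toy frozen "model" (hypothetical): these tables are never updated.
EMBED = {
    "good": [1.0, 0.3], "great": [1.0, -0.3],
    "bad": [-1.0, 0.3], "awful": [-1.0, -0.3],
    "movie": [0.0, 1.0], "plot": [0.0, -1.0],
}
W_OUT = [1.0, 1.0]
DIM = 2

def mean(vectors):
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]

def forward(soft_prompt, tokens):
    """The pooled soft prompt gates the pooled input features; returns (prob, h_input)."""
    h_prompt = mean(soft_prompt)
    h_input = mean([EMBED[t] for t in tokens])
    z = sum(w * p * x for w, p, x in zip(W_OUT, h_prompt, h_input))
    return 1.0 / (1.0 + math.exp(-z)), h_input

def loss(soft_prompt, data):
    """Mean binary cross-entropy over (tokens, label) pairs."""
    total = 0.0
    for tokens, y in data:
        p, _ = forward(soft_prompt, tokens)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def prompt_tune(data, n_prompt_tokens=3, lr=0.5, epochs=100):
    """SGD on the soft-prompt vectors only; EMBED and W_OUT are read, never written."""
    soft_prompt = [[0.0] * DIM for _ in range(n_prompt_tokens)]
    for _ in range(epochs):
        for tokens, y in data:
            p, h_input = forward(soft_prompt, tokens)
            for vec in soft_prompt:
                for i in range(DIM):
                    # dLoss/dz = p - y; dz/dvec[i] = W_OUT[i] * h_input[i] / n_prompt_tokens
                    vec[i] -= lr * (p - y) * W_OUT[i] * h_input[i] / n_prompt_tokens
    return soft_prompt

# A few labeled examples: the task is sentiment; topic is a distractor.
data = [(["good", "movie"], 1), (["great", "plot"], 1),
        (["bad", "movie"], 0), (["awful", "plot"], 0)]
frozen_before = copy.deepcopy((EMBED, W_OUT))
before = loss([[0.0] * DIM] * 3, data)
prompt = prompt_tune(data)
after = loss(prompt, data)
```

With a zero prompt the model outputs 0.5 for everything; after tuning, the learned prompt weights the sentiment feature and ignores the topic feature, while the "model" weights stay untouched.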

Another approach is to use an LLM to generate candidate prompts and filter them by evaluating each one. The same idea has been used to generate step-by-step reasoning traces for datasets that only have input-output pairs; training another model on the examples plus the chain of thought then gives a big jump in accuracy.
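A minimal sketch of the generate-and-filter step for reasoning traces, illustrating the general idea rather than any specific paper's code. `sample_rationale` is a hypothetical stand-in for sampling a chain of thought from an LLM; here it is a seeded toy that sometimes makes an arithmetic slip. A trace is kept only if its final answer matches the known output, so the existing input-output pairs act as the filter.

```python
import random
import re

def sample_rationale(a, b, rng):
    """Hypothetical stand-in for an LLM sampling a chain of thought for 'a + b'.
    Roughly one sample in three contains an off-by-one slip."""
    total = a + b + rng.choice([0, 0, 1])
    return f"We add {a} and {b}. {a} + {b} = {total}. The answer is {total}."

def final_answer(rationale):
    """Parse the number after 'The answer is'; None if it is missing."""
    m = re.search(r"The answer is (-?\d+)", rationale)
    return int(m.group(1)) if m else None

def build_cot_dataset(pairs, samples_per_question=4, seed=0):
    """Keep only rationales whose final answer matches the gold label."""
    rng = random.Random(seed)
    kept = []
    for (a, b), gold in pairs:
        for _ in range(samples_per_question):
            r = sample_rationale(a, b, rng)
            if final_answer(r) == gold:
                kept.append({"input": f"{a} + {b}", "rationale": r, "output": gold})
                break  # one verified trace per question is enough here
    return kept
```

The kept (input, rationale, output) records are what the second model would be fine-tuned on. Checking only the final answer can still admit a trace with flawed reasoning that happens to land on the right number. Filtering generated prompts works the same way: score each candidate prompt on held-out labeled examples and keep the best.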

There's a relevant paper, "Large Language Models Can Self-Improve". The authors find that

> fine-tuning on reasoning is critical for self-improvement
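As I understand the paper, its self-generated training data comes from sampling several chain-of-thought answers per unlabeled question and taking a majority vote (self-consistency); the reasoning paths that agree with the majority answer become fine-tuning targets. A rough sketch of that filtering step, where `samples` is a hypothetical list of (rationale, answer) pairs already sampled from a model and the `min_agreement` cutoff is an illustrative assumption, not a value taken from the paper:

```python
from collections import Counter

def self_consistency_filter(samples, min_agreement=0.5):
    """samples: (rationale, answer) pairs sampled for ONE unlabeled question.
    Returns (majority_answer, kept_rationales), or (None, []) if agreement is too low."""
    if not samples:
        return None, []
    votes = Counter(answer for _, answer in samples)
    majority, count = votes.most_common(1)[0]
    if count / len(samples) < min_agreement:
        return None, []  # too uncertain to use as a pseudo-label
    return majority, [r for r, answer in samples if answer == majority]
```

No labels are needed: agreement among the model's own samples stands in for correctness, which is why low-agreement questions are dropped rather than trained on.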

I would add that sometimes you can verify a result automatically, for example when generating math or code, and then learn from the network's validated outputs. This is basically what AlphaZero used to reach superhuman level without human supervision, but it requires some kind of simulator: a game engine, a Python interpreter, or a symbolic math engine.
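A minimal sketch of the code case, where the Python interpreter plays the role of the simulator. The `candidates` strings are hypothetical stand-ins for programs sampled from a model, and `passes_tests` is an illustrative helper: each candidate is executed against a few unit tests and only the passing ones are kept.

```python
def passes_tests(source, func_name, tests):
    """Execute candidate source, then check func_name against (args, expected) pairs.
    Any exception (syntax error, missing function, runtime error) counts as a failure."""
    namespace = {}
    try:
        exec(source, namespace)  # unsafe on untrusted output outside a sandbox
        fn = namespace[func_name]
        return all(fn(*args) == expected for args, expected in tests)
    except Exception:
        return False

# Hypothetical samples from a model asked to write add(a, b).
candidates = [
    "def add(a, b):\n    return a - b\n",  # wrong logic
    "def add(a, b):\n    return a + b\n",  # correct
    "def add(a, b) return a + b\n",        # syntax error
]
tests = [((1, 2), 3), ((-4, 4), 0)]
verified = [src for src in candidates if passes_tests(src, "add", tests)]
```

The `verified` programs, paired with their task descriptions, would become new training examples. A real pipeline would run them in an isolated sandbox with timeouts rather than calling `exec()` in-process.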