turnip_burrito t1_ja2q7t6 wrote on February 26, 2023 at 12:02 PM

Reply to comment by DizzyNobody in Raising AGIs - Human exposure by Lesterpaintstheworld

That's also interesting. It's like building a specialized "wariness" or "discernment" layer into the agent.

This really makes one wonder which kinds of pre-main and post-main processes (like other LLMs) would be useful to have.

DizzyNobody t1_ja2uka9 wrote on February 26, 2023 at 12:51 PM

I wonder if you can combine the two - have a judge that examines both input and output. Perhaps this is one way to mitigate the alignment problem. The judge/supervisory LLM could be running on the same model / weights as the main LLM, but with a much more constrained objective: prevent the main LLM from behaving in undesirable ways, either by moderating its input or by halting the main LLM when undesirable behaviour is detected. Perhaps it could even monitor the main LLM's internal state, and periodically use that to update its own weights.
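
A rough sketch of the input-and-output judge idea, purely as an illustration. Everything named here (`supervised_call`, `Verdict`, `HaltedError`, `toy_main_model`, `toy_judge`) is hypothetical, and the "models" are plain Python functions standing in for real LLM calls. It only covers the screen-the-input / screen-the-output / halt part; monitoring internal state or updating weights is not attempted.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    """Result of a judge check (hypothetical structure)."""
    allowed: bool
    reason: str = ""


class HaltedError(RuntimeError):
    """Raised when the judge stops the main model."""


def supervised_call(
    main_model: Callable[[str], str],
    judge: Callable[[str], Verdict],
    prompt: str,
) -> str:
    # Pre-main check: the judge screens the input before the main model sees it.
    pre = judge(prompt)
    if not pre.allowed:
        raise HaltedError(f"input rejected: {pre.reason}")

    output = main_model(prompt)

    # Post-main check: the same judge screens the output before it is released.
    post = judge(output)
    if not post.allowed:
        raise HaltedError(f"output rejected: {post.reason}")
    return output


# Toy stand-ins (assumptions, not real LLMs).
def toy_main_model(prompt: str) -> str:
    return f"echo: {prompt}"


def toy_judge(text: str) -> Verdict:
    # A real judge would be an LLM with a constrained objective;
    # a keyword match is used here only to keep the sketch runnable.
    if "forbidden" in text.lower():
        return Verdict(False, "contains 'forbidden'")
    return Verdict(True)
```

Usage would look like `supervised_call(toy_main_model, toy_judge, "hello")`, which returns the main model's output when both checks pass and raises `HaltedError` otherwise. Using one judge function for both checks mirrors the "same model / weights" suggestion, though separate input and output judges would work just as well.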

turnip_burrito t1_ja6re1h wrote on February 27, 2023 at 6:43 AM

I think if we had the right resources, this would make a hell of a research paper and conference talk.