daking999 t1_ix5hbok wrote on November 20, 2022 at 10:10 PM

While there are plenty of good responses here, I want to add that I don't think ~~you're~~ your idea is dumb. Taking a pre-trained LLM and using it to initialize an RL agent that tries to maximize upvotes when commenting on Reddit would be pretty interesting.
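
To make the idea concrete, here is a minimal toy sketch, not a real Reddit agent. It assumes a fixed set of three candidate comments, uses hand-picked `prior_logits` as a stand-in for a pre-trained LLM's preferences, and simulates upvotes with made-up `expected_upvotes` plus noise. The update rule is plain REINFORCE with a running-average baseline; all names and numbers are hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_upvote_policy(prior_logits, expected_upvotes, steps=2000, lr=0.05, seed=0):
    """REINFORCE on a categorical policy over a fixed set of candidate comments."""
    rng = random.Random(seed)
    logits = list(prior_logits)  # start from the "pre-trained" preferences (assumed values)
    baseline = 0.0               # running average of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(logits)), weights=probs)[0]
        # Simulated noisy upvote count; a real setup would post and read the score.
        reward = expected_upvotes[action] + rng.gauss(0.0, 1.0)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log pi(action) w.r.t. the logits is one_hot(action) - probs.
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)


# Hypothetical setup: the prior favours comment 0, but comment 2 earns the most upvotes.
prior_logits = [2.0, 0.5, 0.0]
expected_upvotes = [1.0, 3.0, 8.0]
final_probs = train_upvote_policy(prior_logits, expected_upvotes)
print(final_probs)
```

In this toy setting the policy starts out preferring the comment its prior likes and shifts probability toward whichever comment the simulated voters reward. A real version would replace the three-way choice with token-level generation from the LLM and the simulated reward with actual vote counts.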

Nameless1995 t1_ix6hzd0 wrote on November 21, 2022 at 2:50 AM

Yeah, that would be an interesting, scalable way to get human feedback. Perhaps someone is already doing it.

blazejd OP t1_ix7jusm wrote on November 21, 2022 at 10:14 AM

Interesting idea, but it would probably turn into something similar to Yannic Kilcher's 4chan model, which was super toxic, because people give the most upvotes to highly controversial topics.

daking999 t1_ix7livm wrote on November 21, 2022 at 10:40 AM

I don't think so; that's not what gets upvoted on Reddit (for the most part, on the popular subreddits). It would be moderate/left-leaning. It might even learn to be funny.