
lmericle t1_irxbw0u wrote

Reinforcement learning does seem like a good option, but IMO the reward function would need some nuance for training to work. It would have to be more complex (and less sparse) than a binary "did you choose the correct next item or not" signal, which gives the agent nothing to learn from on a near miss.
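
To make the sparse-vs-dense distinction concrete, here is a rough sketch of what I mean. Everything in it is an assumption for illustration: the names `sparse_reward` and `shaped_reward`, the idea that the policy outputs a ranked list of candidate items, the optional `similarity` callable, and the 0.5 weight are all hypothetical, not something from the original question.

```python
import math


def sparse_reward(ranked_items, target):
    """Binary signal: 1.0 only if the top-ranked item is the target."""
    return 1.0 if ranked_items and ranked_items[0] == target else 0.0


def shaped_reward(ranked_items, target, similarity=None):
    """Denser signal: partial credit for ranking the target high,
    or for picking a top item that is similar to the target.

    ranked_items: item ids, best guess first (hypothetical policy output).
    similarity:   optional hypothetical callable (a, b) -> float in [0, 1].
    """
    if not ranked_items:
        return 0.0

    # Rank-discounted credit (DCG-style): 1.0 at rank 0, decaying with rank.
    rank_credit = 0.0
    if target in ranked_items:
        rank = ranked_items.index(target)
        rank_credit = 1.0 / math.log2(rank + 2)

    # Near-miss credit, capped at 0.5 (an arbitrary, assumed weight).
    sim_credit = 0.0
    if similarity is not None and ranked_items[0] != target:
        sim_credit = 0.5 * similarity(ranked_items[0], target)

    return max(rank_credit, sim_credit)
```

With the binary version, putting the correct item second scores the same as not listing it at all. The shaped version still gives 1.0 for an exact hit but hands out partial credit otherwise, so the agent gets a gradient to follow. Whether rank discounting or similarity is the right kind of shaping depends entirely on the task.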
