Submitted by altmorty t3_113x9ir in technology
gurenkagurenda t1_j8vnlyo wrote
Reply to comment by anti-torque in ChatGPT is a robot con artist, and we’re suckers for trusting it by altmorty
I think you must be getting confused because of the "reward predictor". The reward predictor is a separate model which is used in training to reduce the amount of human effort needed to train the main model. Think of it as an amplifier for human feedback. Prediction is not what the model being trained does.
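To make the "amplifier" idea concrete, here is a toy sketch in Python. It is an illustration under made-up assumptions, not how ChatGPT is actually trained: each output is reduced to a single hypothetical feature number, the three human comparisons are invented, and the reward predictor is a one-weight Bradley-Terry style model I picked for simplicity. The names `human_preferences`, `train_reward_predictor`, and `predicted_reward` are all hypothetical.

```python
import math

# Hypothetical toy data: each model output is reduced to one feature number,
# and a human has compared only a handful of output pairs.
# Each tuple is (feature of preferred output, feature of rejected output).
human_preferences = [
    (0.9, 0.1),
    (0.8, 0.3),
    (0.7, 0.2),
]

def train_reward_predictor(pairs, steps=500, lr=0.5):
    """Fit a one-weight reward predictor r(x) = w * x from pairwise preferences."""
    w = 0.0
    for _ in range(steps):
        grad = 0.0
        for preferred, rejected in pairs:
            diff = preferred - rejected
            # Probability the predictor assigns to the human's actual choice.
            p = 1.0 / (1.0 + math.exp(-w * diff))
            # Gradient of the log-likelihood of that choice with respect to w.
            grad += (1.0 - p) * diff
        # Gradient ascent step on the average log-likelihood.
        w += lr * grad / len(pairs)
    return w

w = train_reward_predictor(human_preferences)

def predicted_reward(feature):
    """Score an output using the learned predictor instead of asking a human."""
    return w * feature

# The predictor can now score many new outputs with no further human labels.
new_outputs = [i / 100 for i in range(100)]
rewards = [predicted_reward(x) for x in new_outputs]
```

Three human comparisons end up producing a reward signal for a hundred outputs, and it is that signal, not the predictor itself, that would be used to update the main model.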
anti-torque t1_j8xd81i wrote
Yes, I see the meanings as different, because I was thinking the context of the question would bias the result.