
tdgros t1_j41fn3f wrote

At train time, you attach decoders at several depths, all trained with the same objective, plus a small auxiliary network that outputs a sort of confidence score, so you can find out whether some inputs can be decoded earlier. At inference time, you run the layers one by one and stop as soon as the confidence is high enough, which lets you skip the remaining computation. (This is probably a simplistic description, feel free to correct me.)
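The idea can be sketched in a few lines. This is a toy NumPy illustration of the inference-time early-exit loop described above, not anyone's actual implementation: all layer/decoder/confidence-head names and shapes are made up, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "backbone" with a decoder head and a scalar confidence head
# at every layer (hypothetical shapes, random untrained weights).
n_layers, dim, n_classes = 4, 8, 3
layers     = [rng.normal(size=(dim, dim)) for _ in range(n_layers)]
decoders   = [rng.normal(size=(dim, n_classes)) for _ in range(n_layers)]
conf_heads = [rng.normal(size=dim) for _ in range(n_layers)]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def early_exit_forward(x, threshold=0.9):
    """Run layers one by one; exit at the first level whose
    confidence head clears the threshold (last layer always exits)."""
    h = x
    for i, (W, D, c) in enumerate(zip(layers, decoders, conf_heads)):
        h = np.tanh(h @ W)            # one backbone layer
        confidence = sigmoid(h @ c)   # auxiliary confidence score
        if confidence >= threshold or i == n_layers - 1:
            logits = h @ D            # decode at this depth
            return logits, i          # i = exit depth; deeper layers are skipped

x = rng.normal(size=dim)
logits, depth = early_exit_forward(x, threshold=0.5)
print(f"exited at layer {depth}, predicted class {int(np.argmax(logits))}")
```

At train time you would instead run all exits and sum their losses (plus a loss for the confidence heads); the threshold then trades accuracy against the compute saved by exiting early.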
