FutureIsMine t1_j41l2ck wrote

There's ALBERT, which reuses the same layer weights throughout. I can see a case where ALBERT is used alongside a decoder that's only a few neurons, where at each step it uses a token in the input to determine whether it's time to stop. ReasoNet did something similar for Q&A.
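To make the idea concrete, here is a rough sketch in plain Python of a single reused layer with a tiny halting head that reads one token to decide when to stop. Everything in it is an assumption for illustration: the names `shared_layer`, `halt_probability`, and `run_adaptive`, the mean-mixing step standing in for attention, the choice of token 0 as the halting token, the 0.5 threshold, and the random untrained weights. It is not ALBERT's or ReasoNet's actual implementation.

```python
import math
import random


def shared_layer(tokens, weights):
    """Apply the one reused layer: mix each token with the sequence mean, project, tanh."""
    dim = len(tokens[0])
    mean = [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]
    out = []
    for t in tokens:
        # Crude stand-in for attention (assumption): average the token with the mean.
        mixed = [0.5 * (a + b) for a, b in zip(t, mean)]
        out.append([math.tanh(sum(w * x for w, x in zip(row, mixed))) for row in weights])
    return out


def halt_probability(token, halt_weights, halt_bias):
    """Tiny halting head: a single logistic unit over one token's state."""
    z = sum(w * x for w, x in zip(halt_weights, token)) + halt_bias
    return 1.0 / (1.0 + math.exp(-z))


def run_adaptive(tokens, weights, halt_weights, halt_bias, threshold=0.5, max_steps=12):
    """Reapply the same layer until the halting head fires or max_steps is reached."""
    steps = 0
    for steps in range(1, max_steps + 1):
        tokens = shared_layer(tokens, weights)  # same weights at every step
        # Token 0 is used as the halting token (assumption).
        if halt_probability(tokens[0], halt_weights, halt_bias) >= threshold:
            break
    return tokens, steps


# Untrained random weights, only to show the control flow.
rng = random.Random(0)
dim = 4
weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
halt_weights = [rng.uniform(-1, 1) for _ in range(dim)]
tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]

final_tokens, steps_used = run_adaptive(tokens, weights, halt_weights, halt_bias=0.0)
print("steps used:", steps_used)
```

In a trained model the halting head would be learned jointly with the shared layer, usually with some penalty on depth so it doesn't always run to `max_steps`; the hard threshold here is just the simplest possible stopping rule.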

3