PassionatePossum t1_iv05451 wrote

I would only include it as a historical reference. It is certainly not a "must read" paper. It is written so poorly that you are better off just looking at the code.

1

flaghacker_ t1_iv5jf05 wrote

What's wrong with it? They explain all the components of their model in enough detail (in particular the multi-head attention stuff), provide intuition behind certain decisions, include clear results, have nice pictures, ... What could have been improved about it?

2