
incrediblediy t1_ir4ssxg wrote

What is the "sequence length" in BERT?

3

minimaxir t1_ir6bisc wrote

The number of tokens in the input. Self-attention compute scales quadratically with the sequence length.

Pretrained BERT takes in a maximum of 512 tokens.
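If it helps, here is a minimal sketch with the HuggingFace `transformers` tokenizer (the `bert-base-uncased` checkpoint is just an example) showing where the 512-token cap kicks in:

```python
# Minimal sketch: pretrained BERT tokenizers cap input at 512 tokens,
# so longer text has to be truncated (or split into chunks).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "some very long document " * 200  # deliberately longer than 512 tokens

# truncation=True cuts the sequence off at max_length (512 for BERT)
encoded = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")

print(tokenizer.model_max_length)   # 512
print(encoded["input_ids"].shape)   # torch.Size([1, 512])
```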

2

incrediblediy t1_ir7bljy wrote

Yeah! I mean, what seq_length did OP use? :) And also the batch size? :) I have tried seq_length = 300, but only with a small batch size in Colab, especially with AdamW instead of Adam.
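For reference, roughly the kind of setup I mean, as a minimal sketch with HuggingFace transformers and torch's AdamW (the checkpoint, labels, and the tiny batch here are placeholders, not OP's actual config):

```python
# Sketch of a single fine-tuning step with seq_length=300, a small batch,
# and AdamW, which is about what fits in Colab's GPU memory for me.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Placeholder batch (real batch size would be whatever fits in memory)
texts = ["example sentence one", "example sentence two"]
labels = torch.tensor([0, 1])

batch = tokenizer(
    texts, truncation=True, padding="max_length", max_length=300,
    return_tensors="pt"
)

optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)  # forward pass, returns loss
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```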

3