masterofn1

masterofn1 t1_jdu8jug wrote

How does the Transformer architecture handle inputs of different lengths? Is the sequence length limit inherent to the architecture itself, or is it more a consequence of resource constraints such as memory?
