oathbreakerkeeper t1_jd43931 wrote on March 21, 2023 at 6:14 PM

I'm using AMP mixed precision, which should be using fp16. It still requires training == False.

But the torch code also disables flash attention if autocast is enabled, and I'm not sure how to deal with that one.
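
For anyone who wants to check those two conditions on their own setup, here is a rough sketch. `check_conditions` is a hypothetical helper, not something from torch, and it only looks at the two things mentioned above (module in eval mode, autocast off); the actual torch code may check more than this.

```python
import torch
import torch.nn as nn

def check_conditions(module: nn.Module) -> dict:
    """Hypothetical helper: report the two conditions discussed above."""
    return {
        # module.training is True after construction or .train(), False after .eval()
        "eval_mode": not module.training,
        # torch.is_autocast_enabled() is True only inside an active autocast context
        "autocast_off": not torch.is_autocast_enabled(),
    }

# Small layer sizes chosen arbitrarily for illustration.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4)
print(check_conditions(layer))         # freshly built modules start in training mode
print(check_conditions(layer.eval()))  # .eval() sets module.training to False
```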

Dependent_Ad5120 t1_jdec7kx wrote on March 23, 2023 at 7:57 PM

I don't know. I was using pure fp16 with no autocast, and it works.

Out of curiosity, how do you use pure fp16? I've only ever trained with mixed precision and let PyTorch handle the fp16 casting.

Do you have an example of a github repo that does it?

I don't have a GitHub repo for this, but it is pretty simple:

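```
import torch
import torch.nn as nn
from torch.backends.cuda import sdp_kernel

# Pure fp16: cast the model and the input to half, no autocast.
model = nn.Transformer().cuda().half()
input = torch.rand(...).cuda().half()  # fill in your own input shape

# Enable only the flash attention kernel.
with sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    output = model(input, input)  # nn.Transformer.forward takes (src, tgt)
```

Apart from the imports, these 4 lines should be enough.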
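
If you want something you can paste and run, here is a minimal sketch of the same idea. It calls `F.scaled_dot_product_attention` directly rather than going through `nn.Transformer`, and the function name `flash_attention_fp16` and the tensor sizes are my own assumptions for illustration, not anything from the snippet above. It needs a CUDA GPU; on a machine without one it just returns `None`.

```python
import torch
import torch.nn.functional as F
from torch.backends.cuda import sdp_kernel  # deprecated in newer PyTorch releases

def flash_attention_fp16(batch=2, heads=4, seq_len=128, head_dim=64):
    """Run attention in pure fp16 with only the flash kernel enabled.

    Hypothetical helper; the default sizes are arbitrary.
    Returns None when no CUDA device is available.
    """
    if not torch.cuda.is_available():
        return None
    # Create q, k, v directly as fp16 CUDA tensors: no autocast involved.
    q, k, v = (
        torch.rand(batch, heads, seq_len, head_dim,
                   device="cuda", dtype=torch.float16)
        for _ in range(3)
    )
    # With math and mem-efficient kernels disabled, this raises a RuntimeError
    # if the flash kernel cannot handle these inputs on this GPU.
    with sdp_kernel(enable_flash=True, enable_math=False,
                    enable_mem_efficient=False):
        return F.scaled_dot_product_attention(q, k, v)

out = flash_attention_fp16()
if out is None:
    print("no CUDA device, skipped")
else:
    print(out.dtype, tuple(out.shape))
```

Disabling the other two kernels is a handy check: if the call succeeds, you know the flash kernel is the one that ran.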