
CommunismDoesntWork t1_iydruw8 wrote

Has anyone checked to see if training fundamentally needs all that precision? Intuitively, I can understand why it works better that way, but if a model can be converted to int8 after the fact without taking a huge hit in accuracy, then I don't see why an optimizer couldn't find that int8 network in the first place.
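For context on the "converted to int8 after the fact" part, here is a minimal sketch of what that conversion can look like, assuming the simplest scheme (symmetric, one scale per tensor). The function names `quantize_int8` and `dequantize` and the random Gaussian "weights" are hypothetical, made up for illustration; real toolchains usually add calibration and per-channel scales on top of this.

```python
import random

def quantize_int8(weights):
    """Symmetric per-tensor quantization: one float scale, integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round each weight to the nearest integer step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the integers back to floats to see what the int8 model effectively stores."""
    return [v * scale for v in q]

# Hypothetical stand-in for a trained layer's weights (assumption: roughly Gaussian).
random.seed(0)
weights = [random.gauss(0.0, 0.05) for _ in range(1000)]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding to the nearest step bounds the per-weight error by half a step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"scale={scale:.6f}  max round-trip error={max_err:.6f}")
```

In this scheme each weight moves by at most half a quantization step, which is one way to see why a trained float model can often tolerate the conversion. It says nothing about whether an optimizer can take useful steps on that grid during training.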


diviramon t1_iydw5aq wrote

Yeah, a quick search turned up some attempts at INT8 training on ResNet-50 and MobileNet, but nothing on transformers (not surprising, since INT8 quantization for BERT is very hard). However, it seems like the INT8 focus is shifting towards FP8, which should be more suitable for training as well.
