suflaj t1_j3u4smq wrote
Reply to comment by Think_Olive_1000 in [D] Will NLP Researchers Lose Our Jobs after ChatGPT? by singularpanda
Google's use of BERT is not a commercial consumer product, it's an enterprise one: Google runs it on its own hardware. They presumably use the large variant, or something even bigger than the pretrained weights available on the internet, and to hit the latencies they do they rely on datacentres and non-trivial distribution schemes, not consumer hardware.
Meanwhile, your average CPU needs anywhere from 1-4 seconds for a single inference pass in ONNX Runtime. A GPU is of course much faster, but to be truly cross-platform you're targeting JS in most cases, which means CPU, and a stack nowhere near as mature as Python/C++/CUDA. You can check that latency yourself with something like the sketch below.
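A rough latency check, assuming you've already exported a BERT-style model to ONNX; the file name and input names ("input_ids", "attention_mask") are placeholders and depend on how the model was exported:

```python
import time
import numpy as np
import onnxruntime as ort

# Force CPU execution to mimic consumer hardware without a usable GPU stack.
sess = ort.InferenceSession("bert-base.onnx", providers=["CPUExecutionProvider"])

# Dummy batch: one sequence of 128 tokens.
batch = {
    "input_ids": np.random.randint(0, 30000, size=(1, 128), dtype=np.int64),
    "attention_mask": np.ones((1, 128), dtype=np.int64),
}

sess.run(None, batch)  # warm-up pass so we don't time session setup
start = time.perf_counter()
sess.run(None, batch)
print(f"one inference pass: {time.perf_counter() - start:.2f}s")
```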
What I'm saying is:
- people have said no to paid services, they want free products
- consumer hardware has not scaled nearly as fast as DL
- even ancient models are still too slow to run on consumer hardware after years of improvement
- distilling, quantizing, and optimizing them gets them to run just fast enough not to be a nuisance, but that work is often too tedious to justify for a free product (a minimal quantization sketch follows this list)
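For the quantization part alone, ONNX Runtime's post-training dynamic quantization is about as cheap as it gets; file names here are placeholders, and distillation and graph-level optimization are separate steps not shown:

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Convert fp32 weights to int8; typically ~4x smaller and faster on CPU,
# at the cost of some accuracy you still have to validate yourself.
quantize_dynamic(
    "bert-base.onnx",       # exported fp32 model (placeholder path)
    "bert-base-int8.onnx",  # quantized output
    weight_type=QuantType.QInt8,
)
```

Even then, you still have to benchmark and re-evaluate the quantized model per task, which is exactly the tedium that rarely pays off for a free product.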