suflaj t1_j3u4smq wrote
Reply to comment by Think_Olive_1000 in [D] Will NLP Researchers Lose Our Jobs after ChatGPT? by singularpanda
Google's use of BERT is not a commercial consumer product, it's an enterprise one: Google runs it on its own hardware. They presumably use the large variant, or something even bigger than the pretrained weights available on the internet, and to hit the latencies they do they rely on datacentres and non-trivial distribution schemes, not consumer hardware.
Meanwhile, your average CPU needs anywhere from 1-4 seconds for a single inference pass in ONNX Runtime. A GPU is of course much faster, but to be truly cross-platform you're targeting JS in most cases, which means CPU, and a stack nowhere near as mature as Python/C++/CUDA. You can check that latency yourself with something like the sketch below.
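A rough latency check, assuming you've already exported a BERT-style model to ONNX; the file name and input names ("input_ids", "attention_mask") are placeholders and depend on how the model was exported:

```python
import time
import numpy as np
import onnxruntime as ort

# Force CPU execution to mimic consumer hardware without a usable GPU stack.
sess = ort.InferenceSession("bert-base.onnx", providers=["CPUExecutionProvider"])

# Dummy batch: one sequence of 128 tokens.
batch = {
    "input_ids": np.random.randint(0, 30000, size=(1, 128), dtype=np.int64),
    "attention_mask": np.ones((1, 128), dtype=np.int64),
}

sess.run(None, batch)  # warm-up pass so we don't time session setup
start = time.perf_counter()
sess.run(None, batch)
print(f"one inference pass: {time.perf_counter() - start:.2f}s")
```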
What I'm saying is:
- people have said no to paid services, they want free products
- consumer hardware has not scaled nearly as fast as DL
- even ancient models are still too slow to run on consumer hardware after years of improvement
- distilling, quantizing, and optimizing them gets them to run just fast enough not to be a nuisance, but that work is often too tedious to justify for a free product (a minimal quantization sketch follows this list)
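For the quantization part alone, ONNX Runtime's post-training dynamic quantization is about as cheap as it gets; file names here are placeholders, and distillation and graph-level optimization are separate steps not shown:

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Convert fp32 weights to int8; typically ~4x smaller and faster on CPU,
# at the cost of some accuracy you still have to validate yourself.
quantize_dynamic(
    "bert-base.onnx",       # exported fp32 model (placeholder path)
    "bert-base-int8.onnx",  # quantized output
    weight_type=QuantType.QInt8,
)
```

Even then, you still have to benchmark and re-evaluate the quantized model per task, which is exactly the tedium that rarely pays off for a free product.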