Probably because the startup overhead dominates over the processing time. 500 weights is not really something you can apply to real life, as modern neural networks are 100+ million parameters for consumer hardware, and not on a dataset which is considered solved.
Viewing a single comment thread. View all comments