
suflaj t1_isbgg32 wrote

Probably because the startup overhead dominates the processing time. A model with 500 weights doesn't really tell you anything about real life: modern neural networks, even ones sized for consumer hardware, have 100+ million parameters, and they aren't trained on a dataset that is considered solved.
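To make the overhead argument concrete, here is a toy cost model. It assumes total runtime is a fixed startup cost plus a compute cost proportional to parameter count. The constants `STARTUP_SECONDS` and `SECONDS_PER_PARAM` are made-up illustrative values, not measurements of any real framework or device.

```python
# Toy cost model: runtime = fixed startup cost + cost proportional to params.
# Both constants are hypothetical, chosen only to illustrate the scaling.
STARTUP_SECONDS = 0.5      # assumed fixed overhead per run (launch, imports, device init)
SECONDS_PER_PARAM = 1e-7   # assumed compute cost per parameter over the whole run


def overhead_fraction(num_params: int) -> float:
    """Share of total runtime spent on the fixed startup overhead."""
    compute = SECONDS_PER_PARAM * num_params
    return STARTUP_SECONDS / (STARTUP_SECONDS + compute)


if __name__ == "__main__":
    for n in (500, 100_000_000):
        print(f"{n:>11,} params: {overhead_fraction(n):.1%} of runtime is overhead")
```

With these assumed numbers, the 500-weight model spends essentially all of its time on startup, while the 100M-parameter model spends under 5% there. That is why a tiny benchmark can't tell you much about how a framework performs on realistic workloads.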
