Open-Dragonfly6825 (OP) wrote:

One question: what do you mean by "kernels" here? Is it the CNN operation you apply to the layers? (As I said, I am not familiar with deep learning, and "kernel" means something else in GPU and FPGA programming.)

I know about TPUs and I understand they are the "best solution" for deep learning. However, I did not mention them since I won't be working with them.

Why wouldn't GPU parallelization make inference faster? Isn't inference also composed mainly of matrix multiplications? Maybe I don't fully understand how training on a GPU is performed and how it differs from inference.

suflaj wrote:

I mean kernels in the sense of functions.

> Why wouldn't GPU parallelization make inference faster?

Because most DL models are deep, not wide. As I've already explained, deep means a long serial chain of layers, where each layer has to wait for the output of the previous one. That chain can't be parallelized except through data parallelism, which doesn't speed up a single inference, and model parallelism (generally not implemented, and it has heavy IO costs).
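A toy sketch of the argument above, in pure Python. It is only an illustration under simplifying assumptions: it models a plain fully connected network rather than a CNN, and the helper names `make_layers`, `forward` and `forward_batch` are hypothetical, not from the thread. Within one layer, every output element is independent (the part a GPU parallelizes), but layer k needs layer k-1's result, so the number of sequential steps equals the depth. Batching more inputs (data parallelism) raises throughput without shortening that chain.

```python
import random


def make_layers(depth, width, seed=0):
    # Hypothetical helper: random square weight matrices, one per layer.
    rng = random.Random(seed)
    return [[[rng.uniform(-1, 1) for _ in range(width)] for _ in range(width)]
            for _ in range(depth)]


def matvec(weights, x):
    # Each output element uses only x and one row of weights, so all rows
    # could be computed at the same time; this is the parallelism a GPU exploits.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(v):
    return [max(0.0, a) for a in v]


def forward(layers, x):
    # Layer k consumes the output of layer k-1: a serial chain whose length
    # is the depth, no matter how many cores are available.
    steps = 0
    for weights in layers:
        x = relu(matvec(weights, x))
        steps += 1
    return x, steps


def forward_batch(layers, batch):
    # Data parallelism: independent inputs are processed together at each layer,
    # but every input still passes through all layers in order.
    steps = 0
    for weights in layers:
        batch = [relu(matvec(weights, x)) for x in batch]
        steps += 1
    return batch, steps


if __name__ == "__main__":
    layers = make_layers(depth=50, width=8)
    _, single_steps = forward(layers, [1.0] * 8)
    _, batch_steps = forward_batch(layers, [[1.0] * 8 for _ in range(32)])
    print(single_steps, batch_steps)  # 50 50
```

Running 32 inputs as a batch does 32 times the work per step, yet the sequential step count stays at 50, the depth of the network. That is why batching helps throughput but not the latency of one inference.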

How wide models relate to deep ones, and how they can be made equivalent, is unexplored, although wide models are theoretically just as expressive.

Open-Dragonfly6825 (OP) wrote:

OK, that makes sense. I just wanted to confirm I understood it correctly.

Thank you.