
Zironic t1_j6nh61n wrote

>Your CPU can do maybe 8 unique instructions on a single piece of data each.

A modern CPU core can run 3 instructions per cycle on 512 bits of data, making each core equivalent to about 96 basic shaders. Even so, a 20-core CPU can't keep up with even a low-end GPU in raw parallel throughput.
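For anyone wondering where a number like 96 could come from, here is a back-of-the-envelope sketch. The 32-bit lane width and counting a fused multiply-add as two operations are my assumptions, not something stated above; other accounting would give a different figure.

```python
# Rough per-core throughput estimate. All constants except the first and
# third are assumptions made for illustration.
VECTOR_BITS = 512        # vector width from the comment above
FLOAT_BITS = 32          # assumption: single-precision lanes
INSTR_PER_CYCLE = 3      # issue rate from the comment above
OPS_PER_FMA = 2          # assumption: a fused multiply-add counts as 2 ops

lanes = VECTOR_BITS // FLOAT_BITS                 # 32-bit lanes per register
lane_ops_per_cycle = lanes * INSTR_PER_CYCLE      # lane-instructions per cycle
flops_per_cycle = lane_ops_per_cycle * OPS_PER_FMA

print(lanes, lane_ops_per_cycle, flops_per_cycle)
```

Under those assumptions a 20-core part would land at roughly 20 times that per cycle, which is still well short of the shader counts on current GPUs.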

>CPUs are better at calculations that only need to be done on a single piece of data since they are clocked higher and no latency to setup.

The real benefit isn't the clock rate. If that were the main difference, we wouldn't be using CPUs anymore, because the clock rates aren't that far apart.

What CPUs have that GPUs do not is branch prediction, plus very advanced data pipelines and instruction queues, which give per-core performance a good order of magnitude better than a shader for anything that involves branches.


Thrawn89 t1_j6nkd1y wrote

True, SIMD is absolutely abysmal at branches since it needs to take both true and false cases for the entire wave (usually). There are optimizations that GPUs do so it's not always terrible though.
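A toy model of that, in plain Python rather than real shader code: the function name `run_wave` and the example branch are made up for illustration. Each side of the branch is executed for the whole wave with a lane mask, and a side is skipped only when no lane needs it, which is roughly the kind of optimization mentioned above.

```python
def run_wave(data):
    """Simulate one SIMD wave running `x * 2 if x > 0 else -x` with lane masks.

    Returns the per-lane results and how many branch sides were executed.
    """
    mask = [x > 0 for x in data]      # per-lane branch predicate
    out = list(data)
    passes = 0

    if any(mask):                     # true side runs if at least one lane takes it
        passes += 1
        out = [x * 2 if m else o for x, m, o in zip(data, mask, out)]

    if not all(mask):                 # false side runs if at least one lane takes it
        passes += 1
        out = [-x if not m else o for x, m, o in zip(data, mask, out)]

    return out, passes


divergent = run_wave([1, -2, 3, -4])  # lanes disagree: both sides execute
uniform = run_wave([1, 2, 3, 4])      # all lanes agree: one side is skipped
```

With divergent lanes the wave pays for both sides even though each lane only needs one of them; with uniform lanes it pays for a single side.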

It sounds like you're discussing vector processing instruction set with 512 bits which are very much specialized for certain tasks such as memcpy and not much else? That's just an example of a small SIMD on the CPU.


Zironic t1_j6nzase wrote

>It sounds like you're discussing vector processing instruction set with 512 bits which are very much specialized for certain tasks such as memcpy and not much else? That's just an example of a small SIMD on the CPU.

The vector instruction set is primarily for floating point math but also does integer math. It's only specialized for certain tasks insofar as those tasks are SIMD. It takes advantage of the fact that doing a math operation across an entire vector register is as fast as doing it on just a single word.
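A rough way to picture it, as a plain-Python simulation rather than real intrinsics: the 16-lane width assumes 32-bit values in a 512-bit register, and `scalar_add` / `vector_add` are hypothetical helpers that just count how many add instructions would be issued.

```python
LANES = 16  # assumption: 512-bit register holding 32-bit values


def scalar_add(a, b):
    """One add instruction per element."""
    out, issued = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        issued += 1
    return out, issued


def vector_add(a, b, lanes=LANES):
    """One add instruction per register-sized chunk of `lanes` elements."""
    out, issued = [], 0
    for i in range(0, len(a), lanes):
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        issued += 1
    return out, issued


a = list(range(64))
b = [1] * 64
scalar_result, scalar_issued = scalar_add(a, b)
vector_result, vector_issued = vector_add(a, b)
```

Same results either way, but the vector version issues one instruction per 16 elements instead of one per element.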

In practice most programs don't lend themselves to vectorisation so it's mostly used for physics simulations and the like.
