This is wrong, see:

The real reason is it's just faster to train on smaller batches (because the steps are quicker).
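
To make the "steps are quicker" point concrete, here is a rough NumPy sketch. It's my own toy setup, not taken from any paper: linear regression on synthetic data, where both runs get the same budget of per-example gradient evaluations. The names `train` and `budget`, the learning rate, and the batch size of 50 are all assumptions picked for illustration.

```python
import numpy as np

def mse_loss(w, X, y):
    # Half mean squared error of a linear model.
    r = X @ w - y
    return 0.5 * np.mean(r ** 2)

def mse_grad(w, X, y):
    # Gradient of mse_loss; cost grows linearly with the number of rows in X.
    return X.T @ (X @ w - y) / len(y)

def train(X, y, batch_size, budget, lr=0.1, seed=0):
    # Run gradient steps until `budget` per-example gradient evaluations are used.
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    steps, used = 0, 0
    while used + batch_size <= budget:
        # batch_size == len(y) selects every example, i.e. full-batch descent.
        idx = rng.choice(len(y), size=batch_size, replace=False)
        w -= lr * mse_grad(w, X[idx], y[idx])
        used += batch_size
        steps += 1
    return w, steps

# Synthetic data (hypothetical problem, chosen only to keep the example small).
rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = np.arange(1.0, d + 1.0)
y = X @ w_true + 0.1 * rng.normal(size=n)

budget = 5 * n  # same amount of gradient work for both runs
w_full, full_steps = train(X, y, batch_size=n, budget=budget)
w_mini, mini_steps = train(X, y, batch_size=50, budget=budget)
full_loss = mse_loss(w_full, X, y)
mini_loss = mse_loss(w_mini, X, y)

print(f"full batch: {full_steps} steps, loss {full_loss:.4f}")
print(f"minibatch:  {mini_steps} steps, loss {mini_loss:.4f}")
```

With the same budget the full-batch run gets 5 steps and the minibatch run gets 100, so on this toy problem the minibatch run should end at a much lower loss. Wall-clock behaviour on real hardware also depends on how well larger batches parallelise, which this sketch ignores.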


Yes, that too. But my explanation wasn't incorrect, was it? It just needed more added to it.


For a while there was a belief that stochasticity was key to good performance (one 2016 paper supported the hypothesis). Your framing makes it sound like that is still the case - you give no other reason for avoiding full-batch descent - and I think it's important to point out that it's not.