You'd think so, but I'm pretty sure the answer is no. They're different models, and different types of models too, I think? Isn't most image captioning GAN-based?

One thing that's interesting about this question is that diffusion models, as I understand them (not too well), already involve a kind of "reversal" in their training: noise gets added to an image bit by bit until the image vanishes, and the model learns to undo that, so it can then create an image starting from "pure" noise.
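
To make the "adding noise until it vanishes" part concrete, here's a rough sketch of just the noising half, not any real model's code. The linear schedule, the 1000 steps, the four-number toy "image", and the function names `alpha_bar_schedule` and `add_noise` are all my own assumptions for illustration. The learned denoising half is left out entirely.

```python
import math
import random

def alpha_bar_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal left after each step.

    Assumes a linear schedule for the per-step noise amount (beta);
    the specific numbers here are illustrative defaults.
    """
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(pixels, alpha_bar, rng):
    """Blend the image with Gaussian noise at a single noise level."""
    signal = math.sqrt(alpha_bar)        # how much of the image survives
    noise = math.sqrt(1.0 - alpha_bar)   # how much noise is mixed in
    return [signal * p + noise * rng.gauss(0.0, 1.0) for p in pixels]

alpha_bars = alpha_bar_schedule()
image = [0.9, -0.3, 0.5, 0.1]   # toy "image" of four pixel values
rng = random.Random(0)          # fixed seed, so the noise is repeatable

slightly_noisy = add_noise(image, alpha_bars[0], rng)      # still mostly image
almost_pure_noise = add_noise(image, alpha_bars[-1], rng)  # image has all but vanished
```

By the last step the image's share of the result is tiny, which is the "vanishes" part. Generation runs the other direction with a trained network, starting from noise drawn with some seed.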

Just in a really non-mathy way, I wonder how OP imagines this accommodating rerolling? Would it provide an image seed?

