Submitted by AdministrationOk2735 t3_11229f7 in MachineLearning
As I'm learning about how Stable Diffusion works, I can't figure out why image generation needs to involve 'noise' at all.
I know I'm glossing over a lot of details, but my understanding is that the algorithm is trained by gradually adding noise to an image and then denoising it to recover the original image. Wouldn't this be functionally equivalent to a machine that starts with an image, gradually reduces it to a blank canvas (all white), and then gradually reconstructs the original image? Then, after training, the generative process would just start with a blank canvas and gradually generate the image based on the text prompt provided.
The idea of generating an image from a blank canvas feels more satisfying to me than revealing an image hidden by noise, but I'm sure there's a mathematical/technical reason why what I'm suggesting doesn't work. Appreciate any insight into this!
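For reference, the "add noise, then learn to remove it" training described above is usually implemented as a single noise-prediction step per example rather than a full add-then-remove pass. Below is a minimal DDPM-style sketch of one training step; the model, noise schedule, and tensor shapes are illustrative assumptions, not Stable Diffusion's actual code.

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # illustrative noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal-retention factors

def training_step(model, x0: torch.Tensor) -> torch.Tensor:
    """One training step: noise a clean image x0 at a random timestep,
    then ask the model to predict the noise that was added."""
    t = torch.randint(0, T, (x0.shape[0],))
    noise = torch.randn_like(x0)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise    # corrupted image at step t
    pred = model(x_t, t)                            # model tries to recover the noise
    return F.mse_loss(pred, noise)

# Dummy model standing in for the U-Net, just so the sketch runs end to end.
class DummyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)
    def forward(self, x, t):
        return self.conv(x)

loss = training_step(DummyModel(), torch.randn(4, 3, 64, 64))
print(loss.item())
```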
NoLifeGamer2 t1_j8hmag2 wrote
To my understanding, if you start from noise, you can generate different images with the same trained model just by changing the noise (e.g., by using a different random seed). If you start from a blank canvas, there is only one possible starting point, so you would get only one output image for a given prompt.
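Here is a minimal sketch of that point, with a toy denoiser standing in for the trained network; the function names, step counts, and image sizes are illustrative assumptions, not Stable Diffusion's actual sampler.

```python
import torch

def toy_denoiser(x: torch.Tensor, t: int) -> torch.Tensor:
    # Hypothetical stand-in for a trained noise-prediction network:
    # it just nudges the image toward a smoother version of itself.
    return x - 0.1 * (x - x.mean())

def sample(start: torch.Tensor, steps: int = 50) -> torch.Tensor:
    x = start.clone()
    for t in reversed(range(steps)):
        x = toy_denoiser(x, t)  # many real samplers also re-inject scheduled noise here
    return x

# Starting from random noise: different seeds -> different outputs.
torch.manual_seed(0)
img_a = sample(torch.randn(3, 64, 64))
torch.manual_seed(1)
img_b = sample(torch.randn(3, 64, 64))
print(torch.allclose(img_a, img_b))  # False: two distinct samples

# Starting from a blank canvas: one fixed start -> one fixed output.
blank = torch.ones(3, 64, 64)
print(torch.allclose(sample(blank), sample(blank)))  # True: always the same image
```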