ryunuck

ryunuck t1_ir6aulx wrote

Yet I open up the codebase and I still can't understand shit. What are we getting out of the model when we run inference on it? It's not an image; it looks like some sort of "bag of imagery", and we have a sampler that samples this bag. How does that work, exactly? I hate these high-level explanations; they don't explain anything. No one can read this article and reimplement Stable Diffusion. I look at the different samplers implemented in k-diffusion and I'm left mystified.
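For what it's worth, here is a minimal sketch of what a k-diffusion-style sampler loop does, assuming a toy denoiser in place of the real model. `get_sigmas_karras` and `sample_euler` borrow their names and formulas from k-diffusion, but this is a NumPy re-sketch with simplified signatures, not the library code. `toy_denoiser` and `target` are hypothetical, and the sigma range 0.0292 to 14.6146 is an assumption about roughly what SD v1's noise schedule spans. The "bag" being sampled is a latent tensor, (1, 4, 64, 64) for a 512x512 image; at each noise level the sampler asks the model for a clean guess and takes a step toward it.

```python
import numpy as np

def get_sigmas_karras(n, sigma_min, sigma_max, rho=7.0):
    """Karras et al. (2022) noise levels, high to low, with a final 0 appended."""
    ramp = np.linspace(0.0, 1.0, n)
    min_inv_rho = sigma_min ** (1.0 / rho)
    max_inv_rho = sigma_max ** (1.0 / rho)
    sigmas = (max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho
    return np.append(sigmas, 0.0)

def sample_euler(denoiser, x, sigmas):
    """Plain Euler steps over the noise levels (no stochastic 'churn')."""
    for i in range(len(sigmas) - 1):
        sigma, sigma_next = sigmas[i], sigmas[i + 1]
        denoised = denoiser(x, sigma)        # model's current guess of the clean latent
        d = (x - denoised) / sigma           # slope dx/dsigma of the denoising ODE
        x = x + d * (sigma_next - sigma)     # move x down to the next noise level
    return x

rng = np.random.default_rng(0)
target = rng.standard_normal((1, 4, 64, 64))  # hypothetical "clean latent"

def toy_denoiser(x, sigma):
    """Hypothetical stand-in for the wrapped UNet: always guesses `target`."""
    return target

sigmas = get_sigmas_karras(20, sigma_min=0.0292, sigma_max=14.6146)  # assumed SD v1 range
x = rng.standard_normal(target.shape) * sigmas[0]  # start from pure noise at the top level
latent = sample_euler(toy_denoiser, x, sigmas)
print(latent.shape)  # (1, 4, 64, 64): a latent for the VAE decoder, not RGB pixels
```

With a real model, `toy_denoiser` would be the UNet wrapped to return a denoised latent, and the final `latent` would be rescaled (by 1/0.18215 in SD v1, as far as I know) and passed through the VAE decoder to get a 3x512x512 image. The other samplers mostly differ in how they take this step (Heun adds a correction step, for example), not in what they are sampling.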

Sorry if I come off as aggressive; that's not the intention! Your explanation of transformers is truly amazing, and this one is great as well. I'm just tired of reading overly simplified explanations targeted at 'mom and dad'. These little arrows and grids don't mean anything to me if you don't relate them to the code. Stable Diffusion has nothing to do with maths and statistics; it is a programmed behavior. Imagine if we explained how to implement a raycaster purely theoretically, with pictograms. F*** that! A minimal implementation of a raycaster with heavy documentation, and pictograms on the side if you want, is infinitely more useful.

I may not be a master statistician, but I'm a programmer: if you explain each line one by one, I should be able to truly grasp what is happening. Print the tensors, show me exactly what they look like in text, and then map the text to images. If someone actually explained these implementations, we could unlock a whole new pool of talent contributing to the field. This article does not help anyone understand how SD works; it only helps me pretend that I do.
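In the spirit of "print the tensors", here is a minimal sketch of what the model itself returns at one step. It assumes SD v1 shapes (a (1, 4, 64, 64) latent for a 512x512 image and a (1, 77, 768) CLIP text embedding) and an epsilon-predicting UNet wrapped roughly the way k-diffusion's `DiscreteEpsDDPMDenoiser` (used by `CompVisDenoiser`) does, as far as I can tell: scale the input by 1/sqrt(sigma^2 + 1), predict the noise, and return `x - sigma * eps` as the denoised latent. `make_oracle_unet` is a hypothetical stand-in that cheats by returning the true noise, and every tensor is a random placeholder, not real model data.

```python
import numpy as np

np.set_printoptions(precision=3, suppress=True)
rng = np.random.default_rng(0)

# Placeholder tensors with assumed SD v1 shapes (random numbers, not real model data).
clean_latent = rng.standard_normal((1, 4, 64, 64)).astype(np.float32)  # what the VAE would decode
text_emb = rng.standard_normal((1, 77, 768)).astype(np.float32)      # CLIP text conditioning
true_noise = rng.standard_normal(clean_latent.shape).astype(np.float32)
sigma = 5.0
x = clean_latent + sigma * true_noise  # the noisy latent the sampler holds at this noise level

def make_oracle_unet(noise):
    """Hypothetical UNet that 'predicts' epsilon perfectly by returning the true noise."""
    def unet(x_scaled, sigma, cond):
        # A real UNet takes a discrete timestep t (converted from sigma) plus the text embedding.
        return noise
    return unet

def eps_to_denoised(unet, x, sigma, cond):
    """Turn an epsilon prediction into a denoised latent, k-diffusion style."""
    c_in = 1.0 / np.sqrt(sigma ** 2 + 1.0)  # rescale to the DDPM-style input the UNet expects
    eps = unet(x * c_in, sigma, cond)       # same shape as x: predicted noise, not an image
    return x - sigma * eps                  # c_out = -sigma

denoised = eps_to_denoised(make_oracle_unet(true_noise), x, sigma, text_emb)

for name, t in [("text_emb", text_emb), ("noisy x", x), ("denoised", denoised)]:
    print(f"{name:9s} shape={t.shape} mean={t.mean():+.3f} std={t.std():.3f}")
print("denoised[0, 0, :3, :3] =")
print(denoised[0, 0, :3, :3])
```

The printout shows that nothing image-shaped comes out of the UNet: it returns a noise estimate with the same shape as the latent, and the wrapper turns that into a denoised latent, which is what a sampler loop consumes at each step.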
