I'm no expert either, but this definitely felt like the sort of question that sounds basic yet touches on some fundamental, abstract "theory of information" kind of complexity. It's why I find it so fascinating -- there's something really mysterious and compelling going on in these models that even the researchers themselves are struggling to unravel. Thanks for taking the time!


Is there a reason the language model part of image diffusion requires a lot less horsepower than running a language model by itself? I'm still amazed SD runs quickly on my 2016-era PC, but apparently something like GPT-J needs tens of GB of memory just to store, and larger models need hundreds. Is it the difference between generating new text vs. working with existing text?
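One rough way to put numbers on this comparison is weights-only memory: parameter count times bytes per parameter. The sketch below assumes approximate public parameter counts (about 123 million for the CLIP ViT-L/14 text encoder used by Stable Diffusion 1.x, about 6 billion for GPT-J). Treat both figures as assumptions; the estimate also ignores activations, caches, and framework overhead, so real usage is higher.

```python
# Rough weights-only memory estimate: parameter count x bytes per parameter.
# ASSUMPTION: parameter counts below are approximate public figures, for illustration only.
# This ignores activations, caches, and framework overhead.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}

MODELS = {
    "SD 1.x text encoder (CLIP ViT-L/14)": 123_000_000,  # approx.
    "GPT-J": 6_000_000_000,                               # approx.
}


def weights_gb(num_params: int, dtype: str) -> float:
    """Memory needed to hold the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, params in MODELS.items():
        for dtype in ("fp32", "fp16"):
            print(f"{name:40s} {dtype}: {weights_gb(params, dtype):6.2f} GB")
```

Under these assumptions GPT-J's weights come to about 24 GB in fp32 (12 GB in fp16), while the SD text encoder stays under half a gigabyte, so most of the storage gap comes from parameter count alone.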


>You invest so much in it, don't you? It's what elevates you above the beasts of the field, it's what makes you special. Homo sapiens, you call yourself. Wise Man. Do you even know what it is, this consciousness you cite in your own exaltation? Do you even know what it's for?

>Maybe you think it gives you free will. Maybe you've forgotten that sleepwalkers converse, drive vehicles, commit crimes and clean up afterwards, unconscious the whole time. Maybe nobody's told you that even waking souls are only slaves in denial.

>Make a conscious choice. Decide to move your index finger. Too late! The electricity's already halfway down your arm. Your body began to act a full half-second before your conscious self 'chose' to, for the self chose nothing; something else set your body in motion, sent an executive summary—almost an afterthought— to the homunculus behind your eyes. That little man, that arrogant subroutine that thinks of itself as the person, mistakes correlation for causality: it reads the summary and it sees the hand move, and it thinks that one drove the other.

>But it's not in charge. You're not in charge. If free will even exists, it doesn't share living space with the likes of you.

-- Peter Watts, Blindsight