chodegoblin69 t1_j44uao7 wrote

Reply to comment by benanne in [R] Diffusion language models by benanne

Thank you, I will check those out.

Diffusion's lack of a causality constraint seems like a pretty tall hurdle for tasks whose output format requires "fluency" (like summarization), though. It's a bit like drawing hands in early Stable Diffusion (or drawing almost anything coherently in earlier models like Disco Diffusion). Multiple-choice question answering seems like a more natural domain, though it certainly doesn't show off the "expressive" generative abilities. Fluency probably improves significantly with scale and fine-tuning, though.
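To make that contrast concrete, here's a toy sketch of the attention-visibility difference (the mask shapes are my own illustration, not taken from either model family): an autoregressive model enforces a causal mask so each token only sees its left context, while a diffusion LM denoises all positions jointly with unrestricted visibility.

```python
import numpy as np

seq_len = 4

# Autoregressive LM: a lower-triangular causal mask means position i
# can only attend to positions <= i, forcing left-to-right generation.
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

# Diffusion LM: every position is denoised jointly at each step, so
# attention is unrestricted -- there is no causality constraint.
diffusion_mask = np.ones((seq_len, seq_len), dtype=bool)

print(causal_mask.sum())     # 10 visible (query, key) pairs out of 16
print(diffusion_mask.sum())  # all 16 pairs visible
```

The flip side is exactly the fluency concern above: with full visibility, the model has to make all positions mutually consistent at once, rather than conditioning each token on an already-committed prefix.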

chodegoblin69 t1_j3vdqtc wrote

Great blog post. I found the Diffusion-LM results (Li et al.) very intriguing due to the seemingly better semantic capture, despite the tradeoff in fluency.

Question: do you see diffusion models as having any advantages in tackling the "long text" issue (the context window limit) that autoregressive models suffer from? Curious generally, but areas like abstractive summarization in particular come to mind.
