I'm planning to see how a latent diffusion model would perform on the task of reconstructing images from brain activity. Specifically, the image generation would be conditioned on brain activity instead of text. Has anyone tried conditioning on brain activity or on other information apart from text? I'm having a hard time digesting the code from the LDM repo and was wondering if anyone has tried coding it (or a simpler version) from scratch.

Comments

shawarma_bees t1_ivaa8if wrote on November 6, 2022 at 2:28 PM

#456,475

How is the “brain activity” information encoded?

acertainmoment t1_ivaloxt wrote on November 6, 2022 at 3:48 PM

#457,004

Most likely you won’t need to code everything from scratch. You’ll probably just need to add an nn.Linear or a 1x1 conv to convert your brain activity data, whatever its dimension, into the dimension of the tensor the model is currently conditioned on (I think it’s 1024- or 2048-dim embeddings at the moment, but I’m not exactly sure).
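
As a rough illustration of the nn.Linear idea, here is a minimal PyTorch sketch. The class name BrainActivityAdapter and every size in it (brain_dim=4096, cond_dim=1024, n_tokens=4) are placeholders chosen for the example, not values taken from the LDM repo; check the actual conditioning shape in the model you use.

```python
import torch
import torch.nn as nn


class BrainActivityAdapter(nn.Module):
    """Project a flat brain-activity vector into a conditioning embedding.

    Hypothetical helper: all sizes are placeholders, not LDM repo values.
    """

    def __init__(self, brain_dim: int, cond_dim: int, n_tokens: int = 1):
        super().__init__()
        self.n_tokens = n_tokens
        self.cond_dim = cond_dim
        # A single linear layer maps one recording to n_tokens embeddings.
        self.proj = nn.Linear(brain_dim, n_tokens * cond_dim)

    def forward(self, brain: torch.Tensor) -> torch.Tensor:
        # brain: (batch, brain_dim) -> (batch, n_tokens, cond_dim)
        out = self.proj(brain)
        return out.view(brain.shape[0], self.n_tokens, self.cond_dim)


adapter = BrainActivityAdapter(brain_dim=4096, cond_dim=1024, n_tokens=4)
fake_brain = torch.randn(2, 4096)  # random stand-in for real recordings
cond = adapter(fake_brain)
print(cond.shape)  # torch.Size([2, 4, 1024])
```

The output has the same (batch, tokens, dim) layout a sequence of text embeddings would have, so it could in principle be passed wherever the model expects its conditioning tensor.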

johnnydaggers t1_ivb869u wrote on November 6, 2022 at 6:17 PM

#457,920

It’s trivially easy. The problem is getting enough training data for something like that.

[deleted] t1_ivbb58c wrote on November 6, 2022 at 6:35 PM

#458,099

[deleted]

king_of_walrus t1_ivbdzd4 wrote on November 6, 2022 at 6:53 PM

#458,234

I don’t think it’s that easy… do you have brain activity/image pairs for training? If so, do you have a lot of them?

To train the conditional models, you need pairs of targets and conditioning info. Also, the code is a lot to digest. I would suggest looking in ldm/models/diffusion/ddpm.py to see how things work. You can clearly see all the diffusion-related code and the training logic there. It may help your understanding.
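
To make the role of the pairs concrete, here is a toy sketch of one conditional training step using the standard DDPM noise-prediction loss. It is not a copy of ddpm.py: ToyDenoiser, training_loss, the linear beta schedule, and all tensor sizes are assumptions made for the example, and random tensors stand in for real image latents and brain-derived conditioning vectors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 1000  # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 2e-2, T)  # simple linear schedule (assumed)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)


class ToyDenoiser(nn.Module):
    """Tiny MLP standing in for the U-Net; predicts the added noise."""

    def __init__(self, latent_dim: int, cond_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + cond_dim + 1, hidden),
            nn.SiLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, z_t, t, cond):
        t_feat = (t.float() / T).unsqueeze(1)  # crude timestep feature
        return self.net(torch.cat([z_t, cond, t_feat], dim=1))


def training_loss(model, z0, cond):
    """Noise a clean target z0 and score the model's noise prediction."""
    t = torch.randint(0, T, (z0.shape[0],))
    noise = torch.randn_like(z0)
    a_bar = alphas_cumprod[t].unsqueeze(1)
    z_t = a_bar.sqrt() * z0 + (1.0 - a_bar).sqrt() * noise
    return F.mse_loss(model(z_t, t, cond), noise)


torch.manual_seed(0)
model = ToyDenoiser(latent_dim=16, cond_dim=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

z0 = torch.randn(4, 16)   # stand-in for image latents (the targets)
cond = torch.randn(4, 8)  # stand-in for the paired conditioning vectors

loss = training_loss(model, z0, cond)
opt.zero_grad()
loss.backward()
opt.step()
```

Each row of z0 must correspond to the same row of cond, which is why the paired data is the hard requirement.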

bloc97 t1_ivbmwbb wrote on November 6, 2022 at 7:50 PM

#458,605

Replying to king_of_walrus (#458,234)

I'm just guessing, but it's probably pairs of visual cortex activations with images seen by an animal (maybe mice)...

le_theudas t1_ivbztjx wrote on November 6, 2022 at 9:10 PM

#459,155

I'm doing something similar; use k-diffusion. You can either use the cross-attention inputs or add a network component where you need it.
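
For the cross-attention route, a generic sketch of how image features can attend to conditioning tokens is shown here. It uses PyTorch's nn.MultiheadAttention and is not the k-diffusion API; CrossAttentionBlock and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn


class CrossAttentionBlock(nn.Module):
    """Image features (queries) attend to conditioning tokens (keys/values)."""

    def __init__(self, feat_dim: int, cond_dim: int, num_heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(feat_dim)
        self.attn = nn.MultiheadAttention(
            feat_dim, num_heads, kdim=cond_dim, vdim=cond_dim, batch_first=True
        )

    def forward(self, feats: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # feats: (batch, positions, feat_dim); cond: (batch, tokens, cond_dim)
        attended, _ = self.attn(self.norm(feats), cond, cond)
        return feats + attended  # residual connection


block = CrossAttentionBlock(feat_dim=64, cond_dim=32)
feats = torch.randn(2, 16 * 16, 64)  # flattened 16x16 feature map (placeholder)
cond = torch.randn(2, 4, 32)         # 4 conditioning tokens (placeholder)
out = block(feats, cond)
print(out.shape)  # torch.Size([2, 256, 64])
```

The conditioning tokens here could come from any encoder of the brain data, as long as they arrive as a (batch, tokens, dim) tensor.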

ajin-wolf t1_ivc2wxf wrote on November 6, 2022 at 9:30 PM

#459,275

Someone did this with UK Biobank data (a very large sample), although I think it would be more interesting to use the Child Mind Institute HBN dataset https://arxiv.org/abs/2209.07162

elbiot t1_ivdnwpb wrote on November 7, 2022 at 4:51 AM

#461,713

Latent diffusion works with text because CLIP was already trained on millions of text/image pairs. You've got a huge project ahead of you: training on millions of brain activity/text pairs.

Traditional_Tale_748 t1_ive0vsq wrote on November 7, 2022 at 7:26 AM

#462,161

If I were you, I would convert the brain activity into a graph and then into an image. Then you can use that as the conditioning input for a standard diffusion model.