Submitted by cloneofsimo t3_ykiuq0 in MachineLearning

Hi. Today I came across this interesting paper https://arxiv.org/abs/2210.16056 that proposes an interesting method to combine the semantics of text and image in the diffusion process.

In short, it mixes "layout" with "content". However, unlike style transfer,

>"...semantic mixing aims to fuse multiple semantics into one single object."

I was surprised by the examples they showed, so I wanted to try it but the code wasn't available. I've implemented the method myself, and I wanted to share it here!

https://github.com/cloneofsimo/magicmix

Layout of "realistic photo of a rabbit" with content of "tiger"

I hope my implementation helps anyone who is reading the paper!

Note: I'm not the author of the paper, and this is not an official implementation

88

Comments


Inevitable-Ad8503 t1_iuth7a3 wrote

Nice. Thanks for taking the time to implement and share this. I get the feeling that this approach will be way more coherent for certain types of content and layout (mayhaps when prompting something like “a tiger and a bunny sitting side by side”), and that the “traditional” approach (is it old enough to have traditions yet?) will be better for other types, perhaps such as “a tiger and a bunny”, where the intent is what that approach often produces: some interpolation between the two objects, a chimera or mutant.

Or maybe another way to explain it is that MagicMix is akin to an orchestration of multiple parts, and the present approach is more akin to a mashup of multiple parts; each has its own strengths and appropriate applications.

9

cloneofsimo OP t1_iutiw0k wrote

Indeed, there are so many natural methods to interpolate concepts, and I agree 100% that some are better than others at certain tasks.
Compared to the famous Img2Img, I understood this as a "generalized" method to interpolate, since if you take \nu = 1.0, this becomes just Img2Img interpolation. You can read the paper to see the effect of \nu on interpolation, and it's quite interesting. Since this is a more general approach, there are more things to tweak and figure out, I guess...?
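
To make the role of the interpolation constant discussed above concrete, here is a minimal sketch of the mixing step as I read it from the paper. The function name `mix_latents` and the use of plain Python lists are assumptions for illustration only; a real implementation would operate on latent tensors inside the denoising loop, and this is not the code from the repository.

```python
def mix_latents(denoised, noised_layout, nu):
    """Blend the prompt-conditioned latent with the noised layout latent.

    denoised:      latent after one denoising step conditioned on the content prompt
    noised_layout: layout image latent, noised to the same timestep
    nu:            interpolation constant in [0, 1] (hypothetical parameter name)
    """
    # Element-wise linear interpolation between the two latents.
    return [nu * d + (1.0 - nu) * l for d, l in zip(denoised, noised_layout)]


# With nu = 1.0 the layout term vanishes, so only the initial noising
# of the layout image influences the result, as in Img2Img.
only_denoised = mix_latents([1.0, 2.0], [3.0, 4.0], nu=1.0)

# With nu = 0.0 the denoised latent is discarded and the layout is kept.
only_layout = mix_latents([1.0, 2.0], [3.0, 4.0], nu=0.0)
```

Values of `nu` between the two extremes trade off how strongly the layout is re-injected at each mixing step.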

1

starstruckmon t1_iuu086h wrote

I only gave the paper a cursory look, but isn't this the same thing as prompt editing?

5

cloneofsimo OP t1_iuvbdjb wrote

Prompt editing seems to be a special case of MagicMix where Kmax = Kmin = T and nu = 0. MagicMix is more like Img2Img than sampling, as I've understood it.

2

LetterRip t1_iuxas7h wrote

It is prompt editing + prompt interpolation. So N steps of A, M steps of A transitioning to B, and then the remaining steps at B.
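
The schedule described above (N steps of A, M transitioning steps, then B) could be sketched roughly as below. The function name `blend_weight`, its parameters, and the linear ramp are all assumptions for illustration; the paper and the repository may schedule the transition differently.

```python
def blend_weight(step, n_a, m_transition):
    """Weight given to prompt B at a given sampling step.

    step:         zero-based index of the current sampling step
    n_a:          number of initial steps that use prompt A only
    m_transition: number of steps spent moving from A to B
    """
    if step < n_a:
        return 0.0  # still purely prompt A
    if step >= n_a + m_transition:
        return 1.0  # remaining steps are purely prompt B
    # Linear ramp strictly between 0 and 1 during the transition (assumed shape).
    return (step - n_a + 1) / (m_transition + 1)


# Example: 2 steps of A, 3 transition steps, then B for the rest of 7 steps.
weights = [blend_weight(s, n_a=2, m_transition=3) for s in range(7)]
```

At each step the conditioning would then be something like `(1 - w) * emb_A + w * emb_B`, where `emb_A` and `emb_B` stand for the two prompt embeddings.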

2

starstruckmon t1_iuxsbge wrote

Thanks. I understood it partially, but your explanation made everything crystal clear and things all clicked in an instant.

I wish more papers had an "intuition" section like this.

1

IntelArtiGen t1_iutivin wrote

Thanks for this implem, I'll try it out!

2

meldiwin t1_iuxmgkv wrote

I am not in the field, but I am working on multi-material architecture designs, and one of my questions is how I can start designing such a system to come up with new architectural possibilities and new geometries. The abstract of that paper mentions "novel object synthesis"; what does this actually mean?

I am also struggling to understand what possibilities the explosion of diffusion models opens up beyond art. Sorry if that sounds ignorant, but I want to understand this and hopefully deploy it in my work.

1

msbeaute00000001 t1_iuxrxkk wrote

I would say that you should collaborate with someone in the field. Doing this by yourself when you are not in the field might take a lot of time and has a somewhat lower chance of success.

1

meldiwin t1_iuxshbk wrote

Do you have any names you would recommend?

1

msbeaute00000001 t1_iv2010u wrote

You just need to find someone working in ML/AI that you could work with. This person needs to understand your problem and convert it into an ML problem. If you don't mind, I can take a look. I am looking for some way to apply my skills anyway.

1

LetterRip t1_iuxy54g wrote

Pretty sure 'novel object' means an image that is the combination of multiple objects. So, for instance, dog + coffee_pot = dog with some characteristics of a coffee_pot (in the image examples the head was sort of coffee-pot-like), rabbit + tiger = rabbit with tiger characteristics, and rabbit + sheep = rabbit with sheep characteristics (the example showed a rabbit with a wool-like texture as opposed to a rabbit fur texture).

1

Tioben t1_iwei425 wrote

I've never been able to get a PC-based version of anything to run successfully, so if anyone makes or encounters a colab, huggingface, or similar implementation, I'd much appreciate a link!

1