Submitted by 0x00groot t3_y7u6gg in MachineLearning

Text-Based Real Image Editing

Code: https://github.com/ShivamShrirao/diffusers/tree/main/examples/imagic

Colab: https://colab.research.google.com/github/ShivamShrirao/diffusers/blob/main/examples/imagic/Imagic_Stable_Diffusion.ipynb

I still need to play around and tune the parameters a bit, so it may not work as is on every subject. Hopefully everyone can try it out now.

Input Image

A photo of Barack Obama smiling with a big grin.

131

Comments

ThatInternetGuy t1_isxfzu2 wrote

This Shivam Shrirao guy is super fast! Took him two days to make Dreambooth scripts and now just one day to make Imagic scripts.

15

advertisementeconomy t1_iswsi3r wrote

Wow. The pace is exciting. Is that the Barack from the original tweet or was it run through this implementation?

Here's the README for anyone interested: https://github.com/ShivamShrirao/diffusers/blob/main/examples/imagic/README.md

Is the .ipynb file a Jupyter Notebook that could be run locally on a card with 12GB VRAM? (Forgive me if this is a stupid question; using Colab and Jupyter is new to me.)

7

0x00groot OP t1_iswtu7x wrote

This is produced through this implementation.

Yes you can run it locally in 12 GB VRAM.

5

Roarexe t1_isx0dpe wrote

Awesome, thanks for sharing!

3

danquandt t1_isy30dt wrote

How different is this in practice from running img2img on regular SD? The examples shown in the paper look very similar to what you would get from img2img, as far as I can tell.

(PS: great work on your repos! I still can't run Dreambooth on my 3080 10GB, but I have played around with it in Colab and it's fantastic.)

2

HuWasHere t1_itafuvj wrote

img2img, even at a high init image setting, doesn't necessarily respect the init image; this is far more precise. It's limited (to my knowledge) because it uses one input image, but the results are pretty incredible.

2

thelastpizzaslice t1_isxw4bb wrote

What is the value of having a ckpt output? Is it like dreambooth?

1

0x00groot OP t1_isxwwp7 wrote

Not right now. You need the model weights along with the optimised embeddings to get the results.
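For context on why both pieces are needed: in the Imagic paper, the edited image is generated by the fine-tuned model from a blend of the optimised embedding and the target-text embedding. The sketch below shows only that blending step on toy vectors. The function name `interpolate_embeddings` and the values are illustrative assumptions, not code from the repo.

```python
from typing import List


def interpolate_embeddings(e_opt: List[float], e_tgt: List[float], eta: float) -> List[float]:
    """Linearly blend two embeddings: eta * e_tgt + (1 - eta) * e_opt.

    eta = 0 returns e_opt (which, with the fine-tuned weights, roughly
    reconstructs the input image); eta = 1 returns e_tgt (the edit prompt).
    """
    if len(e_opt) != len(e_tgt):
        raise ValueError("embeddings must have the same length")
    return [eta * t + (1.0 - eta) * o for o, t in zip(e_opt, e_tgt)]


# Toy 3-dimensional vectors standing in for real text embeddings
# (hypothetical values, for illustration only).
e_opt = [0.0, 1.0, 2.0]
e_tgt = [1.0, 1.0, 0.0]

blended = interpolate_embeddings(e_opt, e_tgt, eta=0.5)
print(blended)  # [0.5, 1.0, 1.0]
```

Without the fine-tuned weights, the optimised embedding no longer maps back to the input image, which would be consistent with the answer above.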

2

thelastpizzaslice t1_isxxsok wrote

So, to use this, I run the colab, take the ckpt and also a pt that exists somewhere presumably, drop them into AUTOMATIC1111, and then I can pose a specific photo like it's a doll/restyle it at will in AUTOMATIC1111? Am I correct in this description?

2

0x00groot OP t1_isy024x wrote

Currently AUTOMATIC1111 doesn't support it. You can use the inference code given at the end of the Colab to generate images for now.

2

thelastpizzaslice t1_isy86ca wrote

I decided to copy-paste the model into AUTOMATIC1111 anyway. I made one based on a photo of Atul from Spiritfarer with a loose description of him as "uncle frog spirit person" and it's actually the single best cartoon generator I've ever worked with. I've spent dozens of hours trying to make these things and this paper beat all of them by accident. What a time to be alive!

The author of this paper is apparently a genius who has built something better than TI or Dreambooth, and is massively understating his accomplishment.

Here are the three photos: #1 is standard, #2 is Dreambooth, #3 is Imagic.

This is Atul

5

0x00groot OP t1_isycsor wrote

Oh wow. That's really interesting. I'll have to look into it.

2

thelastpizzaslice t1_it4lynp wrote

Does this use model v1.5 or is it still running on v1.4?

1

0x00groot OP t1_it5u08r wrote

You can specify what to use with the MODEL_NAME variable.
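A minimal sketch of how such a setting might be resolved, assuming MODEL_NAME is read as an environment variable holding a Hugging Face model ID. The helper `resolve_model_name`, the fallback default, and the example IDs are assumptions for illustration; the notebook may define and use the variable differently.

```python
import os
from typing import Mapping, Optional

# Hypothetical fallback for illustration; the notebook's own default may differ.
DEFAULT_MODEL = "CompVis/stable-diffusion-v1-4"


def resolve_model_name(env: Optional[Mapping[str, str]] = None) -> str:
    """Return MODEL_NAME if it is set and non-empty, otherwise the fallback."""
    env = os.environ if env is None else env
    return env.get("MODEL_NAME") or DEFAULT_MODEL


# Passing a dict here avoids touching the real environment in this example.
print(resolve_model_name({"MODEL_NAME": "runwayml/stable-diffusion-v1-5"}))
```

The returned string is what would then be handed to whatever code loads the base model.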

3

readyourSICP t1_itga0um wrote

Does this give the exact same output as running with 24 GB of VRAM?

1

deep-yearning t1_isxlgw1 wrote

paging Automatic1111

pls implement in webui

−1

nmkd t1_isxlyh2 wrote

This is not Windows compatible as far as I know.

1