Submitted by Norlax_42 t3_xuojma in MachineLearning

Stable Diffusion in the diffusers library is now 3x faster thanks to a set of optimization tips, some of which require only minimal code changes, making it the fastest implementation of Stable Diffusion out there!

You can now generate 3 images of size 512x512 with 50 steps in less than 26 seconds, beating the Keras implementation. All you have to do is run this notebook in free Colab.

The best thing about these optimisations is that they work for most deep learning models (as long as you're using PyTorch), so feel free to try them on other models as well!

To understand better how these optimisations work, you can check either:

  • This recent tweet explaining the optimisations made
  • The diffusers library docs about optimisation


Generating 3 images with 50 steps takes less than 26 seconds on colab's Tesla T4

91

Comments


ReginaldIII t1_iqzu0z9 wrote

I've been using the KerasCV implementation with a T4 GPU on Colab, with 16-bit floats and JIT compilation, to do batch size 5, 25 steps in 13 seconds. So I don't think it's fair to say you outright beat Keras' performance.

Amazing work all the same.

2

DuLLSoN t1_ir04t67 wrote

5x 512x512 images with 25 steps on T4 Colab in 13 seconds? I would like to see a notebook of that.

I wonder if you mean 13 seconds per image because this implementation reports ~10s per image with mixed precision.

2

ReginaldIII t1_ir067hc wrote

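import keras_cv
from tensorflow import keras

# Run in mixed precision (float16 compute) to cut memory use and speed up inference.
keras.mixed_precision.set_global_policy("mixed_float16")
# jit_compile=True compiles the model with XLA.
model = keras_cv.models.StableDiffusion(img_width=512, img_height=512, jit_compile=True)

# Generate a batch of 5 images from a single prompt.
images = model.text_to_image("photograph of an astronaut riding a horse", batch_size=5)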
3

Powered_JJ t1_ir07egr wrote

I hope one day this will be included in webui (auto) repo... :)

2

pennomi t1_ir0fjj2 wrote

Just read something about Facebook’s AITemplate code making Stable Diffusion 3x faster as well. Worth looking into as another potential optimization.

3

Norlax_42 OP t1_ir2k1z5 wrote

Please note that I reported 25.48 seconds for 50 steps, while you're talking about 25 steps. I expect this implementation to take less than 13 seconds for 25 steps.

In the Keras blog, they reported a runtime of 28.97s for 50 steps, hence the claim of beating their performance.
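
Since the timings in this thread use different batch sizes and step counts, one rough way to compare them is to normalise each to seconds per image per denoising step. The sketch below is only illustrative: it assumes runtime scales roughly linearly with both batch size and step count (which real GPUs only approximate), and it assumes the Keras blog figure was measured for 3 images, which is not stated in this thread. The helper name `seconds_per_image_step` is made up for this example.

```python
# Normalise each reported timing to seconds per image per denoising step.
# ASSUMPTION: runtime scales roughly linearly with batch size and step count.


def seconds_per_image_step(total_seconds, images, steps):
    """Return total runtime divided by (images * steps)."""
    return total_seconds / (images * steps)


reports = {
    # diffusers (this post): 3 images, 50 steps, 25.48 s
    "diffusers": seconds_per_image_step(25.48, images=3, steps=50),
    # KerasCV as reported in the comment above: batch size 5, 25 steps, 13 s
    "kerascv_comment": seconds_per_image_step(13.0, images=5, steps=25),
    # Keras blog: 28.97 s for 50 steps; the batch size of 3 is an ASSUMPTION
    "keras_blog": seconds_per_image_step(28.97, images=3, steps=50),
}

for name, value in reports.items():
    print(f"{name}: {value:.3f} s per image-step")
```

Under these assumptions the two 50-step figures come out at about 0.170 vs 0.193 seconds per image-step, which matches the claim in the post, while the batch-size-5 run works out to about 0.104. Larger batches usually amortise fixed overhead, so the numbers are not strictly like for like.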

3