Submitted by Just0by t3_z9q0pq in MachineLearning
Hi everyone, we just released what is probably the fastest Stable Diffusion implementation to date. The following two pictures show that on an A100 GPU, whether PCIe 40GB or SXM 80GB, OneFlow Stable Diffusion leads the performance results compared to other deep learning frameworks and compilers.
GitHub URL: https://github.com/Oneflow-Inc/diffusers/wiki/How-to-Run-OneFlow-Stable-Diffusion
OneFlow URL: https://github.com/Oneflow-Inc/oneflow/
Before that, on November 7th, OneFlow brought Stable Diffusion into the era of "one-second generation" for the first time. On an A100 SXM 80GB, OneFlow Stable Diffusion reached a groundbreaking inference speed of 50 it/s, meaning the 50 sampling steps needed to generate an image complete in exactly one second. Now, OneFlow has refreshed the SOTA record again.
You might wonder how OneFlow Stable Diffusion achieves this result. OneFlow's compiler plays the pivotal role in accelerating the model: it lets models built with the PyTorch-style frontend run faster on NVIDIA GPUs.
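For reference, here is a minimal Python sketch of what running the accelerated pipeline looks like, based on the examples in the linked wiki. The class name, model ID, and keyword arguments (OneFlowStableDiffusionPipeline, CompVis/stable-diffusion-v1-4, torch_dtype) are taken from the OneFlow diffusers fork and may differ in newer releases:

import oneflow as flow
from diffusers import OneFlowStableDiffusionPipeline  # OneFlow fork of diffusers

# Load the OneFlow-accelerated pipeline with fp16 weights for speed.
pipe = OneFlowStableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=flow.float16,
    use_auth_token=True,  # requires a Hugging Face access token
)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
with flow.autocast("cuda"):
    # The first call triggers graph compilation and is slower;
    # subsequent calls run at the compiled speed.
    images = pipe(prompt).images
for i, image in enumerate(images):
    image.save(f"{prompt}-{i}.png")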
Welcome to try OneFlow Stable Diffusion and make your own masterpiece using Docker! All you need to do is run the following snippet:
docker run --rm -it \
--gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
-v ${HF_HOME}:${HF_HOME} \
-v ${PWD}:${PWD} \
-w ${PWD} \
-e HF_HOME=${HF_HOME} \
-e HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN} \
oneflowinc/oneflow-sd:cu112 \
python3 /demos/oneflow-t2i.py # --prompt "a photo of an astronaut riding a horse on mars"
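Note that the command assumes HF_HOME and HUGGING_FACE_HUB_TOKEN are already exported in your shell, so the container can reuse your Hugging Face cache and token; the commented-out --prompt flag shows how to pass your own text prompt to the demo script.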
Check out OneFlow on GitHub. We'd love to hear your feedback!
Deep-Station-1746 t1_iyi0y4e wrote
> whether it is PCIe 40GB or SXM 80GB
Oh thank god SXM 80GB is supported! I have way too many A100 80GBs just lying around the house, this will help me find some use for them. /s
Also, I might be stretching this a bit, but uh, do you guys happen to also have an under-8GB VRAM model lying around? :)