Stable Diffusion XL Turbo¶
SDXL Turbo is an adversarial time-distilled Stable Diffusion XL (SDXL) model capable of running inference in as little as 1 step.
This guide will show you how to use SDXL Turbo for text-to-image and image-to-image generation.
Before you begin, make sure you have the following libraries installed:
# uncomment to install the necessary libraries
#!pip install mindone transformers
Load model checkpoints¶
Model weights may be stored in separate subfolders on the Hub or locally, in which case you should use the from_pretrained method:
from mindone.diffusers import StableDiffusionXLPipeline
import mindspore as ms
pipeline = StableDiffusionXLPipeline.from_pretrained("stabilityai/sdxl-turbo", mindspore_dtype=ms.float16, variant="fp16")
You can also use the from_single_file method to load a model checkpoint stored in a single-file format (.ckpt or .safetensors) from the Hub or locally. For this loading method, you need to set timestep_spacing="trailing" (feel free to experiment with the other scheduler config values to get better results):
from mindone.diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler
import mindspore as ms
pipeline = StableDiffusionXLPipeline.from_single_file(
    "https://huggingface.co/stabilityai/sdxl-turbo/blob/main/sd_xl_turbo_1.0_fp16.safetensors",
    mindspore_dtype=ms.float16,
    variant="fp16",
)
pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(pipeline.scheduler.config, timestep_spacing="trailing")
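To see why the spacing matters at very low step counts, the sketch below compares two common ways of picking timesteps from the training schedule. It is a pure-Python illustration, not the scheduler's actual code: the function names, the 1000 training timesteps, and the zero offset are all assumptions made for the example.

```python
def trailing_timesteps(num_inference_steps, num_train_timesteps=1000):
    """Count down from the last training timestep in equal strides."""
    step_ratio = num_train_timesteps / num_inference_steps
    return [
        round(num_train_timesteps - i * step_ratio) - 1
        for i in range(num_inference_steps)
    ]


def leading_timesteps(num_inference_steps, num_train_timesteps=1000, steps_offset=0):
    """Count up from timestep 0 in equal strides, then reverse the order."""
    step_ratio = num_train_timesteps // num_inference_steps
    return [
        i * step_ratio + steps_offset
        for i in reversed(range(num_inference_steps))
    ]


# Compare both spacings for the step counts SDXL Turbo is typically run with.
for steps in (1, 2, 4):
    print(steps, trailing_timesteps(steps), leading_timesteps(steps))
```

Under these assumptions, a single trailing step starts at timestep 999, the noisiest end of the schedule, whereas a single leading step would start at timestep 0. That difference is the likely reason the trailing setting is recommended for one-step generation.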
Text-to-image¶
For text-to-image, pass a text prompt. By default, SDXL Turbo generates a 512x512 image, and that resolution gives the best results. You can try setting the height and width parameters to 768x768 or 1024x1024, but you should expect some loss of quality when doing so.
Make sure to set guidance_scale to 0.0 to disable classifier-free guidance, as the model was trained without it. A single inference step is enough to generate high-quality images.
Increasing the number of steps to 2, 3 or 4 should improve image quality.
from mindone.diffusers import StableDiffusionXLPipeline
import mindspore as ms
pipeline_text2image = StableDiffusionXLPipeline.from_pretrained("stabilityai/sdxl-turbo", mindspore_dtype=ms.float16, variant="fp16")
prompt = "A cinematic shot of a baby raccoon wearing an intricate Italian priest robe."
image = pipeline_text2image(prompt=prompt, guidance_scale=0.0, num_inference_steps=1)[0][0]
image
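To compare the step counts mentioned above side by side, a small helper can call the pipeline once per setting. This is only a sketch: compare_step_counts is a hypothetical name, and it assumes the pipeline is called with the same keyword arguments and returns its images in the same [0][0] position as in the text-to-image example.

```python
def compare_step_counts(pipeline, prompt, step_counts=(1, 2, 3, 4)):
    """Generate one image per step count, always with guidance disabled."""
    images = {}
    for steps in step_counts:
        # [0][0] selects the first image of the returned tuple,
        # mirroring the indexing used in the text-to-image example.
        images[steps] = pipeline(
            prompt=prompt,
            guidance_scale=0.0,
            num_inference_steps=steps,
        )[0][0]
    return images
```

With the pipeline loaded earlier, compare_step_counts(pipeline_text2image, prompt) would return a dictionary mapping each step count to its image, which can then be inspected or saved individually.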
Image-to-image¶
For image-to-image generation, make sure that num_inference_steps * strength is greater than or equal to 1.
The image-to-image pipeline will run for int(num_inference_steps * strength) steps, e.g. 2 * 0.5 = 1 step in our example below.
from mindone.diffusers import StableDiffusionXLImg2ImgPipeline
from mindone.diffusers.utils import load_image, make_image_grid
import mindspore as ms
# load the image-to-image pipeline from the same checkpoint
pipeline_image2image = StableDiffusionXLImg2ImgPipeline.from_pretrained("stabilityai/sdxl-turbo", mindspore_dtype=ms.float16, variant="fp16")
init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")
init_image = init_image.resize((512, 512))
prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"
image = pipeline_image2image(prompt, image=init_image, strength=0.5, guidance_scale=0.0, num_inference_steps=2)[0][0]
make_image_grid([init_image, image], rows=1, cols=2)
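The step-count rule for image-to-image can be checked with plain arithmetic before calling the pipeline. The sketch below applies the int(num_inference_steps * strength) formula stated above; both function names are hypothetical helpers, not part of the library.

```python
def effective_img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps that actually run, per the formula above."""
    return int(num_inference_steps * strength)


def check_img2img_settings(num_inference_steps: int, strength: float) -> int:
    """Return the effective step count, or raise if no step would run."""
    steps = effective_img2img_steps(num_inference_steps, strength)
    if steps < 1:
        raise ValueError(
            f"num_inference_steps * strength = {num_inference_steps * strength} "
            "is below 1, so no denoising step would run"
        )
    return steps


print(check_img2img_settings(2, 0.5))   # the settings used in the example above
print(effective_img2img_steps(1, 0.5))  # too few steps for this strength
```

The first call corresponds to the example settings and yields one step. The second shows that a single inference step with strength=0.5 truncates to zero steps, which is the case the rule is meant to prevent.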
Speed up SDXL Turbo even more¶
- When using the default VAE, keep it in float32 to avoid costly dtype conversions before and after each generation. You only need to do this once before your first generation:
pipeline.upcast_vae()
As an alternative, you can also use a 16-bit VAE created by community member @madebyollin that does not need to be upcast to float32.