Stable Diffusion 2

Stable Diffusion 2 is a text-to-image latent diffusion model built upon the work of the original Stable Diffusion, and its development was led by Robin Rombach and Katherine Crowson from Stability AI and LAION.

The Stable Diffusion 2.0 release includes robust text-to-image models trained using a brand new text encoder (OpenCLIP), developed by LAION with support from Stability AI, which greatly improves the quality of the generated images compared to earlier V1 releases. The text-to-image models in this release can generate images with default resolutions of both 512x512 pixels and 768x768 pixels. These models are trained on an aesthetic subset of the LAION-5B dataset created by the DeepFloyd team at Stability AI, which is then further filtered to remove adult content using LAION’s NSFW filter.

For more details about how Stable Diffusion 2 works and how it differs from the original Stable Diffusion, please refer to the official announcement post.

The architecture of Stable Diffusion 2 is more or less identical to the original Stable Diffusion model, so check out its API documentation for how to use Stable Diffusion 2. We recommend using the DPMSolverMultistepScheduler as it gives a reasonable speed/quality trade-off and can be run with as few as 20 steps.
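
The swap itself is a one-liner; a minimal sketch, assuming a pipeline pipe has already been loaded as in the examples below:

from mindone.diffusers import DPMSolverMultistepScheduler

# replace the pipeline's default scheduler, reusing its existing config
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# 20 inference steps is usually enough with this scheduler
image = pipe(prompt, num_inference_steps=20)[0][0]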

Stable Diffusion 2 is available for tasks like text-to-image, inpainting, super-resolution, and depth-to-image:

| Task | Repository |
|------|------------|
| text-to-image (512x512) | stabilityai/stable-diffusion-2-base |
| text-to-image (768x768) | stabilityai/stable-diffusion-2 |
| inpainting | stabilityai/stable-diffusion-2-inpainting |
| super-resolution | stabilityai/stable-diffusion-x4-upscaler |
| depth-to-image | stabilityai/stable-diffusion-2-depth |

Here are some examples of how to use Stable Diffusion 2 for each task:

Tip

Make sure to check out the Stable Diffusion Tips section to learn how to explore the tradeoff between scheduler speed and quality, and how to reuse pipeline components efficiently!

If you're interested in using one of the official checkpoints for a task, explore the CompVis, Runway, and Stability AI Hub organizations!
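
Reusing pipeline components avoids loading the same weights (VAE, text encoder, UNet) twice when you need more than one pipeline. A minimal sketch, assuming mindone.diffusers exposes the same components property as upstream diffusers:

import mindspore as ms
from mindone.diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

# load the text-to-image pipeline once
text2img = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-base", mindspore_dtype=ms.float16, variant="fp16"
)

# build an image-to-image pipeline from the same components instead of reloading them
# (assumption: the components property mirrors the upstream diffusers API)
img2img = StableDiffusionImg2ImgPipeline(**text2img.components)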

Text-to-image

from mindone.diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
import mindspore as ms

repo_id = "stabilityai/stable-diffusion-2-base"
pipe = DiffusionPipeline.from_pretrained(repo_id, mindspore_dtype=ms.float16, variant="fp16")

pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

prompt = "High quality photo of an astronaut riding a horse in space"
image = pipe(prompt, num_inference_steps=25)[0][0]
image
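
The 768x768 checkpoint from the table above is used the same way; a minimal sketch reusing the imports and prompt from the example above (only the repository id changes, and the checkpoint generates 768x768 images by default):

repo_id = "stabilityai/stable-diffusion-2"
pipe = DiffusionPipeline.from_pretrained(repo_id, mindspore_dtype=ms.float16, variant="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# this checkpoint's default output resolution is 768x768
image = pipe(prompt, num_inference_steps=25)[0][0]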

Inpainting

import mindspore as ms
from mindone.diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from mindone.diffusers.utils import load_image, make_image_grid

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = load_image(img_url).resize((512, 512))
mask_image = load_image(mask_url).resize((512, 512))

repo_id = "stabilityai/stable-diffusion-2-inpainting"
pipe = DiffusionPipeline.from_pretrained(repo_id, mindspore_dtype=ms.float16, variant="fp16")

pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image, num_inference_steps=25)[0][0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)

Super-resolution

from mindone.diffusers import StableDiffusionUpscalePipeline
from mindone.diffusers.utils import load_image, make_image_grid
import mindspore as ms

# load model and scheduler
model_id = "stabilityai/stable-diffusion-x4-upscaler"
pipeline = StableDiffusionUpscalePipeline.from_pretrained(model_id, mindspore_dtype=ms.float16)

# let's download an image
url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd2-upscale/low_res_cat.png"
low_res_img = load_image(url)
low_res_img = low_res_img.resize((128, 128))
prompt = "a white cat"
upscaled_image = pipeline(prompt=prompt, image=low_res_img)[0][0]
make_image_grid([low_res_img.resize((512, 512)), upscaled_image.resize((512, 512))], rows=1, cols=2)
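
The upscaler also accepts a noise_level argument that controls how much noise is added to the low-resolution input before upscaling; a minimal sketch, assuming mindone.diffusers mirrors the upstream diffusers signature (lower values stay closer to the input image, higher values give the model more freedom):

# assumption: noise_level mirrors the upstream diffusers argument (default 20)
upscaled_image = pipeline(prompt=prompt, image=low_res_img, noise_level=20)[0][0]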

Depth-to-image

import mindspore as ms
from mindone.diffusers import StableDiffusionDepth2ImgPipeline
from mindone.diffusers.utils import load_image, make_image_grid

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    mindspore_dtype=ms.float16,
)


url = "http://images.cocodataset.org/val2017/000000039769.jpg"
init_image = load_image(url)
prompt = "two tigers"
negative_prompt = "bad, deformed, ugly, bad anotomy"
image = pipe(prompt=prompt, image=init_image, negative_prompt=negative_prompt, strength=0.7)[0][0]
make_image_grid([init_image, image], rows=1, cols=2)