Chroma

Chroma is a text-to-image generation model based on Flux.

Original model checkpoints for Chroma can be found in the lodestones/Chroma repository on Hugging Face (https://huggingface.co/lodestones/Chroma/).

Tip

Chroma can use all the same optimizations as Flux.

Inference

The Diffusers version of Chroma is based on the unlocked-v37 version of the original model, which is available in the Chroma repository.

import mindspore
import numpy as np
from mindone.diffusers import ChromaPipeline

pipe = ChromaPipeline.from_pretrained("lodestones/Chroma", mindspore_dtype=mindspore.bfloat16)

prompt = [
    "A high-fashion close-up portrait of a blonde woman in clear sunglasses. The image uses a bold teal and red color split for dramatic lighting. The background is a simple teal-green. The photo is sharp and well-composed, and is designed for viewing with anaglyph 3D glasses for optimal effect. It looks professionally done."
]
negative_prompt = ["low quality, ugly, unfinished, out of focus, deformed, disfigured, blurry, smudged, restricted palette, flat colors"]

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    generator=np.random.Generator(np.random.PCG64(seed=42)),
    num_inference_steps=40,
    guidance_scale=3.0,
    num_images_per_prompt=1,
)[0][0]  # first [0] selects the list of generated images, second [0] the first image
image.save("chroma.png")
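The example above seeds generation with a NumPy generator built from `PCG64(seed=42)`. The following is a minimal sketch of why a fixed seed makes a run repeatable, assuming the pipeline draws its initial latent noise from that generator. The helper `make_generator` and the latent shape are hypothetical and chosen only for illustration; they are not part of the pipeline API.

```python
import numpy as np


def make_generator(seed: int) -> np.random.Generator:
    """Build a NumPy generator the same way the inference example does."""
    return np.random.Generator(np.random.PCG64(seed=seed))


# Hypothetical latent shape, used only to illustrate the idea.
latent_shape = (1, 16, 64, 64)

# Two fresh generators with the same seed yield identical noise.
noise_a = make_generator(42).standard_normal(latent_shape)
noise_b = make_generator(42).standard_normal(latent_shape)

# A different seed yields different noise.
noise_c = make_generator(43).standard_normal(latent_shape)

same_seed_matches = np.array_equal(noise_a, noise_b)
different_seed_matches = np.array_equal(noise_a, noise_c)
print(same_seed_matches, different_seed_matches)
```

Note that a generator is stateful: reusing one generator object across several calls advances its state, so a fresh generator per call is needed to repeat a result.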

Loading from a single file

To use updated model checkpoints that are not in the Diffusers format, load the transformer from a single file in the original format with the ChromaTransformer2DModel class. The same approach works for fine-tuned or quantized versions of the model published by the community.

The following example demonstrates how to run Chroma from a single file.

import mindspore
import numpy as np
from mindone.diffusers import ChromaTransformer2DModel, ChromaPipeline

model_id = "lodestones/Chroma"
dtype = mindspore.bfloat16

transformer = ChromaTransformer2DModel.from_single_file("https://huggingface.co/lodestones/Chroma/blob/main/chroma-unlocked-v37.safetensors", mindspore_dtype=dtype)

pipe = ChromaPipeline.from_pretrained(model_id, transformer=transformer, mindspore_dtype=dtype)

prompt = [
    "A high-fashion close-up portrait of a blonde woman in clear sunglasses. The image uses a bold teal and red color split for dramatic lighting. The background is a simple teal-green. The photo is sharp and well-composed, and is designed for viewing with anaglyph 3D glasses for optimal effect. It looks professionally done."
]
negative_prompt = ["low quality, ugly, unfinished, out of focus, deformed, disfigured, blurry, smudged, restricted palette, flat colors"]

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    generator=np.random.Generator(np.random.PCG64(seed=42)),
    num_inference_steps=40,
    guidance_scale=3.0,
)[0][0]  # first [0] selects the list of generated images, second [0] the first image

image.save("chroma-single-file.png")
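The URL passed to `from_single_file` in the example above follows the Hugging Face Hub layout `https://huggingface.co/<owner>/<repo>/blob/<revision>/<path>`. When swapping in a different checkpoint, it can help to check which repository, revision, and file a URL points at. The sketch below does this with the standard library only; `split_hub_file_url` is a hypothetical helper written for this illustration, not a function of `mindone.diffusers`, and it assumes the revision name contains no slashes.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_hub_file_url(url: str) -> Tuple[str, str, str]:
    """Split a Hub file URL into (repo_id, revision, filename).

    Assumes the layout /<owner>/<repo>/blob/<revision>/<path>
    (or "resolve" in place of "blob") and a revision without slashes.
    """
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 5 or parts[2] not in ("blob", "resolve"):
        raise ValueError(f"Unrecognized Hub file URL: {url}")
    repo_id = "/".join(parts[:2])
    revision = parts[3]
    filename = "/".join(parts[4:])
    return repo_id, revision, filename


url = "https://huggingface.co/lodestones/Chroma/blob/main/chroma-unlocked-v37.safetensors"
repo_id, revision, filename = split_hub_file_url(url)
print(repo_id, revision, filename)
```

Here `repo_id` matches the `model_id` used for `from_pretrained`, which is a quick way to confirm that the single-file transformer and the remaining pipeline components come from the same repository.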

`mindone.diffusers.ChromaPipeline`

Bases: DiffusionPipeline, FluxLoraLoaderMixin, FromSingleFileMixin, TextualInversionLoaderMixin, FluxIPAdapterMixin

The Chroma pipeline for text-to-image generation.

Reference: https://huggingface.co/lodestones/Chroma/

PARAMETER	DESCRIPTION
`transformer`	Conditional Transformer (MMDiT) architecture to denoise the encoded image latents. TYPE: `ChromaTransformer2DModel`
`scheduler`	A scheduler to be used in combination with `transformer` to denoise the encoded image latents. TYPE: `FlowMatchEulerDiscreteScheduler`
`vae`	Variational Auto-Encoder (VAE) model to encode and decode images to and from latent representations. TYPE: `AutoencoderKL`
`text_encoder`	T5, specifically the google/t5-v1_1-xxl variant. TYPE: `T5EncoderModel`
`tokenizer`	Tokenizer of class T5TokenizerFast. TYPE: `T5TokenizerFast`