
MindOne - One for All


FluxControlNetModel

FluxControlNetModel is an implementation of ControlNet for Flux.1.

The ControlNet model was introduced in Adding Conditional Control to Text-to-Image Diffusion Models by Lvmin Zhang, Anyi Rao, Maneesh Agrawala. It provides a greater degree of control over text-to-image generation by conditioning the model on additional inputs such as edge maps, depth maps, segmentation maps, and keypoints for pose detection.

The abstract from the paper is:

We present ControlNet, a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large diffusion models, and reuses their deep and robust encoding layers pretrained with billions of images as a strong backbone to learn a diverse set of conditional controls. The neural architecture is connected with "zero convolutions" (zero-initialized convolution layers) that progressively grow the parameters from zero and ensure that no harmful noise could affect the finetuning. We test various conditioning controls, eg, edges, depth, segmentation, human pose, etc, with Stable Diffusion, using single or multiple conditions, with or without prompts. We show that the training of ControlNets is robust with small (<50k) and large (>1m) datasets. Extensive results show that ControlNet may facilitate wider applications to control image diffusion models.
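Here is a toy illustration of the "zero convolution" idea, written in plain Python rather than MindSpore. `dense` and `zero_dense_params` are hypothetical helpers. They stand in for the zero-initialized `nn.Dense` layers (`weight_init="zeros", bias_init="zeros"`) that FluxControlNetModel uses for `controlnet_blocks`, `controlnet_single_blocks` and, by default, `controlnet_x_embedder`. Every weight and bias starts at zero, so the control branch initially contributes nothing, and adding its output to the base model's activations leaves them unchanged.

```python
from typing import List, Tuple

def dense(x: List[float], weight: List[List[float]], bias: List[float]) -> List[float]:
    """y = W x + b for one input vector (weight has shape out_features x in_features)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]

def zero_dense_params(in_features: int, out_features: int) -> Tuple[List[List[float]], List[float]]:
    """Zero weight and bias, analogous to weight_init="zeros", bias_init="zeros"."""
    weight = [[0.0] * in_features for _ in range(out_features)]
    bias = [0.0] * out_features
    return weight, bias

hidden = [0.5, -1.0, 2.0]           # activations of the frozen base model
control_feature = [3.0, 1.0, -4.0]  # features produced by the control branch

weight, bias = zero_dense_params(3, 3)
residual = dense(control_feature, weight, bias)
combined = [h + r for h, r in zip(hidden, residual)]

print(residual)            # [0.0, 0.0, 0.0]
print(combined == hidden)  # True: the base activations are unchanged at initialization
```

The gradient of output `i` with respect to weight `W[i][j]` is the input feature `control_feature[j]`, which is generally non-zero. Training can therefore still move the weights away from zero.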

Loading from the original format

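By default, FluxControlNetModel should be loaded with ModelMixin.from_pretrained. The loaded model can be passed directly to FluxControlNetPipeline. Alternatively, one or more models can be wrapped in FluxMultiControlNetModel and passed instead: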

from mindone.diffusers import FluxControlNetPipeline
from mindone.diffusers.models import FluxControlNetModel, FluxMultiControlNetModel

# Single ControlNet
controlnet = FluxControlNetModel.from_pretrained("InstantX/FLUX.1-dev-Controlnet-Canny")
pipe = FluxControlNetPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", controlnet=controlnet)

# Multi-ControlNet: wrap a list of FluxControlNetModel instances
controlnet = FluxControlNetModel.from_pretrained("InstantX/FLUX.1-dev-Controlnet-Canny")
controlnet = FluxMultiControlNetModel([controlnet])
pipe = FluxControlNetPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", controlnet=controlnet)

FluxControlNetModel

`mindone.diffusers.models.controlnet_flux.FluxControlNetModel`

Bases: ModelMixin, ConfigMixin, PeftAdapterMixin

Source code in mindone/diffusers/models/controlnet_flux.py

class FluxControlNetModel(ModelMixin, ConfigMixin, PeftAdapterMixin):
    _supports_gradient_checkpointing = True

    @register_to_config
    def __init__(
        self,
        patch_size: int = 1,
        in_channels: int = 64,
        num_layers: int = 19,
        num_single_layers: int = 38,
        attention_head_dim: int = 128,
        num_attention_heads: int = 24,
        joint_attention_dim: int = 4096,
        pooled_projection_dim: int = 768,
        guidance_embeds: bool = False,
        axes_dims_rope: List[int] = [16, 56, 56],
        num_mode: Optional[int] = None,
        conditioning_embedding_channels: Optional[int] = None,
    ):
        super().__init__()
        self.out_channels = in_channels
        self.inner_dim = num_attention_heads * attention_head_dim

        self.pos_embed = FluxPosEmbed(theta=10000, axes_dim=axes_dims_rope)
        text_time_guidance_cls = (
            CombinedTimestepGuidanceTextProjEmbeddings if guidance_embeds else CombinedTimestepTextProjEmbeddings
        )
        self.time_text_embed = text_time_guidance_cls(
            embedding_dim=self.inner_dim, pooled_projection_dim=pooled_projection_dim
        )

        self.context_embedder = nn.Dense(joint_attention_dim, self.inner_dim)
        self.x_embedder = nn.Dense(in_channels, self.inner_dim)

        self.transformer_blocks = nn.CellList(
            [
                FluxTransformerBlock(
                    dim=self.inner_dim,
                    num_attention_heads=num_attention_heads,
                    attention_head_dim=attention_head_dim,
                )
                for i in range(num_layers)
            ]
        )

        self.single_transformer_blocks = nn.CellList(
            [
                FluxSingleTransformerBlock(
                    dim=self.inner_dim,
                    num_attention_heads=num_attention_heads,
                    attention_head_dim=attention_head_dim,
                )
                for i in range(num_single_layers)
            ]
        )

        # controlnet_blocks
        controlnet_blocks = []
        for _ in range(len(self.transformer_blocks)):
            controlnet_blocks.append(
                nn.Dense(self.inner_dim, self.inner_dim, weight_init="zeros", bias_init="zeros")
            )  # zero_module
        self.controlnet_blocks = nn.CellList(controlnet_blocks)

        controlnet_single_blocks = []
        for _ in range(len(self.single_transformer_blocks)):
            controlnet_single_blocks.append(
                nn.Dense(self.inner_dim, self.inner_dim, weight_init="zeros", bias_init="zeros")
            )  # zero_module
        self.controlnet_single_blocks = nn.CellList(controlnet_single_blocks)

        self.union = num_mode is not None
        if self.union:
            self.controlnet_mode_embedder = nn.Embedding(num_mode, self.inner_dim)

        if conditioning_embedding_channels is not None:
            self.input_hint_block = ControlNetConditioningEmbedding(
                conditioning_embedding_channels=conditioning_embedding_channels, block_out_channels=(16, 16, 16, 16)
            )
            self.controlnet_x_embedder = nn.Dense(in_channels, self.inner_dim)
        else:
            self.input_hint_block = None
            self.controlnet_x_embedder = nn.Dense(
                in_channels, self.inner_dim, weight_init="zeros", bias_init="zeros"
            )  # zero_module

        self.gradient_checkpointing = False
        self.patch_size = self.config.patch_size

    @property
    # Copied from diffusers.models.unets.unet_2d_condition.UNet2DConditionModel.attn_processors
    def attn_processors(self):
        r"""
        Returns:
            `dict` of attention processors: A dictionary containing all attention processors used in the model,
            indexed by weight name.
        """
        # set recursively
        processors = {}

        def fn_recursive_add_processors(name: str, module: nn.Cell, processors: Dict[str, AttentionProcessor]):
            if hasattr(module, "get_processor"):
                processors[f"{name}.processor"] = module.get_processor()

            for sub_name, child in module.name_cells().items():
                fn_recursive_add_processors(f"{name}.{sub_name}", child, processors)

            return processors

        for name, module in self.name_cells().items():
            fn_recursive_add_processors(name, module, processors)

        return processors

    # Copied from diffusers.models.unets.unet_2d_condition.UNet2DConditionModel.set_attn_processor
    def set_attn_processor(self, processor):
        r"""
        Sets the attention processor to use to compute attention.

        Parameters:
            processor (`dict` of `AttentionProcessor` or only `AttentionProcessor`):
                The instantiated processor class or a dictionary of processor classes that will be set as the processor
                for **all** `Attention` layers.

                If `processor` is a dict, the key needs to define the path to the corresponding cross attention
                processor. This is strongly recommended when setting trainable attention processors.

        """
        count = len(self.attn_processors.keys())

        if isinstance(processor, dict) and len(processor) != count:
            raise ValueError(
                f"A dict of processors was passed, but the number of processors {len(processor)} does not match the"
                f" number of attention layers: {count}. Please make sure to pass {count} processor classes."
            )

        def fn_recursive_attn_processor(name: str, module: nn.Cell, processor):
            if hasattr(module, "set_processor"):
                if not isinstance(processor, dict):
                    module.set_processor(processor)
                else:
                    module.set_processor(processor.pop(f"{name}.processor"))

            for sub_name, child in module.name_cells().items():
                fn_recursive_attn_processor(f"{name}.{sub_name}", child, processor)

        for name, module in self.name_cells().items():
            fn_recursive_attn_processor(name, module, processor)

    def _set_gradient_checkpointing(self, module, value=False):
        if hasattr(module, "gradient_checkpointing"):
            module.gradient_checkpointing = value

    @classmethod
    def from_transformer(
        cls,
        transformer,
        num_layers: int = 4,
        num_single_layers: int = 10,
        attention_head_dim: int = 128,
        num_attention_heads: int = 24,
        load_weights_from_transformer=True,
    ):
        config = dict(transformer.config)  # copy, so overriding keys below does not modify the transformer's config
        config["num_layers"] = num_layers
        config["num_single_layers"] = num_single_layers
        config["attention_head_dim"] = attention_head_dim
        config["num_attention_heads"] = num_attention_heads

        controlnet = cls(**config)

        if load_weights_from_transformer:
            ms.load_param_into_net(controlnet.pos_embed, transformer.pos_embed.parameters_dict())
            ms.load_param_into_net(controlnet.time_text_embed, transformer.time_text_embed.parameters_dict())
            ms.load_param_into_net(controlnet.context_embedder, transformer.context_embedder.parameters_dict())
            ms.load_param_into_net(controlnet.x_embedder, transformer.x_embedder.parameters_dict())
            ms.load_param_into_net(
                controlnet.transformer_blocks, transformer.transformer_blocks.parameters_dict(), strict_load=False
            )
            ms.load_param_into_net(
                controlnet.single_transformer_blocks,
                transformer.single_transformer_blocks.parameters_dict(),
                strict_load=False,
            )

            # zero_module
            controlnet.controlnet_x_embedder.weight.set_data(
                initializer(
                    "zeros",
                    controlnet.controlnet_x_embedder.weight.shape,
                    controlnet.controlnet_x_embedder.weight.dtype,
                )
            )
            controlnet.controlnet_x_embedder.bias.set_data(
                initializer(
                    "zeros", controlnet.controlnet_x_embedder.bias.shape, controlnet.controlnet_x_embedder.bias.dtype
                )
            )

        return controlnet

    def construct(
        self,
        hidden_states: ms.Tensor,
        controlnet_cond: ms.Tensor,
        controlnet_mode: ms.Tensor = None,
        conditioning_scale: float = 1.0,
        encoder_hidden_states: ms.Tensor = None,
        pooled_projections: ms.Tensor = None,
        timestep: ms.Tensor = None,
        img_ids: ms.Tensor = None,
        txt_ids: ms.Tensor = None,
        guidance: ms.Tensor = None,
        joint_attention_kwargs: Optional[Dict[str, Any]] = None,
        return_dict: bool = False,
    ) -> Union["FluxControlNetOutput", Tuple]:
        """
        The [`FluxControlNetModel`] forward method.

        Args:
            hidden_states (`ms.Tensor` of shape `(batch_size, image_sequence_length, in_channels)`):
                Packed latent image tokens; `x_embedder` projects them along the last dimension.
            controlnet_cond (`ms.Tensor`):
                The conditioning input. Without `input_hint_block` it must already be packed into tokens of shape
                `(batch_size, image_sequence_length, in_channels)`. When the model is built with
                `conditioning_embedding_channels`, it is an image-like tensor of shape
                `(batch_size, channels, height, width)` that is embedded and packed internally.
            controlnet_mode (`ms.Tensor`, *optional*):
                The mode tensor of shape `(batch_size, 1)`. Required when the model is built with `num_mode`
                (ControlNet-Union).
            conditioning_scale (`float`, defaults to `1.0`):
                The scale factor applied to every ControlNet output.
            encoder_hidden_states (`ms.Tensor` of shape `(batch_size, text_sequence_length, joint_attention_dim)`):
                Conditional embeddings (embeddings computed from the input conditions such as prompts) to use.
            pooled_projections (`ms.Tensor` of shape `(batch_size, pooled_projection_dim)`):
                Embeddings projected from the embeddings of input conditions.
            timestep (`ms.Tensor`):
                Used to indicate the denoising step. It is multiplied by 1000 internally.
            img_ids (`ms.Tensor` of shape `(image_sequence_length, 3)`):
                Position ids of the image tokens, used for the rotary embedding. A 3d tensor is deprecated.
            txt_ids (`ms.Tensor` of shape `(text_sequence_length, 3)`):
                Position ids of the text tokens, used for the rotary embedding. A 3d tensor is deprecated.
            guidance (`ms.Tensor`, *optional*):
                Guidance scale fed to the guidance embedding. Pass it only when the model is built with
                `guidance_embeds=True`. It is multiplied by 1000 internally.
            joint_attention_kwargs (`dict`, *optional*):
                Accepted for API compatibility. In this implementation it is not forwarded to the transformer
                blocks, and a `scale` entry only triggers a warning.
            return_dict (`bool`, *optional*, defaults to `False`):
                Whether or not to return a [`FluxControlNetOutput`] instead of a plain tuple.

        Returns:
            If `return_dict` is True, a [`FluxControlNetOutput`] is returned, otherwise a `tuple`
            `(controlnet_block_samples, controlnet_single_block_samples)`. Each element is a list with one tensor per
            ControlNet block, scaled by `conditioning_scale`, or `None` if the model has no blocks of that kind.
        """
        if joint_attention_kwargs is not None:
            joint_attention_kwargs = joint_attention_kwargs.copy()

        if joint_attention_kwargs is not None and joint_attention_kwargs.get("scale", None) is not None:
            logger.warning(
                "Passing `scale` via `joint_attention_kwargs` when not using the PEFT backend is ineffective."
            )
        hidden_states = self.x_embedder(hidden_states)

        if self.input_hint_block is not None:
            controlnet_cond = self.input_hint_block(controlnet_cond)
            batch_size, channels, height_pw, width_pw = controlnet_cond.shape
            height = height_pw // self.patch_size
            width = width_pw // self.patch_size
            controlnet_cond = controlnet_cond.reshape(
                batch_size, channels, height, self.patch_size, width, self.patch_size
            )
            controlnet_cond = controlnet_cond.permute(0, 2, 4, 1, 3, 5)
            controlnet_cond = controlnet_cond.reshape(batch_size, height * width, -1)
        # add the embedded conditioning tokens to the image tokens
        hidden_states = hidden_states + self.controlnet_x_embedder(controlnet_cond)

        timestep = timestep.to(hidden_states.dtype) * 1000
        if guidance is not None:
            guidance = guidance.to(hidden_states.dtype) * 1000
        else:
            guidance = None
        temb = (
            self.time_text_embed(timestep, pooled_projections)
            if guidance is None
            else self.time_text_embed(timestep, guidance, pooled_projections)
        )
        encoder_hidden_states = self.context_embedder(encoder_hidden_states)

        if txt_ids.ndim == 3:
            logger.warning(
                "Passing `txt_ids` 3d ms.Tensor is deprecated. "
                "Please remove the batch dimension and pass it as a 2d ms.Tensor"
            )
            txt_ids = txt_ids[0]
        if img_ids.ndim == 3:
            logger.warning(
                "Passing `img_ids` 3d ms.Tensor is deprecated. "
                "Please remove the batch dimension and pass it as a 2d ms.Tensor"
            )
            img_ids = img_ids[0]

        if self.union:
            # union mode
            if controlnet_mode is None:
                raise ValueError("`controlnet_mode` cannot be `None` when applying ControlNet-Union")
            # union mode emb: prepend one mode token to the text tokens
            controlnet_mode_emb = self.controlnet_mode_embedder(controlnet_mode)
            encoder_hidden_states = ops.cat([controlnet_mode_emb, encoder_hidden_states], axis=1)
            # give the extra token a position id (txt_ids is 2d at this point)
            txt_ids = ops.cat([txt_ids[:1], txt_ids], axis=0)

        ids = ops.cat((txt_ids, img_ids), axis=0)
        image_rotary_emb = self.pos_embed(ids)

        block_samples = ()
        for index_block, block in enumerate(self.transformer_blocks):
            encoder_hidden_states, hidden_states = block(
                hidden_states=hidden_states,
                encoder_hidden_states=encoder_hidden_states,
                temb=temb,
                image_rotary_emb=image_rotary_emb,
            )
            block_samples = block_samples + (hidden_states,)

        hidden_states = ops.cat([encoder_hidden_states, hidden_states], axis=1)

        single_block_samples = ()
        for index_block, block in enumerate(self.single_transformer_blocks):
            hidden_states = block(
                hidden_states=hidden_states,
                temb=temb,
                image_rotary_emb=image_rotary_emb,
            )
            single_block_samples = single_block_samples + (hidden_states[:, encoder_hidden_states.shape[1] :],)

        # controlnet block
        controlnet_block_samples = ()
        for block_sample, controlnet_block in zip(block_samples, self.controlnet_blocks):
            block_sample = controlnet_block(block_sample)
            controlnet_block_samples = controlnet_block_samples + (block_sample,)

        controlnet_single_block_samples = ()
        for single_block_sample, controlnet_block in zip(single_block_samples, self.controlnet_single_blocks):
            single_block_sample = controlnet_block(single_block_sample)
            controlnet_single_block_samples = controlnet_single_block_samples + (single_block_sample,)

        # scaling
        controlnet_block_samples = [sample * conditioning_scale for sample in controlnet_block_samples]
        controlnet_single_block_samples = [sample * conditioning_scale for sample in controlnet_single_block_samples]

        controlnet_block_samples = None if len(controlnet_block_samples) == 0 else controlnet_block_samples
        controlnet_single_block_samples = (
            None if len(controlnet_single_block_samples) == 0 else controlnet_single_block_samples
        )

        if not return_dict:
            return (controlnet_block_samples, controlnet_single_block_samples)

        return FluxControlNetOutput(
            controlnet_block_samples=controlnet_block_samples,
            controlnet_single_block_samples=controlnet_single_block_samples,
        )
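When `conditioning_embedding_channels` is set, `construct` passes the conditioning image through `input_hint_block`. It then packs the result into tokens in three steps: a reshape to `(batch_size, channels, height, patch_size, width, patch_size)`, a permute with `(0, 2, 4, 1, 3, 5)`, and a reshape to `(batch_size, height * width, -1)`. The plain-Python sketch below reproduces that index mapping for a single sample stored as nested lists, so the resulting token order can be inspected without MindSpore. The function name `pack_patches` is hypothetical.

```python
from typing import List

Image = List[List[List[float]]]  # one sample, indexed as [channel][row][column]

def pack_patches(image: Image, patch_size: int) -> List[List[float]]:
    """Mirror reshape -> permute(0, 2, 4, 1, 3, 5) -> reshape for one sample.

    Returns height * width tokens, each holding channels * patch_size**2 features.
    """
    channels = len(image)
    height = len(image[0]) // patch_size
    width = len(image[0][0]) // patch_size
    tokens = []
    for h in range(height):  # tokens follow the patch grid in row-major order
        for w in range(width):
            token = []
            # feature order inside a token: (channel, row within patch, column within patch)
            for c in range(channels):
                for py in range(patch_size):
                    for px in range(patch_size):
                        token.append(image[c][h * patch_size + py][w * patch_size + px])
            tokens.append(token)
    return tokens

# 2 channels, 4x4 pixels, patch_size=2 -> 2x2 patch grid -> 4 tokens of 2 * 2 * 2 = 8 features
image = [[[100 * c + 10 * row + col for col in range(4)] for row in range(4)] for c in range(2)]
tokens = pack_patches(image, patch_size=2)
print(len(tokens), len(tokens[0]))  # 4 8
print(tokens[1])                    # [2, 3, 12, 13, 102, 103, 112, 113]
```

With the default `patch_size=1`, each token is simply one spatial position with `channels` features. The packed tokens go through `controlnet_x_embedder = nn.Dense(in_channels, inner_dim)`, so `channels * patch_size ** 2` has to equal `in_channels`.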

`mindone.diffusers.models.controlnet_flux.FluxControlNetModel.attn_processors` `property`

RETURNS	DESCRIPTION
	`dict` of attention processors: A dictionary containing all attention processors used in the model, indexed by weight name.
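The property walks the cell tree recursively. It records every cell that exposes `get_processor`, keyed by the cell's dotted path plus `.processor`. The stand-alone sketch below shows what the resulting keys look like. It uses a hypothetical `ToyCell` class in place of MindSpore's `nn.Cell`, mimicking only `name_cells()` and `get_processor()`. The string `"toy_processor"` is a placeholder, not a real processor class.

```python
from typing import Dict, Optional

class ToyCell:
    """Hypothetical stand-in for nn.Cell: named children plus an optional attention processor."""

    def __init__(self, children: Optional[Dict[str, "ToyCell"]] = None, processor: Optional[str] = None):
        self._children = children or {}
        if processor is not None:
            # Only attention-like cells expose get_processor, which is what the hasattr check looks for.
            self.get_processor = lambda: processor

    def name_cells(self) -> Dict[str, "ToyCell"]:
        return self._children

def collect_processors(root: ToyCell) -> Dict[str, str]:
    processors: Dict[str, str] = {}

    def visit(name: str, module: ToyCell) -> None:
        if hasattr(module, "get_processor"):
            processors[f"{name}.processor"] = module.get_processor()
        for sub_name, child in module.name_cells().items():
            visit(f"{name}.{sub_name}", child)

    for name, module in root.name_cells().items():
        visit(name, module)
    return processors

model = ToyCell({
    "transformer_blocks": ToyCell({"0": ToyCell({"attn": ToyCell(processor="toy_processor")})}),
    "controlnet_blocks": ToyCell({"0": ToyCell()}),  # no processor, so not collected
})
processors = collect_processors(model)
print(processors)  # {'transformer_blocks.0.attn.processor': 'toy_processor'}
```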

`mindone.diffusers.models.controlnet_flux.FluxControlNetModel.construct(hidden_states, controlnet_cond, controlnet_mode=None, conditioning_scale=1.0, encoder_hidden_states=None, pooled_projections=None, timestep=None, img_ids=None, txt_ids=None, guidance=None, joint_attention_kwargs=None, return_dict=False)`

The FluxControlNetModel forward method.

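PARAMETER	DESCRIPTION
`hidden_states`	Packed latent image tokens; `x_embedder` projects them along the last dimension. TYPE: `ms.Tensor` of shape `(batch_size, image_sequence_length, in_channels)`
`controlnet_cond`	The conditioning input. Without `input_hint_block` it must already be packed into tokens of shape `(batch_size, image_sequence_length, in_channels)`. When the model is built with `conditioning_embedding_channels`, it is an image-like tensor of shape `(batch_size, channels, height, width)` that is embedded and packed internally. TYPE: `ms.Tensor`
`controlnet_mode`	The mode tensor of shape `(batch_size, 1)`. Required when the model is built with `num_mode` (ControlNet-Union). TYPE: `ms.Tensor` DEFAULT: `None`
`conditioning_scale`	The scale factor applied to every ControlNet output. TYPE: `float` DEFAULT: `1.0`
`encoder_hidden_states`	Conditional embeddings (embeddings computed from the input conditions such as prompts) to use. TYPE: `ms.Tensor` of shape `(batch_size, text_sequence_length, joint_attention_dim)` DEFAULT: `None`
`pooled_projections`	Embeddings projected from the embeddings of input conditions. TYPE: `ms.Tensor` of shape `(batch_size, pooled_projection_dim)` DEFAULT: `None`
`timestep`	Used to indicate the denoising step. It is multiplied by 1000 internally. TYPE: `ms.Tensor` DEFAULT: `None`
`img_ids`	Position ids of the image tokens, used for the rotary embedding. A 3d tensor with a batch dimension is accepted but deprecated. TYPE: `ms.Tensor` of shape `(image_sequence_length, 3)` DEFAULT: `None`
`txt_ids`	Position ids of the text tokens, used for the rotary embedding. A 3d tensor with a batch dimension is accepted but deprecated. TYPE: `ms.Tensor` of shape `(text_sequence_length, 3)` DEFAULT: `None`
`guidance`	Guidance scale fed to the guidance embedding. Pass it only when the model is built with `guidance_embeds=True`. It is multiplied by 1000 internally. TYPE: `ms.Tensor` DEFAULT: `None`
`joint_attention_kwargs`	Accepted for API compatibility. In this implementation it is not forwarded to the transformer blocks, and a `scale` entry only triggers a warning. TYPE: `dict`, optional DEFAULT: `None`
`return_dict`	Whether or not to return a `FluxControlNetOutput` instead of a plain tuple. TYPE: `bool`, optional DEFAULT: `False`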
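`construct` concatenates `txt_ids` and `img_ids` along axis 0 before computing the rotary embedding. In union mode it first repeats the first text id, so the prepended mode token also gets a position. The plain-Python sketch below mimics that with nested lists. The id layout it builds is an assumption modeled on how Flux pipelines usually prepare these tensors: all-zero text ids, and `[0, row, column]` for each latent patch in row-major order. `make_img_ids`, `make_txt_ids` and `combined_ids` are hypothetical helpers.

```python
from typing import List

Ids = List[List[int]]

def make_img_ids(grid_height: int, grid_width: int) -> Ids:
    # Assumed layout: [0, row, column] for every latent patch, row-major.
    return [[0, row, col] for row in range(grid_height) for col in range(grid_width)]

def make_txt_ids(text_len: int) -> Ids:
    # Assumed layout: all-zero ids for text tokens.
    return [[0, 0, 0] for _ in range(text_len)]

def combined_ids(txt_ids: Ids, img_ids: Ids, union: bool) -> Ids:
    if union:
        # Mirrors ops.cat([txt_ids[:1], txt_ids], axis=0): one extra id for the mode token.
        txt_ids = txt_ids[:1] + txt_ids
    # Mirrors ops.cat((txt_ids, img_ids), axis=0): text positions first, then image positions.
    return txt_ids + img_ids

ids = combined_ids(make_txt_ids(3), make_img_ids(2, 2), union=True)
print(len(ids))  # 8 = (3 text + 1 mode) + 2 * 2 image
print(ids[4:])   # [[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1]]
```

The id count must match the token count seen by the blocks. In union mode, `encoder_hidden_states` gains one token, so `txt_ids` needs one extra row as well.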

RETURNS	DESCRIPTION
`Union[FluxControlNetOutput, Tuple]`	If `return_dict` is True, a `FluxControlNetOutput` is returned, otherwise a `tuple` `(controlnet_block_samples, controlnet_single_block_samples)`. Each element is a list with one tensor per ControlNet block, scaled by `conditioning_scale`, or `None` if the model has no blocks of that kind.
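The returned values hold one residual per ControlNet block. Every `controlnet_blocks` / `controlnet_single_blocks` output is multiplied by `conditioning_scale`, and an empty result becomes `None`. Below is a minimal plain-Python sketch of that last step, with floats standing in for tensors and a hypothetical `finalize_samples` helper.

```python
from typing import List, Optional, Sequence

def finalize_samples(samples: Sequence[float], conditioning_scale: float) -> Optional[List[float]]:
    # Scale every per-block residual, then return None instead of an empty list.
    scaled = [sample * conditioning_scale for sample in samples]
    return None if len(scaled) == 0 else scaled

block_samples = finalize_samples([1.0, 2.0, 4.0], conditioning_scale=0.5)
single_block_samples = finalize_samples([], conditioning_scale=0.5)
print(block_samples)         # [0.5, 1.0, 2.0]
print(single_block_samples)  # None
```

For example, a model built with `num_single_layers=0` would return `None` as the second element of the tuple.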


`mindone.diffusers.models.controlnet_flux.FluxControlNetModel.set_attn_processor(processor)`

Sets the attention processor to use to compute attention.

PARAMETER	DESCRIPTION
`processor`	The instantiated processor class or a dictionary of processor classes that will be set as the processor for all `Attention` layers. If `processor` is a dict, the key needs to define the path to the corresponding cross attention processor. This is strongly recommended when setting trainable attention processors. TYPE: `dict` of `AttentionProcessor` or only `AttentionProcessor`
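`set_attn_processor` accepts either one processor, applied to every attention layer, or a dict keyed exactly like `attn_processors`. A dict of the wrong size raises `ValueError`. Matching entries are popped from the dict as they are assigned, so a dict you pass in is empty afterwards. The minimal sketch below shows that dispatch. It assumes a hypothetical flat list of layer paths (the real method walks the cell tree instead), and `assign_processors` is a hypothetical name.

```python
from typing import Dict, List, Union

def assign_processors(layer_paths: List[str], processor: Union[str, Dict[str, str]]) -> Dict[str, str]:
    """Return the processor each attention layer path would receive."""
    count = len(layer_paths)
    if isinstance(processor, dict) and len(processor) != count:
        raise ValueError(
            f"A dict of processors was passed, but the number of processors {len(processor)} does not match the"
            f" number of attention layers: {count}. Please make sure to pass {count} processor classes."
        )
    assigned: Dict[str, str] = {}
    for path in layer_paths:
        if isinstance(processor, dict):
            # Like set_attn_processor: pop from the caller's dict, keyed "<path>.processor".
            assigned[path] = processor.pop(f"{path}.processor")
        else:
            # A single processor is shared by every attention layer.
            assigned[path] = processor
    return assigned

layers = ["transformer_blocks.0.attn", "transformer_blocks.1.attn"]
shared = assign_processors(layers, "shared_processor")
per_layer = {f"{path}.processor": f"processor_{i}" for i, path in enumerate(layers)}
individual = assign_processors(layers, per_layer)
print(individual)  # {'transformer_blocks.0.attn': 'processor_0', 'transformer_blocks.1.attn': 'processor_1'}
print(per_layer)   # {} because every entry was popped
```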


`mindone.diffusers.models.controlnet_flux.FluxControlNetOutput` `dataclass`

Bases: BaseOutput

Source code in mindone/diffusers/models/controlnet_flux.py

@dataclass
class FluxControlNetOutput(BaseOutput):
    controlnet_block_samples: Tuple[ms.Tensor]
    controlnet_single_block_samples: Tuple[ms.Tensor]
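FluxControlNetOutput carries one residual per ControlNet block. A ControlNet built with fewer blocks than the base transformer therefore produces fewer residuals than the transformer has blocks. One example is a model created with `from_transformer`, whose defaults are `num_layers=4` and `num_single_layers=10`. How the base transformer reconciles the counts lies outside this class. The sketch below assumes the scheme used by the upstream Hugging Face diffusers `FluxTransformer2DModel`, where each residual is reused for a contiguous run of `ceil(num_blocks / num_samples)` blocks. The MindOne transformer may differ, and `control_index_for_block` is a hypothetical helper.

```python
import math
from typing import List

def control_index_for_block(num_blocks: int, num_samples: int) -> List[int]:
    """Index of the ControlNet residual each base-transformer block would receive (assumed scheme)."""
    interval = int(math.ceil(num_blocks / num_samples))
    return [block_index // interval for block_index in range(num_blocks)]

# 19 double-stream blocks (the num_layers default shown above) fed by a ControlNet built with num_layers=4
mapping = control_index_for_block(num_blocks=19, num_samples=4)
print(mapping)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3]
```

With 19 blocks and 4 residuals the interval is `ceil(4.75) = 5`, so the last residual covers only the final 4 blocks.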