MindOne - One for All

mindspore-lab/mindone

VAE Image Processor

The `VaeImageProcessor` provides a unified API for `StableDiffusionPipeline`s to prepare image inputs for VAE encoding and to post-process outputs once they're decoded. This includes transformations such as resizing, normalization, and conversion between PIL Images, MindSpore tensors, and NumPy arrays.
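A minimal NumPy-only sketch of the value-range and layout conversions described above, assuming NHWC `uint8` input like a decoded PIL image. The helpers `to_model_input` and `to_uint8_images` are hypothetical; they mirror the arithmetic of `pil_to_numpy`, `numpy_to_ms`, `normalize`, `denormalize`, `ms_to_numpy`, and `numpy_to_pil`, but skip MindSpore tensors and resizing.

```python
import numpy as np


def to_model_input(images_uint8: np.ndarray) -> np.ndarray:
    """NHWC uint8 in [0, 255] -> NCHW float32 in [-1, 1] (hypothetical helper)."""
    x = images_uint8.astype(np.float32) / 255.0  # scale to [0, 1], like pil_to_numpy
    x = x.transpose(0, 3, 1, 2)                  # NHWC -> NCHW, like numpy_to_ms
    return 2.0 * x - 1.0                          # [0, 1] -> [-1, 1], like normalize


def to_uint8_images(x: np.ndarray) -> np.ndarray:
    """NCHW float in [-1, 1] -> NHWC uint8 (hypothetical helper)."""
    x = np.clip(x / 2 + 0.5, 0.0, 1.0)           # [-1, 1] -> [0, 1], like denormalize
    x = x.transpose(0, 2, 3, 1)                  # NCHW -> NHWC, like ms_to_numpy
    return (x * 255).round().astype(np.uint8)    # quantize, like numpy_to_pil


rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(2, 8, 8, 3)).astype(np.uint8)
model_in = to_model_input(batch)
print(model_in.shape)  # (2, 3, 8, 8)
```

Because both directions are exact inverses up to float32 rounding, converting to the model range and back recovers the original `uint8` pixels.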

All pipelines with `VaeImageProcessor` accept PIL Images, MindSpore tensors, or NumPy arrays as image inputs and return outputs in the format selected by the user's `output_type` argument. You can pass encoded image latents directly to the pipeline, and return latents from the pipeline by setting `output_type="latent"`. This allows you to take the generated latents from one pipeline and pass them to another pipeline as input without leaving the latent space. It also makes it much easier to use multiple pipelines together by passing MindSpore tensors directly between different pipelines.
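How does the processor tell latents from images? In the MindSpore-tensor branch of `preprocess`, the only check is the channel dimension: if `image.shape[1]` equals `vae_latent_channels`, the tensor is returned untouched, skipping resizing and normalization. The NumPy sketch below reproduces just that decision; `route_input` is a hypothetical helper, not part of mindone.

```python
import numpy as np

VAE_LATENT_CHANNELS = 4  # mirrors the `vae_latent_channels` default


def route_input(x: np.ndarray, vae_latent_channels: int = VAE_LATENT_CHANNELS) -> str:
    """Hypothetical helper: report how an NCHW batch would be treated."""
    if x.ndim != 4:
        raise ValueError("expected an NCHW batch")
    if x.shape[1] == vae_latent_channels:
        return "latent"  # passed through unchanged
    return "image"       # resized and normalized before VAE encoding


print(route_input(np.zeros((1, 4, 64, 64))))  # latent
print(route_input(np.zeros((1, 3, 64, 64))))  # image
```

One consequence of this shape-based check is that a 4-channel image tensor (for example RGBA) would also be passed through as latents under the default `vae_latent_channels=4`.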

`mindone.diffusers.image_processor.VaeImageProcessor`

Bases: ConfigMixin

Image processor for VAE.

PARAMETER	DESCRIPTION
`do_resize`	Whether to downscale the image's (height, width) dimensions to multiples of `vae_scale_factor`. Can accept `height` and `width` arguments from the `VaeImageProcessor.preprocess` method. TYPE: `bool`, optional DEFAULT: `True`
`vae_scale_factor`	VAE scale factor. If `do_resize` is `True`, the image is automatically resized to multiples of this factor. TYPE: `int`, optional DEFAULT: `8`
`vae_latent_channels`	Number of channels in the VAE latent space. A MindSpore tensor passed to `preprocess` with this many channels is treated as latents and returned unchanged. TYPE: `int`, optional DEFAULT: `4`
`resample`	Resampling filter to use when resizing the image. TYPE: `str`, optional DEFAULT: `'lanczos'`
`do_normalize`	Whether to normalize the image to [-1,1]. TYPE: `bool`, optional DEFAULT: `True`
`do_binarize`	Whether to binarize the image to 0/1. TYPE: `bool`, optional DEFAULT: `False`
`do_convert_rgb`	Whether to convert the images to RGB format. TYPE: `bool`, optional DEFAULT: `False`
`do_convert_grayscale`	Whether to convert the images to grayscale format. Cannot be combined with `do_convert_rgb=True`. TYPE: `bool`, optional DEFAULT: `False`
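With `do_resize=True`, the target size comes from `get_default_height_width`, which rounds each side down to a multiple of `vae_scale_factor` using `x - x % vae_scale_factor`. A small pure-Python sketch of that arithmetic; `default_height_width` is a hypothetical stand-in that takes plain integers instead of an image.

```python
from typing import Tuple


def default_height_width(height: int, width: int, vae_scale_factor: int = 8) -> Tuple[int, int]:
    """Round each side down to the nearest multiple of vae_scale_factor (hypothetical helper)."""
    height, width = (x - x % vae_scale_factor for x in (height, width))
    return height, width


print(default_height_width(513, 770))  # (512, 768)
```

Rounding down, rather than to the nearest multiple, guarantees the resized image never exceeds the requested size.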

Source code in mindone/diffusers/image_processor.py

class VaeImageProcessor(ConfigMixin):
    """
    Image processor for VAE.

    Args:
        do_resize (`bool`, *optional*, defaults to `True`):
            Whether to downscale the image's (height, width) dimensions to multiples of `vae_scale_factor`. Can accept
            `height` and `width` arguments from [`image_processor.VaeImageProcessor.preprocess`] method.
        vae_scale_factor (`int`, *optional*, defaults to `8`):
            VAE scale factor. If `do_resize` is `True`, the image is automatically resized to multiples of this factor.
        vae_latent_channels (`int`, *optional*, defaults to `4`):
            Number of channels in the VAE latent space. A MindSpore tensor passed to `preprocess` with this many
            channels is treated as latents and returned unchanged.
        resample (`str`, *optional*, defaults to `lanczos`):
            Resampling filter to use when resizing the image.
        do_normalize (`bool`, *optional*, defaults to `True`):
            Whether to normalize the image to [-1,1].
        do_binarize (`bool`, *optional*, defaults to `False`):
            Whether to binarize the image to 0/1.
        do_convert_rgb (`bool`, *optional*, defaults to `False`):
            Whether to convert the images to RGB format.
        do_convert_grayscale (`bool`, *optional*, defaults to `False`):
            Whether to convert the images to grayscale format.
    """

    config_name = CONFIG_NAME

    @register_to_config
    def __init__(
        self,
        do_resize: bool = True,
        vae_scale_factor: int = 8,
        vae_latent_channels: int = 4,
        resample: str = "lanczos",
        do_normalize: bool = True,
        do_binarize: bool = False,
        do_convert_rgb: bool = False,
        do_convert_grayscale: bool = False,
    ):
        super().__init__()
        if do_convert_rgb and do_convert_grayscale:
            raise ValueError(
                "`do_convert_rgb` and `do_convert_grayscale` can not both be set to `True`."
                " If you intended to convert the image into RGB format, please set `do_convert_grayscale = False`."
                " If you intended to convert the image into grayscale format, please set `do_convert_rgb = False`."
            )

    @staticmethod
    def numpy_to_pil(images: np.ndarray) -> List[PIL.Image.Image]:
        r"""
        Convert a numpy image or a batch of images to a PIL image.

        Args:
            images (`np.ndarray`):
                The image array to convert to PIL format.
        Returns:
            `List[PIL.Image.Image]`:
                A list of PIL images.
        """
        if images.ndim == 3:
            images = images[None, ...]
        images = (images * 255).round().astype("uint8")
        if images.shape[-1] == 1:
            # special case for grayscale (single channel) images
            pil_images = [Image.fromarray(image.squeeze(), mode="L") for image in images]
        else:
            pil_images = [Image.fromarray(image) for image in images]

        return pil_images

    @staticmethod
    def pil_to_numpy(images: Union[List[PIL.Image.Image], PIL.Image.Image]) -> np.ndarray:
        r"""
        Convert a PIL image or a list of PIL images to NumPy arrays.
        Args:
            images (`PIL.Image.Image` or `List[PIL.Image.Image]`):
                The PIL image or list of images to convert to NumPy format.
        Returns:
            `np.ndarray`:
                A NumPy array representation of the images.
        """
        if not isinstance(images, list):
            images = [images]
        images = [np.array(image).astype(np.float32) / 255.0 for image in images]
        images = np.stack(images, axis=0)

        return images

    @staticmethod
    def numpy_to_ms(images: np.ndarray) -> ms.Tensor:
        r"""
        Convert a NumPy image to a MindSpore tensor.

        Args:
            images (`np.ndarray`):
                The NumPy image array to convert to MindSpore format.
        Returns:
            `ms.Tensor`:
                A MindSpore tensor representation of the images.
        """
        if images.ndim == 3:
            images = images[..., None]

        images = ms.Tensor(images.transpose(0, 3, 1, 2))
        return images

    @staticmethod
    def ms_to_numpy(images: ms.Tensor) -> np.ndarray:
        r"""
        Convert a MindSpore tensor to a NumPy image.

        Args:
            images (`ms.Tensor`):
                The MindSpore tensor to convert to NumPy format.
        Returns:
            `np.ndarray`:
                A NumPy array representation of the images.
        """
        images = images.permute(0, 2, 3, 1).float().numpy()
        return images

    @staticmethod
    def normalize(images: Union[np.ndarray, ms.Tensor]) -> Union[np.ndarray, ms.Tensor]:
        r"""
        Normalize an image array to [-1,1].

        Args:
            images (`np.ndarray` or `ms.Tensor`):
                The image array to normalize.
        Returns:
            `np.ndarray` or `ms.Tensor`:
                The normalized image array.
        """
        return 2.0 * images - 1.0

    @staticmethod
    def denormalize(images: Union[np.ndarray, ms.Tensor]) -> Union[np.ndarray, ms.Tensor]:
        r"""
        Denormalize an image array to [0,1].

        Args:
            images (`np.ndarray` or `ms.Tensor`):
                The image array to denormalize.
        Returns:
            `np.ndarray` or `ms.Tensor`:
                The denormalized image array.
        """
        return (images / 2 + 0.5).clamp(0, 1)

    @staticmethod
    def convert_to_rgb(image: PIL.Image.Image) -> PIL.Image.Image:
        r"""
        Converts a PIL image to RGB format.

        Args:
            image (`PIL.Image.Image`):
                The PIL image to convert to RGB.
        Returns:
            `PIL.Image.Image`:
                The RGB-converted PIL image.
        """
        image = image.convert("RGB")

        return image

    @staticmethod
    def convert_to_grayscale(image: PIL.Image.Image) -> PIL.Image.Image:
        r"""
        Converts a given PIL image to grayscale.

        Args:
            image (`PIL.Image.Image`):
                The input image to convert.
        Returns:
            `PIL.Image.Image`:
                The image converted to grayscale.
        """
        image = image.convert("L")

        return image

    @staticmethod
    def blur(image: PIL.Image.Image, blur_factor: int = 4) -> PIL.Image.Image:
        r"""
        Applies Gaussian blur to an image.

        Args:
            image (`PIL.Image.Image`):
                The PIL image to convert to grayscale.
        Returns:
            `PIL.Image.Image`:
                The grayscale-converted PIL image.
        """
        image = image.filter(ImageFilter.GaussianBlur(blur_factor))

        return image

    @staticmethod
    def get_crop_region(mask_image: PIL.Image.Image, width: int, height: int, pad=0):
        r"""
        Finds a rectangular region that contains all masked areas in an image, and expands the region to match the
        aspect ratio of the processing dimensions; for example, if the user drew a mask in a 128x32 region, and the
        dimensions for processing are 512x512, the region will be expanded to 128x128.

        Args:
            mask_image (PIL.Image.Image): Mask image.
            width (int): Width of the image to be processed.
            height (int): Height of the image to be processed.
            pad (int, optional): Padding to be added to the crop region. Defaults to 0.

        Returns:
            tuple: (x1, y1, x2, y2) represents a rectangular region that contains all masked areas in an image and
            matches the aspect ratio of `width` x `height`.
        """

        mask_image = mask_image.convert("L")
        mask = np.array(mask_image)

        # 1. find a rectangular region that contains all masked areas in an image
        h, w = mask.shape
        crop_left = 0
        for i in range(w):
            if not (mask[:, i] == 0).all():
                break
            crop_left += 1

        crop_right = 0
        for i in reversed(range(w)):
            if not (mask[:, i] == 0).all():
                break
            crop_right += 1

        crop_top = 0
        for i in range(h):
            if not (mask[i] == 0).all():
                break
            crop_top += 1

        crop_bottom = 0
        for i in reversed(range(h)):
            if not (mask[i] == 0).all():
                break
            crop_bottom += 1

        # 2. add padding to the crop region
        x1, y1, x2, y2 = (
            int(max(crop_left - pad, 0)),
            int(max(crop_top - pad, 0)),
            int(min(w - crop_right + pad, w)),
            int(min(h - crop_bottom + pad, h)),
        )

        # 3. expands crop region to match the aspect ratio of the image to be processed
        ratio_crop_region = (x2 - x1) / (y2 - y1)
        ratio_processing = width / height

        if ratio_crop_region > ratio_processing:
            desired_height = (x2 - x1) / ratio_processing
            desired_height_diff = int(desired_height - (y2 - y1))
            y1 -= desired_height_diff // 2
            y2 += desired_height_diff - desired_height_diff // 2
            if y2 >= mask_image.height:
                diff = y2 - mask_image.height
                y2 -= diff
                y1 -= diff
            if y1 < 0:
                y2 -= y1
                y1 -= y1
            if y2 >= mask_image.height:
                y2 = mask_image.height
        else:
            desired_width = (y2 - y1) * ratio_processing
            desired_width_diff = int(desired_width - (x2 - x1))
            x1 -= desired_width_diff // 2
            x2 += desired_width_diff - desired_width_diff // 2
            if x2 >= mask_image.width:
                diff = x2 - mask_image.width
                x2 -= diff
                x1 -= diff
            if x1 < 0:
                x2 -= x1
                x1 -= x1
            if x2 >= mask_image.width:
                x2 = mask_image.width

        return x1, y1, x2, y2

    def _resize_and_fill(
        self,
        image: PIL.Image.Image,
        width: int,
        height: int,
    ) -> PIL.Image.Image:
        r"""
        Resize the image to fit within the specified width and height, maintaining the aspect ratio, and then center
        the image within the dimensions, filling the empty space with data from the image.

        Args:
            image (`PIL.Image.Image`):
                The image to resize and fill.
            width (`int`):
                The width to resize the image to.
            height (`int`):
                The height to resize the image to.

        Returns:
            `PIL.Image.Image`:
                The resized and filled image.
        """

        ratio = width / height
        src_ratio = image.width / image.height

        src_w = width if ratio < src_ratio else image.width * height // image.height
        src_h = height if ratio >= src_ratio else image.height * width // image.width

        resized = image.resize((src_w, src_h), resample=PIL_INTERPOLATION["lanczos"])
        res = Image.new("RGB", (width, height))
        res.paste(resized, box=(width // 2 - src_w // 2, height // 2 - src_h // 2))

        if ratio < src_ratio:
            fill_height = height // 2 - src_h // 2
            if fill_height > 0:
                res.paste(resized.resize((width, fill_height), box=(0, 0, width, 0)), box=(0, 0))
                res.paste(
                    resized.resize((width, fill_height), box=(0, resized.height, width, resized.height)),
                    box=(0, fill_height + src_h),
                )
        elif ratio > src_ratio:
            fill_width = width // 2 - src_w // 2
            if fill_width > 0:
                res.paste(resized.resize((fill_width, height), box=(0, 0, 0, height)), box=(0, 0))
                res.paste(
                    resized.resize((fill_width, height), box=(resized.width, 0, resized.width, height)),
                    box=(fill_width + src_w, 0),
                )

        return res

    def _resize_and_crop(
        self,
        image: PIL.Image.Image,
        width: int,
        height: int,
    ) -> PIL.Image.Image:
        r"""
        Resize the image to fit within the specified width and height, maintaining the aspect ratio, and then center
        the image within the dimensions, cropping the excess.

        Args:
            image (`PIL.Image.Image`):
                The image to resize and crop.
            width (`int`):
                The width to resize the image to.
            height (`int`):
                The height to resize the image to.

        Returns:
            `PIL.Image.Image`:
                The resized and cropped image.
        """
        ratio = width / height
        src_ratio = image.width / image.height

        src_w = width if ratio > src_ratio else image.width * height // image.height
        src_h = height if ratio <= src_ratio else image.height * width // image.width

        resized = image.resize((src_w, src_h), resample=PIL_INTERPOLATION["lanczos"])
        res = Image.new("RGB", (width, height))
        res.paste(resized, box=(width // 2 - src_w // 2, height // 2 - src_h // 2))
        return res

    def resize(
        self,
        image: Union[PIL.Image.Image, np.ndarray, ms.Tensor],
        height: int,
        width: int,
        resize_mode: str = "default",  # "default", "fill", "crop"
    ) -> Union[PIL.Image.Image, np.ndarray, ms.Tensor]:
        """
        Resize image.

        Args:
            image (`PIL.Image.Image`, `np.ndarray` or `ms.Tensor`):
                The image input, can be a PIL image, numpy array or mindspore tensor.
            height (`int`):
                The height to resize to.
            width (`int`):
                The width to resize to.
            resize_mode (`str`, *optional*, defaults to `default`):
                The resize mode to use, can be one of `default`, `fill`, or `crop`. If `default`, will resize the image
                to fit within the specified width and height, and may not maintain the original aspect ratio. If
                `fill`, will resize the image to fit within the specified width and height, maintaining the aspect
                ratio, and then center the image within the dimensions, filling the empty space with data from the
                image. If `crop`, will resize the image to fit within the specified width and height, maintaining the
                aspect ratio, and then center the image within the dimensions, cropping the excess. Note that
                resize_mode `fill` and `crop` are only supported for PIL image input.

        Returns:
            `PIL.Image.Image`, `np.ndarray` or `ms.Tensor`:
                The resized image.
        """
        if resize_mode != "default" and not isinstance(image, PIL.Image.Image):
            raise ValueError(f"Only PIL image input is supported for resize_mode {resize_mode}")
        if isinstance(image, PIL.Image.Image):
            if resize_mode == "default":
                image = image.resize((width, height), resample=PIL_INTERPOLATION[self.config.resample])
            elif resize_mode == "fill":
                image = self._resize_and_fill(image, width, height)
            elif resize_mode == "crop":
                image = self._resize_and_crop(image, width, height)
            else:
                raise ValueError(f"resize_mode {resize_mode} is not supported")

        elif isinstance(image, ms.Tensor):
            image = ops.interpolate(
                image,
                size=(height, width),
            )
        elif isinstance(image, np.ndarray):
            image = self.numpy_to_ms(image)
            image = ops.interpolate(
                image,
                size=(height, width),
            )
            image = self.ms_to_numpy(image)
        return image

    def binarize(self, image: PIL.Image.Image) -> PIL.Image.Image:
        """
        Create a mask.

        Args:
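            image (`PIL.Image.Image`):
                The image input. Although annotated as a PIL image, thresholding uses boolean-mask assignment, so in
                `preprocess` it is applied to the already-converted MindSpore tensor.

        Returns:
            `PIL.Image.Image`:
                The binarized image. Values less than 0.5 are set to 0, and values greater than or equal to 0.5 are
                set to 1.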
        """
        image[image < 0.5] = 0
        image[image >= 0.5] = 1

        return image

    def get_default_height_width(
        self,
        image: Union[PIL.Image.Image, np.ndarray, ms.Tensor],
        height: Optional[int] = None,
        width: Optional[int] = None,
    ) -> Tuple[int, int]:
        r"""
        Returns the height and width of the image, rounded down to the nearest integer multiple of `vae_scale_factor`.

        Args:
            image (`Union[PIL.Image.Image, np.ndarray, ms.Tensor]`):
                The image input, which can be a PIL image, NumPy array, or MindSpore tensor. If it is a NumPy array, it
                should have shape `[batch, height, width]` or `[batch, height, width, channels]`. If it is a MindSpore
                tensor, it should have shape `[batch, channels, height, width]`.
            height (`Optional[int]`, *optional*, defaults to `None`):
                The height of the preprocessed image. If `None`, the height of the `image` input will be used.
            width (`Optional[int]`, *optional*, defaults to `None`):
                The width of the preprocessed image. If `None`, the width of the `image` input will be used.

        Returns:
            `Tuple[int, int]`:
                A tuple containing the height and width, both rounded down to the nearest integer multiple of
                `vae_scale_factor`.
        """

        if height is None:
            if isinstance(image, PIL.Image.Image):
                height = image.height
            elif isinstance(image, ms.Tensor):
                height = image.shape[2]
            else:
                height = image.shape[1]

        if width is None:
            if isinstance(image, PIL.Image.Image):
                width = image.width
            elif isinstance(image, ms.Tensor):
                width = image.shape[3]
            else:
                width = image.shape[2]

        width, height = (
            x - x % self.config.vae_scale_factor for x in (width, height)
        )  # resize to integer multiple of vae_scale_factor

        return height, width

    def preprocess(
        self,
        image: PipelineImageInput,
        height: Optional[int] = None,
        width: Optional[int] = None,
        resize_mode: str = "default",  # "default", "fill", "crop"
        crops_coords: Optional[Tuple[int, int, int, int]] = None,
    ) -> ms.Tensor:
        """
        Preprocess the image input.

        Args:
            image (`PipelineImageInput`):
                The image input. Accepted formats are PIL images, NumPy arrays, and MindSpore tensors, as well as
                lists of these formats.
            height (`int`, *optional*):
                The height of the preprocessed image. If `None`, `get_default_height_width()` is used to get the
                default height.
            width (`int`, *optional*):
                The width of the preprocessed image. If `None`, `get_default_height_width()` is used to get the
                default width.
            resize_mode (`str`, *optional*, defaults to `default`):
                The resize mode, can be one of `default`, `fill`, or `crop`. If `default`, will resize the image to fit
                within the specified width and height, and may not maintain the original aspect ratio. If `fill`, will
                resize the image to fit within the specified width and height, maintaining the aspect ratio, and then
                center the image within the dimensions, filling the empty space with data from the image. If `crop`,
                will resize the image to fit within the specified width and height, maintaining the aspect ratio, and
                then center the image within the dimensions, cropping the excess. Note that resize_mode `fill` and
                `crop` are only supported for PIL image input.
            crops_coords (`Tuple[int, int, int, int]`, *optional*, defaults to `None`):
                The crop box `(left, upper, right, lower)` applied to every PIL image in the batch before resizing.
                If `None`, will not crop the image.

        Returns:
            `ms.Tensor`:
                The preprocessed image.
        """
        supported_formats = (PIL.Image.Image, np.ndarray, ms.Tensor)

        # Expand the missing dimension for 3-dimensional mindspore tensor or numpy array that represents grayscale image
        if self.config.do_convert_grayscale and isinstance(image, (ms.Tensor, np.ndarray)) and image.ndim == 3:
            if isinstance(image, ms.Tensor):
                # if image is a mindspore tensor, it could have 2 possible shapes:
                #    1. batch x height x width: we should insert the channel dimension at position 1
                #    2. channel x height x width: we should insert batch dimension at position 0,
                #       however, since both channel and batch dimensions have size 1, inserting at position 1 is equivalent
                #    for simplicity, we insert a dimension of size 1 at position 1 for both cases
                image = image.unsqueeze(1)
            else:
                # if it is a numpy array, it could have 2 possible shapes:
                #   1. batch x height x width: insert channel dimension on last position
                #   2. height x width x channel: insert batch dimension on first position
                if image.shape[-1] == 1:
                    image = np.expand_dims(image, axis=0)
                else:
                    image = np.expand_dims(image, axis=-1)

        if isinstance(image, list) and isinstance(image[0], np.ndarray) and image[0].ndim == 4:
            warnings.warn(
                "Passing `image` as a list of 4d np.ndarray is deprecated."
                "Please concatenate the list along the batch dimension and pass it as a single 4d np.ndarray",
                FutureWarning,
            )
            image = np.concatenate(image, axis=0)
        if isinstance(image, list) and isinstance(image[0], ms.Tensor) and image[0].ndim == 4:
            warnings.warn(
                "Passing `image` as a list of 4d ms.Tensor is deprecated."
                "Please concatenate the list along the batch dimension and pass it as a single 4d ms.Tensor",
                FutureWarning,
            )
            image = ops.cat(image, axis=0)

        if not is_valid_image_imagelist(image):
            raise ValueError(
                f"Input is in incorrect format. Currently, we only support {', '.join(str(x) for x in supported_formats)}"
            )
        if not isinstance(image, list):
            image = [image]

        if isinstance(image[0], PIL.Image.Image):
            if crops_coords is not None:
                image = [i.crop(crops_coords) for i in image]
            if self.config.do_resize:
                height, width = self.get_default_height_width(image[0], height, width)
                image = [self.resize(i, height, width, resize_mode=resize_mode) for i in image]
            if self.config.do_convert_rgb:
                image = [self.convert_to_rgb(i) for i in image]
            elif self.config.do_convert_grayscale:
                image = [self.convert_to_grayscale(i) for i in image]
            image = self.pil_to_numpy(image)  # to np
            image = self.numpy_to_ms(image)  # to ms

        elif isinstance(image[0], np.ndarray):
            image = np.concatenate(image, axis=0) if image[0].ndim == 4 else np.stack(image, axis=0)

            image = self.numpy_to_ms(image)

            height, width = self.get_default_height_width(image, height, width)
            if self.config.do_resize:
                image = self.resize(image, height, width)

        elif isinstance(image[0], ms.Tensor):
            image = ops.cat(image, axis=0) if image[0].ndim == 4 else ops.stack(image, axis=0)

            if self.config.do_convert_grayscale and image.ndim == 3:
                image = image.unsqueeze(1)

            channel = image.shape[1]
            # don't need any preprocess if the image is latents
            if channel == self.config.vae_latent_channels:
                return image

            height, width = self.get_default_height_width(image, height, width)
            if self.config.do_resize:
                image = self.resize(image, height, width)

        # expected range [0,1], normalize to [-1,1]
        do_normalize = self.config.do_normalize
        if do_normalize and image.min() < 0:
            warnings.warn(
                "Passing `image` as MindSpore tensor with value range in [-1,1] is deprecated. The expected value range for image tensor is [0,1] "
                f"when passing as mindspore tensor or numpy Array. You passed `image` with value range [{image.min()},{image.max()}]",
                FutureWarning,
            )
            do_normalize = False
        if do_normalize:
            image = self.normalize(image)

        if self.config.do_binarize:
            image = self.binarize(image)

        return image

    def postprocess(
        self,
        image: ms.Tensor,
        output_type: str = "pil",
        do_denormalize: Optional[List[bool]] = None,
    ) -> Union[PIL.Image.Image, np.ndarray, ms.Tensor]:
        """
        Postprocess the image output from tensor to `output_type`.

        Args:
            image (`ms.Tensor`):
                The image input, should be a mindspore tensor with shape `B x C x H x W`.
            output_type (`str`, *optional*, defaults to `pil`):
                The output type of the image, can be one of `pil`, `np`, `ms`, `latent`.
            do_denormalize (`List[bool]`, *optional*, defaults to `None`):
                Whether to denormalize the image to [0,1]. If `None`, will use the value of `do_normalize` in the
                `VaeImageProcessor` config.

        Returns:
            `PIL.Image.Image`, `np.ndarray` or `ms.Tensor`:
                The postprocessed image.
        """
        if not isinstance(image, ms.Tensor):
            raise ValueError(
                f"Input for postprocessing is in incorrect format: {type(image)}. We only support mindspore tensor"
            )
        if output_type not in ["latent", "ms", "np", "pil"]:
            deprecation_message = (
                f"the output_type {output_type} is outdated and has been set to `np`. Please make sure to set it to one of these instead: "
                "`pil`, `np`, `ms`, `latent`"
            )
            deprecate("Unsupported output_type", "1.0.0", deprecation_message, standard_warn=False)
            output_type = "np"

        if output_type == "latent":
            return image

        if do_denormalize is None:
            do_denormalize = [self.config.do_normalize] * image.shape[0]

        image = ops.stack(
            [self.denormalize(image[i]) if do_denormalize[i] else image[i] for i in range(image.shape[0])]
        )

        if output_type == "ms":
            return image

        image = self.ms_to_numpy(image)

        if output_type == "np":
            return image

        if output_type == "pil":
            return self.numpy_to_pil(image)

    def apply_overlay(
        self,
        mask: PIL.Image.Image,
        init_image: PIL.Image.Image,
        image: PIL.Image.Image,
        crop_coords: Optional[Tuple[int, int, int, int]] = None,
    ) -> PIL.Image.Image:
        r"""
        Applies an overlay of the mask and the inpainted image on the original image.

        Args:
            mask (`PIL.Image.Image`):
                The mask image that highlights regions to overlay.
            init_image (`PIL.Image.Image`):
                The original image to which the overlay is applied.
            image (`PIL.Image.Image`):
                The image to overlay onto the original.
            crop_coords (`Tuple[int, int, int, int]`, *optional*):
                Coordinates to crop the image. If provided, the image will be cropped accordingly.

        Returns:
            `PIL.Image.Image`:
                The final image with the overlay applied.
        """

        width, height = image.width, image.height

        init_image = self.resize(init_image, width=width, height=height)
        mask = self.resize(mask, width=width, height=height)

        init_image_masked = PIL.Image.new("RGBa", (width, height))
        init_image_masked.paste(init_image.convert("RGBA").convert("RGBa"), mask=ImageOps.invert(mask.convert("L")))
        init_image_masked = init_image_masked.convert("RGBA")

        if crop_coords is not None:
            x, y, x2, y2 = crop_coords
            w = x2 - x
            h = y2 - y
            base_image = PIL.Image.new("RGBA", (width, height))
            image = self.resize(image, height=h, width=w, resize_mode="crop")
            base_image.paste(image, (x, y))
            image = base_image.convert("RGB")

        image = image.convert("RGBA")
        image.alpha_composite(init_image_masked)
        image = image.convert("RGB")

        return image
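The first two steps of `get_crop_region` (find the bounding box of all nonzero mask pixels, then pad and clamp it) scan columns and rows one at a time. The sketch below expresses the same two steps with vectorized NumPy. `mask_bbox` is a hypothetical helper: it takes a 2-D array instead of a PIL image, assumes the mask has at least one nonzero pixel, and omits step 3 (aspect-ratio expansion).

```python
from typing import Tuple

import numpy as np


def mask_bbox(mask: np.ndarray, pad: int = 0) -> Tuple[int, int, int, int]:
    """Return (x1, y1, x2, y2) enclosing all nonzero pixels of a 2-D mask, padded and clamped."""
    h, w = mask.shape
    cols = np.flatnonzero(mask.any(axis=0))  # indices of columns containing a masked pixel
    rows = np.flatnonzero(mask.any(axis=1))  # indices of rows containing a masked pixel
    if cols.size == 0:
        raise ValueError("mask has no nonzero pixels")
    x1 = max(int(cols[0]) - pad, 0)
    y1 = max(int(rows[0]) - pad, 0)
    x2 = min(int(cols[-1]) + 1 + pad, w)  # exclusive right edge, like w - crop_right
    y2 = min(int(rows[-1]) + 1 + pad, h)  # exclusive bottom edge, like h - crop_bottom
    return x1, y1, x2, y2


m = np.zeros((10, 10), dtype=np.uint8)
m[2:5, 3:7] = 255
print(mask_bbox(m))  # (3, 2, 7, 5)
```

For a non-empty mask this returns the same coordinates as the loop-based version before its aspect-ratio step; the explicit error replaces the degenerate box the loops would produce for an all-zero mask.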

`mindone.diffusers.image_processor.VaeImageProcessor.apply_overlay(mask, init_image, image, crop_coords=None)`

Applies an overlay of the mask and the inpainted image on the original image.

PARAMETER	DESCRIPTION
`mask`	The mask image that highlights regions to overlay. TYPE: `PIL.Image.Image`
`init_image`	The original image to which the overlay is applied. TYPE: `PIL.Image.Image`
`image`	The image to overlay onto the original. TYPE: `PIL.Image.Image`
`crop_coords`	Coordinates to crop the image. If provided, the image will be cropped accordingly. TYPE: `Tuple[int, int, int, int]`, optional DEFAULT: `None`

RETURNS	DESCRIPTION
`Image`	`PIL.Image.Image`: The final image with the overlay applied.
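Pixel-wise, `apply_overlay` keeps the inpainted image where the mask is white and restores the original image where the mask is black: it pastes `init_image` through the inverted mask and alpha-composites that layer over `image`. A rough NumPy equivalent is a linear blend weighted by the mask. `overlay_blend` is a hypothetical helper that omits the resizing and `crop_coords` handling, and PIL's premultiplied `RGBa` compositing and 8-bit rounding mean intermediate (gray) mask values may differ slightly.

```python
import numpy as np


def overlay_blend(mask: np.ndarray, init_image: np.ndarray, inpainted: np.ndarray) -> np.ndarray:
    """mask: HxW uint8 (255 = inpainted region); images: HxWx3 uint8 of the same size."""
    m = (mask.astype(np.float32) / 255.0)[..., None]  # HxWx1 weight for inpainted pixels
    out = m * inpainted.astype(np.float32) + (1.0 - m) * init_image.astype(np.float32)
    return out.round().astype(np.uint8)


mask = np.zeros((4, 4), dtype=np.uint8)
mask[:, :2] = 255  # inpaint only the left half
result = overlay_blend(mask, np.full((4, 4, 3), 10, np.uint8), np.full((4, 4, 3), 200, np.uint8))
```

This is why the overlay step is useful after inpainting: pixels outside the mask come back from the original image instead of the VAE round trip.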

Source code in mindone/diffusers/image_processor.py

def apply_overlay(
    self,
    mask: PIL.Image.Image,
    init_image: PIL.Image.Image,
    image: PIL.Image.Image,
    crop_coords: Optional[Tuple[int, int, int, int]] = None,
) -> PIL.Image.Image:
    r"""
    Applies an overlay of the mask and the inpainted image on the original image.

    Args:
        mask (`PIL.Image.Image`):
            The mask image that highlights regions to overlay.
        init_image (`PIL.Image.Image`):
            The original image to which the overlay is applied.
        image (`PIL.Image.Image`):
            The image to overlay onto the original.
        crop_coords (`Tuple[int, int, int, int]`, *optional*):
            Coordinates to crop the image. If provided, the image will be cropped accordingly.

    Returns:
        `PIL.Image.Image`:
            The final image with the overlay applied.
    """

    width, height = image.width, image.height

    init_image = self.resize(init_image, width=width, height=height)
    mask = self.resize(mask, width=width, height=height)

    init_image_masked = PIL.Image.new("RGBa", (width, height))
    init_image_masked.paste(init_image.convert("RGBA").convert("RGBa"), mask=ImageOps.invert(mask.convert("L")))
    init_image_masked = init_image_masked.convert("RGBA")

    if crop_coords is not None:
        x, y, x2, y2 = crop_coords
        w = x2 - x
        h = y2 - y
        base_image = PIL.Image.new("RGBA", (width, height))
        image = self.resize(image, height=h, width=w, resize_mode="crop")
        base_image.paste(image, (x, y))
        image = base_image.convert("RGB")

    image = image.convert("RGBA")
    image.alpha_composite(init_image_masked)
    image = image.convert("RGB")

    return image
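The PIL calls in `apply_overlay` amount to a per-pixel alpha blend: the inverted mask is used as the opacity of the original image, which is then composited over the inpainted result. The NumPy sketch below reproduces that arithmetic on plain arrays. `overlay_sketch` is a hypothetical helper, not part of mindone. It ignores `crop_coords` and the 8-bit rounding PIL performs when converting between `RGBa` and `RGBA`.

```python
import numpy as np


def overlay_sketch(mask: np.ndarray, original: np.ndarray, inpainted: np.ndarray) -> np.ndarray:
    """Blend `original` back over `inpainted` outside the mask.

    mask:      (H, W) uint8, 255 where the image was inpainted, 0 where it was kept.
    original:  (H, W, 3) uint8, already resized to the inpainted image's size.
    inpainted: (H, W, 3) uint8.
    """
    # Inverting the mask gives the opacity of the original image at each pixel.
    alpha = (255.0 - mask.astype(np.float32))[..., None] / 255.0
    out = alpha * original.astype(np.float32) + (1.0 - alpha) * inpainted.astype(np.float32)
    return np.clip(np.round(out), 0, 255).astype(np.uint8)


# Usage: left half kept from the original, right half taken from the inpainted result.
original = np.full((4, 4, 3), 10, dtype=np.uint8)
inpainted = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[:, 2:] = 255
result = overlay_sketch(mask, original, inpainted)
```

Soft (gray) mask values produce a proportional mix of the two images, which is why blurring the mask before the overlay softens the seam.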

`mindone.diffusers.image_processor.VaeImageProcessor.binarize(image)` ¶

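Binarize an image to create a mask.

PARAMETER	DESCRIPTION
`image`	The image input. Although annotated as a PIL image, the body uses boolean-mask indexing, so it expects an array-like input such as the MindSpore tensor that `preprocess` passes in. TYPE: `PIL.Image.Image`

RETURNS	DESCRIPTION
`Image`	`PIL.Image.Image`: The binarized image. Values less than 0.5 are set to 0, and values greater than or equal to 0.5 are set to 1.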

Source code in mindone/diffusers/image_processor.py

def binarize(self, image: PIL.Image.Image) -> PIL.Image.Image:
    """
    Create a mask.

    Args:
        image (`PIL.Image.Image`):
            The image input, should be a PIL image.

    Returns:
        `PIL.Image.Image`:
            The binarized image. Values less than 0.5 are set to 0, values greater than or equal to 0.5 are set to 1.
    """
    image[image < 0.5] = 0
    image[image >= 0.5] = 1

    return image
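The thresholding in `binarize` is plain boolean-mask indexing, so it can be reproduced on a NumPy array. The sketch below uses a hypothetical `binarize_sketch` helper with the same two assignments. Like the original, it modifies its argument in place, which is why the usage passes a copy.

```python
import numpy as np


def binarize_sketch(image: np.ndarray) -> np.ndarray:
    # Same two masked assignments as `binarize`; they modify `image` in place.
    image[image < 0.5] = 0
    image[image >= 0.5] = 1
    return image


mask = np.array([[0.0, 0.25], [0.5, 0.9]], dtype=np.float32)
binary = binarize_sketch(mask.copy())  # copy so `mask` itself is left unchanged
```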

`mindone.diffusers.image_processor.VaeImageProcessor.blur(image, blur_factor=4)` `staticmethod` ¶

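Applies a Gaussian blur to an image.

PARAMETER	DESCRIPTION
`image`	The PIL image to blur. TYPE: `PIL.Image.Image`
`blur_factor`	The radius passed to `ImageFilter.GaussianBlur`. TYPE: `int`, optional, defaults to `4` DEFAULT: `4`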

Source code in mindone/diffusers/image_processor.py

@staticmethod
def blur(image: PIL.Image.Image, blur_factor: int = 4) -> PIL.Image.Image:
    r"""
    Applies Gaussian blur to an image.

    Args:
        image (`PIL.Image.Image`):
            The PIL image to blur.
        blur_factor (`int`, *optional*, defaults to `4`):
            The radius passed to `ImageFilter.GaussianBlur`.
    Returns:
        `PIL.Image.Image`:
            The blurred PIL image.
    """
    image = image.filter(ImageFilter.GaussianBlur(blur_factor))

    return image

`mindone.diffusers.image_processor.VaeImageProcessor.convert_to_grayscale(image)` `staticmethod` ¶

Converts a given PIL image to grayscale.

PARAMETER	DESCRIPTION
`image`	The input image to convert. TYPE: `PIL.Image.Image`

Source code in mindone/diffusers/image_processor.py

@staticmethod
def convert_to_grayscale(image: PIL.Image.Image) -> PIL.Image.Image:
    r"""
    Converts a given PIL image to grayscale.

    Args:
        image (`PIL.Image.Image`):
            The input image to convert.
    Returns:
        `PIL.Image.Image`:
            The image converted to grayscale.
    """
    image = image.convert("L")

    return image

`mindone.diffusers.image_processor.VaeImageProcessor.convert_to_rgb(image)` `staticmethod` ¶

Converts a PIL image to RGB format.

PARAMETER	DESCRIPTION
`image`	The PIL image to convert to RGB. TYPE: `PIL.Image.Image`

Source code in mindone/diffusers/image_processor.py

@staticmethod
def convert_to_rgb(image: PIL.Image.Image) -> PIL.Image.Image:
    r"""
    Converts a PIL image to RGB format.

    Args:
        image (`PIL.Image.Image`):
            The PIL image to convert to RGB.
    Returns:
        `PIL.Image.Image`:
            The RGB-converted PIL image.
    """
    image = image.convert("RGB")

    return image

`mindone.diffusers.image_processor.VaeImageProcessor.denormalize(images)` `staticmethod` ¶

Denormalize an image array from [-1,1] to [0,1], clamping values that fall outside this range.

PARAMETER	DESCRIPTION
`images`	The image array to denormalize. TYPE: `np.ndarray` or `ms.Tensor`

Source code in mindone/diffusers/image_processor.py

@staticmethod
def denormalize(images: Union[np.ndarray, ms.Tensor]) -> Union[np.ndarray, ms.Tensor]:
    r"""
    Denormalize an image array to [0,1].

    Args:
        images (`np.ndarray` or `ms.Tensor`):
            The image array to denormalize.
    Returns:
        `np.ndarray` or `ms.Tensor`:
            The denormalized image array.
    """
    return (images / 2 + 0.5).clamp(0, 1)
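`denormalize` is the inverse of `normalize` (`2.0 * images - 1.0`) followed by a clamp to [0,1]. The listing calls `.clamp`, which MindSpore tensors provide; `np.ndarray` has no such method, so a NumPy version needs `np.clip`. A minimal sketch with a hypothetical `denormalize_np` helper:

```python
import numpy as np


def denormalize_np(images: np.ndarray) -> np.ndarray:
    # Map [-1, 1] back to [0, 1]; values outside that range are clipped.
    return np.clip(images / 2 + 0.5, 0, 1)


x = np.array([-1.0, -0.5, 0.0, 1.0, 1.5])
y = denormalize_np(x)
# normalize (2 * x - 1) undoes denormalize for every value that was not clipped
roundtrip = 2.0 * y - 1.0
```

The last element (1.5) is clipped to 1, so only the in-range values survive the round trip unchanged.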

`mindone.diffusers.image_processor.VaeImageProcessor.get_crop_region(mask_image, width, height, pad=0)` `staticmethod` ¶

Finds a rectangular region that contains all masked areas in an image and expands it to match the aspect ratio of the image to be processed. For example, if the user drew a mask in a 128x32 region and the dimensions for processing are 512x512, the region is expanded to 128x128.

PARAMETER	DESCRIPTION
`mask_image`	Mask image. TYPE: `Image`
`width`	Width of the image to be processed. TYPE: `int`
`height`	Height of the image to be processed. TYPE: `int`
`pad`	Padding to be added to the crop region. Defaults to 0. TYPE: `int` DEFAULT: `0`

RETURNS	DESCRIPTION
`tuple`	(x1, y1, x2, y2) representing a rectangular region that contains all masked areas in the image and matches the aspect ratio of the processing dimensions.

Source code in mindone/diffusers/image_processor.py

@staticmethod
def get_crop_region(mask_image: PIL.Image.Image, width: int, height: int, pad=0):
    r"""
    Finds a rectangular region that contains all masked areas in an image, and expands the region to match the
    aspect ratio of the image to be processed; for example, if the user drew a mask in a 128x32 region, and the
    dimensions for processing are 512x512, the region will be expanded to 128x128.

    Args:
        mask_image (PIL.Image.Image): Mask image.
        width (int): Width of the image to be processed.
        height (int): Height of the image to be processed.
        pad (int, optional): Padding to be added to the crop region. Defaults to 0.

    Returns:
        tuple: (x1, y1, x2, y2) representing a rectangular region that contains all masked areas in the image
        and matches the aspect ratio of the processing dimensions.
    """

    mask_image = mask_image.convert("L")
    mask = np.array(mask_image)

    # 1. find a rectangular region that contains all masked areas in an image
    h, w = mask.shape
    crop_left = 0
    for i in range(w):
        if not (mask[:, i] == 0).all():
            break
        crop_left += 1

    crop_right = 0
    for i in reversed(range(w)):
        if not (mask[:, i] == 0).all():
            break
        crop_right += 1

    crop_top = 0
    for i in range(h):
        if not (mask[i] == 0).all():
            break
        crop_top += 1

    crop_bottom = 0
    for i in reversed(range(h)):
        if not (mask[i] == 0).all():
            break
        crop_bottom += 1

    # 2. add padding to the crop region
    x1, y1, x2, y2 = (
        int(max(crop_left - pad, 0)),
        int(max(crop_top - pad, 0)),
        int(min(w - crop_right + pad, w)),
        int(min(h - crop_bottom + pad, h)),
    )

    # 3. expands crop region to match the aspect ratio of the image to be processed
    ratio_crop_region = (x2 - x1) / (y2 - y1)
    ratio_processing = width / height

    if ratio_crop_region > ratio_processing:
        desired_height = (x2 - x1) / ratio_processing
        desired_height_diff = int(desired_height - (y2 - y1))
        y1 -= desired_height_diff // 2
        y2 += desired_height_diff - desired_height_diff // 2
        if y2 >= mask_image.height:
            diff = y2 - mask_image.height
            y2 -= diff
            y1 -= diff
        if y1 < 0:
            y2 -= y1
            y1 -= y1
        if y2 >= mask_image.height:
            y2 = mask_image.height
    else:
        desired_width = (y2 - y1) * ratio_processing
        desired_width_diff = int(desired_width - (x2 - x1))
        x1 -= desired_width_diff // 2
        x2 += desired_width_diff - desired_width_diff // 2
        if x2 >= mask_image.width:
            diff = x2 - mask_image.width
            x2 -= diff
            x1 -= diff
        if x1 < 0:
            x2 -= x1
            x1 -= x1
        if x2 >= mask_image.width:
            x2 = mask_image.width

    return x1, y1, x2, y2
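The four scans in step 1 compute the bounding box of the nonzero pixels, which `np.nonzero` gives directly. The sketch below is a hypothetical NumPy re-derivation (`crop_region_sketch` is not part of mindone) that keeps steps 2 and 3 as in the listing. It assumes the mask contains at least one nonzero pixel; with an empty mask, `xs.min()` would raise. The usage reproduces the 128x32 example from the description:

```python
import numpy as np


def crop_region_sketch(mask: np.ndarray, width: int, height: int, pad: int = 0):
    """Re-derive the get_crop_region logic on a 2-D array (nonzero = masked)."""
    h, w = mask.shape
    ys, xs = np.nonzero(mask)

    # 1. tight bounding box; x2 and y2 are exclusive, as in the listing
    x1, x2 = int(xs.min()), int(xs.max()) + 1
    y1, y2 = int(ys.min()), int(ys.max()) + 1

    # 2. padding, clamped to the mask bounds
    x1, y1 = max(x1 - pad, 0), max(y1 - pad, 0)
    x2, y2 = min(x2 + pad, w), min(y2 + pad, h)

    # 3. grow the short side until the box matches width / height
    ratio_processing = width / height
    if (x2 - x1) / (y2 - y1) > ratio_processing:
        diff = int((x2 - x1) / ratio_processing - (y2 - y1))
        y1 -= diff // 2
        y2 += diff - diff // 2
        if y2 >= h:  # ran off the bottom edge: shift the box up
            y1 -= y2 - h
            y2 = h
        if y1 < 0:  # ran off the top edge: shift the box down
            y2 -= y1
            y1 = 0
        y2 = min(y2, h)
    else:
        diff = int((y2 - y1) * ratio_processing - (x2 - x1))
        x1 -= diff // 2
        x2 += diff - diff // 2
        if x2 >= w:  # ran off the right edge: shift the box left
            x1 -= x2 - w
            x2 = w
        if x1 < 0:  # ran off the left edge: shift the box right
            x2 -= x1
            x1 = 0
        x2 = min(x2, w)
    return x1, y1, x2, y2


mask = np.zeros((512, 512), dtype=np.uint8)
mask[200:232, 100:228] = 255  # a 128x32 masked region
print(crop_region_sketch(mask, 512, 512))  # (100, 152, 228, 280), i.e. 128x128
```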

`mindone.diffusers.image_processor.VaeImageProcessor.get_default_height_width(image, height=None, width=None)` ¶

Returns the height and width of the image, each rounded down to the nearest integer multiple of `vae_scale_factor`.

PARAMETER	DESCRIPTION
`image`	The image input, which can be a PIL image, NumPy array, or MindSpore tensor. If it is a NumPy array, it should have shape `[batch, height, width]` or `[batch, height, width, channels]`. If it is a MindSpore tensor, it should have shape `[batch, channels, height, width]`. TYPE: `Union[PIL.Image.Image, np.ndarray, ms.Tensor]`
`height`	The height of the preprocessed image. If `None`, the height of the `image` input will be used. TYPE: `Optional[int]`, optional, defaults to `None` DEFAULT: `None`
`width`	The width of the preprocessed image. If `None`, the width of the `image` input will be used. TYPE: `Optional[int]`, optional, defaults to `None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Tuple[int, int]`	`Tuple[int, int]`: A tuple containing the height and width, each rounded down to the nearest integer multiple of `vae_scale_factor`.

Source code in mindone/diffusers/image_processor.py

def get_default_height_width(
    self,
    image: Union[PIL.Image.Image, np.ndarray, ms.Tensor],
    height: Optional[int] = None,
    width: Optional[int] = None,
) -> Tuple[int, int]:
    r"""
    Returns the height and width of the image, rounded down to the nearest integer multiple of `vae_scale_factor`.

    Args:
        image (`Union[PIL.Image.Image, np.ndarray, ms.Tensor]`):
            The image input, which can be a PIL image, NumPy array, or MindSpore tensor. If it is a NumPy array, it
            should have shape `[batch, height, width]` or `[batch, height, width, channels]`. If it is a MindSpore
            tensor, it should have shape `[batch, channels, height, width]`.
        height (`Optional[int]`, *optional*, defaults to `None`):
            The height of the preprocessed image. If `None`, the height of the `image` input will be used.
        width (`Optional[int]`, *optional*, defaults to `None`):
            The width of the preprocessed image. If `None`, the width of the `image` input will be used.

    Returns:
        `Tuple[int, int]`:
            A tuple containing the height and width, each rounded down to the nearest integer multiple of
            `vae_scale_factor`.
    """

    if height is None:
        if isinstance(image, PIL.Image.Image):
            height = image.height
        elif isinstance(image, ms.Tensor):
            height = image.shape[2]
        else:
            height = image.shape[1]

    if width is None:
        if isinstance(image, PIL.Image.Image):
            width = image.width
        elif isinstance(image, ms.Tensor):
            width = image.shape[3]
        else:
            width = image.shape[2]

    width, height = (
        x - x % self.config.vae_scale_factor for x in (width, height)
    )  # resize to integer multiple of vae_scale_factor

    return height, width
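The rounding in `get_default_height_width` always goes down (`x - x % vae_scale_factor`), and which axes it reads depends on the input layout. A NumPy-only sketch in which the hypothetical `channels_first` flag stands in for the `isinstance(image, ms.Tensor)` check:

```python
import numpy as np


def default_height_width_sketch(image, height=None, width=None, vae_scale_factor=8, channels_first=False):
    # channels_first=True mimics a MindSpore tensor (B, C, H, W);
    # False mimics a NumPy array (B, H, W) or (B, H, W, C).
    if height is None:
        height = image.shape[2] if channels_first else image.shape[1]
    if width is None:
        width = image.shape[3] if channels_first else image.shape[2]
    # Round each side down to a multiple of vae_scale_factor.
    width, height = (x - x % vae_scale_factor for x in (width, height))
    return height, width


print(default_height_width_sketch(np.zeros((1, 513, 770, 3))))                       # (512, 768)
print(default_height_width_sketch(np.zeros((1, 3, 513, 770)), channels_first=True))  # (512, 768)
```

Explicit `height`/`width` arguments are rounded the same way, so requesting 100x100 yields 96x96.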

`mindone.diffusers.image_processor.VaeImageProcessor.ms_to_numpy(images)` `staticmethod` ¶

Convert a MindSpore tensor of shape `[batch, channels, height, width]` to a float32 NumPy array of shape `[batch, height, width, channels]`.

PARAMETER	DESCRIPTION
`images`	The MindSpore tensor to convert to NumPy format. TYPE: `ms.Tensor`

Source code in mindone/diffusers/image_processor.py

@staticmethod
def ms_to_numpy(images: ms.Tensor) -> np.ndarray:
    r"""
    Convert a MindSpore tensor to a NumPy image.

    Args:
        images (`ms.Tensor`):
            The MindSpore tensor to convert to NumPy format.
    Returns:
        `np.ndarray`:
            A NumPy array representation of the images.
    """
    images = images.permute(0, 2, 3, 1).float().numpy()
    return images

`mindone.diffusers.image_processor.VaeImageProcessor.normalize(images)` `staticmethod` ¶

Normalize an image array from [0,1] to [-1,1].

PARAMETER	DESCRIPTION
`images`	The image array to normalize. TYPE: `np.ndarray` or `ms.Tensor`

Source code in mindone/diffusers/image_processor.py

@staticmethod
def normalize(images: Union[np.ndarray, ms.Tensor]) -> Union[np.ndarray, ms.Tensor]:
    r"""
    Normalize an image array to [-1,1].

    Args:
        images (`np.ndarray` or `ms.Tensor`):
            The image array to normalize.
    Returns:
        `np.ndarray` or `ms.Tensor`:
            The normalized image array.
    """
    return 2.0 * images - 1.0

`mindone.diffusers.image_processor.VaeImageProcessor.numpy_to_ms(images)` `staticmethod` ¶

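Convert a NumPy image batch of shape `[batch, height, width, channels]` to a MindSpore tensor of shape `[batch, channels, height, width]`. A 3-D `[batch, height, width]` array first gains a trailing channel axis.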

PARAMETER	DESCRIPTION
`images`	The NumPy image array to convert to MindSpore format. TYPE: `np.ndarray`

Source code in mindone/diffusers/image_processor.py

@staticmethod
def numpy_to_ms(images: np.ndarray) -> ms.Tensor:
    r"""
    Convert a NumPy image to a MindSpore tensor.

    Args:
        images (`np.ndarray`):
            The NumPy image array to convert to MindSpore format.
    Returns:
        `ms.Tensor`:
            A MindSpore tensor representation of the images.
    """
    if images.ndim == 3:
        images = images[..., None]

    images = ms.Tensor(images.transpose(0, 3, 1, 2))
    return images
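`numpy_to_ms` and `ms_to_numpy` only reorder axes (apart from the tensor conversion itself and, in `ms_to_numpy`, a cast to float). The sketch below applies the same permutations to NumPy arrays so the shapes can be checked without MindSpore installed:

```python
import numpy as np

rng = np.random.default_rng(0)

# A channels-last (NHWC) batch of two 4x6 RGB images
nhwc = rng.random((2, 4, 6, 3), dtype=np.float32)

# numpy_to_ms transposes with (0, 3, 1, 2): NHWC -> NCHW
nchw = nhwc.transpose(0, 3, 1, 2)

# ms_to_numpy permutes with (0, 2, 3, 1): NCHW -> NHWC
back = nchw.transpose(0, 2, 3, 1)

# A 3-D (B, H, W) grayscale batch first gains a trailing channel axis
gray = rng.random((2, 4, 6), dtype=np.float32)
gray_nchw = gray[..., None].transpose(0, 3, 1, 2)

print(nchw.shape, gray_nchw.shape)  # (2, 3, 4, 6) (2, 1, 4, 6)
```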

`mindone.diffusers.image_processor.VaeImageProcessor.numpy_to_pil(images)` `staticmethod` ¶

Convert a NumPy image or a batch of images to a list of PIL images.

PARAMETER	DESCRIPTION
`images`	The image array to convert to PIL format. TYPE: `np.ndarray`

Source code in mindone/diffusers/image_processor.py

@staticmethod
def numpy_to_pil(images: np.ndarray) -> List[PIL.Image.Image]:
    r"""
    Convert a NumPy image or a batch of images to a list of PIL images.

    Args:
        images (`np.ndarray`):
            The image array to convert to PIL format.
    Returns:
        `List[PIL.Image.Image]`:
            A list of PIL images.
    """
    if images.ndim == 3:
        images = images[None, ...]
    images = (images * 255).round().astype("uint8")
    if images.shape[-1] == 1:
        # special case for grayscale (single channel) images
        pil_images = [Image.fromarray(image.squeeze(), mode="L") for image in images]
    else:
        pil_images = [Image.fromarray(image) for image in images]

    return pil_images

`mindone.diffusers.image_processor.VaeImageProcessor.pil_to_numpy(images)` `staticmethod` ¶

Convert a PIL image or a list of PIL images to NumPy arrays.

PARAMETER	DESCRIPTION
`images`	The PIL image or list of images to convert to NumPy format. TYPE: `PIL.Image.Image` or `List[PIL.Image.Image]`

RETURNS	DESCRIPTION
`ndarray`	`np.ndarray`: A NumPy array representation of the images.

Source code in mindone/diffusers/image_processor.py

@staticmethod
def pil_to_numpy(images: Union[List[PIL.Image.Image], PIL.Image.Image]) -> np.ndarray:
    r"""
    Convert a PIL image or a list of PIL images to NumPy arrays.
    Args:
        images (`PIL.Image.Image` or `List[PIL.Image.Image]`):
            The PIL image or list of images to convert to NumPy format.
    Returns:
        `np.ndarray`:
            A NumPy array representation of the images.
    """
    if not isinstance(images, list):
        images = [images]
    images = [np.array(image).astype(np.float32) / 255.0 for image in images]
    images = np.stack(images, axis=0)

    return images
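Apart from the PIL objects themselves, the pixel-value handling in `numpy_to_pil` and `pil_to_numpy` is an 8-bit quantization and its inverse. Assuming `Image.fromarray` followed by `np.array` returns the same `uint8` values (as it does for plain RGB arrays), the round trip can be sketched in NumPy alone:

```python
import numpy as np

rng = np.random.default_rng(0)
floats = rng.random((1, 8, 8, 3))  # NHWC values in [0, 1)

# numpy_to_pil: scale to 0..255, round, and cast before Image.fromarray
quantized = (floats * 255).round().astype("uint8")

# pil_to_numpy: np.array(img).astype(np.float32) / 255.0 on the same uint8 values
restored = quantized.astype(np.float32) / 255.0

# Rounding to the nearest of 256 levels bounds the error by half a step
max_err = float(np.abs(restored - floats).max())
print(max_err <= 0.5 / 255 + 1e-6)  # True
```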

`mindone.diffusers.image_processor.VaeImageProcessor.postprocess(image, output_type='pil', do_denormalize=None)` ¶

Postprocess the image output from a tensor to `output_type`.

PARAMETER	DESCRIPTION
`image`	The image input; should be a MindSpore tensor with shape `B x C x H x W`. TYPE: `ms.Tensor`
`output_type`	The output type of the image; one of `pil`, `np`, `ms`, or `latent`. Any other value is deprecated and falls back to `np`. TYPE: `str`, optional, defaults to `pil` DEFAULT: `'pil'`
`do_denormalize`	Whether to denormalize the image to [0,1]. If `None`, will use the value of `do_normalize` in the `VaeImageProcessor` config. TYPE: `List[bool]`, optional, defaults to `None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Union[Image, ndarray, Tensor]`	`PIL.Image.Image`, `np.ndarray` or `ms.Tensor`: The postprocessed image.

Source code in mindone/diffusers/image_processor.py

def postprocess(
    self,
    image: ms.Tensor,
    output_type: str = "pil",
    do_denormalize: Optional[List[bool]] = None,
) -> Union[PIL.Image.Image, np.ndarray, ms.Tensor]:
    """
    Postprocess the image output from tensor to `output_type`.

    Args:
        image (`ms.Tensor`):
            The image input, should be a MindSpore tensor with shape `B x C x H x W`.
        output_type (`str`, *optional*, defaults to `pil`):
            The output type of the image, can be one of `pil`, `np`, `ms`, `latent`.
        do_denormalize (`List[bool]`, *optional*, defaults to `None`):
            Whether to denormalize the image to [0,1]. If `None`, will use the value of `do_normalize` in the
            `VaeImageProcessor` config.

    Returns:
        `PIL.Image.Image`, `np.ndarray` or `ms.Tensor`:
            The postprocessed image.
    """
    if not isinstance(image, ms.Tensor):
        raise ValueError(
            f"Input for postprocessing is in incorrect format: {type(image)}. We only support mindspore tensor"
        )
    if output_type not in ["latent", "ms", "np", "pil"]:
        deprecation_message = (
            f"the output_type {output_type} is outdated and has been set to `np`. Please make sure to set it to one of these instead: "
            "`pil`, `np`, `ms`, `latent`"
        )
        deprecate("Unsupported output_type", "1.0.0", deprecation_message, standard_warn=False)
        output_type = "np"

    if output_type == "latent":
        return image

    if do_denormalize is None:
        do_denormalize = [self.config.do_normalize] * image.shape[0]

    image = ops.stack(
        [self.denormalize(image[i]) if do_denormalize[i] else image[i] for i in range(image.shape[0])]
    )

    if output_type == "ms":
        return image

    image = self.ms_to_numpy(image)

    if output_type == "np":
        return image

    if output_type == "pil":
        return self.numpy_to_pil(image)
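The `do_denormalize` list lets a caller choose, per image, whether values are mapped back from [-1,1] to [0,1]. The NumPy sketch below follows the same steps for the `"latent"` and `"np"` outputs only. `postprocess_np_sketch` is hypothetical, and it uses `np.clip` in place of the tensor `.clamp` call.

```python
import numpy as np


def postprocess_np_sketch(image, output_type="np", do_denormalize=None, do_normalize=True):
    """Hypothetical NumPy mirror of `postprocess` for an NCHW batch."""
    if output_type not in ("latent", "np"):
        raise ValueError("this sketch only handles 'latent' and 'np'")
    if output_type == "latent":
        return image  # latents are returned untouched
    if do_denormalize is None:
        # default: follow the processor's do_normalize setting for every image
        do_denormalize = [do_normalize] * image.shape[0]
    # per-image choice, like the ops.stack(...) call in the listing
    image = np.stack(
        [np.clip(image[i] / 2 + 0.5, 0, 1) if do_denormalize[i] else image[i] for i in range(image.shape[0])]
    )
    return image.transpose(0, 2, 3, 1)  # NCHW -> NHWC, as ms_to_numpy does


batch = np.zeros((2, 3, 4, 4), dtype=np.float32)
out = postprocess_np_sketch(batch, do_denormalize=[True, False])
print(out.shape)  # (2, 4, 4, 3)
```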

`mindone.diffusers.image_processor.VaeImageProcessor.preprocess(image, height=None, width=None, resize_mode='default', crops_coords=None)` ¶

Preprocess the image input.

PARAMETER	DESCRIPTION
`image`	The image input. Accepted formats are PIL images, NumPy arrays, and MindSpore tensors, or a list of any of these. TYPE: `PipelineImageInput`
`height`	The height of the preprocessed image. If `None`, `get_default_height_width()` is used to get the default height. TYPE: `int`, optional DEFAULT: `None`
`width`	The width of the preprocessed image. If `None`, `get_default_height_width()` is used to get the default width. TYPE: `int`, optional DEFAULT: `None`
`resize_mode`	The resize mode, one of `default`, `fill`, or `crop`. With `default`, the image is resized to the specified width and height, which may not preserve the original aspect ratio. With `fill`, the image is resized to fit within the specified width and height while maintaining the aspect ratio, centered, and the empty space is filled with data from the image. With `crop`, the image is resized to fill the specified width and height while maintaining the aspect ratio, centered, and the excess is cropped. Note that resize_mode `fill` and `crop` are only supported for PIL image input. TYPE: `str`, optional, defaults to `default` DEFAULT: `'default'`
`crops_coords`	The crop box `(left, upper, right, lower)` applied to every PIL image in the batch before resizing. If `None`, the image is not cropped. TYPE: `Tuple[int, int, int, int]`, optional, defaults to `None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Tensor`	`ms.Tensor`: The preprocessed image.

Source code in mindone/diffusers/image_processor.py

def preprocess(
    self,
    image: PipelineImageInput,
    height: Optional[int] = None,
    width: Optional[int] = None,
    resize_mode: str = "default",  # "default", "fill", "crop"
    crops_coords: Optional[Tuple[int, int, int, int]] = None,
) -> ms.Tensor:
    """
    Preprocess the image input.

    Args:
        image (`PipelineImageInput`):
            The image input. Accepted formats are PIL images, NumPy arrays, and MindSpore tensors, or a list of
            any of these.
        height (`int`, *optional*):
            The height of the preprocessed image. If `None`, `get_default_height_width()` is used to get the default
            height.
        width (`int`, *optional*):
            The width of the preprocessed image. If `None`, `get_default_height_width()` is used to get the default
            width.
        resize_mode (`str`, *optional*, defaults to `default`):
            The resize mode, one of `default`, `fill`, or `crop`. With `default`, the image is resized to the
            specified width and height, which may not preserve the original aspect ratio. With `fill`, the image is
            resized to fit within the specified width and height while maintaining the aspect ratio, centered, and
            the empty space is filled with data from the image. With `crop`, the image is resized to fill the
            specified width and height while maintaining the aspect ratio, centered, and the excess is cropped. Note
            that resize_mode `fill` and `crop` are only supported for PIL image input.
        crops_coords (`Tuple[int, int, int, int]`, *optional*, defaults to `None`):
            The crop box `(left, upper, right, lower)` applied to every PIL image in the batch before resizing. If
            `None`, the image is not cropped.

    Returns:
        `ms.Tensor`:
            The preprocessed image.
    """
    supported_formats = (PIL.Image.Image, np.ndarray, ms.Tensor)

    # Expand the missing dimension for 3-dimensional mindspore tensor or numpy array that represents grayscale image
    if self.config.do_convert_grayscale and isinstance(image, (ms.Tensor, np.ndarray)) and image.ndim == 3:
        if isinstance(image, ms.Tensor):
            # if image is a mindspore tensor, it could have 2 possible shapes:
            #    1. batch x height x width: we should insert the channel dimension at position 1
            #    2. channel x height x width: we should insert the batch dimension at position 0,
            #       but since the channel and the new batch dimension both have size 1, inserting at 1 is equivalent
            #    for simplicity, we insert a dimension of size 1 at position 1 for both cases
            image = image.unsqueeze(1)
        else:
            # if it is a numpy array, it could have 2 possible shapes:
            #   1. batch x height x width: insert channel dimension on last position
            #   2. height x width x channel: insert batch dimension on first position
            if image.shape[-1] == 1:
                image = np.expand_dims(image, axis=0)
            else:
                image = np.expand_dims(image, axis=-1)

    if isinstance(image, list) and isinstance(image[0], np.ndarray) and image[0].ndim == 4:
        warnings.warn(
            "Passing `image` as a list of 4d np.ndarray is deprecated. "
            "Please concatenate the list along the batch dimension and pass it as a single 4d np.ndarray",
            FutureWarning,
        )
        image = np.concatenate(image, axis=0)
    if isinstance(image, list) and isinstance(image[0], ms.Tensor) and image[0].ndim == 4:
        warnings.warn(
            "Passing `image` as a list of 4d ms.Tensor is deprecated. "
            "Please concatenate the list along the batch dimension and pass it as a single 4d ms.Tensor",
            FutureWarning,
        )
        image = ops.cat(image, axis=0)

    if not is_valid_image_imagelist(image):
        raise ValueError(
            f"Input is in incorrect format. Currently, we only support {', '.join(str(x) for x in supported_formats)}"
        )
    if not isinstance(image, list):
        image = [image]

    if isinstance(image[0], PIL.Image.Image):
        if crops_coords is not None:
            image = [i.crop(crops_coords) for i in image]
        if self.config.do_resize:
            height, width = self.get_default_height_width(image[0], height, width)
            image = [self.resize(i, height, width, resize_mode=resize_mode) for i in image]
        if self.config.do_convert_rgb:
            image = [self.convert_to_rgb(i) for i in image]
        elif self.config.do_convert_grayscale:
            image = [self.convert_to_grayscale(i) for i in image]
        image = self.pil_to_numpy(image)  # to np
        image = self.numpy_to_ms(image)  # to ms

    elif isinstance(image[0], np.ndarray):
        image = np.concatenate(image, axis=0) if image[0].ndim == 4 else np.stack(image, axis=0)

        image = self.numpy_to_ms(image)

        height, width = self.get_default_height_width(image, height, width)
        if self.config.do_resize:
            image = self.resize(image, height, width)

    elif isinstance(image[0], ms.Tensor):
        image = ops.cat(image, axis=0) if image[0].ndim == 4 else ops.stack(image, axis=0)

        if self.config.do_convert_grayscale and image.ndim == 3:
            image = image.unsqueeze(1)

        channel = image.shape[1]
        # don't need any preprocess if the image is latents
        if channel == self.config.vae_latent_channels:
            return image

        height, width = self.get_default_height_width(image, height, width)
        if self.config.do_resize:
            image = self.resize(image, height, width)

    # expected range [0,1], normalize to [-1,1]
    do_normalize = self.config.do_normalize
    if do_normalize and image.min() < 0:
        warnings.warn(
            "Passing `image` as MindSpore tensor with value range in [-1,1] is deprecated. The expected value range for image tensor is [0,1] "
            f"when passing as mindspore tensor or numpy Array. You passed `image` with value range [{image.min()},{image.max()}]",
            FutureWarning,
        )
        do_normalize = False
    if do_normalize:
        image = self.normalize(image)

    if self.config.do_binarize:
        image = self.binarize(image)

    return image
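The NumPy branch of `preprocess` combines several of the helpers above. The sketch below walks through it on plain arrays under two simplifications: it skips resizing (the library resizes via `ops.interpolate` on a MindSpore tensor) and it returns a NumPy array instead of a tensor. `preprocess_array_sketch` is a hypothetical name; its flags mirror the config options of the same names.

```python
import numpy as np


def preprocess_array_sketch(image, do_normalize=True, do_binarize=False, do_convert_grayscale=False):
    """NumPy-only walk through the ndarray branch of `preprocess`, minus resizing."""
    # 3-D grayscale input: (H, W, 1) gets a batch axis, (B, H, W) gets a channel axis
    if do_convert_grayscale and image.ndim == 3:
        image = np.expand_dims(image, axis=0) if image.shape[-1] == 1 else np.expand_dims(image, axis=-1)
    # a single 4-D array is kept as a batch; anything smaller becomes a batch of one
    image = image if image.ndim == 4 else np.stack([image], axis=0)
    # numpy_to_ms: add a channel axis if needed, then NHWC -> NCHW
    if image.ndim == 3:
        image = image[..., None]
    image = image.transpose(0, 3, 1, 2)
    # inputs that already contain negative values are not normalized again
    if do_normalize and image.min() < 0:
        do_normalize = False  # the library also emits a FutureWarning here
    if do_normalize:
        image = 2.0 * image - 1.0
    if do_binarize:
        image = np.where(image < 0.5, 0.0, 1.0)  # same threshold as binarize
    return image


rgb = np.full((16, 16, 3), 0.25, dtype=np.float32)
print(preprocess_array_sketch(rgb).shape)  # (1, 3, 16, 16)
```

Note that binarization runs after normalization in the listing, so a mask processor that enables both would threshold values already mapped to [-1,1].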

`mindone.diffusers.image_processor.VaeImageProcessor.resize(image, height, width, resize_mode='default')` ¶

Resize image.

PARAMETER	DESCRIPTION
`image`	The image input; can be a PIL image, NumPy array, or MindSpore tensor. TYPE: `PIL.Image.Image`, `np.ndarray` or `ms.Tensor`
`height`	The height to resize to. TYPE: `int`
`width`	The width to resize to. TYPE: `int`
`resize_mode`	The resize mode to use, one of `default`, `fill`, or `crop`. With `default`, the image is resized to the specified width and height, which may not preserve the original aspect ratio. With `fill`, the image is resized to fit within the specified width and height while maintaining the aspect ratio, centered, and the empty space is filled with data from the image. With `crop`, the image is resized to fill the specified width and height while maintaining the aspect ratio, centered, and the excess is cropped. Note that resize_mode `fill` and `crop` are only supported for PIL image input. TYPE: `str`, optional, defaults to `default` DEFAULT: `'default'`

RETURNS	DESCRIPTION
`Union[Image, ndarray, Tensor]`	`PIL.Image.Image`, `np.ndarray` or `ms.Tensor`: The resized image.

Source code in mindone/diffusers/image_processor.py

def resize(
    self,
    image: Union[PIL.Image.Image, np.ndarray, ms.Tensor],
    height: int,
    width: int,
    resize_mode: str = "default",  # "default", "fill", "crop"
) -> Union[PIL.Image.Image, np.ndarray, ms.Tensor]:
    """
    Resize image.

    Args:
        image (`PIL.Image.Image`, `np.ndarray` or `ms.Tensor`):
            The image input; can be a PIL image, NumPy array, or MindSpore tensor.
        height (`int`):
            The height to resize to.
        width (`int`):
            The width to resize to.
        resize_mode (`str`, *optional*, defaults to `default`):
            The resize mode to use, one of `default`, `fill`, or `crop`. With `default`, the image is resized to the
            specified width and height, which may not preserve the original aspect ratio. With `fill`, the image is
            resized to fit within the specified width and height while maintaining the aspect ratio, centered, and
            the empty space is filled with data from the image. With `crop`, the image is resized to fill the
            specified width and height while maintaining the aspect ratio, centered, and the excess is cropped. Note
            that resize_mode `fill` and `crop` are only supported for PIL image input.

    Returns:
        `PIL.Image.Image`, `np.ndarray` or `ms.Tensor`:
            The resized image.
    """
    if resize_mode != "default" and not isinstance(image, PIL.Image.Image):
        raise ValueError(f"Only PIL image input is supported for resize_mode {resize_mode}")
    if isinstance(image, PIL.Image.Image):
        if resize_mode == "default":
            image = image.resize((width, height), resample=PIL_INTERPOLATION[self.config.resample])
        elif resize_mode == "fill":
            image = self._resize_and_fill(image, width, height)
        elif resize_mode == "crop":
            image = self._resize_and_crop(image, width, height)
        else:
            raise ValueError(f"resize_mode {resize_mode} is not supported")

    elif isinstance(image, ms.Tensor):
        image = ops.interpolate(
            image,
            size=(height, width),
        )
    elif isinstance(image, np.ndarray):
        image = self.numpy_to_ms(image)
        image = ops.interpolate(
            image,
            size=(height, width),
        )
        image = self.ms_to_numpy(image)
    return image
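The `fill` and `crop` modes are delegated to private helpers (`_resize_and_fill`, `_resize_and_crop`) whose source is not included on this page. The pure-Python sketch below only computes the geometry implied by the `crop` description: scale the image so it covers the target box while keeping its aspect ratio, then center it. It is an assumption about the arithmetic, not the library's code, and `crop_mode_geometry` is a hypothetical name.

```python
def crop_mode_geometry(src_w, src_h, width, height):
    """Size to resize to and paste offset for an aspect-preserving resize plus center crop."""
    ratio = width / height
    src_ratio = src_w / src_h
    if ratio > src_ratio:
        # target is relatively wider: match the width, let the height overflow
        new_w, new_h = width, src_h * width // src_w
    else:
        # target is relatively taller (or equal): match the height, let the width overflow
        new_w, new_h = src_w * height // src_h, height
    # centering offset; a negative value means that many pixels are cut off on that side
    offset = (width // 2 - new_w // 2, height // 2 - new_h // 2)
    return (new_w, new_h), offset


print(crop_mode_geometry(1024, 768, 512, 512))  # ((682, 512), (-85, 0))
```

For a 1024x768 landscape image and a 512x512 target, the image is scaled to 682x512 and about 85 pixels are cropped from each side horizontally.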

`mindone.diffusers.image_processor.PixArtImageProcessor` ¶

Bases: VaeImageProcessor

Image processor for PixArt image resize and crop.

PARAMETER	DESCRIPTION
`do_resize`	Whether to downscale the image's (height, width) dimensions to multiples of `vae_scale_factor`. Accepts the `height` and `width` arguments of the `VaeImageProcessor.preprocess` method. TYPE: `bool`, optional, defaults to `True` DEFAULT: `True`
`vae_scale_factor`	VAE scale factor. If `do_resize` is `True`, the image is automatically resized to multiples of this factor. TYPE: `int`, optional, defaults to `8` DEFAULT: `8`
`resample`	Resampling filter to use when resizing the image. TYPE: `str`, optional, defaults to `lanczos` DEFAULT: `'lanczos'`
`do_normalize`	Whether to normalize the image to [-1,1]. TYPE: `bool`, optional, defaults to `True` DEFAULT: `True`
`do_binarize`	Whether to binarize the image to 0/1. TYPE: `bool`, optional, defaults to `False` DEFAULT: `False`
`do_convert_grayscale`	Whether to convert the images to grayscale format. TYPE: `bool`, optional, defaults to `False` DEFAULT: `False`

Source code in mindone/diffusers/image_processor.py

class PixArtImageProcessor(VaeImageProcessor):
    """
    Image processor for PixArt image resize and crop.

    Args:
        do_resize (`bool`, *optional*, defaults to `True`):
            Whether to downscale the image's (height, width) dimensions to multiples of `vae_scale_factor`. Can accept
            `height` and `width` arguments from [`image_processor.VaeImageProcessor.preprocess`] method.
        vae_scale_factor (`int`, *optional*, defaults to `8`):
            VAE scale factor. If `do_resize` is `True`, the image is automatically resized to multiples of this factor.
        resample (`str`, *optional*, defaults to `lanczos`):
            Resampling filter to use when resizing the image.
        do_normalize (`bool`, *optional*, defaults to `True`):
            Whether to normalize the image to [-1,1].
        do_binarize (`bool`, *optional*, defaults to `False`):
            Whether to binarize the image to 0/1.
        do_convert_rgb (`bool`, *optional*, defaults to `False`):
            Whether to convert the images to RGB format.
        do_convert_grayscale (`bool`, *optional*, defaults to `False`):
            Whether to convert the images to grayscale format.
    """

    @register_to_config
    def __init__(
        self,
        do_resize: bool = True,
        vae_scale_factor: int = 8,
        resample: str = "lanczos",
        do_normalize: bool = True,
        do_binarize: bool = False,
        do_convert_grayscale: bool = False,
    ):
        super().__init__(
            do_resize=do_resize,
            vae_scale_factor=vae_scale_factor,
            resample=resample,
            do_normalize=do_normalize,
            do_binarize=do_binarize,
            do_convert_grayscale=do_convert_grayscale,
        )

    @staticmethod
    def classify_height_width_bin(height: int, width: int, ratios: dict) -> Tuple[int, int]:
        r"""
        Returns the binned height and width based on the aspect ratio.

        Args:
            height (`int`): The height of the image.
            width (`int`): The width of the image.
            ratios (`dict`): A dictionary where keys are aspect ratios and values are tuples of (height, width).

        Returns:
            `Tuple[int, int]`: The closest binned height and width.
        """
        ar = float(height / width)
        closest_ratio = min(ratios.keys(), key=lambda ratio: abs(float(ratio) - ar))
        default_hw = ratios[closest_ratio]
        return int(default_hw[0]), int(default_hw[1])

    @staticmethod
    def resize_and_crop_tensor(samples: ms.Tensor, new_width: int, new_height: int) -> ms.Tensor:
        r"""
        Resizes and crops a tensor of images to the specified dimensions.

        Args:
            samples (`ms.Tensor`):
                A tensor of shape (N, C, H, W) where N is the batch size, C is the number of channels, H is the height,
                and W is the width.
            new_width (`int`): The desired width of the output images.
            new_height (`int`): The desired height of the output images.

        Returns:
            `ms.Tensor`: A tensor containing the resized and cropped images.
        """
        orig_height, orig_width = samples.shape[2], samples.shape[3]

        # Check if resizing is needed
        if orig_height != new_height or orig_width != new_width:
            ratio = max(new_height / orig_height, new_width / orig_width)
            resized_width = int(orig_width * ratio)
            resized_height = int(orig_height * ratio)

            # Resize
            samples = ops.interpolate(
                samples, size=(resized_height, resized_width), mode="bilinear", align_corners=False
            )

            # Center Crop
            start_x = (resized_width - new_width) // 2
            end_x = start_x + new_width
            start_y = (resized_height - new_height) // 2
            end_y = start_y + new_height
            samples = samples[:, :, start_y:end_y, start_x:end_x]

        return samples

`mindone.diffusers.image_processor.PixArtImageProcessor.classify_height_width_bin(height, width, ratios)` `staticmethod` ¶

Returns the binned height and width based on the aspect ratio.

PARAMETER	DESCRIPTION
`height`	The height of the image. TYPE: `int`
`width`	The width of the image. TYPE: `int`
`ratios`	A dictionary where keys are aspect ratios and values are tuples of (height, width). TYPE: `dict`

RETURNS	DESCRIPTION
`Tuple[int, int]`	The closest binned height and width, as `(height, width)`.

Source code in mindone/diffusers/image_processor.py

@staticmethod
def classify_height_width_bin(height: int, width: int, ratios: dict) -> Tuple[int, int]:
    r"""
    Returns the binned height and width based on the aspect ratio.

    Args:
        height (`int`): The height of the image.
        width (`int`): The width of the image.
        ratios (`dict`): A dictionary where keys are aspect ratios and values are tuples of (height, width).

    Returns:
        `Tuple[int, int]`: The closest binned height and width.
    """
    ar = float(height / width)
    closest_ratio = min(ratios.keys(), key=lambda ratio: abs(float(ratio) - ar))
    default_hw = ratios[closest_ratio]
    return int(default_hw[0]), int(default_hw[1])
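To make the binning rule concrete, the sketch below reproduces `classify_height_width_bin` in plain Python and runs it against a small, hypothetical `TOY_RATIOS` table. The bin tables used by the actual PixArt pipelines are not shown in this section, so the keys and sizes here are illustrative only. Keys are aspect ratios (height / width) stored as strings, and the function returns the `(height, width)` stored under the key closest to the requested ratio.

```python
from typing import Dict, Sequence, Tuple


def classify_height_width_bin(height: int, width: int, ratios: Dict[str, Sequence[float]]) -> Tuple[int, int]:
    # Aspect ratio is defined as height / width, matching the keys of `ratios`
    ar = float(height / width)
    # Pick the key whose numeric value is closest to the requested ratio
    closest_ratio = min(ratios.keys(), key=lambda ratio: abs(float(ratio) - ar))
    default_hw = ratios[closest_ratio]
    return int(default_hw[0]), int(default_hw[1])


# Hypothetical bins for illustration: aspect ratio (height / width) -> (height, width)
TOY_RATIOS = {
    "0.5": [512.0, 1024.0],
    "1.0": [768.0, 768.0],
    "2.0": [1024.0, 512.0],
}

print(classify_height_width_bin(720, 1280, TOY_RATIOS))  # ratio 0.5625, closest key "0.5"
print(classify_height_width_bin(1600, 900, TOY_RATIOS))  # ratio ~1.78, closest key "2.0"
```

Because `min` returns the first minimal element it encounters, a request that sits exactly halfway between two bins resolves to whichever key comes first in the dictionary's insertion order. In a resolution-binning workflow, the image is typically generated at the binned size and then brought back to the requested size with `resize_and_crop_tensor`.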

`mindone.diffusers.image_processor.PixArtImageProcessor.resize_and_crop_tensor(samples, new_width, new_height)` `staticmethod` ¶

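Resizes and crops a tensor of images to the specified dimensions. The batch is first scaled with bilinear interpolation so that it covers the target size in both dimensions, then center-cropped; if the input already has the target size, it is returned unchanged.

PARAMETER	DESCRIPTION
`samples`	A tensor of shape (N, C, H, W) where N is the batch size, C is the number of channels, H is the height, and W is the width. TYPE: `ms.Tensor`
`new_width`	The desired width of the output images. TYPE: `int`
`new_height`	The desired height of the output images. TYPE: `int`

RETURNS	DESCRIPTION
`ms.Tensor`	A tensor containing the resized and cropped images.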

Source code in mindone/diffusers/image_processor.py

@staticmethod
def resize_and_crop_tensor(samples: ms.Tensor, new_width: int, new_height: int) -> ms.Tensor:
    r"""
    Resizes and crops a tensor of images to the specified dimensions.

    Args:
        samples (`ms.Tensor`):
            A tensor of shape (N, C, H, W) where N is the batch size, C is the number of channels, H is the height,
            and W is the width.
        new_width (`int`): The desired width of the output images.
        new_height (`int`): The desired height of the output images.

    Returns:
        `ms.Tensor`: A tensor containing the resized and cropped images.
    """
    orig_height, orig_width = samples.shape[2], samples.shape[3]

    # Check if resizing is needed
    if orig_height != new_height or orig_width != new_width:
        ratio = max(new_height / orig_height, new_width / orig_width)
        resized_width = int(orig_width * ratio)
        resized_height = int(orig_height * ratio)

        # Resize
        samples = ops.interpolate(
            samples, size=(resized_height, resized_width), mode="bilinear", align_corners=False
        )

        # Center Crop
        start_x = (resized_width - new_width) // 2
        end_x = start_x + new_width
        start_y = (resized_height - new_height) // 2
        end_y = start_y + new_height
        samples = samples[:, :, start_y:end_y, start_x:end_x]

    return samples
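The sketch below mirrors `resize_and_crop_tensor` on NumPy arrays so the scale-then-center-crop arithmetic can be checked without MindSpore. One substitution is made: `nearest_resize` is a hypothetical nearest-neighbour helper standing in for the bilinear `ops.interpolate` call, so pixel values differ from the real method while output shapes and crop offsets are the same.

```python
import numpy as np


def nearest_resize(samples: np.ndarray, size: tuple) -> np.ndarray:
    """Hypothetical nearest-neighbour resize of an (N, C, H, W) array (stand-in for bilinear)."""
    _, _, h, w = samples.shape
    out_h, out_w = size
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return samples[:, :, rows[:, None], cols[None, :]]


def resize_and_crop_np(samples: np.ndarray, new_width: int, new_height: int) -> np.ndarray:
    orig_height, orig_width = samples.shape[2], samples.shape[3]
    if orig_height != new_height or orig_width != new_width:
        # Scale by the larger factor so the result covers the target in both dimensions
        ratio = max(new_height / orig_height, new_width / orig_width)
        resized_width = int(orig_width * ratio)
        resized_height = int(orig_height * ratio)
        samples = nearest_resize(samples, (resized_height, resized_width))
        # Center-crop away the overflow along the longer dimension
        start_x = (resized_width - new_width) // 2
        start_y = (resized_height - new_height) // 2
        samples = samples[:, :, start_y : start_y + new_height, start_x : start_x + new_width]
    return samples


batch = np.zeros((1, 3, 1024, 512))
print(resize_and_crop_np(batch, new_width=900, new_height=1600).shape)  # (1, 3, 1600, 900)
```

For a 1024x512 (H x W) batch and a 1600x900 target, the scale factor is max(1600/1024, 900/512) = 1.7578125, so the batch is scaled to 1800x900 and rows 100 to 1699 are kept. Note that the argument order is `(new_width, new_height)`, the reverse of the `(height, width)` tuples returned by `classify_height_width_bin`.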

`mindone.diffusers.image_processor.IPAdapterMaskProcessor` ¶

Bases: VaeImageProcessor

Image processor for IP Adapter image masks.

PARAMETER	DESCRIPTION
`do_resize`	Whether to downscale the image's (height, width) dimensions to multiples of `vae_scale_factor`. TYPE: `bool`, optional, defaults to `True` DEFAULT: `True`
`vae_scale_factor`	VAE scale factor. If `do_resize` is `True`, the image is automatically resized to multiples of this factor. TYPE: `int`, optional, defaults to `8` DEFAULT: `8`
`resample`	Resampling filter to use when resizing the image. TYPE: `str`, optional, defaults to `lanczos` DEFAULT: `'lanczos'`
`do_normalize`	Whether to normalize the image to [-1,1]. TYPE: `bool`, optional, defaults to `False` DEFAULT: `False`
`do_binarize`	Whether to binarize the image to 0/1. TYPE: `bool`, optional, defaults to `True` DEFAULT: `True`
`do_convert_grayscale`	Whether to convert the images to grayscale format. TYPE: `bool`, optional, defaults to `True` DEFAULT: `True`

Source code in mindone/diffusers/image_processor.py

class IPAdapterMaskProcessor(VaeImageProcessor):
    """
    Image processor for IP Adapter image masks.

    Args:
        do_resize (`bool`, *optional*, defaults to `True`):
            Whether to downscale the image's (height, width) dimensions to multiples of `vae_scale_factor`.
        vae_scale_factor (`int`, *optional*, defaults to `8`):
            VAE scale factor. If `do_resize` is `True`, the image is automatically resized to multiples of this factor.
        resample (`str`, *optional*, defaults to `lanczos`):
            Resampling filter to use when resizing the image.
        do_normalize (`bool`, *optional*, defaults to `False`):
            Whether to normalize the image to [-1,1].
        do_binarize (`bool`, *optional*, defaults to `True`):
            Whether to binarize the image to 0/1.
        do_convert_grayscale (`bool`, *optional*, defaults to `True`):
            Whether to convert the images to grayscale format.

    """

    config_name = CONFIG_NAME

    @register_to_config
    def __init__(
        self,
        do_resize: bool = True,
        vae_scale_factor: int = 8,
        resample: str = "lanczos",
        do_normalize: bool = False,
        do_binarize: bool = True,
        do_convert_grayscale: bool = True,
    ):
        super().__init__(
            do_resize=do_resize,
            vae_scale_factor=vae_scale_factor,
            resample=resample,
            do_normalize=do_normalize,
            do_binarize=do_binarize,
            do_convert_grayscale=do_convert_grayscale,
        )

    @staticmethod
    def downsample(mask: ms.Tensor, batch_size: int, num_queries: int, value_embed_dim: int):
        """
        Downsamples the provided mask tensor to match the expected dimensions for scaled dot-product attention. If the
        aspect ratio of the mask does not match the aspect ratio of the output image, the flattened mask is
        zero-padded or truncated to `num_queries` entries.

        Args:
            mask (`ms.Tensor`):
                The input mask tensor generated with `IPAdapterMaskProcessor.preprocess()`.
            batch_size (`int`):
                The batch size.
            num_queries (`int`):
                The number of queries.
            value_embed_dim (`int`):
                The dimensionality of the value embeddings.

        Returns:
            `ms.Tensor`:
                The downsampled mask tensor.

        """
        o_h = mask.shape[1]
        o_w = mask.shape[2]
        ratio = o_w / o_h
        mask_h = int(math.sqrt(num_queries / ratio))
        mask_h = int(mask_h) + int((num_queries % int(mask_h)) != 0)
        mask_w = num_queries // mask_h

        mask_downsample = ops.interpolate(mask.unsqueeze(0), size=(mask_h, mask_w), mode="bicubic").squeeze(0)

        # Repeat batch_size times
        if mask_downsample.shape[0] < batch_size:
            mask_downsample = mask_downsample.tile((batch_size, 1, 1))

        mask_downsample = mask_downsample.view(mask_downsample.shape[0], -1)

        downsampled_area = mask_h * mask_w
        # If the output image and the mask do not have the same aspect ratio, tensor shapes will not match
        # Pad tensor if downsampled_mask.shape[1] is smaller than num_queries
        if downsampled_area < num_queries:
            mask_downsample = ops.Pad(paddings=((0, 0), (0, num_queries - mask_downsample.shape[1])))(mask_downsample)
        # Discard last embeddings if downsampled_mask.shape[1] is bigger than num_queries
        if downsampled_area > num_queries:
            mask_downsample = mask_downsample[:, :num_queries]

        # Repeat last dimension to match SDPA output shape
        mask_downsample = mask_downsample.view(mask_downsample.shape[0], mask_downsample.shape[1], 1).tile(
            (1, 1, value_embed_dim)
        )

        return mask_downsample
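The defaults of `IPAdapterMaskProcessor` (`do_normalize=False`, `do_binarize=True`, `do_convert_grayscale=True`) are chosen so that a mask ends up as single-channel 0/1 values rather than an image in [-1, 1]. The sketch below illustrates that effect on an 8-bit grayscale array. It assumes, without showing it here, that the parent `VaeImageProcessor` scales 8-bit pixels to [0, 1] and binarizes with a 0.5 threshold; `toy_mask_preprocess` is a hypothetical helper, not part of the library.

```python
import numpy as np


def toy_mask_preprocess(mask_u8: np.ndarray) -> np.ndarray:
    """Hypothetical illustration of the mask defaults on an (H, W) uint8 grayscale mask."""
    # Assumption: 8-bit pixels are scaled to [0, 1] by the parent class
    mask = mask_u8.astype(np.float32) / 255.0
    # Assumption: binarization thresholds at 0.5 (values below become 0, the rest 1)
    mask = np.where(mask < 0.5, 0.0, 1.0).astype(np.float32)
    # do_normalize=False: values stay in {0, 1} instead of being mapped to [-1, 1]
    return mask


print(toy_mask_preprocess(np.array([[0, 127, 128, 255]], dtype=np.uint8)))  # [[0. 0. 1. 1.]]
```

Keeping the mask in {0, 1} matters because `downsample` later multiplies attention outputs by it; a mask normalized to [-1, 1] would flip the sign of the masked-out regions instead of zeroing them.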

`mindone.diffusers.image_processor.IPAdapterMaskProcessor.downsample(mask, batch_size, num_queries, value_embed_dim)` `staticmethod` ¶

Downsamples the provided mask tensor to match the expected dimensions for scaled dot-product attention. If the aspect ratio of the mask does not match the aspect ratio of the output image, the flattened mask is zero-padded or truncated to `num_queries` entries.

PARAMETER	DESCRIPTION
`mask`	The input mask tensor generated with `IPAdapterMaskProcessor.preprocess()`. TYPE: `ms.Tensor`
`batch_size`	The batch size. TYPE: `int`
`num_queries`	The number of queries. TYPE: `int`
`value_embed_dim`	The dimensionality of the value embeddings. TYPE: `int`

RETURNS	DESCRIPTION
`ms.Tensor`	The downsampled mask tensor.

Source code in mindone/diffusers/image_processor.py

@staticmethod
def downsample(mask: ms.Tensor, batch_size: int, num_queries: int, value_embed_dim: int):
    """
    Downsamples the provided mask tensor to match the expected dimensions for scaled dot-product attention. If the
    aspect ratio of the mask does not match the aspect ratio of the output image, the flattened mask is
    zero-padded or truncated to `num_queries` entries.

    Args:
        mask (`ms.Tensor`):
            The input mask tensor generated with `IPAdapterMaskProcessor.preprocess()`.
        batch_size (`int`):
            The batch size.
        num_queries (`int`):
            The number of queries.
        value_embed_dim (`int`):
            The dimensionality of the value embeddings.

    Returns:
        `ms.Tensor`:
            The downsampled mask tensor.

    """
    o_h = mask.shape[1]
    o_w = mask.shape[2]
    ratio = o_w / o_h
    mask_h = int(math.sqrt(num_queries / ratio))
    mask_h = int(mask_h) + int((num_queries % int(mask_h)) != 0)
    mask_w = num_queries // mask_h

    mask_downsample = ops.interpolate(mask.unsqueeze(0), size=(mask_h, mask_w), mode="bicubic").squeeze(0)

    # Repeat batch_size times
    if mask_downsample.shape[0] < batch_size:
        mask_downsample = mask_downsample.tile((batch_size, 1, 1))

    mask_downsample = mask_downsample.view(mask_downsample.shape[0], -1)

    downsampled_area = mask_h * mask_w
    # If the output image and the mask do not have the same aspect ratio, tensor shapes will not match
    # Pad tensor if downsampled_mask.shape[1] is smaller than num_queries
    if downsampled_area < num_queries:
        mask_downsample = ops.Pad(paddings=((0, 0), (0, num_queries - mask_downsample.shape[1])))(mask_downsample)
    # Discard last embeddings if downsampled_mask.shape[1] is bigger than num_queries
    if downsampled_area > num_queries:
        mask_downsample = mask_downsample[:, :num_queries]

    # Repeat last dimension to match SDPA output shape
    mask_downsample = mask_downsample.view(mask_downsample.shape[0], mask_downsample.shape[1], 1).tile(
        (1, 1, value_embed_dim)
    )

    return mask_downsample
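The grid arithmetic in `downsample` is easiest to follow with concrete numbers. The NumPy sketch below mirrors the method step by step; `downsample_grid` and `downsample_np` are hypothetical names, and nearest-neighbour sampling stands in for the bicubic `ops.interpolate` call, so mask values (but not shapes, padding, or truncation) may differ from the real output.

```python
import math

import numpy as np


def downsample_grid(o_h: int, o_w: int, num_queries: int) -> tuple:
    """Compute the (mask_h, mask_w) grid the way `downsample` does."""
    ratio = o_w / o_h
    mask_h = int(math.sqrt(num_queries / ratio))
    # Round mask_h up when num_queries is not divisible by it
    mask_h = mask_h + int(num_queries % mask_h != 0)
    mask_w = num_queries // mask_h
    return mask_h, mask_w


def downsample_np(mask: np.ndarray, batch_size: int, num_queries: int, value_embed_dim: int) -> np.ndarray:
    """NumPy sketch of `downsample` for a (B, H, W) mask; nearest-neighbour instead of bicubic."""
    mask_h, mask_w = downsample_grid(mask.shape[1], mask.shape[2], num_queries)
    rows = np.arange(mask_h) * mask.shape[1] // mask_h
    cols = np.arange(mask_w) * mask.shape[2] // mask_w
    small = mask[:, rows[:, None], cols[None, :]]  # (B, mask_h, mask_w)
    # Repeat along the batch axis, as in the original
    if small.shape[0] < batch_size:
        small = np.tile(small, (batch_size, 1, 1))
    flat = small.reshape(small.shape[0], -1)
    # Zero-pad or truncate so the token count equals num_queries
    if flat.shape[1] < num_queries:
        flat = np.pad(flat, ((0, 0), (0, num_queries - flat.shape[1])))
    if flat.shape[1] > num_queries:
        flat = flat[:, :num_queries]
    # Repeat the last dimension to match the attention output shape
    return np.repeat(flat[:, :, None], value_embed_dim, axis=2)


print(downsample_grid(512, 768, 4096))  # (53, 77)
print(downsample_np(np.ones((1, 512, 768)), 2, 4096, 8).shape)  # (2, 4096, 8)
```

For a 512x768 (H x W) mask and `num_queries=4096`, the ratio is 1.5, so `mask_h` starts at int(sqrt(4096 / 1.5)) = 52 and is rounded up to 53 because 4096 is not divisible by 52; `mask_w` is then 4096 // 53 = 77. The 53x77 grid holds only 4081 values, so the last 15 query positions are zero-padded. A square mask with 4096 queries maps cleanly onto a 64x64 grid with no padding.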