Limitations
Due to framework differences, some APIs and models will not be identical to huggingface/diffusers for the foreseeable future.
APIs
xxx.from_pretrained
- torch_dtype is renamed to mindspore_dtype.
- device_map, max_memory, offload_folder, offload_state_dict, and low_cpu_mem_usage are not supported.
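When porting code written for huggingface/diffusers, the keyword differences above can be handled in one place. The following is a minimal sketch, not part of mindone: `adapt_from_pretrained_kwargs` and `UNSUPPORTED_KWARGS` are hypothetical names, and the example dictionary is illustrative only.

```python
# Hypothetical helper (not part of mindone): adapt huggingface/diffusers-style
# from_pretrained keyword arguments to the limitations listed above.
UNSUPPORTED_KWARGS = (
    "device_map",
    "max_memory",
    "offload_folder",
    "offload_state_dict",
    "low_cpu_mem_usage",
)


def adapt_from_pretrained_kwargs(kwargs):
    """Return a copy of kwargs with torch_dtype renamed and unsupported keys dropped."""
    adapted = dict(kwargs)
    if "torch_dtype" in adapted:
        # Only the key is renamed here; the value must still be a MindSpore dtype
        # when it is finally passed to from_pretrained.
        adapted["mindspore_dtype"] = adapted.pop("torch_dtype")
    for key in UNSUPPORTED_KWARGS:
        adapted.pop(key, None)
    return adapted


# Illustrative input only; the string "float16" stands in for a real dtype object.
hf_style = {"torch_dtype": "float16", "low_cpu_mem_usage": True, "subfolder": "unet"}
print(adapt_from_pretrained_kwargs(hf_style))
```

Dropping the unsupported keys silently is a design choice made for brevity; a stricter variant could raise an error or emit a warning instead.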
BaseOutput
- The default value of return_dict is changed to False, because GRAPH_MODE does not allow constructing an instance of BaseOutput.
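Because results come back as a plain tuple by default, code that reads a named field from the output needs adjusting. The sketch below shows one way to read the primary result from either output shape. `first_output` is a hypothetical helper, the field name `sample` is an assumption borrowed from common huggingface/diffusers outputs, and the two example objects are stand-ins rather than real model outputs.

```python
from types import SimpleNamespace


def first_output(output, field="sample"):
    """Return the primary result from either a tuple or an attribute-style output."""
    if isinstance(output, tuple):
        # return_dict=False (the default here): results are positional.
        return output[0]
    # return_dict=True style object: results are looked up by name.
    return getattr(output, field)


# Stand-ins for real model outputs, used only to illustrate both shapes.
tuple_style = ("latents",)
object_style = SimpleNamespace(sample="latents")

print(first_output(tuple_style))
print(first_output(object_style))
```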
Output of AutoencoderKL.encode
Unlike huggingface/diffusers, where the output posterior = DiagonalGaussianDistribution(latent) can be sampled with posterior.sample(), AutoencoderKL.encode here can only output the latent. Sampling is then done through AutoencoderKL.diag_gauss_dist.sample(latent).
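To make the sampling step concrete, here is a pure-Python sketch of what sampling from a diagonal Gaussian amounts to. It assumes the latent packs the means followed by the log-variances, and that the log-variance is clamped to [-30, 20] as in huggingface/diffusers' DiagonalGaussianDistribution. Whether AutoencoderKL.diag_gauss_dist follows exactly these steps is an assumption, and `sample_diagonal_gaussian` is a hypothetical name.

```python
import math
import random


def sample_diagonal_gaussian(latent, rng=random):
    """Sample from a diagonal Gaussian whose parameters are packed in `latent`.

    Assumes `latent` is a flat list: first half means, second half log-variances.
    """
    half = len(latent) // 2
    means, logvars = latent[:half], latent[half:]
    samples = []
    for mean, logvar in zip(means, logvars):
        logvar = min(max(logvar, -30.0), 20.0)  # clamp for numerical stability
        std = math.exp(0.5 * logvar)
        # Reparameterized draw: mean + std * standard normal noise.
        samples.append(mean + std * rng.gauss(0.0, 1.0))
    return samples


# Two latent channels: means (0.5, -1.0) and log-variances (0.0, -2.0).
print(sample_diagonal_gaussian([0.5, -1.0, 0.0, -2.0], rng=random.Random(0)))
```

In the real model the same arithmetic runs on tensors, with the mean and log-variance split along the channel axis rather than over a flat list.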
Models
The table below shows the current support in mindone/diffusers for each of the following modules in Pynative FP16, Pynative FP32, Graph FP16, and Graph FP32 modes.
| Names | Pynative FP16 | Pynative FP32 | Graph FP16 | Graph FP32 | Description |
|---|---|---|---|---|---|
| StableCascadeUNet | โ | โ | โ | โ | huggingface/diffusers output NaN when using float16. |
| nn.Conv3d | โ | โ | โ | โ | FP32 is not supported on Ascend |
| TemporalConvLayer | โ | โ | โ | โ | contains nn.Conv3d |
| TemporalResnetBlock | โ | โ | โ | โ | contains nn.Conv3d |
| SpatioTemporalResBlock | โ | โ | โ | โ | contains TemporalResnetBlock |
| UNetMidBlock3DCrossAttn | โ | โ | โ | โ | contains TemporalConvLayer |
| CrossAttnDownBlock3D | โ | โ | โ | โ | contains TemporalConvLayer |
| DownBlock3D | โ | โ | โ | โ | contains TemporalConvLayer |
| CrossAttnUpBlock3D | โ | โ | โ | โ | contains TemporalConvLayer |
| UpBlock3D | โ | โ | โ | โ | contains TemporalConvLayer |
| MidBlockTemporalDecoder | โ | โ | โ | โ | contains SpatioTemporalResBlock |
| UpBlockTemporalDecoder | โ | โ | โ | โ | contains SpatioTemporalResBlock |
| UNetMidBlockSpatioTemporal | โ | โ | โ | โ | contains SpatioTemporalResBlock |
| DownBlockSpatioTemporal | โ | โ | โ | โ | contains SpatioTemporalResBlock |
| CrossAttnDownBlockSpatioTemporal | โ | โ | โ | โ | contains SpatioTemporalResBlock |
| UpBlockSpatioTemporal | โ | โ | โ | โ | contains SpatioTemporalResBlock |
| CrossAttnUpBlockSpatioTemporal | โ | โ | โ | โ | contains SpatioTemporalResBlock |
| TemporalDecoder | โ | โ | โ | โ | contains nn.Conv3d, MidBlockTemporalDecoder etc. |
| UNet3DConditionModel | โ | โ | โ | โ | contains UNetMidBlock3DCrossAttn etc. |
| I2VGenXLUNet | โ | โ | โ | โ | contains UNetMidBlock3DCrossAttn etc. |
| AutoencoderKLTemporalDecoder | โ | โ | โ | โ | contains MidBlockTemporalDecoder etc. |
| UNetSpatioTemporalConditionModel | โ | โ | โ | โ | contains UNetMidBlockSpatioTemporal etc. |
| FirUpsample2D | โ | โ | โ | โ | ops.Conv2D has poor precision in fp16 and PyNative mode |
| FirDownsample2D | โ | โ | โ | โ | ops.Conv2D has poor precision in fp16 and PyNative mode |
| AttnSkipUpBlock2D | โ | โ | โ | โ | contains FirUpsample2D |
| SkipUpBlock2D | โ | โ | โ | โ | contains FirUpsample2D |
| AttnSkipDownBlock2D | โ | โ | โ | โ | contains FirDownsample2D |
| SkipDownBlock2D | โ | โ | โ | โ | contains FirDownsample2D |
| ResnetBlock2D (kernel='fir') | โ | โ | โ | โ | ops.Conv2D has poor precision in fp16 and PyNative mode |
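Most rows in the table inherit their limitation from a component they contain. The sketch below follows those "contains" links down to the component that actually carries the limitation. The `CONTAINS` mapping is a subset copied from the Description column, and `root_components` is a hypothetical helper, not part of mindone.

```python
# Containment relations copied from the Description column (subset).
CONTAINS = {
    "TemporalConvLayer": ["nn.Conv3d"],
    "TemporalResnetBlock": ["nn.Conv3d"],
    "SpatioTemporalResBlock": ["TemporalResnetBlock"],
    "UNetMidBlock3DCrossAttn": ["TemporalConvLayer"],
    "UNetMidBlockSpatioTemporal": ["SpatioTemporalResBlock"],
    "UNet3DConditionModel": ["UNetMidBlock3DCrossAttn"],
    "UNetSpatioTemporalConditionModel": ["UNetMidBlockSpatioTemporal"],
    "AttnSkipUpBlock2D": ["FirUpsample2D"],
}


def root_components(name):
    """Follow 'contains' links down to the components that carry the limitation."""
    children = CONTAINS.get(name)
    if not children:
        # No recorded children: this component is itself the source.
        return {name}
    roots = set()
    for child in children:
        roots |= root_components(child)
    return roots


print(root_components("UNetSpatioTemporalConditionModel"))  # {'nn.Conv3d'}
```

A module resolved this way has the same mode support as its root component, so checking the nn.Conv3d, FirUpsample2D, and FirDownsample2D rows is usually enough.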