Limitations
Because the underlying frameworks differ, some APIs and models in mindone/diffusers will not be identical to those in huggingface/diffusers for the foreseeable future.
APIs
xxx.from_pretrained
- torch_dtype is renamed to mindspore_dtype.
- device_map, max_memory, offload_folder, offload_state_dict, and low_cpu_mem_usage will not be supported.
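When porting a call site from huggingface/diffusers, the two rules above can be applied mechanically to the keyword arguments. The sketch below is only an illustration: port_from_pretrained_kwargs is a hypothetical helper, not part of mindone or diffusers, and it renames the key only. Converting the dtype value itself to a MindSpore dtype is left to the caller.

```python
# Hypothetical helper (not part of mindone or diffusers): adapts keyword
# arguments written for a huggingface/diffusers `from_pretrained` call.

# Keyword arguments listed above as unsupported.
UNSUPPORTED_KWARGS = (
    "device_map",
    "max_memory",
    "offload_folder",
    "offload_state_dict",
    "low_cpu_mem_usage",
)


def port_from_pretrained_kwargs(kwargs):
    """Return a copy of kwargs with torch_dtype renamed; reject unsupported keys."""
    unsupported = sorted(k for k in kwargs if k in UNSUPPORTED_KWARGS)
    if unsupported:
        raise ValueError(f"not supported by mindone/diffusers: {unsupported}")

    ported = dict(kwargs)  # leave the caller's dict untouched
    if "torch_dtype" in ported:
        if "mindspore_dtype" in ported:
            raise ValueError("pass either torch_dtype or mindspore_dtype, not both")
        # Only the key is renamed here; the value must still be a MindSpore dtype.
        ported["mindspore_dtype"] = ported.pop("torch_dtype")
    return ported
```

For example, port_from_pretrained_kwargs({"torch_dtype": "float16"}) returns a dict keyed by mindspore_dtype, while passing device_map raises a ValueError instead of being silently dropped. Failing loudly is a design choice of this sketch, so that unsupported options are noticed during porting.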
BaseOutput
- The default value of return_dict is changed to False, because GRAPH_MODE does not allow constructing a BaseOutput instance.
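In practice this means that a default call returns a plain tuple, so results are read by position rather than by attribute. The sketch below illustrates the calling convention with stand-ins only: ToyOutput and toy_forward are hypothetical names invented for this example and are not the library's classes or functions.

```python
from dataclasses import dataclass


@dataclass
class ToyOutput:
    # Stand-in for a BaseOutput subclass; not the real class.
    sample: list


def toy_forward(x, return_dict=False):
    """Stand-in model call that follows the return_dict=False default."""
    sample = [2 * v for v in x]
    if not return_dict:
        # Plain tuple: no output object has to be constructed.
        return (sample,)
    return ToyOutput(sample=sample)


# Default call: read the first output by position.
sample = toy_forward([1, 2, 3])[0]
```

Code ported from huggingface/diffusers that reads an attribute such as .sample from a default call would need to index the tuple instead.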
Output of AutoencoderKL.encode
In huggingface/diffusers, AutoencoderKL.encode outputs posterior = DiagonalGaussianDistribution(latent), which can be sampled with posterior.sample(). Here, encode can only output the latent, and sampling is then done through AutoencoderKL.diag_gauss_dist.sample(latent).
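To make clear what the sampling step computes, here is a minimal pure-Python sketch of drawing from a diagonal Gaussian. It assumes the latent packs the means followed by the log-variances, as the huggingface/diffusers DiagonalGaussianDistribution does along the channel axis. sample_diag_gauss is a hypothetical name for this illustration; the real AutoencoderKL.diag_gauss_dist.sample works on tensors and may differ in details such as clamping.

```python
import math
import random


def sample_diag_gauss(moments, rng=random):
    """Draw one sample from a diagonal Gaussian packed as [means..., log-variances...]."""
    if len(moments) % 2 != 0:
        raise ValueError("moments must hold as many log-variances as means")
    half = len(moments) // 2
    means, logvars = moments[:half], moments[half:]

    out = []
    for mean, logvar in zip(means, logvars):
        std = math.exp(0.5 * logvar)  # standard deviation from log-variance
        out.append(mean + std * rng.gauss(0.0, 1.0))  # reparameterised draw
    return out
```

Passing a seeded random.Random instance as rng makes the draw reproducible, which is convenient when comparing results between runs.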
Models
The table below shows, for each listed module, whether mindone/diffusers currently supports it in Pynative FP16, Pynative FP32, Graph FP16, and Graph FP32 mode.
Names | Pynative FP16 | Pynative FP32 | Graph FP16 | Graph FP32 | Description |
---|---|---|---|---|---|
StableCascadeUNet | โ | โ | โ | โ | huggingface/diffusers output NaN when using float16. |
nn.Conv3d | โ | โ | โ | โ | FP32 is not supported on Ascend |
TemporalConvLayer | โ | โ | โ | โ | contains nn.Conv3d |
TemporalResnetBlock | โ | โ | โ | โ | contains nn.Conv3d |
SpatioTemporalResBlock | โ | โ | โ | โ | contains TemporalResnetBlock |
UNetMidBlock3DCrossAttn | โ | โ | โ | โ | contains TemporalConvLayer |
CrossAttnDownBlock3D | โ | โ | โ | โ | contains TemporalConvLayer |
DownBlock3D | โ | โ | โ | โ | contains TemporalConvLayer |
CrossAttnUpBlock3D | โ | โ | โ | โ | contains TemporalConvLayer |
UpBlock3D | โ | โ | โ | โ | contains TemporalConvLayer |
MidBlockTemporalDecoder | โ | โ | โ | โ | contains SpatioTemporalResBlock |
UpBlockTemporalDecoder | โ | โ | โ | โ | contains SpatioTemporalResBlock |
UNetMidBlockSpatioTemporal | โ | โ | โ | โ | contains SpatioTemporalResBlock |
DownBlockSpatioTemporal | โ | โ | โ | โ | contains SpatioTemporalResBlock |
CrossAttnDownBlockSpatioTemporal | โ | โ | โ | โ | contains SpatioTemporalResBlock |
UpBlockSpatioTemporal | โ | โ | โ | โ | contains SpatioTemporalResBlock |
CrossAttnUpBlockSpatioTemporal | โ | โ | โ | โ | contains SpatioTemporalResBlock |
TemporalDecoder | โ | โ | โ | โ | contains nn.Conv3d, MidBlockTemporalDecoder etc. |
UNet3DConditionModel | โ | โ | โ | โ | contains UNetMidBlock3DCrossAttn etc. |
I2VGenXLUNet | โ | โ | โ | โ | contains UNetMidBlock3DCrossAttn etc. |
AutoencoderKLTemporalDecoder | โ | โ | โ | โ | contains MidBlockTemporalDecoder etc. |
UNetSpatioTemporalConditionModel | โ | โ | โ | โ | contains UNetMidBlockSpatioTemporal etc. |
FirUpsample2D | โ | โ | โ | โ | ops.Conv2D has poor precision in fp16 and PyNative mode |
FirDownsample2D | โ | โ | โ | โ | ops.Conv2D has poor precision in fp16 and PyNative mode |
AttnSkipUpBlock2D | โ | โ | โ | โ | contains FirUpsample2D |
SkipUpBlock2D | โ | โ | โ | โ | contains FirUpsample2D |
AttnSkipDownBlock2D | โ | โ | โ | โ | contains FirDownsample2D |
SkipDownBlock2D | โ | โ | โ | โ | contains FirDownsample2D |
ResnetBlock2D (kernel='fir') | โ | โ | โ | โ | ops.Conv2D has poor precision in fp16 and PyNative mode |