Get Pretrained Txt/Img Encoder from 🤗 Transformers

This MindSpore patch for 🤗 Transformers enables researchers and developers in the field of text-to-image (t2i) and text-to-video (t2v) generation to utilize pretrained text and image models from 🤗 Transformers on MindSpore. The pretrained models from 🤗 Transformers can be employed either as frozen encoders or fine-tuned with denoising networks for generative tasks. This approach aligns with the practices of PyTorch users[1][2]. Now, MindSpore users can benefit from the same functionality!

Philosophy

  • Only the MindSpore model definition is implemented here; it is kept identical to its PyTorch counterpart.
  • Configuration, Tokenizer, etc. reuse the original 🤗 Transformers implementations.
  • Models here are limited to the scope of generative tasks.