Get Pretrained Txt/Img Encoder from 🤗 Transformers

This MindSpore patch for 🤗 Transformers lets researchers and developers working on text-to-image (t2i) and text-to-video (t2v) generation use pretrained text and image models from 🤗 Transformers on MindSpore. These models can be used either as frozen encoders or fine-tuned together with denoising networks for generative tasks, which mirrors common practice among PyTorch users [1][2]. MindSpore users can now benefit from the same functionality!

Philosophy

Only the MindSpore model definitions are implemented here, and they are kept identical to their PyTorch counterparts.
Configuration, tokenizers, etc. are taken from the original 🤗 Transformers.
Models are limited to those within the scope of generative tasks.
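
The division of labour described above can be sketched roughly as follows: the tokenizer comes from the original 🤗 Transformers, while only the model class comes from the MindSpore side. This is a minimal sketch under stated assumptions, not the definitive usage. The import path `mindone.transformers`, the checkpoint name `openai/clip-vit-base-patch32`, and the helper names `freeze_encoder` and `encode_prompt` are all assumptions made for illustration; the text above does not name them. Heavy imports are kept inside the function so the block can be loaded without MindSpore installed.

```python
def freeze_encoder(cell):
    """Put a MindSpore Cell into inference mode and mark its parameters non-trainable.

    Uses only standard nn.Cell / Parameter members (set_train, get_parameters,
    requires_grad), so it works for any Cell-like object.
    """
    cell.set_train(False)
    for param in cell.get_parameters():
        param.requires_grad = False
    return cell


def encode_prompt(prompt, model_name="openai/clip-vit-base-patch32"):
    """Encode one text prompt with a frozen CLIP text encoder (hypothetical helper)."""
    # Tokenizer (and config) come from the original Hugging Face package.
    from transformers import CLIPTokenizer
    import mindspore as ms
    # ASSUMPTION: the MindSpore model definitions are exposed under this path.
    from mindone.transformers import CLIPTextModel

    tokenizer = CLIPTokenizer.from_pretrained(model_name)
    text_encoder = freeze_encoder(CLIPTextModel.from_pretrained(model_name))

    # Ask the tokenizer for NumPy arrays, then wrap them as a MindSpore tensor.
    input_ids = tokenizer([prompt], padding=True, return_tensors="np").input_ids
    outputs = text_encoder(input_ids=ms.Tensor(input_ids))
    # ASSUMPTION: the first output element is the per-token hidden states.
    return outputs[0]
```

Calling `encode_prompt("a photo of a cat")` would download the checkpoint on first use. To fine-tune the encoder together with a denoising network instead of freezing it, the `freeze_encoder` call would simply be dropped.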