TextDiffuser: Diffusion Models as Text Painters
TextDiffuser generates images with visually appealing text that is coherent with the background. It is flexible and controllable: it can create high-quality text images from text prompts alone or together with text template images, and it can perform text inpainting to reconstruct incomplete images containing text.
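As a concrete illustration of the inpainting use case, the sketch below calls a generic Hugging Face Diffusers inpainting pipeline. This is not TextDiffuser's own entry point: the model ID, prompt, and file names are assumptions, and TextDiffuser's released checkpoints and scripts may expose a different interface.

```python
# Illustrative only: a generic Diffusers inpainting call, not TextDiffuser's API.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # generic inpainting weights, assumed for illustration
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("incomplete.png").convert("RGB")  # image with a damaged text region
mask = Image.open("text_mask.png").convert("RGB")    # white where text should be restored

result = pipe(
    prompt='a storefront sign that reads "OPEN"',
    image=image,
    mask_image=mask,
).images[0]
result.save("restored.png")
```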
Highlights
- We propose TextDiffuser, a two-stage diffusion-based framework for text rendering. It generates accurate and coherent text images from text prompts, optionally guided by template images, and it can also perform text inpainting to reconstruct incomplete images (a toy sketch of the two-stage structure follows this list).
- We release MARIO-10M, a large-scale collection of image-text pairs with OCR annotations, including text recognition, detection, and character-level segmentation masks; one possible annotation layout is sketched after this list. (To be released)
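The sketch below illustrates the two-stage structure referenced in the first highlight: a layout stage that maps characters to coarse positions, and a diffusion stage conditioned on a character-level mask. It is a minimal toy sketch; the module names, sizes, and the assumed 95-character alphabet are illustrative assumptions, not the paper's actual architecture.

```python
# A toy, self-contained sketch of the two-stage idea (layout, then
# mask-conditioned diffusion). All sizes and names are assumptions.
import torch
import torch.nn as nn

VOCAB = 95                  # assumed alphabet: printable ASCII characters
MASK_CHANNELS = VOCAB + 1   # one channel per character class plus background

class LayoutStage(nn.Module):
    """Stage 1: map prompt characters to coarse layout boxes."""
    def __init__(self, d_model=128):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        # Predict a normalized box (x, y, w, h) for each character token.
        self.box_head = nn.Linear(d_model, 4)

    def forward(self, char_ids):                 # (B, T) character indices
        h = self.encoder(self.embed(char_ids))   # (B, T, d_model)
        return self.box_head(h).sigmoid()        # (B, T, 4) boxes in [0, 1]

class DiffusionStage(nn.Module):
    """Stage 2: denoise latents conditioned on the character mask."""
    def __init__(self, latent_ch=4, hidden=64):
        super().__init__()
        # The character mask is concatenated to the noisy latent along channels.
        self.net = nn.Sequential(
            nn.Conv2d(latent_ch + MASK_CHANNELS, hidden, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(hidden, latent_ch, 3, padding=1))

    def forward(self, noisy_latent, char_mask):
        return self.net(torch.cat([noisy_latent, char_mask], dim=1))

# Toy forward pass: 8 characters, 32x32 latents.
boxes = LayoutStage()(torch.randint(0, VOCAB, (1, 8)))
mask = torch.zeros(1, MASK_CHANNELS, 32, 32)   # boxes would be rasterized here
eps = DiffusionStage()(torch.randn(1, 4, 32, 32), mask)
print(boxes.shape, eps.shape)  # torch.Size([1, 8, 4]) torch.Size([1, 4, 32, 32])
```

In TextDiffuser itself, the first-stage layout output is rendered into character-level segmentation masks that condition the second-stage diffusion model; the toy rasterization step above stands in for that rendering.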
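To make the three annotation types in the second highlight concrete, here is one possible record layout for a MARIO-10M sample. The field names and file organization are purely hypothetical; the released dataset may use a different format.

```python
# A hypothetical annotation record for one image-text pair. Field names and
# structure are assumptions for illustration; MARIO-10M's real schema may differ.
sample = {
    "image_path": "images/000001.jpg",
    "caption": 'A poster with the words "Summer Sale"',
    "ocr": {
        # Text detection: one polygon (x, y points) per detected text region.
        "detection": [[(120, 40), (360, 40), (360, 90), (120, 90)]],
        # Text recognition: the transcription for each detected region.
        "recognition": ["Summer Sale"],
        # Character-level segmentation: a per-pixel mask whose values are
        # character class indices (0 = background), stored as a separate file.
        "char_segmentation": "masks/000001.png",
    },
}
print(sample["ocr"]["recognition"][0])  # Summer Sale
```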
Acknowledgement
We sincerely thank the following projects: Hugging Face Diffusers, LAION, DB, PARSeq, img2dataset.
Special thanks also go to the open-source diffusion projects and publicly available demos: DALL-E, Stable Diffusion, Stable Diffusion XL, Midjourney, ControlNet, DeepFloyd.
Contact
For help or issues using TextDiffuser, please email Jingye Chen (qwerty.chen@connect.ust.hk), Yupan Huang (huangyp28@mail2.sysu.edu.cn), or submit a GitHub issue.
For other communications related to TextDiffuser, please contact Lei Cui (lecu@microsoft.com) or Furu Wei (fuwei@microsoft.com).
Citation
If you find this code useful in your research, please consider citing:
@article{chen2023textdiffuser,
  title={TextDiffuser: Diffusion Models as Text Painters},
  author={Chen, Jingye and Huang, Yupan and Lv, Tengchao and Cui, Lei and Chen, Qifeng and Wei, Furu},
  journal={arXiv preprint arXiv:2305.10855},
  year={2023}
}