TextDiffuser: Diffusion Models as Text Painters
TextDiffuser generates images with visually appealing text that is coherent with the background. It is flexible and controllable: it can create high-quality text images from text prompts alone or together with text template images, and it can perform text inpainting to reconstruct incomplete images containing text.
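As a concrete illustration of the inpainting use case, the sketch below calls a generic Hugging Face Diffusers inpainting pipeline. This is not TextDiffuser's own entry point: the model ID, prompt, and file names are assumptions, and TextDiffuser's released checkpoints and scripts may expose a different interface.

```python
# Illustrative only: a generic Diffusers inpainting call, not TextDiffuser's API.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # generic inpainting weights, assumed for illustration
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("incomplete.png").convert("RGB")  # image with a damaged text region
mask = Image.open("text_mask.png").convert("RGB")    # white where text should be restored

result = pipe(
    prompt='a storefront sign that reads "OPEN"',
    image=image,
    mask_image=mask,
).images[0]
result.save("restored.png")
```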
Highlights
- We propose TextDiffuser, a two-stage diffusion-based framework for text rendering. It generates accurate and coherent text images from text prompts, optionally guided by template images, and it can also perform text inpainting to reconstruct incomplete images (a toy sketch of the two-stage structure follows this list).
- We release MARIO-10M, a large-scale collection of image-text pairs with OCR annotations, including text recognition, detection, and character-level segmentation masks; one possible annotation layout is sketched after this list. (To be released)
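The sketch below illustrates the two-stage structure referenced in the first highlight: a layout stage that maps characters to coarse positions, and a diffusion stage conditioned on a character-level mask. It is a minimal toy sketch; the module names, sizes, and the assumed 95-character alphabet are illustrative assumptions, not the paper's actual architecture.

```python
# A toy, self-contained sketch of the two-stage idea (layout, then
# mask-conditioned diffusion). All sizes and names are assumptions.
import torch
import torch.nn as nn

VOCAB = 95                  # assumed alphabet: printable ASCII characters
MASK_CHANNELS = VOCAB + 1   # one channel per character class plus background

class LayoutStage(nn.Module):
    """Stage 1: map prompt characters to coarse layout boxes."""
    def __init__(self, d_model=128):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        # Predict a normalized box (x, y, w, h) for each character token.
        self.box_head = nn.Linear(d_model, 4)

    def forward(self, char_ids):                 # (B, T) character indices
        h = self.encoder(self.embed(char_ids))   # (B, T, d_model)
        return self.box_head(h).sigmoid()        # (B, T, 4) boxes in [0, 1]

class DiffusionStage(nn.Module):
    """Stage 2: denoise latents conditioned on the character mask."""
    def __init__(self, latent_ch=4, hidden=64):
        super().__init__()
        # The character mask is concatenated to the noisy latent along channels.
        self.net = nn.Sequential(
            nn.Conv2d(latent_ch + MASK_CHANNELS, hidden, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(hidden, latent_ch, 3, padding=1))

    def forward(self, noisy_latent, char_mask):
        return self.net(torch.cat([noisy_latent, char_mask], dim=1))

# Toy forward pass: 8 characters, 32x32 latents.
boxes = LayoutStage()(torch.randint(0, VOCAB, (1, 8)))
mask = torch.zeros(1, MASK_CHANNELS, 32, 32)   # boxes would be rasterized here
eps = DiffusionStage()(torch.randn(1, 4, 32, 32), mask)
print(boxes.shape, eps.shape)  # torch.Size([1, 8, 4]) torch.Size([1, 4, 32, 32])
```

In TextDiffuser itself, the first-stage layout output is rendered into character-level segmentation masks that condition the second-stage diffusion model; the toy rasterization step above stands in for that rendering.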
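To make the three annotation types in the second highlight concrete, here is one possible record layout for a MARIO-10M sample. The field names and file organization are purely hypothetical; the released dataset may use a different format.

```python
# A hypothetical annotation record for one image-text pair. Field names and
# structure are assumptions for illustration; MARIO-10M's real schema may differ.
sample = {
    "image_path": "images/000001.jpg",
    "caption": 'A poster with the words "Summer Sale"',
    "ocr": {
        # Text detection: one polygon (x, y points) per detected text region.
        "detection": [[(120, 40), (360, 40), (360, 90), (120, 90)]],
        # Text recognition: the transcription for each detected region.
        "recognition": ["Summer Sale"],
        # Character-level segmentation: a per-pixel mask whose values are
        # character class indices (0 = background), stored as a separate file.
        "char_segmentation": "masks/000001.png",
    },
}
print(sample["ocr"]["recognition"][0])  # Summer Sale
```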
Acknowledgement
We sincerely thank the following projects: Hugging Face Diffusers, LAION, DB, PARSeq, img2dataset.
Special thanks also go to the open-source diffusion projects and publicly available demos: DALL-E, Stable Diffusion, Stable Diffusion XL, Midjourney, ControlNet, DeepFloyd.
Contact
For help or issues using TextDiffuser, please email Jingye Chen (qwerty.chen@connect.ust.hk), Yupan Huang (huangyp28@mail2.sysu.edu.cn), or submit a GitHub issue.
For other communications related to TextDiffuser, please contact Lei Cui (lecu@microsoft.com) or Furu Wei (fuwei@microsoft.com).
Citation
If you find this code useful in your research, please consider citing:
@article{chen2023textdiffuser,
  title={TextDiffuser: Diffusion Models as Text Painters},
  author={Chen, Jingye and Huang, Yupan and Lv, Tengchao and Cui, Lei and Chen, Qifeng and Wei, Furu},
  journal={arXiv preprint arXiv:2305.10855},
  year={2023}
}