cjwbw / textdiffuser

Diffusion Models as Text Painters


Run time and cost

This model costs approximately $0.12 to run on Replicate, or 8 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
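The per-run figure above can be turned into a rough budget estimate. The following is a minimal sketch that assumes a flat $0.12 per run; actual cost varies with inputs, and the function names are hypothetical helpers, not part of any Replicate API.

```python
import math

# Approximate per-run cost quoted on this page (assumption: flat rate).
COST_PER_RUN_USD = 0.12


def runs_per_budget(budget_usd: float, cost_per_run: float = COST_PER_RUN_USD) -> int:
    """Whole number of runs that fit in a budget at a flat per-run cost."""
    # The small epsilon guards against floating-point results such as 1.9999999.
    return math.floor(budget_usd / cost_per_run + 1e-9)


def estimated_cost(num_runs: int, cost_per_run: float = COST_PER_RUN_USD) -> float:
    """Estimated total cost in USD, rounded to cents."""
    return round(num_runs * cost_per_run, 2)


if __name__ == "__main__":
    print(runs_per_budget(1.0))   # 8, matching "8 runs per $1"
    print(estimated_cost(100))    # 12.0
```

Because run time "varies significantly based on the inputs", treat these numbers as a planning estimate rather than a billing guarantee.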

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 10 minutes, but prediction time varies significantly with the inputs.

Readme

TextDiffuser: Diffusion Models as Text Painters

TextDiffuser generates images with visually appealing text that is coherent with the background. It is flexible and controllable: it can create high-quality text images from text prompts alone or from prompts combined with text template images, and it can perform text inpainting to reconstruct incomplete images that contain text.

Highlights

  • We propose TextDiffuser, a two-stage diffusion-based framework for text rendering. It generates accurate and coherent text images from text prompts, optionally combined with template images, and performs text inpainting to reconstruct incomplete images.

  • We release MARIO-10M, a large-scale collection of image-text pairs with OCR annotations, including text recognition, detection, and character-level segmentation masks. (To be released)

Acknowledgement

We sincerely thank the following projects: Hugging Face Diffusers, LAION, DB, PARSeq, img2dataset.

Also, special thanks to the following open-source diffusion projects and available demos: DALLE, Stable Diffusion, Stable Diffusion XL, Midjourney, ControlNet, DeepFloyd.

Contact

For help or issues using TextDiffuser, please email Jingye Chen (qwerty.chen@connect.ust.hk) or Yupan Huang (huangyp28@mail2.sysu.edu.cn), or submit a GitHub issue.

For other communications related to TextDiffuser, please contact Lei Cui (lecu@microsoft.com) or Furu Wei (fuwei@microsoft.com).

Citation

If you find this code useful in your research, please consider citing:

@article{chen2023textdiffuser,
  title={TextDiffuser: Diffusion Models as Text Painters},
  author={Chen, Jingye and Huang, Yupan and Lv, Tengchao and Cui, Lei and Chen, Qifeng and Wei, Furu},
  journal={arXiv preprint arXiv:2305.10855},
  year={2023}
}