chenxwh / oneformer

One Transformer to Rule Universal Image Segmentation

No versions have been pushed to this model yet.


OneFormer: One Transformer to Rule Universal Image Segmentation

Jitesh Jain, Jiachen Li†, MangTik Chiu†, Ali Hassani, Nikita Orlov, Humphrey Shi

† Equal Contribution

This repo contains the code for our paper OneFormer: One Transformer to Rule Universal Image Segmentation.


  • OneFormer is the first multi-task universal image segmentation framework based on transformers.
  • OneFormer needs to be trained only once with a single universal architecture, a single model, and on a single dataset , to outperform existing frameworks across semantic, instance, and panoptic segmentation tasks.
  • OneFormer uses a task-conditioned joint training strategy, uniformly sampling different ground truth domains (semantic instance, or panoptic) by deriving all labels from panoptic annotations to train its multi-task model.
  • OneFormer uses a task token to condition the model on the task in focus, making our architecture task-guided for training, and task-dynamic for inference, all with a single model.



If you found OneFormer useful in your research, please consider starring ⭐ us on GitHub and citing 📚 us in your research!

      title={{OneFormer: One Transformer to Rule Universal Image Segmentation}},
      author={Jitesh Jain and Jiachen Li and MangTik Chiu and Ali Hassani and Nikita Orlov and Humphrey Shi},


We thank the authors of Mask2Former, GroupViT, and Neighborhood Attention Transformer for releasing their helpful codebases.