OneFormer: One Transformer to Rule Universal Image Segmentation
Jitesh Jain, Jiachen Li†, MangTik Chiu†, Ali Hassani, Nikita Orlov, Humphrey Shi
† Equal Contribution
This repo contains the code for our paper OneFormer: One Transformer to Rule Universal Image Segmentation.
Features
- OneFormer is the first multi-task universal image segmentation framework based on transformers.
- OneFormer needs to be trained only once, with a single universal architecture, a single model, and a single dataset, to outperform existing frameworks across semantic, instance, and panoptic segmentation tasks.
- OneFormer uses a task-conditioned joint training strategy, uniformly sampling different ground truth domains (semantic, instance, or panoptic) and deriving all labels from panoptic annotations to train its multi-task model (see the sketch after this list).
- OneFormer uses a task token to condition the model on the task in focus, making our architecture task-guided for training and task-dynamic for inference, all with a single model.
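The sketch below illustrates the two ideas above: uniformly sampling a ground-truth domain per iteration, deriving the corresponding labels from the panoptic annotation, and conditioning the model with a text task token of the form "the task is {task}", as described in the paper. This is a minimal, hypothetical sketch, not the repository's actual API; the annotation layout (`segments`, `category_id`, `is_thing`, `mask`) and the function names are stand-ins for illustration.

```python
import random

TASKS = ("semantic", "instance", "panoptic")

def derive_labels(panoptic_annotation: dict, task: str) -> dict:
    # All three ground-truth domains are derived from the panoptic
    # annotation, so only panoptic labels need to be stored.
    segments = panoptic_annotation["segments"]
    if task == "semantic":
        # Merge segments of the same class; instance identity is dropped.
        per_class = {}
        for seg in segments:
            per_class.setdefault(seg["category_id"], []).append(seg["mask"])
        return {"per_class_masks": per_class}
    if task == "instance":
        # Keep only countable "thing" segments, one mask per instance.
        return {"segments": [s for s in segments if s["is_thing"]]}
    return panoptic_annotation  # panoptic: use the annotation as-is

def sample_training_inputs(panoptic_annotation: dict):
    # Uniformly sample the ground-truth domain for this iteration and
    # build the text prompt that conditions the model on the task.
    task = random.choice(TASKS)
    task_token = f"the task is {task}"
    return task_token, derive_labels(panoptic_annotation, task)
```

At inference time, the same task token makes the single trained model task-dynamic: passing "the task is semantic" (or instance, or panoptic) selects the output domain without retraining or swapping weights.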
Citation
If you find OneFormer useful in your research, please consider starring ⭐ the repo on GitHub and citing 📚 our paper!
@inproceedings{jain2022oneformer,
title={{OneFormer: One Transformer to Rule Universal Image Segmentation}},
author={Jitesh Jain and Jiachen Li and MangTik Chiu and Ali Hassani and Nikita Orlov and Humphrey Shi},
booktitle={CVPR},
year={2023}
}
Acknowledgement
We thank the authors of Mask2Former, GroupViT, and Neighborhood Attention Transformer for releasing their helpful codebases.