This is a cog implementation of https://github.com/google-research/pix2seq
Pix2Seq - A general framework for turning RGB pixels into semantically meaningful sequences
This is the official implementation of Pix2Seq in Tensorflow 2 with efficient TPUs/GPUs support as well as interactive debugging similar to Pytorch.
An illustration of Pix2Seq for object detection (from our Google AI blog post).
Cite
@article{chen2021pix2seq,
title={Pix2seq: A language modeling framework for object detection},
author={Chen, Ting and Saxena, Saurabh and Li, Lala and Fleet, David J and Hinton, Geoffrey},
journal={arXiv preprint arXiv:2109.10852},
year={2021}
}