chenxwh / pix2seq

Turning RGB pixels into semantically meaningful sequences

This is a Cog implementation of Pix2Seq (https://github.com/google-research/pix2seq).

Pix2Seq - A general framework for turning RGB pixels into semantically meaningful sequences

This is the official implementation of Pix2Seq in TensorFlow 2, with efficient TPU/GPU support and interactive debugging similar to PyTorch.

[Figure: An illustration of Pix2Seq for object detection (from our Google AI blog post).]
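Since the illustration itself is not reproduced here, the following is a minimal sketch of the idea it depicts: writing each detected object as a short run of discrete tokens. It assumes the scheme described in the Pix2seq paper, in which box coordinates are quantized into bins and followed by a class token. The function names, the default of 1000 bins, and the choice to place class tokens directly after the coordinate bins are illustrative assumptions, not the official implementation's API or vocabulary layout.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (ymin, xmin, ymax, xmax) in pixels


def box_to_tokens(box: Box, label: int, image_size: Tuple[int, int],
                  num_bins: int = 1000) -> List[int]:
    """Quantize one box into 4 coordinate tokens plus 1 class token.

    Hypothetical helper: the official code may quantize and lay out
    its vocabulary differently.
    """
    height, width = image_size
    ymin, xmin, ymax, xmax = box
    extents = (height, width, height, width)
    # Map each pixel coordinate to a bin index in [0, num_bins - 1].
    coord_tokens = [
        min(pixel * num_bins // extent, num_bins - 1)
        for pixel, extent in zip((ymin, xmin, ymax, xmax), extents)
    ]
    # Assumption: class tokens occupy the ids right after the coordinate bins.
    class_token = num_bins + label
    return coord_tokens + [class_token]


def objects_to_sequence(objects: Sequence[Tuple[Box, int]],
                        image_size: Tuple[int, int],
                        num_bins: int = 1000) -> List[int]:
    """Concatenate the per-object tokens into one flat target sequence."""
    sequence: List[int] = []
    for box, label in objects:
        sequence.extend(box_to_tokens(box, label, image_size, num_bins))
    return sequence


if __name__ == "__main__":
    # One object with class id 3 in a 480x640 (height x width) image.
    print(objects_to_sequence([((48, 64, 240, 320), 3)], (480, 640)))
```

Each object contributes five tokens, so an image with k annotated objects yields a sequence of length 5k in this sketch; a real training pipeline would also need an end-of-sequence token and padding, which are omitted here.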

Cite

Pix2seq paper:

@article{chen2021pix2seq,
  title={Pix2seq: A language modeling framework for object detection},
  author={Chen, Ting and Saxena, Saurabh and Li, Lala and Fleet, David J and Hinton, Geoffrey},
  journal={arXiv preprint arXiv:2109.10852},
  year={2021}
}