chenxwh / pix2seq

Turning RGB pixels into semantically meaningful sequences



This is a Cog implementation of Pix2Seq.

Pix2Seq - A general framework for turning RGB pixels into semantically meaningful sequences

This is the official implementation of Pix2Seq in TensorFlow 2, with efficient TPU/GPU support and interactive debugging similar to PyTorch.

Figure: An illustration of Pix2Seq for object detection (from our Google AI blog post).


Pix2seq paper:

  title={Pix2seq: A language modeling framework for object detection},
  author={Chen, Ting and Saxena, Saurabh and Li, Lala and Fleet, David J and Hinton, Geoffrey},
  journal={arXiv preprint arXiv:2109.10852},