google-research / maskgit

Masked Generative Image Transformer

Input

string

Choose the task type.

Default: "Class-conditional Image Synthesis"

*string

Choose the ImageNet label, which determines what type of object to synthesize or edit toward.

integer

Choose the size of the generated images. The output contains 8 images.

Default: 256

integer

Set the random seed.

Default: 42

file

Provide an input image for Class-conditional Image Editing. The image will be resized to image_size.

string

For Class-conditional Image Editing, provide the area to edit in the format top_left_height_width, e.g. 128_64_256_288. The output shows the resized input image with a box highlighting the edited area, followed by 8 edited images.
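The box string is just four underscore-separated integers. Below is a minimal Python sketch, not part of the model's code, of how such a string could be parsed and kept inside the resized image; the function name parse_edit_box and the clipping behaviour are assumptions for illustration only.

def parse_edit_box(box_str, image_size=256):
    """Parse 'top_left_height_width' (e.g. '128_64_256_288') into integers,
    clipped so the box stays inside the image after resizing to image_size."""
    parts = box_str.strip().split("_")
    if len(parts) != 4:
        raise ValueError("expected four underscore-separated integers: top_left_height_width")
    top, left, height, width = (int(p) for p in parts)
    # Clip the box to the resized image bounds.
    top = max(0, min(top, image_size - 1))
    left = max(0, min(left, image_size - 1))
    height = min(height, image_size - top)
    width = min(width, image_size - left)
    return top, left, height, width

print(parse_edit_box("128_64_256_288"))   # -> (128, 64, 128, 192) on a 256x256 image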

Output

8 generated images; for Class-conditional Image Editing, the resized input with the edit area highlighted is shown as well.

Run time and cost

This model runs on Nvidia T4 GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

MaskGIT: Masked Generative Image Transformer

Official Jax Implementation of the CVPR 2022 Paper

Summary

MaskGIT is a novel image synthesis paradigm using a bidirectional transformer decoder. During training, MaskGIT learns to predict randomly masked tokens by attending to tokens in all directions. At inference time, the model begins by generating all tokens of an image simultaneously, and then refines the image iteratively, conditioning on the previous generation.
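As a rough illustration of that iterative decoding loop, here is a minimal NumPy sketch. It is not the official Jax code: predict_tokens is a random stand-in for the bidirectional transformer, greedy argmax replaces the paper's temperature sampling, and the cosine masking schedule follows the paper's description; token-grid and codebook sizes are placeholders.

import math
import numpy as np

MASK_ID = -1           # placeholder id for the [MASK] token
NUM_TOKENS = 16 * 16   # assumed 16x16 grid of VQ token indices
VOCAB = 1024           # placeholder codebook size

def predict_tokens(tokens, rng):
    """Stand-in for the bidirectional transformer: per-position token
    probabilities. Random here, only so the loop below actually runs."""
    logits = rng.standard_normal((NUM_TOKENS, VOCAB))
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return probs / probs.sum(axis=-1, keepdims=True)

def decode(num_steps=8, seed=42):
    rng = np.random.default_rng(seed)
    tokens = np.full(NUM_TOKENS, MASK_ID)          # start from an all-masked grid
    for step in range(num_steps):
        probs = predict_tokens(tokens, rng)
        sampled = probs.argmax(axis=-1)            # predicted token per position (greedy for simplicity)
        confidence = probs.max(axis=-1)
        confidence = np.where(tokens != MASK_ID, np.inf, confidence)  # never re-mask committed tokens
        # Cosine schedule: fraction of positions that remain masked after this step.
        mask_ratio = math.cos(math.pi / 2 * (step + 1) / num_steps)
        num_masked = int(mask_ratio * NUM_TOKENS)
        tokens = np.where(tokens == MASK_ID, sampled, tokens)
        # Re-mask the lowest-confidence positions; the rest are kept for later steps.
        tokens[np.argsort(confidence)[:num_masked]] = MASK_ID
        print(f"step {step}: {int((tokens == MASK_ID).sum())} tokens still masked")
    return tokens  # fully decoded grid; the VQ decoder would map it back to pixels

decode()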

BibTeX

@InProceedings{chang2022maskgit,
  title     = {MaskGIT: Masked Generative Image Transformer},
  author    = {Huiwen Chang and Han Zhang and Lu Jiang and Ce Liu and William T. Freeman},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2022}
}