cjwbw / pix2struct

Pix2Struct: Screenshot Parsing as Pretraining for Visual Language Understanding

  • Public
  • 6.1K runs
  • Prediction

    cjwbw/pix2struct:e32d77481424b47e7959836638b62082d8528b0c66a3a30eedca3970aaf786e7
    ID: xmbwph5cunhytna3aacbbpofcq
    Status: Succeeded
    Source: Web

    Input

    text: What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud
    image: (image not shown)
    model_name: ai2d

    Output

    ash cloud
  • Prediction

    cjwbw/pix2struct:e32d77481424b47e7959836638b62082d8528b0c66a3a30eedca3970aaf786e7
    ID: 6hww4ae2ebgnrdsc4qi43jv2fa
    Status: Succeeded
    Source: Web
    Created by @chenxwh

    Input

    text: (empty)
    image: (image not shown)
    model_name: screen2words

    Output

    page displaying the discord
  • Prediction

    cjwbw/pix2struct:e32d77481424b47e7959836638b62082d8528b0c66a3a30eedca3970aaf786e7
    ID: 6y454gv2jzd2dmt6a7zx6hkxby
    Status: Succeeded
    Source: Web

    Input

    text: (empty)
    image: (image not shown)
    model_name: textcaps

    Output

    A street scene with a stop sign and a sign that says Optus.

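Every example prediction for this model passes the same three inputs: `text`, `image`, and `model_name`. What follows is a minimal sketch of calling this version from Python. It assumes the `replicate` Python client is installed and that `REPLICATE_API_TOKEN` is set in the environment. The helper `build_input` and the file name `diagram.png` are hypothetical. The note that `text` carries the question only for question-answering checkpoints such as `ai2d` is an inference from the examples, not documented behavior.

```python
"""Sketch: run cjwbw/pix2struct on Replicate with the inputs shown in the examples."""

from typing import Any, BinaryIO, Union

# Version reference copied from the example predictions.
MODEL_VERSION = (
    "cjwbw/pix2struct:"
    "e32d77481424b47e7959836638b62082d8528b0c66a3a30eedca3970aaf786e7"
)


def build_input(
    image: Union[str, BinaryIO], model_name: str, text: str = ""
) -> dict:
    """Hypothetical helper: assemble the input dict used by the examples.

    Assumption: `text` holds the question for VQA-style checkpoints
    (e.g. "ai2d"); the captioning examples ("screen2words", "textcaps")
    left it empty.
    """
    if not model_name:
        raise ValueError("model_name is required")
    return {"text": text, "image": image, "model_name": model_name}


def run(image_path: str, model_name: str, text: str = "") -> Any:
    """Send one prediction to Replicate and return the raw model output."""
    # Imported lazily so build_input works without the client installed.
    import replicate

    with open(image_path, "rb") as image_file:
        return replicate.run(
            MODEL_VERSION, input=build_input(image_file, model_name, text)
        )


# Example (needs network access, an API token, and a local image file;
# "diagram.png" is a hypothetical path mirroring the first prediction):
# print(run("diagram.png", "ai2d",
#           "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"))
```

The helper keeps payload construction separate from the network call. This lets you check the inputs offline before you spend a prediction.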