lucataco / kosmos-2

Grounding Multimodal Large Language Models to the World

  • Public
  • 1.9K runs
  • L40S
  • Prediction

    lucataco/kosmos-2:3e7b211c29c092f4bcc8853922cc986baa52efe255876b80cac2c2fbb4aff805
    ID
    z6jvacdbl7t3c4233kqqilzaoa
    Status
    Succeeded
    Source
    Web
    Hardware
    A40

    Input

    image
    description_type
    Brief

    Output

    An image of a snowman warming himself by a campfire. [('a snowman', (12, 21), [(0.390625, 0.046875, 0.984375, 0.828125)]), ('a campfire', (41, 51), [(0.109375, 0.640625, 0.546875, 0.984375)])]
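
The output above is a caption followed by what looks like a Python-literal list of `(phrase, (start, end), [boxes])` tuples. A minimal sketch of splitting the two parts, assuming that layout holds for every prediction; the helper name `parse_kosmos_output` is made up for illustration and is not part of the model or any library:

```python
import ast


def parse_kosmos_output(raw):
    """Split a raw output string into (text, entities).

    Assumes free text followed by a Python-literal list of
    (phrase, (start, end), [boxes]) tuples, as in the examples on this page.
    """
    # Try each '[' from the left; the first suffix that parses as a list wins.
    pos = raw.find("[")
    while pos != -1:
        try:
            entities = ast.literal_eval(raw[pos:])
        except (ValueError, SyntaxError):
            entities = None
        if isinstance(entities, list):
            return raw[:pos].rstrip(), entities
        pos = raw.find("[", pos + 1)
    # No trailing entity list found: return the text unchanged.
    return raw.strip(), []


raw = (
    "An image of a snowman warming himself by a campfire. "
    "[('a snowman', (12, 21), [(0.390625, 0.046875, 0.984375, 0.828125)]), "
    "('a campfire', (41, 51), [(0.109375, 0.640625, 0.546875, 0.984375)])]"
)
text, entities = parse_kosmos_output(raw)
for phrase, (start, end), boxes in entities:
    # Each span indexes into the caption text.
    print(phrase, text[start:end] == phrase, boxes)
```

In this example the `(start, end)` pairs are character offsets into the caption, so `text[12:21]` is `a snowman`.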
  • Prediction

    lucataco/kosmos-2:3e7b211c29c092f4bcc8853922cc986baa52efe255876b80cac2c2fbb4aff805
    ID
    x67z6i3bwdqsygwh2ib4evrtym
    Status
    Succeeded
    Source
    Web
    Hardware
    A40

    Input

    image
    description_type
    Detailed

    Output

    Describe this image in detail: The image features a snowman sitting by a campfire in the snow. He is wearing a hat, scarf, and gloves, with a pot nearby and a cup nearby. The snowman appears to be enjoying the warmth of the fire, and it appears to have a warm and cozy atmosphere. [('a campfire', (71, 81), [(0.171875, 0.015625, 0.484375, 0.984375)]), ('a hat', (109, 114), [(0.515625, 0.046875, 0.828125, 0.234375)]), ('scarf', (116, 121), [(0.515625, 0.234375, 0.890625, 0.578125)]), ('gloves', (127, 133), [(0.515625, 0.390625, 0.640625, 0.515625)]), ('a pot', (140, 145), [(0.078125, 0.609375, 0.265625, 0.859375)]), ('a cup', (157, 162), [(0.890625, 0.765625, 0.984375, 0.984375)])]
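
In the Detailed output the prompt `Describe this image in detail:` is echoed at the start of the text, and the entity spans count from the start of that full string (`a campfire` sits at 71 to 81). A small sketch of dropping the echoed prompt and shifting the spans to match; `strip_prompt` is a hypothetical helper, and the text below is shortened to the first sentence for brevity:

```python
PROMPT = "Describe this image in detail:"  # prefix echoed in the Detailed outputs


def strip_prompt(text, entities, prompt=PROMPT):
    """Drop an echoed prompt from `text` and shift entity spans to match."""
    if not text.startswith(prompt):
        return text, entities
    # Also drop the whitespace that separates the prompt from the description.
    body = text[len(prompt):].lstrip()
    shift = len(text) - len(body)
    shifted = [
        (phrase, (start - shift, end - shift), boxes)
        for phrase, (start, end), boxes in entities
    ]
    return body, shifted


text = (
    "Describe this image in detail: The image features a snowman "
    "sitting by a campfire in the snow."
)
entities = [("a campfire", (71, 81), [(0.171875, 0.015625, 0.484375, 0.984375)])]

body, shifted = strip_prompt(text, entities)
phrase, (start, end), _ = shifted[0]
print(body[start:end] == phrase)
```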
  • Prediction

    lucataco/kosmos-2:d5098d8db2a801b45ca11451a0ce421e27353b0298fb3aeba4a9055bd67c582a
    ID
    3msw4b3bl3nsdkexyacy4av5lq
    Status
    Succeeded
    Source
    Web
    Hardware
    A40

    Input

    image
    visual_output
    description_type
    Detailed

    Output

    img

    text

    Describe this image in detail: The image features a snowman sitting by a campfire in the snow. He is wearing a hat, scarf, and gloves, with a pot nearby and a cup nearby. The snowman appears to be enjoying the warmth of the fire, and it appears to have a warm and cozy atmosphere. [('a campfire', (71, 81), [(0.171875, 0.015625, 0.484375, 0.984375)]), ('a hat', (109, 114), [(0.515625, 0.046875, 0.828125, 0.234375)]), ('scarf', (116, 121), [(0.515625, 0.234375, 0.890625, 0.578125)]), ('gloves', (127, 133), [(0.515625, 0.390625, 0.640625, 0.515625)]), ('a pot', (140, 145), [(0.078125, 0.609375, 0.265625, 0.859375)]), ('a cup', (157, 162), [(0.890625, 0.765625, 0.984375, 0.984375)])]
  • Prediction

    lucataco/kosmos-2:d5098d8db2a801b45ca11451a0ce421e27353b0298fb3aeba4a9055bd67c582a
    ID
    znjksy3bwob6sjq6lrls77xhoe
    Status
    Succeeded
    Source
    Web
    Hardware
    A40

    Input

    image
    visual_output
    description_type
    Brief

    Output

    An image of two puppies sitting in the grass [('two puppies', (12, 23), [(0.484375, 0.078125, 0.765625, 0.984375), (0.234375, 0.140625, 0.515625, 0.984375)]), ('the grass', (35, 44), [(0.015625, 0.015625, 0.984375, 0.984375)])]
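
A single phrase can carry several boxes, as `two puppies` does here. The coordinates all fall between 0 and 1, so they appear to be normalized to the image size. A sketch of scaling them to pixels, assuming the order is (x_min, y_min, x_max, y_max); the 640x640 image size and the name `to_pixel_box` are made up for illustration:

```python
def to_pixel_box(box, width, height):
    """Scale a normalized (x_min, y_min, x_max, y_max) box to integer pixels."""
    x_min, y_min, x_max, y_max = box
    return (
        round(x_min * width),
        round(y_min * height),
        round(x_max * width),
        round(y_max * height),
    )


# The two boxes reported for 'two puppies' in the output above.
puppies = [
    (0.484375, 0.078125, 0.765625, 0.984375),
    (0.234375, 0.140625, 0.515625, 0.984375),
]

# 640x640 is an assumed image size, not taken from the prediction.
pixel_boxes = [to_pixel_box(b, 640, 640) for b in puppies]
print(pixel_boxes)
```

The resulting tuples can be passed to any drawing routine that expects corner coordinates in pixels.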
  • Prediction

    lucataco/kosmos-2:d5098d8db2a801b45ca11451a0ce421e27353b0298fb3aeba4a9055bd67c582a
    ID
    62y6t2lbm3fpto33vcbe76vj7y
    Status
    Succeeded
    Source
    Web
    Hardware
    A40

    Input

    image
    visual_output
    description_type
    Detailed

    Output

    Describe this image in detail: Two adorable Golden Retriever puppies sit side by side in a field of orange flowers. They are both looking at the camera with their tongues hanging out. [('Two adorable Golden Retriever puppies', (31, 68), [(0.515625, 0.078125, 0.765625, 0.984375), (0.234375, 0.140625, 0.515625, 0.984375)]), ('orange flowers', (100, 114), [(0.015625, 0.453125, 0.984375, 0.984375)]), ('their tongues', (157, 170), [(0.359375, 0.421875, 0.421875, 0.578125), (0.640625, 0.328125, 0.703125, 0.453125)])]
  • Prediction

    lucataco/kosmos-2:d5098d8db2a801b45ca11451a0ce421e27353b0298fb3aeba4a9055bd67c582a
    ID
    vkxpaj3bxdp4zxzvt6yew75zb4
    Status
    Succeeded
    Source
    Web
    Hardware
    A40

    Input

    image
    visual_output
    description_type
    Detailed

    Output

    img

    text

    Describe this image in detail: Two adorable Golden Retriever puppies sit side by side in a field of orange flowers. They are both looking at the camera with their tongues hanging out. [('Two adorable Golden Retriever puppies', (31, 68), [(0.515625, 0.078125, 0.765625, 0.984375), (0.234375, 0.140625, 0.515625, 0.984375)]), ('orange flowers', (100, 114), [(0.015625, 0.453125, 0.984375, 0.984375)]), ('their tongues', (157, 170), [(0.359375, 0.421875, 0.421875, 0.578125), (0.640625, 0.328125, 0.703125, 0.453125)])]
  • Prediction

    lucataco/kosmos-2:d5098d8db2a801b45ca11451a0ce421e27353b0298fb3aeba4a9055bd67c582a
    ID
    444qsetbs6gujmu6cdaga4e4iu
    Status
    Succeeded
    Source
    Web
    Hardware
    A40

    Input

    image
    visual_output
    description_type
    Brief

    Output

    An image of a group of fighter planes flying in formation [('a group of fighter planes flying in formation', (12, 57), [(0.453125, 0.234375, 0.546875, 0.328125), (0.640625, 0.203125, 0.703125, 0.296875), (0.296875, 0.328125, 0.390625, 0.390625), (0.171875, 0.578125, 0.265625, 0.671875), (0.265625, 0.421875, 0.359375, 0.484375), (0.453125, 0.453125, 0.515625, 0.515625)])]
  • Prediction

    lucataco/kosmos-2:d5098d8db2a801b45ca11451a0ce421e27353b0298fb3aeba4a9055bd67c582a
    ID
    ruhg5g3bbe7xgic73gxb6xus7a
    Status
    Succeeded
    Source
    Web
    Hardware
    A40

    Input

    image
    visual_output
    description_type
    Detailed

    Output

    Describe this image in detail: Six airplanes are flying in a row, creating a smoke trail. [('Six airplanes', (31, 44), [(0.453125, 0.234375, 0.546875, 0.328125), (0.640625, 0.203125, 0.703125, 0.296875), (0.296875, 0.296875, 0.390625, 0.390625), (0.171875, 0.578125, 0.265625, 0.671875), (0.265625, 0.421875, 0.359375, 0.484375), (0.453125, 0.453125, 0.515625, 0.515625)]), ('a smoke trail', (75, 88), [(0.171875, 0.234375, 0.984375, 0.984375)])]
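
The Brief and Detailed runs on the airplane image each list six boxes, and the two lists differ only in the third box. One common way to quantify that kind of agreement is intersection over union (IoU). This is a sketch that pairs boxes by list position, which is an assumption, and again treats each box as (x_min, y_min, x_max, y_max):

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Boxes from the Brief output for the airplane image.
brief = [
    (0.453125, 0.234375, 0.546875, 0.328125),
    (0.640625, 0.203125, 0.703125, 0.296875),
    (0.296875, 0.328125, 0.390625, 0.390625),
    (0.171875, 0.578125, 0.265625, 0.671875),
    (0.265625, 0.421875, 0.359375, 0.484375),
    (0.453125, 0.453125, 0.515625, 0.515625),
]
# Boxes for 'Six airplanes' from the Detailed output.
detailed = [
    (0.453125, 0.234375, 0.546875, 0.328125),
    (0.640625, 0.203125, 0.703125, 0.296875),
    (0.296875, 0.296875, 0.390625, 0.390625),
    (0.171875, 0.578125, 0.265625, 0.671875),
    (0.265625, 0.421875, 0.359375, 0.484375),
    (0.453125, 0.453125, 0.515625, 0.515625),
]

scores = [iou(a, b) for a, b in zip(brief, detailed)]
print([round(s, 3) for s in scores])
```

Identical boxes score 1.0. The third pair scores about 0.667 because the Brief box is fully contained in the taller Detailed box.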
  • Prediction

    lucataco/kosmos-2:d5098d8db2a801b45ca11451a0ce421e27353b0298fb3aeba4a9055bd67c582a
    ID
    3qnvxadbsfqjggfnmmj7zb626i
    Status
    Succeeded
    Source
    Web
    Hardware
    A40

    Input

    image
    visual_output
    description_type
    Detailed

    Output

    img

    text

    Describe this image in detail: Six airplanes are flying in a row, creating a smoke trail. [('Six airplanes', (31, 44), [(0.453125, 0.234375, 0.546875, 0.328125), (0.640625, 0.203125, 0.703125, 0.296875), (0.296875, 0.296875, 0.390625, 0.390625), (0.171875, 0.578125, 0.265625, 0.671875), (0.265625, 0.421875, 0.359375, 0.484375), (0.453125, 0.453125, 0.515625, 0.515625)]), ('a smoke trail', (75, 88), [(0.171875, 0.234375, 0.984375, 0.984375)])]
