bytedance / sa2va-26b-image

Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

  • Public
  • 1.1K runs
  • A100 (80GB)
  • GitHub
  • Weights
  • Paper
  • License
  • Prediction

    bytedance/sa2va-26b-image:addd35cc
    ID
    twn4165kb9rj60cn5gg86kwycw
    Status
    Succeeded
    Source
    Web
    Hardware
    A100 (80GB)
    Total duration
    Created

    Input

    image
    image
    instruction
    please segment the woman dancing in a blue dress

    Output

    img

    img

    response

    Sure, [SEG] .<|im_end|>
    Generated in
  • Prediction

    bytedance/sa2va-26b-image:addd35cc
    ID
    vrc0ya2vcnrj20cn5ggse1dw28
    Status
    Succeeded
    Source
    Web
    Hardware
    A100 (80GB)
    Total duration
    Created

    Input

    image
    image
    instruction
    describe the image

    Output

    The image depicts a woman in a blue dress dancing in the middle of a street. She is surrounded by a crowd of people who are watching her performance. There are also musicians playing guitars in the background, adding to the lively atmosphere. The scene appears to be a street performance or a cultural event, with the woman as the main focus.<|im_end|>
    Generated in
  • Prediction

    bytedance/sa2va-26b-image:addd35cc
    ID
    nszcy6gtknrj00cn5ghbj6v5n8
    Status
    Succeeded
    Source
    Web
    Hardware
    A100 (80GB)
    Total duration
    Created

    Input

    image
    image
    instruction
    caption this image

    Output

    A snowboarder in a white jacket and black pants is performing a jump on a snowy mountain slope, with a stunning mountain range in the background.<|im_end|>
    Generated in
  • Prediction

    bytedance/sa2va-26b-image:addd35cc
    ID
    eqz94j2d8drj20cn5gh95p53p4
    Status
    Succeeded
    Source
    Web
    Hardware
    A100 (80GB)
    Total duration
    Created

    Input

    image
    image
    instruction
    segment the snowboarder

    Output

    img

    img

    response

    Sure, [SEG] .<|im_end|>
    Generated in
  • Prediction

    bytedance/sa2va-26b-image:addd35cc
    ID
    525scwzcz1rj60cn5gh9j3c6gw
    Status
    Succeeded
    Source
    Web
    Hardware
    A100 (80GB)
    Total duration
    Created

    Input

    image
    image
    instruction
    what is the giraffe doing?

    Output

    The giraffe is standing under a tree, reaching up to eat leaves from the branches.<|im_end|>
    Generated in
  • Prediction

    bytedance/sa2va-26b-image:addd35cc
    ID
    tj9fjars4srj40cn5ghrgfpsp8
    Status
    Succeeded
    Source
    Web
    Hardware
    A100 (80GB)
    Total duration
    Created

    Input

    image
    image
    instruction
    segment the giraffe

    Output

    img

    img

    response

    Sure, [SEG] .<|im_end|>
    Generated in
  • Prediction

    bytedance/sa2va-26b-image:addd35cc
    ID
    9k3n2mcxq1rj00cn5ghrh547z8
    Status
    Succeeded
    Source
    Web
    Hardware
    A100 (80GB)
    Total duration
    Created

    Input

    image
    image
    instruction
    segment the dinosaur to the left of the sunglasses

    Output

    img

    img

    response

    Sure, [SEG] .<|im_end|>
    Generated in
  • Prediction

    bytedance/sa2va-26b-image:addd35cc
    ID
    3zrgeppa69rj00cn5ghszr28aw
    Status
    Succeeded
    Source
    Web
    Hardware
    A100 (80GB)
    Total duration
    Created

    Input

    image
    image
    instruction
    how many feathers are there?

    Output

    There are two feathers in the image.<|im_end|>
    Generated in

Want to make some of these yourself?

Run this model