xiankgx / panda-70m-video-captioning

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Input

Video path (required). This input type is only available via the API.

prompt (string)

Default: "Please faithfully summarize the following video in one sentence."
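
Because the video input can only be supplied through the API, a call from Python might look roughly like the sketch below. It assumes the official `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set. The `video` field name is a guess (this page does not show the field's actual name), and `model_ref` is left as a parameter because the model version identifier is not listed here. Only the `prompt` field and its default come from the input description above.

```python
from typing import Any, Dict

# Default prompt, quoted from the input description on this page.
DEFAULT_PROMPT = "Please faithfully summarize the following video in one sentence."


def build_input(video: Any, prompt: str = DEFAULT_PROMPT,
                video_key: str = "video") -> Dict[str, Any]:
    """Assemble the prediction input.

    `video_key` is hypothetical: the page does not name the video field,
    so change it to match the model's actual schema.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {video_key: video, "prompt": prompt}


def caption(model_ref: str, video_path: str,
            prompt: str = DEFAULT_PROMPT) -> Any:
    """Run one prediction through the Replicate Python client."""
    # Imported lazily so build_input works without the client installed.
    import replicate

    with open(video_path, "rb") as fh:
        return replicate.run(model_ref, input=build_input(fh, prompt))
```

Passing an open file handle lets the client upload a local clip. `model_ref` would be the model name shown at the top of this page, possibly with a version suffix.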

Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

We propose a video captioning model that generates a caption for a short video clip. The model has a vision branch (green) and a text branch (blue), so that captioning can draw on both video and text inputs. We release the checkpoint trained on Panda-70M.