xiankgx/panda-70m-video-captioning

We propose a video captioning model that generates a caption for a short video clip. The model has a vision branch (green) and a text branch (blue), so captioning can draw on both video and text inputs. We release the checkpoint trained on Panda-70M.

Model created over 1 year ago