nateraw/video-llava
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection