Apollo 3B - An Exploration of Video Understanding in Large Multimodal Models