AuraSR v2: Second-gen GAN-based Super-Resolution for real-world applications
Qwen 2: A 0.5-billion-parameter language model from Alibaba Cloud, fine-tuned for chat completions
Generates realistic talking face animations from a portrait image and audio using the CVPR 2025 Sonic model
STAR Video Upscaler: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
🎙️Hololive text-to-speech and voice-to-voice (Japanese🇯🇵 + English🇬🇧)
Unofficial Re-Trained AnimateAnyone (Image + DWPose Video → Animated Video of Image)
AuraSR: GAN-based Super-Resolution for real-world applications
Idefics3-8B-Llama3: Answers questions about images and generates captions
🎨AnimateDiff Prompt Travel🧭 Seamlessly Navigate and Animate Between Text-to-Image Prompts for Dynamic Visual Narratives
Audio-driven multi-person conversational video generation - Upload audio files and a reference image to create realistic conversations between multiple people
Create song covers with any RVC v2 trained AI voice from audio files.
allenai/Molmo-7B-D-0924: Answers questions about images and generates captions
🐲 DragGAN 🐉 - "Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold"
Jina-CLIP v2: 0.9B multimodal embedding model with 89-language multilingual support, 512x512 image resolution, and Matryoshka representations
Merge multiple images into clean horizontal or vertical strips with precise alignment and sizing controls.
Image tagger fine-tuned on WaifuDiffusion with SwinV2, ConvNext, and ViT variants
FILM: Frame Interpolation for Large Motion (ECCV 2022)
🎨 AnimateDiff (w/ MotionLoRAs for Panning, Zooming, etc): Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
Dkamacho’s Scene Assembler
🕹️FramePack: video diffusion that feels like image diffusion🎥