Pheme generates a variety of conversational voices at 16 kHz for phone-call applications
A Step Towards Music Generation Foundation Model (text2music)
Ostris AI-Toolkit for Flux LoRA Training (DEPRECATED. Please use: ostris/flux-dev-lora-trainer)
Animate Your Personalized Text-to-Image Diffusion Models
Animate Your Personalized Text-to-Image Diffusion Models with SDXL and LCM
AnimateDiff video-to-video
Apollo 3B - An Exploration of Video Understanding in Large Multimodal Models
Apollo 7B - An Exploration of Video Understanding in Large Multimodal Models
A fully open-sourced, large flow-based text-to-image generation model
BakLLaVA-1 is a Mistral 7B base augmented with the LLaVA 1.5 architecture
BGE-M3, the first embedding model that supports multiple retrieval modes, multilingual retrieval, and multi-granularity retrieval.
BLIP3 (XGen-MM) is a series of foundational Large Multimodal Models (LMMs) developed by Salesforce AI Research
An SDXL fine-tune based on blueprints
Video preprocessing tool for captioning multiple videos using GPT, Claude, or Gemini
CLIP Interrogator (for faster inference)
openai/clip-vit-large-patch32
Robust face restoration algorithm for old photos and AI-generated faces
Salesforce/codegen2-1B
CogVideoX Keyframe Interpolation by Zhengcong Fei
CogView-4, a 6B-parameter model that supports native Chinese input and Chinese text-to-image generation.
Using a ComfyUI workflow to run SDXL text2img