zsxkib / jina-clip-v2
Jina-CLIP v2: 0.9B multimodal embedding model with 89-language multilingual support, 512x512 image resolution, and Matryoshka representations
zsxkib / samurai
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
zsxkib / allegro
Powerful text-to-video model that generates high-quality videos up to 6 seconds long at 15 FPS and 720p resolution from a simple text prompt
zsxkib / prototype-model
A test model
zsxkib / pyramid-flow
Text-to-Video + Image-to-Video: Pyramid Flow Autoregressive Video Generation method based on Flow Matching
zsxkib / flux-abstract-beings
Surrealist digital art featuring whimsical, anthropomorphic characters with exaggerated textures and vibrant color blocking
zsxkib / flux-caricature
zsxkib / molmo-7b
allenai/Molmo-7B-D-0924: answers questions about images and captions them
zsxkib / flux-music
🎼FluxMusic Text-to-Music Generation with Rectified Flow Transformer🎶
zsxkib / flux-dev-inpainting-controlnet
FLUX.1-dev Inpainting ControlNet model
zsxkib / flux-pulid
⚡️FLUX PuLID: FLUX-dev based Pure and Lightning ID Customization via Contrastive Alignment🎭
zsxkib / flux-dev-inpainting
🎨 Fill in masked parts of images with FLUX.1-dev 🖌️
zsxkib / instant-id-basic
Cubiq's ComfyUI InstantID node running `instantid_basic.json` example
zsxkib / flux-schnell-inpainting
🎨 Fill in masked parts of images with FLUX.1-schnell 🖌️
zsxkib / idefics3
Idefics3-8B-Llama3: answers questions about images and captions them
zsxkib / aura-sr-v2
AuraSR v2: Second-gen GAN-based Super-Resolution for real-world applications
zsxkib / mimic-motion
MimicMotion: High-quality human motion video generation with pose-guided control
zsxkib / instant-id-ipadapter-plus-face
Make realistic images of real people instantly (w/ ip-adapter-plus-face_sdxl_vit-h)
zsxkib / whisper-lazyloading
Convert speech in audio to text w/ `tiny`, `small`, `base`, and `large-v3` models
zsxkib / aura-sr
AuraSR: GAN-based Super-Resolution for real-world applications
zsxkib / qwen2-7b-instruct
Qwen 2: A 7-billion-parameter language model from Alibaba Cloud, fine-tuned for chat completions
zsxkib / qwen2-1.5b-instruct
Qwen 2: A 1.5-billion-parameter language model from Alibaba Cloud, fine-tuned for chat completions
zsxkib / qwen2-0.5b-instruct
Qwen 2: A 0.5-billion-parameter language model from Alibaba Cloud, fine-tuned for chat completions
zsxkib / llm-prototype-model
zsxkib / sd3-controlnet
✨Stable Diffusion 3 w/ ⚡InstantX's Canny, Pose, and Tile ControlNets🖼️
zsxkib / v-express
🫦 Realistic facial expression manipulation (lip-syncing) using audio or video
zsxkib / hololive-style-bert-vits2
🎙️Hololive text-to-speech and voice-to-voice (Japanese🇯🇵 + English🇬🇧)
zsxkib / instant-id
Make realistic images of real people instantly
zsxkib / wd-image-tagger
Image tagger fine-tuned on WaifuDiffusion w/ (SwinV2, ConvNext, and ViT)
zsxkib / ic-light
✍️✨Prompts to auto-magically relight your images
zsxkib / ic-light-background
🖼️✨Background images + prompts to auto-magically relight your images (+normal maps🗺️)
zsxkib / pulid
📖 PuLID: Pure and Lightning ID Customization via Contrastive Alignment
zsxkib / blip-3
Blip 3 / XGen-MM, Answers questions about images ({blip3,xgen-mm}-phi3-mini-base-r-v1)
zsxkib / talknet-asd
🗣️ TalkNet-ASD: Detect who is speaking in a video
zsxkib / flash-face
FlashFace: Human Image Personalization with High-fidelity Identity Preservation
zsxkib / animate-diff-scene-assembler
Dkamacho’s Scene Assembler
zsxkib / yolo-world
Real-Time Open-Vocabulary Object Detection
zsxkib / uform-gen
🖼️ Super fast 1.5B Image Captioning/VQA Multimodal LLM (Image-to-Text) 🖋️
zsxkib / moore-animateanyone
Unofficial Re-Trained AnimateAnyone (Image + DWPose Video → Animated Video of Image)
zsxkib / patch-fusion
Super High Quality Depth Maps 🗺️: An End-to-End Tile-Based Framework 🏗️ for High-Resolution Monocular Metric Depth Estimation 🔍📏
zsxkib / tortoise-then-rvc
zsxkib / create-rvc-dataset
Create your own Realistic Voice Cloning (RVC v2) dataset using a YouTube link
zsxkib / realistic-voice-cloning
Create song covers with any RVC v2 trained AI voice from audio files.
zsxkib / stable-diffusion-safety-checker
Identifies NSFW images
zsxkib / animatediff-illusions
Monster Labs' ControlNet QR Code Monster v2 for SD-1.5 on top of AnimateDiff Prompt Travel (Motion Module SD 1.5 v2)
zsxkib / film-frame-interpolation-for-large-motion
FILM: Frame Interpolation for Large Motion (ECCV 2022)
zsxkib / prototype-model2
zsxkib / animatediff-prompt-travel
🎨AnimateDiff Prompt Travel🧭 Seamlessly Navigate and Animate Between Text-to-Image Prompts for Dynamic Visual Narratives
zsxkib / diffbir
✨DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior
zsxkib / st-mfnet
📽️ Increase Framerate 🎬 ST-MFNet: A Spatio-Temporal Multi-Flow Network for Frame Interpolation
zsxkib / animate-diff
🎨 AnimateDiff (w/ MotionLoRAs for Panning, Zooming, etc): Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
zsxkib / draggan
🐲 DragGAN 🐉 - "Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold"
zsxkib / lil-flan-bias-logits-warper
Logit Warping via Biases for Google's FLAN-T5-small
zsxkib / clip-age-predictor
Age prediction using CLIP - Patched version of `https://replicate.com/andreasjansson/clip-age-predictor` that works with the new version of cog!
zsxkib / emotion2color
Transform your text into a beautiful two-tone color gradient that represents your emotions.
zsxkib / hello-world
A "Hello World" model for me to get to grips with `cog` and Replicate
zsxkib / open-sora
zsxkib / illuminati-diffusion
🧿 Illuminati Diffusion w/ Textual Inversion Embeddings 🧬
zsxkib / qwen2-57b-a14b-instruct
zsxkib / aya-101
📚 Aya, an LLM by Cohere capable of understanding and generating content in 101 languages 🗣️
zsxkib / qwen2-0.5b-instruct-gptq-int8
zsxkib / qwen2-72b-instruct
zsxkib / test
zsxkib / animate-diff-prompt-walking
zsxkib / flux-tapoz-rf-inversion
zsxkib / trocr-base-handwritten
🖋️➡️📱Converts handwritten text images into digital text