lucataco/minicpm-v-2
OpenBMB MiniCPM-V 2.8B is a strong multimodal large language model for efficient end-side deployment
lucataco/nous-hermes-2-mixtral-8x7b-dpo
Nous Hermes 2 Mixtral 8x7B DPO is a Nous Research model trained over the Mixtral 8x7B MoE LLM
lucataco/sdxs-512-0.9
sdxs-512-0.9 can generate high-resolution images in real-time based on prompt texts, trained using score distillation and feature matching
lucataco/mvsep-mdx23-music-separation
Model for Sound demixing challenge 2023: Music Demixing Track - MDX'23
lucataco/moondream2
moondream2 is a small vision language model designed to run efficiently on edge devices
lucataco/deepseek-vl-7b-base
DeepSeek-VL: An open-source Vision-Language Model designed for real-world vision and language understanding applications
lucataco/rembg-video
Remove video background
lucataco/clip-vit-base-patch32
openai/clip-vit-base-patch32
lucataco/sdxl-inpainting
SDXL Inpainting developed by the HF Diffusers team
lucataco/whisperspeech-small
An Open Source text-to-speech system built by inverting Whisper
lucataco/zeta-editing
Zero-Shot Text-Based Audio Editing Using DDPM Inversion
lucataco/differential-diffusion
Modify an image with a prompt and a depth image
lucataco/juggernaut-xl-v9
Juggernaut XL v9
lucataco/sdxl-lightning-multi-controlnet
SDXL lightning multi-controlnet, img2img & inpainting
lucataco/dreamshaper-xl-lightning
dreamshaper-xl-lightning is a Stable Diffusion model that has been fine-tuned on SDXL
lucataco/proteus-v0.4
ProteusV0.4: The Style Update
lucataco/animate-diff-vid2vid
AnimateDiff video to video
lucataco/depth-anything-video-sbs
POC implementation of Depth-anything to produce a 3D SBS video
lucataco/proteus-v0.4-lightning
ProteusV0.4: The Style Update - enhances stylistic capabilities, similar to Midjourney's approach, rather than advancing prompt comprehension
lucataco/rgb2grayscale-cuda
POC CUDA implementation of an rgb2grayscale function
lucataco/deep3d
Deep3D: Real-Time end-to-end 2D-to-3D Video Conversion, based on deep learning
lucataco/proteus-v0.3
ProteusV0.3: The Anime Update
lucataco/glpn-nyu
Global-Local Path Networks (GLPN) model trained on NYUv2 for Monocular Depth Estimation
lucataco/nomic-embed-text-v1
nomic-embed-text-v1 is an 8192-context-length text encoder that surpasses OpenAI text-embedding-ada-002 and text-embedding-3-small performance on short- and long-context tasks
lucataco/depth-anything-video
Depth Anything on full video files
lucataco/phixtral-2x2_8
phixtral-2x2_8 is the first Mixture of Experts (MoE) made with two microsoft/phi-2 models, inspired by the mistralai/Mixtral-8x7B-v0.1 architecture
lucataco/bge-m3
BGE-M3 is the first embedding model that supports multiple retrieval modes, multilingual retrieval, and multi-granularity retrieval
lucataco/qwen1.5-72b
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data
lucataco/qwen1.5-14b
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data
lucataco/qwen1.5-7b
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data
lucataco/qwen1.5-4b
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data
lucataco/qwen1.5-1.8b
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data
lucataco/qwen1.5-0.5b
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data
lucataco/olmo-7b
OLMo is a series of Open Language Models designed to enable the science of language models
lucataco/rave
RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
lucataco/diffusionlight
DiffusionLight: Light Probes by Painting a Chrome Ball
lucataco/phi-2
Phi-2 by Microsoft
lucataco/img-and-audio2video
Take an image and an audio file and create a video clip
lucataco/watermark_detector
amrul-hzz's fine-tuned version of vit-base-patch16-224-in21k for watermark image detection
lucataco/moondream1
(Research only) Moondream1 is a vision language model that performs on par with models twice its size
lucataco/proteus-v0.2
Proteus v0.2 shows subtle yet significant improvements over Version 0.1. It demonstrates enhanced prompt understanding that surpasses MJ6, while also approaching its stylistic capabilities.
lucataco/siglip
SigLIP proposes to replace the loss function used in CLIP by a simple pairwise sigmoid loss
lucataco/wizardcoder-33b-v1.1-gguf
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
lucataco/magnet
MAGNeT: Masked Audio Generation using a Single Non-Autoregressive Transformer
lucataco/proteus-v0.1
ProteusV0.1 uses OpenDalleV1.1 as a base and further refines prompt adherence and stylistic capabilities to a measurable degree
lucataco/pheme
Pheme generates a variety of conversational voices in 16 kHz for phone-call applications
lucataco/pasd-magnify
(Academic and Non-commercial use only) Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization
lucataco/sdxl-deepcache
SDXL using DeepCache
lucataco/tinyllama-1.1b-chat-v1.0
This is the chat model finetuned on top of TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
lucataco/open-dalle-v1.1
A unique fusion that showcases exceptional prompt adherence and semantic understanding; it seems to be a step above base SDXL and a step closer to DALLE-3 in terms of prompt comprehension
lucataco/diffusion-motion-transfer
Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer
lucataco/singing_voice_conversion
Amphion Singing Voice Conversion: DiffWaveNetSVC
lucataco/ip-adapter-faceid
(Research only) IP-Adapter-FaceID can generate various style images conditioned on a face with only text prompts
lucataco/dreamshaper-xl-turbo
DreamShaper is a general-purpose SD model that aims at doing everything well: photos, art, anime, manga. It's designed to match Midjourney and DALL-E.
lucataco/dpo-sdxl
Direct Preference Optimization (DPO) is a method to align diffusion models to human preferences by directly optimizing on human comparison data
lucataco/seamless_communication
FacebookResearch/SeamlessM4T v2 - Massively Multilingual & Multimodal Machine Translation
lucataco/stable-diffusion-x4-upscaler
Stable Diffusion x4 upscaler model
lucataco/resemble-enhance
AI-driven audio enhancement for your audio files, powered by Resemble AI
lucataco/segmind-vega
Segmind-Vega Model is a distilled version of SDXL, offering a 70% reduction in size and a 100% speedup
lucataco/style-aligned
GoogleAI: Style Aligned Image Generation via Shared Attention
lucataco/sdxl-img-blend
SDXL Image Blending
lucataco/demofusion-enhance
Image to Image enhancer using DemoFusion
lucataco/vid2openpose
Video to OpenPose
lucataco/magic-animate-openpose
MagicAnimate using an OpenPose input video
lucataco/playground-v2
Playground v2 is a diffusion-based text-to-image generative model trained from scratch
lucataco/vid2densepose
Convert your videos to DensePose and use it with MagicAnimate
lucataco/magic-animate
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
lucataco/cross-image-attention
Given two images depicting a source structure and a target appearance, generate an image merging the structure of one image with the appearance of the other
lucataco/pixart-xl-2
PixArt-Alpha 1024px is a transformer-based text-to-image diffusion system trained on text embeddings from T5
lucataco/pixart-lcm-xl-2
PixArt-Alpha LCM is a transformer-based text-to-image diffusion system trained on text embeddings from T5
lucataco/demofusion
DemoFusion: Democratising High-Resolution Image Generation With No 💰
lucataco/interpany-clearer
InterpAny-Clearer: Clearer anytime frame interpolation & Manipulated interpolation
lucataco/xtts-v2
Coqui XTTS-v2: Multilingual Text To Speech Voice Cloning
lucataco/controlnet-tile
Controlnet v1.1 - Tile Version
lucataco/real-esrgan-video
Real-ESRGAN Video Upscaler
lucataco/seine
Image-to-video - SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
lucataco/nsfw_image_detection
Falcons.ai Fine-Tuned Vision Transformer (ViT) for NSFW Image Classification
lucataco/animate-diff-sdxl-lcm
Animate Your Personalized Text-to-Image Diffusion Models with SDXL and LCM
lucataco/vseq2vseq
Text to video diffusion model with variable length frame conditioning for infinite length video
lucataco/dreamshaper7-img2img-lcm
Dreamshaper-7 img2img with LCM LoRA for faster inference
lucataco/realvisxl2-lcm
RealvisXL-v2.0 with LCM LoRA - requires fewer steps (4 to 8 instead of the original 40 to 50)
lucataco/modelscope-facefusion
Auto fuse a user's face onto the template image, with a similar appearance to the user
lucataco/ip_adapter-face-inpaint
A combination of ip_adapter SDv1.5 and mediapipe-face to inpaint a face
lucataco/sdxl-niji-se
SDXL_Niji_Special Edition
lucataco/sdxl-lcm-zeke
A fine-tuned SDXL-LCM LoRA based on the photos of Zeke
lucataco/sdxl-lcm
Latent Consistency Model (LCM) for SDXL: distills the original model into a version that requires fewer steps (4 to 8 instead of the original 25 to 50)
lucataco/ip_adapter-sdxl-face
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate SDXL images with an image prompt
lucataco/sdxl-lcm-loras
POC of SDXL-LCM LoRA combined with Replicate LoRA for 4 second inference times
lucataco/lcm-ssd-1b
Latent Consistency Model (LCM): SSD-1B is an LCM-distilled version that reduces the number of inference steps needed to only 2 to 8
lucataco/ip_adapter-face
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate SDv1.5 images with an image prompt
lucataco/realvisxl-v2.0
Implementation of SDXL RealVisXL_V2.0
lucataco/realvisxl2-lora-inference
POC to run inference on Realvisxl2 LoRAs
lucataco/realvisxl2-lora-training
POC to train Realvisxl2 LoRAs
lucataco/ssd-1b
Segmind Stable Diffusion Model (SSD-1B) is a distilled 50% smaller version of SDXL, offering a 60% speedup while maintaining high-quality text-to-image generation capabilities
lucataco/ssd-lora-inference
POC to run inference on SSD-1B LoRAs
lucataco/ssd-lora-training
POC to train SSD-1B LoRAs for cheaper & faster training
lucataco/ssd-1b-txt2img_batch
Batch mode for Segmind Stable Diffusion Model (SSD-1B) txt2img
lucataco/realvisxl-v2-img2img
Implementation of SDXL RealVisXL_V2.0 img2img
lucataco/thinkdiffusionxl
ThinkDiffusionXL is a go-to model capable of amazing photorealism that's also versatile enough to generate high-quality images across a variety of styles and subjects without needing to be a prompting genius
lucataco/kosmos-2
Grounding Multimodal Large Language Models to the World
lucataco/ssd-1b-img2img
Segmind Stable Diffusion Model (SSD-1B) img2img
lucataco/sdxl
SDXL v1.0 - A text-to-image generative AI model that creates beautiful images
lucataco/realvisxl-v1-img2img
Implementation of SDXL RealVisXL_V1.0 img2img
lucataco/dolphin-2.2.1-mistral-7b
Mistral-7B-v0.1 fine-tuned for chat with the Dolphin dataset (an open-source implementation of Microsoft's Orca)
lucataco/dolphin-2.1-mistral-7b
Mistral-7B-v0.1 fine-tuned for chat with the Dolphin dataset (an open-source implementation of Microsoft's Orca)
lucataco/mistrallite
MistralLite is a fine-tuned Mistral-7B-v0.1 language model with enhanced capabilities for processing long context (up to 32K tokens)
lucataco/bakllava
BakLLaVA-1 is a Mistral 7B base augmented with the LLaVA 1.5 architecture
lucataco/hotshot-xl
😊 Hotshot-XL is an AI text-to-GIF model trained to work alongside Stable Diffusion XL
lucataco/fuyu-8b
Fuyu-8B is a multi-modal text and image transformer trained by Adept AI
lucataco/video-crafter
Open diffusion model for high-quality video generation
lucataco/qwen-vl-chat
A multimodal LLM-based AI assistant, which is trained with alignment techniques. Qwen-VL-Chat supports more flexible interaction, such as multi-round question answering, and creative capabilities.
lucataco/comfyui-sdxl-txt2img
Using a ComfyUI workflow to run SDXL text2img
lucataco/sadtalker
Stylized Audio-Driven Single Image Talking Face Animation
lucataco/sdxl-controlnet
SDXL ControlNet - Canny
lucataco/mistral-7b-v0.1
Mistral-7B-v0.1 is a pretrained generative text model that outperforms Llama 2 13B on all benchmarks
lucataco/animate-diff
Animate Your Personalized Text-to-Image Diffusion Models
lucataco/illusion-diffusion-hq
Monster Labs QrCode ControlNet on top of SD Realistic Vision v5.1
lucataco/remove-bg
Remove background from an image
lucataco/realvisxl-v1.0
Implementation of SDXL RealVisXL_V1.0
lucataco/sdxl-controlnet-depth
SDXL ControlNet - Depth
lucataco/clip-interrogator
CLIP Interrogator (for faster inference)
lucataco/sdxl-panoramic
360 Panorama SDXL image with inpainted wrapping seam
lucataco/codeformer
Robust face restoration algorithm for old photos/AI-generated faces - (A40 GPU)
lucataco/blueprint
An SDXL fine-tune based on blueprints
lucataco/ms-img2vid
Turn any image into a video
lucataco/wizardcoder-python-34b-v1.0
Empowering Code Large Language Models with Evol-Instruct
lucataco/realistic-vision-v5-openpose
Realistic Vision V5 with OpenPose
lucataco/spider-gwen-style
SDXL fine-tune on Spider-Gwen style
lucataco/realistic-vision-v5
Realistic Vision v5.0 with VAE
lucataco/sdxl-controlnet-openpose
SDXL ControlNet - OpenPose
lucataco/realistic-vision-v5-inpainting
Realistic Vision v5.0 Inpainting
lucataco/realistic-vision-v5-img2img
Realistic Vision v5.0 Image 2 Image
lucataco/realistic-vision-v5.1
Implementation of Realistic Vision v5.1 with VAE
lucataco/sdxl-clip-interrogator
CLIP Interrogator for SDXL optimizes text prompts to match a given image
lucataco/upstage-llama-2-70b-instruct-v2
Upstage/Llama-2-70B-instruct-v2 - GPTQ
lucataco/glaive-function-calling-v1
2.7B param open source chat model trained on Glaive’s synthetic data generation platform
lucataco/gfpgan
Practical face restoration algorithm for *old photos* or *AI-generated faces* (for larger images)
lucataco/freewilly2
Stability AI's FreeWilly2
lucataco/llama-2-70b-chat
Meta's Llama 2 70b Chat - GPTQ
lucataco/llama-2-13b-chat
Meta's Llama 2 13b Chat - GPTQ
lucataco/llama-2-7b-chat
Meta's Llama 2 7b Chat - GPTQ
lucataco/speaker-diarization
Segments an audio recording based on who is speaking (on A100)
lucataco/rivers-stable-diffusion-upscaler
RiversHaveWings Stable Diffusion Upscaler
lucataco/real-esrgan
Real-ESRGAN with optional face correction and adjustable upscale (for larger images)
lucataco/wsrglow
A working wsrglow model
lucataco/stable-diffusion-image-variation
Image Variations with Stable Diffusion
lucataco/realistic-vision-v4.0
Realistic Vision V4.0
lucataco/realistic-vision-v3.0
Realistic Vision V3.0 with VAE
lucataco/instruct-glaive
sahil2801/replit-code-instruct-glaive
lucataco/xgen-7b-8k-base
Salesforce/xgen-7b-8k-base
lucataco/vicuna-13b-v1.3
lmsys/vicuna-13b-v1.3
lucataco/vicuna-7b-v1.3
lmsys/vicuna-7b-v1.3
lucataco/codegen2-1b
Salesforce/codegen2-1B
lucataco/tiny-starcoder-py
bigcode/tiny_starcoder_py
lucataco/wizardcoder-15b-v1.0
WizardLM/WizardCoder-15B-V1.0
lucataco/shiba-diffusion
Shiba stable diffusion model
lucataco/idefics-80b
IDEFICS 80b Quantized
lucataco/wizardcoder-15b-v1
WizardLM/WizardCoder-15B-V1.0 in 4bit
lucataco/replit-code-v1-3b
replit/replit-code-v1-3b
lucataco/idefics-9b
IDEFICS 9b Quantized
lucataco/mpt-30b-chat
mosaicml/mpt-30b-chat in 8bit
lucataco/vicuna-33b-v1.3
lmsys/vicuna-33b-v1.3
lucataco/phi-1.5
microsoft/phi-1.5 was trained using the same data sources as phi-1, augmented with a new data source that consists of various NLP synthetic texts
lucataco/instant-id-lcm
InstantID with LCM