lucataco / llama-3-vision-alpha
Projection module trained to add vision capabilities to Llama 3 using SigLIP
lucataco / stable-diffusion-3.5-large-lora-trainer
Fine-tune StableDiffusion3.5-Large with Hugging Face Diffusers
lucataco / sd3.5-fine-tuner
Ostris AI-Toolkit for StableDiffusion3.5-Large LoRA Training
lucataco / stable-diffusion-3.5-large-lora
Stable Diffusion 3.5 Large - LoRA Explorer
lucataco / cogvideox-interpolation
CogVideoX Keyframe Interpolation by Zhengcong Fei
lucataco / ollama-nemotron-70b
Ollama Nemotron 70b
lucataco / flux.1-turbo-alpha
8-step distilled LoRA for the FLUX.1-dev model, released by the Alimama-Creative Team
lucataco / diffusers-dreambooth-lora-x2
FLUX.1-Dev LoRA Training (with 2x GPUs) by Hugging Face Diffusers
lucataco / diffusers-dreambooth-lora
FLUX.1-Dev LoRA Training by Hugging Face Diffusers
lucataco / flux-dev-multi-lora
FLUX.1-Dev Multi LoRA Explorer
lucataco / flux-dev-lora
FLUX.1-Dev LoRA Explorer
lucataco / flux-rplctcpl
Flux LoRA Training Experiment - Training two people in one LoRA with two images
lucataco / flux-vlta-layer
Flux finetune of Violeta - specific layer training
lucataco / flux.1-controlnet-lineart-promeai
ControlNet trained on black-forest-labs/FLUX.1-dev with lineart condition
lucataco / joy-caption-pre-alpha
Image Caption model
lucataco / nsfw_video_detection
FalconAI's NSFW detection model, extended for videos
lucataco / ollama-reflection-70b
Ollama Reflection 70b
lucataco / flux-time100
Flux finetune of the style: TIME's 100 Most Influential People in AI
lucataco / flux-schnell-lora
FLUX.1-Schnell LoRA Explorer
lucataco / flux-vlta
A Flux finetune of an AI character named: Violeta
lucataco / controlnet-union-pro
ControlNet for FLUX.1-dev model jointly released by InstantX and Shakker Labs
lucataco / flux-syd-mead
Flux finetune trained on Syd Mead concept art for Blade Runner
lucataco / ai-toolkit
Ostris AI-Toolkit for Flux LoRA Training (Proof of Concept). Please use the official trainer at: ostris/flux-dev-lora-trainer
lucataco / flux-queso
A Flux LoRA trained on photos of Jake's dog: Queso
lucataco / flux-watercolor
A Flux LoRA trained on watercolor style photos
lucataco / simpletuner-flux
FLUX.1-Dev LoRA trainer via SimpleTuner (Work in Progress)
lucataco / dis-background-removal
ECCV2022 Quick background removal
lucataco / segment-anything-2
Segment Anything 2 (SAM2) by Meta - Automatic mask generation
lucataco / moondream2
moondream2 is a small vision language model designed to run efficiently on edge devices
lucataco / train-text-to-image-lora
Hugging Face Diffusers: SDv1.4/1.5/2.0/2.1 finetuner
lucataco / aura-flow-v0.2
A fully open-sourced, large flow-based text-to-image generation model
lucataco / prompt-guard-86m
LLM-powered applications are susceptible to prompt attacks, which are prompts intentionally designed to subvert the developer’s intended behavior of the LLM
lucataco / pixart-sigma-900m
PixArt Sigma 900M is a text-to-image generation model based on the PixArt Sigma architecture
lucataco / numinamath-7b-tir
NuminaMath is a series of language models that are trained to solve math problems using tool-integrated reasoning (TIR)
lucataco / ollama-deepseek-coder-v2-236b
Cog wrapper for Ollama deepseek-coder-v2:236b
lucataco / ollama-llama3-70b
Cog wrapper for Ollama llama3:70b
lucataco / ollama-llama3-8b
Cog wrapper for Ollama llama3:8b
lucataco / internlm2_5-7b-chat
InternLM2.5 has open-sourced a 7 billion parameter base model and a chat model tailored for practical scenarios.
lucataco / qwen2-57b-a14b-instruct
Qwen2 57 billion parameter language model from Alibaba Cloud, fine tuned for chat completions
lucataco / dolphin-2.9-llama3-8b
Dolphin-2.9 has a variety of instruction, conversational, and coding skills. It also has initial agentic abilities and supports function calling
lucataco / hermes-2-pro-llama-3-70b
Hermes 2 Pro is an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house
lucataco / hermes-2-theta-llama-3-8b
Hermes-2 Θ (Theta) is the first experimental merged model released by Nous Research
lucataco / hermes-2-pro-llama-3-8b
Hermes 2 Pro is an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house
lucataco / hunyuandit-v1.1
A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
lucataco / florence-2-large
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
lucataco / florence-2-base
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
lucataco / sdxl-clip-interrogator
CLIP Interrogator for SDXL optimizes text prompts to match a given image
lucataco / paligemma-3b-pt-224
PaliGemma 3B, an open VLM by Google, pre-trained with 224*224 input images and 128 token input/output text sequences
lucataco / yi-1.5-6b
Yi-1.5 is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples
lucataco / blip3-phi3-mini-instruct-r-v1
BLIP3(XGen-MM) is a series of foundational Large Multimodal Models (LMMs) developed by Salesforce AI Research
lucataco / llava-phi-3-mini
llava-phi-3-mini is a LLaVA model fine-tuned from microsoft/Phi-3-mini-4k-instruct
lucataco / qwen1.5-110b
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data
lucataco / idefics-8b
Idefics2 is an open multimodal model that accepts arbitrary sequences of image and text inputs and produces text outputs
lucataco / snowflake-arctic-embed-l
snowflake-arctic-embed is a suite of text embedding models that focuses on creating high-quality retrieval models optimized for performance
lucataco / sdxs-512-0.9
sdxs-512-0.9 can generate high-resolution images in real-time based on prompt texts, trained using score distillation and feature matching
lucataco / mvsep-mdx23-music-separation
Model for Sound demixing challenge 2023: Music Demixing Track - MDX'23
lucataco / rembg-video
Remove video background
lucataco / clip-vit-base-patch32
openai/clip-vit-base-patch32
lucataco / sdxl-inpainting
SDXL Inpainting developed by the HF Diffusers team
lucataco / zeta-editing
Zero-Shot Text-Based Audio Editing Using DDPM Inversion
lucataco / juggernaut-xl-v9
Juggernaut XL v9
lucataco / sdxl-lightning-multi-controlnet
SDXL lightning multi-controlnet, img2img & inpainting
lucataco / dreamshaper-xl-lightning
dreamshaper-xl-lightning is a Stable Diffusion model that has been fine-tuned on SDXL
lucataco / animate-diff-vid2vid
AnimateDiff video to video
lucataco / depth-anything-video-sbs
POC implementation of Depth-anything to produce a 3D SBS video
lucataco / rgb2grayscale-cuda
CUDA implementation of an rgb2grayscale function
lucataco / deep3d
Deep3D: Real-Time end-to-end 2D-to-3D Video Conversion, based on deep learning
lucataco / glpn-nyu
Global-Local Path Networks (GLPN) model trained on NYUv2 for Monocular Depth Estimation
lucataco / nomic-embed-text-v1
nomic-embed-text-v1 is an 8192 context length text encoder that surpasses OpenAI text-embedding-ada-002 and text-embedding-3-small performance on short and long context tasks
lucataco / depth-anything-video
Depth Anything on full video files
lucataco / phixtral-2x2_8
phixtral-2x2_8 is the first Mixture of Experts (MoE) made with two microsoft/phi-2 models, inspired by the mistralai/Mixtral-8x7B-v0.1 architecture
lucataco / bge-m3
BGE-M3, the first embedding model that supports multiple retrieval modes, multilingual retrieval, and multi-granularity retrieval
lucataco / qwen1.5-72b
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data
lucataco / qwen1.5-14b
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data
lucataco / qwen1.5-7b
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data
lucataco / qwen1.5-4b
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data
lucataco / qwen1.5-1.8b
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data
lucataco / qwen1.5-0.5b
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data
lucataco / diffusionlight
DiffusionLight: Light Probes by Painting a Chrome Ball
lucataco / phi-2
Phi-2 by Microsoft
lucataco / img-and-audio2video
Take an image and an audio file and create a video clip
lucataco / watermark_detector
amrul-hzz's fine-tuned version of vit-base-patch16-224-in21k for watermark image detection
lucataco / moondream1
(Research only) Moondream1 is a vision language model that performs on par with models twice its size
lucataco / siglip
SigLIP proposes to replace the loss function used in CLIP by a simple pairwise sigmoid loss
lucataco / wizardcoder-33b-v1.1-gguf
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
lucataco / whisperspeech-small
An Open Source text-to-speech system built by inverting Whisper
lucataco / magnet
MAGNeT: Masked Audio Generation using a Single Non-Autoregressive Transformer
lucataco / pheme
Pheme generates a variety of conversational voices in 16 kHz for phone-call applications
lucataco / pasd-magnify
(Academic and Non-commercial use only) Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization
lucataco / sdxl-deepcache
SDXL using DeepCache
lucataco / tinyllama-1.1b-chat-v1.0
This is the chat model finetuned on top of TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
lucataco / open-dalle-v1.1
A unique fusion that showcases exceptional prompt adherence and semantic understanding; it seems to be a step above base SDXL and a step closer to DALLE-3 in terms of prompt comprehension
lucataco / diffusion-motion-transfer
Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer
lucataco / singing_voice_conversion
Amphion Singing Voice Conversion: DiffWaveNetSVC
lucataco / ip-adapter-faceid
(Research only) IP-Adapter-FaceID can generate various style images conditioned on a face with only text prompts
lucataco / dreamshaper-xl-turbo
DreamShaper is a general purpose SD model that aims at doing everything well: photos, art, anime, manga. It's designed to match Midjourney and DALL-E.
lucataco / dpo-sdxl
Direct Preference Optimization (DPO) is a method to align text-to-image diffusion models to human preferences by directly optimizing on human comparison data
lucataco / seamless_communication
FacebookResearch/SeamlessM4T v2 - Massively Multilingual & Multimodal Machine Translation
lucataco / stable-diffusion-x4-upscaler
Stable Diffusion x4 upscaler model
lucataco / resemble-enhance
AI-driven audio enhancement for your audio files, powered by Resemble AI
lucataco / segmind-vega
Segmind-Vega Model is a distilled version of SDXL, offering a 70% reduction in size and a 100% speedup
lucataco / style-aligned
GoogleAI: Style Aligned Image Generation via Shared Attention
lucataco / sdxl-img-blend
SDXL Image Blending
lucataco / demofusion-enhance
Image to Image enhancer using DemoFusion
lucataco / vid2openpose
Video to OpenPose
lucataco / magic-animate-openpose
MagicAnimate using an OpenPose input video
lucataco / vid2densepose
Convert your videos to DensePose and use it with MagicAnimate
lucataco / magic-animate
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
lucataco / pixart-xl-2
PixArt-Alpha 1024px is a transformer-based text-to-image diffusion system trained on text embeddings from T5
lucataco / demofusion
DemoFusion: Democratising High-Resolution Image Generation With No 💰
lucataco / interpany-clearer
InterpAny-Clearer: Clearer anytime frame interpolation & Manipulated interpolation
lucataco / xtts-v2
Coqui XTTS-v2: Multilingual Text To Speech Voice Cloning
lucataco / controlnet-tile
ControlNet v1.1 - Tile Version
lucataco / real-esrgan-video
Real-ESRGAN Video Upscaler
lucataco / seine
Image-to-video - SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
lucataco / animate-diff-sdxl-lcm
Animate Your Personalized Text-to-Image Diffusion Models with SDXL and LCM
lucataco / vseq2vseq
Text to video diffusion model with variable length frame conditioning for infinite length video
lucataco / dreamshaper7-img2img-lcm
Dreamshaper-7 img2img with LCM LoRA for faster inference
lucataco / realvisxl2-lcm
RealvisXL-v2.0 with LCM LoRA - requires fewer steps (4 to 8 instead of the original 40 to 50)
lucataco / modelscope-facefusion
Auto fuse a user's face onto the template image, with a similar appearance to the user
lucataco / ip_adapter-face-inpaint
A combination of ip_adapter SDv1.5 and mediapipe-face to inpaint a face
lucataco / sdxl-niji-se
SDXL_Niji_Special Edition
lucataco / sdxl-lcm-zeke
A fine-tuned SDXL-LCM LoRA based on the photos of Zeke
lucataco / sdxl-lcm
Latent Consistency Model (LCM): SDXL, distills the original model into a version that requires fewer steps (4 to 8 instead of the original 25 to 50)
lucataco / ip_adapter-sdxl-face
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate SDXL images with an image prompt
lucataco / sdxl-lcm-loras
POC of SDXL-LCM LoRA combined with a Replicate LoRA for 4 second inference time
lucataco / lcm-ssd-1b
Latent Consistency Model (LCM): SSD-1B is an LCM-distilled version that reduces the number of inference steps needed to only 2 - 8 steps
lucataco / ip_adapter-face
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate SDv1.5 images with an image prompt
lucataco / realvisxl-v2.0
Implementation of SDXL RealVisXL_V2.0
lucataco / realvisxl2-lora-inference
POC to run inference on Realvisxl2 LoRAs
lucataco / realvisxl2-lora-training
POC to train Realvisxl2 LoRAs
lucataco / ssd-1b
Segmind Stable Diffusion Model (SSD-1B) is a distilled 50% smaller version of SDXL, offering a 60% speedup while maintaining high-quality text-to-image generation capabilities
lucataco / ssd-lora-inference
POC to run inference on SSD-1B LoRAs
lucataco / ssd-lora-training
POC to train SSD-1B LoRAs for cheaper & faster training
lucataco / ssd-1b-txt2img_batch
Batch mode for Segmind Stable Diffusion Model (SSD-1B) txt2img
lucataco / realvisxl-v2-img2img
Implementation of SDXL RealVisXL_V2.0 img2img
lucataco / thinkdiffusionxl
ThinkDiffusionXL is a go-to model capable of amazing photorealism that's also versatile enough to generate high-quality images across a variety of styles and subjects without needing to be a prompting genius
lucataco / kosmos-2
Grounding Multimodal Large Language Models to the World
lucataco / ssd-1b-img2img
Segmind Stable Diffusion Model (SSD-1B) img2img
lucataco / sdxl
SDXL v1.0 - A text-to-image generative AI model that creates beautiful images
lucataco / realvisxl-v1-img2img
Implementation of SDXL RealVisXL_V1.0 img2img
lucataco / dolphin-2.2.1-mistral-7b
Mistral-7B-v0.1 fine tuned for chat with the Dolphin dataset (an open-source implementation of Microsoft's Orca)
lucataco / dolphin-2.1-mistral-7b
Mistral-7B-v0.1 fine tuned for chat with the Dolphin dataset (an open-source implementation of Microsoft's Orca)
lucataco / mistrallite
MistralLite is a fine-tuned Mistral-7B-v0.1 language model, with enhanced capabilities of processing long context (up to 32K tokens)
lucataco / bakllava
BakLLaVA-1 is a Mistral 7B base augmented with the LLaVA 1.5 architecture
lucataco / hotshot-xl
😊 Hotshot-XL is an AI text-to-GIF model trained to work alongside Stable Diffusion XL
lucataco / fuyu-8b
Fuyu-8B is a multi-modal text and image transformer trained by Adept AI
lucataco / video-crafter
Open diffusion model for high-quality video generation
lucataco / qwen-vl-chat
A multimodal LLM-based AI assistant, which is trained with alignment techniques. Qwen-VL-Chat supports more flexible interaction, such as multi-round question answering, and creative capabilities.
lucataco / comfyui-sdxl-txt2img
Using a ComfyUI workflow to run SDXL text2img
lucataco / sadtalker
Stylized Audio-Driven Single Image Talking Face Animation
lucataco / sdxl-controlnet
SDXL ControlNet - Canny
lucataco / animate-diff
Animate Your Personalized Text-to-Image Diffusion Models
lucataco / illusion-diffusion-hq
Monster Labs QrCode ControlNet on top of SD Realistic Vision v5.1
lucataco / remove-bg
Remove background from an image
lucataco / realvisxl-v1.0
Implementation of SDXL RealVisXL_V1.0
lucataco / sdxl-controlnet-depth
SDXL ControlNet - Depth
lucataco / clip-interrogator
CLIP Interrogator (for faster inference)
lucataco / sdxl-panoramic
360 Panorama SDXL image with inpainted wrapping seam
lucataco / codeformer
Robust face restoration algorithm for old photos/AI-generated faces - (A40 GPU)
lucataco / blueprint
An SDXL fine-tune based on blueprints
lucataco / ms-img2vid
Turn any image into a video
lucataco / wizardcoder-python-34b-v1.0
Empowering Code Large Language Models with Evol-Instruct
lucataco / realistic-vision-v5-openpose
Realistic Vision V5 with OpenPose
lucataco / spider-gwen-style
SDXL fine tune on Spider-Gwen style
lucataco / realistic-vision-v5
Realistic Vision v5.0 with VAE
lucataco / sdxl-controlnet-openpose
SDXL ControlNet - OpenPose
lucataco / realistic-vision-v5-inpainting
Realistic Vision v5.0 Inpainting
lucataco / realistic-vision-v5-img2img
Realistic Vision v5.0 Image 2 Image
lucataco / realistic-vision-v5.1
Implementation of Realistic Vision v5.1 with VAE
lucataco / upstage-llama-2-70b-instruct-v2
Upstage/Llama-2-70B-instruct-v2 - GPTQ
lucataco / glaive-function-calling-v1
2.7B param open source chat model trained on Glaive’s synthetic data generation platform
lucataco / gfpgan
Practical face restoration algorithm for old photos or AI-generated faces (for larger images)
lucataco / freewilly2
Stability AI's FreeWilly2
lucataco / llama-2-13b-chat
Meta's Llama 2 13b Chat - GPTQ
lucataco / llama-2-7b-chat
Meta's Llama 2 7b Chat - GPTQ
lucataco / speaker-diarization
Segments an audio recording based on who is speaking (on A100)
lucataco / rivers-stable-diffusion-upscaler
RiversHaveWings Stable Diffusion Upscaler
lucataco / real-esrgan
Real-ESRGAN with optional face correction and adjustable upscale (for larger images)
lucataco / wsrglow
A working wsrglow model
lucataco / realistic-vision-v4.0
Realistic Vision V4.0
lucataco / realistic-vision-v3.0
Realistic Vision V3.0 with VAE
lucataco / instruct-glaive
sahil2801/replit-code-instruct-glaive
lucataco / xgen-7b-8k-base
Salesforce/xgen-7b-8k-base
lucataco / vicuna-13b-v1.3
lmsys/vicuna-13b-v1.3
lucataco / vicuna-7b-v1.3
lmsys/vicuna-7b-v1.3
lucataco / codegen2-1b
Salesforce/codegen2-1B
lucataco / tiny-starcoder-py
bigcode/tiny_starcoder_py
lucataco / shiba-diffusion
Shiba stable diffusion model
lucataco / phi-1.5
microsoft/phi-1.5 was trained using the same data sources as phi-1, augmented with a new data source that consists of various NLP synthetic texts
lucataco / idefics-80b
IDEFICS 80b Quantized
lucataco / wizardcoder-15b-v1
WizardLM/WizardCoder-15B-V1.0 in 4bit
lucataco / wizardcoder-15b-v1.0
WizardLM/WizardCoder-15B-V1.0
lucataco / qwen2.5-72b-instruct
lucataco / mpt-30b-chat
mosaicml/mpt-30b-chat in 8bit
lucataco / differential-diffusion
Modify an image with a prompt and a depth image
lucataco / pixart-lcm-xl-2
PixArt-Alpha LCM is a transformer-based text-to-image diffusion system trained on text embeddings from T5
lucataco / mistral-7b-v0.1
Mistral-7B-v0.1 is a pretrained generative text model that outperforms Llama 2 13B on all benchmarks
lucataco / replit-code-v1-3b
replit/replit-code-v1-3b
lucataco / cross-image-attention
Given two images depicting a source structure and a target appearance, generate an image merging the structure of one image with the appearance of the other
lucataco / instant-id-lcm
InstantID with LCM
lucataco / wizard-vicuna-13b-uncensored
This is wizard-vicuna-13b trained with a subset of the dataset - responses that contained alignment / moralizing were removed
lucataco / olmo-7b
OLMo is a series of Open Language Models designed to enable the science of language models
lucataco / ollama-qwen2.5-72b
Ollama Qwen2.5 72b
lucataco / stable-diffusion-image-variation
Image Variations with Stable Diffusion
lucataco / rave
RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
lucataco / mobius
Mobius, a diffusion model that pushes the boundaries of domain-agnostic debiasing and representation realignment
lucataco / mistral-7b-instruct-v0.3
The Mistral-7B-Instruct-v0.3 Large Language Model is an instruct fine-tuned version of the Mistral-7B-v0.3
lucataco / idefics-9b
IDEFICS 9b Quantized
lucataco / qwen1.5-32b
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data
lucataco / playground-v2
Playground v2 is a diffusion-based text-to-image generative model trained from scratch
lucataco / vicuna-33b-v1.3
lmsys/vicuna-33b-v1.3
lucataco / minicpm-v-2
OpenBMB MiniCPM-V 2.8B is a strong multimodal large language model for efficient end-side deployment
lucataco / nous-hermes-2-mixtral-8x7b-dpo
Nous Hermes 2 Mixtral 8x7B DPO is a Nous Research model trained over the Mixtral 8x7B MoE LLM