Latest models

POC implementation of Depth-anything to produce a 3D SBS video

Merge two images together with a prompt

Honeycomb NLQ Generator

ProteusV0.4: The Style Update - enhances stylistic capabilities, similar to Midjourney's approach, rather than advancing prompt comprehension

SDXL-Lightning by ByteDance: a fast text-to-image model that makes high-quality images in 4 steps

Hello-world example from Cog

A collection of anime Stable Diffusion models with VAEs and LoRAs.

Get the width, height, and duration in seconds from a video

7B base version of Google’s Gemma model

2B base version of Google’s Gemma model

7B instruct version of Google’s Gemma model

2B instruct version of Google’s Gemma model

DreamCraft3D is a text-and-image-to-3D model. DreamCraft3D uses DeepFloyd IF and Stable Zero123, which are non-commercial, research-only models. Please make sure you read and abide by the relevant licenses before using it.

POC CUDA implementation of an rgb2grayscale function

SDXL Concept Art and Illustrations

Automatically remove "dead air" from videos using loudness and motion thresholds.

MusicGen stereo-medium model fine-tuned on Bbongjjak (뽕짝), a modern Korean electronic pop music genre, with the text token 'Bbongjjak'.

Fine-tune MusicGen small, medium, and melody models. Stereo models are also available.

Stable Diffusion XL LoRA trained on Black male hairstyles

An SDXL inpainting model that can be used for fine-tuning on Replicate

Realistic photos with RealVisXL V4.0 (Realistic Vision with Stable Diffusion XL)

Photorealism with RealVisXL V4.0

Visual instruction tuning towards large language and vision models with GPT-4 level capabilities

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Split a single video into its individual clips by detecting cuts.

majicMIX realistic V7 with support for Civitai LoRAs, text2image, and image2image

LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

Accelerated transcription, word-level timestamps and diarization with whisperX large-v3 for large audio files

Accelerated transcription, word-level timestamps and diarization with whisperX large-v3

Utilize the capabilities of SD WebUI, including Hires. fix and plenty of extensions (e.g. ADetailer)

The Mixtral-8x7B-instruct-v0.1 Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts tuned to be a helpful assistant.

Pocket-Sized Multimodal AI For Content Understanding and Generation

Stable Diffusion XL fine-tuned to create images based on Le Corbusier's architectural style.

⚡️ Fast audio transcription | whisper v3 | speaker diarization | word level timestamps | prompt

Mamba 2.8B state space language model fine-tuned for chat

MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer

