lucataco / orpheus-3b-0.1-ft

Orpheus 3B - high quality, emotive Text to Speech

  • Public
  • 1.6K runs

Run time and cost

This model costs approximately $0.0090 to run on Replicate, or 111 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
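The runs-per-dollar figure follows directly from the approximate per-run price. A minimal sketch of that arithmetic is below; the function names `runs_per_dollar` and `estimated_cost` are illustrative, not part of any Replicate API, and the price is only the approximate figure quoted above, so real costs will vary with inputs.

```python
import math


def runs_per_dollar(cost_per_run: float) -> int:
    """Whole number of runs that fit into one dollar at the given per-run cost."""
    if cost_per_run <= 0:
        raise ValueError("cost_per_run must be positive")
    return math.floor(1.0 / cost_per_run)


def estimated_cost(runs: int, cost_per_run: float = 0.0090) -> float:
    """Approximate total cost in dollars for a number of runs.

    The default of 0.0090 is the approximate price quoted on this page.
    """
    return runs * cost_per_run


if __name__ == "__main__":
    print(runs_per_dollar(0.0090))
    print(round(estimated_cost(1000), 2))
```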

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 10 seconds.

Readme

Orpheus 3B 0.1 Finetuned

03/18/2025 – We are releasing our 3B Orpheus TTS model with additional finetunes. Code is available on GitHub: CanopyAI/Orpheus-TTS

Note: the model supports the tags <laugh>, <chuckle>, <sigh>, <cough>, <sniffle>, <groan>, <yawn>, and <gasp>, as well as the filler word "uhm".
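Because an unrecognized tag may simply be read aloud or ignored, it can help to check a prompt before sending it. The following is a small sketch of such a check, built only from the tags listed in the note above. The helper names `find_tags` and `unsupported_tags` are hypothetical, and exact, case-sensitive matching is an assumption, since the note does not say how the model treats other spellings.

```python
import re

# Tags taken from the note above. "uhm" is a plain filler word, not an
# angle-bracket tag, so it is not included here.
SUPPORTED_TAGS = {
    "laugh", "chuckle", "sigh", "cough",
    "sniffle", "groan", "yawn", "gasp",
}

# Matches anything of the form <name> with no spaces or nested brackets.
TAG_PATTERN = re.compile(r"<([^<>\s]+)>")


def find_tags(text: str) -> list[str]:
    """Return every <tag> name found in the text, in order of appearance."""
    return TAG_PATTERN.findall(text)


def unsupported_tags(text: str) -> list[str]:
    """Return tag names in the text that are not in the supported set.

    Assumption: matching is exact and case-sensitive.
    """
    return [tag for tag in find_tags(text) if tag not in SUPPORTED_TAGS]


if __name__ == "__main__":
    prompt = "Well <sigh> that was unexpected <giggle> but fine <laugh>"
    print(find_tags(prompt))
    print(unsupported_tags(prompt))
```

Running a prompt through `unsupported_tags` before synthesis gives a quick way to catch typos such as `<laughs>`.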


Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been finetuned to deliver human-level speech synthesis, achieving exceptional clarity, expressiveness, and real-time streaming performance.

Model Details

Model Capabilities

  • Human-Like Speech: Natural intonation, emotion, and rhythm that are superior to SOTA closed-source models
  • Zero-Shot Voice Cloning: Clone voices without prior fine-tuning
  • Guided Emotion and Intonation: Control speech and emotion characteristics with simple tags
  • Low Latency: ~200ms streaming latency for real-time applications, reducible to ~100ms with input streaming

Model Sources

Usage

See our Colab (link to Colab) or GitHub (link to GitHub) for how to run inference on our finetuned models.

Model Misuse

Do not use our models for impersonation without consent, misinformation or deception (including fake news or fraudulent calls), or any illegal or harmful activity. By using this model, you agree to follow all applicable laws and ethical guidelines. We disclaim responsibility for any use.