zsxkib / hibiki

Hibiki: High-Fidelity Simultaneous Speech-To-Speech Translation

Hibiki: Real-Time Voice-Preserving Translation

[Paper] | [Hear Samples] | [Model Weights]

Hibiki translates speech in real time while preserving the speaker’s voice characteristics. It currently translates French→English and runs locally on consumer hardware, producing natural-sounding output.

Why Hibiki?

  • 🎭 Voice Preservation - Maintains speaker’s vocal identity using advanced guidance techniques
  • Instant Translation - Processes audio at 12.5 frames/sec for real-time conversion
  • 🔊 Natural Output - Generates fluent target speech with human-like prosody
  • 📝 Dual Output - Produces both translated speech and text simultaneously
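To put the 12.5 frames/sec figure in concrete terms, the short sketch below converts it into a per-frame duration and a frame count for a clip of a given length. The function names are hypothetical and are not part of Hibiki's code; only the frame rate comes from the description above.

```python
import math

# Frame rate quoted in the feature list above.
FRAME_RATE_HZ = 12.5


def frame_duration_ms(frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """Return the amount of audio covered by one frame, in milliseconds."""
    return 1000.0 / frame_rate_hz


def frames_for_clip(duration_s: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Return how many frames are needed to cover a clip (rounded up)."""
    return math.ceil(duration_s * frame_rate_hz)


if __name__ == "__main__":
    print(frame_duration_ms())    # milliseconds of audio per frame
    print(frames_for_clip(10.0))  # frames needed for a 10-second clip
```

At this rate each frame spans 80 ms of audio, so a 10-second clip corresponds to 125 frames.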

Quick Translation

Run with Cog using our sample file:

sudo cog predict -i audio_input=@examples/sample_fr_hibiki_crepes.mp3

To translate your own recording, pass the path to your .mp3 file as the audio_input value.
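If you want to script the command above for your own recordings, a thin Python wrapper is one option. This is a minimal sketch, not part of the repository: `build_cog_command` and `translate` are hypothetical helper names, the `.mp3` check is an assumption based on the note above, and `sudo` is optional because it is only needed when your Docker setup requires it.

```python
import subprocess


def build_cog_command(audio_path: str, use_sudo: bool = False) -> list[str]:
    """Build the `cog predict` argument list for one input file."""
    # Assumption: inputs are .mp3 files, as in the sample command above.
    if not audio_path.lower().endswith(".mp3"):
        raise ValueError(f"expected an .mp3 file, got: {audio_path}")
    cmd = ["cog", "predict", "-i", f"audio_input=@{audio_path}"]
    return ["sudo", *cmd] if use_sudo else cmd


def translate(audio_path: str, use_sudo: bool = False) -> None:
    """Run the prediction; raises CalledProcessError if cog exits non-zero."""
    subprocess.run(build_cog_command(audio_path, use_sudo), check=True)


if __name__ == "__main__":
    # Same invocation as the sample command, including sudo.
    translate("examples/sample_fr_hibiki_crepes.mp3", use_sudo=True)
```

Passing the arguments as a list avoids shell quoting problems when a file path contains spaces.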

Supported Languages

Currently supports French → English translation. More languages coming soon.

📄 Citation
If you use Hibiki in research, please cite our paper.

Model weights licensed under CC-BY 4.0
Inference code MIT licensed


Maintained by @zsxkib for Replicate integration (follow me on X/Twitter for updates)!