Hibiki: Real-Time Voice-Preserving Translation

[Paper] | [Hear Samples] | [Model Weights]

Hibiki translates speech in real time while preserving the speaker’s voice characteristics. It currently handles French→English translation, runs locally on consumer hardware, and produces natural-sounding output.

Why Hibiki?

🎭 Voice Preservation - Maintains the speaker’s vocal identity in the translated speech using guidance techniques
⚡ Instant Translation - Processes audio at 12.5 frames/sec (one frame every 80 ms) for real-time conversion
🔊 Natural Output - Generates fluent target speech with human-like prosody
📝 Dual Output - Produces both translated speech and text simultaneously
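
To make the 12.5 frames/sec figure concrete, here is a minimal sketch of the timing arithmetic it implies. The function names are illustrative only and are not part of Hibiki's code. The sketch assumes frames are evenly spaced and says nothing about model latency beyond the frame period.

```python
import math

# Frame rate quoted in the feature list above.
FRAME_RATE_HZ = 12.5


def frame_duration_ms(frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """Length of a single frame in milliseconds."""
    return 1000.0 / frame_rate_hz


def frames_for_duration(seconds: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Number of frames needed to cover `seconds` of audio, rounded up."""
    return math.ceil(seconds * frame_rate_hz)
```

At 12.5 frames/sec each frame spans 80 ms, so a 10-second clip corresponds to 125 frames.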

Quick Translation

Run with Cog using our sample file:

sudo cog predict -i audio_input=@examples/sample_fr_hibiki_crepes.mp3

To translate your own audio, pass the path to your .mp3 file as audio_input in place of the sample.
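
If you want to script this, the sketch below builds the same `cog predict` invocation for an arbitrary file. The helper `build_predict_command` is hypothetical, not something shipped with this repository. The `.mp3` check simply mirrors the wording above and is an assumption, so drop it if other formats turn out to work. Whether you need `sudo` depends on how Docker is set up on your machine.

```python
from pathlib import Path
from typing import List


def build_predict_command(audio_path: str, use_sudo: bool = False) -> List[str]:
    """Return the argv list for `cog predict` with the given audio file.

    Hypothetical helper: it only assembles the command shown in this README
    and does not run anything itself.
    """
    path = Path(audio_path)
    # Assumption: only .mp3 input is mentioned in this README, so reject other suffixes.
    if path.suffix.lower() != ".mp3":
        raise ValueError(f"expected an .mp3 file, got: {audio_path}")
    cmd = ["cog", "predict", "-i", f"audio_input=@{path}"]
    # Prepend sudo only when the local Docker setup requires root.
    return ["sudo", *cmd] if use_sudo else cmd
```

Pass the returned list to `subprocess.run(cmd, check=True)` to launch the prediction. Using an argv list rather than a shell string avoids quoting problems with paths that contain spaces.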

Supported Languages

Currently supports French → English translation. More languages coming soon.

📄 Citation
If you use Hibiki in your research, please cite our paper.

Model weights are licensed under CC-BY 4.0.
Inference code is licensed under MIT.

Replicate integration maintained by @zsxkib (follow me on X/Twitter for updates).