Hibiki: Real-Time Voice-Preserving Translation
[Paper] | [Hear Samples] | [Model Weights]
Hibiki translates speech in real time while preserving the speaker’s voice characteristics. It currently targets French→English translation and runs locally on consumer hardware, producing natural-sounding output.
Why Hibiki?
- 🎭 Voice Preservation - Maintains speaker’s vocal identity using advanced guidance techniques
- ⚡ Instant Translation - Processes audio at 12.5 frames/sec (one frame every 80 ms), fast enough for real-time translation
- 🔊 Natural Output - Generates fluent target speech with human-like prosody
- 📝 Dual Output - Produces both translated speech and text simultaneously
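To make the 12.5 frames/sec figure concrete, the short Python sketch below converts it into a per-frame duration and a frame count for a clip. Only the 12.5 frames/sec rate comes from this README; the constant and function names are illustrative and are not part of the Hibiki codebase.

```python
import math

# Frame rate quoted in this README; everything else here is illustrative.
FRAME_RATE_HZ = 12.5


def frame_duration_ms(frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """Duration of a single frame in milliseconds."""
    return 1000.0 / frame_rate_hz


def frames_for(seconds: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Number of frames needed to cover `seconds` of audio, rounded up."""
    return math.ceil(seconds * frame_rate_hz)
```

At this rate each frame spans 80 ms, so a 10-second clip corresponds to 125 frames.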
Quick Translation
Run with Cog using our sample file:
sudo cog predict -i audio_input=@examples/sample_fr_hibiki_crepes.mp3
To translate your own recording, replace the sample path with the path to your .mp3 file (keep the leading @).
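If you would rather drive the same command from a script, the sketch below wraps it in Python. It assumes `cog` is installed and on your PATH and that you run it from the repository root. The helper names `build_cog_command` and `translate_file` are hypothetical and are not part of Hibiki or Cog. Because this README only mentions .mp3 input, the sketch conservatively rejects other extensions.

```python
import subprocess
from pathlib import Path
from typing import List


def build_cog_command(audio_path: str, use_sudo: bool = False) -> List[str]:
    """Build the `cog predict` invocation shown above for a given audio file.

    Hypothetical helper: it only assembles the argument list and runs nothing.
    """
    path = Path(audio_path)
    # Assumption: the README only documents .mp3 input, so reject anything else.
    if path.suffix.lower() != ".mp3":
        raise ValueError(f"expected an .mp3 file, got: {audio_path}")
    cmd = ["cog", "predict", "-i", f"audio_input=@{path}"]
    # The README example uses sudo; whether you need it depends on your Docker setup.
    return ["sudo", *cmd] if use_sudo else cmd


def translate_file(audio_path: str, use_sudo: bool = False) -> None:
    """Run the prediction; raises CalledProcessError if cog exits with an error."""
    subprocess.run(build_cog_command(audio_path, use_sudo), check=True)
```

For example, `translate_file("my_recording.mp3")` runs the same prediction as the command above on your own file. Passing `use_sudo=True` mirrors the `sudo` prefix used in the sample command.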
Supported Languages
Currently supports French → English translation. More languages coming soon.
📄 Citation
If you use Hibiki in your research, please cite our paper.
Model weights are licensed under CC-BY 4.0.
Inference code is MIT licensed.
Maintained by @zsxkib for Replicate integration (follow me on X/Twitter for updates)!
