# Rhubarb Lip Sync - Replicate Model
A Replicate/Cog model that provides automatic lip synchronization analysis using Rhubarb Lip Sync by Daniel Wolf. This model processes audio files and generates precise mouth cue data for lip synchronization in animations and videos.
## 🎯 Features
- Automatic Lip Sync Analysis: Generates mouth cue data from audio input
- Multiple Audio Format Support: Handles MP3, WAV, and other common audio formats
- Chunked Processing: Automatically splits long audio files into manageable chunks
- JSON Output: Returns structured mouth cue data in JSON format
- Phonetic Recognition: Uses Rhubarb's phonetic recognizer (rather than word-level speech recognition) for accurate lip sync
- Cloud-Ready: Deployed on Replicate for easy API access
## 🚀 Quick Start

### Using the Replicate API
```python
import replicate

# Process an audio file
output = replicate.run(
    "emiliacb/replicate-rhubarb:latest",
    input={
        "audio_data": "base64_encoded_audio_data",
        "wake_up": False
    }
)

print(output)
```
### Local Development

1. Clone the repository:

   ```bash
   git clone https://github.com/emiliacb/replicate-rhubarb.git
   cd replicate-rhubarb
   ```

2. Install Cog (if not already installed):

   ```bash
   curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)
   chmod +x /usr/local/bin/cog
   ```

3. Run the model locally:

   ```bash
   cog predict -i audio_data="base64_encoded_audio" -i wake_up=false
   ```
## 📋 Requirements

- Python: 3.12
- System Packages:
  - ca-certificates
  - libc6
  - unzip
  - libsndfile1
  - libportaudio2
  - curl
  - ffmpeg
## 🔧 API Reference

### Input Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `audio_data` | string | - | Audio data as a base64-encoded string |
| `wake_up` | boolean | `false` | Set to `true` to wake up the model without processing audio |
### Output Format

The model returns a JSON string with the following structure:

```json
{
  "mouthCues": [
    {
      "start": 0.0,
      "end": 0.1,
      "value": "X"
    },
    {
      "start": 0.1,
      "end": 0.2,
      "value": "A"
    }
  ]
}
```
### Mouth Cue Values

- A: Closed mouth (for "P", "B", "M" sounds)
- B: Slightly open mouth with clenched teeth
- C: Open mouth
- D: Wide open mouth
- E: Slightly rounded mouth
- F: Puckered lips (as in "oo")
- G: Upper teeth touching the lower lip (for "F", "V" sounds)
- H: Tongue raised behind the upper teeth (the "L" sound)
- X: Idle/rest position (mouth closed)
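For example, the cue list can drive a sprite-based mouth animation directly; `start` and `end` are times in seconds. The sketch below is only an illustration (the frame filenames and the `frame_at` helper are hypothetical, not part of this repository):

```python
import json

# Sample output in the format shown above
result = '{"mouthCues": [{"start": 0.0, "end": 0.1, "value": "X"}, {"start": 0.1, "end": 0.2, "value": "A"}]}'

# Hypothetical mapping from cue letters to mouth sprite frames
FRAMES = {letter: f"mouth_{letter}.png" for letter in "ABCDEFGHX"}

def frame_at(cues: list, t: float) -> str:
    """Return the sprite frame to display at playback time t (in seconds)."""
    for cue in cues:
        if cue["start"] <= t < cue["end"]:
            return FRAMES[cue["value"]]
    return FRAMES["X"]  # fall back to the rest position outside any cue

cues = json.loads(result)["mouthCues"]
print(frame_at(cues, 0.15))  # -> "mouth_A.png"
```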
## 🎵 Supported Audio Formats
- MP3
- WAV
- FLAC
- AAC
- OGG
- M4A
- WMA
The model automatically converts all input audio to WAV format (44.1kHz, mono, 16-bit) for processing.
## ⚙️ Technical Details

### Audio Processing Pipeline

1. Base64 Decoding: Converts base64 audio data to binary
2. Format Conversion: Uses FFmpeg to convert to WAV format
3. Chunking: Splits audio into 30-second chunks for processing
4. Rhubarb Analysis: Processes each chunk with Rhubarb Lip Sync
5. Result Merging: Combines results from all chunks
6. Cleanup: Removes temporary files
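The repository's exact code isn't reproduced here, but steps 1-3 can be sketched with FFmpeg roughly as follows (the file paths, chunk naming, and the `prepare_audio` helper are illustrative assumptions):

```python
import base64
import os
import subprocess

def prepare_audio(audio_b64: str, workdir: str = "/tmp/rhubarb") -> None:
    """Decode base64 audio, convert to 44.1 kHz mono 16-bit WAV, and split into 30-second chunks."""
    os.makedirs(workdir, exist_ok=True)
    raw_path = os.path.join(workdir, "input_audio")
    wav_path = os.path.join(workdir, "input.wav")

    # Step 1: decode the base64 payload to a binary file
    with open(raw_path, "wb") as f:
        f.write(base64.b64decode(audio_b64))

    # Step 2: convert to WAV (44.1 kHz, mono, 16-bit PCM) with FFmpeg
    subprocess.run(
        ["ffmpeg", "-y", "-i", raw_path,
         "-ar", "44100", "-ac", "1", "-acodec", "pcm_s16le", wav_path],
        check=True,
    )

    # Step 3: split into 30-second chunks named chunk_000.wav, chunk_001.wav, ...
    subprocess.run(
        ["ffmpeg", "-y", "-i", wav_path,
         "-f", "segment", "-segment_time", "30",
         os.path.join(workdir, "chunk_%03d.wav")],
        check=True,
    )
```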
### Rhubarb Configuration
- Recognizer: Phonetic
- Export Format: JSON
- Machine Readable: Enabled
- Quiet Mode: Enabled
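Given this configuration, each chunk is then analyzed with the Rhubarb CLI and the per-chunk cues are shifted by the chunk's offset before merging (steps 4-5 above). This is a minimal sketch based on Rhubarb's documented command-line flags; the exact invocation and merging logic used by this model are assumptions:

```python
import glob
import json
import os
import subprocess

def analyze_chunks(workdir: str = "/tmp/rhubarb", chunk_seconds: float = 30.0) -> dict:
    """Run Rhubarb on each 30-second chunk and merge the cues into one timeline."""
    merged = []
    for index, chunk in enumerate(sorted(glob.glob(os.path.join(workdir, "chunk_*.wav")))):
        out_path = chunk + ".json"
        # Phonetic recognizer, JSON export, machine-readable progress, quiet mode
        subprocess.run(
            ["rhubarb", "-r", "phonetic", "-f", "json",
             "--machineReadable", "--quiet", "-o", out_path, chunk],
            check=True,
        )
        with open(out_path) as f:
            cues = json.load(f)["mouthCues"]
        # Offset each chunk's cues by its position in the original audio
        offset = index * chunk_seconds
        merged.extend(
            {"start": cue["start"] + offset, "end": cue["end"] + offset, "value": cue["value"]}
            for cue in cues
        )
    return {"mouthCues": merged}
```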
## 📝 Usage Examples

### Basic Usage
```python
import base64
import json

import replicate

# Read and encode the audio file
with open("audio.mp3", "rb") as f:
    audio_data = base64.b64encode(f.read()).decode()

# Process with Replicate
result = replicate.run(
    "emiliacb/replicate-rhubarb:latest",
    input={"audio_data": audio_data}
)

# Parse the result
mouth_cues = json.loads(result)
print(f"Generated {len(mouth_cues['mouthCues'])} mouth cues")
```
### Wake Up Call

```python
# Test whether the model is ready
result = replicate.run(
    "emiliacb/replicate-rhubarb:latest",
    input={"wake_up": True}
)

print(result)  # {"status": "OK", "message": "Rhubarb model is ready", "mouthCues": []}
```
### Error Handling

```python
try:
    result = replicate.run(
        "emiliacb/replicate-rhubarb:latest",
        input={"audio_data": audio_data}
    )
    data = json.loads(result)

    if "error" in data:
        print(f"Error: {data['error']}")
    else:
        print(f"Success: {len(data['mouthCues'])} cues generated")
except Exception as e:
    print(f"Request failed: {e}")
```
## 🎬 Use Cases
- Animation: Generate lip sync data for animated characters
- Video Production: Synchronize lips in video content
- Game Development: Create realistic character animations
- Accessibility: Improve video accessibility with accurate lip sync
- Content Creation: Automate lip sync for video content
## 🔍 Troubleshooting

### Common Issues
- Empty Audio Data: Ensure the audio file is properly encoded as base64
- Unsupported Format: FFmpeg attempts to convert formats not listed above, but unusual codecs may still fail
- Large Files: Very large audio files are automatically chunked
- Processing Time: Longer audio files take more time to process
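For the first issue, it can help to verify locally that the payload decodes cleanly before sending it (a quick sanity check, not part of the model):

```python
import base64

with open("audio.mp3", "rb") as f:
    audio_data = base64.b64encode(f.read()).decode()

# Raises binascii.Error if the string is not valid base64
assert len(base64.b64decode(audio_data, validate=True)) > 0
```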
Error Messages
"No audio data provided"
: Theaudio_data
parameter is empty or missing"Audio conversion failed"
: FFmpeg couldn’t convert the audio format"Audio chunking failed"
: Error occurred while splitting the audio"Rhubarb processing failed"
: The Rhubarb tool encountered an error
## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
## 🙏 Acknowledgments
- Daniel Wolf for creating the amazing Rhubarb Lip Sync tool
- Replicate for providing the platform to deploy ML models
- Cog for making model containerization easy
## 📞 Support
If you encounter any issues or have questions:
- Check the troubleshooting section
- Open an issue
- Contact the maintainers