Create songs with voice cloning

This collection covers models for creating, cloning, and transforming singing voices with AI.

These models let you generate new vocal performances, clone voices from clean audio samples, and adjust pitch or style for creative effects.

Whether you're making covers, building custom vocal styles, or experimenting with AI singers, these models help you bring vocal performances to life without traditional recording.

Frequently asked questions

What kinds of things can I do with this collection?

This collection focuses on generating and transforming singing voices. You can:

  • Clone a singing voice from a clean audio sample.
  • Generate a new vocal performance using that cloned voice.
  • Adjust vocal style, pitch, or tone for creative effects.
  • Build and fine-tune your own custom singing voices.

Which models are best for quick singing voice generation?

If you want fast results without training a custom model, zsxkib/realistic-voice-cloning is a good choice. It can take an existing audio clip and transform it into a new sung performance in the target voice.
This is ideal for quick covers, creative remixes, or testing ideas without a lot of setup.

How can I clone a specific singing voice?

To build a more accurate or personalized singing voice, you can use voice cloning and dataset creation tools in this collection, such as:

  • zsxkib/realistic-voice-cloning — clone a voice directly from a sample.
  • zsxkib/create-rvc-dataset — build a clean dataset from audio.
  • Training and fine-tuning tools — for higher fidelity and control.

The cleaner and more isolated the voice sample, the better the clone.

Can I generate a song with lyrics and melody?

Yes. Some models accept lyrics, melody, or reference vocals to guide the singing performance.
Depending on the model, you can input lyrics and have them sung in the cloned voice, or convert an existing vocal recording into the target voice.

How do voice cloning and singing generation differ?

  • Voice cloning: Captures the tone and timbre of a specific voice so it can be used for future singing.
  • Singing generation: Produces a sung performance from lyrics, melody, or a prompt.
  • Style adjustments: Some models let you shift pitch or add stylistic effects during generation.

Voice cloning is about who’s singing; singing generation is about what’s being sung.

What kind of input and output do these models use?

  • Inputs: Voice samples, lyrics, melody, or reference vocals.
  • Outputs: Audio files (commonly WAV or MP3) of the generated or converted singing performance.

Input quality has a big impact: clear, noise-free audio works best.
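
As a rough pre-flight check before uploading a voice sample, the sketch below reports basic level statistics for a WAV file using only the Python standard library. It is an illustration, not part of any model in this collection: the function name `wav_stats` is hypothetical, it assumes 16-bit PCM audio, and it measures level and clipping only. It does not detect background noise or bleed from other instruments.

```python
import math
import struct
import wave


def wav_stats(path: str) -> dict:
    """Return duration, sample rate, peak, RMS, and clipping stats.

    Assumes a 16-bit PCM WAV file (an assumption of this sketch).
    Samples from all channels are pooled together.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wf.getframerate()
        n_frames = wf.getnframes()
        raw = wf.readframes(n_frames)

    # Decode little-endian signed 16-bit samples.
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    if not samples:
        raise ValueError("file contains no audio")

    peak = max(abs(s) for s in samples) / 32768.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768.0
    # Samples at full scale are a common sign of clipping.
    clipped = sum(1 for s in samples if abs(s) >= 32767) / len(samples)

    return {
        "seconds": n_frames / rate,
        "sample_rate": rate,
        "peak": peak,
        "rms": rms,
        "clipped_fraction": clipped,
    }
```

A non-zero `clipped_fraction` or a very low `rms` suggests the sample may be worth re-recording or re-exporting before it is used for cloning.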

Can I use these models to auto-tune or style my own vocals?

While the collection focuses on cloning and singing generation, some models let you adjust pitch or style as part of the conversion process. These can help smooth vocals or apply a creative effect.
They’re not DAW-style auto-tune plugins, but adjusting pitch during conversion can produce a similar effect.
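
Pitch settings in voice tools are often expressed in semitones, though whether a given model in this collection does so is an assumption to check against its input schema. In equal temperament, a shift of n semitones multiplies frequency by 2^(n/12). The small sketch below (helper names are illustrative) shows that arithmetic:

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def shifted_frequency(freq_hz: float, semitones: float) -> float:
    """Frequency after shifting freq_hz by the given number of semitones."""
    return freq_hz * semitones_to_ratio(semitones)
```

For example, `shifted_frequency(440.0, 12)` is one octave up, which doubles the frequency, and a shift of -12 halves it.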

How can I publish my own singing voice model?

You can package your trained singing-voice model with Cog and push it to Replicate.
Define your inputs (e.g., audio sample, lyrics, melody) and outputs (audio file) and list it in the Sing With Voices collection so others can use it or build on it.

Can I use these models commercially?

Many models in this collection can be used commercially, but you must respect voice rights and copyright law.
Cloning or imitating real artists without permission may violate laws or ethical guidelines. Always review model licenses and applicable laws before using these outputs in public projects.

How do I run a singing voice model on Replicate?

  1. Pick a model from the Sing With Voices collection.
  2. Upload a clean audio sample or provide lyrics and melody.
  3. Configure any pitch or style settings.
  4. Run the model to generate the vocal performance.
  5. Download the audio and use it in your project or mix.
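
The steps above can also be scripted. The following is a minimal sketch using the Replicate Python client (`pip install replicate`, with `REPLICATE_API_TOKEN` set in the environment). The model name comes from this page, but the input keys `input_audio` and `pitch_shift` are hypothetical placeholders: check the model's API schema for the real parameter names. Depending on the client version, the model reference may also need a `:version` suffix.

```python
# Model name is taken from the collection text.
MODEL = "zsxkib/realistic-voice-cloning"


def build_input(audio_url: str, pitch_shift: int = 0) -> dict:
    """Assemble the input payload.

    The key names here are assumptions for illustration only;
    the real model may use different names and types.
    """
    if not audio_url.startswith(("http://", "https://")):
        raise ValueError("audio_url must be an http(s) URL")
    return {"input_audio": audio_url, "pitch_shift": pitch_shift}


def run_model(audio_url: str, pitch_shift: int = 0):
    """Run the model and return whatever output the client provides."""
    # Imported lazily so build_input can be used without the client installed.
    import replicate

    return replicate.run(MODEL, input=build_input(audio_url, pitch_shift))
```

Calling `run_model("https://example.com/vocals.wav")` would start a prediction and return the model's output, typically a reference to the generated audio file that you can then download.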

What should I keep in mind when working with singing voice models?

  • Clean, isolated voice samples produce the best clones.
  • Models may not perfectly reproduce complex vocal runs or heavy effects.
  • Lyrics and melody inputs should be clear and well-formatted.
  • Cloning real voices requires rights and consent.
  • Always listen back to the output: artifacts or pitch drift can occur.