cottom/svc
Public · 3 runs

Run cottom/svc with an API
Use one of our client libraries to get started quickly. Clicking on a library will take you to the Playground tab where you can tweak different inputs, see the results, and copy the corresponding code to use in your own project.
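As a minimal sketch of what such a call looks like with the Replicate Python client (`pip install replicate`): the payload below uses the fields documented in the input schema, with illustrative placeholder URLs, and only runs the API call if a `REPLICATE_API_TOKEN` is configured.

```python
import os

# Input payload mirroring the model's input schema.
# The two URLs are placeholders; point them at your own audio files.
input_payload = {
    "source": "https://example.com/lead_vocal.wav",        # required
    "reference": "https://example.com/reference_timbre.wav",  # required
    "quality_preset": "Balanced · 24 steps",
    "timbre_blend": "Balanced · CFG 0.70",
    "auto_f0_adjust": True,
    "pitch_shift_choice": "No shift (0)",
    "max_length_choice": "90s (default)",
}

# Only call the API when a token is available.
if os.environ.get("REPLICATE_API_TOKEN"):
    import replicate

    output = replicate.run("cottom/svc", input=input_payload)
    print(output)  # a URI string, per the output schema
```

Omitted optional fields fall back to the defaults listed in the table below; only `source` and `reference` must be provided.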
Input schema
The fields you can use to run this model with an API. If you don't give a value for a field, its default value is used.
| Field | Type | Default value | Description |
|---|---|---|---|
| source | string | (required) | Lead vocal or dry stem that should be re-sung. |
| reference | string | (required) | Reference vocal timbre (1-30 seconds of clean solo vocals). |
| quality_preset | string | Balanced · 24 steps | Trade quality for speed via diffusion steps. |
| timbre_blend | string | Balanced · CFG 0.70 | How strongly to follow the reference timbre versus the source phrasing. |
| auto_f0_adjust | boolean | true | Align pitch centre to the reference to keep tone natural for covers. |
| pitch_shift_choice | string | No shift (0) | Apply a musical semitone shift after conversion to match the song key. |
| max_length_choice | string | 90s (default) | Limit the source duration to save inference cost. |
```json
{
  "type": "object",
  "title": "Input",
  "required": [
    "source",
    "reference"
  ],
  "properties": {
    "source": {
      "type": "string",
      "title": "Source",
      "format": "uri",
      "x-order": 0,
      "description": "Lead vocal or dry stem that should be re-sung."
    },
    "reference": {
      "type": "string",
      "title": "Reference",
      "format": "uri",
      "x-order": 1,
      "description": "Reference vocal timbre (1-30 seconds of clean solo vocals)."
    },
    "timbre_blend": {
      "enum": [
        "Source-led \u00b7 CFG 0.50",
        "Balanced \u00b7 CFG 0.70",
        "Reference-locked \u00b7 CFG 0.95"
      ],
      "type": "string",
      "title": "timbre_blend",
      "description": "How strongly to follow the reference timbre versus the source phrasing.",
      "default": "Balanced \u00b7 CFG 0.70",
      "x-order": 3
    },
    "auto_f0_adjust": {
      "type": "boolean",
      "title": "Auto F0 Adjust",
      "default": true,
      "x-order": 4,
      "description": "Align pitch centre to the reference to keep tone natural for covers."
    },
    "quality_preset": {
      "enum": [
        "Turbo \u00b7 12 steps",
        "Fast \u00b7 18 steps",
        "Balanced \u00b7 24 steps",
        "Quality \u00b7 32 steps",
        "Studio \u00b7 40 steps"
      ],
      "type": "string",
      "title": "quality_preset",
      "description": "Trade quality for speed via diffusion steps.",
      "default": "Balanced \u00b7 24 steps",
      "x-order": 2
    },
    "max_length_choice": {
      "enum": [
        "60s",
        "90s (default)",
        "120s",
        "180s",
        "No cap"
      ],
      "type": "string",
      "title": "max_length_choice",
      "description": "Limit the source duration to save inference cost.",
      "default": "90s (default)",
      "x-order": 6
    },
    "pitch_shift_choice": {
      "enum": [
        "Down 5 semitones (-5)",
        "Down 3 semitones (-3)",
        "Down 2 semitones (-2)",
        "Down 1 semitone (-1)",
        "No shift (0)",
        "Up 1 semitone (+1)",
        "Up 2 semitones (+2)",
        "Up 3 semitones (+3)",
        "Up 5 semitones (+5)"
      ],
      "type": "string",
      "title": "pitch_shift_choice",
      "description": "Apply a musical semitone shift after conversion to match the song key.",
      "default": "No shift (0)",
      "x-order": 5
    }
  }
}
```
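Because every enum value must match exactly (including the `·` separator), it can be worth checking a payload client-side before spending an API call. A minimal sketch, using a hand-rolled check against a trimmed copy of the schema above (required fields plus one enum; not an official validator):

```python
# Trimmed copy of the input schema: required fields plus the
# quality_preset enum, enough to demonstrate the check.
SCHEMA = {
    "required": ["source", "reference"],
    "properties": {
        "quality_preset": {
            "enum": [
                "Turbo · 12 steps",
                "Fast · 18 steps",
                "Balanced · 24 steps",
                "Quality · 32 steps",
                "Studio · 40 steps",
            ]
        },
    },
}


def validate(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload passes."""
    errors = [
        f"missing required field: {field}"
        for field in SCHEMA["required"]
        if field not in payload
    ]
    for field, spec in SCHEMA["properties"].items():
        if field in payload and "enum" in spec and payload[field] not in spec["enum"]:
            errors.append(f"{field}: not one of the allowed choices")
    return errors


print(validate({"source": "https://example.com/a.wav"}))
# → ['missing required field: reference']
```

For production use, a full JSON Schema validator (e.g. the `jsonschema` package) can consume the schema above directly; the sketch only covers the two checks most likely to trip up a first call.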
Output schema
The shape of the response you’ll get when you run this model with an API.
Schema
```json
{
  "type": "string",
  "title": "Output",
  "format": "uri"
}
```
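Since the output is a single URI string pointing at the converted audio, a caller typically downloads it to disk. A minimal sketch using only the standard library (the destination file name is illustrative):

```python
from urllib.parse import urlparse
from urllib.request import urlretrieve


def save_output(output_uri: str, dest: str = "converted_vocal.wav") -> str:
    """Download the model's output URI to a local file and return the path."""
    # The output schema guarantees a URI-formatted string; reject
    # anything that is not a plain http(s) URL before fetching.
    if urlparse(output_uri).scheme not in ("http", "https"):
        raise ValueError("expected an http(s) URI")
    urlretrieve(output_uri, dest)  # network call
    return dest
```

Note that hosted output files are often short-lived, so downloading promptly after the run completes is usually the safest pattern.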