cottom/svc
Public · 3 runs

Run cottom/svc with an API
Use one of our client libraries to get started quickly. Clicking on a library will take you to the Playground tab where you can tweak different inputs, see the results, and copy the corresponding code to use in your own project.
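As a minimal sketch of what such a call looks like with the Replicate Python client (`pip install replicate`): the payload below uses the fields documented in the input schema, with illustrative placeholder URLs, and only runs the API call if a `REPLICATE_API_TOKEN` is configured.

```python
import os

# Input payload mirroring the model's input schema.
# The two URLs are placeholders; point them at your own audio files.
input_payload = {
    "source": "https://example.com/lead_vocal.wav",        # required
    "reference": "https://example.com/reference_timbre.wav",  # required
    "quality_preset": "Balanced · 24 steps",
    "timbre_blend": "Balanced · CFG 0.70",
    "auto_f0_adjust": True,
    "pitch_shift_choice": "No shift (0)",
    "max_length_choice": "90s (default)",
}

# Only call the API when a token is available.
if os.environ.get("REPLICATE_API_TOKEN"):
    import replicate

    output = replicate.run("cottom/svc", input=input_payload)
    print(output)  # a URI string, per the output schema
```

Omitted optional fields fall back to the defaults listed in the table below; only `source` and `reference` must be provided.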
Input schema
The fields you can use to run this model with an API. If you don't give a value for a field, its default value is used.
| Field | Type | Default value | Description |
|---|---|---|---|
| source | string | (required) | Lead vocal or dry stem that should be re-sung. |
| reference | string | (required) | Reference vocal timbre (1-30 seconds of clean solo vocals). |
| quality_preset | string | Balanced · 24 steps | Trade quality for speed via diffusion steps. |
| timbre_blend | string | Balanced · CFG 0.70 | How strongly to follow the reference timbre versus the source phrasing. |
| auto_f0_adjust | boolean | true | Align pitch centre to the reference to keep tone natural for covers. |
| pitch_shift_choice | string | No shift (0) | Apply a musical semitone shift after conversion to match the song key. |
| max_length_choice | string | 90s (default) | Limit the source duration to save inference cost. |
```json
{
  "type": "object",
  "title": "Input",
  "required": [
    "source",
    "reference"
  ],
  "properties": {
    "source": {
      "type": "string",
      "title": "Source",
      "format": "uri",
      "x-order": 0,
      "description": "Lead vocal or dry stem that should be re-sung."
    },
    "reference": {
      "type": "string",
      "title": "Reference",
      "format": "uri",
      "x-order": 1,
      "description": "Reference vocal timbre (1-30 seconds of clean solo vocals)."
    },
    "timbre_blend": {
      "enum": [
        "Source-led \u00b7 CFG 0.50",
        "Balanced \u00b7 CFG 0.70",
        "Reference-locked \u00b7 CFG 0.95"
      ],
      "type": "string",
      "title": "timbre_blend",
      "description": "How strongly to follow the reference timbre versus the source phrasing.",
      "default": "Balanced \u00b7 CFG 0.70",
      "x-order": 3
    },
    "auto_f0_adjust": {
      "type": "boolean",
      "title": "Auto F0 Adjust",
      "default": true,
      "x-order": 4,
      "description": "Align pitch centre to the reference to keep tone natural for covers."
    },
    "quality_preset": {
      "enum": [
        "Turbo \u00b7 12 steps",
        "Fast \u00b7 18 steps",
        "Balanced \u00b7 24 steps",
        "Quality \u00b7 32 steps",
        "Studio \u00b7 40 steps"
      ],
      "type": "string",
      "title": "quality_preset",
      "description": "Trade quality for speed via diffusion steps.",
      "default": "Balanced \u00b7 24 steps",
      "x-order": 2
    },
    "max_length_choice": {
      "enum": [
        "60s",
        "90s (default)",
        "120s",
        "180s",
        "No cap"
      ],
      "type": "string",
      "title": "max_length_choice",
      "description": "Limit the source duration to save inference cost.",
      "default": "90s (default)",
      "x-order": 6
    },
    "pitch_shift_choice": {
      "enum": [
        "Down 5 semitones (-5)",
        "Down 3 semitones (-3)",
        "Down 2 semitones (-2)",
        "Down 1 semitone (-1)",
        "No shift (0)",
        "Up 1 semitone (+1)",
        "Up 2 semitones (+2)",
        "Up 3 semitones (+3)",
        "Up 5 semitones (+5)"
      ],
      "type": "string",
      "title": "pitch_shift_choice",
      "description": "Apply a musical semitone shift after conversion to match the song key.",
      "default": "No shift (0)",
      "x-order": 5
    }
  }
}
```
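Because every enum value must match exactly (including the `·` separator), it can be worth checking a payload client-side before spending an API call. A minimal sketch, using a hand-rolled check against a trimmed copy of the schema above (required fields plus one enum; not an official validator):

```python
# Trimmed copy of the input schema: required fields plus the
# quality_preset enum, enough to demonstrate the check.
SCHEMA = {
    "required": ["source", "reference"],
    "properties": {
        "quality_preset": {
            "enum": [
                "Turbo · 12 steps",
                "Fast · 18 steps",
                "Balanced · 24 steps",
                "Quality · 32 steps",
                "Studio · 40 steps",
            ]
        },
    },
}


def validate(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload passes."""
    errors = [
        f"missing required field: {field}"
        for field in SCHEMA["required"]
        if field not in payload
    ]
    for field, spec in SCHEMA["properties"].items():
        if field in payload and "enum" in spec and payload[field] not in spec["enum"]:
            errors.append(f"{field}: not one of the allowed choices")
    return errors


print(validate({"source": "https://example.com/a.wav"}))
# → ['missing required field: reference']
```

For production use, a full JSON Schema validator (e.g. the `jsonschema` package) can consume the schema above directly; the sketch only covers the two checks most likely to trip up a first call.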
Output schema
The shape of the response you’ll get when you run this model with an API.
Schema
```json
{
  "type": "string",
  "title": "Output",
  "format": "uri"
}
```
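Since the output is a single URI string pointing at the converted audio, a caller typically downloads it to disk. A minimal sketch using only the standard library (the destination file name is illustrative):

```python
from urllib.parse import urlparse
from urllib.request import urlretrieve


def save_output(output_uri: str, dest: str = "converted_vocal.wav") -> str:
    """Download the model's output URI to a local file and return the path."""
    # The output schema guarantees a URI-formatted string; reject
    # anything that is not a plain http(s) URL before fetching.
    if urlparse(output_uri).scheme not in ("http", "https"):
        raise ValueError("expected an http(s) URI")
    urlretrieve(output_uri, dest)  # network call
    return dest
```

Note that hosted output files are often short-lived, so downloading promptly after the run completes is usually the safest pattern.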