subhash25rawat/tts | API reference

Run subhash25rawat/tts with an API

Use one of our client libraries to get started quickly. Clicking on a library will take you to the Playground tab where you can tweak different inputs, see the results, and copy the corresponding code to use in your own project.

Input schema

The fields you can use to run this model with an API. If you don't give a value for a field, its default value will be used.

Field	Type	Default value	Description
ref_audio_input	string (URI)	(required)	The reference audio to be copied
ref_text_input	string	"" (empty string)	Word-for-word transcription of the reference audio
gen_text_input	string	(required)	The text for which audio needs to be generated
remove_silence	boolean	false	Remove silences from the generated audio
cross_fade_duration	number	1 (min: 0.3, max: 2)	(no description provided)
nfe_steps	integer	32 (min: 4, max: 64)	Number of denoising steps
speed	number	1 (min: 0.3, max: 2)	Speed-up factor of the generated audio
model	string (enum)	E2-TTS (options: E2-TTS, F5-TTS, Hindi-TTS)	An enumeration.

{
  "type": "object",
  "title": "Input",
  "required": [
    "ref_audio_input",
    "gen_text_input"
  ],
  "properties": {
    "model": {
      "enum": [
        "E2-TTS",
        "F5-TTS",
        "Hindi-TTS"
      ],
      "type": "string",
      "title": "model",
      "description": "An enumeration.",
      "default": "E2-TTS",
      "x-order": 7
    },
    "speed": {
      "type": "number",
      "title": "Speed",
      "default": 1,
      "maximum": 2,
      "minimum": 0.3,
      "x-order": 6,
      "description": "The speed up factor of the generated audio"
    },
    "nfe_steps": {
      "type": "integer",
      "title": "Nfe Steps",
      "default": 32,
      "maximum": 64,
      "minimum": 4,
      "x-order": 5,
      "description": "Number of denoising steps"
    },
    "gen_text_input": {
      "type": "string",
      "title": "Gen Text Input",
      "x-order": 2,
      "description": "the text for which audio need to be generated"
    },
    "ref_text_input": {
      "type": "string",
      "title": "Ref Text Input",
      "default": "",
      "x-order": 1,
      "description": "the word to word transcription of the reference audio"
    },
    "remove_silence": {
      "type": "boolean",
      "title": "Remove Silence",
      "default": false,
      "x-order": 3,
      "description": "remove silences from the generated audio"
    },
    "ref_audio_input": {
      "type": "string",
      "title": "Ref Audio Input",
      "format": "uri",
      "x-order": 0,
      "description": "the audio to be copied"
    },
    "cross_fade_duration": {
      "type": "number",
      "title": "Cross Fade Duration",
      "default": 1,
      "maximum": 2,
      "minimum": 0.3,
      "x-order": 4
    }
  }
}
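
As a rough illustration of how the schema above might be applied on the client side, the sketch below fills in defaults and checks the required fields, numeric ranges, and enum options before a request is sent. The helper name `build_input` and the example URL are hypothetical and not part of the model's API; only the field names, defaults, and limits are taken from the schema.

```python
from typing import Any

# Constraints copied from the input schema above.
REQUIRED = ("ref_audio_input", "gen_text_input")
DEFAULTS: dict[str, Any] = {
    "ref_text_input": "",
    "remove_silence": False,
    "cross_fade_duration": 1,
    "nfe_steps": 32,
    "speed": 1,
    "model": "E2-TTS",
}
RANGES = {
    "cross_fade_duration": (0.3, 2),
    "nfe_steps": (4, 64),
    "speed": (0.3, 2),
}
MODELS = ("E2-TTS", "F5-TTS", "Hindi-TTS")


def build_input(**fields: Any) -> dict[str, Any]:
    """Hypothetical helper: fill defaults and check schema constraints."""
    unknown = set(fields) - set(REQUIRED) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    missing = [name for name in REQUIRED if name not in fields]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    # Caller-supplied values override the schema defaults.
    payload = {**DEFAULTS, **fields}

    for name, (low, high) in RANGES.items():
        if not low <= payload[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    if not isinstance(payload["nfe_steps"], int):
        raise ValueError("nfe_steps must be an integer")
    if payload["model"] not in MODELS:
        raise ValueError(f"model must be one of {MODELS}")
    return payload


# Example usage; the URL is a placeholder, not a real reference clip.
payload = build_input(
    ref_audio_input="https://example.com/reference.wav",
    gen_text_input="Hello from the generated voice.",
    model="F5-TTS",
)
```

The resulting `payload` dictionary has the shape the input schema describes, so it could be passed as the input of whichever client library you use.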

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema

{
  "type": "string",
  "title": "Output",
  "format": "uri"
}
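
Because the output is a single URI string, the generated audio has to be fetched in a separate step. The following is a minimal sketch using only the Python standard library; the function name `save_output` and the destination filename are assumptions made for illustration, and the actual audio format behind the URI is not stated in the schema.

```python
import shutil
import urllib.request
from pathlib import Path


def save_output(output_uri: str, dest: str) -> Path:
    """Stream the content behind `output_uri` into the file `dest`."""
    target = Path(dest)
    # urlopen returns a file-like response; copy it to disk in chunks.
    with urllib.request.urlopen(output_uri) as response, target.open("wb") as fh:
        shutil.copyfileobj(response, fh)
    return target
```

A call such as `save_output(output, "generated_audio.wav")` would then write the result to disk, where `output` is the URI string returned by the API and the `.wav` extension is a guess.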