jarvissan22/diarization-and-speaker-embedding

Public
525 runs

Run jarvissan22/diarization-and-speaker-embedding with an API

Use one of our client libraries to get started quickly. Clicking on a library will take you to the Playground tab where you can tweak different inputs, see the results, and copy the corresponding code to use in your own project.
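If you prefer not to use a client library, a prediction can also be created over plain HTTP. The sketch below uses only the Python standard library; the endpoint URL and header shapes are assumptions based on Replicate's public HTTP API, and the input field names come from the schema on this page.

```python
import json
import os
import urllib.request

def build_prediction_payload(file_url, num_speakers=None, language="ja"):
    """Build the JSON body for a prediction request.

    Field names (file_url, num_speakers, language) come from the model's
    input schema; num_speakers is omitted to let the model autodetect.
    """
    inputs = {"file_url": file_url, "language": language}
    if num_speakers is not None:
        inputs["num_speakers"] = num_speakers
    return {"input": inputs}

def run_prediction(payload, token):
    # Assumed endpoint shape for running a model by name on Replicate's
    # HTTP API; check the API reference before relying on it.
    req = urllib.request.Request(
        "https://api.replicate.com/v1/models/"
        "jarvissan22/diarization-and-speaker-embedding/predictions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_prediction_payload("https://example.com/audio.wav", num_speakers=2)
# Actually sending the request requires a valid API token, e.g.:
# run_prediction(payload, os.environ["REPLICATE_API_TOKEN"])
```

The network call is left commented out because it needs a Replicate API token and a reachable audio URL.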

Input schema

The fields you can use to run this model with an API. If you don't give a value for a field, its default value will be used.

file_string (string)
Base64-encoded audio file.

file_url (string)
A direct audio file URL.

file (string)
An audio file.

hf_token (string)
Provide a hf.co/settings/token for Pyannote.audio. You need to agree to the terms at 'https://huggingface.co/pyannote/speaker-diarization-3.1' and 'https://huggingface.co/pyannote/segmentation-3.0' first.

num_speakers (integer, min: 1, max: 50)
Number of speakers; leave empty to autodetect.

min_speakers (integer, default: 1, min: 1, max: 50)
Minimum number of speakers.

max_speakers (integer, default: 10, min: 1, max: 50)
Maximum number of speakers.

language (string, default: ja)
Language of the spoken words as a language code like 'ja'. Leave empty to auto-detect the language.

batch_size (integer, default: 64, min: 1)
Batch size for inference. (Reduce this if you hit an out-of-memory error.)

offset_seconds (number, default: 0)
Offset in seconds, used for chunked inputs.
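As a minimal sketch of preparing these inputs: the schema accepts audio either by URL (file_url) or inline as base64 text (file_string), and the speaker fields are bounded to 1..50. The helper names below are illustrative, not part of the API.

```python
import base64

# Documented bounds for num_speakers / min_speakers / max_speakers.
SPEAKER_MIN, SPEAKER_MAX = 1, 50

def encode_audio(audio_bytes: bytes) -> str:
    """Return the base64 text expected by the file_string field."""
    return base64.b64encode(audio_bytes).decode("ascii")

def clamp_speakers(n: int) -> int:
    """Keep a speaker count inside the schema's 1..50 range."""
    return max(SPEAKER_MIN, min(SPEAKER_MAX, n))

def build_input(audio_bytes: bytes, min_speakers: int = 1, max_speakers: int = 10):
    """Assemble an input dict using the field names from the schema above."""
    if min_speakers > max_speakers:
        raise ValueError("min_speakers must not exceed max_speakers")
    return {
        "file_string": encode_audio(audio_bytes),
        "min_speakers": clamp_speakers(min_speakers),
        "max_speakers": clamp_speakers(max_speakers),
    }
```

Clamping client-side is optional; it simply avoids a validation error from the API when a value falls outside the documented range.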

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{
  "type": "object",
  "title": "DiarizationEmbeddingOutput",
  "required": [
    "diarization_segments",
    "transcript_segments",
    "speaker_embeddings",
    "speaker_info"
  ],
  "properties": {
    "language": {
      "type": "string",
      "title": "Language"
    },
    "speaker_info": {
      "type": "array",
      "items": {},
      "title": "Speaker Info"
    },
    "speaker_count": {
      "type": "integer",
      "title": "Speaker Count"
    },
    "audio_duration": {
      "type": "number",
      "title": "Audio Duration"
    },
    "processing_time": {
      "type": "number",
      "title": "Processing Time"
    },
    "speaker_embeddings": {
      "type": "object",
      "title": "Speaker Embeddings",
      "additionalProperties": true
    },
    "transcript_segments": {
      "type": "array",
      "items": {},
      "title": "Transcript Segments"
    },
    "diarization_segments": {
      "type": "array",
      "items": {},
      "title": "Diarization Segments"
    }
  }
}
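Since the schema marks four top-level fields as required, a decoded response can be sanity-checked before use. This sketch relies only on the required list and JSON types shown above; the nested structure of each segment is unspecified in the schema (`"items": {}`), so it is not validated here.

```python
# Required top-level fields and their JSON types, taken from the
# DiarizationEmbeddingOutput schema (array -> list, object -> dict).
REQUIRED_FIELDS = {
    "diarization_segments": list,
    "transcript_segments": list,
    "speaker_embeddings": dict,
    "speaker_info": list,
}

def validate_output(response: dict) -> list:
    """Return a list of problems; an empty list means the response looks valid."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in response:
            problems.append(f"missing required field: {field}")
        elif not isinstance(response[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    return problems
```

Optional fields such as `language`, `speaker_count`, `audio_duration`, and `processing_time` may be absent, so they are intentionally not checked.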