victor-upmeet/whisperx:5dfdddf6

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field its default value will be used.
| Field | Type | Default value | Description |
|---|---|---|---|
| audio_file | string | | Audio file |
| language | string | | ISO code of the language spoken in the audio, specify None to perform language detection |
| language_detection_min_prob | number | 0 | If language is not specified, then the language will be detected recursively on different parts of the file until it reaches the given probability |
| language_detection_max_tries | integer | 5 | If language is not specified, then the language will be detected following the logic of language_detection_min_prob parameter, but will stop after the given max retries. If max retries is reached, the most probable language is kept. |
| initial_prompt | string | | Optional text to provide as a prompt for the first window |
| batch_size | integer | 64 | Parallelization of input audio transcription |
| temperature | number | 0 | Temperature to use for sampling |
| vad_onset | number | 0.5 | VAD onset |
| vad_offset | number | 0.363 | VAD offset |
| align_output | boolean | False | Aligns whisper output to get accurate word-level timestamps |
| diarization | boolean | False | Assign speaker ID labels |
| huggingface_access_token | string | | To enable diarization, please enter your HuggingFace token (read). You need to accept the user agreement for the models specified in the README. |
| min_speakers | integer | | Minimum number of speakers if diarization is activated (leave blank if unknown) |
| max_speakers | integer | | Maximum number of speakers if diarization is activated (leave blank if unknown) |
| debug | boolean | False | Print out compute/inference times and memory usage information |
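
As a rough illustration of how the defaults in the table combine with values you supply, the sketch below builds an input payload locally, without calling any API. `DEFAULTS`, `NO_DEFAULT_FIELDS`, and `build_input` are hypothetical helper names, not part of the model or of any client library. The default values are copied from the table; the sketch assumes that fields listed without a default can simply be left out, and the audio URL is a placeholder.

```python
# Defaults copied from the "Default value" column of the input schema table.
DEFAULTS = {
    "language_detection_min_prob": 0,
    "language_detection_max_tries": 5,
    "batch_size": 64,
    "temperature": 0,
    "vad_onset": 0.5,
    "vad_offset": 0.363,
    "align_output": False,
    "diarization": False,
    "debug": False,
}

# Fields the table lists without a default value.
NO_DEFAULT_FIELDS = {
    "audio_file",
    "language",
    "initial_prompt",
    "huggingface_access_token",
    "min_speakers",
    "max_speakers",
}


def build_input(audio_file, **overrides):
    """Return an input dict: table defaults, overridden by the caller's values."""
    unknown = set(overrides) - set(DEFAULTS) - NO_DEFAULT_FIELDS
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    payload = dict(DEFAULTS)
    payload["audio_file"] = audio_file
    # Values passed as None are ignored, so the default (if any) is kept.
    payload.update({k: v for k, v in overrides.items() if v is not None})
    # Per the huggingface_access_token description, diarization needs a token.
    if payload["diarization"] and not payload.get("huggingface_access_token"):
        raise ValueError("diarization=True requires huggingface_access_token")
    return payload


if __name__ == "__main__":
    # Placeholder URL, for illustration only.
    example = build_input("https://example.com/audio.wav", language="en", batch_size=32)
    print(example)
```

A dict like this is the kind of value a client would send as the model's input, for example through the `input` argument of the Replicate Python client's `replicate.run`. That call is left out here because it needs network access and an API token.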
      
            
              
                
              
            
Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
            
{'properties': {'detected_language': {'title': 'Detected Language',
                                      'type': 'string'},
                'segments': {'title': 'Segments'}},
 'required': ['detected_language'],
 'title': 'Output',
 'type': 'object'}
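
To make the schema concrete, here is a minimal sketch of checking a response against it. `parse_output` is a hypothetical helper, and the sample response is made up for illustration. The schema gives `segments` no type and does not list it as required, so the sketch passes it through unchanged instead of assuming a structure.

```python
def parse_output(output):
    """Check a response against the output schema and return (language, segments)."""
    if not isinstance(output, dict):
        raise TypeError("output must be an object (dict)")
    if "detected_language" not in output:
        raise ValueError("missing required field: detected_language")
    language = output["detected_language"]
    if not isinstance(language, str):
        raise TypeError("detected_language must be a string")
    # 'segments' is untyped and optional in the schema, so it is returned
    # untouched (None when absent).
    return language, output.get("segments")


if __name__ == "__main__":
    # Made-up response; the contents of each segment are an assumption.
    sample = {"detected_language": "en", "segments": [{"text": "hello"}]}
    language, segments = parse_output(sample)
    print(language, segments)
```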