Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.
| Field | Type | Default value | Description |
|---|---|---|---|
| image | string | | Input portrait image. The resolution should be larger than 256x256; the image will be cropped to 256x256. |
| audio | string | | Input audio file. Files with the extensions wav, mp3, m4a, and mp4 (video with sound) should all be compatible. |
| style_clip | None | `data/style_clip/3DMM/M030_front_neutral_level1_001.mat` | Optional input style clip (`.mat` file). Specifies the reference speaking style. |
| pose | None | `data/pose/RichardShelby_front_neutral_level1_001.mat` | Input pose. Specifies the head pose and should be a `.mat` file. |
| max_gen_len | integer | 1000 | Maximum length, in seconds, of the generated video. |
| cfg_scale | number | 1 | Scale of classifier-free guidance. Adjusts the intensity of the speaking style. |
| num_inference_steps | integer | 10 (min: 1, max: 500) | Number of denoising steps. |
| crop_image | boolean | True | Enable cropping of the input image. If your portrait is already cropped to 256x256, set this to False. |
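
As a rough illustration of how the defaults in the table combine with caller-supplied values, the sketch below assembles an input dictionary in Python. The helper name `build_input` and the `DEFAULTS` mapping are hypothetical and not part of the model's API. Treating `image` and `audio` as required is an assumption based on their empty default cells, and only the documented 1 to 500 range of `num_inference_steps` is checked.

```python
from typing import Any, Dict

# Defaults copied from the input schema table; image and audio list none.
DEFAULTS: Dict[str, Any] = {
    "style_clip": "data/style_clip/3DMM/M030_front_neutral_level1_001.mat",
    "pose": "data/pose/RichardShelby_front_neutral_level1_001.mat",
    "max_gen_len": 1000,
    "cfg_scale": 1,
    "num_inference_steps": 10,
    "crop_image": True,
}


def build_input(image: str, audio: str, **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge caller overrides with the schema defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    # Later entries win, so overrides replace defaults field by field.
    payload = {**DEFAULTS, **overrides, "image": image, "audio": audio}

    # The table documents a range only for num_inference_steps (1 to 500).
    steps = payload["num_inference_steps"]
    if isinstance(steps, bool) or not isinstance(steps, int) or not 1 <= steps <= 500:
        raise ValueError("num_inference_steps must be an integer from 1 to 500")
    return payload


if __name__ == "__main__":
    # Placeholder file names; any field left out keeps its default.
    print(build_input("portrait.png", "speech.wav", cfg_scale=2))
```

The resulting dictionary is the kind of value that would be sent as the model input in an API request; how the `image` and `audio` strings are supplied (for example as URLs) depends on the client you use.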
            
              
                
              
            
Output schema

The shape of the response you’ll get when you run this model with an API.

Schema

{"format": "uri", "title": "Output", "type": "string"}
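
Since the schema describes the output as a single string in URI format, a caller may want to sanity-check the response before using it. The following is a minimal sketch, not a full JSON Schema validator: the function name `check_output` is hypothetical, and the check only confirms that the value is a string with a URI scheme.

```python
from urllib.parse import urlparse

# The output schema from this page, written as a Python dict.
OUTPUT_SCHEMA = {"format": "uri", "title": "Output", "type": "string"}


def check_output(value: object) -> str:
    """Hypothetical helper: loosely check a response against OUTPUT_SCHEMA."""
    if not isinstance(value, str):
        raise TypeError(f"expected a string, got {type(value).__name__}")
    # Loose URI check (an assumption of this sketch): require a scheme
    # such as "https".
    if not urlparse(value).scheme:
        raise ValueError(f"not a URI: {value!r}")
    return value
```

For example, `check_output("https://example.com/out.mp4")` returns the string unchanged, while a bare file name such as `"out.mp4"` raises `ValueError`. The example URL is a placeholder, not a real output of the model.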