zsxkib / v-express

🫦 Realistic facial expression manipulation (lip-syncing) using audio or video

  • Public
  • 1K runs
  • A100 (80GB)
  • GitHub
  • Paper
  • License
  • Prediction

    zsxkib/v-express:e0122658
    ID
    qvktt6bg6nrgp0cfye780czr88
    Status
    Succeeded
    Source
    Web
    Hardware
    A40
    Created
    by @zsxkib

    Input

    audio_path
    https://replicate.delivery/pbxt/L3WYDN9aRYHe4jqstFC1gNYslCrtmd3oAjGoCSMEDSAg8jsn/aud.mp3
    reference_image
    reference_image
    retarget_strategy
    fix_face
    num_inference_steps
    25
    audio_attention_weight
    3
    reference_attention_weight
    0.95

    Output

  • Prediction

    zsxkib/v-express:e0122658
    ID
    wqdtjwgdw1rgj0cg0zjv2de6t0
    Status
    Succeeded
    Source
    Web
    Hardware
    A40

    Input

    image_width
    512
    motion_mode
    fast
    image_height
    512
    driving_video
    context_stride
    1
    guidance_scale
    3.5
    context_overlap
    4
    reference_image
    reference_image
    use_video_audio
    frames_per_second
    30
    num_context_frames
    12
    num_inference_steps
    25
    audio_attention_weight
    3
    num_audio_padding_frames
    2
    reference_attention_weight
    0.95

    Output

  • Prediction

    zsxkib/v-express:e0122658
    ID
    qh4s7s8j91rgp0cg1jbt9025c8
    Status
    Succeeded
    Source
    Web
    Hardware
    A40 (Large)

    Input

    image_width
    512
    motion_mode
    standard
    image_height
    512
    driving_audio
    context_stride
    1
    guidance_scale
    3.5
    context_overlap
    4
    reference_image
    reference_image
    use_video_audio
    frames_per_second
    30
    num_context_frames
    12
    num_inference_steps
    10
    audio_attention_weight
    3
    num_audio_padding_frames
    2
    reference_attention_weight
    0.95

    Output

  • Prediction

    zsxkib/v-express:e0122658
    ID
    jwjjmrvv39rge0cg1jzswggdsg
    Status
    Succeeded
    Source
    Web
    Hardware
    A100 (40GB)

    Input

    image_width
    512
    motion_mode
    standard
    image_height
    512
    driving_audio
    context_stride
    1
    guidance_scale
    3.5
    context_overlap
    4
    reference_image
    reference_image
    use_video_audio
    frames_per_second
    30
    num_context_frames
    12
    num_inference_steps
    10
    audio_attention_weight
    3
    num_audio_padding_frames
    2
    reference_attention_weight
    0.95

    Output


Want to make some of these yourself?

Run this model
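
A minimal sketch of how a run like the second example prediction above might be assembled from Python. The parameter names and default values are copied from that example; `build_input` and `run_prediction` are hypothetical helpers written for illustration, not part of the model or of any library. The sketch assumes the `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set, and that `reference_image` and `driving_video` accept URLs. The page shows only the shortened version id `e0122658`, so the full version hash has to be taken from the model page. `use_video_audio` is left out because the examples do not show its value.

```python
import json

# Values copied from the second example prediction on this page.
EXAMPLE_INPUT = {
    "image_width": 512,
    "image_height": 512,
    "motion_mode": "fast",
    "context_stride": 1,
    "guidance_scale": 3.5,
    "context_overlap": 4,
    "frames_per_second": 30,
    "num_context_frames": 12,
    "num_inference_steps": 25,
    "audio_attention_weight": 3,
    "num_audio_padding_frames": 2,
    "reference_attention_weight": 0.95,
}


def build_input(reference_image, driving_video, **overrides):
    """Hypothetical helper: merge overrides into the example settings."""
    unknown = set(overrides) - set(EXAMPLE_INPUT)
    if unknown:
        # Reject names that do not appear in the example inputs.
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = dict(EXAMPLE_INPUT)
    payload.update(overrides)
    payload["reference_image"] = reference_image
    payload["driving_video"] = driving_video
    return payload


def run_prediction(model_ref, payload):
    """Hypothetical helper: submit the payload with the replicate client."""
    import replicate  # third-party; imported lazily so the sketch loads without it

    # model_ref must include the full version hash; the page shows only
    # the shortened form "zsxkib/v-express:e0122658".
    return replicate.run(model_ref, input=payload)


if __name__ == "__main__":
    # Placeholder URLs, assumed for illustration only.
    payload = build_input(
        "https://example.com/face.png",
        "https://example.com/driving.mp4",
        num_inference_steps=10,  # fewer steps, as in the third and fourth examples
    )
    print(json.dumps(payload, indent=2))
```

Keeping the payload construction separate from the network call makes it possible to inspect or log the exact settings before spending GPU time on a prediction.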