cjwbw / sadtalker

Stylized Audio-Driven Single Image Talking Face Animation

  • Public
  • 126.8K runs
  • A100 (80GB)
  • GitHub
  • Paper
  • License

Input

*file
source_image

Upload the source image; it can be a video (.mp4) or a picture (.png)

*file

Upload the driven audio; accepts .wav and .mp4 files

boolean

Use GFPGAN as face enhancer

Default: false

integer
(minimum: 0, maximum: 45)

Pose style

Default: 0

number

A larger value will make the expression motion stronger

Default: 1

boolean

Use eye blink

Default: true

string

Choose how to preprocess the images

Default: "crop"

integer

Face model resolution

Default: 256

string

Choose the face renderer

Default: "facevid2vid"

boolean

Still mode (less head motion; works with preprocess 'full')

Default: true
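
For programmatic use, here is a minimal sketch of calling the model with the inputs described above via the Replicate Python client (requires the REPLICATE_API_TOKEN environment variable). Only source_image is named on this page; the other input field names and the <version> placeholder are assumptions.

import replicate

# Minimal sketch. Only "source_image" is confirmed by this page; the other
# field names are assumptions, and <version> is a placeholder version hash.
output = replicate.run(
    "cjwbw/sadtalker:<version>",
    input={
        "source_image": open("portrait.png", "rb"),  # image or video source
        "driven_audio": open("speech.wav", "rb"),    # assumed name of the audio input
        "enhancer": False,         # assumed: use GFPGAN as face enhancer
        "pose_style": 0,           # assumed: integer in [0, 45]
        "expression_scale": 1,     # assumed: larger = stronger expression motion
        "preprocess": "crop",      # assumed: image preprocessing mode
        "still": True,             # assumed: still mode, pairs with preprocess "full"
    },
)
print(output)  # typically a URL to the generated talking-head video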

Output


Run time and cost

This model costs approximately $0.18 to run on Replicate, or 5 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 130 seconds. The predict time for this model varies significantly based on the inputs.
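
Running locally means starting the published container and posting to Cog's HTTP prediction API. The sketch below assumes the standard Replicate packaging; the docker run line, the @sha256:<version> tag, and the driven_audio field name are assumptions, not something stated on this page.

# Sketch of querying a locally running container, assuming standard
# Replicate/Cog packaging. Start the container first (version hash is a placeholder):
#   docker run -d -p 5000:5000 --gpus=all r8.im/cjwbw/sadtalker@sha256:<version>
import base64
import requests

def data_uri(path, mime):
    # Cog accepts file inputs as base64 data URIs over HTTP
    with open(path, "rb") as f:
        return f"data:{mime};base64," + base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:5000/predictions",
    json={"input": {
        "source_image": data_uri("portrait.png", "image/png"),
        "driven_audio": data_uri("speech.wav", "audio/wav"),  # assumed field name
    }},
)
resp.raise_for_status()
print(resp.json()["output"])  # data URI or URL of the generated video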

Readme

Original repo: https://github.com/OpenTalker/SadTalker


SadTalker (CVPR 2023)

TL;DR: single portrait image 🙎‍♂️ + audio 🎤 = talking head video 🎞

🛎 Citation

If you find our work useful in your research, please consider citing:

@article{zhang2022sadtalker,
  title={SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation},
  author={Zhang, Wenxuan and Cun, Xiaodong and Wang, Xuan and Zhang, Yong and Shen, Xi and Guo, Yu and Shan, Ying and Wang, Fei},
  journal={arXiv preprint arXiv:2211.12194},
  year={2022}
}

💗 Acknowledgements

Facerender code borrows heavily from zhanglonghao's reproduction of face-vid2vid and from PIRender; we thank the authors for sharing their wonderful code. In the training process, we also use models from Deep3DFaceReconstruction and Wav2Lip, and we thank their authors for their wonderful work.

See also the wonderful third-party libraries we use; the full list is in the original repo.

🥂 Extensions: see the original repo for the list of available extensions.

📢 Disclaimer

This is not an official product of Tencent. This repository can only be used for personal/research/non-commercial purposes.

Logo: color and font suggestions by ChatGPT; logo font: Montserrat Alternates.

All demo images and audio are copyrighted by community users or generated with Stable Diffusion. Feel free to contact us if you have any concerns.