cudanexus / makeittalk
Make any image talk. The input image must contain a human face and must be exactly 256x256 pixels.
- Public
- 8.1K runs
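The 256x256 requirement above can be checked locally before an image is uploaded. The following is a minimal sketch, assuming the image is a PNG file (the sample inputs on this page are JPEGs, which would need a different parser or an imaging library). The helper names `png_size` and `has_required_size` are hypothetical and are not part of the model or of Replicate's client.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
REQUIRED_SIZE = (256, 256)  # taken from the model description


def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are big-endian 32-bit integers at bytes 16..23.
    return struct.unpack(">II", data[16:24])


def has_required_size(data: bytes) -> bool:
    """True if the PNG is exactly 256x256 pixels."""
    return png_size(data) == REQUIRED_SIZE
```

Reading only the first 24 bytes of the file is enough for this check, for example `has_required_size(open("face.png", "rb").read(24))`, where `face.png` is a placeholder path.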
Prediction
cudanexus/makeittalk:e63aa3e0830945d12340aba53c63e27288b5705eec0c8ea0db5b144c5d64dbf6
Input
- audio
{ "audio": "https://replicate.delivery/pbxt/IX9TfC9r2AynlMcx83YA1skJ10yuozM6n8egzxCnKcaWqI1P/tmp.wav", "image": "https://replicate.delivery/pbxt/IX9TfcEGOZaZ5tIXgdg8uxdwDI9ywoX1IKCrruU7LSU24bIp/taylor.jpg" }
Install Replicate’s Node.js client library:
    npm install replicate
Import and set up the client:
    import Replicate from "replicate";
    import { writeFile } from "node:fs/promises";

    const replicate = new Replicate({
      auth: process.env.REPLICATE_API_TOKEN,
    });
Run cudanexus/makeittalk using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
    const output = await replicate.run(
      "cudanexus/makeittalk:e63aa3e0830945d12340aba53c63e27288b5705eec0c8ea0db5b144c5d64dbf6",
      {
        input: {
          audio: "https://replicate.delivery/pbxt/IX9TfC9r2AynlMcx83YA1skJ10yuozM6n8egzxCnKcaWqI1P/tmp.wav",
          image: "https://replicate.delivery/pbxt/IX9TfcEGOZaZ5tIXgdg8uxdwDI9ywoX1IKCrruU7LSU24bIp/taylor.jpg"
        }
      }
    );

    // To access the file URL:
    console.log(output.url()); //=> "http://example.com"

    // To write the file to disk (this model returns an MP4 video):
    await writeFile("output.mp4", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:
    pip install replicate
Import the client:
    import replicate
Run cudanexus/makeittalk using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
    output = replicate.run(
        "cudanexus/makeittalk:e63aa3e0830945d12340aba53c63e27288b5705eec0c8ea0db5b144c5d64dbf6",
        input={
            "audio": "https://replicate.delivery/pbxt/IX9TfC9r2AynlMcx83YA1skJ10yuozM6n8egzxCnKcaWqI1P/tmp.wav",
            "image": "https://replicate.delivery/pbxt/IX9TfcEGOZaZ5tIXgdg8uxdwDI9ywoX1IKCrruU7LSU24bIp/taylor.jpg"
        }
    )
    print(output)
To learn more, take a look at the guide on getting started with Python.
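The Python snippet only prints the result. A minimal sketch of saving the generated video locally follows, assuming the output is available as a URL string (as in the sample prediction JSON on this page; newer client versions may return a file-like object instead, in which case `str(output)` or the object's URL would be needed). The helper names `filename_from_url` and `download` are hypothetical.

```python
import os
import posixpath
import urllib.parse
import urllib.request


def filename_from_url(url: str) -> str:
    """Use the last path segment of the URL as the local file name."""
    path = urllib.parse.urlparse(url).path
    # Fall back to a generic name if the URL has no file segment.
    return posixpath.basename(path) or "output.mp4"


def download(url: str, directory: str = ".") -> str:
    """Fetch the file at `url` into `directory` and return the local path."""
    target = os.path.join(directory, filename_from_url(url))
    urllib.request.urlretrieve(url, target)
    return target
```

With the Python snippet above, this could be called as `download(str(output))` once the prediction has finished.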
Run cudanexus/makeittalk using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
    curl -s -X POST \
      -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
      -H "Content-Type: application/json" \
      -H "Prefer: wait" \
      -d $'{
        "version": "cudanexus/makeittalk:e63aa3e0830945d12340aba53c63e27288b5705eec0c8ea0db5b144c5d64dbf6",
        "input": {
          "audio": "https://replicate.delivery/pbxt/IX9TfC9r2AynlMcx83YA1skJ10yuozM6n8egzxCnKcaWqI1P/tmp.wav",
          "image": "https://replicate.delivery/pbxt/IX9TfcEGOZaZ5tIXgdg8uxdwDI9ywoX1IKCrruU7LSU24bIp/taylor.jpg"
        }
      }' \
      https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
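For readers who prefer to call the HTTP endpoint from Python without the client library, the curl command above can be mirrored with the standard library. This is a sketch under the assumption that the endpoint, headers, and body are exactly those in the curl example. `build_request` and `send` are hypothetical helper names, and `send` is defined but not called here.

```python
import json
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"
VERSION = (
    "cudanexus/makeittalk:"
    "e63aa3e0830945d12340aba53c63e27288b5705eec0c8ea0db5b144c5d64dbf6"
)


def build_request(audio_url: str, image_url: str, token: str) -> urllib.request.Request:
    """Build the same POST request that the curl example sends."""
    body = {"version": VERSION, "input": {"audio": audio_url, "image": image_url}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Prefer": "wait",  # as in the curl example
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON prediction object."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.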
Output
{ "completed_at": "2023-03-25T03:39:45.154419Z", "created_at": "2023-03-25T03:39:28.329495Z", "data_removed": false, "error": null, "id": "qhnue6oddjeyrbjzijirxvt4jq", "input": { "audio": "https://replicate.delivery/pbxt/IX9TfC9r2AynlMcx83YA1skJ10yuozM6n8egzxCnKcaWqI1P/tmp.wav", "image": "https://replicate.delivery/pbxt/IX9TfcEGOZaZ5tIXgdg8uxdwDI9ywoX1IKCrruU7LSU24bIp/taylor.jpg" }, "logs": "/tmp/tmprp0g9geptmp.wav\n/tmp/tmpo7a2ujzdtaylor.jpg\nAudio-----> tmprp0g9geptmp.wav\nParameters===== tmprp0g9geptmp.wav 16000 [95 75 80 ... 25 30 35]\nLoaded the voice encoder model on cuda in 0.02 seconds.\nProcessing audio file tmprp0g9geptmp.wav\n0 out of 0 are in this portion\nLoaded the voice encoder model on cuda in 0.08 seconds.\nsource shape: torch.Size([1, 320, 80]) torch.Size([1, 256]) torch.Size([1, 256]) torch.Size([1, 320, 257])\nconverted shape: torch.Size([1, 320, 80])torch.Size([1, 640])\nRun on device: cuda\nLoading Data random_val\nEVAL num videos: 1\nG: Running on cuda, total num params = 3.00M\n======== LOAD PRETRAINED FACE ID MODEL examples/ckpt/ckpt_speaker_branch.pth =========\n======== LOAD PRETRAINED FACE ID MODEL examples/ckpt/ckpt_content_branch.pth 
=========\n====================================\n48uYS3bHIA8\nYAZuSHvwVC0\n0yaLdVk_UyQ\nE_kmpT-EfOg\nfQR31F7L3ww\nJPMZAOGGHh8\nW6uRNCJmdtI\n2KL8PfQPmBg\np575B7k07a8\niUoAe2gXKE4\nHH-iOC056aQ\nS8fiWqrZEew\nROWN2ssXek8\nirx71tYyI-Q\nme6cdZCM2FY\nOkqHtWOFliM\nOfPKHc6w2vw\n1lh57VnuaKE\n_ldiVrXgZKc\nH1Xnb_rtgqY\n45hn7-LXDX8\nbs7ZWVqAGCU\nUElg0R7fmlk\nbCs5SoifsiY\n1Lx_ZqrK1bM\nRrnL6Pcjjbw\nsRbWv2R2hxE\nwJmdE0G4sEg\nhE-4e1vEiT8\nXXbxe3fCQqg\n02HOKnTjBlQ\nwAAMEC1OsRc\n7Sk--XzX8b0\nI5Lm0Qce5kg\nqLxfiUMYgQg\n_VpqWkdcaqM\nljIkW4uVVQY\n5m5iPZNJS6c\nJ-NPsvtQ8lE\ngOrQyrbptGo\n43BiUVlNy58\nswLghyvhoqA\nX3FCAoFnmdA\n2NiCRAmwoc4\nKVUf0J2LAaA\nYtZS9hH1j24\n5fZj9Fzi5K0\nwbWKG26ebMw\nQgNlXur0wrs\nqek_5m1MRik\nrmFsUV5ICKk\nbEdGv1wixF4\nljh5PB6Utsc\nizudwWTXuUk\nB08yOvYMF7Y\nUEmI4r5G-5Y\nScujgl9GbHA\nsxCbrYjBsGA\nqvQC0w3y_Fo\nbXpavyiCu10\niWeklsXc0H8\nH00oAfd_GsM\nZ7WRt--g-h4\n29k8RtSUjE0\nE0zgrhQ0QDw\n9KhvSxKE6Mc\nqLNvRwMkhik\n====================================\nOpenCV: FFMPEG: tag 0x47504a4d/'MJPG' is not supported with codec id 7 and format 'mp4 / MP4 (MPEG-4 Part 14)'\nOpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'\nexamples/tmprp0g9geptmp.wav\nffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers\nbuilt with gcc 11 (Ubuntu 11.2.0-19ubuntu1)\nconfiguration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy 
--enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared\nlibavutil 56. 70.100 / 56. 70.100\nlibavcodec 58.134.100 / 58.134.100\nlibavformat 58. 76.100 / 58. 76.100\nlibavdevice 58. 13.100 / 58. 13.100\nlibavfilter 7.110.100 / 7.110.100\nlibswscale 5. 9.100 / 5. 9.100\nlibswresample 3. 9.100 / 3. 9.100\nlibpostproc 55. 9.100 / 55. 9.100\nInput #0, mov,mp4,m4a,3gp,3g2,mj2, from 'examples/tmp.mp4':\nMetadata:\nmajor_brand : isom\nminor_version : 512\ncompatible_brands: isomiso2mp41\nencoder : Lavf59.27.100\nDuration: 00:00:04.59, start: 0.000000, bitrate: 5588 kb/s\nStream #0:0(und): Video: mjpeg (Baseline) (mp4v / 0x7634706D), yuvj420p(pc, bt470bg/unknown/unknown), 400x400, 5585 kb/s, 62.50 fps, 62.50 tbr, 10k tbn, 10k tbc (default)\nMetadata:\nhandler_name : VideoHandler\nvendor_id : [0][0][0][0]\nGuessed Channel Layout for Input Stream #1.0 : mono\nInput #1, wav, from 'examples/tmprp0g9geptmp.wav':\nDuration: 00:00:04.88, bitrate: 256 kb/s\nStream #1:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s\nStream mapping:\nStream #0:0 -> #0:0 (mjpeg (native) -> h264 (libx264))\nStream #1:0 -> #0:1 (pcm_s16le (native) -> aac (native))\nPress [q] to stop, [?] 
for help\n[libx264 @ 0x563dd5b04040] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2\n[libx264 @ 0x563dd5b04040] profile High, level 3.0, 4:2:0, 8-bit\n[libx264 @ 0x563dd5b04040] 264 - core 163 r3060 5db6aa6 - H.264/MPEG-4 AVC codec - Copyleft 2003-2021 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=12 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00\nOutput #0, mp4, to 'examples/tmprp0g9geptmp_av.mp4':\nMetadata:\nmajor_brand : isom\nminor_version : 512\ncompatible_brands: isomiso2mp41\nencoder : Lavf58.76.100\nStream #0:0(und): Video: h264 (avc1 / 0x31637661), yuvj420p(pc, bt470bg/unknown/unknown, progressive), 400x400, q=2-31, 62.50 fps, 16k tbn (default)\nMetadata:\nhandler_name : VideoHandler\nvendor_id : [0][0][0][0]\nencoder : Lavc58.134.100 libx264\nSide data:\ncpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A\nStream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 16000 Hz, mono, fltp, 69 kb/s\nMetadata:\nencoder : Lavc58.134.100 aac\nframe= 1 fps=0.0 q=0.0 size= 0kB time=00:00:00.00 bitrate=N/A speed= 0x\nframe= 256 fps=0.0 q=32.0 size= 0kB time=00:00:03.15 bitrate= 0.1kbits/s speed= 6.2x\nframe= 287 fps=0.0 q=-1.0 Lsize= 236kB time=00:00:04.54 bitrate= 426.4kbits/s speed=6.84x\nvideo:192kB audio:39kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2.409990%\n[libx264 @ 0x563dd5b04040] frame I:2 Avg QP:10.18 size: 4790\n[libx264 @ 0x563dd5b04040] frame P:80 Avg QP:24.58 size: 1299\n[libx264 @ 0x563dd5b04040] frame B:205 Avg 
QP:30.72 size: 401\n[libx264 @ 0x563dd5b04040] consecutive B-frames: 3.1% 4.2% 2.1% 90.6%\n[libx264 @ 0x563dd5b04040] mb I I16..4: 87.4% 0.5% 12.1%\n[libx264 @ 0x563dd5b04040] mb P I16..4: 0.8% 1.5% 0.1% P16..4: 6.3% 4.2% 2.3% 0.0% 0.0% skip:84.8%\n[libx264 @ 0x563dd5b04040] mb B I16..4: 0.2% 0.2% 0.0% B16..8: 9.2% 2.1% 0.9% direct: 0.3% skip:87.2% L0:51.4% L1:46.9% BI: 1.7%\n[libx264 @ 0x563dd5b04040] 8x8 transform intra:34.6% inter:6.5%\n[libx264 @ 0x563dd5b04040] coded y,uvDC,uvAC intra: 3.5% 17.4% 11.5% inter: 1.4% 3.7% 3.4%\n[libx264 @ 0x563dd5b04040] i16 v,h,dc,p: 81% 15% 4% 0%\n[libx264 @ 0x563dd5b04040] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 6% 9% 84% 1% 0% 0% 0% 0% 0%\n[libx264 @ 0x563dd5b04040] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 20% 25% 31% 4% 2% 4% 5% 3% 5%\n[libx264 @ 0x563dd5b04040] i8c dc,h,v,p: 55% 23% 22% 0%\n[libx264 @ 0x563dd5b04040] Weighted P-Frames: Y:7.5% UV:0.0%\n[libx264 @ 0x563dd5b04040] ref P L0: 45.0% 16.4% 18.5% 18.4% 1.6%\n[libx264 @ 0x563dd5b04040] ref B L0: 77.2% 17.2% 5.6%\n[libx264 @ 0x563dd5b04040] ref B L1: 91.5% 8.5%\n[libx264 @ 0x563dd5b04040] kb/s:340.95\n[aac @ 0x563dd5b05680] Qavg: 135.552\nRun on device cuda\nOpenCV: FFMPEG: tag 0x67706a6d/'mjpg' is not supported with codec id 7 and format 'mp4 / MP4 (MPEG-4 Part 14)'\nOpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'\nTime - only video: 8.78536581993103\nTime - ffmpeg add audio: 10.202400922775269\nfinish image2image gen\nexamples/test_pred_fls_tmprp0g9geptmp_audio_embed.mp4", "metrics": { "predict_time": 16.740507, "total_time": 16.824924 }, "output": "https://replicate.delivery/pbxt/pz6HMeRfI3lEy0bHopYb57NkT1JVjKfcXyZ3Labv63GB4DWhA/test_pred_fls_tmprp0g9geptmp_audio_embed.mp4", "started_at": "2023-03-25T03:39:28.413912Z", "status": "succeeded", "urls": { "get": "https://api.replicate.com/v1/predictions/qhnue6oddjeyrbjzijirxvt4jq", "cancel": "https://api.replicate.com/v1/predictions/qhnue6oddjeyrbjzijirxvt4jq/cancel" }, "version": 
"e63aa3e0830945d12340aba53c63e27288b5705eec0c8ea0db5b144c5d64dbf6" }
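The `metrics` values in the prediction JSON above can be related to its `created_at`, `started_at`, and `completed_at` timestamps. The sketch below assumes that `predict_time` spans start to completion and `total_time` spans creation to completion. The page does not state this, but the sample values are consistent with it. The function names are hypothetical.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2023-03-25T03:39:28.329495Z."""
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def timings(prediction: dict) -> dict:
    """Derive queue, predict, and total durations (seconds) from timestamps."""
    created = parse_ts(prediction["created_at"])
    started = parse_ts(prediction["started_at"])
    completed = parse_ts(prediction["completed_at"])
    return {
        "queue_time": (started - created).total_seconds(),
        "predict_time": (completed - started).total_seconds(),
        "total_time": (completed - created).total_seconds(),
    }


# Timestamps copied from the first sample prediction on this page.
first = {
    "created_at": "2023-03-25T03:39:28.329495Z",
    "started_at": "2023-03-25T03:39:28.413912Z",
    "completed_at": "2023-03-25T03:39:45.154419Z",
}
print(timings(first))
```

The difference between `total_time` and `predict_time` is the time the prediction spent waiting before it started running.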
Prediction
cudanexus/makeittalk:e63aa3e0830945d12340aba53c63e27288b5705eec0c8ea0db5b144c5d64dbf6
- ID: 5vbreezmubbkpbu6yoyfkcbc3e
- Status: Succeeded
- Source: Web
- Hardware: –
Input
- audio
{ "audio": "https://replicate.delivery/pbxt/IXGW0bCnfMzd0wAxEXqU91V4IlcGAdInxreKOQJEbnfZZkxM/tmp.wav", "image": "https://replicate.delivery/pbxt/IXGVzv74hwydl86bHOfcetRLqN9HGS7jdnnVmuZK2ut6jvf1/dragonmom.jpg" }
Install Replicate’s Node.js client library:
    npm install replicate
Import and set up the client:
    import Replicate from "replicate";
    import { writeFile } from "node:fs/promises";

    const replicate = new Replicate({
      auth: process.env.REPLICATE_API_TOKEN,
    });
Run cudanexus/makeittalk using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
    const output = await replicate.run(
      "cudanexus/makeittalk:e63aa3e0830945d12340aba53c63e27288b5705eec0c8ea0db5b144c5d64dbf6",
      {
        input: {
          audio: "https://replicate.delivery/pbxt/IXGW0bCnfMzd0wAxEXqU91V4IlcGAdInxreKOQJEbnfZZkxM/tmp.wav",
          image: "https://replicate.delivery/pbxt/IXGVzv74hwydl86bHOfcetRLqN9HGS7jdnnVmuZK2ut6jvf1/dragonmom.jpg"
        }
      }
    );

    // To access the file URL:
    console.log(output.url()); //=> "http://example.com"

    // To write the file to disk (this model returns an MP4 video):
    await writeFile("output.mp4", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:
    pip install replicate
Import the client:
    import replicate
Run cudanexus/makeittalk using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
    output = replicate.run(
        "cudanexus/makeittalk:e63aa3e0830945d12340aba53c63e27288b5705eec0c8ea0db5b144c5d64dbf6",
        input={
            "audio": "https://replicate.delivery/pbxt/IXGW0bCnfMzd0wAxEXqU91V4IlcGAdInxreKOQJEbnfZZkxM/tmp.wav",
            "image": "https://replicate.delivery/pbxt/IXGVzv74hwydl86bHOfcetRLqN9HGS7jdnnVmuZK2ut6jvf1/dragonmom.jpg"
        }
    )
    print(output)
To learn more, take a look at the guide on getting started with Python.
Run cudanexus/makeittalk using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
    curl -s -X POST \
      -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
      -H "Content-Type: application/json" \
      -H "Prefer: wait" \
      -d $'{
        "version": "cudanexus/makeittalk:e63aa3e0830945d12340aba53c63e27288b5705eec0c8ea0db5b144c5d64dbf6",
        "input": {
          "audio": "https://replicate.delivery/pbxt/IXGW0bCnfMzd0wAxEXqU91V4IlcGAdInxreKOQJEbnfZZkxM/tmp.wav",
          "image": "https://replicate.delivery/pbxt/IXGVzv74hwydl86bHOfcetRLqN9HGS7jdnnVmuZK2ut6jvf1/dragonmom.jpg"
        }
      }' \
      https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
Output
{ "completed_at": "2023-03-25T11:25:40.423325Z", "created_at": "2023-03-25T11:19:58.020379Z", "data_removed": false, "error": null, "id": "5vbreezmubbkpbu6yoyfkcbc3e", "input": { "audio": "https://replicate.delivery/pbxt/IXGW0bCnfMzd0wAxEXqU91V4IlcGAdInxreKOQJEbnfZZkxM/tmp.wav", "image": "https://replicate.delivery/pbxt/IXGVzv74hwydl86bHOfcetRLqN9HGS7jdnnVmuZK2ut6jvf1/dragonmom.jpg" }, "logs": "/tmp/tmpgantyuv9tmp.wav\n/tmp/tmp5cbkmtwhdragonmom.jpg\nAudio-----> tmpgantyuv9tmp.wav\nParameters===== tmpgantyuv9tmp.wav 16000 [95 75 80 ... 25 30 35]\nLoaded the voice encoder model on cuda in 0.08 seconds.\nProcessing audio file tmpgantyuv9tmp.wav\n0 out of 0 are in this portion\nLoaded the voice encoder model on cuda in 0.02 seconds.\nsource shape: torch.Size([1, 320, 80]) torch.Size([1, 256]) torch.Size([1, 256]) torch.Size([1, 320, 257])\nconverted shape: torch.Size([1, 320, 80]) torch.Size([1, 640])\nRun on device: cuda\nLoading Data random_val\nEVAL num videos: 1\nG: Running on cuda, total num params = 3.00M\n======== LOAD PRETRAINED FACE ID MODEL examples/ckpt/ckpt_speaker_branch.pth =========\n======== LOAD PRETRAINED FACE ID MODEL examples/ckpt/ckpt_content_branch.pth 
=========\n====================================\n48uYS3bHIA8\nYAZuSHvwVC0\n0yaLdVk_UyQ\nE_kmpT-EfOg\nfQR31F7L3ww\nJPMZAOGGHh8\nW6uRNCJmdtI\n2KL8PfQPmBg\np575B7k07a8\niUoAe2gXKE4\nHH-iOC056aQ\nS8fiWqrZEew\nROWN2ssXek8\nirx71tYyI-Q\nme6cdZCM2FY\nOkqHtWOFliM\nOfPKHc6w2vw\n1lh57VnuaKE\n_ldiVrXgZKc\nH1Xnb_rtgqY\n45hn7-LXDX8\nbs7ZWVqAGCU\nUElg0R7fmlk\nbCs5SoifsiY\n1Lx_ZqrK1bM\nRrnL6Pcjjbw\nsRbWv2R2hxE\nwJmdE0G4sEg\nhE-4e1vEiT8\nXXbxe3fCQqg\n02HOKnTjBlQ\nwAAMEC1OsRc\n7Sk--XzX8b0\nI5Lm0Qce5kg\nqLxfiUMYgQg\n_VpqWkdcaqM\nljIkW4uVVQY\n5m5iPZNJS6c\nJ-NPsvtQ8lE\ngOrQyrbptGo\n43BiUVlNy58\nswLghyvhoqA\nX3FCAoFnmdA\n2NiCRAmwoc4\nKVUf0J2LAaA\nYtZS9hH1j24\n5fZj9Fzi5K0\nwbWKG26ebMw\nQgNlXur0wrs\nqek_5m1MRik\nrmFsUV5ICKk\nbEdGv1wixF4\nljh5PB6Utsc\nizudwWTXuUk\nB08yOvYMF7Y\nUEmI4r5G-5Y\nScujgl9GbHA\nsxCbrYjBsGA\nqvQC0w3y_Fo\nbXpavyiCu10\niWeklsXc0H8\nH00oAfd_GsM\nZ7WRt--g-h4\n29k8RtSUjE0\nE0zgrhQ0QDw\n9KhvSxKE6Mc\nqLNvRwMkhik\n====================================\n/src/src/approaches/train_audio2landmark.py:98: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).\nz = torch.tensor(torch.zeros(aus.shape[0], 128), requires_grad=False, dtype=torch.float).to(device)\nOpenCV: FFMPEG: tag 0x47504a4d/'MJPG' is not supported with codec id 7 and format 'mp4 / MP4 (MPEG-4 Part 14)'\nOpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'\nexamples/tmpgantyuv9tmp.wav\nffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers\nbuilt with gcc 11 (Ubuntu 11.2.0-19ubuntu1)\nconfiguration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 
--enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared\nlibavutil 56. 70.100 / 56. 70.100\nlibavcodec 58.134.100 / 58.134.100\nlibavformat 58. 76.100 / 58. 76.100\nlibavdevice 58. 13.100 / 58. 13.100\nlibavfilter 7.110.100 / 7.110.100\nlibswscale 5. 9.100 / 5. 9.100\nlibswresample 3. 9.100 / 3. 9.100\nlibpostproc 55. 9.100 / 55. 
9.100\nInput #0, mov,mp4,m4a,3gp,3g2,mj2, from 'examples/tmp.mp4':\nMetadata:\nmajor_brand : isom\nminor_version : 512\ncompatible_brands: isomiso2mp41\nencoder : Lavf59.27.100\nDuration: 00:00:04.59, start: 0.000000, bitrate: 5392 kb/s\nStream #0:0(und): Video: mjpeg (Baseline) (mp4v / 0x7634706D), yuvj420p(pc, bt470bg/unknown/unknown), 400x400, 5389 kb/s, 62.50 fps, 62.50 tbr, 10k tbn, 10k tbc (default)\nMetadata:\nhandler_name : VideoHandler\nvendor_id : [0][0][0][0]\nGuessed Channel Layout for Input Stream #1.0 : mono\nInput #1, wav, from 'examples/tmpgantyuv9tmp.wav':\nDuration: 00:00:04.88, bitrate: 256 kb/s\nStream #1:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s\nStream mapping:\nStream #0:0 -> #0:0 (mjpeg (native) -> h264 (libx264))\nStream #1:0 -> #0:1 (pcm_s16le (native) -> aac (native))\nPress [q] to stop, [?] for help\n[libx264 @ 0x5568010d1dc0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2\n[libx264 @ 0x5568010d1dc0] profile High, level 3.0, 4:2:0, 8-bit\n[libx264 @ 0x5568010d1dc0] 264 - core 163 r3060 5db6aa6 - H.264/MPEG-4 AVC codec - Copyleft 2003-2021 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=12 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00\nOutput #0, mp4, to 'examples/tmpgantyuv9tmp_av.mp4':\nMetadata:\nmajor_brand : isom\nminor_version : 512\ncompatible_brands: isomiso2mp41\nencoder : Lavf58.76.100\nStream #0:0(und): Video: h264 (avc1 / 0x31637661), yuvj420p(pc, bt470bg/unknown/unknown, progressive), 
400x400, q=2-31, 62.50 fps, 16k tbn (default)\nMetadata:\nhandler_name : VideoHandler\nvendor_id : [0][0][0][0]\nencoder : Lavc58.134.100 libx264\nSide data:\ncpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A\nStream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 16000 Hz, mono, fltp, 69 kb/s\nMetadata:\nencoder : Lavc58.134.100 aac\nframe= 1 fps=0.0 q=0.0 size= 0kB time=00:00:00.00 bitrate=N/A speed= 0x\nframe= 287 fps=0.0 q=-1.0 Lsize= 229kB time=00:00:04.54 bitrate= 413.6kbits/s speed=8.01x\nvideo:185kB audio:39kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2.489596%\n[libx264 @ 0x5568010d1dc0] frame I:2 Avg QP: 8.96 size: 4739\n[libx264 @ 0x5568010d1dc0] frame P:75 Avg QP:24.26 size: 1282\n[libx264 @ 0x5568010d1dc0] frame B:210 Avg QP:31.47 size: 394\n[libx264 @ 0x5568010d1dc0] consecutive B-frames: 1.7% 0.7% 4.2% 93.4%\n[libx264 @ 0x5568010d1dc0] mb I I16..4: 87.8% 0.2% 12.0%\n[libx264 @ 0x5568010d1dc0] mb P I16..4: 0.7% 1.8% 0.1% P16..4: 6.7% 4.1% 2.1% 0.0% 0.0% skip:84.6%\n[libx264 @ 0x5568010d1dc0] mb B I16..4: 0.3% 0.3% 0.0% B16..8: 9.3% 1.9% 0.8% direct: 0.3% skip:87.2% L0:52.4% L1:46.0% BI: 1.7%\n[libx264 @ 0x5568010d1dc0] 8x8 transform intra:38.1% inter:7.3%\n[libx264 @ 0x5568010d1dc0] coded y,uvDC,uvAC intra: 3.7% 14.1% 10.1% inter: 1.3% 3.5% 3.2%\n[libx264 @ 0x5568010d1dc0] i16 v,h,dc,p: 87% 10% 3% 0%\n[libx264 @ 0x5568010d1dc0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 5% 6% 90% 0% 0% 0% 0% 0% 0%\n[libx264 @ 0x5568010d1dc0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 22% 24% 31% 4% 2% 4% 4% 2% 6%\n[libx264 @ 0x5568010d1dc0] i8c dc,h,v,p: 60% 19% 20% 0%\n[libx264 @ 0x5568010d1dc0] Weighted P-Frames: Y:9.3% UV:0.0%\n[libx264 @ 0x5568010d1dc0] ref P L0: 45.3% 15.2% 19.0% 18.9% 1.6%\n[libx264 @ 0x5568010d1dc0] ref B L0: 76.9% 17.0% 6.1%\n[libx264 @ 0x5568010d1dc0] ref B L1: 90.3% 9.7%\n[libx264 @ 0x5568010d1dc0] kb/s:328.34\n[aac @ 0x5568010d3800] Qavg: 135.552\nRun on device cuda\nOpenCV: FFMPEG: tag 0x67706a6d/'mjpg' is not supported with codec 
id 7 and format 'mp4 / MP4 (MPEG-4 Part 14)'\nOpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'\nTime - only video: 8.509207248687744\nTime - ffmpeg add audio: 9.418673992156982\nfinish image2image gen\nexamples/test_pred_fls_tmpgantyuv9tmp_audio_embed.mp4", "metrics": { "predict_time": 53.570513, "total_time": 342.402946 }, "output": "https://replicate.delivery/pbxt/TvOjt4SVEJ7ALtTQZVEmRwpaXCPYB86SqetfB2Q4WAi0wIrQA/test_pred_fls_tmpgantyuv9tmp_audio_embed.mp4", "started_at": "2023-03-25T11:24:46.852812Z", "status": "succeeded", "urls": { "get": "https://api.replicate.com/v1/predictions/5vbreezmubbkpbu6yoyfkcbc3e", "cancel": "https://api.replicate.com/v1/predictions/5vbreezmubbkpbu6yoyfkcbc3e/cancel" }, "version": "e63aa3e0830945d12340aba53c63e27288b5705eec0c8ea0db5b144c5d64dbf6" }
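The `logs` field of the prediction JSON contains two `Time - ...` lines emitted by the model's own code. A minimal sketch of pulling them out with a regular expression follows. It assumes the lines keep the exact `Time - <label>: <seconds>` shape seen in the samples. Whether the values are cumulative or per-stage is not stated on this page. `stage_times` is a hypothetical helper name.

```python
import re

# Matches lines such as "Time - only video: 8.509207248687744".
TIME_LINE = re.compile(
    r"^Time - (?P<label>.+?): (?P<seconds>\d+(?:\.\d+)?)$", re.MULTILINE
)


def stage_times(logs: str) -> dict:
    """Map each 'Time - <label>' log line to its value in seconds."""
    return {m["label"]: float(m["seconds"]) for m in TIME_LINE.finditer(logs)}


# Excerpt copied from the end of the second sample's logs.
sample = (
    "OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'\n"
    "Time - only video: 8.509207248687744\n"
    "Time - ffmpeg add audio: 9.418673992156982\n"
    "finish image2image gen\n"
)
print(stage_times(sample))
```

In practice the `logs` string would come from the decoded prediction object, for example `stage_times(prediction["logs"])`.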
Prediction
cudanexus/makeittalk:e63aa3e0830945d12340aba53c63e27288b5705eec0c8ea0db5b144c5d64dbf6
Input
- audio
{ "audio": "https://replicate.delivery/pbxt/IXGbria2TQkDJ2cLTEFcV2S20QIe8Mp77ZHC0cXCFL3Ucjhi/tmp.wav", "image": "https://replicate.delivery/pbxt/IXGbrIcMpwb3WJDbPcD1qKMBVQg7wclMNP8MKfdxTTrWZz06/statue1.jpg" }
Install Replicate’s Node.js client library:
    npm install replicate
Import and set up the client:
    import Replicate from "replicate";
    import fs from "node:fs";

    const replicate = new Replicate({
      auth: process.env.REPLICATE_API_TOKEN,
    });
Run cudanexus/makeittalk using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
    const output = await replicate.run(
      "cudanexus/makeittalk:e63aa3e0830945d12340aba53c63e27288b5705eec0c8ea0db5b144c5d64dbf6",
      {
        input: {
          audio: "https://replicate.delivery/pbxt/IXGbria2TQkDJ2cLTEFcV2S20QIe8Mp77ZHC0cXCFL3Ucjhi/tmp.wav",
          image: "https://replicate.delivery/pbxt/IXGbrIcMpwb3WJDbPcD1qKMBVQg7wclMNP8MKfdxTTrWZz06/statue1.jpg"
        }
      }
    );

    // To access the file URL:
    console.log(output.url());

    // To write the file to disk (this model returns an MP4 video):
    await fs.promises.writeFile("output.mp4", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:
    pip install replicate
Import the client:
    import replicate
Run cudanexus/makeittalk using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
    output = replicate.run(
        "cudanexus/makeittalk:e63aa3e0830945d12340aba53c63e27288b5705eec0c8ea0db5b144c5d64dbf6",
        input={
            "audio": "https://replicate.delivery/pbxt/IXGbria2TQkDJ2cLTEFcV2S20QIe8Mp77ZHC0cXCFL3Ucjhi/tmp.wav",
            "image": "https://replicate.delivery/pbxt/IXGbrIcMpwb3WJDbPcD1qKMBVQg7wclMNP8MKfdxTTrWZz06/statue1.jpg"
        }
    )
    print(output)
To learn more, take a look at the guide on getting started with Python.
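Once a run has finished, the generated video is available at a `replicate.delivery` URL such as the `output` field of the prediction JSON on this page. The following is a minimal sketch, using only the standard library, of saving that file locally. It assumes you already hold the output URL as a plain string; the helper names `local_name_for` and `download_output` are hypothetical and not part of the Replicate client.

```python
import os
import posixpath
import urllib.parse
import urllib.request


def local_name_for(output_url, default="output.mp4"):
    """Derive a local file name from the last path segment of the output URL."""
    path = urllib.parse.urlparse(output_url).path
    return posixpath.basename(path) or default


def download_output(output_url, directory="."):
    """Download the generated video into `directory` (needs network access)."""
    target = os.path.join(directory, local_name_for(output_url))
    urllib.request.urlretrieve(output_url, target)
    return target
```

Calling `download_output(url)` with the `output` URL of a succeeded prediction would write the MP4 into the current directory under the name the model gave it.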
Run cudanexus/makeittalk using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
    curl -s -X POST \
      -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
      -H "Content-Type: application/json" \
      -H "Prefer: wait" \
      -d $'{
        "version": "cudanexus/makeittalk:e63aa3e0830945d12340aba53c63e27288b5705eec0c8ea0db5b144c5d64dbf6",
        "input": {
          "audio": "https://replicate.delivery/pbxt/IXGbria2TQkDJ2cLTEFcV2S20QIe8Mp77ZHC0cXCFL3Ucjhi/tmp.wav",
          "image": "https://replicate.delivery/pbxt/IXGbrIcMpwb3WJDbPcD1qKMBVQg7wclMNP8MKfdxTTrWZz06/statue1.jpg"
        }
      }' \
      https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
Output
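The prediction object printed in this section carries a `status`, an `output` URL, a `metrics` block, and the model's raw `logs`. As a rough sketch, the fields of interest can be pulled out as follows. Two points here are assumptions rather than facts stated on this page: that `total_time - predict_time` approximates queueing and setup time, and that the `Time - ...` log lines are stage timings in seconds. `summarize_prediction` is a hypothetical helper.

```python
import re

# Matches log lines such as "Time - only video: 8.431016206741333".
STAGE_TIME = re.compile(r"^Time - (.+?): ([0-9.]+)$", re.MULTILINE)


def summarize_prediction(prediction):
    """Collect status, output URL, and timing figures from a prediction dict."""
    metrics = prediction.get("metrics") or {}
    predict_time = metrics.get("predict_time")
    total_time = metrics.get("total_time")

    # Assumption: the gap between the two metrics is time spent outside the model.
    overhead = None
    if predict_time is not None and total_time is not None:
        overhead = total_time - predict_time

    logs = prediction.get("logs") or ""
    stage_times = {name: float(value) for name, value in STAGE_TIME.findall(logs)}

    return {
        "status": prediction.get("status"),
        "output": prediction.get("output"),
        "predict_time": predict_time,
        "overhead_time": overhead,
        "stage_times": stage_times,
    }
```

On the runs shown on this page the gap varies widely: one run reports a `predict_time` of about 53.6 s against a `total_time` of about 342.4 s, while the statue run reports about 22.6 s for both.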
{ "completed_at": "2023-03-25T11:26:32.024657Z", "created_at": "2023-03-25T11:26:09.310233Z", "data_removed": false, "error": null, "id": "mhfulf7omjeqnetupzy7lbx2bm", "input": { "audio": "https://replicate.delivery/pbxt/IXGbria2TQkDJ2cLTEFcV2S20QIe8Mp77ZHC0cXCFL3Ucjhi/tmp.wav", "image": "https://replicate.delivery/pbxt/IXGbrIcMpwb3WJDbPcD1qKMBVQg7wclMNP8MKfdxTTrWZz06/statue1.jpg" }, "logs": "/tmp/tmphjca72p_tmp.wav\n/tmp/tmpc7y0z_m_statue1.jpg\nAudio-----> tmphjca72p_tmp.wav\nParameters===== tmphjca72p_tmp.wav 16000 [95 75 80 ... 25 30 35]\nLoaded the voice encoder model on cuda in 0.01 seconds.\nProcessing audio filetmphjca72p_tmp.wav\n0 out of 0 are in this portion\nLoaded the voice encoder model on cuda in 0.02 seconds.\nsource shape: torch.Size([1, 320, 80]) torch.Size([1, 256]) torch.Size([1, 256]) torch.Size([1, 320, 257])\nconverted shape: torch.Size([1, 320, 80]) torch.Size([1, 640])\nRun on device: cuda\nLoading Data random_val\nEVAL num videos: 1\nG: Running on cuda, total num params = 3.00M\n======== LOAD PRETRAINED FACE ID MODEL examples/ckpt/ckpt_speaker_branch.pth =========\n======== LOAD PRETRAINED FACE ID MODEL examples/ckpt/ckpt_content_branch.pth 
=========\n====================================\n48uYS3bHIA8\nYAZuSHvwVC0\n0yaLdVk_UyQ\nE_kmpT-EfOg\nfQR31F7L3ww\nJPMZAOGGHh8\nW6uRNCJmdtI\n2KL8PfQPmBg\np575B7k07a8\niUoAe2gXKE4\nHH-iOC056aQ\nS8fiWqrZEew\nROWN2ssXek8\nirx71tYyI-Q\nme6cdZCM2FY\nOkqHtWOFliM\nOfPKHc6w2vw\n1lh57VnuaKE\n_ldiVrXgZKc\nH1Xnb_rtgqY\n45hn7-LXDX8\nbs7ZWVqAGCU\nUElg0R7fmlk\nbCs5SoifsiY\n1Lx_ZqrK1bM\nRrnL6Pcjjbw\nsRbWv2R2hxE\nwJmdE0G4sEg\nhE-4e1vEiT8\nXXbxe3fCQqg\n02HOKnTjBlQ\nwAAMEC1OsRc\n7Sk--XzX8b0\nI5Lm0Qce5kg\nqLxfiUMYgQg\n_VpqWkdcaqM\nljIkW4uVVQY\n5m5iPZNJS6c\nJ-NPsvtQ8lE\ngOrQyrbptGo\n43BiUVlNy58\nswLghyvhoqA\nX3FCAoFnmdA\n2NiCRAmwoc4\nKVUf0J2LAaA\nYtZS9hH1j24\n5fZj9Fzi5K0\nwbWKG26ebMw\nQgNlXur0wrs\nqek_5m1MRik\nrmFsUV5ICKk\nbEdGv1wixF4\nljh5PB6Utsc\nizudwWTXuUk\nB08yOvYMF7Y\nUEmI4r5G-5Y\nScujgl9GbHA\nsxCbrYjBsGA\nqvQC0w3y_Fo\nbXpavyiCu10\niWeklsXc0H8\nH00oAfd_GsM\nZ7WRt--g-h4\n29k8RtSUjE0\nE0zgrhQ0QDw\n9KhvSxKE6Mc\nqLNvRwMkhik\n====================================\nOpenCV: FFMPEG: tag 0x47504a4d/'MJPG' is not supported with codec id 7 and format 'mp4 / MP4 (MPEG-4 Part 14)'\nOpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'\nexamples/tmphjca72p_tmp.wav\nffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers\nbuilt with gcc 11 (Ubuntu 11.2.0-19ubuntu1)\nconfiguration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy 
--enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared\nlibavutil 56. 70.100 / 56. 70.100\nlibavcodec 58.134.100 / 58.134.100\nlibavformat 58. 76.100 / 58. 76.100\nlibavdevice 58. 13.100 / 58. 13.100\nlibavfilter 7.110.100 / 7.110.100\nlibswscale 5. 9.100 / 5. 9.100\nlibswresample 3. 9.100 / 3. 9.100\nlibpostproc 55. 9.100 / 55. 9.100\nInput #0, mov,mp4,m4a,3gp,3g2,mj2, from 'examples/tmp.mp4':\nMetadata:\nmajor_brand : isom\nminor_version : 512\ncompatible_brands: isomiso2mp41\nencoder : Lavf59.27.100\nDuration: 00:00:04.59, start: 0.000000, bitrate: 5256 kb/s\nStream #0:0(und): Video: mjpeg (Baseline) (mp4v / 0x7634706D), yuvj420p(pc, bt470bg/unknown/unknown), 400x400, 5252 kb/s, 62.50 fps, 62.50 tbr, 10k tbn, 10k tbc (default)\nMetadata:\nhandler_name : VideoHandler\nvendor_id : [0][0][0][0]\nGuessed Channel Layout for Input Stream #1.0 : mono\nInput #1, wav, from 'examples/tmphjca72p_tmp.wav':\nDuration: 00:00:04.88, bitrate: 256 kb/s\nStream #1:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s\nStream mapping:\nStream #0:0 -> #0:0 (mjpeg (native) -> h264 (libx264))\nStream #1:0 -> #0:1 (pcm_s16le (native) -> aac (native))\nPress [q] to stop, [?] 
for help\n[libx264 @ 0x5564f37d34c0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2\n[libx264 @ 0x5564f37d34c0] profile High, level 3.0, 4:2:0, 8-bit\n[libx264 @ 0x5564f37d34c0] 264 - core 163 r3060 5db6aa6 - H.264/MPEG-4 AVC codec - Copyleft 2003-2021 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=12 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00\nOutput #0, mp4, to 'examples/tmphjca72p_tmp_av.mp4':\nMetadata:\nmajor_brand : isom\nminor_version : 512\ncompatible_brands: isomiso2mp41\nencoder : Lavf58.76.100\nStream #0:0(und): Video: h264 (avc1 / 0x31637661), yuvj420p(pc, bt470bg/unknown/unknown, progressive), 400x400, q=2-31, 62.50 fps, 16k tbn (default)\nMetadata:\nhandler_name : VideoHandler\nvendor_id : [0][0][0][0]\nencoder : Lavc58.134.100 libx264\nSide data:\ncpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A\nStream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 16000 Hz, mono, fltp, 69 kb/s\nMetadata:\nencoder : Lavc58.134.100 aac\nframe= 1 fps=0.0 q=0.0 size= 0kB time=00:00:00.00 bitrate=N/A speed= 0x\nframe= 271 fps=0.0 q=32.0 size= 0kB time=00:00:03.39 bitrate= 0.1kbits/s speed=6.63x\nframe= 287 fps=0.0 q=-1.0 Lsize= 217kB time=00:00:04.54 bitrate= 391.8kbits/s speed=7.33x\nvideo:173kB audio:39kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2.647152%\n[libx264 @ 0x5564f37d34c0] frame I:2 Avg QP: 8.75 size: 4535\n[libx264 @ 0x5564f37d34c0] frame P:75 Avg QP:24.30 size: 1201\n[libx264 @ 0x5564f37d34c0] frame B:210 Avg 
QP:31.38 size: 366\n[libx264 @ 0x5564f37d34c0] consecutive B-frames: 1.0% 4.2% 0.0% 94.8%\n[libx264 @ 0x5564f37d34c0] mb I I16..4: 88.7% 0.6% 10.6%\n[libx264 @ 0x5564f37d34c0] mb P I16..4: 0.5% 1.4% 0.1% P16..4: 6.2% 4.0% 2.2% 0.0% 0.0% skip:85.7%\n[libx264 @ 0x5564f37d34c0] mb B I16..4: 0.2% 0.2% 0.0% B16..8: 8.4% 2.0% 0.8% direct: 0.3% skip:88.3% L0:52.3% L1:46.0% BI: 1.7%\n[libx264 @ 0x5564f37d34c0] 8x8 transform intra:33.5% inter:6.6%\n[libx264 @ 0x5564f37d34c0] coded y,uvDC,uvAC intra: 4.7% 15.3% 11.3% inter: 1.2% 3.3% 3.0%\n[libx264 @ 0x5564f37d34c0] i16 v,h,dc,p: 86% 9% 4% 0%\n[libx264 @ 0x5564f37d34c0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 5% 4% 90% 0% 0% 0% 0% 0% 0%\n[libx264 @ 0x5564f37d34c0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 17% 31% 24% 4% 2% 4% 6% 4% 8%\n[libx264 @ 0x5564f37d34c0] i8c dc,h,v,p: 66% 16% 17% 0%\n[libx264 @ 0x5564f37d34c0] Weighted P-Frames: Y:8.0% UV:0.0%\n[libx264 @ 0x5564f37d34c0] ref P L0: 45.9% 16.0% 18.5% 18.0% 1.5%\n[libx264 @ 0x5564f37d34c0] ref B L0: 76.3% 17.6% 6.1%\n[libx264 @ 0x5564f37d34c0] ref B L1: 90.9% 9.1%\n[libx264 @ 0x5564f37d34c0] kb/s:306.68\n[aac @ 0x5564f37d5900] Qavg: 135.552\nRun on device cuda\nOpenCV: FFMPEG: tag 0x67706a6d/'mjpg' is not supported with codec id 7 and format 'mp4 / MP4 (MPEG-4 Part 14)'\nOpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'\nTime - only video: 8.431016206741333\nTime - ffmpeg add audio: 9.595462560653687\nfinish image2image gen\nexamples/test_pred_fls_tmphjca72p_tmp_audio_embed.mp4", "metrics": { "predict_time": 22.637205, "total_time": 22.714424 }, "output": "https://replicate.delivery/pbxt/XOreJf67AJmBTUJzO5ZC8H36QPfLQp550O8CRlvQ4KVOjRWhA/test_pred_fls_tmphjca72p_tmp_audio_embed.mp4", "started_at": "2023-03-25T11:26:09.387452Z", "status": "succeeded", "urls": { "get": "https://api.replicate.com/v1/predictions/mhfulf7omjeqnetupzy7lbx2bm", "cancel": "https://api.replicate.com/v1/predictions/mhfulf7omjeqnetupzy7lbx2bm/cancel" }, "version": 
"e63aa3e0830945d12340aba53c63e27288b5705eec0c8ea0db5b144c5d64dbf6" }
Prediction
cudanexus/makeittalk:e63aa3e0830945d12340aba53c63e27288b5705eec0c8ea0db5b144c5d64dbf6
Input
- audio
{ "audio": "https://replicate.delivery/pbxt/IXGiQgs73ZakEvZWQv7pIVxDiBu1PAYDjX4B35yj3BXpPMWn/tmp.wav", "image": "https://replicate.delivery/pbxt/IXGiQKsOwuUU3tdBJFUEzvDzF5bFCddVqwvQak4cp9HbeUAS/Safeimagekit-crop-image-to-256x256.png" }
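The model description asks for an image of strictly 256x256 pixels, and the input above is a PNG that was cropped to that size. As a small sketch, a PNG's dimensions can be checked before uploading by reading its header with the standard library. This covers PNG files only, so the JPEG inputs used in the other examples would need a different check. `png_size` and `is_accepted_size` are hypothetical helper names.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
REQUIRED_SIZE = (256, 256)  # taken from the model description


def png_size(data):
    """Return (width, height) read from the IHDR chunk of PNG bytes."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are two big-endian 32-bit integers right after "IHDR".
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def is_accepted_size(data):
    """True when the PNG is exactly the size the model description asks for."""
    return png_size(data) == REQUIRED_SIZE
```

Reading the first 24 bytes of the file (`open(path, "rb").read(24)`) is enough input for both helpers.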
Install Replicate’s Node.js client library:
    npm install replicate
Import and set up the client:
    import Replicate from "replicate";
    import fs from "node:fs";

    const replicate = new Replicate({
      auth: process.env.REPLICATE_API_TOKEN,
    });
Run cudanexus/makeittalk using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
    const output = await replicate.run(
      "cudanexus/makeittalk:e63aa3e0830945d12340aba53c63e27288b5705eec0c8ea0db5b144c5d64dbf6",
      {
        input: {
          audio: "https://replicate.delivery/pbxt/IXGiQgs73ZakEvZWQv7pIVxDiBu1PAYDjX4B35yj3BXpPMWn/tmp.wav",
          image: "https://replicate.delivery/pbxt/IXGiQKsOwuUU3tdBJFUEzvDzF5bFCddVqwvQak4cp9HbeUAS/Safeimagekit-crop-image-to-256x256.png"
        }
      }
    );

    // To access the file URL:
    console.log(output.url());

    // To write the file to disk (this model returns an MP4 video):
    await fs.promises.writeFile("output.mp4", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:
    pip install replicate
Import the client:
    import replicate
Run cudanexus/makeittalk using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
    output = replicate.run(
        "cudanexus/makeittalk:e63aa3e0830945d12340aba53c63e27288b5705eec0c8ea0db5b144c5d64dbf6",
        input={
            "audio": "https://replicate.delivery/pbxt/IXGiQgs73ZakEvZWQv7pIVxDiBu1PAYDjX4B35yj3BXpPMWn/tmp.wav",
            "image": "https://replicate.delivery/pbxt/IXGiQKsOwuUU3tdBJFUEzvDzF5bFCddVqwvQak4cp9HbeUAS/Safeimagekit-crop-image-to-256x256.png"
        }
    )
    print(output)
To learn more, take a look at the guide on getting started with Python.
Run cudanexus/makeittalk using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
    curl -s -X POST \
      -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
      -H "Content-Type: application/json" \
      -H "Prefer: wait" \
      -d $'{
        "version": "cudanexus/makeittalk:e63aa3e0830945d12340aba53c63e27288b5705eec0c8ea0db5b144c5d64dbf6",
        "input": {
          "audio": "https://replicate.delivery/pbxt/IXGiQgs73ZakEvZWQv7pIVxDiBu1PAYDjX4B35yj3BXpPMWn/tmp.wav",
          "image": "https://replicate.delivery/pbxt/IXGiQKsOwuUU3tdBJFUEzvDzF5bFCddVqwvQak4cp9HbeUAS/Safeimagekit-crop-image-to-256x256.png"
        }
      }' \
      https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
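The same HTTP call can be made without the client library. The sketch below builds the POST request from the curl example with Python's standard library and, in case the response comes back before the prediction has finished, polls the `urls.get` address found in the prediction JSON. The function names, the polling interval, and the poll budget are illustrative assumptions. The set of terminal statuses reflects my understanding of the API rather than anything stated on this page.

```python
import json
import time
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}  # assumed terminal states


def build_prediction_request(token, version, inputs):
    """Build (but do not send) the POST request that mirrors the curl call."""
    body = json.dumps({"version": version, "input": inputs}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Prefer": "wait",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def fetch_json(request):
    """Send a request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_prediction(prediction, token, fetch=fetch_json, interval=2.0, max_polls=150):
    """Poll prediction["urls"]["get"] until the status is terminal."""
    polls = 0
    while prediction["status"] not in TERMINAL_STATUSES:
        if polls >= max_polls:
            raise TimeoutError("prediction did not finish within the polling budget")
        time.sleep(interval)
        poll = urllib.request.Request(
            prediction["urls"]["get"],
            headers={"Authorization": f"Bearer {token}"},
        )
        prediction = fetch(poll)
        polls += 1
    return prediction
```

A typical flow would be `prediction = fetch_json(build_prediction_request(token, version, inputs))` followed by `wait_for_prediction(prediction, token)`. The `fetch` parameter exists so the polling loop can be exercised with a stub instead of the network.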
Output
{ "completed_at": "2023-03-25T11:33:23.608897Z", "created_at": "2023-03-25T11:33:05.146638Z", "data_removed": false, "error": null, "id": "vtwovu4utba6je6vhogigklwvq", "input": { "audio": "https://replicate.delivery/pbxt/IXGiQgs73ZakEvZWQv7pIVxDiBu1PAYDjX4B35yj3BXpPMWn/tmp.wav", "image": "https://replicate.delivery/pbxt/IXGiQKsOwuUU3tdBJFUEzvDzF5bFCddVqwvQak4cp9HbeUAS/Safeimagekit-crop-image-to-256x256.png" }, "logs": "/tmp/tmpr1atprs0tmp.wav\n/tmp/tmp7iikesyzSafeimagekit-crop-image-to-256x256.png\nAudio-----> tmpr1atprs0tmp.wav\nParameters===== tmpr1atprs0tmp.wav 16000 [95 75 80 ... 25 30 35]\nLoaded the voice encoder model on cuda in 0.01 seconds.\nProcessing audio file tmpr1atprs0tmp.wav\n0 out of 0 are in this portion\nLoaded the voice encoder model on cuda in 0.01 seconds.\nsource shape: torch.Size([1, 320, 80]) torch.Size([1, 256]) torch.Size([1, 256]) torch.Size([1, 320, 257])\nconverted shape: torch.Size([1, 320, 80]) torch.Size([1, 640])\nRun on device: cuda\nLoading Data random_val\nEVAL num videos: 1\nG: Running on cuda, total num params = 3.00M\n======== LOAD PRETRAINED FACE ID MODEL examples/ckpt/ckpt_speaker_branch.pth =========\n======== LOAD PRETRAINED FACE ID MODEL examples/ckpt/ckpt_content_branch.pth 
=========\n====================================\n48uYS3bHIA8\nYAZuSHvwVC0\n0yaLdVk_UyQ\nE_kmpT-EfOg\nfQR31F7L3ww\nJPMZAOGGHh8\nW6uRNCJmdtI\n2KL8PfQPmBg\np575B7k07a8\niUoAe2gXKE4\nHH-iOC056aQ\nS8fiWqrZEew\nROWN2ssXek8\nirx71tYyI-Q\nme6cdZCM2FY\nOkqHtWOFliM\nOfPKHc6w2vw\n1lh57VnuaKE\n_ldiVrXgZKc\nH1Xnb_rtgqY\n45hn7-LXDX8\nbs7ZWVqAGCU\nUElg0R7fmlk\nbCs5SoifsiY\n1Lx_ZqrK1bM\nRrnL6Pcjjbw\nsRbWv2R2hxE\nwJmdE0G4sEg\nhE-4e1vEiT8\nXXbxe3fCQqg\n02HOKnTjBlQ\nwAAMEC1OsRc\n7Sk--XzX8b0\nI5Lm0Qce5kg\nqLxfiUMYgQg\n_VpqWkdcaqM\nljIkW4uVVQY\n5m5iPZNJS6c\nJ-NPsvtQ8lE\ngOrQyrbptGo\n43BiUVlNy58\nswLghyvhoqA\nX3FCAoFnmdA\n2NiCRAmwoc4\nKVUf0J2LAaA\nYtZS9hH1j24\n5fZj9Fzi5K0\nwbWKG26ebMw\nQgNlXur0wrs\nqek_5m1MRik\nrmFsUV5ICKk\nbEdGv1wixF4\nljh5PB6Utsc\nizudwWTXuUk\nB08yOvYMF7Y\nUEmI4r5G-5Y\nScujgl9GbHA\nsxCbrYjBsGA\nqvQC0w3y_Fo\nbXpavyiCu10\niWeklsXc0H8\nH00oAfd_GsM\nZ7WRt--g-h4\n29k8RtSUjE0\nE0zgrhQ0QDw\n9KhvSxKE6Mc\nqLNvRwMkhik\n====================================\nOpenCV: FFMPEG: tag 0x47504a4d/'MJPG' is not supported with codec id 7 and format 'mp4 / MP4 (MPEG-4 Part 14)'\nOpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'\nexamples/tmpr1atprs0tmp.wav\nffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers\nbuilt with gcc 11 (Ubuntu 11.2.0-19ubuntu1)\nconfiguration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy 
--enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared\nlibavutil 56. 70.100 / 56. 70.100\nlibavcodec 58.134.100 / 58.134.100\nlibavformat 58. 76.100 / 58. 76.100\nlibavdevice 58. 13.100 / 58. 13.100\nlibavfilter 7.110.100 / 7.110.100\nlibswscale 5. 9.100 / 5. 9.100\nlibswresample 3. 9.100 / 3. 9.100\nlibpostproc 55. 9.100 / 55. 9.100\nInput #0, mov,mp4,m4a,3gp,3g2,mj2, from 'examples/tmp.mp4':\nMetadata:\nmajor_brand : isom\nminor_version : 512\ncompatible_brands: isomiso2mp41\nencoder : Lavf59.27.100\nDuration: 00:00:04.59, start: 0.000000, bitrate: 5995 kb/s\nStream #0:0(und): Video: mjpeg (Baseline) (mp4v / 0x7634706D), yuvj420p(pc, bt470bg/unknown/unknown), 400x400, 5992 kb/s, 62.50 fps, 62.50 tbr, 10k tbn, 10k tbc (default)\nMetadata:\nhandler_name : VideoHandler\nvendor_id : [0][0][0][0]\nGuessed Channel Layout for Input Stream #1.0 : mono\nInput #1, wav, from 'examples/tmpr1atprs0tmp.wav':\nDuration: 00:00:04.88, bitrate: 256 kb/s\nStream #1:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, mono, s16, 256 kb/s\nStream mapping:\nStream #0:0 -> #0:0 (mjpeg (native) -> h264 (libx264))\nStream #1:0 -> #0:1 (pcm_s16le (native) -> aac (native))\nPress [q] to stop, [?] 
for help\n[libx264 @ 0x5591141500c0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2\n[libx264 @ 0x5591141500c0] profile High, level 3.0, 4:2:0, 8-bit\n[libx264 @ 0x5591141500c0] 264 - core 163 r3060 5db6aa6 - H.264/MPEG-4 AVC codec - Copyleft 2003-2021 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=12 lookahead_threads=2 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00\nOutput #0, mp4, to 'examples/tmpr1atprs0tmp_av.mp4':\nMetadata:\nmajor_brand : isom\nminor_version : 512\ncompatible_brands: isomiso2mp41\nencoder : Lavf58.76.100\nStream #0:0(und): Video: h264 (avc1 / 0x31637661), yuvj420p(pc, bt470bg/unknown/unknown, progressive), 400x400, q=2-31, 62.50 fps, 16k tbn (default)\nMetadata:\nhandler_name : VideoHandler\nvendor_id : [0][0][0][0]\nencoder : Lavc58.134.100 libx264\nSide data:\ncpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A\nStream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 16000 Hz, mono, fltp, 69 kb/s\nMetadata:\nencoder : Lavc58.134.100 aac\nframe= 1 fps=0.0 q=0.0 size= 0kB time=00:00:00.00 bitrate=N/A speed= 0x\nframe= 252 fps=0.0 q=32.0 size= 0kB time=00:00:03.13 bitrate= 0.1kbits/s speed=6.01x\nframe= 287 fps=0.0 q=-1.0 Lsize= 248kB time=00:00:04.54 bitrate= 447.0kbits/s speed=6.71x\nvideo:203kB audio:39kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2.305839%\n[libx264 @ 0x5591141500c0] frame I:2 Avg QP:12.51 size: 5695\n[libx264 @ 0x5591141500c0] frame P:75 Avg QP:26.03 size: 1441\n[libx264 @ 0x5591141500c0] frame B:210 Avg 
QP:32.35 size: 419\n[libx264 @ 0x5591141500c0] consecutive B-frames: 1.0% 3.5% 2.1% 93.4%\n[libx264 @ 0x5591141500c0] mb I I16..4: 48.8% 37.9% 13.3%\n[libx264 @ 0x5591141500c0] mb P I16..4: 0.6% 1.5% 0.0% P16..4: 7.1% 4.1% 2.6% 0.0% 0.0% skip:84.0%\n[libx264 @ 0x5591141500c0] mb B I16..4: 0.2% 0.2% 0.0% B16..8: 9.2% 1.9% 0.9% direct: 0.3% skip:87.3% L0:51.1% L1:46.9% BI: 2.0%\n[libx264 @ 0x5591141500c0] 8x8 transform intra:52.9% inter:7.0%\n[libx264 @ 0x5591141500c0] coded y,uvDC,uvAC intra: 4.1% 15.2% 10.0% inter: 1.4% 3.8% 3.5%\n[libx264 @ 0x5591141500c0] i16 v,h,dc,p: 80% 13% 6% 0%\n[libx264 @ 0x5591141500c0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 18% 15% 67% 0% 0% 0% 0% 0% 0%\n[libx264 @ 0x5591141500c0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 31% 15% 28% 10% 2% 4% 2% 3% 6%\n[libx264 @ 0x5591141500c0] i8c dc,h,v,p: 69% 15% 15% 1%\n[libx264 @ 0x5591141500c0] Weighted P-Frames: Y:1.3% UV:0.0%\n[libx264 @ 0x5591141500c0] ref P L0: 48.2% 16.0% 17.9% 17.6% 0.2%\n[libx264 @ 0x5591141500c0] ref B L0: 78.2% 16.3% 5.5%\n[libx264 @ 0x5591141500c0] ref B L1: 91.6% 8.4%\n[libx264 @ 0x5591141500c0] kb/s:361.37\n[aac @ 0x5591141526c0] Qavg: 135.552\nRun on device cuda\nOpenCV: FFMPEG: tag 0x67706a6d/'mjpg' is not supported with codec id 7 and format 'mp4 / MP4 (MPEG-4 Part 14)'\nOpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'\nTime - only video: 9.052097797393799\nTime - ffmpeg add audio: 10.245426416397095\nfinish image2image gen\nexamples/test_pred_fls_tmpr1atprs0tmp_audio_embed.mp4", "metrics": { "predict_time": 18.381924, "total_time": 18.462259 }, "output": "https://replicate.delivery/pbxt/YJ1taoNZAVakExw1LLQ9TVU9rCRpTrOqmtSJCpaCLo2AOyKE/test_pred_fls_tmpr1atprs0tmp_audio_embed.mp4", "started_at": "2023-03-25T11:33:05.226973Z", "status": "succeeded", "urls": { "get": "https://api.replicate.com/v1/predictions/vtwovu4utba6je6vhogigklwvq", "cancel": "https://api.replicate.com/v1/predictions/vtwovu4utba6je6vhogigklwvq/cancel" }, "version": 
"e63aa3e0830945d12340aba53c63e27288b5705eec0c8ea0db5b144c5d64dbf6" }
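The prediction response above carries the fields a client usually needs: `status`, `output`, `metrics`, and the raw `logs`. The following is a minimal sketch, not part of Replicate's client library, of how those fields might be read once the JSON has been decoded into a Python dict. The helper name `summarize_prediction` and the pattern `TIMING_LINE` are hypothetical, and the `logs` value is abridged to its last few lines. The sketch assumes the model keeps printing its `Time - ...` lines in the format seen in this run, and it makes no claim about what those numbers measure.

```python
import re

# Abridged copy of the prediction response shown above. The "logs" value is
# shortened to its final lines; all other values are copied from the response.
prediction = {
    "logs": (
        "Run on device cuda\n"
        "Time - only video: 9.052097797393799\n"
        "Time - ffmpeg add audio: 10.245426416397095\n"
        "finish image2image gen\n"
    ),
    "metrics": {"predict_time": 18.381924, "total_time": 18.462259},
    "output": (
        "https://replicate.delivery/pbxt/"
        "YJ1taoNZAVakExw1LLQ9TVU9rCRpTrOqmtSJCpaCLo2AOyKE/"
        "test_pred_fls_tmpr1atprs0tmp_audio_embed.mp4"
    ),
    "status": "succeeded",
}

# Assumption: timing lines look like "Time - <label>: <seconds>", as in this run.
TIMING_LINE = re.compile(
    r"^Time - (?P<label>.+?): (?P<value>[0-9.]+)$", re.MULTILINE
)


def summarize_prediction(pred):
    """Hypothetical helper: collect the output URL, metrics, and log timings."""
    if pred.get("status") != "succeeded":
        raise ValueError(f"prediction not finished: status={pred.get('status')!r}")

    # Map each "Time - <label>" line in the logs to its numeric value.
    timings = {
        match["label"]: float(match["value"])
        for match in TIMING_LINE.finditer(pred.get("logs", ""))
    }
    metrics = pred.get("metrics", {})
    return {
        "output_url": pred["output"],
        "predict_time": metrics.get("predict_time"),
        "total_time": metrics.get("total_time"),
        "timings": timings,
    }


summary = summarize_prediction(prediction)
print(summary["output_url"])
print(summary["timings"])
```

Checking `status` before reading `output` matters because a prediction that is still running or has failed will not have a usable output URL.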