cjwbw / dreamtalk

RESEARCH/NON-COMMERCIAL USE ONLY: diffusion-based audio-driven expressive talking head generation

  • Public
  • 1K runs
  • L40S
  • GitHub
  • Paper
  • License

Input

file (required)

Input image. This specifies the input portrait. The resolution should be larger than 256x256; the image will be cropped to 256x256.

file (required)

Input audio file. Supported extensions are wav, mp3, and m4a; mp4 (video with sound) is also compatible.

string

Input style_clip_mat, optional. A .mat file that specifies the reference speaking style.

Default: "data/style_clip/3DMM/M030_front_neutral_level1_001.mat"

string

Input pose, a .mat file that specifies the head pose.

Default: "data/pose/RichardShelby_front_neutral_level1_001.mat"

integer

Maximum length, in seconds, of the generated video.

Default: 1000

integer
(minimum: 1, maximum: 500)

Number of denoising steps.

Default: 10

boolean

Enable cropping the input image. If your portrait is already cropped to 256x256, set this to false.

Default: true
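
To make these parameters concrete, here is a minimal sketch of calling the model with the Replicate Python client. The input field names (image, audio, style_clip, pose, max_gen_len, num_steps, crop_image) are assumptions inferred from the descriptions above, not confirmed names; check the model's API tab for the exact schema and the current version hash.

```python
# Minimal sketch using the Replicate Python client (pip install replicate).
# Field names below are inferred from the parameter descriptions and may
# differ from the model's actual schema.
import replicate

output = replicate.run(
    "cjwbw/dreamtalk",  # append ":<version-hash>" to pin a version
    input={
        "image": open("portrait.png", "rb"),   # portrait larger than 256x256
        "audio": open("speech.wav", "rb"),     # wav, mp3, m4a, or mp4 with sound
        "style_clip": "data/style_clip/3DMM/M030_front_neutral_level1_001.mat",
        "pose": "data/pose/RichardShelby_front_neutral_level1_001.mat",
        "max_gen_len": 30,   # cap the generated video at 30 seconds
        "num_steps": 10,     # denoising steps (1-500)
        "crop_image": True,  # set false if already cropped to 256x256
    },
)
print(output)  # URL of the generated video file
```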

Output

The generated talking-head video.

Run time and cost

This model costs approximately $0.013 to run on Replicate, or 76 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 14 seconds, but prediction time varies significantly with the inputs.
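
Since the model is open source, you can also run the published container yourself and call it over Cog's standard HTTP API. The sketch below assumes a container started locally, e.g. with docker run -p 5000:5000 --gpus=all r8.im/cjwbw/dreamtalk (pin an explicit version digest in practice); the /predictions endpoint and JSON shape come from Cog, while the input field names are the same assumptions as in the example above.

```python
# Sketch of calling a locally running Cog container for this model.
# Cog's HTTP API accepts file inputs as data URIs.
import base64
import requests

def data_uri(path: str, mime: str) -> str:
    """Encode a local file as a data URI for Cog's /predictions endpoint."""
    with open(path, "rb") as f:
        return f"data:{mime};base64," + base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:5000/predictions",
    json={
        "input": {
            "image": data_uri("portrait.png", "image/png"),  # assumed field name
            "audio": data_uri("speech.wav", "audio/wav"),    # assumed field name
            "num_steps": 10,                                 # assumed field name
        }
    },
    timeout=600,  # generation can take a while on slower hardware
)
resp.raise_for_status()
print(resp.json()["output"])  # the generated video (data URI or URL)
```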

Readme

This model doesn't have a readme.