DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation
Run this model in Node.js with one line of code:
npm install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
import Replicate from "replicate";
import fs from "node:fs/promises";
const replicate = new Replicate({
auth: process.env.REPLICATE_API_TOKEN,
});
Run acappemin/deepaudio-v1 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
"acappemin/deepaudio-v1:354a16e5caccc8bcc33d084b6604f544006e315721f469737a3f3005327b7f45",
{
input: {
text: "Who finally decided to show up for work Yay",
video: "https://replicate.delivery/pbxt/MuPH7VmyWmOEmsGhJDawkwrJR4Ss1HwLdBJ4eXiLwkuPugOf/0235.mp4",
prompt: "",
text_prompt: "I've still got a few knocking around in here",
audio_prompt: "https://replicate.delivery/pbxt/MuPH7KLZCZhnSJ6etBmvdeeJmUjhOMqzb9TLJj4NN5vFZK0Y/Gobber-00-0778.wav",
v2a_num_steps: 25,
v2s_num_steps: 32
}
}
);
// To access the file URL:
console.log(output.url()); //=> "http://example.com"
// To write the generated video to disk:
await fs.writeFile("output.mp4", output);
To learn more, take a look at the guide on getting started with Node.js.
pip install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
import replicate
Run acappemin/deepaudio-v1 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
"acappemin/deepaudio-v1:354a16e5caccc8bcc33d084b6604f544006e315721f469737a3f3005327b7f45",
input={
"text": "Who finally decided to show up for work Yay",
"video": "https://replicate.delivery/pbxt/MuPH7VmyWmOEmsGhJDawkwrJR4Ss1HwLdBJ4eXiLwkuPugOf/0235.mp4",
"prompt": "",
"text_prompt": "I've still got a few knocking around in here",
"audio_prompt": "https://replicate.delivery/pbxt/MuPH7KLZCZhnSJ6etBmvdeeJmUjhOMqzb9TLJj4NN5vFZK0Y/Gobber-00-0778.wav",
"v2a_num_steps": 25,
"v2s_num_steps": 32
}
)
print(output)
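The run above returns the generated video as a file output. Here is a minimal sketch for saving it locally, assuming output is either a plain URL string or a file-like object exposing read() (which one you get depends on the client version):
import urllib.request

def save_output(output, path="output.mp4"):
    # Hedged helper: handle both return types replicate.run may produce.
    if hasattr(output, "read"):
        # File-like object: read the bytes and write them out.
        with open(path, "wb") as f:
            f.write(output.read())
    else:
        # Plain URL string: download the generated video over HTTP.
        urllib.request.urlretrieve(str(output), path)

save_output(output)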
To learn more, take a look at the guide on getting started with Python.
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Run acappemin/deepaudio-v1 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
-H "Authorization: Bearer $REPLICATE_API_TOKEN" \
-H "Content-Type: application/json" \
-H "Prefer: wait" \
-d $'{
"version": "acappemin/deepaudio-v1:354a16e5caccc8bcc33d084b6604f544006e315721f469737a3f3005327b7f45",
"input": {
"text": "Who finally decided to show up for work Yay",
"video": "https://replicate.delivery/pbxt/MuPH7VmyWmOEmsGhJDawkwrJR4Ss1HwLdBJ4eXiLwkuPugOf/0235.mp4",
"prompt": "",
"text_prompt": "I\'ve still got a few knocking around in here",
"audio_prompt": "https://replicate.delivery/pbxt/MuPH7KLZCZhnSJ6etBmvdeeJmUjhOMqzb9TLJj4NN5vFZK0Y/Gobber-00-0778.wav",
"v2a_num_steps": 25,
"v2s_num_steps": 32
}
}' \
https://api.replicate.com/v1/predictions
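The Prefer: wait header above asks the API to hold the request open until the prediction finishes. For longer predictions you can instead create the prediction without that header and poll its get URL until it reaches a terminal status. A rough Python sketch of that flow, assuming the requests library is installed; the status, urls, and output fields match the example response shown further down:
import os
import time
import requests

token = os.environ["REPLICATE_API_TOKEN"]
headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}

# Create the prediction; without "Prefer: wait" the call returns immediately.
resp = requests.post(
    "https://api.replicate.com/v1/predictions",
    headers=headers,
    json={
        "version": "acappemin/deepaudio-v1:354a16e5caccc8bcc33d084b6604f544006e315721f469737a3f3005327b7f45",
        "input": {
            "text": "Who finally decided to show up for work Yay",
            "video": "https://replicate.delivery/pbxt/MuPH7VmyWmOEmsGhJDawkwrJR4Ss1HwLdBJ4eXiLwkuPugOf/0235.mp4",
            "prompt": "",
            "text_prompt": "I've still got a few knocking around in here",
            "audio_prompt": "https://replicate.delivery/pbxt/MuPH7KLZCZhnSJ6etBmvdeeJmUjhOMqzb9TLJj4NN5vFZK0Y/Gobber-00-0778.wav",
            "v2a_num_steps": 25,
            "v2s_num_steps": 32,
        },
    },
)
prediction = resp.json()

# Poll the prediction's "get" URL until it reaches a terminal status.
while prediction["status"] not in ("succeeded", "failed", "canceled"):
    time.sleep(2)
    prediction = requests.get(prediction["urls"]["get"], headers=headers).json()

print(prediction["status"], prediction.get("output"))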
To learn more, take a look at Replicate’s HTTP API reference docs.
brew install cog
If you don’t have Homebrew, there are other installation options available.
Run this to download the model and run it in your local environment:
cog predict r8.im/acappemin/deepaudio-v1@sha256:354a16e5caccc8bcc33d084b6604f544006e315721f469737a3f3005327b7f45 \
-i 'text="Who finally decided to show up for work Yay"' \
-i 'video="https://replicate.delivery/pbxt/MuPH7VmyWmOEmsGhJDawkwrJR4Ss1HwLdBJ4eXiLwkuPugOf/0235.mp4"' \
-i 'prompt=""' \
-i $'text_prompt="I\'ve still got a few knocking around in here"' \
-i 'audio_prompt="https://replicate.delivery/pbxt/MuPH7KLZCZhnSJ6etBmvdeeJmUjhOMqzb9TLJj4NN5vFZK0Y/Gobber-00-0778.wav"' \
-i 'v2a_num_steps=25' \
-i 'v2s_num_steps=32'
To learn more, take a look at the Cog documentation.
Run this to download the model and run it in your local environment:
docker run -d -p 5000:5000 --gpus=all r8.im/acappemin/deepaudio-v1@sha256:354a16e5caccc8bcc33d084b6604f544006e315721f469737a3f3005327b7f45
curl -s -X POST \
-H "Content-Type: application/json" \
-d $'{
"input": {
"text": "Who finally decided to show up for work Yay",
"video": "https://replicate.delivery/pbxt/MuPH7VmyWmOEmsGhJDawkwrJR4Ss1HwLdBJ4eXiLwkuPugOf/0235.mp4",
"prompt": "",
"text_prompt": "I\'ve still got a few knocking around in here",
"audio_prompt": "https://replicate.delivery/pbxt/MuPH7KLZCZhnSJ6etBmvdeeJmUjhOMqzb9TLJj4NN5vFZK0Y/Gobber-00-0778.wav",
"v2a_num_steps": 25,
"v2s_num_steps": 32
}
}' \
http://localhost:5000/predictions
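Once the container is running, the same request can be sent from Python instead of curl. A minimal sketch assuming the requests library and the default Cog endpoint on port 5000; the status and output fields are assumptions about the shape of Cog's prediction response:
import requests

# Cog serves a single prediction endpoint at /predictions on port 5000.
resp = requests.post(
    "http://localhost:5000/predictions",
    json={
        "input": {
            "text": "Who finally decided to show up for work Yay",
            "video": "https://replicate.delivery/pbxt/MuPH7VmyWmOEmsGhJDawkwrJR4Ss1HwLdBJ4eXiLwkuPugOf/0235.mp4",
            "prompt": "",
            "text_prompt": "I've still got a few knocking around in here",
            "audio_prompt": "https://replicate.delivery/pbxt/MuPH7KLZCZhnSJ6etBmvdeeJmUjhOMqzb9TLJj4NN5vFZK0Y/Gobber-00-0778.wav",
            "v2a_num_steps": 25,
            "v2s_num_steps": 32,
        }
    },
)
result = resp.json()
print(result.get("status"), result.get("output"))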
To learn more, take a look at the Cog documentation.
{
"completed_at": "2025-04-27T08:16:32.542033Z",
"created_at": "2025-04-27T08:16:08.393000Z",
"data_removed": false,
"error": null,
"id": "j60h4tb6s5rme0cpes8rt8z0cg",
"input": {
"text": "Who finally decided to show up for work Yay",
"video": "https://replicate.delivery/pbxt/MuPH7VmyWmOEmsGhJDawkwrJR4Ss1HwLdBJ4eXiLwkuPugOf/0235.mp4",
"prompt": "",
"text_prompt": "I've still got a few knocking around in here",
"audio_prompt": "https://replicate.delivery/pbxt/MuPH7KLZCZhnSJ6etBmvdeeJmUjhOMqzb9TLJj4NN5vFZK0Y/Gobber-00-0778.wav",
"v2a_num_steps": 25,
"v2s_num_steps": 32
},
"logs": "paths /tmp/tmp1oqg4zow0235.mp4 /tmp/tmp6xokgjqn.mp4/tmp /tmp/__tmp__tmp6xokgjqn.mp4.mp4\npaths /tmp/tmp79vmi389Gobber-00-0778.wav /tmp/tmpzu5awjru.wav\n2025-04-27 08:16:08.917 start\n[\u001b[32mINFO \u001b[0m]: \u001b[32mUsing video /tmp/tmp6xokgjqn.mp4\u001b[0m\n[\u001b[33mWARNING \u001b[0m]: \u001b[33mClip video is too short: 3.25 < 8.00\u001b[0m\n[\u001b[33mWARNING \u001b[0m]: \u001b[33mTruncating to 3.25 sec\u001b[0m\n[\u001b[33mWARNING \u001b[0m]: \u001b[33mSync video is too short: 3.20 < 3.25\u001b[0m\n[\u001b[33mWARNING \u001b[0m]: \u001b[33mTruncating to 3.20 sec\u001b[0m\n[\u001b[32mINFO \u001b[0m]: \u001b[32mPrompt: \u001b[0m\n[\u001b[32mINFO \u001b[0m]: \u001b[32mNegative prompt: \u001b[0m\n[\u001b[32mINFO \u001b[0m]: \u001b[32mAudio saved to /tmp/__tmp__tmp6xokgjqn.mp4.flac\u001b[0m\n[\u001b[32mINFO \u001b[0m]: \u001b[32mVideo saved to /tmp/__tmp__tmp6xokgjqn.mp4.mp4\u001b[0m\n[\u001b[32mINFO \u001b[0m]: \u001b[32mMemory usage: 4.87 GB\u001b[0m\n2025-04-27 08:16:29.705 end\ndatas2 1\n/tmp/__tmp__tmp6xokgjqn.mp4.mp4 None /tmp/__tmp__tmp6xokgjqn.mp4.flac None /tmp/tmpzu5awjru.wav\n############energy shape torch.Size([1, 252, 1]) torch.Size([1, 300, 1]) <class 'torch.Tensor'> <class 'torch.Tensor'> torch.float32 torch.float32\nVoice: main\nref_audio /tmp/tmpzu5awjru.wav\nConverting audio...\nUsing custom reference text...\nref_text I've still got a few knocking around in here.\nref_audio_ /tmp/tmpul11dhd9.wav\nNo voice tag found, using main.\nVoice: main\ngen_text 0 Who finally decided to show up for work Yay\nGenerating audio in 1 batches...\n 0%| | 0/1 [00:00<?, ?it/s]\n100%|██████████| 1/1 [00:00<00:00, 1.35it/s]\n100%|██████████| 1/1 [00:00<00:00, 1.35it/s]\nMoviepy - Building video /tmp/__tmp__tmp6xokgjqn.mp4.mp4.gen.mp4.\nMoviePy - Writing audio in __tmp__tmp6xokgjqn.mp4.mp4.genTEMP_MPY_wvf_snd.mp4\nchunk: 0%| | 0/71 [00:00<?, ?it/s, now=None]\nMoviePy - Done.\nMoviepy - Writing video /tmp/__tmp__tmp6xokgjqn.mp4.mp4.gen.mp4\nt: 0%| | 0/78 [00:00<?, ?it/s, now=None]\nt: 14%|█▍ | 11/78 [00:00<00:00, 105.93it/s, now=None]\nt: 33%|███▎ | 26/78 [00:00<00:00, 125.79it/s, now=None]\nt: 51%|█████▏ | 40/78 [00:00<00:00, 130.95it/s, now=None]\nt: 71%|███████ | 55/78 [00:00<00:00, 134.89it/s, now=None]\nt: 88%|████████▊ | 69/78 [00:00<00:00, 135.55it/s, now=None][\u001b[33mWARNING \u001b[0m]: \u001b[33m/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/moviepy/video/io/ffmpeg_reader.py:123: UserWarning: Warning: in file /tmp/__tmp__tmp6xokgjqn.mp4.mp4, 6220800 bytes wanted but 0 bytes read,at frame 77/78, at time 3.21/3.23 sec. Using the last valid frame instead.\nwarnings.warn(\"Warning: in file %s, \"%(self.filename)+\n\u001b[0m\n \nMoviepy - Done !\nMoviepy - video ready /tmp/__tmp__tmp6xokgjqn.mp4.mp4.gen.mp4",
"metrics": {
"predict_time": 24.139381105,
"total_time": 24.149033
},
"output": "https://replicate.delivery/xezq/I4yWImuAUrLfVqe6QefimD0NkCa2T4Oo6vSPhgoTioxDeM3kC/__tmp__tmp6xokgjqn.mp4.mp4.gen.mp4",
"started_at": "2025-04-27T08:16:08.402652Z",
"status": "succeeded",
"urls": {
"stream": "https://stream.replicate.com/v1/files/bcwr-j6fk52dc4ac7u5nc3bk6rsssafapyennsvvuwehtjljhflbaceqq",
"get": "https://api.replicate.com/v1/predictions/j60h4tb6s5rme0cpes8rt8z0cg",
"cancel": "https://api.replicate.com/v1/predictions/j60h4tb6s5rme0cpes8rt8z0cg/cancel"
},
"version": "354a16e5caccc8bcc33d084b6604f544006e315721f469737a3f3005327b7f45"
}
paths /tmp/tmp1oqg4zow0235.mp4 /tmp/tmp6xokgjqn.mp4/tmp /tmp/__tmp__tmp6xokgjqn.mp4.mp4
paths /tmp/tmp79vmi389Gobber-00-0778.wav /tmp/tmpzu5awjru.wav
2025-04-27 08:16:08.917 start
[INFO ]: Using video /tmp/tmp6xokgjqn.mp4
[WARNING ]: Clip video is too short: 3.25 < 8.00
[WARNING ]: Truncating to 3.25 sec
[WARNING ]: Sync video is too short: 3.20 < 3.25
[WARNING ]: Truncating to 3.20 sec
[INFO ]: Prompt:
[INFO ]: Negative prompt:
[INFO ]: Audio saved to /tmp/__tmp__tmp6xokgjqn.mp4.flac
[INFO ]: Video saved to /tmp/__tmp__tmp6xokgjqn.mp4.mp4
[INFO ]: Memory usage: 4.87 GB
2025-04-27 08:16:29.705 end
datas2 1
/tmp/__tmp__tmp6xokgjqn.mp4.mp4 None /tmp/__tmp__tmp6xokgjqn.mp4.flac None /tmp/tmpzu5awjru.wav
############energy shape torch.Size([1, 252, 1]) torch.Size([1, 300, 1]) <class 'torch.Tensor'> <class 'torch.Tensor'> torch.float32 torch.float32
Voice: main
ref_audio /tmp/tmpzu5awjru.wav
Converting audio...
Using custom reference text...
ref_text I've still got a few knocking around in here.
ref_audio_ /tmp/tmpul11dhd9.wav
No voice tag found, using main.
Voice: main
gen_text 0 Who finally decided to show up for work Yay
Generating audio in 1 batches...
0%| | 0/1 [00:00<?, ?it/s]
100%|██████████| 1/1 [00:00<00:00, 1.35it/s]
100%|██████████| 1/1 [00:00<00:00, 1.35it/s]
Moviepy - Building video /tmp/__tmp__tmp6xokgjqn.mp4.mp4.gen.mp4.
MoviePy - Writing audio in __tmp__tmp6xokgjqn.mp4.mp4.genTEMP_MPY_wvf_snd.mp4
chunk: 0%| | 0/71 [00:00<?, ?it/s, now=None]
MoviePy - Done.
Moviepy - Writing video /tmp/__tmp__tmp6xokgjqn.mp4.mp4.gen.mp4
t: 0%| | 0/78 [00:00<?, ?it/s, now=None]
t: 14%|█▍ | 11/78 [00:00<00:00, 105.93it/s, now=None]
t: 33%|███▎ | 26/78 [00:00<00:00, 125.79it/s, now=None]
t: 51%|█████▏ | 40/78 [00:00<00:00, 130.95it/s, now=None]
t: 71%|███████ | 55/78 [00:00<00:00, 134.89it/s, now=None]
t: 88%|████████▊ | 69/78 [00:00<00:00, 135.55it/s, now=None][WARNING ]: /root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/moviepy/video/io/ffmpeg_reader.py:123: UserWarning: Warning: in file /tmp/__tmp__tmp6xokgjqn.mp4.mp4, 6220800 bytes wanted but 0 bytes read,at frame 77/78, at time 3.21/3.23 sec. Using the last valid frame instead.
warnings.warn("Warning: in file %s, "%(self.filename)+
Moviepy - Done !
Moviepy - video ready /tmp/__tmp__tmp6xokgjqn.mp4.mp4.gen.mp4
This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.