acappemin / video-to-audio-and-piano
Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization
Prediction
acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1
Input
{ "video": "https://replicate.delivery/pbxt/MuNr3iImIwHmZ1hsqv5BxSytNEb8I2TuNKDJ62fczDuszDx9/nwwHuxHMIpc.00000001.mp4", "prompt": "the sound of playing piano", "if_piano": true, "v2a_num_steps": 25 }
Install Replicate’s Node.js client library:

    npm install replicate

Import and set up the client:

    import Replicate from "replicate";
    import { writeFile } from "node:fs/promises";

    const replicate = new Replicate({
      auth: process.env.REPLICATE_API_TOKEN,
    });

Run acappemin/video-to-audio-and-piano using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

    const output = await replicate.run(
      "acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1",
      {
        input: {
          video: "https://replicate.delivery/pbxt/MuNr3iImIwHmZ1hsqv5BxSytNEb8I2TuNKDJ62fczDuszDx9/nwwHuxHMIpc.00000001.mp4",
          prompt: "the sound of playing piano",
          if_piano: true,
          v2a_num_steps: 25
        }
      }
    );

    // To access the file URL:
    console.log(output.url());

    // To write the file to disk (this model returns an MP4 video):
    await writeFile("output.mp4", output);

To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:

    pip install replicate

Import the client:

    import replicate

Run acappemin/video-to-audio-and-piano using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

    output = replicate.run(
        "acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1",
        input={
            "video": "https://replicate.delivery/pbxt/MuNr3iImIwHmZ1hsqv5BxSytNEb8I2TuNKDJ62fczDuszDx9/nwwHuxHMIpc.00000001.mp4",
            "prompt": "the sound of playing piano",
            "if_piano": True,
            "v2a_num_steps": 25
        }
    )
    print(output)

To learn more, take a look at the guide on getting started with Python.
Run acappemin/video-to-audio-and-piano using Replicate’s HTTP API. Check out the model's schema for an overview of inputs and outputs.

    curl -s -X POST \
      -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
      -H "Content-Type: application/json" \
      -H "Prefer: wait" \
      -d $'{
        "version": "acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1",
        "input": {
          "video": "https://replicate.delivery/pbxt/MuNr3iImIwHmZ1hsqv5BxSytNEb8I2TuNKDJ62fczDuszDx9/nwwHuxHMIpc.00000001.mp4",
          "prompt": "the sound of playing piano",
          "if_piano": true,
          "v2a_num_steps": 25
        }
      }' \
      https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
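Where the client library is not available, the same request can be assembled with Python's standard library. The following is a minimal sketch that mirrors the curl call above; `build_prediction_request` is a hypothetical helper name, and the request is only constructed here, not sent.

```python
import json
import os
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"
VERSION = (
    "acappemin/video-to-audio-and-piano:"
    "d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1"
)


def build_prediction_request(token: str, model_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example.

    Hypothetical helper for illustration; not part of Replicate's client.
    """
    body = json.dumps({"version": VERSION, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Prefer": "wait",  # same header as the curl call
        },
    )


request = build_prediction_request(
    # The fallback string is a placeholder, not a real token.
    os.environ.get("REPLICATE_API_TOKEN", "<your-token>"),
    {
        "video": "https://replicate.delivery/pbxt/MuNr3iImIwHmZ1hsqv5BxSytNEb8I2TuNKDJ62fczDuszDx9/nwwHuxHMIpc.00000001.mp4",
        "prompt": "the sound of playing piano",
        "if_piano": True,
        "v2a_num_steps": 25,
    },
)
```

Passing `request` to `urllib.request.urlopen` would send it; the response body is expected to be a prediction object like the one shown under Output.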
Output
{ "completed_at": "2025-04-27T06:44:26.482730Z", "created_at": "2025-04-27T06:43:07.754000Z", "data_removed": false, "error": null, "id": "4knj559zd9rma0cpeqy88f83jw", "input": { "video": "https://replicate.delivery/pbxt/MuNr3iImIwHmZ1hsqv5BxSytNEb8I2TuNKDJ62fczDuszDx9/nwwHuxHMIpc.00000001.mp4", "prompt": "the sound of playing piano", "if_piano": true, "v2a_num_steps": 25 }, "logs": "torch.Size([1, 751, 128]) tensor([751], dtype=torch.int32) ['the sound of playing piano'] ['/tmp/tmplocuin6v.mp4'] [False] None torch.Size([1, 1, 251, 100, 900]) torch.Size([1, 751, 51]) tensor(0.)\n2025-04-27 06:44:08.666 start\nframes_embed midis cond torch.Size([1, 751, 51]) tensor(1601.2759, device='cuda:0') torch.Size([1, 751, 51]) tensor(0., device='cuda:0') torch.Size([1, 751, 128]) tensor(72.5166, device='cuda:0')\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', 
dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\n2025-04-27 06:44:25.534 sample\nduration 10.01 10.01\nMoviepy - Building video /tmp/tmplocuin6v.mp4.mp4.\nMoviePy - Writing audio in tmplocuin6v.mp4TEMP_MPY_wvf_snd.mp4\nchunk: 0%| | 0/221 [00:00<?, ?it/s, now=None]\nchunk: 71%|███████ | 157/221 [00:00<00:00, 1537.85it/s, now=None]\nMoviePy - Done.\nMoviepy - Writing video /tmp/tmplocuin6v.mp4.mp4\nt: 0%| | 0/251 [00:00<?, ?it/s, now=None]\nt: 31%|███ | 77/251 [00:00<00:00, 765.68it/s, now=None]\nt: 61%|██████▏ | 154/251 [00:00<00:00, 549.76it/s, now=None]\nt: 85%|████████▍ | 213/251 [00:00<00:00, 
531.87it/s, now=None]\nMoviepy - Done !\nMoviepy - video ready /tmp/tmplocuin6v.mp4.mp4\npaths /tmp/tmplocuin6v.mp4 /tmp/tmplocuin6v.mp4.wav /tmp/tmplocuin6v.mp4.mp4", "metrics": { "predict_time": 20.980655471, "total_time": 78.72873 }, "output": "https://replicate.delivery/xezq/RfuYXnQi6JxefJivecvfXCXacMFgnc7DsYs8clLeObhoSEuJF/tmplocuin6v.mp4.mp4", "started_at": "2025-04-27T06:44:05.502075Z", "status": "succeeded", "urls": { "stream": "https://stream.replicate.com/v1/files/bcwr-noxt5oqucrjxlpr36zplxdxgd65s522h7ebxx7gssjzdlqenvy7a", "get": "https://api.replicate.com/v1/predictions/4knj559zd9rma0cpeqy88f83jw", "cancel": "https://api.replicate.com/v1/predictions/4knj559zd9rma0cpeqy88f83jw/cancel" }, "version": "d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1" }
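The timestamps in the response make it possible to separate the wait before the run from the run itself. The sketch below is an illustration rather than part of Replicate's client: it takes the `created_at`, `started_at`, and `completed_at` values from the prediction above and computes both intervals. The helper names `parse_ts` and `timing_breakdown` are hypothetical, and reading the gap between `created_at` and `started_at` as queue or startup time is an assumption.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # Swap the trailing "Z" for an explicit offset so fromisoformat
    # also accepts the string on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def timing_breakdown(prediction: dict) -> dict:
    """Return waiting, running, and total time in seconds (illustrative helper)."""
    created = parse_ts(prediction["created_at"])
    started = parse_ts(prediction["started_at"])
    completed = parse_ts(prediction["completed_at"])
    return {
        "waiting_s": (started - created).total_seconds(),
        "running_s": (completed - started).total_seconds(),
        "total_s": (completed - created).total_seconds(),
    }


# Timestamps copied from the prediction object above.
prediction = {
    "created_at": "2025-04-27T06:43:07.754000Z",
    "started_at": "2025-04-27T06:44:05.502075Z",
    "completed_at": "2025-04-27T06:44:26.482730Z",
}

breakdown = timing_breakdown(prediction)
print(breakdown)
```

For this prediction the running interval agrees with `metrics.predict_time` and the total agrees with `metrics.total_time`, so most of the total was spent before the run started.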
Prediction
acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1
ID: cw7p1f6tx5rma0cpeqz99rh7q8
Status: Succeeded
Source: Web
Hardware: L40S
Input
{ "video": "https://replicate.delivery/pbxt/MuNtlZd6EryS1SUKOcriHXHeJVGCEKlSTKU4b3HXuMv4Acb8/u5nBBJndN3I.00000004.mp4", "prompt": "the sound of playing piano", "if_piano": true, "v2a_num_steps": 25 }
Output
{ "completed_at": "2025-04-27T06:46:15.728540Z", "created_at": "2025-04-27T06:45:58.633000Z", "data_removed": false, "error": null, "id": "cw7p1f6tx5rma0cpeqz99rh7q8", "input": { "video": "https://replicate.delivery/pbxt/MuNtlZd6EryS1SUKOcriHXHeJVGCEKlSTKU4b3HXuMv4Acb8/u5nBBJndN3I.00000004.mp4", "prompt": "the sound of playing piano", "if_piano": true, "v2a_num_steps": 25 }, "logs": "torch.Size([1, 751, 128]) tensor([751], dtype=torch.int32) ['the sound of playing piano'] ['/tmp/tmplyuyirb7.mp4'] [False] None torch.Size([1, 1, 251, 100, 900]) torch.Size([1, 751, 51]) tensor(0.)\n2025-04-27 06:46:00.352 start\nframes_embed midis cond torch.Size([1, 751, 51]) tensor(2460.8906, device='cuda:0') torch.Size([1, 751, 51]) tensor(0., device='cuda:0') torch.Size([1, 751, 128]) tensor(216.9728, device='cuda:0')\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', 
dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\n2025-04-27 06:46:14.943 sample\nduration 10.01 10.01\nMoviepy - Building video /tmp/tmplyuyirb7.mp4.mp4.\nMoviePy - Writing audio in tmplyuyirb7.mp4TEMP_MPY_wvf_snd.mp4\nchunk: 0%| | 0/221 [00:00<?, ?it/s, now=None]\nchunk: 82%|████████▏ | 181/221 [00:00<00:00, 1808.73it/s, now=None]\nMoviePy - Done.\nMoviepy - Writing video /tmp/tmplyuyirb7.mp4.mp4\nt: 0%| | 0/251 [00:00<?, ?it/s, now=None]\nt: 30%|██▉ | 75/251 [00:00<00:00, 741.92it/s, now=None]\nt: 60%|█████▉ | 150/251 [00:00<00:00, 640.95it/s, now=None]\nt: 86%|████████▌ | 215/251 [00:00<00:00, 
594.20it/s, now=None]\nMoviepy - Done !\nMoviepy - video ready /tmp/tmplyuyirb7.mp4.mp4\npaths /tmp/tmplyuyirb7.mp4 /tmp/tmplyuyirb7.mp4.wav /tmp/tmplyuyirb7.mp4.mp4", "metrics": { "predict_time": 17.087861393, "total_time": 17.09554 }, "output": "https://replicate.delivery/xezq/O8CQlR3efKh4MEOntzp8af2dlZFdHA5IIRLuYgxhB8kulwNpA/tmplyuyirb7.mp4.mp4", "started_at": "2025-04-27T06:45:58.640678Z", "status": "succeeded", "urls": { "stream": "https://stream.replicate.com/v1/files/bcwr-rsc2xfeioy4xnovpg3dtdxgv5tsbngfu6dvv537zdjmxel4s76bq", "get": "https://api.replicate.com/v1/predictions/cw7p1f6tx5rma0cpeqz99rh7q8", "cancel": "https://api.replicate.com/v1/predictions/cw7p1f6tx5rma0cpeqz99rh7q8/cancel" }, "version": "d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1" }
Prediction
acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1
ID: haxymbrchsrme0cper0vtrd73r
Status: Succeeded
Source: Web
Hardware: L40S
Input
{ "video": "https://replicate.delivery/pbxt/MuNvuAORvZG45IeGaBKw0zweyK5TJkJILmdKeAyRC5bDuC9c/1u1orBeV4xI_000428.mp4", "prompt": "the sound of ripping paper", "if_piano": false, "v2a_num_steps": 25 }
Output
{ "completed_at": "2025-04-27T06:48:45.702405Z", "created_at": "2025-04-27T06:48:22.414000Z", "data_removed": false, "error": null, "id": "haxymbrchsrme0cper0vtrd73r", "input": { "video": "https://replicate.delivery/pbxt/MuNvuAORvZG45IeGaBKw0zweyK5TJkJILmdKeAyRC5bDuC9c/1u1orBeV4xI_000428.mp4", "prompt": "the sound of ripping paper", "if_piano": false, "v2a_num_steps": 25 }, "logs": "torch.Size([1, 752, 128]) tensor([752], dtype=torch.int32) ['the sound of ripping paper'] ['/tmp/tmpvx0f1eg3.mp4'] [False] None None None None\n2025-04-27 06:48:23.156 start\nframes_embed midis cond torch.Size([1, 752, 51]) tensor(0., device='cuda:0') None None torch.Size([1, 752, 128]) tensor(74.5061, device='cuda:0')\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', 
dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\n2025-04-27 06:48:42.812 sample\nduration 10.02 10.03\nMoviepy - Building video /tmp/tmpvx0f1eg3.mp4.mp4.\nMoviePy - Writing audio in tmpvx0f1eg3.mp4TEMP_MPY_wvf_snd.mp4\nchunk: 0%| | 0/221 [00:00<?, ?it/s, now=None]\nchunk: 58%|█████▊ | 129/221 [00:00<00:00, 1273.47it/s, now=None]\nMoviePy - Done.\nMoviepy - Writing video /tmp/tmpvx0f1eg3.mp4.mp4\nt: 0%| | 0/301 [00:00<?, ?it/s, now=None]\nt: 7%|▋ | 21/301 [00:00<00:01, 197.98it/s, now=None]\nt: 15%|█▍ | 44/301 [00:00<00:01, 213.66it/s, now=None]\nt: 22%|██▏ | 66/301 [00:00<00:01, 192.46it/s, now=None]\nt: 29%|██▊ | 86/301 [00:00<00:01, 152.84it/s, now=None]\nt: 35%|███▌ | 106/301 [00:00<00:01, 156.03it/s, 
now=None]\nt: 42%|████▏ | 126/301 [00:00<00:01, 166.35it/s, now=None]\nt: 48%|████▊ | 144/301 [00:00<00:00, 169.37it/s, now=None]\nt: 54%|█████▍ | 162/301 [00:00<00:00, 165.00it/s, now=None]\nt: 60%|█████▉ | 180/301 [00:01<00:00, 165.40it/s, now=None]\nt: 65%|██████▌ | 197/301 [00:01<00:00, 149.37it/s, now=None]\nt: 71%|███████ | 213/301 [00:01<00:00, 143.34it/s, now=None]\nt: 76%|███████▌ | 229/301 [00:01<00:00, 146.87it/s, now=None]\nt: 81%|████████▏ | 245/301 [00:01<00:00, 149.78it/s, now=None]\nt: 87%|████████▋ | 261/301 [00:01<00:00, 151.83it/s, now=None]\nt: 92%|█████████▏| 277/301 [00:01<00:00, 146.75it/s, now=None]\nt: 98%|█████████▊| 295/301 [00:01<00:00, 155.36it/s, now=None]\n \nMoviepy - Done !\nMoviepy - video ready /tmp/tmpvx0f1eg3.mp4.mp4\npaths /tmp/tmpvx0f1eg3.mp4 /tmp/tmpvx0f1eg3.mp4.wav /tmp/tmpvx0f1eg3.mp4.mp4", "metrics": { "predict_time": 23.281406247, "total_time": 23.288405 }, "output": "https://replicate.delivery/xezq/HPL3CNp0JCJ0I5VzqaEwzSmQP2oMV6Sq464fUiTefAxbqwNpA/tmpvx0f1eg3.mp4.mp4", "started_at": "2025-04-27T06:48:22.420998Z", "status": "succeeded", "urls": { "stream": "https://stream.replicate.com/v1/files/bcwr-m24ttdxzg7muuugj7nza5byebctesqfzcxwn52nkktd6izvzydpa", "get": "https://api.replicate.com/v1/predictions/haxymbrchsrme0cper0vtrd73r", "cancel": "https://api.replicate.com/v1/predictions/haxymbrchsrme0cper0vtrd73r/cancel" }, "version": "d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1" }
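The `logs` field contains two timestamped lines ending in `start` and `sample`. As a rough sketch, assuming those two markers bracket the generation phase (the page does not document this), the time between them can be extracted as shown below. The function name `generation_seconds` and the regular expression are illustrative, and the log text is an abbreviated excerpt of the logs above.

```python
import re
from datetime import datetime

# Matches lines such as "2025-04-27 06:48:23.156 start".
_MARKER = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) (start|sample)$",
    re.MULTILINE,
)


def generation_seconds(logs: str) -> float:
    """Seconds between the 'start' and 'sample' log markers (illustrative helper)."""
    stamps = {}
    for ts, label in _MARKER.findall(logs):
        stamps[label] = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")
    if "start" not in stamps or "sample" not in stamps:
        raise ValueError("logs do not contain both 'start' and 'sample' markers")
    return (stamps["sample"] - stamps["start"]).total_seconds()


# Abbreviated excerpt: only the marker lines and one neighbouring line are kept.
logs_excerpt = (
    "2025-04-27 06:48:23.156 start\n"
    "frames_embed midis cond\n"
    "2025-04-27 06:48:42.812 sample\n"
    "duration 10.02 10.03\n"
)

elapsed = generation_seconds(logs_excerpt)
print(f"{elapsed:.3f} s between 'start' and 'sample'")
```

Applied to the full `logs` string of each prediction, the same function gives a per-run figure that can be compared with `metrics.predict_time`.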
Prediction
acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1
ID: nfhqsz27dhrm80cper19z11x0m
Status: Succeeded
Source: Web
Hardware: L40S
Input
- video
- prompt: the sound of race car, auto racing
- if_piano
- v2a_num_steps: 25
{ "video": "https://replicate.delivery/pbxt/MuNxDqicnHV7mODmC0oITGRo9Sri0Ns0GpipeZ1M2gVc1knq/1uCzQCdCC1U_000170.mp4", "prompt": "the sound of race car, auto racing", "if_piano": false, "v2a_num_steps": 25 }
Install Replicate’s Node.js client library:
npm install replicate
Import and set up the client:
import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run acappemin/video-to-audio-and-piano using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1",
  {
    input: {
      video: "https://replicate.delivery/pbxt/MuNxDqicnHV7mODmC0oITGRo9Sri0Ns0GpipeZ1M2gVc1knq/1uCzQCdCC1U_000170.mp4",
      prompt: "the sound of race car, auto racing",
      if_piano: false,
      v2a_num_steps: 25
    }
  }
);

// To access the file URL:
console.log(output.url()); //=> "http://example.com"

// To write the file to disk (this model returns an MP4 video):
await fs.promises.writeFile("output.mp4", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:
pip install replicate
Import the client:
import replicate
Run acappemin/video-to-audio-and-piano using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1",
    input={
        "video": "https://replicate.delivery/pbxt/MuNxDqicnHV7mODmC0oITGRo9Sri0Ns0GpipeZ1M2gVc1knq/1uCzQCdCC1U_000170.mp4",
        "prompt": "the sound of race car, auto racing",
        "if_piano": False,
        "v2a_num_steps": 25
    }
)
print(output)
To learn more, take a look at the guide on getting started with Python.
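The Node.js example writes the result to disk, while the Python example only prints it. A minimal sketch of the equivalent step in Python follows. It assumes a recent `replicate` client in which `replicate.run` returns a file-like object with a `read()` method; older client versions may return a plain URL string instead. The helper name `save_output` and the file name `output.mp4` are hypothetical.

```python
from pathlib import Path


def save_output(output, path="output.mp4"):
    """Write a file-like prediction output to disk and return the byte count.

    Assumption: `output` exposes a `read()` method that returns bytes, as the
    file outputs of recent `replicate` releases are understood to do.
    """
    data = output.read()
    Path(path).write_bytes(data)
    return len(data)
```

With the `output` value from the call above, `save_output(output)` would write the generated video next to the script.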
Run acappemin/video-to-audio-and-piano using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1",
    "input": {
      "video": "https://replicate.delivery/pbxt/MuNxDqicnHV7mODmC0oITGRo9Sri0Ns0GpipeZ1M2gVc1knq/1uCzQCdCC1U_000170.mp4",
      "prompt": "the sound of race car, auto racing",
      "if_piano": false,
      "v2a_num_steps": 25
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
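The `Prefer: wait` header asks the API to hold the request open until the prediction finishes. If a response comes back with a non-final status, a client can poll the `urls.get` address that appears in the prediction JSON. The sketch below shows one way to do that, under stated assumptions: the set of final statuses (`succeeded`, `failed`, `canceled`) is taken from general knowledge of the API rather than from this page, and `fetch` is a hypothetical caller-supplied function that performs an authenticated GET and returns the decoded JSON.

```python
import time

# Assumption: these are the statuses after which a prediction no longer changes.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(prediction, fetch, interval=1.0, max_polls=60):
    """Poll a prediction until it reaches a terminal status.

    `prediction` is the decoded JSON returned by the POST request.
    `fetch` (hypothetical) takes a URL and returns the decoded JSON body.
    """
    for _ in range(max_polls):
        if prediction["status"] in TERMINAL_STATUSES:
            return prediction
        time.sleep(interval)
        prediction = fetch(prediction["urls"]["get"])
    # Check the result of the final fetch before giving up.
    if prediction["status"] in TERMINAL_STATUSES:
        return prediction
    raise TimeoutError("prediction did not finish within the polling budget")
```

Passing `fetch` in as a parameter keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.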
Output
{ "completed_at": "2025-04-27T06:50:01.255748Z", "created_at": "2025-04-27T06:49:43.020000Z", "data_removed": false, "error": null, "id": "nfhqsz27dhrm80cper19z11x0m", "input": { "video": "https://replicate.delivery/pbxt/MuNxDqicnHV7mODmC0oITGRo9Sri0Ns0GpipeZ1M2gVc1knq/1uCzQCdCC1U_000170.mp4", "prompt": "the sound of race car, auto racing", "if_piano": false, "v2a_num_steps": 25 }, "logs": "torch.Size([1, 753, 128]) tensor([753], dtype=torch.int32) ['the sound of race car, auto racing'] ['/tmp/tmprdangfkr.mp4'] [False] None None None None\n2025-04-27 06:49:43.474 start\nframes_embed midis cond torch.Size([1, 753, 51]) tensor(0., device='cuda:0') None None torch.Size([1, 753, 128]) tensor(14.5177, device='cuda:0')\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], 
device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\n2025-04-27 06:49:59.069 sample\nduration 10.04 10.04\nMoviepy - Building video /tmp/tmprdangfkr.mp4.mp4.\nMoviePy - Writing audio in tmprdangfkr.mp4TEMP_MPY_wvf_snd.mp4\nchunk: 0%| | 0/222 [00:00<?, ?it/s, now=None]\nchunk: 76%|███████▌ | 169/222 [00:00<00:00, 1668.18it/s, now=None]\nMoviePy - Done.\nMoviepy - Writing video /tmp/tmprdangfkr.mp4.mp4\nt: 0%| | 0/251 [00:00<?, ?it/s, now=None]\nt: 12%|█▏ | 30/251 [00:00<00:00, 297.99it/s, now=None]\nt: 25%|██▌ | 63/251 [00:00<00:00, 311.38it/s, now=None]\nt: 38%|███▊ | 95/251 [00:00<00:00, 291.66it/s, now=None]\nt: 50%|████▉ | 125/251 [00:00<00:00, 215.98it/s, now=None]\nt: 59%|█████▉ | 149/251 
[00:00<00:00, 190.23it/s, now=None]\nt: 68%|██████▊ | 170/251 [00:00<00:00, 169.79it/s, now=None]\nt: 75%|███████▌ | 189/251 [00:00<00:00, 166.17it/s, now=None]\nt: 82%|████████▏ | 207/251 [00:01<00:00, 154.44it/s, now=None]\nt: 90%|█████████ | 226/251 [00:01<00:00, 159.56it/s, now=None]\nt: 97%|█████████▋| 244/251 [00:01<00:00, 158.32it/s, now=None]\nMoviepy - Done !\nMoviepy - video ready /tmp/tmprdangfkr.mp4.mp4\npaths /tmp/tmprdangfkr.mp4 /tmp/tmprdangfkr.mp4.wav /tmp/tmprdangfkr.mp4.mp4", "metrics": { "predict_time": 18.228138133, "total_time": 18.235748 }, "output": "https://replicate.delivery/xezq/KGYPIdYAWZ70DZJXnVx84Xlrwg2fYXW4SegDGBWROX8ZW4mUA/tmprdangfkr.mp4.mp4", "started_at": "2025-04-27T06:49:43.027610Z", "status": "succeeded", "urls": { "stream": "https://stream.replicate.com/v1/files/bcwr-cofgrcqqbzwid7wxaur2knwobnvdsqpnizgdn774o4nfkua3jp4q", "get": "https://api.replicate.com/v1/predictions/nfhqsz27dhrm80cper19z11x0m", "cancel": "https://api.replicate.com/v1/predictions/nfhqsz27dhrm80cper19z11x0m/cancel" }, "version": "d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1" }
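The timestamps and `metrics` in the response above are related: `predict_time` appears to be the span from `started_at` to `completed_at`, and `total_time` the span from `created_at` to `completed_at`. The following sketch checks that reading against the values shown. The helper name `parse_ts` is hypothetical, and the interpretation of the two metrics is inferred from these numbers rather than stated on this page.

```python
from datetime import datetime


def parse_ts(ts):
    """Parse an ISO 8601 timestamp that ends in 'Z' (UTC)."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


# Values copied from the prediction JSON above.
prediction = {
    "created_at": "2025-04-27T06:49:43.020000Z",
    "started_at": "2025-04-27T06:49:43.027610Z",
    "completed_at": "2025-04-27T06:50:01.255748Z",
    "metrics": {"predict_time": 18.228138133, "total_time": 18.235748},
}

created = parse_ts(prediction["created_at"])
started = parse_ts(prediction["started_at"])
completed = parse_ts(prediction["completed_at"])

queue_seconds = (started - created).total_seconds()
run_seconds = (completed - started).total_seconds()
total_seconds = (completed - created).total_seconds()

print(f"queued for {queue_seconds:.6f} s")
print(f"ran for {run_seconds:.6f} s")
print(f"total {total_seconds:.6f} s")
```

For this run the model spent under 10 ms queued, so nearly all of the roughly 18 seconds is prediction time.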