acappemin / video-to-audio-and-piano

{
  "video": "https://replicate.delivery/pbxt/MuNr3iImIwHmZ1hsqv5BxSytNEb8I2TuNKDJ62fczDuszDx9/nwwHuxHMIpc.00000001.mp4",
  "prompt": "the sound of playing piano",
  "if_piano": true,
  "v2a_num_steps": 25
}

Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run acappemin/video-to-audio-and-piano using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1",
  {
    input: {
      video: "https://replicate.delivery/pbxt/MuNr3iImIwHmZ1hsqv5BxSytNEb8I2TuNKDJ62fczDuszDx9/nwwHuxHMIpc.00000001.mp4",
      prompt: "the sound of playing piano",
      if_piano: true,
      v2a_num_steps: 25
    }
  }
);

// To access the file URL:
console.log(output.url()); //=> "http://example.com"

// To write the file to disk (this model returns an MP4 video):
await fs.promises.writeFile("output.mp4", output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate

Run acappemin/video-to-audio-and-piano using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1",
    input={
        "video": "https://replicate.delivery/pbxt/MuNr3iImIwHmZ1hsqv5BxSytNEb8I2TuNKDJ62fczDuszDx9/nwwHuxHMIpc.00000001.mp4",
        "prompt": "the sound of playing piano",
        "if_piano": True,
        "v2a_num_steps": 25
    }
)
print(output)

To learn more, take a look at the guide on getting started with Python.
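The Node.js example writes the result to disk, while the Python example only prints it. A minimal sketch of the Python equivalent follows. It assumes the client returns either a file-like object with a `read()` method (as recent versions of the `replicate` package do for file outputs) or a plain URL string; `save_output` and the file name `output.mp4` are illustrative names, not part of the client.

```python
import urllib.request
from pathlib import Path


def save_output(output, path="output.mp4"):
    """Write a model output to disk and return the number of bytes written.

    Assumption: `output` is either file-like (has .read()) or a URL string.
    """
    if hasattr(output, "read"):
        data = output.read()
    else:
        # Fall back to downloading from the URL form of the output.
        with urllib.request.urlopen(str(output)) as response:
            data = response.read()
    Path(path).write_bytes(data)
    return len(data)
```

Calling `save_output(output)` right after `replicate.run(...)` would store the generated video next to the script.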

Run acappemin/video-to-audio-and-piano using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1",
    "input": {
      "video": "https://replicate.delivery/pbxt/MuNr3iImIwHmZ1hsqv5BxSytNEb8I2TuNKDJ62fczDuszDx9/nwwHuxHMIpc.00000001.mp4",
      "prompt": "the sound of playing piano",
      "if_piano": true,
      "v2a_num_steps": 25
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
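The same request can be built with Python's standard library, which makes the pieces of the curl command explicit. This is a sketch under assumptions: `build_prediction_request` and `create_prediction` are hypothetical helper names, and the payload simply mirrors the curl example.

```python
import json
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"
VERSION = (
    "acappemin/video-to-audio-and-piano:"
    "d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1"
)


def build_prediction_request(token, video_url, prompt, if_piano=True, steps=25):
    """Build (but do not send) the POST request from the curl example."""
    payload = {
        "version": VERSION,
        "input": {
            "video": video_url,
            "prompt": prompt,
            "if_piano": if_piano,
            "v2a_num_steps": steps,
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Prefer": "wait",  # ask the API to hold the connection until done
        },
        method="POST",
    )


def create_prediction(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping construction and sending separate makes it possible to inspect the body and headers before any network call is made.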

Output

{
  "completed_at": "2025-04-27T06:44:26.482730Z",
  "created_at": "2025-04-27T06:43:07.754000Z",
  "data_removed": false,
  "error": null,
  "id": "4knj559zd9rma0cpeqy88f83jw",
  "input": {
    "video": "https://replicate.delivery/pbxt/MuNr3iImIwHmZ1hsqv5BxSytNEb8I2TuNKDJ62fczDuszDx9/nwwHuxHMIpc.00000001.mp4",
    "prompt": "the sound of playing piano",
    "if_piano": true,
    "v2a_num_steps": 25
  },
  "logs": "torch.Size([1, 751, 128]) tensor([751], dtype=torch.int32) ['the sound of playing piano'] ['/tmp/tmplocuin6v.mp4'] [False] None torch.Size([1, 1, 251, 100, 900]) torch.Size([1, 751, 51]) tensor(0.)\n2025-04-27 06:44:08.666 start\nframes_embed midis cond torch.Size([1, 751, 51]) tensor(1601.2759, device='cuda:0') torch.Size([1, 751, 51]) tensor(0., device='cuda:0') torch.Size([1, 751, 128]) tensor(72.5166, device='cuda:0')\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], 
device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\n2025-04-27 06:44:25.534 sample\nduration 10.01 10.01\nMoviepy - Building video /tmp/tmplocuin6v.mp4.mp4.\nMoviePy - Writing audio in tmplocuin6v.mp4TEMP_MPY_wvf_snd.mp4\nchunk:   0%|          | 0/221 [00:00<?, ?it/s, now=None]\nchunk:  71%|███████   | 157/221 [00:00<00:00, 1537.85it/s, now=None]\nMoviePy - Done.\nMoviepy - Writing video /tmp/tmplocuin6v.mp4.mp4\nt:   0%|          | 0/251 [00:00<?, ?it/s, now=None]\nt:  31%|███       | 77/251 [00:00<00:00, 765.68it/s, now=None]\nt:  61%|██████▏   | 154/251 [00:00<00:00, 549.76it/s, now=None]\nt:  85%|████████▍ | 213/251 [00:00<00:00, 531.87it/s, now=None]\nMoviepy - Done !\nMoviepy - video ready /tmp/tmplocuin6v.mp4.mp4\npaths /tmp/tmplocuin6v.mp4 /tmp/tmplocuin6v.mp4.wav /tmp/tmplocuin6v.mp4.mp4",
  "metrics": {
    "predict_time": 20.980655471,
    "total_time": 78.72873
  },
  "output": "https://replicate.delivery/xezq/RfuYXnQi6JxefJivecvfXCXacMFgnc7DsYs8clLeObhoSEuJF/tmplocuin6v.mp4.mp4",
  "started_at": "2025-04-27T06:44:05.502075Z",
  "status": "succeeded",
  "urls": {
    "stream": "https://stream.replicate.com/v1/files/bcwr-noxt5oqucrjxlpr36zplxdxgd65s522h7ebxx7gssjzdlqenvy7a",
    "get": "https://api.replicate.com/v1/predictions/4knj559zd9rma0cpeqy88f83jw",
    "cancel": "https://api.replicate.com/v1/predictions/4knj559zd9rma0cpeqy88f83jw/cancel"
  },
  "version": "d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1"
}
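The timestamps in this response account for the gap between `predict_time` (about 21 s) and `total_time` (about 79 s): most of the total elapsed between `created_at` and `started_at`, before the model began running. The sketch below recomputes both intervals from the timestamps. `timing_breakdown` is an illustrative helper, and reading the first interval as queueing or start-up time is an assumption, since the response does not say what happened during it.

```python
from datetime import datetime


def parse_ts(value):
    # Replace the trailing "Z" so fromisoformat() also works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def timing_breakdown(prediction):
    """Split a prediction's wall-clock time into waiting and running parts."""
    created = parse_ts(prediction["created_at"])
    started = parse_ts(prediction["started_at"])
    completed = parse_ts(prediction["completed_at"])
    return {
        "waiting_s": (started - created).total_seconds(),
        "running_s": (completed - started).total_seconds(),
        "total_s": (completed - created).total_seconds(),
    }


# Timestamps copied from the response above.
prediction = {
    "created_at": "2025-04-27T06:43:07.754000Z",
    "started_at": "2025-04-27T06:44:05.502075Z",
    "completed_at": "2025-04-27T06:44:26.482730Z",
}
print(timing_breakdown(prediction))
```

For this response the result is roughly 57.7 s waiting and 21.0 s running, which agrees with the reported `metrics`.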

Generated in

21.0 seconds


Prediction

acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1

Model: acappemin/video-to-audio-and-piano:d0808790
ID: cw7p1f6tx5rma0cpeqz99rh7q8
Status: Succeeded
Source: Web
Hardware: L40S
Total duration: 17.1s
Created: 2 months ago

Input

video
prompt: the sound of playing piano
if_piano: true
v2a_num_steps: 25

{
  "video": "https://replicate.delivery/pbxt/MuNtlZd6EryS1SUKOcriHXHeJVGCEKlSTKU4b3HXuMv4Acb8/u5nBBJndN3I.00000004.mp4",
  "prompt": "the sound of playing piano",
  "if_piano": true,
  "v2a_num_steps": 25
}

Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run acappemin/video-to-audio-and-piano using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1",
  {
    input: {
      video: "https://replicate.delivery/pbxt/MuNtlZd6EryS1SUKOcriHXHeJVGCEKlSTKU4b3HXuMv4Acb8/u5nBBJndN3I.00000004.mp4",
      prompt: "the sound of playing piano",
      if_piano: true,
      v2a_num_steps: 25
    }
  }
);

// To access the file URL:
console.log(output.url()); //=> "http://example.com"

// To write the file to disk (this model returns an MP4 video):
await fs.promises.writeFile("output.mp4", output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate

Run acappemin/video-to-audio-and-piano using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1",
    input={
        "video": "https://replicate.delivery/pbxt/MuNtlZd6EryS1SUKOcriHXHeJVGCEKlSTKU4b3HXuMv4Acb8/u5nBBJndN3I.00000004.mp4",
        "prompt": "the sound of playing piano",
        "if_piano": True,
        "v2a_num_steps": 25
    }
)
print(output)

To learn more, take a look at the guide on getting started with Python.

Run acappemin/video-to-audio-and-piano using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1",
    "input": {
      "video": "https://replicate.delivery/pbxt/MuNtlZd6EryS1SUKOcriHXHeJVGCEKlSTKU4b3HXuMv4Acb8/u5nBBJndN3I.00000004.mp4",
      "prompt": "the sound of playing piano",
      "if_piano": true,
      "v2a_num_steps": 25
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.

Output

{
  "completed_at": "2025-04-27T06:46:15.728540Z",
  "created_at": "2025-04-27T06:45:58.633000Z",
  "data_removed": false,
  "error": null,
  "id": "cw7p1f6tx5rma0cpeqz99rh7q8",
  "input": {
    "video": "https://replicate.delivery/pbxt/MuNtlZd6EryS1SUKOcriHXHeJVGCEKlSTKU4b3HXuMv4Acb8/u5nBBJndN3I.00000004.mp4",
    "prompt": "the sound of playing piano",
    "if_piano": true,
    "v2a_num_steps": 25
  },
  "logs": "torch.Size([1, 751, 128]) tensor([751], dtype=torch.int32) ['the sound of playing piano'] ['/tmp/tmplyuyirb7.mp4'] [False] None torch.Size([1, 1, 251, 100, 900]) torch.Size([1, 751, 51]) tensor(0.)\n2025-04-27 06:46:00.352 start\nframes_embed midis cond torch.Size([1, 751, 51]) tensor(2460.8906, device='cuda:0') torch.Size([1, 751, 51]) tensor(0., device='cuda:0') torch.Size([1, 751, 128]) tensor(216.9728, device='cuda:0')\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], 
device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\nNo cond tensor([751], device='cuda:0', dtype=torch.int32) tensor([751], device='cuda:0', dtype=torch.int32)\n2025-04-27 06:46:14.943 sample\nduration 10.01 10.01\nMoviepy - Building video /tmp/tmplyuyirb7.mp4.mp4.\nMoviePy - Writing audio in tmplyuyirb7.mp4TEMP_MPY_wvf_snd.mp4\nchunk:   0%|          | 0/221 [00:00<?, ?it/s, now=None]\nchunk:  82%|████████▏ | 181/221 [00:00<00:00, 1808.73it/s, now=None]\nMoviePy - Done.\nMoviepy - Writing video /tmp/tmplyuyirb7.mp4.mp4\nt:   0%|          | 0/251 [00:00<?, ?it/s, now=None]\nt:  30%|██▉       | 75/251 [00:00<00:00, 741.92it/s, now=None]\nt:  60%|█████▉    | 150/251 [00:00<00:00, 640.95it/s, now=None]\nt:  86%|████████▌ | 215/251 [00:00<00:00, 594.20it/s, now=None]\nMoviepy - Done !\nMoviepy - video ready /tmp/tmplyuyirb7.mp4.mp4\npaths /tmp/tmplyuyirb7.mp4 /tmp/tmplyuyirb7.mp4.wav /tmp/tmplyuyirb7.mp4.mp4",
  "metrics": {
    "predict_time": 17.087861393,
    "total_time": 17.09554
  },
  "output": "https://replicate.delivery/xezq/O8CQlR3efKh4MEOntzp8af2dlZFdHA5IIRLuYgxhB8kulwNpA/tmplyuyirb7.mp4.mp4",
  "started_at": "2025-04-27T06:45:58.640678Z",
  "status": "succeeded",
  "urls": {
    "stream": "https://stream.replicate.com/v1/files/bcwr-rsc2xfeioy4xnovpg3dtdxgv5tsbngfu6dvv537zdjmxel4s76bq",
    "get": "https://api.replicate.com/v1/predictions/cw7p1f6tx5rma0cpeqz99rh7q8",
    "cancel": "https://api.replicate.com/v1/predictions/cw7p1f6tx5rma0cpeqz99rh7q8/cancel"
  },
  "version": "d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1"
}
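When a request is made without waiting for completion, the `urls.get` endpoint in the response can be polled until `status` reaches a terminal value. A minimal sketch follows. `wait_for_prediction` and the injected `fetch` callable are hypothetical, and the set of terminal statuses is an assumption to check against Replicate's HTTP API reference.

```python
import time

# Assumed terminal states; anything else is treated as still in progress.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(get_url, fetch, interval=1.0, timeout=300.0, sleep=time.sleep):
    """Poll `get_url` with `fetch(url) -> dict` until the prediction finishes."""
    deadline = time.monotonic() + timeout
    while True:
        prediction = fetch(get_url)
        if prediction.get("status") in TERMINAL_STATUSES:
            return prediction
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"prediction still {prediction.get('status')!r} after {timeout}s"
            )
        sleep(interval)
```

In practice `fetch` would perform an authenticated GET with the same `Authorization` header used to create the prediction. Passing it in as a parameter keeps the polling loop independent of any particular HTTP library.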

Generated in

17.1 seconds


Prediction

acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1

Model: acappemin/video-to-audio-and-piano:d0808790
ID: haxymbrchsrme0cper0vtrd73r
Status: Succeeded
Source: Web
Hardware: L40S
Total duration: 23.3s
Created: 2 months ago

Input

video
prompt: the sound of ripping paper
if_piano: false
v2a_num_steps: 25

{
  "video": "https://replicate.delivery/pbxt/MuNvuAORvZG45IeGaBKw0zweyK5TJkJILmdKeAyRC5bDuC9c/1u1orBeV4xI_000428.mp4",
  "prompt": "the sound of ripping paper",
  "if_piano": false,
  "v2a_num_steps": 25
}

Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run acappemin/video-to-audio-and-piano using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1",
  {
    input: {
      video: "https://replicate.delivery/pbxt/MuNvuAORvZG45IeGaBKw0zweyK5TJkJILmdKeAyRC5bDuC9c/1u1orBeV4xI_000428.mp4",
      prompt: "the sound of ripping paper",
      if_piano: false,
      v2a_num_steps: 25
    }
  }
);

// To access the file URL:
console.log(output.url()); //=> "http://example.com"

// To write the file to disk (this model returns an MP4 video):
await fs.promises.writeFile("output.mp4", output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate

Run acappemin/video-to-audio-and-piano using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1",
    input={
        "video": "https://replicate.delivery/pbxt/MuNvuAORvZG45IeGaBKw0zweyK5TJkJILmdKeAyRC5bDuC9c/1u1orBeV4xI_000428.mp4",
        "prompt": "the sound of ripping paper",
        "if_piano": False,
        "v2a_num_steps": 25
    }
)
print(output)

To learn more, take a look at the guide on getting started with Python.

Run acappemin/video-to-audio-and-piano using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1",
    "input": {
      "video": "https://replicate.delivery/pbxt/MuNvuAORvZG45IeGaBKw0zweyK5TJkJILmdKeAyRC5bDuC9c/1u1orBeV4xI_000428.mp4",
      "prompt": "the sound of ripping paper",
      "if_piano": false,
      "v2a_num_steps": 25
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.

Output

{
  "completed_at": "2025-04-27T06:48:45.702405Z",
  "created_at": "2025-04-27T06:48:22.414000Z",
  "data_removed": false,
  "error": null,
  "id": "haxymbrchsrme0cper0vtrd73r",
  "input": {
    "video": "https://replicate.delivery/pbxt/MuNvuAORvZG45IeGaBKw0zweyK5TJkJILmdKeAyRC5bDuC9c/1u1orBeV4xI_000428.mp4",
    "prompt": "the sound of ripping paper",
    "if_piano": false,
    "v2a_num_steps": 25
  },
  "logs": "torch.Size([1, 752, 128]) tensor([752], dtype=torch.int32) ['the sound of ripping paper'] ['/tmp/tmpvx0f1eg3.mp4'] [False] None None None None\n2025-04-27 06:48:23.156 start\nframes_embed midis cond torch.Size([1, 752, 51]) tensor(0., device='cuda:0') None None torch.Size([1, 752, 128]) tensor(74.5061, device='cuda:0')\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], 
device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\nNo cond tensor([752], device='cuda:0', dtype=torch.int32) tensor([752], device='cuda:0', dtype=torch.int32)\n2025-04-27 06:48:42.812 sample\nduration 10.02 10.03\nMoviepy - Building video /tmp/tmpvx0f1eg3.mp4.mp4.\nMoviePy - Writing audio in tmpvx0f1eg3.mp4TEMP_MPY_wvf_snd.mp4\nchunk:   0%|          | 0/221 [00:00<?, ?it/s, now=None]\nchunk:  58%|█████▊    | 129/221 [00:00<00:00, 1273.47it/s, now=None]\nMoviePy - Done.\nMoviepy - Writing video /tmp/tmpvx0f1eg3.mp4.mp4\nt:   0%|          | 0/301 [00:00<?, ?it/s, now=None]\nt:   7%|▋         | 21/301 [00:00<00:01, 197.98it/s, now=None]\nt:  15%|█▍        | 44/301 [00:00<00:01, 213.66it/s, now=None]\nt:  22%|██▏       | 66/301 [00:00<00:01, 192.46it/s, now=None]\nt:  29%|██▊       | 86/301 [00:00<00:01, 152.84it/s, now=None]\nt:  35%|███▌      | 106/301 [00:00<00:01, 156.03it/s, now=None]\nt:  42%|████▏     | 126/301 [00:00<00:01, 166.35it/s, now=None]\nt:  48%|████▊     | 144/301 [00:00<00:00, 169.37it/s, now=None]\nt:  54%|█████▍    | 162/301 [00:00<00:00, 165.00it/s, now=None]\nt:  60%|█████▉    | 180/301 [00:01<00:00, 165.40it/s, now=None]\nt:  65%|██████▌   | 197/301 
[00:01<00:00, 149.37it/s, now=None]\nt:  71%|███████   | 213/301 [00:01<00:00, 143.34it/s, now=None]\nt:  76%|███████▌  | 229/301 [00:01<00:00, 146.87it/s, now=None]\nt:  81%|████████▏ | 245/301 [00:01<00:00, 149.78it/s, now=None]\nt:  87%|████████▋ | 261/301 [00:01<00:00, 151.83it/s, now=None]\nt:  92%|█████████▏| 277/301 [00:01<00:00, 146.75it/s, now=None]\nt:  98%|█████████▊| 295/301 [00:01<00:00, 155.36it/s, now=None]\n                                                               \nMoviepy - Done !\nMoviepy - video ready /tmp/tmpvx0f1eg3.mp4.mp4\npaths /tmp/tmpvx0f1eg3.mp4 /tmp/tmpvx0f1eg3.mp4.wav /tmp/tmpvx0f1eg3.mp4.mp4",
  "metrics": {
    "predict_time": 23.281406247,
    "total_time": 23.288405
  },
  "output": "https://replicate.delivery/xezq/HPL3CNp0JCJ0I5VzqaEwzSmQP2oMV6Sq464fUiTefAxbqwNpA/tmpvx0f1eg3.mp4.mp4",
  "started_at": "2025-04-27T06:48:22.420998Z",
  "status": "succeeded",
  "urls": {
    "stream": "https://stream.replicate.com/v1/files/bcwr-m24ttdxzg7muuugj7nza5byebctesqfzcxwn52nkktd6izvzydpa",
    "get": "https://api.replicate.com/v1/predictions/haxymbrchsrme0cper0vtrd73r",
    "cancel": "https://api.replicate.com/v1/predictions/haxymbrchsrme0cper0vtrd73r/cancel"
  },
  "version": "d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1"
}

Generated in

23.3 seconds



Prediction

acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1

Model: acappemin/video-to-audio-and-piano:d0808790
ID: nfhqsz27dhrm80cper19z11x0m
Status: Succeeded
Source: Web
Hardware: L40S
Total duration: 18.2s
Created: 2 months ago

Input

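video: https://replicate.delivery/pbxt/MuNxDqicnHV7mODmC0oITGRo9Sri0Ns0GpipeZ1M2gVc1knq/1uCzQCdCC1U_000170.mp4
prompt: the sound of race car, auto racing
if_piano: false
v2a_num_steps: 25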

{
  "video": "https://replicate.delivery/pbxt/MuNxDqicnHV7mODmC0oITGRo9Sri0Ns0GpipeZ1M2gVc1knq/1uCzQCdCC1U_000170.mp4",
  "prompt": "the sound of race car, auto racing",
  "if_piano": false,
  "v2a_num_steps": 25
}

Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run acappemin/video-to-audio-and-piano using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1",
  {
    input: {
      video: "https://replicate.delivery/pbxt/MuNxDqicnHV7mODmC0oITGRo9Sri0Ns0GpipeZ1M2gVc1knq/1uCzQCdCC1U_000170.mp4",
      prompt: "the sound of race car, auto racing",
      if_piano: false,
      v2a_num_steps: 25
    }
  }
);

// To access the file URL:
console.log(output.url()); //=> "http://example.com"

// To write the file to disk (the model returns an .mp4 video):
await fs.promises.writeFile("output.mp4", output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate

Run acappemin/video-to-audio-and-piano using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1",
    input={
        "video": "https://replicate.delivery/pbxt/MuNxDqicnHV7mODmC0oITGRo9Sri0Ns0GpipeZ1M2gVc1knq/1uCzQCdCC1U_000170.mp4",
        "prompt": "the sound of race car, auto racing",
        "if_piano": False,
        "v2a_num_steps": 25
    }
)
print(output)
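Printing the output only shows a reference to the generated file. To keep the video, something like the following sketch could be used. The helper name `save_output` is hypothetical, and the sketch assumes the client returns either a URL string (as in the `"output"` field of the prediction JSON) or a file-like object exposing `read()`; check which one your installed client version returns.

```python
import shutil
import urllib.request
from pathlib import Path


def save_output(output, path):
    """Write a prediction output to `path` and return the number of bytes written.

    `save_output` is a hypothetical helper, not part of the replicate package.
    """
    target = Path(path)
    if isinstance(output, str):
        # Assumption: a plain string is a downloadable URL, like the
        # replicate.delivery link in the prediction JSON.
        with urllib.request.urlopen(output) as resp, target.open("wb") as fh:
            shutil.copyfileobj(resp, fh)
    else:
        # Assumption: anything else is file-like and exposes read().
        target.write_bytes(output.read())
    return target.stat().st_size
```

For example, `save_output(output, "race_car.mp4")` after the `replicate.run(...)` call above would write the generated clip next to the script.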

To learn more, take a look at the guide on getting started with Python.

Run acappemin/video-to-audio-and-piano using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "acappemin/video-to-audio-and-piano:d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1",
    "input": {
      "video": "https://replicate.delivery/pbxt/MuNxDqicnHV7mODmC0oITGRo9Sri0Ns0GpipeZ1M2gVc1knq/1uCzQCdCC1U_000170.mp4",
      "prompt": "the sound of race car, auto racing",
      "if_piano": false,
      "v2a_num_steps": 25
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
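The `Prefer: wait` header asks the API to hold the connection until the prediction finishes. If the response comes back while the prediction is still running, the `urls.get` link in the response can be polled until it completes. The sketch below is one way to do that with the standard library only. The function names are hypothetical, and it assumes that `succeeded`, `failed`, and `canceled` are the terminal values of `status`.

```python
import json
import time
import urllib.request

# Assumption: these are the statuses after which a prediction no longer changes.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def fetch_prediction(get_url, token):
    """GET the prediction JSON from its `urls.get` endpoint."""
    request = urllib.request.Request(
        get_url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_prediction(get_url, token, fetch=fetch_prediction,
                        interval=1.0, max_polls=120, sleep=time.sleep):
    """Poll until the prediction reaches a terminal status or polls run out."""
    for _ in range(max_polls):
        prediction = fetch(get_url, token)
        if prediction["status"] in TERMINAL_STATUSES:
            return prediction
        sleep(interval)
    raise TimeoutError(f"prediction still running after {max_polls} polls")


# Usage (needs a valid token and network access):
#   import os
#   done = wait_for_prediction(
#       "https://api.replicate.com/v1/predictions/nfhqsz27dhrm80cper19z11x0m",
#       os.environ["REPLICATE_API_TOKEN"],
#   )
#   print(done["status"], done["output"])
```

The `fetch` and `sleep` parameters are injectable so the loop can be exercised without network access.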

Output

{
  "completed_at": "2025-04-27T06:50:01.255748Z",
  "created_at": "2025-04-27T06:49:43.020000Z",
  "data_removed": false,
  "error": null,
  "id": "nfhqsz27dhrm80cper19z11x0m",
  "input": {
    "video": "https://replicate.delivery/pbxt/MuNxDqicnHV7mODmC0oITGRo9Sri0Ns0GpipeZ1M2gVc1knq/1uCzQCdCC1U_000170.mp4",
    "prompt": "the sound of race car, auto racing",
    "if_piano": false,
    "v2a_num_steps": 25
  },
  "logs": "torch.Size([1, 753, 128]) tensor([753], dtype=torch.int32) ['the sound of race car, auto racing'] ['/tmp/tmprdangfkr.mp4'] [False] None None None None\n2025-04-27 06:49:43.474 start\nframes_embed midis cond torch.Size([1, 753, 51]) tensor(0., device='cuda:0') None None torch.Size([1, 753, 128]) tensor(14.5177, device='cuda:0')\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], 
device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\nNo cond tensor([753], device='cuda:0', dtype=torch.int32) tensor([753], device='cuda:0', dtype=torch.int32)\n2025-04-27 06:49:59.069 sample\nduration 10.04 10.04\nMoviepy - Building video /tmp/tmprdangfkr.mp4.mp4.\nMoviePy - Writing audio in tmprdangfkr.mp4TEMP_MPY_wvf_snd.mp4\nchunk:   0%|          | 0/222 [00:00<?, ?it/s, now=None]\nchunk:  76%|███████▌  | 169/222 [00:00<00:00, 1668.18it/s, now=None]\nMoviePy - Done.\nMoviepy - Writing video /tmp/tmprdangfkr.mp4.mp4\nt:   0%|          | 0/251 [00:00<?, ?it/s, now=None]\nt:  12%|█▏        | 30/251 [00:00<00:00, 297.99it/s, now=None]\nt:  25%|██▌       | 63/251 [00:00<00:00, 311.38it/s, now=None]\nt:  38%|███▊      | 95/251 [00:00<00:00, 291.66it/s, now=None]\nt:  50%|████▉     | 125/251 [00:00<00:00, 215.98it/s, now=None]\nt:  59%|█████▉    | 149/251 [00:00<00:00, 190.23it/s, now=None]\nt:  68%|██████▊   | 170/251 [00:00<00:00, 169.79it/s, now=None]\nt:  75%|███████▌  | 189/251 [00:00<00:00, 166.17it/s, now=None]\nt:  82%|████████▏ | 207/251 [00:01<00:00, 154.44it/s, now=None]\nt:  90%|█████████ | 226/251 [00:01<00:00, 159.56it/s, now=None]\nt:  97%|█████████▋| 244/251 
[00:01<00:00, 158.32it/s, now=None]\nMoviepy - Done !\nMoviepy - video ready /tmp/tmprdangfkr.mp4.mp4\npaths /tmp/tmprdangfkr.mp4 /tmp/tmprdangfkr.mp4.wav /tmp/tmprdangfkr.mp4.mp4",
  "metrics": {
    "predict_time": 18.228138133,
    "total_time": 18.235748
  },
  "output": "https://replicate.delivery/xezq/KGYPIdYAWZ70DZJXnVx84Xlrwg2fYXW4SegDGBWROX8ZW4mUA/tmprdangfkr.mp4.mp4",
  "started_at": "2025-04-27T06:49:43.027610Z",
  "status": "succeeded",
  "urls": {
    "stream": "https://stream.replicate.com/v1/files/bcwr-cofgrcqqbzwid7wxaur2knwobnvdsqpnizgdn774o4nfkua3jp4q",
    "get": "https://api.replicate.com/v1/predictions/nfhqsz27dhrm80cper19z11x0m",
    "cancel": "https://api.replicate.com/v1/predictions/nfhqsz27dhrm80cper19z11x0m/cancel"
  },
  "version": "d08087903b561981d8fe41af352a027e0e50b725e2a4dc8bd7b233f23dc2bdf1"
}
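The timestamps in this JSON can be cross-checked against the `metrics` block. The sketch below (the helper names are illustrative, not part of any client library) computes the intervals between `created_at`, `started_at`, and `completed_at`. For this prediction the results agree with `predict_time` and `total_time` to microsecond precision.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp with a trailing "Z" into an aware datetime."""
    # fromisoformat() only accepts "Z" on Python 3.11+, so normalise it first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def prediction_timings(prediction):
    """Return queue, predict, and total durations in seconds."""
    created = parse_timestamp(prediction["created_at"])
    started = parse_timestamp(prediction["started_at"])
    completed = parse_timestamp(prediction["completed_at"])
    return {
        "queue_seconds": (started - created).total_seconds(),
        "predict_seconds": (completed - started).total_seconds(),
        "total_seconds": (completed - created).total_seconds(),
    }


# Timestamps copied from the prediction JSON above.
prediction = {
    "created_at": "2025-04-27T06:49:43.020000Z",
    "started_at": "2025-04-27T06:49:43.027610Z",
    "completed_at": "2025-04-27T06:50:01.255748Z",
}
timings = prediction_timings(prediction)
print(timings)
```

Here `predict_seconds` comes out at about 18.228138 and `total_seconds` at about 18.235748, which line up with the reported `predict_time` of 18.228138133 and `total_time` of 18.235748.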

Generated in 18.2 seconds



