Readme
An affordable alternative to gpt-4o-mini.
Chat with an image or a video.
Run this model in Node.js with one line of code:
npm install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
import Replicate from "replicate";
const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run pipi32167/minicpm-v-26 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "pipi32167/minicpm-v-26:c1a2241cb7cbb6227cd55bc04e01df545daeb8d229d76bf737cb9bf7209a016e",
  {
    input: {
      image: "https://replicate.delivery/pbxt/LYg9UV9q67McivPuwvpQPjClOZ7KqaOHH5F2DgbfCZspIhOQ/input.mp4",
      prompt: ""
    }
  }
);
console.log(output);
To learn more, take a look at the guide on getting started with Node.js.
To run this model in Python, install Replicate's Python client library:
pip install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
import replicate
Run pipi32167/minicpm-v-26 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "pipi32167/minicpm-v-26:c1a2241cb7cbb6227cd55bc04e01df545daeb8d229d76bf737cb9bf7209a016e",
    input={
        "image": "https://replicate.delivery/pbxt/LYg9UV9q67McivPuwvpQPjClOZ7KqaOHH5F2DgbfCZspIhOQ/input.mp4",
        "prompt": ""
    }
)
print(output)
To learn more, take a look at the guide on getting started with Python.
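The Python call above can be wrapped for reuse roughly as follows. This is a minimal sketch, not the model author's code: `build_input` and `describe` are hypothetical helper names, and only the `image` and `prompt` inputs shown in the example are assumed to exist. The example run logged the prompt "Please describe the main content of the video." although the input prompt was empty, which suggests a default is applied; that a non-empty prompt replaces this default is an assumption.

```python
# Model reference copied from the example above.
MODEL = (
    "pipi32167/minicpm-v-26:"
    "c1a2241cb7cbb6227cd55bc04e01df545daeb8d229d76bf737cb9bf7209a016e"
)


def build_input(media_url: str, prompt: str = "") -> dict:
    """Build the input payload; the 'image' field takes an image or video URL."""
    return {"image": media_url, "prompt": prompt}


def describe(media_url: str, prompt: str = ""):
    """Run the model and return its output (needs REPLICATE_API_TOKEN set)."""
    # Imported lazily so build_input can be used without the client installed.
    import replicate

    return replicate.run(MODEL, input=build_input(media_url, prompt))
```

Calling `describe(url, "What is being cooked?")` would then send the question along with the media URL, while `describe(url)` reproduces the empty-prompt call shown above.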
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Run pipi32167/minicpm-v-26 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "pipi32167/minicpm-v-26:c1a2241cb7cbb6227cd55bc04e01df545daeb8d229d76bf737cb9bf7209a016e",
    "input": {
      "image": "https://replicate.delivery/pbxt/LYg9UV9q67McivPuwvpQPjClOZ7KqaOHH5F2DgbfCZspIhOQ/input.mp4",
      "prompt": ""
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
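For readers who prefer not to shell out to cURL, the same request can be sketched with Python's standard library. The endpoint, headers, and body mirror the cURL command above; `build_request` and `create_prediction` are hypothetical helper names, and the shape of the response is assumed to be the prediction JSON that the API returns.

```python
import json
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"
VERSION = (
    "pipi32167/minicpm-v-26:"
    "c1a2241cb7cbb6227cd55bc04e01df545daeb8d229d76bf737cb9bf7209a016e"
)


def build_request(token: str, media_url: str, prompt: str = "") -> urllib.request.Request:
    """Build (but do not send) the POST request from the cURL example."""
    body = {"version": VERSION, "input": {"image": media_url, "prompt": prompt}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # "Prefer: wait" is copied from the cURL command above.
            "Prefer": "wait",
        },
        method="POST",
    )


def create_prediction(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the payload before any billable call is made; the token would typically come from `os.environ["REPLICATE_API_TOKEN"]`.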
{
"completed_at": "2024-09-03T11:07:33.733939Z",
"created_at": "2024-09-03T11:06:02.389000Z",
"data_removed": false,
"error": null,
"id": "0stvkw132nrgm0chpy88ab58bc",
"input": {
"image": "https://replicate.delivery/pbxt/LYg9UV9q67McivPuwvpQPjClOZ7KqaOHH5F2DgbfCZspIhOQ/input.mp4",
"prompt": ""
},
"logs": "video_duration 20.099\nframe_rate 0.49753719090502013\nurl file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-001.jpg\nurl file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-002.jpg\nurl file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-003.jpg\nurl file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-004.jpg\nurl file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-005.jpg\nurl file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-006.jpg\nurl file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-007.jpg\nurl file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-008.jpg\nurl file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-009.jpg\nurl file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-010.jpg\nurl file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-011.jpg\nurl file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-012.jpg\n<|im_start|>system\nYou are an assistant who perfectly describes videos.<|im_end|>\n<|im_start|>user\nfile:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-001.jpgfile:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-002.jpgfile:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-003.jpgfile:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-004.jpgfile:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-005.jpgfile:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-006.jpgfile:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-007.jpgfile:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-008.jpgfile:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-009.jpgfile:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-010.jpgfile:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-011.jpgfile:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-012.jpgPlease describe the main content of the video.<|im_end|>\n<|im_start|>assistant\nuhd_slice_image: multiple 5\nuhd_slice_image: image_size: 720 1280; source_image size: 336 602\nuhd_slice_image: 
image_size: 720 1280; best_grid: 2 3\nuhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470\nclip_image_preprocess: 336 602\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_build_graph: 336 602\nencode_image_with_clip: step 1 of 7 encoded in 119.08 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 2 of 7 encoded in 84.09 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 3 of 7 encoded in 75.66 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 4 of 7 encoded in 69.73 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 5 of 7 encoded in 68.84 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 6 of 7 encoded in 69.82 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 7 of 7 encoded in 69.14 ms\nencode_image_with_clip: all 7 segments encoded in 556.90 ms\nencode_image_with_clip: load_image_size 720 1280\nencode_image_with_clip: image embedding created: 448 tokens\nencode_image_with_clip: image encoded in 559.59 ms by CLIP ( 1.25 ms per image patch)\nuhd_slice_image: multiple 5\nuhd_slice_image: image_size: 720 1280; source_image size: 336 602\nuhd_slice_image: image_size: 720 1280; best_grid: 2 3\nuhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470\nclip_image_preprocess: 336 602\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_build_graph: 336 602\nencode_image_with_clip: step 1 of 7 encoded in 74.52 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 2 of 7 encoded in 69.42 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 3 of 7 encoded in 69.61 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 4 
of 7 encoded in 69.40 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 5 of 7 encoded in 68.89 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 6 of 7 encoded in 68.94 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 7 of 7 encoded in 69.49 ms\nencode_image_with_clip: all 7 segments encoded in 490.78 ms\nencode_image_with_clip: load_image_size 720 1280\nencode_image_with_clip: image embedding created: 448 tokens\nencode_image_with_clip: image encoded in 491.73 ms by CLIP ( 1.10 ms per image patch)\nuhd_slice_image: multiple 5\nuhd_slice_image: image_size: 720 1280; source_image size: 336 602\nuhd_slice_image: image_size: 720 1280; best_grid: 2 3\nuhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470\nclip_image_preprocess: 336 602\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_build_graph: 336 602\nencode_image_with_clip: step 1 of 7 encoded in 75.94 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 2 of 7 encoded in 70.75 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 3 of 7 encoded in 71.90 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 4 of 7 encoded in 70.07 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 5 of 7 encoded in 70.25 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 6 of 7 encoded in 71.70 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 7 of 7 encoded in 69.42 ms\nencode_image_with_clip: all 7 segments encoded in 500.53 ms\nencode_image_with_clip: load_image_size 720 1280\nencode_image_with_clip: image embedding created: 448 tokens\nencode_image_with_clip: image encoded in 501.22 ms by CLIP ( 1.12 ms per image patch)\nuhd_slice_image: multiple 5\nuhd_slice_image: image_size: 720 1280; source_image size: 336 
602\nuhd_slice_image: image_size: 720 1280; best_grid: 2 3\nuhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470\nclip_image_preprocess: 336 602\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_build_graph: 336 602\nencode_image_with_clip: step 1 of 7 encoded in 73.35 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 2 of 7 encoded in 67.97 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 3 of 7 encoded in 69.35 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 4 of 7 encoded in 67.98 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 5 of 7 encoded in 70.11 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 6 of 7 encoded in 68.95 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 7 of 7 encoded in 69.41 ms\nencode_image_with_clip: all 7 segments encoded in 487.77 ms\nencode_image_with_clip: load_image_size 720 1280\nencode_image_with_clip: image embedding created: 448 tokens\nencode_image_with_clip: image encoded in 488.71 ms by CLIP ( 1.09 ms per image patch)\nuhd_slice_image: multiple 5\nuhd_slice_image: image_size: 720 1280; source_image size: 336 602\nuhd_slice_image: image_size: 720 1280; best_grid: 2 3\nuhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470\nclip_image_preprocess: 336 602\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_build_graph: 336 602\nencode_image_with_clip: step 1 of 7 encoded in 74.29 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 2 of 7 encoded in 69.76 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 3 of 7 encoded in 75.80 ms\nclip_image_build_graph: 406 
490\nencode_image_with_clip: step 4 of 7 encoded in 70.52 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 5 of 7 encoded in 71.28 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 6 of 7 encoded in 70.43 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 7 of 7 encoded in 69.91 ms\nencode_image_with_clip: all 7 segments encoded in 502.64 ms\nencode_image_with_clip: load_image_size 720 1280\nencode_image_with_clip: image embedding created: 448 tokens\nencode_image_with_clip: image encoded in 505.78 ms by CLIP ( 1.13 ms per image patch)\nuhd_slice_image: multiple 5\nuhd_slice_image: image_size: 720 1280; source_image size: 336 602\nuhd_slice_image: image_size: 720 1280; best_grid: 2 3\nuhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470\nclip_image_preprocess: 336 602\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_build_graph: 336 602\nencode_image_with_clip: step 1 of 7 encoded in 75.32 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 2 of 7 encoded in 68.72 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 3 of 7 encoded in 71.33 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 4 of 7 encoded in 69.88 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 5 of 7 encoded in 69.59 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 6 of 7 encoded in 69.81 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 7 of 7 encoded in 69.47 ms\nencode_image_with_clip: all 7 segments encoded in 494.69 ms\nencode_image_with_clip: load_image_size 720 1280\nencode_image_with_clip: image embedding created: 448 tokens\nencode_image_with_clip: image encoded in 497.21 ms by CLIP ( 1.11 ms per image patch)\nuhd_slice_image: multiple 5\nuhd_slice_image: image_size: 720 1280; 
source_image size: 336 602\nuhd_slice_image: image_size: 720 1280; best_grid: 2 3\nuhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470\nclip_image_preprocess: 336 602\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_build_graph: 336 602\nencode_image_with_clip: step 1 of 7 encoded in 73.98 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 2 of 7 encoded in 71.01 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 3 of 7 encoded in 69.62 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 4 of 7 encoded in 73.14 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 5 of 7 encoded in 70.56 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 6 of 7 encoded in 71.26 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 7 of 7 encoded in 70.41 ms\nencode_image_with_clip: all 7 segments encoded in 500.52 ms\nencode_image_with_clip: load_image_size 720 1280\nencode_image_with_clip: image embedding created: 448 tokens\nencode_image_with_clip: image encoded in 502.73 ms by CLIP ( 1.12 ms per image patch)\nuhd_slice_image: multiple 5\nuhd_slice_image: image_size: 720 1280; source_image size: 336 602\nuhd_slice_image: image_size: 720 1280; best_grid: 2 3\nuhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470\nclip_image_preprocess: 336 602\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_build_graph: 336 602\nencode_image_with_clip: step 1 of 7 encoded in 73.91 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 2 of 7 encoded in 69.73 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 3 of 7 encoded in 70.25 
ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 4 of 7 encoded in 70.65 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 5 of 7 encoded in 68.18 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 6 of 7 encoded in 69.02 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 7 of 7 encoded in 68.29 ms\nencode_image_with_clip: all 7 segments encoded in 490.62 ms\nencode_image_with_clip: load_image_size 720 1280\nencode_image_with_clip: image embedding created: 448 tokens\nencode_image_with_clip: image encoded in 493.28 ms by CLIP ( 1.10 ms per image patch)\nuhd_slice_image: multiple 5\nuhd_slice_image: image_size: 720 1280; source_image size: 336 602\nuhd_slice_image: image_size: 720 1280; best_grid: 2 3\nuhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470\nclip_image_preprocess: 336 602\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_build_graph: 336 602\nencode_image_with_clip: step 1 of 7 encoded in 75.18 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 2 of 7 encoded in 71.48 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 3 of 7 encoded in 69.44 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 4 of 7 encoded in 69.53 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 5 of 7 encoded in 69.74 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 6 of 7 encoded in 69.00 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 7 of 7 encoded in 71.82 ms\nencode_image_with_clip: all 7 segments encoded in 496.73 ms\nencode_image_with_clip: load_image_size 720 1280\nencode_image_with_clip: image embedding created: 448 tokens\nencode_image_with_clip: image encoded in 497.47 ms by CLIP ( 1.11 ms per image patch)\nuhd_slice_image: multiple 
5\nuhd_slice_image: image_size: 720 1280; source_image size: 336 602\nuhd_slice_image: image_size: 720 1280; best_grid: 2 3\nuhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470\nclip_image_preprocess: 336 602\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_build_graph: 336 602\nencode_image_with_clip: step 1 of 7 encoded in 74.10 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 2 of 7 encoded in 68.35 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 3 of 7 encoded in 68.40 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 4 of 7 encoded in 70.15 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 5 of 7 encoded in 67.47 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 6 of 7 encoded in 67.69 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 7 of 7 encoded in 68.39 ms\nencode_image_with_clip: all 7 segments encoded in 485.04 ms\nencode_image_with_clip: load_image_size 720 1280\nencode_image_with_clip: image embedding created: 448 tokens\nencode_image_with_clip: image encoded in 486.12 ms by CLIP ( 1.09 ms per image patch)\nuhd_slice_image: multiple 5\nuhd_slice_image: image_size: 720 1280; source_image size: 336 602\nuhd_slice_image: image_size: 720 1280; best_grid: 2 3\nuhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470\nclip_image_preprocess: 336 602\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_build_graph: 336 602\nencode_image_with_clip: step 1 of 7 encoded in 74.13 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 2 of 7 encoded in 68.29 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 3 of 7 
encoded in 67.89 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 4 of 7 encoded in 68.82 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 5 of 7 encoded in 70.03 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 6 of 7 encoded in 68.70 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 7 of 7 encoded in 68.47 ms\nencode_image_with_clip: all 7 segments encoded in 486.90 ms\nencode_image_with_clip: load_image_size 720 1280\nencode_image_with_clip: image embedding created: 448 tokens\nencode_image_with_clip: image encoded in 487.62 ms by CLIP ( 1.09 ms per image patch)\nuhd_slice_image: multiple 5\nuhd_slice_image: image_size: 720 1280; source_image size: 336 602\nuhd_slice_image: image_size: 720 1280; best_grid: 2 3\nuhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470\nclip_image_preprocess: 336 602\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_preprocess: 406 490\nclip_image_build_graph: 336 602\nencode_image_with_clip: step 1 of 7 encoded in 73.05 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 2 of 7 encoded in 68.70 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 3 of 7 encoded in 67.93 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 4 of 7 encoded in 68.90 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 5 of 7 encoded in 70.89 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 6 of 7 encoded in 68.80 ms\nclip_image_build_graph: 406 490\nencode_image_with_clip: step 7 of 7 encoded in 70.85 ms\nencode_image_with_clip: all 7 segments encoded in 489.69 ms\nencode_image_with_clip: load_image_size 720 1280\nencode_image_with_clip: image embedding created: 448 tokens\nencode_image_with_clip: image encoded in 490.52 ms by CLIP ( 1.09 ms per image patch)\nLlama.generate: 5406 
prefix-match hit, remaining 1 prompt tokens to eval\nllama_print_timings: load time = 3364.43 ms\nllama_print_timings: sample time = 31.65 ms / 301 runs ( 0.11 ms per token, 9510.27 tokens per second)\nllama_print_timings: prompt eval time = 0.00 ms / 0 tokens ( -nan ms per token, -nan tokens per second)\nllama_print_timings: eval time = 3527.89 ms / 301 runs ( 11.72 ms per token, 85.32 tokens per second)\nllama_print_timings: total time = 3880.06 ms / 301 tokens\n{'id': 'chatcmpl-20dbbaae-c86a-49eb-9859-a2e36981fe2b', 'object': 'chat.completion', 'created': 1725361649, 'model': './models/MiniCPM-V-2_6/ggml-model-Q4_K_M.gguf', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': \"The video showcases a commercial kitchen setup where a person is cooking using a specialized cooking device. Here's a detailed description:\\n\\n1. **Cooking Device**: The device appears to be a high-pressure cooking unit, often used in commercial kitchens for quick and efficient cooking. It has a round, deep pot with a lid and is mounted on a stainless steel platform.\\n\\n2. **Cooking Process**: \\n - In the first few frames, the device is shown with oil heating in the pot. The person is using a ladle to stir the contents, which seem to be pieces of food being cooked.\\n - As the video progresses, the pot is filled with a mixture of ingredients, likely vegetables and possibly meat or tofu, being cooked in the high-pressure environment.\\n - The final frames show the pot being opened, revealing the cooked mixture. The steam indicates that the cooking process was recently completed.\\n\\n3. **Kitchen Environment**: The surrounding area includes other kitchen equipment and utensils, indicating a professional or commercial kitchen setting.\\n\\n4. **User Interaction**: The person operating the device is seen using a ladle to stir and manage the cooking process, ensuring even cooking.\\n\\n5. 
**Branding and Information**: The video includes a watermark and a search prompt for a specific commercial kitchen equipment brand, suggesting that the video is promotional or informational in nature.\\n\\nOverall, the video demonstrates the use of a high-pressure cooking device in a commercial kitchen, highlighting its efficiency and the quality of the cooking process.\"}, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 5407, 'completion_tokens': 300, 'total_tokens': 5707}}",
"metrics": {
"predict_time": 16.566017158,
"total_time": 91.344939
},
"output": "The video showcases a commercial kitchen setup where a person is cooking using a specialized cooking device. Here's a detailed description:\n\n1. **Cooking Device**: The device appears to be a high-pressure cooking unit, often used in commercial kitchens for quick and efficient cooking. It has a round, deep pot with a lid and is mounted on a stainless steel platform.\n\n2. **Cooking Process**: \n - In the first few frames, the device is shown with oil heating in the pot. The person is using a ladle to stir the contents, which seem to be pieces of food being cooked.\n - As the video progresses, the pot is filled with a mixture of ingredients, likely vegetables and possibly meat or tofu, being cooked in the high-pressure environment.\n - The final frames show the pot being opened, revealing the cooked mixture. The steam indicates that the cooking process was recently completed.\n\n3. **Kitchen Environment**: The surrounding area includes other kitchen equipment and utensils, indicating a professional or commercial kitchen setting.\n\n4. **User Interaction**: The person operating the device is seen using a ladle to stir and manage the cooking process, ensuring even cooking.\n\n5. **Branding and Information**: The video includes a watermark and a search prompt for a specific commercial kitchen equipment brand, suggesting that the video is promotional or informational in nature.\n\nOverall, the video demonstrates the use of a high-pressure cooking device in a commercial kitchen, highlighting its efficiency and the quality of the cooking process.",
"started_at": "2024-09-03T11:07:17.167922Z",
"status": "succeeded",
"urls": {
"get": "https://api.replicate.com/v1/predictions/0stvkw132nrgm0chpy88ab58bc",
"cancel": "https://api.replicate.com/v1/predictions/0stvkw132nrgm0chpy88ab58bc/cancel"
},
"version": "c1a2241cb7cbb6227cd55bc04e01df545daeb8d229d76bf737cb9bf7209a016e"
}
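The timestamps and metrics in the prediction above can be cross-checked with a short script. The sketch below recomputes the durations from `created_at`, `started_at`, and `completed_at`, and estimates how many prompt tokens came from text, using the 12 keyframes, the 448 tokens per image embedding, and the 5407 prompt tokens reported in the logs. The helper names are hypothetical, and reading the gap before `started_at` as queueing or start-up time is an interpretation rather than something the record states.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp that ends in 'Z' (UTC)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def timing_breakdown(prediction: dict) -> dict:
    """Return seconds spent before the run started, running, and in total."""
    created = parse_ts(prediction["created_at"])
    started = parse_ts(prediction["started_at"])
    completed = parse_ts(prediction["completed_at"])
    return {
        "queued": (started - created).total_seconds(),
        "running": (completed - started).total_seconds(),
        "total": (completed - created).total_seconds(),
    }


def text_token_estimate(prompt_tokens: int, frames: int, tokens_per_frame: int) -> int:
    """Prompt tokens left over after subtracting the image embeddings."""
    return prompt_tokens - frames * tokens_per_frame


# Values copied from the prediction JSON and its logs.
prediction = {
    "created_at": "2024-09-03T11:06:02.389000Z",
    "started_at": "2024-09-03T11:07:17.167922Z",
    "completed_at": "2024-09-03T11:07:33.733939Z",
}
breakdown = timing_breakdown(prediction)
text_tokens = text_token_estimate(prompt_tokens=5407, frames=12, tokens_per_frame=448)

print(breakdown)
print(text_tokens)
```

The `running` and `total` values agree with `predict_time` and `total_time` in `metrics`, so most of the 91 seconds elapsed before the model began processing the video.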
video_duration 20.099
frame_rate 0.49753719090502013
url file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-001.jpg
url file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-002.jpg
url file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-003.jpg
url file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-004.jpg
url file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-005.jpg
url file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-006.jpg
url file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-007.jpg
url file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-008.jpg
url file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-009.jpg
url file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-010.jpg
url file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-011.jpg
url file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-012.jpg
<|im_start|>system
You are an assistant who perfectly describes videos.<|im_end|>
<|im_start|>user
file:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-001.jpgfile:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-002.jpgfile:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-003.jpgfile:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-004.jpgfile:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-005.jpgfile:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-006.jpgfile:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-007.jpgfile:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-008.jpgfile:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-009.jpgfile:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-010.jpgfile:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-011.jpgfile:///tmp/08e736d9-cac3-4df9-a293-b5b6d4f726a7/keyframe-012.jpgPlease describe the main content of the video.<|im_end|>
<|im_start|>assistant
uhd_slice_image: multiple 5
uhd_slice_image: image_size: 720 1280; source_image size: 336 602
uhd_slice_image: image_size: 720 1280; best_grid: 2 3
uhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470
clip_image_preprocess: 336 602
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 7 encoded in 119.08 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 2 of 7 encoded in 84.09 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 3 of 7 encoded in 75.66 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 4 of 7 encoded in 69.73 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 5 of 7 encoded in 68.84 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 6 of 7 encoded in 69.82 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 7 of 7 encoded in 69.14 ms
encode_image_with_clip: all 7 segments encoded in 556.90 ms
encode_image_with_clip: load_image_size 720 1280
encode_image_with_clip: image embedding created: 448 tokens
encode_image_with_clip: image encoded in 559.59 ms by CLIP ( 1.25 ms per image patch)
uhd_slice_image: multiple 5
uhd_slice_image: image_size: 720 1280; source_image size: 336 602
uhd_slice_image: image_size: 720 1280; best_grid: 2 3
uhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470
clip_image_preprocess: 336 602
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 7 encoded in 74.52 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 2 of 7 encoded in 69.42 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 3 of 7 encoded in 69.61 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 4 of 7 encoded in 69.40 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 5 of 7 encoded in 68.89 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 6 of 7 encoded in 68.94 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 7 of 7 encoded in 69.49 ms
encode_image_with_clip: all 7 segments encoded in 490.78 ms
encode_image_with_clip: load_image_size 720 1280
encode_image_with_clip: image embedding created: 448 tokens
encode_image_with_clip: image encoded in 491.73 ms by CLIP ( 1.10 ms per image patch)
uhd_slice_image: multiple 5
uhd_slice_image: image_size: 720 1280; source_image size: 336 602
uhd_slice_image: image_size: 720 1280; best_grid: 2 3
uhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470
clip_image_preprocess: 336 602
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 7 encoded in 75.94 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 2 of 7 encoded in 70.75 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 3 of 7 encoded in 71.90 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 4 of 7 encoded in 70.07 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 5 of 7 encoded in 70.25 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 6 of 7 encoded in 71.70 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 7 of 7 encoded in 69.42 ms
encode_image_with_clip: all 7 segments encoded in 500.53 ms
encode_image_with_clip: load_image_size 720 1280
encode_image_with_clip: image embedding created: 448 tokens
encode_image_with_clip: image encoded in 501.22 ms by CLIP ( 1.12 ms per image patch)
uhd_slice_image: multiple 5
uhd_slice_image: image_size: 720 1280; source_image size: 336 602
uhd_slice_image: image_size: 720 1280; best_grid: 2 3
uhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470
clip_image_preprocess: 336 602
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 7 encoded in 73.35 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 2 of 7 encoded in 67.97 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 3 of 7 encoded in 69.35 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 4 of 7 encoded in 67.98 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 5 of 7 encoded in 70.11 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 6 of 7 encoded in 68.95 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 7 of 7 encoded in 69.41 ms
encode_image_with_clip: all 7 segments encoded in 487.77 ms
encode_image_with_clip: load_image_size 720 1280
encode_image_with_clip: image embedding created: 448 tokens
encode_image_with_clip: image encoded in 488.71 ms by CLIP ( 1.09 ms per image patch)
uhd_slice_image: multiple 5
uhd_slice_image: image_size: 720 1280; source_image size: 336 602
uhd_slice_image: image_size: 720 1280; best_grid: 2 3
uhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470
clip_image_preprocess: 336 602
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 7 encoded in 74.29 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 2 of 7 encoded in 69.76 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 3 of 7 encoded in 75.80 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 4 of 7 encoded in 70.52 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 5 of 7 encoded in 71.28 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 6 of 7 encoded in 70.43 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 7 of 7 encoded in 69.91 ms
encode_image_with_clip: all 7 segments encoded in 502.64 ms
encode_image_with_clip: load_image_size 720 1280
encode_image_with_clip: image embedding created: 448 tokens
encode_image_with_clip: image encoded in 505.78 ms by CLIP ( 1.13 ms per image patch)
uhd_slice_image: multiple 5
uhd_slice_image: image_size: 720 1280; source_image size: 336 602
uhd_slice_image: image_size: 720 1280; best_grid: 2 3
uhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470
clip_image_preprocess: 336 602
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 7 encoded in 75.32 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 2 of 7 encoded in 68.72 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 3 of 7 encoded in 71.33 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 4 of 7 encoded in 69.88 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 5 of 7 encoded in 69.59 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 6 of 7 encoded in 69.81 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 7 of 7 encoded in 69.47 ms
encode_image_with_clip: all 7 segments encoded in 494.69 ms
encode_image_with_clip: load_image_size 720 1280
encode_image_with_clip: image embedding created: 448 tokens
encode_image_with_clip: image encoded in 497.21 ms by CLIP ( 1.11 ms per image patch)
uhd_slice_image: multiple 5
uhd_slice_image: image_size: 720 1280; source_image size: 336 602
uhd_slice_image: image_size: 720 1280; best_grid: 2 3
uhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470
clip_image_preprocess: 336 602
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 7 encoded in 73.98 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 2 of 7 encoded in 71.01 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 3 of 7 encoded in 69.62 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 4 of 7 encoded in 73.14 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 5 of 7 encoded in 70.56 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 6 of 7 encoded in 71.26 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 7 of 7 encoded in 70.41 ms
encode_image_with_clip: all 7 segments encoded in 500.52 ms
encode_image_with_clip: load_image_size 720 1280
encode_image_with_clip: image embedding created: 448 tokens
encode_image_with_clip: image encoded in 502.73 ms by CLIP ( 1.12 ms per image patch)
uhd_slice_image: multiple 5
uhd_slice_image: image_size: 720 1280; source_image size: 336 602
uhd_slice_image: image_size: 720 1280; best_grid: 2 3
uhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470
clip_image_preprocess: 336 602
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 7 encoded in 73.91 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 2 of 7 encoded in 69.73 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 3 of 7 encoded in 70.25 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 4 of 7 encoded in 70.65 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 5 of 7 encoded in 68.18 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 6 of 7 encoded in 69.02 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 7 of 7 encoded in 68.29 ms
encode_image_with_clip: all 7 segments encoded in 490.62 ms
encode_image_with_clip: load_image_size 720 1280
encode_image_with_clip: image embedding created: 448 tokens
encode_image_with_clip: image encoded in 493.28 ms by CLIP ( 1.10 ms per image patch)
uhd_slice_image: multiple 5
uhd_slice_image: image_size: 720 1280; source_image size: 336 602
uhd_slice_image: image_size: 720 1280; best_grid: 2 3
uhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470
clip_image_preprocess: 336 602
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 7 encoded in 75.18 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 2 of 7 encoded in 71.48 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 3 of 7 encoded in 69.44 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 4 of 7 encoded in 69.53 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 5 of 7 encoded in 69.74 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 6 of 7 encoded in 69.00 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 7 of 7 encoded in 71.82 ms
encode_image_with_clip: all 7 segments encoded in 496.73 ms
encode_image_with_clip: load_image_size 720 1280
encode_image_with_clip: image embedding created: 448 tokens
encode_image_with_clip: image encoded in 497.47 ms by CLIP ( 1.11 ms per image patch)
uhd_slice_image: multiple 5
uhd_slice_image: image_size: 720 1280; source_image size: 336 602
uhd_slice_image: image_size: 720 1280; best_grid: 2 3
uhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470
clip_image_preprocess: 336 602
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 7 encoded in 74.10 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 2 of 7 encoded in 68.35 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 3 of 7 encoded in 68.40 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 4 of 7 encoded in 70.15 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 5 of 7 encoded in 67.47 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 6 of 7 encoded in 67.69 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 7 of 7 encoded in 68.39 ms
encode_image_with_clip: all 7 segments encoded in 485.04 ms
encode_image_with_clip: load_image_size 720 1280
encode_image_with_clip: image embedding created: 448 tokens
encode_image_with_clip: image encoded in 486.12 ms by CLIP ( 1.09 ms per image patch)
uhd_slice_image: multiple 5
uhd_slice_image: image_size: 720 1280; source_image size: 336 602
uhd_slice_image: image_size: 720 1280; best_grid: 2 3
uhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470
clip_image_preprocess: 336 602
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 7 encoded in 74.13 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 2 of 7 encoded in 68.29 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 3 of 7 encoded in 67.89 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 4 of 7 encoded in 68.82 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 5 of 7 encoded in 70.03 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 6 of 7 encoded in 68.70 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 7 of 7 encoded in 68.47 ms
encode_image_with_clip: all 7 segments encoded in 486.90 ms
encode_image_with_clip: load_image_size 720 1280
encode_image_with_clip: image embedding created: 448 tokens
encode_image_with_clip: image encoded in 487.62 ms by CLIP ( 1.09 ms per image patch)
uhd_slice_image: multiple 5
uhd_slice_image: image_size: 720 1280; source_image size: 336 602
uhd_slice_image: image_size: 720 1280; best_grid: 2 3
uhd_slice_image: refine_image_size: 812 1470; refine_size: 812 1470
clip_image_preprocess: 336 602
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_preprocess: 406 490
clip_image_build_graph: 336 602
encode_image_with_clip: step 1 of 7 encoded in 73.05 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 2 of 7 encoded in 68.70 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 3 of 7 encoded in 67.93 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 4 of 7 encoded in 68.90 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 5 of 7 encoded in 70.89 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 6 of 7 encoded in 68.80 ms
clip_image_build_graph: 406 490
encode_image_with_clip: step 7 of 7 encoded in 70.85 ms
encode_image_with_clip: all 7 segments encoded in 489.69 ms
encode_image_with_clip: load_image_size 720 1280
encode_image_with_clip: image embedding created: 448 tokens
encode_image_with_clip: image encoded in 490.52 ms by CLIP ( 1.09 ms per image patch)
Llama.generate: 5406 prefix-match hit, remaining 1 prompt tokens to eval
llama_print_timings: load time = 3364.43 ms
llama_print_timings: sample time = 31.65 ms / 301 runs ( 0.11 ms per token, 9510.27 tokens per second)
llama_print_timings: prompt eval time = 0.00 ms / 0 tokens ( -nan ms per token, -nan tokens per second)
llama_print_timings: eval time = 3527.89 ms / 301 runs ( 11.72 ms per token, 85.32 tokens per second)
llama_print_timings: total time = 3880.06 ms / 301 tokens
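The per-token figures in the llama_print_timings lines above can be rechecked from the raw totals. The following is a minimal sketch, not part of the model's code: the regular expression and the `parse_timings` helper are hypothetical names introduced here, and the two sample lines are copied from the log.

```python
import re

# Two lines copied verbatim from the log above.
LOG = """\
llama_print_timings: sample time = 31.65 ms / 301 runs ( 0.11 ms per token, 9510.27 tokens per second)
llama_print_timings: eval time = 3527.89 ms / 301 runs ( 11.72 ms per token, 85.32 tokens per second)
"""

# Hypothetical pattern: captures the stage name, total milliseconds, and run count.
PATTERN = re.compile(
    r"llama_print_timings:\s+(\w+) time\s+=\s+([\d.]+) ms\s+/\s+(\d+) runs"
)


def parse_timings(log):
    """Return per-stage ms/token and tokens/second recomputed from the totals."""
    stats = {}
    for match in PATTERN.finditer(log):
        name = match.group(1)
        total_ms = float(match.group(2))
        runs = int(match.group(3))
        stats[name] = {
            "ms_per_token": total_ms / runs,
            "tokens_per_second": runs / (total_ms / 1000.0),
        }
    return stats


timings = parse_timings(LOG)
print(round(timings["eval"]["ms_per_token"], 2))       # 11.72
print(round(timings["eval"]["tokens_per_second"], 2))  # 85.32
```

The recomputed values agree with the figures printed in the log: 3527.89 ms over 301 runs is about 11.72 ms per token, or about 85.32 tokens per second.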
{'id': 'chatcmpl-20dbbaae-c86a-49eb-9859-a2e36981fe2b', 'object': 'chat.completion', 'created': 1725361649, 'model': './models/MiniCPM-V-2_6/ggml-model-Q4_K_M.gguf', 'choices': [{'index': 0, 'message': {'role': 'assistant', 'content': "The video showcases a commercial kitchen setup where a person is cooking using a specialized cooking device. Here's a detailed description:\n\n1. **Cooking Device**: The device appears to be a high-pressure cooking unit, often used in commercial kitchens for quick and efficient cooking. It has a round, deep pot with a lid and is mounted on a stainless steel platform.\n\n2. **Cooking Process**: \n - In the first few frames, the device is shown with oil heating in the pot. The person is using a ladle to stir the contents, which seem to be pieces of food being cooked.\n - As the video progresses, the pot is filled with a mixture of ingredients, likely vegetables and possibly meat or tofu, being cooked in the high-pressure environment.\n - The final frames show the pot being opened, revealing the cooked mixture. The steam indicates that the cooking process was recently completed.\n\n3. **Kitchen Environment**: The surrounding area includes other kitchen equipment and utensils, indicating a professional or commercial kitchen setting.\n\n4. **User Interaction**: The person operating the device is seen using a ladle to stir and manage the cooking process, ensuring even cooking.\n\n5. **Branding and Information**: The video includes a watermark and a search prompt for a specific commercial kitchen equipment brand, suggesting that the video is promotional or informational in nature.\n\nOverall, the video demonstrates the use of a high-pressure cooking device in a commercial kitchen, highlighting its efficiency and the quality of the cooking process."}, 'logprobs': None, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 5407, 'completion_tokens': 300, 'total_tokens': 5707}}
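The dictionary above follows the chat-completion shape, so the description and the token counts can be read out with plain dictionary access. This is a sketch under stated assumptions: `response` is a trimmed copy of the logged dictionary with the content shortened, `summarize` is a hypothetical helper, and the split into image and text tokens assumes that each of the 12 keyframes in the log contributes exactly the 448 tokens reported per image.

```python
# Trimmed copy of the dictionary from the log above (content shortened).
response = {
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The video showcases a commercial kitchen setup ...",
            },
            "logprobs": None,
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 5407, "completion_tokens": 300, "total_tokens": 5707},
}

# Values taken from the log: 12 keyframes, 448 embedding tokens per image.
FRAMES = 12
TOKENS_PER_FRAME = 448


def summarize(resp):
    """Pull out the answer text and a few token-accounting figures."""
    choice = resp["choices"][0]
    usage = resp["usage"]
    image_tokens = FRAMES * TOKENS_PER_FRAME
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "usage_adds_up": usage["prompt_tokens"] + usage["completion_tokens"]
        == usage["total_tokens"],
        "image_tokens": image_tokens,
        # Assumption: whatever is not image embedding is prompt text and template.
        "non_image_prompt_tokens": usage["prompt_tokens"] - image_tokens,
    }


summary = summarize(response)
print(summary["finish_reason"])            # stop
print(summary["image_tokens"])             # 5376
print(summary["non_image_prompt_tokens"])  # 31
```

Under that assumption, image embeddings account for 5376 of the 5407 prompt tokens, which suggests the number of keyframes is the main driver of prompt size.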
This model costs approximately $0.057 to run on Replicate, or 17 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
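The "17 runs per $1" figure follows from the approximate per-run cost. A small sketch of that arithmetic, where `runs_per_budget` is a hypothetical helper and the cost is the approximate value quoted above, so real totals will vary with the inputs:

```python
import math

# Approximate per-run cost quoted above; actual cost varies with the inputs.
COST_PER_RUN_USD = 0.057


def runs_per_budget(cost_per_run, budget_usd=1.0):
    """Whole number of runs that fit in a budget at a fixed per-run cost."""
    return math.floor(budget_usd / cost_per_run)


print(runs_per_budget(COST_PER_RUN_USD))  # 17
```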
This model runs on NVIDIA L40S GPU hardware. Predictions typically complete within 59 seconds, though the predict time varies significantly with the inputs.