lucataco / bulk-video-caption

model: gpt-4o
include_csv: false
system_prompt: Analyze these frames from a video and write a detailed caption. Describe the type of video (e.g., animation, live-action footage, etc.). Focus on consistent elements across frames and any notable motion or action. Describe the main subjects, setting, and overall mood of the video. Use clear, descriptive language suitable for text-to-video generation.
caption_prefix: (empty)
caption_suffix: (empty)
openai_api_key: ████████████████████
This value was redacted after being sent to the model.
frames_to_extract: 2
video_zip_archive: vhs-segmented.zip

{
  "model": "gpt-4o",
  "include_csv": false,
  "system_prompt": "Analyze these frames from a video and write a detailed caption. Describe the type of video (e.g., animation, live-action footage, etc.). Focus on consistent elements across frames and any notable motion or action. Describe the main subjects, setting, and overall mood of the video. Use clear, descriptive language suitable for text-to-video generation.",
  "caption_prefix": "",
  "caption_suffix": "",
  "openai_api_key": "[REDACTED]",
  "frames_to_extract": 2,
  "video_zip_archive": "https://replicate.delivery/pbxt/M8gKmAjJduERKnufcY1E6DyxZ7xUxyjqvO4mqgTvHqlJ6h0e/vhs-segmented.zip"
}

Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run lucataco/bulk-video-caption using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "lucataco/bulk-video-caption:bd610b3c0ecd967e0528ff94d6d1b19cb067aaa12cf1516029ca3803d7e46791",
  {
    input: {
      model: "gpt-4o",
      include_csv: false,
      system_prompt: "Analyze these frames from a video and write a detailed caption. Describe the type of video (e.g., animation, live-action footage, etc.). Focus on consistent elements across frames and any notable motion or action. Describe the main subjects, setting, and overall mood of the video. Use clear, descriptive language suitable for text-to-video generation.",
      caption_prefix: "",
      caption_suffix: "",
      openai_api_key: "[REDACTED]",
      frames_to_extract: 2,
      video_zip_archive: "https://replicate.delivery/pbxt/M8gKmAjJduERKnufcY1E6DyxZ7xUxyjqvO4mqgTvHqlJ6h0e/vhs-segmented.zip"
    }
  }
);

// To access the file URL:
console.log(output.url()); //=> "http://example.com"

// To write the file to disk:
// The output of this model is a zip archive, so save it with a .zip name.
await fs.promises.writeFile("video_captions.zip", output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate

Run lucataco/bulk-video-caption using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "lucataco/bulk-video-caption:bd610b3c0ecd967e0528ff94d6d1b19cb067aaa12cf1516029ca3803d7e46791",
    input={
        "model": "gpt-4o",
        "include_csv": False,
        "system_prompt": "Analyze these frames from a video and write a detailed caption. Describe the type of video (e.g., animation, live-action footage, etc.). Focus on consistent elements across frames and any notable motion or action. Describe the main subjects, setting, and overall mood of the video. Use clear, descriptive language suitable for text-to-video generation.",
        "caption_prefix": "",
        "caption_suffix": "",
        "openai_api_key": "[REDACTED]",
        "frames_to_extract": 2,
        "video_zip_archive": "https://replicate.delivery/pbxt/M8gKmAjJduERKnufcY1E6DyxZ7xUxyjqvO4mqgTvHqlJ6h0e/vhs-segmented.zip"
    }
)
print(output)

To learn more, take a look at the guide on getting started with Python.
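The page does not show what is inside video_captions.zip. The following is a minimal sketch for reading it once downloaded, assuming the archive holds one UTF-8 .txt caption file per video. The helper name read_captions and the demo file names are hypothetical, and how you obtain the archive bytes depends on your client version.

```python
import io
import zipfile


def read_captions(zip_bytes: bytes) -> dict[str, str]:
    """Map each .txt member of the archive to its decoded contents."""
    captions = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Assumption: captions are stored as UTF-8 .txt files.
            if name.lower().endswith(".txt"):
                captions[name] = archive.read(name).decode("utf-8").strip()
    return captions


# Build a small stand-in archive so the sketch runs without network access.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("segment1.txt", "Multicolored static noise on a dark background.\n")
    demo.writestr("captions.csv", "file,caption\n")

demo_captions = read_captions(buffer.getvalue())
print(demo_captions)
```

Non-text members such as a CSV are skipped here; if include_csv is enabled you may prefer to read that file with the csv module instead.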

Run lucataco/bulk-video-caption using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "lucataco/bulk-video-caption:bd610b3c0ecd967e0528ff94d6d1b19cb067aaa12cf1516029ca3803d7e46791",
    "input": {
      "model": "gpt-4o",
      "include_csv": false,
      "system_prompt": "Analyze these frames from a video and write a detailed caption. Describe the type of video (e.g., animation, live-action footage, etc.). Focus on consistent elements across frames and any notable motion or action. Describe the main subjects, setting, and overall mood of the video. Use clear, descriptive language suitable for text-to-video generation.",
      "caption_prefix": "",
      "caption_suffix": "",
      "openai_api_key": "[REDACTED]",
      "frames_to_extract": 2,
      "video_zip_archive": "https://replicate.delivery/pbxt/M8gKmAjJduERKnufcY1E6DyxZ7xUxyjqvO4mqgTvHqlJ6h0e/vhs-segmented.zip"
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
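For comparison, the same request can be assembled in Python with only the standard library, which sidesteps shell quoting of the JSON body. This is a sketch, not an official client: the OPENAI_API_KEY environment variable name is an assumption, and system_prompt, caption_prefix, and caption_suffix are left out on the assumption that the model falls back to defaults for them. The request is built but not sent.

```python
import json
import os
import urllib.request

payload = {
    "version": "lucataco/bulk-video-caption:bd610b3c0ecd967e0528ff94d6d1b19cb067aaa12cf1516029ca3803d7e46791",
    "input": {
        "model": "gpt-4o",
        "include_csv": False,
        # Assumption: the key is read from an environment variable of this name.
        "openai_api_key": os.environ.get("OPENAI_API_KEY", ""),
        "frames_to_extract": 2,
        "video_zip_archive": "https://replicate.delivery/pbxt/M8gKmAjJduERKnufcY1E6DyxZ7xUxyjqvO4mqgTvHqlJ6h0e/vhs-segmented.zip",
    },
}

request = urllib.request.Request(
    "https://api.replicate.com/v1/predictions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {os.environ.get('REPLICATE_API_TOKEN', '')}",
        "Content-Type": "application/json",
        "Prefer": "wait",  # same header as the curl example
    },
    method="POST",
)

# Sending is left commented out so the sketch has no side effects:
# with urllib.request.urlopen(request) as response:
#     prediction = json.load(response)
```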

Output

video_captions.zip

{
  "completed_at": "2024-12-13T21:24:25.601741Z",
  "created_at": "2024-12-13T21:24:03.658000Z",
  "data_removed": false,
  "error": null,
  "id": "4weatz1zs9rg80ckr7m83bz2k4",
  "input": {
    "model": "gpt-4o",
    "include_csv": false,
    "system_prompt": "Analyze these frames from a video and write a detailed caption. Describe the type of video (e.g., animation, live-action footage, etc.). Focus on consistent elements across frames and any notable motion or action. Describe the main subjects, setting, and overall mood of the video. Use clear, descriptive language suitable for text-to-video generation.",
    "caption_prefix": "",
    "caption_suffix": "",
    "openai_api_key": "[REDACTED]",
    "frames_to_extract": 2,
    "video_zip_archive": "https://replicate.delivery/pbxt/M8gKmAjJduERKnufcY1E6DyxZ7xUxyjqvO4mqgTvHqlJ6h0e/vhs-segmented.zip"
  },
  "logs": "Files extracted:\n/tmp/outputs/segment5.mp4\n/tmp/outputs/segment1.mp4\n/tmp/outputs/segment2.mp4\n/tmp/outputs/segment4.mp4\n/tmp/outputs/segment3.mp4\nNumber of videos to be captioned: 5\n===================================================\nProcessing segment5.mp4\nCaption: The frames depict a scene filled with multicolored static noise, commonly seen on analog television screens when a signal is not properly received. The image is characterized by a grainy texture with scattered bright specks against a predominantly dark background. This suggests a transition or a disruption in the video signal, creating a mood of ambiguity or interruption. The repetitive nature of these frames reinforces this feeling of disconnection or static pause in the video.\n===================================================\nProcessing segment1.mp4\nCaption: The frames from the video 'segment1.mp4' show a pattern of multicolored static noise, suggesting an interrupted signal or a transition effect. This static effect is reminiscent of old television screens when no broadcast is detected. The overall mood is mysterious, potentially indicating a disruption or an intentional stylistic choice to create a sense of anticipation or nostalgia. The frames are dark with a scattered array of colors, primarily blues and greens, accentuated by occasional purples and reds.\n===================================================\nProcessing segment2.mp4\nCaption: The frames from the video 'segment2.mp4' show a static-like pattern, characterized by multicolored noise scattered throughout a dark background. This effect is reminiscent of old television static, suggesting either a transition scene or an intentional visual effect. The mood is mysterious and evokes a sense of ambiguity, as no discernible subjects or settings are visible. 
The video type appears to be abstract, likely aiming to create an atmospheric or disorienting experience for the viewer.\n===================================================\nProcessing segment4.mp4\nCaption: These frames appear to be from a video showing the classic \"TV static\" effect, often used to depict a loss of signal or transition. The first frame shows a random pattern of multicolored noise on a dark background. In the second frame, a faint horizontal line is visible, suggesting a brief distortion or attempt to re-establish signal. The overall mood is chaotic and unresolved, typical of analog static interference. The video likely conveys themes of disruption or transition.\n===================================================\nProcessing segment3.mp4\nCaption: These frames from the video \"segment3.mp4\" depict a scene with static noise, often characteristic of analog television interference or a digital error. The frames have a dark background interspersed with multicolored specks, creating a grainy texture. The mood is enigmatic, suggesting a transition or interruption in the video content. There's no discernible motion or subjects visible.\n===================================================",
  "metrics": {
    "predict_time": 14.689693508,
    "total_time": 21.943741
  },
  "output": "https://replicate.delivery/czjl/rzctbdIA3P7BKdemX09bSReHlKTNRJE9li33Sj9aSfwTAL1nA/video_captions.zip",
  "started_at": "2024-12-13T21:24:10.912048Z",
  "status": "succeeded",
  "urls": {
    "stream": "https://stream.replicate.com/v1/files/fddq-jv44dxnkgt2y3vbb2xxmvlqyrk5ml7wv5gzgizjm33y6gilh76ma",
    "get": "https://api.replicate.com/v1/predictions/4weatz1zs9rg80ckr7m83bz2k4",
    "cancel": "https://api.replicate.com/v1/predictions/4weatz1zs9rg80ckr7m83bz2k4/cancel"
  },
  "version": "bd610b3c0ecd967e0528ff94d6d1b19cb067aaa12cf1516029ca3803d7e46791"
}
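In this response the metrics appear to line up with the timestamps: predict_time matches completed_at minus started_at, and total_time matches completed_at minus created_at. The short sketch below recomputes them from the values above; the helper name parse_timestamp is hypothetical, and the relationship is an observation from this one prediction rather than a documented guarantee.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


created = parse_timestamp("2024-12-13T21:24:03.658000Z")
started = parse_timestamp("2024-12-13T21:24:10.912048Z")
completed = parse_timestamp("2024-12-13T21:24:25.601741Z")

queue_seconds = (started - created).total_seconds()
predict_seconds = (completed - started).total_seconds()
total_seconds = (completed - created).total_seconds()

print(f"queued {queue_seconds:.3f}s, predicting {predict_seconds:.3f}s, total {total_seconds:.3f}s")
```

The difference between the two metrics, roughly 7.25 seconds here, is the time the prediction spent waiting before it started.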

Generated in 14.7 seconds


Files extracted:
/tmp/outputs/segment5.mp4
/tmp/outputs/segment1.mp4
/tmp/outputs/segment2.mp4
/tmp/outputs/segment4.mp4
/tmp/outputs/segment3.mp4
Number of videos to be captioned: 5
===================================================
Processing segment5.mp4
Caption: The frames depict a scene filled with multicolored static noise, commonly seen on analog television screens when a signal is not properly received. The image is characterized by a grainy texture with scattered bright specks against a predominantly dark background. This suggests a transition or a disruption in the video signal, creating a mood of ambiguity or interruption. The repetitive nature of these frames reinforces this feeling of disconnection or static pause in the video.
===================================================
Processing segment1.mp4
Caption: The frames from the video 'segment1.mp4' show a pattern of multicolored static noise, suggesting an interrupted signal or a transition effect. This static effect is reminiscent of old television screens when no broadcast is detected. The overall mood is mysterious, potentially indicating a disruption or an intentional stylistic choice to create a sense of anticipation or nostalgia. The frames are dark with a scattered array of colors, primarily blues and greens, accentuated by occasional purples and reds.
===================================================
Processing segment2.mp4
Caption: The frames from the video 'segment2.mp4' show a static-like pattern, characterized by multicolored noise scattered throughout a dark background. This effect is reminiscent of old television static, suggesting either a transition scene or an intentional visual effect. The mood is mysterious and evokes a sense of ambiguity, as no discernible subjects or settings are visible. The video type appears to be abstract, likely aiming to create an atmospheric or disorienting experience for the viewer.
===================================================
Processing segment4.mp4
Caption: These frames appear to be from a video showing the classic "TV static" effect, often used to depict a loss of signal or transition. The first frame shows a random pattern of multicolored noise on a dark background. In the second frame, a faint horizontal line is visible, suggesting a brief distortion or attempt to re-establish signal. The overall mood is chaotic and unresolved, typical of analog static interference. The video likely conveys themes of disruption or transition.
===================================================
Processing segment3.mp4
Caption: These frames from the video "segment3.mp4" depict a scene with static noise, often characteristic of analog television interference or a digital error. The frames have a dark background interspersed with multicolored specks, creating a grainy texture. The mood is enigmatic, suggesting a transition or interruption in the video content. There's no discernible motion or subjects visible.
===================================================
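The log format above is regular enough to parse: sections are separated by lines of "=" characters, and each section has a "Processing <file>" line followed by a "Caption:" body that may span several lines. A minimal sketch that pulls the captions out of the log text follows; parse_captions is a hypothetical helper, the sample log is shortened, and the format is inferred from this page rather than specified by the model.

```python
import re


def parse_captions(logs: str) -> dict[str, str]:
    """Extract {video filename: caption} from the prediction log text."""
    captions = {}
    # Sections are delimited by lines made up solely of "=" characters.
    for section in re.split(r"^=+[ \t]*$", logs, flags=re.MULTILINE):
        lines = section.strip().splitlines()
        if not lines or not lines[0].startswith("Processing "):
            continue  # skips the "Files extracted" preamble and empty tails
        filename = lines[0][len("Processing "):].strip()
        body = "\n".join(lines[1:]).strip()
        if body.startswith("Caption:"):
            body = body[len("Caption:"):].strip()
        captions[filename] = body
    return captions


sample_logs = """Files extracted:
/tmp/outputs/segment5.mp4
Number of videos to be captioned: 1
===================================================
Processing segment5.mp4
Caption: The frames depict a scene filled with multicolored static noise.
===================================================
"""

parsed = parse_captions(sample_logs)
print(parsed)
```

The same function can be applied to the "logs" field of the prediction JSON, which is handy when you want the captions without downloading the zip.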

Prediction

lucataco/bulk-video-caption:bd610b3c0ecd967e0528ff94d6d1b19cb067aaa12cf1516029ca3803d7e46791

Model: lucataco/bulk-video-caption:bd610b3c
ID: 8ma06swr21rg80ckr7rt2krra4
Status: Succeeded
Source: Web
Hardware: CPU
Total duration: 75.8s
Created: 7 months ago

Input

model: claude-3-5-sonnet-20240620
include_csv: false
system_prompt: Analyze these frames from a video and write a detailed caption. Describe the type of video (e.g., animation, live-action footage, etc.). Focus on consistent elements across frames and any notable motion or action. Describe the main subjects, setting, and overall mood of the video. Use clear, descriptive language suitable for text-to-video generation.
caption_prefix: (empty)
caption_suffix: (empty)
anthropic_api_key: ████████████████████
This value was redacted after being sent to the model.
frames_to_extract: 1
video_zip_archive: disney-12.zip

{
  "model": "claude-3-5-sonnet-20240620",
  "include_csv": false,
  "system_prompt": "Analyze these frames from a video and write a detailed caption. Describe the type of video (e.g., animation, live-action footage, etc.). Focus on consistent elements across frames and any notable motion or action. Describe the main subjects, setting, and overall mood of the video. Use clear, descriptive language suitable for text-to-video generation.",
  "caption_prefix": "",
  "caption_suffix": "",
  "anthropic_api_key": "[REDACTED]",
  "frames_to_extract": 1,
  "video_zip_archive": "https://replicate.delivery/pbxt/M8gUShBtt7fx8Oj1HNr6pxlMxyMx9URZzvtwj8yPzbA4nW03/disney-12.zip"
}
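The two examples on this page differ in which key field they send: the gpt-4o run uses openai_api_key, while the claude-3-5-sonnet-20240620 run uses anthropic_api_key. A small sketch of a helper that picks the field from the model name is below. The function build_input is hypothetical, and inferring the provider from the name prefix is an assumption based only on these two examples.

```python
def build_input(model: str, api_key: str, video_zip_url: str, frames_to_extract: int = 1) -> dict:
    """Assemble the input dict, picking the key field that matches the provider."""
    # Assumption: the provider can be inferred from the model-name prefix.
    if model.startswith("gpt-"):
        key_field = "openai_api_key"
    elif model.startswith("claude-"):
        key_field = "anthropic_api_key"
    else:
        raise ValueError(f"no known API key field for model {model!r}")
    return {
        "model": model,
        "include_csv": False,
        "caption_prefix": "",
        "caption_suffix": "",
        key_field: api_key,
        "frames_to_extract": frames_to_extract,
        "video_zip_archive": video_zip_url,
    }


example_input = build_input(
    "claude-3-5-sonnet-20240620",
    api_key="YOUR_KEY_HERE",  # placeholder, not a real key
    video_zip_url="https://replicate.delivery/pbxt/M8gUShBtt7fx8Oj1HNr6pxlMxyMx9URZzvtwj8yPzbA4nW03/disney-12.zip",
)
print(sorted(example_input))
```

The system_prompt field is omitted for brevity; add it to the returned dict if you want to override the prompt shown above.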

Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run lucataco/bulk-video-caption using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "lucataco/bulk-video-caption:bd610b3c0ecd967e0528ff94d6d1b19cb067aaa12cf1516029ca3803d7e46791",
  {
    input: {
      model: "claude-3-5-sonnet-20240620",
      include_csv: false,
      system_prompt: "Analyze these frames from a video and write a detailed caption. Describe the type of video (e.g., animation, live-action footage, etc.). Focus on consistent elements across frames and any notable motion or action. Describe the main subjects, setting, and overall mood of the video. Use clear, descriptive language suitable for text-to-video generation.",
      caption_prefix: "",
      caption_suffix: "",
      anthropic_api_key: "[REDACTED]",
      frames_to_extract: 1,
      video_zip_archive: "https://replicate.delivery/pbxt/M8gUShBtt7fx8Oj1HNr6pxlMxyMx9URZzvtwj8yPzbA4nW03/disney-12.zip"
    }
  }
);

// To access the file URL:
console.log(output.url()); //=> "http://example.com"

// To write the file to disk:
// The output of this model is a zip archive, so save it with a .zip name.
await fs.promises.writeFile("video_captions.zip", output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate

Run lucataco/bulk-video-caption using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "lucataco/bulk-video-caption:bd610b3c0ecd967e0528ff94d6d1b19cb067aaa12cf1516029ca3803d7e46791",
    input={
        "model": "claude-3-5-sonnet-20240620",
        "include_csv": False,
        "system_prompt": "Analyze these frames from a video and write a detailed caption. Describe the type of video (e.g., animation, live-action footage, etc.). Focus on consistent elements across frames and any notable motion or action. Describe the main subjects, setting, and overall mood of the video. Use clear, descriptive language suitable for text-to-video generation.",
        "caption_prefix": "",
        "caption_suffix": "",
        "anthropic_api_key": "[REDACTED]",
        "frames_to_extract": 1,
        "video_zip_archive": "https://replicate.delivery/pbxt/M8gUShBtt7fx8Oj1HNr6pxlMxyMx9URZzvtwj8yPzbA4nW03/disney-12.zip"
    }
)
print(output)

To learn more, take a look at the guide on getting started with Python.

Run lucataco/bulk-video-caption using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "lucataco/bulk-video-caption:bd610b3c0ecd967e0528ff94d6d1b19cb067aaa12cf1516029ca3803d7e46791",
    "input": {
      "model": "claude-3-5-sonnet-20240620",
      "include_csv": false,
      "system_prompt": "Analyze these frames from a video and write a detailed caption. Describe the type of video (e.g., animation, live-action footage, etc.). Focus on consistent elements across frames and any notable motion or action. Describe the main subjects, setting, and overall mood of the video. Use clear, descriptive language suitable for text-to-video generation.",
      "caption_prefix": "",
      "caption_suffix": "",
      "anthropic_api_key": "[REDACTED]",
      "frames_to_extract": 1,
      "video_zip_archive": "https://replicate.delivery/pbxt/M8gUShBtt7fx8Oj1HNr6pxlMxyMx9URZzvtwj8yPzbA4nW03/disney-12.zip"
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.

Output

video_captions.zip

{
  "completed_at": "2024-12-13T21:35:31.834346Z",
  "created_at": "2024-12-13T21:34:16.080000Z",
  "data_removed": false,
  "error": null,
  "id": "8ma06swr21rg80ckr7rt2krra4",
  "input": {
    "model": "claude-3-5-sonnet-20240620",
    "include_csv": false,
    "system_prompt": "Analyze these frames from a video and write a detailed caption. Describe the type of video (e.g., animation, live-action footage, etc.). Focus on consistent elements across frames and any notable motion or action. Describe the main subjects, setting, and overall mood of the video. Use clear, descriptive language suitable for text-to-video generation.",
    "caption_prefix": "",
    "caption_suffix": "",
    "anthropic_api_key": "[REDACTED]",
    "frames_to_extract": 1,
    "video_zip_archive": "https://replicate.delivery/pbxt/M8gUShBtt7fx8Oj1HNr6pxlMxyMx9URZzvtwj8yPzbA4nW03/disney-12.zip"
  },
  "logs": "Files extracted:\n/tmp/outputs/05ccfa61ece031e881d173289761cf91.mp4\n/tmp/outputs/4c918b917308ff03120e9e86650a2d3c.mp4\n/tmp/outputs/2c1ed5408882479b06681f7cf372916a.mp4\n/tmp/outputs/1d50a3d9703f152758d5422c8b48010f.mp4\n/tmp/outputs/5a0229ffdb3bd9d8e81dca7988d7cdbb.mp4\n/tmp/outputs/7d6dcf13f5c3d45b85c5ea0544c429e4.mp4\n/tmp/outputs/4adbb3a2945c9edd78785daccfd23e80.mp4\n/tmp/outputs/0bb5f6dbf8ed2e0060f0ac4164b24847.mp4\n/tmp/outputs/7fe0c83572de828da1cab0c118dece14.mp4\n/tmp/outputs/05a234b0164d015d468f2f53e771b4cf.mp4\n/tmp/outputs/8adfde998361b1d7c6f38a35481667fd.mp4\n/tmp/outputs/3f0979e6cae25447f416372c49ad5e07.mp4\nNumber of videos to be captioned: 12\n===================================================\nProcessing 05ccfa61ece031e881d173289761cf91.mp4\nCaption: This frame is from a classic black and white animated cartoon. The main subject is a stylized cartoon cow character with a cheerful expression, standing on what appears to be a wooden platform or bench. The cow has large, expressive eyes, a big smile, and is wearing a collar with the name \"BOB\" visible.\nThe animation style is reminiscent of early Disney or Fleischer Studios cartoons from the 1930s, with simple, bold lines and high contrast between the black character and the lighter background. The background itself is minimalistic, showing some curved lines suggesting a hilly or mountainous landscape in the distance.\nThe overall mood is lighthearted and whimsical, typical of early animated shorts. The cow's posture and expression give a sense of anticipation, as if it's about to embark on an adventure or perform some comical act.\nThis type of animation would likely be part of a short comedic cartoon, possibly featuring anthropomorphic animals in humorous situations. 
The simplicity of the style and the black and white coloration indicate this is likely from the early era of animated films.\n===================================================\nProcessing 4c918b917308ff03120e9e86650a2d3c.mp4\nCaption: This frame is from a classic black and white animated cartoon. The style is characteristic of early 20th century animation, reminiscent of Walt Disney or Fleischer Studios productions.\nThe main subject of the frame is a cartoon character that appears to be an anthropomorphic cow or bull. It has a distinctive round nose, horns, and is standing on two legs like a person. The character is positioned on what looks like a wooden platform or deck.\nThe background shows a simple, sketched landscape with minimal detail, typical of the era's animation style. There are suggestions of hills or mountains in the distance, and possibly a tree or structure at the top of the frame.\nThe overall mood is whimsical and light-hearted, as is common in cartoons of this period. The simplicity of the black and white imagery adds to the nostalgic feel of the scene.\nThis appears to be a single frame from a longer animated sequence, likely part of a story or comedic situation involving the cartoon character. The vintage animation style and character design suggest this is from a classic animated short film, possibly from the 1920s or 1930s.\n===================================================\nProcessing 2c1ed5408882479b06681f7cf372916a.mp4\nCaption: This frame is from a classic black and white animated cartoon, likely from the early days of Disney animation. The scene depicts Mickey Mouse, the iconic Disney character, standing in front of a large, menacing creature with an enormous open mouth filled with teeth. 
The creature appears to be some sort of monster or beast, possibly a representation of a cow or bull given its horns.\nThe animation style is characteristic of early 20th century cartoons, with simple, bold lines and high contrast between black and white elements. Mickey is shown in his classic design, with his round ears, big shoes, and white gloves. He appears small in comparison to the looming mouth of the creature, creating a sense of danger or challenge for the character.\nThe background is minimalistic, focusing attention on the main action. There's a hint of a curved line that could represent the floor or a wall, and a circular object in the upper right corner that might be a light fixture.\nThis frame likely comes from a comical or adventurous sequence where Mickey is facing off against or escaping from this large, intimidating creature. The exaggerated proportions and expressions are typical of the playful, often surreal nature of early animated shorts.\nThe overall mood is a mix of humor and mild peril, which was common in cartoons of this era. The stark black and white palette adds to the dramatic effect of the scene.\n===================================================\nProcessing 1d50a3d9703f152758d5422c8b48010f.mp4\nCaption: This frame is from a classic black and white animated cartoon, likely from the early days of Disney animation. The scene depicts Mickey Mouse standing at the helm of a steamboat, steering the large wheel. Behind him, a tall anthropomorphic character (possibly Goofy or Pete) is leaning over, appearing to interact with Mickey or observe his actions.\nThe setting is the deck of a steamboat, with a bell visible in the upper right corner and simplified, sketchy landscape elements in the background suggesting a body of water and distant shoreline. 
The art style is characteristic of early 20th century animation, with exaggerated, rubbery limbs and simplified features.\nThe mood of the scene appears light-hearted and adventurous, typical of Mickey Mouse cartoons of that era. The composition suggests movement and action, with Mickey at the center of the frame, actively engaged in steering the boat.\nThis type of animation represents a significant period in animation history, showcasing the charm and creativity of early Disney shorts that laid the foundation for future animated storytelling.\n===================================================\nProcessing 5a0229ffdb3bd9d8e81dca7988d7cdbb.mp4\nCaption: This frame is from a classic black and white animated cartoon, likely from the early days of Disney animation. The image shows three cartoon characters in a simple, hand-drawn style characteristic of early 20th century animation.\nIn the center of the frame is a tall, lanky character with exaggerated features, including an extremely long snout or nose. To the left is Mickey Mouse, instantly recognizable with his round ears and button nose. On the right is another mouse-like character, possibly Minnie Mouse, given the similar style to Mickey.\nThe background is minimal, with horizontal lines suggesting a basic setting. The characters are drawn with bold, black outlines against a light background, creating a stark contrast typical of this era of animation.\nThe scene appears to be a comedic moment, with the central character's elongated face creating a humorous visual gag. The positioning of the characters suggests they are interacting or reacting to this central figure's unusual appearance.\nThis frame captures the charm and simplicity of early animated shorts, where character expression and physical comedy were key elements in storytelling. 
The style is nostalgic, evoking the golden age of hand-drawn animation and the birth of iconic cartoon characters.\n===================================================\nProcessing 7d6dcf13f5c3d45b85c5ea0544c429e4.mp4\nCaption: This frame is from a classic black and white animated cartoon featuring Mickey Mouse. The scene takes place on a boat, with Mickey standing at the ship's wheel. The animation style is characteristic of early Disney cartoons, with simple, bold lines and high contrast.\nMickey is depicted in his iconic form, with large round ears, white gloves, and oversized shoes. He's shown in profile, gripping the spokes of the large wooden ship's wheel, which dominates the left side of the frame. The wheel is much larger than Mickey, emphasizing his small stature and the grand scale of the vessel.\nIn the background, we can see stylized representations of the sea and sky through what appears to be a porthole or window. A life preserver is visible on the right side of the frame, further establishing the nautical setting.\nThe mood of the scene is lighthearted and adventurous, typical of Mickey Mouse cartoons of this era. The composition suggests Mickey is steering the boat, possibly embarking on a sea voyage or maritime adventure.\nThis single frame captures the essence of early animation techniques and the charm of Mickey Mouse as a character, inviting viewers into a world of whimsical nautical exploration.\n===================================================\nProcessing 4adbb3a2945c9edd78785daccfd23e80.mp4\nCaption: This frame is from a classic black and white animated cartoon. The image shows a stylized cartoon bird, likely a crow or raven, perched on what appears to be a T-shaped stand or perch. The bird has a distinctive large beak, exaggerated eye, and sleek black feathers. 
Its posture is upright and alert, with its tail curling around the perch.\nThe animation style is characteristic of early to mid-20th century cartoons, with bold, simple lines and high contrast between the black bird and the light background. The bird's expression seems mischievous or cunning, typical of animated animal characters from this era.\nThe overall mood of the video appears to be lighthearted and comical, as is common in classic cartoon shorts. The simplicity of the scene suggests that this might be part of a longer narrative where the bird character plays a central role, perhaps as a trickster or clever protagonist.\nThis type of vintage animation often relies on exaggerated movements and expressions to convey humor and personality, so it's likely that subsequent frames would show the bird engaging in animated antics or reactions.\n===================================================\nProcessing 0bb5f6dbf8ed2e0060f0ac4164b24847.mp4\nCaption: This frame is from a classic black and white animated cartoon. The scene depicts a small, anthropomorphic mouse character, likely Mickey Mouse, standing on what appears to be a wooden surface or platform. Next to the mouse is a much larger, comical-looking cow or bull character, suspended by a harness or pulley system.\nThe cow is drawn in a exaggerated, cartoonish style with a large body, thin legs, and an expressive face. It appears to be dangling or about to be lifted, creating a humorous contrast with the small mouse character.\nThe background is minimalistic, with a white space suggesting a snowy or blank environment, typical of early animation styles. There's a hint of a horizon line or landscape in the distance.\nThis frame captures the whimsical and often absurd nature of early animated shorts, where physical comedy and visual gags were key elements. 
The stark contrast between the tiny mouse and the suspended large animal creates an immediate sense of anticipation for comedic action to follow.\nThe overall mood is light-hearted and playful, inviting viewers into a world where the laws of physics are bent for comedic effect. This type of scene would likely lead to a series of slapstick interactions between the characters, a hallmark of classic cartoon humor.\n===================================================\nProcessing 7fe0c83572de828da1cab0c118dece14.mp4\nCaption: This frame is from a black and white animated video, likely an early cartoon from the mid-20th century. The image shows a stylized cartoon bird perched on what appears to be a cylindrical object, possibly a perch or branch. The bird has an exaggerated, comical appearance with a large, smiling beak and a rounded body. Its tail feathers are long and curved, adding to the whimsical design.\nThe animation style is characteristic of early hand-drawn cartoons, with bold outlines and simple shapes. The background is minimal, focusing attention on the bird character. The contrast between the dark figure of the bird and the lighter background creates a clear, striking image.\nThis type of cartoon typically features anthropomorphized animals in humorous situations. The bird's cheerful expression suggests it might be part of a lighthearted, comedic scene or story. The simplicity of the design and the black and white coloration are indicative of the limitations and aesthetic choices of early animation techniques.\nThe overall mood of the video, based on this frame, appears to be playful and entertaining, aimed at delivering visual comedy through exaggerated character design and animated antics.\n===================================================\nProcessing 05a234b0164d015d468f2f53e771b4cf.mp4\nCaption: This frame is from a classic black and white animated cartoon. The scene depicts three cartoon characters on what appears to be a ship's deck or platform. 
In the center, there's a tall, lanky character (likely Goofy) holding onto a rope or cable. To his right is a smaller character with round ears (resembling Mickey Mouse) who seems to be leaning or looking overboard. On the left is another small character, possibly a sailor or captain given the hat.\nThe animation style is characteristic of early Disney cartoons, with simple, expressive designs and bold outlines. The background suggests an ocean setting, with faint outlines of what might be waves or distant landforms visible.\nThe scene conveys a sense of adventure or exploration, with the characters positioned as if they're observing something off-screen or preparing for some nautical activity. The composition creates a feeling of anticipation or curiosity about what the characters are seeing or about to do.\nThis type of vintage animation often features slapstick humor and exaggerated physical comedy, so the setup might be leading to a humorous situation or gag involving the characters and their seafaring exploits.\n===================================================\nProcessing 8adfde998361b1d7c6f38a35481667fd.mp4\nCaption: This frame is from a classic black and white animated cartoon, likely featuring Mickey Mouse. The scene is set in a kitchen, as indicated by the \"KITCH\" sign visible in the upper right corner and various kitchen implements throughout the frame.\nMickey Mouse stands in the center, holding utensils in each hand. He's surrounded by kitchen equipment, including a row of hanging pans above him, a large barrel or drum to his left, and what appears to be a trash can to his right. In the foreground, we can see part of a sink or countertop.\nThe animation style is characteristic of early Disney cartoons, with simple, bold lines and high contrast between the black and white elements. 
The kitchen setting suggests this might be the beginning of a cooking-related adventure or mishap.\nThe overall mood is lighthearted and playful, typical of classic Mickey Mouse cartoons. The kitchen setting and Mickey's pose with utensils imply that some form of culinary activity or comedy is about to unfold.\n===================================================\nProcessing 3f0979e6cae25447f416372c49ad5e07.mp4\nCaption: This frame is from a classic black and white animated cartoon video. The style is reminiscent of early Disney or similar pioneering animation studios from the 1920s or 1930s.\nThe scene depicts two cartoon characters on a simple background with horizontal lines, likely representing a musical staff. On the right is a character that appears to be an early version of Mickey Mouse, recognizable by his round ears and gloved hands. To the left is an anthropomorphic animal character with an exaggerated open mouth, possibly in the act of singing or making noise.\nMusical notes and symbols are scattered around the frame, reinforcing the musical theme. The animation style is characteristic of the era, with bold black outlines and simple shapes. The overall mood is playful and whimsical, typical of early animated shorts that often featured musical numbers or rhythmic action.\nThis single frame suggests the video is likely a musical cartoon featuring animated characters performing or interacting with music in some way. The vintage look and feel add to its charm and historical significance in the realm of animation.\n===================================================",
  "metrics": {
    "predict_time": 70.370199444,
    "total_time": 75.754346
  },
  "output": "https://replicate.delivery/czjl/ePE59RxX1mW4MafwHJuifFjTnBVbKmTh1ULc6VXVgfcOqWqPB/video_captions.zip",
  "started_at": "2024-12-13T21:34:21.464147Z",
  "status": "succeeded",
  "urls": {
    "stream": "https://stream.replicate.com/v1/files/fddq-qthilsykfnjcrv3yym5kv6ek2qza5jgpgtn7ebetqmwoy7ggbjya",
    "get": "https://api.replicate.com/v1/predictions/8ma06swr21rg80ckr7rt2krra4",
    "cancel": "https://api.replicate.com/v1/predictions/8ma06swr21rg80ckr7rt2krra4/cancel"
  },
  "version": "bd610b3c0ecd967e0528ff94d6d1b19cb067aaa12cf1516029ca3803d7e46791"
}
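The `metrics` block above can be turned into two rough figures: time spent outside the model run, and average time per video. The sketch below is a minimal illustration, not part of the model or the Replicate client. It assumes that `total_time` spans creation to completion and `predict_time` spans start to completion, so their difference is queue and setup overhead; the helper name `summarize_timing` is hypothetical. The video count of 12 comes from this prediction's logs.

```python
def summarize_timing(metrics: dict, n_videos: int) -> dict:
    """Derive overhead and per-video time from a prediction's metrics."""
    if n_videos <= 0:
        raise ValueError("n_videos must be positive")
    # Assumption: total_time - predict_time is time spent queued or starting up.
    overhead = metrics["total_time"] - metrics["predict_time"]
    # Average wall-clock seconds of model time per captioned video.
    per_video = metrics["predict_time"] / n_videos
    return {"overhead_s": overhead, "seconds_per_video": per_video}


# Values copied from the prediction JSON above.
metrics = {"predict_time": 70.370199444, "total_time": 75.754346}
summary = summarize_timing(metrics, n_videos=12)
print(f"overhead: {summary['overhead_s']:.2f}s")
print(f"per video: {summary['seconds_per_video']:.2f}s")
```

The per-video figure is only an average; individual captions depend on the latency of each OpenAI request.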

Generated in 70.4 seconds


Files extracted:
/tmp/outputs/05ccfa61ece031e881d173289761cf91.mp4
/tmp/outputs/4c918b917308ff03120e9e86650a2d3c.mp4
/tmp/outputs/2c1ed5408882479b06681f7cf372916a.mp4
/tmp/outputs/1d50a3d9703f152758d5422c8b48010f.mp4
/tmp/outputs/5a0229ffdb3bd9d8e81dca7988d7cdbb.mp4
/tmp/outputs/7d6dcf13f5c3d45b85c5ea0544c429e4.mp4
/tmp/outputs/4adbb3a2945c9edd78785daccfd23e80.mp4
/tmp/outputs/0bb5f6dbf8ed2e0060f0ac4164b24847.mp4
/tmp/outputs/7fe0c83572de828da1cab0c118dece14.mp4
/tmp/outputs/05a234b0164d015d468f2f53e771b4cf.mp4
/tmp/outputs/8adfde998361b1d7c6f38a35481667fd.mp4
/tmp/outputs/3f0979e6cae25447f416372c49ad5e07.mp4
Number of videos to be captioned: 12
===================================================
Processing 05ccfa61ece031e881d173289761cf91.mp4
Caption: This frame is from a classic black and white animated cartoon. The main subject is a stylized cartoon cow character with a cheerful expression, standing on what appears to be a wooden platform or bench. The cow has large, expressive eyes, a big smile, and is wearing a collar with the name "BOB" visible.
The animation style is reminiscent of early Disney or Fleischer Studios cartoons from the 1930s, with simple, bold lines and high contrast between the black character and the lighter background. The background itself is minimalistic, showing some curved lines suggesting a hilly or mountainous landscape in the distance.
The overall mood is lighthearted and whimsical, typical of early animated shorts. The cow's posture and expression give a sense of anticipation, as if it's about to embark on an adventure or perform some comical act.
This type of animation would likely be part of a short comedic cartoon, possibly featuring anthropomorphic animals in humorous situations. The simplicity of the style and the black and white coloration indicate this is likely from the early era of animated films.
===================================================
Processing 4c918b917308ff03120e9e86650a2d3c.mp4
Caption: This frame is from a classic black and white animated cartoon. The style is characteristic of early 20th century animation, reminiscent of Walt Disney or Fleischer Studios productions.
The main subject of the frame is a cartoon character that appears to be an anthropomorphic cow or bull. It has a distinctive round nose, horns, and is standing on two legs like a person. The character is positioned on what looks like a wooden platform or deck.
The background shows a simple, sketched landscape with minimal detail, typical of the era's animation style. There are suggestions of hills or mountains in the distance, and possibly a tree or structure at the top of the frame.
The overall mood is whimsical and light-hearted, as is common in cartoons of this period. The simplicity of the black and white imagery adds to the nostalgic feel of the scene.
This appears to be a single frame from a longer animated sequence, likely part of a story or comedic situation involving the cartoon character. The vintage animation style and character design suggest this is from a classic animated short film, possibly from the 1920s or 1930s.
===================================================
Processing 2c1ed5408882479b06681f7cf372916a.mp4
Caption: This frame is from a classic black and white animated cartoon, likely from the early days of Disney animation. The scene depicts Mickey Mouse, the iconic Disney character, standing in front of a large, menacing creature with an enormous open mouth filled with teeth. The creature appears to be some sort of monster or beast, possibly a representation of a cow or bull given its horns.
The animation style is characteristic of early 20th century cartoons, with simple, bold lines and high contrast between black and white elements. Mickey is shown in his classic design, with his round ears, big shoes, and white gloves. He appears small in comparison to the looming mouth of the creature, creating a sense of danger or challenge for the character.
The background is minimalistic, focusing attention on the main action. There's a hint of a curved line that could represent the floor or a wall, and a circular object in the upper right corner that might be a light fixture.
This frame likely comes from a comical or adventurous sequence where Mickey is facing off against or escaping from this large, intimidating creature. The exaggerated proportions and expressions are typical of the playful, often surreal nature of early animated shorts.
The overall mood is a mix of humor and mild peril, which was common in cartoons of this era. The stark black and white palette adds to the dramatic effect of the scene.
===================================================
Processing 1d50a3d9703f152758d5422c8b48010f.mp4
Caption: This frame is from a classic black and white animated cartoon, likely from the early days of Disney animation. The scene depicts Mickey Mouse standing at the helm of a steamboat, steering the large wheel. Behind him, a tall anthropomorphic character (possibly Goofy or Pete) is leaning over, appearing to interact with Mickey or observe his actions.
The setting is the deck of a steamboat, with a bell visible in the upper right corner and simplified, sketchy landscape elements in the background suggesting a body of water and distant shoreline. The art style is characteristic of early 20th century animation, with exaggerated, rubbery limbs and simplified features.
The mood of the scene appears light-hearted and adventurous, typical of Mickey Mouse cartoons of that era. The composition suggests movement and action, with Mickey at the center of the frame, actively engaged in steering the boat.
This type of animation represents a significant period in animation history, showcasing the charm and creativity of early Disney shorts that laid the foundation for future animated storytelling.
===================================================
Processing 5a0229ffdb3bd9d8e81dca7988d7cdbb.mp4
Caption: This frame is from a classic black and white animated cartoon, likely from the early days of Disney animation. The image shows three cartoon characters in a simple, hand-drawn style characteristic of early 20th century animation.
In the center of the frame is a tall, lanky character with exaggerated features, including an extremely long snout or nose. To the left is Mickey Mouse, instantly recognizable with his round ears and button nose. On the right is another mouse-like character, possibly Minnie Mouse, given the similar style to Mickey.
The background is minimal, with horizontal lines suggesting a basic setting. The characters are drawn with bold, black outlines against a light background, creating a stark contrast typical of this era of animation.
The scene appears to be a comedic moment, with the central character's elongated face creating a humorous visual gag. The positioning of the characters suggests they are interacting or reacting to this central figure's unusual appearance.
This frame captures the charm and simplicity of early animated shorts, where character expression and physical comedy were key elements in storytelling. The style is nostalgic, evoking the golden age of hand-drawn animation and the birth of iconic cartoon characters.
===================================================
Processing 7d6dcf13f5c3d45b85c5ea0544c429e4.mp4
Caption: This frame is from a classic black and white animated cartoon featuring Mickey Mouse. The scene takes place on a boat, with Mickey standing at the ship's wheel. The animation style is characteristic of early Disney cartoons, with simple, bold lines and high contrast.
Mickey is depicted in his iconic form, with large round ears, white gloves, and oversized shoes. He's shown in profile, gripping the spokes of the large wooden ship's wheel, which dominates the left side of the frame. The wheel is much larger than Mickey, emphasizing his small stature and the grand scale of the vessel.
In the background, we can see stylized representations of the sea and sky through what appears to be a porthole or window. A life preserver is visible on the right side of the frame, further establishing the nautical setting.
The mood of the scene is lighthearted and adventurous, typical of Mickey Mouse cartoons of this era. The composition suggests Mickey is steering the boat, possibly embarking on a sea voyage or maritime adventure.
This single frame captures the essence of early animation techniques and the charm of Mickey Mouse as a character, inviting viewers into a world of whimsical nautical exploration.
===================================================
Processing 4adbb3a2945c9edd78785daccfd23e80.mp4
Caption: This frame is from a classic black and white animated cartoon. The image shows a stylized cartoon bird, likely a crow or raven, perched on what appears to be a T-shaped stand or perch. The bird has a distinctive large beak, exaggerated eye, and sleek black feathers. Its posture is upright and alert, with its tail curling around the perch.
The animation style is characteristic of early to mid-20th century cartoons, with bold, simple lines and high contrast between the black bird and the light background. The bird's expression seems mischievous or cunning, typical of animated animal characters from this era.
The overall mood of the video appears to be lighthearted and comical, as is common in classic cartoon shorts. The simplicity of the scene suggests that this might be part of a longer narrative where the bird character plays a central role, perhaps as a trickster or clever protagonist.
This type of vintage animation often relies on exaggerated movements and expressions to convey humor and personality, so it's likely that subsequent frames would show the bird engaging in animated antics or reactions.
===================================================
Processing 0bb5f6dbf8ed2e0060f0ac4164b24847.mp4
Caption: This frame is from a classic black and white animated cartoon. The scene depicts a small, anthropomorphic mouse character, likely Mickey Mouse, standing on what appears to be a wooden surface or platform. Next to the mouse is a much larger, comical-looking cow or bull character, suspended by a harness or pulley system.
The cow is drawn in a exaggerated, cartoonish style with a large body, thin legs, and an expressive face. It appears to be dangling or about to be lifted, creating a humorous contrast with the small mouse character.
The background is minimalistic, with a white space suggesting a snowy or blank environment, typical of early animation styles. There's a hint of a horizon line or landscape in the distance.
This frame captures the whimsical and often absurd nature of early animated shorts, where physical comedy and visual gags were key elements. The stark contrast between the tiny mouse and the suspended large animal creates an immediate sense of anticipation for comedic action to follow.
The overall mood is light-hearted and playful, inviting viewers into a world where the laws of physics are bent for comedic effect. This type of scene would likely lead to a series of slapstick interactions between the characters, a hallmark of classic cartoon humor.
===================================================
Processing 7fe0c83572de828da1cab0c118dece14.mp4
Caption: This frame is from a black and white animated video, likely an early cartoon from the mid-20th century. The image shows a stylized cartoon bird perched on what appears to be a cylindrical object, possibly a perch or branch. The bird has an exaggerated, comical appearance with a large, smiling beak and a rounded body. Its tail feathers are long and curved, adding to the whimsical design.
The animation style is characteristic of early hand-drawn cartoons, with bold outlines and simple shapes. The background is minimal, focusing attention on the bird character. The contrast between the dark figure of the bird and the lighter background creates a clear, striking image.
This type of cartoon typically features anthropomorphized animals in humorous situations. The bird's cheerful expression suggests it might be part of a lighthearted, comedic scene or story. The simplicity of the design and the black and white coloration are indicative of the limitations and aesthetic choices of early animation techniques.
The overall mood of the video, based on this frame, appears to be playful and entertaining, aimed at delivering visual comedy through exaggerated character design and animated antics.
===================================================
Processing 05a234b0164d015d468f2f53e771b4cf.mp4
Caption: This frame is from a classic black and white animated cartoon. The scene depicts three cartoon characters on what appears to be a ship's deck or platform. In the center, there's a tall, lanky character (likely Goofy) holding onto a rope or cable. To his right is a smaller character with round ears (resembling Mickey Mouse) who seems to be leaning or looking overboard. On the left is another small character, possibly a sailor or captain given the hat.
The animation style is characteristic of early Disney cartoons, with simple, expressive designs and bold outlines. The background suggests an ocean setting, with faint outlines of what might be waves or distant landforms visible.
The scene conveys a sense of adventure or exploration, with the characters positioned as if they're observing something off-screen or preparing for some nautical activity. The composition creates a feeling of anticipation or curiosity about what the characters are seeing or about to do.
This type of vintage animation often features slapstick humor and exaggerated physical comedy, so the setup might be leading to a humorous situation or gag involving the characters and their seafaring exploits.
===================================================
Processing 8adfde998361b1d7c6f38a35481667fd.mp4
Caption: This frame is from a classic black and white animated cartoon, likely featuring Mickey Mouse. The scene is set in a kitchen, as indicated by the "KITCH" sign visible in the upper right corner and various kitchen implements throughout the frame.
Mickey Mouse stands in the center, holding utensils in each hand. He's surrounded by kitchen equipment, including a row of hanging pans above him, a large barrel or drum to his left, and what appears to be a trash can to his right. In the foreground, we can see part of a sink or countertop.
The animation style is characteristic of early Disney cartoons, with simple, bold lines and high contrast between the black and white elements. The kitchen setting suggests this might be the beginning of a cooking-related adventure or mishap.
The overall mood is lighthearted and playful, typical of classic Mickey Mouse cartoons. The kitchen setting and Mickey's pose with utensils imply that some form of culinary activity or comedy is about to unfold.
===================================================
Processing 3f0979e6cae25447f416372c49ad5e07.mp4
Caption: This frame is from a classic black and white animated cartoon video. The style is reminiscent of early Disney or similar pioneering animation studios from the 1920s or 1930s.
The scene depicts two cartoon characters on a simple background with horizontal lines, likely representing a musical staff. On the right is a character that appears to be an early version of Mickey Mouse, recognizable by his round ears and gloved hands. To the left is an anthropomorphic animal character with an exaggerated open mouth, possibly in the act of singing or making noise.
Musical notes and symbols are scattered around the frame, reinforcing the musical theme. The animation style is characteristic of the era, with bold black outlines and simple shapes. The overall mood is playful and whimsical, typical of early animated shorts that often featured musical numbers or rhythmic action.
This single frame suggests the video is likely a musical cartoon featuring animated characters performing or interacting with music in some way. The vintage look and feel add to its charm and historical significance in the realm of animation.
===================================================
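The log above follows a regular shape: a preamble, then one block per video introduced by `Processing <filename>` and `Caption:`, with blocks separated by lines of `=` characters. A minimal sketch of turning that text into a filename-to-caption mapping is shown below. It relies only on the layout visible in these logs, which the model does not document as a stable format, and `parse_caption_log` is a hypothetical helper name. The sample input is synthetic.

```python
import re


def parse_caption_log(log_text: str) -> dict[str, str]:
    """Map each processed filename to its caption, based on the log layout."""
    captions: dict[str, str] = {}
    # Blocks are delimited by lines made up solely of "=" characters.
    for block in re.split(r"^=+$", log_text, flags=re.MULTILINE):
        block = block.strip()
        if not block.startswith("Processing "):
            continue  # skips the "Files extracted:" preamble and empty tails
        header, _, body = block.partition("\n")
        filename = header[len("Processing "):].strip()
        if body.startswith("Caption:"):
            body = body[len("Caption:"):]
        # Multi-paragraph captions keep their internal line breaks.
        captions[filename] = body.strip()
    return captions


# Synthetic sample in the same layout as the logs above.
sample = (
    "Files extracted:\n/tmp/outputs/a.mp4\n"
    "Number of videos to be captioned: 1\n"
    "=====\n"
    "Processing a.mp4\n"
    "Caption: First paragraph.\nSecond paragraph.\n"
    "=====\n"
)
for name, caption in parse_caption_log(sample).items():
    print(name, "->", caption.splitlines()[0])
```

Parsing logs is a fallback; the `video_captions.zip` output is the intended way to retrieve captions.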

Prediction

lucataco/bulk-video-caption:bd610b3c0ecd967e0528ff94d6d1b19cb067aaa12cf1516029ca3803d7e46791

Model: lucataco/bulk-video-caption:bd610b3c
ID: ah7ykc0gxxrg80ckr7tty5xvc8
Status: Succeeded
Source: Web
Hardware: CPU
Total duration: 16.8s
Created: 7 months ago

Input

model: gpt-4o
include_csv: false
system_prompt: Analyze these frames from a video and write a detailed caption. Describe the type of video (e.g., animation, live-action footage, etc.). Focus on consistent elements across frames and any notable motion or action. Describe the main subjects, setting, and overall mood of the video. Use clear, descriptive language suitable for text-to-video generation.
caption_prefix: (empty)
caption_suffix: melty.
openai_api_key: ████████████████████
This value was redacted after being sent to the model.
frames_to_extract: 1
video_zip_archive: melty-seg-3.zip

{
  "model": "gpt-4o",
  "include_csv": false,
  "system_prompt": "Analyze these frames from a video and write a detailed caption. Describe the type of video (e.g., animation, live-action footage, etc.). Focus on consistent elements across frames and any notable motion or action. Describe the main subjects, setting, and overall mood of the video. Use clear, descriptive language suitable for text-to-video generation.",
  "caption_prefix": "",
  "caption_suffix": "melty.",
  "openai_api_key": "[REDACTED]",
  "frames_to_extract": 1,
  "video_zip_archive": "https://replicate.delivery/pbxt/M8gY3MV17leQVFCbrJo0CWRznIv6joAa20dfVfqCpwulNHKj/melty-seg-3.zip"
}

Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run lucataco/bulk-video-caption using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "lucataco/bulk-video-caption:bd610b3c0ecd967e0528ff94d6d1b19cb067aaa12cf1516029ca3803d7e46791",
  {
    input: {
      model: "gpt-4o",
      include_csv: false,
      system_prompt: "Analyze these frames from a video and write a detailed caption. Describe the type of video (e.g., animation, live-action footage, etc.). Focus on consistent elements across frames and any notable motion or action. Describe the main subjects, setting, and overall mood of the video. Use clear, descriptive language suitable for text-to-video generation.",
      caption_prefix: "",
      caption_suffix: "melty.",
      openai_api_key: "[REDACTED]",
      frames_to_extract: 1,
      video_zip_archive: "https://replicate.delivery/pbxt/M8gY3MV17leQVFCbrJo0CWRznIv6joAa20dfVfqCpwulNHKj/melty-seg-3.zip"
    }
  }
);

// To access the file URL:
console.log(output.url()); //=> "http://example.com"

// To write the file to disk (this model's output is a zip archive, not an image):
await fs.promises.writeFile("video_captions.zip", output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate

Run lucataco/bulk-video-caption using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "lucataco/bulk-video-caption:bd610b3c0ecd967e0528ff94d6d1b19cb067aaa12cf1516029ca3803d7e46791",
    input={
        "model": "gpt-4o",
        "include_csv": False,
        "system_prompt": "Analyze these frames from a video and write a detailed caption. Describe the type of video (e.g., animation, live-action footage, etc.). Focus on consistent elements across frames and any notable motion or action. Describe the main subjects, setting, and overall mood of the video. Use clear, descriptive language suitable for text-to-video generation.",
        "caption_prefix": "",
        "caption_suffix": "melty.",
        "openai_api_key": "[REDACTED]",
        "frames_to_extract": 1,
        "video_zip_archive": "https://replicate.delivery/pbxt/M8gY3MV17leQVFCbrJo0CWRznIv6joAa20dfVfqCpwulNHKj/melty-seg-3.zip"
    }
)
print(output)

To learn more, take a look at the guide on getting started with Python.
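The output of this model is a `video_captions.zip` archive, so `print(output)` alone does not show the captions. The sketch below reads every file in such an archive into memory. The layout inside the archive is not documented on this page, so the code makes no assumption about file names and simply returns whatever members it finds; `read_caption_archive` is a hypothetical helper name.

```python
import io
import zipfile


def read_caption_archive(zip_bytes: bytes) -> dict[str, str]:
    """Return {member name: decoded text} for every file in the archive."""
    texts: dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no caption text
            raw = archive.read(info)
            # Replace undecodable bytes rather than failing on one bad file.
            texts[info.filename] = raw.decode("utf-8", errors="replace")
    return texts


# Usage with the client call above (needs network access and an API token).
# Assumption: the installed client returns a file-like object with .read();
# older client versions may return a URL string that must be downloaded first.
#   captions = read_caption_archive(output.read())
#   for name, text in captions.items():
#       print(name, text[:80])
```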

Run lucataco/bulk-video-caption using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "lucataco/bulk-video-caption:bd610b3c0ecd967e0528ff94d6d1b19cb067aaa12cf1516029ca3803d7e46791",
    "input": {
      "model": "gpt-4o",
      "include_csv": false,
      "system_prompt": "Analyze these frames from a video and write a detailed caption. Describe the type of video (e.g., animation, live-action footage, etc.). Focus on consistent elements across frames and any notable motion or action. Describe the main subjects, setting, and overall mood of the video. Use clear, descriptive language suitable for text-to-video generation.",
      "caption_prefix": "",
      "caption_suffix": "melty.",
      "openai_api_key": "[REDACTED]",
      "frames_to_extract": 1,
      "video_zip_archive": "https://replicate.delivery/pbxt/M8gY3MV17leQVFCbrJo0CWRznIv6joAa20dfVfqCpwulNHKj/melty-seg-3.zip"
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.

Output

video_captions.zip

{
  "completed_at": "2024-12-13T21:38:20.454214Z",
  "created_at": "2024-12-13T21:38:03.631000Z",
  "data_removed": false,
  "error": null,
  "id": "ah7ykc0gxxrg80ckr7tty5xvc8",
  "input": {
    "model": "gpt-4o",
    "include_csv": false,
    "system_prompt": "Analyze these frames from a video and write a detailed caption. Describe the type of video (e.g., animation, live-action footage, etc.). Focus on consistent elements across frames and any notable motion or action. Describe the main subjects, setting, and overall mood of the video. Use clear, descriptive language suitable for text-to-video generation.",
    "caption_prefix": "",
    "caption_suffix": "melty.",
    "openai_api_key": "[REDACTED]",
    "frames_to_extract": 1,
    "video_zip_archive": "https://replicate.delivery/pbxt/M8gY3MV17leQVFCbrJo0CWRznIv6joAa20dfVfqCpwulNHKj/melty-seg-3.zip"
  },
  "logs": "Files extracted:\n/tmp/outputs/straws.mov\n/tmp/outputs/glove.mov\n/tmp/outputs/cat.mov\nNumber of videos to be captioned: 3\n===================================================\nProcessing straws.mov\nCaption: A vibrant cluster of colorful drinking straws is tightly packed together, showcasing a range of bright hues including orange, pink, blue, green, and yellow. The straws are slightly tilted, suggesting movement or rearrangement. The background is a neutral gray, emphasizing the vividness of the straws. The mood is playful and dynamic. Melty.\n===================================================\nProcessing glove.mov\nCaption: A pale yellow rubber glove is placed against a smooth, softly-lit background. The material appears shiny and slightly wrinkled, suggesting a relaxed and casual setting. The overall mood is calm and minimalistic.\nmelty.\n===================================================\nProcessing cat.mov\nCaption: The frame depicts an adorable kitten with large, expressive eyes and soft, striped fur, looking intently at something off-camera while nestled on a cozy surface. The kitten's curious expression is captivating, adding to the overall warm and endearing mood of the scene. melty.\n===================================================",
  "metrics": {
    "predict_time": 10.573291217,
    "total_time": 16.823214
  },
  "output": "https://replicate.delivery/czjl/4rU99k2uoTr6PFHn0fQCtw2BtMSG7F2ekCwdoQBlBkJMtl6TA/video_captions.zip",
  "started_at": "2024-12-13T21:38:09.880923Z",
  "status": "succeeded",
  "urls": {
    "stream": "https://stream.replicate.com/v1/files/fddq-s3okqivpofyzbvqriiomt27rwbvkvv7g65b6yadjzz6qv2f6qgxq",
    "get": "https://api.replicate.com/v1/predictions/ah7ykc0gxxrg80ckr7tty5xvc8",
    "cancel": "https://api.replicate.com/v1/predictions/ah7ykc0gxxrg80ckr7tty5xvc8/cancel"
  },
  "version": "bd610b3c0ecd967e0528ff94d6d1b19cb067aaa12cf1516029ca3803d7e46791"
}
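In the `logs` field above, the `caption_suffix` value `melty.` shows up in three slightly different forms: capitalized (`Melty.`), on its own line, and after a space. If downstream training data needs a uniform suffix, a small post-processing step can normalize it. This is a sketch under the assumption that a single space-separated suffix is the desired form; `ensure_suffix` is a hypothetical helper and not part of the model.

```python
def ensure_suffix(caption: str, suffix: str) -> str:
    """Return the caption ending with exactly one space-separated suffix."""
    text = caption.rstrip()
    if not suffix:
        return text
    # Drop an existing trailing suffix regardless of letter case.
    if text.lower().endswith(suffix.lower()):
        text = text[: len(text) - len(suffix)].rstrip()
    return f"{text} {suffix}".strip()


# The three endings seen in the logs above, shortened to their last sentence.
variants = [
    "The mood is playful and dynamic. Melty.",
    "The overall mood is calm and minimalistic.\nmelty.",
    "warm and endearing mood of the scene. melty.",
]
for caption in variants:
    print(repr(ensure_suffix(caption, "melty.")))
```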

Generated in 10.6 seconds
