fofr/batch-image-captioning
A wrapper model for captioning multiple images using GPT, Claude or Gemini, useful for LoRA training
Prediction
fofr/batch-image-captioning:d0adb15f
ID: 8zj4ygh84srg80ch9e19xw111m
Status: Succeeded
Source: Web
Hardware: CPU
Input
- model: gpt-4o-2024-08-06
- max_dimension: 1024
- system_prompt: Write a four sentence caption for this image. In the first sentence describe the style and type (painting, photo, etc) of the image. Describe in the remaining sentences the contents and composition of the image. Only use language that would be used to prompt a text to image model. Do not include usage. Comma separate keywords rather than using "or". Precise composition is important. Avoid phrases like "conveys a sense of" and "capturing the", just use the terms themselves. Good examples are: "Photo of an alien woman with a glowing halo standing on top of a mountain, wearing a white robe and silver mask in the futuristic style with futuristic design, sky background, soft lighting, dynamic pose, a sense of future technology, a science fiction movie scene rendered in the Unreal Engine." "A scene from the cartoon series Masters of the Universe depicts Man-At-Arms wearing a gray helmet and gray armor with red gloves. He is holding an iron bar above his head while looking down on Orko, a pink blob character. Orko is sitting behind Man-At-Arms facing left on a chair. Both characters are standing near each other, with Orko inside a yellow chestplate over a blue shirt and black pants. The scene is drawn in the style of the Masters of the Universe cartoon series." "An emoji, digital illustration, playful, whimsical. A cartoon zombie character with green skin and tattered clothes reaches forward with two hands, they have green skin, messy hair, an open mouth and gaping teeth, one eye is half closed."
- caption_prefix: (empty)
- caption_suffix: (empty)
- message_prompt: Caption this image please
- openai_api_key: ████████████████████ (This value was redacted after being sent to the model.)
- image_zip_archive: Archive.zip
- resize_images_for_captioning: true
{ "model": "gpt-4o-2024-08-06", "max_dimension": 1024, "system_prompt": "Write a four sentence caption for this image. In the first sentence describe the style and type (painting, photo, etc) of the image. Describe in the remaining sentences the contents and composition of the image. Only use language that would be used to prompt a text to image model. Do not include usage. Comma separate keywords rather than using \"or\". Precise composition is important. Avoid phrases like \"conveys a sense of\" and \"capturing the\", just use the terms themselves.\n\nGood examples are:\n\n\"Photo of an alien woman with a glowing halo standing on top of a mountain, wearing a white robe and silver mask in the futuristic style with futuristic design, sky background, soft lighting, dynamic pose, a sense of future technology, a science fiction movie scene rendered in the Unreal Engine.\"\n\n\"A scene from the cartoon series Masters of the Universe depicts Man-At-Arms wearing a gray helmet and gray armor with red gloves. He is holding an iron bar above his head while looking down on Orko, a pink blob character. Orko is sitting behind Man-At-Arms facing left on a chair. Both characters are standing near each other, with Orko inside a yellow chestplate over a blue shirt and black pants. The scene is drawn in the style of the Masters of the Universe cartoon series.\"\n\n\"An emoji, digital illustration, playful, whimsical. A cartoon zombie character with green skin and tattered clothes reaches forward with two hands, they have green skin, messy hair, an open mouth and gaping teeth, one eye is half closed.\"\n", "caption_prefix": "", "caption_suffix": "", "message_prompt": "Caption this image please", "openai_api_key": "[REDACTED]", "image_zip_archive": "https://replicate.delivery/pbxt/LREOQCiXFRxVaSpwt2MYMwuwiEMIuiIw8YPm7rLLGPH94f57/Archive.zip", "resize_images_for_captioning": true }
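With resize_images_for_captioning enabled and max_dimension set to 1024, the logs for this prediction report each 928x1232 image being resized to 771x1024. A minimal Python sketch of an aspect-preserving fit that reproduces those numbers is shown here. The function name `fit_within`, the rounding mode, and the choice never to upscale are assumptions for illustration, not the model's confirmed implementation.

```python
def fit_within(width: int, height: int, max_dimension: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals max_dimension.

    Assumption: images already within the limit are left unchanged.
    """
    longest = max(width, height)
    if longest <= max_dimension:
        return width, height
    scale = max_dimension / longest
    # Assumption: round to the nearest pixel; the model may truncate instead.
    return round(width * scale), round(height * scale)


print(fit_within(928, 1232, 1024))  # (771, 1024), matching the logged resize
```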
Install Replicate’s Node.js client library:
npm install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Import and set up the client:
import Replicate from "replicate";
const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run fofr/batch-image-captioning using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "fofr/batch-image-captioning:d0adb15f4826881a68f1d82e0b10fe2ee1af536632dc8313f7f777ed8d264726",
  {
    input: {
      model: "gpt-4o-2024-08-06",
      max_dimension: 1024,
      system_prompt: "Write a four sentence caption for this image. In the first sentence describe the style and type (painting, photo, etc) of the image. Describe in the remaining sentences the contents and composition of the image. Only use language that would be used to prompt a text to image model. Do not include usage. Comma separate keywords rather than using \"or\". Precise composition is important. Avoid phrases like \"conveys a sense of\" and \"capturing the\", just use the terms themselves.\n\nGood examples are:\n\n\"Photo of an alien woman with a glowing halo standing on top of a mountain, wearing a white robe and silver mask in the futuristic style with futuristic design, sky background, soft lighting, dynamic pose, a sense of future technology, a science fiction movie scene rendered in the Unreal Engine.\"\n\n\"A scene from the cartoon series Masters of the Universe depicts Man-At-Arms wearing a gray helmet and gray armor with red gloves. He is holding an iron bar above his head while looking down on Orko, a pink blob character. Orko is sitting behind Man-At-Arms facing left on a chair. Both characters are standing near each other, with Orko inside a yellow chestplate over a blue shirt and black pants. The scene is drawn in the style of the Masters of the Universe cartoon series.\"\n\n\"An emoji, digital illustration, playful, whimsical. A cartoon zombie character with green skin and tattered clothes reaches forward with two hands, they have green skin, messy hair, an open mouth and gaping teeth, one eye is half closed.\"\n",
      caption_prefix: "",
      caption_suffix: "",
      message_prompt: "Caption this image please",
      openai_api_key: "[REDACTED]",
      image_zip_archive: "https://replicate.delivery/pbxt/LREOQCiXFRxVaSpwt2MYMwuwiEMIuiIw8YPm7rLLGPH94f57/Archive.zip",
      resize_images_for_captioning: true
    }
  }
);
console.log(output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:
pip install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Import the client:
import replicate
Run fofr/batch-image-captioning using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "fofr/batch-image-captioning:d0adb15f4826881a68f1d82e0b10fe2ee1af536632dc8313f7f777ed8d264726",
    input={
        "model": "gpt-4o-2024-08-06",
        "max_dimension": 1024,
        "system_prompt": "Write a four sentence caption for this image. In the first sentence describe the style and type (painting, photo, etc) of the image. Describe in the remaining sentences the contents and composition of the image. Only use language that would be used to prompt a text to image model. Do not include usage. Comma separate keywords rather than using \"or\". Precise composition is important. Avoid phrases like \"conveys a sense of\" and \"capturing the\", just use the terms themselves.\n\nGood examples are:\n\n\"Photo of an alien woman with a glowing halo standing on top of a mountain, wearing a white robe and silver mask in the futuristic style with futuristic design, sky background, soft lighting, dynamic pose, a sense of future technology, a science fiction movie scene rendered in the Unreal Engine.\"\n\n\"A scene from the cartoon series Masters of the Universe depicts Man-At-Arms wearing a gray helmet and gray armor with red gloves. He is holding an iron bar above his head while looking down on Orko, a pink blob character. Orko is sitting behind Man-At-Arms facing left on a chair. Both characters are standing near each other, with Orko inside a yellow chestplate over a blue shirt and black pants. The scene is drawn in the style of the Masters of the Universe cartoon series.\"\n\n\"An emoji, digital illustration, playful, whimsical. A cartoon zombie character with green skin and tattered clothes reaches forward with two hands, they have green skin, messy hair, an open mouth and gaping teeth, one eye is half closed.\"\n",
        "caption_prefix": "",
        "caption_suffix": "",
        "message_prompt": "Caption this image please",
        "openai_api_key": "[REDACTED]",
        "image_zip_archive": "https://replicate.delivery/pbxt/LREOQCiXFRxVaSpwt2MYMwuwiEMIuiIw8YPm7rLLGPH94f57/Archive.zip",
        "resize_images_for_captioning": True
    }
)
print(output)
To learn more, take a look at the guide on getting started with Python.
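The image_zip_archive input takes a zip file of images. As a rough illustration, the sketch below builds such an archive from a local folder using only Python's standard library. The helper name `zip_images`, the flat folder layout, and the list of image extensions are assumptions, not requirements documented by the model.

```python
import zipfile
from pathlib import Path

# Assumption: these are the image types worth including; adjust as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def zip_images(folder: str, archive_path: str) -> list[str]:
    """Write every image directly inside `folder` into a flat zip archive.

    Returns the sorted list of file names that were added.
    """
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(Path(folder).iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                # Store by bare file name so the archive has no nested folders.
                archive.write(path, arcname=path.name)
                added.append(path.name)
    return added
```

The resulting archive can then be supplied as image_zip_archive, either as a URL as in the example above or, with the Python client, typically as an open file handle such as `open("Archive.zip", "rb")`.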
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Run fofr/batch-image-captioning using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "d0adb15f4826881a68f1d82e0b10fe2ee1af536632dc8313f7f777ed8d264726",
    "input": {
      "model": "gpt-4o-2024-08-06",
      "max_dimension": 1024,
      "system_prompt": "Write a four sentence caption for this image. In the first sentence describe the style and type (painting, photo, etc) of the image. Describe in the remaining sentences the contents and composition of the image. Only use language that would be used to prompt a text to image model. Do not include usage. Comma separate keywords rather than using \\"or\\". Precise composition is important. Avoid phrases like \\"conveys a sense of\\" and \\"capturing the\\", just use the terms themselves.\\n\\nGood examples are:\\n\\n\\"Photo of an alien woman with a glowing halo standing on top of a mountain, wearing a white robe and silver mask in the futuristic style with futuristic design, sky background, soft lighting, dynamic pose, a sense of future technology, a science fiction movie scene rendered in the Unreal Engine.\\"\\n\\n\\"A scene from the cartoon series Masters of the Universe depicts Man-At-Arms wearing a gray helmet and gray armor with red gloves. He is holding an iron bar above his head while looking down on Orko, a pink blob character. Orko is sitting behind Man-At-Arms facing left on a chair. Both characters are standing near each other, with Orko inside a yellow chestplate over a blue shirt and black pants. The scene is drawn in the style of the Masters of the Universe cartoon series.\\"\\n\\n\\"An emoji, digital illustration, playful, whimsical. A cartoon zombie character with green skin and tattered clothes reaches forward with two hands, they have green skin, messy hair, an open mouth and gaping teeth, one eye is half closed.\\"\\n",
      "caption_prefix": "",
      "caption_suffix": "",
      "message_prompt": "Caption this image please",
      "openai_api_key": "[REDACTED]",
      "image_zip_archive": "https://replicate.delivery/pbxt/LREOQCiXFRxVaSpwt2MYMwuwiEMIuiIw8YPm7rLLGPH94f57/Archive.zip",
      "resize_images_for_captioning": true
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
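The curl call above is a plain HTTP POST with a JSON body. As a rough Python equivalent using only the standard library, the sketch below builds (but does not send) the same request. The helper name `build_prediction_request` is hypothetical, and the input dictionary is abbreviated to two fields for readability.

```python
import json
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"
VERSION = "d0adb15f4826881a68f1d82e0b10fe2ee1af536632dc8313f7f777ed8d264726"


def build_prediction_request(token: str, model_input: dict) -> urllib.request.Request:
    """Build the POST request that the curl example sends."""
    body = json.dumps({"version": VERSION, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Mirrors the curl header: ask the API to wait for the result.
            "Prefer": "wait",
        },
    )


request = build_prediction_request(
    token="<paste-your-token-here>",
    model_input={"model": "gpt-4o-2024-08-06", "max_dimension": 1024},
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without a token or network access.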
You can run this model locally using Cog. First, install Cog:
brew install cog
If you don’t have Homebrew, there are other installation options available.
Run this to download the model and run it in your local environment:
cog predict r8.im/fofr/batch-image-captioning@sha256:d0adb15f4826881a68f1d82e0b10fe2ee1af536632dc8313f7f777ed8d264726 \
  -i 'model="gpt-4o-2024-08-06"' \
  -i 'max_dimension=1024' \
  -i $'system_prompt="Write a four sentence caption for this image. In the first sentence describe the style and type (painting, photo, etc) of the image. Describe in the remaining sentences the contents and composition of the image. Only use language that would be used to prompt a text to image model. Do not include usage. Comma separate keywords rather than using \\"or\\". Precise composition is important. Avoid phrases like \\"conveys a sense of\\" and \\"capturing the\\", just use the terms themselves.\\n\\nGood examples are:\\n\\n\\"Photo of an alien woman with a glowing halo standing on top of a mountain, wearing a white robe and silver mask in the futuristic style with futuristic design, sky background, soft lighting, dynamic pose, a sense of future technology, a science fiction movie scene rendered in the Unreal Engine.\\"\\n\\n\\"A scene from the cartoon series Masters of the Universe depicts Man-At-Arms wearing a gray helmet and gray armor with red gloves. He is holding an iron bar above his head while looking down on Orko, a pink blob character. Orko is sitting behind Man-At-Arms facing left on a chair. Both characters are standing near each other, with Orko inside a yellow chestplate over a blue shirt and black pants. The scene is drawn in the style of the Masters of the Universe cartoon series.\\"\\n\\n\\"An emoji, digital illustration, playful, whimsical. A cartoon zombie character with green skin and tattered clothes reaches forward with two hands, they have green skin, messy hair, an open mouth and gaping teeth, one eye is half closed.\\"\\n"' \
  -i 'caption_prefix=""' \
  -i 'caption_suffix=""' \
  -i 'message_prompt="Caption this image please"' \
  -i 'openai_api_key="[REDACTED]"' \
  -i 'image_zip_archive="https://replicate.delivery/pbxt/LREOQCiXFRxVaSpwt2MYMwuwiEMIuiIw8YPm7rLLGPH94f57/Archive.zip"' \
  -i 'resize_images_for_captioning=true'
To learn more, take a look at the Cog documentation.
Run this to download the model and run it in your local environment:
docker run -d -p 5000:5000 r8.im/fofr/batch-image-captioning@sha256:d0adb15f4826881a68f1d82e0b10fe2ee1af536632dc8313f7f777ed8d264726
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d $'{
    "input": {
      "model": "gpt-4o-2024-08-06",
      "max_dimension": 1024,
      "system_prompt": "Write a four sentence caption for this image. In the first sentence describe the style and type (painting, photo, etc) of the image. Describe in the remaining sentences the contents and composition of the image. Only use language that would be used to prompt a text to image model. Do not include usage. Comma separate keywords rather than using \\"or\\". Precise composition is important. Avoid phrases like \\"conveys a sense of\\" and \\"capturing the\\", just use the terms themselves.\\n\\nGood examples are:\\n\\n\\"Photo of an alien woman with a glowing halo standing on top of a mountain, wearing a white robe and silver mask in the futuristic style with futuristic design, sky background, soft lighting, dynamic pose, a sense of future technology, a science fiction movie scene rendered in the Unreal Engine.\\"\\n\\n\\"A scene from the cartoon series Masters of the Universe depicts Man-At-Arms wearing a gray helmet and gray armor with red gloves. He is holding an iron bar above his head while looking down on Orko, a pink blob character. Orko is sitting behind Man-At-Arms facing left on a chair. Both characters are standing near each other, with Orko inside a yellow chestplate over a blue shirt and black pants. The scene is drawn in the style of the Masters of the Universe cartoon series.\\"\\n\\n\\"An emoji, digital illustration, playful, whimsical. A cartoon zombie character with green skin and tattered clothes reaches forward with two hands, they have green skin, messy hair, an open mouth and gaping teeth, one eye is half closed.\\"\\n",
      "caption_prefix": "",
      "caption_suffix": "",
      "message_prompt": "Caption this image please",
      "openai_api_key": "[REDACTED]",
      "image_zip_archive": "https://replicate.delivery/pbxt/LREOQCiXFRxVaSpwt2MYMwuwiEMIuiIw8YPm7rLLGPH94f57/Archive.zip",
      "resize_images_for_captioning": true
    }
  }' \
  http://localhost:5000/predictions
To learn more, take a look at the Cog documentation.
Output
{
  "completed_at": "2024-08-13T11:32:47.134586Z",
  "created_at": "2024-08-13T11:31:46.854000Z",
  "data_removed": false,
  "error": null,
  "id": "8zj4ygh84srg80ch9e19xw111m",
  "input": {
    "model": "gpt-4o-2024-08-06",
    "max_dimension": 1024,
    "system_prompt": "Write a four sentence caption for this image. In the first sentence describe the style and type (painting, photo, etc) of the image. Describe in the remaining sentences the contents and composition of the image. Only use language that would be used to prompt a text to image model. Do not include usage. Comma separate keywords rather than using \"or\". Precise composition is important. Avoid phrases like \"conveys a sense of\" and \"capturing the\", just use the terms themselves.\n\nGood examples are:\n\n\"Photo of an alien woman with a glowing halo standing on top of a mountain, wearing a white robe and silver mask in the futuristic style with futuristic design, sky background, soft lighting, dynamic pose, a sense of future technology, a science fiction movie scene rendered in the Unreal Engine.\"\n\n\"A scene from the cartoon series Masters of the Universe depicts Man-At-Arms wearing a gray helmet and gray armor with red gloves. He is holding an iron bar above his head while looking down on Orko, a pink blob character. Orko is sitting behind Man-At-Arms facing left on a chair. Both characters are standing near each other, with Orko inside a yellow chestplate over a blue shirt and black pants. The scene is drawn in the style of the Masters of the Universe cartoon series.\"\n\n\"An emoji, digital illustration, playful, whimsical. A cartoon zombie character with green skin and tattered clothes reaches forward with two hands, they have green skin, messy hair, an open mouth and gaping teeth, one eye is half closed.\"\n",
    "caption_prefix": "",
    "caption_suffix": "",
    "message_prompt": "Caption this image please",
    "openai_api_key": "[REDACTED]",
    "image_zip_archive": "https://replicate.delivery/pbxt/LREOQCiXFRxVaSpwt2MYMwuwiEMIuiIw8YPm7rLLGPH94f57/Archive.zip",
    "resize_images_for_captioning": true
  },
  "logs": "Files extracted:\n/tmp/outputs/2024-06-01--15-16-53-u-q3-fofr_pikachu_91215d95-1cb7-43c3-9448-5d97975efcf1.png\n/tmp/outputs/2024-06-01--15-16-53-u-q1-fofr_pikachu_91215d95-1cb7-43c3-9448-5d97975efcf1.png\n/tmp/outputs/2024-06-01--15-16-53-u-q4-fofr_pikachu_91215d95-1cb7-43c3-9448-5d97975efcf1.png\n/tmp/outputs/2024-06-01--15-16-53-u-q2-fofr_pikachu_91215d95-1cb7-43c3-9448-5d97975efcf1.png\nNumber of images to be captioned: 4\n===================================================\nProcessing 2024-06-01--15-16-53-u-q3-fofr_pikachu_91215d95-1cb7-43c3-9448-5d97975efcf1.png\nResized from 928x1232 to 771x1024\nCaption: Digital artwork of an abstract, cybernetic rabbit. The rabbit is composed of intricate, neon-like lines and patterns in blue and white, with a luminous, swirling design. Its eyes are glowing red, and it sits against a dark background. A spherical, luminous object hovers in the top right corner, casting a mystical glow.\n===================================================\nProcessing 2024-06-01--15-16-53-u-q1-fofr_pikachu_91215d95-1cb7-43c3-9448-5d97975efcf1.png\nResized from 928x1232 to 771x1024\nCaption: Abstract digital illustration of a rabbit, featuring dynamic, swirling lines and vivid colors. The focus is on the rabbit's glowing red eyes and long ears, drawn in a sketchy style. The background is a dark, cosmic mix of deep blues and bright reds, creating a sense of mystery. Thin, energetic white lines surround the figure, adding motion and intensity.\n===================================================\nProcessing 2024-06-01--15-16-53-u-q4-fofr_pikachu_91215d95-1cb7-43c3-9448-5d97975efcf1.png\nResized from 928x1232 to 771x1024\nCaption: Abstract digital painting featuring an intense, mischievous creature with large ears and glowing yellow eyes. The figure is surrounded by a chaotic swirl of red and blue strokes, giving a sense of movement and energy. The creature's wide grin and sharp features contribute to its menacing presence. Dark background contrasts with vibrant colors, enhancing the dramatic effect.\n===================================================\nProcessing 2024-06-01--15-16-53-u-q2-fofr_pikachu_91215d95-1cb7-43c3-9448-5d97975efcf1.png\nResized from 928x1232 to 771x1024\nCaption: Digital painting, abstract, vibrant. A stylized creature resembling a rabbit is depicted in dark blue and black tones. Bold, swirling yellow and white strokes create dynamic movement around the creature. The background is filled with intricate patterns, deep shadows, and highlights.\n===================================================",
  "metrics": {
    "predict_time": 23.038720888,
    "total_time": 60.280586
  },
  "output": "https://replicate.delivery/czjl/U2CPf1tLXev5gkSFqfxUUeD1cqyyi2VWS0fGB6iBmgL7L7RaC/captions_and_csv.zip",
  "started_at": "2024-08-13T11:32:24.095865Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/8zj4ygh84srg80ch9e19xw111m",
    "cancel": "https://api.replicate.com/v1/predictions/8zj4ygh84srg80ch9e19xw111m/cancel"
  },
  "version": "3adde40e56d70b1ff1a6f1300da81b8af9a0f7983163f83022ebdb2c911fdc49"
}
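The timing fields in this response line up with its timestamps: predict_time corresponds to completed_at minus started_at, and total_time to completed_at minus created_at. The difference between the two is the time that passed before the model started running. A small Python check of that arithmetic, using the values from this response (the helper `parse_ts` exists only for this illustration):

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' (UTC)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


created = parse_ts("2024-08-13T11:31:46.854000Z")
started = parse_ts("2024-08-13T11:32:24.095865Z")
completed = parse_ts("2024-08-13T11:32:47.134586Z")

queue_seconds = (started - created).total_seconds()
predict_seconds = (completed - started).total_seconds()
total_seconds = (completed - created).total_seconds()

print(f"waiting before start: {queue_seconds:.2f}s")
print(f"predict_time: {predict_seconds:.2f}s")
print(f"total_time: {total_seconds:.2f}s")
```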
Files extracted:
/tmp/outputs/2024-06-01--15-16-53-u-q3-fofr_pikachu_91215d95-1cb7-43c3-9448-5d97975efcf1.png
/tmp/outputs/2024-06-01--15-16-53-u-q1-fofr_pikachu_91215d95-1cb7-43c3-9448-5d97975efcf1.png
/tmp/outputs/2024-06-01--15-16-53-u-q4-fofr_pikachu_91215d95-1cb7-43c3-9448-5d97975efcf1.png
/tmp/outputs/2024-06-01--15-16-53-u-q2-fofr_pikachu_91215d95-1cb7-43c3-9448-5d97975efcf1.png
Number of images to be captioned: 4
===================================================
Processing 2024-06-01--15-16-53-u-q3-fofr_pikachu_91215d95-1cb7-43c3-9448-5d97975efcf1.png
Resized from 928x1232 to 771x1024
Caption: Digital artwork of an abstract, cybernetic rabbit. The rabbit is composed of intricate, neon-like lines and patterns in blue and white, with a luminous, swirling design. Its eyes are glowing red, and it sits against a dark background. A spherical, luminous object hovers in the top right corner, casting a mystical glow.
===================================================
Processing 2024-06-01--15-16-53-u-q1-fofr_pikachu_91215d95-1cb7-43c3-9448-5d97975efcf1.png
Resized from 928x1232 to 771x1024
Caption: Abstract digital illustration of a rabbit, featuring dynamic, swirling lines and vivid colors. The focus is on the rabbit's glowing red eyes and long ears, drawn in a sketchy style. The background is a dark, cosmic mix of deep blues and bright reds, creating a sense of mystery. Thin, energetic white lines surround the figure, adding motion and intensity.
===================================================
Processing 2024-06-01--15-16-53-u-q4-fofr_pikachu_91215d95-1cb7-43c3-9448-5d97975efcf1.png
Resized from 928x1232 to 771x1024
Caption: Abstract digital painting featuring an intense, mischievous creature with large ears and glowing yellow eyes. The figure is surrounded by a chaotic swirl of red and blue strokes, giving a sense of movement and energy. The creature's wide grin and sharp features contribute to its menacing presence. Dark background contrasts with vibrant colors, enhancing the dramatic effect.
===================================================
Processing 2024-06-01--15-16-53-u-q2-fofr_pikachu_91215d95-1cb7-43c3-9448-5d97975efcf1.png
Resized from 928x1232 to 771x1024
Caption: Digital painting, abstract, vibrant. A stylized creature resembling a rabbit is depicted in dark blue and black tones. Bold, swirling yellow and white strokes create dynamic movement around the creature. The background is filled with intricate patterns, deep shadows, and highlights.
===================================================
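Each image in these logs follows the same pattern: a `Processing <file>` entry, a `Resized from ...` entry, and a `Caption: ...` entry, separated by a row of equals signs. A minimal sketch that pulls file/caption pairs out of such a log is given here. The function name `parse_captions` is hypothetical, the file name `example.png` is a placeholder, and the parser assumes each entry sits on its own line and each caption stays on a single line, as in the "logs" field of the JSON response.

```python
def parse_captions(logs: str) -> dict[str, str]:
    """Map each processed file name to its caption, based on log line prefixes."""
    captions = {}
    current_file = None
    for line in logs.splitlines():
        line = line.strip()
        if line.startswith("Processing "):
            current_file = line[len("Processing "):]
        elif line.startswith("Caption: ") and current_file is not None:
            captions[current_file] = line[len("Caption: "):]
            current_file = None
    return captions


# Placeholder log in the same shape as the prediction logs.
sample = """Number of images to be captioned: 1
===================================================
Processing example.png
Resized from 928x1232 to 771x1024
Caption: Digital painting, abstract, vibrant.
==================================================="""
print(parse_captions(sample))
```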
Prediction
fofr/batch-image-captioning:d0adb15f
ID: h6t1p9ahv1rgc0ch9e3v9h2d2m
Status: Succeeded
Source: Web
Hardware: CPU
Input
- model: gpt-4o-2024-08-06
- max_dimension: 1024
- system_prompt: Write a four sentence caption for this image. In the first sentence describe the style and type (painting, photo, etc) of the image. Describe in the remaining sentences the contents and composition of the image. Only use language that would be used to prompt a text to image model. Do not include usage. Comma separate keywords rather than using "or". Precise composition is important. Avoid phrases like "conveys a sense of" and "capturing the", just use the terms themselves. Good examples are: "Photo of an alien woman with a glowing halo standing on top of a mountain, wearing a white robe and silver mask in the futuristic style with futuristic design, sky background, soft lighting, dynamic pose, a sense of future technology, a science fiction movie scene rendered in the Unreal Engine." "A scene from the cartoon series Masters of the Universe depicts Man-At-Arms wearing a gray helmet and gray armor with red gloves. He is holding an iron bar above his head while looking down on Orko, a pink blob character. Orko is sitting behind Man-At-Arms facing left on a chair. Both characters are standing near each other, with Orko inside a yellow chestplate over a blue shirt and black pants. The scene is drawn in the style of the Masters of the Universe cartoon series." "An emoji, digital illustration, playful, whimsical. A cartoon zombie character with green skin and tattered clothes reaches forward with two hands, they have green skin, messy hair, an open mouth and gaping teeth, one eye is half closed."
- caption_prefix: a photo of a phone in a toaster
- caption_suffix: (empty)
- message_prompt: Caption this image please
- openai_api_key: ████████████████████ (This value was redacted after being sent to the model.)
- image_zip_archive: Archive.zip
- resize_images_for_captioning: true
{ "model": "gpt-4o-2024-08-06", "max_dimension": 1024, "system_prompt": "Write a four sentence caption for this image. In the first sentence describe the style and type (painting, photo, etc) of the image. Describe in the remaining sentences the contents and composition of the image. Only use language that would be used to prompt a text to image model. Do not include usage. Comma separate keywords rather than using \"or\". Precise composition is important. Avoid phrases like \"conveys a sense of\" and \"capturing the\", just use the terms themselves.\n\nGood examples are:\n\n\"Photo of an alien woman with a glowing halo standing on top of a mountain, wearing a white robe and silver mask in the futuristic style with futuristic design, sky background, soft lighting, dynamic pose, a sense of future technology, a science fiction movie scene rendered in the Unreal Engine.\"\n\n\"A scene from the cartoon series Masters of the Universe depicts Man-At-Arms wearing a gray helmet and gray armor with red gloves. He is holding an iron bar above his head while looking down on Orko, a pink blob character. Orko is sitting behind Man-At-Arms facing left on a chair. Both characters are standing near each other, with Orko inside a yellow chestplate over a blue shirt and black pants. The scene is drawn in the style of the Masters of the Universe cartoon series.\"\n\n\"An emoji, digital illustration, playful, whimsical. A cartoon zombie character with green skin and tattered clothes reaches forward with two hands, they have green skin, messy hair, an open mouth and gaping teeth, one eye is half closed.\"\n", "caption_prefix": "a photo of a phone in a toaster", "caption_suffix": "", "message_prompt": "Caption this image please", "openai_api_key": "[REDACTED]", "image_zip_archive": "https://replicate.delivery/pbxt/LRETkJ5qT5ghj6DfjAhUYUNdufuX47IKlFfUV8QDNyS6uwMy/Archive.zip", "resize_images_for_captioning": true }
Install Replicate’s Node.js client library:
npm install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Import and set up the client:
import Replicate from "replicate";
const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run fofr/batch-image-captioning using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "fofr/batch-image-captioning:d0adb15f4826881a68f1d82e0b10fe2ee1af536632dc8313f7f777ed8d264726",
  {
    input: {
      model: "gpt-4o-2024-08-06",
      max_dimension: 1024,
      system_prompt: "Write a four sentence caption for this image. In the first sentence describe the style and type (painting, photo, etc) of the image. Describe in the remaining sentences the contents and composition of the image. Only use language that would be used to prompt a text to image model. Do not include usage. Comma separate keywords rather than using \"or\". Precise composition is important. Avoid phrases like \"conveys a sense of\" and \"capturing the\", just use the terms themselves.\n\nGood examples are:\n\n\"Photo of an alien woman with a glowing halo standing on top of a mountain, wearing a white robe and silver mask in the futuristic style with futuristic design, sky background, soft lighting, dynamic pose, a sense of future technology, a science fiction movie scene rendered in the Unreal Engine.\"\n\n\"A scene from the cartoon series Masters of the Universe depicts Man-At-Arms wearing a gray helmet and gray armor with red gloves. He is holding an iron bar above his head while looking down on Orko, a pink blob character. Orko is sitting behind Man-At-Arms facing left on a chair. Both characters are standing near each other, with Orko inside a yellow chestplate over a blue shirt and black pants. The scene is drawn in the style of the Masters of the Universe cartoon series.\"\n\n\"An emoji, digital illustration, playful, whimsical. A cartoon zombie character with green skin and tattered clothes reaches forward with two hands, they have green skin, messy hair, an open mouth and gaping teeth, one eye is half closed.\"\n",
      caption_prefix: "a photo of a phone in a toaster",
      caption_suffix: "",
      message_prompt: "Caption this image please",
      openai_api_key: "[REDACTED]",
      image_zip_archive: "https://replicate.delivery/pbxt/LRETkJ5qT5ghj6DfjAhUYUNdufuX47IKlFfUV8QDNyS6uwMy/Archive.zip",
      resize_images_for_captioning: true
    }
  }
);
console.log(output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:
pip install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Import the client:
import replicate
Run fofr/batch-image-captioning using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "fofr/batch-image-captioning:d0adb15f4826881a68f1d82e0b10fe2ee1af536632dc8313f7f777ed8d264726",
    input={
        "model": "gpt-4o-2024-08-06",
        "max_dimension": 1024,
        "system_prompt": "Write a four sentence caption for this image. In the first sentence describe the style and type (painting, photo, etc) of the image. Describe in the remaining sentences the contents and composition of the image. Only use language that would be used to prompt a text to image model. Do not include usage. Comma separate keywords rather than using \"or\". Precise composition is important. Avoid phrases like \"conveys a sense of\" and \"capturing the\", just use the terms themselves.\n\nGood examples are:\n\n\"Photo of an alien woman with a glowing halo standing on top of a mountain, wearing a white robe and silver mask in the futuristic style with futuristic design, sky background, soft lighting, dynamic pose, a sense of future technology, a science fiction movie scene rendered in the Unreal Engine.\"\n\n\"A scene from the cartoon series Masters of the Universe depicts Man-At-Arms wearing a gray helmet and gray armor with red gloves. He is holding an iron bar above his head while looking down on Orko, a pink blob character. Orko is sitting behind Man-At-Arms facing left on a chair. Both characters are standing near each other, with Orko inside a yellow chestplate over a blue shirt and black pants. The scene is drawn in the style of the Masters of the Universe cartoon series.\"\n\n\"An emoji, digital illustration, playful, whimsical. A cartoon zombie character with green skin and tattered clothes reaches forward with two hands, they have green skin, messy hair, an open mouth and gaping teeth, one eye is half closed.\"\n",
        "caption_prefix": "a photo of a phone in a toaster",
        "caption_suffix": "",
        "message_prompt": "Caption this image please",
        "openai_api_key": "[REDACTED]",
        "image_zip_archive": "https://replicate.delivery/pbxt/LRETkJ5qT5ghj6DfjAhUYUNdufuX47IKlFfUV8QDNyS6uwMy/Archive.zip",
        "resize_images_for_captioning": True
    }
)
print(output)
To learn more, take a look at the guide on getting started with Python.
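The prediction shown in the Output section of this page returns a link to an archive named captions_and_csv.zip. Once that archive has been downloaded, its captions could be read along the lines of the sketch below. The layout of the archive is an assumption here (one .txt caption file per image next to a CSV); the function name read_captions is hypothetical and is not part of the Replicate client.

```python
import io
import zipfile


def read_captions(archive_bytes: bytes) -> dict[str, str]:
    """Return {member name: caption text} for every .txt file in a zip archive.

    Assumption: captions are stored as UTF-8 .txt members. Other members
    (for example a CSV) are ignored.
    """
    captions = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".txt"):
                # Strip the trailing newline so captions compare cleanly.
                captions[name] = archive.read(name).decode("utf-8").strip()
    return captions
```

The function takes raw bytes rather than a URL so that it does not depend on how the client hands back the output.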
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Run fofr/batch-image-captioning using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "d0adb15f4826881a68f1d82e0b10fe2ee1af536632dc8313f7f777ed8d264726",
    "input": {
      "model": "gpt-4o-2024-08-06",
      "max_dimension": 1024,
      "system_prompt": "Write a four sentence caption for this image. In the first sentence describe the style and type (painting, photo, etc) of the image. Describe in the remaining sentences the contents and composition of the image. Only use language that would be used to prompt a text to image model. Do not include usage. Comma separate keywords rather than using \\"or\\". Precise composition is important. Avoid phrases like \\"conveys a sense of\\" and \\"capturing the\\", just use the terms themselves.\\n\\nGood examples are:\\n\\n\\"Photo of an alien woman with a glowing halo standing on top of a mountain, wearing a white robe and silver mask in the futuristic style with futuristic design, sky background, soft lighting, dynamic pose, a sense of future technology, a science fiction movie scene rendered in the Unreal Engine.\\"\\n\\n\\"A scene from the cartoon series Masters of the Universe depicts Man-At-Arms wearing a gray helmet and gray armor with red gloves. He is holding an iron bar above his head while looking down on Orko, a pink blob character. Orko is sitting behind Man-At-Arms facing left on a chair. Both characters are standing near each other, with Orko inside a yellow chestplate over a blue shirt and black pants. The scene is drawn in the style of the Masters of the Universe cartoon series.\\"\\n\\n\\"An emoji, digital illustration, playful, whimsical. A cartoon zombie character with green skin and tattered clothes reaches forward with two hands, they have green skin, messy hair, an open mouth and gaping teeth, one eye is half closed.\\"\\n",
      "caption_prefix": "a photo of a phone in a toaster",
      "caption_suffix": "",
      "message_prompt": "Caption this image please",
      "openai_api_key": "[REDACTED]",
      "image_zip_archive": "https://replicate.delivery/pbxt/LRETkJ5qT5ghj6DfjAhUYUNdufuX47IKlFfUV8QDNyS6uwMy/Archive.zip",
      "resize_images_for_captioning": true
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
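For readers who prefer not to shell out to curl, the same request can be assembled with Python's standard library. This is a minimal sketch rather than an official client: the endpoint, version hash, and headers are copied from the curl command above, while the helper name build_prediction_request is hypothetical.

```python
import json
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"
VERSION = "d0adb15f4826881a68f1d82e0b10fe2ee1af536632dc8313f7f777ed8d264726"


def build_prediction_request(inputs: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that the curl example sends."""
    body = json.dumps({"version": VERSION, "input": inputs}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Same header as the curl example: ask the API to wait for the result.
            "Prefer": "wait",
        },
    )


# Usage (performs a real network call, so it is left commented out):
#   import os
#   request = build_prediction_request(
#       {"model": "gpt-4o-2024-08-06"}, os.environ["REPLICATE_API_TOKEN"]
#   )
#   with urllib.request.urlopen(request) as response:
#       prediction = json.load(response)
```

Letting json.dumps serialize the body avoids the hand-written escaping of quotes and newlines that the shell version needs for the system prompt.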
You can run this model locally using Cog. First, install Cog:
brew install cog
If you don’t have Homebrew, there are other installation options available.
Run this to download the model and run it in your local environment:
cog predict r8.im/fofr/batch-image-captioning@sha256:d0adb15f4826881a68f1d82e0b10fe2ee1af536632dc8313f7f777ed8d264726 \
  -i 'model="gpt-4o-2024-08-06"' \
  -i 'max_dimension=1024' \
  -i $'system_prompt="Write a four sentence caption for this image. In the first sentence describe the style and type (painting, photo, etc) of the image. Describe in the remaining sentences the contents and composition of the image. Only use language that would be used to prompt a text to image model. Do not include usage. Comma separate keywords rather than using \\"or\\". Precise composition is important. Avoid phrases like \\"conveys a sense of\\" and \\"capturing the\\", just use the terms themselves.\\n\\nGood examples are:\\n\\n\\"Photo of an alien woman with a glowing halo standing on top of a mountain, wearing a white robe and silver mask in the futuristic style with futuristic design, sky background, soft lighting, dynamic pose, a sense of future technology, a science fiction movie scene rendered in the Unreal Engine.\\"\\n\\n\\"A scene from the cartoon series Masters of the Universe depicts Man-At-Arms wearing a gray helmet and gray armor with red gloves. He is holding an iron bar above his head while looking down on Orko, a pink blob character. Orko is sitting behind Man-At-Arms facing left on a chair. Both characters are standing near each other, with Orko inside a yellow chestplate over a blue shirt and black pants. The scene is drawn in the style of the Masters of the Universe cartoon series.\\"\\n\\n\\"An emoji, digital illustration, playful, whimsical. A cartoon zombie character with green skin and tattered clothes reaches forward with two hands, they have green skin, messy hair, an open mouth and gaping teeth, one eye is half closed.\\"\\n"' \
  -i 'caption_prefix="a photo of a phone in a toaster"' \
  -i 'caption_suffix=""' \
  -i 'message_prompt="Caption this image please"' \
  -i 'openai_api_key="[REDACTED]"' \
  -i 'image_zip_archive="https://replicate.delivery/pbxt/LRETkJ5qT5ghj6DfjAhUYUNdufuX47IKlFfUV8QDNyS6uwMy/Archive.zip"' \
  -i 'resize_images_for_captioning=true'
To learn more, take a look at the Cog documentation.
Run this to download the model and run it in your local environment:
docker run -d -p 5000:5000 r8.im/fofr/batch-image-captioning@sha256:d0adb15f4826881a68f1d82e0b10fe2ee1af536632dc8313f7f777ed8d264726
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d $'{
    "input": {
      "model": "gpt-4o-2024-08-06",
      "max_dimension": 1024,
      "system_prompt": "Write a four sentence caption for this image. In the first sentence describe the style and type (painting, photo, etc) of the image. Describe in the remaining sentences the contents and composition of the image. Only use language that would be used to prompt a text to image model. Do not include usage. Comma separate keywords rather than using \\"or\\". Precise composition is important. Avoid phrases like \\"conveys a sense of\\" and \\"capturing the\\", just use the terms themselves.\\n\\nGood examples are:\\n\\n\\"Photo of an alien woman with a glowing halo standing on top of a mountain, wearing a white robe and silver mask in the futuristic style with futuristic design, sky background, soft lighting, dynamic pose, a sense of future technology, a science fiction movie scene rendered in the Unreal Engine.\\"\\n\\n\\"A scene from the cartoon series Masters of the Universe depicts Man-At-Arms wearing a gray helmet and gray armor with red gloves. He is holding an iron bar above his head while looking down on Orko, a pink blob character. Orko is sitting behind Man-At-Arms facing left on a chair. Both characters are standing near each other, with Orko inside a yellow chestplate over a blue shirt and black pants. The scene is drawn in the style of the Masters of the Universe cartoon series.\\"\\n\\n\\"An emoji, digital illustration, playful, whimsical. A cartoon zombie character with green skin and tattered clothes reaches forward with two hands, they have green skin, messy hair, an open mouth and gaping teeth, one eye is half closed.\\"\\n",
      "caption_prefix": "a photo of a phone in a toaster",
      "caption_suffix": "",
      "message_prompt": "Caption this image please",
      "openai_api_key": "[REDACTED]",
      "image_zip_archive": "https://replicate.delivery/pbxt/LRETkJ5qT5ghj6DfjAhUYUNdufuX47IKlFfUV8QDNyS6uwMy/Archive.zip",
      "resize_images_for_captioning": true
    }
  }' \
  http://localhost:5000/predictions
To learn more, take a look at the Cog documentation.
Output
{
  "completed_at": "2024-08-13T11:37:45.452413Z",
  "created_at": "2024-08-13T11:37:25.208000Z",
  "data_removed": false,
  "error": null,
  "id": "h6t1p9ahv1rgc0ch9e3v9h2d2m",
  "input": {
    "model": "gpt-4o-2024-08-06",
    "max_dimension": 1024,
    "system_prompt": "Write a four sentence caption for this image. In the first sentence describe the style and type (painting, photo, etc) of the image. Describe in the remaining sentences the contents and composition of the image. Only use language that would be used to prompt a text to image model. Do not include usage. Comma separate keywords rather than using \"or\". Precise composition is important. Avoid phrases like \"conveys a sense of\" and \"capturing the\", just use the terms themselves.\n\nGood examples are:\n\n\"Photo of an alien woman with a glowing halo standing on top of a mountain, wearing a white robe and silver mask in the futuristic style with futuristic design, sky background, soft lighting, dynamic pose, a sense of future technology, a science fiction movie scene rendered in the Unreal Engine.\"\n\n\"A scene from the cartoon series Masters of the Universe depicts Man-At-Arms wearing a gray helmet and gray armor with red gloves. He is holding an iron bar above his head while looking down on Orko, a pink blob character. Orko is sitting behind Man-At-Arms facing left on a chair. Both characters are standing near each other, with Orko inside a yellow chestplate over a blue shirt and black pants. The scene is drawn in the style of the Masters of the Universe cartoon series.\"\n\n\"An emoji, digital illustration, playful, whimsical. A cartoon zombie character with green skin and tattered clothes reaches forward with two hands, they have green skin, messy hair, an open mouth and gaping teeth, one eye is half closed.\"\n",
    "caption_prefix": "a photo of a phone in a toaster",
    "caption_suffix": "",
    "message_prompt": "Caption this image please",
    "openai_api_key": "[REDACTED]",
    "image_zip_archive": "https://replicate.delivery/pbxt/LRETkJ5qT5ghj6DfjAhUYUNdufuX47IKlFfUV8QDNyS6uwMy/Archive.zip",
    "resize_images_for_captioning": true
  },
  "logs": "Files extracted:\n/tmp/outputs/2024-05-14--09-10-07-u-q3-fofr_a_photo_of_a_phone_in_a_toaster_cb8e7a47-edf9-4b11-af8d-cbd799ccc983.png\n/tmp/outputs/2024-05-14--09-10-07-u-q2-fofr_a_photo_of_a_phone_in_a_toaster_cb8e7a47-edf9-4b11-af8d-cbd799ccc983.png\n/tmp/outputs/2024-05-14--09-10-07-u-q1-fofr_a_photo_of_a_phone_in_a_toaster_cb8e7a47-edf9-4b11-af8d-cbd799ccc983.png\n/tmp/outputs/2024-05-14--09-10-07-u-q4-fofr_a_photo_of_a_phone_in_a_toaster_cb8e7a47-edf9-4b11-af8d-cbd799ccc983.png\nNumber of images to be captioned: 4\n===================================================\nProcessing 2024-05-14--09-10-07-u-q3-fofr_a_photo_of_a_phone_in_a_toaster_cb8e7a47-edf9-4b11-af8d-cbd799ccc983.png\nResized from 928x1232 to 771x1024\nCaption: A photo of a phone in a toaster, on a sunny kitchen countertop. The toaster holds a piece of toast and a phone displaying a full battery icon. A glass mug filled with a beige drink sits in the background. Soft shadows add warmth to the scene, creating a playful juxtaposition.\n===================================================\nProcessing 2024-05-14--09-10-07-u-q2-fofr_a_photo_of_a_phone_in_a_toaster_cb8e7a47-edf9-4b11-af8d-cbd799ccc983.png\nResized from 928x1232 to 771x1024\nCaption: A photo of a phone in a toaster, featuring a modern smartphone inserted into the slot of a vibrant orange toaster. The screen displays an image of an orange slice. The scene is set on a wooden countertop with natural light filtering through a nearby window. A clear glass tumbler and wooden kitchen items are faintly visible in the background.\n===================================================\nProcessing 2024-05-14--09-10-07-u-q1-fofr_a_photo_of_a_phone_in_a_toaster_cb8e7a47-edf9-4b11-af8d-cbd799ccc983.png\nResized from 928x1232 to 771x1024\nCaption: A photo of a phone in a toaster shows a modern smartphone sitting vertically inside the slot of a retro-style toaster. The phone screen displays the time as 9:17 with a simple background. The toaster has a cream and silver finish with three dials on the front. The setting is a cozy kitchen with a wooden countertop and soft lighting.\n===================================================\nProcessing 2024-05-14--09-10-07-u-q4-fofr_a_photo_of_a_phone_in_a_toaster_cb8e7a47-edf9-4b11-af8d-cbd799ccc983.png\nResized from 928x1232 to 771x1024\nCaption: a photo of a phone in a toaster, with a green smartphone inserted into a teal vintage toaster. The toaster is placed on a kitchen countertop with a loaf of bread surrounding the device. In the background, there’s a blurred silver kettle and ceramic cookware. The scene is warmly lit, creating a cozy kitchen atmosphere.\n===================================================",
  "metrics": {
    "predict_time": 20.228402371,
    "total_time": 20.244413
  },
  "output": "https://replicate.delivery/czjl/NBDLQFY9K4rPMNhLTegSWUU230XvRzrLxvickadzVruEvHpJA/captions_and_csv.zip",
  "started_at": "2024-08-13T11:37:25.224010Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/h6t1p9ahv1rgc0ch9e3v9h2d2m",
    "cancel": "https://api.replicate.com/v1/predictions/h6t1p9ahv1rgc0ch9e3v9h2d2m/cancel"
  },
  "version": "3adde40e56d70b1ff1a6f1300da81b8af9a0f7983163f83022ebdb2c911fdc49"
}
Files extracted:
/tmp/outputs/2024-05-14--09-10-07-u-q3-fofr_a_photo_of_a_phone_in_a_toaster_cb8e7a47-edf9-4b11-af8d-cbd799ccc983.png
/tmp/outputs/2024-05-14--09-10-07-u-q2-fofr_a_photo_of_a_phone_in_a_toaster_cb8e7a47-edf9-4b11-af8d-cbd799ccc983.png
/tmp/outputs/2024-05-14--09-10-07-u-q1-fofr_a_photo_of_a_phone_in_a_toaster_cb8e7a47-edf9-4b11-af8d-cbd799ccc983.png
/tmp/outputs/2024-05-14--09-10-07-u-q4-fofr_a_photo_of_a_phone_in_a_toaster_cb8e7a47-edf9-4b11-af8d-cbd799ccc983.png
Number of images to be captioned: 4
===================================================
Processing 2024-05-14--09-10-07-u-q3-fofr_a_photo_of_a_phone_in_a_toaster_cb8e7a47-edf9-4b11-af8d-cbd799ccc983.png
Resized from 928x1232 to 771x1024
Caption: A photo of a phone in a toaster, on a sunny kitchen countertop. The toaster holds a piece of toast and a phone displaying a full battery icon. A glass mug filled with a beige drink sits in the background. Soft shadows add warmth to the scene, creating a playful juxtaposition.
===================================================
Processing 2024-05-14--09-10-07-u-q2-fofr_a_photo_of_a_phone_in_a_toaster_cb8e7a47-edf9-4b11-af8d-cbd799ccc983.png
Resized from 928x1232 to 771x1024
Caption: A photo of a phone in a toaster, featuring a modern smartphone inserted into the slot of a vibrant orange toaster. The screen displays an image of an orange slice. The scene is set on a wooden countertop with natural light filtering through a nearby window. A clear glass tumbler and wooden kitchen items are faintly visible in the background.
===================================================
Processing 2024-05-14--09-10-07-u-q1-fofr_a_photo_of_a_phone_in_a_toaster_cb8e7a47-edf9-4b11-af8d-cbd799ccc983.png
Resized from 928x1232 to 771x1024
Caption: A photo of a phone in a toaster shows a modern smartphone sitting vertically inside the slot of a retro-style toaster. The phone screen displays the time as 9:17 with a simple background. The toaster has a cream and silver finish with three dials on the front. The setting is a cozy kitchen with a wooden countertop and soft lighting.
===================================================
Processing 2024-05-14--09-10-07-u-q4-fofr_a_photo_of_a_phone_in_a_toaster_cb8e7a47-edf9-4b11-af8d-cbd799ccc983.png
Resized from 928x1232 to 771x1024
Caption: a photo of a phone in a toaster, with a green smartphone inserted into a teal vintage toaster. The toaster is placed on a kitchen countertop with a loaf of bread surrounding the device. In the background, there’s a blurred silver kettle and ceramic cookware. The scene is warmly lit, creating a cozy kitchen atmosphere.
===================================================
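The logs follow a regular pattern: a row of equals signs separates the images, and each image block carries a "Processing <file>" line and a "Caption: <text>" line. A rough sketch of pulling the captions out of the "logs" field is given below. It assumes this log layout stays stable and that each caption sits on a single line, neither of which the model guarantees; parse_captions is a hypothetical helper name.

```python
import re

# A separator is a line made up only of "=" characters.
SEPARATOR = re.compile(r"^=+$", re.MULTILINE)


def parse_captions(logs: str) -> dict[str, str]:
    """Map each processed file name to its caption, based on the log text."""
    captions = {}
    for block in SEPARATOR.split(logs):
        filename = None
        for line in block.strip().splitlines():
            if line.startswith("Processing "):
                filename = line[len("Processing "):].strip()
            elif line.startswith("Caption: ") and filename:
                captions[filename] = line[len("Caption: "):].strip()
    return captions
```

Blocks without a "Processing" line, such as the initial "Files extracted:" list, are skipped, so the result holds one entry per captioned image.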