Prediction

Model

intelligent-utilities/topic-tags

6ddyjg7p4xrm80crn6gta56cy4

Status

Succeeded

Source

Web

Hardware

CPU (Small)

Total duration

2.5s

Created

3 months ago

Webhook

–

Input

text: Generate consistent characters Posted July 21, 2025 by fofr A grid of 8 images showing the same character in different scenes Until recently, the best way to generate images of a consistent character was from a trained lora. You would need to create a dataset of images and then train a FLUX lora on them. If you want to go back further, you might remember having to use a ComfyUI workflow. A workflow that would combine SDXL, controlnets, IPAdapters and some non-commercial face landmark models. Things have got remarkably simpler. Today we have a choice of state of the art image models that can do this accurately from a single reference. In this blog post we’ll highlight which models can do this, and which is best depending on your needs. she is wearing a pink t-shirt with the text “Replicate” on it Original reference image Original A grid of 4 outputs “she is wearing a pink t-shirt with the text “Replicate” on it” The best models for consistent characters As of July 2025, there are four models on Replicate that can create a realistic and accurate output from a single reference. In order of release: OpenAI’s gpt-image-1 Runway’s Gen-4 Image Black Forest Labs’s FLUX.1 Kontext Bytedance’s SeedEdit 3 Since this blog post was written, two new models have also been released: Ideogram’s Character Runway’s Gen-4 Image Turbo FLUX.1 Kontext comes in a few different flavors: pro, max and dev. Dev is an open source version of kontext, which is more controllable and fine-tunable, but isn’t as powerful as pro. To help write this blog post, I put together a little Replicate model to make it easy to compare outputs. Here is our comparison model, it runs FLUX.1 Kontext, SeedEdit 3.0, gpt-image-1 and Runway’s Gen-4 in parallel: fofr/compare-character-consistency. (Did you know that anyone can create and push models to Replicate?) Price and speed comparison First, the essentials: speed and cost. The table below shows the price and speed of each model. The price of gpt-image-1 depends on the output quality you choose (low, medium, high). The price of Gen-4 Image depends on whether you choose 720p or 1080p resolution. In summary though, gpt-image-1 is the slowest and most expensive model, and Kontext Dev is the cheapest and fastest. The tradeoffs are in quality, and we’ll look at that in more detail below. Model Price (per image) Speed Date OpenAI gpt-image-1 $0.04–$0.17 16s–59s April 2025 Runway Gen-4 Image $0.05–$0.08 20s–27s April 2025 Black Forest Labs FLUX.1 Kontext Pro $0.04 5s May 2025 Black Forest Labs FLUX.1 Kontext Max $0.08 7s May 2025 Black Forest Labs FLUX.1 Kontext Dev $0.025 4s May 2025 Bytedance SeedEdit 3 $0.03 13s July 2025 Preserving a character’s identity Let’s compare how well each model preserves a character’s identity. In the following comparisons, we are using gpt-image-1 with the high quality and high fidelity settings. We stick with FLUX.1 Kontext Pro as the best compromise between quality and speed. And we use Gen-4 Image at 1080p. Photographic accuracy Below are a varied set of examples, showing the strengths and weaknesses of each model, all focusing on photographic outputs. A new activity In these two examples, we can see the strengths of Gen-4 coming through. The composition is the most compelling, and the character is the most accurate. she is playing the piano Original reference image Original A grid of 4 outputs “she is playing the piano” he is playing the guitar Original reference image Original A grid of 4 outputs “he is playing the guitar” Tweak the scene If you want to keep most of the original composition, and change just a small part of the scene, all models handle this well. remove the glass of drink Original reference image Original A grid of 4 outputs “remove the glass of drink” Half-length portrait with unusual hair and eye color A more challenging comparison, here is a character with heterochromia and hair with two colors, as well as some facial marks. We can see that every model is capable of handling the hair and eyes. (Some needed a few retries to get this right.) a half-length portrait photo of her in a summer forest Original reference image Original A grid of 4 outputs “a half-length portrait photo of her in a summer forest” A shave, a coat and some rain Rather than keeping everything consistent, let’s try to keep the same person but change some things. It’s a bit of a mixed bag here, only SeedEdit 3 and gpt-image-1 can handle the clean-shaven request. But gpt-image-1 is also a completely different person, so that’s probably the worst result. remove his beard, put him in a raincoat, it is raining Original reference image Original A grid of 4 outputs “remove his beard, put him in a raincoat, it is raining” Trying tattoos Here we try a character with many distinct tattoos to see how well each model handles them. None are perfect, with Gen-4 and gpt-image-1 maintaining the neck tattoos the best. he is a chef cooking a meal in a restaurant kitchen Original reference image Original A grid of 4 outputs “he is a chef cooking a meal in a restaurant kitchen” Creative tasks and full transformations In these examples, we are looking to transform the character into something else, or show them in a different style. A good model will perform the transformation while maintaining the character’s identity. Changing the style With these simple style changes, we can see quickly that Gen-4 should not be used for these stylistic tasks. restyle this person as anime Original reference image Original A grid of 4 outputs “restyle this person as anime” make this a watercolor painting Original reference image Original A grid of 4 outputs “make this a watercolor painting” Becoming something else It’s halloween. We turn her into a witch, and him into an ogre, and someone else into a blue na’vi from Pandora. Gen-4 does the best witch output, but also the least convincing ogre. make her a witch Original reference image Original A grid of 4 outputs “make her a witch” turn him into a green skinned ogre Original reference image Original A grid of 4 outputs “turn him into a green skinned ogre” For this example, Kontext Pro didn’t want to create an image of a blue na’vi from Pandora, we’re showing Kontext Dev instead. turn him into a blue na’vi from pandora (avatar) Original reference image Original A grid of 4 outputs “turn him into a blue na’vi from pandora (avatar)“ Conclusion Overall, we found that: Kontext Pro is versatile and can give fabulous results, but often there are too many artifacts around the face, and these frequently make the image unusable (these artifacts do not seem to be present in Kontext Dev, but Dev has overall lower quality) gpt-image-1 will always add a distinctive yellow tint, and even with the high quality and high fidelity settings enabled, the identity will frequently change. With the highest cost and slowest speed, we’d only use this for the most complex of tasks. SeedEdit 3 tends to restrict itself to the initial composition, making it difficult to prompt a new angle or scene. Outputs are typically softer and can look more AI generated. Coherency is also a problem in complex scenes. Runway’s Gen-4 is the most adaptable and accurate when it comes to likeness in photos. It’s main drawback is coherency in complex scenes, and you might find some unexpected arms, limbs or hands. Sometimes this can be fixed with a few retries, sometimes not. Gen-4 also cannot restyle a scene. Our recommendations For photos you should start with Runway’s Gen-4 Image model. If you need faster or cheaper outputs, then Kontext Pro is the next best option. If you get some outputs from Gen-4 that aren’t coherent, you can always put them through Kontext Pro to fix them. For more creative tasks, and complete character transformations, try Kontext Pro first. If the task is more complex, and if you can afford it, you should also try gpt-image-1. SeedEdit 3 is a good cheap alternative if you can’t afford gpt-image-1 and kontext isn’t working for you. Do not use Gen-4 for stylistic tasks. That’s it for now, but stay tuned for more models, comparisons and experiments. Until then, try something new at replicate.com/explore, and follow us on X to see what we’re up to.
number_of_tags: 10

{
  "number_of_tags": 10,
  "text": "Generate consistent characters\nPosted July 21, 2025 by \nfofr\nA grid of 8 images showing the same character in different scenes\nUntil recently, the best way to generate images of a consistent character was from a trained lora. You would need to create a dataset of images and then train a FLUX lora on them.\n\nIf you want to go back further, you might remember having to use a ComfyUI workflow. A workflow that would combine SDXL, controlnets, IPAdapters and some non-commercial face landmark models. Things have got remarkably simpler.\n\nToday we have a choice of state of the art image models that can do this accurately from a single reference. In this blog post we’ll highlight which models can do this, and which is best depending on your needs.\n\nshe is wearing a pink t-shirt with the text “Replicate” on it\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“she is wearing a pink t-shirt with the text “Replicate” on it”\n\nThe best models for consistent characters\nAs of July 2025, there are four models on Replicate that can create a realistic and accurate output from a single reference. In order of release:\n\nOpenAI’s gpt-image-1\nRunway’s Gen-4 Image\nBlack Forest Labs’s FLUX.1 Kontext\nBytedance’s SeedEdit 3\nSince this blog post was written, two new models have also been released:\n\nIdeogram’s Character\nRunway’s Gen-4 Image Turbo\nFLUX.1 Kontext comes in a few different flavors: pro, max and dev. Dev is an open source version of kontext, which is more controllable and fine-tunable, but isn’t as powerful as pro.\n\nTo help write this blog post, I put together a little Replicate model to make it easy to compare outputs. Here is our comparison model, it runs FLUX.1 Kontext, SeedEdit 3.0, gpt-image-1 and Runway’s Gen-4 in parallel: fofr/compare-character-consistency.\n\n(Did you know that anyone can create and push models to Replicate?)\n\nPrice and speed comparison\nFirst, the essentials: speed and cost. The table below shows the price and speed of each model. The price of gpt-image-1 depends on the output quality you choose (low, medium, high). The price of Gen-4 Image depends on whether you choose 720p or 1080p resolution.\n\nIn summary though, gpt-image-1 is the slowest and most expensive model, and Kontext Dev is the cheapest and fastest. The tradeoffs are in quality, and we’ll look at that in more detail below.\n\nModel\tPrice (per image)\tSpeed\tDate\nOpenAI\ngpt-image-1\t$0.04–$0.17\t16s–59s\tApril 2025\nRunway\nGen-4 Image\t$0.05–$0.08\t20s–27s\tApril 2025\nBlack Forest Labs\nFLUX.1 Kontext Pro\t$0.04\t5s\tMay 2025\nBlack Forest Labs\nFLUX.1 Kontext Max\t$0.08\t7s\tMay 2025\nBlack Forest Labs\nFLUX.1 Kontext Dev\t$0.025\t4s\tMay 2025\nBytedance\nSeedEdit 3\t$0.03\t13s\tJuly 2025\nPreserving a character’s identity\nLet’s compare how well each model preserves a character’s identity.\n\nIn the following comparisons, we are using gpt-image-1 with the high quality and high fidelity settings. We stick with FLUX.1 Kontext Pro as the best compromise between quality and speed. And we use Gen-4 Image at 1080p.\n\nPhotographic accuracy\nBelow are a varied set of examples, showing the strengths and weaknesses of each model, all focusing on photographic outputs.\n\nA new activity\nIn these two examples, we can see the strengths of Gen-4 coming through. The composition is the most compelling, and the character is the most accurate.\n\nshe is playing the piano\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“she is playing the piano”\n\nhe is playing the guitar\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“he is playing the guitar”\n\nTweak the scene\nIf you want to keep most of the original composition, and change just a small part of the scene, all models handle this well.\n\nremove the glass of drink\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“remove the glass of drink”\n\nHalf-length portrait with unusual hair and eye color\nA more challenging comparison, here is a character with heterochromia and hair with two colors, as well as some facial marks.\n\nWe can see that every model is capable of handling the hair and eyes. (Some needed a few retries to get this right.)\n\na half-length portrait photo of her in a summer forest\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“a half-length portrait photo of her in a summer forest”\n\nA shave, a coat and some rain\nRather than keeping everything consistent, let’s try to keep the same person but change some things.\n\nIt’s a bit of a mixed bag here, only SeedEdit 3 and gpt-image-1 can handle the clean-shaven request. But gpt-image-1 is also a completely different person, so that’s probably the worst result.\n\nremove his beard, put him in a raincoat, it is raining\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“remove his beard, put him in a raincoat, it is raining”\n\nTrying tattoos\nHere we try a character with many distinct tattoos to see how well each model handles them. None are perfect, with Gen-4 and gpt-image-1 maintaining the neck tattoos the best.\n\nhe is a chef cooking a meal in a restaurant kitchen\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“he is a chef cooking a meal in a restaurant kitchen”\n\nCreative tasks and full transformations\nIn these examples, we are looking to transform the character into something else, or show them in a different style. A good model will perform the transformation while maintaining the character’s identity.\n\nChanging the style\nWith these simple style changes, we can see quickly that Gen-4 should not be used for these stylistic tasks.\n\nrestyle this person as anime\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“restyle this person as anime”\n\nmake this a watercolor painting\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“make this a watercolor painting”\n\nBecoming something else\nIt’s halloween. We turn her into a witch, and him into an ogre, and someone else into a blue na’vi from Pandora. Gen-4 does the best witch output, but also the least convincing ogre.\n\nmake her a witch\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“make her a witch”\n\nturn him into a green skinned ogre\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“turn him into a green skinned ogre”\n\nFor this example, Kontext Pro didn’t want to create an image of a blue na’vi from Pandora, we’re showing Kontext Dev instead.\n\nturn him into a blue na’vi from pandora (avatar)\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“turn him into a blue na’vi from pandora (avatar)“\n\nConclusion\nOverall, we found that:\n\nKontext Pro is versatile and can give fabulous results, but often there are too many artifacts around the face, and these frequently make the image unusable (these artifacts do not seem to be present in Kontext Dev, but Dev has overall lower quality)\ngpt-image-1 will always add a distinctive yellow tint, and even with the high quality and high fidelity settings enabled, the identity will frequently change. With the highest cost and slowest speed, we’d only use this for the most complex of tasks.\nSeedEdit 3 tends to restrict itself to the initial composition, making it difficult to prompt a new angle or scene. Outputs are typically softer and can look more AI generated. Coherency is also a problem in complex scenes.\nRunway’s Gen-4 is the most adaptable and accurate when it comes to likeness in photos. It’s main drawback is coherency in complex scenes, and you might find some unexpected arms, limbs or hands. Sometimes this can be fixed with a few retries, sometimes not. Gen-4 also cannot restyle a scene.\nOur recommendations\nFor photos you should start with Runway’s Gen-4 Image model. If you need faster or cheaper outputs, then Kontext Pro is the next best option. If you get some outputs from Gen-4 that aren’t coherent, you can always put them through Kontext Pro to fix them.\n\nFor more creative tasks, and complete character transformations, try Kontext Pro first. If the task is more complex, and if you can afford it, you should also try gpt-image-1. SeedEdit 3 is a good cheap alternative if you can’t afford gpt-image-1 and kontext isn’t working for you. Do not use Gen-4 for stylistic tasks.\n\nThat’s it for now, but stay tuned for more models, comparisons and experiments. Until then, try something new at replicate.com/explore, and follow us on X to see what we’re up to."
}

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=r8_BVM**********************************

This is your API token. Keep it to yourself.

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run intelligent-utilities/topic-tags using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const input = {
  number_of_tags: 10,
  text: "Generate consistent characters\nPosted July 21, 2025 by \nfofr\nA grid of 8 images showing the same character in different scenes\nUntil recently, the best way to generate images of a consistent character was from a trained lora. You would need to create a dataset of images and then train a FLUX lora on them.\n\nIf you want to go back further, you might remember having to use a ComfyUI workflow. A workflow that would combine SDXL, controlnets, IPAdapters and some non-commercial face landmark models. Things have got remarkably simpler.\n\nToday we have a choice of state of the art image models that can do this accurately from a single reference. In this blog post we’ll highlight which models can do this, and which is best depending on your needs.\n\nshe is wearing a pink t-shirt with the text “Replicate” on it\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“she is wearing a pink t-shirt with the text “Replicate” on it”\n\nThe best models for consistent characters\nAs of July 2025, there are four models on Replicate that can create a realistic and accurate output from a single reference. In order of release:\n\nOpenAI’s gpt-image-1\nRunway’s Gen-4 Image\nBlack Forest Labs’s FLUX.1 Kontext\nBytedance’s SeedEdit 3\nSince this blog post was written, two new models have also been released:\n\nIdeogram’s Character\nRunway’s Gen-4 Image Turbo\nFLUX.1 Kontext comes in a few different flavors: pro, max and dev. Dev is an open source version of kontext, which is more controllable and fine-tunable, but isn’t as powerful as pro.\n\nTo help write this blog post, I put together a little Replicate model to make it easy to compare outputs. Here is our comparison model, it runs FLUX.1 Kontext, SeedEdit 3.0, gpt-image-1 and Runway’s Gen-4 in parallel: fofr/compare-character-consistency.\n\n(Did you know that anyone can create and push models to Replicate?)\n\nPrice and speed comparison\nFirst, the essentials: speed and cost. The table below shows the price and speed of each model. The price of gpt-image-1 depends on the output quality you choose (low, medium, high). The price of Gen-4 Image depends on whether you choose 720p or 1080p resolution.\n\nIn summary though, gpt-image-1 is the slowest and most expensive model, and Kontext Dev is the cheapest and fastest. The tradeoffs are in quality, and we’ll look at that in more detail below.\n\nModel\tPrice (per image)\tSpeed\tDate\nOpenAI\ngpt-image-1\t$0.04–$0.17\t16s–59s\tApril 2025\nRunway\nGen-4 Image\t$0.05–$0.08\t20s–27s\tApril 2025\nBlack Forest Labs\nFLUX.1 Kontext Pro\t$0.04\t5s\tMay 2025\nBlack Forest Labs\nFLUX.1 Kontext Max\t$0.08\t7s\tMay 2025\nBlack Forest Labs\nFLUX.1 Kontext Dev\t$0.025\t4s\tMay 2025\nBytedance\nSeedEdit 3\t$0.03\t13s\tJuly 2025\nPreserving a character’s identity\nLet’s compare how well each model preserves a character’s identity.\n\nIn the following comparisons, we are using gpt-image-1 with the high quality and high fidelity settings. We stick with FLUX.1 Kontext Pro as the best compromise between quality and speed. And we use Gen-4 Image at 1080p.\n\nPhotographic accuracy\nBelow are a varied set of examples, showing the strengths and weaknesses of each model, all focusing on photographic outputs.\n\nA new activity\nIn these two examples, we can see the strengths of Gen-4 coming through. The composition is the most compelling, and the character is the most accurate.\n\nshe is playing the piano\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“she is playing the piano”\n\nhe is playing the guitar\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“he is playing the guitar”\n\nTweak the scene\nIf you want to keep most of the original composition, and change just a small part of the scene, all models handle this well.\n\nremove the glass of drink\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“remove the glass of drink”\n\nHalf-length portrait with unusual hair and eye color\nA more challenging comparison, here is a character with heterochromia and hair with two colors, as well as some facial marks.\n\nWe can see that every model is capable of handling the hair and eyes. (Some needed a few retries to get this right.)\n\na half-length portrait photo of her in a summer forest\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“a half-length portrait photo of her in a summer forest”\n\nA shave, a coat and some rain\nRather than keeping everything consistent, let’s try to keep the same person but change some things.\n\nIt’s a bit of a mixed bag here, only SeedEdit 3 and gpt-image-1 can handle the clean-shaven request. But gpt-image-1 is also a completely different person, so that’s probably the worst result.\n\nremove his beard, put him in a raincoat, it is raining\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“remove his beard, put him in a raincoat, it is raining”\n\nTrying tattoos\nHere we try a character with many distinct tattoos to see how well each model handles them. None are perfect, with Gen-4 and gpt-image-1 maintaining the neck tattoos the best.\n\nhe is a chef cooking a meal in a restaurant kitchen\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“he is a chef cooking a meal in a restaurant kitchen”\n\nCreative tasks and full transformations\nIn these examples, we are looking to transform the character into something else, or show them in a different style. A good model will perform the transformation while maintaining the character’s identity.\n\nChanging the style\nWith these simple style changes, we can see quickly that Gen-4 should not be used for these stylistic tasks.\n\nrestyle this person as anime\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“restyle this person as anime”\n\nmake this a watercolor painting\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“make this a watercolor painting”\n\nBecoming something else\nIt’s halloween. We turn her into a witch, and him into an ogre, and someone else into a blue na’vi from Pandora. Gen-4 does the best witch output, but also the least convincing ogre.\n\nmake her a witch\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“make her a witch”\n\nturn him into a green skinned ogre\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“turn him into a green skinned ogre”\n\nFor this example, Kontext Pro didn’t want to create an image of a blue na’vi from Pandora, we’re showing Kontext Dev instead.\n\nturn him into a blue na’vi from pandora (avatar)\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“turn him into a blue na’vi from pandora (avatar)“\n\nConclusion\nOverall, we found that:\n\nKontext Pro is versatile and can give fabulous results, but often there are too many artifacts around the face, and these frequently make the image unusable (these artifacts do not seem to be present in Kontext Dev, but Dev has overall lower quality)\ngpt-image-1 will always add a distinctive yellow tint, and even with the high quality and high fidelity settings enabled, the identity will frequently change. With the highest cost and slowest speed, we’d only use this for the most complex of tasks.\nSeedEdit 3 tends to restrict itself to the initial composition, making it difficult to prompt a new angle or scene. Outputs are typically softer and can look more AI generated. Coherency is also a problem in complex scenes.\nRunway’s Gen-4 is the most adaptable and accurate when it comes to likeness in photos. It’s main drawback is coherency in complex scenes, and you might find some unexpected arms, limbs or hands. Sometimes this can be fixed with a few retries, sometimes not. Gen-4 also cannot restyle a scene.\nOur recommendations\nFor photos you should start with Runway’s Gen-4 Image model. If you need faster or cheaper outputs, then Kontext Pro is the next best option. If you get some outputs from Gen-4 that aren’t coherent, you can always put them through Kontext Pro to fix them.\n\nFor more creative tasks, and complete character transformations, try Kontext Pro first. If the task is more complex, and if you can afford it, you should also try gpt-image-1. SeedEdit 3 is a good cheap alternative if you can’t afford gpt-image-1 and kontext isn’t working for you. Do not use Gen-4 for stylistic tasks.\n\nThat’s it for now, but stay tuned for more models, comparisons and experiments. Until then, try something new at replicate.com/explore, and follow us on X to see what we’re up to."
};

const output = await replicate.run("intelligent-utilities/topic-tags", { input });

console.log(output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=r8_BVM**********************************

This is your API token. Keep it to yourself.

Import the client:

import replicate

Run intelligent-utilities/topic-tags using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "intelligent-utilities/topic-tags",
    input={
        "number_of_tags": 10,
        "text": "Generate consistent characters\nPosted July 21, 2025 by \nfofr\nA grid of 8 images showing the same character in different scenes\nUntil recently, the best way to generate images of a consistent character was from a trained lora. You would need to create a dataset of images and then train a FLUX lora on them.\n\nIf you want to go back further, you might remember having to use a ComfyUI workflow. A workflow that would combine SDXL, controlnets, IPAdapters and some non-commercial face landmark models. Things have got remarkably simpler.\n\nToday we have a choice of state of the art image models that can do this accurately from a single reference. In this blog post we’ll highlight which models can do this, and which is best depending on your needs.\n\nshe is wearing a pink t-shirt with the text “Replicate” on it\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“she is wearing a pink t-shirt with the text “Replicate” on it”\n\nThe best models for consistent characters\nAs of July 2025, there are four models on Replicate that can create a realistic and accurate output from a single reference. In order of release:\n\nOpenAI’s gpt-image-1\nRunway’s Gen-4 Image\nBlack Forest Labs’s FLUX.1 Kontext\nBytedance’s SeedEdit 3\nSince this blog post was written, two new models have also been released:\n\nIdeogram’s Character\nRunway’s Gen-4 Image Turbo\nFLUX.1 Kontext comes in a few different flavors: pro, max and dev. Dev is an open source version of kontext, which is more controllable and fine-tunable, but isn’t as powerful as pro.\n\nTo help write this blog post, I put together a little Replicate model to make it easy to compare outputs. Here is our comparison model, it runs FLUX.1 Kontext, SeedEdit 3.0, gpt-image-1 and Runway’s Gen-4 in parallel: fofr/compare-character-consistency.\n\n(Did you know that anyone can create and push models to Replicate?)\n\nPrice and speed comparison\nFirst, the essentials: speed and cost. The table below shows the price and speed of each model. The price of gpt-image-1 depends on the output quality you choose (low, medium, high). The price of Gen-4 Image depends on whether you choose 720p or 1080p resolution.\n\nIn summary though, gpt-image-1 is the slowest and most expensive model, and Kontext Dev is the cheapest and fastest. The tradeoffs are in quality, and we’ll look at that in more detail below.\n\nModel\tPrice (per image)\tSpeed\tDate\nOpenAI\ngpt-image-1\t$0.04–$0.17\t16s–59s\tApril 2025\nRunway\nGen-4 Image\t$0.05–$0.08\t20s–27s\tApril 2025\nBlack Forest Labs\nFLUX.1 Kontext Pro\t$0.04\t5s\tMay 2025\nBlack Forest Labs\nFLUX.1 Kontext Max\t$0.08\t7s\tMay 2025\nBlack Forest Labs\nFLUX.1 Kontext Dev\t$0.025\t4s\tMay 2025\nBytedance\nSeedEdit 3\t$0.03\t13s\tJuly 2025\nPreserving a character’s identity\nLet’s compare how well each model preserves a character’s identity.\n\nIn the following comparisons, we are using gpt-image-1 with the high quality and high fidelity settings. We stick with FLUX.1 Kontext Pro as the best compromise between quality and speed. And we use Gen-4 Image at 1080p.\n\nPhotographic accuracy\nBelow are a varied set of examples, showing the strengths and weaknesses of each model, all focusing on photographic outputs.\n\nA new activity\nIn these two examples, we can see the strengths of Gen-4 coming through. The composition is the most compelling, and the character is the most accurate.\n\nshe is playing the piano\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“she is playing the piano”\n\nhe is playing the guitar\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“he is playing the guitar”\n\nTweak the scene\nIf you want to keep most of the original composition, and change just a small part of the scene, all models handle this well.\n\nremove the glass of drink\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“remove the glass of drink”\n\nHalf-length portrait with unusual hair and eye color\nA more challenging comparison, here is a character with heterochromia and hair with two colors, as well as some facial marks.\n\nWe can see that every model is capable of handling the hair and eyes. (Some needed a few retries to get this right.)\n\na half-length portrait photo of her in a summer forest\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“a half-length portrait photo of her in a summer forest”\n\nA shave, a coat and some rain\nRather than keeping everything consistent, let’s try to keep the same person but change some things.\n\nIt’s a bit of a mixed bag here, only SeedEdit 3 and gpt-image-1 can handle the clean-shaven request. But gpt-image-1 is also a completely different person, so that’s probably the worst result.\n\nremove his beard, put him in a raincoat, it is raining\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“remove his beard, put him in a raincoat, it is raining”\n\nTrying tattoos\nHere we try a character with many distinct tattoos to see how well each model handles them. None are perfect, with Gen-4 and gpt-image-1 maintaining the neck tattoos the best.\n\nhe is a chef cooking a meal in a restaurant kitchen\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“he is a chef cooking a meal in a restaurant kitchen”\n\nCreative tasks and full transformations\nIn these examples, we are looking to transform the character into something else, or show them in a different style. A good model will perform the transformation while maintaining the character’s identity.\n\nChanging the style\nWith these simple style changes, we can see quickly that Gen-4 should not be used for these stylistic tasks.\n\nrestyle this person as anime\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“restyle this person as anime”\n\nmake this a watercolor painting\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“make this a watercolor painting”\n\nBecoming something else\nIt’s halloween. We turn her into a witch, and him into an ogre, and someone else into a blue na’vi from Pandora. Gen-4 does the best witch output, but also the least convincing ogre.\n\nmake her a witch\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“make her a witch”\n\nturn him into a green skinned ogre\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“turn him into a green skinned ogre”\n\nFor this example, Kontext Pro didn’t want to create an image of a blue na’vi from Pandora, we’re showing Kontext Dev instead.\n\nturn him into a blue na’vi from pandora (avatar)\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“turn him into a blue na’vi from pandora (avatar)“\n\nConclusion\nOverall, we found that:\n\nKontext Pro is versatile and can give fabulous results, but often there are too many artifacts around the face, and these frequently make the image unusable (these artifacts do not seem to be present in Kontext Dev, but Dev has overall lower quality)\ngpt-image-1 will always add a distinctive yellow tint, and even with the high quality and high fidelity settings enabled, the identity will frequently change. With the highest cost and slowest speed, we’d only use this for the most complex of tasks.\nSeedEdit 3 tends to restrict itself to the initial composition, making it difficult to prompt a new angle or scene. Outputs are typically softer and can look more AI generated. Coherency is also a problem in complex scenes.\nRunway’s Gen-4 is the most adaptable and accurate when it comes to likeness in photos. It’s main drawback is coherency in complex scenes, and you might find some unexpected arms, limbs or hands. Sometimes this can be fixed with a few retries, sometimes not. Gen-4 also cannot restyle a scene.\nOur recommendations\nFor photos you should start with Runway’s Gen-4 Image model. If you need faster or cheaper outputs, then Kontext Pro is the next best option. If you get some outputs from Gen-4 that aren’t coherent, you can always put them through Kontext Pro to fix them.\n\nFor more creative tasks, and complete character transformations, try Kontext Pro first. If the task is more complex, and if you can afford it, you should also try gpt-image-1. SeedEdit 3 is a good cheap alternative if you can’t afford gpt-image-1 and kontext isn’t working for you. Do not use Gen-4 for stylistic tasks.\n\nThat’s it for now, but stay tuned for more models, comparisons and experiments. Until then, try something new at replicate.com/explore, and follow us on X to see what we’re up to."
    }
)

print(output)

To learn more, take a look at the guide on getting started with Python.

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=r8_BVM**********************************

This is your API token. Keep it to yourself.

Run intelligent-utilities/topic-tags using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "input": {
      "number_of_tags": 10,
      "text": "Generate consistent characters\\nPosted July 21, 2025 by \\nfofr\\nA grid of 8 images showing the same character in different scenes\\nUntil recently, the best way to generate images of a consistent character was from a trained lora. You would need to create a dataset of images and then train a FLUX lora on them.\\n\\nIf you want to go back further, you might remember having to use a ComfyUI workflow. A workflow that would combine SDXL, controlnets, IPAdapters and some non-commercial face landmark models. Things have got remarkably simpler.\\n\\nToday we have a choice of state of the art image models that can do this accurately from a single reference. In this blog post we’ll highlight which models can do this, and which is best depending on your needs.\\n\\nshe is wearing a pink t-shirt with the text “Replicate” on it\\n\\nOriginal reference image\\nOriginal\\nA grid of 4 outputs\\n“she is wearing a pink t-shirt with the text “Replicate” on it”\\n\\nThe best models for consistent characters\\nAs of July 2025, there are four models on Replicate that can create a realistic and accurate output from a single reference. In order of release:\\n\\nOpenAI’s gpt-image-1\\nRunway’s Gen-4 Image\\nBlack Forest Labs’s FLUX.1 Kontext\\nBytedance’s SeedEdit 3\\nSince this blog post was written, two new models have also been released:\\n\\nIdeogram’s Character\\nRunway’s Gen-4 Image Turbo\\nFLUX.1 Kontext comes in a few different flavors: pro, max and dev. Dev is an open source version of kontext, which is more controllable and fine-tunable, but isn’t as powerful as pro.\\n\\nTo help write this blog post, I put together a little Replicate model to make it easy to compare outputs. Here is our comparison model, it runs FLUX.1 Kontext, SeedEdit 3.0, gpt-image-1 and Runway’s Gen-4 in parallel: fofr/compare-character-consistency.\\n\\n(Did you know that anyone can create and push models to Replicate?)\\n\\nPrice and speed comparison\\nFirst, the essentials: speed and cost. The table below shows the price and speed of each model. The price of gpt-image-1 depends on the output quality you choose (low, medium, high). The price of Gen-4 Image depends on whether you choose 720p or 1080p resolution.\\n\\nIn summary though, gpt-image-1 is the slowest and most expensive model, and Kontext Dev is the cheapest and fastest. The tradeoffs are in quality, and we’ll look at that in more detail below.\\n\\nModel\\tPrice (per image)\\tSpeed\\tDate\\nOpenAI\\ngpt-image-1\\t$0.04–$0.17\\t16s–59s\\tApril 2025\\nRunway\\nGen-4 Image\\t$0.05–$0.08\\t20s–27s\\tApril 2025\\nBlack Forest Labs\\nFLUX.1 Kontext Pro\\t$0.04\\t5s\\tMay 2025\\nBlack Forest Labs\\nFLUX.1 Kontext Max\\t$0.08\\t7s\\tMay 2025\\nBlack Forest Labs\\nFLUX.1 Kontext Dev\\t$0.025\\t4s\\tMay 2025\\nBytedance\\nSeedEdit 3\\t$0.03\\t13s\\tJuly 2025\\nPreserving a character’s identity\\nLet’s compare how well each model preserves a character’s identity.\\n\\nIn the following comparisons, we are using gpt-image-1 with the high quality and high fidelity settings. We stick with FLUX.1 Kontext Pro as the best compromise between quality and speed. And we use Gen-4 Image at 1080p.\\n\\nPhotographic accuracy\\nBelow are a varied set of examples, showing the strengths and weaknesses of each model, all focusing on photographic outputs.\\n\\nA new activity\\nIn these two examples, we can see the strengths of Gen-4 coming through. The composition is the most compelling, and the character is the most accurate.\\n\\nshe is playing the piano\\n\\nOriginal reference image\\nOriginal\\nA grid of 4 outputs\\n“she is playing the piano”\\n\\nhe is playing the guitar\\n\\nOriginal reference image\\nOriginal\\nA grid of 4 outputs\\n“he is playing the guitar”\\n\\nTweak the scene\\nIf you want to keep most of the original composition, and change just a small part of the scene, all models handle this well.\\n\\nremove the glass of drink\\n\\nOriginal reference image\\nOriginal\\nA grid of 4 outputs\\n“remove the glass of drink”\\n\\nHalf-length portrait with unusual hair and eye color\\nA more challenging comparison, here is a character with heterochromia and hair with two colors, as well as some facial marks.\\n\\nWe can see that every model is capable of handling the hair and eyes. (Some needed a few retries to get this right.)\\n\\na half-length portrait photo of her in a summer forest\\n\\nOriginal reference image\\nOriginal\\nA grid of 4 outputs\\n“a half-length portrait photo of her in a summer forest”\\n\\nA shave, a coat and some rain\\nRather than keeping everything consistent, let’s try to keep the same person but change some things.\\n\\nIt’s a bit of a mixed bag here, only SeedEdit 3 and gpt-image-1 can handle the clean-shaven request. But gpt-image-1 is also a completely different person, so that’s probably the worst result.\\n\\nremove his beard, put him in a raincoat, it is raining\\n\\nOriginal reference image\\nOriginal\\nA grid of 4 outputs\\n“remove his beard, put him in a raincoat, it is raining”\\n\\nTrying tattoos\\nHere we try a character with many distinct tattoos to see how well each model handles them. None are perfect, with Gen-4 and gpt-image-1 maintaining the neck tattoos the best.\\n\\nhe is a chef cooking a meal in a restaurant kitchen\\n\\nOriginal reference image\\nOriginal\\nA grid of 4 outputs\\n“he is a chef cooking a meal in a restaurant kitchen”\\n\\nCreative tasks and full transformations\\nIn these examples, we are looking to transform the character into something else, or show them in a different style. A good model will perform the transformation while maintaining the character’s identity.\\n\\nChanging the style\\nWith these simple style changes, we can see quickly that Gen-4 should not be used for these stylistic tasks.\\n\\nrestyle this person as anime\\n\\nOriginal reference image\\nOriginal\\nA grid of 4 outputs\\n“restyle this person as anime”\\n\\nmake this a watercolor painting\\n\\nOriginal reference image\\nOriginal\\nA grid of 4 outputs\\n“make this a watercolor painting”\\n\\nBecoming something else\\nIt’s halloween. We turn her into a witch, and him into an ogre, and someone else into a blue na’vi from Pandora. Gen-4 does the best witch output, but also the least convincing ogre.\\n\\nmake her a witch\\n\\nOriginal reference image\\nOriginal\\nA grid of 4 outputs\\n“make her a witch”\\n\\nturn him into a green skinned ogre\\n\\nOriginal reference image\\nOriginal\\nA grid of 4 outputs\\n“turn him into a green skinned ogre”\\n\\nFor this example, Kontext Pro didn’t want to create an image of a blue na’vi from Pandora, we’re showing Kontext Dev instead.\\n\\nturn him into a blue na’vi from pandora (avatar)\\n\\nOriginal reference image\\nOriginal\\nA grid of 4 outputs\\n“turn him into a blue na’vi from pandora (avatar)“\\n\\nConclusion\\nOverall, we found that:\\n\\nKontext Pro is versatile and can give fabulous results, but often there are too many artifacts around the face, and these frequently make the image unusable (these artifacts do not seem to be present in Kontext Dev, but Dev has overall lower quality)\\ngpt-image-1 will always add a distinctive yellow tint, and even with the high quality and high fidelity settings enabled, the identity will frequently change. With the highest cost and slowest speed, we’d only use this for the most complex of tasks.\\nSeedEdit 3 tends to restrict itself to the initial composition, making it difficult to prompt a new angle or scene. Outputs are typically softer and can look more AI generated. Coherency is also a problem in complex scenes.\\nRunway’s Gen-4 is the most adaptable and accurate when it comes to likeness in photos. It’s main drawback is coherency in complex scenes, and you might find some unexpected arms, limbs or hands. Sometimes this can be fixed with a few retries, sometimes not. Gen-4 also cannot restyle a scene.\\nOur recommendations\\nFor photos you should start with Runway’s Gen-4 Image model. If you need faster or cheaper outputs, then Kontext Pro is the next best option. If you get some outputs from Gen-4 that aren’t coherent, you can always put them through Kontext Pro to fix them.\\n\\nFor more creative tasks, and complete character transformations, try Kontext Pro first. If the task is more complex, and if you can afford it, you should also try gpt-image-1. SeedEdit 3 is a good cheap alternative if you can’t afford gpt-image-1 and kontext isn’t working for you. Do not use Gen-4 for stylistic tasks.\\n\\nThat’s it for now, but stay tuned for more models, comparisons and experiments. Until then, try something new at replicate.com/explore, and follow us on X to see what we’re up to."
    }
  }' \
  https://api.replicate.com/v1/models/intelligent-utilities/topic-tags/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.

Output

ai-generated-artconsistent-character-generationimage-model-comparisonreplicate-modelskontext-progpt-image-1gen-4-imageseededit-3flux-kontextface-persistence

{
  "id": "6ddyjg7p4xrm80crn6gta56cy4",
  "model": "intelligent-utilities/topic-tags",
  "version": "hidden",
  "input": {
    "number_of_tags": 10,
    "text": "Generate consistent characters\nPosted July 21, 2025 by \nfofr\nA grid of 8 images showing the same character in different scenes\nUntil recently, the best way to generate images of a consistent character was from a trained lora. You would need to create a dataset of images and then train a FLUX lora on them.\n\nIf you want to go back further, you might remember having to use a ComfyUI workflow. A workflow that would combine SDXL, controlnets, IPAdapters and some non-commercial face landmark models. Things have got remarkably simpler.\n\nToday we have a choice of state of the art image models that can do this accurately from a single reference. In this blog post we’ll highlight which models can do this, and which is best depending on your needs.\n\nshe is wearing a pink t-shirt with the text “Replicate” on it\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“she is wearing a pink t-shirt with the text “Replicate” on it”\n\nThe best models for consistent characters\nAs of July 2025, there are four models on Replicate that can create a realistic and accurate output from a single reference. In order of release:\n\nOpenAI’s gpt-image-1\nRunway’s Gen-4 Image\nBlack Forest Labs’s FLUX.1 Kontext\nBytedance’s SeedEdit 3\nSince this blog post was written, two new models have also been released:\n\nIdeogram’s Character\nRunway’s Gen-4 Image Turbo\nFLUX.1 Kontext comes in a few different flavors: pro, max and dev. Dev is an open source version of kontext, which is more controllable and fine-tunable, but isn’t as powerful as pro.\n\nTo help write this blog post, I put together a little Replicate model to make it easy to compare outputs. Here is our comparison model, it runs FLUX.1 Kontext, SeedEdit 3.0, gpt-image-1 and Runway’s Gen-4 in parallel: fofr/compare-character-consistency.\n\n(Did you know that anyone can create and push models to Replicate?)\n\nPrice and speed comparison\nFirst, the essentials: speed and cost. The table below shows the price and speed of each model. The price of gpt-image-1 depends on the output quality you choose (low, medium, high). The price of Gen-4 Image depends on whether you choose 720p or 1080p resolution.\n\nIn summary though, gpt-image-1 is the slowest and most expensive model, and Kontext Dev is the cheapest and fastest. The tradeoffs are in quality, and we’ll look at that in more detail below.\n\nModel\tPrice (per image)\tSpeed\tDate\nOpenAI\ngpt-image-1\t$0.04–$0.17\t16s–59s\tApril 2025\nRunway\nGen-4 Image\t$0.05–$0.08\t20s–27s\tApril 2025\nBlack Forest Labs\nFLUX.1 Kontext Pro\t$0.04\t5s\tMay 2025\nBlack Forest Labs\nFLUX.1 Kontext Max\t$0.08\t7s\tMay 2025\nBlack Forest Labs\nFLUX.1 Kontext Dev\t$0.025\t4s\tMay 2025\nBytedance\nSeedEdit 3\t$0.03\t13s\tJuly 2025\nPreserving a character’s identity\nLet’s compare how well each model preserves a character’s identity.\n\nIn the following comparisons, we are using gpt-image-1 with the high quality and high fidelity settings. We stick with FLUX.1 Kontext Pro as the best compromise between quality and speed. And we use Gen-4 Image at 1080p.\n\nPhotographic accuracy\nBelow are a varied set of examples, showing the strengths and weaknesses of each model, all focusing on photographic outputs.\n\nA new activity\nIn these two examples, we can see the strengths of Gen-4 coming through. The composition is the most compelling, and the character is the most accurate.\n\nshe is playing the piano\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“she is playing the piano”\n\nhe is playing the guitar\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“he is playing the guitar”\n\nTweak the scene\nIf you want to keep most of the original composition, and change just a small part of the scene, all models handle this well.\n\nremove the glass of drink\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“remove the glass of drink”\n\nHalf-length portrait with unusual hair and eye color\nA more challenging comparison, here is a character with heterochromia and hair with two colors, as well as some facial marks.\n\nWe can see that every model is capable of handling the hair and eyes. (Some needed a few retries to get this right.)\n\na half-length portrait photo of her in a summer forest\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“a half-length portrait photo of her in a summer forest”\n\nA shave, a coat and some rain\nRather than keeping everything consistent, let’s try to keep the same person but change some things.\n\nIt’s a bit of a mixed bag here, only SeedEdit 3 and gpt-image-1 can handle the clean-shaven request. But gpt-image-1 is also a completely different person, so that’s probably the worst result.\n\nremove his beard, put him in a raincoat, it is raining\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“remove his beard, put him in a raincoat, it is raining”\n\nTrying tattoos\nHere we try a character with many distinct tattoos to see how well each model handles them. None are perfect, with Gen-4 and gpt-image-1 maintaining the neck tattoos the best.\n\nhe is a chef cooking a meal in a restaurant kitchen\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“he is a chef cooking a meal in a restaurant kitchen”\n\nCreative tasks and full transformations\nIn these examples, we are looking to transform the character into something else, or show them in a different style. A good model will perform the transformation while maintaining the character’s identity.\n\nChanging the style\nWith these simple style changes, we can see quickly that Gen-4 should not be used for these stylistic tasks.\n\nrestyle this person as anime\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“restyle this person as anime”\n\nmake this a watercolor painting\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“make this a watercolor painting”\n\nBecoming something else\nIt’s halloween. We turn her into a witch, and him into an ogre, and someone else into a blue na’vi from Pandora. Gen-4 does the best witch output, but also the least convincing ogre.\n\nmake her a witch\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“make her a witch”\n\nturn him into a green skinned ogre\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“turn him into a green skinned ogre”\n\nFor this example, Kontext Pro didn’t want to create an image of a blue na’vi from Pandora, we’re showing Kontext Dev instead.\n\nturn him into a blue na’vi from pandora (avatar)\n\nOriginal reference image\nOriginal\nA grid of 4 outputs\n“turn him into a blue na’vi from pandora (avatar)“\n\nConclusion\nOverall, we found that:\n\nKontext Pro is versatile and can give fabulous results, but often there are too many artifacts around the face, and these frequently make the image unusable (these artifacts do not seem to be present in Kontext Dev, but Dev has overall lower quality)\ngpt-image-1 will always add a distinctive yellow tint, and even with the high quality and high fidelity settings enabled, the identity will frequently change. With the highest cost and slowest speed, we’d only use this for the most complex of tasks.\nSeedEdit 3 tends to restrict itself to the initial composition, making it difficult to prompt a new angle or scene. Outputs are typically softer and can look more AI generated. Coherency is also a problem in complex scenes.\nRunway’s Gen-4 is the most adaptable and accurate when it comes to likeness in photos. It’s main drawback is coherency in complex scenes, and you might find some unexpected arms, limbs or hands. Sometimes this can be fixed with a few retries, sometimes not. Gen-4 also cannot restyle a scene.\nOur recommendations\nFor photos you should start with Runway’s Gen-4 Image model. If you need faster or cheaper outputs, then Kontext Pro is the next best option. If you get some outputs from Gen-4 that aren’t coherent, you can always put them through Kontext Pro to fix them.\n\nFor more creative tasks, and complete character transformations, try Kontext Pro first. If the task is more complex, and if you can afford it, you should also try gpt-image-1. SeedEdit 3 is a good cheap alternative if you can’t afford gpt-image-1 and kontext isn’t working for you. Do not use Gen-4 for stylistic tasks.\n\nThat’s it for now, but stay tuned for more models, comparisons and experiments. Until then, try something new at replicate.com/explore, and follow us on X to see what we’re up to."
  },
  "logs": "",
  "output": [
    "ai-generated-art",
    "consistent-character-generation",
    "image-model-comparison",
    "replicate-models",
    "kontext-pro",
    "gpt-image-1",
    "gen-4-image",
    "seededit-3",
    "flux-kontext",
    "face-persistence"
  ],
  "data_removed": false,
  "error": null,
  "source": "web",
  "status": "succeeded",
  "created_at": "2025-08-14T17:30:20.583Z",
  "started_at": "2025-08-14T17:30:20.886373Z",
  "completed_at": "2025-08-14T17:30:23.077858Z",
  "urls": {
    "cancel": "https://api.replicate.com/v1/predictions/6ddyjg7p4xrm80crn6gta56cy4/cancel",
    "children": "https://api.replicate.com/v1/predictions/6ddyjg7p4xrm80crn6gta56cy4/children",
    "get": "https://api.replicate.com/v1/predictions/6ddyjg7p4xrm80crn6gta56cy4",
    "root": "https://api.replicate.com/v1/predictions/6ddyjg7p4xrm80crn6gta56cy4",
    "web": "https://replicate.com/p/6ddyjg7p4xrm80crn6gta56cy4"
  },
  "metrics": {
    "predict_time": 2.191485402,
    "total_time": 2.494858
  }
}

Generated in

2.2 seconds

Tweak it Report