chenxwh / omost

Convert LLM's coding to image generation

  • Public
  • 1.9K runs
  • L40S

Input

string

Input prompt

Default: "generate an image of the fierce battle of warriors and the dragon"

string

Specify things to not see in the output

Default: "lowres, bad anatomy, bad hands, cropped, worst quality"

integer

Width of output image

Default: 896

integer

Height of output image

Default: 1152

integer
(minimum: 1, maximum: 100)

Number of denoising steps

Default: 25

number
(minimum: 1, maximum: 32)

Scale for classifier-free guidance

Default: 5

number
(minimum: 0, maximum: 2)

Adjusts randomness of outputs: values greater than 1 are more random, and 0 is deterministic

Default: 0.6

number
(minimum: 0, maximum: 1)

When decoding text, samples from the top p percentage of most likely tokens; lower to ignore less likely tokens

Default: 0.9

integer
(minimum: 128, maximum: 4096)

Maximum number of tokens to generate

Default: 4096

integer

Random seed. Leave blank to randomize the seed
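The documented defaults and ranges can be collected into a small validation helper. This is a minimal sketch: the field names (`prompt`, `negative_prompt`, `num_inference_steps`, and so on) are hypothetical, since the form above shows only types, descriptions, defaults, and ranges. Only the numeric values are taken from this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OmostInputs:
    # Field names are hypothetical; defaults are the ones listed above.
    prompt: str = "generate an image of the fierce battle of warriors and the dragon"
    negative_prompt: str = "lowres, bad anatomy, bad hands, cropped, worst quality"
    width: int = 896
    height: int = 1152
    num_inference_steps: int = 25
    guidance_scale: float = 5.0
    temperature: float = 0.6
    top_p: float = 0.9
    max_new_tokens: int = 4096
    seed: Optional[int] = None  # None stands for "leave blank to randomize"


# (minimum, maximum) pairs as documented above.
RANGES = {
    "num_inference_steps": (1, 100),
    "guidance_scale": (1, 32),
    "temperature": (0, 2),
    "top_p": (0, 1),
    "max_new_tokens": (128, 4096),
}


def validate(inputs: OmostInputs) -> List[str]:
    """Return a list of range violations (empty when every value is in range)."""
    errors = []
    for name, (lo, hi) in RANGES.items():
        value = getattr(inputs, name)
        if not lo <= value <= hi:
            errors.append(f"{name}={value} is outside [{lo}, {hi}]")
    return errors
```

Calling `validate(OmostInputs())` returns an empty list, since every default lies inside its documented range.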

Output

code

# Initialize the canvas
canvas = Canvas()

# Set a global description for the canvas
canvas.set_global_description(
    description='A fierce battle scene with warriors and a dragon.',
    detailed_descriptions=[
        'The image captures a dramatic and intense battle between fierce warriors and a menacing dragon.',
        'The warriors are clad in traditional armor, wielding swords and shields, while the dragon, with its scales glistening in the sunlight, breathes fire.',
        'The background shows a war-torn landscape with smoldering ruins and smoke billowing into the sky.',
        'The sky is filled with dark, ominous clouds, adding to the tension and drama of the scene.',
        'The warriors are engaged in a fierce fight, their expressions filled with determination and fear.',
        'The dragon, with its powerful wings and sharp claws, dominates the scene, its eyes glowing with a fierce intensity.',
        'The overall atmosphere is one of chaos and adrenaline, with the clash of metal and the roar of the dragon creating a sense of urgency and danger.',
    ],
    tags='battle, warriors, dragon, fierce, armor, swords, shields, fire, ruins, smoke, clouds, tension, drama, fight, determination, fear, wings, claws, glowing eyes, chaos, adrenaline, clash, roar, urgency, danger',
    HTML_web_color_name='darkslategray',
)

# Add fierce warriors engaged in battle.
canvas.add_local_description(
    location='in the center',
    offset='no offset',
    area='a large square area',
    distance_to_viewer=5.0,
    description='Fierce warriors engaged in battle.',
    detailed_descriptions=[
        'A group of fierce warriors, clad in traditional armor, are engaged in an intense battle.',
        'They wield swords and shields, their faces filled with determination and fear as they clash with their enemies.',
        'The warriors are positioned in a formation, ready to defend and attack.',
        'The details of their armor, with intricate designs and battle-worn marks, are visible, showing their experience and bravery.',
        'The expressions on their faces are a mix of grit and fear, reflecting the chaos and adrenaline of the battle.',
    ],
    tags='warriors, battle, armor, swords, shields, determination, fear, formation, defense, attack, intricate designs, battle-worn, experience, bravery, expressions, chaos, adrenaline',
    atmosphere='The atmosphere is one of intense chaos and adrenaline.',
    style='The style is detailed and realistic, capturing the intensity of the battle.',
    quality_meta='The image quality is high, with detailed and realistic depictions.',
    HTML_web_color_name='slategray',
)

# Add a menacing dragon breathing fire.
canvas.add_local_description(
    location='on the left',
    offset='slightly to the upper',
    area='a medium-sized vertical area',
    distance_to_viewer=7.0,
    description='A menacing dragon breathing fire.',
    detailed_descriptions=[
        'A menacing dragon, with its scales glistening in the sunlight, dominates the scene.',
        'The dragon has powerful wings and sharp claws, its eyes glowing with a fierce intensity.',
        'It is breathing fire, with flames erupting from its mouth, illuminating the surrounding area with a fiery glow.',
        'The details of its scales, wings, and claws are meticulously depicted, showing the strength and ferocity of the beast.',
        'The dragon’s presence adds an element of danger and urgency to the battle scene.',
    ],
    tags='dragon, menacing, scales, glistening, sunlight, powerful wings, sharp claws, glowing eyes, fire, flames, fiery glow, strength, ferocity, danger, urgency',
    atmosphere='The atmosphere is one of danger and urgency.',
    style='The style is dramatic and intense, highlighting the dragon’s ferocity.',
    quality_meta='The image quality is high, with detailed and dramatic depictions.',
    HTML_web_color_name='firebrick',
)

# Add smoldering ruins and smoke in the background.
canvas.add_local_description(
    location='on the top',
    offset='slightly to the right',
    area='a large horizontal area',
    distance_to_viewer=10.0,
    description='Smoldering ruins and smoke in the background.',
    detailed_descriptions=[
        'The background shows a war-torn landscape with smoldering ruins and smoke billowing into the sky.',
        'The ruins are charred and blackened, indicating recent destruction.',
        'The smoke rises in thick, dark clouds, adding to the drama and tension of the scene.',
        'The details of the ruins, with their crumbling structures and scattered debris, are visible, showing the aftermath of the battle.',
        'The dark clouds in the sky enhance the ominous and chaotic atmosphere of the scene.',
    ],
    tags='background, war-torn, ruins, smoke, billowing, sky, dark clouds, drama, tension, charred, blackened, destruction, crumbling structures, debris, aftermath, battle, ominous, chaotic',
    atmosphere='The atmosphere is ominous and chaotic.',
    style='The style is detailed and dramatic, emphasizing the aftermath of the battle.',
    quality_meta='The image quality is high, with detailed and dramatic depictions.',
    HTML_web_color_name='dimgray',
)

# Add a group of fallen warriors and their weapons.
canvas.add_local_description(
    location='on the bottom-left',
    offset='slightly to the lower-right',
    area='a small square area',
    distance_to_viewer=8.0,
    description='A group of fallen warriors and their weapons.',
    detailed_descriptions=[
        'A group of fallen warriors lies on the ground, their weapons scattered around them.',
        'The fallen warriors are clad in the same traditional armor, their faces frozen in expressions of pain and struggle.',
        'The details of their armor and weapons are visible, showing the marks of battle and the ferocity of the fight.',
        'The ground around them is littered with weapons, shields, and swords, indicating the intensity of the battle.',
        'The fallen warriors add a somber and tragic element to the scene, highlighting the cost of the battle.',
    ],
    tags='fallen warriors, ground, weapons, scattered, armor, expressions, pain, struggle, battle marks, ferocity, fight, intensity, somber, tragic, cost, battle',
    atmosphere='The atmosphere is somber and tragic.',
    style='The style is detailed and realistic, capturing the somber aftermath.',
    quality_meta='The image quality is high, with detailed and realistic depictions.',
    HTML_web_color_name='darkred',
)

# Add a fallen dragon with broken wings.
canvas.add_local_description(
    location='on the bottom-right',
    offset='slightly to the upper-left',
    area='a small vertical area',
    distance_to_viewer=9.0,
    description='A fallen dragon with broken wings.',
    detailed_descriptions=[
        'A fallen dragon lies on the ground, its powerful wings broken and scattered around.',
        'The dragon’s scales are charred and blackened, showing the effects of the fire it breathed.',
        'Its eyes are dimmed, and its body is still, indicating its defeat.',
        'The details of its broken wings and charred scales are meticulously depicted, showing the ferocity and strength of the beast.',
        'The fallen dragon adds a dramatic and tragic element to the scene, highlighting the cost of the battle.',
    ],
    tags='fallen dragon, ground, broken wings, scattered, scales, charred, blackened, fire effects, dimmed eyes, defeated, ferocity, strength, dramatic, tragic, cost, battle',
    atmosphere='The atmosphere is dramatic and tragic.',
    style='The style is detailed and realistic, capturing the dramatic aftermath.',
    quality_meta='The image quality is high, with detailed and realistic depictions.',
    HTML_web_color_name='darkslateblue',
)
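To inspect generated code like the above without loading an image generator, one option is a stand-in object that simply records the calls. The sketch below is not Omost's `Canvas`: `RecordingCanvas` and `layers_back_to_front` are hypothetical names, and the back-to-front ordering assumes that a larger `distance_to_viewer` means the element is further away.

```python
from typing import Any, Dict, List, Optional


class RecordingCanvas:
    """Hypothetical stand-in (not Omost's Canvas): stores keyword arguments only."""

    def __init__(self) -> None:
        self.global_description: Optional[Dict[str, Any]] = None
        self.local_descriptions: List[Dict[str, Any]] = []

    def set_global_description(self, **kwargs: Any) -> None:
        self.global_description = kwargs

    def add_local_description(self, **kwargs: Any) -> None:
        self.local_descriptions.append(kwargs)

    def layers_back_to_front(self) -> List[Dict[str, Any]]:
        # Assumption: larger distance_to_viewer = further away = listed first.
        return sorted(
            self.local_descriptions,
            key=lambda d: d["distance_to_viewer"],
            reverse=True,
        )


# Abbreviated calls using values from the sample output above.
canvas = RecordingCanvas()
canvas.set_global_description(
    description='A fierce battle scene with warriors and a dragon.',
    HTML_web_color_name='darkslategray',
)
canvas.add_local_description(location='in the center', distance_to_viewer=5.0)
canvas.add_local_description(location='on the left', distance_to_viewer=7.0)
canvas.add_local_description(location='on the top', distance_to_viewer=10.0)

order = [d['location'] for d in canvas.layers_back_to_front()]
```

Because the stand-in accepts arbitrary keyword arguments, the full generated program can be executed against it by binding the name `Canvas` to `RecordingCanvas`.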

image

Run time and cost

This model costs approximately $0.18 to run on Replicate, or 5 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
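The "5 runs per $1" figure follows from rounding down the quotient of budget and per-run cost. A minimal sketch, assuming a flat per-run price (the page notes that actual cost varies with inputs); the function name is hypothetical:

```python
import math

COST_PER_RUN_USD = 0.18  # approximate figure quoted above


def whole_runs_for(budget_usd: float, cost_per_run: float = COST_PER_RUN_USD) -> int:
    """Number of complete runs a budget covers at a flat per-run cost."""
    # The small epsilon guards against floating-point results such as 2.9999999.
    return math.floor(budget_usd / cost_per_run + 1e-9)


runs_per_dollar = whole_runs_for(1.0)  # 1 / 0.18 is about 5.56, so 5 whole runs
```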

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 4 minutes. The predict time for this model varies significantly based on the inputs.

Readme

Omost

Omost is a project to convert an LLM’s coding capability into image generation (or more accurately, image composing) capability.

The name Omost (pronunciation: almost) has two meanings: 1) every time you use Omost, your image is almost there; 2) the “O” means “omni” (multi-modal) and “most” means we want to get the most out of it.

Omost provides LLM models that write code to compose the visual content of an image with Omost’s virtual Canvas agent. This Canvas can be rendered by specific implementations of image generators to actually generate images.

Currently, we provide 3 pretrained LLM models based on variations of Llama3 and Phi3 (see also the model notes at the end of this page).

All models are trained on a mix of (1) ground-truth annotations from several datasets including Open-Images, (2) data extracted by automatically annotating images, (3) reinforcement from DPO (Direct Preference Optimization, using “whether the code can be compiled by Python 3.10 or not” as a direct preference), and (4) a small amount of tuning data from OpenAI GPT-4o’s multi-modal capability.
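The "can it be compiled" preference in (3) can be illustrated with a syntax check. This is a sketch under stated assumptions, not the authors' pipeline: it parses with whatever Python interpreter runs it (not specifically 3.10), and the way candidates are paired into chosen and rejected examples is a guess. Both function names are hypothetical.

```python
import ast
from typing import Optional, Tuple


def compiles_as_python(source: str) -> bool:
    """True if `source` parses as Python on the running interpreter."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        return False
    return True


def preference_pair(candidate_a: str, candidate_b: str) -> Optional[Tuple[str, str]]:
    """Return (chosen, rejected) when exactly one candidate parses, else None."""
    ok_a = compiles_as_python(candidate_a)
    ok_b = compiles_as_python(candidate_b)
    if ok_a == ok_b:
        return None  # both parse or both fail: no usable signal from this pair
    return (candidate_a, candidate_b) if ok_a else (candidate_b, candidate_a)


good = "canvas = Canvas()\ncanvas.set_global_description(description='x')"
bad = "canvas.set_global_description(description='x'"  # missing closing parenthesis
pair = preference_pair(bad, good)
```

Parsing only checks syntax; it does not confirm that the calls match the Canvas API or that the program runs.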

Some notes:

  1. The recommended quantization for omost-llama-3-8b is 4 bits, and for omost-phi-3-mini-128k (3.8B) it is 8 bits. Both fit in 8GB VRAM without offloading. The performance degradation caused by quantization is minimal, and I have personally never observed any evidence of it. However, quantizing omost-phi-3-mini-128k to 4 bits is not recommended, since I noticed obvious performance degradation. 4-bit inference with omost-phi-3-mini-128k should be viewed as a last resort for cases where you really do not have a more capable GPU.
  2. My user study shows that omost-llama-3-8b-4bits > omost-dolphin-2.9-llama3-8b-4bits > omost-phi-3-mini-128k-8bits. So in most cases one should just use omost-llama-3-8b-4bits.
  3. The omost-llama-3-8b and omost-phi-3-mini-128k models are trained on filtered, safe data without NSFW or inappropriate content. See (4) if you need a different option.
  4. The omost-dolphin-2.9-llama3-8b model is trained on all data WITHOUT any filtering. You must apply your own safety alignment methods if you expose any omost-dolphin-2.9-llama3-8b service to the public.
  5. Note that the filtering in (3) is not because of any policy. The reason is that I noticed slight instability in the training gradients of those models, since they are pretrained with instruction following regulated by safety alignment, which causes the performance to degrade a bit. The instruction following of omost-dolphin-2.9-llama3-8b, by contrast, is pretrained through community efforts and does not have this problem.
  6. The 128k context length of omost-phi-3-mini-128k cannot be trusted. Its performance degrades a lot once the token count reaches about 8k. One should just view it as a model with about 8k context length.
  7. A model with 8k context length can do about 5 to 6 rounds of conversational editing. If you are about to run out of tokens, use the UI to modify your message and respond again (this can be repeated indefinitely).
  8. All models are fully trained on our H100 clusters at fp16 precision without any tricks like quantization or Q-LoRA. The optimizer is Adam without any tricks.
  9. You must also follow the licenses of Llama-3 and Phi-3.
  10. You can ask us to train on other LLMs if it is reasonable and necessary.
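Notes 1 and 7 can be sanity-checked with back-of-the-envelope arithmetic. The sketch below counts weights only (no activations, KV cache, or quantization overhead), takes "8b" to mean a nominal 8 billion parameters, and takes "8k" to mean 8192 tokens; those readings and both function names are assumptions.

```python
def weight_gib(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def tokens_per_round(context_tokens: int, rounds: int) -> int:
    """Average token budget per editing round if the context is split evenly."""
    return context_tokens // rounds


llama_4bit = weight_gib(8.0, 4)    # about 3.7 GiB
phi_8bit = weight_gib(3.8, 8)      # about 3.5 GiB
llama_fp16 = weight_gib(8.0, 16)   # about 14.9 GiB, well above 8 GB

budget_6_rounds = tokens_per_round(8192, 6)  # 1365 tokens per round
budget_5_rounds = tokens_per_round(8192, 5)  # 1638 tokens per round
```

Both recommended configurations leave several gigabytes of an 8GB card for everything other than weights, which is consistent with note 1, and 5 to 6 rounds in an 8k window corresponds to roughly 1,400 to 1,600 tokens per round.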

Cite

@Misc{omost,
  author = {Omost Team},
  title  = {Omost GitHub Page},
  year   = {2024},
}

Related Work

Also read …

DOCCI: Descriptions of Connected and Contrasting Images

(RPG-DiffusionMaster) Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models and Self-correcting LLM-controlled Diffusion Models

MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation

sd-webui-regional-prompter