README
How does this pipeline work?
Stage 1 — Text-to-Image (selectable):
Picks a base text-to-image (T2I) model from the supported set — black-forest-labs/flux-dev (default), black-forest-labs/flux-schnell, stability-ai/sdxl, or ideogram-ai/ideogram-v2 (or its -turbo variant).
It introspects the model’s OpenAPI schema to ensure it accepts a prompt and returns an image URI, then generates the initial image from the prompt.
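The schema check can be sketched as a small predicate over the model's OpenAPI schema. This is a hypothetical sketch — the function name and the exact schema keys (`input.properties`, `output.format`/`output.items.format`) are assumptions about how the hosted models describe themselves, not confirmed from the source:

```python
# Supported T2I backends (flux-dev is the default).
T2I_MODELS = [
    "black-forest-labs/flux-dev",
    "black-forest-labs/flux-schnell",
    "stability-ai/sdxl",
    "ideogram-ai/ideogram-v2",
    "ideogram-ai/ideogram-v2-turbo",
]

def accepts_prompt_and_returns_image(schema: dict) -> bool:
    """Check an OpenAPI-style schema for a 'prompt' input and a URI output.

    Hypothetical helper: assumes the schema nests inputs under
    input.properties and describes the output as a URI or array of URIs.
    """
    inputs = schema.get("input", {}).get("properties", {})
    output = schema.get("output", {})
    has_prompt = "prompt" in inputs
    # The output may be a single URI or an array of URIs.
    returns_uri = (
        output.get("format") == "uri"
        or output.get("items", {}).get("format") == "uri"
    )
    return has_prompt and returns_uri
```

A model failing this check would be rejected before any generation call is made.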
Stage 2 — Framed copy setup:
Computes two concentric rectangles (an outer border and an inner window) from border_start and border_width.
Creates a shrunken copy of the original image sized to the inner window and pastes it back into the original—this sets up a “picture-in-picture” center.
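A minimal Pillow sketch of this stage, assuming border_start and border_width are fractions of the image size (the source does not specify their units, so that is an assumption):

```python
from PIL import Image

def frame_rects(size, border_start, border_width):
    """Compute the outer and inner rectangles of the frame band.

    border_start: assumed fractional inset from the edge to the outer rect;
    border_width: assumed fractional thickness of the frame band.
    """
    w, h = size
    outer = (int(w * border_start), int(h * border_start),
             int(w * (1 - border_start)), int(h * (1 - border_start)))
    inset = border_start + border_width
    inner = (int(w * inset), int(h * inset),
             int(w * (1 - inset)), int(h * (1 - inset)))
    return outer, inner

def paste_center_copy(img, border_start, border_width):
    """Shrink the whole image to the inner window and paste it back centered."""
    _, inner = frame_rects(img.size, border_start, border_width)
    iw, ih = inner[2] - inner[0], inner[3] - inner[1]
    small = img.resize((iw, ih))
    out = img.copy()
    out.paste(small, (inner[0], inner[1]))
    return out
```

The result is the "picture-in-picture" center: the full scene, repeated at reduced scale inside its own inner window.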
Stage 3 — Mask for inpainting (model-aware):
Builds a mask that reveals only the frame region between the outer and inner rectangles.
Mask color scheme flips depending on the chosen inpaint backend — ideogram-ai/ideogram-v2 expects white-keep/black-paint, while black-forest-labs/flux-fill-pro expects black-keep/white-paint.
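The model-aware flip can be sketched with Pillow's ImageDraw — draw the frame band in the "paint" color, then re-cover the center in the "keep" color. The function name is illustrative; the two color conventions are as stated above:

```python
from PIL import Image, ImageDraw

def build_frame_mask(size, outer, inner, inpaint_model):
    """Mask exposing only the band between the outer and inner rectangles.

    ideogram-ai/ideogram-v2: white = keep, black = paint;
    black-forest-labs/flux-fill-pro: black = keep, white = paint.
    """
    if inpaint_model == "ideogram-ai/ideogram-v2":
        keep, paint = 255, 0     # white-keep / black-paint
    else:                        # assumed: flux-fill-pro convention
        keep, paint = 0, 255     # black-keep / white-paint
    mask = Image.new("L", size, keep)
    draw = ImageDraw.Draw(mask)
    draw.rectangle(outer, fill=paint)  # open the frame band for inpainting
    draw.rectangle(inner, fill=keep)   # re-protect the center window
    return mask
```

Getting this flip wrong inverts the effect — the model would repaint the center and leave the frame untouched — which is why the mask stage has to know which backend was selected.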
Stage 4 — Inpainting (selectable backend):
Calls the chosen inpaint model — ideogram-ai/ideogram-v2 or black-forest-labs/flux-fill-pro — with the prompt, the image-with-center-copy, and the mask to synthesize new frame content around the center.
Output is normalized to a single image path.
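The normalization step has to cope with backends returning different shapes. A hedged sketch, assuming the output is either a string URI, a list of URIs, or a file-like object with a `url` attribute (the exact return shapes are assumptions, not confirmed by the source):

```python
def normalize_output(output):
    """Reduce a model's output to a single image path/URI.

    Hypothetical normalizer: handles a bare string, a non-empty list
    (first element wins), or an object exposing a .url attribute.
    """
    if isinstance(output, str):
        return output
    if isinstance(output, (list, tuple)) and output:
        return normalize_output(output[0])
    url = getattr(output, "url", None)
    if isinstance(url, str):
        return url
    raise ValueError(f"unrecognized model output: {output!r}")
```

Downstream stages can then assume a single path regardless of which inpaint backend ran.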
Stage 5 — Multi-scale still:
From the inpainted image, builds a composite “still” by pasting two additional center resizes (at copy_scale and copy_scale**2) into the middle, reinforcing the zoom-in focal layers.
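A minimal sketch of the composite still, assuming copy_scale is a fraction of the image size and that both resizes are pasted centered (largest first, so the smaller copy lands on top):

```python
from PIL import Image

def build_still(img, copy_scale):
    """Paste two centered resizes of the inpainted image into its middle,
    at copy_scale and copy_scale**2 of the original size."""
    w, h = img.size
    out = img.copy()
    for scale in (copy_scale, copy_scale ** 2):
        cw, ch = max(1, int(w * scale)), max(1, int(h * scale))
        small = img.resize((cw, ch))
        out.paste(small, ((w - cw) // 2, (h - ch) // 2))
    return out
```

The two nested copies give the zoom animation its recursive focal layers: each zoom level reveals another, smaller instance of the scene.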
Stage 6 — Animated zoom render:
Generates num_frames frames by progressively zooming the background still while overlaying a center section that fades in across frames.
Exports as GIF/WebP directly or transcodes to MP4 via ffmpeg.
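The frame loop can be sketched as below. This is a simplified stand-in for the real renderer: it blends the full center image over the zoomed background rather than compositing only a cropped center section, and the 25% maximum zoom margin is an arbitrary illustrative choice:

```python
from PIL import Image

def render_zoom_frames(still, center, num_frames):
    """Progressively zoom into the background still while the center
    image fades in across frames (simplified full-frame blend)."""
    w, h = still.size
    frames = []
    for i in range(num_frames):
        t = i / max(1, num_frames - 1)          # 0 -> 1 across frames
        # Zoom: crop an ever-smaller centered box and scale it back up.
        margin = int(min(w, h) * 0.25 * t)
        box = (margin, margin, w - margin, h - margin)
        bg = still.crop(box).resize((w, h))
        # Fade the center overlay in with the frame index.
        frames.append(Image.blend(bg, center, alpha=t))
    return frames

def export_gif(frames, fp, duration_ms=80):
    """Write the frames as a looping animated GIF."""
    frames[0].save(fp, format="GIF", save_all=True,
                   append_images=frames[1:], duration=duration_ms, loop=0)
```

For MP4 output, the saved animation (or the individual frames) would be handed to ffmpeg in a separate transcode step, as noted above.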