How does this Pipeline work?
Stage 1 — Text-to-Image (selectable):
Picks a base T2I model from `models` — `black-forest-labs/flux-dev` (default), `flux-schnell`, `stability-ai/sdxl`, or `ideogram-ai/ideogram-v2(-turbo)`.
It introspects the model’s OpenAPI schema to ensure it accepts a `prompt` and returns an image URI, then generates the initial image from the prompt.
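A minimal sketch of this step, assuming the pipeline reaches these models through the `replicate` Python client; the `T2I_MODELS` list and `generate_base_image` helper are illustrative names, not the pipeline's actual code:

```python
import replicate

T2I_MODELS = [
    "black-forest-labs/flux-dev",      # default
    "black-forest-labs/flux-schnell",
    "stability-ai/sdxl",
    "ideogram-ai/ideogram-v2",
    "ideogram-ai/ideogram-v2-turbo",
]

def generate_base_image(model_name: str, prompt: str) -> str:
    # Introspect the model's OpenAPI schema before running it.
    model = replicate.models.get(model_name)
    schema = model.latest_version.openapi_schema
    inputs = schema["components"]["schemas"]["Input"]["properties"]
    if "prompt" not in inputs:
        raise ValueError(f"{model_name} does not accept a 'prompt' input")
    output = replicate.run(model_name, input={"prompt": prompt})
    # Some models return a list of URIs, others a single one.
    uri = output[0] if isinstance(output, (list, tuple)) else output
    return str(uri)
```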
Stage 2 — Framed copy setup:
Computes two concentric rectangles (an outer border and an inner window) from `border_start` and `border_width`.
Creates a shrunken copy of the original image sized to the inner window and pastes it back into the original—this sets up a “picture-in-picture” center.
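A Pillow sketch of this setup; the exact geometry is assumed (`border_start` as the outer rectangle's inset from the image edge, `border_width` as the frame thickness), and `frame_center_copy` is a hypothetical helper:

```python
from PIL import Image

def frame_center_copy(img: Image.Image, border_start: int,
                      border_width: int) -> Image.Image:
    w, h = img.size
    # Outer rectangle: inset from the image edge by border_start.
    outer = (border_start, border_start, w - border_start, h - border_start)
    # Inner window: inset further by the frame thickness.
    inner = (outer[0] + border_width, outer[1] + border_width,
             outer[2] - border_width, outer[3] - border_width)
    inner_w, inner_h = inner[2] - inner[0], inner[3] - inner[1]
    # Shrink the whole original to the inner window and paste it back,
    # producing the "picture-in-picture" center.
    center = img.resize((inner_w, inner_h))
    framed = img.copy()
    framed.paste(center, (inner[0], inner[1]))
    return framed
```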
Stage 3 — Mask for inpainting (model-aware):
Builds a mask that reveals only the frame region between the outer and inner rectangles.
Mask color scheme flips depending on the chosen inpaint backend — `ideogram-ai/ideogram-v2` expects white-keep/black-paint, while `black-forest-labs/flux-fill-pro` expects black-keep/white-paint.
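A Pillow sketch of the model-aware mask; `build_mask` is a hypothetical helper, and the color convention follows the description above:

```python
from PIL import Image, ImageDraw

def build_mask(size, outer, inner, inpaint_model: str) -> Image.Image:
    # ideogram-v2: white = keep, black = paint; flux-fill-pro: inverse.
    if inpaint_model == "ideogram-ai/ideogram-v2":
        keep, paint = 255, 0
    else:  # black-forest-labs/flux-fill-pro
        keep, paint = 0, 255
    mask = Image.new("L", size, keep)
    draw = ImageDraw.Draw(mask)
    draw.rectangle(outer, fill=paint)   # open up the frame band...
    draw.rectangle(inner, fill=keep)    # ...but protect the inner window
    return mask
```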
Stage 4 — Inpainting (selectable backend):
Calls the chosen inpaint model — `ideogram-ai/ideogram-v2` or `black-forest-labs/flux-fill-pro` — with the prompt, the image-with-center-copy, and the mask to synthesize new frame content around the center.
Output is normalized to a single image path.
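A sketch of the inpaint call and output normalization, again assuming the `replicate` client; the `image`/`mask` input names match both backends' published inputs but should be treated as assumptions here:

```python
import replicate

def inpaint_frame(model_name: str, prompt: str,
                  image_path: str, mask_path: str) -> str:
    with open(image_path, "rb") as image, open(mask_path, "rb") as mask:
        output = replicate.run(model_name, input={
            "prompt": prompt,
            "image": image,   # the image with the pasted center copy
            "mask": mask,     # frame-band mask from Stage 3
        })
    # Normalize: some backends return a list of URIs, others a single one.
    uri = output[0] if isinstance(output, (list, tuple)) else output
    return str(uri)
```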
Stage 5 — Multi-scale still:
From the inpainted image, builds a composite “still” by pasting two additional center resizes (at `copy_scale` and `copy_scale**2`) into the middle, reinforcing the zoom-in focal layers.
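A Pillow sketch of the multi-scale paste; `multiscale_still` is a hypothetical helper, and `copy_scale` is assumed to be a fraction below 1:

```python
from PIL import Image

def multiscale_still(img: Image.Image, copy_scale: float) -> Image.Image:
    w, h = img.size
    still = img.copy()
    # Paste two progressively smaller copies, each centered.
    for scale in (copy_scale, copy_scale ** 2):
        cw, ch = int(w * scale), int(h * scale)
        layer = img.resize((cw, ch))
        still.paste(layer, ((w - cw) // 2, (h - ch) // 2))
    return still
```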
Stage 6 — Animated zoom render:
Generates `num_frames` frames by progressively zooming the background still while overlaying a center section that fades in across frames.
Exports as GIF/WebP directly or transcodes to MP4 via `ffmpeg`.
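A Pillow sketch of the frame loop plus an `ffmpeg` transcode via `subprocess`; the total zoom amount, overlay size, frame duration, and output paths are all assumptions, not the pipeline's actual values:

```python
import subprocess
from PIL import Image

def render_zoom(still: Image.Image, center: Image.Image,
                num_frames: int, out_path: str = "zoom.gif") -> None:
    w, h = still.size
    frames = []
    for i in range(num_frames):
        t = i / max(num_frames - 1, 1)   # progress 0 -> 1 across the clip
        crop = 1.0 - 0.5 * t             # assumed 2x total zoom-in
        cw, ch = int(w * crop), int(h * crop)
        x0, y0 = (w - cw) // 2, (h - ch) // 2
        # Zoom: crop a shrinking centered window and scale it back up.
        frame = still.crop((x0, y0, x0 + cw, y0 + ch)).resize((w, h))
        # Overlay the center section with a fade-in alpha mask.
        sw, sh = w // 2, h // 2          # assumed overlay size: half frame
        section = center.resize((sw, sh))
        alpha = Image.new("L", (sw, sh), int(255 * t))
        frame.paste(section, ((w - sw) // 2, (h - sh) // 2), alpha)
        frames.append(frame)
    frames[0].save(out_path, save_all=True, append_images=frames[1:],
                   duration=80, loop=0)
    # Optional MP4 transcode; standard ffmpeg invocation, paths illustrative.
    subprocess.run(["ffmpeg", "-y", "-i", out_path, "zoom.mp4"], check=True)
```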