# Z-Image-Turbo (VideoX-Fun)
This model is an implementation of Z-Image-Turbo based on the VideoX-Fun repository. It uses the Union ControlNet (`alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union`) to enable highly controllable image generation from various conditions (Canny, Depth, Pose, HED), and supports custom LoRA weights.
## ✨ Features
- Turbo Generation: High-quality image generation with fewer inference steps (typically 20 steps).
- Union ControlNet: Supports multiple control modes in a single model:
  - `canny`: Edge detection.
  - `depth`: Depth map estimation.
  - `pose`: Human pose estimation.
  - `hed`: Soft edge detection.
- Custom LoRA Support: Dynamically load LoRA weights (`.safetensors`) from a URL to stylize your generations.
- Smart Resizing: Automatically adjusts output resolution to match the dimensions of your ControlNet input image.
## 🚀 How to use
### Basic Parameters
- `prompt`: The text description of the image you want to generate.
- `num_outputs`: Number of images to generate (default: `1`).
- `num_inference_steps`: The number of denoising steps. Default is `20`.
- `guidance_scale`: The classifier-free guidance scale. Set to `0` for Turbo models or adjust as needed.
- `seed`: Random seed for reproducibility. Leave blank or set to `-1` for random.
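The basic parameters above can be sketched as a small payload builder. This is an illustrative helper, not part of the model's actual codebase; the function and dictionary key names simply mirror the parameter list, and the seed handling follows the "blank or `-1` means random" rule described above.

```python
import random

def resolve_seed(seed=None):
    """Blank (None) or -1 means: pick a random seed (hypothetical helper)."""
    if seed is None or seed == -1:
        return random.randint(0, 2**32 - 1)
    return seed

def build_inputs(prompt, num_outputs=1, num_inference_steps=20,
                 guidance_scale=0.0, seed=None):
    """Assemble the basic-parameter payload with the documented defaults."""
    return {
        "prompt": prompt,
        "num_outputs": num_outputs,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,  # 0 for Turbo models
        "seed": resolve_seed(seed),
    }
```

A fixed seed makes runs reproducible, so keeping it in the payload (rather than always randomizing) is what lets you regenerate the same image later.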
### ControlNet Parameters
To guide the generation structure:
- `controlnet_1`: Select the control type (`canny`, `depth`, `pose`, `hed`, or `none`).
- `controlnet_1_image`: URL or file upload of the image to use as the structural reference.
- `controlnet_1_end`: Control strength (Control Context Scale). Default is `1.0`. Lower values (e.g., `0.6`-`0.8`) give the model more freedom to deviate from the reference structure.
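The "Smart Resizing" feature mentioned above can be approximated with a short sketch: derive the output resolution from the ControlNet image's dimensions. The snap to a multiple of 16 and the `max_side` cap are assumptions common to diffusion-transformer pipelines, not confirmed details of this implementation.

```python
def smart_resize(control_width, control_height, multiple=16, max_side=1536):
    """Derive an output resolution from the ControlNet image dimensions.
    Assumptions: sides snapped to a multiple of 16, longest side capped."""
    scale = min(1.0, max_side / max(control_width, control_height))
    w = max(multiple, round(control_width * scale / multiple) * multiple)
    h = max(multiple, round(control_height * scale / multiple) * multiple)
    return w, h
```

Matching the control image's aspect ratio this way keeps edges, depth maps, or pose skeletons aligned with the generated pixels instead of being stretched.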
### LoRA Parameters
To apply a specific style:
- `lora_weights`: URL to a `.safetensors` (or `.tar`) file containing the LoRA weights.
- `lora_scale`: Strength of the LoRA application (`0.0` to `2.0`). Default is `1.0`.
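A pre-flight check for the LoRA parameters can look like the sketch below. This is a hypothetical helper: the source only specifies the accepted file extensions and the scale range, so the query-string handling and clamping behavior are illustrative assumptions.

```python
def validate_lora(lora_weights, lora_scale=1.0):
    """Check the LoRA URL extension and clamp the scale to 0.0-2.0.
    Hypothetical helper; not part of the actual pipeline."""
    if lora_weights is not None:
        path = lora_weights.split("?", 1)[0]  # drop any query string
        if not path.endswith((".safetensors", ".tar")):
            raise ValueError("lora_weights must point to a .safetensors or .tar file")
    return min(max(float(lora_scale), 0.0), 2.0)
```

Validating the URL before downloading fails fast on typos instead of surfacing a cryptic deserialization error mid-generation.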
## 🔧 Technical Details
- Base Model: `Tongyi-MAI/Z-Image-Turbo`
- Architecture: S3-DiT (Diffusion Transformer)
- ControlNet: Uses a unified ControlNet model capable of handling multiple conditions by projecting them into the correct latent space.
## 🔗 Credits
- Based on VideoX-Fun by Alibaba PAI & Tongyi-MAI.
- Original weights: HuggingFace.