zsxkib/star

STAR Video Upscaler: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

Run time and cost

This model costs approximately $1.48 per run on Replicate (less than one run per $1), though the cost varies with your inputs. It is also open source, so you can run it on your own computer with Docker.

This model runs on Nvidia H100 GPU hardware. Predictions typically complete within 17 minutes.

Readme

STAR: Spatial-Temporal Video Super-Resolution

STAR is a powerful text-guided video super-resolution model that can enhance low-quality videos while maintaining temporal consistency. It leverages text-to-video models to generate high-quality reference frames and combines them with spatial-temporal features for superior upscaling results.

More visual results can be found on our project page and video demo.

Usage

The model accepts:

  • A video file (supported formats: mp4, avi, mov)
  • Optional text prompt describing the video content
  • Target upscaling factor (default: 4x)

The model outputs an enhanced, higher-resolution version of the input video.
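
As a rough illustration of the inputs listed above, the following Python sketch assembles and validates an input payload before a run. The key names `video`, `prompt`, and `upscale`, and both helper names, are assumptions made for illustration; the model's actual input schema on Replicate may differ and should be checked there. The `run_star` helper assumes the `replicate` Python client is installed and an API token is configured, and it takes the model reference from the caller rather than hard-coding a version.

```python
import os
from typing import Any, Dict, Optional

# Container formats listed in the Usage section.
SUPPORTED_EXTENSIONS = {".mp4", ".avi", ".mov"}


def build_star_input(video_path: str,
                     prompt: Optional[str] = None,
                     upscale: int = 4) -> Dict[str, Any]:
    """Validate the arguments and return an input payload.

    NOTE: the keys "video", "prompt", and "upscale" are hypothetical names,
    not taken from the model's published schema.
    """
    ext = os.path.splitext(video_path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported video format: {ext or '(none)'}")
    if upscale < 1:
        raise ValueError("upscale factor must be at least 1")
    payload: Dict[str, Any] = {"video": video_path, "upscale": upscale}
    if prompt:  # the text prompt is optional
        payload["prompt"] = prompt
    return payload


def run_star(model_ref: str, payload: Dict[str, Any]) -> Any:
    """Send the payload to Replicate (needs network access and an API token).

    model_ref is supplied by the caller, e.g. an "owner/name:version" string.
    """
    import replicate  # third-party client: pip install replicate

    with open(payload["video"], "rb") as video_file:
        # Pass the open file handle in place of the local path.
        return replicate.run(model_ref, input={**payload, "video": video_file})
```

Keeping validation separate from the network call means unsupported formats are rejected locally, before any billable run starts.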

Limitations

  • For optimal results, input videos should be at least 240p resolution
  • Processing time increases with video length and resolution
  • Due to VRAM requirements, longer videos may need to be processed in segments
  • The CogVideoX-5B variant only supports 720x480 input resolution
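
One way to handle the segmenting mentioned above is to split the frame range into fixed-length chunks before upscaling each chunk separately. The sketch below is a minimal illustration, not the model's own procedure: the function name, the segment length, and the optional overlap between neighbouring segments are all assumptions, and suitable values depend on available VRAM.

```python
from typing import List, Tuple


def split_into_segments(total_frames: int,
                        segment_len: int,
                        overlap: int = 0) -> List[Tuple[int, int]]:
    """Return half-open (start, end) frame ranges covering [0, total_frames).

    Consecutive ranges share `overlap` frames (an assumed option, useful if
    the seams between segments are to be blended afterwards).
    """
    if total_frames < 0:
        raise ValueError("total_frames must be non-negative")
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    if not 0 <= overlap < segment_len:
        raise ValueError("overlap must satisfy 0 <= overlap < segment_len")

    step = segment_len - overlap
    segments: List[Tuple[int, int]] = []
    start = 0
    while start < total_frames:
        end = min(start + segment_len, total_frames)
        segments.append((start, end))
        if end == total_frames:
            break  # the last segment reaches the end of the video
        start += step
    return segments
```

For example, `split_into_segments(100, 32, overlap=8)` returns `[(0, 32), (24, 56), (48, 80), (72, 100)]`, so a 100-frame clip is processed as four segments that each share 8 frames with the next.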

Model Versions

Two variants are available:

  1. I2VGen-XL-based:
       • Light degradation model: Best for mild quality enhancement
       • Heavy degradation model: Optimized for severely degraded videos

  2. CogVideoX-5B-based:
       • Specialized for heavy degradation scenarios
       • Fixed input resolution of 720x480
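
The selection rules above can be sketched as a small helper that lists which variants are candidates for a given input. This is only an illustration: the function name, the `"light"`/`"heavy"` labels, and the returned strings are hypothetical, and 720x480 is assumed to mean width 720 by height 480.

```python
from typing import List, Tuple

# Assumed to be (width, height); the README states only "720x480".
COGVIDEOX_INPUT_SIZE: Tuple[int, int] = (720, 480)


def candidate_variants(degradation: str, width: int, height: int) -> List[str]:
    """List the variants that fit a video, following the README's descriptions.

    `degradation` is a hypothetical label: "light" or "heavy".
    """
    if degradation == "light":
        return ["I2VGen-XL (light degradation)"]
    if degradation == "heavy":
        candidates = ["I2VGen-XL (heavy degradation)"]
        # CogVideoX-5B accepts only its fixed input resolution.
        if (width, height) == COGVIDEOX_INPUT_SIZE:
            candidates.append("CogVideoX-5B")
        return candidates
    raise ValueError(f"unknown degradation level: {degradation!r}")
```

For a heavily degraded 720x480 clip both heavy-degradation options are returned; at any other resolution only the I2VGen-XL-based model remains.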

Citation

@misc{xie2025starspatialtemporalaugmentationtexttovideo,
      title={STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution}, 
      author={Rui Xie and Yinhong Liu and Penghao Zhou and Chen Zhao and Jun Zhou and Kai Zhang and Zhenyu Zhang and Jian Yang and Zhenheng Yang and Ying Tai},
      year={2025},
      eprint={2501.02976},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

  • I2VGen-XL-based models: MIT License
  • CogVideoX-5B-based model: CogVideoX License

Maintained by @zsxkib for Replicate integration