adirik / dat

Dual Aggregation Transformer for Image Super-Resolution

  • Public
  • 149 runs
  • GitHub
  • Paper
  • License

Model Description

Image super-resolution model based on the Dual Aggregation Transformer (DAT), a novel transformer architecture introduced in the paper "Dual Aggregation Transformer for Image Super-Resolution" by Chen et al. There are three separate pretrained models for x2, x3, and x4 upscaling, trained on the DIV2K and Flickr2K datasets. This cog wrapper is based heavily on the original implementation.

Abstract

Transformer has recently gained considerable popularity in low-level vision tasks, including image super-resolution (SR). These networks utilize self-attention along different dimensions, spatial or channel, and achieve impressive performance. This inspires us to combine the two dimensions in Transformer for a more powerful representation capability. Based on the above idea, we propose a novel Transformer model, Dual Aggregation Transformer (DAT), for image SR. Our DAT aggregates features across spatial and channel dimensions, in the inter-block and intra-block dual manner. Specifically, we alternately apply spatial and channel self-attention in consecutive Transformer blocks. The alternate strategy enables DAT to capture the global context and realize inter-block feature aggregation. Furthermore, we propose the adaptive interaction module (AIM) and the spatial-gate feed-forward network (SGFN) to achieve intra-block feature aggregation. AIM complements two self-attention mechanisms from corresponding dimensions. Meanwhile, SGFN introduces additional non-linear spatial information in the feed-forward network. Extensive experiments show that our DAT surpasses current methods. Code and models are available in the linked GitHub repository.

See the paper for more details.
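To make the alternating spatial/channel strategy from the abstract concrete, here is a minimal, illustrative PyTorch sketch. It is not the authors' implementation: it omits windowing, AIM, SGFN, and normalization, and only shows consecutive blocks attending along the two dimensions in turn.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Self-attention over spatial positions (tokens = pixels)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):  # x: (B, N, C), N = H*W
        out, _ = self.attn(x, x, x)
        return out

class ChannelAttention(nn.Module):
    """Transposed self-attention: a C x C map mixes feature channels."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)

    def forward(self, x):  # x: (B, N, C)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = torch.softmax(q.transpose(1, 2) @ k / q.shape[1] ** 0.5, dim=-1)  # (B, C, C)
        return (attn @ v.transpose(1, 2)).transpose(1, 2)

# Alternate the two attention types in consecutive blocks.
dim = 64
blocks = nn.ModuleList(
    [SpatialAttention(dim) if i % 2 == 0 else ChannelAttention(dim) for i in range(4)]
)
x = torch.randn(1, 32 * 32, dim)  # (B, H*W, C) token layout
for blk in blocks:
    x = x + blk(x)  # residual connection around each block
```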

Usage

The model is straightforward to use: provide an input image, choose one of the three upscaling factors (x2, x3, or x4), and it returns the upscaled image.
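For example, a minimal sketch of calling the model through the Replicate Python client looks like this. The input keys ("image", "scale") are assumptions; check the model's API tab for the exact parameter names.

```python
import replicate

output = replicate.run(
    "adirik/dat",
    input={
        "image": open("input.png", "rb"),  # low-resolution input image
        "scale": 4,                        # assumed name for the upscaling factor
    },
)
print(output)  # URL of the upscaled image
```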

To prevent out-of-memory errors, images with more than 1024x1024 pixels are divided into patches, upscaled separately, and then merged back together. Keep this in mind when running the model on large images, as it increases runtime.
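The wrapper's exact tiling code is not reproduced here; a rough sketch of the idea, assuming a non-overlapping 1024x1024 grid and an `upscale_fn(patch)` callable that runs the model on a single patch, could look like this:

```python
from PIL import Image

def upscale_in_patches(img, upscale_fn, scale, tile=1024):
    """Upscale a large image tile by tile to avoid out-of-memory errors."""
    w, h = img.size
    out = Image.new("RGB", (w * scale, h * scale))
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            # Crop a patch (edge patches may be smaller than `tile`).
            patch = img.crop((left, top, min(left + tile, w), min(top + tile, h)))
            # Upscale it and paste it into the corresponding output region.
            out.paste(upscale_fn(patch), (left * scale, top * scale))
    return out
```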

Limitations

For each upscaling factor, there is a limit on the maximum size of the input image:

Scale   Max Image Size
2       2048
3       1024
4       1024

Note: Max Image Size denotes the larger side of the image.
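For example, a pre-flight check against these limits (assuming, per the note above, that the limit applies to the larger side) could look like:

```python
# Maximum allowed size of the image's larger side, per upscaling factor.
MAX_SIDE = {2: 2048, 3: 1024, 4: 1024}

def check_input_size(width, height, scale):
    if max(width, height) > MAX_SIDE[scale]:
        raise ValueError(f"Image too large for x{scale} upscaling")
```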

References

@inproceedings{chen2023dual,
  title={Dual Aggregation Transformer for Image Super-Resolution},
  author={Chen, Zheng and Zhang, Yulun and Gu, Jinjin and Kong, Linghe and Yang, Xiaokang and Yu, Fisher},
  booktitle={ICCV},
  year={2023}
}