hanglics/dse-qwen2-2b-mrl-v1

A screenshot retriever model based on DSE.

Public
3.1K runs

Run time and cost

This model costs approximately $0.14 to run on Replicate, or 7 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 147 seconds. The predict time for this model varies significantly based on the inputs.

Readme

DSE-QWen2-2b-MRL-V1

DSE-QWen2-2b-MRL-V1 is a bi-encoder model designed to encode document screenshots into dense vectors for document retrieval. The Document Screenshot Embedding (DSE) approach captures documents in their original visual format, preserving all information such as text, images, and layout, and so avoids tedious parsing and the information loss that parsing can cause. DSE aims to provide a generalizable embedding model for retrieval over text, PDF documents, webpages, and slides.
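To make the bi-encoder idea concrete, the sketch below ranks document pages against a query by cosine similarity between dense vectors. It is only an illustration under stated assumptions: the vectors are small hand-written stand-ins rather than real model outputs, and every function name here is hypothetical. The optional `dim` argument assumes that "MRL" in the model name refers to Matryoshka-style embeddings whose leading dimensions can be used on their own; this page does not confirm that.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def truncate_embedding(vec, dim):
    """Keep the first `dim` values and re-normalize.

    Assumption: the model was trained so that leading dimensions remain
    useful on their own (Matryoshka-style). Not confirmed by this page.
    """
    return l2_normalize(vec[:dim])


def cosine_scores(query_vec, doc_vecs):
    """Cosine similarity between one query vector and each document vector."""
    q = l2_normalize(query_vec)
    return [sum(a * b for a, b in zip(q, l2_normalize(d))) for d in doc_vecs]


def rank_documents(query_vec, doc_vecs, dim=None):
    """Return document indices sorted from most to least similar."""
    if dim is not None:
        query_vec = truncate_embedding(query_vec, dim)
        doc_vecs = [truncate_embedding(d, dim) for d in doc_vecs]
    scores = cosine_scores(query_vec, doc_vecs)
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy stand-ins for a query embedding and three screenshot embeddings.
query = [0.9, 0.1, 0.0, 0.2]
pages = [
    [0.1, 0.9, 0.3, 0.0],
    [0.8, 0.2, 0.1, 0.1],
    [0.0, 0.0, 1.0, 0.5],
]

ranking = rank_documents(query, pages)            # full vectors
ranking_2d = rank_documents(query, pages, dim=2)  # leading 2 dimensions only
print(ranking, ranking_2d)
```

Because the query and the documents are encoded independently, the page vectors can be computed once and stored; only the query needs to be encoded at search time.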

For example, DSE-QWen2-2b-MRL-V1 achieves 85.8 nDCG@5 on the ViDoRe leaderboard.
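For readers unfamiliar with the metric, here is a minimal sketch of how nDCG@k can be computed for a single query. It assumes linear gain and a log2 rank discount, which is one common formulation; the exact variant used by the leaderboard is not stated on this page, and the function names are hypothetical. The leaderboard figure is presumably reported on a 0 to 100 scale, whereas this sketch returns a value between 0 and 1.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k ranked items."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k):
    """nDCG@k for one query.

    `ranked_relevances` holds the relevance label of every candidate
    document, listed in the order the retriever ranked them.
    """
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# The single relevant page is ranked first, then second.
perfect = ndcg_at_k([1, 0, 0, 0, 0], 5)
second = ndcg_at_k([0, 1, 0, 0, 0], 5)
print(perfect, round(second, 4))
```

A benchmark score is then the mean of this per-query value over all queries in the evaluation set.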

Note:

Please see here (pytorch.bin version) and here (*.safetensors version) for more details.

Citation

If you find this checkpoint helpful, please consider citing the QWen2, Docmatix, ViDoRe, and DSE work.