hanglics/dse-qwen2-2b-mrl-v1

A screenshot retriever model based on DSE.

Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

DSE-QWen2-2b-MRL-V1

DSE-QWen2-2b-MRL-V1 is a bi-encoder model that encodes document screenshots into dense vectors for document retrieval. The Document Screenshot Embedding (DSE) approach captures documents in their original visual format, preserving text, images, and layout, and so avoids tedious parsing and the information loss that parsing can cause. DSE aims to provide a generalizable embedding model for retrieval over text, PDF documents, web pages, and slides.
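To make the bi-encoder idea concrete, here is a minimal sketch of the retrieval step only, assuming the query and the document screenshots have already been encoded into dense vectors. The vectors below are made-up placeholders, not model outputs, and `rank_documents` is a hypothetical helper rather than part of this model's API. The optional `dim` argument assumes that "MRL" in the model name refers to Matryoshka Representation Learning, where a prefix of the embedding can be used as a smaller embedding; treat that as an assumption, not a documented feature of this page.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def rank_documents(query_vec, doc_vecs, dim=None):
    """Rank documents by cosine similarity to the query, best first.

    dim: optionally keep only the first `dim` components of every vector
    (assumed Matryoshka-style truncation; hypothetical for this model).
    """
    if dim is not None:
        query_vec = query_vec[:dim]
        doc_vecs = [d[:dim] for d in doc_vecs]
    q = normalize(query_vec)
    scores = [sum(a * b for a, b in zip(q, normalize(d))) for d in doc_vecs]
    order = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Placeholder embeddings: one query vector and three document vectors.
query = [0.9, 0.1, 0.0, 0.2]
docs = [
    [0.1, 0.9, 0.3, 0.0],
    [0.8, 0.2, 0.1, 0.1],
    [0.0, 0.1, 0.9, 0.4],
]

order, scores = rank_documents(query, docs)
print(order)
```

Because queries and documents are encoded independently, document vectors can be computed once and stored, and only the query needs encoding at search time.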

For example, DSE-QWen2-2b-MRL-V1 achieves 85.8 nDCG@5 on the ViDoRe leaderboard.
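For readers unfamiliar with the metric, the following is a small sketch of how nDCG@5 can be computed for a single query, using the common linear-gain definition. It is not the leaderboard's evaluation code; the function names are illustrative, and the input is assumed to be the relevance label of every candidate document in the order the retriever ranked them. Leaderboards typically average this value over all queries and may report it on a 0 to 100 scale.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k entries (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=5):
    """nDCG@k for one query.

    ranked_relevances: relevance label of each candidate, in ranked order.
    The ideal ordering is the same labels sorted from most to least relevant.
    """
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# One relevant page, retrieved at rank 2 out of 5 candidates.
score = ndcg_at_k([0, 1, 0, 0, 0], k=5)
print(round(score, 4))
```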

Note:

Please see here (pytorch.bin version) and here (*.safetensors version) for more details.

Citation

If you find this checkpoint helpful, please consider citing the QWen2, Docmatix, ViDoRe, and DSE work.