This model is a sentence-transformer based on MPNet, an encoder-style language model introduced by Microsoft.
You can use this model to support downstream tasks like document clustering and semantic search.
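As a sketch of the semantic-search use case, the snippet below ranks documents against a query by cosine similarity. The 768-dimensional vectors here are random placeholders standing in for the model's actual output; in practice they would come from encoding real sentences with the model.

```python
import numpy as np

def cosine_scores(query, docs):
    # Cosine similarity between one query vector and a matrix of document vectors.
    query = query / np.linalg.norm(query)
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return docs @ query

# Placeholder 768-dim embeddings standing in for encoder output.
rng = np.random.default_rng(0)
doc_embeddings = rng.normal(size=(5, 768))
# Simulate a query that is semantically close to document 2.
query_embedding = doc_embeddings[2] + 0.01 * rng.normal(size=768)

scores = cosine_scores(query_embedding, doc_embeddings)
best = int(np.argmax(scores))  # index of the most similar document
```

The same scoring applies to clustering: pairwise cosine similarities between document embeddings feed directly into standard clustering algorithms.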
This model computes token-level embeddings with the encoder and then aggregates them with mean pooling to produce a single 768-dimensional document embedding.
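The mean-pooling step can be illustrated as follows: average the per-token vectors, using the attention mask to exclude padding positions. This is a minimal numpy sketch (with a toy 2-dimensional hidden size rather than 768), not the model's actual implementation.

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    # token_embeddings: (seq_len, hidden) per-token vectors from the encoder
    # attention_mask:   (seq_len,) 1 for real tokens, 0 for padding
    mask = attention_mask[:, None].astype(float)
    summed = (token_embeddings * mask).sum(axis=0)
    count = max(mask.sum(), 1e-9)  # avoid division by zero on all-padding input
    return summed / count

tokens = np.array([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]])  # toy hidden size of 2
mask = np.array([1, 1, 0])  # last position is padding and is ignored
pooled = mean_pool(tokens, mask)  # → [2.0, 3.0]
```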
The base model has been fine-tuned with a contrastive learning objective on a dataset of 1 billion sentence pairs: given one sentence from a pair, the model learns to identify its true partner among a set of other sentences.
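A common form of this objective treats the other pairs in a batch as negatives and applies a softmax cross-entropy over similarity scores. The sketch below shows that in-batch contrastive loss in numpy; the function name, batch size, and `scale` temperature are illustrative assumptions, not the exact training configuration.

```python
import numpy as np

def in_batch_contrastive_loss(anchors, positives, scale=20.0):
    # anchors, positives: (batch, dim) L2-normalized embeddings of paired sentences.
    # Each anchor's true pair sits at the same batch index; every other positive
    # in the batch serves as a negative (softmax cross-entropy over each row).
    sims = scale * anchors @ positives.T            # (batch, batch) similarity logits
    sims = sims - sims.max(axis=1, keepdims=True)   # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))             # NLL of the true pairs

rng = np.random.default_rng(1)
a = rng.normal(size=(4, 8))
a /= np.linalg.norm(a, axis=1, keepdims=True)
p = a + 0.1 * rng.normal(size=(4, 8))               # positives close to their anchors
p /= np.linalg.norm(p, axis=1, keepdims=True)
loss = in_batch_contrastive_loss(a, p)              # small when pairs align
```

Minimizing this loss pulls paired sentences together in embedding space while pushing unrelated sentences apart.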
This fine-tuning procedure makes the model particularly well suited to encoding the semantic content of short documents, such as sentences. Because the fine-tuning data had a maximum sequence length of 384 tokens, representational fidelity may degrade for inputs that exceed this threshold.
However, this issue can be mitigated with additional fine-tuning on relevant data.
Language models are widely known to encode social biases, and this model should not be assumed to be an exception.