center-for-curriculum-redesign/bge_1-5_embedding

BAAI_bge-large-en v1.5 text embedding model for passage retrieval and sentence similarity

Public
985 runs

Input

string

A serialized JSON array of strings for which to generate *retrieval* embeddings. Keep this list short to avoid Replicate response size limitations. Use this to embed short text queries intended for comparison against document text. One vector is returned for each string in the input array, in input order. The endpoint automatically formats your query strings for retrieval, so you do not need to preprocess them.

Default: "[]"
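As a minimal sketch of preparing this value, the snippet below serializes a short list of queries into a JSON array string. The helper name `serialize_queries` is hypothetical, and the name of the input field itself is not shown on this page, so only the string value is built here.

```python
import json


def serialize_queries(queries):
    """Serialize a short list of query strings as a JSON array string.

    Hypothetical helper: it only builds the string value described above
    and does not call the endpoint.
    """
    if not all(isinstance(q, str) for q in queries):
        raise TypeError("every query must be a string")
    # ensure_ascii=False keeps non-ASCII query text readable in the payload.
    return json.dumps(list(queries), ensure_ascii=False)


payload = serialize_queries([
    "what is photosynthesis?",
    "causes of the French Revolution",
])
```

The returned vectors are expected in the same order as the strings in the array, so keeping the original list around makes it easy to match each vector to its query.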

string

A serialized JSON object in which each value is a URL to a publicly accessible `.json` file containing an array of text chunks to embed, and each key is an identifier used in the webhook notification that tells you the embeddings for the file at that URL are ready (or have failed). If the contents of a file are embedded successfully, a new JSON file is generated for you to download. It contains an array of vectors, where the vector at each index is the embedding of the text chunk at the same index in your input file. Each vector is itself an array, so the returned file is an array of arrays of floating-point numbers.

Default: " { }"

boolean

Normalizes returned embedding vectors to a magnitude of 1. (Default: true, since this model presumes cosine-similarity comparisons downstream.)

Default: true
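To illustrate why unit-length vectors suit cosine-similarity comparisons, here is a small sketch in plain Python: once two vectors are normalized to magnitude 1, their dot product equals their cosine similarity. The function names are illustrative and are not part of the endpoint.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to magnitude 1 (a zero vector is returned unchanged)."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity computed directly from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]
# For normalized vectors, a plain dot product gives the cosine similarity.
similarity = dot(l2_normalize(a), l2_normalize(b))
```

With normalization left on, downstream code can rank documents with a dot product alone.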

number
(minimum: 0.5)

Maximum number of kibiTokens (1 kibiToken = 1024 tokens) to pack into a batch (to avoid out-of-memory errors while maximizing throughput). If the total number of tokens across the flattened list of requested embeddings exceeds this value, the list is split internally and run across multiple forward passes. This does not affect the shape of your output, only the time it takes to run.

Default: 200
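The splitting behaviour described above can be pictured with the following sketch. It is an assumption about how such a token budget might be applied (greedy, in input order, using per-string token counts), not the endpoint's actual implementation; `split_into_batches` is a hypothetical name.

```python
def split_into_batches(token_counts, max_kibitokens=200):
    """Greedily group item indices so each batch stays within the token budget.

    Assumption: the budget is applied to the sum of per-string token counts.
    A single string larger than the budget still gets a batch of its own.
    """
    budget = int(max_kibitokens * 1024)
    batches, current, used = [], [], 0
    for index, count in enumerate(token_counts):
        # Start a new batch when adding this item would exceed the budget.
        if current and used + count > budget:
            batches.append(current)
            current, used = [], 0
        current.append(index)
        used += count
    if current:
        batches.append(current)
    return batches


# Three 512-token strings with a 1 kibiToken budget: two fit, the third spills over.
batches = split_into_batches([512, 512, 512], max_kibitokens=1)
```

Because the batches preserve input order, concatenating their results gives output of the same shape as a single pass.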

string

Numerical precision for inference computations: either full or half. Defaults to the conservative value of full. You may want to test whether 'half' is sufficient for your needs; either way, you should probably use the same precision for querying as for archiving.

Default: "full"
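One way to test whether 'half' is sufficient is to embed the same queries and documents at both precisions and check how often the top-ranked document changes. The sketch below assumes you already have both sets of vectors as Python lists (fetching them from the endpoint is not shown); the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_match(query_vec, doc_vecs):
    """Index of the document vector most similar to the query."""
    return max(range(len(doc_vecs)), key=lambda i: cosine(query_vec, doc_vecs[i]))


def ranking_agreement(full_queries, full_docs, half_queries, half_docs):
    """Fraction of queries whose top match is identical at both precisions."""
    same = sum(
        top_match(fq, full_docs) == top_match(hq, half_docs)
        for fq, hq in zip(full_queries, half_queries)
    )
    return same / len(full_queries)


# Toy 2-dimensional vectors standing in for real 1024-dimensional embeddings.
full_docs = [[1.0, 0.0], [0.0, 1.0]]
half_docs = [[0.999, 0.001], [0.001, 0.999]]
full_queries = [[0.9, 0.1], [0.2, 0.8]]
half_queries = [[0.901, 0.099], [0.199, 0.801]]
agreement = ranking_agreement(full_queries, full_docs, half_queries, half_docs)
```

An agreement close to 1.0 on a representative sample would suggest that 'half' preserves rankings for that data; what threshold counts as acceptable is a judgment call for your application.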


Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

BAAI’s English embedding model, version 1.5, for passage retrieval and sentence similarity, using the Center For Curriculum Redesign’s Replicate Embedding Cog template.

As of September 2023, this model is state of the art.

  • output dimensions: 1024
  • parameters: 326M
  • MTEB score average: 64.23

| Dimension | Sequence Length | Average (56) | Retrieval (15) | Clustering (11) | Pair Classification (3) | Reranking (4) | STS (10) | Summarization (1) | Classification (12) |
|---|---|---|---|---|---|---|---|---|---|
| 1024 | 512 | 64.23 | 54.29 | 46.08 | 87.12 | 60.03 | 83.11 | 31.61 | 75.97 |

Usage


This endpoint is structured primarily for generating passage/document embeddings. You can give it a large `excerpt_files` JSON object of the following form as input:

{
"file1.json": "http://someplace.com/somefile1.json",
//..
"fileN.json": "https://someplace.com/somefileN.json"
}

The contents of each input file should be an array of strings, each of which tokenizes to no more than 512 BERT tokens.
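A minimal sketch of preparing these inputs follows. The helper names are hypothetical, the URLs are the placeholder ones from the example above, and the 512-token limit is not checked here because that would require the model's tokenizer.

```python
import json
from urllib.parse import urlparse


def serialize_chunks(chunks):
    """Serialize text chunks as the JSON array expected in each input file."""
    if not all(isinstance(c, str) for c in chunks):
        raise TypeError("every chunk must be a string")
    return json.dumps(list(chunks), ensure_ascii=False)


def build_excerpt_files(urls_by_key):
    """Serialize a {key: url} mapping, rejecting values that are not http(s) URLs."""
    for key, url in urls_by_key.items():
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"{key!r}: not an http(s) URL: {url!r}")
    return json.dumps(urls_by_key)


# Contents to upload to a publicly accessible URL of your own.
file_body = serialize_chunks(["First passage of text.", "Second passage of text."])

# The serialized object to pass as excerpt_files.
excerpt_files = build_excerpt_files({
    "file1.json": "http://someplace.com/somefile1.json",
})
```

Validating the mapping locally catches malformed URLs before a long indexing run is started.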

This endpoint will asynchronously download the files from the provided URLs and generate embeddings for the strings contained in each file. The embeddings will be stored as downloadable files named after the keys of your excerpt_files JSON. Whatever webhook you specify will be provided with URLs from which you may download the generated embeddings for each input file. Each generated embedding file will contain an array of 1024-dimensional vectors, where each vector is the embedding of the string at the same index in your input file.
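Once an embedding file has been downloaded, its vectors can be matched back to the original chunks by index. The sketch below assumes both files have already been fetched and parsed with `json.loads`; the webhook payload format is not documented here, so downloading is left out, and `pair_chunks_with_vectors` is a hypothetical helper.

```python
import json


def pair_chunks_with_vectors(chunks, vectors, dim=1024):
    """Pair each text chunk with the embedding at the same index.

    Raises ValueError if the counts differ or a vector has the wrong length.
    """
    if len(chunks) != len(vectors):
        raise ValueError(
            f"expected {len(chunks)} vectors, got {len(vectors)}"
        )
    for index, vector in enumerate(vectors):
        if len(vector) != dim:
            raise ValueError(
                f"vector {index} has {len(vector)} dimensions, expected {dim}"
            )
    return list(zip(chunks, vectors))


# Stand-ins for the parsed input file and the parsed embedding file.
chunks = json.loads('["First passage of text.", "Second passage of text."]')
vectors = [[0.0] * 1024, [1.0] * 1024]
pairs = pair_chunks_with_vectors(chunks, vectors)
```

Checking the count and dimensionality up front makes a truncated or mismatched download fail loudly instead of silently misaligning text and vectors.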

Make sure to test your webhook code with a few small batches before attempting a large indexing run. And make sure your webhook code downloads the files within an hour of being notified of their availability.