adirik / mamba-2.8b-slimpj

Base version of Mamba 2.8B SlimPajama, a 2.8 billion parameter state space language model

  • Public
  • 73 runs
  • L40S
  • GitHub
  • Paper
  • License

Input

prompt
*string

Text prompt to send to the model.

max_length
integer
(minimum: 1, maximum: 5000)

Maximum number of tokens to generate. A word is generally 2-3 tokens.

Default: 100

temperature
number
(minimum: 0.1, maximum: 5)

Adjusts randomness of outputs; greater than 1 is more random, 0 is deterministic, and 0.75 is a good starting value.

Default: 1

top_p
number
(minimum: 0.01, maximum: 1)

When decoding text, samples from the top p percentage of most likely tokens; lower it to ignore less likely tokens.

Default: 1

top_k
integer

When decoding text, samples from the top k most likely tokens; lower it to ignore less likely tokens.

Default: 1

repetition_penalty
number
(minimum: 0.01, maximum: 10)

Penalty for repeated words in generated text; 1 is no penalty, values greater than 1 discourage repetition, and values less than 1 encourage it.

Default: 1.2

seed
integer

The seed for the random number generator. Leave blank to use a random seed.
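
For intuition, here is a minimal sketch of how these decoding parameters typically interact at each generation step. It mirrors common logits-processing practice rather than this model's exact server-side code; sample_token and all values shown are illustrative.

import numpy as np

def sample_token(logits, generated_ids, temperature=0.75, top_k=50,
                 top_p=0.95, repetition_penalty=1.2, rng=None):
    # Illustrative helper: applies repetition penalty, temperature,
    # top-k, and top-p to one step's logits, then samples a token id.
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float).copy()
    for t in set(generated_ids):
        # Penalize tokens that were already generated (CTRL-style penalty).
        logits[t] = logits[t] / repetition_penalty if logits[t] > 0 else logits[t] * repetition_penalty
    logits = logits / temperature               # temperature scaling
    order = np.argsort(logits)[::-1][:top_k]    # keep the top-k candidates
    probs = np.exp(logits[order] - logits[order].max())
    probs = probs / probs.sum()                 # softmax over candidates
    keep = (np.cumsum(probs) - probs) < top_p   # nucleus (top-p) cutoff
    probs = probs[keep] / probs[keep].sum()
    return int(rng.choice(order[keep], p=probs))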

Output

I'm fine. I am very well, thank-you! How about yourself?" "Fine." (1) The man said that he was feeling good and asked how the woman felt herself to be at this moment in time; she replied by saying: 'Well' or 'Very Well'. This is a polite way of asking someone if they feel OK – it's not rude but just an expression used when we want people around us know what our feelings towards them may have been like recently as

Run time and cost

This model costs approximately $0.14 to run on Replicate, or 7 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 144 seconds. The predict time for this model varies significantly based on the inputs.

Readme

Mamba

Mamba is a large language model built on a state space model (SSM) architecture, which shows promising performance on information-dense data such as language modeling. See the original repo and paper for details.
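
For intuition only: at its core, a (non-selective) state space layer applies a linear recurrence h_t = A_bar·h_{t-1} + B_bar·x_t with readout y_t = C·h_t over the sequence. The naive per-channel loop below illustrates that recurrence; it is not Mamba's actual implementation, which makes the dynamics input-dependent ("selective") and computes them with a hardware-aware parallel scan.

import numpy as np

def ssm_scan(x, A_bar, B_bar, C):
    # Naive linear state space recurrence over one input channel:
    #   h_t = A_bar @ h_{t-1} + B_bar * x_t,   y_t = C @ h_t
    # Illustrative shapes: x (L,), A_bar (N, N), B_bar (N,), C (N,).
    h = np.zeros(A_bar.shape[0])
    ys = []
    for x_t in x:
        h = A_bar @ h + B_bar * x_t   # state update
        ys.append(C @ h)              # readout
    return np.array(ys)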

Basic Usage

The API input arguments are as follows (a minimal call sketch follows the list):

  • prompt: The text prompt for Mamba.
  • max_length: Maximum number of tokens to generate. A word is generally 2-3 tokens.
  • temperature: Adjusts randomness of outputs; greater than 1 is more random, 0 is deterministic, and 0.75 is a good starting value.
  • top_p: Samples from the top p percentage of most likely tokens during text decoding; lower it to ignore less likely tokens.
  • top_k: Samples from the top k most likely tokens during text decoding; lower it to ignore less likely tokens.
  • repetition_penalty: Penalty for repeated words in generated text; 1 is no penalty, values greater than 1 discourage repetition, and values less than 1 encourage it.
  • seed: The seed parameter for deterministic text generation. A specific seed can be used to reproduce results or left blank for random generation.
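
A minimal call sketch using the Replicate Python client (assumes pip install replicate and a REPLICATE_API_TOKEN in the environment; pinning a specific version hash, adirik/mamba-2.8b-slimpj:<version>, may be required):

import replicate

output = replicate.run(
    "adirik/mamba-2.8b-slimpj",  # pin a version hash for reproducibility
    input={
        "prompt": "How are you?",
        "max_length": 100,
        "temperature": 0.75,
        "top_p": 1,
        "top_k": 1,
        "repetition_penalty": 1.2,
        "seed": 42,  # fix the seed to reproduce an output
    },
)
print(output)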

References

@article{mamba,
  title={Mamba: Linear-Time Sequence Modeling with Selective State Spaces},
  author={Gu, Albert and Dao, Tri},
  journal={arXiv preprint arXiv:2312.00752},
  year={2023}
}