adirik / mamba-370m

Base version of Mamba 370M, a 370-million-parameter state space language model

Run time and cost

This model runs on Nvidia A40 GPU hardware.

Readme

Mamba

Mamba is a language model built on a state space model (SSM) architecture rather than attention. It shows promising performance on information-dense data such as natural language. See the original repo and paper for details.
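To give a rough feel for what "state space model" means, here is a minimal sketch of a scalar, time-invariant linear state space recurrence. It is a textbook simplification, not Mamba's implementation: Mamba's selective variant makes the parameters depend on the input and relies on a hardware-aware scan, as described in the paper. The function name `ssm_scan` and the parameters `a`, `b`, `c` are illustrative only.

```python
from typing import List


def ssm_scan(xs: List[float], a: float, b: float, c: float) -> List[float]:
    """Run a scalar linear state space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)
    """
    h = 0.0  # hidden state, starts at zero
    ys = []
    for x in xs:
        h = a * h + b * x  # fold the new input into the running state
        ys.append(c * h)   # emit an output from the current state
    return ys


# An impulse followed by zeros decays geometrically through the state.
print(ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0))  # [2.0, 1.0, 0.5]
```

The state `h` has a fixed size however long the sequence is, so each step costs the same amount of work. That is the intuition behind the linear-time scaling in the paper's title.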

Basic Usage

The API input arguments are as follows:

  • prompt: The text prompt for Mamba.
  • max_length: Maximum number of tokens to generate. As a rough guide, an English word is typically one to two tokens.
  • temperature: Adjusts the randomness of outputs. Values above 1 make the output more random, 0 is deterministic, and 0.75 is a good starting value.
  • top_p: Samples from the smallest set of most likely tokens whose cumulative probability reaches p during text decoding; lower it to ignore less likely tokens.
  • top_k: Samples from the k most likely tokens during text decoding; lower it to ignore less likely tokens.
  • repetition_penalty: Penalty for repeated words in generated text. 1 applies no penalty, values greater than 1 discourage repetition, and values less than 1 encourage it.
  • seed: Seed for reproducible text generation. Set a specific seed to reproduce results, or leave it blank for random generation.
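The arguments above could be assembled and sent with the Replicate Python client roughly as follows. This is a sketch under assumptions, not the model's documented interface: the helper names `build_input` and `run_mamba` are hypothetical, the default values and sanity checks are illustrative rather than the model's own, and the model reference (including its version identifier) must be copied from the model page. The call requires the third-party `replicate` package and a `REPLICATE_API_TOKEN` environment variable.

```python
from typing import Any, Dict, Optional


def build_input(
    prompt: str,
    max_length: int = 100,            # illustrative default, not the model's
    temperature: float = 0.75,        # starting value suggested in the list above
    top_p: float = 1.0,               # illustrative default
    top_k: int = 1,                   # illustrative default
    repetition_penalty: float = 1.0,  # 1 means no penalty
    seed: Optional[int] = None,       # None leaves the seed blank (random)
) -> Dict[str, Any]:
    """Collect the input arguments into one dict, with assumed sanity checks."""
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if repetition_penalty <= 0:
        raise ValueError("repetition_penalty must be positive")

    payload: Dict[str, Any] = {
        "prompt": prompt,
        "max_length": max_length,
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
    }
    if seed is not None:
        payload["seed"] = seed  # only sent when reproducibility is wanted
    return payload


def run_mamba(model_ref: str, payload: Dict[str, Any]) -> str:
    """Send the payload to Replicate and return the generated text.

    model_ref is the "owner/name:version" string from the model page.
    """
    import replicate  # third-party client: pip install replicate

    output = replicate.run(model_ref, input=payload)
    # Text models may return a single string or an iterator of chunks.
    if isinstance(output, str):
        return output
    return "".join(str(chunk) for chunk in output)
```

Passing the same `seed` with otherwise identical arguments should reproduce a generation; omitting it leaves the choice of seed to the service.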

References

@article{mamba,
  title={Mamba: Linear-Time Sequence Modeling with Selective State Spaces},
  author={Gu, Albert and Dao, Tri},
  journal={arXiv preprint arXiv:2312.00752},
  year={2023}
}