replicate / mpt-7b-storywriter

A 7B parameter LLM fine-tuned to support contexts with more than 65K tokens

Run time and cost

This model runs on Nvidia A100 (40GB) GPU hardware. Predictions typically complete within 5 seconds, but prediction time varies significantly with the inputs.
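
To run the model programmatically rather than through the web form, the Replicate Python client can be used. The sketch below is illustrative only: the input field names (prompt, max_length, temperature) are assumptions and should be checked against this model's actual API schema.

```python
# Minimal sketch of calling this model with the Replicate Python client.
# Assumes REPLICATE_API_TOKEN is set in the environment; the input field
# names below (prompt, max_length, temperature) are illustrative and may
# differ from the schema shown on this model's API page.
import replicate

output = replicate.run(
    "replicate/mpt-7b-storywriter",
    input={
        "prompt": "Once upon a time, in a city built on stilts above a frozen sea,",
        "max_length": 500,
        "temperature": 0.8,
    },
)

# Language models on Replicate typically return an iterator of text chunks.
print("".join(output))
```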

Readme

MPT-7B-StoryWriter-65k+

This documentation was adapted from the MPT-7B-StoryWriter-65k+ Hugging Face Hub README.

MPT-7B-StoryWriter-65k+ is a language model designed to read and write fictional stories with very long context lengths. The model was created by finetuning MPT-7B with a context length of 65k tokens on a filtered fiction subset of the books3 dataset. Thanks to ALiBi, the model can extrapolate beyond 65k tokens at inference time, allowing for longer story generations. The MosaicML team demonstrated generations as long as 84k tokens on a single node of 8 A100-80GB GPUs in MosaicML’s blog post.
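
Outside of Replicate, one plausible way to exploit ALiBi's extrapolation is to raise the configured maximum sequence length when loading the Hugging Face checkpoint. The sketch below assumes the mosaicml/mpt-7b-storywriter checkpoint, its max_seq_len config field, and the EleutherAI/gpt-neox-20b tokenizer; verify these against the Hugging Face model card before relying on them.

```python
# Illustrative sketch: load the checkpoint with a context window extended
# beyond the 65k tokens it was finetuned on, relying on ALiBi to extrapolate.
# The checkpoint name and the `max_seq_len` config field are assumptions
# taken from the MosaicML model card and may change.
import transformers

name = "mosaicml/mpt-7b-storywriter"

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 83968  # ~84k tokens, as in the generation demo described above

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True,
)
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```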

License: Apache 2.0

This model was trained by MosaicML and is based on a modified decoder-only transformer architecture.

Model Date

The model was released on May 5, 2023.

Model License

This model is licensed under Apache 2.0.

Model Description

MPT-7B-StoryWriter-65k+ is a modification of a standard decoder-only transformer that was specifically designed for generating fictional stories. The MosaicML team made several modifications to the standard transformer, including the use of FlashAttention and ALiBi (Attention with Linear Biases) and the removal of positional embeddings and biases.
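
To make the ALiBi piece concrete, the sketch below computes the per-head linear biases that replace positional embeddings by being added directly to the attention scores. The geometric slope schedule follows the ALiBi paper; this is an illustration only, not the model's actual attention code.

```python
# Illustrative sketch of ALiBi (Attention with Linear Biases): each head adds
# a fixed bias proportional to the query/key distance instead of using
# positional embeddings. MPT's real implementation lives in its custom model code.
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Head-specific slopes: 2^(-8/n_heads), 2^(-16/n_heads), ...
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    # Entry (i, j) holds j - i, i.e. minus the distance from query i back to key j.
    positions = torch.arange(seq_len)
    distances = (positions[None, :] - positions[:, None]).clamp(max=0)
    # Shape (n_heads, seq_len, seq_len); added to the raw attention scores,
    # so more distant keys receive a larger negative bias.
    return slopes[:, None, None] * distances[None, :, :].float()

bias = alibi_bias(n_heads=32, seq_len=8)
print(bias.shape)  # torch.Size([32, 8, 8])
```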

The model’s hyperparameters are as follows:

Hyperparameter     Value
n_parameters       6.7B
n_layers           32
n_heads            32
d_model            4096
vocab size         50432
sequence length    65536
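
As a rough sanity check on the 6.7B figure, the parameter count can be estimated from this table with the usual decoder-only accounting: roughly 12·d_model² weights per layer for the attention and MLP blocks, plus the token embedding. The sketch below is back-of-the-envelope only and ignores layer norms and other small tensors.

```python
# Back-of-the-envelope parameter estimate from the hyperparameter table.
# Per layer: ~4*d^2 for the attention projections (Q, K, V, output) plus
# ~8*d^2 for a 4x-wide MLP; biases are omitted (MPT removes them anyway).
n_layers, d_model, vocab_size = 32, 4096, 50432

per_layer = 12 * d_model ** 2            # attention + MLP weights
embedding = vocab_size * d_model         # token embedding (tied output head)
total = n_layers * per_layer + embedding

print(f"{total / 1e9:.2f}B parameters")  # ~6.65B, consistent with the 6.7B above
```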

PreTraining Data

For more information on the pretraining process, please refer to the MPT-7B documentation.

Limitations and Biases

MPT-7B-StoryWriter-65k+ can produce factually incorrect output and should not be relied upon to provide factually accurate information. The model was trained on various public datasets, and although significant effort went into cleaning the pretraining data, it may still generate biased or offensive content.

Acknowledgements

Alex Trott and the MosaicML NLP team finetuned this model.

Disclaimer

This model’s license does not constitute legal advice, and we are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.