This documentation was adapted from the MPT-7B-StoryWriter-65k+ Hugging Face Hub README.
MPT-7B-StoryWriter-65k+ is a language model that specializes in generating fictional stories with very long contexts. The model was created by finetuning MPT-7B, with a context length of 65k tokens, on a filtered fiction subset of the books3 dataset. Thanks to ALiBi, the model can extrapolate beyond 65k tokens at inference time, allowing for even longer story generations. In MosaicML's blog post, the team demonstrated generations as long as 84k tokens on a single node of 8 A100-80GB GPUs.
License: Apache 2.0
This model was trained by MosaicML and is based on a modified decoder-only transformer architecture.
The model was released on May 5, 2023.
MPT-7B-StoryWriter-65k+ modifies a standard decoder-only transformer for the specific purpose of generating fictional stories. The MosaicML team made several changes to the standard architecture, including the use of FlashAttention and ALiBi (Attention with Linear Biases) and the removal of positional embeddings and biases.
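ALiBi replaces learned positional embeddings with a fixed, per-head linear penalty added to attention scores, which is what allows extrapolation past the 65k training length. A minimal sketch of the bias computation, following the standard ALiBi head-slope formula (the `n_heads=8` below is illustrative only, not the model's actual head count):

```python
def alibi_slopes(n_heads):
    """Geometric sequence of per-head slopes: 2^(-8/n * 1), ..., 2^(-8/n * n)."""
    return [2 ** (-(8.0 / n_heads) * (i + 1)) for i in range(n_heads)]

def alibi_bias(seq_len, slope):
    """Causal bias rows: position i attends to j <= i with penalty -slope * (i - j)."""
    return [[-slope * (i - j) for j in range(i + 1)] for i in range(seq_len)]

slopes = alibi_slopes(8)           # [0.5, 0.25, ..., 2**-8]
bias = alibi_bias(4, slopes[0])    # bias rows for the first head
# Because the penalty is a fixed linear function of token distance, the same
# formula applies at any sequence length -- no retrained position table is
# needed, which is why inference can run beyond the 65k training context.
```

Note that the bias is never trained; it is recomputed from the distance alone, so longer sequences simply extend the same linear ramp.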
The model's hyperparameters are as follows:

- n_parameters: 6.7B
- n_layers: 32
- n_heads: 32
- d_model: 4096
- vocabulary size: 50432
- sequence length: 65536
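The quoted 6.7B parameter count can be sanity-checked from MPT-7B's published dimensions (d_model 4096, 32 layers, vocabulary 50432) with standard transformer accounting. A rough sketch, assuming a 4x MLP expansion and counting only the Q/K/V/output projections and MLP weights (norms and the tied output head are ignored):

```python
d_model, n_layers, vocab = 4096, 32, 50432

embedding = vocab * d_model                   # token embedding table
attn_per_layer = 4 * d_model * d_model        # Q, K, V, and output projections
mlp_per_layer = 2 * d_model * (4 * d_model)   # up- and down-projections, 4x expansion
total = embedding + n_layers * (attn_per_layer + mlp_per_layer)

print(f"{total / 1e9:.2f}B parameters")       # lands close to the quoted 6.7B
```

The small gap from exactly 6.7B comes from the terms this estimate ignores; the point is only that the listed dimensions are mutually consistent.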
For more information on the pretraining process, please refer to the MPT-7B documentation.
Limitations and Biases
MPT-7B-StoryWriter-65k+ generates fictional stories that may contain inaccuracies, and it should not be relied upon to produce factually accurate information. The model was trained on various public datasets, and although significant efforts were made to clean the pretraining data, it may still generate biased or offensive content.
Alex Trott and the MosaicML NLP team finetuned this model.
This model's license does not constitute legal advice, and we are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.