replicate / flan-t5-xl

A language model by Google for tasks like classification, summarization, and more

  • Public
  • 144.1K runs

Run time and cost

This model costs approximately $0.14 to run on Replicate, or 7 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 103 seconds. The predict time for this model varies significantly based on the inputs.
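As a rough budgeting aid, the figures above can be turned into a small calculation. This is a sketch only: it treats the approximate $0.14 per run and the typical 103-second run time as fixed, which the page itself says they are not, and the per-second rate it derives is an implied estimate, not a published price. The function names are hypothetical.

```python
import math

COST_PER_RUN_USD = 0.14     # approximate per-run cost quoted above
TYPICAL_RUN_SECONDS = 103   # typical prediction time quoted above


def runs_per_budget(budget_usd, cost_per_run=COST_PER_RUN_USD):
    """Whole number of runs that fit in a budget at a flat per-run cost."""
    return math.floor(budget_usd / cost_per_run)


def implied_cost_per_second(cost_per_run=COST_PER_RUN_USD,
                            seconds=TYPICAL_RUN_SECONDS):
    """Per-second rate implied by the two quoted figures (an estimate)."""
    return cost_per_run / seconds


if __name__ == "__main__":
    print(runs_per_budget(1.0))                   # matches "7 runs per $1"
    print(round(implied_cost_per_second(), 5))    # implied USD per second
```

Because predict time varies significantly with the inputs, actual costs can land well above or below these numbers.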

Readme

Model description

Flan-T5 is a family of large language models trained at Google and fine-tuned on a collection of datasets phrased as instructions. It has strong zero-shot, few-shot, and chain-of-thought abilities, which make it useful for a wide array of natural language tasks. This model is Flan-T5 XL, the 3B-parameter version of Flan-T5. To learn more about Flan-T5, read the Flan paper, "Scaling Instruction-Finetuned Language Models" (see Citation).

Fine-tuning

If you have access to the training beta, you can fine-tune this model.

Here’s an example using replicate-python:

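import replicate  # reads the API token from the REPLICATE_API_TOKEN environment variable

training = replicate.trainings.create(
    # Base model version to fine-tune
    version="replicate/flan-t5-xl:7a216605843d87f5426a10d2cc6940485a232336ed04d655ef86b91e020e9210",
    input={
        # URL of the prompt/completion training file
        "train_data": "https://storage.googleapis.com/dan-scratch-public/fine-tuning/70k_samples.jsonl",
    },
    # Model that will receive the fine-tuned weights
    destination="my-username/my-model",
)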
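The train_data URL must point to a file of prompt/completion records. A minimal sketch of producing such a file in JSONL form follows. The helper name to_jsonl and the sample records are hypothetical, chosen only for illustration; hosting the resulting file at a URL is left out.

```python
import json

REQUIRED_KEYS = {"prompt", "completion"}


def to_jsonl(records):
    """Serialize prompt/completion dicts as JSONL, one JSON object per line."""
    lines = []
    for index, record in enumerate(records):
        # Reject records that do not have exactly the two expected keys.
        if set(record) != REQUIRED_KEYS:
            raise ValueError(f"record {index} must have exactly the keys {sorted(REQUIRED_KEYS)}")
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


# Hypothetical sample records, for illustration only.
samples = [
    {"prompt": "I love this spaghetti:", "completion": "Positive"},
    {"prompt": "The dinner was horrible:", "completion": "Negative"},
]

if __name__ == "__main__":
    with open("train.jsonl", "w", encoding="utf-8") as handle:
        handle.write(to_jsonl(samples))
```

Validating the keys before upload is a design choice for this sketch; it catches malformed rows locally rather than after a training job has started.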

Training takes these input parameters:

  • train_data (required): URL to a file where each row is a JSON record in the format {"prompt": ..., "completion": ...}. The file can be JSONL (one record per line) or a single JSON list of records.
  • train_batch_size (optional, default=4): Train batch size. Larger batches train faster, but a batch that is too large may exhaust GPU memory. While the default is 4, we recommend training this model with a batch size of 32.
  • gradient_accumulation_steps (optional, default=8): Number of training steps (each of train_batch_size) over which gradients are accumulated before the model weights are updated.
  • learning_rate (optional, default=2e-5): Learning rate used by the optimizer.
  • num_train_epochs (optional, default=1): Number of epochs (iterations over the entire training dataset) to train for.
  • warmup_ratio (optional, default=0.03): Fraction of all training steps used for a linear learning-rate warmup.
  • logging_steps (optional, default=1): Prints loss and other logging info every logging_steps steps.
  • max_steps (optional, default=-1): Maximum number of training steps. Unlimited if max_steps=-1.
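To see how these parameters interact, the sketch below estimates the effective batch size and the number of weight updates for a run. It rests on several assumptions that this page does not confirm: a single GPU, gradients accumulated across steps before each weight update, max_steps counting weight updates, and warmup length rounded up. The function name training_schedule and the 70,000-sample dataset size (inferred from the example file name) are also assumptions.

```python
import math


def training_schedule(num_samples, train_batch_size=4,
                      gradient_accumulation_steps=8, num_train_epochs=1,
                      warmup_ratio=0.03, max_steps=-1):
    """Estimate batch and step counts from the training inputs (a sketch)."""
    # Samples consumed per weight update (single-GPU assumption).
    effective_batch_size = train_batch_size * gradient_accumulation_steps
    updates_per_epoch = math.ceil(num_samples / effective_batch_size)
    total_updates = updates_per_epoch * num_train_epochs
    # Assumption: a positive max_steps caps the number of weight updates.
    if max_steps > 0:
        total_updates = min(total_updates, max_steps)
    warmup_updates = math.ceil(warmup_ratio * total_updates)
    return {
        "effective_batch_size": effective_batch_size,
        "total_updates": total_updates,
        "warmup_updates": warmup_updates,
    }


if __name__ == "__main__":
    print(training_schedule(70_000))                       # default settings
    print(training_schedule(70_000, train_batch_size=32))  # recommended batch size
```

Under these assumptions, raising train_batch_size from 4 to 32 while leaving gradient_accumulation_steps at 8 raises the effective batch size from 32 to 256, so it may be worth lowering the accumulation steps if a smaller effective batch is intended.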

Usage

Flan-T5 can perform many natural language tasks, including question answering, classification, summarization, and translation. The examples below show a prompt and the model's output for two of these tasks.

Question Answering:

Prompt: Answer the following yes/no question by reasoning step by step. Can a dog drive a car?

Output: Dogs do not have a drivers license nor can they operate a car. Therefore, the final answer is no.
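When a step-by-step prompt like this is used programmatically, the verdict usually has to be separated from the reasoning. The sketch below assumes the output ends with the phrase "the final answer is ..." as in the example above; the model is not guaranteed to follow that pattern, so the hypothetical helper extract_final_answer returns None when the phrase is missing.

```python
import re

# Matches the closing phrase seen in the example output above.
_FINAL_ANSWER = re.compile(r"final answer is\s+(.+?)\.?\s*$", re.IGNORECASE)


def extract_final_answer(output):
    """Return the lowercased final answer, or None if the phrase is absent."""
    match = _FINAL_ANSWER.search(output.strip())
    if match is None:
        return None
    return match.group(1).strip().lower()


if __name__ == "__main__":
    example = ("Dogs do not have a drivers license nor can they operate a car. "
               "Therefore, the final answer is no.")
    print(extract_final_answer(example))
```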

Sentiment Analysis/Classification:

Prompt:
Here are some phrases and sentiments. 

I love this spaghetti: Positive 
The air outside is great: Positive 
You hated the cat in the hat: Negative 
They are in a foul mood: Negative 
The dinner was horrible: 

Output: Negative
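The few-shot prompt above can be assembled from labeled examples and sent to the model with replicate-python. This is a sketch under assumptions: the input field is taken to be named "prompt", and the output is taken to be a string or an iterable of string chunks; check the model's API schema before relying on either. The helper names are hypothetical, and the call needs the REPLICATE_API_TOKEN environment variable.

```python
MODEL_VERSION = (
    "replicate/flan-t5-xl:"
    "7a216605843d87f5426a10d2cc6940485a232336ed04d655ef86b91e020e9210"
)

EXAMPLES = [
    ("I love this spaghetti", "Positive"),
    ("The air outside is great", "Positive"),
    ("You hated the cat in the hat", "Negative"),
    ("They are in a foul mood", "Negative"),
]


def build_few_shot_prompt(examples, query,
                          header="Here are some phrases and sentiments."):
    """Format labeled examples plus an unlabeled query as one prompt."""
    lines = [header, ""]
    lines += [f"{text}: {label}" for text, label in examples]
    lines.append(f"{query}:")  # left unlabeled for the model to complete
    return "\n".join(lines)


def classify(query, examples=EXAMPLES, model_version=MODEL_VERSION):
    """Run the prompt on Replicate and return the stripped output text."""
    import replicate  # imported here so the prompt builder works without it

    # Assumption: the model's input field is named "prompt".
    output = replicate.run(
        model_version,
        input={"prompt": build_few_shot_prompt(examples, query)},
    )
    # Works for a plain string or an iterable of string chunks.
    return "".join(output).strip()


if __name__ == "__main__":
    print(build_few_shot_prompt(EXAMPLES, "The dinner was horrible"))
```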

Caveats and Limitations

The information below is copied from the model’s official model card:

Language models, including Flan-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). Flan-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application.

Flan-T5 is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data.

Citation

@misc{https://doi.org/10.48550/arxiv.2210.11416,
  doi = {10.48550/ARXIV.2210.11416},
  url = {https://arxiv.org/abs/2210.11416},
  author = {Chung, Hyung Won and Hou, Le and Longpre, Shayne and Zoph, Barret and Tay, Yi and Fedus, William and Li, Yunxuan and Wang, Xuezhi and Dehghani, Mostafa and Brahma, Siddhartha and Webson, Albert and Gu, Shixiang Shane and Dai, Zhuyun and Suzgun, Mirac and Chen, Xinyun and Chowdhery, Aakanksha and Castro-Ros, Alex and Pellat, Marie and Robinson, Kevin and Valter, Dasha and Narang, Sharan and Mishra, Gaurav and Yu, Adams and Zhao, Vincent and Huang, Yanping and Dai, Andrew and Yu, Hongkun and Petrov, Slav and Chi, Ed H. and Dean, Jeff and Devlin, Jacob and Roberts, Adam and Zhou, Denny and Le, Quoc V. and Wei, Jason},
  keywords = {Machine Learning (cs.LG), Computation and Language (cs.CL), FOS: Computer and information sciences},
  title = {Scaling Instruction-Finetuned Language Models},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}