Official

ibm-granite / granite-3.3-8b-instruct

Granite-3.3-8B-Instruct is an 8-billion-parameter language model with a 128K-token context length, fine-tuned for improved reasoning and instruction-following capabilities.

  • Public
  • 20.9K runs
  • Priced per token
  • GitHub
  • Weights
  • License
Run with an API
  • Prediction

    ibm-granite/granite-3.3-8b-instruct:a325a0cacfb0aa9226e6bad1abe5385f1073f4c7f8c36e52ed040e5409e6c034
    ID: t96wgpwzn9rm80cp79q8cxprm8
    Status: Succeeded
    Source: Web
    Hardware: H100
    Created by: @ibm-granite

    Input

    top_k: 50
    top_p: 0.9
    prompt: How is perplexity measured for LLMs and why is it useful?
    max_tokens: 512
    min_tokens: 0
    temperature: 0.6
    presence_penalty: 0
    frequency_penalty: 0

    Output

    Perplexity is a common metric used to evaluate the performance of language models, including large language models (LLMs). It measures how well a language model predicts a sample: a lower perplexity score indicates better performance, as it suggests the model is more confident in its predictions. Perplexity is calculated by raising 2 to the average negative log-likelihood (base 2) of the test set. In simpler terms, it quantifies the uncertainty of the model in predicting the next word in a sequence: a perplexity of N means that, on average, the model is as uncertain as if it were choosing uniformly among N words, assigning a probability of 1/N to the correct word.

    Here's a step-by-step breakdown of the calculation:

    1. For each word in the test set, the language model computes a probability distribution over the vocabulary, predicting the likelihood of the next word.
    2. The negative logarithm (base 2) of the probability assigned to the correct word is taken at each position.
    3. These negative log-likelihoods are averaged across all words in the test set.
    4. 2 raised to the power of this average gives the perplexity score.

    Perplexity is useful for several reasons:

    1. **Model comparison**: It allows for comparing different language models or model configurations. A model with lower perplexity on a given test set is generally considered to perform better.
    2. **Training optimization**: Perplexity is directly related to the cross-entropy loss minimized during training, so tracking it shows whether the model is learning to make more accurate predictions.
    3. **Transfer learning and fine-tuning**: Perplexity can help assess the impact of transfer learning or fine-tuning on pre-trained language models; a decrease in perplexity after applying these techniques suggests improved performance.
    4. **Understanding model behavior**: Perplexity offers insight into how well the model captures the underlying patterns and structures of the language.

    However, it's essential to consider the limitations of perplexity. It doesn't always correlate with human-evaluated quality, especially for specific tasks or domains, and because it depends on the tokenizer and vocabulary, scores are not directly comparable across models that tokenize text differently. In summary, perplexity is a valuable metric for evaluating and comparing language models, as long as its limitations are kept in mind.
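The step-by-step breakdown above translates directly into a few lines of code. Here is a minimal sketch of the calculation, assuming you already have the base-2 log probability the model assigned to each correct token in the test set; the function and variable names are illustrative, not part of any particular library:

```python
import math

def perplexity(token_log2_probs):
    """Perplexity from per-token base-2 log probabilities.

    Mirrors the steps above: average the negative log2 probabilities
    assigned to the correct tokens, then raise 2 to that average.
    """
    avg_neg_log2 = -sum(token_log2_probs) / len(token_log2_probs)
    return 2 ** avg_neg_log2

# Example: a model that assigns probability 0.25 to every correct token
# is as uncertain as a uniform choice among 4 words -> perplexity 4.
log_probs = [math.log2(0.25)] * 10
print(perplexity(log_probs))  # 4.0
```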

Want to make some of these yourself?

Run this model
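
To reproduce a prediction like the one above from your own code, the same input values can be passed through the Replicate Python client. This is a minimal sketch, assuming the `replicate` package is installed and a `REPLICATE_API_TOKEN` environment variable is set; the exact shape of the returned output (a single string versus a stream of text chunks) can vary by model, so joining the chunks is shown as one common pattern.

```python
import replicate

# Same input values as the prediction shown above.
output = replicate.run(
    "ibm-granite/granite-3.3-8b-instruct",
    input={
        "prompt": "How is perplexity measured for LLMs and why is it useful?",
        "top_k": 50,
        "top_p": 0.9,
        "max_tokens": 512,
        "min_tokens": 0,
        "temperature": 0.6,
        "presence_penalty": 0,
        "frequency_penalty": 0,
    },
)

# Language models on Replicate typically stream output as an iterator
# of text chunks; joining them yields the full completion.
print("".join(output))
```

Adjusting temperature, top_k, and top_p trades off determinism against diversity in the generated text; the values above match the sample prediction.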