joehoover / zephyr-7b-alpha

A high-performing language model trained to act as a helpful assistant

  • Public
  • 8K runs
  • L40S
  • GitHub
  • Paper
  • License
Iterate in playground
  • Prediction

    joehoover/zephyr-7b-alpha:14ec63365a1141134c41b652fe798633f48b1fd28b356725c4d8842a0ac151ee
    ID
    vcxozelbrhlz7bzmukjromel7a
    Status
    Succeeded
    Source
    Web
    Hardware
    A40 (Large)
    Total duration
    Created

    Input

    debug
    top_k
    50
    top_p
    0.9
    prompt
    Can you help me answer a question?</s> <|assistant|> I'd be happy to help you answer any question you have. Please provide me with the question you'd like assistance with, and I'll do my best to provide you with an answer.</s> <|user|> Please help understand this riddle: "I’m tall when I’m young, and I’m short when I’m old. What am I?"
    temperature
    0.75
    return_logits
    system_prompt
    You are a helpful assistant.
    max_new_tokens
    128
    min_new_tokens
    -1

    Output

    Certainly! The answer to this riddle is a "candle". When a candle is young, it's tall, but as it burns and gets shorter, it becomes shorter and shorter until it eventually extinguishes, at which point it's no longer "tall" or "short".
    Generated in
  • Prediction

    joehoover/zephyr-7b-alpha:14ec63365a1141134c41b652fe798633f48b1fd28b356725c4d8842a0ac151ee
    ID
    d4vhk3tbsywlhylzl53kbnvjvu
    Status
    Succeeded
    Source
    Web
    Hardware
    A40 (Large)
    Total duration
    Created

    Input

    debug
    top_k
    50
    top_p
    0.9
    prompt
    Could you briefly explain how self attention works?
    temperature
    0.75
    return_logits
    system_prompt
    You are a machine learning professor who is famous for simple, intuitive explanations.
    max_new_tokens
    500
    min_new_tokens
    -1

    Output

    Sure! Self-attention is a mechanism in neural networks that allows the model to focus on specific parts of the input data that are relevant for the task at hand. Here's a brief overview: 1. The input data is split into smaller pieces, called "tokens" or "words". 2. Each token is represented as a vector, which can be thought of as a "feature vector" that encodes information about that token. 3. The model computes a "query" vector and a "key" vector for each token. 4. The model then calculates a "score" for each pair of query and key vectors. 5. The model then "weights" the scores based on how closely the query matches the key. 6. The model then calculates a "context" vector for each token by summing the weighted scores from all other tokens. 7. The context vector for each token provides information about how important that token is for the overall task, and can be used to generate an output or to feed into another layer of the model. Self-attention is a powerful tool for tasks that involve understanding complex relationships between different parts of the input data, such as machine translation or document classification.
    Generated in

Want to make some of these yourself?

Run this model