joehoover / zephyr-7b-alpha

debug
top_k: 50
top_p: 0.9
prompt: Can you help me answer a question?</s> <|assistant|> I'd be happy to help you answer any question you have. Please provide me with the question you'd like assistance with, and I'll do my best to provide you with an answer.</s> <|user|> Please help understand this riddle: "I’m tall when I’m young, and I’m short when I’m old. What am I?"
temperature: 0.75
return_logits
system_prompt: You are a helpful assistant.
max_new_tokens: 128
min_new_tokens: -1

{
  "debug": false,
  "top_k": 50,
  "top_p": 0.9,
  "prompt": "Can you help me answer a question?</s>\n<|assistant|>\nI'd be happy to help you answer any question you have. Please provide me with the question you'd like assistance with, and I'll do my best to provide you with an answer.</s>\n<|user|>\nPlease help understand this riddle: \"I’m tall when I’m young, and I’m short when I’m old. What am I?\"",
  "temperature": 0.75,
  "return_logits": false,
  "system_prompt": "You are a helpful assistant.",
  "max_new_tokens": 128,
  "min_new_tokens": -1
}

Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run joehoover/zephyr-7b-alpha using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "joehoover/zephyr-7b-alpha:14ec63365a1141134c41b652fe798633f48b1fd28b356725c4d8842a0ac151ee",
  {
    input: {
      debug: false,
      top_k: 50,
      top_p: 0.9,
      prompt: "Can you help me answer a question?</s>\n<|assistant|>\nI'd be happy to help you answer any question you have. Please provide me with the question you'd like assistance with, and I'll do my best to provide you with an answer.</s>\n<|user|>\nPlease help understand this riddle: \"I’m tall when I’m young, and I’m short when I’m old. What am I?\"",
      temperature: 0.75,
      return_logits: false,
      system_prompt: "You are a helpful assistant.",
      max_new_tokens: 128,
      min_new_tokens: -1
    }
  }
);

console.log(output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate

Run joehoover/zephyr-7b-alpha using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "joehoover/zephyr-7b-alpha:14ec63365a1141134c41b652fe798633f48b1fd28b356725c4d8842a0ac151ee",
    input={
        "debug": False,
        "top_k": 50,
        "top_p": 0.9,
        "prompt": "Can you help me answer a question?</s>\n<|assistant|>\nI'd be happy to help you answer any question you have. Please provide me with the question you'd like assistance with, and I'll do my best to provide you with an answer.</s>\n<|user|>\nPlease help understand this riddle: \"I’m tall when I’m young, and I’m short when I’m old. What am I?\"",
        "temperature": 0.75,
        "return_logits": False,
        "system_prompt": "You are a helpful assistant.",
        "max_new_tokens": 128,
        "min_new_tokens": -1
    }
)

# The joehoover/zephyr-7b-alpha model can stream output as it's running.
# The predict method returns an iterator, and you can iterate over that output.
for item in output:
    # https://replicate.com/joehoover/zephyr-7b-alpha/api#output-schema
    print(item, end="")

To learn more, take a look at the guide on getting started with Python.

Run joehoover/zephyr-7b-alpha using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "joehoover/zephyr-7b-alpha:14ec63365a1141134c41b652fe798633f48b1fd28b356725c4d8842a0ac151ee",
    "input": {
      "debug": false,
      "top_k": 50,
      "top_p": 0.9,
      "prompt": "Can you help me answer a question?</s>\\n<|assistant|>\\nI\'d be happy to help you answer any question you have. Please provide me with the question you\'d like assistance with, and I\'ll do my best to provide you with an answer.</s>\\n<|user|>\\nPlease help understand this riddle: \\"I’m tall when I’m young, and I’m short when I’m old. What am I?\\"",
      "temperature": 0.75,
      "return_logits": false,
      "system_prompt": "You are a helpful assistant.",
      "max_new_tokens": 128,
      "min_new_tokens": -1
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.

Output

Certainly! The answer to this riddle is a "candle". When a candle is young, it's tall, but as it burns and gets shorter, it becomes shorter and shorter until it eventually extinguishes, at which point it's no longer "tall" or "short".

{
  "completed_at": "2023-10-13T16:57:50.793307Z",
  "created_at": "2023-10-13T16:57:46.114914Z",
  "data_removed": false,
  "error": null,
  "id": "vcxozelbrhlz7bzmukjromel7a",
  "input": {
    "debug": false,
    "top_k": 50,
    "top_p": 0.9,
    "prompt": "Can you help me answer a question?</s>\n<|assistant|>\nI'd be happy to help you answer any question you have. Please provide me with the question you'd like assistance with, and I'll do my best to provide you with an answer.</s>\n<|user|>\nPlease help understand this riddle: \"I’m tall when I’m young, and I’m short when I’m old. What am I?\"",
    "temperature": 0.75,
    "return_logits": false,
    "system_prompt": "You are a helpful assistant.",
    "max_new_tokens": 128,
    "min_new_tokens": -1
  },
  "logs": "You are a helpful assistant.\nPROMPT_TEMPLATE: <|system|>\n{system_prompt}</s>\n<|user|>\n{instruction}</s>\n<|assistant|>\nYour formatted prompt is:\n<|system|>\nYou are a helpful assistant.</s>\n<|user|>\nCan you help me answer a question?</s>\n<|assistant|>\nI'd be happy to help you answer any question you have. Please provide me with the question you'd like assistance with, and I'll do my best to provide you with an answer.</s>\n<|user|>\nPlease help understand this riddle: \"I’m tall when I’m young, and I’m short when I’m old. What am I?\"</s>\n<|assistant|>\nDisabling loras\nNo loras were ever loaded, nothing to disable.\nNot using LoRA\nTokens:  tensor([[    1,   523, 28766,  6574, 28766, 28767,    13,  1976,   460,   264,\n10865, 13892, 28723,     2,   523, 28766,  1838, 28766, 28767,    13,\n6325,   368,  1316,   528,  4372,   264,  2996, 28804,     2,   523,\n28766,   489, 11143, 28766, 28767,    13, 28737, 28742, 28715,   347,\n4610,   298,  1316,   368,  4372,   707,  2996,   368,   506, 28723,\n5919,  3084,   528,   395,   272,  2996,   368, 28742, 28715,   737,\n11611,   395, 28725,   304,   315, 28742,   584,   511,   586,  1489,\n298,  3084,   368,   395,   396,  4372, 28723,     2,   523, 28766,\n1838, 28766, 28767,    13, 12069,  1316,  2380,   456,   408,  3287,\n28747,   345, 28737, 28809, 28719,  9369,   739,   315, 28809, 28719,\n2518, 28725,   304,   315, 28809, 28719,  2485,   739,   315, 28809,\n28719,  1571, 28723,  1824,   837,   315,  1110,     2,   523, 28766,\n489, 11143, 28766, 28767,    13]], device='cuda:0')\nhostname: model-14ec6336-bdb00b47d4679dba-gpu-a40-557dfb9fc5-frhh6",
  "metrics": {
    "predict_time": 4.718597,
    "total_time": 4.678393
  },
  "output": [
    "",
    "",
    "",
    "",
    "Certainly! ",
    "The ",
    "answer ",
    "to ",
    "this ",
    "",
    "riddle ",
    "is ",
    "a ",
    "",
    "",
    "",
    "",
    "\"candle\". ",
    "When ",
    "a ",
    "candle ",
    "is ",
    "",
    "young, ",
    "",
    "",
    "it's ",
    "",
    "tall, ",
    "but ",
    "as ",
    "it ",
    "",
    "burns ",
    "and ",
    "gets ",
    "",
    "shorter, ",
    "it ",
    "becomes ",
    "shorter ",
    "and ",
    "shorter ",
    "until ",
    "it ",
    "eventually ",
    "",
    "",
    "",
    "",
    "extinguishes, ",
    "at ",
    "which ",
    "point ",
    "",
    "",
    "it's ",
    "no ",
    "longer ",
    "",
    "",
    "",
    "\"tall\" ",
    "or ",
    "",
    "",
    "",
    "\"short\"."
  ],
  "started_at": "2023-10-13T16:57:46.074710Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/vcxozelbrhlz7bzmukjromel7a",
    "cancel": "https://api.replicate.com/v1/predictions/vcxozelbrhlz7bzmukjromel7a/cancel"
  },
  "version": "14ec63365a1141134c41b652fe798633f48b1fd28b356725c4d8842a0ac151ee"
}

Generated in

4.7 seconds

Tweak it Share Report View full prediction

Prediction

joehoover/zephyr-7b-alpha:14ec63365a1141134c41b652fe798633f48b1fd28b356725c4d8842a0ac151ee

Model

joehoover/zephyr-7b-alpha:14ec6336

d4vhk3tbsywlhylzl53kbnvjvu

Status

Succeeded

Source

Web

Hardware

A40 (Large)

Total duration

18.0s

Created

over 1 year ago

Input

debug
top_k: 50
top_p: 0.9
prompt: Could you briefly explain how self attention works?
temperature: 0.75
return_logits
system_prompt: You are a machine learning professor who is famous for simple, intuitive explanations.
max_new_tokens: 500
min_new_tokens: -1

{
  "debug": false,
  "top_k": 50,
  "top_p": 0.9,
  "prompt": "Could you briefly explain how self attention works?",
  "temperature": 0.75,
  "return_logits": false,
  "system_prompt": "You are a machine learning professor who is famous for simple, intuitive explanations.",
  "max_new_tokens": 500,
  "min_new_tokens": -1
}

Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run joehoover/zephyr-7b-alpha using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "joehoover/zephyr-7b-alpha:14ec63365a1141134c41b652fe798633f48b1fd28b356725c4d8842a0ac151ee",
  {
    input: {
      debug: false,
      top_k: 50,
      top_p: 0.9,
      prompt: "Could you briefly explain how self attention works?",
      temperature: 0.75,
      return_logits: false,
      system_prompt: "You are a machine learning professor who is famous for simple, intuitive explanations.",
      max_new_tokens: 500,
      min_new_tokens: -1
    }
  }
);

console.log(output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate

Run joehoover/zephyr-7b-alpha using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "joehoover/zephyr-7b-alpha:14ec63365a1141134c41b652fe798633f48b1fd28b356725c4d8842a0ac151ee",
    input={
        "debug": False,
        "top_k": 50,
        "top_p": 0.9,
        "prompt": "Could you briefly explain how self attention works?",
        "temperature": 0.75,
        "return_logits": False,
        "system_prompt": "You are a machine learning professor who is famous for simple, intuitive explanations.",
        "max_new_tokens": 500,
        "min_new_tokens": -1
    }
)

# The joehoover/zephyr-7b-alpha model can stream output as it's running.
# The predict method returns an iterator, and you can iterate over that output.
for item in output:
    # https://replicate.com/joehoover/zephyr-7b-alpha/api#output-schema
    print(item, end="")

To learn more, take a look at the guide on getting started with Python.

Run joehoover/zephyr-7b-alpha using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "joehoover/zephyr-7b-alpha:14ec63365a1141134c41b652fe798633f48b1fd28b356725c4d8842a0ac151ee",
    "input": {
      "debug": false,
      "top_k": 50,
      "top_p": 0.9,
      "prompt": "Could you briefly explain how self attention works?",
      "temperature": 0.75,
      "return_logits": false,
      "system_prompt": "You are a machine learning professor who is famous for simple, intuitive explanations.",
      "max_new_tokens": 500,
      "min_new_tokens": -1
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.

Output

Sure! Self-attention is a mechanism in neural networks that allows the model to focus on specific parts of the input data that are relevant for the task at hand. Here's a brief overview: 1. The input data is split into smaller pieces, called "tokens" or "words". 2. Each token is represented as a vector, which can be thought of as a "feature vector" that encodes information about that token. 3. The model computes a "query" vector and a "key" vector for each token. 4. The model then calculates a "score" for each pair of query and key vectors. 5. The model then "weights" the scores based on how closely the query matches the key. 6. The model then calculates a "context" vector for each token by summing the weighted scores from all other tokens. 7. The context vector for each token provides information about how important that token is for the overall task, and can be used to generate an output or to feed into another layer of the model. Self-attention is a powerful tool for tasks that involve understanding complex relationships between different parts of the input data, such as machine translation or document classification.

{
  "completed_at": "2023-10-13T17:10:58.892613Z",
  "created_at": "2023-10-13T17:10:40.933795Z",
  "data_removed": false,
  "error": null,
  "id": "d4vhk3tbsywlhylzl53kbnvjvu",
  "input": {
    "debug": false,
    "top_k": 50,
    "top_p": 0.9,
    "prompt": "Could you briefly explain how self attention works?",
    "temperature": 0.75,
    "return_logits": false,
    "system_prompt": "You are a machine learning professor who is famous for simple, intuitive explanations.",
    "max_new_tokens": 500,
    "min_new_tokens": -1
  },
  "logs": "You are a machine learning professor who is famous for simple, intuitive explanations.\nPROMPT_TEMPLATE: <|system|>\n{system_prompt}</s>\n<|user|>\n{instruction}</s>\n<|assistant|>\nYour formatted prompt is:\n<|system|>\nYou are a machine learning professor who is famous for simple, intuitive explanations.</s>\n<|user|>\nCould you briefly explain how self attention works?</s>\n<|assistant|>\nDisabling loras\nNo loras were ever loaded, nothing to disable.\nNot using LoRA\nTokens:  tensor([[    1,   523, 28766,  6574, 28766, 28767,    13,  1976,   460,   264,\n5599,  5168, 12192,   693,   349,  8376,   354,  3588, 28725, 21141,\n2468, 10928,   697, 28723,     2,   523, 28766,  1838, 28766, 28767,\n13, 17024,   368, 15643,  7282,   910,  1008,  4501,  3791, 28804,\n2,   523, 28766,   489, 11143, 28766, 28767,    13]],\ndevice='cuda:0')\nhostname: model-14ec6336-bdb00b47d4679dba-gpu-a40-557dfb9fc5-gvgj5",
  "metrics": {
    "predict_time": 18.019209,
    "total_time": 17.958818
  },
  "output": [
    "",
    "",
    "Sure! ",
    "",
    "",
    "",
    "Self-attention ",
    "is ",
    "a ",
    "mechanism ",
    "in ",
    "neural ",
    "networks ",
    "that ",
    "allows ",
    "the ",
    "model ",
    "to ",
    "focus ",
    "on ",
    "specific ",
    "parts ",
    "of ",
    "the ",
    "input ",
    "data ",
    "that ",
    "are ",
    "relevant ",
    "for ",
    "the ",
    "task ",
    "at ",
    "",
    "hand.\n",
    "\n",
    "",
    "",
    "",
    "Here's ",
    "a ",
    "brief ",
    "",
    "overview:\n",
    "\n",
    "",
    "",
    "1. ",
    "The ",
    "input ",
    "data ",
    "is ",
    "split ",
    "into ",
    "smaller ",
    "",
    "pieces, ",
    "called ",
    "",
    "",
    "\"tokens\" ",
    "or ",
    "",
    "",
    "\"words\".\n",
    "\n",
    "",
    "",
    "2. ",
    "Each ",
    "token ",
    "is ",
    "represented ",
    "as ",
    "a ",
    "",
    "vector, ",
    "which ",
    "can ",
    "be ",
    "thought ",
    "of ",
    "as ",
    "a ",
    "",
    "\"feature ",
    "",
    "vector\" ",
    "that ",
    "",
    "encodes ",
    "information ",
    "about ",
    "that ",
    "",
    "token.\n",
    "\n",
    "",
    "",
    "3. ",
    "The ",
    "model ",
    "",
    "computes ",
    "a ",
    "",
    "",
    "\"query\" ",
    "vector ",
    "and ",
    "a ",
    "",
    "",
    "\"key\" ",
    "vector ",
    "for ",
    "each ",
    "",
    "token.\n",
    "\n",
    "",
    "",
    "4. ",
    "The ",
    "model ",
    "then ",
    "",
    "calculates ",
    "a ",
    "",
    "",
    "\"score\" ",
    "for ",
    "each ",
    "pair ",
    "of ",
    "query ",
    "and ",
    "key ",
    "",
    "vectors.\n",
    "\n",
    "",
    "",
    "5. ",
    "The ",
    "model ",
    "then ",
    "",
    "",
    "\"weights\" ",
    "the ",
    "scores ",
    "based ",
    "on ",
    "how ",
    "closely ",
    "the ",
    "query ",
    "matches ",
    "the ",
    "",
    "key.\n",
    "\n",
    "",
    "",
    "6. ",
    "The ",
    "model ",
    "then ",
    "",
    "calculates ",
    "a ",
    "",
    "",
    "\"context\" ",
    "vector ",
    "for ",
    "each ",
    "token ",
    "by ",
    "",
    "summing ",
    "the ",
    "",
    "weighted ",
    "scores ",
    "from ",
    "all ",
    "other ",
    "",
    "tokens.\n",
    "\n",
    "",
    "",
    "7. ",
    "The ",
    "context ",
    "vector ",
    "for ",
    "each ",
    "token ",
    "provides ",
    "information ",
    "about ",
    "how ",
    "important ",
    "that ",
    "token ",
    "is ",
    "for ",
    "the ",
    "overall ",
    "",
    "task, ",
    "and ",
    "can ",
    "be ",
    "used ",
    "to ",
    "generate ",
    "an ",
    "output ",
    "or ",
    "to ",
    "feed ",
    "into ",
    "another ",
    "layer ",
    "of ",
    "the ",
    "",
    "model.\n",
    "\n",
    "",
    "",
    "",
    "",
    "Self-attention ",
    "is ",
    "a ",
    "powerful ",
    "tool ",
    "for ",
    "tasks ",
    "that ",
    "involve ",
    "understanding ",
    "complex ",
    "relationships ",
    "between ",
    "different ",
    "parts ",
    "of ",
    "the ",
    "input ",
    "",
    "data, ",
    "such ",
    "as ",
    "machine ",
    "translation ",
    "or ",
    "document ",
    "",
    "",
    "classification."
  ],
  "started_at": "2023-10-13T17:10:40.873404Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/d4vhk3tbsywlhylzl53kbnvjvu",
    "cancel": "https://api.replicate.com/v1/predictions/d4vhk3tbsywlhylzl53kbnvjvu/cancel"
  },
  "version": "14ec63365a1141134c41b652fe798633f48b1fd28b356725c4d8842a0ac151ee"
}

Generated in

18.0 seconds

Tweak it Share Report View full prediction

Want to make some of these yourself?

Run this model

joehoover / zephyr-7b-alpha

Prediction

Input

Output

Prediction

Input

Output

Logs (vcxozelbrhlz7bzmukjromel7a)

Logs (d4vhk3tbsywlhylzl53kbnvjvu)