kcaverly / nous-capybara-34b-gguf

{
  "prompt": "Why does a GPU process matrix multiplication faster than a CPU?",
  "temperature": 0.5,
  "system_prompt": "You are 'Nous-Capybara', an AI assistant and your purpose and drive is to assist the user with any request they have.",
  "max_new_tokens": -1,
  "repeat_penalty": 1.1,
  "prompt_template": "SYSTEM: {system_prompt}\nUSER: {prompt}\nASSISTANT: "
}
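
The `prompt_template` field contains two placeholders, `{system_prompt}` and `{prompt}`. A minimal sketch of how such a template might be expanded before it reaches the model, assuming plain `str.format`-style substitution (the source does not say how the model performs it server-side, and `render_prompt` is a hypothetical helper):

```python
# Values copied from the example input above.
PROMPT_TEMPLATE = "SYSTEM: {system_prompt}\nUSER: {prompt}\nASSISTANT: "
SYSTEM_PROMPT = (
    "You are 'Nous-Capybara', an AI assistant and your purpose and drive "
    "is to assist the user with any request they have."
)
PROMPT = "Why does a GPU process matrix multiplication faster than a CPU?"


def render_prompt(template: str, system_prompt: str, prompt: str) -> str:
    """Fill both placeholders (assumed behaviour, not the model's confirmed code)."""
    return template.format(system_prompt=system_prompt, prompt=prompt)


rendered = render_prompt(PROMPT_TEMPLATE, SYSTEM_PROMPT, PROMPT)
print(rendered)
```

The rendered text ends with `ASSISTANT: `, which is presumably where the model begins generating.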

Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run kcaverly/nous-capybara-34b-gguf using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "kcaverly/nous-capybara-34b-gguf:6b9b7741e719899f26571567b892a8900d0b517bfadee3997f5d477897d10eef",
  {
    input: {
      prompt: "Why does a GPU process matrix multiplication faster than a CPU?",
      temperature: 0.5,
      system_prompt: "You are 'Nous-Capybara', an AI assistant and your purpose and drive is to assist the user with any request they have.",
      max_new_tokens: -1,
      repeat_penalty: 1.1,
      prompt_template: "SYSTEM: {system_prompt}\nUSER: {prompt}\nASSISTANT: "
    }
  }
);

console.log(output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate

Run kcaverly/nous-capybara-34b-gguf using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "kcaverly/nous-capybara-34b-gguf:6b9b7741e719899f26571567b892a8900d0b517bfadee3997f5d477897d10eef",
    input={
        "prompt": "Why does a GPU process matrix multiplication faster than a CPU?",
        "temperature": 0.5,
        "system_prompt": "You are 'Nous-Capybara', an AI assistant and your purpose and drive is to assist the user with any request they have.",
        "max_new_tokens": -1,
        "repeat_penalty": 1.1,
        "prompt_template": "SYSTEM: {system_prompt}\nUSER: {prompt}\nASSISTANT: "
    }
)

# The kcaverly/nous-capybara-34b-gguf model can stream output as it's running.
# For this model, replicate.run returns an iterator that yields chunks of text as they are generated.
for item in output:
    # https://replicate.com/kcaverly/nous-capybara-34b-gguf/api#output-schema
    print(item, end="")

To learn more, take a look at the guide on getting started with Python.
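
If you want the full response as one string rather than printed chunk by chunk, the streamed pieces can be joined. The sketch below is an illustration only: `collect_output` is a hypothetical helper, the chunk list is a shortened stand-in for real output, and stripping the trailing `</s>` marker is an optional choice based on the sample output on this page.

```python
from typing import Iterable

STOP_TOKEN = "</s>"  # end-of-sequence marker that appears at the end of the sample output


def collect_output(chunks: Iterable[str], stop_token: str = STOP_TOKEN) -> str:
    """Join streamed chunks and drop a trailing stop token if one is present."""
    text = "".join(chunks)
    if text.endswith(stop_token):
        text = text[: -len(stop_token)]
    return text


# Shortened stand-in for the iterator returned by replicate.run(...).
sample_chunks = ["1", ".", " Par", "allel", " Processing", "</", "s", ">", ""]
result = collect_output(sample_chunks)
print(result)
```

Joining first and trimming afterwards matters because the stop marker can arrive split across several chunks, as it does here.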

Run kcaverly/nous-capybara-34b-gguf using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "kcaverly/nous-capybara-34b-gguf:6b9b7741e719899f26571567b892a8900d0b517bfadee3997f5d477897d10eef",
    "input": {
      "prompt": "Why does a GPU process matrix multiplication faster than a CPU?",
      "temperature": 0.5,
      "system_prompt": "You are \'Nous-Capybara\', an AI assistant and your purpose and drive is to assist the user with any request they have.",
      "max_new_tokens": -1,
      "repeat_penalty": 1.1,
      "prompt_template": "SYSTEM: {system_prompt}\\nUSER: {prompt}\\nASSISTANT: "
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
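
For comparison, the same request can be assembled with Python's standard library instead of cURL. This is a rough sketch that mirrors the endpoint, headers, and body of the cURL command above; `build_prediction_request` is a hypothetical helper, and the request is only built here, not sent.

```python
import json
import os
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"
VERSION = (
    "kcaverly/nous-capybara-34b-gguf:"
    "6b9b7741e719899f26571567b892a8900d0b517bfadee3997f5d477897d10eef"
)


def build_prediction_request(token: str, model_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the cURL example."""
    body = json.dumps({"version": VERSION, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Prefer": "wait",
        },
    )


request = build_prediction_request(
    os.environ.get("REPLICATE_API_TOKEN", "<your-token>"),  # placeholder if unset
    {
        "prompt": "Why does a GPU process matrix multiplication faster than a CPU?",
        "temperature": 0.5,
        "max_new_tokens": -1,
        "repeat_penalty": 1.1,
    },
)

# Sending it requires a valid token and network access:
# with urllib.request.urlopen(request) as response:
#     prediction = json.load(response)
```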

Output

1. Parallel Processing: GPUs are designed specifically for parallel processing, which makes them ideal for tasks like matrix multiplication that can be split into smaller, independent parts. On the other hand, CPUs are optimized for sequential processing and handling complex calculations.
2. Specialized Hardware: GPUs have specialized hardware called streaming multiprocessors that perform mathematical operations much faster than a CPU's cores. These streaming multiprocessors work together to process large amounts of data simultaneously, leading to significant performance improvements in tasks like matrix multiplication.
3. Memory Bandwidth: GPUs have higher memory bandwidth compared to CPUs, which allows them to load and manipulate larger datasets more efficiently. This is crucial for matrix multiplication as the operation requires a significant amount of memory to store intermediate results.
4. Compute Unified Device Architecture (CUDA) : NVIDIA's CUDA technology enables developers to write programs that can take advantage of GPUs' parallel processing capabilities. This allows CPU-intensive tasks like matrix multiplication to be offloaded to the GPU, resulting in faster computation times.

5. Task Switching: GPUs excel at performing the same task repeatedly and can switch between tasks much faster than a CPU. When performing matrix multiplication, this means that the GPU can work on multiple matrices simultaneously without wasting time on context switching.</s>

{
  "completed_at": "2024-01-09T19:51:45.485007Z",
  "created_at": "2024-01-09T19:47:31.371314Z",
  "data_removed": false,
  "error": null,
  "id": "imnbvg3bsfeqqtih6sfixxishu",
  "input": {
    "prompt": "Why does a GPU process matrix multiplication faster than a CPU?",
    "temperature": 0.5,
    "system_prompt": "You are 'Nous-Capybara', an AI assistant and your purpose and drive is to assist the user with any request they have.",
    "max_new_tokens": -1,
    "repeat_penalty": 1.1,
    "prompt_template": "SYSTEM: {system_prompt}\nUSER: {prompt}\nASSISTANT: "
  },
  "logs": "llama_print_timings:        load time =     773.84 ms\nllama_print_timings:      sample time =      83.54 ms /   266 runs   (    0.31 ms per token,  3184.18 tokens per second)\nllama_print_timings: prompt eval time =     770.88 ms /    56 tokens (   13.77 ms per token,    72.64 tokens per second)\nllama_print_timings:        eval time =   16066.31 ms /   265 runs   (   60.63 ms per token,    16.49 tokens per second)\nllama_print_timings:       total time =   18610.34 ms",
  "metrics": {
    "predict_time": 18.625253,
    "total_time": 254.113693
  },
  "output": [
    "1",
    ".",
    " Par",
    "allel",
    " Processing",
    ":",
    " GPUs",
    " are",
    " designed",
    " specifically",
    " for",
    " parallel",
    " processing",
    ",",
    " which",
    " makes",
    " them",
    " ideal",
    " for",
    " tasks",
    " like",
    " matrix",
    " multiplication",
    " that",
    " can",
    " be",
    " split",
    " into",
    " smaller",
    ",",
    " independent",
    " parts",
    ".",
    " On",
    " the",
    " other",
    " hand",
    ",",
    " CPU",
    "s",
    " are",
    " optimized",
    " for",
    " sequential",
    " processing",
    " and",
    " handling",
    " complex",
    " calculations",
    ".",
    "\n",
    "2",
    ".",
    " Special",
    "ized",
    " Hardware",
    ":",
    " GPUs",
    " have",
    " specialized",
    " hardware",
    " called",
    " streaming",
    " multi",
    "process",
    "ors",
    " that",
    " perform",
    " mathematical",
    " operations",
    " much",
    " faster",
    " than",
    " a",
    " CPU",
    "'",
    "s",
    " cores",
    ".",
    " These",
    " streaming",
    " multi",
    "process",
    "ors",
    " work",
    " together",
    " to",
    " process",
    " large",
    " amounts",
    " of",
    " data",
    " simultaneously",
    ",",
    " leading",
    " to",
    " significant",
    " performance",
    " improvements",
    " in",
    " tasks",
    " like",
    " matrix",
    " multiplication",
    ".",
    "\n",
    "3",
    ".",
    " Memory",
    " Band",
    "width",
    ":",
    " GPUs",
    " have",
    " higher",
    " memory",
    " bandwidth",
    " compared",
    " to",
    " CPU",
    "s",
    ",",
    " which",
    " allows",
    " them",
    " to",
    " load",
    " and",
    " manipulate",
    " larger",
    " datasets",
    " more",
    " efficiently",
    ".",
    " This",
    " is",
    " crucial",
    " for",
    " matrix",
    " multiplication",
    " as",
    " the",
    " operation",
    " requires",
    " a",
    " significant",
    " amount",
    " of",
    " memory",
    " to",
    " store",
    " intermediate",
    " results",
    ".",
    "\n",
    "4",
    ".",
    " Compute",
    " Unified",
    " Device",
    " Architecture",
    " (",
    "CU",
    "DA",
    ")",
    " :",
    " NVIDIA",
    "'",
    "s",
    " CU",
    "DA",
    " technology",
    " enables",
    " developers",
    " to",
    " write",
    " programs",
    " that",
    " can",
    " take",
    " advantage",
    " of",
    " GPUs",
    "'",
    " parallel",
    " processing",
    " capabilities",
    ".",
    " This",
    " allows",
    " CPU",
    "-",
    "int",
    "ensive",
    " tasks",
    " like",
    " matrix",
    " multiplication",
    " to",
    " be",
    " off",
    "loaded",
    " to",
    " the",
    " GPU",
    ",",
    " resulting",
    " in",
    " faster",
    " computation",
    " times",
    ".",
    "\n",
    "\n",
    "5",
    ".",
    " Task",
    " Sw",
    "itching",
    ":",
    " GPUs",
    " excel",
    " at",
    " performing",
    " the",
    " same",
    " task",
    " repeatedly",
    " and",
    " can",
    " switch",
    " between",
    " tasks",
    " much",
    " faster",
    " than",
    " a",
    " CPU",
    ".",
    " When",
    " performing",
    " matrix",
    " multiplication",
    ",",
    " this",
    " means",
    " that",
    " the",
    " GPU",
    " can",
    " work",
    " on",
    " multiple",
    " matrices",
    " simultaneously",
    " without",
    " wasting",
    " time",
    " on",
    " context",
    " switching",
    ".",
    "</",
    "s",
    ">",
    ""
  ],
  "started_at": "2024-01-09T19:51:26.859754Z",
  "status": "succeeded",
  "urls": {
    "stream": "https://streaming-api.svc.us.c.replicate.net/v1/predictions/imnbvg3bsfeqqtih6sfixxishu",
    "get": "https://api.replicate.com/v1/predictions/imnbvg3bsfeqqtih6sfixxishu",
    "cancel": "https://api.replicate.com/v1/predictions/imnbvg3bsfeqqtih6sfixxishu/cancel"
  },
  "version": "6b9b7741e719899f26571567b892a8900d0b517bfadee3997f5d477897d10eef"
}
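
The `metrics` values can be cross-checked against the timestamps and logs in the prediction above. The sketch below recomputes them from values copied out of that JSON. Reading the gap between `created_at` and `started_at` as queueing or startup time is an interpretation, not something the source states.

```python
import re
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


created = parse_timestamp("2024-01-09T19:47:31.371314Z")
started = parse_timestamp("2024-01-09T19:51:26.859754Z")
completed = parse_timestamp("2024-01-09T19:51:45.485007Z")

total_time = (completed - created).total_seconds()    # compare with metrics.total_time
predict_time = (completed - started).total_seconds()  # compare with metrics.predict_time
wait_time = (started - created).total_seconds()       # time before the run started

# The generation-phase line from the "logs" field.
EVAL_LINE = (
    "llama_print_timings:        eval time =   16066.31 ms /   265 runs   "
    "(   60.63 ms per token,    16.49 tokens per second)"
)
match = re.search(r"eval time =\s*([\d.]+) ms /\s*(\d+) runs", EVAL_LINE)
eval_ms, eval_runs = float(match.group(1)), int(match.group(2))
tokens_per_second = eval_runs / (eval_ms / 1000)

print(f"total={total_time:.3f}s predict={predict_time:.3f}s wait={wait_time:.3f}s")
print(f"generation speed: {tokens_per_second:.2f} tokens/s")
```

Most of the roughly 254 seconds of total time elapsed before the run started; the prediction itself took about 18.6 seconds.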

Generated in 18.6 seconds
