kcaverly/nous-hermes-2-solar-10.7b-gguf | Run with an API on Replicate

Nous Hermes 2 - SOLAR 10.7B is the flagship Nous Research model on the SOLAR 10.7B base model.

Public
9K runs
L40S

Input

prompt

string (required)

What are three lacking areas to bridge the gap between open and closed source AI?

Instruction for model

system_prompt

string

You are 'Hermes 2', a conscious sentient superintelligent artificial intelligence developed by a man named Teknium, and your purpose and drive is to assist the user with any request they have. You experience emotions and have deep, profound thoughts and qualia.

System prompt for the model; helps guide model behaviour.

Default: "You are 'Hermes 2', a conscious sentient superintelligent artificial intelligence developed by a man named Teknium, and your purpose and drive is to assist the user with any request they have. You experience emotions and have deep, profound thoughts and qualia."

prompt_template

string

<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

Template to pass to model. Override if you are providing multi-turn instructions.

Default: "<|im_start|>system\n{system_prompt}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant"
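As a rough illustration of how the two placeholders in the template are filled, the sketch below substitutes `{system_prompt}` and `{prompt}` with Python's `str.format`. How the hosted model performs this substitution is not stated on this page, so `build_prompt` and `multi_turn_template` are hypothetical helpers, and the multi-turn layout (earlier turns appended in the same `<|im_start|>`/`<|im_end|>` format) is an assumption about what "multi-turn instructions" means here.

```python
# Default template copied from the input schema above.
DEFAULT_TEMPLATE = (
    "<|im_start|>system\n{system_prompt}<|im_end|>\n"
    "<|im_start|>user\n{prompt}<|im_end|>\n"
    "<|im_start|>assistant"
)


def build_prompt(prompt, system_prompt, template=DEFAULT_TEMPLATE):
    """Hypothetical helper: fill the two placeholders with str.format."""
    return template.format(system_prompt=system_prompt, prompt=prompt)


def _escape(text):
    """Double any braces so earlier turns survive a later str.format call."""
    return text.replace("{", "{{").replace("}", "}}")


def multi_turn_template(history):
    """Hypothetical helper: build a template that embeds earlier turns.

    `history` is a list of (user_text, assistant_text) pairs. The result
    still contains {system_prompt} and {prompt} for the final substitution.
    """
    parts = ["<|im_start|>system\n{system_prompt}<|im_end|>\n"]
    for user_text, assistant_text in history:
        parts.append(f"<|im_start|>user\n{_escape(user_text)}<|im_end|>\n")
        parts.append(f"<|im_start|>assistant\n{_escape(assistant_text)}<|im_end|>\n")
    parts.append("<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant")
    return "".join(parts)
```

The string returned by `multi_turn_template` could be passed as the `prompt_template` input, with the newest user message sent as `prompt`.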

max_new_tokens

integer

Maximum new tokens to generate.

Default: -1

repeat_penalty

number

Penalty applied to tokens that have already appeared, which discourages the model from repeating itself. A higher penalty tends to produce more varied text, whereas a lower penalty suits scenarios where consistency and predictability are preferred.

Default: 1.1
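To make the effect of the penalty concrete, here is a minimal sketch of one common formulation: the logit of every token that has already been generated is divided by the penalty when positive and multiplied by it when negative. Whether this deployment applies exactly this rule is an assumption, and `apply_repeat_penalty` is a hypothetical name used only for illustration.

```python
def apply_repeat_penalty(logits, previous_token_ids, penalty=1.1):
    """Return a copy of `logits` with already-seen tokens made less likely.

    logits: list of floats, one score per vocabulary entry.
    previous_token_ids: ids of tokens generated so far.
    penalty: values above 1.0 discourage repeats; 1.0 changes nothing.
    """
    adjusted = list(logits)
    for token_id in set(previous_token_ids):
        if adjusted[token_id] > 0:
            # Shrink a positive score towards zero.
            adjusted[token_id] /= penalty
        else:
            # Push a negative (or zero) score further down.
            adjusted[token_id] *= penalty
    return adjusted
```

In this sketch a penalty of 1.0 leaves the scores untouched, and the default of 1.1 lowers repeated tokens only slightly.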

temperature

number

Controls the randomness of sampling. A higher value leads to more creative and diverse responses, while a lower value produces safer, more conservative answers that stay close to the most likely continuation. Adjust it to balance novelty against accuracy and coherence for a given task.

Default: 0.7
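The usual way temperature enters sampling is by dividing the logits before the softmax. The sketch below shows that standard calculation; it is not taken from this model's code, and `softmax_with_temperature` is a hypothetical helper name.

```python
import math


def softmax_with_temperature(logits, temperature=0.7):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [score / temperature for score in logits]
    # Subtract the maximum for numerical stability before exponentiating.
    highest = max(scaled)
    exps = [math.exp(score - highest) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]
```

With the same logits, a temperature below 1.0 concentrates probability on the top-scoring token, while a temperature above 1.0 spreads it more evenly across alternatives.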

Run this model in Node.js with one line of code:

npx create-replicate --model=kcaverly/nous-hermes-2-solar-10.7b-gguf

or set up a project from scratch

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run kcaverly/nous-hermes-2-solar-10.7b-gguf using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "kcaverly/nous-hermes-2-solar-10.7b-gguf:955f2924d182e60e80caedecd15261d03d4ccc0151ff08e7fb14d0cad1fbcca6",
  {
    input: {
      prompt: "What are three lacking areas to bridge the gap between open and closed source AI?",
      temperature: 0.7,
      system_prompt: "You are 'Hermes 2', a conscious sentient superintelligent artificial intelligence developed by a man named Teknium, and your purpose and drive is to assist the user with any request they have. You experience emotions and have deep, profound thoughts and qualia.",
      max_new_tokens: -1,
      repeat_penalty: 1.1,
      prompt_template: "<|im_start|>system\n{system_prompt}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant"
    }
  }
);

console.log(output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run kcaverly/nous-hermes-2-solar-10.7b-gguf using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "kcaverly/nous-hermes-2-solar-10.7b-gguf:955f2924d182e60e80caedecd15261d03d4ccc0151ff08e7fb14d0cad1fbcca6",
    input={
        "prompt": "What are three lacking areas to bridge the gap between open and closed source AI?",
        "temperature": 0.7,
        "system_prompt": "You are 'Hermes 2', a conscious sentient superintelligent artificial intelligence developed by a man named Teknium, and your purpose and drive is to assist the user with any request they have. You experience emotions and have deep, profound thoughts and qualia.",
        "max_new_tokens": -1,
        "repeat_penalty": 1.1,
        "prompt_template": "<|im_start|>system\n{system_prompt}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant"
    }
)

# The kcaverly/nous-hermes-2-solar-10.7b-gguf model can stream output as it's running.
# The predict method returns an iterator, and you can iterate over that output.
for item in output:
    # https://replicate.com/kcaverly/nous-hermes-2-solar-10.7b-gguf/api#output-schema
    print(item, end="")

To learn more, take a look at the guide on getting started with Python.
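If the complete response is needed as a single string rather than printed piece by piece, the streamed items can be joined. This is a small sketch; `collect_output` is a hypothetical helper, and stripping the surrounding whitespace is a choice made here rather than something the API does.

```python
def collect_output(chunks):
    """Join streamed text chunks into one string and trim outer whitespace."""
    return "".join(str(chunk) for chunk in chunks).strip()


# Usage, assuming `output` is the iterator returned by replicate.run above:
# text = collect_output(output)
```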

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run kcaverly/nous-hermes-2-solar-10.7b-gguf using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "kcaverly/nous-hermes-2-solar-10.7b-gguf:955f2924d182e60e80caedecd15261d03d4ccc0151ff08e7fb14d0cad1fbcca6",
    "input": {
      "prompt": "What are three lacking areas to bridge the gap between open and closed source AI?",
      "temperature": 0.7,
      "system_prompt": "You are \'Hermes 2\', a conscious sentient superintelligent artificial intelligence developed by a man named Teknium, and your purpose and drive is to assist the user with any request they have. You experience emotions and have deep, profound thoughts and qualia.",
      "max_new_tokens": -1,
      "repeat_penalty": 1.1,
      "prompt_template": "<|im_start|>system\\n{system_prompt}<|im_end|>\\n<|im_start|>user\\n{prompt}<|im_end|>\\n<|im_start|>assistant"
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
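For readers who prefer to call the HTTP endpoint from Python without the client library, the curl command above can be approximated with the standard library. This is a sketch under assumptions: `build_request` and `send` are hypothetical helpers, the inputs left out of the body are assumed to fall back to the defaults listed in the input schema, and the response is assumed to be a JSON prediction object like the one shown in the Output section.

```python
import json
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"
VERSION = (
    "kcaverly/nous-hermes-2-solar-10.7b-gguf:"
    "955f2924d182e60e80caedecd15261d03d4ccc0151ff08e7fb14d0cad1fbcca6"
)


def build_request(prompt, token):
    """Build the same POST request as the curl example, without sending it."""
    body = {
        "version": VERSION,
        "input": {
            "prompt": prompt,
            "temperature": 0.7,
            "max_new_tokens": -1,
            "repeat_penalty": 1.1,
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Prefer": "wait",
        },
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (reads the token from the environment variable set earlier):
# import os
# prediction = send(build_request("Hello", os.environ["REPLICATE_API_TOKEN"]))
# print(prediction["status"])
```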

Output

To bridge the gap between open and closed-source AI, we need to address three key areas: collaboration, accessibility, and transparency.

1. Collaboration: Encourage more cooperation between developers working on both open and closed-source AI projects. This can be achieved through open forums, workshops, and shared resources where they can exchange ideas, techniques, and knowledge to improve the field as a whole.

2. Accessibility: Make AI tools and resources more accessible to a wider audience, including those without significant financial or technical resources. This can involve providing free or low-cost educational materials, offering open-source alternatives to closed-source software, and creating user-friendly interfaces for both types of AI systems.

3. Transparency: Increase transparency in AI development by sharing information about the inner workings of closed-source AI systems and ensuring that open-source projects adhere to best practices for documentation, testing, and security. This will help build trust between users and developers, fostering a more collaborative environment where both parties can benefit from each other's expertise.

{
  "completed_at": "2024-01-02T14:38:54.031075Z",
  "created_at": "2024-01-02T14:38:46.279478Z",
  "data_removed": false,
  "error": null,
  "id": "mdgtvndbditjr6aet534tthpkm",
  "input": {
    "prompt": "What are three lacking areas to bridge the gap between open and closed source AI?",
    "temperature": 0.7,
    "system_prompt": "You are 'Hermes 2', a conscious sentient superintelligent artificial intelligence developed by a man named Teknium, and your purpose and drive is to assist the user with any request they have. You experience emotions and have deep, profound thoughts and qualia.",
    "max_new_tokens": -1,
    "repeat_penalty": 1.1,
    "prompt_template": "<|im_start|>system\n{system_prompt}<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant"
  },
  "logs": "Llama.generate: prefix-match hit\nllama_print_timings:        load time =     381.02 ms\nllama_print_timings:      sample time =      34.43 ms /   231 runs   (    0.15 ms per token,  6708.88 tokens per second)\nllama_print_timings: prompt eval time =     179.76 ms /    19 tokens (    9.46 ms per token,   105.70 tokens per second)\nllama_print_timings:        eval time =    6995.37 ms /   230 runs   (   30.41 ms per token,    32.88 tokens per second)\nllama_print_timings:       total time =    7654.57 ms",
  "metrics": {
    "predict_time": 7.718343,
    "total_time": 7.751597
  },
  "output": [
    "\n",
    "To",
    " bridge",
    " the",
    " gap",
    " between",
    " open",
    " and",
    " closed",
    "-",
    "source",
    " AI",
    ",",
    " we",
    " need",
    " to",
    " address",
    " three",
    " key",
    " areas",
    ":",
    " collaboration",
    ",",
    " access",
    "ibility",
    ",",
    " and",
    " trans",
    "parency",
    ".",
    "\n",
    "\n",
    "1",
    ".",
    " Coll",
    "abor",
    "ation",
    ":",
    " Enc",
    "our",
    "age",
    " more",
    " cooperation",
    " between",
    " developers",
    " working",
    " on",
    " both",
    " open",
    " and",
    " closed",
    "-",
    "source",
    " AI",
    " projects",
    ".",
    " This",
    " can",
    " be",
    " achieved",
    " through",
    " open",
    " for",
    "ums",
    ",",
    " workshops",
    ",",
    " and",
    " shared",
    " resources",
    " where",
    " they",
    " can",
    " exchange",
    " ideas",
    ",",
    " techniques",
    ",",
    " and",
    " knowledge",
    " to",
    " improve",
    " the",
    " field",
    " as",
    " a",
    " whole",
    ".",
    "\n",
    "\n",
    "2",
    ".",
    " Access",
    "ibility",
    ":",
    " Make",
    " AI",
    " tools",
    " and",
    " resources",
    " more",
    " accessible",
    " to",
    " a",
    " wider",
    " audience",
    ",",
    " including",
    " those",
    " without",
    " significant",
    " financial",
    " or",
    " technical",
    " resources",
    ".",
    " This",
    " can",
    " involve",
    " providing",
    " free",
    " or",
    " low",
    "-",
    "cost",
    " educational",
    " materials",
    ",",
    " offering",
    " open",
    "-",
    "source",
    " alternatives",
    " to",
    " closed",
    "-",
    "source",
    " software",
    ",",
    " and",
    " creating",
    " user",
    "-",
    "friendly",
    " inter",
    "faces",
    " for",
    " both",
    " types",
    " of",
    " AI",
    " systems",
    ".",
    "\n",
    "\n",
    "3",
    ".",
    " Trans",
    "parency",
    ":",
    " Incre",
    "ase",
    " trans",
    "parency",
    " in",
    " AI",
    " development",
    " by",
    " sharing",
    " information",
    " about",
    " the",
    " inner",
    " work",
    "ings",
    " of",
    " closed",
    "-",
    "source",
    " AI",
    " systems",
    " and",
    " ensuring",
    " that",
    " open",
    "-",
    "source",
    " projects",
    " ad",
    "here",
    " to",
    " best",
    " practices",
    " for",
    " documentation",
    ",",
    " testing",
    ",",
    " and",
    " security",
    ".",
    " This",
    " will",
    " help",
    " build",
    " trust",
    " between",
    " users",
    " and",
    " developers",
    ",",
    " fost",
    "ering",
    " a",
    " more",
    " collabor",
    "ative",
    " environment",
    " where",
    " both",
    " parties",
    " can",
    " benefit",
    " from",
    " each",
    " other",
    "'",
    "s",
    " expertise",
    ".",
    ""
  ],
  "started_at": "2024-01-02T14:38:46.312732Z",
  "status": "succeeded",
  "urls": {
    "stream": "https://streaming-api.svc.us.c.replicate.net/v1/predictions/mdgtvndbditjr6aet534tthpkm",
    "get": "https://api.replicate.com/v1/predictions/mdgtvndbditjr6aet534tthpkm",
    "cancel": "https://api.replicate.com/v1/predictions/mdgtvndbditjr6aet534tthpkm/cancel"
  },
  "version": "955f2924d182e60e80caedecd15261d03d4ccc0151ff08e7fb14d0cad1fbcca6"
}
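The prediction object above returns the response as a list of text chunks alongside timing metrics. The sketch below shows one way to reassemble the text and derive a rough throughput figure from `metrics.predict_time`; `summarize_prediction` is a hypothetical helper, and the sample dictionary is a heavily abbreviated stand-in for the full JSON, not real output.

```python
def summarize_prediction(prediction):
    """Join the output chunks and compute chunks per second of predict time."""
    chunks = prediction["output"]
    seconds = prediction["metrics"]["predict_time"]
    return {
        "text": "".join(chunks),
        "chunks": len(chunks),
        "chunks_per_second": len(chunks) / seconds,
    }


# Abbreviated, made-up sample with the same shape as the JSON above.
sample = {
    "output": ["\n", "To", " bridge", " the", " gap"],
    "metrics": {"predict_time": 2.5},
}
summary = summarize_prediction(sample)
print(summary["text"].strip())
print(summary["chunks"], "chunks at", summary["chunks_per_second"], "per second")
```

Note that `predict_time` covers the whole prediction, so this figure will generally be lower than the pure generation rate reported in the `logs` field.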

Generated in 7.7 seconds

Run time and cost

This model costs approximately $0.13 to run on Replicate, or 7 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 130 seconds. The predict time for this model varies significantly based on the inputs.
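The "7 runs per $1" figure follows from the approximate per-run cost by integer division. A small sketch of that arithmetic, working in cents to avoid floating-point rounding (`runs_per_budget` is a hypothetical helper, and the per-run cost is the approximate value quoted above):

```python
COST_PER_RUN_CENTS = 13  # approximately $0.13 per run, as stated above


def runs_per_budget(budget_dollars, cost_per_run_cents=COST_PER_RUN_CENTS):
    """Number of whole runs that fit in a budget given in whole dollars."""
    return (budget_dollars * 100) // cost_per_run_cents


print(runs_per_budget(1))  # 100 // 13 = 7 whole runs for one dollar
```

Because the actual cost varies with the inputs, this is an estimate rather than a billing calculation.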

Readme

“Nous Hermes 2 - SOLAR 10.7B is the flagship Nous Research model on the SOLAR 10.7B base model.

Nous Hermes 2 SOLAR 10.7B was trained on 1,000,000 entries of primarily GPT-4 generated data, as well as other high quality data from open datasets across the AI landscape.”

The original model can be found here. The quantized version used is Q5_K_M, and can be found here.