kcaverly / openchat-3.5-1210-gguf

The "Overall Best Performing Open Source 7B Model" for Coding + Generalization or Mathematical Reasoning

  • Public
  • 26.2K runs
  • L40S
  • GitHub
  • Paper

Input

string (required)

Instruction for model

string

Template to pass to model. Override if you are providing multi-turn instructions.

Default: "GPT Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant: "

integer

Maximum new tokens to generate.

Default: -1

number

Repeat penalty. Discourages the model from repeating text it has already generated. A value of 1.0 applies no penalty; higher values penalize repeated tokens more strongly, which tends to produce more varied output. Values close to 1.0 suit scenarios where consistency and predictability are preferred.

Default: 1.1
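
To make the effect of this parameter concrete, here is a minimal sketch of one common way a repeat penalty is applied to raw token scores (logits) before sampling. The function name, the list-based logits, and the exact rule (divide positive scores, multiply negative ones) are assumptions for illustration; the page does not describe how this model's runtime implements the penalty.

```python
def apply_repeat_penalty(logits, previous_tokens, penalty=1.1):
    """Return a copy of `logits` with already-generated tokens made less likely.

    Hypothetical helper: `logits` is a list of scores indexed by token id and
    `previous_tokens` is the list of token ids generated so far.
    """
    adjusted = list(logits)
    for token_id in set(previous_tokens):
        if adjusted[token_id] > 0:
            # Shrink positive scores toward zero.
            adjusted[token_id] /= penalty
        else:
            # Push negative (or zero) scores further down.
            adjusted[token_id] *= penalty
    return adjusted


if __name__ == "__main__":
    scores = [2.2, -1.0, 0.5]
    # Tokens 0 and 1 were already generated; token 2 is untouched.
    print(apply_repeat_penalty(scores, previous_tokens=[0, 1, 0]))
```

With a penalty of 1.0 the scores come back unchanged, which matches the idea that 1.0 means no penalty.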

number

Sampling temperature. Controls how random the model's output is. A higher value leads to more creative and diverse responses, while a lower value gives safer, more conservative answers that stay closer to the most likely continuation. Adjust it to balance novelty against accuracy and coherence for a given task.

Default: 0.7
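
As a rough illustration of what this value does, the sketch below applies a temperature to a small list of token scores before turning them into probabilities. The function name and the example scores are made up for illustration, and this is the textbook formulation rather than a description of this model's internals.

```python
import math


def softmax_with_temperature(logits, temperature=0.7):
    """Convert scores to probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    # Subtract the maximum for numerical stability; the result is unchanged.
    highest = max(scaled)
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.0]
    for t in (0.5, 0.7, 1.5):
        probs = softmax_with_temperature(scores, temperature=t)
        print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, while higher temperatures spread it across the alternatives.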

Output

Since Sally is the only girl in her family, she must be considered as one of the "sisters" mentioned. Therefore, Sally has 2 sisters (including herself).

Run time and cost

This model costs approximately $0.016 to run on Replicate, or 62 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
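
The two figures quoted above are consistent with each other, as the short calculation below shows. The per-run price is the approximate value from this page; the helper name and the 1,000-run example are illustrative assumptions, and actual cost depends on your inputs.

```python
import math

COST_PER_RUN_USD = 0.016  # approximate figure quoted on this page

# Whole runs that fit into one dollar at the quoted price.
runs_per_dollar = math.floor(1 / COST_PER_RUN_USD)


def estimated_cost(runs, cost_per_run=COST_PER_RUN_USD):
    """Rough cost estimate in USD for a number of runs (hypothetical helper)."""
    return runs * cost_per_run


if __name__ == "__main__":
    print(runs_per_dollar)
    print(round(estimated_cost(1000), 2))
```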

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 17 seconds. The predict time for this model varies significantly based on the inputs.

Readme

A quantized version of the OpenChat 3.5 1210 model.

The original model can be found here. The quantized version used is Q5_K_M, and can be found here.

There are two separate ‘modes’ available, which can be chosen via the prompt. This model has no system prompt.

Default Mode (GPT4 Correct): Best for coding, chat and general tasks

GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:

Mathematical Reasoning Mode: Tailored for solving math problems

Math Correct User: 10.3 − 7988.8133=<|end_of_turn|>Math Correct Assistant:
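
The two prompt formats above follow the same pattern, so they can be assembled with a small helper. The sketch below is one possible way to do it: `build_prompt`, `MODE_PREFIX`, and the `(role, text)` turn format are hypothetical names introduced here, not part of the model or of Replicate's API.

```python
END_OF_TURN = "<|end_of_turn|>"

# Role prefixes for the two modes described in the README.
MODE_PREFIX = {
    "default": "GPT4 Correct",
    "math": "Math Correct",
}


def build_prompt(turns, mode="default"):
    """Join (role, text) turns into one prompt string ending with an open
    Assistant turn for the model to complete.

    `role` must be "User" or "Assistant".
    """
    prefix = MODE_PREFIX[mode]
    parts = []
    for role, text in turns:
        if role not in ("User", "Assistant"):
            raise ValueError(f"unknown role: {role!r}")
        parts.append(f"{prefix} {role}: {text}{END_OF_TURN}")
    # Leave the final Assistant turn open so the model generates the reply.
    parts.append(f"{prefix} Assistant:")
    return "".join(parts)


if __name__ == "__main__":
    chat = [("User", "Hello"), ("Assistant", "Hi"), ("User", "How are you today?")]
    print(build_prompt(chat))
    print(build_prompt([("User", "12 * 7 =")], mode="math"))
```

The first call reproduces the multi-turn default-mode example above. To send such a string as-is, the template input would presumably need to be overridden so the prompt is not wrapped a second time.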