kcaverly / openchat-3.5-1210-gguf

The "Overall Best Performing Open Source 7B Model" for Coding + Generalization or Mathematical Reasoning

  • Public
  • 26.2K runs
  • GitHub
  • Paper

Run time and cost

This model costs approximately $0.012 to run on Replicate, or 83 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A40 (Large) GPU hardware. Predictions typically complete within 17 seconds. The predict time for this model varies significantly based on the inputs.

Readme

A quantized, version of the OpenChat 3.5 1210 model.

The original model can be found here. The quantized version used is Q5_K_M, and can be found here.

There are two separate ‘modes’ available, which can be chosen via the prompt. This model has no system prompt.

Default Mode (GPT4 Correct): Best for coding, chat and general tasks

GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:

Mathematical Reasoning Mode: Tailored for solving math problems

Math Correct User: 10.3 − 7988.8133=<|end_of_turn|>Math Correct Assistant: