kcaverly/openchat-3.5-1210-gguf | Run with an API on Replicate

Run time and cost

This model costs approximately $0.016 to run on Replicate, or 62 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 17 seconds. The predict time for this model varies significantly based on the inputs.

Readme

A quantized, version of the OpenChat 3.5 1210 model.

The original model can be found here. The quantized version used is Q5_K_M, and can be found here.

There are two separate ‘modes’ available, which can be chosen via the prompt. This model has no system prompt.

Default Mode (GPT4 Correct): Best for coding, chat and general tasks

Mathematical Reasoning Mode: Tailored for solving math problems

Math Correct User: 10.3 − 7988.8133=<|end_of_turn|>Math Correct Assistant: