kcaverly / phind-codellama-34b-v2-gguf

A quantized 34B parameter language model from Phind for code completion

  • Public
  • 230 runs
  • L40S
  • GitHub
  • Paper
  • License

Input

*string

Instruction for model

string

System prompt for the model; helps guide model behaviour.

Default: "You are an intelligent programming assistant."

string

Template to pass to model. Override if you are providing multi-turn instructions.

Default: "### System Prompt\n{system_prompt}\n### User Message\n{prompt}\n### Assistant\n"
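To illustrate how the default template combines the system prompt and user message, here is a minimal sketch using Python string formatting; the example prompt text is hypothetical.

```python
# The default prompt template from above, with {system_prompt} and {prompt}
# placeholders filled in before the text is sent to the model.
template = (
    "### System Prompt\n{system_prompt}\n"
    "### User Message\n{prompt}\n"
    "### Assistant\n"
)

filled = template.format(
    system_prompt="You are an intelligent programming assistant.",
    prompt="Write a Rust function that reverses a string.",
)
print(filled)
```

For multi-turn conversations, you would override the template and embed the prior turns yourself.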

integer

Maximum new tokens to generate.

Default: -1

number

This parameter controls nucleus (top-p) sampling: the model samples only from the highest-probability words whose cumulative probability reaches this threshold.

Default: 0.75

integer

The number of most-probable next words kept as the pool to sample from.

Default: 40

number

This parameter controls the 'warmth', or randomness, of the model's output. It adjusts how likely the model is to generate new, unexpected text versus sticking closely to what it has seen in training: higher values lead to more creative and diverse responses, while lower values produce safer, more conservative answers. It is particularly useful when you want to balance generating novel output against maintaining accuracy and coherence.

Default: 0.01
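The three sampling parameters above (temperature, top-k, top-p) interact at each decoding step. The following is a minimal self-contained sketch of that interaction, not the model's actual sampling code; the function name and logit values are illustrative.

```python
import math
import random

def sample_next_token(logits, temperature=0.01, top_k=40, top_p=0.75):
    """Sketch of temperature + top-k + top-p (nucleus) sampling."""
    # Temperature scales the logits: lower values sharpen the distribution,
    # making the most likely token dominate.
    scaled = [l / max(temperature, 1e-8) for l in logits]

    # Softmax to probabilities (subtract the max for numerical stability).
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = sorted(
        ((i, e / total) for i, e in enumerate(exps)),
        key=lambda p: p[1],
        reverse=True,
    )

    # Top-k: keep only the k most probable tokens.
    probs = probs[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability
    # reaches the threshold.
    kept, cum = [], 0.0
    for i, p in probs:
        kept.append((i, p))
        cum += p
        if cum >= top_p:
            break

    # Renormalise over the surviving pool and draw one token.
    z = sum(p for _, p in kept)
    r = random.random() * z
    for i, p in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][0]
```

With the defaults above (temperature 0.01), the distribution is so sharp that sampling is nearly deterministic, which suits code generation.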

Output

```rust
pub enum PredictionStatus {
    Starting,
    InProgress,
    Completed,
}
```

Run time and cost

This model costs approximately $0.18 to run on Replicate, or 5 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 4 minutes. The predict time for this model varies significantly based on the inputs.

Readme

TheBloke’s quantized version of Phind’s CodeLlama 34B V2 Instruct model in GGUF format. The full model card can be found here.

Specifically, this is the phind-codellama-34b-v2.Q5_K_M.gguf model, with a 4096-token context window.