lucataco / llama-2-7b-chat

Meta's Llama 2 7b Chat - GPTQ

  • Public
  • 20.1K runs
  • GitHub
  • Paper
  • License

Run time and cost

This model runs on Nvidia A40 (Large) GPU hardware. Predictions typically complete within 6 seconds.

Readme

This is an implementation of the model TheBloke/Llama-2-7b-Chat-GPTQ.

A GPTQ-quantized version of Meta's Llama 2 7B chat model.
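Llama 2 chat models were trained on a specific `[INST]`/`<<SYS>>` prompt template, so prompts sent to this model generally work best when formatted that way. A minimal sketch (the system prompt and user message here are placeholder examples):

```python
def format_llama2_prompt(user_message: str,
                         system_prompt: str = "You are a helpful assistant.") -> str:
    # Llama 2 chat models expect the [INST] ... [/INST] wrapper, with an
    # optional <<SYS>> block for the system prompt inside the first turn.
    return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user_message} [/INST]"

prompt = format_llama2_prompt("What is GPTQ quantization?")
print(prompt)
```

The formatted string can then be passed as the `prompt` input when running the model.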

Give me a follow if you like my work! @lucataco93