Readme
This is an attempt at an implementation of the model TheBloke/Llama-2-70b-Chat-GPTQ,
a quantized version of the Llama 2 70B model.
Meta's Llama 2 70b Chat - GPTQ
This model runs on NVIDIA A100 (80GB) GPU hardware. Predictions typically complete within 5 minutes, but prediction time varies significantly with the inputs.