Readme
Model: https://huggingface.co/TheBloke/airoboros-7B-gpt4-1.4-GPTQ
Fast inference thanks to https://github.com/turboderp/exllama
Test out fast inference with ExLlama and 4-bit GPTQ quantization!
This model runs on Nvidia A100 (80GB) GPU hardware. We don't yet have enough runs of this model to provide performance information.
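A rough sketch of why 4-bit GPTQ quantization matters for fitting and serving a 7B model: weight storage shrinks by about 4x versus fp16. The numbers below are back-of-the-envelope estimates for weights only, ignoring activations, KV cache, and quantization metadata.

```python
# Approximate weight memory for a 7B-parameter model at different precisions.
# Weights only; real usage adds activations, KV cache, and framework overhead.

def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

n_params = 7e9  # airoboros-7B

fp16_gb = weight_memory_gb(n_params, 16)  # half-precision weights
int4_gb = weight_memory_gb(n_params, 4)   # GPTQ 4-bit weights

print(f"fp16:  ~{fp16_gb:.1f} GB")  # ~14.0 GB
print(f"4-bit: ~{int4_gb:.1f} GB")  # ~3.5 GB
```

Either easily fits in an A100's 80 GB, but the 4-bit weights leave far more room for long-context KV cache and batching, which is where ExLlama's speed comes from.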