paragekbote/gemma3-torchao-quant-sparse

An optimized gemma-3-4b setup with INT8 weight-only quantization, torch_compile and sparsity for efficient inference.

Public
12 runs