
paragekbote/gemma3-torchao-quant-sparse

An optimized gemma-3-4b setup with INT8 weight-only quantization, torch.compile, and sparsity for efficient inference.
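The core idea of INT8 weight-only quantization is to store each weight row as int8 values plus a per-channel scale, dequantizing on the fly at inference while activations stay in full precision. The sketch below illustrates that arithmetic in plain Python; it is a conceptual illustration only, not the repo's actual torchao code path (which applies this via `quantize_` with fused kernels).

```python
# Illustrative per-channel symmetric INT8 weight-only quantization.
# Plain Python for clarity; real implementations use fused GPU kernels.

def quantize_rows(weight):
    """Quantize each output channel (row) of a weight matrix to int8.

    Returns (int8 rows, per-row scales). At inference the weight is
    reconstructed on the fly: w ~= q * scale.
    """
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / 127 if max_abs else 1.0  # symmetric range [-127, 127]
        q_rows.append([max(-128, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales

def dequantize_rows(q_rows, scales):
    """Recover approximate float weights from int8 values and scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]

weight = [[0.5, -1.27, 0.02], [2.54, 0.0, -2.54]]
q, s = quantize_rows(weight)
deq = dequantize_rows(q, s)
```

Per-channel scales keep quantization error low even when row magnitudes differ widely, which is why weight-only INT8 typically preserves accuracy while halving weight memory relative to fp16.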
Public · 12 runs
Versions:
- Author: @paragekbote · Version: cuda12.1-python3.10-X64 · Commit: 4fc4406854715fa734121ed81374dac58a0d0336
- 761f5c47 (Latest) · Author: @paragekbote · Version: cuda12.1-python3.10-X64 · Commit: 8192e4347298cae48a9aeb5941ae4ab8e20b5438
- Author: @paragekbote · Version: cuda12.1-python3.10-X64 · Commit: 9ecdb2634057d900c67930024429ee770d0396cc