paragekbote/gemma3-torchao-quant-sparse
An optimized gemma-3-4b setup combining INT8 weight-only quantization, torch.compile, and sparsity for efficient inference.
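As a rough illustration of the idea behind INT8 weight-only quantization (not this project's actual torchao code), here is a minimal NumPy sketch: each weight matrix is stored as int8 with per-output-channel scales and dequantized on the fly at matmul time, so activations stay in floating point. All function names below are hypothetical.

```python
import numpy as np

def quantize_int8_weight_only(w: np.ndarray):
    """Symmetric per-output-channel INT8 quantization of a weight matrix."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # guard against all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def int8_linear(x: np.ndarray, q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Compute y = x @ W.T with W stored as int8, dequantized on the fly."""
    w = q.astype(np.float32) * scale  # dequantize per channel
    return x @ w.T

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16)).astype(np.float32)  # [out_features, in_features]
x = rng.standard_normal((4, 16)).astype(np.float32)  # batch of activations

q, scale = quantize_int8_weight_only(w)
y_ref = x @ w.T
y_q = int8_linear(x, q, scale)
print(np.max(np.abs(y_ref - y_q)))  # small quantization error
```

In the real setup, torchao replaces Linear weights with quantized tensors and torch.compile fuses the dequantize-and-matmul path; the NumPy version only shows the numerics.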
Public · 32 runs
Runs:
- Latest (44626bdc): @paragekbote, cuda12.1-python3.10-X64, commit bb089c7886c920e0dd50e6d002d57b32cc0bbc98
- @paragekbote, cuda12.1-python3.10-X64, commit 4fc4406854715fa734121ed81374dac58a0d0336
- @paragekbote, cuda12.1-python3.10-X64, commit 8192e4347298cae48a9aeb5941ae4ab8e20b5438
- @paragekbote, cuda12.1-python3.10-X64, commit 9ecdb2634057d900c67930024429ee770d0396cc