paragekbote/gemma3-torchao-quant-sparse

An optimized gemma-3-4b setup with INT8 weight-only quantization, torch.compile, and sparsity for efficient inference.
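To make the technique concrete, here is a minimal sketch of INT8 weight-only quantization in plain PyTorch: weights are stored as int8 with per-output-channel scales and dequantized on the fly at matmul time. This is an illustrative stand-in, not this repo's code; the actual setup presumably applies torchao's quantization API to the gemma-3-4b model and wraps it with torch.compile, and the helper names below are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def quantize_int8_weight_only(linear: nn.Linear) -> nn.Linear:
    """Attach int8 weights + per-output-channel scales to a Linear layer.

    Hypothetical helper illustrating weight-only quantization; a real
    setup would use torchao and drop the fp32 weights to save memory.
    """
    w = linear.weight.data
    # Per-output-channel absmax scale: map [-max, max] onto [-127, 127].
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    scale = scale.clamp(min=1e-8)  # avoid division by zero for all-zero rows
    linear.weight_int8 = torch.round(w / scale).to(torch.int8)
    linear.weight_scale = scale
    return linear

def int8_linear_forward(linear: nn.Linear, x: torch.Tensor) -> torch.Tensor:
    """Dequantize the int8 weights at call time, then run a normal matmul."""
    w = linear.weight_int8.to(x.dtype) * linear.weight_scale
    return F.linear(x, w, linear.bias)
```

Weight-only quantization keeps activations in floating point, so accuracy loss is typically small while weight memory drops roughly 4x versus fp32 (2x versus fp16); torch.compile can then fuse the dequantize-and-matmul into efficient kernels.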

Public · 32 runs

Versions (all by @paragekbote, environment cuda12.1-python3.10-X64):
  1. Commit bb089c7886c920e0dd50e6d002d57b32cc0bbc98 (44626bdc, latest)
  2. Commit bb089c7886c920e0dd50e6d002d57b32cc0bbc98
  3. Commit 4fc4406854715fa734121ed81374dac58a0d0336
  4. Commit 8192e4347298cae48a9aeb5941ae4ab8e20b5438
  5. Commit 9ecdb2634057d900c67930024429ee770d0396cc