paragekbote/gemma3-torchao-quant-sparse

An optimized gemma-3-4b setup with INT8 weight-only quantization, torch.compile, and sparsity for efficient inference.

Public · 12 runs

Version history:
1. Author: @paragekbote · Version: cuda12.1-python3.10-X64 · Commit: 4fc4406854715fa734121ed81374dac58a0d0336 · 761f5c47 · Latest
2. Author: @paragekbote · Version: cuda12.1-python3.10-X64 · Commit: 8192e4347298cae48a9aeb5941ae4ab8e20b5438
3. Author: @paragekbote · Version: cuda12.1-python3.10-X64 · Commit: 9ecdb2634057d900c67930024429ee770d0396cc
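
The two compression ideas named in the description can be sketched in plain NumPy: symmetric per-channel INT8 weight-only quantization (weights stored as int8 plus one fp32 scale per output channel, dequantized at matmul time) and 2:4 semi-structured sparsity (two of every four consecutive weights zeroed). This is a minimal illustration, not the repo's actual code — the real setup applies torchao transforms to the gemma-3-4b model, and the helper names here (`quantize_weight_int8`, `prune_2_4`) are hypothetical.

```python
import numpy as np

def quantize_weight_int8(w):
    # Symmetric per-output-channel INT8 quantization:
    # one fp32 scale per row, int8 values clipped to [-127, 127].
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(x, q, scale):
    # Weight-only scheme: activations stay fp32; weights are
    # dequantized (q * scale) on the fly at matmul time.
    return x @ (q.astype(np.float32) * scale).T

def prune_2_4(w):
    # 2:4 semi-structured sparsity: in every contiguous group of
    # four weights, zero the two with the smallest magnitude.
    groups = w.copy().reshape(-1, 4)
    idx = np.argsort(np.abs(groups), axis=1)[:, :2]
    np.put_along_axis(groups, idx, 0.0, axis=1)
    return groups.reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
x = rng.standard_normal((2, 8)).astype(np.float32)

w_sparse = prune_2_4(w)          # 50% of weights are now zero
q, s = quantize_weight_int8(w_sparse)
y_ref = x @ w_sparse.T           # fp32 reference output
y_q = int8_matmul(x, q, s)       # weight-only INT8 output
max_err = np.abs(y_ref - y_q).max()
```

In the real project these transforms would come from torchao rather than hand-rolled NumPy, and torch.compile would then fuse the resulting dequantize-and-matmul into optimized kernels; the sketch only shows why the approximation error of the quantized path stays small.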