paragekbote/gemma3-torchao-quant-sparse

An optimized gemma-3-4b setup with INT8 weight-only quantization, torch.compile, and sparsity for efficient inference.

Public · 12 runs

Version history:
1. Author: @paragekbote · Version: cuda12.1-python3.10-X64 · Commit: 4fc4406854715fa734121ed81374dac58a0d0336 · 761f5c47 · Latest
2. Author: @paragekbote · Version: cuda12.1-python3.10-X64 · Commit: 8192e4347298cae48a9aeb5941ae4ab8e20b5438
3. Author: @paragekbote · Version: cuda12.1-python3.10-X64 · Commit: 9ecdb2634057d900c67930024429ee770d0396cc
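
The two compression ideas named in the description can be sketched in plain NumPy: symmetric per-channel INT8 weight-only quantization (weights stored as int8 plus one fp32 scale per output channel, dequantized at matmul time) and 2:4 semi-structured sparsity (two of every four consecutive weights zeroed). This is a minimal illustration, not the repo's actual code — the real setup applies torchao transforms to the gemma-3-4b model, and the helper names here (`quantize_weight_int8`, `prune_2_4`) are hypothetical.

```python
import numpy as np

def quantize_weight_int8(w):
    # Symmetric per-output-channel INT8 quantization:
    # one fp32 scale per row, int8 values clipped to [-127, 127].
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(x, q, scale):
    # Weight-only scheme: activations stay fp32; weights are
    # dequantized (q * scale) on the fly at matmul time.
    return x @ (q.astype(np.float32) * scale).T

def prune_2_4(w):
    # 2:4 semi-structured sparsity: in every contiguous group of
    # four weights, zero the two with the smallest magnitude.
    groups = w.copy().reshape(-1, 4)
    idx = np.argsort(np.abs(groups), axis=1)[:, :2]
    np.put_along_axis(groups, idx, 0.0, axis=1)
    return groups.reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
x = rng.standard_normal((2, 8)).astype(np.float32)

w_sparse = prune_2_4(w)          # 50% of weights are now zero
q, s = quantize_weight_int8(w_sparse)
y_ref = x @ w_sparse.T           # fp32 reference output
y_q = int8_matmul(x, q, s)       # weight-only INT8 output
max_err = np.abs(y_ref - y_q).max()
```

In the real project these transforms would come from torchao rather than hand-rolled NumPy, and torch.compile would then fuse the resulting dequantize-and-matmul into optimized kernels; the sketch only shows why the approximation error of the quantized path stays small.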