See the full model card here.
The model served here is the AWQ-quantized version from here. Thank you to @TheBloke for sharing this model!
NOTE: As per the license, Replicate was granted permission to share the model here.
NOTE: The sequence length here is limited to 8024 tokens due to GPU memory constraints.
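
To make the limit concrete, here is a minimal sketch of budgeting a request against it. It assumes the 8024-token limit covers the prompt and the generated tokens combined, which this page does not confirm. The helper name `max_new_tokens_for` and its parameters are hypothetical and not part of any API for this model. You would need to count prompt tokens yourself with the model's tokenizer.

```python
# Limit stated in the note above. That it covers prompt + output combined
# is an assumption.
MAX_SEQUENCE_TOKENS = 8024


def max_new_tokens_for(
    prompt_token_count: int,
    requested_new_tokens: int,
    limit: int = MAX_SEQUENCE_TOKENS,
) -> int:
    """Clamp the requested generation length so prompt + output fits the limit.

    Hypothetical helper for illustration only.
    """
    if prompt_token_count < 0 or requested_new_tokens < 0:
        raise ValueError("token counts must be non-negative")

    remaining = limit - prompt_token_count
    if remaining <= 0:
        # The prompt alone already fills (or exceeds) the sequence limit.
        raise ValueError(
            f"prompt uses {prompt_token_count} tokens; limit is {limit}"
        )

    return min(requested_new_tokens, remaining)
```

For example, with a 8000-token prompt and a request for 512 new tokens, the helper returns 24, since only 24 tokens remain under the limit.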