alicewuv/whiskii-chat

Fine-Tuned Qwen2.5-7B-Instruct

Public
147 runs

Whiskii-chat (based on Qwen2.5-7B-Instruct)

Model details

  • Base: Qwen2.5-7B-Instruct (fine-tuned, uncensored variant)
  • Parameters: ~7.6B
  • Source: huggingface.co/Qwen/Qwen2.5-7B-Instruct
  • License: Apache-2.0 (inherited from the upstream Qwen2.5-7B-Instruct repo)
  • Chat formatting: uses the tokenizer’s apply_chat_template() to produce the Qwen chat format.
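For reference, the messages list fed to apply_chat_template() presumably looks like this (a sketch; the predictor's exact internal code is an assumption):

```python
# Sketch of the chat-formatting step (assumed; the actual predictor code may differ).
# The system prompt and user prompt become a Qwen-style messages list, which the
# tokenizer then renders into the model's chat template.
def build_messages(prompt: str, system_prompt: str = "You are a helpful assistant."):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ]

# The predictor would then call something like:
#   text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```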

⚠️ Safety & compliance: This is an uncensored model. If you publish it, configure external moderation/guardrails as needed and comply with Replicate’s policies and the model’s license.


Inputs

| Name | Type | Default | Range | Description |
| --- | --- | --- | --- | --- |
| prompt | string | (required) | — | User message/content to generate from. |
| system_prompt | string | "You are a helpful assistant." | — | Optional system/behavior instruction, placed before user content. |
| max_new_tokens | integer | 512 | 1–4096 | Maximum new tokens to generate. |
| temperature | float | 0.7 | 0.0–2.0 | Sampling temperature; set 0 for greedy. |
| top_p | float | 0.9 | 0.0–1.0 | Nucleus sampling (top-p). |
| repetition_penalty | float | 1.05 | 0.8–2.0 | Penalty to reduce repetition. |
| stop | string (token) | null | — | Optional single token used as eos_token_id. |
| n | integer | 1 | 1–4 | Number of candidates to generate. Multiple candidates are concatenated with separators in the single-string output. |
| seed | integer | null | — | Optional RNG seed for reproducibility. |
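Putting the table together, a request that sets every input might look like the payload below (all values are illustrative, not recommendations):

```python
# Illustrative full input payload for whiskii-chat (example values only).
payload = {
    "prompt": "Summarize the plot of Hamlet in three sentences.",  # required
    "system_prompt": "You are a helpful assistant.",
    "max_new_tokens": 256,       # 1-4096
    "temperature": 0.7,          # 0.0-2.0; 0 = greedy
    "top_p": 0.9,                # 0.0-1.0
    "repetition_penalty": 1.05,  # 0.8-2.0
    "stop": None,                # optional single token used as eos_token_id
    "n": 2,                      # 1-4 candidates, joined in one string
    "seed": 42,                  # fix the RNG for reproducibility
}
```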

Environment knobs

  • LOAD_IN_8BIT=1 — load weights in 8-bit via bitsandbytes (helps fit smaller GPUs).
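For example, when running the published container image locally, the knob could be passed into the container like this (the image path and port are assumptions about the deployment setup):

```shell
# Pass the 8-bit-loading knob into the model container (image name illustrative).
docker run -e LOAD_IN_8BIT=1 --gpus=all -p 5000:5000 r8.im/alicewuv/whiskii-chat
```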

Output

  • Type: string
  • Behavior:
      • Returns a plain string when n = 1.
      • When n > 1, the candidates are joined with \n\n---\n\n between them.
      • If the \nassistant\n marker (from the Qwen chat template) is present, content before it is trimmed.
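Given that behavior, individual candidates can be recovered client-side (a sketch; the separator and marker strings are taken from the description above):

```python
ASSISTANT_MARKER = "\nassistant\n"   # Qwen chat-template marker
SEPARATOR = "\n\n---\n\n"            # joins candidates when n > 1

def split_candidates(output: str) -> list[str]:
    """Split an n > 1 output string back into its individual candidates."""
    return output.split(SEPARATOR)

def strip_before_marker(candidate: str) -> str:
    """Keep only the text after the last assistant marker, if one is present."""
    _, sep, tail = candidate.rpartition(ASSISTANT_MARKER)
    return tail if sep else candidate
```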

Usage

Python

import replicate

client = replicate.Client(api_token="<REPLICATE_API_TOKEN>")
version = "alicewuv/whiskii-chat:<version>"

out = client.run(version, input={
    "prompt": "Write a playful limerick about cats and cloud GPUs.",
    "temperature": 0.6,
    "max_new_tokens": 120,
})
print(out)

JavaScript

import Replicate from "replicate";
const replicate = new Replicate({ auth: process.env.REPLICATE_API_TOKEN });

const output = await replicate.run("alicewuv/whiskii-chat:<version>", {
  input: {
    prompt: "Create a short onboarding message for new beta users.",
    temperature: 0.5,
    max_new_tokens: 120,
  },
});

console.log(output);

Example prompts

  • “Draft a product announcement (150 words) for a new markdown note editor with AI autocomplete.”
  • “Explain transformers to a 10-year-old in 5 sentences.”
  • “Rewrite this paragraph to be more concise: <paste text>.”

Limitations

  • As an uncensored model, output may be unfiltered; add external moderation if deploying publicly.
  • The optional stop input expects a single token (not a full string sequence). For substring stops, add post-processing.
  • Token streaming is not enabled in this template (it can be added later).
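As a workaround for substring stops, truncation can be done client-side after the prediction returns (a minimal sketch):

```python
def truncate_at_stop(text: str, stop_sequence: str) -> str:
    """Cut the output at the first occurrence of a stop substring, if any."""
    idx = text.find(stop_sequence)
    return text[:idx] if idx != -1 else text
```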