# Whiskii-chat (based on Qwen2.5-7B-Instruct)
## Model details

- Base: Qwen2.5-7B-Instruct (fine-tuned, uncensored variant)
- Parameters: ~7.6B
- Source: huggingface.co/Qwen/Qwen2.5-7B-Instruct
- License: Apache-2.0 (inherited from the upstream Qwen2.5-7B-Instruct repo)
- Chat formatting: uses the tokenizer's `apply_chat_template()` for the Qwen chat format.

> ⚠️ Safety & compliance: this is an uncensored model. If you publish it, configure external moderation/guardrails as needed and comply with Replicate's policies and the model's license.
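
For reference, the ChatML-style prompt that `apply_chat_template()` renders for Qwen models looks roughly like this. This is an illustrative sketch only; the authoritative template ships with the tokenizer config.

```python
def build_qwen_prompt(system_prompt, user_prompt):
    """Approximate the Qwen (ChatML-style) layout for a single turn."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"  # generation continues from here
    )

prompt = build_qwen_prompt("You are a helpful assistant.", "Hello!")
```

In the actual handler, prefer `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` so the rendered prompt always matches the model's template.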
## Inputs

| Name | Type | Default | Range | Description |
|---|---|---|---|---|
| prompt | string | — (required) | — | User message/content to generate from. |
| system_prompt | string | "You are a helpful assistant." | — | Optional system/behavior instruction, placed before user content. |
| max_new_tokens | integer | 512 | 1–4096 | Maximum new tokens to generate. |
| temperature | float | 0.7 | 0.0–2.0 | Sampling temperature; set 0 for greedy decoding. |
| top_p | float | 0.9 | 0.0–1.0 | Nucleus (top-p) sampling. |
| repetition_penalty | float | 1.05 | 0.8–2.0 | Penalty to reduce repetition. |
| stop | string (token) | null | — | Optional single token used as `eos_token_id`. |
| n | integer | 1 | 1–4 | Number of candidates to generate. Multiple candidates are concatenated with separators in the single-string output. |
| seed | integer | null | — | Optional RNG seed for reproducibility. |
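
As a sketch of how these inputs might map onto Hugging Face `generate()` keyword arguments (`to_generate_kwargs` is a hypothetical helper; the actual predictor code may differ):

```python
def to_generate_kwargs(inputs):
    """Map the model's public inputs onto transformers generate() kwargs."""
    temperature = inputs.get("temperature", 0.7)
    kwargs = {
        "max_new_tokens": inputs.get("max_new_tokens", 512),
        "repetition_penalty": inputs.get("repetition_penalty", 1.05),
        "num_return_sequences": inputs.get("n", 1),
    }
    if temperature == 0:
        kwargs["do_sample"] = False  # temperature 0 means greedy decoding
    else:
        kwargs["do_sample"] = True
        kwargs["temperature"] = temperature
        kwargs["top_p"] = inputs.get("top_p", 0.9)
    return kwargs
```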
## Environment knobs

- `LOAD_IN_8BIT=1`: load weights in 8-bit via bitsandbytes (helps fit smaller GPUs).
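
A minimal sketch of honoring this knob at load time (the `should_load_8bit` helper is illustrative; the commented `from_pretrained` call assumes transformers plus bitsandbytes on a CUDA machine):

```python
import os

def should_load_8bit(env):
    """Return True when the LOAD_IN_8BIT knob is set to '1'."""
    return env.get("LOAD_IN_8BIT") == "1"

# At model-load time (sketch):
# model = AutoModelForCausalLM.from_pretrained(
#     "Qwen/Qwen2.5-7B-Instruct",
#     load_in_8bit=should_load_8bit(os.environ),
#     device_map="auto",
# )
```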
## Output

- Type: string
- Behavior:
  - Returns a plain string when `n = 1`.
  - When `n > 1`, multiple candidates are joined with `\n\n---\n\n` between them.
  - The code trims content prior to the `\nassistant\n` marker (Qwen chat template) if present.
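
The trimming and joining behavior above can be sketched as follows (`postprocess` is a hypothetical helper, not necessarily the exact code in this repo):

```python
def postprocess(decoded):
    """Trim each candidate past the chat-template marker, then join."""
    marker = "\nassistant\n"
    cleaned = []
    for text in decoded:
        idx = text.rfind(marker)
        if idx != -1:
            # Drop everything up to and including the marker
            text = text[idx + len(marker):]
        cleaned.append(text.strip())
    return "\n\n---\n\n".join(cleaned)
```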
## Usage

### Python

```python
import replicate

client = replicate.Client(api_token="<REPLICATE_API_TOKEN>")
version = "lilcats/whiskii-chat:<version>"
out = client.run(version, input={
    "prompt": "Write a playful limerick about cats and cloud GPUs.",
    "temperature": 0.6,
    "max_new_tokens": 120,
})
print(out)
```
### JavaScript

```javascript
import Replicate from "replicate";

const replicate = new Replicate({ auth: process.env.REPLICATE_API_TOKEN });
const output = await replicate.run("lilcats/whiskii-chat:<version>", {
  input: {
    prompt: "Create a short onboarding message for new beta users.",
    temperature: 0.5,
    max_new_tokens: 120,
  },
});
console.log(output);
```
## Example prompts
- “Draft a product announcement (150 words) for a new markdown note editor with AI autocomplete.”
- “Explain transformers to a 10-year-old in 5 sentences.”
- “Rewrite this paragraph to be more concise: <paste text>.”
## Limitations

- As an uncensored model, output may be unfiltered; add external moderation if deploying publicly.
- The optional `stop` input expects a single token (not a full string sequence). For substring stops, add post-processing.
- Streaming tokens are not enabled in this template (they can be added later).
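
For substring stops, a minimal post-processing sketch (`truncate_at_stop` is a hypothetical helper that cuts the output at the first occurrence of the stop string):

```python
def truncate_at_stop(text, stop):
    """Return text truncated before the first occurrence of stop, if any."""
    if stop:
        idx = text.find(stop)
        if idx != -1:
            return text[:idx]
    return text
```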