alicewuv/whiskii-chat

Fine-Tuned Qwen2.5-7B-Instruct

Public
147 runs

Whiskii-chat (based on Qwen2.5-7B-Instruct)

Model details

  • Base: Qwen2.5-7B-Instruct (fine-tuned, uncensored variant)
  • Parameters: ~7.6B
  • Source: huggingface.co/Qwen/Qwen2.5-7B-Instruct
  • License: Apache-2.0 (inherited from the upstream Qwen2.5-7B-Instruct repo)
  • Chat formatting: uses the tokenizer’s apply_chat_template() to produce the Qwen chat format.
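For reference, the messages list fed to apply_chat_template() presumably looks like this (a sketch; the predictor's exact internal code is an assumption):

```python
# Sketch of the chat-formatting step (assumed; the actual predictor code may differ).
# The system prompt and user prompt become a Qwen-style messages list, which the
# tokenizer then renders into the model's chat template.
def build_messages(prompt: str, system_prompt: str = "You are a helpful assistant."):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ]

# The predictor would then call something like:
#   text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```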

⚠️ Safety & compliance: This is an uncensored model. If you publish it, configure external moderation/guardrails as needed and comply with Replicate’s policies and the model’s license.


Inputs

| Name | Type | Default | Range | Description |
| --- | --- | --- | --- | --- |
| prompt | string | (required) | — | User message/content to generate from. |
| system_prompt | string | "You are a helpful assistant." | — | Optional system/behavior instruction, placed before user content. |
| max_new_tokens | integer | 512 | 1–4096 | Maximum new tokens to generate. |
| temperature | float | 0.7 | 0.0–2.0 | Sampling temperature; set 0 for greedy. |
| top_p | float | 0.9 | 0.0–1.0 | Nucleus sampling (top-p). |
| repetition_penalty | float | 1.05 | 0.8–2.0 | Penalty to reduce repetition. |
| stop | string (token) | null | — | Optional single token used as eos_token_id. |
| n | integer | 1 | 1–4 | Number of candidates to generate. Multiple candidates are concatenated with separators in the single-string output. |
| seed | integer | null | — | Optional RNG seed for reproducibility. |
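Putting the table together, a request that sets every input might look like the payload below (all values are illustrative, not recommendations):

```python
# Illustrative full input payload for whiskii-chat (example values only).
payload = {
    "prompt": "Summarize the plot of Hamlet in three sentences.",  # required
    "system_prompt": "You are a helpful assistant.",
    "max_new_tokens": 256,       # 1-4096
    "temperature": 0.7,          # 0.0-2.0; 0 = greedy
    "top_p": 0.9,                # 0.0-1.0
    "repetition_penalty": 1.05,  # 0.8-2.0
    "stop": None,                # optional single token used as eos_token_id
    "n": 2,                      # 1-4 candidates, joined in one string
    "seed": 42,                  # fix the RNG for reproducibility
}
```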

Environment knobs

  • LOAD_IN_8BIT=1 — load weights in 8-bit via bitsandbytes (helps fit smaller GPUs).
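For example, when running the published container image locally, the knob could be passed into the container like this (the image path and port are assumptions about the deployment setup):

```shell
# Pass the 8-bit-loading knob into the model container (image name illustrative).
docker run -e LOAD_IN_8BIT=1 --gpus=all -p 5000:5000 r8.im/alicewuv/whiskii-chat
```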

Output

  • Type: string
  • Behavior:
      • Returns a plain string when n = 1.
      • When n > 1, the candidates are joined with \n\n---\n\n between them.
      • If the \nassistant\n marker (from the Qwen chat template) is present, content before it is trimmed.
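Given that behavior, individual candidates can be recovered client-side (a sketch; the separator and marker strings are taken from the description above):

```python
ASSISTANT_MARKER = "\nassistant\n"   # Qwen chat-template marker
SEPARATOR = "\n\n---\n\n"            # joins candidates when n > 1

def split_candidates(output: str) -> list[str]:
    """Split an n > 1 output string back into its individual candidates."""
    return output.split(SEPARATOR)

def strip_before_marker(candidate: str) -> str:
    """Keep only the text after the last assistant marker, if one is present."""
    _, sep, tail = candidate.rpartition(ASSISTANT_MARKER)
    return tail if sep else candidate
```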

Usage

Python

import replicate

client = replicate.Client(api_token="<REPLICATE_API_TOKEN>")
version = "alicewuv/whiskii-chat:<version>"

out = client.run(version, input={
    "prompt": "Write a playful limerick about cats and cloud GPUs.",
    "temperature": 0.6,
    "max_new_tokens": 120,
})
print(out)

JavaScript

import Replicate from "replicate";
const replicate = new Replicate({ auth: process.env.REPLICATE_API_TOKEN });

const output = await replicate.run("alicewuv/whiskii-chat:<version>", {
  input: {
    prompt: "Create a short onboarding message for new beta users.",
    temperature: 0.5,
    max_new_tokens: 120,
  },
});

console.log(output);

Example prompts

  • “Draft a product announcement (150 words) for a new markdown note editor with AI autocomplete.”
  • “Explain transformers to a 10-year-old in 5 sentences.”
  • “Rewrite this paragraph to be more concise: <paste text>.”

Limitations

  • As an uncensored model, output may be unfiltered; add external moderation if deploying publicly.
  • The optional stop input expects a single token (not a full string sequence). For substring stops, add post-processing.
  • Token streaming is not enabled in this template (it can be added later).
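As a workaround for substring stops, truncation can be done client-side after the prediction returns (a minimal sketch):

```python
def truncate_at_stop(text: str, stop_sequence: str) -> str:
    """Cut the output at the first occurrence of a stop substring, if any."""
    idx = text.find(stop_sequence)
    return text[:idx] if idx != -1 else text
```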