zsxkib / qwen2-0.5b-instruct

Qwen 2: A 0.5 billion parameter language model from Alibaba Cloud, fine-tuned for chat completions


Run time and cost

This model costs approximately $0.00072 per run on Replicate (about 1,388 runs per $1), though the actual cost varies with your inputs. It is also open source, so you can run it on your own computer with Docker.

This model runs on Nvidia A40 (Large) GPU hardware. Predictions typically complete within 1 second.

Readme

Qwen2-0.5B-Instruct on Replicate

This Replicate model provides access to the Qwen2-0.5B-Instruct model, part of the Qwen2 language model series. It offers three variants:

  • Qwen/Qwen2-0.5B-Instruct: Full precision model
  • Qwen/Qwen2-0.5B-Instruct-GPTQ-Int8: 8-bit quantized model
  • Qwen/Qwen2-0.5B-Instruct-GPTQ-Int4: 4-bit quantized model

Introduction

Qwen2 is the latest series of Qwen large language models, offering both pretrained and instruction-tuned models in five sizes: 0.5B, 1.5B, 7B, 57B-A14B, and 72B. This Replicate implementation focuses on the instruction-tuned 0.5B Qwen2 model.

Qwen2 demonstrates competitive performance against state-of-the-art open-source and proprietary models across various benchmarks, including language understanding, generation, multilingual capability, coding, mathematics, and reasoning.

For more details about Qwen2, see the official Qwen2 GitHub repository and technical report.

Model Details

Qwen2 is based on the Transformer architecture and incorporates:

  • SwiGLU activation
  • Attention QKV bias
  • Group query attention
  • Improved tokenizer for multiple natural languages and code
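Group query attention reduces memory use by letting several query heads share a single key/value head, shrinking the KV cache relative to full multi-head attention. The NumPy sketch below is purely illustrative of the mechanism (head counts, dimensions, and the function name are made up here, not taken from the actual Qwen2 implementation):

```python
import numpy as np

def grouped_query_attention(q, k, v, num_kv_heads):
    """Toy grouped-query attention.

    q: (num_q_heads, seq_len, head_dim)
    k, v: (num_kv_heads, seq_len, head_dim), where num_kv_heads divides num_q_heads.
    """
    num_q_heads, seq_len, head_dim = q.shape
    group_size = num_q_heads // num_kv_heads

    # Each group of query heads attends using the same shared K/V head.
    k = np.repeat(k, group_size, axis=0)  # -> (num_q_heads, seq_len, head_dim)
    v = np.repeat(v, group_size, axis=0)

    # Standard scaled dot-product attention per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (num_q_heads, seq_len, head_dim)
```

With 8 query heads and 2 key/value heads, the K/V tensors are a quarter the size they would be under standard multi-head attention, while the output shape matches the query side.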

Training Details

The model underwent pretraining with a large dataset, followed by post-training using both supervised fine-tuning and direct preference optimization.

Quickstart

To use this Replicate implementation:

  1. Visit the Replicate model page.

  2. Use the web interface or API to run a prediction with your desired parameters.
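From Python, a call through the Replicate client might look like the sketch below. The `build_input` helper is hypothetical, introduced here only to collect the parameters; the parameter names and defaults mirror the Cog prediction inputs shown in the local-development steps, and the model identifier is the one on this page:

```python
# Illustrative sketch of calling this model via the Replicate Python client.
# `build_input` is a made-up helper; the keys mirror this model's Cog inputs.

def build_input(prompt, system_prompt="You are a funny and helpful assistant.", **overrides):
    """Assemble a prediction input dict with this model's default parameters."""
    params = {
        "prompt": prompt,
        "system_prompt": system_prompt,
        "model_type": "Qwen2-0.5B-Instruct",
        "max_new_tokens": 512,
        "temperature": 1,
        "top_k": 1,
        "top_p": 1,
        "repetition_penalty": 1,
    }
    params.update(overrides)  # override any default, e.g. temperature
    return params

# Requires `pip install replicate` and a REPLICATE_API_TOKEN in the environment:
# import replicate
# output = replicate.run(
#     "zsxkib/qwen2-0.5b-instruct",
#     input=build_input("Tell me a funny joke about cowboys"),
# )
# print("".join(output))
```

The actual `replicate.run` call is left commented out since it needs an API token and network access.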

For local testing or development:

  1. Clone the repository and enter it:

     git clone -b Qwen2-0.5B-Instruct https://github.com/zsxkib/cog-qwen-2.git
     cd cog-qwen-2

  2. Run a prediction using Cog:

     cog predict \
       -i 'top_k=1' \
       -i 'top_p=1' \
       -i 'prompt="Tell me a funny joke about cowboys in the style of Yoda from star wars"' \
       -i 'model_type="Qwen2-0.5B-Instruct"' \
       -i 'temperature=1' \
       -i 'system_prompt="You are a funny and helpful assistant."' \
       -i 'max_new_tokens=512' \
       -i 'repetition_penalty=1'

Evaluation

Performance comparison between the Qwen2 instruction-tuned models and their Qwen1.5 counterparts:

Dataset                      Qwen1.5-0.5B-Chat   Qwen2-0.5B-Instruct   Qwen1.5-1.8B-Chat   Qwen2-1.5B-Instruct
MMLU                         35.0                37.9                  43.7                52.4
HumanEval                    9.1                 17.1                  25.0                37.8
GSM8K                        11.3                40.1                  35.3                61.6
C-Eval                       37.2                45.2                  55.3                63.8
IFEval (Prompt Strict-Acc.)  14.6                20.0                  16.8                29.0

Citation

If you find the Qwen2 model helpful in your work, please cite:

@article{qwen2,
  title={Qwen2 Technical Report},
  year={2024}
}

License

The Qwen2 model is licensed under the Apache 2.0 License.

Credits and Support