tomasmcm / neural-chat-7b-v3-1

Source: Intel/neural-chat-7b-v3-1 ✦ Quant: TheBloke/neural-chat-7B-v3-1-AWQ ✦ Fine-tuned model based on mistralai/Mistral-7B-v0.1

  • Public
  • 775 runs
  • L40S

Input

prompt: string (required)

Text prompt to send to the model.

max_tokens: integer

Maximum number of tokens to generate per output sequence.

Default: 128

presence_penalty: number
(minimum: -5, maximum: 5)

Float that penalizes new tokens based on whether they appear in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens.

Default: 0

frequency_penalty: number
(minimum: -5, maximum: 5)

Float that penalizes new tokens based on their frequency in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens.

Default: 0

temperature: number
(minimum: 0.01, maximum: 5)

Float that controls the randomness of the sampling. Lower values make the model more deterministic, while higher values make the model more random. Zero would mean greedy sampling, but the minimum accepted here is 0.01.

Default: 0.8

top_p: number
(minimum: 0.01, maximum: 1)

Float that controls the cumulative probability of the top tokens to consider. Must be in (0, 1]. Set to 1 to consider all tokens.

Default: 0.95

top_k: integer

Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens.

Default: -1

stop: string

List of strings that stop the generation when they are generated. The returned output will not contain the stop strings.
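To make the interaction of temperature, top-k, and top-p more concrete, here is a minimal sketch in pure Python on a toy four-token distribution. It is not the serving code behind this model: the helper name `filter_distribution` and the example logits are hypothetical, and real inference engines may order or implement these steps differently.

```python
import math


def filter_distribution(logits, temperature=0.8, top_p=0.95, top_k=-1):
    """Return the renormalized token probabilities that survive filtering.

    `logits` maps token -> raw score. Hypothetical helper, for illustration only.
    """
    # Temperature rescales the logits: lower values sharpen the distribution.
    scaled = {tok: score / temperature for tok, score in logits.items()}

    # Softmax, subtracting the maximum for numerical stability.
    peak = max(scaled.values())
    exps = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(
        ((tok, e / total) for tok, e in exps.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

    # top_k = -1 means "consider all tokens"; otherwise keep the k most likely.
    if top_k != -1:
        ranked = ranked[:top_k]
        mass = sum(p for _, p in ranked)
        ranked = [(tok, p / mass) for tok, p in ranked]

    # top_p keeps the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    mass = sum(p for _, p in kept)
    return {tok: p / mass for tok, p in kept}


toy_logits = {"Bund": 2.0, "Garden": 1.0, "Temple": 0.5, "Tower": -1.0}
print(filter_distribution(toy_logits))                        # page defaults
print(filter_distribution(toy_logits, top_p=1.0, top_k=2))    # two best tokens only
print(filter_distribution(toy_logits, temperature=0.01))      # near-greedy
```

With a very low temperature almost all probability mass lands on the highest-scoring token, which is why small values behave nearly deterministically.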

Output

In Shanghai, there are numerous must-visit attractions that cater to various interests. Here's a list of top five attractions you simply cannot miss: 1. The Bund: A famous waterfront promenade lined with beautiful colonial architecture. 2. Yuyuan Garden: A historic garden filled with traditional Chinese elements, rich in history and culture. 3. Shanghai Disney Resort: A stunning amusement park with themed areas, shows, and entertainment. 4. Jade Buddha Temple: A renowned Buddhist temple featuring exquisite jade Buddha statues. 5.

Run time and cost

This model costs approximately $0.0020 to run on Replicate, or 500 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 3 seconds. The predict time for this model varies significantly based on the inputs.

Readme

Fine-tuning on Habana Gaudi2

This model is fine-tuned from mistralai/Mistral-7B-v0.1 on the open-source dataset Open-Orca/SlimOrca and then aligned with the DPO (Direct Preference Optimization) algorithm. For more details, refer to our blog post: The Practice of Supervised Fine-tuning and Direct Preference Optimization on Habana Gaudi2.

Model date

Neural-chat-7b-v3-1 was trained between September and October, 2023.

Evaluation

We submitted the model to the open_llm_leaderboard. Its performance improved significantly, as shown by the average score across the leaderboard's 7 tasks.

| Model | Average ⬆️ | ARC (25-s) ⬆️ | HellaSwag (10-s) ⬆️ | MMLU (5-s) ⬆️ | TruthfulQA (MC) (0-s) ⬆️ | Winogrande (5-s) | GSM8K (5-s) | DROP (3-s) |
|---|---|---|---|---|---|---|---|---|
| mistralai/Mistral-7B-v0.1 | 50.32 | 59.58 | 83.31 | 64.16 | 42.15 | 78.37 | 18.12 | 6.14 |
| Intel/neural-chat-7b-v3 | 57.31 | 67.15 | 83.29 | 62.26 | 58.77 | 78.06 | 1.21 | 50.43 |
| Intel/neural-chat-7b-v3-1 | 59.06 | 66.21 | 83.64 | 62.37 | 59.65 | 78.14 | 19.56 | 43.84 |
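As a quick check of how the Average column relates to the seven task scores, the sketch below recomputes an unweighted mean per model. It assumes the average is a simple arithmetic mean, which the source does not state explicitly. Under that assumption the two neural-chat rows reproduce their listed averages, while the recomputed value for the base model comes out slightly below its listed figure.

```python
# Per-task scores copied from the table above, in column order:
# ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K, DROP.
scores = {
    "mistralai/Mistral-7B-v0.1": [59.58, 83.31, 64.16, 42.15, 78.37, 18.12, 6.14],
    "Intel/neural-chat-7b-v3": [67.15, 83.29, 62.26, 58.77, 78.06, 1.21, 50.43],
    "Intel/neural-chat-7b-v3-1": [66.21, 83.64, 62.37, 59.65, 78.14, 19.56, 43.84],
}


def mean(values):
    """Unweighted arithmetic mean (assumed to be how the leaderboard averages)."""
    return sum(values) / len(values)


averages = {name: round(mean(vals), 2) for name, vals in scores.items()}
for name, avg in averages.items():
    print(f"{name}: {avg}")
```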

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-04
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-HPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 2.0
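Two of these settings are easier to read with a little arithmetic. The sketch below shows how the total training batch size follows from the per-device batch size, device count, and gradient accumulation, and what a cosine schedule with a 3% warmup could look like. The schedule shape (linear warmup, then cosine decay to zero) and the value of `total_steps` are assumptions for illustration; the source does not give the step count or the exact scheduler implementation.

```python
import math

# Values taken from the hyperparameter list above.
learning_rate = 1e-4
train_batch_size = 1            # per device
num_devices = 8
gradient_accumulation_steps = 8
warmup_ratio = 0.03

# Samples contributing to each optimizer step.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)


def cosine_lr(step, total_steps, peak_lr=learning_rate, ratio=warmup_ratio):
    """Linear warmup to peak_lr, then cosine decay to zero (assumed shape)."""
    warmup_steps = int(total_steps * ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical; not stated in the source
for step in (0, 15, 30, 500, 1000):
    print(step, cosine_lr(step, total_steps))
```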

Prompt Template

### System:
{system}
### User:
{usr}
### Assistant:
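
A minimal sketch of filling this template in Python follows. The helper name `build_prompt` and the example messages are hypothetical, and ending the string with a newline after the final header is an assumption, since the template does not show what follows `### Assistant:`.

```python
def build_prompt(system: str, user: str) -> str:
    """Fill the template above; the model's reply is generated after the last header."""
    return f"### System:\n{system}\n### User:\n{user}\n### Assistant:\n"


prompt = build_prompt(
    system="You are a helpful travel assistant.",            # hypothetical system message
    user="What are the top five attractions in Shanghai?",   # hypothetical user message
)
print(prompt)
```

The resulting string can be passed as the text prompt input described earlier on this page.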

Inference with transformers

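import transformers

model_name = 'Intel/neural-chat-7b-v3-1'
# Load the tokenizer and the fine-tuned causal language model from the Hugging Face Hub.
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
model = transformers.AutoModelForCausalLM.from_pretrained(model_name)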

Ethical Considerations and Limitations

neural-chat-7b-v3-1 can produce factually incorrect output, and should not be relied on to produce factually accurate information. neural-chat-7b-v3-1 was trained on Open-Orca/SlimOrca based on mistralai/Mistral-7B-v0.1. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

Therefore, before deploying any applications of neural-chat-7b-v3-1, developers should perform safety testing.

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

Organizations developing the model

The NeuralChat team with members from Intel/SATG/AIA/AIPT. Core team members: Kaokao Lv, Liang Lv, Chang Wang, Wenxin Zhang, Xuhui Ren, and Haihao Shen.

  • Intel Neural Compressor link
  • Intel Extension for Transformers link