titocosta / starling

Starling-LM-7B-alpha

Supports Replicate streaming.
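Since streaming is supported, here is a minimal sketch of calling the model with the Replicate Python client. The bare `titocosta/starling` model reference and the `prompt` input key are assumptions, so check the model's API schema on its Replicate page (and set `REPLICATE_API_TOKEN`) before running.

```python
import replicate

# Minimal streaming sketch (assumes REPLICATE_API_TOKEN is set in the environment).
# The model reference and the "prompt" input key are assumptions; verify them
# against the input schema shown on the model's Replicate page.
for event in replicate.stream(
    "titocosta/starling",
    input={"prompt": "Explain RLAIF in two sentences."},
):
    # Each event is a server-sent event carrying a chunk of generated text.
    print(str(event), end="")
```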

The model card follows below.

  • Developed by: Banghua Zhu*, Evan Frick*, Tianhao Wu*, Hanlin Zhu and Jiantao Jiao
  • Model type: Language model finetuned with RLHF / RLAIF
  • License: Non-commercial license
  • Finetuned from model: Openchat 3.5 (based on Mistral-7B-v0.1)

We introduce Starling-7B, an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). The model harnesses the power of our new GPT-4 labeled ranking dataset, berkeley-nest/Nectar, and our new reward training and policy tuning pipeline. Starling-7B-alpha scores 8.09 on MT-Bench with GPT-4 as a judge, outperforming every model to date on MT-Bench except OpenAI's GPT-4 and GPT-4 Turbo. We release the ranking dataset Nectar, the reward model Starling-RM-7B-alpha, and the language model Starling-LM-7B-alpha on HuggingFace, along with an online demo in LMSYS Chatbot Arena. Stay tuned for our forthcoming code and paper, which will provide more details on the whole process.

Starling-LM-7B-alpha is a language model trained from Openchat 3.5 with the reward model berkeley-nest/Starling-RM-7B-alpha and the policy optimization method Advantage-Induced Policy Alignment (APA).
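This page does not spell out the APA objective; as rough context only (a hedged sketch summarized from the APA paper, not the exact formulation used for this model), APA treats the advantage-reweighted initial policy as a target and fits the finetuned policy to it with a squared error in log-probability space, with λ acting as a regularization temperature:

$$
\pi^{*}(a \mid s) \;\propto\; \pi_{\text{init}}(a \mid s)\,\exp\!\big(A(s, a)/\lambda\big),
\qquad
\mathcal{L}_{\mathrm{APA}}(\theta) \;\approx\; \mathbb{E}\Big[\big(\log \pi_\theta(a \mid s) - \log \pi_{\text{init}}(a \mid s) - A(s, a)/\lambda\big)^{2}\Big]
$$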