titocosta / starling

Starling-LM-7B-alpha

Run time and cost

This model costs approximately $0.24 to run on Replicate, or 4 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A40 (Large) GPU hardware. Predictions typically complete within 6 minutes. The predict time for this model varies significantly based on the inputs.
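
To get a sense of a typical invocation, here is a minimal sketch using the official `replicate` Python client. The input names (`prompt`, `max_new_tokens`) are assumptions based on typical Replicate language models; check this model's API tab for the actual schema.

```python
# pip install replicate
# export REPLICATE_API_TOKEN=<your token>
import replicate

# Hypothetical input schema: "prompt" and "max_new_tokens" are common for
# Replicate LLMs, but verify against this model's API tab.
output = replicate.run(
    "titocosta/starling",
    input={
        "prompt": "What is reinforcement learning from AI feedback?",
        "max_new_tokens": 512,
    },
)

# Language models on Replicate typically return the text as a sequence of
# chunks, which can be joined into a single string.
print("".join(output))
```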

Readme

Starling-LM-7B-alpha

Supports Replicate streaming.
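
A minimal streaming sketch with the `replicate` Python client (requires a recent client version; the `prompt` input name is an assumption, so consult the model's API schema):

```python
import replicate

# Print tokens as they are generated instead of waiting for the full output.
for event in replicate.stream(
    "titocosta/starling",
    input={"prompt": "Explain RLAIF in one paragraph."},
):
    print(str(event), end="", flush=True)
```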

The original model card follows below.

- Developed by: Banghua Zhu*, Evan Frick*, Tianhao Wu*, Hanlin Zhu, and Jiantao Jiao
- Model type: Language model fine-tuned with RLHF / RLAIF
- License: Non-commercial license
- Finetuned from model: Openchat 3.5 (based on Mistral-7B-v0.1)

We introduce Starling-7B, an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). The model harnesses the power of our new GPT-4-labeled ranking dataset, berkeley-nest/Nectar, and our new reward training and policy tuning pipeline. Starling-7B-alpha scores 8.09 on MT-Bench with GPT-4 as a judge, outperforming every model to date on MT-Bench except OpenAI's GPT-4 and GPT-4 Turbo. We release the ranking dataset Nectar, the reward model Starling-RM-7B-alpha, and the language model Starling-LM-7B-alpha on HuggingFace, along with an online demo in LMSYS Chatbot Arena. Stay tuned for our forthcoming code and paper, which will provide more details on the whole process.

Starling-LM-7B-alpha is a language model trained from Openchat 3.5 with the reward model berkeley-nest/Starling-RM-7B-alpha and the policy optimization method Advantage-Induced Policy Alignment (APA).
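
Since the weights are released on HuggingFace, the model can also be loaded directly with `transformers`. A minimal sketch, assuming the OpenChat 3.5 prompt format of the base model (verify the template against the HuggingFace model card before relying on it):

```python
# pip install torch transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "berkeley-nest/Starling-LM-7B-alpha"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Assumed prompt template inherited from Openchat 3.5; check the
# HuggingFace model card for the authoritative format.
prompt = "GPT4 Correct User: Hello!<|end_of_turn|>GPT4 Correct Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```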