Google's Gemma2 models, language model leaderboard, tips for Stable Diffusion 3
Posted by @deepfates
Editor’s note
It’s been a long week for me and I have many more busy days before I can actually catch up on everything. Forgive me for sending you such a short letter. I couldn’t bear to send nothing at all.
Trending models
New language models from Google
The new Gemma2 models were released in 9B and 27B sizes. They're trained on far more tokens than compute-optimal scaling would call for, as seems to be the trend since Llama 3 at least. They're also said to be distilled from larger Gemini models. And everyone's talking about the alternating global/local attention layers, which also show up in the Character.AI blog post (see below). There's a toy sketch of that pattern after the links.
post | paper | try on replicate
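Here's a minimal illustration of the alternating idea, my own sketch rather than Gemma2's actual code: even-indexed layers attend causally over the whole context, odd-indexed layers only over a small sliding window.

```python
# Toy sketch of alternating global/local attention masks (illustrative, not Gemma2's code).
import numpy as np

def attention_mask(layer_idx: int, seq_len: int, window: int = 4) -> np.ndarray:
    """Boolean causal mask for one layer (True = this query may attend to that key)."""
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    causal = j <= i
    if layer_idx % 2 == 0:
        return causal                     # global layer: all earlier tokens
    return causal & (i - j < window)      # local layer: only the last few tokens

for layer in range(4):
    kind = "global" if layer % 2 == 0 else "local"
    print(f"layer {layer} ({kind}):\n{attention_mask(layer, 6).astype(int)}")
```

The appeal of the pattern: the sliding-window layers only ever need a small KV cache, while the occasional global layers keep long-range context available.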
Cool tools
Updated leaderboard for language models
Hugging Face has updated its previous meta-benchmark to include harder evaluations. They chose evals that are high quality, reliable, not widely leaked into training datasets, and that measure interesting skills. The rankings pass my sniff test so far: Qwen 72B holds a strong lead over Meta Llama 3, which edges out Mixtral 8x22B, and so on.
Research radar
How to optimize AI inference for real
Character.AI serves 20,000 inference queries per second. This is a concise yet specific guide to the optimizations they use to do that, including hybrid attention (as mentioned above) and stateful caching for the long, repetitive chat histories they have to include with every turn of the conversation. A toy sketch of the caching idea is below.
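Roughly, the caching idea is to key the cached prefix by conversation, so each new turn only prefills tokens that weren't already processed. This is my sketch of the concept, not their implementation:

```python
# Toy sketch of stateful prefix caching, keyed by conversation (not Character.AI's code).
from dataclasses import dataclass, field

@dataclass
class Conversation:
    token_count: int = 0                          # tokens already in the cached prefix
    kv_cache: list = field(default_factory=list)  # stand-in for per-layer key/value tensors

class StatefulCache:
    def __init__(self):
        self.sessions: dict[str, Conversation] = {}

    def prefill(self, session_id: str, tokens: list[int]) -> int:
        """Extend the cached prefix for this conversation; return how many tokens were new."""
        conv = self.sessions.setdefault(session_id, Conversation())
        new_tokens = tokens[conv.token_count:]    # skip everything already cached
        conv.kv_cache.extend(new_tokens)          # pretend we computed KV entries for them
        conv.token_count = len(tokens)
        return len(new_tokens)

cache = StatefulCache()
history = [1, 2, 3, 4, 5]
print(cache.prefill("chat-42", history))           # turn one: 5 tokens prefilled
print(cache.prefill("chat-42", history + [6, 7]))  # turn two: only the 2 new tokens
```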
Changelog
How to get the best results from Stable Diffusion 3
Stable Diffusion 3 has been out for a couple of weeks now. Our in-house AI experimenter @fofrAI has gotten some great results, but it's not always easy. Learn how to pick the right version, craft quality prompts, and dial in the right settings in our blog post.
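If you'd rather try it from code, here's a hedged example using the Replicate Python client. The model slug and input names are assumptions on my part, so check the model page for the exact schema.

```python
# Assumes REPLICATE_API_TOKEN is set; slug and input names are illustrative, not verified.
import replicate

output = replicate.run(
    "stability-ai/stable-diffusion-3",   # assumed model slug; confirm on replicate.com
    input={
        "prompt": "a watercolor lighthouse at dusk, soft light, film grain",
        "aspect_ratio": "3:2",           # illustrative parameter names
        "cfg": 4.5,
        "steps": 28,
    },
)
print(output)  # typically a list of image URLs
```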
Bye for now
That’s it. That’s literally all that happened this week. Am I wrong? Reply and let me know. I will make my apologies next week.
— deepfates