Replicate Intelligence #6

Welcome to Replicate’s weekly bulletin! Each week, we’ll bring you updates on the latest open-source AI models, tools, and research. People are making cool stuff and we want to share it with you. Without further ado, here’s our hacker-in-residence deepfates with an unfiltered take on the week in AI.

Editor’s note

It’s been a long week for me and I have many more busy days before I can actually catch up on everything. Forgive me for sending you such a short letter. I couldn’t bear to send nothing at all.

--- deepfates

New language models from Google

The new Gemma 2 models were released in 9B and 27B sizes. They’re overtrained, meaning trained on far more tokens than is compute-optimal for their size, as seems to be the trend since Llama 3 at least. They’re also distilled from larger Gemini models? And everyone’s talking about the alternating global/local attention layers, also found in the Character.AI blog post (see below)…

post | paper | try on replicate
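
If the alternating global/local attention idea is new to you, here is a minimal sketch of the masking pattern in plain Python. It is not Gemma 2's implementation: the function names, the toy sizes, and the choice that even-numbered layers are the local ones are all assumptions made for illustration. Local layers let each token see only a short sliding window of recent tokens, while global layers see the whole causal history.

```python
def causal_mask(seq_len, window=None):
    """Return mask[q][k] = True when query position q may attend to key k.

    window=None gives full (global) causal attention. An integer window
    restricts each query to the most recent `window` positions, itself included.
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            visible = k <= q  # causal: never look at future tokens
            if window is not None:
                visible = visible and (q - k) < window
            row.append(visible)
        mask.append(row)
    return mask


def layer_masks(num_layers, seq_len, window):
    """Alternate local and global masks across layers.

    Assumption for this sketch: even layers are local, odd layers are global.
    """
    return [
        causal_mask(seq_len, window if layer % 2 == 0 else None)
        for layer in range(num_layers)
    ]


def attended_pairs(mask):
    """Count the (query, key) pairs a layer has to compute."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    masks = layer_masks(num_layers=4, seq_len=6, window=2)
    for layer, mask in enumerate(masks):
        kind = "local" if layer % 2 == 0 else "global"
        print(layer, kind, attended_pairs(mask))
```

The appeal is that a local layer's work grows linearly with sequence length rather than quadratically, and it only needs to keep a window's worth of keys and values around, while the interleaved global layers still let information travel across the full context.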

Cool tools

Updated leaderboard for language models

Hugging Face have updated their previous meta-benchmark to include harder evaluations. They choose evals that are high quality, reliable, not widely contaminated in training datasets, and that measure interesting skills. The rankings pass my sniff test so far: Qwen 72B holds a strong lead against Meta Llama 3, which edges out Mixtral 8x22B, and so on.

post | leaderboard

Research radar

How to optimize AI inference for real

Character.AI serve 20,000 inference queries per second. This is a concise yet specific guide to the optimizations they use to do that, including hybrid attention, as mentioned earlier, and stateful caching for the long, repetitive chat histories they have to include with every turn of the conversation.

post
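
To make the caching idea concrete, here is a toy sketch of prefix reuse across chat turns. It is an illustration under stated assumptions, not Character.AI's system: `PrefixCache`, `run_turn`, and the dictionary standing in for cached attention state are all hypothetical names invented for this example. The point is only that when each turn resends the whole history, a cache keyed on the token prefix lets the server skip recomputing everything it has already seen.

```python
from collections import OrderedDict


class PrefixCache:
    """Toy LRU cache mapping a token prefix to a stand-in for its cached state."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self._entries = OrderedDict()  # tuple(tokens) -> state

    def longest_prefix(self, tokens):
        """Return (matched_length, state) for the longest cached prefix."""
        # Linear scan from longest to shortest; fine for a toy, slow at scale.
        for end in range(len(tokens), 0, -1):
            key = tuple(tokens[:end])
            if key in self._entries:
                self._entries.move_to_end(key)  # mark as recently used
                return end, self._entries[key]
        return 0, None

    def store(self, tokens, state):
        key = tuple(tokens)
        self._entries[key] = state
        self._entries.move_to_end(key)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used


def run_turn(cache, tokens):
    """Return how many tokens needed fresh computation for this turn."""
    matched, _state = cache.longest_prefix(tokens)
    new_tokens = len(tokens) - matched
    # Hypothetical stand-in: a real server would keep attention keys/values here.
    cache.store(tokens, {"length": len(tokens)})
    return new_tokens


if __name__ == "__main__":
    cache = PrefixCache(capacity=2)
    history = [1, 2, 3, 4]
    print(run_turn(cache, history))           # first turn: everything is new
    print(run_turn(cache, history + [5, 6]))  # next turn: only the new tokens
```

A production system would need something much cheaper than scanning every prefix length, plus a decision about where the cached state lives; this sketch only shows why repetitive chat histories make the cache worth having.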

Changelog

How to get the best results from Stable Diffusion 3

Stable Diffusion 3 has been out for a couple weeks now. Our in-house AI experimenter @fofrAI has gotten some great results, but it’s not always easy. Learn how to pick the right version, craft quality prompts, and get the right settings in our blog post.