DIY Llama 3 implementation, open-source smart glasses, steering language models with dictionary learning

Editor’s note

Hey, it’s me, deepfates from the internet. They gave me the keys to the mailing list and said I can experiment with it.

Here’s my theory: You’re trying to do cool things with AI. You don’t need any more news bites about the closed AI platforms. Everyone’s already talking about them.

You want to know about new and cool things in AI, the DIY hacker stuff that you might have missed while building.

Well, at Replicate it’s our job to know.

So I’m tapping the deep knowledge of our Replicants and our community to bring you all the news you really want, in one weekly update. And if you just want the changelog, I fully trust you to scroll on past the other bits.

If you like it, and especially if you don’t, hit reply and let me know. I’ll read the replies at standup until they tell me to stop.

deepfates


A commercial-friendly voice cloning model

OpenVoice v2 is a voice cloning model from MIT CSAIL and MyShell.

The new v2 model has better audio quality, multilingual support, and an MIT license that allows commercial use. It supports English, Spanish, French, Chinese, Japanese, and Korean.

try on replicate

Language model implemented from scratch

A tutorial notebook that implements the matrix math for Llama 3 one operation at a time. Great explanation and fun visuals.

From AAAAAAAAAA, a lab focused on making research more accessible.

It looks great! Fully unwrapped it’s a lot easier to see what’s actually going on then with modules nesting and calling each other around

— Andrej Karpathy (twitter)

github | twitter
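To give a taste of that one-operation-at-a-time style, here is a minimal pure-Python sketch of RMSNorm, the normalization Llama models apply before their attention and feed-forward layers. It is not code from the notebook; the toy vector and the default eps=1e-5 are our assumptions.

```python
import math

def rms_norm(x, weight, eps=1e-5):
    """RMSNorm: divide x by its root-mean-square, then scale by a learned weight.

    eps (assumed 1e-5 here) keeps the division safe when x is near zero.
    """
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [w * v * inv_rms for v, w in zip(x, weight)]

# A toy 4-dimensional token embedding with an all-ones learned weight.
x = [1.0, -2.0, 3.0, -4.0]
weight = [1.0] * 4
y = rms_norm(x, weight)
print([round(v, 4) for v in y])  # → [0.3651, -0.7303, 1.0954, -1.4606]
```

Unlike LayerNorm, RMSNorm does not subtract the mean; it only rescales, so the output has a root-mean-square of roughly 1.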

Converting pretrained models to a totally different architecture

Researchers at Toyota Research Institute (yes, that Toyota) find a way to convert existing language models from transformers (like GPT) into recurrent neural networks (like Mamba or RWKV).

Recurrent networks make different tradeoffs than transformers: they’re much more efficient to run, but harder to train. Converting lets you start an RNN from a pre-trained transformer with all its existing knowledge, instead of training it from scratch.

Their method is called Scalable UPtraining for Recurrent Attention (SUPRA); the general process is also known as linearizing, or uptraining. They release a Mistral RNN up-trained with SUPRA, along with code you can use to linearize models yourself.

github | paper | model
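Why can a transformer become an RNN at all? If the softmax weights exp(q·k) are replaced by a linear (kernelized) score φ(q)·φ(k), the same causal attention can be computed either over the whole sequence at once or as a recurrence with a fixed-size state. The pure-Python sketch below checks that equivalence on toy data. The feature map φ(x) = elu(x) + 1 is a common choice in the linear-attention literature and an assumption here; SUPRA's actual kernel and normalization are described in the paper and may differ.

```python
import math

def phi(v):
    # Positive feature map elu(x) + 1 (an assumed choice, common in linear attention work).
    return [x + 1.0 if x > 0 else math.exp(x) for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def parallel_linear_attention(qs, ks, vs):
    """Transformer style: each position re-reads all earlier keys and values."""
    outputs = []
    for t in range(len(qs)):
        fq = phi(qs[t])
        weights = [dot(fq, phi(ks[s])) for s in range(t + 1)]
        total = sum(weights)
        outputs.append([sum(w * vs[s][j] for s, w in enumerate(weights)) / total
                        for j in range(len(vs[0]))])
    return outputs

def recurrent_linear_attention(qs, ks, vs):
    """RNN style: a fixed-size state is updated once per token."""
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # running sum of outer(phi(k), v)
    norm = [0.0] * d_k                         # running sum of phi(k)
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        fk = phi(k)
        for i in range(d_k):
            norm[i] += fk[i]
            for j in range(d_v):
                state[i][j] += fk[i] * v[j]
        fq = phi(q)
        denom = dot(fq, norm)
        outputs.append([sum(fq[i] * state[i][j] for i in range(d_k)) / denom
                        for j in range(d_v)])
    return outputs

# Toy queries, keys, and values for a 3-token sequence.
qs = [[0.1, -0.2], [0.3, 0.5], [-0.4, 0.2]]
ks = [[0.2, 0.1], [-0.3, 0.4], [0.5, -0.1]]
vs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
par = parallel_linear_attention(qs, ks, vs)
rec = recurrent_linear_attention(qs, ks, vs)
```

Both functions return the same outputs, but the recurrent one only keeps `state` and `norm` in memory however long the sequence gets, which is where the inference savings come from.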


Cool tools

Open source smart glasses

Everybody’s announcing smart glasses these days. Meta, Amazon, now… Based Hardware?

The OpenGlass open source project costs $20 to build and won first place at the Meta Llama 3 Hackathon last week. It uses the moondream vision model to understand a stream of images and Llama 3 to talk about them. A small camera and battery attach to any glasses frame.

github | post | youtube demo
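The setup described above suggests a simple loop: caption camera frames with a vision model, keep the recent captions as context, and hand them to a language model when the wearer asks something. Here is a minimal sketch of that loop. `describe_frame` and `ask_llm` are hypothetical stand-ins for calls to moondream and Llama 3, and the buffer size and prompt wording are our assumptions, not OpenGlass's code.

```python
from collections import deque

def describe_frame(frame):
    # Hypothetical stand-in for a vision model such as moondream.
    return f"a photo showing {frame}"

def ask_llm(prompt):
    # Hypothetical stand-in for a language model such as Llama 3.
    return f"(answer based on {prompt.count('- ')} observations)"

class GlassesAssistant:
    def __init__(self, max_captions=5):
        # Keep only the most recent captions so the prompt stays short (assumed limit).
        self.captions = deque(maxlen=max_captions)

    def on_frame(self, frame):
        # Called for each image in the camera stream.
        self.captions.append(describe_frame(frame))

    def build_prompt(self, question):
        context = "\n".join(f"- {c}" for c in self.captions)
        return f"Recent things the camera saw:\n{context}\n\nQuestion: {question}"

    def ask(self, question):
        return ask_llm(self.build_prompt(question))

assistant = GlassesAssistant(max_captions=2)
for frame in ["a coffee mug", "a laptop", "a bicycle"]:
    assistant.on_frame(frame)
print(assistant.build_prompt("What did I just walk past?"))
```

With `max_captions=2`, the oldest caption (the coffee mug) has already dropped out of the prompt by the time the question is asked.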


Research radar

Steer and interpret large language models with dictionary learning

Anthropic published a new paper with the ironically unreadable title Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet.

This scales up their previous work on interpreting and steering language models to Claude 3 Sonnet, the medium-size Claude 3 model.

They train a smaller model (a sparse autoencoder) to look at the language model’s activations and find common features that can be turned up or down to change the model’s outputs. This strategy is called dictionary learning: the autoencoder learns a finite dictionary of features and reconstructs each activation from a sparse combination of them. Many of the learned features turn out to be “concepts” that stay coherent across different languages and contexts.
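A toy version of that setup, as a minimal pure-Python sketch: an encoder maps an activation vector to a larger set of non-negative feature activations, a decoder reconstructs the activation from them, and the training loss trades reconstruction error against a penalty that keeps most features at zero. The shapes, weights, and `l1_coeff` are made up for illustration, and this uses a plain L1 penalty; the paper's exact loss and training details differ.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(m, v):
    # m is a list of rows; returns the matrix-vector product m @ v.
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def sae_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Encode activation x into sparse features f, then decode f back to x_hat."""
    centered = [a - b for a, b in zip(x, b_dec)]
    f = relu([a + b for a, b in zip(matvec(w_enc, centered), b_enc)])
    x_hat = [a + b for a, b in zip(matvec(w_dec, f), b_dec)]
    return f, x_hat

def sae_loss(x, f, x_hat, l1_coeff=0.1):
    # Squared reconstruction error plus an L1 sparsity penalty on the features.
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(v) for v in f)
    return reconstruction + l1_coeff * sparsity

# Toy example: a 2-dimensional activation explained by 4 dictionary features.
x = [1.0, 0.5]
w_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]  # 4 features x 2 dims
b_enc = [0.0] * 4
w_dec = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]      # 2 dims x 4 features
b_dec = [0.0, 0.0]
f, x_hat = sae_forward(x, w_enc, b_enc, w_dec, b_dec)
print(f, x_hat)  # → [1.0, 0.5, 0.0, 0.0] [1.0, 0.5]
```

Half of the features are exactly zero for this input; that sparsity is what makes individual features easier to interpret.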

One of the findings was a feature corresponding to the Golden Gate Bridge. They found that when you turn this feature up 10x, Claude literally thinks it is the bridge. This led to some other surprising bridge reveals. The best part: THEY LET US TALK TO GOLDEN GATE CLAUDE!
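Turning a feature up can be sketched in a toy setting: encode an activation with a sparse autoencoder, clamp one feature to a higher value, and add the change along that feature's decoder direction back onto the original activation, which leaves the autoencoder's reconstruction error untouched. Reading "10x" as ten times a given maximum activation, and the helper names `decoder_column` and `steer`, are our assumptions for illustration.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def encode(x, w_enc, b_enc, b_dec):
    # Sparse autoencoder encoder: f = relu(W_enc (x - b_dec) + b_enc).
    centered = [a - b for a, b in zip(x, b_dec)]
    return relu([sum(w * c for w, c in zip(row, centered)) + b
                 for row, b in zip(w_enc, b_enc)])

def decoder_column(w_dec, i):
    # The direction in activation space that feature i writes to.
    return [row[i] for row in w_dec]

def steer(x, w_enc, b_enc, w_dec, b_dec, feature, max_activation, multiplier=10.0):
    """Clamp one feature to multiplier * max_activation and shift x to match."""
    f = encode(x, w_enc, b_enc, b_dec)
    delta = multiplier * max_activation - f[feature]
    direction = decoder_column(w_dec, feature)
    # Only the chosen feature's contribution changes; the rest of x is kept as-is.
    return [a + delta * d for a, d in zip(x, direction)]

# Toy weights: 2-dimensional activations, 4 features.
w_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b_enc = [0.0] * 4
w_dec = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
b_dec = [0.0, 0.0]
x = [1.0, 0.5]
steered = steer(x, w_enc, b_enc, w_dec, b_dec, feature=1, max_activation=0.5)
print(steered)  # → [1.0, 5.0]
```

In a real model this shift would be applied to the activations at every token position of the chosen layer during generation.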

You may remember similar recent work on “control vectors” and “representation engineering”. That work is notably absent from the Anthropic paper, which caused some stir on Twitter and LessWrong. If you want to try similar things with open models, the repeng repo for control vectors is MIT licensed.

post | paper
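The control-vector approach mentioned above skips the autoencoder entirely: collect hidden states for contrasting prompts (say, "happy" and "sad" versions of the same text), take the difference of their means as a direction, and add a scaled copy of it to the hidden state during generation. This minimal sketch uses a plain difference of means on made-up vectors; it is not necessarily how repeng computes its vectors, and `alpha` is a hypothetical strength parameter.

```python
def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def control_vector(positive_states, negative_states):
    """Direction pointing from the 'negative' concept toward the 'positive' one."""
    pos, neg = mean(positive_states), mean(negative_states)
    return [p - q for p, q in zip(pos, neg)]

def apply_control(hidden, vector, alpha=1.0):
    # Positive alpha pushes toward the positive concept, negative alpha away from it.
    return [h + alpha * v for h, v in zip(hidden, vector)]

# Made-up hidden states for two "happy" and two "sad" prompts.
happy = [[1.0, 0.2, 0.0], [0.8, 0.4, 0.1]]
sad = [[-0.6, 0.2, 0.1], [-0.8, 0.0, 0.1]]
vec = control_vector(happy, sad)
steered = apply_control([0.0, 0.0, 0.0], vec, alpha=2.0)
```

The direction comes from contrasting examples rather than from a learned dictionary, so each vector has to be built for one concept at a time.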

A longread on density in user interfaces

Matthew Ström leads design teams at Stripe. In this post he talks about UI density along different axes: visual space, but also information density, number of design decisions per feature, and temporal density (faster interfaces feel denser). The ultimate goal is to maximize value to the user in the smallest amount of time and space.

post


Changelog

Use GitHub Actions to push models

Did you know you can use GitHub Actions workflows to continuously push your models? Just drop a YAML file in your GitHub repo and you’re good to go. We wrote a guide that walks you through the process.

If you’re feeling saucy, you could even use one of these workflows to deploy a short-lived staging model for every pull request. Replicate now has APIs for creating and deleting models, so it’s all doable in userland.

zeke

guide | twitter
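A per-pull-request staging model comes down to two HTTP calls: create a model when the PR opens and delete it when the PR closes. This minimal Python sketch only builds the requests (it doesn't send them) for Replicate's `POST /v1/models` and `DELETE /v1/models/{owner}/{name}` endpoints. The `-pr-<number>` naming scheme, the `cpu` hardware value, and `private` visibility are our assumptions; check Replicate's HTTP API reference for the exact fields and any restrictions on which models can be deleted.

```python
import json
import urllib.request

API_BASE = "https://api.replicate.com/v1"

def staging_model_name(repo_name, pr_number):
    # Hypothetical naming scheme: one staging model per pull request.
    return f"{repo_name}-pr-{pr_number}".lower()

def create_model_request(token, owner, name, hardware="cpu"):
    # hardware="cpu" and visibility="private" are assumptions; pick what your model needs.
    body = {"owner": owner, "name": name, "visibility": "private", "hardware": hardware}
    return urllib.request.Request(
        f"{API_BASE}/models",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )

def delete_model_request(token, owner, name):
    return urllib.request.Request(
        f"{API_BASE}/models/{owner}/{name}",
        headers={"Authorization": f"Bearer {token}"},
        method="DELETE",
    )

# Build (but don't send) the requests for pull request #42 of a repo called "my-model".
name = staging_model_name("my-model", 42)
create = create_model_request("r8_example_token", "my-org", name)
delete = delete_model_request("r8_example_token", "my-org", name)
# To actually send one: urllib.request.urlopen(create)
```

In a GitHub Actions workflow, you might run the create step on `pull_request` events of type `opened` and the delete step on type `closed`.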

What is semantic search? A video

Charlie and Zeke recorded a short demo to show how easy it is to add image search to a web application using the ImageBind model from Meta.

twitter | youtube
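Under the hood, this kind of search usually works by embedding the images and the text query into the same vector space (which is what ImageBind provides) and ranking images by cosine similarity to the query. Here is a minimal sketch of the ranking step. The 3-dimensional embeddings are made-up stand-ins for real ImageBind outputs, which are much larger, and the sketch doesn't show how the demo itself computes or stores embeddings.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_embedding, image_embeddings, top_k=2):
    """Return the top_k image names ranked by similarity to the query."""
    scored = [(cosine_similarity(query_embedding, emb), name)
              for name, emb in image_embeddings.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

# Hypothetical toy embeddings; a real app would get these from an embedding model.
images = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "mountain.jpg": [0.1, 0.9, 0.2],
    "sunset_beach.jpg": [0.8, 0.2, 0.3],
}
query = [1.0, 0.0, 0.1]  # stand-in for the embedding of a text like "a day at the beach"
print(search(query, images))  # → ['beach.jpg', 'sunset_beach.jpg']
```

For a larger collection you would typically store the embeddings in a vector database instead of scoring every image on each query.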

RSS and Atom feeds

We now have feeds for our blog, changelog, and service status updates. Subscribe to stay up to date with the latest from Replicate.


Bye for now

For the one person who read this far: What do you think?

Do you want more research? More tooling? More weird tweets? Fewer annoying questions? Romantic advice?

I’m doing this for you. Tell me what you want.

— deepfates