DIY Llama 3 implementation, open-source smart glasses, steering language models with dictionary learning
Posted by @deepfates
Editor’s note
Hey, it’s me, deepfates from the internet. They gave me the keys to the mailing list and said I can experiment with it.
Here’s my theory: You’re trying to do cool things with AI. You don’t need any more news bites about the closed AI platforms. Everyone’s already talking about them.
You want to know about new and cool things in AI, the DIY hacker stuff that you might have missed while building.
Well, at Replicate it’s our job to know.
So I’m tapping the deep knowledge of our Replicants and our community to bring you all the news you really want, in one weekly update. And if you just want the changelog, I fully trust you to scroll on past the other bits.
If you like it, and especially if you don’t, hit reply and let me know. I’ll read the replies at standup until they tell me to stop.
Trending models
A commercial-friendly voice cloning model
OpenVoice v2 is a voice cloning model from MIT CSAIL and MyShell.
The new v2 model has better audio quality, multilingual support, and an MIT license that allows commercial use. It supports English, Spanish, French, Chinese, Japanese, and Korean.
Language model implemented from scratch
A tutorial notebook that implements the matrix math for Llama 3 one operation at a time. Great explanation and fun visuals.
From AAAAAAAAAA, a lab focused on making research more accessible.
It looks great! Fully unwrapped it’s a lot easier to see what’s actually going on then with modules nesting and calling each other around
— Andrej Karpathy (twitter)
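If you want a taste of that "one operation at a time" style before opening the notebook, here is a minimal sketch of a single causal attention head in NumPy. It is not code from the notebook: the shapes, the random weights, and the function names are made up for illustration, and it leaves out the other pieces a real Llama 3 layer needs (rotary position embeddings, grouped-query attention, RMSNorm, the feed-forward block).

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max first so exp() never overflows.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, w_q, w_k, w_v):
    # Project every token embedding into query, key, and value vectors.
    q = x @ w_q
    k = x @ w_k
    v = x @ w_v
    # Score each query against each key, scaled by sqrt(head dimension).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Causal mask: a token may not look at tokens that come after it.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    # Each output row is a weighted average of the value vectors.
    return weights @ v, weights

# Toy sizes and random weights, purely illustrative.
rng = np.random.default_rng(0)
seq_len, d_model, d_head = 4, 8, 4
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

out, weights = causal_self_attention(x, w_q, w_k, w_v)
```

Each row of `weights` sums to 1, and everything above the diagonal is zero, which is the causal mask doing its job.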
Converting pretrained models to a totally different architecture
Researchers at Toyota Research Institute (yes, that Toyota) find a way to convert existing language models from transformers (like GPT) into recurrent neural networks (like Mamba or RWKV).
Recurrent networks make a different tradeoff than current transformer models: they’re much more efficient to run, but harder to train. The conversion lets you start an RNN from a pre-trained transformer model with all its existing knowledge.
This process is called Scalable UPtraining for Recurrent Attention (SUPRA), or linearizing, or up-training. They release a Mistral RNN up-trained with SUPRA, and code that you can use to linearize models yourself.
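Why does "linearizing" turn a transformer into an RNN? Here is a minimal sketch of the general idea, assuming a generic kernel-based linear attention with an `elu(x) + 1` feature map. This is not the SUPRA recipe, which differs in its details; the function names and sizes are made up for illustration. The point is that once softmax is swapped for a kernel, the same attention output can be computed one token at a time from a fixed-size running state.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a simple positive feature map (an assumption for this sketch).
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_parallel(q, k, v):
    # Transformer-style: build the full (seq_len x seq_len) score matrix.
    fq, fk = feature_map(q), feature_map(k)
    scores = np.tril(fq @ fk.T)  # lower triangle = causal mask
    return (scores @ v) / scores.sum(axis=-1, keepdims=True)

def linear_attention_recurrent(q, k, v):
    # RNN-style: carry a fixed-size state forward, one token at a time.
    fq, fk = feature_map(q), feature_map(k)
    state = np.zeros((k.shape[-1], v.shape[-1]))  # running sum of outer(key, value)
    norm = np.zeros(k.shape[-1])                  # running sum of keys
    outputs = []
    for t in range(q.shape[0]):
        state += np.outer(fk[t], v[t])
        norm += fk[t]
        outputs.append(fq[t] @ state / (fq[t] @ norm))
    return np.stack(outputs)

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(6, 4)) for _ in range(3))

parallel_out = linear_attention_parallel(q, k, v)
recurrent_out = linear_attention_recurrent(q, k, v)
same = np.allclose(parallel_out, recurrent_out)
```

The two functions produce the same output. The recurrent version never builds the full score matrix, so the memory it needs per step does not grow with sequence length.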
Cool tools
Open source smart glasses
Everybody’s announcing smart glasses these days. Meta, Amazon, now… Based Hardware?
The OpenGlass open source project costs $20 to build and won first place at the Meta Llama 3 Hackathon last week. It uses the moondream vision model to understand a stream of images and Llama 3 to talk about them. A small camera and battery attach to any glasses frame.
github | post | youtube demo
Research radar
Steer and interpret large language models with dictionary learning
Anthropic published a new paper with the ironically unreadable title Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet.
This scales up their previous work on interpreting and steering language models to the medium-sized Claude model.
They train a smaller model (a sparse autoencoder) to look at the language model’s activations and find common features that can be turned up or down to change the model’s outputs. This strategy is called dictionary learning because the model learns a finite set of “concepts” that are coherent across different languages and contexts.
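To make "turned up or down" concrete, here is a minimal sketch of a sparse autoencoder over one activation vector, with a steering step that rescales a single feature. The weights are random and untrained, the sizes are tiny, and every name here is hypothetical. It is a toy picture of the idea, not Anthropic's code or setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_features = 16, 64  # toy sizes; real dictionaries are vastly larger

# Untrained, random weights standing in for a trained autoencoder.
w_enc = rng.normal(scale=0.1, size=(d_model, n_features))
b_enc = np.zeros(n_features)
w_dec = rng.normal(scale=0.1, size=(n_features, d_model))
b_dec = np.zeros(d_model)

def encode(activation):
    # ReLU keeps feature activations non-negative; training also pushes them to be sparse.
    return np.maximum(0.0, activation @ w_enc + b_enc)

def decode(features):
    # Rebuild the activation as a weighted sum of dictionary directions.
    return features @ w_dec + b_dec

def steer(activation, feature_index, scale):
    features = encode(activation)
    # Keep whatever the autoencoder fails to reconstruct, so only one feature changes.
    error = activation - decode(features)
    features = features.copy()
    features[feature_index] *= scale
    return decode(features) + error

activation = rng.normal(size=d_model)
features = encode(activation)
top = int(np.argmax(features))  # pick the most active feature
steered = steer(activation, top, 10.0)
```

With `scale=1.0` the activation comes back unchanged. With `scale=10.0` it moves along that one feature's decoder direction, and in a real model you would feed the steered activation back into the forward pass.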
One of the findings was a feature corresponding to the Golden Gate Bridge. They found that when you turn this feature up 10x, Claude literally thinks it is itself the bridge. This led to some other surprising bridge reveals. The best part: THEY LET US TALK TO GOLDEN GATE CLAUDE!
You may remember similar work recently on “control vectors” and “representation engineering”. That earlier work is notably absent from the Anthropic paper, causing some stir on twitter and LessWrong. If you want to try similar things with open models, the repeng repo for control vectors is MIT licensed.
A longread on density in user interfaces
Matthew Ström leads design teams at Stripe. In this post he talks about UI density along different axes: visual space, but also information density, number of design decisions per feature, and temporal density (faster interfaces feel denser). The ultimate goal is to maximize value to the user in the smallest amount of time and space.
Changelog
Use GitHub Actions to push models
Did you know you can use GitHub Actions workflows to continuously push your models? Just drop a YAML file in your GitHub repo and you’re good to go. We wrote a guide that walks you through the process.
If you’re feeling saucy, you could even use one of these workflows to deploy a short-lived ephemeral staging model for every pull request. Replicate has APIs for creating and deleting models now, so it’s all do-able in userland.
— zeke
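For the ephemeral staging idea, here is a rough sketch of the two HTTP requests a workflow would need, built with Python's standard library. Treat the details as assumptions to check against the Replicate API docs: the base URL, the `/models` paths, the payload fields, and the `cpu` hardware value are my best understanding, and the `-pr-` naming scheme is purely hypothetical. Nothing here touches the network; the functions only build request objects.

```python
import json
import urllib.request

API_ROOT = "https://api.replicate.com/v1"  # assumed base URL, check the API docs

def staging_model_name(base_name, pr_number):
    # Hypothetical naming convention: one staging model per pull request.
    return f"{base_name}-pr-{pr_number}"

def create_model_request(token, owner, name, hardware="cpu"):
    # Assumed payload fields for creating a model; verify before relying on them.
    payload = {
        "owner": owner,
        "name": name,
        "visibility": "private",
        "hardware": hardware,
    }
    return urllib.request.Request(
        f"{API_ROOT}/models",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def delete_model_request(token, owner, name):
    # Assumed path for deleting a model once the pull request closes.
    return urllib.request.Request(
        f"{API_ROOT}/models/{owner}/{name}",
        headers={"Authorization": f"Bearer {token}"},
        method="DELETE",
    )

name = staging_model_name("my-model", 42)
create_req = create_model_request("example-token", "my-org", name)
delete_req = delete_model_request("example-token", "my-org", name)
```

To actually send one, pass it to `urllib.request.urlopen` from inside the workflow, with the token coming from a repository secret rather than a hardcoded string.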
What is semantic search? A video
Charlie and Zeke recorded a short demo to show how easy it is to add image search to a web application using the ImageBind model from Meta.
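The core of semantic search is small: embed everything as vectors, then rank by similarity to the query's vector. Here is a minimal sketch using cosine similarity in NumPy. The three-dimensional vectors and file names are invented stand-ins for illustration; in the demo's setup the embeddings would come from a model like ImageBind and be much longer.

```python
import numpy as np

def normalize(vectors):
    # Scale each vector to unit length so a dot product equals cosine similarity.
    return vectors / np.linalg.norm(vectors, axis=-1, keepdims=True)

def search(query_embedding, image_embeddings, top_k=3):
    # Score every image against the query, then return the best matches.
    scores = normalize(image_embeddings) @ normalize(query_embedding)
    order = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in order]

# Toy embeddings and hypothetical file names, not real model output.
image_names = ["beach.jpg", "forest.jpg", "dunes.jpg"]
image_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
query_embedding = np.array([0.9, 0.1, 0.0])

results = search(query_embedding, image_embeddings, top_k=2)
matches = [image_names[i] for i, _ in results]
```

Here the query vector points mostly along the first axis, so `beach.jpg` ranks first and `dunes.jpg` second. Because text and images share one embedding space in a model like ImageBind, the query can be a sentence instead of a picture.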
RSS and Atom feeds
We now have feeds for our blog, changelog, and service status updates. Subscribe to stay up-to-date with the latest from Replicate.
Bye for now
For the one person who read this far: What do you think?
Do you want more research? More tooling? More weird tweets? Fewer annoying questions? Romantic advice?
I’m doing this for you. Tell me what you want.
— deepfates