Jet-setting with Llama 2 + Grammars

Llamas may be docile by nature, but they have a stubborn streak. Push them too far, and they’re liable to spit out something foul and unpleasant.

True to its real-life counterpart, Meta’s Llama 2 can be hard to coax into doing exactly what you want. That’s fine for some generation tasks, but problematic for anything requiring syntactic perfection. Prompt engineering, few-shot examples, and fine-tuning can all help massage output into a desired shape. But grammars are the only sure-fire way to get the structure you want, every time.

In this post, we’ll explore a family of Llama 2 models with built-in support for grammars, and show how you can use them for information extraction tasks.

Last month, Replicate hosted its first hackathon in San Francisco. It was lovely. I had a great time chatting with attendees and hanging out with colleagues who also flew in for the event. Watching the demos, I thought to myself, “We really are living in the future.”

But, in a moment, the spell was broken. You know what snapped me out of it? The email confirming my return flight to Portland, which, get this, didn’t attach a calendar event. The nerve! It’s 2023, and I’m expected to… read? (Or worse, accept Siri’s help and create an event automatically?)

It's time to check in for your flight.
Use the [REDACTED] app for smooth sailing and we'll see you soon!
Confirmation code:
ABCDEF
CHECK IN
Your trip details

420
Seat  10D
5:00 PM            6:30 PM
SFO                PDX
San Francisco      Portland, OR

Departure
9/20/2023
Arrival
9/20/2023

I was having none of it. So I picked up my MacBook, cracked open the bottle of Stumptown cold brew that I’d smuggled in my luggage, and got to work.

Flight plan

My inspiration came from the project Replicate co-founder Andreas Jansson demoed at the hackathon. He took a bunch of 8-bit synth compositions and used llama.cpp’s grammar decoder to constrain output to valid one-liners.

Taking a similar approach, I could specify a grammar to constrain output to a JSON document matching a given JSON Schema.

Here’s a simple schema for extracting the details of the flight:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "origin": {
      "type": "string",
      "description": "Three-letter IATA airport code for the departure airport."
    },
    "destination": {
      "type": "string",
      "description": "Three-letter IATA airport code for the arrival airport."
    },
    "date": {
      "type": "string",
      "format": "date"
    },
    "departure_time": {
      "type": "string",
      "format": "time"
    },
    "arrival_time": {
      "type": "string",
      "format": "time"
    }
  },
  "required": ["origin", "destination", "date", "departure_time", "arrival_time"],
  "additionalProperties": false
}

Using Replicate’s JavaScript client library, I ran Andreas’ model, passing the JSON schema and the original text from my flight confirmation.

import Replicate from "replicate";

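// By default, the client reads your API token from the REPLICATE_API_TOKEN environment variable.
const replicate = new Replicate();

// Andreas’ model, pinned to a specific version.
const model = "andreasjansson/codellama-34b-instruct-gguf:f1091fa795c142a018268b193c9eea729e0a3f4d55d723df0b69f17b863bf5ea";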
const input = {
  prompt: `
    Extract flight information from the following email.
    Use RFC 3339 date time formatting.

    [...]
  `,
  jsonschema: `{...}`,
  max_tokens: 256
};
const output = await replicate.run(model, { input });

How did it fare? It passed with flying colors.

{
  "origin": "SFO",
  "destination": "PDX",
  "date": "2023-09-20",
  "departure_time": "17:00",
  "arrival_time": "18:30"
}

It took some hand-holding to get the model to produce dates and times in RFC 3339 format, but otherwise, it was a smooth flight the whole way through.
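
A grammar can guarantee the shape of the output, but as the hand-holding over date and time formats suggests, it’s still worth checking the values before you rely on them. Here’s a minimal sketch of that check. `parseFlight` and its patterns are hypothetical helpers, not part of the Replicate client, and the sketch assumes the output arrives either as a single string or as an array of string chunks.

```javascript
// Hypothetical helper: parse the model output and sanity-check the fields
// named in the schema's "required" list.
const REQUIRED_KEYS = ["origin", "destination", "date", "departure_time", "arrival_time"];
const DATE_PATTERN = /^\d{4}-\d{2}-\d{2}$/;       // e.g. 2023-09-20
const TIME_PATTERN = /^\d{2}:\d{2}(:\d{2})?$/;    // e.g. 17:00 or 17:00:00

function parseFlight(output) {
  // Assumption: output is a string, or an array of string chunks to concatenate.
  const text = Array.isArray(output) ? output.join("") : String(output);
  const flight = JSON.parse(text); // throws a SyntaxError on malformed JSON

  for (const key of REQUIRED_KEYS) {
    if (typeof flight[key] !== "string") {
      throw new Error(`Missing or non-string field: ${key}`);
    }
  }
  if (!DATE_PATTERN.test(flight.date)) {
    throw new Error(`Unexpected date format: ${flight.date}`);
  }
  for (const key of ["departure_time", "arrival_time"]) {
    if (!TIME_PATTERN.test(flight[key])) {
      throw new Error(`Unexpected time format in ${key}: ${flight[key]}`);
    }
  }
  return flight;
}
```

The patterns are deliberately looser than full RFC 3339 (no seconds or UTC offset required), since the goal here is to catch obviously wrong values rather than to re-implement the spec.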

Satisfied by my work (and realizing that the hackathon had ended hours ago), I closed my laptop, got into bed, and instantly regretted drinking all that coffee a few paragraphs earlier 🫨

Cleared for takeoff

Years ago, I got hooked on TripIt’s trip planning workflow. Forward flight and hotel confirmation emails to plans@tripit.com, and they automatically became events in your personal iCalendar feed. Back in the 00s, that was nothing short of magic. But in the age of AI, you can whip up a prototype in just a few minutes. (Heck, CodeLlama could’ve written that code for me, if I’d thought to ask!)
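
To give a sense of how little is left to do once the fields are extracted, here’s a rough sketch that turns a flight object into a single iCalendar event. `toCalendarEvent` is a hypothetical helper, the `PRODID` value is a placeholder, and the sketch assumes times come back as plain `HH:MM` or `HH:MM:SS` with no UTC offset. It writes them as “floating” local times, which only makes sense when both airports share a time zone and the flight doesn’t cross midnight.

```javascript
// Hypothetical helper, not part of any library: builds a one-event iCalendar
// document from an object shaped like the flight schema.
function toCalendarEvent(flight, uid, now = new Date()) {
  // "2023-09-20" + "17:00" -> "20230920T170000" (floating local time, no zone)
  const localStamp = (date, time) => {
    const [hours, minutes, seconds = "00"] = time.split(":");
    return `${date.replace(/-/g, "")}T${hours}${minutes}${seconds}`;
  };
  // DTSTAMP is expressed in UTC: "2023-09-01T12:00:00.000Z" -> "20230901T120000Z"
  const utcStamp = now.toISOString().replace(/[-:]/g, "").replace(/\.\d{3}/, "");

  return [
    "BEGIN:VCALENDAR",
    "VERSION:2.0",
    "PRODID:-//example//flight-extractor//EN", // placeholder product identifier
    "BEGIN:VEVENT",
    `UID:${uid}`,
    `DTSTAMP:${utcStamp}`,
    `DTSTART:${localStamp(flight.date, flight.departure_time)}`,
    `DTEND:${localStamp(flight.date, flight.arrival_time)}`,
    `SUMMARY:Flight ${flight.origin} to ${flight.destination}`,
    "END:VEVENT",
    "END:VCALENDAR",
  ].join("\r\n"); // iCalendar lines end with CRLF
}
```

A real importer would need proper time zone handling (for example, looking up each airport’s zone), which is exactly the kind of detail this sketch leaves out.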

Don’t get me wrong: LLMs are by no means a replacement for conventional NLP tools, like spaCy or NLTK (or even regular expressions!). Someone who knows their stuff could run circles around a general-purpose AI model. On the other hand, solving the problem that way may require more time and expertise than you have. In my case, I’d dreamed for years about building my own TripIt importer, without ever really getting it to work reliably.
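
For comparison, here’s roughly what the regular expression route might look like for this one email. It’s an illustrative sketch, not a recommended implementation: `extractFlightWithRegex` and `to24Hour` are hypothetical helpers, and every pattern assumes the exact layout of the confirmation above (times on one line, airport codes on the next, a US-style month/day/year date after “Departure”).

```javascript
// Hypothetical helper: "5:00 PM" -> "17:00"
function to24Hour(time12) {
  const [, hours, minutes, meridiem] = time12.match(/^(\d{1,2}):(\d{2}) (AM|PM)$/);
  let h = Number(hours) % 12;        // 12 AM -> 0; 12 PM -> 0, then +12 below
  if (meridiem === "PM") h += 12;
  return `${String(h).padStart(2, "0")}:${minutes}`;
}

// Hypothetical extractor tied to a single email layout.
function extractFlightWithRegex(email) {
  // A line holding two 12-hour times, e.g. "5:00 PM      6:30 PM"
  const times = email.match(/^(\d{1,2}:\d{2} [AP]M)[ \t]+(\d{1,2}:\d{2} [AP]M)[ \t]*$/m);
  // A line holding two three-letter codes, e.g. "SFO      PDX"
  const airports = email.match(/^([A-Z]{3})[ \t]+([A-Z]{3})[ \t]*$/m);
  // "Departure" followed by a month/day/year date on the next line
  const date = email.match(/^Departure\s+(\d{1,2})\/(\d{1,2})\/(\d{4})/m);
  if (!times || !airports || !date) return null; // layout didn't match

  const [, month, day, year] = date;
  return {
    origin: airports[1],
    destination: airports[2],
    date: `${year}-${month.padStart(2, "0")}-${day.padStart(2, "0")}`,
    departure_time: to24Hour(times[1]),
    arrival_time: to24Hour(times[2]),
  };
}
```

It works for this layout, but returns `null` as soon as the email is worded differently, which is the trade-off in a nutshell: precise and fast when the format is fixed, brittle when it isn’t.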

Now that you know what’s possible with Llama 2 and grammars, perhaps you’ll take inspiration to revisit some long-forgotten idea that proved insurmountable way back when.

You are now free to move about the cabin

This post was just a pre-boarding into what you can make with Llama 2. If you’re looking to upgrade your understanding, these resources are first class:

Explore our stable of Llama 2 models with support for grammars 🧑‍🏫🦙
Read through the PR that added grammar-based sampling to llama.cpp to understand the implementation details 🤓🦙
Become a llama whisperer with this guide to prompting Llama 2 🤫🦙
Teach your llama new tricks by fine-tuning Llama 2 from a dataset 🏋️🦙