Building a face swapping app with Val Town, HTMX, and Replicate

Warning

This guide is no longer supported, and includes references to deprecated models. Check out the docs homepage for more up-to-date guides and examples.

Now that we've learned about the tools and concepts, let's build a hypermedia AI app. We'll use Val Town to create a serverless function, HTMX to add hypermedia controls to the page, and Replicate to call AI models.

The concept

Face-swapping models are increasingly popular. And why not? It's fun to insert your face into a painting or movie screenshot. But wouldn't it be cool if you could do it with generated images, so you could star in any movie you can imagine?

Let's make a face swap app that lets you insert your face into a generated image. We'll use the popular lucataco/faceswap model to swap the faces, and the photorealistic model adirik/realvisxl-v3.0-turbo to generate the imaginary scenes.

Preparing your development environment

This part is surprisingly easy. You don't need to install anything. You can follow along in the browser.

The only other preparation is to get a Replicate API token and set it as REPLICATE_API_TOKEN in your Val Town Environment Variables. This will let you call Replicate models from your val.

Watch out: anyone who can reach your val's HTTP endpoint can also trigger Replicate calls that are billed to your token. Val Town keeps the environment variable itself secret, but it doesn't add authentication to your HTTP endpoint. We won't add any for this demo app, but keep it in mind when building your own tool.

Start with a val

Conveniently, Val Town has an HTMX template we can use:

It's just a hello world example right now, but it does have a live endpoint. Click "Browser preview" in the bottom panel of the val (or "Open HTTP endpoint" in the top bar) and observe the dynamic behavior enabled by HTMX: send a request, get a response, and update a separate part of the DOM. All with just a small, dependency-free library and two attributes:

hx-target="#answer" hx-post="/"

And all without a full page reload. It's... oddly refreshing.
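As a rough sketch of what such a hello world val might look like (this is not the template's actual code; the `name` field, the function names, and the pinned HTMX version are assumptions made for illustration), the handler below serves the full page on GET and only an HTML fragment on POST:

```typescript
// Escape user input before echoing it back into HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Full page, sent once. The form carries the two HTMX attributes.
function renderHelloPage(): string {
  return `<!DOCTYPE html>
<html>
  <head>
    <!-- Version pinned for illustration only -->
    <script src="https://unpkg.com/htmx.org@1.9.10"></script>
  </head>
  <body>
    <form hx-target="#answer" hx-post="/">
      <input name="name" placeholder="Your name" />
      <button type="submit">Say hello</button>
    </form>
    <div id="answer"></div>
  </body>
</html>`;
}

// Fragment, sent on every POST. HTMX swaps it into #answer.
function renderGreeting(name: string): string {
  return `<p>Hello, ${escapeHtml(name)}!</p>`;
}

// On Val Town, an HTTP val exposes a handler like this as its default export.
async function helloHandler(req: Request): Promise<Response> {
  const headers = { "Content-Type": "text/html; charset=utf-8" };
  if (req.method === "POST") {
    const form = await req.formData();
    const name = String(form.get("name") ?? "world");
    return new Response(renderGreeting(name), { headers });
  }
  return new Response(renderHelloPage(), { headers });
}
```

The POST branch never returns a whole document. It returns only the markup that belongs inside `#answer`, which is what lets the page update without a reload.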

Understanding HTMX attributes

HTMX adds a few attributes to HTML tags so you can add hypermedia controls to your page. The ones to know here are:

hx-get: issue a GET request to the given URL
hx-post: issue a POST request to the given URL
hx-trigger: specify the event that triggers a request
hx-swap: define how the response is swapped into the page
hx-target: specify the element to update with the response

You can read more about the attributes in the HTMX documentation.
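To see how the attributes combine, here is a small hypothetical fragment, written as a TypeScript template string the way a val would return it. The `/scenes` path and the `#gallery` ID are made up for illustration and are not part of the app we build below.

```typescript
// A button that uses four of the attributes at once.
// "/scenes" and "#gallery" are hypothetical names chosen for this example.
function renderLoadMoreButton(): string {
  return `<button
  hx-get="/scenes"
  hx-trigger="click"
  hx-target="#gallery"
  hx-swap="beforeend">
  Load more
</button>
<div id="gallery"></div>`;
}
```

Clicking the button issues a GET request to `/scenes` and appends the returned HTML as the last child of `#gallery`. If you leave out hx-swap, HTMX replaces the inner HTML of the target. If you leave out hx-trigger, a button fires on click by default.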

Integrate Replicate

Before we wire up the hypermedia controls, let's add the Replicate client to our val to call the models. We'll use the replicate package from npm. This code will run on the Val Town platform, so we don't need to install anything locally.

We also add the functions that call the models, with parameters chosen to produce a good-looking image. First we send a text prompt to the image generation model, then we send the generated image and the uploaded face to the face swap model, and finally we return the result.

Note that we encode the uploaded image into a Data URL. This is so we can pass it from the client to the val without putting it on a server or in blob storage somewhere.
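The two-step pipeline and the Data URL encoding might be sketched as follows. This is an illustration under stated assumptions, not the guide's original code: the `ModelRunner` interface is a stand-in for the Replicate client, the function names are hypothetical, and the input names (`prompt`, `target_image`, `swap_image`) should be checked against each model's API page on Replicate.

```typescript
// Minimal shape of the client this sketch relies on. The client from the
// "replicate" npm package exposes a run(identifier, { input }) method of this form.
interface ModelRunner {
  run(model: string, options: { input: Record<string, unknown> }): Promise<unknown>;
}

// Encode raw image bytes as a Data URL so they can be passed as a model input.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return "data:" + mimeType + ";base64," + btoa(binary);
}

// Image models typically return a list of outputs; take the first one as a URL string.
function firstOutputUrl(output: unknown): string {
  const first = Array.isArray(output) ? output[0] : output;
  if (first === undefined || first === null) {
    throw new Error("Model returned no output");
  }
  return String(first);
}

// Generate a scene from the prompt, then swap the uploaded face into it.
// sceneModel and swapModel are full identifiers such as "owner/name:version".
// The input names below are assumptions, not confirmed schemas.
async function generateAndSwap(
  runner: ModelRunner,
  sceneModel: string,
  swapModel: string,
  textPrompt: string,
  faceDataUrl: string,
): Promise<string> {
  const scene = await runner.run(sceneModel, { input: { prompt: textPrompt } });
  const sceneUrl = firstOutputUrl(scene);
  const swapped = await runner.run(swapModel, {
    input: { target_image: sceneUrl, swap_image: faceDataUrl },
  });
  return firstOutputUrl(swapped);
}
```

In a val, the runner would be the real client, created with something like `new Replicate({ auth: Deno.env.get("REPLICATE_API_TOKEN") })` after `import Replicate from "npm:replicate"`. Passing the runner in as a parameter keeps the pipeline easy to try out with a fake before spending any credits.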

Building the UI

Now we'll add the controls to the app: a form to upload a face image and a text input to prompt the model.

When you submit the form, HTMX sends a POST request with the form data to the Val Town endpoint, which calls our Replicate models and gets the generated image. Finally, the val sends back a fragment of HTML containing the generated image, and HTMX updates the DOM by putting the new fragment at the target.
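A minimal sketch of that round trip follows. The field names (`face`, `prompt`), the `#result` ID, and the `swap` callback that stands in for the model calls are all assumptions made for this example.

```typescript
// Escape text before placing it inside HTML or an attribute value.
function escapeForHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Upload form. hx-encoding makes HTMX send the file as multipart form data.
function renderSwapForm(): string {
  return `<form hx-post="/" hx-target="#result" hx-encoding="multipart/form-data">
  <label>Your face <input type="file" name="face" accept="image/*" required /></label>
  <label>Scene <input type="text" name="prompt" required /></label>
  <button type="submit">Swap</button>
</form>
<div id="result"></div>`;
}

// Fragment returned by the val once the models have finished.
function renderResult(imageUrl: string, prompt: string): string {
  return `<img src="${escapeForHtml(imageUrl)}" alt="${escapeForHtml(prompt)}" />`;
}

// Handle the POST: read the form, run the (hypothetical) swap callback, reply with a fragment.
async function handleSwapPost(
  req: Request,
  swap: (prompt: string, face: File) => Promise<string>,
): Promise<Response> {
  const headers = { "Content-Type": "text/html; charset=utf-8" };
  const form = await req.formData();
  const face = form.get("face");
  const prompt = form.get("prompt");
  if (!(face instanceof File) || typeof prompt !== "string" || prompt.trim() === "") {
    // HTMX does not swap error responses by default, so the message goes out with a 200 status.
    return new Response("<p>Please provide a face image and a prompt.</p>", { headers });
  }
  const imageUrl = await swap(prompt, face);
  return new Response(renderResult(imageUrl, prompt), { headers });
}
```

Note the `hx-encoding` attribute: without it, HTMX submits the form as URL-encoded data, and the uploaded file would not reach the val.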

Add styles

We have a working app! But it's not very pretty. We can make it look nicer by adding some CSS.

Let's use terminal.css to give it a clean retro look. We could import this from npm through Val Town, but it's just a CSS file that creates utility classes, so to emphasize the power of hypermedia we'll just include it with a style tag like HTML of old.

Once we've added the style tag, we can include the classes in our HTML. Let's give it a fun name too.

Add a loading indicator

It would also be nice to keep earlier images on the page instead of replacing them with each new result. Since we're using hypermedia as the engine of application state, we can do that by changing where the new content lands. Using hx-swap we can insert each new fragment before the most recent image, so the results stack in reverse chronological order.

Finally, let's also add a loading indicator so we know when the app is working. We'll add an hx-indicator attribute to the form that points at the indicator element, and HTMX will add the htmx-request class to that element while the request is in progress. We can use a small piece of CSS to hide the indicator when nothing is loading.
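Both changes, stacking and the indicator, can be sketched together. The `#gallery` and `#spinner` IDs and the field names are assumptions, and `afterbegin` is one way to get newest-first ordering; the original val may have used a different swap strategy.

```typescript
// CSS that hides the indicator until HTMX marks it with the htmx-request class.
// HTMX ships similar opacity-based defaults; this variant toggles display instead.
const indicatorCss = `
.htmx-indicator { display: none; }
.htmx-request.htmx-indicator,
.htmx-request .htmx-indicator { display: inline; }
`;

// Form that prepends each new result to the gallery and shows a spinner while waiting.
function renderGalleryForm(): string {
  return `<style>${indicatorCss}</style>
<form hx-post="/" hx-target="#gallery" hx-swap="afterbegin"
      hx-indicator="#spinner" hx-encoding="multipart/form-data">
  <input type="file" name="face" accept="image/*" required />
  <input type="text" name="prompt" required />
  <button type="submit">Swap</button>
  <span id="spinner" class="htmx-indicator">Generating...</span>
</form>
<div id="gallery"></div>`;
}
```

With `hx-swap="afterbegin"`, each response is inserted as the first child of `#gallery`, so the newest image sits on top and older ones stay below it.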

Customizing and extending your app

One of the best things about creating tools with Replicate is the ability to modify and customize them in real time to suit your needs.

Integrating a new AI model

For instance, if the images generated by the app aren't aesthetically pleasing, you don't have to keep tweaking the prompt each time. Instead, you can integrate a language model to engineer the prompt for you. A popular open-source choice with instruction-following abilities is mistralai/mistral-7b-instruct-v0.1. This model can expand a simple prompt into a more detailed one, resulting in a better image.

We can use a metaprompt like the following to transform our short text suggestions into evocative descriptions that the image generation model can use to create a more interesting image.

Take the description that follows and imagine it vividly as a movie scene, describing character, action, setting, composition, lighting, and mood. Write your answer in the form of a brief terse sentence. Include only the text of your answer, no other information or communication.
"""
${textPrompt}
"""

And then we just pass the prompt through the language model before sending it to the image generation model.
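One way to sketch that extra step is shown below. The `TextModelRunner` interface is a stand-in for the Replicate client, the function names are hypothetical, and the `prompt` input name is an assumption to verify against the language model's API page.

```typescript
// Minimal shape of the client this sketch relies on (a stand-in for the Replicate client).
interface TextModelRunner {
  run(model: string, options: { input: Record<string, unknown> }): Promise<unknown>;
}

// Wrap the user's short suggestion in the metaprompt.
function buildMetaprompt(textPrompt: string): string {
  return [
    "Take the description that follows and imagine it vividly as a movie scene, " +
      "describing character, action, setting, composition, lighting, and mood. " +
      "Write your answer in the form of a brief terse sentence. " +
      "Include only the text of your answer, no other information or communication.",
    '"""',
    textPrompt,
    '"""',
  ].join("\n");
}

// Language models on Replicate commonly return their output as a list of text chunks.
function joinChunks(output: unknown): string {
  const text = Array.isArray(output) ? output.join("") : String(output);
  return text.trim();
}

// Ask the language model for a richer prompt; fall back to the original if it returns nothing.
async function enhancePrompt(
  runner: TextModelRunner,
  languageModel: string,
  textPrompt: string,
): Promise<string> {
  const output = await runner.run(languageModel, {
    input: { prompt: buildMetaprompt(textPrompt) },
  });
  const enhanced = joinChunks(output);
  return enhanced.length > 0 ? enhanced : textPrompt;
}
```

The result of `enhancePrompt` then takes the place of the raw text prompt in the call to the image generation model.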

Considering real-world factors

While this might not be a full-fledged app, it's a great balance of simplicity and power for a personal tool. However, in a real-world scenario, you'd want to consider factors like authorization, error handling, security, scalability, performance, and cost. This doesn't mean you have to change approach -- hypermedia can scale.

Forking the app and further customization

One of the key advantages of this paradigm is the ease of iteration and customization. You can fork and extend the API to suit your needs.

With Replicate you can change models, prompts, and behaviors. Each val is a stateless endpoint with its own input and output, and because you can create multiple vals, import one from another, and call their HTTP endpoints, you could even build an entire app with just vals.

The modular nature of the code means that if you need to scale up, you can easily move the code to a different platform. And since you're sending hypermedia, the app will continue to work even if the controls or the data change. The browser will be able to render it, and the user will be able to use it.

Wrapping up

We've journeyed together through the creation of a hypermedia AI app, getting our hands dirty with practical coding and gaining a solid understanding of the key tools and concepts involved. We've seen how these elements can come together to build something that's not just functional, but also fun and imaginative.

What lies ahead

This guide is your starting point. Experiment with different models, tweak the code, and make it your own. The possibilities are endless, and the power to create is in your hands.