This is a quick demo of a way to “see” latents using the CompVis latent-diffusion model. The uploaded image is resized to 256x256 and then encoded, producing a 4-channel 32x32 tensor of latents that represents it. As it happens, this tensor (or, as is done here, its mean) can be turned into either a single RGBA image or 4 monochrome images, which are then upsampled back to 256x256 using simple nearest-neighbor interpolation. Much of the image's structure survives this approach, which may offer some interesting insight into latent space.
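The visualization step described above can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code: the function names are hypothetical, the latents are random stand-in values rather than real encoder output, and the min-max scaling to 0..255 (applied jointly across all four channels) is an assumption, since the description does not say how values are mapped to pixel intensities.

```python
import numpy as np

def normalize_to_uint8(x):
    """Min-max scale an array to 0..255 (assumed normalization, not confirmed by the demo)."""
    lo, hi = float(x.min()), float(x.max())
    scale = 255.0 / (hi - lo) if hi > lo else 0.0
    return ((x - lo) * scale).round().astype(np.uint8)

def upsample_nearest(img, factor):
    """Nearest-neighbor upsampling: repeat every pixel `factor` times along height and width."""
    return img.repeat(factor, axis=0).repeat(factor, axis=1)

def visualize_latents(latents, factor=8):
    """Turn a (4, H, W) latent array into one RGBA image and four monochrome images."""
    assert latents.ndim == 3 and latents.shape[0] == 4
    scaled = normalize_to_uint8(latents)
    # Channels-last layout so the four latent channels act as R, G, B, A.
    rgba = upsample_nearest(np.transpose(scaled, (1, 2, 0)), factor)
    # Alternatively, treat each latent channel as its own grayscale image.
    monos = [upsample_nearest(channel, factor) for channel in scaled]
    return rgba, monos

# Stand-in for the encoder output: random values in place of real latents.
rng = np.random.default_rng(0)
latents = rng.standard_normal((4, 32, 32)).astype(np.float32)

rgba, monos = visualize_latents(latents)
print(rgba.shape, monos[0].shape)  # (256, 256, 4) (256, 256)
```

With a factor of 8, each latent value becomes an 8x8 block of identical pixels, which is why the 32x32 grid remains clearly visible in the 256x256 output.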
nightmareai/latent-viz
Visualize the encoded latents of an image