Latent diffusion models that replace the commonly used U-Net backbone with a transformer operating on latent patches.
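
To make "operates on latent patches" concrete, here is a minimal sketch of how a latent might be turned into a token sequence for a transformer and back again. The function names `patchify` and `unpatchify`, the (C, H, W) layout, and the example sizes (a 4×32×32 latent with patch size 2) are illustrative assumptions, not details taken from the text above; the learned linear embedding and positional encoding that would normally follow are omitted.

```python
import numpy as np


def patchify(latent: np.ndarray, patch_size: int) -> np.ndarray:
    """Split a (C, H, W) latent into a sequence of flattened patches.

    Returns an array of shape (N, patch_size * patch_size * C), where
    N = (H // patch_size) * (W // patch_size). Hypothetical helper.
    """
    c, h, w = latent.shape
    if h % patch_size or w % patch_size:
        raise ValueError("H and W must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (C, gh, p, gw, p) -> (gh, gw, p, p, C) -> (N, p * p * C)
    x = latent.reshape(c, gh, patch_size, gw, patch_size)
    x = x.transpose(1, 3, 2, 4, 0)
    return x.reshape(gh * gw, patch_size * patch_size * c)


def unpatchify(tokens: np.ndarray, patch_size: int,
               channels: int, height: int, width: int) -> np.ndarray:
    """Inverse of patchify: rebuild the (C, H, W) latent from its tokens."""
    gh, gw = height // patch_size, width // patch_size
    x = tokens.reshape(gh, gw, patch_size, patch_size, channels)
    # (gh, gw, p, p, C) -> (C, gh, p, gw, p) -> (C, H, W)
    x = x.transpose(4, 0, 2, 1, 3)
    return x.reshape(channels, height, width)


# Assumed example sizes: 4 latent channels, 32x32 spatial grid, 2x2 patches.
latent = np.random.default_rng(0).standard_normal((4, 32, 32))
tokens = patchify(latent, patch_size=2)
print(tokens.shape)  # (256, 16)
```

Under these assumptions, halving the patch size quadruples the sequence length, so the patch size is the main knob trading transformer compute against spatial granularity.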