rinnakk / japanese-stable-diffusion

Japanese-specific latent text-to-image diffusion model


Input

Run this model in Node.js with one line of code:

npx create-replicate --model=rinnakk/japanese-stable-diffusion

Alternatively, set up a project from scratch.

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run rinnakk/japanese-stable-diffusion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "rinnakk/japanese-stable-diffusion:183d91429024984ac421394b65b2de4d0a51733a371d5dc38e18bac200bda4be",
  {
    input: {
      prompt: "サラリーマン 油絵",
      num_outputs: 1,
      guidance_scale: 7.5,
      num_inference_steps: 50
    }
  }
);

// To access the file URL:
console.log(output[0].url()); //=> "http://example.com"

// To write the file to disk (the promise-based API needs no callback):
await fs.promises.writeFile("my-image.png", output[0]);

To learn more, take a look at the guide on getting started with Node.js.

To use Python instead, install Replicate’s Python client library:

pip install replicate


Import the client:

import replicate

Run rinnakk/japanese-stable-diffusion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "rinnakk/japanese-stable-diffusion:183d91429024984ac421394b65b2de4d0a51733a371d5dc38e18bac200bda4be",
    input={
        "prompt": "サラリーマン 油絵",
        "num_outputs": 1,
        "guidance_scale": 7.5,
        "num_inference_steps": 50
    }
)

# To access the file URL:
print(output[0].url())
#=> "http://example.com"

# To write the file to disk:
with open("my-image.png", "wb") as file:
    file.write(output[0].read())

To learn more, take a look at the guide on getting started with Python.
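The Python example above writes only the first output. When num_outputs is greater than 1, each returned item can be saved in turn. The following is a minimal sketch, assuming every item exposes a read() method as output[0] does above; the helper name save_outputs and the out-N.png naming pattern are illustrative choices, not part of the Replicate client.

```python
from pathlib import Path


def save_outputs(outputs, directory="."):
    """Write every item in `outputs` to `directory` and return the paths.

    Hypothetical helper: it only assumes each item has a read() method
    that returns the image bytes, like output[0] in the example above.
    """
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)

    paths = []
    for index, item in enumerate(outputs):
        path = target / f"out-{index}.png"
        path.write_bytes(item.read())
        paths.append(path)
    return paths
```

With the client example above, this would be called as save_outputs(output, "images").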

To call the HTTP API directly with curl instead of a client library, set the REPLICATE_API_TOKEN environment variable as described above, then run rinnakk/japanese-stable-diffusion by creating a prediction. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "rinnakk/japanese-stable-diffusion:183d91429024984ac421394b65b2de4d0a51733a371d5dc38e18bac200bda4be",
    "input": {
      "prompt": "サラリーマン 油絵",
      "num_outputs": 1,
      "guidance_scale": 7.5,
      "num_inference_steps": 50
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
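For readers who want to make the same HTTP call without curl or the client library, the request can be assembled with Python's standard library. This is a sketch that mirrors the curl command above field for field; the function names build_prediction_request and create_prediction are hypothetical, and error handling is left out.

```python
import json
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"
MODEL_VERSION = (
    "rinnakk/japanese-stable-diffusion:"
    "183d91429024984ac421394b65b2de4d0a51733a371d5dc38e18bac200bda4be"
)


def build_prediction_request(token, prompt):
    """Build (but do not send) the same POST request as the curl command."""
    payload = {
        "version": MODEL_VERSION,
        "input": {
            "prompt": prompt,
            "num_outputs": 1,
            "guidance_scale": 7.5,
            "num_inference_steps": 50,
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        # Same three headers as the curl command above.
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Prefer": "wait",
        },
        method="POST",
    )


def create_prediction(token, prompt):
    """Send the request and return the decoded JSON response."""
    request = build_prediction_request(token, prompt)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Splitting construction from sending keeps the request easy to inspect before any network call is made.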

Output

{
  "completed_at": "2022-09-17T22:25:55.638967Z",
  "created_at": "2022-09-17T22:25:40.947972Z",
  "data_removed": false,
  "error": null,
  "id": "6cswdgufubhlvpl3umwuzl2qpq",
  "input": {
    "prompt": "サラリーマン 油絵",
    "num_outputs": 1,
    "guidance_scale": 7.5,
    "num_inference_steps": 50
  },
  "logs": "Using seed: 13560\nGlobal seed set to 13560\n\n  0%|          | 0/51 [00:00<?, ?it/s]\n  2%|▏         | 1/51 [00:00<00:08,  5.81it/s]\n  4%|▍         | 2/51 [00:00<00:11,  4.33it/s]\n  6%|▌         | 3/51 [00:00<00:11,  4.12it/s]\n  8%|▊         | 4/51 [00:00<00:11,  4.01it/s]\n 10%|▉         | 5/51 [00:01<00:11,  3.95it/s]\n 12%|█▏        | 6/51 [00:01<00:11,  3.89it/s]\n 14%|█▎        | 7/51 [00:01<00:11,  3.88it/s]\n 16%|█▌        | 8/51 [00:02<00:11,  3.87it/s]\n 18%|█▊        | 9/51 [00:02<00:10,  3.87it/s]\n 20%|█▉        | 10/51 [00:02<00:10,  3.84it/s]\n 22%|██▏       | 11/51 [00:02<00:10,  3.83it/s]\n 24%|██▎       | 12/51 [00:03<00:10,  3.84it/s]\n 25%|██▌       | 13/51 [00:03<00:09,  3.85it/s]\n 27%|██▋       | 14/51 [00:03<00:09,  3.83it/s]\n 29%|██▉       | 15/51 [00:03<00:09,  3.83it/s]\n 31%|███▏      | 16/51 [00:04<00:09,  3.83it/s]\n 33%|███▎      | 17/51 [00:04<00:08,  3.84it/s]\n 35%|███▌      | 18/51 [00:04<00:08,  3.83it/s]\n 37%|███▋      | 19/51 [00:04<00:08,  3.82it/s]\n 39%|███▉      | 20/51 [00:05<00:08,  3.81it/s]\n 41%|████      | 21/51 [00:05<00:07,  3.81it/s]\n 43%|████▎     | 22/51 [00:05<00:07,  3.81it/s]\n 45%|████▌     | 23/51 [00:05<00:07,  3.80it/s]\n 47%|████▋     | 24/51 [00:06<00:07,  3.80it/s]\n 49%|████▉     | 25/51 [00:06<00:06,  3.81it/s]\n 51%|█████     | 26/51 [00:06<00:06,  3.81it/s]\n 53%|█████▎    | 27/51 [00:06<00:06,  3.80it/s]\n 55%|█████▍    | 28/51 [00:07<00:06,  3.81it/s]\n 57%|█████▋    | 29/51 [00:07<00:05,  3.82it/s]\n 59%|█████▉    | 30/51 [00:07<00:05,  3.82it/s]\n 61%|██████    | 31/51 [00:08<00:05,  3.82it/s]\n 63%|██████▎   | 32/51 [00:08<00:04,  3.83it/s]\n 65%|██████▍   | 33/51 [00:08<00:04,  3.82it/s]\n 67%|██████▋   | 34/51 [00:08<00:04,  3.81it/s]\n 69%|██████▊   | 35/51 [00:09<00:04,  3.82it/s]\n 71%|███████   | 36/51 [00:09<00:03,  3.82it/s]\n 73%|███████▎  | 37/51 [00:09<00:03,  3.82it/s]\n 75%|███████▍  | 38/51 [00:09<00:03,  3.81it/s]\n 76%|███████▋  | 39/51 [00:10<00:03,  3.81it/s]\n 78%|███████▊  | 40/51 [00:10<00:02,  3.80it/s]\n 80%|████████  | 41/51 [00:10<00:02,  3.81it/s]\n 82%|████████▏ | 42/51 [00:10<00:02,  3.81it/s]\n 84%|████████▍ | 43/51 [00:11<00:02,  3.82it/s]\n 86%|████████▋ | 44/51 [00:11<00:01,  3.81it/s]\n 88%|████████▊ | 45/51 [00:11<00:01,  3.81it/s]\n 90%|█████████ | 46/51 [00:11<00:01,  3.80it/s]\n 92%|█████████▏| 47/51 [00:12<00:01,  3.79it/s]\n 94%|█████████▍| 48/51 [00:12<00:00,  3.79it/s]\n 96%|█████████▌| 49/51 [00:12<00:00,  3.79it/s]\n 98%|█████████▊| 50/51 [00:13<00:00,  3.79it/s]\n100%|██████████| 51/51 [00:13<00:00,  3.79it/s]\n100%|██████████| 51/51 [00:13<00:00,  3.84it/s]",
  "metrics": {
    "predict_time": 14.481305,
    "total_time": 14.690995
  },
  "output": [
    "https://replicate.delivery/mgxm/fa193a69-05e6-471f-a8bf-96fe1ed65b1b/out-0.png"
  ],
  "started_at": "2022-09-17T22:25:41.157662Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/6cswdgufubhlvpl3umwuzl2qpq",
    "cancel": "https://api.replicate.com/v1/predictions/6cswdgufubhlvpl3umwuzl2qpq/cancel"
  },
  "version": "183d91429024984ac421394b65b2de4d0a51733a371d5dc38e18bac200bda4be"
}

Generated in 14.5 seconds
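The timing figures in the example response can be cross-checked against its timestamps. The sketch below copies the three timestamps from the JSON above and derives the durations. In this particular response they match the reported predict_time and total_time, though whether the metrics are always defined this way is an assumption. The helper parse_timestamp is introduced here for illustration.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp with a trailing "Z" (UTC).

    The "Z" is replaced so that fromisoformat also works before Python 3.11.
    """
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# Timestamps copied from the example response above.
prediction = {
    "created_at": "2022-09-17T22:25:40.947972Z",
    "started_at": "2022-09-17T22:25:41.157662Z",
    "completed_at": "2022-09-17T22:25:55.638967Z",
}

created = parse_timestamp(prediction["created_at"])
started = parse_timestamp(prediction["started_at"])
completed = parse_timestamp(prediction["completed_at"])

predict_time = (completed - started).total_seconds()  # time spent running
total_time = (completed - created).total_seconds()    # includes queueing
queue_time = (started - created).total_seconds()      # wait before start

print(f"predict: {predict_time:.6f}s, total: {total_time:.6f}s, "
      f"queued: {queue_time:.6f}s")
```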



Run time and cost

This model costs approximately $0.023 to run on Replicate, or 43 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 103 seconds. The predict time for this model varies significantly based on the inputs.
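The runs-per-dollar figure follows directly from the per-run cost, and the same arithmetic gives a rough budget for a batch of runs. This is a back-of-the-envelope sketch that takes the approximate $0.023 figure quoted above at face value; actual cost varies with inputs, and the function names are illustrative.

```python
import math

COST_PER_RUN_USD = 0.023  # approximate figure quoted above


def runs_per_dollar(cost_per_run=COST_PER_RUN_USD):
    """Whole number of runs that fit into one dollar."""
    return math.floor(1 / cost_per_run)


def estimated_cost(num_runs, cost_per_run=COST_PER_RUN_USD):
    """Rough total cost in USD for `num_runs` runs."""
    return num_runs * cost_per_run


print(runs_per_dollar())               # 43
print(round(estimated_cost(1000), 2))  # 23.0
```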

Readme

Japanese Stable Diffusion

Japanese Stable Diffusion is a Japanese-specific latent text-to-image diffusion model.

This model was trained using a powerful text-to-image model, Stable Diffusion. Many thanks to CompVis, Stability AI and LAION for the public release.

Model Details

Why Japanese Stable Diffusion?

Stable Diffusion is a very powerful text-to-image model, not only in terms of quality but also in terms of computational cost. Because Stable Diffusion was trained on an English dataset, non-English speakers have to either translate their prompts or use them untranslated. Surprisingly, Stable Diffusion can sometimes generate nice images even from non-English prompts. So, why do we need a language-specific Japanese Stable Diffusion?

Because we want a model that understands our culture, identity, and unique expressions such as slang. For example, one famous Japanglish term is “salary man”, which means a businessman, typically imagined wearing a suit. Stable Diffusion cannot understand such uniquely Japanese words correctly because Japanese is not its target language.

So, we made a language-specific version of Stable Diffusion! Compared to the original Stable Diffusion, Japanese Stable Diffusion can:

- Generate Japanese-style images
- Understand Japanglish
- Understand uniquely Japanese onomatopoeia
- Understand Japanese proper nouns

Caption: “サラリーマン油絵”, which means “salary man, oil painting”

Training

Japanese Stable Diffusion was trained using Stable Diffusion and has the same architecture and the same number of parameters. However, it is not just Stable Diffusion fully fine-tuned on Japanese datasets, because Stable Diffusion was trained on an English dataset and the CLIP tokenizer is essentially designed for English. To make a Japanese-specific model based on Stable Diffusion, we used 2 stages inspired by PITI.

1. Train a Japanese-specific text encoder with our Japanese tokenizer from scratch, with the latent diffusion model fixed. This stage is expected to map Japanese captions to Stable Diffusion’s latent space.
2. Fine-tune the text encoder and the latent diffusion model jointly. This stage is expected to make the generated images more Japanese in style.
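The difference between the two stages comes down to which components receive gradient updates. The following schematic only records that schedule as data. It is not the authors' training code, and the names Stage, STAGES, and trainable_components are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """Which parts of the model are updated during a training stage."""
    description: str
    train_text_encoder: bool
    train_diffusion_model: bool


# Schedule as described above: the diffusion model is fixed (frozen) in
# stage 1 and trained jointly with the text encoder in stage 2.
STAGES = (
    Stage("Train a Japanese text encoder from scratch", True, False),
    Stage("Fine-tune text encoder and diffusion model jointly", True, True),
)


def trainable_components(stage):
    """Return the names of the components updated in `stage`."""
    names = []
    if stage.train_text_encoder:
        names.append("text_encoder")
    if stage.train_diffusion_model:
        names.append("latent_diffusion_model")
    return names


for number, stage in enumerate(STAGES, start=1):
    print(number, stage.description, trainable_components(stage))
```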

We used the following dataset for training the model:

Approximately 100 million images with Japanese captions, including the Japanese subset of LAION-5B.

Citation

@InProceedings{Rombach_2022_CVPR,
      author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
      title     = {High-Resolution Image Synthesis With Latent Diffusion Models},
      booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
      month     = {June},
      year      = {2022},
      pages     = {10684-10695}
  }

@misc{japanese_stable_diffusion,
    author    = {Shing, Makoto and Sawada, Kei},
    title     = {Japanese Stable Diffusion},
    howpublished = {\url{https://github.com/rinnakk/japanese-stable-diffusion}},
    month     = {September},
    year      = {2022},
}

License

The CreativeML OpenRAIL M license is an Open RAIL M license, adapted from the work that BigScience and the RAIL Initiative are jointly carrying out in the area of responsible AI licensing. See also the article about the BLOOM Open RAIL license on which our license is based.