Input

Run this model in Node.js with one line of code:

npx create-replicate --model=cjwbw/internlm-xcomposer

Or set up a project from scratch:

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run cjwbw/internlm-xcomposer using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "cjwbw/internlm-xcomposer:d16df299dbe3454023fcb47ed48dbff052e9b7cdf2837707adff3581edd11e95",
  {
    input: {
      text: "What makes this image special?",
      image: "https://replicate.delivery/pbxt/JcqDxAZJWep7WsZdWM0gc6Ead2ie0YDEXyemc9HXogSdpsOM/out-0%20%281%29.png"
    }
  }
);

console.log(output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run cjwbw/internlm-xcomposer using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "cjwbw/internlm-xcomposer:d16df299dbe3454023fcb47ed48dbff052e9b7cdf2837707adff3581edd11e95",
    input={
        "text": "What makes this image special?",
        "image": "https://replicate.delivery/pbxt/JcqDxAZJWep7WsZdWM0gc6Ead2ie0YDEXyemc9HXogSdpsOM/out-0%20%281%29.png"
    }
)

print(output)

To learn more, take a look at the guide on getting started with Python.
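The example above passes the image as a public URL. For a small local file, one option is to encode it as a base64 data URI and pass that string in the same `image` field. The sketch below uses only the standard library. The helper names `image_to_data_uri` and `build_input` are hypothetical and not part of the Replicate client, and whether this model accepts data URIs is an assumption to verify against the model's schema.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(path: str) -> str:
    """Encode a local file as a base64 data URI (hypothetical helper)."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        # Fall back to a generic binary type when the extension is unknown.
        mime_type = "application/octet-stream"
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_input(text: str, image_path: str) -> dict:
    """Build an input dict with the same keys as the example above."""
    return {"text": text, "image": image_to_data_uri(image_path)}
```

The result can then be passed as `input=build_input("What makes this image special?", "photo.png")` in the `replicate.run` call, where `photo.png` is a placeholder path. The Python client's documentation also describes passing an open file handle directly, which may be simpler for larger files.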

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run cjwbw/internlm-xcomposer using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "cjwbw/internlm-xcomposer:d16df299dbe3454023fcb47ed48dbff052e9b7cdf2837707adff3581edd11e95",
    "input": {
      "text": "What makes this image special?",
      "image": "https://replicate.delivery/pbxt/JcqDxAZJWep7WsZdWM0gc6Ead2ie0YDEXyemc9HXogSdpsOM/out-0%20%281%29.png"
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
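The curl request above can also be sent without any client library. The following is a minimal sketch using Python's standard `urllib`, mirroring the same URL, headers, and JSON body. The function names and the 90-second timeout are assumptions chosen for illustration, not values taken from Replicate's documentation.

```python
import json
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"
MODEL_VERSION = (
    "cjwbw/internlm-xcomposer:"
    "d16df299dbe3454023fcb47ed48dbff052e9b7cdf2837707adff3581edd11e95"
)


def build_prediction_request(token: str, text: str, image_url: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    body = json.dumps(
        {"version": MODEL_VERSION, "input": {"text": text, "image": image_url}}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Same header as the curl example: ask the API to wait for the result.
            "Prefer": "wait",
        },
    )


def create_prediction(request: urllib.request.Request, timeout: float = 90.0) -> dict:
    """Send the request and parse the JSON response (timeout is an assumed value)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (requires a valid token and network access):
#   import os
#   request = build_prediction_request(
#       os.environ["REPLICATE_API_TOKEN"],
#       "What makes this image special?",
#       "https://example.com/image.png",  # placeholder URL
#   )
#   prediction = create_prediction(request)
#   print(prediction.get("status"), prediction.get("output"))
```

Keeping request construction separate from sending makes the payload easy to inspect before any call is made.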

Output

This image is special because it depicts an astronaut sitting in a chair surrounded by psychedelic mushrooms. The astronaut is wearing an orange spacesuit, which adds to the surreal and dreamlike atmosphere of the scene. The combination of the astronaut and the psychedelic mushrooms creates a unique and captivating composition.

Generated in 9.9 seconds


Run time and cost

This model costs approximately $0.0013 to run on Replicate, or 769 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
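The two figures above are related by simple division: $1 / $0.0013 ≈ 769.23, which rounds down to 769 whole runs. A short sketch of that arithmetic follows. The function names are illustrative, and the per-run price is the approximate figure quoted above, so real costs will vary with inputs.

```python
from decimal import Decimal, ROUND_FLOOR

# Approximate cost per run in USD, taken from the figure quoted above.
COST_PER_RUN = Decimal("0.0013")


def runs_per_dollar(cost_per_run: Decimal = COST_PER_RUN) -> int:
    """Whole number of runs that fit in one dollar."""
    ratio = Decimal(1) / cost_per_run
    return int(ratio.to_integral_value(rounding=ROUND_FLOOR))


def estimated_cost(runs: int, cost_per_run: Decimal = COST_PER_RUN) -> Decimal:
    """Estimated total cost in USD for a given number of runs."""
    return cost_per_run * runs


print(runs_per_dollar())     # 769
print(estimated_cost(1000))  # 1.3000
```

`Decimal` is used instead of floats so that small per-run prices multiply out without binary rounding noise.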

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 2 seconds.

Readme

InternLM-XComposer

InternLM-XComposer is a vision-language large model (VLLM) based on InternLM for advanced text-image comprehension and composition. InternLM-XComposer has several appealing properties:

Interleaved Text-Image Composition: InternLM-XComposer can effortlessly generate coherent and contextual articles that seamlessly integrate images, providing a more engaging and immersive reading experience. The interleaved text-image composition is implemented in the following steps:
1. Text Generation: It crafts long-form text based on human-provided instructions.
2. Image Spotting and Captioning: It pinpoints optimal locations for image placement and furnishes image descriptions.
3. Image Retrieval and Selection: It selects image candidates and identifies the image that optimally complements the content.
Comprehension with Rich Multilingual Knowledge: The text-image comprehension is empowered by training on extensive multi-modal multilingual concepts with carefully crafted strategies, resulting in a deep understanding of visual content.
Strong performance: It consistently achieves state-of-the-art results across various benchmarks for vision-language large models, including MME Benchmark (English), MMBench (English), Seed-Bench (English), CCBench (Chinese), and MMBench-CN (Chinese).
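The three composition steps listed above can be pictured as a small pipeline. The sketch below is a toy illustration of the control flow only, not the model's actual method: text generation is assumed to have already produced a list of paragraphs, image spotting is replaced by a fixed every-N-paragraphs rule, captioning by taking the first sentence, and retrieval by a word-overlap score. Every name in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ImageCandidate:
    url: str
    caption: str


def spot_image_slots(paragraphs: list[str], every: int = 2) -> list[int]:
    """Step 2 stand-in: indices of paragraphs after which an image is placed."""
    return [i for i in range(len(paragraphs)) if (i + 1) % every == 0]


def caption_for(paragraph: str) -> str:
    """Step 2 stand-in: use the paragraph's first sentence as the image description."""
    return paragraph.split(".")[0].strip()


def word_overlap(a: str, b: str) -> int:
    """Count distinct lowercase words shared by two strings."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def select_image(description: str, pool: list[ImageCandidate]) -> ImageCandidate:
    """Step 3 stand-in: pick the candidate whose caption shares the most words."""
    return max(pool, key=lambda candidate: word_overlap(description, candidate.caption))


def compose(paragraphs: list[str], pool: list[ImageCandidate], every: int = 2) -> list[str]:
    """Interleave paragraphs (step 1 output, assumed given) with selected images."""
    slots = set(spot_image_slots(paragraphs, every))
    article = []
    for index, paragraph in enumerate(paragraphs):
        article.append(paragraph)
        if index in slots and pool:
            chosen = select_image(caption_for(paragraph), pool)
            article.append(f"[image: {chosen.url}]")
    return article
```

In the real model each stage is learned rather than rule-based, so this only shows how the stages hand data to one another.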

We release the InternLM-XComposer series in two versions:

InternLM-XComposer-VL: The pretrained VLLM model with InternLM as the initialization of the LLM, achieving strong performance on various multimodal benchmarks, e.g., MME Benchmark, MMBench, Seed-Bench, CCBench, and MMBench-CN.
InternLM-XComposer: The finetuned VLLM for Interleaved Text-Image Composition and LLM-based AI assistant.

Citation

If you find our paper and code useful in your research, please consider giving a star :star: and citation :pencil: :)

@misc{zhang2023internlmxcomposer,
      title={InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition}, 
      author={Pan Zhang and Xiaoyi Dong and Bin Wang and Yuhang Cao and Chao Xu and Linke Ouyang and Zhiyuan Zhao and Shuangrui Ding and Songyang Zhang and Haodong Duan and Hang Yan and Xinyue Zhang and Wei Li and Jingwen Li and Kai Chen and Conghui He and Xingcheng Zhang and Yu Qiao and Dahua Lin and Jiaqi Wang},
      year={2023},
      eprint={2309.15112},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
