tomasmcm / obsidian-3b-v0.5

Source: NousResearch/Obsidian-3B-V0.5 ✦ World's smallest multi-modal LLM

Run time and cost

This model runs on Nvidia A40 GPU hardware. Predictions typically complete within 11 seconds. The predict time for this model varies significantly based on the inputs.

Readme

Obsidian: World's smallest multi-modal LLM. First multi-modal model at the 3B size

Model Name: Obsidian-3B-V0.5

Obsidian is a brand-new series of multimodal language models. This first project is led by Quan N. and Luigi D. (LDJ).

Obsidian-3B-V0.5 is a multi-modal AI model with vision. Its language capabilities are built on Capybara-3B-V1.9, which is based on StableLM-3B-4e1t. Capybara-3B-V1.9 achieves state-of-the-art performance compared to models of similar size, and even beats some 7B models.

Current finetuning and inference code is available on our GitHub repo.

Acknowledgement

Obsidian-3B-V0.5 was developed and finetuned by Nous Research, in collaboration with Virtual Interactive. Special thanks to LDJ for the wonderful Capybara dataset, and to qnguyen3 for the model training procedure.

Model Training

Obsidian-3B-V0.5 followed the same training procedure as LLaVA 1.5.

Prompt Format

The model follows the ChatML format, but with ### as the separator:

<|im_start|>user
What is this sign about?\n<image>
###
<|im_start|>assistant
The sign is about bullying, and it is placed on a black background with a red background.
###
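
The format above can be assembled programmatically. The following is a minimal sketch, not the project's own inference code: the helper names `format_turn` and `build_prompt` are hypothetical, and it assumes that the `\n` shown before `<image>` stands for a real newline and that generation is triggered by ending the prompt with an open assistant header.

```python
# Sketch of the ChatML-with-### prompt layout shown above.
# Assumptions (not confirmed by the source): "\n" before <image> is a real
# newline, and the prompt ends with an open assistant header for generation.

SEPARATOR = "###"
IMAGE_TOKEN = "<image>"


def format_turn(role, content):
    """Render one completed turn: header line, content, then the ### separator."""
    return f"<|im_start|>{role}\n{content}\n{SEPARATOR}\n"


def build_prompt(question, history=(), with_image=True):
    """Build a prompt from earlier (role, text) turns plus a new user question."""
    parts = [format_turn(role, text) for role, text in history]
    # The image placeholder follows the question, as in the example above.
    user_text = f"{question}\n{IMAGE_TOKEN}" if with_image else question
    parts.append(format_turn("user", user_text))
    # Leave the assistant turn open so the model writes the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


if __name__ == "__main__":
    print(build_prompt("What is this sign about?"))
```

When decoding, the reply would presumably be cut at the first `###` that the model emits, since that separator closes each turn in this format.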

Benchmarks

Coming Soon!