yorickvp / llava-v1.6-vicuna-13b

LLaVA v1.6: Large Language and Vision Assistant (Vicuna-13B)

  • Public
  • 3.4M runs
  • L40S

Input

image
file

Input image

*string

Prompt to use for text generation

number
(minimum: 0, maximum: 1)

When decoding text, samples from the top p percentage of most likely tokens; lower to ignore less likely tokens

Default: 1

number
(minimum: 0)

Adjusts the randomness of outputs: values greater than 1 are increasingly random, and 0 is deterministic

Default: 0.2

integer
(minimum: 0)

Maximum number of tokens to generate. A word is generally 2-3 tokens

Default: 1024

string[]

List of earlier chat messages, alternating roles, starting with user input. Include <image> to specify which message to attach the image to.
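To make the two sampling inputs above more concrete, here is a minimal sketch of how temperature scaling and top-p (nucleus) filtering are commonly combined when picking the next token. It is an illustration only, not the model's actual decoding code; the function name `sample_token` and the parameter names `temperature` and `top_p` are assumptions, and only the defaults (0.2 and 1) come from this page.

```python
import math
import random


def sample_token(logits, temperature=0.2, top_p=1.0, rng=random):
    """Pick a token index from raw logits (hypothetical helper)."""
    if temperature <= 0:
        # Temperature 0 is treated as greedy, deterministic decoding.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Scale logits by temperature, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Sample among the kept tokens in proportion to their probabilities.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

Lowering `top_p` shrinks the candidate set, while lowering `temperature` concentrates probability on the most likely tokens; at the defaults listed above, output stays close to greedy decoding.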

Output

When visiting a place like the one shown in the image, which appears to be a serene lake with a dock and surrounded by mountains and forests, you should consider the following:

1. **Weather Conditions**: Be prepared for the local weather. It can change rapidly, especially in mountainous areas, so check the forecast and dress in layers.
2. **Safety**: If you plan to go on the water, make sure you have a life jacket and know how to swim. If you're not a strong swimmer, consider wearing a life jacket even if you're just walking along the shore.
3. **Wildlife**: Be aware of the local wildlife and maintain a safe distance. Do not feed the animals, as this can disrupt their natural behavior and diet.
4. **Leave No Trace**: Respect the environment by not littering and by staying on designated paths or trails.
5. **Emergency Preparedness**: Have a first aid kit and know basic first aid. It's also a good idea to have a way to contact emergency services if needed.
6. **Navigation**: If you're going into the woods or on the water, have a map or GPS device and know how to use it.
7. **Hydration and Nutrition**: Bring enough water and food for your visit, especially if you'll be active.
8. **Clothing and Gear**: Wear appropriate clothing for the weather and activities you plan to do. Bring sturdy footwear if you plan to hike or walk on uneven terrain.
9. **Local Regulations**: Follow any posted rules or regulations, which might include fishing regulations, fire restrictions, or camping rules.
10. **Respect for Others**: Be considerate of other visitors. Keep noise to a minimum and be mindful of others who may be trying to enjoy the same natural beauty as you.

Enjoy your visit and remember to take care of the environment and your fellow visitors.

This example was created by a different version, yorickvp/llava-v1.6-vicuna-13b:ba573f1c.
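A request like the example above might be assembled as in the following sketch. The input keys (`image`, `prompt`, `top_p`, `temperature`, `max_tokens`, `history`) are assumptions, since the field names are not shown on this page; only the ranges and defaults come from the Input section. The helper names `build_input` and `run_prediction` and the image URL are hypothetical. The call uses the `replicate` Python client, which needs a `REPLICATE_API_TOKEN` in the environment; depending on the client version, a full `owner/name:version` reference may be required, and the version hash is truncated on this page, so it is left out here.

```python
MODEL = "yorickvp/llava-v1.6-vicuna-13b"


def build_input(image, prompt, top_p=1.0, temperature=0.2,
                max_tokens=1024, history=None):
    """Validate values against the documented ranges and build the payload.

    Key names are assumed, not confirmed by the page.
    """
    if not prompt:
        raise ValueError("prompt is required")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    if temperature < 0:
        raise ValueError("temperature must be at least 0")
    if max_tokens < 0:
        raise ValueError("max_tokens must be at least 0")

    payload = {
        "image": image,
        "prompt": prompt,
        "top_p": top_p,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    if history:
        # Earlier messages alternate roles, starting with user input.
        payload["history"] = list(history)
    return payload


def run_prediction(payload):
    """Send the payload to Replicate and join the returned text pieces."""
    import replicate  # third-party client, imported lazily

    output = replicate.run(MODEL, input=payload)
    # Assumption: the output is a string or an iterable of string chunks.
    return output if isinstance(output, str) else "".join(output)


if __name__ == "__main__":
    example = build_input(
        "https://example.com/lake.jpg",  # hypothetical image URL
        "What should I be careful about when visiting this place?",
    )
    print(example)
```

Keeping validation separate from the network call lets the payload be checked locally before any paid prediction is started.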

Run time and cost

This model costs approximately $0.012 to run on Replicate, or 83 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
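The relationship between the per-run price and the runs-per-dollar figure above is simple arithmetic, sketched here for budgeting purposes. The constant and function names are hypothetical, and the $0.012 figure is the approximate value quoted on this page, so real costs will vary with inputs.

```python
COST_PER_RUN_USD = 0.012  # approximate figure quoted on this page


def runs_per_dollar(cost_per_run=COST_PER_RUN_USD):
    """Whole number of runs that fit in one dollar."""
    return int(1 / cost_per_run)


def estimated_cost(n_runs, cost_per_run=COST_PER_RUN_USD):
    """Rough total cost in USD for a batch of runs."""
    return round(n_runs * cost_per_run, 2)


print(runs_per_dollar())     # 83
print(estimated_cost(1000))  # 12.0
```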

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 13 seconds. The predict time for this model varies significantly based on the inputs.

Readme

Check out the different LLaVA models on Replicate:

Name                       Version  Base           Size  Finetunable
v1.5 - Vicuna-13B          v1.5     Vicuna         13B   Yes
v1.6 - Vicuna-13B          v1.6     Vicuna         13B   No
v1.6 - Vicuna-7B           v1.6     Vicuna         7B    No
v1.6 - Mistral-7B          v1.6     Mistral        7B    No
v1.6 - Nous-Hermes-2-34B   v1.6     Nous-Hermes-2  34B   No

🌋 LLaVA v1.6: Large Language and Vision Assistant

Visual instruction tuning towards large language and vision models with GPT-4 level capabilities.

[Project Page] [Demo] [Data] [Model Zoo]

Improved Baselines with Visual Instruction Tuning [Paper]
Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee

Visual Instruction Tuning (NeurIPS 2023, Oral) [Paper]
Haotian Liu*, Chunyuan Li*, Qingyang Wu, Yong Jae Lee (*Equal Contribution)

LLaVA v1.6 changes

LLaVA-1.6 is out! With additional scaling to LLaVA-1.5, LLaVA-1.6-34B outperforms Gemini Pro on some benchmarks. It can now process 4x more pixels and perform more tasks/applications than before. Check out the blog post!

Summary

LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder with Vicuna for general-purpose visual and language understanding. It achieves impressive chat capabilities in the spirit of the multimodal GPT-4 and sets a new state-of-the-art accuracy on Science QA.