yorickvp/llava-13b
Visual instruction tuning towards large language and vision models with GPT-4 level capabilities
If you haven’t yet trained a model on Replicate, we recommend you read one of the following guides.
Pricing
Trainings for this model run on Nvidia A100 (80GB) GPU hardware, which costs $0.0014 per second.
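As a rough guide, the per-second rate above translates to cost like this (a minimal sketch; the rate is taken from this page, and actual billing depends on measured runtime):

```python
# Estimated cost of a training run on Nvidia A100 (80GB) hardware.
# Rate from this page; billing is per second of runtime.
RATE_PER_SECOND = 0.0014  # USD

def training_cost(seconds: float) -> float:
    """Return the estimated cost in USD for a run of the given length."""
    return seconds * RATE_PER_SECOND

# For example, a one-hour training run:
print(f"${training_cost(3600):.2f}")  # → $5.04
```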
Create a training
Note that before you can create a training, you’ll need to create a model and use its name as the value for the destination field.
You can fine-tune LLaVA on your own dataset using LoRA. Training data is passed to cog train via the train_data parameter. Your training dataset should be a zip file with the following structure:
- ./images/: A folder with training data images.
- ./data.json: A JSON file that links images to conversations. For details, see the dataset format instructions in the GitHub repository.
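A zip with that layout can be assembled with the standard library. This is a sketch, not part of the official tooling; the build_training_zip helper is hypothetical, and the data.json record shape shown in its docstring is an assumption based on the LLaVA-style conversation format described in the repository, so check the linked dataset instructions for the exact schema:

```python
import json
import zipfile
from pathlib import Path

def build_training_zip(examples, image_dir, out_path):
    """Bundle images and a data.json into the zip layout described above.

    `examples` is a list of dicts in a LLaVA-style conversation format
    (an assumption -- verify against the repository's instructions), e.g.:
      {"id": "0", "image": "cat.jpg",
       "conversations": [
           {"from": "human", "value": "<image>\nWhat is in this photo?"},
           {"from": "gpt", "value": "A cat."}]}
    """
    image_dir = Path(image_dir)
    with zipfile.ZipFile(out_path, "w") as zf:
        # data.json at the archive root links images to conversations.
        zf.writestr("data.json", json.dumps(examples, indent=2))
        # Every referenced image goes under images/.
        for ex in examples:
            zf.write(image_dir / ex["image"], f"images/{ex['image']}")
```

The resulting zip can then be hosted at a URL and passed as the train_data input.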
Example code for training:
import replicate

training = replicate.trainings.create(
    version="yorickvp/llava-13b:[version_id]",
    input={
        "train_data": "https://my-domain/my-input-images.zip",
    },
    destination="my-name/my-model",
)
print(training)
You can find more information about fine-tuning image models in the Replicate docs. The tutorial on fine-tuning SDXL with your own images is a good starting point.