yorickvp/llava-13b
Visual instruction tuning towards large language and vision models with GPT-4 level capabilities
Train yorickvp/llava-13b
Trainings for this model run on Nvidia A100 (80GB) GPU hardware, which costs $0.0014 per second.
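At that rate, an hour of training costs about $5.04 (3,600 seconds × $0.0014/second).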
You can fine-tune LLaVA on your own dataset using LoRA! Training data can be passed to cog train via the train_data parameter. Your training dataset should be a zip file with the following structure:
- ./images/: a folder containing the training images.
- ./data.json: a JSON file that links images to conversations; a sketch follows below. For details, see the dataset format instructions in the GitHub repository.
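To make that layout concrete, here is a minimal Python sketch that assembles such a zip file. The data.json field names used here ("image", "conversations", "from", "value") are assumptions for illustration only; the dataset format instructions in the GitHub repository are authoritative.

import json
import zipfile
from pathlib import Path

# Hypothetical record: the field names below are assumptions; check the
# dataset format instructions in the GitHub repository for the real schema.
data = [
    {
        "image": "images/photo-001.jpg",
        "conversations": [
            {"from": "human", "value": "What is shown in this image?"},
            {"from": "gpt", "value": "A dog playing fetch in a park."},
        ],
    },
]

with zipfile.ZipFile("my-input-images.zip", "w") as zf:
    # Bundle every image under images/ ...
    for img in sorted(Path("images").glob("*.jpg")):
        zf.write(img, arcname=f"images/{img.name}")
    # ... plus the data.json that links images to conversations.
    zf.writestr("data.json", json.dumps(data, indent=2))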
Example code for training:
import replicate

# Start a LoRA fine-tune of LLaVA. train_data points at the zip file
# described above; destination is the model that receives the weights.
training = replicate.trainings.create(
    version="yorickvp/llava-13b:[version_id]",
    input={
        "train_data": "https://my-domain/my-input-images.zip",
    },
    destination="my-name/my-model",
)
print(training)
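The create() call returns immediately while the training runs remotely. Here is a minimal sketch for watching its progress, assuming the Replicate Python client's reload() helper and status/output attributes:

import time

# Poll until the training reaches a terminal state.
while training.status not in {"succeeded", "failed", "canceled"}:
    time.sleep(30)       # check every 30 seconds
    training.reload()    # refresh the training from the API
    print(training.status)

if training.status == "succeeded":
    print(training.output)  # details of the fine-tuned version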
You can find more information about fine-tuning image models in the Replicate docs. The tutorial on fine-tuning SDXL with your own images is a good starting point.
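Once a training succeeds, the fine-tuned model is available at the destination you specified and can be run like any other Replicate model. A sketch of that, assuming the base model's image and prompt input names carry over (verify against your model's schema), with [version_id] standing in for the version your training produced:

import replicate

# Hypothetical usage of the fine-tuned model; the input names are
# assumptions based on the base LLaVA model.
output = replicate.run(
    "my-name/my-model:[version_id]",
    input={
        "image": "https://my-domain/test-image.jpg",
        "prompt": "Describe this image.",
    },
)
for token in output:
    print(token, end="")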