yorickvp / llava-13b

Visual instruction tuning towards large language and vision models with GPT-4 level capabilities

Train yorickvp/llava-13b

Trainings for this model run on NVIDIA A100 (80GB) GPU hardware, which costs $0.0014 per second (about $5.04 per hour).

You can finetune LLaVA with your own dataset, using LoRA! Training data is passed to cog train with the train_data parameter. Your training dataset should be a zip file with the following structure:

  • ./images/: A folder with training data images.
  • ./data.json: A JSON file that links images to conversations. For details, see the dataset format instructions in the GitHub repository; a minimal sketch of an example entry follows this list.
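As a rough illustration, here is one way to assemble such a zip file in Python. The data.json entry shown loosely follows LLaVA's conversation format (id, image, and a list of from/value turns), but the exact schema is documented in the GitHub repository, so treat those instructions as authoritative; the file names and prompt text below are hypothetical.

import json
import zipfile
from pathlib import Path

# Hypothetical example entry, loosely following LLaVA's conversation
# format; check the dataset format instructions for the exact fields.
data = [
    {
        "id": "0001",
        "image": "0001.jpg",  # relative to the images/ folder
        "conversations": [
            {"from": "human", "value": "<image>\nWhat is shown in this photo?"},
            {"from": "gpt", "value": "A cat sleeping on a windowsill."},
        ],
    }
]

# Write data.json and bundle it with the images/ folder into a zip
# matching the structure described above.
Path("data.json").write_text(json.dumps(data, indent=2))
with zipfile.ZipFile("my-input-images.zip", "w") as zf:
    zf.write("data.json")
    for img in Path("images").glob("*"):
        zf.write(img, arcname=f"images/{img.name}")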

Example code for training:

import replicate

training = replicate.trainings.create(
    # Replace [version_id] with the version of llava-13b to train from.
    version="yorickvp/llava-13b:[version_id]",
    input={
        # Publicly accessible URL of your training data zip file.
        "train_data": "https://my-domain/my-input-images.zip",
    },
    # The model on your account that will receive the trained weights.
    destination="my-name/my-model"
)
print(training)
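Trainings run asynchronously, so the object returned by create will initially be in a "starting" state. As a minimal sketch, assuming the training object from the example above, you could poll for completion like this (replicate.trainings.get is part of the Replicate Python client; the polling interval is arbitrary):

import time
import replicate

# Poll until the training reaches a terminal state.
while True:
    training = replicate.trainings.get(training.id)
    if training.status in ("succeeded", "failed", "canceled"):
        break
    time.sleep(30)  # arbitrary polling interval

print(training.status)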

You can find more information about finetuning image models in the Replicate docs. The tutorial on finetuning SDXL with your own images is a good starting point.