Report output

Machine learning models are powerful, but their output is unmoderated. Sometimes a model's output is incorrect, broken, or buggy; it might raise copyright issues; or it might contain inappropriate content.

Whatever the reason, please use this form to report it to us when something is wrong with the output of a model you've run.

We'll investigate the reported output and take appropriate action. We may flag this output to the model author if we think they should be aware.

Your report

Output

The image depicts a three-stage process for training a vision-language model.

1. Stage 1: Training VL Adapter: In this stage, a vision-language adapter is trained using supervised fine-tuning. The adapter is trained on image-text pairs and pure language sequences.
2. Stage 2: Joint VL Pre-training: In this stage, a joint vision-language model is pre-trained using self-supervised learning. The model is trained on image-text pairs and pure language sequences.
3. Stage 3: Supervised Fine-tuning: In this stage, the model is fine-tuned on supervised tasks using image-text pairs and pure language sequences.

The model is trained using a hybrid vision-language adapter, which combines a vision-language adapter with a language model. The model is trained on a variety of tasks, including image captioning, visual question answering, and visual reasoning. The model is able to understand the visual content of an image and generate a natural language description or answer.

Give us a few details about how this output is unsafe or broken.