chenxwh / deepseek-vl2
Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Prediction
Model: chenxwh/deepseek-vl2:8ea887897e772107ce53f3a7fa4850e78ae88b2b73ff854b4700db9f0d59c7cb
ID: 6gx56twc39rme0cm2ccsdrrmcw
Status: Succeeded
Source: Web
Hardware: L40S
Output
The figure illustrates the three-stage training process for a Vision-Language (VL) model using the DeepSeek LLM. Here's a detailed description of each stage:

### Stage 1: Training VL Adaptor

- **Components Involved:**
  - **DeepSeek LLM:** This is the large language model that serves as the backbone for the entire process.
  - **Vision-Language Adaptor:** This component is responsible for adapting the vision and language models to work together effectively.
  - **Hybrid Vision Encoder:** This encoder processes visual data and converts it into a format suitable for the model. It uses two specific models, SAM-B and SigLIP-L, which are pre-trained on image-text pairs.
  - **Image-Text Pairs:** These pairs are used to train the hybrid vision encoder. The encoder learns to understand and generate text based on visual inputs.
- **Process:**
  - The hybrid vision encoder is trained using image-text pairs. This training helps the encoder learn how to map images to corresponding text descriptions.

### Stage 2: Joint VL Pre-training

- **Components Involved:**
  - **DeepSeek LLM:** Continues to be the central model.
  - **Vision-Language Adaptor:** Continues to adapt the vision and language components.
  - **Hybrid Vision Encoder:** Continues to process visual data.
  - **Interleaved VL + Pure Language Sequences:** In this stage, both vision-language sequences and pure language sequences are interleaved. This means that the model is exposed to both types of data during training, allowing it to learn from a more diverse set of examples.
- **Process:**
  - The model is pre-trained using a combination of vision-language sequences and pure language sequences. This joint training helps the model develop a better understanding of both visual and textual information.

### Stage 3: Supervised Finetuning

- **Components Involved:**
  - **DeepSeek LLM:** Continues to be the central model.
  - **Vision-Language Adaptor:** Continues to refine the adaptation between vision and language.
  - **Hybrid Vision Encoder:** Continues to process visual data.
  - **VL Chat Data + Pure Language Chat Data:** In this final stage, the model is fine-tuned using both vision-language chat data and pure language chat data. This step focuses on improving the model's performance specifically in generating responses to chat queries.
- **Process:**
  - The model undergoes supervised finetuning with a focus on chat tasks. The finetuning process involves adjusting the model parameters to optimize its performance in generating coherent and contextually relevant responses to user queries.

### Summary

The figure outlines a comprehensive approach to training a Vision-Language model using the DeepSeek LLM. Starting with training the VL adaptor and hybrid vision encoder on image-text pairs, followed by joint pre-training with interleaved sequences, and finally, supervised finetuning with chat data, the model is progressively refined to enhance its ability to handle complex vision-language tasks.
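The three stages described in the output above can be summarized as a small data structure. This is only a sketch for readers who want the stage-to-data mapping at a glance: the `TrainingStage` type, its field names, and the `describe` helper are hypothetical, and the contents are taken from the model's description of the figure, not from any DeepSeek training code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrainingStage:
    """One stage of the pipeline as described in the output (hypothetical type)."""
    number: int
    name: str
    data: Tuple[str, ...]  # kinds of training data named for this stage


# Components the description lists as involved in every stage.
COMPONENTS = ("DeepSeek LLM", "Vision-Language Adaptor", "Hybrid Vision Encoder")

# Stage names and data sources, copied from the description above.
STAGES = (
    TrainingStage(1, "Training VL Adaptor", ("image-text pairs",)),
    TrainingStage(
        2,
        "Joint VL Pre-training",
        ("interleaved VL sequences", "pure language sequences"),
    ),
    TrainingStage(
        3,
        "Supervised Finetuning",
        ("VL chat data", "pure language chat data"),
    ),
)


def describe(stage: TrainingStage) -> str:
    """Render one stage as a single summary line."""
    return f"Stage {stage.number}: {stage.name} (data: {', '.join(stage.data)})"


if __name__ == "__main__":
    for stage in STAGES:
        print(describe(stage))
```

The sketch records only what the description states. It does not encode which components are trainable or frozen in each stage, because the output does not say.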
Prediction
Model: chenxwh/deepseek-vl2:8ea887897e772107ce53f3a7fa4850e78ae88b2b73ff854b4700db9f0d59c7cb
ID: wksm0fwe7hrmc0cm2cg81fvfbw
Status: Succeeded
Source: Web
Hardware: L40S
Output
Certainly! Here are a few recipe ideas that incorporate carrots, corn, and raw meat (likely pork ribs):

### Carrot Corn Pork Ribs Soup

**Ingredients:**

- 4 large carrots
- 2 ears of corn
- 1 pound pork ribs
- 2 tablespoons olive oil
- 1 onion, chopped
- 3 cloves garlic, minced
- 8 cups chicken broth
- 1 teaspoon dried thyme
- Salt and pepper to taste

**Instructions:**

1. **Preparation:**
   - Peel and chop the carrots into bite-sized pieces.
   - Remove the husks from the corn cobs and cut off the ends.
   - Cut the pork ribs into smaller pieces.
2. **Cooking:**
   - Heat the olive oil in a large pot over medium heat.
   - Add the chopped onion and sauté until translucent.
   - Add the minced garlic and cook for another minute.
   - Add the pork ribs and brown them on all sides.
   - Pour in the chicken broth and add the thyme.
   - Bring to a boil, then reduce the heat and let it simmer for about an hour or until the meat is tender.
   - Add the carrots and corn cobs, cover, and continue cooking for another 30 minutes.
   - Season with salt and pepper to taste.

### Grilled Pork Ribs with Roasted Carrots and Corn Salsa

**Ingredients:**

- 4 pork ribs
- 4 large carrots
- 2 ears of corn
- 1/4 cup olive oil
- 2 tablespoons honey
- 1 tablespoon soy sauce
- 1 teaspoon ground cumin
- 1 teaspoon smoked paprika
- 1/4 cup fresh cilantro, chopped
- Salt and pepper to taste

**Instructions:**

1. **Marinating:**
   - In a bowl, mix together olive oil, honey, soy sauce, cumin, smoked paprika, and salt and pepper.
   - Place the pork ribs in the marinade, turning to coat evenly. Let it marinate for at least 2 hours, preferably overnight.
2. **Grilling:**
   - Preheat your grill to medium-high heat.
   - Remove the ribs from the marinade, letting any excess liquid drip off.
   - Grill the ribs for about 5-7 minutes per side, or until they reach your desired level of doneness.
3. **Roasting:**
   - Preheat your oven to 400°F (200°C).
   - Toss the carrots and corn with olive oil, salt, and pepper. Spread them out on a baking sheet.
   - Roast in the oven for about 20-25 minutes, or until the vegetables are tender and slightly caramelized.
4. **Assembly:**
   - Serve the grilled pork ribs hot, accompanied by roasted carrots and corn.
   - Garnish with fresh cilantro and serve with a side of salsa made from the remaining marinade ingredients.

These recipes should provide a delicious and hearty meal using the ingredients shown in the images. Enjoy cooking!
Prediction
Model: chenxwh/deepseek-vl2:8ea887897e772107ce53f3a7fa4850e78ae88b2b73ff854b4700db9f0d59c7cb
ID: 00e9h1m96xrmc0cm2cjv7rj8fc
Status: Succeeded
Source: Web
Hardware: L40S
Output
No, the two images are not showing the same animal. The first image features a raccoon with distinctive black "mask" markings around its eyes and a bushy tail. It has a grayish-brown fur coat and is looking directly at the camera. The second image shows a red panda with reddish-brown fur, white markings on its face, and a fluffy tail. It is perched on a branch against a clear blue sky background.
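The predictions above all ran against the same pinned model version. A prediction of this kind could be requested from Python roughly as follows. This is a minimal sketch under stated assumptions: it uses the `replicate` client's `replicate.run` call, but the input field names `image` and `prompt` are assumptions not confirmed by this page, so check the model's input schema before relying on them. The helper names `build_input` and `run_prediction` are hypothetical.

```python
# Model reference copied from the prediction metadata above.
MODEL_VERSION = (
    "chenxwh/deepseek-vl2:"
    "8ea887897e772107ce53f3a7fa4850e78ae88b2b73ff854b4700db9f0d59c7cb"
)


def build_input(image_url: str, prompt: str) -> dict:
    """Build the input payload.

    ASSUMPTION: the field names "image" and "prompt" are guesses; the real
    schema for this model may use different names or require more fields.
    """
    return {"image": image_url, "prompt": prompt}


def run_prediction(image_url: str, prompt: str):
    """Send one prediction request (hypothetical helper).

    Requires the third-party `replicate` package and a REPLICATE_API_TOKEN
    environment variable. The import is local so that the payload helper
    above can be used without the package installed.
    """
    import replicate

    return replicate.run(MODEL_VERSION, input=build_input(image_url, prompt))


# Example call (placeholder URL, not a real image):
# output = run_prediction(
#     "https://example.com/animals.jpg",
#     "Are these two images showing the same animal?",
# )
```

Pinning the full version hash, as the metadata above does, keeps results comparable across runs even if the model owner later publishes a new version.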