shimmercam / animatediff-v3
AnimateDiff v3 + SparseCtrl: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning. Created with Shimmer.
Prediction
shimmercam/animatediff-v3:dd87e0a6eed441ad5944d9b10a2020245889b6b85fe97bd9abeb76334d2c95ec
ID: pf4uuybbagzq4z7w5p6tkec2ge
Status: Succeeded
Source: Web
Hardware: A100 (40GB)
Input
- seed: -1
- steps: 25
- length: 16
- prompt: husky running in the snow
- guidance: 8.5
- negative_prompt: worst quality, low quality, letterboxed
- dreambooth_model: None
{ "seed": -1, "steps": 25, "length": 16, "prompt": "husky running in the snow", "guidance": 8.5, "negative_prompt": "worst quality, low quality, letterboxed", "dreambooth_model": "None" }
Install Replicate’s Node.js client library:

npm install replicate
Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run shimmercam/animatediff-v3 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "shimmercam/animatediff-v3:dd87e0a6eed441ad5944d9b10a2020245889b6b85fe97bd9abeb76334d2c95ec",
  {
    input: {
      seed: -1,
      steps: 25,
      length: 16,
      prompt: "husky running in the snow",
      guidance: 8.5,
      negative_prompt: "worst quality, low quality, letterboxed",
      dreambooth_model: "None"
    }
  }
);

// To access the file URL:
console.log(output.url());
//=> e.g. "https://replicate.delivery/.../output.gif"

// To write the file to disk (this model returns an animated GIF):
await fs.promises.writeFile("output.gif", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:

pip install replicate
Import the client:

import replicate
Run shimmercam/animatediff-v3 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "shimmercam/animatediff-v3:dd87e0a6eed441ad5944d9b10a2020245889b6b85fe97bd9abeb76334d2c95ec",
    input={
        "seed": -1,
        "steps": 25,
        "length": 16,
        "prompt": "husky running in the snow",
        "guidance": 8.5,
        "negative_prompt": "worst quality, low quality, letterboxed",
        "dreambooth_model": "None"
    }
)
print(output)
To learn more, take a look at the guide on getting started with Python.
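The snippet above only prints the output, which for this model is the URL of a GIF. If you also want the file on disk, here is a minimal sketch under the assumption that output is either a plain URL string or a file-like object exposing a url attribute (as in newer versions of the client); the download itself uses urllib from the standard library:

import urllib.request

import replicate

output = replicate.run(
    "shimmercam/animatediff-v3:dd87e0a6eed441ad5944d9b10a2020245889b6b85fe97bd9abeb76334d2c95ec",
    input={
        "seed": -1,
        "steps": 25,
        "length": 16,
        "prompt": "husky running in the snow",
        "guidance": 8.5,
        "negative_prompt": "worst quality, low quality, letterboxed",
        "dreambooth_model": "None",
    },
)

# `output` may be a URL string or a file-like object with a `url` attribute,
# depending on the client version (assumption); normalize it to a string.
url = output if isinstance(output, str) else getattr(output, "url", str(output))

# This model returns an animated GIF, so save it with a .gif extension.
urllib.request.urlretrieve(url, "output.gif")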
Run shimmercam/animatediff-v3 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "shimmercam/animatediff-v3:dd87e0a6eed441ad5944d9b10a2020245889b6b85fe97bd9abeb76334d2c95ec",
    "input": {
      "seed": -1,
      "steps": 25,
      "length": 16,
      "prompt": "husky running in the snow",
      "guidance": 8.5,
      "negative_prompt": "worst quality, low quality, letterboxed",
      "dreambooth_model": "None"
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
Output
{ "completed_at": "2023-12-16T20:58:17.238586Z", "created_at": "2023-12-16T20:54:44.468540Z", "data_removed": false, "error": null, "id": "pf4uuybbagzq4z7w5p6tkec2ge", "input": { "seed": -1, "steps": 25, "length": 16, "prompt": "husky running in the snow", "guidance": 8.5, "negative_prompt": "worst quality, low quality, letterboxed", "dreambooth_model": "None" }, "logs": "loading controlnet checkpoint from /AnimateDiff/models/SparseCtrl/v3_sd15_sparsectrl_rgb.ckpt ...\nload motion module from /AnimateDiff/models/Motion_Module/v3_sd15_mm.ckpt\nload domain lora from /AnimateDiff/models/Motion_Module/v3_sd15_adapter.ckpt\n 0%| | 0/25 [00:00<?, ?it/s]\n 4%|▍ | 1/25 [00:02<00:49, 2.07s/it]\n 8%|▊ | 2/25 [00:03<00:40, 1.78s/it]\n 12%|█▏ | 3/25 [00:05<00:37, 1.68s/it]\n 16%|█▌ | 4/25 [00:06<00:34, 1.64s/it]\n 20%|██ | 5/25 [00:08<00:32, 1.61s/it]\n 24%|██▍ | 6/25 [00:09<00:30, 1.60s/it]\n 28%|██▊ | 7/25 [00:11<00:28, 1.59s/it]\n 32%|███▏ | 8/25 [00:13<00:26, 1.58s/it]\n 36%|███▌ | 9/25 [00:14<00:25, 1.58s/it]\n 40%|████ | 10/25 [00:16<00:23, 1.58s/it]\n 44%|████▍ | 11/25 [00:17<00:22, 1.58s/it]\n 48%|████▊ | 12/25 [00:19<00:20, 1.57s/it]\n 52%|█████▏ | 13/25 [00:20<00:18, 1.57s/it]\n 56%|█████▌ | 14/25 [00:22<00:17, 1.57s/it]\n 60%|██████ | 15/25 [00:24<00:15, 1.57s/it]\n 64%|██████▍ | 16/25 [00:25<00:14, 1.57s/it]\n 68%|██████▊ | 17/25 [00:27<00:12, 1.57s/it]\n 72%|███████▏ | 18/25 [00:28<00:11, 1.57s/it]\n 76%|███████▌ | 19/25 [00:30<00:09, 1.57s/it]\n 80%|████████ | 20/25 [00:31<00:07, 1.57s/it]\n 84%|████████▍ | 21/25 [00:33<00:06, 1.57s/it]\n 88%|████████▊ | 22/25 [00:35<00:04, 1.57s/it]\n 92%|█████████▏| 23/25 [00:36<00:03, 1.57s/it]\n 96%|█████████▌| 24/25 [00:38<00:01, 1.57s/it]\n100%|██████████| 25/25 [00:39<00:00, 1.57s/it]\n100%|██████████| 25/25 [00:39<00:00, 1.59s/it]\n 0%| | 0/16 [00:00<?, ?it/s]\n 31%|███▏ | 5/16 [00:00<00:00, 44.00it/s]\n 62%|██████▎ | 10/16 [00:00<00:00, 23.75it/s]\n 81%|████████▏ | 13/16 [00:00<00:00, 21.43it/s]\n100%|██████████| 16/16 [00:00<00:00, 20.17it/s]\n100%|██████████| 16/16 [00:00<00:00, 22.03it/s]", "metrics": { "predict_time": 51.429076, "total_time": 212.770046 }, "output": "https://replicate.delivery/pbxt/NPJe0gaetTmb1Ee6rebJ6fFjnFfgAh1dCjp9UySp3UNOaBvgE/output.gif", "started_at": "2023-12-16T20:57:25.809510Z", "status": "succeeded", "urls": { "get": "https://api.replicate.com/v1/predictions/pf4uuybbagzq4z7w5p6tkec2ge", "cancel": "https://api.replicate.com/v1/predictions/pf4uuybbagzq4z7w5p6tkec2ge/cancel" }, "version": "dd87e0a6eed441ad5944d9b10a2020245889b6b85fe97bd9abeb76334d2c95ec" }
Prediction
shimmercam/animatediff-v3:5e0d88c259792ad95a3c596ed1997de772c424dd0222802b830d43a36f861763
ID: pxl7jdlbirpfuq2g3bdrbcukvq
Status: Succeeded
Source: Web
Hardware: A100 (40GB)
Input
- seed: -1
- steps: 25
- length: 16
- prompt: close up photo of a rabbit, forest, haze, halation, bloom, dramatic atmosphere, centred, rule of thirds, 200mm 1.4f macro shot
- guidance: 7
- negative_prompt: (deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime), text, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck
- dreambooth_model: Realistic Vision
{ "seed": -1, "steps": 25, "length": 16, "prompt": "close up photo of a rabbit, forest, haze, halation, bloom, dramatic atmosphere, centred, rule of thirds, 200mm 1.4f macro shot", "guidance": 7, "negative_prompt": "(deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime), text, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck", "dreambooth_model": "Realistic Vision" }
Install Replicate’s Node.js client library:

npm install replicate
Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run shimmercam/animatediff-v3 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "shimmercam/animatediff-v3:5e0d88c259792ad95a3c596ed1997de772c424dd0222802b830d43a36f861763",
  {
    input: {
      seed: -1,
      steps: 25,
      length: 16,
      prompt: "close up photo of a rabbit, forest, haze, halation, bloom, dramatic atmosphere, centred, rule of thirds, 200mm 1.4f macro shot",
      guidance: 7,
      negative_prompt: "(deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime), text, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck",
      dreambooth_model: "Realistic Vision"
    }
  }
);

// To access the file URL:
console.log(output.url());
//=> e.g. "https://replicate.delivery/.../output.gif"

// To write the file to disk (this model returns an animated GIF):
await fs.promises.writeFile("output.gif", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:

pip install replicate
Import the client:

import replicate
Run shimmercam/animatediff-v3 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "shimmercam/animatediff-v3:5e0d88c259792ad95a3c596ed1997de772c424dd0222802b830d43a36f861763",
    input={
        "seed": -1,
        "steps": 25,
        "length": 16,
        "prompt": "close up photo of a rabbit, forest, haze, halation, bloom, dramatic atmosphere, centred, rule of thirds, 200mm 1.4f macro shot",
        "guidance": 7,
        "negative_prompt": "(deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime), text, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck",
        "dreambooth_model": "Realistic Vision"
    }
)
print(output)
To learn more, take a look at the guide on getting started with Python.
Run shimmercam/animatediff-v3 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "shimmercam/animatediff-v3:5e0d88c259792ad95a3c596ed1997de772c424dd0222802b830d43a36f861763",
    "input": {
      "seed": -1,
      "steps": 25,
      "length": 16,
      "prompt": "close up photo of a rabbit, forest, haze, halation, bloom, dramatic atmosphere, centred, rule of thirds, 200mm 1.4f macro shot",
      "guidance": 7,
      "negative_prompt": "(deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime), text, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck",
      "dreambooth_model": "Realistic Vision"
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
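The Prefer: wait header asks the API to hold the request open while the prediction runs; without it, the API responds immediately with a starting prediction that you poll via its urls.get endpoint. The sketch below uses the third-party requests package and an abbreviated input payload; the version hash matches the version field in the output record below, and the polling loop itself is an illustrative assumption:

import os
import time

import requests

API_URL = "https://api.replicate.com/v1/predictions"
HEADERS = {
    "Authorization": f"Bearer {os.environ['REPLICATE_API_TOKEN']}",
    "Content-Type": "application/json",
}

# Create the prediction (input abbreviated; see the full payload above).
prediction = requests.post(API_URL, headers=HEADERS, json={
    "version": "5e0d88c259792ad95a3c596ed1997de772c424dd0222802b830d43a36f861763",
    "input": {
        "prompt": "close up photo of a rabbit, forest, haze, halation, bloom, "
                  "dramatic atmosphere, centred, rule of thirds, 200mm 1.4f macro shot",
        "steps": 25,
        "length": 16,
        "guidance": 7,
        "dreambooth_model": "Realistic Vision",
    },
}).json()

# Poll the "get" URL until the prediction reaches a terminal status.
while prediction["status"] not in ("succeeded", "failed", "canceled"):
    time.sleep(2)
    prediction = requests.get(prediction["urls"]["get"], headers=HEADERS).json()

print(prediction["status"], prediction.get("output"))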
Output
{ "completed_at": "2023-12-16T21:26:14.986192Z", "created_at": "2023-12-16T21:21:40.999806Z", "data_removed": false, "error": null, "id": "pxl7jdlbirpfuq2g3bdrbcukvq", "input": { "seed": -1, "steps": 25, "length": 16, "prompt": "close up photo of a rabbit, forest, haze, halation, bloom, dramatic atmosphere, centred, rule of thirds, 200mm 1.4f macro shot", "guidance": 7, "negative_prompt": "(deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime), text, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck", "dreambooth_model": "Realistic Vision" }, "logs": "loading controlnet checkpoint from /AnimateDiff/models/SparseCtrl/v3_sd15_sparsectrl_rgb.ckpt ...\nload motion module from /AnimateDiff/models/Motion_Module/v3_sd15_mm.ckpt\nload dreambooth model from /AnimateDiff/models/DreamBooth_LoRA/realisticVisionV60B1_v51VAE.safetensors\nconfig.json: 0%| | 0.00/4.52k [00:00<?, ?B/s]\nconfig.json: 100%|██████████| 4.52k/4.52k [00:00<00:00, 18.9MB/s]\nmodel.safetensors: 0%| | 0.00/1.71G [00:00<?, ?B/s]\nmodel.safetensors: 1%| | 10.5M/1.71G [00:00<01:08, 25.0MB/s]\nmodel.safetensors: 2%|▏ | 41.9M/1.71G [00:00<00:18, 91.7MB/s]\nmodel.safetensors: 4%|▍ | 73.4M/1.71G [00:00<00:11, 146MB/s] \nmodel.safetensors: 6%|▌ | 105M/1.71G [00:00<00:08, 179MB/s] \nmodel.safetensors: 8%|▊ | 136M/1.71G [00:00<00:07, 202MB/s]\nmodel.safetensors: 10%|▉ | 168M/1.71G [00:01<00:07, 206MB/s]\nmodel.safetensors: 12%|█▏ | 199M/1.71G [00:01<00:06, 223MB/s]\nmodel.safetensors: 13%|█▎ | 231M/1.71G [00:01<00:06, 237MB/s]\nmodel.safetensors: 15%|█▌ | 262M/1.71G [00:01<00:05, 248MB/s]\nmodel.safetensors: 17%|█▋ | 294M/1.71G [00:01<00:06, 217MB/s]\nmodel.safetensors: 19%|█▉ | 325M/1.71G [00:01<00:05, 232MB/s]\nmodel.safetensors: 21%|██ | 357M/1.71G [00:01<00:05, 245MB/s]\nmodel.safetensors: 23%|██▎ | 388M/1.71G [00:01<00:05, 233MB/s]\nmodel.safetensors: 25%|██▍ | 419M/1.71G [00:02<00:05, 238MB/s]\nmodel.safetensors: 26%|██▋ | 451M/1.71G [00:02<00:05, 243MB/s]\nmodel.safetensors: 28%|██▊ | 482M/1.71G [00:02<00:04, 247MB/s]\nmodel.safetensors: 31%|███ | 524M/1.71G [00:02<00:04, 264MB/s]\nmodel.safetensors: 32%|███▏ | 556M/1.71G [00:02<00:04, 256MB/s]\nmodel.safetensors: 34%|███▍ | 587M/1.71G [00:02<00:04, 262MB/s]\nmodel.safetensors: 36%|███▌ | 619M/1.71G [00:02<00:04, 254MB/s]\nmodel.safetensors: 38%|███▊ | 650M/1.71G [00:02<00:04, 256MB/s]\nmodel.safetensors: 40%|███▉ | 682M/1.71G [00:03<00:03, 262MB/s]\nmodel.safetensors: 42%|████▏ | 713M/1.71G [00:03<00:03, 265MB/s]\nmodel.safetensors: 44%|████▎ | 744M/1.71G [00:03<00:03, 258MB/s]\nmodel.safetensors: 45%|████▌ | 776M/1.71G [00:03<00:03, 256MB/s]\nmodel.safetensors: 47%|████▋ | 807M/1.71G [00:03<00:03, 257MB/s]\nmodel.safetensors: 49%|████▉ | 839M/1.71G [00:03<00:03, 262MB/s]\nmodel.safetensors: 51%|█████ | 870M/1.71G [00:03<00:03, 264MB/s]\nmodel.safetensors: 53%|█████▎ | 902M/1.71G [00:03<00:03, 260MB/s]\nmodel.safetensors: 55%|█████▍ | 933M/1.71G [00:04<00:03, 256MB/s]\nmodel.safetensors: 56%|█████▋ | 965M/1.71G [00:04<00:02, 258MB/s]\nmodel.safetensors: 58%|█████▊ | 996M/1.71G [00:04<00:02, 259MB/s]\nmodel.safetensors: 60%|██████ | 1.03G/1.71G [00:04<00:02, 262MB/s]\nmodel.safetensors: 
62%|██████▏ | 1.06G/1.71G [00:04<00:02, 262MB/s]\nmodel.safetensors: 64%|██████▍ | 1.09G/1.71G [00:04<00:02, 260MB/s]\nmodel.safetensors: 66%|██████▌ | 1.12G/1.71G [00:04<00:02, 260MB/s]\nmodel.safetensors: 67%|██████▋ | 1.15G/1.71G [00:04<00:02, 255MB/s]\nmodel.safetensors: 69%|██████▉ | 1.18G/1.71G [00:05<00:02, 257MB/s]\nmodel.safetensors: 71%|███████ | 1.22G/1.71G [00:05<00:01, 261MB/s]\nmodel.safetensors: 73%|███████▎ | 1.25G/1.71G [00:05<00:01, 255MB/s]\nmodel.safetensors: 75%|███████▍ | 1.28G/1.71G [00:05<00:01, 260MB/s]\nmodel.safetensors: 77%|███████▋ | 1.31G/1.71G [00:05<00:01, 264MB/s]\nmodel.safetensors: 78%|███████▊ | 1.34G/1.71G [00:05<00:01, 253MB/s]\nmodel.safetensors: 80%|████████ | 1.37G/1.71G [00:05<00:01, 261MB/s]\nmodel.safetensors: 82%|████████▏ | 1.41G/1.71G [00:05<00:01, 267MB/s]\nmodel.safetensors: 84%|████████▍ | 1.44G/1.71G [00:05<00:01, 263MB/s]\nmodel.safetensors: 86%|████████▌ | 1.47G/1.71G [00:06<00:00, 262MB/s]\nmodel.safetensors: 88%|████████▊ | 1.50G/1.71G [00:06<00:00, 248MB/s]\nmodel.safetensors: 89%|████████▉ | 1.53G/1.71G [00:06<00:00, 256MB/s]\nmodel.safetensors: 92%|█████████▏| 1.57G/1.71G [00:06<00:00, 271MB/s]\nmodel.safetensors: 94%|█████████▍| 1.60G/1.71G [00:06<00:00, 257MB/s]\nmodel.safetensors: 96%|█████████▌| 1.64G/1.71G [00:06<00:00, 257MB/s]\nmodel.safetensors: 97%|█████████▋| 1.67G/1.71G [00:06<00:00, 258MB/s]\nmodel.safetensors: 99%|█████████▉| 1.70G/1.71G [00:07<00:00, 261MB/s]\nmodel.safetensors: 100%|██████████| 1.71G/1.71G [00:07<00:00, 242MB/s]\nSome weights of the model checkpoint at openai/clip-vit-large-patch14 were not used when initializing CLIPTextModel: ['vision_model.encoder.layers.14.mlp.fc1.weight', 'vision_model.encoder.layers.8.self_attn.out_proj.bias', 'vision_model.encoder.layers.19.layer_norm2.weight', 'vision_model.encoder.layers.5.mlp.fc2.weight', 'vision_model.encoder.layers.16.mlp.fc1.bias', 'vision_model.encoder.layers.7.layer_norm1.bias', 'vision_model.encoder.layers.16.self_attn.v_proj.bias', 'vision_model.encoder.layers.0.self_attn.q_proj.bias', 'vision_model.encoder.layers.5.self_attn.k_proj.bias', 'vision_model.encoder.layers.21.self_attn.v_proj.weight', 'vision_model.encoder.layers.5.self_attn.k_proj.weight', 'vision_model.encoder.layers.17.self_attn.v_proj.bias', 'vision_model.encoder.layers.10.self_attn.v_proj.weight', 'vision_model.encoder.layers.22.self_attn.out_proj.bias', 'vision_model.encoder.layers.1.self_attn.v_proj.weight', 'vision_model.encoder.layers.20.self_attn.out_proj.weight', 'vision_model.encoder.layers.6.self_attn.q_proj.bias', 'vision_model.encoder.layers.4.self_attn.q_proj.weight', 'vision_model.encoder.layers.19.layer_norm1.weight', 'vision_model.encoder.layers.0.self_attn.v_proj.weight', 'vision_model.encoder.layers.9.self_attn.v_proj.weight', 'vision_model.encoder.layers.15.layer_norm1.bias', 'vision_model.encoder.layers.7.self_attn.v_proj.bias', 'vision_model.encoder.layers.20.self_attn.k_proj.bias', 'vision_model.encoder.layers.15.mlp.fc1.bias', 'vision_model.encoder.layers.8.self_attn.v_proj.weight', 'vision_model.encoder.layers.8.mlp.fc1.weight', 'vision_model.encoder.layers.21.self_attn.q_proj.weight', 'vision_model.encoder.layers.10.self_attn.k_proj.bias', 'vision_model.encoder.layers.23.layer_norm1.bias', 'vision_model.encoder.layers.20.mlp.fc2.bias', 'vision_model.encoder.layers.7.mlp.fc1.bias', 'vision_model.encoder.layers.12.layer_norm2.bias', 'vision_model.encoder.layers.21.mlp.fc1.bias', 'vision_model.encoder.layers.2.self_attn.out_proj.bias', 
'vision_model.encoder.layers.13.layer_norm1.weight', 'vision_model.encoder.layers.5.layer_norm1.weight', 'vision_model.encoder.layers.13.self_attn.k_proj.weight', 'vision_model.encoder.layers.19.self_attn.k_proj.weight', 'vision_model.encoder.layers.13.self_attn.v_proj.bias', 'vision_model.encoder.layers.12.self_attn.k_proj.weight', 'vision_model.encoder.layers.17.self_attn.q_proj.bias', 'vision_model.encoder.layers.5.self_attn.q_proj.weight', 'vision_model.encoder.layers.11.self_attn.q_proj.weight', 'vision_model.encoder.layers.18.self_attn.v_proj.weight', 'vision_model.encoder.layers.14.layer_norm2.bias', 'vision_model.encoder.layers.21.self_attn.k_proj.weight', 'vision_model.encoder.layers.6.mlp.fc2.weight', 'visual_projection.weight', 'vision_model.encoder.layers.17.layer_norm1.bias', 'vision_model.encoder.layers.15.layer_norm2.weight', 'vision_model.encoder.layers.12.layer_norm2.weight', 'vision_model.encoder.layers.15.self_attn.out_proj.bias', 'vision_model.encoder.layers.20.mlp.fc1.weight', 'vision_model.encoder.layers.18.self_attn.v_proj.bias', 'vision_model.encoder.layers.8.layer_norm1.bias', 'vision_model.encoder.layers.8.mlp.fc2.weight', 'vision_model.encoder.layers.6.mlp.fc2.bias', 'vision_model.encoder.layers.10.mlp.fc2.weight', 'vision_model.encoder.layers.22.mlp.fc1.weight', 'vision_model.encoder.layers.11.self_attn.v_proj.bias', 'vision_model.post_layernorm.bias', 'vision_model.encoder.layers.4.layer_norm1.weight', 'vision_model.encoder.layers.8.self_attn.k_proj.weight', 'vision_model.encoder.layers.22.self_attn.q_proj.bias', 'vision_model.encoder.layers.22.self_attn.out_proj.weight', 'vision_model.pre_layrnorm.bias', 'vision_model.encoder.layers.12.mlp.fc2.weight', 'vision_model.encoder.layers.9.mlp.fc2.bias', 'vision_model.encoder.layers.8.self_attn.v_proj.bias', 'vision_model.encoder.layers.17.self_attn.out_proj.weight', 'vision_model.encoder.layers.18.layer_norm2.weight', 'vision_model.encoder.layers.18.mlp.fc2.weight', 'vision_model.encoder.layers.20.mlp.fc1.bias', 'vision_model.encoder.layers.5.layer_norm1.bias', 'vision_model.encoder.layers.6.layer_norm2.weight', 'vision_model.encoder.layers.0.self_attn.q_proj.weight', 'vision_model.encoder.layers.22.mlp.fc1.bias', 'vision_model.encoder.layers.3.mlp.fc1.bias', 'vision_model.encoder.layers.17.layer_norm2.bias', 'vision_model.encoder.layers.13.layer_norm2.bias', 'vision_model.encoder.layers.21.self_attn.v_proj.bias', 'vision_model.encoder.layers.8.layer_norm1.weight', 'vision_model.encoder.layers.23.mlp.fc1.bias', 'vision_model.encoder.layers.3.self_attn.v_proj.bias', 'vision_model.encoder.layers.5.mlp.fc2.bias', 'vision_model.encoder.layers.8.mlp.fc1.bias', 'vision_model.encoder.layers.20.self_attn.k_proj.weight', 'vision_model.encoder.layers.8.self_attn.k_proj.bias', 'vision_model.encoder.layers.1.mlp.fc1.bias', 'vision_model.encoder.layers.7.self_attn.k_proj.weight', 'vision_model.encoder.layers.1.self_attn.out_proj.bias', 'vision_model.encoder.layers.2.mlp.fc2.weight', 'vision_model.encoder.layers.0.self_attn.k_proj.weight', 'vision_model.encoder.layers.3.mlp.fc1.weight', 'vision_model.encoder.layers.10.layer_norm2.bias', 'vision_model.encoder.layers.7.layer_norm2.weight', 'vision_model.encoder.layers.19.self_attn.v_proj.bias', 'vision_model.encoder.layers.0.layer_norm2.bias', 'vision_model.encoder.layers.7.layer_norm1.weight', 'vision_model.encoder.layers.22.self_attn.k_proj.bias', 'vision_model.encoder.layers.13.self_attn.v_proj.weight', 'vision_model.encoder.layers.5.self_attn.v_proj.bias', 
'vision_model.encoder.layers.0.self_attn.out_proj.bias', 'vision_model.encoder.layers.17.self_attn.out_proj.bias', 'vision_model.encoder.layers.16.layer_norm2.weight', 'vision_model.encoder.layers.10.self_attn.q_proj.weight', 'vision_model.encoder.layers.19.self_attn.out_proj.bias', 'text_projection.weight', 'vision_model.encoder.layers.9.self_attn.q_proj.weight', 'vision_model.encoder.layers.5.self_attn.q_proj.bias', 'vision_model.encoder.layers.12.self_attn.k_proj.bias', 'vision_model.encoder.layers.4.mlp.fc1.weight', 'vision_model.encoder.layers.21.layer_norm1.bias', 'vision_model.encoder.layers.16.self_attn.q_proj.bias', 'vision_model.encoder.layers.23.self_attn.q_proj.bias', 'vision_model.encoder.layers.16.layer_norm1.weight', 'vision_model.encoder.layers.23.mlp.fc2.weight', 'vision_model.encoder.layers.7.mlp.fc2.bias', 'vision_model.encoder.layers.14.layer_norm2.weight', 'vision_model.encoder.layers.18.self_attn.q_proj.weight', 'vision_model.encoder.layers.23.mlp.fc2.bias', 'vision_model.encoder.layers.23.layer_norm1.weight', 'vision_model.encoder.layers.19.self_attn.v_proj.weight', 'vision_model.encoder.layers.3.self_attn.q_proj.bias', 'vision_model.encoder.layers.11.self_attn.k_proj.weight', 'vision_model.encoder.layers.1.layer_norm1.bias', 'vision_model.encoder.layers.17.self_attn.q_proj.weight', 'vision_model.encoder.layers.1.self_attn.q_proj.bias', 'vision_model.encoder.layers.23.self_attn.v_proj.bias', 'vision_model.encoder.layers.4.mlp.fc2.weight', 'vision_model.encoder.layers.6.self_attn.v_proj.bias', 'vision_model.encoder.layers.4.layer_norm1.bias', 'vision_model.encoder.layers.7.self_attn.q_proj.weight', 'vision_model.encoder.layers.17.self_attn.k_proj.weight', 'vision_model.encoder.layers.10.layer_norm1.weight', 'vision_model.encoder.layers.5.self_attn.out_proj.bias', 'vision_model.encoder.layers.15.mlp.fc2.weight', 'vision_model.encoder.layers.15.layer_norm1.weight', 'vision_model.encoder.layers.3.mlp.fc2.weight', 'vision_model.encoder.layers.11.mlp.fc1.bias', 'vision_model.encoder.layers.20.self_attn.v_proj.bias', 'vision_model.post_layernorm.weight', 'vision_model.encoder.layers.22.mlp.fc2.bias', 'logit_scale', 'vision_model.encoder.layers.15.mlp.fc1.weight', 'vision_model.encoder.layers.16.self_attn.k_proj.bias', 'vision_model.encoder.layers.1.self_attn.v_proj.bias', 'vision_model.encoder.layers.21.self_attn.out_proj.bias', 'vision_model.encoder.layers.20.self_attn.q_proj.weight', 'vision_model.encoder.layers.1.layer_norm2.bias', 'vision_model.encoder.layers.11.layer_norm2.bias', 'vision_model.encoder.layers.4.mlp.fc2.bias', 'vision_model.encoder.layers.4.self_attn.out_proj.bias', 'vision_model.encoder.layers.1.mlp.fc2.bias', 'vision_model.encoder.layers.2.mlp.fc2.bias', 'vision_model.encoder.layers.12.self_attn.v_proj.weight', 'vision_model.encoder.layers.6.layer_norm1.bias', 'vision_model.encoder.layers.16.layer_norm1.bias', 'vision_model.encoder.layers.16.self_attn.k_proj.weight', 'vision_model.encoder.layers.14.self_attn.k_proj.weight', 'vision_model.encoder.layers.11.self_attn.k_proj.bias', 'vision_model.encoder.layers.22.layer_norm2.weight', 'vision_model.encoder.layers.21.mlp.fc2.weight', 'vision_model.encoder.layers.2.self_attn.k_proj.bias', 'vision_model.encoder.layers.19.mlp.fc1.weight', 'vision_model.encoder.layers.0.self_attn.v_proj.bias', 'vision_model.encoder.layers.8.self_attn.q_proj.weight', 'vision_model.encoder.layers.10.layer_norm1.bias', 'vision_model.encoder.layers.0.mlp.fc2.bias', 'vision_model.encoder.layers.12.layer_norm1.bias', 
'vision_model.encoder.layers.13.self_attn.out_proj.bias', 'vision_model.encoder.layers.23.layer_norm2.weight', 'vision_model.encoder.layers.22.layer_norm1.weight', 'vision_model.encoder.layers.6.mlp.fc1.weight', 'vision_model.encoder.layers.13.self_attn.out_proj.weight', 'vision_model.encoder.layers.2.layer_norm1.bias', 'vision_model.encoder.layers.6.mlp.fc1.bias', 'vision_model.encoder.layers.16.mlp.fc1.weight', 'vision_model.encoder.layers.16.self_attn.out_proj.bias', 'vision_model.encoder.layers.17.mlp.fc2.bias', 'vision_model.encoder.layers.13.mlp.fc2.weight', 'vision_model.encoder.layers.23.self_attn.out_proj.bias', 'vision_model.encoder.layers.14.self_attn.v_proj.weight', 'vision_model.encoder.layers.1.self_attn.k_proj.bias', 'vision_model.encoder.layers.7.mlp.fc1.weight', 'vision_model.encoder.layers.4.layer_norm2.weight', 'vision_model.encoder.layers.0.layer_norm2.weight', 'vision_model.embeddings.position_embedding.weight', 'vision_model.encoder.layers.11.layer_norm2.weight', 'vision_model.encoder.layers.17.mlp.fc1.bias', 'vision_model.encoder.layers.15.layer_norm2.bias', 'vision_model.encoder.layers.8.layer_norm2.bias', 'vision_model.encoder.layers.4.layer_norm2.bias', 'vision_model.encoder.layers.23.self_attn.v_proj.weight', 'vision_model.encoder.layers.22.self_attn.v_proj.bias', 'vision_model.encoder.layers.22.self_attn.k_proj.weight', 'vision_model.encoder.layers.14.self_attn.q_proj.bias', 'vision_model.encoder.layers.22.layer_norm1.bias', 'vision_model.encoder.layers.22.self_attn.q_proj.weight', 'vision_model.encoder.layers.18.layer_norm1.bias', 'vision_model.encoder.layers.2.layer_norm1.weight', 'vision_model.encoder.layers.2.self_attn.v_proj.bias', 'vision_model.encoder.layers.16.self_attn.v_proj.weight', 'vision_model.encoder.layers.11.self_attn.out_proj.bias', 'vision_model.encoder.layers.2.self_attn.out_proj.weight', 'vision_model.encoder.layers.19.mlp.fc1.bias', 'vision_model.encoder.layers.21.layer_norm2.weight', 'vision_model.encoder.layers.4.self_attn.v_proj.bias', 'vision_model.encoder.layers.19.self_attn.k_proj.bias', 'vision_model.encoder.layers.15.self_attn.q_proj.weight', 'vision_model.encoder.layers.0.mlp.fc1.bias', 'vision_model.encoder.layers.0.self_attn.out_proj.weight', 'vision_model.encoder.layers.17.self_attn.k_proj.bias', 'vision_model.encoder.layers.2.self_attn.q_proj.weight', 'vision_model.encoder.layers.3.mlp.fc2.bias', 'vision_model.encoder.layers.5.layer_norm2.weight', 'vision_model.encoder.layers.0.layer_norm1.weight', 'vision_model.encoder.layers.11.self_attn.q_proj.bias', 'vision_model.encoder.layers.18.mlp.fc2.bias', 'vision_model.encoder.layers.6.layer_norm1.weight', 'vision_model.encoder.layers.14.layer_norm1.weight', 'vision_model.encoder.layers.15.self_attn.v_proj.weight', 'vision_model.encoder.layers.11.mlp.fc1.weight', 'vision_model.encoder.layers.21.layer_norm1.weight', 'vision_model.encoder.layers.16.self_attn.q_proj.weight', 'vision_model.encoder.layers.9.mlp.fc1.weight', 'vision_model.encoder.layers.12.self_attn.q_proj.bias', 'vision_model.encoder.layers.12.mlp.fc2.bias', 'vision_model.encoder.layers.17.mlp.fc1.weight', 'vision_model.encoder.layers.16.mlp.fc2.bias', 'vision_model.encoder.layers.14.layer_norm1.bias', 'vision_model.encoder.layers.23.self_attn.q_proj.weight', 'vision_model.encoder.layers.17.self_attn.v_proj.weight', 'vision_model.encoder.layers.10.self_attn.v_proj.bias', 'vision_model.encoder.layers.14.self_attn.out_proj.bias', 'vision_model.encoder.layers.12.layer_norm1.weight', 
'vision_model.encoder.layers.6.self_attn.v_proj.weight', 'vision_model.encoder.layers.0.mlp.fc1.weight', 'vision_model.encoder.layers.18.self_attn.out_proj.bias', 'vision_model.encoder.layers.11.mlp.fc2.weight', 'vision_model.encoder.layers.14.self_attn.k_proj.bias', 'vision_model.encoder.layers.8.layer_norm2.weight', 'vision_model.encoder.layers.9.mlp.fc1.bias', 'vision_model.encoder.layers.10.layer_norm2.weight', 'vision_model.encoder.layers.12.self_attn.out_proj.weight', 'vision_model.encoder.layers.6.self_attn.k_proj.bias', 'vision_model.encoder.layers.6.self_attn.out_proj.weight', 'vision_model.encoder.layers.2.mlp.fc1.bias', 'vision_model.encoder.layers.5.self_attn.out_proj.weight', 'vision_model.encoder.layers.3.self_attn.k_proj.weight', 'vision_model.encoder.layers.5.mlp.fc1.bias', 'vision_model.encoder.layers.1.mlp.fc1.weight', 'vision_model.encoder.layers.16.layer_norm2.bias', 'vision_model.encoder.layers.9.self_attn.q_proj.bias', 'vision_model.encoder.layers.16.self_attn.out_proj.weight', 'vision_model.encoder.layers.4.self_attn.v_proj.weight', 'vision_model.encoder.layers.7.layer_norm2.bias', 'vision_model.encoder.layers.9.self_attn.k_proj.weight', 'vision_model.encoder.layers.14.self_attn.out_proj.weight', 'vision_model.encoder.layers.6.self_attn.q_proj.weight', 'vision_model.encoder.layers.22.mlp.fc2.weight', 'vision_model.encoder.layers.23.self_attn.k_proj.weight', 'vision_model.encoder.layers.20.self_attn.q_proj.bias', 'vision_model.encoder.layers.18.self_attn.out_proj.weight', 'vision_model.encoder.layers.16.mlp.fc2.weight', 'vision_model.encoder.layers.2.mlp.fc1.weight', 'vision_model.encoder.layers.19.layer_norm1.bias', 'vision_model.encoder.layers.12.self_attn.q_proj.weight', 'vision_model.encoder.layers.17.layer_norm2.weight', 'vision_model.encoder.layers.19.mlp.fc2.bias', 'vision_model.encoder.layers.3.self_attn.out_proj.bias', 'vision_model.encoder.layers.18.mlp.fc1.weight', 'vision_model.encoder.layers.7.mlp.fc2.weight', 'vision_model.encoder.layers.21.layer_norm2.bias', 'vision_model.encoder.layers.23.layer_norm2.bias', 'vision_model.encoder.layers.15.mlp.fc2.bias', 'vision_model.encoder.layers.13.self_attn.k_proj.bias', 'vision_model.encoder.layers.2.self_attn.q_proj.bias', 'vision_model.encoder.layers.5.self_attn.v_proj.weight', 'vision_model.embeddings.class_embedding', 'vision_model.encoder.layers.15.self_attn.k_proj.weight', 'vision_model.embeddings.position_ids', 'vision_model.encoder.layers.12.self_attn.out_proj.bias', 'vision_model.encoder.layers.1.mlp.fc2.weight', 'vision_model.encoder.layers.23.mlp.fc1.weight', 'vision_model.encoder.layers.15.self_attn.out_proj.weight', 'vision_model.embeddings.patch_embedding.weight', 'vision_model.encoder.layers.9.layer_norm1.bias', 'vision_model.encoder.layers.7.self_attn.q_proj.bias', 'vision_model.encoder.layers.4.self_attn.k_proj.bias', 'vision_model.encoder.layers.5.layer_norm2.bias', 'vision_model.encoder.layers.10.self_attn.q_proj.bias', 'vision_model.encoder.layers.12.self_attn.v_proj.bias', 'vision_model.encoder.layers.20.self_attn.v_proj.weight', 'vision_model.encoder.layers.18.layer_norm1.weight', 'vision_model.encoder.layers.22.self_attn.v_proj.weight', 'vision_model.encoder.layers.4.self_attn.k_proj.weight', 'vision_model.encoder.layers.13.self_attn.q_proj.weight', 'vision_model.encoder.layers.9.self_attn.out_proj.weight', 'vision_model.encoder.layers.0.self_attn.k_proj.bias', 'vision_model.encoder.layers.1.layer_norm2.weight', 'vision_model.encoder.layers.0.mlp.fc2.weight', 
'vision_model.encoder.layers.12.mlp.fc1.bias', 'vision_model.encoder.layers.21.mlp.fc1.weight', 'vision_model.encoder.layers.18.self_attn.k_proj.bias', 'vision_model.encoder.layers.11.mlp.fc2.bias', 'vision_model.encoder.layers.3.self_attn.out_proj.weight', 'vision_model.encoder.layers.19.layer_norm2.bias', 'vision_model.encoder.layers.13.layer_norm2.weight', 'vision_model.encoder.layers.20.layer_norm1.weight', 'vision_model.encoder.layers.0.layer_norm1.bias', 'vision_model.encoder.layers.6.layer_norm2.bias', 'vision_model.encoder.layers.10.mlp.fc2.bias', 'vision_model.encoder.layers.9.layer_norm2.bias', 'vision_model.encoder.layers.9.self_attn.v_proj.bias', 'vision_model.encoder.layers.17.mlp.fc2.weight', 'vision_model.encoder.layers.20.self_attn.out_proj.bias', 'vision_model.encoder.layers.11.self_attn.v_proj.weight', 'vision_model.encoder.layers.5.mlp.fc1.weight', 'vision_model.encoder.layers.13.layer_norm1.bias', 'vision_model.encoder.layers.21.self_attn.q_proj.bias', 'vision_model.encoder.layers.7.self_attn.k_proj.bias', 'vision_model.encoder.layers.3.self_attn.k_proj.bias', 'vision_model.encoder.layers.2.layer_norm2.bias', 'vision_model.encoder.layers.12.mlp.fc1.weight', 'vision_model.encoder.layers.18.layer_norm2.bias', 'vision_model.encoder.layers.8.mlp.fc2.bias', 'vision_model.encoder.layers.13.mlp.fc1.weight', 'vision_model.encoder.layers.11.self_attn.out_proj.weight', 'vision_model.encoder.layers.21.mlp.fc2.bias', 'vision_model.encoder.layers.14.self_attn.q_proj.weight', 'vision_model.encoder.layers.2.self_attn.k_proj.weight', 'vision_model.encoder.layers.14.mlp.fc1.bias', 'vision_model.encoder.layers.7.self_attn.out_proj.bias', 'vision_model.encoder.layers.1.self_attn.q_proj.weight', 'vision_model.encoder.layers.4.self_attn.out_proj.weight', 'vision_model.encoder.layers.7.self_attn.v_proj.weight', 'vision_model.encoder.layers.21.self_attn.k_proj.bias', 'vision_model.encoder.layers.4.self_attn.q_proj.bias', 'vision_model.encoder.layers.10.self_attn.out_proj.bias', 'vision_model.encoder.layers.14.mlp.fc2.bias', 'vision_model.encoder.layers.10.self_attn.out_proj.weight', 'vision_model.encoder.layers.10.mlp.fc1.bias', 'vision_model.encoder.layers.15.self_attn.q_proj.bias', 'vision_model.encoder.layers.3.layer_norm1.bias', 'vision_model.encoder.layers.7.self_attn.out_proj.weight', 'vision_model.encoder.layers.20.mlp.fc2.weight', 'vision_model.encoder.layers.9.self_attn.k_proj.bias', 'vision_model.encoder.layers.3.self_attn.q_proj.weight', 'vision_model.encoder.layers.14.self_attn.v_proj.bias', 'vision_model.encoder.layers.23.self_attn.out_proj.weight', 'vision_model.encoder.layers.17.layer_norm1.weight', 'vision_model.encoder.layers.19.mlp.fc2.weight', 'vision_model.encoder.layers.13.mlp.fc2.bias', 'vision_model.encoder.layers.3.layer_norm2.bias', 'vision_model.encoder.layers.18.mlp.fc1.bias', 'vision_model.encoder.layers.9.layer_norm1.weight', 'vision_model.encoder.layers.8.self_attn.q_proj.bias', 'vision_model.encoder.layers.9.mlp.fc2.weight', 'vision_model.encoder.layers.6.self_attn.k_proj.weight', 'vision_model.encoder.layers.10.mlp.fc1.weight', 'vision_model.encoder.layers.1.layer_norm1.weight', 'vision_model.encoder.layers.20.layer_norm2.bias', 'vision_model.encoder.layers.11.layer_norm1.weight', 'vision_model.encoder.layers.22.layer_norm2.bias', 'vision_model.encoder.layers.13.mlp.fc1.bias', 'vision_model.encoder.layers.23.self_attn.k_proj.bias', 'vision_model.encoder.layers.18.self_attn.q_proj.bias', 'vision_model.encoder.layers.9.self_attn.out_proj.bias', 
'vision_model.encoder.layers.14.mlp.fc2.weight', 'vision_model.encoder.layers.3.layer_norm1.weight', 'vision_model.encoder.layers.1.self_attn.out_proj.weight', 'vision_model.encoder.layers.15.self_attn.v_proj.bias', 'vision_model.encoder.layers.13.self_attn.q_proj.bias', 'vision_model.encoder.layers.2.layer_norm2.weight', 'vision_model.encoder.layers.11.layer_norm1.bias', 'vision_model.encoder.layers.10.self_attn.k_proj.weight', 'vision_model.encoder.layers.21.self_attn.out_proj.weight', 'vision_model.pre_layrnorm.weight', 'vision_model.encoder.layers.15.self_attn.k_proj.bias', 'vision_model.encoder.layers.19.self_attn.out_proj.weight', 'vision_model.encoder.layers.19.self_attn.q_proj.weight', 'vision_model.encoder.layers.3.self_attn.v_proj.weight', 'vision_model.encoder.layers.20.layer_norm1.bias', 'vision_model.encoder.layers.9.layer_norm2.weight', 'vision_model.encoder.layers.6.self_attn.out_proj.bias', 'vision_model.encoder.layers.2.self_attn.v_proj.weight', 'vision_model.encoder.layers.8.self_attn.out_proj.weight', 'vision_model.encoder.layers.19.self_attn.q_proj.bias', 'vision_model.encoder.layers.3.layer_norm2.weight', 'vision_model.encoder.layers.18.self_attn.k_proj.weight', 'vision_model.encoder.layers.4.mlp.fc1.bias', 'vision_model.encoder.layers.20.layer_norm2.weight', 'vision_model.encoder.layers.1.self_attn.k_proj.weight']\n- This IS expected if you are initializing CLIPTextModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n- This IS NOT expected if you are initializing CLIPTextModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\nload domain lora from /AnimateDiff/models/Motion_Module/v3_sd15_adapter.ckpt\n 0%| | 0/25 [00:00<?, ?it/s]\n 4%|▍ | 1/25 [00:01<00:36, 1.54s/it]\n 8%|▊ | 2/25 [00:03<00:35, 1.54s/it]\n 12%|█▏ | 3/25 [00:04<00:33, 1.54s/it]\n 16%|█▌ | 4/25 [00:06<00:32, 1.54s/it]\n 20%|██ | 5/25 [00:07<00:30, 1.54s/it]\n 24%|██▍ | 6/25 [00:09<00:29, 1.54s/it]\n 28%|██▊ | 7/25 [00:10<00:27, 1.54s/it]\n 32%|███▏ | 8/25 [00:12<00:26, 1.54s/it]\n 36%|███▌ | 9/25 [00:13<00:24, 1.54s/it]\n 40%|████ | 10/25 [00:15<00:23, 1.54s/it]\n 44%|████▍ | 11/25 [00:16<00:21, 1.54s/it]\n 48%|████▊ | 12/25 [00:18<00:20, 1.54s/it]\n 52%|█████▏ | 13/25 [00:20<00:18, 1.54s/it]\n 56%|█████▌ | 14/25 [00:21<00:16, 1.54s/it]\n 60%|██████ | 15/25 [00:23<00:15, 1.54s/it]\n 64%|██████▍ | 16/25 [00:24<00:13, 1.54s/it]\n 68%|██████▊ | 17/25 [00:26<00:12, 1.54s/it]\n 72%|███████▏ | 18/25 [00:27<00:10, 1.54s/it]\n 76%|███████▌ | 19/25 [00:29<00:09, 1.54s/it]\n 80%|████████ | 20/25 [00:30<00:07, 1.54s/it]\n 84%|████████▍ | 21/25 [00:32<00:06, 1.55s/it]\n 88%|████████▊ | 22/25 [00:33<00:04, 1.54s/it]\n 92%|█████████▏| 23/25 [00:35<00:03, 1.55s/it]\n 96%|█████████▌| 24/25 [00:37<00:01, 1.54s/it]\n100%|██████████| 25/25 [00:38<00:00, 1.55s/it]\n100%|██████████| 25/25 [00:38<00:00, 1.54s/it]\n 0%| | 0/16 [00:00<?, ?it/s]\n 31%|███▏ | 5/16 [00:00<00:00, 46.58it/s]\n 62%|██████▎ | 10/16 [00:00<00:00, 24.95it/s]\n 88%|████████▊ | 14/16 [00:00<00:00, 22.06it/s]\n100%|██████████| 16/16 [00:00<00:00, 23.13it/s]", "metrics": { "predict_time": 56.183088, "total_time": 273.986386 }, "output": "https://replicate.delivery/pbxt/GqRNcQFu03qcFpj5cHQ6dXKLHsfIATWmm9KRbOhwA6N7PeCSA/output.gif", "started_at": "2023-12-16T21:25:18.803104Z", "status": "succeeded", "urls": { 
"get": "https://api.replicate.com/v1/predictions/pxl7jdlbirpfuq2g3bdrbcukvq", "cancel": "https://api.replicate.com/v1/predictions/pxl7jdlbirpfuq2g3bdrbcukvq/cancel" }, "version": "5e0d88c259792ad95a3c596ed1997de772c424dd0222802b830d43a36f861763" }
'vision_model.encoder.layers.19.self_attn.q_proj.bias', 'vision_model.encoder.layers.3.layer_norm2.weight', 'vision_model.encoder.layers.18.self_attn.k_proj.weight', 'vision_model.encoder.layers.4.mlp.fc1.bias', 'vision_model.encoder.layers.20.layer_norm2.weight', 'vision_model.encoder.layers.1.self_attn.k_proj.weight'] - This IS expected if you are initializing CLIPTextModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). - This IS NOT expected if you are initializing CLIPTextModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). load domain lora from /AnimateDiff/models/Motion_Module/v3_sd15_adapter.ckpt 0%| | 0/25 [00:00<?, ?it/s] 4%|▍ | 1/25 [00:01<00:36, 1.54s/it] 8%|▊ | 2/25 [00:03<00:35, 1.54s/it] 12%|█▏ | 3/25 [00:04<00:33, 1.54s/it] 16%|█▌ | 4/25 [00:06<00:32, 1.54s/it] 20%|██ | 5/25 [00:07<00:30, 1.54s/it] 24%|██▍ | 6/25 [00:09<00:29, 1.54s/it] 28%|██▊ | 7/25 [00:10<00:27, 1.54s/it] 32%|███▏ | 8/25 [00:12<00:26, 1.54s/it] 36%|███▌ | 9/25 [00:13<00:24, 1.54s/it] 40%|████ | 10/25 [00:15<00:23, 1.54s/it] 44%|████▍ | 11/25 [00:16<00:21, 1.54s/it] 48%|████▊ | 12/25 [00:18<00:20, 1.54s/it] 52%|█████▏ | 13/25 [00:20<00:18, 1.54s/it] 56%|█████▌ | 14/25 [00:21<00:16, 1.54s/it] 60%|██████ | 15/25 [00:23<00:15, 1.54s/it] 64%|██████▍ | 16/25 [00:24<00:13, 1.54s/it] 68%|██████▊ | 17/25 [00:26<00:12, 1.54s/it] 72%|███████▏ | 18/25 [00:27<00:10, 1.54s/it] 76%|███████▌ | 19/25 [00:29<00:09, 1.54s/it] 80%|████████ | 20/25 [00:30<00:07, 1.54s/it] 84%|████████▍ | 21/25 [00:32<00:06, 1.55s/it] 88%|████████▊ | 22/25 [00:33<00:04, 1.54s/it] 92%|█████████▏| 23/25 [00:35<00:03, 1.55s/it] 96%|█████████▌| 24/25 [00:37<00:01, 1.54s/it] 100%|██████████| 25/25 [00:38<00:00, 1.55s/it] 100%|██████████| 25/25 [00:38<00:00, 1.54s/it] 0%| | 0/16 [00:00<?, ?it/s] 31%|███▏ | 5/16 [00:00<00:00, 46.58it/s] 62%|██████▎ | 10/16 [00:00<00:00, 24.95it/s] 88%|████████▊ | 14/16 [00:00<00:00, 22.06it/s] 100%|██████████| 16/16 [00:00<00:00, 23.13it/s]
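This warning is harmless here: the pipeline only needs the CLIP text encoder for prompt conditioning, so the vision tower bundled in the checkpoint is simply skipped. A minimal sketch that reproduces the same class of warning with Hugging Face transformers, assuming the openai/clip-vit-large-patch14 checkpoint (the text encoder typically paired with Stable Diffusion 1.5; the exact checkpoint used by this model is an assumption):

# Loading only the text tower from a full CLIP checkpoint emits the
# "Some weights of the model checkpoint were not used when initializing
# CLIPTextModel" warning, listing every vision_model.* parameter.
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "openai/clip-vit-large-patch14"  # assumed checkpoint, for illustration only
tokenizer = CLIPTokenizer.from_pretrained(model_id)
text_encoder = CLIPTextModel.from_pretrained(model_id)  # vision_model.* weights are skipped here

# The text encoder still works normally for prompt conditioning:
tokens = tokenizer("husky running in the snow", return_tensors="pt")
embeddings = text_encoder(**tokens).last_hidden_state
print(embeddings.shape)  # torch.Size([1, sequence_length, 768])

The same skipped-weights message would appear for any model class that loads a subset of a larger checkpoint; it does not indicate a failed or degraded prediction.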
Want to make some of these yourself?
Run this model