Run this model in Node.js. First, install Replicate's Node.js client library:
npm install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
import Replicate from "replicate";
const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run zhouzhengjun/lora_train_base using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "zhouzhengjun/lora_train_base:32fdf1651c6024ec80472a0bc0f1c206f10a2e7c35639209d8e9f24b10e2707b",
  {
    input: {
      task: "face",
      resolution: 512,
      instance_data: "https://replicate.delivery/pbxt/IwhSXVwpZVuNT3ESv09fE1NgdRg0N277YJWJflDDu7Hgil9G/clfutialw0007zu9mb7h3l51s.zip"
    }
  }
);
console.log(output);
To learn more, take a look at the guide on getting started with Node.js.
Run this model in Python. First, install Replicate's Python client library:
pip install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
import replicate
Run zhouzhengjun/lora_train_base using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "zhouzhengjun/lora_train_base:32fdf1651c6024ec80472a0bc0f1c206f10a2e7c35639209d8e9f24b10e2707b",
    input={
        "task": "face",
        "resolution": 512,
        "instance_data": "https://replicate.delivery/pbxt/IwhSXVwpZVuNT3ESv09fE1NgdRg0N277YJWJflDDu7Hgil9G/clfutialw0007zu9mb7h3l51s.zip"
    }
)
print(output)
To learn more, take a look at the guide on getting started with Python.
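Because the Python client reads the token from the REPLICATE_API_TOKEN environment variable, a missing token only shows up once a request is made. The following is a minimal sketch, not part of Replicate's library, of checking for the token up front and assembling the "owner/name:version" reference passed to replicate.run. The helper names require_token and model_ref are hypothetical and exist only for illustration.

```python
import os

# Model name and version hash copied from the example on this page.
MODEL = "zhouzhengjun/lora_train_base"
VERSION = "32fdf1651c6024ec80472a0bc0f1c206f10a2e7c35639209d8e9f24b10e2707b"


def model_ref(model=MODEL, version=VERSION):
    """Join a model name and version hash into the 'owner/name:version' form."""
    return f"{model}:{version}"


def require_token(env=None):
    """Return the API token from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    This helper is hypothetical and not part of the replicate package.
    """
    env = os.environ if env is None else env
    token = env.get("REPLICATE_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "REPLICATE_API_TOKEN is not set; export it before calling replicate.run"
        )
    return token


# Possible usage, assuming the replicate package is installed:
#   import replicate
#   require_token()
#   output = replicate.run(model_ref(), input={"task": "face", "resolution": 512, ...})
```

Failing early with an explicit message may be easier to diagnose than an authentication error returned by the API partway through a script.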
Run this model with cURL. First, set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Run zhouzhengjun/lora_train_base using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "32fdf1651c6024ec80472a0bc0f1c206f10a2e7c35639209d8e9f24b10e2707b",
    "input": {
      "task": "face",
      "resolution": 512,
      "instance_data": "https://replicate.delivery/pbxt/IwhSXVwpZVuNT3ESv09fE1NgdRg0N277YJWJflDDu7Hgil9G/clfutialw0007zu9mb7h3l51s.zip"
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
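For readers who want to see what the cURL command sends without using a client library, here is a rough sketch that builds the same POST request with Python's standard library. It mirrors the URL, headers, and JSON body of the cURL example. The function name build_prediction_request is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Endpoint taken from the cURL example on this page.
API_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(token, version, model_input):
    """Build (but do not send) the POST request shown in the cURL example.

    Hypothetical helper for illustration; it is not part of any Replicate library.
    """
    body = json.dumps({"version": version, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Same "Prefer: wait" header as in the cURL command above.
            "Prefer": "wait",
        },
    )


# Possible usage (this performs a real network call, so it is left commented out):
#   import os
#   req = build_prediction_request(
#       os.environ["REPLICATE_API_TOKEN"],
#       "32fdf1651c6024ec80472a0bc0f1c206f10a2e7c35639209d8e9f24b10e2707b",
#       {"task": "face", "resolution": 512, "instance_data": "<url-to-zip>"},
#   )
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```

Separating request construction from sending keeps the payload easy to inspect or log before any call is made.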
{
"completed_at": "2023-06-05T03:27:25.249683Z",
"created_at": "2023-06-05T03:03:56.716639Z",
"data_removed": false,
"error": null,
"id": "a22ulrmhu5epzjbt24uygdomcq",
"input": {
"task": "face",
"resolution": 512,
"instance_data": "https://replicate.delivery/pbxt/IwhSXVwpZVuNT3ESv09fE1NgdRg0N277YJWJflDDu7Hgil9G/clfutialw0007zu9mb7h3l51s.zip"
},
"logs": "Using seed: 44374\nPTI : Initializer Tokens not given, doing random inits\nPTI : Placeholder Tokens ['<s1>', '<s2>']\nPTI : Initializer Tokens ['<rand-0.017>', '<rand-0.017>']\nInitialized <s1> with random noise (sigma=0.017), empirically 0.000 +- 0.017\nNorm : 0.4636\nInitialized <s2> with random noise (sigma=0.017), empirically 0.000 +- 0.017\nNorm : 0.4810\n/root/.pyenv/versions/3.10.11/lib/python3.10/site-packages/diffusers/configuration_utils.py:195: FutureWarning: It is deprecated to pass a pretrained model name or path to `from_config`.If you were trying to load a scheduler, please use <class 'diffusers.schedulers.scheduling_ddpm.DDPMScheduler'>.from_pretrained(...) instead. Otherwise, please make sure to pass a configuration dictionary instead. This functionality will be removed in v1.0.0.\ndeprecate(\"config-passed-as-path\", \"1.0.0\", deprecation_message, standard_warn=False)\nMask not found for cog_instance_data/0.mask.png\nWarning : this will pre-process all the images in the instance data root.\n 0%| | 0/15 [00:00<?, ?it/s]INFO: Created TensorFlow Lite XNNPACK delegate for CPU.\n 7%|▋ | 1/15 [00:00<00:02, 5.94it/s]\n 40%|████ | 6/15 [00:00<00:00, 25.47it/s]\n100%|██████████| 15/15 [00:00<00:00, 46.74it/s]\n100%|██████████| 15/15 [00:00<00:00, 37.89it/s]\na photo of a cool <s1><s2>\n 0%| | 0/15 [00:00<?, ?it/s]\na photo of a nice <s1><s2>\na cropped photo of the <s1><s2>\n 7%|▋ | 1/15 [00:07<01:49, 7.81s/it]\na photo of the nice <s1><s2>\n 20%|██ | 3/15 [00:08<00:25, 2.13s/it]\na photo of the clean <s1><s2>\n 27%|██▋ | 4/15 [00:08<00:16, 1.46s/it]\na good photo of the <s1><s2>\n 33%|███▎ | 5/15 [00:08<00:10, 1.05s/it]\na photo of a clean <s1><s2>\n 40%|████ | 6/15 [00:08<00:07, 1.28it/s]\na photo of the <s1><s2>\n 47%|████▋ | 7/15 [00:08<00:04, 1.66it/s]\na photo of a nice <s1><s2>\n 53%|█████▎ | 8/15 [00:09<00:03, 2.10it/s]\na rendition of the <s1><s2>\n 60%|██████ | 9/15 [00:09<00:02, 2.54it/s]\na photo of a nice <s1><s2>\n 67%|██████▋ | 
10/15 [00:09<00:01, 2.98it/s]\na photo of a small <s1><s2>\n 73%|███████▎ | 11/15 [00:09<00:01, 3.37it/s]\na photo of the <s1><s2>\n 80%|████████ | 12/15 [00:09<00:00, 3.70it/s]\na rendition of the <s1><s2>\n 87%|████████▋ | 13/15 [00:10<00:00, 3.97it/s]\na photo of my <s1><s2>\n 93%|█████████▎| 14/15 [00:10<00:00, 4.17it/s]\n100%|██████████| 15/15 [00:10<00:00, 4.34it/s]\n100%|██████████| 15/15 [00:10<00:00, 1.42it/s]\nPTI : Using cached latent.\n0%| | 0/1000 [00:00<?, ?it/s]\ntensor(0.0058, device='cuda:0')\ntensor([[0.4645],\n[0.4824]], device='cuda:0')\nCurrent Norm : tensor([0.4580, 0.4741], device='cuda:0')\nSteps: 0%| | 0/1000 [00:00<?, ?it/s]\nSteps: 0%| | 1/1000 [00:05<1:33:07, 5.59s/it]\nSteps: 0%| | 1/1000 [00:05<1:33:07, 5.59s/it, loss=0.00105, lr=0.001]\nSteps: 0%| | 2/1000 [00:06<43:50, 2.64s/it, loss=0.00105, lr=0.001] \ntensor(0.0063, device='cuda:0')\ntensor([[0.4600],\n[0.4757]], device='cuda:0')\nCurrent Norm : tensor([0.4540, 0.4681], device='cuda:0')\nSteps: 0%| | 2/1000 [00:06<43:50, 2.64s/it, loss=0.00522, lr=0.001]\nSteps: 0%| | 3/1000 [00:06<28:27, 1.71s/it, loss=0.00522, lr=0.001]\nSteps: 0%| | 3/1000 [00:06<28:27, 1.71s/it, loss=0.0674, lr=0.001] \nSteps: 0%| | 4/1000 [00:07<20:57, 1.26s/it, loss=0.0674, lr=0.001]\ntensor(0.0045, device='cuda:0')\ntensor([[0.4559],\n [0.4699]], device='cuda:0')\nCurrent Norm : tensor([0.4503, 0.4629], device='cuda:0')\nSteps: 0%| | 4/1000 [00:07<20:57, 1.26s/it, loss=0.00126, lr=0.001]\nSteps: 0%| | 5/1000 [00:07<17:01, 1.03s/it, loss=0.00126, lr=0.001]\nSteps: 0%| | 5/1000 [00:07<17:01, 1.03s/it, loss=0.00103, lr=0.001]\nSteps: 1%| | 6/1000 [00:08<14:27, 1.15it/s, loss=0.00103, lr=0.001]\ntensor(0.0037, device='cuda:0')\ntensor([[0.4523],\n[0.4645]], device='cuda:0')\nCurrent Norm : tensor([0.4470, 0.4581], device='cuda:0')\nSteps: 1%| | 6/1000 [00:08<14:27, 1.15it/s, loss=0.00719, lr=0.001]\nSteps: 1%| | 7/1000 [00:09<13:02, 1.27it/s, loss=0.00719, lr=0.001]\nSteps: 1%| | 7/1000 [00:09<13:02, 1.27it/s, 
loss=0.000785, lr=0.001]\nSteps: 1%| | 8/1000 [00:09<11:53, 1.39it/s, loss=0.000785, lr=0.001]\ntensor(0.0086, device='cuda:0')\ntensor([[0.4486],\n[0.4593]], device='cuda:0')\nCurrent Norm : tensor([0.4437, 0.4534], device='cuda:0')\nSteps: 1%| | 8/1000 [00:09<11:53, 1.39it/s, loss=0.00652, lr=0.001] \nSteps: 1%| | 9/1000 [00:10<11:19, 1.46it/s, loss=0.00652, lr=0.001]\nSteps: 1%| | 9/1000 [00:10<11:19, 1.46it/s, loss=0.000549, lr=0.001]\nSteps: 1%| | 10/1000 [00:10<10:46, 1.53it/s, loss=0.000549, lr=0.001]\ntensor(0.0057, device='cuda:0')\ntensor([[0.4453],\n[0.4547]], device='cuda:0')\nCurrent Norm : tensor([0.4408, 0.4493], device='cuda:0')\nSteps: 1%| | 10/1000 [00:10<10:46, 1.53it/s, loss=0.079, lr=0.001] \nSteps: 1%| | 11/1000 [00:11<10:31, 1.57it/s, loss=0.079, lr=0.001]\nSteps: 1%| | 11/1000 [00:11<10:31, 1.57it/s, loss=0.00539, lr=0.001]\nSteps: 1%| | 12/1000 [00:12<10:11, 1.62it/s, loss=0.00539, lr=0.001]\ntensor(0.0086, device='cuda:0')\ntensor([[0.4424],\n[0.4503]], device='cuda:0')\nCurrent Norm : tensor([0.4381, 0.4453], device='cuda:0')\nSteps: 1%| | 12/1000 [00:12<10:11, 1.62it/s, loss=0.0134, lr=0.001] \nSteps: 1%|▏ | 13/1000 [00:12<10:07, 1.63it/s, loss=0.0134, lr=0.001]\nSteps: 1%|▏ | 13/1000 [00:12<10:07, 1.63it/s, loss=0.00237, lr=0.001]\nSteps: 1%|▏ | 14/1000 [00:13<09:57, 1.65it/s, loss=0.00237, lr=0.001]\ntensor(0.0045, device='cuda:0')\ntensor([[0.4397],\n[0.4465]], device='cuda:0')\nCurrent Norm : tensor([0.4357, 0.4418], device='cuda:0')\nSteps: 1%|▏ | 14/1000 [00:13<09:57, 1.65it/s, loss=0.0392, lr=0.001] \nSteps: 2%|▏ | 15/1000 [00:13<09:57, 1.65it/s, loss=0.0392, lr=0.001]\nSteps: 2%|▏ | 15/1000 [00:13<09:57, 1.65it/s, loss=0.025, lr=0.001] \nSteps: 2%|▏ | 16/1000 [00:14<09:49, 1.67it/s, loss=0.025, lr=0.001]\ntensor(0.0057, device='cuda:0')\ntensor([[0.4370],\n[0.4430]], device='cuda:0')\nCurrent Norm : tensor([0.4333, 0.4387], device='cuda:0')\nSteps: 2%|▏ | 16/1000 [00:14<09:49, 1.67it/s, loss=0.00716, lr=0.001]\nSteps: 2%|▏ | 
17/1000 [00:15<09:55, 1.65it/s, loss=0.00716, lr=0.001]\nSteps: 2%|▏ | 17/1000 [00:15<09:55, 1.65it/s, loss=0.101, lr=0.001] \nSteps: 2%|▏ | 18/1000 [00:15<09:49, 1.67it/s, loss=0.101, lr=0.001]\ntensor(0.0055, device='cuda:0')\ntensor([[0.4346],\n[0.4399]], device='cuda:0')\nCurrent Norm : tensor([0.4311, 0.4359], device='cuda:0')\nSteps: 2%|▏ | 18/1000 [00:15<09:49, 1.67it/s, loss=0.0104, lr=0.001]\nSteps: 2%|▏ | 19/1000 [00:16<09:50, 1.66it/s, loss=0.0104, lr=0.001]\nSteps: 2%|▏ | 19/1000 [00:16<09:50, 1.66it/s, loss=0.0845, lr=0.001]\nSteps: 2%|▏ | 20/1000 [00:16<09:41, 1.69it/s, loss=0.0845, lr=0.001]\ntensor(0.0006, device='cuda:0')\ntensor([[0.4324],\n[0.4371]], device='cuda:0')\nCurrent Norm : tensor([0.4291, 0.4334], device='cuda:0')\nSteps: 2%|▏ | 20/1000 [00:16<09:41, 1.69it/s, loss=0.000133, lr=0.001]\nSteps: 2%|▏ | 21/1000 [00:17<09:46, 1.67it/s, loss=0.000133, lr=0.001]\nSteps: 2%|▏ | 21/1000 [00:17<09:46, 1.67it/s, loss=0.000289, lr=0.001]\nSteps: 2%|▏ | 22/1000 [00:18<09:42, 1.68it/s, loss=0.000289, lr=0.001]\ntensor(0.0017, device='cuda:0')\ntensor([[0.4304],\n[0.4347]], device='cuda:0')\nCurrent Norm : tensor([0.4273, 0.4312], device='cuda:0')\nSteps: 2%|▏ | 22/1000 [00:18<09:42, 1.68it/s, loss=0.00106, lr=0.001] \nSteps: 2%|▏ | 23/1000 [00:18<09:47, 1.66it/s, loss=0.00106, lr=0.001]\nSteps: 2%|▏ | 23/1000 [00:18<09:47, 1.66it/s, loss=0.000295, lr=0.001]\nSteps: 2%|▏ | 24/1000 [00:19<09:39, 1.68it/s, loss=0.000295, lr=0.001]\ntensor(0.0042, device='cuda:0')\ntensor([[0.4285],\n[0.4325]], device='cuda:0')\nCurrent Norm : tensor([0.4256, 0.4292], device='cuda:0')\nSteps: 2%|▏ | 24/1000 [00:19<09:39, 1.68it/s, loss=0.0145, lr=0.001] \nSteps: 2%|▎ | 25/1000 [00:19<09:43, 1.67it/s, loss=0.0145, lr=0.001]\nSteps: 2%|▎ | 25/1000 [00:19<09:43, 1.67it/s, loss=0.0101, lr=0.001]\nSteps: 3%|▎ | 26/1000 [00:20<09:37, 1.69it/s, loss=0.0101, lr=0.001]\ntensor(0.0068, device='cuda:0')\ntensor([[0.4268],\n[0.4305]], device='cuda:0')\nCurrent Norm : tensor([0.4241, 
0.4275], device='cuda:0')\nSteps: 3%|▎ | 26/1000 [00:20<09:37, 1.69it/s, loss=0.000702, lr=0.001]\nSteps: 3%|▎ | 27/1000 [00:21<09:40, 1.68it/s, loss=0.000702, lr=0.001]\nSteps: 3%|▎ | 27/1000 [00:21<09:40, 1.68it/s, loss=0.0608, lr=0.001] \nSteps: 3%|▎ | 28/1000 [00:21<09:33, 1.69it/s, loss=0.0608, lr=0.001]\ntensor(0.0042, device='cuda:0')\ntensor([[0.4252],\n[0.4288]], device='cuda:0')\nCurrent Norm : tensor([0.4227, 0.4259], device='cuda:0')\nSteps: 3%|▎ | 28/1000 [00:21<09:33, 1.69it/s, loss=0.000732, lr=0.001]\nSteps: 3%|▎ | 29/1000 [00:22<09:43, 1.66it/s, loss=0.000732, lr=0.001]\nSteps: 3%|▎ | 29/1000 [00:22<09:43, 1.66it/s, loss=0.0296, lr=0.001] \nSteps: 3%|▎ | 30/1000 [00:22<09:39, 1.67it/s, loss=0.0296, lr=0.001]\ntensor(0.0018, device='cuda:0')\ntensor([[0.4238],\n[0.4273]], device='cuda:0')\nCurrent Norm : tensor([0.4215, 0.4246], device='cuda:0')\nSteps: 3%|▎ | 30/1000 [00:22<09:39, 1.67it/s, loss=0.00688, lr=0.001]\nSteps: 3%|▎ | 31/1000 [00:23<11:19, 1.43it/s, loss=0.00688, lr=0.001]\nSteps: 3%|▎ | 31/1000 [00:23<11:19, 1.43it/s, loss=0.000233, lr=0.001]\nSteps: 3%|▎ | 32/1000 [00:24<10:44, 1.50it/s, loss=0.000233, lr=0.001]\ntensor(0.0049, device='cuda:0')\ntensor([[0.4226],\n[0.4259]], device='cuda:0')\nCurrent Norm : tensor([0.4203, 0.4233], device='cuda:0')\nSteps: 3%|▎ | 32/1000 [00:24<10:44, 1.50it/s, loss=0.0533, lr=0.001] \nSteps: 3%|▎ | 33/1000 [00:24<10:26, 1.54it/s, loss=0.0533, lr=0.001]\nSteps: 3%|▎ | 33/1000 [00:24<10:26, 1.54it/s, loss=0.00796, lr=0.001]\nSteps: 3%|▎ | 34/1000 [00:25<10:07, 1.59it/s, loss=0.00796, lr=0.001]\ntensor(0.0021, device='cuda:0')\ntensor([[0.4214],\n[0.4246]], device='cuda:0')\nCurrent Norm : tensor([0.4193, 0.4222], device='cuda:0')\nSteps: 3%|▎ | 34/1000 [00:25<10:07, 1.59it/s, loss=0.00294, lr=0.001]\nSteps: 4%|▎ | 35/1000 [00:26<10:00, 1.61it/s, loss=0.00294, lr=0.001]\nSteps: 4%|▎ | 35/1000 [00:26<10:00, 1.61it/s, loss=0.00516, lr=0.001]\nSteps: 4%|▎ | 36/1000 [00:26<09:47, 1.64it/s, loss=0.00516, 
lr=0.001]\ntensor(0.0097, device='cuda:0')\ntensor([[0.4203],\n[0.4232]], device='cuda:0')\nCurrent Norm : tensor([0.4183, 0.4209], device='cuda:0')\nSteps: 4%|▎ | 36/1000 [00:26<09:47, 1.64it/s, loss=0.149, lr=0.001] \nSteps: 4%|▎ | 37/1000 [00:27<09:46, 1.64it/s, loss=0.149, lr=0.001]\nSteps: 4%|▎ | 37/1000 [00:27<09:46, 1.64it/s, loss=0.0233, lr=0.001]\nSteps: 4%|▍ | 38/1000 [00:27<09:38, 1.66it/s, loss=0.0233, lr=0.001]\ntensor(0.0037, device='cuda:0')\ntensor([[0.4194],\n[0.4220]], device='cuda:0')\nCurrent Norm : tensor([0.4174, 0.4198], device='cuda:0')\nSteps: 4%|▍ | 38/1000 [00:27<09:38, 1.66it/s, loss=0.105, lr=0.001] \nSteps: 4%|▍ | 39/1000 [00:28<09:40, 1.66it/s, loss=0.105, lr=0.001]\nSteps: 4%|▍ | 39/1000 [00:28<09:40, 1.66it/s, loss=0.000409, lr=0.001]\nSteps: 4%|▍ | 40/1000 [00:29<09:32, 1.68it/s, loss=0.000409, lr=0.001]\ntensor(0.0049, device='cuda:0')\ntensor([[0.4186],\n[0.4208]], device='cuda:0')\nCurrent Norm : tensor([0.4167, 0.4188], device='cuda:0')\nSteps: 4%|▍ | 40/1000 [00:29<09:32, 1.68it/s, loss=0.0079, lr=0.001] \nSteps: 4%|▍ | 41/1000 [00:29<09:36, 1.66it/s, loss=0.0079, lr=0.001]\nSteps: 4%|▍ | 41/1000 [00:29<09:36, 1.66it/s, loss=0.0223, lr=0.001]\nSteps: 4%|▍ | 42/1000 [00:30<09:29, 1.68it/s, loss=0.0223, lr=0.001]\ntensor(0.0035, device='cuda:0')\ntensor([[0.4179],\n[0.4198]], device='cuda:0')\nCurrent Norm : tensor([0.4161, 0.4178], device='cuda:0')\nSteps: 4%|▍ | 42/1000 [00:30<09:29, 1.68it/s, loss=0.0519, lr=0.001]\nSteps: 4%|▍ | 43/1000 [00:30<09:38, 1.65it/s, loss=0.0519, lr=0.001]\nSteps: 4%|▍ | 43/1000 [00:30<09:38, 1.65it/s, loss=0.00597, lr=0.001]\nSteps: 4%|▍ | 44/1000 [00:31<09:30, 1.68it/s, loss=0.00597, lr=0.001]\ntensor(0.0069, device='cuda:0')\ntensor([[0.4173],\n[0.4189]], device='cuda:0')\nCurrent Norm : tensor([0.4156, 0.4170], device='cuda:0')\nSteps: 4%|▍ | 44/1000 [00:31<09:30, 1.68it/s, loss=0.028, lr=0.001] \nSteps: 4%|▍ | 45/1000 [00:32<09:34, 1.66it/s, loss=0.028, lr=0.001]\nSteps: 4%|▍ | 45/1000 
[00:32<09:34, 1.66it/s, loss=0.000535, lr=0.001]\nSteps: 5%|▍ | 46/1000 [00:32<09:28, 1.68it/s, loss=0.000535, lr=0.001]\ntensor(0.0040, device='cuda:0')\ntensor([[0.4168],\n[0.4180]], device='cuda:0')\nCurrent Norm : tensor([0.4151, 0.4162], device='cuda:0')\nSteps: 5%|▍ | 46/1000 [00:32<09:28, 1.68it/s, loss=0.0098, lr=0.001] \nSteps: 5%|▍ | 47/1000 [00:33<09:33, 1.66it/s, loss=0.0098, lr=0.001]\nSteps: 5%|▍ | 47/1000 [00:33<09:33, 1.66it/s, loss=0.00272, lr=0.001]\nSteps: 5%|▍ | 48/1000 [00:33<09:26, 1.68it/s, loss=0.00272, lr=0.001]\ntensor(0.0039, device='cuda:0')\ntensor([[0.4162],\n[0.4172]], device='cuda:0')\nCurrent Norm : tensor([0.4146, 0.4155], device='cuda:0')\nSteps: 5%|▍ | 48/1000 [00:33<09:26, 1.68it/s, loss=0.0584, lr=0.001] \nSteps: 5%|▍ | 49/1000 [00:34<09:33, 1.66it/s, loss=0.0584, lr=0.001]\nSteps: 5%|▍ | 49/1000 [00:34<09:33, 1.66it/s, loss=0.000687, lr=0.001]\nSteps: 5%|▌ | 50/1000 [00:35<09:28, 1.67it/s, loss=0.000687, lr=0.001]\ntensor(0.0026, device='cuda:0')\ntensor([[0.4157],\n[0.4165]], device='cuda:0')\nCurrent Norm : tensor([0.4141, 0.4149], device='cuda:0')\nSteps: 5%|▌ | 50/1000 [00:35<09:28, 1.67it/s, loss=0.000356, lr=0.001]\nSteps: 5%|▌ | 51/1000 [00:35<09:31, 1.66it/s, loss=0.000356, lr=0.001]\nSteps: 5%|▌ | 51/1000 [00:35<09:31, 1.66it/s, loss=0.0267, lr=0.001] \nSteps: 5%|▌ | 52/1000 [00:36<09:25, 1.68it/s, loss=0.0267, lr=0.001]\ntensor(0.0018, device='cuda:0')\ntensor([[0.4152],\n[0.4158]], device='cuda:0')\nCurrent Norm : tensor([0.4137, 0.4142], device='cuda:0')\nSteps: 5%|▌ | 52/1000 [00:36<09:25, 1.68it/s, loss=0.000584, lr=0.001]\nSteps: 5%|▌ | 53/1000 [00:36<09:29, 1.66it/s, loss=0.000584, lr=0.001]\nSteps: 5%|▌ | 53/1000 [00:36<09:29, 1.66it/s, loss=0.00399, lr=0.001] \nSteps: 5%|▌ | 54/1000 [00:37<09:22, 1.68it/s, loss=0.00399, lr=0.001]\ntensor(0.0033, device='cuda:0')\ntensor([[0.4148],\n[0.4152]], device='cuda:0')\nCurrent Norm : tensor([0.4133, 0.4136], device='cuda:0')\nSteps: 5%|▌ | 54/1000 [00:37<09:22, 
1.68it/s, loss=0.000407, lr=0.001]\nSteps: 6%|▌ | 55/1000 [00:38<09:24, 1.68it/s, loss=0.000407, lr=0.001]\nSteps: 6%|▌ | 55/1000 [00:38<09:24, 1.68it/s, loss=0.00915, lr=0.001] \nSteps: 6%|▌ | 56/1000 [00:38<09:18, 1.69it/s, loss=0.00915, lr=0.001]\ntensor(0.0025, device='cuda:0')\ntensor([[0.4144],\n[0.4145]], device='cuda:0')\nCurrent Norm : tensor([0.4129, 0.4131], device='cuda:0')\nSteps: 6%|▌ | 56/1000 [00:38<09:18, 1.69it/s, loss=0.00812, lr=0.001]\nSteps: 6%|▌ | 57/1000 [00:39<09:24, 1.67it/s, loss=0.00812, lr=0.001]\nSteps: 6%|▌ | 57/1000 [00:39<09:24, 1.67it/s, loss=0.000517, lr=0.001]\nSteps: 6%|▌ | 58/1000 [00:39<09:19, 1.68it/s, loss=0.000517, lr=0.001]\ntensor(0.0035, device='cuda:0')\ntensor([[0.4140],\n[0.4139]], device='cuda:0')\nCurrent Norm : tensor([0.4126, 0.4125], device='cuda:0')\nSteps: 6%|▌ | 58/1000 [00:39<09:19, 1.68it/s, loss=0.00064, lr=0.001] \nSteps: 6%|▌ | 59/1000 [00:40<09:24, 1.67it/s, loss=0.00064, lr=0.001]\nSteps: 6%|▌ | 59/1000 [00:40<09:24, 1.67it/s, loss=0.0373, lr=0.001] \nSteps: 6%|▌ | 60/1000 [00:41<09:18, 1.68it/s, loss=0.0373, lr=0.001]\ntensor(0.0029, device='cuda:0')\ntensor([[0.4136],\n[0.4134]], device='cuda:0')\nCurrent Norm : tensor([0.4122, 0.4121], device='cuda:0')\nSteps: 6%|▌ | 60/1000 [00:41<09:18, 1.68it/s, loss=0.0119, lr=0.001]\nSteps: 6%|▌ | 61/1000 [00:41<09:25, 1.66it/s, loss=0.0119, lr=0.001]\nSteps: 6%|▌ | 61/1000 [00:41<09:25, 1.66it/s, loss=0.000365, lr=0.001]\nSteps: 6%|▌ | 62/1000 [00:42<09:20, 1.67it/s, loss=0.000365, lr=0.001]\ntensor(0.0060, device='cuda:0')\ntensor([[0.4132],\n[0.4129]], device='cuda:0')\nCurrent Norm : tensor([0.4119, 0.4116], device='cuda:0')\nSteps: 6%|▌ | 62/1000 [00:42<09:20, 1.67it/s, loss=0.00457, lr=0.001] \nSteps: 6%|▋ | 63/1000 [00:42<09:24, 1.66it/s, loss=0.00457, lr=0.001]\nSteps: 6%|▋ | 63/1000 [00:42<09:24, 1.66it/s, loss=0.038, lr=0.001] \nSteps: 6%|▋ | 64/1000 [00:43<09:19, 1.67it/s, loss=0.038, lr=0.001]\ntensor(0.0026, 
device='cuda:0')\ntensor([[0.4128],\n[0.4124]], device='cuda:0')\nCurrent Norm : tensor([0.4116, 0.4112], device='cuda:0')\nSteps: 6%|▋ | 64/1000 [00:43<09:19, 1.67it/s, loss=0.0021, lr=0.001]\nSteps: 6%|▋ | 65/1000 [00:44<09:23, 1.66it/s, loss=0.0021, lr=0.001]\nSteps: 6%|▋ | 65/1000 [00:44<09:23, 1.66it/s, loss=0.000878, lr=0.001]\nSteps: 7%|▋ | 66/1000 [00:44<09:17, 1.68it/s, loss=0.000878, lr=0.001]\ntensor(0.0054, device='cuda:0')\ntensor([[0.4125],\n[0.4120]], device='cuda:0')\nCurrent Norm : tensor([0.4113, 0.4108], device='cuda:0')\nSteps: 7%|▋ | 66/1000 [00:44<09:17, 1.68it/s, loss=0.0884, lr=0.001] \nSteps: 7%|▋ | 67/1000 [00:45<09:21, 1.66it/s, loss=0.0884, lr=0.001]\nSteps: 7%|▋ | 67/1000 [00:45<09:21, 1.66it/s, loss=0.00163, lr=0.001]\nSteps: 7%|▋ | 68/1000 [00:45<09:15, 1.68it/s, loss=0.00163, lr=0.001]\ntensor(0.0089, device='cuda:0')\ntensor([[0.4123],\n[0.4117]], device='cuda:0')\nCurrent Norm : tensor([0.4111, 0.4106], device='cuda:0')\nSteps: 7%|▋ | 68/1000 [00:45<09:15, 1.68it/s, loss=0.00931, lr=0.001]\nSteps: 7%|▋ | 69/1000 [00:46<09:19, 1.67it/s, loss=0.00931, lr=0.001]\nSteps: 7%|▋ | 69/1000 [00:46<09:19, 1.67it/s, loss=0.00434, lr=0.001]\nSteps: 7%|▋ | 70/1000 [00:47<09:13, 1.68it/s, loss=0.00434, lr=0.001]\ntensor(0.0038, device='cuda:0')\ntensor([[0.4121],\n[0.4116]], device='cuda:0')\nCurrent Norm : tensor([0.4109, 0.4104], device='cuda:0')\nSteps: 7%|▋ | 70/1000 [00:47<09:13, 1.68it/s, loss=0.00211, lr=0.001]\nSteps: 7%|▋ | 71/1000 [00:47<09:22, 1.65it/s, loss=0.00211, lr=0.001]\nSteps: 7%|▋ | 71/1000 [00:47<09:22, 1.65it/s, loss=0.0829, lr=0.001] \nSteps: 7%|▋ | 72/1000 [00:48<09:16, 1.67it/s, loss=0.0829, lr=0.001]\ntensor(0.0029, device='cuda:0')\ntensor([[0.4119],\n [0.4115]], device='cuda:0')\nCurrent Norm : tensor([0.4107, 0.4104], device='cuda:0')\nSteps: 7%|▋ | 72/1000 [00:48<09:16, 1.67it/s, loss=0.00421, lr=0.001]\nSteps: 7%|▋ | 73/1000 [00:48<09:19, 1.66it/s, loss=0.00421, lr=0.001]\nSteps: 7%|▋ | 73/1000 [00:48<09:19, 
1.66it/s, loss=0.00171, lr=0.001]\nSteps: 7%|▋ | 74/1000 [00:49<09:12, 1.67it/s, loss=0.00171, lr=0.001]\ntensor(0.0080, device='cuda:0')\ntensor([[0.4116],\n[0.4115]], device='cuda:0')\nCurrent Norm : tensor([0.4105, 0.4103], device='cuda:0')\nSteps: 7%|▋ | 74/1000 [00:49<09:12, 1.67it/s, loss=0.0162, lr=0.001] \nSteps: 8%|▊ | 75/1000 [00:50<09:16, 1.66it/s, loss=0.0162, lr=0.001]\nSteps: 8%|▊ | 75/1000 [00:50<09:16, 1.66it/s, loss=0.00691, lr=0.001]\nSteps: 8%|▊ | 76/1000 [00:50<09:12, 1.67it/s, loss=0.00691, lr=0.001]\ntensor(0.0018, device='cuda:0')\ntensor([[0.4114],\n[0.4114]], device='cuda:0')\nCurrent Norm : tensor([0.4103, 0.4103], device='cuda:0')\nSteps: 8%|▊ | 76/1000 [00:50<09:12, 1.67it/s, loss=0.002, lr=0.001] \nSteps: 8%|▊ | 77/1000 [00:51<09:20, 1.65it/s, loss=0.002, lr=0.001]\nSteps: 8%|▊ | 77/1000 [00:51<09:20, 1.65it/s, loss=0.000539, lr=0.001]\nSteps: 8%|▊ | 78/1000 [00:51<09:16, 1.66it/s, loss=0.000539, lr=0.001]\ntensor(0.0049, device='cuda:0')\ntensor([[0.4112],\n[0.4114]], device='cuda:0')\nCurrent Norm : tensor([0.4101, 0.4103], device='cuda:0')\nSteps: 8%|▊ | 78/1000 [00:51<09:16, 1.66it/s, loss=0.0036, lr=0.001] \nSteps: 8%|▊ | 79/1000 [00:52<09:21, 1.64it/s, loss=0.0036, lr=0.001]\nSteps: 8%|▊ | 79/1000 [00:52<09:21, 1.64it/s, loss=0.00652, lr=0.001]\nSteps: 8%|▊ | 80/1000 [00:53<09:16, 1.65it/s, loss=0.00652, lr=0.001]\ntensor(0.0018, device='cuda:0')\ntensor([[0.4111],\n [0.4114]], device='cuda:0')\nCurrent Norm : tensor([0.4100, 0.4103], device='cuda:0')\nSteps: 8%|▊ | 80/1000 [00:53<09:16, 1.65it/s, loss=0.00137, lr=0.001]\nSteps: 8%|▊ | 81/1000 [00:53<09:18, 1.64it/s, loss=0.00137, lr=0.001]\nSteps: 8%|▊ | 81/1000 [00:53<09:18, 1.64it/s, loss=0.00362, lr=0.001]\nSteps: 8%|▊ | 82/1000 [00:54<09:12, 1.66it/s, loss=0.00362, lr=0.001]\ntensor(0.0051, device='cuda:0')\ntensor([[0.4109],\n[0.4114]], device='cuda:0')\nCurrent Norm : tensor([0.4098, 0.4103], device='cuda:0')\nSteps: 8%|▊ | 82/1000 [00:54<09:12, 1.66it/s, loss=0.0197, 
lr=0.001] \nSteps: 8%|▊ | 83/1000 [00:54<09:16, 1.65it/s, loss=0.0197, lr=0.001]\nSteps: 8%|▊ | 83/1000 [00:54<09:16, 1.65it/s, loss=0.00979, lr=0.001]\nSteps: 8%|▊ | 84/1000 [00:55<09:10, 1.66it/s, loss=0.00979, lr=0.001]\ntensor(0.0053, device='cuda:0')\ntensor([[0.4107],\n[0.4114]], device='cuda:0')\nCurrent Norm : tensor([0.4096, 0.4103], device='cuda:0')\nSteps: 8%|▊ | 84/1000 [00:55<09:10, 1.66it/s, loss=0.0264, lr=0.001] \nSteps: 8%|▊ | 85/1000 [00:56<09:14, 1.65it/s, loss=0.0264, lr=0.001]\nSteps: 8%|▊ | 85/1000 [00:56<09:14, 1.65it/s, loss=0.000609, lr=0.001]\nSteps: 9%|▊ | 86/1000 [00:56<09:11, 1.66it/s, loss=0.000609, lr=0.001]\ntensor(0.0027, device='cuda:0')\ntensor([[0.4106],\n[0.4114]], device='cuda:0')\nCurrent Norm : tensor([0.4095, 0.4103], device='cuda:0')\nSteps: 9%|▊ | 86/1000 [00:56<09:11, 1.66it/s, loss=0.00065, lr=0.001] \nSteps: 9%|▊ | 87/1000 [00:57<09:15, 1.64it/s, loss=0.00065, lr=0.001]\nSteps: 9%|▊ | 87/1000 [00:57<09:15, 1.64it/s, loss=0.0177, lr=0.001] \nSteps: 9%|▉ | 88/1000 [00:57<09:08, 1.66it/s, loss=0.0177, lr=0.001]\ntensor(0.0062, device='cuda:0')\ntensor([[0.4104],\n[0.4115]], device='cuda:0')\nCurrent Norm : tensor([0.4094, 0.4103], device='cuda:0')\nSteps: 9%|▉ | 88/1000 [00:57<09:08, 1.66it/s, loss=0.0455, lr=0.001]\nSteps: 9%|▉ | 89/1000 [00:58<09:12, 1.65it/s, loss=0.0455, lr=0.001]\nSteps: 9%|▉ | 89/1000 [00:58<09:12, 1.65it/s, loss=0.0164, lr=0.001]\nSteps: 9%|▉ | 90/1000 [00:59<09:07, 1.66it/s, loss=0.0164, lr=0.001]\ntensor(0.0051, device='cuda:0')\ntensor([[0.4103],\n[0.4115]], device='cuda:0')\nCurrent Norm : tensor([0.4093, 0.4104], device='cuda:0')\nSteps: 9%|▉ | 90/1000 [00:59<09:07, 1.66it/s, loss=0.00294, lr=0.001]\nSteps: 9%|▉ | 91/1000 [00:59<09:13, 1.64it/s, loss=0.00294, lr=0.001]\nSteps: 9%|▉ | 91/1000 [00:59<09:13, 1.64it/s, loss=0.000789, lr=0.001]\nSteps: 9%|▉ | 92/1000 [01:00<09:06, 1.66it/s, loss=0.000789, lr=0.001]\ntensor(0.0028, device='cuda:0')\ntensor([[0.4102],\n[0.4116]], 
device='cuda:0')\nCurrent Norm : tensor([0.4092, 0.4104], device='cuda:0')\nSteps: 9%|▉ | 92/1000 [01:00<09:06, 1.66it/s, loss=0.000544, lr=0.001]\nSteps: 9%|▉ | 93/1000 [01:00<09:08, 1.65it/s, loss=0.000544, lr=0.001]\nSteps: 9%|▉ | 93/1000 [01:00<09:08, 1.65it/s, loss=0.000953, lr=0.001]\nSteps: 9%|▉ | 94/1000 [01:01<09:02, 1.67it/s, loss=0.000953, lr=0.001]\ntensor(0.0029, device='cuda:0')\ntensor([[0.4101],\n[0.4116]], device='cuda:0')\nCurrent Norm : tensor([0.4091, 0.4105], device='cuda:0')\nSteps: 9%|▉ | 94/1000 [01:01<09:02, 1.67it/s, loss=0.000429, lr=0.001]\nSteps: 10%|▉ | 95/1000 [01:02<09:06, 1.66it/s, loss=0.000429, lr=0.001]\nSteps: 10%|▉ | 95/1000 [01:02<09:06, 1.66it/s, loss=0.00231, lr=0.001] \nSteps: 10%|▉ | 96/1000 [01:02<09:04, 1.66it/s, loss=0.00231, lr=0.001]\ntensor(0.0029, device='cuda:0')\ntensor([[0.4100],\n[0.4116]], device='cuda:0')\nCurrent Norm : tensor([0.4090, 0.4105], device='cuda:0')\nSteps: 10%|▉ | 96/1000 [01:02<09:04, 1.66it/s, loss=0.00377, lr=0.001]\nSteps: 10%|▉ | 97/1000 [01:03<09:11, 1.64it/s, loss=0.00377, lr=0.001]\nSteps: 10%|▉ | 97/1000 [01:03<09:11, 1.64it/s, loss=0.000302, lr=0.001]\nSteps: 10%|▉ | 98/1000 [01:03<09:03, 1.66it/s, loss=0.000302, lr=0.001]\ntensor(0.0045, device='cuda:0')\ntensor([[0.4099],\n[0.4116]], device='cuda:0')\nCurrent Norm : tensor([0.4089, 0.4105], device='cuda:0')\nSteps: 10%|▉ | 98/1000 [01:03<09:03, 1.66it/s, loss=0.0706, lr=0.001] \nSteps: 10%|▉ | 99/1000 [01:04<09:07, 1.64it/s, loss=0.0706, lr=0.001]\nSteps: 10%|▉ | 99/1000 [01:04<09:07, 1.64it/s, loss=0.00305, lr=0.001]\nSteps: 10%|█ | 100/1000 [01:05<09:03, 1.65it/s, loss=0.00305, lr=0.001]\nCurrent Learned Embeddings for <s1>:, id 49408 tensor([ 1.5137e-03, 9.1326e-05, -1.6363e-03, -1.4289e-02], device='cuda:0',\ngrad_fn=<SliceBackward0>)\nCurrent Learned Embeddings for <s2>:, id 49409 tensor([-0.0204, -0.0029, 0.0139, -0.0003], device='cuda:0',\ngrad_fn=<SliceBackward0>)\nSaving weights to 
checkpoints/step_inv_100.safetensors\ntensor(0.0076, device='cuda:0')\ntensor([[0.4098],\n[0.4115]], device='cuda:0')\nCurrent Norm : tensor([0.4088, 0.4103], device='cuda:0')\nSteps: 10%|█ | 100/1000 [01:05<09:03, 1.65it/s, loss=0.000652, lr=0.001]\nSteps: 10%|█ | 101/1000 [01:05<09:10, 1.63it/s, loss=0.000652, lr=0.001]\nSteps: 10%|█ | 101/1000 [01:05<09:10, 1.63it/s, loss=0.0218, lr=0.001] \nSteps: 10%|█ | 102/1000 [01:06<09:03, 1.65it/s, loss=0.0218, lr=0.001]\ntensor(0.0029, device='cuda:0')\ntensor([[0.4096],\n[0.4114]], device='cuda:0')\nCurrent Norm : tensor([0.4087, 0.4103], device='cuda:0')\nSteps: 10%|█ | 102/1000 [01:06<09:03, 1.65it/s, loss=0.00278, lr=0.001]\nSteps: 10%|█ | 103/1000 [01:07<09:04, 1.65it/s, loss=0.00278, lr=0.001]\nSteps: 10%|█ | 103/1000 [01:07<09:04, 1.65it/s, loss=0.00829, lr=0.001]\nSteps: 10%|█ | 104/1000 [01:07<08:59, 1.66it/s, loss=0.00829, lr=0.001]\ntensor(0.0038, device='cuda:0')\ntensor([[0.4096],\n[0.4114]], device='cuda:0')\nCurrent Norm : tensor([0.4086, 0.4103], device='cuda:0')\nSteps: 10%|█ | 104/1000 [01:07<08:59, 1.66it/s, loss=0.0058, lr=0.001] \nSteps: 10%|█ | 105/1000 [01:08<09:04, 1.64it/s, loss=0.0058, lr=0.001]\nSteps: 10%|█ | 105/1000 [01:08<09:04, 1.64it/s, loss=0.000726, lr=0.001]\nSteps: 11%|█ | 106/1000 [01:08<08:59, 1.66it/s, loss=0.000726, lr=0.001]\ntensor(0.0030, device='cuda:0')\ntensor([[0.4095],\n[0.4114]], device='cuda:0')\nCurrent Norm : tensor([0.4085, 0.4102], device='cuda:0')\nSteps: 11%|█ | 106/1000 [01:08<08:59, 1.66it/s, loss=0.00454, lr=0.001] \nSteps: 11%|█ | 107/1000 [01:09<09:04, 1.64it/s, loss=0.00454, lr=0.001]\nSteps: 11%|█ | 107/1000 [01:09<09:04, 1.64it/s, loss=0.00176, lr=0.001]\nSteps: 11%|█ | 108/1000 [01:10<08:57, 1.66it/s, loss=0.00176, lr=0.001]\ntensor(0.0056, device='cuda:0')\ntensor([[0.4094],\n[0.4114]], device='cuda:0')\nCurrent Norm : tensor([0.4085, 0.4103], device='cuda:0')\nSteps: 11%|█ | 108/1000 [01:10<08:57, 1.66it/s, loss=0.0143, lr=0.001] \nSteps: 11%|█ | 
109/1000 [01:10<09:00, 1.65it/s, loss=0.0143, lr=0.001]\nSteps: 11%|█ | 109/1000 [01:10<09:00, 1.65it/s, loss=0.00456, lr=0.001]\nSteps: 11%|█ | 110/1000 [01:11<08:56, 1.66it/s, loss=0.00456, lr=0.001]\ntensor(0.0045, device='cuda:0')\ntensor([[0.4093],\n[0.4114]], device='cuda:0')\nCurrent Norm : tensor([0.4084, 0.4103], device='cuda:0')\nSteps: 11%|█ | 110/1000 [01:11<08:56, 1.66it/s, loss=0.0555, lr=0.001] \nSteps: 11%|█ | 111/1000 [01:11<09:01, 1.64it/s, loss=0.0555, lr=0.001]\nSteps: 11%|█ | 111/1000 [01:11<09:01, 1.64it/s, loss=0.00316, lr=0.001]\nSteps: 11%|█ | 112/1000 [01:12<08:55, 1.66it/s, loss=0.00316, lr=0.001]\ntensor(0.0060, device='cuda:0')\ntensor([[0.4092],\n[0.4114]], device='cuda:0')\nCurrent Norm : tensor([0.4083, 0.4103], device='cuda:0')\nSteps: 11%|█ | 112/1000 [01:12<08:55, 1.66it/s, loss=0.00851, lr=0.001]\nSteps: 11%|█▏ | 113/1000 [01:13<08:57, 1.65it/s, loss=0.00851, lr=0.001]\nSteps: 11%|█▏ | 113/1000 [01:13<08:57, 1.65it/s, loss=0.0277, lr=0.001] \nSteps: 11%|█▏ | 114/1000 [01:13<08:55, 1.65it/s, loss=0.0277, lr=0.001]\ntensor(0.0053, device='cuda:0')\ntensor([[0.4091],\n[0.4116]], device='cuda:0')\nCurrent Norm : tensor([0.4082, 0.4104], device='cuda:0')\nSteps: 11%|█▏ | 114/1000 [01:13<08:55, 1.65it/s, loss=0.00622, lr=0.001]\nSteps: 12%|█▏ | 115/1000 [01:14<09:01, 1.63it/s, loss=0.00622, lr=0.001]\nSteps: 12%|█▏ | 115/1000 [01:14<09:01, 1.63it/s, loss=0.0382, lr=0.001] \nSteps: 12%|█▏ | 116/1000 [01:14<08:56, 1.65it/s, loss=0.0382, lr=0.001]\ntensor(0.0084, device='cuda:0')\ntensor([[0.4090],\n [0.4116]], device='cuda:0')\nCurrent Norm : tensor([0.4081, 0.4105], device='cuda:0')\nSteps: 12%|█▏ | 116/1000 [01:14<08:56, 1.65it/s, loss=0.00442, lr=0.001]\nSteps: 12%|█▏ | 117/1000 [01:15<08:58, 1.64it/s, loss=0.00442, lr=0.001]\nSteps: 12%|█▏ | 117/1000 [01:15<08:58, 1.64it/s, loss=0.0118, lr=0.001] \nSteps: 12%|█▏ | 118/1000 [01:16<08:53, 1.65it/s, loss=0.0118, lr=0.001]\ntensor(0.0080, device='cuda:0')\ntensor([[0.4090],\n[0.4118]], 
device='cuda:0')\nCurrent Norm : tensor([0.4081, 0.4106], device='cuda:0')\nSteps: 12%|█▏ | 118/1000 [01:16<08:53, 1.65it/s, loss=0.0209, lr=0.001]\nSteps: 12%|█▏ | 119/1000 [01:16<08:57, 1.64it/s, loss=0.0209, lr=0.001]\nSteps: 12%|█▏ | 119/1000 [01:16<08:57, 1.64it/s, loss=0.0189, lr=0.001]\nSteps: 12%|█▏ | 120/1000 [01:17<08:55, 1.64it/s, loss=0.0189, lr=0.001]\ntensor(0.0190, device='cuda:0')\ntensor([[0.4091],\n[0.4119]], device='cuda:0')\nCurrent Norm : tensor([0.4082, 0.4107], device='cuda:0')\nSteps: 12%|█▏ | 120/1000 [01:17<08:55, 1.64it/s, loss=0.0302, lr=0.001]\nSteps: 12%|█▏ | 121/1000 [01:17<09:00, 1.62it/s, loss=0.0302, lr=0.001]\nSteps: 12%|█▏ | 121/1000 [01:17<09:00, 1.62it/s, loss=0.0261, lr=0.001]\nSteps: 12%|█▏ | 122/1000 [01:18<08:52, 1.65it/s, loss=0.0261, lr=0.001]\ntensor(0.0031, device='cuda:0')\ntensor([[0.4093],\n[0.4122]], device='cuda:0')\nCurrent Norm : tensor([0.4084, 0.4110], device='cuda:0')\nSteps: 12%|█▏ | 122/1000 [01:18<08:52, 1.65it/s, loss=0.00258, lr=0.001]\nSteps: 12%|█▏ | 123/1000 [01:19<08:55, 1.64it/s, loss=0.00258, lr=0.001]\nSteps: 12%|█▏ | 123/1000 [01:19<08:55, 1.64it/s, loss=0.000634, lr=0.001]\nSteps: 12%|█▏ | 124/1000 [01:19<08:49, 1.65it/s, loss=0.000634, lr=0.001]\ntensor(0.0019, device='cuda:0')\ntensor([[0.4096],\n[0.4125]], device='cuda:0')\nCurrent Norm : tensor([0.4086, 0.4113], device='cuda:0')\nSteps: 12%|█▏ | 124/1000 [01:19<08:49, 1.65it/s, loss=0.00197, lr=0.001] \nSteps: 12%|█▎ | 125/1000 [01:20<08:51, 1.65it/s, loss=0.00197, lr=0.001]\nSteps: 12%|█▎ | 125/1000 [01:20<08:51, 1.65it/s, loss=0.0138, lr=0.001] \nSteps: 13%|█▎ | 126/1000 [01:20<08:44, 1.67it/s, loss=0.0138, lr=0.001]\ntensor(0.0063, device='cuda:0')\ntensor([[0.4098],\n[0.4128]], device='cuda:0')\nCurrent Norm : tensor([0.4088, 0.4116], device='cuda:0')\nSteps: 13%|█▎ | 126/1000 [01:20<08:44, 1.67it/s, loss=0.0196, lr=0.001]\nSteps: 13%|█▎ | 127/1000 [01:21<08:48, 1.65it/s, loss=0.0196, lr=0.001]\nSteps: 13%|█▎ | 127/1000 [01:21<08:48, 
1.65it/s, loss=0.000179, lr=0.001]\nSteps: 13%|█▎ | 128/1000 [01:22<08:45, 1.66it/s, loss=0.000179, lr=0.001]\ntensor(0.0081, device='cuda:0')\ntensor([[0.4100],\n[0.4132]], device='cuda:0')\nCurrent Norm : tensor([0.4090, 0.4119], device='cuda:0')\nSteps: 13%|█▎ | 128/1000 [01:22<08:45, 1.66it/s, loss=0.0143, lr=0.001] \nSteps: 13%|█▎ | 129/1000 [01:22<08:47, 1.65it/s, loss=0.0143, lr=0.001]\nSteps: 13%|█▎ | 129/1000 [01:22<08:47, 1.65it/s, loss=0.0112, lr=0.001]\nSteps: 13%|█▎ | 130/1000 [01:23<08:45, 1.66it/s, loss=0.0112, lr=0.001]\ntensor(0.0077, device='cuda:0')\ntensor([[0.4101],\n[0.4136]], device='cuda:0')\nCurrent Norm : tensor([0.4091, 0.4122], device='cuda:0')\nSteps: 13%|█▎ | 130/1000 [01:23<08:45, 1.66it/s, loss=0.029, lr=0.001] \nSteps: 13%|█▎ | 131/1000 [01:24<08:52, 1.63it/s, loss=0.029, lr=0.001]\nSteps: 13%|█▎ | 131/1000 [01:24<08:52, 1.63it/s, loss=0.00343, lr=0.001]\nSteps: 13%|█▎ | 132/1000 [01:24<08:49, 1.64it/s, loss=0.00343, lr=0.001]\ntensor(0.0031, device='cuda:0')\ntensor([[0.4103],\n[0.4139]], device='cuda:0')\nCurrent Norm : tensor([0.4092, 0.4125], device='cuda:0')\nSteps: 13%|█▎ | 132/1000 [01:24<08:49, 1.64it/s, loss=0.00165, lr=0.001]\nSteps: 13%|█▎ | 133/1000 [01:25<08:52, 1.63it/s, loss=0.00165, lr=0.001]\nSteps: 13%|█▎ | 133/1000 [01:25<08:52, 1.63it/s, loss=0.00451, lr=0.001]\nSteps: 13%|█▎ | 134/1000 [01:25<08:45, 1.65it/s, loss=0.00451, lr=0.001]\ntensor(0.0026, device='cuda:0')\ntensor([[0.4104],\n[0.4142]], device='cuda:0')\nCurrent Norm : tensor([0.4094, 0.4128], device='cuda:0')\nSteps: 13%|█▎ | 134/1000 [01:25<08:45, 1.65it/s, loss=0.00724, lr=0.001]\nSteps: 14%|█▎ | 135/1000 [01:26<08:51, 1.63it/s, loss=0.00724, lr=0.001]\nSteps: 14%|█▎ | 135/1000 [01:26<08:51, 1.63it/s, loss=0.000926, lr=0.001]\nSteps: 14%|█▎ | 136/1000 [01:27<08:47, 1.64it/s, loss=0.000926, lr=0.001]\ntensor(0.0051, device='cuda:0')\ntensor([[0.4104],\n[0.4145]], device='cuda:0')\nCurrent Norm : tensor([0.4094, 0.4130], device='cuda:0')\nSteps: 14%|█▎ 
| 136/1000 [01:27<08:47, 1.64it/s, loss=0.000544, lr=0.001]\nSteps: 14%|█▎ | 137/1000 [01:27<08:49, 1.63it/s, loss=0.000544, lr=0.001]\nSteps: 14%|█▎ | 137/1000 [01:27<08:49, 1.63it/s, loss=0.0385, lr=0.001] \nSteps: 14%|█▍ | 138/1000 [01:28<08:44, 1.64it/s, loss=0.0385, lr=0.001]\ntensor(0.0040, device='cuda:0')\ntensor([[0.4104],\n[0.4147]], device='cuda:0')\nCurrent Norm : tensor([0.4094, 0.4132], device='cuda:0')\nSteps: 14%|█▍ | 138/1000 [01:28<08:44, 1.64it/s, loss=0.0509, lr=0.001]\nSteps: 14%|█▍ | 139/1000 [01:28<08:45, 1.64it/s, loss=0.0509, lr=0.001]\nSteps: 14%|█▍ | 139/1000 [01:28<08:45, 1.64it/s, loss=0.000223, lr=0.001]\nSteps: 14%|█▍ | 140/1000 [01:29<08:39, 1.66it/s, loss=0.000223, lr=0.001]\ntensor(0.0055, device='cuda:0')\ntensor([[0.4104],\n[0.4149]], device='cuda:0')\nCurrent Norm : tensor([0.4094, 0.4134], device='cuda:0')\nSteps: 14%|█▍ | 140/1000 [01:29<08:39, 1.66it/s, loss=0.113, lr=0.001] \nSteps: 14%|█▍ | 141/1000 [01:30<08:44, 1.64it/s, loss=0.113, lr=0.001]\nSteps: 14%|█▍ | 141/1000 [01:30<08:44, 1.64it/s, loss=0.0127, lr=0.001]\nSteps: 14%|█▍ | 142/1000 [01:30<08:38, 1.65it/s, loss=0.0127, lr=0.001]\ntensor(0.0056, device='cuda:0')\ntensor([[0.4105],\n[0.4151]], device='cuda:0')\nCurrent Norm : tensor([0.4095, 0.4135], device='cuda:0')\nSteps: 14%|█▍ | 142/1000 [01:30<08:38, 1.65it/s, loss=0.0342, lr=0.001]\nSteps: 14%|█▍ | 143/1000 [01:31<08:40, 1.65it/s, loss=0.0342, lr=0.001]\nSteps: 14%|█▍ | 143/1000 [01:31<08:40, 1.65it/s, loss=0.00191, lr=0.001]\nSteps: 14%|█▍ | 144/1000 [01:31<08:36, 1.66it/s, loss=0.00191, lr=0.001]\ntensor(0.0049, device='cuda:0')\ntensor([[0.4106],\n[0.4152]], device='cuda:0')\nCurrent Norm : tensor([0.4095, 0.4137], device='cuda:0')\nSteps: 14%|█▍ | 144/1000 [01:31<08:36, 1.66it/s, loss=0.0527, lr=0.001] \nSteps: 14%|█▍ | 145/1000 [01:32<08:39, 1.65it/s, loss=0.0527, lr=0.001]\nSteps: 14%|█▍ | 145/1000 [01:32<08:39, 1.65it/s, loss=0.000642, lr=0.001]\nSteps: 15%|█▍ | 146/1000 [01:33<08:34, 1.66it/s, 
loss=0.000642, lr=0.001]\ntensor(0.0051, device='cuda:0')\ntensor([[0.4107],\n[0.4152]], device='cuda:0')\nCurrent Norm : tensor([0.4096, 0.4137], device='cuda:0')\nSteps: 15%|█▍ | 146/1000 [01:33<08:34, 1.66it/s, loss=0.000191, lr=0.001]\nSteps: 15%|█▍ | 147/1000 [01:33<08:39, 1.64it/s, loss=0.000191, lr=0.001]\nSteps: 15%|█▍ | 147/1000 [01:33<08:39, 1.64it/s, loss=0.0652, lr=0.001] \nSteps: 15%|█▍ | 148/1000 [01:34<08:36, 1.65it/s, loss=0.0652, lr=0.001]\ntensor(0.0063, device='cuda:0')\ntensor([[0.4107],\n[0.4152]], device='cuda:0')\nCurrent Norm : tensor([0.4097, 0.4137], device='cuda:0')\nSteps: 15%|█▍ | 148/1000 [01:34<08:36, 1.65it/s, loss=0.00285, lr=0.001]\nSteps: 15%|█▍ | 149/1000 [01:34<08:39, 1.64it/s, loss=0.00285, lr=0.001]\nSteps: 15%|█▍ | 149/1000 [01:34<08:39, 1.64it/s, loss=0.0492, lr=0.001] \nSteps: 15%|█▌ | 150/1000 [01:35<08:34, 1.65it/s, loss=0.0492, lr=0.001]\ntensor(0.0034, device='cuda:0')\ntensor([[0.4107],\n[0.4152]], device='cuda:0')\nCurrent Norm : tensor([0.4097, 0.4137], device='cuda:0')\nSteps: 15%|█▌ | 150/1000 [01:35<08:34, 1.65it/s, loss=0.00289, lr=0.001]\nSteps: 15%|█▌ | 151/1000 [01:36<08:41, 1.63it/s, loss=0.00289, lr=0.001]\nSteps: 15%|█▌ | 151/1000 [01:36<08:41, 1.63it/s, loss=0.0184, lr=0.001] \nSteps: 15%|█▌ | 152/1000 [01:36<08:34, 1.65it/s, loss=0.0184, lr=0.001]\ntensor(0.0046, device='cuda:0')\ntensor([[0.4107],\n[0.4152]], device='cuda:0')\nCurrent Norm : tensor([0.4097, 0.4137], device='cuda:0')\nSteps: 15%|█▌ | 152/1000 [01:36<08:34, 1.65it/s, loss=0.00912, lr=0.001]\nSteps: 15%|█▌ | 153/1000 [01:37<08:39, 1.63it/s, loss=0.00912, lr=0.001]\nSteps: 15%|█▌ | 153/1000 [01:37<08:39, 1.63it/s, loss=0.0912, lr=0.001] \nSteps: 15%|█▌ | 154/1000 [01:38<08:34, 1.65it/s, loss=0.0912, lr=0.001]\ntensor(0.0027, device='cuda:0')\ntensor([[0.4107],\n[0.4152]], device='cuda:0')\nCurrent Norm : tensor([0.4096, 0.4136], device='cuda:0')\nSteps: 15%|█▌ | 154/1000 [01:38<08:34, 1.65it/s, loss=0.00224, lr=0.001]\nSteps: 16%|█▌ | 
155/1000 [01:38<08:38, 1.63it/s, loss=0.00224, lr=0.001]\nSteps: 16%|█▌ | 155/1000 [01:38<08:38, 1.63it/s, loss=0.0296, lr=0.001] \nSteps: 16%|█▌ | 156/1000 [01:39<08:34, 1.64it/s, loss=0.0296, lr=0.001]\ntensor(0.0071, device='cuda:0')\ntensor([[0.4107],\n[0.4151]], device='cuda:0')\nCurrent Norm : tensor([0.4096, 0.4136], device='cuda:0')\nSteps: 16%|█▌ | 156/1000 [01:39<08:34, 1.64it/s, loss=0.0158, lr=0.001]\nSteps: 16%|█▌ | 157/1000 [01:39<08:37, 1.63it/s, loss=0.0158, lr=0.001]\nSteps: 16%|█▌ | 157/1000 [01:39<08:37, 1.63it/s, loss=0.0173, lr=0.001]\nSteps: 16%|█▌ | 158/1000 [01:40<08:32, 1.64it/s, loss=0.0173, lr=0.001]\ntensor(0.0029, device='cuda:0')\ntensor([[0.4106],\n[0.4151]], device='cuda:0')\nCurrent Norm : tensor([0.4095, 0.4136], device='cuda:0')\nSteps: 16%|█▌ | 158/1000 [01:40<08:32, 1.64it/s, loss=0.00296, lr=0.001]\nSteps: 16%|█▌ | 159/1000 [01:41<08:37, 1.63it/s, loss=0.00296, lr=0.001]\nSteps: 16%|█▌ | 159/1000 [01:41<08:37, 1.63it/s, loss=0.000782, lr=0.001]\nSteps: 16%|█▌ | 160/1000 [01:41<08:35, 1.63it/s, loss=0.000782, lr=0.001]\ntensor(0.0037, device='cuda:0')\ntensor([[0.4105],\n[0.4150]], device='cuda:0')\nCurrent Norm : tensor([0.4095, 0.4135], device='cuda:0')\nSteps: 16%|█▌ | 160/1000 [01:41<08:35, 1.63it/s, loss=0.0113, lr=0.001] \nSteps: 16%|█▌ | 161/1000 [01:42<08:36, 1.62it/s, loss=0.0113, lr=0.001]\nSteps: 16%|█▌ | 161/1000 [01:42<08:36, 1.62it/s, loss=0.019, lr=0.001] \nSteps: 16%|█▌ | 162/1000 [01:42<08:31, 1.64it/s, loss=0.019, lr=0.001]\ntensor(0.0028, device='cuda:0')\ntensor([[0.4104],\n[0.4149]], device='cuda:0')\nCurrent Norm : tensor([0.4094, 0.4134], device='cuda:0')\nSteps: 16%|█▌ | 162/1000 [01:42<08:31, 1.64it/s, loss=0.00382, lr=0.001]\nSteps: 16%|█▋ | 163/1000 [01:43<08:33, 1.63it/s, loss=0.00382, lr=0.001]\nSteps: 16%|█▋ | 163/1000 [01:43<08:33, 1.63it/s, loss=0.00187, lr=0.001]\nSteps: 16%|█▋ | 164/1000 [01:44<08:30, 1.64it/s, loss=0.00187, lr=0.001]\ntensor(0.0037, 
device='cuda:0')\ntensor([[0.4103],\n[0.4148]], device='cuda:0')\nCurrent Norm : tensor([0.4093, 0.4133], device='cuda:0')\nSteps: 16%|█▋ | 164/1000 [01:44<08:30, 1.64it/s, loss=0.00628, lr=0.001]\nSteps: 16%|█▋ | 165/1000 [01:44<08:35, 1.62it/s, loss=0.00628, lr=0.001]\nSteps: 16%|█▋ | 165/1000 [01:44<08:35, 1.62it/s, loss=0.00432, lr=0.001]\nSteps: 17%|█▋ | 166/1000 [01:45<08:32, 1.63it/s, loss=0.00432, lr=0.001]\ntensor(0.0070, device='cuda:0')\ntensor([[0.4101],\n[0.4148]], device='cuda:0')\nCurrent Norm : tensor([0.4091, 0.4133], device='cuda:0')\nSteps: 17%|█▋ | 166/1000 [01:45<08:32, 1.63it/s, loss=0.00689, lr=0.001]\nSteps: 17%|█▋ | 167/1000 [01:46<08:36, 1.61it/s, loss=0.00689, lr=0.001]\nSteps: 17%|█▋ | 167/1000 [01:46<08:36, 1.61it/s, loss=0.106, lr=0.001] \nSteps: 17%|█▋ | 168/1000 [01:46<08:31, 1.63it/s, loss=0.106, lr=0.001]\ntensor(0.0023, device='cuda:0')\ntensor([[0.4099],\n[0.4148]], device='cuda:0')\nCurrent Norm : tensor([0.4089, 0.4133], device='cuda:0')\nSteps: 17%|█▋ | 168/1000 [01:46<08:31, 1.63it/s, loss=0.000728, lr=0.001]\nSteps: 17%|█▋ | 169/1000 [01:47<08:34, 1.62it/s, loss=0.000728, lr=0.001]\nSteps: 17%|█▋ | 169/1000 [01:47<08:34, 1.62it/s, loss=0.012, lr=0.001] \nSteps: 17%|█▋ | 170/1000 [01:47<08:29, 1.63it/s, loss=0.012, lr=0.001]\ntensor(0.0055, device='cuda:0')\ntensor([[0.4096],\n[0.4147]], device='cuda:0')\nCurrent Norm : tensor([0.4087, 0.4132], device='cuda:0')\nSteps: 17%|█▋ | 170/1000 [01:47<08:29, 1.63it/s, loss=0.0805, lr=0.001]\nSteps: 17%|█▋ | 171/1000 [01:48<08:33, 1.62it/s, loss=0.0805, lr=0.001]\nSteps: 17%|█▋ | 171/1000 [01:48<08:33, 1.62it/s, loss=0.000757, lr=0.001]\nSteps: 17%|█▋ | 172/1000 [01:49<08:28, 1.63it/s, loss=0.000757, lr=0.001]\ntensor(0.0020, device='cuda:0')\ntensor([[0.4094],\n[0.4146]], device='cuda:0')\nCurrent Norm : tensor([0.4085, 0.4132], device='cuda:0')\nSteps: 17%|█▋ | 172/1000 [01:49<08:28, 1.63it/s, loss=0.000506, lr=0.001]\nSteps: 17%|█▋ | 173/1000 [01:49<08:32, 1.61it/s, loss=0.000506, 
lr=0.001]\nSteps: 17%|█▋ | 173/1000 [01:49<08:32, 1.61it/s, loss=0.00117, lr=0.001] \nSteps: 17%|█▋ | 174/1000 [01:50<08:28, 1.62it/s, loss=0.00117, lr=0.001]\ntensor(0.0046, device='cuda:0')\ntensor([[0.4093],\n[0.4145]], device='cuda:0')\nCurrent Norm : tensor([0.4083, 0.4130], device='cuda:0')\nSteps: 17%|█▋ | 174/1000 [01:50<08:28, 1.62it/s, loss=0.0104, lr=0.001] \nSteps: 18%|█▊ | 175/1000 [01:50<08:30, 1.62it/s, loss=0.0104, lr=0.001]\nSteps: 18%|█▊ | 175/1000 [01:50<08:30, 1.62it/s, loss=0.00425, lr=0.001]\nSteps: 18%|█▊ | 176/1000 [01:51<08:27, 1.62it/s, loss=0.00425, lr=0.001]\ntensor(0.0047, device='cuda:0')\ntensor([[0.4091],\n[0.4143]], device='cuda:0')\nCurrent Norm : tensor([0.4082, 0.4128], device='cuda:0')\nSteps: 18%|█▊ | 176/1000 [01:51<08:27, 1.62it/s, loss=0.067, lr=0.001] \nSteps: 18%|█▊ | 177/1000 [01:52<08:30, 1.61it/s, loss=0.067, lr=0.001]\nSteps: 18%|█▊ | 177/1000 [01:52<08:30, 1.61it/s, loss=0.00193, lr=0.001]\nSteps: 18%|█▊ | 178/1000 [01:52<08:25, 1.63it/s, loss=0.00193, lr=0.001]\ntensor(0.0126, device='cuda:0')\ntensor([[0.4089],\n[0.4140]], device='cuda:0')\nCurrent Norm : tensor([0.4080, 0.4126], device='cuda:0')\nSteps: 18%|█▊ | 178/1000 [01:52<08:25, 1.63it/s, loss=0.0166, lr=0.001] \nSteps: 18%|█▊ | 179/1000 [01:53<08:29, 1.61it/s, loss=0.0166, lr=0.001]\nSteps: 18%|█▊ | 179/1000 [01:53<08:29, 1.61it/s, loss=0.0151, lr=0.001]\nSteps: 18%|█▊ | 180/1000 [01:54<08:24, 1.62it/s, loss=0.0151, lr=0.001]\ntensor(0.0058, device='cuda:0')\ntensor([[0.4087],\n[0.4137]], device='cuda:0')\nCurrent Norm : tensor([0.4079, 0.4123], device='cuda:0')\nSteps: 18%|█▊ | 180/1000 [01:54<08:24, 1.62it/s, loss=0.00435, lr=0.001]\nSteps: 18%|█▊ | 181/1000 [01:54<08:30, 1.60it/s, loss=0.00435, lr=0.001]\nSteps: 18%|█▊ | 181/1000 [01:54<08:30, 1.60it/s, loss=0.0246, lr=0.001] \nSteps: 18%|█▊ | 182/1000 [01:55<08:27, 1.61it/s, loss=0.0246, lr=0.001]\ntensor(0.0086, device='cuda:0')\ntensor([[0.4087],\n[0.4135]], device='cuda:0')\nCurrent Norm : 
tensor([0.4078, 0.4122], device='cuda:0')\nSteps: 18%|█▊ | 182/1000 [01:55<08:27, 1.61it/s, loss=0.00682, lr=0.001]\nSteps: 18%|█▊ | 183/1000 [01:55<08:30, 1.60it/s, loss=0.00682, lr=0.001]\nSteps: 18%|█▊ | 183/1000 [01:55<08:30, 1.60it/s, loss=0.0226, lr=0.001] \nSteps: 18%|█▊ | 184/1000 [01:56<08:25, 1.61it/s, loss=0.0226, lr=0.001]\ntensor(0.0073, device='cuda:0')\ntensor([[0.4088],\n[0.4135]], device='cuda:0')\nCurrent Norm : tensor([0.4079, 0.4122], device='cuda:0')\nSteps: 18%|█▊ | 184/1000 [01:56<08:25, 1.61it/s, loss=0.0133, lr=0.001]\nSteps: 18%|█▊ | 185/1000 [01:57<08:29, 1.60it/s, loss=0.0133, lr=0.001]\nSteps: 18%|█▊ | 185/1000 [01:57<08:29, 1.60it/s, loss=0.0136, lr=0.001]\nSteps: 19%|█▊ | 186/1000 [01:57<08:23, 1.62it/s, loss=0.0136, lr=0.001]\ntensor(0.0037, device='cuda:0')\ntensor([[0.4089],\n[0.4135]], device='cuda:0')\nCurrent Norm : tensor([0.4080, 0.4121], device='cuda:0')\nSteps: 19%|█▊ | 186/1000 [01:57<08:23, 1.62it/s, loss=0.000371, lr=0.001]\nSteps: 19%|█▊ | 187/1000 [01:58<08:30, 1.59it/s, loss=0.000371, lr=0.001]\nSteps: 19%|█▊ | 187/1000 [01:58<08:30, 1.59it/s, loss=0.0068, lr=0.001] \nSteps: 19%|█▉ | 188/1000 [01:59<08:26, 1.60it/s, loss=0.0068, lr=0.001]\ntensor(0.0074, device='cuda:0')\ntensor([[0.4091],\n[0.4135]], device='cuda:0')\nCurrent Norm : tensor([0.4082, 0.4121], device='cuda:0')\nSteps: 19%|█▉ | 188/1000 [01:59<08:26, 1.60it/s, loss=0.0109, lr=0.001]\nSteps: 19%|█▉ | 189/1000 [01:59<08:30, 1.59it/s, loss=0.0109, lr=0.001]\nSteps: 19%|█▉ | 189/1000 [01:59<08:30, 1.59it/s, loss=0.028, lr=0.001] \nSteps: 19%|█▉ | 190/1000 [02:00<08:28, 1.59it/s, loss=0.028, lr=0.001]\ntensor(0.0039, device='cuda:0')\ntensor([[0.4093],\n[0.4135]], device='cuda:0')\nCurrent Norm : tensor([0.4083, 0.4121], device='cuda:0')\nSteps: 19%|█▉ | 190/1000 [02:00<08:28, 1.59it/s, loss=0.000189, lr=0.001]\nSteps: 19%|█▉ | 191/1000 [02:00<08:30, 1.58it/s, loss=0.000189, lr=0.001]\nSteps: 19%|█▉ | 191/1000 [02:00<08:30, 1.58it/s, loss=0.00566, lr=0.001] 
\nSteps: 19%|█▉ | 192/1000 [02:01<08:25, 1.60it/s, loss=0.00566, lr=0.001]\ntensor(0.0046, device='cuda:0')\ntensor([[0.4094],\n[0.4134]], device='cuda:0')\nCurrent Norm : tensor([0.4084, 0.4121], device='cuda:0')\nSteps: 19%|█▉ | 192/1000 [02:01<08:25, 1.60it/s, loss=0.0471, lr=0.001] \nSteps: 19%|█▉ | 193/1000 [02:02<08:29, 1.58it/s, loss=0.0471, lr=0.001]\nSteps: 19%|█▉ | 193/1000 [02:02<08:29, 1.58it/s, loss=0.00171, lr=0.001]\nSteps: 19%|█▉ | 194/1000 [02:02<08:24, 1.60it/s, loss=0.00171, lr=0.001]\ntensor(0.0088, device='cuda:0')\ntensor([[0.4094],\n[0.4133]], device='cuda:0')\nCurrent Norm : tensor([0.4085, 0.4120], device='cuda:0')\nSteps: 19%|█▉ | 194/1000 [02:02<08:24, 1.60it/s, loss=0.0653, lr=0.001] \nSteps: 20%|█▉ | 195/1000 [02:03<08:28, 1.58it/s, loss=0.0653, lr=0.001]\nSteps: 20%|█▉ | 195/1000 [02:03<08:28, 1.58it/s, loss=0.111, lr=0.001] \nSteps: 20%|█▉ | 196/1000 [02:04<08:24, 1.59it/s, loss=0.111, lr=0.001]\ntensor(0.0020, device='cuda:0')\ntensor([[0.4094],\n[0.4133]], device='cuda:0')\nCurrent Norm : tensor([0.4085, 0.4119], device='cuda:0')\nSteps: 20%|█▉ | 196/1000 [02:04<08:24, 1.59it/s, loss=0.000168, lr=0.001]\nSteps: 20%|█▉ | 197/1000 [02:04<08:27, 1.58it/s, loss=0.000168, lr=0.001]\nSteps: 20%|█▉ | 197/1000 [02:04<08:27, 1.58it/s, loss=0.215, lr=0.001] \nSteps: 20%|█▉ | 198/1000 [02:05<08:24, 1.59it/s, loss=0.215, lr=0.001]\ntensor(0.0077, device='cuda:0')\ntensor([[0.4094],\n[0.4132]], device='cuda:0')\nCurrent Norm : tensor([0.4084, 0.4119], device='cuda:0')\nSteps: 20%|█▉ | 198/1000 [02:05<08:24, 1.59it/s, loss=0.0573, lr=0.001]\nSteps: 20%|█▉ | 199/1000 [02:06<08:29, 1.57it/s, loss=0.0573, lr=0.001]\nSteps: 20%|█▉ | 199/1000 [02:06<08:29, 1.57it/s, loss=0.0316, lr=0.001]\nSteps: 20%|██ | 200/1000 [02:06<08:25, 1.58it/s, loss=0.0316, lr=0.001]\nCurrent Learned Embeddings for <s1>:, id 49408 tensor([ 0.0056, -0.0009, -0.0087, -0.0236], device='cuda:0',\ngrad_fn=<SliceBackward0>)\nCurrent Learned Embeddings for <s2>:, id 49409 tensor([ 
0.0028, -0.0070, 0.0065, 0.0088], device='cuda:0',\ngrad_fn=<SliceBackward0>)\nSaving weights to checkpoints/step_inv_200.safetensors\ntensor(0.0011, device='cuda:0')\ntensor([[0.4093],\n[0.4131]], device='cuda:0')\nCurrent Norm : tensor([0.4084, 0.4118], device='cuda:0')\nSteps: 20%|██ | 200/1000 [02:06<08:25, 1.58it/s, loss=0.000531, lr=0.001]\nSteps: 20%|██ | 201/1000 [02:07<08:29, 1.57it/s, loss=0.000531, lr=0.001]\nSteps: 20%|██ | 201/1000 [02:07<08:29, 1.57it/s, loss=0.00104, lr=0.001] \nSteps: 20%|██ | 202/1000 [02:07<08:24, 1.58it/s, loss=0.00104, lr=0.001]\ntensor(0.0003, device='cuda:0')\ntensor([[0.4093],\n[0.4130]], device='cuda:0')\nCurrent Norm : tensor([0.4083, 0.4117], device='cuda:0')\nSteps: 20%|██ | 202/1000 [02:07<08:24, 1.58it/s, loss=0.000786, lr=0.001]\nSteps: 20%|██ | 203/1000 [02:08<08:27, 1.57it/s, loss=0.000786, lr=0.001]\nSteps: 20%|██ | 203/1000 [02:08<08:27, 1.57it/s, loss=0.00016, lr=0.001] \nSteps: 20%|██ | 204/1000 [02:09<08:21, 1.59it/s, loss=0.00016, lr=0.001]\ntensor(0.0079, device='cuda:0')\ntensor([[0.4092],\n[0.4129]], device='cuda:0')\nCurrent Norm : tensor([0.4083, 0.4116], device='cuda:0')\nSteps: 20%|██ | 204/1000 [02:09<08:21, 1.59it/s, loss=0.0152, lr=0.001] \nSteps: 20%|██ | 205/1000 [02:09<08:26, 1.57it/s, loss=0.0152, lr=0.001]\nSteps: 20%|██ | 205/1000 [02:09<08:26, 1.57it/s, loss=0.0314, lr=0.001]\nSteps: 21%|██ | 206/1000 [02:10<08:20, 1.59it/s, loss=0.0314, lr=0.001]\ntensor(0.0078, device='cuda:0')\ntensor([[0.4092],\n[0.4129]], device='cuda:0')\nCurrent Norm : tensor([0.4083, 0.4116], device='cuda:0')\nSteps: 21%|██ | 206/1000 [02:10<08:20, 1.59it/s, loss=0.0123, lr=0.001]\nSteps: 21%|██ | 207/1000 [02:11<08:25, 1.57it/s, loss=0.0123, lr=0.001]\nSteps: 21%|██ | 207/1000 [02:11<08:25, 1.57it/s, loss=0.0274, lr=0.001]\nSteps: 21%|██ | 208/1000 [02:11<08:19, 1.59it/s, loss=0.0274, lr=0.001]\ntensor(0.0105, device='cuda:0')\ntensor([[0.4093],\n[0.4130]], device='cuda:0')\nCurrent Norm : tensor([0.4083, 0.4117], 
device='cuda:0')\nSteps: 21%|██ | 208/1000 [02:11<08:19, 1.59it/s, loss=0.0139, lr=0.001]\nSteps: 21%|██ | 209/1000 [02:12<08:21, 1.58it/s, loss=0.0139, lr=0.001]\nSteps: 21%|██ | 209/1000 [02:12<08:21, 1.58it/s, loss=0.00396, lr=0.001]\nSteps: 21%|██ | 210/1000 [02:12<08:18, 1.59it/s, loss=0.00396, lr=0.001]\ntensor(0.0030, device='cuda:0')\ntensor([[0.4094],\n[0.4130]], device='cuda:0')\nCurrent Norm : tensor([0.4085, 0.4117], device='cuda:0')\nSteps: 21%|██ | 210/1000 [02:12<08:18, 1.59it/s, loss=0.00166, lr=0.001]\nSteps: 21%|██ | 211/1000 [02:13<08:21, 1.57it/s, loss=0.00166, lr=0.001]\nSteps: 21%|██ | 211/1000 [02:13<08:21, 1.57it/s, loss=0.0014, lr=0.001] \nSteps: 21%|██ | 212/1000 [02:14<08:17, 1.58it/s, loss=0.0014, lr=0.001]\ntensor(0.0082, device='cuda:0')\ntensor([[0.4096],\n[0.4131]], device='cuda:0')\nCurrent Norm : tensor([0.4086, 0.4118], device='cuda:0')\nSteps: 21%|██ | 212/1000 [02:14<08:17, 1.58it/s, loss=0.00378, lr=0.001]\nSteps: 21%|██▏ | 213/1000 [02:14<08:28, 1.55it/s, loss=0.00378, lr=0.001]\nSteps: 21%|██▏ | 213/1000 [02:14<08:28, 1.55it/s, loss=0.0232, lr=0.001] \nSteps: 21%|██▏ | 214/1000 [02:15<08:24, 1.56it/s, loss=0.0232, lr=0.001]\ntensor(0.0009, device='cuda:0')\ntensor([[0.4098],\n[0.4131]], device='cuda:0')\nCurrent Norm : tensor([0.4088, 0.4118], device='cuda:0')\nSteps: 21%|██▏ | 214/1000 [02:15<08:24, 1.56it/s, loss=0.0017, lr=0.001]\nSteps: 22%|██▏ | 215/1000 [02:16<08:23, 1.56it/s, loss=0.0017, lr=0.001]\nSteps: 22%|██▏ | 215/1000 [02:16<08:23, 1.56it/s, loss=0.000727, lr=0.001]\nSteps: 22%|██▏ | 216/1000 [02:16<08:19, 1.57it/s, loss=0.000727, lr=0.001]\ntensor(0.0046, device='cuda:0')\ntensor([[0.4099],\n[0.4130]], device='cuda:0')\nCurrent Norm : tensor([0.4089, 0.4117], device='cuda:0')\nSteps: 22%|██▏ | 216/1000 [02:16<08:19, 1.57it/s, loss=0.00184, lr=0.001] \nSteps: 22%|██▏ | 217/1000 [02:17<08:22, 1.56it/s, loss=0.00184, lr=0.001]\nSteps: 22%|██▏ | 217/1000 [02:17<08:22, 1.56it/s, loss=0.0032, lr=0.001] \nSteps: 
22%|██▏ | 218/1000 [02:18<08:17, 1.57it/s, loss=0.0032, lr=0.001]\ntensor(0.0043, device='cuda:0')\ntensor([[0.4100],\n[0.4129]], device='cuda:0')\nCurrent Norm : tensor([0.4090, 0.4116], device='cuda:0')\nSteps: 22%|██▏ | 218/1000 [02:18<08:17, 1.57it/s, loss=0.00103, lr=0.001]\nSteps: 22%|██▏ | 219/1000 [02:18<08:20, 1.56it/s, loss=0.00103, lr=0.001]\nSteps: 22%|██▏ | 219/1000 [02:18<08:20, 1.56it/s, loss=0.00787, lr=0.001]\nSteps: 22%|██▏ | 220/1000 [02:19<08:14, 1.58it/s, loss=0.00787, lr=0.001]\ntensor(0.0053, device='cuda:0')\ntensor([[0.4102],\n[0.4128]], device='cuda:0')\nCurrent Norm : tensor([0.4092, 0.4115], device='cuda:0')\nSteps: 22%|██▏ | 220/1000 [02:19<08:14, 1.58it/s, loss=0.0362, lr=0.001] \nSteps: 22%|██▏ | 221/1000 [02:20<08:17, 1.56it/s, loss=0.0362, lr=0.001]\nSteps: 22%|██▏ | 221/1000 [02:20<08:17, 1.56it/s, loss=0.00483, lr=0.001]\nSteps: 22%|██▏ | 222/1000 [02:20<08:13, 1.58it/s, loss=0.00483, lr=0.001]\ntensor(0.0095, device='cuda:0')\ntensor([[0.4104],\n[0.4126]], device='cuda:0')\nCurrent Norm : tensor([0.4093, 0.4114], device='cuda:0')\nSteps: 22%|██▏ | 222/1000 [02:20<08:13, 1.58it/s, loss=0.0241, lr=0.001] \nSteps: 22%|██▏ | 223/1000 [02:21<08:18, 1.56it/s, loss=0.0241, lr=0.001]\nSteps: 22%|██▏ | 223/1000 [02:21<08:18, 1.56it/s, loss=0.0179, lr=0.001]\nSteps: 22%|██▏ | 224/1000 [02:21<08:14, 1.57it/s, loss=0.0179, lr=0.001]\ntensor(0.0009, device='cuda:0')\ntensor([[0.4106],\n[0.4125]], device='cuda:0')\nCurrent Norm : tensor([0.4095, 0.4113], device='cuda:0')\nSteps: 22%|██▏ | 224/1000 [02:21<08:14, 1.57it/s, loss=0.00276, lr=0.001]\nSteps: 22%|██▎ | 225/1000 [02:22<08:16, 1.56it/s, loss=0.00276, lr=0.001]\nSteps: 22%|██▎ | 225/1000 [02:22<08:16, 1.56it/s, loss=0.000774, lr=0.001]\nSteps: 23%|██▎ | 226/1000 [02:23<08:12, 1.57it/s, loss=0.000774, lr=0.001]\ntensor(0.0092, device='cuda:0')\ntensor([[0.4107],\n[0.4125]], device='cuda:0')\nCurrent Norm : tensor([0.4096, 0.4112], device='cuda:0')\nSteps: 23%|██▎ | 226/1000 [02:23<08:12, 
1.57it/s, loss=0.071, lr=0.001] \nSteps: 23%|██▎ | 227/1000 [02:23<08:16, 1.56it/s, loss=0.071, lr=0.001]\nSteps: 23%|██▎ | 227/1000 [02:23<08:16, 1.56it/s, loss=0.00114, lr=0.001]\nSteps: 23%|██▎ | 228/1000 [02:24<08:12, 1.57it/s, loss=0.00114, lr=0.001]\ntensor(0.0063, device='cuda:0')\ntensor([[0.4107],\n[0.4125]], device='cuda:0')\nCurrent Norm : tensor([0.4097, 0.4113], device='cuda:0')\nSteps: 23%|██▎ | 228/1000 [02:24<08:12, 1.57it/s, loss=0.000409, lr=0.001]\nSteps: 23%|██▎ | 229/1000 [02:25<08:16, 1.55it/s, loss=0.000409, lr=0.001]\nSteps: 23%|██▎ | 229/1000 [02:25<08:16, 1.55it/s, loss=0.0402, lr=0.001] \nSteps: 23%|██▎ | 230/1000 [02:25<08:11, 1.57it/s, loss=0.0402, lr=0.001]\ntensor(0.0053, device='cuda:0')\ntensor([[0.4108],\n[0.4125]], device='cuda:0')\nCurrent Norm : tensor([0.4097, 0.4112], device='cuda:0')\nSteps: 23%|██▎ | 230/1000 [02:25<08:11, 1.57it/s, loss=0.0238, lr=0.001]\nSteps: 23%|██▎ | 231/1000 [02:26<08:14, 1.55it/s, loss=0.0238, lr=0.001]\nSteps: 23%|██▎ | 231/1000 [02:26<08:14, 1.55it/s, loss=0.000537, lr=0.001]\nSteps: 23%|██▎ | 232/1000 [02:27<08:09, 1.57it/s, loss=0.000537, lr=0.001]\ntensor(0.0059, device='cuda:0')\ntensor([[0.4108],\n[0.4125]], device='cuda:0')\nCurrent Norm : tensor([0.4097, 0.4112], device='cuda:0')\nSteps: 23%|██▎ | 232/1000 [02:27<08:09, 1.57it/s, loss=0.000339, lr=0.001]\nSteps: 23%|██▎ | 233/1000 [02:27<08:14, 1.55it/s, loss=0.000339, lr=0.001]\nSteps: 23%|██▎ | 233/1000 [02:27<08:14, 1.55it/s, loss=0.0129, lr=0.001] \nSteps: 23%|██▎ | 234/1000 [02:28<08:11, 1.56it/s, loss=0.0129, lr=0.001]\ntensor(0.0078, device='cuda:0')\ntensor([[0.4109],\n[0.4124]], device='cuda:0')\nCurrent Norm : tensor([0.4098, 0.4112], device='cuda:0')\nSteps: 23%|██▎ | 234/1000 [02:28<08:11, 1.56it/s, loss=0.00997, lr=0.001]\nSteps: 24%|██▎ | 235/1000 [02:28<08:13, 1.55it/s, loss=0.00997, lr=0.001]\nSteps: 24%|██▎ | 235/1000 [02:28<08:13, 1.55it/s, loss=0.0416, lr=0.001] \nSteps: 24%|██▎ | 236/1000 [02:29<08:08, 1.56it/s, 
loss=0.0416, lr=0.001]\ntensor(0.0040, device='cuda:0')\ntensor([[0.4109],\n[0.4123]], device='cuda:0')\nCurrent Norm : tensor([0.4098, 0.4111], device='cuda:0')\nSteps: 24%|██▎ | 236/1000 [02:29<08:08, 1.56it/s, loss=0.00845, lr=0.001]\nSteps: 24%|██▎ | 237/1000 [02:30<08:12, 1.55it/s, loss=0.00845, lr=0.001]\nSteps: 24%|██▎ | 237/1000 [02:30<08:12, 1.55it/s, loss=0.000848, lr=0.001]\nSteps: 24%|██▍ | 238/1000 [02:30<08:08, 1.56it/s, loss=0.000848, lr=0.001]\ntensor(0.0033, device='cuda:0')\ntensor([[0.4110],\n[0.4122]], device='cuda:0')\nCurrent Norm : tensor([0.4099, 0.4110], device='cuda:0')\nSteps: 24%|██▍ | 238/1000 [02:30<08:08, 1.56it/s, loss=0.00033, lr=0.001] \nSteps: 24%|██▍ | 239/1000 [02:31<08:14, 1.54it/s, loss=0.00033, lr=0.001]\nSteps: 24%|██▍ | 239/1000 [02:31<08:14, 1.54it/s, loss=0.0048, lr=0.001] \nSteps: 24%|██▍ | 240/1000 [02:32<08:11, 1.55it/s, loss=0.0048, lr=0.001]\ntensor(0.0022, device='cuda:0')\ntensor([[0.4110],\n[0.4121]], device='cuda:0')\nCurrent Norm : tensor([0.4099, 0.4109], device='cuda:0')\nSteps: 24%|██▍ | 240/1000 [02:32<08:11, 1.55it/s, loss=0.000334, lr=0.001]\nSteps: 24%|██▍ | 241/1000 [02:32<08:13, 1.54it/s, loss=0.000334, lr=0.001]\nSteps: 24%|██▍ | 241/1000 [02:32<08:13, 1.54it/s, loss=0.0037, lr=0.001] \nSteps: 24%|██▍ | 242/1000 [02:33<08:07, 1.55it/s, loss=0.0037, lr=0.001]\ntensor(0.0052, device='cuda:0')\ntensor([[0.4110],\n[0.4120]], device='cuda:0')\nCurrent Norm : tensor([0.4099, 0.4108], device='cuda:0')\nSteps: 24%|██▍ | 242/1000 [02:33<08:07, 1.55it/s, loss=0.000196, lr=0.001]\nSteps: 24%|██▍ | 243/1000 [02:34<08:11, 1.54it/s, loss=0.000196, lr=0.001]\nSteps: 24%|██▍ | 243/1000 [02:34<08:11, 1.54it/s, loss=0.0248, lr=0.001] \nSteps: 24%|██▍ | 244/1000 [02:34<08:07, 1.55it/s, loss=0.0248, lr=0.001]\ntensor(0.0026, device='cuda:0')\ntensor([[0.4109],\n[0.4118]], device='cuda:0')\nCurrent Norm : tensor([0.4098, 0.4107], device='cuda:0')\nSteps: 24%|██▍ | 244/1000 [02:34<08:07, 1.55it/s, loss=0.00292, 
lr=0.001]\nSteps: 24%|██▍ | 245/1000 [02:35<08:10, 1.54it/s, loss=0.00292, lr=0.001]\nSteps: 24%|██▍ | 245/1000 [02:35<08:10, 1.54it/s, loss=0.000501, lr=0.001]\nSteps: 25%|██▍ | 246/1000 [02:36<08:05, 1.55it/s, loss=0.000501, lr=0.001]\ntensor(0.0067, device='cuda:0')\ntensor([[0.4108],\n[0.4117]], device='cuda:0')\nCurrent Norm : tensor([0.4097, 0.4105], device='cuda:0')\nSteps: 25%|██▍ | 246/1000 [02:36<08:05, 1.55it/s, loss=0.000666, lr=0.001]\nSteps: 25%|██▍ | 247/1000 [02:36<08:06, 1.55it/s, loss=0.000666, lr=0.001]\nSteps: 25%|██▍ | 247/1000 [02:36<08:06, 1.55it/s, loss=0.0521, lr=0.001] \nSteps: 25%|██▍ | 248/1000 [02:37<08:01, 1.56it/s, loss=0.0521, lr=0.001]\ntensor(0.0059, device='cuda:0')\ntensor([[0.4107],\n[0.4115]], device='cuda:0')\nCurrent Norm : tensor([0.4096, 0.4103], device='cuda:0')\nSteps: 25%|██▍ | 248/1000 [02:37<08:01, 1.56it/s, loss=0.00594, lr=0.001]\nSteps: 25%|██▍ | 249/1000 [02:38<08:03, 1.55it/s, loss=0.00594, lr=0.001]\nSteps: 25%|██▍ | 249/1000 [02:38<08:03, 1.55it/s, loss=0.0184, lr=0.001] \nSteps: 25%|██▌ | 250/1000 [02:38<07:59, 1.56it/s, loss=0.0184, lr=0.001]\ntensor(0.0015, device='cuda:0')\ntensor([[0.4106],\n[0.4113]], device='cuda:0')\nCurrent Norm : tensor([0.4095, 0.4102], device='cuda:0')\nSteps: 25%|██▌ | 250/1000 [02:38<07:59, 1.56it/s, loss=0.000888, lr=0.001]\nSteps: 25%|██▌ | 251/1000 [02:39<08:04, 1.55it/s, loss=0.000888, lr=0.001]\nSteps: 25%|██▌ | 251/1000 [02:39<08:04, 1.55it/s, loss=0.0139, lr=0.001] \nSteps: 25%|██▌ | 252/1000 [02:39<08:00, 1.56it/s, loss=0.0139, lr=0.001]\ntensor(0.0089, device='cuda:0')\ntensor([[0.4105],\n[0.4111]], device='cuda:0')\nCurrent Norm : tensor([0.4095, 0.4100], device='cuda:0')\nSteps: 25%|██▌ | 252/1000 [02:39<08:00, 1.56it/s, loss=0.0116, lr=0.001]\nSteps: 25%|██▌ | 253/1000 [02:40<08:10, 1.52it/s, loss=0.0116, lr=0.001]\nSteps: 25%|██▌ | 253/1000 [02:40<08:10, 1.52it/s, loss=0.0285, lr=0.001]\nSteps: 25%|██▌ | 254/1000 [02:41<08:06, 1.53it/s, loss=0.0285, 
lr=0.001]\ntensor(0.0045, device='cuda:0')\ntensor([[0.4105],\n[0.4109]], device='cuda:0')\nCurrent Norm : tensor([0.4094, 0.4099], device='cuda:0')\nSteps: 25%|██▌ | 254/1000 [02:41<08:06, 1.53it/s, loss=0.011, lr=0.001] \nSteps: 26%|██▌ | 255/1000 [02:41<08:06, 1.53it/s, loss=0.011, lr=0.001]\nSteps: 26%|██▌ | 255/1000 [02:41<08:06, 1.53it/s, loss=0.00831, lr=0.001]\nSteps: 26%|██▌ | 256/1000 [02:42<08:02, 1.54it/s, loss=0.00831, lr=0.001]\ntensor(0.0023, device='cuda:0')\ntensor([[0.4104],\n[0.4108]], device='cuda:0')\nCurrent Norm : tensor([0.4093, 0.4097], device='cuda:0')\nSteps: 26%|██▌ | 256/1000 [02:42<08:02, 1.54it/s, loss=0.0235, lr=0.001] \nSteps: 26%|██▌ | 257/1000 [02:43<08:03, 1.54it/s, loss=0.0235, lr=0.001]\nSteps: 26%|██▌ | 257/1000 [02:43<08:03, 1.54it/s, loss=0.00303, lr=0.001]\nSteps: 26%|██▌ | 258/1000 [02:43<07:59, 1.55it/s, loss=0.00303, lr=0.001]\ntensor(0.0088, device='cuda:0')\ntensor([[0.4103],\n[0.4106]], device='cuda:0')\nCurrent Norm : tensor([0.4093, 0.4095], device='cuda:0')\nSteps: 26%|██▌ | 258/1000 [02:43<07:59, 1.55it/s, loss=0.0407, lr=0.001] \nSteps: 26%|██▌ | 259/1000 [02:44<08:01, 1.54it/s, loss=0.0407, lr=0.001]\nSteps: 26%|██▌ | 259/1000 [02:44<08:01, 1.54it/s, loss=0.0181, lr=0.001]\nSteps: 26%|██▌ | 260/1000 [02:45<07:56, 1.55it/s, loss=0.0181, lr=0.001]\ntensor(0.0088, device='cuda:0')\ntensor([[0.4102],\n[0.4105]], device='cuda:0')\nCurrent Norm : tensor([0.4092, 0.4095], device='cuda:0')\nSteps: 26%|██▌ | 260/1000 [02:45<07:56, 1.55it/s, loss=0.0383, lr=0.001]\nSteps: 26%|██▌ | 261/1000 [02:45<08:00, 1.54it/s, loss=0.0383, lr=0.001]\nSteps: 26%|██▌ | 261/1000 [02:45<08:00, 1.54it/s, loss=0.0344, lr=0.001]\nSteps: 26%|██▌ | 262/1000 [02:46<07:55, 1.55it/s, loss=0.0344, lr=0.001]\ntensor(0.0025, device='cuda:0')\ntensor([[0.4102],\n[0.4105]], device='cuda:0')\nCurrent Norm : tensor([0.4092, 0.4094], device='cuda:0')\nSteps: 26%|██▌ | 262/1000 [02:46<07:55, 1.55it/s, loss=0.00119, lr=0.001]\nSteps: 26%|██▋ | 263/1000 
[02:47<08:00, 1.53it/s, loss=0.00119, lr=0.001]\nSteps: 26%|██▋ | 263/1000 [02:47<08:00, 1.53it/s, loss=0.00178, lr=0.001]\nSteps: 26%|██▋ | 264/1000 [02:47<07:53, 1.55it/s, loss=0.00178, lr=0.001]\ntensor(0.0092, device='cuda:0')\ntensor([[0.4101],\n[0.4105]], device='cuda:0')\nCurrent Norm : tensor([0.4091, 0.4095], device='cuda:0')\nSteps: 26%|██▋ | 264/1000 [02:47<07:53, 1.55it/s, loss=0.077, lr=0.001] \nSteps: 26%|██▋ | 265/1000 [02:48<07:55, 1.55it/s, loss=0.077, lr=0.001]\nSteps: 26%|██▋ | 265/1000 [02:48<07:55, 1.55it/s, loss=0.0233, lr=0.001]\nSteps: 27%|██▋ | 266/1000 [02:49<07:50, 1.56it/s, loss=0.0233, lr=0.001]\ntensor(0.0020, device='cuda:0')\ntensor([[0.4100],\n[0.4105]], device='cuda:0')\nCurrent Norm : tensor([0.4090, 0.4095], device='cuda:0')\nSteps: 27%|██▋ | 266/1000 [02:49<07:50, 1.56it/s, loss=0.000306, lr=0.001]\nSteps: 27%|██▋ | 267/1000 [02:49<07:54, 1.54it/s, loss=0.000306, lr=0.001]\nSteps: 27%|██▋ | 267/1000 [02:49<07:54, 1.54it/s, loss=0.0058, lr=0.001] \nSteps: 27%|██▋ | 268/1000 [02:50<07:50, 1.56it/s, loss=0.0058, lr=0.001]\ntensor(0.0020, device='cuda:0')\ntensor([[0.4099],\n[0.4105]], device='cuda:0')\nCurrent Norm : tensor([0.4089, 0.4094], device='cuda:0')\nSteps: 27%|██▋ | 268/1000 [02:50<07:50, 1.56it/s, loss=0.00137, lr=0.001]\nSteps: 27%|██▋ | 269/1000 [02:50<07:52, 1.55it/s, loss=0.00137, lr=0.001]\nSteps: 27%|██▋ | 269/1000 [02:50<07:52, 1.55it/s, loss=0.000572, lr=0.001]\nSteps: 27%|██▋ | 270/1000 [02:51<07:47, 1.56it/s, loss=0.000572, lr=0.001]\ntensor(0.0095, device='cuda:0')\ntensor([[0.4098],\n[0.4105]], device='cuda:0')\nCurrent Norm : tensor([0.4088, 0.4094], device='cuda:0')\nSteps: 27%|██▋ | 270/1000 [02:51<07:47, 1.56it/s, loss=0.0374, lr=0.001] \nSteps: 27%|██▋ | 271/1000 [02:52<07:51, 1.55it/s, loss=0.0374, lr=0.001]\nSteps: 27%|██▋ | 271/1000 [02:52<07:51, 1.55it/s, loss=0.00526, lr=0.001]\nSteps: 27%|██▋ | 272/1000 [02:52<07:46, 1.56it/s, loss=0.00526, lr=0.001]\ntensor(0.0053, 
device='cuda:0')\ntensor([[0.4096],\n[0.4105]], device='cuda:0')\nCurrent Norm : tensor([0.4087, 0.4094], device='cuda:0')\nSteps: 27%|██▋ | 272/1000 [02:52<07:46, 1.56it/s, loss=0.000402, lr=0.001]\nSteps: 27%|██▋ | 273/1000 [02:53<07:49, 1.55it/s, loss=0.000402, lr=0.001]\nSteps: 27%|██▋ | 273/1000 [02:53<07:49, 1.55it/s, loss=0.0369, lr=0.001] \nSteps: 27%|██▋ | 274/1000 [02:54<07:45, 1.56it/s, loss=0.0369, lr=0.001]\ntensor(0.0074, device='cuda:0')\ntensor([[0.4095],\n[0.4105]], device='cuda:0')\nCurrent Norm : tensor([0.4085, 0.4094], device='cuda:0')\nSteps: 27%|██▋ | 274/1000 [02:54<07:45, 1.56it/s, loss=0.0328, lr=0.001]\nSteps: 28%|██▊ | 275/1000 [02:54<07:49, 1.55it/s, loss=0.0328, lr=0.001]\nSteps: 28%|██▊ | 275/1000 [02:54<07:49, 1.55it/s, loss=0.0216, lr=0.001]\nSteps: 28%|██▊ | 276/1000 [02:55<07:45, 1.55it/s, loss=0.0216, lr=0.001]\ntensor(0.0016, device='cuda:0')\ntensor([[0.4093],\n[0.4105]], device='cuda:0')\nCurrent Norm : tensor([0.4084, 0.4094], device='cuda:0')\nSteps: 28%|██▊ | 276/1000 [02:55<07:45, 1.55it/s, loss=0.00096, lr=0.001]\nSteps: 28%|██▊ | 277/1000 [02:56<07:47, 1.55it/s, loss=0.00096, lr=0.001]\nSteps: 28%|██▊ | 277/1000 [02:56<07:47, 1.55it/s, loss=0.00177, lr=0.001]\nSteps: 28%|██▊ | 278/1000 [02:56<07:41, 1.57it/s, loss=0.00177, lr=0.001]\ntensor(0.0048, device='cuda:0')\ntensor([[0.4092],\n[0.4104]], device='cuda:0')\nCurrent Norm : tensor([0.4083, 0.4094], device='cuda:0')\nSteps: 28%|██▊ | 278/1000 [02:56<07:41, 1.57it/s, loss=0.0209, lr=0.001] \nSteps: 28%|██▊ | 279/1000 [02:57<07:45, 1.55it/s, loss=0.0209, lr=0.001]\nSteps: 28%|██▊ | 279/1000 [02:57<07:45, 1.55it/s, loss=0.000778, lr=0.001]\nSteps: 28%|██▊ | 280/1000 [02:58<07:40, 1.56it/s, loss=0.000778, lr=0.001]\ntensor(0.0025, device='cuda:0')\ntensor([[0.4091],\n[0.4103]], device='cuda:0')\nCurrent Norm : tensor([0.4082, 0.4093], device='cuda:0')\nSteps: 28%|██▊ | 280/1000 [02:58<07:40, 1.56it/s, loss=0.0108, lr=0.001] \nSteps: 28%|██▊ | 281/1000 [02:58<07:42, 
1.55it/s, loss=0.0108, lr=0.001]\nSteps: 28%|██▊ | 281/1000 [02:58<07:42, 1.55it/s, loss=0.000837, lr=0.001]\nSteps: 28%|██▊ | 282/1000 [02:59<07:37, 1.57it/s, loss=0.000837, lr=0.001]\ntensor(0.0034, device='cuda:0')\ntensor([[0.4090],\n[0.4101]], device='cuda:0')\nCurrent Norm : tensor([0.4081, 0.4091], device='cuda:0')\nSteps: 28%|██▊ | 282/1000 [02:59<07:37, 1.57it/s, loss=0.00502, lr=0.001] \nSteps: 28%|██▊ | 283/1000 [02:59<07:41, 1.55it/s, loss=0.00502, lr=0.001]\nSteps: 28%|██▊ | 283/1000 [02:59<07:41, 1.55it/s, loss=0.00781, lr=0.001]\nSteps: 28%|██▊ | 284/1000 [03:00<07:36, 1.57it/s, loss=0.00781, lr=0.001]\ntensor(0.0054, device='cuda:0')\ntensor([[0.4089],\n[0.4100]], device='cuda:0')\nCurrent Norm : tensor([0.4080, 0.4090], device='cuda:0')\nSteps: 28%|██▊ | 284/1000 [03:00<07:36, 1.57it/s, loss=0.00449, lr=0.001]\nSteps: 28%|██▊ | 285/1000 [03:01<07:38, 1.56it/s, loss=0.00449, lr=0.001]\nSteps: 28%|██▊ | 285/1000 [03:01<07:38, 1.56it/s, loss=0.0029, lr=0.001] \nSteps: 29%|██▊ | 286/1000 [03:01<07:35, 1.57it/s, loss=0.0029, lr=0.001]\ntensor(0.0028, device='cuda:0')\ntensor([[0.4088],\n[0.4097]], device='cuda:0')\nCurrent Norm : tensor([0.4079, 0.4088], device='cuda:0')\nSteps: 29%|██▊ | 286/1000 [03:01<07:35, 1.57it/s, loss=0.00402, lr=0.001]\nSteps: 29%|██▊ | 287/1000 [03:02<07:37, 1.56it/s, loss=0.00402, lr=0.001]\nSteps: 29%|██▊ | 287/1000 [03:02<07:37, 1.56it/s, loss=0.000204, lr=0.001]\nSteps: 29%|██▉ | 288/1000 [03:03<07:32, 1.57it/s, loss=0.000204, lr=0.001]\ntensor(0.0017, device='cuda:0')\ntensor([[0.4087],\n[0.4095]], device='cuda:0')\nCurrent Norm : tensor([0.4078, 0.4085], device='cuda:0')\nSteps: 29%|██▉ | 288/1000 [03:03<07:32, 1.57it/s, loss=0.0105, lr=0.001] \nSteps: 29%|██▉ | 289/1000 [03:03<07:34, 1.56it/s, loss=0.0105, lr=0.001]\nSteps: 29%|██▉ | 289/1000 [03:03<07:34, 1.56it/s, loss=0.000225, lr=0.001]\nSteps: 29%|██▉ | 290/1000 [03:04<07:29, 1.58it/s, loss=0.000225, lr=0.001]\ntensor(0.0047, 
device='cuda:0')\ntensor([[0.4085],\n[0.4092]], device='cuda:0')\nCurrent Norm : tensor([0.4077, 0.4083], device='cuda:0')\nSteps: 29%|██▉ | 290/1000 [03:04<07:29, 1.58it/s, loss=0.00173, lr=0.001] \nSteps: 29%|██▉ | 291/1000 [03:05<07:32, 1.57it/s, loss=0.00173, lr=0.001]\nSteps: 29%|██▉ | 291/1000 [03:05<07:32, 1.57it/s, loss=0.00477, lr=0.001]\nSteps: 29%|██▉ | 292/1000 [03:05<07:27, 1.58it/s, loss=0.00477, lr=0.001]\ntensor(0.0084, device='cuda:0')\ntensor([[0.4083],\n[0.4089]], device='cuda:0')\nCurrent Norm : tensor([0.4075, 0.4080], device='cuda:0')\nSteps: 29%|██▉ | 292/1000 [03:05<07:27, 1.58it/s, loss=0.0743, lr=0.001] \nSteps: 29%|██▉ | 293/1000 [03:06<07:29, 1.57it/s, loss=0.0743, lr=0.001]\nSteps: 29%|██▉ | 293/1000 [03:06<07:29, 1.57it/s, loss=0.0805, lr=0.001]\nSteps: 29%|██▉ | 294/1000 [03:06<07:27, 1.58it/s, loss=0.0805, lr=0.001]\ntensor(0.0045, device='cuda:0')\ntensor([[0.4081],\n[0.4087]], device='cuda:0')\nCurrent Norm : tensor([0.4073, 0.4078], device='cuda:0')\nSteps: 29%|██▉ | 294/1000 [03:06<07:27, 1.58it/s, loss=0.000428, lr=0.001]\nSteps: 30%|██▉ | 295/1000 [03:07<07:29, 1.57it/s, loss=0.000428, lr=0.001]\nSteps: 30%|██▉ | 295/1000 [03:07<07:29, 1.57it/s, loss=0.0122, lr=0.001] \nSteps: 30%|██▉ | 296/1000 [03:08<07:26, 1.58it/s, loss=0.0122, lr=0.001]\ntensor(0.0043, device='cuda:0')\ntensor([[0.4080],\n[0.4084]], device='cuda:0')\nCurrent Norm : tensor([0.4072, 0.4076], device='cuda:0')\nSteps: 30%|██▉ | 296/1000 [03:08<07:26, 1.58it/s, loss=0.0223, lr=0.001]\nSteps: 30%|██▉ | 297/1000 [03:08<07:29, 1.56it/s, loss=0.0223, lr=0.001]\nSteps: 30%|██▉ | 297/1000 [03:08<07:29, 1.56it/s, loss=0.00694, lr=0.001]\nSteps: 30%|██▉ | 298/1000 [03:09<07:25, 1.58it/s, loss=0.00694, lr=0.001]\ntensor(0.0054, device='cuda:0')\ntensor([[0.4079],\n[0.4082]], device='cuda:0')\nCurrent Norm : tensor([0.4071, 0.4073], device='cuda:0')\nSteps: 30%|██▉ | 298/1000 [03:09<07:25, 1.58it/s, loss=0.00844, lr=0.001]\nSteps: 30%|██▉ | 299/1000 [03:10<07:28, 
1.56it/s, loss=0.00844, lr=0.001]\nSteps: 30%|██▉ | 299/1000 [03:10<07:28, 1.56it/s, loss=0.0155, lr=0.001] \nSteps: 30%|███ | 300/1000 [03:10<07:24, 1.58it/s, loss=0.0155, lr=0.001]\nCurrent Learned Embeddings for <s1>:, id 49408 tensor([ 0.0150, 0.0010, -0.0104, -0.0219], device='cuda:0',\ngrad_fn=<SliceBackward0>)\nCurrent Learned Embeddings for <s2>:, id 49409 tensor([ 0.0016, -0.0075, -0.0043, 0.0085], device='cuda:0',\ngrad_fn=<SliceBackward0>)\nSaving weights to checkpoints/step_inv_300.safetensors\ntensor(0.0046, device='cuda:0')\ntensor([[0.4079],\n[0.4080]], device='cuda:0')\nCurrent Norm : tensor([0.4071, 0.4072], device='cuda:0')\nSteps: 30%|███ | 300/1000 [03:10<07:24, 1.58it/s, loss=0.0115, lr=0.001]\nSteps: 30%|███ | 301/1000 [03:11<07:29, 1.55it/s, loss=0.0115, lr=0.001]\nSteps: 30%|███ | 301/1000 [03:11<07:29, 1.55it/s, loss=0.00161, lr=0.001]\nSteps: 30%|███ | 302/1000 [03:12<07:25, 1.57it/s, loss=0.00161, lr=0.001]\ntensor(0.0020, device='cuda:0')\ntensor([[0.4079],\n[0.4078]], device='cuda:0')\nCurrent Norm : tensor([0.4071, 0.4070], device='cuda:0')\nSteps: 30%|███ | 302/1000 [03:12<07:25, 1.57it/s, loss=0.00208, lr=0.001]\nSteps: 30%|███ | 303/1000 [03:12<07:27, 1.56it/s, loss=0.00208, lr=0.001]\nSteps: 30%|███ | 303/1000 [03:12<07:27, 1.56it/s, loss=0.00318, lr=0.001]\nSteps: 30%|███ | 304/1000 [03:13<07:21, 1.58it/s, loss=0.00318, lr=0.001]\ntensor(0.0041, device='cuda:0')\ntensor([[0.4079],\n[0.4076]], device='cuda:0')\nCurrent Norm : tensor([0.4071, 0.4068], device='cuda:0')\nSteps: 30%|███ | 304/1000 [03:13<07:21, 1.58it/s, loss=0.000203, lr=0.001]\nSteps: 30%|███ | 305/1000 [03:13<07:23, 1.57it/s, loss=0.000203, lr=0.001]\nSteps: 30%|███ | 305/1000 [03:13<07:23, 1.57it/s, loss=0.00242, lr=0.001] \nSteps: 31%|███ | 306/1000 [03:14<07:20, 1.58it/s, loss=0.00242, lr=0.001]\ntensor(0.0042, device='cuda:0')\ntensor([[0.4079],\n[0.4075]], device='cuda:0')\nCurrent Norm : tensor([0.4071, 0.4067], device='cuda:0')\nSteps: 31%|███ | 306/1000 
[03:14<07:20, 1.58it/s, loss=0.00278, lr=0.001]\nSteps: 31%|███ | 307/1000 [03:15<07:24, 1.56it/s, loss=0.00278, lr=0.001]\nSteps: 31%|███ | 307/1000 [03:15<07:24, 1.56it/s, loss=0.0105, lr=0.001] \nSteps: 31%|███ | 308/1000 [03:15<07:19, 1.58it/s, loss=0.0105, lr=0.001]\ntensor(0.0036, device='cuda:0')\ntensor([[0.4080],\n[0.4074]], device='cuda:0')\nCurrent Norm : tensor([0.4072, 0.4066], device='cuda:0')\nSteps: 31%|███ | 308/1000 [03:15<07:19, 1.58it/s, loss=0.00387, lr=0.001]\nSteps: 31%|███ | 309/1000 [03:16<07:20, 1.57it/s, loss=0.00387, lr=0.001]\nSteps: 31%|███ | 309/1000 [03:16<07:20, 1.57it/s, loss=0.00835, lr=0.001]\nSteps: 31%|███ | 310/1000 [03:17<07:16, 1.58it/s, loss=0.00835, lr=0.001]\ntensor(0.0054, device='cuda:0')\ntensor([[0.4080],\n[0.4073]], device='cuda:0')\nCurrent Norm : tensor([0.4072, 0.4066], device='cuda:0')\nSteps: 31%|███ | 310/1000 [03:17<07:16, 1.58it/s, loss=0.000944, lr=0.001]\nSteps: 31%|███ | 311/1000 [03:17<07:21, 1.56it/s, loss=0.000944, lr=0.001]\nSteps: 31%|███ | 311/1000 [03:17<07:21, 1.56it/s, loss=0.0117, lr=0.001] \nSteps: 31%|███ | 312/1000 [03:18<07:17, 1.57it/s, loss=0.0117, lr=0.001]\ntensor(0.0035, device='cuda:0')\ntensor([[0.4079],\n[0.4073]], device='cuda:0')\nCurrent Norm : tensor([0.4071, 0.4066], device='cuda:0')\nSteps: 31%|███ | 312/1000 [03:18<07:17, 1.57it/s, loss=0.0103, lr=0.001]\nSteps: 31%|███▏ | 313/1000 [03:19<07:18, 1.57it/s, loss=0.0103, lr=0.001]\nSteps: 31%|███▏ | 313/1000 [03:19<07:18, 1.57it/s, loss=0.000619, lr=0.001]\nSteps: 31%|███▏ | 314/1000 [03:19<07:14, 1.58it/s, loss=0.000619, lr=0.001]\ntensor(0.0091, device='cuda:0')\ntensor([[0.4079],\n[0.4073]], device='cuda:0')\nCurrent Norm : tensor([0.4071, 0.4066], device='cuda:0')\nSteps: 31%|███▏ | 314/1000 [03:19<07:14, 1.58it/s, loss=0.0259, lr=0.001] \nSteps: 32%|███▏ | 315/1000 [03:20<07:15, 1.57it/s, loss=0.0259, lr=0.001]\nSteps: 32%|███▏ | 315/1000 [03:20<07:15, 1.57it/s, loss=0.0223, lr=0.001]\nSteps: 32%|███▏ | 316/1000 [03:20<07:14, 
1.58it/s, loss=0.0223, lr=0.001]\ntensor(0.0302, device='cuda:0')\ntensor([[0.4080],\n[0.4074]], device='cuda:0')\nCurrent Norm : tensor([0.4072, 0.4067], device='cuda:0')\nSteps: 32%|███▏ | 316/1000 [03:20<07:14, 1.58it/s, loss=0.00669, lr=0.001]\nSteps: 32%|███▏ | 317/1000 [03:21<07:15, 1.57it/s, loss=0.00669, lr=0.001]\nSteps: 32%|███▏ | 317/1000 [03:21<07:15, 1.57it/s, loss=0.00348, lr=0.001]\nSteps: 32%|███▏ | 318/1000 [03:22<07:10, 1.58it/s, loss=0.00348, lr=0.001]\ntensor(0.0053, device='cuda:0')\ntensor([[0.4084],\n[0.4077]], device='cuda:0')\nCurrent Norm : tensor([0.4076, 0.4069], device='cuda:0')\nSteps: 32%|███▏ | 318/1000 [03:22<07:10, 1.58it/s, loss=0.0187, lr=0.001] \nSteps: 32%|███▏ | 319/1000 [03:22<07:17, 1.56it/s, loss=0.0187, lr=0.001]\nSteps: 32%|███▏ | 319/1000 [03:22<07:17, 1.56it/s, loss=0.000746, lr=0.001]\nSteps: 32%|███▏ | 320/1000 [03:23<07:12, 1.57it/s, loss=0.000746, lr=0.001]\ntensor(0.0077, device='cuda:0')\ntensor([[0.4088],\n[0.4082]], device='cuda:0')\nCurrent Norm : tensor([0.4079, 0.4073], device='cuda:0')\nSteps: 32%|███▏ | 320/1000 [03:23<07:12, 1.57it/s, loss=0.102, lr=0.001] \nSteps: 32%|███▏ | 321/1000 [03:24<07:13, 1.56it/s, loss=0.102, lr=0.001]\nSteps: 32%|███▏ | 321/1000 [03:24<07:13, 1.56it/s, loss=0.0553, lr=0.001]\nSteps: 32%|███▏ | 322/1000 [03:24<07:08, 1.58it/s, loss=0.0553, lr=0.001]\ntensor(0.0044, device='cuda:0')\ntensor([[0.4093],\n[0.4087]], device='cuda:0')\nCurrent Norm : tensor([0.4084, 0.4078], device='cuda:0')\nSteps: 32%|███▏ | 322/1000 [03:24<07:08, 1.58it/s, loss=0.00572, lr=0.001]\nSteps: 32%|███▏ | 323/1000 [03:25<07:10, 1.57it/s, loss=0.00572, lr=0.001]\nSteps: 32%|███▏ | 323/1000 [03:25<07:10, 1.57it/s, loss=0.0152, lr=0.001] \nSteps: 32%|███▏ | 324/1000 [03:26<07:05, 1.59it/s, loss=0.0152, lr=0.001]\ntensor(0.0066, device='cuda:0')\ntensor([[0.4098],\n[0.4092]], device='cuda:0')\nCurrent Norm : tensor([0.4089, 0.4083], device='cuda:0')\nSteps: 32%|███▏ | 324/1000 [03:26<07:05, 1.59it/s, 
loss=0.0164, lr=0.001]\nSteps: 32%|███▎ | 325/1000 [03:26<07:10, 1.57it/s, loss=0.0164, lr=0.001]\nSteps: 32%|███▎ | 325/1000 [03:26<07:10, 1.57it/s, loss=0.0122, lr=0.001]\nSteps: 33%|███▎ | 326/1000 [03:27<07:05, 1.58it/s, loss=0.0122, lr=0.001]\ntensor(0.0052, device='cuda:0')\ntensor([[0.4104],\n[0.4098]], device='cuda:0')\nCurrent Norm : tensor([0.4093, 0.4088], device='cuda:0')\nSteps: 33%|███▎ | 326/1000 [03:27<07:05, 1.58it/s, loss=0.0254, lr=0.001]\nSteps: 33%|███▎ | 327/1000 [03:27<07:07, 1.57it/s, loss=0.0254, lr=0.001]\nSteps: 33%|███▎ | 327/1000 [03:27<07:07, 1.57it/s, loss=0.00151, lr=0.001]\nSteps: 33%|███▎ | 328/1000 [03:28<07:04, 1.58it/s, loss=0.00151, lr=0.001]\ntensor(0.0049, device='cuda:0')\ntensor([[0.4109],\n[0.4102]], device='cuda:0')\nCurrent Norm : tensor([0.4098, 0.4092], device='cuda:0')\nSteps: 33%|███▎ | 328/1000 [03:28<07:04, 1.58it/s, loss=0.0125, lr=0.001] \nSteps: 33%|███▎ | 329/1000 [03:29<07:08, 1.57it/s, loss=0.0125, lr=0.001]\nSteps: 33%|███▎ | 329/1000 [03:29<07:08, 1.57it/s, loss=0.0043, lr=0.001]\nSteps: 33%|███▎ | 330/1000 [03:29<07:04, 1.58it/s, loss=0.0043, lr=0.001]\ntensor(0.0066, device='cuda:0')\ntensor([[0.4114],\n[0.4108]], device='cuda:0')\nCurrent Norm : tensor([0.4102, 0.4097], device='cuda:0')\nSteps: 33%|███▎ | 330/1000 [03:29<07:04, 1.58it/s, loss=0.0313, lr=0.001]\nSteps: 33%|███▎ | 331/1000 [03:30<07:06, 1.57it/s, loss=0.0313, lr=0.001]\nSteps: 33%|███▎ | 331/1000 [03:30<07:06, 1.57it/s, loss=0.0152, lr=0.001]\nSteps: 33%|███▎ | 332/1000 [03:31<07:02, 1.58it/s, loss=0.0152, lr=0.001]\ntensor(0.0057, device='cuda:0')\ntensor([[0.4118],\n[0.4112]], device='cuda:0')\nCurrent Norm : tensor([0.4106, 0.4101], device='cuda:0')\nSteps: 33%|███▎ | 332/1000 [03:31<07:02, 1.58it/s, loss=0.00426, lr=0.001]\nSteps: 33%|███▎ | 333/1000 [03:31<07:06, 1.56it/s, loss=0.00426, lr=0.001]\nSteps: 33%|███▎ | 333/1000 [03:31<07:06, 1.56it/s, loss=0.00462, lr=0.001]\nSteps: 33%|███▎ | 334/1000 [03:32<07:02, 1.58it/s, 
loss=0.00462, lr=0.001]\ntensor(0.0038, device='cuda:0')\ntensor([[0.4121],\n[0.4115]], device='cuda:0')\nCurrent Norm : tensor([0.4109, 0.4103], device='cuda:0')\nSteps: 33%|███▎ | 334/1000 [03:32<07:02, 1.58it/s, loss=0.00314, lr=0.001]\nSteps: 34%|███▎ | 335/1000 [03:33<07:06, 1.56it/s, loss=0.00314, lr=0.001]\nSteps: 34%|███▎ | 335/1000 [03:33<07:06, 1.56it/s, loss=0.00436, lr=0.001]\nSteps: 34%|███▎ | 336/1000 [03:33<07:02, 1.57it/s, loss=0.00436, lr=0.001]\ntensor(0.0030, device='cuda:0')\ntensor([[0.4123],\n[0.4117]], device='cuda:0')\nCurrent Norm : tensor([0.4111, 0.4105], device='cuda:0')\nSteps: 34%|███▎ | 336/1000 [03:33<07:02, 1.57it/s, loss=0.00463, lr=0.001]\nSteps: 34%|███▎ | 337/1000 [03:34<07:04, 1.56it/s, loss=0.00463, lr=0.001]\nSteps: 34%|███▎ | 337/1000 [03:34<07:04, 1.56it/s, loss=0.00249, lr=0.001]\nSteps: 34%|███▍ | 338/1000 [03:34<06:59, 1.58it/s, loss=0.00249, lr=0.001]\ntensor(0.0032, device='cuda:0')\ntensor([[0.4124],\n[0.4118]], device='cuda:0')\nCurrent Norm : tensor([0.4112, 0.4106], device='cuda:0')\nSteps: 34%|███▍ | 338/1000 [03:34<06:59, 1.58it/s, loss=0.00377, lr=0.001]\nSteps: 34%|███▍ | 339/1000 [03:35<07:02, 1.57it/s, loss=0.00377, lr=0.001]\nSteps: 34%|███▍ | 339/1000 [03:35<07:02, 1.57it/s, loss=0.00659, lr=0.001]\nSteps: 34%|███▍ | 340/1000 [03:36<06:58, 1.58it/s, loss=0.00659, lr=0.001]\ntensor(0.0078, device='cuda:0')\ntensor([[0.4124],\n[0.4118]], device='cuda:0')\nCurrent Norm : tensor([0.4112, 0.4106], device='cuda:0')\nSteps: 34%|███▍ | 340/1000 [03:36<06:58, 1.58it/s, loss=0.0136, lr=0.001] \nSteps: 34%|███▍ | 341/1000 [03:36<07:00, 1.57it/s, loss=0.0136, lr=0.001]\nSteps: 34%|███▍ | 341/1000 [03:36<07:00, 1.57it/s, loss=0.00102, lr=0.001]\nSteps: 34%|███▍ | 342/1000 [03:37<06:56, 1.58it/s, loss=0.00102, lr=0.001]\ntensor(0.0033, device='cuda:0')\ntensor([[0.4124],\n[0.4117]], device='cuda:0')\nCurrent Norm : tensor([0.4112, 0.4106], device='cuda:0')\nSteps: 34%|███▍ | 342/1000 [03:37<06:56, 1.58it/s, loss=0.00414, 
lr=0.001]\nSteps: 34%|███▍ | 343/1000 [03:38<07:02, 1.55it/s, loss=0.00414, lr=0.001]\nSteps: 34%|███▍ | 343/1000 [03:38<07:02, 1.55it/s, loss=0.00138, lr=0.001]\nSteps: 34%|███▍ | 344/1000 [03:38<06:58, 1.57it/s, loss=0.00138, lr=0.001]\ntensor(0.0063, device='cuda:0')\ntensor([[0.4124],\n[0.4116]], device='cuda:0')\nCurrent Norm : tensor([0.4111, 0.4105], device='cuda:0')\nSteps: 34%|███▍ | 344/1000 [03:38<06:58, 1.57it/s, loss=0.0326, lr=0.001] \nSteps: 34%|███▍ | 345/1000 [03:39<07:01, 1.55it/s, loss=0.0326, lr=0.001]\nSteps: 34%|███▍ | 345/1000 [03:39<07:01, 1.55it/s, loss=0.0121, lr=0.001]\nSteps: 35%|███▍ | 346/1000 [03:40<06:58, 1.56it/s, loss=0.0121, lr=0.001]\ntensor(0.0043, device='cuda:0')\ntensor([[0.4123],\n[0.4114]], device='cuda:0')\nCurrent Norm : tensor([0.4111, 0.4103], device='cuda:0')\nSteps: 35%|███▍ | 346/1000 [03:40<06:58, 1.56it/s, loss=0.00339, lr=0.001]\nSteps: 35%|███▍ | 347/1000 [03:40<07:00, 1.55it/s, loss=0.00339, lr=0.001]\nSteps: 35%|███▍ | 347/1000 [03:40<07:00, 1.55it/s, loss=0.000568, lr=0.001]\nSteps: 35%|███▍ | 348/1000 [03:41<06:55, 1.57it/s, loss=0.000568, lr=0.001]\ntensor(0.0070, device='cuda:0')\ntensor([[0.4123],\n[0.4112]], device='cuda:0')\nCurrent Norm : tensor([0.4110, 0.4101], device='cuda:0')\nSteps: 35%|███▍ | 348/1000 [03:41<06:55, 1.57it/s, loss=0.00185, lr=0.001] \nSteps: 35%|███▍ | 349/1000 [03:42<06:58, 1.56it/s, loss=0.00185, lr=0.001]\nSteps: 35%|███▍ | 349/1000 [03:42<06:58, 1.56it/s, loss=0.0526, lr=0.001] \nSteps: 35%|███▌ | 350/1000 [03:42<06:54, 1.57it/s, loss=0.0526, lr=0.001]\ntensor(0.0046, device='cuda:0')\ntensor([[0.4122],\n[0.4111]], device='cuda:0')\nCurrent Norm : tensor([0.4110, 0.4100], device='cuda:0')\nSteps: 35%|███▌ | 350/1000 [03:42<06:54, 1.57it/s, loss=0.00635, lr=0.001]\nSteps: 35%|███▌ | 351/1000 [03:43<06:56, 1.56it/s, loss=0.00635, lr=0.001]\nSteps: 35%|███▌ | 351/1000 [03:43<06:56, 1.56it/s, loss=0.0076, lr=0.001] \nSteps: 35%|███▌ | 352/1000 [03:43<06:52, 1.57it/s, loss=0.0076, 
lr=0.001]\ntensor(0.0046, device='cuda:0')\ntensor([[0.4121],\n[0.4109]], device='cuda:0')\nCurrent Norm : tensor([0.4109, 0.4098], device='cuda:0')\nSteps: 35%|███▌ | 352/1000 [03:43<06:52, 1.57it/s, loss=0.000826, lr=0.001]\nSteps: 35%|███▌ | 353/1000 [03:44<06:54, 1.56it/s, loss=0.000826, lr=0.001]\nSteps: 35%|███▌ | 353/1000 [03:44<06:54, 1.56it/s, loss=0.0911, lr=0.001] \nSteps: 35%|███▌ | 354/1000 [03:45<06:51, 1.57it/s, loss=0.0911, lr=0.001]\ntensor(0.0056, device='cuda:0')\ntensor([[0.4120],\n[0.4108]], device='cuda:0')\nCurrent Norm : tensor([0.4108, 0.4097], device='cuda:0')\nSteps: 35%|███▌ | 354/1000 [03:45<06:51, 1.57it/s, loss=0.0925, lr=0.001]\nSteps: 36%|███▌ | 355/1000 [03:45<06:53, 1.56it/s, loss=0.0925, lr=0.001]\nSteps: 36%|███▌ | 355/1000 [03:45<06:53, 1.56it/s, loss=0.00582, lr=0.001]\nSteps: 36%|███▌ | 356/1000 [03:46<06:49, 1.57it/s, loss=0.00582, lr=0.001]\ntensor(0.0016, device='cuda:0')\ntensor([[0.4119],\n[0.4107]], device='cuda:0')\nCurrent Norm : tensor([0.4107, 0.4096], device='cuda:0')\nSteps: 36%|███▌ | 356/1000 [03:46<06:49, 1.57it/s, loss=0.00329, lr=0.001]\nSteps: 36%|███▌ | 357/1000 [03:47<06:51, 1.56it/s, loss=0.00329, lr=0.001]\nSteps: 36%|███▌ | 357/1000 [03:47<06:51, 1.56it/s, loss=0.00696, lr=0.001]\nSteps: 36%|███▌ | 358/1000 [03:47<06:48, 1.57it/s, loss=0.00696, lr=0.001]\ntensor(0.0007, device='cuda:0')\ntensor([[0.4118],\n[0.4105]], device='cuda:0')\nCurrent Norm : tensor([0.4106, 0.4095], device='cuda:0')\nSteps: 36%|███▌ | 358/1000 [03:47<06:48, 1.57it/s, loss=0.000468, lr=0.001]\nSteps: 36%|███▌ | 359/1000 [03:48<06:50, 1.56it/s, loss=0.000468, lr=0.001]\nSteps: 36%|███▌ | 359/1000 [03:48<06:50, 1.56it/s, loss=0.000551, lr=0.001]\nSteps: 36%|███▌ | 360/1000 [03:49<06:47, 1.57it/s, loss=0.000551, lr=0.001]\ntensor(0.0034, device='cuda:0')\ntensor([[0.4116],\n[0.4104]], device='cuda:0')\nCurrent Norm : tensor([0.4104, 0.4094], device='cuda:0')\nSteps: 36%|███▌ | 360/1000 [03:49<06:47, 1.57it/s, loss=0.00998, lr=0.001] 
\nSteps: 36%|███▌ | 361/1000 [03:49<06:53, 1.55it/s, loss=0.00998, lr=0.001]\nSteps: 36%|███▌ | 361/1000 [03:49<06:53, 1.55it/s, loss=0.00192, lr=0.001]\nSteps: 36%|███▌ | 362/1000 [03:50<06:48, 1.56it/s, loss=0.00192, lr=0.001]\ntensor(0.0053, device='cuda:0')\ntensor([[0.4113],\n[0.4103]], device='cuda:0')\nCurrent Norm : tensor([0.4102, 0.4093], device='cuda:0')\nSteps: 36%|███▌ | 362/1000 [03:50<06:48, 1.56it/s, loss=0.0871, lr=0.001] \nSteps: 36%|███▋ | 363/1000 [03:50<06:52, 1.54it/s, loss=0.0871, lr=0.001]\nSteps: 36%|███▋ | 363/1000 [03:50<06:52, 1.54it/s, loss=0.0123, lr=0.001]\nSteps: 36%|███▋ | 364/1000 [03:51<06:47, 1.56it/s, loss=0.0123, lr=0.001]\ntensor(0.0052, device='cuda:0')\ntensor([[0.4110],\n[0.4102]], device='cuda:0')\nCurrent Norm : tensor([0.4099, 0.4092], device='cuda:0')\nSteps: 36%|███▋ | 364/1000 [03:51<06:47, 1.56it/s, loss=0.00502, lr=0.001]\nSteps: 36%|███▋ | 365/1000 [03:52<06:51, 1.54it/s, loss=0.00502, lr=0.001]\nSteps: 36%|███▋ | 365/1000 [03:52<06:51, 1.54it/s, loss=0.00467, lr=0.001]\nSteps: 37%|███▋ | 366/1000 [03:52<06:45, 1.56it/s, loss=0.00467, lr=0.001]\ntensor(0.0016, device='cuda:0')\ntensor([[0.4107],\n[0.4101]], device='cuda:0')\nCurrent Norm : tensor([0.4096, 0.4091], device='cuda:0')\nSteps: 37%|███▋ | 366/1000 [03:52<06:45, 1.56it/s, loss=0.00221, lr=0.001]\nSteps: 37%|███▋ | 367/1000 [03:53<06:48, 1.55it/s, loss=0.00221, lr=0.001]\nSteps: 37%|███▋ | 367/1000 [03:53<06:48, 1.55it/s, loss=0.0025, lr=0.001] \nSteps: 37%|███▋ | 368/1000 [03:54<06:43, 1.57it/s, loss=0.0025, lr=0.001]\ntensor(0.0069, device='cuda:0')\ntensor([[0.4103],\n[0.4100]], device='cuda:0')\nCurrent Norm : tensor([0.4093, 0.4090], device='cuda:0')\nSteps: 37%|███▋ | 368/1000 [03:54<06:43, 1.57it/s, loss=0.00391, lr=0.001]\nSteps: 37%|███▋ | 369/1000 [03:54<06:47, 1.55it/s, loss=0.00391, lr=0.001]\nSteps: 37%|███▋ | 369/1000 [03:54<06:47, 1.55it/s, loss=0.0193, lr=0.001] \nSteps: 37%|███▋ | 370/1000 [03:55<06:42, 1.56it/s, loss=0.0193, 
lr=0.001]\ntensor(0.0055, device='cuda:0')\ntensor([[0.4100],\n[0.4100]], device='cuda:0')\nCurrent Norm : tensor([0.4090, 0.4090], device='cuda:0')\nSteps: 37%|███▋ | 370/1000 [03:55<06:42, 1.56it/s, loss=0.0164, lr=0.001]\nSteps: 37%|███▋ | 371/1000 [03:56<06:44, 1.56it/s, loss=0.0164, lr=0.001]\nSteps: 37%|███▋ | 371/1000 [03:56<06:44, 1.56it/s, loss=0.00448, lr=0.001]\nSteps: 37%|███▋ | 372/1000 [03:56<06:38, 1.57it/s, loss=0.00448, lr=0.001]\ntensor(0.0049, device='cuda:0')\ntensor([[0.4098],\n[0.4099]], device='cuda:0')\nCurrent Norm : tensor([0.4088, 0.4089], device='cuda:0')\nSteps: 37%|███▋ | 372/1000 [03:56<06:38, 1.57it/s, loss=0.025, lr=0.001] \nSteps: 37%|███▋ | 373/1000 [03:57<06:44, 1.55it/s, loss=0.025, lr=0.001]\nSteps: 37%|███▋ | 373/1000 [03:57<06:44, 1.55it/s, loss=0.0114, lr=0.001]\nSteps: 37%|███▋ | 374/1000 [03:58<06:39, 1.57it/s, loss=0.0114, lr=0.001]\ntensor(0.0017, device='cuda:0')\ntensor([[0.4096],\n[0.4099]], device='cuda:0')\nCurrent Norm : tensor([0.4086, 0.4089], device='cuda:0')\nSteps: 37%|███▋ | 374/1000 [03:58<06:39, 1.57it/s, loss=0.00137, lr=0.001]\nSteps: 38%|███▊ | 375/1000 [03:58<06:41, 1.56it/s, loss=0.00137, lr=0.001]\nSteps: 38%|███▊ | 375/1000 [03:58<06:41, 1.56it/s, loss=0.00574, lr=0.001]\nSteps: 38%|███▊ | 376/1000 [03:59<06:37, 1.57it/s, loss=0.00574, lr=0.001]\ntensor(0.0038, device='cuda:0')\ntensor([[0.4093],\n[0.4099]], device='cuda:0')\nCurrent Norm : tensor([0.4084, 0.4089], device='cuda:0')\nSteps: 38%|███▊ | 376/1000 [03:59<06:37, 1.57it/s, loss=0.00587, lr=0.001]\nSteps: 38%|███▊ | 377/1000 [03:59<06:39, 1.56it/s, loss=0.00587, lr=0.001]\nSteps: 38%|███▊ | 377/1000 [03:59<06:39, 1.56it/s, loss=0.000478, lr=0.001]\nSteps: 38%|███▊ | 378/1000 [04:00<06:35, 1.57it/s, loss=0.000478, lr=0.001]\ntensor(0.0050, device='cuda:0')\ntensor([[0.4090],\n[0.4099]], device='cuda:0')\nCurrent Norm : tensor([0.4081, 0.4089], device='cuda:0')\nSteps: 38%|███▊ | 378/1000 [04:00<06:35, 1.57it/s, loss=0.00445, lr=0.001] 
\nSteps: 38%|███▊ | 379/1000 [04:01<06:37, 1.56it/s, loss=0.00445, lr=0.001]\nSteps: 38%|███▊ | 379/1000 [04:01<06:37, 1.56it/s, loss=0.125, lr=0.001] \nSteps: 38%|███▊ | 380/1000 [04:01<06:34, 1.57it/s, loss=0.125, lr=0.001]\ntensor(0.0014, device='cuda:0')\ntensor([[0.4087],\n[0.4098]], device='cuda:0')\nCurrent Norm : tensor([0.4078, 0.4088], device='cuda:0')\nSteps: 38%|███▊ | 380/1000 [04:01<06:34, 1.57it/s, loss=0.00104, lr=0.001]\nSteps: 38%|███▊ | 381/1000 [04:02<06:36, 1.56it/s, loss=0.00104, lr=0.001]\nSteps: 38%|███▊ | 381/1000 [04:02<06:36, 1.56it/s, loss=0.0174, lr=0.001] \nSteps: 38%|███▊ | 382/1000 [04:03<06:35, 1.56it/s, loss=0.0174, lr=0.001]\ntensor(0.0005, device='cuda:0')\ntensor([[0.4084],\n[0.4097]], device='cuda:0')\nCurrent Norm : tensor([0.4076, 0.4088], device='cuda:0')\nSteps: 38%|███▊ | 382/1000 [04:03<06:35, 1.56it/s, loss=0.000611, lr=0.001]\nSteps: 38%|███▊ | 383/1000 [04:03<06:38, 1.55it/s, loss=0.000611, lr=0.001]\nSteps: 38%|███▊ | 383/1000 [04:03<06:38, 1.55it/s, loss=0.000669, lr=0.001]\nSteps: 38%|███▊ | 384/1000 [04:04<06:35, 1.56it/s, loss=0.000669, lr=0.001]\ntensor(0.0072, device='cuda:0')\ntensor([[0.4081],\n[0.4096]], device='cuda:0')\nCurrent Norm : tensor([0.4073, 0.4087], device='cuda:0')\nSteps: 38%|███▊ | 384/1000 [04:04<06:35, 1.56it/s, loss=0.00656, lr=0.001] \nSteps: 38%|███▊ | 385/1000 [04:05<06:36, 1.55it/s, loss=0.00656, lr=0.001]\nSteps: 38%|███▊ | 385/1000 [04:05<06:36, 1.55it/s, loss=0.0152, lr=0.001] \nSteps: 39%|███▊ | 386/1000 [04:05<06:32, 1.57it/s, loss=0.0152, lr=0.001]\ntensor(0.0021, device='cuda:0')\ntensor([[0.4078],\n[0.4095]], device='cuda:0')\nCurrent Norm : tensor([0.4070, 0.4086], device='cuda:0')\nSteps: 39%|███▊ | 386/1000 [04:05<06:32, 1.57it/s, loss=0.000874, lr=0.001]\nSteps: 39%|███▊ | 387/1000 [04:06<06:34, 1.56it/s, loss=0.000874, lr=0.001]\nSteps: 39%|███▊ | 387/1000 [04:06<06:34, 1.56it/s, loss=0.0114, lr=0.001] \nSteps: 39%|███▉ | 388/1000 [04:06<06:29, 1.57it/s, loss=0.0114, 
lr=0.001]\ntensor(0.0022, device='cuda:0')\ntensor([[0.4075],\n[0.4094]], device='cuda:0')\nCurrent Norm : tensor([0.4068, 0.4084], device='cuda:0')\nSteps: 39%|███▉ | 388/1000 [04:06<06:29, 1.57it/s, loss=0.00224, lr=0.001]\nSteps: 39%|███▉ | 389/1000 [04:07<06:33, 1.55it/s, loss=0.00224, lr=0.001]\nSteps: 39%|███▉ | 389/1000 [04:07<06:33, 1.55it/s, loss=0.00672, lr=0.001]\nSteps: 39%|███▉ | 390/1000 [04:08<06:30, 1.56it/s, loss=0.00672, lr=0.001]\ntensor(0.0092, device='cuda:0')\ntensor([[0.4073],\n[0.4092]], device='cuda:0')\nCurrent Norm : tensor([0.4066, 0.4083], device='cuda:0')\nSteps: 39%|███▉ | 390/1000 [04:08<06:30, 1.56it/s, loss=0.06, lr=0.001] \nSteps: 39%|███▉ | 391/1000 [04:08<06:33, 1.55it/s, loss=0.06, lr=0.001]\nSteps: 39%|███▉ | 391/1000 [04:08<06:33, 1.55it/s, loss=0.0165, lr=0.001]\nSteps: 39%|███▉ | 392/1000 [04:09<06:29, 1.56it/s, loss=0.0165, lr=0.001]\ntensor(0.0074, device='cuda:0')\ntensor([[0.4071],\n[0.4092]], device='cuda:0')\nCurrent Norm : tensor([0.4064, 0.4083], device='cuda:0')\nSteps: 39%|███▉ | 392/1000 [04:09<06:29, 1.56it/s, loss=0.00418, lr=0.001]\nSteps: 39%|███▉ | 393/1000 [04:10<06:32, 1.55it/s, loss=0.00418, lr=0.001]\nSteps: 39%|███▉ | 393/1000 [04:10<06:32, 1.55it/s, loss=0.0117, lr=0.001] \nSteps: 39%|███▉ | 394/1000 [04:10<06:27, 1.56it/s, loss=0.0117, lr=0.001]\ntensor(0.0035, device='cuda:0')\ntensor([[0.4070],\n [0.4091]], device='cuda:0')\nCurrent Norm : tensor([0.4063, 0.4082], device='cuda:0')\nSteps: 39%|███▉ | 394/1000 [04:10<06:27, 1.56it/s, loss=0.0147, lr=0.001]\nSteps: 40%|███▉ | 395/1000 [04:11<06:31, 1.55it/s, loss=0.0147, lr=0.001]\nSteps: 40%|███▉ | 395/1000 [04:11<06:31, 1.55it/s, loss=0.0108, lr=0.001]\nSteps: 40%|███▉ | 396/1000 [04:12<06:26, 1.56it/s, loss=0.0108, lr=0.001]\ntensor(0.0041, device='cuda:0')\ntensor([[0.4069],\n[0.4091]], device='cuda:0')\nCurrent Norm : tensor([0.4062, 0.4082], device='cuda:0')\nSteps: 40%|███▉ | 396/1000 [04:12<06:26, 1.56it/s, loss=0.0212, lr=0.001]\nSteps: 
40%|███▉ | 397/1000 [04:12<06:28, 1.55it/s, loss=0.0212, lr=0.001]\nSteps: 40%|███▉ | 397/1000 [04:12<06:28, 1.55it/s, loss=0.000443, lr=0.001]\nSteps: 40%|███▉ | 398/1000 [04:13<06:24, 1.57it/s, loss=0.000443, lr=0.001]\ntensor(0.0055, device='cuda:0')\ntensor([[0.4068],\n[0.4091]], device='cuda:0')\nCurrent Norm : tensor([0.4061, 0.4082], device='cuda:0')\nSteps: 40%|███▉ | 398/1000 [04:13<06:24, 1.57it/s, loss=0.0275, lr=0.001] \nSteps: 40%|███▉ | 399/1000 [04:14<06:26, 1.55it/s, loss=0.0275, lr=0.001]\nSteps: 40%|███▉ | 399/1000 [04:14<06:26, 1.55it/s, loss=0.00281, lr=0.001]\nSteps: 40%|████ | 400/1000 [04:14<06:22, 1.57it/s, loss=0.00281, lr=0.001]\nCurrent Learned Embeddings for <s1>:, id 49408 tensor([ 0.0111, -0.0023, -0.0138, -0.0236], device='cuda:0',\ngrad_fn=<SliceBackward0>)\nCurrent Learned Embeddings for <s2>:, id 49409 tensor([ 0.0041, -0.0082, -0.0049, 0.0073], device='cuda:0',\ngrad_fn=<SliceBackward0>)\nSaving weights to checkpoints/step_inv_400.safetensors\ntensor(0.0030, device='cuda:0')\ntensor([[0.4067],\n[0.4090]], device='cuda:0')\nCurrent Norm : tensor([0.4060, 0.4081], device='cuda:0')\nSteps: 40%|████ | 400/1000 [04:14<06:22, 1.57it/s, loss=0.0186, lr=0.001] \nSteps: 40%|████ | 401/1000 [04:15<06:25, 1.55it/s, loss=0.0186, lr=0.001]\nSteps: 40%|████ | 401/1000 [04:15<06:25, 1.55it/s, loss=0.000768, lr=0.001]\nSteps: 40%|████ | 402/1000 [04:15<06:20, 1.57it/s, loss=0.000768, lr=0.001]\ntensor(0.0072, device='cuda:0')\ntensor([[0.4066],\n[0.4088]], device='cuda:0')\nCurrent Norm : tensor([0.4059, 0.4079], device='cuda:0')\nSteps: 40%|████ | 402/1000 [04:15<06:20, 1.57it/s, loss=0.024, lr=0.001] \nSteps: 40%|████ | 403/1000 [04:16<06:23, 1.56it/s, loss=0.024, lr=0.001]\nSteps: 40%|████ | 403/1000 [04:16<06:23, 1.56it/s, loss=0.00256, lr=0.001]\nSteps: 40%|████ | 404/1000 [04:17<06:19, 1.57it/s, loss=0.00256, lr=0.001]\ntensor(0.0071, device='cuda:0')\ntensor([[0.4065],\n [0.4087]], device='cuda:0')\nCurrent Norm : tensor([0.4058, 0.4078], 
device='cuda:0')\nSteps: 40%|████ | 404/1000 [04:17<06:19, 1.57it/s, loss=0.0389, lr=0.001] \nSteps: 40%|████ | 405/1000 [04:17<06:22, 1.56it/s, loss=0.0389, lr=0.001]\nSteps: 40%|████ | 405/1000 [04:17<06:22, 1.56it/s, loss=0.00633, lr=0.001]\nSteps: 41%|████ | 406/1000 [04:18<06:19, 1.57it/s, loss=0.00633, lr=0.001]\ntensor(0.0077, device='cuda:0')\ntensor([[0.4064],\n[0.4085]], device='cuda:0')\nCurrent Norm : tensor([0.4058, 0.4077], device='cuda:0')\nSteps: 41%|████ | 406/1000 [04:18<06:19, 1.57it/s, loss=0.0101, lr=0.001] \nSteps: 41%|████ | 407/1000 [04:19<06:21, 1.55it/s, loss=0.0101, lr=0.001]\nSteps: 41%|████ | 407/1000 [04:19<06:21, 1.55it/s, loss=0.0262, lr=0.001]\nSteps: 41%|████ | 408/1000 [04:19<06:18, 1.56it/s, loss=0.0262, lr=0.001]\ntensor(0.0025, device='cuda:0')\ntensor([[0.4063],\n[0.4084]], device='cuda:0')\nCurrent Norm : tensor([0.4057, 0.4076], device='cuda:0')\nSteps: 41%|████ | 408/1000 [04:19<06:18, 1.56it/s, loss=0.0159, lr=0.001]\nSteps: 41%|████ | 409/1000 [04:20<06:19, 1.56it/s, loss=0.0159, lr=0.001]\nSteps: 41%|████ | 409/1000 [04:20<06:19, 1.56it/s, loss=0.00111, lr=0.001]\nSteps: 41%|████ | 410/1000 [04:21<06:16, 1.57it/s, loss=0.00111, lr=0.001]\ntensor(0.0054, device='cuda:0')\ntensor([[0.4063],\n[0.4083]], device='cuda:0')\nCurrent Norm : tensor([0.4057, 0.4075], device='cuda:0')\nSteps: 41%|████ | 410/1000 [04:21<06:16, 1.57it/s, loss=0.00813, lr=0.001]\nSteps: 41%|████ | 411/1000 [04:21<06:18, 1.55it/s, loss=0.00813, lr=0.001]\nSteps: 41%|████ | 411/1000 [04:21<06:18, 1.55it/s, loss=0.00135, lr=0.001]\nSteps: 41%|████ | 412/1000 [04:22<06:15, 1.57it/s, loss=0.00135, lr=0.001]\ntensor(0.0019, device='cuda:0')\ntensor([[0.4063],\n[0.4082]], device='cuda:0')\nCurrent Norm : tensor([0.4057, 0.4074], device='cuda:0')\nSteps: 41%|████ | 412/1000 [04:22<06:15, 1.57it/s, loss=0.00114, lr=0.001]\nSteps: 41%|████▏ | 413/1000 [04:23<06:21, 1.54it/s, loss=0.00114, lr=0.001]\nSteps: 41%|████▏ | 413/1000 [04:23<06:21, 1.54it/s, 
loss=0.00502, lr=0.001]\nSteps: 41%|████▏ | 414/1000 [04:23<06:15, 1.56it/s, loss=0.00502, lr=0.001]\ntensor(0.0065, device='cuda:0')\ntensor([[0.4062],\n[0.4081]], device='cuda:0')\nCurrent Norm : tensor([0.4056, 0.4073], device='cuda:0')\nSteps: 41%|████▏ | 414/1000 [04:23<06:15, 1.56it/s, loss=0.00983, lr=0.001]\nSteps: 42%|████▏ | 415/1000 [04:24<06:17, 1.55it/s, loss=0.00983, lr=0.001]\nSteps: 42%|████▏ | 415/1000 [04:24<06:17, 1.55it/s, loss=0.0534, lr=0.001] \nSteps: 42%|████▏ | 416/1000 [04:24<06:13, 1.56it/s, loss=0.0534, lr=0.001]\ntensor(0.0046, device='cuda:0')\ntensor([[0.4061],\n[0.4081]], device='cuda:0')\nCurrent Norm : tensor([0.4055, 0.4073], device='cuda:0')\nSteps: 42%|████▏ | 416/1000 [04:24<06:13, 1.56it/s, loss=0.0325, lr=0.001]\nSteps: 42%|████▏ | 417/1000 [04:25<06:15, 1.55it/s, loss=0.0325, lr=0.001]\nSteps: 42%|████▏ | 417/1000 [04:25<06:15, 1.55it/s, loss=0.000167, lr=0.001]\nSteps: 42%|████▏ | 418/1000 [04:26<06:10, 1.57it/s, loss=0.000167, lr=0.001]\ntensor(0.0058, device='cuda:0')\ntensor([[0.4061],\n[0.4081]], device='cuda:0')\nCurrent Norm : tensor([0.4055, 0.4073], device='cuda:0')\nSteps: 42%|████▏ | 418/1000 [04:26<06:10, 1.57it/s, loss=0.0166, lr=0.001] \nSteps: 42%|████▏ | 419/1000 [04:26<06:12, 1.56it/s, loss=0.0166, lr=0.001]\nSteps: 42%|████▏ | 419/1000 [04:26<06:12, 1.56it/s, loss=0.000417, lr=0.001]\nSteps: 42%|████▏ | 420/1000 [04:27<06:08, 1.57it/s, loss=0.000417, lr=0.001]\ntensor(0.0050, device='cuda:0')\ntensor([[0.4060],\n[0.4082]], device='cuda:0')\nCurrent Norm : tensor([0.4054, 0.4074], device='cuda:0')\nSteps: 42%|████▏ | 420/1000 [04:27<06:08, 1.57it/s, loss=0.00439, lr=0.001] \nSteps: 42%|████▏ | 421/1000 [04:28<06:13, 1.55it/s, loss=0.00439, lr=0.001]\nSteps: 42%|████▏ | 421/1000 [04:28<06:13, 1.55it/s, loss=0.114, lr=0.001] \nSteps: 42%|████▏ | 422/1000 [04:28<06:10, 1.56it/s, loss=0.114, lr=0.001]\ntensor(0.0061, device='cuda:0')\ntensor([[0.4060],\n[0.4083]], device='cuda:0')\nCurrent Norm : tensor([0.4054, 
0.4075], device='cuda:0')\nSteps: 42%|████▏ | 422/1000 [04:28<06:10, 1.56it/s, loss=0.0405, lr=0.001]\nSteps: 42%|████▏ | 423/1000 [04:29<06:11, 1.55it/s, loss=0.0405, lr=0.001]\nSteps: 42%|████▏ | 423/1000 [04:29<06:11, 1.55it/s, loss=0.00279, lr=0.001]\nSteps: 42%|████▏ | 424/1000 [04:30<06:07, 1.57it/s, loss=0.00279, lr=0.001]\ntensor(0.0038, device='cuda:0')\ntensor([[0.4060],\n[0.4085]], device='cuda:0')\nCurrent Norm : tensor([0.4054, 0.4076], device='cuda:0')\nSteps: 42%|████▏ | 424/1000 [04:30<06:07, 1.57it/s, loss=0.00228, lr=0.001]\nSteps: 42%|████▎ | 425/1000 [04:30<06:08, 1.56it/s, loss=0.00228, lr=0.001]\nSteps: 42%|████▎ | 425/1000 [04:30<06:08, 1.56it/s, loss=0.00422, lr=0.001]\nSteps: 43%|████▎ | 426/1000 [04:31<06:05, 1.57it/s, loss=0.00422, lr=0.001]\ntensor(0.0081, device='cuda:0')\ntensor([[0.4061],\n[0.4087]], device='cuda:0')\nCurrent Norm : tensor([0.4055, 0.4078], device='cuda:0')\nSteps: 43%|████▎ | 426/1000 [04:31<06:05, 1.57it/s, loss=0.00899, lr=0.001]\nSteps: 43%|████▎ | 427/1000 [04:31<06:06, 1.56it/s, loss=0.00899, lr=0.001]\nSteps: 43%|████▎ | 427/1000 [04:31<06:06, 1.56it/s, loss=0.00143, lr=0.001]\nSteps: 43%|████▎ | 428/1000 [04:32<06:03, 1.57it/s, loss=0.00143, lr=0.001]\ntensor(0.0034, device='cuda:0')\ntensor([[0.4063],\n[0.4088]], device='cuda:0')\nCurrent Norm : tensor([0.4056, 0.4079], device='cuda:0')\nSteps: 43%|████▎ | 428/1000 [04:32<06:03, 1.57it/s, loss=0.00524, lr=0.001]\nSteps: 43%|████▎ | 429/1000 [04:33<06:05, 1.56it/s, loss=0.00524, lr=0.001]\nSteps: 43%|████▎ | 429/1000 [04:33<06:05, 1.56it/s, loss=0.00667, lr=0.001]\nSteps: 43%|████▎ | 430/1000 [04:33<06:02, 1.57it/s, loss=0.00667, lr=0.001]\ntensor(0.0061, device='cuda:0')\ntensor([[0.4064],\n[0.4089]], device='cuda:0')\nCurrent Norm : tensor([0.4057, 0.4080], device='cuda:0')\nSteps: 43%|████▎ | 430/1000 [04:33<06:02, 1.57it/s, loss=0.0314, lr=0.001] \nSteps: 43%|████▎ | 431/1000 [04:34<06:05, 1.56it/s, loss=0.0314, lr=0.001]\nSteps: 43%|████▎ | 431/1000 
[04:34<06:05, 1.56it/s, loss=0.000126, lr=0.001]\nSteps: 43%|████▎ | 432/1000 [04:35<06:01, 1.57it/s, loss=0.000126, lr=0.001]\ntensor(0.0046, device='cuda:0')\ntensor([[0.4065],\n[0.4090]], device='cuda:0')\nCurrent Norm : tensor([0.4059, 0.4081], device='cuda:0')\nSteps: 43%|████▎ | 432/1000 [04:35<06:01, 1.57it/s, loss=0.00386, lr=0.001] \nSteps: 43%|████▎ | 433/1000 [04:35<06:04, 1.56it/s, loss=0.00386, lr=0.001]\nSteps: 43%|████▎ | 433/1000 [04:35<06:04, 1.56it/s, loss=0.00157, lr=0.001]\nSteps: 43%|████▎ | 434/1000 [04:36<05:59, 1.57it/s, loss=0.00157, lr=0.001]\ntensor(0.0071, device='cuda:0')\ntensor([[0.4067],\n[0.4091]], device='cuda:0')\nCurrent Norm : tensor([0.4060, 0.4082], device='cuda:0')\nSteps: 43%|████▎ | 434/1000 [04:36<05:59, 1.57it/s, loss=0.0367, lr=0.001] \nSteps: 44%|████▎ | 435/1000 [04:37<06:03, 1.55it/s, loss=0.0367, lr=0.001]\nSteps: 44%|████▎ | 435/1000 [04:37<06:03, 1.55it/s, loss=0.000496, lr=0.001]\nSteps: 44%|████▎ | 436/1000 [04:37<06:01, 1.56it/s, loss=0.000496, lr=0.001]\ntensor(0.0045, device='cuda:0')\ntensor([[0.4068],\n[0.4093]], device='cuda:0')\nCurrent Norm : tensor([0.4061, 0.4083], device='cuda:0')\nSteps: 44%|████▎ | 436/1000 [04:37<06:01, 1.56it/s, loss=0.00519, lr=0.001] \nSteps: 44%|████▎ | 437/1000 [04:38<06:03, 1.55it/s, loss=0.00519, lr=0.001]\nSteps: 44%|████▎ | 437/1000 [04:38<06:03, 1.55it/s, loss=0.000983, lr=0.001]\nSteps: 44%|████▍ | 438/1000 [04:39<06:00, 1.56it/s, loss=0.000983, lr=0.001]\ntensor(0.0071, device='cuda:0')\ntensor([[0.4069],\n[0.4094]], device='cuda:0')\nCurrent Norm : tensor([0.4062, 0.4084], device='cuda:0')\nSteps: 44%|████▍ | 438/1000 [04:39<06:00, 1.56it/s, loss=0.0276, lr=0.001] \nSteps: 44%|████▍ | 439/1000 [04:39<06:01, 1.55it/s, loss=0.0276, lr=0.001]\nSteps: 44%|████▍ | 439/1000 [04:39<06:01, 1.55it/s, loss=0.017, lr=0.001] \nSteps: 44%|████▍ | 440/1000 [04:40<05:57, 1.57it/s, loss=0.017, lr=0.001]\ntensor(0.0067, device='cuda:0')\ntensor([[0.4071],\n[0.4094]], 
device='cuda:0')\nCurrent Norm : tensor([0.4064, 0.4085], device='cuda:0')\nSteps: 44%|████▍ | 440/1000 [04:40<05:57, 1.57it/s, loss=0.0138, lr=0.001]\nSteps: 44%|████▍ | 441/1000 [04:40<05:58, 1.56it/s, loss=0.0138, lr=0.001]\nSteps: 44%|████▍ | 441/1000 [04:40<05:58, 1.56it/s, loss=0.083, lr=0.001] \nSteps: 44%|████▍ | 442/1000 [04:41<05:55, 1.57it/s, loss=0.083, lr=0.001]\ntensor(0.0102, device='cuda:0')\ntensor([[0.4072],\n[0.4095]], device='cuda:0')\nCurrent Norm : tensor([0.4065, 0.4086], device='cuda:0')\nSteps: 44%|████▍ | 442/1000 [04:41<05:55, 1.57it/s, loss=0.00452, lr=0.001]\nSteps: 44%|████▍ | 443/1000 [04:42<05:56, 1.56it/s, loss=0.00452, lr=0.001]\nSteps: 44%|████▍ | 443/1000 [04:42<05:56, 1.56it/s, loss=0.00397, lr=0.001]\nSteps: 44%|████▍ | 444/1000 [04:42<05:53, 1.57it/s, loss=0.00397, lr=0.001]\ntensor(0.0063, device='cuda:0')\ntensor([[0.4074],\n[0.4097]], device='cuda:0')\nCurrent Norm : tensor([0.4067, 0.4087], device='cuda:0')\nSteps: 44%|████▍ | 444/1000 [04:42<05:53, 1.57it/s, loss=0.0393, lr=0.001] \nSteps: 44%|████▍ | 445/1000 [04:43<05:53, 1.57it/s, loss=0.0393, lr=0.001]\nSteps: 44%|████▍ | 445/1000 [04:43<05:53, 1.57it/s, loss=0.00046, lr=0.001]\nSteps: 45%|████▍ | 446/1000 [04:44<05:50, 1.58it/s, loss=0.00046, lr=0.001]\ntensor(0.0082, device='cuda:0')\ntensor([[0.4077],\n[0.4098]], device='cuda:0')\nCurrent Norm : tensor([0.4069, 0.4088], device='cuda:0')\nSteps: 45%|████▍ | 446/1000 [04:44<05:50, 1.58it/s, loss=0.0732, lr=0.001] \nSteps: 45%|████▍ | 447/1000 [04:44<05:52, 1.57it/s, loss=0.0732, lr=0.001]\nSteps: 45%|████▍ | 447/1000 [04:44<05:52, 1.57it/s, loss=0.022, lr=0.001] \nSteps: 45%|████▍ | 448/1000 [04:45<05:49, 1.58it/s, loss=0.022, lr=0.001]\ntensor(0.0074, device='cuda:0')\ntensor([[0.4079],\n[0.4099]], device='cuda:0')\nCurrent Norm : tensor([0.4071, 0.4089], device='cuda:0')\nSteps: 45%|████▍ | 448/1000 [04:45<05:49, 1.58it/s, loss=0.0017, lr=0.001]\nSteps: 45%|████▍ | 449/1000 [04:46<05:51, 1.57it/s, loss=0.0017, 
lr=0.001]\nSteps: 45%|████▍ | 449/1000 [04:46<05:51, 1.57it/s, loss=0.0157, lr=0.001]\nSteps: 45%|████▌ | 450/1000 [04:46<05:48, 1.58it/s, loss=0.0157, lr=0.001]\ntensor(0.0044, device='cuda:0')\ntensor([[0.4081],\n[0.4100]], device='cuda:0')\nCurrent Norm : tensor([0.4073, 0.4090], device='cuda:0')\nSteps: 45%|████▌ | 450/1000 [04:46<05:48, 1.58it/s, loss=0.00657, lr=0.001]\nSteps: 45%|████▌ | 451/1000 [04:47<05:51, 1.56it/s, loss=0.00657, lr=0.001]\nSteps: 45%|████▌ | 451/1000 [04:47<05:51, 1.56it/s, loss=0.00209, lr=0.001]\nSteps: 45%|████▌ | 452/1000 [04:47<05:48, 1.57it/s, loss=0.00209, lr=0.001]\ntensor(0.0032, device='cuda:0')\ntensor([[0.4083],\n[0.4101]], device='cuda:0')\nCurrent Norm : tensor([0.4074, 0.4091], device='cuda:0')\nSteps: 45%|████▌ | 452/1000 [04:47<05:48, 1.57it/s, loss=0.000439, lr=0.001]\nSteps: 45%|████▌ | 453/1000 [04:48<05:48, 1.57it/s, loss=0.000439, lr=0.001]\nSteps: 45%|████▌ | 453/1000 [04:48<05:48, 1.57it/s, loss=0.00718, lr=0.001] \nSteps: 45%|████▌ | 454/1000 [04:49<05:46, 1.58it/s, loss=0.00718, lr=0.001]\ntensor(0.0054, device='cuda:0')\ntensor([[0.4084],\n[0.4102]], device='cuda:0')\nCurrent Norm : tensor([0.4075, 0.4092], device='cuda:0')\nSteps: 45%|████▌ | 454/1000 [04:49<05:46, 1.58it/s, loss=0.0172, lr=0.001] \nSteps: 46%|████▌ | 455/1000 [04:49<05:55, 1.53it/s, loss=0.0172, lr=0.001]\nSteps: 46%|████▌ | 455/1000 [04:49<05:55, 1.53it/s, loss=0.0327, lr=0.001]\nSteps: 46%|████▌ | 456/1000 [04:50<05:51, 1.55it/s, loss=0.0327, lr=0.001]\ntensor(0.0031, device='cuda:0')\ntensor([[0.4085],\n[0.4102]], device='cuda:0')\nCurrent Norm : tensor([0.4076, 0.4092], device='cuda:0')\nSteps: 46%|████▌ | 456/1000 [04:50<05:51, 1.55it/s, loss=0.00666, lr=0.001]\nSteps: 46%|████▌ | 457/1000 [04:51<05:54, 1.53it/s, loss=0.00666, lr=0.001]\nSteps: 46%|████▌ | 457/1000 [04:51<05:54, 1.53it/s, loss=0.00171, lr=0.001]\nSteps: 46%|████▌ | 458/1000 [04:51<05:51, 1.54it/s, loss=0.00171, lr=0.001]\ntensor(0.0047, 
device='cuda:0')\ntensor([[0.4085],\n [0.4102]], device='cuda:0')\nCurrent Norm : tensor([0.4076, 0.4092], device='cuda:0')\nSteps: 46%|████▌ | 458/1000 [04:51<05:51, 1.54it/s, loss=0.00249, lr=0.001]\nSteps: 46%|████▌ | 459/1000 [04:52<05:54, 1.52it/s, loss=0.00249, lr=0.001]\nSteps: 46%|████▌ | 459/1000 [04:52<05:54, 1.52it/s, loss=0.0129, lr=0.001] \nSteps: 46%|████▌ | 460/1000 [04:53<05:50, 1.54it/s, loss=0.0129, lr=0.001]\ntensor(0.0078, device='cuda:0')\ntensor([[0.4085],\n[0.4102]], device='cuda:0')\nCurrent Norm : tensor([0.4077, 0.4092], device='cuda:0')\nSteps: 46%|████▌ | 460/1000 [04:53<05:50, 1.54it/s, loss=0.0134, lr=0.001]\nSteps: 46%|████▌ | 461/1000 [04:53<05:50, 1.54it/s, loss=0.0134, lr=0.001]\nSteps: 46%|████▌ | 461/1000 [04:53<05:50, 1.54it/s, loss=0.0508, lr=0.001]\nSteps: 46%|████▌ | 462/1000 [04:54<05:45, 1.56it/s, loss=0.0508, lr=0.001]\ntensor(0.0030, device='cuda:0')\ntensor([[0.4085],\n[0.4102]], device='cuda:0')\nCurrent Norm : tensor([0.4077, 0.4092], device='cuda:0')\nSteps: 46%|████▌ | 462/1000 [04:54<05:45, 1.56it/s, loss=0.00419, lr=0.001]\nSteps: 46%|████▋ | 463/1000 [04:55<05:48, 1.54it/s, loss=0.00419, lr=0.001]\nSteps: 46%|████▋ | 463/1000 [04:55<05:48, 1.54it/s, loss=0.00566, lr=0.001]\nSteps: 46%|████▋ | 464/1000 [04:55<05:44, 1.56it/s, loss=0.00566, lr=0.001]\ntensor(0.0102, device='cuda:0')\ntensor([[0.4086],\n[0.4102]], device='cuda:0')\nCurrent Norm : tensor([0.4077, 0.4091], device='cuda:0')\nSteps: 46%|████▋ | 464/1000 [04:55<05:44, 1.56it/s, loss=0.00386, lr=0.001]\nSteps: 46%|████▋ | 465/1000 [04:56<05:46, 1.54it/s, loss=0.00386, lr=0.001]\nSteps: 46%|████▋ | 465/1000 [04:56<05:46, 1.54it/s, loss=0.00318, lr=0.001]\nSteps: 47%|████▋ | 466/1000 [04:57<05:42, 1.56it/s, loss=0.00318, lr=0.001]\ntensor(0.0017, device='cuda:0')\ntensor([[0.4087],\n[0.4101]], device='cuda:0')\nCurrent Norm : tensor([0.4078, 0.4091], device='cuda:0')\nSteps: 47%|████▋ | 466/1000 [04:57<05:42, 1.56it/s, loss=0.00886, lr=0.001]\nSteps: 
47%|████▋ | 467/1000 [04:57<05:42, 1.55it/s, loss=0.00886, lr=0.001]\nSteps: 47%|████▋ | 467/1000 [04:57<05:42, 1.55it/s, loss=0.000229, lr=0.001]\nSteps: 47%|████▋ | 468/1000 [04:58<05:38, 1.57it/s, loss=0.000229, lr=0.001]\ntensor(0.0087, device='cuda:0')\ntensor([[0.4088],\n[0.4101]], device='cuda:0')\nCurrent Norm : tensor([0.4079, 0.4091], device='cuda:0')\nSteps: 47%|████▋ | 468/1000 [04:58<05:38, 1.57it/s, loss=0.0432, lr=0.001] \nSteps: 47%|████▋ | 469/1000 [04:58<05:39, 1.56it/s, loss=0.0432, lr=0.001]\nSteps: 47%|████▋ | 469/1000 [04:58<05:39, 1.56it/s, loss=0.00354, lr=0.001]\nSteps: 47%|████▋ | 470/1000 [04:59<05:36, 1.58it/s, loss=0.00354, lr=0.001]\ntensor(0.0054, device='cuda:0')\ntensor([[0.4090],\n[0.4102]], device='cuda:0')\nCurrent Norm : tensor([0.4081, 0.4092], device='cuda:0')\nSteps: 47%|████▋ | 470/1000 [04:59<05:36, 1.58it/s, loss=0.00289, lr=0.001]\nSteps: 47%|████▋ | 471/1000 [05:00<05:38, 1.56it/s, loss=0.00289, lr=0.001]\nSteps: 47%|████▋ | 471/1000 [05:00<05:38, 1.56it/s, loss=0.077, lr=0.001] \nSteps: 47%|████▋ | 472/1000 [05:00<05:34, 1.58it/s, loss=0.077, lr=0.001]\ntensor(0.0020, device='cuda:0')\ntensor([[0.4092],\n[0.4102]], device='cuda:0')\nCurrent Norm : tensor([0.4083, 0.4092], device='cuda:0')\nSteps: 47%|████▋ | 472/1000 [05:00<05:34, 1.58it/s, loss=0.00646, lr=0.001]\nSteps: 47%|████▋ | 473/1000 [05:01<05:35, 1.57it/s, loss=0.00646, lr=0.001]\nSteps: 47%|████▋ | 473/1000 [05:01<05:35, 1.57it/s, loss=0.00128, lr=0.001]\nSteps: 47%|████▋ | 474/1000 [05:02<05:32, 1.58it/s, loss=0.00128, lr=0.001]\ntensor(0.0033, device='cuda:0')\ntensor([[0.4094],\n[0.4103]], device='cuda:0')\nCurrent Norm : tensor([0.4084, 0.4092], device='cuda:0')\nSteps: 47%|████▋ | 474/1000 [05:02<05:32, 1.58it/s, loss=0.0353, lr=0.001] \nSteps: 48%|████▊ | 475/1000 [05:02<05:36, 1.56it/s, loss=0.0353, lr=0.001]\nSteps: 48%|████▊ | 475/1000 [05:02<05:36, 1.56it/s, loss=0.0144, lr=0.001]\nSteps: 48%|████▊ | 476/1000 [05:03<05:33, 1.57it/s, loss=0.0144, 
lr=0.001]\ntensor(0.0074, device='cuda:0')\ntensor([[0.4096],\n[0.4103]], device='cuda:0')\nCurrent Norm : tensor([0.4086, 0.4092], device='cuda:0')\nSteps: 48%|████▊ | 476/1000 [05:03<05:33, 1.57it/s, loss=0.0618, lr=0.001]\nSteps: 48%|████▊ | 477/1000 [05:04<05:35, 1.56it/s, loss=0.0618, lr=0.001]\nSteps: 48%|████▊ | 477/1000 [05:04<05:35, 1.56it/s, loss=0.00403, lr=0.001]\nSteps: 48%|████▊ | 478/1000 [05:04<05:32, 1.57it/s, loss=0.00403, lr=0.001]\ntensor(0.0045, device='cuda:0')\ntensor([[0.4097],\n[0.4102]], device='cuda:0')\nCurrent Norm : tensor([0.4087, 0.4092], device='cuda:0')\nSteps: 48%|████▊ | 478/1000 [05:04<05:32, 1.57it/s, loss=0.00982, lr=0.001]\nSteps: 48%|████▊ | 479/1000 [05:05<05:35, 1.55it/s, loss=0.00982, lr=0.001]\nSteps: 48%|████▊ | 479/1000 [05:05<05:35, 1.55it/s, loss=0.00697, lr=0.001]\nSteps: 48%|████▊ | 480/1000 [05:05<05:31, 1.57it/s, loss=0.00697, lr=0.001]\ntensor(0.0042, device='cuda:0')\ntensor([[0.4098],\n[0.4102]], device='cuda:0')\nCurrent Norm : tensor([0.4088, 0.4092], device='cuda:0')\nSteps: 48%|████▊ | 480/1000 [05:05<05:31, 1.57it/s, loss=0.000747, lr=0.001]\nSteps: 48%|████▊ | 481/1000 [05:06<05:35, 1.55it/s, loss=0.000747, lr=0.001]\nSteps: 48%|████▊ | 481/1000 [05:06<05:35, 1.55it/s, loss=0.0442, lr=0.001] \nSteps: 48%|████▊ | 482/1000 [05:07<05:32, 1.56it/s, loss=0.0442, lr=0.001]\ntensor(0.0087, device='cuda:0')\ntensor([[0.4099],\n[0.4101]], device='cuda:0')\nCurrent Norm : tensor([0.4089, 0.4091], device='cuda:0')\nSteps: 48%|████▊ | 482/1000 [05:07<05:32, 1.56it/s, loss=0.0821, lr=0.001]\nSteps: 48%|████▊ | 483/1000 [05:07<05:32, 1.55it/s, loss=0.0821, lr=0.001]\nSteps: 48%|████▊ | 483/1000 [05:07<05:32, 1.55it/s, loss=0.0551, lr=0.001]\nSteps: 48%|████▊ | 484/1000 [05:08<05:29, 1.57it/s, loss=0.0551, lr=0.001]\ntensor(0.0086, device='cuda:0')\ntensor([[0.4100],\n[0.4102]], device='cuda:0')\nCurrent Norm : tensor([0.4090, 0.4092], device='cuda:0')\nSteps: 48%|████▊ | 484/1000 [05:08<05:29, 1.57it/s, loss=0.0276, 
lr=0.001]\nSteps: 48%|████▊ | 485/1000 [05:09<05:33, 1.55it/s, loss=0.0276, lr=0.001]\nSteps: 48%|████▊ | 485/1000 [05:09<05:33, 1.55it/s, loss=0.049, lr=0.001] \nSteps: 49%|████▊ | 486/1000 [05:09<05:29, 1.56it/s, loss=0.049, lr=0.001]\ntensor(0.0021, device='cuda:0')\ntensor([[0.4101],\n[0.4104]], device='cuda:0')\nCurrent Norm : tensor([0.4091, 0.4093], device='cuda:0')\nSteps: 49%|████▊ | 486/1000 [05:09<05:29, 1.56it/s, loss=0.0103, lr=0.001]\nSteps: 49%|████▊ | 487/1000 [05:10<05:33, 1.54it/s, loss=0.0103, lr=0.001]\nSteps: 49%|████▊ | 487/1000 [05:10<05:33, 1.54it/s, loss=0.016, lr=0.001] \nSteps: 49%|████▉ | 488/1000 [05:11<05:28, 1.56it/s, loss=0.016, lr=0.001]\ntensor(0.0004, device='cuda:0')\ntensor([[0.4102],\n[0.4105]], device='cuda:0')\nCurrent Norm : tensor([0.4092, 0.4095], device='cuda:0')\nSteps: 49%|████▉ | 488/1000 [05:11<05:28, 1.56it/s, loss=0.00151, lr=0.001]\nSteps: 49%|████▉ | 489/1000 [05:11<05:29, 1.55it/s, loss=0.00151, lr=0.001]\nSteps: 49%|████▉ | 489/1000 [05:11<05:29, 1.55it/s, loss=0.000248, lr=0.001]\nSteps: 49%|████▉ | 490/1000 [05:12<05:25, 1.57it/s, loss=0.000248, lr=0.001]\ntensor(0.0058, device='cuda:0')\ntensor([[0.4103],\n[0.4106]], device='cuda:0')\nCurrent Norm : tensor([0.4093, 0.4096], device='cuda:0')\nSteps: 49%|████▉ | 490/1000 [05:12<05:25, 1.57it/s, loss=0.00138, lr=0.001] \nSteps: 49%|████▉ | 491/1000 [05:13<05:28, 1.55it/s, loss=0.00138, lr=0.001]\nSteps: 49%|████▉ | 491/1000 [05:13<05:28, 1.55it/s, loss=0.0205, lr=0.001] \nSteps: 49%|████▉ | 492/1000 [05:13<05:25, 1.56it/s, loss=0.0205, lr=0.001]\ntensor(0.0024, device='cuda:0')\ntensor([[0.4104],\n[0.4107]], device='cuda:0')\nCurrent Norm : tensor([0.4094, 0.4096], device='cuda:0')\nSteps: 49%|████▉ | 492/1000 [05:13<05:25, 1.56it/s, loss=0.00147, lr=0.001]\nSteps: 49%|████▉ | 493/1000 [05:14<05:27, 1.55it/s, loss=0.00147, lr=0.001]\nSteps: 49%|████▉ | 493/1000 [05:14<05:27, 1.55it/s, loss=0.00637, lr=0.001]\nSteps: 49%|████▉ | 494/1000 [05:14<05:24, 1.56it/s, 
loss=0.00637, lr=0.001]\ntensor(0.0070, device='cuda:0')\ntensor([[0.4105],\n[0.4107]], device='cuda:0')\nCurrent Norm : tensor([0.4094, 0.4096], device='cuda:0')\nSteps: 49%|████▉ | 494/1000 [05:14<05:24, 1.56it/s, loss=0.044, lr=0.001] \nSteps: 50%|████▉ | 495/1000 [05:15<05:26, 1.55it/s, loss=0.044, lr=0.001]\nSteps: 50%|████▉ | 495/1000 [05:15<05:26, 1.55it/s, loss=0.00497, lr=0.001]\nSteps: 50%|████▉ | 496/1000 [05:16<05:22, 1.56it/s, loss=0.00497, lr=0.001]\ntensor(0.0074, device='cuda:0')\ntensor([[0.4105],\n[0.4107]], device='cuda:0')\nCurrent Norm : tensor([0.4095, 0.4096], device='cuda:0')\nSteps: 50%|████▉ | 496/1000 [05:16<05:22, 1.56it/s, loss=0.0009, lr=0.001] \nSteps: 50%|████▉ | 497/1000 [05:16<05:25, 1.54it/s, loss=0.0009, lr=0.001]\nSteps: 50%|████▉ | 497/1000 [05:16<05:25, 1.54it/s, loss=0.00351, lr=0.001]\nSteps: 50%|████▉ | 498/1000 [05:17<05:22, 1.56it/s, loss=0.00351, lr=0.001]\ntensor(0.0069, device='cuda:0')\ntensor([[0.4105],\n[0.4107]], device='cuda:0')\nCurrent Norm : tensor([0.4095, 0.4096], device='cuda:0')\nSteps: 50%|████▉ | 498/1000 [05:17<05:22, 1.56it/s, loss=0.0155, lr=0.001] \nSteps: 50%|████▉ | 499/1000 [05:18<05:23, 1.55it/s, loss=0.0155, lr=0.001]\nSteps: 50%|████▉ | 499/1000 [05:18<05:23, 1.55it/s, loss=0.0452, lr=0.001]\nSteps: 50%|█████ | 500/1000 [05:18<05:19, 1.56it/s, loss=0.0452, lr=0.001]\nCurrent Learned Embeddings for <s1>:, id 49408 tensor([ 0.0147, -0.0021, -0.0056, -0.0236], device='cuda:0',\ngrad_fn=<SliceBackward0>)\nCurrent Learned Embeddings for <s2>:, id 49409 tensor([ 0.0072, 0.0016, -0.0013, 0.0088], device='cuda:0',\ngrad_fn=<SliceBackward0>)\nSaving weights to checkpoints/step_inv_500.safetensors\ntensor(0.0015, device='cuda:0')\ntensor([[0.4105],\n[0.4106]], device='cuda:0')\nCurrent Norm : tensor([0.4094, 0.4096], device='cuda:0')\nSteps: 50%|█████ | 500/1000 [05:18<05:19, 1.56it/s, loss=0.00169, lr=0.001]\nSteps: 50%|█████ | 501/1000 [05:19<05:20, 1.55it/s, loss=0.00169, lr=0.001]\nSteps: 50%|█████ | 
501/1000 [05:19<05:20, 1.55it/s, loss=0.000385, lr=0.001]\nSteps: 50%|█████ | 502/1000 [05:20<05:18, 1.56it/s, loss=0.000385, lr=0.001]\ntensor(0.0053, device='cuda:0')\ntensor([[0.4104],\n[0.4106]], device='cuda:0')\nCurrent Norm : tensor([0.4093, 0.4095], device='cuda:0')\nSteps: 50%|█████ | 502/1000 [05:20<05:18, 1.56it/s, loss=0.004, lr=0.001] \nSteps: 50%|█████ | 503/1000 [05:20<05:20, 1.55it/s, loss=0.004, lr=0.001]\nSteps: 50%|█████ | 503/1000 [05:20<05:20, 1.55it/s, loss=0.0547, lr=0.001]\nSteps: 50%|█████ | 504/1000 [05:21<05:17, 1.56it/s, loss=0.0547, lr=0.001]\ntensor(0.0032, device='cuda:0')\ntensor([[0.4103],\n[0.4105]], device='cuda:0')\nCurrent Norm : tensor([0.4092, 0.4094], device='cuda:0')\nSteps: 50%|█████ | 504/1000 [05:21<05:17, 1.56it/s, loss=0.000724, lr=0.001]\nSteps: 50%|█████ | 505/1000 [05:22<05:17, 1.56it/s, loss=0.000724, lr=0.001]\nSteps: 50%|█████ | 505/1000 [05:22<05:17, 1.56it/s, loss=0.00276, lr=0.001] \nSteps: 51%|█████ | 506/1000 [05:22<05:14, 1.57it/s, loss=0.00276, lr=0.001]\ntensor(0.0009, device='cuda:0')\ntensor([[0.4101],\n[0.4104]], device='cuda:0')\nCurrent Norm : tensor([0.4091, 0.4093], device='cuda:0')\nSteps: 51%|█████ | 506/1000 [05:22<05:14, 1.57it/s, loss=0.000694, lr=0.001]\nSteps: 51%|█████ | 507/1000 [05:23<05:16, 1.56it/s, loss=0.000694, lr=0.001]\nSteps: 51%|█████ | 507/1000 [05:23<05:16, 1.56it/s, loss=0.000989, lr=0.001]\nSteps: 51%|█████ | 508/1000 [05:23<05:13, 1.57it/s, loss=0.000989, lr=0.001]\ntensor(0.0065, device='cuda:0')\ntensor([[0.4100],\n[0.4102]], device='cuda:0')\nCurrent Norm : tensor([0.4090, 0.4092], device='cuda:0')\nSteps: 51%|█████ | 508/1000 [05:23<05:13, 1.57it/s, loss=0.00329, lr=0.001] \nSteps: 51%|█████ | 509/1000 [05:24<05:15, 1.56it/s, loss=0.00329, lr=0.001]\nSteps: 51%|█████ | 509/1000 [05:24<05:15, 1.56it/s, loss=0.0359, lr=0.001] \nSteps: 51%|█████ | 510/1000 [05:25<05:11, 1.57it/s, loss=0.0359, lr=0.001]\ntensor(0.0034, device='cuda:0')\ntensor([[0.4098],\n[0.4101]], 
device='cuda:0')\nCurrent Norm : tensor([0.4088, 0.4091], device='cuda:0')\nSteps: 51%|█████ | 510/1000 [05:25<05:11, 1.57it/s, loss=0.0109, lr=0.001]\nSteps: 51%|█████ | 511/1000 [05:25<05:15, 1.55it/s, loss=0.0109, lr=0.001]\nSteps: 51%|█████ | 511/1000 [05:25<05:15, 1.55it/s, loss=0.005, lr=0.001] \nSteps: 51%|█████ | 512/1000 [05:26<05:12, 1.56it/s, loss=0.005, lr=0.001]\ntensor(0.0047, device='cuda:0')\ntensor([[0.4096],\n[0.4100]], device='cuda:0')\nCurrent Norm : tensor([0.4086, 0.4090], device='cuda:0')\nSteps: 51%|█████ | 512/1000 [05:26<05:12, 1.56it/s, loss=0.152, lr=0.001]\nSteps: 51%|█████▏ | 513/1000 [05:27<05:17, 1.54it/s, loss=0.152, lr=0.001]\nSteps: 51%|█████▏ | 513/1000 [05:27<05:17, 1.54it/s, loss=0.00354, lr=0.001]\nSteps: 51%|█████▏ | 514/1000 [05:27<05:12, 1.56it/s, loss=0.00354, lr=0.001]\ntensor(0.0086, device='cuda:0')\ntensor([[0.4094],\n[0.4098]], device='cuda:0')\nCurrent Norm : tensor([0.4085, 0.4088], device='cuda:0')\nSteps: 51%|█████▏ | 514/1000 [05:27<05:12, 1.56it/s, loss=0.0389, lr=0.001] \nSteps: 52%|█████▏ | 515/1000 [05:28<05:13, 1.54it/s, loss=0.0389, lr=0.001]\nSteps: 52%|█████▏ | 515/1000 [05:28<05:13, 1.54it/s, loss=0.0244, lr=0.001]\nSteps: 52%|█████▏ | 516/1000 [05:29<05:11, 1.56it/s, loss=0.0244, lr=0.001]\ntensor(0.0010, device='cuda:0')\ntensor([[0.4093],\n[0.4097]], device='cuda:0')\nCurrent Norm : tensor([0.4084, 0.4087], device='cuda:0')\nSteps: 52%|█████▏ | 516/1000 [05:29<05:11, 1.56it/s, loss=0.0105, lr=0.001]\nSteps: 52%|█████▏ | 517/1000 [05:29<05:12, 1.55it/s, loss=0.0105, lr=0.001]\nSteps: 52%|█████▏ | 517/1000 [05:29<05:12, 1.55it/s, loss=0.000476, lr=0.001]\nSteps: 52%|█████▏ | 518/1000 [05:30<05:10, 1.55it/s, loss=0.000476, lr=0.001]\ntensor(0.0080, device='cuda:0')\ntensor([[0.4091],\n[0.4095]], device='cuda:0')\nCurrent Norm : tensor([0.4082, 0.4086], device='cuda:0')\nSteps: 52%|█████▏ | 518/1000 [05:30<05:10, 1.55it/s, loss=0.00252, lr=0.001] \nSteps: 52%|█████▏ | 519/1000 [05:31<05:11, 1.55it/s, 
loss=0.00252, lr=0.001]\nSteps: 52%|█████▏ | 519/1000 [05:31<05:11, 1.55it/s, loss=0.015, lr=0.001] \nSteps: 52%|█████▏ | 520/1000 [05:31<05:06, 1.56it/s, loss=0.015, lr=0.001]\ntensor(0.0011, device='cuda:0')\ntensor([[0.4090],\n[0.4094]], device='cuda:0')\nCurrent Norm : tensor([0.4081, 0.4085], device='cuda:0')\nSteps: 52%|█████▏ | 520/1000 [05:31<05:06, 1.56it/s, loss=0.000183, lr=0.001]\nSteps: 52%|█████▏ | 521/1000 [05:32<05:09, 1.55it/s, loss=0.000183, lr=0.001]\nSteps: 52%|█████▏ | 521/1000 [05:32<05:09, 1.55it/s, loss=0.00404, lr=0.001] \nSteps: 52%|█████▏ | 522/1000 [05:32<05:06, 1.56it/s, loss=0.00404, lr=0.001]\ntensor(0.0048, device='cuda:0')\ntensor([[0.4088],\n[0.4093]], device='cuda:0')\nCurrent Norm : tensor([0.4079, 0.4084], device='cuda:0')\nSteps: 52%|█████▏ | 522/1000 [05:32<05:06, 1.56it/s, loss=0.00752, lr=0.001]\nSteps: 52%|█████▏ | 523/1000 [05:33<05:06, 1.56it/s, loss=0.00752, lr=0.001]\nSteps: 52%|█████▏ | 523/1000 [05:33<05:06, 1.56it/s, loss=0.0113, lr=0.001] \nSteps: 52%|█████▏ | 524/1000 [05:34<05:02, 1.57it/s, loss=0.0113, lr=0.001]\ntensor(0.0018, device='cuda:0')\ntensor([[0.4086],\n[0.4091]], device='cuda:0')\nCurrent Norm : tensor([0.4078, 0.4082], device='cuda:0')\nSteps: 52%|█████▏ | 524/1000 [05:34<05:02, 1.57it/s, loss=0.00317, lr=0.001]\nSteps: 52%|█████▎ | 525/1000 [05:34<05:05, 1.55it/s, loss=0.00317, lr=0.001]\nSteps: 52%|█████▎ | 525/1000 [05:34<05:05, 1.55it/s, loss=0.000528, lr=0.001]\nSteps: 53%|█████▎ | 526/1000 [05:35<05:02, 1.57it/s, loss=0.000528, lr=0.001]\ntensor(0.0029, device='cuda:0')\ntensor([[0.4084],\n[0.4090]], device='cuda:0')\nCurrent Norm : tensor([0.4076, 0.4081], device='cuda:0')\nSteps: 53%|█████▎ | 526/1000 [05:35<05:02, 1.57it/s, loss=0.000145, lr=0.001]\nSteps: 53%|█████▎ | 527/1000 [05:36<05:03, 1.56it/s, loss=0.000145, lr=0.001]\nSteps: 53%|█████▎ | 527/1000 [05:36<05:03, 1.56it/s, loss=0.0233, lr=0.001] \nSteps: 53%|█████▎ | 528/1000 [05:36<04:59, 1.57it/s, loss=0.0233, 
lr=0.001]\ntensor(0.0051, device='cuda:0')\ntensor([[0.4083],\n[0.4088]], device='cuda:0')\nCurrent Norm : tensor([0.4074, 0.4080], device='cuda:0')\nSteps: 53%|█████▎ | 528/1000 [05:36<04:59, 1.57it/s, loss=0.000492, lr=0.001]\nSteps: 53%|█████▎ | 529/1000 [05:37<05:01, 1.56it/s, loss=0.000492, lr=0.001]\nSteps: 53%|█████▎ | 529/1000 [05:37<05:01, 1.56it/s, loss=0.0405, lr=0.001] \nSteps: 53%|█████▎ | 530/1000 [05:38<04:57, 1.58it/s, loss=0.0405, lr=0.001]\ntensor(0.0062, device='cuda:0')\ntensor([[0.4081],\n[0.4087]], device='cuda:0')\nCurrent Norm : tensor([0.4073, 0.4078], device='cuda:0')\nSteps: 53%|█████▎ | 530/1000 [05:38<04:57, 1.58it/s, loss=0.00214, lr=0.001]\nSteps: 53%|█████▎ | 531/1000 [05:38<04:59, 1.57it/s, loss=0.00214, lr=0.001]\nSteps: 53%|█████▎ | 531/1000 [05:38<04:59, 1.57it/s, loss=0.0452, lr=0.001] \nSteps: 53%|█████▎ | 532/1000 [05:39<04:56, 1.58it/s, loss=0.0452, lr=0.001]\ntensor(0.0072, device='cuda:0')\ntensor([[0.4080],\n[0.4086]], device='cuda:0')\nCurrent Norm : tensor([0.4072, 0.4077], device='cuda:0')\nSteps: 53%|█████▎ | 532/1000 [05:39<04:56, 1.58it/s, loss=0.0139, lr=0.001]\nSteps: 53%|█████▎ | 533/1000 [05:39<04:57, 1.57it/s, loss=0.0139, lr=0.001]\nSteps: 53%|█████▎ | 533/1000 [05:39<04:57, 1.57it/s, loss=0.0151, lr=0.001]\nSteps: 53%|█████▎ | 534/1000 [05:40<04:55, 1.58it/s, loss=0.0151, lr=0.001]\ntensor(0.0065, device='cuda:0')\ntensor([[0.4078],\n[0.4086]], device='cuda:0')\nCurrent Norm : tensor([0.4070, 0.4077], device='cuda:0')\nSteps: 53%|█████▎ | 534/1000 [05:40<04:55, 1.58it/s, loss=0.021, lr=0.001] \nSteps: 54%|█████▎ | 535/1000 [05:41<04:57, 1.56it/s, loss=0.021, lr=0.001]\nSteps: 54%|█████▎ | 535/1000 [05:41<04:57, 1.56it/s, loss=0.00169, lr=0.001]\nSteps: 54%|█████▎ | 536/1000 [05:41<04:55, 1.57it/s, loss=0.00169, lr=0.001]\ntensor(0.0003, device='cuda:0')\ntensor([[0.4077],\n[0.4085]], device='cuda:0')\nCurrent Norm : tensor([0.4069, 0.4077], device='cuda:0')\nSteps: 54%|█████▎ | 536/1000 [05:41<04:55, 1.57it/s, 
loss=0.000928, lr=0.001]\nSteps: 54%|█████▎ | 537/1000 [05:42<04:57, 1.56it/s, loss=0.000928, lr=0.001]\nSteps: 54%|█████▎ | 537/1000 [05:42<04:57, 1.56it/s, loss=0.000183, lr=0.001]\nSteps: 54%|█████▍ | 538/1000 [05:43<04:53, 1.57it/s, loss=0.000183, lr=0.001]\ntensor(0.0060, device='cuda:0')\ntensor([[0.4075],\n [0.4085]], device='cuda:0')\nCurrent Norm : tensor([0.4068, 0.4077], device='cuda:0')\nSteps: 54%|█████▍ | 538/1000 [05:43<04:53, 1.57it/s, loss=0.0489, lr=0.001] \nSteps: 54%|█████▍ | 539/1000 [05:43<04:57, 1.55it/s, loss=0.0489, lr=0.001]\nSteps: 54%|█████▍ | 539/1000 [05:43<04:57, 1.55it/s, loss=0.0065, lr=0.001]\nSteps: 54%|█████▍ | 540/1000 [05:44<04:53, 1.57it/s, loss=0.0065, lr=0.001]\ntensor(0.0019, device='cuda:0')\ntensor([[0.4074],\n[0.4085]], device='cuda:0')\nCurrent Norm : tensor([0.4067, 0.4077], device='cuda:0')\nSteps: 54%|█████▍ | 540/1000 [05:44<04:53, 1.57it/s, loss=0.00161, lr=0.001]\nSteps: 54%|█████▍ | 541/1000 [05:45<04:56, 1.55it/s, loss=0.00161, lr=0.001]\nSteps: 54%|█████▍ | 541/1000 [05:45<04:56, 1.55it/s, loss=0.00176, lr=0.001]\nSteps: 54%|█████▍ | 542/1000 [05:45<04:53, 1.56it/s, loss=0.00176, lr=0.001]\ntensor(0.0021, device='cuda:0')\ntensor([[0.4073],\n[0.4085]], device='cuda:0')\nCurrent Norm : tensor([0.4066, 0.4076], device='cuda:0')\nSteps: 54%|█████▍ | 542/1000 [05:45<04:53, 1.56it/s, loss=0.00181, lr=0.001]\nSteps: 54%|█████▍ | 543/1000 [05:46<04:54, 1.55it/s, loss=0.00181, lr=0.001]\nSteps: 54%|█████▍ | 543/1000 [05:46<04:54, 1.55it/s, loss=0.00432, lr=0.001]\nSteps: 54%|█████▍ | 544/1000 [05:47<04:51, 1.56it/s, loss=0.00432, lr=0.001]\ntensor(0.0034, device='cuda:0')\ntensor([[0.4072],\n[0.4084]], device='cuda:0')\nCurrent Norm : tensor([0.4065, 0.4076], device='cuda:0')\nSteps: 54%|█████▍ | 544/1000 [05:47<04:51, 1.56it/s, loss=0.00219, lr=0.001]\nSteps: 55%|█████▍ | 545/1000 [05:47<04:51, 1.56it/s, loss=0.00219, lr=0.001]\nSteps: 55%|█████▍ | 545/1000 [05:47<04:51, 1.56it/s, loss=0.0136, lr=0.001] \nSteps: 
55%|█████▍ | 546/1000 [05:48<04:49, 1.57it/s, loss=0.0136, lr=0.001]\ntensor(0.0046, device='cuda:0')\ntensor([[0.4071],\n[0.4083]], device='cuda:0')\nCurrent Norm : tensor([0.4064, 0.4075], device='cuda:0')\nSteps: 55%|█████▍ | 546/1000 [05:48<04:49, 1.57it/s, loss=0.00109, lr=0.001]\nSteps: 55%|█████▍ | 547/1000 [05:48<04:49, 1.56it/s, loss=0.00109, lr=0.001]\nSteps: 55%|█████▍ | 547/1000 [05:48<04:49, 1.56it/s, loss=0.021, lr=0.001] \nSteps: 55%|█████▍ | 548/1000 [05:49<04:47, 1.57it/s, loss=0.021, lr=0.001]\ntensor(0.0078, device='cuda:0')\ntensor([[0.4071],\n[0.4081]], device='cuda:0')\nCurrent Norm : tensor([0.4064, 0.4073], device='cuda:0')\nSteps: 55%|█████▍ | 548/1000 [05:49<04:47, 1.57it/s, loss=0.016, lr=0.001]\nSteps: 55%|█████▍ | 549/1000 [05:50<04:49, 1.56it/s, loss=0.016, lr=0.001]\nSteps: 55%|█████▍ | 549/1000 [05:50<04:49, 1.56it/s, loss=0.00611, lr=0.001]\nSteps: 55%|█████▌ | 550/1000 [05:50<04:46, 1.57it/s, loss=0.00611, lr=0.001]\ntensor(0.0034, device='cuda:0')\ntensor([[0.4071],\n[0.4079]], device='cuda:0')\nCurrent Norm : tensor([0.4064, 0.4071], device='cuda:0')\nSteps: 55%|█████▌ | 550/1000 [05:50<04:46, 1.57it/s, loss=0.00452, lr=0.001]\nSteps: 55%|█████▌ | 551/1000 [05:51<04:47, 1.56it/s, loss=0.00452, lr=0.001]\nSteps: 55%|█████▌ | 551/1000 [05:51<04:47, 1.56it/s, loss=0.00135, lr=0.001]\nSteps: 55%|█████▌ | 552/1000 [05:52<04:47, 1.56it/s, loss=0.00135, lr=0.001]\ntensor(0.0008, device='cuda:0')\ntensor([[0.4071],\n[0.4077]], device='cuda:0')\nCurrent Norm : tensor([0.4064, 0.4069], device='cuda:0')\nSteps: 55%|█████▌ | 552/1000 [05:52<04:47, 1.56it/s, loss=0.000583, lr=0.001]\nSteps: 55%|█████▌ | 553/1000 [05:52<04:48, 1.55it/s, loss=0.000583, lr=0.001]\nSteps: 55%|█████▌ | 553/1000 [05:52<04:48, 1.55it/s, loss=0.000458, lr=0.001]\nSteps: 55%|█████▌ | 554/1000 [05:53<04:46, 1.56it/s, loss=0.000458, lr=0.001]\ntensor(0.0056, device='cuda:0')\ntensor([[0.4072],\n[0.4075]], device='cuda:0')\nCurrent Norm : tensor([0.4064, 0.4068], 
device='cuda:0')\nSteps: 55%|█████▌ | 554/1000 [05:53<04:46, 1.56it/s, loss=0.00163, lr=0.001] \nSteps: 56%|█████▌ | 555/1000 [05:54<04:47, 1.55it/s, loss=0.00163, lr=0.001]\nSteps: 56%|█████▌ | 555/1000 [05:54<04:47, 1.55it/s, loss=0.076, lr=0.001] \nSteps: 56%|█████▌ | 556/1000 [05:54<04:44, 1.56it/s, loss=0.076, lr=0.001]\ntensor(0.0030, device='cuda:0')\ntensor([[0.4072],\n[0.4074]], device='cuda:0')\nCurrent Norm : tensor([0.4065, 0.4066], device='cuda:0')\nSteps: 56%|█████▌ | 556/1000 [05:54<04:44, 1.56it/s, loss=0.00438, lr=0.001]\nSteps: 56%|█████▌ | 557/1000 [05:55<04:45, 1.55it/s, loss=0.00438, lr=0.001]\nSteps: 56%|█████▌ | 557/1000 [05:55<04:45, 1.55it/s, loss=0.00992, lr=0.001]\nSteps: 56%|█████▌ | 558/1000 [05:55<04:41, 1.57it/s, loss=0.00992, lr=0.001]\ntensor(0.0045, device='cuda:0')\ntensor([[0.4072],\n[0.4073]], device='cuda:0')\nCurrent Norm : tensor([0.4065, 0.4066], device='cuda:0')\nSteps: 56%|█████▌ | 558/1000 [05:55<04:41, 1.57it/s, loss=0.00724, lr=0.001]\nSteps: 56%|█████▌ | 559/1000 [05:56<04:42, 1.56it/s, loss=0.00724, lr=0.001]\nSteps: 56%|█████▌ | 559/1000 [05:56<04:42, 1.56it/s, loss=0.0981, lr=0.001] \nSteps: 56%|█████▌ | 560/1000 [05:57<04:39, 1.58it/s, loss=0.0981, lr=0.001]\ntensor(0.0098, device='cuda:0')\ntensor([[0.4072],\n[0.4073]], device='cuda:0')\nCurrent Norm : tensor([0.4064, 0.4066], device='cuda:0')\nSteps: 56%|█████▌ | 560/1000 [05:57<04:39, 1.58it/s, loss=0.0443, lr=0.001]\nSteps: 56%|█████▌ | 561/1000 [05:57<04:40, 1.56it/s, loss=0.0443, lr=0.001]\nSteps: 56%|█████▌ | 561/1000 [05:57<04:40, 1.56it/s, loss=0.00534, lr=0.001]\nSteps: 56%|█████▌ | 562/1000 [05:58<04:37, 1.58it/s, loss=0.00534, lr=0.001]\ntensor(0.0065, device='cuda:0')\ntensor([[0.4072],\n[0.4073]], device='cuda:0')\nCurrent Norm : tensor([0.4065, 0.4066], device='cuda:0')\nSteps: 56%|█████▌ | 562/1000 [05:58<04:37, 1.58it/s, loss=0.000203, lr=0.001]\nSteps: 56%|█████▋ | 563/1000 [05:59<04:38, 1.57it/s, loss=0.000203, lr=0.001]\nSteps: 56%|█████▋ | 
563/1000 [05:59<04:38, 1.57it/s, loss=0.0275, lr=0.001] \nSteps: 56%|█████▋ | 564/1000 [05:59<04:36, 1.58it/s, loss=0.0275, lr=0.001]\ntensor(0.0055, device='cuda:0')\ntensor([[0.4073],\n[0.4074]], device='cuda:0')\nCurrent Norm : tensor([0.4066, 0.4067], device='cuda:0')\nSteps: 56%|█████▋ | 564/1000 [05:59<04:36, 1.58it/s, loss=0.0524, lr=0.001]\nSteps: 56%|█████▋ | 565/1000 [06:00<04:37, 1.57it/s, loss=0.0524, lr=0.001]\nSteps: 56%|█████▋ | 565/1000 [06:00<04:37, 1.57it/s, loss=0.0067, lr=0.001]\nSteps: 57%|█████▋ | 566/1000 [06:01<04:34, 1.58it/s, loss=0.0067, lr=0.001]\ntensor(0.0110, device='cuda:0')\ntensor([[0.4073],\n[0.4074]], device='cuda:0')\nCurrent Norm : tensor([0.4066, 0.4067], device='cuda:0')\nSteps: 57%|█████▋ | 566/1000 [06:01<04:34, 1.58it/s, loss=0.00774, lr=0.001]\nSteps: 57%|█████▋ | 567/1000 [06:01<04:36, 1.57it/s, loss=0.00774, lr=0.001]\nSteps: 57%|█████▋ | 567/1000 [06:01<04:36, 1.57it/s, loss=0.00531, lr=0.001]\nSteps: 57%|█████▋ | 568/1000 [06:02<04:32, 1.58it/s, loss=0.00531, lr=0.001]\ntensor(0.0025, device='cuda:0')\ntensor([[0.4073],\n[0.4075]], device='cuda:0')\nCurrent Norm : tensor([0.4066, 0.4068], device='cuda:0')\nSteps: 57%|█████▋ | 568/1000 [06:02<04:32, 1.58it/s, loss=0.00398, lr=0.001]\nSteps: 57%|█████▋ | 569/1000 [06:02<04:34, 1.57it/s, loss=0.00398, lr=0.001]\nSteps: 57%|█████▋ | 569/1000 [06:02<04:34, 1.57it/s, loss=0.000125, lr=0.001]\nSteps: 57%|█████▋ | 570/1000 [06:03<04:32, 1.58it/s, loss=0.000125, lr=0.001]\ntensor(0.0031, device='cuda:0')\ntensor([[0.4073],\n[0.4076]], device='cuda:0')\nCurrent Norm : tensor([0.4065, 0.4068], device='cuda:0')\nSteps: 57%|█████▋ | 570/1000 [06:03<04:32, 1.58it/s, loss=0.00406, lr=0.001] \nSteps: 57%|█████▋ | 571/1000 [06:04<04:34, 1.57it/s, loss=0.00406, lr=0.001]\nSteps: 57%|█████▋ | 571/1000 [06:04<04:34, 1.57it/s, loss=0.000723, lr=0.001]\nSteps: 57%|█████▋ | 572/1000 [06:04<04:31, 1.57it/s, loss=0.000723, lr=0.001]\ntensor(0.0049, 
device='cuda:0')\ntensor([[0.4072],\n[0.4077]], device='cuda:0')\nCurrent Norm : tensor([0.4065, 0.4069], device='cuda:0')\nSteps: 57%|█████▋ | 572/1000 [06:04<04:31, 1.57it/s, loss=0.0526, lr=0.001] \nSteps: 57%|█████▋ | 573/1000 [06:05<04:33, 1.56it/s, loss=0.0526, lr=0.001]\nSteps: 57%|█████▋ | 573/1000 [06:05<04:33, 1.56it/s, loss=0.000955, lr=0.001]\nSteps: 57%|█████▋ | 574/1000 [06:06<04:30, 1.57it/s, loss=0.000955, lr=0.001]\ntensor(0.0071, device='cuda:0')\ntensor([[0.4071],\n[0.4078]], device='cuda:0')\nCurrent Norm : tensor([0.4064, 0.4070], device='cuda:0')\nSteps: 57%|█████▋ | 574/1000 [06:06<04:30, 1.57it/s, loss=0.0284, lr=0.001] \nSteps: 57%|█████▊ | 575/1000 [06:06<04:31, 1.56it/s, loss=0.0284, lr=0.001]\nSteps: 57%|█████▊ | 575/1000 [06:06<04:31, 1.56it/s, loss=0.00668, lr=0.001]\nSteps: 58%|█████▊ | 576/1000 [06:07<04:28, 1.58it/s, loss=0.00668, lr=0.001]\ntensor(0.0083, device='cuda:0')\ntensor([[0.4070],\n[0.4079]], device='cuda:0')\nCurrent Norm : tensor([0.4063, 0.4071], device='cuda:0')\nSteps: 58%|█████▊ | 576/1000 [06:07<04:28, 1.58it/s, loss=0.0343, lr=0.001] \nSteps: 58%|█████▊ | 577/1000 [06:08<04:29, 1.57it/s, loss=0.0343, lr=0.001]\nSteps: 58%|█████▊ | 577/1000 [06:08<04:29, 1.57it/s, loss=0.0111, lr=0.001]\nSteps: 58%|█████▊ | 578/1000 [06:08<04:26, 1.58it/s, loss=0.0111, lr=0.001]\ntensor(0.0077, device='cuda:0')\ntensor([[0.4070],\n[0.4081]], device='cuda:0')\nCurrent Norm : tensor([0.4063, 0.4073], device='cuda:0')\nSteps: 58%|█████▊ | 578/1000 [06:08<04:26, 1.58it/s, loss=0.0138, lr=0.001]\nSteps: 58%|█████▊ | 579/1000 [06:09<04:28, 1.57it/s, loss=0.0138, lr=0.001]\nSteps: 58%|█████▊ | 579/1000 [06:09<04:28, 1.57it/s, loss=0.0497, lr=0.001]\nSteps: 58%|█████▊ | 580/1000 [06:09<04:26, 1.58it/s, loss=0.0497, lr=0.001]\ntensor(0.0063, device='cuda:0')\ntensor([[0.4071],\n[0.4083]], device='cuda:0')\nCurrent Norm : tensor([0.4064, 0.4075], device='cuda:0')\nSteps: 58%|█████▊ | 580/1000 [06:09<04:26, 1.58it/s, loss=0.000655, 
lr=0.001]\nSteps: 58%|█████▊ | 581/1000 [06:10<04:27, 1.57it/s, loss=0.000655, lr=0.001]\nSteps: 58%|█████▊ | 581/1000 [06:10<04:27, 1.57it/s, loss=0.013, lr=0.001] \nSteps: 58%|█████▊ | 582/1000 [06:11<04:24, 1.58it/s, loss=0.013, lr=0.001]\ntensor(0.0032, device='cuda:0')\ntensor([[0.4072],\n[0.4085]], device='cuda:0')\nCurrent Norm : tensor([0.4065, 0.4077], device='cuda:0')\nSteps: 58%|█████▊ | 582/1000 [06:11<04:24, 1.58it/s, loss=0.0135, lr=0.001]\nSteps: 58%|█████▊ | 583/1000 [06:11<04:26, 1.56it/s, loss=0.0135, lr=0.001]\nSteps: 58%|█████▊ | 583/1000 [06:11<04:26, 1.56it/s, loss=0.0109, lr=0.001]\nSteps: 58%|█████▊ | 584/1000 [06:12<04:24, 1.57it/s, loss=0.0109, lr=0.001]\ntensor(0.0014, device='cuda:0')\ntensor([[0.4073],\n[0.4088]], device='cuda:0')\nCurrent Norm : tensor([0.4066, 0.4079], device='cuda:0')\nSteps: 58%|█████▊ | 584/1000 [06:12<04:24, 1.57it/s, loss=0.00615, lr=0.001]\nSteps: 58%|█████▊ | 585/1000 [06:13<04:25, 1.56it/s, loss=0.00615, lr=0.001]\nSteps: 58%|█████▊ | 585/1000 [06:13<04:25, 1.56it/s, loss=0.0127, lr=0.001] \nSteps: 59%|█████▊ | 586/1000 [06:13<04:22, 1.57it/s, loss=0.0127, lr=0.001]\ntensor(0.0024, device='cuda:0')\ntensor([[0.4074],\n[0.4090]], device='cuda:0')\nCurrent Norm : tensor([0.4067, 0.4081], device='cuda:0')\nSteps: 59%|█████▊ | 586/1000 [06:13<04:22, 1.57it/s, loss=0.000324, lr=0.001]\nSteps: 59%|█████▊ | 587/1000 [06:14<04:24, 1.56it/s, loss=0.000324, lr=0.001]\nSteps: 59%|█████▊ | 587/1000 [06:14<04:24, 1.56it/s, loss=0.013, lr=0.001] \nSteps: 59%|█████▉ | 588/1000 [06:15<04:21, 1.57it/s, loss=0.013, lr=0.001]\ntensor(0.0055, device='cuda:0')\ntensor([[0.4075],\n[0.4092]], device='cuda:0')\nCurrent Norm : tensor([0.4067, 0.4083], device='cuda:0')\nSteps: 59%|█████▉ | 588/1000 [06:15<04:21, 1.57it/s, loss=0.00368, lr=0.001]\nSteps: 59%|█████▉ | 589/1000 [06:15<04:23, 1.56it/s, loss=0.00368, lr=0.001]\nSteps: 59%|█████▉ | 589/1000 [06:15<04:23, 1.56it/s, loss=0.00373, lr=0.001]\nSteps: 59%|█████▉ | 590/1000 
[06:16<04:20, 1.57it/s, loss=0.00373, lr=0.001]\ntensor(0.0051, device='cuda:0')\ntensor([[0.4076],\n[0.4093]], device='cuda:0')\nCurrent Norm : tensor([0.4068, 0.4084], device='cuda:0')\nSteps: 59%|█████▉ | 590/1000 [06:16<04:20, 1.57it/s, loss=0.00137, lr=0.001]\nSteps: 59%|█████▉ | 591/1000 [06:16<04:20, 1.57it/s, loss=0.00137, lr=0.001]\nSteps: 59%|█████▉ | 591/1000 [06:16<04:20, 1.57it/s, loss=0.0131, lr=0.001] \nSteps: 59%|█████▉ | 592/1000 [06:17<04:18, 1.58it/s, loss=0.0131, lr=0.001]\ntensor(0.0034, device='cuda:0')\ntensor([[0.4076],\n[0.4094]], device='cuda:0')\nCurrent Norm : tensor([0.4069, 0.4085], device='cuda:0')\nSteps: 59%|█████▉ | 592/1000 [06:17<04:18, 1.58it/s, loss=0.014, lr=0.001] \nSteps: 59%|█████▉ | 593/1000 [06:18<04:19, 1.57it/s, loss=0.014, lr=0.001]\nSteps: 59%|█████▉ | 593/1000 [06:18<04:19, 1.57it/s, loss=0.000151, lr=0.001]\nSteps: 59%|█████▉ | 594/1000 [06:18<04:16, 1.58it/s, loss=0.000151, lr=0.001]\ntensor(0.0055, device='cuda:0')\ntensor([[0.4077],\n[0.4095]], device='cuda:0')\nCurrent Norm : tensor([0.4069, 0.4085], device='cuda:0')\nSteps: 59%|█████▉ | 594/1000 [06:18<04:16, 1.58it/s, loss=0.0421, lr=0.001] \nSteps: 60%|█████▉ | 595/1000 [06:19<04:18, 1.57it/s, loss=0.0421, lr=0.001]\nSteps: 60%|█████▉ | 595/1000 [06:19<04:18, 1.57it/s, loss=0.000933, lr=0.001]\nSteps: 60%|█████▉ | 596/1000 [06:20<04:16, 1.57it/s, loss=0.000933, lr=0.001]\ntensor(0.0031, device='cuda:0')\ntensor([[0.4077],\n[0.4096]], device='cuda:0')\nCurrent Norm : tensor([0.4070, 0.4086], device='cuda:0')\nSteps: 60%|█████▉ | 596/1000 [06:20<04:16, 1.57it/s, loss=0.00709, lr=0.001] \nSteps: 60%|█████▉ | 597/1000 [06:20<04:19, 1.55it/s, loss=0.00709, lr=0.001]\nSteps: 60%|█████▉ | 597/1000 [06:20<04:19, 1.55it/s, loss=0.000454, lr=0.001]\nSteps: 60%|█████▉ | 598/1000 [06:21<04:15, 1.57it/s, loss=0.000454, lr=0.001]\ntensor(0.0058, device='cuda:0')\ntensor([[0.4078],\n[0.4096]], device='cuda:0')\nCurrent Norm : tensor([0.4070, 0.4087], 
device='cuda:0')\nSteps: 60%|█████▉ | 598/1000 [06:21<04:15, 1.57it/s, loss=0.0658, lr=0.001] \nSteps: 60%|█████▉ | 599/1000 [06:22<04:17, 1.56it/s, loss=0.0658, lr=0.001]\nSteps: 60%|█████▉ | 599/1000 [06:22<04:17, 1.56it/s, loss=0.00331, lr=0.001]\nSteps: 60%|██████ | 600/1000 [06:22<04:14, 1.57it/s, loss=0.00331, lr=0.001]\nCurrent Learned Embeddings for <s1>:, id 49408 tensor([ 0.0173, -0.0068, -0.0072, -0.0267], device='cuda:0',\ngrad_fn=<SliceBackward0>)\nCurrent Learned Embeddings for <s2>:, id 49409 tensor([ 0.0088, 0.0018, -0.0002, 0.0101], device='cuda:0',\ngrad_fn=<SliceBackward0>)\nSaving weights to checkpoints/step_inv_600.safetensors\ntensor(0.0059, device='cuda:0')\ntensor([[0.4078],\n[0.4096]], device='cuda:0')\nCurrent Norm : tensor([0.4070, 0.4086], device='cuda:0')\nSteps: 60%|██████ | 600/1000 [06:22<04:14, 1.57it/s, loss=0.000187, lr=0.001]\nSteps: 60%|██████ | 601/1000 [06:23<04:17, 1.55it/s, loss=0.000187, lr=0.001]\nSteps: 60%|██████ | 601/1000 [06:23<04:17, 1.55it/s, loss=0.0136, lr=0.001] \nSteps: 60%|██████ | 602/1000 [06:23<04:13, 1.57it/s, loss=0.0136, lr=0.001]\ntensor(0.0059, device='cuda:0')\ntensor([[0.4078],\n[0.4095]], device='cuda:0')\nCurrent Norm : tensor([0.4070, 0.4086], device='cuda:0')\nSteps: 60%|██████ | 602/1000 [06:23<04:13, 1.57it/s, loss=0.0572, lr=0.001]\nSteps: 60%|██████ | 603/1000 [06:24<04:14, 1.56it/s, loss=0.0572, lr=0.001]\nSteps: 60%|██████ | 603/1000 [06:24<04:14, 1.56it/s, loss=0.0157, lr=0.001]\nSteps: 60%|██████ | 604/1000 [06:25<04:10, 1.58it/s, loss=0.0157, lr=0.001]\ntensor(0.0027, device='cuda:0')\ntensor([[0.4078],\n[0.4095]], device='cuda:0')\nCurrent Norm : tensor([0.4070, 0.4085], device='cuda:0')\nSteps: 60%|██████ | 604/1000 [06:25<04:10, 1.58it/s, loss=0.00394, lr=0.001]\nSteps: 60%|██████ | 605/1000 [06:25<04:11, 1.57it/s, loss=0.00394, lr=0.001]\nSteps: 60%|██████ | 605/1000 [06:25<04:11, 1.57it/s, loss=0.00151, lr=0.001]\nSteps: 61%|██████ | 606/1000 [06:26<04:09, 1.58it/s, loss=0.00151, 
lr=0.001]\ntensor(0.0052, device='cuda:0')\ntensor([[0.4078],\n[0.4094]], device='cuda:0')\nCurrent Norm : tensor([0.4070, 0.4084], device='cuda:0')\nSteps: 61%|██████ | 606/1000 [06:26<04:09, 1.58it/s, loss=0.0347, lr=0.001] \nSteps: 61%|██████ | 607/1000 [06:27<04:11, 1.56it/s, loss=0.0347, lr=0.001]\nSteps: 61%|██████ | 607/1000 [06:27<04:11, 1.56it/s, loss=0.00051, lr=0.001]\nSteps: 61%|██████ | 608/1000 [06:27<04:08, 1.58it/s, loss=0.00051, lr=0.001]\ntensor(0.0018, device='cuda:0')\ntensor([[0.4077],\n[0.4093]], device='cuda:0')\nCurrent Norm : tensor([0.4070, 0.4084], device='cuda:0')\nSteps: 61%|██████ | 608/1000 [06:27<04:08, 1.58it/s, loss=0.0035, lr=0.001] \nSteps: 61%|██████ | 609/1000 [06:28<04:10, 1.56it/s, loss=0.0035, lr=0.001]\nSteps: 61%|██████ | 609/1000 [06:28<04:10, 1.56it/s, loss=0.00137, lr=0.001]\nSteps: 61%|██████ | 610/1000 [06:29<04:07, 1.58it/s, loss=0.00137, lr=0.001]\ntensor(0.0033, device='cuda:0')\ntensor([[0.4077],\n [0.4092]], device='cuda:0')\nCurrent Norm : tensor([0.4069, 0.4083], device='cuda:0')\nSteps: 61%|██████ | 610/1000 [06:29<04:07, 1.58it/s, loss=0.00724, lr=0.001]\nSteps: 61%|██████ | 611/1000 [06:29<04:08, 1.57it/s, loss=0.00724, lr=0.001]\nSteps: 61%|██████ | 611/1000 [06:29<04:08, 1.57it/s, loss=0.00916, lr=0.001]\nSteps: 61%|██████ | 612/1000 [06:30<04:05, 1.58it/s, loss=0.00916, lr=0.001]\ntensor(0.0041, device='cuda:0')\ntensor([[0.4076],\n[0.4091]], device='cuda:0')\nCurrent Norm : tensor([0.4069, 0.4082], device='cuda:0')\nSteps: 61%|██████ | 612/1000 [06:30<04:05, 1.58it/s, loss=0.00901, lr=0.001]\nSteps: 61%|██████▏ | 613/1000 [06:30<04:07, 1.56it/s, loss=0.00901, lr=0.001]\nSteps: 61%|██████▏ | 613/1000 [06:30<04:07, 1.56it/s, loss=0.0118, lr=0.001] \nSteps: 61%|██████▏ | 614/1000 [06:31<04:04, 1.58it/s, loss=0.0118, lr=0.001]\ntensor(0.0040, device='cuda:0')\ntensor([[0.4076],\n[0.4090]], device='cuda:0')\nCurrent Norm : tensor([0.4068, 0.4081], device='cuda:0')\nSteps: 61%|██████▏ | 614/1000 [06:31<04:04, 
1.58it/s, loss=0.000786, lr=0.001]\nSteps: 62%|██████▏ | 615/1000 [06:32<04:05, 1.57it/s, loss=0.000786, lr=0.001]\nSteps: 62%|██████▏ | 615/1000 [06:32<04:05, 1.57it/s, loss=0.126, lr=0.001] \nSteps: 62%|██████▏ | 616/1000 [06:32<04:03, 1.58it/s, loss=0.126, lr=0.001]\ntensor(0.0037, device='cuda:0')\ntensor([[0.4075],\n[0.4089]], device='cuda:0')\nCurrent Norm : tensor([0.4068, 0.4080], device='cuda:0')\nSteps: 62%|██████▏ | 616/1000 [06:32<04:03, 1.58it/s, loss=0.00252, lr=0.001]\nSteps: 62%|██████▏ | 617/1000 [06:33<04:05, 1.56it/s, loss=0.00252, lr=0.001]\nSteps: 62%|██████▏ | 617/1000 [06:33<04:05, 1.56it/s, loss=0.0111, lr=0.001] \nSteps: 62%|██████▏ | 618/1000 [06:34<04:02, 1.58it/s, loss=0.0111, lr=0.001]\ntensor(0.0063, device='cuda:0')\ntensor([[0.4075],\n[0.4087]], device='cuda:0')\nCurrent Norm : tensor([0.4068, 0.4078], device='cuda:0')\nSteps: 62%|██████▏ | 618/1000 [06:34<04:02, 1.58it/s, loss=0.00777, lr=0.001]\nSteps: 62%|██████▏ | 619/1000 [06:34<04:03, 1.56it/s, loss=0.00777, lr=0.001]\nSteps: 62%|██████▏ | 619/1000 [06:34<04:03, 1.56it/s, loss=0.00577, lr=0.001]\nSteps: 62%|██████▏ | 620/1000 [06:35<04:01, 1.57it/s, loss=0.00577, lr=0.001]\ntensor(0.0078, device='cuda:0')\ntensor([[0.4075],\n[0.4085]], device='cuda:0')\nCurrent Norm : tensor([0.4067, 0.4077], device='cuda:0')\nSteps: 62%|██████▏ | 620/1000 [06:35<04:01, 1.57it/s, loss=0.0139, lr=0.001] \nSteps: 62%|██████▏ | 621/1000 [06:36<04:02, 1.56it/s, loss=0.0139, lr=0.001]\nSteps: 62%|██████▏ | 621/1000 [06:36<04:02, 1.56it/s, loss=0.0724, lr=0.001]\nSteps: 62%|██████▏ | 622/1000 [06:36<04:00, 1.57it/s, loss=0.0724, lr=0.001]\ntensor(0.0048, device='cuda:0')\ntensor([[0.4075],\n[0.4084]], device='cuda:0')\nCurrent Norm : tensor([0.4067, 0.4075], device='cuda:0')\nSteps: 62%|██████▏ | 622/1000 [06:36<04:00, 1.57it/s, loss=0.000151, lr=0.001]\nSteps: 62%|██████▏ | 623/1000 [06:37<04:01, 1.56it/s, loss=0.000151, lr=0.001]\nSteps: 62%|██████▏ | 623/1000 [06:37<04:01, 1.56it/s, loss=0.0327, 
lr=0.001] \nSteps: 62%|██████▏ | 624/1000 [06:37<03:58, 1.57it/s, loss=0.0327, lr=0.001]\ntensor(0.0038, device='cuda:0')\ntensor([[0.4074],\n[0.4082]], device='cuda:0')\nCurrent Norm : tensor([0.4067, 0.4074], device='cuda:0')\nSteps: 62%|██████▏ | 624/1000 [06:37<03:58, 1.57it/s, loss=0.00154, lr=0.001]\nSteps: 62%|██████▎ | 625/1000 [06:38<04:00, 1.56it/s, loss=0.00154, lr=0.001]\nSteps: 62%|██████▎ | 625/1000 [06:38<04:00, 1.56it/s, loss=0.00364, lr=0.001]\nSteps: 63%|██████▎ | 626/1000 [06:39<03:57, 1.57it/s, loss=0.00364, lr=0.001]\ntensor(0.0008, device='cuda:0')\ntensor([[0.4074],\n[0.4080]], device='cuda:0')\nCurrent Norm : tensor([0.4067, 0.4072], device='cuda:0')\nSteps: 63%|██████▎ | 626/1000 [06:39<03:57, 1.57it/s, loss=0.000339, lr=0.001]\nSteps: 63%|██████▎ | 627/1000 [06:39<03:58, 1.56it/s, loss=0.000339, lr=0.001]\nSteps: 63%|██████▎ | 627/1000 [06:39<03:58, 1.56it/s, loss=0.0116, lr=0.001] \nSteps: 63%|██████▎ | 628/1000 [06:40<03:56, 1.57it/s, loss=0.0116, lr=0.001]\ntensor(0.0045, device='cuda:0')\ntensor([[0.4073],\n[0.4079]], device='cuda:0')\nCurrent Norm : tensor([0.4066, 0.4071], device='cuda:0')\nSteps: 63%|██████▎ | 628/1000 [06:40<03:56, 1.57it/s, loss=0.00402, lr=0.001]\nSteps: 63%|██████▎ | 629/1000 [06:41<03:57, 1.56it/s, loss=0.00402, lr=0.001]\nSteps: 63%|██████▎ | 629/1000 [06:41<03:57, 1.56it/s, loss=0.0083, lr=0.001] \nSteps: 63%|██████▎ | 630/1000 [06:41<03:55, 1.57it/s, loss=0.0083, lr=0.001]\ntensor(0.0057, device='cuda:0')\ntensor([[0.4073],\n[0.4077]], device='cuda:0')\nCurrent Norm : tensor([0.4066, 0.4069], device='cuda:0')\nSteps: 63%|██████▎ | 630/1000 [06:41<03:55, 1.57it/s, loss=0.00442, lr=0.001]\nSteps: 63%|██████▎ | 631/1000 [06:42<03:55, 1.57it/s, loss=0.00442, lr=0.001]\nSteps: 63%|██████▎ | 631/1000 [06:42<03:55, 1.57it/s, loss=0.00108, lr=0.001]\nSteps: 63%|██████▎ | 632/1000 [06:43<03:53, 1.58it/s, loss=0.00108, lr=0.001]\ntensor(0.0056, device='cuda:0')\ntensor([[0.4073],\n[0.4075]], device='cuda:0')\nCurrent 
Norm : tensor([0.4066, 0.4068], device='cuda:0')\nSteps: 63%|██████▎ | 632/1000 [06:43<03:53, 1.58it/s, loss=0.0475, lr=0.001] \nSteps: 63%|██████▎ | 633/1000 [06:43<03:54, 1.57it/s, loss=0.0475, lr=0.001]\nSteps: 63%|██████▎ | 633/1000 [06:43<03:54, 1.57it/s, loss=0.000338, lr=0.001]\nSteps: 63%|██████▎ | 634/1000 [06:44<03:52, 1.57it/s, loss=0.000338, lr=0.001]\ntensor(0.0017, device='cuda:0')\ntensor([[0.4073],\n[0.4074]], device='cuda:0')\nCurrent Norm : tensor([0.4066, 0.4067], device='cuda:0')\nSteps: 63%|██████▎ | 634/1000 [06:44<03:52, 1.57it/s, loss=0.00117, lr=0.001] \nSteps: 64%|██████▎ | 635/1000 [06:45<03:53, 1.56it/s, loss=0.00117, lr=0.001]\nSteps: 64%|██████▎ | 635/1000 [06:45<03:53, 1.56it/s, loss=0.000606, lr=0.001]\nSteps: 64%|██████▎ | 636/1000 [06:45<03:51, 1.57it/s, loss=0.000606, lr=0.001]\ntensor(0.0010, device='cuda:0')\ntensor([[0.4074],\n[0.4073]], device='cuda:0')\nCurrent Norm : tensor([0.4066, 0.4066], device='cuda:0')\nSteps: 64%|██████▎ | 636/1000 [06:45<03:51, 1.57it/s, loss=0.00112, lr=0.001] \nSteps: 64%|██████▎ | 637/1000 [06:46<03:52, 1.56it/s, loss=0.00112, lr=0.001]\nSteps: 64%|██████▎ | 637/1000 [06:46<03:52, 1.56it/s, loss=0.00041, lr=0.001]\nSteps: 64%|██████▍ | 638/1000 [06:46<03:49, 1.57it/s, loss=0.00041, lr=0.001]\ntensor(0.0074, device='cuda:0')\ntensor([[0.4073],\n[0.4072]], device='cuda:0')\nCurrent Norm : tensor([0.4066, 0.4065], device='cuda:0')\nSteps: 64%|██████▍ | 638/1000 [06:46<03:49, 1.57it/s, loss=0.0321, lr=0.001] \nSteps: 64%|██████▍ | 639/1000 [06:47<03:51, 1.56it/s, loss=0.0321, lr=0.001]\nSteps: 64%|██████▍ | 639/1000 [06:47<03:51, 1.56it/s, loss=0.0054, lr=0.001]\nSteps: 64%|██████▍ | 640/1000 [06:48<03:48, 1.58it/s, loss=0.0054, lr=0.001]\ntensor(0.0047, device='cuda:0')\ntensor([[0.4073],\n[0.4071]], device='cuda:0')\nCurrent Norm : tensor([0.4066, 0.4064], device='cuda:0')\nSteps: 64%|██████▍ | 640/1000 [06:48<03:48, 1.58it/s, loss=0.00463, lr=0.001]\nSteps: 64%|██████▍ | 641/1000 [06:48<03:49, 
1.57it/s, loss=0.00463, lr=0.001]\nSteps: 64%|██████▍ | 641/1000 [06:48<03:49, 1.57it/s, loss=0.00217, lr=0.001]\nSteps: 64%|██████▍ | 642/1000 [06:49<03:46, 1.58it/s, loss=0.00217, lr=0.001]\ntensor(0.0053, device='cuda:0')\ntensor([[0.4073],\n[0.4070]], device='cuda:0')\nCurrent Norm : tensor([0.4066, 0.4063], device='cuda:0')\nSteps: 64%|██████▍ | 642/1000 [06:49<03:46, 1.58it/s, loss=0.0276, lr=0.001] \nSteps: 64%|██████▍ | 643/1000 [06:50<03:47, 1.57it/s, loss=0.0276, lr=0.001]\nSteps: 64%|██████▍ | 643/1000 [06:50<03:47, 1.57it/s, loss=0.0123, lr=0.001]\nSteps: 64%|██████▍ | 644/1000 [06:50<03:45, 1.58it/s, loss=0.0123, lr=0.001]\ntensor(0.0077, device='cuda:0')\ntensor([[0.4072],\n[0.4069]], device='cuda:0')\nCurrent Norm : tensor([0.4065, 0.4062], device='cuda:0')\nSteps: 64%|██████▍ | 644/1000 [06:50<03:45, 1.58it/s, loss=0.000329, lr=0.001]\nSteps: 64%|██████▍ | 645/1000 [06:51<03:46, 1.57it/s, loss=0.000329, lr=0.001]\nSteps: 64%|██████▍ | 645/1000 [06:51<03:46, 1.57it/s, loss=0.0394, lr=0.001] \nSteps: 65%|██████▍ | 646/1000 [06:52<03:44, 1.58it/s, loss=0.0394, lr=0.001]\ntensor(0.0112, device='cuda:0')\ntensor([[0.4072],\n[0.4069]], device='cuda:0')\nCurrent Norm : tensor([0.4065, 0.4062], device='cuda:0')\nSteps: 65%|██████▍ | 646/1000 [06:52<03:44, 1.58it/s, loss=0.0201, lr=0.001]\nSteps: 65%|██████▍ | 647/1000 [06:52<03:45, 1.57it/s, loss=0.0201, lr=0.001]\nSteps: 65%|██████▍ | 647/1000 [06:52<03:45, 1.57it/s, loss=0.0107, lr=0.001]\nSteps: 65%|██████▍ | 648/1000 [06:53<03:42, 1.58it/s, loss=0.0107, lr=0.001]\ntensor(0.0023, device='cuda:0')\ntensor([[0.4073],\n[0.4070]], device='cuda:0')\nCurrent Norm : tensor([0.4065, 0.4063], device='cuda:0')\nSteps: 65%|██████▍ | 648/1000 [06:53<03:42, 1.58it/s, loss=0.00166, lr=0.001]\nSteps: 65%|██████▍ | 649/1000 [06:53<03:43, 1.57it/s, loss=0.00166, lr=0.001]\nSteps: 65%|██████▍ | 649/1000 [06:53<03:43, 1.57it/s, loss=0.00491, lr=0.001]\nSteps: 65%|██████▌ | 650/1000 [06:54<03:41, 1.58it/s, loss=0.00491, 
lr=0.001]\ntensor(0.0040, device='cuda:0')\ntensor([[0.4073],\n[0.4071]], device='cuda:0')\nCurrent Norm : tensor([0.4065, 0.4064], device='cuda:0')\nSteps: 65%|██████▌ | 650/1000 [06:54<03:41, 1.58it/s, loss=0.00333, lr=0.001]\nSteps: 65%|██████▌ | 651/1000 [06:55<03:43, 1.56it/s, loss=0.00333, lr=0.001]\nSteps: 65%|██████▌ | 651/1000 [06:55<03:43, 1.56it/s, loss=0.000373, lr=0.001]\nSteps: 65%|██████▌ | 652/1000 [06:55<03:42, 1.57it/s, loss=0.000373, lr=0.001]\ntensor(0.0067, device='cuda:0')\ntensor([[0.4073],\n[0.4072]], device='cuda:0')\nCurrent Norm : tensor([0.4066, 0.4065], device='cuda:0')\nSteps: 65%|██████▌ | 652/1000 [06:55<03:42, 1.57it/s, loss=0.0434, lr=0.001] \nSteps: 65%|██████▌ | 653/1000 [06:56<03:42, 1.56it/s, loss=0.0434, lr=0.001]\nSteps: 65%|██████▌ | 653/1000 [06:56<03:42, 1.56it/s, loss=0.00562, lr=0.001]\nSteps: 65%|██████▌ | 654/1000 [06:57<03:39, 1.57it/s, loss=0.00562, lr=0.001]\ntensor(0.0015, device='cuda:0')\ntensor([[0.4073],\n[0.4074]], device='cuda:0')\nCurrent Norm : tensor([0.4066, 0.4067], device='cuda:0')\nSteps: 65%|██████▌ | 654/1000 [06:57<03:39, 1.57it/s, loss=0.000348, lr=0.001]\nSteps: 66%|██████▌ | 655/1000 [06:57<03:40, 1.56it/s, loss=0.000348, lr=0.001]\nSteps: 66%|██████▌ | 655/1000 [06:57<03:40, 1.56it/s, loss=0.00938, lr=0.001] \nSteps: 66%|██████▌ | 656/1000 [06:58<03:37, 1.58it/s, loss=0.00938, lr=0.001]\ntensor(0.0103, device='cuda:0')\ntensor([[0.4074],\n[0.4076]], device='cuda:0')\nCurrent Norm : tensor([0.4066, 0.4069], device='cuda:0')\nSteps: 66%|██████▌ | 656/1000 [06:58<03:37, 1.58it/s, loss=0.0624, lr=0.001] \nSteps: 66%|██████▌ | 657/1000 [06:59<03:38, 1.57it/s, loss=0.0624, lr=0.001]\nSteps: 66%|██████▌ | 657/1000 [06:59<03:38, 1.57it/s, loss=0.0186, lr=0.001]\nSteps: 66%|██████▌ | 658/1000 [06:59<03:36, 1.58it/s, loss=0.0186, lr=0.001]\ntensor(0.0079, device='cuda:0')\ntensor([[0.4075],\n[0.4079]], device='cuda:0')\nCurrent Norm : tensor([0.4067, 0.4071], device='cuda:0')\nSteps: 66%|██████▌ | 
658/1000 [06:59<03:36, 1.58it/s, loss=0.00454, lr=0.001]\nSteps: 66%|██████▌ | 659/1000 [07:00<03:37, 1.56it/s, loss=0.00454, lr=0.001]\nSteps: 66%|██████▌ | 659/1000 [07:00<03:37, 1.56it/s, loss=0.0198, lr=0.001] \nSteps: 66%|██████▌ | 660/1000 [07:00<03:35, 1.58it/s, loss=0.0198, lr=0.001]\ntensor(0.0076, device='cuda:0')\ntensor([[0.4076],\n[0.4083]], device='cuda:0')\nCurrent Norm : tensor([0.4068, 0.4074], device='cuda:0')\nSteps: 66%|██████▌ | 660/1000 [07:00<03:35, 1.58it/s, loss=0.00106, lr=0.001]\nSteps: 66%|██████▌ | 661/1000 [07:01<03:37, 1.56it/s, loss=0.00106, lr=0.001]\nSteps: 66%|██████▌ | 661/1000 [07:01<03:37, 1.56it/s, loss=0.0202, lr=0.001] \nSteps: 66%|██████▌ | 662/1000 [07:02<03:37, 1.56it/s, loss=0.0202, lr=0.001]\ntensor(0.0060, device='cuda:0')\ntensor([[0.4076],\n[0.4086]], device='cuda:0')\nCurrent Norm : tensor([0.4069, 0.4077], device='cuda:0')\nSteps: 66%|██████▌ | 662/1000 [07:02<03:37, 1.56it/s, loss=0.0313, lr=0.001]\nSteps: 66%|██████▋ | 663/1000 [07:02<03:38, 1.54it/s, loss=0.0313, lr=0.001]\nSteps: 66%|██████▋ | 663/1000 [07:02<03:38, 1.54it/s, loss=0.00359, lr=0.001]\nSteps: 66%|██████▋ | 664/1000 [07:03<03:35, 1.56it/s, loss=0.00359, lr=0.001]\ntensor(0.0071, device='cuda:0')\ntensor([[0.4077],\n[0.4088]], device='cuda:0')\nCurrent Norm : tensor([0.4069, 0.4079], device='cuda:0')\nSteps: 66%|██████▋ | 664/1000 [07:03<03:35, 1.56it/s, loss=0.00739, lr=0.001]\nSteps: 66%|██████▋ | 665/1000 [07:04<03:36, 1.55it/s, loss=0.00739, lr=0.001]\nSteps: 66%|██████▋ | 665/1000 [07:04<03:36, 1.55it/s, loss=0.0146, lr=0.001] \nSteps: 67%|██████▋ | 666/1000 [07:04<03:33, 1.56it/s, loss=0.0146, lr=0.001]\ntensor(0.0082, device='cuda:0')\ntensor([[0.4079],\n[0.4090]], device='cuda:0')\nCurrent Norm : tensor([0.4071, 0.4081], device='cuda:0')\nSteps: 67%|██████▋ | 666/1000 [07:04<03:33, 1.56it/s, loss=0.00113, lr=0.001]\nSteps: 67%|██████▋ | 667/1000 [07:05<03:33, 1.56it/s, loss=0.00113, lr=0.001]\nSteps: 67%|██████▋ | 667/1000 [07:05<03:33, 
1.56it/s, loss=0.0186, lr=0.001] \nSteps: 67%|██████▋ | 668/1000 [07:06<03:31, 1.57it/s, loss=0.0186, lr=0.001]\ntensor(0.0055, device='cuda:0')\ntensor([[0.4081],\n[0.4092]], device='cuda:0')\nCurrent Norm : tensor([0.4073, 0.4082], device='cuda:0')\nSteps: 67%|██████▋ | 668/1000 [07:06<03:31, 1.57it/s, loss=0.0289, lr=0.001]\nSteps: 67%|██████▋ | 669/1000 [07:06<03:32, 1.56it/s, loss=0.0289, lr=0.001]\nSteps: 67%|██████▋ | 669/1000 [07:06<03:32, 1.56it/s, loss=0.0101, lr=0.001]\nSteps: 67%|██████▋ | 670/1000 [07:07<03:29, 1.57it/s, loss=0.0101, lr=0.001]\ntensor(0.0028, device='cuda:0')\ntensor([[0.4083],\n[0.4093]], device='cuda:0')\nCurrent Norm : tensor([0.4075, 0.4084], device='cuda:0')\nSteps: 67%|██████▋ | 670/1000 [07:07<03:29, 1.57it/s, loss=0.00419, lr=0.001]\nSteps: 67%|██████▋ | 671/1000 [07:08<03:31, 1.56it/s, loss=0.00419, lr=0.001]\nSteps: 67%|██████▋ | 671/1000 [07:08<03:31, 1.56it/s, loss=0.00952, lr=0.001]\nSteps: 67%|██████▋ | 672/1000 [07:08<03:28, 1.57it/s, loss=0.00952, lr=0.001]\ntensor(0.0031, device='cuda:0')\ntensor([[0.4086],\n[0.4095]], device='cuda:0')\nCurrent Norm : tensor([0.4077, 0.4085], device='cuda:0')\nSteps: 67%|██████▋ | 672/1000 [07:08<03:28, 1.57it/s, loss=0.0168, lr=0.001] \nSteps: 67%|██████▋ | 673/1000 [07:09<03:29, 1.56it/s, loss=0.0168, lr=0.001]\nSteps: 67%|██████▋ | 673/1000 [07:09<03:29, 1.56it/s, loss=0.00155, lr=0.001]\nSteps: 67%|██████▋ | 674/1000 [07:09<03:26, 1.58it/s, loss=0.00155, lr=0.001]\ntensor(0.0086, device='cuda:0')\ntensor([[0.4087],\n[0.4096]], device='cuda:0')\nCurrent Norm : tensor([0.4079, 0.4086], device='cuda:0')\nSteps: 67%|██████▋ | 674/1000 [07:09<03:26, 1.58it/s, loss=0.0634, lr=0.001] \nSteps: 68%|██████▊ | 675/1000 [07:10<03:28, 1.56it/s, loss=0.0634, lr=0.001]\nSteps: 68%|██████▊ | 675/1000 [07:10<03:28, 1.56it/s, loss=0.0215, lr=0.001]\nSteps: 68%|██████▊ | 676/1000 [07:11<03:26, 1.57it/s, loss=0.0215, lr=0.001]\ntensor(0.0037, device='cuda:0')\ntensor([[0.4089],\n[0.4097]], 
device='cuda:0')\nCurrent Norm : tensor([0.4080, 0.4087], device='cuda:0')\nSteps: 68%|██████▊ | 676/1000 [07:11<03:26, 1.57it/s, loss=0.0023, lr=0.001]\nSteps: 68%|██████▊ | 677/1000 [07:11<03:27, 1.56it/s, loss=0.0023, lr=0.001]\nSteps: 68%|██████▊ | 677/1000 [07:11<03:27, 1.56it/s, loss=0.000175, lr=0.001]\nSteps: 68%|██████▊ | 678/1000 [07:12<03:24, 1.57it/s, loss=0.000175, lr=0.001]\ntensor(0.0088, device='cuda:0')\ntensor([[0.4089],\n[0.4097]], device='cuda:0')\nCurrent Norm : tensor([0.4080, 0.4087], device='cuda:0')\nSteps: 68%|██████▊ | 678/1000 [07:12<03:24, 1.57it/s, loss=0.00022, lr=0.001] \nSteps: 68%|██████▊ | 679/1000 [07:13<03:25, 1.56it/s, loss=0.00022, lr=0.001]\nSteps: 68%|██████▊ | 679/1000 [07:13<03:25, 1.56it/s, loss=0.0503, lr=0.001] \nSteps: 68%|██████▊ | 680/1000 [07:13<03:23, 1.58it/s, loss=0.0503, lr=0.001]\ntensor(0.0076, device='cuda:0')\ntensor([[0.4090],\n[0.4097]], device='cuda:0')\nCurrent Norm : tensor([0.4081, 0.4088], device='cuda:0')\nSteps: 68%|██████▊ | 680/1000 [07:13<03:23, 1.58it/s, loss=0.0464, lr=0.001]\nSteps: 68%|██████▊ | 681/1000 [07:14<03:23, 1.56it/s, loss=0.0464, lr=0.001]\nSteps: 68%|██████▊ | 681/1000 [07:14<03:23, 1.56it/s, loss=0.000616, lr=0.001]\nSteps: 68%|██████▊ | 682/1000 [07:14<03:21, 1.58it/s, loss=0.000616, lr=0.001]\ntensor(0.0034, device='cuda:0')\ntensor([[0.4091],\n[0.4098]], device='cuda:0')\nCurrent Norm : tensor([0.4082, 0.4088], device='cuda:0')\nSteps: 68%|██████▊ | 682/1000 [07:14<03:21, 1.58it/s, loss=0.000937, lr=0.001]\nSteps: 68%|██████▊ | 683/1000 [07:15<03:21, 1.57it/s, loss=0.000937, lr=0.001]\nSteps: 68%|██████▊ | 683/1000 [07:15<03:21, 1.57it/s, loss=0.00264, lr=0.001] \nSteps: 68%|██████▊ | 684/1000 [07:16<03:19, 1.58it/s, loss=0.00264, lr=0.001]\ntensor(0.0082, device='cuda:0')\ntensor([[0.4093],\n[0.4098]], device='cuda:0')\nCurrent Norm : tensor([0.4084, 0.4088], device='cuda:0')\nSteps: 68%|██████▊ | 684/1000 [07:16<03:19, 1.58it/s, loss=0.068, lr=0.001] \nSteps: 68%|██████▊ | 
685/1000 [07:16<03:19, 1.58it/s, loss=0.068, lr=0.001]\nSteps: 68%|██████▊ | 685/1000 [07:16<03:19, 1.58it/s, loss=0.000431, lr=0.001]\nSteps: 69%|██████▊ | 686/1000 [07:17<03:18, 1.59it/s, loss=0.000431, lr=0.001]\ntensor(0.0094, device='cuda:0')\ntensor([[0.4094],\n[0.4099]], device='cuda:0')\nCurrent Norm : tensor([0.4085, 0.4089], device='cuda:0')\nSteps: 69%|██████▊ | 686/1000 [07:17<03:18, 1.59it/s, loss=0.0451, lr=0.001] \nSteps: 69%|██████▊ | 687/1000 [07:18<03:19, 1.57it/s, loss=0.0451, lr=0.001]\nSteps: 69%|██████▊ | 687/1000 [07:18<03:19, 1.57it/s, loss=0.000414, lr=0.001]\nSteps: 69%|██████▉ | 688/1000 [07:18<03:17, 1.58it/s, loss=0.000414, lr=0.001]\ntensor(0.0021, device='cuda:0')\ntensor([[0.4095],\n[0.4101]], device='cuda:0')\nCurrent Norm : tensor([0.4085, 0.4090], device='cuda:0')\nSteps: 69%|██████▉ | 688/1000 [07:18<03:17, 1.58it/s, loss=0.00126, lr=0.001] \nSteps: 69%|██████▉ | 689/1000 [07:19<03:18, 1.57it/s, loss=0.00126, lr=0.001]\nSteps: 69%|██████▉ | 689/1000 [07:19<03:18, 1.57it/s, loss=0.000966, lr=0.001]\nSteps: 69%|██████▉ | 690/1000 [07:20<03:16, 1.58it/s, loss=0.000966, lr=0.001]\ntensor(0.0064, device='cuda:0')\ntensor([[0.4095],\n[0.4101]], device='cuda:0')\nCurrent Norm : tensor([0.4086, 0.4091], device='cuda:0')\nSteps: 69%|██████▉ | 690/1000 [07:20<03:16, 1.58it/s, loss=0.0196, lr=0.001] \nSteps: 69%|██████▉ | 691/1000 [07:20<03:17, 1.56it/s, loss=0.0196, lr=0.001]\nSteps: 69%|██████▉ | 691/1000 [07:20<03:17, 1.56it/s, loss=0.000678, lr=0.001]\nSteps: 69%|██████▉ | 692/1000 [07:21<03:15, 1.57it/s, loss=0.000678, lr=0.001]\ntensor(0.0071, device='cuda:0')\ntensor([[0.4096],\n [0.4102]], device='cuda:0')\nCurrent Norm : tensor([0.4087, 0.4092], device='cuda:0')\nSteps: 69%|██████▉ | 692/1000 [07:21<03:15, 1.57it/s, loss=0.0733, lr=0.001] \nSteps: 69%|██████▉ | 693/1000 [07:21<03:16, 1.56it/s, loss=0.0733, lr=0.001]\nSteps: 69%|██████▉ | 693/1000 [07:21<03:16, 1.56it/s, loss=0.000746, lr=0.001]\nSteps: 69%|██████▉ | 694/1000 
[07:22<03:14, 1.57it/s, loss=0.000746, lr=0.001]\ntensor(0.0059, device='cuda:0')\ntensor([[0.4097],\n[0.4103]], device='cuda:0')\nCurrent Norm : tensor([0.4087, 0.4093], device='cuda:0')\nSteps: 69%|██████▉ | 694/1000 [07:22<03:14, 1.57it/s, loss=0.0629, lr=0.001] \nSteps: 70%|██████▉ | 695/1000 [07:23<03:15, 1.56it/s, loss=0.0629, lr=0.001]\nSteps: 70%|██████▉ | 695/1000 [07:23<03:15, 1.56it/s, loss=0.00226, lr=0.001]\nSteps: 70%|██████▉ | 696/1000 [07:23<03:13, 1.57it/s, loss=0.00226, lr=0.001]\ntensor(0.0129, device='cuda:0')\ntensor([[0.4097],\n[0.4104]], device='cuda:0')\nCurrent Norm : tensor([0.4087, 0.4094], device='cuda:0')\nSteps: 70%|██████▉ | 696/1000 [07:23<03:13, 1.57it/s, loss=0.0491, lr=0.001] \nSteps: 70%|██████▉ | 697/1000 [07:24<03:13, 1.56it/s, loss=0.0491, lr=0.001]\nSteps: 70%|██████▉ | 697/1000 [07:24<03:13, 1.56it/s, loss=0.0576, lr=0.001]\nSteps: 70%|██████▉ | 698/1000 [07:25<03:10, 1.58it/s, loss=0.0576, lr=0.001]\ntensor(0.0081, device='cuda:0')\ntensor([[0.4098],\n[0.4106]], device='cuda:0')\nCurrent Norm : tensor([0.4088, 0.4095], device='cuda:0')\nSteps: 70%|██████▉ | 698/1000 [07:25<03:10, 1.58it/s, loss=0.0844, lr=0.001]\nSteps: 70%|██████▉ | 699/1000 [07:25<03:11, 1.57it/s, loss=0.0844, lr=0.001]\nSteps: 70%|██████▉ | 699/1000 [07:25<03:11, 1.57it/s, loss=0.00121, lr=0.001]\nSteps: 70%|███████ | 700/1000 [07:26<03:09, 1.58it/s, loss=0.00121, lr=0.001]\nCurrent Learned Embeddings for <s1>:, id 49408 tensor([ 0.0145, -0.0087, -0.0056, -0.0218], device='cuda:0',\ngrad_fn=<SliceBackward0>)\nCurrent Learned Embeddings for <s2>:, id 49409 tensor([ 0.0014, 0.0039, -0.0050, 0.0112], device='cuda:0',\ngrad_fn=<SliceBackward0>)\nSaving weights to checkpoints/step_inv_700.safetensors\ntensor(0.0117, device='cuda:0')\ntensor([[0.4100],\n[0.4108]], device='cuda:0')\nCurrent Norm : tensor([0.4090, 0.4097], device='cuda:0')\nSteps: 70%|███████ | 700/1000 [07:26<03:09, 1.58it/s, loss=0.00726, lr=0.001]\nSteps: 70%|███████ | 701/1000 [07:27<03:12, 
1.55it/s, loss=0.00726, lr=0.001]\nSteps: 70%|███████ | 701/1000 [07:27<03:12, 1.55it/s, loss=0.0183, lr=0.001] \nSteps: 70%|███████ | 702/1000 [07:27<03:09, 1.57it/s, loss=0.0183, lr=0.001]\ntensor(0.0035, device='cuda:0')\ntensor([[0.4104],\n[0.4111]], device='cuda:0')\nCurrent Norm : tensor([0.4093, 0.4100], device='cuda:0')\nSteps: 70%|███████ | 702/1000 [07:27<03:09, 1.57it/s, loss=0.00239, lr=0.001]\nSteps: 70%|███████ | 703/1000 [07:28<03:10, 1.56it/s, loss=0.00239, lr=0.001]\nSteps: 70%|███████ | 703/1000 [07:28<03:10, 1.56it/s, loss=0.00413, lr=0.001]\nSteps: 70%|███████ | 704/1000 [07:28<03:08, 1.57it/s, loss=0.00413, lr=0.001]\ntensor(0.0027, device='cuda:0')\ntensor([[0.4107],\n[0.4114]], device='cuda:0')\nCurrent Norm : tensor([0.4096, 0.4102], device='cuda:0')\nSteps: 70%|███████ | 704/1000 [07:28<03:08, 1.57it/s, loss=0.00356, lr=0.001]\nSteps: 70%|███████ | 705/1000 [07:29<03:08, 1.56it/s, loss=0.00356, lr=0.001]\nSteps: 70%|███████ | 705/1000 [07:29<03:08, 1.56it/s, loss=0.0161, lr=0.001] \nSteps: 71%|███████ | 706/1000 [07:30<03:07, 1.56it/s, loss=0.0161, lr=0.001]\ntensor(0.0035, device='cuda:0')\ntensor([[0.4111],\n[0.4116]], device='cuda:0')\nCurrent Norm : tensor([0.4099, 0.4104], device='cuda:0')\nSteps: 71%|███████ | 706/1000 [07:30<03:07, 1.56it/s, loss=0.00678, lr=0.001]\nSteps: 71%|███████ | 707/1000 [07:30<03:09, 1.55it/s, loss=0.00678, lr=0.001]\nSteps: 71%|███████ | 707/1000 [07:30<03:09, 1.55it/s, loss=0.0347, lr=0.001] \nSteps: 71%|███████ | 708/1000 [07:31<03:06, 1.57it/s, loss=0.0347, lr=0.001]\ntensor(0.0042, device='cuda:0')\ntensor([[0.4114],\n[0.4118]], device='cuda:0')\nCurrent Norm : tensor([0.4102, 0.4106], device='cuda:0')\nSteps: 71%|███████ | 708/1000 [07:31<03:06, 1.57it/s, loss=0.000707, lr=0.001]\nSteps: 71%|███████ | 709/1000 [07:32<03:06, 1.56it/s, loss=0.000707, lr=0.001]\nSteps: 71%|███████ | 709/1000 [07:32<03:06, 1.56it/s, loss=0.0312, lr=0.001] \nSteps: 71%|███████ | 710/1000 [07:32<03:04, 1.58it/s, loss=0.0312, 
lr=0.001]\ntensor(0.0064, device='cuda:0')\ntensor([[0.4116],\n[0.4121]], device='cuda:0')\nCurrent Norm : tensor([0.4105, 0.4109], device='cuda:0')\nSteps: 71%|███████ | 710/1000 [07:32<03:04, 1.58it/s, loss=0.0412, lr=0.001]\nSteps: 71%|███████ | 711/1000 [07:33<03:05, 1.56it/s, loss=0.0412, lr=0.001]\nSteps: 71%|███████ | 711/1000 [07:33<03:05, 1.56it/s, loss=0.0127, lr=0.001]\nSteps: 71%|███████ | 712/1000 [07:34<03:03, 1.57it/s, loss=0.0127, lr=0.001]\ntensor(0.0076, device='cuda:0')\ntensor([[0.4118],\n[0.4123]], device='cuda:0')\nCurrent Norm : tensor([0.4107, 0.4111], device='cuda:0')\nSteps: 71%|███████ | 712/1000 [07:34<03:03, 1.57it/s, loss=0.035, lr=0.001] \nSteps: 71%|███████▏ | 713/1000 [07:34<03:04, 1.55it/s, loss=0.035, lr=0.001]\nSteps: 71%|███████▏ | 713/1000 [07:34<03:04, 1.55it/s, loss=0.00396, lr=0.001]\nSteps: 71%|███████▏ | 714/1000 [07:35<03:03, 1.56it/s, loss=0.00396, lr=0.001]\ntensor(0.0037, device='cuda:0')\ntensor([[0.4120],\n[0.4124]], device='cuda:0')\nCurrent Norm : tensor([0.4108, 0.4112], device='cuda:0')\nSteps: 71%|███████▏ | 714/1000 [07:35<03:03, 1.56it/s, loss=0.0197, lr=0.001] \nSteps: 72%|███████▏ | 715/1000 [07:36<03:03, 1.55it/s, loss=0.0197, lr=0.001]\nSteps: 72%|███████▏ | 715/1000 [07:36<03:03, 1.55it/s, loss=0.013, lr=0.001] \nSteps: 72%|███████▏ | 716/1000 [07:36<03:00, 1.57it/s, loss=0.013, lr=0.001]\ntensor(0.0049, device='cuda:0')\ntensor([[0.4121],\n[0.4126]], device='cuda:0')\nCurrent Norm : tensor([0.4109, 0.4113], device='cuda:0')\nSteps: 72%|███████▏ | 716/1000 [07:36<03:00, 1.57it/s, loss=0.0397, lr=0.001]\nSteps: 72%|███████▏ | 717/1000 [07:37<03:01, 1.56it/s, loss=0.0397, lr=0.001]\nSteps: 72%|███████▏ | 717/1000 [07:37<03:01, 1.56it/s, loss=0.00506, lr=0.001]\nSteps: 72%|███████▏ | 718/1000 [07:37<02:59, 1.57it/s, loss=0.00506, lr=0.001]\ntensor(0.0030, device='cuda:0')\ntensor([[0.4122],\n[0.4126]], device='cuda:0')\nCurrent Norm : tensor([0.4110, 0.4114], device='cuda:0')\nSteps: 72%|███████▏ | 718/1000 
[07:37<02:59, 1.57it/s, loss=0.198, lr=0.001] \nSteps: 72%|███████▏ | 719/1000 [07:38<03:00, 1.56it/s, loss=0.198, lr=0.001]\nSteps: 72%|███████▏ | 719/1000 [07:38<03:00, 1.56it/s, loss=0.00696, lr=0.001]\nSteps: 72%|███████▏ | 720/1000 [07:39<02:57, 1.58it/s, loss=0.00696, lr=0.001]\ntensor(0.0022, device='cuda:0')\ntensor([[0.4123],\n[0.4126]], device='cuda:0')\nCurrent Norm : tensor([0.4110, 0.4114], device='cuda:0')\nSteps: 72%|███████▏ | 720/1000 [07:39<02:57, 1.58it/s, loss=0.000396, lr=0.001]\nSteps: 72%|███████▏ | 721/1000 [07:39<02:59, 1.56it/s, loss=0.000396, lr=0.001]\nSteps: 72%|███████▏ | 721/1000 [07:39<02:59, 1.56it/s, loss=0.00705, lr=0.001] \nSteps: 72%|███████▏ | 722/1000 [07:40<02:56, 1.57it/s, loss=0.00705, lr=0.001]\ntensor(0.0016, device='cuda:0')\ntensor([[0.4122],\n[0.4126]], device='cuda:0')\nCurrent Norm : tensor([0.4110, 0.4113], device='cuda:0')\nSteps: 72%|███████▏ | 722/1000 [07:40<02:56, 1.57it/s, loss=0.00332, lr=0.001]\nSteps: 72%|███████▏ | 723/1000 [07:41<02:57, 1.56it/s, loss=0.00332, lr=0.001]\nSteps: 72%|███████▏ | 723/1000 [07:41<02:57, 1.56it/s, loss=0.000789, lr=0.001]\nSteps: 72%|███████▏ | 724/1000 [07:41<02:55, 1.58it/s, loss=0.000789, lr=0.001]\ntensor(0.0021, device='cuda:0')\ntensor([[0.4121],\n[0.4125]], device='cuda:0')\nCurrent Norm : tensor([0.4109, 0.4112], device='cuda:0')\nSteps: 72%|███████▏ | 724/1000 [07:41<02:55, 1.58it/s, loss=0.00194, lr=0.001] \nSteps: 72%|███████▎ | 725/1000 [07:42<02:55, 1.57it/s, loss=0.00194, lr=0.001]\nSteps: 72%|███████▎ | 725/1000 [07:42<02:55, 1.57it/s, loss=0.00128, lr=0.001]\nSteps: 73%|███████▎ | 726/1000 [07:43<02:53, 1.58it/s, loss=0.00128, lr=0.001]\ntensor(0.0063, device='cuda:0')\ntensor([[0.4119],\n[0.4123]], device='cuda:0')\nCurrent Norm : tensor([0.4107, 0.4111], device='cuda:0')\nSteps: 73%|███████▎ | 726/1000 [07:43<02:53, 1.58it/s, loss=0.0698, lr=0.001] \nSteps: 73%|███████▎ | 727/1000 [07:43<02:53, 1.57it/s, loss=0.0698, lr=0.001]\nSteps: 73%|███████▎ | 727/1000 
[07:43<02:53, 1.57it/s, loss=0.00162, lr=0.001]\nSteps: 73%|███████▎ | 728/1000 [07:44<02:51, 1.59it/s, loss=0.00162, lr=0.001]\ntensor(0.0004, device='cuda:0')\ntensor([[0.4117],\n[0.4122]], device='cuda:0')\nCurrent Norm : tensor([0.4106, 0.4110], device='cuda:0')\nSteps: 73%|███████▎ | 728/1000 [07:44<02:51, 1.59it/s, loss=0.000498, lr=0.001]\nSteps: 73%|███████▎ | 729/1000 [07:44<02:52, 1.57it/s, loss=0.000498, lr=0.001]\nSteps: 73%|███████▎ | 729/1000 [07:44<02:52, 1.57it/s, loss=0.00026, lr=0.001] \nSteps: 73%|███████▎ | 730/1000 [07:45<02:50, 1.58it/s, loss=0.00026, lr=0.001]\ntensor(0.0013, device='cuda:0')\ntensor([[0.4115],\n[0.4120]], device='cuda:0')\nCurrent Norm : tensor([0.4104, 0.4108], device='cuda:0')\nSteps: 73%|███████▎ | 730/1000 [07:45<02:50, 1.58it/s, loss=0.0024, lr=0.001] \nSteps: 73%|███████▎ | 731/1000 [07:46<02:50, 1.57it/s, loss=0.0024, lr=0.001]\nSteps: 73%|███████▎ | 731/1000 [07:46<02:50, 1.57it/s, loss=0.000949, lr=0.001]\nSteps: 73%|███████▎ | 732/1000 [07:46<02:48, 1.59it/s, loss=0.000949, lr=0.001]\ntensor(0.0055, device='cuda:0')\ntensor([[0.4112],\n[0.4118]], device='cuda:0')\nCurrent Norm : tensor([0.4101, 0.4106], device='cuda:0')\nSteps: 73%|███████▎ | 732/1000 [07:46<02:48, 1.59it/s, loss=0.00115, lr=0.001] \nSteps: 73%|███████▎ | 733/1000 [07:47<02:50, 1.56it/s, loss=0.00115, lr=0.001]\nSteps: 73%|███████▎ | 733/1000 [07:47<02:50, 1.56it/s, loss=0.0166, lr=0.001] \nSteps: 73%|███████▎ | 734/1000 [07:48<02:48, 1.58it/s, loss=0.0166, lr=0.001]\ntensor(0.0084, device='cuda:0')\ntensor([[0.4110],\n[0.4115]], device='cuda:0')\nCurrent Norm : tensor([0.4099, 0.4104], device='cuda:0')\nSteps: 73%|███████▎ | 734/1000 [07:48<02:48, 1.58it/s, loss=0.0649, lr=0.001]\nSteps: 74%|███████▎ | 735/1000 [07:48<02:48, 1.57it/s, loss=0.0649, lr=0.001]\nSteps: 74%|███████▎ | 735/1000 [07:48<02:48, 1.57it/s, loss=0.0919, lr=0.001]\nSteps: 74%|███████▎ | 736/1000 [07:49<02:47, 1.58it/s, loss=0.0919, lr=0.001]\ntensor(0.0031, 
device='cuda:0')\ntensor([[0.4107],\n[0.4112]], device='cuda:0')\nCurrent Norm : tensor([0.4096, 0.4101], device='cuda:0')\nSteps: 74%|███████▎ | 736/1000 [07:49<02:47, 1.58it/s, loss=0.00105, lr=0.001]\nSteps: 74%|███████▎ | 737/1000 [07:50<02:48, 1.56it/s, loss=0.00105, lr=0.001]\nSteps: 74%|███████▎ | 737/1000 [07:50<02:48, 1.56it/s, loss=0.00841, lr=0.001]\nSteps: 74%|███████▍ | 738/1000 [07:50<02:45, 1.58it/s, loss=0.00841, lr=0.001]\ntensor(0.0033, device='cuda:0')\ntensor([[0.4104],\n[0.4110]], device='cuda:0')\nCurrent Norm : tensor([0.4094, 0.4099], device='cuda:0')\nSteps: 74%|███████▍ | 738/1000 [07:50<02:45, 1.58it/s, loss=0.00961, lr=0.001]\nSteps: 74%|███████▍ | 739/1000 [07:51<02:46, 1.57it/s, loss=0.00961, lr=0.001]\nSteps: 74%|███████▍ | 739/1000 [07:51<02:46, 1.57it/s, loss=0.00115, lr=0.001]\nSteps: 74%|███████▍ | 740/1000 [07:51<02:44, 1.58it/s, loss=0.00115, lr=0.001]\ntensor(0.0072, device='cuda:0')\ntensor([[0.4103],\n[0.4108]], device='cuda:0')\nCurrent Norm : tensor([0.4092, 0.4097], device='cuda:0')\nSteps: 74%|███████▍ | 740/1000 [07:51<02:44, 1.58it/s, loss=0.0325, lr=0.001] \nSteps: 74%|███████▍ | 741/1000 [07:52<02:44, 1.57it/s, loss=0.0325, lr=0.001]\nSteps: 74%|███████▍ | 741/1000 [07:52<02:44, 1.57it/s, loss=0.00165, lr=0.001]\nSteps: 74%|███████▍ | 742/1000 [07:53<02:42, 1.59it/s, loss=0.00165, lr=0.001]\ntensor(0.0063, device='cuda:0')\ntensor([[0.4101],\n[0.4106]], device='cuda:0')\nCurrent Norm : tensor([0.4091, 0.4095], device='cuda:0')\nSteps: 74%|███████▍ | 742/1000 [07:53<02:42, 1.59it/s, loss=0.0285, lr=0.001] \nSteps: 74%|███████▍ | 743/1000 [07:53<02:43, 1.57it/s, loss=0.0285, lr=0.001]\nSteps: 74%|███████▍ | 743/1000 [07:53<02:43, 1.57it/s, loss=0.00142, lr=0.001]\nSteps: 74%|███████▍ | 744/1000 [07:54<02:42, 1.58it/s, loss=0.00142, lr=0.001]\ntensor(0.0010, device='cuda:0')\ntensor([[0.4100],\n[0.4105]], device='cuda:0')\nCurrent Norm : tensor([0.4090, 0.4094], device='cuda:0')\nSteps: 74%|███████▍ | 744/1000 
[07:54<02:42, 1.58it/s, loss=0.017, lr=0.001] \nSteps: 74%|███████▍ | 745/1000 [07:55<02:43, 1.56it/s, loss=0.017, lr=0.001]\nSteps: 74%|███████▍ | 745/1000 [07:55<02:43, 1.56it/s, loss=0.00199, lr=0.001]\nSteps: 75%|███████▍ | 746/1000 [07:55<02:41, 1.57it/s, loss=0.00199, lr=0.001]\ntensor(0.0085, device='cuda:0')\ntensor([[0.4099],\n[0.4104]], device='cuda:0')\nCurrent Norm : tensor([0.4089, 0.4094], device='cuda:0')\nSteps: 75%|███████▍ | 746/1000 [07:55<02:41, 1.57it/s, loss=0.0158, lr=0.001] \nSteps: 75%|███████▍ | 747/1000 [07:56<02:41, 1.56it/s, loss=0.0158, lr=0.001]\nSteps: 75%|███████▍ | 747/1000 [07:56<02:41, 1.56it/s, loss=0.012, lr=0.001] \nSteps: 75%|███████▍ | 748/1000 [07:57<02:39, 1.58it/s, loss=0.012, lr=0.001]\ntensor(0.0029, device='cuda:0')\ntensor([[0.4098],\n[0.4104]], device='cuda:0')\nCurrent Norm : tensor([0.4089, 0.4094], device='cuda:0')\nSteps: 75%|███████▍ | 748/1000 [07:57<02:39, 1.58it/s, loss=0.0097, lr=0.001]\nSteps: 75%|███████▍ | 749/1000 [07:57<02:40, 1.56it/s, loss=0.0097, lr=0.001]\nSteps: 75%|███████▍ | 749/1000 [07:57<02:40, 1.56it/s, loss=0.00167, lr=0.001]\nSteps: 75%|███████▌ | 750/1000 [07:58<02:38, 1.58it/s, loss=0.00167, lr=0.001]\ntensor(0.0022, device='cuda:0')\ntensor([[0.4098],\n[0.4104]], device='cuda:0')\nCurrent Norm : tensor([0.4088, 0.4093], device='cuda:0')\nSteps: 75%|███████▌ | 750/1000 [07:58<02:38, 1.58it/s, loss=0.0012, lr=0.001] \nSteps: 75%|███████▌ | 751/1000 [07:58<02:39, 1.56it/s, loss=0.0012, lr=0.001]\nSteps: 75%|███████▌ | 751/1000 [07:58<02:39, 1.56it/s, loss=0.00566, lr=0.001]\nSteps: 75%|███████▌ | 752/1000 [07:59<02:37, 1.57it/s, loss=0.00566, lr=0.001]\ntensor(0.0034, device='cuda:0')\ntensor([[0.4097],\n[0.4103]], device='cuda:0')\nCurrent Norm : tensor([0.4087, 0.4093], device='cuda:0')\nSteps: 75%|███████▌ | 752/1000 [07:59<02:37, 1.57it/s, loss=0.000511, lr=0.001]\nSteps: 75%|███████▌ | 753/1000 [08:00<02:38, 1.56it/s, loss=0.000511, lr=0.001]\nSteps: 75%|███████▌ | 753/1000 
[08:00<02:38, 1.56it/s, loss=0.00294, lr=0.001] \nSteps: 75%|███████▌ | 754/1000 [08:00<02:36, 1.58it/s, loss=0.00294, lr=0.001]\ntensor(0.0042, device='cuda:0')\ntensor([[0.4096],\n[0.4102]], device='cuda:0')\nCurrent Norm : tensor([0.4086, 0.4092], device='cuda:0')\nSteps: 75%|███████▌ | 754/1000 [08:00<02:36, 1.58it/s, loss=0.00867, lr=0.001]\nSteps: 76%|███████▌ | 755/1000 [08:01<02:36, 1.57it/s, loss=0.00867, lr=0.001]\nSteps: 76%|███████▌ | 755/1000 [08:01<02:36, 1.57it/s, loss=0.013, lr=0.001] \nSteps: 76%|███████▌ | 756/1000 [08:02<02:34, 1.58it/s, loss=0.013, lr=0.001]\ntensor(0.0048, device='cuda:0')\ntensor([[0.4095],\n[0.4102]], device='cuda:0')\nCurrent Norm : tensor([0.4086, 0.4092], device='cuda:0')\nSteps: 76%|███████▌ | 756/1000 [08:02<02:34, 1.58it/s, loss=0.0197, lr=0.001]\nSteps: 76%|███████▌ | 757/1000 [08:02<02:34, 1.57it/s, loss=0.0197, lr=0.001]\nSteps: 76%|███████▌ | 757/1000 [08:02<02:34, 1.57it/s, loss=0.00643, lr=0.001]\nSteps: 76%|███████▌ | 758/1000 [08:03<02:32, 1.58it/s, loss=0.00643, lr=0.001]\ntensor(0.0008, device='cuda:0')\ntensor([[0.4094],\n[0.4101]], device='cuda:0')\nCurrent Norm : tensor([0.4084, 0.4091], device='cuda:0')\nSteps: 76%|███████▌ | 758/1000 [08:03<02:32, 1.58it/s, loss=0.000187, lr=0.001]\nSteps: 76%|███████▌ | 759/1000 [08:04<02:33, 1.57it/s, loss=0.000187, lr=0.001]\nSteps: 76%|███████▌ | 759/1000 [08:04<02:33, 1.57it/s, loss=0.00284, lr=0.001] \nSteps: 76%|███████▌ | 760/1000 [08:04<02:30, 1.59it/s, loss=0.00284, lr=0.001]\ntensor(0.0052, device='cuda:0')\ntensor([[0.4092],\n[0.4100]], device='cuda:0')\nCurrent Norm : tensor([0.4083, 0.4090], device='cuda:0')\nSteps: 76%|███████▌ | 760/1000 [08:04<02:30, 1.59it/s, loss=0.016, lr=0.001] \nSteps: 76%|███████▌ | 761/1000 [08:05<02:32, 1.56it/s, loss=0.016, lr=0.001]\nSteps: 76%|███████▌ | 761/1000 [08:05<02:32, 1.56it/s, loss=0.000488, lr=0.001]\nSteps: 76%|███████▌ | 762/1000 [08:05<02:31, 1.57it/s, loss=0.000488, lr=0.001]\ntensor(0.0043, 
device='cuda:0')\ntensor([[0.4090],\n[0.4100]], device='cuda:0')\nCurrent Norm : tensor([0.4081, 0.4090], device='cuda:0')\nSteps: 76%|███████▌ | 762/1000 [08:05<02:31, 1.57it/s, loss=0.000765, lr=0.001]\nSteps: 76%|███████▋ | 763/1000 [08:06<02:31, 1.56it/s, loss=0.000765, lr=0.001]\nSteps: 76%|███████▋ | 763/1000 [08:06<02:31, 1.56it/s, loss=0.0178, lr=0.001] \nSteps: 76%|███████▋ | 764/1000 [08:07<02:29, 1.58it/s, loss=0.0178, lr=0.001]\ntensor(0.0050, device='cuda:0')\ntensor([[0.4088],\n[0.4099]], device='cuda:0')\nCurrent Norm : tensor([0.4080, 0.4089], device='cuda:0')\nSteps: 76%|███████▋ | 764/1000 [08:07<02:29, 1.58it/s, loss=0.0123, lr=0.001]\nSteps: 76%|███████▋ | 765/1000 [08:07<02:29, 1.57it/s, loss=0.0123, lr=0.001]\nSteps: 76%|███████▋ | 765/1000 [08:07<02:29, 1.57it/s, loss=0.00927, lr=0.001]\nSteps: 77%|███████▋ | 766/1000 [08:08<02:27, 1.58it/s, loss=0.00927, lr=0.001]\ntensor(0.0081, device='cuda:0')\ntensor([[0.4087],\n[0.4099]], device='cuda:0')\nCurrent Norm : tensor([0.4078, 0.4089], device='cuda:0')\nSteps: 77%|███████▋ | 766/1000 [08:08<02:27, 1.58it/s, loss=0.0215, lr=0.001] \nSteps: 77%|███████▋ | 767/1000 [08:09<02:28, 1.57it/s, loss=0.0215, lr=0.001]\nSteps: 77%|███████▋ | 767/1000 [08:09<02:28, 1.57it/s, loss=0.0185, lr=0.001]\nSteps: 77%|███████▋ | 768/1000 [08:09<02:26, 1.58it/s, loss=0.0185, lr=0.001]\ntensor(0.0035, device='cuda:0')\ntensor([[0.4085],\n[0.4098]], device='cuda:0')\nCurrent Norm : tensor([0.4077, 0.4088], device='cuda:0')\nSteps: 77%|███████▋ | 768/1000 [08:09<02:26, 1.58it/s, loss=0.00787, lr=0.001]\nSteps: 77%|███████▋ | 769/1000 [08:10<02:27, 1.56it/s, loss=0.00787, lr=0.001]\nSteps: 77%|███████▋ | 769/1000 [08:10<02:27, 1.56it/s, loss=0.0034, lr=0.001] \nSteps: 77%|███████▋ | 770/1000 [08:11<02:26, 1.57it/s, loss=0.0034, lr=0.001]\ntensor(0.0059, device='cuda:0')\ntensor([[0.4084],\n[0.4098]], device='cuda:0')\nCurrent Norm : tensor([0.4075, 0.4088], device='cuda:0')\nSteps: 77%|███████▋ | 770/1000 [08:11<02:26, 
1.57it/s, loss=0.00167, lr=0.001]\nSteps: 77%|███████▋ | 771/1000 [08:11<02:27, 1.55it/s, loss=0.00167, lr=0.001]\nSteps: 77%|███████▋ | 771/1000 [08:11<02:27, 1.55it/s, loss=0.0461, lr=0.001] \nSteps: 77%|███████▋ | 772/1000 [08:12<02:25, 1.57it/s, loss=0.0461, lr=0.001]\ntensor(0.0028, device='cuda:0')\ntensor([[0.4082],\n[0.4097]], device='cuda:0')\nCurrent Norm : tensor([0.4074, 0.4087], device='cuda:0')\nSteps: 77%|███████▋ | 772/1000 [08:12<02:25, 1.57it/s, loss=0.0114, lr=0.001]\nSteps: 77%|███████▋ | 773/1000 [08:12<02:25, 1.56it/s, loss=0.0114, lr=0.001]\nSteps: 77%|███████▋ | 773/1000 [08:12<02:25, 1.56it/s, loss=0.00369, lr=0.001]\nSteps: 77%|███████▋ | 774/1000 [08:13<02:24, 1.57it/s, loss=0.00369, lr=0.001]\ntensor(0.0084, device='cuda:0')\ntensor([[0.4080],\n[0.4096]], device='cuda:0')\nCurrent Norm : tensor([0.4072, 0.4087], device='cuda:0')\nSteps: 77%|███████▋ | 774/1000 [08:13<02:24, 1.57it/s, loss=0.0384, lr=0.001] \nSteps: 78%|███████▊ | 775/1000 [08:14<02:24, 1.56it/s, loss=0.0384, lr=0.001]\nSteps: 78%|███████▊ | 775/1000 [08:14<02:24, 1.56it/s, loss=0.0432, lr=0.001]\nSteps: 78%|███████▊ | 776/1000 [08:14<02:22, 1.57it/s, loss=0.0432, lr=0.001]\ntensor(0.0095, device='cuda:0')\ntensor([[0.4079],\n[0.4096]], device='cuda:0')\nCurrent Norm : tensor([0.4071, 0.4087], device='cuda:0')\nSteps: 78%|███████▊ | 776/1000 [08:14<02:22, 1.57it/s, loss=0.00367, lr=0.001]\nSteps: 78%|███████▊ | 777/1000 [08:15<02:23, 1.56it/s, loss=0.00367, lr=0.001]\nSteps: 78%|███████▊ | 777/1000 [08:15<02:23, 1.56it/s, loss=0.0117, lr=0.001] \nSteps: 78%|███████▊ | 778/1000 [08:16<02:20, 1.57it/s, loss=0.0117, lr=0.001]\ntensor(0.0077, device='cuda:0')\ntensor([[0.4078],\n[0.4096]], device='cuda:0')\nCurrent Norm : tensor([0.4071, 0.4087], device='cuda:0')\nSteps: 78%|███████▊ | 778/1000 [08:16<02:20, 1.57it/s, loss=0.00418, lr=0.001]\nSteps: 78%|███████▊ | 779/1000 [08:16<02:21, 1.56it/s, loss=0.00418, lr=0.001]\nSteps: 78%|███████▊ | 779/1000 [08:16<02:21, 1.56it/s, 
loss=0.00894, lr=0.001]\nSteps: 78%|███████▊ | 780/1000 [08:17<02:19, 1.57it/s, loss=0.00894, lr=0.001]\ntensor(0.0021, device='cuda:0')\ntensor([[0.4078],\n[0.4097]], device='cuda:0')\nCurrent Norm : tensor([0.4070, 0.4087], device='cuda:0')\nSteps: 78%|███████▊ | 780/1000 [08:17<02:19, 1.57it/s, loss=0.00025, lr=0.001]\nSteps: 78%|███████▊ | 781/1000 [08:18<02:20, 1.56it/s, loss=0.00025, lr=0.001]\nSteps: 78%|███████▊ | 781/1000 [08:18<02:20, 1.56it/s, loss=0.0261, lr=0.001] \nSteps: 78%|███████▊ | 782/1000 [08:18<02:18, 1.57it/s, loss=0.0261, lr=0.001]\ntensor(0.0098, device='cuda:0')\ntensor([[0.4079],\n[0.4097]], device='cuda:0')\nCurrent Norm : tensor([0.4071, 0.4087], device='cuda:0')\nSteps: 78%|███████▊ | 782/1000 [08:18<02:18, 1.57it/s, loss=0.0055, lr=0.001]\nSteps: 78%|███████▊ | 783/1000 [08:19<02:18, 1.57it/s, loss=0.0055, lr=0.001]\nSteps: 78%|███████▊ | 783/1000 [08:19<02:18, 1.57it/s, loss=0.0204, lr=0.001]\nSteps: 78%|███████▊ | 784/1000 [08:19<02:16, 1.58it/s, loss=0.0204, lr=0.001]\ntensor(0.0030, device='cuda:0')\ntensor([[0.4080],\n[0.4097]], device='cuda:0')\nCurrent Norm : tensor([0.4072, 0.4088], device='cuda:0')\nSteps: 78%|███████▊ | 784/1000 [08:19<02:16, 1.58it/s, loss=0.0061, lr=0.001]\nSteps: 78%|███████▊ | 785/1000 [08:20<02:16, 1.57it/s, loss=0.0061, lr=0.001]\nSteps: 78%|███████▊ | 785/1000 [08:20<02:16, 1.57it/s, loss=0.0105, lr=0.001]\nSteps: 79%|███████▊ | 786/1000 [08:21<02:15, 1.58it/s, loss=0.0105, lr=0.001]\ntensor(0.0076, device='cuda:0')\ntensor([[0.4081],\n[0.4098]], device='cuda:0')\nCurrent Norm : tensor([0.4073, 0.4088], device='cuda:0')\nSteps: 79%|███████▊ | 786/1000 [08:21<02:15, 1.58it/s, loss=0.0608, lr=0.001]\nSteps: 79%|███████▊ | 787/1000 [08:21<02:17, 1.55it/s, loss=0.0608, lr=0.001]\nSteps: 79%|███████▊ | 787/1000 [08:21<02:17, 1.55it/s, loss=0.00173, lr=0.001]\nSteps: 79%|███████▉ | 788/1000 [08:22<02:14, 1.58it/s, loss=0.00173, lr=0.001]\ntensor(0.0013, device='cuda:0')\ntensor([[0.4083],\n[0.4099]], 
device='cuda:0')\nCurrent Norm : tensor([0.4075, 0.4089], device='cuda:0')\nSteps: 79%|███████▉ | 788/1000 [08:22<02:14, 1.58it/s, loss=0.00145, lr=0.001]\nSteps: 79%|███████▉ | 789/1000 [08:23<02:14, 1.56it/s, loss=0.00145, lr=0.001]\nSteps: 79%|███████▉ | 789/1000 [08:23<02:14, 1.56it/s, loss=0.00302, lr=0.001]\nSteps: 79%|███████▉ | 790/1000 [08:23<02:13, 1.58it/s, loss=0.00302, lr=0.001]\ntensor(0.0035, device='cuda:0')\ntensor([[0.4085],\n[0.4100]], device='cuda:0')\nCurrent Norm : tensor([0.4076, 0.4090], device='cuda:0')\nSteps: 79%|███████▉ | 790/1000 [08:23<02:13, 1.58it/s, loss=0.00383, lr=0.001]\nSteps: 79%|███████▉ | 791/1000 [08:24<02:13, 1.56it/s, loss=0.00383, lr=0.001]\nSteps: 79%|███████▉ | 791/1000 [08:24<02:13, 1.56it/s, loss=0.004, lr=0.001] \nSteps: 79%|███████▉ | 792/1000 [08:25<02:11, 1.58it/s, loss=0.004, lr=0.001]\ntensor(0.0059, device='cuda:0')\ntensor([[0.4085],\n[0.4101]], device='cuda:0')\nCurrent Norm : tensor([0.4077, 0.4091], device='cuda:0')\nSteps: 79%|███████▉ | 792/1000 [08:25<02:11, 1.58it/s, loss=0.0108, lr=0.001]\nSteps: 79%|███████▉ | 793/1000 [08:25<02:12, 1.56it/s, loss=0.0108, lr=0.001]\nSteps: 79%|███████▉ | 793/1000 [08:25<02:12, 1.56it/s, loss=0.00444, lr=0.001]\nSteps: 79%|███████▉ | 794/1000 [08:26<02:10, 1.57it/s, loss=0.00444, lr=0.001]\ntensor(0.0021, device='cuda:0')\ntensor([[0.4086],\n[0.4102]], device='cuda:0')\nCurrent Norm : tensor([0.4077, 0.4092], device='cuda:0')\nSteps: 79%|███████▉ | 794/1000 [08:26<02:10, 1.57it/s, loss=0.000504, lr=0.001]\nSteps: 80%|███████▉ | 795/1000 [08:26<02:11, 1.56it/s, loss=0.000504, lr=0.001]\nSteps: 80%|███████▉ | 795/1000 [08:26<02:11, 1.56it/s, loss=0.00754, lr=0.001] \nSteps: 80%|███████▉ | 796/1000 [08:27<02:09, 1.58it/s, loss=0.00754, lr=0.001]\ntensor(0.0015, device='cuda:0')\ntensor([[0.4086],\n[0.4102]], device='cuda:0')\nCurrent Norm : tensor([0.4077, 0.4092], device='cuda:0')\nSteps: 80%|███████▉ | 796/1000 [08:27<02:09, 1.58it/s, loss=0.000605, lr=0.001]\nSteps: 
80%|███████▉ | 797/1000 [08:28<02:09, 1.56it/s, loss=0.000605, lr=0.001]\nSteps: 80%|███████▉ | 797/1000 [08:28<02:09, 1.56it/s, loss=0.0101, lr=0.001] \nSteps: 80%|███████▉ | 798/1000 [08:28<02:08, 1.57it/s, loss=0.0101, lr=0.001]\ntensor(0.0057, device='cuda:0')\ntensor([[0.4085],\n[0.4102]], device='cuda:0')\nCurrent Norm : tensor([0.4076, 0.4091], device='cuda:0')\nSteps: 80%|███████▉ | 798/1000 [08:28<02:08, 1.57it/s, loss=0.0274, lr=0.001]\nSteps: 80%|███████▉ | 799/1000 [08:29<02:08, 1.56it/s, loss=0.0274, lr=0.001]\nSteps: 80%|███████▉ | 799/1000 [08:29<02:08, 1.56it/s, loss=0.00328, lr=0.001]\nSteps: 80%|████████ | 800/1000 [08:30<02:07, 1.57it/s, loss=0.00328, lr=0.001]\nCurrent Learned Embeddings for <s1>:, id 49408 tensor([ 0.0130, 0.0020, -0.0095, -0.0193], device='cuda:0',\ngrad_fn=<SliceBackward0>)\nCurrent Learned Embeddings for <s2>:, id 49409 tensor([-0.0026, 0.0098, -0.0090, 0.0132], device='cuda:0',\ngrad_fn=<SliceBackward0>)\nSaving weights to checkpoints/step_inv_800.safetensors\ntensor(0.0061, device='cuda:0')\ntensor([[0.4083],\n[0.4101]], device='cuda:0')\nCurrent Norm : tensor([0.4075, 0.4091], device='cuda:0')\nSteps: 80%|████████ | 800/1000 [08:30<02:07, 1.57it/s, loss=0.00261, lr=0.001]\nSteps: 80%|████████ | 801/1000 [08:30<02:07, 1.56it/s, loss=0.00261, lr=0.001]\nSteps: 80%|████████ | 801/1000 [08:30<02:07, 1.56it/s, loss=0.149, lr=0.001] \nSteps: 80%|████████ | 802/1000 [08:31<02:05, 1.58it/s, loss=0.149, lr=0.001]\ntensor(0.0048, device='cuda:0')\ntensor([[0.4082],\n[0.4100]], device='cuda:0')\nCurrent Norm : tensor([0.4074, 0.4090], device='cuda:0')\nSteps: 80%|████████ | 802/1000 [08:31<02:05, 1.58it/s, loss=0.0241, lr=0.001]\nSteps: 80%|████████ | 803/1000 [08:32<02:05, 1.57it/s, loss=0.0241, lr=0.001]\nSteps: 80%|████████ | 803/1000 [08:32<02:05, 1.57it/s, loss=0.000268, lr=0.001]\nSteps: 80%|████████ | 804/1000 [08:32<02:03, 1.58it/s, loss=0.000268, lr=0.001]\ntensor(0.0061, device='cuda:0')\ntensor([[0.4080],\n[0.4100]], 
device='cuda:0')\nCurrent Norm : tensor([0.4072, 0.4090], device='cuda:0')\nSteps: 80%|████████ | 804/1000 [08:32<02:03, 1.58it/s, loss=0.0149, lr=0.001] \nSteps: 80%|████████ | 805/1000 [08:33<02:03, 1.58it/s, loss=0.0149, lr=0.001]\nSteps: 80%|████████ | 805/1000 [08:33<02:03, 1.58it/s, loss=0.00414, lr=0.001]\nSteps: 81%|████████ | 806/1000 [08:33<02:02, 1.58it/s, loss=0.00414, lr=0.001]\ntensor(0.0014, device='cuda:0')\ntensor([[0.4079],\n[0.4099]], device='cuda:0')\nCurrent Norm : tensor([0.4071, 0.4089], device='cuda:0')\nSteps: 81%|████████ | 806/1000 [08:33<02:02, 1.58it/s, loss=0.00103, lr=0.001]\nSteps: 81%|████████ | 807/1000 [08:34<02:02, 1.57it/s, loss=0.00103, lr=0.001]\nSteps: 81%|████████ | 807/1000 [08:34<02:02, 1.57it/s, loss=0.00419, lr=0.001]\nSteps: 81%|████████ | 808/1000 [08:35<02:00, 1.59it/s, loss=0.00419, lr=0.001]\ntensor(0.0009, device='cuda:0')\ntensor([[0.4078],\n[0.4098]], device='cuda:0')\nCurrent Norm : tensor([0.4070, 0.4088], device='cuda:0')\nSteps: 81%|████████ | 808/1000 [08:35<02:00, 1.59it/s, loss=0.000414, lr=0.001]\nSteps: 81%|████████ | 809/1000 [08:35<02:01, 1.58it/s, loss=0.000414, lr=0.001]\nSteps: 81%|████████ | 809/1000 [08:35<02:01, 1.58it/s, loss=0.000762, lr=0.001]\nSteps: 81%|████████ | 810/1000 [08:36<01:59, 1.59it/s, loss=0.000762, lr=0.001]\ntensor(0.0047, device='cuda:0')\ntensor([[0.4077],\n[0.4097]], device='cuda:0')\nCurrent Norm : tensor([0.4069, 0.4087], device='cuda:0')\nSteps: 81%|████████ | 810/1000 [08:36<01:59, 1.59it/s, loss=0.00104, lr=0.001] \nSteps: 81%|████████ | 811/1000 [08:37<02:00, 1.57it/s, loss=0.00104, lr=0.001]\nSteps: 81%|████████ | 811/1000 [08:37<02:00, 1.57it/s, loss=0.000707, lr=0.001]\nSteps: 81%|████████ | 812/1000 [08:37<01:59, 1.58it/s, loss=0.000707, lr=0.001]\ntensor(0.0070, device='cuda:0')\ntensor([[0.4076],\n[0.4095]], device='cuda:0')\nCurrent Norm : tensor([0.4068, 0.4086], device='cuda:0')\nSteps: 81%|████████ | 812/1000 [08:37<01:59, 1.58it/s, loss=0.0044, lr=0.001] 
\nSteps: 81%|████████▏ | 813/1000 [08:38<01:59, 1.57it/s, loss=0.0044, lr=0.001]\nSteps: 81%|████████▏ | 813/1000 [08:38<01:59, 1.57it/s, loss=0.0651, lr=0.001]\nSteps: 81%|████████▏ | 814/1000 [08:39<01:57, 1.58it/s, loss=0.0651, lr=0.001]\ntensor(0.0085, device='cuda:0')\ntensor([[0.4076],\n[0.4094]], device='cuda:0')\nCurrent Norm : tensor([0.4068, 0.4085], device='cuda:0')\nSteps: 81%|████████▏ | 814/1000 [08:39<01:57, 1.58it/s, loss=0.0106, lr=0.001]\nSteps: 82%|████████▏ | 815/1000 [08:39<01:57, 1.57it/s, loss=0.0106, lr=0.001]\nSteps: 82%|████████▏ | 815/1000 [08:39<01:57, 1.57it/s, loss=0.0596, lr=0.001]\nSteps: 82%|████████▏ | 816/1000 [08:40<01:56, 1.58it/s, loss=0.0596, lr=0.001]\ntensor(0.0065, device='cuda:0')\ntensor([[0.4076],\n[0.4093]], device='cuda:0')\nCurrent Norm : tensor([0.4068, 0.4084], device='cuda:0')\nSteps: 82%|████████▏ | 816/1000 [08:40<01:56, 1.58it/s, loss=0.0189, lr=0.001]\nSteps: 82%|████████▏ | 817/1000 [08:40<01:56, 1.57it/s, loss=0.0189, lr=0.001]\nSteps: 82%|████████▏ | 817/1000 [08:40<01:56, 1.57it/s, loss=0.0145, lr=0.001]\nSteps: 82%|████████▏ | 818/1000 [08:41<01:54, 1.58it/s, loss=0.0145, lr=0.001]\ntensor(0.0035, device='cuda:0')\ntensor([[0.4076],\n[0.4093]], device='cuda:0')\nCurrent Norm : tensor([0.4068, 0.4083], device='cuda:0')\nSteps: 82%|████████▏ | 818/1000 [08:41<01:54, 1.58it/s, loss=0.00208, lr=0.001]\nSteps: 82%|████████▏ | 819/1000 [08:42<01:55, 1.57it/s, loss=0.00208, lr=0.001]\nSteps: 82%|████████▏ | 819/1000 [08:42<01:55, 1.57it/s, loss=0.00879, lr=0.001]\nSteps: 82%|████████▏ | 820/1000 [08:42<01:53, 1.58it/s, loss=0.00879, lr=0.001]\ntensor(0.0078, device='cuda:0')\ntensor([[0.4076],\n[0.4092]], device='cuda:0')\nCurrent Norm : tensor([0.4068, 0.4083], device='cuda:0')\nSteps: 82%|████████▏ | 820/1000 [08:42<01:53, 1.58it/s, loss=0.000313, lr=0.001]\nSteps: 82%|████████▏ | 821/1000 [08:43<01:54, 1.56it/s, loss=0.000313, lr=0.001]\nSteps: 82%|████████▏ | 821/1000 [08:43<01:54, 1.56it/s, loss=0.011, 
lr=0.001] \nSteps: 82%|████████▏ | 822/1000 [08:44<01:52, 1.58it/s, loss=0.011, lr=0.001]\ntensor(0.0052, device='cuda:0')\ntensor([[0.4076],\n[0.4092]], device='cuda:0')\nCurrent Norm : tensor([0.4069, 0.4083], device='cuda:0')\nSteps: 82%|████████▏ | 822/1000 [08:44<01:52, 1.58it/s, loss=0.0259, lr=0.001]\nSteps: 82%|████████▏ | 823/1000 [08:44<01:53, 1.57it/s, loss=0.0259, lr=0.001]\nSteps: 82%|████████▏ | 823/1000 [08:44<01:53, 1.57it/s, loss=0.00135, lr=0.001]\nSteps: 82%|████████▏ | 824/1000 [08:45<01:51, 1.59it/s, loss=0.00135, lr=0.001]\ntensor(0.0029, device='cuda:0')\ntensor([[0.4077],\n[0.4092]], device='cuda:0')\nCurrent Norm : tensor([0.4069, 0.4083], device='cuda:0')\nSteps: 82%|████████▏ | 824/1000 [08:45<01:51, 1.59it/s, loss=0.0033, lr=0.001] \nSteps: 82%|████████▎ | 825/1000 [08:46<01:51, 1.57it/s, loss=0.0033, lr=0.001]\nSteps: 82%|████████▎ | 825/1000 [08:46<01:51, 1.57it/s, loss=0.00391, lr=0.001]\nSteps: 83%|████████▎ | 826/1000 [08:46<01:49, 1.58it/s, loss=0.00391, lr=0.001]\ntensor(0.0020, device='cuda:0')\ntensor([[0.4078],\n[0.4093]], device='cuda:0')\nCurrent Norm : tensor([0.4070, 0.4083], device='cuda:0')\nSteps: 83%|████████▎ | 826/1000 [08:46<01:49, 1.58it/s, loss=0.000745, lr=0.001]\nSteps: 83%|████████▎ | 827/1000 [08:47<01:50, 1.57it/s, loss=0.000745, lr=0.001]\nSteps: 83%|████████▎ | 827/1000 [08:47<01:50, 1.57it/s, loss=0.00252, lr=0.001] \nSteps: 83%|████████▎ | 828/1000 [08:47<01:48, 1.58it/s, loss=0.00252, lr=0.001]\ntensor(0.0047, device='cuda:0')\ntensor([[0.4078],\n[0.4093]], device='cuda:0')\nCurrent Norm : tensor([0.4070, 0.4083], device='cuda:0')\nSteps: 83%|████████▎ | 828/1000 [08:47<01:48, 1.58it/s, loss=0.00584, lr=0.001]\nSteps: 83%|████████▎ | 829/1000 [08:48<01:48, 1.57it/s, loss=0.00584, lr=0.001]\nSteps: 83%|████████▎ | 829/1000 [08:48<01:48, 1.57it/s, loss=0.013, lr=0.001] \nSteps: 83%|████████▎ | 830/1000 [08:49<01:47, 1.59it/s, loss=0.013, lr=0.001]\ntensor(0.0074, 
device='cuda:0')\ntensor([[0.4079],\n[0.4093]], device='cuda:0')\nCurrent Norm : tensor([0.4071, 0.4084], device='cuda:0')\nSteps: 83%|████████▎ | 830/1000 [08:49<01:47, 1.59it/s, loss=0.00509, lr=0.001]\nSteps: 83%|████████▎ | 831/1000 [08:49<01:47, 1.57it/s, loss=0.00509, lr=0.001]\nSteps: 83%|████████▎ | 831/1000 [08:49<01:47, 1.57it/s, loss=0.0333, lr=0.001] \nSteps: 83%|████████▎ | 832/1000 [08:50<01:45, 1.59it/s, loss=0.0333, lr=0.001]\ntensor(0.0026, device='cuda:0')\ntensor([[0.4080],\n[0.4093]], device='cuda:0')\nCurrent Norm : tensor([0.4072, 0.4084], device='cuda:0')\nSteps: 83%|████████▎ | 832/1000 [08:50<01:45, 1.59it/s, loss=0.00102, lr=0.001]\nSteps: 83%|████████▎ | 833/1000 [08:51<01:46, 1.57it/s, loss=0.00102, lr=0.001]\nSteps: 83%|████████▎ | 833/1000 [08:51<01:46, 1.57it/s, loss=0.00504, lr=0.001]\nSteps: 83%|████████▎ | 834/1000 [08:51<01:45, 1.58it/s, loss=0.00504, lr=0.001]\ntensor(0.0077, device='cuda:0')\ntensor([[0.4081],\n[0.4094]], device='cuda:0')\nCurrent Norm : tensor([0.4073, 0.4084], device='cuda:0')\nSteps: 83%|████████▎ | 834/1000 [08:51<01:45, 1.58it/s, loss=0.0171, lr=0.001] \nSteps: 84%|████████▎ | 835/1000 [08:52<01:45, 1.56it/s, loss=0.0171, lr=0.001]\nSteps: 84%|████████▎ | 835/1000 [08:52<01:45, 1.56it/s, loss=0.00969, lr=0.001]\nSteps: 84%|████████▎ | 836/1000 [08:52<01:44, 1.58it/s, loss=0.00969, lr=0.001]\ntensor(0.0010, device='cuda:0')\ntensor([[0.4081],\n[0.4094]], device='cuda:0')\nCurrent Norm : tensor([0.4073, 0.4085], device='cuda:0')\nSteps: 84%|████████▎ | 836/1000 [08:52<01:44, 1.58it/s, loss=0.00105, lr=0.001]\nSteps: 84%|████████▎ | 837/1000 [08:53<01:44, 1.57it/s, loss=0.00105, lr=0.001]\nSteps: 84%|████████▎ | 837/1000 [08:53<01:44, 1.57it/s, loss=0.000342, lr=0.001]\nSteps: 84%|████████▍ | 838/1000 [08:54<01:42, 1.58it/s, loss=0.000342, lr=0.001]\ntensor(0.0054, device='cuda:0')\ntensor([[0.4082],\n[0.4094]], device='cuda:0')\nCurrent Norm : tensor([0.4074, 0.4085], device='cuda:0')\nSteps: 84%|████████▍ | 
838/1000 [08:54<01:42, 1.58it/s, loss=0.0134, lr=0.001] \nSteps: 84%|████████▍ | 839/1000 [08:54<01:42, 1.57it/s, loss=0.0134, lr=0.001]\nSteps: 84%|████████▍ | 839/1000 [08:54<01:42, 1.57it/s, loss=0.0579, lr=0.001]\nSteps: 84%|████████▍ | 840/1000 [08:55<01:41, 1.57it/s, loss=0.0579, lr=0.001]\ntensor(0.0016, device='cuda:0')\ntensor([[0.4081],\n[0.4094]], device='cuda:0')\nCurrent Norm : tensor([0.4073, 0.4084], device='cuda:0')\nSteps: 84%|████████▍ | 840/1000 [08:55<01:41, 1.57it/s, loss=0.00562, lr=0.001]\nSteps: 84%|████████▍ | 841/1000 [08:56<01:41, 1.56it/s, loss=0.00562, lr=0.001]\nSteps: 84%|████████▍ | 841/1000 [08:56<01:41, 1.56it/s, loss=0.000182, lr=0.001]\nSteps: 84%|████████▍ | 842/1000 [08:56<01:40, 1.57it/s, loss=0.000182, lr=0.001]\ntensor(0.0037, device='cuda:0')\ntensor([[0.4080],\n[0.4093]], device='cuda:0')\nCurrent Norm : tensor([0.4072, 0.4083], device='cuda:0')\nSteps: 84%|████████▍ | 842/1000 [08:56<01:40, 1.57it/s, loss=0.00387, lr=0.001] \nSteps: 84%|████████▍ | 843/1000 [08:57<01:40, 1.57it/s, loss=0.00387, lr=0.001]\nSteps: 84%|████████▍ | 843/1000 [08:57<01:40, 1.57it/s, loss=0.00379, lr=0.001]\nSteps: 84%|████████▍ | 844/1000 [08:58<01:38, 1.59it/s, loss=0.00379, lr=0.001]\ntensor(0.0019, device='cuda:0')\ntensor([[0.4079],\n[0.4091]], device='cuda:0')\nCurrent Norm : tensor([0.4072, 0.4082], device='cuda:0')\nSteps: 84%|████████▍ | 844/1000 [08:58<01:38, 1.59it/s, loss=0.00819, lr=0.001]\nSteps: 84%|████████▍ | 845/1000 [08:58<01:38, 1.57it/s, loss=0.00819, lr=0.001]\nSteps: 84%|████████▍ | 845/1000 [08:58<01:38, 1.57it/s, loss=0.00126, lr=0.001]\nSteps: 85%|████████▍ | 846/1000 [08:59<01:37, 1.58it/s, loss=0.00126, lr=0.001]\ntensor(0.0018, device='cuda:0')\ntensor([[0.4078],\n[0.4090]], device='cuda:0')\nCurrent Norm : tensor([0.4070, 0.4081], device='cuda:0')\nSteps: 85%|████████▍ | 846/1000 [08:59<01:37, 1.58it/s, loss=0.000191, lr=0.001]\nSteps: 85%|████████▍ | 847/1000 [08:59<01:37, 1.57it/s, loss=0.000191, lr=0.001]\nSteps: 
85%|████████▍ | 847/1000 [08:59<01:37, 1.57it/s, loss=0.00379, lr=0.001] \nSteps: 85%|████████▍ | 848/1000 [09:00<01:35, 1.58it/s, loss=0.00379, lr=0.001]\ntensor(0.0109, device='cuda:0')\ntensor([[0.4077],\n[0.4088]], device='cuda:0')\nCurrent Norm : tensor([0.4069, 0.4079], device='cuda:0')\nSteps: 85%|████████▍ | 848/1000 [09:00<01:35, 1.58it/s, loss=0.0238, lr=0.001] \nSteps: 85%|████████▍ | 849/1000 [09:01<01:35, 1.58it/s, loss=0.0238, lr=0.001]\nSteps: 85%|████████▍ | 849/1000 [09:01<01:35, 1.58it/s, loss=0.0042, lr=0.001]\nSteps: 85%|████████▌ | 850/1000 [09:01<01:34, 1.59it/s, loss=0.0042, lr=0.001]\ntensor(0.0058, device='cuda:0')\ntensor([[0.4075],\n[0.4086]], device='cuda:0')\nCurrent Norm : tensor([0.4068, 0.4078], device='cuda:0')\nSteps: 85%|████████▌ | 850/1000 [09:01<01:34, 1.59it/s, loss=0.0154, lr=0.001]\nSteps: 85%|████████▌ | 851/1000 [09:02<01:34, 1.58it/s, loss=0.0154, lr=0.001]\nSteps: 85%|████████▌ | 851/1000 [09:02<01:34, 1.58it/s, loss=0.0067, lr=0.001]\nSteps: 85%|████████▌ | 852/1000 [09:03<01:33, 1.58it/s, loss=0.0067, lr=0.001]\ntensor(0.0052, device='cuda:0')\ntensor([[0.4074],\n[0.4085]], device='cuda:0')\nCurrent Norm : tensor([0.4067, 0.4076], device='cuda:0')\nSteps: 85%|████████▌ | 852/1000 [09:03<01:33, 1.58it/s, loss=0.000584, lr=0.001]\nSteps: 85%|████████▌ | 853/1000 [09:03<01:33, 1.56it/s, loss=0.000584, lr=0.001]\nSteps: 85%|████████▌ | 853/1000 [09:03<01:33, 1.56it/s, loss=0.0849, lr=0.001] \nSteps: 85%|████████▌ | 854/1000 [09:04<01:32, 1.57it/s, loss=0.0849, lr=0.001]\ntensor(0.0084, device='cuda:0')\ntensor([[0.4073],\n[0.4084]], device='cuda:0')\nCurrent Norm : tensor([0.4066, 0.4076], device='cuda:0')\nSteps: 85%|████████▌ | 854/1000 [09:04<01:32, 1.57it/s, loss=0.036, lr=0.001] \nSteps: 86%|████████▌ | 855/1000 [09:05<01:32, 1.56it/s, loss=0.036, lr=0.001]\nSteps: 86%|████████▌ | 855/1000 [09:05<01:32, 1.56it/s, loss=0.016, lr=0.001]\nSteps: 86%|████████▌ | 856/1000 [09:05<01:31, 1.58it/s, loss=0.016, 
lr=0.001]\ntensor(0.0053, device='cuda:0')\ntensor([[0.4074],\n[0.4084]], device='cuda:0')\nCurrent Norm : tensor([0.4066, 0.4076], device='cuda:0')\nSteps: 86%|████████▌ | 856/1000 [09:05<01:31, 1.58it/s, loss=0.018, lr=0.001]\nSteps: 86%|████████▌ | 857/1000 [09:06<01:31, 1.57it/s, loss=0.018, lr=0.001]\nSteps: 86%|████████▌ | 857/1000 [09:06<01:31, 1.57it/s, loss=0.0116, lr=0.001]\nSteps: 86%|████████▌ | 858/1000 [09:06<01:29, 1.58it/s, loss=0.0116, lr=0.001]\ntensor(0.0047, device='cuda:0')\ntensor([[0.4074],\n[0.4085]], device='cuda:0')\nCurrent Norm : tensor([0.4067, 0.4077], device='cuda:0')\nSteps: 86%|████████▌ | 858/1000 [09:06<01:29, 1.58it/s, loss=0.00814, lr=0.001]\nSteps: 86%|████████▌ | 859/1000 [09:07<01:29, 1.57it/s, loss=0.00814, lr=0.001]\nSteps: 86%|████████▌ | 859/1000 [09:07<01:29, 1.57it/s, loss=0.00405, lr=0.001]\nSteps: 86%|████████▌ | 860/1000 [09:08<01:28, 1.58it/s, loss=0.00405, lr=0.001]\ntensor(0.0039, device='cuda:0')\ntensor([[0.4075],\n[0.4086]], device='cuda:0')\nCurrent Norm : tensor([0.4068, 0.4077], device='cuda:0')\nSteps: 86%|████████▌ | 860/1000 [09:08<01:28, 1.58it/s, loss=0.0098, lr=0.001] \nSteps: 86%|████████▌ | 861/1000 [09:08<01:28, 1.57it/s, loss=0.0098, lr=0.001]\nSteps: 86%|████████▌ | 861/1000 [09:08<01:28, 1.57it/s, loss=0.01, lr=0.001] \nSteps: 86%|████████▌ | 862/1000 [09:09<01:27, 1.58it/s, loss=0.01, lr=0.001]\ntensor(0.0073, device='cuda:0')\ntensor([[0.4077],\n[0.4086]], device='cuda:0')\nCurrent Norm : tensor([0.4069, 0.4077], device='cuda:0')\nSteps: 86%|████████▌ | 862/1000 [09:09<01:27, 1.58it/s, loss=0.00465, lr=0.001]\nSteps: 86%|████████▋ | 863/1000 [09:10<01:27, 1.57it/s, loss=0.00465, lr=0.001]\nSteps: 86%|████████▋ | 863/1000 [09:10<01:27, 1.57it/s, loss=0.0772, lr=0.001] \nSteps: 86%|████████▋ | 864/1000 [09:10<01:26, 1.58it/s, loss=0.0772, lr=0.001]\ntensor(0.0026, device='cuda:0')\ntensor([[0.4079],\n[0.4086]], device='cuda:0')\nCurrent Norm : tensor([0.4071, 0.4077], device='cuda:0')\nSteps: 
86%|████████▋ | 864/1000 [09:10<01:26, 1.58it/s, loss=0.000222, lr=0.001]\nSteps: 86%|████████▋ | 865/1000 [09:11<01:26, 1.56it/s, loss=0.000222, lr=0.001]\nSteps: 86%|████████▋ | 865/1000 [09:11<01:26, 1.56it/s, loss=0.015, lr=0.001] \nSteps: 87%|████████▋ | 866/1000 [09:12<01:24, 1.58it/s, loss=0.015, lr=0.001]\ntensor(0.0084, device='cuda:0')\ntensor([[0.4081],\n[0.4086]], device='cuda:0')\nCurrent Norm : tensor([0.4073, 0.4078], device='cuda:0')\nSteps: 87%|████████▋ | 866/1000 [09:12<01:24, 1.58it/s, loss=0.0187, lr=0.001]\nSteps: 87%|████████▋ | 867/1000 [09:12<01:25, 1.56it/s, loss=0.0187, lr=0.001]\nSteps: 87%|████████▋ | 867/1000 [09:12<01:25, 1.56it/s, loss=0.0145, lr=0.001]\nSteps: 87%|████████▋ | 868/1000 [09:13<01:23, 1.57it/s, loss=0.0145, lr=0.001]\ntensor(0.0118, device='cuda:0')\ntensor([[0.4083],\n[0.4087]], device='cuda:0')\nCurrent Norm : tensor([0.4074, 0.4079], device='cuda:0')\nSteps: 87%|████████▋ | 868/1000 [09:13<01:23, 1.57it/s, loss=0.0673, lr=0.001]\nSteps: 87%|████████▋ | 869/1000 [09:13<01:23, 1.56it/s, loss=0.0673, lr=0.001]\nSteps: 87%|████████▋ | 869/1000 [09:13<01:23, 1.56it/s, loss=0.052, lr=0.001] \nSteps: 87%|████████▋ | 870/1000 [09:14<01:22, 1.58it/s, loss=0.052, lr=0.001]\ntensor(0.0075, device='cuda:0')\ntensor([[0.4086],\n[0.4089]], device='cuda:0')\nCurrent Norm : tensor([0.4077, 0.4080], device='cuda:0')\nSteps: 87%|████████▋ | 870/1000 [09:14<01:22, 1.58it/s, loss=0.0164, lr=0.001]\nSteps: 87%|████████▋ | 871/1000 [09:15<01:22, 1.57it/s, loss=0.0164, lr=0.001]\nSteps: 87%|████████▋ | 871/1000 [09:15<01:22, 1.57it/s, loss=0.0497, lr=0.001]\nSteps: 87%|████████▋ | 872/1000 [09:15<01:20, 1.58it/s, loss=0.0497, lr=0.001]\ntensor(0.0013, device='cuda:0')\ntensor([[0.4089],\n[0.4091]], device='cuda:0')\nCurrent Norm : tensor([0.4080, 0.4082], device='cuda:0')\nSteps: 87%|████████▋ | 872/1000 [09:15<01:20, 1.58it/s, loss=0.000777, lr=0.001]\nSteps: 87%|████████▋ | 873/1000 [09:16<01:20, 1.57it/s, loss=0.000777, 
lr=0.001]\nSteps: 87%|████████▋ | 873/1000 [09:16<01:20, 1.57it/s, loss=0.00841, lr=0.001] \nSteps: 87%|████████▋ | 874/1000 [09:17<01:19, 1.58it/s, loss=0.00841, lr=0.001]\ntensor(0.0046, device='cuda:0')\ntensor([[0.4092],\n[0.4093]], device='cuda:0')\nCurrent Norm : tensor([0.4082, 0.4084], device='cuda:0')\nSteps: 87%|████████▋ | 874/1000 [09:17<01:19, 1.58it/s, loss=0.0873, lr=0.001] \nSteps: 88%|████████▊ | 875/1000 [09:17<01:19, 1.57it/s, loss=0.0873, lr=0.001]\nSteps: 88%|████████▊ | 875/1000 [09:17<01:19, 1.57it/s, loss=0.000283, lr=0.001]\nSteps: 88%|████████▊ | 876/1000 [09:18<01:18, 1.57it/s, loss=0.000283, lr=0.001]\ntensor(0.0094, device='cuda:0')\ntensor([[0.4094],\n[0.4095]], device='cuda:0')\nCurrent Norm : tensor([0.4084, 0.4085], device='cuda:0')\nSteps: 88%|████████▊ | 876/1000 [09:18<01:18, 1.57it/s, loss=0.0306, lr=0.001] \nSteps: 88%|████████▊ | 877/1000 [09:19<01:18, 1.56it/s, loss=0.0306, lr=0.001]\nSteps: 88%|████████▊ | 877/1000 [09:19<01:18, 1.56it/s, loss=0.0644, lr=0.001]\nSteps: 88%|████████▊ | 878/1000 [09:19<01:17, 1.58it/s, loss=0.0644, lr=0.001]\ntensor(0.0087, device='cuda:0')\ntensor([[0.4096],\n[0.4099]], device='cuda:0')\nCurrent Norm : tensor([0.4087, 0.4089], device='cuda:0')\nSteps: 88%|████████▊ | 878/1000 [09:19<01:17, 1.58it/s, loss=0.0373, lr=0.001]\nSteps: 88%|████████▊ | 879/1000 [09:20<01:16, 1.58it/s, loss=0.0373, lr=0.001]\nSteps: 88%|████████▊ | 879/1000 [09:20<01:16, 1.58it/s, loss=0.00094, lr=0.001]\nSteps: 88%|████████▊ | 880/1000 [09:20<01:15, 1.59it/s, loss=0.00094, lr=0.001]\ntensor(0.0046, device='cuda:0')\ntensor([[0.4099],\n[0.4102]], device='cuda:0')\nCurrent Norm : tensor([0.4089, 0.4092], device='cuda:0')\nSteps: 88%|████████▊ | 880/1000 [09:20<01:15, 1.59it/s, loss=0.0125, lr=0.001] \nSteps: 88%|████████▊ | 881/1000 [09:21<01:15, 1.58it/s, loss=0.0125, lr=0.001]\nSteps: 88%|████████▊ | 881/1000 [09:21<01:15, 1.58it/s, loss=0.0143, lr=0.001]\nSteps: 88%|████████▊ | 882/1000 [09:22<01:14, 1.59it/s, 
loss=0.0143, lr=0.001]\ntensor(0.0109, device='cuda:0')\ntensor([[0.4102],\n[0.4106]], device='cuda:0')\nCurrent Norm : tensor([0.4092, 0.4096], device='cuda:0')\nSteps: 88%|████████▊ | 882/1000 [09:22<01:14, 1.59it/s, loss=0.0168, lr=0.001]\nSteps: 88%|████████▊ | 883/1000 [09:22<01:14, 1.58it/s, loss=0.0168, lr=0.001]\nSteps: 88%|████████▊ | 883/1000 [09:22<01:14, 1.58it/s, loss=0.00371, lr=0.001]\nSteps: 88%|████████▊ | 884/1000 [09:23<01:13, 1.59it/s, loss=0.00371, lr=0.001]\ntensor(0.0032, device='cuda:0')\ntensor([[0.4104],\n[0.4110]], device='cuda:0')\nCurrent Norm : tensor([0.4094, 0.4099], device='cuda:0')\nSteps: 88%|████████▊ | 884/1000 [09:23<01:13, 1.59it/s, loss=0.000795, lr=0.001]\nSteps: 88%|████████▊ | 885/1000 [09:24<01:13, 1.58it/s, loss=0.000795, lr=0.001]\nSteps: 88%|████████▊ | 885/1000 [09:24<01:13, 1.58it/s, loss=0.00487, lr=0.001] \nSteps: 89%|████████▊ | 886/1000 [09:24<01:12, 1.58it/s, loss=0.00487, lr=0.001]\ntensor(0.0044, device='cuda:0')\ntensor([[0.4106],\n[0.4112]], device='cuda:0')\nCurrent Norm : tensor([0.4096, 0.4101], device='cuda:0')\nSteps: 89%|████████▊ | 886/1000 [09:24<01:12, 1.58it/s, loss=0.000165, lr=0.001]\nSteps: 89%|████████▊ | 887/1000 [09:25<01:11, 1.57it/s, loss=0.000165, lr=0.001]\nSteps: 89%|████████▊ | 887/1000 [09:25<01:11, 1.57it/s, loss=0.011, lr=0.001] \nSteps: 89%|████████▉ | 888/1000 [09:25<01:10, 1.59it/s, loss=0.011, lr=0.001]\ntensor(0.0014, device='cuda:0')\ntensor([[0.4108],\n[0.4115]], device='cuda:0')\nCurrent Norm : tensor([0.4097, 0.4103], device='cuda:0')\nSteps: 89%|████████▉ | 888/1000 [09:25<01:10, 1.59it/s, loss=0.000376, lr=0.001]\nSteps: 89%|████████▉ | 889/1000 [09:26<01:10, 1.58it/s, loss=0.000376, lr=0.001]\nSteps: 89%|████████▉ | 889/1000 [09:26<01:10, 1.58it/s, loss=0.00271, lr=0.001] \nSteps: 89%|████████▉ | 890/1000 [09:27<01:09, 1.58it/s, loss=0.00271, lr=0.001]\ntensor(0.0069, device='cuda:0')\ntensor([[0.4109],\n[0.4117]], device='cuda:0')\nCurrent Norm : tensor([0.4098, 0.4105], 
device='cuda:0')\nSteps: 89%|████████▉ | 890/1000 [09:27<01:09, 1.58it/s, loss=0.107, lr=0.001] \nSteps: 89%|████████▉ | 891/1000 [09:27<01:09, 1.57it/s, loss=0.107, lr=0.001]\nSteps: 89%|████████▉ | 891/1000 [09:27<01:09, 1.57it/s, loss=0.00925, lr=0.001]\nSteps: 89%|████████▉ | 892/1000 [09:28<01:07, 1.59it/s, loss=0.00925, lr=0.001]\ntensor(0.0027, device='cuda:0')\ntensor([[0.4110],\n[0.4118]], device='cuda:0')\nCurrent Norm : tensor([0.4099, 0.4106], device='cuda:0')\nSteps: 89%|████████▉ | 892/1000 [09:28<01:07, 1.59it/s, loss=0.00599, lr=0.001]\nSteps: 89%|████████▉ | 893/1000 [09:29<01:07, 1.58it/s, loss=0.00599, lr=0.001]\nSteps: 89%|████████▉ | 893/1000 [09:29<01:07, 1.58it/s, loss=0.00141, lr=0.001]\nSteps: 89%|████████▉ | 894/1000 [09:29<01:06, 1.60it/s, loss=0.00141, lr=0.001]\ntensor(0.0028, device='cuda:0')\ntensor([[0.4110],\n[0.4119]], device='cuda:0')\nCurrent Norm : tensor([0.4099, 0.4107], device='cuda:0')\nSteps: 89%|████████▉ | 894/1000 [09:29<01:06, 1.60it/s, loss=0.00575, lr=0.001]\nSteps: 90%|████████▉ | 895/1000 [09:30<01:06, 1.58it/s, loss=0.00575, lr=0.001]\nSteps: 90%|████████▉ | 895/1000 [09:30<01:06, 1.58it/s, loss=0.00116, lr=0.001]\nSteps: 90%|████████▉ | 896/1000 [09:31<01:05, 1.59it/s, loss=0.00116, lr=0.001]\ntensor(0.0084, device='cuda:0')\ntensor([[0.4110],\n[0.4120]], device='cuda:0')\nCurrent Norm : tensor([0.4099, 0.4108], device='cuda:0')\nSteps: 90%|████████▉ | 896/1000 [09:31<01:05, 1.59it/s, loss=0.0673, lr=0.001] \nSteps: 90%|████████▉ | 897/1000 [09:31<01:05, 1.58it/s, loss=0.0673, lr=0.001]\nSteps: 90%|████████▉ | 897/1000 [09:31<01:05, 1.58it/s, loss=0.0103, lr=0.001]\nSteps: 90%|████████▉ | 898/1000 [09:32<01:04, 1.59it/s, loss=0.0103, lr=0.001]\ntensor(0.0061, device='cuda:0')\ntensor([[0.4110],\n[0.4120]], device='cuda:0')\nCurrent Norm : tensor([0.4099, 0.4108], device='cuda:0')\nSteps: 90%|████████▉ | 898/1000 [09:32<01:04, 1.59it/s, loss=0.04, lr=0.001] \nSteps: 90%|████████▉ | 899/1000 [09:32<01:03, 1.58it/s, 
loss=0.04, lr=0.001]\nSteps: 90%|████████▉ | 899/1000 [09:32<01:03, 1.58it/s, loss=0.0017, lr=0.001]\nSteps: 90%|█████████ | 900/1000 [09:33<01:02, 1.59it/s, loss=0.0017, lr=0.001]\nCurrent Learned Embeddings for <s1>:, id 49408 tensor([ 0.0191, 0.0026, -0.0084, -0.0223], device='cuda:0',\ngrad_fn=<SliceBackward0>)\nCurrent Learned Embeddings for <s2>:, id 49409 tensor([ 0.0061, 0.0070, -0.0055, 0.0103], device='cuda:0',\ngrad_fn=<SliceBackward0>)\nSaving weights to checkpoints/step_inv_900.safetensors\ntensor(0.0058, device='cuda:0')\ntensor([[0.4110],\n[0.4121]], device='cuda:0')\nCurrent Norm : tensor([0.4099, 0.4109], device='cuda:0')\nSteps: 90%|█████████ | 900/1000 [09:33<01:02, 1.59it/s, loss=0.00088, lr=0.001]\nSteps: 90%|█████████ | 901/1000 [09:34<01:02, 1.58it/s, loss=0.00088, lr=0.001]\nSteps: 90%|█████████ | 901/1000 [09:34<01:02, 1.58it/s, loss=0.0159, lr=0.001] \nSteps: 90%|█████████ | 902/1000 [09:34<01:01, 1.58it/s, loss=0.0159, lr=0.001]\ntensor(0.0033, device='cuda:0')\ntensor([[0.4110],\n[0.4121]], device='cuda:0')\nCurrent Norm : tensor([0.4099, 0.4109], device='cuda:0')\nSteps: 90%|█████████ | 902/1000 [09:34<01:01, 1.58it/s, loss=0.00016, lr=0.001]\nSteps: 90%|█████████ | 903/1000 [09:35<01:01, 1.57it/s, loss=0.00016, lr=0.001]\nSteps: 90%|█████████ | 903/1000 [09:35<01:01, 1.57it/s, loss=0.00761, lr=0.001]\nSteps: 90%|█████████ | 904/1000 [09:36<01:00, 1.58it/s, loss=0.00761, lr=0.001]\ntensor(0.0021, device='cuda:0')\ntensor([[0.4110],\n[0.4122]], device='cuda:0')\nCurrent Norm : tensor([0.4099, 0.4110], device='cuda:0')\nSteps: 90%|█████████ | 904/1000 [09:36<01:00, 1.58it/s, loss=0.00334, lr=0.001]\nSteps: 90%|█████████ | 905/1000 [09:36<01:00, 1.57it/s, loss=0.00334, lr=0.001]\nSteps: 90%|█████████ | 905/1000 [09:36<01:00, 1.57it/s, loss=0.00641, lr=0.001]\nSteps: 91%|█████████ | 906/1000 [09:37<00:59, 1.59it/s, loss=0.00641, lr=0.001]\ntensor(0.0027, device='cuda:0')\ntensor([[0.4110],\n[0.4122]], device='cuda:0')\nCurrent Norm : 
tensor([0.4099, 0.4109], device='cuda:0')\nSteps: 91%|█████████ | 906/1000 [09:37<00:59, 1.59it/s, loss=0.00472, lr=0.001]\nSteps: 91%|█████████ | 907/1000 [09:38<00:58, 1.58it/s, loss=0.00472, lr=0.001]\nSteps: 91%|█████████ | 907/1000 [09:38<00:58, 1.58it/s, loss=0.00117, lr=0.001]\nSteps: 91%|█████████ | 908/1000 [09:38<00:57, 1.59it/s, loss=0.00117, lr=0.001]\ntensor(0.0101, device='cuda:0')\ntensor([[0.4110],\n[0.4121]], device='cuda:0')\nCurrent Norm : tensor([0.4099, 0.4109], device='cuda:0')\nSteps: 91%|█████████ | 908/1000 [09:38<00:57, 1.59it/s, loss=0.0283, lr=0.001] \nSteps: 91%|█████████ | 909/1000 [09:39<00:57, 1.58it/s, loss=0.0283, lr=0.001]\nSteps: 91%|█████████ | 909/1000 [09:39<00:57, 1.58it/s, loss=0.0145, lr=0.001]\nSteps: 91%|█████████ | 910/1000 [09:39<00:56, 1.58it/s, loss=0.0145, lr=0.001]\ntensor(0.0054, device='cuda:0')\ntensor([[0.4110],\n[0.4121]], device='cuda:0')\nCurrent Norm : tensor([0.4099, 0.4109], device='cuda:0')\nSteps: 91%|█████████ | 910/1000 [09:39<00:56, 1.58it/s, loss=0.000187, lr=0.001]\nSteps: 91%|█████████ | 911/1000 [09:40<00:56, 1.56it/s, loss=0.000187, lr=0.001]\nSteps: 91%|█████████ | 911/1000 [09:40<00:56, 1.56it/s, loss=0.0178, lr=0.001] \nSteps: 91%|█████████ | 912/1000 [09:41<00:55, 1.58it/s, loss=0.0178, lr=0.001]\ntensor(0.0091, device='cuda:0')\ntensor([[0.4110],\n[0.4121]], device='cuda:0')\nCurrent Norm : tensor([0.4099, 0.4109], device='cuda:0')\nSteps: 91%|█████████ | 912/1000 [09:41<00:55, 1.58it/s, loss=0.0288, lr=0.001]\nSteps: 91%|█████████▏| 913/1000 [09:41<00:55, 1.57it/s, loss=0.0288, lr=0.001]\nSteps: 91%|█████████▏| 913/1000 [09:41<00:55, 1.57it/s, loss=0.0104, lr=0.001]\nSteps: 91%|█████████▏| 914/1000 [09:42<00:54, 1.58it/s, loss=0.0104, lr=0.001]\ntensor(0.0022, device='cuda:0')\ntensor([[0.4111],\n[0.4122]], device='cuda:0')\nCurrent Norm : tensor([0.4100, 0.4109], device='cuda:0')\nSteps: 91%|█████████▏| 914/1000 [09:42<00:54, 1.58it/s, loss=0.00321, lr=0.001]\nSteps: 92%|█████████▏| 
915/1000 [09:43<00:54, 1.57it/s, loss=0.00321, lr=0.001]\nSteps: 92%|█████████▏| 915/1000 [09:43<00:54, 1.57it/s, loss=0.00369, lr=0.001]\nSteps: 92%|█████████▏| 916/1000 [09:43<00:53, 1.58it/s, loss=0.00369, lr=0.001]\ntensor(0.0083, device='cuda:0')\ntensor([[0.4111],\n[0.4121]], device='cuda:0')\nCurrent Norm : tensor([0.4100, 0.4109], device='cuda:0')\nSteps: 92%|█████████▏| 916/1000 [09:43<00:53, 1.58it/s, loss=0.0451, lr=0.001] \nSteps: 92%|█████████▏| 917/1000 [09:44<00:52, 1.57it/s, loss=0.0451, lr=0.001]\nSteps: 92%|█████████▏| 917/1000 [09:44<00:52, 1.57it/s, loss=0.00412, lr=0.001]\nSteps: 92%|█████████▏| 918/1000 [09:44<00:51, 1.58it/s, loss=0.00412, lr=0.001]\ntensor(0.0099, device='cuda:0')\ntensor([[0.4111],\n[0.4121]], device='cuda:0')\nCurrent Norm : tensor([0.4100, 0.4109], device='cuda:0')\nSteps: 92%|█████████▏| 918/1000 [09:44<00:51, 1.58it/s, loss=0.0192, lr=0.001] \nSteps: 92%|█████████▏| 919/1000 [09:45<00:51, 1.57it/s, loss=0.0192, lr=0.001]\nSteps: 92%|█████████▏| 919/1000 [09:45<00:51, 1.57it/s, loss=0.00786, lr=0.001]\nSteps: 92%|█████████▏| 920/1000 [09:46<00:50, 1.58it/s, loss=0.00786, lr=0.001]\ntensor(0.0030, device='cuda:0')\ntensor([[0.4111],\n[0.4120]], device='cuda:0')\nCurrent Norm : tensor([0.4100, 0.4108], device='cuda:0')\nSteps: 92%|█████████▏| 920/1000 [09:46<00:50, 1.58it/s, loss=0.00157, lr=0.001]\nSteps: 92%|█████████▏| 921/1000 [09:46<00:50, 1.57it/s, loss=0.00157, lr=0.001]\nSteps: 92%|█████████▏| 921/1000 [09:46<00:50, 1.57it/s, loss=0.00689, lr=0.001]\nSteps: 92%|█████████▏| 922/1000 [09:47<00:49, 1.58it/s, loss=0.00689, lr=0.001]\ntensor(0.0022, device='cuda:0')\ntensor([[0.4111],\n[0.4120]], device='cuda:0')\nCurrent Norm : tensor([0.4100, 0.4108], device='cuda:0')\nSteps: 92%|█████████▏| 922/1000 [09:47<00:49, 1.58it/s, loss=0.00361, lr=0.001]\nSteps: 92%|█████████▏| 923/1000 [09:48<00:49, 1.56it/s, loss=0.00361, lr=0.001]\nSteps: 92%|█████████▏| 923/1000 [09:48<00:49, 1.56it/s, loss=0.000176, lr=0.001]\nSteps: 
92%|█████████▏| 924/1000 [09:48<00:48, 1.57it/s, loss=0.000176, lr=0.001]\ntensor(0.0025, device='cuda:0')\ntensor([[0.4110],\n[0.4120]], device='cuda:0')\nCurrent Norm : tensor([0.4099, 0.4108], device='cuda:0')\nSteps: 92%|█████████▏| 924/1000 [09:48<00:48, 1.57it/s, loss=0.0106, lr=0.001] \nSteps: 92%|█████████▎| 925/1000 [09:49<00:47, 1.57it/s, loss=0.0106, lr=0.001]\nSteps: 92%|█████████▎| 925/1000 [09:49<00:47, 1.57it/s, loss=0.0317, lr=0.001]\nSteps: 93%|█████████▎| 926/1000 [09:50<00:46, 1.59it/s, loss=0.0317, lr=0.001]\ntensor(0.0038, device='cuda:0')\ntensor([[0.4109],\n[0.4120]], device='cuda:0')\nCurrent Norm : tensor([0.4098, 0.4108], device='cuda:0')\nSteps: 93%|█████████▎| 926/1000 [09:50<00:46, 1.59it/s, loss=0.000973, lr=0.001]\nSteps: 93%|█████████▎| 927/1000 [09:50<00:46, 1.58it/s, loss=0.000973, lr=0.001]\nSteps: 93%|█████████▎| 927/1000 [09:50<00:46, 1.58it/s, loss=0.00414, lr=0.001] \nSteps: 93%|█████████▎| 928/1000 [09:51<00:45, 1.59it/s, loss=0.00414, lr=0.001]\ntensor(0.0077, device='cuda:0')\ntensor([[0.4108],\n[0.4119]], device='cuda:0')\nCurrent Norm : tensor([0.4097, 0.4107], device='cuda:0')\nSteps: 93%|█████████▎| 928/1000 [09:51<00:45, 1.59it/s, loss=0.00288, lr=0.001]\nSteps: 93%|█████████▎| 929/1000 [09:51<00:44, 1.58it/s, loss=0.00288, lr=0.001]\nSteps: 93%|█████████▎| 929/1000 [09:51<00:44, 1.58it/s, loss=0.0412, lr=0.001] \nSteps: 93%|█████████▎| 930/1000 [09:52<00:44, 1.58it/s, loss=0.0412, lr=0.001]\ntensor(0.0071, device='cuda:0')\ntensor([[0.4106],\n[0.4117]], device='cuda:0')\nCurrent Norm : tensor([0.4096, 0.4106], device='cuda:0')\nSteps: 93%|█████████▎| 930/1000 [09:52<00:44, 1.58it/s, loss=0.000763, lr=0.001]\nSteps: 93%|█████████▎| 931/1000 [09:53<00:43, 1.57it/s, loss=0.000763, lr=0.001]\nSteps: 93%|█████████▎| 931/1000 [09:53<00:43, 1.57it/s, loss=0.0899, lr=0.001] \nSteps: 93%|█████████▎| 932/1000 [09:53<00:43, 1.58it/s, loss=0.0899, lr=0.001]\ntensor(0.0035, device='cuda:0')\ntensor([[0.4105],\n[0.4116]], 
device='cuda:0')\nCurrent Norm : tensor([0.4094, 0.4104], device='cuda:0')\nSteps: 93%|█████████▎| 932/1000 [09:53<00:43, 1.58it/s, loss=0.00604, lr=0.001]\nSteps: 93%|█████████▎| 933/1000 [09:54<00:42, 1.58it/s, loss=0.00604, lr=0.001]\nSteps: 93%|█████████▎| 933/1000 [09:54<00:42, 1.58it/s, loss=0.00167, lr=0.001]\nSteps: 93%|█████████▎| 934/1000 [09:55<00:41, 1.59it/s, loss=0.00167, lr=0.001]\ntensor(0.0070, device='cuda:0')\ntensor([[0.4102],\n[0.4115]], device='cuda:0')\nCurrent Norm : tensor([0.4092, 0.4103], device='cuda:0')\nSteps: 93%|█████████▎| 934/1000 [09:55<00:41, 1.59it/s, loss=0.00276, lr=0.001]\nSteps: 94%|█████████▎| 935/1000 [09:55<00:41, 1.57it/s, loss=0.00276, lr=0.001]\nSteps: 94%|█████████▎| 935/1000 [09:55<00:41, 1.57it/s, loss=0.0862, lr=0.001] \nSteps: 94%|█████████▎| 936/1000 [09:56<00:40, 1.58it/s, loss=0.0862, lr=0.001]\ntensor(0.0059, device='cuda:0')\ntensor([[0.4099],\n[0.4114]], device='cuda:0')\nCurrent Norm : tensor([0.4089, 0.4103], device='cuda:0')\nSteps: 94%|█████████▎| 936/1000 [09:56<00:40, 1.58it/s, loss=0.0186, lr=0.001]\nSteps: 94%|█████████▎| 937/1000 [09:57<00:40, 1.57it/s, loss=0.0186, lr=0.001]\nSteps: 94%|█████████▎| 937/1000 [09:57<00:40, 1.57it/s, loss=0.000916, lr=0.001]\nSteps: 94%|█████████▍| 938/1000 [09:57<00:39, 1.58it/s, loss=0.000916, lr=0.001]\ntensor(0.0056, device='cuda:0')\ntensor([[0.4096],\n[0.4113]], device='cuda:0')\nCurrent Norm : tensor([0.4086, 0.4102], device='cuda:0')\nSteps: 94%|█████████▍| 938/1000 [09:57<00:39, 1.58it/s, loss=0.0088, lr=0.001] \nSteps: 94%|█████████▍| 939/1000 [09:58<00:38, 1.57it/s, loss=0.0088, lr=0.001]\nSteps: 94%|█████████▍| 939/1000 [09:58<00:38, 1.57it/s, loss=0.000176, lr=0.001]\nSteps: 94%|█████████▍| 940/1000 [09:58<00:37, 1.59it/s, loss=0.000176, lr=0.001]\ntensor(0.0058, device='cuda:0')\ntensor([[0.4093],\n[0.4112]], device='cuda:0')\nCurrent Norm : tensor([0.4084, 0.4101], device='cuda:0')\nSteps: 94%|█████████▍| 940/1000 [09:58<00:37, 1.59it/s, loss=0.0042, 
lr=0.001] \nSteps: 94%|█████████▍| 941/1000 [09:59<00:37, 1.57it/s, loss=0.0042, lr=0.001]\nSteps: 94%|█████████▍| 941/1000 [09:59<00:37, 1.57it/s, loss=0.0133, lr=0.001]\nSteps: 94%|█████████▍| 942/1000 [10:00<00:36, 1.58it/s, loss=0.0133, lr=0.001]\ntensor(0.0099, device='cuda:0')\ntensor([[0.4091],\n[0.4112]], device='cuda:0')\nCurrent Norm : tensor([0.4082, 0.4101], device='cuda:0')\nSteps: 94%|█████████▍| 942/1000 [10:00<00:36, 1.58it/s, loss=0.092, lr=0.001] \nSteps: 94%|█████████▍| 943/1000 [10:00<00:36, 1.57it/s, loss=0.092, lr=0.001]\nSteps: 94%|█████████▍| 943/1000 [10:00<00:36, 1.57it/s, loss=0.0613, lr=0.001]\nSteps: 94%|█████████▍| 944/1000 [10:01<00:35, 1.59it/s, loss=0.0613, lr=0.001]\ntensor(0.0089, device='cuda:0')\ntensor([[0.4090],\n[0.4112]], device='cuda:0')\nCurrent Norm : tensor([0.4081, 0.4101], device='cuda:0')\nSteps: 94%|█████████▍| 944/1000 [10:01<00:35, 1.59it/s, loss=0.019, lr=0.001] \nSteps: 94%|█████████▍| 945/1000 [10:02<00:34, 1.58it/s, loss=0.019, lr=0.001]\nSteps: 94%|█████████▍| 945/1000 [10:02<00:34, 1.58it/s, loss=0.0166, lr=0.001]\nSteps: 95%|█████████▍| 946/1000 [10:02<00:34, 1.58it/s, loss=0.0166, lr=0.001]\ntensor(0.0028, device='cuda:0')\ntensor([[0.4090],\n[0.4113]], device='cuda:0')\nCurrent Norm : tensor([0.4081, 0.4101], device='cuda:0')\nSteps: 95%|█████████▍| 946/1000 [10:02<00:34, 1.58it/s, loss=0.00476, lr=0.001]\nSteps: 95%|█████████▍| 947/1000 [10:03<00:33, 1.57it/s, loss=0.00476, lr=0.001]\nSteps: 95%|█████████▍| 947/1000 [10:03<00:33, 1.57it/s, loss=0.00987, lr=0.001]\nSteps: 95%|█████████▍| 948/1000 [10:04<00:32, 1.58it/s, loss=0.00987, lr=0.001]\ntensor(0.0051, device='cuda:0')\ntensor([[0.4090],\n[0.4113]], device='cuda:0')\nCurrent Norm : tensor([0.4081, 0.4102], device='cuda:0')\nSteps: 95%|█████████▍| 948/1000 [10:04<00:32, 1.58it/s, loss=0.00904, lr=0.001]\nSteps: 95%|█████████▍| 949/1000 [10:04<00:32, 1.57it/s, loss=0.00904, lr=0.001]\nSteps: 95%|█████████▍| 949/1000 [10:04<00:32, 1.57it/s, 
loss=0.00788, lr=0.001]\nSteps: 95%|█████████▌| 950/1000 [10:05<00:31, 1.58it/s, loss=0.00788, lr=0.001]\ntensor(0.0036, device='cuda:0')\ntensor([[0.4091],\n[0.4113]], device='cuda:0')\nCurrent Norm : tensor([0.4082, 0.4102], device='cuda:0')\nSteps: 95%|█████████▌| 950/1000 [10:05<00:31, 1.58it/s, loss=0.00126, lr=0.001]\nSteps: 95%|█████████▌| 951/1000 [10:05<00:31, 1.56it/s, loss=0.00126, lr=0.001]\nSteps: 95%|█████████▌| 951/1000 [10:05<00:31, 1.56it/s, loss=0.00716, lr=0.001]\nSteps: 95%|█████████▌| 952/1000 [10:06<00:30, 1.58it/s, loss=0.00716, lr=0.001]\ntensor(0.0069, device='cuda:0')\ntensor([[0.4091],\n[0.4113]], device='cuda:0')\nCurrent Norm : tensor([0.4082, 0.4102], device='cuda:0')\nSteps: 95%|█████████▌| 952/1000 [10:06<00:30, 1.58it/s, loss=0.000269, lr=0.001]\nSteps: 95%|█████████▌| 953/1000 [10:07<00:30, 1.56it/s, loss=0.000269, lr=0.001]\nSteps: 95%|█████████▌| 953/1000 [10:07<00:30, 1.56it/s, loss=0.0689, lr=0.001] \nSteps: 95%|█████████▌| 954/1000 [10:07<00:29, 1.58it/s, loss=0.0689, lr=0.001]\ntensor(0.0086, device='cuda:0')\ntensor([[0.4091],\n[0.4112]], device='cuda:0')\nCurrent Norm : tensor([0.4082, 0.4101], device='cuda:0')\nSteps: 95%|█████████▌| 954/1000 [10:07<00:29, 1.58it/s, loss=0.0537, lr=0.001]\nSteps: 96%|█████████▌| 955/1000 [10:08<00:28, 1.56it/s, loss=0.0537, lr=0.001]\nSteps: 96%|█████████▌| 955/1000 [10:08<00:28, 1.56it/s, loss=0.0128, lr=0.001]\nSteps: 96%|█████████▌| 956/1000 [10:09<00:27, 1.58it/s, loss=0.0128, lr=0.001]\ntensor(0.0061, device='cuda:0')\ntensor([[0.4092],\n[0.4111]], device='cuda:0')\nCurrent Norm : tensor([0.4083, 0.4100], device='cuda:0')\nSteps: 96%|█████████▌| 956/1000 [10:09<00:27, 1.58it/s, loss=0.0269, lr=0.001]\nSteps: 96%|█████████▌| 957/1000 [10:09<00:27, 1.57it/s, loss=0.0269, lr=0.001]\nSteps: 96%|█████████▌| 957/1000 [10:09<00:27, 1.57it/s, loss=0.0107, lr=0.001]\nSteps: 96%|█████████▌| 958/1000 [10:10<00:26, 1.58it/s, loss=0.0107, lr=0.001]\ntensor(0.0027, 
device='cuda:0')\ntensor([[0.4092],\n[0.4110]], device='cuda:0')\nCurrent Norm : tensor([0.4083, 0.4099], device='cuda:0')\nSteps: 96%|█████████▌| 958/1000 [10:10<00:26, 1.58it/s, loss=0.0081, lr=0.001]\nSteps: 96%|█████████▌| 959/1000 [10:11<00:26, 1.57it/s, loss=0.0081, lr=0.001]\nSteps: 96%|█████████▌| 959/1000 [10:11<00:26, 1.57it/s, loss=0.00707, lr=0.001]\nSteps: 96%|█████████▌| 960/1000 [10:11<00:25, 1.59it/s, loss=0.00707, lr=0.001]\ntensor(0.0100, device='cuda:0')\ntensor([[0.4093],\n[0.4109]], device='cuda:0')\nCurrent Norm : tensor([0.4084, 0.4098], device='cuda:0')\nSteps: 96%|█████████▌| 960/1000 [10:11<00:25, 1.59it/s, loss=0.0223, lr=0.001] \nSteps: 96%|█████████▌| 961/1000 [10:12<00:24, 1.58it/s, loss=0.0223, lr=0.001]\nSteps: 96%|█████████▌| 961/1000 [10:12<00:24, 1.58it/s, loss=0.0405, lr=0.001]\nSteps: 96%|█████████▌| 962/1000 [10:12<00:23, 1.59it/s, loss=0.0405, lr=0.001]\ntensor(0.0028, device='cuda:0')\ntensor([[0.4094],\n[0.4108]], device='cuda:0')\nCurrent Norm : tensor([0.4084, 0.4097], device='cuda:0')\nSteps: 96%|█████████▌| 962/1000 [10:12<00:23, 1.59it/s, loss=0.00538, lr=0.001]\nSteps: 96%|█████████▋| 963/1000 [10:13<00:23, 1.57it/s, loss=0.00538, lr=0.001]\nSteps: 96%|█████████▋| 963/1000 [10:13<00:23, 1.57it/s, loss=0.000406, lr=0.001]\nSteps: 96%|█████████▋| 964/1000 [10:14<00:22, 1.58it/s, loss=0.000406, lr=0.001]\ntensor(0.0081, device='cuda:0')\ntensor([[0.4094],\n[0.4107]], device='cuda:0')\nCurrent Norm : tensor([0.4085, 0.4097], device='cuda:0')\nSteps: 96%|█████████▋| 964/1000 [10:14<00:22, 1.58it/s, loss=0.00454, lr=0.001] \nSteps: 96%|█████████▋| 965/1000 [10:14<00:22, 1.57it/s, loss=0.00454, lr=0.001]\nSteps: 96%|█████████▋| 965/1000 [10:14<00:22, 1.57it/s, loss=0.0148, lr=0.001] \nSteps: 97%|█████████▋| 966/1000 [10:15<00:21, 1.58it/s, loss=0.0148, lr=0.001]\ntensor(0.0029, device='cuda:0')\ntensor([[0.4095],\n[0.4107]], device='cuda:0')\nCurrent Norm : tensor([0.4085, 0.4096], device='cuda:0')\nSteps: 97%|█████████▋| 
966/1000 [10:15<00:21, 1.58it/s, loss=0.000232, lr=0.001]\nSteps: 97%|█████████▋| 967/1000 [10:16<00:21, 1.56it/s, loss=0.000232, lr=0.001]\nSteps: 97%|█████████▋| 967/1000 [10:16<00:21, 1.56it/s, loss=0.00795, lr=0.001] \nSteps: 97%|█████████▋| 968/1000 [10:16<00:20, 1.58it/s, loss=0.00795, lr=0.001]\ntensor(0.0018, device='cuda:0')\ntensor([[0.4095],\n[0.4106]], device='cuda:0')\nCurrent Norm : tensor([0.4086, 0.4095], device='cuda:0')\nSteps: 97%|█████████▋| 968/1000 [10:16<00:20, 1.58it/s, loss=0.00367, lr=0.001]\nSteps: 97%|█████████▋| 969/1000 [10:17<00:19, 1.56it/s, loss=0.00367, lr=0.001]\nSteps: 97%|█████████▋| 969/1000 [10:17<00:19, 1.56it/s, loss=0.00094, lr=0.001]\nSteps: 97%|█████████▋| 970/1000 [10:17<00:19, 1.58it/s, loss=0.00094, lr=0.001]\ntensor(0.0063, device='cuda:0')\ntensor([[0.4095],\n[0.4105]], device='cuda:0')\nCurrent Norm : tensor([0.4086, 0.4095], device='cuda:0')\nSteps: 97%|█████████▋| 970/1000 [10:17<00:19, 1.58it/s, loss=0.00166, lr=0.001]\nSteps: 97%|█████████▋| 971/1000 [10:18<00:18, 1.56it/s, loss=0.00166, lr=0.001]\nSteps: 97%|█████████▋| 971/1000 [10:18<00:18, 1.56it/s, loss=0.0325, lr=0.001] \nSteps: 97%|█████████▋| 972/1000 [10:19<00:17, 1.58it/s, loss=0.0325, lr=0.001]\ntensor(0.0064, device='cuda:0')\ntensor([[0.4094],\n[0.4104]], device='cuda:0')\nCurrent Norm : tensor([0.4085, 0.4094], device='cuda:0')\nSteps: 97%|█████████▋| 972/1000 [10:19<00:17, 1.58it/s, loss=0.0677, lr=0.001]\nSteps: 97%|█████████▋| 973/1000 [10:19<00:17, 1.57it/s, loss=0.0677, lr=0.001]\nSteps: 97%|█████████▋| 973/1000 [10:19<00:17, 1.57it/s, loss=0.00378, lr=0.001]\nSteps: 97%|█████████▋| 974/1000 [10:20<00:16, 1.57it/s, loss=0.00378, lr=0.001]\ntensor(0.0032, device='cuda:0')\ntensor([[0.4094],\n[0.4103]], device='cuda:0')\nCurrent Norm : tensor([0.4085, 0.4093], device='cuda:0')\nSteps: 97%|█████████▋| 974/1000 [10:20<00:16, 1.57it/s, loss=0.00283, lr=0.001]\nSteps: 98%|█████████▊| 975/1000 [10:21<00:15, 1.56it/s, loss=0.00283, lr=0.001]\nSteps: 
98%|█████████▊| 975/1000 [10:21<00:15, 1.56it/s, loss=0.0066, lr=0.001] \nSteps: 98%|█████████▊| 976/1000 [10:21<00:15, 1.57it/s, loss=0.0066, lr=0.001]\ntensor(0.0031, device='cuda:0')\ntensor([[0.4094],\n[0.4102]], device='cuda:0')\nCurrent Norm : tensor([0.4084, 0.4092], device='cuda:0')\nSteps: 98%|█████████▊| 976/1000 [10:21<00:15, 1.57it/s, loss=0.00529, lr=0.001]\nSteps: 98%|█████████▊| 977/1000 [10:22<00:14, 1.56it/s, loss=0.00529, lr=0.001]\nSteps: 98%|█████████▊| 977/1000 [10:22<00:14, 1.56it/s, loss=0.000486, lr=0.001]\nSteps: 98%|█████████▊| 978/1000 [10:23<00:13, 1.57it/s, loss=0.000486, lr=0.001]\ntensor(0.0005, device='cuda:0')\ntensor([[0.4093],\n[0.4100]], device='cuda:0')\nCurrent Norm : tensor([0.4083, 0.4090], device='cuda:0')\nSteps: 98%|█████████▊| 978/1000 [10:23<00:13, 1.57it/s, loss=0.000342, lr=0.001]\nSteps: 98%|█████████▊| 979/1000 [10:23<00:13, 1.56it/s, loss=0.000342, lr=0.001]\nSteps: 98%|█████████▊| 979/1000 [10:23<00:13, 1.56it/s, loss=0.000447, lr=0.001]\nSteps: 98%|█████████▊| 980/1000 [10:24<00:12, 1.57it/s, loss=0.000447, lr=0.001]\ntensor(0.0029, device='cuda:0')\ntensor([[0.4092],\n[0.4099]], device='cuda:0')\nCurrent Norm : tensor([0.4082, 0.4089], device='cuda:0')\nSteps: 98%|█████████▊| 980/1000 [10:24<00:12, 1.57it/s, loss=0.00251, lr=0.001] \nSteps: 98%|█████████▊| 981/1000 [10:25<00:12, 1.57it/s, loss=0.00251, lr=0.001]\nSteps: 98%|█████████▊| 981/1000 [10:25<00:12, 1.57it/s, loss=0.00702, lr=0.001]\nSteps: 98%|█████████▊| 982/1000 [10:25<00:11, 1.58it/s, loss=0.00702, lr=0.001]\ntensor(0.0039, device='cuda:0')\ntensor([[0.4091],\n[0.4097]], device='cuda:0')\nCurrent Norm : tensor([0.4081, 0.4087], device='cuda:0')\nSteps: 98%|█████████▊| 982/1000 [10:25<00:11, 1.58it/s, loss=0.00368, lr=0.001]\nSteps: 98%|█████████▊| 983/1000 [10:26<00:10, 1.57it/s, loss=0.00368, lr=0.001]\nSteps: 98%|█████████▊| 983/1000 [10:26<00:10, 1.57it/s, loss=0.00127, lr=0.001]\nSteps: 98%|█████████▊| 984/1000 [10:26<00:10, 1.59it/s, 
loss=0.00127, lr=0.001]\ntensor(0.0060, device='cuda:0')\ntensor([[0.4089],\n[0.4094]], device='cuda:0')\nCurrent Norm : tensor([0.4080, 0.4085], device='cuda:0')\nSteps: 98%|█████████▊| 984/1000 [10:26<00:10, 1.59it/s, loss=0.0239, lr=0.001] \nSteps: 98%|█████████▊| 985/1000 [10:27<00:09, 1.58it/s, loss=0.0239, lr=0.001]\nSteps: 98%|█████████▊| 985/1000 [10:27<00:09, 1.58it/s, loss=0.00117, lr=0.001]\nSteps: 99%|█████████▊| 986/1000 [10:28<00:08, 1.59it/s, loss=0.00117, lr=0.001]\ntensor(0.0016, device='cuda:0')\ntensor([[0.4088],\n[0.4092]], device='cuda:0')\nCurrent Norm : tensor([0.4079, 0.4083], device='cuda:0')\nSteps: 99%|█████████▊| 986/1000 [10:28<00:08, 1.59it/s, loss=0.000254, lr=0.001]\nSteps: 99%|█████████▊| 987/1000 [10:28<00:08, 1.58it/s, loss=0.000254, lr=0.001]\nSteps: 99%|█████████▊| 987/1000 [10:28<00:08, 1.58it/s, loss=0.0089, lr=0.001] \nSteps: 99%|█████████▉| 988/1000 [10:29<00:07, 1.59it/s, loss=0.0089, lr=0.001]\ntensor(0.0023, device='cuda:0')\ntensor([[0.4087],\n[0.4089]], device='cuda:0')\nCurrent Norm : tensor([0.4078, 0.4080], device='cuda:0')\nSteps: 99%|█████████▉| 988/1000 [10:29<00:07, 1.59it/s, loss=0.000653, lr=0.001]\nSteps: 99%|█████████▉| 989/1000 [10:30<00:07, 1.57it/s, loss=0.000653, lr=0.001]\nSteps: 99%|█████████▉| 989/1000 [10:30<00:07, 1.57it/s, loss=0.00647, lr=0.001] \nSteps: 99%|█████████▉| 990/1000 [10:30<00:06, 1.58it/s, loss=0.00647, lr=0.001]\ntensor(0.0026, device='cuda:0')\ntensor([[0.4085],\n [0.4086]], device='cuda:0')\nCurrent Norm : tensor([0.4076, 0.4078], device='cuda:0')\nSteps: 99%|█████████▉| 990/1000 [10:30<00:06, 1.58it/s, loss=0.00136, lr=0.001]\nSteps: 99%|█████████▉| 991/1000 [10:31<00:05, 1.57it/s, loss=0.00136, lr=0.001]\nSteps: 99%|█████████▉| 991/1000 [10:31<00:05, 1.57it/s, loss=0.0052, lr=0.001] \nSteps: 99%|█████████▉| 992/1000 [10:31<00:05, 1.58it/s, loss=0.0052, lr=0.001]\ntensor(0.0022, device='cuda:0')\ntensor([[0.4083],\n[0.4083]], device='cuda:0')\nCurrent Norm : tensor([0.4075, 
0.4075], device='cuda:0')\nSteps: 99%|█████████▉| 992/1000 [10:31<00:05, 1.58it/s, loss=0.0147, lr=0.001]\nSteps: 99%|█████████▉| 993/1000 [10:32<00:04, 1.57it/s, loss=0.0147, lr=0.001]\nSteps: 99%|█████████▉| 993/1000 [10:32<00:04, 1.57it/s, loss=0.000513, lr=0.001]\nSteps: 99%|█████████▉| 994/1000 [10:33<00:03, 1.59it/s, loss=0.000513, lr=0.001]\ntensor(0.0060, device='cuda:0')\ntensor([[0.4081],\n[0.4080]], device='cuda:0')\nCurrent Norm : tensor([0.4073, 0.4072], device='cuda:0')\nSteps: 99%|█████████▉| 994/1000 [10:33<00:03, 1.59it/s, loss=0.0951, lr=0.001] \nSteps: 100%|█████████▉| 995/1000 [10:33<00:03, 1.57it/s, loss=0.0951, lr=0.001]\nSteps: 100%|█████████▉| 995/1000 [10:33<00:03, 1.57it/s, loss=0.000628, lr=0.001]\nSteps: 100%|█████████▉| 996/1000 [10:34<00:02, 1.58it/s, loss=0.000628, lr=0.001]\ntensor(0.0060, device='cuda:0')\ntensor([[0.4079],\n[0.4078]], device='cuda:0')\nCurrent Norm : tensor([0.4071, 0.4070], device='cuda:0')\nSteps: 100%|█████████▉| 996/1000 [10:34<00:02, 1.58it/s, loss=0.0297, lr=0.001] \nSteps: 100%|█████████▉| 997/1000 [10:35<00:01, 1.57it/s, loss=0.0297, lr=0.001]\nSteps: 100%|█████████▉| 997/1000 [10:35<00:01, 1.57it/s, loss=0.00103, lr=0.001]\nSteps: 100%|█████████▉| 998/1000 [10:35<00:01, 1.58it/s, loss=0.00103, lr=0.001]\ntensor(0.0121, device='cuda:0')\ntensor([[0.4078],\n[0.4076]], device='cuda:0')\nCurrent Norm : tensor([0.4070, 0.4068], device='cuda:0')\nSteps: 100%|█████████▉| 998/1000 [10:35<00:01, 1.58it/s, loss=0.0109, lr=0.001] \nSteps: 100%|█████████▉| 999/1000 [10:36<00:00, 1.54it/s, loss=0.0109, lr=0.001]\nSteps: 100%|█████████▉| 999/1000 [10:36<00:00, 1.54it/s, loss=0.0542, lr=0.001]\nSteps: 100%|██████████| 1000/1000 [10:37<00:00, 1.56it/s, loss=0.0542, lr=0.001]\nCurrent Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0',\ngrad_fn=<SliceBackward0>)\nCurrent Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], 
device='cuda:0',\ngrad_fn=<SliceBackward0>)\nSaving weights to checkpoints/step_inv_1000.safetensors\nSteps: 100%|██████████| 1000/1000 [10:37<00:00, 1.56it/s, loss=0.0126, lr=0.001]\nSteps: 100%|██████████| 1000/1000 [10:37<00:00, 1.57it/s, loss=0.0126, lr=0.001]\nPTI : has 288 lora\nPTI : Before training:\n0%| | 0/700 [00:00<?, ?it/s]\nSteps: 0%| | 0/700 [00:00<?, ?it/s]\nSteps: 0%| | 1/700 [00:05<1:09:22, 5.95s/it]\nSteps: 0%| | 1/700 [00:05<1:09:22, 5.95s/it, loss=0.0136, lr=0.0004]\nSteps: 0%| | 2/700 [00:06<32:36, 2.80s/it, loss=0.0136, lr=0.0004] \nSteps: 0%| | 2/700 [00:06<32:36, 2.80s/it, loss=0.0645, lr=0.0004]\nSteps: 0%| | 3/700 [00:07<20:35, 1.77s/it, loss=0.0645, lr=0.0004]\nSteps: 0%| | 3/700 [00:07<20:35, 1.77s/it, loss=0.114, lr=0.0004] \nSteps: 1%| | 4/700 [00:07<15:02, 1.30s/it, loss=0.114, lr=0.0004]\nSteps: 1%| | 4/700 [00:07<15:02, 1.30s/it, loss=0.0147, lr=0.0004]\nSteps: 1%| | 5/700 [00:08<11:48, 1.02s/it, loss=0.0147, lr=0.0004]\nSteps: 1%| | 5/700 [00:08<11:48, 1.02s/it, loss=0.0199, lr=0.0004]\nSteps: 1%| | 6/700 [00:08<10:01, 1.15it/s, loss=0.0199, lr=0.0004]\nSteps: 1%| | 6/700 [00:08<10:01, 1.15it/s, loss=0.127, lr=0.0004] \nSteps: 1%| | 7/700 [00:09<08:50, 1.31it/s, loss=0.127, lr=0.0004]\nSteps: 1%| | 7/700 [00:09<08:50, 1.31it/s, loss=0.194, lr=0.0004]\nSteps: 1%| | 8/700 [00:09<08:06, 1.42it/s, loss=0.194, lr=0.0004]\nSteps: 1%| | 8/700 [00:09<08:06, 1.42it/s, loss=0.0105, lr=0.0004]\nSteps: 1%|▏ | 9/700 [00:10<07:33, 1.52it/s, loss=0.0105, lr=0.0004]\nSteps: 1%|▏ | 9/700 [00:10<07:33, 1.52it/s, loss=0.0122, lr=0.0004]\nSteps: 1%|▏ | 10/700 [00:10<07:01, 1.64it/s, loss=0.0122, lr=0.0004]\nSteps: 1%|▏ | 10/700 [00:10<07:01, 1.64it/s, loss=0.0168, lr=0.0004]\nSteps: 2%|▏ | 11/700 [00:11<06:43, 1.71it/s, loss=0.0168, lr=0.0004]\nSteps: 2%|▏ | 11/700 [00:11<06:43, 1.71it/s, loss=0.126, lr=0.0004] \nSteps: 2%|▏ | 12/700 [00:12<06:31, 1.76it/s, loss=0.126, lr=0.0004]\nSteps: 2%|▏ | 12/700 [00:12<06:31, 1.76it/s, loss=0.0972, 
lr=0.0004]\nSteps: 2%|▏ | 13/700 [00:12<06:22, 1.80it/s, loss=0.0972, lr=0.0004]\nSteps: 2%|▏ | 13/700 [00:12<06:22, 1.80it/s, loss=0.176, lr=0.0004] \nSteps: 2%|▏ | 14/700 [00:13<06:08, 1.86it/s, loss=0.176, lr=0.0004]\nSteps: 2%|▏ | 14/700 [00:13<06:08, 1.86it/s, loss=0.00823, lr=0.0004]\nSteps: 2%|▏ | 15/700 [00:13<06:01, 1.89it/s, loss=0.00823, lr=0.0004]\nSteps: 2%|▏ | 15/700 [00:13<06:01, 1.89it/s, loss=0.0132, lr=0.0004] \nSteps: 2%|▏ | 16/700 [00:14<05:55, 1.92it/s, loss=0.0132, lr=0.0004]\nSteps: 2%|▏ | 16/700 [00:14<05:55, 1.92it/s, loss=0.0208, lr=0.0004]\nSteps: 2%|▏ | 17/700 [00:14<05:52, 1.93it/s, loss=0.0208, lr=0.0004]\nSteps: 2%|▏ | 17/700 [00:14<05:52, 1.93it/s, loss=0.0074, lr=0.0004]\nSteps: 3%|▎ | 18/700 [00:15<06:02, 1.88it/s, loss=0.0074, lr=0.0004]\nSteps: 3%|▎ | 18/700 [00:15<06:02, 1.88it/s, loss=0.00776, lr=0.0004]\nSteps: 3%|▎ | 19/700 [00:15<06:10, 1.84it/s, loss=0.00776, lr=0.0004]\nSteps: 3%|▎ | 19/700 [00:15<06:10, 1.84it/s, loss=0.0114, lr=0.0004] \nSteps: 3%|▎ | 20/700 [00:16<06:19, 1.79it/s, loss=0.0114, lr=0.0004]\nSteps: 3%|▎ | 20/700 [00:16<06:19, 1.79it/s, loss=0.0615, lr=0.0004]\nSteps: 3%|▎ | 21/700 [00:16<06:25, 1.76it/s, loss=0.0615, lr=0.0004]\nSteps: 3%|▎ | 21/700 [00:16<06:25, 1.76it/s, loss=0.00527, lr=0.0004]\nSteps: 3%|▎ | 22/700 [00:17<06:24, 1.76it/s, loss=0.00527, lr=0.0004]\nSteps: 3%|▎ | 22/700 [00:17<06:24, 1.76it/s, loss=0.0075, lr=0.0004] \nSteps: 3%|▎ | 23/700 [00:18<06:29, 1.74it/s, loss=0.0075, lr=0.0004]\nSteps: 3%|▎ | 23/700 [00:18<06:29, 1.74it/s, loss=0.027, lr=0.0004] \nSteps: 3%|▎ | 24/700 [00:18<06:17, 1.79it/s, loss=0.027, lr=0.0004]\nSteps: 3%|▎ | 24/700 [00:18<06:17, 1.79it/s, loss=0.0509, lr=0.0004]\nSteps: 4%|▎ | 25/700 [00:19<06:09, 1.83it/s, loss=0.0509, lr=0.0004]\nSteps: 4%|▎ | 25/700 [00:19<06:09, 1.83it/s, loss=0.0534, lr=0.0004]\nSteps: 4%|▎ | 26/700 [00:19<06:11, 1.81it/s, loss=0.0534, lr=0.0004]\nSteps: 4%|▎ | 26/700 [00:19<06:11, 1.81it/s, loss=0.0332, lr=0.0004]\nSteps: 4%|▍ | 27/700 
[00:20<06:05, 1.84it/s, loss=0.0332, lr=0.0004]\nSteps: 4%|▍ | 27/700 [00:20<06:05, 1.84it/s, loss=0.134, lr=0.0004] \nSteps: 4%|▍ | 28/700 [00:20<05:56, 1.89it/s, loss=0.134, lr=0.0004]\nSteps: 4%|▍ | 28/700 [00:20<05:56, 1.89it/s, loss=0.0159, lr=0.0004]\nSteps: 4%|▍ | 29/700 [00:21<05:51, 1.91it/s, loss=0.0159, lr=0.0004]\nSteps: 4%|▍ | 29/700 [00:21<05:51, 1.91it/s, loss=0.00841, lr=0.0004]\nSteps: 4%|▍ | 30/700 [00:21<05:52, 1.90it/s, loss=0.00841, lr=0.0004]\nSteps: 4%|▍ | 30/700 [00:21<05:52, 1.90it/s, loss=0.0104, lr=0.0004] \nSteps: 4%|▍ | 31/700 [00:22<05:55, 1.88it/s, loss=0.0104, lr=0.0004]\nSteps: 4%|▍ | 31/700 [00:22<05:55, 1.88it/s, loss=0.0769, lr=0.0004]\nSteps: 5%|▍ | 32/700 [00:22<05:50, 1.91it/s, loss=0.0769, lr=0.0004]\nSteps: 5%|▍ | 32/700 [00:22<05:50, 1.91it/s, loss=0.0564, lr=0.0004]\nSteps: 5%|▍ | 33/700 [00:23<05:48, 1.91it/s, loss=0.0564, lr=0.0004]\nSteps: 5%|▍ | 33/700 [00:23<05:48, 1.91it/s, loss=0.00519, lr=0.0004]\nSteps: 5%|▍ | 34/700 [00:23<05:44, 1.93it/s, loss=0.00519, lr=0.0004]\nSteps: 5%|▍ | 34/700 [00:23<05:44, 1.93it/s, loss=0.00172, lr=0.0004]\nSteps: 5%|▌ | 35/700 [00:24<05:43, 1.94it/s, loss=0.00172, lr=0.0004]\nSteps: 5%|▌ | 35/700 [00:24<05:43, 1.94it/s, loss=0.00847, lr=0.0004]\nSteps: 5%|▌ | 36/700 [00:24<05:40, 1.95it/s, loss=0.00847, lr=0.0004]\nSteps: 5%|▌ | 36/700 [00:24<05:40, 1.95it/s, loss=0.00893, lr=0.0004]\nSteps: 5%|▌ | 37/700 [00:25<05:38, 1.96it/s, loss=0.00893, lr=0.0004]\nSteps: 5%|▌ | 37/700 [00:25<05:38, 1.96it/s, loss=0.00843, lr=0.0004]\nSteps: 5%|▌ | 38/700 [00:25<05:40, 1.94it/s, loss=0.00843, lr=0.0004]\nSteps: 5%|▌ | 38/700 [00:25<05:40, 1.94it/s, loss=0.00305, lr=0.0004]\nSteps: 6%|▌ | 39/700 [00:26<05:48, 1.90it/s, loss=0.00305, lr=0.0004]\nSteps: 6%|▌ | 39/700 [00:26<05:48, 1.90it/s, loss=0.012, lr=0.0004] \nSteps: 6%|▌ | 40/700 [00:26<05:53, 1.87it/s, loss=0.012, lr=0.0004]\nSteps: 6%|▌ | 40/700 [00:26<05:53, 1.87it/s, loss=0.0233, lr=0.0004]\nSteps: 6%|▌ | 41/700 [00:27<05:56, 1.85it/s, 
loss=0.0233, lr=0.0004]\nSteps: 6%|▌ | 41/700 [00:27<05:56, 1.85it/s, loss=0.0213, lr=0.0004]\nSteps: 6%|▌ | 42/700 [00:28<05:57, 1.84it/s, loss=0.0213, lr=0.0004]\nSteps: 6%|▌ | 42/700 [00:28<05:57, 1.84it/s, loss=0.00223, lr=0.0004]\nSteps: 6%|▌ | 43/700 [00:28<06:00, 1.82it/s, loss=0.00223, lr=0.0004]\nSteps: 6%|▌ | 43/700 [00:28<06:00, 1.82it/s, loss=0.0261, lr=0.0004] \nSteps: 6%|▋ | 44/700 [00:29<06:10, 1.77it/s, loss=0.0261, lr=0.0004]\nSteps: 6%|▋ | 44/700 [00:29<06:10, 1.77it/s, loss=0.0833, lr=0.0004]\nSteps: 6%|▋ | 45/700 [00:29<06:12, 1.76it/s, loss=0.0833, lr=0.0004]\nSteps: 6%|▋ | 45/700 [00:29<06:12, 1.76it/s, loss=0.0273, lr=0.0004]\nSteps: 7%|▋ | 46/700 [00:30<06:14, 1.75it/s, loss=0.0273, lr=0.0004]\nSteps: 7%|▋ | 46/700 [00:30<06:14, 1.75it/s, loss=0.00564, lr=0.0004]\nSteps: 7%|▋ | 47/700 [00:30<06:12, 1.76it/s, loss=0.00564, lr=0.0004]\nSteps: 7%|▋ | 47/700 [00:30<06:12, 1.76it/s, loss=0.0392, lr=0.0004] \nSteps: 7%|▋ | 48/700 [00:31<06:07, 1.77it/s, loss=0.0392, lr=0.0004]\nSteps: 7%|▋ | 48/700 [00:31<06:07, 1.77it/s, loss=0.00178, lr=0.0004]\nSteps: 7%|▋ | 49/700 [00:32<06:06, 1.78it/s, loss=0.00178, lr=0.0004]\nSteps: 7%|▋ | 49/700 [00:32<06:06, 1.78it/s, loss=0.0246, lr=0.0004] \nSteps: 7%|▋ | 50/700 [00:32<06:04, 1.78it/s, loss=0.0246, lr=0.0004]\nSteps: 7%|▋ | 50/700 [00:32<06:04, 1.78it/s, loss=0.00817, lr=0.0004]\nSteps: 7%|▋ | 51/700 [00:33<06:10, 1.75it/s, loss=0.00817, lr=0.0004]\nSteps: 7%|▋ | 51/700 [00:33<06:10, 1.75it/s, loss=0.0547, lr=0.0004] \nSteps: 7%|▋ | 52/700 [00:33<06:10, 1.75it/s, loss=0.0547, lr=0.0004]\nSteps: 7%|▋ | 52/700 [00:33<06:10, 1.75it/s, loss=0.0248, lr=0.0004]\nSteps: 8%|▊ | 53/700 [00:34<06:16, 1.72it/s, loss=0.0248, lr=0.0004]\nSteps: 8%|▊ | 53/700 [00:34<06:16, 1.72it/s, loss=0.0956, lr=0.0004]\nSteps: 8%|▊ | 54/700 [00:34<06:09, 1.75it/s, loss=0.0956, lr=0.0004]\nSteps: 8%|▊ | 54/700 [00:34<06:09, 1.75it/s, loss=0.0246, lr=0.0004]\nSteps: 8%|▊ | 55/700 [00:35<06:06, 1.76it/s, loss=0.0246, 
lr=0.0004]\nSteps: 8%|▊ | 55/700 [00:35<06:06, 1.76it/s, loss=0.0204, lr=0.0004]\nSteps: 8%|▊ | 56/700 [00:36<06:02, 1.78it/s, loss=0.0204, lr=0.0004]\nSteps: 8%|▊ | 56/700 [00:36<06:02, 1.78it/s, loss=0.00192, lr=0.0004]\nSteps: 8%|▊ | 57/700 [00:36<06:03, 1.77it/s, loss=0.00192, lr=0.0004]\nSteps: 8%|▊ | 57/700 [00:36<06:03, 1.77it/s, loss=0.0176, lr=0.0004] \nSteps: 8%|▊ | 58/700 [00:37<06:03, 1.77it/s, loss=0.0176, lr=0.0004]\nSteps: 8%|▊ | 58/700 [00:37<06:03, 1.77it/s, loss=0.0782, lr=0.0004]\nSteps: 8%|▊ | 59/700 [00:37<06:04, 1.76it/s, loss=0.0782, lr=0.0004]\nSteps: 8%|▊ | 59/700 [00:37<06:04, 1.76it/s, loss=0.297, lr=0.0004] \nSteps: 9%|▊ | 60/700 [00:38<05:59, 1.78it/s, loss=0.297, lr=0.0004]\nSteps: 9%|▊ | 60/700 [00:38<05:59, 1.78it/s, loss=0.0103, lr=0.0004]\nSteps: 9%|▊ | 61/700 [00:38<05:56, 1.79it/s, loss=0.0103, lr=0.0004]\nSteps: 9%|▊ | 61/700 [00:38<05:56, 1.79it/s, loss=0.00232, lr=0.0004]\nSteps: 9%|▉ | 62/700 [00:39<05:57, 1.78it/s, loss=0.00232, lr=0.0004]\nSteps: 9%|▉ | 62/700 [00:39<05:57, 1.78it/s, loss=0.135, lr=0.0004] \nSteps: 9%|▉ | 63/700 [00:39<05:52, 1.81it/s, loss=0.135, lr=0.0004]\nSteps: 9%|▉ | 63/700 [00:39<05:52, 1.81it/s, loss=0.0448, lr=0.0004]\nSteps: 9%|▉ | 64/700 [00:40<05:49, 1.82it/s, loss=0.0448, lr=0.0004]\nSteps: 9%|▉ | 64/700 [00:40<05:49, 1.82it/s, loss=0.0329, lr=0.0004]\nSteps: 9%|▉ | 65/700 [00:41<05:48, 1.82it/s, loss=0.0329, lr=0.0004]\nSteps: 9%|▉ | 65/700 [00:41<05:48, 1.82it/s, loss=0.102, lr=0.0004] \nSteps: 9%|▉ | 66/700 [00:41<05:48, 1.82it/s, loss=0.102, lr=0.0004]\nSteps: 9%|▉ | 66/700 [00:41<05:48, 1.82it/s, loss=0.136, lr=0.0004]\nSteps: 10%|▉ | 67/700 [00:42<05:49, 1.81it/s, loss=0.136, lr=0.0004]\nSteps: 10%|▉ | 67/700 [00:42<05:49, 1.81it/s, loss=0.0229, lr=0.0004]\nSteps: 10%|▉ | 68/700 [00:42<05:47, 1.82it/s, loss=0.0229, lr=0.0004]\nSteps: 10%|▉ | 68/700 [00:42<05:47, 1.82it/s, loss=0.0538, lr=0.0004]\nSteps: 10%|▉ | 69/700 [00:43<05:43, 1.84it/s, loss=0.0538, lr=0.0004]\nSteps: 10%|▉ | 69/700 
[00:43<05:43, 1.84it/s, loss=0.0282, lr=0.0004]\nSteps: 10%|█ | 70/700 [00:43<05:42, 1.84it/s, loss=0.0282, lr=0.0004]\nSteps: 10%|█ | 70/700 [00:43<05:42, 1.84it/s, loss=0.00587, lr=0.0004]\nSteps: 10%|█ | 71/700 [00:44<05:45, 1.82it/s, loss=0.00587, lr=0.0004]\nSteps: 10%|█ | 71/700 [00:44<05:45, 1.82it/s, loss=0.0534, lr=0.0004] \nSteps: 10%|█ | 72/700 [00:44<05:43, 1.83it/s, loss=0.0534, lr=0.0004]\nSteps: 10%|█ | 72/700 [00:44<05:43, 1.83it/s, loss=0.00902, lr=0.0004]\nSteps: 10%|█ | 73/700 [00:45<05:40, 1.84it/s, loss=0.00902, lr=0.0004]\nSteps: 10%|█ | 73/700 [00:45<05:40, 1.84it/s, loss=0.00754, lr=0.0004]\nSteps: 11%|█ | 74/700 [00:45<05:42, 1.83it/s, loss=0.00754, lr=0.0004]\nSteps: 11%|█ | 74/700 [00:45<05:42, 1.83it/s, loss=0.00843, lr=0.0004]\nSteps: 11%|█ | 75/700 [00:46<05:42, 1.83it/s, loss=0.00843, lr=0.0004]\nSteps: 11%|█ | 75/700 [00:46<05:42, 1.83it/s, loss=0.0558, lr=0.0004] \nSteps: 11%|█ | 76/700 [00:47<05:41, 1.83it/s, loss=0.0558, lr=0.0004]\nSteps: 11%|█ | 76/700 [00:47<05:41, 1.83it/s, loss=0.014, lr=0.0004] \nSteps: 11%|█ | 77/700 [00:47<06:10, 1.68it/s, loss=0.014, lr=0.0004]\nSteps: 11%|█ | 77/700 [00:47<06:10, 1.68it/s, loss=0.0103, lr=0.0004]\nSteps: 11%|█ | 78/700 [00:48<06:02, 1.72it/s, loss=0.0103, lr=0.0004]\nSteps: 11%|█ | 78/700 [00:48<06:02, 1.72it/s, loss=0.199, lr=0.0004] \nSteps: 11%|█▏ | 79/700 [00:48<05:54, 1.75it/s, loss=0.199, lr=0.0004]\nSteps: 11%|█▏ | 79/700 [00:48<05:54, 1.75it/s, loss=0.00994, lr=0.0004]\nSteps: 11%|█▏ | 80/700 [00:49<05:50, 1.77it/s, loss=0.00994, lr=0.0004]\nSteps: 11%|█▏ | 80/700 [00:49<05:50, 1.77it/s, loss=0.00166, lr=0.0004]\nSteps: 12%|█▏ | 81/700 [00:49<05:47, 1.78it/s, loss=0.00166, lr=0.0004]\nSteps: 12%|█▏ | 81/700 [00:49<05:47, 1.78it/s, loss=0.307, lr=0.0004] \nSteps: 12%|█▏ | 82/700 [00:50<05:46, 1.79it/s, loss=0.307, lr=0.0004]\nSteps: 12%|█▏ | 82/700 [00:50<05:46, 1.79it/s, loss=0.0787, lr=0.0004]\nSteps: 12%|█▏ | 83/700 [00:51<05:41, 1.80it/s, loss=0.0787, lr=0.0004]\nSteps: 12%|█▏ 
| 83/700 [00:51<05:41, 1.80it/s, loss=0.0285, lr=0.0004]\nSteps: 12%|█▏ | 84/700 [00:51<05:41, 1.81it/s, loss=0.0285, lr=0.0004]\nSteps: 12%|█▏ | 84/700 [00:51<05:41, 1.81it/s, loss=0.0156, lr=0.0004]\nSteps: 12%|█▏ | 85/700 [00:52<05:39, 1.81it/s, loss=0.0156, lr=0.0004]\nSteps: 12%|█▏ | 85/700 [00:52<05:39, 1.81it/s, loss=0.00945, lr=0.0004]\nSteps: 12%|█▏ | 86/700 [00:52<05:36, 1.82it/s, loss=0.00945, lr=0.0004]\nSteps: 12%|█▏ | 86/700 [00:52<05:36, 1.82it/s, loss=0.0294, lr=0.0004] \nSteps: 12%|█▏ | 87/700 [00:53<05:38, 1.81it/s, loss=0.0294, lr=0.0004]\nSteps: 12%|█▏ | 87/700 [00:53<05:38, 1.81it/s, loss=0.0266, lr=0.0004]\nSteps: 13%|█▎ | 88/700 [00:53<05:37, 1.82it/s, loss=0.0266, lr=0.0004]\nSteps: 13%|█▎ | 88/700 [00:53<05:37, 1.82it/s, loss=0.00252, lr=0.0004]\nSteps: 13%|█▎ | 89/700 [00:54<05:39, 1.80it/s, loss=0.00252, lr=0.0004]\nSteps: 13%|█▎ | 89/700 [00:54<05:39, 1.80it/s, loss=0.0111, lr=0.0004] \nSteps: 13%|█▎ | 90/700 [00:54<05:44, 1.77it/s, loss=0.0111, lr=0.0004]\nSteps: 13%|█▎ | 90/700 [00:54<05:44, 1.77it/s, loss=0.0113, lr=0.0004]\nSteps: 13%|█▎ | 91/700 [00:55<05:38, 1.80it/s, loss=0.0113, lr=0.0004]\nSteps: 13%|█▎ | 91/700 [00:55<05:38, 1.80it/s, loss=0.0463, lr=0.0004]\nSteps: 13%|█▎ | 92/700 [00:56<05:32, 1.83it/s, loss=0.0463, lr=0.0004]\nSteps: 13%|█▎ | 92/700 [00:56<05:32, 1.83it/s, loss=0.00671, lr=0.0004]\nSteps: 13%|█▎ | 93/700 [00:56<05:29, 1.84it/s, loss=0.00671, lr=0.0004]\nSteps: 13%|█▎ | 93/700 [00:56<05:29, 1.84it/s, loss=0.0407, lr=0.0004] \nSteps: 13%|█▎ | 94/700 [00:57<05:28, 1.84it/s, loss=0.0407, lr=0.0004]\nSteps: 13%|█▎ | 94/700 [00:57<05:28, 1.84it/s, loss=0.00514, lr=0.0004]\nSteps: 14%|█▎ | 95/700 [00:57<05:31, 1.83it/s, loss=0.00514, lr=0.0004]\nSteps: 14%|█▎ | 95/700 [00:57<05:31, 1.83it/s, loss=0.0298, lr=0.0004] \nSteps: 14%|█▎ | 96/700 [00:58<05:34, 1.81it/s, loss=0.0298, lr=0.0004]\nSteps: 14%|█▎ | 96/700 [00:58<05:34, 1.81it/s, loss=0.0139, lr=0.0004]\nSteps: 14%|█▍ | 97/700 [00:58<05:29, 1.83it/s, 
loss=0.0139, lr=0.0004]\nSteps: 14%|█▍ | 97/700 [00:58<05:29, 1.83it/s, loss=0.00684, lr=0.0004]\nSteps: 14%|█▍ | 98/700 [00:59<05:20, 1.88it/s, loss=0.00684, lr=0.0004]\nSteps: 14%|█▍ | 98/700 [00:59<05:20, 1.88it/s, loss=0.0252, lr=0.0004] \nSteps: 14%|█▍ | 99/700 [00:59<05:21, 1.87it/s, loss=0.0252, lr=0.0004]\nSteps: 14%|█▍ | 99/700 [00:59<05:21, 1.87it/s, loss=0.212, lr=0.0004] \nSteps: 14%|█▍ | 100/700 [01:00<05:22, 1.86it/s, loss=0.212, lr=0.0004]\nCurrent Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')\nCurrent Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')\nSaving weights to checkpoints/step_100.safetensors\nLORA Unet Moved 0.0007244422449730337\nLORA CLIP Moved 3.097903754678555e-05\nSteps: 14%|█▍ | 100/700 [01:00<05:22, 1.86it/s, loss=0.0229, lr=0.0004]\nSteps: 14%|█▍ | 101/700 [01:01<05:58, 1.67it/s, loss=0.0229, lr=0.0004]\nSteps: 14%|█▍ | 101/700 [01:01<05:58, 1.67it/s, loss=0.0265, lr=0.0004]\nSteps: 15%|█▍ | 102/700 [01:01<05:48, 1.72it/s, loss=0.0265, lr=0.0004]\nSteps: 15%|█▍ | 102/700 [01:01<05:48, 1.72it/s, loss=0.0872, lr=0.0004]\nSteps: 15%|█▍ | 103/700 [01:02<05:42, 1.75it/s, loss=0.0872, lr=0.0004]\nSteps: 15%|█▍ | 103/700 [01:02<05:42, 1.75it/s, loss=0.0143, lr=0.0004]\nSteps: 15%|█▍ | 104/700 [01:02<05:35, 1.78it/s, loss=0.0143, lr=0.0004]\nSteps: 15%|█▍ | 104/700 [01:02<05:35, 1.78it/s, loss=0.0161, lr=0.0004]\nSteps: 15%|█▌ | 105/700 [01:03<05:32, 1.79it/s, loss=0.0161, lr=0.0004]\nSteps: 15%|█▌ | 105/700 [01:03<05:32, 1.79it/s, loss=0.011, lr=0.0004] \nSteps: 15%|█▌ | 106/700 [01:03<05:31, 1.79it/s, loss=0.011, lr=0.0004]\nSteps: 15%|█▌ | 106/700 [01:03<05:31, 1.79it/s, loss=0.0072, lr=0.0004]\nSteps: 15%|█▌ | 107/700 [01:04<05:30, 1.80it/s, loss=0.0072, lr=0.0004]\nSteps: 15%|█▌ | 107/700 [01:04<05:30, 1.80it/s, loss=0.00261, lr=0.0004]\nSteps: 15%|█▌ | 108/700 [01:04<05:26, 1.81it/s, loss=0.00261, lr=0.0004]\nSteps: 15%|█▌ | 108/700 
[01:04<05:26, 1.81it/s, loss=0.00597, lr=0.0004]\nSteps: 16%|█▌ | 109/700 [01:05<05:24, 1.82it/s, loss=0.00597, lr=0.0004]\nSteps: 16%|█▌ | 109/700 [01:05<05:24, 1.82it/s, loss=0.073, lr=0.0004] \nSteps: 16%|█▌ | 110/700 [01:05<05:23, 1.82it/s, loss=0.073, lr=0.0004]\nSteps: 16%|█▌ | 110/700 [01:05<05:23, 1.82it/s, loss=0.0238, lr=0.0004]\nSteps: 16%|█▌ | 111/700 [01:06<05:21, 1.83it/s, loss=0.0238, lr=0.0004]\nSteps: 16%|█▌ | 111/700 [01:06<05:21, 1.83it/s, loss=0.00492, lr=0.0004]\nSteps: 16%|█▌ | 112/700 [01:07<05:19, 1.84it/s, loss=0.00492, lr=0.0004]\nSteps: 16%|█▌ | 112/700 [01:07<05:19, 1.84it/s, loss=0.00202, lr=0.0004]\nSteps: 16%|█▌ | 113/700 [01:07<05:18, 1.84it/s, loss=0.00202, lr=0.0004]\nSteps: 16%|█▌ | 113/700 [01:07<05:18, 1.84it/s, loss=0.0107, lr=0.0004] \nSteps: 16%|█▋ | 114/700 [01:08<05:18, 1.84it/s, loss=0.0107, lr=0.0004]\nSteps: 16%|█▋ | 114/700 [01:08<05:18, 1.84it/s, loss=0.0017, lr=0.0004]\nSteps: 16%|█▋ | 115/700 [01:08<05:19, 1.83it/s, loss=0.0017, lr=0.0004]\nSteps: 16%|█▋ | 115/700 [01:08<05:19, 1.83it/s, loss=0.0193, lr=0.0004]\nSteps: 17%|█▋ | 116/700 [01:09<05:17, 1.84it/s, loss=0.0193, lr=0.0004]\nSteps: 17%|█▋ | 116/700 [01:09<05:17, 1.84it/s, loss=0.0246, lr=0.0004]\nSteps: 17%|█▋ | 117/700 [01:09<05:16, 1.84it/s, loss=0.0246, lr=0.0004]\nSteps: 17%|█▋ | 117/700 [01:09<05:16, 1.84it/s, loss=0.0084, lr=0.0004]\nSteps: 17%|█▋ | 118/700 [01:10<05:19, 1.82it/s, loss=0.0084, lr=0.0004]\nSteps: 17%|█▋ | 118/700 [01:10<05:19, 1.82it/s, loss=0.369, lr=0.0004] \nSteps: 17%|█▋ | 119/700 [01:10<05:18, 1.82it/s, loss=0.369, lr=0.0004]\nSteps: 17%|█▋ | 119/700 [01:10<05:18, 1.82it/s, loss=0.0188, lr=0.0004]\nSteps: 17%|█▋ | 120/700 [01:11<05:20, 1.81it/s, loss=0.0188, lr=0.0004]\nSteps: 17%|█▋ | 120/700 [01:11<05:20, 1.81it/s, loss=0.0234, lr=0.0004]\nSteps: 17%|█▋ | 121/700 [01:11<05:18, 1.82it/s, loss=0.0234, lr=0.0004]\nSteps: 17%|█▋ | 121/700 [01:12<05:18, 1.82it/s, loss=0.0663, lr=0.0004]\nSteps: 17%|█▋ | 122/700 [01:12<05:15, 1.83it/s, 
loss=0.0663, lr=0.0004]\nSteps: 17%|█▋ | 122/700 [01:12<05:15, 1.83it/s, loss=0.00747, lr=0.0004]\nSteps: 18%|█▊ | 123/700 [01:13<05:14, 1.84it/s, loss=0.00747, lr=0.0004]\nSteps: 18%|█▊ | 123/700 [01:13<05:14, 1.84it/s, loss=0.0517, lr=0.0004] \nSteps: 18%|█▊ | 124/700 [01:13<05:13, 1.84it/s, loss=0.0517, lr=0.0004]\nSteps: 18%|█▊ | 124/700 [01:13<05:13, 1.84it/s, loss=0.00986, lr=0.0004]\nSteps: 18%|█▊ | 125/700 [01:14<05:13, 1.83it/s, loss=0.00986, lr=0.0004]\nSteps: 18%|█▊ | 125/700 [01:14<05:13, 1.83it/s, loss=0.00407, lr=0.0004]\nSteps: 18%|█▊ | 126/700 [01:14<05:14, 1.83it/s, loss=0.00407, lr=0.0004]\nSteps: 18%|█▊ | 126/700 [01:14<05:14, 1.83it/s, loss=0.00421, lr=0.0004]\nSteps: 18%|█▊ | 127/700 [01:15<05:15, 1.81it/s, loss=0.00421, lr=0.0004]\nSteps: 18%|█▊ | 127/700 [01:15<05:15, 1.81it/s, loss=0.0145, lr=0.0004] \nSteps: 18%|█▊ | 128/700 [01:15<05:11, 1.84it/s, loss=0.0145, lr=0.0004]\nSteps: 18%|█▊ | 128/700 [01:15<05:11, 1.84it/s, loss=0.00552, lr=0.0004]\nSteps: 18%|█▊ | 129/700 [01:16<05:09, 1.84it/s, loss=0.00552, lr=0.0004]\nSteps: 18%|█▊ | 129/700 [01:16<05:09, 1.84it/s, loss=0.0378, lr=0.0004] \nSteps: 19%|█▊ | 130/700 [01:16<05:08, 1.85it/s, loss=0.0378, lr=0.0004]\nSteps: 19%|█▊ | 130/700 [01:16<05:08, 1.85it/s, loss=0.0183, lr=0.0004]\nSteps: 19%|█▊ | 131/700 [01:17<05:07, 1.85it/s, loss=0.0183, lr=0.0004]\nSteps: 19%|█▊ | 131/700 [01:17<05:07, 1.85it/s, loss=0.0362, lr=0.0004]\nSteps: 19%|█▉ | 132/700 [01:17<05:06, 1.86it/s, loss=0.0362, lr=0.0004]\nSteps: 19%|█▉ | 132/700 [01:17<05:06, 1.86it/s, loss=0.0043, lr=0.0004]\nSteps: 19%|█▉ | 133/700 [01:18<05:02, 1.87it/s, loss=0.0043, lr=0.0004]\nSteps: 19%|█▉ | 133/700 [01:18<05:02, 1.87it/s, loss=0.0103, lr=0.0004]\nSteps: 19%|█▉ | 134/700 [01:18<04:58, 1.89it/s, loss=0.0103, lr=0.0004]\nSteps: 19%|█▉ | 134/700 [01:18<04:58, 1.89it/s, loss=0.0782, lr=0.0004]\nSteps: 19%|█▉ | 135/700 [01:19<04:57, 1.90it/s, loss=0.0782, lr=0.0004]\nSteps: 19%|█▉ | 135/700 [01:19<04:57, 1.90it/s, loss=0.00536, 
lr=0.0004]\nSteps: 19%|█▉ | 136/700 [01:20<04:54, 1.91it/s, loss=0.00536, lr=0.0004]\nSteps: 19%|█▉ | 136/700 [01:20<04:54, 1.91it/s, loss=0.00977, lr=0.0004]\nSteps: 20%|█▉ | 137/700 [01:20<04:50, 1.94it/s, loss=0.00977, lr=0.0004]\nSteps: 20%|█▉ | 137/700 [01:20<04:50, 1.94it/s, loss=0.0244, lr=0.0004] \nSteps: 20%|█▉ | 138/700 [01:21<04:49, 1.94it/s, loss=0.0244, lr=0.0004]\nSteps: 20%|█▉ | 138/700 [01:21<04:49, 1.94it/s, loss=0.0119, lr=0.0004]\nSteps: 20%|█▉ | 139/700 [01:21<04:53, 1.91it/s, loss=0.0119, lr=0.0004]\nSteps: 20%|█▉ | 139/700 [01:21<04:53, 1.91it/s, loss=0.00262, lr=0.0004]\nSteps: 20%|██ | 140/700 [01:22<04:55, 1.90it/s, loss=0.00262, lr=0.0004]\nSteps: 20%|██ | 140/700 [01:22<04:55, 1.90it/s, loss=0.0776, lr=0.0004] \nSteps: 20%|██ | 141/700 [01:22<04:58, 1.87it/s, loss=0.0776, lr=0.0004]\nSteps: 20%|██ | 141/700 [01:22<04:58, 1.87it/s, loss=0.00148, lr=0.0004]\nSteps: 20%|██ | 142/700 [01:23<05:01, 1.85it/s, loss=0.00148, lr=0.0004]\nSteps: 20%|██ | 142/700 [01:23<05:01, 1.85it/s, loss=0.0134, lr=0.0004] \nSteps: 20%|██ | 143/700 [01:23<05:04, 1.83it/s, loss=0.0134, lr=0.0004]\nSteps: 20%|██ | 143/700 [01:23<05:04, 1.83it/s, loss=0.0393, lr=0.0004]\nSteps: 21%|██ | 144/700 [01:24<05:06, 1.81it/s, loss=0.0393, lr=0.0004]\nSteps: 21%|██ | 144/700 [01:24<05:06, 1.81it/s, loss=0.164, lr=0.0004] \nSteps: 21%|██ | 145/700 [01:24<05:00, 1.85it/s, loss=0.164, lr=0.0004]\nSteps: 21%|██ | 145/700 [01:24<05:00, 1.85it/s, loss=0.0173, lr=0.0004]\nSteps: 21%|██ | 146/700 [01:25<05:00, 1.85it/s, loss=0.0173, lr=0.0004]\nSteps: 21%|██ | 146/700 [01:25<05:00, 1.85it/s, loss=0.00347, lr=0.0004]\nSteps: 21%|██ | 147/700 [01:25<04:58, 1.85it/s, loss=0.00347, lr=0.0004]\nSteps: 21%|██ | 147/700 [01:25<04:58, 1.85it/s, loss=0.0358, lr=0.0004] \nSteps: 21%|██ | 148/700 [01:26<04:57, 1.85it/s, loss=0.0358, lr=0.0004]\nSteps: 21%|██ | 148/700 [01:26<04:57, 1.85it/s, loss=0.00457, lr=0.0004]\nSteps: 21%|██▏ | 149/700 [01:27<04:56, 1.86it/s, loss=0.00457, 
lr=0.0004]\nSteps: 21%|██▏ | 149/700 [01:27<04:56, 1.86it/s, loss=0.0184, lr=0.0004] \nSteps: 21%|██▏ | 150/700 [01:27<04:49, 1.90it/s, loss=0.0184, lr=0.0004]\nSteps: 21%|██▏ | 150/700 [01:27<04:49, 1.90it/s, loss=0.00209, lr=0.0004]\nSteps: 22%|██▏ | 151/700 [01:28<04:47, 1.91it/s, loss=0.00209, lr=0.0004]\nSteps: 22%|██▏ | 151/700 [01:28<04:47, 1.91it/s, loss=0.0184, lr=0.0004] \nSteps: 22%|██▏ | 152/700 [01:28<04:48, 1.90it/s, loss=0.0184, lr=0.0004]\nSteps: 22%|██▏ | 152/700 [01:28<04:48, 1.90it/s, loss=0.242, lr=0.0004] \nSteps: 22%|██▏ | 153/700 [01:29<04:50, 1.88it/s, loss=0.242, lr=0.0004]\nSteps: 22%|██▏ | 153/700 [01:29<04:50, 1.88it/s, loss=0.0147, lr=0.0004]\nSteps: 22%|██▏ | 154/700 [01:29<04:51, 1.88it/s, loss=0.0147, lr=0.0004]\nSteps: 22%|██▏ | 154/700 [01:29<04:51, 1.88it/s, loss=0.018, lr=0.0004] \nSteps: 22%|██▏ | 155/700 [01:30<04:48, 1.89it/s, loss=0.018, lr=0.0004]\nSteps: 22%|██▏ | 155/700 [01:30<04:48, 1.89it/s, loss=0.0357, lr=0.0004]\nSteps: 22%|██▏ | 156/700 [01:30<04:51, 1.86it/s, loss=0.0357, lr=0.0004]\nSteps: 22%|██▏ | 156/700 [01:30<04:51, 1.86it/s, loss=0.0363, lr=0.0004]\nSteps: 22%|██▏ | 157/700 [01:31<04:46, 1.90it/s, loss=0.0363, lr=0.0004]\nSteps: 22%|██▏ | 157/700 [01:31<04:46, 1.90it/s, loss=0.0198, lr=0.0004]\nSteps: 23%|██▎ | 158/700 [01:31<04:43, 1.91it/s, loss=0.0198, lr=0.0004]\nSteps: 23%|██▎ | 158/700 [01:31<04:43, 1.91it/s, loss=0.00913, lr=0.0004]\nSteps: 23%|██▎ | 159/700 [01:32<04:47, 1.88it/s, loss=0.00913, lr=0.0004]\nSteps: 23%|██▎ | 159/700 [01:32<04:47, 1.88it/s, loss=0.00706, lr=0.0004]\nSteps: 23%|██▎ | 160/700 [01:32<04:46, 1.88it/s, loss=0.00706, lr=0.0004]\nSteps: 23%|██▎ | 160/700 [01:32<04:46, 1.88it/s, loss=0.0376, lr=0.0004] \nSteps: 23%|██▎ | 161/700 [01:33<04:46, 1.88it/s, loss=0.0376, lr=0.0004]\nSteps: 23%|██▎ | 161/700 [01:33<04:46, 1.88it/s, loss=0.0822, lr=0.0004]\nSteps: 23%|██▎ | 162/700 [01:33<04:54, 1.82it/s, loss=0.0822, lr=0.0004]\nSteps: 23%|██▎ | 162/700 [01:33<04:54, 1.82it/s, 
loss=0.0165, lr=0.0004]\nSteps: 23%|██▎ | 163/700 [01:34<04:54, 1.82it/s, loss=0.0165, lr=0.0004]\nSteps: 23%|██▎ | 163/700 [01:34<04:54, 1.82it/s, loss=0.0109, lr=0.0004]\nSteps: 23%|██▎ | 164/700 [01:35<04:52, 1.83it/s, loss=0.0109, lr=0.0004]\nSteps: 23%|██▎ | 164/700 [01:35<04:52, 1.83it/s, loss=0.0233, lr=0.0004]\nSteps: 24%|██▎ | 165/700 [01:35<04:49, 1.85it/s, loss=0.0233, lr=0.0004]\nSteps: 24%|██▎ | 165/700 [01:35<04:49, 1.85it/s, loss=0.00457, lr=0.0004]\nSteps: 24%|██▎ | 166/700 [01:36<04:45, 1.87it/s, loss=0.00457, lr=0.0004]\nSteps: 24%|██▎ | 166/700 [01:36<04:45, 1.87it/s, loss=0.0383, lr=0.0004] \nSteps: 24%|██▍ | 167/700 [01:36<04:42, 1.89it/s, loss=0.0383, lr=0.0004]\nSteps: 24%|██▍ | 167/700 [01:36<04:42, 1.89it/s, loss=0.074, lr=0.0004] \nSteps: 24%|██▍ | 168/700 [01:37<04:42, 1.88it/s, loss=0.074, lr=0.0004]\nSteps: 24%|██▍ | 168/700 [01:37<04:42, 1.88it/s, loss=0.0275, lr=0.0004]\nSteps: 24%|██▍ | 169/700 [01:37<04:42, 1.88it/s, loss=0.0275, lr=0.0004]\nSteps: 24%|██▍ | 169/700 [01:37<04:42, 1.88it/s, loss=0.012, lr=0.0004] \nSteps: 24%|██▍ | 170/700 [01:38<04:44, 1.87it/s, loss=0.012, lr=0.0004]\nSteps: 24%|██▍ | 170/700 [01:38<04:44, 1.87it/s, loss=0.00168, lr=0.0004]\nSteps: 24%|██▍ | 171/700 [01:38<04:44, 1.86it/s, loss=0.00168, lr=0.0004]\nSteps: 24%|██▍ | 171/700 [01:38<04:44, 1.86it/s, loss=0.00761, lr=0.0004]\nSteps: 25%|██▍ | 172/700 [01:39<04:45, 1.85it/s, loss=0.00761, lr=0.0004]\nSteps: 25%|██▍ | 172/700 [01:39<04:45, 1.85it/s, loss=0.002, lr=0.0004] \nSteps: 25%|██▍ | 173/700 [01:39<04:46, 1.84it/s, loss=0.002, lr=0.0004]\nSteps: 25%|██▍ | 173/700 [01:39<04:46, 1.84it/s, loss=0.0126, lr=0.0004]\nSteps: 25%|██▍ | 174/700 [01:40<04:47, 1.83it/s, loss=0.0126, lr=0.0004]\nSteps: 25%|██▍ | 174/700 [01:40<04:47, 1.83it/s, loss=0.0468, lr=0.0004]\nSteps: 25%|██▌ | 175/700 [01:40<04:47, 1.83it/s, loss=0.0468, lr=0.0004]\nSteps: 25%|██▌ | 175/700 [01:40<04:47, 1.83it/s, loss=0.0351, lr=0.0004]\nSteps: 25%|██▌ | 176/700 [01:41<04:44, 
1.84it/s, loss=0.0351, lr=0.0004]\nSteps: 25%|██▌ | 176/700 [01:41<04:44, 1.84it/s, loss=0.0108, lr=0.0004]\nSteps: 25%|██▌ | 177/700 [01:42<04:44, 1.84it/s, loss=0.0108, lr=0.0004]\nSteps: 25%|██▌ | 177/700 [01:42<04:44, 1.84it/s, loss=0.133, lr=0.0004] \nSteps: 25%|██▌ | 178/700 [01:42<04:42, 1.85it/s, loss=0.133, lr=0.0004]\nSteps: 25%|██▌ | 178/700 [01:42<04:42, 1.85it/s, loss=0.00218, lr=0.0004]\nSteps: 26%|██▌ | 179/700 [01:43<04:38, 1.87it/s, loss=0.00218, lr=0.0004]\nSteps: 26%|██▌ | 179/700 [01:43<04:38, 1.87it/s, loss=0.00678, lr=0.0004]\nSteps: 26%|██▌ | 180/700 [01:43<04:36, 1.88it/s, loss=0.00678, lr=0.0004]\nSteps: 26%|██▌ | 180/700 [01:43<04:36, 1.88it/s, loss=0.0145, lr=0.0004] \nSteps: 26%|██▌ | 181/700 [01:44<04:33, 1.90it/s, loss=0.0145, lr=0.0004]\nSteps: 26%|██▌ | 181/700 [01:44<04:33, 1.90it/s, loss=0.0168, lr=0.0004]\nSteps: 26%|██▌ | 182/700 [01:44<04:29, 1.93it/s, loss=0.0168, lr=0.0004]\nSteps: 26%|██▌ | 182/700 [01:44<04:29, 1.93it/s, loss=0.0101, lr=0.0004]\nSteps: 26%|██▌ | 183/700 [01:45<04:24, 1.96it/s, loss=0.0101, lr=0.0004]\nSteps: 26%|██▌ | 183/700 [01:45<04:24, 1.96it/s, loss=0.0785, lr=0.0004]\nSteps: 26%|██▋ | 184/700 [01:45<04:23, 1.96it/s, loss=0.0785, lr=0.0004]\nSteps: 26%|██▋ | 184/700 [01:45<04:23, 1.96it/s, loss=0.00305, lr=0.0004]\nSteps: 26%|██▋ | 185/700 [01:46<04:23, 1.96it/s, loss=0.00305, lr=0.0004]\nSteps: 26%|██▋ | 185/700 [01:46<04:23, 1.96it/s, loss=0.208, lr=0.0004] \nSteps: 27%|██▋ | 186/700 [01:46<04:25, 1.93it/s, loss=0.208, lr=0.0004]\nSteps: 27%|██▋ | 186/700 [01:46<04:25, 1.93it/s, loss=0.00711, lr=0.0004]\nSteps: 27%|██▋ | 187/700 [01:47<04:32, 1.88it/s, loss=0.00711, lr=0.0004]\nSteps: 27%|██▋ | 187/700 [01:47<04:32, 1.88it/s, loss=0.0302, lr=0.0004] \nSteps: 27%|██▋ | 188/700 [01:47<04:35, 1.86it/s, loss=0.0302, lr=0.0004]\nSteps: 27%|██▋ | 188/700 [01:47<04:35, 1.86it/s, loss=0.0422, lr=0.0004]\nSteps: 27%|██▋ | 189/700 [01:48<04:36, 1.85it/s, loss=0.0422, lr=0.0004]\nSteps: 27%|██▋ | 189/700 
[01:48<04:36, 1.85it/s, loss=0.0568, lr=0.0004]\nSteps: 27%|██▋ | 190/700 [01:48<04:35, 1.85it/s, loss=0.0568, lr=0.0004]\nSteps: 27%|██▋ | 190/700 [01:48<04:35, 1.85it/s, loss=0.00478, lr=0.0004]\nSteps: 27%|██▋ | 191/700 [01:49<04:36, 1.84it/s, loss=0.00478, lr=0.0004]\nSteps: 27%|██▋ | 191/700 [01:49<04:36, 1.84it/s, loss=0.0315, lr=0.0004] \nSteps: 27%|██▋ | 192/700 [01:49<04:37, 1.83it/s, loss=0.0315, lr=0.0004]\nSteps: 27%|██▋ | 192/700 [01:49<04:37, 1.83it/s, loss=0.00483, lr=0.0004]\nSteps: 28%|██▊ | 193/700 [01:50<04:30, 1.87it/s, loss=0.00483, lr=0.0004]\nSteps: 28%|██▊ | 193/700 [01:50<04:30, 1.87it/s, loss=0.0079, lr=0.0004] \nSteps: 28%|██▊ | 194/700 [01:51<04:28, 1.88it/s, loss=0.0079, lr=0.0004]\nSteps: 28%|██▊ | 194/700 [01:51<04:28, 1.88it/s, loss=0.00442, lr=0.0004]\nSteps: 28%|██▊ | 195/700 [01:51<04:28, 1.88it/s, loss=0.00442, lr=0.0004]\nSteps: 28%|██▊ | 195/700 [01:51<04:28, 1.88it/s, loss=0.047, lr=0.0004] \nSteps: 28%|██▊ | 196/700 [01:52<04:28, 1.88it/s, loss=0.047, lr=0.0004]\nSteps: 28%|██▊ | 196/700 [01:52<04:28, 1.88it/s, loss=0.0346, lr=0.0004]\nSteps: 28%|██▊ | 197/700 [01:52<04:28, 1.87it/s, loss=0.0346, lr=0.0004]\nSteps: 28%|██▊ | 197/700 [01:52<04:28, 1.87it/s, loss=0.128, lr=0.0004] \nSteps: 28%|██▊ | 198/700 [01:53<04:31, 1.85it/s, loss=0.128, lr=0.0004]\nSteps: 28%|██▊ | 198/700 [01:53<04:31, 1.85it/s, loss=0.00269, lr=0.0004]\nSteps: 28%|██▊ | 199/700 [01:53<04:32, 1.84it/s, loss=0.00269, lr=0.0004]\nSteps: 28%|██▊ | 199/700 [01:53<04:32, 1.84it/s, loss=0.0341, lr=0.0004] \nSteps: 29%|██▊ | 200/700 [01:54<04:39, 1.79it/s, loss=0.0341, lr=0.0004]\nCurrent Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')\nCurrent Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')\nSaving weights to checkpoints/step_200.safetensors\nLORA Unet Moved 0.0009888404747471213\nLORA CLIP Moved 4.0488466765964404e-05\nSteps: 29%|██▊ | 200/700 [01:54<04:39, 
1.79it/s, loss=0.12, lr=0.0004] \nSteps: 29%|██▊ | 201/700 [01:55<05:00, 1.66it/s, loss=0.12, lr=0.0004]\nSteps: 29%|██▊ | 201/700 [01:55<05:00, 1.66it/s, loss=0.0149, lr=0.0004]\nSteps: 29%|██▉ | 202/700 [01:55<04:52, 1.70it/s, loss=0.0149, lr=0.0004]\nSteps: 29%|██▉ | 202/700 [01:55<04:52, 1.70it/s, loss=0.0194, lr=0.0004]\nSteps: 29%|██▉ | 203/700 [01:56<04:44, 1.75it/s, loss=0.0194, lr=0.0004]\nSteps: 29%|██▉ | 203/700 [01:56<04:44, 1.75it/s, loss=0.00362, lr=0.0004]\nSteps: 29%|██▉ | 204/700 [01:56<04:42, 1.76it/s, loss=0.00362, lr=0.0004]\nSteps: 29%|██▉ | 204/700 [01:56<04:42, 1.76it/s, loss=0.0177, lr=0.0004] \nSteps: 29%|██▉ | 205/700 [01:57<04:39, 1.77it/s, loss=0.0177, lr=0.0004]\nSteps: 29%|██▉ | 205/700 [01:57<04:39, 1.77it/s, loss=0.0221, lr=0.0004]\nSteps: 29%|██▉ | 206/700 [01:57<04:36, 1.78it/s, loss=0.0221, lr=0.0004]\nSteps: 29%|██▉ | 206/700 [01:57<04:36, 1.78it/s, loss=0.0169, lr=0.0004]\nSteps: 30%|██▉ | 207/700 [01:58<04:34, 1.80it/s, loss=0.0169, lr=0.0004]\nSteps: 30%|██▉ | 207/700 [01:58<04:34, 1.80it/s, loss=0.0307, lr=0.0004]\nSteps: 30%|██▉ | 208/700 [01:58<04:39, 1.76it/s, loss=0.0307, lr=0.0004]\nSteps: 30%|██▉ | 208/700 [01:58<04:39, 1.76it/s, loss=0.0412, lr=0.0004]\nSteps: 30%|██▉ | 209/700 [01:59<04:51, 1.69it/s, loss=0.0412, lr=0.0004]\nSteps: 30%|██▉ | 209/700 [01:59<04:51, 1.69it/s, loss=0.0109, lr=0.0004]\nSteps: 30%|███ | 210/700 [02:00<04:45, 1.72it/s, loss=0.0109, lr=0.0004]\nSteps: 30%|███ | 210/700 [02:00<04:45, 1.72it/s, loss=0.00631, lr=0.0004]\nSteps: 30%|███ | 211/700 [02:00<04:42, 1.73it/s, loss=0.00631, lr=0.0004]\nSteps: 30%|███ | 211/700 [02:00<04:42, 1.73it/s, loss=0.135, lr=0.0004] \nSteps: 30%|███ | 212/700 [02:01<04:35, 1.77it/s, loss=0.135, lr=0.0004]\nSteps: 30%|███ | 212/700 [02:01<04:35, 1.77it/s, loss=0.0202, lr=0.0004]\nSteps: 30%|███ | 213/700 [02:01<04:34, 1.77it/s, loss=0.0202, lr=0.0004]\nSteps: 30%|███ | 213/700 [02:01<04:34, 1.77it/s, loss=0.00592, lr=0.0004]\nSteps: 31%|███ | 214/700 [02:02<04:34, 
1.77it/s, loss=0.00592, lr=0.0004]\nSteps: 31%|███ | 214/700 [02:02<04:34, 1.77it/s, loss=0.267, lr=0.0004] \nSteps: 31%|███ | 215/700 [02:02<04:28, 1.81it/s, loss=0.267, lr=0.0004]\nSteps: 31%|███ | 215/700 [02:02<04:28, 1.81it/s, loss=0.0209, lr=0.0004]\nSteps: 31%|███ | 216/700 [02:03<04:25, 1.82it/s, loss=0.0209, lr=0.0004]\nSteps: 31%|███ | 216/700 [02:03<04:25, 1.82it/s, loss=0.0375, lr=0.0004]\nSteps: 31%|███ | 217/700 [02:03<04:26, 1.81it/s, loss=0.0375, lr=0.0004]\nSteps: 31%|███ | 217/700 [02:03<04:26, 1.81it/s, loss=0.00811, lr=0.0004]\nSteps: 31%|███ | 218/700 [02:04<04:28, 1.79it/s, loss=0.00811, lr=0.0004]\nSteps: 31%|███ | 218/700 [02:04<04:28, 1.79it/s, loss=0.0201, lr=0.0004] \nSteps: 31%|███▏ | 219/700 [02:05<04:25, 1.81it/s, loss=0.0201, lr=0.0004]\nSteps: 31%|███▏ | 219/700 [02:05<04:25, 1.81it/s, loss=0.0114, lr=0.0004]\nSteps: 31%|███▏ | 220/700 [02:05<04:24, 1.82it/s, loss=0.0114, lr=0.0004]\nSteps: 31%|███▏ | 220/700 [02:05<04:24, 1.82it/s, loss=0.104, lr=0.0004] \nSteps: 32%|███▏ | 221/700 [02:06<04:24, 1.81it/s, loss=0.104, lr=0.0004]\nSteps: 32%|███▏ | 221/700 [02:06<04:24, 1.81it/s, loss=0.0184, lr=0.0004]\nSteps: 32%|███▏ | 222/700 [02:06<04:27, 1.78it/s, loss=0.0184, lr=0.0004]\nSteps: 32%|███▏ | 222/700 [02:06<04:27, 1.78it/s, loss=0.0112, lr=0.0004]\nSteps: 32%|███▏ | 223/700 [02:07<04:34, 1.73it/s, loss=0.0112, lr=0.0004]\nSteps: 32%|███▏ | 223/700 [02:07<04:34, 1.73it/s, loss=0.0133, lr=0.0004]\nSteps: 32%|███▏ | 224/700 [02:07<04:32, 1.75it/s, loss=0.0133, lr=0.0004]\nSteps: 32%|███▏ | 224/700 [02:07<04:32, 1.75it/s, loss=0.0264, lr=0.0004]\nSteps: 32%|███▏ | 225/700 [02:08<04:26, 1.78it/s, loss=0.0264, lr=0.0004]\nSteps: 32%|███▏ | 225/700 [02:08<04:26, 1.78it/s, loss=0.0537, lr=0.0004]\nSteps: 32%|███▏ | 226/700 [02:09<04:25, 1.79it/s, loss=0.0537, lr=0.0004]\nSteps: 32%|███▏ | 226/700 [02:09<04:25, 1.79it/s, loss=0.00868, lr=0.0004]\nSteps: 32%|███▏ | 227/700 [02:09<04:23, 1.79it/s, loss=0.00868, lr=0.0004]\nSteps: 32%|███▏ | 
227/700 [02:09<04:23, 1.79it/s, loss=0.0373, lr=0.0004] \nSteps: 33%|███▎ | 228/700 [02:10<04:19, 1.82it/s, loss=0.0373, lr=0.0004]\nSteps: 33%|███▎ | 228/700 [02:10<04:19, 1.82it/s, loss=0.0108, lr=0.0004]\nSteps: 33%|███▎ | 229/700 [02:10<04:16, 1.83it/s, loss=0.0108, lr=0.0004]\nSteps: 33%|███▎ | 229/700 [02:10<04:16, 1.83it/s, loss=0.0296, lr=0.0004]\nSteps: 33%|███▎ | 230/700 [02:11<04:14, 1.85it/s, loss=0.0296, lr=0.0004]\nSteps: 33%|███▎ | 230/700 [02:11<04:14, 1.85it/s, loss=0.0044, lr=0.0004]\nSteps: 33%|███▎ | 231/700 [02:11<04:15, 1.83it/s, loss=0.0044, lr=0.0004]\nSteps: 33%|███▎ | 231/700 [02:11<04:15, 1.83it/s, loss=0.156, lr=0.0004] \nSteps: 33%|███▎ | 232/700 [02:12<04:15, 1.83it/s, loss=0.156, lr=0.0004]\nSteps: 33%|███▎ | 232/700 [02:12<04:15, 1.83it/s, loss=0.00477, lr=0.0004]\nSteps: 33%|███▎ | 233/700 [02:12<04:11, 1.86it/s, loss=0.00477, lr=0.0004]\nSteps: 33%|███▎ | 233/700 [02:12<04:11, 1.86it/s, loss=0.112, lr=0.0004] \nSteps: 33%|███▎ | 234/700 [02:13<04:09, 1.87it/s, loss=0.112, lr=0.0004]\nSteps: 33%|███▎ | 234/700 [02:13<04:09, 1.87it/s, loss=0.0136, lr=0.0004]\nSteps: 34%|███▎ | 235/700 [02:13<04:05, 1.89it/s, loss=0.0136, lr=0.0004]\nSteps: 34%|███▎ | 235/700 [02:13<04:05, 1.89it/s, loss=0.0123, lr=0.0004]\nSteps: 34%|███▎ | 236/700 [02:14<04:03, 1.91it/s, loss=0.0123, lr=0.0004]\nSteps: 34%|███▎ | 236/700 [02:14<04:03, 1.91it/s, loss=0.022, lr=0.0004] \nSteps: 34%|███▍ | 237/700 [02:14<04:00, 1.93it/s, loss=0.022, lr=0.0004]\nSteps: 34%|███▍ | 237/700 [02:14<04:00, 1.93it/s, loss=0.00886, lr=0.0004]\nSteps: 34%|███▍ | 238/700 [02:15<03:59, 1.93it/s, loss=0.00886, lr=0.0004]\nSteps: 34%|███▍ | 238/700 [02:15<03:59, 1.93it/s, loss=0.00845, lr=0.0004]\nSteps: 34%|███▍ | 239/700 [02:15<03:57, 1.94it/s, loss=0.00845, lr=0.0004]\nSteps: 34%|███▍ | 239/700 [02:15<03:57, 1.94it/s, loss=0.00988, lr=0.0004]\nSteps: 34%|███▍ | 240/700 [02:16<03:56, 1.94it/s, loss=0.00988, lr=0.0004]\nSteps: 34%|███▍ | 240/700 [02:16<03:56, 1.94it/s, 
loss=0.00246, lr=0.0004]\nSteps: 34%|███▍ | 241/700 [02:16<03:53, 1.97it/s, loss=0.00246, lr=0.0004]\nSteps: 34%|███▍ | 241/700 [02:16<03:53, 1.97it/s, loss=0.00873, lr=0.0004]\nSteps: 35%|███▍ | 242/700 [02:17<03:51, 1.98it/s, loss=0.00873, lr=0.0004]\nSteps: 35%|███▍ | 242/700 [02:17<03:51, 1.98it/s, loss=0.00512, lr=0.0004]\nSteps: 35%|███▍ | 243/700 [02:17<03:49, 1.99it/s, loss=0.00512, lr=0.0004]\nSteps: 35%|███▍ | 243/700 [02:17<03:49, 1.99it/s, loss=0.0248, lr=0.0004] \nSteps: 35%|███▍ | 244/700 [02:18<03:47, 2.00it/s, loss=0.0248, lr=0.0004]\nSteps: 35%|███▍ | 244/700 [02:18<03:47, 2.00it/s, loss=0.00431, lr=0.0004]\nSteps: 35%|███▌ | 245/700 [02:18<03:49, 1.98it/s, loss=0.00431, lr=0.0004]\nSteps: 35%|███▌ | 245/700 [02:18<03:49, 1.98it/s, loss=0.0201, lr=0.0004] \nSteps: 35%|███▌ | 246/700 [02:19<03:53, 1.95it/s, loss=0.0201, lr=0.0004]\nSteps: 35%|███▌ | 246/700 [02:19<03:53, 1.95it/s, loss=0.0103, lr=0.0004]\nSteps: 35%|███▌ | 247/700 [02:19<03:50, 1.96it/s, loss=0.0103, lr=0.0004]\nSteps: 35%|███▌ | 247/700 [02:19<03:50, 1.96it/s, loss=0.0497, lr=0.0004]\nSteps: 35%|███▌ | 248/700 [02:20<03:55, 1.92it/s, loss=0.0497, lr=0.0004]\nSteps: 35%|███▌ | 248/700 [02:20<03:55, 1.92it/s, loss=0.163, lr=0.0004] \nSteps: 36%|███▌ | 249/700 [02:21<04:03, 1.85it/s, loss=0.163, lr=0.0004]\nSteps: 36%|███▌ | 249/700 [02:21<04:03, 1.85it/s, loss=0.0142, lr=0.0004]\nSteps: 36%|███▌ | 250/700 [02:21<03:57, 1.90it/s, loss=0.0142, lr=0.0004]\nSteps: 36%|███▌ | 250/700 [02:21<03:57, 1.90it/s, loss=0.00624, lr=0.0004]\nSteps: 36%|███▌ | 251/700 [02:22<03:53, 1.92it/s, loss=0.00624, lr=0.0004]\nSteps: 36%|███▌ | 251/700 [02:22<03:53, 1.92it/s, loss=0.0026, lr=0.0004] \nSteps: 36%|███▌ | 252/700 [02:22<03:52, 1.93it/s, loss=0.0026, lr=0.0004]\nSteps: 36%|███▌ | 252/700 [02:22<03:52, 1.93it/s, loss=0.15, lr=0.0004] \nSteps: 36%|███▌ | 253/700 [02:23<03:51, 1.93it/s, loss=0.15, lr=0.0004]\nSteps: 36%|███▌ | 253/700 [02:23<03:51, 1.93it/s, loss=0.0312, lr=0.0004]\nSteps: 36%|███▋ 
| 254/700 [02:23<03:51, 1.93it/s, loss=0.0312, lr=0.0004]\nSteps: 36%|███▋ | 254/700 [02:23<03:51, 1.93it/s, loss=0.0161, lr=0.0004]\nSteps: 36%|███▋ | 255/700 [02:24<03:50, 1.93it/s, loss=0.0161, lr=0.0004]\nSteps: 36%|███▋ | 255/700 [02:24<03:50, 1.93it/s, loss=0.00627, lr=0.0004]\nSteps: 37%|███▋ | 256/700 [02:24<03:47, 1.95it/s, loss=0.00627, lr=0.0004]\nSteps: 37%|███▋ | 256/700 [02:24<03:47, 1.95it/s, loss=0.0224, lr=0.0004] \nSteps: 37%|███▋ | 257/700 [02:25<03:48, 1.94it/s, loss=0.0224, lr=0.0004]\nSteps: 37%|███▋ | 257/700 [02:25<03:48, 1.94it/s, loss=0.0383, lr=0.0004]\nSteps: 37%|███▋ | 258/700 [02:25<03:53, 1.90it/s, loss=0.0383, lr=0.0004]\nSteps: 37%|███▋ | 258/700 [02:25<03:53, 1.90it/s, loss=0.0124, lr=0.0004]\nSteps: 37%|███▋ | 259/700 [02:26<04:00, 1.84it/s, loss=0.0124, lr=0.0004]\nSteps: 37%|███▋ | 259/700 [02:26<04:00, 1.84it/s, loss=0.00859, lr=0.0004]\nSteps: 37%|███▋ | 260/700 [02:26<03:59, 1.84it/s, loss=0.00859, lr=0.0004]\nSteps: 37%|███▋ | 260/700 [02:26<03:59, 1.84it/s, loss=0.25, lr=0.0004] \nSteps: 37%|███▋ | 261/700 [02:27<04:04, 1.80it/s, loss=0.25, lr=0.0004]\nSteps: 37%|███▋ | 261/700 [02:27<04:04, 1.80it/s, loss=0.00184, lr=0.0004]\nSteps: 37%|███▋ | 262/700 [02:28<04:07, 1.77it/s, loss=0.00184, lr=0.0004]\nSteps: 37%|███▋ | 262/700 [02:28<04:07, 1.77it/s, loss=0.0153, lr=0.0004] \nSteps: 38%|███▊ | 263/700 [02:28<04:12, 1.73it/s, loss=0.0153, lr=0.0004]\nSteps: 38%|███▊ | 263/700 [02:28<04:12, 1.73it/s, loss=0.0682, lr=0.0004]\nSteps: 38%|███▊ | 264/700 [02:29<04:14, 1.71it/s, loss=0.0682, lr=0.0004]\nSteps: 38%|███▊ | 264/700 [02:29<04:14, 1.71it/s, loss=0.0619, lr=0.0004]\nSteps: 38%|███▊ | 265/700 [02:29<04:28, 1.62it/s, loss=0.0619, lr=0.0004]\nSteps: 38%|███▊ | 265/700 [02:29<04:28, 1.62it/s, loss=0.0181, lr=0.0004]\nSteps: 38%|███▊ | 266/700 [02:30<04:25, 1.64it/s, loss=0.0181, lr=0.0004]\nSteps: 38%|███▊ | 266/700 [02:30<04:25, 1.64it/s, loss=0.0288, lr=0.0004]\nSteps: 38%|███▊ | 267/700 [02:31<04:23, 1.64it/s, 
loss=0.0288, lr=0.0004]\nSteps: 38%|███▊ | 267/700 [02:31<04:23, 1.64it/s, loss=0.00962, lr=0.0004]\nSteps: 38%|███▊ | 268/700 [02:31<04:28, 1.61it/s, loss=0.00962, lr=0.0004]\nSteps: 38%|███▊ | 268/700 [02:31<04:28, 1.61it/s, loss=0.0127, lr=0.0004] \nSteps: 38%|███▊ | 269/700 [02:32<04:27, 1.61it/s, loss=0.0127, lr=0.0004]\nSteps: 38%|███▊ | 269/700 [02:32<04:27, 1.61it/s, loss=0.00764, lr=0.0004]\nSteps: 39%|███▊ | 270/700 [02:33<04:29, 1.60it/s, loss=0.00764, lr=0.0004]\nSteps: 39%|███▊ | 270/700 [02:33<04:29, 1.60it/s, loss=0.005, lr=0.0004] \nSteps: 39%|███▊ | 271/700 [02:33<04:29, 1.59it/s, loss=0.005, lr=0.0004]\nSteps: 39%|███▊ | 271/700 [02:33<04:29, 1.59it/s, loss=0.0286, lr=0.0004]\nSteps: 39%|███▉ | 272/700 [02:34<04:31, 1.58it/s, loss=0.0286, lr=0.0004]\nSteps: 39%|███▉ | 272/700 [02:34<04:31, 1.58it/s, loss=0.0257, lr=0.0004]\nSteps: 39%|███▉ | 273/700 [02:34<04:24, 1.62it/s, loss=0.0257, lr=0.0004]\nSteps: 39%|███▉ | 273/700 [02:34<04:24, 1.62it/s, loss=0.0963, lr=0.0004]\nSteps: 39%|███▉ | 274/700 [02:35<04:26, 1.60it/s, loss=0.0963, lr=0.0004]\nSteps: 39%|███▉ | 274/700 [02:35<04:26, 1.60it/s, loss=0.00725, lr=0.0004]\nSteps: 39%|███▉ | 275/700 [02:36<04:19, 1.64it/s, loss=0.00725, lr=0.0004]\nSteps: 39%|███▉ | 275/700 [02:36<04:19, 1.64it/s, loss=0.00157, lr=0.0004]\nSteps: 39%|███▉ | 276/700 [02:36<04:14, 1.66it/s, loss=0.00157, lr=0.0004]\nSteps: 39%|███▉ | 276/700 [02:36<04:14, 1.66it/s, loss=0.00832, lr=0.0004]\nSteps: 40%|███▉ | 277/700 [02:37<04:13, 1.67it/s, loss=0.00832, lr=0.0004]\nSteps: 40%|███▉ | 277/700 [02:37<04:13, 1.67it/s, loss=0.0604, lr=0.0004] \nSteps: 40%|███▉ | 278/700 [02:37<04:10, 1.68it/s, loss=0.0604, lr=0.0004]\nSteps: 40%|███▉ | 278/700 [02:37<04:10, 1.68it/s, loss=0.0378, lr=0.0004]\nSteps: 40%|███▉ | 279/700 [02:38<04:06, 1.71it/s, loss=0.0378, lr=0.0004]\nSteps: 40%|███▉ | 279/700 [02:38<04:06, 1.71it/s, loss=0.0044, lr=0.0004]\nSteps: 40%|████ | 280/700 [02:39<04:06, 1.71it/s, loss=0.0044, lr=0.0004]\nSteps: 
40%|████ | 280/700 [02:39<04:06, 1.71it/s, loss=0.0125, lr=0.0004]\nSteps: 40%|████ | 281/700 [02:39<04:02, 1.73it/s, loss=0.0125, lr=0.0004]\nSteps: 40%|████ | 281/700 [02:39<04:02, 1.73it/s, loss=0.00308, lr=0.0004]\nSteps: 40%|████ | 282/700 [02:40<03:59, 1.74it/s, loss=0.00308, lr=0.0004]\nSteps: 40%|████ | 282/700 [02:40<03:59, 1.74it/s, loss=0.0162, lr=0.0004] \nSteps: 40%|████ | 283/700 [02:40<03:59, 1.74it/s, loss=0.0162, lr=0.0004]\nSteps: 40%|████ | 283/700 [02:40<03:59, 1.74it/s, loss=0.0964, lr=0.0004]\nSteps: 41%|████ | 284/700 [02:41<04:00, 1.73it/s, loss=0.0964, lr=0.0004]\nSteps: 41%|████ | 284/700 [02:41<04:00, 1.73it/s, loss=0.0236, lr=0.0004]\nSteps: 41%|████ | 285/700 [02:41<03:58, 1.74it/s, loss=0.0236, lr=0.0004]\nSteps: 41%|████ | 285/700 [02:41<03:58, 1.74it/s, loss=0.016, lr=0.0004] \nSteps: 41%|████ | 286/700 [02:42<03:55, 1.76it/s, loss=0.016, lr=0.0004]\nSteps: 41%|████ | 286/700 [02:42<03:55, 1.76it/s, loss=0.00831, lr=0.0004]\nSteps: 41%|████ | 287/700 [02:43<03:54, 1.76it/s, loss=0.00831, lr=0.0004]\nSteps: 41%|████ | 287/700 [02:43<03:54, 1.76it/s, loss=0.0241, lr=0.0004] \nSteps: 41%|████ | 288/700 [02:43<03:49, 1.80it/s, loss=0.0241, lr=0.0004]\nSteps: 41%|████ | 288/700 [02:43<03:49, 1.80it/s, loss=0.0839, lr=0.0004]\nSteps: 41%|████▏ | 289/700 [02:44<03:41, 1.85it/s, loss=0.0839, lr=0.0004]\nSteps: 41%|████▏ | 289/700 [02:44<03:41, 1.85it/s, loss=0.0263, lr=0.0004]\nSteps: 41%|████▏ | 290/700 [02:44<03:35, 1.90it/s, loss=0.0263, lr=0.0004]\nSteps: 41%|████▏ | 290/700 [02:44<03:35, 1.90it/s, loss=0.0967, lr=0.0004]\nSteps: 42%|████▏ | 291/700 [02:45<03:31, 1.93it/s, loss=0.0967, lr=0.0004]\nSteps: 42%|████▏ | 291/700 [02:45<03:31, 1.93it/s, loss=0.0111, lr=0.0004]\nSteps: 42%|████▏ | 292/700 [02:45<03:30, 1.94it/s, loss=0.0111, lr=0.0004]\nSteps: 42%|████▏ | 292/700 [02:45<03:30, 1.94it/s, loss=0.0426, lr=0.0004]\nSteps: 42%|████▏ | 293/700 [02:46<03:32, 1.92it/s, loss=0.0426, lr=0.0004]\nSteps: 42%|████▏ | 293/700 [02:46<03:32, 
1.92it/s, loss=0.0054, lr=0.0004]\nSteps: 42%|████▏ | 294/700 [02:46<03:31, 1.92it/s, loss=0.0054, lr=0.0004]\nSteps: 42%|████▏ | 294/700 [02:46<03:31, 1.92it/s, loss=0.0031, lr=0.0004]\nSteps: 42%|████▏ | 295/700 [02:47<03:31, 1.91it/s, loss=0.0031, lr=0.0004]\nSteps: 42%|████▏ | 295/700 [02:47<03:31, 1.91it/s, loss=0.0399, lr=0.0004]\nSteps: 42%|████▏ | 296/700 [02:47<03:35, 1.87it/s, loss=0.0399, lr=0.0004]\nSteps: 42%|████▏ | 296/700 [02:47<03:35, 1.87it/s, loss=0.0144, lr=0.0004]\nSteps: 42%|████▏ | 297/700 [02:48<03:36, 1.86it/s, loss=0.0144, lr=0.0004]\nSteps: 42%|████▏ | 297/700 [02:48<03:36, 1.86it/s, loss=0.0868, lr=0.0004]\nSteps: 43%|████▎ | 298/700 [02:48<03:37, 1.85it/s, loss=0.0868, lr=0.0004]\nSteps: 43%|████▎ | 298/700 [02:48<03:37, 1.85it/s, loss=0.0358, lr=0.0004]\nSteps: 43%|████▎ | 299/700 [02:49<03:39, 1.82it/s, loss=0.0358, lr=0.0004]\nSteps: 43%|████▎ | 299/700 [02:49<03:39, 1.82it/s, loss=0.0683, lr=0.0004]\nSteps: 43%|████▎ | 300/700 [02:49<03:36, 1.84it/s, loss=0.0683, lr=0.0004]\nCurrent Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')\nCurrent Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')\nSaving weights to checkpoints/step_300.safetensors\nLORA Unet Moved 0.0012533192057162523\nLORA CLIP Moved 5.122544462210499e-05\nSteps: 43%|████▎ | 300/700 [02:49<03:36, 1.84it/s, loss=0.00153, lr=0.0004]\nSteps: 43%|████▎ | 301/700 [02:50<04:01, 1.65it/s, loss=0.00153, lr=0.0004]\nSteps: 43%|████▎ | 301/700 [02:50<04:01, 1.65it/s, loss=0.0337, lr=0.0004] \nSteps: 43%|████▎ | 302/700 [02:51<03:52, 1.71it/s, loss=0.0337, lr=0.0004]\nSteps: 43%|████▎ | 302/700 [02:51<03:52, 1.71it/s, loss=0.0974, lr=0.0004]\nSteps: 43%|████▎ | 303/700 [02:51<03:40, 1.80it/s, loss=0.0974, lr=0.0004]\nSteps: 43%|████▎ | 303/700 [02:51<03:40, 1.80it/s, loss=0.00531, lr=0.0004]\nSteps: 43%|████▎ | 304/700 [02:52<03:34, 1.85it/s, loss=0.00531, lr=0.0004]\nSteps: 43%|████▎ 
| 304/700 [02:52<03:34, 1.85it/s, loss=0.0179, lr=0.0004] \nSteps: 44%|████▎ | 305/700 [02:52<03:32, 1.86it/s, loss=0.0179, lr=0.0004]\nSteps: 44%|████▎ | 305/700 [02:52<03:32, 1.86it/s, loss=0.0687, lr=0.0004]\nSteps: 44%|████▎ | 306/700 [02:53<03:35, 1.83it/s, loss=0.0687, lr=0.0004]\nSteps: 44%|████▎ | 306/700 [02:53<03:35, 1.83it/s, loss=0.00892, lr=0.0004]\nSteps: 44%|████▍ | 307/700 [02:53<03:36, 1.81it/s, loss=0.00892, lr=0.0004]\nSteps: 44%|████▍ | 307/700 [02:53<03:36, 1.81it/s, loss=0.0717, lr=0.0004] \nSteps: 44%|████▍ | 308/700 [02:54<03:31, 1.85it/s, loss=0.0717, lr=0.0004]\nSteps: 44%|████▍ | 308/700 [02:54<03:31, 1.85it/s, loss=0.00435, lr=0.0004]\nSteps: 44%|████▍ | 309/700 [02:54<03:31, 1.85it/s, loss=0.00435, lr=0.0004]\nSteps: 44%|████▍ | 309/700 [02:54<03:31, 1.85it/s, loss=0.00829, lr=0.0004]\nSteps: 44%|████▍ | 310/700 [02:55<03:36, 1.80it/s, loss=0.00829, lr=0.0004]\nSteps: 44%|████▍ | 310/700 [02:55<03:36, 1.80it/s, loss=0.0713, lr=0.0004] \nSteps: 44%|████▍ | 311/700 [02:55<03:31, 1.84it/s, loss=0.0713, lr=0.0004]\nSteps: 44%|████▍ | 311/700 [02:55<03:31, 1.84it/s, loss=0.00767, lr=0.0004]\nSteps: 45%|████▍ | 312/700 [02:56<03:25, 1.89it/s, loss=0.00767, lr=0.0004]\nSteps: 45%|████▍ | 312/700 [02:56<03:25, 1.89it/s, loss=0.0893, lr=0.0004] \nSteps: 45%|████▍ | 313/700 [02:56<03:19, 1.94it/s, loss=0.0893, lr=0.0004]\nSteps: 45%|████▍ | 313/700 [02:56<03:19, 1.94it/s, loss=0.019, lr=0.0004] \nSteps: 45%|████▍ | 314/700 [02:57<03:18, 1.95it/s, loss=0.019, lr=0.0004]\nSteps: 45%|████▍ | 314/700 [02:57<03:18, 1.95it/s, loss=0.00861, lr=0.0004]\nSteps: 45%|████▌ | 315/700 [02:57<03:15, 1.97it/s, loss=0.00861, lr=0.0004]\nSteps: 45%|████▌ | 315/700 [02:57<03:15, 1.97it/s, loss=0.0777, lr=0.0004] \nSteps: 45%|████▌ | 316/700 [02:58<03:13, 1.98it/s, loss=0.0777, lr=0.0004]\nSteps: 45%|████▌ | 316/700 [02:58<03:13, 1.98it/s, loss=0.00247, lr=0.0004]\nSteps: 45%|████▌ | 317/700 [02:58<03:15, 1.96it/s, loss=0.00247, lr=0.0004]\nSteps: 45%|████▌ | 
317/700 [02:58<03:15, 1.96it/s, loss=0.229, lr=0.0004] \nSteps: 45%|████▌ | 318/700 [02:59<03:17, 1.93it/s, loss=0.229, lr=0.0004]\nSteps: 45%|████▌ | 318/700 [02:59<03:17, 1.93it/s, loss=0.0106, lr=0.0004]\nSteps: 46%|████▌ | 319/700 [03:00<03:21, 1.89it/s, loss=0.0106, lr=0.0004]\nSteps: 46%|████▌ | 319/700 [03:00<03:21, 1.89it/s, loss=0.00504, lr=0.0004]\nSteps: 46%|████▌ | 320/700 [03:00<03:23, 1.87it/s, loss=0.00504, lr=0.0004]\nSteps: 46%|████▌ | 320/700 [03:00<03:23, 1.87it/s, loss=0.00787, lr=0.0004]\nSteps: 46%|████▌ | 321/700 [03:01<03:25, 1.84it/s, loss=0.00787, lr=0.0004]\nSteps: 46%|████▌ | 321/700 [03:01<03:25, 1.84it/s, loss=0.103, lr=0.0004] \nSteps: 46%|████▌ | 322/700 [03:01<03:25, 1.84it/s, loss=0.103, lr=0.0004]\nSteps: 46%|████▌ | 322/700 [03:01<03:25, 1.84it/s, loss=0.028, lr=0.0004]\nSteps: 46%|████▌ | 323/700 [03:02<03:27, 1.82it/s, loss=0.028, lr=0.0004]\nSteps: 46%|████▌ | 323/700 [03:02<03:27, 1.82it/s, loss=0.173, lr=0.0004]\nSteps: 46%|████▋ | 324/700 [03:02<03:27, 1.82it/s, loss=0.173, lr=0.0004]\nSteps: 46%|████▋ | 324/700 [03:02<03:27, 1.82it/s, loss=0.0602, lr=0.0004]\nSteps: 46%|████▋ | 325/700 [03:03<03:28, 1.80it/s, loss=0.0602, lr=0.0004]\nSteps: 46%|████▋ | 325/700 [03:03<03:28, 1.80it/s, loss=0.0443, lr=0.0004]\nSteps: 47%|████▋ | 326/700 [03:03<03:27, 1.81it/s, loss=0.0443, lr=0.0004]\nSteps: 47%|████▋ | 326/700 [03:03<03:27, 1.81it/s, loss=0.0424, lr=0.0004]\nSteps: 47%|████▋ | 327/700 [03:04<03:27, 1.80it/s, loss=0.0424, lr=0.0004]\nSteps: 47%|████▋ | 327/700 [03:04<03:27, 1.80it/s, loss=0.00866, lr=0.0004]\nSteps: 47%|████▋ | 328/700 [03:05<03:29, 1.78it/s, loss=0.00866, lr=0.0004]\nSteps: 47%|████▋ | 328/700 [03:05<03:29, 1.78it/s, loss=0.0145, lr=0.0004] \nSteps: 47%|████▋ | 329/700 [03:05<03:27, 1.79it/s, loss=0.0145, lr=0.0004]\nSteps: 47%|████▋ | 329/700 [03:05<03:27, 1.79it/s, loss=0.0291, lr=0.0004]\nSteps: 47%|████▋ | 330/700 [03:06<03:27, 1.79it/s, loss=0.0291, lr=0.0004]\nSteps: 47%|████▋ | 330/700 [03:06<03:27, 
1.79it/s, loss=0.112, lr=0.0004] \nSteps: 47%|████▋ | 331/700 [03:06<03:27, 1.78it/s, loss=0.112, lr=0.0004]\nSteps: 47%|████▋ | 331/700 [03:06<03:27, 1.78it/s, loss=0.0583, lr=0.0004]\nSteps: 47%|████▋ | 332/700 [03:07<03:29, 1.76it/s, loss=0.0583, lr=0.0004]\nSteps: 47%|████▋ | 332/700 [03:07<03:29, 1.76it/s, loss=0.0574, lr=0.0004]\nSteps: 48%|████▊ | 333/700 [03:07<03:29, 1.75it/s, loss=0.0574, lr=0.0004]\nSteps: 48%|████▊ | 333/700 [03:07<03:29, 1.75it/s, loss=0.00921, lr=0.0004]\nSteps: 48%|████▊ | 334/700 [03:08<03:21, 1.82it/s, loss=0.00921, lr=0.0004]\nSteps: 48%|████▊ | 334/700 [03:08<03:21, 1.82it/s, loss=0.0178, lr=0.0004] \nSteps: 48%|████▊ | 335/700 [03:08<03:19, 1.83it/s, loss=0.0178, lr=0.0004]\nSteps: 48%|████▊ | 335/700 [03:08<03:19, 1.83it/s, loss=0.0147, lr=0.0004]\nSteps: 48%|████▊ | 336/700 [03:09<03:21, 1.80it/s, loss=0.0147, lr=0.0004]\nSteps: 48%|████▊ | 336/700 [03:09<03:21, 1.80it/s, loss=0.0233, lr=0.0004]\nSteps: 48%|████▊ | 337/700 [03:10<03:22, 1.80it/s, loss=0.0233, lr=0.0004]\nSteps: 48%|████▊ | 337/700 [03:10<03:22, 1.80it/s, loss=0.0265, lr=0.0004]\nSteps: 48%|████▊ | 338/700 [03:10<03:22, 1.79it/s, loss=0.0265, lr=0.0004]\nSteps: 48%|████▊ | 338/700 [03:10<03:22, 1.79it/s, loss=0.0103, lr=0.0004]\nSteps: 48%|████▊ | 339/700 [03:11<03:26, 1.75it/s, loss=0.0103, lr=0.0004]\nSteps: 48%|████▊ | 339/700 [03:11<03:26, 1.75it/s, loss=0.00171, lr=0.0004]\nSteps: 49%|████▊ | 340/700 [03:11<03:23, 1.77it/s, loss=0.00171, lr=0.0004]\nSteps: 49%|████▊ | 340/700 [03:11<03:23, 1.77it/s, loss=0.226, lr=0.0004] \nSteps: 49%|████▊ | 341/700 [03:12<03:18, 1.80it/s, loss=0.226, lr=0.0004]\nSteps: 49%|████▊ | 341/700 [03:12<03:18, 1.80it/s, loss=0.0407, lr=0.0004]\nSteps: 49%|████▉ | 342/700 [03:12<03:13, 1.85it/s, loss=0.0407, lr=0.0004]\nSteps: 49%|████▉ | 342/700 [03:12<03:13, 1.85it/s, loss=0.0194, lr=0.0004]\nSteps: 49%|████▉ | 343/700 [03:13<03:08, 1.89it/s, loss=0.0194, lr=0.0004]\nSteps: 49%|████▉ | 343/700 [03:13<03:08, 1.89it/s, 
loss=0.00992, lr=0.0004]\nSteps: 49%|████▉ | 344/700 [03:13<03:06, 1.90it/s, loss=0.00992, lr=0.0004]\nSteps: 49%|████▉ | 344/700 [03:13<03:06, 1.90it/s, loss=0.0107, lr=0.0004] \nSteps: 49%|████▉ | 345/700 [03:14<03:03, 1.93it/s, loss=0.0107, lr=0.0004]\nSteps: 49%|████▉ | 345/700 [03:14<03:03, 1.93it/s, loss=0.028, lr=0.0004] \nSteps: 49%|████▉ | 346/700 [03:14<03:02, 1.93it/s, loss=0.028, lr=0.0004]\nSteps: 49%|████▉ | 346/700 [03:14<03:02, 1.93it/s, loss=0.00153, lr=0.0004]\nSteps: 50%|████▉ | 347/700 [03:15<03:04, 1.91it/s, loss=0.00153, lr=0.0004]\nSteps: 50%|████▉ | 347/700 [03:15<03:04, 1.91it/s, loss=0.0558, lr=0.0004] \nSteps: 50%|████▉ | 348/700 [03:15<03:07, 1.88it/s, loss=0.0558, lr=0.0004]\nSteps: 50%|████▉ | 348/700 [03:15<03:07, 1.88it/s, loss=0.0713, lr=0.0004]\nSteps: 50%|████▉ | 349/700 [03:16<03:08, 1.86it/s, loss=0.0713, lr=0.0004]\nSteps: 50%|████▉ | 349/700 [03:16<03:08, 1.86it/s, loss=0.0164, lr=0.0004]\nSteps: 50%|█████ | 350/700 [03:17<03:10, 1.83it/s, loss=0.0164, lr=0.0004]\nSteps: 50%|█████ | 350/700 [03:17<03:10, 1.83it/s, loss=0.243, lr=0.0004] \nSteps: 50%|█████ | 351/700 [03:17<03:12, 1.82it/s, loss=0.243, lr=0.0004]\nSteps: 50%|█████ | 351/700 [03:17<03:12, 1.82it/s, loss=0.0152, lr=0.0004]\nSteps: 50%|█████ | 352/700 [03:18<03:11, 1.82it/s, loss=0.0152, lr=0.0004]\nSteps: 50%|█████ | 352/700 [03:18<03:11, 1.82it/s, loss=0.0497, lr=0.0004]\nSteps: 50%|█████ | 353/700 [03:18<03:10, 1.82it/s, loss=0.0497, lr=0.0004]\nSteps: 50%|█████ | 353/700 [03:18<03:10, 1.82it/s, loss=0.0611, lr=0.0004]\nSteps: 51%|█████ | 354/700 [03:19<03:07, 1.84it/s, loss=0.0611, lr=0.0004]\nSteps: 51%|█████ | 354/700 [03:19<03:07, 1.84it/s, loss=0.0738, lr=0.0004]\nSteps: 51%|█████ | 355/700 [03:19<03:04, 1.87it/s, loss=0.0738, lr=0.0004]\nSteps: 51%|█████ | 355/700 [03:19<03:04, 1.87it/s, loss=0.00715, lr=0.0004]\nSteps: 51%|█████ | 356/700 [03:20<03:06, 1.84it/s, loss=0.00715, lr=0.0004]\nSteps: 51%|█████ | 356/700 [03:20<03:06, 1.84it/s, loss=0.0472, 
lr=0.0004] \nSteps: 51%|█████ | 357/700 [03:20<03:06, 1.84it/s, loss=0.0472, lr=0.0004]\nSteps: 51%|█████ | 357/700 [03:20<03:06, 1.84it/s, loss=0.0275, lr=0.0004]\nSteps: 51%|█████ | 358/700 [03:21<03:06, 1.83it/s, loss=0.0275, lr=0.0004]\nSteps: 51%|█████ | 358/700 [03:21<03:06, 1.83it/s, loss=0.111, lr=0.0004] \nSteps: 51%|█████▏ | 359/700 [03:22<03:05, 1.84it/s, loss=0.111, lr=0.0004]\nSteps: 51%|█████▏ | 359/700 [03:22<03:05, 1.84it/s, loss=0.0267, lr=0.0004]\nSteps: 51%|█████▏ | 360/700 [03:22<03:07, 1.82it/s, loss=0.0267, lr=0.0004]\nSteps: 51%|█████▏ | 360/700 [03:22<03:07, 1.82it/s, loss=0.0598, lr=0.0004]\nSteps: 52%|█████▏ | 361/700 [03:23<03:07, 1.81it/s, loss=0.0598, lr=0.0004]\nSteps: 52%|█████▏ | 361/700 [03:23<03:07, 1.81it/s, loss=0.0234, lr=0.0004]\nSteps: 52%|█████▏ | 362/700 [03:23<03:08, 1.79it/s, loss=0.0234, lr=0.0004]\nSteps: 52%|█████▏ | 362/700 [03:23<03:08, 1.79it/s, loss=0.00394, lr=0.0004]\nSteps: 52%|█████▏ | 363/700 [03:24<03:07, 1.80it/s, loss=0.00394, lr=0.0004]\nSteps: 52%|█████▏ | 363/700 [03:24<03:07, 1.80it/s, loss=0.103, lr=0.0004] \nSteps: 52%|█████▏ | 364/700 [03:24<03:07, 1.80it/s, loss=0.103, lr=0.0004]\nSteps: 52%|█████▏ | 364/700 [03:24<03:07, 1.80it/s, loss=0.0446, lr=0.0004]\nSteps: 52%|█████▏ | 365/700 [03:25<03:07, 1.79it/s, loss=0.0446, lr=0.0004]\nSteps: 52%|█████▏ | 365/700 [03:25<03:07, 1.79it/s, loss=0.0886, lr=0.0004]\nSteps: 52%|█████▏ | 366/700 [03:25<03:03, 1.82it/s, loss=0.0886, lr=0.0004]\nSteps: 52%|█████▏ | 366/700 [03:25<03:03, 1.82it/s, loss=0.00974, lr=0.0004]\nSteps: 52%|█████▏ | 367/700 [03:26<02:57, 1.87it/s, loss=0.00974, lr=0.0004]\nSteps: 52%|█████▏ | 367/700 [03:26<02:57, 1.87it/s, loss=0.0581, lr=0.0004] \nSteps: 53%|█████▎ | 368/700 [03:26<02:55, 1.89it/s, loss=0.0581, lr=0.0004]\nSteps: 53%|█████▎ | 368/700 [03:26<02:55, 1.89it/s, loss=0.0141, lr=0.0004]\nSteps: 53%|█████▎ | 369/700 [03:27<02:53, 1.91it/s, loss=0.0141, lr=0.0004]\nSteps: 53%|█████▎ | 369/700 [03:27<02:53, 1.91it/s, 
loss=0.108, lr=0.0004] \nSteps: 53%|█████▎ | 370/700 [03:27<02:51, 1.93it/s, loss=0.108, lr=0.0004]\nSteps: 53%|█████▎ | 370/700 [03:27<02:51, 1.93it/s, loss=0.0274, lr=0.0004]\nSteps: 53%|█████▎ | 371/700 [03:28<02:53, 1.90it/s, loss=0.0274, lr=0.0004]\nSteps: 53%|█████▎ | 371/700 [03:28<02:53, 1.90it/s, loss=0.0238, lr=0.0004]\nSteps: 53%|█████▎ | 372/700 [03:29<02:55, 1.87it/s, loss=0.0238, lr=0.0004]\nSteps: 53%|█████▎ | 372/700 [03:29<02:55, 1.87it/s, loss=0.0135, lr=0.0004]\nSteps: 53%|█████▎ | 373/700 [03:29<02:56, 1.85it/s, loss=0.0135, lr=0.0004]\nSteps: 53%|█████▎ | 373/700 [03:29<02:56, 1.85it/s, loss=0.0273, lr=0.0004]\nSteps: 53%|█████▎ | 374/700 [03:30<02:57, 1.84it/s, loss=0.0273, lr=0.0004]\nSteps: 53%|█████▎ | 374/700 [03:30<02:57, 1.84it/s, loss=0.0107, lr=0.0004]\nSteps: 54%|█████▎ | 375/700 [03:30<02:58, 1.82it/s, loss=0.0107, lr=0.0004]\nSteps: 54%|█████▎ | 375/700 [03:30<02:58, 1.82it/s, loss=0.117, lr=0.0004] \nSteps: 54%|█████▎ | 376/700 [03:31<02:57, 1.82it/s, loss=0.117, lr=0.0004]\nSteps: 54%|█████▎ | 376/700 [03:31<02:57, 1.82it/s, loss=0.00753, lr=0.0004]\nSteps: 54%|█████▍ | 377/700 [03:31<02:53, 1.86it/s, loss=0.00753, lr=0.0004]\nSteps: 54%|█████▍ | 377/700 [03:31<02:53, 1.86it/s, loss=0.00374, lr=0.0004]\nSteps: 54%|█████▍ | 378/700 [03:32<02:50, 1.89it/s, loss=0.00374, lr=0.0004]\nSteps: 54%|█████▍ | 378/700 [03:32<02:50, 1.89it/s, loss=0.00199, lr=0.0004]\nSteps: 54%|█████▍ | 379/700 [03:32<02:49, 1.89it/s, loss=0.00199, lr=0.0004]\nSteps: 54%|█████▍ | 379/700 [03:32<02:49, 1.89it/s, loss=0.0103, lr=0.0004] \nSteps: 54%|█████▍ | 380/700 [03:33<02:50, 1.88it/s, loss=0.0103, lr=0.0004]\nSteps: 54%|█████▍ | 380/700 [03:33<02:50, 1.88it/s, loss=0.0585, lr=0.0004]\nSteps: 54%|█████▍ | 381/700 [03:33<02:50, 1.87it/s, loss=0.0585, lr=0.0004]\nSteps: 54%|█████▍ | 381/700 [03:33<02:50, 1.87it/s, loss=0.00844, lr=0.0004]\nSteps: 55%|█████▍ | 382/700 [03:34<02:50, 1.87it/s, loss=0.00844, lr=0.0004]\nSteps: 55%|█████▍ | 382/700 [03:34<02:50, 
1.87it/s, loss=0.0385, lr=0.0004] \nSteps: 55%|█████▍ | 383/700 [03:34<02:46, 1.90it/s, loss=0.0385, lr=0.0004]\nSteps: 55%|█████▍ | 383/700 [03:34<02:46, 1.90it/s, loss=0.0191, lr=0.0004]\nSteps: 55%|█████▍ | 384/700 [03:35<02:46, 1.90it/s, loss=0.0191, lr=0.0004]\nSteps: 55%|█████▍ | 384/700 [03:35<02:46, 1.90it/s, loss=0.00918, lr=0.0004]\nSteps: 55%|█████▌ | 385/700 [03:35<02:47, 1.88it/s, loss=0.00918, lr=0.0004]\nSteps: 55%|█████▌ | 385/700 [03:36<02:47, 1.88it/s, loss=0.0416, lr=0.0004] \nSteps: 55%|█████▌ | 386/700 [03:36<02:50, 1.84it/s, loss=0.0416, lr=0.0004]\nSteps: 55%|█████▌ | 386/700 [03:36<02:50, 1.84it/s, loss=0.0671, lr=0.0004]\nSteps: 55%|█████▌ | 387/700 [03:37<02:50, 1.84it/s, loss=0.0671, lr=0.0004]\nSteps: 55%|█████▌ | 387/700 [03:37<02:50, 1.84it/s, loss=0.0628, lr=0.0004]\nSteps: 55%|█████▌ | 388/700 [03:37<02:50, 1.83it/s, loss=0.0628, lr=0.0004]\nSteps: 55%|█████▌ | 388/700 [03:37<02:50, 1.83it/s, loss=0.00164, lr=0.0004]\nSteps: 56%|█████▌ | 389/700 [03:38<02:50, 1.83it/s, loss=0.00164, lr=0.0004]\nSteps: 56%|█████▌ | 389/700 [03:38<02:50, 1.83it/s, loss=0.0177, lr=0.0004] \nSteps: 56%|█████▌ | 390/700 [03:38<02:49, 1.83it/s, loss=0.0177, lr=0.0004]\nSteps: 56%|█████▌ | 390/700 [03:38<02:49, 1.83it/s, loss=0.0583, lr=0.0004]\nSteps: 56%|█████▌ | 391/700 [03:39<02:54, 1.77it/s, loss=0.0583, lr=0.0004]\nSteps: 56%|█████▌ | 391/700 [03:39<02:54, 1.77it/s, loss=0.0428, lr=0.0004]\nSteps: 56%|█████▌ | 392/700 [03:39<02:55, 1.76it/s, loss=0.0428, lr=0.0004]\nSteps: 56%|█████▌ | 392/700 [03:39<02:55, 1.76it/s, loss=0.01, lr=0.0004] \nSteps: 56%|█████▌ | 393/700 [03:40<02:55, 1.75it/s, loss=0.01, lr=0.0004]\nSteps: 56%|█████▌ | 393/700 [03:40<02:55, 1.75it/s, loss=0.0341, lr=0.0004]\nSteps: 56%|█████▋ | 394/700 [03:41<02:54, 1.76it/s, loss=0.0341, lr=0.0004]\nSteps: 56%|█████▋ | 394/700 [03:41<02:54, 1.76it/s, loss=0.104, lr=0.0004] \nSteps: 56%|█████▋ | 395/700 [03:41<02:54, 1.75it/s, loss=0.104, lr=0.0004]\nSteps: 56%|█████▋ | 395/700 
[03:41<02:54, 1.75it/s, loss=0.00275, lr=0.0004]\nSteps: 57%|█████▋ | 396/700 [03:42<02:55, 1.74it/s, loss=0.00275, lr=0.0004]\nSteps: 57%|█████▋ | 396/700 [03:42<02:55, 1.74it/s, loss=0.0398, lr=0.0004] \nSteps: 57%|█████▋ | 397/700 [03:42<02:57, 1.71it/s, loss=0.0398, lr=0.0004]\nSteps: 57%|█████▋ | 397/700 [03:42<02:57, 1.71it/s, loss=0.0031, lr=0.0004]\nSteps: 57%|█████▋ | 398/700 [03:43<02:53, 1.74it/s, loss=0.0031, lr=0.0004]\nSteps: 57%|█████▋ | 398/700 [03:43<02:53, 1.74it/s, loss=0.00922, lr=0.0004]\nSteps: 57%|█████▋ | 399/700 [03:43<02:51, 1.76it/s, loss=0.00922, lr=0.0004]\nSteps: 57%|█████▋ | 399/700 [03:43<02:51, 1.76it/s, loss=0.0128, lr=0.0004] \nSteps: 57%|█████▋ | 400/700 [03:44<02:48, 1.79it/s, loss=0.0128, lr=0.0004]\nCurrent Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')\nCurrent Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')\nSaving weights to checkpoints/step_400.safetensors\nLORA Unet Moved 0.0015479204012081027\nLORA CLIP Moved 6.280629168031737e-05\nSteps: 57%|█████▋ | 400/700 [03:44<02:48, 1.79it/s, loss=0.00486, lr=0.0004]\nSteps: 57%|█████▋ | 401/700 [03:45<02:59, 1.67it/s, loss=0.00486, lr=0.0004]\nSteps: 57%|█████▋ | 401/700 [03:45<02:59, 1.67it/s, loss=0.0242, lr=0.0004] \nSteps: 57%|█████▋ | 402/700 [03:45<02:54, 1.70it/s, loss=0.0242, lr=0.0004]\nSteps: 57%|█████▋ | 402/700 [03:45<02:54, 1.70it/s, loss=0.0114, lr=0.0004]\nSteps: 58%|█████▊ | 403/700 [03:46<02:49, 1.75it/s, loss=0.0114, lr=0.0004]\nSteps: 58%|█████▊ | 403/700 [03:46<02:49, 1.75it/s, loss=0.101, lr=0.0004] \nSteps: 58%|█████▊ | 404/700 [03:46<02:44, 1.80it/s, loss=0.101, lr=0.0004]\nSteps: 58%|█████▊ | 404/700 [03:46<02:44, 1.80it/s, loss=0.0565, lr=0.0004]\nSteps: 58%|█████▊ | 405/700 [03:47<02:39, 1.85it/s, loss=0.0565, lr=0.0004]\nSteps: 58%|█████▊ | 405/700 [03:47<02:39, 1.85it/s, loss=0.0139, lr=0.0004]\nSteps: 58%|█████▊ | 406/700 [03:47<02:37, 1.86it/s, 
loss=0.0139, lr=0.0004]\nSteps: 58%|█████▊ | 406/700 [03:47<02:37, 1.86it/s, loss=0.00395, lr=0.0004]\nSteps: 58%|█████▊ | 407/700 [03:48<02:34, 1.89it/s, loss=0.00395, lr=0.0004]\nSteps: 58%|█████▊ | 407/700 [03:48<02:34, 1.89it/s, loss=0.00693, lr=0.0004]\nSteps: 58%|█████▊ | 408/700 [03:48<02:34, 1.89it/s, loss=0.00693, lr=0.0004]\nSteps: 58%|█████▊ | 408/700 [03:48<02:34, 1.89it/s, loss=0.0185, lr=0.0004] \nSteps: 58%|█████▊ | 409/700 [03:49<02:36, 1.85it/s, loss=0.0185, lr=0.0004]\nSteps: 58%|█████▊ | 409/700 [03:49<02:36, 1.85it/s, loss=0.0226, lr=0.0004]\nSteps: 59%|█████▊ | 410/700 [03:49<02:37, 1.84it/s, loss=0.0226, lr=0.0004]\nSteps: 59%|█████▊ | 410/700 [03:49<02:37, 1.84it/s, loss=0.0122, lr=0.0004]\nSteps: 59%|█████▊ | 411/700 [03:50<02:37, 1.83it/s, loss=0.0122, lr=0.0004]\nSteps: 59%|█████▊ | 411/700 [03:50<02:37, 1.83it/s, loss=0.00795, lr=0.0004]\nSteps: 59%|█████▉ | 412/700 [03:51<02:38, 1.82it/s, loss=0.00795, lr=0.0004]\nSteps: 59%|█████▉ | 412/700 [03:51<02:38, 1.82it/s, loss=0.00217, lr=0.0004]\nSteps: 59%|█████▉ | 413/700 [03:51<02:39, 1.80it/s, loss=0.00217, lr=0.0004]\nSteps: 59%|█████▉ | 413/700 [03:51<02:39, 1.80it/s, loss=0.0183, lr=0.0004] \nSteps: 59%|█████▉ | 414/700 [03:52<02:37, 1.82it/s, loss=0.0183, lr=0.0004]\nSteps: 59%|█████▉ | 414/700 [03:52<02:37, 1.82it/s, loss=0.0149, lr=0.0004]\nSteps: 59%|█████▉ | 415/700 [03:52<02:32, 1.87it/s, loss=0.0149, lr=0.0004]\nSteps: 59%|█████▉ | 415/700 [03:52<02:32, 1.87it/s, loss=0.00353, lr=0.0004]\nSteps: 59%|█████▉ | 416/700 [03:53<02:29, 1.90it/s, loss=0.00353, lr=0.0004]\nSteps: 59%|█████▉ | 416/700 [03:53<02:29, 1.90it/s, loss=0.0368, lr=0.0004] \nSteps: 60%|█████▉ | 417/700 [03:53<02:31, 1.87it/s, loss=0.0368, lr=0.0004]\nSteps: 60%|█████▉ | 417/700 [03:53<02:31, 1.87it/s, loss=0.00279, lr=0.0004]\nSteps: 60%|█████▉ | 418/700 [03:54<02:34, 1.83it/s, loss=0.00279, lr=0.0004]\nSteps: 60%|█████▉ | 418/700 [03:54<02:34, 1.83it/s, loss=0.01, lr=0.0004] \nSteps: 60%|█████▉ | 419/700 
[03:54<02:34, 1.82it/s, loss=0.01, lr=0.0004]\nSteps: 60%|█████▉ | 419/700 [03:54<02:34, 1.82it/s, loss=0.00632, lr=0.0004]\nSteps: 60%|██████ | 420/700 [03:55<02:34, 1.81it/s, loss=0.00632, lr=0.0004]\nSteps: 60%|██████ | 420/700 [03:55<02:34, 1.81it/s, loss=0.178, lr=0.0004] \nSteps: 60%|██████ | 421/700 [03:55<02:31, 1.85it/s, loss=0.178, lr=0.0004]\nSteps: 60%|██████ | 421/700 [03:55<02:31, 1.85it/s, loss=0.00584, lr=0.0004]\nSteps: 60%|██████ | 422/700 [03:56<02:29, 1.85it/s, loss=0.00584, lr=0.0004]\nSteps: 60%|██████ | 422/700 [03:56<02:29, 1.85it/s, loss=0.0698, lr=0.0004] \nSteps: 60%|██████ | 423/700 [03:57<02:28, 1.87it/s, loss=0.0698, lr=0.0004]\nSteps: 60%|██████ | 423/700 [03:57<02:28, 1.87it/s, loss=0.0128, lr=0.0004]\nSteps: 61%|██████ | 424/700 [03:57<02:28, 1.86it/s, loss=0.0128, lr=0.0004]\nSteps: 61%|██████ | 424/700 [03:57<02:28, 1.86it/s, loss=0.0616, lr=0.0004]\nSteps: 61%|██████ | 425/700 [03:58<02:29, 1.84it/s, loss=0.0616, lr=0.0004]\nSteps: 61%|██████ | 425/700 [03:58<02:29, 1.84it/s, loss=0.0102, lr=0.0004]\nSteps: 61%|██████ | 426/700 [03:58<02:34, 1.77it/s, loss=0.0102, lr=0.0004]\nSteps: 61%|██████ | 426/700 [03:58<02:34, 1.77it/s, loss=0.00736, lr=0.0004]\nSteps: 61%|██████ | 427/700 [03:59<02:40, 1.70it/s, loss=0.00736, lr=0.0004]\nSteps: 61%|██████ | 427/700 [03:59<02:40, 1.70it/s, loss=0.0113, lr=0.0004] \nSteps: 61%|██████ | 428/700 [04:00<02:43, 1.67it/s, loss=0.0113, lr=0.0004]\nSteps: 61%|██████ | 428/700 [04:00<02:43, 1.67it/s, loss=0.00517, lr=0.0004]\nSteps: 61%|██████▏ | 429/700 [04:00<02:37, 1.72it/s, loss=0.00517, lr=0.0004]\nSteps: 61%|██████▏ | 429/700 [04:00<02:37, 1.72it/s, loss=0.032, lr=0.0004] \nSteps: 61%|██████▏ | 430/700 [04:01<02:33, 1.76it/s, loss=0.032, lr=0.0004]\nSteps: 61%|██████▏ | 430/700 [04:01<02:33, 1.76it/s, loss=0.0133, lr=0.0004]\nSteps: 62%|██████▏ | 431/700 [04:01<02:32, 1.77it/s, loss=0.0133, lr=0.0004]\nSteps: 62%|██████▏ | 431/700 [04:01<02:32, 1.77it/s, loss=0.0429, lr=0.0004]\nSteps: 
62%|██████▏ | 432/700 [04:02<02:29, 1.79it/s, loss=0.0429, lr=0.0004]\nSteps: 62%|██████▏ | 432/700 [04:02<02:29, 1.79it/s, loss=0.00896, lr=0.0004]\nSteps: 62%|██████▏ | 433/700 [04:02<02:30, 1.78it/s, loss=0.00896, lr=0.0004]\nSteps: 62%|██████▏ | 433/700 [04:02<02:30, 1.78it/s, loss=0.072, lr=0.0004] \nSteps: 62%|██████▏ | 434/700 [04:03<02:28, 1.79it/s, loss=0.072, lr=0.0004]\nSteps: 62%|██████▏ | 434/700 [04:03<02:28, 1.79it/s, loss=0.011, lr=0.0004]\nSteps: 62%|██████▏ | 435/700 [04:03<02:26, 1.81it/s, loss=0.011, lr=0.0004]\nSteps: 62%|██████▏ | 435/700 [04:03<02:26, 1.81it/s, loss=0.116, lr=0.0004]\nSteps: 62%|██████▏ | 436/700 [04:04<02:25, 1.81it/s, loss=0.116, lr=0.0004]\nSteps: 62%|██████▏ | 436/700 [04:04<02:25, 1.81it/s, loss=0.00514, lr=0.0004]\nSteps: 62%|██████▏ | 437/700 [04:04<02:25, 1.81it/s, loss=0.00514, lr=0.0004]\nSteps: 62%|██████▏ | 437/700 [04:04<02:25, 1.81it/s, loss=0.0137, lr=0.0004] \nSteps: 63%|██████▎ | 438/700 [04:05<02:24, 1.81it/s, loss=0.0137, lr=0.0004]\nSteps: 63%|██████▎ | 438/700 [04:05<02:24, 1.81it/s, loss=0.00167, lr=0.0004]\nSteps: 63%|██████▎ | 439/700 [04:06<02:22, 1.84it/s, loss=0.00167, lr=0.0004]\nSteps: 63%|██████▎ | 439/700 [04:06<02:22, 1.84it/s, loss=0.0108, lr=0.0004] \nSteps: 63%|██████▎ | 440/700 [04:06<02:21, 1.84it/s, loss=0.0108, lr=0.0004]\nSteps: 63%|██████▎ | 440/700 [04:06<02:21, 1.84it/s, loss=0.0135, lr=0.0004]\nSteps: 63%|██████▎ | 441/700 [04:07<02:21, 1.83it/s, loss=0.0135, lr=0.0004]\nSteps: 63%|██████▎ | 441/700 [04:07<02:21, 1.83it/s, loss=0.0287, lr=0.0004]\nSteps: 63%|██████▎ | 442/700 [04:07<02:21, 1.83it/s, loss=0.0287, lr=0.0004]\nSteps: 63%|██████▎ | 442/700 [04:07<02:21, 1.83it/s, loss=0.0146, lr=0.0004]\nSteps: 63%|██████▎ | 443/700 [04:08<02:20, 1.83it/s, loss=0.0146, lr=0.0004]\nSteps: 63%|██████▎ | 443/700 [04:08<02:20, 1.83it/s, loss=0.216, lr=0.0004] \nSteps: 63%|██████▎ | 444/700 [04:08<02:19, 1.83it/s, loss=0.216, lr=0.0004]\nSteps: 63%|██████▎ | 444/700 [04:08<02:19, 1.83it/s, 
loss=0.0454, lr=0.0004]\nSteps: 64%|██████▎ | 445/700 [04:09<02:18, 1.84it/s, loss=0.0454, lr=0.0004]\nSteps: 64%|██████▎ | 445/700 [04:09<02:18, 1.84it/s, loss=0.0396, lr=0.0004]\nSteps: 64%|██████▎ | 446/700 [04:09<02:16, 1.86it/s, loss=0.0396, lr=0.0004]\nSteps: 64%|██████▎ | 446/700 [04:09<02:16, 1.86it/s, loss=0.0378, lr=0.0004]\nSteps: 64%|██████▍ | 447/700 [04:10<02:15, 1.86it/s, loss=0.0378, lr=0.0004]\nSteps: 64%|██████▍ | 447/700 [04:10<02:15, 1.86it/s, loss=0.0112, lr=0.0004]\nSteps: 64%|██████▍ | 448/700 [04:10<02:16, 1.85it/s, loss=0.0112, lr=0.0004]\nSteps: 64%|██████▍ | 448/700 [04:10<02:16, 1.85it/s, loss=0.0411, lr=0.0004]\nSteps: 64%|██████▍ | 449/700 [04:11<02:16, 1.83it/s, loss=0.0411, lr=0.0004]\nSteps: 64%|██████▍ | 449/700 [04:11<02:16, 1.83it/s, loss=0.0222, lr=0.0004]\nSteps: 64%|██████▍ | 450/700 [04:12<02:16, 1.83it/s, loss=0.0222, lr=0.0004]\nSteps: 64%|██████▍ | 450/700 [04:12<02:16, 1.83it/s, loss=0.0735, lr=0.0004]\nSteps: 64%|██████▍ | 451/700 [04:12<02:15, 1.84it/s, loss=0.0735, lr=0.0004]\nSteps: 64%|██████▍ | 451/700 [04:12<02:15, 1.84it/s, loss=0.0261, lr=0.0004]\nSteps: 65%|██████▍ | 452/700 [04:13<02:15, 1.83it/s, loss=0.0261, lr=0.0004]\nSteps: 65%|██████▍ | 452/700 [04:13<02:15, 1.83it/s, loss=0.0861, lr=0.0004]\nSteps: 65%|██████▍ | 453/700 [04:13<02:15, 1.82it/s, loss=0.0861, lr=0.0004]\nSteps: 65%|██████▍ | 453/700 [04:13<02:15, 1.82it/s, loss=0.148, lr=0.0004] \nSteps: 65%|██████▍ | 454/700 [04:14<02:16, 1.81it/s, loss=0.148, lr=0.0004]\nSteps: 65%|██████▍ | 454/700 [04:14<02:16, 1.81it/s, loss=0.0519, lr=0.0004]\nSteps: 65%|██████▌ | 455/700 [04:14<02:15, 1.80it/s, loss=0.0519, lr=0.0004]\nSteps: 65%|██████▌ | 455/700 [04:14<02:15, 1.80it/s, loss=0.0917, lr=0.0004]\nSteps: 65%|██████▌ | 456/700 [04:15<02:16, 1.79it/s, loss=0.0917, lr=0.0004]\nSteps: 65%|██████▌ | 456/700 [04:15<02:16, 1.79it/s, loss=0.00812, lr=0.0004]\nSteps: 65%|██████▌ | 457/700 [04:15<02:14, 1.81it/s, loss=0.00812, lr=0.0004]\nSteps: 65%|██████▌ | 
457/700 [04:15<02:14, 1.81it/s, loss=0.0117, lr=0.0004] \nSteps: 65%|██████▌ | 458/700 [04:16<02:12, 1.82it/s, loss=0.0117, lr=0.0004]\nSteps: 65%|██████▌ | 458/700 [04:16<02:12, 1.82it/s, loss=0.0597, lr=0.0004]\nSteps: 66%|██████▌ | 459/700 [04:16<02:12, 1.82it/s, loss=0.0597, lr=0.0004]\nSteps: 66%|██████▌ | 459/700 [04:16<02:12, 1.82it/s, loss=0.0163, lr=0.0004]\nSteps: 66%|██████▌ | 460/700 [04:17<02:11, 1.82it/s, loss=0.0163, lr=0.0004]\nSteps: 66%|██████▌ | 460/700 [04:17<02:11, 1.82it/s, loss=0.0808, lr=0.0004]\nSteps: 66%|██████▌ | 461/700 [04:18<02:10, 1.83it/s, loss=0.0808, lr=0.0004]\nSteps: 66%|██████▌ | 461/700 [04:18<02:10, 1.83it/s, loss=0.0125, lr=0.0004]\nSteps: 66%|██████▌ | 462/700 [04:18<02:10, 1.83it/s, loss=0.0125, lr=0.0004]\nSteps: 66%|██████▌ | 462/700 [04:18<02:10, 1.83it/s, loss=0.00627, lr=0.0004]\nSteps: 66%|██████▌ | 463/700 [04:19<02:09, 1.83it/s, loss=0.00627, lr=0.0004]\nSteps: 66%|██████▌ | 463/700 [04:19<02:09, 1.83it/s, loss=0.049, lr=0.0004] \nSteps: 66%|██████▋ | 464/700 [04:19<02:09, 1.83it/s, loss=0.049, lr=0.0004]\nSteps: 66%|██████▋ | 464/700 [04:19<02:09, 1.83it/s, loss=0.0678, lr=0.0004]\nSteps: 66%|██████▋ | 465/700 [04:20<02:09, 1.82it/s, loss=0.0678, lr=0.0004]\nSteps: 66%|██████▋ | 465/700 [04:20<02:09, 1.82it/s, loss=0.035, lr=0.0004] \nSteps: 67%|██████▋ | 466/700 [04:20<02:08, 1.82it/s, loss=0.035, lr=0.0004]\nSteps: 67%|██████▋ | 466/700 [04:20<02:08, 1.82it/s, loss=0.131, lr=0.0004]\nSteps: 67%|██████▋ | 467/700 [04:21<02:08, 1.82it/s, loss=0.131, lr=0.0004]\nSteps: 67%|██████▋ | 467/700 [04:21<02:08, 1.82it/s, loss=0.277, lr=0.0004]\nSteps: 67%|██████▋ | 468/700 [04:21<02:06, 1.83it/s, loss=0.277, lr=0.0004]\nSteps: 67%|██████▋ | 468/700 [04:21<02:06, 1.83it/s, loss=0.0124, lr=0.0004]\nSteps: 67%|██████▋ | 469/700 [04:22<02:06, 1.82it/s, loss=0.0124, lr=0.0004]\nSteps: 67%|██████▋ | 469/700 [04:22<02:06, 1.82it/s, loss=0.0462, lr=0.0004]\nSteps: 67%|██████▋ | 470/700 [04:23<02:07, 1.80it/s, loss=0.0462, 
lr=0.0004]\nSteps: 67%|██████▋ | 470/700 [04:23<02:07, 1.80it/s, loss=0.0415, lr=0.0004]\nSteps: 67%|██████▋ | 471/700 [04:23<02:05, 1.82it/s, loss=0.0415, lr=0.0004]\nSteps: 67%|██████▋ | 471/700 [04:23<02:05, 1.82it/s, loss=0.169, lr=0.0004] \nSteps: 67%|██████▋ | 472/700 [04:24<02:04, 1.83it/s, loss=0.169, lr=0.0004]\nSteps: 67%|██████▋ | 472/700 [04:24<02:04, 1.83it/s, loss=0.0197, lr=0.0004]\nSteps: 68%|██████▊ | 473/700 [04:24<02:04, 1.82it/s, loss=0.0197, lr=0.0004]\nSteps: 68%|██████▊ | 473/700 [04:24<02:04, 1.82it/s, loss=0.0275, lr=0.0004]\nSteps: 68%|██████▊ | 474/700 [04:25<02:03, 1.83it/s, loss=0.0275, lr=0.0004]\nSteps: 68%|██████▊ | 474/700 [04:25<02:03, 1.83it/s, loss=0.00273, lr=0.0004]\nSteps: 68%|██████▊ | 475/700 [04:25<02:04, 1.81it/s, loss=0.00273, lr=0.0004]\nSteps: 68%|██████▊ | 475/700 [04:25<02:04, 1.81it/s, loss=0.0279, lr=0.0004] \nSteps: 68%|██████▊ | 476/700 [04:26<02:04, 1.79it/s, loss=0.0279, lr=0.0004]\nSteps: 68%|██████▊ | 476/700 [04:26<02:04, 1.79it/s, loss=0.133, lr=0.0004] \nSteps: 68%|██████▊ | 477/700 [04:26<02:04, 1.79it/s, loss=0.133, lr=0.0004]\nSteps: 68%|██████▊ | 477/700 [04:26<02:04, 1.79it/s, loss=0.00584, lr=0.0004]\nSteps: 68%|██████▊ | 478/700 [04:27<02:03, 1.79it/s, loss=0.00584, lr=0.0004]\nSteps: 68%|██████▊ | 478/700 [04:27<02:03, 1.79it/s, loss=0.0541, lr=0.0004] \nSteps: 68%|██████▊ | 479/700 [04:28<02:03, 1.79it/s, loss=0.0541, lr=0.0004]\nSteps: 68%|██████▊ | 479/700 [04:28<02:03, 1.79it/s, loss=0.0163, lr=0.0004]\nSteps: 69%|██████▊ | 480/700 [04:28<02:03, 1.79it/s, loss=0.0163, lr=0.0004]\nSteps: 69%|██████▊ | 480/700 [04:28<02:03, 1.79it/s, loss=0.00538, lr=0.0004]\nSteps: 69%|██████▊ | 481/700 [04:29<02:01, 1.80it/s, loss=0.00538, lr=0.0004]\nSteps: 69%|██████▊ | 481/700 [04:29<02:01, 1.80it/s, loss=0.00586, lr=0.0004]\nSteps: 69%|██████▉ | 482/700 [04:29<02:00, 1.81it/s, loss=0.00586, lr=0.0004]\nSteps: 69%|██████▉ | 482/700 [04:29<02:00, 1.81it/s, loss=0.0193, lr=0.0004] \nSteps: 69%|██████▉ | 483/700 
[04:30<02:00, 1.80it/s, loss=0.0193, lr=0.0004]\nSteps: 69%|██████▉ | 483/700 [04:30<02:00, 1.80it/s, loss=0.00902, lr=0.0004]\nSteps: 69%|██████▉ | 484/700 [04:30<02:03, 1.76it/s, loss=0.00902, lr=0.0004]\nSteps: 69%|██████▉ | 484/700 [04:30<02:03, 1.76it/s, loss=0.386, lr=0.0004] \nSteps: 69%|██████▉ | 485/700 [04:31<01:59, 1.80it/s, loss=0.386, lr=0.0004]\nSteps: 69%|██████▉ | 485/700 [04:31<01:59, 1.80it/s, loss=0.00357, lr=0.0004]\nSteps: 69%|██████▉ | 486/700 [04:31<01:56, 1.83it/s, loss=0.00357, lr=0.0004]\nSteps: 69%|██████▉ | 486/700 [04:31<01:56, 1.83it/s, loss=0.0271, lr=0.0004] \nSteps: 70%|██████▉ | 487/700 [04:32<01:57, 1.81it/s, loss=0.0271, lr=0.0004]\nSteps: 70%|██████▉ | 487/700 [04:32<01:57, 1.81it/s, loss=0.122, lr=0.0004] \nSteps: 70%|██████▉ | 488/700 [04:32<01:54, 1.85it/s, loss=0.122, lr=0.0004]\nSteps: 70%|██████▉ | 488/700 [04:32<01:54, 1.85it/s, loss=0.0115, lr=0.0004]\nSteps: 70%|██████▉ | 489/700 [04:33<01:52, 1.88it/s, loss=0.0115, lr=0.0004]\nSteps: 70%|██████▉ | 489/700 [04:33<01:52, 1.88it/s, loss=0.0324, lr=0.0004]\nSteps: 70%|███████ | 490/700 [04:34<01:50, 1.89it/s, loss=0.0324, lr=0.0004]\nSteps: 70%|███████ | 490/700 [04:34<01:50, 1.89it/s, loss=0.00157, lr=0.0004]\nSteps: 70%|███████ | 491/700 [04:34<01:49, 1.91it/s, loss=0.00157, lr=0.0004]\nSteps: 70%|███████ | 491/700 [04:34<01:49, 1.91it/s, loss=0.014, lr=0.0004] \nSteps: 70%|███████ | 492/700 [04:35<01:48, 1.92it/s, loss=0.014, lr=0.0004]\nSteps: 70%|███████ | 492/700 [04:35<01:48, 1.92it/s, loss=0.0567, lr=0.0004]\nSteps: 70%|███████ | 493/700 [04:35<01:49, 1.90it/s, loss=0.0567, lr=0.0004]\nSteps: 70%|███████ | 493/700 [04:35<01:49, 1.90it/s, loss=0.046, lr=0.0004] \nSteps: 71%|███████ | 494/700 [04:36<01:50, 1.87it/s, loss=0.046, lr=0.0004]\nSteps: 71%|███████ | 494/700 [04:36<01:50, 1.87it/s, loss=0.0275, lr=0.0004]\nSteps: 71%|███████ | 495/700 [04:36<01:52, 1.83it/s, loss=0.0275, lr=0.0004]\nSteps: 71%|███████ | 495/700 [04:36<01:52, 1.83it/s, loss=0.00814, 
lr=0.0004]\nSteps: 71%|███████ | 496/700 [04:37<01:51, 1.82it/s, loss=0.00814, lr=0.0004]\nSteps: 71%|███████ | 496/700 [04:37<01:51, 1.82it/s, loss=0.00738, lr=0.0004]\nSteps: 71%|███████ | 497/700 [04:37<01:51, 1.82it/s, loss=0.00738, lr=0.0004]\nSteps: 71%|███████ | 497/700 [04:37<01:51, 1.82it/s, loss=0.00353, lr=0.0004]\nSteps: 71%|███████ | 498/700 [04:38<01:49, 1.85it/s, loss=0.00353, lr=0.0004]\nSteps: 71%|███████ | 498/700 [04:38<01:49, 1.85it/s, loss=0.0116, lr=0.0004] \nSteps: 71%|███████▏ | 499/700 [04:38<01:47, 1.88it/s, loss=0.0116, lr=0.0004]\nSteps: 71%|███████▏ | 499/700 [04:38<01:47, 1.88it/s, loss=0.133, lr=0.0004] \nSteps: 71%|███████▏ | 500/700 [04:39<01:47, 1.86it/s, loss=0.133, lr=0.0004]\nCurrent Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')\nCurrent Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')\nSaving weights to checkpoints/step_500.safetensors\nLORA Unet Moved 0.0018108426593244076\nLORA CLIP Moved 7.164952694438398e-05\nSteps: 71%|███████▏ | 500/700 [04:39<01:47, 1.86it/s, loss=0.0136, lr=0.0004]\nSteps: 72%|███████▏ | 501/700 [04:40<01:57, 1.70it/s, loss=0.0136, lr=0.0004]\nSteps: 72%|███████▏ | 501/700 [04:40<01:57, 1.70it/s, loss=0.0168, lr=0.0004]\nSteps: 72%|███████▏ | 502/700 [04:40<01:53, 1.74it/s, loss=0.0168, lr=0.0004]\nSteps: 72%|███████▏ | 502/700 [04:40<01:53, 1.74it/s, loss=0.0313, lr=0.0004]\nSteps: 72%|███████▏ | 503/700 [04:41<01:51, 1.76it/s, loss=0.0313, lr=0.0004]\nSteps: 72%|███████▏ | 503/700 [04:41<01:51, 1.76it/s, loss=0.162, lr=0.0004] \nSteps: 72%|███████▏ | 504/700 [04:41<01:49, 1.78it/s, loss=0.162, lr=0.0004]\nSteps: 72%|███████▏ | 504/700 [04:41<01:49, 1.78it/s, loss=0.0117, lr=0.0004]\nSteps: 72%|███████▏ | 505/700 [04:42<01:48, 1.80it/s, loss=0.0117, lr=0.0004]\nSteps: 72%|███████▏ | 505/700 [04:42<01:48, 1.80it/s, loss=0.00169, lr=0.0004]\nSteps: 72%|███████▏ | 506/700 [04:42<01:46, 1.81it/s, 
loss=0.00169, lr=0.0004]\nSteps: 72%|███████▏ | 506/700 [04:42<01:46, 1.81it/s, loss=0.0182, lr=0.0004] \nSteps: 72%|███████▏ | 507/700 [04:43<01:45, 1.83it/s, loss=0.0182, lr=0.0004]\nSteps: 72%|███████▏ | 507/700 [04:43<01:45, 1.83it/s, loss=0.0245, lr=0.0004]\nSteps: 73%|███████▎ | 508/700 [04:43<01:46, 1.81it/s, loss=0.0245, lr=0.0004]\nSteps: 73%|███████▎ | 508/700 [04:43<01:46, 1.81it/s, loss=0.00677, lr=0.0004]\nSteps: 73%|███████▎ | 509/700 [04:44<01:45, 1.81it/s, loss=0.00677, lr=0.0004]\nSteps: 73%|███████▎ | 509/700 [04:44<01:45, 1.81it/s, loss=0.076, lr=0.0004] \nSteps: 73%|███████▎ | 510/700 [04:45<01:44, 1.82it/s, loss=0.076, lr=0.0004]\nSteps: 73%|███████▎ | 510/700 [04:45<01:44, 1.82it/s, loss=0.295, lr=0.0004]\nSteps: 73%|███████▎ | 511/700 [04:45<01:44, 1.80it/s, loss=0.295, lr=0.0004]\nSteps: 73%|███████▎ | 511/700 [04:45<01:44, 1.80it/s, loss=0.00341, lr=0.0004]\nSteps: 73%|███████▎ | 512/700 [04:46<01:44, 1.80it/s, loss=0.00341, lr=0.0004]\nSteps: 73%|███████▎ | 512/700 [04:46<01:44, 1.80it/s, loss=0.0115, lr=0.0004] \nSteps: 73%|███████▎ | 513/700 [04:46<01:41, 1.84it/s, loss=0.0115, lr=0.0004]\nSteps: 73%|███████▎ | 513/700 [04:46<01:41, 1.84it/s, loss=0.0503, lr=0.0004]\nSteps: 73%|███████▎ | 514/700 [04:47<01:39, 1.86it/s, loss=0.0503, lr=0.0004]\nSteps: 73%|███████▎ | 514/700 [04:47<01:39, 1.86it/s, loss=0.00832, lr=0.0004]\nSteps: 74%|███████▎ | 515/700 [04:47<01:40, 1.84it/s, loss=0.00832, lr=0.0004]\nSteps: 74%|███████▎ | 515/700 [04:47<01:40, 1.84it/s, loss=0.00209, lr=0.0004]\nSteps: 74%|███████▎ | 516/700 [04:48<01:40, 1.83it/s, loss=0.00209, lr=0.0004]\nSteps: 74%|███████▎ | 516/700 [04:48<01:40, 1.83it/s, loss=0.014, lr=0.0004] \nSteps: 74%|███████▍ | 517/700 [04:48<01:40, 1.83it/s, loss=0.014, lr=0.0004]\nSteps: 74%|███████▍ | 517/700 [04:48<01:40, 1.83it/s, loss=0.035, lr=0.0004]\nSteps: 74%|███████▍ | 518/700 [04:49<01:40, 1.82it/s, loss=0.035, lr=0.0004]\nSteps: 74%|███████▍ | 518/700 [04:49<01:40, 1.82it/s, loss=0.223, 
lr=0.0004]\nSteps: 74%|███████▍ | 519/700 [04:49<01:39, 1.82it/s, loss=0.223, lr=0.0004]\nSteps: 74%|███████▍ | 519/700 [04:49<01:39, 1.82it/s, loss=0.0441, lr=0.0004]\nSteps: 74%|███████▍ | 520/700 [04:50<01:38, 1.83it/s, loss=0.0441, lr=0.0004]\nSteps: 74%|███████▍ | 520/700 [04:50<01:38, 1.83it/s, loss=0.0202, lr=0.0004]\nSteps: 74%|███████▍ | 521/700 [04:50<01:35, 1.88it/s, loss=0.0202, lr=0.0004]\nSteps: 74%|███████▍ | 521/700 [04:50<01:35, 1.88it/s, loss=0.0171, lr=0.0004]\nSteps: 75%|███████▍ | 522/700 [04:51<01:33, 1.90it/s, loss=0.0171, lr=0.0004]\nSteps: 75%|███████▍ | 522/700 [04:51<01:33, 1.90it/s, loss=0.0126, lr=0.0004]\nSteps: 75%|███████▍ | 523/700 [04:52<01:34, 1.87it/s, loss=0.0126, lr=0.0004]\nSteps: 75%|███████▍ | 523/700 [04:52<01:34, 1.87it/s, loss=0.0803, lr=0.0004]\nSteps: 75%|███████▍ | 524/700 [04:52<01:34, 1.87it/s, loss=0.0803, lr=0.0004]\nSteps: 75%|███████▍ | 524/700 [04:52<01:34, 1.87it/s, loss=0.00485, lr=0.0004]\nSteps: 75%|███████▌ | 525/700 [04:53<01:33, 1.87it/s, loss=0.00485, lr=0.0004]\nSteps: 75%|███████▌ | 525/700 [04:53<01:33, 1.87it/s, loss=0.0205, lr=0.0004] \nSteps: 75%|███████▌ | 526/700 [04:53<01:31, 1.90it/s, loss=0.0205, lr=0.0004]\nSteps: 75%|███████▌ | 526/700 [04:53<01:31, 1.90it/s, loss=0.0313, lr=0.0004]\nSteps: 75%|███████▌ | 527/700 [04:54<01:32, 1.88it/s, loss=0.0313, lr=0.0004]\nSteps: 75%|███████▌ | 527/700 [04:54<01:32, 1.88it/s, loss=0.00287, lr=0.0004]\nSteps: 75%|███████▌ | 528/700 [04:54<01:32, 1.85it/s, loss=0.00287, lr=0.0004]\nSteps: 75%|███████▌ | 528/700 [04:54<01:32, 1.85it/s, loss=0.00346, lr=0.0004]\nSteps: 76%|███████▌ | 529/700 [04:55<01:33, 1.83it/s, loss=0.00346, lr=0.0004]\nSteps: 76%|███████▌ | 529/700 [04:55<01:33, 1.83it/s, loss=0.277, lr=0.0004] \nSteps: 76%|███████▌ | 530/700 [04:55<01:31, 1.85it/s, loss=0.277, lr=0.0004]\nSteps: 76%|███████▌ | 530/700 [04:55<01:31, 1.85it/s, loss=0.114, lr=0.0004]\nSteps: 76%|███████▌ | 531/700 [04:56<01:31, 1.84it/s, loss=0.114, lr=0.0004]\nSteps: 
76%|███████▌ | 531/700 [04:56<01:31, 1.84it/s, loss=0.00907, lr=0.0004]\nSteps: 76%|███████▌ | 532/700 [04:56<01:29, 1.88it/s, loss=0.00907, lr=0.0004]\nSteps: 76%|███████▌ | 532/700 [04:56<01:29, 1.88it/s, loss=0.0188, lr=0.0004] \nSteps: 76%|███████▌ | 533/700 [04:57<01:27, 1.92it/s, loss=0.0188, lr=0.0004]\nSteps: 76%|███████▌ | 533/700 [04:57<01:27, 1.92it/s, loss=0.00488, lr=0.0004]\nSteps: 76%|███████▋ | 534/700 [04:57<01:25, 1.95it/s, loss=0.00488, lr=0.0004]\nSteps: 76%|███████▋ | 534/700 [04:57<01:25, 1.95it/s, loss=0.043, lr=0.0004] \nSteps: 76%|███████▋ | 535/700 [04:58<01:23, 1.97it/s, loss=0.043, lr=0.0004]\nSteps: 76%|███████▋ | 535/700 [04:58<01:23, 1.97it/s, loss=0.0856, lr=0.0004]\nSteps: 77%|███████▋ | 536/700 [04:58<01:23, 1.96it/s, loss=0.0856, lr=0.0004]\nSteps: 77%|███████▋ | 536/700 [04:58<01:23, 1.96it/s, loss=0.0465, lr=0.0004]\nSteps: 77%|███████▋ | 537/700 [04:59<01:26, 1.88it/s, loss=0.0465, lr=0.0004]\nSteps: 77%|███████▋ | 537/700 [04:59<01:26, 1.88it/s, loss=0.0128, lr=0.0004]\nSteps: 77%|███████▋ | 538/700 [05:00<01:26, 1.87it/s, loss=0.0128, lr=0.0004]\nSteps: 77%|███████▋ | 538/700 [05:00<01:26, 1.87it/s, loss=0.0311, lr=0.0004]\nSteps: 77%|███████▋ | 539/700 [05:00<01:26, 1.87it/s, loss=0.0311, lr=0.0004]\nSteps: 77%|███████▋ | 539/700 [05:00<01:26, 1.87it/s, loss=0.0866, lr=0.0004]\nSteps: 77%|███████▋ | 540/700 [05:01<01:26, 1.86it/s, loss=0.0866, lr=0.0004]\nSteps: 77%|███████▋ | 540/700 [05:01<01:26, 1.86it/s, loss=0.0238, lr=0.0004]\nSteps: 77%|███████▋ | 541/700 [05:01<01:25, 1.87it/s, loss=0.0238, lr=0.0004]\nSteps: 77%|███████▋ | 541/700 [05:01<01:25, 1.87it/s, loss=0.167, lr=0.0004] \nSteps: 77%|███████▋ | 542/700 [05:02<01:23, 1.90it/s, loss=0.167, lr=0.0004]\nSteps: 77%|███████▋ | 542/700 [05:02<01:23, 1.90it/s, loss=0.0733, lr=0.0004]\nSteps: 78%|███████▊ | 543/700 [05:02<01:23, 1.87it/s, loss=0.0733, lr=0.0004]\nSteps: 78%|███████▊ | 543/700 [05:02<01:23, 1.87it/s, loss=0.0158, lr=0.0004]\nSteps: 78%|███████▊ | 
544/700 [05:03<01:24, 1.86it/s, loss=0.0158, lr=0.0004]\nSteps: 78%|███████▊ | 544/700 [05:03<01:24, 1.86it/s, loss=0.0303, lr=0.0004]\nSteps: 78%|███████▊ | 545/700 [05:03<01:24, 1.82it/s, loss=0.0303, lr=0.0004]\nSteps: 78%|███████▊ | 545/700 [05:03<01:24, 1.82it/s, loss=0.00213, lr=0.0004]\nSteps: 78%|███████▊ | 546/700 [05:04<01:26, 1.78it/s, loss=0.00213, lr=0.0004]\nSteps: 78%|███████▊ | 546/700 [05:04<01:26, 1.78it/s, loss=0.0131, lr=0.0004] \nSteps: 78%|███████▊ | 547/700 [05:04<01:24, 1.80it/s, loss=0.0131, lr=0.0004]\nSteps: 78%|███████▊ | 547/700 [05:04<01:24, 1.80it/s, loss=0.00865, lr=0.0004]\nSteps: 78%|███████▊ | 548/700 [05:05<01:21, 1.86it/s, loss=0.00865, lr=0.0004]\nSteps: 78%|███████▊ | 548/700 [05:05<01:21, 1.86it/s, loss=0.0364, lr=0.0004] \nSteps: 78%|███████▊ | 549/700 [05:05<01:20, 1.88it/s, loss=0.0364, lr=0.0004]\nSteps: 78%|███████▊ | 549/700 [05:05<01:20, 1.88it/s, loss=0.0189, lr=0.0004]\nSteps: 79%|███████▊ | 550/700 [05:06<01:19, 1.88it/s, loss=0.0189, lr=0.0004]\nSteps: 79%|███████▊ | 550/700 [05:06<01:19, 1.88it/s, loss=0.0136, lr=0.0004]\nSteps: 79%|███████▊ | 551/700 [05:07<01:19, 1.88it/s, loss=0.0136, lr=0.0004]\nSteps: 79%|███████▊ | 551/700 [05:07<01:19, 1.88it/s, loss=0.0498, lr=0.0004]\nSteps: 79%|███████▉ | 552/700 [05:07<01:18, 1.89it/s, loss=0.0498, lr=0.0004]\nSteps: 79%|███████▉ | 552/700 [05:07<01:18, 1.89it/s, loss=0.0141, lr=0.0004]\nSteps: 79%|███████▉ | 553/700 [05:08<01:16, 1.92it/s, loss=0.0141, lr=0.0004]\nSteps: 79%|███████▉ | 553/700 [05:08<01:16, 1.92it/s, loss=0.00719, lr=0.0004]\nSteps: 79%|███████▉ | 554/700 [05:08<01:15, 1.93it/s, loss=0.00719, lr=0.0004]\nSteps: 79%|███████▉ | 554/700 [05:08<01:15, 1.93it/s, loss=0.00273, lr=0.0004]\nSteps: 79%|███████▉ | 555/700 [05:09<01:15, 1.92it/s, loss=0.00273, lr=0.0004]\nSteps: 79%|███████▉ | 555/700 [05:09<01:15, 1.92it/s, loss=0.0116, lr=0.0004] \nSteps: 79%|███████▉ | 556/700 [05:09<01:14, 1.94it/s, loss=0.0116, lr=0.0004]\nSteps: 79%|███████▉ | 556/700 
[05:09<01:14, 1.94it/s, loss=0.0282, lr=0.0004]\nSteps: 80%|███████▉ | 557/700 [05:10<01:13, 1.95it/s, loss=0.0282, lr=0.0004]\nSteps: 80%|███████▉ | 557/700 [05:10<01:13, 1.95it/s, loss=0.0122, lr=0.0004]\nSteps: 80%|███████▉ | 558/700 [05:10<01:11, 1.97it/s, loss=0.0122, lr=0.0004]\nSteps: 80%|███████▉ | 558/700 [05:10<01:11, 1.97it/s, loss=0.0149, lr=0.0004]\nSteps: 80%|███████▉ | 559/700 [05:11<01:11, 1.96it/s, loss=0.0149, lr=0.0004]\nSteps: 80%|███████▉ | 559/700 [05:11<01:11, 1.96it/s, loss=0.00336, lr=0.0004]\nSteps: 80%|████████ | 560/700 [05:11<01:10, 1.98it/s, loss=0.00336, lr=0.0004]\nSteps: 80%|████████ | 560/700 [05:11<01:10, 1.98it/s, loss=0.0495, lr=0.0004] \nSteps: 80%|████████ | 561/700 [05:12<01:10, 1.98it/s, loss=0.0495, lr=0.0004]\nSteps: 80%|████████ | 561/700 [05:12<01:10, 1.98it/s, loss=0.00663, lr=0.0004]\nSteps: 80%|████████ | 562/700 [05:12<01:10, 1.96it/s, loss=0.00663, lr=0.0004]\nSteps: 80%|████████ | 562/700 [05:12<01:10, 1.96it/s, loss=0.00749, lr=0.0004]\nSteps: 80%|████████ | 563/700 [05:13<01:10, 1.94it/s, loss=0.00749, lr=0.0004]\nSteps: 80%|████████ | 563/700 [05:13<01:10, 1.94it/s, loss=0.0777, lr=0.0004] \nSteps: 81%|████████ | 564/700 [05:13<01:10, 1.94it/s, loss=0.0777, lr=0.0004]\nSteps: 81%|████████ | 564/700 [05:13<01:10, 1.94it/s, loss=0.00752, lr=0.0004]\nSteps: 81%|████████ | 565/700 [05:14<01:09, 1.94it/s, loss=0.00752, lr=0.0004]\nSteps: 81%|████████ | 565/700 [05:14<01:09, 1.94it/s, loss=0.0213, lr=0.0004] \nSteps: 81%|████████ | 566/700 [05:14<01:08, 1.95it/s, loss=0.0213, lr=0.0004]\nSteps: 81%|████████ | 566/700 [05:14<01:08, 1.95it/s, loss=0.182, lr=0.0004] \nSteps: 81%|████████ | 567/700 [05:15<01:08, 1.93it/s, loss=0.182, lr=0.0004]\nSteps: 81%|████████ | 567/700 [05:15<01:08, 1.93it/s, loss=0.00876, lr=0.0004]\nSteps: 81%|████████ | 568/700 [05:15<01:08, 1.94it/s, loss=0.00876, lr=0.0004]\nSteps: 81%|████████ | 568/700 [05:15<01:08, 1.94it/s, loss=0.0193, lr=0.0004] \nSteps: 81%|████████▏ | 569/700 
[05:16<01:06, 1.97it/s, loss=0.0193, lr=0.0004]\nSteps: 81%|████████▏ | 569/700 [05:16<01:06, 1.97it/s, loss=0.0154, lr=0.0004]\nSteps: 81%|████████▏ | 570/700 [05:16<01:07, 1.91it/s, loss=0.0154, lr=0.0004]\nSteps: 81%|████████▏ | 570/700 [05:16<01:07, 1.91it/s, loss=0.346, lr=0.0004] \nSteps: 82%|████████▏ | 571/700 [05:17<01:09, 1.84it/s, loss=0.346, lr=0.0004]\nSteps: 82%|████████▏ | 571/700 [05:17<01:09, 1.84it/s, loss=0.00996, lr=0.0004]\nSteps: 82%|████████▏ | 572/700 [05:17<01:09, 1.84it/s, loss=0.00996, lr=0.0004]\nSteps: 82%|████████▏ | 572/700 [05:17<01:09, 1.84it/s, loss=0.0344, lr=0.0004] \nSteps: 82%|████████▏ | 573/700 [05:18<01:09, 1.81it/s, loss=0.0344, lr=0.0004]\nSteps: 82%|████████▏ | 573/700 [05:18<01:09, 1.81it/s, loss=0.00388, lr=0.0004]\nSteps: 82%|████████▏ | 574/700 [05:19<01:11, 1.75it/s, loss=0.00388, lr=0.0004]\nSteps: 82%|████████▏ | 574/700 [05:19<01:11, 1.75it/s, loss=0.00327, lr=0.0004]\nSteps: 82%|████████▏ | 575/700 [05:19<01:11, 1.75it/s, loss=0.00327, lr=0.0004]\nSteps: 82%|████████▏ | 575/700 [05:19<01:11, 1.75it/s, loss=0.0173, lr=0.0004] \nSteps: 82%|████████▏ | 576/700 [05:20<01:09, 1.77it/s, loss=0.0173, lr=0.0004]\nSteps: 82%|████████▏ | 576/700 [05:20<01:09, 1.77it/s, loss=0.0087, lr=0.0004]\nSteps: 82%|████████▏ | 577/700 [05:20<01:09, 1.77it/s, loss=0.0087, lr=0.0004]\nSteps: 82%|████████▏ | 577/700 [05:20<01:09, 1.77it/s, loss=0.0399, lr=0.0004]\nSteps: 83%|████████▎ | 578/700 [05:21<01:09, 1.76it/s, loss=0.0399, lr=0.0004]\nSteps: 83%|████████▎ | 578/700 [05:21<01:09, 1.76it/s, loss=0.00906, lr=0.0004]\nSteps: 83%|████████▎ | 579/700 [05:21<01:08, 1.78it/s, loss=0.00906, lr=0.0004]\nSteps: 83%|████████▎ | 579/700 [05:21<01:08, 1.78it/s, loss=0.0716, lr=0.0004] \nSteps: 83%|████████▎ | 580/700 [05:22<01:07, 1.77it/s, loss=0.0716, lr=0.0004]\nSteps: 83%|████████▎ | 580/700 [05:22<01:07, 1.77it/s, loss=0.214, lr=0.0004] \nSteps: 83%|████████▎ | 581/700 [05:23<01:07, 1.75it/s, loss=0.214, lr=0.0004]\nSteps: 83%|████████▎ 
| 581/700 [05:23<01:07, 1.75it/s, loss=0.0602, lr=0.0004]\nSteps: 83%|████████▎ | 582/700 [05:23<01:07, 1.75it/s, loss=0.0602, lr=0.0004]\nSteps: 83%|████████▎ | 582/700 [05:23<01:07, 1.75it/s, loss=0.0708, lr=0.0004]\nSteps: 83%|████████▎ | 583/700 [05:24<01:07, 1.75it/s, loss=0.0708, lr=0.0004]\nSteps: 83%|████████▎ | 583/700 [05:24<01:07, 1.75it/s, loss=0.00627, lr=0.0004]\nSteps: 83%|████████▎ | 584/700 [05:24<01:05, 1.76it/s, loss=0.00627, lr=0.0004]\nSteps: 83%|████████▎ | 584/700 [05:24<01:05, 1.76it/s, loss=0.00603, lr=0.0004]\nSteps: 84%|████████▎ | 585/700 [05:25<01:05, 1.76it/s, loss=0.00603, lr=0.0004]\nSteps: 84%|████████▎ | 585/700 [05:25<01:05, 1.76it/s, loss=0.0861, lr=0.0004] \nSteps: 84%|████████▎ | 586/700 [05:25<01:04, 1.77it/s, loss=0.0861, lr=0.0004]\nSteps: 84%|████████▎ | 586/700 [05:25<01:04, 1.77it/s, loss=0.00681, lr=0.0004]\nSteps: 84%|████████▍ | 587/700 [05:26<01:04, 1.76it/s, loss=0.00681, lr=0.0004]\nSteps: 84%|████████▍ | 587/700 [05:26<01:04, 1.76it/s, loss=0.0772, lr=0.0004] \nSteps: 84%|████████▍ | 588/700 [05:27<01:04, 1.75it/s, loss=0.0772, lr=0.0004]\nSteps: 84%|████████▍ | 588/700 [05:27<01:04, 1.75it/s, loss=0.0183, lr=0.0004]\nSteps: 84%|████████▍ | 589/700 [05:27<01:03, 1.75it/s, loss=0.0183, lr=0.0004]\nSteps: 84%|████████▍ | 589/700 [05:27<01:03, 1.75it/s, loss=0.00783, lr=0.0004]\nSteps: 84%|████████▍ | 590/700 [05:28<01:02, 1.75it/s, loss=0.00783, lr=0.0004]\nSteps: 84%|████████▍ | 590/700 [05:28<01:02, 1.75it/s, loss=0.0575, lr=0.0004] \nSteps: 84%|████████▍ | 591/700 [05:28<01:01, 1.77it/s, loss=0.0575, lr=0.0004]\nSteps: 84%|████████▍ | 591/700 [05:28<01:01, 1.77it/s, loss=0.0142, lr=0.0004]\nSteps: 85%|████████▍ | 592/700 [05:29<01:00, 1.78it/s, loss=0.0142, lr=0.0004]\nSteps: 85%|████████▍ | 592/700 [05:29<01:00, 1.78it/s, loss=0.00664, lr=0.0004]\nSteps: 85%|████████▍ | 593/700 [05:29<00:59, 1.80it/s, loss=0.00664, lr=0.0004]\nSteps: 85%|████████▍ | 593/700 [05:29<00:59, 1.80it/s, loss=0.00879, lr=0.0004]\nSteps: 
85%|████████▍ | 594/700 [05:30<00:59, 1.79it/s, loss=0.00879, lr=0.0004]\nSteps: 85%|████████▍ | 594/700 [05:30<00:59, 1.79it/s, loss=0.0716, lr=0.0004] \nSteps: 85%|████████▌ | 595/700 [05:30<00:58, 1.79it/s, loss=0.0716, lr=0.0004]\nSteps: 85%|████████▌ | 595/700 [05:30<00:58, 1.79it/s, loss=0.0366, lr=0.0004]\nSteps: 85%|████████▌ | 596/700 [05:31<00:58, 1.79it/s, loss=0.0366, lr=0.0004]\nSteps: 85%|████████▌ | 596/700 [05:31<00:58, 1.79it/s, loss=0.0431, lr=0.0004]\nSteps: 85%|████████▌ | 597/700 [05:32<00:57, 1.79it/s, loss=0.0431, lr=0.0004]\nSteps: 85%|████████▌ | 597/700 [05:32<00:57, 1.79it/s, loss=0.0399, lr=0.0004]\nSteps: 85%|████████▌ | 598/700 [05:32<00:57, 1.79it/s, loss=0.0399, lr=0.0004]\nSteps: 85%|████████▌ | 598/700 [05:32<00:57, 1.79it/s, loss=0.0735, lr=0.0004]\nSteps: 86%|████████▌ | 599/700 [05:33<00:56, 1.78it/s, loss=0.0735, lr=0.0004]\nSteps: 86%|████████▌ | 599/700 [05:33<00:56, 1.78it/s, loss=0.0237, lr=0.0004]\nSteps: 86%|████████▌ | 600/700 [05:33<00:56, 1.78it/s, loss=0.0237, lr=0.0004]\nCurrent Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')\nCurrent Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')\nSaving weights to checkpoints/step_600.safetensors\nLORA Unet Moved 0.0020308804232627153\nLORA CLIP Moved 8.028616866795346e-05\nSteps: 86%|████████▌ | 600/700 [05:33<00:56, 1.78it/s, loss=0.00955, lr=0.0004]\nSteps: 86%|████████▌ | 601/700 [05:34<01:00, 1.62it/s, loss=0.00955, lr=0.0004]\nSteps: 86%|████████▌ | 601/700 [05:34<01:00, 1.62it/s, loss=0.0184, lr=0.0004] \nSteps: 86%|████████▌ | 602/700 [05:35<00:59, 1.66it/s, loss=0.0184, lr=0.0004]\nSteps: 86%|████████▌ | 602/700 [05:35<00:59, 1.66it/s, loss=0.0569, lr=0.0004]\nSteps: 86%|████████▌ | 603/700 [05:35<00:57, 1.69it/s, loss=0.0569, lr=0.0004]\nSteps: 86%|████████▌ | 603/700 [05:35<00:57, 1.69it/s, loss=0.00788, lr=0.0004]\nSteps: 86%|████████▋ | 604/700 [05:36<00:55, 
1.72it/s, loss=0.00788, lr=0.0004]\nSteps: 86%|████████▋ | 604/700 [05:36<00:55, 1.72it/s, loss=0.0886, lr=0.0004] \nSteps: 86%|████████▋ | 605/700 [05:36<00:54, 1.74it/s, loss=0.0886, lr=0.0004]\nSteps: 86%|████████▋ | 605/700 [05:36<00:54, 1.74it/s, loss=0.0103, lr=0.0004]\nSteps: 87%|████████▋ | 606/700 [05:37<00:53, 1.77it/s, loss=0.0103, lr=0.0004]\nSteps: 87%|████████▋ | 606/700 [05:37<00:53, 1.77it/s, loss=0.00687, lr=0.0004]\nSteps: 87%|████████▋ | 607/700 [05:37<00:52, 1.79it/s, loss=0.00687, lr=0.0004]\nSteps: 87%|████████▋ | 607/700 [05:37<00:52, 1.79it/s, loss=0.00811, lr=0.0004]\nSteps: 87%|████████▋ | 608/700 [05:38<00:50, 1.81it/s, loss=0.00811, lr=0.0004]\nSteps: 87%|████████▋ | 608/700 [05:38<00:50, 1.81it/s, loss=0.0626, lr=0.0004] \nSteps: 87%|████████▋ | 609/700 [05:38<00:50, 1.81it/s, loss=0.0626, lr=0.0004]\nSteps: 87%|████████▋ | 609/700 [05:38<00:50, 1.81it/s, loss=0.037, lr=0.0004] \nSteps: 87%|████████▋ | 610/700 [05:39<00:50, 1.79it/s, loss=0.037, lr=0.0004]\nSteps: 87%|████████▋ | 610/700 [05:39<00:50, 1.79it/s, loss=0.0101, lr=0.0004]\nSteps: 87%|████████▋ | 611/700 [05:40<00:49, 1.80it/s, loss=0.0101, lr=0.0004]\nSteps: 87%|████████▋ | 611/700 [05:40<00:49, 1.80it/s, loss=0.00297, lr=0.0004]\nSteps: 87%|████████▋ | 612/700 [05:40<00:49, 1.79it/s, loss=0.00297, lr=0.0004]\nSteps: 87%|████████▋ | 612/700 [05:40<00:49, 1.79it/s, loss=0.045, lr=0.0004] \nSteps: 88%|████████▊ | 613/700 [05:41<00:48, 1.79it/s, loss=0.045, lr=0.0004]\nSteps: 88%|████████▊ | 613/700 [05:41<00:48, 1.79it/s, loss=0.00866, lr=0.0004]\nSteps: 88%|████████▊ | 614/700 [05:41<00:47, 1.79it/s, loss=0.00866, lr=0.0004]\nSteps: 88%|████████▊ | 614/700 [05:41<00:47, 1.79it/s, loss=0.00474, lr=0.0004]\nSteps: 88%|████████▊ | 615/700 [05:42<00:47, 1.79it/s, loss=0.00474, lr=0.0004]\nSteps: 88%|████████▊ | 615/700 [05:42<00:47, 1.79it/s, loss=0.0106, lr=0.0004] \nSteps: 88%|████████▊ | 616/700 [05:42<00:46, 1.81it/s, loss=0.0106, lr=0.0004]\nSteps: 88%|████████▊ | 616/700 
[05:42<00:46, 1.81it/s, loss=0.0635, lr=0.0004]\nSteps: 88%|████████▊ | 617/700 [05:43<00:46, 1.79it/s, loss=0.0635, lr=0.0004]\nSteps: 88%|████████▊ | 617/700 [05:43<00:46, 1.79it/s, loss=0.0116, lr=0.0004]\nSteps: 88%|████████▊ | 618/700 [05:43<00:46, 1.77it/s, loss=0.0116, lr=0.0004]\nSteps: 88%|████████▊ | 618/700 [05:43<00:46, 1.77it/s, loss=0.0267, lr=0.0004]\nSteps: 88%|████████▊ | 619/700 [05:44<00:45, 1.77it/s, loss=0.0267, lr=0.0004]\nSteps: 88%|████████▊ | 619/700 [05:44<00:45, 1.77it/s, loss=0.0141, lr=0.0004]\nSteps: 89%|████████▊ | 620/700 [05:45<00:45, 1.78it/s, loss=0.0141, lr=0.0004]\nSteps: 89%|████████▊ | 620/700 [05:45<00:45, 1.78it/s, loss=0.0269, lr=0.0004]\nSteps: 89%|████████▊ | 621/700 [05:45<00:43, 1.80it/s, loss=0.0269, lr=0.0004]\nSteps: 89%|████████▊ | 621/700 [05:45<00:43, 1.80it/s, loss=0.0219, lr=0.0004]\nSteps: 89%|████████▉ | 622/700 [05:46<00:42, 1.82it/s, loss=0.0219, lr=0.0004]\nSteps: 89%|████████▉ | 622/700 [05:46<00:42, 1.82it/s, loss=0.0307, lr=0.0004]\nSteps: 89%|████████▉ | 623/700 [05:46<00:42, 1.81it/s, loss=0.0307, lr=0.0004]\nSteps: 89%|████████▉ | 623/700 [05:46<00:42, 1.81it/s, loss=0.0196, lr=0.0004]\nSteps: 89%|████████▉ | 624/700 [05:47<00:42, 1.81it/s, loss=0.0196, lr=0.0004]\nSteps: 89%|████████▉ | 624/700 [05:47<00:42, 1.81it/s, loss=0.0529, lr=0.0004]\nSteps: 89%|████████▉ | 625/700 [05:47<00:41, 1.82it/s, loss=0.0529, lr=0.0004]\nSteps: 89%|████████▉ | 625/700 [05:47<00:41, 1.82it/s, loss=0.0333, lr=0.0004]\nSteps: 89%|████████▉ | 626/700 [05:48<00:40, 1.83it/s, loss=0.0333, lr=0.0004]\nSteps: 89%|████████▉ | 626/700 [05:48<00:40, 1.83it/s, loss=0.0369, lr=0.0004]\nSteps: 90%|████████▉ | 627/700 [05:48<00:39, 1.86it/s, loss=0.0369, lr=0.0004]\nSteps: 90%|████████▉ | 627/700 [05:48<00:39, 1.86it/s, loss=0.0185, lr=0.0004]\nSteps: 90%|████████▉ | 628/700 [05:49<00:38, 1.87it/s, loss=0.0185, lr=0.0004]\nSteps: 90%|████████▉ | 628/700 [05:49<00:38, 1.87it/s, loss=0.00975, lr=0.0004]\nSteps: 90%|████████▉ | 
629/700 [05:49<00:38, 1.84it/s, loss=0.00975, lr=0.0004]\nSteps: 90%|████████▉ | 629/700 [05:49<00:38, 1.84it/s, loss=0.021, lr=0.0004] \nSteps: 90%|█████████ | 630/700 [05:50<00:38, 1.84it/s, loss=0.021, lr=0.0004]\nSteps: 90%|█████████ | 630/700 [05:50<00:38, 1.84it/s, loss=0.111, lr=0.0004]\nSteps: 90%|█████████ | 631/700 [05:51<00:37, 1.84it/s, loss=0.111, lr=0.0004]\nSteps: 90%|█████████ | 631/700 [05:51<00:37, 1.84it/s, loss=0.00458, lr=0.0004]\nSteps: 90%|█████████ | 632/700 [05:51<00:36, 1.84it/s, loss=0.00458, lr=0.0004]\nSteps: 90%|█████████ | 632/700 [05:51<00:36, 1.84it/s, loss=0.0759, lr=0.0004] \nSteps: 90%|█████████ | 633/700 [05:52<00:36, 1.84it/s, loss=0.0759, lr=0.0004]\nSteps: 90%|█████████ | 633/700 [05:52<00:36, 1.84it/s, loss=0.0882, lr=0.0004]\nSteps: 91%|█████████ | 634/700 [05:52<00:36, 1.83it/s, loss=0.0882, lr=0.0004]\nSteps: 91%|█████████ | 634/700 [05:52<00:36, 1.83it/s, loss=0.0142, lr=0.0004]\nSteps: 91%|█████████ | 635/700 [05:53<00:35, 1.84it/s, loss=0.0142, lr=0.0004]\nSteps: 91%|█████████ | 635/700 [05:53<00:35, 1.84it/s, loss=0.00448, lr=0.0004]\nSteps: 91%|█████████ | 636/700 [05:53<00:34, 1.84it/s, loss=0.00448, lr=0.0004]\nSteps: 91%|█████████ | 636/700 [05:53<00:34, 1.84it/s, loss=0.0323, lr=0.0004] \nSteps: 91%|█████████ | 637/700 [05:54<00:34, 1.81it/s, loss=0.0323, lr=0.0004]\nSteps: 91%|█████████ | 637/700 [05:54<00:34, 1.81it/s, loss=0.00757, lr=0.0004]\nSteps: 91%|█████████ | 638/700 [05:54<00:34, 1.80it/s, loss=0.00757, lr=0.0004]\nSteps: 91%|█████████ | 638/700 [05:54<00:34, 1.80it/s, loss=0.0161, lr=0.0004] \nSteps: 91%|█████████▏| 639/700 [05:55<00:33, 1.80it/s, loss=0.0161, lr=0.0004]\nSteps: 91%|█████████▏| 639/700 [05:55<00:33, 1.80it/s, loss=0.0543, lr=0.0004]\nSteps: 91%|█████████▏| 640/700 [05:56<00:33, 1.80it/s, loss=0.0543, lr=0.0004]\nSteps: 91%|█████████▏| 640/700 [05:56<00:33, 1.80it/s, loss=0.0417, lr=0.0004]\nSteps: 92%|█████████▏| 641/700 [05:56<00:32, 1.81it/s, loss=0.0417, lr=0.0004]\nSteps: 
92%|█████████▏| 641/700 [05:56<00:32, 1.81it/s, loss=0.0085, lr=0.0004]\nSteps: 92%|█████████▏| 642/700 [05:57<00:31, 1.85it/s, loss=0.0085, lr=0.0004]\nSteps: 92%|█████████▏| 642/700 [05:57<00:31, 1.85it/s, loss=0.00933, lr=0.0004]\nSteps: 92%|█████████▏| 643/700 [05:57<00:29, 1.90it/s, loss=0.00933, lr=0.0004]\nSteps: 92%|█████████▏| 643/700 [05:57<00:29, 1.90it/s, loss=0.00429, lr=0.0004]\nSteps: 92%|█████████▏| 644/700 [05:58<00:28, 1.94it/s, loss=0.00429, lr=0.0004]\nSteps: 92%|█████████▏| 644/700 [05:58<00:28, 1.94it/s, loss=0.051, lr=0.0004] \nSteps: 92%|█████████▏| 645/700 [05:58<00:28, 1.95it/s, loss=0.051, lr=0.0004]\nSteps: 92%|█████████▏| 645/700 [05:58<00:28, 1.95it/s, loss=0.122, lr=0.0004]\nSteps: 92%|█████████▏| 646/700 [05:59<00:27, 1.94it/s, loss=0.122, lr=0.0004]\nSteps: 92%|█████████▏| 646/700 [05:59<00:27, 1.94it/s, loss=0.0861, lr=0.0004]\nSteps: 92%|█████████▏| 647/700 [05:59<00:27, 1.93it/s, loss=0.0861, lr=0.0004]\nSteps: 92%|█████████▏| 647/700 [05:59<00:27, 1.93it/s, loss=0.0105, lr=0.0004]\nSteps: 93%|█████████▎| 648/700 [06:00<00:26, 1.95it/s, loss=0.0105, lr=0.0004]\nSteps: 93%|█████████▎| 648/700 [06:00<00:26, 1.95it/s, loss=0.28, lr=0.0004] \nSteps: 93%|█████████▎| 649/700 [06:00<00:25, 1.98it/s, loss=0.28, lr=0.0004]\nSteps: 93%|█████████▎| 649/700 [06:00<00:25, 1.98it/s, loss=0.00453, lr=0.0004]\nSteps: 93%|█████████▎| 650/700 [06:01<00:25, 1.97it/s, loss=0.00453, lr=0.0004]\nSteps: 93%|█████████▎| 650/700 [06:01<00:25, 1.97it/s, loss=0.0112, lr=0.0004] \nSteps: 93%|█████████▎| 651/700 [06:01<00:24, 1.97it/s, loss=0.0112, lr=0.0004]\nSteps: 93%|█████████▎| 651/700 [06:01<00:24, 1.97it/s, loss=0.00302, lr=0.0004]\nSteps: 93%|█████████▎| 652/700 [06:02<00:24, 1.95it/s, loss=0.00302, lr=0.0004]\nSteps: 93%|█████████▎| 652/700 [06:02<00:24, 1.95it/s, loss=0.0966, lr=0.0004] \nSteps: 93%|█████████▎| 653/700 [06:02<00:25, 1.85it/s, loss=0.0966, lr=0.0004]\nSteps: 93%|█████████▎| 653/700 [06:02<00:25, 1.85it/s, loss=0.0116, 
lr=0.0004]\nSteps: 93%|█████████▎| 654/700 [06:03<00:25, 1.84it/s, loss=0.0116, lr=0.0004]\nSteps: 93%|█████████▎| 654/700 [06:03<00:25, 1.84it/s, loss=0.00164, lr=0.0004]\nSteps: 94%|█████████▎| 655/700 [06:03<00:24, 1.83it/s, loss=0.00164, lr=0.0004]\nSteps: 94%|█████████▎| 655/700 [06:03<00:24, 1.83it/s, loss=0.0755, lr=0.0004] \nSteps: 94%|█████████▎| 656/700 [06:04<00:23, 1.84it/s, loss=0.0755, lr=0.0004]\nSteps: 94%|█████████▎| 656/700 [06:04<00:23, 1.84it/s, loss=0.118, lr=0.0004] \nSteps: 94%|█████████▍| 657/700 [06:04<00:22, 1.87it/s, loss=0.118, lr=0.0004]\nSteps: 94%|█████████▍| 657/700 [06:04<00:22, 1.87it/s, loss=0.00279, lr=0.0004]\nSteps: 94%|█████████▍| 658/700 [06:05<00:22, 1.90it/s, loss=0.00279, lr=0.0004]\nSteps: 94%|█████████▍| 658/700 [06:05<00:22, 1.90it/s, loss=0.0254, lr=0.0004] \nSteps: 94%|█████████▍| 659/700 [06:05<00:21, 1.91it/s, loss=0.0254, lr=0.0004]\nSteps: 94%|█████████▍| 659/700 [06:05<00:21, 1.91it/s, loss=0.00583, lr=0.0004]\nSteps: 94%|█████████▍| 660/700 [06:06<00:20, 1.93it/s, loss=0.00583, lr=0.0004]\nSteps: 94%|█████████▍| 660/700 [06:06<00:20, 1.93it/s, loss=0.0188, lr=0.0004] \nSteps: 94%|█████████▍| 661/700 [06:06<00:20, 1.94it/s, loss=0.0188, lr=0.0004]\nSteps: 94%|█████████▍| 661/700 [06:06<00:20, 1.94it/s, loss=0.0194, lr=0.0004]\nSteps: 95%|█████████▍| 662/700 [06:07<00:19, 1.96it/s, loss=0.0194, lr=0.0004]\nSteps: 95%|█████████▍| 662/700 [06:07<00:19, 1.96it/s, loss=0.0046, lr=0.0004]\nSteps: 95%|█████████▍| 663/700 [06:07<00:18, 1.97it/s, loss=0.0046, lr=0.0004]\nSteps: 95%|█████████▍| 663/700 [06:07<00:18, 1.97it/s, loss=0.0282, lr=0.0004]\nSteps: 95%|█████████▍| 664/700 [06:08<00:18, 1.95it/s, loss=0.0282, lr=0.0004]\nSteps: 95%|█████████▍| 664/700 [06:08<00:18, 1.95it/s, loss=0.0177, lr=0.0004]\nSteps: 95%|█████████▌| 665/700 [06:09<00:18, 1.87it/s, loss=0.0177, lr=0.0004]\nSteps: 95%|█████████▌| 665/700 [06:09<00:18, 1.87it/s, loss=0.028, lr=0.0004] \nSteps: 95%|█████████▌| 666/700 [06:09<00:17, 1.91it/s, 
loss=0.028, lr=0.0004]\nSteps: 95%|█████████▌| 666/700 [06:09<00:17, 1.91it/s, loss=0.00854, lr=0.0004]\nSteps: 95%|█████████▌| 667/700 [06:10<00:17, 1.92it/s, loss=0.00854, lr=0.0004]\nSteps: 95%|█████████▌| 667/700 [06:10<00:17, 1.92it/s, loss=0.0678, lr=0.0004] \nSteps: 95%|█████████▌| 668/700 [06:10<00:16, 1.93it/s, loss=0.0678, lr=0.0004]\nSteps: 95%|█████████▌| 668/700 [06:10<00:16, 1.93it/s, loss=0.0106, lr=0.0004]\nSteps: 96%|█████████▌| 669/700 [06:11<00:15, 1.94it/s, loss=0.0106, lr=0.0004]\nSteps: 96%|█████████▌| 669/700 [06:11<00:15, 1.94it/s, loss=0.00561, lr=0.0004]\nSteps: 96%|█████████▌| 670/700 [06:11<00:15, 1.89it/s, loss=0.00561, lr=0.0004]\nSteps: 96%|█████████▌| 670/700 [06:11<00:15, 1.89it/s, loss=0.0232, lr=0.0004] \nSteps: 96%|█████████▌| 671/700 [06:12<00:15, 1.88it/s, loss=0.0232, lr=0.0004]\nSteps: 96%|█████████▌| 671/700 [06:12<00:15, 1.88it/s, loss=0.0145, lr=0.0004]\nSteps: 96%|█████████▌| 672/700 [06:12<00:15, 1.87it/s, loss=0.0145, lr=0.0004]\nSteps: 96%|█████████▌| 672/700 [06:12<00:15, 1.87it/s, loss=0.0449, lr=0.0004]\nSteps: 96%|█████████▌| 673/700 [06:13<00:14, 1.83it/s, loss=0.0449, lr=0.0004]\nSteps: 96%|█████████▌| 673/700 [06:13<00:14, 1.83it/s, loss=0.0102, lr=0.0004]\nSteps: 96%|█████████▋| 674/700 [06:13<00:14, 1.80it/s, loss=0.0102, lr=0.0004]\nSteps: 96%|█████████▋| 674/700 [06:13<00:14, 1.80it/s, loss=0.0219, lr=0.0004]\nSteps: 96%|█████████▋| 675/700 [06:14<00:14, 1.78it/s, loss=0.0219, lr=0.0004]\nSteps: 96%|█████████▋| 675/700 [06:14<00:14, 1.78it/s, loss=0.00629, lr=0.0004]\nSteps: 97%|█████████▋| 676/700 [06:15<00:13, 1.78it/s, loss=0.00629, lr=0.0004]\nSteps: 97%|█████████▋| 676/700 [06:15<00:13, 1.78it/s, loss=0.112, lr=0.0004] \nSteps: 97%|█████████▋| 677/700 [06:15<00:13, 1.77it/s, loss=0.112, lr=0.0004]\nSteps: 97%|█████████▋| 677/700 [06:15<00:13, 1.77it/s, loss=0.00805, lr=0.0004]\nSteps: 97%|█████████▋| 678/700 [06:16<00:12, 1.76it/s, loss=0.00805, lr=0.0004]\nSteps: 97%|█████████▋| 678/700 [06:16<00:12, 
1.76it/s, loss=0.00428, lr=0.0004]\nSteps: 97%|█████████▋| 679/700 [06:16<00:11, 1.76it/s, loss=0.00428, lr=0.0004]\nSteps: 97%|█████████▋| 679/700 [06:16<00:11, 1.76it/s, loss=0.00553, lr=0.0004]\nSteps: 97%|█████████▋| 680/700 [06:17<00:11, 1.76it/s, loss=0.00553, lr=0.0004]\nSteps: 97%|█████████▋| 680/700 [06:17<00:11, 1.76it/s, loss=0.00655, lr=0.0004]\nSteps: 97%|█████████▋| 681/700 [06:17<00:10, 1.74it/s, loss=0.00655, lr=0.0004]\nSteps: 97%|█████████▋| 681/700 [06:17<00:10, 1.74it/s, loss=0.0833, lr=0.0004] \nSteps: 97%|█████████▋| 682/700 [06:18<00:10, 1.73it/s, loss=0.0833, lr=0.0004]\nSteps: 97%|█████████▋| 682/700 [06:18<00:10, 1.73it/s, loss=0.0285, lr=0.0004]\nSteps: 98%|█████████▊| 683/700 [06:19<00:09, 1.71it/s, loss=0.0285, lr=0.0004]\nSteps: 98%|█████████▊| 683/700 [06:19<00:09, 1.71it/s, loss=0.0525, lr=0.0004]\nSteps: 98%|█████████▊| 684/700 [06:19<00:09, 1.72it/s, loss=0.0525, lr=0.0004]\nSteps: 98%|█████████▊| 684/700 [06:19<00:09, 1.72it/s, loss=0.00216, lr=0.0004]\nSteps: 98%|█████████▊| 685/700 [06:20<00:08, 1.71it/s, loss=0.00216, lr=0.0004]\nSteps: 98%|█████████▊| 685/700 [06:20<00:08, 1.71it/s, loss=0.0627, lr=0.0004] \nSteps: 98%|█████████▊| 686/700 [06:20<00:08, 1.73it/s, loss=0.0627, lr=0.0004]\nSteps: 98%|█████████▊| 686/700 [06:20<00:08, 1.73it/s, loss=0.0122, lr=0.0004]\nSteps: 98%|█████████▊| 687/700 [06:21<00:07, 1.72it/s, loss=0.0122, lr=0.0004]\nSteps: 98%|█████████▊| 687/700 [06:21<00:07, 1.72it/s, loss=0.00683, lr=0.0004]\nSteps: 98%|█████████▊| 688/700 [06:22<00:06, 1.72it/s, loss=0.00683, lr=0.0004]\nSteps: 98%|█████████▊| 688/700 [06:22<00:06, 1.72it/s, loss=0.00972, lr=0.0004]\nSteps: 98%|█████████▊| 689/700 [06:22<00:06, 1.73it/s, loss=0.00972, lr=0.0004]\nSteps: 98%|█████████▊| 689/700 [06:22<00:06, 1.73it/s, loss=0.0338, lr=0.0004] \nSteps: 99%|█████████▊| 690/700 [06:23<00:05, 1.74it/s, loss=0.0338, lr=0.0004]\nSteps: 99%|█████████▊| 690/700 [06:23<00:05, 1.74it/s, loss=0.0056, lr=0.0004]\nSteps: 99%|█████████▊| 
691/700 [06:23<00:05, 1.75it/s, loss=0.0056, lr=0.0004]\nSteps: 99%|█████████▊| 691/700 [06:23<00:05, 1.75it/s, loss=0.00928, lr=0.0004]\nSteps: 99%|█████████▉| 692/700 [06:24<00:04, 1.75it/s, loss=0.00928, lr=0.0004]\nSteps: 99%|█████████▉| 692/700 [06:24<00:04, 1.75it/s, loss=0.00226, lr=0.0004]\nSteps: 99%|█████████▉| 693/700 [06:24<00:04, 1.72it/s, loss=0.00226, lr=0.0004]\nSteps: 99%|█████████▉| 693/700 [06:24<00:04, 1.72it/s, loss=0.00318, lr=0.0004]\nSteps: 99%|█████████▉| 694/700 [06:25<00:03, 1.73it/s, loss=0.00318, lr=0.0004]\nSteps: 99%|█████████▉| 694/700 [06:25<00:03, 1.73it/s, loss=0.00763, lr=0.0004]\nSteps: 99%|█████████▉| 695/700 [06:26<00:02, 1.74it/s, loss=0.00763, lr=0.0004]\nSteps: 99%|█████████▉| 695/700 [06:26<00:02, 1.74it/s, loss=0.0217, lr=0.0004] \nSteps: 99%|█████████▉| 696/700 [06:26<00:02, 1.75it/s, loss=0.0217, lr=0.0004]\nSteps: 99%|█████████▉| 696/700 [06:26<00:02, 1.75it/s, loss=0.0112, lr=0.0004]\nSteps: 100%|█████████▉| 697/700 [06:27<00:01, 1.76it/s, loss=0.0112, lr=0.0004]\nSteps: 100%|█████████▉| 697/700 [06:27<00:01, 1.76it/s, loss=0.0276, lr=0.0004]\nSteps: 100%|█████████▉| 698/700 [06:27<00:01, 1.76it/s, loss=0.0276, lr=0.0004]\nSteps: 100%|█████████▉| 698/700 [06:27<00:01, 1.76it/s, loss=0.0766, lr=0.0004]\nSteps: 100%|█████████▉| 699/700 [06:28<00:00, 1.74it/s, loss=0.0766, lr=0.0004]\nSteps: 100%|█████████▉| 699/700 [06:28<00:00, 1.74it/s, loss=0.00645, lr=0.0004]\nSteps: 100%|██████████| 700/700 [06:28<00:00, 1.74it/s, loss=0.00645, lr=0.0004]\nCurrent Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')\nCurrent Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')\nSaving weights to checkpoints/step_700.safetensors\nLORA Unet Moved 0.002203464973717928\nLORA CLIP Moved 8.760895434534177e-05\nCurrent Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')\nCurrent Learned Embeddings 
for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')\nSaving weights to checkpoints/final_lora.safetensors\nSteps: 100%|██████████| 700/700 [06:28<00:00, 1.74it/s, loss=0.00693, lr=0.0004]\nSteps: 100%|██████████| 700/700 [06:29<00:00, 1.80it/s, loss=0.00693, lr=0.0004]",
"metrics": {
"predict_time": 1063.345396,
"total_time": 1408.533044
},
"output": "https://replicate.delivery/pbxt/9QEHIoGWlrriKd8lacbskLuxisFRSMkhb56aWV0Iy8FHIsQE/tmpjaf44nx5clfutialw0007zu9mb7h3l51szip.safetensors",
"started_at": "2023-06-05T03:09:41.904287Z",
"status": "succeeded",
"urls": {
"get": "https://api.replicate.com/v1/predictions/a22ulrmhu5epzjbt24uygdomcq",
"cancel": "https://api.replicate.com/v1/predictions/a22ulrmhu5epzjbt24uygdomcq/cancel"
},
"version": "1685f45a99b91aa82a5acd620c6916c3dfd5bb343fb40d439a0bf20496ec98ec"
}
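The response above carries a few fields that are handy to inspect once a run finishes: "status", "metrics" (with "predict_time" and "total_time"), "output", and the raw "logs" string. The following is a minimal sketch, not part of Replicate's client, of how one might summarize such a response in Python. The helper name summarize_prediction and the "overhead_seconds" key are hypothetical, and reading total_time minus predict_time as queueing/setup overhead is an assumption rather than something the response states. The regular expression only targets the tqdm-style "Steps: ... loss=..., lr=..." lines as they appear in these logs.

```python
import re

# Matches tqdm-style progress entries such as:
#   700/700 [06:29<00:00, 1.80it/s, loss=0.00693, lr=0.0004]
# Entries without a loss/lr postfix (e.g. "1/1000 [00:05<1:33:07, 5.59s/it]") are skipped.
_STEP_PATTERN = re.compile(
    r"(\d+)/(\d+) \[[^\]]*?, loss=([0-9.eE+-]+), lr=([0-9.eE+-]+)\]"
)


def summarize_prediction(prediction: dict) -> dict:
    """Summarize a prediction response dict (hypothetical helper).

    Only uses field names visible in the example response:
    "status", "metrics", "logs", and "output".
    """
    metrics = prediction.get("metrics") or {}
    predict_time = metrics.get("predict_time")
    total_time = metrics.get("total_time")

    # Assumption: the gap between total and predict time is time spent
    # outside the model run itself (queueing, setup, and similar).
    overhead = None
    if predict_time is not None and total_time is not None:
        overhead = total_time - predict_time

    # Walk every progress entry and keep the last one that reports a loss.
    last_step = None
    for match in _STEP_PATTERN.finditer(prediction.get("logs") or ""):
        step, total, loss, lr = match.groups()
        last_step = {
            "step": int(step),
            "total": int(total),
            "loss": float(loss),
            "lr": float(lr),
        }

    return {
        "status": prediction.get("status"),
        "output": prediction.get("output"),
        "predict_seconds": predict_time,
        "overhead_seconds": overhead,
        "last_step": last_step,
    }
```

Applied to the response shown above, this would report the status, the output URL of the .safetensors file, and the final step/loss pair found in the logs. Note that the per-step loss in these logs fluctuates heavily from step to step, so the last value on its own says little about training quality.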
Using seed: 44374
PTI : Initializer Tokens not given, doing random inits
PTI : Placeholder Tokens ['<s1>', '<s2>']
PTI : Initializer Tokens ['<rand-0.017>', '<rand-0.017>']
Initialized <s1> with random noise (sigma=0.017), empirically 0.000 +- 0.017
Norm : 0.4636
Initialized <s2> with random noise (sigma=0.017), empirically 0.000 +- 0.017
Norm : 0.4810
/root/.pyenv/versions/3.10.11/lib/python3.10/site-packages/diffusers/configuration_utils.py:195: FutureWarning: It is deprecated to pass a pretrained model name or path to `from_config`.If you were trying to load a scheduler, please use <class 'diffusers.schedulers.scheduling_ddpm.DDPMScheduler'>.from_pretrained(...) instead. Otherwise, please make sure to pass a configuration dictionary instead. This functionality will be removed in v1.0.0.
deprecate("config-passed-as-path", "1.0.0", deprecation_message, standard_warn=False)
Mask not found for cog_instance_data/0.mask.png
Warning : this will pre-process all the images in the instance data root.
0%| | 0/15 [00:00<?, ?it/s]INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
7%|▋ | 1/15 [00:00<00:02, 5.94it/s]
40%|████ | 6/15 [00:00<00:00, 25.47it/s]
100%|██████████| 15/15 [00:00<00:00, 46.74it/s]
100%|██████████| 15/15 [00:00<00:00, 37.89it/s]
a photo of a cool <s1><s2>
0%| | 0/15 [00:00<?, ?it/s]
a photo of a nice <s1><s2>
a cropped photo of the <s1><s2>
7%|▋ | 1/15 [00:07<01:49, 7.81s/it]
a photo of the nice <s1><s2>
20%|██ | 3/15 [00:08<00:25, 2.13s/it]
a photo of the clean <s1><s2>
27%|██▋ | 4/15 [00:08<00:16, 1.46s/it]
a good photo of the <s1><s2>
33%|███▎ | 5/15 [00:08<00:10, 1.05s/it]
a photo of a clean <s1><s2>
40%|████ | 6/15 [00:08<00:07, 1.28it/s]
a photo of the <s1><s2>
47%|████▋ | 7/15 [00:08<00:04, 1.66it/s]
a photo of a nice <s1><s2>
53%|█████▎ | 8/15 [00:09<00:03, 2.10it/s]
a rendition of the <s1><s2>
60%|██████ | 9/15 [00:09<00:02, 2.54it/s]
a photo of a nice <s1><s2>
67%|██████▋ | 10/15 [00:09<00:01, 2.98it/s]
a photo of a small <s1><s2>
73%|███████▎ | 11/15 [00:09<00:01, 3.37it/s]
a photo of the <s1><s2>
80%|████████ | 12/15 [00:09<00:00, 3.70it/s]
a rendition of the <s1><s2>
87%|████████▋ | 13/15 [00:10<00:00, 3.97it/s]
a photo of my <s1><s2>
93%|█████████▎| 14/15 [00:10<00:00, 4.17it/s]
100%|██████████| 15/15 [00:10<00:00, 4.34it/s]
100%|██████████| 15/15 [00:10<00:00, 1.42it/s]
PTI : Using cached latent.
0%| | 0/1000 [00:00<?, ?it/s]
tensor(0.0058, device='cuda:0')
tensor([[0.4645],
[0.4824]], device='cuda:0')
Current Norm : tensor([0.4580, 0.4741], device='cuda:0')
Steps: 0%| | 0/1000 [00:00<?, ?it/s]
Steps: 0%| | 1/1000 [00:05<1:33:07, 5.59s/it]
Steps: 0%| | 1/1000 [00:05<1:33:07, 5.59s/it, loss=0.00105, lr=0.001]
Steps: 0%| | 2/1000 [00:06<43:50, 2.64s/it, loss=0.00105, lr=0.001]
tensor(0.0063, device='cuda:0')
tensor([[0.4600],
[0.4757]], device='cuda:0')
Current Norm : tensor([0.4540, 0.4681], device='cuda:0')
Steps: 0%| | 2/1000 [00:06<43:50, 2.64s/it, loss=0.00522, lr=0.001]
Steps: 0%| | 3/1000 [00:06<28:27, 1.71s/it, loss=0.00522, lr=0.001]
Steps: 0%| | 3/1000 [00:06<28:27, 1.71s/it, loss=0.0674, lr=0.001]
Steps: 0%| | 4/1000 [00:07<20:57, 1.26s/it, loss=0.0674, lr=0.001]
tensor(0.0045, device='cuda:0')
tensor([[0.4559],
[0.4699]], device='cuda:0')
Current Norm : tensor([0.4503, 0.4629], device='cuda:0')
Steps: 0%| | 4/1000 [00:07<20:57, 1.26s/it, loss=0.00126, lr=0.001]
Steps: 0%| | 5/1000 [00:07<17:01, 1.03s/it, loss=0.00126, lr=0.001]
Steps: 0%| | 5/1000 [00:07<17:01, 1.03s/it, loss=0.00103, lr=0.001]
Steps: 1%| | 6/1000 [00:08<14:27, 1.15it/s, loss=0.00103, lr=0.001]
tensor(0.0037, device='cuda:0')
tensor([[0.4523],
[0.4645]], device='cuda:0')
Current Norm : tensor([0.4470, 0.4581], device='cuda:0')
Steps: 1%| | 6/1000 [00:08<14:27, 1.15it/s, loss=0.00719, lr=0.001]
Steps: 1%| | 7/1000 [00:09<13:02, 1.27it/s, loss=0.00719, lr=0.001]
Steps: 1%| | 7/1000 [00:09<13:02, 1.27it/s, loss=0.000785, lr=0.001]
Steps: 1%| | 8/1000 [00:09<11:53, 1.39it/s, loss=0.000785, lr=0.001]
tensor(0.0086, device='cuda:0')
tensor([[0.4486],
[0.4593]], device='cuda:0')
Current Norm : tensor([0.4437, 0.4534], device='cuda:0')
Steps: 1%| | 8/1000 [00:09<11:53, 1.39it/s, loss=0.00652, lr=0.001]
Steps: 1%| | 9/1000 [00:10<11:19, 1.46it/s, loss=0.00652, lr=0.001]
Steps: 1%| | 9/1000 [00:10<11:19, 1.46it/s, loss=0.000549, lr=0.001]
Steps: 1%| | 10/1000 [00:10<10:46, 1.53it/s, loss=0.000549, lr=0.001]
tensor(0.0057, device='cuda:0')
tensor([[0.4453],
[0.4547]], device='cuda:0')
Current Norm : tensor([0.4408, 0.4493], device='cuda:0')
Steps: 1%| | 10/1000 [00:10<10:46, 1.53it/s, loss=0.079, lr=0.001]
Steps: 1%| | 11/1000 [00:11<10:31, 1.57it/s, loss=0.079, lr=0.001]
Steps: 1%| | 11/1000 [00:11<10:31, 1.57it/s, loss=0.00539, lr=0.001]
Steps: 1%| | 12/1000 [00:12<10:11, 1.62it/s, loss=0.00539, lr=0.001]
tensor(0.0086, device='cuda:0')
tensor([[0.4424],
[0.4503]], device='cuda:0')
Current Norm : tensor([0.4381, 0.4453], device='cuda:0')
Steps: 1%| | 12/1000 [00:12<10:11, 1.62it/s, loss=0.0134, lr=0.001]
Steps: 1%|▏ | 13/1000 [00:12<10:07, 1.63it/s, loss=0.0134, lr=0.001]
Steps: 1%|▏ | 13/1000 [00:12<10:07, 1.63it/s, loss=0.00237, lr=0.001]
Steps: 1%|▏ | 14/1000 [00:13<09:57, 1.65it/s, loss=0.00237, lr=0.001]
tensor(0.0045, device='cuda:0')
tensor([[0.4397],
[0.4465]], device='cuda:0')
Current Norm : tensor([0.4357, 0.4418], device='cuda:0')
Steps: 1%|▏ | 14/1000 [00:13<09:57, 1.65it/s, loss=0.0392, lr=0.001]
Steps: 2%|▏ | 15/1000 [00:13<09:57, 1.65it/s, loss=0.0392, lr=0.001]
Steps: 2%|▏ | 15/1000 [00:13<09:57, 1.65it/s, loss=0.025, lr=0.001]
Steps: 2%|▏ | 16/1000 [00:14<09:49, 1.67it/s, loss=0.025, lr=0.001]
tensor(0.0057, device='cuda:0')
tensor([[0.4370],
[0.4430]], device='cuda:0')
Current Norm : tensor([0.4333, 0.4387], device='cuda:0')
Steps: 2%|▏ | 16/1000 [00:14<09:49, 1.67it/s, loss=0.00716, lr=0.001]
Steps: 2%|▏ | 17/1000 [00:15<09:55, 1.65it/s, loss=0.00716, lr=0.001]
Steps: 2%|▏ | 17/1000 [00:15<09:55, 1.65it/s, loss=0.101, lr=0.001]
Steps: 2%|▏ | 18/1000 [00:15<09:49, 1.67it/s, loss=0.101, lr=0.001]
tensor(0.0055, device='cuda:0')
tensor([[0.4346],
[0.4399]], device='cuda:0')
Current Norm : tensor([0.4311, 0.4359], device='cuda:0')
Steps: 2%|▏ | 18/1000 [00:15<09:49, 1.67it/s, loss=0.0104, lr=0.001]
Steps: 2%|▏ | 19/1000 [00:16<09:50, 1.66it/s, loss=0.0104, lr=0.001]
Steps: 2%|▏ | 19/1000 [00:16<09:50, 1.66it/s, loss=0.0845, lr=0.001]
Steps: 2%|▏ | 20/1000 [00:16<09:41, 1.69it/s, loss=0.0845, lr=0.001]
tensor(0.0006, device='cuda:0')
tensor([[0.4324],
[0.4371]], device='cuda:0')
Current Norm : tensor([0.4291, 0.4334], device='cuda:0')
Steps: 2%|▏ | 20/1000 [00:16<09:41, 1.69it/s, loss=0.000133, lr=0.001]
Steps: 2%|▏ | 21/1000 [00:17<09:46, 1.67it/s, loss=0.000133, lr=0.001]
Steps: 2%|▏ | 21/1000 [00:17<09:46, 1.67it/s, loss=0.000289, lr=0.001]
Steps: 2%|▏ | 22/1000 [00:18<09:42, 1.68it/s, loss=0.000289, lr=0.001]
tensor(0.0017, device='cuda:0')
tensor([[0.4304],
[0.4347]], device='cuda:0')
Current Norm : tensor([0.4273, 0.4312], device='cuda:0')
Steps: 2%|▏ | 22/1000 [00:18<09:42, 1.68it/s, loss=0.00106, lr=0.001]
Steps: 2%|▏ | 23/1000 [00:18<09:47, 1.66it/s, loss=0.00106, lr=0.001]
Steps: 2%|▏ | 23/1000 [00:18<09:47, 1.66it/s, loss=0.000295, lr=0.001]
Steps: 2%|▏ | 24/1000 [00:19<09:39, 1.68it/s, loss=0.000295, lr=0.001]
tensor(0.0042, device='cuda:0')
tensor([[0.4285],
[0.4325]], device='cuda:0')
Current Norm : tensor([0.4256, 0.4292], device='cuda:0')
Steps: 2%|▏ | 24/1000 [00:19<09:39, 1.68it/s, loss=0.0145, lr=0.001]
Steps: 2%|▎ | 25/1000 [00:19<09:43, 1.67it/s, loss=0.0145, lr=0.001]
Steps: 2%|▎ | 25/1000 [00:19<09:43, 1.67it/s, loss=0.0101, lr=0.001]
Steps: 3%|▎ | 26/1000 [00:20<09:37, 1.69it/s, loss=0.0101, lr=0.001]
tensor(0.0068, device='cuda:0')
tensor([[0.4268],
[0.4305]], device='cuda:0')
Current Norm : tensor([0.4241, 0.4275], device='cuda:0')
Steps: 3%|▎ | 26/1000 [00:20<09:37, 1.69it/s, loss=0.000702, lr=0.001]
Steps: 3%|▎ | 27/1000 [00:21<09:40, 1.68it/s, loss=0.000702, lr=0.001]
Steps: 3%|▎ | 27/1000 [00:21<09:40, 1.68it/s, loss=0.0608, lr=0.001]
Steps: 3%|▎ | 28/1000 [00:21<09:33, 1.69it/s, loss=0.0608, lr=0.001]
tensor(0.0042, device='cuda:0')
tensor([[0.4252],
[0.4288]], device='cuda:0')
Current Norm : tensor([0.4227, 0.4259], device='cuda:0')
Steps: 3%|▎ | 28/1000 [00:21<09:33, 1.69it/s, loss=0.000732, lr=0.001]
Steps: 3%|▎ | 29/1000 [00:22<09:43, 1.66it/s, loss=0.000732, lr=0.001]
Steps: 3%|▎ | 29/1000 [00:22<09:43, 1.66it/s, loss=0.0296, lr=0.001]
Steps: 3%|▎ | 30/1000 [00:22<09:39, 1.67it/s, loss=0.0296, lr=0.001]
tensor(0.0018, device='cuda:0')
tensor([[0.4238],
[0.4273]], device='cuda:0')
Current Norm : tensor([0.4215, 0.4246], device='cuda:0')
Steps: 3%|▎ | 30/1000 [00:22<09:39, 1.67it/s, loss=0.00688, lr=0.001]
Steps: 3%|▎ | 31/1000 [00:23<11:19, 1.43it/s, loss=0.00688, lr=0.001]
Steps: 3%|▎ | 31/1000 [00:23<11:19, 1.43it/s, loss=0.000233, lr=0.001]
Steps: 3%|▎ | 32/1000 [00:24<10:44, 1.50it/s, loss=0.000233, lr=0.001]
tensor(0.0049, device='cuda:0')
tensor([[0.4226],
[0.4259]], device='cuda:0')
Current Norm : tensor([0.4203, 0.4233], device='cuda:0')
Steps: 3%|▎ | 32/1000 [00:24<10:44, 1.50it/s, loss=0.0533, lr=0.001]
Steps: 3%|▎ | 33/1000 [00:24<10:26, 1.54it/s, loss=0.0533, lr=0.001]
Steps: 3%|▎ | 33/1000 [00:24<10:26, 1.54it/s, loss=0.00796, lr=0.001]
Steps: 3%|▎ | 34/1000 [00:25<10:07, 1.59it/s, loss=0.00796, lr=0.001]
tensor(0.0021, device='cuda:0')
tensor([[0.4214],
[0.4246]], device='cuda:0')
Current Norm : tensor([0.4193, 0.4222], device='cuda:0')
Steps: 3%|▎ | 34/1000 [00:25<10:07, 1.59it/s, loss=0.00294, lr=0.001]
Steps: 4%|▎ | 35/1000 [00:26<10:00, 1.61it/s, loss=0.00294, lr=0.001]
Steps: 4%|▎ | 35/1000 [00:26<10:00, 1.61it/s, loss=0.00516, lr=0.001]
Steps: 4%|▎ | 36/1000 [00:26<09:47, 1.64it/s, loss=0.00516, lr=0.001]
tensor(0.0097, device='cuda:0')
tensor([[0.4203],
[0.4232]], device='cuda:0')
Current Norm : tensor([0.4183, 0.4209], device='cuda:0')
Steps: 4%|▎ | 36/1000 [00:26<09:47, 1.64it/s, loss=0.149, lr=0.001]
Steps: 4%|▎ | 37/1000 [00:27<09:46, 1.64it/s, loss=0.149, lr=0.001]
Steps: 4%|▎ | 37/1000 [00:27<09:46, 1.64it/s, loss=0.0233, lr=0.001]
Steps: 4%|▍ | 38/1000 [00:27<09:38, 1.66it/s, loss=0.0233, lr=0.001]
tensor(0.0037, device='cuda:0')
tensor([[0.4194],
[0.4220]], device='cuda:0')
Current Norm : tensor([0.4174, 0.4198], device='cuda:0')
Steps: 4%|▍ | 38/1000 [00:27<09:38, 1.66it/s, loss=0.105, lr=0.001]
Steps: 4%|▍ | 39/1000 [00:28<09:40, 1.66it/s, loss=0.105, lr=0.001]
Steps: 4%|▍ | 39/1000 [00:28<09:40, 1.66it/s, loss=0.000409, lr=0.001]
Steps: 4%|▍ | 40/1000 [00:29<09:32, 1.68it/s, loss=0.000409, lr=0.001]
tensor(0.0049, device='cuda:0')
tensor([[0.4186],
[0.4208]], device='cuda:0')
Current Norm : tensor([0.4167, 0.4188], device='cuda:0')
Steps: 4%|▍ | 40/1000 [00:29<09:32, 1.68it/s, loss=0.0079, lr=0.001]
Steps: 4%|▍ | 41/1000 [00:29<09:36, 1.66it/s, loss=0.0079, lr=0.001]
Steps: 4%|▍ | 41/1000 [00:29<09:36, 1.66it/s, loss=0.0223, lr=0.001]
Steps: 4%|▍ | 42/1000 [00:30<09:29, 1.68it/s, loss=0.0223, lr=0.001]
tensor(0.0035, device='cuda:0')
tensor([[0.4179],
[0.4198]], device='cuda:0')
Current Norm : tensor([0.4161, 0.4178], device='cuda:0')
Steps: 4%|▍ | 42/1000 [00:30<09:29, 1.68it/s, loss=0.0519, lr=0.001]
Steps: 4%|▍ | 43/1000 [00:30<09:38, 1.65it/s, loss=0.0519, lr=0.001]
Steps: 4%|▍ | 43/1000 [00:30<09:38, 1.65it/s, loss=0.00597, lr=0.001]
Steps: 4%|▍ | 44/1000 [00:31<09:30, 1.68it/s, loss=0.00597, lr=0.001]
tensor(0.0069, device='cuda:0')
tensor([[0.4173],
[0.4189]], device='cuda:0')
Current Norm : tensor([0.4156, 0.4170], device='cuda:0')
Steps: 4%|▍ | 44/1000 [00:31<09:30, 1.68it/s, loss=0.028, lr=0.001]
Steps: 4%|▍ | 45/1000 [00:32<09:34, 1.66it/s, loss=0.028, lr=0.001]
Steps: 4%|▍ | 45/1000 [00:32<09:34, 1.66it/s, loss=0.000535, lr=0.001]
Steps: 5%|▍ | 46/1000 [00:32<09:28, 1.68it/s, loss=0.000535, lr=0.001]
tensor(0.0040, device='cuda:0')
tensor([[0.4168],
[0.4180]], device='cuda:0')
Current Norm : tensor([0.4151, 0.4162], device='cuda:0')
Steps: 5%|▍ | 46/1000 [00:32<09:28, 1.68it/s, loss=0.0098, lr=0.001]
Steps: 5%|▍ | 47/1000 [00:33<09:33, 1.66it/s, loss=0.0098, lr=0.001]
Steps: 5%|▍ | 47/1000 [00:33<09:33, 1.66it/s, loss=0.00272, lr=0.001]
Steps: 5%|▍ | 48/1000 [00:33<09:26, 1.68it/s, loss=0.00272, lr=0.001]
tensor(0.0039, device='cuda:0')
tensor([[0.4162],
[0.4172]], device='cuda:0')
Current Norm : tensor([0.4146, 0.4155], device='cuda:0')
Steps: 5%|▍ | 48/1000 [00:33<09:26, 1.68it/s, loss=0.0584, lr=0.001]
Steps: 5%|▍ | 49/1000 [00:34<09:33, 1.66it/s, loss=0.0584, lr=0.001]
Steps: 5%|▍ | 49/1000 [00:34<09:33, 1.66it/s, loss=0.000687, lr=0.001]
Steps: 5%|▌ | 50/1000 [00:35<09:28, 1.67it/s, loss=0.000687, lr=0.001]
tensor(0.0026, device='cuda:0')
tensor([[0.4157],
[0.4165]], device='cuda:0')
Current Norm : tensor([0.4141, 0.4149], device='cuda:0')
Steps: 5%|▌ | 50/1000 [00:35<09:28, 1.67it/s, loss=0.000356, lr=0.001]
Steps: 5%|▌ | 51/1000 [00:35<09:31, 1.66it/s, loss=0.000356, lr=0.001]
Steps: 5%|▌ | 51/1000 [00:35<09:31, 1.66it/s, loss=0.0267, lr=0.001]
Steps: 5%|▌ | 52/1000 [00:36<09:25, 1.68it/s, loss=0.0267, lr=0.001]
tensor(0.0018, device='cuda:0')
tensor([[0.4152],
[0.4158]], device='cuda:0')
Current Norm : tensor([0.4137, 0.4142], device='cuda:0')
Steps: 5%|▌ | 52/1000 [00:36<09:25, 1.68it/s, loss=0.000584, lr=0.001]
Steps: 5%|▌ | 53/1000 [00:36<09:29, 1.66it/s, loss=0.000584, lr=0.001]
Steps: 5%|▌ | 53/1000 [00:36<09:29, 1.66it/s, loss=0.00399, lr=0.001]
Steps: 5%|▌ | 54/1000 [00:37<09:22, 1.68it/s, loss=0.00399, lr=0.001]
tensor(0.0033, device='cuda:0')
tensor([[0.4148],
[0.4152]], device='cuda:0')
Current Norm : tensor([0.4133, 0.4136], device='cuda:0')
Steps: 5%|▌ | 54/1000 [00:37<09:22, 1.68it/s, loss=0.000407, lr=0.001]
Steps: 6%|▌ | 55/1000 [00:38<09:24, 1.68it/s, loss=0.000407, lr=0.001]
Steps: 6%|▌ | 55/1000 [00:38<09:24, 1.68it/s, loss=0.00915, lr=0.001]
Steps: 6%|▌ | 56/1000 [00:38<09:18, 1.69it/s, loss=0.00915, lr=0.001]
tensor(0.0025, device='cuda:0')
tensor([[0.4144],
[0.4145]], device='cuda:0')
Current Norm : tensor([0.4129, 0.4131], device='cuda:0')
Steps: 6%|▌ | 56/1000 [00:38<09:18, 1.69it/s, loss=0.00812, lr=0.001]
Steps: 6%|▌ | 57/1000 [00:39<09:24, 1.67it/s, loss=0.00812, lr=0.001]
Steps: 6%|▌ | 57/1000 [00:39<09:24, 1.67it/s, loss=0.000517, lr=0.001]
Steps: 6%|▌ | 58/1000 [00:39<09:19, 1.68it/s, loss=0.000517, lr=0.001]
tensor(0.0035, device='cuda:0')
tensor([[0.4140],
[0.4139]], device='cuda:0')
Current Norm : tensor([0.4126, 0.4125], device='cuda:0')
Steps: 6%|▌ | 58/1000 [00:39<09:19, 1.68it/s, loss=0.00064, lr=0.001]
Steps: 6%|▌ | 59/1000 [00:40<09:24, 1.67it/s, loss=0.00064, lr=0.001]
Steps: 6%|▌ | 59/1000 [00:40<09:24, 1.67it/s, loss=0.0373, lr=0.001]
Steps: 6%|▌ | 60/1000 [00:41<09:18, 1.68it/s, loss=0.0373, lr=0.001]
tensor(0.0029, device='cuda:0')
tensor([[0.4136],
[0.4134]], device='cuda:0')
Current Norm : tensor([0.4122, 0.4121], device='cuda:0')
Steps: 6%|▌ | 60/1000 [00:41<09:18, 1.68it/s, loss=0.0119, lr=0.001]
Steps: 6%|▌ | 61/1000 [00:41<09:25, 1.66it/s, loss=0.0119, lr=0.001]
Steps: 6%|▌ | 61/1000 [00:41<09:25, 1.66it/s, loss=0.000365, lr=0.001]
Steps: 6%|▌ | 62/1000 [00:42<09:20, 1.67it/s, loss=0.000365, lr=0.001]
tensor(0.0060, device='cuda:0')
tensor([[0.4132],
[0.4129]], device='cuda:0')
Current Norm : tensor([0.4119, 0.4116], device='cuda:0')
Steps: 6%|▌ | 62/1000 [00:42<09:20, 1.67it/s, loss=0.00457, lr=0.001]
Steps: 6%|▋ | 63/1000 [00:42<09:24, 1.66it/s, loss=0.00457, lr=0.001]
Steps: 6%|▋ | 63/1000 [00:42<09:24, 1.66it/s, loss=0.038, lr=0.001]
Steps: 6%|▋ | 64/1000 [00:43<09:19, 1.67it/s, loss=0.038, lr=0.001]
tensor(0.0026, device='cuda:0')
tensor([[0.4128],
[0.4124]], device='cuda:0')
Current Norm : tensor([0.4116, 0.4112], device='cuda:0')
Steps: 6%|▋ | 64/1000 [00:43<09:19, 1.67it/s, loss=0.0021, lr=0.001]
Steps: 6%|▋ | 65/1000 [00:44<09:23, 1.66it/s, loss=0.0021, lr=0.001]
Steps: 6%|▋ | 65/1000 [00:44<09:23, 1.66it/s, loss=0.000878, lr=0.001]
Steps: 7%|▋ | 66/1000 [00:44<09:17, 1.68it/s, loss=0.000878, lr=0.001]
tensor(0.0054, device='cuda:0')
tensor([[0.4125],
[0.4120]], device='cuda:0')
Current Norm : tensor([0.4113, 0.4108], device='cuda:0')
Steps: 7%|▋ | 66/1000 [00:44<09:17, 1.68it/s, loss=0.0884, lr=0.001]
Steps: 7%|▋ | 67/1000 [00:45<09:21, 1.66it/s, loss=0.0884, lr=0.001]
Steps: 7%|▋ | 67/1000 [00:45<09:21, 1.66it/s, loss=0.00163, lr=0.001]
Steps: 7%|▋ | 68/1000 [00:45<09:15, 1.68it/s, loss=0.00163, lr=0.001]
tensor(0.0089, device='cuda:0')
tensor([[0.4123],
[0.4117]], device='cuda:0')
Current Norm : tensor([0.4111, 0.4106], device='cuda:0')
Steps: 7%|▋ | 68/1000 [00:45<09:15, 1.68it/s, loss=0.00931, lr=0.001]
Steps: 7%|▋ | 69/1000 [00:46<09:19, 1.67it/s, loss=0.00931, lr=0.001]
Steps: 7%|▋ | 69/1000 [00:46<09:19, 1.67it/s, loss=0.00434, lr=0.001]
Steps: 7%|▋ | 70/1000 [00:47<09:13, 1.68it/s, loss=0.00434, lr=0.001]
tensor(0.0038, device='cuda:0')
tensor([[0.4121],
[0.4116]], device='cuda:0')
Current Norm : tensor([0.4109, 0.4104], device='cuda:0')
Steps: 7%|▋ | 70/1000 [00:47<09:13, 1.68it/s, loss=0.00211, lr=0.001]
Steps: 7%|▋ | 71/1000 [00:47<09:22, 1.65it/s, loss=0.00211, lr=0.001]
Steps: 7%|▋ | 71/1000 [00:47<09:22, 1.65it/s, loss=0.0829, lr=0.001]
Steps: 7%|▋ | 72/1000 [00:48<09:16, 1.67it/s, loss=0.0829, lr=0.001]
tensor(0.0029, device='cuda:0')
tensor([[0.4119],
[0.4115]], device='cuda:0')
Current Norm : tensor([0.4107, 0.4104], device='cuda:0')
Steps: 7%|▋ | 72/1000 [00:48<09:16, 1.67it/s, loss=0.00421, lr=0.001]
Steps: 7%|▋ | 73/1000 [00:48<09:19, 1.66it/s, loss=0.00421, lr=0.001]
Steps: 7%|▋ | 73/1000 [00:48<09:19, 1.66it/s, loss=0.00171, lr=0.001]
Steps: 7%|▋ | 74/1000 [00:49<09:12, 1.67it/s, loss=0.00171, lr=0.001]
tensor(0.0080, device='cuda:0')
tensor([[0.4116],
[0.4115]], device='cuda:0')
Current Norm : tensor([0.4105, 0.4103], device='cuda:0')
Steps: 7%|▋ | 74/1000 [00:49<09:12, 1.67it/s, loss=0.0162, lr=0.001]
Steps: 8%|▊ | 75/1000 [00:50<09:16, 1.66it/s, loss=0.0162, lr=0.001]
Steps: 8%|▊ | 75/1000 [00:50<09:16, 1.66it/s, loss=0.00691, lr=0.001]
Steps: 8%|▊ | 76/1000 [00:50<09:12, 1.67it/s, loss=0.00691, lr=0.001]
tensor(0.0018, device='cuda:0')
tensor([[0.4114],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4103, 0.4103], device='cuda:0')
Steps: 8%|▊ | 76/1000 [00:50<09:12, 1.67it/s, loss=0.002, lr=0.001]
Steps: 8%|▊ | 77/1000 [00:51<09:20, 1.65it/s, loss=0.002, lr=0.001]
Steps: 8%|▊ | 77/1000 [00:51<09:20, 1.65it/s, loss=0.000539, lr=0.001]
Steps: 8%|▊ | 78/1000 [00:51<09:16, 1.66it/s, loss=0.000539, lr=0.001]
tensor(0.0049, device='cuda:0')
tensor([[0.4112],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4101, 0.4103], device='cuda:0')
Steps: 8%|▊ | 78/1000 [00:51<09:16, 1.66it/s, loss=0.0036, lr=0.001]
Steps: 8%|▊ | 79/1000 [00:52<09:21, 1.64it/s, loss=0.0036, lr=0.001]
Steps: 8%|▊ | 79/1000 [00:52<09:21, 1.64it/s, loss=0.00652, lr=0.001]
Steps: 8%|▊ | 80/1000 [00:53<09:16, 1.65it/s, loss=0.00652, lr=0.001]
tensor(0.0018, device='cuda:0')
tensor([[0.4111],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4100, 0.4103], device='cuda:0')
Steps: 8%|▊ | 80/1000 [00:53<09:16, 1.65it/s, loss=0.00137, lr=0.001]
Steps: 8%|▊ | 81/1000 [00:53<09:18, 1.64it/s, loss=0.00137, lr=0.001]
Steps: 8%|▊ | 81/1000 [00:53<09:18, 1.64it/s, loss=0.00362, lr=0.001]
Steps: 8%|▊ | 82/1000 [00:54<09:12, 1.66it/s, loss=0.00362, lr=0.001]
tensor(0.0051, device='cuda:0')
tensor([[0.4109],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4098, 0.4103], device='cuda:0')
Steps: 8%|▊ | 82/1000 [00:54<09:12, 1.66it/s, loss=0.0197, lr=0.001]
Steps: 8%|▊ | 83/1000 [00:54<09:16, 1.65it/s, loss=0.0197, lr=0.001]
Steps: 8%|▊ | 83/1000 [00:54<09:16, 1.65it/s, loss=0.00979, lr=0.001]
Steps: 8%|▊ | 84/1000 [00:55<09:10, 1.66it/s, loss=0.00979, lr=0.001]
tensor(0.0053, device='cuda:0')
tensor([[0.4107],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4096, 0.4103], device='cuda:0')
Steps: 8%|▊ | 84/1000 [00:55<09:10, 1.66it/s, loss=0.0264, lr=0.001]
Steps: 8%|▊ | 85/1000 [00:56<09:14, 1.65it/s, loss=0.0264, lr=0.001]
Steps: 8%|▊ | 85/1000 [00:56<09:14, 1.65it/s, loss=0.000609, lr=0.001]
Steps: 9%|▊ | 86/1000 [00:56<09:11, 1.66it/s, loss=0.000609, lr=0.001]
tensor(0.0027, device='cuda:0')
tensor([[0.4106],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4095, 0.4103], device='cuda:0')
Steps: 9%|▊ | 86/1000 [00:56<09:11, 1.66it/s, loss=0.00065, lr=0.001]
Steps: 9%|▊ | 87/1000 [00:57<09:15, 1.64it/s, loss=0.00065, lr=0.001]
Steps: 9%|▊ | 87/1000 [00:57<09:15, 1.64it/s, loss=0.0177, lr=0.001]
Steps: 9%|▉ | 88/1000 [00:57<09:08, 1.66it/s, loss=0.0177, lr=0.001]
tensor(0.0062, device='cuda:0')
tensor([[0.4104],
[0.4115]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4103], device='cuda:0')
Steps: 9%|▉ | 88/1000 [00:57<09:08, 1.66it/s, loss=0.0455, lr=0.001]
Steps: 9%|▉ | 89/1000 [00:58<09:12, 1.65it/s, loss=0.0455, lr=0.001]
Steps: 9%|▉ | 89/1000 [00:58<09:12, 1.65it/s, loss=0.0164, lr=0.001]
Steps: 9%|▉ | 90/1000 [00:59<09:07, 1.66it/s, loss=0.0164, lr=0.001]
tensor(0.0051, device='cuda:0')
tensor([[0.4103],
[0.4115]], device='cuda:0')
Current Norm : tensor([0.4093, 0.4104], device='cuda:0')
Steps: 9%|▉ | 90/1000 [00:59<09:07, 1.66it/s, loss=0.00294, lr=0.001]
Steps: 9%|▉ | 91/1000 [00:59<09:13, 1.64it/s, loss=0.00294, lr=0.001]
Steps: 9%|▉ | 91/1000 [00:59<09:13, 1.64it/s, loss=0.000789, lr=0.001]
Steps: 9%|▉ | 92/1000 [01:00<09:06, 1.66it/s, loss=0.000789, lr=0.001]
tensor(0.0028, device='cuda:0')
tensor([[0.4102],
[0.4116]], device='cuda:0')
Current Norm : tensor([0.4092, 0.4104], device='cuda:0')
Steps: 9%|▉ | 92/1000 [01:00<09:06, 1.66it/s, loss=0.000544, lr=0.001]
Steps: 9%|▉ | 93/1000 [01:00<09:08, 1.65it/s, loss=0.000544, lr=0.001]
Steps: 9%|▉ | 93/1000 [01:00<09:08, 1.65it/s, loss=0.000953, lr=0.001]
Steps: 9%|▉ | 94/1000 [01:01<09:02, 1.67it/s, loss=0.000953, lr=0.001]
tensor(0.0029, device='cuda:0')
tensor([[0.4101],
[0.4116]], device='cuda:0')
Current Norm : tensor([0.4091, 0.4105], device='cuda:0')
Steps: 9%|▉ | 94/1000 [01:01<09:02, 1.67it/s, loss=0.000429, lr=0.001]
Steps: 10%|▉ | 95/1000 [01:02<09:06, 1.66it/s, loss=0.000429, lr=0.001]
Steps: 10%|▉ | 95/1000 [01:02<09:06, 1.66it/s, loss=0.00231, lr=0.001]
Steps: 10%|▉ | 96/1000 [01:02<09:04, 1.66it/s, loss=0.00231, lr=0.001]
tensor(0.0029, device='cuda:0')
tensor([[0.4100],
[0.4116]], device='cuda:0')
Current Norm : tensor([0.4090, 0.4105], device='cuda:0')
Steps: 10%|▉ | 96/1000 [01:02<09:04, 1.66it/s, loss=0.00377, lr=0.001]
Steps: 10%|▉ | 97/1000 [01:03<09:11, 1.64it/s, loss=0.00377, lr=0.001]
Steps: 10%|▉ | 97/1000 [01:03<09:11, 1.64it/s, loss=0.000302, lr=0.001]
Steps: 10%|▉ | 98/1000 [01:03<09:03, 1.66it/s, loss=0.000302, lr=0.001]
tensor(0.0045, device='cuda:0')
tensor([[0.4099],
[0.4116]], device='cuda:0')
Current Norm : tensor([0.4089, 0.4105], device='cuda:0')
Steps: 10%|▉ | 98/1000 [01:03<09:03, 1.66it/s, loss=0.0706, lr=0.001]
Steps: 10%|▉ | 99/1000 [01:04<09:07, 1.64it/s, loss=0.0706, lr=0.001]
Steps: 10%|▉ | 99/1000 [01:04<09:07, 1.64it/s, loss=0.00305, lr=0.001]
Steps: 10%|█ | 100/1000 [01:05<09:03, 1.65it/s, loss=0.00305, lr=0.001]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 1.5137e-03, 9.1326e-05, -1.6363e-03, -1.4289e-02], device='cuda:0',
grad_fn=<SliceBackward0>)
Current Learned Embeddings for <s2>:, id 49409 tensor([-0.0204, -0.0029, 0.0139, -0.0003], device='cuda:0',
grad_fn=<SliceBackward0>)
Saving weights to checkpoints/step_inv_100.safetensors
tensor(0.0076, device='cuda:0')
tensor([[0.4098],
[0.4115]], device='cuda:0')
Current Norm : tensor([0.4088, 0.4103], device='cuda:0')
Steps: 10%|█ | 100/1000 [01:05<09:03, 1.65it/s, loss=0.000652, lr=0.001]
Steps: 10%|█ | 101/1000 [01:05<09:10, 1.63it/s, loss=0.000652, lr=0.001]
Steps: 10%|█ | 101/1000 [01:05<09:10, 1.63it/s, loss=0.0218, lr=0.001]
Steps: 10%|█ | 102/1000 [01:06<09:03, 1.65it/s, loss=0.0218, lr=0.001]
tensor(0.0029, device='cuda:0')
tensor([[0.4096],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4087, 0.4103], device='cuda:0')
Steps: 10%|█ | 102/1000 [01:06<09:03, 1.65it/s, loss=0.00278, lr=0.001]
Steps: 10%|█ | 103/1000 [01:07<09:04, 1.65it/s, loss=0.00278, lr=0.001]
Steps: 10%|█ | 103/1000 [01:07<09:04, 1.65it/s, loss=0.00829, lr=0.001]
Steps: 10%|█ | 104/1000 [01:07<08:59, 1.66it/s, loss=0.00829, lr=0.001]
tensor(0.0038, device='cuda:0')
tensor([[0.4096],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4103], device='cuda:0')
Steps: 10%|█ | 104/1000 [01:07<08:59, 1.66it/s, loss=0.0058, lr=0.001]
Steps: 10%|█ | 105/1000 [01:08<09:04, 1.64it/s, loss=0.0058, lr=0.001]
Steps: 10%|█ | 105/1000 [01:08<09:04, 1.64it/s, loss=0.000726, lr=0.001]
Steps: 11%|█ | 106/1000 [01:08<08:59, 1.66it/s, loss=0.000726, lr=0.001]
tensor(0.0030, device='cuda:0')
tensor([[0.4095],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4102], device='cuda:0')
Steps: 11%|█ | 106/1000 [01:08<08:59, 1.66it/s, loss=0.00454, lr=0.001]
Steps: 11%|█ | 107/1000 [01:09<09:04, 1.64it/s, loss=0.00454, lr=0.001]
Steps: 11%|█ | 107/1000 [01:09<09:04, 1.64it/s, loss=0.00176, lr=0.001]
Steps: 11%|█ | 108/1000 [01:10<08:57, 1.66it/s, loss=0.00176, lr=0.001]
tensor(0.0056, device='cuda:0')
tensor([[0.4094],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4103], device='cuda:0')
Steps: 11%|█ | 108/1000 [01:10<08:57, 1.66it/s, loss=0.0143, lr=0.001]
Steps: 11%|█ | 109/1000 [01:10<09:00, 1.65it/s, loss=0.0143, lr=0.001]
Steps: 11%|█ | 109/1000 [01:10<09:00, 1.65it/s, loss=0.00456, lr=0.001]
Steps: 11%|█ | 110/1000 [01:11<08:56, 1.66it/s, loss=0.00456, lr=0.001]
tensor(0.0045, device='cuda:0')
tensor([[0.4093],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4103], device='cuda:0')
Steps: 11%|█ | 110/1000 [01:11<08:56, 1.66it/s, loss=0.0555, lr=0.001]
Steps: 11%|█ | 111/1000 [01:11<09:01, 1.64it/s, loss=0.0555, lr=0.001]
Steps: 11%|█ | 111/1000 [01:11<09:01, 1.64it/s, loss=0.00316, lr=0.001]
Steps: 11%|█ | 112/1000 [01:12<08:55, 1.66it/s, loss=0.00316, lr=0.001]
tensor(0.0060, device='cuda:0')
tensor([[0.4092],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4103], device='cuda:0')
Steps: 11%|█ | 112/1000 [01:12<08:55, 1.66it/s, loss=0.00851, lr=0.001]
Steps: 11%|█▏ | 113/1000 [01:13<08:57, 1.65it/s, loss=0.00851, lr=0.001]
Steps: 11%|█▏ | 113/1000 [01:13<08:57, 1.65it/s, loss=0.0277, lr=0.001]
Steps: 11%|█▏ | 114/1000 [01:13<08:55, 1.65it/s, loss=0.0277, lr=0.001]
tensor(0.0053, device='cuda:0')
tensor([[0.4091],
[0.4116]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4104], device='cuda:0')
Steps: 11%|█▏ | 114/1000 [01:13<08:55, 1.65it/s, loss=0.00622, lr=0.001]
Steps: 12%|█▏ | 115/1000 [01:14<09:01, 1.63it/s, loss=0.00622, lr=0.001]
Steps: 12%|█▏ | 115/1000 [01:14<09:01, 1.63it/s, loss=0.0382, lr=0.001]
Steps: 12%|█▏ | 116/1000 [01:14<08:56, 1.65it/s, loss=0.0382, lr=0.001]
tensor(0.0084, device='cuda:0')
tensor([[0.4090],
[0.4116]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4105], device='cuda:0')
Steps: 12%|█▏ | 116/1000 [01:14<08:56, 1.65it/s, loss=0.00442, lr=0.001]
Steps: 12%|█▏ | 117/1000 [01:15<08:58, 1.64it/s, loss=0.00442, lr=0.001]
Steps: 12%|█▏ | 117/1000 [01:15<08:58, 1.64it/s, loss=0.0118, lr=0.001]
Steps: 12%|█▏ | 118/1000 [01:16<08:53, 1.65it/s, loss=0.0118, lr=0.001]
tensor(0.0080, device='cuda:0')
tensor([[0.4090],
[0.4118]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4106], device='cuda:0')
Steps: 12%|█▏ | 118/1000 [01:16<08:53, 1.65it/s, loss=0.0209, lr=0.001]
Steps: 12%|█▏ | 119/1000 [01:16<08:57, 1.64it/s, loss=0.0209, lr=0.001]
Steps: 12%|█▏ | 119/1000 [01:16<08:57, 1.64it/s, loss=0.0189, lr=0.001]
Steps: 12%|█▏ | 120/1000 [01:17<08:55, 1.64it/s, loss=0.0189, lr=0.001]
tensor(0.0190, device='cuda:0')
tensor([[0.4091],
[0.4119]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4107], device='cuda:0')
Steps: 12%|█▏ | 120/1000 [01:17<08:55, 1.64it/s, loss=0.0302, lr=0.001]
Steps: 12%|█▏ | 121/1000 [01:17<09:00, 1.62it/s, loss=0.0302, lr=0.001]
Steps: 12%|█▏ | 121/1000 [01:17<09:00, 1.62it/s, loss=0.0261, lr=0.001]
Steps: 12%|█▏ | 122/1000 [01:18<08:52, 1.65it/s, loss=0.0261, lr=0.001]
tensor(0.0031, device='cuda:0')
tensor([[0.4093],
[0.4122]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4110], device='cuda:0')
Steps: 12%|█▏ | 122/1000 [01:18<08:52, 1.65it/s, loss=0.00258, lr=0.001]
Steps: 12%|█▏ | 123/1000 [01:19<08:55, 1.64it/s, loss=0.00258, lr=0.001]
Steps: 12%|█▏ | 123/1000 [01:19<08:55, 1.64it/s, loss=0.000634, lr=0.001]
Steps: 12%|█▏ | 124/1000 [01:19<08:49, 1.65it/s, loss=0.000634, lr=0.001]
tensor(0.0019, device='cuda:0')
tensor([[0.4096],
[0.4125]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4113], device='cuda:0')
Steps: 12%|█▏ | 124/1000 [01:19<08:49, 1.65it/s, loss=0.00197, lr=0.001]
Steps: 12%|█▎ | 125/1000 [01:20<08:51, 1.65it/s, loss=0.00197, lr=0.001]
Steps: 12%|█▎ | 125/1000 [01:20<08:51, 1.65it/s, loss=0.0138, lr=0.001]
Steps: 13%|█▎ | 126/1000 [01:20<08:44, 1.67it/s, loss=0.0138, lr=0.001]
tensor(0.0063, device='cuda:0')
tensor([[0.4098],
[0.4128]], device='cuda:0')
Current Norm : tensor([0.4088, 0.4116], device='cuda:0')
Steps: 13%|█▎ | 126/1000 [01:20<08:44, 1.67it/s, loss=0.0196, lr=0.001]
Steps: 13%|█▎ | 127/1000 [01:21<08:48, 1.65it/s, loss=0.0196, lr=0.001]
Steps: 13%|█▎ | 127/1000 [01:21<08:48, 1.65it/s, loss=0.000179, lr=0.001]
Steps: 13%|█▎ | 128/1000 [01:22<08:45, 1.66it/s, loss=0.000179, lr=0.001]
tensor(0.0081, device='cuda:0')
tensor([[0.4100],
[0.4132]], device='cuda:0')
Current Norm : tensor([0.4090, 0.4119], device='cuda:0')
Steps: 13%|█▎ | 128/1000 [01:22<08:45, 1.66it/s, loss=0.0143, lr=0.001]
Steps: 13%|█▎ | 129/1000 [01:22<08:47, 1.65it/s, loss=0.0143, lr=0.001]
Steps: 13%|█▎ | 129/1000 [01:22<08:47, 1.65it/s, loss=0.0112, lr=0.001]
Steps: 13%|█▎ | 130/1000 [01:23<08:45, 1.66it/s, loss=0.0112, lr=0.001]
tensor(0.0077, device='cuda:0')
tensor([[0.4101],
[0.4136]], device='cuda:0')
Current Norm : tensor([0.4091, 0.4122], device='cuda:0')
Steps: 13%|█▎ | 130/1000 [01:23<08:45, 1.66it/s, loss=0.029, lr=0.001]
Steps: 13%|█▎ | 131/1000 [01:24<08:52, 1.63it/s, loss=0.029, lr=0.001]
Steps: 13%|█▎ | 131/1000 [01:24<08:52, 1.63it/s, loss=0.00343, lr=0.001]
Steps: 13%|█▎ | 132/1000 [01:24<08:49, 1.64it/s, loss=0.00343, lr=0.001]
tensor(0.0031, device='cuda:0')
tensor([[0.4103],
[0.4139]], device='cuda:0')
Current Norm : tensor([0.4092, 0.4125], device='cuda:0')
Steps: 13%|█▎ | 132/1000 [01:24<08:49, 1.64it/s, loss=0.00165, lr=0.001]
Steps: 13%|█▎ | 133/1000 [01:25<08:52, 1.63it/s, loss=0.00165, lr=0.001]
Steps: 13%|█▎ | 133/1000 [01:25<08:52, 1.63it/s, loss=0.00451, lr=0.001]
Steps: 13%|█▎ | 134/1000 [01:25<08:45, 1.65it/s, loss=0.00451, lr=0.001]
tensor(0.0026, device='cuda:0')
tensor([[0.4104],
[0.4142]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4128], device='cuda:0')
Steps: 13%|█▎ | 134/1000 [01:25<08:45, 1.65it/s, loss=0.00724, lr=0.001]
Steps: 14%|█▎ | 135/1000 [01:26<08:51, 1.63it/s, loss=0.00724, lr=0.001]
Steps: 14%|█▎ | 135/1000 [01:26<08:51, 1.63it/s, loss=0.000926, lr=0.001]
Steps: 14%|█▎ | 136/1000 [01:27<08:47, 1.64it/s, loss=0.000926, lr=0.001]
tensor(0.0051, device='cuda:0')
tensor([[0.4104],
[0.4145]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4130], device='cuda:0')
Steps: 14%|█▎ | 136/1000 [01:27<08:47, 1.64it/s, loss=0.000544, lr=0.001]
Steps: 14%|█▎ | 137/1000 [01:27<08:49, 1.63it/s, loss=0.000544, lr=0.001]
Steps: 14%|█▎ | 137/1000 [01:27<08:49, 1.63it/s, loss=0.0385, lr=0.001]
Steps: 14%|█▍ | 138/1000 [01:28<08:44, 1.64it/s, loss=0.0385, lr=0.001]
tensor(0.0040, device='cuda:0')
tensor([[0.4104],
[0.4147]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4132], device='cuda:0')
Steps: 14%|█▍ | 138/1000 [01:28<08:44, 1.64it/s, loss=0.0509, lr=0.001]
Steps: 14%|█▍ | 139/1000 [01:28<08:45, 1.64it/s, loss=0.0509, lr=0.001]
Steps: 14%|█▍ | 139/1000 [01:28<08:45, 1.64it/s, loss=0.000223, lr=0.001]
Steps: 14%|█▍ | 140/1000 [01:29<08:39, 1.66it/s, loss=0.000223, lr=0.001]
tensor(0.0055, device='cuda:0')
tensor([[0.4104],
[0.4149]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4134], device='cuda:0')
Steps: 14%|█▍ | 140/1000 [01:29<08:39, 1.66it/s, loss=0.113, lr=0.001]
Steps: 14%|█▍ | 141/1000 [01:30<08:44, 1.64it/s, loss=0.113, lr=0.001]
Steps: 14%|█▍ | 141/1000 [01:30<08:44, 1.64it/s, loss=0.0127, lr=0.001]
Steps: 14%|█▍ | 142/1000 [01:30<08:38, 1.65it/s, loss=0.0127, lr=0.001]
tensor(0.0056, device='cuda:0')
tensor([[0.4105],
[0.4151]], device='cuda:0')
Current Norm : tensor([0.4095, 0.4135], device='cuda:0')
Steps: 14%|█▍ | 142/1000 [01:30<08:38, 1.65it/s, loss=0.0342, lr=0.001]
Steps: 14%|█▍ | 143/1000 [01:31<08:40, 1.65it/s, loss=0.0342, lr=0.001]
Steps: 14%|█▍ | 143/1000 [01:31<08:40, 1.65it/s, loss=0.00191, lr=0.001]
Steps: 14%|█▍ | 144/1000 [01:31<08:36, 1.66it/s, loss=0.00191, lr=0.001]
tensor(0.0049, device='cuda:0')
tensor([[0.4106],
[0.4152]], device='cuda:0')
Current Norm : tensor([0.4095, 0.4137], device='cuda:0')
Steps: 14%|█▍ | 144/1000 [01:31<08:36, 1.66it/s, loss=0.0527, lr=0.001]
Steps: 14%|█▍ | 145/1000 [01:32<08:39, 1.65it/s, loss=0.0527, lr=0.001]
Steps: 14%|█▍ | 145/1000 [01:32<08:39, 1.65it/s, loss=0.000642, lr=0.001]
Steps: 15%|█▍ | 146/1000 [01:33<08:34, 1.66it/s, loss=0.000642, lr=0.001]
tensor(0.0051, device='cuda:0')
tensor([[0.4107],
[0.4152]], device='cuda:0')
Current Norm : tensor([0.4096, 0.4137], device='cuda:0')
Steps: 15%|█▍ | 146/1000 [01:33<08:34, 1.66it/s, loss=0.000191, lr=0.001]
Steps: 15%|█▍ | 147/1000 [01:33<08:39, 1.64it/s, loss=0.000191, lr=0.001]
Steps: 15%|█▍ | 147/1000 [01:33<08:39, 1.64it/s, loss=0.0652, lr=0.001]
Steps: 15%|█▍ | 148/1000 [01:34<08:36, 1.65it/s, loss=0.0652, lr=0.001]
tensor(0.0063, device='cuda:0')
tensor([[0.4107],
[0.4152]], device='cuda:0')
Current Norm : tensor([0.4097, 0.4137], device='cuda:0')
Steps: 15%|█▍ | 148/1000 [01:34<08:36, 1.65it/s, loss=0.00285, lr=0.001]
Steps: 15%|█▍ | 149/1000 [01:34<08:39, 1.64it/s, loss=0.00285, lr=0.001]
Steps: 15%|█▍ | 149/1000 [01:34<08:39, 1.64it/s, loss=0.0492, lr=0.001]
Steps: 15%|█▌ | 150/1000 [01:35<08:34, 1.65it/s, loss=0.0492, lr=0.001]
tensor(0.0034, device='cuda:0')
tensor([[0.4107],
[0.4152]], device='cuda:0')
Current Norm : tensor([0.4097, 0.4137], device='cuda:0')
Steps: 15%|█▌ | 150/1000 [01:35<08:34, 1.65it/s, loss=0.00289, lr=0.001]
Steps: 15%|█▌ | 151/1000 [01:36<08:41, 1.63it/s, loss=0.00289, lr=0.001]
Steps: 15%|█▌ | 151/1000 [01:36<08:41, 1.63it/s, loss=0.0184, lr=0.001]
Steps: 15%|█▌ | 152/1000 [01:36<08:34, 1.65it/s, loss=0.0184, lr=0.001]
tensor(0.0046, device='cuda:0')
tensor([[0.4107],
[0.4152]], device='cuda:0')
Current Norm : tensor([0.4097, 0.4137], device='cuda:0')
Steps: 15%|█▌ | 152/1000 [01:36<08:34, 1.65it/s, loss=0.00912, lr=0.001]
Steps: 15%|█▌ | 153/1000 [01:37<08:39, 1.63it/s, loss=0.00912, lr=0.001]
Steps: 15%|█▌ | 153/1000 [01:37<08:39, 1.63it/s, loss=0.0912, lr=0.001]
Steps: 15%|█▌ | 154/1000 [01:38<08:34, 1.65it/s, loss=0.0912, lr=0.001]
tensor(0.0027, device='cuda:0')
tensor([[0.4107],
[0.4152]], device='cuda:0')
Current Norm : tensor([0.4096, 0.4136], device='cuda:0')
Steps: 15%|█▌ | 154/1000 [01:38<08:34, 1.65it/s, loss=0.00224, lr=0.001]
Steps: 16%|█▌ | 155/1000 [01:38<08:38, 1.63it/s, loss=0.00224, lr=0.001]
Steps: 16%|█▌ | 155/1000 [01:38<08:38, 1.63it/s, loss=0.0296, lr=0.001]
Steps: 16%|█▌ | 156/1000 [01:39<08:34, 1.64it/s, loss=0.0296, lr=0.001]
tensor(0.0071, device='cuda:0')
tensor([[0.4107],
[0.4151]], device='cuda:0')
Current Norm : tensor([0.4096, 0.4136], device='cuda:0')
Steps: 16%|█▌ | 156/1000 [01:39<08:34, 1.64it/s, loss=0.0158, lr=0.001]
Steps: 16%|█▌ | 157/1000 [01:39<08:37, 1.63it/s, loss=0.0158, lr=0.001]
Steps: 16%|█▌ | 157/1000 [01:39<08:37, 1.63it/s, loss=0.0173, lr=0.001]
Steps: 16%|█▌ | 158/1000 [01:40<08:32, 1.64it/s, loss=0.0173, lr=0.001]
tensor(0.0029, device='cuda:0')
tensor([[0.4106],
[0.4151]], device='cuda:0')
Current Norm : tensor([0.4095, 0.4136], device='cuda:0')
Steps: 16%|█▌ | 158/1000 [01:40<08:32, 1.64it/s, loss=0.00296, lr=0.001]
Steps: 16%|█▌ | 159/1000 [01:41<08:37, 1.63it/s, loss=0.00296, lr=0.001]
Steps: 16%|█▌ | 159/1000 [01:41<08:37, 1.63it/s, loss=0.000782, lr=0.001]
Steps: 16%|█▌ | 160/1000 [01:41<08:35, 1.63it/s, loss=0.000782, lr=0.001]
tensor(0.0037, device='cuda:0')
tensor([[0.4105],
[0.4150]], device='cuda:0')
Current Norm : tensor([0.4095, 0.4135], device='cuda:0')
Steps: 16%|█▌ | 160/1000 [01:41<08:35, 1.63it/s, loss=0.0113, lr=0.001]
Steps: 16%|█▌ | 161/1000 [01:42<08:36, 1.62it/s, loss=0.0113, lr=0.001]
Steps: 16%|█▌ | 161/1000 [01:42<08:36, 1.62it/s, loss=0.019, lr=0.001]
Steps: 16%|█▌ | 162/1000 [01:42<08:31, 1.64it/s, loss=0.019, lr=0.001]
tensor(0.0028, device='cuda:0')
tensor([[0.4104],
[0.4149]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4134], device='cuda:0')
Steps: 16%|█▌ | 162/1000 [01:42<08:31, 1.64it/s, loss=0.00382, lr=0.001]
Steps: 16%|█▋ | 163/1000 [01:43<08:33, 1.63it/s, loss=0.00382, lr=0.001]
Steps: 16%|█▋ | 163/1000 [01:43<08:33, 1.63it/s, loss=0.00187, lr=0.001]
Steps: 16%|█▋ | 164/1000 [01:44<08:30, 1.64it/s, loss=0.00187, lr=0.001]
tensor(0.0037, device='cuda:0')
tensor([[0.4103],
[0.4148]], device='cuda:0')
Current Norm : tensor([0.4093, 0.4133], device='cuda:0')
Steps: 16%|█▋ | 164/1000 [01:44<08:30, 1.64it/s, loss=0.00628, lr=0.001]
Steps: 16%|█▋ | 165/1000 [01:44<08:35, 1.62it/s, loss=0.00628, lr=0.001]
Steps: 16%|█▋ | 165/1000 [01:44<08:35, 1.62it/s, loss=0.00432, lr=0.001]
Steps: 17%|█▋ | 166/1000 [01:45<08:32, 1.63it/s, loss=0.00432, lr=0.001]
tensor(0.0070, device='cuda:0')
tensor([[0.4101],
[0.4148]], device='cuda:0')
Current Norm : tensor([0.4091, 0.4133], device='cuda:0')
Steps: 17%|█▋ | 166/1000 [01:45<08:32, 1.63it/s, loss=0.00689, lr=0.001]
Steps: 17%|█▋ | 167/1000 [01:46<08:36, 1.61it/s, loss=0.00689, lr=0.001]
Steps: 17%|█▋ | 167/1000 [01:46<08:36, 1.61it/s, loss=0.106, lr=0.001]
Steps: 17%|█▋ | 168/1000 [01:46<08:31, 1.63it/s, loss=0.106, lr=0.001]
tensor(0.0023, device='cuda:0')
tensor([[0.4099],
[0.4148]], device='cuda:0')
Current Norm : tensor([0.4089, 0.4133], device='cuda:0')
Steps: 17%|█▋ | 168/1000 [01:46<08:31, 1.63it/s, loss=0.000728, lr=0.001]
Steps: 17%|█▋ | 169/1000 [01:47<08:34, 1.62it/s, loss=0.000728, lr=0.001]
Steps: 17%|█▋ | 169/1000 [01:47<08:34, 1.62it/s, loss=0.012, lr=0.001]
Steps: 17%|█▋ | 170/1000 [01:47<08:29, 1.63it/s, loss=0.012, lr=0.001]
tensor(0.0055, device='cuda:0')
tensor([[0.4096],
[0.4147]], device='cuda:0')
Current Norm : tensor([0.4087, 0.4132], device='cuda:0')
Steps: 17%|█▋ | 170/1000 [01:47<08:29, 1.63it/s, loss=0.0805, lr=0.001]
Steps: 17%|█▋ | 171/1000 [01:48<08:33, 1.62it/s, loss=0.0805, lr=0.001]
Steps: 17%|█▋ | 171/1000 [01:48<08:33, 1.62it/s, loss=0.000757, lr=0.001]
Steps: 17%|█▋ | 172/1000 [01:49<08:28, 1.63it/s, loss=0.000757, lr=0.001]
tensor(0.0020, device='cuda:0')
tensor([[0.4094],
[0.4146]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4132], device='cuda:0')
Steps: 17%|█▋ | 172/1000 [01:49<08:28, 1.63it/s, loss=0.000506, lr=0.001]
Steps: 17%|█▋ | 173/1000 [01:49<08:32, 1.61it/s, loss=0.000506, lr=0.001]
Steps: 17%|█▋ | 173/1000 [01:49<08:32, 1.61it/s, loss=0.00117, lr=0.001]
Steps: 17%|█▋ | 174/1000 [01:50<08:28, 1.62it/s, loss=0.00117, lr=0.001]
tensor(0.0046, device='cuda:0')
tensor([[0.4093],
[0.4145]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4130], device='cuda:0')
Steps: 17%|█▋ | 174/1000 [01:50<08:28, 1.62it/s, loss=0.0104, lr=0.001]
Steps: 18%|█▊ | 175/1000 [01:50<08:30, 1.62it/s, loss=0.0104, lr=0.001]
Steps: 18%|█▊ | 175/1000 [01:50<08:30, 1.62it/s, loss=0.00425, lr=0.001]
Steps: 18%|█▊ | 176/1000 [01:51<08:27, 1.62it/s, loss=0.00425, lr=0.001]
tensor(0.0047, device='cuda:0')
tensor([[0.4091],
[0.4143]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4128], device='cuda:0')
Steps: 18%|█▊ | 176/1000 [01:51<08:27, 1.62it/s, loss=0.067, lr=0.001]
Steps: 18%|█▊ | 177/1000 [01:52<08:30, 1.61it/s, loss=0.067, lr=0.001]
Steps: 18%|█▊ | 177/1000 [01:52<08:30, 1.61it/s, loss=0.00193, lr=0.001]
Steps: 18%|█▊ | 178/1000 [01:52<08:25, 1.63it/s, loss=0.00193, lr=0.001]
tensor(0.0126, device='cuda:0')
tensor([[0.4089],
[0.4140]], device='cuda:0')
Current Norm : tensor([0.4080, 0.4126], device='cuda:0')
Steps: 18%|█▊ | 178/1000 [01:52<08:25, 1.63it/s, loss=0.0166, lr=0.001]
Steps: 18%|█▊ | 179/1000 [01:53<08:29, 1.61it/s, loss=0.0166, lr=0.001]
Steps: 18%|█▊ | 179/1000 [01:53<08:29, 1.61it/s, loss=0.0151, lr=0.001]
Steps: 18%|█▊ | 180/1000 [01:54<08:24, 1.62it/s, loss=0.0151, lr=0.001]
tensor(0.0058, device='cuda:0')
tensor([[0.4087],
[0.4137]], device='cuda:0')
Current Norm : tensor([0.4079, 0.4123], device='cuda:0')
Steps: 18%|█▊ | 180/1000 [01:54<08:24, 1.62it/s, loss=0.00435, lr=0.001]
Steps: 18%|█▊ | 181/1000 [01:54<08:30, 1.60it/s, loss=0.00435, lr=0.001]
Steps: 18%|█▊ | 181/1000 [01:54<08:30, 1.60it/s, loss=0.0246, lr=0.001]
Steps: 18%|█▊ | 182/1000 [01:55<08:27, 1.61it/s, loss=0.0246, lr=0.001]
tensor(0.0086, device='cuda:0')
tensor([[0.4087],
[0.4135]], device='cuda:0')
Current Norm : tensor([0.4078, 0.4122], device='cuda:0')
Steps: 18%|█▊ | 182/1000 [01:55<08:27, 1.61it/s, loss=0.00682, lr=0.001]
Steps: 18%|█▊ | 183/1000 [01:55<08:30, 1.60it/s, loss=0.00682, lr=0.001]
Steps: 18%|█▊ | 183/1000 [01:55<08:30, 1.60it/s, loss=0.0226, lr=0.001]
Steps: 18%|█▊ | 184/1000 [01:56<08:25, 1.61it/s, loss=0.0226, lr=0.001]
tensor(0.0073, device='cuda:0')
tensor([[0.4088],
[0.4135]], device='cuda:0')
Current Norm : tensor([0.4079, 0.4122], device='cuda:0')
Steps: 18%|█▊ | 184/1000 [01:56<08:25, 1.61it/s, loss=0.0133, lr=0.001]
Steps: 18%|█▊ | 185/1000 [01:57<08:29, 1.60it/s, loss=0.0133, lr=0.001]
Steps: 18%|█▊ | 185/1000 [01:57<08:29, 1.60it/s, loss=0.0136, lr=0.001]
Steps: 19%|█▊ | 186/1000 [01:57<08:23, 1.62it/s, loss=0.0136, lr=0.001]
tensor(0.0037, device='cuda:0')
tensor([[0.4089],
[0.4135]], device='cuda:0')
Current Norm : tensor([0.4080, 0.4121], device='cuda:0')
Steps: 19%|█▊ | 186/1000 [01:57<08:23, 1.62it/s, loss=0.000371, lr=0.001]
Steps: 19%|█▊ | 187/1000 [01:58<08:30, 1.59it/s, loss=0.000371, lr=0.001]
Steps: 19%|█▊ | 187/1000 [01:58<08:30, 1.59it/s, loss=0.0068, lr=0.001]
Steps: 19%|█▉ | 188/1000 [01:59<08:26, 1.60it/s, loss=0.0068, lr=0.001]
tensor(0.0074, device='cuda:0')
tensor([[0.4091],
[0.4135]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4121], device='cuda:0')
Steps: 19%|█▉ | 188/1000 [01:59<08:26, 1.60it/s, loss=0.0109, lr=0.001]
Steps: 19%|█▉ | 189/1000 [01:59<08:30, 1.59it/s, loss=0.0109, lr=0.001]
Steps: 19%|█▉ | 189/1000 [01:59<08:30, 1.59it/s, loss=0.028, lr=0.001]
Steps: 19%|█▉ | 190/1000 [02:00<08:28, 1.59it/s, loss=0.028, lr=0.001]
tensor(0.0039, device='cuda:0')
tensor([[0.4093],
[0.4135]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4121], device='cuda:0')
Steps: 19%|█▉ | 190/1000 [02:00<08:28, 1.59it/s, loss=0.000189, lr=0.001]
Steps: 19%|█▉ | 191/1000 [02:00<08:30, 1.58it/s, loss=0.000189, lr=0.001]
Steps: 19%|█▉ | 191/1000 [02:00<08:30, 1.58it/s, loss=0.00566, lr=0.001]
Steps: 19%|█▉ | 192/1000 [02:01<08:25, 1.60it/s, loss=0.00566, lr=0.001]
tensor(0.0046, device='cuda:0')
tensor([[0.4094],
[0.4134]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4121], device='cuda:0')
Steps: 19%|█▉ | 192/1000 [02:01<08:25, 1.60it/s, loss=0.0471, lr=0.001]
Steps: 19%|█▉ | 193/1000 [02:02<08:29, 1.58it/s, loss=0.0471, lr=0.001]
Steps: 19%|█▉ | 193/1000 [02:02<08:29, 1.58it/s, loss=0.00171, lr=0.001]
Steps: 19%|█▉ | 194/1000 [02:02<08:24, 1.60it/s, loss=0.00171, lr=0.001]
tensor(0.0088, device='cuda:0')
tensor([[0.4094],
[0.4133]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4120], device='cuda:0')
Steps: 19%|█▉ | 194/1000 [02:02<08:24, 1.60it/s, loss=0.0653, lr=0.001]
Steps: 20%|█▉ | 195/1000 [02:03<08:28, 1.58it/s, loss=0.0653, lr=0.001]
Steps: 20%|█▉ | 195/1000 [02:03<08:28, 1.58it/s, loss=0.111, lr=0.001]
Steps: 20%|█▉ | 196/1000 [02:04<08:24, 1.59it/s, loss=0.111, lr=0.001]
tensor(0.0020, device='cuda:0')
tensor([[0.4094],
[0.4133]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4119], device='cuda:0')
Steps: 20%|█▉ | 196/1000 [02:04<08:24, 1.59it/s, loss=0.000168, lr=0.001]
Steps: 20%|█▉ | 197/1000 [02:04<08:27, 1.58it/s, loss=0.000168, lr=0.001]
Steps: 20%|█▉ | 197/1000 [02:04<08:27, 1.58it/s, loss=0.215, lr=0.001]
Steps: 20%|█▉ | 198/1000 [02:05<08:24, 1.59it/s, loss=0.215, lr=0.001]
tensor(0.0077, device='cuda:0')
tensor([[0.4094],
[0.4132]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4119], device='cuda:0')
Steps: 20%|█▉ | 198/1000 [02:05<08:24, 1.59it/s, loss=0.0573, lr=0.001]
Steps: 20%|█▉ | 199/1000 [02:06<08:29, 1.57it/s, loss=0.0573, lr=0.001]
Steps: 20%|█▉ | 199/1000 [02:06<08:29, 1.57it/s, loss=0.0316, lr=0.001]
Steps: 20%|██ | 200/1000 [02:06<08:25, 1.58it/s, loss=0.0316, lr=0.001]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0056, -0.0009, -0.0087, -0.0236], device='cuda:0',
grad_fn=<SliceBackward0>)
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0028, -0.0070, 0.0065, 0.0088], device='cuda:0',
grad_fn=<SliceBackward0>)
Saving weights to checkpoints/step_inv_200.safetensors
tensor(0.0011, device='cuda:0')
tensor([[0.4093],
[0.4131]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4118], device='cuda:0')
Steps: 20%|██ | 200/1000 [02:06<08:25, 1.58it/s, loss=0.000531, lr=0.001]
Steps: 20%|██ | 201/1000 [02:07<08:29, 1.57it/s, loss=0.000531, lr=0.001]
Steps: 20%|██ | 201/1000 [02:07<08:29, 1.57it/s, loss=0.00104, lr=0.001]
Steps: 20%|██ | 202/1000 [02:07<08:24, 1.58it/s, loss=0.00104, lr=0.001]
tensor(0.0003, device='cuda:0')
tensor([[0.4093],
[0.4130]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4117], device='cuda:0')
Steps: 20%|██ | 202/1000 [02:07<08:24, 1.58it/s, loss=0.000786, lr=0.001]
Steps: 20%|██ | 203/1000 [02:08<08:27, 1.57it/s, loss=0.000786, lr=0.001]
Steps: 20%|██ | 203/1000 [02:08<08:27, 1.57it/s, loss=0.00016, lr=0.001]
Steps: 20%|██ | 204/1000 [02:09<08:21, 1.59it/s, loss=0.00016, lr=0.001]
tensor(0.0079, device='cuda:0')
tensor([[0.4092],
[0.4129]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4116], device='cuda:0')
Steps: 20%|██ | 204/1000 [02:09<08:21, 1.59it/s, loss=0.0152, lr=0.001]
Steps: 20%|██ | 205/1000 [02:09<08:26, 1.57it/s, loss=0.0152, lr=0.001]
Steps: 20%|██ | 205/1000 [02:09<08:26, 1.57it/s, loss=0.0314, lr=0.001]
Steps: 21%|██ | 206/1000 [02:10<08:20, 1.59it/s, loss=0.0314, lr=0.001]
tensor(0.0078, device='cuda:0')
tensor([[0.4092],
[0.4129]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4116], device='cuda:0')
Steps: 21%|██ | 206/1000 [02:10<08:20, 1.59it/s, loss=0.0123, lr=0.001]
Steps: 21%|██ | 207/1000 [02:11<08:25, 1.57it/s, loss=0.0123, lr=0.001]
Steps: 21%|██ | 207/1000 [02:11<08:25, 1.57it/s, loss=0.0274, lr=0.001]
Steps: 21%|██ | 208/1000 [02:11<08:19, 1.59it/s, loss=0.0274, lr=0.001]
tensor(0.0105, device='cuda:0')
tensor([[0.4093],
[0.4130]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4117], device='cuda:0')
Steps: 21%|██ | 208/1000 [02:11<08:19, 1.59it/s, loss=0.0139, lr=0.001]
Steps: 21%|██ | 209/1000 [02:12<08:21, 1.58it/s, loss=0.0139, lr=0.001]
Steps: 21%|██ | 209/1000 [02:12<08:21, 1.58it/s, loss=0.00396, lr=0.001]
Steps: 21%|██ | 210/1000 [02:12<08:18, 1.59it/s, loss=0.00396, lr=0.001]
tensor(0.0030, device='cuda:0')
tensor([[0.4094],
[0.4130]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4117], device='cuda:0')
Steps: 21%|██ | 210/1000 [02:12<08:18, 1.59it/s, loss=0.00166, lr=0.001]
Steps: 21%|██ | 211/1000 [02:13<08:21, 1.57it/s, loss=0.00166, lr=0.001]
Steps: 21%|██ | 211/1000 [02:13<08:21, 1.57it/s, loss=0.0014, lr=0.001]
Steps: 21%|██ | 212/1000 [02:14<08:17, 1.58it/s, loss=0.0014, lr=0.001]
tensor(0.0082, device='cuda:0')
tensor([[0.4096],
[0.4131]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4118], device='cuda:0')
Steps: 21%|██ | 212/1000 [02:14<08:17, 1.58it/s, loss=0.00378, lr=0.001]
Steps: 21%|██▏ | 213/1000 [02:14<08:28, 1.55it/s, loss=0.00378, lr=0.001]
Steps: 21%|██▏ | 213/1000 [02:14<08:28, 1.55it/s, loss=0.0232, lr=0.001]
Steps: 21%|██▏ | 214/1000 [02:15<08:24, 1.56it/s, loss=0.0232, lr=0.001]
tensor(0.0009, device='cuda:0')
tensor([[0.4098],
[0.4131]], device='cuda:0')
Current Norm : tensor([0.4088, 0.4118], device='cuda:0')
Steps: 21%|██▏ | 214/1000 [02:15<08:24, 1.56it/s, loss=0.0017, lr=0.001]
Steps: 22%|██▏ | 215/1000 [02:16<08:23, 1.56it/s, loss=0.0017, lr=0.001]
Steps: 22%|██▏ | 215/1000 [02:16<08:23, 1.56it/s, loss=0.000727, lr=0.001]
Steps: 22%|██▏ | 216/1000 [02:16<08:19, 1.57it/s, loss=0.000727, lr=0.001]
tensor(0.0046, device='cuda:0')
tensor([[0.4099],
[0.4130]], device='cuda:0')
Current Norm : tensor([0.4089, 0.4117], device='cuda:0')
Steps: 22%|██▏ | 216/1000 [02:16<08:19, 1.57it/s, loss=0.00184, lr=0.001]
Steps: 22%|██▏ | 217/1000 [02:17<08:22, 1.56it/s, loss=0.00184, lr=0.001]
Steps: 22%|██▏ | 217/1000 [02:17<08:22, 1.56it/s, loss=0.0032, lr=0.001]
Steps: 22%|██▏ | 218/1000 [02:18<08:17, 1.57it/s, loss=0.0032, lr=0.001]
tensor(0.0043, device='cuda:0')
tensor([[0.4100],
[0.4129]], device='cuda:0')
Current Norm : tensor([0.4090, 0.4116], device='cuda:0')
Steps: 22%|██▏ | 218/1000 [02:18<08:17, 1.57it/s, loss=0.00103, lr=0.001]
Steps: 22%|██▏ | 219/1000 [02:18<08:20, 1.56it/s, loss=0.00103, lr=0.001]
Steps: 22%|██▏ | 219/1000 [02:18<08:20, 1.56it/s, loss=0.00787, lr=0.001]
Steps: 22%|██▏ | 220/1000 [02:19<08:14, 1.58it/s, loss=0.00787, lr=0.001]
tensor(0.0053, device='cuda:0')
tensor([[0.4102],
[0.4128]], device='cuda:0')
Current Norm : tensor([0.4092, 0.4115], device='cuda:0')
Steps: 22%|██▏ | 220/1000 [02:19<08:14, 1.58it/s, loss=0.0362, lr=0.001]
Steps: 22%|██▏ | 221/1000 [02:20<08:17, 1.56it/s, loss=0.0362, lr=0.001]
Steps: 22%|██▏ | 221/1000 [02:20<08:17, 1.56it/s, loss=0.00483, lr=0.001]
Steps: 22%|██▏ | 222/1000 [02:20<08:13, 1.58it/s, loss=0.00483, lr=0.001]
tensor(0.0095, device='cuda:0')
tensor([[0.4104],
[0.4126]], device='cuda:0')
Current Norm : tensor([0.4093, 0.4114], device='cuda:0')
Steps: 22%|██▏ | 222/1000 [02:20<08:13, 1.58it/s, loss=0.0241, lr=0.001]
Steps: 22%|██▏ | 223/1000 [02:21<08:18, 1.56it/s, loss=0.0241, lr=0.001]
Steps: 22%|██▏ | 223/1000 [02:21<08:18, 1.56it/s, loss=0.0179, lr=0.001]
Steps: 22%|██▏ | 224/1000 [02:21<08:14, 1.57it/s, loss=0.0179, lr=0.001]
tensor(0.0009, device='cuda:0')
tensor([[0.4106],
[0.4125]], device='cuda:0')
Current Norm : tensor([0.4095, 0.4113], device='cuda:0')
Steps: 22%|██▏ | 224/1000 [02:21<08:14, 1.57it/s, loss=0.00276, lr=0.001]
Steps: 22%|██▎ | 225/1000 [02:22<08:16, 1.56it/s, loss=0.00276, lr=0.001]
Steps: 22%|██▎ | 225/1000 [02:22<08:16, 1.56it/s, loss=0.000774, lr=0.001]
Steps: 23%|██▎ | 226/1000 [02:23<08:12, 1.57it/s, loss=0.000774, lr=0.001]
tensor(0.0092, device='cuda:0')
tensor([[0.4107],
[0.4125]], device='cuda:0')
Current Norm : tensor([0.4096, 0.4112], device='cuda:0')
Steps: 23%|██▎ | 226/1000 [02:23<08:12, 1.57it/s, loss=0.071, lr=0.001]
Steps: 23%|██▎ | 227/1000 [02:23<08:16, 1.56it/s, loss=0.071, lr=0.001]
Steps: 23%|██▎ | 227/1000 [02:23<08:16, 1.56it/s, loss=0.00114, lr=0.001]
Steps: 23%|██▎ | 228/1000 [02:24<08:12, 1.57it/s, loss=0.00114, lr=0.001]
tensor(0.0063, device='cuda:0')
tensor([[0.4107],
[0.4125]], device='cuda:0')
Current Norm : tensor([0.4097, 0.4113], device='cuda:0')
Steps: 23%|██▎ | 228/1000 [02:24<08:12, 1.57it/s, loss=0.000409, lr=0.001]
Steps: 23%|██▎ | 229/1000 [02:25<08:16, 1.55it/s, loss=0.000409, lr=0.001]
Steps: 23%|██▎ | 229/1000 [02:25<08:16, 1.55it/s, loss=0.0402, lr=0.001]
Steps: 23%|██▎ | 230/1000 [02:25<08:11, 1.57it/s, loss=0.0402, lr=0.001]
tensor(0.0053, device='cuda:0')
tensor([[0.4108],
[0.4125]], device='cuda:0')
Current Norm : tensor([0.4097, 0.4112], device='cuda:0')
Steps: 23%|██▎ | 230/1000 [02:25<08:11, 1.57it/s, loss=0.0238, lr=0.001]
Steps: 23%|██▎ | 231/1000 [02:26<08:14, 1.55it/s, loss=0.0238, lr=0.001]
Steps: 23%|██▎ | 231/1000 [02:26<08:14, 1.55it/s, loss=0.000537, lr=0.001]
Steps: 23%|██▎ | 232/1000 [02:27<08:09, 1.57it/s, loss=0.000537, lr=0.001]
tensor(0.0059, device='cuda:0')
tensor([[0.4108],
[0.4125]], device='cuda:0')
Current Norm : tensor([0.4097, 0.4112], device='cuda:0')
Steps: 23%|██▎ | 232/1000 [02:27<08:09, 1.57it/s, loss=0.000339, lr=0.001]
Steps: 23%|██▎ | 233/1000 [02:27<08:14, 1.55it/s, loss=0.000339, lr=0.001]
Steps: 23%|██▎ | 233/1000 [02:27<08:14, 1.55it/s, loss=0.0129, lr=0.001]
Steps: 23%|██▎ | 234/1000 [02:28<08:11, 1.56it/s, loss=0.0129, lr=0.001]
tensor(0.0078, device='cuda:0')
tensor([[0.4109],
[0.4124]], device='cuda:0')
Current Norm : tensor([0.4098, 0.4112], device='cuda:0')
Steps: 23%|██▎ | 234/1000 [02:28<08:11, 1.56it/s, loss=0.00997, lr=0.001]
Steps: 24%|██▎ | 235/1000 [02:28<08:13, 1.55it/s, loss=0.00997, lr=0.001]
Steps: 24%|██▎ | 235/1000 [02:28<08:13, 1.55it/s, loss=0.0416, lr=0.001]
Steps: 24%|██▎ | 236/1000 [02:29<08:08, 1.56it/s, loss=0.0416, lr=0.001]
tensor(0.0040, device='cuda:0')
tensor([[0.4109],
[0.4123]], device='cuda:0')
Current Norm : tensor([0.4098, 0.4111], device='cuda:0')
Steps: 24%|██▎ | 236/1000 [02:29<08:08, 1.56it/s, loss=0.00845, lr=0.001]
Steps: 24%|██▎ | 237/1000 [02:30<08:12, 1.55it/s, loss=0.00845, lr=0.001]
Steps: 24%|██▎ | 237/1000 [02:30<08:12, 1.55it/s, loss=0.000848, lr=0.001]
Steps: 24%|██▍ | 238/1000 [02:30<08:08, 1.56it/s, loss=0.000848, lr=0.001]
tensor(0.0033, device='cuda:0')
tensor([[0.4110],
[0.4122]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4110], device='cuda:0')
Steps: 24%|██▍ | 238/1000 [02:30<08:08, 1.56it/s, loss=0.00033, lr=0.001]
Steps: 24%|██▍ | 239/1000 [02:31<08:14, 1.54it/s, loss=0.00033, lr=0.001]
Steps: 24%|██▍ | 239/1000 [02:31<08:14, 1.54it/s, loss=0.0048, lr=0.001]
Steps: 24%|██▍ | 240/1000 [02:32<08:11, 1.55it/s, loss=0.0048, lr=0.001]
tensor(0.0022, device='cuda:0')
tensor([[0.4110],
[0.4121]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4109], device='cuda:0')
Steps: 24%|██▍ | 240/1000 [02:32<08:11, 1.55it/s, loss=0.000334, lr=0.001]
Steps: 24%|██▍ | 241/1000 [02:32<08:13, 1.54it/s, loss=0.000334, lr=0.001]
Steps: 24%|██▍ | 241/1000 [02:32<08:13, 1.54it/s, loss=0.0037, lr=0.001]
Steps: 24%|██▍ | 242/1000 [02:33<08:07, 1.55it/s, loss=0.0037, lr=0.001]
tensor(0.0052, device='cuda:0')
tensor([[0.4110],
[0.4120]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4108], device='cuda:0')
Steps: 24%|██▍ | 242/1000 [02:33<08:07, 1.55it/s, loss=0.000196, lr=0.001]
Steps: 24%|██▍ | 243/1000 [02:34<08:11, 1.54it/s, loss=0.000196, lr=0.001]
Steps: 24%|██▍ | 243/1000 [02:34<08:11, 1.54it/s, loss=0.0248, lr=0.001]
Steps: 24%|██▍ | 244/1000 [02:34<08:07, 1.55it/s, loss=0.0248, lr=0.001]
tensor(0.0026, device='cuda:0')
tensor([[0.4109],
[0.4118]], device='cuda:0')
Current Norm : tensor([0.4098, 0.4107], device='cuda:0')
Steps: 24%|██▍ | 244/1000 [02:34<08:07, 1.55it/s, loss=0.00292, lr=0.001]
Steps: 24%|██▍ | 245/1000 [02:35<08:10, 1.54it/s, loss=0.00292, lr=0.001]
Steps: 24%|██▍ | 245/1000 [02:35<08:10, 1.54it/s, loss=0.000501, lr=0.001]
Steps: 25%|██▍ | 246/1000 [02:36<08:05, 1.55it/s, loss=0.000501, lr=0.001]
tensor(0.0067, device='cuda:0')
tensor([[0.4108],
[0.4117]], device='cuda:0')
Current Norm : tensor([0.4097, 0.4105], device='cuda:0')
Steps: 25%|██▍ | 246/1000 [02:36<08:05, 1.55it/s, loss=0.000666, lr=0.001]
Steps: 25%|██▍ | 247/1000 [02:36<08:06, 1.55it/s, loss=0.000666, lr=0.001]
Steps: 25%|██▍ | 247/1000 [02:36<08:06, 1.55it/s, loss=0.0521, lr=0.001]
Steps: 25%|██▍ | 248/1000 [02:37<08:01, 1.56it/s, loss=0.0521, lr=0.001]
tensor(0.0059, device='cuda:0')
tensor([[0.4107],
[0.4115]], device='cuda:0')
Current Norm : tensor([0.4096, 0.4103], device='cuda:0')
Steps: 25%|██▍ | 248/1000 [02:37<08:01, 1.56it/s, loss=0.00594, lr=0.001]
Steps: 25%|██▍ | 249/1000 [02:38<08:03, 1.55it/s, loss=0.00594, lr=0.001]
Steps: 25%|██▍ | 249/1000 [02:38<08:03, 1.55it/s, loss=0.0184, lr=0.001]
Steps: 25%|██▌ | 250/1000 [02:38<07:59, 1.56it/s, loss=0.0184, lr=0.001]
tensor(0.0015, device='cuda:0')
tensor([[0.4106],
[0.4113]], device='cuda:0')
Current Norm : tensor([0.4095, 0.4102], device='cuda:0')
Steps: 25%|██▌ | 250/1000 [02:38<07:59, 1.56it/s, loss=0.000888, lr=0.001]
Steps: 25%|██▌ | 251/1000 [02:39<08:04, 1.55it/s, loss=0.000888, lr=0.001]
Steps: 25%|██▌ | 251/1000 [02:39<08:04, 1.55it/s, loss=0.0139, lr=0.001]
Steps: 25%|██▌ | 252/1000 [02:39<08:00, 1.56it/s, loss=0.0139, lr=0.001]
tensor(0.0089, device='cuda:0')
tensor([[0.4105],
[0.4111]], device='cuda:0')
Current Norm : tensor([0.4095, 0.4100], device='cuda:0')
Steps: 25%|██▌ | 252/1000 [02:39<08:00, 1.56it/s, loss=0.0116, lr=0.001]
Steps: 25%|██▌ | 253/1000 [02:40<08:10, 1.52it/s, loss=0.0116, lr=0.001]
Steps: 25%|██▌ | 253/1000 [02:40<08:10, 1.52it/s, loss=0.0285, lr=0.001]
Steps: 25%|██▌ | 254/1000 [02:41<08:06, 1.53it/s, loss=0.0285, lr=0.001]
tensor(0.0045, device='cuda:0')
tensor([[0.4105],
[0.4109]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4099], device='cuda:0')
Steps: 25%|██▌ | 254/1000 [02:41<08:06, 1.53it/s, loss=0.011, lr=0.001]
Steps: 26%|██▌ | 255/1000 [02:41<08:06, 1.53it/s, loss=0.011, lr=0.001]
Steps: 26%|██▌ | 255/1000 [02:41<08:06, 1.53it/s, loss=0.00831, lr=0.001]
Steps: 26%|██▌ | 256/1000 [02:42<08:02, 1.54it/s, loss=0.00831, lr=0.001]
tensor(0.0023, device='cuda:0')
tensor([[0.4104],
[0.4108]], device='cuda:0')
Current Norm : tensor([0.4093, 0.4097], device='cuda:0')
Steps: 26%|██▌ | 256/1000 [02:42<08:02, 1.54it/s, loss=0.0235, lr=0.001]
Steps: 26%|██▌ | 257/1000 [02:43<08:03, 1.54it/s, loss=0.0235, lr=0.001]
Steps: 26%|██▌ | 257/1000 [02:43<08:03, 1.54it/s, loss=0.00303, lr=0.001]
Steps: 26%|██▌ | 258/1000 [02:43<07:59, 1.55it/s, loss=0.00303, lr=0.001]
tensor(0.0088, device='cuda:0')
tensor([[0.4103],
[0.4106]], device='cuda:0')
Current Norm : tensor([0.4093, 0.4095], device='cuda:0')
Steps: 26%|██▌ | 258/1000 [02:43<07:59, 1.55it/s, loss=0.0407, lr=0.001]
Steps: 26%|██▌ | 259/1000 [02:44<08:01, 1.54it/s, loss=0.0407, lr=0.001]
Steps: 26%|██▌ | 259/1000 [02:44<08:01, 1.54it/s, loss=0.0181, lr=0.001]
Steps: 26%|██▌ | 260/1000 [02:45<07:56, 1.55it/s, loss=0.0181, lr=0.001]
tensor(0.0088, device='cuda:0')
tensor([[0.4102],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4092, 0.4095], device='cuda:0')
Steps: 26%|██▌ | 260/1000 [02:45<07:56, 1.55it/s, loss=0.0383, lr=0.001]
Steps: 26%|██▌ | 261/1000 [02:45<08:00, 1.54it/s, loss=0.0383, lr=0.001]
Steps: 26%|██▌ | 261/1000 [02:45<08:00, 1.54it/s, loss=0.0344, lr=0.001]
Steps: 26%|██▌ | 262/1000 [02:46<07:55, 1.55it/s, loss=0.0344, lr=0.001]
tensor(0.0025, device='cuda:0')
tensor([[0.4102],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4092, 0.4094], device='cuda:0')
Steps: 26%|██▌ | 262/1000 [02:46<07:55, 1.55it/s, loss=0.00119, lr=0.001]
Steps: 26%|██▋ | 263/1000 [02:47<08:00, 1.53it/s, loss=0.00119, lr=0.001]
Steps: 26%|██▋ | 263/1000 [02:47<08:00, 1.53it/s, loss=0.00178, lr=0.001]
Steps: 26%|██▋ | 264/1000 [02:47<07:53, 1.55it/s, loss=0.00178, lr=0.001]
tensor(0.0092, device='cuda:0')
tensor([[0.4101],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4091, 0.4095], device='cuda:0')
Steps: 26%|██▋ | 264/1000 [02:47<07:53, 1.55it/s, loss=0.077, lr=0.001]
Steps: 26%|██▋ | 265/1000 [02:48<07:55, 1.55it/s, loss=0.077, lr=0.001]
Steps: 26%|██▋ | 265/1000 [02:48<07:55, 1.55it/s, loss=0.0233, lr=0.001]
Steps: 27%|██▋ | 266/1000 [02:49<07:50, 1.56it/s, loss=0.0233, lr=0.001]
tensor(0.0020, device='cuda:0')
tensor([[0.4100],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4090, 0.4095], device='cuda:0')
Steps: 27%|██▋ | 266/1000 [02:49<07:50, 1.56it/s, loss=0.000306, lr=0.001]
Steps: 27%|██▋ | 267/1000 [02:49<07:54, 1.54it/s, loss=0.000306, lr=0.001]
Steps: 27%|██▋ | 267/1000 [02:49<07:54, 1.54it/s, loss=0.0058, lr=0.001]
Steps: 27%|██▋ | 268/1000 [02:50<07:50, 1.56it/s, loss=0.0058, lr=0.001]
tensor(0.0020, device='cuda:0')
tensor([[0.4099],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4089, 0.4094], device='cuda:0')
Steps: 27%|██▋ | 268/1000 [02:50<07:50, 1.56it/s, loss=0.00137, lr=0.001]
Steps: 27%|██▋ | 269/1000 [02:50<07:52, 1.55it/s, loss=0.00137, lr=0.001]
Steps: 27%|██▋ | 269/1000 [02:50<07:52, 1.55it/s, loss=0.000572, lr=0.001]
Steps: 27%|██▋ | 270/1000 [02:51<07:47, 1.56it/s, loss=0.000572, lr=0.001]
tensor(0.0095, device='cuda:0')
tensor([[0.4098],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4088, 0.4094], device='cuda:0')
Steps: 27%|██▋ | 270/1000 [02:51<07:47, 1.56it/s, loss=0.0374, lr=0.001]
Steps: 27%|██▋ | 271/1000 [02:52<07:51, 1.55it/s, loss=0.0374, lr=0.001]
Steps: 27%|██▋ | 271/1000 [02:52<07:51, 1.55it/s, loss=0.00526, lr=0.001]
Steps: 27%|██▋ | 272/1000 [02:52<07:46, 1.56it/s, loss=0.00526, lr=0.001]
tensor(0.0053, device='cuda:0')
tensor([[0.4096],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4087, 0.4094], device='cuda:0')
Steps: 27%|██▋ | 272/1000 [02:52<07:46, 1.56it/s, loss=0.000402, lr=0.001]
Steps: 27%|██▋ | 273/1000 [02:53<07:49, 1.55it/s, loss=0.000402, lr=0.001]
Steps: 27%|██▋ | 273/1000 [02:53<07:49, 1.55it/s, loss=0.0369, lr=0.001]
Steps: 27%|██▋ | 274/1000 [02:54<07:45, 1.56it/s, loss=0.0369, lr=0.001]
tensor(0.0074, device='cuda:0')
tensor([[0.4095],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4094], device='cuda:0')
Steps: 27%|██▋ | 274/1000 [02:54<07:45, 1.56it/s, loss=0.0328, lr=0.001]
Steps: 28%|██▊ | 275/1000 [02:54<07:49, 1.55it/s, loss=0.0328, lr=0.001]
Steps: 28%|██▊ | 275/1000 [02:54<07:49, 1.55it/s, loss=0.0216, lr=0.001]
Steps: 28%|██▊ | 276/1000 [02:55<07:45, 1.55it/s, loss=0.0216, lr=0.001]
tensor(0.0016, device='cuda:0')
tensor([[0.4093],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4094], device='cuda:0')
Steps: 28%|██▊ | 276/1000 [02:55<07:45, 1.55it/s, loss=0.00096, lr=0.001]
Steps: 28%|██▊ | 277/1000 [02:56<07:47, 1.55it/s, loss=0.00096, lr=0.001]
Steps: 28%|██▊ | 277/1000 [02:56<07:47, 1.55it/s, loss=0.00177, lr=0.001]
Steps: 28%|██▊ | 278/1000 [02:56<07:41, 1.57it/s, loss=0.00177, lr=0.001]
tensor(0.0048, device='cuda:0')
tensor([[0.4092],
[0.4104]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4094], device='cuda:0')
Steps: 28%|██▊ | 278/1000 [02:56<07:41, 1.57it/s, loss=0.0209, lr=0.001]
Steps: 28%|██▊ | 279/1000 [02:57<07:45, 1.55it/s, loss=0.0209, lr=0.001]
Steps: 28%|██▊ | 279/1000 [02:57<07:45, 1.55it/s, loss=0.000778, lr=0.001]
Steps: 28%|██▊ | 280/1000 [02:58<07:40, 1.56it/s, loss=0.000778, lr=0.001]
tensor(0.0025, device='cuda:0')
tensor([[0.4091],
[0.4103]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4093], device='cuda:0')
Steps: 28%|██▊ | 280/1000 [02:58<07:40, 1.56it/s, loss=0.0108, lr=0.001]
Steps: 28%|██▊ | 281/1000 [02:58<07:42, 1.55it/s, loss=0.0108, lr=0.001]
Steps: 28%|██▊ | 281/1000 [02:58<07:42, 1.55it/s, loss=0.000837, lr=0.001]
Steps: 28%|██▊ | 282/1000 [02:59<07:37, 1.57it/s, loss=0.000837, lr=0.001]
tensor(0.0034, device='cuda:0')
tensor([[0.4090],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4091], device='cuda:0')
Steps: 28%|██▊ | 282/1000 [02:59<07:37, 1.57it/s, loss=0.00502, lr=0.001]
Steps: 28%|██▊ | 283/1000 [02:59<07:41, 1.55it/s, loss=0.00502, lr=0.001]
Steps: 28%|██▊ | 283/1000 [02:59<07:41, 1.55it/s, loss=0.00781, lr=0.001]
Steps: 28%|██▊ | 284/1000 [03:00<07:36, 1.57it/s, loss=0.00781, lr=0.001]
tensor(0.0054, device='cuda:0')
tensor([[0.4089],
[0.4100]], device='cuda:0')
Current Norm : tensor([0.4080, 0.4090], device='cuda:0')
Steps: 28%|██▊ | 284/1000 [03:00<07:36, 1.57it/s, loss=0.00449, lr=0.001]
Steps: 28%|██▊ | 285/1000 [03:01<07:38, 1.56it/s, loss=0.00449, lr=0.001]
Steps: 28%|██▊ | 285/1000 [03:01<07:38, 1.56it/s, loss=0.0029, lr=0.001]
Steps: 29%|██▊ | 286/1000 [03:01<07:35, 1.57it/s, loss=0.0029, lr=0.001]
tensor(0.0028, device='cuda:0')
tensor([[0.4088],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4079, 0.4088], device='cuda:0')
Steps: 29%|██▊ | 286/1000 [03:01<07:35, 1.57it/s, loss=0.00402, lr=0.001]
Steps: 29%|██▊ | 287/1000 [03:02<07:37, 1.56it/s, loss=0.00402, lr=0.001]
Steps: 29%|██▊ | 287/1000 [03:02<07:37, 1.56it/s, loss=0.000204, lr=0.001]
Steps: 29%|██▉ | 288/1000 [03:03<07:32, 1.57it/s, loss=0.000204, lr=0.001]
tensor(0.0017, device='cuda:0')
tensor([[0.4087],
[0.4095]], device='cuda:0')
Current Norm : tensor([0.4078, 0.4085], device='cuda:0')
Steps: 29%|██▉ | 288/1000 [03:03<07:32, 1.57it/s, loss=0.0105, lr=0.001]
Steps: 29%|██▉ | 289/1000 [03:03<07:34, 1.56it/s, loss=0.0105, lr=0.001]
Steps: 29%|██▉ | 289/1000 [03:03<07:34, 1.56it/s, loss=0.000225, lr=0.001]
Steps: 29%|██▉ | 290/1000 [03:04<07:29, 1.58it/s, loss=0.000225, lr=0.001]
tensor(0.0047, device='cuda:0')
tensor([[0.4085],
[0.4092]], device='cuda:0')
Current Norm : tensor([0.4077, 0.4083], device='cuda:0')
Steps: 29%|██▉ | 290/1000 [03:04<07:29, 1.58it/s, loss=0.00173, lr=0.001]
Steps: 29%|██▉ | 291/1000 [03:05<07:32, 1.57it/s, loss=0.00173, lr=0.001]
Steps: 29%|██▉ | 291/1000 [03:05<07:32, 1.57it/s, loss=0.00477, lr=0.001]
Steps: 29%|██▉ | 292/1000 [03:05<07:27, 1.58it/s, loss=0.00477, lr=0.001]
tensor(0.0084, device='cuda:0')
tensor([[0.4083],
[0.4089]], device='cuda:0')
Current Norm : tensor([0.4075, 0.4080], device='cuda:0')
Steps: 29%|██▉ | 292/1000 [03:05<07:27, 1.58it/s, loss=0.0743, lr=0.001]
Steps: 29%|██▉ | 293/1000 [03:06<07:29, 1.57it/s, loss=0.0743, lr=0.001]
Steps: 29%|██▉ | 293/1000 [03:06<07:29, 1.57it/s, loss=0.0805, lr=0.001]
Steps: 29%|██▉ | 294/1000 [03:06<07:27, 1.58it/s, loss=0.0805, lr=0.001]
tensor(0.0045, device='cuda:0')
tensor([[0.4081],
[0.4087]], device='cuda:0')
Current Norm : tensor([0.4073, 0.4078], device='cuda:0')
Steps: 29%|██▉ | 294/1000 [03:06<07:27, 1.58it/s, loss=0.000428, lr=0.001]
Steps: 30%|██▉ | 295/1000 [03:07<07:29, 1.57it/s, loss=0.000428, lr=0.001]
Steps: 30%|██▉ | 295/1000 [03:07<07:29, 1.57it/s, loss=0.0122, lr=0.001]
Steps: 30%|██▉ | 296/1000 [03:08<07:26, 1.58it/s, loss=0.0122, lr=0.001]
tensor(0.0043, device='cuda:0')
tensor([[0.4080],
[0.4084]], device='cuda:0')
Current Norm : tensor([0.4072, 0.4076], device='cuda:0')
Steps: 30%|██▉ | 296/1000 [03:08<07:26, 1.58it/s, loss=0.0223, lr=0.001]
Steps: 30%|██▉ | 297/1000 [03:08<07:29, 1.56it/s, loss=0.0223, lr=0.001]
Steps: 30%|██▉ | 297/1000 [03:08<07:29, 1.56it/s, loss=0.00694, lr=0.001]
Steps: 30%|██▉ | 298/1000 [03:09<07:25, 1.58it/s, loss=0.00694, lr=0.001]
tensor(0.0054, device='cuda:0')
tensor([[0.4079],
[0.4082]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4073], device='cuda:0')
Steps: 30%|██▉ | 298/1000 [03:09<07:25, 1.58it/s, loss=0.00844, lr=0.001]
Steps: 30%|██▉ | 299/1000 [03:10<07:28, 1.56it/s, loss=0.00844, lr=0.001]
Steps: 30%|██▉ | 299/1000 [03:10<07:28, 1.56it/s, loss=0.0155, lr=0.001]
Steps: 30%|███ | 300/1000 [03:10<07:24, 1.58it/s, loss=0.0155, lr=0.001]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0150, 0.0010, -0.0104, -0.0219], device='cuda:0',
grad_fn=<SliceBackward0>)
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0016, -0.0075, -0.0043, 0.0085], device='cuda:0',
grad_fn=<SliceBackward0>)
Saving weights to checkpoints/step_inv_300.safetensors
tensor(0.0046, device='cuda:0')
tensor([[0.4079],
[0.4080]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4072], device='cuda:0')
Steps: 30%|███ | 300/1000 [03:10<07:24, 1.58it/s, loss=0.0115, lr=0.001]
Steps: 30%|███ | 301/1000 [03:11<07:29, 1.55it/s, loss=0.0115, lr=0.001]
Steps: 30%|███ | 301/1000 [03:11<07:29, 1.55it/s, loss=0.00161, lr=0.001]
Steps: 30%|███ | 302/1000 [03:12<07:25, 1.57it/s, loss=0.00161, lr=0.001]
tensor(0.0020, device='cuda:0')
tensor([[0.4079],
[0.4078]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4070], device='cuda:0')
Steps: 30%|███ | 302/1000 [03:12<07:25, 1.57it/s, loss=0.00208, lr=0.001]
Steps: 30%|███ | 303/1000 [03:12<07:27, 1.56it/s, loss=0.00208, lr=0.001]
Steps: 30%|███ | 303/1000 [03:12<07:27, 1.56it/s, loss=0.00318, lr=0.001]
Steps: 30%|███ | 304/1000 [03:13<07:21, 1.58it/s, loss=0.00318, lr=0.001]
tensor(0.0041, device='cuda:0')
tensor([[0.4079],
[0.4076]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4068], device='cuda:0')
Steps: 30%|███ | 304/1000 [03:13<07:21, 1.58it/s, loss=0.000203, lr=0.001]
Steps: 30%|███ | 305/1000 [03:13<07:23, 1.57it/s, loss=0.000203, lr=0.001]
Steps: 30%|███ | 305/1000 [03:13<07:23, 1.57it/s, loss=0.00242, lr=0.001]
Steps: 31%|███ | 306/1000 [03:14<07:20, 1.58it/s, loss=0.00242, lr=0.001]
tensor(0.0042, device='cuda:0')
tensor([[0.4079],
[0.4075]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4067], device='cuda:0')
Steps: 31%|███ | 306/1000 [03:14<07:20, 1.58it/s, loss=0.00278, lr=0.001]
Steps: 31%|███ | 307/1000 [03:15<07:24, 1.56it/s, loss=0.00278, lr=0.001]
Steps: 31%|███ | 307/1000 [03:15<07:24, 1.56it/s, loss=0.0105, lr=0.001]
Steps: 31%|███ | 308/1000 [03:15<07:19, 1.58it/s, loss=0.0105, lr=0.001]
tensor(0.0036, device='cuda:0')
tensor([[0.4080],
[0.4074]], device='cuda:0')
Current Norm : tensor([0.4072, 0.4066], device='cuda:0')
Steps: 31%|███ | 308/1000 [03:15<07:19, 1.58it/s, loss=0.00387, lr=0.001]
Steps: 31%|███ | 309/1000 [03:16<07:20, 1.57it/s, loss=0.00387, lr=0.001]
Steps: 31%|███ | 309/1000 [03:16<07:20, 1.57it/s, loss=0.00835, lr=0.001]
Steps: 31%|███ | 310/1000 [03:17<07:16, 1.58it/s, loss=0.00835, lr=0.001]
tensor(0.0054, device='cuda:0')
tensor([[0.4080],
[0.4073]], device='cuda:0')
Current Norm : tensor([0.4072, 0.4066], device='cuda:0')
Steps: 31%|███ | 310/1000 [03:17<07:16, 1.58it/s, loss=0.000944, lr=0.001]
Steps: 31%|███ | 311/1000 [03:17<07:21, 1.56it/s, loss=0.000944, lr=0.001]
Steps: 31%|███ | 311/1000 [03:17<07:21, 1.56it/s, loss=0.0117, lr=0.001]
Steps: 31%|███ | 312/1000 [03:18<07:17, 1.57it/s, loss=0.0117, lr=0.001]
tensor(0.0035, device='cuda:0')
tensor([[0.4079],
[0.4073]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4066], device='cuda:0')
Steps: 31%|███ | 312/1000 [03:18<07:17, 1.57it/s, loss=0.0103, lr=0.001]
Steps: 31%|███▏ | 313/1000 [03:19<07:18, 1.57it/s, loss=0.0103, lr=0.001]
Steps: 31%|███▏ | 313/1000 [03:19<07:18, 1.57it/s, loss=0.000619, lr=0.001]
Steps: 31%|███▏ | 314/1000 [03:19<07:14, 1.58it/s, loss=0.000619, lr=0.001]
tensor(0.0091, device='cuda:0')
tensor([[0.4079],
[0.4073]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4066], device='cuda:0')
Steps: 31%|███▏ | 314/1000 [03:19<07:14, 1.58it/s, loss=0.0259, lr=0.001]
Steps: 32%|███▏ | 315/1000 [03:20<07:15, 1.57it/s, loss=0.0259, lr=0.001]
Steps: 32%|███▏ | 315/1000 [03:20<07:15, 1.57it/s, loss=0.0223, lr=0.001]
Steps: 32%|███▏ | 316/1000 [03:20<07:14, 1.58it/s, loss=0.0223, lr=0.001]
tensor(0.0302, device='cuda:0')
tensor([[0.4080],
[0.4074]], device='cuda:0')
Current Norm : tensor([0.4072, 0.4067], device='cuda:0')
Steps: 32%|███▏ | 316/1000 [03:20<07:14, 1.58it/s, loss=0.00669, lr=0.001]
Steps: 32%|███▏ | 317/1000 [03:21<07:15, 1.57it/s, loss=0.00669, lr=0.001]
Steps: 32%|███▏ | 317/1000 [03:21<07:15, 1.57it/s, loss=0.00348, lr=0.001]
Steps: 32%|███▏ | 318/1000 [03:22<07:10, 1.58it/s, loss=0.00348, lr=0.001]
tensor(0.0053, device='cuda:0')
tensor([[0.4084],
[0.4077]], device='cuda:0')
Current Norm : tensor([0.4076, 0.4069], device='cuda:0')
Steps: 32%|███▏ | 318/1000 [03:22<07:10, 1.58it/s, loss=0.0187, lr=0.001]
Steps: 32%|███▏ | 319/1000 [03:22<07:17, 1.56it/s, loss=0.0187, lr=0.001]
Steps: 32%|███▏ | 319/1000 [03:22<07:17, 1.56it/s, loss=0.000746, lr=0.001]
Steps: 32%|███▏ | 320/1000 [03:23<07:12, 1.57it/s, loss=0.000746, lr=0.001]
tensor(0.0077, device='cuda:0')
tensor([[0.4088],
[0.4082]], device='cuda:0')
Current Norm : tensor([0.4079, 0.4073], device='cuda:0')
Steps: 32%|███▏ | 320/1000 [03:23<07:12, 1.57it/s, loss=0.102, lr=0.001]
Steps: 32%|███▏ | 321/1000 [03:24<07:13, 1.56it/s, loss=0.102, lr=0.001]
Steps: 32%|███▏ | 321/1000 [03:24<07:13, 1.56it/s, loss=0.0553, lr=0.001]
Steps: 32%|███▏ | 322/1000 [03:24<07:08, 1.58it/s, loss=0.0553, lr=0.001]
tensor(0.0044, device='cuda:0')
tensor([[0.4093],
[0.4087]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4078], device='cuda:0')
Steps: 32%|███▏ | 322/1000 [03:24<07:08, 1.58it/s, loss=0.00572, lr=0.001]
Steps: 32%|███▏ | 323/1000 [03:25<07:10, 1.57it/s, loss=0.00572, lr=0.001]
Steps: 32%|███▏ | 323/1000 [03:25<07:10, 1.57it/s, loss=0.0152, lr=0.001]
Steps: 32%|███▏ | 324/1000 [03:26<07:05, 1.59it/s, loss=0.0152, lr=0.001]
tensor(0.0066, device='cuda:0')
tensor([[0.4098],
[0.4092]], device='cuda:0')
Current Norm : tensor([0.4089, 0.4083], device='cuda:0')
Steps: 32%|███▏ | 324/1000 [03:26<07:05, 1.59it/s, loss=0.0164, lr=0.001]
Steps: 32%|███▎ | 325/1000 [03:26<07:10, 1.57it/s, loss=0.0164, lr=0.001]
Steps: 32%|███▎ | 325/1000 [03:26<07:10, 1.57it/s, loss=0.0122, lr=0.001]
Steps: 33%|███▎ | 326/1000 [03:27<07:05, 1.58it/s, loss=0.0122, lr=0.001]
tensor(0.0052, device='cuda:0')
tensor([[0.4104],
[0.4098]], device='cuda:0')
Current Norm : tensor([0.4093, 0.4088], device='cuda:0')
Steps: 33%|███▎ | 326/1000 [03:27<07:05, 1.58it/s, loss=0.0254, lr=0.001]
Steps: 33%|███▎ | 327/1000 [03:27<07:07, 1.57it/s, loss=0.0254, lr=0.001]
Steps: 33%|███▎ | 327/1000 [03:27<07:07, 1.57it/s, loss=0.00151, lr=0.001]
Steps: 33%|███▎ | 328/1000 [03:28<07:04, 1.58it/s, loss=0.00151, lr=0.001]
tensor(0.0049, device='cuda:0')
tensor([[0.4109],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4098, 0.4092], device='cuda:0')
Steps: 33%|███▎ | 328/1000 [03:28<07:04, 1.58it/s, loss=0.0125, lr=0.001]
Steps: 33%|███▎ | 329/1000 [03:29<07:08, 1.57it/s, loss=0.0125, lr=0.001]
Steps: 33%|███▎ | 329/1000 [03:29<07:08, 1.57it/s, loss=0.0043, lr=0.001]
Steps: 33%|███▎ | 330/1000 [03:29<07:04, 1.58it/s, loss=0.0043, lr=0.001]
tensor(0.0066, device='cuda:0')
tensor([[0.4114],
[0.4108]], device='cuda:0')
Current Norm : tensor([0.4102, 0.4097], device='cuda:0')
Steps: 33%|███▎ | 330/1000 [03:29<07:04, 1.58it/s, loss=0.0313, lr=0.001]
Steps: 33%|███▎ | 331/1000 [03:30<07:06, 1.57it/s, loss=0.0313, lr=0.001]
Steps: 33%|███▎ | 331/1000 [03:30<07:06, 1.57it/s, loss=0.0152, lr=0.001]
Steps: 33%|███▎ | 332/1000 [03:31<07:02, 1.58it/s, loss=0.0152, lr=0.001]
tensor(0.0057, device='cuda:0')
tensor([[0.4118],
[0.4112]], device='cuda:0')
Current Norm : tensor([0.4106, 0.4101], device='cuda:0')
Steps: 33%|███▎ | 332/1000 [03:31<07:02, 1.58it/s, loss=0.00426, lr=0.001]
Steps: 33%|███▎ | 333/1000 [03:31<07:06, 1.56it/s, loss=0.00426, lr=0.001]
Steps: 33%|███▎ | 333/1000 [03:31<07:06, 1.56it/s, loss=0.00462, lr=0.001]
Steps: 33%|███▎ | 334/1000 [03:32<07:02, 1.58it/s, loss=0.00462, lr=0.001]
tensor(0.0038, device='cuda:0')
tensor([[0.4121],
[0.4115]], device='cuda:0')
Current Norm : tensor([0.4109, 0.4103], device='cuda:0')
Steps: 33%|███▎ | 334/1000 [03:32<07:02, 1.58it/s, loss=0.00314, lr=0.001]
Steps: 34%|███▎ | 335/1000 [03:33<07:06, 1.56it/s, loss=0.00314, lr=0.001]
Steps: 34%|███▎ | 335/1000 [03:33<07:06, 1.56it/s, loss=0.00436, lr=0.001]
Steps: 34%|███▎ | 336/1000 [03:33<07:02, 1.57it/s, loss=0.00436, lr=0.001]
tensor(0.0030, device='cuda:0')
tensor([[0.4123],
[0.4117]], device='cuda:0')
Current Norm : tensor([0.4111, 0.4105], device='cuda:0')
Steps: 34%|███▎ | 336/1000 [03:33<07:02, 1.57it/s, loss=0.00463, lr=0.001]
Steps: 34%|███▎ | 337/1000 [03:34<07:04, 1.56it/s, loss=0.00463, lr=0.001]
Steps: 34%|███▎ | 337/1000 [03:34<07:04, 1.56it/s, loss=0.00249, lr=0.001]
Steps: 34%|███▍ | 338/1000 [03:34<06:59, 1.58it/s, loss=0.00249, lr=0.001]
tensor(0.0032, device='cuda:0')
tensor([[0.4124],
[0.4118]], device='cuda:0')
Current Norm : tensor([0.4112, 0.4106], device='cuda:0')
Steps: 34%|███▍ | 338/1000 [03:34<06:59, 1.58it/s, loss=0.00377, lr=0.001]
Steps: 34%|███▍ | 339/1000 [03:35<07:02, 1.57it/s, loss=0.00377, lr=0.001]
Steps: 34%|███▍ | 339/1000 [03:35<07:02, 1.57it/s, loss=0.00659, lr=0.001]
Steps: 34%|███▍ | 340/1000 [03:36<06:58, 1.58it/s, loss=0.00659, lr=0.001]
tensor(0.0078, device='cuda:0')
tensor([[0.4124],
[0.4118]], device='cuda:0')
Current Norm : tensor([0.4112, 0.4106], device='cuda:0')
Steps: 34%|███▍ | 340/1000 [03:36<06:58, 1.58it/s, loss=0.0136, lr=0.001]
Steps: 34%|███▍ | 341/1000 [03:36<07:00, 1.57it/s, loss=0.0136, lr=0.001]
Steps: 34%|███▍ | 341/1000 [03:36<07:00, 1.57it/s, loss=0.00102, lr=0.001]
Steps: 34%|███▍ | 342/1000 [03:37<06:56, 1.58it/s, loss=0.00102, lr=0.001]
tensor(0.0033, device='cuda:0')
tensor([[0.4124],
[0.4117]], device='cuda:0')
Current Norm : tensor([0.4112, 0.4106], device='cuda:0')
Steps: 34%|███▍ | 342/1000 [03:37<06:56, 1.58it/s, loss=0.00414, lr=0.001]
Steps: 34%|███▍ | 343/1000 [03:38<07:02, 1.55it/s, loss=0.00414, lr=0.001]
Steps: 34%|███▍ | 343/1000 [03:38<07:02, 1.55it/s, loss=0.00138, lr=0.001]
Steps: 34%|███▍ | 344/1000 [03:38<06:58, 1.57it/s, loss=0.00138, lr=0.001]
tensor(0.0063, device='cuda:0')
tensor([[0.4124],
[0.4116]], device='cuda:0')
Current Norm : tensor([0.4111, 0.4105], device='cuda:0')
Steps: 34%|███▍ | 344/1000 [03:38<06:58, 1.57it/s, loss=0.0326, lr=0.001]
Steps: 34%|███▍ | 345/1000 [03:39<07:01, 1.55it/s, loss=0.0326, lr=0.001]
Steps: 34%|███▍ | 345/1000 [03:39<07:01, 1.55it/s, loss=0.0121, lr=0.001]
Steps: 35%|███▍ | 346/1000 [03:40<06:58, 1.56it/s, loss=0.0121, lr=0.001]
tensor(0.0043, device='cuda:0')
tensor([[0.4123],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4111, 0.4103], device='cuda:0')
Steps: 35%|███▍ | 346/1000 [03:40<06:58, 1.56it/s, loss=0.00339, lr=0.001]
Steps: 35%|███▍ | 347/1000 [03:40<07:00, 1.55it/s, loss=0.00339, lr=0.001]
Steps: 35%|███▍ | 347/1000 [03:40<07:00, 1.55it/s, loss=0.000568, lr=0.001]
Steps: 35%|███▍ | 348/1000 [03:41<06:55, 1.57it/s, loss=0.000568, lr=0.001]
tensor(0.0070, device='cuda:0')
tensor([[0.4123],
[0.4112]], device='cuda:0')
Current Norm : tensor([0.4110, 0.4101], device='cuda:0')
Steps: 35%|███▍ | 348/1000 [03:41<06:55, 1.57it/s, loss=0.00185, lr=0.001]
Steps: 35%|███▍ | 349/1000 [03:42<06:58, 1.56it/s, loss=0.00185, lr=0.001]
Steps: 35%|███▍ | 349/1000 [03:42<06:58, 1.56it/s, loss=0.0526, lr=0.001]
Steps: 35%|███▌ | 350/1000 [03:42<06:54, 1.57it/s, loss=0.0526, lr=0.001]
tensor(0.0046, device='cuda:0')
tensor([[0.4122],
[0.4111]], device='cuda:0')
Current Norm : tensor([0.4110, 0.4100], device='cuda:0')
Steps: 35%|███▌ | 350/1000 [03:42<06:54, 1.57it/s, loss=0.00635, lr=0.001]
Steps: 35%|███▌ | 351/1000 [03:43<06:56, 1.56it/s, loss=0.00635, lr=0.001]
Steps: 35%|███▌ | 351/1000 [03:43<06:56, 1.56it/s, loss=0.0076, lr=0.001]
Steps: 35%|███▌ | 352/1000 [03:43<06:52, 1.57it/s, loss=0.0076, lr=0.001]
tensor(0.0046, device='cuda:0')
tensor([[0.4121],
[0.4109]], device='cuda:0')
Current Norm : tensor([0.4109, 0.4098], device='cuda:0')
Steps: 35%|███▌ | 352/1000 [03:43<06:52, 1.57it/s, loss=0.000826, lr=0.001]
Steps: 35%|███▌ | 353/1000 [03:44<06:54, 1.56it/s, loss=0.000826, lr=0.001]
Steps: 35%|███▌ | 353/1000 [03:44<06:54, 1.56it/s, loss=0.0911, lr=0.001]
Steps: 35%|███▌ | 354/1000 [03:45<06:51, 1.57it/s, loss=0.0911, lr=0.001]
tensor(0.0056, device='cuda:0')
tensor([[0.4120],
[0.4108]], device='cuda:0')
Current Norm : tensor([0.4108, 0.4097], device='cuda:0')
Steps: 35%|███▌ | 354/1000 [03:45<06:51, 1.57it/s, loss=0.0925, lr=0.001]
Steps: 36%|███▌ | 355/1000 [03:45<06:53, 1.56it/s, loss=0.0925, lr=0.001]
Steps: 36%|███▌ | 355/1000 [03:45<06:53, 1.56it/s, loss=0.00582, lr=0.001]
Steps: 36%|███▌ | 356/1000 [03:46<06:49, 1.57it/s, loss=0.00582, lr=0.001]
tensor(0.0016, device='cuda:0')
tensor([[0.4119],
[0.4107]], device='cuda:0')
Current Norm : tensor([0.4107, 0.4096], device='cuda:0')
Steps: 36%|███▌ | 356/1000 [03:46<06:49, 1.57it/s, loss=0.00329, lr=0.001]
Steps: 36%|███▌ | 357/1000 [03:47<06:51, 1.56it/s, loss=0.00329, lr=0.001]
Steps: 36%|███▌ | 357/1000 [03:47<06:51, 1.56it/s, loss=0.00696, lr=0.001]
Steps: 36%|███▌ | 358/1000 [03:47<06:48, 1.57it/s, loss=0.00696, lr=0.001]
tensor(0.0007, device='cuda:0')
tensor([[0.4118],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4106, 0.4095], device='cuda:0')
Steps: 36%|███▌ | 358/1000 [03:47<06:48, 1.57it/s, loss=0.000468, lr=0.001]
Steps: 36%|███▌ | 359/1000 [03:48<06:50, 1.56it/s, loss=0.000468, lr=0.001]
Steps: 36%|███▌ | 359/1000 [03:48<06:50, 1.56it/s, loss=0.000551, lr=0.001]
Steps: 36%|███▌ | 360/1000 [03:49<06:47, 1.57it/s, loss=0.000551, lr=0.001]
tensor(0.0034, device='cuda:0')
tensor([[0.4116],
[0.4104]], device='cuda:0')
Current Norm : tensor([0.4104, 0.4094], device='cuda:0')
Steps: 36%|███▌ | 360/1000 [03:49<06:47, 1.57it/s, loss=0.00998, lr=0.001]
Steps: 36%|███▌ | 361/1000 [03:49<06:53, 1.55it/s, loss=0.00998, lr=0.001]
Steps: 36%|███▌ | 361/1000 [03:49<06:53, 1.55it/s, loss=0.00192, lr=0.001]
Steps: 36%|███▌ | 362/1000 [03:50<06:48, 1.56it/s, loss=0.00192, lr=0.001]
tensor(0.0053, device='cuda:0')
tensor([[0.4113],
[0.4103]], device='cuda:0')
Current Norm : tensor([0.4102, 0.4093], device='cuda:0')
Steps: 36%|███▌ | 362/1000 [03:50<06:48, 1.56it/s, loss=0.0871, lr=0.001]
Steps: 36%|███▋ | 363/1000 [03:50<06:52, 1.54it/s, loss=0.0871, lr=0.001]
Steps: 36%|███▋ | 363/1000 [03:50<06:52, 1.54it/s, loss=0.0123, lr=0.001]
Steps: 36%|███▋ | 364/1000 [03:51<06:47, 1.56it/s, loss=0.0123, lr=0.001]
tensor(0.0052, device='cuda:0')
tensor([[0.4110],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4092], device='cuda:0')
Steps: 36%|███▋ | 364/1000 [03:51<06:47, 1.56it/s, loss=0.00502, lr=0.001]
Steps: 36%|███▋ | 365/1000 [03:52<06:51, 1.54it/s, loss=0.00502, lr=0.001]
Steps: 36%|███▋ | 365/1000 [03:52<06:51, 1.54it/s, loss=0.00467, lr=0.001]
Steps: 37%|███▋ | 366/1000 [03:52<06:45, 1.56it/s, loss=0.00467, lr=0.001]
tensor(0.0016, device='cuda:0')
tensor([[0.4107],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4096, 0.4091], device='cuda:0')
Steps: 37%|███▋ | 366/1000 [03:52<06:45, 1.56it/s, loss=0.00221, lr=0.001]
Steps: 37%|███▋ | 367/1000 [03:53<06:48, 1.55it/s, loss=0.00221, lr=0.001]
Steps: 37%|███▋ | 367/1000 [03:53<06:48, 1.55it/s, loss=0.0025, lr=0.001]
Steps: 37%|███▋ | 368/1000 [03:54<06:43, 1.57it/s, loss=0.0025, lr=0.001]
tensor(0.0069, device='cuda:0')
tensor([[0.4103],
[0.4100]], device='cuda:0')
Current Norm : tensor([0.4093, 0.4090], device='cuda:0')
Steps: 37%|███▋ | 368/1000 [03:54<06:43, 1.57it/s, loss=0.00391, lr=0.001]
Steps: 37%|███▋ | 369/1000 [03:54<06:47, 1.55it/s, loss=0.00391, lr=0.001]
Steps: 37%|███▋ | 369/1000 [03:54<06:47, 1.55it/s, loss=0.0193, lr=0.001]
Steps: 37%|███▋ | 370/1000 [03:55<06:42, 1.56it/s, loss=0.0193, lr=0.001]
tensor(0.0055, device='cuda:0')
tensor([[0.4100],
[0.4100]], device='cuda:0')
Current Norm : tensor([0.4090, 0.4090], device='cuda:0')
Steps: 37%|███▋ | 370/1000 [03:55<06:42, 1.56it/s, loss=0.0164, lr=0.001]
Steps: 37%|███▋ | 371/1000 [03:56<06:44, 1.56it/s, loss=0.0164, lr=0.001]
Steps: 37%|███▋ | 371/1000 [03:56<06:44, 1.56it/s, loss=0.00448, lr=0.001]
Steps: 37%|███▋ | 372/1000 [03:56<06:38, 1.57it/s, loss=0.00448, lr=0.001]
tensor(0.0049, device='cuda:0')
tensor([[0.4098],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4088, 0.4089], device='cuda:0')
Steps: 37%|███▋ | 372/1000 [03:56<06:38, 1.57it/s, loss=0.025, lr=0.001]
Steps: 37%|███▋ | 373/1000 [03:57<06:44, 1.55it/s, loss=0.025, lr=0.001]
Steps: 37%|███▋ | 373/1000 [03:57<06:44, 1.55it/s, loss=0.0114, lr=0.001]
Steps: 37%|███▋ | 374/1000 [03:58<06:39, 1.57it/s, loss=0.0114, lr=0.001]
tensor(0.0017, device='cuda:0')
tensor([[0.4096],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4089], device='cuda:0')
Steps: 37%|███▋ | 374/1000 [03:58<06:39, 1.57it/s, loss=0.00137, lr=0.001]
Steps: 38%|███▊ | 375/1000 [03:58<06:41, 1.56it/s, loss=0.00137, lr=0.001]
Steps: 38%|███▊ | 375/1000 [03:58<06:41, 1.56it/s, loss=0.00574, lr=0.001]
Steps: 38%|███▊ | 376/1000 [03:59<06:37, 1.57it/s, loss=0.00574, lr=0.001]
tensor(0.0038, device='cuda:0')
tensor([[0.4093],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4089], device='cuda:0')
Steps: 38%|███▊ | 376/1000 [03:59<06:37, 1.57it/s, loss=0.00587, lr=0.001]
Steps: 38%|███▊ | 377/1000 [03:59<06:39, 1.56it/s, loss=0.00587, lr=0.001]
Steps: 38%|███▊ | 377/1000 [03:59<06:39, 1.56it/s, loss=0.000478, lr=0.001]
Steps: 38%|███▊ | 378/1000 [04:00<06:35, 1.57it/s, loss=0.000478, lr=0.001]
tensor(0.0050, device='cuda:0')
tensor([[0.4090],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4089], device='cuda:0')
Steps: 38%|███▊ | 378/1000 [04:00<06:35, 1.57it/s, loss=0.00445, lr=0.001]
Steps: 38%|███▊ | 379/1000 [04:01<06:37, 1.56it/s, loss=0.00445, lr=0.001]
Steps: 38%|███▊ | 379/1000 [04:01<06:37, 1.56it/s, loss=0.125, lr=0.001]
Steps: 38%|███▊ | 380/1000 [04:01<06:34, 1.57it/s, loss=0.125, lr=0.001]
tensor(0.0014, device='cuda:0')
tensor([[0.4087],
[0.4098]], device='cuda:0')
Current Norm : tensor([0.4078, 0.4088], device='cuda:0')
Steps: 38%|███▊ | 380/1000 [04:01<06:34, 1.57it/s, loss=0.00104, lr=0.001]
Steps: 38%|███▊ | 381/1000 [04:02<06:36, 1.56it/s, loss=0.00104, lr=0.001]
Steps: 38%|███▊ | 381/1000 [04:02<06:36, 1.56it/s, loss=0.0174, lr=0.001]
Steps: 38%|███▊ | 382/1000 [04:03<06:35, 1.56it/s, loss=0.0174, lr=0.001]
tensor(0.0005, device='cuda:0')
tensor([[0.4084],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4076, 0.4088], device='cuda:0')
Steps: 38%|███▊ | 382/1000 [04:03<06:35, 1.56it/s, loss=0.000611, lr=0.001]
Steps: 38%|███▊ | 383/1000 [04:03<06:38, 1.55it/s, loss=0.000611, lr=0.001]
Steps: 38%|███▊ | 383/1000 [04:03<06:38, 1.55it/s, loss=0.000669, lr=0.001]
Steps: 38%|███▊ | 384/1000 [04:04<06:35, 1.56it/s, loss=0.000669, lr=0.001]
tensor(0.0072, device='cuda:0')
tensor([[0.4081],
[0.4096]], device='cuda:0')
Current Norm : tensor([0.4073, 0.4087], device='cuda:0')
Steps: 38%|███▊ | 384/1000 [04:04<06:35, 1.56it/s, loss=0.00656, lr=0.001]
Steps: 38%|███▊ | 385/1000 [04:05<06:36, 1.55it/s, loss=0.00656, lr=0.001]
Steps: 38%|███▊ | 385/1000 [04:05<06:36, 1.55it/s, loss=0.0152, lr=0.001]
Steps: 39%|███▊ | 386/1000 [04:05<06:32, 1.57it/s, loss=0.0152, lr=0.001]
tensor(0.0021, device='cuda:0')
tensor([[0.4078],
[0.4095]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4086], device='cuda:0')
Steps: 39%|███▊ | 386/1000 [04:05<06:32, 1.57it/s, loss=0.000874, lr=0.001]
Steps: 39%|███▊ | 387/1000 [04:06<06:34, 1.56it/s, loss=0.000874, lr=0.001]
Steps: 39%|███▊ | 387/1000 [04:06<06:34, 1.56it/s, loss=0.0114, lr=0.001]
Steps: 39%|███▉ | 388/1000 [04:06<06:29, 1.57it/s, loss=0.0114, lr=0.001]
tensor(0.0022, device='cuda:0')
tensor([[0.4075],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4084], device='cuda:0')
Steps: 39%|███▉ | 388/1000 [04:06<06:29, 1.57it/s, loss=0.00224, lr=0.001]
Steps: 39%|███▉ | 389/1000 [04:07<06:33, 1.55it/s, loss=0.00224, lr=0.001]
Steps: 39%|███▉ | 389/1000 [04:07<06:33, 1.55it/s, loss=0.00672, lr=0.001]
Steps: 39%|███▉ | 390/1000 [04:08<06:30, 1.56it/s, loss=0.00672, lr=0.001]
tensor(0.0092, device='cuda:0')
tensor([[0.4073],
[0.4092]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4083], device='cuda:0')
Steps: 39%|███▉ | 390/1000 [04:08<06:30, 1.56it/s, loss=0.06, lr=0.001]
Steps: 39%|███▉ | 391/1000 [04:08<06:33, 1.55it/s, loss=0.06, lr=0.001]
Steps: 39%|███▉ | 391/1000 [04:08<06:33, 1.55it/s, loss=0.0165, lr=0.001]
Steps: 39%|███▉ | 392/1000 [04:09<06:29, 1.56it/s, loss=0.0165, lr=0.001]
tensor(0.0074, device='cuda:0')
tensor([[0.4071],
[0.4092]], device='cuda:0')
Current Norm : tensor([0.4064, 0.4083], device='cuda:0')
Steps: 39%|███▉ | 392/1000 [04:09<06:29, 1.56it/s, loss=0.00418, lr=0.001]
Steps: 39%|███▉ | 393/1000 [04:10<06:32, 1.55it/s, loss=0.00418, lr=0.001]
Steps: 39%|███▉ | 393/1000 [04:10<06:32, 1.55it/s, loss=0.0117, lr=0.001]
Steps: 39%|███▉ | 394/1000 [04:10<06:27, 1.56it/s, loss=0.0117, lr=0.001]
tensor(0.0035, device='cuda:0')
tensor([[0.4070],
[0.4091]], device='cuda:0')
Current Norm : tensor([0.4063, 0.4082], device='cuda:0')
Steps: 39%|███▉ | 394/1000 [04:10<06:27, 1.56it/s, loss=0.0147, lr=0.001]
Steps: 40%|███▉ | 395/1000 [04:11<06:31, 1.55it/s, loss=0.0147, lr=0.001]
Steps: 40%|███▉ | 395/1000 [04:11<06:31, 1.55it/s, loss=0.0108, lr=0.001]
Steps: 40%|███▉ | 396/1000 [04:12<06:26, 1.56it/s, loss=0.0108, lr=0.001]
tensor(0.0041, device='cuda:0')
tensor([[0.4069],
[0.4091]], device='cuda:0')
Current Norm : tensor([0.4062, 0.4082], device='cuda:0')
Steps: 40%|███▉ | 396/1000 [04:12<06:26, 1.56it/s, loss=0.0212, lr=0.001]
Steps: 40%|███▉ | 397/1000 [04:12<06:28, 1.55it/s, loss=0.0212, lr=0.001]
Steps: 40%|███▉ | 397/1000 [04:12<06:28, 1.55it/s, loss=0.000443, lr=0.001]
Steps: 40%|███▉ | 398/1000 [04:13<06:24, 1.57it/s, loss=0.000443, lr=0.001]
tensor(0.0055, device='cuda:0')
tensor([[0.4068],
[0.4091]], device='cuda:0')
Current Norm : tensor([0.4061, 0.4082], device='cuda:0')
Steps: 40%|███▉ | 398/1000 [04:13<06:24, 1.57it/s, loss=0.0275, lr=0.001]
Steps: 40%|███▉ | 399/1000 [04:14<06:26, 1.55it/s, loss=0.0275, lr=0.001]
Steps: 40%|███▉ | 399/1000 [04:14<06:26, 1.55it/s, loss=0.00281, lr=0.001]
Steps: 40%|████ | 400/1000 [04:14<06:22, 1.57it/s, loss=0.00281, lr=0.001]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0111, -0.0023, -0.0138, -0.0236], device='cuda:0',
grad_fn=<SliceBackward0>)
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0041, -0.0082, -0.0049, 0.0073], device='cuda:0',
grad_fn=<SliceBackward0>)
Saving weights to checkpoints/step_inv_400.safetensors
tensor(0.0030, device='cuda:0')
tensor([[0.4067],
[0.4090]], device='cuda:0')
Current Norm : tensor([0.4060, 0.4081], device='cuda:0')
Steps: 40%|████ | 400/1000 [04:14<06:22, 1.57it/s, loss=0.0186, lr=0.001]
Steps: 40%|████ | 401/1000 [04:15<06:25, 1.55it/s, loss=0.0186, lr=0.001]
Steps: 40%|████ | 401/1000 [04:15<06:25, 1.55it/s, loss=0.000768, lr=0.001]
Steps: 40%|████ | 402/1000 [04:15<06:20, 1.57it/s, loss=0.000768, lr=0.001]
tensor(0.0072, device='cuda:0')
tensor([[0.4066],
[0.4088]], device='cuda:0')
Current Norm : tensor([0.4059, 0.4079], device='cuda:0')
Steps: 40%|████ | 402/1000 [04:15<06:20, 1.57it/s, loss=0.024, lr=0.001]
Steps: 40%|████ | 403/1000 [04:16<06:23, 1.56it/s, loss=0.024, lr=0.001]
Steps: 40%|████ | 403/1000 [04:16<06:23, 1.56it/s, loss=0.00256, lr=0.001]
Steps: 40%|████ | 404/1000 [04:17<06:19, 1.57it/s, loss=0.00256, lr=0.001]
tensor(0.0071, device='cuda:0')
tensor([[0.4065],
[0.4087]], device='cuda:0')
Current Norm : tensor([0.4058, 0.4078], device='cuda:0')
Steps: 40%|████ | 404/1000 [04:17<06:19, 1.57it/s, loss=0.0389, lr=0.001]
Steps: 40%|████ | 405/1000 [04:17<06:22, 1.56it/s, loss=0.0389, lr=0.001]
Steps: 40%|████ | 405/1000 [04:17<06:22, 1.56it/s, loss=0.00633, lr=0.001]
Steps: 41%|████ | 406/1000 [04:18<06:19, 1.57it/s, loss=0.00633, lr=0.001]
tensor(0.0077, device='cuda:0')
tensor([[0.4064],
[0.4085]], device='cuda:0')
Current Norm : tensor([0.4058, 0.4077], device='cuda:0')
Steps: 41%|████ | 406/1000 [04:18<06:19, 1.57it/s, loss=0.0101, lr=0.001]
Steps: 41%|████ | 407/1000 [04:19<06:21, 1.55it/s, loss=0.0101, lr=0.001]
Steps: 41%|████ | 407/1000 [04:19<06:21, 1.55it/s, loss=0.0262, lr=0.001]
Steps: 41%|████ | 408/1000 [04:19<06:18, 1.56it/s, loss=0.0262, lr=0.001]
tensor(0.0025, device='cuda:0')
tensor([[0.4063],
[0.4084]], device='cuda:0')
Current Norm : tensor([0.4057, 0.4076], device='cuda:0')
Steps: 41%|████ | 408/1000 [04:19<06:18, 1.56it/s, loss=0.0159, lr=0.001]
Steps: 41%|████ | 409/1000 [04:20<06:19, 1.56it/s, loss=0.0159, lr=0.001]
Steps: 41%|████ | 409/1000 [04:20<06:19, 1.56it/s, loss=0.00111, lr=0.001]
Steps: 41%|████ | 410/1000 [04:21<06:16, 1.57it/s, loss=0.00111, lr=0.001]
tensor(0.0054, device='cuda:0')
tensor([[0.4063],
[0.4083]], device='cuda:0')
Current Norm : tensor([0.4057, 0.4075], device='cuda:0')
Steps: 41%|████ | 410/1000 [04:21<06:16, 1.57it/s, loss=0.00813, lr=0.001]
Steps: 41%|████ | 411/1000 [04:21<06:18, 1.55it/s, loss=0.00813, lr=0.001]
Steps: 41%|████ | 411/1000 [04:21<06:18, 1.55it/s, loss=0.00135, lr=0.001]
Steps: 41%|████ | 412/1000 [04:22<06:15, 1.57it/s, loss=0.00135, lr=0.001]
tensor(0.0019, device='cuda:0')
tensor([[0.4063],
[0.4082]], device='cuda:0')
Current Norm : tensor([0.4057, 0.4074], device='cuda:0')
Steps: 41%|████ | 412/1000 [04:22<06:15, 1.57it/s, loss=0.00114, lr=0.001]
Steps: 41%|████▏ | 413/1000 [04:23<06:21, 1.54it/s, loss=0.00114, lr=0.001]
Steps: 41%|████▏ | 413/1000 [04:23<06:21, 1.54it/s, loss=0.00502, lr=0.001]
Steps: 41%|████▏ | 414/1000 [04:23<06:15, 1.56it/s, loss=0.00502, lr=0.001]
tensor(0.0065, device='cuda:0')
tensor([[0.4062],
[0.4081]], device='cuda:0')
Current Norm : tensor([0.4056, 0.4073], device='cuda:0')
Steps: 41%|████▏ | 414/1000 [04:23<06:15, 1.56it/s, loss=0.00983, lr=0.001]
Steps: 42%|████▏ | 415/1000 [04:24<06:17, 1.55it/s, loss=0.00983, lr=0.001]
Steps: 42%|████▏ | 415/1000 [04:24<06:17, 1.55it/s, loss=0.0534, lr=0.001]
Steps: 42%|████▏ | 416/1000 [04:24<06:13, 1.56it/s, loss=0.0534, lr=0.001]
tensor(0.0046, device='cuda:0')
tensor([[0.4061],
[0.4081]], device='cuda:0')
Current Norm : tensor([0.4055, 0.4073], device='cuda:0')
Steps: 42%|████▏ | 416/1000 [04:24<06:13, 1.56it/s, loss=0.0325, lr=0.001]
Steps: 42%|████▏ | 417/1000 [04:25<06:15, 1.55it/s, loss=0.0325, lr=0.001]
Steps: 42%|████▏ | 417/1000 [04:25<06:15, 1.55it/s, loss=0.000167, lr=0.001]
Steps: 42%|████▏ | 418/1000 [04:26<06:10, 1.57it/s, loss=0.000167, lr=0.001]
tensor(0.0058, device='cuda:0')
tensor([[0.4061],
[0.4081]], device='cuda:0')
Current Norm : tensor([0.4055, 0.4073], device='cuda:0')
Steps: 42%|████▏ | 418/1000 [04:26<06:10, 1.57it/s, loss=0.0166, lr=0.001]
Steps: 42%|████▏ | 419/1000 [04:26<06:12, 1.56it/s, loss=0.0166, lr=0.001]
Steps: 42%|████▏ | 419/1000 [04:26<06:12, 1.56it/s, loss=0.000417, lr=0.001]
Steps: 42%|████▏ | 420/1000 [04:27<06:08, 1.57it/s, loss=0.000417, lr=0.001]
tensor(0.0050, device='cuda:0')
tensor([[0.4060],
[0.4082]], device='cuda:0')
Current Norm : tensor([0.4054, 0.4074], device='cuda:0')
Steps: 42%|████▏ | 420/1000 [04:27<06:08, 1.57it/s, loss=0.00439, lr=0.001]
Steps: 42%|████▏ | 421/1000 [04:28<06:13, 1.55it/s, loss=0.00439, lr=0.001]
Steps: 42%|████▏ | 421/1000 [04:28<06:13, 1.55it/s, loss=0.114, lr=0.001]
Steps: 42%|████▏ | 422/1000 [04:28<06:10, 1.56it/s, loss=0.114, lr=0.001]
tensor(0.0061, device='cuda:0')
tensor([[0.4060],
[0.4083]], device='cuda:0')
Current Norm : tensor([0.4054, 0.4075], device='cuda:0')
Steps: 42%|████▏ | 422/1000 [04:28<06:10, 1.56it/s, loss=0.0405, lr=0.001]
Steps: 42%|████▏ | 423/1000 [04:29<06:11, 1.55it/s, loss=0.0405, lr=0.001]
Steps: 42%|████▏ | 423/1000 [04:29<06:11, 1.55it/s, loss=0.00279, lr=0.001]
Steps: 42%|████▏ | 424/1000 [04:30<06:07, 1.57it/s, loss=0.00279, lr=0.001]
tensor(0.0038, device='cuda:0')
tensor([[0.4060],
[0.4085]], device='cuda:0')
Current Norm : tensor([0.4054, 0.4076], device='cuda:0')
Steps: 42%|████▏ | 424/1000 [04:30<06:07, 1.57it/s, loss=0.00228, lr=0.001]
Steps: 42%|████▎ | 425/1000 [04:30<06:08, 1.56it/s, loss=0.00228, lr=0.001]
Steps: 42%|████▎ | 425/1000 [04:30<06:08, 1.56it/s, loss=0.00422, lr=0.001]
Steps: 43%|████▎ | 426/1000 [04:31<06:05, 1.57it/s, loss=0.00422, lr=0.001]
tensor(0.0081, device='cuda:0')
tensor([[0.4061],
[0.4087]], device='cuda:0')
Current Norm : tensor([0.4055, 0.4078], device='cuda:0')
Steps: 43%|████▎ | 426/1000 [04:31<06:05, 1.57it/s, loss=0.00899, lr=0.001]
Steps: 43%|████▎ | 427/1000 [04:31<06:06, 1.56it/s, loss=0.00899, lr=0.001]
Steps: 43%|████▎ | 427/1000 [04:31<06:06, 1.56it/s, loss=0.00143, lr=0.001]
Steps: 43%|████▎ | 428/1000 [04:32<06:03, 1.57it/s, loss=0.00143, lr=0.001]
tensor(0.0034, device='cuda:0')
tensor([[0.4063],
[0.4088]], device='cuda:0')
Current Norm : tensor([0.4056, 0.4079], device='cuda:0')
Steps: 43%|████▎ | 428/1000 [04:32<06:03, 1.57it/s, loss=0.00524, lr=0.001]
Steps: 43%|████▎ | 429/1000 [04:33<06:05, 1.56it/s, loss=0.00524, lr=0.001]
Steps: 43%|████▎ | 429/1000 [04:33<06:05, 1.56it/s, loss=0.00667, lr=0.001]
Steps: 43%|████▎ | 430/1000 [04:33<06:02, 1.57it/s, loss=0.00667, lr=0.001]
tensor(0.0061, device='cuda:0')
tensor([[0.4064],
[0.4089]], device='cuda:0')
Current Norm : tensor([0.4057, 0.4080], device='cuda:0')
Steps: 43%|████▎ | 430/1000 [04:33<06:02, 1.57it/s, loss=0.0314, lr=0.001]
Steps: 43%|████▎ | 431/1000 [04:34<06:05, 1.56it/s, loss=0.0314, lr=0.001]
Steps: 43%|████▎ | 431/1000 [04:34<06:05, 1.56it/s, loss=0.000126, lr=0.001]
Steps: 43%|████▎ | 432/1000 [04:35<06:01, 1.57it/s, loss=0.000126, lr=0.001]
tensor(0.0046, device='cuda:0')
tensor([[0.4065],
[0.4090]], device='cuda:0')
Current Norm : tensor([0.4059, 0.4081], device='cuda:0')
Steps: 43%|████▎ | 432/1000 [04:35<06:01, 1.57it/s, loss=0.00386, lr=0.001]
Steps: 43%|████▎ | 433/1000 [04:35<06:04, 1.56it/s, loss=0.00386, lr=0.001]
Steps: 43%|████▎ | 433/1000 [04:35<06:04, 1.56it/s, loss=0.00157, lr=0.001]
Steps: 43%|████▎ | 434/1000 [04:36<05:59, 1.57it/s, loss=0.00157, lr=0.001]
tensor(0.0071, device='cuda:0')
tensor([[0.4067],
[0.4091]], device='cuda:0')
Current Norm : tensor([0.4060, 0.4082], device='cuda:0')
Steps: 43%|████▎ | 434/1000 [04:36<05:59, 1.57it/s, loss=0.0367, lr=0.001]
Steps: 44%|████▎ | 435/1000 [04:37<06:03, 1.55it/s, loss=0.0367, lr=0.001]
Steps: 44%|████▎ | 435/1000 [04:37<06:03, 1.55it/s, loss=0.000496, lr=0.001]
Steps: 44%|████▎ | 436/1000 [04:37<06:01, 1.56it/s, loss=0.000496, lr=0.001]
tensor(0.0045, device='cuda:0')
tensor([[0.4068],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4061, 0.4083], device='cuda:0')
Steps: 44%|████▎ | 436/1000 [04:37<06:01, 1.56it/s, loss=0.00519, lr=0.001]
Steps: 44%|████▎ | 437/1000 [04:38<06:03, 1.55it/s, loss=0.00519, lr=0.001]
Steps: 44%|████▎ | 437/1000 [04:38<06:03, 1.55it/s, loss=0.000983, lr=0.001]
Steps: 44%|████▍ | 438/1000 [04:39<06:00, 1.56it/s, loss=0.000983, lr=0.001]
tensor(0.0071, device='cuda:0')
tensor([[0.4069],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4062, 0.4084], device='cuda:0')
Steps: 44%|████▍ | 438/1000 [04:39<06:00, 1.56it/s, loss=0.0276, lr=0.001]
Steps: 44%|████▍ | 439/1000 [04:39<06:01, 1.55it/s, loss=0.0276, lr=0.001]
Steps: 44%|████▍ | 439/1000 [04:39<06:01, 1.55it/s, loss=0.017, lr=0.001]
Steps: 44%|████▍ | 440/1000 [04:40<05:57, 1.57it/s, loss=0.017, lr=0.001]
tensor(0.0067, device='cuda:0')
tensor([[0.4071],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4064, 0.4085], device='cuda:0')
Steps: 44%|████▍ | 440/1000 [04:40<05:57, 1.57it/s, loss=0.0138, lr=0.001]
Steps: 44%|████▍ | 441/1000 [04:40<05:58, 1.56it/s, loss=0.0138, lr=0.001]
Steps: 44%|████▍ | 441/1000 [04:40<05:58, 1.56it/s, loss=0.083, lr=0.001]
Steps: 44%|████▍ | 442/1000 [04:41<05:55, 1.57it/s, loss=0.083, lr=0.001]
tensor(0.0102, device='cuda:0')
tensor([[0.4072],
[0.4095]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4086], device='cuda:0')
Steps: 44%|████▍ | 442/1000 [04:41<05:55, 1.57it/s, loss=0.00452, lr=0.001]
Steps: 44%|████▍ | 443/1000 [04:42<05:56, 1.56it/s, loss=0.00452, lr=0.001]
Steps: 44%|████▍ | 443/1000 [04:42<05:56, 1.56it/s, loss=0.00397, lr=0.001]
Steps: 44%|████▍ | 444/1000 [04:42<05:53, 1.57it/s, loss=0.00397, lr=0.001]
tensor(0.0063, device='cuda:0')
tensor([[0.4074],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4067, 0.4087], device='cuda:0')
Steps: 44%|████▍ | 444/1000 [04:42<05:53, 1.57it/s, loss=0.0393, lr=0.001]
Steps: 44%|████▍ | 445/1000 [04:43<05:53, 1.57it/s, loss=0.0393, lr=0.001]
Steps: 44%|████▍ | 445/1000 [04:43<05:53, 1.57it/s, loss=0.00046, lr=0.001]
Steps: 45%|████▍ | 446/1000 [04:44<05:50, 1.58it/s, loss=0.00046, lr=0.001]
tensor(0.0082, device='cuda:0')
tensor([[0.4077],
[0.4098]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4088], device='cuda:0')
Steps: 45%|████▍ | 446/1000 [04:44<05:50, 1.58it/s, loss=0.0732, lr=0.001]
Steps: 45%|████▍ | 447/1000 [04:44<05:52, 1.57it/s, loss=0.0732, lr=0.001]
Steps: 45%|████▍ | 447/1000 [04:44<05:52, 1.57it/s, loss=0.022, lr=0.001]
Steps: 45%|████▍ | 448/1000 [04:45<05:49, 1.58it/s, loss=0.022, lr=0.001]
tensor(0.0074, device='cuda:0')
tensor([[0.4079],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4089], device='cuda:0')
Steps: 45%|████▍ | 448/1000 [04:45<05:49, 1.58it/s, loss=0.0017, lr=0.001]
Steps: 45%|████▍ | 449/1000 [04:46<05:51, 1.57it/s, loss=0.0017, lr=0.001]
Steps: 45%|████▍ | 449/1000 [04:46<05:51, 1.57it/s, loss=0.0157, lr=0.001]
Steps: 45%|████▌ | 450/1000 [04:46<05:48, 1.58it/s, loss=0.0157, lr=0.001]
tensor(0.0044, device='cuda:0')
tensor([[0.4081],
[0.4100]], device='cuda:0')
Current Norm : tensor([0.4073, 0.4090], device='cuda:0')
Steps: 45%|████▌ | 450/1000 [04:46<05:48, 1.58it/s, loss=0.00657, lr=0.001]
Steps: 45%|████▌ | 451/1000 [04:47<05:51, 1.56it/s, loss=0.00657, lr=0.001]
Steps: 45%|████▌ | 451/1000 [04:47<05:51, 1.56it/s, loss=0.00209, lr=0.001]
Steps: 45%|████▌ | 452/1000 [04:47<05:48, 1.57it/s, loss=0.00209, lr=0.001]
tensor(0.0032, device='cuda:0')
tensor([[0.4083],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4074, 0.4091], device='cuda:0')
Steps: 45%|████▌ | 452/1000 [04:47<05:48, 1.57it/s, loss=0.000439, lr=0.001]
Steps: 45%|████▌ | 453/1000 [04:48<05:48, 1.57it/s, loss=0.000439, lr=0.001]
Steps: 45%|████▌ | 453/1000 [04:48<05:48, 1.57it/s, loss=0.00718, lr=0.001]
Steps: 45%|████▌ | 454/1000 [04:49<05:46, 1.58it/s, loss=0.00718, lr=0.001]
tensor(0.0054, device='cuda:0')
tensor([[0.4084],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4075, 0.4092], device='cuda:0')
Steps: 45%|████▌ | 454/1000 [04:49<05:46, 1.58it/s, loss=0.0172, lr=0.001]
Steps: 46%|████▌ | 455/1000 [04:49<05:55, 1.53it/s, loss=0.0172, lr=0.001]
Steps: 46%|████▌ | 455/1000 [04:49<05:55, 1.53it/s, loss=0.0327, lr=0.001]
Steps: 46%|████▌ | 456/1000 [04:50<05:51, 1.55it/s, loss=0.0327, lr=0.001]
tensor(0.0031, device='cuda:0')
tensor([[0.4085],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4076, 0.4092], device='cuda:0')
Steps: 46%|████▌ | 456/1000 [04:50<05:51, 1.55it/s, loss=0.00666, lr=0.001]
Steps: 46%|████▌ | 457/1000 [04:51<05:54, 1.53it/s, loss=0.00666, lr=0.001]
Steps: 46%|████▌ | 457/1000 [04:51<05:54, 1.53it/s, loss=0.00171, lr=0.001]
Steps: 46%|████▌ | 458/1000 [04:51<05:51, 1.54it/s, loss=0.00171, lr=0.001]
tensor(0.0047, device='cuda:0')
tensor([[0.4085],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4076, 0.4092], device='cuda:0')
Steps: 46%|████▌ | 458/1000 [04:51<05:51, 1.54it/s, loss=0.00249, lr=0.001]
Steps: 46%|████▌ | 459/1000 [04:52<05:54, 1.52it/s, loss=0.00249, lr=0.001]
Steps: 46%|████▌ | 459/1000 [04:52<05:54, 1.52it/s, loss=0.0129, lr=0.001]
Steps: 46%|████▌ | 460/1000 [04:53<05:50, 1.54it/s, loss=0.0129, lr=0.001]
tensor(0.0078, device='cuda:0')
tensor([[0.4085],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4077, 0.4092], device='cuda:0')
Steps: 46%|████▌ | 460/1000 [04:53<05:50, 1.54it/s, loss=0.0134, lr=0.001]
Steps: 46%|████▌ | 461/1000 [04:53<05:50, 1.54it/s, loss=0.0134, lr=0.001]
Steps: 46%|████▌ | 461/1000 [04:53<05:50, 1.54it/s, loss=0.0508, lr=0.001]
Steps: 46%|████▌ | 462/1000 [04:54<05:45, 1.56it/s, loss=0.0508, lr=0.001]
tensor(0.0030, device='cuda:0')
tensor([[0.4085],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4077, 0.4092], device='cuda:0')
Steps: 46%|████▌ | 462/1000 [04:54<05:45, 1.56it/s, loss=0.00419, lr=0.001]
Steps: 46%|████▋ | 463/1000 [04:55<05:48, 1.54it/s, loss=0.00419, lr=0.001]
Steps: 46%|████▋ | 463/1000 [04:55<05:48, 1.54it/s, loss=0.00566, lr=0.001]
Steps: 46%|████▋ | 464/1000 [04:55<05:44, 1.56it/s, loss=0.00566, lr=0.001]
tensor(0.0102, device='cuda:0')
tensor([[0.4086],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4077, 0.4091], device='cuda:0')
Steps: 46%|████▋ | 464/1000 [04:55<05:44, 1.56it/s, loss=0.00386, lr=0.001]
Steps: 46%|████▋ | 465/1000 [04:56<05:46, 1.54it/s, loss=0.00386, lr=0.001]
Steps: 46%|████▋ | 465/1000 [04:56<05:46, 1.54it/s, loss=0.00318, lr=0.001]
Steps: 47%|████▋ | 466/1000 [04:57<05:42, 1.56it/s, loss=0.00318, lr=0.001]
tensor(0.0017, device='cuda:0')
tensor([[0.4087],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4078, 0.4091], device='cuda:0')
Steps: 47%|████▋ | 466/1000 [04:57<05:42, 1.56it/s, loss=0.00886, lr=0.001]
Steps: 47%|████▋ | 467/1000 [04:57<05:42, 1.55it/s, loss=0.00886, lr=0.001]
Steps: 47%|████▋ | 467/1000 [04:57<05:42, 1.55it/s, loss=0.000229, lr=0.001]
Steps: 47%|████▋ | 468/1000 [04:58<05:38, 1.57it/s, loss=0.000229, lr=0.001]
tensor(0.0087, device='cuda:0')
tensor([[0.4088],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4079, 0.4091], device='cuda:0')
Steps: 47%|████▋ | 468/1000 [04:58<05:38, 1.57it/s, loss=0.0432, lr=0.001]
Steps: 47%|████▋ | 469/1000 [04:58<05:39, 1.56it/s, loss=0.0432, lr=0.001]
Steps: 47%|████▋ | 469/1000 [04:58<05:39, 1.56it/s, loss=0.00354, lr=0.001]
Steps: 47%|████▋ | 470/1000 [04:59<05:36, 1.58it/s, loss=0.00354, lr=0.001]
tensor(0.0054, device='cuda:0')
tensor([[0.4090],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4092], device='cuda:0')
Steps: 47%|████▋ | 470/1000 [04:59<05:36, 1.58it/s, loss=0.00289, lr=0.001]
Steps: 47%|████▋ | 471/1000 [05:00<05:38, 1.56it/s, loss=0.00289, lr=0.001]
Steps: 47%|████▋ | 471/1000 [05:00<05:38, 1.56it/s, loss=0.077, lr=0.001]
Steps: 47%|████▋ | 472/1000 [05:00<05:34, 1.58it/s, loss=0.077, lr=0.001]
tensor(0.0020, device='cuda:0')
tensor([[0.4092],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4092], device='cuda:0')
Steps: 47%|████▋ | 472/1000 [05:00<05:34, 1.58it/s, loss=0.00646, lr=0.001]
Steps: 47%|████▋ | 473/1000 [05:01<05:35, 1.57it/s, loss=0.00646, lr=0.001]
Steps: 47%|████▋ | 473/1000 [05:01<05:35, 1.57it/s, loss=0.00128, lr=0.001]
Steps: 47%|████▋ | 474/1000 [05:02<05:32, 1.58it/s, loss=0.00128, lr=0.001]
tensor(0.0033, device='cuda:0')
tensor([[0.4094],
[0.4103]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4092], device='cuda:0')
Steps: 47%|████▋ | 474/1000 [05:02<05:32, 1.58it/s, loss=0.0353, lr=0.001]
Steps: 48%|████▊ | 475/1000 [05:02<05:36, 1.56it/s, loss=0.0353, lr=0.001]
Steps: 48%|████▊ | 475/1000 [05:02<05:36, 1.56it/s, loss=0.0144, lr=0.001]
Steps: 48%|████▊ | 476/1000 [05:03<05:33, 1.57it/s, loss=0.0144, lr=0.001]
tensor(0.0074, device='cuda:0')
tensor([[0.4096],
[0.4103]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4092], device='cuda:0')
Steps: 48%|████▊ | 476/1000 [05:03<05:33, 1.57it/s, loss=0.0618, lr=0.001]
Steps: 48%|████▊ | 477/1000 [05:04<05:35, 1.56it/s, loss=0.0618, lr=0.001]
Steps: 48%|████▊ | 477/1000 [05:04<05:35, 1.56it/s, loss=0.00403, lr=0.001]
Steps: 48%|████▊ | 478/1000 [05:04<05:32, 1.57it/s, loss=0.00403, lr=0.001]
tensor(0.0045, device='cuda:0')
tensor([[0.4097],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4087, 0.4092], device='cuda:0')
Steps: 48%|████▊ | 478/1000 [05:04<05:32, 1.57it/s, loss=0.00982, lr=0.001]
Steps: 48%|████▊ | 479/1000 [05:05<05:35, 1.55it/s, loss=0.00982, lr=0.001]
Steps: 48%|████▊ | 479/1000 [05:05<05:35, 1.55it/s, loss=0.00697, lr=0.001]
Steps: 48%|████▊ | 480/1000 [05:05<05:31, 1.57it/s, loss=0.00697, lr=0.001]
tensor(0.0042, device='cuda:0')
tensor([[0.4098],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4088, 0.4092], device='cuda:0')
Steps: 48%|████▊ | 480/1000 [05:05<05:31, 1.57it/s, loss=0.000747, lr=0.001]
Steps: 48%|████▊ | 481/1000 [05:06<05:35, 1.55it/s, loss=0.000747, lr=0.001]
Steps: 48%|████▊ | 481/1000 [05:06<05:35, 1.55it/s, loss=0.0442, lr=0.001]
Steps: 48%|████▊ | 482/1000 [05:07<05:32, 1.56it/s, loss=0.0442, lr=0.001]
tensor(0.0087, device='cuda:0')
tensor([[0.4099],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4089, 0.4091], device='cuda:0')
Steps: 48%|████▊ | 482/1000 [05:07<05:32, 1.56it/s, loss=0.0821, lr=0.001]
Steps: 48%|████▊ | 483/1000 [05:07<05:32, 1.55it/s, loss=0.0821, lr=0.001]
Steps: 48%|████▊ | 483/1000 [05:07<05:32, 1.55it/s, loss=0.0551, lr=0.001]
Steps: 48%|████▊ | 484/1000 [05:08<05:29, 1.57it/s, loss=0.0551, lr=0.001]
tensor(0.0086, device='cuda:0')
tensor([[0.4100],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4090, 0.4092], device='cuda:0')
Steps: 48%|████▊ | 484/1000 [05:08<05:29, 1.57it/s, loss=0.0276, lr=0.001]
Steps: 48%|████▊ | 485/1000 [05:09<05:33, 1.55it/s, loss=0.0276, lr=0.001]
Steps: 48%|████▊ | 485/1000 [05:09<05:33, 1.55it/s, loss=0.049, lr=0.001]
Steps: 49%|████▊ | 486/1000 [05:09<05:29, 1.56it/s, loss=0.049, lr=0.001]
tensor(0.0021, device='cuda:0')
tensor([[0.4101],
[0.4104]], device='cuda:0')
Current Norm : tensor([0.4091, 0.4093], device='cuda:0')
Steps: 49%|████▊ | 486/1000 [05:09<05:29, 1.56it/s, loss=0.0103, lr=0.001]
Steps: 49%|████▊ | 487/1000 [05:10<05:33, 1.54it/s, loss=0.0103, lr=0.001]
Steps: 49%|████▊ | 487/1000 [05:10<05:33, 1.54it/s, loss=0.016, lr=0.001]
Steps: 49%|████▉ | 488/1000 [05:11<05:28, 1.56it/s, loss=0.016, lr=0.001]
tensor(0.0004, device='cuda:0')
tensor([[0.4102],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4092, 0.4095], device='cuda:0')
Steps: 49%|████▉ | 488/1000 [05:11<05:28, 1.56it/s, loss=0.00151, lr=0.001]
Steps: 49%|████▉ | 489/1000 [05:11<05:29, 1.55it/s, loss=0.00151, lr=0.001]
Steps: 49%|████▉ | 489/1000 [05:11<05:29, 1.55it/s, loss=0.000248, lr=0.001]
Steps: 49%|████▉ | 490/1000 [05:12<05:25, 1.57it/s, loss=0.000248, lr=0.001]
tensor(0.0058, device='cuda:0')
tensor([[0.4103],
[0.4106]], device='cuda:0')
Current Norm : tensor([0.4093, 0.4096], device='cuda:0')
Steps: 49%|████▉ | 490/1000 [05:12<05:25, 1.57it/s, loss=0.00138, lr=0.001]
Steps: 49%|████▉ | 491/1000 [05:13<05:28, 1.55it/s, loss=0.00138, lr=0.001]
Steps: 49%|████▉ | 491/1000 [05:13<05:28, 1.55it/s, loss=0.0205, lr=0.001]
Steps: 49%|████▉ | 492/1000 [05:13<05:25, 1.56it/s, loss=0.0205, lr=0.001]
tensor(0.0024, device='cuda:0')
tensor([[0.4104],
[0.4107]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4096], device='cuda:0')
Steps: 49%|████▉ | 492/1000 [05:13<05:25, 1.56it/s, loss=0.00147, lr=0.001]
Steps: 49%|████▉ | 493/1000 [05:14<05:27, 1.55it/s, loss=0.00147, lr=0.001]
Steps: 49%|████▉ | 493/1000 [05:14<05:27, 1.55it/s, loss=0.00637, lr=0.001]
Steps: 49%|████▉ | 494/1000 [05:14<05:24, 1.56it/s, loss=0.00637, lr=0.001]
tensor(0.0070, device='cuda:0')
tensor([[0.4105],
[0.4107]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4096], device='cuda:0')
Steps: 49%|████▉ | 494/1000 [05:14<05:24, 1.56it/s, loss=0.044, lr=0.001]
Steps: 50%|████▉ | 495/1000 [05:15<05:26, 1.55it/s, loss=0.044, lr=0.001]
Steps: 50%|████▉ | 495/1000 [05:15<05:26, 1.55it/s, loss=0.00497, lr=0.001]
Steps: 50%|████▉ | 496/1000 [05:16<05:22, 1.56it/s, loss=0.00497, lr=0.001]
tensor(0.0074, device='cuda:0')
tensor([[0.4105],
[0.4107]], device='cuda:0')
Current Norm : tensor([0.4095, 0.4096], device='cuda:0')
Steps: 50%|████▉ | 496/1000 [05:16<05:22, 1.56it/s, loss=0.0009, lr=0.001]
Steps: 50%|████▉ | 497/1000 [05:16<05:25, 1.54it/s, loss=0.0009, lr=0.001]
Steps: 50%|████▉ | 497/1000 [05:16<05:25, 1.54it/s, loss=0.00351, lr=0.001]
Steps: 50%|████▉ | 498/1000 [05:17<05:22, 1.56it/s, loss=0.00351, lr=0.001]
tensor(0.0069, device='cuda:0')
tensor([[0.4105],
[0.4107]], device='cuda:0')
Current Norm : tensor([0.4095, 0.4096], device='cuda:0')
Steps: 50%|████▉ | 498/1000 [05:17<05:22, 1.56it/s, loss=0.0155, lr=0.001]
Steps: 50%|████▉ | 499/1000 [05:18<05:23, 1.55it/s, loss=0.0155, lr=0.001]
Steps: 50%|████▉ | 499/1000 [05:18<05:23, 1.55it/s, loss=0.0452, lr=0.001]
Steps: 50%|█████ | 500/1000 [05:18<05:19, 1.56it/s, loss=0.0452, lr=0.001]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0147, -0.0021, -0.0056, -0.0236], device='cuda:0',
grad_fn=<SliceBackward0>)
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0072, 0.0016, -0.0013, 0.0088], device='cuda:0',
grad_fn=<SliceBackward0>)
Saving weights to checkpoints/step_inv_500.safetensors
tensor(0.0015, device='cuda:0')
tensor([[0.4105],
[0.4106]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4096], device='cuda:0')
Steps: 50%|█████ | 500/1000 [05:18<05:19, 1.56it/s, loss=0.00169, lr=0.001]
Steps: 50%|█████ | 501/1000 [05:19<05:20, 1.55it/s, loss=0.00169, lr=0.001]
Steps: 50%|█████ | 501/1000 [05:19<05:20, 1.55it/s, loss=0.000385, lr=0.001]
Steps: 50%|█████ | 502/1000 [05:20<05:18, 1.56it/s, loss=0.000385, lr=0.001]
tensor(0.0053, device='cuda:0')
tensor([[0.4104],
[0.4106]], device='cuda:0')
Current Norm : tensor([0.4093, 0.4095], device='cuda:0')
Steps: 50%|█████ | 502/1000 [05:20<05:18, 1.56it/s, loss=0.004, lr=0.001]
Steps: 50%|█████ | 503/1000 [05:20<05:20, 1.55it/s, loss=0.004, lr=0.001]
Steps: 50%|█████ | 503/1000 [05:20<05:20, 1.55it/s, loss=0.0547, lr=0.001]
Steps: 50%|█████ | 504/1000 [05:21<05:17, 1.56it/s, loss=0.0547, lr=0.001]
tensor(0.0032, device='cuda:0')
tensor([[0.4103],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4092, 0.4094], device='cuda:0')
Steps: 50%|█████ | 504/1000 [05:21<05:17, 1.56it/s, loss=0.000724, lr=0.001]
Steps: 50%|█████ | 505/1000 [05:22<05:17, 1.56it/s, loss=0.000724, lr=0.001]
Steps: 50%|█████ | 505/1000 [05:22<05:17, 1.56it/s, loss=0.00276, lr=0.001]
Steps: 51%|█████ | 506/1000 [05:22<05:14, 1.57it/s, loss=0.00276, lr=0.001]
tensor(0.0009, device='cuda:0')
tensor([[0.4101],
[0.4104]], device='cuda:0')
Current Norm : tensor([0.4091, 0.4093], device='cuda:0')
Steps: 51%|█████ | 506/1000 [05:22<05:14, 1.57it/s, loss=0.000694, lr=0.001]
Steps: 51%|█████ | 507/1000 [05:23<05:16, 1.56it/s, loss=0.000694, lr=0.001]
Steps: 51%|█████ | 507/1000 [05:23<05:16, 1.56it/s, loss=0.000989, lr=0.001]
Steps: 51%|█████ | 508/1000 [05:23<05:13, 1.57it/s, loss=0.000989, lr=0.001]
tensor(0.0065, device='cuda:0')
tensor([[0.4100],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4090, 0.4092], device='cuda:0')
Steps: 51%|█████ | 508/1000 [05:23<05:13, 1.57it/s, loss=0.00329, lr=0.001]
Steps: 51%|█████ | 509/1000 [05:24<05:15, 1.56it/s, loss=0.00329, lr=0.001]
Steps: 51%|█████ | 509/1000 [05:24<05:15, 1.56it/s, loss=0.0359, lr=0.001]
Steps: 51%|█████ | 510/1000 [05:25<05:11, 1.57it/s, loss=0.0359, lr=0.001]
tensor(0.0034, device='cuda:0')
tensor([[0.4098],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4088, 0.4091], device='cuda:0')
Steps: 51%|█████ | 510/1000 [05:25<05:11, 1.57it/s, loss=0.0109, lr=0.001]
Steps: 51%|█████ | 511/1000 [05:25<05:15, 1.55it/s, loss=0.0109, lr=0.001]
Steps: 51%|█████ | 511/1000 [05:25<05:15, 1.55it/s, loss=0.005, lr=0.001]
Steps: 51%|█████ | 512/1000 [05:26<05:12, 1.56it/s, loss=0.005, lr=0.001]
tensor(0.0047, device='cuda:0')
tensor([[0.4096],
[0.4100]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4090], device='cuda:0')
Steps: 51%|█████ | 512/1000 [05:26<05:12, 1.56it/s, loss=0.152, lr=0.001]
Steps: 51%|█████▏ | 513/1000 [05:27<05:17, 1.54it/s, loss=0.152, lr=0.001]
Steps: 51%|█████▏ | 513/1000 [05:27<05:17, 1.54it/s, loss=0.00354, lr=0.001]
Steps: 51%|█████▏ | 514/1000 [05:27<05:12, 1.56it/s, loss=0.00354, lr=0.001]
tensor(0.0086, device='cuda:0')
tensor([[0.4094],
[0.4098]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4088], device='cuda:0')
Steps: 51%|█████▏ | 514/1000 [05:27<05:12, 1.56it/s, loss=0.0389, lr=0.001]
Steps: 52%|█████▏ | 515/1000 [05:28<05:13, 1.54it/s, loss=0.0389, lr=0.001]
Steps: 52%|█████▏ | 515/1000 [05:28<05:13, 1.54it/s, loss=0.0244, lr=0.001]
Steps: 52%|█████▏ | 516/1000 [05:29<05:11, 1.56it/s, loss=0.0244, lr=0.001]
tensor(0.0010, device='cuda:0')
tensor([[0.4093],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4087], device='cuda:0')
Steps: 52%|█████▏ | 516/1000 [05:29<05:11, 1.56it/s, loss=0.0105, lr=0.001]
Steps: 52%|█████▏ | 517/1000 [05:29<05:12, 1.55it/s, loss=0.0105, lr=0.001]
Steps: 52%|█████▏ | 517/1000 [05:29<05:12, 1.55it/s, loss=0.000476, lr=0.001]
Steps: 52%|█████▏ | 518/1000 [05:30<05:10, 1.55it/s, loss=0.000476, lr=0.001]
tensor(0.0080, device='cuda:0')
tensor([[0.4091],
[0.4095]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4086], device='cuda:0')
Steps: 52%|█████▏ | 518/1000 [05:30<05:10, 1.55it/s, loss=0.00252, lr=0.001]
Steps: 52%|█████▏ | 519/1000 [05:31<05:11, 1.55it/s, loss=0.00252, lr=0.001]
Steps: 52%|█████▏ | 519/1000 [05:31<05:11, 1.55it/s, loss=0.015, lr=0.001]
Steps: 52%|█████▏ | 520/1000 [05:31<05:06, 1.56it/s, loss=0.015, lr=0.001]
tensor(0.0011, device='cuda:0')
tensor([[0.4090],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4085], device='cuda:0')
Steps: 52%|█████▏ | 520/1000 [05:31<05:06, 1.56it/s, loss=0.000183, lr=0.001]
Steps: 52%|█████▏ | 521/1000 [05:32<05:09, 1.55it/s, loss=0.000183, lr=0.001]
Steps: 52%|█████▏ | 521/1000 [05:32<05:09, 1.55it/s, loss=0.00404, lr=0.001]
Steps: 52%|█████▏ | 522/1000 [05:32<05:06, 1.56it/s, loss=0.00404, lr=0.001]
tensor(0.0048, device='cuda:0')
tensor([[0.4088],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4079, 0.4084], device='cuda:0')
Steps: 52%|█████▏ | 522/1000 [05:32<05:06, 1.56it/s, loss=0.00752, lr=0.001]
Steps: 52%|█████▏ | 523/1000 [05:33<05:06, 1.56it/s, loss=0.00752, lr=0.001]
Steps: 52%|█████▏ | 523/1000 [05:33<05:06, 1.56it/s, loss=0.0113, lr=0.001]
Steps: 52%|█████▏ | 524/1000 [05:34<05:02, 1.57it/s, loss=0.0113, lr=0.001]
tensor(0.0018, device='cuda:0')
tensor([[0.4086],
[0.4091]], device='cuda:0')
Current Norm : tensor([0.4078, 0.4082], device='cuda:0')
Steps: 52%|█████▏ | 524/1000 [05:34<05:02, 1.57it/s, loss=0.00317, lr=0.001]
Steps: 52%|█████▎ | 525/1000 [05:34<05:05, 1.55it/s, loss=0.00317, lr=0.001]
Steps: 52%|█████▎ | 525/1000 [05:34<05:05, 1.55it/s, loss=0.000528, lr=0.001]
Steps: 53%|█████▎ | 526/1000 [05:35<05:02, 1.57it/s, loss=0.000528, lr=0.001]
tensor(0.0029, device='cuda:0')
tensor([[0.4084],
[0.4090]], device='cuda:0')
Current Norm : tensor([0.4076, 0.4081], device='cuda:0')
Steps: 53%|█████▎ | 526/1000 [05:35<05:02, 1.57it/s, loss=0.000145, lr=0.001]
Steps: 53%|█████▎ | 527/1000 [05:36<05:03, 1.56it/s, loss=0.000145, lr=0.001]
Steps: 53%|█████▎ | 527/1000 [05:36<05:03, 1.56it/s, loss=0.0233, lr=0.001]
Steps: 53%|█████▎ | 528/1000 [05:36<04:59, 1.57it/s, loss=0.0233, lr=0.001]
tensor(0.0051, device='cuda:0')
tensor([[0.4083],
[0.4088]], device='cuda:0')
Current Norm : tensor([0.4074, 0.4080], device='cuda:0')
Steps: 53%|█████▎ | 528/1000 [05:36<04:59, 1.57it/s, loss=0.000492, lr=0.001]
Steps: 53%|█████▎ | 529/1000 [05:37<05:01, 1.56it/s, loss=0.000492, lr=0.001]
Steps: 53%|█████▎ | 529/1000 [05:37<05:01, 1.56it/s, loss=0.0405, lr=0.001]
Steps: 53%|█████▎ | 530/1000 [05:38<04:57, 1.58it/s, loss=0.0405, lr=0.001]
tensor(0.0062, device='cuda:0')
tensor([[0.4081],
[0.4087]], device='cuda:0')
Current Norm : tensor([0.4073, 0.4078], device='cuda:0')
Steps: 53%|█████▎ | 530/1000 [05:38<04:57, 1.58it/s, loss=0.00214, lr=0.001]
Steps: 53%|█████▎ | 531/1000 [05:38<04:59, 1.57it/s, loss=0.00214, lr=0.001]
Steps: 53%|█████▎ | 531/1000 [05:38<04:59, 1.57it/s, loss=0.0452, lr=0.001]
Steps: 53%|█████▎ | 532/1000 [05:39<04:56, 1.58it/s, loss=0.0452, lr=0.001]
tensor(0.0072, device='cuda:0')
tensor([[0.4080],
[0.4086]], device='cuda:0')
Current Norm : tensor([0.4072, 0.4077], device='cuda:0')
Steps: 53%|█████▎ | 532/1000 [05:39<04:56, 1.58it/s, loss=0.0139, lr=0.001]
Steps: 53%|█████▎ | 533/1000 [05:39<04:57, 1.57it/s, loss=0.0139, lr=0.001]
Steps: 53%|█████▎ | 533/1000 [05:39<04:57, 1.57it/s, loss=0.0151, lr=0.001]
Steps: 53%|█████▎ | 534/1000 [05:40<04:55, 1.58it/s, loss=0.0151, lr=0.001]
tensor(0.0065, device='cuda:0')
tensor([[0.4078],
[0.4086]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4077], device='cuda:0')
Steps: 53%|█████▎ | 534/1000 [05:40<04:55, 1.58it/s, loss=0.021, lr=0.001]
Steps: 54%|█████▎ | 535/1000 [05:41<04:57, 1.56it/s, loss=0.021, lr=0.001]
Steps: 54%|█████▎ | 535/1000 [05:41<04:57, 1.56it/s, loss=0.00169, lr=0.001]
Steps: 54%|█████▎ | 536/1000 [05:41<04:55, 1.57it/s, loss=0.00169, lr=0.001]
tensor(0.0003, device='cuda:0')
tensor([[0.4077],
[0.4085]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4077], device='cuda:0')
Steps: 54%|█████▎ | 536/1000 [05:41<04:55, 1.57it/s, loss=0.000928, lr=0.001]
Steps: 54%|█████▎ | 537/1000 [05:42<04:57, 1.56it/s, loss=0.000928, lr=0.001]
Steps: 54%|█████▎ | 537/1000 [05:42<04:57, 1.56it/s, loss=0.000183, lr=0.001]
Steps: 54%|█████▍ | 538/1000 [05:43<04:53, 1.57it/s, loss=0.000183, lr=0.001]
tensor(0.0060, device='cuda:0')
tensor([[0.4075],
[0.4085]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4077], device='cuda:0')
Steps: 54%|█████▍ | 538/1000 [05:43<04:53, 1.57it/s, loss=0.0489, lr=0.001]
Steps: 54%|█████▍ | 539/1000 [05:43<04:57, 1.55it/s, loss=0.0489, lr=0.001]
Steps: 54%|█████▍ | 539/1000 [05:43<04:57, 1.55it/s, loss=0.0065, lr=0.001]
Steps: 54%|█████▍ | 540/1000 [05:44<04:53, 1.57it/s, loss=0.0065, lr=0.001]
tensor(0.0019, device='cuda:0')
tensor([[0.4074],
[0.4085]], device='cuda:0')
Current Norm : tensor([0.4067, 0.4077], device='cuda:0')
Steps: 54%|█████▍ | 540/1000 [05:44<04:53, 1.57it/s, loss=0.00161, lr=0.001]
Steps: 54%|█████▍ | 541/1000 [05:45<04:56, 1.55it/s, loss=0.00161, lr=0.001]
Steps: 54%|█████▍ | 541/1000 [05:45<04:56, 1.55it/s, loss=0.00176, lr=0.001]
Steps: 54%|█████▍ | 542/1000 [05:45<04:53, 1.56it/s, loss=0.00176, lr=0.001]
tensor(0.0021, device='cuda:0')
tensor([[0.4073],
[0.4085]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4076], device='cuda:0')
Steps: 54%|█████▍ | 542/1000 [05:45<04:53, 1.56it/s, loss=0.00181, lr=0.001]
Steps: 54%|█████▍ | 543/1000 [05:46<04:54, 1.55it/s, loss=0.00181, lr=0.001]
Steps: 54%|█████▍ | 543/1000 [05:46<04:54, 1.55it/s, loss=0.00432, lr=0.001]
Steps: 54%|█████▍ | 544/1000 [05:47<04:51, 1.56it/s, loss=0.00432, lr=0.001]
tensor(0.0034, device='cuda:0')
tensor([[0.4072],
[0.4084]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4076], device='cuda:0')
Steps: 54%|█████▍ | 544/1000 [05:47<04:51, 1.56it/s, loss=0.00219, lr=0.001]
Steps: 55%|█████▍ | 545/1000 [05:47<04:51, 1.56it/s, loss=0.00219, lr=0.001]
Steps: 55%|█████▍ | 545/1000 [05:47<04:51, 1.56it/s, loss=0.0136, lr=0.001]
Steps: 55%|█████▍ | 546/1000 [05:48<04:49, 1.57it/s, loss=0.0136, lr=0.001]
tensor(0.0046, device='cuda:0')
tensor([[0.4071],
[0.4083]], device='cuda:0')
Current Norm : tensor([0.4064, 0.4075], device='cuda:0')
Steps: 55%|█████▍ | 546/1000 [05:48<04:49, 1.57it/s, loss=0.00109, lr=0.001]
Steps: 55%|█████▍ | 547/1000 [05:48<04:49, 1.56it/s, loss=0.00109, lr=0.001]
Steps: 55%|█████▍ | 547/1000 [05:48<04:49, 1.56it/s, loss=0.021, lr=0.001]
Steps: 55%|█████▍ | 548/1000 [05:49<04:47, 1.57it/s, loss=0.021, lr=0.001]
tensor(0.0078, device='cuda:0')
tensor([[0.4071],
[0.4081]], device='cuda:0')
Current Norm : tensor([0.4064, 0.4073], device='cuda:0')
Steps: 55%|█████▍ | 548/1000 [05:49<04:47, 1.57it/s, loss=0.016, lr=0.001]
Steps: 55%|█████▍ | 549/1000 [05:50<04:49, 1.56it/s, loss=0.016, lr=0.001]
Steps: 55%|█████▍ | 549/1000 [05:50<04:49, 1.56it/s, loss=0.00611, lr=0.001]
Steps: 55%|█████▌ | 550/1000 [05:50<04:46, 1.57it/s, loss=0.00611, lr=0.001]
tensor(0.0034, device='cuda:0')
tensor([[0.4071],
[0.4079]], device='cuda:0')
Current Norm : tensor([0.4064, 0.4071], device='cuda:0')
Steps: 55%|█████▌ | 550/1000 [05:50<04:46, 1.57it/s, loss=0.00452, lr=0.001]
Steps: 55%|█████▌ | 551/1000 [05:51<04:47, 1.56it/s, loss=0.00452, lr=0.001]
Steps: 55%|█████▌ | 551/1000 [05:51<04:47, 1.56it/s, loss=0.00135, lr=0.001]
Steps: 55%|█████▌ | 552/1000 [05:52<04:47, 1.56it/s, loss=0.00135, lr=0.001]
tensor(0.0008, device='cuda:0')
tensor([[0.4071],
[0.4077]], device='cuda:0')
Current Norm : tensor([0.4064, 0.4069], device='cuda:0')
Steps: 55%|█████▌ | 552/1000 [05:52<04:47, 1.56it/s, loss=0.000583, lr=0.001]
Steps: 55%|█████▌ | 553/1000 [05:52<04:48, 1.55it/s, loss=0.000583, lr=0.001]
Steps: 55%|█████▌ | 553/1000 [05:52<04:48, 1.55it/s, loss=0.000458, lr=0.001]
Steps: 55%|█████▌ | 554/1000 [05:53<04:46, 1.56it/s, loss=0.000458, lr=0.001]
tensor(0.0056, device='cuda:0')
tensor([[0.4072],
[0.4075]], device='cuda:0')
Current Norm : tensor([0.4064, 0.4068], device='cuda:0')
Steps: 55%|█████▌ | 554/1000 [05:53<04:46, 1.56it/s, loss=0.00163, lr=0.001]
Steps: 56%|█████▌ | 555/1000 [05:54<04:47, 1.55it/s, loss=0.00163, lr=0.001]
Steps: 56%|█████▌ | 555/1000 [05:54<04:47, 1.55it/s, loss=0.076, lr=0.001]
Steps: 56%|█████▌ | 556/1000 [05:54<04:44, 1.56it/s, loss=0.076, lr=0.001]
tensor(0.0030, device='cuda:0')
tensor([[0.4072],
[0.4074]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4066], device='cuda:0')
Steps: 56%|█████▌ | 556/1000 [05:54<04:44, 1.56it/s, loss=0.00438, lr=0.001]
Steps: 56%|█████▌ | 557/1000 [05:55<04:45, 1.55it/s, loss=0.00438, lr=0.001]
Steps: 56%|█████▌ | 557/1000 [05:55<04:45, 1.55it/s, loss=0.00992, lr=0.001]
Steps: 56%|█████▌ | 558/1000 [05:55<04:41, 1.57it/s, loss=0.00992, lr=0.001]
tensor(0.0045, device='cuda:0')
tensor([[0.4072],
[0.4073]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4066], device='cuda:0')
Steps: 56%|█████▌ | 558/1000 [05:55<04:41, 1.57it/s, loss=0.00724, lr=0.001]
Steps: 56%|█████▌ | 559/1000 [05:56<04:42, 1.56it/s, loss=0.00724, lr=0.001]
Steps: 56%|█████▌ | 559/1000 [05:56<04:42, 1.56it/s, loss=0.0981, lr=0.001]
Steps: 56%|█████▌ | 560/1000 [05:57<04:39, 1.58it/s, loss=0.0981, lr=0.001]
tensor(0.0098, device='cuda:0')
tensor([[0.4072],
[0.4073]], device='cuda:0')
Current Norm : tensor([0.4064, 0.4066], device='cuda:0')
Steps: 56%|█████▌ | 560/1000 [05:57<04:39, 1.58it/s, loss=0.0443, lr=0.001]
Steps: 56%|█████▌ | 561/1000 [05:57<04:40, 1.56it/s, loss=0.0443, lr=0.001]
Steps: 56%|█████▌ | 561/1000 [05:57<04:40, 1.56it/s, loss=0.00534, lr=0.001]
Steps: 56%|█████▌ | 562/1000 [05:58<04:37, 1.58it/s, loss=0.00534, lr=0.001]
tensor(0.0065, device='cuda:0')
tensor([[0.4072],
[0.4073]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4066], device='cuda:0')
Steps: 56%|█████▌ | 562/1000 [05:58<04:37, 1.58it/s, loss=0.000203, lr=0.001]
Steps: 56%|█████▋ | 563/1000 [05:59<04:38, 1.57it/s, loss=0.000203, lr=0.001]
Steps: 56%|█████▋ | 563/1000 [05:59<04:38, 1.57it/s, loss=0.0275, lr=0.001]
Steps: 56%|█████▋ | 564/1000 [05:59<04:36, 1.58it/s, loss=0.0275, lr=0.001]
tensor(0.0055, device='cuda:0')
tensor([[0.4073],
[0.4074]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4067], device='cuda:0')
Steps: 56%|█████▋ | 564/1000 [05:59<04:36, 1.58it/s, loss=0.0524, lr=0.001]
Steps: 56%|█████▋ | 565/1000 [06:00<04:37, 1.57it/s, loss=0.0524, lr=0.001]
Steps: 56%|█████▋ | 565/1000 [06:00<04:37, 1.57it/s, loss=0.0067, lr=0.001]
Steps: 57%|█████▋ | 566/1000 [06:01<04:34, 1.58it/s, loss=0.0067, lr=0.001]
tensor(0.0110, device='cuda:0')
tensor([[0.4073],
[0.4074]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4067], device='cuda:0')
Steps: 57%|█████▋ | 566/1000 [06:01<04:34, 1.58it/s, loss=0.00774, lr=0.001]
Steps: 57%|█████▋ | 567/1000 [06:01<04:36, 1.57it/s, loss=0.00774, lr=0.001]
Steps: 57%|█████▋ | 567/1000 [06:01<04:36, 1.57it/s, loss=0.00531, lr=0.001]
Steps: 57%|█████▋ | 568/1000 [06:02<04:32, 1.58it/s, loss=0.00531, lr=0.001]
tensor(0.0025, device='cuda:0')
tensor([[0.4073],
[0.4075]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4068], device='cuda:0')
Steps: 57%|█████▋ | 568/1000 [06:02<04:32, 1.58it/s, loss=0.00398, lr=0.001]
Steps: 57%|█████▋ | 569/1000 [06:02<04:34, 1.57it/s, loss=0.00398, lr=0.001]
Steps: 57%|█████▋ | 569/1000 [06:02<04:34, 1.57it/s, loss=0.000125, lr=0.001]
Steps: 57%|█████▋ | 570/1000 [06:03<04:32, 1.58it/s, loss=0.000125, lr=0.001]
tensor(0.0031, device='cuda:0')
tensor([[0.4073],
[0.4076]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4068], device='cuda:0')
Steps: 57%|█████▋ | 570/1000 [06:03<04:32, 1.58it/s, loss=0.00406, lr=0.001]
Steps: 57%|█████▋ | 571/1000 [06:04<04:34, 1.57it/s, loss=0.00406, lr=0.001]
Steps: 57%|█████▋ | 571/1000 [06:04<04:34, 1.57it/s, loss=0.000723, lr=0.001]
Steps: 57%|█████▋ | 572/1000 [06:04<04:31, 1.57it/s, loss=0.000723, lr=0.001]
tensor(0.0049, device='cuda:0')
tensor([[0.4072],
[0.4077]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4069], device='cuda:0')
Steps: 57%|█████▋ | 572/1000 [06:04<04:31, 1.57it/s, loss=0.0526, lr=0.001]
Steps: 57%|█████▋ | 573/1000 [06:05<04:33, 1.56it/s, loss=0.0526, lr=0.001]
Steps: 57%|█████▋ | 573/1000 [06:05<04:33, 1.56it/s, loss=0.000955, lr=0.001]
Steps: 57%|█████▋ | 574/1000 [06:06<04:30, 1.57it/s, loss=0.000955, lr=0.001]
tensor(0.0071, device='cuda:0')
tensor([[0.4071],
[0.4078]], device='cuda:0')
Current Norm : tensor([0.4064, 0.4070], device='cuda:0')
Steps: 57%|█████▋ | 574/1000 [06:06<04:30, 1.57it/s, loss=0.0284, lr=0.001]
Steps: 57%|█████▊ | 575/1000 [06:06<04:31, 1.56it/s, loss=0.0284, lr=0.001]
Steps: 57%|█████▊ | 575/1000 [06:06<04:31, 1.56it/s, loss=0.00668, lr=0.001]
Steps: 58%|█████▊ | 576/1000 [06:07<04:28, 1.58it/s, loss=0.00668, lr=0.001]
tensor(0.0083, device='cuda:0')
tensor([[0.4070],
[0.4079]], device='cuda:0')
Current Norm : tensor([0.4063, 0.4071], device='cuda:0')
Steps: 58%|█████▊ | 576/1000 [06:07<04:28, 1.58it/s, loss=0.0343, lr=0.001]
Steps: 58%|█████▊ | 577/1000 [06:08<04:29, 1.57it/s, loss=0.0343, lr=0.001]
Steps: 58%|█████▊ | 577/1000 [06:08<04:29, 1.57it/s, loss=0.0111, lr=0.001]
Steps: 58%|█████▊ | 578/1000 [06:08<04:26, 1.58it/s, loss=0.0111, lr=0.001]
tensor(0.0077, device='cuda:0')
tensor([[0.4070],
[0.4081]], device='cuda:0')
Current Norm : tensor([0.4063, 0.4073], device='cuda:0')
Steps: 58%|█████▊ | 578/1000 [06:08<04:26, 1.58it/s, loss=0.0138, lr=0.001]
Steps: 58%|█████▊ | 579/1000 [06:09<04:28, 1.57it/s, loss=0.0138, lr=0.001]
Steps: 58%|█████▊ | 579/1000 [06:09<04:28, 1.57it/s, loss=0.0497, lr=0.001]
Steps: 58%|█████▊ | 580/1000 [06:09<04:26, 1.58it/s, loss=0.0497, lr=0.001]
tensor(0.0063, device='cuda:0')
tensor([[0.4071],
[0.4083]], device='cuda:0')
Current Norm : tensor([0.4064, 0.4075], device='cuda:0')
Steps: 58%|█████▊ | 580/1000 [06:09<04:26, 1.58it/s, loss=0.000655, lr=0.001]
Steps: 58%|█████▊ | 581/1000 [06:10<04:27, 1.57it/s, loss=0.000655, lr=0.001]
Steps: 58%|█████▊ | 581/1000 [06:10<04:27, 1.57it/s, loss=0.013, lr=0.001]
Steps: 58%|█████▊ | 582/1000 [06:11<04:24, 1.58it/s, loss=0.013, lr=0.001]
tensor(0.0032, device='cuda:0')
tensor([[0.4072],
[0.4085]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4077], device='cuda:0')
Steps: 58%|█████▊ | 582/1000 [06:11<04:24, 1.58it/s, loss=0.0135, lr=0.001]
Steps: 58%|█████▊ | 583/1000 [06:11<04:26, 1.56it/s, loss=0.0135, lr=0.001]
Steps: 58%|█████▊ | 583/1000 [06:11<04:26, 1.56it/s, loss=0.0109, lr=0.001]
Steps: 58%|█████▊ | 584/1000 [06:12<04:24, 1.57it/s, loss=0.0109, lr=0.001]
tensor(0.0014, device='cuda:0')
tensor([[0.4073],
[0.4088]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4079], device='cuda:0')
Steps: 58%|█████▊ | 584/1000 [06:12<04:24, 1.57it/s, loss=0.00615, lr=0.001]
Steps: 58%|█████▊ | 585/1000 [06:13<04:25, 1.56it/s, loss=0.00615, lr=0.001]
Steps: 58%|█████▊ | 585/1000 [06:13<04:25, 1.56it/s, loss=0.0127, lr=0.001]
Steps: 59%|█████▊ | 586/1000 [06:13<04:22, 1.57it/s, loss=0.0127, lr=0.001]
tensor(0.0024, device='cuda:0')
tensor([[0.4074],
[0.4090]], device='cuda:0')
Current Norm : tensor([0.4067, 0.4081], device='cuda:0')
Steps: 59%|█████▊ | 586/1000 [06:13<04:22, 1.57it/s, loss=0.000324, lr=0.001]
Steps: 59%|█████▊ | 587/1000 [06:14<04:24, 1.56it/s, loss=0.000324, lr=0.001]
Steps: 59%|█████▊ | 587/1000 [06:14<04:24, 1.56it/s, loss=0.013, lr=0.001]
Steps: 59%|█████▉ | 588/1000 [06:15<04:21, 1.57it/s, loss=0.013, lr=0.001]
tensor(0.0055, device='cuda:0')
tensor([[0.4075],
[0.4092]], device='cuda:0')
Current Norm : tensor([0.4067, 0.4083], device='cuda:0')
Steps: 59%|█████▉ | 588/1000 [06:15<04:21, 1.57it/s, loss=0.00368, lr=0.001]
Steps: 59%|█████▉ | 589/1000 [06:15<04:23, 1.56it/s, loss=0.00368, lr=0.001]
Steps: 59%|█████▉ | 589/1000 [06:15<04:23, 1.56it/s, loss=0.00373, lr=0.001]
Steps: 59%|█████▉ | 590/1000 [06:16<04:20, 1.57it/s, loss=0.00373, lr=0.001]
tensor(0.0051, device='cuda:0')
tensor([[0.4076],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4084], device='cuda:0')
Steps: 59%|█████▉ | 590/1000 [06:16<04:20, 1.57it/s, loss=0.00137, lr=0.001]
Steps: 59%|█████▉ | 591/1000 [06:16<04:20, 1.57it/s, loss=0.00137, lr=0.001]
Steps: 59%|█████▉ | 591/1000 [06:16<04:20, 1.57it/s, loss=0.0131, lr=0.001]
Steps: 59%|█████▉ | 592/1000 [06:17<04:18, 1.58it/s, loss=0.0131, lr=0.001]
tensor(0.0034, device='cuda:0')
tensor([[0.4076],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4085], device='cuda:0')
Steps: 59%|█████▉ | 592/1000 [06:17<04:18, 1.58it/s, loss=0.014, lr=0.001]
Steps: 59%|█████▉ | 593/1000 [06:18<04:19, 1.57it/s, loss=0.014, lr=0.001]
Steps: 59%|█████▉ | 593/1000 [06:18<04:19, 1.57it/s, loss=0.000151, lr=0.001]
Steps: 59%|█████▉ | 594/1000 [06:18<04:16, 1.58it/s, loss=0.000151, lr=0.001]
tensor(0.0055, device='cuda:0')
tensor([[0.4077],
[0.4095]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4085], device='cuda:0')
Steps: 59%|█████▉ | 594/1000 [06:18<04:16, 1.58it/s, loss=0.0421, lr=0.001]
Steps: 60%|█████▉ | 595/1000 [06:19<04:18, 1.57it/s, loss=0.0421, lr=0.001]
Steps: 60%|█████▉ | 595/1000 [06:19<04:18, 1.57it/s, loss=0.000933, lr=0.001]
Steps: 60%|█████▉ | 596/1000 [06:20<04:16, 1.57it/s, loss=0.000933, lr=0.001]
tensor(0.0031, device='cuda:0')
tensor([[0.4077],
[0.4096]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4086], device='cuda:0')
Steps: 60%|█████▉ | 596/1000 [06:20<04:16, 1.57it/s, loss=0.00709, lr=0.001]
Steps: 60%|█████▉ | 597/1000 [06:20<04:19, 1.55it/s, loss=0.00709, lr=0.001]
Steps: 60%|█████▉ | 597/1000 [06:20<04:19, 1.55it/s, loss=0.000454, lr=0.001]
Steps: 60%|█████▉ | 598/1000 [06:21<04:15, 1.57it/s, loss=0.000454, lr=0.001]
tensor(0.0058, device='cuda:0')
tensor([[0.4078],
[0.4096]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4087], device='cuda:0')
Steps: 60%|█████▉ | 598/1000 [06:21<04:15, 1.57it/s, loss=0.0658, lr=0.001]
Steps: 60%|█████▉ | 599/1000 [06:22<04:17, 1.56it/s, loss=0.0658, lr=0.001]
Steps: 60%|█████▉ | 599/1000 [06:22<04:17, 1.56it/s, loss=0.00331, lr=0.001]
Steps: 60%|██████ | 600/1000 [06:22<04:14, 1.57it/s, loss=0.00331, lr=0.001]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0173, -0.0068, -0.0072, -0.0267], device='cuda:0',
grad_fn=<SliceBackward0>)
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0088, 0.0018, -0.0002, 0.0101], device='cuda:0',
grad_fn=<SliceBackward0>)
Saving weights to checkpoints/step_inv_600.safetensors
tensor(0.0059, device='cuda:0')
tensor([[0.4078],
[0.4096]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4086], device='cuda:0')
Steps: 60%|██████ | 600/1000 [06:22<04:14, 1.57it/s, loss=0.000187, lr=0.001]
Steps: 60%|██████ | 601/1000 [06:23<04:17, 1.55it/s, loss=0.000187, lr=0.001]
Steps: 60%|██████ | 601/1000 [06:23<04:17, 1.55it/s, loss=0.0136, lr=0.001]
Steps: 60%|██████ | 602/1000 [06:23<04:13, 1.57it/s, loss=0.0136, lr=0.001]
tensor(0.0059, device='cuda:0')
tensor([[0.4078],
[0.4095]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4086], device='cuda:0')
Steps: 60%|██████ | 602/1000 [06:23<04:13, 1.57it/s, loss=0.0572, lr=0.001]
Steps: 60%|██████ | 603/1000 [06:24<04:14, 1.56it/s, loss=0.0572, lr=0.001]
Steps: 60%|██████ | 603/1000 [06:24<04:14, 1.56it/s, loss=0.0157, lr=0.001]
Steps: 60%|██████ | 604/1000 [06:25<04:10, 1.58it/s, loss=0.0157, lr=0.001]
tensor(0.0027, device='cuda:0')
tensor([[0.4078],
[0.4095]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4085], device='cuda:0')
Steps: 60%|██████ | 604/1000 [06:25<04:10, 1.58it/s, loss=0.00394, lr=0.001]
Steps: 60%|██████ | 605/1000 [06:25<04:11, 1.57it/s, loss=0.00394, lr=0.001]
Steps: 60%|██████ | 605/1000 [06:25<04:11, 1.57it/s, loss=0.00151, lr=0.001]
Steps: 61%|██████ | 606/1000 [06:26<04:09, 1.58it/s, loss=0.00151, lr=0.001]
tensor(0.0052, device='cuda:0')
tensor([[0.4078],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4084], device='cuda:0')
Steps: 61%|██████ | 606/1000 [06:26<04:09, 1.58it/s, loss=0.0347, lr=0.001]
Steps: 61%|██████ | 607/1000 [06:27<04:11, 1.56it/s, loss=0.0347, lr=0.001]
Steps: 61%|██████ | 607/1000 [06:27<04:11, 1.56it/s, loss=0.00051, lr=0.001]
Steps: 61%|██████ | 608/1000 [06:27<04:08, 1.58it/s, loss=0.00051, lr=0.001]
tensor(0.0018, device='cuda:0')
tensor([[0.4077],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4084], device='cuda:0')
Steps: 61%|██████ | 608/1000 [06:27<04:08, 1.58it/s, loss=0.0035, lr=0.001]
Steps: 61%|██████ | 609/1000 [06:28<04:10, 1.56it/s, loss=0.0035, lr=0.001]
Steps: 61%|██████ | 609/1000 [06:28<04:10, 1.56it/s, loss=0.00137, lr=0.001]
Steps: 61%|██████ | 610/1000 [06:29<04:07, 1.58it/s, loss=0.00137, lr=0.001]
tensor(0.0033, device='cuda:0')
tensor([[0.4077],
[0.4092]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4083], device='cuda:0')
Steps: 61%|██████ | 610/1000 [06:29<04:07, 1.58it/s, loss=0.00724, lr=0.001]
Steps: 61%|██████ | 611/1000 [06:29<04:08, 1.57it/s, loss=0.00724, lr=0.001]
Steps: 61%|██████ | 611/1000 [06:29<04:08, 1.57it/s, loss=0.00916, lr=0.001]
Steps: 61%|██████ | 612/1000 [06:30<04:05, 1.58it/s, loss=0.00916, lr=0.001]
tensor(0.0041, device='cuda:0')
tensor([[0.4076],
[0.4091]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4082], device='cuda:0')
Steps: 61%|██████ | 612/1000 [06:30<04:05, 1.58it/s, loss=0.00901, lr=0.001]
Steps: 61%|██████▏ | 613/1000 [06:30<04:07, 1.56it/s, loss=0.00901, lr=0.001]
Steps: 61%|██████▏ | 613/1000 [06:30<04:07, 1.56it/s, loss=0.0118, lr=0.001]
Steps: 61%|██████▏ | 614/1000 [06:31<04:04, 1.58it/s, loss=0.0118, lr=0.001]
tensor(0.0040, device='cuda:0')
tensor([[0.4076],
[0.4090]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4081], device='cuda:0')
Steps: 61%|██████▏ | 614/1000 [06:31<04:04, 1.58it/s, loss=0.000786, lr=0.001]
Steps: 62%|██████▏ | 615/1000 [06:32<04:05, 1.57it/s, loss=0.000786, lr=0.001]
Steps: 62%|██████▏ | 615/1000 [06:32<04:05, 1.57it/s, loss=0.126, lr=0.001]
Steps: 62%|██████▏ | 616/1000 [06:32<04:03, 1.58it/s, loss=0.126, lr=0.001]
tensor(0.0037, device='cuda:0')
tensor([[0.4075],
[0.4089]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4080], device='cuda:0')
Steps: 62%|██████▏ | 616/1000 [06:32<04:03, 1.58it/s, loss=0.00252, lr=0.001]
Steps: 62%|██████▏ | 617/1000 [06:33<04:05, 1.56it/s, loss=0.00252, lr=0.001]
Steps: 62%|██████▏ | 617/1000 [06:33<04:05, 1.56it/s, loss=0.0111, lr=0.001]
Steps: 62%|██████▏ | 618/1000 [06:34<04:02, 1.58it/s, loss=0.0111, lr=0.001]
tensor(0.0063, device='cuda:0')
tensor([[0.4075],
[0.4087]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4078], device='cuda:0')
Steps: 62%|██████▏ | 618/1000 [06:34<04:02, 1.58it/s, loss=0.00777, lr=0.001]
Steps: 62%|██████▏ | 619/1000 [06:34<04:03, 1.56it/s, loss=0.00777, lr=0.001]
Steps: 62%|██████▏ | 619/1000 [06:34<04:03, 1.56it/s, loss=0.00577, lr=0.001]
Steps: 62%|██████▏ | 620/1000 [06:35<04:01, 1.57it/s, loss=0.00577, lr=0.001]
tensor(0.0078, device='cuda:0')
tensor([[0.4075],
[0.4085]], device='cuda:0')
Current Norm : tensor([0.4067, 0.4077], device='cuda:0')
Steps: 62%|██████▏ | 620/1000 [06:35<04:01, 1.57it/s, loss=0.0139, lr=0.001]
Steps: 62%|██████▏ | 621/1000 [06:36<04:02, 1.56it/s, loss=0.0139, lr=0.001]
Steps: 62%|██████▏ | 621/1000 [06:36<04:02, 1.56it/s, loss=0.0724, lr=0.001]
Steps: 62%|██████▏ | 622/1000 [06:36<04:00, 1.57it/s, loss=0.0724, lr=0.001]
tensor(0.0048, device='cuda:0')
tensor([[0.4075],
[0.4084]], device='cuda:0')
Current Norm : tensor([0.4067, 0.4075], device='cuda:0')
Steps: 62%|██████▏ | 622/1000 [06:36<04:00, 1.57it/s, loss=0.000151, lr=0.001]
Steps: 62%|██████▏ | 623/1000 [06:37<04:01, 1.56it/s, loss=0.000151, lr=0.001]
Steps: 62%|██████▏ | 623/1000 [06:37<04:01, 1.56it/s, loss=0.0327, lr=0.001]
Steps: 62%|██████▏ | 624/1000 [06:37<03:58, 1.57it/s, loss=0.0327, lr=0.001]
tensor(0.0038, device='cuda:0')
tensor([[0.4074],
[0.4082]], device='cuda:0')
Current Norm : tensor([0.4067, 0.4074], device='cuda:0')
Steps: 62%|██████▏ | 624/1000 [06:37<03:58, 1.57it/s, loss=0.00154, lr=0.001]
Steps: 62%|██████▎ | 625/1000 [06:38<04:00, 1.56it/s, loss=0.00154, lr=0.001]
Steps: 62%|██████▎ | 625/1000 [06:38<04:00, 1.56it/s, loss=0.00364, lr=0.001]
Steps: 63%|██████▎ | 626/1000 [06:39<03:57, 1.57it/s, loss=0.00364, lr=0.001]
tensor(0.0008, device='cuda:0')
tensor([[0.4074],
[0.4080]], device='cuda:0')
Current Norm : tensor([0.4067, 0.4072], device='cuda:0')
Steps: 63%|██████▎ | 626/1000 [06:39<03:57, 1.57it/s, loss=0.000339, lr=0.001]
Steps: 63%|██████▎ | 627/1000 [06:39<03:58, 1.56it/s, loss=0.000339, lr=0.001]
Steps: 63%|██████▎ | 627/1000 [06:39<03:58, 1.56it/s, loss=0.0116, lr=0.001]
Steps: 63%|██████▎ | 628/1000 [06:40<03:56, 1.57it/s, loss=0.0116, lr=0.001]
tensor(0.0045, device='cuda:0')
tensor([[0.4073],
[0.4079]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4071], device='cuda:0')
Steps: 63%|██████▎ | 628/1000 [06:40<03:56, 1.57it/s, loss=0.00402, lr=0.001]
Steps: 63%|██████▎ | 629/1000 [06:41<03:57, 1.56it/s, loss=0.00402, lr=0.001]
Steps: 63%|██████▎ | 629/1000 [06:41<03:57, 1.56it/s, loss=0.0083, lr=0.001]
Steps: 63%|██████▎ | 630/1000 [06:41<03:55, 1.57it/s, loss=0.0083, lr=0.001]
tensor(0.0057, device='cuda:0')
tensor([[0.4073],
[0.4077]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4069], device='cuda:0')
Steps: 63%|██████▎ | 630/1000 [06:41<03:55, 1.57it/s, loss=0.00442, lr=0.001]
Steps: 63%|██████▎ | 631/1000 [06:42<03:55, 1.57it/s, loss=0.00442, lr=0.001]
Steps: 63%|██████▎ | 631/1000 [06:42<03:55, 1.57it/s, loss=0.00108, lr=0.001]
Steps: 63%|██████▎ | 632/1000 [06:43<03:53, 1.58it/s, loss=0.00108, lr=0.001]
tensor(0.0056, device='cuda:0')
tensor([[0.4073],
[0.4075]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4068], device='cuda:0')
Steps: 63%|██████▎ | 632/1000 [06:43<03:53, 1.58it/s, loss=0.0475, lr=0.001]
Steps: 63%|██████▎ | 633/1000 [06:43<03:54, 1.57it/s, loss=0.0475, lr=0.001]
Steps: 63%|██████▎ | 633/1000 [06:43<03:54, 1.57it/s, loss=0.000338, lr=0.001]
Steps: 63%|██████▎ | 634/1000 [06:44<03:52, 1.57it/s, loss=0.000338, lr=0.001]
tensor(0.0017, device='cuda:0')
tensor([[0.4073],
[0.4074]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4067], device='cuda:0')
Steps: 63%|██████▎ | 634/1000 [06:44<03:52, 1.57it/s, loss=0.00117, lr=0.001]
Steps: 64%|██████▎ | 635/1000 [06:45<03:53, 1.56it/s, loss=0.00117, lr=0.001]
Steps: 64%|██████▎ | 635/1000 [06:45<03:53, 1.56it/s, loss=0.000606, lr=0.001]
Steps: 64%|██████▎ | 636/1000 [06:45<03:51, 1.57it/s, loss=0.000606, lr=0.001]
tensor(0.0010, device='cuda:0')
tensor([[0.4074],
[0.4073]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4066], device='cuda:0')
Steps: 64%|██████▎ | 636/1000 [06:45<03:51, 1.57it/s, loss=0.00112, lr=0.001]
Steps: 64%|██████▎ | 637/1000 [06:46<03:52, 1.56it/s, loss=0.00112, lr=0.001]
Steps: 64%|██████▎ | 637/1000 [06:46<03:52, 1.56it/s, loss=0.00041, lr=0.001]
Steps: 64%|██████▍ | 638/1000 [06:46<03:49, 1.57it/s, loss=0.00041, lr=0.001]
tensor(0.0074, device='cuda:0')
tensor([[0.4073],
[0.4072]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4065], device='cuda:0')
Steps: 64%|██████▍ | 638/1000 [06:46<03:49, 1.57it/s, loss=0.0321, lr=0.001]
Steps: 64%|██████▍ | 639/1000 [06:47<03:51, 1.56it/s, loss=0.0321, lr=0.001]
Steps: 64%|██████▍ | 639/1000 [06:47<03:51, 1.56it/s, loss=0.0054, lr=0.001]
Steps: 64%|██████▍ | 640/1000 [06:48<03:48, 1.58it/s, loss=0.0054, lr=0.001]
tensor(0.0047, device='cuda:0')
tensor([[0.4073],
[0.4071]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4064], device='cuda:0')
Steps: 64%|██████▍ | 640/1000 [06:48<03:48, 1.58it/s, loss=0.00463, lr=0.001]
Steps: 64%|██████▍ | 641/1000 [06:48<03:49, 1.57it/s, loss=0.00463, lr=0.001]
Steps: 64%|██████▍ | 641/1000 [06:48<03:49, 1.57it/s, loss=0.00217, lr=0.001]
Steps: 64%|██████▍ | 642/1000 [06:49<03:46, 1.58it/s, loss=0.00217, lr=0.001]
tensor(0.0053, device='cuda:0')
tensor([[0.4073],
[0.4070]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4063], device='cuda:0')
Steps: 64%|██████▍ | 642/1000 [06:49<03:46, 1.58it/s, loss=0.0276, lr=0.001]
Steps: 64%|██████▍ | 643/1000 [06:50<03:47, 1.57it/s, loss=0.0276, lr=0.001]
Steps: 64%|██████▍ | 643/1000 [06:50<03:47, 1.57it/s, loss=0.0123, lr=0.001]
Steps: 64%|██████▍ | 644/1000 [06:50<03:45, 1.58it/s, loss=0.0123, lr=0.001]
tensor(0.0077, device='cuda:0')
tensor([[0.4072],
[0.4069]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4062], device='cuda:0')
Steps: 64%|██████▍ | 644/1000 [06:50<03:45, 1.58it/s, loss=0.000329, lr=0.001]
Steps: 64%|██████▍ | 645/1000 [06:51<03:46, 1.57it/s, loss=0.000329, lr=0.001]
Steps: 64%|██████▍ | 645/1000 [06:51<03:46, 1.57it/s, loss=0.0394, lr=0.001]
Steps: 65%|██████▍ | 646/1000 [06:52<03:44, 1.58it/s, loss=0.0394, lr=0.001]
tensor(0.0112, device='cuda:0')
tensor([[0.4072],
[0.4069]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4062], device='cuda:0')
Steps: 65%|██████▍ | 646/1000 [06:52<03:44, 1.58it/s, loss=0.0201, lr=0.001]
Steps: 65%|██████▍ | 647/1000 [06:52<03:45, 1.57it/s, loss=0.0201, lr=0.001]
Steps: 65%|██████▍ | 647/1000 [06:52<03:45, 1.57it/s, loss=0.0107, lr=0.001]
Steps: 65%|██████▍ | 648/1000 [06:53<03:42, 1.58it/s, loss=0.0107, lr=0.001]
tensor(0.0023, device='cuda:0')
tensor([[0.4073],
[0.4070]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4063], device='cuda:0')
Steps: 65%|██████▍ | 648/1000 [06:53<03:42, 1.58it/s, loss=0.00166, lr=0.001]
Steps: 65%|██████▍ | 649/1000 [06:53<03:43, 1.57it/s, loss=0.00166, lr=0.001]
Steps: 65%|██████▍ | 649/1000 [06:53<03:43, 1.57it/s, loss=0.00491, lr=0.001]
Steps: 65%|██████▌ | 650/1000 [06:54<03:41, 1.58it/s, loss=0.00491, lr=0.001]
tensor(0.0040, device='cuda:0')
tensor([[0.4073],
[0.4071]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4064], device='cuda:0')
Steps: 65%|██████▌ | 650/1000 [06:54<03:41, 1.58it/s, loss=0.00333, lr=0.001]
Steps: 65%|██████▌ | 651/1000 [06:55<03:43, 1.56it/s, loss=0.00333, lr=0.001]
Steps: 65%|██████▌ | 651/1000 [06:55<03:43, 1.56it/s, loss=0.000373, lr=0.001]
Steps: 65%|██████▌ | 652/1000 [06:55<03:42, 1.57it/s, loss=0.000373, lr=0.001]
tensor(0.0067, device='cuda:0')
tensor([[0.4073],
[0.4072]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4065], device='cuda:0')
Steps: 65%|██████▌ | 652/1000 [06:55<03:42, 1.57it/s, loss=0.0434, lr=0.001]
Steps: 65%|██████▌ | 653/1000 [06:56<03:42, 1.56it/s, loss=0.0434, lr=0.001]
Steps: 65%|██████▌ | 653/1000 [06:56<03:42, 1.56it/s, loss=0.00562, lr=0.001]
Steps: 65%|██████▌ | 654/1000 [06:57<03:39, 1.57it/s, loss=0.00562, lr=0.001]
tensor(0.0015, device='cuda:0')
tensor([[0.4073],
[0.4074]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4067], device='cuda:0')
Steps: 65%|██████▌ | 654/1000 [06:57<03:39, 1.57it/s, loss=0.000348, lr=0.001]
Steps: 66%|██████▌ | 655/1000 [06:57<03:40, 1.56it/s, loss=0.000348, lr=0.001]
Steps: 66%|██████▌ | 655/1000 [06:57<03:40, 1.56it/s, loss=0.00938, lr=0.001]
Steps: 66%|██████▌ | 656/1000 [06:58<03:37, 1.58it/s, loss=0.00938, lr=0.001]
tensor(0.0103, device='cuda:0')
tensor([[0.4074],
[0.4076]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4069], device='cuda:0')
Steps: 66%|██████▌ | 656/1000 [06:58<03:37, 1.58it/s, loss=0.0624, lr=0.001]
Steps: 66%|██████▌ | 657/1000 [06:59<03:38, 1.57it/s, loss=0.0624, lr=0.001]
Steps: 66%|██████▌ | 657/1000 [06:59<03:38, 1.57it/s, loss=0.0186, lr=0.001]
Steps: 66%|██████▌ | 658/1000 [06:59<03:36, 1.58it/s, loss=0.0186, lr=0.001]
tensor(0.0079, device='cuda:0')
tensor([[0.4075],
[0.4079]], device='cuda:0')
Current Norm : tensor([0.4067, 0.4071], device='cuda:0')
Steps: 66%|██████▌ | 658/1000 [06:59<03:36, 1.58it/s, loss=0.00454, lr=0.001]
Steps: 66%|██████▌ | 659/1000 [07:00<03:37, 1.56it/s, loss=0.00454, lr=0.001]
Steps: 66%|██████▌ | 659/1000 [07:00<03:37, 1.56it/s, loss=0.0198, lr=0.001]
Steps: 66%|██████▌ | 660/1000 [07:00<03:35, 1.58it/s, loss=0.0198, lr=0.001]
tensor(0.0076, device='cuda:0')
tensor([[0.4076],
[0.4083]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4074], device='cuda:0')
Steps: 66%|██████▌ | 660/1000 [07:00<03:35, 1.58it/s, loss=0.00106, lr=0.001]
Steps: 66%|██████▌ | 661/1000 [07:01<03:37, 1.56it/s, loss=0.00106, lr=0.001]
Steps: 66%|██████▌ | 661/1000 [07:01<03:37, 1.56it/s, loss=0.0202, lr=0.001]
Steps: 66%|██████▌ | 662/1000 [07:02<03:37, 1.56it/s, loss=0.0202, lr=0.001]
tensor(0.0060, device='cuda:0')
tensor([[0.4076],
[0.4086]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4077], device='cuda:0')
Steps: 66%|██████▌ | 662/1000 [07:02<03:37, 1.56it/s, loss=0.0313, lr=0.001]
Steps: 66%|██████▋ | 663/1000 [07:02<03:38, 1.54it/s, loss=0.0313, lr=0.001]
Steps: 66%|██████▋ | 663/1000 [07:02<03:38, 1.54it/s, loss=0.00359, lr=0.001]
Steps: 66%|██████▋ | 664/1000 [07:03<03:35, 1.56it/s, loss=0.00359, lr=0.001]
tensor(0.0071, device='cuda:0')
tensor([[0.4077],
[0.4088]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4079], device='cuda:0')
Steps: 66%|██████▋ | 664/1000 [07:03<03:35, 1.56it/s, loss=0.00739, lr=0.001]
Steps: 66%|██████▋ | 665/1000 [07:04<03:36, 1.55it/s, loss=0.00739, lr=0.001]
Steps: 66%|██████▋ | 665/1000 [07:04<03:36, 1.55it/s, loss=0.0146, lr=0.001]
Steps: 67%|██████▋ | 666/1000 [07:04<03:33, 1.56it/s, loss=0.0146, lr=0.001]
tensor(0.0082, device='cuda:0')
tensor([[0.4079],
[0.4090]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4081], device='cuda:0')
Steps: 67%|██████▋ | 666/1000 [07:04<03:33, 1.56it/s, loss=0.00113, lr=0.001]
Steps: 67%|██████▋ | 667/1000 [07:05<03:33, 1.56it/s, loss=0.00113, lr=0.001]
Steps: 67%|██████▋ | 667/1000 [07:05<03:33, 1.56it/s, loss=0.0186, lr=0.001]
Steps: 67%|██████▋ | 668/1000 [07:06<03:31, 1.57it/s, loss=0.0186, lr=0.001]
tensor(0.0055, device='cuda:0')
tensor([[0.4081],
[0.4092]], device='cuda:0')
Current Norm : tensor([0.4073, 0.4082], device='cuda:0')
Steps: 67%|██████▋ | 668/1000 [07:06<03:31, 1.57it/s, loss=0.0289, lr=0.001]
Steps: 67%|██████▋ | 669/1000 [07:06<03:32, 1.56it/s, loss=0.0289, lr=0.001]
Steps: 67%|██████▋ | 669/1000 [07:06<03:32, 1.56it/s, loss=0.0101, lr=0.001]
Steps: 67%|██████▋ | 670/1000 [07:07<03:29, 1.57it/s, loss=0.0101, lr=0.001]
tensor(0.0028, device='cuda:0')
tensor([[0.4083],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4075, 0.4084], device='cuda:0')
Steps: 67%|██████▋ | 670/1000 [07:07<03:29, 1.57it/s, loss=0.00419, lr=0.001]
Steps: 67%|██████▋ | 671/1000 [07:08<03:31, 1.56it/s, loss=0.00419, lr=0.001]
Steps: 67%|██████▋ | 671/1000 [07:08<03:31, 1.56it/s, loss=0.00952, lr=0.001]
Steps: 67%|██████▋ | 672/1000 [07:08<03:28, 1.57it/s, loss=0.00952, lr=0.001]
tensor(0.0031, device='cuda:0')
tensor([[0.4086],
[0.4095]], device='cuda:0')
Current Norm : tensor([0.4077, 0.4085], device='cuda:0')
Steps: 67%|██████▋ | 672/1000 [07:08<03:28, 1.57it/s, loss=0.0168, lr=0.001]
Steps: 67%|██████▋ | 673/1000 [07:09<03:29, 1.56it/s, loss=0.0168, lr=0.001]
Steps: 67%|██████▋ | 673/1000 [07:09<03:29, 1.56it/s, loss=0.00155, lr=0.001]
Steps: 67%|██████▋ | 674/1000 [07:09<03:26, 1.58it/s, loss=0.00155, lr=0.001]
tensor(0.0086, device='cuda:0')
tensor([[0.4087],
[0.4096]], device='cuda:0')
Current Norm : tensor([0.4079, 0.4086], device='cuda:0')
Steps: 67%|██████▋ | 674/1000 [07:09<03:26, 1.58it/s, loss=0.0634, lr=0.001]
Steps: 68%|██████▊ | 675/1000 [07:10<03:28, 1.56it/s, loss=0.0634, lr=0.001]
Steps: 68%|██████▊ | 675/1000 [07:10<03:28, 1.56it/s, loss=0.0215, lr=0.001]
Steps: 68%|██████▊ | 676/1000 [07:11<03:26, 1.57it/s, loss=0.0215, lr=0.001]
tensor(0.0037, device='cuda:0')
tensor([[0.4089],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4080, 0.4087], device='cuda:0')
Steps: 68%|██████▊ | 676/1000 [07:11<03:26, 1.57it/s, loss=0.0023, lr=0.001]
Steps: 68%|██████▊ | 677/1000 [07:11<03:27, 1.56it/s, loss=0.0023, lr=0.001]
Steps: 68%|██████▊ | 677/1000 [07:11<03:27, 1.56it/s, loss=0.000175, lr=0.001]
Steps: 68%|██████▊ | 678/1000 [07:12<03:24, 1.57it/s, loss=0.000175, lr=0.001]
tensor(0.0088, device='cuda:0')
tensor([[0.4089],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4080, 0.4087], device='cuda:0')
Steps: 68%|██████▊ | 678/1000 [07:12<03:24, 1.57it/s, loss=0.00022, lr=0.001]
Steps: 68%|██████▊ | 679/1000 [07:13<03:25, 1.56it/s, loss=0.00022, lr=0.001]
Steps: 68%|██████▊ | 679/1000 [07:13<03:25, 1.56it/s, loss=0.0503, lr=0.001]
Steps: 68%|██████▊ | 680/1000 [07:13<03:23, 1.58it/s, loss=0.0503, lr=0.001]
tensor(0.0076, device='cuda:0')
tensor([[0.4090],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4088], device='cuda:0')
Steps: 68%|██████▊ | 680/1000 [07:13<03:23, 1.58it/s, loss=0.0464, lr=0.001]
Steps: 68%|██████▊ | 681/1000 [07:14<03:23, 1.56it/s, loss=0.0464, lr=0.001]
Steps: 68%|██████▊ | 681/1000 [07:14<03:23, 1.56it/s, loss=0.000616, lr=0.001]
Steps: 68%|██████▊ | 682/1000 [07:14<03:21, 1.58it/s, loss=0.000616, lr=0.001]
tensor(0.0034, device='cuda:0')
tensor([[0.4091],
[0.4098]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4088], device='cuda:0')
Steps: 68%|██████▊ | 682/1000 [07:14<03:21, 1.58it/s, loss=0.000937, lr=0.001]
Steps: 68%|██████▊ | 683/1000 [07:15<03:21, 1.57it/s, loss=0.000937, lr=0.001]
Steps: 68%|██████▊ | 683/1000 [07:15<03:21, 1.57it/s, loss=0.00264, lr=0.001]
Steps: 68%|██████▊ | 684/1000 [07:16<03:19, 1.58it/s, loss=0.00264, lr=0.001]
tensor(0.0082, device='cuda:0')
tensor([[0.4093],
[0.4098]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4088], device='cuda:0')
Steps: 68%|██████▊ | 684/1000 [07:16<03:19, 1.58it/s, loss=0.068, lr=0.001]
Steps: 68%|██████▊ | 685/1000 [07:16<03:19, 1.58it/s, loss=0.068, lr=0.001]
Steps: 68%|██████▊ | 685/1000 [07:16<03:19, 1.58it/s, loss=0.000431, lr=0.001]
Steps: 69%|██████▊ | 686/1000 [07:17<03:18, 1.59it/s, loss=0.000431, lr=0.001]
tensor(0.0094, device='cuda:0')
tensor([[0.4094],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4089], device='cuda:0')
Steps: 69%|██████▊ | 686/1000 [07:17<03:18, 1.59it/s, loss=0.0451, lr=0.001]
Steps: 69%|██████▊ | 687/1000 [07:18<03:19, 1.57it/s, loss=0.0451, lr=0.001]
Steps: 69%|██████▊ | 687/1000 [07:18<03:19, 1.57it/s, loss=0.000414, lr=0.001]
Steps: 69%|██████▉ | 688/1000 [07:18<03:17, 1.58it/s, loss=0.000414, lr=0.001]
tensor(0.0021, device='cuda:0')
tensor([[0.4095],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4090], device='cuda:0')
Steps: 69%|██████▉ | 688/1000 [07:18<03:17, 1.58it/s, loss=0.00126, lr=0.001]
Steps: 69%|██████▉ | 689/1000 [07:19<03:18, 1.57it/s, loss=0.00126, lr=0.001]
Steps: 69%|██████▉ | 689/1000 [07:19<03:18, 1.57it/s, loss=0.000966, lr=0.001]
Steps: 69%|██████▉ | 690/1000 [07:20<03:16, 1.58it/s, loss=0.000966, lr=0.001]
tensor(0.0064, device='cuda:0')
tensor([[0.4095],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4091], device='cuda:0')
Steps: 69%|██████▉ | 690/1000 [07:20<03:16, 1.58it/s, loss=0.0196, lr=0.001]
Steps: 69%|██████▉ | 691/1000 [07:20<03:17, 1.56it/s, loss=0.0196, lr=0.001]
Steps: 69%|██████▉ | 691/1000 [07:20<03:17, 1.56it/s, loss=0.000678, lr=0.001]
Steps: 69%|██████▉ | 692/1000 [07:21<03:15, 1.57it/s, loss=0.000678, lr=0.001]
tensor(0.0071, device='cuda:0')
tensor([[0.4096],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4087, 0.4092], device='cuda:0')
Steps: 69%|██████▉ | 692/1000 [07:21<03:15, 1.57it/s, loss=0.0733, lr=0.001]
Steps: 69%|██████▉ | 693/1000 [07:21<03:16, 1.56it/s, loss=0.0733, lr=0.001]
Steps: 69%|██████▉ | 693/1000 [07:21<03:16, 1.56it/s, loss=0.000746, lr=0.001]
Steps: 69%|██████▉ | 694/1000 [07:22<03:14, 1.57it/s, loss=0.000746, lr=0.001]
tensor(0.0059, device='cuda:0')
tensor([[0.4097],
[0.4103]], device='cuda:0')
Current Norm : tensor([0.4087, 0.4093], device='cuda:0')
Steps: 69%|██████▉ | 694/1000 [07:22<03:14, 1.57it/s, loss=0.0629, lr=0.001]
Steps: 70%|██████▉ | 695/1000 [07:23<03:15, 1.56it/s, loss=0.0629, lr=0.001]
Steps: 70%|██████▉ | 695/1000 [07:23<03:15, 1.56it/s, loss=0.00226, lr=0.001]
Steps: 70%|██████▉ | 696/1000 [07:23<03:13, 1.57it/s, loss=0.00226, lr=0.001]
tensor(0.0129, device='cuda:0')
tensor([[0.4097],
[0.4104]], device='cuda:0')
Current Norm : tensor([0.4087, 0.4094], device='cuda:0')
Steps: 70%|██████▉ | 696/1000 [07:23<03:13, 1.57it/s, loss=0.0491, lr=0.001]
Steps: 70%|██████▉ | 697/1000 [07:24<03:13, 1.56it/s, loss=0.0491, lr=0.001]
Steps: 70%|██████▉ | 697/1000 [07:24<03:13, 1.56it/s, loss=0.0576, lr=0.001]
Steps: 70%|██████▉ | 698/1000 [07:25<03:10, 1.58it/s, loss=0.0576, lr=0.001]
tensor(0.0081, device='cuda:0')
tensor([[0.4098],
[0.4106]], device='cuda:0')
Current Norm : tensor([0.4088, 0.4095], device='cuda:0')
Steps: 70%|██████▉ | 698/1000 [07:25<03:10, 1.58it/s, loss=0.0844, lr=0.001]
Steps: 70%|██████▉ | 699/1000 [07:25<03:11, 1.57it/s, loss=0.0844, lr=0.001]
Steps: 70%|██████▉ | 699/1000 [07:25<03:11, 1.57it/s, loss=0.00121, lr=0.001]
Steps: 70%|███████ | 700/1000 [07:26<03:09, 1.58it/s, loss=0.00121, lr=0.001]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0145, -0.0087, -0.0056, -0.0218], device='cuda:0',
grad_fn=<SliceBackward0>)
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0014, 0.0039, -0.0050, 0.0112], device='cuda:0',
grad_fn=<SliceBackward0>)
Saving weights to checkpoints/step_inv_700.safetensors
tensor(0.0117, device='cuda:0')
tensor([[0.4100],
[0.4108]], device='cuda:0')
Current Norm : tensor([0.4090, 0.4097], device='cuda:0')
Steps: 70%|███████ | 700/1000 [07:26<03:09, 1.58it/s, loss=0.00726, lr=0.001]
Steps: 70%|███████ | 701/1000 [07:27<03:12, 1.55it/s, loss=0.00726, lr=0.001]
Steps: 70%|███████ | 701/1000 [07:27<03:12, 1.55it/s, loss=0.0183, lr=0.001]
Steps: 70%|███████ | 702/1000 [07:27<03:09, 1.57it/s, loss=0.0183, lr=0.001]
tensor(0.0035, device='cuda:0')
tensor([[0.4104],
[0.4111]], device='cuda:0')
Current Norm : tensor([0.4093, 0.4100], device='cuda:0')
Steps: 70%|███████ | 702/1000 [07:27<03:09, 1.57it/s, loss=0.00239, lr=0.001]
Steps: 70%|███████ | 703/1000 [07:28<03:10, 1.56it/s, loss=0.00239, lr=0.001]
Steps: 70%|███████ | 703/1000 [07:28<03:10, 1.56it/s, loss=0.00413, lr=0.001]
Steps: 70%|███████ | 704/1000 [07:28<03:08, 1.57it/s, loss=0.00413, lr=0.001]
tensor(0.0027, device='cuda:0')
tensor([[0.4107],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4096, 0.4102], device='cuda:0')
Steps: 70%|███████ | 704/1000 [07:28<03:08, 1.57it/s, loss=0.00356, lr=0.001]
Steps: 70%|███████ | 705/1000 [07:29<03:08, 1.56it/s, loss=0.00356, lr=0.001]
Steps: 70%|███████ | 705/1000 [07:29<03:08, 1.56it/s, loss=0.0161, lr=0.001]
Steps: 71%|███████ | 706/1000 [07:30<03:07, 1.56it/s, loss=0.0161, lr=0.001]
tensor(0.0035, device='cuda:0')
tensor([[0.4111],
[0.4116]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4104], device='cuda:0')
Steps: 71%|███████ | 706/1000 [07:30<03:07, 1.56it/s, loss=0.00678, lr=0.001]
Steps: 71%|███████ | 707/1000 [07:30<03:09, 1.55it/s, loss=0.00678, lr=0.001]
Steps: 71%|███████ | 707/1000 [07:30<03:09, 1.55it/s, loss=0.0347, lr=0.001]
Steps: 71%|███████ | 708/1000 [07:31<03:06, 1.57it/s, loss=0.0347, lr=0.001]
tensor(0.0042, device='cuda:0')
tensor([[0.4114],
[0.4118]], device='cuda:0')
Current Norm : tensor([0.4102, 0.4106], device='cuda:0')
Steps: 71%|███████ | 708/1000 [07:31<03:06, 1.57it/s, loss=0.000707, lr=0.001]
Steps: 71%|███████ | 709/1000 [07:32<03:06, 1.56it/s, loss=0.000707, lr=0.001]
Steps: 71%|███████ | 709/1000 [07:32<03:06, 1.56it/s, loss=0.0312, lr=0.001]
Steps: 71%|███████ | 710/1000 [07:32<03:04, 1.58it/s, loss=0.0312, lr=0.001]
tensor(0.0064, device='cuda:0')
tensor([[0.4116],
[0.4121]], device='cuda:0')
Current Norm : tensor([0.4105, 0.4109], device='cuda:0')
Steps: 71%|███████ | 710/1000 [07:32<03:04, 1.58it/s, loss=0.0412, lr=0.001]
Steps: 71%|███████ | 711/1000 [07:33<03:05, 1.56it/s, loss=0.0412, lr=0.001]
Steps: 71%|███████ | 711/1000 [07:33<03:05, 1.56it/s, loss=0.0127, lr=0.001]
Steps: 71%|███████ | 712/1000 [07:34<03:03, 1.57it/s, loss=0.0127, lr=0.001]
tensor(0.0076, device='cuda:0')
tensor([[0.4118],
[0.4123]], device='cuda:0')
Current Norm : tensor([0.4107, 0.4111], device='cuda:0')
Steps: 71%|███████ | 712/1000 [07:34<03:03, 1.57it/s, loss=0.035, lr=0.001]
Steps: 71%|███████▏ | 713/1000 [07:34<03:04, 1.55it/s, loss=0.035, lr=0.001]
Steps: 71%|███████▏ | 713/1000 [07:34<03:04, 1.55it/s, loss=0.00396, lr=0.001]
Steps: 71%|███████▏ | 714/1000 [07:35<03:03, 1.56it/s, loss=0.00396, lr=0.001]
tensor(0.0037, device='cuda:0')
tensor([[0.4120],
[0.4124]], device='cuda:0')
Current Norm : tensor([0.4108, 0.4112], device='cuda:0')
Steps: 71%|███████▏ | 714/1000 [07:35<03:03, 1.56it/s, loss=0.0197, lr=0.001]
Steps: 72%|███████▏ | 715/1000 [07:36<03:03, 1.55it/s, loss=0.0197, lr=0.001]
Steps: 72%|███████▏ | 715/1000 [07:36<03:03, 1.55it/s, loss=0.013, lr=0.001]
Steps: 72%|███████▏ | 716/1000 [07:36<03:00, 1.57it/s, loss=0.013, lr=0.001]
tensor(0.0049, device='cuda:0')
tensor([[0.4121],
[0.4126]], device='cuda:0')
Current Norm : tensor([0.4109, 0.4113], device='cuda:0')
Steps: 72%|███████▏ | 716/1000 [07:36<03:00, 1.57it/s, loss=0.0397, lr=0.001]
Steps: 72%|███████▏ | 717/1000 [07:37<03:01, 1.56it/s, loss=0.0397, lr=0.001]
Steps: 72%|███████▏ | 717/1000 [07:37<03:01, 1.56it/s, loss=0.00506, lr=0.001]
Steps: 72%|███████▏ | 718/1000 [07:37<02:59, 1.57it/s, loss=0.00506, lr=0.001]
tensor(0.0030, device='cuda:0')
tensor([[0.4122],
[0.4126]], device='cuda:0')
Current Norm : tensor([0.4110, 0.4114], device='cuda:0')
Steps: 72%|███████▏ | 718/1000 [07:37<02:59, 1.57it/s, loss=0.198, lr=0.001]
Steps: 72%|███████▏ | 719/1000 [07:38<03:00, 1.56it/s, loss=0.198, lr=0.001]
Steps: 72%|███████▏ | 719/1000 [07:38<03:00, 1.56it/s, loss=0.00696, lr=0.001]
Steps: 72%|███████▏ | 720/1000 [07:39<02:57, 1.58it/s, loss=0.00696, lr=0.001]
tensor(0.0022, device='cuda:0')
tensor([[0.4123],
[0.4126]], device='cuda:0')
Current Norm : tensor([0.4110, 0.4114], device='cuda:0')
Steps: 72%|███████▏ | 720/1000 [07:39<02:57, 1.58it/s, loss=0.000396, lr=0.001]
Steps: 72%|███████▏ | 721/1000 [07:39<02:59, 1.56it/s, loss=0.000396, lr=0.001]
Steps: 72%|███████▏ | 721/1000 [07:39<02:59, 1.56it/s, loss=0.00705, lr=0.001]
Steps: 72%|███████▏ | 722/1000 [07:40<02:56, 1.57it/s, loss=0.00705, lr=0.001]
tensor(0.0016, device='cuda:0')
tensor([[0.4122],
[0.4126]], device='cuda:0')
Current Norm : tensor([0.4110, 0.4113], device='cuda:0')
Steps: 72%|███████▏ | 722/1000 [07:40<02:56, 1.57it/s, loss=0.00332, lr=0.001]
Steps: 72%|███████▏ | 723/1000 [07:41<02:57, 1.56it/s, loss=0.00332, lr=0.001]
Steps: 72%|███████▏ | 723/1000 [07:41<02:57, 1.56it/s, loss=0.000789, lr=0.001]
Steps: 72%|███████▏ | 724/1000 [07:41<02:55, 1.58it/s, loss=0.000789, lr=0.001]
tensor(0.0021, device='cuda:0')
tensor([[0.4121],
[0.4125]], device='cuda:0')
Current Norm : tensor([0.4109, 0.4112], device='cuda:0')
Steps: 72%|███████▏ | 724/1000 [07:41<02:55, 1.58it/s, loss=0.00194, lr=0.001]
Steps: 72%|███████▎ | 725/1000 [07:42<02:55, 1.57it/s, loss=0.00194, lr=0.001]
Steps: 72%|███████▎ | 725/1000 [07:42<02:55, 1.57it/s, loss=0.00128, lr=0.001]
Steps: 73%|███████▎ | 726/1000 [07:43<02:53, 1.58it/s, loss=0.00128, lr=0.001]
tensor(0.0063, device='cuda:0')
tensor([[0.4119],
[0.4123]], device='cuda:0')
Current Norm : tensor([0.4107, 0.4111], device='cuda:0')
Steps: 73%|███████▎ | 726/1000 [07:43<02:53, 1.58it/s, loss=0.0698, lr=0.001]
Steps: 73%|███████▎ | 727/1000 [07:43<02:53, 1.57it/s, loss=0.0698, lr=0.001]
Steps: 73%|███████▎ | 727/1000 [07:43<02:53, 1.57it/s, loss=0.00162, lr=0.001]
Steps: 73%|███████▎ | 728/1000 [07:44<02:51, 1.59it/s, loss=0.00162, lr=0.001]
tensor(0.0004, device='cuda:0')
tensor([[0.4117],
[0.4122]], device='cuda:0')
Current Norm : tensor([0.4106, 0.4110], device='cuda:0')
Steps: 73%|███████▎ | 728/1000 [07:44<02:51, 1.59it/s, loss=0.000498, lr=0.001]
Steps: 73%|███████▎ | 729/1000 [07:44<02:52, 1.57it/s, loss=0.000498, lr=0.001]
Steps: 73%|███████▎ | 729/1000 [07:44<02:52, 1.57it/s, loss=0.00026, lr=0.001]
Steps: 73%|███████▎ | 730/1000 [07:45<02:50, 1.58it/s, loss=0.00026, lr=0.001]
tensor(0.0013, device='cuda:0')
tensor([[0.4115],
[0.4120]], device='cuda:0')
Current Norm : tensor([0.4104, 0.4108], device='cuda:0')
Steps: 73%|███████▎ | 730/1000 [07:45<02:50, 1.58it/s, loss=0.0024, lr=0.001]
Steps: 73%|███████▎ | 731/1000 [07:46<02:50, 1.57it/s, loss=0.0024, lr=0.001]
Steps: 73%|███████▎ | 731/1000 [07:46<02:50, 1.57it/s, loss=0.000949, lr=0.001]
Steps: 73%|███████▎ | 732/1000 [07:46<02:48, 1.59it/s, loss=0.000949, lr=0.001]
tensor(0.0055, device='cuda:0')
tensor([[0.4112],
[0.4118]], device='cuda:0')
Current Norm : tensor([0.4101, 0.4106], device='cuda:0')
Steps: 73%|███████▎ | 732/1000 [07:46<02:48, 1.59it/s, loss=0.00115, lr=0.001]
Steps: 73%|███████▎ | 733/1000 [07:47<02:50, 1.56it/s, loss=0.00115, lr=0.001]
Steps: 73%|███████▎ | 733/1000 [07:47<02:50, 1.56it/s, loss=0.0166, lr=0.001]
Steps: 73%|███████▎ | 734/1000 [07:48<02:48, 1.58it/s, loss=0.0166, lr=0.001]
tensor(0.0084, device='cuda:0')
tensor([[0.4110],
[0.4115]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4104], device='cuda:0')
Steps: 73%|███████▎ | 734/1000 [07:48<02:48, 1.58it/s, loss=0.0649, lr=0.001]
Steps: 74%|███████▎ | 735/1000 [07:48<02:48, 1.57it/s, loss=0.0649, lr=0.001]
Steps: 74%|███████▎ | 735/1000 [07:48<02:48, 1.57it/s, loss=0.0919, lr=0.001]
Steps: 74%|███████▎ | 736/1000 [07:49<02:47, 1.58it/s, loss=0.0919, lr=0.001]
tensor(0.0031, device='cuda:0')
tensor([[0.4107],
[0.4112]], device='cuda:0')
Current Norm : tensor([0.4096, 0.4101], device='cuda:0')
Steps: 74%|███████▎ | 736/1000 [07:49<02:47, 1.58it/s, loss=0.00105, lr=0.001]
Steps: 74%|███████▎ | 737/1000 [07:50<02:48, 1.56it/s, loss=0.00105, lr=0.001]
Steps: 74%|███████▎ | 737/1000 [07:50<02:48, 1.56it/s, loss=0.00841, lr=0.001]
Steps: 74%|███████▍ | 738/1000 [07:50<02:45, 1.58it/s, loss=0.00841, lr=0.001]
tensor(0.0033, device='cuda:0')
tensor([[0.4104],
[0.4110]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4099], device='cuda:0')
Steps: 74%|███████▍ | 738/1000 [07:50<02:45, 1.58it/s, loss=0.00961, lr=0.001]
Steps: 74%|███████▍ | 739/1000 [07:51<02:46, 1.57it/s, loss=0.00961, lr=0.001]
Steps: 74%|███████▍ | 739/1000 [07:51<02:46, 1.57it/s, loss=0.00115, lr=0.001]
Steps: 74%|███████▍ | 740/1000 [07:51<02:44, 1.58it/s, loss=0.00115, lr=0.001]
tensor(0.0072, device='cuda:0')
tensor([[0.4103],
[0.4108]], device='cuda:0')
Current Norm : tensor([0.4092, 0.4097], device='cuda:0')
Steps: 74%|███████▍ | 740/1000 [07:51<02:44, 1.58it/s, loss=0.0325, lr=0.001]
Steps: 74%|███████▍ | 741/1000 [07:52<02:44, 1.57it/s, loss=0.0325, lr=0.001]
Steps: 74%|███████▍ | 741/1000 [07:52<02:44, 1.57it/s, loss=0.00165, lr=0.001]
Steps: 74%|███████▍ | 742/1000 [07:53<02:42, 1.59it/s, loss=0.00165, lr=0.001]
tensor(0.0063, device='cuda:0')
tensor([[0.4101],
[0.4106]], device='cuda:0')
Current Norm : tensor([0.4091, 0.4095], device='cuda:0')
Steps: 74%|███████▍ | 742/1000 [07:53<02:42, 1.59it/s, loss=0.0285, lr=0.001]
Steps: 74%|███████▍ | 743/1000 [07:53<02:43, 1.57it/s, loss=0.0285, lr=0.001]
Steps: 74%|███████▍ | 743/1000 [07:53<02:43, 1.57it/s, loss=0.00142, lr=0.001]
Steps: 74%|███████▍ | 744/1000 [07:54<02:42, 1.58it/s, loss=0.00142, lr=0.001]
tensor(0.0010, device='cuda:0')
tensor([[0.4100],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4090, 0.4094], device='cuda:0')
Steps: 74%|███████▍ | 744/1000 [07:54<02:42, 1.58it/s, loss=0.017, lr=0.001]
Steps: 74%|███████▍ | 745/1000 [07:55<02:43, 1.56it/s, loss=0.017, lr=0.001]
Steps: 74%|███████▍ | 745/1000 [07:55<02:43, 1.56it/s, loss=0.00199, lr=0.001]
Steps: 75%|███████▍ | 746/1000 [07:55<02:41, 1.57it/s, loss=0.00199, lr=0.001]
tensor(0.0085, device='cuda:0')
tensor([[0.4099],
[0.4104]], device='cuda:0')
Current Norm : tensor([0.4089, 0.4094], device='cuda:0')
Steps: 75%|███████▍ | 746/1000 [07:55<02:41, 1.57it/s, loss=0.0158, lr=0.001]
Steps: 75%|███████▍ | 747/1000 [07:56<02:41, 1.56it/s, loss=0.0158, lr=0.001]
Steps: 75%|███████▍ | 747/1000 [07:56<02:41, 1.56it/s, loss=0.012, lr=0.001]
Steps: 75%|███████▍ | 748/1000 [07:57<02:39, 1.58it/s, loss=0.012, lr=0.001]
tensor(0.0029, device='cuda:0')
tensor([[0.4098],
[0.4104]], device='cuda:0')
Current Norm : tensor([0.4089, 0.4094], device='cuda:0')
Steps: 75%|███████▍ | 748/1000 [07:57<02:39, 1.58it/s, loss=0.0097, lr=0.001]
Steps: 75%|███████▍ | 749/1000 [07:57<02:40, 1.56it/s, loss=0.0097, lr=0.001]
Steps: 75%|███████▍ | 749/1000 [07:57<02:40, 1.56it/s, loss=0.00167, lr=0.001]
Steps: 75%|███████▌ | 750/1000 [07:58<02:38, 1.58it/s, loss=0.00167, lr=0.001]
tensor(0.0022, device='cuda:0')
tensor([[0.4098],
[0.4104]], device='cuda:0')
Current Norm : tensor([0.4088, 0.4093], device='cuda:0')
Steps: 75%|███████▌ | 750/1000 [07:58<02:38, 1.58it/s, loss=0.0012, lr=0.001]
Steps: 75%|███████▌ | 751/1000 [07:58<02:39, 1.56it/s, loss=0.0012, lr=0.001]
Steps: 75%|███████▌ | 751/1000 [07:58<02:39, 1.56it/s, loss=0.00566, lr=0.001]
Steps: 75%|███████▌ | 752/1000 [07:59<02:37, 1.57it/s, loss=0.00566, lr=0.001]
tensor(0.0034, device='cuda:0')
tensor([[0.4097],
[0.4103]], device='cuda:0')
Current Norm : tensor([0.4087, 0.4093], device='cuda:0')
Steps: 75%|███████▌ | 752/1000 [07:59<02:37, 1.57it/s, loss=0.000511, lr=0.001]
Steps: 75%|███████▌ | 753/1000 [08:00<02:38, 1.56it/s, loss=0.000511, lr=0.001]
Steps: 75%|███████▌ | 753/1000 [08:00<02:38, 1.56it/s, loss=0.00294, lr=0.001]
Steps: 75%|███████▌ | 754/1000 [08:00<02:36, 1.58it/s, loss=0.00294, lr=0.001]
tensor(0.0042, device='cuda:0')
tensor([[0.4096],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4092], device='cuda:0')
Steps: 75%|███████▌ | 754/1000 [08:00<02:36, 1.58it/s, loss=0.00867, lr=0.001]
Steps: 76%|███████▌ | 755/1000 [08:01<02:36, 1.57it/s, loss=0.00867, lr=0.001]
Steps: 76%|███████▌ | 755/1000 [08:01<02:36, 1.57it/s, loss=0.013, lr=0.001]
Steps: 76%|███████▌ | 756/1000 [08:02<02:34, 1.58it/s, loss=0.013, lr=0.001]
tensor(0.0048, device='cuda:0')
tensor([[0.4095],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4092], device='cuda:0')
Steps: 76%|███████▌ | 756/1000 [08:02<02:34, 1.58it/s, loss=0.0197, lr=0.001]
Steps: 76%|███████▌ | 757/1000 [08:02<02:34, 1.57it/s, loss=0.0197, lr=0.001]
Steps: 76%|███████▌ | 757/1000 [08:02<02:34, 1.57it/s, loss=0.00643, lr=0.001]
Steps: 76%|███████▌ | 758/1000 [08:03<02:32, 1.58it/s, loss=0.00643, lr=0.001]
tensor(0.0008, device='cuda:0')
tensor([[0.4094],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4091], device='cuda:0')
Steps: 76%|███████▌ | 758/1000 [08:03<02:32, 1.58it/s, loss=0.000187, lr=0.001]
Steps: 76%|███████▌ | 759/1000 [08:04<02:33, 1.57it/s, loss=0.000187, lr=0.001]
Steps: 76%|███████▌ | 759/1000 [08:04<02:33, 1.57it/s, loss=0.00284, lr=0.001]
Steps: 76%|███████▌ | 760/1000 [08:04<02:30, 1.59it/s, loss=0.00284, lr=0.001]
tensor(0.0052, device='cuda:0')
tensor([[0.4092],
[0.4100]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4090], device='cuda:0')
Steps: 76%|███████▌ | 760/1000 [08:04<02:30, 1.59it/s, loss=0.016, lr=0.001]
Steps: 76%|███████▌ | 761/1000 [08:05<02:32, 1.56it/s, loss=0.016, lr=0.001]
Steps: 76%|███████▌ | 761/1000 [08:05<02:32, 1.56it/s, loss=0.000488, lr=0.001]
Steps: 76%|███████▌ | 762/1000 [08:05<02:31, 1.57it/s, loss=0.000488, lr=0.001]
tensor(0.0043, device='cuda:0')
tensor([[0.4090],
[0.4100]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4090], device='cuda:0')
Steps: 76%|███████▌ | 762/1000 [08:05<02:31, 1.57it/s, loss=0.000765, lr=0.001]
Steps: 76%|███████▋ | 763/1000 [08:06<02:31, 1.56it/s, loss=0.000765, lr=0.001]
Steps: 76%|███████▋ | 763/1000 [08:06<02:31, 1.56it/s, loss=0.0178, lr=0.001]
Steps: 76%|███████▋ | 764/1000 [08:07<02:29, 1.58it/s, loss=0.0178, lr=0.001]
tensor(0.0050, device='cuda:0')
tensor([[0.4088],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4080, 0.4089], device='cuda:0')
Steps: 76%|███████▋ | 764/1000 [08:07<02:29, 1.58it/s, loss=0.0123, lr=0.001]
Steps: 76%|███████▋ | 765/1000 [08:07<02:29, 1.57it/s, loss=0.0123, lr=0.001]
Steps: 76%|███████▋ | 765/1000 [08:07<02:29, 1.57it/s, loss=0.00927, lr=0.001]
Steps: 77%|███████▋ | 766/1000 [08:08<02:27, 1.58it/s, loss=0.00927, lr=0.001]
tensor(0.0081, device='cuda:0')
tensor([[0.4087],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4078, 0.4089], device='cuda:0')
Steps: 77%|███████▋ | 766/1000 [08:08<02:27, 1.58it/s, loss=0.0215, lr=0.001]
Steps: 77%|███████▋ | 767/1000 [08:09<02:28, 1.57it/s, loss=0.0215, lr=0.001]
Steps: 77%|███████▋ | 767/1000 [08:09<02:28, 1.57it/s, loss=0.0185, lr=0.001]
Steps: 77%|███████▋ | 768/1000 [08:09<02:26, 1.58it/s, loss=0.0185, lr=0.001]
tensor(0.0035, device='cuda:0')
tensor([[0.4085],
[0.4098]], device='cuda:0')
Current Norm : tensor([0.4077, 0.4088], device='cuda:0')
Steps: 77%|███████▋ | 768/1000 [08:09<02:26, 1.58it/s, loss=0.00787, lr=0.001]
Steps: 77%|███████▋ | 769/1000 [08:10<02:27, 1.56it/s, loss=0.00787, lr=0.001]
Steps: 77%|███████▋ | 769/1000 [08:10<02:27, 1.56it/s, loss=0.0034, lr=0.001]
Steps: 77%|███████▋ | 770/1000 [08:11<02:26, 1.57it/s, loss=0.0034, lr=0.001]
tensor(0.0059, device='cuda:0')
tensor([[0.4084],
[0.4098]], device='cuda:0')
Current Norm : tensor([0.4075, 0.4088], device='cuda:0')
Steps: 77%|███████▋ | 770/1000 [08:11<02:26, 1.57it/s, loss=0.00167, lr=0.001]
Steps: 77%|███████▋ | 771/1000 [08:11<02:27, 1.55it/s, loss=0.00167, lr=0.001]
Steps: 77%|███████▋ | 771/1000 [08:11<02:27, 1.55it/s, loss=0.0461, lr=0.001]
Steps: 77%|███████▋ | 772/1000 [08:12<02:25, 1.57it/s, loss=0.0461, lr=0.001]
tensor(0.0028, device='cuda:0')
tensor([[0.4082],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4074, 0.4087], device='cuda:0')
Steps: 77%|███████▋ | 772/1000 [08:12<02:25, 1.57it/s, loss=0.0114, lr=0.001]
Steps: 77%|███████▋ | 773/1000 [08:12<02:25, 1.56it/s, loss=0.0114, lr=0.001]
Steps: 77%|███████▋ | 773/1000 [08:12<02:25, 1.56it/s, loss=0.00369, lr=0.001]
Steps: 77%|███████▋ | 774/1000 [08:13<02:24, 1.57it/s, loss=0.00369, lr=0.001]
tensor(0.0084, device='cuda:0')
tensor([[0.4080],
[0.4096]], device='cuda:0')
Current Norm : tensor([0.4072, 0.4087], device='cuda:0')
Steps: 77%|███████▋ | 774/1000 [08:13<02:24, 1.57it/s, loss=0.0384, lr=0.001]
Steps: 78%|███████▊ | 775/1000 [08:14<02:24, 1.56it/s, loss=0.0384, lr=0.001]
Steps: 78%|███████▊ | 775/1000 [08:14<02:24, 1.56it/s, loss=0.0432, lr=0.001]
Steps: 78%|███████▊ | 776/1000 [08:14<02:22, 1.57it/s, loss=0.0432, lr=0.001]
tensor(0.0095, device='cuda:0')
tensor([[0.4079],
[0.4096]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4087], device='cuda:0')
Steps: 78%|███████▊ | 776/1000 [08:14<02:22, 1.57it/s, loss=0.00367, lr=0.001]
Steps: 78%|███████▊ | 777/1000 [08:15<02:23, 1.56it/s, loss=0.00367, lr=0.001]
Steps: 78%|███████▊ | 777/1000 [08:15<02:23, 1.56it/s, loss=0.0117, lr=0.001]
Steps: 78%|███████▊ | 778/1000 [08:16<02:20, 1.57it/s, loss=0.0117, lr=0.001]
tensor(0.0077, device='cuda:0')
tensor([[0.4078],
[0.4096]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4087], device='cuda:0')
Steps: 78%|███████▊ | 778/1000 [08:16<02:20, 1.57it/s, loss=0.00418, lr=0.001]
Steps: 78%|███████▊ | 779/1000 [08:16<02:21, 1.56it/s, loss=0.00418, lr=0.001]
Steps: 78%|███████▊ | 779/1000 [08:16<02:21, 1.56it/s, loss=0.00894, lr=0.001]
Steps: 78%|███████▊ | 780/1000 [08:17<02:19, 1.57it/s, loss=0.00894, lr=0.001]
tensor(0.0021, device='cuda:0')
tensor([[0.4078],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4087], device='cuda:0')
Steps: 78%|███████▊ | 780/1000 [08:17<02:19, 1.57it/s, loss=0.00025, lr=0.001]
Steps: 78%|███████▊ | 781/1000 [08:18<02:20, 1.56it/s, loss=0.00025, lr=0.001]
Steps: 78%|███████▊ | 781/1000 [08:18<02:20, 1.56it/s, loss=0.0261, lr=0.001]
Steps: 78%|███████▊ | 782/1000 [08:18<02:18, 1.57it/s, loss=0.0261, lr=0.001]
tensor(0.0098, device='cuda:0')
tensor([[0.4079],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4087], device='cuda:0')
Steps: 78%|███████▊ | 782/1000 [08:18<02:18, 1.57it/s, loss=0.0055, lr=0.001]
Steps: 78%|███████▊ | 783/1000 [08:19<02:18, 1.57it/s, loss=0.0055, lr=0.001]
Steps: 78%|███████▊ | 783/1000 [08:19<02:18, 1.57it/s, loss=0.0204, lr=0.001]
Steps: 78%|███████▊ | 784/1000 [08:19<02:16, 1.58it/s, loss=0.0204, lr=0.001]
tensor(0.0030, device='cuda:0')
tensor([[0.4080],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4072, 0.4088], device='cuda:0')
Steps: 78%|███████▊ | 784/1000 [08:19<02:16, 1.58it/s, loss=0.0061, lr=0.001]
Steps: 78%|███████▊ | 785/1000 [08:20<02:16, 1.57it/s, loss=0.0061, lr=0.001]
Steps: 78%|███████▊ | 785/1000 [08:20<02:16, 1.57it/s, loss=0.0105, lr=0.001]
Steps: 79%|███████▊ | 786/1000 [08:21<02:15, 1.58it/s, loss=0.0105, lr=0.001]
tensor(0.0076, device='cuda:0')
tensor([[0.4081],
[0.4098]], device='cuda:0')
Current Norm : tensor([0.4073, 0.4088], device='cuda:0')
Steps: 79%|███████▊ | 786/1000 [08:21<02:15, 1.58it/s, loss=0.0608, lr=0.001]
Steps: 79%|███████▊ | 787/1000 [08:21<02:17, 1.55it/s, loss=0.0608, lr=0.001]
Steps: 79%|███████▊ | 787/1000 [08:21<02:17, 1.55it/s, loss=0.00173, lr=0.001]
Steps: 79%|███████▉ | 788/1000 [08:22<02:14, 1.58it/s, loss=0.00173, lr=0.001]
tensor(0.0013, device='cuda:0')
tensor([[0.4083],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4075, 0.4089], device='cuda:0')
Steps: 79%|███████▉ | 788/1000 [08:22<02:14, 1.58it/s, loss=0.00145, lr=0.001]
Steps: 79%|███████▉ | 789/1000 [08:23<02:14, 1.56it/s, loss=0.00145, lr=0.001]
Steps: 79%|███████▉ | 789/1000 [08:23<02:14, 1.56it/s, loss=0.00302, lr=0.001]
Steps: 79%|███████▉ | 790/1000 [08:23<02:13, 1.58it/s, loss=0.00302, lr=0.001]
tensor(0.0035, device='cuda:0')
tensor([[0.4085],
[0.4100]], device='cuda:0')
Current Norm : tensor([0.4076, 0.4090], device='cuda:0')
Steps: 79%|███████▉ | 790/1000 [08:23<02:13, 1.58it/s, loss=0.00383, lr=0.001]
Steps: 79%|███████▉ | 791/1000 [08:24<02:13, 1.56it/s, loss=0.00383, lr=0.001]
Steps: 79%|███████▉ | 791/1000 [08:24<02:13, 1.56it/s, loss=0.004, lr=0.001]
Steps: 79%|███████▉ | 792/1000 [08:25<02:11, 1.58it/s, loss=0.004, lr=0.001]
tensor(0.0059, device='cuda:0')
tensor([[0.4085],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4077, 0.4091], device='cuda:0')
Steps: 79%|███████▉ | 792/1000 [08:25<02:11, 1.58it/s, loss=0.0108, lr=0.001]
Steps: 79%|███████▉ | 793/1000 [08:25<02:12, 1.56it/s, loss=0.0108, lr=0.001]
Steps: 79%|███████▉ | 793/1000 [08:25<02:12, 1.56it/s, loss=0.00444, lr=0.001]
Steps: 79%|███████▉ | 794/1000 [08:26<02:10, 1.57it/s, loss=0.00444, lr=0.001]
tensor(0.0021, device='cuda:0')
tensor([[0.4086],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4077, 0.4092], device='cuda:0')
Steps: 79%|███████▉ | 794/1000 [08:26<02:10, 1.57it/s, loss=0.000504, lr=0.001]
Steps: 80%|███████▉ | 795/1000 [08:26<02:11, 1.56it/s, loss=0.000504, lr=0.001]
Steps: 80%|███████▉ | 795/1000 [08:26<02:11, 1.56it/s, loss=0.00754, lr=0.001]
Steps: 80%|███████▉ | 796/1000 [08:27<02:09, 1.58it/s, loss=0.00754, lr=0.001]
tensor(0.0015, device='cuda:0')
tensor([[0.4086],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4077, 0.4092], device='cuda:0')
Steps: 80%|███████▉ | 796/1000 [08:27<02:09, 1.58it/s, loss=0.000605, lr=0.001]
Steps: 80%|███████▉ | 797/1000 [08:28<02:09, 1.56it/s, loss=0.000605, lr=0.001]
Steps: 80%|███████▉ | 797/1000 [08:28<02:09, 1.56it/s, loss=0.0101, lr=0.001]
Steps: 80%|███████▉ | 798/1000 [08:28<02:08, 1.57it/s, loss=0.0101, lr=0.001]
tensor(0.0057, device='cuda:0')
tensor([[0.4085],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4076, 0.4091], device='cuda:0')
Steps: 80%|███████▉ | 798/1000 [08:28<02:08, 1.57it/s, loss=0.0274, lr=0.001]
Steps: 80%|███████▉ | 799/1000 [08:29<02:08, 1.56it/s, loss=0.0274, lr=0.001]
Steps: 80%|███████▉ | 799/1000 [08:29<02:08, 1.56it/s, loss=0.00328, lr=0.001]
Steps: 80%|████████ | 800/1000 [08:30<02:07, 1.57it/s, loss=0.00328, lr=0.001]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0130, 0.0020, -0.0095, -0.0193], device='cuda:0',
grad_fn=<SliceBackward0>)
Current Learned Embeddings for <s2>:, id 49409 tensor([-0.0026, 0.0098, -0.0090, 0.0132], device='cuda:0',
grad_fn=<SliceBackward0>)
Saving weights to checkpoints/step_inv_800.safetensors
tensor(0.0061, device='cuda:0')
tensor([[0.4083],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4075, 0.4091], device='cuda:0')
Steps: 80%|████████ | 800/1000 [08:30<02:07, 1.57it/s, loss=0.00261, lr=0.001]
Steps: 80%|████████ | 801/1000 [08:30<02:07, 1.56it/s, loss=0.00261, lr=0.001]
Steps: 80%|████████ | 801/1000 [08:30<02:07, 1.56it/s, loss=0.149, lr=0.001]
Steps: 80%|████████ | 802/1000 [08:31<02:05, 1.58it/s, loss=0.149, lr=0.001]
tensor(0.0048, device='cuda:0')
tensor([[0.4082],
[0.4100]], device='cuda:0')
Current Norm : tensor([0.4074, 0.4090], device='cuda:0')
Steps: 80%|████████ | 802/1000 [08:31<02:05, 1.58it/s, loss=0.0241, lr=0.001]
Steps: 80%|████████ | 803/1000 [08:32<02:05, 1.57it/s, loss=0.0241, lr=0.001]
Steps: 80%|████████ | 803/1000 [08:32<02:05, 1.57it/s, loss=0.000268, lr=0.001]
Steps: 80%|████████ | 804/1000 [08:32<02:03, 1.58it/s, loss=0.000268, lr=0.001]
tensor(0.0061, device='cuda:0')
tensor([[0.4080],
[0.4100]], device='cuda:0')
Current Norm : tensor([0.4072, 0.4090], device='cuda:0')
Steps: 80%|████████ | 804/1000 [08:32<02:03, 1.58it/s, loss=0.0149, lr=0.001]
Steps: 80%|████████ | 805/1000 [08:33<02:03, 1.58it/s, loss=0.0149, lr=0.001]
Steps: 80%|████████ | 805/1000 [08:33<02:03, 1.58it/s, loss=0.00414, lr=0.001]
Steps: 81%|████████ | 806/1000 [08:33<02:02, 1.58it/s, loss=0.00414, lr=0.001]
tensor(0.0014, device='cuda:0')
tensor([[0.4079],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4089], device='cuda:0')
Steps: 81%|████████ | 806/1000 [08:33<02:02, 1.58it/s, loss=0.00103, lr=0.001]
Steps: 81%|████████ | 807/1000 [08:34<02:02, 1.57it/s, loss=0.00103, lr=0.001]
Steps: 81%|████████ | 807/1000 [08:34<02:02, 1.57it/s, loss=0.00419, lr=0.001]
Steps: 81%|████████ | 808/1000 [08:35<02:00, 1.59it/s, loss=0.00419, lr=0.001]
tensor(0.0009, device='cuda:0')
tensor([[0.4078],
[0.4098]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4088], device='cuda:0')
Steps: 81%|████████ | 808/1000 [08:35<02:00, 1.59it/s, loss=0.000414, lr=0.001]
Steps: 81%|████████ | 809/1000 [08:35<02:01, 1.58it/s, loss=0.000414, lr=0.001]
Steps: 81%|████████ | 809/1000 [08:35<02:01, 1.58it/s, loss=0.000762, lr=0.001]
Steps: 81%|████████ | 810/1000 [08:36<01:59, 1.59it/s, loss=0.000762, lr=0.001]
tensor(0.0047, device='cuda:0')
tensor([[0.4077],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4087], device='cuda:0')
Steps: 81%|████████ | 810/1000 [08:36<01:59, 1.59it/s, loss=0.00104, lr=0.001]
Steps: 81%|████████ | 811/1000 [08:37<02:00, 1.57it/s, loss=0.00104, lr=0.001]
Steps: 81%|████████ | 811/1000 [08:37<02:00, 1.57it/s, loss=0.000707, lr=0.001]
Steps: 81%|████████ | 812/1000 [08:37<01:59, 1.58it/s, loss=0.000707, lr=0.001]
tensor(0.0070, device='cuda:0')
tensor([[0.4076],
[0.4095]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4086], device='cuda:0')
Steps: 81%|████████ | 812/1000 [08:37<01:59, 1.58it/s, loss=0.0044, lr=0.001]
Steps: 81%|████████▏ | 813/1000 [08:38<01:59, 1.57it/s, loss=0.0044, lr=0.001]
Steps: 81%|████████▏ | 813/1000 [08:38<01:59, 1.57it/s, loss=0.0651, lr=0.001]
Steps: 81%|████████▏ | 814/1000 [08:39<01:57, 1.58it/s, loss=0.0651, lr=0.001]
tensor(0.0085, device='cuda:0')
tensor([[0.4076],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4085], device='cuda:0')
Steps: 81%|████████▏ | 814/1000 [08:39<01:57, 1.58it/s, loss=0.0106, lr=0.001]
Steps: 82%|████████▏ | 815/1000 [08:39<01:57, 1.57it/s, loss=0.0106, lr=0.001]
Steps: 82%|████████▏ | 815/1000 [08:39<01:57, 1.57it/s, loss=0.0596, lr=0.001]
Steps: 82%|████████▏ | 816/1000 [08:40<01:56, 1.58it/s, loss=0.0596, lr=0.001]
tensor(0.0065, device='cuda:0')
tensor([[0.4076],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4084], device='cuda:0')
Steps: 82%|████████▏ | 816/1000 [08:40<01:56, 1.58it/s, loss=0.0189, lr=0.001]
Steps: 82%|████████▏ | 817/1000 [08:40<01:56, 1.57it/s, loss=0.0189, lr=0.001]
Steps: 82%|████████▏ | 817/1000 [08:40<01:56, 1.57it/s, loss=0.0145, lr=0.001]
Steps: 82%|████████▏ | 818/1000 [08:41<01:54, 1.58it/s, loss=0.0145, lr=0.001]
tensor(0.0035, device='cuda:0')
tensor([[0.4076],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4083], device='cuda:0')
Steps: 82%|████████▏ | 818/1000 [08:41<01:54, 1.58it/s, loss=0.00208, lr=0.001]
Steps: 82%|████████▏ | 819/1000 [08:42<01:55, 1.57it/s, loss=0.00208, lr=0.001]
Steps: 82%|████████▏ | 819/1000 [08:42<01:55, 1.57it/s, loss=0.00879, lr=0.001]
Steps: 82%|████████▏ | 820/1000 [08:42<01:53, 1.58it/s, loss=0.00879, lr=0.001]
tensor(0.0078, device='cuda:0')
tensor([[0.4076],
[0.4092]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4083], device='cuda:0')
Steps: 82%|████████▏ | 820/1000 [08:42<01:53, 1.58it/s, loss=0.000313, lr=0.001]
Steps: 82%|████████▏ | 821/1000 [08:43<01:54, 1.56it/s, loss=0.000313, lr=0.001]
Steps: 82%|████████▏ | 821/1000 [08:43<01:54, 1.56it/s, loss=0.011, lr=0.001]
Steps: 82%|████████▏ | 822/1000 [08:44<01:52, 1.58it/s, loss=0.011, lr=0.001]
tensor(0.0052, device='cuda:0')
tensor([[0.4076],
[0.4092]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4083], device='cuda:0')
Steps: 82%|████████▏ | 822/1000 [08:44<01:52, 1.58it/s, loss=0.0259, lr=0.001]
Steps: 82%|████████▏ | 823/1000 [08:44<01:53, 1.57it/s, loss=0.0259, lr=0.001]
Steps: 82%|████████▏ | 823/1000 [08:44<01:53, 1.57it/s, loss=0.00135, lr=0.001]
Steps: 82%|████████▏ | 824/1000 [08:45<01:51, 1.59it/s, loss=0.00135, lr=0.001]
tensor(0.0029, device='cuda:0')
tensor([[0.4077],
[0.4092]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4083], device='cuda:0')
Steps: 82%|████████▏ | 824/1000 [08:45<01:51, 1.59it/s, loss=0.0033, lr=0.001]
Steps: 82%|████████▎ | 825/1000 [08:46<01:51, 1.57it/s, loss=0.0033, lr=0.001]
Steps: 82%|████████▎ | 825/1000 [08:46<01:51, 1.57it/s, loss=0.00391, lr=0.001]
Steps: 83%|████████▎ | 826/1000 [08:46<01:49, 1.58it/s, loss=0.00391, lr=0.001]
tensor(0.0020, device='cuda:0')
tensor([[0.4078],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4083], device='cuda:0')
Steps: 83%|████████▎ | 826/1000 [08:46<01:49, 1.58it/s, loss=0.000745, lr=0.001]
Steps: 83%|████████▎ | 827/1000 [08:47<01:50, 1.57it/s, loss=0.000745, lr=0.001]
Steps: 83%|████████▎ | 827/1000 [08:47<01:50, 1.57it/s, loss=0.00252, lr=0.001]
Steps: 83%|████████▎ | 828/1000 [08:47<01:48, 1.58it/s, loss=0.00252, lr=0.001]
tensor(0.0047, device='cuda:0')
tensor([[0.4078],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4083], device='cuda:0')
Steps: 83%|████████▎ | 828/1000 [08:47<01:48, 1.58it/s, loss=0.00584, lr=0.001]
Steps: 83%|████████▎ | 829/1000 [08:48<01:48, 1.57it/s, loss=0.00584, lr=0.001]
Steps: 83%|████████▎ | 829/1000 [08:48<01:48, 1.57it/s, loss=0.013, lr=0.001]
Steps: 83%|████████▎ | 830/1000 [08:49<01:47, 1.59it/s, loss=0.013, lr=0.001]
tensor(0.0074, device='cuda:0')
tensor([[0.4079],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4084], device='cuda:0')
Steps: 83%|████████▎ | 830/1000 [08:49<01:47, 1.59it/s, loss=0.00509, lr=0.001]
Steps: 83%|████████▎ | 831/1000 [08:49<01:47, 1.57it/s, loss=0.00509, lr=0.001]
Steps: 83%|████████▎ | 831/1000 [08:49<01:47, 1.57it/s, loss=0.0333, lr=0.001]
Steps: 83%|████████▎ | 832/1000 [08:50<01:45, 1.59it/s, loss=0.0333, lr=0.001]
tensor(0.0026, device='cuda:0')
tensor([[0.4080],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4072, 0.4084], device='cuda:0')
Steps: 83%|████████▎ | 832/1000 [08:50<01:45, 1.59it/s, loss=0.00102, lr=0.001]
Steps: 83%|████████▎ | 833/1000 [08:51<01:46, 1.57it/s, loss=0.00102, lr=0.001]
Steps: 83%|████████▎ | 833/1000 [08:51<01:46, 1.57it/s, loss=0.00504, lr=0.001]
Steps: 83%|████████▎ | 834/1000 [08:51<01:45, 1.58it/s, loss=0.00504, lr=0.001]
tensor(0.0077, device='cuda:0')
tensor([[0.4081],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4073, 0.4084], device='cuda:0')
Steps: 83%|████████▎ | 834/1000 [08:51<01:45, 1.58it/s, loss=0.0171, lr=0.001]
Steps: 84%|████████▎ | 835/1000 [08:52<01:45, 1.56it/s, loss=0.0171, lr=0.001]
Steps: 84%|████████▎ | 835/1000 [08:52<01:45, 1.56it/s, loss=0.00969, lr=0.001]
Steps: 84%|████████▎ | 836/1000 [08:52<01:44, 1.58it/s, loss=0.00969, lr=0.001]
tensor(0.0010, device='cuda:0')
tensor([[0.4081],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4073, 0.4085], device='cuda:0')
Steps: 84%|████████▎ | 836/1000 [08:52<01:44, 1.58it/s, loss=0.00105, lr=0.001]
Steps: 84%|████████▎ | 837/1000 [08:53<01:44, 1.57it/s, loss=0.00105, lr=0.001]
Steps: 84%|████████▎ | 837/1000 [08:53<01:44, 1.57it/s, loss=0.000342, lr=0.001]
Steps: 84%|████████▍ | 838/1000 [08:54<01:42, 1.58it/s, loss=0.000342, lr=0.001]
tensor(0.0054, device='cuda:0')
tensor([[0.4082],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4074, 0.4085], device='cuda:0')
Steps: 84%|████████▍ | 838/1000 [08:54<01:42, 1.58it/s, loss=0.0134, lr=0.001]
Steps: 84%|████████▍ | 839/1000 [08:54<01:42, 1.57it/s, loss=0.0134, lr=0.001]
Steps: 84%|████████▍ | 839/1000 [08:54<01:42, 1.57it/s, loss=0.0579, lr=0.001]
Steps: 84%|████████▍ | 840/1000 [08:55<01:41, 1.57it/s, loss=0.0579, lr=0.001]
tensor(0.0016, device='cuda:0')
tensor([[0.4081],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4073, 0.4084], device='cuda:0')
Steps: 84%|████████▍ | 840/1000 [08:55<01:41, 1.57it/s, loss=0.00562, lr=0.001]
Steps: 84%|████████▍ | 841/1000 [08:56<01:41, 1.56it/s, loss=0.00562, lr=0.001]
Steps: 84%|████████▍ | 841/1000 [08:56<01:41, 1.56it/s, loss=0.000182, lr=0.001]
Steps: 84%|████████▍ | 842/1000 [08:56<01:40, 1.57it/s, loss=0.000182, lr=0.001]
tensor(0.0037, device='cuda:0')
tensor([[0.4080],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4072, 0.4083], device='cuda:0')
Steps: 84%|████████▍ | 842/1000 [08:56<01:40, 1.57it/s, loss=0.00387, lr=0.001]
Steps: 84%|████████▍ | 843/1000 [08:57<01:40, 1.57it/s, loss=0.00387, lr=0.001]
Steps: 84%|████████▍ | 843/1000 [08:57<01:40, 1.57it/s, loss=0.00379, lr=0.001]
Steps: 84%|████████▍ | 844/1000 [08:58<01:38, 1.59it/s, loss=0.00379, lr=0.001]
tensor(0.0019, device='cuda:0')
tensor([[0.4079],
[0.4091]], device='cuda:0')
Current Norm : tensor([0.4072, 0.4082], device='cuda:0')
Steps: 84%|████████▍ | 844/1000 [08:58<01:38, 1.59it/s, loss=0.00819, lr=0.001]
Steps: 84%|████████▍ | 845/1000 [08:58<01:38, 1.57it/s, loss=0.00819, lr=0.001]
Steps: 84%|████████▍ | 845/1000 [08:58<01:38, 1.57it/s, loss=0.00126, lr=0.001]
Steps: 85%|████████▍ | 846/1000 [08:59<01:37, 1.58it/s, loss=0.00126, lr=0.001]
tensor(0.0018, device='cuda:0')
tensor([[0.4078],
[0.4090]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4081], device='cuda:0')
Steps: 85%|████████▍ | 846/1000 [08:59<01:37, 1.58it/s, loss=0.000191, lr=0.001]
Steps: 85%|████████▍ | 847/1000 [08:59<01:37, 1.57it/s, loss=0.000191, lr=0.001]
Steps: 85%|████████▍ | 847/1000 [08:59<01:37, 1.57it/s, loss=0.00379, lr=0.001]
Steps: 85%|████████▍ | 848/1000 [09:00<01:35, 1.58it/s, loss=0.00379, lr=0.001]
tensor(0.0109, device='cuda:0')
tensor([[0.4077],
[0.4088]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4079], device='cuda:0')
Steps: 85%|████████▍ | 848/1000 [09:00<01:35, 1.58it/s, loss=0.0238, lr=0.001]
Steps: 85%|████████▍ | 849/1000 [09:01<01:35, 1.58it/s, loss=0.0238, lr=0.001]
Steps: 85%|████████▍ | 849/1000 [09:01<01:35, 1.58it/s, loss=0.0042, lr=0.001]
Steps: 85%|████████▌ | 850/1000 [09:01<01:34, 1.59it/s, loss=0.0042, lr=0.001]
tensor(0.0058, device='cuda:0')
tensor([[0.4075],
[0.4086]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4078], device='cuda:0')
Steps: 85%|████████▌ | 850/1000 [09:01<01:34, 1.59it/s, loss=0.0154, lr=0.001]
Steps: 85%|████████▌ | 851/1000 [09:02<01:34, 1.58it/s, loss=0.0154, lr=0.001]
Steps: 85%|████████▌ | 851/1000 [09:02<01:34, 1.58it/s, loss=0.0067, lr=0.001]
Steps: 85%|████████▌ | 852/1000 [09:03<01:33, 1.58it/s, loss=0.0067, lr=0.001]
tensor(0.0052, device='cuda:0')
tensor([[0.4074],
[0.4085]], device='cuda:0')
Current Norm : tensor([0.4067, 0.4076], device='cuda:0')
Steps: 85%|████████▌ | 852/1000 [09:03<01:33, 1.58it/s, loss=0.000584, lr=0.001]
Steps: 85%|████████▌ | 853/1000 [09:03<01:33, 1.56it/s, loss=0.000584, lr=0.001]
Steps: 85%|████████▌ | 853/1000 [09:03<01:33, 1.56it/s, loss=0.0849, lr=0.001]
Steps: 85%|████████▌ | 854/1000 [09:04<01:32, 1.57it/s, loss=0.0849, lr=0.001]
tensor(0.0084, device='cuda:0')
tensor([[0.4073],
[0.4084]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4076], device='cuda:0')
Steps: 85%|████████▌ | 854/1000 [09:04<01:32, 1.57it/s, loss=0.036, lr=0.001]
Steps: 86%|████████▌ | 855/1000 [09:05<01:32, 1.56it/s, loss=0.036, lr=0.001]
Steps: 86%|████████▌ | 855/1000 [09:05<01:32, 1.56it/s, loss=0.016, lr=0.001]
Steps: 86%|████████▌ | 856/1000 [09:05<01:31, 1.58it/s, loss=0.016, lr=0.001]
tensor(0.0053, device='cuda:0')
tensor([[0.4074],
[0.4084]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4076], device='cuda:0')
Steps: 86%|████████▌ | 856/1000 [09:05<01:31, 1.58it/s, loss=0.018, lr=0.001]
Steps: 86%|████████▌ | 857/1000 [09:06<01:31, 1.57it/s, loss=0.018, lr=0.001]
Steps: 86%|████████▌ | 857/1000 [09:06<01:31, 1.57it/s, loss=0.0116, lr=0.001]
Steps: 86%|████████▌ | 858/1000 [09:06<01:29, 1.58it/s, loss=0.0116, lr=0.001]
tensor(0.0047, device='cuda:0')
tensor([[0.4074],
[0.4085]], device='cuda:0')
Current Norm : tensor([0.4067, 0.4077], device='cuda:0')
Steps: 86%|████████▌ | 858/1000 [09:06<01:29, 1.58it/s, loss=0.00814, lr=0.001]
Steps: 86%|████████▌ | 859/1000 [09:07<01:29, 1.57it/s, loss=0.00814, lr=0.001]
Steps: 86%|████████▌ | 859/1000 [09:07<01:29, 1.57it/s, loss=0.00405, lr=0.001]
Steps: 86%|████████▌ | 860/1000 [09:08<01:28, 1.58it/s, loss=0.00405, lr=0.001]
tensor(0.0039, device='cuda:0')
tensor([[0.4075],
[0.4086]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4077], device='cuda:0')
Steps: 86%|████████▌ | 860/1000 [09:08<01:28, 1.58it/s, loss=0.0098, lr=0.001]
Steps: 86%|████████▌ | 861/1000 [09:08<01:28, 1.57it/s, loss=0.0098, lr=0.001]
Steps: 86%|████████▌ | 861/1000 [09:08<01:28, 1.57it/s, loss=0.01, lr=0.001]
Steps: 86%|████████▌ | 862/1000 [09:09<01:27, 1.58it/s, loss=0.01, lr=0.001]
tensor(0.0073, device='cuda:0')
tensor([[0.4077],
[0.4086]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4077], device='cuda:0')
Steps: 86%|████████▌ | 862/1000 [09:09<01:27, 1.58it/s, loss=0.00465, lr=0.001]
Steps: 86%|████████▋ | 863/1000 [09:10<01:27, 1.57it/s, loss=0.00465, lr=0.001]
Steps: 86%|████████▋ | 863/1000 [09:10<01:27, 1.57it/s, loss=0.0772, lr=0.001]
Steps: 86%|████████▋ | 864/1000 [09:10<01:26, 1.58it/s, loss=0.0772, lr=0.001]
tensor(0.0026, device='cuda:0')
tensor([[0.4079],
[0.4086]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4077], device='cuda:0')
Steps: 86%|████████▋ | 864/1000 [09:10<01:26, 1.58it/s, loss=0.000222, lr=0.001]
Steps: 86%|████████▋ | 865/1000 [09:11<01:26, 1.56it/s, loss=0.000222, lr=0.001]
Steps: 86%|████████▋ | 865/1000 [09:11<01:26, 1.56it/s, loss=0.015, lr=0.001]
Steps: 87%|████████▋ | 866/1000 [09:12<01:24, 1.58it/s, loss=0.015, lr=0.001]
tensor(0.0084, device='cuda:0')
tensor([[0.4081],
[0.4086]], device='cuda:0')
Current Norm : tensor([0.4073, 0.4078], device='cuda:0')
Steps: 87%|████████▋ | 866/1000 [09:12<01:24, 1.58it/s, loss=0.0187, lr=0.001]
Steps: 87%|████████▋ | 867/1000 [09:12<01:25, 1.56it/s, loss=0.0187, lr=0.001]
Steps: 87%|████████▋ | 867/1000 [09:12<01:25, 1.56it/s, loss=0.0145, lr=0.001]
Steps: 87%|████████▋ | 868/1000 [09:13<01:23, 1.57it/s, loss=0.0145, lr=0.001]
tensor(0.0118, device='cuda:0')
tensor([[0.4083],
[0.4087]], device='cuda:0')
Current Norm : tensor([0.4074, 0.4079], device='cuda:0')
Steps: 87%|████████▋ | 868/1000 [09:13<01:23, 1.57it/s, loss=0.0673, lr=0.001]
Steps: 87%|████████▋ | 869/1000 [09:13<01:23, 1.56it/s, loss=0.0673, lr=0.001]
Steps: 87%|████████▋ | 869/1000 [09:13<01:23, 1.56it/s, loss=0.052, lr=0.001]
Steps: 87%|████████▋ | 870/1000 [09:14<01:22, 1.58it/s, loss=0.052, lr=0.001]
tensor(0.0075, device='cuda:0')
tensor([[0.4086],
[0.4089]], device='cuda:0')
Current Norm : tensor([0.4077, 0.4080], device='cuda:0')
Steps: 87%|████████▋ | 870/1000 [09:14<01:22, 1.58it/s, loss=0.0164, lr=0.001]
Steps: 87%|████████▋ | 871/1000 [09:15<01:22, 1.57it/s, loss=0.0164, lr=0.001]
Steps: 87%|████████▋ | 871/1000 [09:15<01:22, 1.57it/s, loss=0.0497, lr=0.001]
Steps: 87%|████████▋ | 872/1000 [09:15<01:20, 1.58it/s, loss=0.0497, lr=0.001]
tensor(0.0013, device='cuda:0')
tensor([[0.4089],
[0.4091]], device='cuda:0')
Current Norm : tensor([0.4080, 0.4082], device='cuda:0')
Steps: 87%|████████▋ | 872/1000 [09:15<01:20, 1.58it/s, loss=0.000777, lr=0.001]
Steps: 87%|████████▋ | 873/1000 [09:16<01:20, 1.57it/s, loss=0.000777, lr=0.001]
Steps: 87%|████████▋ | 873/1000 [09:16<01:20, 1.57it/s, loss=0.00841, lr=0.001]
Steps: 87%|████████▋ | 874/1000 [09:17<01:19, 1.58it/s, loss=0.00841, lr=0.001]
tensor(0.0046, device='cuda:0')
tensor([[0.4092],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4084], device='cuda:0')
Steps: 87%|████████▋ | 874/1000 [09:17<01:19, 1.58it/s, loss=0.0873, lr=0.001]
Steps: 88%|████████▊ | 875/1000 [09:17<01:19, 1.57it/s, loss=0.0873, lr=0.001]
Steps: 88%|████████▊ | 875/1000 [09:17<01:19, 1.57it/s, loss=0.000283, lr=0.001]
Steps: 88%|████████▊ | 876/1000 [09:18<01:18, 1.57it/s, loss=0.000283, lr=0.001]
tensor(0.0094, device='cuda:0')
tensor([[0.4094],
[0.4095]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4085], device='cuda:0')
Steps: 88%|████████▊ | 876/1000 [09:18<01:18, 1.57it/s, loss=0.0306, lr=0.001]
Steps: 88%|████████▊ | 877/1000 [09:19<01:18, 1.56it/s, loss=0.0306, lr=0.001]
Steps: 88%|████████▊ | 877/1000 [09:19<01:18, 1.56it/s, loss=0.0644, lr=0.001]
Steps: 88%|████████▊ | 878/1000 [09:19<01:17, 1.58it/s, loss=0.0644, lr=0.001]
tensor(0.0087, device='cuda:0')
tensor([[0.4096],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4087, 0.4089], device='cuda:0')
Steps: 88%|████████▊ | 878/1000 [09:19<01:17, 1.58it/s, loss=0.0373, lr=0.001]
Steps: 88%|████████▊ | 879/1000 [09:20<01:16, 1.58it/s, loss=0.0373, lr=0.001]
Steps: 88%|████████▊ | 879/1000 [09:20<01:16, 1.58it/s, loss=0.00094, lr=0.001]
Steps: 88%|████████▊ | 880/1000 [09:20<01:15, 1.59it/s, loss=0.00094, lr=0.001]
tensor(0.0046, device='cuda:0')
tensor([[0.4099],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4089, 0.4092], device='cuda:0')
Steps: 88%|████████▊ | 880/1000 [09:20<01:15, 1.59it/s, loss=0.0125, lr=0.001]
Steps: 88%|████████▊ | 881/1000 [09:21<01:15, 1.58it/s, loss=0.0125, lr=0.001]
Steps: 88%|████████▊ | 881/1000 [09:21<01:15, 1.58it/s, loss=0.0143, lr=0.001]
Steps: 88%|████████▊ | 882/1000 [09:22<01:14, 1.59it/s, loss=0.0143, lr=0.001]
tensor(0.0109, device='cuda:0')
tensor([[0.4102],
[0.4106]], device='cuda:0')
Current Norm : tensor([0.4092, 0.4096], device='cuda:0')
Steps: 88%|████████▊ | 882/1000 [09:22<01:14, 1.59it/s, loss=0.0168, lr=0.001]
Steps: 88%|████████▊ | 883/1000 [09:22<01:14, 1.58it/s, loss=0.0168, lr=0.001]
Steps: 88%|████████▊ | 883/1000 [09:22<01:14, 1.58it/s, loss=0.00371, lr=0.001]
Steps: 88%|████████▊ | 884/1000 [09:23<01:13, 1.59it/s, loss=0.00371, lr=0.001]
tensor(0.0032, device='cuda:0')
tensor([[0.4104],
[0.4110]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4099], device='cuda:0')
Steps: 88%|████████▊ | 884/1000 [09:23<01:13, 1.59it/s, loss=0.000795, lr=0.001]
Steps: 88%|████████▊ | 885/1000 [09:24<01:13, 1.58it/s, loss=0.000795, lr=0.001]
Steps: 88%|████████▊ | 885/1000 [09:24<01:13, 1.58it/s, loss=0.00487, lr=0.001]
Steps: 89%|████████▊ | 886/1000 [09:24<01:12, 1.58it/s, loss=0.00487, lr=0.001]
tensor(0.0044, device='cuda:0')
tensor([[0.4106],
[0.4112]], device='cuda:0')
Current Norm : tensor([0.4096, 0.4101], device='cuda:0')
Steps: 89%|████████▊ | 886/1000 [09:24<01:12, 1.58it/s, loss=0.000165, lr=0.001]
Steps: 89%|████████▊ | 887/1000 [09:25<01:11, 1.57it/s, loss=0.000165, lr=0.001]
Steps: 89%|████████▊ | 887/1000 [09:25<01:11, 1.57it/s, loss=0.011, lr=0.001]
Steps: 89%|████████▉ | 888/1000 [09:25<01:10, 1.59it/s, loss=0.011, lr=0.001]
tensor(0.0014, device='cuda:0')
tensor([[0.4108],
[0.4115]], device='cuda:0')
Current Norm : tensor([0.4097, 0.4103], device='cuda:0')
Steps: 89%|████████▉ | 888/1000 [09:25<01:10, 1.59it/s, loss=0.000376, lr=0.001]
Steps: 89%|████████▉ | 889/1000 [09:26<01:10, 1.58it/s, loss=0.000376, lr=0.001]
Steps: 89%|████████▉ | 889/1000 [09:26<01:10, 1.58it/s, loss=0.00271, lr=0.001]
Steps: 89%|████████▉ | 890/1000 [09:27<01:09, 1.58it/s, loss=0.00271, lr=0.001]
tensor(0.0069, device='cuda:0')
tensor([[0.4109],
[0.4117]], device='cuda:0')
Current Norm : tensor([0.4098, 0.4105], device='cuda:0')
Steps: 89%|████████▉ | 890/1000 [09:27<01:09, 1.58it/s, loss=0.107, lr=0.001]
Steps: 89%|████████▉ | 891/1000 [09:27<01:09, 1.57it/s, loss=0.107, lr=0.001]
Steps: 89%|████████▉ | 891/1000 [09:27<01:09, 1.57it/s, loss=0.00925, lr=0.001]
Steps: 89%|████████▉ | 892/1000 [09:28<01:07, 1.59it/s, loss=0.00925, lr=0.001]
tensor(0.0027, device='cuda:0')
tensor([[0.4110],
[0.4118]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4106], device='cuda:0')
Steps: 89%|████████▉ | 892/1000 [09:28<01:07, 1.59it/s, loss=0.00599, lr=0.001]
Steps: 89%|████████▉ | 893/1000 [09:29<01:07, 1.58it/s, loss=0.00599, lr=0.001]
Steps: 89%|████████▉ | 893/1000 [09:29<01:07, 1.58it/s, loss=0.00141, lr=0.001]
Steps: 89%|████████▉ | 894/1000 [09:29<01:06, 1.60it/s, loss=0.00141, lr=0.001]
tensor(0.0028, device='cuda:0')
tensor([[0.4110],
[0.4119]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4107], device='cuda:0')
Steps: 89%|████████▉ | 894/1000 [09:29<01:06, 1.60it/s, loss=0.00575, lr=0.001]
Steps: 90%|████████▉ | 895/1000 [09:30<01:06, 1.58it/s, loss=0.00575, lr=0.001]
Steps: 90%|████████▉ | 895/1000 [09:30<01:06, 1.58it/s, loss=0.00116, lr=0.001]
Steps: 90%|████████▉ | 896/1000 [09:31<01:05, 1.59it/s, loss=0.00116, lr=0.001]
tensor(0.0084, device='cuda:0')
tensor([[0.4110],
[0.4120]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4108], device='cuda:0')
Steps: 90%|████████▉ | 896/1000 [09:31<01:05, 1.59it/s, loss=0.0673, lr=0.001]
Steps: 90%|████████▉ | 897/1000 [09:31<01:05, 1.58it/s, loss=0.0673, lr=0.001]
Steps: 90%|████████▉ | 897/1000 [09:31<01:05, 1.58it/s, loss=0.0103, lr=0.001]
Steps: 90%|████████▉ | 898/1000 [09:32<01:04, 1.59it/s, loss=0.0103, lr=0.001]
tensor(0.0061, device='cuda:0')
tensor([[0.4110],
[0.4120]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4108], device='cuda:0')
Steps: 90%|████████▉ | 898/1000 [09:32<01:04, 1.59it/s, loss=0.04, lr=0.001]
Steps: 90%|████████▉ | 899/1000 [09:32<01:03, 1.58it/s, loss=0.04, lr=0.001]
Steps: 90%|████████▉ | 899/1000 [09:32<01:03, 1.58it/s, loss=0.0017, lr=0.001]
Steps: 90%|█████████ | 900/1000 [09:33<01:02, 1.59it/s, loss=0.0017, lr=0.001]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0191, 0.0026, -0.0084, -0.0223], device='cuda:0',
grad_fn=<SliceBackward0>)
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0061, 0.0070, -0.0055, 0.0103], device='cuda:0',
grad_fn=<SliceBackward0>)
Saving weights to checkpoints/step_inv_900.safetensors
tensor(0.0058, device='cuda:0')
tensor([[0.4110],
[0.4121]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4109], device='cuda:0')
Steps: 90%|█████████ | 900/1000 [09:33<01:02, 1.59it/s, loss=0.00088, lr=0.001]
Steps: 90%|█████████ | 901/1000 [09:34<01:02, 1.58it/s, loss=0.00088, lr=0.001]
Steps: 90%|█████████ | 901/1000 [09:34<01:02, 1.58it/s, loss=0.0159, lr=0.001]
Steps: 90%|█████████ | 902/1000 [09:34<01:01, 1.58it/s, loss=0.0159, lr=0.001]
tensor(0.0033, device='cuda:0')
tensor([[0.4110],
[0.4121]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4109], device='cuda:0')
Steps: 90%|█████████ | 902/1000 [09:34<01:01, 1.58it/s, loss=0.00016, lr=0.001]
Steps: 90%|█████████ | 903/1000 [09:35<01:01, 1.57it/s, loss=0.00016, lr=0.001]
Steps: 90%|█████████ | 903/1000 [09:35<01:01, 1.57it/s, loss=0.00761, lr=0.001]
Steps: 90%|█████████ | 904/1000 [09:36<01:00, 1.58it/s, loss=0.00761, lr=0.001]
tensor(0.0021, device='cuda:0')
tensor([[0.4110],
[0.4122]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4110], device='cuda:0')
Steps: 90%|█████████ | 904/1000 [09:36<01:00, 1.58it/s, loss=0.00334, lr=0.001]
Steps: 90%|█████████ | 905/1000 [09:36<01:00, 1.57it/s, loss=0.00334, lr=0.001]
Steps: 90%|█████████ | 905/1000 [09:36<01:00, 1.57it/s, loss=0.00641, lr=0.001]
Steps: 91%|█████████ | 906/1000 [09:37<00:59, 1.59it/s, loss=0.00641, lr=0.001]
tensor(0.0027, device='cuda:0')
tensor([[0.4110],
[0.4122]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4109], device='cuda:0')
Steps: 91%|█████████ | 906/1000 [09:37<00:59, 1.59it/s, loss=0.00472, lr=0.001]
Steps: 91%|█████████ | 907/1000 [09:38<00:58, 1.58it/s, loss=0.00472, lr=0.001]
Steps: 91%|█████████ | 907/1000 [09:38<00:58, 1.58it/s, loss=0.00117, lr=0.001]
Steps: 91%|█████████ | 908/1000 [09:38<00:57, 1.59it/s, loss=0.00117, lr=0.001]
tensor(0.0101, device='cuda:0')
tensor([[0.4110],
[0.4121]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4109], device='cuda:0')
Steps: 91%|█████████ | 908/1000 [09:38<00:57, 1.59it/s, loss=0.0283, lr=0.001]
Steps: 91%|█████████ | 909/1000 [09:39<00:57, 1.58it/s, loss=0.0283, lr=0.001]
Steps: 91%|█████████ | 909/1000 [09:39<00:57, 1.58it/s, loss=0.0145, lr=0.001]
Steps: 91%|█████████ | 910/1000 [09:39<00:56, 1.58it/s, loss=0.0145, lr=0.001]
tensor(0.0054, device='cuda:0')
tensor([[0.4110],
[0.4121]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4109], device='cuda:0')
Steps: 91%|█████████ | 910/1000 [09:39<00:56, 1.58it/s, loss=0.000187, lr=0.001]
Steps: 91%|█████████ | 911/1000 [09:40<00:56, 1.56it/s, loss=0.000187, lr=0.001]
Steps: 91%|█████████ | 911/1000 [09:40<00:56, 1.56it/s, loss=0.0178, lr=0.001]
Steps: 91%|█████████ | 912/1000 [09:41<00:55, 1.58it/s, loss=0.0178, lr=0.001]
tensor(0.0091, device='cuda:0')
tensor([[0.4110],
[0.4121]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4109], device='cuda:0')
Steps: 91%|█████████ | 912/1000 [09:41<00:55, 1.58it/s, loss=0.0288, lr=0.001]
Steps: 91%|█████████▏| 913/1000 [09:41<00:55, 1.57it/s, loss=0.0288, lr=0.001]
Steps: 91%|█████████▏| 913/1000 [09:41<00:55, 1.57it/s, loss=0.0104, lr=0.001]
Steps: 91%|█████████▏| 914/1000 [09:42<00:54, 1.58it/s, loss=0.0104, lr=0.001]
tensor(0.0022, device='cuda:0')
tensor([[0.4111],
[0.4122]], device='cuda:0')
Current Norm : tensor([0.4100, 0.4109], device='cuda:0')
Steps: 91%|█████████▏| 914/1000 [09:42<00:54, 1.58it/s, loss=0.00321, lr=0.001]
Steps: 92%|█████████▏| 915/1000 [09:43<00:54, 1.57it/s, loss=0.00321, lr=0.001]
Steps: 92%|█████████▏| 915/1000 [09:43<00:54, 1.57it/s, loss=0.00369, lr=0.001]
Steps: 92%|█████████▏| 916/1000 [09:43<00:53, 1.58it/s, loss=0.00369, lr=0.001]
tensor(0.0083, device='cuda:0')
tensor([[0.4111],
[0.4121]], device='cuda:0')
Current Norm : tensor([0.4100, 0.4109], device='cuda:0')
Steps: 92%|█████████▏| 916/1000 [09:43<00:53, 1.58it/s, loss=0.0451, lr=0.001]
Steps: 92%|█████████▏| 917/1000 [09:44<00:52, 1.57it/s, loss=0.0451, lr=0.001]
Steps: 92%|█████████▏| 917/1000 [09:44<00:52, 1.57it/s, loss=0.00412, lr=0.001]
Steps: 92%|█████████▏| 918/1000 [09:44<00:51, 1.58it/s, loss=0.00412, lr=0.001]
tensor(0.0099, device='cuda:0')
tensor([[0.4111],
[0.4121]], device='cuda:0')
Current Norm : tensor([0.4100, 0.4109], device='cuda:0')
Steps: 92%|█████████▏| 918/1000 [09:44<00:51, 1.58it/s, loss=0.0192, lr=0.001]
Steps: 92%|█████████▏| 919/1000 [09:45<00:51, 1.57it/s, loss=0.0192, lr=0.001]
Steps: 92%|█████████▏| 919/1000 [09:45<00:51, 1.57it/s, loss=0.00786, lr=0.001]
Steps: 92%|█████████▏| 920/1000 [09:46<00:50, 1.58it/s, loss=0.00786, lr=0.001]
tensor(0.0030, device='cuda:0')
tensor([[0.4111],
[0.4120]], device='cuda:0')
Current Norm : tensor([0.4100, 0.4108], device='cuda:0')
Steps: 92%|█████████▏| 920/1000 [09:46<00:50, 1.58it/s, loss=0.00157, lr=0.001]
Steps: 92%|█████████▏| 921/1000 [09:46<00:50, 1.57it/s, loss=0.00157, lr=0.001]
Steps: 92%|█████████▏| 921/1000 [09:46<00:50, 1.57it/s, loss=0.00689, lr=0.001]
Steps: 92%|█████████▏| 922/1000 [09:47<00:49, 1.58it/s, loss=0.00689, lr=0.001]
tensor(0.0022, device='cuda:0')
tensor([[0.4111],
[0.4120]], device='cuda:0')
Current Norm : tensor([0.4100, 0.4108], device='cuda:0')
Steps: 92%|█████████▏| 922/1000 [09:47<00:49, 1.58it/s, loss=0.00361, lr=0.001]
Steps: 92%|█████████▏| 923/1000 [09:48<00:49, 1.56it/s, loss=0.00361, lr=0.001]
Steps: 92%|█████████▏| 923/1000 [09:48<00:49, 1.56it/s, loss=0.000176, lr=0.001]
Steps: 92%|█████████▏| 924/1000 [09:48<00:48, 1.57it/s, loss=0.000176, lr=0.001]
tensor(0.0025, device='cuda:0')
tensor([[0.4110],
[0.4120]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4108], device='cuda:0')
Steps: 92%|█████████▏| 924/1000 [09:48<00:48, 1.57it/s, loss=0.0106, lr=0.001]
Steps: 92%|█████████▎| 925/1000 [09:49<00:47, 1.57it/s, loss=0.0106, lr=0.001]
Steps: 92%|█████████▎| 925/1000 [09:49<00:47, 1.57it/s, loss=0.0317, lr=0.001]
Steps: 93%|█████████▎| 926/1000 [09:50<00:46, 1.59it/s, loss=0.0317, lr=0.001]
tensor(0.0038, device='cuda:0')
tensor([[0.4109],
[0.4120]], device='cuda:0')
Current Norm : tensor([0.4098, 0.4108], device='cuda:0')
Steps: 93%|█████████▎| 926/1000 [09:50<00:46, 1.59it/s, loss=0.000973, lr=0.001]
Steps: 93%|█████████▎| 927/1000 [09:50<00:46, 1.58it/s, loss=0.000973, lr=0.001]
Steps: 93%|█████████▎| 927/1000 [09:50<00:46, 1.58it/s, loss=0.00414, lr=0.001]
Steps: 93%|█████████▎| 928/1000 [09:51<00:45, 1.59it/s, loss=0.00414, lr=0.001]
tensor(0.0077, device='cuda:0')
tensor([[0.4108],
[0.4119]], device='cuda:0')
Current Norm : tensor([0.4097, 0.4107], device='cuda:0')
Steps: 93%|█████████▎| 928/1000 [09:51<00:45, 1.59it/s, loss=0.00288, lr=0.001]
Steps: 93%|█████████▎| 929/1000 [09:51<00:44, 1.58it/s, loss=0.00288, lr=0.001]
Steps: 93%|█████████▎| 929/1000 [09:51<00:44, 1.58it/s, loss=0.0412, lr=0.001]
Steps: 93%|█████████▎| 930/1000 [09:52<00:44, 1.58it/s, loss=0.0412, lr=0.001]
tensor(0.0071, device='cuda:0')
tensor([[0.4106],
[0.4117]], device='cuda:0')
Current Norm : tensor([0.4096, 0.4106], device='cuda:0')
Steps: 93%|█████████▎| 930/1000 [09:52<00:44, 1.58it/s, loss=0.000763, lr=0.001]
Steps: 93%|█████████▎| 931/1000 [09:53<00:43, 1.57it/s, loss=0.000763, lr=0.001]
Steps: 93%|█████████▎| 931/1000 [09:53<00:43, 1.57it/s, loss=0.0899, lr=0.001]
Steps: 93%|█████████▎| 932/1000 [09:53<00:43, 1.58it/s, loss=0.0899, lr=0.001]
tensor(0.0035, device='cuda:0')
tensor([[0.4105],
[0.4116]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4104], device='cuda:0')
Steps: 93%|█████████▎| 932/1000 [09:53<00:43, 1.58it/s, loss=0.00604, lr=0.001]
Steps: 93%|█████████▎| 933/1000 [09:54<00:42, 1.58it/s, loss=0.00604, lr=0.001]
Steps: 93%|█████████▎| 933/1000 [09:54<00:42, 1.58it/s, loss=0.00167, lr=0.001]
Steps: 93%|█████████▎| 934/1000 [09:55<00:41, 1.59it/s, loss=0.00167, lr=0.001]
tensor(0.0070, device='cuda:0')
tensor([[0.4102],
[0.4115]], device='cuda:0')
Current Norm : tensor([0.4092, 0.4103], device='cuda:0')
Steps: 93%|█████████▎| 934/1000 [09:55<00:41, 1.59it/s, loss=0.00276, lr=0.001]
Steps: 94%|█████████▎| 935/1000 [09:55<00:41, 1.57it/s, loss=0.00276, lr=0.001]
Steps: 94%|█████████▎| 935/1000 [09:55<00:41, 1.57it/s, loss=0.0862, lr=0.001]
Steps: 94%|█████████▎| 936/1000 [09:56<00:40, 1.58it/s, loss=0.0862, lr=0.001]
tensor(0.0059, device='cuda:0')
tensor([[0.4099],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4089, 0.4103], device='cuda:0')
Steps: 94%|█████████▎| 936/1000 [09:56<00:40, 1.58it/s, loss=0.0186, lr=0.001]
Steps: 94%|█████████▎| 937/1000 [09:57<00:40, 1.57it/s, loss=0.0186, lr=0.001]
Steps: 94%|█████████▎| 937/1000 [09:57<00:40, 1.57it/s, loss=0.000916, lr=0.001]
Steps: 94%|█████████▍| 938/1000 [09:57<00:39, 1.58it/s, loss=0.000916, lr=0.001]
tensor(0.0056, device='cuda:0')
tensor([[0.4096],
[0.4113]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4102], device='cuda:0')
Steps: 94%|█████████▍| 938/1000 [09:57<00:39, 1.58it/s, loss=0.0088, lr=0.001]
Steps: 94%|█████████▍| 939/1000 [09:58<00:38, 1.57it/s, loss=0.0088, lr=0.001]
Steps: 94%|█████████▍| 939/1000 [09:58<00:38, 1.57it/s, loss=0.000176, lr=0.001]
Steps: 94%|█████████▍| 940/1000 [09:58<00:37, 1.59it/s, loss=0.000176, lr=0.001]
tensor(0.0058, device='cuda:0')
tensor([[0.4093],
[0.4112]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4101], device='cuda:0')
Steps: 94%|█████████▍| 940/1000 [09:58<00:37, 1.59it/s, loss=0.0042, lr=0.001]
Steps: 94%|█████████▍| 941/1000 [09:59<00:37, 1.57it/s, loss=0.0042, lr=0.001]
Steps: 94%|█████████▍| 941/1000 [09:59<00:37, 1.57it/s, loss=0.0133, lr=0.001]
Steps: 94%|█████████▍| 942/1000 [10:00<00:36, 1.58it/s, loss=0.0133, lr=0.001]
tensor(0.0099, device='cuda:0')
tensor([[0.4091],
[0.4112]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4101], device='cuda:0')
Steps: 94%|█████████▍| 942/1000 [10:00<00:36, 1.58it/s, loss=0.092, lr=0.001]
Steps: 94%|█████████▍| 943/1000 [10:00<00:36, 1.57it/s, loss=0.092, lr=0.001]
Steps: 94%|█████████▍| 943/1000 [10:00<00:36, 1.57it/s, loss=0.0613, lr=0.001]
Steps: 94%|█████████▍| 944/1000 [10:01<00:35, 1.59it/s, loss=0.0613, lr=0.001]
tensor(0.0089, device='cuda:0')
tensor([[0.4090],
[0.4112]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4101], device='cuda:0')
Steps: 94%|█████████▍| 944/1000 [10:01<00:35, 1.59it/s, loss=0.019, lr=0.001]
Steps: 94%|█████████▍| 945/1000 [10:02<00:34, 1.58it/s, loss=0.019, lr=0.001]
Steps: 94%|█████████▍| 945/1000 [10:02<00:34, 1.58it/s, loss=0.0166, lr=0.001]
Steps: 95%|█████████▍| 946/1000 [10:02<00:34, 1.58it/s, loss=0.0166, lr=0.001]
tensor(0.0028, device='cuda:0')
tensor([[0.4090],
[0.4113]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4101], device='cuda:0')
Steps: 95%|█████████▍| 946/1000 [10:02<00:34, 1.58it/s, loss=0.00476, lr=0.001]
Steps: 95%|█████████▍| 947/1000 [10:03<00:33, 1.57it/s, loss=0.00476, lr=0.001]
Steps: 95%|█████████▍| 947/1000 [10:03<00:33, 1.57it/s, loss=0.00987, lr=0.001]
Steps: 95%|█████████▍| 948/1000 [10:04<00:32, 1.58it/s, loss=0.00987, lr=0.001]
tensor(0.0051, device='cuda:0')
tensor([[0.4090],
[0.4113]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4102], device='cuda:0')
Steps: 95%|█████████▍| 948/1000 [10:04<00:32, 1.58it/s, loss=0.00904, lr=0.001]
Steps: 95%|█████████▍| 949/1000 [10:04<00:32, 1.57it/s, loss=0.00904, lr=0.001]
Steps: 95%|█████████▍| 949/1000 [10:04<00:32, 1.57it/s, loss=0.00788, lr=0.001]
Steps: 95%|█████████▌| 950/1000 [10:05<00:31, 1.58it/s, loss=0.00788, lr=0.001]
tensor(0.0036, device='cuda:0')
tensor([[0.4091],
[0.4113]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4102], device='cuda:0')
Steps: 95%|█████████▌| 950/1000 [10:05<00:31, 1.58it/s, loss=0.00126, lr=0.001]
Steps: 95%|█████████▌| 951/1000 [10:05<00:31, 1.56it/s, loss=0.00126, lr=0.001]
Steps: 95%|█████████▌| 951/1000 [10:05<00:31, 1.56it/s, loss=0.00716, lr=0.001]
Steps: 95%|█████████▌| 952/1000 [10:06<00:30, 1.58it/s, loss=0.00716, lr=0.001]
tensor(0.0069, device='cuda:0')
tensor([[0.4091],
[0.4113]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4102], device='cuda:0')
Steps: 95%|█████████▌| 952/1000 [10:06<00:30, 1.58it/s, loss=0.000269, lr=0.001]
Steps: 95%|█████████▌| 953/1000 [10:07<00:30, 1.56it/s, loss=0.000269, lr=0.001]
Steps: 95%|█████████▌| 953/1000 [10:07<00:30, 1.56it/s, loss=0.0689, lr=0.001]
Steps: 95%|█████████▌| 954/1000 [10:07<00:29, 1.58it/s, loss=0.0689, lr=0.001]
tensor(0.0086, device='cuda:0')
tensor([[0.4091],
[0.4112]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4101], device='cuda:0')
Steps: 95%|█████████▌| 954/1000 [10:07<00:29, 1.58it/s, loss=0.0537, lr=0.001]
Steps: 96%|█████████▌| 955/1000 [10:08<00:28, 1.56it/s, loss=0.0537, lr=0.001]
Steps: 96%|█████████▌| 955/1000 [10:08<00:28, 1.56it/s, loss=0.0128, lr=0.001]
Steps: 96%|█████████▌| 956/1000 [10:09<00:27, 1.58it/s, loss=0.0128, lr=0.001]
tensor(0.0061, device='cuda:0')
tensor([[0.4092],
[0.4111]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4100], device='cuda:0')
Steps: 96%|█████████▌| 956/1000 [10:09<00:27, 1.58it/s, loss=0.0269, lr=0.001]
Steps: 96%|█████████▌| 957/1000 [10:09<00:27, 1.57it/s, loss=0.0269, lr=0.001]
Steps: 96%|█████████▌| 957/1000 [10:09<00:27, 1.57it/s, loss=0.0107, lr=0.001]
Steps: 96%|█████████▌| 958/1000 [10:10<00:26, 1.58it/s, loss=0.0107, lr=0.001]
tensor(0.0027, device='cuda:0')
tensor([[0.4092],
[0.4110]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4099], device='cuda:0')
Steps: 96%|█████████▌| 958/1000 [10:10<00:26, 1.58it/s, loss=0.0081, lr=0.001]
Steps: 96%|█████████▌| 959/1000 [10:11<00:26, 1.57it/s, loss=0.0081, lr=0.001]
Steps: 96%|█████████▌| 959/1000 [10:11<00:26, 1.57it/s, loss=0.00707, lr=0.001]
Steps: 96%|█████████▌| 960/1000 [10:11<00:25, 1.59it/s, loss=0.00707, lr=0.001]
tensor(0.0100, device='cuda:0')
tensor([[0.4093],
[0.4109]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4098], device='cuda:0')
Steps: 96%|█████████▌| 960/1000 [10:11<00:25, 1.59it/s, loss=0.0223, lr=0.001]
Steps: 96%|█████████▌| 961/1000 [10:12<00:24, 1.58it/s, loss=0.0223, lr=0.001]
Steps: 96%|█████████▌| 961/1000 [10:12<00:24, 1.58it/s, loss=0.0405, lr=0.001]
Steps: 96%|█████████▌| 962/1000 [10:12<00:23, 1.59it/s, loss=0.0405, lr=0.001]
tensor(0.0028, device='cuda:0')
tensor([[0.4094],
[0.4108]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4097], device='cuda:0')
Steps: 96%|█████████▌| 962/1000 [10:12<00:23, 1.59it/s, loss=0.00538, lr=0.001]
Steps: 96%|█████████▋| 963/1000 [10:13<00:23, 1.57it/s, loss=0.00538, lr=0.001]
Steps: 96%|█████████▋| 963/1000 [10:13<00:23, 1.57it/s, loss=0.000406, lr=0.001]
Steps: 96%|█████████▋| 964/1000 [10:14<00:22, 1.58it/s, loss=0.000406, lr=0.001]
tensor(0.0081, device='cuda:0')
tensor([[0.4094],
[0.4107]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4097], device='cuda:0')
Steps: 96%|█████████▋| 964/1000 [10:14<00:22, 1.58it/s, loss=0.00454, lr=0.001]
Steps: 96%|█████████▋| 965/1000 [10:14<00:22, 1.57it/s, loss=0.00454, lr=0.001]
Steps: 96%|█████████▋| 965/1000 [10:14<00:22, 1.57it/s, loss=0.0148, lr=0.001]
Steps: 97%|█████████▋| 966/1000 [10:15<00:21, 1.58it/s, loss=0.0148, lr=0.001]
tensor(0.0029, device='cuda:0')
tensor([[0.4095],
[0.4107]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4096], device='cuda:0')
Steps: 97%|█████████▋| 966/1000 [10:15<00:21, 1.58it/s, loss=0.000232, lr=0.001]
Steps: 97%|█████████▋| 967/1000 [10:16<00:21, 1.56it/s, loss=0.000232, lr=0.001]
Steps: 97%|█████████▋| 967/1000 [10:16<00:21, 1.56it/s, loss=0.00795, lr=0.001]
Steps: 97%|█████████▋| 968/1000 [10:16<00:20, 1.58it/s, loss=0.00795, lr=0.001]
tensor(0.0018, device='cuda:0')
tensor([[0.4095],
[0.4106]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4095], device='cuda:0')
Steps: 97%|█████████▋| 968/1000 [10:16<00:20, 1.58it/s, loss=0.00367, lr=0.001]
Steps: 97%|█████████▋| 969/1000 [10:17<00:19, 1.56it/s, loss=0.00367, lr=0.001]
Steps: 97%|█████████▋| 969/1000 [10:17<00:19, 1.56it/s, loss=0.00094, lr=0.001]
Steps: 97%|█████████▋| 970/1000 [10:17<00:19, 1.58it/s, loss=0.00094, lr=0.001]
tensor(0.0063, device='cuda:0')
tensor([[0.4095],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4095], device='cuda:0')
Steps: 97%|█████████▋| 970/1000 [10:17<00:19, 1.58it/s, loss=0.00166, lr=0.001]
Steps: 97%|█████████▋| 971/1000 [10:18<00:18, 1.56it/s, loss=0.00166, lr=0.001]
Steps: 97%|█████████▋| 971/1000 [10:18<00:18, 1.56it/s, loss=0.0325, lr=0.001]
Steps: 97%|█████████▋| 972/1000 [10:19<00:17, 1.58it/s, loss=0.0325, lr=0.001]
tensor(0.0064, device='cuda:0')
tensor([[0.4094],
[0.4104]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4094], device='cuda:0')
Steps: 97%|█████████▋| 972/1000 [10:19<00:17, 1.58it/s, loss=0.0677, lr=0.001]
Steps: 97%|█████████▋| 973/1000 [10:19<00:17, 1.57it/s, loss=0.0677, lr=0.001]
Steps: 97%|█████████▋| 973/1000 [10:19<00:17, 1.57it/s, loss=0.00378, lr=0.001]
Steps: 97%|█████████▋| 974/1000 [10:20<00:16, 1.57it/s, loss=0.00378, lr=0.001]
tensor(0.0032, device='cuda:0')
tensor([[0.4094],
[0.4103]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4093], device='cuda:0')
Steps: 97%|█████████▋| 974/1000 [10:20<00:16, 1.57it/s, loss=0.00283, lr=0.001]
Steps: 98%|█████████▊| 975/1000 [10:21<00:15, 1.56it/s, loss=0.00283, lr=0.001]
Steps: 98%|█████████▊| 975/1000 [10:21<00:15, 1.56it/s, loss=0.0066, lr=0.001]
Steps: 98%|█████████▊| 976/1000 [10:21<00:15, 1.57it/s, loss=0.0066, lr=0.001]
tensor(0.0031, device='cuda:0')
tensor([[0.4094],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4092], device='cuda:0')
Steps: 98%|█████████▊| 976/1000 [10:21<00:15, 1.57it/s, loss=0.00529, lr=0.001]
Steps: 98%|█████████▊| 977/1000 [10:22<00:14, 1.56it/s, loss=0.00529, lr=0.001]
Steps: 98%|█████████▊| 977/1000 [10:22<00:14, 1.56it/s, loss=0.000486, lr=0.001]
Steps: 98%|█████████▊| 978/1000 [10:23<00:13, 1.57it/s, loss=0.000486, lr=0.001]
tensor(0.0005, device='cuda:0')
tensor([[0.4093],
[0.4100]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4090], device='cuda:0')
Steps: 98%|█████████▊| 978/1000 [10:23<00:13, 1.57it/s, loss=0.000342, lr=0.001]
Steps: 98%|█████████▊| 979/1000 [10:23<00:13, 1.56it/s, loss=0.000342, lr=0.001]
Steps: 98%|█████████▊| 979/1000 [10:23<00:13, 1.56it/s, loss=0.000447, lr=0.001]
Steps: 98%|█████████▊| 980/1000 [10:24<00:12, 1.57it/s, loss=0.000447, lr=0.001]
tensor(0.0029, device='cuda:0')
tensor([[0.4092],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4089], device='cuda:0')
Steps: 98%|█████████▊| 980/1000 [10:24<00:12, 1.57it/s, loss=0.00251, lr=0.001]
Steps: 98%|█████████▊| 981/1000 [10:25<00:12, 1.57it/s, loss=0.00251, lr=0.001]
Steps: 98%|█████████▊| 981/1000 [10:25<00:12, 1.57it/s, loss=0.00702, lr=0.001]
Steps: 98%|█████████▊| 982/1000 [10:25<00:11, 1.58it/s, loss=0.00702, lr=0.001]
tensor(0.0039, device='cuda:0')
tensor([[0.4091],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4087], device='cuda:0')
Steps: 98%|█████████▊| 982/1000 [10:25<00:11, 1.58it/s, loss=0.00368, lr=0.001]
Steps: 98%|█████████▊| 983/1000 [10:26<00:10, 1.57it/s, loss=0.00368, lr=0.001]
Steps: 98%|█████████▊| 983/1000 [10:26<00:10, 1.57it/s, loss=0.00127, lr=0.001]
Steps: 98%|█████████▊| 984/1000 [10:26<00:10, 1.59it/s, loss=0.00127, lr=0.001]
tensor(0.0060, device='cuda:0')
tensor([[0.4089],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4080, 0.4085], device='cuda:0')
Steps: 98%|█████████▊| 984/1000 [10:26<00:10, 1.59it/s, loss=0.0239, lr=0.001]
Steps: 98%|█████████▊| 985/1000 [10:27<00:09, 1.58it/s, loss=0.0239, lr=0.001]
Steps: 98%|█████████▊| 985/1000 [10:27<00:09, 1.58it/s, loss=0.00117, lr=0.001]
Steps: 99%|█████████▊| 986/1000 [10:28<00:08, 1.59it/s, loss=0.00117, lr=0.001]
tensor(0.0016, device='cuda:0')
tensor([[0.4088],
[0.4092]], device='cuda:0')
Current Norm : tensor([0.4079, 0.4083], device='cuda:0')
Steps: 99%|█████████▊| 986/1000 [10:28<00:08, 1.59it/s, loss=0.000254, lr=0.001]
Steps: 99%|█████████▊| 987/1000 [10:28<00:08, 1.58it/s, loss=0.000254, lr=0.001]
Steps: 99%|█████████▊| 987/1000 [10:28<00:08, 1.58it/s, loss=0.0089, lr=0.001]
Steps: 99%|█████████▉| 988/1000 [10:29<00:07, 1.59it/s, loss=0.0089, lr=0.001]
tensor(0.0023, device='cuda:0')
tensor([[0.4087],
[0.4089]], device='cuda:0')
Current Norm : tensor([0.4078, 0.4080], device='cuda:0')
Steps: 99%|█████████▉| 988/1000 [10:29<00:07, 1.59it/s, loss=0.000653, lr=0.001]
Steps: 99%|█████████▉| 989/1000 [10:30<00:07, 1.57it/s, loss=0.000653, lr=0.001]
Steps: 99%|█████████▉| 989/1000 [10:30<00:07, 1.57it/s, loss=0.00647, lr=0.001]
Steps: 99%|█████████▉| 990/1000 [10:30<00:06, 1.58it/s, loss=0.00647, lr=0.001]
tensor(0.0026, device='cuda:0')
tensor([[0.4085],
[0.4086]], device='cuda:0')
Current Norm : tensor([0.4076, 0.4078], device='cuda:0')
Steps: 99%|█████████▉| 990/1000 [10:30<00:06, 1.58it/s, loss=0.00136, lr=0.001]
Steps: 99%|█████████▉| 991/1000 [10:31<00:05, 1.57it/s, loss=0.00136, lr=0.001]
Steps: 99%|█████████▉| 991/1000 [10:31<00:05, 1.57it/s, loss=0.0052, lr=0.001]
Steps: 99%|█████████▉| 992/1000 [10:31<00:05, 1.58it/s, loss=0.0052, lr=0.001]
tensor(0.0022, device='cuda:0')
tensor([[0.4083],
[0.4083]], device='cuda:0')
Current Norm : tensor([0.4075, 0.4075], device='cuda:0')
Steps: 99%|█████████▉| 992/1000 [10:31<00:05, 1.58it/s, loss=0.0147, lr=0.001]
Steps: 99%|█████████▉| 993/1000 [10:32<00:04, 1.57it/s, loss=0.0147, lr=0.001]
Steps: 99%|█████████▉| 993/1000 [10:32<00:04, 1.57it/s, loss=0.000513, lr=0.001]
Steps: 99%|█████████▉| 994/1000 [10:33<00:03, 1.59it/s, loss=0.000513, lr=0.001]
tensor(0.0060, device='cuda:0')
tensor([[0.4081],
[0.4080]], device='cuda:0')
Current Norm : tensor([0.4073, 0.4072], device='cuda:0')
Steps: 99%|█████████▉| 994/1000 [10:33<00:03, 1.59it/s, loss=0.0951, lr=0.001]
Steps: 100%|█████████▉| 995/1000 [10:33<00:03, 1.57it/s, loss=0.0951, lr=0.001]
Steps: 100%|█████████▉| 995/1000 [10:33<00:03, 1.57it/s, loss=0.000628, lr=0.001]
Steps: 100%|█████████▉| 996/1000 [10:34<00:02, 1.58it/s, loss=0.000628, lr=0.001]
tensor(0.0060, device='cuda:0')
tensor([[0.4079],
[0.4078]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4070], device='cuda:0')
Steps: 100%|█████████▉| 996/1000 [10:34<00:02, 1.58it/s, loss=0.0297, lr=0.001]
Steps: 100%|█████████▉| 997/1000 [10:35<00:01, 1.57it/s, loss=0.0297, lr=0.001]
Steps: 100%|█████████▉| 997/1000 [10:35<00:01, 1.57it/s, loss=0.00103, lr=0.001]
Steps: 100%|█████████▉| 998/1000 [10:35<00:01, 1.58it/s, loss=0.00103, lr=0.001]
tensor(0.0121, device='cuda:0')
tensor([[0.4078],
[0.4076]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4068], device='cuda:0')
Steps: 100%|█████████▉| 998/1000 [10:35<00:01, 1.58it/s, loss=0.0109, lr=0.001]
Steps: 100%|█████████▉| 999/1000 [10:36<00:00, 1.54it/s, loss=0.0109, lr=0.001]
Steps: 100%|█████████▉| 999/1000 [10:36<00:00, 1.54it/s, loss=0.0542, lr=0.001]
Steps: 100%|██████████| 1000/1000 [10:37<00:00, 1.56it/s, loss=0.0542, lr=0.001]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0',
grad_fn=<SliceBackward0>)
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0',
grad_fn=<SliceBackward0>)
Saving weights to checkpoints/step_inv_1000.safetensors
Steps: 100%|██████████| 1000/1000 [10:37<00:00, 1.56it/s, loss=0.0126, lr=0.001]
Steps: 100%|██████████| 1000/1000 [10:37<00:00, 1.57it/s, loss=0.0126, lr=0.001]
PTI : has 288 lora
PTI : Before training:
0%| | 0/700 [00:00<?, ?it/s]
Steps: 0%| | 0/700 [00:00<?, ?it/s]
Steps: 0%| | 1/700 [00:05<1:09:22, 5.95s/it]
Steps: 0%| | 1/700 [00:05<1:09:22, 5.95s/it, loss=0.0136, lr=0.0004]
Steps: 0%| | 2/700 [00:06<32:36, 2.80s/it, loss=0.0136, lr=0.0004]
Steps: 0%| | 2/700 [00:06<32:36, 2.80s/it, loss=0.0645, lr=0.0004]
Steps: 0%| | 3/700 [00:07<20:35, 1.77s/it, loss=0.0645, lr=0.0004]
Steps: 0%| | 3/700 [00:07<20:35, 1.77s/it, loss=0.114, lr=0.0004]
Steps: 1%| | 4/700 [00:07<15:02, 1.30s/it, loss=0.114, lr=0.0004]
Steps: 1%| | 4/700 [00:07<15:02, 1.30s/it, loss=0.0147, lr=0.0004]
Steps: 1%| | 5/700 [00:08<11:48, 1.02s/it, loss=0.0147, lr=0.0004]
Steps: 1%| | 5/700 [00:08<11:48, 1.02s/it, loss=0.0199, lr=0.0004]
Steps: 1%| | 6/700 [00:08<10:01, 1.15it/s, loss=0.0199, lr=0.0004]
Steps: 1%| | 6/700 [00:08<10:01, 1.15it/s, loss=0.127, lr=0.0004]
Steps: 1%| | 7/700 [00:09<08:50, 1.31it/s, loss=0.127, lr=0.0004]
Steps: 1%| | 7/700 [00:09<08:50, 1.31it/s, loss=0.194, lr=0.0004]
Steps: 1%| | 8/700 [00:09<08:06, 1.42it/s, loss=0.194, lr=0.0004]
Steps: 1%| | 8/700 [00:09<08:06, 1.42it/s, loss=0.0105, lr=0.0004]
Steps: 1%|▏ | 9/700 [00:10<07:33, 1.52it/s, loss=0.0105, lr=0.0004]
Steps: 1%|▏ | 9/700 [00:10<07:33, 1.52it/s, loss=0.0122, lr=0.0004]
Steps: 1%|▏ | 10/700 [00:10<07:01, 1.64it/s, loss=0.0122, lr=0.0004]
Steps: 1%|▏ | 10/700 [00:10<07:01, 1.64it/s, loss=0.0168, lr=0.0004]
Steps: 2%|▏ | 11/700 [00:11<06:43, 1.71it/s, loss=0.0168, lr=0.0004]
Steps: 2%|▏ | 11/700 [00:11<06:43, 1.71it/s, loss=0.126, lr=0.0004]
Steps: 2%|▏ | 12/700 [00:12<06:31, 1.76it/s, loss=0.126, lr=0.0004]
Steps: 2%|▏ | 12/700 [00:12<06:31, 1.76it/s, loss=0.0972, lr=0.0004]
Steps: 2%|▏ | 13/700 [00:12<06:22, 1.80it/s, loss=0.0972, lr=0.0004]
Steps: 2%|▏ | 13/700 [00:12<06:22, 1.80it/s, loss=0.176, lr=0.0004]
Steps: 2%|▏ | 14/700 [00:13<06:08, 1.86it/s, loss=0.176, lr=0.0004]
Steps: 2%|▏ | 14/700 [00:13<06:08, 1.86it/s, loss=0.00823, lr=0.0004]
Steps: 2%|▏ | 15/700 [00:13<06:01, 1.89it/s, loss=0.00823, lr=0.0004]
Steps: 2%|▏ | 15/700 [00:13<06:01, 1.89it/s, loss=0.0132, lr=0.0004]
Steps: 2%|▏ | 16/700 [00:14<05:55, 1.92it/s, loss=0.0132, lr=0.0004]
Steps: 2%|▏ | 16/700 [00:14<05:55, 1.92it/s, loss=0.0208, lr=0.0004]
Steps: 2%|▏ | 17/700 [00:14<05:52, 1.93it/s, loss=0.0208, lr=0.0004]
Steps: 2%|▏ | 17/700 [00:14<05:52, 1.93it/s, loss=0.0074, lr=0.0004]
Steps: 3%|▎ | 18/700 [00:15<06:02, 1.88it/s, loss=0.0074, lr=0.0004]
Steps: 3%|▎ | 18/700 [00:15<06:02, 1.88it/s, loss=0.00776, lr=0.0004]
Steps: 3%|▎ | 19/700 [00:15<06:10, 1.84it/s, loss=0.00776, lr=0.0004]
Steps: 3%|▎ | 19/700 [00:15<06:10, 1.84it/s, loss=0.0114, lr=0.0004]
Steps: 3%|▎ | 20/700 [00:16<06:19, 1.79it/s, loss=0.0114, lr=0.0004]
Steps: 3%|▎ | 20/700 [00:16<06:19, 1.79it/s, loss=0.0615, lr=0.0004]
Steps: 3%|▎ | 21/700 [00:16<06:25, 1.76it/s, loss=0.0615, lr=0.0004]
Steps: 3%|▎ | 21/700 [00:16<06:25, 1.76it/s, loss=0.00527, lr=0.0004]
Steps: 3%|▎ | 22/700 [00:17<06:24, 1.76it/s, loss=0.00527, lr=0.0004]
Steps: 3%|▎ | 22/700 [00:17<06:24, 1.76it/s, loss=0.0075, lr=0.0004]
Steps: 3%|▎ | 23/700 [00:18<06:29, 1.74it/s, loss=0.0075, lr=0.0004]
Steps: 3%|▎ | 23/700 [00:18<06:29, 1.74it/s, loss=0.027, lr=0.0004]
Steps: 3%|▎ | 24/700 [00:18<06:17, 1.79it/s, loss=0.027, lr=0.0004]
Steps: 3%|▎ | 24/700 [00:18<06:17, 1.79it/s, loss=0.0509, lr=0.0004]
Steps: 4%|▎ | 25/700 [00:19<06:09, 1.83it/s, loss=0.0509, lr=0.0004]
Steps: 4%|▎ | 25/700 [00:19<06:09, 1.83it/s, loss=0.0534, lr=0.0004]
Steps: 4%|▎ | 26/700 [00:19<06:11, 1.81it/s, loss=0.0534, lr=0.0004]
Steps: 4%|▎ | 26/700 [00:19<06:11, 1.81it/s, loss=0.0332, lr=0.0004]
Steps: 4%|▍ | 27/700 [00:20<06:05, 1.84it/s, loss=0.0332, lr=0.0004]
Steps: 4%|▍ | 27/700 [00:20<06:05, 1.84it/s, loss=0.134, lr=0.0004]
Steps: 4%|▍ | 28/700 [00:20<05:56, 1.89it/s, loss=0.134, lr=0.0004]
Steps: 4%|▍ | 28/700 [00:20<05:56, 1.89it/s, loss=0.0159, lr=0.0004]
Steps: 4%|▍ | 29/700 [00:21<05:51, 1.91it/s, loss=0.0159, lr=0.0004]
Steps: 4%|▍ | 29/700 [00:21<05:51, 1.91it/s, loss=0.00841, lr=0.0004]
Steps: 4%|▍ | 30/700 [00:21<05:52, 1.90it/s, loss=0.00841, lr=0.0004]
Steps: 4%|▍ | 30/700 [00:21<05:52, 1.90it/s, loss=0.0104, lr=0.0004]
Steps: 4%|▍ | 31/700 [00:22<05:55, 1.88it/s, loss=0.0104, lr=0.0004]
Steps: 4%|▍ | 31/700 [00:22<05:55, 1.88it/s, loss=0.0769, lr=0.0004]
Steps: 5%|▍ | 32/700 [00:22<05:50, 1.91it/s, loss=0.0769, lr=0.0004]
Steps: 5%|▍ | 32/700 [00:22<05:50, 1.91it/s, loss=0.0564, lr=0.0004]
Steps: 5%|▍ | 33/700 [00:23<05:48, 1.91it/s, loss=0.0564, lr=0.0004]
Steps: 5%|▍ | 33/700 [00:23<05:48, 1.91it/s, loss=0.00519, lr=0.0004]
Steps: 5%|▍ | 34/700 [00:23<05:44, 1.93it/s, loss=0.00519, lr=0.0004]
Steps: 5%|▍ | 34/700 [00:23<05:44, 1.93it/s, loss=0.00172, lr=0.0004]
Steps: 5%|▌ | 35/700 [00:24<05:43, 1.94it/s, loss=0.00172, lr=0.0004]
Steps: 5%|▌ | 35/700 [00:24<05:43, 1.94it/s, loss=0.00847, lr=0.0004]
Steps: 5%|▌ | 36/700 [00:24<05:40, 1.95it/s, loss=0.00847, lr=0.0004]
Steps: 5%|▌ | 36/700 [00:24<05:40, 1.95it/s, loss=0.00893, lr=0.0004]
Steps: 5%|▌ | 37/700 [00:25<05:38, 1.96it/s, loss=0.00893, lr=0.0004]
Steps: 5%|▌ | 37/700 [00:25<05:38, 1.96it/s, loss=0.00843, lr=0.0004]
Steps: 5%|▌ | 38/700 [00:25<05:40, 1.94it/s, loss=0.00843, lr=0.0004]
Steps: 5%|▌ | 38/700 [00:25<05:40, 1.94it/s, loss=0.00305, lr=0.0004]
Steps: 6%|▌ | 39/700 [00:26<05:48, 1.90it/s, loss=0.00305, lr=0.0004]
Steps: 6%|▌ | 39/700 [00:26<05:48, 1.90it/s, loss=0.012, lr=0.0004]
Steps: 6%|▌ | 40/700 [00:26<05:53, 1.87it/s, loss=0.012, lr=0.0004]
Steps: 6%|▌ | 40/700 [00:26<05:53, 1.87it/s, loss=0.0233, lr=0.0004]
Steps: 6%|▌ | 41/700 [00:27<05:56, 1.85it/s, loss=0.0233, lr=0.0004]
Steps: 6%|▌ | 41/700 [00:27<05:56, 1.85it/s, loss=0.0213, lr=0.0004]
Steps: 6%|▌ | 42/700 [00:28<05:57, 1.84it/s, loss=0.0213, lr=0.0004]
Steps: 6%|▌ | 42/700 [00:28<05:57, 1.84it/s, loss=0.00223, lr=0.0004]
Steps: 6%|▌ | 43/700 [00:28<06:00, 1.82it/s, loss=0.00223, lr=0.0004]
Steps: 6%|▌ | 43/700 [00:28<06:00, 1.82it/s, loss=0.0261, lr=0.0004]
Steps: 6%|▋ | 44/700 [00:29<06:10, 1.77it/s, loss=0.0261, lr=0.0004]
Steps: 6%|▋ | 44/700 [00:29<06:10, 1.77it/s, loss=0.0833, lr=0.0004]
Steps: 6%|▋ | 45/700 [00:29<06:12, 1.76it/s, loss=0.0833, lr=0.0004]
Steps: 6%|▋ | 45/700 [00:29<06:12, 1.76it/s, loss=0.0273, lr=0.0004]
Steps: 7%|▋ | 46/700 [00:30<06:14, 1.75it/s, loss=0.0273, lr=0.0004]
Steps: 7%|▋ | 46/700 [00:30<06:14, 1.75it/s, loss=0.00564, lr=0.0004]
Steps: 7%|▋ | 47/700 [00:30<06:12, 1.76it/s, loss=0.00564, lr=0.0004]
Steps: 7%|▋ | 47/700 [00:30<06:12, 1.76it/s, loss=0.0392, lr=0.0004]
Steps: 7%|▋ | 48/700 [00:31<06:07, 1.77it/s, loss=0.0392, lr=0.0004]
Steps: 7%|▋ | 48/700 [00:31<06:07, 1.77it/s, loss=0.00178, lr=0.0004]
Steps: 7%|▋ | 49/700 [00:32<06:06, 1.78it/s, loss=0.00178, lr=0.0004]
Steps: 7%|▋ | 49/700 [00:32<06:06, 1.78it/s, loss=0.0246, lr=0.0004]
Steps: 7%|▋ | 50/700 [00:32<06:04, 1.78it/s, loss=0.0246, lr=0.0004]
Steps: 7%|▋ | 50/700 [00:32<06:04, 1.78it/s, loss=0.00817, lr=0.0004]
Steps: 7%|▋ | 51/700 [00:33<06:10, 1.75it/s, loss=0.00817, lr=0.0004]
Steps: 7%|▋ | 51/700 [00:33<06:10, 1.75it/s, loss=0.0547, lr=0.0004]
Steps: 7%|▋ | 52/700 [00:33<06:10, 1.75it/s, loss=0.0547, lr=0.0004]
Steps: 7%|▋ | 52/700 [00:33<06:10, 1.75it/s, loss=0.0248, lr=0.0004]
Steps: 8%|▊ | 53/700 [00:34<06:16, 1.72it/s, loss=0.0248, lr=0.0004]
Steps: 8%|▊ | 53/700 [00:34<06:16, 1.72it/s, loss=0.0956, lr=0.0004]
Steps: 8%|▊ | 54/700 [00:34<06:09, 1.75it/s, loss=0.0956, lr=0.0004]
Steps: 8%|▊ | 54/700 [00:34<06:09, 1.75it/s, loss=0.0246, lr=0.0004]
Steps: 8%|▊ | 55/700 [00:35<06:06, 1.76it/s, loss=0.0246, lr=0.0004]
Steps: 8%|▊ | 55/700 [00:35<06:06, 1.76it/s, loss=0.0204, lr=0.0004]
Steps: 8%|▊ | 56/700 [00:36<06:02, 1.78it/s, loss=0.0204, lr=0.0004]
Steps: 8%|▊ | 56/700 [00:36<06:02, 1.78it/s, loss=0.00192, lr=0.0004]
Steps: 8%|▊ | 57/700 [00:36<06:03, 1.77it/s, loss=0.00192, lr=0.0004]
Steps: 8%|▊ | 57/700 [00:36<06:03, 1.77it/s, loss=0.0176, lr=0.0004]
Steps: 8%|▊ | 58/700 [00:37<06:03, 1.77it/s, loss=0.0176, lr=0.0004]
Steps: 8%|▊ | 58/700 [00:37<06:03, 1.77it/s, loss=0.0782, lr=0.0004]
Steps: 8%|▊ | 59/700 [00:37<06:04, 1.76it/s, loss=0.0782, lr=0.0004]
Steps: 8%|▊ | 59/700 [00:37<06:04, 1.76it/s, loss=0.297, lr=0.0004]
Steps: 9%|▊ | 60/700 [00:38<05:59, 1.78it/s, loss=0.297, lr=0.0004]
Steps: 9%|▊ | 60/700 [00:38<05:59, 1.78it/s, loss=0.0103, lr=0.0004]
Steps: 9%|▊ | 61/700 [00:38<05:56, 1.79it/s, loss=0.0103, lr=0.0004]
Steps: 9%|▊ | 61/700 [00:38<05:56, 1.79it/s, loss=0.00232, lr=0.0004]
Steps: 9%|▉ | 62/700 [00:39<05:57, 1.78it/s, loss=0.00232, lr=0.0004]
Steps: 9%|▉ | 62/700 [00:39<05:57, 1.78it/s, loss=0.135, lr=0.0004]
Steps: 9%|▉ | 63/700 [00:39<05:52, 1.81it/s, loss=0.135, lr=0.0004]
Steps: 9%|▉ | 63/700 [00:39<05:52, 1.81it/s, loss=0.0448, lr=0.0004]
Steps: 9%|▉ | 64/700 [00:40<05:49, 1.82it/s, loss=0.0448, lr=0.0004]
Steps: 9%|▉ | 64/700 [00:40<05:49, 1.82it/s, loss=0.0329, lr=0.0004]
Steps: 9%|▉ | 65/700 [00:41<05:48, 1.82it/s, loss=0.0329, lr=0.0004]
Steps: 9%|▉ | 65/700 [00:41<05:48, 1.82it/s, loss=0.102, lr=0.0004]
Steps: 9%|▉ | 66/700 [00:41<05:48, 1.82it/s, loss=0.102, lr=0.0004]
Steps: 9%|▉ | 66/700 [00:41<05:48, 1.82it/s, loss=0.136, lr=0.0004]
Steps: 10%|▉ | 67/700 [00:42<05:49, 1.81it/s, loss=0.136, lr=0.0004]
Steps: 10%|▉ | 67/700 [00:42<05:49, 1.81it/s, loss=0.0229, lr=0.0004]
Steps: 10%|▉ | 68/700 [00:42<05:47, 1.82it/s, loss=0.0229, lr=0.0004]
Steps: 10%|▉ | 68/700 [00:42<05:47, 1.82it/s, loss=0.0538, lr=0.0004]
Steps: 10%|▉ | 69/700 [00:43<05:43, 1.84it/s, loss=0.0538, lr=0.0004]
Steps: 10%|▉ | 69/700 [00:43<05:43, 1.84it/s, loss=0.0282, lr=0.0004]
Steps: 10%|█ | 70/700 [00:43<05:42, 1.84it/s, loss=0.0282, lr=0.0004]
Steps: 10%|█ | 70/700 [00:43<05:42, 1.84it/s, loss=0.00587, lr=0.0004]
Steps: 10%|█ | 71/700 [00:44<05:45, 1.82it/s, loss=0.00587, lr=0.0004]
Steps: 10%|█ | 71/700 [00:44<05:45, 1.82it/s, loss=0.0534, lr=0.0004]
Steps: 10%|█ | 72/700 [00:44<05:43, 1.83it/s, loss=0.0534, lr=0.0004]
Steps: 10%|█ | 72/700 [00:44<05:43, 1.83it/s, loss=0.00902, lr=0.0004]
Steps: 10%|█ | 73/700 [00:45<05:40, 1.84it/s, loss=0.00902, lr=0.0004]
Steps: 10%|█ | 73/700 [00:45<05:40, 1.84it/s, loss=0.00754, lr=0.0004]
Steps: 11%|█ | 74/700 [00:45<05:42, 1.83it/s, loss=0.00754, lr=0.0004]
Steps: 11%|█ | 74/700 [00:45<05:42, 1.83it/s, loss=0.00843, lr=0.0004]
Steps: 11%|█ | 75/700 [00:46<05:42, 1.83it/s, loss=0.00843, lr=0.0004]
Steps: 11%|█ | 75/700 [00:46<05:42, 1.83it/s, loss=0.0558, lr=0.0004]
Steps: 11%|█ | 76/700 [00:47<05:41, 1.83it/s, loss=0.0558, lr=0.0004]
Steps: 11%|█ | 76/700 [00:47<05:41, 1.83it/s, loss=0.014, lr=0.0004]
Steps: 11%|█ | 77/700 [00:47<06:10, 1.68it/s, loss=0.014, lr=0.0004]
Steps: 11%|█ | 77/700 [00:47<06:10, 1.68it/s, loss=0.0103, lr=0.0004]
Steps: 11%|█ | 78/700 [00:48<06:02, 1.72it/s, loss=0.0103, lr=0.0004]
Steps: 11%|█ | 78/700 [00:48<06:02, 1.72it/s, loss=0.199, lr=0.0004]
Steps: 11%|█▏ | 79/700 [00:48<05:54, 1.75it/s, loss=0.199, lr=0.0004]
Steps: 11%|█▏ | 79/700 [00:48<05:54, 1.75it/s, loss=0.00994, lr=0.0004]
Steps: 11%|█▏ | 80/700 [00:49<05:50, 1.77it/s, loss=0.00994, lr=0.0004]
Steps: 11%|█▏ | 80/700 [00:49<05:50, 1.77it/s, loss=0.00166, lr=0.0004]
Steps: 12%|█▏ | 81/700 [00:49<05:47, 1.78it/s, loss=0.00166, lr=0.0004]
Steps: 12%|█▏ | 81/700 [00:49<05:47, 1.78it/s, loss=0.307, lr=0.0004]
Steps: 12%|█▏ | 82/700 [00:50<05:46, 1.79it/s, loss=0.307, lr=0.0004]
Steps: 12%|█▏ | 82/700 [00:50<05:46, 1.79it/s, loss=0.0787, lr=0.0004]
Steps: 12%|█▏ | 83/700 [00:51<05:41, 1.80it/s, loss=0.0787, lr=0.0004]
Steps: 12%|█▏ | 83/700 [00:51<05:41, 1.80it/s, loss=0.0285, lr=0.0004]
Steps: 12%|█▏ | 84/700 [00:51<05:41, 1.81it/s, loss=0.0285, lr=0.0004]
Steps: 12%|█▏ | 84/700 [00:51<05:41, 1.81it/s, loss=0.0156, lr=0.0004]
Steps: 12%|█▏ | 85/700 [00:52<05:39, 1.81it/s, loss=0.0156, lr=0.0004]
Steps: 12%|█▏ | 85/700 [00:52<05:39, 1.81it/s, loss=0.00945, lr=0.0004]
Steps: 12%|█▏ | 86/700 [00:52<05:36, 1.82it/s, loss=0.00945, lr=0.0004]
Steps: 12%|█▏ | 86/700 [00:52<05:36, 1.82it/s, loss=0.0294, lr=0.0004]
Steps: 12%|█▏ | 87/700 [00:53<05:38, 1.81it/s, loss=0.0294, lr=0.0004]
Steps: 12%|█▏ | 87/700 [00:53<05:38, 1.81it/s, loss=0.0266, lr=0.0004]
Steps: 13%|█▎ | 88/700 [00:53<05:37, 1.82it/s, loss=0.0266, lr=0.0004]
Steps: 13%|█▎ | 88/700 [00:53<05:37, 1.82it/s, loss=0.00252, lr=0.0004]
Steps: 13%|█▎ | 89/700 [00:54<05:39, 1.80it/s, loss=0.00252, lr=0.0004]
Steps: 13%|█▎ | 89/700 [00:54<05:39, 1.80it/s, loss=0.0111, lr=0.0004]
Steps: 13%|█▎ | 90/700 [00:54<05:44, 1.77it/s, loss=0.0111, lr=0.0004]
Steps: 13%|█▎ | 90/700 [00:54<05:44, 1.77it/s, loss=0.0113, lr=0.0004]
Steps: 13%|█▎ | 91/700 [00:55<05:38, 1.80it/s, loss=0.0113, lr=0.0004]
Steps: 13%|█▎ | 91/700 [00:55<05:38, 1.80it/s, loss=0.0463, lr=0.0004]
Steps: 13%|█▎ | 92/700 [00:56<05:32, 1.83it/s, loss=0.0463, lr=0.0004]
Steps: 13%|█▎ | 92/700 [00:56<05:32, 1.83it/s, loss=0.00671, lr=0.0004]
Steps: 13%|█▎ | 93/700 [00:56<05:29, 1.84it/s, loss=0.00671, lr=0.0004]
Steps: 13%|█▎ | 93/700 [00:56<05:29, 1.84it/s, loss=0.0407, lr=0.0004]
Steps: 13%|█▎ | 94/700 [00:57<05:28, 1.84it/s, loss=0.0407, lr=0.0004]
Steps: 13%|█▎ | 94/700 [00:57<05:28, 1.84it/s, loss=0.00514, lr=0.0004]
Steps: 14%|█▎ | 95/700 [00:57<05:31, 1.83it/s, loss=0.00514, lr=0.0004]
Steps: 14%|█▎ | 95/700 [00:57<05:31, 1.83it/s, loss=0.0298, lr=0.0004]
Steps: 14%|█▎ | 96/700 [00:58<05:34, 1.81it/s, loss=0.0298, lr=0.0004]
Steps: 14%|█▎ | 96/700 [00:58<05:34, 1.81it/s, loss=0.0139, lr=0.0004]
Steps: 14%|█▍ | 97/700 [00:58<05:29, 1.83it/s, loss=0.0139, lr=0.0004]
Steps: 14%|█▍ | 97/700 [00:58<05:29, 1.83it/s, loss=0.00684, lr=0.0004]
Steps: 14%|█▍ | 98/700 [00:59<05:20, 1.88it/s, loss=0.00684, lr=0.0004]
Steps: 14%|█▍ | 98/700 [00:59<05:20, 1.88it/s, loss=0.0252, lr=0.0004]
Steps: 14%|█▍ | 99/700 [00:59<05:21, 1.87it/s, loss=0.0252, lr=0.0004]
Steps: 14%|█▍ | 99/700 [00:59<05:21, 1.87it/s, loss=0.212, lr=0.0004]
Steps: 14%|█▍ | 100/700 [01:00<05:22, 1.86it/s, loss=0.212, lr=0.0004]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')
Saving weights to checkpoints/step_100.safetensors
LORA Unet Moved 0.0007244422449730337
LORA CLIP Moved 3.097903754678555e-05
Steps: 14%|█▍ | 100/700 [01:00<05:22, 1.86it/s, loss=0.0229, lr=0.0004]
Steps: 14%|█▍ | 101/700 [01:01<05:58, 1.67it/s, loss=0.0229, lr=0.0004]
Steps: 14%|█▍ | 101/700 [01:01<05:58, 1.67it/s, loss=0.0265, lr=0.0004]
Steps: 15%|█▍ | 102/700 [01:01<05:48, 1.72it/s, loss=0.0265, lr=0.0004]
Steps: 15%|█▍ | 102/700 [01:01<05:48, 1.72it/s, loss=0.0872, lr=0.0004]
Steps: 15%|█▍ | 103/700 [01:02<05:42, 1.75it/s, loss=0.0872, lr=0.0004]
Steps: 15%|█▍ | 103/700 [01:02<05:42, 1.75it/s, loss=0.0143, lr=0.0004]
Steps: 15%|█▍ | 104/700 [01:02<05:35, 1.78it/s, loss=0.0143, lr=0.0004]
Steps: 15%|█▍ | 104/700 [01:02<05:35, 1.78it/s, loss=0.0161, lr=0.0004]
Steps: 15%|█▌ | 105/700 [01:03<05:32, 1.79it/s, loss=0.0161, lr=0.0004]
Steps: 15%|█▌ | 105/700 [01:03<05:32, 1.79it/s, loss=0.011, lr=0.0004]
Steps: 15%|█▌ | 106/700 [01:03<05:31, 1.79it/s, loss=0.011, lr=0.0004]
Steps: 15%|█▌ | 106/700 [01:03<05:31, 1.79it/s, loss=0.0072, lr=0.0004]
Steps: 15%|█▌ | 107/700 [01:04<05:30, 1.80it/s, loss=0.0072, lr=0.0004]
Steps: 15%|█▌ | 107/700 [01:04<05:30, 1.80it/s, loss=0.00261, lr=0.0004]
Steps: 15%|█▌ | 108/700 [01:04<05:26, 1.81it/s, loss=0.00261, lr=0.0004]
Steps: 15%|█▌ | 108/700 [01:04<05:26, 1.81it/s, loss=0.00597, lr=0.0004]
Steps: 16%|█▌ | 109/700 [01:05<05:24, 1.82it/s, loss=0.00597, lr=0.0004]
Steps: 16%|█▌ | 109/700 [01:05<05:24, 1.82it/s, loss=0.073, lr=0.0004]
Steps: 16%|█▌ | 110/700 [01:05<05:23, 1.82it/s, loss=0.073, lr=0.0004]
Steps: 16%|█▌ | 110/700 [01:05<05:23, 1.82it/s, loss=0.0238, lr=0.0004]
Steps: 16%|█▌ | 111/700 [01:06<05:21, 1.83it/s, loss=0.0238, lr=0.0004]
Steps: 16%|█▌ | 111/700 [01:06<05:21, 1.83it/s, loss=0.00492, lr=0.0004]
Steps: 16%|█▌ | 112/700 [01:07<05:19, 1.84it/s, loss=0.00492, lr=0.0004]
Steps: 16%|█▌ | 112/700 [01:07<05:19, 1.84it/s, loss=0.00202, lr=0.0004]
Steps: 16%|█▌ | 113/700 [01:07<05:18, 1.84it/s, loss=0.00202, lr=0.0004]
Steps: 16%|█▌ | 113/700 [01:07<05:18, 1.84it/s, loss=0.0107, lr=0.0004]
Steps: 16%|█▋ | 114/700 [01:08<05:18, 1.84it/s, loss=0.0107, lr=0.0004]
Steps: 16%|█▋ | 114/700 [01:08<05:18, 1.84it/s, loss=0.0017, lr=0.0004]
Steps: 16%|█▋ | 115/700 [01:08<05:19, 1.83it/s, loss=0.0017, lr=0.0004]
Steps: 16%|█▋ | 115/700 [01:08<05:19, 1.83it/s, loss=0.0193, lr=0.0004]
Steps: 17%|█▋ | 116/700 [01:09<05:17, 1.84it/s, loss=0.0193, lr=0.0004]
Steps: 17%|█▋ | 116/700 [01:09<05:17, 1.84it/s, loss=0.0246, lr=0.0004]
Steps: 17%|█▋ | 117/700 [01:09<05:16, 1.84it/s, loss=0.0246, lr=0.0004]
Steps: 17%|█▋ | 117/700 [01:09<05:16, 1.84it/s, loss=0.0084, lr=0.0004]
Steps: 17%|█▋ | 118/700 [01:10<05:19, 1.82it/s, loss=0.0084, lr=0.0004]
Steps: 17%|█▋ | 118/700 [01:10<05:19, 1.82it/s, loss=0.369, lr=0.0004]
Steps: 17%|█▋ | 119/700 [01:10<05:18, 1.82it/s, loss=0.369, lr=0.0004]
Steps: 17%|█▋ | 119/700 [01:10<05:18, 1.82it/s, loss=0.0188, lr=0.0004]
Steps: 17%|█▋ | 120/700 [01:11<05:20, 1.81it/s, loss=0.0188, lr=0.0004]
Steps: 17%|█▋ | 120/700 [01:11<05:20, 1.81it/s, loss=0.0234, lr=0.0004]
Steps: 17%|█▋ | 121/700 [01:11<05:18, 1.82it/s, loss=0.0234, lr=0.0004]
Steps: 17%|█▋ | 121/700 [01:12<05:18, 1.82it/s, loss=0.0663, lr=0.0004]
Steps: 17%|█▋ | 122/700 [01:12<05:15, 1.83it/s, loss=0.0663, lr=0.0004]
Steps: 17%|█▋ | 122/700 [01:12<05:15, 1.83it/s, loss=0.00747, lr=0.0004]
Steps: 18%|█▊ | 123/700 [01:13<05:14, 1.84it/s, loss=0.00747, lr=0.0004]
Steps: 18%|█▊ | 123/700 [01:13<05:14, 1.84it/s, loss=0.0517, lr=0.0004]
Steps: 18%|█▊ | 124/700 [01:13<05:13, 1.84it/s, loss=0.0517, lr=0.0004]
Steps: 18%|█▊ | 124/700 [01:13<05:13, 1.84it/s, loss=0.00986, lr=0.0004]
Steps: 18%|█▊ | 125/700 [01:14<05:13, 1.83it/s, loss=0.00986, lr=0.0004]
Steps: 18%|█▊ | 125/700 [01:14<05:13, 1.83it/s, loss=0.00407, lr=0.0004]
Steps: 18%|█▊ | 126/700 [01:14<05:14, 1.83it/s, loss=0.00407, lr=0.0004]
Steps: 18%|█▊ | 126/700 [01:14<05:14, 1.83it/s, loss=0.00421, lr=0.0004]
Steps: 18%|█▊ | 127/700 [01:15<05:15, 1.81it/s, loss=0.00421, lr=0.0004]
Steps: 18%|█▊ | 127/700 [01:15<05:15, 1.81it/s, loss=0.0145, lr=0.0004]
Steps: 18%|█▊ | 128/700 [01:15<05:11, 1.84it/s, loss=0.0145, lr=0.0004]
Steps: 18%|█▊ | 128/700 [01:15<05:11, 1.84it/s, loss=0.00552, lr=0.0004]
Steps: 18%|█▊ | 129/700 [01:16<05:09, 1.84it/s, loss=0.00552, lr=0.0004]
Steps: 18%|█▊ | 129/700 [01:16<05:09, 1.84it/s, loss=0.0378, lr=0.0004]
Steps: 19%|█▊ | 130/700 [01:16<05:08, 1.85it/s, loss=0.0378, lr=0.0004]
Steps: 19%|█▊ | 130/700 [01:16<05:08, 1.85it/s, loss=0.0183, lr=0.0004]
Steps: 19%|█▊ | 131/700 [01:17<05:07, 1.85it/s, loss=0.0183, lr=0.0004]
Steps: 19%|█▊ | 131/700 [01:17<05:07, 1.85it/s, loss=0.0362, lr=0.0004]
Steps: 19%|█▉ | 132/700 [01:17<05:06, 1.86it/s, loss=0.0362, lr=0.0004]
Steps: 19%|█▉ | 132/700 [01:17<05:06, 1.86it/s, loss=0.0043, lr=0.0004]
Steps: 19%|█▉ | 133/700 [01:18<05:02, 1.87it/s, loss=0.0043, lr=0.0004]
Steps: 19%|█▉ | 133/700 [01:18<05:02, 1.87it/s, loss=0.0103, lr=0.0004]
Steps: 19%|█▉ | 134/700 [01:18<04:58, 1.89it/s, loss=0.0103, lr=0.0004]
Steps: 19%|█▉ | 134/700 [01:18<04:58, 1.89it/s, loss=0.0782, lr=0.0004]
Steps: 19%|█▉ | 135/700 [01:19<04:57, 1.90it/s, loss=0.0782, lr=0.0004]
Steps: 19%|█▉ | 135/700 [01:19<04:57, 1.90it/s, loss=0.00536, lr=0.0004]
Steps: 19%|█▉ | 136/700 [01:20<04:54, 1.91it/s, loss=0.00536, lr=0.0004]
Steps: 19%|█▉ | 136/700 [01:20<04:54, 1.91it/s, loss=0.00977, lr=0.0004]
Steps: 20%|█▉ | 137/700 [01:20<04:50, 1.94it/s, loss=0.00977, lr=0.0004]
Steps: 20%|█▉ | 137/700 [01:20<04:50, 1.94it/s, loss=0.0244, lr=0.0004]
Steps: 20%|█▉ | 138/700 [01:21<04:49, 1.94it/s, loss=0.0244, lr=0.0004]
Steps: 20%|█▉ | 138/700 [01:21<04:49, 1.94it/s, loss=0.0119, lr=0.0004]
Steps: 20%|█▉ | 139/700 [01:21<04:53, 1.91it/s, loss=0.0119, lr=0.0004]
Steps: 20%|█▉ | 139/700 [01:21<04:53, 1.91it/s, loss=0.00262, lr=0.0004]
Steps: 20%|██ | 140/700 [01:22<04:55, 1.90it/s, loss=0.00262, lr=0.0004]
Steps: 20%|██ | 140/700 [01:22<04:55, 1.90it/s, loss=0.0776, lr=0.0004]
Steps: 20%|██ | 141/700 [01:22<04:58, 1.87it/s, loss=0.0776, lr=0.0004]
Steps: 20%|██ | 141/700 [01:22<04:58, 1.87it/s, loss=0.00148, lr=0.0004]
Steps: 20%|██ | 142/700 [01:23<05:01, 1.85it/s, loss=0.00148, lr=0.0004]
Steps: 20%|██ | 142/700 [01:23<05:01, 1.85it/s, loss=0.0134, lr=0.0004]
Steps: 20%|██ | 143/700 [01:23<05:04, 1.83it/s, loss=0.0134, lr=0.0004]
Steps: 20%|██ | 143/700 [01:23<05:04, 1.83it/s, loss=0.0393, lr=0.0004]
Steps: 21%|██ | 144/700 [01:24<05:06, 1.81it/s, loss=0.0393, lr=0.0004]
Steps: 21%|██ | 144/700 [01:24<05:06, 1.81it/s, loss=0.164, lr=0.0004]
Steps: 21%|██ | 145/700 [01:24<05:00, 1.85it/s, loss=0.164, lr=0.0004]
Steps: 21%|██ | 145/700 [01:24<05:00, 1.85it/s, loss=0.0173, lr=0.0004]
Steps: 21%|██ | 146/700 [01:25<05:00, 1.85it/s, loss=0.0173, lr=0.0004]
Steps: 21%|██ | 146/700 [01:25<05:00, 1.85it/s, loss=0.00347, lr=0.0004]
Steps: 21%|██ | 147/700 [01:25<04:58, 1.85it/s, loss=0.00347, lr=0.0004]
Steps: 21%|██ | 147/700 [01:25<04:58, 1.85it/s, loss=0.0358, lr=0.0004]
Steps: 21%|██ | 148/700 [01:26<04:57, 1.85it/s, loss=0.0358, lr=0.0004]
Steps: 21%|██ | 148/700 [01:26<04:57, 1.85it/s, loss=0.00457, lr=0.0004]
Steps: 21%|██▏ | 149/700 [01:27<04:56, 1.86it/s, loss=0.00457, lr=0.0004]
Steps: 21%|██▏ | 149/700 [01:27<04:56, 1.86it/s, loss=0.0184, lr=0.0004]
Steps: 21%|██▏ | 150/700 [01:27<04:49, 1.90it/s, loss=0.0184, lr=0.0004]
Steps: 21%|██▏ | 150/700 [01:27<04:49, 1.90it/s, loss=0.00209, lr=0.0004]
Steps: 22%|██▏ | 151/700 [01:28<04:47, 1.91it/s, loss=0.00209, lr=0.0004]
Steps: 22%|██▏ | 151/700 [01:28<04:47, 1.91it/s, loss=0.0184, lr=0.0004]
Steps: 22%|██▏ | 152/700 [01:28<04:48, 1.90it/s, loss=0.0184, lr=0.0004]
Steps: 22%|██▏ | 152/700 [01:28<04:48, 1.90it/s, loss=0.242, lr=0.0004]
Steps: 22%|██▏ | 153/700 [01:29<04:50, 1.88it/s, loss=0.242, lr=0.0004]
Steps: 22%|██▏ | 153/700 [01:29<04:50, 1.88it/s, loss=0.0147, lr=0.0004]
Steps: 22%|██▏ | 154/700 [01:29<04:51, 1.88it/s, loss=0.0147, lr=0.0004]
Steps: 22%|██▏ | 154/700 [01:29<04:51, 1.88it/s, loss=0.018, lr=0.0004]
Steps: 22%|██▏ | 155/700 [01:30<04:48, 1.89it/s, loss=0.018, lr=0.0004]
Steps: 22%|██▏ | 155/700 [01:30<04:48, 1.89it/s, loss=0.0357, lr=0.0004]
Steps: 22%|██▏ | 156/700 [01:30<04:51, 1.86it/s, loss=0.0357, lr=0.0004]
Steps: 22%|██▏ | 156/700 [01:30<04:51, 1.86it/s, loss=0.0363, lr=0.0004]
Steps: 22%|██▏ | 157/700 [01:31<04:46, 1.90it/s, loss=0.0363, lr=0.0004]
Steps: 22%|██▏ | 157/700 [01:31<04:46, 1.90it/s, loss=0.0198, lr=0.0004]
Steps: 23%|██▎ | 158/700 [01:31<04:43, 1.91it/s, loss=0.0198, lr=0.0004]
Steps: 23%|██▎ | 158/700 [01:31<04:43, 1.91it/s, loss=0.00913, lr=0.0004]
Steps: 23%|██▎ | 159/700 [01:32<04:47, 1.88it/s, loss=0.00913, lr=0.0004]
Steps: 23%|██▎ | 159/700 [01:32<04:47, 1.88it/s, loss=0.00706, lr=0.0004]
Steps: 23%|██▎ | 160/700 [01:32<04:46, 1.88it/s, loss=0.00706, lr=0.0004]
Steps: 23%|██▎ | 160/700 [01:32<04:46, 1.88it/s, loss=0.0376, lr=0.0004]
Steps: 23%|██▎ | 161/700 [01:33<04:46, 1.88it/s, loss=0.0376, lr=0.0004]
Steps: 23%|██▎ | 161/700 [01:33<04:46, 1.88it/s, loss=0.0822, lr=0.0004]
Steps: 23%|██▎ | 162/700 [01:33<04:54, 1.82it/s, loss=0.0822, lr=0.0004]
Steps: 23%|██▎ | 162/700 [01:33<04:54, 1.82it/s, loss=0.0165, lr=0.0004]
Steps: 23%|██▎ | 163/700 [01:34<04:54, 1.82it/s, loss=0.0165, lr=0.0004]
Steps: 23%|██▎ | 163/700 [01:34<04:54, 1.82it/s, loss=0.0109, lr=0.0004]
Steps: 23%|██▎ | 164/700 [01:35<04:52, 1.83it/s, loss=0.0109, lr=0.0004]
Steps: 23%|██▎ | 164/700 [01:35<04:52, 1.83it/s, loss=0.0233, lr=0.0004]
Steps: 24%|██▎ | 165/700 [01:35<04:49, 1.85it/s, loss=0.0233, lr=0.0004]
Steps: 24%|██▎ | 165/700 [01:35<04:49, 1.85it/s, loss=0.00457, lr=0.0004]
Steps: 24%|██▎ | 166/700 [01:36<04:45, 1.87it/s, loss=0.00457, lr=0.0004]
Steps: 24%|██▎ | 166/700 [01:36<04:45, 1.87it/s, loss=0.0383, lr=0.0004]
Steps: 24%|██▍ | 167/700 [01:36<04:42, 1.89it/s, loss=0.0383, lr=0.0004]
Steps: 24%|██▍ | 167/700 [01:36<04:42, 1.89it/s, loss=0.074, lr=0.0004]
Steps: 24%|██▍ | 168/700 [01:37<04:42, 1.88it/s, loss=0.074, lr=0.0004]
Steps: 24%|██▍ | 168/700 [01:37<04:42, 1.88it/s, loss=0.0275, lr=0.0004]
Steps: 24%|██▍ | 169/700 [01:37<04:42, 1.88it/s, loss=0.0275, lr=0.0004]
Steps: 24%|██▍ | 169/700 [01:37<04:42, 1.88it/s, loss=0.012, lr=0.0004]
Steps: 24%|██▍ | 170/700 [01:38<04:44, 1.87it/s, loss=0.012, lr=0.0004]
Steps: 24%|██▍ | 170/700 [01:38<04:44, 1.87it/s, loss=0.00168, lr=0.0004]
Steps: 24%|██▍ | 171/700 [01:38<04:44, 1.86it/s, loss=0.00168, lr=0.0004]
Steps: 24%|██▍ | 171/700 [01:38<04:44, 1.86it/s, loss=0.00761, lr=0.0004]
Steps: 25%|██▍ | 172/700 [01:39<04:45, 1.85it/s, loss=0.00761, lr=0.0004]
Steps: 25%|██▍ | 172/700 [01:39<04:45, 1.85it/s, loss=0.002, lr=0.0004]
Steps: 25%|██▍ | 173/700 [01:39<04:46, 1.84it/s, loss=0.002, lr=0.0004]
Steps: 25%|██▍ | 173/700 [01:39<04:46, 1.84it/s, loss=0.0126, lr=0.0004]
Steps: 25%|██▍ | 174/700 [01:40<04:47, 1.83it/s, loss=0.0126, lr=0.0004]
Steps: 25%|██▍ | 174/700 [01:40<04:47, 1.83it/s, loss=0.0468, lr=0.0004]
Steps: 25%|██▌ | 175/700 [01:40<04:47, 1.83it/s, loss=0.0468, lr=0.0004]
Steps: 25%|██▌ | 175/700 [01:40<04:47, 1.83it/s, loss=0.0351, lr=0.0004]
Steps: 25%|██▌ | 176/700 [01:41<04:44, 1.84it/s, loss=0.0351, lr=0.0004]
Steps: 25%|██▌ | 176/700 [01:41<04:44, 1.84it/s, loss=0.0108, lr=0.0004]
Steps: 25%|██▌ | 177/700 [01:42<04:44, 1.84it/s, loss=0.0108, lr=0.0004]
Steps: 25%|██▌ | 177/700 [01:42<04:44, 1.84it/s, loss=0.133, lr=0.0004]
Steps: 25%|██▌ | 178/700 [01:42<04:42, 1.85it/s, loss=0.133, lr=0.0004]
Steps: 25%|██▌ | 178/700 [01:42<04:42, 1.85it/s, loss=0.00218, lr=0.0004]
Steps: 26%|██▌ | 179/700 [01:43<04:38, 1.87it/s, loss=0.00218, lr=0.0004]
Steps: 26%|██▌ | 179/700 [01:43<04:38, 1.87it/s, loss=0.00678, lr=0.0004]
Steps: 26%|██▌ | 180/700 [01:43<04:36, 1.88it/s, loss=0.00678, lr=0.0004]
Steps: 26%|██▌ | 180/700 [01:43<04:36, 1.88it/s, loss=0.0145, lr=0.0004]
Steps: 26%|██▌ | 181/700 [01:44<04:33, 1.90it/s, loss=0.0145, lr=0.0004]
Steps: 26%|██▌ | 181/700 [01:44<04:33, 1.90it/s, loss=0.0168, lr=0.0004]
Steps: 26%|██▌ | 182/700 [01:44<04:29, 1.93it/s, loss=0.0168, lr=0.0004]
Steps: 26%|██▌ | 182/700 [01:44<04:29, 1.93it/s, loss=0.0101, lr=0.0004]
Steps: 26%|██▌ | 183/700 [01:45<04:24, 1.96it/s, loss=0.0101, lr=0.0004]
Steps: 26%|██▌ | 183/700 [01:45<04:24, 1.96it/s, loss=0.0785, lr=0.0004]
Steps: 26%|██▋ | 184/700 [01:45<04:23, 1.96it/s, loss=0.0785, lr=0.0004]
Steps: 26%|██▋ | 184/700 [01:45<04:23, 1.96it/s, loss=0.00305, lr=0.0004]
Steps: 26%|██▋ | 185/700 [01:46<04:23, 1.96it/s, loss=0.00305, lr=0.0004]
Steps: 26%|██▋ | 185/700 [01:46<04:23, 1.96it/s, loss=0.208, lr=0.0004]
Steps: 27%|██▋ | 186/700 [01:46<04:25, 1.93it/s, loss=0.208, lr=0.0004]
Steps: 27%|██▋ | 186/700 [01:46<04:25, 1.93it/s, loss=0.00711, lr=0.0004]
Steps: 27%|██▋ | 187/700 [01:47<04:32, 1.88it/s, loss=0.00711, lr=0.0004]
Steps: 27%|██▋ | 187/700 [01:47<04:32, 1.88it/s, loss=0.0302, lr=0.0004]
Steps: 27%|██▋ | 188/700 [01:47<04:35, 1.86it/s, loss=0.0302, lr=0.0004]
Steps: 27%|██▋ | 188/700 [01:47<04:35, 1.86it/s, loss=0.0422, lr=0.0004]
Steps: 27%|██▋ | 189/700 [01:48<04:36, 1.85it/s, loss=0.0422, lr=0.0004]
Steps: 27%|██▋ | 189/700 [01:48<04:36, 1.85it/s, loss=0.0568, lr=0.0004]
Steps: 27%|██▋ | 190/700 [01:48<04:35, 1.85it/s, loss=0.0568, lr=0.0004]
Steps: 27%|██▋ | 190/700 [01:48<04:35, 1.85it/s, loss=0.00478, lr=0.0004]
Steps: 27%|██▋ | 191/700 [01:49<04:36, 1.84it/s, loss=0.00478, lr=0.0004]
Steps: 27%|██▋ | 191/700 [01:49<04:36, 1.84it/s, loss=0.0315, lr=0.0004]
Steps: 27%|██▋ | 192/700 [01:49<04:37, 1.83it/s, loss=0.0315, lr=0.0004]
Steps: 27%|██▋ | 192/700 [01:49<04:37, 1.83it/s, loss=0.00483, lr=0.0004]
Steps: 28%|██▊ | 193/700 [01:50<04:30, 1.87it/s, loss=0.00483, lr=0.0004]
Steps: 28%|██▊ | 193/700 [01:50<04:30, 1.87it/s, loss=0.0079, lr=0.0004]
Steps: 28%|██▊ | 194/700 [01:51<04:28, 1.88it/s, loss=0.0079, lr=0.0004]
Steps: 28%|██▊ | 194/700 [01:51<04:28, 1.88it/s, loss=0.00442, lr=0.0004]
Steps: 28%|██▊ | 195/700 [01:51<04:28, 1.88it/s, loss=0.00442, lr=0.0004]
Steps: 28%|██▊ | 195/700 [01:51<04:28, 1.88it/s, loss=0.047, lr=0.0004]
Steps: 28%|██▊ | 196/700 [01:52<04:28, 1.88it/s, loss=0.047, lr=0.0004]
Steps: 28%|██▊ | 196/700 [01:52<04:28, 1.88it/s, loss=0.0346, lr=0.0004]
Steps: 28%|██▊ | 197/700 [01:52<04:28, 1.87it/s, loss=0.0346, lr=0.0004]
Steps: 28%|██▊ | 197/700 [01:52<04:28, 1.87it/s, loss=0.128, lr=0.0004]
Steps: 28%|██▊ | 198/700 [01:53<04:31, 1.85it/s, loss=0.128, lr=0.0004]
Steps: 28%|██▊ | 198/700 [01:53<04:31, 1.85it/s, loss=0.00269, lr=0.0004]
Steps: 28%|██▊ | 199/700 [01:53<04:32, 1.84it/s, loss=0.00269, lr=0.0004]
Steps: 28%|██▊ | 199/700 [01:53<04:32, 1.84it/s, loss=0.0341, lr=0.0004]
Steps: 29%|██▊ | 200/700 [01:54<04:39, 1.79it/s, loss=0.0341, lr=0.0004]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')
Saving weights to checkpoints/step_200.safetensors
LORA Unet Moved 0.0009888404747471213
LORA CLIP Moved 4.0488466765964404e-05
Steps: 29%|██▊ | 200/700 [01:54<04:39, 1.79it/s, loss=0.12, lr=0.0004]
Steps: 29%|██▊ | 201/700 [01:55<05:00, 1.66it/s, loss=0.12, lr=0.0004]
Steps: 29%|██▊ | 201/700 [01:55<05:00, 1.66it/s, loss=0.0149, lr=0.0004]
Steps: 29%|██▉ | 202/700 [01:55<04:52, 1.70it/s, loss=0.0149, lr=0.0004]
Steps: 29%|██▉ | 202/700 [01:55<04:52, 1.70it/s, loss=0.0194, lr=0.0004]
Steps: 29%|██▉ | 203/700 [01:56<04:44, 1.75it/s, loss=0.0194, lr=0.0004]
Steps: 29%|██▉ | 203/700 [01:56<04:44, 1.75it/s, loss=0.00362, lr=0.0004]
Steps: 29%|██▉ | 204/700 [01:56<04:42, 1.76it/s, loss=0.00362, lr=0.0004]
Steps: 29%|██▉ | 204/700 [01:56<04:42, 1.76it/s, loss=0.0177, lr=0.0004]
Steps: 29%|██▉ | 205/700 [01:57<04:39, 1.77it/s, loss=0.0177, lr=0.0004]
Steps: 29%|██▉ | 205/700 [01:57<04:39, 1.77it/s, loss=0.0221, lr=0.0004]
Steps: 29%|██▉ | 206/700 [01:57<04:36, 1.78it/s, loss=0.0221, lr=0.0004]
Steps: 29%|██▉ | 206/700 [01:57<04:36, 1.78it/s, loss=0.0169, lr=0.0004]
Steps: 30%|██▉ | 207/700 [01:58<04:34, 1.80it/s, loss=0.0169, lr=0.0004]
Steps: 30%|██▉ | 207/700 [01:58<04:34, 1.80it/s, loss=0.0307, lr=0.0004]
Steps: 30%|██▉ | 208/700 [01:58<04:39, 1.76it/s, loss=0.0307, lr=0.0004]
Steps: 30%|██▉ | 208/700 [01:58<04:39, 1.76it/s, loss=0.0412, lr=0.0004]
Steps: 30%|██▉ | 209/700 [01:59<04:51, 1.69it/s, loss=0.0412, lr=0.0004]
Steps: 30%|██▉ | 209/700 [01:59<04:51, 1.69it/s, loss=0.0109, lr=0.0004]
Steps: 30%|███ | 210/700 [02:00<04:45, 1.72it/s, loss=0.0109, lr=0.0004]
Steps: 30%|███ | 210/700 [02:00<04:45, 1.72it/s, loss=0.00631, lr=0.0004]
Steps: 30%|███ | 211/700 [02:00<04:42, 1.73it/s, loss=0.00631, lr=0.0004]
Steps: 30%|███ | 211/700 [02:00<04:42, 1.73it/s, loss=0.135, lr=0.0004]
Steps: 30%|███ | 212/700 [02:01<04:35, 1.77it/s, loss=0.135, lr=0.0004]
Steps: 30%|███ | 212/700 [02:01<04:35, 1.77it/s, loss=0.0202, lr=0.0004]
Steps: 30%|███ | 213/700 [02:01<04:34, 1.77it/s, loss=0.0202, lr=0.0004]
Steps: 30%|███ | 213/700 [02:01<04:34, 1.77it/s, loss=0.00592, lr=0.0004]
Steps: 31%|███ | 214/700 [02:02<04:34, 1.77it/s, loss=0.00592, lr=0.0004]
Steps: 31%|███ | 214/700 [02:02<04:34, 1.77it/s, loss=0.267, lr=0.0004]
Steps: 31%|███ | 215/700 [02:02<04:28, 1.81it/s, loss=0.267, lr=0.0004]
Steps: 31%|███ | 215/700 [02:02<04:28, 1.81it/s, loss=0.0209, lr=0.0004]
Steps: 31%|███ | 216/700 [02:03<04:25, 1.82it/s, loss=0.0209, lr=0.0004]
Steps: 31%|███ | 216/700 [02:03<04:25, 1.82it/s, loss=0.0375, lr=0.0004]
Steps: 31%|███ | 217/700 [02:03<04:26, 1.81it/s, loss=0.0375, lr=0.0004]
Steps: 31%|███ | 217/700 [02:03<04:26, 1.81it/s, loss=0.00811, lr=0.0004]
Steps: 31%|███ | 218/700 [02:04<04:28, 1.79it/s, loss=0.00811, lr=0.0004]
Steps: 31%|███ | 218/700 [02:04<04:28, 1.79it/s, loss=0.0201, lr=0.0004]
Steps: 31%|███▏ | 219/700 [02:05<04:25, 1.81it/s, loss=0.0201, lr=0.0004]
Steps: 31%|███▏ | 219/700 [02:05<04:25, 1.81it/s, loss=0.0114, lr=0.0004]
Steps: 31%|███▏ | 220/700 [02:05<04:24, 1.82it/s, loss=0.0114, lr=0.0004]
Steps: 31%|███▏ | 220/700 [02:05<04:24, 1.82it/s, loss=0.104, lr=0.0004]
Steps: 32%|███▏ | 221/700 [02:06<04:24, 1.81it/s, loss=0.104, lr=0.0004]
Steps: 32%|███▏ | 221/700 [02:06<04:24, 1.81it/s, loss=0.0184, lr=0.0004]
Steps: 32%|███▏ | 222/700 [02:06<04:27, 1.78it/s, loss=0.0184, lr=0.0004]
Steps: 32%|███▏ | 222/700 [02:06<04:27, 1.78it/s, loss=0.0112, lr=0.0004]
Steps: 32%|███▏ | 223/700 [02:07<04:34, 1.73it/s, loss=0.0112, lr=0.0004]
Steps: 32%|███▏ | 223/700 [02:07<04:34, 1.73it/s, loss=0.0133, lr=0.0004]
Steps: 32%|███▏ | 224/700 [02:07<04:32, 1.75it/s, loss=0.0133, lr=0.0004]
Steps: 32%|███▏ | 224/700 [02:07<04:32, 1.75it/s, loss=0.0264, lr=0.0004]
Steps: 32%|███▏ | 225/700 [02:08<04:26, 1.78it/s, loss=0.0264, lr=0.0004]
Steps: 32%|███▏ | 225/700 [02:08<04:26, 1.78it/s, loss=0.0537, lr=0.0004]
Steps: 32%|███▏ | 226/700 [02:09<04:25, 1.79it/s, loss=0.0537, lr=0.0004]
Steps: 32%|███▏ | 226/700 [02:09<04:25, 1.79it/s, loss=0.00868, lr=0.0004]
Steps: 32%|███▏ | 227/700 [02:09<04:23, 1.79it/s, loss=0.00868, lr=0.0004]
Steps: 32%|███▏ | 227/700 [02:09<04:23, 1.79it/s, loss=0.0373, lr=0.0004]
Steps: 33%|███▎ | 228/700 [02:10<04:19, 1.82it/s, loss=0.0373, lr=0.0004]
Steps: 33%|███▎ | 228/700 [02:10<04:19, 1.82it/s, loss=0.0108, lr=0.0004]
Steps: 33%|███▎ | 229/700 [02:10<04:16, 1.83it/s, loss=0.0108, lr=0.0004]
Steps: 33%|███▎ | 229/700 [02:10<04:16, 1.83it/s, loss=0.0296, lr=0.0004]
Steps: 33%|███▎ | 230/700 [02:11<04:14, 1.85it/s, loss=0.0296, lr=0.0004]
Steps: 33%|███▎ | 230/700 [02:11<04:14, 1.85it/s, loss=0.0044, lr=0.0004]
Steps: 33%|███▎ | 231/700 [02:11<04:15, 1.83it/s, loss=0.0044, lr=0.0004]
Steps: 33%|███▎ | 231/700 [02:11<04:15, 1.83it/s, loss=0.156, lr=0.0004]
Steps: 33%|███▎ | 232/700 [02:12<04:15, 1.83it/s, loss=0.156, lr=0.0004]
Steps: 33%|███▎ | 232/700 [02:12<04:15, 1.83it/s, loss=0.00477, lr=0.0004]
Steps: 33%|███▎ | 233/700 [02:12<04:11, 1.86it/s, loss=0.00477, lr=0.0004]
Steps: 33%|███▎ | 233/700 [02:12<04:11, 1.86it/s, loss=0.112, lr=0.0004]
Steps: 33%|███▎ | 234/700 [02:13<04:09, 1.87it/s, loss=0.112, lr=0.0004]
Steps: 33%|███▎ | 234/700 [02:13<04:09, 1.87it/s, loss=0.0136, lr=0.0004]
Steps: 34%|███▎ | 235/700 [02:13<04:05, 1.89it/s, loss=0.0136, lr=0.0004]
Steps: 34%|███▎ | 235/700 [02:13<04:05, 1.89it/s, loss=0.0123, lr=0.0004]
Steps: 34%|███▎ | 236/700 [02:14<04:03, 1.91it/s, loss=0.0123, lr=0.0004]
Steps: 34%|███▎ | 236/700 [02:14<04:03, 1.91it/s, loss=0.022, lr=0.0004]
Steps: 34%|███▍ | 237/700 [02:14<04:00, 1.93it/s, loss=0.022, lr=0.0004]
Steps: 34%|███▍ | 237/700 [02:14<04:00, 1.93it/s, loss=0.00886, lr=0.0004]
Steps: 34%|███▍ | 238/700 [02:15<03:59, 1.93it/s, loss=0.00886, lr=0.0004]
Steps: 34%|███▍ | 238/700 [02:15<03:59, 1.93it/s, loss=0.00845, lr=0.0004]
Steps: 34%|███▍ | 239/700 [02:15<03:57, 1.94it/s, loss=0.00845, lr=0.0004]
Steps: 34%|███▍ | 239/700 [02:15<03:57, 1.94it/s, loss=0.00988, lr=0.0004]
Steps: 34%|███▍ | 240/700 [02:16<03:56, 1.94it/s, loss=0.00988, lr=0.0004]
Steps: 34%|███▍ | 240/700 [02:16<03:56, 1.94it/s, loss=0.00246, lr=0.0004]
Steps: 34%|███▍ | 241/700 [02:16<03:53, 1.97it/s, loss=0.00246, lr=0.0004]
Steps: 34%|███▍ | 241/700 [02:16<03:53, 1.97it/s, loss=0.00873, lr=0.0004]
Steps: 35%|███▍ | 242/700 [02:17<03:51, 1.98it/s, loss=0.00873, lr=0.0004]
Steps: 35%|███▍ | 242/700 [02:17<03:51, 1.98it/s, loss=0.00512, lr=0.0004]
Steps: 35%|███▍ | 243/700 [02:17<03:49, 1.99it/s, loss=0.00512, lr=0.0004]
Steps: 35%|███▍ | 243/700 [02:17<03:49, 1.99it/s, loss=0.0248, lr=0.0004]
Steps: 35%|███▍ | 244/700 [02:18<03:47, 2.00it/s, loss=0.0248, lr=0.0004]
Steps: 35%|███▍ | 244/700 [02:18<03:47, 2.00it/s, loss=0.00431, lr=0.0004]
Steps: 35%|███▌ | 245/700 [02:18<03:49, 1.98it/s, loss=0.00431, lr=0.0004]
Steps: 35%|███▌ | 245/700 [02:18<03:49, 1.98it/s, loss=0.0201, lr=0.0004]
Steps: 35%|███▌ | 246/700 [02:19<03:53, 1.95it/s, loss=0.0201, lr=0.0004]
Steps: 35%|███▌ | 246/700 [02:19<03:53, 1.95it/s, loss=0.0103, lr=0.0004]
Steps: 35%|███▌ | 247/700 [02:19<03:50, 1.96it/s, loss=0.0103, lr=0.0004]
Steps: 35%|███▌ | 247/700 [02:19<03:50, 1.96it/s, loss=0.0497, lr=0.0004]
Steps: 35%|███▌ | 248/700 [02:20<03:55, 1.92it/s, loss=0.0497, lr=0.0004]
Steps: 35%|███▌ | 248/700 [02:20<03:55, 1.92it/s, loss=0.163, lr=0.0004]
Steps: 36%|███▌ | 249/700 [02:21<04:03, 1.85it/s, loss=0.163, lr=0.0004]
Steps: 36%|███▌ | 249/700 [02:21<04:03, 1.85it/s, loss=0.0142, lr=0.0004]
Steps: 36%|███▌ | 250/700 [02:21<03:57, 1.90it/s, loss=0.0142, lr=0.0004]
Steps: 36%|███▌ | 250/700 [02:21<03:57, 1.90it/s, loss=0.00624, lr=0.0004]
Steps: 36%|███▌ | 251/700 [02:22<03:53, 1.92it/s, loss=0.00624, lr=0.0004]
Steps: 36%|███▌ | 251/700 [02:22<03:53, 1.92it/s, loss=0.0026, lr=0.0004]
Steps: 36%|███▌ | 252/700 [02:22<03:52, 1.93it/s, loss=0.0026, lr=0.0004]
Steps: 36%|███▌ | 252/700 [02:22<03:52, 1.93it/s, loss=0.15, lr=0.0004]
Steps: 36%|███▌ | 253/700 [02:23<03:51, 1.93it/s, loss=0.15, lr=0.0004]
Steps: 36%|███▌ | 253/700 [02:23<03:51, 1.93it/s, loss=0.0312, lr=0.0004]
Steps: 36%|███▋ | 254/700 [02:23<03:51, 1.93it/s, loss=0.0312, lr=0.0004]
Steps: 36%|███▋ | 254/700 [02:23<03:51, 1.93it/s, loss=0.0161, lr=0.0004]
Steps: 36%|███▋ | 255/700 [02:24<03:50, 1.93it/s, loss=0.0161, lr=0.0004]
Steps: 36%|███▋ | 255/700 [02:24<03:50, 1.93it/s, loss=0.00627, lr=0.0004]
Steps: 37%|███▋ | 256/700 [02:24<03:47, 1.95it/s, loss=0.00627, lr=0.0004]
Steps: 37%|███▋ | 256/700 [02:24<03:47, 1.95it/s, loss=0.0224, lr=0.0004]
Steps: 37%|███▋ | 257/700 [02:25<03:48, 1.94it/s, loss=0.0224, lr=0.0004]
Steps: 37%|███▋ | 257/700 [02:25<03:48, 1.94it/s, loss=0.0383, lr=0.0004]
Steps: 37%|███▋ | 258/700 [02:25<03:53, 1.90it/s, loss=0.0383, lr=0.0004]
Steps: 37%|███▋ | 258/700 [02:25<03:53, 1.90it/s, loss=0.0124, lr=0.0004]
Steps: 37%|███▋ | 259/700 [02:26<04:00, 1.84it/s, loss=0.0124, lr=0.0004]
Steps: 37%|███▋ | 259/700 [02:26<04:00, 1.84it/s, loss=0.00859, lr=0.0004]
Steps: 37%|███▋ | 260/700 [02:26<03:59, 1.84it/s, loss=0.00859, lr=0.0004]
Steps: 37%|███▋ | 260/700 [02:26<03:59, 1.84it/s, loss=0.25, lr=0.0004]
Steps: 37%|███▋ | 261/700 [02:27<04:04, 1.80it/s, loss=0.25, lr=0.0004]
Steps: 37%|███▋ | 261/700 [02:27<04:04, 1.80it/s, loss=0.00184, lr=0.0004]
Steps: 37%|███▋ | 262/700 [02:28<04:07, 1.77it/s, loss=0.00184, lr=0.0004]
Steps: 37%|███▋ | 262/700 [02:28<04:07, 1.77it/s, loss=0.0153, lr=0.0004]
Steps: 38%|███▊ | 263/700 [02:28<04:12, 1.73it/s, loss=0.0153, lr=0.0004]
Steps: 38%|███▊ | 263/700 [02:28<04:12, 1.73it/s, loss=0.0682, lr=0.0004]
Steps: 38%|███▊ | 264/700 [02:29<04:14, 1.71it/s, loss=0.0682, lr=0.0004]
Steps: 38%|███▊ | 264/700 [02:29<04:14, 1.71it/s, loss=0.0619, lr=0.0004]
Steps: 38%|███▊ | 265/700 [02:29<04:28, 1.62it/s, loss=0.0619, lr=0.0004]
Steps: 38%|███▊ | 265/700 [02:29<04:28, 1.62it/s, loss=0.0181, lr=0.0004]
Steps: 38%|███▊ | 266/700 [02:30<04:25, 1.64it/s, loss=0.0181, lr=0.0004]
Steps: 38%|███▊ | 266/700 [02:30<04:25, 1.64it/s, loss=0.0288, lr=0.0004]
Steps: 38%|███▊ | 267/700 [02:31<04:23, 1.64it/s, loss=0.0288, lr=0.0004]
Steps: 38%|███▊ | 267/700 [02:31<04:23, 1.64it/s, loss=0.00962, lr=0.0004]
Steps: 38%|███▊ | 268/700 [02:31<04:28, 1.61it/s, loss=0.00962, lr=0.0004]
Steps: 38%|███▊ | 268/700 [02:31<04:28, 1.61it/s, loss=0.0127, lr=0.0004]
Steps: 38%|███▊ | 269/700 [02:32<04:27, 1.61it/s, loss=0.0127, lr=0.0004]
Steps: 38%|███▊ | 269/700 [02:32<04:27, 1.61it/s, loss=0.00764, lr=0.0004]
Steps: 39%|███▊ | 270/700 [02:33<04:29, 1.60it/s, loss=0.00764, lr=0.0004]
Steps: 39%|███▊ | 270/700 [02:33<04:29, 1.60it/s, loss=0.005, lr=0.0004]
Steps: 39%|███▊ | 271/700 [02:33<04:29, 1.59it/s, loss=0.005, lr=0.0004]
Steps: 39%|███▊ | 271/700 [02:33<04:29, 1.59it/s, loss=0.0286, lr=0.0004]
Steps: 39%|███▉ | 272/700 [02:34<04:31, 1.58it/s, loss=0.0286, lr=0.0004]
Steps: 39%|███▉ | 272/700 [02:34<04:31, 1.58it/s, loss=0.0257, lr=0.0004]
Steps: 39%|███▉ | 273/700 [02:34<04:24, 1.62it/s, loss=0.0257, lr=0.0004]
Steps: 39%|███▉ | 273/700 [02:34<04:24, 1.62it/s, loss=0.0963, lr=0.0004]
Steps: 39%|███▉ | 274/700 [02:35<04:26, 1.60it/s, loss=0.0963, lr=0.0004]
Steps: 39%|███▉ | 274/700 [02:35<04:26, 1.60it/s, loss=0.00725, lr=0.0004]
Steps: 39%|███▉ | 275/700 [02:36<04:19, 1.64it/s, loss=0.00725, lr=0.0004]
Steps: 39%|███▉ | 275/700 [02:36<04:19, 1.64it/s, loss=0.00157, lr=0.0004]
Steps: 39%|███▉ | 276/700 [02:36<04:14, 1.66it/s, loss=0.00157, lr=0.0004]
Steps: 39%|███▉ | 276/700 [02:36<04:14, 1.66it/s, loss=0.00832, lr=0.0004]
Steps: 40%|███▉ | 277/700 [02:37<04:13, 1.67it/s, loss=0.00832, lr=0.0004]
Steps: 40%|███▉ | 277/700 [02:37<04:13, 1.67it/s, loss=0.0604, lr=0.0004]
Steps: 40%|███▉ | 278/700 [02:37<04:10, 1.68it/s, loss=0.0604, lr=0.0004]
Steps: 40%|███▉ | 278/700 [02:37<04:10, 1.68it/s, loss=0.0378, lr=0.0004]
Steps: 40%|███▉ | 279/700 [02:38<04:06, 1.71it/s, loss=0.0378, lr=0.0004]
Steps: 40%|███▉ | 279/700 [02:38<04:06, 1.71it/s, loss=0.0044, lr=0.0004]
Steps: 40%|████ | 280/700 [02:39<04:06, 1.71it/s, loss=0.0044, lr=0.0004]
Steps: 40%|████ | 280/700 [02:39<04:06, 1.71it/s, loss=0.0125, lr=0.0004]
Steps: 40%|████ | 281/700 [02:39<04:02, 1.73it/s, loss=0.0125, lr=0.0004]
Steps: 40%|████ | 281/700 [02:39<04:02, 1.73it/s, loss=0.00308, lr=0.0004]
Steps: 40%|████ | 282/700 [02:40<03:59, 1.74it/s, loss=0.00308, lr=0.0004]
Steps: 40%|████ | 282/700 [02:40<03:59, 1.74it/s, loss=0.0162, lr=0.0004]
Steps: 40%|████ | 283/700 [02:40<03:59, 1.74it/s, loss=0.0162, lr=0.0004]
Steps: 40%|████ | 283/700 [02:40<03:59, 1.74it/s, loss=0.0964, lr=0.0004]
Steps: 41%|████ | 284/700 [02:41<04:00, 1.73it/s, loss=0.0964, lr=0.0004]
Steps: 41%|████ | 284/700 [02:41<04:00, 1.73it/s, loss=0.0236, lr=0.0004]
Steps: 41%|████ | 285/700 [02:41<03:58, 1.74it/s, loss=0.0236, lr=0.0004]
Steps: 41%|████ | 285/700 [02:41<03:58, 1.74it/s, loss=0.016, lr=0.0004]
Steps: 41%|████ | 286/700 [02:42<03:55, 1.76it/s, loss=0.016, lr=0.0004]
Steps: 41%|████ | 286/700 [02:42<03:55, 1.76it/s, loss=0.00831, lr=0.0004]
Steps: 41%|████ | 287/700 [02:43<03:54, 1.76it/s, loss=0.00831, lr=0.0004]
Steps: 41%|████ | 287/700 [02:43<03:54, 1.76it/s, loss=0.0241, lr=0.0004]
Steps: 41%|████ | 288/700 [02:43<03:49, 1.80it/s, loss=0.0241, lr=0.0004]
Steps: 41%|████ | 288/700 [02:43<03:49, 1.80it/s, loss=0.0839, lr=0.0004]
Steps: 41%|████▏ | 289/700 [02:44<03:41, 1.85it/s, loss=0.0839, lr=0.0004]
Steps: 41%|████▏ | 289/700 [02:44<03:41, 1.85it/s, loss=0.0263, lr=0.0004]
Steps: 41%|████▏ | 290/700 [02:44<03:35, 1.90it/s, loss=0.0263, lr=0.0004]
Steps: 41%|████▏ | 290/700 [02:44<03:35, 1.90it/s, loss=0.0967, lr=0.0004]
Steps: 42%|████▏ | 291/700 [02:45<03:31, 1.93it/s, loss=0.0967, lr=0.0004]
Steps: 42%|████▏ | 291/700 [02:45<03:31, 1.93it/s, loss=0.0111, lr=0.0004]
Steps: 42%|████▏ | 292/700 [02:45<03:30, 1.94it/s, loss=0.0111, lr=0.0004]
Steps: 42%|████▏ | 292/700 [02:45<03:30, 1.94it/s, loss=0.0426, lr=0.0004]
Steps: 42%|████▏ | 293/700 [02:46<03:32, 1.92it/s, loss=0.0426, lr=0.0004]
Steps: 42%|████▏ | 293/700 [02:46<03:32, 1.92it/s, loss=0.0054, lr=0.0004]
Steps: 42%|████▏ | 294/700 [02:46<03:31, 1.92it/s, loss=0.0054, lr=0.0004]
Steps: 42%|████▏ | 294/700 [02:46<03:31, 1.92it/s, loss=0.0031, lr=0.0004]
Steps: 42%|████▏ | 295/700 [02:47<03:31, 1.91it/s, loss=0.0031, lr=0.0004]
Steps: 42%|████▏ | 295/700 [02:47<03:31, 1.91it/s, loss=0.0399, lr=0.0004]
Steps: 42%|████▏ | 296/700 [02:47<03:35, 1.87it/s, loss=0.0399, lr=0.0004]
Steps: 42%|████▏ | 296/700 [02:47<03:35, 1.87it/s, loss=0.0144, lr=0.0004]
Steps: 42%|████▏ | 297/700 [02:48<03:36, 1.86it/s, loss=0.0144, lr=0.0004]
Steps: 42%|████▏ | 297/700 [02:48<03:36, 1.86it/s, loss=0.0868, lr=0.0004]
Steps: 43%|████▎ | 298/700 [02:48<03:37, 1.85it/s, loss=0.0868, lr=0.0004]
Steps: 43%|████▎ | 298/700 [02:48<03:37, 1.85it/s, loss=0.0358, lr=0.0004]
Steps: 43%|████▎ | 299/700 [02:49<03:39, 1.82it/s, loss=0.0358, lr=0.0004]
Steps: 43%|████▎ | 299/700 [02:49<03:39, 1.82it/s, loss=0.0683, lr=0.0004]
Steps: 43%|████▎ | 300/700 [02:49<03:36, 1.84it/s, loss=0.0683, lr=0.0004]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')
Saving weights to checkpoints/step_300.safetensors
LORA Unet Moved 0.0012533192057162523
LORA CLIP Moved 5.122544462210499e-05
Steps: 43%|████▎ | 300/700 [02:49<03:36, 1.84it/s, loss=0.00153, lr=0.0004]
Steps: 43%|████▎ | 301/700 [02:50<04:01, 1.65it/s, loss=0.00153, lr=0.0004]
Steps: 43%|████▎ | 301/700 [02:50<04:01, 1.65it/s, loss=0.0337, lr=0.0004]
Steps: 43%|████▎ | 302/700 [02:51<03:52, 1.71it/s, loss=0.0337, lr=0.0004]
Steps: 43%|████▎ | 302/700 [02:51<03:52, 1.71it/s, loss=0.0974, lr=0.0004]
Steps: 43%|████▎ | 303/700 [02:51<03:40, 1.80it/s, loss=0.0974, lr=0.0004]
Steps: 43%|████▎ | 303/700 [02:51<03:40, 1.80it/s, loss=0.00531, lr=0.0004]
Steps: 43%|████▎ | 304/700 [02:52<03:34, 1.85it/s, loss=0.00531, lr=0.0004]
Steps: 43%|████▎ | 304/700 [02:52<03:34, 1.85it/s, loss=0.0179, lr=0.0004]
Steps: 44%|████▎ | 305/700 [02:52<03:32, 1.86it/s, loss=0.0179, lr=0.0004]
Steps: 44%|████▎ | 305/700 [02:52<03:32, 1.86it/s, loss=0.0687, lr=0.0004]
Steps: 44%|████▎ | 306/700 [02:53<03:35, 1.83it/s, loss=0.0687, lr=0.0004]
Steps: 44%|████▎ | 306/700 [02:53<03:35, 1.83it/s, loss=0.00892, lr=0.0004]
Steps: 44%|████▍ | 307/700 [02:53<03:36, 1.81it/s, loss=0.00892, lr=0.0004]
Steps: 44%|████▍ | 307/700 [02:53<03:36, 1.81it/s, loss=0.0717, lr=0.0004]
Steps: 44%|████▍ | 308/700 [02:54<03:31, 1.85it/s, loss=0.0717, lr=0.0004]
Steps: 44%|████▍ | 308/700 [02:54<03:31, 1.85it/s, loss=0.00435, lr=0.0004]
Steps: 44%|████▍ | 309/700 [02:54<03:31, 1.85it/s, loss=0.00435, lr=0.0004]
Steps: 44%|████▍ | 309/700 [02:54<03:31, 1.85it/s, loss=0.00829, lr=0.0004]
Steps: 44%|████▍ | 310/700 [02:55<03:36, 1.80it/s, loss=0.00829, lr=0.0004]
Steps: 44%|████▍ | 310/700 [02:55<03:36, 1.80it/s, loss=0.0713, lr=0.0004]
Steps: 44%|████▍ | 311/700 [02:55<03:31, 1.84it/s, loss=0.0713, lr=0.0004]
Steps: 44%|████▍ | 311/700 [02:55<03:31, 1.84it/s, loss=0.00767, lr=0.0004]
Steps: 45%|████▍ | 312/700 [02:56<03:25, 1.89it/s, loss=0.00767, lr=0.0004]
Steps: 45%|████▍ | 312/700 [02:56<03:25, 1.89it/s, loss=0.0893, lr=0.0004]
Steps: 45%|████▍ | 313/700 [02:56<03:19, 1.94it/s, loss=0.0893, lr=0.0004]
Steps: 45%|████▍ | 313/700 [02:56<03:19, 1.94it/s, loss=0.019, lr=0.0004]
Steps: 45%|████▍ | 314/700 [02:57<03:18, 1.95it/s, loss=0.019, lr=0.0004]
Steps: 45%|████▍ | 314/700 [02:57<03:18, 1.95it/s, loss=0.00861, lr=0.0004]
Steps: 45%|████▌ | 315/700 [02:57<03:15, 1.97it/s, loss=0.00861, lr=0.0004]
Steps: 45%|████▌ | 315/700 [02:57<03:15, 1.97it/s, loss=0.0777, lr=0.0004]
Steps: 45%|████▌ | 316/700 [02:58<03:13, 1.98it/s, loss=0.0777, lr=0.0004]
Steps: 45%|████▌ | 316/700 [02:58<03:13, 1.98it/s, loss=0.00247, lr=0.0004]
Steps: 45%|████▌ | 317/700 [02:58<03:15, 1.96it/s, loss=0.00247, lr=0.0004]
Steps: 45%|████▌ | 317/700 [02:58<03:15, 1.96it/s, loss=0.229, lr=0.0004]
Steps: 45%|████▌ | 318/700 [02:59<03:17, 1.93it/s, loss=0.229, lr=0.0004]
Steps: 45%|████▌ | 318/700 [02:59<03:17, 1.93it/s, loss=0.0106, lr=0.0004]
Steps: 46%|████▌ | 319/700 [03:00<03:21, 1.89it/s, loss=0.0106, lr=0.0004]
Steps: 46%|████▌ | 319/700 [03:00<03:21, 1.89it/s, loss=0.00504, lr=0.0004]
Steps: 46%|████▌ | 320/700 [03:00<03:23, 1.87it/s, loss=0.00504, lr=0.0004]
Steps: 46%|████▌ | 320/700 [03:00<03:23, 1.87it/s, loss=0.00787, lr=0.0004]
Steps: 46%|████▌ | 321/700 [03:01<03:25, 1.84it/s, loss=0.00787, lr=0.0004]
Steps: 46%|████▌ | 321/700 [03:01<03:25, 1.84it/s, loss=0.103, lr=0.0004]
Steps: 46%|████▌ | 322/700 [03:01<03:25, 1.84it/s, loss=0.103, lr=0.0004]
Steps: 46%|████▌ | 322/700 [03:01<03:25, 1.84it/s, loss=0.028, lr=0.0004]
Steps: 46%|████▌ | 323/700 [03:02<03:27, 1.82it/s, loss=0.028, lr=0.0004]
Steps: 46%|████▌ | 323/700 [03:02<03:27, 1.82it/s, loss=0.173, lr=0.0004]
Steps: 46%|████▋ | 324/700 [03:02<03:27, 1.82it/s, loss=0.173, lr=0.0004]
Steps: 46%|████▋ | 324/700 [03:02<03:27, 1.82it/s, loss=0.0602, lr=0.0004]
Steps: 46%|████▋ | 325/700 [03:03<03:28, 1.80it/s, loss=0.0602, lr=0.0004]
Steps: 46%|████▋ | 325/700 [03:03<03:28, 1.80it/s, loss=0.0443, lr=0.0004]
Steps: 47%|████▋ | 326/700 [03:03<03:27, 1.81it/s, loss=0.0443, lr=0.0004]
Steps: 47%|████▋ | 326/700 [03:03<03:27, 1.81it/s, loss=0.0424, lr=0.0004]
Steps: 47%|████▋ | 327/700 [03:04<03:27, 1.80it/s, loss=0.0424, lr=0.0004]
Steps: 47%|████▋ | 327/700 [03:04<03:27, 1.80it/s, loss=0.00866, lr=0.0004]
Steps: 47%|████▋ | 328/700 [03:05<03:29, 1.78it/s, loss=0.00866, lr=0.0004]
Steps: 47%|████▋ | 328/700 [03:05<03:29, 1.78it/s, loss=0.0145, lr=0.0004]
Steps: 47%|████▋ | 329/700 [03:05<03:27, 1.79it/s, loss=0.0145, lr=0.0004]
Steps: 47%|████▋ | 329/700 [03:05<03:27, 1.79it/s, loss=0.0291, lr=0.0004]
Steps: 47%|████▋ | 330/700 [03:06<03:27, 1.79it/s, loss=0.0291, lr=0.0004]
Steps: 47%|████▋ | 330/700 [03:06<03:27, 1.79it/s, loss=0.112, lr=0.0004]
Steps: 47%|████▋ | 331/700 [03:06<03:27, 1.78it/s, loss=0.112, lr=0.0004]
Steps: 47%|████▋ | 331/700 [03:06<03:27, 1.78it/s, loss=0.0583, lr=0.0004]
Steps: 47%|████▋ | 332/700 [03:07<03:29, 1.76it/s, loss=0.0583, lr=0.0004]
Steps: 47%|████▋ | 332/700 [03:07<03:29, 1.76it/s, loss=0.0574, lr=0.0004]
Steps: 48%|████▊ | 333/700 [03:07<03:29, 1.75it/s, loss=0.0574, lr=0.0004]
Steps: 48%|████▊ | 333/700 [03:07<03:29, 1.75it/s, loss=0.00921, lr=0.0004]
Steps: 48%|████▊ | 334/700 [03:08<03:21, 1.82it/s, loss=0.00921, lr=0.0004]
Steps: 48%|████▊ | 334/700 [03:08<03:21, 1.82it/s, loss=0.0178, lr=0.0004]
Steps: 48%|████▊ | 335/700 [03:08<03:19, 1.83it/s, loss=0.0178, lr=0.0004]
Steps: 48%|████▊ | 335/700 [03:08<03:19, 1.83it/s, loss=0.0147, lr=0.0004]
Steps: 48%|████▊ | 336/700 [03:09<03:21, 1.80it/s, loss=0.0147, lr=0.0004]
Steps: 48%|████▊ | 336/700 [03:09<03:21, 1.80it/s, loss=0.0233, lr=0.0004]
Steps: 48%|████▊ | 337/700 [03:10<03:22, 1.80it/s, loss=0.0233, lr=0.0004]
Steps: 48%|████▊ | 337/700 [03:10<03:22, 1.80it/s, loss=0.0265, lr=0.0004]
Steps: 48%|████▊ | 338/700 [03:10<03:22, 1.79it/s, loss=0.0265, lr=0.0004]
Steps: 48%|████▊ | 338/700 [03:10<03:22, 1.79it/s, loss=0.0103, lr=0.0004]
Steps: 48%|████▊ | 339/700 [03:11<03:26, 1.75it/s, loss=0.0103, lr=0.0004]
Steps: 48%|████▊ | 339/700 [03:11<03:26, 1.75it/s, loss=0.00171, lr=0.0004]
Steps: 49%|████▊ | 340/700 [03:11<03:23, 1.77it/s, loss=0.00171, lr=0.0004]
Steps: 49%|████▊ | 340/700 [03:11<03:23, 1.77it/s, loss=0.226, lr=0.0004]
Steps: 49%|████▊ | 341/700 [03:12<03:18, 1.80it/s, loss=0.226, lr=0.0004]
Steps: 49%|████▊ | 341/700 [03:12<03:18, 1.80it/s, loss=0.0407, lr=0.0004]
Steps: 49%|████▉ | 342/700 [03:12<03:13, 1.85it/s, loss=0.0407, lr=0.0004]
Steps: 49%|████▉ | 342/700 [03:12<03:13, 1.85it/s, loss=0.0194, lr=0.0004]
Steps: 49%|████▉ | 343/700 [03:13<03:08, 1.89it/s, loss=0.0194, lr=0.0004]
Steps: 49%|████▉ | 343/700 [03:13<03:08, 1.89it/s, loss=0.00992, lr=0.0004]
Steps: 49%|████▉ | 344/700 [03:13<03:06, 1.90it/s, loss=0.00992, lr=0.0004]
Steps: 49%|████▉ | 344/700 [03:13<03:06, 1.90it/s, loss=0.0107, lr=0.0004]
Steps: 49%|████▉ | 345/700 [03:14<03:03, 1.93it/s, loss=0.0107, lr=0.0004]
Steps: 49%|████▉ | 345/700 [03:14<03:03, 1.93it/s, loss=0.028, lr=0.0004]
Steps: 49%|████▉ | 346/700 [03:14<03:02, 1.93it/s, loss=0.028, lr=0.0004]
Steps: 49%|████▉ | 346/700 [03:14<03:02, 1.93it/s, loss=0.00153, lr=0.0004]
Steps: 50%|████▉ | 347/700 [03:15<03:04, 1.91it/s, loss=0.00153, lr=0.0004]
Steps: 50%|████▉ | 347/700 [03:15<03:04, 1.91it/s, loss=0.0558, lr=0.0004]
Steps: 50%|████▉ | 348/700 [03:15<03:07, 1.88it/s, loss=0.0558, lr=0.0004]
Steps: 50%|████▉ | 348/700 [03:15<03:07, 1.88it/s, loss=0.0713, lr=0.0004]
Steps: 50%|████▉ | 349/700 [03:16<03:08, 1.86it/s, loss=0.0713, lr=0.0004]
Steps: 50%|████▉ | 349/700 [03:16<03:08, 1.86it/s, loss=0.0164, lr=0.0004]
Steps: 50%|█████ | 350/700 [03:17<03:10, 1.83it/s, loss=0.0164, lr=0.0004]
Steps: 50%|█████ | 350/700 [03:17<03:10, 1.83it/s, loss=0.243, lr=0.0004]
Steps: 50%|█████ | 351/700 [03:17<03:12, 1.82it/s, loss=0.243, lr=0.0004]
Steps: 50%|█████ | 351/700 [03:17<03:12, 1.82it/s, loss=0.0152, lr=0.0004]
Steps: 50%|█████ | 352/700 [03:18<03:11, 1.82it/s, loss=0.0152, lr=0.0004]
Steps: 50%|█████ | 352/700 [03:18<03:11, 1.82it/s, loss=0.0497, lr=0.0004]
Steps: 50%|█████ | 353/700 [03:18<03:10, 1.82it/s, loss=0.0497, lr=0.0004]
Steps: 50%|█████ | 353/700 [03:18<03:10, 1.82it/s, loss=0.0611, lr=0.0004]
Steps: 51%|█████ | 354/700 [03:19<03:07, 1.84it/s, loss=0.0611, lr=0.0004]
Steps: 51%|█████ | 354/700 [03:19<03:07, 1.84it/s, loss=0.0738, lr=0.0004]
Steps: 51%|█████ | 355/700 [03:19<03:04, 1.87it/s, loss=0.0738, lr=0.0004]
Steps: 51%|█████ | 355/700 [03:19<03:04, 1.87it/s, loss=0.00715, lr=0.0004]
Steps: 51%|█████ | 356/700 [03:20<03:06, 1.84it/s, loss=0.00715, lr=0.0004]
Steps: 51%|█████ | 356/700 [03:20<03:06, 1.84it/s, loss=0.0472, lr=0.0004]
Steps: 51%|█████ | 357/700 [03:20<03:06, 1.84it/s, loss=0.0472, lr=0.0004]
Steps: 51%|█████ | 357/700 [03:20<03:06, 1.84it/s, loss=0.0275, lr=0.0004]
Steps: 51%|█████ | 358/700 [03:21<03:06, 1.83it/s, loss=0.0275, lr=0.0004]
Steps: 51%|█████ | 358/700 [03:21<03:06, 1.83it/s, loss=0.111, lr=0.0004]
Steps: 51%|█████▏ | 359/700 [03:22<03:05, 1.84it/s, loss=0.111, lr=0.0004]
Steps: 51%|█████▏ | 359/700 [03:22<03:05, 1.84it/s, loss=0.0267, lr=0.0004]
Steps: 51%|█████▏ | 360/700 [03:22<03:07, 1.82it/s, loss=0.0267, lr=0.0004]
Steps: 51%|█████▏ | 360/700 [03:22<03:07, 1.82it/s, loss=0.0598, lr=0.0004]
Steps: 52%|█████▏ | 361/700 [03:23<03:07, 1.81it/s, loss=0.0598, lr=0.0004]
Steps: 52%|█████▏ | 361/700 [03:23<03:07, 1.81it/s, loss=0.0234, lr=0.0004]
Steps: 52%|█████▏ | 362/700 [03:23<03:08, 1.79it/s, loss=0.0234, lr=0.0004]
Steps: 52%|█████▏ | 362/700 [03:23<03:08, 1.79it/s, loss=0.00394, lr=0.0004]
Steps: 52%|█████▏ | 363/700 [03:24<03:07, 1.80it/s, loss=0.00394, lr=0.0004]
Steps: 52%|█████▏ | 363/700 [03:24<03:07, 1.80it/s, loss=0.103, lr=0.0004]
Steps: 52%|█████▏ | 364/700 [03:24<03:07, 1.80it/s, loss=0.103, lr=0.0004]
Steps: 52%|█████▏ | 364/700 [03:24<03:07, 1.80it/s, loss=0.0446, lr=0.0004]
Steps: 52%|█████▏ | 365/700 [03:25<03:07, 1.79it/s, loss=0.0446, lr=0.0004]
Steps: 52%|█████▏ | 365/700 [03:25<03:07, 1.79it/s, loss=0.0886, lr=0.0004]
Steps: 52%|█████▏ | 366/700 [03:25<03:03, 1.82it/s, loss=0.0886, lr=0.0004]
Steps: 52%|█████▏ | 366/700 [03:25<03:03, 1.82it/s, loss=0.00974, lr=0.0004]
Steps: 52%|█████▏ | 367/700 [03:26<02:57, 1.87it/s, loss=0.00974, lr=0.0004]
Steps: 52%|█████▏ | 367/700 [03:26<02:57, 1.87it/s, loss=0.0581, lr=0.0004]
Steps: 53%|█████▎ | 368/700 [03:26<02:55, 1.89it/s, loss=0.0581, lr=0.0004]
Steps: 53%|█████▎ | 368/700 [03:26<02:55, 1.89it/s, loss=0.0141, lr=0.0004]
Steps: 53%|█████▎ | 369/700 [03:27<02:53, 1.91it/s, loss=0.0141, lr=0.0004]
Steps: 53%|█████▎ | 369/700 [03:27<02:53, 1.91it/s, loss=0.108, lr=0.0004]
Steps: 53%|█████▎ | 370/700 [03:27<02:51, 1.93it/s, loss=0.108, lr=0.0004]
Steps: 53%|█████▎ | 370/700 [03:27<02:51, 1.93it/s, loss=0.0274, lr=0.0004]
Steps: 53%|█████▎ | 371/700 [03:28<02:53, 1.90it/s, loss=0.0274, lr=0.0004]
Steps: 53%|█████▎ | 371/700 [03:28<02:53, 1.90it/s, loss=0.0238, lr=0.0004]
Steps: 53%|█████▎ | 372/700 [03:29<02:55, 1.87it/s, loss=0.0238, lr=0.0004]
Steps: 53%|█████▎ | 372/700 [03:29<02:55, 1.87it/s, loss=0.0135, lr=0.0004]
Steps: 53%|█████▎ | 373/700 [03:29<02:56, 1.85it/s, loss=0.0135, lr=0.0004]
Steps: 53%|█████▎ | 373/700 [03:29<02:56, 1.85it/s, loss=0.0273, lr=0.0004]
Steps: 53%|█████▎ | 374/700 [03:30<02:57, 1.84it/s, loss=0.0273, lr=0.0004]
Steps: 53%|█████▎ | 374/700 [03:30<02:57, 1.84it/s, loss=0.0107, lr=0.0004]
Steps: 54%|█████▎ | 375/700 [03:30<02:58, 1.82it/s, loss=0.0107, lr=0.0004]
Steps: 54%|█████▎ | 375/700 [03:30<02:58, 1.82it/s, loss=0.117, lr=0.0004]
Steps: 54%|█████▎ | 376/700 [03:31<02:57, 1.82it/s, loss=0.117, lr=0.0004]
Steps: 54%|█████▎ | 376/700 [03:31<02:57, 1.82it/s, loss=0.00753, lr=0.0004]
Steps: 54%|█████▍ | 377/700 [03:31<02:53, 1.86it/s, loss=0.00753, lr=0.0004]
Steps: 54%|█████▍ | 377/700 [03:31<02:53, 1.86it/s, loss=0.00374, lr=0.0004]
Steps: 54%|█████▍ | 378/700 [03:32<02:50, 1.89it/s, loss=0.00374, lr=0.0004]
Steps: 54%|█████▍ | 378/700 [03:32<02:50, 1.89it/s, loss=0.00199, lr=0.0004]
Steps: 54%|█████▍ | 379/700 [03:32<02:49, 1.89it/s, loss=0.00199, lr=0.0004]
Steps: 54%|█████▍ | 379/700 [03:32<02:49, 1.89it/s, loss=0.0103, lr=0.0004]
Steps: 54%|█████▍ | 380/700 [03:33<02:50, 1.88it/s, loss=0.0103, lr=0.0004]
Steps: 54%|█████▍ | 380/700 [03:33<02:50, 1.88it/s, loss=0.0585, lr=0.0004]
Steps: 54%|█████▍ | 381/700 [03:33<02:50, 1.87it/s, loss=0.0585, lr=0.0004]
Steps: 54%|█████▍ | 381/700 [03:33<02:50, 1.87it/s, loss=0.00844, lr=0.0004]
Steps: 55%|█████▍ | 382/700 [03:34<02:50, 1.87it/s, loss=0.00844, lr=0.0004]
Steps: 55%|█████▍ | 382/700 [03:34<02:50, 1.87it/s, loss=0.0385, lr=0.0004]
Steps: 55%|█████▍ | 383/700 [03:34<02:46, 1.90it/s, loss=0.0385, lr=0.0004]
Steps: 55%|█████▍ | 383/700 [03:34<02:46, 1.90it/s, loss=0.0191, lr=0.0004]
Steps: 55%|█████▍ | 384/700 [03:35<02:46, 1.90it/s, loss=0.0191, lr=0.0004]
Steps: 55%|█████▍ | 384/700 [03:35<02:46, 1.90it/s, loss=0.00918, lr=0.0004]
Steps: 55%|█████▌ | 385/700 [03:35<02:47, 1.88it/s, loss=0.00918, lr=0.0004]
Steps: 55%|█████▌ | 385/700 [03:36<02:47, 1.88it/s, loss=0.0416, lr=0.0004]
Steps: 55%|█████▌ | 386/700 [03:36<02:50, 1.84it/s, loss=0.0416, lr=0.0004]
Steps: 55%|█████▌ | 386/700 [03:36<02:50, 1.84it/s, loss=0.0671, lr=0.0004]
Steps: 55%|█████▌ | 387/700 [03:37<02:50, 1.84it/s, loss=0.0671, lr=0.0004]
Steps: 55%|█████▌ | 387/700 [03:37<02:50, 1.84it/s, loss=0.0628, lr=0.0004]
Steps: 55%|█████▌ | 388/700 [03:37<02:50, 1.83it/s, loss=0.0628, lr=0.0004]
Steps: 55%|█████▌ | 388/700 [03:37<02:50, 1.83it/s, loss=0.00164, lr=0.0004]
Steps: 56%|█████▌ | 389/700 [03:38<02:50, 1.83it/s, loss=0.00164, lr=0.0004]
Steps: 56%|█████▌ | 389/700 [03:38<02:50, 1.83it/s, loss=0.0177, lr=0.0004]
Steps: 56%|█████▌ | 390/700 [03:38<02:49, 1.83it/s, loss=0.0177, lr=0.0004]
Steps: 56%|█████▌ | 390/700 [03:38<02:49, 1.83it/s, loss=0.0583, lr=0.0004]
Steps: 56%|█████▌ | 391/700 [03:39<02:54, 1.77it/s, loss=0.0583, lr=0.0004]
Steps: 56%|█████▌ | 391/700 [03:39<02:54, 1.77it/s, loss=0.0428, lr=0.0004]
Steps: 56%|█████▌ | 392/700 [03:39<02:55, 1.76it/s, loss=0.0428, lr=0.0004]
Steps: 56%|█████▌ | 392/700 [03:39<02:55, 1.76it/s, loss=0.01, lr=0.0004]
Steps: 56%|█████▌ | 393/700 [03:40<02:55, 1.75it/s, loss=0.01, lr=0.0004]
Steps: 56%|█████▌ | 393/700 [03:40<02:55, 1.75it/s, loss=0.0341, lr=0.0004]
Steps: 56%|█████▋ | 394/700 [03:41<02:54, 1.76it/s, loss=0.0341, lr=0.0004]
Steps: 56%|█████▋ | 394/700 [03:41<02:54, 1.76it/s, loss=0.104, lr=0.0004]
Steps: 56%|█████▋ | 395/700 [03:41<02:54, 1.75it/s, loss=0.104, lr=0.0004]
Steps: 56%|█████▋ | 395/700 [03:41<02:54, 1.75it/s, loss=0.00275, lr=0.0004]
Steps: 57%|█████▋ | 396/700 [03:42<02:55, 1.74it/s, loss=0.00275, lr=0.0004]
Steps: 57%|█████▋ | 396/700 [03:42<02:55, 1.74it/s, loss=0.0398, lr=0.0004]
Steps: 57%|█████▋ | 397/700 [03:42<02:57, 1.71it/s, loss=0.0398, lr=0.0004]
Steps: 57%|█████▋ | 397/700 [03:42<02:57, 1.71it/s, loss=0.0031, lr=0.0004]
Steps: 57%|█████▋ | 398/700 [03:43<02:53, 1.74it/s, loss=0.0031, lr=0.0004]
Steps: 57%|█████▋ | 398/700 [03:43<02:53, 1.74it/s, loss=0.00922, lr=0.0004]
Steps: 57%|█████▋ | 399/700 [03:43<02:51, 1.76it/s, loss=0.00922, lr=0.0004]
Steps: 57%|█████▋ | 399/700 [03:43<02:51, 1.76it/s, loss=0.0128, lr=0.0004]
Steps: 57%|█████▋ | 400/700 [03:44<02:48, 1.79it/s, loss=0.0128, lr=0.0004]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')
Saving weights to checkpoints/step_400.safetensors
LORA Unet Moved 0.0015479204012081027
LORA CLIP Moved 6.280629168031737e-05
Steps: 57%|█████▋ | 400/700 [03:44<02:48, 1.79it/s, loss=0.00486, lr=0.0004]
Steps: 57%|█████▋ | 401/700 [03:45<02:59, 1.67it/s, loss=0.00486, lr=0.0004]
Steps: 57%|█████▋ | 401/700 [03:45<02:59, 1.67it/s, loss=0.0242, lr=0.0004]
Steps: 57%|█████▋ | 402/700 [03:45<02:54, 1.70it/s, loss=0.0242, lr=0.0004]
Steps: 57%|█████▋ | 402/700 [03:45<02:54, 1.70it/s, loss=0.0114, lr=0.0004]
Steps: 58%|█████▊ | 403/700 [03:46<02:49, 1.75it/s, loss=0.0114, lr=0.0004]
Steps: 58%|█████▊ | 403/700 [03:46<02:49, 1.75it/s, loss=0.101, lr=0.0004]
Steps: 58%|█████▊ | 404/700 [03:46<02:44, 1.80it/s, loss=0.101, lr=0.0004]
Steps: 58%|█████▊ | 404/700 [03:46<02:44, 1.80it/s, loss=0.0565, lr=0.0004]
Steps: 58%|█████▊ | 405/700 [03:47<02:39, 1.85it/s, loss=0.0565, lr=0.0004]
Steps: 58%|█████▊ | 405/700 [03:47<02:39, 1.85it/s, loss=0.0139, lr=0.0004]
Steps: 58%|█████▊ | 406/700 [03:47<02:37, 1.86it/s, loss=0.0139, lr=0.0004]
Steps: 58%|█████▊ | 406/700 [03:47<02:37, 1.86it/s, loss=0.00395, lr=0.0004]
Steps: 58%|█████▊ | 407/700 [03:48<02:34, 1.89it/s, loss=0.00395, lr=0.0004]
Steps: 58%|█████▊ | 407/700 [03:48<02:34, 1.89it/s, loss=0.00693, lr=0.0004]
Steps: 58%|█████▊ | 408/700 [03:48<02:34, 1.89it/s, loss=0.00693, lr=0.0004]
Steps: 58%|█████▊ | 408/700 [03:48<02:34, 1.89it/s, loss=0.0185, lr=0.0004]
Steps: 58%|█████▊ | 409/700 [03:49<02:36, 1.85it/s, loss=0.0185, lr=0.0004]
Steps: 58%|█████▊ | 409/700 [03:49<02:36, 1.85it/s, loss=0.0226, lr=0.0004]
Steps: 59%|█████▊ | 410/700 [03:49<02:37, 1.84it/s, loss=0.0226, lr=0.0004]
Steps: 59%|█████▊ | 410/700 [03:49<02:37, 1.84it/s, loss=0.0122, lr=0.0004]
Steps: 59%|█████▊ | 411/700 [03:50<02:37, 1.83it/s, loss=0.0122, lr=0.0004]
Steps: 59%|█████▊ | 411/700 [03:50<02:37, 1.83it/s, loss=0.00795, lr=0.0004]
Steps: 59%|█████▉ | 412/700 [03:51<02:38, 1.82it/s, loss=0.00795, lr=0.0004]
Steps: 59%|█████▉ | 412/700 [03:51<02:38, 1.82it/s, loss=0.00217, lr=0.0004]
Steps: 59%|█████▉ | 413/700 [03:51<02:39, 1.80it/s, loss=0.00217, lr=0.0004]
Steps: 59%|█████▉ | 413/700 [03:51<02:39, 1.80it/s, loss=0.0183, lr=0.0004]
Steps: 59%|█████▉ | 414/700 [03:52<02:37, 1.82it/s, loss=0.0183, lr=0.0004]
Steps: 59%|█████▉ | 414/700 [03:52<02:37, 1.82it/s, loss=0.0149, lr=0.0004]
Steps: 59%|█████▉ | 415/700 [03:52<02:32, 1.87it/s, loss=0.0149, lr=0.0004]
Steps: 59%|█████▉ | 415/700 [03:52<02:32, 1.87it/s, loss=0.00353, lr=0.0004]
Steps: 59%|█████▉ | 416/700 [03:53<02:29, 1.90it/s, loss=0.00353, lr=0.0004]
Steps: 59%|█████▉ | 416/700 [03:53<02:29, 1.90it/s, loss=0.0368, lr=0.0004]
Steps: 60%|█████▉ | 417/700 [03:53<02:31, 1.87it/s, loss=0.0368, lr=0.0004]
Steps: 60%|█████▉ | 417/700 [03:53<02:31, 1.87it/s, loss=0.00279, lr=0.0004]
Steps: 60%|█████▉ | 418/700 [03:54<02:34, 1.83it/s, loss=0.00279, lr=0.0004]
Steps: 60%|█████▉ | 418/700 [03:54<02:34, 1.83it/s, loss=0.01, lr=0.0004]
Steps: 60%|█████▉ | 419/700 [03:54<02:34, 1.82it/s, loss=0.01, lr=0.0004]
Steps: 60%|█████▉ | 419/700 [03:54<02:34, 1.82it/s, loss=0.00632, lr=0.0004]
Steps: 60%|██████ | 420/700 [03:55<02:34, 1.81it/s, loss=0.00632, lr=0.0004]
Steps: 60%|██████ | 420/700 [03:55<02:34, 1.81it/s, loss=0.178, lr=0.0004]
Steps: 60%|██████ | 421/700 [03:55<02:31, 1.85it/s, loss=0.178, lr=0.0004]
Steps: 60%|██████ | 421/700 [03:55<02:31, 1.85it/s, loss=0.00584, lr=0.0004]
Steps: 60%|██████ | 422/700 [03:56<02:29, 1.85it/s, loss=0.00584, lr=0.0004]
Steps: 60%|██████ | 422/700 [03:56<02:29, 1.85it/s, loss=0.0698, lr=0.0004]
Steps: 60%|██████ | 423/700 [03:57<02:28, 1.87it/s, loss=0.0698, lr=0.0004]
Steps: 60%|██████ | 423/700 [03:57<02:28, 1.87it/s, loss=0.0128, lr=0.0004]
Steps: 61%|██████ | 424/700 [03:57<02:28, 1.86it/s, loss=0.0128, lr=0.0004]
Steps: 61%|██████ | 424/700 [03:57<02:28, 1.86it/s, loss=0.0616, lr=0.0004]
Steps: 61%|██████ | 425/700 [03:58<02:29, 1.84it/s, loss=0.0616, lr=0.0004]
Steps: 61%|██████ | 425/700 [03:58<02:29, 1.84it/s, loss=0.0102, lr=0.0004]
Steps: 61%|██████ | 426/700 [03:58<02:34, 1.77it/s, loss=0.0102, lr=0.0004]
Steps: 61%|██████ | 426/700 [03:58<02:34, 1.77it/s, loss=0.00736, lr=0.0004]
Steps: 61%|██████ | 427/700 [03:59<02:40, 1.70it/s, loss=0.00736, lr=0.0004]
Steps: 61%|██████ | 427/700 [03:59<02:40, 1.70it/s, loss=0.0113, lr=0.0004]
Steps: 61%|██████ | 428/700 [04:00<02:43, 1.67it/s, loss=0.0113, lr=0.0004]
Steps: 61%|██████ | 428/700 [04:00<02:43, 1.67it/s, loss=0.00517, lr=0.0004]
Steps: 61%|██████▏ | 429/700 [04:00<02:37, 1.72it/s, loss=0.00517, lr=0.0004]
Steps: 61%|██████▏ | 429/700 [04:00<02:37, 1.72it/s, loss=0.032, lr=0.0004]
Steps: 61%|██████▏ | 430/700 [04:01<02:33, 1.76it/s, loss=0.032, lr=0.0004]
Steps: 61%|██████▏ | 430/700 [04:01<02:33, 1.76it/s, loss=0.0133, lr=0.0004]
Steps: 62%|██████▏ | 431/700 [04:01<02:32, 1.77it/s, loss=0.0133, lr=0.0004]
Steps: 62%|██████▏ | 431/700 [04:01<02:32, 1.77it/s, loss=0.0429, lr=0.0004]
Steps: 62%|██████▏ | 432/700 [04:02<02:29, 1.79it/s, loss=0.0429, lr=0.0004]
Steps: 62%|██████▏ | 432/700 [04:02<02:29, 1.79it/s, loss=0.00896, lr=0.0004]
Steps: 62%|██████▏ | 433/700 [04:02<02:30, 1.78it/s, loss=0.00896, lr=0.0004]
Steps: 62%|██████▏ | 433/700 [04:02<02:30, 1.78it/s, loss=0.072, lr=0.0004]
Steps: 62%|██████▏ | 434/700 [04:03<02:28, 1.79it/s, loss=0.072, lr=0.0004]
Steps: 62%|██████▏ | 434/700 [04:03<02:28, 1.79it/s, loss=0.011, lr=0.0004]
Steps: 62%|██████▏ | 435/700 [04:03<02:26, 1.81it/s, loss=0.011, lr=0.0004]
Steps: 62%|██████▏ | 435/700 [04:03<02:26, 1.81it/s, loss=0.116, lr=0.0004]
Steps: 62%|██████▏ | 436/700 [04:04<02:25, 1.81it/s, loss=0.116, lr=0.0004]
Steps: 62%|██████▏ | 436/700 [04:04<02:25, 1.81it/s, loss=0.00514, lr=0.0004]
Steps: 62%|██████▏ | 437/700 [04:04<02:25, 1.81it/s, loss=0.00514, lr=0.0004]
Steps: 62%|██████▏ | 437/700 [04:04<02:25, 1.81it/s, loss=0.0137, lr=0.0004]
Steps: 63%|██████▎ | 438/700 [04:05<02:24, 1.81it/s, loss=0.0137, lr=0.0004]
Steps: 63%|██████▎ | 438/700 [04:05<02:24, 1.81it/s, loss=0.00167, lr=0.0004]
Steps: 63%|██████▎ | 439/700 [04:06<02:22, 1.84it/s, loss=0.00167, lr=0.0004]
Steps: 63%|██████▎ | 439/700 [04:06<02:22, 1.84it/s, loss=0.0108, lr=0.0004]
Steps: 63%|██████▎ | 440/700 [04:06<02:21, 1.84it/s, loss=0.0108, lr=0.0004]
Steps: 63%|██████▎ | 440/700 [04:06<02:21, 1.84it/s, loss=0.0135, lr=0.0004]
Steps: 63%|██████▎ | 441/700 [04:07<02:21, 1.83it/s, loss=0.0135, lr=0.0004]
Steps: 63%|██████▎ | 441/700 [04:07<02:21, 1.83it/s, loss=0.0287, lr=0.0004]
Steps: 63%|██████▎ | 442/700 [04:07<02:21, 1.83it/s, loss=0.0287, lr=0.0004]
Steps: 63%|██████▎ | 442/700 [04:07<02:21, 1.83it/s, loss=0.0146, lr=0.0004]
Steps: 63%|██████▎ | 443/700 [04:08<02:20, 1.83it/s, loss=0.0146, lr=0.0004]
Steps: 63%|██████▎ | 443/700 [04:08<02:20, 1.83it/s, loss=0.216, lr=0.0004]
Steps: 63%|██████▎ | 444/700 [04:08<02:19, 1.83it/s, loss=0.216, lr=0.0004]
Steps: 63%|██████▎ | 444/700 [04:08<02:19, 1.83it/s, loss=0.0454, lr=0.0004]
Steps: 64%|██████▎ | 445/700 [04:09<02:18, 1.84it/s, loss=0.0454, lr=0.0004]
Steps: 64%|██████▎ | 445/700 [04:09<02:18, 1.84it/s, loss=0.0396, lr=0.0004]
Steps: 64%|██████▎ | 446/700 [04:09<02:16, 1.86it/s, loss=0.0396, lr=0.0004]
Steps: 64%|██████▎ | 446/700 [04:09<02:16, 1.86it/s, loss=0.0378, lr=0.0004]
Steps: 64%|██████▍ | 447/700 [04:10<02:15, 1.86it/s, loss=0.0378, lr=0.0004]
Steps: 64%|██████▍ | 447/700 [04:10<02:15, 1.86it/s, loss=0.0112, lr=0.0004]
Steps: 64%|██████▍ | 448/700 [04:10<02:16, 1.85it/s, loss=0.0112, lr=0.0004]
Steps: 64%|██████▍ | 448/700 [04:10<02:16, 1.85it/s, loss=0.0411, lr=0.0004]
Steps: 64%|██████▍ | 449/700 [04:11<02:16, 1.83it/s, loss=0.0411, lr=0.0004]
Steps: 64%|██████▍ | 449/700 [04:11<02:16, 1.83it/s, loss=0.0222, lr=0.0004]
Steps: 64%|██████▍ | 450/700 [04:12<02:16, 1.83it/s, loss=0.0222, lr=0.0004]
Steps: 64%|██████▍ | 450/700 [04:12<02:16, 1.83it/s, loss=0.0735, lr=0.0004]
Steps: 64%|██████▍ | 451/700 [04:12<02:15, 1.84it/s, loss=0.0735, lr=0.0004]
Steps: 64%|██████▍ | 451/700 [04:12<02:15, 1.84it/s, loss=0.0261, lr=0.0004]
Steps: 65%|██████▍ | 452/700 [04:13<02:15, 1.83it/s, loss=0.0261, lr=0.0004]
Steps: 65%|██████▍ | 452/700 [04:13<02:15, 1.83it/s, loss=0.0861, lr=0.0004]
Steps: 65%|██████▍ | 453/700 [04:13<02:15, 1.82it/s, loss=0.0861, lr=0.0004]
Steps: 65%|██████▍ | 453/700 [04:13<02:15, 1.82it/s, loss=0.148, lr=0.0004]
Steps: 65%|██████▍ | 454/700 [04:14<02:16, 1.81it/s, loss=0.148, lr=0.0004]
Steps: 65%|██████▍ | 454/700 [04:14<02:16, 1.81it/s, loss=0.0519, lr=0.0004]
Steps: 65%|██████▌ | 455/700 [04:14<02:15, 1.80it/s, loss=0.0519, lr=0.0004]
Steps: 65%|██████▌ | 455/700 [04:14<02:15, 1.80it/s, loss=0.0917, lr=0.0004]
Steps: 65%|██████▌ | 456/700 [04:15<02:16, 1.79it/s, loss=0.0917, lr=0.0004]
Steps: 65%|██████▌ | 456/700 [04:15<02:16, 1.79it/s, loss=0.00812, lr=0.0004]
Steps: 65%|██████▌ | 457/700 [04:15<02:14, 1.81it/s, loss=0.00812, lr=0.0004]
Steps: 65%|██████▌ | 457/700 [04:15<02:14, 1.81it/s, loss=0.0117, lr=0.0004]
Steps: 65%|██████▌ | 458/700 [04:16<02:12, 1.82it/s, loss=0.0117, lr=0.0004]
Steps: 65%|██████▌ | 458/700 [04:16<02:12, 1.82it/s, loss=0.0597, lr=0.0004]
Steps: 66%|██████▌ | 459/700 [04:16<02:12, 1.82it/s, loss=0.0597, lr=0.0004]
Steps: 66%|██████▌ | 459/700 [04:16<02:12, 1.82it/s, loss=0.0163, lr=0.0004]
Steps: 66%|██████▌ | 460/700 [04:17<02:11, 1.82it/s, loss=0.0163, lr=0.0004]
Steps: 66%|██████▌ | 460/700 [04:17<02:11, 1.82it/s, loss=0.0808, lr=0.0004]
Steps: 66%|██████▌ | 461/700 [04:18<02:10, 1.83it/s, loss=0.0808, lr=0.0004]
Steps: 66%|██████▌ | 461/700 [04:18<02:10, 1.83it/s, loss=0.0125, lr=0.0004]
Steps: 66%|██████▌ | 462/700 [04:18<02:10, 1.83it/s, loss=0.0125, lr=0.0004]
Steps: 66%|██████▌ | 462/700 [04:18<02:10, 1.83it/s, loss=0.00627, lr=0.0004]
Steps: 66%|██████▌ | 463/700 [04:19<02:09, 1.83it/s, loss=0.00627, lr=0.0004]
Steps: 66%|██████▌ | 463/700 [04:19<02:09, 1.83it/s, loss=0.049, lr=0.0004]
Steps: 66%|██████▋ | 464/700 [04:19<02:09, 1.83it/s, loss=0.049, lr=0.0004]
Steps: 66%|██████▋ | 464/700 [04:19<02:09, 1.83it/s, loss=0.0678, lr=0.0004]
Steps: 66%|██████▋ | 465/700 [04:20<02:09, 1.82it/s, loss=0.0678, lr=0.0004]
Steps: 66%|██████▋ | 465/700 [04:20<02:09, 1.82it/s, loss=0.035, lr=0.0004]
Steps: 67%|██████▋ | 466/700 [04:20<02:08, 1.82it/s, loss=0.035, lr=0.0004]
Steps: 67%|██████▋ | 466/700 [04:20<02:08, 1.82it/s, loss=0.131, lr=0.0004]
Steps: 67%|██████▋ | 467/700 [04:21<02:08, 1.82it/s, loss=0.131, lr=0.0004]
Steps: 67%|██████▋ | 467/700 [04:21<02:08, 1.82it/s, loss=0.277, lr=0.0004]
Steps: 67%|██████▋ | 468/700 [04:21<02:06, 1.83it/s, loss=0.277, lr=0.0004]
Steps: 67%|██████▋ | 468/700 [04:21<02:06, 1.83it/s, loss=0.0124, lr=0.0004]
Steps: 67%|██████▋ | 469/700 [04:22<02:06, 1.82it/s, loss=0.0124, lr=0.0004]
Steps: 67%|██████▋ | 469/700 [04:22<02:06, 1.82it/s, loss=0.0462, lr=0.0004]
Steps: 67%|██████▋ | 470/700 [04:23<02:07, 1.80it/s, loss=0.0462, lr=0.0004]
Steps: 67%|██████▋ | 470/700 [04:23<02:07, 1.80it/s, loss=0.0415, lr=0.0004]
Steps: 67%|██████▋ | 471/700 [04:23<02:05, 1.82it/s, loss=0.0415, lr=0.0004]
Steps: 67%|██████▋ | 471/700 [04:23<02:05, 1.82it/s, loss=0.169, lr=0.0004]
Steps: 67%|██████▋ | 472/700 [04:24<02:04, 1.83it/s, loss=0.169, lr=0.0004]
Steps: 67%|██████▋ | 472/700 [04:24<02:04, 1.83it/s, loss=0.0197, lr=0.0004]
Steps: 68%|██████▊ | 473/700 [04:24<02:04, 1.82it/s, loss=0.0197, lr=0.0004]
Steps: 68%|██████▊ | 473/700 [04:24<02:04, 1.82it/s, loss=0.0275, lr=0.0004]
Steps: 68%|██████▊ | 474/700 [04:25<02:03, 1.83it/s, loss=0.0275, lr=0.0004]
Steps: 68%|██████▊ | 474/700 [04:25<02:03, 1.83it/s, loss=0.00273, lr=0.0004]
Steps: 68%|██████▊ | 475/700 [04:25<02:04, 1.81it/s, loss=0.00273, lr=0.0004]
Steps: 68%|██████▊ | 475/700 [04:25<02:04, 1.81it/s, loss=0.0279, lr=0.0004]
Steps: 68%|██████▊ | 476/700 [04:26<02:04, 1.79it/s, loss=0.0279, lr=0.0004]
Steps: 68%|██████▊ | 476/700 [04:26<02:04, 1.79it/s, loss=0.133, lr=0.0004]
Steps: 68%|██████▊ | 477/700 [04:26<02:04, 1.79it/s, loss=0.133, lr=0.0004]
Steps: 68%|██████▊ | 477/700 [04:26<02:04, 1.79it/s, loss=0.00584, lr=0.0004]
Steps: 68%|██████▊ | 478/700 [04:27<02:03, 1.79it/s, loss=0.00584, lr=0.0004]
Steps: 68%|██████▊ | 478/700 [04:27<02:03, 1.79it/s, loss=0.0541, lr=0.0004]
Steps: 68%|██████▊ | 479/700 [04:28<02:03, 1.79it/s, loss=0.0541, lr=0.0004]
Steps: 68%|██████▊ | 479/700 [04:28<02:03, 1.79it/s, loss=0.0163, lr=0.0004]
Steps: 69%|██████▊ | 480/700 [04:28<02:03, 1.79it/s, loss=0.0163, lr=0.0004]
Steps: 69%|██████▊ | 480/700 [04:28<02:03, 1.79it/s, loss=0.00538, lr=0.0004]
Steps: 69%|██████▊ | 481/700 [04:29<02:01, 1.80it/s, loss=0.00538, lr=0.0004]
Steps: 69%|██████▊ | 481/700 [04:29<02:01, 1.80it/s, loss=0.00586, lr=0.0004]
Steps: 69%|██████▉ | 482/700 [04:29<02:00, 1.81it/s, loss=0.00586, lr=0.0004]
Steps: 69%|██████▉ | 482/700 [04:29<02:00, 1.81it/s, loss=0.0193, lr=0.0004]
Steps: 69%|██████▉ | 483/700 [04:30<02:00, 1.80it/s, loss=0.0193, lr=0.0004]
Steps: 69%|██████▉ | 483/700 [04:30<02:00, 1.80it/s, loss=0.00902, lr=0.0004]
Steps: 69%|██████▉ | 484/700 [04:30<02:03, 1.76it/s, loss=0.00902, lr=0.0004]
Steps: 69%|██████▉ | 484/700 [04:30<02:03, 1.76it/s, loss=0.386, lr=0.0004]
Steps: 69%|██████▉ | 485/700 [04:31<01:59, 1.80it/s, loss=0.386, lr=0.0004]
Steps: 69%|██████▉ | 485/700 [04:31<01:59, 1.80it/s, loss=0.00357, lr=0.0004]
Steps: 69%|██████▉ | 486/700 [04:31<01:56, 1.83it/s, loss=0.00357, lr=0.0004]
Steps: 69%|██████▉ | 486/700 [04:31<01:56, 1.83it/s, loss=0.0271, lr=0.0004]
Steps: 70%|██████▉ | 487/700 [04:32<01:57, 1.81it/s, loss=0.0271, lr=0.0004]
Steps: 70%|██████▉ | 487/700 [04:32<01:57, 1.81it/s, loss=0.122, lr=0.0004]
Steps: 70%|██████▉ | 488/700 [04:32<01:54, 1.85it/s, loss=0.122, lr=0.0004]
Steps: 70%|██████▉ | 488/700 [04:32<01:54, 1.85it/s, loss=0.0115, lr=0.0004]
Steps: 70%|██████▉ | 489/700 [04:33<01:52, 1.88it/s, loss=0.0115, lr=0.0004]
Steps: 70%|██████▉ | 489/700 [04:33<01:52, 1.88it/s, loss=0.0324, lr=0.0004]
Steps: 70%|███████ | 490/700 [04:34<01:50, 1.89it/s, loss=0.0324, lr=0.0004]
Steps: 70%|███████ | 490/700 [04:34<01:50, 1.89it/s, loss=0.00157, lr=0.0004]
Steps: 70%|███████ | 491/700 [04:34<01:49, 1.91it/s, loss=0.00157, lr=0.0004]
Steps: 70%|███████ | 491/700 [04:34<01:49, 1.91it/s, loss=0.014, lr=0.0004]
Steps: 70%|███████ | 492/700 [04:35<01:48, 1.92it/s, loss=0.014, lr=0.0004]
Steps: 70%|███████ | 492/700 [04:35<01:48, 1.92it/s, loss=0.0567, lr=0.0004]
Steps: 70%|███████ | 493/700 [04:35<01:49, 1.90it/s, loss=0.0567, lr=0.0004]
Steps: 70%|███████ | 493/700 [04:35<01:49, 1.90it/s, loss=0.046, lr=0.0004]
Steps: 71%|███████ | 494/700 [04:36<01:50, 1.87it/s, loss=0.046, lr=0.0004]
Steps: 71%|███████ | 494/700 [04:36<01:50, 1.87it/s, loss=0.0275, lr=0.0004]
Steps: 71%|███████ | 495/700 [04:36<01:52, 1.83it/s, loss=0.0275, lr=0.0004]
Steps: 71%|███████ | 495/700 [04:36<01:52, 1.83it/s, loss=0.00814, lr=0.0004]
Steps: 71%|███████ | 496/700 [04:37<01:51, 1.82it/s, loss=0.00814, lr=0.0004]
Steps: 71%|███████ | 496/700 [04:37<01:51, 1.82it/s, loss=0.00738, lr=0.0004]
Steps: 71%|███████ | 497/700 [04:37<01:51, 1.82it/s, loss=0.00738, lr=0.0004]
Steps: 71%|███████ | 497/700 [04:37<01:51, 1.82it/s, loss=0.00353, lr=0.0004]
Steps: 71%|███████ | 498/700 [04:38<01:49, 1.85it/s, loss=0.00353, lr=0.0004]
Steps: 71%|███████ | 498/700 [04:38<01:49, 1.85it/s, loss=0.0116, lr=0.0004]
Steps: 71%|███████▏ | 499/700 [04:38<01:47, 1.88it/s, loss=0.0116, lr=0.0004]
Steps: 71%|███████▏ | 499/700 [04:38<01:47, 1.88it/s, loss=0.133, lr=0.0004]
Steps: 71%|███████▏ | 500/700 [04:39<01:47, 1.86it/s, loss=0.133, lr=0.0004]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')
Saving weights to checkpoints/step_500.safetensors
LORA Unet Moved 0.0018108426593244076
LORA CLIP Moved 7.164952694438398e-05
Steps: 71%|███████▏ | 500/700 [04:39<01:47, 1.86it/s, loss=0.0136, lr=0.0004]
Steps: 72%|███████▏ | 501/700 [04:40<01:57, 1.70it/s, loss=0.0136, lr=0.0004]
Steps: 72%|███████▏ | 501/700 [04:40<01:57, 1.70it/s, loss=0.0168, lr=0.0004]
Steps: 72%|███████▏ | 502/700 [04:40<01:53, 1.74it/s, loss=0.0168, lr=0.0004]
Steps: 72%|███████▏ | 502/700 [04:40<01:53, 1.74it/s, loss=0.0313, lr=0.0004]
Steps: 72%|███████▏ | 503/700 [04:41<01:51, 1.76it/s, loss=0.0313, lr=0.0004]
Steps: 72%|███████▏ | 503/700 [04:41<01:51, 1.76it/s, loss=0.162, lr=0.0004]
Steps: 72%|███████▏ | 504/700 [04:41<01:49, 1.78it/s, loss=0.162, lr=0.0004]
Steps: 72%|███████▏ | 504/700 [04:41<01:49, 1.78it/s, loss=0.0117, lr=0.0004]
Steps: 72%|███████▏ | 505/700 [04:42<01:48, 1.80it/s, loss=0.0117, lr=0.0004]
Steps: 72%|███████▏ | 505/700 [04:42<01:48, 1.80it/s, loss=0.00169, lr=0.0004]
Steps: 72%|███████▏ | 506/700 [04:42<01:46, 1.81it/s, loss=0.00169, lr=0.0004]
Steps: 72%|███████▏ | 506/700 [04:42<01:46, 1.81it/s, loss=0.0182, lr=0.0004]
Steps: 72%|███████▏ | 507/700 [04:43<01:45, 1.83it/s, loss=0.0182, lr=0.0004]
Steps: 72%|███████▏ | 507/700 [04:43<01:45, 1.83it/s, loss=0.0245, lr=0.0004]
Steps: 73%|███████▎ | 508/700 [04:43<01:46, 1.81it/s, loss=0.0245, lr=0.0004]
Steps: 73%|███████▎ | 508/700 [04:43<01:46, 1.81it/s, loss=0.00677, lr=0.0004]
Steps: 73%|███████▎ | 509/700 [04:44<01:45, 1.81it/s, loss=0.00677, lr=0.0004]
Steps: 73%|███████▎ | 509/700 [04:44<01:45, 1.81it/s, loss=0.076, lr=0.0004]
Steps: 73%|███████▎ | 510/700 [04:45<01:44, 1.82it/s, loss=0.076, lr=0.0004]
Steps: 73%|███████▎ | 510/700 [04:45<01:44, 1.82it/s, loss=0.295, lr=0.0004]
Steps: 73%|███████▎ | 511/700 [04:45<01:44, 1.80it/s, loss=0.295, lr=0.0004]
Steps: 73%|███████▎ | 511/700 [04:45<01:44, 1.80it/s, loss=0.00341, lr=0.0004]
Steps: 73%|███████▎ | 512/700 [04:46<01:44, 1.80it/s, loss=0.00341, lr=0.0004]
Steps: 73%|███████▎ | 512/700 [04:46<01:44, 1.80it/s, loss=0.0115, lr=0.0004]
Steps: 73%|███████▎ | 513/700 [04:46<01:41, 1.84it/s, loss=0.0115, lr=0.0004]
Steps: 73%|███████▎ | 513/700 [04:46<01:41, 1.84it/s, loss=0.0503, lr=0.0004]
Steps: 73%|███████▎ | 514/700 [04:47<01:39, 1.86it/s, loss=0.0503, lr=0.0004]
Steps: 73%|███████▎ | 514/700 [04:47<01:39, 1.86it/s, loss=0.00832, lr=0.0004]
Steps: 74%|███████▎ | 515/700 [04:47<01:40, 1.84it/s, loss=0.00832, lr=0.0004]
Steps: 74%|███████▎ | 515/700 [04:47<01:40, 1.84it/s, loss=0.00209, lr=0.0004]
Steps: 74%|███████▎ | 516/700 [04:48<01:40, 1.83it/s, loss=0.00209, lr=0.0004]
Steps: 74%|███████▎ | 516/700 [04:48<01:40, 1.83it/s, loss=0.014, lr=0.0004]
Steps: 74%|███████▍ | 517/700 [04:48<01:40, 1.83it/s, loss=0.014, lr=0.0004]
Steps: 74%|███████▍ | 517/700 [04:48<01:40, 1.83it/s, loss=0.035, lr=0.0004]
Steps: 74%|███████▍ | 518/700 [04:49<01:40, 1.82it/s, loss=0.035, lr=0.0004]
Steps: 74%|███████▍ | 518/700 [04:49<01:40, 1.82it/s, loss=0.223, lr=0.0004]
Steps: 74%|███████▍ | 519/700 [04:49<01:39, 1.82it/s, loss=0.223, lr=0.0004]
Steps: 74%|███████▍ | 519/700 [04:49<01:39, 1.82it/s, loss=0.0441, lr=0.0004]
Steps: 74%|███████▍ | 520/700 [04:50<01:38, 1.83it/s, loss=0.0441, lr=0.0004]
Steps: 74%|███████▍ | 520/700 [04:50<01:38, 1.83it/s, loss=0.0202, lr=0.0004]
Steps: 74%|███████▍ | 521/700 [04:50<01:35, 1.88it/s, loss=0.0202, lr=0.0004]
Steps: 74%|███████▍ | 521/700 [04:50<01:35, 1.88it/s, loss=0.0171, lr=0.0004]
Steps: 75%|███████▍ | 522/700 [04:51<01:33, 1.90it/s, loss=0.0171, lr=0.0004]
Steps: 75%|███████▍ | 522/700 [04:51<01:33, 1.90it/s, loss=0.0126, lr=0.0004]
Steps: 75%|███████▍ | 523/700 [04:52<01:34, 1.87it/s, loss=0.0126, lr=0.0004]
Steps: 75%|███████▍ | 523/700 [04:52<01:34, 1.87it/s, loss=0.0803, lr=0.0004]
Steps: 75%|███████▍ | 524/700 [04:52<01:34, 1.87it/s, loss=0.0803, lr=0.0004]
Steps: 75%|███████▍ | 524/700 [04:52<01:34, 1.87it/s, loss=0.00485, lr=0.0004]
Steps: 75%|███████▌ | 525/700 [04:53<01:33, 1.87it/s, loss=0.00485, lr=0.0004]
Steps: 75%|███████▌ | 525/700 [04:53<01:33, 1.87it/s, loss=0.0205, lr=0.0004]
Steps: 75%|███████▌ | 526/700 [04:53<01:31, 1.90it/s, loss=0.0205, lr=0.0004]
Steps: 75%|███████▌ | 526/700 [04:53<01:31, 1.90it/s, loss=0.0313, lr=0.0004]
Steps: 75%|███████▌ | 527/700 [04:54<01:32, 1.88it/s, loss=0.0313, lr=0.0004]
Steps: 75%|███████▌ | 527/700 [04:54<01:32, 1.88it/s, loss=0.00287, lr=0.0004]
Steps: 75%|███████▌ | 528/700 [04:54<01:32, 1.85it/s, loss=0.00287, lr=0.0004]
Steps: 75%|███████▌ | 528/700 [04:54<01:32, 1.85it/s, loss=0.00346, lr=0.0004]
Steps: 76%|███████▌ | 529/700 [04:55<01:33, 1.83it/s, loss=0.00346, lr=0.0004]
Steps: 76%|███████▌ | 529/700 [04:55<01:33, 1.83it/s, loss=0.277, lr=0.0004]
Steps: 76%|███████▌ | 530/700 [04:55<01:31, 1.85it/s, loss=0.277, lr=0.0004]
Steps: 76%|███████▌ | 530/700 [04:55<01:31, 1.85it/s, loss=0.114, lr=0.0004]
Steps: 76%|███████▌ | 531/700 [04:56<01:31, 1.84it/s, loss=0.114, lr=0.0004]
Steps: 76%|███████▌ | 531/700 [04:56<01:31, 1.84it/s, loss=0.00907, lr=0.0004]
Steps: 76%|███████▌ | 532/700 [04:56<01:29, 1.88it/s, loss=0.00907, lr=0.0004]
Steps: 76%|███████▌ | 532/700 [04:56<01:29, 1.88it/s, loss=0.0188, lr=0.0004]
Steps: 76%|███████▌ | 533/700 [04:57<01:27, 1.92it/s, loss=0.0188, lr=0.0004]
Steps: 76%|███████▌ | 533/700 [04:57<01:27, 1.92it/s, loss=0.00488, lr=0.0004]
Steps: 76%|███████▋ | 534/700 [04:57<01:25, 1.95it/s, loss=0.00488, lr=0.0004]
Steps: 76%|███████▋ | 534/700 [04:57<01:25, 1.95it/s, loss=0.043, lr=0.0004]
Steps: 76%|███████▋ | 535/700 [04:58<01:23, 1.97it/s, loss=0.043, lr=0.0004]
Steps: 76%|███████▋ | 535/700 [04:58<01:23, 1.97it/s, loss=0.0856, lr=0.0004]
Steps: 77%|███████▋ | 536/700 [04:58<01:23, 1.96it/s, loss=0.0856, lr=0.0004]
Steps: 77%|███████▋ | 536/700 [04:58<01:23, 1.96it/s, loss=0.0465, lr=0.0004]
Steps: 77%|███████▋ | 537/700 [04:59<01:26, 1.88it/s, loss=0.0465, lr=0.0004]
Steps: 77%|███████▋ | 537/700 [04:59<01:26, 1.88it/s, loss=0.0128, lr=0.0004]
Steps: 77%|███████▋ | 538/700 [05:00<01:26, 1.87it/s, loss=0.0128, lr=0.0004]
Steps: 77%|███████▋ | 538/700 [05:00<01:26, 1.87it/s, loss=0.0311, lr=0.0004]
Steps: 77%|███████▋ | 539/700 [05:00<01:26, 1.87it/s, loss=0.0311, lr=0.0004]
Steps: 77%|███████▋ | 539/700 [05:00<01:26, 1.87it/s, loss=0.0866, lr=0.0004]
Steps: 77%|███████▋ | 540/700 [05:01<01:26, 1.86it/s, loss=0.0866, lr=0.0004]
Steps: 77%|███████▋ | 540/700 [05:01<01:26, 1.86it/s, loss=0.0238, lr=0.0004]
Steps: 77%|███████▋ | 541/700 [05:01<01:25, 1.87it/s, loss=0.0238, lr=0.0004]
Steps: 77%|███████▋ | 541/700 [05:01<01:25, 1.87it/s, loss=0.167, lr=0.0004]
Steps: 77%|███████▋ | 542/700 [05:02<01:23, 1.90it/s, loss=0.167, lr=0.0004]
Steps: 77%|███████▋ | 542/700 [05:02<01:23, 1.90it/s, loss=0.0733, lr=0.0004]
Steps: 78%|███████▊ | 543/700 [05:02<01:23, 1.87it/s, loss=0.0733, lr=0.0004]
Steps: 78%|███████▊ | 543/700 [05:02<01:23, 1.87it/s, loss=0.0158, lr=0.0004]
Steps: 78%|███████▊ | 544/700 [05:03<01:24, 1.86it/s, loss=0.0158, lr=0.0004]
Steps: 78%|███████▊ | 544/700 [05:03<01:24, 1.86it/s, loss=0.0303, lr=0.0004]
Steps: 78%|███████▊ | 545/700 [05:03<01:24, 1.82it/s, loss=0.0303, lr=0.0004]
Steps: 78%|███████▊ | 545/700 [05:03<01:24, 1.82it/s, loss=0.00213, lr=0.0004]
Steps: 78%|███████▊ | 546/700 [05:04<01:26, 1.78it/s, loss=0.00213, lr=0.0004]
Steps: 78%|███████▊ | 546/700 [05:04<01:26, 1.78it/s, loss=0.0131, lr=0.0004]
Steps: 78%|███████▊ | 547/700 [05:04<01:24, 1.80it/s, loss=0.0131, lr=0.0004]
Steps: 78%|███████▊ | 547/700 [05:04<01:24, 1.80it/s, loss=0.00865, lr=0.0004]
Steps: 78%|███████▊ | 548/700 [05:05<01:21, 1.86it/s, loss=0.00865, lr=0.0004]
Steps: 78%|███████▊ | 548/700 [05:05<01:21, 1.86it/s, loss=0.0364, lr=0.0004]
Steps: 78%|███████▊ | 549/700 [05:05<01:20, 1.88it/s, loss=0.0364, lr=0.0004]
Steps: 78%|███████▊ | 549/700 [05:05<01:20, 1.88it/s, loss=0.0189, lr=0.0004]
Steps: 79%|███████▊ | 550/700 [05:06<01:19, 1.88it/s, loss=0.0189, lr=0.0004]
Steps: 79%|███████▊ | 550/700 [05:06<01:19, 1.88it/s, loss=0.0136, lr=0.0004]
Steps: 79%|███████▊ | 551/700 [05:07<01:19, 1.88it/s, loss=0.0136, lr=0.0004]
Steps: 79%|███████▊ | 551/700 [05:07<01:19, 1.88it/s, loss=0.0498, lr=0.0004]
Steps: 79%|███████▉ | 552/700 [05:07<01:18, 1.89it/s, loss=0.0498, lr=0.0004]
Steps: 79%|███████▉ | 552/700 [05:07<01:18, 1.89it/s, loss=0.0141, lr=0.0004]
Steps: 79%|███████▉ | 553/700 [05:08<01:16, 1.92it/s, loss=0.0141, lr=0.0004]
Steps: 79%|███████▉ | 553/700 [05:08<01:16, 1.92it/s, loss=0.00719, lr=0.0004]
Steps: 79%|███████▉ | 554/700 [05:08<01:15, 1.93it/s, loss=0.00719, lr=0.0004]
Steps: 79%|███████▉ | 554/700 [05:08<01:15, 1.93it/s, loss=0.00273, lr=0.0004]
Steps: 79%|███████▉ | 555/700 [05:09<01:15, 1.92it/s, loss=0.00273, lr=0.0004]
Steps: 79%|███████▉ | 555/700 [05:09<01:15, 1.92it/s, loss=0.0116, lr=0.0004]
Steps: 79%|███████▉ | 556/700 [05:09<01:14, 1.94it/s, loss=0.0116, lr=0.0004]
Steps: 79%|███████▉ | 556/700 [05:09<01:14, 1.94it/s, loss=0.0282, lr=0.0004]
Steps: 80%|███████▉ | 557/700 [05:10<01:13, 1.95it/s, loss=0.0282, lr=0.0004]
Steps: 80%|███████▉ | 557/700 [05:10<01:13, 1.95it/s, loss=0.0122, lr=0.0004]
Steps: 80%|███████▉ | 558/700 [05:10<01:11, 1.97it/s, loss=0.0122, lr=0.0004]
Steps: 80%|███████▉ | 558/700 [05:10<01:11, 1.97it/s, loss=0.0149, lr=0.0004]
Steps: 80%|███████▉ | 559/700 [05:11<01:11, 1.96it/s, loss=0.0149, lr=0.0004]
Steps: 80%|███████▉ | 559/700 [05:11<01:11, 1.96it/s, loss=0.00336, lr=0.0004]
Steps: 80%|████████ | 560/700 [05:11<01:10, 1.98it/s, loss=0.00336, lr=0.0004]
Steps: 80%|████████ | 560/700 [05:11<01:10, 1.98it/s, loss=0.0495, lr=0.0004]
Steps: 80%|████████ | 561/700 [05:12<01:10, 1.98it/s, loss=0.0495, lr=0.0004]
Steps: 80%|████████ | 561/700 [05:12<01:10, 1.98it/s, loss=0.00663, lr=0.0004]
Steps: 80%|████████ | 562/700 [05:12<01:10, 1.96it/s, loss=0.00663, lr=0.0004]
Steps: 80%|████████ | 562/700 [05:12<01:10, 1.96it/s, loss=0.00749, lr=0.0004]
Steps: 80%|████████ | 563/700 [05:13<01:10, 1.94it/s, loss=0.00749, lr=0.0004]
Steps: 80%|████████ | 563/700 [05:13<01:10, 1.94it/s, loss=0.0777, lr=0.0004]
Steps: 81%|████████ | 564/700 [05:13<01:10, 1.94it/s, loss=0.0777, lr=0.0004]
Steps: 81%|████████ | 564/700 [05:13<01:10, 1.94it/s, loss=0.00752, lr=0.0004]
Steps: 81%|████████ | 565/700 [05:14<01:09, 1.94it/s, loss=0.00752, lr=0.0004]
Steps: 81%|████████ | 565/700 [05:14<01:09, 1.94it/s, loss=0.0213, lr=0.0004]
Steps: 81%|████████ | 566/700 [05:14<01:08, 1.95it/s, loss=0.0213, lr=0.0004]
Steps: 81%|████████ | 566/700 [05:14<01:08, 1.95it/s, loss=0.182, lr=0.0004]
Steps: 81%|████████ | 567/700 [05:15<01:08, 1.93it/s, loss=0.182, lr=0.0004]
Steps: 81%|████████ | 567/700 [05:15<01:08, 1.93it/s, loss=0.00876, lr=0.0004]
Steps: 81%|████████ | 568/700 [05:15<01:08, 1.94it/s, loss=0.00876, lr=0.0004]
Steps: 81%|████████ | 568/700 [05:15<01:08, 1.94it/s, loss=0.0193, lr=0.0004]
Steps: 81%|████████▏ | 569/700 [05:16<01:06, 1.97it/s, loss=0.0193, lr=0.0004]
Steps: 81%|████████▏ | 569/700 [05:16<01:06, 1.97it/s, loss=0.0154, lr=0.0004]
Steps: 81%|████████▏ | 570/700 [05:16<01:07, 1.91it/s, loss=0.0154, lr=0.0004]
Steps: 81%|████████▏ | 570/700 [05:16<01:07, 1.91it/s, loss=0.346, lr=0.0004]
Steps: 82%|████████▏ | 571/700 [05:17<01:09, 1.84it/s, loss=0.346, lr=0.0004]
Steps: 82%|████████▏ | 571/700 [05:17<01:09, 1.84it/s, loss=0.00996, lr=0.0004]
Steps: 82%|████████▏ | 572/700 [05:17<01:09, 1.84it/s, loss=0.00996, lr=0.0004]
Steps: 82%|████████▏ | 572/700 [05:17<01:09, 1.84it/s, loss=0.0344, lr=0.0004]
Steps: 82%|████████▏ | 573/700 [05:18<01:09, 1.81it/s, loss=0.0344, lr=0.0004]
Steps: 82%|████████▏ | 573/700 [05:18<01:09, 1.81it/s, loss=0.00388, lr=0.0004]
Steps: 82%|████████▏ | 574/700 [05:19<01:11, 1.75it/s, loss=0.00388, lr=0.0004]
Steps: 82%|████████▏ | 574/700 [05:19<01:11, 1.75it/s, loss=0.00327, lr=0.0004]
Steps: 82%|████████▏ | 575/700 [05:19<01:11, 1.75it/s, loss=0.00327, lr=0.0004]
Steps: 82%|████████▏ | 575/700 [05:19<01:11, 1.75it/s, loss=0.0173, lr=0.0004]
Steps: 82%|████████▏ | 576/700 [05:20<01:09, 1.77it/s, loss=0.0173, lr=0.0004]
Steps: 82%|████████▏ | 576/700 [05:20<01:09, 1.77it/s, loss=0.0087, lr=0.0004]
Steps: 82%|████████▏ | 577/700 [05:20<01:09, 1.77it/s, loss=0.0087, lr=0.0004]
Steps: 82%|████████▏ | 577/700 [05:20<01:09, 1.77it/s, loss=0.0399, lr=0.0004]
Steps: 83%|████████▎ | 578/700 [05:21<01:09, 1.76it/s, loss=0.0399, lr=0.0004]
Steps: 83%|████████▎ | 578/700 [05:21<01:09, 1.76it/s, loss=0.00906, lr=0.0004]
Steps: 83%|████████▎ | 579/700 [05:21<01:08, 1.78it/s, loss=0.00906, lr=0.0004]
Steps: 83%|████████▎ | 579/700 [05:21<01:08, 1.78it/s, loss=0.0716, lr=0.0004]
Steps: 83%|████████▎ | 580/700 [05:22<01:07, 1.77it/s, loss=0.0716, lr=0.0004]
Steps: 83%|████████▎ | 580/700 [05:22<01:07, 1.77it/s, loss=0.214, lr=0.0004]
Steps: 83%|████████▎ | 581/700 [05:23<01:07, 1.75it/s, loss=0.214, lr=0.0004]
Steps: 83%|████████▎ | 581/700 [05:23<01:07, 1.75it/s, loss=0.0602, lr=0.0004]
Steps: 83%|████████▎ | 582/700 [05:23<01:07, 1.75it/s, loss=0.0602, lr=0.0004]
Steps: 83%|████████▎ | 582/700 [05:23<01:07, 1.75it/s, loss=0.0708, lr=0.0004]
Steps: 83%|████████▎ | 583/700 [05:24<01:07, 1.75it/s, loss=0.0708, lr=0.0004]
Steps: 83%|████████▎ | 583/700 [05:24<01:07, 1.75it/s, loss=0.00627, lr=0.0004]
Steps: 83%|████████▎ | 584/700 [05:24<01:05, 1.76it/s, loss=0.00627, lr=0.0004]
Steps: 83%|████████▎ | 584/700 [05:24<01:05, 1.76it/s, loss=0.00603, lr=0.0004]
Steps: 84%|████████▎ | 585/700 [05:25<01:05, 1.76it/s, loss=0.00603, lr=0.0004]
Steps: 84%|████████▎ | 585/700 [05:25<01:05, 1.76it/s, loss=0.0861, lr=0.0004]
Steps: 84%|████████▎ | 586/700 [05:25<01:04, 1.77it/s, loss=0.0861, lr=0.0004]
Steps: 84%|████████▎ | 586/700 [05:25<01:04, 1.77it/s, loss=0.00681, lr=0.0004]
Steps: 84%|████████▍ | 587/700 [05:26<01:04, 1.76it/s, loss=0.00681, lr=0.0004]
Steps: 84%|████████▍ | 587/700 [05:26<01:04, 1.76it/s, loss=0.0772, lr=0.0004]
Steps: 84%|████████▍ | 588/700 [05:27<01:04, 1.75it/s, loss=0.0772, lr=0.0004]
Steps: 84%|████████▍ | 588/700 [05:27<01:04, 1.75it/s, loss=0.0183, lr=0.0004]
Steps: 84%|████████▍ | 589/700 [05:27<01:03, 1.75it/s, loss=0.0183, lr=0.0004]
Steps: 84%|████████▍ | 589/700 [05:27<01:03, 1.75it/s, loss=0.00783, lr=0.0004]
Steps: 84%|████████▍ | 590/700 [05:28<01:02, 1.75it/s, loss=0.00783, lr=0.0004]
Steps: 84%|████████▍ | 590/700 [05:28<01:02, 1.75it/s, loss=0.0575, lr=0.0004]
Steps: 84%|████████▍ | 591/700 [05:28<01:01, 1.77it/s, loss=0.0575, lr=0.0004]
Steps: 84%|████████▍ | 591/700 [05:28<01:01, 1.77it/s, loss=0.0142, lr=0.0004]
Steps: 85%|████████▍ | 592/700 [05:29<01:00, 1.78it/s, loss=0.0142, lr=0.0004]
Steps: 85%|████████▍ | 592/700 [05:29<01:00, 1.78it/s, loss=0.00664, lr=0.0004]
Steps: 85%|████████▍ | 593/700 [05:29<00:59, 1.80it/s, loss=0.00664, lr=0.0004]
Steps: 85%|████████▍ | 593/700 [05:29<00:59, 1.80it/s, loss=0.00879, lr=0.0004]
Steps: 85%|████████▍ | 594/700 [05:30<00:59, 1.79it/s, loss=0.00879, lr=0.0004]
Steps: 85%|████████▍ | 594/700 [05:30<00:59, 1.79it/s, loss=0.0716, lr=0.0004]
Steps: 85%|████████▌ | 595/700 [05:30<00:58, 1.79it/s, loss=0.0716, lr=0.0004]
Steps: 85%|████████▌ | 595/700 [05:30<00:58, 1.79it/s, loss=0.0366, lr=0.0004]
Steps: 85%|████████▌ | 596/700 [05:31<00:58, 1.79it/s, loss=0.0366, lr=0.0004]
Steps: 85%|████████▌ | 596/700 [05:31<00:58, 1.79it/s, loss=0.0431, lr=0.0004]
Steps: 85%|████████▌ | 597/700 [05:32<00:57, 1.79it/s, loss=0.0431, lr=0.0004]
Steps: 85%|████████▌ | 597/700 [05:32<00:57, 1.79it/s, loss=0.0399, lr=0.0004]
Steps: 85%|████████▌ | 598/700 [05:32<00:57, 1.79it/s, loss=0.0399, lr=0.0004]
Steps: 85%|████████▌ | 598/700 [05:32<00:57, 1.79it/s, loss=0.0735, lr=0.0004]
Steps: 86%|████████▌ | 599/700 [05:33<00:56, 1.78it/s, loss=0.0735, lr=0.0004]
Steps: 86%|████████▌ | 599/700 [05:33<00:56, 1.78it/s, loss=0.0237, lr=0.0004]
Steps: 86%|████████▌ | 600/700 [05:33<00:56, 1.78it/s, loss=0.0237, lr=0.0004]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')
Saving weights to checkpoints/step_600.safetensors
LORA Unet Moved 0.0020308804232627153
LORA CLIP Moved 8.028616866795346e-05
Steps: 86%|████████▌ | 600/700 [05:33<00:56, 1.78it/s, loss=0.00955, lr=0.0004]
Steps: 86%|████████▌ | 601/700 [05:34<01:00, 1.62it/s, loss=0.00955, lr=0.0004]
Steps: 86%|████████▌ | 601/700 [05:34<01:00, 1.62it/s, loss=0.0184, lr=0.0004]
Steps: 86%|████████▌ | 602/700 [05:35<00:59, 1.66it/s, loss=0.0184, lr=0.0004]
Steps: 86%|████████▌ | 602/700 [05:35<00:59, 1.66it/s, loss=0.0569, lr=0.0004]
Steps: 86%|████████▌ | 603/700 [05:35<00:57, 1.69it/s, loss=0.0569, lr=0.0004]
Steps: 86%|████████▌ | 603/700 [05:35<00:57, 1.69it/s, loss=0.00788, lr=0.0004]
Steps: 86%|████████▋ | 604/700 [05:36<00:55, 1.72it/s, loss=0.00788, lr=0.0004]
Steps: 86%|████████▋ | 604/700 [05:36<00:55, 1.72it/s, loss=0.0886, lr=0.0004]
Steps: 86%|████████▋ | 605/700 [05:36<00:54, 1.74it/s, loss=0.0886, lr=0.0004]
Steps: 86%|████████▋ | 605/700 [05:36<00:54, 1.74it/s, loss=0.0103, lr=0.0004]
Steps: 87%|████████▋ | 606/700 [05:37<00:53, 1.77it/s, loss=0.0103, lr=0.0004]
Steps: 87%|████████▋ | 606/700 [05:37<00:53, 1.77it/s, loss=0.00687, lr=0.0004]
Steps: 87%|████████▋ | 607/700 [05:37<00:52, 1.79it/s, loss=0.00687, lr=0.0004]
Steps: 87%|████████▋ | 607/700 [05:37<00:52, 1.79it/s, loss=0.00811, lr=0.0004]
Steps: 87%|████████▋ | 608/700 [05:38<00:50, 1.81it/s, loss=0.00811, lr=0.0004]
Steps: 87%|████████▋ | 608/700 [05:38<00:50, 1.81it/s, loss=0.0626, lr=0.0004]
Steps: 87%|████████▋ | 609/700 [05:38<00:50, 1.81it/s, loss=0.0626, lr=0.0004]
Steps: 87%|████████▋ | 609/700 [05:38<00:50, 1.81it/s, loss=0.037, lr=0.0004]
Steps: 87%|████████▋ | 610/700 [05:39<00:50, 1.79it/s, loss=0.037, lr=0.0004]
Steps: 87%|████████▋ | 610/700 [05:39<00:50, 1.79it/s, loss=0.0101, lr=0.0004]
Steps: 87%|████████▋ | 611/700 [05:40<00:49, 1.80it/s, loss=0.0101, lr=0.0004]
Steps: 87%|████████▋ | 611/700 [05:40<00:49, 1.80it/s, loss=0.00297, lr=0.0004]
Steps: 87%|████████▋ | 612/700 [05:40<00:49, 1.79it/s, loss=0.00297, lr=0.0004]
Steps: 87%|████████▋ | 612/700 [05:40<00:49, 1.79it/s, loss=0.045, lr=0.0004]
Steps: 88%|████████▊ | 613/700 [05:41<00:48, 1.79it/s, loss=0.045, lr=0.0004]
Steps: 88%|████████▊ | 613/700 [05:41<00:48, 1.79it/s, loss=0.00866, lr=0.0004]
Steps: 88%|████████▊ | 614/700 [05:41<00:47, 1.79it/s, loss=0.00866, lr=0.0004]
Steps: 88%|████████▊ | 614/700 [05:41<00:47, 1.79it/s, loss=0.00474, lr=0.0004]
Steps: 88%|████████▊ | 615/700 [05:42<00:47, 1.79it/s, loss=0.00474, lr=0.0004]
Steps: 88%|████████▊ | 615/700 [05:42<00:47, 1.79it/s, loss=0.0106, lr=0.0004]
Steps: 88%|████████▊ | 616/700 [05:42<00:46, 1.81it/s, loss=0.0106, lr=0.0004]
Steps: 88%|████████▊ | 616/700 [05:42<00:46, 1.81it/s, loss=0.0635, lr=0.0004]
Steps: 88%|████████▊ | 617/700 [05:43<00:46, 1.79it/s, loss=0.0635, lr=0.0004]
Steps: 88%|████████▊ | 617/700 [05:43<00:46, 1.79it/s, loss=0.0116, lr=0.0004]
Steps: 88%|████████▊ | 618/700 [05:43<00:46, 1.77it/s, loss=0.0116, lr=0.0004]
Steps: 88%|████████▊ | 618/700 [05:43<00:46, 1.77it/s, loss=0.0267, lr=0.0004]
Steps: 88%|████████▊ | 619/700 [05:44<00:45, 1.77it/s, loss=0.0267, lr=0.0004]
Steps: 88%|████████▊ | 619/700 [05:44<00:45, 1.77it/s, loss=0.0141, lr=0.0004]
Steps: 89%|████████▊ | 620/700 [05:45<00:45, 1.78it/s, loss=0.0141, lr=0.0004]
Steps: 89%|████████▊ | 620/700 [05:45<00:45, 1.78it/s, loss=0.0269, lr=0.0004]
Steps: 89%|████████▊ | 621/700 [05:45<00:43, 1.80it/s, loss=0.0269, lr=0.0004]
Steps: 89%|████████▊ | 621/700 [05:45<00:43, 1.80it/s, loss=0.0219, lr=0.0004]
Steps: 89%|████████▉ | 622/700 [05:46<00:42, 1.82it/s, loss=0.0219, lr=0.0004]
Steps: 89%|████████▉ | 622/700 [05:46<00:42, 1.82it/s, loss=0.0307, lr=0.0004]
Steps: 89%|████████▉ | 623/700 [05:46<00:42, 1.81it/s, loss=0.0307, lr=0.0004]
Steps: 89%|████████▉ | 623/700 [05:46<00:42, 1.81it/s, loss=0.0196, lr=0.0004]
Steps: 89%|████████▉ | 624/700 [05:47<00:42, 1.81it/s, loss=0.0196, lr=0.0004]
Steps: 89%|████████▉ | 624/700 [05:47<00:42, 1.81it/s, loss=0.0529, lr=0.0004]
Steps: 89%|████████▉ | 625/700 [05:47<00:41, 1.82it/s, loss=0.0529, lr=0.0004]
Steps: 89%|████████▉ | 625/700 [05:47<00:41, 1.82it/s, loss=0.0333, lr=0.0004]
Steps: 89%|████████▉ | 626/700 [05:48<00:40, 1.83it/s, loss=0.0333, lr=0.0004]
Steps: 89%|████████▉ | 626/700 [05:48<00:40, 1.83it/s, loss=0.0369, lr=0.0004]
Steps: 90%|████████▉ | 627/700 [05:48<00:39, 1.86it/s, loss=0.0369, lr=0.0004]
Steps: 90%|████████▉ | 627/700 [05:48<00:39, 1.86it/s, loss=0.0185, lr=0.0004]
Steps: 90%|████████▉ | 628/700 [05:49<00:38, 1.87it/s, loss=0.0185, lr=0.0004]
Steps: 90%|████████▉ | 628/700 [05:49<00:38, 1.87it/s, loss=0.00975, lr=0.0004]
Steps: 90%|████████▉ | 629/700 [05:49<00:38, 1.84it/s, loss=0.00975, lr=0.0004]
Steps: 90%|████████▉ | 629/700 [05:49<00:38, 1.84it/s, loss=0.021, lr=0.0004]
Steps: 90%|█████████ | 630/700 [05:50<00:38, 1.84it/s, loss=0.021, lr=0.0004]
Steps: 90%|█████████ | 630/700 [05:50<00:38, 1.84it/s, loss=0.111, lr=0.0004]
Steps: 90%|█████████ | 631/700 [05:51<00:37, 1.84it/s, loss=0.111, lr=0.0004]
Steps: 90%|█████████ | 631/700 [05:51<00:37, 1.84it/s, loss=0.00458, lr=0.0004]
Steps: 90%|█████████ | 632/700 [05:51<00:36, 1.84it/s, loss=0.00458, lr=0.0004]
Steps: 90%|█████████ | 632/700 [05:51<00:36, 1.84it/s, loss=0.0759, lr=0.0004]
Steps: 90%|█████████ | 633/700 [05:52<00:36, 1.84it/s, loss=0.0759, lr=0.0004]
Steps: 90%|█████████ | 633/700 [05:52<00:36, 1.84it/s, loss=0.0882, lr=0.0004]
Steps: 91%|█████████ | 634/700 [05:52<00:36, 1.83it/s, loss=0.0882, lr=0.0004]
Steps: 91%|█████████ | 634/700 [05:52<00:36, 1.83it/s, loss=0.0142, lr=0.0004]
Steps: 91%|█████████ | 635/700 [05:53<00:35, 1.84it/s, loss=0.0142, lr=0.0004]
Steps: 91%|█████████ | 635/700 [05:53<00:35, 1.84it/s, loss=0.00448, lr=0.0004]
Steps: 91%|█████████ | 636/700 [05:53<00:34, 1.84it/s, loss=0.00448, lr=0.0004]
Steps: 91%|█████████ | 636/700 [05:53<00:34, 1.84it/s, loss=0.0323, lr=0.0004]
Steps: 91%|█████████ | 637/700 [05:54<00:34, 1.81it/s, loss=0.0323, lr=0.0004]
Steps: 91%|█████████ | 637/700 [05:54<00:34, 1.81it/s, loss=0.00757, lr=0.0004]
Steps: 91%|█████████ | 638/700 [05:54<00:34, 1.80it/s, loss=0.00757, lr=0.0004]
Steps: 91%|█████████ | 638/700 [05:54<00:34, 1.80it/s, loss=0.0161, lr=0.0004]
Steps: 91%|█████████▏| 639/700 [05:55<00:33, 1.80it/s, loss=0.0161, lr=0.0004]
Steps: 91%|█████████▏| 639/700 [05:55<00:33, 1.80it/s, loss=0.0543, lr=0.0004]
Steps: 91%|█████████▏| 640/700 [05:56<00:33, 1.80it/s, loss=0.0543, lr=0.0004]
Steps: 91%|█████████▏| 640/700 [05:56<00:33, 1.80it/s, loss=0.0417, lr=0.0004]
Steps: 92%|█████████▏| 641/700 [05:56<00:32, 1.81it/s, loss=0.0417, lr=0.0004]
Steps: 92%|█████████▏| 641/700 [05:56<00:32, 1.81it/s, loss=0.0085, lr=0.0004]
Steps: 92%|█████████▏| 642/700 [05:57<00:31, 1.85it/s, loss=0.0085, lr=0.0004]
Steps: 92%|█████████▏| 642/700 [05:57<00:31, 1.85it/s, loss=0.00933, lr=0.0004]
Steps: 92%|█████████▏| 643/700 [05:57<00:29, 1.90it/s, loss=0.00933, lr=0.0004]
Steps: 92%|█████████▏| 643/700 [05:57<00:29, 1.90it/s, loss=0.00429, lr=0.0004]
Steps: 92%|█████████▏| 644/700 [05:58<00:28, 1.94it/s, loss=0.00429, lr=0.0004]
Steps: 92%|█████████▏| 644/700 [05:58<00:28, 1.94it/s, loss=0.051, lr=0.0004]
Steps: 92%|█████████▏| 645/700 [05:58<00:28, 1.95it/s, loss=0.051, lr=0.0004]
Steps: 92%|█████████▏| 645/700 [05:58<00:28, 1.95it/s, loss=0.122, lr=0.0004]
Steps: 92%|█████████▏| 646/700 [05:59<00:27, 1.94it/s, loss=0.122, lr=0.0004]
Steps: 92%|█████████▏| 646/700 [05:59<00:27, 1.94it/s, loss=0.0861, lr=0.0004]
Steps: 92%|█████████▏| 647/700 [05:59<00:27, 1.93it/s, loss=0.0861, lr=0.0004]
Steps: 92%|█████████▏| 647/700 [05:59<00:27, 1.93it/s, loss=0.0105, lr=0.0004]
Steps: 93%|█████████▎| 648/700 [06:00<00:26, 1.95it/s, loss=0.0105, lr=0.0004]
Steps: 93%|█████████▎| 648/700 [06:00<00:26, 1.95it/s, loss=0.28, lr=0.0004]
Steps: 93%|█████████▎| 649/700 [06:00<00:25, 1.98it/s, loss=0.28, lr=0.0004]
Steps: 93%|█████████▎| 649/700 [06:00<00:25, 1.98it/s, loss=0.00453, lr=0.0004]
Steps: 93%|█████████▎| 650/700 [06:01<00:25, 1.97it/s, loss=0.00453, lr=0.0004]
Steps: 93%|█████████▎| 650/700 [06:01<00:25, 1.97it/s, loss=0.0112, lr=0.0004]
Steps: 93%|█████████▎| 651/700 [06:01<00:24, 1.97it/s, loss=0.0112, lr=0.0004]
Steps: 93%|█████████▎| 651/700 [06:01<00:24, 1.97it/s, loss=0.00302, lr=0.0004]
Steps: 93%|█████████▎| 652/700 [06:02<00:24, 1.95it/s, loss=0.00302, lr=0.0004]
Steps: 93%|█████████▎| 652/700 [06:02<00:24, 1.95it/s, loss=0.0966, lr=0.0004]
Steps: 93%|█████████▎| 653/700 [06:02<00:25, 1.85it/s, loss=0.0966, lr=0.0004]
Steps: 93%|█████████▎| 653/700 [06:02<00:25, 1.85it/s, loss=0.0116, lr=0.0004]
Steps: 93%|█████████▎| 654/700 [06:03<00:25, 1.84it/s, loss=0.0116, lr=0.0004]
Steps: 93%|█████████▎| 654/700 [06:03<00:25, 1.84it/s, loss=0.00164, lr=0.0004]
Steps: 94%|█████████▎| 655/700 [06:03<00:24, 1.83it/s, loss=0.00164, lr=0.0004]
Steps: 94%|█████████▎| 655/700 [06:03<00:24, 1.83it/s, loss=0.0755, lr=0.0004]
Steps: 94%|█████████▎| 656/700 [06:04<00:23, 1.84it/s, loss=0.0755, lr=0.0004]
Steps: 94%|█████████▎| 656/700 [06:04<00:23, 1.84it/s, loss=0.118, lr=0.0004]
Steps: 94%|█████████▍| 657/700 [06:04<00:22, 1.87it/s, loss=0.118, lr=0.0004]
Steps: 94%|█████████▍| 657/700 [06:04<00:22, 1.87it/s, loss=0.00279, lr=0.0004]
Steps: 94%|█████████▍| 658/700 [06:05<00:22, 1.90it/s, loss=0.00279, lr=0.0004]
Steps: 94%|█████████▍| 658/700 [06:05<00:22, 1.90it/s, loss=0.0254, lr=0.0004]
Steps: 94%|█████████▍| 659/700 [06:05<00:21, 1.91it/s, loss=0.0254, lr=0.0004]
Steps: 94%|█████████▍| 659/700 [06:05<00:21, 1.91it/s, loss=0.00583, lr=0.0004]
Steps: 94%|█████████▍| 660/700 [06:06<00:20, 1.93it/s, loss=0.00583, lr=0.0004]
Steps: 94%|█████████▍| 660/700 [06:06<00:20, 1.93it/s, loss=0.0188, lr=0.0004]
Steps: 94%|█████████▍| 661/700 [06:06<00:20, 1.94it/s, loss=0.0188, lr=0.0004]
Steps: 94%|█████████▍| 661/700 [06:06<00:20, 1.94it/s, loss=0.0194, lr=0.0004]
Steps: 95%|█████████▍| 662/700 [06:07<00:19, 1.96it/s, loss=0.0194, lr=0.0004]
Steps: 95%|█████████▍| 662/700 [06:07<00:19, 1.96it/s, loss=0.0046, lr=0.0004]
Steps: 95%|█████████▍| 663/700 [06:07<00:18, 1.97it/s, loss=0.0046, lr=0.0004]
Steps: 95%|█████████▍| 663/700 [06:07<00:18, 1.97it/s, loss=0.0282, lr=0.0004]
Steps: 95%|█████████▍| 664/700 [06:08<00:18, 1.95it/s, loss=0.0282, lr=0.0004]
Steps: 95%|█████████▍| 664/700 [06:08<00:18, 1.95it/s, loss=0.0177, lr=0.0004]
Steps: 95%|█████████▌| 665/700 [06:09<00:18, 1.87it/s, loss=0.0177, lr=0.0004]
Steps: 95%|█████████▌| 665/700 [06:09<00:18, 1.87it/s, loss=0.028, lr=0.0004]
Steps: 95%|█████████▌| 666/700 [06:09<00:17, 1.91it/s, loss=0.028, lr=0.0004]
Steps: 95%|█████████▌| 666/700 [06:09<00:17, 1.91it/s, loss=0.00854, lr=0.0004]
Steps: 95%|█████████▌| 667/700 [06:10<00:17, 1.92it/s, loss=0.00854, lr=0.0004]
Steps: 95%|█████████▌| 667/700 [06:10<00:17, 1.92it/s, loss=0.0678, lr=0.0004]
Steps: 95%|█████████▌| 668/700 [06:10<00:16, 1.93it/s, loss=0.0678, lr=0.0004]
Steps: 95%|█████████▌| 668/700 [06:10<00:16, 1.93it/s, loss=0.0106, lr=0.0004]
Steps: 96%|█████████▌| 669/700 [06:11<00:15, 1.94it/s, loss=0.0106, lr=0.0004]
Steps: 96%|█████████▌| 669/700 [06:11<00:15, 1.94it/s, loss=0.00561, lr=0.0004]
Steps: 96%|█████████▌| 670/700 [06:11<00:15, 1.89it/s, loss=0.00561, lr=0.0004]
Steps: 96%|█████████▌| 670/700 [06:11<00:15, 1.89it/s, loss=0.0232, lr=0.0004]
Steps: 96%|█████████▌| 671/700 [06:12<00:15, 1.88it/s, loss=0.0232, lr=0.0004]
Steps: 96%|█████████▌| 671/700 [06:12<00:15, 1.88it/s, loss=0.0145, lr=0.0004]
Steps: 96%|█████████▌| 672/700 [06:12<00:15, 1.87it/s, loss=0.0145, lr=0.0004]
Steps: 96%|█████████▌| 672/700 [06:12<00:15, 1.87it/s, loss=0.0449, lr=0.0004]
Steps: 96%|█████████▌| 673/700 [06:13<00:14, 1.83it/s, loss=0.0449, lr=0.0004]
Steps: 96%|█████████▌| 673/700 [06:13<00:14, 1.83it/s, loss=0.0102, lr=0.0004]
Steps: 96%|█████████▋| 674/700 [06:13<00:14, 1.80it/s, loss=0.0102, lr=0.0004]
Steps: 96%|█████████▋| 674/700 [06:13<00:14, 1.80it/s, loss=0.0219, lr=0.0004]
Steps: 96%|█████████▋| 675/700 [06:14<00:14, 1.78it/s, loss=0.0219, lr=0.0004]
Steps: 96%|█████████▋| 675/700 [06:14<00:14, 1.78it/s, loss=0.00629, lr=0.0004]
Steps: 97%|█████████▋| 676/700 [06:15<00:13, 1.78it/s, loss=0.00629, lr=0.0004]
Steps: 97%|█████████▋| 676/700 [06:15<00:13, 1.78it/s, loss=0.112, lr=0.0004]
Steps: 97%|█████████▋| 677/700 [06:15<00:13, 1.77it/s, loss=0.112, lr=0.0004]
Steps: 97%|█████████▋| 677/700 [06:15<00:13, 1.77it/s, loss=0.00805, lr=0.0004]
Steps: 97%|█████████▋| 678/700 [06:16<00:12, 1.76it/s, loss=0.00805, lr=0.0004]
Steps: 97%|█████████▋| 678/700 [06:16<00:12, 1.76it/s, loss=0.00428, lr=0.0004]
Steps: 97%|█████████▋| 679/700 [06:16<00:11, 1.76it/s, loss=0.00428, lr=0.0004]
Steps: 97%|█████████▋| 679/700 [06:16<00:11, 1.76it/s, loss=0.00553, lr=0.0004]
Steps: 97%|█████████▋| 680/700 [06:17<00:11, 1.76it/s, loss=0.00553, lr=0.0004]
Steps: 97%|█████████▋| 680/700 [06:17<00:11, 1.76it/s, loss=0.00655, lr=0.0004]
Steps: 97%|█████████▋| 681/700 [06:17<00:10, 1.74it/s, loss=0.00655, lr=0.0004]
Steps: 97%|█████████▋| 681/700 [06:17<00:10, 1.74it/s, loss=0.0833, lr=0.0004]
Steps: 97%|█████████▋| 682/700 [06:18<00:10, 1.73it/s, loss=0.0833, lr=0.0004]
Steps: 97%|█████████▋| 682/700 [06:18<00:10, 1.73it/s, loss=0.0285, lr=0.0004]
Steps: 98%|█████████▊| 683/700 [06:19<00:09, 1.71it/s, loss=0.0285, lr=0.0004]
Steps: 98%|█████████▊| 683/700 [06:19<00:09, 1.71it/s, loss=0.0525, lr=0.0004]
Steps: 98%|█████████▊| 684/700 [06:19<00:09, 1.72it/s, loss=0.0525, lr=0.0004]
Steps: 98%|█████████▊| 684/700 [06:19<00:09, 1.72it/s, loss=0.00216, lr=0.0004]
Steps: 98%|█████████▊| 685/700 [06:20<00:08, 1.71it/s, loss=0.00216, lr=0.0004]
Steps: 98%|█████████▊| 685/700 [06:20<00:08, 1.71it/s, loss=0.0627, lr=0.0004]
Steps: 98%|█████████▊| 686/700 [06:20<00:08, 1.73it/s, loss=0.0627, lr=0.0004]
Steps: 98%|█████████▊| 686/700 [06:20<00:08, 1.73it/s, loss=0.0122, lr=0.0004]
Steps: 98%|█████████▊| 687/700 [06:21<00:07, 1.72it/s, loss=0.0122, lr=0.0004]
Steps: 98%|█████████▊| 687/700 [06:21<00:07, 1.72it/s, loss=0.00683, lr=0.0004]
Steps: 98%|█████████▊| 688/700 [06:22<00:06, 1.72it/s, loss=0.00683, lr=0.0004]
Steps: 98%|█████████▊| 688/700 [06:22<00:06, 1.72it/s, loss=0.00972, lr=0.0004]
Steps: 98%|█████████▊| 689/700 [06:22<00:06, 1.73it/s, loss=0.00972, lr=0.0004]
Steps: 98%|█████████▊| 689/700 [06:22<00:06, 1.73it/s, loss=0.0338, lr=0.0004]
Steps: 99%|█████████▊| 690/700 [06:23<00:05, 1.74it/s, loss=0.0338, lr=0.0004]
Steps: 99%|█████████▊| 690/700 [06:23<00:05, 1.74it/s, loss=0.0056, lr=0.0004]
Steps: 99%|█████████▊| 691/700 [06:23<00:05, 1.75it/s, loss=0.0056, lr=0.0004]
Steps: 99%|█████████▊| 691/700 [06:23<00:05, 1.75it/s, loss=0.00928, lr=0.0004]
Steps: 99%|█████████▉| 692/700 [06:24<00:04, 1.75it/s, loss=0.00928, lr=0.0004]
Steps: 99%|█████████▉| 692/700 [06:24<00:04, 1.75it/s, loss=0.00226, lr=0.0004]
Steps: 99%|█████████▉| 693/700 [06:24<00:04, 1.72it/s, loss=0.00226, lr=0.0004]
Steps: 99%|█████████▉| 693/700 [06:24<00:04, 1.72it/s, loss=0.00318, lr=0.0004]
Steps: 99%|█████████▉| 694/700 [06:25<00:03, 1.73it/s, loss=0.00318, lr=0.0004]
Steps: 99%|█████████▉| 694/700 [06:25<00:03, 1.73it/s, loss=0.00763, lr=0.0004]
Steps: 99%|█████████▉| 695/700 [06:26<00:02, 1.74it/s, loss=0.00763, lr=0.0004]
Steps: 99%|█████████▉| 695/700 [06:26<00:02, 1.74it/s, loss=0.0217, lr=0.0004]
Steps: 99%|█████████▉| 696/700 [06:26<00:02, 1.75it/s, loss=0.0217, lr=0.0004]
Steps: 99%|█████████▉| 696/700 [06:26<00:02, 1.75it/s, loss=0.0112, lr=0.0004]
Steps: 100%|█████████▉| 697/700 [06:27<00:01, 1.76it/s, loss=0.0112, lr=0.0004]
Steps: 100%|█████████▉| 697/700 [06:27<00:01, 1.76it/s, loss=0.0276, lr=0.0004]
Steps: 100%|█████████▉| 698/700 [06:27<00:01, 1.76it/s, loss=0.0276, lr=0.0004]
Steps: 100%|█████████▉| 698/700 [06:27<00:01, 1.76it/s, loss=0.0766, lr=0.0004]
Steps: 100%|█████████▉| 699/700 [06:28<00:00, 1.74it/s, loss=0.0766, lr=0.0004]
Steps: 100%|█████████▉| 699/700 [06:28<00:00, 1.74it/s, loss=0.00645, lr=0.0004]
Steps: 100%|██████████| 700/700 [06:28<00:00, 1.74it/s, loss=0.00645, lr=0.0004]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')
Saving weights to checkpoints/step_700.safetensors
LORA Unet Moved 0.002203464973717928
LORA CLIP Moved 8.760895434534177e-05
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')
Saving weights to checkpoints/final_lora.safetensors
Steps: 100%|██████████| 700/700 [06:28<00:00, 1.74it/s, loss=0.00693, lr=0.0004]
Steps: 100%|██████████| 700/700 [06:29<00:00, 1.80it/s, loss=0.00693, lr=0.0004]
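The progress lines above follow tqdm's usual layout: a step counter, elapsed and remaining time, a rate, and a postfix carrying `loss` and `lr`. Each step appears to be logged twice, first with the previous step's loss and then with its own. The sketch below is one way to pull a per-step loss out of a log like this one. It is not part of the model's code: `STEP_RE` and `parse_steps` are hypothetical names, and the pattern is an assumption based only on the lines shown here.

```python
import re

# Matches lines such as:
#   Steps: 84%|████████▎ | 585/700 [05:25<01:05, 1.76it/s, loss=0.00603, lr=0.0004]
# Lines without a loss/lr postfix (e.g. the very first bar refresh) are skipped.
STEP_RE = re.compile(
    r"Steps:\s*\d+%\|.*?\|\s*(?P<step>\d+)/(?P<total>\d+)\s*"
    r"\[.*?loss=(?P<loss>[0-9.eE+-]+),\s*lr=(?P<lr>[0-9.eE+-]+)\]"
)


def parse_steps(log_text):
    """Return {step: (loss, lr)}, keeping the last values logged for each step."""
    records = {}
    for line in log_text.splitlines():
        match = STEP_RE.search(line)
        if match:
            # A later line for the same step overwrites the earlier one.
            records[int(match["step"])] = (float(match["loss"]), float(match["lr"]))
    return records


if __name__ == "__main__":
    sample = (
        "Steps: 84%|████████▎ | 585/700 [05:25<01:05, 1.76it/s, loss=0.00603, lr=0.0004]\n"
        "Steps: 84%|████████▎ | 585/700 [05:25<01:05, 1.76it/s, loss=0.0861, lr=0.0004]\n"
        "Saving weights to checkpoints/step_600.safetensors\n"
    )
    for step, (loss, lr) in sorted(parse_steps(sample).items()):
        print(step, loss, lr)
```

Keeping the last entry per step is a design choice that assumes the second line for a step carries that step's own loss; if the logging order differs in another run, keep the first entry instead.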
This example was created by a different version, zhouzhengjun/lora_train_base:1685f45a.
This model runs on Nvidia T4 GPU hardware. We don't yet have enough runs of this model to provide performance information.
This model is cold. You'll get a fast response if the model is warm and already running, and a slower response if the model is cold and starting up.
Using seed: 44374
PTI : Initializer Tokens not given, doing random inits
PTI : Placeholder Tokens ['<s1>', '<s2>']
PTI : Initializer Tokens ['<rand-0.017>', '<rand-0.017>']
Initialized <s1> with random noise (sigma=0.017), empirically 0.000 +- 0.017
Norm : 0.4636
Initialized <s2> with random noise (sigma=0.017), empirically 0.000 +- 0.017
Norm : 0.4810
/root/.pyenv/versions/3.10.11/lib/python3.10/site-packages/diffusers/configuration_utils.py:195: FutureWarning: It is deprecated to pass a pretrained model name or path to `from_config`.If you were trying to load a scheduler, please use <class 'diffusers.schedulers.scheduling_ddpm.DDPMScheduler'>.from_pretrained(...) instead. Otherwise, please make sure to pass a configuration dictionary instead. This functionality will be removed in v1.0.0.
deprecate("config-passed-as-path", "1.0.0", deprecation_message, standard_warn=False)
Mask not found for cog_instance_data/0.mask.png
Warning : this will pre-process all the images in the instance data root.
0%| | 0/15 [00:00<?, ?it/s]INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
7%|▋ | 1/15 [00:00<00:02, 5.94it/s]
40%|████ | 6/15 [00:00<00:00, 25.47it/s]
100%|██████████| 15/15 [00:00<00:00, 46.74it/s]
100%|██████████| 15/15 [00:00<00:00, 37.89it/s]
a photo of a cool <s1><s2>
0%| | 0/15 [00:00<?, ?it/s]
a photo of a nice <s1><s2>
a cropped photo of the <s1><s2>
7%|▋ | 1/15 [00:07<01:49, 7.81s/it]
a photo of the nice <s1><s2>
20%|██ | 3/15 [00:08<00:25, 2.13s/it]
a photo of the clean <s1><s2>
27%|██▋ | 4/15 [00:08<00:16, 1.46s/it]
a good photo of the <s1><s2>
33%|███▎ | 5/15 [00:08<00:10, 1.05s/it]
a photo of a clean <s1><s2>
40%|████ | 6/15 [00:08<00:07, 1.28it/s]
a photo of the <s1><s2>
47%|████▋ | 7/15 [00:08<00:04, 1.66it/s]
a photo of a nice <s1><s2>
53%|█████▎ | 8/15 [00:09<00:03, 2.10it/s]
a rendition of the <s1><s2>
60%|██████ | 9/15 [00:09<00:02, 2.54it/s]
a photo of a nice <s1><s2>
67%|██████▋ | 10/15 [00:09<00:01, 2.98it/s]
a photo of a small <s1><s2>
73%|███████▎ | 11/15 [00:09<00:01, 3.37it/s]
a photo of the <s1><s2>
80%|████████ | 12/15 [00:09<00:00, 3.70it/s]
a rendition of the <s1><s2>
87%|████████▋ | 13/15 [00:10<00:00, 3.97it/s]
a photo of my <s1><s2>
93%|█████████▎| 14/15 [00:10<00:00, 4.17it/s]
100%|██████████| 15/15 [00:10<00:00, 4.34it/s]
100%|██████████| 15/15 [00:10<00:00, 1.42it/s]
PTI : Using cached latent.
0%| | 0/1000 [00:00<?, ?it/s]
tensor(0.0058, device='cuda:0')
tensor([[0.4645],
[0.4824]], device='cuda:0')
Current Norm : tensor([0.4580, 0.4741], device='cuda:0')
Steps: 0%| | 0/1000 [00:00<?, ?it/s]
Steps: 0%| | 1/1000 [00:05<1:33:07, 5.59s/it]
Steps: 0%| | 1/1000 [00:05<1:33:07, 5.59s/it, loss=0.00105, lr=0.001]
Steps: 0%| | 2/1000 [00:06<43:50, 2.64s/it, loss=0.00105, lr=0.001]
tensor(0.0063, device='cuda:0')
tensor([[0.4600],
[0.4757]], device='cuda:0')
Current Norm : tensor([0.4540, 0.4681], device='cuda:0')
Steps: 0%| | 2/1000 [00:06<43:50, 2.64s/it, loss=0.00522, lr=0.001]
Steps: 0%| | 3/1000 [00:06<28:27, 1.71s/it, loss=0.00522, lr=0.001]
Steps: 0%| | 3/1000 [00:06<28:27, 1.71s/it, loss=0.0674, lr=0.001]
Steps: 0%| | 4/1000 [00:07<20:57, 1.26s/it, loss=0.0674, lr=0.001]
tensor(0.0045, device='cuda:0')
tensor([[0.4559],
[0.4699]], device='cuda:0')
Current Norm : tensor([0.4503, 0.4629], device='cuda:0')
Steps: 0%| | 4/1000 [00:07<20:57, 1.26s/it, loss=0.00126, lr=0.001]
Steps: 0%| | 5/1000 [00:07<17:01, 1.03s/it, loss=0.00126, lr=0.001]
Steps: 0%| | 5/1000 [00:07<17:01, 1.03s/it, loss=0.00103, lr=0.001]
Steps: 1%| | 6/1000 [00:08<14:27, 1.15it/s, loss=0.00103, lr=0.001]
tensor(0.0037, device='cuda:0')
tensor([[0.4523],
[0.4645]], device='cuda:0')
Current Norm : tensor([0.4470, 0.4581], device='cuda:0')
Steps: 1%| | 6/1000 [00:08<14:27, 1.15it/s, loss=0.00719, lr=0.001]
Steps: 1%| | 7/1000 [00:09<13:02, 1.27it/s, loss=0.00719, lr=0.001]
Steps: 1%| | 7/1000 [00:09<13:02, 1.27it/s, loss=0.000785, lr=0.001]
Steps: 1%| | 8/1000 [00:09<11:53, 1.39it/s, loss=0.000785, lr=0.001]
tensor(0.0086, device='cuda:0')
tensor([[0.4486],
[0.4593]], device='cuda:0')
Current Norm : tensor([0.4437, 0.4534], device='cuda:0')
Steps: 1%| | 8/1000 [00:09<11:53, 1.39it/s, loss=0.00652, lr=0.001]
Steps: 1%| | 9/1000 [00:10<11:19, 1.46it/s, loss=0.00652, lr=0.001]
Steps: 1%| | 9/1000 [00:10<11:19, 1.46it/s, loss=0.000549, lr=0.001]
Steps: 1%| | 10/1000 [00:10<10:46, 1.53it/s, loss=0.000549, lr=0.001]
tensor(0.0057, device='cuda:0')
tensor([[0.4453],
[0.4547]], device='cuda:0')
Current Norm : tensor([0.4408, 0.4493], device='cuda:0')
Steps: 1%| | 10/1000 [00:10<10:46, 1.53it/s, loss=0.079, lr=0.001]
Steps: 1%| | 11/1000 [00:11<10:31, 1.57it/s, loss=0.079, lr=0.001]
Steps: 1%| | 11/1000 [00:11<10:31, 1.57it/s, loss=0.00539, lr=0.001]
Steps: 1%| | 12/1000 [00:12<10:11, 1.62it/s, loss=0.00539, lr=0.001]
tensor(0.0086, device='cuda:0')
tensor([[0.4424],
[0.4503]], device='cuda:0')
Current Norm : tensor([0.4381, 0.4453], device='cuda:0')
Steps: 1%| | 12/1000 [00:12<10:11, 1.62it/s, loss=0.0134, lr=0.001]
Steps: 1%|▏ | 13/1000 [00:12<10:07, 1.63it/s, loss=0.0134, lr=0.001]
Steps: 1%|▏ | 13/1000 [00:12<10:07, 1.63it/s, loss=0.00237, lr=0.001]
Steps: 1%|▏ | 14/1000 [00:13<09:57, 1.65it/s, loss=0.00237, lr=0.001]
tensor(0.0045, device='cuda:0')
tensor([[0.4397],
[0.4465]], device='cuda:0')
Current Norm : tensor([0.4357, 0.4418], device='cuda:0')
Steps: 1%|▏ | 14/1000 [00:13<09:57, 1.65it/s, loss=0.0392, lr=0.001]
Steps: 2%|▏ | 15/1000 [00:13<09:57, 1.65it/s, loss=0.0392, lr=0.001]
Steps: 2%|▏ | 15/1000 [00:13<09:57, 1.65it/s, loss=0.025, lr=0.001]
Steps: 2%|▏ | 16/1000 [00:14<09:49, 1.67it/s, loss=0.025, lr=0.001]
tensor(0.0057, device='cuda:0')
tensor([[0.4370],
[0.4430]], device='cuda:0')
Current Norm : tensor([0.4333, 0.4387], device='cuda:0')
Steps: 2%|▏ | 16/1000 [00:14<09:49, 1.67it/s, loss=0.00716, lr=0.001]
Steps: 2%|▏ | 17/1000 [00:15<09:55, 1.65it/s, loss=0.00716, lr=0.001]
Steps: 2%|▏ | 17/1000 [00:15<09:55, 1.65it/s, loss=0.101, lr=0.001]
Steps: 2%|▏ | 18/1000 [00:15<09:49, 1.67it/s, loss=0.101, lr=0.001]
tensor(0.0055, device='cuda:0')
tensor([[0.4346],
[0.4399]], device='cuda:0')
Current Norm : tensor([0.4311, 0.4359], device='cuda:0')
Steps: 2%|▏ | 18/1000 [00:15<09:49, 1.67it/s, loss=0.0104, lr=0.001]
Steps: 2%|▏ | 19/1000 [00:16<09:50, 1.66it/s, loss=0.0104, lr=0.001]
Steps: 2%|▏ | 19/1000 [00:16<09:50, 1.66it/s, loss=0.0845, lr=0.001]
Steps: 2%|▏ | 20/1000 [00:16<09:41, 1.69it/s, loss=0.0845, lr=0.001]
tensor(0.0006, device='cuda:0')
tensor([[0.4324],
[0.4371]], device='cuda:0')
Current Norm : tensor([0.4291, 0.4334], device='cuda:0')
Steps: 2%|▏ | 20/1000 [00:16<09:41, 1.69it/s, loss=0.000133, lr=0.001]
Steps: 2%|▏ | 21/1000 [00:17<09:46, 1.67it/s, loss=0.000133, lr=0.001]
Steps: 2%|▏ | 21/1000 [00:17<09:46, 1.67it/s, loss=0.000289, lr=0.001]
Steps: 2%|▏ | 22/1000 [00:18<09:42, 1.68it/s, loss=0.000289, lr=0.001]
tensor(0.0017, device='cuda:0')
tensor([[0.4304],
[0.4347]], device='cuda:0')
Current Norm : tensor([0.4273, 0.4312], device='cuda:0')
Steps: 2%|▏ | 22/1000 [00:18<09:42, 1.68it/s, loss=0.00106, lr=0.001]
Steps: 2%|▏ | 23/1000 [00:18<09:47, 1.66it/s, loss=0.00106, lr=0.001]
Steps: 2%|▏ | 23/1000 [00:18<09:47, 1.66it/s, loss=0.000295, lr=0.001]
Steps: 2%|▏ | 24/1000 [00:19<09:39, 1.68it/s, loss=0.000295, lr=0.001]
tensor(0.0042, device='cuda:0')
tensor([[0.4285],
[0.4325]], device='cuda:0')
Current Norm : tensor([0.4256, 0.4292], device='cuda:0')
Steps: 2%|▏ | 24/1000 [00:19<09:39, 1.68it/s, loss=0.0145, lr=0.001]
Steps: 2%|▎ | 25/1000 [00:19<09:43, 1.67it/s, loss=0.0145, lr=0.001]
Steps: 2%|▎ | 25/1000 [00:19<09:43, 1.67it/s, loss=0.0101, lr=0.001]
Steps: 3%|▎ | 26/1000 [00:20<09:37, 1.69it/s, loss=0.0101, lr=0.001]
tensor(0.0068, device='cuda:0')
tensor([[0.4268],
[0.4305]], device='cuda:0')
Current Norm : tensor([0.4241, 0.4275], device='cuda:0')
Steps: 3%|▎ | 26/1000 [00:20<09:37, 1.69it/s, loss=0.000702, lr=0.001]
Steps: 3%|▎ | 27/1000 [00:21<09:40, 1.68it/s, loss=0.000702, lr=0.001]
Steps: 3%|▎ | 27/1000 [00:21<09:40, 1.68it/s, loss=0.0608, lr=0.001]
Steps: 3%|▎ | 28/1000 [00:21<09:33, 1.69it/s, loss=0.0608, lr=0.001]
tensor(0.0042, device='cuda:0')
tensor([[0.4252],
[0.4288]], device='cuda:0')
Current Norm : tensor([0.4227, 0.4259], device='cuda:0')
Steps: 3%|▎ | 28/1000 [00:21<09:33, 1.69it/s, loss=0.000732, lr=0.001]
Steps: 3%|▎ | 29/1000 [00:22<09:43, 1.66it/s, loss=0.000732, lr=0.001]
Steps: 3%|▎ | 29/1000 [00:22<09:43, 1.66it/s, loss=0.0296, lr=0.001]
Steps: 3%|▎ | 30/1000 [00:22<09:39, 1.67it/s, loss=0.0296, lr=0.001]
tensor(0.0018, device='cuda:0')
tensor([[0.4238],
[0.4273]], device='cuda:0')
Current Norm : tensor([0.4215, 0.4246], device='cuda:0')
Steps: 3%|▎ | 30/1000 [00:22<09:39, 1.67it/s, loss=0.00688, lr=0.001]
Steps: 3%|▎ | 31/1000 [00:23<11:19, 1.43it/s, loss=0.00688, lr=0.001]
Steps: 3%|▎ | 31/1000 [00:23<11:19, 1.43it/s, loss=0.000233, lr=0.001]
Steps: 3%|▎ | 32/1000 [00:24<10:44, 1.50it/s, loss=0.000233, lr=0.001]
tensor(0.0049, device='cuda:0')
tensor([[0.4226],
[0.4259]], device='cuda:0')
Current Norm : tensor([0.4203, 0.4233], device='cuda:0')
Steps: 3%|▎ | 32/1000 [00:24<10:44, 1.50it/s, loss=0.0533, lr=0.001]
Steps: 3%|▎ | 33/1000 [00:24<10:26, 1.54it/s, loss=0.0533, lr=0.001]
Steps: 3%|▎ | 33/1000 [00:24<10:26, 1.54it/s, loss=0.00796, lr=0.001]
Steps: 3%|▎ | 34/1000 [00:25<10:07, 1.59it/s, loss=0.00796, lr=0.001]
tensor(0.0021, device='cuda:0')
tensor([[0.4214],
[0.4246]], device='cuda:0')
Current Norm : tensor([0.4193, 0.4222], device='cuda:0')
Steps: 3%|▎ | 34/1000 [00:25<10:07, 1.59it/s, loss=0.00294, lr=0.001]
Steps: 4%|▎ | 35/1000 [00:26<10:00, 1.61it/s, loss=0.00294, lr=0.001]
Steps: 4%|▎ | 35/1000 [00:26<10:00, 1.61it/s, loss=0.00516, lr=0.001]
Steps: 4%|▎ | 36/1000 [00:26<09:47, 1.64it/s, loss=0.00516, lr=0.001]
tensor(0.0097, device='cuda:0')
tensor([[0.4203],
[0.4232]], device='cuda:0')
Current Norm : tensor([0.4183, 0.4209], device='cuda:0')
Steps: 4%|▎ | 36/1000 [00:26<09:47, 1.64it/s, loss=0.149, lr=0.001]
Steps: 4%|▎ | 37/1000 [00:27<09:46, 1.64it/s, loss=0.149, lr=0.001]
Steps: 4%|▎ | 37/1000 [00:27<09:46, 1.64it/s, loss=0.0233, lr=0.001]
Steps: 4%|▍ | 38/1000 [00:27<09:38, 1.66it/s, loss=0.0233, lr=0.001]
tensor(0.0037, device='cuda:0')
tensor([[0.4194],
[0.4220]], device='cuda:0')
Current Norm : tensor([0.4174, 0.4198], device='cuda:0')
Steps: 4%|▍ | 38/1000 [00:27<09:38, 1.66it/s, loss=0.105, lr=0.001]
Steps: 4%|▍ | 39/1000 [00:28<09:40, 1.66it/s, loss=0.105, lr=0.001]
Steps: 4%|▍ | 39/1000 [00:28<09:40, 1.66it/s, loss=0.000409, lr=0.001]
Steps: 4%|▍ | 40/1000 [00:29<09:32, 1.68it/s, loss=0.000409, lr=0.001]
tensor(0.0049, device='cuda:0')
tensor([[0.4186],
[0.4208]], device='cuda:0')
Current Norm : tensor([0.4167, 0.4188], device='cuda:0')
Steps: 4%|▍ | 40/1000 [00:29<09:32, 1.68it/s, loss=0.0079, lr=0.001]
Steps: 4%|▍ | 41/1000 [00:29<09:36, 1.66it/s, loss=0.0079, lr=0.001]
Steps: 4%|▍ | 41/1000 [00:29<09:36, 1.66it/s, loss=0.0223, lr=0.001]
Steps: 4%|▍ | 42/1000 [00:30<09:29, 1.68it/s, loss=0.0223, lr=0.001]
tensor(0.0035, device='cuda:0')
tensor([[0.4179],
[0.4198]], device='cuda:0')
Current Norm : tensor([0.4161, 0.4178], device='cuda:0')
Steps: 4%|▍ | 42/1000 [00:30<09:29, 1.68it/s, loss=0.0519, lr=0.001]
Steps: 4%|▍ | 43/1000 [00:30<09:38, 1.65it/s, loss=0.0519, lr=0.001]
Steps: 4%|▍ | 43/1000 [00:30<09:38, 1.65it/s, loss=0.00597, lr=0.001]
Steps: 4%|▍ | 44/1000 [00:31<09:30, 1.68it/s, loss=0.00597, lr=0.001]
tensor(0.0069, device='cuda:0')
tensor([[0.4173],
[0.4189]], device='cuda:0')
Current Norm : tensor([0.4156, 0.4170], device='cuda:0')
Steps: 4%|▍ | 44/1000 [00:31<09:30, 1.68it/s, loss=0.028, lr=0.001]
Steps: 4%|▍ | 45/1000 [00:32<09:34, 1.66it/s, loss=0.028, lr=0.001]
Steps: 4%|▍ | 45/1000 [00:32<09:34, 1.66it/s, loss=0.000535, lr=0.001]
Steps: 5%|▍ | 46/1000 [00:32<09:28, 1.68it/s, loss=0.000535, lr=0.001]
tensor(0.0040, device='cuda:0')
tensor([[0.4168],
[0.4180]], device='cuda:0')
Current Norm : tensor([0.4151, 0.4162], device='cuda:0')
Steps: 5%|▍ | 46/1000 [00:32<09:28, 1.68it/s, loss=0.0098, lr=0.001]
Steps: 5%|▍ | 47/1000 [00:33<09:33, 1.66it/s, loss=0.0098, lr=0.001]
Steps: 5%|▍ | 47/1000 [00:33<09:33, 1.66it/s, loss=0.00272, lr=0.001]
Steps: 5%|▍ | 48/1000 [00:33<09:26, 1.68it/s, loss=0.00272, lr=0.001]
tensor(0.0039, device='cuda:0')
tensor([[0.4162],
[0.4172]], device='cuda:0')
Current Norm : tensor([0.4146, 0.4155], device='cuda:0')
Steps: 5%|▍ | 48/1000 [00:33<09:26, 1.68it/s, loss=0.0584, lr=0.001]
Steps: 5%|▍ | 49/1000 [00:34<09:33, 1.66it/s, loss=0.0584, lr=0.001]
Steps: 5%|▍ | 49/1000 [00:34<09:33, 1.66it/s, loss=0.000687, lr=0.001]
Steps: 5%|▌ | 50/1000 [00:35<09:28, 1.67it/s, loss=0.000687, lr=0.001]
tensor(0.0026, device='cuda:0')
tensor([[0.4157],
[0.4165]], device='cuda:0')
Current Norm : tensor([0.4141, 0.4149], device='cuda:0')
Steps: 5%|▌ | 50/1000 [00:35<09:28, 1.67it/s, loss=0.000356, lr=0.001]
Steps: 5%|▌ | 51/1000 [00:35<09:31, 1.66it/s, loss=0.000356, lr=0.001]
Steps: 5%|▌ | 51/1000 [00:35<09:31, 1.66it/s, loss=0.0267, lr=0.001]
Steps: 5%|▌ | 52/1000 [00:36<09:25, 1.68it/s, loss=0.0267, lr=0.001]
tensor(0.0018, device='cuda:0')
tensor([[0.4152],
[0.4158]], device='cuda:0')
Current Norm : tensor([0.4137, 0.4142], device='cuda:0')
Steps: 5%|▌ | 52/1000 [00:36<09:25, 1.68it/s, loss=0.000584, lr=0.001]
Steps: 5%|▌ | 53/1000 [00:36<09:29, 1.66it/s, loss=0.000584, lr=0.001]
Steps: 5%|▌ | 53/1000 [00:36<09:29, 1.66it/s, loss=0.00399, lr=0.001]
Steps: 5%|▌ | 54/1000 [00:37<09:22, 1.68it/s, loss=0.00399, lr=0.001]
tensor(0.0033, device='cuda:0')
tensor([[0.4148],
[0.4152]], device='cuda:0')
Current Norm : tensor([0.4133, 0.4136], device='cuda:0')
Steps: 5%|▌ | 54/1000 [00:37<09:22, 1.68it/s, loss=0.000407, lr=0.001]
Steps: 6%|▌ | 55/1000 [00:38<09:24, 1.68it/s, loss=0.000407, lr=0.001]
Steps: 6%|▌ | 55/1000 [00:38<09:24, 1.68it/s, loss=0.00915, lr=0.001]
Steps: 6%|▌ | 56/1000 [00:38<09:18, 1.69it/s, loss=0.00915, lr=0.001]
tensor(0.0025, device='cuda:0')
tensor([[0.4144],
[0.4145]], device='cuda:0')
Current Norm : tensor([0.4129, 0.4131], device='cuda:0')
Steps: 6%|▌ | 56/1000 [00:38<09:18, 1.69it/s, loss=0.00812, lr=0.001]
Steps: 6%|▌ | 57/1000 [00:39<09:24, 1.67it/s, loss=0.00812, lr=0.001]
Steps: 6%|▌ | 57/1000 [00:39<09:24, 1.67it/s, loss=0.000517, lr=0.001]
Steps: 6%|▌ | 58/1000 [00:39<09:19, 1.68it/s, loss=0.000517, lr=0.001]
tensor(0.0035, device='cuda:0')
tensor([[0.4140],
[0.4139]], device='cuda:0')
Current Norm : tensor([0.4126, 0.4125], device='cuda:0')
Steps: 6%|▌ | 58/1000 [00:39<09:19, 1.68it/s, loss=0.00064, lr=0.001]
Steps: 6%|▌ | 59/1000 [00:40<09:24, 1.67it/s, loss=0.00064, lr=0.001]
Steps: 6%|▌ | 59/1000 [00:40<09:24, 1.67it/s, loss=0.0373, lr=0.001]
Steps: 6%|▌ | 60/1000 [00:41<09:18, 1.68it/s, loss=0.0373, lr=0.001]
tensor(0.0029, device='cuda:0')
tensor([[0.4136],
[0.4134]], device='cuda:0')
Current Norm : tensor([0.4122, 0.4121], device='cuda:0')
Steps: 6%|▌ | 60/1000 [00:41<09:18, 1.68it/s, loss=0.0119, lr=0.001]
Steps: 6%|▌ | 61/1000 [00:41<09:25, 1.66it/s, loss=0.0119, lr=0.001]
Steps: 6%|▌ | 61/1000 [00:41<09:25, 1.66it/s, loss=0.000365, lr=0.001]
Steps: 6%|▌ | 62/1000 [00:42<09:20, 1.67it/s, loss=0.000365, lr=0.001]
tensor(0.0060, device='cuda:0')
tensor([[0.4132],
[0.4129]], device='cuda:0')
Current Norm : tensor([0.4119, 0.4116], device='cuda:0')
Steps: 6%|▌ | 62/1000 [00:42<09:20, 1.67it/s, loss=0.00457, lr=0.001]
Steps: 6%|▋ | 63/1000 [00:42<09:24, 1.66it/s, loss=0.00457, lr=0.001]
Steps: 6%|▋ | 63/1000 [00:42<09:24, 1.66it/s, loss=0.038, lr=0.001]
Steps: 6%|▋ | 64/1000 [00:43<09:19, 1.67it/s, loss=0.038, lr=0.001]
tensor(0.0026, device='cuda:0')
tensor([[0.4128],
[0.4124]], device='cuda:0')
Current Norm : tensor([0.4116, 0.4112], device='cuda:0')
Steps: 6%|▋ | 64/1000 [00:43<09:19, 1.67it/s, loss=0.0021, lr=0.001]
Steps: 6%|▋ | 65/1000 [00:44<09:23, 1.66it/s, loss=0.0021, lr=0.001]
Steps: 6%|▋ | 65/1000 [00:44<09:23, 1.66it/s, loss=0.000878, lr=0.001]
Steps: 7%|▋ | 66/1000 [00:44<09:17, 1.68it/s, loss=0.000878, lr=0.001]
tensor(0.0054, device='cuda:0')
tensor([[0.4125],
[0.4120]], device='cuda:0')
Current Norm : tensor([0.4113, 0.4108], device='cuda:0')
Steps: 7%|▋ | 66/1000 [00:44<09:17, 1.68it/s, loss=0.0884, lr=0.001]
Steps: 7%|▋ | 67/1000 [00:45<09:21, 1.66it/s, loss=0.0884, lr=0.001]
Steps: 7%|▋ | 67/1000 [00:45<09:21, 1.66it/s, loss=0.00163, lr=0.001]
Steps: 7%|▋ | 68/1000 [00:45<09:15, 1.68it/s, loss=0.00163, lr=0.001]
tensor(0.0089, device='cuda:0')
tensor([[0.4123],
[0.4117]], device='cuda:0')
Current Norm : tensor([0.4111, 0.4106], device='cuda:0')
Steps: 7%|▋ | 68/1000 [00:45<09:15, 1.68it/s, loss=0.00931, lr=0.001]
Steps: 7%|▋ | 69/1000 [00:46<09:19, 1.67it/s, loss=0.00931, lr=0.001]
Steps: 7%|▋ | 69/1000 [00:46<09:19, 1.67it/s, loss=0.00434, lr=0.001]
Steps: 7%|▋ | 70/1000 [00:47<09:13, 1.68it/s, loss=0.00434, lr=0.001]
tensor(0.0038, device='cuda:0')
tensor([[0.4121],
[0.4116]], device='cuda:0')
Current Norm : tensor([0.4109, 0.4104], device='cuda:0')
Steps: 7%|▋ | 70/1000 [00:47<09:13, 1.68it/s, loss=0.00211, lr=0.001]
Steps: 7%|▋ | 71/1000 [00:47<09:22, 1.65it/s, loss=0.00211, lr=0.001]
Steps: 7%|▋ | 71/1000 [00:47<09:22, 1.65it/s, loss=0.0829, lr=0.001]
Steps: 7%|▋ | 72/1000 [00:48<09:16, 1.67it/s, loss=0.0829, lr=0.001]
tensor(0.0029, device='cuda:0')
tensor([[0.4119],
[0.4115]], device='cuda:0')
Current Norm : tensor([0.4107, 0.4104], device='cuda:0')
Steps: 7%|▋ | 72/1000 [00:48<09:16, 1.67it/s, loss=0.00421, lr=0.001]
Steps: 7%|▋ | 73/1000 [00:48<09:19, 1.66it/s, loss=0.00421, lr=0.001]
Steps: 7%|▋ | 73/1000 [00:48<09:19, 1.66it/s, loss=0.00171, lr=0.001]
Steps: 7%|▋ | 74/1000 [00:49<09:12, 1.67it/s, loss=0.00171, lr=0.001]
tensor(0.0080, device='cuda:0')
tensor([[0.4116],
[0.4115]], device='cuda:0')
Current Norm : tensor([0.4105, 0.4103], device='cuda:0')
Steps: 7%|▋ | 74/1000 [00:49<09:12, 1.67it/s, loss=0.0162, lr=0.001]
Steps: 8%|▊ | 75/1000 [00:50<09:16, 1.66it/s, loss=0.0162, lr=0.001]
Steps: 8%|▊ | 75/1000 [00:50<09:16, 1.66it/s, loss=0.00691, lr=0.001]
Steps: 8%|▊ | 76/1000 [00:50<09:12, 1.67it/s, loss=0.00691, lr=0.001]
tensor(0.0018, device='cuda:0')
tensor([[0.4114],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4103, 0.4103], device='cuda:0')
Steps: 8%|▊ | 76/1000 [00:50<09:12, 1.67it/s, loss=0.002, lr=0.001]
Steps: 8%|▊ | 77/1000 [00:51<09:20, 1.65it/s, loss=0.002, lr=0.001]
Steps: 8%|▊ | 77/1000 [00:51<09:20, 1.65it/s, loss=0.000539, lr=0.001]
Steps: 8%|▊ | 78/1000 [00:51<09:16, 1.66it/s, loss=0.000539, lr=0.001]
tensor(0.0049, device='cuda:0')
tensor([[0.4112],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4101, 0.4103], device='cuda:0')
Steps: 8%|▊ | 78/1000 [00:51<09:16, 1.66it/s, loss=0.0036, lr=0.001]
Steps: 8%|▊ | 79/1000 [00:52<09:21, 1.64it/s, loss=0.0036, lr=0.001]
Steps: 8%|▊ | 79/1000 [00:52<09:21, 1.64it/s, loss=0.00652, lr=0.001]
Steps: 8%|▊ | 80/1000 [00:53<09:16, 1.65it/s, loss=0.00652, lr=0.001]
tensor(0.0018, device='cuda:0')
tensor([[0.4111],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4100, 0.4103], device='cuda:0')
Steps: 8%|▊ | 80/1000 [00:53<09:16, 1.65it/s, loss=0.00137, lr=0.001]
Steps: 8%|▊ | 81/1000 [00:53<09:18, 1.64it/s, loss=0.00137, lr=0.001]
Steps: 8%|▊ | 81/1000 [00:53<09:18, 1.64it/s, loss=0.00362, lr=0.001]
Steps: 8%|▊ | 82/1000 [00:54<09:12, 1.66it/s, loss=0.00362, lr=0.001]
tensor(0.0051, device='cuda:0')
tensor([[0.4109],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4098, 0.4103], device='cuda:0')
Steps: 8%|▊ | 82/1000 [00:54<09:12, 1.66it/s, loss=0.0197, lr=0.001]
Steps: 8%|▊ | 83/1000 [00:54<09:16, 1.65it/s, loss=0.0197, lr=0.001]
Steps: 8%|▊ | 83/1000 [00:54<09:16, 1.65it/s, loss=0.00979, lr=0.001]
Steps: 8%|▊ | 84/1000 [00:55<09:10, 1.66it/s, loss=0.00979, lr=0.001]
tensor(0.0053, device='cuda:0')
tensor([[0.4107],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4096, 0.4103], device='cuda:0')
Steps: 8%|▊ | 84/1000 [00:55<09:10, 1.66it/s, loss=0.0264, lr=0.001]
Steps: 8%|▊ | 85/1000 [00:56<09:14, 1.65it/s, loss=0.0264, lr=0.001]
Steps: 8%|▊ | 85/1000 [00:56<09:14, 1.65it/s, loss=0.000609, lr=0.001]
Steps: 9%|▊ | 86/1000 [00:56<09:11, 1.66it/s, loss=0.000609, lr=0.001]
tensor(0.0027, device='cuda:0')
tensor([[0.4106],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4095, 0.4103], device='cuda:0')
Steps: 9%|▊ | 86/1000 [00:56<09:11, 1.66it/s, loss=0.00065, lr=0.001]
Steps: 9%|▊ | 87/1000 [00:57<09:15, 1.64it/s, loss=0.00065, lr=0.001]
Steps: 9%|▊ | 87/1000 [00:57<09:15, 1.64it/s, loss=0.0177, lr=0.001]
Steps: 9%|▉ | 88/1000 [00:57<09:08, 1.66it/s, loss=0.0177, lr=0.001]
tensor(0.0062, device='cuda:0')
tensor([[0.4104],
[0.4115]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4103], device='cuda:0')
Steps: 9%|▉ | 88/1000 [00:57<09:08, 1.66it/s, loss=0.0455, lr=0.001]
Steps: 9%|▉ | 89/1000 [00:58<09:12, 1.65it/s, loss=0.0455, lr=0.001]
Steps: 9%|▉ | 89/1000 [00:58<09:12, 1.65it/s, loss=0.0164, lr=0.001]
Steps: 9%|▉ | 90/1000 [00:59<09:07, 1.66it/s, loss=0.0164, lr=0.001]
tensor(0.0051, device='cuda:0')
tensor([[0.4103],
[0.4115]], device='cuda:0')
Current Norm : tensor([0.4093, 0.4104], device='cuda:0')
Steps: 9%|▉ | 90/1000 [00:59<09:07, 1.66it/s, loss=0.00294, lr=0.001]
Steps: 9%|▉ | 91/1000 [00:59<09:13, 1.64it/s, loss=0.00294, lr=0.001]
Steps: 9%|▉ | 91/1000 [00:59<09:13, 1.64it/s, loss=0.000789, lr=0.001]
Steps: 9%|▉ | 92/1000 [01:00<09:06, 1.66it/s, loss=0.000789, lr=0.001]
tensor(0.0028, device='cuda:0')
tensor([[0.4102],
[0.4116]], device='cuda:0')
Current Norm : tensor([0.4092, 0.4104], device='cuda:0')
Steps: 9%|▉ | 92/1000 [01:00<09:06, 1.66it/s, loss=0.000544, lr=0.001]
Steps: 9%|▉ | 93/1000 [01:00<09:08, 1.65it/s, loss=0.000544, lr=0.001]
Steps: 9%|▉ | 93/1000 [01:00<09:08, 1.65it/s, loss=0.000953, lr=0.001]
Steps: 9%|▉ | 94/1000 [01:01<09:02, 1.67it/s, loss=0.000953, lr=0.001]
tensor(0.0029, device='cuda:0')
tensor([[0.4101],
[0.4116]], device='cuda:0')
Current Norm : tensor([0.4091, 0.4105], device='cuda:0')
Steps: 9%|▉ | 94/1000 [01:01<09:02, 1.67it/s, loss=0.000429, lr=0.001]
Steps: 10%|▉ | 95/1000 [01:02<09:06, 1.66it/s, loss=0.000429, lr=0.001]
Steps: 10%|▉ | 95/1000 [01:02<09:06, 1.66it/s, loss=0.00231, lr=0.001]
Steps: 10%|▉ | 96/1000 [01:02<09:04, 1.66it/s, loss=0.00231, lr=0.001]
tensor(0.0029, device='cuda:0')
tensor([[0.4100],
[0.4116]], device='cuda:0')
Current Norm : tensor([0.4090, 0.4105], device='cuda:0')
Steps: 10%|▉ | 96/1000 [01:02<09:04, 1.66it/s, loss=0.00377, lr=0.001]
Steps: 10%|▉ | 97/1000 [01:03<09:11, 1.64it/s, loss=0.00377, lr=0.001]
Steps: 10%|▉ | 97/1000 [01:03<09:11, 1.64it/s, loss=0.000302, lr=0.001]
Steps: 10%|▉ | 98/1000 [01:03<09:03, 1.66it/s, loss=0.000302, lr=0.001]
tensor(0.0045, device='cuda:0')
tensor([[0.4099],
[0.4116]], device='cuda:0')
Current Norm : tensor([0.4089, 0.4105], device='cuda:0')
Steps: 10%|▉ | 98/1000 [01:03<09:03, 1.66it/s, loss=0.0706, lr=0.001]
Steps: 10%|▉ | 99/1000 [01:04<09:07, 1.64it/s, loss=0.0706, lr=0.001]
Steps: 10%|▉ | 99/1000 [01:04<09:07, 1.64it/s, loss=0.00305, lr=0.001]
Steps: 10%|█ | 100/1000 [01:05<09:03, 1.65it/s, loss=0.00305, lr=0.001]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 1.5137e-03, 9.1326e-05, -1.6363e-03, -1.4289e-02], device='cuda:0',
grad_fn=<SliceBackward0>)
Current Learned Embeddings for <s2>:, id 49409 tensor([-0.0204, -0.0029, 0.0139, -0.0003], device='cuda:0',
grad_fn=<SliceBackward0>)
Saving weights to checkpoints/step_inv_100.safetensors
tensor(0.0076, device='cuda:0')
tensor([[0.4098],
[0.4115]], device='cuda:0')
Current Norm : tensor([0.4088, 0.4103], device='cuda:0')
Steps: 10%|█ | 100/1000 [01:05<09:03, 1.65it/s, loss=0.000652, lr=0.001]
Steps: 10%|█ | 101/1000 [01:05<09:10, 1.63it/s, loss=0.000652, lr=0.001]
Steps: 10%|█ | 101/1000 [01:05<09:10, 1.63it/s, loss=0.0218, lr=0.001]
Steps: 10%|█ | 102/1000 [01:06<09:03, 1.65it/s, loss=0.0218, lr=0.001]
tensor(0.0029, device='cuda:0')
tensor([[0.4096],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4087, 0.4103], device='cuda:0')
Steps: 10%|█ | 102/1000 [01:06<09:03, 1.65it/s, loss=0.00278, lr=0.001]
Steps: 10%|█ | 103/1000 [01:07<09:04, 1.65it/s, loss=0.00278, lr=0.001]
Steps: 10%|█ | 103/1000 [01:07<09:04, 1.65it/s, loss=0.00829, lr=0.001]
Steps: 10%|█ | 104/1000 [01:07<08:59, 1.66it/s, loss=0.00829, lr=0.001]
tensor(0.0038, device='cuda:0')
tensor([[0.4096],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4103], device='cuda:0')
Steps: 10%|█ | 104/1000 [01:07<08:59, 1.66it/s, loss=0.0058, lr=0.001]
Steps: 10%|█ | 105/1000 [01:08<09:04, 1.64it/s, loss=0.0058, lr=0.001]
Steps: 10%|█ | 105/1000 [01:08<09:04, 1.64it/s, loss=0.000726, lr=0.001]
Steps: 11%|█ | 106/1000 [01:08<08:59, 1.66it/s, loss=0.000726, lr=0.001]
tensor(0.0030, device='cuda:0')
tensor([[0.4095],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4102], device='cuda:0')
Steps: 11%|█ | 106/1000 [01:08<08:59, 1.66it/s, loss=0.00454, lr=0.001]
Steps: 11%|█ | 107/1000 [01:09<09:04, 1.64it/s, loss=0.00454, lr=0.001]
Steps: 11%|█ | 107/1000 [01:09<09:04, 1.64it/s, loss=0.00176, lr=0.001]
Steps: 11%|█ | 108/1000 [01:10<08:57, 1.66it/s, loss=0.00176, lr=0.001]
tensor(0.0056, device='cuda:0')
tensor([[0.4094],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4103], device='cuda:0')
Steps: 11%|█ | 108/1000 [01:10<08:57, 1.66it/s, loss=0.0143, lr=0.001]
Steps: 11%|█ | 109/1000 [01:10<09:00, 1.65it/s, loss=0.0143, lr=0.001]
Steps: 11%|█ | 109/1000 [01:10<09:00, 1.65it/s, loss=0.00456, lr=0.001]
Steps: 11%|█ | 110/1000 [01:11<08:56, 1.66it/s, loss=0.00456, lr=0.001]
tensor(0.0045, device='cuda:0')
tensor([[0.4093],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4103], device='cuda:0')
Steps: 11%|█ | 110/1000 [01:11<08:56, 1.66it/s, loss=0.0555, lr=0.001]
Steps: 11%|█ | 111/1000 [01:11<09:01, 1.64it/s, loss=0.0555, lr=0.001]
Steps: 11%|█ | 111/1000 [01:11<09:01, 1.64it/s, loss=0.00316, lr=0.001]
Steps: 11%|█ | 112/1000 [01:12<08:55, 1.66it/s, loss=0.00316, lr=0.001]
tensor(0.0060, device='cuda:0')
tensor([[0.4092],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4103], device='cuda:0')
Steps: 11%|█ | 112/1000 [01:12<08:55, 1.66it/s, loss=0.00851, lr=0.001]
Steps: 11%|█▏ | 113/1000 [01:13<08:57, 1.65it/s, loss=0.00851, lr=0.001]
Steps: 11%|█▏ | 113/1000 [01:13<08:57, 1.65it/s, loss=0.0277, lr=0.001]
Steps: 11%|█▏ | 114/1000 [01:13<08:55, 1.65it/s, loss=0.0277, lr=0.001]
tensor(0.0053, device='cuda:0')
tensor([[0.4091],
[0.4116]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4104], device='cuda:0')
Steps: 11%|█▏ | 114/1000 [01:13<08:55, 1.65it/s, loss=0.00622, lr=0.001]
Steps: 12%|█▏ | 115/1000 [01:14<09:01, 1.63it/s, loss=0.00622, lr=0.001]
Steps: 12%|█▏ | 115/1000 [01:14<09:01, 1.63it/s, loss=0.0382, lr=0.001]
Steps: 12%|█▏ | 116/1000 [01:14<08:56, 1.65it/s, loss=0.0382, lr=0.001]
tensor(0.0084, device='cuda:0')
tensor([[0.4090],
[0.4116]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4105], device='cuda:0')
Steps: 12%|█▏ | 116/1000 [01:14<08:56, 1.65it/s, loss=0.00442, lr=0.001]
Steps: 12%|█▏ | 117/1000 [01:15<08:58, 1.64it/s, loss=0.00442, lr=0.001]
Steps: 12%|█▏ | 117/1000 [01:15<08:58, 1.64it/s, loss=0.0118, lr=0.001]
Steps: 12%|█▏ | 118/1000 [01:16<08:53, 1.65it/s, loss=0.0118, lr=0.001]
tensor(0.0080, device='cuda:0')
tensor([[0.4090],
[0.4118]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4106], device='cuda:0')
Steps: 12%|█▏ | 118/1000 [01:16<08:53, 1.65it/s, loss=0.0209, lr=0.001]
Steps: 12%|█▏ | 119/1000 [01:16<08:57, 1.64it/s, loss=0.0209, lr=0.001]
Steps: 12%|█▏ | 119/1000 [01:16<08:57, 1.64it/s, loss=0.0189, lr=0.001]
Steps: 12%|█▏ | 120/1000 [01:17<08:55, 1.64it/s, loss=0.0189, lr=0.001]
tensor(0.0190, device='cuda:0')
tensor([[0.4091],
[0.4119]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4107], device='cuda:0')
Steps: 12%|█▏ | 120/1000 [01:17<08:55, 1.64it/s, loss=0.0302, lr=0.001]
Steps: 12%|█▏ | 121/1000 [01:17<09:00, 1.62it/s, loss=0.0302, lr=0.001]
Steps: 12%|█▏ | 121/1000 [01:17<09:00, 1.62it/s, loss=0.0261, lr=0.001]
Steps: 12%|█▏ | 122/1000 [01:18<08:52, 1.65it/s, loss=0.0261, lr=0.001]
tensor(0.0031, device='cuda:0')
tensor([[0.4093],
[0.4122]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4110], device='cuda:0')
Steps: 12%|█▏ | 122/1000 [01:18<08:52, 1.65it/s, loss=0.00258, lr=0.001]
Steps: 12%|█▏ | 123/1000 [01:19<08:55, 1.64it/s, loss=0.00258, lr=0.001]
Steps: 12%|█▏ | 123/1000 [01:19<08:55, 1.64it/s, loss=0.000634, lr=0.001]
Steps: 12%|█▏ | 124/1000 [01:19<08:49, 1.65it/s, loss=0.000634, lr=0.001]
tensor(0.0019, device='cuda:0')
tensor([[0.4096],
[0.4125]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4113], device='cuda:0')
Steps: 12%|█▏ | 124/1000 [01:19<08:49, 1.65it/s, loss=0.00197, lr=0.001]
Steps: 12%|█▎ | 125/1000 [01:20<08:51, 1.65it/s, loss=0.00197, lr=0.001]
Steps: 12%|█▎ | 125/1000 [01:20<08:51, 1.65it/s, loss=0.0138, lr=0.001]
Steps: 13%|█▎ | 126/1000 [01:20<08:44, 1.67it/s, loss=0.0138, lr=0.001]
tensor(0.0063, device='cuda:0')
tensor([[0.4098],
[0.4128]], device='cuda:0')
Current Norm : tensor([0.4088, 0.4116], device='cuda:0')
Steps: 13%|█▎ | 126/1000 [01:20<08:44, 1.67it/s, loss=0.0196, lr=0.001]
Steps: 13%|█▎ | 127/1000 [01:21<08:48, 1.65it/s, loss=0.0196, lr=0.001]
Steps: 13%|█▎ | 127/1000 [01:21<08:48, 1.65it/s, loss=0.000179, lr=0.001]
Steps: 13%|█▎ | 128/1000 [01:22<08:45, 1.66it/s, loss=0.000179, lr=0.001]
tensor(0.0081, device='cuda:0')
tensor([[0.4100],
[0.4132]], device='cuda:0')
Current Norm : tensor([0.4090, 0.4119], device='cuda:0')
Steps: 13%|█▎ | 128/1000 [01:22<08:45, 1.66it/s, loss=0.0143, lr=0.001]
Steps: 13%|█▎ | 129/1000 [01:22<08:47, 1.65it/s, loss=0.0143, lr=0.001]
Steps: 13%|█▎ | 129/1000 [01:22<08:47, 1.65it/s, loss=0.0112, lr=0.001]
Steps: 13%|█▎ | 130/1000 [01:23<08:45, 1.66it/s, loss=0.0112, lr=0.001]
tensor(0.0077, device='cuda:0')
tensor([[0.4101],
[0.4136]], device='cuda:0')
Current Norm : tensor([0.4091, 0.4122], device='cuda:0')
Steps: 13%|█▎ | 130/1000 [01:23<08:45, 1.66it/s, loss=0.029, lr=0.001]
Steps: 13%|█▎ | 131/1000 [01:24<08:52, 1.63it/s, loss=0.029, lr=0.001]
Steps: 13%|█▎ | 131/1000 [01:24<08:52, 1.63it/s, loss=0.00343, lr=0.001]
Steps: 13%|█▎ | 132/1000 [01:24<08:49, 1.64it/s, loss=0.00343, lr=0.001]
tensor(0.0031, device='cuda:0')
tensor([[0.4103],
[0.4139]], device='cuda:0')
Current Norm : tensor([0.4092, 0.4125], device='cuda:0')
Steps: 13%|█▎ | 132/1000 [01:24<08:49, 1.64it/s, loss=0.00165, lr=0.001]
Steps: 13%|█▎ | 133/1000 [01:25<08:52, 1.63it/s, loss=0.00165, lr=0.001]
Steps: 13%|█▎ | 133/1000 [01:25<08:52, 1.63it/s, loss=0.00451, lr=0.001]
Steps: 13%|█▎ | 134/1000 [01:25<08:45, 1.65it/s, loss=0.00451, lr=0.001]
tensor(0.0026, device='cuda:0')
tensor([[0.4104],
[0.4142]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4128], device='cuda:0')
Steps: 13%|█▎ | 134/1000 [01:25<08:45, 1.65it/s, loss=0.00724, lr=0.001]
Steps: 14%|█▎ | 135/1000 [01:26<08:51, 1.63it/s, loss=0.00724, lr=0.001]
Steps: 14%|█▎ | 135/1000 [01:26<08:51, 1.63it/s, loss=0.000926, lr=0.001]
Steps: 14%|█▎ | 136/1000 [01:27<08:47, 1.64it/s, loss=0.000926, lr=0.001]
tensor(0.0051, device='cuda:0')
tensor([[0.4104],
[0.4145]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4130], device='cuda:0')
Steps: 14%|█▎ | 136/1000 [01:27<08:47, 1.64it/s, loss=0.000544, lr=0.001]
Steps: 14%|█▎ | 137/1000 [01:27<08:49, 1.63it/s, loss=0.000544, lr=0.001]
Steps: 14%|█▎ | 137/1000 [01:27<08:49, 1.63it/s, loss=0.0385, lr=0.001]
Steps: 14%|█▍ | 138/1000 [01:28<08:44, 1.64it/s, loss=0.0385, lr=0.001]
tensor(0.0040, device='cuda:0')
tensor([[0.4104],
[0.4147]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4132], device='cuda:0')
Steps: 14%|█▍ | 138/1000 [01:28<08:44, 1.64it/s, loss=0.0509, lr=0.001]
Steps: 14%|█▍ | 139/1000 [01:28<08:45, 1.64it/s, loss=0.0509, lr=0.001]
Steps: 14%|█▍ | 139/1000 [01:28<08:45, 1.64it/s, loss=0.000223, lr=0.001]
Steps: 14%|█▍ | 140/1000 [01:29<08:39, 1.66it/s, loss=0.000223, lr=0.001]
tensor(0.0055, device='cuda:0')
tensor([[0.4104],
[0.4149]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4134], device='cuda:0')
Steps: 14%|█▍ | 140/1000 [01:29<08:39, 1.66it/s, loss=0.113, lr=0.001]
Steps: 14%|█▍ | 141/1000 [01:30<08:44, 1.64it/s, loss=0.113, lr=0.001]
Steps: 14%|█▍ | 141/1000 [01:30<08:44, 1.64it/s, loss=0.0127, lr=0.001]
Steps: 14%|█▍ | 142/1000 [01:30<08:38, 1.65it/s, loss=0.0127, lr=0.001]
tensor(0.0056, device='cuda:0')
tensor([[0.4105],
[0.4151]], device='cuda:0')
Current Norm : tensor([0.4095, 0.4135], device='cuda:0')
Steps: 14%|█▍ | 142/1000 [01:30<08:38, 1.65it/s, loss=0.0342, lr=0.001]
Steps: 14%|█▍ | 143/1000 [01:31<08:40, 1.65it/s, loss=0.0342, lr=0.001]
Steps: 14%|█▍ | 143/1000 [01:31<08:40, 1.65it/s, loss=0.00191, lr=0.001]
Steps: 14%|█▍ | 144/1000 [01:31<08:36, 1.66it/s, loss=0.00191, lr=0.001]
tensor(0.0049, device='cuda:0')
tensor([[0.4106],
[0.4152]], device='cuda:0')
Current Norm : tensor([0.4095, 0.4137], device='cuda:0')
Steps: 14%|█▍ | 144/1000 [01:31<08:36, 1.66it/s, loss=0.0527, lr=0.001]
Steps: 14%|█▍ | 145/1000 [01:32<08:39, 1.65it/s, loss=0.0527, lr=0.001]
Steps: 14%|█▍ | 145/1000 [01:32<08:39, 1.65it/s, loss=0.000642, lr=0.001]
Steps: 15%|█▍ | 146/1000 [01:33<08:34, 1.66it/s, loss=0.000642, lr=0.001]
tensor(0.0051, device='cuda:0')
tensor([[0.4107],
[0.4152]], device='cuda:0')
Current Norm : tensor([0.4096, 0.4137], device='cuda:0')
Steps: 15%|█▍ | 146/1000 [01:33<08:34, 1.66it/s, loss=0.000191, lr=0.001]
Steps: 15%|█▍ | 147/1000 [01:33<08:39, 1.64it/s, loss=0.000191, lr=0.001]
Steps: 15%|█▍ | 147/1000 [01:33<08:39, 1.64it/s, loss=0.0652, lr=0.001]
Steps: 15%|█▍ | 148/1000 [01:34<08:36, 1.65it/s, loss=0.0652, lr=0.001]
tensor(0.0063, device='cuda:0')
tensor([[0.4107],
[0.4152]], device='cuda:0')
Current Norm : tensor([0.4097, 0.4137], device='cuda:0')
Steps: 15%|█▍ | 148/1000 [01:34<08:36, 1.65it/s, loss=0.00285, lr=0.001]
Steps: 15%|█▍ | 149/1000 [01:34<08:39, 1.64it/s, loss=0.00285, lr=0.001]
Steps: 15%|█▍ | 149/1000 [01:34<08:39, 1.64it/s, loss=0.0492, lr=0.001]
Steps: 15%|█▌ | 150/1000 [01:35<08:34, 1.65it/s, loss=0.0492, lr=0.001]
tensor(0.0034, device='cuda:0')
tensor([[0.4107],
[0.4152]], device='cuda:0')
Current Norm : tensor([0.4097, 0.4137], device='cuda:0')
Steps: 15%|█▌ | 150/1000 [01:35<08:34, 1.65it/s, loss=0.00289, lr=0.001]
Steps: 15%|█▌ | 151/1000 [01:36<08:41, 1.63it/s, loss=0.00289, lr=0.001]
Steps: 15%|█▌ | 151/1000 [01:36<08:41, 1.63it/s, loss=0.0184, lr=0.001]
Steps: 15%|█▌ | 152/1000 [01:36<08:34, 1.65it/s, loss=0.0184, lr=0.001]
tensor(0.0046, device='cuda:0')
tensor([[0.4107],
[0.4152]], device='cuda:0')
Current Norm : tensor([0.4097, 0.4137], device='cuda:0')
Steps: 15%|█▌ | 152/1000 [01:36<08:34, 1.65it/s, loss=0.00912, lr=0.001]
Steps: 15%|█▌ | 153/1000 [01:37<08:39, 1.63it/s, loss=0.00912, lr=0.001]
Steps: 15%|█▌ | 153/1000 [01:37<08:39, 1.63it/s, loss=0.0912, lr=0.001]
Steps: 15%|█▌ | 154/1000 [01:38<08:34, 1.65it/s, loss=0.0912, lr=0.001]
tensor(0.0027, device='cuda:0')
tensor([[0.4107],
[0.4152]], device='cuda:0')
Current Norm : tensor([0.4096, 0.4136], device='cuda:0')
Steps: 15%|█▌ | 154/1000 [01:38<08:34, 1.65it/s, loss=0.00224, lr=0.001]
Steps: 16%|█▌ | 155/1000 [01:38<08:38, 1.63it/s, loss=0.00224, lr=0.001]
Steps: 16%|█▌ | 155/1000 [01:38<08:38, 1.63it/s, loss=0.0296, lr=0.001]
Steps: 16%|█▌ | 156/1000 [01:39<08:34, 1.64it/s, loss=0.0296, lr=0.001]
tensor(0.0071, device='cuda:0')
tensor([[0.4107],
[0.4151]], device='cuda:0')
Current Norm : tensor([0.4096, 0.4136], device='cuda:0')
Steps: 16%|█▌ | 156/1000 [01:39<08:34, 1.64it/s, loss=0.0158, lr=0.001]
Steps: 16%|█▌ | 157/1000 [01:39<08:37, 1.63it/s, loss=0.0158, lr=0.001]
Steps: 16%|█▌ | 157/1000 [01:39<08:37, 1.63it/s, loss=0.0173, lr=0.001]
Steps: 16%|█▌ | 158/1000 [01:40<08:32, 1.64it/s, loss=0.0173, lr=0.001]
tensor(0.0029, device='cuda:0')
tensor([[0.4106],
[0.4151]], device='cuda:0')
Current Norm : tensor([0.4095, 0.4136], device='cuda:0')
Steps: 16%|█▌ | 158/1000 [01:40<08:32, 1.64it/s, loss=0.00296, lr=0.001]
Steps: 16%|█▌ | 159/1000 [01:41<08:37, 1.63it/s, loss=0.00296, lr=0.001]
Steps: 16%|█▌ | 159/1000 [01:41<08:37, 1.63it/s, loss=0.000782, lr=0.001]
Steps: 16%|█▌ | 160/1000 [01:41<08:35, 1.63it/s, loss=0.000782, lr=0.001]
tensor(0.0037, device='cuda:0')
tensor([[0.4105],
[0.4150]], device='cuda:0')
Current Norm : tensor([0.4095, 0.4135], device='cuda:0')
Steps: 16%|█▌ | 160/1000 [01:41<08:35, 1.63it/s, loss=0.0113, lr=0.001]
Steps: 16%|█▌ | 161/1000 [01:42<08:36, 1.62it/s, loss=0.0113, lr=0.001]
Steps: 16%|█▌ | 161/1000 [01:42<08:36, 1.62it/s, loss=0.019, lr=0.001]
Steps: 16%|█▌ | 162/1000 [01:42<08:31, 1.64it/s, loss=0.019, lr=0.001]
tensor(0.0028, device='cuda:0')
tensor([[0.4104],
[0.4149]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4134], device='cuda:0')
Steps: 16%|█▌ | 162/1000 [01:42<08:31, 1.64it/s, loss=0.00382, lr=0.001]
Steps: 16%|█▋ | 163/1000 [01:43<08:33, 1.63it/s, loss=0.00382, lr=0.001]
Steps: 16%|█▋ | 163/1000 [01:43<08:33, 1.63it/s, loss=0.00187, lr=0.001]
Steps: 16%|█▋ | 164/1000 [01:44<08:30, 1.64it/s, loss=0.00187, lr=0.001]
tensor(0.0037, device='cuda:0')
tensor([[0.4103],
[0.4148]], device='cuda:0')
Current Norm : tensor([0.4093, 0.4133], device='cuda:0')
Steps: 16%|█▋ | 164/1000 [01:44<08:30, 1.64it/s, loss=0.00628, lr=0.001]
Steps: 16%|█▋ | 165/1000 [01:44<08:35, 1.62it/s, loss=0.00628, lr=0.001]
Steps: 16%|█▋ | 165/1000 [01:44<08:35, 1.62it/s, loss=0.00432, lr=0.001]
Steps: 17%|█▋ | 166/1000 [01:45<08:32, 1.63it/s, loss=0.00432, lr=0.001]
tensor(0.0070, device='cuda:0')
tensor([[0.4101],
[0.4148]], device='cuda:0')
Current Norm : tensor([0.4091, 0.4133], device='cuda:0')
Steps: 17%|█▋ | 166/1000 [01:45<08:32, 1.63it/s, loss=0.00689, lr=0.001]
Steps: 17%|█▋ | 167/1000 [01:46<08:36, 1.61it/s, loss=0.00689, lr=0.001]
Steps: 17%|█▋ | 167/1000 [01:46<08:36, 1.61it/s, loss=0.106, lr=0.001]
Steps: 17%|█▋ | 168/1000 [01:46<08:31, 1.63it/s, loss=0.106, lr=0.001]
tensor(0.0023, device='cuda:0')
tensor([[0.4099],
[0.4148]], device='cuda:0')
Current Norm : tensor([0.4089, 0.4133], device='cuda:0')
Steps: 17%|█▋ | 168/1000 [01:46<08:31, 1.63it/s, loss=0.000728, lr=0.001]
Steps: 17%|█▋ | 169/1000 [01:47<08:34, 1.62it/s, loss=0.000728, lr=0.001]
Steps: 17%|█▋ | 169/1000 [01:47<08:34, 1.62it/s, loss=0.012, lr=0.001]
Steps: 17%|█▋ | 170/1000 [01:47<08:29, 1.63it/s, loss=0.012, lr=0.001]
tensor(0.0055, device='cuda:0')
tensor([[0.4096],
[0.4147]], device='cuda:0')
Current Norm : tensor([0.4087, 0.4132], device='cuda:0')
Steps: 17%|█▋ | 170/1000 [01:47<08:29, 1.63it/s, loss=0.0805, lr=0.001]
Steps: 17%|█▋ | 171/1000 [01:48<08:33, 1.62it/s, loss=0.0805, lr=0.001]
Steps: 17%|█▋ | 171/1000 [01:48<08:33, 1.62it/s, loss=0.000757, lr=0.001]
Steps: 17%|█▋ | 172/1000 [01:49<08:28, 1.63it/s, loss=0.000757, lr=0.001]
tensor(0.0020, device='cuda:0')
tensor([[0.4094],
[0.4146]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4132], device='cuda:0')
Steps: 17%|█▋ | 172/1000 [01:49<08:28, 1.63it/s, loss=0.000506, lr=0.001]
Steps: 17%|█▋ | 173/1000 [01:49<08:32, 1.61it/s, loss=0.000506, lr=0.001]
Steps: 17%|█▋ | 173/1000 [01:49<08:32, 1.61it/s, loss=0.00117, lr=0.001]
Steps: 17%|█▋ | 174/1000 [01:50<08:28, 1.62it/s, loss=0.00117, lr=0.001]
tensor(0.0046, device='cuda:0')
tensor([[0.4093],
[0.4145]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4130], device='cuda:0')
Steps: 17%|█▋ | 174/1000 [01:50<08:28, 1.62it/s, loss=0.0104, lr=0.001]
Steps: 18%|█▊ | 175/1000 [01:50<08:30, 1.62it/s, loss=0.0104, lr=0.001]
Steps: 18%|█▊ | 175/1000 [01:50<08:30, 1.62it/s, loss=0.00425, lr=0.001]
Steps: 18%|█▊ | 176/1000 [01:51<08:27, 1.62it/s, loss=0.00425, lr=0.001]
tensor(0.0047, device='cuda:0')
tensor([[0.4091],
[0.4143]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4128], device='cuda:0')
Steps: 18%|█▊ | 176/1000 [01:51<08:27, 1.62it/s, loss=0.067, lr=0.001]
Steps: 18%|█▊ | 177/1000 [01:52<08:30, 1.61it/s, loss=0.067, lr=0.001]
Steps: 18%|█▊ | 177/1000 [01:52<08:30, 1.61it/s, loss=0.00193, lr=0.001]
Steps: 18%|█▊ | 178/1000 [01:52<08:25, 1.63it/s, loss=0.00193, lr=0.001]
tensor(0.0126, device='cuda:0')
tensor([[0.4089],
[0.4140]], device='cuda:0')
Current Norm : tensor([0.4080, 0.4126], device='cuda:0')
Steps: 18%|█▊ | 178/1000 [01:52<08:25, 1.63it/s, loss=0.0166, lr=0.001]
Steps: 18%|█▊ | 179/1000 [01:53<08:29, 1.61it/s, loss=0.0166, lr=0.001]
Steps: 18%|█▊ | 179/1000 [01:53<08:29, 1.61it/s, loss=0.0151, lr=0.001]
Steps: 18%|█▊ | 180/1000 [01:54<08:24, 1.62it/s, loss=0.0151, lr=0.001]
tensor(0.0058, device='cuda:0')
tensor([[0.4087],
[0.4137]], device='cuda:0')
Current Norm : tensor([0.4079, 0.4123], device='cuda:0')
Steps: 18%|█▊ | 180/1000 [01:54<08:24, 1.62it/s, loss=0.00435, lr=0.001]
Steps: 18%|█▊ | 181/1000 [01:54<08:30, 1.60it/s, loss=0.00435, lr=0.001]
Steps: 18%|█▊ | 181/1000 [01:54<08:30, 1.60it/s, loss=0.0246, lr=0.001]
Steps: 18%|█▊ | 182/1000 [01:55<08:27, 1.61it/s, loss=0.0246, lr=0.001]
tensor(0.0086, device='cuda:0')
tensor([[0.4087],
[0.4135]], device='cuda:0')
Current Norm : tensor([0.4078, 0.4122], device='cuda:0')
Steps: 18%|█▊ | 182/1000 [01:55<08:27, 1.61it/s, loss=0.00682, lr=0.001]
Steps: 18%|█▊ | 183/1000 [01:55<08:30, 1.60it/s, loss=0.00682, lr=0.001]
Steps: 18%|█▊ | 183/1000 [01:55<08:30, 1.60it/s, loss=0.0226, lr=0.001]
Steps: 18%|█▊ | 184/1000 [01:56<08:25, 1.61it/s, loss=0.0226, lr=0.001]
tensor(0.0073, device='cuda:0')
tensor([[0.4088],
[0.4135]], device='cuda:0')
Current Norm : tensor([0.4079, 0.4122], device='cuda:0')
Steps: 18%|█▊ | 184/1000 [01:56<08:25, 1.61it/s, loss=0.0133, lr=0.001]
Steps: 18%|█▊ | 185/1000 [01:57<08:29, 1.60it/s, loss=0.0133, lr=0.001]
Steps: 18%|█▊ | 185/1000 [01:57<08:29, 1.60it/s, loss=0.0136, lr=0.001]
Steps: 19%|█▊ | 186/1000 [01:57<08:23, 1.62it/s, loss=0.0136, lr=0.001]
tensor(0.0037, device='cuda:0')
tensor([[0.4089],
[0.4135]], device='cuda:0')
Current Norm : tensor([0.4080, 0.4121], device='cuda:0')
Steps: 19%|█▊ | 186/1000 [01:57<08:23, 1.62it/s, loss=0.000371, lr=0.001]
Steps: 19%|█▊ | 187/1000 [01:58<08:30, 1.59it/s, loss=0.000371, lr=0.001]
Steps: 19%|█▊ | 187/1000 [01:58<08:30, 1.59it/s, loss=0.0068, lr=0.001]
Steps: 19%|█▉ | 188/1000 [01:59<08:26, 1.60it/s, loss=0.0068, lr=0.001]
tensor(0.0074, device='cuda:0')
tensor([[0.4091],
[0.4135]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4121], device='cuda:0')
Steps: 19%|█▉ | 188/1000 [01:59<08:26, 1.60it/s, loss=0.0109, lr=0.001]
Steps: 19%|█▉ | 189/1000 [01:59<08:30, 1.59it/s, loss=0.0109, lr=0.001]
Steps: 19%|█▉ | 189/1000 [01:59<08:30, 1.59it/s, loss=0.028, lr=0.001]
Steps: 19%|█▉ | 190/1000 [02:00<08:28, 1.59it/s, loss=0.028, lr=0.001]
tensor(0.0039, device='cuda:0')
tensor([[0.4093],
[0.4135]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4121], device='cuda:0')
Steps: 19%|█▉ | 190/1000 [02:00<08:28, 1.59it/s, loss=0.000189, lr=0.001]
Steps: 19%|█▉ | 191/1000 [02:00<08:30, 1.58it/s, loss=0.000189, lr=0.001]
Steps: 19%|█▉ | 191/1000 [02:00<08:30, 1.58it/s, loss=0.00566, lr=0.001]
Steps: 19%|█▉ | 192/1000 [02:01<08:25, 1.60it/s, loss=0.00566, lr=0.001]
tensor(0.0046, device='cuda:0')
tensor([[0.4094],
[0.4134]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4121], device='cuda:0')
Steps: 19%|█▉ | 192/1000 [02:01<08:25, 1.60it/s, loss=0.0471, lr=0.001]
Steps: 19%|█▉ | 193/1000 [02:02<08:29, 1.58it/s, loss=0.0471, lr=0.001]
Steps: 19%|█▉ | 193/1000 [02:02<08:29, 1.58it/s, loss=0.00171, lr=0.001]
Steps: 19%|█▉ | 194/1000 [02:02<08:24, 1.60it/s, loss=0.00171, lr=0.001]
tensor(0.0088, device='cuda:0')
tensor([[0.4094],
[0.4133]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4120], device='cuda:0')
Steps: 19%|█▉ | 194/1000 [02:02<08:24, 1.60it/s, loss=0.0653, lr=0.001]
Steps: 20%|█▉ | 195/1000 [02:03<08:28, 1.58it/s, loss=0.0653, lr=0.001]
Steps: 20%|█▉ | 195/1000 [02:03<08:28, 1.58it/s, loss=0.111, lr=0.001]
Steps: 20%|█▉ | 196/1000 [02:04<08:24, 1.59it/s, loss=0.111, lr=0.001]
tensor(0.0020, device='cuda:0')
tensor([[0.4094],
[0.4133]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4119], device='cuda:0')
Steps: 20%|█▉ | 196/1000 [02:04<08:24, 1.59it/s, loss=0.000168, lr=0.001]
Steps: 20%|█▉ | 197/1000 [02:04<08:27, 1.58it/s, loss=0.000168, lr=0.001]
Steps: 20%|█▉ | 197/1000 [02:04<08:27, 1.58it/s, loss=0.215, lr=0.001]
Steps: 20%|█▉ | 198/1000 [02:05<08:24, 1.59it/s, loss=0.215, lr=0.001]
tensor(0.0077, device='cuda:0')
tensor([[0.4094],
[0.4132]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4119], device='cuda:0')
Steps: 20%|█▉ | 198/1000 [02:05<08:24, 1.59it/s, loss=0.0573, lr=0.001]
Steps: 20%|█▉ | 199/1000 [02:06<08:29, 1.57it/s, loss=0.0573, lr=0.001]
Steps: 20%|█▉ | 199/1000 [02:06<08:29, 1.57it/s, loss=0.0316, lr=0.001]
Steps: 20%|██ | 200/1000 [02:06<08:25, 1.58it/s, loss=0.0316, lr=0.001]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0056, -0.0009, -0.0087, -0.0236], device='cuda:0',
grad_fn=<SliceBackward0>)
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0028, -0.0070, 0.0065, 0.0088], device='cuda:0',
grad_fn=<SliceBackward0>)
Saving weights to checkpoints/step_inv_200.safetensors
tensor(0.0011, device='cuda:0')
tensor([[0.4093],
[0.4131]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4118], device='cuda:0')
Steps: 20%|██ | 200/1000 [02:06<08:25, 1.58it/s, loss=0.000531, lr=0.001]
Steps: 20%|██ | 201/1000 [02:07<08:29, 1.57it/s, loss=0.000531, lr=0.001]
Steps: 20%|██ | 201/1000 [02:07<08:29, 1.57it/s, loss=0.00104, lr=0.001]
Steps: 20%|██ | 202/1000 [02:07<08:24, 1.58it/s, loss=0.00104, lr=0.001]
tensor(0.0003, device='cuda:0')
tensor([[0.4093],
[0.4130]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4117], device='cuda:0')
Steps: 20%|██ | 202/1000 [02:07<08:24, 1.58it/s, loss=0.000786, lr=0.001]
Steps: 20%|██ | 203/1000 [02:08<08:27, 1.57it/s, loss=0.000786, lr=0.001]
Steps: 20%|██ | 203/1000 [02:08<08:27, 1.57it/s, loss=0.00016, lr=0.001]
Steps: 20%|██ | 204/1000 [02:09<08:21, 1.59it/s, loss=0.00016, lr=0.001]
tensor(0.0079, device='cuda:0')
tensor([[0.4092],
[0.4129]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4116], device='cuda:0')
Steps: 20%|██ | 204/1000 [02:09<08:21, 1.59it/s, loss=0.0152, lr=0.001]
Steps: 20%|██ | 205/1000 [02:09<08:26, 1.57it/s, loss=0.0152, lr=0.001]
Steps: 20%|██ | 205/1000 [02:09<08:26, 1.57it/s, loss=0.0314, lr=0.001]
Steps: 21%|██ | 206/1000 [02:10<08:20, 1.59it/s, loss=0.0314, lr=0.001]
tensor(0.0078, device='cuda:0')
tensor([[0.4092],
[0.4129]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4116], device='cuda:0')
Steps: 21%|██ | 206/1000 [02:10<08:20, 1.59it/s, loss=0.0123, lr=0.001]
Steps: 21%|██ | 207/1000 [02:11<08:25, 1.57it/s, loss=0.0123, lr=0.001]
Steps: 21%|██ | 207/1000 [02:11<08:25, 1.57it/s, loss=0.0274, lr=0.001]
Steps: 21%|██ | 208/1000 [02:11<08:19, 1.59it/s, loss=0.0274, lr=0.001]
tensor(0.0105, device='cuda:0')
tensor([[0.4093],
[0.4130]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4117], device='cuda:0')
Steps: 21%|██ | 208/1000 [02:11<08:19, 1.59it/s, loss=0.0139, lr=0.001]
Steps: 21%|██ | 209/1000 [02:12<08:21, 1.58it/s, loss=0.0139, lr=0.001]
Steps: 21%|██ | 209/1000 [02:12<08:21, 1.58it/s, loss=0.00396, lr=0.001]
Steps: 21%|██ | 210/1000 [02:12<08:18, 1.59it/s, loss=0.00396, lr=0.001]
tensor(0.0030, device='cuda:0')
tensor([[0.4094],
[0.4130]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4117], device='cuda:0')
Steps: 21%|██ | 210/1000 [02:12<08:18, 1.59it/s, loss=0.00166, lr=0.001]
Steps: 21%|██ | 211/1000 [02:13<08:21, 1.57it/s, loss=0.00166, lr=0.001]
Steps: 21%|██ | 211/1000 [02:13<08:21, 1.57it/s, loss=0.0014, lr=0.001]
Steps: 21%|██ | 212/1000 [02:14<08:17, 1.58it/s, loss=0.0014, lr=0.001]
tensor(0.0082, device='cuda:0')
tensor([[0.4096],
[0.4131]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4118], device='cuda:0')
Steps: 21%|██ | 212/1000 [02:14<08:17, 1.58it/s, loss=0.00378, lr=0.001]
Steps: 21%|██▏ | 213/1000 [02:14<08:28, 1.55it/s, loss=0.00378, lr=0.001]
Steps: 21%|██▏ | 213/1000 [02:14<08:28, 1.55it/s, loss=0.0232, lr=0.001]
Steps: 21%|██▏ | 214/1000 [02:15<08:24, 1.56it/s, loss=0.0232, lr=0.001]
tensor(0.0009, device='cuda:0')
tensor([[0.4098],
[0.4131]], device='cuda:0')
Current Norm : tensor([0.4088, 0.4118], device='cuda:0')
Steps: 21%|██▏ | 214/1000 [02:15<08:24, 1.56it/s, loss=0.0017, lr=0.001]
Steps: 22%|██▏ | 215/1000 [02:16<08:23, 1.56it/s, loss=0.0017, lr=0.001]
Steps: 22%|██▏ | 215/1000 [02:16<08:23, 1.56it/s, loss=0.000727, lr=0.001]
Steps: 22%|██▏ | 216/1000 [02:16<08:19, 1.57it/s, loss=0.000727, lr=0.001]
tensor(0.0046, device='cuda:0')
tensor([[0.4099],
[0.4130]], device='cuda:0')
Current Norm : tensor([0.4089, 0.4117], device='cuda:0')
Steps: 22%|██▏ | 216/1000 [02:16<08:19, 1.57it/s, loss=0.00184, lr=0.001]
Steps: 22%|██▏ | 217/1000 [02:17<08:22, 1.56it/s, loss=0.00184, lr=0.001]
Steps: 22%|██▏ | 217/1000 [02:17<08:22, 1.56it/s, loss=0.0032, lr=0.001]
Steps: 22%|██▏ | 218/1000 [02:18<08:17, 1.57it/s, loss=0.0032, lr=0.001]
tensor(0.0043, device='cuda:0')
tensor([[0.4100],
[0.4129]], device='cuda:0')
Current Norm : tensor([0.4090, 0.4116], device='cuda:0')
Steps: 22%|██▏ | 218/1000 [02:18<08:17, 1.57it/s, loss=0.00103, lr=0.001]
Steps: 22%|██▏ | 219/1000 [02:18<08:20, 1.56it/s, loss=0.00103, lr=0.001]
Steps: 22%|██▏ | 219/1000 [02:18<08:20, 1.56it/s, loss=0.00787, lr=0.001]
Steps: 22%|██▏ | 220/1000 [02:19<08:14, 1.58it/s, loss=0.00787, lr=0.001]
tensor(0.0053, device='cuda:0')
tensor([[0.4102],
[0.4128]], device='cuda:0')
Current Norm : tensor([0.4092, 0.4115], device='cuda:0')
Steps: 22%|██▏ | 220/1000 [02:19<08:14, 1.58it/s, loss=0.0362, lr=0.001]
Steps: 22%|██▏ | 221/1000 [02:20<08:17, 1.56it/s, loss=0.0362, lr=0.001]
Steps: 22%|██▏ | 221/1000 [02:20<08:17, 1.56it/s, loss=0.00483, lr=0.001]
Steps: 22%|██▏ | 222/1000 [02:20<08:13, 1.58it/s, loss=0.00483, lr=0.001]
tensor(0.0095, device='cuda:0')
tensor([[0.4104],
[0.4126]], device='cuda:0')
Current Norm : tensor([0.4093, 0.4114], device='cuda:0')
Steps: 22%|██▏ | 222/1000 [02:20<08:13, 1.58it/s, loss=0.0241, lr=0.001]
Steps: 22%|██▏ | 223/1000 [02:21<08:18, 1.56it/s, loss=0.0241, lr=0.001]
Steps: 22%|██▏ | 223/1000 [02:21<08:18, 1.56it/s, loss=0.0179, lr=0.001]
Steps: 22%|██▏ | 224/1000 [02:21<08:14, 1.57it/s, loss=0.0179, lr=0.001]
tensor(0.0009, device='cuda:0')
tensor([[0.4106],
[0.4125]], device='cuda:0')
Current Norm : tensor([0.4095, 0.4113], device='cuda:0')
Steps: 22%|██▏ | 224/1000 [02:21<08:14, 1.57it/s, loss=0.00276, lr=0.001]
Steps: 22%|██▎ | 225/1000 [02:22<08:16, 1.56it/s, loss=0.00276, lr=0.001]
Steps: 22%|██▎ | 225/1000 [02:22<08:16, 1.56it/s, loss=0.000774, lr=0.001]
Steps: 23%|██▎ | 226/1000 [02:23<08:12, 1.57it/s, loss=0.000774, lr=0.001]
tensor(0.0092, device='cuda:0')
tensor([[0.4107],
[0.4125]], device='cuda:0')
Current Norm : tensor([0.4096, 0.4112], device='cuda:0')
Steps: 23%|██▎ | 226/1000 [02:23<08:12, 1.57it/s, loss=0.071, lr=0.001]
Steps: 23%|██▎ | 227/1000 [02:23<08:16, 1.56it/s, loss=0.071, lr=0.001]
Steps: 23%|██▎ | 227/1000 [02:23<08:16, 1.56it/s, loss=0.00114, lr=0.001]
Steps: 23%|██▎ | 228/1000 [02:24<08:12, 1.57it/s, loss=0.00114, lr=0.001]
tensor(0.0063, device='cuda:0')
tensor([[0.4107],
[0.4125]], device='cuda:0')
Current Norm : tensor([0.4097, 0.4113], device='cuda:0')
Steps: 23%|██▎ | 228/1000 [02:24<08:12, 1.57it/s, loss=0.000409, lr=0.001]
Steps: 23%|██▎ | 229/1000 [02:25<08:16, 1.55it/s, loss=0.000409, lr=0.001]
Steps: 23%|██▎ | 229/1000 [02:25<08:16, 1.55it/s, loss=0.0402, lr=0.001]
Steps: 23%|██▎ | 230/1000 [02:25<08:11, 1.57it/s, loss=0.0402, lr=0.001]
tensor(0.0053, device='cuda:0')
tensor([[0.4108],
[0.4125]], device='cuda:0')
Current Norm : tensor([0.4097, 0.4112], device='cuda:0')
Steps: 23%|██▎ | 230/1000 [02:25<08:11, 1.57it/s, loss=0.0238, lr=0.001]
Steps: 23%|██▎ | 231/1000 [02:26<08:14, 1.55it/s, loss=0.0238, lr=0.001]
Steps: 23%|██▎ | 231/1000 [02:26<08:14, 1.55it/s, loss=0.000537, lr=0.001]
Steps: 23%|██▎ | 232/1000 [02:27<08:09, 1.57it/s, loss=0.000537, lr=0.001]
tensor(0.0059, device='cuda:0')
tensor([[0.4108],
[0.4125]], device='cuda:0')
Current Norm : tensor([0.4097, 0.4112], device='cuda:0')
Steps: 23%|██▎ | 232/1000 [02:27<08:09, 1.57it/s, loss=0.000339, lr=0.001]
Steps: 23%|██▎ | 233/1000 [02:27<08:14, 1.55it/s, loss=0.000339, lr=0.001]
Steps: 23%|██▎ | 233/1000 [02:27<08:14, 1.55it/s, loss=0.0129, lr=0.001]
Steps: 23%|██▎ | 234/1000 [02:28<08:11, 1.56it/s, loss=0.0129, lr=0.001]
tensor(0.0078, device='cuda:0')
tensor([[0.4109],
[0.4124]], device='cuda:0')
Current Norm : tensor([0.4098, 0.4112], device='cuda:0')
Steps: 23%|██▎ | 234/1000 [02:28<08:11, 1.56it/s, loss=0.00997, lr=0.001]
Steps: 24%|██▎ | 235/1000 [02:28<08:13, 1.55it/s, loss=0.00997, lr=0.001]
Steps: 24%|██▎ | 235/1000 [02:28<08:13, 1.55it/s, loss=0.0416, lr=0.001]
Steps: 24%|██▎ | 236/1000 [02:29<08:08, 1.56it/s, loss=0.0416, lr=0.001]
tensor(0.0040, device='cuda:0')
tensor([[0.4109],
[0.4123]], device='cuda:0')
Current Norm : tensor([0.4098, 0.4111], device='cuda:0')
Steps: 24%|██▎ | 236/1000 [02:29<08:08, 1.56it/s, loss=0.00845, lr=0.001]
Steps: 24%|██▎ | 237/1000 [02:30<08:12, 1.55it/s, loss=0.00845, lr=0.001]
Steps: 24%|██▎ | 237/1000 [02:30<08:12, 1.55it/s, loss=0.000848, lr=0.001]
Steps: 24%|██▍ | 238/1000 [02:30<08:08, 1.56it/s, loss=0.000848, lr=0.001]
tensor(0.0033, device='cuda:0')
tensor([[0.4110],
[0.4122]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4110], device='cuda:0')
Steps: 24%|██▍ | 238/1000 [02:30<08:08, 1.56it/s, loss=0.00033, lr=0.001]
Steps: 24%|██▍ | 239/1000 [02:31<08:14, 1.54it/s, loss=0.00033, lr=0.001]
Steps: 24%|██▍ | 239/1000 [02:31<08:14, 1.54it/s, loss=0.0048, lr=0.001]
Steps: 24%|██▍ | 240/1000 [02:32<08:11, 1.55it/s, loss=0.0048, lr=0.001]
tensor(0.0022, device='cuda:0')
tensor([[0.4110],
[0.4121]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4109], device='cuda:0')
Steps: 24%|██▍ | 240/1000 [02:32<08:11, 1.55it/s, loss=0.000334, lr=0.001]
Steps: 24%|██▍ | 241/1000 [02:32<08:13, 1.54it/s, loss=0.000334, lr=0.001]
Steps: 24%|██▍ | 241/1000 [02:32<08:13, 1.54it/s, loss=0.0037, lr=0.001]
Steps: 24%|██▍ | 242/1000 [02:33<08:07, 1.55it/s, loss=0.0037, lr=0.001]
tensor(0.0052, device='cuda:0')
tensor([[0.4110],
[0.4120]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4108], device='cuda:0')
Steps: 24%|██▍ | 242/1000 [02:33<08:07, 1.55it/s, loss=0.000196, lr=0.001]
Steps: 24%|██▍ | 243/1000 [02:34<08:11, 1.54it/s, loss=0.000196, lr=0.001]
Steps: 24%|██▍ | 243/1000 [02:34<08:11, 1.54it/s, loss=0.0248, lr=0.001]
Steps: 24%|██▍ | 244/1000 [02:34<08:07, 1.55it/s, loss=0.0248, lr=0.001]
tensor(0.0026, device='cuda:0')
tensor([[0.4109],
[0.4118]], device='cuda:0')
Current Norm : tensor([0.4098, 0.4107], device='cuda:0')
Steps: 24%|██▍ | 244/1000 [02:34<08:07, 1.55it/s, loss=0.00292, lr=0.001]
Steps: 24%|██▍ | 245/1000 [02:35<08:10, 1.54it/s, loss=0.00292, lr=0.001]
Steps: 24%|██▍ | 245/1000 [02:35<08:10, 1.54it/s, loss=0.000501, lr=0.001]
Steps: 25%|██▍ | 246/1000 [02:36<08:05, 1.55it/s, loss=0.000501, lr=0.001]
tensor(0.0067, device='cuda:0')
tensor([[0.4108],
[0.4117]], device='cuda:0')
Current Norm : tensor([0.4097, 0.4105], device='cuda:0')
Steps: 25%|██▍ | 246/1000 [02:36<08:05, 1.55it/s, loss=0.000666, lr=0.001]
Steps: 25%|██▍ | 247/1000 [02:36<08:06, 1.55it/s, loss=0.000666, lr=0.001]
Steps: 25%|██▍ | 247/1000 [02:36<08:06, 1.55it/s, loss=0.0521, lr=0.001]
Steps: 25%|██▍ | 248/1000 [02:37<08:01, 1.56it/s, loss=0.0521, lr=0.001]
tensor(0.0059, device='cuda:0')
tensor([[0.4107],
[0.4115]], device='cuda:0')
Current Norm : tensor([0.4096, 0.4103], device='cuda:0')
Steps: 25%|██▍ | 248/1000 [02:37<08:01, 1.56it/s, loss=0.00594, lr=0.001]
Steps: 25%|██▍ | 249/1000 [02:38<08:03, 1.55it/s, loss=0.00594, lr=0.001]
Steps: 25%|██▍ | 249/1000 [02:38<08:03, 1.55it/s, loss=0.0184, lr=0.001]
Steps: 25%|██▌ | 250/1000 [02:38<07:59, 1.56it/s, loss=0.0184, lr=0.001]
tensor(0.0015, device='cuda:0')
tensor([[0.4106],
[0.4113]], device='cuda:0')
Current Norm : tensor([0.4095, 0.4102], device='cuda:0')
Steps: 25%|██▌ | 250/1000 [02:38<07:59, 1.56it/s, loss=0.000888, lr=0.001]
Steps: 25%|██▌ | 251/1000 [02:39<08:04, 1.55it/s, loss=0.000888, lr=0.001]
Steps: 25%|██▌ | 251/1000 [02:39<08:04, 1.55it/s, loss=0.0139, lr=0.001]
Steps: 25%|██▌ | 252/1000 [02:39<08:00, 1.56it/s, loss=0.0139, lr=0.001]
tensor(0.0089, device='cuda:0')
tensor([[0.4105],
[0.4111]], device='cuda:0')
Current Norm : tensor([0.4095, 0.4100], device='cuda:0')
Steps: 25%|██▌ | 252/1000 [02:39<08:00, 1.56it/s, loss=0.0116, lr=0.001]
Steps: 25%|██▌ | 253/1000 [02:40<08:10, 1.52it/s, loss=0.0116, lr=0.001]
Steps: 25%|██▌ | 253/1000 [02:40<08:10, 1.52it/s, loss=0.0285, lr=0.001]
Steps: 25%|██▌ | 254/1000 [02:41<08:06, 1.53it/s, loss=0.0285, lr=0.001]
tensor(0.0045, device='cuda:0')
tensor([[0.4105],
[0.4109]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4099], device='cuda:0')
Steps: 25%|██▌ | 254/1000 [02:41<08:06, 1.53it/s, loss=0.011, lr=0.001]
Steps: 26%|██▌ | 255/1000 [02:41<08:06, 1.53it/s, loss=0.011, lr=0.001]
Steps: 26%|██▌ | 255/1000 [02:41<08:06, 1.53it/s, loss=0.00831, lr=0.001]
Steps: 26%|██▌ | 256/1000 [02:42<08:02, 1.54it/s, loss=0.00831, lr=0.001]
tensor(0.0023, device='cuda:0')
tensor([[0.4104],
[0.4108]], device='cuda:0')
Current Norm : tensor([0.4093, 0.4097], device='cuda:0')
Steps: 26%|██▌ | 256/1000 [02:42<08:02, 1.54it/s, loss=0.0235, lr=0.001]
Steps: 26%|██▌ | 257/1000 [02:43<08:03, 1.54it/s, loss=0.0235, lr=0.001]
Steps: 26%|██▌ | 257/1000 [02:43<08:03, 1.54it/s, loss=0.00303, lr=0.001]
Steps: 26%|██▌ | 258/1000 [02:43<07:59, 1.55it/s, loss=0.00303, lr=0.001]
tensor(0.0088, device='cuda:0')
tensor([[0.4103],
[0.4106]], device='cuda:0')
Current Norm : tensor([0.4093, 0.4095], device='cuda:0')
Steps: 26%|██▌ | 258/1000 [02:43<07:59, 1.55it/s, loss=0.0407, lr=0.001]
Steps: 26%|██▌ | 259/1000 [02:44<08:01, 1.54it/s, loss=0.0407, lr=0.001]
Steps: 26%|██▌ | 259/1000 [02:44<08:01, 1.54it/s, loss=0.0181, lr=0.001]
Steps: 26%|██▌ | 260/1000 [02:45<07:56, 1.55it/s, loss=0.0181, lr=0.001]
tensor(0.0088, device='cuda:0')
tensor([[0.4102],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4092, 0.4095], device='cuda:0')
Steps: 26%|██▌ | 260/1000 [02:45<07:56, 1.55it/s, loss=0.0383, lr=0.001]
Steps: 26%|██▌ | 261/1000 [02:45<08:00, 1.54it/s, loss=0.0383, lr=0.001]
Steps: 26%|██▌ | 261/1000 [02:45<08:00, 1.54it/s, loss=0.0344, lr=0.001]
Steps: 26%|██▌ | 262/1000 [02:46<07:55, 1.55it/s, loss=0.0344, lr=0.001]
tensor(0.0025, device='cuda:0')
tensor([[0.4102],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4092, 0.4094], device='cuda:0')
Steps: 26%|██▌ | 262/1000 [02:46<07:55, 1.55it/s, loss=0.00119, lr=0.001]
Steps: 26%|██▋ | 263/1000 [02:47<08:00, 1.53it/s, loss=0.00119, lr=0.001]
Steps: 26%|██▋ | 263/1000 [02:47<08:00, 1.53it/s, loss=0.00178, lr=0.001]
Steps: 26%|██▋ | 264/1000 [02:47<07:53, 1.55it/s, loss=0.00178, lr=0.001]
tensor(0.0092, device='cuda:0')
tensor([[0.4101],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4091, 0.4095], device='cuda:0')
Steps: 26%|██▋ | 264/1000 [02:47<07:53, 1.55it/s, loss=0.077, lr=0.001]
Steps: 26%|██▋ | 265/1000 [02:48<07:55, 1.55it/s, loss=0.077, lr=0.001]
Steps: 26%|██▋ | 265/1000 [02:48<07:55, 1.55it/s, loss=0.0233, lr=0.001]
Steps: 27%|██▋ | 266/1000 [02:49<07:50, 1.56it/s, loss=0.0233, lr=0.001]
tensor(0.0020, device='cuda:0')
tensor([[0.4100],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4090, 0.4095], device='cuda:0')
Steps: 27%|██▋ | 266/1000 [02:49<07:50, 1.56it/s, loss=0.000306, lr=0.001]
Steps: 27%|██▋ | 267/1000 [02:49<07:54, 1.54it/s, loss=0.000306, lr=0.001]
Steps: 27%|██▋ | 267/1000 [02:49<07:54, 1.54it/s, loss=0.0058, lr=0.001]
Steps: 27%|██▋ | 268/1000 [02:50<07:50, 1.56it/s, loss=0.0058, lr=0.001]
tensor(0.0020, device='cuda:0')
tensor([[0.4099],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4089, 0.4094], device='cuda:0')
Steps: 27%|██▋ | 268/1000 [02:50<07:50, 1.56it/s, loss=0.00137, lr=0.001]
Steps: 27%|██▋ | 269/1000 [02:50<07:52, 1.55it/s, loss=0.00137, lr=0.001]
Steps: 27%|██▋ | 269/1000 [02:50<07:52, 1.55it/s, loss=0.000572, lr=0.001]
Steps: 27%|██▋ | 270/1000 [02:51<07:47, 1.56it/s, loss=0.000572, lr=0.001]
tensor(0.0095, device='cuda:0')
tensor([[0.4098],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4088, 0.4094], device='cuda:0')
Steps: 27%|██▋ | 270/1000 [02:51<07:47, 1.56it/s, loss=0.0374, lr=0.001]
Steps: 27%|██▋ | 271/1000 [02:52<07:51, 1.55it/s, loss=0.0374, lr=0.001]
Steps: 27%|██▋ | 271/1000 [02:52<07:51, 1.55it/s, loss=0.00526, lr=0.001]
Steps: 27%|██▋ | 272/1000 [02:52<07:46, 1.56it/s, loss=0.00526, lr=0.001]
tensor(0.0053, device='cuda:0')
tensor([[0.4096],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4087, 0.4094], device='cuda:0')
Steps: 27%|██▋ | 272/1000 [02:52<07:46, 1.56it/s, loss=0.000402, lr=0.001]
Steps: 27%|██▋ | 273/1000 [02:53<07:49, 1.55it/s, loss=0.000402, lr=0.001]
Steps: 27%|██▋ | 273/1000 [02:53<07:49, 1.55it/s, loss=0.0369, lr=0.001]
Steps: 27%|██▋ | 274/1000 [02:54<07:45, 1.56it/s, loss=0.0369, lr=0.001]
tensor(0.0074, device='cuda:0')
tensor([[0.4095],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4094], device='cuda:0')
Steps: 27%|██▋ | 274/1000 [02:54<07:45, 1.56it/s, loss=0.0328, lr=0.001]
Steps: 28%|██▊ | 275/1000 [02:54<07:49, 1.55it/s, loss=0.0328, lr=0.001]
Steps: 28%|██▊ | 275/1000 [02:54<07:49, 1.55it/s, loss=0.0216, lr=0.001]
Steps: 28%|██▊ | 276/1000 [02:55<07:45, 1.55it/s, loss=0.0216, lr=0.001]
tensor(0.0016, device='cuda:0')
tensor([[0.4093],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4094], device='cuda:0')
Steps: 28%|██▊ | 276/1000 [02:55<07:45, 1.55it/s, loss=0.00096, lr=0.001]
Steps: 28%|██▊ | 277/1000 [02:56<07:47, 1.55it/s, loss=0.00096, lr=0.001]
Steps: 28%|██▊ | 277/1000 [02:56<07:47, 1.55it/s, loss=0.00177, lr=0.001]
Steps: 28%|██▊ | 278/1000 [02:56<07:41, 1.57it/s, loss=0.00177, lr=0.001]
tensor(0.0048, device='cuda:0')
tensor([[0.4092],
[0.4104]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4094], device='cuda:0')
Steps: 28%|██▊ | 278/1000 [02:56<07:41, 1.57it/s, loss=0.0209, lr=0.001]
Steps: 28%|██▊ | 279/1000 [02:57<07:45, 1.55it/s, loss=0.0209, lr=0.001]
Steps: 28%|██▊ | 279/1000 [02:57<07:45, 1.55it/s, loss=0.000778, lr=0.001]
Steps: 28%|██▊ | 280/1000 [02:58<07:40, 1.56it/s, loss=0.000778, lr=0.001]
tensor(0.0025, device='cuda:0')
tensor([[0.4091],
[0.4103]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4093], device='cuda:0')
Steps: 28%|██▊ | 280/1000 [02:58<07:40, 1.56it/s, loss=0.0108, lr=0.001]
Steps: 28%|██▊ | 281/1000 [02:58<07:42, 1.55it/s, loss=0.0108, lr=0.001]
Steps: 28%|██▊ | 281/1000 [02:58<07:42, 1.55it/s, loss=0.000837, lr=0.001]
Steps: 28%|██▊ | 282/1000 [02:59<07:37, 1.57it/s, loss=0.000837, lr=0.001]
tensor(0.0034, device='cuda:0')
tensor([[0.4090],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4091], device='cuda:0')
Steps: 28%|██▊ | 282/1000 [02:59<07:37, 1.57it/s, loss=0.00502, lr=0.001]
Steps: 28%|██▊ | 283/1000 [02:59<07:41, 1.55it/s, loss=0.00502, lr=0.001]
Steps: 28%|██▊ | 283/1000 [02:59<07:41, 1.55it/s, loss=0.00781, lr=0.001]
Steps: 28%|██▊ | 284/1000 [03:00<07:36, 1.57it/s, loss=0.00781, lr=0.001]
tensor(0.0054, device='cuda:0')
tensor([[0.4089],
[0.4100]], device='cuda:0')
Current Norm : tensor([0.4080, 0.4090], device='cuda:0')
Steps: 28%|██▊ | 284/1000 [03:00<07:36, 1.57it/s, loss=0.00449, lr=0.001]
Steps: 28%|██▊ | 285/1000 [03:01<07:38, 1.56it/s, loss=0.00449, lr=0.001]
Steps: 28%|██▊ | 285/1000 [03:01<07:38, 1.56it/s, loss=0.0029, lr=0.001]
Steps: 29%|██▊ | 286/1000 [03:01<07:35, 1.57it/s, loss=0.0029, lr=0.001]
tensor(0.0028, device='cuda:0')
tensor([[0.4088],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4079, 0.4088], device='cuda:0')
Steps: 29%|██▊ | 286/1000 [03:01<07:35, 1.57it/s, loss=0.00402, lr=0.001]
Steps: 29%|██▊ | 287/1000 [03:02<07:37, 1.56it/s, loss=0.00402, lr=0.001]
Steps: 29%|██▊ | 287/1000 [03:02<07:37, 1.56it/s, loss=0.000204, lr=0.001]
Steps: 29%|██▉ | 288/1000 [03:03<07:32, 1.57it/s, loss=0.000204, lr=0.001]
tensor(0.0017, device='cuda:0')
tensor([[0.4087],
[0.4095]], device='cuda:0')
Current Norm : tensor([0.4078, 0.4085], device='cuda:0')
Steps: 29%|██▉ | 288/1000 [03:03<07:32, 1.57it/s, loss=0.0105, lr=0.001]
Steps: 29%|██▉ | 289/1000 [03:03<07:34, 1.56it/s, loss=0.0105, lr=0.001]
Steps: 29%|██▉ | 289/1000 [03:03<07:34, 1.56it/s, loss=0.000225, lr=0.001]
Steps: 29%|██▉ | 290/1000 [03:04<07:29, 1.58it/s, loss=0.000225, lr=0.001]
tensor(0.0047, device='cuda:0')
tensor([[0.4085],
[0.4092]], device='cuda:0')
Current Norm : tensor([0.4077, 0.4083], device='cuda:0')
Steps: 29%|██▉ | 290/1000 [03:04<07:29, 1.58it/s, loss=0.00173, lr=0.001]
Steps: 29%|██▉ | 291/1000 [03:05<07:32, 1.57it/s, loss=0.00173, lr=0.001]
Steps: 29%|██▉ | 291/1000 [03:05<07:32, 1.57it/s, loss=0.00477, lr=0.001]
Steps: 29%|██▉ | 292/1000 [03:05<07:27, 1.58it/s, loss=0.00477, lr=0.001]
tensor(0.0084, device='cuda:0')
tensor([[0.4083],
[0.4089]], device='cuda:0')
Current Norm : tensor([0.4075, 0.4080], device='cuda:0')
Steps: 29%|██▉ | 292/1000 [03:05<07:27, 1.58it/s, loss=0.0743, lr=0.001]
Steps: 29%|██▉ | 293/1000 [03:06<07:29, 1.57it/s, loss=0.0743, lr=0.001]
Steps: 29%|██▉ | 293/1000 [03:06<07:29, 1.57it/s, loss=0.0805, lr=0.001]
Steps: 29%|██▉ | 294/1000 [03:06<07:27, 1.58it/s, loss=0.0805, lr=0.001]
tensor(0.0045, device='cuda:0')
tensor([[0.4081],
[0.4087]], device='cuda:0')
Current Norm : tensor([0.4073, 0.4078], device='cuda:0')
Steps: 29%|██▉ | 294/1000 [03:06<07:27, 1.58it/s, loss=0.000428, lr=0.001]
Steps: 30%|██▉ | 295/1000 [03:07<07:29, 1.57it/s, loss=0.000428, lr=0.001]
Steps: 30%|██▉ | 295/1000 [03:07<07:29, 1.57it/s, loss=0.0122, lr=0.001]
Steps: 30%|██▉ | 296/1000 [03:08<07:26, 1.58it/s, loss=0.0122, lr=0.001]
tensor(0.0043, device='cuda:0')
tensor([[0.4080],
[0.4084]], device='cuda:0')
Current Norm : tensor([0.4072, 0.4076], device='cuda:0')
Steps: 30%|██▉ | 296/1000 [03:08<07:26, 1.58it/s, loss=0.0223, lr=0.001]
Steps: 30%|██▉ | 297/1000 [03:08<07:29, 1.56it/s, loss=0.0223, lr=0.001]
Steps: 30%|██▉ | 297/1000 [03:08<07:29, 1.56it/s, loss=0.00694, lr=0.001]
Steps: 30%|██▉ | 298/1000 [03:09<07:25, 1.58it/s, loss=0.00694, lr=0.001]
tensor(0.0054, device='cuda:0')
tensor([[0.4079],
[0.4082]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4073], device='cuda:0')
Steps: 30%|██▉ | 298/1000 [03:09<07:25, 1.58it/s, loss=0.00844, lr=0.001]
Steps: 30%|██▉ | 299/1000 [03:10<07:28, 1.56it/s, loss=0.00844, lr=0.001]
Steps: 30%|██▉ | 299/1000 [03:10<07:28, 1.56it/s, loss=0.0155, lr=0.001]
Steps: 30%|███ | 300/1000 [03:10<07:24, 1.58it/s, loss=0.0155, lr=0.001]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0150, 0.0010, -0.0104, -0.0219], device='cuda:0',
grad_fn=<SliceBackward0>)
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0016, -0.0075, -0.0043, 0.0085], device='cuda:0',
grad_fn=<SliceBackward0>)
Saving weights to checkpoints/step_inv_300.safetensors
tensor(0.0046, device='cuda:0')
tensor([[0.4079],
[0.4080]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4072], device='cuda:0')
Steps: 30%|███ | 300/1000 [03:10<07:24, 1.58it/s, loss=0.0115, lr=0.001]
Steps: 30%|███ | 301/1000 [03:11<07:29, 1.55it/s, loss=0.0115, lr=0.001]
Steps: 30%|███ | 301/1000 [03:11<07:29, 1.55it/s, loss=0.00161, lr=0.001]
Steps: 30%|███ | 302/1000 [03:12<07:25, 1.57it/s, loss=0.00161, lr=0.001]
tensor(0.0020, device='cuda:0')
tensor([[0.4079],
[0.4078]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4070], device='cuda:0')
Steps: 30%|███ | 302/1000 [03:12<07:25, 1.57it/s, loss=0.00208, lr=0.001]
Steps: 30%|███ | 303/1000 [03:12<07:27, 1.56it/s, loss=0.00208, lr=0.001]
Steps: 30%|███ | 303/1000 [03:12<07:27, 1.56it/s, loss=0.00318, lr=0.001]
Steps: 30%|███ | 304/1000 [03:13<07:21, 1.58it/s, loss=0.00318, lr=0.001]
tensor(0.0041, device='cuda:0')
tensor([[0.4079],
[0.4076]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4068], device='cuda:0')
Steps: 30%|███ | 304/1000 [03:13<07:21, 1.58it/s, loss=0.000203, lr=0.001]
Steps: 30%|███ | 305/1000 [03:13<07:23, 1.57it/s, loss=0.000203, lr=0.001]
Steps: 30%|███ | 305/1000 [03:13<07:23, 1.57it/s, loss=0.00242, lr=0.001]
Steps: 31%|███ | 306/1000 [03:14<07:20, 1.58it/s, loss=0.00242, lr=0.001]
tensor(0.0042, device='cuda:0')
tensor([[0.4079],
[0.4075]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4067], device='cuda:0')
Steps: 31%|███ | 306/1000 [03:14<07:20, 1.58it/s, loss=0.00278, lr=0.001]
Steps: 31%|███ | 307/1000 [03:15<07:24, 1.56it/s, loss=0.00278, lr=0.001]
Steps: 31%|███ | 307/1000 [03:15<07:24, 1.56it/s, loss=0.0105, lr=0.001]
Steps: 31%|███ | 308/1000 [03:15<07:19, 1.58it/s, loss=0.0105, lr=0.001]
tensor(0.0036, device='cuda:0')
tensor([[0.4080],
[0.4074]], device='cuda:0')
Current Norm : tensor([0.4072, 0.4066], device='cuda:0')
Steps: 31%|███ | 308/1000 [03:15<07:19, 1.58it/s, loss=0.00387, lr=0.001]
Steps: 31%|███ | 309/1000 [03:16<07:20, 1.57it/s, loss=0.00387, lr=0.001]
Steps: 31%|███ | 309/1000 [03:16<07:20, 1.57it/s, loss=0.00835, lr=0.001]
Steps: 31%|███ | 310/1000 [03:17<07:16, 1.58it/s, loss=0.00835, lr=0.001]
tensor(0.0054, device='cuda:0')
tensor([[0.4080],
[0.4073]], device='cuda:0')
Current Norm : tensor([0.4072, 0.4066], device='cuda:0')
Steps: 31%|███ | 310/1000 [03:17<07:16, 1.58it/s, loss=0.000944, lr=0.001]
Steps: 31%|███ | 311/1000 [03:17<07:21, 1.56it/s, loss=0.000944, lr=0.001]
Steps: 31%|███ | 311/1000 [03:17<07:21, 1.56it/s, loss=0.0117, lr=0.001]
Steps: 31%|███ | 312/1000 [03:18<07:17, 1.57it/s, loss=0.0117, lr=0.001]
tensor(0.0035, device='cuda:0')
tensor([[0.4079],
[0.4073]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4066], device='cuda:0')
Steps: 31%|███ | 312/1000 [03:18<07:17, 1.57it/s, loss=0.0103, lr=0.001]
Steps: 31%|███▏ | 313/1000 [03:19<07:18, 1.57it/s, loss=0.0103, lr=0.001]
Steps: 31%|███▏ | 313/1000 [03:19<07:18, 1.57it/s, loss=0.000619, lr=0.001]
Steps: 31%|███▏ | 314/1000 [03:19<07:14, 1.58it/s, loss=0.000619, lr=0.001]
tensor(0.0091, device='cuda:0')
tensor([[0.4079],
[0.4073]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4066], device='cuda:0')
Steps: 31%|███▏ | 314/1000 [03:19<07:14, 1.58it/s, loss=0.0259, lr=0.001]
Steps: 32%|███▏ | 315/1000 [03:20<07:15, 1.57it/s, loss=0.0259, lr=0.001]
Steps: 32%|███▏ | 315/1000 [03:20<07:15, 1.57it/s, loss=0.0223, lr=0.001]
Steps: 32%|███▏ | 316/1000 [03:20<07:14, 1.58it/s, loss=0.0223, lr=0.001]
tensor(0.0302, device='cuda:0')
tensor([[0.4080],
[0.4074]], device='cuda:0')
Current Norm : tensor([0.4072, 0.4067], device='cuda:0')
Steps: 32%|███▏ | 316/1000 [03:20<07:14, 1.58it/s, loss=0.00669, lr=0.001]
Steps: 32%|███▏ | 317/1000 [03:21<07:15, 1.57it/s, loss=0.00669, lr=0.001]
Steps: 32%|███▏ | 317/1000 [03:21<07:15, 1.57it/s, loss=0.00348, lr=0.001]
Steps: 32%|███▏ | 318/1000 [03:22<07:10, 1.58it/s, loss=0.00348, lr=0.001]
tensor(0.0053, device='cuda:0')
tensor([[0.4084],
[0.4077]], device='cuda:0')
Current Norm : tensor([0.4076, 0.4069], device='cuda:0')
Steps: 32%|███▏ | 318/1000 [03:22<07:10, 1.58it/s, loss=0.0187, lr=0.001]
Steps: 32%|███▏ | 319/1000 [03:22<07:17, 1.56it/s, loss=0.0187, lr=0.001]
Steps: 32%|███▏ | 319/1000 [03:22<07:17, 1.56it/s, loss=0.000746, lr=0.001]
Steps: 32%|███▏ | 320/1000 [03:23<07:12, 1.57it/s, loss=0.000746, lr=0.001]
tensor(0.0077, device='cuda:0')
tensor([[0.4088],
[0.4082]], device='cuda:0')
Current Norm : tensor([0.4079, 0.4073], device='cuda:0')
Steps: 32%|███▏ | 320/1000 [03:23<07:12, 1.57it/s, loss=0.102, lr=0.001]
Steps: 32%|███▏ | 321/1000 [03:24<07:13, 1.56it/s, loss=0.102, lr=0.001]
Steps: 32%|███▏ | 321/1000 [03:24<07:13, 1.56it/s, loss=0.0553, lr=0.001]
Steps: 32%|███▏ | 322/1000 [03:24<07:08, 1.58it/s, loss=0.0553, lr=0.001]
tensor(0.0044, device='cuda:0')
tensor([[0.4093],
[0.4087]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4078], device='cuda:0')
Steps: 32%|███▏ | 322/1000 [03:24<07:08, 1.58it/s, loss=0.00572, lr=0.001]
Steps: 32%|███▏ | 323/1000 [03:25<07:10, 1.57it/s, loss=0.00572, lr=0.001]
Steps: 32%|███▏ | 323/1000 [03:25<07:10, 1.57it/s, loss=0.0152, lr=0.001]
Steps: 32%|███▏ | 324/1000 [03:26<07:05, 1.59it/s, loss=0.0152, lr=0.001]
tensor(0.0066, device='cuda:0')
tensor([[0.4098],
[0.4092]], device='cuda:0')
Current Norm : tensor([0.4089, 0.4083], device='cuda:0')
Steps: 32%|███▏ | 324/1000 [03:26<07:05, 1.59it/s, loss=0.0164, lr=0.001]
Steps: 32%|███▎ | 325/1000 [03:26<07:10, 1.57it/s, loss=0.0164, lr=0.001]
Steps: 32%|███▎ | 325/1000 [03:26<07:10, 1.57it/s, loss=0.0122, lr=0.001]
Steps: 33%|███▎ | 326/1000 [03:27<07:05, 1.58it/s, loss=0.0122, lr=0.001]
tensor(0.0052, device='cuda:0')
tensor([[0.4104],
[0.4098]], device='cuda:0')
Current Norm : tensor([0.4093, 0.4088], device='cuda:0')
Steps: 33%|███▎ | 326/1000 [03:27<07:05, 1.58it/s, loss=0.0254, lr=0.001]
Steps: 33%|███▎ | 327/1000 [03:27<07:07, 1.57it/s, loss=0.0254, lr=0.001]
Steps: 33%|███▎ | 327/1000 [03:27<07:07, 1.57it/s, loss=0.00151, lr=0.001]
Steps: 33%|███▎ | 328/1000 [03:28<07:04, 1.58it/s, loss=0.00151, lr=0.001]
tensor(0.0049, device='cuda:0')
tensor([[0.4109],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4098, 0.4092], device='cuda:0')
Steps: 33%|███▎ | 328/1000 [03:28<07:04, 1.58it/s, loss=0.0125, lr=0.001]
Steps: 33%|███▎ | 329/1000 [03:29<07:08, 1.57it/s, loss=0.0125, lr=0.001]
Steps: 33%|███▎ | 329/1000 [03:29<07:08, 1.57it/s, loss=0.0043, lr=0.001]
Steps: 33%|███▎ | 330/1000 [03:29<07:04, 1.58it/s, loss=0.0043, lr=0.001]
tensor(0.0066, device='cuda:0')
tensor([[0.4114],
[0.4108]], device='cuda:0')
Current Norm : tensor([0.4102, 0.4097], device='cuda:0')
Steps: 33%|███▎ | 330/1000 [03:29<07:04, 1.58it/s, loss=0.0313, lr=0.001]
Steps: 33%|███▎ | 331/1000 [03:30<07:06, 1.57it/s, loss=0.0313, lr=0.001]
Steps: 33%|███▎ | 331/1000 [03:30<07:06, 1.57it/s, loss=0.0152, lr=0.001]
Steps: 33%|███▎ | 332/1000 [03:31<07:02, 1.58it/s, loss=0.0152, lr=0.001]
tensor(0.0057, device='cuda:0')
tensor([[0.4118],
[0.4112]], device='cuda:0')
Current Norm : tensor([0.4106, 0.4101], device='cuda:0')
Steps: 33%|███▎ | 332/1000 [03:31<07:02, 1.58it/s, loss=0.00426, lr=0.001]
Steps: 33%|███▎ | 333/1000 [03:31<07:06, 1.56it/s, loss=0.00426, lr=0.001]
Steps: 33%|███▎ | 333/1000 [03:31<07:06, 1.56it/s, loss=0.00462, lr=0.001]
Steps: 33%|███▎ | 334/1000 [03:32<07:02, 1.58it/s, loss=0.00462, lr=0.001]
tensor(0.0038, device='cuda:0')
tensor([[0.4121],
[0.4115]], device='cuda:0')
Current Norm : tensor([0.4109, 0.4103], device='cuda:0')
Steps: 33%|███▎ | 334/1000 [03:32<07:02, 1.58it/s, loss=0.00314, lr=0.001]
Steps: 34%|███▎ | 335/1000 [03:33<07:06, 1.56it/s, loss=0.00314, lr=0.001]
Steps: 34%|███▎ | 335/1000 [03:33<07:06, 1.56it/s, loss=0.00436, lr=0.001]
Steps: 34%|███▎ | 336/1000 [03:33<07:02, 1.57it/s, loss=0.00436, lr=0.001]
tensor(0.0030, device='cuda:0')
tensor([[0.4123],
[0.4117]], device='cuda:0')
Current Norm : tensor([0.4111, 0.4105], device='cuda:0')
Steps: 34%|███▎ | 336/1000 [03:33<07:02, 1.57it/s, loss=0.00463, lr=0.001]
Steps: 34%|███▎ | 337/1000 [03:34<07:04, 1.56it/s, loss=0.00463, lr=0.001]
Steps: 34%|███▎ | 337/1000 [03:34<07:04, 1.56it/s, loss=0.00249, lr=0.001]
Steps: 34%|███▍ | 338/1000 [03:34<06:59, 1.58it/s, loss=0.00249, lr=0.001]
tensor(0.0032, device='cuda:0')
tensor([[0.4124],
[0.4118]], device='cuda:0')
Current Norm : tensor([0.4112, 0.4106], device='cuda:0')
Steps: 34%|███▍ | 338/1000 [03:34<06:59, 1.58it/s, loss=0.00377, lr=0.001]
Steps: 34%|███▍ | 339/1000 [03:35<07:02, 1.57it/s, loss=0.00377, lr=0.001]
Steps: 34%|███▍ | 339/1000 [03:35<07:02, 1.57it/s, loss=0.00659, lr=0.001]
Steps: 34%|███▍ | 340/1000 [03:36<06:58, 1.58it/s, loss=0.00659, lr=0.001]
tensor(0.0078, device='cuda:0')
tensor([[0.4124],
[0.4118]], device='cuda:0')
Current Norm : tensor([0.4112, 0.4106], device='cuda:0')
Steps: 34%|███▍ | 340/1000 [03:36<06:58, 1.58it/s, loss=0.0136, lr=0.001]
Steps: 34%|███▍ | 341/1000 [03:36<07:00, 1.57it/s, loss=0.0136, lr=0.001]
Steps: 34%|███▍ | 341/1000 [03:36<07:00, 1.57it/s, loss=0.00102, lr=0.001]
Steps: 34%|███▍ | 342/1000 [03:37<06:56, 1.58it/s, loss=0.00102, lr=0.001]
tensor(0.0033, device='cuda:0')
tensor([[0.4124],
[0.4117]], device='cuda:0')
Current Norm : tensor([0.4112, 0.4106], device='cuda:0')
Steps: 34%|███▍ | 342/1000 [03:37<06:56, 1.58it/s, loss=0.00414, lr=0.001]
Steps: 34%|███▍ | 343/1000 [03:38<07:02, 1.55it/s, loss=0.00414, lr=0.001]
Steps: 34%|███▍ | 343/1000 [03:38<07:02, 1.55it/s, loss=0.00138, lr=0.001]
Steps: 34%|███▍ | 344/1000 [03:38<06:58, 1.57it/s, loss=0.00138, lr=0.001]
tensor(0.0063, device='cuda:0')
tensor([[0.4124],
[0.4116]], device='cuda:0')
Current Norm : tensor([0.4111, 0.4105], device='cuda:0')
Steps: 34%|███▍ | 344/1000 [03:38<06:58, 1.57it/s, loss=0.0326, lr=0.001]
Steps: 34%|███▍ | 345/1000 [03:39<07:01, 1.55it/s, loss=0.0326, lr=0.001]
Steps: 34%|███▍ | 345/1000 [03:39<07:01, 1.55it/s, loss=0.0121, lr=0.001]
Steps: 35%|███▍ | 346/1000 [03:40<06:58, 1.56it/s, loss=0.0121, lr=0.001]
tensor(0.0043, device='cuda:0')
tensor([[0.4123],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4111, 0.4103], device='cuda:0')
Steps: 35%|███▍ | 346/1000 [03:40<06:58, 1.56it/s, loss=0.00339, lr=0.001]
Steps: 35%|███▍ | 347/1000 [03:40<07:00, 1.55it/s, loss=0.00339, lr=0.001]
Steps: 35%|███▍ | 347/1000 [03:40<07:00, 1.55it/s, loss=0.000568, lr=0.001]
Steps: 35%|███▍ | 348/1000 [03:41<06:55, 1.57it/s, loss=0.000568, lr=0.001]
tensor(0.0070, device='cuda:0')
tensor([[0.4123],
[0.4112]], device='cuda:0')
Current Norm : tensor([0.4110, 0.4101], device='cuda:0')
Steps: 35%|███▍ | 348/1000 [03:41<06:55, 1.57it/s, loss=0.00185, lr=0.001]
Steps: 35%|███▍ | 349/1000 [03:42<06:58, 1.56it/s, loss=0.00185, lr=0.001]
Steps: 35%|███▍ | 349/1000 [03:42<06:58, 1.56it/s, loss=0.0526, lr=0.001]
Steps: 35%|███▌ | 350/1000 [03:42<06:54, 1.57it/s, loss=0.0526, lr=0.001]
tensor(0.0046, device='cuda:0')
tensor([[0.4122],
[0.4111]], device='cuda:0')
Current Norm : tensor([0.4110, 0.4100], device='cuda:0')
Steps: 35%|███▌ | 350/1000 [03:42<06:54, 1.57it/s, loss=0.00635, lr=0.001]
Steps: 35%|███▌ | 351/1000 [03:43<06:56, 1.56it/s, loss=0.00635, lr=0.001]
Steps: 35%|███▌ | 351/1000 [03:43<06:56, 1.56it/s, loss=0.0076, lr=0.001]
Steps: 35%|███▌ | 352/1000 [03:43<06:52, 1.57it/s, loss=0.0076, lr=0.001]
tensor(0.0046, device='cuda:0')
tensor([[0.4121],
[0.4109]], device='cuda:0')
Current Norm : tensor([0.4109, 0.4098], device='cuda:0')
Steps: 35%|███▌ | 352/1000 [03:43<06:52, 1.57it/s, loss=0.000826, lr=0.001]
Steps: 35%|███▌ | 353/1000 [03:44<06:54, 1.56it/s, loss=0.000826, lr=0.001]
Steps: 35%|███▌ | 353/1000 [03:44<06:54, 1.56it/s, loss=0.0911, lr=0.001]
Steps: 35%|███▌ | 354/1000 [03:45<06:51, 1.57it/s, loss=0.0911, lr=0.001]
tensor(0.0056, device='cuda:0')
tensor([[0.4120],
[0.4108]], device='cuda:0')
Current Norm : tensor([0.4108, 0.4097], device='cuda:0')
Steps: 35%|███▌ | 354/1000 [03:45<06:51, 1.57it/s, loss=0.0925, lr=0.001]
Steps: 36%|███▌ | 355/1000 [03:45<06:53, 1.56it/s, loss=0.0925, lr=0.001]
Steps: 36%|███▌ | 355/1000 [03:45<06:53, 1.56it/s, loss=0.00582, lr=0.001]
Steps: 36%|███▌ | 356/1000 [03:46<06:49, 1.57it/s, loss=0.00582, lr=0.001]
tensor(0.0016, device='cuda:0')
tensor([[0.4119],
[0.4107]], device='cuda:0')
Current Norm : tensor([0.4107, 0.4096], device='cuda:0')
Steps: 36%|███▌ | 356/1000 [03:46<06:49, 1.57it/s, loss=0.00329, lr=0.001]
Steps: 36%|███▌ | 357/1000 [03:47<06:51, 1.56it/s, loss=0.00329, lr=0.001]
Steps: 36%|███▌ | 357/1000 [03:47<06:51, 1.56it/s, loss=0.00696, lr=0.001]
Steps: 36%|███▌ | 358/1000 [03:47<06:48, 1.57it/s, loss=0.00696, lr=0.001]
tensor(0.0007, device='cuda:0')
tensor([[0.4118],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4106, 0.4095], device='cuda:0')
Steps: 36%|███▌ | 358/1000 [03:47<06:48, 1.57it/s, loss=0.000468, lr=0.001]
Steps: 36%|███▌ | 359/1000 [03:48<06:50, 1.56it/s, loss=0.000468, lr=0.001]
Steps: 36%|███▌ | 359/1000 [03:48<06:50, 1.56it/s, loss=0.000551, lr=0.001]
Steps: 36%|███▌ | 360/1000 [03:49<06:47, 1.57it/s, loss=0.000551, lr=0.001]
tensor(0.0034, device='cuda:0')
tensor([[0.4116],
[0.4104]], device='cuda:0')
Current Norm : tensor([0.4104, 0.4094], device='cuda:0')
Steps: 36%|███▌ | 360/1000 [03:49<06:47, 1.57it/s, loss=0.00998, lr=0.001]
Steps: 36%|███▌ | 361/1000 [03:49<06:53, 1.55it/s, loss=0.00998, lr=0.001]
Steps: 36%|███▌ | 361/1000 [03:49<06:53, 1.55it/s, loss=0.00192, lr=0.001]
Steps: 36%|███▌ | 362/1000 [03:50<06:48, 1.56it/s, loss=0.00192, lr=0.001]
tensor(0.0053, device='cuda:0')
tensor([[0.4113],
[0.4103]], device='cuda:0')
Current Norm : tensor([0.4102, 0.4093], device='cuda:0')
Steps: 36%|███▌ | 362/1000 [03:50<06:48, 1.56it/s, loss=0.0871, lr=0.001]
Steps: 36%|███▋ | 363/1000 [03:50<06:52, 1.54it/s, loss=0.0871, lr=0.001]
Steps: 36%|███▋ | 363/1000 [03:50<06:52, 1.54it/s, loss=0.0123, lr=0.001]
Steps: 36%|███▋ | 364/1000 [03:51<06:47, 1.56it/s, loss=0.0123, lr=0.001]
tensor(0.0052, device='cuda:0')
tensor([[0.4110],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4092], device='cuda:0')
Steps: 36%|███▋ | 364/1000 [03:51<06:47, 1.56it/s, loss=0.00502, lr=0.001]
Steps: 36%|███▋ | 365/1000 [03:52<06:51, 1.54it/s, loss=0.00502, lr=0.001]
Steps: 36%|███▋ | 365/1000 [03:52<06:51, 1.54it/s, loss=0.00467, lr=0.001]
Steps: 37%|███▋ | 366/1000 [03:52<06:45, 1.56it/s, loss=0.00467, lr=0.001]
tensor(0.0016, device='cuda:0')
tensor([[0.4107],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4096, 0.4091], device='cuda:0')
Steps: 37%|███▋ | 366/1000 [03:52<06:45, 1.56it/s, loss=0.00221, lr=0.001]
Steps: 37%|███▋ | 367/1000 [03:53<06:48, 1.55it/s, loss=0.00221, lr=0.001]
Steps: 37%|███▋ | 367/1000 [03:53<06:48, 1.55it/s, loss=0.0025, lr=0.001]
Steps: 37%|███▋ | 368/1000 [03:54<06:43, 1.57it/s, loss=0.0025, lr=0.001]
tensor(0.0069, device='cuda:0')
tensor([[0.4103],
[0.4100]], device='cuda:0')
Current Norm : tensor([0.4093, 0.4090], device='cuda:0')
Steps: 37%|███▋ | 368/1000 [03:54<06:43, 1.57it/s, loss=0.00391, lr=0.001]
Steps: 37%|███▋ | 369/1000 [03:54<06:47, 1.55it/s, loss=0.00391, lr=0.001]
Steps: 37%|███▋ | 369/1000 [03:54<06:47, 1.55it/s, loss=0.0193, lr=0.001]
Steps: 37%|███▋ | 370/1000 [03:55<06:42, 1.56it/s, loss=0.0193, lr=0.001]
tensor(0.0055, device='cuda:0')
tensor([[0.4100],
[0.4100]], device='cuda:0')
Current Norm : tensor([0.4090, 0.4090], device='cuda:0')
Steps: 37%|███▋ | 370/1000 [03:55<06:42, 1.56it/s, loss=0.0164, lr=0.001]
Steps: 37%|███▋ | 371/1000 [03:56<06:44, 1.56it/s, loss=0.0164, lr=0.001]
Steps: 37%|███▋ | 371/1000 [03:56<06:44, 1.56it/s, loss=0.00448, lr=0.001]
Steps: 37%|███▋ | 372/1000 [03:56<06:38, 1.57it/s, loss=0.00448, lr=0.001]
tensor(0.0049, device='cuda:0')
tensor([[0.4098],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4088, 0.4089], device='cuda:0')
Steps: 37%|███▋ | 372/1000 [03:56<06:38, 1.57it/s, loss=0.025, lr=0.001]
Steps: 37%|███▋ | 373/1000 [03:57<06:44, 1.55it/s, loss=0.025, lr=0.001]
Steps: 37%|███▋ | 373/1000 [03:57<06:44, 1.55it/s, loss=0.0114, lr=0.001]
Steps: 37%|███▋ | 374/1000 [03:58<06:39, 1.57it/s, loss=0.0114, lr=0.001]
tensor(0.0017, device='cuda:0')
tensor([[0.4096],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4089], device='cuda:0')
Steps: 37%|███▋ | 374/1000 [03:58<06:39, 1.57it/s, loss=0.00137, lr=0.001]
Steps: 38%|███▊ | 375/1000 [03:58<06:41, 1.56it/s, loss=0.00137, lr=0.001]
Steps: 38%|███▊ | 375/1000 [03:58<06:41, 1.56it/s, loss=0.00574, lr=0.001]
Steps: 38%|███▊ | 376/1000 [03:59<06:37, 1.57it/s, loss=0.00574, lr=0.001]
tensor(0.0038, device='cuda:0')
tensor([[0.4093],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4089], device='cuda:0')
Steps: 38%|███▊ | 376/1000 [03:59<06:37, 1.57it/s, loss=0.00587, lr=0.001]
Steps: 38%|███▊ | 377/1000 [03:59<06:39, 1.56it/s, loss=0.00587, lr=0.001]
Steps: 38%|███▊ | 377/1000 [03:59<06:39, 1.56it/s, loss=0.000478, lr=0.001]
Steps: 38%|███▊ | 378/1000 [04:00<06:35, 1.57it/s, loss=0.000478, lr=0.001]
tensor(0.0050, device='cuda:0')
tensor([[0.4090],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4089], device='cuda:0')
Steps: 38%|███▊ | 378/1000 [04:00<06:35, 1.57it/s, loss=0.00445, lr=0.001]
Steps: 38%|███▊ | 379/1000 [04:01<06:37, 1.56it/s, loss=0.00445, lr=0.001]
Steps: 38%|███▊ | 379/1000 [04:01<06:37, 1.56it/s, loss=0.125, lr=0.001]
Steps: 38%|███▊ | 380/1000 [04:01<06:34, 1.57it/s, loss=0.125, lr=0.001]
tensor(0.0014, device='cuda:0')
tensor([[0.4087],
[0.4098]], device='cuda:0')
Current Norm : tensor([0.4078, 0.4088], device='cuda:0')
Steps: 38%|███▊ | 380/1000 [04:01<06:34, 1.57it/s, loss=0.00104, lr=0.001]
Steps: 38%|███▊ | 381/1000 [04:02<06:36, 1.56it/s, loss=0.00104, lr=0.001]
Steps: 38%|███▊ | 381/1000 [04:02<06:36, 1.56it/s, loss=0.0174, lr=0.001]
Steps: 38%|███▊ | 382/1000 [04:03<06:35, 1.56it/s, loss=0.0174, lr=0.001]
tensor(0.0005, device='cuda:0')
tensor([[0.4084],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4076, 0.4088], device='cuda:0')
Steps: 38%|███▊ | 382/1000 [04:03<06:35, 1.56it/s, loss=0.000611, lr=0.001]
Steps: 38%|███▊ | 383/1000 [04:03<06:38, 1.55it/s, loss=0.000611, lr=0.001]
Steps: 38%|███▊ | 383/1000 [04:03<06:38, 1.55it/s, loss=0.000669, lr=0.001]
Steps: 38%|███▊ | 384/1000 [04:04<06:35, 1.56it/s, loss=0.000669, lr=0.001]
tensor(0.0072, device='cuda:0')
tensor([[0.4081],
[0.4096]], device='cuda:0')
Current Norm : tensor([0.4073, 0.4087], device='cuda:0')
Steps: 38%|███▊ | 384/1000 [04:04<06:35, 1.56it/s, loss=0.00656, lr=0.001]
Steps: 38%|███▊ | 385/1000 [04:05<06:36, 1.55it/s, loss=0.00656, lr=0.001]
Steps: 38%|███▊ | 385/1000 [04:05<06:36, 1.55it/s, loss=0.0152, lr=0.001]
Steps: 39%|███▊ | 386/1000 [04:05<06:32, 1.57it/s, loss=0.0152, lr=0.001]
tensor(0.0021, device='cuda:0')
tensor([[0.4078],
[0.4095]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4086], device='cuda:0')
Steps: 39%|███▊ | 386/1000 [04:05<06:32, 1.57it/s, loss=0.000874, lr=0.001]
Steps: 39%|███▊ | 387/1000 [04:06<06:34, 1.56it/s, loss=0.000874, lr=0.001]
Steps: 39%|███▊ | 387/1000 [04:06<06:34, 1.56it/s, loss=0.0114, lr=0.001]
Steps: 39%|███▉ | 388/1000 [04:06<06:29, 1.57it/s, loss=0.0114, lr=0.001]
tensor(0.0022, device='cuda:0')
tensor([[0.4075],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4084], device='cuda:0')
Steps: 39%|███▉ | 388/1000 [04:06<06:29, 1.57it/s, loss=0.00224, lr=0.001]
Steps: 39%|███▉ | 389/1000 [04:07<06:33, 1.55it/s, loss=0.00224, lr=0.001]
Steps: 39%|███▉ | 389/1000 [04:07<06:33, 1.55it/s, loss=0.00672, lr=0.001]
Steps: 39%|███▉ | 390/1000 [04:08<06:30, 1.56it/s, loss=0.00672, lr=0.001]
tensor(0.0092, device='cuda:0')
tensor([[0.4073],
[0.4092]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4083], device='cuda:0')
Steps: 39%|███▉ | 390/1000 [04:08<06:30, 1.56it/s, loss=0.06, lr=0.001]
Steps: 39%|███▉ | 391/1000 [04:08<06:33, 1.55it/s, loss=0.06, lr=0.001]
Steps: 39%|███▉ | 391/1000 [04:08<06:33, 1.55it/s, loss=0.0165, lr=0.001]
Steps: 39%|███▉ | 392/1000 [04:09<06:29, 1.56it/s, loss=0.0165, lr=0.001]
tensor(0.0074, device='cuda:0')
tensor([[0.4071],
[0.4092]], device='cuda:0')
Current Norm : tensor([0.4064, 0.4083], device='cuda:0')
Steps: 39%|███▉ | 392/1000 [04:09<06:29, 1.56it/s, loss=0.00418, lr=0.001]
Steps: 39%|███▉ | 393/1000 [04:10<06:32, 1.55it/s, loss=0.00418, lr=0.001]
Steps: 39%|███▉ | 393/1000 [04:10<06:32, 1.55it/s, loss=0.0117, lr=0.001]
Steps: 39%|███▉ | 394/1000 [04:10<06:27, 1.56it/s, loss=0.0117, lr=0.001]
tensor(0.0035, device='cuda:0')
tensor([[0.4070],
[0.4091]], device='cuda:0')
Current Norm : tensor([0.4063, 0.4082], device='cuda:0')
Steps: 39%|███▉ | 394/1000 [04:10<06:27, 1.56it/s, loss=0.0147, lr=0.001]
Steps: 40%|███▉ | 395/1000 [04:11<06:31, 1.55it/s, loss=0.0147, lr=0.001]
Steps: 40%|███▉ | 395/1000 [04:11<06:31, 1.55it/s, loss=0.0108, lr=0.001]
Steps: 40%|███▉ | 396/1000 [04:12<06:26, 1.56it/s, loss=0.0108, lr=0.001]
tensor(0.0041, device='cuda:0')
tensor([[0.4069],
[0.4091]], device='cuda:0')
Current Norm : tensor([0.4062, 0.4082], device='cuda:0')
Steps: 40%|███▉ | 396/1000 [04:12<06:26, 1.56it/s, loss=0.0212, lr=0.001]
Steps: 40%|███▉ | 397/1000 [04:12<06:28, 1.55it/s, loss=0.0212, lr=0.001]
Steps: 40%|███▉ | 397/1000 [04:12<06:28, 1.55it/s, loss=0.000443, lr=0.001]
Steps: 40%|███▉ | 398/1000 [04:13<06:24, 1.57it/s, loss=0.000443, lr=0.001]
tensor(0.0055, device='cuda:0')
tensor([[0.4068],
[0.4091]], device='cuda:0')
Current Norm : tensor([0.4061, 0.4082], device='cuda:0')
Steps: 40%|███▉ | 398/1000 [04:13<06:24, 1.57it/s, loss=0.0275, lr=0.001]
Steps: 40%|███▉ | 399/1000 [04:14<06:26, 1.55it/s, loss=0.0275, lr=0.001]
Steps: 40%|███▉ | 399/1000 [04:14<06:26, 1.55it/s, loss=0.00281, lr=0.001]
Steps: 40%|████ | 400/1000 [04:14<06:22, 1.57it/s, loss=0.00281, lr=0.001]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0111, -0.0023, -0.0138, -0.0236], device='cuda:0',
grad_fn=<SliceBackward0>)
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0041, -0.0082, -0.0049, 0.0073], device='cuda:0',
grad_fn=<SliceBackward0>)
Saving weights to checkpoints/step_inv_400.safetensors
tensor(0.0030, device='cuda:0')
tensor([[0.4067],
[0.4090]], device='cuda:0')
Current Norm : tensor([0.4060, 0.4081], device='cuda:0')
Steps: 40%|████ | 400/1000 [04:14<06:22, 1.57it/s, loss=0.0186, lr=0.001]
Steps: 40%|████ | 401/1000 [04:15<06:25, 1.55it/s, loss=0.0186, lr=0.001]
Steps: 40%|████ | 401/1000 [04:15<06:25, 1.55it/s, loss=0.000768, lr=0.001]
Steps: 40%|████ | 402/1000 [04:15<06:20, 1.57it/s, loss=0.000768, lr=0.001]
tensor(0.0072, device='cuda:0')
tensor([[0.4066],
[0.4088]], device='cuda:0')
Current Norm : tensor([0.4059, 0.4079], device='cuda:0')
Steps: 40%|████ | 402/1000 [04:15<06:20, 1.57it/s, loss=0.024, lr=0.001]
Steps: 40%|████ | 403/1000 [04:16<06:23, 1.56it/s, loss=0.024, lr=0.001]
Steps: 40%|████ | 403/1000 [04:16<06:23, 1.56it/s, loss=0.00256, lr=0.001]
Steps: 40%|████ | 404/1000 [04:17<06:19, 1.57it/s, loss=0.00256, lr=0.001]
tensor(0.0071, device='cuda:0')
tensor([[0.4065],
[0.4087]], device='cuda:0')
Current Norm : tensor([0.4058, 0.4078], device='cuda:0')
Steps: 40%|████ | 404/1000 [04:17<06:19, 1.57it/s, loss=0.0389, lr=0.001]
Steps: 40%|████ | 405/1000 [04:17<06:22, 1.56it/s, loss=0.0389, lr=0.001]
Steps: 40%|████ | 405/1000 [04:17<06:22, 1.56it/s, loss=0.00633, lr=0.001]
Steps: 41%|████ | 406/1000 [04:18<06:19, 1.57it/s, loss=0.00633, lr=0.001]
tensor(0.0077, device='cuda:0')
tensor([[0.4064],
[0.4085]], device='cuda:0')
Current Norm : tensor([0.4058, 0.4077], device='cuda:0')
Steps: 41%|████ | 406/1000 [04:18<06:19, 1.57it/s, loss=0.0101, lr=0.001]
Steps: 41%|████ | 407/1000 [04:19<06:21, 1.55it/s, loss=0.0101, lr=0.001]
Steps: 41%|████ | 407/1000 [04:19<06:21, 1.55it/s, loss=0.0262, lr=0.001]
Steps: 41%|████ | 408/1000 [04:19<06:18, 1.56it/s, loss=0.0262, lr=0.001]
tensor(0.0025, device='cuda:0')
tensor([[0.4063],
[0.4084]], device='cuda:0')
Current Norm : tensor([0.4057, 0.4076], device='cuda:0')
Steps: 41%|████ | 408/1000 [04:19<06:18, 1.56it/s, loss=0.0159, lr=0.001]
Steps: 41%|████ | 409/1000 [04:20<06:19, 1.56it/s, loss=0.0159, lr=0.001]
Steps: 41%|████ | 409/1000 [04:20<06:19, 1.56it/s, loss=0.00111, lr=0.001]
Steps: 41%|████ | 410/1000 [04:21<06:16, 1.57it/s, loss=0.00111, lr=0.001]
tensor(0.0054, device='cuda:0')
tensor([[0.4063],
[0.4083]], device='cuda:0')
Current Norm : tensor([0.4057, 0.4075], device='cuda:0')
Steps: 41%|████ | 410/1000 [04:21<06:16, 1.57it/s, loss=0.00813, lr=0.001]
Steps: 41%|████ | 411/1000 [04:21<06:18, 1.55it/s, loss=0.00813, lr=0.001]
Steps: 41%|████ | 411/1000 [04:21<06:18, 1.55it/s, loss=0.00135, lr=0.001]
Steps: 41%|████ | 412/1000 [04:22<06:15, 1.57it/s, loss=0.00135, lr=0.001]
tensor(0.0019, device='cuda:0')
tensor([[0.4063],
[0.4082]], device='cuda:0')
Current Norm : tensor([0.4057, 0.4074], device='cuda:0')
Steps: 41%|████ | 412/1000 [04:22<06:15, 1.57it/s, loss=0.00114, lr=0.001]
Steps: 41%|████▏ | 413/1000 [04:23<06:21, 1.54it/s, loss=0.00114, lr=0.001]
Steps: 41%|████▏ | 413/1000 [04:23<06:21, 1.54it/s, loss=0.00502, lr=0.001]
Steps: 41%|████▏ | 414/1000 [04:23<06:15, 1.56it/s, loss=0.00502, lr=0.001]
tensor(0.0065, device='cuda:0')
tensor([[0.4062],
[0.4081]], device='cuda:0')
Current Norm : tensor([0.4056, 0.4073], device='cuda:0')
Steps: 41%|████▏ | 414/1000 [04:23<06:15, 1.56it/s, loss=0.00983, lr=0.001]
Steps: 42%|████▏ | 415/1000 [04:24<06:17, 1.55it/s, loss=0.00983, lr=0.001]
Steps: 42%|████▏ | 415/1000 [04:24<06:17, 1.55it/s, loss=0.0534, lr=0.001]
Steps: 42%|████▏ | 416/1000 [04:24<06:13, 1.56it/s, loss=0.0534, lr=0.001]
tensor(0.0046, device='cuda:0')
tensor([[0.4061],
[0.4081]], device='cuda:0')
Current Norm : tensor([0.4055, 0.4073], device='cuda:0')
Steps: 42%|████▏ | 416/1000 [04:24<06:13, 1.56it/s, loss=0.0325, lr=0.001]
Steps: 42%|████▏ | 417/1000 [04:25<06:15, 1.55it/s, loss=0.0325, lr=0.001]
Steps: 42%|████▏ | 417/1000 [04:25<06:15, 1.55it/s, loss=0.000167, lr=0.001]
Steps: 42%|████▏ | 418/1000 [04:26<06:10, 1.57it/s, loss=0.000167, lr=0.001]
tensor(0.0058, device='cuda:0')
tensor([[0.4061],
[0.4081]], device='cuda:0')
Current Norm : tensor([0.4055, 0.4073], device='cuda:0')
Steps: 42%|████▏ | 418/1000 [04:26<06:10, 1.57it/s, loss=0.0166, lr=0.001]
Steps: 42%|████▏ | 419/1000 [04:26<06:12, 1.56it/s, loss=0.0166, lr=0.001]
Steps: 42%|████▏ | 419/1000 [04:26<06:12, 1.56it/s, loss=0.000417, lr=0.001]
Steps: 42%|████▏ | 420/1000 [04:27<06:08, 1.57it/s, loss=0.000417, lr=0.001]
tensor(0.0050, device='cuda:0')
tensor([[0.4060],
[0.4082]], device='cuda:0')
Current Norm : tensor([0.4054, 0.4074], device='cuda:0')
Steps: 42%|████▏ | 420/1000 [04:27<06:08, 1.57it/s, loss=0.00439, lr=0.001]
Steps: 42%|████▏ | 421/1000 [04:28<06:13, 1.55it/s, loss=0.00439, lr=0.001]
Steps: 42%|████▏ | 421/1000 [04:28<06:13, 1.55it/s, loss=0.114, lr=0.001]
Steps: 42%|████▏ | 422/1000 [04:28<06:10, 1.56it/s, loss=0.114, lr=0.001]
tensor(0.0061, device='cuda:0')
tensor([[0.4060],
[0.4083]], device='cuda:0')
Current Norm : tensor([0.4054, 0.4075], device='cuda:0')
Steps: 42%|████▏ | 422/1000 [04:28<06:10, 1.56it/s, loss=0.0405, lr=0.001]
Steps: 42%|████▏ | 423/1000 [04:29<06:11, 1.55it/s, loss=0.0405, lr=0.001]
Steps: 42%|████▏ | 423/1000 [04:29<06:11, 1.55it/s, loss=0.00279, lr=0.001]
Steps: 42%|████▏ | 424/1000 [04:30<06:07, 1.57it/s, loss=0.00279, lr=0.001]
tensor(0.0038, device='cuda:0')
tensor([[0.4060],
[0.4085]], device='cuda:0')
Current Norm : tensor([0.4054, 0.4076], device='cuda:0')
Steps: 42%|████▏ | 424/1000 [04:30<06:07, 1.57it/s, loss=0.00228, lr=0.001]
Steps: 42%|████▎ | 425/1000 [04:30<06:08, 1.56it/s, loss=0.00228, lr=0.001]
Steps: 42%|████▎ | 425/1000 [04:30<06:08, 1.56it/s, loss=0.00422, lr=0.001]
Steps: 43%|████▎ | 426/1000 [04:31<06:05, 1.57it/s, loss=0.00422, lr=0.001]
tensor(0.0081, device='cuda:0')
tensor([[0.4061],
[0.4087]], device='cuda:0')
Current Norm : tensor([0.4055, 0.4078], device='cuda:0')
Steps: 43%|████▎ | 426/1000 [04:31<06:05, 1.57it/s, loss=0.00899, lr=0.001]
Steps: 43%|████▎ | 427/1000 [04:31<06:06, 1.56it/s, loss=0.00899, lr=0.001]
Steps: 43%|████▎ | 427/1000 [04:31<06:06, 1.56it/s, loss=0.00143, lr=0.001]
Steps: 43%|████▎ | 428/1000 [04:32<06:03, 1.57it/s, loss=0.00143, lr=0.001]
tensor(0.0034, device='cuda:0')
tensor([[0.4063],
[0.4088]], device='cuda:0')
Current Norm : tensor([0.4056, 0.4079], device='cuda:0')
Steps: 43%|████▎ | 428/1000 [04:32<06:03, 1.57it/s, loss=0.00524, lr=0.001]
Steps: 43%|████▎ | 429/1000 [04:33<06:05, 1.56it/s, loss=0.00524, lr=0.001]
Steps: 43%|████▎ | 429/1000 [04:33<06:05, 1.56it/s, loss=0.00667, lr=0.001]
Steps: 43%|████▎ | 430/1000 [04:33<06:02, 1.57it/s, loss=0.00667, lr=0.001]
tensor(0.0061, device='cuda:0')
tensor([[0.4064],
[0.4089]], device='cuda:0')
Current Norm : tensor([0.4057, 0.4080], device='cuda:0')
Steps: 43%|████▎ | 430/1000 [04:33<06:02, 1.57it/s, loss=0.0314, lr=0.001]
Steps: 43%|████▎ | 431/1000 [04:34<06:05, 1.56it/s, loss=0.0314, lr=0.001]
Steps: 43%|████▎ | 431/1000 [04:34<06:05, 1.56it/s, loss=0.000126, lr=0.001]
Steps: 43%|████▎ | 432/1000 [04:35<06:01, 1.57it/s, loss=0.000126, lr=0.001]
tensor(0.0046, device='cuda:0')
tensor([[0.4065],
[0.4090]], device='cuda:0')
Current Norm : tensor([0.4059, 0.4081], device='cuda:0')
Steps: 43%|████▎ | 432/1000 [04:35<06:01, 1.57it/s, loss=0.00386, lr=0.001]
Steps: 43%|████▎ | 433/1000 [04:35<06:04, 1.56it/s, loss=0.00386, lr=0.001]
Steps: 43%|████▎ | 433/1000 [04:35<06:04, 1.56it/s, loss=0.00157, lr=0.001]
Steps: 43%|████▎ | 434/1000 [04:36<05:59, 1.57it/s, loss=0.00157, lr=0.001]
tensor(0.0071, device='cuda:0')
tensor([[0.4067],
[0.4091]], device='cuda:0')
Current Norm : tensor([0.4060, 0.4082], device='cuda:0')
Steps: 43%|████▎ | 434/1000 [04:36<05:59, 1.57it/s, loss=0.0367, lr=0.001]
Steps: 44%|████▎ | 435/1000 [04:37<06:03, 1.55it/s, loss=0.0367, lr=0.001]
Steps: 44%|████▎ | 435/1000 [04:37<06:03, 1.55it/s, loss=0.000496, lr=0.001]
Steps: 44%|████▎ | 436/1000 [04:37<06:01, 1.56it/s, loss=0.000496, lr=0.001]
tensor(0.0045, device='cuda:0')
tensor([[0.4068],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4061, 0.4083], device='cuda:0')
Steps: 44%|████▎ | 436/1000 [04:37<06:01, 1.56it/s, loss=0.00519, lr=0.001]
Steps: 44%|████▎ | 437/1000 [04:38<06:03, 1.55it/s, loss=0.00519, lr=0.001]
Steps: 44%|████▎ | 437/1000 [04:38<06:03, 1.55it/s, loss=0.000983, lr=0.001]
Steps: 44%|████▍ | 438/1000 [04:39<06:00, 1.56it/s, loss=0.000983, lr=0.001]
tensor(0.0071, device='cuda:0')
tensor([[0.4069],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4062, 0.4084], device='cuda:0')
Steps: 44%|████▍ | 438/1000 [04:39<06:00, 1.56it/s, loss=0.0276, lr=0.001]
Steps: 44%|████▍ | 439/1000 [04:39<06:01, 1.55it/s, loss=0.0276, lr=0.001]
Steps: 44%|████▍ | 439/1000 [04:39<06:01, 1.55it/s, loss=0.017, lr=0.001]
Steps: 44%|████▍ | 440/1000 [04:40<05:57, 1.57it/s, loss=0.017, lr=0.001]
tensor(0.0067, device='cuda:0')
tensor([[0.4071],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4064, 0.4085], device='cuda:0')
Steps: 44%|████▍ | 440/1000 [04:40<05:57, 1.57it/s, loss=0.0138, lr=0.001]
Steps: 44%|████▍ | 441/1000 [04:40<05:58, 1.56it/s, loss=0.0138, lr=0.001]
Steps: 44%|████▍ | 441/1000 [04:40<05:58, 1.56it/s, loss=0.083, lr=0.001]
Steps: 44%|████▍ | 442/1000 [04:41<05:55, 1.57it/s, loss=0.083, lr=0.001]
tensor(0.0102, device='cuda:0')
tensor([[0.4072],
[0.4095]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4086], device='cuda:0')
Steps: 44%|████▍ | 442/1000 [04:41<05:55, 1.57it/s, loss=0.00452, lr=0.001]
Steps: 44%|████▍ | 443/1000 [04:42<05:56, 1.56it/s, loss=0.00452, lr=0.001]
Steps: 44%|████▍ | 443/1000 [04:42<05:56, 1.56it/s, loss=0.00397, lr=0.001]
Steps: 44%|████▍ | 444/1000 [04:42<05:53, 1.57it/s, loss=0.00397, lr=0.001]
tensor(0.0063, device='cuda:0')
tensor([[0.4074],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4067, 0.4087], device='cuda:0')
Steps: 44%|████▍ | 444/1000 [04:42<05:53, 1.57it/s, loss=0.0393, lr=0.001]
Steps: 44%|████▍ | 445/1000 [04:43<05:53, 1.57it/s, loss=0.0393, lr=0.001]
Steps: 44%|████▍ | 445/1000 [04:43<05:53, 1.57it/s, loss=0.00046, lr=0.001]
Steps: 45%|████▍ | 446/1000 [04:44<05:50, 1.58it/s, loss=0.00046, lr=0.001]
tensor(0.0082, device='cuda:0')
tensor([[0.4077],
[0.4098]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4088], device='cuda:0')
Steps: 45%|████▍ | 446/1000 [04:44<05:50, 1.58it/s, loss=0.0732, lr=0.001]
Steps: 45%|████▍ | 447/1000 [04:44<05:52, 1.57it/s, loss=0.0732, lr=0.001]
Steps: 45%|████▍ | 447/1000 [04:44<05:52, 1.57it/s, loss=0.022, lr=0.001]
Steps: 45%|████▍ | 448/1000 [04:45<05:49, 1.58it/s, loss=0.022, lr=0.001]
tensor(0.0074, device='cuda:0')
tensor([[0.4079],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4089], device='cuda:0')
Steps: 45%|████▍ | 448/1000 [04:45<05:49, 1.58it/s, loss=0.0017, lr=0.001]
Steps: 45%|████▍ | 449/1000 [04:46<05:51, 1.57it/s, loss=0.0017, lr=0.001]
Steps: 45%|████▍ | 449/1000 [04:46<05:51, 1.57it/s, loss=0.0157, lr=0.001]
Steps: 45%|████▌ | 450/1000 [04:46<05:48, 1.58it/s, loss=0.0157, lr=0.001]
tensor(0.0044, device='cuda:0')
tensor([[0.4081],
[0.4100]], device='cuda:0')
Current Norm : tensor([0.4073, 0.4090], device='cuda:0')
Steps: 45%|████▌ | 450/1000 [04:46<05:48, 1.58it/s, loss=0.00657, lr=0.001]
Steps: 45%|████▌ | 451/1000 [04:47<05:51, 1.56it/s, loss=0.00657, lr=0.001]
Steps: 45%|████▌ | 451/1000 [04:47<05:51, 1.56it/s, loss=0.00209, lr=0.001]
Steps: 45%|████▌ | 452/1000 [04:47<05:48, 1.57it/s, loss=0.00209, lr=0.001]
tensor(0.0032, device='cuda:0')
tensor([[0.4083],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4074, 0.4091], device='cuda:0')
Steps: 45%|████▌ | 452/1000 [04:47<05:48, 1.57it/s, loss=0.000439, lr=0.001]
Steps: 45%|████▌ | 453/1000 [04:48<05:48, 1.57it/s, loss=0.000439, lr=0.001]
Steps: 45%|████▌ | 453/1000 [04:48<05:48, 1.57it/s, loss=0.00718, lr=0.001]
Steps: 45%|████▌ | 454/1000 [04:49<05:46, 1.58it/s, loss=0.00718, lr=0.001]
tensor(0.0054, device='cuda:0')
tensor([[0.4084],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4075, 0.4092], device='cuda:0')
Steps: 45%|████▌ | 454/1000 [04:49<05:46, 1.58it/s, loss=0.0172, lr=0.001]
Steps: 46%|████▌ | 455/1000 [04:49<05:55, 1.53it/s, loss=0.0172, lr=0.001]
Steps: 46%|████▌ | 455/1000 [04:49<05:55, 1.53it/s, loss=0.0327, lr=0.001]
Steps: 46%|████▌ | 456/1000 [04:50<05:51, 1.55it/s, loss=0.0327, lr=0.001]
tensor(0.0031, device='cuda:0')
tensor([[0.4085],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4076, 0.4092], device='cuda:0')
Steps: 46%|████▌ | 456/1000 [04:50<05:51, 1.55it/s, loss=0.00666, lr=0.001]
Steps: 46%|████▌ | 457/1000 [04:51<05:54, 1.53it/s, loss=0.00666, lr=0.001]
Steps: 46%|████▌ | 457/1000 [04:51<05:54, 1.53it/s, loss=0.00171, lr=0.001]
Steps: 46%|████▌ | 458/1000 [04:51<05:51, 1.54it/s, loss=0.00171, lr=0.001]
tensor(0.0047, device='cuda:0')
tensor([[0.4085],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4076, 0.4092], device='cuda:0')
Steps: 46%|████▌ | 458/1000 [04:51<05:51, 1.54it/s, loss=0.00249, lr=0.001]
Steps: 46%|████▌ | 459/1000 [04:52<05:54, 1.52it/s, loss=0.00249, lr=0.001]
Steps: 46%|████▌ | 459/1000 [04:52<05:54, 1.52it/s, loss=0.0129, lr=0.001]
Steps: 46%|████▌ | 460/1000 [04:53<05:50, 1.54it/s, loss=0.0129, lr=0.001]
tensor(0.0078, device='cuda:0')
tensor([[0.4085],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4077, 0.4092], device='cuda:0')
Steps: 46%|████▌ | 460/1000 [04:53<05:50, 1.54it/s, loss=0.0134, lr=0.001]
Steps: 46%|████▌ | 461/1000 [04:53<05:50, 1.54it/s, loss=0.0134, lr=0.001]
Steps: 46%|████▌ | 461/1000 [04:53<05:50, 1.54it/s, loss=0.0508, lr=0.001]
Steps: 46%|████▌ | 462/1000 [04:54<05:45, 1.56it/s, loss=0.0508, lr=0.001]
tensor(0.0030, device='cuda:0')
tensor([[0.4085],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4077, 0.4092], device='cuda:0')
Steps: 46%|████▌ | 462/1000 [04:54<05:45, 1.56it/s, loss=0.00419, lr=0.001]
Steps: 46%|████▋ | 463/1000 [04:55<05:48, 1.54it/s, loss=0.00419, lr=0.001]
Steps: 46%|████▋ | 463/1000 [04:55<05:48, 1.54it/s, loss=0.00566, lr=0.001]
Steps: 46%|████▋ | 464/1000 [04:55<05:44, 1.56it/s, loss=0.00566, lr=0.001]
tensor(0.0102, device='cuda:0')
tensor([[0.4086],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4077, 0.4091], device='cuda:0')
Steps: 46%|████▋ | 464/1000 [04:55<05:44, 1.56it/s, loss=0.00386, lr=0.001]
Steps: 46%|████▋ | 465/1000 [04:56<05:46, 1.54it/s, loss=0.00386, lr=0.001]
Steps: 46%|████▋ | 465/1000 [04:56<05:46, 1.54it/s, loss=0.00318, lr=0.001]
Steps: 47%|████▋ | 466/1000 [04:57<05:42, 1.56it/s, loss=0.00318, lr=0.001]
tensor(0.0017, device='cuda:0')
tensor([[0.4087],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4078, 0.4091], device='cuda:0')
Steps: 47%|████▋ | 466/1000 [04:57<05:42, 1.56it/s, loss=0.00886, lr=0.001]
Steps: 47%|████▋ | 467/1000 [04:57<05:42, 1.55it/s, loss=0.00886, lr=0.001]
Steps: 47%|████▋ | 467/1000 [04:57<05:42, 1.55it/s, loss=0.000229, lr=0.001]
Steps: 47%|████▋ | 468/1000 [04:58<05:38, 1.57it/s, loss=0.000229, lr=0.001]
tensor(0.0087, device='cuda:0')
tensor([[0.4088],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4079, 0.4091], device='cuda:0')
Steps: 47%|████▋ | 468/1000 [04:58<05:38, 1.57it/s, loss=0.0432, lr=0.001]
Steps: 47%|████▋ | 469/1000 [04:58<05:39, 1.56it/s, loss=0.0432, lr=0.001]
Steps: 47%|████▋ | 469/1000 [04:58<05:39, 1.56it/s, loss=0.00354, lr=0.001]
Steps: 47%|████▋ | 470/1000 [04:59<05:36, 1.58it/s, loss=0.00354, lr=0.001]
tensor(0.0054, device='cuda:0')
tensor([[0.4090],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4092], device='cuda:0')
Steps: 47%|████▋ | 470/1000 [04:59<05:36, 1.58it/s, loss=0.00289, lr=0.001]
Steps: 47%|████▋ | 471/1000 [05:00<05:38, 1.56it/s, loss=0.00289, lr=0.001]
Steps: 47%|████▋ | 471/1000 [05:00<05:38, 1.56it/s, loss=0.077, lr=0.001]
Steps: 47%|████▋ | 472/1000 [05:00<05:34, 1.58it/s, loss=0.077, lr=0.001]
tensor(0.0020, device='cuda:0')
tensor([[0.4092],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4092], device='cuda:0')
Steps: 47%|████▋ | 472/1000 [05:00<05:34, 1.58it/s, loss=0.00646, lr=0.001]
Steps: 47%|████▋ | 473/1000 [05:01<05:35, 1.57it/s, loss=0.00646, lr=0.001]
Steps: 47%|████▋ | 473/1000 [05:01<05:35, 1.57it/s, loss=0.00128, lr=0.001]
Steps: 47%|████▋ | 474/1000 [05:02<05:32, 1.58it/s, loss=0.00128, lr=0.001]
tensor(0.0033, device='cuda:0')
tensor([[0.4094],
[0.4103]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4092], device='cuda:0')
Steps: 47%|████▋ | 474/1000 [05:02<05:32, 1.58it/s, loss=0.0353, lr=0.001]
Steps: 48%|████▊ | 475/1000 [05:02<05:36, 1.56it/s, loss=0.0353, lr=0.001]
Steps: 48%|████▊ | 475/1000 [05:02<05:36, 1.56it/s, loss=0.0144, lr=0.001]
Steps: 48%|████▊ | 476/1000 [05:03<05:33, 1.57it/s, loss=0.0144, lr=0.001]
tensor(0.0074, device='cuda:0')
tensor([[0.4096],
[0.4103]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4092], device='cuda:0')
Steps: 48%|████▊ | 476/1000 [05:03<05:33, 1.57it/s, loss=0.0618, lr=0.001]
Steps: 48%|████▊ | 477/1000 [05:04<05:35, 1.56it/s, loss=0.0618, lr=0.001]
Steps: 48%|████▊ | 477/1000 [05:04<05:35, 1.56it/s, loss=0.00403, lr=0.001]
Steps: 48%|████▊ | 478/1000 [05:04<05:32, 1.57it/s, loss=0.00403, lr=0.001]
tensor(0.0045, device='cuda:0')
tensor([[0.4097],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4087, 0.4092], device='cuda:0')
Steps: 48%|████▊ | 478/1000 [05:04<05:32, 1.57it/s, loss=0.00982, lr=0.001]
Steps: 48%|████▊ | 479/1000 [05:05<05:35, 1.55it/s, loss=0.00982, lr=0.001]
Steps: 48%|████▊ | 479/1000 [05:05<05:35, 1.55it/s, loss=0.00697, lr=0.001]
Steps: 48%|████▊ | 480/1000 [05:05<05:31, 1.57it/s, loss=0.00697, lr=0.001]
tensor(0.0042, device='cuda:0')
tensor([[0.4098],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4088, 0.4092], device='cuda:0')
Steps: 48%|████▊ | 480/1000 [05:05<05:31, 1.57it/s, loss=0.000747, lr=0.001]
Steps: 48%|████▊ | 481/1000 [05:06<05:35, 1.55it/s, loss=0.000747, lr=0.001]
Steps: 48%|████▊ | 481/1000 [05:06<05:35, 1.55it/s, loss=0.0442, lr=0.001]
Steps: 48%|████▊ | 482/1000 [05:07<05:32, 1.56it/s, loss=0.0442, lr=0.001]
tensor(0.0087, device='cuda:0')
tensor([[0.4099],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4089, 0.4091], device='cuda:0')
Steps: 48%|████▊ | 482/1000 [05:07<05:32, 1.56it/s, loss=0.0821, lr=0.001]
Steps: 48%|████▊ | 483/1000 [05:07<05:32, 1.55it/s, loss=0.0821, lr=0.001]
Steps: 48%|████▊ | 483/1000 [05:07<05:32, 1.55it/s, loss=0.0551, lr=0.001]
Steps: 48%|████▊ | 484/1000 [05:08<05:29, 1.57it/s, loss=0.0551, lr=0.001]
tensor(0.0086, device='cuda:0')
tensor([[0.4100],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4090, 0.4092], device='cuda:0')
Steps: 48%|████▊ | 484/1000 [05:08<05:29, 1.57it/s, loss=0.0276, lr=0.001]
Steps: 48%|████▊ | 485/1000 [05:09<05:33, 1.55it/s, loss=0.0276, lr=0.001]
Steps: 48%|████▊ | 485/1000 [05:09<05:33, 1.55it/s, loss=0.049, lr=0.001]
Steps: 49%|████▊ | 486/1000 [05:09<05:29, 1.56it/s, loss=0.049, lr=0.001]
tensor(0.0021, device='cuda:0')
tensor([[0.4101],
[0.4104]], device='cuda:0')
Current Norm : tensor([0.4091, 0.4093], device='cuda:0')
Steps: 49%|████▊ | 486/1000 [05:09<05:29, 1.56it/s, loss=0.0103, lr=0.001]
Steps: 49%|████▊ | 487/1000 [05:10<05:33, 1.54it/s, loss=0.0103, lr=0.001]
Steps: 49%|████▊ | 487/1000 [05:10<05:33, 1.54it/s, loss=0.016, lr=0.001]
Steps: 49%|████▉ | 488/1000 [05:11<05:28, 1.56it/s, loss=0.016, lr=0.001]
tensor(0.0004, device='cuda:0')
tensor([[0.4102],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4092, 0.4095], device='cuda:0')
Steps: 49%|████▉ | 488/1000 [05:11<05:28, 1.56it/s, loss=0.00151, lr=0.001]
Steps: 49%|████▉ | 489/1000 [05:11<05:29, 1.55it/s, loss=0.00151, lr=0.001]
Steps: 49%|████▉ | 489/1000 [05:11<05:29, 1.55it/s, loss=0.000248, lr=0.001]
Steps: 49%|████▉ | 490/1000 [05:12<05:25, 1.57it/s, loss=0.000248, lr=0.001]
tensor(0.0058, device='cuda:0')
tensor([[0.4103],
[0.4106]], device='cuda:0')
Current Norm : tensor([0.4093, 0.4096], device='cuda:0')
Steps: 49%|████▉ | 490/1000 [05:12<05:25, 1.57it/s, loss=0.00138, lr=0.001]
Steps: 49%|████▉ | 491/1000 [05:13<05:28, 1.55it/s, loss=0.00138, lr=0.001]
Steps: 49%|████▉ | 491/1000 [05:13<05:28, 1.55it/s, loss=0.0205, lr=0.001]
Steps: 49%|████▉ | 492/1000 [05:13<05:25, 1.56it/s, loss=0.0205, lr=0.001]
tensor(0.0024, device='cuda:0')
tensor([[0.4104],
[0.4107]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4096], device='cuda:0')
Steps: 49%|████▉ | 492/1000 [05:13<05:25, 1.56it/s, loss=0.00147, lr=0.001]
Steps: 49%|████▉ | 493/1000 [05:14<05:27, 1.55it/s, loss=0.00147, lr=0.001]
Steps: 49%|████▉ | 493/1000 [05:14<05:27, 1.55it/s, loss=0.00637, lr=0.001]
Steps: 49%|████▉ | 494/1000 [05:14<05:24, 1.56it/s, loss=0.00637, lr=0.001]
tensor(0.0070, device='cuda:0')
tensor([[0.4105],
[0.4107]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4096], device='cuda:0')
Steps: 49%|████▉ | 494/1000 [05:14<05:24, 1.56it/s, loss=0.044, lr=0.001]
Steps: 50%|████▉ | 495/1000 [05:15<05:26, 1.55it/s, loss=0.044, lr=0.001]
Steps: 50%|████▉ | 495/1000 [05:15<05:26, 1.55it/s, loss=0.00497, lr=0.001]
Steps: 50%|████▉ | 496/1000 [05:16<05:22, 1.56it/s, loss=0.00497, lr=0.001]
tensor(0.0074, device='cuda:0')
tensor([[0.4105],
[0.4107]], device='cuda:0')
Current Norm : tensor([0.4095, 0.4096], device='cuda:0')
Steps: 50%|████▉ | 496/1000 [05:16<05:22, 1.56it/s, loss=0.0009, lr=0.001]
Steps: 50%|████▉ | 497/1000 [05:16<05:25, 1.54it/s, loss=0.0009, lr=0.001]
Steps: 50%|████▉ | 497/1000 [05:16<05:25, 1.54it/s, loss=0.00351, lr=0.001]
Steps: 50%|████▉ | 498/1000 [05:17<05:22, 1.56it/s, loss=0.00351, lr=0.001]
tensor(0.0069, device='cuda:0')
tensor([[0.4105],
[0.4107]], device='cuda:0')
Current Norm : tensor([0.4095, 0.4096], device='cuda:0')
Steps: 50%|████▉ | 498/1000 [05:17<05:22, 1.56it/s, loss=0.0155, lr=0.001]
Steps: 50%|████▉ | 499/1000 [05:18<05:23, 1.55it/s, loss=0.0155, lr=0.001]
Steps: 50%|████▉ | 499/1000 [05:18<05:23, 1.55it/s, loss=0.0452, lr=0.001]
Steps: 50%|█████ | 500/1000 [05:18<05:19, 1.56it/s, loss=0.0452, lr=0.001]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0147, -0.0021, -0.0056, -0.0236], device='cuda:0',
grad_fn=<SliceBackward0>)
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0072, 0.0016, -0.0013, 0.0088], device='cuda:0',
grad_fn=<SliceBackward0>)
Saving weights to checkpoints/step_inv_500.safetensors
tensor(0.0015, device='cuda:0')
tensor([[0.4105],
[0.4106]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4096], device='cuda:0')
Steps: 50%|█████ | 500/1000 [05:18<05:19, 1.56it/s, loss=0.00169, lr=0.001]
Steps: 50%|█████ | 501/1000 [05:19<05:20, 1.55it/s, loss=0.00169, lr=0.001]
Steps: 50%|█████ | 501/1000 [05:19<05:20, 1.55it/s, loss=0.000385, lr=0.001]
Steps: 50%|█████ | 502/1000 [05:20<05:18, 1.56it/s, loss=0.000385, lr=0.001]
tensor(0.0053, device='cuda:0')
tensor([[0.4104],
[0.4106]], device='cuda:0')
Current Norm : tensor([0.4093, 0.4095], device='cuda:0')
Steps: 50%|█████ | 502/1000 [05:20<05:18, 1.56it/s, loss=0.004, lr=0.001]
Steps: 50%|█████ | 503/1000 [05:20<05:20, 1.55it/s, loss=0.004, lr=0.001]
Steps: 50%|█████ | 503/1000 [05:20<05:20, 1.55it/s, loss=0.0547, lr=0.001]
Steps: 50%|█████ | 504/1000 [05:21<05:17, 1.56it/s, loss=0.0547, lr=0.001]
tensor(0.0032, device='cuda:0')
tensor([[0.4103],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4092, 0.4094], device='cuda:0')
Steps: 50%|█████ | 504/1000 [05:21<05:17, 1.56it/s, loss=0.000724, lr=0.001]
Steps: 50%|█████ | 505/1000 [05:22<05:17, 1.56it/s, loss=0.000724, lr=0.001]
Steps: 50%|█████ | 505/1000 [05:22<05:17, 1.56it/s, loss=0.00276, lr=0.001]
Steps: 51%|█████ | 506/1000 [05:22<05:14, 1.57it/s, loss=0.00276, lr=0.001]
tensor(0.0009, device='cuda:0')
tensor([[0.4101],
[0.4104]], device='cuda:0')
Current Norm : tensor([0.4091, 0.4093], device='cuda:0')
Steps: 51%|█████ | 506/1000 [05:22<05:14, 1.57it/s, loss=0.000694, lr=0.001]
Steps: 51%|█████ | 507/1000 [05:23<05:16, 1.56it/s, loss=0.000694, lr=0.001]
Steps: 51%|█████ | 507/1000 [05:23<05:16, 1.56it/s, loss=0.000989, lr=0.001]
Steps: 51%|█████ | 508/1000 [05:23<05:13, 1.57it/s, loss=0.000989, lr=0.001]
tensor(0.0065, device='cuda:0')
tensor([[0.4100],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4090, 0.4092], device='cuda:0')
Steps: 51%|█████ | 508/1000 [05:23<05:13, 1.57it/s, loss=0.00329, lr=0.001]
Steps: 51%|█████ | 509/1000 [05:24<05:15, 1.56it/s, loss=0.00329, lr=0.001]
Steps: 51%|█████ | 509/1000 [05:24<05:15, 1.56it/s, loss=0.0359, lr=0.001]
Steps: 51%|█████ | 510/1000 [05:25<05:11, 1.57it/s, loss=0.0359, lr=0.001]
tensor(0.0034, device='cuda:0')
tensor([[0.4098],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4088, 0.4091], device='cuda:0')
Steps: 51%|█████ | 510/1000 [05:25<05:11, 1.57it/s, loss=0.0109, lr=0.001]
Steps: 51%|█████ | 511/1000 [05:25<05:15, 1.55it/s, loss=0.0109, lr=0.001]
Steps: 51%|█████ | 511/1000 [05:25<05:15, 1.55it/s, loss=0.005, lr=0.001]
Steps: 51%|█████ | 512/1000 [05:26<05:12, 1.56it/s, loss=0.005, lr=0.001]
tensor(0.0047, device='cuda:0')
tensor([[0.4096],
[0.4100]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4090], device='cuda:0')
Steps: 51%|█████ | 512/1000 [05:26<05:12, 1.56it/s, loss=0.152, lr=0.001]
Steps: 51%|█████▏ | 513/1000 [05:27<05:17, 1.54it/s, loss=0.152, lr=0.001]
Steps: 51%|█████▏ | 513/1000 [05:27<05:17, 1.54it/s, loss=0.00354, lr=0.001]
Steps: 51%|█████▏ | 514/1000 [05:27<05:12, 1.56it/s, loss=0.00354, lr=0.001]
tensor(0.0086, device='cuda:0')
tensor([[0.4094],
[0.4098]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4088], device='cuda:0')
Steps: 51%|█████▏ | 514/1000 [05:27<05:12, 1.56it/s, loss=0.0389, lr=0.001]
Steps: 52%|█████▏ | 515/1000 [05:28<05:13, 1.54it/s, loss=0.0389, lr=0.001]
Steps: 52%|█████▏ | 515/1000 [05:28<05:13, 1.54it/s, loss=0.0244, lr=0.001]
Steps: 52%|█████▏ | 516/1000 [05:29<05:11, 1.56it/s, loss=0.0244, lr=0.001]
tensor(0.0010, device='cuda:0')
tensor([[0.4093],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4087], device='cuda:0')
Steps: 52%|█████▏ | 516/1000 [05:29<05:11, 1.56it/s, loss=0.0105, lr=0.001]
Steps: 52%|█████▏ | 517/1000 [05:29<05:12, 1.55it/s, loss=0.0105, lr=0.001]
Steps: 52%|█████▏ | 517/1000 [05:29<05:12, 1.55it/s, loss=0.000476, lr=0.001]
Steps: 52%|█████▏ | 518/1000 [05:30<05:10, 1.55it/s, loss=0.000476, lr=0.001]
tensor(0.0080, device='cuda:0')
tensor([[0.4091],
[0.4095]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4086], device='cuda:0')
Steps: 52%|█████▏ | 518/1000 [05:30<05:10, 1.55it/s, loss=0.00252, lr=0.001]
Steps: 52%|█████▏ | 519/1000 [05:31<05:11, 1.55it/s, loss=0.00252, lr=0.001]
Steps: 52%|█████▏ | 519/1000 [05:31<05:11, 1.55it/s, loss=0.015, lr=0.001]
Steps: 52%|█████▏ | 520/1000 [05:31<05:06, 1.56it/s, loss=0.015, lr=0.001]
tensor(0.0011, device='cuda:0')
tensor([[0.4090],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4085], device='cuda:0')
Steps: 52%|█████▏ | 520/1000 [05:31<05:06, 1.56it/s, loss=0.000183, lr=0.001]
Steps: 52%|█████▏ | 521/1000 [05:32<05:09, 1.55it/s, loss=0.000183, lr=0.001]
Steps: 52%|█████▏ | 521/1000 [05:32<05:09, 1.55it/s, loss=0.00404, lr=0.001]
Steps: 52%|█████▏ | 522/1000 [05:32<05:06, 1.56it/s, loss=0.00404, lr=0.001]
tensor(0.0048, device='cuda:0')
tensor([[0.4088],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4079, 0.4084], device='cuda:0')
Steps: 52%|█████▏ | 522/1000 [05:32<05:06, 1.56it/s, loss=0.00752, lr=0.001]
Steps: 52%|█████▏ | 523/1000 [05:33<05:06, 1.56it/s, loss=0.00752, lr=0.001]
Steps: 52%|█████▏ | 523/1000 [05:33<05:06, 1.56it/s, loss=0.0113, lr=0.001]
Steps: 52%|█████▏ | 524/1000 [05:34<05:02, 1.57it/s, loss=0.0113, lr=0.001]
tensor(0.0018, device='cuda:0')
tensor([[0.4086],
[0.4091]], device='cuda:0')
Current Norm : tensor([0.4078, 0.4082], device='cuda:0')
Steps: 52%|█████▏ | 524/1000 [05:34<05:02, 1.57it/s, loss=0.00317, lr=0.001]
Steps: 52%|█████▎ | 525/1000 [05:34<05:05, 1.55it/s, loss=0.00317, lr=0.001]
Steps: 52%|█████▎ | 525/1000 [05:34<05:05, 1.55it/s, loss=0.000528, lr=0.001]
Steps: 53%|█████▎ | 526/1000 [05:35<05:02, 1.57it/s, loss=0.000528, lr=0.001]
tensor(0.0029, device='cuda:0')
tensor([[0.4084],
[0.4090]], device='cuda:0')
Current Norm : tensor([0.4076, 0.4081], device='cuda:0')
Steps: 53%|█████▎ | 526/1000 [05:35<05:02, 1.57it/s, loss=0.000145, lr=0.001]
Steps: 53%|█████▎ | 527/1000 [05:36<05:03, 1.56it/s, loss=0.000145, lr=0.001]
Steps: 53%|█████▎ | 527/1000 [05:36<05:03, 1.56it/s, loss=0.0233, lr=0.001]
Steps: 53%|█████▎ | 528/1000 [05:36<04:59, 1.57it/s, loss=0.0233, lr=0.001]
tensor(0.0051, device='cuda:0')
tensor([[0.4083],
[0.4088]], device='cuda:0')
Current Norm : tensor([0.4074, 0.4080], device='cuda:0')
Steps: 53%|█████▎ | 528/1000 [05:36<04:59, 1.57it/s, loss=0.000492, lr=0.001]
Steps: 53%|█████▎ | 529/1000 [05:37<05:01, 1.56it/s, loss=0.000492, lr=0.001]
Steps: 53%|█████▎ | 529/1000 [05:37<05:01, 1.56it/s, loss=0.0405, lr=0.001]
Steps: 53%|█████▎ | 530/1000 [05:38<04:57, 1.58it/s, loss=0.0405, lr=0.001]
tensor(0.0062, device='cuda:0')
tensor([[0.4081],
[0.4087]], device='cuda:0')
Current Norm : tensor([0.4073, 0.4078], device='cuda:0')
Steps: 53%|█████▎ | 530/1000 [05:38<04:57, 1.58it/s, loss=0.00214, lr=0.001]
Steps: 53%|█████▎ | 531/1000 [05:38<04:59, 1.57it/s, loss=0.00214, lr=0.001]
Steps: 53%|█████▎ | 531/1000 [05:38<04:59, 1.57it/s, loss=0.0452, lr=0.001]
Steps: 53%|█████▎ | 532/1000 [05:39<04:56, 1.58it/s, loss=0.0452, lr=0.001]
tensor(0.0072, device='cuda:0')
tensor([[0.4080],
[0.4086]], device='cuda:0')
Current Norm : tensor([0.4072, 0.4077], device='cuda:0')
Steps: 53%|█████▎ | 532/1000 [05:39<04:56, 1.58it/s, loss=0.0139, lr=0.001]
Steps: 53%|█████▎ | 533/1000 [05:39<04:57, 1.57it/s, loss=0.0139, lr=0.001]
Steps: 53%|█████▎ | 533/1000 [05:39<04:57, 1.57it/s, loss=0.0151, lr=0.001]
Steps: 53%|█████▎ | 534/1000 [05:40<04:55, 1.58it/s, loss=0.0151, lr=0.001]
tensor(0.0065, device='cuda:0')
tensor([[0.4078],
[0.4086]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4077], device='cuda:0')
Steps: 53%|█████▎ | 534/1000 [05:40<04:55, 1.58it/s, loss=0.021, lr=0.001]
Steps: 54%|█████▎ | 535/1000 [05:41<04:57, 1.56it/s, loss=0.021, lr=0.001]
Steps: 54%|█████▎ | 535/1000 [05:41<04:57, 1.56it/s, loss=0.00169, lr=0.001]
Steps: 54%|█████▎ | 536/1000 [05:41<04:55, 1.57it/s, loss=0.00169, lr=0.001]
tensor(0.0003, device='cuda:0')
tensor([[0.4077],
[0.4085]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4077], device='cuda:0')
Steps: 54%|█████▎ | 536/1000 [05:41<04:55, 1.57it/s, loss=0.000928, lr=0.001]
Steps: 54%|█████▎ | 537/1000 [05:42<04:57, 1.56it/s, loss=0.000928, lr=0.001]
Steps: 54%|█████▎ | 537/1000 [05:42<04:57, 1.56it/s, loss=0.000183, lr=0.001]
Steps: 54%|█████▍ | 538/1000 [05:43<04:53, 1.57it/s, loss=0.000183, lr=0.001]
tensor(0.0060, device='cuda:0')
tensor([[0.4075],
[0.4085]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4077], device='cuda:0')
Steps: 54%|█████▍ | 538/1000 [05:43<04:53, 1.57it/s, loss=0.0489, lr=0.001]
Steps: 54%|█████▍ | 539/1000 [05:43<04:57, 1.55it/s, loss=0.0489, lr=0.001]
Steps: 54%|█████▍ | 539/1000 [05:43<04:57, 1.55it/s, loss=0.0065, lr=0.001]
Steps: 54%|█████▍ | 540/1000 [05:44<04:53, 1.57it/s, loss=0.0065, lr=0.001]
tensor(0.0019, device='cuda:0')
tensor([[0.4074],
[0.4085]], device='cuda:0')
Current Norm : tensor([0.4067, 0.4077], device='cuda:0')
Steps: 54%|█████▍ | 540/1000 [05:44<04:53, 1.57it/s, loss=0.00161, lr=0.001]
Steps: 54%|█████▍ | 541/1000 [05:45<04:56, 1.55it/s, loss=0.00161, lr=0.001]
Steps: 54%|█████▍ | 541/1000 [05:45<04:56, 1.55it/s, loss=0.00176, lr=0.001]
Steps: 54%|█████▍ | 542/1000 [05:45<04:53, 1.56it/s, loss=0.00176, lr=0.001]
tensor(0.0021, device='cuda:0')
tensor([[0.4073],
[0.4085]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4076], device='cuda:0')
Steps: 54%|█████▍ | 542/1000 [05:45<04:53, 1.56it/s, loss=0.00181, lr=0.001]
Steps: 54%|█████▍ | 543/1000 [05:46<04:54, 1.55it/s, loss=0.00181, lr=0.001]
Steps: 54%|█████▍ | 543/1000 [05:46<04:54, 1.55it/s, loss=0.00432, lr=0.001]
Steps: 54%|█████▍ | 544/1000 [05:47<04:51, 1.56it/s, loss=0.00432, lr=0.001]
tensor(0.0034, device='cuda:0')
tensor([[0.4072],
[0.4084]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4076], device='cuda:0')
Steps: 54%|█████▍ | 544/1000 [05:47<04:51, 1.56it/s, loss=0.00219, lr=0.001]
Steps: 55%|█████▍ | 545/1000 [05:47<04:51, 1.56it/s, loss=0.00219, lr=0.001]
Steps: 55%|█████▍ | 545/1000 [05:47<04:51, 1.56it/s, loss=0.0136, lr=0.001]
Steps: 55%|█████▍ | 546/1000 [05:48<04:49, 1.57it/s, loss=0.0136, lr=0.001]
tensor(0.0046, device='cuda:0')
tensor([[0.4071],
[0.4083]], device='cuda:0')
Current Norm : tensor([0.4064, 0.4075], device='cuda:0')
Steps: 55%|█████▍ | 546/1000 [05:48<04:49, 1.57it/s, loss=0.00109, lr=0.001]
Steps: 55%|█████▍ | 547/1000 [05:48<04:49, 1.56it/s, loss=0.00109, lr=0.001]
Steps: 55%|█████▍ | 547/1000 [05:48<04:49, 1.56it/s, loss=0.021, lr=0.001]
Steps: 55%|█████▍ | 548/1000 [05:49<04:47, 1.57it/s, loss=0.021, lr=0.001]
tensor(0.0078, device='cuda:0')
tensor([[0.4071],
[0.4081]], device='cuda:0')
Current Norm : tensor([0.4064, 0.4073], device='cuda:0')
Steps: 55%|█████▍ | 548/1000 [05:49<04:47, 1.57it/s, loss=0.016, lr=0.001]
Steps: 55%|█████▍ | 549/1000 [05:50<04:49, 1.56it/s, loss=0.016, lr=0.001]
Steps: 55%|█████▍ | 549/1000 [05:50<04:49, 1.56it/s, loss=0.00611, lr=0.001]
Steps: 55%|█████▌ | 550/1000 [05:50<04:46, 1.57it/s, loss=0.00611, lr=0.001]
tensor(0.0034, device='cuda:0')
tensor([[0.4071],
[0.4079]], device='cuda:0')
Current Norm : tensor([0.4064, 0.4071], device='cuda:0')
Steps: 55%|█████▌ | 550/1000 [05:50<04:46, 1.57it/s, loss=0.00452, lr=0.001]
Steps: 55%|█████▌ | 551/1000 [05:51<04:47, 1.56it/s, loss=0.00452, lr=0.001]
Steps: 55%|█████▌ | 551/1000 [05:51<04:47, 1.56it/s, loss=0.00135, lr=0.001]
Steps: 55%|█████▌ | 552/1000 [05:52<04:47, 1.56it/s, loss=0.00135, lr=0.001]
tensor(0.0008, device='cuda:0')
tensor([[0.4071],
[0.4077]], device='cuda:0')
Current Norm : tensor([0.4064, 0.4069], device='cuda:0')
Steps: 55%|█████▌ | 552/1000 [05:52<04:47, 1.56it/s, loss=0.000583, lr=0.001]
Steps: 55%|█████▌ | 553/1000 [05:52<04:48, 1.55it/s, loss=0.000583, lr=0.001]
Steps: 55%|█████▌ | 553/1000 [05:52<04:48, 1.55it/s, loss=0.000458, lr=0.001]
Steps: 55%|█████▌ | 554/1000 [05:53<04:46, 1.56it/s, loss=0.000458, lr=0.001]
tensor(0.0056, device='cuda:0')
tensor([[0.4072],
[0.4075]], device='cuda:0')
Current Norm : tensor([0.4064, 0.4068], device='cuda:0')
Steps: 55%|█████▌ | 554/1000 [05:53<04:46, 1.56it/s, loss=0.00163, lr=0.001]
Steps: 56%|█████▌ | 555/1000 [05:54<04:47, 1.55it/s, loss=0.00163, lr=0.001]
Steps: 56%|█████▌ | 555/1000 [05:54<04:47, 1.55it/s, loss=0.076, lr=0.001]
Steps: 56%|█████▌ | 556/1000 [05:54<04:44, 1.56it/s, loss=0.076, lr=0.001]
tensor(0.0030, device='cuda:0')
tensor([[0.4072],
[0.4074]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4066], device='cuda:0')
Steps: 56%|█████▌ | 556/1000 [05:54<04:44, 1.56it/s, loss=0.00438, lr=0.001]
Steps: 56%|█████▌ | 557/1000 [05:55<04:45, 1.55it/s, loss=0.00438, lr=0.001]
Steps: 56%|█████▌ | 557/1000 [05:55<04:45, 1.55it/s, loss=0.00992, lr=0.001]
Steps: 56%|█████▌ | 558/1000 [05:55<04:41, 1.57it/s, loss=0.00992, lr=0.001]
tensor(0.0045, device='cuda:0')
tensor([[0.4072],
[0.4073]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4066], device='cuda:0')
Steps: 56%|█████▌ | 558/1000 [05:55<04:41, 1.57it/s, loss=0.00724, lr=0.001]
Steps: 56%|█████▌ | 559/1000 [05:56<04:42, 1.56it/s, loss=0.00724, lr=0.001]
Steps: 56%|█████▌ | 559/1000 [05:56<04:42, 1.56it/s, loss=0.0981, lr=0.001]
Steps: 56%|█████▌ | 560/1000 [05:57<04:39, 1.58it/s, loss=0.0981, lr=0.001]
tensor(0.0098, device='cuda:0')
tensor([[0.4072],
[0.4073]], device='cuda:0')
Current Norm : tensor([0.4064, 0.4066], device='cuda:0')
Steps: 56%|█████▌ | 560/1000 [05:57<04:39, 1.58it/s, loss=0.0443, lr=0.001]
Steps: 56%|█████▌ | 561/1000 [05:57<04:40, 1.56it/s, loss=0.0443, lr=0.001]
Steps: 56%|█████▌ | 561/1000 [05:57<04:40, 1.56it/s, loss=0.00534, lr=0.001]
Steps: 56%|█████▌ | 562/1000 [05:58<04:37, 1.58it/s, loss=0.00534, lr=0.001]
tensor(0.0065, device='cuda:0')
tensor([[0.4072],
[0.4073]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4066], device='cuda:0')
Steps: 56%|█████▌ | 562/1000 [05:58<04:37, 1.58it/s, loss=0.000203, lr=0.001]
Steps: 56%|█████▋ | 563/1000 [05:59<04:38, 1.57it/s, loss=0.000203, lr=0.001]
Steps: 56%|█████▋ | 563/1000 [05:59<04:38, 1.57it/s, loss=0.0275, lr=0.001]
Steps: 56%|█████▋ | 564/1000 [05:59<04:36, 1.58it/s, loss=0.0275, lr=0.001]
tensor(0.0055, device='cuda:0')
tensor([[0.4073],
[0.4074]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4067], device='cuda:0')
Steps: 56%|█████▋ | 564/1000 [05:59<04:36, 1.58it/s, loss=0.0524, lr=0.001]
Steps: 56%|█████▋ | 565/1000 [06:00<04:37, 1.57it/s, loss=0.0524, lr=0.001]
Steps: 56%|█████▋ | 565/1000 [06:00<04:37, 1.57it/s, loss=0.0067, lr=0.001]
Steps: 57%|█████▋ | 566/1000 [06:01<04:34, 1.58it/s, loss=0.0067, lr=0.001]
tensor(0.0110, device='cuda:0')
tensor([[0.4073],
[0.4074]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4067], device='cuda:0')
Steps: 57%|█████▋ | 566/1000 [06:01<04:34, 1.58it/s, loss=0.00774, lr=0.001]
Steps: 57%|█████▋ | 567/1000 [06:01<04:36, 1.57it/s, loss=0.00774, lr=0.001]
Steps: 57%|█████▋ | 567/1000 [06:01<04:36, 1.57it/s, loss=0.00531, lr=0.001]
Steps: 57%|█████▋ | 568/1000 [06:02<04:32, 1.58it/s, loss=0.00531, lr=0.001]
tensor(0.0025, device='cuda:0')
tensor([[0.4073],
[0.4075]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4068], device='cuda:0')
Steps: 57%|█████▋ | 568/1000 [06:02<04:32, 1.58it/s, loss=0.00398, lr=0.001]
Steps: 57%|█████▋ | 569/1000 [06:02<04:34, 1.57it/s, loss=0.00398, lr=0.001]
Steps: 57%|█████▋ | 569/1000 [06:02<04:34, 1.57it/s, loss=0.000125, lr=0.001]
Steps: 57%|█████▋ | 570/1000 [06:03<04:32, 1.58it/s, loss=0.000125, lr=0.001]
tensor(0.0031, device='cuda:0')
tensor([[0.4073],
[0.4076]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4068], device='cuda:0')
Steps: 57%|█████▋ | 570/1000 [06:03<04:32, 1.58it/s, loss=0.00406, lr=0.001]
Steps: 57%|█████▋ | 571/1000 [06:04<04:34, 1.57it/s, loss=0.00406, lr=0.001]
Steps: 57%|█████▋ | 571/1000 [06:04<04:34, 1.57it/s, loss=0.000723, lr=0.001]
Steps: 57%|█████▋ | 572/1000 [06:04<04:31, 1.57it/s, loss=0.000723, lr=0.001]
tensor(0.0049, device='cuda:0')
tensor([[0.4072],
[0.4077]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4069], device='cuda:0')
Steps: 57%|█████▋ | 572/1000 [06:04<04:31, 1.57it/s, loss=0.0526, lr=0.001]
Steps: 57%|█████▋ | 573/1000 [06:05<04:33, 1.56it/s, loss=0.0526, lr=0.001]
Steps: 57%|█████▋ | 573/1000 [06:05<04:33, 1.56it/s, loss=0.000955, lr=0.001]
Steps: 57%|█████▋ | 574/1000 [06:06<04:30, 1.57it/s, loss=0.000955, lr=0.001]
tensor(0.0071, device='cuda:0')
tensor([[0.4071],
[0.4078]], device='cuda:0')
Current Norm : tensor([0.4064, 0.4070], device='cuda:0')
Steps: 57%|█████▋ | 574/1000 [06:06<04:30, 1.57it/s, loss=0.0284, lr=0.001]
Steps: 57%|█████▊ | 575/1000 [06:06<04:31, 1.56it/s, loss=0.0284, lr=0.001]
Steps: 57%|█████▊ | 575/1000 [06:06<04:31, 1.56it/s, loss=0.00668, lr=0.001]
Steps: 58%|█████▊ | 576/1000 [06:07<04:28, 1.58it/s, loss=0.00668, lr=0.001]
tensor(0.0083, device='cuda:0')
tensor([[0.4070],
[0.4079]], device='cuda:0')
Current Norm : tensor([0.4063, 0.4071], device='cuda:0')
Steps: 58%|█████▊ | 576/1000 [06:07<04:28, 1.58it/s, loss=0.0343, lr=0.001]
Steps: 58%|█████▊ | 577/1000 [06:08<04:29, 1.57it/s, loss=0.0343, lr=0.001]
Steps: 58%|█████▊ | 577/1000 [06:08<04:29, 1.57it/s, loss=0.0111, lr=0.001]
Steps: 58%|█████▊ | 578/1000 [06:08<04:26, 1.58it/s, loss=0.0111, lr=0.001]
tensor(0.0077, device='cuda:0')
tensor([[0.4070],
[0.4081]], device='cuda:0')
Current Norm : tensor([0.4063, 0.4073], device='cuda:0')
Steps: 58%|█████▊ | 578/1000 [06:08<04:26, 1.58it/s, loss=0.0138, lr=0.001]
Steps: 58%|█████▊ | 579/1000 [06:09<04:28, 1.57it/s, loss=0.0138, lr=0.001]
Steps: 58%|█████▊ | 579/1000 [06:09<04:28, 1.57it/s, loss=0.0497, lr=0.001]
Steps: 58%|█████▊ | 580/1000 [06:09<04:26, 1.58it/s, loss=0.0497, lr=0.001]
tensor(0.0063, device='cuda:0')
tensor([[0.4071],
[0.4083]], device='cuda:0')
Current Norm : tensor([0.4064, 0.4075], device='cuda:0')
Steps: 58%|█████▊ | 580/1000 [06:09<04:26, 1.58it/s, loss=0.000655, lr=0.001]
Steps: 58%|█████▊ | 581/1000 [06:10<04:27, 1.57it/s, loss=0.000655, lr=0.001]
Steps: 58%|█████▊ | 581/1000 [06:10<04:27, 1.57it/s, loss=0.013, lr=0.001]
Steps: 58%|█████▊ | 582/1000 [06:11<04:24, 1.58it/s, loss=0.013, lr=0.001]
tensor(0.0032, device='cuda:0')
tensor([[0.4072],
[0.4085]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4077], device='cuda:0')
Steps: 58%|█████▊ | 582/1000 [06:11<04:24, 1.58it/s, loss=0.0135, lr=0.001]
Steps: 58%|█████▊ | 583/1000 [06:11<04:26, 1.56it/s, loss=0.0135, lr=0.001]
Steps: 58%|█████▊ | 583/1000 [06:11<04:26, 1.56it/s, loss=0.0109, lr=0.001]
Steps: 58%|█████▊ | 584/1000 [06:12<04:24, 1.57it/s, loss=0.0109, lr=0.001]
tensor(0.0014, device='cuda:0')
tensor([[0.4073],
[0.4088]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4079], device='cuda:0')
Steps: 58%|█████▊ | 584/1000 [06:12<04:24, 1.57it/s, loss=0.00615, lr=0.001]
Steps: 58%|█████▊ | 585/1000 [06:13<04:25, 1.56it/s, loss=0.00615, lr=0.001]
Steps: 58%|█████▊ | 585/1000 [06:13<04:25, 1.56it/s, loss=0.0127, lr=0.001]
Steps: 59%|█████▊ | 586/1000 [06:13<04:22, 1.57it/s, loss=0.0127, lr=0.001]
tensor(0.0024, device='cuda:0')
tensor([[0.4074],
[0.4090]], device='cuda:0')
Current Norm : tensor([0.4067, 0.4081], device='cuda:0')
Steps: 59%|█████▊ | 586/1000 [06:13<04:22, 1.57it/s, loss=0.000324, lr=0.001]
Steps: 59%|█████▊ | 587/1000 [06:14<04:24, 1.56it/s, loss=0.000324, lr=0.001]
Steps: 59%|█████▊ | 587/1000 [06:14<04:24, 1.56it/s, loss=0.013, lr=0.001]
Steps: 59%|█████▉ | 588/1000 [06:15<04:21, 1.57it/s, loss=0.013, lr=0.001]
tensor(0.0055, device='cuda:0')
tensor([[0.4075],
[0.4092]], device='cuda:0')
Current Norm : tensor([0.4067, 0.4083], device='cuda:0')
Steps: 59%|█████▉ | 588/1000 [06:15<04:21, 1.57it/s, loss=0.00368, lr=0.001]
Steps: 59%|█████▉ | 589/1000 [06:15<04:23, 1.56it/s, loss=0.00368, lr=0.001]
Steps: 59%|█████▉ | 589/1000 [06:15<04:23, 1.56it/s, loss=0.00373, lr=0.001]
Steps: 59%|█████▉ | 590/1000 [06:16<04:20, 1.57it/s, loss=0.00373, lr=0.001]
tensor(0.0051, device='cuda:0')
tensor([[0.4076],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4084], device='cuda:0')
Steps: 59%|█████▉ | 590/1000 [06:16<04:20, 1.57it/s, loss=0.00137, lr=0.001]
Steps: 59%|█████▉ | 591/1000 [06:16<04:20, 1.57it/s, loss=0.00137, lr=0.001]
Steps: 59%|█████▉ | 591/1000 [06:16<04:20, 1.57it/s, loss=0.0131, lr=0.001]
Steps: 59%|█████▉ | 592/1000 [06:17<04:18, 1.58it/s, loss=0.0131, lr=0.001]
tensor(0.0034, device='cuda:0')
tensor([[0.4076],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4085], device='cuda:0')
Steps: 59%|█████▉ | 592/1000 [06:17<04:18, 1.58it/s, loss=0.014, lr=0.001]
Steps: 59%|█████▉ | 593/1000 [06:18<04:19, 1.57it/s, loss=0.014, lr=0.001]
Steps: 59%|█████▉ | 593/1000 [06:18<04:19, 1.57it/s, loss=0.000151, lr=0.001]
Steps: 59%|█████▉ | 594/1000 [06:18<04:16, 1.58it/s, loss=0.000151, lr=0.001]
tensor(0.0055, device='cuda:0')
tensor([[0.4077],
[0.4095]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4085], device='cuda:0')
Steps: 59%|█████▉ | 594/1000 [06:18<04:16, 1.58it/s, loss=0.0421, lr=0.001]
Steps: 60%|█████▉ | 595/1000 [06:19<04:18, 1.57it/s, loss=0.0421, lr=0.001]
Steps: 60%|█████▉ | 595/1000 [06:19<04:18, 1.57it/s, loss=0.000933, lr=0.001]
Steps: 60%|█████▉ | 596/1000 [06:20<04:16, 1.57it/s, loss=0.000933, lr=0.001]
tensor(0.0031, device='cuda:0')
tensor([[0.4077],
[0.4096]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4086], device='cuda:0')
Steps: 60%|█████▉ | 596/1000 [06:20<04:16, 1.57it/s, loss=0.00709, lr=0.001]
Steps: 60%|█████▉ | 597/1000 [06:20<04:19, 1.55it/s, loss=0.00709, lr=0.001]
Steps: 60%|█████▉ | 597/1000 [06:20<04:19, 1.55it/s, loss=0.000454, lr=0.001]
Steps: 60%|█████▉ | 598/1000 [06:21<04:15, 1.57it/s, loss=0.000454, lr=0.001]
tensor(0.0058, device='cuda:0')
tensor([[0.4078],
[0.4096]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4087], device='cuda:0')
Steps: 60%|█████▉ | 598/1000 [06:21<04:15, 1.57it/s, loss=0.0658, lr=0.001]
Steps: 60%|█████▉ | 599/1000 [06:22<04:17, 1.56it/s, loss=0.0658, lr=0.001]
Steps: 60%|█████▉ | 599/1000 [06:22<04:17, 1.56it/s, loss=0.00331, lr=0.001]
Steps: 60%|██████ | 600/1000 [06:22<04:14, 1.57it/s, loss=0.00331, lr=0.001]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0173, -0.0068, -0.0072, -0.0267], device='cuda:0',
grad_fn=<SliceBackward0>)
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0088, 0.0018, -0.0002, 0.0101], device='cuda:0',
grad_fn=<SliceBackward0>)
Saving weights to checkpoints/step_inv_600.safetensors
tensor(0.0059, device='cuda:0')
tensor([[0.4078],
[0.4096]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4086], device='cuda:0')
Steps: 60%|██████ | 600/1000 [06:22<04:14, 1.57it/s, loss=0.000187, lr=0.001]
Steps: 60%|██████ | 601/1000 [06:23<04:17, 1.55it/s, loss=0.000187, lr=0.001]
Steps: 60%|██████ | 601/1000 [06:23<04:17, 1.55it/s, loss=0.0136, lr=0.001]
Steps: 60%|██████ | 602/1000 [06:23<04:13, 1.57it/s, loss=0.0136, lr=0.001]
tensor(0.0059, device='cuda:0')
tensor([[0.4078],
[0.4095]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4086], device='cuda:0')
Steps: 60%|██████ | 602/1000 [06:23<04:13, 1.57it/s, loss=0.0572, lr=0.001]
Steps: 60%|██████ | 603/1000 [06:24<04:14, 1.56it/s, loss=0.0572, lr=0.001]
Steps: 60%|██████ | 603/1000 [06:24<04:14, 1.56it/s, loss=0.0157, lr=0.001]
Steps: 60%|██████ | 604/1000 [06:25<04:10, 1.58it/s, loss=0.0157, lr=0.001]
tensor(0.0027, device='cuda:0')
tensor([[0.4078],
[0.4095]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4085], device='cuda:0')
Steps: 60%|██████ | 604/1000 [06:25<04:10, 1.58it/s, loss=0.00394, lr=0.001]
Steps: 60%|██████ | 605/1000 [06:25<04:11, 1.57it/s, loss=0.00394, lr=0.001]
Steps: 60%|██████ | 605/1000 [06:25<04:11, 1.57it/s, loss=0.00151, lr=0.001]
Steps: 61%|██████ | 606/1000 [06:26<04:09, 1.58it/s, loss=0.00151, lr=0.001]
tensor(0.0052, device='cuda:0')
tensor([[0.4078],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4084], device='cuda:0')
Steps: 61%|██████ | 606/1000 [06:26<04:09, 1.58it/s, loss=0.0347, lr=0.001]
Steps: 61%|██████ | 607/1000 [06:27<04:11, 1.56it/s, loss=0.0347, lr=0.001]
Steps: 61%|██████ | 607/1000 [06:27<04:11, 1.56it/s, loss=0.00051, lr=0.001]
Steps: 61%|██████ | 608/1000 [06:27<04:08, 1.58it/s, loss=0.00051, lr=0.001]
tensor(0.0018, device='cuda:0')
tensor([[0.4077],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4084], device='cuda:0')
Steps: 61%|██████ | 608/1000 [06:27<04:08, 1.58it/s, loss=0.0035, lr=0.001]
Steps: 61%|██████ | 609/1000 [06:28<04:10, 1.56it/s, loss=0.0035, lr=0.001]
Steps: 61%|██████ | 609/1000 [06:28<04:10, 1.56it/s, loss=0.00137, lr=0.001]
Steps: 61%|██████ | 610/1000 [06:29<04:07, 1.58it/s, loss=0.00137, lr=0.001]
tensor(0.0033, device='cuda:0')
tensor([[0.4077],
[0.4092]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4083], device='cuda:0')
Steps: 61%|██████ | 610/1000 [06:29<04:07, 1.58it/s, loss=0.00724, lr=0.001]
Steps: 61%|██████ | 611/1000 [06:29<04:08, 1.57it/s, loss=0.00724, lr=0.001]
Steps: 61%|██████ | 611/1000 [06:29<04:08, 1.57it/s, loss=0.00916, lr=0.001]
Steps: 61%|██████ | 612/1000 [06:30<04:05, 1.58it/s, loss=0.00916, lr=0.001]
tensor(0.0041, device='cuda:0')
tensor([[0.4076],
[0.4091]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4082], device='cuda:0')
Steps: 61%|██████ | 612/1000 [06:30<04:05, 1.58it/s, loss=0.00901, lr=0.001]
Steps: 61%|██████▏ | 613/1000 [06:30<04:07, 1.56it/s, loss=0.00901, lr=0.001]
Steps: 61%|██████▏ | 613/1000 [06:30<04:07, 1.56it/s, loss=0.0118, lr=0.001]
Steps: 61%|██████▏ | 614/1000 [06:31<04:04, 1.58it/s, loss=0.0118, lr=0.001]
tensor(0.0040, device='cuda:0')
tensor([[0.4076],
[0.4090]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4081], device='cuda:0')
Steps: 61%|██████▏ | 614/1000 [06:31<04:04, 1.58it/s, loss=0.000786, lr=0.001]
Steps: 62%|██████▏ | 615/1000 [06:32<04:05, 1.57it/s, loss=0.000786, lr=0.001]
Steps: 62%|██████▏ | 615/1000 [06:32<04:05, 1.57it/s, loss=0.126, lr=0.001]
Steps: 62%|██████▏ | 616/1000 [06:32<04:03, 1.58it/s, loss=0.126, lr=0.001]
tensor(0.0037, device='cuda:0')
tensor([[0.4075],
[0.4089]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4080], device='cuda:0')
Steps: 62%|██████▏ | 616/1000 [06:32<04:03, 1.58it/s, loss=0.00252, lr=0.001]
Steps: 62%|██████▏ | 617/1000 [06:33<04:05, 1.56it/s, loss=0.00252, lr=0.001]
Steps: 62%|██████▏ | 617/1000 [06:33<04:05, 1.56it/s, loss=0.0111, lr=0.001]
Steps: 62%|██████▏ | 618/1000 [06:34<04:02, 1.58it/s, loss=0.0111, lr=0.001]
tensor(0.0063, device='cuda:0')
tensor([[0.4075],
[0.4087]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4078], device='cuda:0')
Steps: 62%|██████▏ | 618/1000 [06:34<04:02, 1.58it/s, loss=0.00777, lr=0.001]
Steps: 62%|██████▏ | 619/1000 [06:34<04:03, 1.56it/s, loss=0.00777, lr=0.001]
Steps: 62%|██████▏ | 619/1000 [06:34<04:03, 1.56it/s, loss=0.00577, lr=0.001]
Steps: 62%|██████▏ | 620/1000 [06:35<04:01, 1.57it/s, loss=0.00577, lr=0.001]
tensor(0.0078, device='cuda:0')
tensor([[0.4075],
[0.4085]], device='cuda:0')
Current Norm : tensor([0.4067, 0.4077], device='cuda:0')
Steps: 62%|██████▏ | 620/1000 [06:35<04:01, 1.57it/s, loss=0.0139, lr=0.001]
Steps: 62%|██████▏ | 621/1000 [06:36<04:02, 1.56it/s, loss=0.0139, lr=0.001]
Steps: 62%|██████▏ | 621/1000 [06:36<04:02, 1.56it/s, loss=0.0724, lr=0.001]
Steps: 62%|██████▏ | 622/1000 [06:36<04:00, 1.57it/s, loss=0.0724, lr=0.001]
tensor(0.0048, device='cuda:0')
tensor([[0.4075],
[0.4084]], device='cuda:0')
Current Norm : tensor([0.4067, 0.4075], device='cuda:0')
Steps: 62%|██████▏ | 622/1000 [06:36<04:00, 1.57it/s, loss=0.000151, lr=0.001]
Steps: 62%|██████▏ | 623/1000 [06:37<04:01, 1.56it/s, loss=0.000151, lr=0.001]
Steps: 62%|██████▏ | 623/1000 [06:37<04:01, 1.56it/s, loss=0.0327, lr=0.001]
Steps: 62%|██████▏ | 624/1000 [06:37<03:58, 1.57it/s, loss=0.0327, lr=0.001]
tensor(0.0038, device='cuda:0')
tensor([[0.4074],
[0.4082]], device='cuda:0')
Current Norm : tensor([0.4067, 0.4074], device='cuda:0')
Steps: 62%|██████▏ | 624/1000 [06:37<03:58, 1.57it/s, loss=0.00154, lr=0.001]
Steps: 62%|██████▎ | 625/1000 [06:38<04:00, 1.56it/s, loss=0.00154, lr=0.001]
Steps: 62%|██████▎ | 625/1000 [06:38<04:00, 1.56it/s, loss=0.00364, lr=0.001]
Steps: 63%|██████▎ | 626/1000 [06:39<03:57, 1.57it/s, loss=0.00364, lr=0.001]
tensor(0.0008, device='cuda:0')
tensor([[0.4074],
[0.4080]], device='cuda:0')
Current Norm : tensor([0.4067, 0.4072], device='cuda:0')
Steps: 63%|██████▎ | 626/1000 [06:39<03:57, 1.57it/s, loss=0.000339, lr=0.001]
Steps: 63%|██████▎ | 627/1000 [06:39<03:58, 1.56it/s, loss=0.000339, lr=0.001]
Steps: 63%|██████▎ | 627/1000 [06:39<03:58, 1.56it/s, loss=0.0116, lr=0.001]
Steps: 63%|██████▎ | 628/1000 [06:40<03:56, 1.57it/s, loss=0.0116, lr=0.001]
tensor(0.0045, device='cuda:0')
tensor([[0.4073],
[0.4079]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4071], device='cuda:0')
Steps: 63%|██████▎ | 628/1000 [06:40<03:56, 1.57it/s, loss=0.00402, lr=0.001]
Steps: 63%|██████▎ | 629/1000 [06:41<03:57, 1.56it/s, loss=0.00402, lr=0.001]
Steps: 63%|██████▎ | 629/1000 [06:41<03:57, 1.56it/s, loss=0.0083, lr=0.001]
Steps: 63%|██████▎ | 630/1000 [06:41<03:55, 1.57it/s, loss=0.0083, lr=0.001]
tensor(0.0057, device='cuda:0')
tensor([[0.4073],
[0.4077]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4069], device='cuda:0')
Steps: 63%|██████▎ | 630/1000 [06:41<03:55, 1.57it/s, loss=0.00442, lr=0.001]
Steps: 63%|██████▎ | 631/1000 [06:42<03:55, 1.57it/s, loss=0.00442, lr=0.001]
Steps: 63%|██████▎ | 631/1000 [06:42<03:55, 1.57it/s, loss=0.00108, lr=0.001]
Steps: 63%|██████▎ | 632/1000 [06:43<03:53, 1.58it/s, loss=0.00108, lr=0.001]
tensor(0.0056, device='cuda:0')
tensor([[0.4073],
[0.4075]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4068], device='cuda:0')
Steps: 63%|██████▎ | 632/1000 [06:43<03:53, 1.58it/s, loss=0.0475, lr=0.001]
Steps: 63%|██████▎ | 633/1000 [06:43<03:54, 1.57it/s, loss=0.0475, lr=0.001]
Steps: 63%|██████▎ | 633/1000 [06:43<03:54, 1.57it/s, loss=0.000338, lr=0.001]
Steps: 63%|██████▎ | 634/1000 [06:44<03:52, 1.57it/s, loss=0.000338, lr=0.001]
tensor(0.0017, device='cuda:0')
tensor([[0.4073],
[0.4074]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4067], device='cuda:0')
Steps: 63%|██████▎ | 634/1000 [06:44<03:52, 1.57it/s, loss=0.00117, lr=0.001]
Steps: 64%|██████▎ | 635/1000 [06:45<03:53, 1.56it/s, loss=0.00117, lr=0.001]
Steps: 64%|██████▎ | 635/1000 [06:45<03:53, 1.56it/s, loss=0.000606, lr=0.001]
Steps: 64%|██████▎ | 636/1000 [06:45<03:51, 1.57it/s, loss=0.000606, lr=0.001]
tensor(0.0010, device='cuda:0')
tensor([[0.4074],
[0.4073]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4066], device='cuda:0')
Steps: 64%|██████▎ | 636/1000 [06:45<03:51, 1.57it/s, loss=0.00112, lr=0.001]
Steps: 64%|██████▎ | 637/1000 [06:46<03:52, 1.56it/s, loss=0.00112, lr=0.001]
Steps: 64%|██████▎ | 637/1000 [06:46<03:52, 1.56it/s, loss=0.00041, lr=0.001]
Steps: 64%|██████▍ | 638/1000 [06:46<03:49, 1.57it/s, loss=0.00041, lr=0.001]
tensor(0.0074, device='cuda:0')
tensor([[0.4073],
[0.4072]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4065], device='cuda:0')
Steps: 64%|██████▍ | 638/1000 [06:46<03:49, 1.57it/s, loss=0.0321, lr=0.001]
Steps: 64%|██████▍ | 639/1000 [06:47<03:51, 1.56it/s, loss=0.0321, lr=0.001]
Steps: 64%|██████▍ | 639/1000 [06:47<03:51, 1.56it/s, loss=0.0054, lr=0.001]
Steps: 64%|██████▍ | 640/1000 [06:48<03:48, 1.58it/s, loss=0.0054, lr=0.001]
tensor(0.0047, device='cuda:0')
tensor([[0.4073],
[0.4071]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4064], device='cuda:0')
Steps: 64%|██████▍ | 640/1000 [06:48<03:48, 1.58it/s, loss=0.00463, lr=0.001]
Steps: 64%|██████▍ | 641/1000 [06:48<03:49, 1.57it/s, loss=0.00463, lr=0.001]
Steps: 64%|██████▍ | 641/1000 [06:48<03:49, 1.57it/s, loss=0.00217, lr=0.001]
Steps: 64%|██████▍ | 642/1000 [06:49<03:46, 1.58it/s, loss=0.00217, lr=0.001]
tensor(0.0053, device='cuda:0')
tensor([[0.4073],
[0.4070]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4063], device='cuda:0')
Steps: 64%|██████▍ | 642/1000 [06:49<03:46, 1.58it/s, loss=0.0276, lr=0.001]
Steps: 64%|██████▍ | 643/1000 [06:50<03:47, 1.57it/s, loss=0.0276, lr=0.001]
Steps: 64%|██████▍ | 643/1000 [06:50<03:47, 1.57it/s, loss=0.0123, lr=0.001]
Steps: 64%|██████▍ | 644/1000 [06:50<03:45, 1.58it/s, loss=0.0123, lr=0.001]
tensor(0.0077, device='cuda:0')
tensor([[0.4072],
[0.4069]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4062], device='cuda:0')
Steps: 64%|██████▍ | 644/1000 [06:50<03:45, 1.58it/s, loss=0.000329, lr=0.001]
Steps: 64%|██████▍ | 645/1000 [06:51<03:46, 1.57it/s, loss=0.000329, lr=0.001]
Steps: 64%|██████▍ | 645/1000 [06:51<03:46, 1.57it/s, loss=0.0394, lr=0.001]
Steps: 65%|██████▍ | 646/1000 [06:52<03:44, 1.58it/s, loss=0.0394, lr=0.001]
tensor(0.0112, device='cuda:0')
tensor([[0.4072],
[0.4069]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4062], device='cuda:0')
Steps: 65%|██████▍ | 646/1000 [06:52<03:44, 1.58it/s, loss=0.0201, lr=0.001]
Steps: 65%|██████▍ | 647/1000 [06:52<03:45, 1.57it/s, loss=0.0201, lr=0.001]
Steps: 65%|██████▍ | 647/1000 [06:52<03:45, 1.57it/s, loss=0.0107, lr=0.001]
Steps: 65%|██████▍ | 648/1000 [06:53<03:42, 1.58it/s, loss=0.0107, lr=0.001]
tensor(0.0023, device='cuda:0')
tensor([[0.4073],
[0.4070]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4063], device='cuda:0')
Steps: 65%|██████▍ | 648/1000 [06:53<03:42, 1.58it/s, loss=0.00166, lr=0.001]
Steps: 65%|██████▍ | 649/1000 [06:53<03:43, 1.57it/s, loss=0.00166, lr=0.001]
Steps: 65%|██████▍ | 649/1000 [06:53<03:43, 1.57it/s, loss=0.00491, lr=0.001]
Steps: 65%|██████▌ | 650/1000 [06:54<03:41, 1.58it/s, loss=0.00491, lr=0.001]
tensor(0.0040, device='cuda:0')
tensor([[0.4073],
[0.4071]], device='cuda:0')
Current Norm : tensor([0.4065, 0.4064], device='cuda:0')
Steps: 65%|██████▌ | 650/1000 [06:54<03:41, 1.58it/s, loss=0.00333, lr=0.001]
Steps: 65%|██████▌ | 651/1000 [06:55<03:43, 1.56it/s, loss=0.00333, lr=0.001]
Steps: 65%|██████▌ | 651/1000 [06:55<03:43, 1.56it/s, loss=0.000373, lr=0.001]
Steps: 65%|██████▌ | 652/1000 [06:55<03:42, 1.57it/s, loss=0.000373, lr=0.001]
tensor(0.0067, device='cuda:0')
tensor([[0.4073],
[0.4072]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4065], device='cuda:0')
Steps: 65%|██████▌ | 652/1000 [06:55<03:42, 1.57it/s, loss=0.0434, lr=0.001]
Steps: 65%|██████▌ | 653/1000 [06:56<03:42, 1.56it/s, loss=0.0434, lr=0.001]
Steps: 65%|██████▌ | 653/1000 [06:56<03:42, 1.56it/s, loss=0.00562, lr=0.001]
Steps: 65%|██████▌ | 654/1000 [06:57<03:39, 1.57it/s, loss=0.00562, lr=0.001]
tensor(0.0015, device='cuda:0')
tensor([[0.4073],
[0.4074]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4067], device='cuda:0')
Steps: 65%|██████▌ | 654/1000 [06:57<03:39, 1.57it/s, loss=0.000348, lr=0.001]
Steps: 66%|██████▌ | 655/1000 [06:57<03:40, 1.56it/s, loss=0.000348, lr=0.001]
Steps: 66%|██████▌ | 655/1000 [06:57<03:40, 1.56it/s, loss=0.00938, lr=0.001]
Steps: 66%|██████▌ | 656/1000 [06:58<03:37, 1.58it/s, loss=0.00938, lr=0.001]
tensor(0.0103, device='cuda:0')
tensor([[0.4074],
[0.4076]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4069], device='cuda:0')
Steps: 66%|██████▌ | 656/1000 [06:58<03:37, 1.58it/s, loss=0.0624, lr=0.001]
Steps: 66%|██████▌ | 657/1000 [06:59<03:38, 1.57it/s, loss=0.0624, lr=0.001]
Steps: 66%|██████▌ | 657/1000 [06:59<03:38, 1.57it/s, loss=0.0186, lr=0.001]
Steps: 66%|██████▌ | 658/1000 [06:59<03:36, 1.58it/s, loss=0.0186, lr=0.001]
tensor(0.0079, device='cuda:0')
tensor([[0.4075],
[0.4079]], device='cuda:0')
Current Norm : tensor([0.4067, 0.4071], device='cuda:0')
Steps: 66%|██████▌ | 658/1000 [06:59<03:36, 1.58it/s, loss=0.00454, lr=0.001]
Steps: 66%|██████▌ | 659/1000 [07:00<03:37, 1.56it/s, loss=0.00454, lr=0.001]
Steps: 66%|██████▌ | 659/1000 [07:00<03:37, 1.56it/s, loss=0.0198, lr=0.001]
Steps: 66%|██████▌ | 660/1000 [07:00<03:35, 1.58it/s, loss=0.0198, lr=0.001]
tensor(0.0076, device='cuda:0')
tensor([[0.4076],
[0.4083]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4074], device='cuda:0')
Steps: 66%|██████▌ | 660/1000 [07:00<03:35, 1.58it/s, loss=0.00106, lr=0.001]
Steps: 66%|██████▌ | 661/1000 [07:01<03:37, 1.56it/s, loss=0.00106, lr=0.001]
Steps: 66%|██████▌ | 661/1000 [07:01<03:37, 1.56it/s, loss=0.0202, lr=0.001]
Steps: 66%|██████▌ | 662/1000 [07:02<03:37, 1.56it/s, loss=0.0202, lr=0.001]
tensor(0.0060, device='cuda:0')
tensor([[0.4076],
[0.4086]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4077], device='cuda:0')
Steps: 66%|██████▌ | 662/1000 [07:02<03:37, 1.56it/s, loss=0.0313, lr=0.001]
Steps: 66%|██████▋ | 663/1000 [07:02<03:38, 1.54it/s, loss=0.0313, lr=0.001]
Steps: 66%|██████▋ | 663/1000 [07:02<03:38, 1.54it/s, loss=0.00359, lr=0.001]
Steps: 66%|██████▋ | 664/1000 [07:03<03:35, 1.56it/s, loss=0.00359, lr=0.001]
tensor(0.0071, device='cuda:0')
tensor([[0.4077],
[0.4088]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4079], device='cuda:0')
Steps: 66%|██████▋ | 664/1000 [07:03<03:35, 1.56it/s, loss=0.00739, lr=0.001]
Steps: 66%|██████▋ | 665/1000 [07:04<03:36, 1.55it/s, loss=0.00739, lr=0.001]
Steps: 66%|██████▋ | 665/1000 [07:04<03:36, 1.55it/s, loss=0.0146, lr=0.001]
Steps: 67%|██████▋ | 666/1000 [07:04<03:33, 1.56it/s, loss=0.0146, lr=0.001]
tensor(0.0082, device='cuda:0')
tensor([[0.4079],
[0.4090]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4081], device='cuda:0')
Steps: 67%|██████▋ | 666/1000 [07:04<03:33, 1.56it/s, loss=0.00113, lr=0.001]
Steps: 67%|██████▋ | 667/1000 [07:05<03:33, 1.56it/s, loss=0.00113, lr=0.001]
Steps: 67%|██████▋ | 667/1000 [07:05<03:33, 1.56it/s, loss=0.0186, lr=0.001]
Steps: 67%|██████▋ | 668/1000 [07:06<03:31, 1.57it/s, loss=0.0186, lr=0.001]
tensor(0.0055, device='cuda:0')
tensor([[0.4081],
[0.4092]], device='cuda:0')
Current Norm : tensor([0.4073, 0.4082], device='cuda:0')
Steps: 67%|██████▋ | 668/1000 [07:06<03:31, 1.57it/s, loss=0.0289, lr=0.001]
Steps: 67%|██████▋ | 669/1000 [07:06<03:32, 1.56it/s, loss=0.0289, lr=0.001]
Steps: 67%|██████▋ | 669/1000 [07:06<03:32, 1.56it/s, loss=0.0101, lr=0.001]
Steps: 67%|██████▋ | 670/1000 [07:07<03:29, 1.57it/s, loss=0.0101, lr=0.001]
tensor(0.0028, device='cuda:0')
tensor([[0.4083],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4075, 0.4084], device='cuda:0')
Steps: 67%|██████▋ | 670/1000 [07:07<03:29, 1.57it/s, loss=0.00419, lr=0.001]
Steps: 67%|██████▋ | 671/1000 [07:08<03:31, 1.56it/s, loss=0.00419, lr=0.001]
Steps: 67%|██████▋ | 671/1000 [07:08<03:31, 1.56it/s, loss=0.00952, lr=0.001]
Steps: 67%|██████▋ | 672/1000 [07:08<03:28, 1.57it/s, loss=0.00952, lr=0.001]
tensor(0.0031, device='cuda:0')
tensor([[0.4086],
[0.4095]], device='cuda:0')
Current Norm : tensor([0.4077, 0.4085], device='cuda:0')
Steps: 67%|██████▋ | 672/1000 [07:08<03:28, 1.57it/s, loss=0.0168, lr=0.001]
Steps: 67%|██████▋ | 673/1000 [07:09<03:29, 1.56it/s, loss=0.0168, lr=0.001]
Steps: 67%|██████▋ | 673/1000 [07:09<03:29, 1.56it/s, loss=0.00155, lr=0.001]
Steps: 67%|██████▋ | 674/1000 [07:09<03:26, 1.58it/s, loss=0.00155, lr=0.001]
tensor(0.0086, device='cuda:0')
tensor([[0.4087],
[0.4096]], device='cuda:0')
Current Norm : tensor([0.4079, 0.4086], device='cuda:0')
Steps: 67%|██████▋ | 674/1000 [07:09<03:26, 1.58it/s, loss=0.0634, lr=0.001]
Steps: 68%|██████▊ | 675/1000 [07:10<03:28, 1.56it/s, loss=0.0634, lr=0.001]
Steps: 68%|██████▊ | 675/1000 [07:10<03:28, 1.56it/s, loss=0.0215, lr=0.001]
Steps: 68%|██████▊ | 676/1000 [07:11<03:26, 1.57it/s, loss=0.0215, lr=0.001]
tensor(0.0037, device='cuda:0')
tensor([[0.4089],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4080, 0.4087], device='cuda:0')
Steps: 68%|██████▊ | 676/1000 [07:11<03:26, 1.57it/s, loss=0.0023, lr=0.001]
Steps: 68%|██████▊ | 677/1000 [07:11<03:27, 1.56it/s, loss=0.0023, lr=0.001]
Steps: 68%|██████▊ | 677/1000 [07:11<03:27, 1.56it/s, loss=0.000175, lr=0.001]
Steps: 68%|██████▊ | 678/1000 [07:12<03:24, 1.57it/s, loss=0.000175, lr=0.001]
tensor(0.0088, device='cuda:0')
tensor([[0.4089],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4080, 0.4087], device='cuda:0')
Steps: 68%|██████▊ | 678/1000 [07:12<03:24, 1.57it/s, loss=0.00022, lr=0.001]
Steps: 68%|██████▊ | 679/1000 [07:13<03:25, 1.56it/s, loss=0.00022, lr=0.001]
Steps: 68%|██████▊ | 679/1000 [07:13<03:25, 1.56it/s, loss=0.0503, lr=0.001]
Steps: 68%|██████▊ | 680/1000 [07:13<03:23, 1.58it/s, loss=0.0503, lr=0.001]
tensor(0.0076, device='cuda:0')
tensor([[0.4090],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4088], device='cuda:0')
Steps: 68%|██████▊ | 680/1000 [07:13<03:23, 1.58it/s, loss=0.0464, lr=0.001]
Steps: 68%|██████▊ | 681/1000 [07:14<03:23, 1.56it/s, loss=0.0464, lr=0.001]
Steps: 68%|██████▊ | 681/1000 [07:14<03:23, 1.56it/s, loss=0.000616, lr=0.001]
Steps: 68%|██████▊ | 682/1000 [07:14<03:21, 1.58it/s, loss=0.000616, lr=0.001]
tensor(0.0034, device='cuda:0')
tensor([[0.4091],
[0.4098]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4088], device='cuda:0')
Steps: 68%|██████▊ | 682/1000 [07:14<03:21, 1.58it/s, loss=0.000937, lr=0.001]
Steps: 68%|██████▊ | 683/1000 [07:15<03:21, 1.57it/s, loss=0.000937, lr=0.001]
Steps: 68%|██████▊ | 683/1000 [07:15<03:21, 1.57it/s, loss=0.00264, lr=0.001]
Steps: 68%|██████▊ | 684/1000 [07:16<03:19, 1.58it/s, loss=0.00264, lr=0.001]
tensor(0.0082, device='cuda:0')
tensor([[0.4093],
[0.4098]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4088], device='cuda:0')
Steps: 68%|██████▊ | 684/1000 [07:16<03:19, 1.58it/s, loss=0.068, lr=0.001]
Steps: 68%|██████▊ | 685/1000 [07:16<03:19, 1.58it/s, loss=0.068, lr=0.001]
Steps: 68%|██████▊ | 685/1000 [07:16<03:19, 1.58it/s, loss=0.000431, lr=0.001]
Steps: 69%|██████▊ | 686/1000 [07:17<03:18, 1.59it/s, loss=0.000431, lr=0.001]
tensor(0.0094, device='cuda:0')
tensor([[0.4094],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4089], device='cuda:0')
Steps: 69%|██████▊ | 686/1000 [07:17<03:18, 1.59it/s, loss=0.0451, lr=0.001]
Steps: 69%|██████▊ | 687/1000 [07:18<03:19, 1.57it/s, loss=0.0451, lr=0.001]
Steps: 69%|██████▊ | 687/1000 [07:18<03:19, 1.57it/s, loss=0.000414, lr=0.001]
Steps: 69%|██████▉ | 688/1000 [07:18<03:17, 1.58it/s, loss=0.000414, lr=0.001]
tensor(0.0021, device='cuda:0')
tensor([[0.4095],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4090], device='cuda:0')
Steps: 69%|██████▉ | 688/1000 [07:18<03:17, 1.58it/s, loss=0.00126, lr=0.001]
Steps: 69%|██████▉ | 689/1000 [07:19<03:18, 1.57it/s, loss=0.00126, lr=0.001]
Steps: 69%|██████▉ | 689/1000 [07:19<03:18, 1.57it/s, loss=0.000966, lr=0.001]
Steps: 69%|██████▉ | 690/1000 [07:20<03:16, 1.58it/s, loss=0.000966, lr=0.001]
tensor(0.0064, device='cuda:0')
tensor([[0.4095],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4091], device='cuda:0')
Steps: 69%|██████▉ | 690/1000 [07:20<03:16, 1.58it/s, loss=0.0196, lr=0.001]
Steps: 69%|██████▉ | 691/1000 [07:20<03:17, 1.56it/s, loss=0.0196, lr=0.001]
Steps: 69%|██████▉ | 691/1000 [07:20<03:17, 1.56it/s, loss=0.000678, lr=0.001]
Steps: 69%|██████▉ | 692/1000 [07:21<03:15, 1.57it/s, loss=0.000678, lr=0.001]
tensor(0.0071, device='cuda:0')
tensor([[0.4096],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4087, 0.4092], device='cuda:0')
Steps: 69%|██████▉ | 692/1000 [07:21<03:15, 1.57it/s, loss=0.0733, lr=0.001]
Steps: 69%|██████▉ | 693/1000 [07:21<03:16, 1.56it/s, loss=0.0733, lr=0.001]
Steps: 69%|██████▉ | 693/1000 [07:21<03:16, 1.56it/s, loss=0.000746, lr=0.001]
Steps: 69%|██████▉ | 694/1000 [07:22<03:14, 1.57it/s, loss=0.000746, lr=0.001]
tensor(0.0059, device='cuda:0')
tensor([[0.4097],
[0.4103]], device='cuda:0')
Current Norm : tensor([0.4087, 0.4093], device='cuda:0')
Steps: 69%|██████▉ | 694/1000 [07:22<03:14, 1.57it/s, loss=0.0629, lr=0.001]
Steps: 70%|██████▉ | 695/1000 [07:23<03:15, 1.56it/s, loss=0.0629, lr=0.001]
Steps: 70%|██████▉ | 695/1000 [07:23<03:15, 1.56it/s, loss=0.00226, lr=0.001]
Steps: 70%|██████▉ | 696/1000 [07:23<03:13, 1.57it/s, loss=0.00226, lr=0.001]
tensor(0.0129, device='cuda:0')
tensor([[0.4097],
[0.4104]], device='cuda:0')
Current Norm : tensor([0.4087, 0.4094], device='cuda:0')
Steps: 70%|██████▉ | 696/1000 [07:23<03:13, 1.57it/s, loss=0.0491, lr=0.001]
Steps: 70%|██████▉ | 697/1000 [07:24<03:13, 1.56it/s, loss=0.0491, lr=0.001]
Steps: 70%|██████▉ | 697/1000 [07:24<03:13, 1.56it/s, loss=0.0576, lr=0.001]
Steps: 70%|██████▉ | 698/1000 [07:25<03:10, 1.58it/s, loss=0.0576, lr=0.001]
tensor(0.0081, device='cuda:0')
tensor([[0.4098],
[0.4106]], device='cuda:0')
Current Norm : tensor([0.4088, 0.4095], device='cuda:0')
Steps: 70%|██████▉ | 698/1000 [07:25<03:10, 1.58it/s, loss=0.0844, lr=0.001]
Steps: 70%|██████▉ | 699/1000 [07:25<03:11, 1.57it/s, loss=0.0844, lr=0.001]
Steps: 70%|██████▉ | 699/1000 [07:25<03:11, 1.57it/s, loss=0.00121, lr=0.001]
Steps: 70%|███████ | 700/1000 [07:26<03:09, 1.58it/s, loss=0.00121, lr=0.001]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0145, -0.0087, -0.0056, -0.0218], device='cuda:0',
grad_fn=<SliceBackward0>)
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0014, 0.0039, -0.0050, 0.0112], device='cuda:0',
grad_fn=<SliceBackward0>)
Saving weights to checkpoints/step_inv_700.safetensors
tensor(0.0117, device='cuda:0')
tensor([[0.4100],
[0.4108]], device='cuda:0')
Current Norm : tensor([0.4090, 0.4097], device='cuda:0')
Steps: 70%|███████ | 700/1000 [07:26<03:09, 1.58it/s, loss=0.00726, lr=0.001]
Steps: 70%|███████ | 701/1000 [07:27<03:12, 1.55it/s, loss=0.00726, lr=0.001]
Steps: 70%|███████ | 701/1000 [07:27<03:12, 1.55it/s, loss=0.0183, lr=0.001]
Steps: 70%|███████ | 702/1000 [07:27<03:09, 1.57it/s, loss=0.0183, lr=0.001]
tensor(0.0035, device='cuda:0')
tensor([[0.4104],
[0.4111]], device='cuda:0')
Current Norm : tensor([0.4093, 0.4100], device='cuda:0')
Steps: 70%|███████ | 702/1000 [07:27<03:09, 1.57it/s, loss=0.00239, lr=0.001]
Steps: 70%|███████ | 703/1000 [07:28<03:10, 1.56it/s, loss=0.00239, lr=0.001]
Steps: 70%|███████ | 703/1000 [07:28<03:10, 1.56it/s, loss=0.00413, lr=0.001]
Steps: 70%|███████ | 704/1000 [07:28<03:08, 1.57it/s, loss=0.00413, lr=0.001]
tensor(0.0027, device='cuda:0')
tensor([[0.4107],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4096, 0.4102], device='cuda:0')
Steps: 70%|███████ | 704/1000 [07:28<03:08, 1.57it/s, loss=0.00356, lr=0.001]
Steps: 70%|███████ | 705/1000 [07:29<03:08, 1.56it/s, loss=0.00356, lr=0.001]
Steps: 70%|███████ | 705/1000 [07:29<03:08, 1.56it/s, loss=0.0161, lr=0.001]
Steps: 71%|███████ | 706/1000 [07:30<03:07, 1.56it/s, loss=0.0161, lr=0.001]
tensor(0.0035, device='cuda:0')
tensor([[0.4111],
[0.4116]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4104], device='cuda:0')
Steps: 71%|███████ | 706/1000 [07:30<03:07, 1.56it/s, loss=0.00678, lr=0.001]
Steps: 71%|███████ | 707/1000 [07:30<03:09, 1.55it/s, loss=0.00678, lr=0.001]
Steps: 71%|███████ | 707/1000 [07:30<03:09, 1.55it/s, loss=0.0347, lr=0.001]
Steps: 71%|███████ | 708/1000 [07:31<03:06, 1.57it/s, loss=0.0347, lr=0.001]
tensor(0.0042, device='cuda:0')
tensor([[0.4114],
[0.4118]], device='cuda:0')
Current Norm : tensor([0.4102, 0.4106], device='cuda:0')
Steps: 71%|███████ | 708/1000 [07:31<03:06, 1.57it/s, loss=0.000707, lr=0.001]
Steps: 71%|███████ | 709/1000 [07:32<03:06, 1.56it/s, loss=0.000707, lr=0.001]
Steps: 71%|███████ | 709/1000 [07:32<03:06, 1.56it/s, loss=0.0312, lr=0.001]
Steps: 71%|███████ | 710/1000 [07:32<03:04, 1.58it/s, loss=0.0312, lr=0.001]
tensor(0.0064, device='cuda:0')
tensor([[0.4116],
[0.4121]], device='cuda:0')
Current Norm : tensor([0.4105, 0.4109], device='cuda:0')
Steps: 71%|███████ | 710/1000 [07:32<03:04, 1.58it/s, loss=0.0412, lr=0.001]
Steps: 71%|███████ | 711/1000 [07:33<03:05, 1.56it/s, loss=0.0412, lr=0.001]
Steps: 71%|███████ | 711/1000 [07:33<03:05, 1.56it/s, loss=0.0127, lr=0.001]
Steps: 71%|███████ | 712/1000 [07:34<03:03, 1.57it/s, loss=0.0127, lr=0.001]
tensor(0.0076, device='cuda:0')
tensor([[0.4118],
[0.4123]], device='cuda:0')
Current Norm : tensor([0.4107, 0.4111], device='cuda:0')
Steps: 71%|███████ | 712/1000 [07:34<03:03, 1.57it/s, loss=0.035, lr=0.001]
Steps: 71%|███████▏ | 713/1000 [07:34<03:04, 1.55it/s, loss=0.035, lr=0.001]
Steps: 71%|███████▏ | 713/1000 [07:34<03:04, 1.55it/s, loss=0.00396, lr=0.001]
Steps: 71%|███████▏ | 714/1000 [07:35<03:03, 1.56it/s, loss=0.00396, lr=0.001]
tensor(0.0037, device='cuda:0')
tensor([[0.4120],
[0.4124]], device='cuda:0')
Current Norm : tensor([0.4108, 0.4112], device='cuda:0')
Steps: 71%|███████▏ | 714/1000 [07:35<03:03, 1.56it/s, loss=0.0197, lr=0.001]
Steps: 72%|███████▏ | 715/1000 [07:36<03:03, 1.55it/s, loss=0.0197, lr=0.001]
Steps: 72%|███████▏ | 715/1000 [07:36<03:03, 1.55it/s, loss=0.013, lr=0.001]
Steps: 72%|███████▏ | 716/1000 [07:36<03:00, 1.57it/s, loss=0.013, lr=0.001]
tensor(0.0049, device='cuda:0')
tensor([[0.4121],
[0.4126]], device='cuda:0')
Current Norm : tensor([0.4109, 0.4113], device='cuda:0')
Steps: 72%|███████▏ | 716/1000 [07:36<03:00, 1.57it/s, loss=0.0397, lr=0.001]
Steps: 72%|███████▏ | 717/1000 [07:37<03:01, 1.56it/s, loss=0.0397, lr=0.001]
Steps: 72%|███████▏ | 717/1000 [07:37<03:01, 1.56it/s, loss=0.00506, lr=0.001]
Steps: 72%|███████▏ | 718/1000 [07:37<02:59, 1.57it/s, loss=0.00506, lr=0.001]
tensor(0.0030, device='cuda:0')
tensor([[0.4122],
[0.4126]], device='cuda:0')
Current Norm : tensor([0.4110, 0.4114], device='cuda:0')
Steps: 72%|███████▏ | 718/1000 [07:37<02:59, 1.57it/s, loss=0.198, lr=0.001]
Steps: 72%|███████▏ | 719/1000 [07:38<03:00, 1.56it/s, loss=0.198, lr=0.001]
Steps: 72%|███████▏ | 719/1000 [07:38<03:00, 1.56it/s, loss=0.00696, lr=0.001]
Steps: 72%|███████▏ | 720/1000 [07:39<02:57, 1.58it/s, loss=0.00696, lr=0.001]
tensor(0.0022, device='cuda:0')
tensor([[0.4123],
[0.4126]], device='cuda:0')
Current Norm : tensor([0.4110, 0.4114], device='cuda:0')
Steps: 72%|███████▏ | 720/1000 [07:39<02:57, 1.58it/s, loss=0.000396, lr=0.001]
Steps: 72%|███████▏ | 721/1000 [07:39<02:59, 1.56it/s, loss=0.000396, lr=0.001]
Steps: 72%|███████▏ | 721/1000 [07:39<02:59, 1.56it/s, loss=0.00705, lr=0.001]
Steps: 72%|███████▏ | 722/1000 [07:40<02:56, 1.57it/s, loss=0.00705, lr=0.001]
tensor(0.0016, device='cuda:0')
tensor([[0.4122],
[0.4126]], device='cuda:0')
Current Norm : tensor([0.4110, 0.4113], device='cuda:0')
Steps: 72%|███████▏ | 722/1000 [07:40<02:56, 1.57it/s, loss=0.00332, lr=0.001]
Steps: 72%|███████▏ | 723/1000 [07:41<02:57, 1.56it/s, loss=0.00332, lr=0.001]
Steps: 72%|███████▏ | 723/1000 [07:41<02:57, 1.56it/s, loss=0.000789, lr=0.001]
Steps: 72%|███████▏ | 724/1000 [07:41<02:55, 1.58it/s, loss=0.000789, lr=0.001]
tensor(0.0021, device='cuda:0')
tensor([[0.4121],
[0.4125]], device='cuda:0')
Current Norm : tensor([0.4109, 0.4112], device='cuda:0')
Steps: 72%|███████▏ | 724/1000 [07:41<02:55, 1.58it/s, loss=0.00194, lr=0.001]
Steps: 72%|███████▎ | 725/1000 [07:42<02:55, 1.57it/s, loss=0.00194, lr=0.001]
Steps: 72%|███████▎ | 725/1000 [07:42<02:55, 1.57it/s, loss=0.00128, lr=0.001]
Steps: 73%|███████▎ | 726/1000 [07:43<02:53, 1.58it/s, loss=0.00128, lr=0.001]
tensor(0.0063, device='cuda:0')
tensor([[0.4119],
[0.4123]], device='cuda:0')
Current Norm : tensor([0.4107, 0.4111], device='cuda:0')
Steps: 73%|███████▎ | 726/1000 [07:43<02:53, 1.58it/s, loss=0.0698, lr=0.001]
Steps: 73%|███████▎ | 727/1000 [07:43<02:53, 1.57it/s, loss=0.0698, lr=0.001]
Steps: 73%|███████▎ | 727/1000 [07:43<02:53, 1.57it/s, loss=0.00162, lr=0.001]
Steps: 73%|███████▎ | 728/1000 [07:44<02:51, 1.59it/s, loss=0.00162, lr=0.001]
tensor(0.0004, device='cuda:0')
tensor([[0.4117],
[0.4122]], device='cuda:0')
Current Norm : tensor([0.4106, 0.4110], device='cuda:0')
Steps: 73%|███████▎ | 728/1000 [07:44<02:51, 1.59it/s, loss=0.000498, lr=0.001]
Steps: 73%|███████▎ | 729/1000 [07:44<02:52, 1.57it/s, loss=0.000498, lr=0.001]
Steps: 73%|███████▎ | 729/1000 [07:44<02:52, 1.57it/s, loss=0.00026, lr=0.001]
Steps: 73%|███████▎ | 730/1000 [07:45<02:50, 1.58it/s, loss=0.00026, lr=0.001]
tensor(0.0013, device='cuda:0')
tensor([[0.4115],
[0.4120]], device='cuda:0')
Current Norm : tensor([0.4104, 0.4108], device='cuda:0')
Steps: 73%|███████▎ | 730/1000 [07:45<02:50, 1.58it/s, loss=0.0024, lr=0.001]
Steps: 73%|███████▎ | 731/1000 [07:46<02:50, 1.57it/s, loss=0.0024, lr=0.001]
Steps: 73%|███████▎ | 731/1000 [07:46<02:50, 1.57it/s, loss=0.000949, lr=0.001]
Steps: 73%|███████▎ | 732/1000 [07:46<02:48, 1.59it/s, loss=0.000949, lr=0.001]
tensor(0.0055, device='cuda:0')
tensor([[0.4112],
[0.4118]], device='cuda:0')
Current Norm : tensor([0.4101, 0.4106], device='cuda:0')
Steps: 73%|███████▎ | 732/1000 [07:46<02:48, 1.59it/s, loss=0.00115, lr=0.001]
Steps: 73%|███████▎ | 733/1000 [07:47<02:50, 1.56it/s, loss=0.00115, lr=0.001]
Steps: 73%|███████▎ | 733/1000 [07:47<02:50, 1.56it/s, loss=0.0166, lr=0.001]
Steps: 73%|███████▎ | 734/1000 [07:48<02:48, 1.58it/s, loss=0.0166, lr=0.001]
tensor(0.0084, device='cuda:0')
tensor([[0.4110],
[0.4115]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4104], device='cuda:0')
Steps: 73%|███████▎ | 734/1000 [07:48<02:48, 1.58it/s, loss=0.0649, lr=0.001]
Steps: 74%|███████▎ | 735/1000 [07:48<02:48, 1.57it/s, loss=0.0649, lr=0.001]
Steps: 74%|███████▎ | 735/1000 [07:48<02:48, 1.57it/s, loss=0.0919, lr=0.001]
Steps: 74%|███████▎ | 736/1000 [07:49<02:47, 1.58it/s, loss=0.0919, lr=0.001]
tensor(0.0031, device='cuda:0')
tensor([[0.4107],
[0.4112]], device='cuda:0')
Current Norm : tensor([0.4096, 0.4101], device='cuda:0')
Steps: 74%|███████▎ | 736/1000 [07:49<02:47, 1.58it/s, loss=0.00105, lr=0.001]
Steps: 74%|███████▎ | 737/1000 [07:50<02:48, 1.56it/s, loss=0.00105, lr=0.001]
Steps: 74%|███████▎ | 737/1000 [07:50<02:48, 1.56it/s, loss=0.00841, lr=0.001]
Steps: 74%|███████▍ | 738/1000 [07:50<02:45, 1.58it/s, loss=0.00841, lr=0.001]
tensor(0.0033, device='cuda:0')
tensor([[0.4104],
[0.4110]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4099], device='cuda:0')
Steps: 74%|███████▍ | 738/1000 [07:50<02:45, 1.58it/s, loss=0.00961, lr=0.001]
Steps: 74%|███████▍ | 739/1000 [07:51<02:46, 1.57it/s, loss=0.00961, lr=0.001]
Steps: 74%|███████▍ | 739/1000 [07:51<02:46, 1.57it/s, loss=0.00115, lr=0.001]
Steps: 74%|███████▍ | 740/1000 [07:51<02:44, 1.58it/s, loss=0.00115, lr=0.001]
tensor(0.0072, device='cuda:0')
tensor([[0.4103],
[0.4108]], device='cuda:0')
Current Norm : tensor([0.4092, 0.4097], device='cuda:0')
Steps: 74%|███████▍ | 740/1000 [07:51<02:44, 1.58it/s, loss=0.0325, lr=0.001]
Steps: 74%|███████▍ | 741/1000 [07:52<02:44, 1.57it/s, loss=0.0325, lr=0.001]
Steps: 74%|███████▍ | 741/1000 [07:52<02:44, 1.57it/s, loss=0.00165, lr=0.001]
Steps: 74%|███████▍ | 742/1000 [07:53<02:42, 1.59it/s, loss=0.00165, lr=0.001]
tensor(0.0063, device='cuda:0')
tensor([[0.4101],
[0.4106]], device='cuda:0')
Current Norm : tensor([0.4091, 0.4095], device='cuda:0')
Steps: 74%|███████▍ | 742/1000 [07:53<02:42, 1.59it/s, loss=0.0285, lr=0.001]
Steps: 74%|███████▍ | 743/1000 [07:53<02:43, 1.57it/s, loss=0.0285, lr=0.001]
Steps: 74%|███████▍ | 743/1000 [07:53<02:43, 1.57it/s, loss=0.00142, lr=0.001]
Steps: 74%|███████▍ | 744/1000 [07:54<02:42, 1.58it/s, loss=0.00142, lr=0.001]
tensor(0.0010, device='cuda:0')
tensor([[0.4100],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4090, 0.4094], device='cuda:0')
Steps: 74%|███████▍ | 744/1000 [07:54<02:42, 1.58it/s, loss=0.017, lr=0.001]
Steps: 74%|███████▍ | 745/1000 [07:55<02:43, 1.56it/s, loss=0.017, lr=0.001]
Steps: 74%|███████▍ | 745/1000 [07:55<02:43, 1.56it/s, loss=0.00199, lr=0.001]
Steps: 75%|███████▍ | 746/1000 [07:55<02:41, 1.57it/s, loss=0.00199, lr=0.001]
tensor(0.0085, device='cuda:0')
tensor([[0.4099],
[0.4104]], device='cuda:0')
Current Norm : tensor([0.4089, 0.4094], device='cuda:0')
Steps: 75%|███████▍ | 746/1000 [07:55<02:41, 1.57it/s, loss=0.0158, lr=0.001]
Steps: 75%|███████▍ | 747/1000 [07:56<02:41, 1.56it/s, loss=0.0158, lr=0.001]
Steps: 75%|███████▍ | 747/1000 [07:56<02:41, 1.56it/s, loss=0.012, lr=0.001]
Steps: 75%|███████▍ | 748/1000 [07:57<02:39, 1.58it/s, loss=0.012, lr=0.001]
tensor(0.0029, device='cuda:0')
tensor([[0.4098],
[0.4104]], device='cuda:0')
Current Norm : tensor([0.4089, 0.4094], device='cuda:0')
Steps: 75%|███████▍ | 748/1000 [07:57<02:39, 1.58it/s, loss=0.0097, lr=0.001]
Steps: 75%|███████▍ | 749/1000 [07:57<02:40, 1.56it/s, loss=0.0097, lr=0.001]
Steps: 75%|███████▍ | 749/1000 [07:57<02:40, 1.56it/s, loss=0.00167, lr=0.001]
Steps: 75%|███████▌ | 750/1000 [07:58<02:38, 1.58it/s, loss=0.00167, lr=0.001]
tensor(0.0022, device='cuda:0')
tensor([[0.4098],
[0.4104]], device='cuda:0')
Current Norm : tensor([0.4088, 0.4093], device='cuda:0')
Steps: 75%|███████▌ | 750/1000 [07:58<02:38, 1.58it/s, loss=0.0012, lr=0.001]
Steps: 75%|███████▌ | 751/1000 [07:58<02:39, 1.56it/s, loss=0.0012, lr=0.001]
Steps: 75%|███████▌ | 751/1000 [07:58<02:39, 1.56it/s, loss=0.00566, lr=0.001]
Steps: 75%|███████▌ | 752/1000 [07:59<02:37, 1.57it/s, loss=0.00566, lr=0.001]
tensor(0.0034, device='cuda:0')
tensor([[0.4097],
[0.4103]], device='cuda:0')
Current Norm : tensor([0.4087, 0.4093], device='cuda:0')
Steps: 75%|███████▌ | 752/1000 [07:59<02:37, 1.57it/s, loss=0.000511, lr=0.001]
Steps: 75%|███████▌ | 753/1000 [08:00<02:38, 1.56it/s, loss=0.000511, lr=0.001]
Steps: 75%|███████▌ | 753/1000 [08:00<02:38, 1.56it/s, loss=0.00294, lr=0.001]
Steps: 75%|███████▌ | 754/1000 [08:00<02:36, 1.58it/s, loss=0.00294, lr=0.001]
tensor(0.0042, device='cuda:0')
tensor([[0.4096],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4092], device='cuda:0')
Steps: 75%|███████▌ | 754/1000 [08:00<02:36, 1.58it/s, loss=0.00867, lr=0.001]
Steps: 76%|███████▌ | 755/1000 [08:01<02:36, 1.57it/s, loss=0.00867, lr=0.001]
Steps: 76%|███████▌ | 755/1000 [08:01<02:36, 1.57it/s, loss=0.013, lr=0.001]
Steps: 76%|███████▌ | 756/1000 [08:02<02:34, 1.58it/s, loss=0.013, lr=0.001]
tensor(0.0048, device='cuda:0')
tensor([[0.4095],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4092], device='cuda:0')
Steps: 76%|███████▌ | 756/1000 [08:02<02:34, 1.58it/s, loss=0.0197, lr=0.001]
Steps: 76%|███████▌ | 757/1000 [08:02<02:34, 1.57it/s, loss=0.0197, lr=0.001]
Steps: 76%|███████▌ | 757/1000 [08:02<02:34, 1.57it/s, loss=0.00643, lr=0.001]
Steps: 76%|███████▌ | 758/1000 [08:03<02:32, 1.58it/s, loss=0.00643, lr=0.001]
tensor(0.0008, device='cuda:0')
tensor([[0.4094],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4091], device='cuda:0')
Steps: 76%|███████▌ | 758/1000 [08:03<02:32, 1.58it/s, loss=0.000187, lr=0.001]
Steps: 76%|███████▌ | 759/1000 [08:04<02:33, 1.57it/s, loss=0.000187, lr=0.001]
Steps: 76%|███████▌ | 759/1000 [08:04<02:33, 1.57it/s, loss=0.00284, lr=0.001]
Steps: 76%|███████▌ | 760/1000 [08:04<02:30, 1.59it/s, loss=0.00284, lr=0.001]
tensor(0.0052, device='cuda:0')
tensor([[0.4092],
[0.4100]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4090], device='cuda:0')
Steps: 76%|███████▌ | 760/1000 [08:04<02:30, 1.59it/s, loss=0.016, lr=0.001]
Steps: 76%|███████▌ | 761/1000 [08:05<02:32, 1.56it/s, loss=0.016, lr=0.001]
Steps: 76%|███████▌ | 761/1000 [08:05<02:32, 1.56it/s, loss=0.000488, lr=0.001]
Steps: 76%|███████▌ | 762/1000 [08:05<02:31, 1.57it/s, loss=0.000488, lr=0.001]
tensor(0.0043, device='cuda:0')
tensor([[0.4090],
[0.4100]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4090], device='cuda:0')
Steps: 76%|███████▌ | 762/1000 [08:05<02:31, 1.57it/s, loss=0.000765, lr=0.001]
Steps: 76%|███████▋ | 763/1000 [08:06<02:31, 1.56it/s, loss=0.000765, lr=0.001]
Steps: 76%|███████▋ | 763/1000 [08:06<02:31, 1.56it/s, loss=0.0178, lr=0.001]
Steps: 76%|███████▋ | 764/1000 [08:07<02:29, 1.58it/s, loss=0.0178, lr=0.001]
tensor(0.0050, device='cuda:0')
tensor([[0.4088],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4080, 0.4089], device='cuda:0')
Steps: 76%|███████▋ | 764/1000 [08:07<02:29, 1.58it/s, loss=0.0123, lr=0.001]
Steps: 76%|███████▋ | 765/1000 [08:07<02:29, 1.57it/s, loss=0.0123, lr=0.001]
Steps: 76%|███████▋ | 765/1000 [08:07<02:29, 1.57it/s, loss=0.00927, lr=0.001]
Steps: 77%|███████▋ | 766/1000 [08:08<02:27, 1.58it/s, loss=0.00927, lr=0.001]
tensor(0.0081, device='cuda:0')
tensor([[0.4087],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4078, 0.4089], device='cuda:0')
Steps: 77%|███████▋ | 766/1000 [08:08<02:27, 1.58it/s, loss=0.0215, lr=0.001]
Steps: 77%|███████▋ | 767/1000 [08:09<02:28, 1.57it/s, loss=0.0215, lr=0.001]
Steps: 77%|███████▋ | 767/1000 [08:09<02:28, 1.57it/s, loss=0.0185, lr=0.001]
Steps: 77%|███████▋ | 768/1000 [08:09<02:26, 1.58it/s, loss=0.0185, lr=0.001]
tensor(0.0035, device='cuda:0')
tensor([[0.4085],
[0.4098]], device='cuda:0')
Current Norm : tensor([0.4077, 0.4088], device='cuda:0')
Steps: 77%|███████▋ | 768/1000 [08:09<02:26, 1.58it/s, loss=0.00787, lr=0.001]
Steps: 77%|███████▋ | 769/1000 [08:10<02:27, 1.56it/s, loss=0.00787, lr=0.001]
Steps: 77%|███████▋ | 769/1000 [08:10<02:27, 1.56it/s, loss=0.0034, lr=0.001]
Steps: 77%|███████▋ | 770/1000 [08:11<02:26, 1.57it/s, loss=0.0034, lr=0.001]
tensor(0.0059, device='cuda:0')
tensor([[0.4084],
[0.4098]], device='cuda:0')
Current Norm : tensor([0.4075, 0.4088], device='cuda:0')
Steps: 77%|███████▋ | 770/1000 [08:11<02:26, 1.57it/s, loss=0.00167, lr=0.001]
Steps: 77%|███████▋ | 771/1000 [08:11<02:27, 1.55it/s, loss=0.00167, lr=0.001]
Steps: 77%|███████▋ | 771/1000 [08:11<02:27, 1.55it/s, loss=0.0461, lr=0.001]
Steps: 77%|███████▋ | 772/1000 [08:12<02:25, 1.57it/s, loss=0.0461, lr=0.001]
tensor(0.0028, device='cuda:0')
tensor([[0.4082],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4074, 0.4087], device='cuda:0')
Steps: 77%|███████▋ | 772/1000 [08:12<02:25, 1.57it/s, loss=0.0114, lr=0.001]
Steps: 77%|███████▋ | 773/1000 [08:12<02:25, 1.56it/s, loss=0.0114, lr=0.001]
Steps: 77%|███████▋ | 773/1000 [08:12<02:25, 1.56it/s, loss=0.00369, lr=0.001]
Steps: 77%|███████▋ | 774/1000 [08:13<02:24, 1.57it/s, loss=0.00369, lr=0.001]
tensor(0.0084, device='cuda:0')
tensor([[0.4080],
[0.4096]], device='cuda:0')
Current Norm : tensor([0.4072, 0.4087], device='cuda:0')
Steps: 77%|███████▋ | 774/1000 [08:13<02:24, 1.57it/s, loss=0.0384, lr=0.001]
Steps: 78%|███████▊ | 775/1000 [08:14<02:24, 1.56it/s, loss=0.0384, lr=0.001]
Steps: 78%|███████▊ | 775/1000 [08:14<02:24, 1.56it/s, loss=0.0432, lr=0.001]
Steps: 78%|███████▊ | 776/1000 [08:14<02:22, 1.57it/s, loss=0.0432, lr=0.001]
tensor(0.0095, device='cuda:0')
tensor([[0.4079],
[0.4096]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4087], device='cuda:0')
Steps: 78%|███████▊ | 776/1000 [08:14<02:22, 1.57it/s, loss=0.00367, lr=0.001]
Steps: 78%|███████▊ | 777/1000 [08:15<02:23, 1.56it/s, loss=0.00367, lr=0.001]
Steps: 78%|███████▊ | 777/1000 [08:15<02:23, 1.56it/s, loss=0.0117, lr=0.001]
Steps: 78%|███████▊ | 778/1000 [08:16<02:20, 1.57it/s, loss=0.0117, lr=0.001]
tensor(0.0077, device='cuda:0')
tensor([[0.4078],
[0.4096]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4087], device='cuda:0')
Steps: 78%|███████▊ | 778/1000 [08:16<02:20, 1.57it/s, loss=0.00418, lr=0.001]
Steps: 78%|███████▊ | 779/1000 [08:16<02:21, 1.56it/s, loss=0.00418, lr=0.001]
Steps: 78%|███████▊ | 779/1000 [08:16<02:21, 1.56it/s, loss=0.00894, lr=0.001]
Steps: 78%|███████▊ | 780/1000 [08:17<02:19, 1.57it/s, loss=0.00894, lr=0.001]
tensor(0.0021, device='cuda:0')
tensor([[0.4078],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4087], device='cuda:0')
Steps: 78%|███████▊ | 780/1000 [08:17<02:19, 1.57it/s, loss=0.00025, lr=0.001]
Steps: 78%|███████▊ | 781/1000 [08:18<02:20, 1.56it/s, loss=0.00025, lr=0.001]
Steps: 78%|███████▊ | 781/1000 [08:18<02:20, 1.56it/s, loss=0.0261, lr=0.001]
Steps: 78%|███████▊ | 782/1000 [08:18<02:18, 1.57it/s, loss=0.0261, lr=0.001]
tensor(0.0098, device='cuda:0')
tensor([[0.4079],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4087], device='cuda:0')
Steps: 78%|███████▊ | 782/1000 [08:18<02:18, 1.57it/s, loss=0.0055, lr=0.001]
Steps: 78%|███████▊ | 783/1000 [08:19<02:18, 1.57it/s, loss=0.0055, lr=0.001]
Steps: 78%|███████▊ | 783/1000 [08:19<02:18, 1.57it/s, loss=0.0204, lr=0.001]
Steps: 78%|███████▊ | 784/1000 [08:19<02:16, 1.58it/s, loss=0.0204, lr=0.001]
tensor(0.0030, device='cuda:0')
tensor([[0.4080],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4072, 0.4088], device='cuda:0')
Steps: 78%|███████▊ | 784/1000 [08:19<02:16, 1.58it/s, loss=0.0061, lr=0.001]
Steps: 78%|███████▊ | 785/1000 [08:20<02:16, 1.57it/s, loss=0.0061, lr=0.001]
Steps: 78%|███████▊ | 785/1000 [08:20<02:16, 1.57it/s, loss=0.0105, lr=0.001]
Steps: 79%|███████▊ | 786/1000 [08:21<02:15, 1.58it/s, loss=0.0105, lr=0.001]
tensor(0.0076, device='cuda:0')
tensor([[0.4081],
[0.4098]], device='cuda:0')
Current Norm : tensor([0.4073, 0.4088], device='cuda:0')
Steps: 79%|███████▊ | 786/1000 [08:21<02:15, 1.58it/s, loss=0.0608, lr=0.001]
Steps: 79%|███████▊ | 787/1000 [08:21<02:17, 1.55it/s, loss=0.0608, lr=0.001]
Steps: 79%|███████▊ | 787/1000 [08:21<02:17, 1.55it/s, loss=0.00173, lr=0.001]
Steps: 79%|███████▉ | 788/1000 [08:22<02:14, 1.58it/s, loss=0.00173, lr=0.001]
tensor(0.0013, device='cuda:0')
tensor([[0.4083],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4075, 0.4089], device='cuda:0')
Steps: 79%|███████▉ | 788/1000 [08:22<02:14, 1.58it/s, loss=0.00145, lr=0.001]
Steps: 79%|███████▉ | 789/1000 [08:23<02:14, 1.56it/s, loss=0.00145, lr=0.001]
Steps: 79%|███████▉ | 789/1000 [08:23<02:14, 1.56it/s, loss=0.00302, lr=0.001]
Steps: 79%|███████▉ | 790/1000 [08:23<02:13, 1.58it/s, loss=0.00302, lr=0.001]
tensor(0.0035, device='cuda:0')
tensor([[0.4085],
[0.4100]], device='cuda:0')
Current Norm : tensor([0.4076, 0.4090], device='cuda:0')
Steps: 79%|███████▉ | 790/1000 [08:23<02:13, 1.58it/s, loss=0.00383, lr=0.001]
Steps: 79%|███████▉ | 791/1000 [08:24<02:13, 1.56it/s, loss=0.00383, lr=0.001]
Steps: 79%|███████▉ | 791/1000 [08:24<02:13, 1.56it/s, loss=0.004, lr=0.001]
Steps: 79%|███████▉ | 792/1000 [08:25<02:11, 1.58it/s, loss=0.004, lr=0.001]
tensor(0.0059, device='cuda:0')
tensor([[0.4085],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4077, 0.4091], device='cuda:0')
Steps: 79%|███████▉ | 792/1000 [08:25<02:11, 1.58it/s, loss=0.0108, lr=0.001]
Steps: 79%|███████▉ | 793/1000 [08:25<02:12, 1.56it/s, loss=0.0108, lr=0.001]
Steps: 79%|███████▉ | 793/1000 [08:25<02:12, 1.56it/s, loss=0.00444, lr=0.001]
Steps: 79%|███████▉ | 794/1000 [08:26<02:10, 1.57it/s, loss=0.00444, lr=0.001]
tensor(0.0021, device='cuda:0')
tensor([[0.4086],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4077, 0.4092], device='cuda:0')
Steps: 79%|███████▉ | 794/1000 [08:26<02:10, 1.57it/s, loss=0.000504, lr=0.001]
Steps: 80%|███████▉ | 795/1000 [08:26<02:11, 1.56it/s, loss=0.000504, lr=0.001]
Steps: 80%|███████▉ | 795/1000 [08:26<02:11, 1.56it/s, loss=0.00754, lr=0.001]
Steps: 80%|███████▉ | 796/1000 [08:27<02:09, 1.58it/s, loss=0.00754, lr=0.001]
tensor(0.0015, device='cuda:0')
tensor([[0.4086],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4077, 0.4092], device='cuda:0')
Steps: 80%|███████▉ | 796/1000 [08:27<02:09, 1.58it/s, loss=0.000605, lr=0.001]
Steps: 80%|███████▉ | 797/1000 [08:28<02:09, 1.56it/s, loss=0.000605, lr=0.001]
Steps: 80%|███████▉ | 797/1000 [08:28<02:09, 1.56it/s, loss=0.0101, lr=0.001]
Steps: 80%|███████▉ | 798/1000 [08:28<02:08, 1.57it/s, loss=0.0101, lr=0.001]
tensor(0.0057, device='cuda:0')
tensor([[0.4085],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4076, 0.4091], device='cuda:0')
Steps: 80%|███████▉ | 798/1000 [08:28<02:08, 1.57it/s, loss=0.0274, lr=0.001]
Steps: 80%|███████▉ | 799/1000 [08:29<02:08, 1.56it/s, loss=0.0274, lr=0.001]
Steps: 80%|███████▉ | 799/1000 [08:29<02:08, 1.56it/s, loss=0.00328, lr=0.001]
Steps: 80%|████████ | 800/1000 [08:30<02:07, 1.57it/s, loss=0.00328, lr=0.001]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0130, 0.0020, -0.0095, -0.0193], device='cuda:0',
grad_fn=<SliceBackward0>)
Current Learned Embeddings for <s2>:, id 49409 tensor([-0.0026, 0.0098, -0.0090, 0.0132], device='cuda:0',
grad_fn=<SliceBackward0>)
Saving weights to checkpoints/step_inv_800.safetensors
tensor(0.0061, device='cuda:0')
tensor([[0.4083],
[0.4101]], device='cuda:0')
Current Norm : tensor([0.4075, 0.4091], device='cuda:0')
Steps: 80%|████████ | 800/1000 [08:30<02:07, 1.57it/s, loss=0.00261, lr=0.001]
Steps: 80%|████████ | 801/1000 [08:30<02:07, 1.56it/s, loss=0.00261, lr=0.001]
Steps: 80%|████████ | 801/1000 [08:30<02:07, 1.56it/s, loss=0.149, lr=0.001]
Steps: 80%|████████ | 802/1000 [08:31<02:05, 1.58it/s, loss=0.149, lr=0.001]
tensor(0.0048, device='cuda:0')
tensor([[0.4082],
[0.4100]], device='cuda:0')
Current Norm : tensor([0.4074, 0.4090], device='cuda:0')
Steps: 80%|████████ | 802/1000 [08:31<02:05, 1.58it/s, loss=0.0241, lr=0.001]
Steps: 80%|████████ | 803/1000 [08:32<02:05, 1.57it/s, loss=0.0241, lr=0.001]
Steps: 80%|████████ | 803/1000 [08:32<02:05, 1.57it/s, loss=0.000268, lr=0.001]
Steps: 80%|████████ | 804/1000 [08:32<02:03, 1.58it/s, loss=0.000268, lr=0.001]
tensor(0.0061, device='cuda:0')
tensor([[0.4080],
[0.4100]], device='cuda:0')
Current Norm : tensor([0.4072, 0.4090], device='cuda:0')
Steps: 80%|████████ | 804/1000 [08:32<02:03, 1.58it/s, loss=0.0149, lr=0.001]
Steps: 80%|████████ | 805/1000 [08:33<02:03, 1.58it/s, loss=0.0149, lr=0.001]
Steps: 80%|████████ | 805/1000 [08:33<02:03, 1.58it/s, loss=0.00414, lr=0.001]
Steps: 81%|████████ | 806/1000 [08:33<02:02, 1.58it/s, loss=0.00414, lr=0.001]
tensor(0.0014, device='cuda:0')
tensor([[0.4079],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4089], device='cuda:0')
Steps: 81%|████████ | 806/1000 [08:33<02:02, 1.58it/s, loss=0.00103, lr=0.001]
Steps: 81%|████████ | 807/1000 [08:34<02:02, 1.57it/s, loss=0.00103, lr=0.001]
Steps: 81%|████████ | 807/1000 [08:34<02:02, 1.57it/s, loss=0.00419, lr=0.001]
Steps: 81%|████████ | 808/1000 [08:35<02:00, 1.59it/s, loss=0.00419, lr=0.001]
tensor(0.0009, device='cuda:0')
tensor([[0.4078],
[0.4098]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4088], device='cuda:0')
Steps: 81%|████████ | 808/1000 [08:35<02:00, 1.59it/s, loss=0.000414, lr=0.001]
Steps: 81%|████████ | 809/1000 [08:35<02:01, 1.58it/s, loss=0.000414, lr=0.001]
Steps: 81%|████████ | 809/1000 [08:35<02:01, 1.58it/s, loss=0.000762, lr=0.001]
Steps: 81%|████████ | 810/1000 [08:36<01:59, 1.59it/s, loss=0.000762, lr=0.001]
tensor(0.0047, device='cuda:0')
tensor([[0.4077],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4087], device='cuda:0')
Steps: 81%|████████ | 810/1000 [08:36<01:59, 1.59it/s, loss=0.00104, lr=0.001]
Steps: 81%|████████ | 811/1000 [08:37<02:00, 1.57it/s, loss=0.00104, lr=0.001]
Steps: 81%|████████ | 811/1000 [08:37<02:00, 1.57it/s, loss=0.000707, lr=0.001]
Steps: 81%|████████ | 812/1000 [08:37<01:59, 1.58it/s, loss=0.000707, lr=0.001]
tensor(0.0070, device='cuda:0')
tensor([[0.4076],
[0.4095]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4086], device='cuda:0')
Steps: 81%|████████ | 812/1000 [08:37<01:59, 1.58it/s, loss=0.0044, lr=0.001]
Steps: 81%|████████▏ | 813/1000 [08:38<01:59, 1.57it/s, loss=0.0044, lr=0.001]
Steps: 81%|████████▏ | 813/1000 [08:38<01:59, 1.57it/s, loss=0.0651, lr=0.001]
Steps: 81%|████████▏ | 814/1000 [08:39<01:57, 1.58it/s, loss=0.0651, lr=0.001]
tensor(0.0085, device='cuda:0')
tensor([[0.4076],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4085], device='cuda:0')
Steps: 81%|████████▏ | 814/1000 [08:39<01:57, 1.58it/s, loss=0.0106, lr=0.001]
Steps: 82%|████████▏ | 815/1000 [08:39<01:57, 1.57it/s, loss=0.0106, lr=0.001]
Steps: 82%|████████▏ | 815/1000 [08:39<01:57, 1.57it/s, loss=0.0596, lr=0.001]
Steps: 82%|████████▏ | 816/1000 [08:40<01:56, 1.58it/s, loss=0.0596, lr=0.001]
tensor(0.0065, device='cuda:0')
tensor([[0.4076],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4084], device='cuda:0')
Steps: 82%|████████▏ | 816/1000 [08:40<01:56, 1.58it/s, loss=0.0189, lr=0.001]
Steps: 82%|████████▏ | 817/1000 [08:40<01:56, 1.57it/s, loss=0.0189, lr=0.001]
Steps: 82%|████████▏ | 817/1000 [08:40<01:56, 1.57it/s, loss=0.0145, lr=0.001]
Steps: 82%|████████▏ | 818/1000 [08:41<01:54, 1.58it/s, loss=0.0145, lr=0.001]
tensor(0.0035, device='cuda:0')
tensor([[0.4076],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4083], device='cuda:0')
Steps: 82%|████████▏ | 818/1000 [08:41<01:54, 1.58it/s, loss=0.00208, lr=0.001]
Steps: 82%|████████▏ | 819/1000 [08:42<01:55, 1.57it/s, loss=0.00208, lr=0.001]
Steps: 82%|████████▏ | 819/1000 [08:42<01:55, 1.57it/s, loss=0.00879, lr=0.001]
Steps: 82%|████████▏ | 820/1000 [08:42<01:53, 1.58it/s, loss=0.00879, lr=0.001]
tensor(0.0078, device='cuda:0')
tensor([[0.4076],
[0.4092]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4083], device='cuda:0')
Steps: 82%|████████▏ | 820/1000 [08:42<01:53, 1.58it/s, loss=0.000313, lr=0.001]
Steps: 82%|████████▏ | 821/1000 [08:43<01:54, 1.56it/s, loss=0.000313, lr=0.001]
Steps: 82%|████████▏ | 821/1000 [08:43<01:54, 1.56it/s, loss=0.011, lr=0.001]
Steps: 82%|████████▏ | 822/1000 [08:44<01:52, 1.58it/s, loss=0.011, lr=0.001]
tensor(0.0052, device='cuda:0')
tensor([[0.4076],
[0.4092]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4083], device='cuda:0')
Steps: 82%|████████▏ | 822/1000 [08:44<01:52, 1.58it/s, loss=0.0259, lr=0.001]
Steps: 82%|████████▏ | 823/1000 [08:44<01:53, 1.57it/s, loss=0.0259, lr=0.001]
Steps: 82%|████████▏ | 823/1000 [08:44<01:53, 1.57it/s, loss=0.00135, lr=0.001]
Steps: 82%|████████▏ | 824/1000 [08:45<01:51, 1.59it/s, loss=0.00135, lr=0.001]
tensor(0.0029, device='cuda:0')
tensor([[0.4077],
[0.4092]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4083], device='cuda:0')
Steps: 82%|████████▏ | 824/1000 [08:45<01:51, 1.59it/s, loss=0.0033, lr=0.001]
Steps: 82%|████████▎ | 825/1000 [08:46<01:51, 1.57it/s, loss=0.0033, lr=0.001]
Steps: 82%|████████▎ | 825/1000 [08:46<01:51, 1.57it/s, loss=0.00391, lr=0.001]
Steps: 83%|████████▎ | 826/1000 [08:46<01:49, 1.58it/s, loss=0.00391, lr=0.001]
tensor(0.0020, device='cuda:0')
tensor([[0.4078],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4083], device='cuda:0')
Steps: 83%|████████▎ | 826/1000 [08:46<01:49, 1.58it/s, loss=0.000745, lr=0.001]
Steps: 83%|████████▎ | 827/1000 [08:47<01:50, 1.57it/s, loss=0.000745, lr=0.001]
Steps: 83%|████████▎ | 827/1000 [08:47<01:50, 1.57it/s, loss=0.00252, lr=0.001]
Steps: 83%|████████▎ | 828/1000 [08:47<01:48, 1.58it/s, loss=0.00252, lr=0.001]
tensor(0.0047, device='cuda:0')
tensor([[0.4078],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4083], device='cuda:0')
Steps: 83%|████████▎ | 828/1000 [08:47<01:48, 1.58it/s, loss=0.00584, lr=0.001]
Steps: 83%|████████▎ | 829/1000 [08:48<01:48, 1.57it/s, loss=0.00584, lr=0.001]
Steps: 83%|████████▎ | 829/1000 [08:48<01:48, 1.57it/s, loss=0.013, lr=0.001]
Steps: 83%|████████▎ | 830/1000 [08:49<01:47, 1.59it/s, loss=0.013, lr=0.001]
tensor(0.0074, device='cuda:0')
tensor([[0.4079],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4084], device='cuda:0')
Steps: 83%|████████▎ | 830/1000 [08:49<01:47, 1.59it/s, loss=0.00509, lr=0.001]
Steps: 83%|████████▎ | 831/1000 [08:49<01:47, 1.57it/s, loss=0.00509, lr=0.001]
Steps: 83%|████████▎ | 831/1000 [08:49<01:47, 1.57it/s, loss=0.0333, lr=0.001]
Steps: 83%|████████▎ | 832/1000 [08:50<01:45, 1.59it/s, loss=0.0333, lr=0.001]
tensor(0.0026, device='cuda:0')
tensor([[0.4080],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4072, 0.4084], device='cuda:0')
Steps: 83%|████████▎ | 832/1000 [08:50<01:45, 1.59it/s, loss=0.00102, lr=0.001]
Steps: 83%|████████▎ | 833/1000 [08:51<01:46, 1.57it/s, loss=0.00102, lr=0.001]
Steps: 83%|████████▎ | 833/1000 [08:51<01:46, 1.57it/s, loss=0.00504, lr=0.001]
Steps: 83%|████████▎ | 834/1000 [08:51<01:45, 1.58it/s, loss=0.00504, lr=0.001]
tensor(0.0077, device='cuda:0')
tensor([[0.4081],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4073, 0.4084], device='cuda:0')
Steps: 83%|████████▎ | 834/1000 [08:51<01:45, 1.58it/s, loss=0.0171, lr=0.001]
Steps: 84%|████████▎ | 835/1000 [08:52<01:45, 1.56it/s, loss=0.0171, lr=0.001]
Steps: 84%|████████▎ | 835/1000 [08:52<01:45, 1.56it/s, loss=0.00969, lr=0.001]
Steps: 84%|████████▎ | 836/1000 [08:52<01:44, 1.58it/s, loss=0.00969, lr=0.001]
tensor(0.0010, device='cuda:0')
tensor([[0.4081],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4073, 0.4085], device='cuda:0')
Steps: 84%|████████▎ | 836/1000 [08:52<01:44, 1.58it/s, loss=0.00105, lr=0.001]
Steps: 84%|████████▎ | 837/1000 [08:53<01:44, 1.57it/s, loss=0.00105, lr=0.001]
Steps: 84%|████████▎ | 837/1000 [08:53<01:44, 1.57it/s, loss=0.000342, lr=0.001]
Steps: 84%|████████▍ | 838/1000 [08:54<01:42, 1.58it/s, loss=0.000342, lr=0.001]
tensor(0.0054, device='cuda:0')
tensor([[0.4082],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4074, 0.4085], device='cuda:0')
Steps: 84%|████████▍ | 838/1000 [08:54<01:42, 1.58it/s, loss=0.0134, lr=0.001]
Steps: 84%|████████▍ | 839/1000 [08:54<01:42, 1.57it/s, loss=0.0134, lr=0.001]
Steps: 84%|████████▍ | 839/1000 [08:54<01:42, 1.57it/s, loss=0.0579, lr=0.001]
Steps: 84%|████████▍ | 840/1000 [08:55<01:41, 1.57it/s, loss=0.0579, lr=0.001]
tensor(0.0016, device='cuda:0')
tensor([[0.4081],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4073, 0.4084], device='cuda:0')
Steps: 84%|████████▍ | 840/1000 [08:55<01:41, 1.57it/s, loss=0.00562, lr=0.001]
Steps: 84%|████████▍ | 841/1000 [08:56<01:41, 1.56it/s, loss=0.00562, lr=0.001]
Steps: 84%|████████▍ | 841/1000 [08:56<01:41, 1.56it/s, loss=0.000182, lr=0.001]
Steps: 84%|████████▍ | 842/1000 [08:56<01:40, 1.57it/s, loss=0.000182, lr=0.001]
tensor(0.0037, device='cuda:0')
tensor([[0.4080],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4072, 0.4083], device='cuda:0')
Steps: 84%|████████▍ | 842/1000 [08:56<01:40, 1.57it/s, loss=0.00387, lr=0.001]
Steps: 84%|████████▍ | 843/1000 [08:57<01:40, 1.57it/s, loss=0.00387, lr=0.001]
Steps: 84%|████████▍ | 843/1000 [08:57<01:40, 1.57it/s, loss=0.00379, lr=0.001]
Steps: 84%|████████▍ | 844/1000 [08:58<01:38, 1.59it/s, loss=0.00379, lr=0.001]
tensor(0.0019, device='cuda:0')
tensor([[0.4079],
[0.4091]], device='cuda:0')
Current Norm : tensor([0.4072, 0.4082], device='cuda:0')
Steps: 84%|████████▍ | 844/1000 [08:58<01:38, 1.59it/s, loss=0.00819, lr=0.001]
Steps: 84%|████████▍ | 845/1000 [08:58<01:38, 1.57it/s, loss=0.00819, lr=0.001]
Steps: 84%|████████▍ | 845/1000 [08:58<01:38, 1.57it/s, loss=0.00126, lr=0.001]
Steps: 85%|████████▍ | 846/1000 [08:59<01:37, 1.58it/s, loss=0.00126, lr=0.001]
tensor(0.0018, device='cuda:0')
tensor([[0.4078],
[0.4090]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4081], device='cuda:0')
Steps: 85%|████████▍ | 846/1000 [08:59<01:37, 1.58it/s, loss=0.000191, lr=0.001]
Steps: 85%|████████▍ | 847/1000 [08:59<01:37, 1.57it/s, loss=0.000191, lr=0.001]
Steps: 85%|████████▍ | 847/1000 [08:59<01:37, 1.57it/s, loss=0.00379, lr=0.001]
Steps: 85%|████████▍ | 848/1000 [09:00<01:35, 1.58it/s, loss=0.00379, lr=0.001]
tensor(0.0109, device='cuda:0')
tensor([[0.4077],
[0.4088]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4079], device='cuda:0')
Steps: 85%|████████▍ | 848/1000 [09:00<01:35, 1.58it/s, loss=0.0238, lr=0.001]
Steps: 85%|████████▍ | 849/1000 [09:01<01:35, 1.58it/s, loss=0.0238, lr=0.001]
Steps: 85%|████████▍ | 849/1000 [09:01<01:35, 1.58it/s, loss=0.0042, lr=0.001]
Steps: 85%|████████▌ | 850/1000 [09:01<01:34, 1.59it/s, loss=0.0042, lr=0.001]
tensor(0.0058, device='cuda:0')
tensor([[0.4075],
[0.4086]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4078], device='cuda:0')
Steps: 85%|████████▌ | 850/1000 [09:01<01:34, 1.59it/s, loss=0.0154, lr=0.001]
Steps: 85%|████████▌ | 851/1000 [09:02<01:34, 1.58it/s, loss=0.0154, lr=0.001]
Steps: 85%|████████▌ | 851/1000 [09:02<01:34, 1.58it/s, loss=0.0067, lr=0.001]
Steps: 85%|████████▌ | 852/1000 [09:03<01:33, 1.58it/s, loss=0.0067, lr=0.001]
tensor(0.0052, device='cuda:0')
tensor([[0.4074],
[0.4085]], device='cuda:0')
Current Norm : tensor([0.4067, 0.4076], device='cuda:0')
Steps: 85%|████████▌ | 852/1000 [09:03<01:33, 1.58it/s, loss=0.000584, lr=0.001]
Steps: 85%|████████▌ | 853/1000 [09:03<01:33, 1.56it/s, loss=0.000584, lr=0.001]
Steps: 85%|████████▌ | 853/1000 [09:03<01:33, 1.56it/s, loss=0.0849, lr=0.001]
Steps: 85%|████████▌ | 854/1000 [09:04<01:32, 1.57it/s, loss=0.0849, lr=0.001]
tensor(0.0084, device='cuda:0')
tensor([[0.4073],
[0.4084]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4076], device='cuda:0')
Steps: 85%|████████▌ | 854/1000 [09:04<01:32, 1.57it/s, loss=0.036, lr=0.001]
Steps: 86%|████████▌ | 855/1000 [09:05<01:32, 1.56it/s, loss=0.036, lr=0.001]
Steps: 86%|████████▌ | 855/1000 [09:05<01:32, 1.56it/s, loss=0.016, lr=0.001]
Steps: 86%|████████▌ | 856/1000 [09:05<01:31, 1.58it/s, loss=0.016, lr=0.001]
tensor(0.0053, device='cuda:0')
tensor([[0.4074],
[0.4084]], device='cuda:0')
Current Norm : tensor([0.4066, 0.4076], device='cuda:0')
Steps: 86%|████████▌ | 856/1000 [09:05<01:31, 1.58it/s, loss=0.018, lr=0.001]
Steps: 86%|████████▌ | 857/1000 [09:06<01:31, 1.57it/s, loss=0.018, lr=0.001]
Steps: 86%|████████▌ | 857/1000 [09:06<01:31, 1.57it/s, loss=0.0116, lr=0.001]
Steps: 86%|████████▌ | 858/1000 [09:06<01:29, 1.58it/s, loss=0.0116, lr=0.001]
tensor(0.0047, device='cuda:0')
tensor([[0.4074],
[0.4085]], device='cuda:0')
Current Norm : tensor([0.4067, 0.4077], device='cuda:0')
Steps: 86%|████████▌ | 858/1000 [09:06<01:29, 1.58it/s, loss=0.00814, lr=0.001]
Steps: 86%|████████▌ | 859/1000 [09:07<01:29, 1.57it/s, loss=0.00814, lr=0.001]
Steps: 86%|████████▌ | 859/1000 [09:07<01:29, 1.57it/s, loss=0.00405, lr=0.001]
Steps: 86%|████████▌ | 860/1000 [09:08<01:28, 1.58it/s, loss=0.00405, lr=0.001]
tensor(0.0039, device='cuda:0')
tensor([[0.4075],
[0.4086]], device='cuda:0')
Current Norm : tensor([0.4068, 0.4077], device='cuda:0')
Steps: 86%|████████▌ | 860/1000 [09:08<01:28, 1.58it/s, loss=0.0098, lr=0.001]
Steps: 86%|████████▌ | 861/1000 [09:08<01:28, 1.57it/s, loss=0.0098, lr=0.001]
Steps: 86%|████████▌ | 861/1000 [09:08<01:28, 1.57it/s, loss=0.01, lr=0.001]
Steps: 86%|████████▌ | 862/1000 [09:09<01:27, 1.58it/s, loss=0.01, lr=0.001]
tensor(0.0073, device='cuda:0')
tensor([[0.4077],
[0.4086]], device='cuda:0')
Current Norm : tensor([0.4069, 0.4077], device='cuda:0')
Steps: 86%|████████▌ | 862/1000 [09:09<01:27, 1.58it/s, loss=0.00465, lr=0.001]
Steps: 86%|████████▋ | 863/1000 [09:10<01:27, 1.57it/s, loss=0.00465, lr=0.001]
Steps: 86%|████████▋ | 863/1000 [09:10<01:27, 1.57it/s, loss=0.0772, lr=0.001]
Steps: 86%|████████▋ | 864/1000 [09:10<01:26, 1.58it/s, loss=0.0772, lr=0.001]
tensor(0.0026, device='cuda:0')
tensor([[0.4079],
[0.4086]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4077], device='cuda:0')
Steps: 86%|████████▋ | 864/1000 [09:10<01:26, 1.58it/s, loss=0.000222, lr=0.001]
Steps: 86%|████████▋ | 865/1000 [09:11<01:26, 1.56it/s, loss=0.000222, lr=0.001]
Steps: 86%|████████▋ | 865/1000 [09:11<01:26, 1.56it/s, loss=0.015, lr=0.001]
Steps: 87%|████████▋ | 866/1000 [09:12<01:24, 1.58it/s, loss=0.015, lr=0.001]
tensor(0.0084, device='cuda:0')
tensor([[0.4081],
[0.4086]], device='cuda:0')
Current Norm : tensor([0.4073, 0.4078], device='cuda:0')
Steps: 87%|████████▋ | 866/1000 [09:12<01:24, 1.58it/s, loss=0.0187, lr=0.001]
Steps: 87%|████████▋ | 867/1000 [09:12<01:25, 1.56it/s, loss=0.0187, lr=0.001]
Steps: 87%|████████▋ | 867/1000 [09:12<01:25, 1.56it/s, loss=0.0145, lr=0.001]
Steps: 87%|████████▋ | 868/1000 [09:13<01:23, 1.57it/s, loss=0.0145, lr=0.001]
tensor(0.0118, device='cuda:0')
tensor([[0.4083],
[0.4087]], device='cuda:0')
Current Norm : tensor([0.4074, 0.4079], device='cuda:0')
Steps: 87%|████████▋ | 868/1000 [09:13<01:23, 1.57it/s, loss=0.0673, lr=0.001]
Steps: 87%|████████▋ | 869/1000 [09:13<01:23, 1.56it/s, loss=0.0673, lr=0.001]
Steps: 87%|████████▋ | 869/1000 [09:13<01:23, 1.56it/s, loss=0.052, lr=0.001]
Steps: 87%|████████▋ | 870/1000 [09:14<01:22, 1.58it/s, loss=0.052, lr=0.001]
tensor(0.0075, device='cuda:0')
tensor([[0.4086],
[0.4089]], device='cuda:0')
Current Norm : tensor([0.4077, 0.4080], device='cuda:0')
Steps: 87%|████████▋ | 870/1000 [09:14<01:22, 1.58it/s, loss=0.0164, lr=0.001]
Steps: 87%|████████▋ | 871/1000 [09:15<01:22, 1.57it/s, loss=0.0164, lr=0.001]
Steps: 87%|████████▋ | 871/1000 [09:15<01:22, 1.57it/s, loss=0.0497, lr=0.001]
Steps: 87%|████████▋ | 872/1000 [09:15<01:20, 1.58it/s, loss=0.0497, lr=0.001]
tensor(0.0013, device='cuda:0')
tensor([[0.4089],
[0.4091]], device='cuda:0')
Current Norm : tensor([0.4080, 0.4082], device='cuda:0')
Steps: 87%|████████▋ | 872/1000 [09:15<01:20, 1.58it/s, loss=0.000777, lr=0.001]
Steps: 87%|████████▋ | 873/1000 [09:16<01:20, 1.57it/s, loss=0.000777, lr=0.001]
Steps: 87%|████████▋ | 873/1000 [09:16<01:20, 1.57it/s, loss=0.00841, lr=0.001]
Steps: 87%|████████▋ | 874/1000 [09:17<01:19, 1.58it/s, loss=0.00841, lr=0.001]
tensor(0.0046, device='cuda:0')
tensor([[0.4092],
[0.4093]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4084], device='cuda:0')
Steps: 87%|████████▋ | 874/1000 [09:17<01:19, 1.58it/s, loss=0.0873, lr=0.001]
Steps: 88%|████████▊ | 875/1000 [09:17<01:19, 1.57it/s, loss=0.0873, lr=0.001]
Steps: 88%|████████▊ | 875/1000 [09:17<01:19, 1.57it/s, loss=0.000283, lr=0.001]
Steps: 88%|████████▊ | 876/1000 [09:18<01:18, 1.57it/s, loss=0.000283, lr=0.001]
tensor(0.0094, device='cuda:0')
tensor([[0.4094],
[0.4095]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4085], device='cuda:0')
Steps: 88%|████████▊ | 876/1000 [09:18<01:18, 1.57it/s, loss=0.0306, lr=0.001]
Steps: 88%|████████▊ | 877/1000 [09:19<01:18, 1.56it/s, loss=0.0306, lr=0.001]
Steps: 88%|████████▊ | 877/1000 [09:19<01:18, 1.56it/s, loss=0.0644, lr=0.001]
Steps: 88%|████████▊ | 878/1000 [09:19<01:17, 1.58it/s, loss=0.0644, lr=0.001]
tensor(0.0087, device='cuda:0')
tensor([[0.4096],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4087, 0.4089], device='cuda:0')
Steps: 88%|████████▊ | 878/1000 [09:19<01:17, 1.58it/s, loss=0.0373, lr=0.001]
Steps: 88%|████████▊ | 879/1000 [09:20<01:16, 1.58it/s, loss=0.0373, lr=0.001]
Steps: 88%|████████▊ | 879/1000 [09:20<01:16, 1.58it/s, loss=0.00094, lr=0.001]
Steps: 88%|████████▊ | 880/1000 [09:20<01:15, 1.59it/s, loss=0.00094, lr=0.001]
tensor(0.0046, device='cuda:0')
tensor([[0.4099],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4089, 0.4092], device='cuda:0')
Steps: 88%|████████▊ | 880/1000 [09:20<01:15, 1.59it/s, loss=0.0125, lr=0.001]
Steps: 88%|████████▊ | 881/1000 [09:21<01:15, 1.58it/s, loss=0.0125, lr=0.001]
Steps: 88%|████████▊ | 881/1000 [09:21<01:15, 1.58it/s, loss=0.0143, lr=0.001]
Steps: 88%|████████▊ | 882/1000 [09:22<01:14, 1.59it/s, loss=0.0143, lr=0.001]
tensor(0.0109, device='cuda:0')
tensor([[0.4102],
[0.4106]], device='cuda:0')
Current Norm : tensor([0.4092, 0.4096], device='cuda:0')
Steps: 88%|████████▊ | 882/1000 [09:22<01:14, 1.59it/s, loss=0.0168, lr=0.001]
Steps: 88%|████████▊ | 883/1000 [09:22<01:14, 1.58it/s, loss=0.0168, lr=0.001]
Steps: 88%|████████▊ | 883/1000 [09:22<01:14, 1.58it/s, loss=0.00371, lr=0.001]
Steps: 88%|████████▊ | 884/1000 [09:23<01:13, 1.59it/s, loss=0.00371, lr=0.001]
tensor(0.0032, device='cuda:0')
tensor([[0.4104],
[0.4110]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4099], device='cuda:0')
Steps: 88%|████████▊ | 884/1000 [09:23<01:13, 1.59it/s, loss=0.000795, lr=0.001]
Steps: 88%|████████▊ | 885/1000 [09:24<01:13, 1.58it/s, loss=0.000795, lr=0.001]
Steps: 88%|████████▊ | 885/1000 [09:24<01:13, 1.58it/s, loss=0.00487, lr=0.001]
Steps: 89%|████████▊ | 886/1000 [09:24<01:12, 1.58it/s, loss=0.00487, lr=0.001]
tensor(0.0044, device='cuda:0')
tensor([[0.4106],
[0.4112]], device='cuda:0')
Current Norm : tensor([0.4096, 0.4101], device='cuda:0')
Steps: 89%|████████▊ | 886/1000 [09:24<01:12, 1.58it/s, loss=0.000165, lr=0.001]
Steps: 89%|████████▊ | 887/1000 [09:25<01:11, 1.57it/s, loss=0.000165, lr=0.001]
Steps: 89%|████████▊ | 887/1000 [09:25<01:11, 1.57it/s, loss=0.011, lr=0.001]
Steps: 89%|████████▉ | 888/1000 [09:25<01:10, 1.59it/s, loss=0.011, lr=0.001]
tensor(0.0014, device='cuda:0')
tensor([[0.4108],
[0.4115]], device='cuda:0')
Current Norm : tensor([0.4097, 0.4103], device='cuda:0')
Steps: 89%|████████▉ | 888/1000 [09:25<01:10, 1.59it/s, loss=0.000376, lr=0.001]
Steps: 89%|████████▉ | 889/1000 [09:26<01:10, 1.58it/s, loss=0.000376, lr=0.001]
Steps: 89%|████████▉ | 889/1000 [09:26<01:10, 1.58it/s, loss=0.00271, lr=0.001]
Steps: 89%|████████▉ | 890/1000 [09:27<01:09, 1.58it/s, loss=0.00271, lr=0.001]
tensor(0.0069, device='cuda:0')
tensor([[0.4109],
[0.4117]], device='cuda:0')
Current Norm : tensor([0.4098, 0.4105], device='cuda:0')
Steps: 89%|████████▉ | 890/1000 [09:27<01:09, 1.58it/s, loss=0.107, lr=0.001]
Steps: 89%|████████▉ | 891/1000 [09:27<01:09, 1.57it/s, loss=0.107, lr=0.001]
Steps: 89%|████████▉ | 891/1000 [09:27<01:09, 1.57it/s, loss=0.00925, lr=0.001]
Steps: 89%|████████▉ | 892/1000 [09:28<01:07, 1.59it/s, loss=0.00925, lr=0.001]
tensor(0.0027, device='cuda:0')
tensor([[0.4110],
[0.4118]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4106], device='cuda:0')
Steps: 89%|████████▉ | 892/1000 [09:28<01:07, 1.59it/s, loss=0.00599, lr=0.001]
Steps: 89%|████████▉ | 893/1000 [09:29<01:07, 1.58it/s, loss=0.00599, lr=0.001]
Steps: 89%|████████▉ | 893/1000 [09:29<01:07, 1.58it/s, loss=0.00141, lr=0.001]
Steps: 89%|████████▉ | 894/1000 [09:29<01:06, 1.60it/s, loss=0.00141, lr=0.001]
tensor(0.0028, device='cuda:0')
tensor([[0.4110],
[0.4119]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4107], device='cuda:0')
Steps: 89%|████████▉ | 894/1000 [09:29<01:06, 1.60it/s, loss=0.00575, lr=0.001]
Steps: 90%|████████▉ | 895/1000 [09:30<01:06, 1.58it/s, loss=0.00575, lr=0.001]
Steps: 90%|████████▉ | 895/1000 [09:30<01:06, 1.58it/s, loss=0.00116, lr=0.001]
Steps: 90%|████████▉ | 896/1000 [09:31<01:05, 1.59it/s, loss=0.00116, lr=0.001]
tensor(0.0084, device='cuda:0')
tensor([[0.4110],
[0.4120]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4108], device='cuda:0')
Steps: 90%|████████▉ | 896/1000 [09:31<01:05, 1.59it/s, loss=0.0673, lr=0.001]
Steps: 90%|████████▉ | 897/1000 [09:31<01:05, 1.58it/s, loss=0.0673, lr=0.001]
Steps: 90%|████████▉ | 897/1000 [09:31<01:05, 1.58it/s, loss=0.0103, lr=0.001]
Steps: 90%|████████▉ | 898/1000 [09:32<01:04, 1.59it/s, loss=0.0103, lr=0.001]
tensor(0.0061, device='cuda:0')
tensor([[0.4110],
[0.4120]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4108], device='cuda:0')
Steps: 90%|████████▉ | 898/1000 [09:32<01:04, 1.59it/s, loss=0.04, lr=0.001]
Steps: 90%|████████▉ | 899/1000 [09:32<01:03, 1.58it/s, loss=0.04, lr=0.001]
Steps: 90%|████████▉ | 899/1000 [09:32<01:03, 1.58it/s, loss=0.0017, lr=0.001]
Steps: 90%|█████████ | 900/1000 [09:33<01:02, 1.59it/s, loss=0.0017, lr=0.001]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0191, 0.0026, -0.0084, -0.0223], device='cuda:0',
grad_fn=<SliceBackward0>)
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0061, 0.0070, -0.0055, 0.0103], device='cuda:0',
grad_fn=<SliceBackward0>)
Saving weights to checkpoints/step_inv_900.safetensors
tensor(0.0058, device='cuda:0')
tensor([[0.4110],
[0.4121]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4109], device='cuda:0')
Steps: 90%|█████████ | 900/1000 [09:33<01:02, 1.59it/s, loss=0.00088, lr=0.001]
Steps: 90%|█████████ | 901/1000 [09:34<01:02, 1.58it/s, loss=0.00088, lr=0.001]
Steps: 90%|█████████ | 901/1000 [09:34<01:02, 1.58it/s, loss=0.0159, lr=0.001]
Steps: 90%|█████████ | 902/1000 [09:34<01:01, 1.58it/s, loss=0.0159, lr=0.001]
tensor(0.0033, device='cuda:0')
tensor([[0.4110],
[0.4121]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4109], device='cuda:0')
Steps: 90%|█████████ | 902/1000 [09:34<01:01, 1.58it/s, loss=0.00016, lr=0.001]
Steps: 90%|█████████ | 903/1000 [09:35<01:01, 1.57it/s, loss=0.00016, lr=0.001]
Steps: 90%|█████████ | 903/1000 [09:35<01:01, 1.57it/s, loss=0.00761, lr=0.001]
Steps: 90%|█████████ | 904/1000 [09:36<01:00, 1.58it/s, loss=0.00761, lr=0.001]
tensor(0.0021, device='cuda:0')
tensor([[0.4110],
[0.4122]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4110], device='cuda:0')
Steps: 90%|█████████ | 904/1000 [09:36<01:00, 1.58it/s, loss=0.00334, lr=0.001]
Steps: 90%|█████████ | 905/1000 [09:36<01:00, 1.57it/s, loss=0.00334, lr=0.001]
Steps: 90%|█████████ | 905/1000 [09:36<01:00, 1.57it/s, loss=0.00641, lr=0.001]
Steps: 91%|█████████ | 906/1000 [09:37<00:59, 1.59it/s, loss=0.00641, lr=0.001]
tensor(0.0027, device='cuda:0')
tensor([[0.4110],
[0.4122]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4109], device='cuda:0')
Steps: 91%|█████████ | 906/1000 [09:37<00:59, 1.59it/s, loss=0.00472, lr=0.001]
Steps: 91%|█████████ | 907/1000 [09:38<00:58, 1.58it/s, loss=0.00472, lr=0.001]
Steps: 91%|█████████ | 907/1000 [09:38<00:58, 1.58it/s, loss=0.00117, lr=0.001]
Steps: 91%|█████████ | 908/1000 [09:38<00:57, 1.59it/s, loss=0.00117, lr=0.001]
tensor(0.0101, device='cuda:0')
tensor([[0.4110],
[0.4121]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4109], device='cuda:0')
Steps: 91%|█████████ | 908/1000 [09:38<00:57, 1.59it/s, loss=0.0283, lr=0.001]
Steps: 91%|█████████ | 909/1000 [09:39<00:57, 1.58it/s, loss=0.0283, lr=0.001]
Steps: 91%|█████████ | 909/1000 [09:39<00:57, 1.58it/s, loss=0.0145, lr=0.001]
Steps: 91%|█████████ | 910/1000 [09:39<00:56, 1.58it/s, loss=0.0145, lr=0.001]
tensor(0.0054, device='cuda:0')
tensor([[0.4110],
[0.4121]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4109], device='cuda:0')
Steps: 91%|█████████ | 910/1000 [09:39<00:56, 1.58it/s, loss=0.000187, lr=0.001]
Steps: 91%|█████████ | 911/1000 [09:40<00:56, 1.56it/s, loss=0.000187, lr=0.001]
Steps: 91%|█████████ | 911/1000 [09:40<00:56, 1.56it/s, loss=0.0178, lr=0.001]
Steps: 91%|█████████ | 912/1000 [09:41<00:55, 1.58it/s, loss=0.0178, lr=0.001]
tensor(0.0091, device='cuda:0')
tensor([[0.4110],
[0.4121]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4109], device='cuda:0')
Steps: 91%|█████████ | 912/1000 [09:41<00:55, 1.58it/s, loss=0.0288, lr=0.001]
Steps: 91%|█████████▏| 913/1000 [09:41<00:55, 1.57it/s, loss=0.0288, lr=0.001]
Steps: 91%|█████████▏| 913/1000 [09:41<00:55, 1.57it/s, loss=0.0104, lr=0.001]
Steps: 91%|█████████▏| 914/1000 [09:42<00:54, 1.58it/s, loss=0.0104, lr=0.001]
tensor(0.0022, device='cuda:0')
tensor([[0.4111],
[0.4122]], device='cuda:0')
Current Norm : tensor([0.4100, 0.4109], device='cuda:0')
Steps: 91%|█████████▏| 914/1000 [09:42<00:54, 1.58it/s, loss=0.00321, lr=0.001]
Steps: 92%|█████████▏| 915/1000 [09:43<00:54, 1.57it/s, loss=0.00321, lr=0.001]
Steps: 92%|█████████▏| 915/1000 [09:43<00:54, 1.57it/s, loss=0.00369, lr=0.001]
Steps: 92%|█████████▏| 916/1000 [09:43<00:53, 1.58it/s, loss=0.00369, lr=0.001]
tensor(0.0083, device='cuda:0')
tensor([[0.4111],
[0.4121]], device='cuda:0')
Current Norm : tensor([0.4100, 0.4109], device='cuda:0')
Steps: 92%|█████████▏| 916/1000 [09:43<00:53, 1.58it/s, loss=0.0451, lr=0.001]
Steps: 92%|█████████▏| 917/1000 [09:44<00:52, 1.57it/s, loss=0.0451, lr=0.001]
Steps: 92%|█████████▏| 917/1000 [09:44<00:52, 1.57it/s, loss=0.00412, lr=0.001]
Steps: 92%|█████████▏| 918/1000 [09:44<00:51, 1.58it/s, loss=0.00412, lr=0.001]
tensor(0.0099, device='cuda:0')
tensor([[0.4111],
[0.4121]], device='cuda:0')
Current Norm : tensor([0.4100, 0.4109], device='cuda:0')
Steps: 92%|█████████▏| 918/1000 [09:44<00:51, 1.58it/s, loss=0.0192, lr=0.001]
Steps: 92%|█████████▏| 919/1000 [09:45<00:51, 1.57it/s, loss=0.0192, lr=0.001]
Steps: 92%|█████████▏| 919/1000 [09:45<00:51, 1.57it/s, loss=0.00786, lr=0.001]
Steps: 92%|█████████▏| 920/1000 [09:46<00:50, 1.58it/s, loss=0.00786, lr=0.001]
tensor(0.0030, device='cuda:0')
tensor([[0.4111],
[0.4120]], device='cuda:0')
Current Norm : tensor([0.4100, 0.4108], device='cuda:0')
Steps: 92%|█████████▏| 920/1000 [09:46<00:50, 1.58it/s, loss=0.00157, lr=0.001]
Steps: 92%|█████████▏| 921/1000 [09:46<00:50, 1.57it/s, loss=0.00157, lr=0.001]
Steps: 92%|█████████▏| 921/1000 [09:46<00:50, 1.57it/s, loss=0.00689, lr=0.001]
Steps: 92%|█████████▏| 922/1000 [09:47<00:49, 1.58it/s, loss=0.00689, lr=0.001]
tensor(0.0022, device='cuda:0')
tensor([[0.4111],
[0.4120]], device='cuda:0')
Current Norm : tensor([0.4100, 0.4108], device='cuda:0')
Steps: 92%|█████████▏| 922/1000 [09:47<00:49, 1.58it/s, loss=0.00361, lr=0.001]
Steps: 92%|█████████▏| 923/1000 [09:48<00:49, 1.56it/s, loss=0.00361, lr=0.001]
Steps: 92%|█████████▏| 923/1000 [09:48<00:49, 1.56it/s, loss=0.000176, lr=0.001]
Steps: 92%|█████████▏| 924/1000 [09:48<00:48, 1.57it/s, loss=0.000176, lr=0.001]
tensor(0.0025, device='cuda:0')
tensor([[0.4110],
[0.4120]], device='cuda:0')
Current Norm : tensor([0.4099, 0.4108], device='cuda:0')
Steps: 92%|█████████▏| 924/1000 [09:48<00:48, 1.57it/s, loss=0.0106, lr=0.001]
Steps: 92%|█████████▎| 925/1000 [09:49<00:47, 1.57it/s, loss=0.0106, lr=0.001]
Steps: 92%|█████████▎| 925/1000 [09:49<00:47, 1.57it/s, loss=0.0317, lr=0.001]
Steps: 93%|█████████▎| 926/1000 [09:50<00:46, 1.59it/s, loss=0.0317, lr=0.001]
tensor(0.0038, device='cuda:0')
tensor([[0.4109],
[0.4120]], device='cuda:0')
Current Norm : tensor([0.4098, 0.4108], device='cuda:0')
Steps: 93%|█████████▎| 926/1000 [09:50<00:46, 1.59it/s, loss=0.000973, lr=0.001]
Steps: 93%|█████████▎| 927/1000 [09:50<00:46, 1.58it/s, loss=0.000973, lr=0.001]
Steps: 93%|█████████▎| 927/1000 [09:50<00:46, 1.58it/s, loss=0.00414, lr=0.001]
Steps: 93%|█████████▎| 928/1000 [09:51<00:45, 1.59it/s, loss=0.00414, lr=0.001]
tensor(0.0077, device='cuda:0')
tensor([[0.4108],
[0.4119]], device='cuda:0')
Current Norm : tensor([0.4097, 0.4107], device='cuda:0')
Steps: 93%|█████████▎| 928/1000 [09:51<00:45, 1.59it/s, loss=0.00288, lr=0.001]
Steps: 93%|█████████▎| 929/1000 [09:51<00:44, 1.58it/s, loss=0.00288, lr=0.001]
Steps: 93%|█████████▎| 929/1000 [09:51<00:44, 1.58it/s, loss=0.0412, lr=0.001]
Steps: 93%|█████████▎| 930/1000 [09:52<00:44, 1.58it/s, loss=0.0412, lr=0.001]
tensor(0.0071, device='cuda:0')
tensor([[0.4106],
[0.4117]], device='cuda:0')
Current Norm : tensor([0.4096, 0.4106], device='cuda:0')
Steps: 93%|█████████▎| 930/1000 [09:52<00:44, 1.58it/s, loss=0.000763, lr=0.001]
Steps: 93%|█████████▎| 931/1000 [09:53<00:43, 1.57it/s, loss=0.000763, lr=0.001]
Steps: 93%|█████████▎| 931/1000 [09:53<00:43, 1.57it/s, loss=0.0899, lr=0.001]
Steps: 93%|█████████▎| 932/1000 [09:53<00:43, 1.58it/s, loss=0.0899, lr=0.001]
tensor(0.0035, device='cuda:0')
tensor([[0.4105],
[0.4116]], device='cuda:0')
Current Norm : tensor([0.4094, 0.4104], device='cuda:0')
Steps: 93%|█████████▎| 932/1000 [09:53<00:43, 1.58it/s, loss=0.00604, lr=0.001]
Steps: 93%|█████████▎| 933/1000 [09:54<00:42, 1.58it/s, loss=0.00604, lr=0.001]
Steps: 93%|█████████▎| 933/1000 [09:54<00:42, 1.58it/s, loss=0.00167, lr=0.001]
Steps: 93%|█████████▎| 934/1000 [09:55<00:41, 1.59it/s, loss=0.00167, lr=0.001]
tensor(0.0070, device='cuda:0')
tensor([[0.4102],
[0.4115]], device='cuda:0')
Current Norm : tensor([0.4092, 0.4103], device='cuda:0')
Steps: 93%|█████████▎| 934/1000 [09:55<00:41, 1.59it/s, loss=0.00276, lr=0.001]
Steps: 94%|█████████▎| 935/1000 [09:55<00:41, 1.57it/s, loss=0.00276, lr=0.001]
Steps: 94%|█████████▎| 935/1000 [09:55<00:41, 1.57it/s, loss=0.0862, lr=0.001]
Steps: 94%|█████████▎| 936/1000 [09:56<00:40, 1.58it/s, loss=0.0862, lr=0.001]
tensor(0.0059, device='cuda:0')
tensor([[0.4099],
[0.4114]], device='cuda:0')
Current Norm : tensor([0.4089, 0.4103], device='cuda:0')
Steps: 94%|█████████▎| 936/1000 [09:56<00:40, 1.58it/s, loss=0.0186, lr=0.001]
Steps: 94%|█████████▎| 937/1000 [09:57<00:40, 1.57it/s, loss=0.0186, lr=0.001]
Steps: 94%|█████████▎| 937/1000 [09:57<00:40, 1.57it/s, loss=0.000916, lr=0.001]
Steps: 94%|█████████▍| 938/1000 [09:57<00:39, 1.58it/s, loss=0.000916, lr=0.001]
tensor(0.0056, device='cuda:0')
tensor([[0.4096],
[0.4113]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4102], device='cuda:0')
Steps: 94%|█████████▍| 938/1000 [09:57<00:39, 1.58it/s, loss=0.0088, lr=0.001]
Steps: 94%|█████████▍| 939/1000 [09:58<00:38, 1.57it/s, loss=0.0088, lr=0.001]
Steps: 94%|█████████▍| 939/1000 [09:58<00:38, 1.57it/s, loss=0.000176, lr=0.001]
Steps: 94%|█████████▍| 940/1000 [09:58<00:37, 1.59it/s, loss=0.000176, lr=0.001]
tensor(0.0058, device='cuda:0')
tensor([[0.4093],
[0.4112]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4101], device='cuda:0')
Steps: 94%|█████████▍| 940/1000 [09:58<00:37, 1.59it/s, loss=0.0042, lr=0.001]
Steps: 94%|█████████▍| 941/1000 [09:59<00:37, 1.57it/s, loss=0.0042, lr=0.001]
Steps: 94%|█████████▍| 941/1000 [09:59<00:37, 1.57it/s, loss=0.0133, lr=0.001]
Steps: 94%|█████████▍| 942/1000 [10:00<00:36, 1.58it/s, loss=0.0133, lr=0.001]
tensor(0.0099, device='cuda:0')
tensor([[0.4091],
[0.4112]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4101], device='cuda:0')
Steps: 94%|█████████▍| 942/1000 [10:00<00:36, 1.58it/s, loss=0.092, lr=0.001]
Steps: 94%|█████████▍| 943/1000 [10:00<00:36, 1.57it/s, loss=0.092, lr=0.001]
Steps: 94%|█████████▍| 943/1000 [10:00<00:36, 1.57it/s, loss=0.0613, lr=0.001]
Steps: 94%|█████████▍| 944/1000 [10:01<00:35, 1.59it/s, loss=0.0613, lr=0.001]
tensor(0.0089, device='cuda:0')
tensor([[0.4090],
[0.4112]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4101], device='cuda:0')
Steps: 94%|█████████▍| 944/1000 [10:01<00:35, 1.59it/s, loss=0.019, lr=0.001]
Steps: 94%|█████████▍| 945/1000 [10:02<00:34, 1.58it/s, loss=0.019, lr=0.001]
Steps: 94%|█████████▍| 945/1000 [10:02<00:34, 1.58it/s, loss=0.0166, lr=0.001]
Steps: 95%|█████████▍| 946/1000 [10:02<00:34, 1.58it/s, loss=0.0166, lr=0.001]
tensor(0.0028, device='cuda:0')
tensor([[0.4090],
[0.4113]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4101], device='cuda:0')
Steps: 95%|█████████▍| 946/1000 [10:02<00:34, 1.58it/s, loss=0.00476, lr=0.001]
Steps: 95%|█████████▍| 947/1000 [10:03<00:33, 1.57it/s, loss=0.00476, lr=0.001]
Steps: 95%|█████████▍| 947/1000 [10:03<00:33, 1.57it/s, loss=0.00987, lr=0.001]
Steps: 95%|█████████▍| 948/1000 [10:04<00:32, 1.58it/s, loss=0.00987, lr=0.001]
tensor(0.0051, device='cuda:0')
tensor([[0.4090],
[0.4113]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4102], device='cuda:0')
Steps: 95%|█████████▍| 948/1000 [10:04<00:32, 1.58it/s, loss=0.00904, lr=0.001]
Steps: 95%|█████████▍| 949/1000 [10:04<00:32, 1.57it/s, loss=0.00904, lr=0.001]
Steps: 95%|█████████▍| 949/1000 [10:04<00:32, 1.57it/s, loss=0.00788, lr=0.001]
Steps: 95%|█████████▌| 950/1000 [10:05<00:31, 1.58it/s, loss=0.00788, lr=0.001]
tensor(0.0036, device='cuda:0')
tensor([[0.4091],
[0.4113]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4102], device='cuda:0')
Steps: 95%|█████████▌| 950/1000 [10:05<00:31, 1.58it/s, loss=0.00126, lr=0.001]
Steps: 95%|█████████▌| 951/1000 [10:05<00:31, 1.56it/s, loss=0.00126, lr=0.001]
Steps: 95%|█████████▌| 951/1000 [10:05<00:31, 1.56it/s, loss=0.00716, lr=0.001]
Steps: 95%|█████████▌| 952/1000 [10:06<00:30, 1.58it/s, loss=0.00716, lr=0.001]
tensor(0.0069, device='cuda:0')
tensor([[0.4091],
[0.4113]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4102], device='cuda:0')
Steps: 95%|█████████▌| 952/1000 [10:06<00:30, 1.58it/s, loss=0.000269, lr=0.001]
Steps: 95%|█████████▌| 953/1000 [10:07<00:30, 1.56it/s, loss=0.000269, lr=0.001]
Steps: 95%|█████████▌| 953/1000 [10:07<00:30, 1.56it/s, loss=0.0689, lr=0.001]
Steps: 95%|█████████▌| 954/1000 [10:07<00:29, 1.58it/s, loss=0.0689, lr=0.001]
tensor(0.0086, device='cuda:0')
tensor([[0.4091],
[0.4112]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4101], device='cuda:0')
Steps: 95%|█████████▌| 954/1000 [10:07<00:29, 1.58it/s, loss=0.0537, lr=0.001]
Steps: 96%|█████████▌| 955/1000 [10:08<00:28, 1.56it/s, loss=0.0537, lr=0.001]
Steps: 96%|█████████▌| 955/1000 [10:08<00:28, 1.56it/s, loss=0.0128, lr=0.001]
Steps: 96%|█████████▌| 956/1000 [10:09<00:27, 1.58it/s, loss=0.0128, lr=0.001]
tensor(0.0061, device='cuda:0')
tensor([[0.4092],
[0.4111]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4100], device='cuda:0')
Steps: 96%|█████████▌| 956/1000 [10:09<00:27, 1.58it/s, loss=0.0269, lr=0.001]
Steps: 96%|█████████▌| 957/1000 [10:09<00:27, 1.57it/s, loss=0.0269, lr=0.001]
Steps: 96%|█████████▌| 957/1000 [10:09<00:27, 1.57it/s, loss=0.0107, lr=0.001]
Steps: 96%|█████████▌| 958/1000 [10:10<00:26, 1.58it/s, loss=0.0107, lr=0.001]
tensor(0.0027, device='cuda:0')
tensor([[0.4092],
[0.4110]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4099], device='cuda:0')
Steps: 96%|█████████▌| 958/1000 [10:10<00:26, 1.58it/s, loss=0.0081, lr=0.001]
Steps: 96%|█████████▌| 959/1000 [10:11<00:26, 1.57it/s, loss=0.0081, lr=0.001]
Steps: 96%|█████████▌| 959/1000 [10:11<00:26, 1.57it/s, loss=0.00707, lr=0.001]
Steps: 96%|█████████▌| 960/1000 [10:11<00:25, 1.59it/s, loss=0.00707, lr=0.001]
tensor(0.0100, device='cuda:0')
tensor([[0.4093],
[0.4109]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4098], device='cuda:0')
Steps: 96%|█████████▌| 960/1000 [10:11<00:25, 1.59it/s, loss=0.0223, lr=0.001]
Steps: 96%|█████████▌| 961/1000 [10:12<00:24, 1.58it/s, loss=0.0223, lr=0.001]
Steps: 96%|█████████▌| 961/1000 [10:12<00:24, 1.58it/s, loss=0.0405, lr=0.001]
Steps: 96%|█████████▌| 962/1000 [10:12<00:23, 1.59it/s, loss=0.0405, lr=0.001]
tensor(0.0028, device='cuda:0')
tensor([[0.4094],
[0.4108]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4097], device='cuda:0')
Steps: 96%|█████████▌| 962/1000 [10:12<00:23, 1.59it/s, loss=0.00538, lr=0.001]
Steps: 96%|█████████▋| 963/1000 [10:13<00:23, 1.57it/s, loss=0.00538, lr=0.001]
Steps: 96%|█████████▋| 963/1000 [10:13<00:23, 1.57it/s, loss=0.000406, lr=0.001]
Steps: 96%|█████████▋| 964/1000 [10:14<00:22, 1.58it/s, loss=0.000406, lr=0.001]
tensor(0.0081, device='cuda:0')
tensor([[0.4094],
[0.4107]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4097], device='cuda:0')
Steps: 96%|█████████▋| 964/1000 [10:14<00:22, 1.58it/s, loss=0.00454, lr=0.001]
Steps: 96%|█████████▋| 965/1000 [10:14<00:22, 1.57it/s, loss=0.00454, lr=0.001]
Steps: 96%|█████████▋| 965/1000 [10:14<00:22, 1.57it/s, loss=0.0148, lr=0.001]
Steps: 97%|█████████▋| 966/1000 [10:15<00:21, 1.58it/s, loss=0.0148, lr=0.001]
tensor(0.0029, device='cuda:0')
tensor([[0.4095],
[0.4107]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4096], device='cuda:0')
Steps: 97%|█████████▋| 966/1000 [10:15<00:21, 1.58it/s, loss=0.000232, lr=0.001]
Steps: 97%|█████████▋| 967/1000 [10:16<00:21, 1.56it/s, loss=0.000232, lr=0.001]
Steps: 97%|█████████▋| 967/1000 [10:16<00:21, 1.56it/s, loss=0.00795, lr=0.001]
Steps: 97%|█████████▋| 968/1000 [10:16<00:20, 1.58it/s, loss=0.00795, lr=0.001]
tensor(0.0018, device='cuda:0')
tensor([[0.4095],
[0.4106]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4095], device='cuda:0')
Steps: 97%|█████████▋| 968/1000 [10:16<00:20, 1.58it/s, loss=0.00367, lr=0.001]
Steps: 97%|█████████▋| 969/1000 [10:17<00:19, 1.56it/s, loss=0.00367, lr=0.001]
Steps: 97%|█████████▋| 969/1000 [10:17<00:19, 1.56it/s, loss=0.00094, lr=0.001]
Steps: 97%|█████████▋| 970/1000 [10:17<00:19, 1.58it/s, loss=0.00094, lr=0.001]
tensor(0.0063, device='cuda:0')
tensor([[0.4095],
[0.4105]], device='cuda:0')
Current Norm : tensor([0.4086, 0.4095], device='cuda:0')
Steps: 97%|█████████▋| 970/1000 [10:17<00:19, 1.58it/s, loss=0.00166, lr=0.001]
Steps: 97%|█████████▋| 971/1000 [10:18<00:18, 1.56it/s, loss=0.00166, lr=0.001]
Steps: 97%|█████████▋| 971/1000 [10:18<00:18, 1.56it/s, loss=0.0325, lr=0.001]
Steps: 97%|█████████▋| 972/1000 [10:19<00:17, 1.58it/s, loss=0.0325, lr=0.001]
tensor(0.0064, device='cuda:0')
tensor([[0.4094],
[0.4104]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4094], device='cuda:0')
Steps: 97%|█████████▋| 972/1000 [10:19<00:17, 1.58it/s, loss=0.0677, lr=0.001]
Steps: 97%|█████████▋| 973/1000 [10:19<00:17, 1.57it/s, loss=0.0677, lr=0.001]
Steps: 97%|█████████▋| 973/1000 [10:19<00:17, 1.57it/s, loss=0.00378, lr=0.001]
Steps: 97%|█████████▋| 974/1000 [10:20<00:16, 1.57it/s, loss=0.00378, lr=0.001]
tensor(0.0032, device='cuda:0')
tensor([[0.4094],
[0.4103]], device='cuda:0')
Current Norm : tensor([0.4085, 0.4093], device='cuda:0')
Steps: 97%|█████████▋| 974/1000 [10:20<00:16, 1.57it/s, loss=0.00283, lr=0.001]
Steps: 98%|█████████▊| 975/1000 [10:21<00:15, 1.56it/s, loss=0.00283, lr=0.001]
Steps: 98%|█████████▊| 975/1000 [10:21<00:15, 1.56it/s, loss=0.0066, lr=0.001]
Steps: 98%|█████████▊| 976/1000 [10:21<00:15, 1.57it/s, loss=0.0066, lr=0.001]
tensor(0.0031, device='cuda:0')
tensor([[0.4094],
[0.4102]], device='cuda:0')
Current Norm : tensor([0.4084, 0.4092], device='cuda:0')
Steps: 98%|█████████▊| 976/1000 [10:21<00:15, 1.57it/s, loss=0.00529, lr=0.001]
Steps: 98%|█████████▊| 977/1000 [10:22<00:14, 1.56it/s, loss=0.00529, lr=0.001]
Steps: 98%|█████████▊| 977/1000 [10:22<00:14, 1.56it/s, loss=0.000486, lr=0.001]
Steps: 98%|█████████▊| 978/1000 [10:23<00:13, 1.57it/s, loss=0.000486, lr=0.001]
tensor(0.0005, device='cuda:0')
tensor([[0.4093],
[0.4100]], device='cuda:0')
Current Norm : tensor([0.4083, 0.4090], device='cuda:0')
Steps: 98%|█████████▊| 978/1000 [10:23<00:13, 1.57it/s, loss=0.000342, lr=0.001]
Steps: 98%|█████████▊| 979/1000 [10:23<00:13, 1.56it/s, loss=0.000342, lr=0.001]
Steps: 98%|█████████▊| 979/1000 [10:23<00:13, 1.56it/s, loss=0.000447, lr=0.001]
Steps: 98%|█████████▊| 980/1000 [10:24<00:12, 1.57it/s, loss=0.000447, lr=0.001]
tensor(0.0029, device='cuda:0')
tensor([[0.4092],
[0.4099]], device='cuda:0')
Current Norm : tensor([0.4082, 0.4089], device='cuda:0')
Steps: 98%|█████████▊| 980/1000 [10:24<00:12, 1.57it/s, loss=0.00251, lr=0.001]
Steps: 98%|█████████▊| 981/1000 [10:25<00:12, 1.57it/s, loss=0.00251, lr=0.001]
Steps: 98%|█████████▊| 981/1000 [10:25<00:12, 1.57it/s, loss=0.00702, lr=0.001]
Steps: 98%|█████████▊| 982/1000 [10:25<00:11, 1.58it/s, loss=0.00702, lr=0.001]
tensor(0.0039, device='cuda:0')
tensor([[0.4091],
[0.4097]], device='cuda:0')
Current Norm : tensor([0.4081, 0.4087], device='cuda:0')
Steps: 98%|█████████▊| 982/1000 [10:25<00:11, 1.58it/s, loss=0.00368, lr=0.001]
Steps: 98%|█████████▊| 983/1000 [10:26<00:10, 1.57it/s, loss=0.00368, lr=0.001]
Steps: 98%|█████████▊| 983/1000 [10:26<00:10, 1.57it/s, loss=0.00127, lr=0.001]
Steps: 98%|█████████▊| 984/1000 [10:26<00:10, 1.59it/s, loss=0.00127, lr=0.001]
tensor(0.0060, device='cuda:0')
tensor([[0.4089],
[0.4094]], device='cuda:0')
Current Norm : tensor([0.4080, 0.4085], device='cuda:0')
Steps: 98%|█████████▊| 984/1000 [10:26<00:10, 1.59it/s, loss=0.0239, lr=0.001]
Steps: 98%|█████████▊| 985/1000 [10:27<00:09, 1.58it/s, loss=0.0239, lr=0.001]
Steps: 98%|█████████▊| 985/1000 [10:27<00:09, 1.58it/s, loss=0.00117, lr=0.001]
Steps: 99%|█████████▊| 986/1000 [10:28<00:08, 1.59it/s, loss=0.00117, lr=0.001]
tensor(0.0016, device='cuda:0')
tensor([[0.4088],
[0.4092]], device='cuda:0')
Current Norm : tensor([0.4079, 0.4083], device='cuda:0')
Steps: 99%|█████████▊| 986/1000 [10:28<00:08, 1.59it/s, loss=0.000254, lr=0.001]
Steps: 99%|█████████▊| 987/1000 [10:28<00:08, 1.58it/s, loss=0.000254, lr=0.001]
Steps: 99%|█████████▊| 987/1000 [10:28<00:08, 1.58it/s, loss=0.0089, lr=0.001]
Steps: 99%|█████████▉| 988/1000 [10:29<00:07, 1.59it/s, loss=0.0089, lr=0.001]
tensor(0.0023, device='cuda:0')
tensor([[0.4087],
[0.4089]], device='cuda:0')
Current Norm : tensor([0.4078, 0.4080], device='cuda:0')
Steps: 99%|█████████▉| 988/1000 [10:29<00:07, 1.59it/s, loss=0.000653, lr=0.001]
Steps: 99%|█████████▉| 989/1000 [10:30<00:07, 1.57it/s, loss=0.000653, lr=0.001]
Steps: 99%|█████████▉| 989/1000 [10:30<00:07, 1.57it/s, loss=0.00647, lr=0.001]
Steps: 99%|█████████▉| 990/1000 [10:30<00:06, 1.58it/s, loss=0.00647, lr=0.001]
tensor(0.0026, device='cuda:0')
tensor([[0.4085],
[0.4086]], device='cuda:0')
Current Norm : tensor([0.4076, 0.4078], device='cuda:0')
Steps: 99%|█████████▉| 990/1000 [10:30<00:06, 1.58it/s, loss=0.00136, lr=0.001]
Steps: 99%|█████████▉| 991/1000 [10:31<00:05, 1.57it/s, loss=0.00136, lr=0.001]
Steps: 99%|█████████▉| 991/1000 [10:31<00:05, 1.57it/s, loss=0.0052, lr=0.001]
Steps: 99%|█████████▉| 992/1000 [10:31<00:05, 1.58it/s, loss=0.0052, lr=0.001]
tensor(0.0022, device='cuda:0')
tensor([[0.4083],
[0.4083]], device='cuda:0')
Current Norm : tensor([0.4075, 0.4075], device='cuda:0')
Steps: 99%|█████████▉| 992/1000 [10:31<00:05, 1.58it/s, loss=0.0147, lr=0.001]
Steps: 99%|█████████▉| 993/1000 [10:32<00:04, 1.57it/s, loss=0.0147, lr=0.001]
Steps: 99%|█████████▉| 993/1000 [10:32<00:04, 1.57it/s, loss=0.000513, lr=0.001]
Steps: 99%|█████████▉| 994/1000 [10:33<00:03, 1.59it/s, loss=0.000513, lr=0.001]
tensor(0.0060, device='cuda:0')
tensor([[0.4081],
[0.4080]], device='cuda:0')
Current Norm : tensor([0.4073, 0.4072], device='cuda:0')
Steps: 99%|█████████▉| 994/1000 [10:33<00:03, 1.59it/s, loss=0.0951, lr=0.001]
Steps: 100%|█████████▉| 995/1000 [10:33<00:03, 1.57it/s, loss=0.0951, lr=0.001]
Steps: 100%|█████████▉| 995/1000 [10:33<00:03, 1.57it/s, loss=0.000628, lr=0.001]
Steps: 100%|█████████▉| 996/1000 [10:34<00:02, 1.58it/s, loss=0.000628, lr=0.001]
tensor(0.0060, device='cuda:0')
tensor([[0.4079],
[0.4078]], device='cuda:0')
Current Norm : tensor([0.4071, 0.4070], device='cuda:0')
Steps: 100%|█████████▉| 996/1000 [10:34<00:02, 1.58it/s, loss=0.0297, lr=0.001]
Steps: 100%|█████████▉| 997/1000 [10:35<00:01, 1.57it/s, loss=0.0297, lr=0.001]
Steps: 100%|█████████▉| 997/1000 [10:35<00:01, 1.57it/s, loss=0.00103, lr=0.001]
Steps: 100%|█████████▉| 998/1000 [10:35<00:01, 1.58it/s, loss=0.00103, lr=0.001]
tensor(0.0121, device='cuda:0')
tensor([[0.4078],
[0.4076]], device='cuda:0')
Current Norm : tensor([0.4070, 0.4068], device='cuda:0')
Steps: 100%|█████████▉| 998/1000 [10:35<00:01, 1.58it/s, loss=0.0109, lr=0.001]
Steps: 100%|█████████▉| 999/1000 [10:36<00:00, 1.54it/s, loss=0.0109, lr=0.001]
Steps: 100%|█████████▉| 999/1000 [10:36<00:00, 1.54it/s, loss=0.0542, lr=0.001]
Steps: 100%|██████████| 1000/1000 [10:37<00:00, 1.56it/s, loss=0.0542, lr=0.001]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0',
grad_fn=<SliceBackward0>)
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0',
grad_fn=<SliceBackward0>)
Saving weights to checkpoints/step_inv_1000.safetensors
Steps: 100%|██████████| 1000/1000 [10:37<00:00, 1.56it/s, loss=0.0126, lr=0.001]
Steps: 100%|██████████| 1000/1000 [10:37<00:00, 1.57it/s, loss=0.0126, lr=0.001]
PTI : has 288 lora
PTI : Before training:
0%| | 0/700 [00:00<?, ?it/s]
Steps: 0%| | 0/700 [00:00<?, ?it/s]
Steps: 0%| | 1/700 [00:05<1:09:22, 5.95s/it]
Steps: 0%| | 1/700 [00:05<1:09:22, 5.95s/it, loss=0.0136, lr=0.0004]
Steps: 0%| | 2/700 [00:06<32:36, 2.80s/it, loss=0.0136, lr=0.0004]
Steps: 0%| | 2/700 [00:06<32:36, 2.80s/it, loss=0.0645, lr=0.0004]
Steps: 0%| | 3/700 [00:07<20:35, 1.77s/it, loss=0.0645, lr=0.0004]
Steps: 0%| | 3/700 [00:07<20:35, 1.77s/it, loss=0.114, lr=0.0004]
Steps: 1%| | 4/700 [00:07<15:02, 1.30s/it, loss=0.114, lr=0.0004]
Steps: 1%| | 4/700 [00:07<15:02, 1.30s/it, loss=0.0147, lr=0.0004]
Steps: 1%| | 5/700 [00:08<11:48, 1.02s/it, loss=0.0147, lr=0.0004]
Steps: 1%| | 5/700 [00:08<11:48, 1.02s/it, loss=0.0199, lr=0.0004]
Steps: 1%| | 6/700 [00:08<10:01, 1.15it/s, loss=0.0199, lr=0.0004]
Steps: 1%| | 6/700 [00:08<10:01, 1.15it/s, loss=0.127, lr=0.0004]
Steps: 1%| | 7/700 [00:09<08:50, 1.31it/s, loss=0.127, lr=0.0004]
Steps: 1%| | 7/700 [00:09<08:50, 1.31it/s, loss=0.194, lr=0.0004]
Steps: 1%| | 8/700 [00:09<08:06, 1.42it/s, loss=0.194, lr=0.0004]
Steps: 1%| | 8/700 [00:09<08:06, 1.42it/s, loss=0.0105, lr=0.0004]
Steps: 1%|▏ | 9/700 [00:10<07:33, 1.52it/s, loss=0.0105, lr=0.0004]
Steps: 1%|▏ | 9/700 [00:10<07:33, 1.52it/s, loss=0.0122, lr=0.0004]
Steps: 1%|▏ | 10/700 [00:10<07:01, 1.64it/s, loss=0.0122, lr=0.0004]
Steps: 1%|▏ | 10/700 [00:10<07:01, 1.64it/s, loss=0.0168, lr=0.0004]
Steps: 2%|▏ | 11/700 [00:11<06:43, 1.71it/s, loss=0.0168, lr=0.0004]
Steps: 2%|▏ | 11/700 [00:11<06:43, 1.71it/s, loss=0.126, lr=0.0004]
Steps: 2%|▏ | 12/700 [00:12<06:31, 1.76it/s, loss=0.126, lr=0.0004]
Steps: 2%|▏ | 12/700 [00:12<06:31, 1.76it/s, loss=0.0972, lr=0.0004]
Steps: 2%|▏ | 13/700 [00:12<06:22, 1.80it/s, loss=0.0972, lr=0.0004]
Steps: 2%|▏ | 13/700 [00:12<06:22, 1.80it/s, loss=0.176, lr=0.0004]
Steps: 2%|▏ | 14/700 [00:13<06:08, 1.86it/s, loss=0.176, lr=0.0004]
Steps: 2%|▏ | 14/700 [00:13<06:08, 1.86it/s, loss=0.00823, lr=0.0004]
Steps: 2%|▏ | 15/700 [00:13<06:01, 1.89it/s, loss=0.00823, lr=0.0004]
Steps: 2%|▏ | 15/700 [00:13<06:01, 1.89it/s, loss=0.0132, lr=0.0004]
Steps: 2%|▏ | 16/700 [00:14<05:55, 1.92it/s, loss=0.0132, lr=0.0004]
Steps: 2%|▏ | 16/700 [00:14<05:55, 1.92it/s, loss=0.0208, lr=0.0004]
Steps: 2%|▏ | 17/700 [00:14<05:52, 1.93it/s, loss=0.0208, lr=0.0004]
Steps: 2%|▏ | 17/700 [00:14<05:52, 1.93it/s, loss=0.0074, lr=0.0004]
Steps: 3%|▎ | 18/700 [00:15<06:02, 1.88it/s, loss=0.0074, lr=0.0004]
Steps: 3%|▎ | 18/700 [00:15<06:02, 1.88it/s, loss=0.00776, lr=0.0004]
Steps: 3%|▎ | 19/700 [00:15<06:10, 1.84it/s, loss=0.00776, lr=0.0004]
Steps: 3%|▎ | 19/700 [00:15<06:10, 1.84it/s, loss=0.0114, lr=0.0004]
Steps: 3%|▎ | 20/700 [00:16<06:19, 1.79it/s, loss=0.0114, lr=0.0004]
Steps: 3%|▎ | 20/700 [00:16<06:19, 1.79it/s, loss=0.0615, lr=0.0004]
Steps: 3%|▎ | 21/700 [00:16<06:25, 1.76it/s, loss=0.0615, lr=0.0004]
Steps: 3%|▎ | 21/700 [00:16<06:25, 1.76it/s, loss=0.00527, lr=0.0004]
Steps: 3%|▎ | 22/700 [00:17<06:24, 1.76it/s, loss=0.00527, lr=0.0004]
Steps: 3%|▎ | 22/700 [00:17<06:24, 1.76it/s, loss=0.0075, lr=0.0004]
Steps: 3%|▎ | 23/700 [00:18<06:29, 1.74it/s, loss=0.0075, lr=0.0004]
Steps: 3%|▎ | 23/700 [00:18<06:29, 1.74it/s, loss=0.027, lr=0.0004]
Steps: 3%|▎ | 24/700 [00:18<06:17, 1.79it/s, loss=0.027, lr=0.0004]
Steps: 3%|▎ | 24/700 [00:18<06:17, 1.79it/s, loss=0.0509, lr=0.0004]
Steps: 4%|▎ | 25/700 [00:19<06:09, 1.83it/s, loss=0.0509, lr=0.0004]
Steps: 4%|▎ | 25/700 [00:19<06:09, 1.83it/s, loss=0.0534, lr=0.0004]
Steps: 4%|▎ | 26/700 [00:19<06:11, 1.81it/s, loss=0.0534, lr=0.0004]
Steps: 4%|▎ | 26/700 [00:19<06:11, 1.81it/s, loss=0.0332, lr=0.0004]
Steps: 4%|▍ | 27/700 [00:20<06:05, 1.84it/s, loss=0.0332, lr=0.0004]
Steps: 4%|▍ | 27/700 [00:20<06:05, 1.84it/s, loss=0.134, lr=0.0004]
Steps: 4%|▍ | 28/700 [00:20<05:56, 1.89it/s, loss=0.134, lr=0.0004]
Steps: 4%|▍ | 28/700 [00:20<05:56, 1.89it/s, loss=0.0159, lr=0.0004]
Steps: 4%|▍ | 29/700 [00:21<05:51, 1.91it/s, loss=0.0159, lr=0.0004]
Steps: 4%|▍ | 29/700 [00:21<05:51, 1.91it/s, loss=0.00841, lr=0.0004]
Steps: 4%|▍ | 30/700 [00:21<05:52, 1.90it/s, loss=0.00841, lr=0.0004]
Steps: 4%|▍ | 30/700 [00:21<05:52, 1.90it/s, loss=0.0104, lr=0.0004]
Steps: 4%|▍ | 31/700 [00:22<05:55, 1.88it/s, loss=0.0104, lr=0.0004]
Steps: 4%|▍ | 31/700 [00:22<05:55, 1.88it/s, loss=0.0769, lr=0.0004]
Steps: 5%|▍ | 32/700 [00:22<05:50, 1.91it/s, loss=0.0769, lr=0.0004]
Steps: 5%|▍ | 32/700 [00:22<05:50, 1.91it/s, loss=0.0564, lr=0.0004]
Steps: 5%|▍ | 33/700 [00:23<05:48, 1.91it/s, loss=0.0564, lr=0.0004]
Steps: 5%|▍ | 33/700 [00:23<05:48, 1.91it/s, loss=0.00519, lr=0.0004]
Steps: 5%|▍ | 34/700 [00:23<05:44, 1.93it/s, loss=0.00519, lr=0.0004]
Steps: 5%|▍ | 34/700 [00:23<05:44, 1.93it/s, loss=0.00172, lr=0.0004]
Steps: 5%|▌ | 35/700 [00:24<05:43, 1.94it/s, loss=0.00172, lr=0.0004]
Steps: 5%|▌ | 35/700 [00:24<05:43, 1.94it/s, loss=0.00847, lr=0.0004]
Steps: 5%|▌ | 36/700 [00:24<05:40, 1.95it/s, loss=0.00847, lr=0.0004]
Steps: 5%|▌ | 36/700 [00:24<05:40, 1.95it/s, loss=0.00893, lr=0.0004]
Steps: 5%|▌ | 37/700 [00:25<05:38, 1.96it/s, loss=0.00893, lr=0.0004]
Steps: 5%|▌ | 37/700 [00:25<05:38, 1.96it/s, loss=0.00843, lr=0.0004]
Steps: 5%|▌ | 38/700 [00:25<05:40, 1.94it/s, loss=0.00843, lr=0.0004]
Steps: 5%|▌ | 38/700 [00:25<05:40, 1.94it/s, loss=0.00305, lr=0.0004]
Steps: 6%|▌ | 39/700 [00:26<05:48, 1.90it/s, loss=0.00305, lr=0.0004]
Steps: 6%|▌ | 39/700 [00:26<05:48, 1.90it/s, loss=0.012, lr=0.0004]
Steps: 6%|▌ | 40/700 [00:26<05:53, 1.87it/s, loss=0.012, lr=0.0004]
Steps: 6%|▌ | 40/700 [00:26<05:53, 1.87it/s, loss=0.0233, lr=0.0004]
Steps: 6%|▌ | 41/700 [00:27<05:56, 1.85it/s, loss=0.0233, lr=0.0004]
Steps: 6%|▌ | 41/700 [00:27<05:56, 1.85it/s, loss=0.0213, lr=0.0004]
Steps: 6%|▌ | 42/700 [00:28<05:57, 1.84it/s, loss=0.0213, lr=0.0004]
Steps: 6%|▌ | 42/700 [00:28<05:57, 1.84it/s, loss=0.00223, lr=0.0004]
Steps: 6%|▌ | 43/700 [00:28<06:00, 1.82it/s, loss=0.00223, lr=0.0004]
Steps: 6%|▌ | 43/700 [00:28<06:00, 1.82it/s, loss=0.0261, lr=0.0004]
Steps: 6%|▋ | 44/700 [00:29<06:10, 1.77it/s, loss=0.0261, lr=0.0004]
Steps: 6%|▋ | 44/700 [00:29<06:10, 1.77it/s, loss=0.0833, lr=0.0004]
Steps: 6%|▋ | 45/700 [00:29<06:12, 1.76it/s, loss=0.0833, lr=0.0004]
Steps: 6%|▋ | 45/700 [00:29<06:12, 1.76it/s, loss=0.0273, lr=0.0004]
Steps: 7%|▋ | 46/700 [00:30<06:14, 1.75it/s, loss=0.0273, lr=0.0004]
Steps: 7%|▋ | 46/700 [00:30<06:14, 1.75it/s, loss=0.00564, lr=0.0004]
Steps: 7%|▋ | 47/700 [00:30<06:12, 1.76it/s, loss=0.00564, lr=0.0004]
Steps: 7%|▋ | 47/700 [00:30<06:12, 1.76it/s, loss=0.0392, lr=0.0004]
Steps: 7%|▋ | 48/700 [00:31<06:07, 1.77it/s, loss=0.0392, lr=0.0004]
Steps: 7%|▋ | 48/700 [00:31<06:07, 1.77it/s, loss=0.00178, lr=0.0004]
Steps: 7%|▋ | 49/700 [00:32<06:06, 1.78it/s, loss=0.00178, lr=0.0004]
Steps: 7%|▋ | 49/700 [00:32<06:06, 1.78it/s, loss=0.0246, lr=0.0004]
Steps: 7%|▋ | 50/700 [00:32<06:04, 1.78it/s, loss=0.0246, lr=0.0004]
Steps: 7%|▋ | 50/700 [00:32<06:04, 1.78it/s, loss=0.00817, lr=0.0004]
Steps: 7%|▋ | 51/700 [00:33<06:10, 1.75it/s, loss=0.00817, lr=0.0004]
Steps: 7%|▋ | 51/700 [00:33<06:10, 1.75it/s, loss=0.0547, lr=0.0004]
Steps: 7%|▋ | 52/700 [00:33<06:10, 1.75it/s, loss=0.0547, lr=0.0004]
Steps: 7%|▋ | 52/700 [00:33<06:10, 1.75it/s, loss=0.0248, lr=0.0004]
Steps: 8%|▊ | 53/700 [00:34<06:16, 1.72it/s, loss=0.0248, lr=0.0004]
Steps: 8%|▊ | 53/700 [00:34<06:16, 1.72it/s, loss=0.0956, lr=0.0004]
Steps: 8%|▊ | 54/700 [00:34<06:09, 1.75it/s, loss=0.0956, lr=0.0004]
Steps: 8%|▊ | 54/700 [00:34<06:09, 1.75it/s, loss=0.0246, lr=0.0004]
Steps: 8%|▊ | 55/700 [00:35<06:06, 1.76it/s, loss=0.0246, lr=0.0004]
Steps: 8%|▊ | 55/700 [00:35<06:06, 1.76it/s, loss=0.0204, lr=0.0004]
Steps: 8%|▊ | 56/700 [00:36<06:02, 1.78it/s, loss=0.0204, lr=0.0004]
Steps: 8%|▊ | 56/700 [00:36<06:02, 1.78it/s, loss=0.00192, lr=0.0004]
Steps: 8%|▊ | 57/700 [00:36<06:03, 1.77it/s, loss=0.00192, lr=0.0004]
Steps: 8%|▊ | 57/700 [00:36<06:03, 1.77it/s, loss=0.0176, lr=0.0004]
Steps: 8%|▊ | 58/700 [00:37<06:03, 1.77it/s, loss=0.0176, lr=0.0004]
Steps: 8%|▊ | 58/700 [00:37<06:03, 1.77it/s, loss=0.0782, lr=0.0004]
Steps: 8%|▊ | 59/700 [00:37<06:04, 1.76it/s, loss=0.0782, lr=0.0004]
Steps: 8%|▊ | 59/700 [00:37<06:04, 1.76it/s, loss=0.297, lr=0.0004]
Steps: 9%|▊ | 60/700 [00:38<05:59, 1.78it/s, loss=0.297, lr=0.0004]
Steps: 9%|▊ | 60/700 [00:38<05:59, 1.78it/s, loss=0.0103, lr=0.0004]
Steps: 9%|▊ | 61/700 [00:38<05:56, 1.79it/s, loss=0.0103, lr=0.0004]
Steps: 9%|▊ | 61/700 [00:38<05:56, 1.79it/s, loss=0.00232, lr=0.0004]
Steps: 9%|▉ | 62/700 [00:39<05:57, 1.78it/s, loss=0.00232, lr=0.0004]
Steps: 9%|▉ | 62/700 [00:39<05:57, 1.78it/s, loss=0.135, lr=0.0004]
Steps: 9%|▉ | 63/700 [00:39<05:52, 1.81it/s, loss=0.135, lr=0.0004]
Steps: 9%|▉ | 63/700 [00:39<05:52, 1.81it/s, loss=0.0448, lr=0.0004]
Steps: 9%|▉ | 64/700 [00:40<05:49, 1.82it/s, loss=0.0448, lr=0.0004]
Steps: 9%|▉ | 64/700 [00:40<05:49, 1.82it/s, loss=0.0329, lr=0.0004]
Steps: 9%|▉ | 65/700 [00:41<05:48, 1.82it/s, loss=0.0329, lr=0.0004]
Steps: 9%|▉ | 65/700 [00:41<05:48, 1.82it/s, loss=0.102, lr=0.0004]
Steps: 9%|▉ | 66/700 [00:41<05:48, 1.82it/s, loss=0.102, lr=0.0004]
Steps: 9%|▉ | 66/700 [00:41<05:48, 1.82it/s, loss=0.136, lr=0.0004]
Steps: 10%|▉ | 67/700 [00:42<05:49, 1.81it/s, loss=0.136, lr=0.0004]
Steps: 10%|▉ | 67/700 [00:42<05:49, 1.81it/s, loss=0.0229, lr=0.0004]
Steps: 10%|▉ | 68/700 [00:42<05:47, 1.82it/s, loss=0.0229, lr=0.0004]
Steps: 10%|▉ | 68/700 [00:42<05:47, 1.82it/s, loss=0.0538, lr=0.0004]
Steps: 10%|▉ | 69/700 [00:43<05:43, 1.84it/s, loss=0.0538, lr=0.0004]
Steps: 10%|▉ | 69/700 [00:43<05:43, 1.84it/s, loss=0.0282, lr=0.0004]
Steps: 10%|█ | 70/700 [00:43<05:42, 1.84it/s, loss=0.0282, lr=0.0004]
Steps: 10%|█ | 70/700 [00:43<05:42, 1.84it/s, loss=0.00587, lr=0.0004]
Steps: 10%|█ | 71/700 [00:44<05:45, 1.82it/s, loss=0.00587, lr=0.0004]
Steps: 10%|█ | 71/700 [00:44<05:45, 1.82it/s, loss=0.0534, lr=0.0004]
Steps: 10%|█ | 72/700 [00:44<05:43, 1.83it/s, loss=0.0534, lr=0.0004]
Steps: 10%|█ | 72/700 [00:44<05:43, 1.83it/s, loss=0.00902, lr=0.0004]
Steps: 10%|█ | 73/700 [00:45<05:40, 1.84it/s, loss=0.00902, lr=0.0004]
Steps: 10%|█ | 73/700 [00:45<05:40, 1.84it/s, loss=0.00754, lr=0.0004]
Steps: 11%|█ | 74/700 [00:45<05:42, 1.83it/s, loss=0.00754, lr=0.0004]
Steps: 11%|█ | 74/700 [00:45<05:42, 1.83it/s, loss=0.00843, lr=0.0004]
Steps: 11%|█ | 75/700 [00:46<05:42, 1.83it/s, loss=0.00843, lr=0.0004]
Steps: 11%|█ | 75/700 [00:46<05:42, 1.83it/s, loss=0.0558, lr=0.0004]
Steps: 11%|█ | 76/700 [00:47<05:41, 1.83it/s, loss=0.0558, lr=0.0004]
Steps: 11%|█ | 76/700 [00:47<05:41, 1.83it/s, loss=0.014, lr=0.0004]
Steps: 11%|█ | 77/700 [00:47<06:10, 1.68it/s, loss=0.014, lr=0.0004]
Steps: 11%|█ | 77/700 [00:47<06:10, 1.68it/s, loss=0.0103, lr=0.0004]
Steps: 11%|█ | 78/700 [00:48<06:02, 1.72it/s, loss=0.0103, lr=0.0004]
Steps: 11%|█ | 78/700 [00:48<06:02, 1.72it/s, loss=0.199, lr=0.0004]
Steps: 11%|█▏ | 79/700 [00:48<05:54, 1.75it/s, loss=0.199, lr=0.0004]
Steps: 11%|█▏ | 79/700 [00:48<05:54, 1.75it/s, loss=0.00994, lr=0.0004]
Steps: 11%|█▏ | 80/700 [00:49<05:50, 1.77it/s, loss=0.00994, lr=0.0004]
Steps: 11%|█▏ | 80/700 [00:49<05:50, 1.77it/s, loss=0.00166, lr=0.0004]
Steps: 12%|█▏ | 81/700 [00:49<05:47, 1.78it/s, loss=0.00166, lr=0.0004]
Steps: 12%|█▏ | 81/700 [00:49<05:47, 1.78it/s, loss=0.307, lr=0.0004]
Steps: 12%|█▏ | 82/700 [00:50<05:46, 1.79it/s, loss=0.307, lr=0.0004]
Steps: 12%|█▏ | 82/700 [00:50<05:46, 1.79it/s, loss=0.0787, lr=0.0004]
Steps: 12%|█▏ | 83/700 [00:51<05:41, 1.80it/s, loss=0.0787, lr=0.0004]
Steps: 12%|█▏ | 83/700 [00:51<05:41, 1.80it/s, loss=0.0285, lr=0.0004]
Steps: 12%|█▏ | 84/700 [00:51<05:41, 1.81it/s, loss=0.0285, lr=0.0004]
Steps: 12%|█▏ | 84/700 [00:51<05:41, 1.81it/s, loss=0.0156, lr=0.0004]
Steps: 12%|█▏ | 85/700 [00:52<05:39, 1.81it/s, loss=0.0156, lr=0.0004]
Steps: 12%|█▏ | 85/700 [00:52<05:39, 1.81it/s, loss=0.00945, lr=0.0004]
Steps: 12%|█▏ | 86/700 [00:52<05:36, 1.82it/s, loss=0.00945, lr=0.0004]
Steps: 12%|█▏ | 86/700 [00:52<05:36, 1.82it/s, loss=0.0294, lr=0.0004]
Steps: 12%|█▏ | 87/700 [00:53<05:38, 1.81it/s, loss=0.0294, lr=0.0004]
Steps: 12%|█▏ | 87/700 [00:53<05:38, 1.81it/s, loss=0.0266, lr=0.0004]
Steps: 13%|█▎ | 88/700 [00:53<05:37, 1.82it/s, loss=0.0266, lr=0.0004]
Steps: 13%|█▎ | 88/700 [00:53<05:37, 1.82it/s, loss=0.00252, lr=0.0004]
Steps: 13%|█▎ | 89/700 [00:54<05:39, 1.80it/s, loss=0.00252, lr=0.0004]
Steps: 13%|█▎ | 89/700 [00:54<05:39, 1.80it/s, loss=0.0111, lr=0.0004]
Steps: 13%|█▎ | 90/700 [00:54<05:44, 1.77it/s, loss=0.0111, lr=0.0004]
Steps: 13%|█▎ | 90/700 [00:54<05:44, 1.77it/s, loss=0.0113, lr=0.0004]
Steps: 13%|█▎ | 91/700 [00:55<05:38, 1.80it/s, loss=0.0113, lr=0.0004]
Steps: 13%|█▎ | 91/700 [00:55<05:38, 1.80it/s, loss=0.0463, lr=0.0004]
Steps: 13%|█▎ | 92/700 [00:56<05:32, 1.83it/s, loss=0.0463, lr=0.0004]
Steps: 13%|█▎ | 92/700 [00:56<05:32, 1.83it/s, loss=0.00671, lr=0.0004]
Steps: 13%|█▎ | 93/700 [00:56<05:29, 1.84it/s, loss=0.00671, lr=0.0004]
Steps: 13%|█▎ | 93/700 [00:56<05:29, 1.84it/s, loss=0.0407, lr=0.0004]
Steps: 13%|█▎ | 94/700 [00:57<05:28, 1.84it/s, loss=0.0407, lr=0.0004]
Steps: 13%|█▎ | 94/700 [00:57<05:28, 1.84it/s, loss=0.00514, lr=0.0004]
Steps: 14%|█▎ | 95/700 [00:57<05:31, 1.83it/s, loss=0.00514, lr=0.0004]
Steps: 14%|█▎ | 95/700 [00:57<05:31, 1.83it/s, loss=0.0298, lr=0.0004]
Steps: 14%|█▎ | 96/700 [00:58<05:34, 1.81it/s, loss=0.0298, lr=0.0004]
Steps: 14%|█▎ | 96/700 [00:58<05:34, 1.81it/s, loss=0.0139, lr=0.0004]
Steps: 14%|█▍ | 97/700 [00:58<05:29, 1.83it/s, loss=0.0139, lr=0.0004]
Steps: 14%|█▍ | 97/700 [00:58<05:29, 1.83it/s, loss=0.00684, lr=0.0004]
Steps: 14%|█▍ | 98/700 [00:59<05:20, 1.88it/s, loss=0.00684, lr=0.0004]
Steps: 14%|█▍ | 98/700 [00:59<05:20, 1.88it/s, loss=0.0252, lr=0.0004]
Steps: 14%|█▍ | 99/700 [00:59<05:21, 1.87it/s, loss=0.0252, lr=0.0004]
Steps: 14%|█▍ | 99/700 [00:59<05:21, 1.87it/s, loss=0.212, lr=0.0004]
Steps: 14%|█▍ | 100/700 [01:00<05:22, 1.86it/s, loss=0.212, lr=0.0004]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')
Saving weights to checkpoints/step_100.safetensors
LORA Unet Moved 0.0007244422449730337
LORA CLIP Moved 3.097903754678555e-05
Steps: 14%|█▍ | 100/700 [01:00<05:22, 1.86it/s, loss=0.0229, lr=0.0004]
Steps: 14%|█▍ | 101/700 [01:01<05:58, 1.67it/s, loss=0.0229, lr=0.0004]
Steps: 14%|█▍ | 101/700 [01:01<05:58, 1.67it/s, loss=0.0265, lr=0.0004]
Steps: 15%|█▍ | 102/700 [01:01<05:48, 1.72it/s, loss=0.0265, lr=0.0004]
Steps: 15%|█▍ | 102/700 [01:01<05:48, 1.72it/s, loss=0.0872, lr=0.0004]
Steps: 15%|█▍ | 103/700 [01:02<05:42, 1.75it/s, loss=0.0872, lr=0.0004]
Steps: 15%|█▍ | 103/700 [01:02<05:42, 1.75it/s, loss=0.0143, lr=0.0004]
Steps: 15%|█▍ | 104/700 [01:02<05:35, 1.78it/s, loss=0.0143, lr=0.0004]
Steps: 15%|█▍ | 104/700 [01:02<05:35, 1.78it/s, loss=0.0161, lr=0.0004]
Steps: 15%|█▌ | 105/700 [01:03<05:32, 1.79it/s, loss=0.0161, lr=0.0004]
Steps: 15%|█▌ | 105/700 [01:03<05:32, 1.79it/s, loss=0.011, lr=0.0004]
Steps: 15%|█▌ | 106/700 [01:03<05:31, 1.79it/s, loss=0.011, lr=0.0004]
Steps: 15%|█▌ | 106/700 [01:03<05:31, 1.79it/s, loss=0.0072, lr=0.0004]
Steps: 15%|█▌ | 107/700 [01:04<05:30, 1.80it/s, loss=0.0072, lr=0.0004]
Steps: 15%|█▌ | 107/700 [01:04<05:30, 1.80it/s, loss=0.00261, lr=0.0004]
Steps: 15%|█▌ | 108/700 [01:04<05:26, 1.81it/s, loss=0.00261, lr=0.0004]
Steps: 15%|█▌ | 108/700 [01:04<05:26, 1.81it/s, loss=0.00597, lr=0.0004]
Steps: 16%|█▌ | 109/700 [01:05<05:24, 1.82it/s, loss=0.00597, lr=0.0004]
Steps: 16%|█▌ | 109/700 [01:05<05:24, 1.82it/s, loss=0.073, lr=0.0004]
Steps: 16%|█▌ | 110/700 [01:05<05:23, 1.82it/s, loss=0.073, lr=0.0004]
Steps: 16%|█▌ | 110/700 [01:05<05:23, 1.82it/s, loss=0.0238, lr=0.0004]
Steps: 16%|█▌ | 111/700 [01:06<05:21, 1.83it/s, loss=0.0238, lr=0.0004]
Steps: 16%|█▌ | 111/700 [01:06<05:21, 1.83it/s, loss=0.00492, lr=0.0004]
Steps: 16%|█▌ | 112/700 [01:07<05:19, 1.84it/s, loss=0.00492, lr=0.0004]
Steps: 16%|█▌ | 112/700 [01:07<05:19, 1.84it/s, loss=0.00202, lr=0.0004]
Steps: 16%|█▌ | 113/700 [01:07<05:18, 1.84it/s, loss=0.00202, lr=0.0004]
Steps: 16%|█▌ | 113/700 [01:07<05:18, 1.84it/s, loss=0.0107, lr=0.0004]
Steps: 16%|█▋ | 114/700 [01:08<05:18, 1.84it/s, loss=0.0107, lr=0.0004]
Steps: 16%|█▋ | 114/700 [01:08<05:18, 1.84it/s, loss=0.0017, lr=0.0004]
Steps: 16%|█▋ | 115/700 [01:08<05:19, 1.83it/s, loss=0.0017, lr=0.0004]
Steps: 16%|█▋ | 115/700 [01:08<05:19, 1.83it/s, loss=0.0193, lr=0.0004]
Steps: 17%|█▋ | 116/700 [01:09<05:17, 1.84it/s, loss=0.0193, lr=0.0004]
Steps: 17%|█▋ | 116/700 [01:09<05:17, 1.84it/s, loss=0.0246, lr=0.0004]
Steps: 17%|█▋ | 117/700 [01:09<05:16, 1.84it/s, loss=0.0246, lr=0.0004]
Steps: 17%|█▋ | 117/700 [01:09<05:16, 1.84it/s, loss=0.0084, lr=0.0004]
Steps: 17%|█▋ | 118/700 [01:10<05:19, 1.82it/s, loss=0.0084, lr=0.0004]
Steps: 17%|█▋ | 118/700 [01:10<05:19, 1.82it/s, loss=0.369, lr=0.0004]
Steps: 17%|█▋ | 119/700 [01:10<05:18, 1.82it/s, loss=0.369, lr=0.0004]
Steps: 17%|█▋ | 119/700 [01:10<05:18, 1.82it/s, loss=0.0188, lr=0.0004]
Steps: 17%|█▋ | 120/700 [01:11<05:20, 1.81it/s, loss=0.0188, lr=0.0004]
Steps: 17%|█▋ | 120/700 [01:11<05:20, 1.81it/s, loss=0.0234, lr=0.0004]
Steps: 17%|█▋ | 121/700 [01:11<05:18, 1.82it/s, loss=0.0234, lr=0.0004]
Steps: 17%|█▋ | 121/700 [01:12<05:18, 1.82it/s, loss=0.0663, lr=0.0004]
Steps: 17%|█▋ | 122/700 [01:12<05:15, 1.83it/s, loss=0.0663, lr=0.0004]
Steps: 17%|█▋ | 122/700 [01:12<05:15, 1.83it/s, loss=0.00747, lr=0.0004]
Steps: 18%|█▊ | 123/700 [01:13<05:14, 1.84it/s, loss=0.00747, lr=0.0004]
Steps: 18%|█▊ | 123/700 [01:13<05:14, 1.84it/s, loss=0.0517, lr=0.0004]
Steps: 18%|█▊ | 124/700 [01:13<05:13, 1.84it/s, loss=0.0517, lr=0.0004]
Steps: 18%|█▊ | 124/700 [01:13<05:13, 1.84it/s, loss=0.00986, lr=0.0004]
Steps: 18%|█▊ | 125/700 [01:14<05:13, 1.83it/s, loss=0.00986, lr=0.0004]
Steps: 18%|█▊ | 125/700 [01:14<05:13, 1.83it/s, loss=0.00407, lr=0.0004]
Steps: 18%|█▊ | 126/700 [01:14<05:14, 1.83it/s, loss=0.00407, lr=0.0004]
Steps: 18%|█▊ | 126/700 [01:14<05:14, 1.83it/s, loss=0.00421, lr=0.0004]
Steps: 18%|█▊ | 127/700 [01:15<05:15, 1.81it/s, loss=0.00421, lr=0.0004]
Steps: 18%|█▊ | 127/700 [01:15<05:15, 1.81it/s, loss=0.0145, lr=0.0004]
Steps: 18%|█▊ | 128/700 [01:15<05:11, 1.84it/s, loss=0.0145, lr=0.0004]
Steps: 18%|█▊ | 128/700 [01:15<05:11, 1.84it/s, loss=0.00552, lr=0.0004]
Steps: 18%|█▊ | 129/700 [01:16<05:09, 1.84it/s, loss=0.00552, lr=0.0004]
Steps: 18%|█▊ | 129/700 [01:16<05:09, 1.84it/s, loss=0.0378, lr=0.0004]
Steps: 19%|█▊ | 130/700 [01:16<05:08, 1.85it/s, loss=0.0378, lr=0.0004]
Steps: 19%|█▊ | 130/700 [01:16<05:08, 1.85it/s, loss=0.0183, lr=0.0004]
Steps: 19%|█▊ | 131/700 [01:17<05:07, 1.85it/s, loss=0.0183, lr=0.0004]
Steps: 19%|█▊ | 131/700 [01:17<05:07, 1.85it/s, loss=0.0362, lr=0.0004]
Steps: 19%|█▉ | 132/700 [01:17<05:06, 1.86it/s, loss=0.0362, lr=0.0004]
Steps: 19%|█▉ | 132/700 [01:17<05:06, 1.86it/s, loss=0.0043, lr=0.0004]
Steps: 19%|█▉ | 133/700 [01:18<05:02, 1.87it/s, loss=0.0043, lr=0.0004]
Steps: 19%|█▉ | 133/700 [01:18<05:02, 1.87it/s, loss=0.0103, lr=0.0004]
Steps: 19%|█▉ | 134/700 [01:18<04:58, 1.89it/s, loss=0.0103, lr=0.0004]
Steps: 19%|█▉ | 134/700 [01:18<04:58, 1.89it/s, loss=0.0782, lr=0.0004]
Steps: 19%|█▉ | 135/700 [01:19<04:57, 1.90it/s, loss=0.0782, lr=0.0004]
Steps: 19%|█▉ | 135/700 [01:19<04:57, 1.90it/s, loss=0.00536, lr=0.0004]
Steps: 19%|█▉ | 136/700 [01:20<04:54, 1.91it/s, loss=0.00536, lr=0.0004]
Steps: 19%|█▉ | 136/700 [01:20<04:54, 1.91it/s, loss=0.00977, lr=0.0004]
Steps: 20%|█▉ | 137/700 [01:20<04:50, 1.94it/s, loss=0.00977, lr=0.0004]
Steps: 20%|█▉ | 137/700 [01:20<04:50, 1.94it/s, loss=0.0244, lr=0.0004]
Steps: 20%|█▉ | 138/700 [01:21<04:49, 1.94it/s, loss=0.0244, lr=0.0004]
Steps: 20%|█▉ | 138/700 [01:21<04:49, 1.94it/s, loss=0.0119, lr=0.0004]
Steps: 20%|█▉ | 139/700 [01:21<04:53, 1.91it/s, loss=0.0119, lr=0.0004]
Steps: 20%|█▉ | 139/700 [01:21<04:53, 1.91it/s, loss=0.00262, lr=0.0004]
Steps: 20%|██ | 140/700 [01:22<04:55, 1.90it/s, loss=0.00262, lr=0.0004]
Steps: 20%|██ | 140/700 [01:22<04:55, 1.90it/s, loss=0.0776, lr=0.0004]
Steps: 20%|██ | 141/700 [01:22<04:58, 1.87it/s, loss=0.0776, lr=0.0004]
Steps: 20%|██ | 141/700 [01:22<04:58, 1.87it/s, loss=0.00148, lr=0.0004]
Steps: 20%|██ | 142/700 [01:23<05:01, 1.85it/s, loss=0.00148, lr=0.0004]
Steps: 20%|██ | 142/700 [01:23<05:01, 1.85it/s, loss=0.0134, lr=0.0004]
Steps: 20%|██ | 143/700 [01:23<05:04, 1.83it/s, loss=0.0134, lr=0.0004]
Steps: 20%|██ | 143/700 [01:23<05:04, 1.83it/s, loss=0.0393, lr=0.0004]
Steps: 21%|██ | 144/700 [01:24<05:06, 1.81it/s, loss=0.0393, lr=0.0004]
Steps: 21%|██ | 144/700 [01:24<05:06, 1.81it/s, loss=0.164, lr=0.0004]
Steps: 21%|██ | 145/700 [01:24<05:00, 1.85it/s, loss=0.164, lr=0.0004]
Steps: 21%|██ | 145/700 [01:24<05:00, 1.85it/s, loss=0.0173, lr=0.0004]
Steps: 21%|██ | 146/700 [01:25<05:00, 1.85it/s, loss=0.0173, lr=0.0004]
Steps: 21%|██ | 146/700 [01:25<05:00, 1.85it/s, loss=0.00347, lr=0.0004]
Steps: 21%|██ | 147/700 [01:25<04:58, 1.85it/s, loss=0.00347, lr=0.0004]
Steps: 21%|██ | 147/700 [01:25<04:58, 1.85it/s, loss=0.0358, lr=0.0004]
Steps: 21%|██ | 148/700 [01:26<04:57, 1.85it/s, loss=0.0358, lr=0.0004]
Steps: 21%|██ | 148/700 [01:26<04:57, 1.85it/s, loss=0.00457, lr=0.0004]
Steps: 21%|██▏ | 149/700 [01:27<04:56, 1.86it/s, loss=0.00457, lr=0.0004]
Steps: 21%|██▏ | 149/700 [01:27<04:56, 1.86it/s, loss=0.0184, lr=0.0004]
Steps: 21%|██▏ | 150/700 [01:27<04:49, 1.90it/s, loss=0.0184, lr=0.0004]
Steps: 21%|██▏ | 150/700 [01:27<04:49, 1.90it/s, loss=0.00209, lr=0.0004]
Steps: 22%|██▏ | 151/700 [01:28<04:47, 1.91it/s, loss=0.00209, lr=0.0004]
Steps: 22%|██▏ | 151/700 [01:28<04:47, 1.91it/s, loss=0.0184, lr=0.0004]
Steps: 22%|██▏ | 152/700 [01:28<04:48, 1.90it/s, loss=0.0184, lr=0.0004]
Steps: 22%|██▏ | 152/700 [01:28<04:48, 1.90it/s, loss=0.242, lr=0.0004]
Steps: 22%|██▏ | 153/700 [01:29<04:50, 1.88it/s, loss=0.242, lr=0.0004]
Steps: 22%|██▏ | 153/700 [01:29<04:50, 1.88it/s, loss=0.0147, lr=0.0004]
Steps: 22%|██▏ | 154/700 [01:29<04:51, 1.88it/s, loss=0.0147, lr=0.0004]
Steps: 22%|██▏ | 154/700 [01:29<04:51, 1.88it/s, loss=0.018, lr=0.0004]
Steps: 22%|██▏ | 155/700 [01:30<04:48, 1.89it/s, loss=0.018, lr=0.0004]
Steps: 22%|██▏ | 155/700 [01:30<04:48, 1.89it/s, loss=0.0357, lr=0.0004]
Steps: 22%|██▏ | 156/700 [01:30<04:51, 1.86it/s, loss=0.0357, lr=0.0004]
Steps: 22%|██▏ | 156/700 [01:30<04:51, 1.86it/s, loss=0.0363, lr=0.0004]
Steps: 22%|██▏ | 157/700 [01:31<04:46, 1.90it/s, loss=0.0363, lr=0.0004]
Steps: 22%|██▏ | 157/700 [01:31<04:46, 1.90it/s, loss=0.0198, lr=0.0004]
Steps: 23%|██▎ | 158/700 [01:31<04:43, 1.91it/s, loss=0.0198, lr=0.0004]
Steps: 23%|██▎ | 158/700 [01:31<04:43, 1.91it/s, loss=0.00913, lr=0.0004]
Steps: 23%|██▎ | 159/700 [01:32<04:47, 1.88it/s, loss=0.00913, lr=0.0004]
Steps: 23%|██▎ | 159/700 [01:32<04:47, 1.88it/s, loss=0.00706, lr=0.0004]
Steps: 23%|██▎ | 160/700 [01:32<04:46, 1.88it/s, loss=0.00706, lr=0.0004]
Steps: 23%|██▎ | 160/700 [01:32<04:46, 1.88it/s, loss=0.0376, lr=0.0004]
Steps: 23%|██▎ | 161/700 [01:33<04:46, 1.88it/s, loss=0.0376, lr=0.0004]
Steps: 23%|██▎ | 161/700 [01:33<04:46, 1.88it/s, loss=0.0822, lr=0.0004]
Steps: 23%|██▎ | 162/700 [01:33<04:54, 1.82it/s, loss=0.0822, lr=0.0004]
Steps: 23%|██▎ | 162/700 [01:33<04:54, 1.82it/s, loss=0.0165, lr=0.0004]
Steps: 23%|██▎ | 163/700 [01:34<04:54, 1.82it/s, loss=0.0165, lr=0.0004]
Steps: 23%|██▎ | 163/700 [01:34<04:54, 1.82it/s, loss=0.0109, lr=0.0004]
Steps: 23%|██▎ | 164/700 [01:35<04:52, 1.83it/s, loss=0.0109, lr=0.0004]
Steps: 23%|██▎ | 164/700 [01:35<04:52, 1.83it/s, loss=0.0233, lr=0.0004]
Steps: 24%|██▎ | 165/700 [01:35<04:49, 1.85it/s, loss=0.0233, lr=0.0004]
Steps: 24%|██▎ | 165/700 [01:35<04:49, 1.85it/s, loss=0.00457, lr=0.0004]
Steps: 24%|██▎ | 166/700 [01:36<04:45, 1.87it/s, loss=0.00457, lr=0.0004]
Steps: 24%|██▎ | 166/700 [01:36<04:45, 1.87it/s, loss=0.0383, lr=0.0004]
Steps: 24%|██▍ | 167/700 [01:36<04:42, 1.89it/s, loss=0.0383, lr=0.0004]
Steps: 24%|██▍ | 167/700 [01:36<04:42, 1.89it/s, loss=0.074, lr=0.0004]
Steps: 24%|██▍ | 168/700 [01:37<04:42, 1.88it/s, loss=0.074, lr=0.0004]
Steps: 24%|██▍ | 168/700 [01:37<04:42, 1.88it/s, loss=0.0275, lr=0.0004]
Steps: 24%|██▍ | 169/700 [01:37<04:42, 1.88it/s, loss=0.0275, lr=0.0004]
Steps: 24%|██▍ | 169/700 [01:37<04:42, 1.88it/s, loss=0.012, lr=0.0004]
Steps: 24%|██▍ | 170/700 [01:38<04:44, 1.87it/s, loss=0.012, lr=0.0004]
Steps: 24%|██▍ | 170/700 [01:38<04:44, 1.87it/s, loss=0.00168, lr=0.0004]
Steps: 24%|██▍ | 171/700 [01:38<04:44, 1.86it/s, loss=0.00168, lr=0.0004]
Steps: 24%|██▍ | 171/700 [01:38<04:44, 1.86it/s, loss=0.00761, lr=0.0004]
Steps: 25%|██▍ | 172/700 [01:39<04:45, 1.85it/s, loss=0.00761, lr=0.0004]
Steps: 25%|██▍ | 172/700 [01:39<04:45, 1.85it/s, loss=0.002, lr=0.0004]
Steps: 25%|██▍ | 173/700 [01:39<04:46, 1.84it/s, loss=0.002, lr=0.0004]
Steps: 25%|██▍ | 173/700 [01:39<04:46, 1.84it/s, loss=0.0126, lr=0.0004]
Steps: 25%|██▍ | 174/700 [01:40<04:47, 1.83it/s, loss=0.0126, lr=0.0004]
Steps: 25%|██▍ | 174/700 [01:40<04:47, 1.83it/s, loss=0.0468, lr=0.0004]
Steps: 25%|██▌ | 175/700 [01:40<04:47, 1.83it/s, loss=0.0468, lr=0.0004]
Steps: 25%|██▌ | 175/700 [01:40<04:47, 1.83it/s, loss=0.0351, lr=0.0004]
Steps: 25%|██▌ | 176/700 [01:41<04:44, 1.84it/s, loss=0.0351, lr=0.0004]
Steps: 25%|██▌ | 176/700 [01:41<04:44, 1.84it/s, loss=0.0108, lr=0.0004]
Steps: 25%|██▌ | 177/700 [01:42<04:44, 1.84it/s, loss=0.0108, lr=0.0004]
Steps: 25%|██▌ | 177/700 [01:42<04:44, 1.84it/s, loss=0.133, lr=0.0004]
Steps: 25%|██▌ | 178/700 [01:42<04:42, 1.85it/s, loss=0.133, lr=0.0004]
Steps: 25%|██▌ | 178/700 [01:42<04:42, 1.85it/s, loss=0.00218, lr=0.0004]
Steps: 26%|██▌ | 179/700 [01:43<04:38, 1.87it/s, loss=0.00218, lr=0.0004]
Steps: 26%|██▌ | 179/700 [01:43<04:38, 1.87it/s, loss=0.00678, lr=0.0004]
Steps: 26%|██▌ | 180/700 [01:43<04:36, 1.88it/s, loss=0.00678, lr=0.0004]
Steps: 26%|██▌ | 180/700 [01:43<04:36, 1.88it/s, loss=0.0145, lr=0.0004]
Steps: 26%|██▌ | 181/700 [01:44<04:33, 1.90it/s, loss=0.0145, lr=0.0004]
Steps: 26%|██▌ | 181/700 [01:44<04:33, 1.90it/s, loss=0.0168, lr=0.0004]
Steps: 26%|██▌ | 182/700 [01:44<04:29, 1.93it/s, loss=0.0168, lr=0.0004]
Steps: 26%|██▌ | 182/700 [01:44<04:29, 1.93it/s, loss=0.0101, lr=0.0004]
Steps: 26%|██▌ | 183/700 [01:45<04:24, 1.96it/s, loss=0.0101, lr=0.0004]
Steps: 26%|██▌ | 183/700 [01:45<04:24, 1.96it/s, loss=0.0785, lr=0.0004]
Steps: 26%|██▋ | 184/700 [01:45<04:23, 1.96it/s, loss=0.0785, lr=0.0004]
Steps: 26%|██▋ | 184/700 [01:45<04:23, 1.96it/s, loss=0.00305, lr=0.0004]
Steps: 26%|██▋ | 185/700 [01:46<04:23, 1.96it/s, loss=0.00305, lr=0.0004]
Steps: 26%|██▋ | 185/700 [01:46<04:23, 1.96it/s, loss=0.208, lr=0.0004]
Steps: 27%|██▋ | 186/700 [01:46<04:25, 1.93it/s, loss=0.208, lr=0.0004]
Steps: 27%|██▋ | 186/700 [01:46<04:25, 1.93it/s, loss=0.00711, lr=0.0004]
Steps: 27%|██▋ | 187/700 [01:47<04:32, 1.88it/s, loss=0.00711, lr=0.0004]
Steps: 27%|██▋ | 187/700 [01:47<04:32, 1.88it/s, loss=0.0302, lr=0.0004]
Steps: 27%|██▋ | 188/700 [01:47<04:35, 1.86it/s, loss=0.0302, lr=0.0004]
Steps: 27%|██▋ | 188/700 [01:47<04:35, 1.86it/s, loss=0.0422, lr=0.0004]
Steps: 27%|██▋ | 189/700 [01:48<04:36, 1.85it/s, loss=0.0422, lr=0.0004]
Steps: 27%|██▋ | 189/700 [01:48<04:36, 1.85it/s, loss=0.0568, lr=0.0004]
Steps: 27%|██▋ | 190/700 [01:48<04:35, 1.85it/s, loss=0.0568, lr=0.0004]
Steps: 27%|██▋ | 190/700 [01:48<04:35, 1.85it/s, loss=0.00478, lr=0.0004]
Steps: 27%|██▋ | 191/700 [01:49<04:36, 1.84it/s, loss=0.00478, lr=0.0004]
Steps: 27%|██▋ | 191/700 [01:49<04:36, 1.84it/s, loss=0.0315, lr=0.0004]
Steps: 27%|██▋ | 192/700 [01:49<04:37, 1.83it/s, loss=0.0315, lr=0.0004]
Steps: 27%|██▋ | 192/700 [01:49<04:37, 1.83it/s, loss=0.00483, lr=0.0004]
Steps: 28%|██▊ | 193/700 [01:50<04:30, 1.87it/s, loss=0.00483, lr=0.0004]
Steps: 28%|██▊ | 193/700 [01:50<04:30, 1.87it/s, loss=0.0079, lr=0.0004]
Steps: 28%|██▊ | 194/700 [01:51<04:28, 1.88it/s, loss=0.0079, lr=0.0004]
Steps: 28%|██▊ | 194/700 [01:51<04:28, 1.88it/s, loss=0.00442, lr=0.0004]
Steps: 28%|██▊ | 195/700 [01:51<04:28, 1.88it/s, loss=0.00442, lr=0.0004]
Steps: 28%|██▊ | 195/700 [01:51<04:28, 1.88it/s, loss=0.047, lr=0.0004]
Steps: 28%|██▊ | 196/700 [01:52<04:28, 1.88it/s, loss=0.047, lr=0.0004]
Steps: 28%|██▊ | 196/700 [01:52<04:28, 1.88it/s, loss=0.0346, lr=0.0004]
Steps: 28%|██▊ | 197/700 [01:52<04:28, 1.87it/s, loss=0.0346, lr=0.0004]
Steps: 28%|██▊ | 197/700 [01:52<04:28, 1.87it/s, loss=0.128, lr=0.0004]
Steps: 28%|██▊ | 198/700 [01:53<04:31, 1.85it/s, loss=0.128, lr=0.0004]
Steps: 28%|██▊ | 198/700 [01:53<04:31, 1.85it/s, loss=0.00269, lr=0.0004]
Steps: 28%|██▊ | 199/700 [01:53<04:32, 1.84it/s, loss=0.00269, lr=0.0004]
Steps: 28%|██▊ | 199/700 [01:53<04:32, 1.84it/s, loss=0.0341, lr=0.0004]
Steps: 29%|██▊ | 200/700 [01:54<04:39, 1.79it/s, loss=0.0341, lr=0.0004]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')
Saving weights to checkpoints/step_200.safetensors
LORA Unet Moved 0.0009888404747471213
LORA CLIP Moved 4.0488466765964404e-05
Steps: 29%|██▊ | 200/700 [01:54<04:39, 1.79it/s, loss=0.12, lr=0.0004]
Steps: 29%|██▊ | 201/700 [01:55<05:00, 1.66it/s, loss=0.12, lr=0.0004]
Steps: 29%|██▊ | 201/700 [01:55<05:00, 1.66it/s, loss=0.0149, lr=0.0004]
Steps: 29%|██▉ | 202/700 [01:55<04:52, 1.70it/s, loss=0.0149, lr=0.0004]
Steps: 29%|██▉ | 202/700 [01:55<04:52, 1.70it/s, loss=0.0194, lr=0.0004]
Steps: 29%|██▉ | 203/700 [01:56<04:44, 1.75it/s, loss=0.0194, lr=0.0004]
Steps: 29%|██▉ | 203/700 [01:56<04:44, 1.75it/s, loss=0.00362, lr=0.0004]
Steps: 29%|██▉ | 204/700 [01:56<04:42, 1.76it/s, loss=0.00362, lr=0.0004]
Steps: 29%|██▉ | 204/700 [01:56<04:42, 1.76it/s, loss=0.0177, lr=0.0004]
Steps: 29%|██▉ | 205/700 [01:57<04:39, 1.77it/s, loss=0.0177, lr=0.0004]
Steps: 29%|██▉ | 205/700 [01:57<04:39, 1.77it/s, loss=0.0221, lr=0.0004]
Steps: 29%|██▉ | 206/700 [01:57<04:36, 1.78it/s, loss=0.0221, lr=0.0004]
Steps: 29%|██▉ | 206/700 [01:57<04:36, 1.78it/s, loss=0.0169, lr=0.0004]
Steps: 30%|██▉ | 207/700 [01:58<04:34, 1.80it/s, loss=0.0169, lr=0.0004]
Steps: 30%|██▉ | 207/700 [01:58<04:34, 1.80it/s, loss=0.0307, lr=0.0004]
Steps: 30%|██▉ | 208/700 [01:58<04:39, 1.76it/s, loss=0.0307, lr=0.0004]
Steps: 30%|██▉ | 208/700 [01:58<04:39, 1.76it/s, loss=0.0412, lr=0.0004]
Steps: 30%|██▉ | 209/700 [01:59<04:51, 1.69it/s, loss=0.0412, lr=0.0004]
Steps: 30%|██▉ | 209/700 [01:59<04:51, 1.69it/s, loss=0.0109, lr=0.0004]
Steps: 30%|███ | 210/700 [02:00<04:45, 1.72it/s, loss=0.0109, lr=0.0004]
Steps: 30%|███ | 210/700 [02:00<04:45, 1.72it/s, loss=0.00631, lr=0.0004]
Steps: 30%|███ | 211/700 [02:00<04:42, 1.73it/s, loss=0.00631, lr=0.0004]
Steps: 30%|███ | 211/700 [02:00<04:42, 1.73it/s, loss=0.135, lr=0.0004]
Steps: 30%|███ | 212/700 [02:01<04:35, 1.77it/s, loss=0.135, lr=0.0004]
Steps: 30%|███ | 212/700 [02:01<04:35, 1.77it/s, loss=0.0202, lr=0.0004]
Steps: 30%|███ | 213/700 [02:01<04:34, 1.77it/s, loss=0.0202, lr=0.0004]
Steps: 30%|███ | 213/700 [02:01<04:34, 1.77it/s, loss=0.00592, lr=0.0004]
Steps: 31%|███ | 214/700 [02:02<04:34, 1.77it/s, loss=0.00592, lr=0.0004]
Steps: 31%|███ | 214/700 [02:02<04:34, 1.77it/s, loss=0.267, lr=0.0004]
Steps: 31%|███ | 215/700 [02:02<04:28, 1.81it/s, loss=0.267, lr=0.0004]
Steps: 31%|███ | 215/700 [02:02<04:28, 1.81it/s, loss=0.0209, lr=0.0004]
Steps: 31%|███ | 216/700 [02:03<04:25, 1.82it/s, loss=0.0209, lr=0.0004]
Steps: 31%|███ | 216/700 [02:03<04:25, 1.82it/s, loss=0.0375, lr=0.0004]
Steps: 31%|███ | 217/700 [02:03<04:26, 1.81it/s, loss=0.0375, lr=0.0004]
Steps: 31%|███ | 217/700 [02:03<04:26, 1.81it/s, loss=0.00811, lr=0.0004]
Steps: 31%|███ | 218/700 [02:04<04:28, 1.79it/s, loss=0.00811, lr=0.0004]
Steps: 31%|███ | 218/700 [02:04<04:28, 1.79it/s, loss=0.0201, lr=0.0004]
Steps: 31%|███▏ | 219/700 [02:05<04:25, 1.81it/s, loss=0.0201, lr=0.0004]
Steps: 31%|███▏ | 219/700 [02:05<04:25, 1.81it/s, loss=0.0114, lr=0.0004]
Steps: 31%|███▏ | 220/700 [02:05<04:24, 1.82it/s, loss=0.0114, lr=0.0004]
Steps: 31%|███▏ | 220/700 [02:05<04:24, 1.82it/s, loss=0.104, lr=0.0004]
Steps: 32%|███▏ | 221/700 [02:06<04:24, 1.81it/s, loss=0.104, lr=0.0004]
Steps: 32%|███▏ | 221/700 [02:06<04:24, 1.81it/s, loss=0.0184, lr=0.0004]
Steps: 32%|███▏ | 222/700 [02:06<04:27, 1.78it/s, loss=0.0184, lr=0.0004]
Steps: 32%|███▏ | 222/700 [02:06<04:27, 1.78it/s, loss=0.0112, lr=0.0004]
Steps: 32%|███▏ | 223/700 [02:07<04:34, 1.73it/s, loss=0.0112, lr=0.0004]
Steps: 32%|███▏ | 223/700 [02:07<04:34, 1.73it/s, loss=0.0133, lr=0.0004]
Steps: 32%|███▏ | 224/700 [02:07<04:32, 1.75it/s, loss=0.0133, lr=0.0004]
Steps: 32%|███▏ | 224/700 [02:07<04:32, 1.75it/s, loss=0.0264, lr=0.0004]
Steps: 32%|███▏ | 225/700 [02:08<04:26, 1.78it/s, loss=0.0264, lr=0.0004]
Steps: 32%|███▏ | 225/700 [02:08<04:26, 1.78it/s, loss=0.0537, lr=0.0004]
Steps: 32%|███▏ | 226/700 [02:09<04:25, 1.79it/s, loss=0.0537, lr=0.0004]
Steps: 32%|███▏ | 226/700 [02:09<04:25, 1.79it/s, loss=0.00868, lr=0.0004]
Steps: 32%|███▏ | 227/700 [02:09<04:23, 1.79it/s, loss=0.00868, lr=0.0004]
Steps: 32%|███▏ | 227/700 [02:09<04:23, 1.79it/s, loss=0.0373, lr=0.0004]
Steps: 33%|███▎ | 228/700 [02:10<04:19, 1.82it/s, loss=0.0373, lr=0.0004]
Steps: 33%|███▎ | 228/700 [02:10<04:19, 1.82it/s, loss=0.0108, lr=0.0004]
Steps: 33%|███▎ | 229/700 [02:10<04:16, 1.83it/s, loss=0.0108, lr=0.0004]
Steps: 33%|███▎ | 229/700 [02:10<04:16, 1.83it/s, loss=0.0296, lr=0.0004]
Steps: 33%|███▎ | 230/700 [02:11<04:14, 1.85it/s, loss=0.0296, lr=0.0004]
Steps: 33%|███▎ | 230/700 [02:11<04:14, 1.85it/s, loss=0.0044, lr=0.0004]
Steps: 33%|███▎ | 231/700 [02:11<04:15, 1.83it/s, loss=0.0044, lr=0.0004]
Steps: 33%|███▎ | 231/700 [02:11<04:15, 1.83it/s, loss=0.156, lr=0.0004]
Steps: 33%|███▎ | 232/700 [02:12<04:15, 1.83it/s, loss=0.156, lr=0.0004]
Steps: 33%|███▎ | 232/700 [02:12<04:15, 1.83it/s, loss=0.00477, lr=0.0004]
Steps: 33%|███▎ | 233/700 [02:12<04:11, 1.86it/s, loss=0.00477, lr=0.0004]
Steps: 33%|███▎ | 233/700 [02:12<04:11, 1.86it/s, loss=0.112, lr=0.0004]
Steps: 33%|███▎ | 234/700 [02:13<04:09, 1.87it/s, loss=0.112, lr=0.0004]
Steps: 33%|███▎ | 234/700 [02:13<04:09, 1.87it/s, loss=0.0136, lr=0.0004]
Steps: 34%|███▎ | 235/700 [02:13<04:05, 1.89it/s, loss=0.0136, lr=0.0004]
Steps: 34%|███▎ | 235/700 [02:13<04:05, 1.89it/s, loss=0.0123, lr=0.0004]
Steps: 34%|███▎ | 236/700 [02:14<04:03, 1.91it/s, loss=0.0123, lr=0.0004]
Steps: 34%|███▎ | 236/700 [02:14<04:03, 1.91it/s, loss=0.022, lr=0.0004]
Steps: 34%|███▍ | 237/700 [02:14<04:00, 1.93it/s, loss=0.022, lr=0.0004]
Steps: 34%|███▍ | 237/700 [02:14<04:00, 1.93it/s, loss=0.00886, lr=0.0004]
Steps: 34%|███▍ | 238/700 [02:15<03:59, 1.93it/s, loss=0.00886, lr=0.0004]
Steps: 34%|███▍ | 238/700 [02:15<03:59, 1.93it/s, loss=0.00845, lr=0.0004]
Steps: 34%|███▍ | 239/700 [02:15<03:57, 1.94it/s, loss=0.00845, lr=0.0004]
Steps: 34%|███▍ | 239/700 [02:15<03:57, 1.94it/s, loss=0.00988, lr=0.0004]
Steps: 34%|███▍ | 240/700 [02:16<03:56, 1.94it/s, loss=0.00988, lr=0.0004]
Steps: 34%|███▍ | 240/700 [02:16<03:56, 1.94it/s, loss=0.00246, lr=0.0004]
Steps: 34%|███▍ | 241/700 [02:16<03:53, 1.97it/s, loss=0.00246, lr=0.0004]
Steps: 34%|███▍ | 241/700 [02:16<03:53, 1.97it/s, loss=0.00873, lr=0.0004]
Steps: 35%|███▍ | 242/700 [02:17<03:51, 1.98it/s, loss=0.00873, lr=0.0004]
Steps: 35%|███▍ | 242/700 [02:17<03:51, 1.98it/s, loss=0.00512, lr=0.0004]
Steps: 35%|███▍ | 243/700 [02:17<03:49, 1.99it/s, loss=0.00512, lr=0.0004]
Steps: 35%|███▍ | 243/700 [02:17<03:49, 1.99it/s, loss=0.0248, lr=0.0004]
Steps: 35%|███▍ | 244/700 [02:18<03:47, 2.00it/s, loss=0.0248, lr=0.0004]
Steps: 35%|███▍ | 244/700 [02:18<03:47, 2.00it/s, loss=0.00431, lr=0.0004]
Steps: 35%|███▌ | 245/700 [02:18<03:49, 1.98it/s, loss=0.00431, lr=0.0004]
Steps: 35%|███▌ | 245/700 [02:18<03:49, 1.98it/s, loss=0.0201, lr=0.0004]
Steps: 35%|███▌ | 246/700 [02:19<03:53, 1.95it/s, loss=0.0201, lr=0.0004]
Steps: 35%|███▌ | 246/700 [02:19<03:53, 1.95it/s, loss=0.0103, lr=0.0004]
Steps: 35%|███▌ | 247/700 [02:19<03:50, 1.96it/s, loss=0.0103, lr=0.0004]
Steps: 35%|███▌ | 247/700 [02:19<03:50, 1.96it/s, loss=0.0497, lr=0.0004]
Steps: 35%|███▌ | 248/700 [02:20<03:55, 1.92it/s, loss=0.0497, lr=0.0004]
Steps: 35%|███▌ | 248/700 [02:20<03:55, 1.92it/s, loss=0.163, lr=0.0004]
Steps: 36%|███▌ | 249/700 [02:21<04:03, 1.85it/s, loss=0.163, lr=0.0004]
Steps: 36%|███▌ | 249/700 [02:21<04:03, 1.85it/s, loss=0.0142, lr=0.0004]
Steps: 36%|███▌ | 250/700 [02:21<03:57, 1.90it/s, loss=0.0142, lr=0.0004]
Steps: 36%|███▌ | 250/700 [02:21<03:57, 1.90it/s, loss=0.00624, lr=0.0004]
Steps: 36%|███▌ | 251/700 [02:22<03:53, 1.92it/s, loss=0.00624, lr=0.0004]
Steps: 36%|███▌ | 251/700 [02:22<03:53, 1.92it/s, loss=0.0026, lr=0.0004]
Steps: 36%|███▌ | 252/700 [02:22<03:52, 1.93it/s, loss=0.0026, lr=0.0004]
Steps: 36%|███▌ | 252/700 [02:22<03:52, 1.93it/s, loss=0.15, lr=0.0004]
Steps: 36%|███▌ | 253/700 [02:23<03:51, 1.93it/s, loss=0.15, lr=0.0004]
Steps: 36%|███▌ | 253/700 [02:23<03:51, 1.93it/s, loss=0.0312, lr=0.0004]
Steps: 36%|███▋ | 254/700 [02:23<03:51, 1.93it/s, loss=0.0312, lr=0.0004]
Steps: 36%|███▋ | 254/700 [02:23<03:51, 1.93it/s, loss=0.0161, lr=0.0004]
Steps: 36%|███▋ | 255/700 [02:24<03:50, 1.93it/s, loss=0.0161, lr=0.0004]
Steps: 36%|███▋ | 255/700 [02:24<03:50, 1.93it/s, loss=0.00627, lr=0.0004]
Steps: 37%|███▋ | 256/700 [02:24<03:47, 1.95it/s, loss=0.00627, lr=0.0004]
Steps: 37%|███▋ | 256/700 [02:24<03:47, 1.95it/s, loss=0.0224, lr=0.0004]
Steps: 37%|███▋ | 257/700 [02:25<03:48, 1.94it/s, loss=0.0224, lr=0.0004]
Steps: 37%|███▋ | 257/700 [02:25<03:48, 1.94it/s, loss=0.0383, lr=0.0004]
Steps: 37%|███▋ | 258/700 [02:25<03:53, 1.90it/s, loss=0.0383, lr=0.0004]
Steps: 37%|███▋ | 258/700 [02:25<03:53, 1.90it/s, loss=0.0124, lr=0.0004]
Steps: 37%|███▋ | 259/700 [02:26<04:00, 1.84it/s, loss=0.0124, lr=0.0004]
Steps: 37%|███▋ | 259/700 [02:26<04:00, 1.84it/s, loss=0.00859, lr=0.0004]
Steps: 37%|███▋ | 260/700 [02:26<03:59, 1.84it/s, loss=0.00859, lr=0.0004]
Steps: 37%|███▋ | 260/700 [02:26<03:59, 1.84it/s, loss=0.25, lr=0.0004]
Steps: 37%|███▋ | 261/700 [02:27<04:04, 1.80it/s, loss=0.25, lr=0.0004]
Steps: 37%|███▋ | 261/700 [02:27<04:04, 1.80it/s, loss=0.00184, lr=0.0004]
Steps: 37%|███▋ | 262/700 [02:28<04:07, 1.77it/s, loss=0.00184, lr=0.0004]
Steps: 37%|███▋ | 262/700 [02:28<04:07, 1.77it/s, loss=0.0153, lr=0.0004]
Steps: 38%|███▊ | 263/700 [02:28<04:12, 1.73it/s, loss=0.0153, lr=0.0004]
Steps: 38%|███▊ | 263/700 [02:28<04:12, 1.73it/s, loss=0.0682, lr=0.0004]
Steps: 38%|███▊ | 264/700 [02:29<04:14, 1.71it/s, loss=0.0682, lr=0.0004]
Steps: 38%|███▊ | 264/700 [02:29<04:14, 1.71it/s, loss=0.0619, lr=0.0004]
Steps: 38%|███▊ | 265/700 [02:29<04:28, 1.62it/s, loss=0.0619, lr=0.0004]
Steps: 38%|███▊ | 265/700 [02:29<04:28, 1.62it/s, loss=0.0181, lr=0.0004]
Steps: 38%|███▊ | 266/700 [02:30<04:25, 1.64it/s, loss=0.0181, lr=0.0004]
Steps: 38%|███▊ | 266/700 [02:30<04:25, 1.64it/s, loss=0.0288, lr=0.0004]
Steps: 38%|███▊ | 267/700 [02:31<04:23, 1.64it/s, loss=0.0288, lr=0.0004]
Steps: 38%|███▊ | 267/700 [02:31<04:23, 1.64it/s, loss=0.00962, lr=0.0004]
Steps: 38%|███▊ | 268/700 [02:31<04:28, 1.61it/s, loss=0.00962, lr=0.0004]
Steps: 38%|███▊ | 268/700 [02:31<04:28, 1.61it/s, loss=0.0127, lr=0.0004]
Steps: 38%|███▊ | 269/700 [02:32<04:27, 1.61it/s, loss=0.0127, lr=0.0004]
Steps: 38%|███▊ | 269/700 [02:32<04:27, 1.61it/s, loss=0.00764, lr=0.0004]
Steps: 39%|███▊ | 270/700 [02:33<04:29, 1.60it/s, loss=0.00764, lr=0.0004]
Steps: 39%|███▊ | 270/700 [02:33<04:29, 1.60it/s, loss=0.005, lr=0.0004]
Steps: 39%|███▊ | 271/700 [02:33<04:29, 1.59it/s, loss=0.005, lr=0.0004]
Steps: 39%|███▊ | 271/700 [02:33<04:29, 1.59it/s, loss=0.0286, lr=0.0004]
Steps: 39%|███▉ | 272/700 [02:34<04:31, 1.58it/s, loss=0.0286, lr=0.0004]
Steps: 39%|███▉ | 272/700 [02:34<04:31, 1.58it/s, loss=0.0257, lr=0.0004]
Steps: 39%|███▉ | 273/700 [02:34<04:24, 1.62it/s, loss=0.0257, lr=0.0004]
Steps: 39%|███▉ | 273/700 [02:34<04:24, 1.62it/s, loss=0.0963, lr=0.0004]
Steps: 39%|███▉ | 274/700 [02:35<04:26, 1.60it/s, loss=0.0963, lr=0.0004]
Steps: 39%|███▉ | 274/700 [02:35<04:26, 1.60it/s, loss=0.00725, lr=0.0004]
Steps: 39%|███▉ | 275/700 [02:36<04:19, 1.64it/s, loss=0.00725, lr=0.0004]
Steps: 39%|███▉ | 275/700 [02:36<04:19, 1.64it/s, loss=0.00157, lr=0.0004]
Steps: 39%|███▉ | 276/700 [02:36<04:14, 1.66it/s, loss=0.00157, lr=0.0004]
Steps: 39%|███▉ | 276/700 [02:36<04:14, 1.66it/s, loss=0.00832, lr=0.0004]
Steps: 40%|███▉ | 277/700 [02:37<04:13, 1.67it/s, loss=0.00832, lr=0.0004]
Steps: 40%|███▉ | 277/700 [02:37<04:13, 1.67it/s, loss=0.0604, lr=0.0004]
Steps: 40%|███▉ | 278/700 [02:37<04:10, 1.68it/s, loss=0.0604, lr=0.0004]
Steps: 40%|███▉ | 278/700 [02:37<04:10, 1.68it/s, loss=0.0378, lr=0.0004]
Steps: 40%|███▉ | 279/700 [02:38<04:06, 1.71it/s, loss=0.0378, lr=0.0004]
Steps: 40%|███▉ | 279/700 [02:38<04:06, 1.71it/s, loss=0.0044, lr=0.0004]
Steps: 40%|████ | 280/700 [02:39<04:06, 1.71it/s, loss=0.0044, lr=0.0004]
Steps: 40%|████ | 280/700 [02:39<04:06, 1.71it/s, loss=0.0125, lr=0.0004]
Steps: 40%|████ | 281/700 [02:39<04:02, 1.73it/s, loss=0.0125, lr=0.0004]
Steps: 40%|████ | 281/700 [02:39<04:02, 1.73it/s, loss=0.00308, lr=0.0004]
Steps: 40%|████ | 282/700 [02:40<03:59, 1.74it/s, loss=0.00308, lr=0.0004]
Steps: 40%|████ | 282/700 [02:40<03:59, 1.74it/s, loss=0.0162, lr=0.0004]
Steps: 40%|████ | 283/700 [02:40<03:59, 1.74it/s, loss=0.0162, lr=0.0004]
Steps: 40%|████ | 283/700 [02:40<03:59, 1.74it/s, loss=0.0964, lr=0.0004]
Steps: 41%|████ | 284/700 [02:41<04:00, 1.73it/s, loss=0.0964, lr=0.0004]
Steps: 41%|████ | 284/700 [02:41<04:00, 1.73it/s, loss=0.0236, lr=0.0004]
Steps: 41%|████ | 285/700 [02:41<03:58, 1.74it/s, loss=0.0236, lr=0.0004]
Steps: 41%|████ | 285/700 [02:41<03:58, 1.74it/s, loss=0.016, lr=0.0004]
Steps: 41%|████ | 286/700 [02:42<03:55, 1.76it/s, loss=0.016, lr=0.0004]
Steps: 41%|████ | 286/700 [02:42<03:55, 1.76it/s, loss=0.00831, lr=0.0004]
Steps: 41%|████ | 287/700 [02:43<03:54, 1.76it/s, loss=0.00831, lr=0.0004]
Steps: 41%|████ | 287/700 [02:43<03:54, 1.76it/s, loss=0.0241, lr=0.0004]
Steps: 41%|████ | 288/700 [02:43<03:49, 1.80it/s, loss=0.0241, lr=0.0004]
Steps: 41%|████ | 288/700 [02:43<03:49, 1.80it/s, loss=0.0839, lr=0.0004]
Steps: 41%|████▏ | 289/700 [02:44<03:41, 1.85it/s, loss=0.0839, lr=0.0004]
Steps: 41%|████▏ | 289/700 [02:44<03:41, 1.85it/s, loss=0.0263, lr=0.0004]
Steps: 41%|████▏ | 290/700 [02:44<03:35, 1.90it/s, loss=0.0263, lr=0.0004]
Steps: 41%|████▏ | 290/700 [02:44<03:35, 1.90it/s, loss=0.0967, lr=0.0004]
Steps: 42%|████▏ | 291/700 [02:45<03:31, 1.93it/s, loss=0.0967, lr=0.0004]
Steps: 42%|████▏ | 291/700 [02:45<03:31, 1.93it/s, loss=0.0111, lr=0.0004]
Steps: 42%|████▏ | 292/700 [02:45<03:30, 1.94it/s, loss=0.0111, lr=0.0004]
Steps: 42%|████▏ | 292/700 [02:45<03:30, 1.94it/s, loss=0.0426, lr=0.0004]
Steps: 42%|████▏ | 293/700 [02:46<03:32, 1.92it/s, loss=0.0426, lr=0.0004]
Steps: 42%|████▏ | 293/700 [02:46<03:32, 1.92it/s, loss=0.0054, lr=0.0004]
Steps: 42%|████▏ | 294/700 [02:46<03:31, 1.92it/s, loss=0.0054, lr=0.0004]
Steps: 42%|████▏ | 294/700 [02:46<03:31, 1.92it/s, loss=0.0031, lr=0.0004]
Steps: 42%|████▏ | 295/700 [02:47<03:31, 1.91it/s, loss=0.0031, lr=0.0004]
Steps: 42%|████▏ | 295/700 [02:47<03:31, 1.91it/s, loss=0.0399, lr=0.0004]
Steps: 42%|████▏ | 296/700 [02:47<03:35, 1.87it/s, loss=0.0399, lr=0.0004]
Steps: 42%|████▏ | 296/700 [02:47<03:35, 1.87it/s, loss=0.0144, lr=0.0004]
Steps: 42%|████▏ | 297/700 [02:48<03:36, 1.86it/s, loss=0.0144, lr=0.0004]
Steps: 42%|████▏ | 297/700 [02:48<03:36, 1.86it/s, loss=0.0868, lr=0.0004]
Steps: 43%|████▎ | 298/700 [02:48<03:37, 1.85it/s, loss=0.0868, lr=0.0004]
Steps: 43%|████▎ | 298/700 [02:48<03:37, 1.85it/s, loss=0.0358, lr=0.0004]
Steps: 43%|████▎ | 299/700 [02:49<03:39, 1.82it/s, loss=0.0358, lr=0.0004]
Steps: 43%|████▎ | 299/700 [02:49<03:39, 1.82it/s, loss=0.0683, lr=0.0004]
Steps: 43%|████▎ | 300/700 [02:49<03:36, 1.84it/s, loss=0.0683, lr=0.0004]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')
Saving weights to checkpoints/step_300.safetensors
LORA Unet Moved 0.0012533192057162523
LORA CLIP Moved 5.122544462210499e-05
Steps: 43%|████▎ | 300/700 [02:49<03:36, 1.84it/s, loss=0.00153, lr=0.0004]
Steps: 43%|████▎ | 301/700 [02:50<04:01, 1.65it/s, loss=0.00153, lr=0.0004]
Steps: 43%|████▎ | 301/700 [02:50<04:01, 1.65it/s, loss=0.0337, lr=0.0004]
Steps: 43%|████▎ | 302/700 [02:51<03:52, 1.71it/s, loss=0.0337, lr=0.0004]
Steps: 43%|████▎ | 302/700 [02:51<03:52, 1.71it/s, loss=0.0974, lr=0.0004]
Steps: 43%|████▎ | 303/700 [02:51<03:40, 1.80it/s, loss=0.0974, lr=0.0004]
Steps: 43%|████▎ | 303/700 [02:51<03:40, 1.80it/s, loss=0.00531, lr=0.0004]
Steps: 43%|████▎ | 304/700 [02:52<03:34, 1.85it/s, loss=0.00531, lr=0.0004]
Steps: 43%|████▎ | 304/700 [02:52<03:34, 1.85it/s, loss=0.0179, lr=0.0004]
Steps: 44%|████▎ | 305/700 [02:52<03:32, 1.86it/s, loss=0.0179, lr=0.0004]
Steps: 44%|████▎ | 305/700 [02:52<03:32, 1.86it/s, loss=0.0687, lr=0.0004]
Steps: 44%|████▎ | 306/700 [02:53<03:35, 1.83it/s, loss=0.0687, lr=0.0004]
Steps: 44%|████▎ | 306/700 [02:53<03:35, 1.83it/s, loss=0.00892, lr=0.0004]
Steps: 44%|████▍ | 307/700 [02:53<03:36, 1.81it/s, loss=0.00892, lr=0.0004]
Steps: 44%|████▍ | 307/700 [02:53<03:36, 1.81it/s, loss=0.0717, lr=0.0004]
Steps: 44%|████▍ | 308/700 [02:54<03:31, 1.85it/s, loss=0.0717, lr=0.0004]
Steps: 44%|████▍ | 308/700 [02:54<03:31, 1.85it/s, loss=0.00435, lr=0.0004]
Steps: 44%|████▍ | 309/700 [02:54<03:31, 1.85it/s, loss=0.00435, lr=0.0004]
Steps: 44%|████▍ | 309/700 [02:54<03:31, 1.85it/s, loss=0.00829, lr=0.0004]
Steps: 44%|████▍ | 310/700 [02:55<03:36, 1.80it/s, loss=0.00829, lr=0.0004]
Steps: 44%|████▍ | 310/700 [02:55<03:36, 1.80it/s, loss=0.0713, lr=0.0004]
Steps: 44%|████▍ | 311/700 [02:55<03:31, 1.84it/s, loss=0.0713, lr=0.0004]
Steps: 44%|████▍ | 311/700 [02:55<03:31, 1.84it/s, loss=0.00767, lr=0.0004]
Steps: 45%|████▍ | 312/700 [02:56<03:25, 1.89it/s, loss=0.00767, lr=0.0004]
Steps: 45%|████▍ | 312/700 [02:56<03:25, 1.89it/s, loss=0.0893, lr=0.0004]
Steps: 45%|████▍ | 313/700 [02:56<03:19, 1.94it/s, loss=0.0893, lr=0.0004]
Steps: 45%|████▍ | 313/700 [02:56<03:19, 1.94it/s, loss=0.019, lr=0.0004]
Steps: 45%|████▍ | 314/700 [02:57<03:18, 1.95it/s, loss=0.019, lr=0.0004]
Steps: 45%|████▍ | 314/700 [02:57<03:18, 1.95it/s, loss=0.00861, lr=0.0004]
Steps: 45%|████▌ | 315/700 [02:57<03:15, 1.97it/s, loss=0.00861, lr=0.0004]
Steps: 45%|████▌ | 315/700 [02:57<03:15, 1.97it/s, loss=0.0777, lr=0.0004]
Steps: 45%|████▌ | 316/700 [02:58<03:13, 1.98it/s, loss=0.0777, lr=0.0004]
Steps: 45%|████▌ | 316/700 [02:58<03:13, 1.98it/s, loss=0.00247, lr=0.0004]
Steps: 45%|████▌ | 317/700 [02:58<03:15, 1.96it/s, loss=0.00247, lr=0.0004]
Steps: 45%|████▌ | 317/700 [02:58<03:15, 1.96it/s, loss=0.229, lr=0.0004]
Steps: 45%|████▌ | 318/700 [02:59<03:17, 1.93it/s, loss=0.229, lr=0.0004]
Steps: 45%|████▌ | 318/700 [02:59<03:17, 1.93it/s, loss=0.0106, lr=0.0004]
Steps: 46%|████▌ | 319/700 [03:00<03:21, 1.89it/s, loss=0.0106, lr=0.0004]
Steps: 46%|████▌ | 319/700 [03:00<03:21, 1.89it/s, loss=0.00504, lr=0.0004]
Steps: 46%|████▌ | 320/700 [03:00<03:23, 1.87it/s, loss=0.00504, lr=0.0004]
Steps: 46%|████▌ | 320/700 [03:00<03:23, 1.87it/s, loss=0.00787, lr=0.0004]
Steps: 46%|████▌ | 321/700 [03:01<03:25, 1.84it/s, loss=0.00787, lr=0.0004]
Steps: 46%|████▌ | 321/700 [03:01<03:25, 1.84it/s, loss=0.103, lr=0.0004]
Steps: 46%|████▌ | 322/700 [03:01<03:25, 1.84it/s, loss=0.103, lr=0.0004]
Steps: 46%|████▌ | 322/700 [03:01<03:25, 1.84it/s, loss=0.028, lr=0.0004]
Steps: 46%|████▌ | 323/700 [03:02<03:27, 1.82it/s, loss=0.028, lr=0.0004]
Steps: 46%|████▌ | 323/700 [03:02<03:27, 1.82it/s, loss=0.173, lr=0.0004]
Steps: 46%|████▋ | 324/700 [03:02<03:27, 1.82it/s, loss=0.173, lr=0.0004]
Steps: 46%|████▋ | 324/700 [03:02<03:27, 1.82it/s, loss=0.0602, lr=0.0004]
Steps: 46%|████▋ | 325/700 [03:03<03:28, 1.80it/s, loss=0.0602, lr=0.0004]
Steps: 46%|████▋ | 325/700 [03:03<03:28, 1.80it/s, loss=0.0443, lr=0.0004]
Steps: 47%|████▋ | 326/700 [03:03<03:27, 1.81it/s, loss=0.0443, lr=0.0004]
Steps: 47%|████▋ | 326/700 [03:03<03:27, 1.81it/s, loss=0.0424, lr=0.0004]
Steps: 47%|████▋ | 327/700 [03:04<03:27, 1.80it/s, loss=0.0424, lr=0.0004]
Steps: 47%|████▋ | 327/700 [03:04<03:27, 1.80it/s, loss=0.00866, lr=0.0004]
Steps: 47%|████▋ | 328/700 [03:05<03:29, 1.78it/s, loss=0.00866, lr=0.0004]
Steps: 47%|████▋ | 328/700 [03:05<03:29, 1.78it/s, loss=0.0145, lr=0.0004]
Steps: 47%|████▋ | 329/700 [03:05<03:27, 1.79it/s, loss=0.0145, lr=0.0004]
Steps: 47%|████▋ | 329/700 [03:05<03:27, 1.79it/s, loss=0.0291, lr=0.0004]
Steps: 47%|████▋ | 330/700 [03:06<03:27, 1.79it/s, loss=0.0291, lr=0.0004]
Steps: 47%|████▋ | 330/700 [03:06<03:27, 1.79it/s, loss=0.112, lr=0.0004]
Steps: 47%|████▋ | 331/700 [03:06<03:27, 1.78it/s, loss=0.112, lr=0.0004]
Steps: 47%|████▋ | 331/700 [03:06<03:27, 1.78it/s, loss=0.0583, lr=0.0004]
Steps: 47%|████▋ | 332/700 [03:07<03:29, 1.76it/s, loss=0.0583, lr=0.0004]
Steps: 47%|████▋ | 332/700 [03:07<03:29, 1.76it/s, loss=0.0574, lr=0.0004]
Steps: 48%|████▊ | 333/700 [03:07<03:29, 1.75it/s, loss=0.0574, lr=0.0004]
Steps: 48%|████▊ | 333/700 [03:07<03:29, 1.75it/s, loss=0.00921, lr=0.0004]
Steps: 48%|████▊ | 334/700 [03:08<03:21, 1.82it/s, loss=0.00921, lr=0.0004]
Steps: 48%|████▊ | 334/700 [03:08<03:21, 1.82it/s, loss=0.0178, lr=0.0004]
Steps: 48%|████▊ | 335/700 [03:08<03:19, 1.83it/s, loss=0.0178, lr=0.0004]
Steps: 48%|████▊ | 335/700 [03:08<03:19, 1.83it/s, loss=0.0147, lr=0.0004]
Steps: 48%|████▊ | 336/700 [03:09<03:21, 1.80it/s, loss=0.0147, lr=0.0004]
Steps: 48%|████▊ | 336/700 [03:09<03:21, 1.80it/s, loss=0.0233, lr=0.0004]
Steps: 48%|████▊ | 337/700 [03:10<03:22, 1.80it/s, loss=0.0233, lr=0.0004]
Steps: 48%|████▊ | 337/700 [03:10<03:22, 1.80it/s, loss=0.0265, lr=0.0004]
Steps: 48%|████▊ | 338/700 [03:10<03:22, 1.79it/s, loss=0.0265, lr=0.0004]
Steps: 48%|████▊ | 338/700 [03:10<03:22, 1.79it/s, loss=0.0103, lr=0.0004]
Steps: 48%|████▊ | 339/700 [03:11<03:26, 1.75it/s, loss=0.0103, lr=0.0004]
Steps: 48%|████▊ | 339/700 [03:11<03:26, 1.75it/s, loss=0.00171, lr=0.0004]
Steps: 49%|████▊ | 340/700 [03:11<03:23, 1.77it/s, loss=0.00171, lr=0.0004]
Steps: 49%|████▊ | 340/700 [03:11<03:23, 1.77it/s, loss=0.226, lr=0.0004]
Steps: 49%|████▊ | 341/700 [03:12<03:18, 1.80it/s, loss=0.226, lr=0.0004]
Steps: 49%|████▊ | 341/700 [03:12<03:18, 1.80it/s, loss=0.0407, lr=0.0004]
Steps: 49%|████▉ | 342/700 [03:12<03:13, 1.85it/s, loss=0.0407, lr=0.0004]
Steps: 49%|████▉ | 342/700 [03:12<03:13, 1.85it/s, loss=0.0194, lr=0.0004]
Steps: 49%|████▉ | 343/700 [03:13<03:08, 1.89it/s, loss=0.0194, lr=0.0004]
Steps: 49%|████▉ | 343/700 [03:13<03:08, 1.89it/s, loss=0.00992, lr=0.0004]
Steps: 49%|████▉ | 344/700 [03:13<03:06, 1.90it/s, loss=0.00992, lr=0.0004]
Steps: 49%|████▉ | 344/700 [03:13<03:06, 1.90it/s, loss=0.0107, lr=0.0004]
Steps: 49%|████▉ | 345/700 [03:14<03:03, 1.93it/s, loss=0.0107, lr=0.0004]
Steps: 49%|████▉ | 345/700 [03:14<03:03, 1.93it/s, loss=0.028, lr=0.0004]
Steps: 49%|████▉ | 346/700 [03:14<03:02, 1.93it/s, loss=0.028, lr=0.0004]
Steps: 49%|████▉ | 346/700 [03:14<03:02, 1.93it/s, loss=0.00153, lr=0.0004]
Steps: 50%|████▉ | 347/700 [03:15<03:04, 1.91it/s, loss=0.00153, lr=0.0004]
Steps: 50%|████▉ | 347/700 [03:15<03:04, 1.91it/s, loss=0.0558, lr=0.0004]
Steps: 50%|████▉ | 348/700 [03:15<03:07, 1.88it/s, loss=0.0558, lr=0.0004]
Steps: 50%|████▉ | 348/700 [03:15<03:07, 1.88it/s, loss=0.0713, lr=0.0004]
Steps: 50%|████▉ | 349/700 [03:16<03:08, 1.86it/s, loss=0.0713, lr=0.0004]
Steps: 50%|████▉ | 349/700 [03:16<03:08, 1.86it/s, loss=0.0164, lr=0.0004]
Steps: 50%|█████ | 350/700 [03:17<03:10, 1.83it/s, loss=0.0164, lr=0.0004]
Steps: 50%|█████ | 350/700 [03:17<03:10, 1.83it/s, loss=0.243, lr=0.0004]
Steps: 50%|█████ | 351/700 [03:17<03:12, 1.82it/s, loss=0.243, lr=0.0004]
Steps: 50%|█████ | 351/700 [03:17<03:12, 1.82it/s, loss=0.0152, lr=0.0004]
Steps: 50%|█████ | 352/700 [03:18<03:11, 1.82it/s, loss=0.0152, lr=0.0004]
Steps: 50%|█████ | 352/700 [03:18<03:11, 1.82it/s, loss=0.0497, lr=0.0004]
Steps: 50%|█████ | 353/700 [03:18<03:10, 1.82it/s, loss=0.0497, lr=0.0004]
Steps: 50%|█████ | 353/700 [03:18<03:10, 1.82it/s, loss=0.0611, lr=0.0004]
Steps: 51%|█████ | 354/700 [03:19<03:07, 1.84it/s, loss=0.0611, lr=0.0004]
Steps: 51%|█████ | 354/700 [03:19<03:07, 1.84it/s, loss=0.0738, lr=0.0004]
Steps: 51%|█████ | 355/700 [03:19<03:04, 1.87it/s, loss=0.0738, lr=0.0004]
Steps: 51%|█████ | 355/700 [03:19<03:04, 1.87it/s, loss=0.00715, lr=0.0004]
Steps: 51%|█████ | 356/700 [03:20<03:06, 1.84it/s, loss=0.00715, lr=0.0004]
Steps: 51%|█████ | 356/700 [03:20<03:06, 1.84it/s, loss=0.0472, lr=0.0004]
Steps: 51%|█████ | 357/700 [03:20<03:06, 1.84it/s, loss=0.0472, lr=0.0004]
Steps: 51%|█████ | 357/700 [03:20<03:06, 1.84it/s, loss=0.0275, lr=0.0004]
Steps: 51%|█████ | 358/700 [03:21<03:06, 1.83it/s, loss=0.0275, lr=0.0004]
Steps: 51%|█████ | 358/700 [03:21<03:06, 1.83it/s, loss=0.111, lr=0.0004]
Steps: 51%|█████▏ | 359/700 [03:22<03:05, 1.84it/s, loss=0.111, lr=0.0004]
Steps: 51%|█████▏ | 359/700 [03:22<03:05, 1.84it/s, loss=0.0267, lr=0.0004]
Steps: 51%|█████▏ | 360/700 [03:22<03:07, 1.82it/s, loss=0.0267, lr=0.0004]
Steps: 51%|█████▏ | 360/700 [03:22<03:07, 1.82it/s, loss=0.0598, lr=0.0004]
Steps: 52%|█████▏ | 361/700 [03:23<03:07, 1.81it/s, loss=0.0598, lr=0.0004]
Steps: 52%|█████▏ | 361/700 [03:23<03:07, 1.81it/s, loss=0.0234, lr=0.0004]
Steps: 52%|█████▏ | 362/700 [03:23<03:08, 1.79it/s, loss=0.0234, lr=0.0004]
Steps: 52%|█████▏ | 362/700 [03:23<03:08, 1.79it/s, loss=0.00394, lr=0.0004]
Steps: 52%|█████▏ | 363/700 [03:24<03:07, 1.80it/s, loss=0.00394, lr=0.0004]
Steps: 52%|█████▏ | 363/700 [03:24<03:07, 1.80it/s, loss=0.103, lr=0.0004]
Steps: 52%|█████▏ | 364/700 [03:24<03:07, 1.80it/s, loss=0.103, lr=0.0004]
Steps: 52%|█████▏ | 364/700 [03:24<03:07, 1.80it/s, loss=0.0446, lr=0.0004]
Steps: 52%|█████▏ | 365/700 [03:25<03:07, 1.79it/s, loss=0.0446, lr=0.0004]
Steps: 52%|█████▏ | 365/700 [03:25<03:07, 1.79it/s, loss=0.0886, lr=0.0004]
Steps: 52%|█████▏ | 366/700 [03:25<03:03, 1.82it/s, loss=0.0886, lr=0.0004]
Steps: 52%|█████▏ | 366/700 [03:25<03:03, 1.82it/s, loss=0.00974, lr=0.0004]
Steps: 52%|█████▏ | 367/700 [03:26<02:57, 1.87it/s, loss=0.00974, lr=0.0004]
Steps: 52%|█████▏ | 367/700 [03:26<02:57, 1.87it/s, loss=0.0581, lr=0.0004]
Steps: 53%|█████▎ | 368/700 [03:26<02:55, 1.89it/s, loss=0.0581, lr=0.0004]
Steps: 53%|█████▎ | 368/700 [03:26<02:55, 1.89it/s, loss=0.0141, lr=0.0004]
Steps: 53%|█████▎ | 369/700 [03:27<02:53, 1.91it/s, loss=0.0141, lr=0.0004]
Steps: 53%|█████▎ | 369/700 [03:27<02:53, 1.91it/s, loss=0.108, lr=0.0004]
Steps: 53%|█████▎ | 370/700 [03:27<02:51, 1.93it/s, loss=0.108, lr=0.0004]
Steps: 53%|█████▎ | 370/700 [03:27<02:51, 1.93it/s, loss=0.0274, lr=0.0004]
Steps: 53%|█████▎ | 371/700 [03:28<02:53, 1.90it/s, loss=0.0274, lr=0.0004]
Steps: 53%|█████▎ | 371/700 [03:28<02:53, 1.90it/s, loss=0.0238, lr=0.0004]
Steps: 53%|█████▎ | 372/700 [03:29<02:55, 1.87it/s, loss=0.0238, lr=0.0004]
Steps: 53%|█████▎ | 372/700 [03:29<02:55, 1.87it/s, loss=0.0135, lr=0.0004]
Steps: 53%|█████▎ | 373/700 [03:29<02:56, 1.85it/s, loss=0.0135, lr=0.0004]
Steps: 53%|█████▎ | 373/700 [03:29<02:56, 1.85it/s, loss=0.0273, lr=0.0004]
Steps: 53%|█████▎ | 374/700 [03:30<02:57, 1.84it/s, loss=0.0273, lr=0.0004]
Steps: 53%|█████▎ | 374/700 [03:30<02:57, 1.84it/s, loss=0.0107, lr=0.0004]
Steps: 54%|█████▎ | 375/700 [03:30<02:58, 1.82it/s, loss=0.0107, lr=0.0004]
Steps: 54%|█████▎ | 375/700 [03:30<02:58, 1.82it/s, loss=0.117, lr=0.0004]
Steps: 54%|█████▎ | 376/700 [03:31<02:57, 1.82it/s, loss=0.117, lr=0.0004]
Steps: 54%|█████▎ | 376/700 [03:31<02:57, 1.82it/s, loss=0.00753, lr=0.0004]
Steps: 54%|█████▍ | 377/700 [03:31<02:53, 1.86it/s, loss=0.00753, lr=0.0004]
Steps: 54%|█████▍ | 377/700 [03:31<02:53, 1.86it/s, loss=0.00374, lr=0.0004]
Steps: 54%|█████▍ | 378/700 [03:32<02:50, 1.89it/s, loss=0.00374, lr=0.0004]
Steps: 54%|█████▍ | 378/700 [03:32<02:50, 1.89it/s, loss=0.00199, lr=0.0004]
Steps: 54%|█████▍ | 379/700 [03:32<02:49, 1.89it/s, loss=0.00199, lr=0.0004]
Steps: 54%|█████▍ | 379/700 [03:32<02:49, 1.89it/s, loss=0.0103, lr=0.0004]
Steps: 54%|█████▍ | 380/700 [03:33<02:50, 1.88it/s, loss=0.0103, lr=0.0004]
Steps: 54%|█████▍ | 380/700 [03:33<02:50, 1.88it/s, loss=0.0585, lr=0.0004]
Steps: 54%|█████▍ | 381/700 [03:33<02:50, 1.87it/s, loss=0.0585, lr=0.0004]
Steps: 54%|█████▍ | 381/700 [03:33<02:50, 1.87it/s, loss=0.00844, lr=0.0004]
Steps: 55%|█████▍ | 382/700 [03:34<02:50, 1.87it/s, loss=0.00844, lr=0.0004]
Steps: 55%|█████▍ | 382/700 [03:34<02:50, 1.87it/s, loss=0.0385, lr=0.0004]
Steps: 55%|█████▍ | 383/700 [03:34<02:46, 1.90it/s, loss=0.0385, lr=0.0004]
Steps: 55%|█████▍ | 383/700 [03:34<02:46, 1.90it/s, loss=0.0191, lr=0.0004]
Steps: 55%|█████▍ | 384/700 [03:35<02:46, 1.90it/s, loss=0.0191, lr=0.0004]
Steps: 55%|█████▍ | 384/700 [03:35<02:46, 1.90it/s, loss=0.00918, lr=0.0004]
Steps: 55%|█████▌ | 385/700 [03:35<02:47, 1.88it/s, loss=0.00918, lr=0.0004]
Steps: 55%|█████▌ | 385/700 [03:36<02:47, 1.88it/s, loss=0.0416, lr=0.0004]
Steps: 55%|█████▌ | 386/700 [03:36<02:50, 1.84it/s, loss=0.0416, lr=0.0004]
Steps: 55%|█████▌ | 386/700 [03:36<02:50, 1.84it/s, loss=0.0671, lr=0.0004]
Steps: 55%|█████▌ | 387/700 [03:37<02:50, 1.84it/s, loss=0.0671, lr=0.0004]
Steps: 55%|█████▌ | 387/700 [03:37<02:50, 1.84it/s, loss=0.0628, lr=0.0004]
Steps: 55%|█████▌ | 388/700 [03:37<02:50, 1.83it/s, loss=0.0628, lr=0.0004]
Steps: 55%|█████▌ | 388/700 [03:37<02:50, 1.83it/s, loss=0.00164, lr=0.0004]
Steps: 56%|█████▌ | 389/700 [03:38<02:50, 1.83it/s, loss=0.00164, lr=0.0004]
Steps: 56%|█████▌ | 389/700 [03:38<02:50, 1.83it/s, loss=0.0177, lr=0.0004]
Steps: 56%|█████▌ | 390/700 [03:38<02:49, 1.83it/s, loss=0.0177, lr=0.0004]
Steps: 56%|█████▌ | 390/700 [03:38<02:49, 1.83it/s, loss=0.0583, lr=0.0004]
Steps: 56%|█████▌ | 391/700 [03:39<02:54, 1.77it/s, loss=0.0583, lr=0.0004]
Steps: 56%|█████▌ | 391/700 [03:39<02:54, 1.77it/s, loss=0.0428, lr=0.0004]
Steps: 56%|█████▌ | 392/700 [03:39<02:55, 1.76it/s, loss=0.0428, lr=0.0004]
Steps: 56%|█████▌ | 392/700 [03:39<02:55, 1.76it/s, loss=0.01, lr=0.0004]
Steps: 56%|█████▌ | 393/700 [03:40<02:55, 1.75it/s, loss=0.01, lr=0.0004]
Steps: 56%|█████▌ | 393/700 [03:40<02:55, 1.75it/s, loss=0.0341, lr=0.0004]
Steps: 56%|█████▋ | 394/700 [03:41<02:54, 1.76it/s, loss=0.0341, lr=0.0004]
Steps: 56%|█████▋ | 394/700 [03:41<02:54, 1.76it/s, loss=0.104, lr=0.0004]
Steps: 56%|█████▋ | 395/700 [03:41<02:54, 1.75it/s, loss=0.104, lr=0.0004]
Steps: 56%|█████▋ | 395/700 [03:41<02:54, 1.75it/s, loss=0.00275, lr=0.0004]
Steps: 57%|█████▋ | 396/700 [03:42<02:55, 1.74it/s, loss=0.00275, lr=0.0004]
Steps: 57%|█████▋ | 396/700 [03:42<02:55, 1.74it/s, loss=0.0398, lr=0.0004]
Steps: 57%|█████▋ | 397/700 [03:42<02:57, 1.71it/s, loss=0.0398, lr=0.0004]
Steps: 57%|█████▋ | 397/700 [03:42<02:57, 1.71it/s, loss=0.0031, lr=0.0004]
Steps: 57%|█████▋ | 398/700 [03:43<02:53, 1.74it/s, loss=0.0031, lr=0.0004]
Steps: 57%|█████▋ | 398/700 [03:43<02:53, 1.74it/s, loss=0.00922, lr=0.0004]
Steps: 57%|█████▋ | 399/700 [03:43<02:51, 1.76it/s, loss=0.00922, lr=0.0004]
Steps: 57%|█████▋ | 399/700 [03:43<02:51, 1.76it/s, loss=0.0128, lr=0.0004]
Steps: 57%|█████▋ | 400/700 [03:44<02:48, 1.79it/s, loss=0.0128, lr=0.0004]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')
Saving weights to checkpoints/step_400.safetensors
LORA Unet Moved 0.0015479204012081027
LORA CLIP Moved 6.280629168031737e-05
Steps: 57%|█████▋ | 400/700 [03:44<02:48, 1.79it/s, loss=0.00486, lr=0.0004]
Steps: 57%|█████▋ | 401/700 [03:45<02:59, 1.67it/s, loss=0.00486, lr=0.0004]
Steps: 57%|█████▋ | 401/700 [03:45<02:59, 1.67it/s, loss=0.0242, lr=0.0004]
Steps: 57%|█████▋ | 402/700 [03:45<02:54, 1.70it/s, loss=0.0242, lr=0.0004]
Steps: 57%|█████▋ | 402/700 [03:45<02:54, 1.70it/s, loss=0.0114, lr=0.0004]
Steps: 58%|█████▊ | 403/700 [03:46<02:49, 1.75it/s, loss=0.0114, lr=0.0004]
Steps: 58%|█████▊ | 403/700 [03:46<02:49, 1.75it/s, loss=0.101, lr=0.0004]
Steps: 58%|█████▊ | 404/700 [03:46<02:44, 1.80it/s, loss=0.101, lr=0.0004]
Steps: 58%|█████▊ | 404/700 [03:46<02:44, 1.80it/s, loss=0.0565, lr=0.0004]
Steps: 58%|█████▊ | 405/700 [03:47<02:39, 1.85it/s, loss=0.0565, lr=0.0004]
Steps: 58%|█████▊ | 405/700 [03:47<02:39, 1.85it/s, loss=0.0139, lr=0.0004]
Steps: 58%|█████▊ | 406/700 [03:47<02:37, 1.86it/s, loss=0.0139, lr=0.0004]
Steps: 58%|█████▊ | 406/700 [03:47<02:37, 1.86it/s, loss=0.00395, lr=0.0004]
Steps: 58%|█████▊ | 407/700 [03:48<02:34, 1.89it/s, loss=0.00395, lr=0.0004]
Steps: 58%|█████▊ | 407/700 [03:48<02:34, 1.89it/s, loss=0.00693, lr=0.0004]
Steps: 58%|█████▊ | 408/700 [03:48<02:34, 1.89it/s, loss=0.00693, lr=0.0004]
Steps: 58%|█████▊ | 408/700 [03:48<02:34, 1.89it/s, loss=0.0185, lr=0.0004]
Steps: 58%|█████▊ | 409/700 [03:49<02:36, 1.85it/s, loss=0.0185, lr=0.0004]
Steps: 58%|█████▊ | 409/700 [03:49<02:36, 1.85it/s, loss=0.0226, lr=0.0004]
Steps: 59%|█████▊ | 410/700 [03:49<02:37, 1.84it/s, loss=0.0226, lr=0.0004]
Steps: 59%|█████▊ | 410/700 [03:49<02:37, 1.84it/s, loss=0.0122, lr=0.0004]
Steps: 59%|█████▊ | 411/700 [03:50<02:37, 1.83it/s, loss=0.0122, lr=0.0004]
Steps: 59%|█████▊ | 411/700 [03:50<02:37, 1.83it/s, loss=0.00795, lr=0.0004]
Steps: 59%|█████▉ | 412/700 [03:51<02:38, 1.82it/s, loss=0.00795, lr=0.0004]
Steps: 59%|█████▉ | 412/700 [03:51<02:38, 1.82it/s, loss=0.00217, lr=0.0004]
Steps: 59%|█████▉ | 413/700 [03:51<02:39, 1.80it/s, loss=0.00217, lr=0.0004]
Steps: 59%|█████▉ | 413/700 [03:51<02:39, 1.80it/s, loss=0.0183, lr=0.0004]
Steps: 59%|█████▉ | 414/700 [03:52<02:37, 1.82it/s, loss=0.0183, lr=0.0004]
Steps: 59%|█████▉ | 414/700 [03:52<02:37, 1.82it/s, loss=0.0149, lr=0.0004]
Steps: 59%|█████▉ | 415/700 [03:52<02:32, 1.87it/s, loss=0.0149, lr=0.0004]
Steps: 59%|█████▉ | 415/700 [03:52<02:32, 1.87it/s, loss=0.00353, lr=0.0004]
Steps: 59%|█████▉ | 416/700 [03:53<02:29, 1.90it/s, loss=0.00353, lr=0.0004]
Steps: 59%|█████▉ | 416/700 [03:53<02:29, 1.90it/s, loss=0.0368, lr=0.0004]
Steps: 60%|█████▉ | 417/700 [03:53<02:31, 1.87it/s, loss=0.0368, lr=0.0004]
Steps: 60%|█████▉ | 417/700 [03:53<02:31, 1.87it/s, loss=0.00279, lr=0.0004]
Steps: 60%|█████▉ | 418/700 [03:54<02:34, 1.83it/s, loss=0.00279, lr=0.0004]
Steps: 60%|█████▉ | 418/700 [03:54<02:34, 1.83it/s, loss=0.01, lr=0.0004]
Steps: 60%|█████▉ | 419/700 [03:54<02:34, 1.82it/s, loss=0.01, lr=0.0004]
Steps: 60%|█████▉ | 419/700 [03:54<02:34, 1.82it/s, loss=0.00632, lr=0.0004]
Steps: 60%|██████ | 420/700 [03:55<02:34, 1.81it/s, loss=0.00632, lr=0.0004]
Steps: 60%|██████ | 420/700 [03:55<02:34, 1.81it/s, loss=0.178, lr=0.0004]
Steps: 60%|██████ | 421/700 [03:55<02:31, 1.85it/s, loss=0.178, lr=0.0004]
Steps: 60%|██████ | 421/700 [03:55<02:31, 1.85it/s, loss=0.00584, lr=0.0004]
Steps: 60%|██████ | 422/700 [03:56<02:29, 1.85it/s, loss=0.00584, lr=0.0004]
Steps: 60%|██████ | 422/700 [03:56<02:29, 1.85it/s, loss=0.0698, lr=0.0004]
Steps: 60%|██████ | 423/700 [03:57<02:28, 1.87it/s, loss=0.0698, lr=0.0004]
Steps: 60%|██████ | 423/700 [03:57<02:28, 1.87it/s, loss=0.0128, lr=0.0004]
Steps: 61%|██████ | 424/700 [03:57<02:28, 1.86it/s, loss=0.0128, lr=0.0004]
Steps: 61%|██████ | 424/700 [03:57<02:28, 1.86it/s, loss=0.0616, lr=0.0004]
Steps: 61%|██████ | 425/700 [03:58<02:29, 1.84it/s, loss=0.0616, lr=0.0004]
Steps: 61%|██████ | 425/700 [03:58<02:29, 1.84it/s, loss=0.0102, lr=0.0004]
Steps: 61%|██████ | 426/700 [03:58<02:34, 1.77it/s, loss=0.0102, lr=0.0004]
Steps: 61%|██████ | 426/700 [03:58<02:34, 1.77it/s, loss=0.00736, lr=0.0004]
Steps: 61%|██████ | 427/700 [03:59<02:40, 1.70it/s, loss=0.00736, lr=0.0004]
Steps: 61%|██████ | 427/700 [03:59<02:40, 1.70it/s, loss=0.0113, lr=0.0004]
Steps: 61%|██████ | 428/700 [04:00<02:43, 1.67it/s, loss=0.0113, lr=0.0004]
Steps: 61%|██████ | 428/700 [04:00<02:43, 1.67it/s, loss=0.00517, lr=0.0004]
Steps: 61%|██████▏ | 429/700 [04:00<02:37, 1.72it/s, loss=0.00517, lr=0.0004]
Steps: 61%|██████▏ | 429/700 [04:00<02:37, 1.72it/s, loss=0.032, lr=0.0004]
Steps: 61%|██████▏ | 430/700 [04:01<02:33, 1.76it/s, loss=0.032, lr=0.0004]
Steps: 61%|██████▏ | 430/700 [04:01<02:33, 1.76it/s, loss=0.0133, lr=0.0004]
Steps: 62%|██████▏ | 431/700 [04:01<02:32, 1.77it/s, loss=0.0133, lr=0.0004]
Steps: 62%|██████▏ | 431/700 [04:01<02:32, 1.77it/s, loss=0.0429, lr=0.0004]
Steps: 62%|██████▏ | 432/700 [04:02<02:29, 1.79it/s, loss=0.0429, lr=0.0004]
Steps: 62%|██████▏ | 432/700 [04:02<02:29, 1.79it/s, loss=0.00896, lr=0.0004]
Steps: 62%|██████▏ | 433/700 [04:02<02:30, 1.78it/s, loss=0.00896, lr=0.0004]
Steps: 62%|██████▏ | 433/700 [04:02<02:30, 1.78it/s, loss=0.072, lr=0.0004]
Steps: 62%|██████▏ | 434/700 [04:03<02:28, 1.79it/s, loss=0.072, lr=0.0004]
Steps: 62%|██████▏ | 434/700 [04:03<02:28, 1.79it/s, loss=0.011, lr=0.0004]
Steps: 62%|██████▏ | 435/700 [04:03<02:26, 1.81it/s, loss=0.011, lr=0.0004]
Steps: 62%|██████▏ | 435/700 [04:03<02:26, 1.81it/s, loss=0.116, lr=0.0004]
Steps: 62%|██████▏ | 436/700 [04:04<02:25, 1.81it/s, loss=0.116, lr=0.0004]
Steps: 62%|██████▏ | 436/700 [04:04<02:25, 1.81it/s, loss=0.00514, lr=0.0004]
Steps: 62%|██████▏ | 437/700 [04:04<02:25, 1.81it/s, loss=0.00514, lr=0.0004]
Steps: 62%|██████▏ | 437/700 [04:04<02:25, 1.81it/s, loss=0.0137, lr=0.0004]
Steps: 63%|██████▎ | 438/700 [04:05<02:24, 1.81it/s, loss=0.0137, lr=0.0004]
Steps: 63%|██████▎ | 438/700 [04:05<02:24, 1.81it/s, loss=0.00167, lr=0.0004]
Steps: 63%|██████▎ | 439/700 [04:06<02:22, 1.84it/s, loss=0.00167, lr=0.0004]
Steps: 63%|██████▎ | 439/700 [04:06<02:22, 1.84it/s, loss=0.0108, lr=0.0004]
Steps: 63%|██████▎ | 440/700 [04:06<02:21, 1.84it/s, loss=0.0108, lr=0.0004]
Steps: 63%|██████▎ | 440/700 [04:06<02:21, 1.84it/s, loss=0.0135, lr=0.0004]
Steps: 63%|██████▎ | 441/700 [04:07<02:21, 1.83it/s, loss=0.0135, lr=0.0004]
Steps: 63%|██████▎ | 441/700 [04:07<02:21, 1.83it/s, loss=0.0287, lr=0.0004]
Steps: 63%|██████▎ | 442/700 [04:07<02:21, 1.83it/s, loss=0.0287, lr=0.0004]
Steps: 63%|██████▎ | 442/700 [04:07<02:21, 1.83it/s, loss=0.0146, lr=0.0004]
Steps: 63%|██████▎ | 443/700 [04:08<02:20, 1.83it/s, loss=0.0146, lr=0.0004]
Steps: 63%|██████▎ | 443/700 [04:08<02:20, 1.83it/s, loss=0.216, lr=0.0004]
Steps: 63%|██████▎ | 444/700 [04:08<02:19, 1.83it/s, loss=0.216, lr=0.0004]
Steps: 63%|██████▎ | 444/700 [04:08<02:19, 1.83it/s, loss=0.0454, lr=0.0004]
Steps: 64%|██████▎ | 445/700 [04:09<02:18, 1.84it/s, loss=0.0454, lr=0.0004]
Steps: 64%|██████▎ | 445/700 [04:09<02:18, 1.84it/s, loss=0.0396, lr=0.0004]
Steps: 64%|██████▎ | 446/700 [04:09<02:16, 1.86it/s, loss=0.0396, lr=0.0004]
Steps: 64%|██████▎ | 446/700 [04:09<02:16, 1.86it/s, loss=0.0378, lr=0.0004]
Steps: 64%|██████▍ | 447/700 [04:10<02:15, 1.86it/s, loss=0.0378, lr=0.0004]
Steps: 64%|██████▍ | 447/700 [04:10<02:15, 1.86it/s, loss=0.0112, lr=0.0004]
Steps: 64%|██████▍ | 448/700 [04:10<02:16, 1.85it/s, loss=0.0112, lr=0.0004]
Steps: 64%|██████▍ | 448/700 [04:10<02:16, 1.85it/s, loss=0.0411, lr=0.0004]
Steps: 64%|██████▍ | 449/700 [04:11<02:16, 1.83it/s, loss=0.0411, lr=0.0004]
Steps: 64%|██████▍ | 449/700 [04:11<02:16, 1.83it/s, loss=0.0222, lr=0.0004]
Steps: 64%|██████▍ | 450/700 [04:12<02:16, 1.83it/s, loss=0.0222, lr=0.0004]
Steps: 64%|██████▍ | 450/700 [04:12<02:16, 1.83it/s, loss=0.0735, lr=0.0004]
Steps: 64%|██████▍ | 451/700 [04:12<02:15, 1.84it/s, loss=0.0735, lr=0.0004]
Steps: 64%|██████▍ | 451/700 [04:12<02:15, 1.84it/s, loss=0.0261, lr=0.0004]
Steps: 65%|██████▍ | 452/700 [04:13<02:15, 1.83it/s, loss=0.0261, lr=0.0004]
Steps: 65%|██████▍ | 452/700 [04:13<02:15, 1.83it/s, loss=0.0861, lr=0.0004]
Steps: 65%|██████▍ | 453/700 [04:13<02:15, 1.82it/s, loss=0.0861, lr=0.0004]
Steps: 65%|██████▍ | 453/700 [04:13<02:15, 1.82it/s, loss=0.148, lr=0.0004]
Steps: 65%|██████▍ | 454/700 [04:14<02:16, 1.81it/s, loss=0.148, lr=0.0004]
Steps: 65%|██████▍ | 454/700 [04:14<02:16, 1.81it/s, loss=0.0519, lr=0.0004]
Steps: 65%|██████▌ | 455/700 [04:14<02:15, 1.80it/s, loss=0.0519, lr=0.0004]
Steps: 65%|██████▌ | 455/700 [04:14<02:15, 1.80it/s, loss=0.0917, lr=0.0004]
Steps: 65%|██████▌ | 456/700 [04:15<02:16, 1.79it/s, loss=0.0917, lr=0.0004]
Steps: 65%|██████▌ | 456/700 [04:15<02:16, 1.79it/s, loss=0.00812, lr=0.0004]
Steps: 65%|██████▌ | 457/700 [04:15<02:14, 1.81it/s, loss=0.00812, lr=0.0004]
Steps: 65%|██████▌ | 457/700 [04:15<02:14, 1.81it/s, loss=0.0117, lr=0.0004]
Steps: 65%|██████▌ | 458/700 [04:16<02:12, 1.82it/s, loss=0.0117, lr=0.0004]
Steps: 65%|██████▌ | 458/700 [04:16<02:12, 1.82it/s, loss=0.0597, lr=0.0004]
Steps: 66%|██████▌ | 459/700 [04:16<02:12, 1.82it/s, loss=0.0597, lr=0.0004]
Steps: 66%|██████▌ | 459/700 [04:16<02:12, 1.82it/s, loss=0.0163, lr=0.0004]
Steps: 66%|██████▌ | 460/700 [04:17<02:11, 1.82it/s, loss=0.0163, lr=0.0004]
Steps: 66%|██████▌ | 460/700 [04:17<02:11, 1.82it/s, loss=0.0808, lr=0.0004]
Steps: 66%|██████▌ | 461/700 [04:18<02:10, 1.83it/s, loss=0.0808, lr=0.0004]
Steps: 66%|██████▌ | 461/700 [04:18<02:10, 1.83it/s, loss=0.0125, lr=0.0004]
Steps: 66%|██████▌ | 462/700 [04:18<02:10, 1.83it/s, loss=0.0125, lr=0.0004]
Steps: 66%|██████▌ | 462/700 [04:18<02:10, 1.83it/s, loss=0.00627, lr=0.0004]
Steps: 66%|██████▌ | 463/700 [04:19<02:09, 1.83it/s, loss=0.00627, lr=0.0004]
Steps: 66%|██████▌ | 463/700 [04:19<02:09, 1.83it/s, loss=0.049, lr=0.0004]
Steps: 66%|██████▋ | 464/700 [04:19<02:09, 1.83it/s, loss=0.049, lr=0.0004]
Steps: 66%|██████▋ | 464/700 [04:19<02:09, 1.83it/s, loss=0.0678, lr=0.0004]
Steps: 66%|██████▋ | 465/700 [04:20<02:09, 1.82it/s, loss=0.0678, lr=0.0004]
Steps: 66%|██████▋ | 465/700 [04:20<02:09, 1.82it/s, loss=0.035, lr=0.0004]
Steps: 67%|██████▋ | 466/700 [04:20<02:08, 1.82it/s, loss=0.035, lr=0.0004]
Steps: 67%|██████▋ | 466/700 [04:20<02:08, 1.82it/s, loss=0.131, lr=0.0004]
Steps: 67%|██████▋ | 467/700 [04:21<02:08, 1.82it/s, loss=0.131, lr=0.0004]
Steps: 67%|██████▋ | 467/700 [04:21<02:08, 1.82it/s, loss=0.277, lr=0.0004]
Steps: 67%|██████▋ | 468/700 [04:21<02:06, 1.83it/s, loss=0.277, lr=0.0004]
Steps: 67%|██████▋ | 468/700 [04:21<02:06, 1.83it/s, loss=0.0124, lr=0.0004]
Steps: 67%|██████▋ | 469/700 [04:22<02:06, 1.82it/s, loss=0.0124, lr=0.0004]
Steps: 67%|██████▋ | 469/700 [04:22<02:06, 1.82it/s, loss=0.0462, lr=0.0004]
Steps: 67%|██████▋ | 470/700 [04:23<02:07, 1.80it/s, loss=0.0462, lr=0.0004]
Steps: 67%|██████▋ | 470/700 [04:23<02:07, 1.80it/s, loss=0.0415, lr=0.0004]
Steps: 67%|██████▋ | 471/700 [04:23<02:05, 1.82it/s, loss=0.0415, lr=0.0004]
Steps: 67%|██████▋ | 471/700 [04:23<02:05, 1.82it/s, loss=0.169, lr=0.0004]
Steps: 67%|██████▋ | 472/700 [04:24<02:04, 1.83it/s, loss=0.169, lr=0.0004]
Steps: 67%|██████▋ | 472/700 [04:24<02:04, 1.83it/s, loss=0.0197, lr=0.0004]
Steps: 68%|██████▊ | 473/700 [04:24<02:04, 1.82it/s, loss=0.0197, lr=0.0004]
Steps: 68%|██████▊ | 473/700 [04:24<02:04, 1.82it/s, loss=0.0275, lr=0.0004]
Steps: 68%|██████▊ | 474/700 [04:25<02:03, 1.83it/s, loss=0.0275, lr=0.0004]
Steps: 68%|██████▊ | 474/700 [04:25<02:03, 1.83it/s, loss=0.00273, lr=0.0004]
Steps: 68%|██████▊ | 475/700 [04:25<02:04, 1.81it/s, loss=0.00273, lr=0.0004]
Steps: 68%|██████▊ | 475/700 [04:25<02:04, 1.81it/s, loss=0.0279, lr=0.0004]
Steps: 68%|██████▊ | 476/700 [04:26<02:04, 1.79it/s, loss=0.0279, lr=0.0004]
Steps: 68%|██████▊ | 476/700 [04:26<02:04, 1.79it/s, loss=0.133, lr=0.0004]
Steps: 68%|██████▊ | 477/700 [04:26<02:04, 1.79it/s, loss=0.133, lr=0.0004]
Steps: 68%|██████▊ | 477/700 [04:26<02:04, 1.79it/s, loss=0.00584, lr=0.0004]
Steps: 68%|██████▊ | 478/700 [04:27<02:03, 1.79it/s, loss=0.00584, lr=0.0004]
Steps: 68%|██████▊ | 478/700 [04:27<02:03, 1.79it/s, loss=0.0541, lr=0.0004]
Steps: 68%|██████▊ | 479/700 [04:28<02:03, 1.79it/s, loss=0.0541, lr=0.0004]
Steps: 68%|██████▊ | 479/700 [04:28<02:03, 1.79it/s, loss=0.0163, lr=0.0004]
Steps: 69%|██████▊ | 480/700 [04:28<02:03, 1.79it/s, loss=0.0163, lr=0.0004]
Steps: 69%|██████▊ | 480/700 [04:28<02:03, 1.79it/s, loss=0.00538, lr=0.0004]
Steps: 69%|██████▊ | 481/700 [04:29<02:01, 1.80it/s, loss=0.00538, lr=0.0004]
Steps: 69%|██████▊ | 481/700 [04:29<02:01, 1.80it/s, loss=0.00586, lr=0.0004]
Steps: 69%|██████▉ | 482/700 [04:29<02:00, 1.81it/s, loss=0.00586, lr=0.0004]
Steps: 69%|██████▉ | 482/700 [04:29<02:00, 1.81it/s, loss=0.0193, lr=0.0004]
Steps: 69%|██████▉ | 483/700 [04:30<02:00, 1.80it/s, loss=0.0193, lr=0.0004]
Steps: 69%|██████▉ | 483/700 [04:30<02:00, 1.80it/s, loss=0.00902, lr=0.0004]
Steps: 69%|██████▉ | 484/700 [04:30<02:03, 1.76it/s, loss=0.00902, lr=0.0004]
Steps: 69%|██████▉ | 484/700 [04:30<02:03, 1.76it/s, loss=0.386, lr=0.0004]
Steps: 69%|██████▉ | 485/700 [04:31<01:59, 1.80it/s, loss=0.386, lr=0.0004]
Steps: 69%|██████▉ | 485/700 [04:31<01:59, 1.80it/s, loss=0.00357, lr=0.0004]
Steps: 69%|██████▉ | 486/700 [04:31<01:56, 1.83it/s, loss=0.00357, lr=0.0004]
Steps: 69%|██████▉ | 486/700 [04:31<01:56, 1.83it/s, loss=0.0271, lr=0.0004]
Steps: 70%|██████▉ | 487/700 [04:32<01:57, 1.81it/s, loss=0.0271, lr=0.0004]
Steps: 70%|██████▉ | 487/700 [04:32<01:57, 1.81it/s, loss=0.122, lr=0.0004]
Steps: 70%|██████▉ | 488/700 [04:32<01:54, 1.85it/s, loss=0.122, lr=0.0004]
Steps: 70%|██████▉ | 488/700 [04:32<01:54, 1.85it/s, loss=0.0115, lr=0.0004]
Steps: 70%|██████▉ | 489/700 [04:33<01:52, 1.88it/s, loss=0.0115, lr=0.0004]
Steps: 70%|██████▉ | 489/700 [04:33<01:52, 1.88it/s, loss=0.0324, lr=0.0004]
Steps: 70%|███████ | 490/700 [04:34<01:50, 1.89it/s, loss=0.0324, lr=0.0004]
Steps: 70%|███████ | 490/700 [04:34<01:50, 1.89it/s, loss=0.00157, lr=0.0004]
Steps: 70%|███████ | 491/700 [04:34<01:49, 1.91it/s, loss=0.00157, lr=0.0004]
Steps: 70%|███████ | 491/700 [04:34<01:49, 1.91it/s, loss=0.014, lr=0.0004]
Steps: 70%|███████ | 492/700 [04:35<01:48, 1.92it/s, loss=0.014, lr=0.0004]
Steps: 70%|███████ | 492/700 [04:35<01:48, 1.92it/s, loss=0.0567, lr=0.0004]
Steps: 70%|███████ | 493/700 [04:35<01:49, 1.90it/s, loss=0.0567, lr=0.0004]
Steps: 70%|███████ | 493/700 [04:35<01:49, 1.90it/s, loss=0.046, lr=0.0004]
Steps: 71%|███████ | 494/700 [04:36<01:50, 1.87it/s, loss=0.046, lr=0.0004]
Steps: 71%|███████ | 494/700 [04:36<01:50, 1.87it/s, loss=0.0275, lr=0.0004]
Steps: 71%|███████ | 495/700 [04:36<01:52, 1.83it/s, loss=0.0275, lr=0.0004]
Steps: 71%|███████ | 495/700 [04:36<01:52, 1.83it/s, loss=0.00814, lr=0.0004]
Steps: 71%|███████ | 496/700 [04:37<01:51, 1.82it/s, loss=0.00814, lr=0.0004]
Steps: 71%|███████ | 496/700 [04:37<01:51, 1.82it/s, loss=0.00738, lr=0.0004]
Steps: 71%|███████ | 497/700 [04:37<01:51, 1.82it/s, loss=0.00738, lr=0.0004]
Steps: 71%|███████ | 497/700 [04:37<01:51, 1.82it/s, loss=0.00353, lr=0.0004]
Steps: 71%|███████ | 498/700 [04:38<01:49, 1.85it/s, loss=0.00353, lr=0.0004]
Steps: 71%|███████ | 498/700 [04:38<01:49, 1.85it/s, loss=0.0116, lr=0.0004]
Steps: 71%|███████▏ | 499/700 [04:38<01:47, 1.88it/s, loss=0.0116, lr=0.0004]
Steps: 71%|███████▏ | 499/700 [04:38<01:47, 1.88it/s, loss=0.133, lr=0.0004]
Steps: 71%|███████▏ | 500/700 [04:39<01:47, 1.86it/s, loss=0.133, lr=0.0004]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')
Saving weights to checkpoints/step_500.safetensors
LORA Unet Moved 0.0018108426593244076
LORA CLIP Moved 7.164952694438398e-05
Steps: 71%|███████▏ | 500/700 [04:39<01:47, 1.86it/s, loss=0.0136, lr=0.0004]
Steps: 72%|███████▏ | 501/700 [04:40<01:57, 1.70it/s, loss=0.0136, lr=0.0004]
Steps: 72%|███████▏ | 501/700 [04:40<01:57, 1.70it/s, loss=0.0168, lr=0.0004]
Steps: 72%|███████▏ | 502/700 [04:40<01:53, 1.74it/s, loss=0.0168, lr=0.0004]
Steps: 72%|███████▏ | 502/700 [04:40<01:53, 1.74it/s, loss=0.0313, lr=0.0004]
Steps: 72%|███████▏ | 503/700 [04:41<01:51, 1.76it/s, loss=0.0313, lr=0.0004]
Steps: 72%|███████▏ | 503/700 [04:41<01:51, 1.76it/s, loss=0.162, lr=0.0004]
Steps: 72%|███████▏ | 504/700 [04:41<01:49, 1.78it/s, loss=0.162, lr=0.0004]
Steps: 72%|███████▏ | 504/700 [04:41<01:49, 1.78it/s, loss=0.0117, lr=0.0004]
Steps: 72%|███████▏ | 505/700 [04:42<01:48, 1.80it/s, loss=0.0117, lr=0.0004]
Steps: 72%|███████▏ | 505/700 [04:42<01:48, 1.80it/s, loss=0.00169, lr=0.0004]
Steps: 72%|███████▏ | 506/700 [04:42<01:46, 1.81it/s, loss=0.00169, lr=0.0004]
Steps: 72%|███████▏ | 506/700 [04:42<01:46, 1.81it/s, loss=0.0182, lr=0.0004]
Steps: 72%|███████▏ | 507/700 [04:43<01:45, 1.83it/s, loss=0.0182, lr=0.0004]
Steps: 72%|███████▏ | 507/700 [04:43<01:45, 1.83it/s, loss=0.0245, lr=0.0004]
Steps: 73%|███████▎ | 508/700 [04:43<01:46, 1.81it/s, loss=0.0245, lr=0.0004]
Steps: 73%|███████▎ | 508/700 [04:43<01:46, 1.81it/s, loss=0.00677, lr=0.0004]
Steps: 73%|███████▎ | 509/700 [04:44<01:45, 1.81it/s, loss=0.00677, lr=0.0004]
Steps: 73%|███████▎ | 509/700 [04:44<01:45, 1.81it/s, loss=0.076, lr=0.0004]
Steps: 73%|███████▎ | 510/700 [04:45<01:44, 1.82it/s, loss=0.076, lr=0.0004]
Steps: 73%|███████▎ | 510/700 [04:45<01:44, 1.82it/s, loss=0.295, lr=0.0004]
Steps: 73%|███████▎ | 511/700 [04:45<01:44, 1.80it/s, loss=0.295, lr=0.0004]
Steps: 73%|███████▎ | 511/700 [04:45<01:44, 1.80it/s, loss=0.00341, lr=0.0004]
Steps: 73%|███████▎ | 512/700 [04:46<01:44, 1.80it/s, loss=0.00341, lr=0.0004]
Steps: 73%|███████▎ | 512/700 [04:46<01:44, 1.80it/s, loss=0.0115, lr=0.0004]
Steps: 73%|███████▎ | 513/700 [04:46<01:41, 1.84it/s, loss=0.0115, lr=0.0004]
Steps: 73%|███████▎ | 513/700 [04:46<01:41, 1.84it/s, loss=0.0503, lr=0.0004]
Steps: 73%|███████▎ | 514/700 [04:47<01:39, 1.86it/s, loss=0.0503, lr=0.0004]
Steps: 73%|███████▎ | 514/700 [04:47<01:39, 1.86it/s, loss=0.00832, lr=0.0004]
Steps: 74%|███████▎ | 515/700 [04:47<01:40, 1.84it/s, loss=0.00832, lr=0.0004]
Steps: 74%|███████▎ | 515/700 [04:47<01:40, 1.84it/s, loss=0.00209, lr=0.0004]
Steps: 74%|███████▎ | 516/700 [04:48<01:40, 1.83it/s, loss=0.00209, lr=0.0004]
Steps: 74%|███████▎ | 516/700 [04:48<01:40, 1.83it/s, loss=0.014, lr=0.0004]
Steps: 74%|███████▍ | 517/700 [04:48<01:40, 1.83it/s, loss=0.014, lr=0.0004]
Steps: 74%|███████▍ | 517/700 [04:48<01:40, 1.83it/s, loss=0.035, lr=0.0004]
Steps: 74%|███████▍ | 518/700 [04:49<01:40, 1.82it/s, loss=0.035, lr=0.0004]
Steps: 74%|███████▍ | 518/700 [04:49<01:40, 1.82it/s, loss=0.223, lr=0.0004]
Steps: 74%|███████▍ | 519/700 [04:49<01:39, 1.82it/s, loss=0.223, lr=0.0004]
Steps: 74%|███████▍ | 519/700 [04:49<01:39, 1.82it/s, loss=0.0441, lr=0.0004]
Steps: 74%|███████▍ | 520/700 [04:50<01:38, 1.83it/s, loss=0.0441, lr=0.0004]
Steps: 74%|███████▍ | 520/700 [04:50<01:38, 1.83it/s, loss=0.0202, lr=0.0004]
Steps: 74%|███████▍ | 521/700 [04:50<01:35, 1.88it/s, loss=0.0202, lr=0.0004]
Steps: 74%|███████▍ | 521/700 [04:50<01:35, 1.88it/s, loss=0.0171, lr=0.0004]
Steps: 75%|███████▍ | 522/700 [04:51<01:33, 1.90it/s, loss=0.0171, lr=0.0004]
Steps: 75%|███████▍ | 522/700 [04:51<01:33, 1.90it/s, loss=0.0126, lr=0.0004]
Steps: 75%|███████▍ | 523/700 [04:52<01:34, 1.87it/s, loss=0.0126, lr=0.0004]
Steps: 75%|███████▍ | 523/700 [04:52<01:34, 1.87it/s, loss=0.0803, lr=0.0004]
Steps: 75%|███████▍ | 524/700 [04:52<01:34, 1.87it/s, loss=0.0803, lr=0.0004]
Steps: 75%|███████▍ | 524/700 [04:52<01:34, 1.87it/s, loss=0.00485, lr=0.0004]
Steps: 75%|███████▌ | 525/700 [04:53<01:33, 1.87it/s, loss=0.00485, lr=0.0004]
Steps: 75%|███████▌ | 525/700 [04:53<01:33, 1.87it/s, loss=0.0205, lr=0.0004]
Steps: 75%|███████▌ | 526/700 [04:53<01:31, 1.90it/s, loss=0.0205, lr=0.0004]
Steps: 75%|███████▌ | 526/700 [04:53<01:31, 1.90it/s, loss=0.0313, lr=0.0004]
Steps: 75%|███████▌ | 527/700 [04:54<01:32, 1.88it/s, loss=0.0313, lr=0.0004]
Steps: 75%|███████▌ | 527/700 [04:54<01:32, 1.88it/s, loss=0.00287, lr=0.0004]
Steps: 75%|███████▌ | 528/700 [04:54<01:32, 1.85it/s, loss=0.00287, lr=0.0004]
Steps: 75%|███████▌ | 528/700 [04:54<01:32, 1.85it/s, loss=0.00346, lr=0.0004]
Steps: 76%|███████▌ | 529/700 [04:55<01:33, 1.83it/s, loss=0.00346, lr=0.0004]
Steps: 76%|███████▌ | 529/700 [04:55<01:33, 1.83it/s, loss=0.277, lr=0.0004]
Steps: 76%|███████▌ | 530/700 [04:55<01:31, 1.85it/s, loss=0.277, lr=0.0004]
Steps: 76%|███████▌ | 530/700 [04:55<01:31, 1.85it/s, loss=0.114, lr=0.0004]
Steps: 76%|███████▌ | 531/700 [04:56<01:31, 1.84it/s, loss=0.114, lr=0.0004]
Steps: 76%|███████▌ | 531/700 [04:56<01:31, 1.84it/s, loss=0.00907, lr=0.0004]
Steps: 76%|███████▌ | 532/700 [04:56<01:29, 1.88it/s, loss=0.00907, lr=0.0004]
Steps: 76%|███████▌ | 532/700 [04:56<01:29, 1.88it/s, loss=0.0188, lr=0.0004]
Steps: 76%|███████▌ | 533/700 [04:57<01:27, 1.92it/s, loss=0.0188, lr=0.0004]
Steps: 76%|███████▌ | 533/700 [04:57<01:27, 1.92it/s, loss=0.00488, lr=0.0004]
Steps: 76%|███████▋ | 534/700 [04:57<01:25, 1.95it/s, loss=0.00488, lr=0.0004]
Steps: 76%|███████▋ | 534/700 [04:57<01:25, 1.95it/s, loss=0.043, lr=0.0004]
Steps: 76%|███████▋ | 535/700 [04:58<01:23, 1.97it/s, loss=0.043, lr=0.0004]
Steps: 76%|███████▋ | 535/700 [04:58<01:23, 1.97it/s, loss=0.0856, lr=0.0004]
Steps: 77%|███████▋ | 536/700 [04:58<01:23, 1.96it/s, loss=0.0856, lr=0.0004]
Steps: 77%|███████▋ | 536/700 [04:58<01:23, 1.96it/s, loss=0.0465, lr=0.0004]
Steps: 77%|███████▋ | 537/700 [04:59<01:26, 1.88it/s, loss=0.0465, lr=0.0004]
Steps: 77%|███████▋ | 537/700 [04:59<01:26, 1.88it/s, loss=0.0128, lr=0.0004]
Steps: 77%|███████▋ | 538/700 [05:00<01:26, 1.87it/s, loss=0.0128, lr=0.0004]
Steps: 77%|███████▋ | 538/700 [05:00<01:26, 1.87it/s, loss=0.0311, lr=0.0004]
Steps: 77%|███████▋ | 539/700 [05:00<01:26, 1.87it/s, loss=0.0311, lr=0.0004]
Steps: 77%|███████▋ | 539/700 [05:00<01:26, 1.87it/s, loss=0.0866, lr=0.0004]
Steps: 77%|███████▋ | 540/700 [05:01<01:26, 1.86it/s, loss=0.0866, lr=0.0004]
Steps: 77%|███████▋ | 540/700 [05:01<01:26, 1.86it/s, loss=0.0238, lr=0.0004]
Steps: 77%|███████▋ | 541/700 [05:01<01:25, 1.87it/s, loss=0.0238, lr=0.0004]
Steps: 77%|███████▋ | 541/700 [05:01<01:25, 1.87it/s, loss=0.167, lr=0.0004]
Steps: 77%|███████▋ | 542/700 [05:02<01:23, 1.90it/s, loss=0.167, lr=0.0004]
Steps: 77%|███████▋ | 542/700 [05:02<01:23, 1.90it/s, loss=0.0733, lr=0.0004]
Steps: 78%|███████▊ | 543/700 [05:02<01:23, 1.87it/s, loss=0.0733, lr=0.0004]
Steps: 78%|███████▊ | 543/700 [05:02<01:23, 1.87it/s, loss=0.0158, lr=0.0004]
Steps: 78%|███████▊ | 544/700 [05:03<01:24, 1.86it/s, loss=0.0158, lr=0.0004]
Steps: 78%|███████▊ | 544/700 [05:03<01:24, 1.86it/s, loss=0.0303, lr=0.0004]
Steps: 78%|███████▊ | 545/700 [05:03<01:24, 1.82it/s, loss=0.0303, lr=0.0004]
Steps: 78%|███████▊ | 545/700 [05:03<01:24, 1.82it/s, loss=0.00213, lr=0.0004]
Steps: 78%|███████▊ | 546/700 [05:04<01:26, 1.78it/s, loss=0.00213, lr=0.0004]
Steps: 78%|███████▊ | 546/700 [05:04<01:26, 1.78it/s, loss=0.0131, lr=0.0004]
Steps: 78%|███████▊ | 547/700 [05:04<01:24, 1.80it/s, loss=0.0131, lr=0.0004]
Steps: 78%|███████▊ | 547/700 [05:04<01:24, 1.80it/s, loss=0.00865, lr=0.0004]
Steps: 78%|███████▊ | 548/700 [05:05<01:21, 1.86it/s, loss=0.00865, lr=0.0004]
Steps: 78%|███████▊ | 548/700 [05:05<01:21, 1.86it/s, loss=0.0364, lr=0.0004]
Steps: 78%|███████▊ | 549/700 [05:05<01:20, 1.88it/s, loss=0.0364, lr=0.0004]
Steps: 78%|███████▊ | 549/700 [05:05<01:20, 1.88it/s, loss=0.0189, lr=0.0004]
Steps: 79%|███████▊ | 550/700 [05:06<01:19, 1.88it/s, loss=0.0189, lr=0.0004]
Steps: 79%|███████▊ | 550/700 [05:06<01:19, 1.88it/s, loss=0.0136, lr=0.0004]
Steps: 79%|███████▊ | 551/700 [05:07<01:19, 1.88it/s, loss=0.0136, lr=0.0004]
Steps: 79%|███████▊ | 551/700 [05:07<01:19, 1.88it/s, loss=0.0498, lr=0.0004]
Steps: 79%|███████▉ | 552/700 [05:07<01:18, 1.89it/s, loss=0.0498, lr=0.0004]
Steps: 79%|███████▉ | 552/700 [05:07<01:18, 1.89it/s, loss=0.0141, lr=0.0004]
Steps: 79%|███████▉ | 553/700 [05:08<01:16, 1.92it/s, loss=0.0141, lr=0.0004]
Steps: 79%|███████▉ | 553/700 [05:08<01:16, 1.92it/s, loss=0.00719, lr=0.0004]
Steps: 79%|███████▉ | 554/700 [05:08<01:15, 1.93it/s, loss=0.00719, lr=0.0004]
Steps: 79%|███████▉ | 554/700 [05:08<01:15, 1.93it/s, loss=0.00273, lr=0.0004]
Steps: 79%|███████▉ | 555/700 [05:09<01:15, 1.92it/s, loss=0.00273, lr=0.0004]
Steps: 79%|███████▉ | 555/700 [05:09<01:15, 1.92it/s, loss=0.0116, lr=0.0004]
Steps: 79%|███████▉ | 556/700 [05:09<01:14, 1.94it/s, loss=0.0116, lr=0.0004]
Steps: 79%|███████▉ | 556/700 [05:09<01:14, 1.94it/s, loss=0.0282, lr=0.0004]
Steps: 80%|███████▉ | 557/700 [05:10<01:13, 1.95it/s, loss=0.0282, lr=0.0004]
Steps: 80%|███████▉ | 557/700 [05:10<01:13, 1.95it/s, loss=0.0122, lr=0.0004]
Steps: 80%|███████▉ | 558/700 [05:10<01:11, 1.97it/s, loss=0.0122, lr=0.0004]
Steps: 80%|███████▉ | 558/700 [05:10<01:11, 1.97it/s, loss=0.0149, lr=0.0004]
Steps: 80%|███████▉ | 559/700 [05:11<01:11, 1.96it/s, loss=0.0149, lr=0.0004]
Steps: 80%|███████▉ | 559/700 [05:11<01:11, 1.96it/s, loss=0.00336, lr=0.0004]
Steps: 80%|████████ | 560/700 [05:11<01:10, 1.98it/s, loss=0.00336, lr=0.0004]
Steps: 80%|████████ | 560/700 [05:11<01:10, 1.98it/s, loss=0.0495, lr=0.0004]
Steps: 80%|████████ | 561/700 [05:12<01:10, 1.98it/s, loss=0.0495, lr=0.0004]
Steps: 80%|████████ | 561/700 [05:12<01:10, 1.98it/s, loss=0.00663, lr=0.0004]
Steps: 80%|████████ | 562/700 [05:12<01:10, 1.96it/s, loss=0.00663, lr=0.0004]
Steps: 80%|████████ | 562/700 [05:12<01:10, 1.96it/s, loss=0.00749, lr=0.0004]
Steps: 80%|████████ | 563/700 [05:13<01:10, 1.94it/s, loss=0.00749, lr=0.0004]
Steps: 80%|████████ | 563/700 [05:13<01:10, 1.94it/s, loss=0.0777, lr=0.0004]
Steps: 81%|████████ | 564/700 [05:13<01:10, 1.94it/s, loss=0.0777, lr=0.0004]
Steps: 81%|████████ | 564/700 [05:13<01:10, 1.94it/s, loss=0.00752, lr=0.0004]
Steps: 81%|████████ | 565/700 [05:14<01:09, 1.94it/s, loss=0.00752, lr=0.0004]
Steps: 81%|████████ | 565/700 [05:14<01:09, 1.94it/s, loss=0.0213, lr=0.0004]
Steps: 81%|████████ | 566/700 [05:14<01:08, 1.95it/s, loss=0.0213, lr=0.0004]
Steps: 81%|████████ | 566/700 [05:14<01:08, 1.95it/s, loss=0.182, lr=0.0004]
Steps: 81%|████████ | 567/700 [05:15<01:08, 1.93it/s, loss=0.182, lr=0.0004]
Steps: 81%|████████ | 567/700 [05:15<01:08, 1.93it/s, loss=0.00876, lr=0.0004]
Steps: 81%|████████ | 568/700 [05:15<01:08, 1.94it/s, loss=0.00876, lr=0.0004]
Steps: 81%|████████ | 568/700 [05:15<01:08, 1.94it/s, loss=0.0193, lr=0.0004]
Steps: 81%|████████▏ | 569/700 [05:16<01:06, 1.97it/s, loss=0.0193, lr=0.0004]
Steps: 81%|████████▏ | 569/700 [05:16<01:06, 1.97it/s, loss=0.0154, lr=0.0004]
Steps: 81%|████████▏ | 570/700 [05:16<01:07, 1.91it/s, loss=0.0154, lr=0.0004]
Steps: 81%|████████▏ | 570/700 [05:16<01:07, 1.91it/s, loss=0.346, lr=0.0004]
Steps: 82%|████████▏ | 571/700 [05:17<01:09, 1.84it/s, loss=0.346, lr=0.0004]
Steps: 82%|████████▏ | 571/700 [05:17<01:09, 1.84it/s, loss=0.00996, lr=0.0004]
Steps: 82%|████████▏ | 572/700 [05:17<01:09, 1.84it/s, loss=0.00996, lr=0.0004]
Steps: 82%|████████▏ | 572/700 [05:17<01:09, 1.84it/s, loss=0.0344, lr=0.0004]
Steps: 82%|████████▏ | 573/700 [05:18<01:09, 1.81it/s, loss=0.0344, lr=0.0004]
Steps: 82%|████████▏ | 573/700 [05:18<01:09, 1.81it/s, loss=0.00388, lr=0.0004]
Steps: 82%|████████▏ | 574/700 [05:19<01:11, 1.75it/s, loss=0.00388, lr=0.0004]
Steps: 82%|████████▏ | 574/700 [05:19<01:11, 1.75it/s, loss=0.00327, lr=0.0004]
Steps: 82%|████████▏ | 575/700 [05:19<01:11, 1.75it/s, loss=0.00327, lr=0.0004]
Steps: 82%|████████▏ | 575/700 [05:19<01:11, 1.75it/s, loss=0.0173, lr=0.0004]
Steps: 82%|████████▏ | 576/700 [05:20<01:09, 1.77it/s, loss=0.0173, lr=0.0004]
Steps: 82%|████████▏ | 576/700 [05:20<01:09, 1.77it/s, loss=0.0087, lr=0.0004]
Steps: 82%|████████▏ | 577/700 [05:20<01:09, 1.77it/s, loss=0.0087, lr=0.0004]
Steps: 82%|████████▏ | 577/700 [05:20<01:09, 1.77it/s, loss=0.0399, lr=0.0004]
Steps: 83%|████████▎ | 578/700 [05:21<01:09, 1.76it/s, loss=0.0399, lr=0.0004]
Steps: 83%|████████▎ | 578/700 [05:21<01:09, 1.76it/s, loss=0.00906, lr=0.0004]
Steps: 83%|████████▎ | 579/700 [05:21<01:08, 1.78it/s, loss=0.00906, lr=0.0004]
Steps: 83%|████████▎ | 579/700 [05:21<01:08, 1.78it/s, loss=0.0716, lr=0.0004]
Steps: 83%|████████▎ | 580/700 [05:22<01:07, 1.77it/s, loss=0.0716, lr=0.0004]
Steps: 83%|████████▎ | 580/700 [05:22<01:07, 1.77it/s, loss=0.214, lr=0.0004]
Steps: 83%|████████▎ | 581/700 [05:23<01:07, 1.75it/s, loss=0.214, lr=0.0004]
Steps: 83%|████████▎ | 581/700 [05:23<01:07, 1.75it/s, loss=0.0602, lr=0.0004]
Steps: 83%|████████▎ | 582/700 [05:23<01:07, 1.75it/s, loss=0.0602, lr=0.0004]
Steps: 83%|████████▎ | 582/700 [05:23<01:07, 1.75it/s, loss=0.0708, lr=0.0004]
Steps: 83%|████████▎ | 583/700 [05:24<01:07, 1.75it/s, loss=0.0708, lr=0.0004]
Steps: 83%|████████▎ | 583/700 [05:24<01:07, 1.75it/s, loss=0.00627, lr=0.0004]
Steps: 83%|████████▎ | 584/700 [05:24<01:05, 1.76it/s, loss=0.00627, lr=0.0004]
Steps: 83%|████████▎ | 584/700 [05:24<01:05, 1.76it/s, loss=0.00603, lr=0.0004]
Steps: 84%|████████▎ | 585/700 [05:25<01:05, 1.76it/s, loss=0.00603, lr=0.0004]
Steps: 84%|████████▎ | 585/700 [05:25<01:05, 1.76it/s, loss=0.0861, lr=0.0004]
Steps: 84%|████████▎ | 586/700 [05:25<01:04, 1.77it/s, loss=0.0861, lr=0.0004]
Steps: 84%|████████▎ | 586/700 [05:25<01:04, 1.77it/s, loss=0.00681, lr=0.0004]
Steps: 84%|████████▍ | 587/700 [05:26<01:04, 1.76it/s, loss=0.00681, lr=0.0004]
Steps: 84%|████████▍ | 587/700 [05:26<01:04, 1.76it/s, loss=0.0772, lr=0.0004]
Steps: 84%|████████▍ | 588/700 [05:27<01:04, 1.75it/s, loss=0.0772, lr=0.0004]
Steps: 84%|████████▍ | 588/700 [05:27<01:04, 1.75it/s, loss=0.0183, lr=0.0004]
Steps: 84%|████████▍ | 589/700 [05:27<01:03, 1.75it/s, loss=0.0183, lr=0.0004]
Steps: 84%|████████▍ | 589/700 [05:27<01:03, 1.75it/s, loss=0.00783, lr=0.0004]
Steps: 84%|████████▍ | 590/700 [05:28<01:02, 1.75it/s, loss=0.00783, lr=0.0004]
Steps: 84%|████████▍ | 590/700 [05:28<01:02, 1.75it/s, loss=0.0575, lr=0.0004]
Steps: 84%|████████▍ | 591/700 [05:28<01:01, 1.77it/s, loss=0.0575, lr=0.0004]
Steps: 84%|████████▍ | 591/700 [05:28<01:01, 1.77it/s, loss=0.0142, lr=0.0004]
Steps: 85%|████████▍ | 592/700 [05:29<01:00, 1.78it/s, loss=0.0142, lr=0.0004]
Steps: 85%|████████▍ | 592/700 [05:29<01:00, 1.78it/s, loss=0.00664, lr=0.0004]
Steps: 85%|████████▍ | 593/700 [05:29<00:59, 1.80it/s, loss=0.00664, lr=0.0004]
Steps: 85%|████████▍ | 593/700 [05:29<00:59, 1.80it/s, loss=0.00879, lr=0.0004]
Steps: 85%|████████▍ | 594/700 [05:30<00:59, 1.79it/s, loss=0.00879, lr=0.0004]
Steps: 85%|████████▍ | 594/700 [05:30<00:59, 1.79it/s, loss=0.0716, lr=0.0004]
Steps: 85%|████████▌ | 595/700 [05:30<00:58, 1.79it/s, loss=0.0716, lr=0.0004]
Steps: 85%|████████▌ | 595/700 [05:30<00:58, 1.79it/s, loss=0.0366, lr=0.0004]
Steps: 85%|████████▌ | 596/700 [05:31<00:58, 1.79it/s, loss=0.0366, lr=0.0004]
Steps: 85%|████████▌ | 596/700 [05:31<00:58, 1.79it/s, loss=0.0431, lr=0.0004]
Steps: 85%|████████▌ | 597/700 [05:32<00:57, 1.79it/s, loss=0.0431, lr=0.0004]
Steps: 85%|████████▌ | 597/700 [05:32<00:57, 1.79it/s, loss=0.0399, lr=0.0004]
Steps: 85%|████████▌ | 598/700 [05:32<00:57, 1.79it/s, loss=0.0399, lr=0.0004]
Steps: 85%|████████▌ | 598/700 [05:32<00:57, 1.79it/s, loss=0.0735, lr=0.0004]
Steps: 86%|████████▌ | 599/700 [05:33<00:56, 1.78it/s, loss=0.0735, lr=0.0004]
Steps: 86%|████████▌ | 599/700 [05:33<00:56, 1.78it/s, loss=0.0237, lr=0.0004]
Steps: 86%|████████▌ | 600/700 [05:33<00:56, 1.78it/s, loss=0.0237, lr=0.0004]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')
Saving weights to checkpoints/step_600.safetensors
LORA Unet Moved 0.0020308804232627153
LORA CLIP Moved 8.028616866795346e-05
Steps: 86%|████████▌ | 600/700 [05:33<00:56, 1.78it/s, loss=0.00955, lr=0.0004]
Steps: 86%|████████▌ | 601/700 [05:34<01:00, 1.62it/s, loss=0.00955, lr=0.0004]
Steps: 86%|████████▌ | 601/700 [05:34<01:00, 1.62it/s, loss=0.0184, lr=0.0004]
Steps: 86%|████████▌ | 602/700 [05:35<00:59, 1.66it/s, loss=0.0184, lr=0.0004]
Steps: 86%|████████▌ | 602/700 [05:35<00:59, 1.66it/s, loss=0.0569, lr=0.0004]
Steps: 86%|████████▌ | 603/700 [05:35<00:57, 1.69it/s, loss=0.0569, lr=0.0004]
Steps: 86%|████████▌ | 603/700 [05:35<00:57, 1.69it/s, loss=0.00788, lr=0.0004]
Steps: 86%|████████▋ | 604/700 [05:36<00:55, 1.72it/s, loss=0.00788, lr=0.0004]
Steps: 86%|████████▋ | 604/700 [05:36<00:55, 1.72it/s, loss=0.0886, lr=0.0004]
Steps: 86%|████████▋ | 605/700 [05:36<00:54, 1.74it/s, loss=0.0886, lr=0.0004]
Steps: 86%|████████▋ | 605/700 [05:36<00:54, 1.74it/s, loss=0.0103, lr=0.0004]
Steps: 87%|████████▋ | 606/700 [05:37<00:53, 1.77it/s, loss=0.0103, lr=0.0004]
Steps: 87%|████████▋ | 606/700 [05:37<00:53, 1.77it/s, loss=0.00687, lr=0.0004]
Steps: 87%|████████▋ | 607/700 [05:37<00:52, 1.79it/s, loss=0.00687, lr=0.0004]
Steps: 87%|████████▋ | 607/700 [05:37<00:52, 1.79it/s, loss=0.00811, lr=0.0004]
Steps: 87%|████████▋ | 608/700 [05:38<00:50, 1.81it/s, loss=0.00811, lr=0.0004]
Steps: 87%|████████▋ | 608/700 [05:38<00:50, 1.81it/s, loss=0.0626, lr=0.0004]
Steps: 87%|████████▋ | 609/700 [05:38<00:50, 1.81it/s, loss=0.0626, lr=0.0004]
Steps: 87%|████████▋ | 609/700 [05:38<00:50, 1.81it/s, loss=0.037, lr=0.0004]
Steps: 87%|████████▋ | 610/700 [05:39<00:50, 1.79it/s, loss=0.037, lr=0.0004]
Steps: 87%|████████▋ | 610/700 [05:39<00:50, 1.79it/s, loss=0.0101, lr=0.0004]
Steps: 87%|████████▋ | 611/700 [05:40<00:49, 1.80it/s, loss=0.0101, lr=0.0004]
Steps: 87%|████████▋ | 611/700 [05:40<00:49, 1.80it/s, loss=0.00297, lr=0.0004]
Steps: 87%|████████▋ | 612/700 [05:40<00:49, 1.79it/s, loss=0.00297, lr=0.0004]
Steps: 87%|████████▋ | 612/700 [05:40<00:49, 1.79it/s, loss=0.045, lr=0.0004]
Steps: 88%|████████▊ | 613/700 [05:41<00:48, 1.79it/s, loss=0.045, lr=0.0004]
Steps: 88%|████████▊ | 613/700 [05:41<00:48, 1.79it/s, loss=0.00866, lr=0.0004]
Steps: 88%|████████▊ | 614/700 [05:41<00:47, 1.79it/s, loss=0.00866, lr=0.0004]
Steps: 88%|████████▊ | 614/700 [05:41<00:47, 1.79it/s, loss=0.00474, lr=0.0004]
Steps: 88%|████████▊ | 615/700 [05:42<00:47, 1.79it/s, loss=0.00474, lr=0.0004]
Steps: 88%|████████▊ | 615/700 [05:42<00:47, 1.79it/s, loss=0.0106, lr=0.0004]
Steps: 88%|████████▊ | 616/700 [05:42<00:46, 1.81it/s, loss=0.0106, lr=0.0004]
Steps: 88%|████████▊ | 616/700 [05:42<00:46, 1.81it/s, loss=0.0635, lr=0.0004]
Steps: 88%|████████▊ | 617/700 [05:43<00:46, 1.79it/s, loss=0.0635, lr=0.0004]
Steps: 88%|████████▊ | 617/700 [05:43<00:46, 1.79it/s, loss=0.0116, lr=0.0004]
Steps: 88%|████████▊ | 618/700 [05:43<00:46, 1.77it/s, loss=0.0116, lr=0.0004]
Steps: 88%|████████▊ | 618/700 [05:43<00:46, 1.77it/s, loss=0.0267, lr=0.0004]
Steps: 88%|████████▊ | 619/700 [05:44<00:45, 1.77it/s, loss=0.0267, lr=0.0004]
Steps: 88%|████████▊ | 619/700 [05:44<00:45, 1.77it/s, loss=0.0141, lr=0.0004]
Steps: 89%|████████▊ | 620/700 [05:45<00:45, 1.78it/s, loss=0.0141, lr=0.0004]
Steps: 89%|████████▊ | 620/700 [05:45<00:45, 1.78it/s, loss=0.0269, lr=0.0004]
Steps: 89%|████████▊ | 621/700 [05:45<00:43, 1.80it/s, loss=0.0269, lr=0.0004]
Steps: 89%|████████▊ | 621/700 [05:45<00:43, 1.80it/s, loss=0.0219, lr=0.0004]
Steps: 89%|████████▉ | 622/700 [05:46<00:42, 1.82it/s, loss=0.0219, lr=0.0004]
Steps: 89%|████████▉ | 622/700 [05:46<00:42, 1.82it/s, loss=0.0307, lr=0.0004]
Steps: 89%|████████▉ | 623/700 [05:46<00:42, 1.81it/s, loss=0.0307, lr=0.0004]
Steps: 89%|████████▉ | 623/700 [05:46<00:42, 1.81it/s, loss=0.0196, lr=0.0004]
Steps: 89%|████████▉ | 624/700 [05:47<00:42, 1.81it/s, loss=0.0196, lr=0.0004]
Steps: 89%|████████▉ | 624/700 [05:47<00:42, 1.81it/s, loss=0.0529, lr=0.0004]
Steps: 89%|████████▉ | 625/700 [05:47<00:41, 1.82it/s, loss=0.0529, lr=0.0004]
Steps: 89%|████████▉ | 625/700 [05:47<00:41, 1.82it/s, loss=0.0333, lr=0.0004]
Steps: 89%|████████▉ | 626/700 [05:48<00:40, 1.83it/s, loss=0.0333, lr=0.0004]
Steps: 89%|████████▉ | 626/700 [05:48<00:40, 1.83it/s, loss=0.0369, lr=0.0004]
Steps: 90%|████████▉ | 627/700 [05:48<00:39, 1.86it/s, loss=0.0369, lr=0.0004]
Steps: 90%|████████▉ | 627/700 [05:48<00:39, 1.86it/s, loss=0.0185, lr=0.0004]
Steps: 90%|████████▉ | 628/700 [05:49<00:38, 1.87it/s, loss=0.0185, lr=0.0004]
Steps: 90%|████████▉ | 628/700 [05:49<00:38, 1.87it/s, loss=0.00975, lr=0.0004]
Steps: 90%|████████▉ | 629/700 [05:49<00:38, 1.84it/s, loss=0.00975, lr=0.0004]
Steps: 90%|████████▉ | 629/700 [05:49<00:38, 1.84it/s, loss=0.021, lr=0.0004]
Steps: 90%|█████████ | 630/700 [05:50<00:38, 1.84it/s, loss=0.021, lr=0.0004]
Steps: 90%|█████████ | 630/700 [05:50<00:38, 1.84it/s, loss=0.111, lr=0.0004]
Steps: 90%|█████████ | 631/700 [05:51<00:37, 1.84it/s, loss=0.111, lr=0.0004]
Steps: 90%|█████████ | 631/700 [05:51<00:37, 1.84it/s, loss=0.00458, lr=0.0004]
Steps: 90%|█████████ | 632/700 [05:51<00:36, 1.84it/s, loss=0.00458, lr=0.0004]
Steps: 90%|█████████ | 632/700 [05:51<00:36, 1.84it/s, loss=0.0759, lr=0.0004]
Steps: 90%|█████████ | 633/700 [05:52<00:36, 1.84it/s, loss=0.0759, lr=0.0004]
Steps: 90%|█████████ | 633/700 [05:52<00:36, 1.84it/s, loss=0.0882, lr=0.0004]
Steps: 91%|█████████ | 634/700 [05:52<00:36, 1.83it/s, loss=0.0882, lr=0.0004]
Steps: 91%|█████████ | 634/700 [05:52<00:36, 1.83it/s, loss=0.0142, lr=0.0004]
Steps: 91%|█████████ | 635/700 [05:53<00:35, 1.84it/s, loss=0.0142, lr=0.0004]
Steps: 91%|█████████ | 635/700 [05:53<00:35, 1.84it/s, loss=0.00448, lr=0.0004]
Steps: 91%|█████████ | 636/700 [05:53<00:34, 1.84it/s, loss=0.00448, lr=0.0004]
Steps: 91%|█████████ | 636/700 [05:53<00:34, 1.84it/s, loss=0.0323, lr=0.0004]
Steps: 91%|█████████ | 637/700 [05:54<00:34, 1.81it/s, loss=0.0323, lr=0.0004]
Steps: 91%|█████████ | 637/700 [05:54<00:34, 1.81it/s, loss=0.00757, lr=0.0004]
Steps: 91%|█████████ | 638/700 [05:54<00:34, 1.80it/s, loss=0.00757, lr=0.0004]
Steps: 91%|█████████ | 638/700 [05:54<00:34, 1.80it/s, loss=0.0161, lr=0.0004]
Steps: 91%|█████████▏| 639/700 [05:55<00:33, 1.80it/s, loss=0.0161, lr=0.0004]
Steps: 91%|█████████▏| 639/700 [05:55<00:33, 1.80it/s, loss=0.0543, lr=0.0004]
Steps: 91%|█████████▏| 640/700 [05:56<00:33, 1.80it/s, loss=0.0543, lr=0.0004]
Steps: 91%|█████████▏| 640/700 [05:56<00:33, 1.80it/s, loss=0.0417, lr=0.0004]
Steps: 92%|█████████▏| 641/700 [05:56<00:32, 1.81it/s, loss=0.0417, lr=0.0004]
Steps: 92%|█████████▏| 641/700 [05:56<00:32, 1.81it/s, loss=0.0085, lr=0.0004]
Steps: 92%|█████████▏| 642/700 [05:57<00:31, 1.85it/s, loss=0.0085, lr=0.0004]
Steps: 92%|█████████▏| 642/700 [05:57<00:31, 1.85it/s, loss=0.00933, lr=0.0004]
Steps: 92%|█████████▏| 643/700 [05:57<00:29, 1.90it/s, loss=0.00933, lr=0.0004]
Steps: 92%|█████████▏| 643/700 [05:57<00:29, 1.90it/s, loss=0.00429, lr=0.0004]
Steps: 92%|█████████▏| 644/700 [05:58<00:28, 1.94it/s, loss=0.00429, lr=0.0004]
Steps: 92%|█████████▏| 644/700 [05:58<00:28, 1.94it/s, loss=0.051, lr=0.0004]
Steps: 92%|█████████▏| 645/700 [05:58<00:28, 1.95it/s, loss=0.051, lr=0.0004]
Steps: 92%|█████████▏| 645/700 [05:58<00:28, 1.95it/s, loss=0.122, lr=0.0004]
Steps: 92%|█████████▏| 646/700 [05:59<00:27, 1.94it/s, loss=0.122, lr=0.0004]
Steps: 92%|█████████▏| 646/700 [05:59<00:27, 1.94it/s, loss=0.0861, lr=0.0004]
Steps: 92%|█████████▏| 647/700 [05:59<00:27, 1.93it/s, loss=0.0861, lr=0.0004]
Steps: 92%|█████████▏| 647/700 [05:59<00:27, 1.93it/s, loss=0.0105, lr=0.0004]
Steps: 93%|█████████▎| 648/700 [06:00<00:26, 1.95it/s, loss=0.0105, lr=0.0004]
Steps: 93%|█████████▎| 648/700 [06:00<00:26, 1.95it/s, loss=0.28, lr=0.0004]
Steps: 93%|█████████▎| 649/700 [06:00<00:25, 1.98it/s, loss=0.28, lr=0.0004]
Steps: 93%|█████████▎| 649/700 [06:00<00:25, 1.98it/s, loss=0.00453, lr=0.0004]
Steps: 93%|█████████▎| 650/700 [06:01<00:25, 1.97it/s, loss=0.00453, lr=0.0004]
Steps: 93%|█████████▎| 650/700 [06:01<00:25, 1.97it/s, loss=0.0112, lr=0.0004]
Steps: 93%|█████████▎| 651/700 [06:01<00:24, 1.97it/s, loss=0.0112, lr=0.0004]
Steps: 93%|█████████▎| 651/700 [06:01<00:24, 1.97it/s, loss=0.00302, lr=0.0004]
Steps: 93%|█████████▎| 652/700 [06:02<00:24, 1.95it/s, loss=0.00302, lr=0.0004]
Steps: 93%|█████████▎| 652/700 [06:02<00:24, 1.95it/s, loss=0.0966, lr=0.0004]
Steps: 93%|█████████▎| 653/700 [06:02<00:25, 1.85it/s, loss=0.0966, lr=0.0004]
Steps: 93%|█████████▎| 653/700 [06:02<00:25, 1.85it/s, loss=0.0116, lr=0.0004]
Steps: 93%|█████████▎| 654/700 [06:03<00:25, 1.84it/s, loss=0.0116, lr=0.0004]
Steps: 93%|█████████▎| 654/700 [06:03<00:25, 1.84it/s, loss=0.00164, lr=0.0004]
Steps: 94%|█████████▎| 655/700 [06:03<00:24, 1.83it/s, loss=0.00164, lr=0.0004]
Steps: 94%|█████████▎| 655/700 [06:03<00:24, 1.83it/s, loss=0.0755, lr=0.0004]
Steps: 94%|█████████▎| 656/700 [06:04<00:23, 1.84it/s, loss=0.0755, lr=0.0004]
Steps: 94%|█████████▎| 656/700 [06:04<00:23, 1.84it/s, loss=0.118, lr=0.0004]
Steps: 94%|█████████▍| 657/700 [06:04<00:22, 1.87it/s, loss=0.118, lr=0.0004]
Steps: 94%|█████████▍| 657/700 [06:04<00:22, 1.87it/s, loss=0.00279, lr=0.0004]
Steps: 94%|█████████▍| 658/700 [06:05<00:22, 1.90it/s, loss=0.00279, lr=0.0004]
Steps: 94%|█████████▍| 658/700 [06:05<00:22, 1.90it/s, loss=0.0254, lr=0.0004]
Steps: 94%|█████████▍| 659/700 [06:05<00:21, 1.91it/s, loss=0.0254, lr=0.0004]
Steps: 94%|█████████▍| 659/700 [06:05<00:21, 1.91it/s, loss=0.00583, lr=0.0004]
Steps: 94%|█████████▍| 660/700 [06:06<00:20, 1.93it/s, loss=0.00583, lr=0.0004]
Steps: 94%|█████████▍| 660/700 [06:06<00:20, 1.93it/s, loss=0.0188, lr=0.0004]
Steps: 94%|█████████▍| 661/700 [06:06<00:20, 1.94it/s, loss=0.0188, lr=0.0004]
Steps: 94%|█████████▍| 661/700 [06:06<00:20, 1.94it/s, loss=0.0194, lr=0.0004]
Steps: 95%|█████████▍| 662/700 [06:07<00:19, 1.96it/s, loss=0.0194, lr=0.0004]
Steps: 95%|█████████▍| 662/700 [06:07<00:19, 1.96it/s, loss=0.0046, lr=0.0004]
Steps: 95%|█████████▍| 663/700 [06:07<00:18, 1.97it/s, loss=0.0046, lr=0.0004]
Steps: 95%|█████████▍| 663/700 [06:07<00:18, 1.97it/s, loss=0.0282, lr=0.0004]
Steps: 95%|█████████▍| 664/700 [06:08<00:18, 1.95it/s, loss=0.0282, lr=0.0004]
Steps: 95%|█████████▍| 664/700 [06:08<00:18, 1.95it/s, loss=0.0177, lr=0.0004]
Steps: 95%|█████████▌| 665/700 [06:09<00:18, 1.87it/s, loss=0.0177, lr=0.0004]
Steps: 95%|█████████▌| 665/700 [06:09<00:18, 1.87it/s, loss=0.028, lr=0.0004]
Steps: 95%|█████████▌| 666/700 [06:09<00:17, 1.91it/s, loss=0.028, lr=0.0004]
Steps: 95%|█████████▌| 666/700 [06:09<00:17, 1.91it/s, loss=0.00854, lr=0.0004]
Steps: 95%|█████████▌| 667/700 [06:10<00:17, 1.92it/s, loss=0.00854, lr=0.0004]
Steps: 95%|█████████▌| 667/700 [06:10<00:17, 1.92it/s, loss=0.0678, lr=0.0004]
Steps: 95%|█████████▌| 668/700 [06:10<00:16, 1.93it/s, loss=0.0678, lr=0.0004]
Steps: 95%|█████████▌| 668/700 [06:10<00:16, 1.93it/s, loss=0.0106, lr=0.0004]
Steps: 96%|█████████▌| 669/700 [06:11<00:15, 1.94it/s, loss=0.0106, lr=0.0004]
Steps: 96%|█████████▌| 669/700 [06:11<00:15, 1.94it/s, loss=0.00561, lr=0.0004]
Steps: 96%|█████████▌| 670/700 [06:11<00:15, 1.89it/s, loss=0.00561, lr=0.0004]
Steps: 96%|█████████▌| 670/700 [06:11<00:15, 1.89it/s, loss=0.0232, lr=0.0004]
Steps: 96%|█████████▌| 671/700 [06:12<00:15, 1.88it/s, loss=0.0232, lr=0.0004]
Steps: 96%|█████████▌| 671/700 [06:12<00:15, 1.88it/s, loss=0.0145, lr=0.0004]
Steps: 96%|█████████▌| 672/700 [06:12<00:15, 1.87it/s, loss=0.0145, lr=0.0004]
Steps: 96%|█████████▌| 672/700 [06:12<00:15, 1.87it/s, loss=0.0449, lr=0.0004]
Steps: 96%|█████████▌| 673/700 [06:13<00:14, 1.83it/s, loss=0.0449, lr=0.0004]
Steps: 96%|█████████▌| 673/700 [06:13<00:14, 1.83it/s, loss=0.0102, lr=0.0004]
Steps: 96%|█████████▋| 674/700 [06:13<00:14, 1.80it/s, loss=0.0102, lr=0.0004]
Steps: 96%|█████████▋| 674/700 [06:13<00:14, 1.80it/s, loss=0.0219, lr=0.0004]
Steps: 96%|█████████▋| 675/700 [06:14<00:14, 1.78it/s, loss=0.0219, lr=0.0004]
Steps: 96%|█████████▋| 675/700 [06:14<00:14, 1.78it/s, loss=0.00629, lr=0.0004]
Steps: 97%|█████████▋| 676/700 [06:15<00:13, 1.78it/s, loss=0.00629, lr=0.0004]
Steps: 97%|█████████▋| 676/700 [06:15<00:13, 1.78it/s, loss=0.112, lr=0.0004]
Steps: 97%|█████████▋| 677/700 [06:15<00:13, 1.77it/s, loss=0.112, lr=0.0004]
Steps: 97%|█████████▋| 677/700 [06:15<00:13, 1.77it/s, loss=0.00805, lr=0.0004]
Steps: 97%|█████████▋| 678/700 [06:16<00:12, 1.76it/s, loss=0.00805, lr=0.0004]
Steps: 97%|█████████▋| 678/700 [06:16<00:12, 1.76it/s, loss=0.00428, lr=0.0004]
Steps: 97%|█████████▋| 679/700 [06:16<00:11, 1.76it/s, loss=0.00428, lr=0.0004]
Steps: 97%|█████████▋| 679/700 [06:16<00:11, 1.76it/s, loss=0.00553, lr=0.0004]
Steps: 97%|█████████▋| 680/700 [06:17<00:11, 1.76it/s, loss=0.00553, lr=0.0004]
Steps: 97%|█████████▋| 680/700 [06:17<00:11, 1.76it/s, loss=0.00655, lr=0.0004]
Steps: 97%|█████████▋| 681/700 [06:17<00:10, 1.74it/s, loss=0.00655, lr=0.0004]
Steps: 97%|█████████▋| 681/700 [06:17<00:10, 1.74it/s, loss=0.0833, lr=0.0004]
Steps: 97%|█████████▋| 682/700 [06:18<00:10, 1.73it/s, loss=0.0833, lr=0.0004]
Steps: 97%|█████████▋| 682/700 [06:18<00:10, 1.73it/s, loss=0.0285, lr=0.0004]
Steps: 98%|█████████▊| 683/700 [06:19<00:09, 1.71it/s, loss=0.0285, lr=0.0004]
Steps: 98%|█████████▊| 683/700 [06:19<00:09, 1.71it/s, loss=0.0525, lr=0.0004]
Steps: 98%|█████████▊| 684/700 [06:19<00:09, 1.72it/s, loss=0.0525, lr=0.0004]
Steps: 98%|█████████▊| 684/700 [06:19<00:09, 1.72it/s, loss=0.00216, lr=0.0004]
Steps: 98%|█████████▊| 685/700 [06:20<00:08, 1.71it/s, loss=0.00216, lr=0.0004]
Steps: 98%|█████████▊| 685/700 [06:20<00:08, 1.71it/s, loss=0.0627, lr=0.0004]
Steps: 98%|█████████▊| 686/700 [06:20<00:08, 1.73it/s, loss=0.0627, lr=0.0004]
Steps: 98%|█████████▊| 686/700 [06:20<00:08, 1.73it/s, loss=0.0122, lr=0.0004]
Steps: 98%|█████████▊| 687/700 [06:21<00:07, 1.72it/s, loss=0.0122, lr=0.0004]
Steps: 98%|█████████▊| 687/700 [06:21<00:07, 1.72it/s, loss=0.00683, lr=0.0004]
Steps: 98%|█████████▊| 688/700 [06:22<00:06, 1.72it/s, loss=0.00683, lr=0.0004]
Steps: 98%|█████████▊| 688/700 [06:22<00:06, 1.72it/s, loss=0.00972, lr=0.0004]
Steps: 98%|█████████▊| 689/700 [06:22<00:06, 1.73it/s, loss=0.00972, lr=0.0004]
Steps: 98%|█████████▊| 689/700 [06:22<00:06, 1.73it/s, loss=0.0338, lr=0.0004]
Steps: 99%|█████████▊| 690/700 [06:23<00:05, 1.74it/s, loss=0.0338, lr=0.0004]
Steps: 99%|█████████▊| 690/700 [06:23<00:05, 1.74it/s, loss=0.0056, lr=0.0004]
Steps: 99%|█████████▊| 691/700 [06:23<00:05, 1.75it/s, loss=0.0056, lr=0.0004]
Steps: 99%|█████████▊| 691/700 [06:23<00:05, 1.75it/s, loss=0.00928, lr=0.0004]
Steps: 99%|█████████▉| 692/700 [06:24<00:04, 1.75it/s, loss=0.00928, lr=0.0004]
Steps: 99%|█████████▉| 692/700 [06:24<00:04, 1.75it/s, loss=0.00226, lr=0.0004]
Steps: 99%|█████████▉| 693/700 [06:24<00:04, 1.72it/s, loss=0.00226, lr=0.0004]
Steps: 99%|█████████▉| 693/700 [06:24<00:04, 1.72it/s, loss=0.00318, lr=0.0004]
Steps: 99%|█████████▉| 694/700 [06:25<00:03, 1.73it/s, loss=0.00318, lr=0.0004]
Steps: 99%|█████████▉| 694/700 [06:25<00:03, 1.73it/s, loss=0.00763, lr=0.0004]
Steps: 99%|█████████▉| 695/700 [06:26<00:02, 1.74it/s, loss=0.00763, lr=0.0004]
Steps: 99%|█████████▉| 695/700 [06:26<00:02, 1.74it/s, loss=0.0217, lr=0.0004]
Steps: 99%|█████████▉| 696/700 [06:26<00:02, 1.75it/s, loss=0.0217, lr=0.0004]
Steps: 99%|█████████▉| 696/700 [06:26<00:02, 1.75it/s, loss=0.0112, lr=0.0004]
Steps: 100%|█████████▉| 697/700 [06:27<00:01, 1.76it/s, loss=0.0112, lr=0.0004]
Steps: 100%|█████████▉| 697/700 [06:27<00:01, 1.76it/s, loss=0.0276, lr=0.0004]
Steps: 100%|█████████▉| 698/700 [06:27<00:01, 1.76it/s, loss=0.0276, lr=0.0004]
Steps: 100%|█████████▉| 698/700 [06:27<00:01, 1.76it/s, loss=0.0766, lr=0.0004]
Steps: 100%|█████████▉| 699/700 [06:28<00:00, 1.74it/s, loss=0.0766, lr=0.0004]
Steps: 100%|█████████▉| 699/700 [06:28<00:00, 1.74it/s, loss=0.00645, lr=0.0004]
Steps: 100%|██████████| 700/700 [06:28<00:00, 1.74it/s, loss=0.00645, lr=0.0004]
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')
Saving weights to checkpoints/step_700.safetensors
LORA Unet Moved 0.002203464973717928
LORA CLIP Moved 8.760895434534177e-05
Current Learned Embeddings for <s1>:, id 49408 tensor([ 0.0171, 0.0102, -0.0029, -0.0226], device='cuda:0')
Current Learned Embeddings for <s2>:, id 49409 tensor([ 0.0081, 0.0151, -0.0066, 0.0028], device='cuda:0')
Saving weights to checkpoints/final_lora.safetensors
Steps: 100%|██████████| 700/700 [06:28<00:00, 1.74it/s, loss=0.00693, lr=0.0004]
Steps: 100%|██████████| 700/700 [06:29<00:00, 1.80it/s, loss=0.00693, lr=0.0004]