Image files: ['./temp_in/zeke1.jpg', './temp_in/zeke2.jpg', './temp_in/zeke3.jpg', './temp_in/zeke4.jpg', './temp_in/zeke5.jpg', './temp_in/zeke6.jpg']
Generating 6 captions...
Input captioning text: a photo of TOK, TOK there is a man with glasses and a beard sitting down
0%| | 0/6 [00:00
, there is a man with glasses and a beard sitting down
a photo of , curly haired man wearing glasses looking at camera on beach
a photo of , someone is holding up a wine glass and looking to the side
a photo of , taken in an office of a man with a beard and glasses
a photo of , lacy white shirt worn with a colorful tie and glasses, and mustache
a photo of , ##raffe man with glasses and frecked hair standing near a brick wall
# PTI : Loaded dataset
# PTI : Running training
# PTI : Num examples = 6
# PTI : Num batches each epoch = 2
# PTI : Num Epochs = 500
# PTI : Instantaneous batch size per device = 4
Total train batch size (w. parallel, distributed & accumulation) = 4
# PTI : Gradient Accumulation steps = 1
# PTI : Total optimization steps = 1000
0%| | 0/1000 [00:00
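
The step count in the log above follows from the other logged values. A minimal sketch of that arithmetic, assuming the trainer rounds the last partial batch up and takes one optimizer step per batch when gradient accumulation is 1 (the variable names are illustrative and not taken from the training script):

```python
import math

# Values copied from the log above.
num_examples = 6
batch_size = 4          # "Instantaneous batch size per device"
grad_accum_steps = 1    # "Gradient Accumulation steps"
num_epochs = 500

# Assumption: the final partial batch is kept, so 6 examples at batch size 4 give 2 batches.
batches_per_epoch = math.ceil(num_examples / batch_size)

# Assumption: one optimizer update per `grad_accum_steps` batches.
updates_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)

total_steps = updates_per_epoch * num_epochs
print(batches_per_epoch, total_steps)
```

Under these assumptions the result matches the logged "Num batches each epoch = 2" and "Total optimization steps = 1000".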