Generates MagicaVoxel VOX models, using flux dev + hunyuan3d-2. Can generate high detail and low detail models at varying resolutions.
Image captioning via vision-language models with instruction tuning