adirik / imagedream

Image-Prompt Multi-view Diffusion for 3D Generation

Run time and cost

This model runs on Nvidia A40 (Large) GPU hardware. Predictions typically complete within 83 minutes.

Readme

ImageDream

ImageDream is a text-and-image-to-3D model by ByteDance. It uses a multi-view diffusion model with canonical camera coordination to improve geometric and textural accuracy, and a multi-level image-prompt controller for precise control over the generated object. In the user studies and quantitative evaluations reported by its authors, ImageDream outperforms existing state-of-the-art single-image 3D generators in geometry and texture quality. See the paper and original repository.

How to use the API

To use ImageDream, provide a text description and a corresponding image of the 3D asset you want to generate. Depending on the parameters you set, generation takes roughly 1 to 2 hours. The input arguments are as follows:

  • image: Image of the object to generate a 3D model from. The object should be centered and should be neither too small nor too large in the frame.
  • prompt: Short text description of the 3D object to generate.
  • negative_prompt: Short text description of what the generated 3D object should not be.
  • guidance_scale: Higher values make the generated 3D object follow the inputs more closely.
  • shading: If set to True, the texture quality is higher but generation takes ~2h. If set to False, the texture quality is lower but generation takes ~1h.
  • num_steps: Number of training steps. Keeping the default value is strongly advised for optimal results.
  • seed: Seed for reproducibility. The default is None, which selects a random seed. Set it to a fixed integer for deterministic generation.
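
The arguments above can be assembled in Python before being sent to the API. The following is a minimal sketch, not an official client: the helper names `build_imagedream_input` and `run_imagedream` are hypothetical, the third-party `replicate` package and a `REPLICATE_API_TOKEN` environment variable are assumed, and the model reference must carry the real version id copied from the model page (none is given here). Only the argument names come from the list above; unset arguments are left out so the model applies its own defaults.

```python
from typing import Any, Dict, Optional


def build_imagedream_input(
    image: Any,
    prompt: str,
    negative_prompt: Optional[str] = None,
    guidance_scale: Optional[float] = None,
    shading: Optional[bool] = None,
    num_steps: Optional[int] = None,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Collect the documented arguments into one input dict.

    `image` may be a URL string or an open binary file handle.
    Arguments left as None are omitted so the model uses its defaults.
    """
    if not prompt.strip():
        raise ValueError("prompt must be a non-empty description")
    candidates = {
        "image": image,
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "guidance_scale": guidance_scale,
        "shading": shading,
        "num_steps": num_steps,
        "seed": seed,
    }
    return {key: value for key, value in candidates.items() if value is not None}


def run_imagedream(model_ref: str, **arguments: Any) -> Any:
    """Submit one prediction and block until it finishes.

    `model_ref` is a placeholder of the form "adirik/imagedream:<version-id>";
    the version id is an assumption to be filled in from the model page.
    """
    import replicate  # third-party client, installed with: pip install replicate

    return replicate.run(model_ref, input=build_imagedream_input(**arguments))
```

For example, `build_imagedream_input(image="https://example.com/chair.png", prompt="a wooden chair", shading=False, seed=42)` requests the faster, lower-texture-quality run with a fixed seed. Because a prediction can take an hour or more, a blocking call such as `run_imagedream` is best made from a long-running script rather than an interactive session.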

References

@article{wang2023imagedream,
  title={ImageDream: Image-Prompt Multi-view Diffusion for 3D Generation},
  author={Wang, Peng and Shi, Yichun},
  journal={arXiv preprint arXiv:2312.02201},
  year={2023}
}