Run time and cost

This model costs approximately $0.012 to run on Replicate, or 83 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 9 seconds.
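The figures above can be turned into a rough batch estimate. The sketch below simply multiplies out the approximate per-run cost and typical completion time quoted here; the function names are hypothetical, and it assumes runs execute one after another with no cold-start delay, which may not hold in practice.

```python
import math

# Approximate figures quoted on this page; actual values vary with inputs.
COST_PER_RUN_USD = 0.012
TYPICAL_SECONDS_PER_RUN = 9


def runs_per_dollar(cost_per_run=COST_PER_RUN_USD):
    """Whole number of runs that fit into one dollar."""
    return math.floor(1 / cost_per_run)


def estimate_batch(n_runs, cost_per_run=COST_PER_RUN_USD,
                   seconds_per_run=TYPICAL_SECONDS_PER_RUN):
    """Return (estimated cost in USD, estimated seconds) for sequential runs."""
    return n_runs * cost_per_run, n_runs * seconds_per_run


if __name__ == "__main__":
    cost, seconds = estimate_batch(100)
    print(f"{runs_per_dollar()} runs per $1")
    print(f"100 runs: about ${cost:.2f} and {seconds / 60:.0f} minutes")
```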

Readme

ProteusV0.5

ProteusV0.5 is the latest full release of my AI image generation model, built as a sophisticated enhancement over OpenDalleV1.1. This version brings significant improvements in photorealism, prompt comprehension, and stylistic capabilities across various domains.

About Proteus

Proteus leverages and enhances the core functionalities of OpenDalleV1.1 to deliver superior outcomes. Key areas of advancement include heightened responsiveness to prompts and augmented creative capacities. The model has been fine-tuned using a carefully curated dataset of copyright-free stock images and high-quality AI-generated image pairs.

Key Improvements in V0.5:

Custom-Trained CLIP Model:

Dramatically improved prompt understanding and interpretation
Enables more accurate and nuanced responses to complex prompts
Sets Proteus apart from most other models in the field

Further Refinement of Stylistic Capabilities:

Enhanced ability to generate diverse artistic styles
Improved coherence in complex scenes and compositions

Expanded Training Dataset:

Now totaling over 400,000 images
Significantly broadened knowledge base and generation capabilities

Balanced Creativity and Accuracy:

Addressed previous issues of being “too stylistic/creative”
Improved alignment between user prompts and generated outputs

Proteus’s Background

Proteus builds on the core functionality of OpenDalleV1.1, with improved responsiveness to prompts and an expanded creative range. It was fine-tuned on approximately 220,000 GPTV-captioned images drawn from copyright-free stock images (with some anime included), which were then normalized. DPO (Direct Preference Optimization) was then applied using a collection of 10,000 carefully selected, high-quality AI-generated image pairs.

Numerous LoRA (Low-Rank Adaptation) models are trained independently and then selectively incorporated into the main model via dynamic application methods. These methods target particular segments of the model while avoiding interference with other areas during training. As a result, Proteus shows marked improvements in rendering intricate facial characteristics and lifelike skin textures, while remaining proficient across various aesthetic domains, notably surrealism, anime, and cartoon-style visualizations.
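To illustrate the general idea of folding independently trained low-rank adapters into selected parts of a model, here is a minimal pure-Python sketch of a standard LoRA merge, W' = W + scale * (B @ A), applied only to targeted layers. This is a generic illustration and not the actual Proteus training code: the function names, layer names, and toy matrices are all hypothetical, and the "dynamic application methods" mentioned above are not specified in this README.

```python
# Generic LoRA merge sketch in pure Python (no external dependencies).
# All names (matmul, merge_lora, "attn", "mlp") are hypothetical examples.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def merge_lora(weights, adapters, targets, scale=1.0):
    """Return new weights with W + scale * (B @ A) applied to targeted layers only."""
    merged = {}
    for name, w in weights.items():
        if name in targets and name in adapters:
            b, a = adapters[name]      # B is (out x r), A is (r x in)
            delta = matmul(b, a)       # low-rank update with the same shape as W
            merged[name] = [[wi + scale * di for wi, di in zip(w_row, d_row)]
                            for w_row, d_row in zip(w, delta)]
        else:
            # Layers outside the target set are copied unchanged.
            merged[name] = [row[:] for row in w]
    return merged


weights = {
    "attn": [[1.0, 0.0], [0.0, 1.0]],
    "mlp": [[2.0, 2.0], [2.0, 2.0]],
}
# A rank-1 adapter for the "attn" layer only.
adapters = {"attn": ([[1.0], [2.0]], [[0.5, 0.5]])}

merged = merge_lora(weights, adapters, targets={"attn"})
print(merged["attn"])  # the targeted layer changes
print(merged["mlp"])   # the untargeted layer stays as it was
```

Restricting the merge to a target set is what keeps an adapter trained for one part of the model from disturbing the others.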

Training Details

Total training dataset: Now over 400,000 images
Initial training: ~220,000 GPTV-captioned images from copyright-free stock images (including some anime)
Additional training: Hand-picked photorealistic images
Fine-tuning: Direct Preference Optimization (DPO) with 10,000 carefully selected high-quality, AI-generated image pairs
LoRA (Low-Rank Adaptation) models trained independently and selectively incorporated

Improvements

Enhanced portrayal of intricate facial characteristics and lifelike skin textures
Improved proficiency in surrealism, anime, and cartoon-style visualizations
Superior prompt comprehension due to custom-trained CLIP
Expanded dataset leading to more diverse and accurate outputs
Refined balance between creativity and accuracy

Recommended Settings

Clip Skip: 2
CFG Scale: 7
Steps: 25 - 50
Sampler: DPM++ 2M SDE
Scheduler: Karras
Resolution: 1024x1024
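For scripted use, the recommended settings can be captured in a small, validated configuration object. The sketch below is only an illustration: the class and field names are hypothetical and are not tied to any particular UI or library, and how a value such as Clip Skip maps onto a given tool's parameters differs between tools, so no such mapping is attempted here.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProteusSettings:
    """Hypothetical container for the recommended settings listed above."""
    clip_skip: int = 2
    cfg_scale: float = 7.0
    steps: int = 30                 # any value in the recommended 25-50 range
    sampler: str = "DPM++ 2M SDE"
    scheduler: str = "Karras"
    width: int = 1024
    height: int = 1024

    def __post_init__(self):
        # Reject step counts outside the recommended range.
        if not 25 <= self.steps <= 50:
            raise ValueError(f"steps should be between 25 and 50, got {self.steps}")


if __name__ == "__main__":
    settings = ProteusSettings(steps=40)
    print(asdict(settings))
```

Making the object frozen keeps a set of settings from being altered partway through a batch of generations.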

The custom-trained CLIP is a significant point of differentiation, as very few models incorporate this feature. Enjoy creating with the fully released ProteusV0.5!