DeepLabV3+ Binary Segmentation Model
This repository provides a DeepLabV3+ model for binary semantic segmentation (background vs. foreground), packaged for deployment with Cog / Replicate. It is designed for high-accuracy foreground detection tasks such as roof segmentation in aerial imagery, but can be retrained for any binary segmentation dataset.
Features
- DeepLabV3+ architecture
- Two output classes (0 = background, 1 = foreground)
- Combined Cross-Entropy + Dice loss (sketched after this list)
- IoU-based validation and model selection
- Mixed-precision training (AMP)
- Data augmentation with Albumentations
- Automatic best-checkpoint saving
- GPU and CPU inference support
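The repository does not spell out the loss here, but a rough sketch of a combined Cross-Entropy + Dice loss for two-class logits could look like the following. The `CombinedLoss` name, the weighting, and the smoothing constant are illustrative assumptions, not the exact code in train.py:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CombinedLoss(nn.Module):
    """Illustrative Cross-Entropy + Dice loss for two-class segmentation."""

    def __init__(self, ce_weight=1.0, dice_weight=1.0, smooth=1.0):
        super().__init__()
        self.ce = nn.CrossEntropyLoss()
        self.ce_weight = ce_weight
        self.dice_weight = dice_weight
        self.smooth = smooth

    def forward(self, logits, targets):
        # logits: (N, 2, H, W); targets: (N, H, W) with values {0, 1}
        ce_loss = self.ce(logits, targets)

        # Dice term computed on the foreground channel
        probs = F.softmax(logits, dim=1)[:, 1]
        targets_f = targets.float()
        intersection = (probs * targets_f).sum(dim=(1, 2))
        union = probs.sum(dim=(1, 2)) + targets_f.sum(dim=(1, 2))
        dice = (2.0 * intersection + self.smooth) / (union + self.smooth)
        dice_loss = 1.0 - dice.mean()

        return self.ce_weight * ce_loss + self.dice_weight * dice_loss
```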
Training
Training is handled via train.py. The script:
- Loads paired images and masks
- Applies normalization and augmentation (an example pipeline follows this list)
- Trains using mixed precision
- Logs training and validation loss
- Computes validation IoU
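The exact augmentation pipeline is defined in train.py; a minimal Albumentations pipeline in the same spirit might look like this. The specific transforms and probabilities are assumptions, chosen only to illustrate the pattern:

```python
import albumentations as A
from albumentations.pytorch import ToTensorV2

# Illustrative training pipeline; the transforms in train.py may differ.
train_transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomRotate90(p=0.5),
    A.RandomBrightnessContrast(p=0.3),
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ToTensorV2(),
])

# Albumentations applies matching spatial transforms to image and mask:
# augmented = train_transform(image=image, mask=mask)
# image_tensor, mask_tensor = augmented["image"], augmented["mask"]
```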
The best-performing checkpoint is saved to:
models/checkpoint.pth
Training curves and a sample prediction image are also exported.
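To illustrate the mixed-precision step and IoU-based checkpoint selection described above, a simplified training/validation sketch is shown below. The function names and the exact IoU reduction are assumptions, not the actual code in train.py:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()
best_iou = 0.0

def train_one_epoch(model, loader, optimizer, criterion, device):
    model.train()
    for images, masks in loader:
        images, masks = images.to(device), masks.to(device)
        optimizer.zero_grad()
        with autocast():                   # mixed-precision forward pass
            logits = model(images)
            loss = criterion(logits, masks)
        scaler.scale(loss).backward()      # scaled backward to avoid underflow
        scaler.step(optimizer)
        scaler.update()

@torch.no_grad()
def validate_iou(model, loader, device, eps=1e-7):
    model.eval()
    inter, union = 0.0, 0.0
    for images, masks in loader:
        images, masks = images.to(device), masks.to(device)
        preds = model(images).argmax(dim=1)
        inter += ((preds == 1) & (masks == 1)).sum().item()
        union += ((preds == 1) | (masks == 1)).sum().item()
    return (inter + eps) / (union + eps)

# After each epoch, keep only the best-scoring weights:
# iou = validate_iou(model, val_loader, device)
# if iou > best_iou:
#     best_iou = iou
#     torch.save(model.state_dict(), "models/checkpoint.pth")
```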
Inference
predict.py loads the saved checkpoint and performs inference by:
1. Normalizing the input image using ImageNet mean and standard deviation
2. Running the model on GPU if available, otherwise CPU
3. Applying softmax and argmax to produce class predictions
4. Exporting a grayscale PNG mask where:
- 0 = background
- 255 = foreground
Output is written to:
/tmp/output.png
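Putting the four steps together, a simplified version of the inference path could look like the following. The `predict_mask` helper and the way the model is passed in are illustrative; see predict.py for the actual implementation:

```python
import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def predict_mask(model, image_path, output_path="/tmp/output.png"):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device).eval()

    # 1. Load the image and normalize with ImageNet statistics
    image = np.asarray(Image.open(image_path).convert("RGB"), dtype=np.float32) / 255.0
    image = (image - IMAGENET_MEAN) / IMAGENET_STD
    tensor = torch.from_numpy(image).permute(2, 0, 1).unsqueeze(0).to(device)

    # 2./3. Forward pass, softmax over the two classes, per-pixel argmax
    with torch.no_grad():
        logits = model(tensor)
        probs = F.softmax(logits, dim=1)
        pred = probs.argmax(dim=1).squeeze(0).cpu().numpy()

    # 4. Write a grayscale PNG: 0 = background, 255 = foreground
    mask = pred.astype(np.uint8) * 255
    Image.fromarray(mask, mode="L").save(output_path)
    return output_path
```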