piotr-infordb/image-segmentation

DeepLabV3+ model for high-accuracy binary image segmentation, trained to detect roofs (foreground vs background) and output a grayscale mask.


Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

DeepLabV3+ Binary Segmentation Model

This repository provides a DeepLabV3+ model for binary semantic segmentation (background vs. foreground), packaged for deployment with Cog / Replicate. It is designed for high-accuracy foreground detection tasks such as roof segmentation in aerial imagery, but can be retrained for any binary segmentation dataset.

Features

  • DeepLabV3+ architecture
  • Two output classes (0 = background, 1 = foreground)
  • Combined Cross-Entropy + Dice Loss
  • IoU-based validation and model selection
  • Mixed-precision training (AMP)
  • Data augmentation with Albumentations
  • Automatic best-checkpoint saving
  • GPU and CPU inference support
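
As a rough illustration of the combined Cross-Entropy + Dice objective listed above, here is a minimal NumPy sketch. It is not the code from train.py: the function name `ce_dice_loss`, the equal weighting (`dice_weight=0.5`), and the smoothing constant `eps` are assumptions made for this example, and the actual training script may weight or smooth the two terms differently.

```python
import numpy as np

def softmax(logits, axis=0):
    # Subtract the per-pixel max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def ce_dice_loss(logits, target, dice_weight=0.5, eps=1e-6):
    """Hypothetical combined loss.

    logits: float array of shape (2, H, W), one channel per class.
    target: int array of shape (H, W) with values 0 (background) or 1 (foreground).
    """
    probs = softmax(logits, axis=0)

    # Cross-entropy: mean negative log-probability assigned to the true class.
    true_class_prob = np.where(target == 1, probs[1], probs[0])
    ce = -np.log(np.clip(true_class_prob, eps, 1.0)).mean()

    # Soft Dice on the foreground channel (assumed; eps avoids division by zero).
    fg = probs[1]
    t = target.astype(np.float64)
    intersection = (fg * t).sum()
    dice = (2.0 * intersection + eps) / (fg.sum() + t.sum() + eps)

    # Weighted sum of the two terms; the 50/50 split is an assumption.
    return (1.0 - dice_weight) * ce + dice_weight * (1.0 - dice)
```

Cross-entropy penalizes each pixel independently, while the Dice term scores overlap over the whole mask, which tends to help when foreground pixels are a small fraction of the image.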

Training

Training is handled via train.py. The script:

  • Loads paired images and masks
  • Applies normalization and augmentation
  • Trains using mixed precision
  • Logs training and validation loss
  • Computes validation IoU

The best-performing checkpoint (selected by validation IoU) is saved to: models/checkpoint.pth
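
For reference, validation IoU for a binary mask can be computed along the following lines. This is a sketch only: `binary_iou` is a hypothetical helper, and it is an assumption that the metric is taken over the foreground class (train.py may instead average IoU over both classes).

```python
import numpy as np

def binary_iou(pred, target):
    """Foreground IoU between two (H, W) class maps with values 0 or 1."""
    pred_fg = pred == 1
    target_fg = target == 1
    intersection = np.logical_and(pred_fg, target_fg).sum()
    union = np.logical_or(pred_fg, target_fg).sum()
    # If neither mask contains foreground, treat the prediction as a perfect match.
    if union == 0:
        return 1.0
    return float(intersection / union)

# Usage: one overlapping foreground pixel out of two in the union.
pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
score = binary_iou(pred, target)
```

A training loop would typically compare this score against the best value seen so far after each epoch and overwrite the checkpoint only when it improves.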

Training curves and a sample prediction image are also exported.

Inference

predict.py loads the saved checkpoint and performs inference by:

  1. Normalizing the input image using ImageNet mean and standard deviation
  2. Running the model on GPU if available, otherwise CPU
  3. Applying softmax and argmax to produce class predictions
  4. Exporting a grayscale PNG mask where:
       • 0 = background
       • 255 = foreground

Output is written to: /tmp/output.png
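
The pre- and post-processing around the model call can be sketched in NumPy as follows. The helper names `normalize` and `logits_to_mask` are hypothetical and do not come from predict.py. The mean and standard deviation are the commonly used ImageNet RGB statistics, and the (2, H, W) logits layout is an assumption about the model output.

```python
import numpy as np

# Commonly used ImageNet channel statistics (RGB order, for inputs scaled to [0, 1]).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize(image_uint8):
    """Convert an (H, W, 3) uint8 RGB image to a (3, H, W) float32 array."""
    x = image_uint8.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    # Move channels first, the layout most PyTorch models expect.
    return np.transpose(x, (2, 0, 1))

def logits_to_mask(logits):
    """Convert (2, H, W) class logits to an (H, W) uint8 mask of 0 or 255."""
    # Softmax over the class axis, shifted by the max for numerical stability.
    shifted = logits - logits.max(axis=0, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=0, keepdims=True)
    # Argmax picks the most likely class per pixel: 0 = background, 1 = foreground.
    classes = probs.argmax(axis=0)
    # Scale class indices to grayscale values.
    return (classes * 255).astype(np.uint8)
```

Since softmax does not change the ordering of the class scores, argmax over the raw logits would give the same mask. The softmax step matters only if the probabilities themselves are needed, for example to apply a custom threshold.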
