
Monitor a deployment

Track deployment performance and health


Deployments provide tools to help you track performance, diagnose issues, and monitor your model’s health.

Metrics dashboard

Each deployment provides a metrics dashboard to help you understand how your model is performing in production.

You can view up to 24 hours of historical metrics data, aggregated into 15-minute intervals.
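As a rough sketch of what that interval aggregation looks like, the snippet below buckets raw (timestamp, value) samples into 15-minute intervals and averages each bucket. The mean aggregation and the sample shape are illustrative assumptions; the dashboard's exact aggregation function isn't specified here.

```python
from datetime import datetime, timedelta, timezone

def bucket_15min(samples):
    """Group (timestamp, value) samples into 15-minute buckets and
    return the mean value per bucket (hypothetical aggregation)."""
    buckets = {}
    for ts, value in samples:
        # Floor the timestamp to the start of its 15-minute interval.
        floored = ts.replace(minute=ts.minute - ts.minute % 15,
                             second=0, microsecond=0)
        buckets.setdefault(floored, []).append(value)
    return {k: sum(v) / len(v) for k, v in sorted(buckets.items())}

base = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
samples = [(base + timedelta(minutes=m), float(m)) for m in (0, 5, 10, 20, 31)]
result = bucket_15min(samples)
# Three buckets: 12:00 (mean of 0, 5, 10), 12:15, and 12:30.
```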

Available metrics

The dashboard tracks the following metrics:

  • Latency: Track response times and identify performance bottlenecks
  • Throughput: Monitor requests per second and capacity utilization
  • Error rates: Identify and troubleshoot failed predictions
  • Instance status: See how many instances are starting, idle, or actively processing requests
  • Queue depth: Track pending predictions waiting for processing
  • GPU memory usage: See how much GPU memory your deployment is using across all instances

The metrics graphs automatically refresh and provide interactive controls for zooming and filtering data by time range.

Access your deployment metrics by visiting replicate.com/deployments and selecting the deployment you want to monitor.

GPU memory monitoring

GPU memory monitoring helps you optimize resource utilization and ensure your models are running efficiently. The GPU memory visualization shows:

  • Total memory available: The total GPU memory capacity allocated to your deployment across all instances.
  • Memory usage patterns: Both median and maximum GPU memory usage over configurable time periods (2 hours or 24 hours).
  • Multi-instance aggregation: Memory usage is aggregated across all running instances in your deployment, giving you a comprehensive view of resource utilization.
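A minimal sketch of this multi-instance median/max aggregation, assuming one list of memory readings (in GB) per instance:

```python
import statistics

def gpu_memory_summary(per_instance_samples):
    """Aggregate GPU memory readings across all instances, mirroring
    the median/max view described above. The input shape (one list of
    GB readings per instance) is an illustrative assumption."""
    combined = [x for inst in per_instance_samples for x in inst]
    return {
        "median_gb": statistics.median(combined),
        "max_gb": max(combined),
        "sample_count": len(combined),
    }

# Two instances reporting readings over the same period.
samples = [[10.1, 10.4, 12.0], [9.8, 11.5]]
summary = gpu_memory_summary(samples)
```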

This monitoring helps you:

  • See whether your model is using GPU memory efficiently
  • Decide whether you need different hardware for better performance
  • Spot memory usage patterns and potential optimizations
  • Plan capacity for scaling your deployment
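For the capacity-planning point above, a back-of-envelope estimate via Little's law (concurrent requests ≈ arrival rate × latency) can suggest how many instances a workload needs. The parameter names and the 20% headroom factor are illustrative assumptions, not Replicate settings:

```python
import math

def instances_needed(peak_rps, median_latency_s,
                     concurrency_per_instance, headroom=1.2):
    """Estimate the instance count for a target load.
    Little's law: concurrent requests ~= arrival rate * latency;
    headroom pads the estimate for traffic spikes (assumed 20%)."""
    concurrent = peak_rps * median_latency_s * headroom
    return math.ceil(concurrent / concurrency_per_instance)

# e.g. 8 req/s at 2 s median latency, 4 concurrent predictions per instance
n = instances_needed(8, 2.0, 4)
```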