Building a Production Grafana Dashboard for AWS ECS & EC2
Why Custom Grafana Dashboards?
AWS CloudWatch is powerful, but its default dashboards lack the business-level context you need. Custom Grafana dashboards show exactly the metrics that matter for your specific service, all in one place.
Architecture Overview
The monitoring stack connects CloudWatch metrics to Grafana via the CloudWatch datasource plugin. EC2 and on-premise nodes run Zabbix agents that report to a Zabbix server, which Grafana queries through the Zabbix plugin. Logs flow through the ELK Stack and are explored in Kibana. Alerts route through PagerDuty for on-call management.
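As a rough illustration of the two Grafana datasources described above, the sketch below builds a provisioning structure in Python. The function name, region, and Zabbix URL are placeholders rather than values from this setup, and credentials are deliberately left out. Grafana reads provisioning files as YAML, so the structure would still need to be written out in that form.

```python
import json


def build_datasources(region, zabbix_api_url):
    """Return a Grafana datasource provisioning structure (hypothetical helper)."""
    return {
        "apiVersion": 1,
        "datasources": [
            {
                # CloudWatch datasource; "default" uses the AWS SDK credential chain.
                "name": "CloudWatch",
                "type": "cloudwatch",
                "jsonData": {"authType": "default", "defaultRegion": region},
            },
            {
                # Zabbix datasource plugin; the URL points at the Zabbix API endpoint.
                "name": "Zabbix",
                "type": "alexanderzobnin-zabbix-datasource",
                "url": zabbix_api_url,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder region and host, for illustration only.
    config = build_datasources(
        "eu-west-1", "http://zabbix.example.internal/api_jsonrpc.php"
    )
    print(json.dumps(config, indent=2))
```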
Key ECS Metrics to Track
CPUUtilization per service and per task to detect performance issues early.
MemoryUtilization to catch memory leaks before they cause outages.
RunningTaskCount (published when Container Insights is enabled) to detect unexpected scaling events or task crashes.
ALB RequestCount and TargetResponseTime for end-user experience monitoring.
5xx error rate, which is the most critical metric for SLA tracking.
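The metrics listed above could be expressed as CloudWatch metric queries along the following lines. This is a minimal sketch, not a definitive setup: the function name and arguments are hypothetical, the choice of statistics (Average, Sum, p95) is an assumption, and the 5xx error rate is derived here with a metric math expression over HTTPCode_Target_5XX_Count and RequestCount.

```python
def ecs_metric_queries(cluster, service, load_balancer, period=60):
    """Build a list of CloudWatch metric queries for one ECS service (sketch)."""

    def stat(query_id, namespace, name, dims, statistic):
        # One plain metric query in the GetMetricData request shape.
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": namespace,
                    "MetricName": name,
                    "Dimensions": [
                        {"Name": k, "Value": v} for k, v in dims.items()
                    ],
                },
                "Period": period,
                "Stat": statistic,
            },
        }

    ecs = {"ClusterName": cluster, "ServiceName": service}
    alb = {"LoadBalancer": load_balancer}
    return [
        stat("cpu", "AWS/ECS", "CPUUtilization", ecs, "Average"),
        stat("mem", "AWS/ECS", "MemoryUtilization", ecs, "Average"),
        # RunningTaskCount is assumed to come from Container Insights.
        stat("tasks", "ECS/ContainerInsights", "RunningTaskCount", ecs, "Average"),
        stat("requests", "AWS/ApplicationELB", "RequestCount", alb, "Sum"),
        stat("latency", "AWS/ApplicationELB", "TargetResponseTime", alb, "p95"),
        stat("errors5xx", "AWS/ApplicationELB", "HTTPCode_Target_5XX_Count", alb, "Sum"),
        {
            # Metric math: 5xx responses as a percentage of all requests.
            "Id": "error_rate",
            "Expression": "100 * errors5xx / requests",
            "Label": "5xx error rate (%)",
        },
    ]
```

A list like this could be passed as the MetricDataQueries argument of boto3's get_metric_data call, or the same namespaces, metric names, and expression could be entered in a Grafana CloudWatch panel.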
Setting Up PagerDuty Alerting
Connect Grafana contact points to PagerDuty using your integration key. Alert when CPU utilisation reaches 80% or when 5xx errors exceed 1% of total requests. Configure escalation policies in PagerDuty so the right person gets paged at the right time.
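Grafana evaluates alert rules and sends the PagerDuty event itself, so no custom code is required. Purely to make the two thresholds concrete, here is a small Python sketch of the same logic together with an event body in the shape of the PagerDuty Events API v2. The function names are hypothetical, and treating 80% CPU as inclusive while 1% errors is exclusive is an assumption.

```python
CPU_THRESHOLD_PERCENT = 80.0
ERROR_RATE_THRESHOLD_PERCENT = 1.0


def breached_rules(cpu_percent, total_requests, errors_5xx):
    """Return a human-readable reason for every threshold that is breached."""
    reasons = []
    # Assumption: CPU at or above 80% counts as a breach.
    if cpu_percent >= CPU_THRESHOLD_PERCENT:
        reasons.append(f"CPU utilisation at {cpu_percent:.1f}%")
    # Skip the error-rate check when there is no traffic to avoid dividing by zero.
    if total_requests > 0:
        error_rate = 100.0 * errors_5xx / total_requests
        if error_rate > ERROR_RATE_THRESHOLD_PERCENT:
            reasons.append(f"5xx error rate at {error_rate:.2f}%")
    return reasons


def pagerduty_event(integration_key, service, reasons):
    """Build a trigger event body in the PagerDuty Events API v2 shape."""
    return {
        "routing_key": integration_key,
        "event_action": "trigger",
        "payload": {
            "summary": f"{service}: " + "; ".join(reasons),
            "source": service,
            "severity": "critical",
        },
    }
```

For example, breached_rules(85, 1000, 20) reports both a CPU breach and an error-rate breach, while breached_rules(50, 1000, 5) returns an empty list and no event would be sent.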
Zabbix for EC2 and On-Premise Nodes
Install the Zabbix agent on each EC2 instance and on-premise node, and configure it to report to your Zabbix server, which runs on a dedicated EC2 instance. Add the Zabbix datasource in Grafana to pull node metrics into your unified dashboard.
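Pointing an agent at the server comes down to a few lines in the agent configuration file. The sketch below renders those lines in Python, which may be handy when templating many hosts. The helper name and the example address and hostname are hypothetical, and setting ServerActive to the same address as Server assumes you want active checks as well as passive ones.

```python
def render_agent_config(zabbix_server, hostname):
    """Render the core zabbix_agentd.conf settings for one node (sketch)."""
    lines = [
        # Address allowed to poll this agent (passive checks).
        f"Server={zabbix_server}",
        # Address the agent pushes data to (active checks); assumed identical here.
        f"ServerActive={zabbix_server}",
        # Must match the host name registered on the Zabbix server.
        f"Hostname={hostname}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder address and host name, for illustration only.
    print(render_agent_config("10.0.0.5", "web-1"), end="")
```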
Result
The result is a single pane of glass showing ECS service health, EC2 node metrics, error rates, and latency. PagerDuty alerts fire automatically when thresholds are breached, often before users notice issues, giving you time to respond proactively.
Need a monitoring stack for your infrastructure? Schedule a meeting.