feat(monitoring): add comprehensive Grafana dashboards and alerting

New dashboards:
- Application Details: Node.js runtime (heap, event loop, GC),
  HTTP details (status codes, methods, top routes), error analysis
- Database Details: PostgreSQL and Redis metrics with detailed breakdowns

Alerting rules (docker/prometheus/alerts.yml):
- Service: down, high/very high error rate, slow response time
- Infrastructure: high CPU/memory/disk usage
- Database: PostgreSQL/Redis down, high connections, low cache hit ratio
- Container: high CPU/memory, restarts
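
For illustration only, a minimal sketch of what a "service down" rule of the kind listed above might look like. The group name, alert name, `for` duration, and severity label are assumptions, not taken from docker/prometheus/alerts.yml; only the built-in `up` metric is standard Prometheus.

```yaml
# Hypothetical example; the real rules live in docker/prometheus/alerts.yml.
groups:
  - name: service            # assumed group name
    rules:
      - alert: ServiceDown   # assumed alert name
        # `up` is set by Prometheus per scrape target: 1 = scrape succeeded, 0 = failed.
        expr: up == 0
        for: 1m              # assumed threshold before the alert fires
        labels:
          severity: critical # assumed label value
        annotations:
          summary: "{{ $labels.job }} on {{ $labels.instance }} is down"
```

Note that Prometheus loads a rule file only when prometheus.yml lists its path under `rule_files`; mounting the file into the container is not enough on its own.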

All dashboards include a service selector variable for filtering.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Till-JS 2026-01-26 09:47:18 +01:00
parent 41dea775a6
commit 8c259a008b
5 changed files with 2029 additions and 0 deletions

@@ -533,6 +533,7 @@ services:
- '--web.enable-lifecycle'
volumes:
- ./docker/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
+ - ./docker/prometheus/alerts.yml:/etc/prometheus/alerts.yml:ro
- prometheus_data:/prometheus
ports:
- "9090:9090"