feat(observability): add mana-search, mana-media, and Synapse to monitoring

- Add Prometheus scraping for mana-search (port 3020, already has metrics)
- Add Prometheus scraping for mana-media (port 3015, MetricsModule added)
- Add Prometheus scraping for Matrix Synapse (port 9002, already enabled)
- Add MetricsModule to mana-media with media_ prefix
- Update Dockerfile for mana-media to include shared-nestjs-metrics
- Replace hardcoded ServiceDown alert list with dynamic regex
  (.*-backend|mana-core-auth|mana-search|mana-media|synapse)
- Replace hardcoded backends.json query with dynamic regex
- Add Search, Media, Synapse to master-overview and system-overview dashboards

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Till JS 2026-03-23 10:46:59 +01:00
parent 5bcbb4b71d
commit 143112f77a
9 changed files with 1160 additions and 310 deletions


@@ -3,7 +3,7 @@ groups:
     rules:
       # Service Down Alert
       - alert: ServiceDown
-        expr: up{job=~"mana-core-auth|chat-backend|todo-backend|calendar-backend|clock-backend|contacts-backend|storage-backend|presi-backend|nutriphi-backend|skilltree-backend|photos-backend|zitare-backend|mukke-backend|planta-backend|picture-backend"} == 0
+        expr: up{job=~"mana-core-auth|.*-backend|mana-search|mana-media|synapse"} == 0
         for: 1m
         labels:
           severity: critical
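
The move from an explicit job list to `.*-backend` works because Prometheus anchors `=~` matchers at both ends, so the pattern has to match the entire `job` value. The sketch below imitates that behaviour with Python's `re.fullmatch` (Prometheus itself uses RE2, which should behave the same for a pattern this simple). The job names `billing-worker` and `chat-backend-canary` are hypothetical and appear only for illustration.

```python
import re

# Pattern copied from the new ServiceDown expression.
JOB_PATTERN = r"mana-core-auth|.*-backend|mana-search|mana-media|synapse"


def is_covered(job: str) -> bool:
    """Return True if the ServiceDown selector would match this job label.

    Prometheus regex matchers are fully anchored, so re.fullmatch is the
    closest standard-library equivalent.
    """
    return re.fullmatch(JOB_PATTERN, job) is not None


if __name__ == "__main__":
    # The last two names are hypothetical examples, not jobs from this repo.
    for job in ["chat-backend", "mana-search", "synapse",
                "billing-worker", "chat-backend-canary"]:
        print(f"{job}: {is_covered(job)}")
```

Under this reading, any future job whose name ends in `-backend` is picked up automatically, while a job with any other naming scheme still has to be added to the alternation by hand.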


@@ -154,6 +154,31 @@ scrape_configs:
     metrics_path: '/metrics'
     scrape_interval: 30s
+  # ============================================
+  # Core Services
+  # ============================================
+  # Mana Search Service
+  - job_name: 'mana-search'
+    static_configs:
+      - targets: ['mana-search:3020']
+    metrics_path: '/metrics'
+    scrape_interval: 30s
+  # Mana Media Service
+  - job_name: 'mana-media'
+    static_configs:
+      - targets: ['mana-media:3015']
+    metrics_path: '/metrics'
+    scrape_interval: 30s
+  # Matrix Synapse
+  - job_name: 'synapse'
+    static_configs:
+      - targets: ['synapse:9002']
+    metrics_path: '/_synapse/metrics'
+    scrape_interval: 30s
   # ============================================
   # Pushgateway (deploy metrics, batch jobs)
   # ============================================
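
Before relying on the new scrape jobs, it can help to confirm that each target actually serves metrics at the configured path, since Synapse uses `/_synapse/metrics` rather than `/metrics`. The following is a minimal sketch, assuming the script runs somewhere the service hostnames resolve (for example inside the same Docker network) and that the targets speak plain HTTP, which is the Prometheus default scheme. The helper names `scrape_url` and `exposes_metrics` are hypothetical.

```python
from http.client import HTTPException
from urllib.request import urlopen

# (target, metrics_path) pairs copied from the scrape jobs added in this diff.
JOBS = {
    "mana-search": ("mana-search:3020", "/metrics"),
    "mana-media": ("mana-media:3015", "/metrics"),
    "synapse": ("synapse:9002", "/_synapse/metrics"),
}


def scrape_url(target: str, metrics_path: str, scheme: str = "http") -> str:
    """Build the URL Prometheus would request for a static target."""
    return f"{scheme}://{target}{metrics_path}"


def exposes_metrics(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers 200 with at least one sample line.

    Lines starting with '#' are HELP/TYPE comments in the text exposition
    format, so any other non-empty line is treated as a sample.
    """
    try:
        with urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            body = resp.read().decode("utf-8", errors="replace")
    except (OSError, HTTPException):
        # Covers DNS failures, refused connections, timeouts, and HTTP errors.
        return False
    return any(line.strip() and not line.startswith("#")
               for line in body.splitlines())
```

A quick check could loop over `JOBS` and print `exposes_metrics(scrape_url(*JOBS[name]))` for each entry. A `False` result for a job would suggest the port, the path, or the service's metrics setup needs another look.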