feat(monitoring): add LLM Grafana dashboard, Prometheus scraping, and alerts

Wire mana-llm service into the monitoring stack:

Prometheus (docker/prometheus/prometheus.yml):
- Add mana-llm scrape job (port 3025, 15s interval)
- Include mana-llm in ServiceDown alert expression
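A hedged sketch of what the extended ServiceDown rule might look like after this change — the job regex and annotation wording are assumptions, since only the mana-llm addition is stated in this commit:

```yaml
# Sketch, not the verbatim rule from docker/prometheus/alerts.yml
- alert: ServiceDown
  # up == 0 means Prometheus failed its last scrape of the target
  expr: up{job=~"mana-llm|mana-search"} == 0
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: "Service {{ $labels.job }} is down"
```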

Alerts (docker/prometheus/alerts.yml):
- New llm_alerts group with 4 rules:
  - LLMServiceDown: mana-llm down > 1 min (critical)
  - LLMHighErrorRate: > 10% errors for 5 min (warning)
  - OllamaProviderDown: > 50% requests via Google fallback (warning)
  - LLMSlowResponses: p95 > 30s for 5 min (warning)
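The four rules above could be sketched as a rule group like the following. The metric names (`llm_requests_total`, `llm_request_duration_seconds`) and label values (`status`, `provider`) are assumptions for illustration, not taken from the repo:

```yaml
# Sketch of the llm_alerts group; metric and label names are assumed
groups:
  - name: llm_alerts
    rules:
      - alert: LLMServiceDown
        expr: up{job="mana-llm"} == 0
        for: 1m
        labels: { severity: critical }
      - alert: LLMHighErrorRate
        # More than 10% of requests erroring over the last 5 minutes
        expr: >
          sum(rate(llm_requests_total{status="error"}[5m]))
            / sum(rate(llm_requests_total[5m])) > 0.10
        for: 5m
        labels: { severity: warning }
      - alert: OllamaProviderDown
        # Over half of traffic served via the Google fallback provider
        expr: >
          sum(rate(llm_requests_total{provider="google"}[5m]))
            / sum(rate(llm_requests_total[5m])) > 0.50
        for: 5m
        labels: { severity: warning }
      - alert: LLMSlowResponses
        # p95 request latency above 30 seconds
        expr: >
          histogram_quantile(0.95,
            sum(rate(llm_request_duration_seconds_bucket[5m])) by (le)) > 30
        for: 5m
        labels: { severity: warning }
```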

Grafana Dashboard (docker/grafana/dashboards/mana-llm.json):
- 6 stat panels: status, req/min, error rate, fallback rate, latency, tokens/min
- Requests by Provider (stacked area: Ollama vs Google vs OpenRouter)
- Tokens by Type (prompt vs completion)
- Latency Percentiles (p50, p90, p99)
- Latency by Provider comparison
- Requests by Model breakdown
- Errors by Type
- Google Fallback Rate over time (with threshold coloring)
- Provider Distribution pie chart (24h)
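The Latency Percentiles panel would typically be driven by `histogram_quantile` over a duration histogram. A sketch, assuming a hypothetical `llm_request_duration_seconds` histogram metric (not confirmed by this commit):

```promql
# p50 / p90 / p99 as three queries on one Grafana panel (metric name assumed)
histogram_quantile(0.50, sum(rate(llm_request_duration_seconds_bucket[5m])) by (le))
histogram_quantile(0.90, sum(rate(llm_request_duration_seconds_bucket[5m])) by (le))
histogram_quantile(0.99, sum(rate(llm_request_duration_seconds_bucket[5m])) by (le))
```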

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Till JS 2026-03-24 11:16:27 +01:00
parent 57a2841168
commit 169821de1a
3 changed files with 477 additions and 1 deletions


@@ -158,6 +158,13 @@ scrape_configs:
   # ============================================
   # Core Services
   # ============================================
+  # Mana LLM Gateway (Ollama + Google Fallback)
+  - job_name: 'mana-llm'
+    static_configs:
+      - targets: ['mana-llm:3025']
+    metrics_path: '/metrics'
+    scrape_interval: 15s
+
   # Mana Search Service
   - job_name: 'mana-search'
     static_configs: