managarten/docker/grafana/dashboards
Till JS 169821de1a feat(monitoring): add LLM Grafana dashboard, Prometheus scraping, and alerts
Wire mana-llm service into the monitoring stack:

Prometheus (docker/prometheus/prometheus.yml):
- Add mana-llm scrape job (port 3025, 15s interval)
- Include mana-llm in ServiceDown alert expression
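
For illustration, the scrape job might look roughly like the fragment below. Only the port (3025) and the 15s interval come from this commit message; the job name, the target hostname, and the metrics path are assumptions, not the actual contents of prometheus.yml.

```yaml
# Sketch only, not the real docker/prometheus/prometheus.yml.
scrape_configs:
  - job_name: mana-llm               # assumed job name
    scrape_interval: 15s             # from the commit message
    metrics_path: /metrics           # Prometheus default, assumed unchanged
    static_configs:
      - targets: ['mana-llm:3025']   # hostname assumed; port from the commit message
```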

Alerts (docker/prometheus/alerts.yml):
- New llm_alerts group with 4 rules:
  - LLMServiceDown: mana-llm down > 1 min (critical)
  - LLMHighErrorRate: > 10% errors for 5 min (warning)
  - OllamaProviderDown: > 50% of requests served via Google fallback (warning)
  - LLMSlowResponses: p95 > 30s for 5 min (warning)
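
A minimal sketch of how these four rules could be written follows. The alert names, thresholds, durations, and severities are taken from the list above; the metric names (`llm_requests_total`, `llm_request_duration_seconds_bucket`), the label names (`status`, `provider`), and the rate windows are hypothetical and may not match what mana-llm actually exports.

```yaml
# Sketch only, not the real docker/prometheus/alerts.yml.
# All metric and label names below are hypothetical.
groups:
  - name: llm_alerts
    rules:
      - alert: LLMServiceDown
        expr: up{job="mana-llm"} == 0          # job name assumed
        for: 1m
        labels:
          severity: critical

      - alert: LLMHighErrorRate
        # Share of failed requests over the last 5 minutes.
        expr: |
          sum(rate(llm_requests_total{status="error"}[5m]))
            / sum(rate(llm_requests_total[5m])) > 0.10
        for: 5m
        labels:
          severity: warning

      - alert: OllamaProviderDown
        # Share of requests answered by the Google fallback.
        # The commit message gives no duration, so no "for" is set here.
        expr: |
          sum(rate(llm_requests_total{provider="google"}[5m]))
            / sum(rate(llm_requests_total[5m])) > 0.50
        labels:
          severity: warning

      - alert: LLMSlowResponses
        # p95 latency computed from a histogram (assumes a histogram metric exists).
        expr: |
          histogram_quantile(
            0.95,
            sum by (le) (rate(llm_request_duration_seconds_bucket[5m]))
          ) > 30
        for: 5m
        labels:
          severity: warning
```

Using the fallback share as a proxy for Ollama being down avoids needing a separate health probe for the provider, at the cost of staying silent when traffic is too low to produce a meaningful ratio.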

Grafana Dashboard (docker/grafana/dashboards/mana-llm.json):
- 6 stat panels: status, req/min, error rate, fallback rate, latency, tokens/min
- Requests by Provider (stacked area: Ollama vs Google vs OpenRouter)
- Tokens by Type (prompt vs completion)
- Latency Percentiles (p50, p90, p99)
- Latency by Provider comparison
- Requests by Model breakdown
- Errors by Type
- Google Fallback Rate over time (with threshold coloring)
- Provider Distribution pie chart (24h)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 11:16:27 +01:00
| File | Last commit | Date |
| --- | --- | --- |
| application-details.json | feat(observability): add metrics and monitoring for all 15 backends | 2026-03-23 09:09:04 +01:00 |
| auth-service.json | fix(mana-core-auth): complete production readiness with test fixes | 2026-02-01 14:18:58 +01:00 |
| backends.json | feat(observability): add mana-search, mana-media, and Synapse to monitoring | 2026-03-23 10:46:59 +01:00 |
| business-metrics.json | 📈 feat(monitoring): upgrade to VictoriaMetrics + DuckDB analytics | 2026-01-28 12:38:04 +01:00 |
| database-details.json | feat(monitoring): add comprehensive Grafana dashboards and alerting | 2026-01-26 09:47:18 +01:00 |
| deploy-tracking.json | fix(infra): fix deploy tracking dashboard datasource UIDs and instant queries | 2026-03-20 17:35:41 +01:00 |
| error-tracking.json | feat(grafana): add GlitchTip error tracking dashboard | 2026-03-19 21:14:09 +01:00 |
| mana-llm.json | feat(monitoring): add LLM Grafana dashboard, Prometheus scraping, and alerts | 2026-03-24 11:16:27 +01:00 |
| master-overview.json | feat(observability): add mana-search, mana-media, and Synapse to monitoring | 2026-03-23 10:46:59 +01:00 |
| system-overview.json | feat(observability): add mana-search, mana-media, and Synapse to monitoring | 2026-03-23 10:46:59 +01:00 |
| user-statistics.json | feat(stats): add user statistics to Prometheus metrics and Grafana | 2026-01-26 10:53:57 +01:00 |