till/managarten

mirror of https://github.com/Memo-2023/mana-monorepo.git synced 2026-05-18 14:09:41 +02:00

Commit graph

Author	SHA1	Message	Date

Till JS

6fa6509fa5

feat(observability): add metrics and monitoring for all 15 backends

- Add MetricsModule to 8 backends missing it (photos, zitare, mukke,
  planta, picture, storage, presi, nutriphi)
- Enable Prometheus scraping for all 15 backends in prometheus.yml
  (was only 6, with 3 commented out and 6 missing entirely)
- Update ServiceDown alert rule to cover all 15 backends
- Update Grafana dashboards (backends, master-overview, system-overview)
  with all backend services in health panels
- Fix imprecise regex in application-details dashboard

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-23 09:09:04 +01:00

Till-JS

fdaf6a9c75

🔧 fix(dashboards): fix broken panels and metrics

- Backends: Remove Docker container section (cAdvisor not deployed)
- Backends: Add Auth Service Runtime section with correct auth_ prefixed metrics
- Backends: Rename to "Backends Overview"
- Application Details: Fix Node.js Runtime to use auth_ prefixed metrics
- Application Details: Rename section to "Auth Service Runtime"

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-01 12:54:07 +01:00

Till-JS

8c259a008b

feat(monitoring): add comprehensive Grafana dashboards and alerting

New dashboards:
- Application Details: Node.js runtime (heap, event loop, GC),
  HTTP details (status codes, methods, top routes), error analysis
- Database Details: PostgreSQL and Redis metrics with detailed breakdowns

Alerting rules (docker/prometheus/alerts.yml):
- Service: down, high/very high error rate, slow response time
- Infrastructure: high CPU/memory/disk usage
- Database: PostgreSQL/Redis down, high connections, low cache hit
- Container: high CPU/memory, restarts

All dashboards include service selector variable for filtering.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-26 09:47:18 +01:00

3 commits