- check-disk-space.sh now pushes mac_disk_used_percent + mac_colima_disk_used_gb
to Pushgateway every hour so vmalert can alert on real macOS disk usage
- alerts.yml: replace broken node-exporter disk alerts with Pushgateway-based ones
- master-overview.json: add "Recent Errors (Loki)" section with live error log
stream, error rate timeseries and top error sources barchart
- move-colima-to-external-ssd.sh: guided script to move 200GB Colima VM
datadisk from internal SSD to /Volumes/ManaData (3.6TB external SSD)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add MetricsModule to 8 backends missing it (photos, zitare, mukke,
planta, picture, storage, presi, nutriphi)
- Enable Prometheus scraping for all 15 backends in prometheus.yml
(was only 6, with 3 commented out and 6 missing entirely)
- Update ServiceDown alert rule to cover all 15 backends
- Update Grafana dashboards (backends, master-overview, system-overview)
with all backend services in health panels
- Fix imprecise regex in application-details dashboard
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Added Total Requests counter for overall user interaction
- Added Requests/sec for current load visibility
- Reduced panel width to fit 8 metrics in one row
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Move Key Metrics section to top of dashboard
- Add new panels: Services UP, Apps Running, Matrix Bots, Avg Response Time
- Reorganize layout for better overview at a glance
- Remove CPU/Memory/Disk (no node-exporter), add Redis Keys
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Replace Prometheus with VictoriaMetrics (2-year retention)
- Add DuckDB analytics module for business KPIs (unlimited retention)
- Add master overview dashboard combining all metrics
- Add business metrics dashboard for user growth tracking
- Add backup script for VictoriaMetrics snapshots and DuckDB
- Add ADR documentation for monitoring stack decision
Analytics API endpoints:
- GET /api/v1/analytics/health - Service health
- GET /api/v1/analytics/latest - Latest metrics snapshot
- GET /api/v1/analytics/growth - User growth over time
- GET /api/v1/analytics/monthly - Monthly aggregates
- POST /api/v1/analytics/snapshot - Manual snapshot trigger