- Replace Prometheus with VictoriaMetrics (2-year retention)
- Add DuckDB analytics module for business KPIs (unlimited retention)
- Add master overview dashboard combining all metrics
- Add business metrics dashboard for user growth tracking
- Add backup script for VictoriaMetrics snapshots and DuckDB
- Add ADR documentation for monitoring stack decision
Analytics API endpoints:
- GET /api/v1/analytics/health - Service health
- GET /api/v1/analytics/latest - Latest metrics snapshot
- GET /api/v1/analytics/growth - User growth over time
- GET /api/v1/analytics/monthly - Monthly aggregates
- POST /api/v1/analytics/snapshot - Manual snapshot trigger
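The growth and monthly endpoints above suggest a roll-up from daily snapshots to per-month figures. A minimal sketch of such a roll-up follows. The `DailySnapshot` and `MonthlyAggregate` shapes and the `toMonthly` helper are hypothetical, since the commit does not show the response schema or the DuckDB queries behind these endpoints.

```typescript
// Hypothetical shape of one daily snapshot row (not taken from the codebase).
interface DailySnapshot {
  date: string; // ISO date, e.g. "2025-01-31"
  totalUsers: number;
  newUsers: number;
}

// Hypothetical monthly roll-up.
interface MonthlyAggregate {
  month: string; // "YYYY-MM"
  newUsers: number; // sum of daily new users in the month
  totalUsersAtEnd: number; // total from the latest snapshot in the month
}

function toMonthly(snapshots: DailySnapshot[]): MonthlyAggregate[] {
  // Sort by date so the last snapshot seen per month is the latest one.
  const sorted = snapshots.slice().sort((a, b) => a.date.localeCompare(b.date));
  const byMonth: Record<string, MonthlyAggregate> = {};
  for (const s of sorted) {
    const month = s.date.slice(0, 7);
    const agg = byMonth[month] || { month, newUsers: 0, totalUsersAtEnd: 0 };
    agg.newUsers += s.newUsers;
    agg.totalUsersAtEnd = s.totalUsers;
    byMonth[month] = agg;
  }
  // "YYYY-MM" keys sort chronologically as plain strings.
  return Object.keys(byMonth)
    .sort()
    .map((month) => byMonth[month]);
}
```

In this sketch the month total is taken from the last snapshot of the month instead of being summed, because a running total is a point-in-time value while new registrations are additive.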
- Add user metrics to mana-core-auth MetricsService:
  - auth_users_total: Total registered users
  - auth_users_verified: Email-verified users
  - auth_users_created_today/this_week/this_month: Users registered in each period
- Create Grafana user-statistics dashboard with:
  - User overview stats (total, verified, verification rate, new today)
  - Registration period breakdown (today/week/month)
  - User growth trends over time
- Enhance telegram-stats-bot /users command:
  - Add comparison with yesterday, including trends
  - Add week-over-week comparison
  - Add mini bar chart of registrations over the last 7 days
- Include user stats in daily Telegram report
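The /users comparisons and the 7-day bar chart described above could be computed roughly as follows. This is only a sketch: the function names, the trend symbols, and the bar characters are assumptions, and the bot's actual formatting is not shown in this commit.

```typescript
// Percentage change from previous to current, or null when previous is zero.
function percentChange(current: number, previous: number): number | null {
  if (previous === 0) return null;
  return ((current - previous) / previous) * 100;
}

// Format a trend such as "▲ +25.0%" or "▼ -10.0%".
function formatTrend(current: number, previous: number): string {
  const change = percentChange(current, previous);
  if (change === null) return "n/a";
  const arrow = change > 0 ? "▲" : change < 0 ? "▼" : "=";
  const sign = change > 0 ? "+" : "";
  return `${arrow} ${sign}${change.toFixed(1)}%`;
}

// Sum the last 7 and the preceding 7 entries of a daily series (oldest first).
function weekOverWeek(daily: number[]): { thisWeek: number; lastWeek: number } {
  const sum = (xs: number[]) => xs.reduce((a, b) => a + b, 0);
  return { thisWeek: sum(daily.slice(-7)), lastWeek: sum(daily.slice(-14, -7)) };
}

// Build one bar of `width` cells with the first `filled` cells solid.
function bar(filled: number, width: number): string {
  let out = "";
  for (let i = 0; i < width; i++) out += i < filled ? "█" : "░";
  return out;
}

// Render one text bar per day, scaled so the busiest day fills the full width.
function miniBarChart(days: { label: string; count: number }[], width = 10): string {
  const max = Math.max(1, ...days.map((d) => d.count));
  return days
    .map((d) => `${d.label} ${bar(Math.round((d.count / max) * width), width)} ${d.count}`)
    .join("\n");
}
```

Returning "n/a" when the previous value is zero avoids a division by zero on days with no earlier registrations.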
Revert 618c58c5, which broke the CI workflow.
Notifications will be re-added once the issue is fixed.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add notify-start job with Telegram notification for build start
- Add notify-complete job with build status and duration notification
- Push CI metrics to Prometheus Pushgateway for Grafana visualization
- Create CI/CD Grafana dashboard with build status, duration, and history
- Add Pushgateway scrape config to Prometheus
Requires TELEGRAM_BOT_TOKEN, TELEGRAM_CHAT_ID, and PUSHGATEWAY_URL secrets.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
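The Pushgateway push in the CI notification change above can be sketched as two pure helpers, one building the push URL and one building the request body. The metric names `ci_build_status` and `ci_build_duration_seconds`, the helper names, and the `workflow` grouping label are hypothetical. The sketch also assumes label values contain no slashes, which Pushgateway handles through a separate base64 path form.

```typescript
// Build a Pushgateway URL of the form <base>/metrics/job/<job>{/<label>/<value>}.
function pushUrl(baseUrl: string, job: string, labels: Record<string, string> = {}): string {
  let url = `${baseUrl.replace(/\/+$/, "")}/metrics/job/${encodeURIComponent(job)}`;
  for (const label of Object.keys(labels)) {
    url += `/${label}/${encodeURIComponent(labels[label])}`;
  }
  return url;
}

// Build a body in the Prometheus text exposition format.
// Metric names are illustrative, not the ones used by the workflow.
function buildPayload(success: boolean, durationSeconds: number): string {
  return [
    "# TYPE ci_build_status gauge",
    `ci_build_status ${success ? 1 : 0}`,
    "# TYPE ci_build_duration_seconds gauge",
    `ci_build_duration_seconds ${durationSeconds}`,
    "", // the body must end with a newline
  ].join("\n");
}
```

Sending the body to that URL with an HTTP POST or PUT is left out here so the sketch stays free of network code.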
New dashboards:
- Application Details: Node.js runtime (heap, event loop, GC),
  HTTP details (status codes, methods, top routes), error analysis
- Database Details: PostgreSQL and Redis metrics with detailed breakdowns
Alerting rules (docker/prometheus/alerts.yml):
- Service: down, high/very high error rate, slow response time
- Infrastructure: high CPU/memory/disk usage
- Database: PostgreSQL/Redis down, high connections, low cache hit rate
- Container: high CPU/memory, restarts
All dashboards include a service selector variable for filtering.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
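The two error-rate tiers in the alerting rules above (high and very high) amount to a threshold check on the share of failed requests. The sketch below illustrates that logic only. The 5% and 10% thresholds and all names are assumptions, since the real expressions live in docker/prometheus/alerts.yml and are not shown here.

```typescript
type Severity = "ok" | "high" | "very_high";

// Assumed thresholds for illustration; not taken from alerts.yml.
const HIGH_ERROR_RATE = 0.05;
const VERY_HIGH_ERROR_RATE = 0.1;

function classifyErrorRate(errorRequests: number, totalRequests: number): Severity {
  if (totalRequests <= 0) return "ok"; // no traffic, nothing to alert on
  const rate = errorRequests / totalRequests;
  if (rate > VERY_HIGH_ERROR_RATE) return "very_high";
  if (rate > HIGH_ERROR_RATE) return "high";
  return "ok";
}
```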
- Add MetricsModule with prom-client for todo backend
- Add MetricsInterceptor for request tracking
- Update COMMANDS.md with presi and storage commands
- Update Grafana dashboards for backend monitoring
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
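The request tracking added with MetricsInterceptor above boils down to counting finished requests by method, route, and status, then exposing the counts for scraping. The sketch below is a dependency-free stand-in for that idea, not the prom-client or interceptor code from the commit. The `RequestCounter` class and the `http_requests_total` metric name are hypothetical, and label values are assumed to contain no quotes, backslashes, or newlines.

```typescript
class RequestCounter {
  private counts: Record<string, number> = {};

  // Record one finished request under its label combination.
  record(method: string, route: string, status: number): void {
    const key = `method="${method}",route="${route}",status="${status}"`;
    this.counts[key] = (this.counts[key] || 0) + 1;
  }

  // Render all counts in the Prometheus text exposition format.
  render(): string {
    const lines = [
      "# HELP http_requests_total Total HTTP requests handled.",
      "# TYPE http_requests_total counter",
    ];
    for (const key of Object.keys(this.counts).sort()) {
      lines.push(`http_requests_total{${key}} ${this.counts[key]}`);
    }
    return lines.join("\n") + "\n";
  }
}
```

An interceptor would call `record` once per response, and a metrics endpoint would return the output of `render`.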