Monitoring Stack Documentation
This document describes the Mana monitoring infrastructure, including metrics collection, business analytics, and long-term data retention.
Quick Access
All monitoring tools are publicly accessible. Grafana and Umami need no login; GlitchTip uses the shared guest account listed below.
| Tool | URL | Access |
|---|---|---|
| Grafana | https://grafana.mana.how | No login needed (Anonymous Viewer) |
| Umami | Public Dashboard | No login needed (Public Share) |
| GlitchTip | https://glitchtip.mana.how | guest@mana.how / guestguest |
Grafana Dashboards
| Dashboard | Description |
|---|---|
| Master Overview | CPU, RAM, Disk, Container Status |
| Error Tracking | GlitchTip errors via PostgreSQL datasource |
| Backend Metrics | Request rates, latency, error rates |
| Database Details | PostgreSQL connections, queries |
Umami Public Share Links
GlitchTip Error Tracking
18 backend projects are configured in GlitchTip. See ERROR_TRACKING.md for DSNs and integration details.
Architecture Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│                            Mana Monitoring Stack                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────────┐     ┌──────────────────┐     ┌──────────────────┐         │
│  │   Services   │────▶│ VictoriaMetrics  │────▶│     Grafana      │         │
│  │  (Backends)  │     │ (2yr retention)  │     │   (Dashboards)   │         │
│  └──────────────┘     └──────────────────┘     └──────────────────┘         │
│         │                                               ▲                   │
│         │                                               │                   │
│         ▼                                               │                   │
│  ┌──────────────┐     ┌──────────────────┐              │                   │
│  │  PostgreSQL  │────▶│      DuckDB      │──────────────┘                   │
│  │   (Source)   │     │ (Business KPIs)  │                                  │
│  └──────────────┘     └──────────────────┘                                  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
Components
1. VictoriaMetrics (Operative Metrics)
Purpose: High-performance time-series database for operational metrics (CPU, memory, request latency, etc.)
| Property | Value |
|---|---|
| Image | victoriametrics/victoria-metrics:v1.99.0 |
| Port | 8428 |
| Retention | 2 years |
| Storage | Docker volume mana-victoriametrics |
Why VictoriaMetrics instead of Prometheus?
- 3-10x better compression
- Lower memory usage
- Faster queries over long time ranges
- Drop-in replacement (PromQL compatible)
- Better suited for long-term retention
Endpoints:
# Health check
curl http://localhost:8428/health
# Query metrics (PromQL)
curl "http://localhost:8428/api/v1/query?query=up"
# Query range
curl "http://localhost:8428/api/v1/query_range?query=auth_users_total&start=-1h&step=1m"
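PromQL expressions that contain brackets, braces, or quotes have to be URL-encoded before they go into the query string. A minimal sketch of a wrapper that lets curl do the encoding, assuming the local instance on port 8428 described above; `VM_URL` and `vm_query` are illustrative names, not something defined in the monorepo:

```shell
# Hypothetical helper: run an instant PromQL query against VictoriaMetrics.
# VM_URL and vm_query are illustrative names, not part of the repository.
VM_URL="${VM_URL:-http://localhost:8428}"

vm_query() {
  # $1: PromQL expression.
  # -G turns the --data-urlencode pair into a URL-encoded GET query string.
  curl -fsS -G --data-urlencode "query=$1" "$VM_URL/api/v1/query"
}
```

Calling `vm_query 'up'` or `vm_query 'auth_users_total'` is then equivalent to the plain curl commands above, and more complex expressions can be passed in single quotes without manual escaping.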
2. DuckDB Analytics (Business KPIs)
Purpose: Embedded OLAP database for business metrics with unlimited retention.
| Property | Value |
|---|---|
| Location | /data/analytics/metrics.duckdb (in mana-auth container) |
| Storage | Docker volume mana-analytics |
| Retention | Unlimited |
| Snapshot | Daily at midnight UTC |
Tracked Metrics:
- Total users
- Verified users
- New users (today, this week, this month)
- Database size
- Growth rates
API Endpoints:
| Endpoint | Method | Description |
|---|---|---|
| /api/v1/analytics/health | GET | Service health and database status |
| /api/v1/analytics/latest | GET | Latest metrics snapshot |
| /api/v1/analytics/growth | GET | User growth over time |
| /api/v1/analytics/monthly | GET | Monthly aggregated metrics |
| /api/v1/analytics/summary | GET | Dashboard summary with trends |
| /api/v1/analytics/snapshot | POST | Trigger manual snapshot |
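The snapshot endpoint is the only one that uses POST. A sketch of triggering it outside the daily midnight UTC schedule and then reading the result back, assuming the public base URL used in the example requests in this section. Whether the endpoint requires authentication is not stated here, so the bare call is an assumption; `ANALYTICS_URL` and `trigger_snapshot` are illustrative names:

```shell
# Hypothetical helper names; the base URL follows the examples in this document.
ANALYTICS_URL="${ANALYTICS_URL:-https://auth.mana.how/api/v1/analytics}"

trigger_snapshot() {
  # Ask the service to record a snapshot now instead of waiting for midnight UTC.
  curl -fsS -X POST "$ANALYTICS_URL/snapshot" || return 1
  # Read back the most recent snapshot to confirm it was recorded.
  curl -fsS "$ANALYTICS_URL/latest"
}
```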
Example Responses:
# Health
curl https://auth.mana.how/api/v1/analytics/health
{
  "status": "healthy",
  "database_path": "/data/analytics/metrics.duckdb",
  "total_records": 30,
  "latest_snapshot": "2026-01-28"
}
# Latest metrics
curl https://auth.mana.how/api/v1/analytics/latest
{
  "date": "2026-01-28",
  "total_users": 9,
  "verified_users": 1,
  "new_users_today": 0,
  "new_users_week": 9,
  "new_users_month": 9,
  "total_db_size_bytes": 9613795,
  "recorded_at": "2026-01-28 11:46:45.440934"
}
# Growth data
curl "https://auth.mana.how/api/v1/analytics/growth?days=30"
[
  {"date": "2026-01-01", "total_users": 5, "growth": null, "growth_percent": null},
  {"date": "2026-01-02", "total_users": 6, "growth": 1, "growth_percent": 20.0},
  {"date": "2026-01-03", "total_users": 9, "growth": 3, "growth_percent": 50.0}
]
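The `growth` and `growth_percent` values in the sample above are consistent with a day-over-day difference and its ratio to the previous total. A minimal sketch of that arithmetic, assuming this is how the values are derived (the server-side query is not shown in this document); `growth_percent` here is an illustrative shell function, not the service's code:

```shell
# Illustrative helper: day-over-day growth in percent, one decimal place.
# Prints "null" when there is no previous total (first row) or it is zero.
growth_percent() {
  prev="$1"
  curr="$2"
  if [ -z "$prev" ] || [ "$prev" -eq 0 ]; then
    echo "null"
    return 0
  fi
  awk -v p="$prev" -v c="$curr" 'BEGIN { printf "%.1f\n", (c - p) / p * 100 }'
}

growth_percent "" 5   # 2026-01-01: no previous day, prints null
growth_percent 5 6    # 2026-01-02: (6 - 5) / 5 * 100, prints 20.0
growth_percent 6 9    # 2026-01-03: (9 - 6) / 6 * 100, prints 50.0
```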
3. Grafana (Visualization)
Purpose: Dashboard visualization for both operative and business metrics.
| Property | Value |
|---|---|
| Image | grafana/grafana:10.4.1 |
| Port | 3100 (external), 3000 (internal) |
| URL | https://grafana.mana.how |
Available Dashboards:
| Dashboard | Description |
|---|---|
| Master Overview | Combined view of all key metrics |
| Business Metrics | User growth, KPIs from DuckDB |
| System Overview | Infrastructure health |
| Backends | Backend service metrics |
| Application Details | Detailed app metrics |
| Database Details | PostgreSQL metrics |
| User Statistics | User-related metrics |
Data Retention Strategy
| Data Type | Storage | Retention | Use Case |
|---|---|---|---|
| Operative Metrics | VictoriaMetrics | 2 years | CPU, memory, latency, request rates |
| Business KPIs | DuckDB | Unlimited | User growth, feature usage, revenue |
| Raw Logs | External (optional) | 30 days | Debugging, auditing |
Deployment
Starting the Monitoring Stack
# On Mac Mini server
cd ~/projects/mana-monorepo
# Start all monitoring services
docker compose -f docker-compose.macmini.yml up -d victoriametrics grafana mana-auth
# Check status
docker compose -f docker-compose.macmini.yml ps | grep -E "(victoria|grafana|auth)"
Rebuilding mana-auth (with Analytics)
# Build from monorepo root
docker build -t ghcr.io/memo-2023/mana-auth:latest -f services/mana-auth/Dockerfile .
# Restart container
docker compose -f docker-compose.macmini.yml up -d mana-auth
Volume Permissions
If DuckDB fails with permission errors, fix the volume ownership:
docker exec -u root mana-auth chown -R nestjs:nodejs /data/analytics
docker restart mana-auth
Backup
Manual Backup
./scripts/backup-monitoring.sh
This script backs up:
- VictoriaMetrics: Creates a snapshot and compresses it
- DuckDB: Copies the database file and exports to Parquet
Backup Location
Default: /backup/monitoring/
Files created:
- victoriametrics-YYYY-MM-DD.tar.gz
- analytics-YYYY-MM-DD.duckdb
- analytics-YYYY-MM-DD.parquet
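To confirm that a run produced all three artifacts, the expected paths can be derived from the date. A small sketch, assuming the default backup location and the naming pattern listed above; `BACKUP_DIR` and both function names are illustrative and are not part of backup-monitoring.sh. Defaulting to today's local date is also an assumption about how the script names its files:

```shell
# Illustrative names; the directory and file pattern follow this document.
BACKUP_DIR="${BACKUP_DIR:-/backup/monitoring}"

expected_backup_files() {
  # $1: date in YYYY-MM-DD form; defaults to today's local date (assumption).
  day="${1:-$(date +%Y-%m-%d)}"
  printf '%s\n' \
    "$BACKUP_DIR/victoriametrics-$day.tar.gz" \
    "$BACKUP_DIR/analytics-$day.duckdb" \
    "$BACKUP_DIR/analytics-$day.parquet"
}

check_backup() {
  # Report every expected file that is missing or empty; non-zero exit if any.
  status=0
  for f in $(expected_backup_files "$1"); do
    [ -s "$f" ] || { echo "missing: $f" >&2; status=1; }
  done
  return "$status"
}
```

Running `check_backup 2026-01-28` after a backup prints nothing and returns 0 when all three files exist and are non-empty.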
Automated Backups
Add to crontab for daily backups:
# Daily backup at 2 AM
0 2 * * * /path/to/mana-monorepo/scripts/backup-monitoring.sh
Troubleshooting
VictoriaMetrics not scraping targets
# Check scrape config
docker exec mana-victoriametrics cat /etc/prometheus/prometheus.yml
# Check targets status
curl http://localhost:8428/api/v1/targets
DuckDB initialization fails
- Check permissions:
docker exec mana-auth ls -la /data/analytics/
- Fix if needed:
docker exec -u root mana-auth chown -R nestjs:nodejs /data/analytics
- Restart:
docker restart mana-auth
Grafana can't connect to VictoriaMetrics
- Check VictoriaMetrics is running:
curl http://localhost:8428/health
- Check datasource configuration:
cat docker/grafana/provisioning/datasources/prometheus.yml
- Restart Grafana:
docker restart mana-grafana
Missing metrics in Grafana
- Check if VictoriaMetrics has the data:
curl "http://localhost:8428/api/v1/query?query=auth_users_total"
- Check service is exposing metrics:
curl http://localhost:3001/metrics
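The two checks above can be chained to see at which layer a metric goes missing. A sketch under the same assumptions as the commands above (service metrics on port 3001, VictoriaMetrics on port 8428); `metric_exposed` and `diagnose_metric` are illustrative names:

```shell
# Illustrative helpers for narrowing down where a metric goes missing.
metric_exposed() {
  # $1: metric name; stdin: Prometheus text exposition output.
  # Matches sample lines such as `name 9` or `name{label="x"} 9`, not comments.
  grep -q "^$1[{ ]"
}

diagnose_metric() {
  # $1: metric name, for example auth_users_total.
  if ! curl -fsS http://localhost:3001/metrics | metric_exposed "$1"; then
    echo "service does not expose $1"
    return 1
  fi
  # The service exposes it, so inspect what VictoriaMetrics has stored.
  curl -fsS -G --data-urlencode "query=$1" http://localhost:8428/api/v1/query
}
```

If `diagnose_metric auth_users_total` reports that the service does not expose the metric, the problem is in the backend; if the service exposes it but the query result is empty, the scrape configuration is the next place to look.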
Environment Variables
mana-auth
| Variable | Description | Default |
|---|---|---|
| DUCKDB_PATH | Path to DuckDB file | /data/analytics/metrics.duckdb |
| DATABASE_URL | PostgreSQL connection string | Required |
VictoriaMetrics
Configured via command-line arguments in docker-compose:
- -retentionPeriod=2y
- -storageDataPath=/storage
- -promscrape.config=/etc/prometheus/prometheus.yml
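For reference, the same flags can be passed to a standalone container. This is a sketch rather than the project's compose definition: the image, port, volume, and container name come from this document, while the host-side location of the scrape config (`SCRAPE_CONFIG`) and the function name are assumptions.

```shell
# Assumption: the scrape config lives at ./prometheus.yml on the host.
SCRAPE_CONFIG="${SCRAPE_CONFIG:-$PWD/prometheus.yml}"

start_victoriametrics() {
  # Everything after the image name is passed to VictoriaMetrics as flags.
  docker run -d --name mana-victoriametrics \
    -p 8428:8428 \
    -v mana-victoriametrics:/storage \
    -v "$SCRAPE_CONFIG:/etc/prometheus/prometheus.yml:ro" \
    victoriametrics/victoria-metrics:v1.99.0 \
    -retentionPeriod=2y \
    -storageDataPath=/storage \
    -promscrape.config=/etc/prometheus/prometheus.yml
}
```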
Architecture Decision Record
For the full decision rationale, see: docs/decisions/001-monitoring-stack-upgrade.md