managarten/docs/MONITORING.md
Till JS b1b9bbc269
2026-05-09 01:16:02 +02:00


# Monitoring Stack Documentation
This document describes the Mana monitoring infrastructure, including metrics collection, business analytics, and long-term data retention.
## Quick Access
All monitoring tools are publicly accessible. Only GlitchTip requires a login, and guest credentials are listed in the table.
| Tool | URL | Access |
|------|-----|--------|
| **Grafana** | https://grafana.mana.how | No login needed (Anonymous Viewer) |
| **Umami** | [Public Dashboard](https://stats.mana.how/share/face76f42d3e42beb8c80ea03f33a462/mana-webapp) | No login needed (Public Share) |
| **GlitchTip** | https://glitchtip.mana.how | `guest@mana.how` / `guestguest` |
### Grafana Dashboards
| Dashboard | Description |
|-----------|-------------|
| Master Overview | CPU, RAM, Disk, Container Status |
| Error Tracking | GlitchTip errors via PostgreSQL datasource |
| Backend Metrics | Request rates, latency, error rates |
| Database Details | PostgreSQL connections, queries |
### Umami Public Share Links
| App | Share URL |
|-----|-----------|
| Mana | https://stats.mana.how/share/face76f42d3e42beb8c80ea03f33a462/mana-webapp |
| Calendar | https://stats.mana.how/share/772d2510c5bb47e0b490267f2821510a/calendar-webapp |
| Todo | https://stats.mana.how/share/ec1bb158d8714bc6bdbc147c97b9c1c7/todo-webapp |
| Chat | https://stats.mana.how/share/1c43fd9847674f899dc2ebdfbd8960db/chat-webapp |
| Contacts | https://stats.mana.how/share/d2cc0f019e464a88a49ba365f58b78e7/contacts-webapp |
| Clock | https://stats.mana.how/share/f893945efea7449382abf04812a54bea/clock-webapp |
| Quotes | https://stats.mana.how/share/6a86139ad8e2469c97541c40a70397fa/quotes-webapp |
| Picture | https://stats.mana.how/share/273f67fa569940f6b85e7a7a0a003539/picture-webapp |
| Photos | https://stats.mana.how/share/dc201d685f784716a0b8587376eca7a1/photos-webapp |
| Storage | https://stats.mana.how/share/392ff51d11f14f0c9d556af1402a3ee6/storage-webapp |
| Food | https://stats.mana.how/share/33dfae72f8e24aaa8008cbbceeaf072d/food-webapp |
| Planta | https://stats.mana.how/share/1e83a8a67fa84d3995455c21dedbe3a2/plants-webapp |
| Presi | https://stats.mana.how/share/a1eb8d1fa4d543e6b97ac41351fe1c6f/presi-webapp |
| Skilltree | https://stats.mana.how/share/5de13e0895ae4a69aa2a834f985be14d/skilltree-webapp |
| Cardecky | https://stats.mana.how/share/1c1d54c4782943e58dde0a6db7c86ec6/cards-webapp |
### GlitchTip Error Tracking
18 backend projects configured. See [ERROR_TRACKING.md](ERROR_TRACKING.md) for DSNs and integration details.
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                            Mana Monitoring Stack                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────────┐     ┌──────────────────┐     ┌──────────────────┐         │
│  │   Services   │────▶│ VictoriaMetrics  │────▶│     Grafana      │         │
│  │  (Backends)  │     │ (2yr retention)  │     │   (Dashboards)   │         │
│  └──────────────┘     └──────────────────┘     └──────────────────┘         │
│         │                                               ▲                   │
│         │                                               │                   │
│         ▼                                               │                   │
│  ┌──────────────┐     ┌──────────────────┐              │                   │
│  │  PostgreSQL  │────▶│      DuckDB      │──────────────┘                   │
│  │   (Source)   │     │ (Business KPIs)  │                                  │
│  └──────────────┘     └──────────────────┘                                  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Components
### 1. VictoriaMetrics (Operative Metrics)
**Purpose:** High-performance time-series database for operational metrics (CPU, memory, request latency, etc.)
| Property | Value |
|----------|-------|
| Image | `victoriametrics/victoria-metrics:v1.99.0` |
| Port | 8428 |
| Retention | 2 years |
| Storage | Docker volume `mana-victoriametrics` |
**Why VictoriaMetrics instead of Prometheus?**
- 3-10x better compression
- Lower memory usage
- Faster queries over long time ranges
- Drop-in replacement (PromQL compatible)
- Better suited for long-term retention
**Endpoints:**
```bash
# Health check
curl http://localhost:8428/health
# Query metrics (PromQL)
curl "http://localhost:8428/api/v1/query?query=up"
# Query range
curl "http://localhost:8428/api/v1/query_range?query=auth_users_total&start=-1h&step=1m"
```
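PromQL expressions that contain spaces, braces, or quotes need URL-encoding before they go into the query string. The following is a minimal sketch of two wrappers that let `curl` do the encoding. It assumes VictoriaMetrics is reachable on `localhost:8428` as above; the function names `vm_query` and `vm_query_range` and the `VM_URL` variable are illustrative and not part of the repository.

```bash
# Base URL of VictoriaMetrics; VM_URL is a hypothetical override variable.
VM_URL="${VM_URL:-http://localhost:8428}"

# Instant query. With -G, curl appends the --data-urlencode pairs to the URL
# as an encoded query string and sends a GET request.
vm_query() {
  curl -fsS -G "${VM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Range query: expression, start (default -1h), step (default 1m).
vm_query_range() {
  curl -fsS -G "${VM_URL}/api/v1/query_range" \
    --data-urlencode "query=$1" \
    --data-urlencode "start=${2:--1h}" \
    --data-urlencode "step=${3:-1m}"
}
```

For example, `vm_query 'up == 0'` lists scrape targets that are currently down, and `vm_query_range auth_users_total -6h 5m` fetches six hours of the user counter at five-minute resolution.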
### 2. DuckDB Analytics (Business KPIs)
**Purpose:** Embedded OLAP database for business metrics with unlimited retention.
| Property | Value |
|----------|-------|
| Location | `/data/analytics/metrics.duckdb` (in mana-auth container) |
| Storage | Docker volume `mana-analytics` |
| Retention | Unlimited |
| Snapshot | Daily at midnight UTC |
**Tracked Metrics:**
- Total users
- Verified users
- New users (today, this week, this month)
- Database size
- Growth rates
**API Endpoints:**
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/v1/analytics/health` | GET | Service health and database status |
| `/api/v1/analytics/latest` | GET | Latest metrics snapshot |
| `/api/v1/analytics/growth` | GET | User growth over time |
| `/api/v1/analytics/monthly` | GET | Monthly aggregated metrics |
| `/api/v1/analytics/summary` | GET | Dashboard summary with trends |
| `/api/v1/analytics/snapshot` | POST | Trigger manual snapshot |
**Example Responses:**
```bash
# Health
curl https://auth.mana.how/api/v1/analytics/health
```
```json
{
  "status": "healthy",
  "database_path": "/data/analytics/metrics.duckdb",
  "total_records": 30,
  "latest_snapshot": "2026-01-28"
}
```
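In a deploy or cron script it can be useful to gate on the `status` field of this response. The sketch below extracts it with plain `sed` so that no JSON tool is required. It assumes the response has the shape shown above, with `status` as a simple string; the function names are hypothetical.

```bash
# Print the value of the "status" field from a health response read on stdin.
# Assumes the field is a plain string, as in the example response.
analytics_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed only when the reported status is "healthy".
analytics_is_healthy() {
  [ "$(analytics_status)" = "healthy" ]
}
```

Usage: `curl -fsS https://auth.mana.how/api/v1/analytics/health | analytics_is_healthy || echo "analytics unhealthy" >&2`.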
```bash
# Latest metrics
curl https://auth.mana.how/api/v1/analytics/latest
```
```json
{
  "date": "2026-01-28",
  "total_users": 9,
  "verified_users": 1,
  "new_users_today": 0,
  "new_users_week": 9,
  "new_users_month": 9,
  "total_db_size_bytes": 9613795,
  "recorded_at": "2026-01-28 11:46:45.440934"
}
```
```bash
# Growth data
curl "https://auth.mana.how/api/v1/analytics/growth?days=30"
```
```json
[
  {"date": "2026-01-01", "total_users": 5, "growth": null, "growth_percent": null},
  {"date": "2026-01-02", "total_users": 6, "growth": 1, "growth_percent": 20.0},
  {"date": "2026-01-03", "total_users": 9, "growth": 3, "growth_percent": 50.0}
]
```
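The `growth_percent` values in this sample are consistent with a day-over-day change relative to the previous day's total, `(current - previous) / previous * 100`. The sketch below reproduces that arithmetic. The formula is inferred from the example response, not taken from the service code, and the function name is hypothetical.

```bash
# growth_percent PREVIOUS CURRENT
# Prints the change from PREVIOUS to CURRENT as a percentage with one
# decimal place, or "null" when there is no usable previous value.
growth_percent() {
  awk -v prev="$1" -v cur="$2" 'BEGIN {
    if (prev == "" || prev + 0 == 0) { print "null"; exit }
    printf "%.1f\n", (cur - prev) / prev * 100
  }'
}
```

With the sample data, `growth_percent 5 6` prints `20.0` and `growth_percent 6 9` prints `50.0`, matching the second and third rows.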
### 3. Grafana (Visualization)
**Purpose:** Dashboard visualization for both operative and business metrics.
| Property | Value |
|----------|-------|
| Image | `grafana/grafana:10.4.1` |
| Port | 3100 (external), 3000 (internal) |
| URL | https://grafana.mana.how |
**Available Dashboards:**
| Dashboard | Description |
|-----------|-------------|
| Master Overview | Combined view of all key metrics |
| Business Metrics | User growth, KPIs from DuckDB |
| System Overview | Infrastructure health |
| Backends | Backend service metrics |
| Application Details | Detailed app metrics |
| Database Details | PostgreSQL metrics |
| User Statistics | User-related metrics |
## Data Retention Strategy
| Data Type | Storage | Retention | Use Case |
|-----------|---------|-----------|----------|
| Operative Metrics | VictoriaMetrics | 2 years | CPU, memory, latency, request rates |
| Business KPIs | DuckDB | Unlimited | User growth, feature usage, revenue |
| Raw Logs | External (optional) | 30 days | Debugging, auditing |
## Deployment
### Starting the Monitoring Stack
```bash
# On Mac Mini server
cd ~/projects/managarten
# Start all monitoring services
docker compose -f docker-compose.macmini.yml up -d victoriametrics grafana mana-auth
# Check status
docker compose -f docker-compose.macmini.yml ps | grep -E "(victoria|grafana|auth)"
```
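`docker compose up -d` returns before the services are ready to answer requests. A small polling helper can close that gap in scripts. This is a sketch only: the function name `wait_for_health` and its defaults (30 attempts, 2 seconds apart) are assumptions, not something the repository provides.

```bash
# wait_for_health URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until curl gets a successful response or the attempts run out.
wait_for_health() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS -o /dev/null "$url"; then
      echo "ready: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out: $url" >&2
  return 1
}
```

For example, `wait_for_health http://localhost:8428/health` blocks until VictoriaMetrics responds, and `wait_for_health https://auth.mana.how/api/v1/analytics/health 10 5` does the same for the analytics endpoint.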
### Rebuilding mana-auth (with Analytics)
```bash
# Build from the repository root
docker build -t ghcr.io/memo-2023/mana-auth:latest -f services/mana-auth/Dockerfile .
# Restart container
docker compose -f docker-compose.macmini.yml up -d mana-auth
```
### Volume Permissions
If DuckDB fails with permission errors, fix the volume ownership:
```bash
docker exec -u root mana-auth chown -R nestjs:nodejs /data/analytics
docker restart mana-auth
```
## Backup
### Manual Backup
```bash
./scripts/backup-monitoring.sh
```
This script backs up:
1. **VictoriaMetrics**: Creates a snapshot and compresses it
2. **DuckDB**: Copies the database file and exports to Parquet
### Backup Location
Default: `/backup/monitoring/`
Files created:
- `victoriametrics-YYYY-MM-DD.tar.gz`
- `analytics-YYYY-MM-DD.duckdb`
- `analytics-YYYY-MM-DD.parquet`
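
To confirm that a backup run produced all three files, a check along these lines can run after the script. It is a sketch that relies only on the file naming shown above; the function name `backup_complete` is hypothetical.

```bash
# backup_complete DIR DATE
# Succeeds when all three artifacts for DATE (YYYY-MM-DD) exist in DIR and
# are non-empty. Each missing or empty file is reported on stderr.
backup_complete() {
  local dir="$1" day="$2" missing=0 f
  for f in \
    "victoriametrics-${day}.tar.gz" \
    "analytics-${day}.duckdb" \
    "analytics-${day}.parquet"; do
    if [ ! -s "${dir}/${f}" ]; then
      echo "missing or empty: ${dir}/${f}" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
}
```

Usage: `backup_complete /backup/monitoring "$(date +%F)" || echo "backup incomplete" >&2`.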
### Automated Backups
Add to crontab for daily backups:
```bash
# Daily backup at 2 AM
0 2 * * * /path/to/managarten/scripts/backup-monitoring.sh
```
## Troubleshooting
### VictoriaMetrics not scraping targets
```bash
# Check scrape config
docker exec mana-victoriametrics cat /etc/prometheus/prometheus.yml
# Check targets status
curl http://localhost:8428/api/v1/targets
```
### DuckDB initialization fails
1. Check permissions:
```bash
docker exec mana-auth ls -la /data/analytics/
```
2. Fix if needed:
```bash
docker exec -u root mana-auth chown -R nestjs:nodejs /data/analytics
```
3. Restart:
```bash
docker restart mana-auth
```
### Grafana can't connect to VictoriaMetrics
1. Check VictoriaMetrics is running:
```bash
curl http://localhost:8428/health
```
2. Check datasource configuration:
```bash
cat docker/grafana/provisioning/datasources/prometheus.yml
```
3. Restart Grafana:
```bash
docker restart mana-grafana
```
### Missing metrics in Grafana
1. Check if VictoriaMetrics has the data:
```bash
curl "http://localhost:8428/api/v1/query?query=auth_users_total"
```
2. Check service is exposing metrics:
```bash
curl http://localhost:3001/metrics
```
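To tell whether a specific metric is missing at the source or only in VictoriaMetrics, the `/metrics` output can be filtered for it. The sketch below assumes the service exposes the standard Prometheus text format; the function name `has_metric` is hypothetical.

```bash
# has_metric NAME
# Reads Prometheus text exposition format on stdin and succeeds when at
# least one sample line for NAME is present, with or without labels.
# Comment lines (# HELP, # TYPE) and longer metric names do not match.
has_metric() {
  grep -Eq "^$1(\{| )"
}
```

Usage: `curl -fsS http://localhost:3001/metrics | has_metric auth_users_total && echo "exposed by service"`. If the service exposes the metric but the VictoriaMetrics query returns nothing, the scrape configuration is the next place to look.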
## Environment Variables
### mana-auth
| Variable | Description | Default |
|----------|-------------|---------|
| `DUCKDB_PATH` | Path to DuckDB file | `/data/analytics/metrics.duckdb` |
| `DATABASE_URL` | PostgreSQL connection string | Required |
### VictoriaMetrics
Configured via command-line arguments in docker-compose:
- `-retentionPeriod=2y`
- `-storageDataPath=/storage`
- `-promscrape.config=/etc/prometheus/prometheus.yml`
## Architecture Decision Record
For the full decision rationale, see: [docs/decisions/001-monitoring-stack-upgrade.md](decisions/001-monitoring-stack-upgrade.md)
## Related Documentation
- [Local Development](LOCAL_DEVELOPMENT.md)
- [Mac Mini Server](MAC_MINI_SERVER.md)
- [Database Migrations](DATABASE_MIGRATIONS.md)