managarten/docs/MONITORING.md
Till JS 75a3ea2957 refactor: rename ManaDeck to Cards across entire monorepo
2026-04-01 11:45:21 +02:00

Monitoring Stack Documentation

This document describes the ManaCore monitoring infrastructure, including metrics collection, business analytics, and long-term data retention.

Quick Access

All monitoring tools are publicly accessible. Only GlitchTip requires a login, using the shared guest credentials listed below.

| Tool | URL | Access |
| --- | --- | --- |
| Grafana | https://grafana.mana.how | No login needed (Anonymous Viewer) |
| Umami | Public Dashboard | No login needed (Public Share) |
| GlitchTip | https://glitchtip.mana.how | guest@mana.how / guestguest |

Grafana Dashboards

| Dashboard | Description |
| --- | --- |
| Master Overview | CPU, RAM, Disk, Container Status |
| Error Tracking | GlitchTip errors via PostgreSQL datasource |
| Backend Metrics | Request rates, latency, error rates |
| Database Details | PostgreSQL connections, queries |

Umami public share URLs, one per app:

| App | Share URL |
| --- | --- |
| ManaCore | https://stats.mana.how/share/face76f42d3e42beb8c80ea03f33a462/manacore-webapp |
| Calendar | https://stats.mana.how/share/772d2510c5bb47e0b490267f2821510a/calendar-webapp |
| Todo | https://stats.mana.how/share/ec1bb158d8714bc6bdbc147c97b9c1c7/todo-webapp |
| Chat | https://stats.mana.how/share/1c43fd9847674f899dc2ebdfbd8960db/chat-webapp |
| Contacts | https://stats.mana.how/share/d2cc0f019e464a88a49ba365f58b78e7/contacts-webapp |
| Clock | https://stats.mana.how/share/f893945efea7449382abf04812a54bea/clock-webapp |
| Zitare | https://stats.mana.how/share/6a86139ad8e2469c97541c40a70397fa/zitare-webapp |
| Picture | https://stats.mana.how/share/273f67fa569940f6b85e7a7a0a003539/picture-webapp |
| Photos | https://stats.mana.how/share/dc201d685f784716a0b8587376eca7a1/photos-webapp |
| Storage | https://stats.mana.how/share/392ff51d11f14f0c9d556af1402a3ee6/storage-webapp |
| NutriPhi | https://stats.mana.how/share/33dfae72f8e24aaa8008cbbceeaf072d/nutriphi-webapp |
| Planta | https://stats.mana.how/share/1e83a8a67fa84d3995455c21dedbe3a2/planta-webapp |
| Presi | https://stats.mana.how/share/a1eb8d1fa4d543e6b97ac41351fe1c6f/presi-webapp |
| Skilltree | https://stats.mana.how/share/5de13e0895ae4a69aa2a834f985be14d/skilltree-webapp |
| Cards | https://stats.mana.how/share/1c1d54c4782943e58dde0a6db7c86ec6/cards-webapp |

GlitchTip Error Tracking

18 backend projects configured. See ERROR_TRACKING.md for DSNs and integration details.

Architecture Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                           ManaCore Monitoring Stack                          │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌──────────────┐     ┌──────────────────┐     ┌──────────────────┐        │
│  │   Services   │────▶│  VictoriaMetrics │────▶│     Grafana      │        │
│  │  (Backends)  │     │  (2yr retention) │     │   (Dashboards)   │        │
│  └──────────────┘     └──────────────────┘     └──────────────────┘        │
│         │                                              ▲                    │
│         │                                              │                    │
│         ▼                                              │                    │
│  ┌──────────────┐     ┌──────────────────┐            │                    │
│  │  PostgreSQL  │────▶│      DuckDB      │────────────┘                    │
│  │   (Source)   │     │  (Business KPIs) │                                 │
│  └──────────────┘     └──────────────────┘                                 │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Components

1. VictoriaMetrics (Operational Metrics)

Purpose: High-performance time-series database for operational metrics (CPU, memory, request latency, etc.)

| Property | Value |
| --- | --- |
| Image | victoriametrics/victoria-metrics:v1.99.0 |
| Port | 8428 |
| Retention | 2 years |
| Storage | Docker volume manacore-victoriametrics |

Why VictoriaMetrics instead of Prometheus?

  • 3-10x better compression
  • Lower memory usage
  • Faster queries over long time ranges
  • Drop-in replacement (PromQL compatible)
  • Better suited for long-term retention

Endpoints:

# Health check
curl http://localhost:8428/health

# Query metrics (PromQL)
curl "http://localhost:8428/api/v1/query?query=up"

# Query range
curl "http://localhost:8428/api/v1/query_range?query=auth_users_total&start=-1h&step=1m"
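
PromQL expressions that contain brackets, braces, or quotes have to be URL-encoded before they go into the query string. The following is a minimal sketch, assuming curl is installed and VictoriaMetrics listens on the port documented above. The function name `vm_query_range` and the `VM_URL` variable are hypothetical helpers, not part of the stack.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a PromQL range query against VictoriaMetrics.
# VM_URL is an assumed variable; it defaults to the port documented above.
VM_URL="${VM_URL:-http://localhost:8428}"

vm_query_range() {
  local query="$1" start="${2:--1h}" step="${3:-1m}"
  # -G turns the --data-urlencode pairs into a URL-encoded query string,
  # so expressions such as max_over_time(x[1d]) need no manual escaping.
  curl -sS -G "${VM_URL}/api/v1/query_range" \
    --data-urlencode "query=${query}" \
    --data-urlencode "start=${start}" \
    --data-urlencode "step=${step}"
}
```

A call might look like `vm_query_range 'max_over_time(auth_users_total[1d])' -6h 5m`.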

2. DuckDB Analytics (Business KPIs)

Purpose: Embedded OLAP database for business metrics with unlimited retention.

| Property | Value |
| --- | --- |
| Location | /data/analytics/metrics.duckdb (in mana-core-auth container) |
| Storage | Docker volume manacore-analytics |
| Retention | Unlimited |
| Snapshot | Daily at midnight UTC |

Tracked Metrics:

  • Total users
  • Verified users
  • New users (today, this week, this month)
  • Database size
  • Growth rates

API Endpoints:

| Endpoint | Method | Description |
| --- | --- | --- |
| /api/v1/analytics/health | GET | Service health and database status |
| /api/v1/analytics/latest | GET | Latest metrics snapshot |
| /api/v1/analytics/growth | GET | User growth over time |
| /api/v1/analytics/monthly | GET | Monthly aggregated metrics |
| /api/v1/analytics/summary | GET | Dashboard summary with trends |
| /api/v1/analytics/snapshot | POST | Trigger manual snapshot |

Example Responses:

# Health
curl https://auth.mana.how/api/v1/analytics/health
{
  "status": "healthy",
  "database_path": "/data/analytics/metrics.duckdb",
  "total_records": 30,
  "latest_snapshot": "2026-01-28"
}
# Latest metrics
curl https://auth.mana.how/api/v1/analytics/latest
{
  "date": "2026-01-28",
  "total_users": 9,
  "verified_users": 1,
  "new_users_today": 0,
  "new_users_week": 9,
  "new_users_month": 9,
  "total_db_size_bytes": 9613795,
  "recorded_at": "2026-01-28 11:46:45.440934"
}
# Growth data
curl "https://auth.mana.how/api/v1/analytics/growth?days=30"
[
  {"date": "2026-01-01", "total_users": 5, "growth": null, "growth_percent": null},
  {"date": "2026-01-02", "total_users": 6, "growth": 1, "growth_percent": 20.0},
  {"date": "2026-01-03", "total_users": 9, "growth": 3, "growth_percent": 50.0}
]
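
The `growth` and `growth_percent` fields in the sample growth response are consistent with a day-over-day difference divided by the previous day's total. The sketch below reproduces those numbers under that assumption. The function name `growth_stats` is hypothetical, and the service may compute these fields differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reproduce the "growth" and "growth_percent" fields
# from two consecutive total_users values (previous day, current day).
growth_stats() {
  local prev="$1" curr="$2"
  LC_ALL=C awk -v prev="$prev" -v curr="$curr" 'BEGIN {
    growth = curr - prev
    # Assumption: the percentage is relative to the previous total. A previous
    # total of 0 has no defined percentage, so print "null" instead.
    if (prev == 0) { printf "%d null\n", growth; exit }
    printf "%d %.1f\n", growth, growth / prev * 100
  }'
}

growth_stats 5 6   # prints: 1 20.0
growth_stats 6 9   # prints: 3 50.0
```

Both calls use the totals from the sample response, and the output matches its `growth` and `growth_percent` values.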

3. Grafana (Visualization)

Purpose: Dashboard visualization for both operational and business metrics.

| Property | Value |
| --- | --- |
| Image | grafana/grafana:10.4.1 |
| Port | 3100 (external), 3000 (internal) |
| URL | https://grafana.mana.how |

Available Dashboards:

| Dashboard | Description |
| --- | --- |
| Master Overview | Combined view of all key metrics |
| Business Metrics | User growth, KPIs from DuckDB |
| System Overview | Infrastructure health |
| Backends | Backend service metrics |
| Application Details | Detailed app metrics |
| Database Details | PostgreSQL metrics |
| User Statistics | User-related metrics |

Data Retention Strategy

| Data Type | Storage | Retention | Use Case |
| --- | --- | --- | --- |
| Operational Metrics | VictoriaMetrics | 2 years | CPU, memory, latency, request rates |
| Business KPIs | DuckDB | Unlimited | User growth, feature usage, revenue |
| Raw Logs | External (optional) | 30 days | Debugging, auditing |

Deployment

Starting the Monitoring Stack

# On Mac Mini server
cd ~/projects/manacore-monorepo

# Start all monitoring services
docker compose -f docker-compose.macmini.yml up -d victoriametrics grafana mana-core-auth

# Check status
docker compose -f docker-compose.macmini.yml ps | grep -E "(victoria|grafana|auth)"
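
Containers can take a few seconds to become reachable after `docker compose up -d` returns. A small retry loop, sketched below, can poll a health endpoint before you move on. The function name `wait_for` and its argument order are hypothetical and not part of the repository scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_for <attempts> <delay_seconds> <command> [args...]
wait_for() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    # Discard the command's output; only its exit status matters here.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, `wait_for 30 2 curl -fsS http://localhost:8428/health && echo "VictoriaMetrics is up"` polls the health endpoint for up to about a minute.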

Rebuilding mana-core-auth (with Analytics)

# Build from monorepo root
docker build -t ghcr.io/memo-2023/mana-core-auth:latest -f services/mana-core-auth/Dockerfile .

# Restart container
docker compose -f docker-compose.macmini.yml up -d mana-core-auth

Volume Permissions

If DuckDB fails with permission errors, fix the volume ownership:

docker exec -u root mana-core-auth chown -R nestjs:nodejs /data/analytics
docker restart mana-core-auth

Backup

Manual Backup

./scripts/backup-monitoring.sh

This script backs up:

  1. VictoriaMetrics: Creates a snapshot and compresses it
  2. DuckDB: Copies the database file and exports to Parquet

Backup Location

Default: /backup/monitoring/

Files created:

  • victoriametrics-YYYY-MM-DD.tar.gz
  • analytics-YYYY-MM-DD.duckdb
  • analytics-YYYY-MM-DD.parquet
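
A quick way to confirm that a backup run produced all three files is to check that each one exists and is non-empty. The following sketch assumes the file naming shown above. The function name `verify_backup` is hypothetical and is not part of `backup-monitoring.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that all three backup files for a given date
# exist and are non-empty. The directory defaults to the documented location.
verify_backup() {
  local day="$1" dir="${2:-/backup/monitoring}"
  local file missing=0
  for file in \
    "victoriametrics-${day}.tar.gz" \
    "analytics-${day}.duckdb" \
    "analytics-${day}.parquet"; do
    # -s is true only for files that exist and have a size greater than zero.
    if [[ ! -s "${dir}/${file}" ]]; then
      echo "missing or empty: ${dir}/${file}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `verify_backup "$(date +%F)"` after the backup script checks today's files and exits non-zero if any of them is missing.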

Automated Backups

Add to crontab for daily backups:

# Daily backup at 2 AM
0 2 * * * /path/to/manacore-monorepo/scripts/backup-monitoring.sh

Troubleshooting

VictoriaMetrics not scraping targets

# Check scrape config
docker exec manacore-victoriametrics cat /etc/prometheus/prometheus.yml

# Check targets status
curl http://localhost:8428/api/v1/targets
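
The targets response can be long, so a count of unhealthy targets is often enough for a first look. The sketch below assumes the response follows the Prometheus targets format, where each target carries a `"health"` field set to `"up"` or `"down"`. The function name `count_targets` is hypothetical, and the text match is a stand-in for a real JSON parser such as jq.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count targets in a given health state ("up" or "down")
# from a Prometheus-style /api/v1/targets JSON document read on stdin.
# This is a plain-text match, not a JSON parser; prefer jq if it is installed.
count_targets() {
  local state="$1"
  # grep -o prints one line per match, wc -l counts them, and tr strips the
  # padding that some wc implementations add around the number.
  grep -oE "\"health\"[[:space:]]*:[[:space:]]*\"${state}\"" | wc -l | tr -d '[:space:]'
}
```

Usage could look like `curl -s http://localhost:8428/api/v1/targets | count_targets down`.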

DuckDB initialization fails

  1. Check permissions:
     docker exec mana-core-auth ls -la /data/analytics/
  2. Fix if needed:
     docker exec -u root mana-core-auth chown -R nestjs:nodejs /data/analytics
  3. Restart:
     docker restart mana-core-auth

Grafana can't connect to VictoriaMetrics

  1. Check VictoriaMetrics is running:
     curl http://localhost:8428/health
  2. Check datasource configuration:
     cat docker/grafana/provisioning/datasources/prometheus.yml
  3. Restart Grafana:
     docker restart manacore-grafana

Missing metrics in Grafana

  1. Check if VictoriaMetrics has the data:
     curl "http://localhost:8428/api/v1/query?query=auth_users_total"
  2. Check service is exposing metrics:
     curl http://localhost:3001/metrics
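
The `/metrics` output can run to hundreds of lines, so it helps to test for one metric name directly. The sketch below assumes the service exposes the standard Prometheus text format. The function name `has_metric` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the Prometheus text exposition read on
# stdin contains at least one sample for the given metric name.
has_metric() {
  local name="$1"
  # A sample line starts with the metric name followed by "{" (labels) or a
  # space (value); "# HELP" and "# TYPE" comment lines do not match.
  grep -Eq "^${name}(\{| )"
}
```

For example, `curl -s http://localhost:3001/metrics | has_metric auth_users_total && echo "exposed"` reports whether the service publishes that metric at all.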

Environment Variables

mana-core-auth

| Variable | Description | Default |
| --- | --- | --- |
| DUCKDB_PATH | Path to DuckDB file | /data/analytics/metrics.duckdb |
| DATABASE_URL | PostgreSQL connection string | Required |

VictoriaMetrics

Configured via command-line arguments in docker-compose:

  • -retentionPeriod=2y
  • -storageDataPath=/storage
  • -promscrape.config=/etc/prometheus/prometheus.yml

Architecture Decision Record

For the full decision rationale, see: docs/decisions/001-monitoring-stack-upgrade.md