Commit graph

129 commits

Author SHA1 Message Date
Till JS
18fae3b66d feat(infra): add docker-compose for new Hono services + DB init
- Add mana-user (3062), mana-subscriptions (3063), mana-analytics (3064)
  to docker-compose with health checks and traefik labels
- Replace old NestJS Tier 3 app backends (~300 lines) with comment
  placeholder for Hono compute servers (need shared Dockerfile)
- Create docker/Dockerfile.hono-server — shared Bun Dockerfile for
  all 14 app compute servers (ARG APP for build context)
- Add 5 new databases to setup-databases.sh: mana_auth, mana_credits,
  mana_user, mana_subscriptions, mana_analytics, mana_sync

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 17:54:24 +01:00
Till JS
14099cc42c docs(infra): add PORT_SCHEMA.md + update Prometheus scrape targets
Comprehensive port schema documentation as single source of truth.
All services assigned to logical ranges:
- 3000-3009: Core platform (auth, credits, subscriptions, user, analytics)
- 3010-3019: Core infra (sync, media, search, notify, crawler, gateway)
- 3020-3029: AI/ML (llm, stt, tts, image-gen, voice-bot)
- 3030-3059: App backends
- 4000-4099: Matrix stack
- 5000-5059: Web frontends
- 8000-8099: Monitoring
- 9000-9199: Infrastructure exporters

All port conflicts resolved. Prometheus targets updated to match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 03:02:12 +01:00
Till JS
753c685ef7 feat(services): create mana-analytics, remove feedback/analytics/ai from auth
Extract feedback, analytics, and AI modules from mana-core-auth into
standalone mana-analytics service (Hono + Bun, Port 3064).

New service (services/mana-analytics/):
- User feedback CRUD with voting
- AI-powered feedback title generation via mana-llm
- Simplified from DuckDB analytics to pure PostgreSQL
- ~550 LOC

Removed from mana-core-auth:
- feedback/ module (6 files)
- analytics/ module (4 files)
- ai/ module (3 files)
- db/schema/feedback.schema.ts

mana-core-auth now contains ONLY pure auth:
- Better Auth (JWT, Sessions, 2FA, Passkeys, OIDC, Magic Links)
- Organizations/Guilds (membership management)
- API Keys, Security, Me (GDPR), Health, Metrics
- Ready for Phase 5: Hono rewrite

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 02:29:24 +01:00
Till JS
ced7dd7441 feat(monitoring): add mana-sync, mana-notify, mana-crawler to Prometheus
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 02:18:21 +01:00
Till JS
d7799ec95d refactor(photos): remove NestJS backend, use local-first + direct mana-media
The Photos NestJS backend was a proxy to mana-media that enriched
responses with local album/favorite/tag data. Now:

- Albums store → local-first via albumCollection + albumItemCollection
- Favorites → local-first via favoriteCollection (toggle in IndexedDB)
- Photo tags → local-first via photoTagCollection
- Photo listing/stats → direct mana-media API calls from frontend
- Upload → direct mana-media upload from frontend
- Delete → direct mana-media delete from frontend

Removed 27 TypeScript files, 1 Docker container, 1 port (3039).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 02:18:03 +01:00
Till JS
dd2f814cf3 refactor(presi): replace NestJS backend with lightweight Hono server
The Presi NestJS backend (40 source files, 50 deps) was a CRUD wrapper
around decks, slides, and themes — all now handled by local-first sync.

Only the share-link feature requires server-side state (public URLs
without auth), so a minimal Hono + Bun server replaces the entire
NestJS backend:

- apps/presi/apps/server/ — Hono server with share routes + GDPR admin
  Uses @manacore/shared-hono for auth (JWKS), health, admin, errors
- Web app API client stripped to share-only (was 270 lines → 90 lines)
- Removed from docker-compose, CI/CD, Prometheus, env generation
- NestJS backend deleted (40 TS files, 8 test specs, 3038 lines)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 02:08:40 +01:00
Till JS
32939fbfb5 refactor(infra): remove zitare + clock NestJS backends, add shared-hono package
Both apps are fully local-first via Dexie.js + mana-sync. Their NestJS
backends were pure CRUD wrappers (20 + 31 source files) that are no
longer needed.

Changes:
- Add packages/shared-hono: JWT auth via JWKS (jose), Drizzle DB factory,
  health route, generic GDPR admin handler, error middleware
- Migrate zitare lists page from fetch() to listsStore (local-first)
- Rewrite clock timers store from API-based to timerCollection (Dexie)
- Update clock +layout.svelte CommandBar search to use local collections
- Remove zitare-backend + clock-backend from docker-compose, CI/CD,
  Prometheus, env generation, setup scripts
- Add docs/TECHNOLOGY_AUDIT_2026_03.md with full repo analysis

Net result: -2 Docker containers, -2 ports, -2728 lines of code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 22:43:46 +01:00
Till JS
16e0d99c5a feat(gpu-server): complete GPU server setup with AI services, monitoring, and public access
- Set up 5 AI services on Windows GPU server (RTX 3090):
  - mana-llm (Port 3025): OpenAI-compatible LLM gateway via Ollama
  - mana-stt (Port 3020): WhisperX with word timestamps + speaker diarization
  - mana-tts (Port 3022): Kokoro (EN) + Edge TTS (DE) + Piper (local DE)
  - mana-image-gen (Port 3023): FLUX.2 klein 4B image generation
  - Ollama (Port 11434): gemma3:4b/12b, qwen2.5-coder:14b, nomic-embed-text

- Add @manacore/shared-gpu TypeScript client package with SttClient, TtsClient, ImageClient
- Add CUDA-compatible whisper_service using faster-whisper for Windows
- Configure public access via Cloudflare Tunnel (gpu-llm/stt/tts/img.mana.how)
- Add Loki log aggregator (Docker on Mac Mini) + log shipper on GPU server
- Add GPU scrape targets to Prometheus/VictoriaMetrics config
- Add Grafana Loki datasource for GPU service logs
- Add health check with auto-restart, log rotation, and log shipping
- Document complete setup: Always-On config, troubleshooting, architecture

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 21:35:30 +01:00
Till JS
a31ccc6c62 feat(infra): add api.mana.how route + Prometheus scrape targets for Go services
- Cloudflare Tunnel: api.mana.how → localhost:3060 (Go API Gateway)
- Prometheus: scrape targets for mana-api-gateway:3060 and mana-matrix-bot:4000

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 21:27:04 +01:00
Till JS
cdfbfcd13e feat(infra): add sveltekit-base image and build-app script for Mac Mini
- Add docker/Dockerfile.sveltekit-base: pre-built base with all 34 shared
  packages (mirrors nestjs-base pattern), eliminates redundant COPY/build
  steps from individual web Dockerfiles
- Add scripts/mac-mini/build-app.sh: stops monitoring stack before build
  to free RAM, auto-restarts on exit (trap cleanup)
- Migrate todo web Dockerfile to use sveltekit-base:local (47 COPY lines
  → 2, 4 build steps → 0)
- Update CD workflow to build sveltekit-base when deploying web apps

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 12:17:48 +01:00
Till JS
1052469397 feat(infra): extend Dockerfile validator to backends and services
Validator now checks 52 Dockerfiles (web + backend + service).
Fixed 10 missing COPYs across backends, services, and nestjs-base.
Generator also supports backend/service Dockerfiles with markers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 08:57:10 +01:00
Till JS
e3115b302d feat(infra): add Cloudflare fallback plan + self-hosted landing pages
Two infrastructure improvements for tech independence:

1. Cloudflare Fallback Documentation (docs/CLOUDFLARE_FALLBACK.md):
   - Plan B: WireGuard + Caddy on Hetzner VPS (€3.79/mo)
   - Complete Caddyfile with all 30+ subdomains
   - Step-by-step failover checklist (~15 min to switch)
   - Plan C: Direct IP with ISP

2. Self-Hosted Landing Pages (eliminates Cloudflare Pages dependency):
   - Nginx container (mana-infra-landings) on port 4400
   - Multi-site config: each subdomain → separate dist/ folder
   - Build script: scripts/mac-mini/build-landings.sh
   - Cloudflare Tunnel ingress rules for 10 landing page domains
   - Storage: /Volumes/ManaData/landings/ on external SSD
   - Domains: it, chats, pics, zitares, presis, clocks,
     manadeck, nutriphi, citycorners, docs

Migration path: Build landings locally, set Cloudflare DNS to
tunnel instead of Pages, then decommission CF Pages projects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 12:07:40 +01:00
Till JS
e06e8cca59 fix(infra): use postgres -c flags instead of config_file override
The config_file override replaced the entire default PostgreSQL config
including listen_addresses, breaking inter-container communication.
Use inline -c flags instead which only override specific parameters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 11:42:42 +01:00
Till JS
fcd7c82ce4 fix(infra): simplify PostgreSQL backup to pg_dumpall + pg_basebackup
pgBackRest as Docker sidecar was overly complex (needs shared WAL
directory, stanza management, special entrypoint). Replace with a
simpler, proven approach using native PostgreSQL tools:

Backup container (postgres:16-alpine):
- Hourly: pg_dumpall | gzip (all databases as SQL, ~2 day retention)
- Daily 03:00: pg_basebackup -Ft -z (physical backup, 30 day retention)
- Auto-cleanup of old backups
- Storage: /Volumes/ManaData/backups/postgres/

Also: Remove pgbackrest.conf, simplify postgresql.conf (remove WAL
archiving config, keep performance tuning + replication for basebackup)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 11:39:20 +01:00
Till JS
39526918a3 feat(infra): add pgBackRest for PostgreSQL Point-in-Time Recovery
Replace simple pg_dumpall with pgBackRest PITR backup system.
This enables recovery to any second, not just the last daily dump.

Configuration:
- docker/postgres/postgresql.conf: WAL archiving + performance tuning
  (shared_buffers=512MB, effective_cache_size=2GB for 16GB Mac Mini)
- docker/postgres/pgbackrest.conf: stanza config + retention policy

Docker (docker-compose.macmini.yml):
- postgres: mount custom config, enable WAL archiving
- postgres-backup: new pgBackRest container
  - Storage: /Volumes/ManaData/backups/pgbackrest
  - Retention: 4 full + 14 differential (~4 weeks)
  - Compression: Zstandard (zst)

Backup Schedule:
- 03:00 daily: Full backup
- Every 6h: Differential (changes since last full)
- Every hour: Incremental (changes since last backup)
- Continuous: WAL archiving (every 60s)

Documentation (docs/POSTGRES_BACKUP.md):
- Complete restore procedures (full, PITR, single DB)
- First-time setup instructions
- Monitoring and alerting integration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 11:18:33 +01:00
Till JS
169821de1a feat(monitoring): add LLM Grafana dashboard, Prometheus scraping, and alerts
Wire mana-llm service into the monitoring stack:

Prometheus (docker/prometheus/prometheus.yml):
- Add mana-llm scrape job (port 3025, 15s interval)
- Include mana-llm in ServiceDown alert expression

Alerts (docker/prometheus/alerts.yml):
- New llm_alerts group with 4 rules:
  - LLMServiceDown: mana-llm down > 1 min (critical)
  - LLMHighErrorRate: > 10% errors for 5 min (warning)
  - OllamaProviderDown: > 50% requests via Google fallback (warning)
  - LLMSlowResponses: p95 > 30s for 5 min (warning)

Grafana Dashboard (docker/grafana/dashboards/mana-llm.json):
- 6 stat panels: status, req/min, error rate, fallback rate, latency, tokens/min
- Requests by Provider (stacked area: Ollama vs Google vs OpenRouter)
- Tokens by Type (prompt vs completion)
- Latency Percentiles (p50, p90, p99)
- Latency by Provider comparison
- Requests by Model breakdown
- Errors by Type
- Google Fallback Rate over time (with threshold coloring)
- Provider Distribution pie chart (24h)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 11:16:27 +01:00
Till JS
1c5c2446f6 feat(citycorners): add city guide app for Konstanz with full monorepo integration
New project with three apps:
- Landing (Astro): static site with SVG illustrations, location data
- Backend (NestJS, port 3025): CRUD API for locations + favorites, Drizzle ORM, auth via mana-core-auth
- Web (SvelteKit, port 5196): Tailwind 4, PillNav, auth (login/register/SSO), Leaflet map, favorites with optimistic updates, theme/settings

Infrastructure: DB init SQL, setup-databases.sh, generate-env.mjs, root package.json scripts, Dockerfiles, docker-compose.macmini.yml (backend:3025, web:5022), Cloudflare wrangler.toml.

Branding: registered in shared-branding (AppId, APP_BRANDING, APP_ICONS, MANA_APPS, CitycornersLogo).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 10:56:26 +01:00
Till JS
143112f77a feat(observability): add mana-search, mana-media, and Synapse to monitoring
- Add Prometheus scraping for mana-search (port 3020, already has metrics)
- Add Prometheus scraping for mana-media (port 3015, MetricsModule added)
- Add Prometheus scraping for Matrix Synapse (port 9002, already enabled)
- Add MetricsModule to mana-media with media_ prefix
- Update Dockerfile for mana-media to include shared-nestjs-metrics
- Replace hardcoded ServiceDown alert list with dynamic regex
  (.*-backend|mana-core-auth|mana-search|mana-media|synapse)
- Replace hardcoded backends.json query with dynamic regex
- Add Search, Media, Synapse to master-overview and system-overview dashboards

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 10:46:59 +01:00
Till JS
cc5ba3bb90 chore: remove Hetzner legacy artifacts and update docs for Mac Mini self-hosting
Deleted files:
- docker/caddy/Caddyfile.production + Caddyfile.staging (Hetzner reverse proxy configs)
- scripts/deploy/ (deploy-hetzner.sh, build-and-push.sh, health-check.sh, migrate-db.sh, rollback.sh)
- scripts/generate-staging-secrets.sh
- cicd/ directory (11 Hetzner CI/CD planning docs)
- CI_CD_IMPLEMENTATION_SUMMARY.md, CI_CD_README.md, FILES_CREATED.md, HIVE_MIND_FINAL_REPORT.md

Updated docs:
- CLAUDE.md: Remove Hetzner Object Storage references, update to MinIO
- docs/ANALYTICS.md: Cloudflare Tunnel instead of Caddy
- docs/URL_SCHEMA.md: Mac Mini + Cloudflare Tunnel instead of Hetzner IP
- .env.development: Remove "Hetzner in production" comments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 10:12:24 +01:00
Till JS
c8de944c8d feat(monitoring): add GlitchTip health check and disk space monitoring
- Add GlitchTip to health-check.sh monitoring endpoints
- Add native disk space checks for / and /Volumes/ManaData with 80%/90% thresholds
- Extend Prometheus disk alerts to include /host_mnt/Volumes/ManaData mountpoint
- Add ManaData disk usage gauge to Grafana system-overview dashboard

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 09:33:09 +01:00
Till JS
c1ef55fd54 fix(infra): rename LightWrite to Mukke in Caddyfile production config
LightWrite was replaced by Mukke on the same ports (5180/3010).
Update reverse proxy to use mukke.mana.how and mukke-api.mana.how.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 09:26:54 +01:00
Till JS
6fa6509fa5 feat(observability): add metrics and monitoring for all 15 backends
- Add MetricsModule to 8 backends missing it (photos, zitare, mukke,
  planta, picture, storage, presi, nutriphi)
- Enable Prometheus scraping for all 15 backends in prometheus.yml
  (was only 6, with 3 commented out and 6 missing entirely)
- Update ServiceDown alert rule to cover all 15 backends
- Update Grafana dashboards (backends, master-overview, system-overview)
  with all backend services in health panels
- Fix imprecise regex in application-details dashboard

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 09:09:04 +01:00
Till JS
420926aef1 fix(infra): add no-cache headers for PWA files in Caddyfile
Ensure sw.js, manifest.webmanifest, and registerSW.js are never cached
by the browser or CDN so service worker updates are picked up immediately
after deploys. Uses a reusable Caddy snippet imported by all web app blocks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 19:10:49 +01:00
Till JS
403b1c7b87 feat(storage): add controller tests, Caddy config, and PWA improvements
Controller tests (50 tests, all passing):
- FileController: 12 tests (CRUD, upload, download with headers/URL mode)
- FolderController: 8 tests (CRUD, move, favorite)
- TrashController: 6 tests (restore file/folder, permanent delete, empty)
- SearchController: 6 tests (search, empty query, favorites)
- ShareController: 7 tests (CRUD, expiresInDays conversion, public token)
- TagController: 7 tests (CRUD with optional color)

Total test count now: 159 (133 backend + 26 web)

Deployment:
- Add Caddy reverse proxy entries for storage.mana.how and storage-api.mana.how

PWA:
- Upgrade to 'full' preset for better offline caching (fonts, external resources)
- Add app shortcuts: Dateien, Suche, Favoriten
- Improve offline page with links to cached pages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 12:48:11 +01:00
Till JS
683a4c5331 feat(docker): add shared NestJS builder base image
- Add docker/Dockerfile.nestjs-base with all shared packages pre-built
- Convert 6 backend Dockerfiles (chat, todo, calendar, clock, contacts,
  mukke) to inherit from nestjs-base:local
- Fix bugs: duplicate shared-nestjs-setup builds (mukke), unnecessary
  shared-error-tracking rebuild in production stage (chat, clock)
- CD pipeline builds base image before services when backends deploy
- Net reduction: 317 lines removed, 112 added (-205 lines)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 10:48:31 +01:00
Till JS
d9ccb5e31b feat(games): add whopixels hosting at whopxl.mana.how
Dockerfile, docker-compose service (port 5100), Caddy and cloudflared
routing for the WhoPixels game. PORT is now configurable via env var.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 19:57:50 +01:00
Till JS
42fe39c6a2 fix(infra): fix deploy tracking dashboard datasource UIDs and instant queries
- Add explicit uid: deploy-tracking to datasource provisioning
- Add instant: true to all Prometheus stat/gauge panel queries
- Pushgateway gauges need instant queries, not range queries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 17:35:41 +01:00
Till JS
dea632c6c7 fix(caddy): update all reverse proxy ports to match docker containers
Many Caddy ports were outdated and pointing to dead services:
- mana.how: 5173→5000
- chat: 3000→5010, chat-api: 3002→3030
- todo: 5188→5011
- calendar: 5186→5012, calendar-api: 3016→3032
- clock: 5187→5013, clock-api: 3017→3033
- contacts: 5184→5014
- grafana: 3100→8000, stats: 3200→8010

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 17:09:09 +01:00
Till JS
3f91c4656a feat(infra): add deploy tracking with PostgreSQL, Pushgateway & Grafana dashboard
Instrument the CD pipeline to record per-deploy and per-service metrics
(build time, image size, startup time, health status) into PostgreSQL and
push gauges to Pushgateway. Adds a Grafana dashboard with 13 panels covering
deploy frequency, build performance, service health, and history.

New files:
- scripts/mac-mini/init-deploy-tracking.sql (idempotent DDL)
- scripts/deploy-metrics.sh (bash library for CI)
- docker/grafana/provisioning/datasources/deploy-tracking.yml
- docker/grafana/dashboards/deploy-tracking.json

Modified:
- docker/prometheus/prometheus.yml (pushgateway scrape job)
- .github/workflows/cd-macmini.yml (build/health instrumentation)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 17:08:03 +01:00
Till JS
f264e9f2ae feat(grafana): add GlitchTip error tracking dashboard
- Add PostgreSQL datasource pointing to GlitchTip database
- Add Error Tracking dashboard with 7 panels:
  - Total Open Issues (stat)
  - Issues by Project (pie chart)
  - Total Events (stat)
  - Projects Tracked (stat)
  - Resolved vs Unresolved (stat)
  - New Issues Over Time (stacked bar chart, 30 days)
  - Recent Issues (table with 50 latest, color-coded levels)
- Dashboard links to GlitchTip UI for detailed investigation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 21:14:09 +01:00
Till JS
b11e1284dc feat(error-tracking): add GlitchTip integration with shared error-tracking package
Infrastructure:
- Add GlitchTip (web + worker) to docker-compose.macmini.yml (port 8020)
- Add glitchtip.mana.how to Cloudflare Tunnel config
- Add glitchtip database to init-db SQL
- Add GLITCHTIP_DSN to .env.development

Shared Package (@manacore/shared-error-tracking):
- initErrorTracking() - Sentry-compatible init with GlitchTip DSN
- captureException(), captureMessage(), setUser(), setTag(), flush()
- SentryExceptionFilter for NestJS (captures 5xx errors only)
- Graceful no-op when DSN is not configured

Integration:
- Add instrument.ts to calendar, contacts, todo backends
- Import instrument.ts before app bootstrap in all 3 main.ts files
- Error tracking auto-initializes when GLITCHTIP_DSN env var is set

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 13:30:13 +01:00
Till JS
ea4b585f37 feat(context): add NestJS backend, PostgreSQL database, and migrate web app from Supabase to API
- Create NestJS backend on port 3020 with 4 modules (space, document, ai, token)
- Add Drizzle schema with 5 tables (spaces, documents, token_transactions, model_prices, user_tokens)
- Rewrite web services (spaces, documents, tokens, ai) to use shared API client instead of Supabase
- Move AI API keys server-side (Azure OpenAI, Google Gemini)
- Add seed script for model prices (gpt-4.1, gemini-pro, gemini-flash)
- Add 70 unit tests across 4 test suites (space, document, token, ai services)
- Add monorepo integration (setup-databases.sh, generate-env.mjs, docker init-db, root scripts)
- Remove @supabase/supabase-js dependency and delete supabase.ts from web app
- Update CLAUDE.md with full API documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 09:28:01 +01:00
Till-JS
6797195bdc 🔧 chore(infra): add lightwrite subdomain configuration
- Add lightwrite.mana.how → localhost:5180
- Add lightwrite-api.mana.how → localhost:3010

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-16 11:19:54 +01:00
Till-JS
d81b8aebf2 🔒 refactor(bots): remove !login command and enforce OIDC-only auth
- Remove !login and !logout commands from all 16+ Matrix bots
- Remove login/logout references from all help/welcome messages
- Disable password login in Synapse (password_config.enabled: false)
- System is now OIDC-only via Mana Core authentication

Users must authenticate via "Sign in with Mana Core" in Element.
Existing bot access tokens remain valid.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-14 11:26:58 +01:00
Till-JS
405084b52d 🔧 fix(skilltree): change web port to 5020 (5018 used by zitare) 2026-02-13 23:14:38 +01:00
Till-JS
d49d147257 🔧 chore(caddy): add skilltree reverse proxy config 2026-02-13 23:11:12 +01:00
Till-JS
acc8de36ee feat(monitoring): add alerting stack and maintenance scripts
Medium priority stability improvements:

Alerting:
- Add vmalert for evaluating Prometheus alert rules
- Add alertmanager for alert routing and grouping
- Add alert-notifier service for Telegram/ntfy notifications
- Enable cadvisor scraping in prometheus config

Disk Monitoring:
- Add check-disk-space.sh for hourly disk monitoring
- Alert on 80% (warning) and 90% (critical) thresholds
- Auto-cleanup Docker when disk is critical
- Add com.manacore.disk-check.plist for LaunchD

Weekly Reports:
- Add weekly-report.sh for system health summary
- Includes: backup status, disk usage, container health,
  database stats, error log summary
- Runs every Sunday at 10 AM via LaunchD

Health Check Updates:
- Add checks for vmalert, alertmanager, alert-notifier

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-12 13:46:57 +01:00
Till-JS
d5e18c9c27 🔧 fix(mac-mini): update health checks and disable missing services
- Disable api-gateway and skilltree-web (no working images/Dockerfiles)
- Fix mana-search Dockerfile healthcheck port and endpoint
- Update health-check.sh to skip disabled services
- Fix search service health endpoint (/api/v1/health)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-12 13:28:55 +01:00
Till-JS
8525020e8a feat(playground): integrate shared auth UI for consistent login experience
- Add PlaygroundLogo to shared-branding package
- Add playground to APP_BRANDING, APP_ICONS, and APP_URLS
- Replace custom login/register pages with shared-auth-ui components
- Update authStore with resendVerificationEmail and improved signUp
- Add Caddy reverse proxy entry for playground.mana.how

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 14:53:51 +01:00
Till-JS
fe33f4b355 fix(mana-core-auth): complete production readiness with test fixes
- Fix LoggerService mock in better-auth.service.spec.ts
- Fix name assertion in auth.controller.spec.ts (empty string fallback)
- Fix createRemoteJWKSet mock in jwt-auth.guard.spec.ts
- Add Grafana dashboard for Auth Service monitoring
- Add 10 auth-specific Prometheus alert rules
- Update production readiness plan to 100% complete

All 199 unit tests passing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 14:18:58 +01:00
Till-JS
fdaf6a9c75 🔧 fix(dashboards): fix broken panels and metrics
- Backends: Remove Docker container section (cAdvisor not deployed)
- Backends: Add Auth Service Runtime section with correct auth_ prefixed metrics
- Backends: Rename to "Backends Overview"
- Application Details: Fix Node.js Runtime to use auth_ prefixed metrics
- Application Details: Rename section to "Auth Service Runtime"

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 12:54:07 +01:00
Till-JS
7aa5115c78 📊 feat(monitoring): add node-exporter for host system metrics
- Add node-exporter service to docker-compose for CPU/Memory/Disk monitoring
- Enable node-exporter scrape target in Prometheus config
- Update System Overview dashboard with Host System section:
  - CPU, Memory, Disk usage gauges
  - Total RAM, Total Disk, Uptime, Load stats
  - CPU & Memory over time graph
  - Network I/O graph
- Add Node Exporter to service status panel

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 12:38:44 +01:00
Till-JS
84e9f86db9 🔧 fix(grafana): rewrite System Overview with available metrics
- Removed node_* metrics (node-exporter not deployed)
- Removed container_last_seen (cAdvisor not deployed)
- Added Service Status, Traffic Overview, Database sections
- All panels now use available Prometheus metrics

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 12:33:11 +01:00
Till-JS
edbf775f37 📊 feat(grafana): add Total Requests and Requests/sec to Key Metrics
- Added Total Requests counter for overall user interaction
- Added Requests/sec for current load visibility
- Reduced panel width to fit 8 metrics in one row

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 12:32:01 +01:00
Till-JS
e7719eeba0 feat(grafana): enhance Master Overview with Key Metrics on top
- Move Key Metrics section to top of dashboard
- Add new panels: Services UP, Apps Running, Matrix Bots, Avg Response Time
- Reorganize layout for better overview at a glance
- Remove CPU/Memory/Disk (no node-exporter), add Redis Keys

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 12:28:53 +01:00
Till-JS
1b39aa8308 🔧 fix(prometheus): disable non-existent scrape targets
Commented out:
- node-exporter (container not deployed)
- cadvisor (container not deployed)
- storage/presi/nutriphi-backend (no /metrics endpoint yet)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 05:53:22 +01:00
Till-JS
dac6a85427 🔧 fix(prometheus): correct backend ports and add missing services
- chat-backend: 3002 → 3030
- todo-backend: 3018 → 3031
- calendar-backend: 3016 → 3032
- clock-backend: 3017 → 3033
- contacts-backend: 3015 → 3034
- Add storage-backend (3035), presi-backend (3036), nutriphi-backend (3037)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 05:51:50 +01:00
Till-JS
9b7d8c36b8 🐛 fix(grafana): correct VictoriaMetrics datasource port (8428 → 9090)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 05:13:48 +01:00
Till-JS
c28410b736 🔧 chore: enable OIDC for Matrix Synapse
- Add SYNAPSE_OIDC_CLIENT_SECRET to mana-core-auth env
- Enable OIDC provider config in homeserver.yaml
- Add matrix.mana.how and element.mana.how to CORS origins

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 03:25:59 +01:00
Till-JS
4d8c7f1a7c 🔧 chore: temporarily disable OIDC in synapse config
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 02:49:45 +01:00