Extract feedback, analytics, and AI modules from mana-core-auth into
standalone mana-analytics service (Hono + Bun, port 3064; entry point sketched below).
New service (services/mana-analytics/):
- User feedback CRUD with voting
- AI-powered feedback title generation via mana-llm
- Simplified from DuckDB analytics to pure PostgreSQL
- ~550 LOC
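A minimal sketch of the service entry point described above; the route
paths, request shape, and persistence step are illustrative assumptions,
not the actual mana-analytics code:

```ts
// Hypothetical entry point; route paths, request shape, and the
// persistence step are illustrative, not the actual mana-analytics code.
import { Hono } from "hono";

const app = new Hono();

// Liveness probe, matching the stack's health-check convention.
app.get("/health", (c) => c.json({ status: "ok" }));

// Feedback voting: the body shape is an assumption.
app.post("/feedback/:id/vote", async (c) => {
  const { direction } = await c.req.json<{ direction: "up" | "down" }>();
  // ...persist the vote in PostgreSQL here...
  return c.json({ id: c.req.param("id"), direction }, 201);
});

// Bun picks up this default export and serves on the service's port.
export default { port: 3064, fetch: app.fetch };
```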
Removed from mana-core-auth:
- feedback/ module (6 files)
- analytics/ module (4 files)
- ai/ module (3 files)
- db/schema/feedback.schema.ts
mana-core-auth now contains ONLY pure auth:
- Better Auth (JWT, Sessions, 2FA, Passkeys, OIDC, Magic Links)
- Organizations/Guilds (membership management)
- API Keys, Security, Me (GDPR), Health, Metrics
- Ready for Phase 5: Hono rewrite
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Photos NestJS backend was a proxy to mana-media that enriched
responses with local album/favorite/tag data. Now:
- Albums store → local-first via albumCollection + albumItemCollection
- Favorites → local-first via favoriteCollection (toggle in IndexedDB; sketched after this list)
- Photo tags → local-first via photoTagCollection
- Photo listing/stats → direct mana-media API calls from frontend
- Upload → direct mana-media upload from frontend
- Delete → direct mana-media delete from frontend
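For illustration, the favorite toggle referenced above is now a pure
IndexedDB round trip. A sketch assuming a Dexie table keyed by photoId;
the table and function names are hypothetical:

```ts
// Hypothetical sketch of a local-first favorite toggle with Dexie.js;
// table and function names are illustrative, not the actual store code.
import Dexie, { type Table } from "dexie";

interface Favorite {
  photoId: string;
  createdAt: number;
}

class PhotosDB extends Dexie {
  favorites!: Table<Favorite, string>;
  constructor() {
    super("photos");
    this.version(1).stores({ favorites: "photoId" }); // photoId is the primary key
  }
}

const db = new PhotosDB();

// Toggling a favorite never leaves the browser; no backend call involved.
export async function toggleFavorite(photoId: string): Promise<boolean> {
  const existing = await db.favorites.get(photoId);
  if (existing) {
    await db.favorites.delete(photoId);
    return false;
  }
  await db.favorites.put({ photoId, createdAt: Date.now() });
  return true;
}
```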
Removed 27 TypeScript files, 1 Docker container, 1 port (3039).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Presi NestJS backend (40 source files, 50 deps) was a CRUD wrapper
around decks, slides, and themes — all now handled by local-first sync.
Only the share-link feature requires server-side state (public URLs
without auth), so a minimal Hono + Bun server replaces the entire
NestJS backend (the share lookup is sketched after the list):
- apps/presi/apps/server/ — Hono server with share routes + GDPR admin
Uses @manacore/shared-hono for auth (JWKS), health, admin, errors
- Web app API client stripped to share-only (270 → 90 lines)
- Removed from docker-compose, CI/CD, Prometheus, env generation
- NestJS backend deleted (40 TS files, 8 test specs, 3038 lines)
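A sketch of the share lookup referenced above, assuming a slug-keyed
route and a Drizzle-backed query; all names are illustrative, not the
actual Presi server code:

```ts
// Hypothetical sketch of the public share lookup; route, type, and
// function names are assumptions, not the actual Presi server code.
import { Hono } from "hono";

type Deck = { id: string; title: string; slides: unknown[] };

// Stand-in for the Drizzle-backed lookup the real server would do.
async function findDeckByShareSlug(slug: string): Promise<Deck | null> {
  // ...query PostgreSQL via the shared-hono DB factory using `slug`...
  return null;
}

const share = new Hono();

// The one feature that must stay server-side: public URLs without auth.
share.get("/share/:slug", async (c) => {
  const deck = await findDeckByShareSlug(c.req.param("slug"));
  if (!deck) return c.json({ error: "not found" }, 404);
  return c.json(deck);
});

export default share;
```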
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both apps are fully local-first via Dexie.js + mana-sync. Their NestJS
backends were pure CRUD wrappers (20 + 31 source files) that are no
longer needed.
Changes:
- Add packages/shared-hono: JWT auth via JWKS (jose), Drizzle DB factory,
health route, generic GDPR admin handler, error middleware (the auth
middleware is sketched after this list)
- Migrate zitare lists page from fetch() to listsStore (local-first)
- Rewrite clock timers store from API-based to timerCollection (Dexie)
- Update clock +layout.svelte CommandBar search to use local collections
- Remove zitare-backend + clock-backend from docker-compose, CI/CD,
Prometheus, env generation, setup scripts
- Add docs/TECHNOLOGY_AUDIT_2026_03.md with full repo analysis
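The JWKS auth above reduces to jose's createRemoteJWKSet plus jwtVerify
wrapped in a Hono middleware. A sketch with the JWKS URL, issuer, and
context key as assumptions, not the actual shared-hono API:

```ts
// Hypothetical sketch of the shared-hono JWT middleware; the JWKS URL,
// issuer, and context key are assumptions, not the actual package API.
import { createRemoteJWKSet, jwtVerify } from "jose";
import type { MiddlewareHandler } from "hono";

// jose fetches and caches the auth server's public keys.
const jwks = createRemoteJWKSet(
  new URL("https://auth.example.com/.well-known/jwks.json"),
);

export const requireAuth: MiddlewareHandler = async (c, next) => {
  const token = c.req.header("Authorization")?.replace(/^Bearer /, "");
  if (!token) return c.json({ error: "unauthorized" }, 401);
  try {
    const { payload } = await jwtVerify(token, jwks, {
      issuer: "https://auth.example.com",
    });
    c.set("user", payload); // downstream handlers read c.get("user")
    await next();
  } catch {
    return c.json({ error: "invalid token" }, 401);
  }
};
```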
Net result: -2 Docker containers, -2 ports, -2728 lines of code
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Cloudflare Tunnel: api.mana.how → localhost:3060 (Go API Gateway)
- Prometheus: scrape targets for mana-api-gateway:3060 and mana-matrix-bot:4000
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire mana-llm service into the monitoring stack:
Prometheus (docker/prometheus/prometheus.yml):
- Add mana-llm scrape job (port 3025, 15s interval)
- Include mana-llm in ServiceDown alert expression
Alerts (docker/prometheus/alerts.yml):
- New llm_alerts group with 4 rules:
- LLMServiceDown: mana-llm down > 1 min (critical)
- LLMHighErrorRate: > 10% errors for 5 min (warning)
- OllamaProviderDown: > 50% requests via Google fallback (warning)
- LLMSlowResponses: p95 > 30s for 5 min (warning)
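These alert expressions presuppose specific metrics; a hedged sketch of
the prom-client instrumentation they imply, with metric and label names
as assumptions rather than the actual mana-llm code:

```ts
// Hypothetical instrumentation the alert rules would query; metric and
// label names are assumptions, not the actual mana-llm code.
import { Counter, Histogram } from "prom-client";

// Feeds LLMHighErrorRate / OllamaProviderDown-style ratios, e.g.
// rate(llm_requests_total{status="error"}[5m]) / rate(llm_requests_total[5m])
export const llmRequests = new Counter({
  name: "llm_requests_total",
  help: "LLM requests by provider, model, and outcome",
  labelNames: ["provider", "model", "status"],
});

// Feeds LLMSlowResponses: histogram_quantile(0.95, ...) > 30
export const llmLatency = new Histogram({
  name: "llm_request_duration_seconds",
  help: "End-to-end LLM request latency",
  labelNames: ["provider"],
  buckets: [0.5, 1, 2, 5, 10, 30, 60],
});

llmRequests.inc({ provider: "ollama", model: "llama3", status: "ok" });
llmLatency.observe({ provider: "ollama" }, 1.8);
```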
Grafana Dashboard (docker/grafana/dashboards/mana-llm.json):
- 6 stat panels: status, req/min, error rate, fallback rate, latency, tokens/min
- Requests by Provider (stacked area: Ollama vs Google vs OpenRouter)
- Tokens by Type (prompt vs completion)
- Latency Percentiles (p50, p90, p99)
- Latency by Provider comparison
- Requests by Model breakdown
- Errors by Type
- Google Fallback Rate over time (with threshold coloring)
- Provider Distribution pie chart (24h)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add GlitchTip to health-check.sh monitoring endpoints
- Add native disk space checks for / and /Volumes/ManaData with 80%/90% thresholds
- Extend Prometheus disk alerts to include /host_mnt/Volumes/ManaData mountpoint
- Add ManaData disk usage gauge to Grafana system-overview dashboard
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add MetricsModule to 8 backends missing it (photos, zitare, mukke,
planta, picture, storage, presi, nutriphi); a minimal shape is
sketched after this list
- Enable Prometheus scraping for all 15 backends in prometheus.yml
(previously only 6 were scraped: 3 more were commented out and 6 were
missing entirely)
- Update ServiceDown alert rule to cover all 15 backends
- Update Grafana dashboards (backends, master-overview, system-overview)
with all backend services in health panels
- Fix imprecise regex in application-details dashboard
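A /metrics endpoint in NestJS reduces to little more than exposing
prom-client's registry. A hedged sketch of the minimal module shape; the
real MetricsModule may differ:

```ts
// Hypothetical minimal shape of the MetricsModule added to each backend.
import { Controller, Get, Header, Module } from "@nestjs/common";
import { collectDefaultMetrics, register } from "prom-client";

collectDefaultMetrics(); // heap, event loop lag, GC, etc.

@Controller("metrics")
class MetricsController {
  @Get()
  @Header("Content-Type", register.contentType)
  metrics(): Promise<string> {
    return register.metrics(); // Prometheus text exposition format
  }
}

@Module({ controllers: [MetricsController] })
export class MetricsModule {}
```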
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instrument the CD pipeline to record per-deploy and per-service metrics
(build time, image size, startup time, health status) into PostgreSQL and
push gauges to Pushgateway. Adds a Grafana dashboard with 13 panels covering
deploy frequency, build performance, service health, and history.
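The Pushgateway half of this is just an HTTP POST of Prometheus
text-format samples. A TypeScript equivalent of what the bash library
does with curl; the gateway URL, job, and metric names are assumptions:

```ts
// Hypothetical TypeScript rendering of the Pushgateway push performed by
// deploy-metrics.sh; job and metric names are assumptions.
const PUSHGATEWAY_URL = process.env.PUSHGATEWAY_URL ?? "http://localhost:9091";

export async function pushDeployMetrics(
  service: string,
  buildSeconds: number,
  imageBytes: number,
): Promise<void> {
  // Pushgateway accepts the Prometheus text exposition format over HTTP;
  // path segments after /metrics/job become grouping labels.
  const body =
    `deploy_build_duration_seconds ${buildSeconds}\n` +
    `deploy_image_size_bytes ${imageBytes}\n`;
  const res = await fetch(
    `${PUSHGATEWAY_URL}/metrics/job/cd/service/${encodeURIComponent(service)}`,
    { method: "POST", body },
  );
  if (!res.ok) throw new Error(`pushgateway returned ${res.status}`);
}
```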
New files:
- scripts/mac-mini/init-deploy-tracking.sql (idempotent DDL)
- scripts/deploy-metrics.sh (bash library for CI)
- docker/grafana/provisioning/datasources/deploy-tracking.yml
- docker/grafana/dashboards/deploy-tracking.json
Modified:
- docker/prometheus/prometheus.yml (pushgateway scrape job)
- .github/workflows/cd-macmini.yml (build/health instrumentation)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Medium-priority stability improvements:
Alerting:
- Add vmalert for evaluating Prometheus alert rules
- Add alertmanager for alert routing and grouping
- Add alert-notifier service for Telegram/ntfy notifications
- Enable cadvisor scraping in prometheus config
Disk Monitoring:
- Add check-disk-space.sh for hourly disk monitoring
- Alert on 80% (warning) and 90% (critical) thresholds
- Automatically clean up Docker when disk usage is critical
- Add com.manacore.disk-check.plist for launchd
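The check-disk-space.sh logic above, rendered as a TypeScript sketch for
clarity; the real script is bash, and the prune command is an assumption
about what the auto-cleanup step runs:

```ts
// Hypothetical TypeScript rendering of check-disk-space.sh's logic.
import { execSync } from "node:child_process";

const WARNING = 80;
const CRITICAL = 90;

function usedPercent(mount: string): number {
  // `df -P` prints a stable POSIX layout; the fifth column is "Capacity".
  const line = execSync(`df -P ${mount}`).toString().trim().split("\n")[1];
  return parseInt(line.split(/\s+/)[4], 10);
}

for (const mount of ["/", "/Volumes/ManaData"]) {
  const used = usedPercent(mount);
  if (used >= CRITICAL) {
    console.error(`CRITICAL: ${mount} at ${used}%, cleaning up Docker`);
    execSync("docker system prune -af"); // assumed auto-cleanup step
  } else if (used >= WARNING) {
    console.warn(`WARNING: ${mount} at ${used}%`);
  }
}
```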
Weekly Reports:
- Add weekly-report.sh for system health summary
- Includes: backup status, disk usage, container health,
database stats, error log summary
- Runs every Sunday at 10 AM via launchd
Health Check Updates:
- Add checks for vmalert, alertmanager, alert-notifier
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix LoggerService mock in better-auth.service.spec.ts
- Fix name assertion in auth.controller.spec.ts (empty string fallback)
- Fix createRemoteJWKSet mock in jwt-auth.guard.spec.ts (corrected shape sketched below)
- Add Grafana dashboard for Auth Service monitoring
- Add 10 auth-specific Prometheus alert rules
- Update production readiness plan to 100% complete
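On the createRemoteJWKSet fix: jose's createRemoteJWKSet returns a
callable key resolver, so a mock that returns anything else breaks at the
jwtVerify call site. A hedged sketch of the corrected mock, assuming the
guard calls jose in the usual way; the resolved payload is illustrative:

```ts
// jwt-auth.guard.spec.ts (fragment): createRemoteJWKSet must be mocked as
// a factory returning a function (the key resolver), not a plain object.
jest.mock("jose", () => ({
  createRemoteJWKSet: jest.fn(() => jest.fn()), // function, not object
  jwtVerify: jest.fn().mockResolvedValue({
    payload: { sub: "user-1" },
    protectedHeader: { alg: "RS256" },
  }),
}));
```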
All 199 unit tests passing.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add node-exporter service to docker-compose for CPU/Memory/Disk monitoring
- Enable node-exporter scrape target in Prometheus config
- Update System Overview dashboard with Host System section:
- CPU, Memory, Disk usage gauges
- Total RAM, Total Disk, Uptime, Load stats
- CPU & Memory over time graph
- Network I/O graph
- Add Node Exporter to service status panel
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Commented out:
- node-exporter (container not deployed)
- cadvisor (container not deployed)
- storage/presi/nutriphi-backend (no /metrics endpoint yet)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Reverting 618c58c5, which broke the CI workflow.
Will re-add notifications after fixing the issue.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add notify-start job with Telegram notification for build start
- Add notify-complete job with build status and duration notification
- Push CI metrics to Prometheus Pushgateway for Grafana visualization
- Create CI/CD Grafana dashboard with build status, duration, and history
- Add Pushgateway scrape config to Prometheus
Requires TELEGRAM_BOT_TOKEN, TELEGRAM_CHAT_ID, and PUSHGATEWAY_URL secrets.
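The Telegram call in both notify jobs boils down to one Bot API request.
A TypeScript equivalent of the workflow step, which likely shells out to
curl; the message text is illustrative:

```ts
// Hypothetical equivalent of the notify-start/notify-complete step.
const token = process.env.TELEGRAM_BOT_TOKEN!;
const chatId = process.env.TELEGRAM_CHAT_ID!;

export async function notify(text: string): Promise<void> {
  // Telegram Bot API: a single POST to sendMessage delivers the message.
  const res = await fetch(`https://api.telegram.org/bot${token}/sendMessage`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ chat_id: chatId, text }),
  });
  if (!res.ok) throw new Error(`telegram returned ${res.status}`);
}

// e.g. await notify(`Build ${status} in ${durationSeconds}s`);
```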
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New dashboards:
- Application Details: Node.js runtime (heap, event loop, GC),
HTTP details (status codes, methods, top routes), error analysis
- Database Details: PostgreSQL and Redis metrics with detailed breakdowns
Alerting rules (docker/prometheus/alerts.yml):
- Service: down, high/very high error rate, slow response time
- Infrastructure: high CPU/memory/disk usage
- Database: PostgreSQL/Redis down, high connections, low cache hit
- Container: high CPU/memory, restarts
All dashboards include service selector variable for filtering.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>