shared-errors, shared-logger, shared-llm, notify-client are not
needed by SvelteKit web apps. Their presence caused transitive
dependency conflicts (astro check failing).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Route all AI workloads (Ollama, STT, TTS, Image Gen) to GPU server
(192.168.178.11) via LAN instead of host.docker.internal
- Upgrade default model to gemma3:12b and max concurrent to 5
- Add daily signup limit service (MAX_DAILY_SIGNUPS env var)
- Add GET /api/v1/auth/signup-status public endpoint
- Add k6 load test suite (web-apps, auth, sync-websocket, ollama)
- Add capacity planning documentation
- Fix: add eslint-config to sveltekit-base and calendar Dockerfiles
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After extensive package restructuring (deletions, consolidations, new
packages), the frozen lockfile causes resolution failures in Docker.
Use --no-frozen-lockfile until lockfile stabilizes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add 8 packages that were created after the base image was defined:
subscriptions, credits, shared-hono, shared-storage, shared-landing-ui,
shared-llm, notify-client, shared-errors, shared-logger
Fixes: Rollup failed to resolve @manacore/subscriptions during web builds.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace deleted shared-feedback-* and shared-help-* packages with
consolidated @manacore/feedback and @manacore/help packages.
Add local-store and shared-auth-stores packages.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add mana-user (3062), mana-subscriptions (3063), mana-analytics (3064)
to docker-compose with health checks and traefik labels
- Replace old NestJS Tier 3 app backends (~300 lines) with comment
placeholder for Hono compute servers (need shared Dockerfile)
- Create docker/Dockerfile.hono-server — shared Bun Dockerfile for
all 14 app compute servers (ARG APP for build context)
- Add 5 new databases to setup-databases.sh: mana_auth, mana_credits,
mana_user, mana_subscriptions, mana_analytics, mana_sync
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract feedback, analytics, and AI modules from mana-core-auth into
standalone mana-analytics service (Hono + Bun, Port 3064).
New service (services/mana-analytics/):
- User feedback CRUD with voting
- AI-powered feedback title generation via mana-llm
- Simplified from DuckDB analytics to pure PostgreSQL
- ~550 LOC
Removed from mana-core-auth:
- feedback/ module (6 files)
- analytics/ module (4 files)
- ai/ module (3 files)
- db/schema/feedback.schema.ts
mana-core-auth now contains ONLY pure auth:
- Better Auth (JWT, Sessions, 2FA, Passkeys, OIDC, Magic Links)
- Organizations/Guilds (membership management)
- API Keys, Security, Me (GDPR), Health, Metrics
- Ready for Phase 5: Hono rewrite
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Photos NestJS backend was a proxy to mana-media that enriched
responses with local album/favorite/tag data. Now:
- Albums store → local-first via albumCollection + albumItemCollection
- Favorites → local-first via favoriteCollection (toggle in IndexedDB)
- Photo tags → local-first via photoTagCollection
- Photo listing/stats → direct mana-media API calls from frontend
- Upload → direct mana-media upload from frontend
- Delete → direct mana-media delete from frontend
Removed 27 TypeScript files, 1 Docker container, 1 port (3039).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Presi NestJS backend (40 source files, 50 deps) was a CRUD wrapper
around decks, slides, and themes — all now handled by local-first sync.
Only the share-link feature requires server-side state (public URLs
without auth), so a minimal Hono + Bun server replaces the entire
NestJS backend:
- apps/presi/apps/server/ — Hono server with share routes + GDPR admin
Uses @manacore/shared-hono for auth (JWKS), health, admin, errors
- Web app API client stripped to share-only (was 270 lines → 90 lines)
- Removed from docker-compose, CI/CD, Prometheus, env generation
- NestJS backend deleted (40 TS files, 8 test specs, 3038 lines)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both apps are fully local-first via Dexie.js + mana-sync. Their NestJS
backends were pure CRUD wrappers (20 + 31 source files) that are no
longer needed.
Changes:
- Add packages/shared-hono: JWT auth via JWKS (jose), Drizzle DB factory,
health route, generic GDPR admin handler, error middleware
- Migrate zitare lists page from fetch() to listsStore (local-first)
- Rewrite clock timers store from API-based to timerCollection (Dexie)
- Update clock +layout.svelte CommandBar search to use local collections
- Remove zitare-backend + clock-backend from docker-compose, CI/CD,
Prometheus, env generation, setup scripts
- Add docs/TECHNOLOGY_AUDIT_2026_03.md with full repo analysis
Net result: -2 Docker containers, -2 ports, -2728 lines of code
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Cloudflare Tunnel: api.mana.how → localhost:3060 (Go API Gateway)
- Prometheus: scrape targets for mana-api-gateway:3060 and mana-matrix-bot:4000
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add docker/Dockerfile.sveltekit-base: pre-built base with all 34 shared
packages (mirrors nestjs-base pattern), eliminates redundant COPY/build
steps from individual web Dockerfiles
- Add scripts/mac-mini/build-app.sh: stops monitoring stack before build
to free RAM, auto-restarts on exit (trap cleanup)
- Migrate todo web Dockerfile to use sveltekit-base:local (47 COPY lines
→ 2, 4 build steps → 0)
- Update CD workflow to build sveltekit-base when deploying web apps
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Validator now checks 52 Dockerfiles (web + backend + service).
Fixed 10 missing COPYs across backends, services, and nestjs-base.
Generator also supports backend/service Dockerfiles with markers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two infrastructure improvements for tech independence:
1. Cloudflare Fallback Documentation (docs/CLOUDFLARE_FALLBACK.md):
- Plan B: WireGuard + Caddy on Hetzner VPS (€3.79/mo)
- Complete Caddyfile with all 30+ subdomains
- Step-by-step failover checklist (~15 min to switch)
- Plan C: Direct IP with ISP
2. Self-Hosted Landing Pages (eliminates Cloudflare Pages dependency):
- Nginx container (mana-infra-landings) on port 4400
- Multi-site config: each subdomain → separate dist/ folder
- Build script: scripts/mac-mini/build-landings.sh
- Cloudflare Tunnel ingress rules for 10 landing page domains
- Storage: /Volumes/ManaData/landings/ on external SSD
- Domains: it, chats, pics, zitares, presis, clocks,
manadeck, nutriphi, citycorners, docs
Migration path: Build landings locally, set Cloudflare DNS to
tunnel instead of Pages, then decommission CF Pages projects.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The config_file override replaced the entire default PostgreSQL config
including listen_addresses, breaking inter-container communication.
Use inline -c flags instead which only override specific parameters.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire mana-llm service into the monitoring stack:
Prometheus (docker/prometheus/prometheus.yml):
- Add mana-llm scrape job (port 3025, 15s interval)
- Include mana-llm in ServiceDown alert expression
Alerts (docker/prometheus/alerts.yml):
- New llm_alerts group with 4 rules:
- LLMServiceDown: mana-llm down > 1 min (critical)
- LLMHighErrorRate: > 10% errors for 5 min (warning)
- OllamaProviderDown: > 50% requests via Google fallback (warning)
- LLMSlowResponses: p95 > 30s for 5 min (warning)
Grafana Dashboard (docker/grafana/dashboards/mana-llm.json):
- 6 stat panels: status, req/min, error rate, fallback rate, latency, tokens/min
- Requests by Provider (stacked area: Ollama vs Google vs OpenRouter)
- Tokens by Type (prompt vs completion)
- Latency Percentiles (p50, p90, p99)
- Latency by Provider comparison
- Requests by Model breakdown
- Errors by Type
- Google Fallback Rate over time (with threshold coloring)
- Provider Distribution pie chart (24h)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New project with three apps:
- Landing (Astro): static site with SVG illustrations, location data
- Backend (NestJS, port 3025): CRUD API for locations + favorites, Drizzle ORM, auth via mana-core-auth
- Web (SvelteKit, port 5196): Tailwind 4, PillNav, auth (login/register/SSO), Leaflet map, favorites with optimistic updates, theme/settings
Infrastructure: DB init SQL, setup-databases.sh, generate-env.mjs, root package.json scripts, Dockerfiles, docker-compose.macmini.yml (backend:3025, web:5022), Cloudflare wrangler.toml.
Branding: registered in shared-branding (AppId, APP_BRANDING, APP_ICONS, MANA_APPS, CitycornersLogo).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add GlitchTip to health-check.sh monitoring endpoints
- Add native disk space checks for / and /Volumes/ManaData with 80%/90% thresholds
- Extend Prometheus disk alerts to include /host_mnt/Volumes/ManaData mountpoint
- Add ManaData disk usage gauge to Grafana system-overview dashboard
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LightWrite was replaced by Mukke on the same ports (5180/3010).
Update reverse proxy to use mukke.mana.how and mukke-api.mana.how.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add MetricsModule to 8 backends missing it (photos, zitare, mukke,
planta, picture, storage, presi, nutriphi)
- Enable Prometheus scraping for all 15 backends in prometheus.yml
(was only 6, with 3 commented out and 6 missing entirely)
- Update ServiceDown alert rule to cover all 15 backends
- Update Grafana dashboards (backends, master-overview, system-overview)
with all backend services in health panels
- Fix imprecise regex in application-details dashboard
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ensure sw.js, manifest.webmanifest, and registerSW.js are never cached
by the browser or CDN so service worker updates are picked up immediately
after deploys. Uses a reusable Caddy snippet imported by all web app blocks.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add docker/Dockerfile.nestjs-base with all shared packages pre-built
- Convert 6 backend Dockerfiles (chat, todo, calendar, clock, contacts,
mukke) to inherit from nestjs-base:local
- Fix bugs: duplicate shared-nestjs-setup builds (mukke), unnecessary
shared-error-tracking rebuild in production stage (chat, clock)
- CD pipeline builds base image before services when backends deploy
- Net reduction: 317 lines removed, 112 added (-205 lines)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dockerfile, docker-compose service (port 5100), Caddy and cloudflared
routing for the WhoPixels game. PORT is now configurable via env var.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add explicit uid: deploy-tracking to datasource provisioning
- Add instant: true to all Prometheus stat/gauge panel queries
- Pushgateway gauges need instant queries, not range queries
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instrument the CD pipeline to record per-deploy and per-service metrics
(build time, image size, startup time, health status) into PostgreSQL and
push gauges to Pushgateway. Adds a Grafana dashboard with 13 panels covering
deploy frequency, build performance, service health, and history.
New files:
- scripts/mac-mini/init-deploy-tracking.sql (idempotent DDL)
- scripts/deploy-metrics.sh (bash library for CI)
- docker/grafana/provisioning/datasources/deploy-tracking.yml
- docker/grafana/dashboards/deploy-tracking.json
Modified:
- docker/prometheus/prometheus.yml (pushgateway scrape job)
- .github/workflows/cd-macmini.yml (build/health instrumentation)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add PostgreSQL datasource pointing to GlitchTip database
- Add Error Tracking dashboard with 7 panels:
- Total Open Issues (stat)
- Issues by Project (pie chart)
- Total Events (stat)
- Projects Tracked (stat)
- Resolved vs Unresolved (stat)
- New Issues Over Time (stacked bar chart, 30 days)
- Recent Issues (table with 50 latest, color-coded levels)
- Dashboard links to GlitchTip UI for detailed investigation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Infrastructure:
- Add GlitchTip (web + worker) to docker-compose.macmini.yml (port 8020)
- Add glitchtip.mana.how to Cloudflare Tunnel config
- Add glitchtip database to init-db SQL
- Add GLITCHTIP_DSN to .env.development
Shared Package (@manacore/shared-error-tracking):
- initErrorTracking() - Sentry-compatible init with GlitchTip DSN
- captureException(), captureMessage(), setUser(), setTag(), flush()
- SentryExceptionFilter for NestJS (captures 5xx errors only)
- Graceful no-op when DSN is not configured
Integration:
- Add instrument.ts to calendar, contacts, todo backends
- Import instrument.ts before app bootstrap in all 3 main.ts files
- Error tracking auto-initializes when GLITCHTIP_DSN env var is set
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Create NestJS backend on port 3020 with 4 modules (space, document, ai, token)
- Add Drizzle schema with 5 tables (spaces, documents, token_transactions, model_prices, user_tokens)
- Rewrite web services (spaces, documents, tokens, ai) to use shared API client instead of Supabase
- Move AI API keys server-side (Azure OpenAI, Google Gemini)
- Add seed script for model prices (gpt-4.1, gemini-pro, gemini-flash)
- Add 70 unit tests across 4 test suites (space, document, token, ai services)
- Add monorepo integration (setup-databases.sh, generate-env.mjs, docker init-db, root scripts)
- Remove @supabase/supabase-js dependency and delete supabase.ts from web app
- Update CLAUDE.md with full API documentation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove !login and !logout commands from all 16+ Matrix bots
- Remove login/logout references from all help/welcome messages
- Disable password login in Synapse (password_config.enabled: false)
- System is now OIDC-only via Mana Core authentication
Users must authenticate via "Sign in with Mana Core" in Element.
Existing bot access tokens remain valid.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Medium priority stability improvements:
Alerting:
- Add vmalert for evaluating Prometheus alert rules
- Add alertmanager for alert routing and grouping
- Add alert-notifier service for Telegram/ntfy notifications
- Enable cadvisor scraping in prometheus config
Disk Monitoring:
- Add check-disk-space.sh for hourly disk monitoring
- Alert on 80% (warning) and 90% (critical) thresholds
- Auto-cleanup Docker when disk is critical
- Add com.manacore.disk-check.plist for LaunchD
Weekly Reports:
- Add weekly-report.sh for system health summary
- Includes: backup status, disk usage, container health,
database stats, error log summary
- Runs every Sunday at 10 AM via LaunchD
Health Check Updates:
- Add checks for vmalert, alertmanager, alert-notifier
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Disable api-gateway and skilltree-web (no working images/Dockerfiles)
- Fix mana-search Dockerfile healthcheck port and endpoint
- Update health-check.sh to skip disabled services
- Fix search service health endpoint (/api/v1/health)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add PlaygroundLogo to shared-branding package
- Add playground to APP_BRANDING, APP_ICONS, and APP_URLS
- Replace custom login/register pages with shared-auth-ui components
- Update authStore with resendVerificationEmail and improved signUp
- Add Caddy reverse proxy entry for playground.mana.how
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix LoggerService mock in better-auth.service.spec.ts
- Fix name assertion in auth.controller.spec.ts (empty string fallback)
- Fix createRemoteJWKSet mock in jwt-auth.guard.spec.ts
- Add Grafana dashboard for Auth Service monitoring
- Add 10 auth-specific Prometheus alert rules
- Update production readiness plan to 100% complete
All 199 unit tests passing.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add node-exporter service to docker-compose for CPU/Memory/Disk monitoring
- Enable node-exporter scrape target in Prometheus config
- Update System Overview dashboard with Host System section:
- CPU, Memory, Disk usage gauges
- Total RAM, Total Disk, Uptime, Load stats
- CPU & Memory over time graph
- Network I/O graph
- Add Node Exporter to service status panel
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Removed node_* metrics (node-exporter not deployed)
- Removed container_last_seen (cAdvisor not deployed)
- Added Service Status, Traffic Overview, Database sections
- All panels now use available Prometheus metrics
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>