managarten

mirror of https://github.com/Memo-2023/mana-monorepo.git synced 2026-05-15 00:01:10 +02:00

Author	SHA1	Message	Date
Till JS	9ef97a1877	feat(news): backend ingester service + curated feed API Adds the services/news-ingester Bun service that pulls 25 public RSS/JSON feeds into news.curated_articles every 15 min, with Mozilla Readability fallback for thin RSS bodies and 30-day retention. apps/api /feed is rewritten to read from the new pool table directly instead of the sync_changes hack, with topics/lang/since/limit/offset query params. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 15:53:26 +02:00
Till JS	45790ffbb8	refactor(mana): rename inventar → inventory across the codebase The workbench-registry app id 'inventar' did not match its @mana/shared-branding MANA_APPS counterpart 'inventory', so the tier- gating join in apps/web/src/lib/app-registry/registry.ts silently failed for the inventory module — it fell into the "no MANA_APPS entry, default visible" fallback and was effectively un-gated. The codebase had also voted overwhelmingly for 'inventar' (53 files) vs 'inventory' (3 files in shared-branding), so the long-standing mismatch was just bookkeeping debt waiting to bite. Pre-release, no live data, so the cleanest fix is to align everything on the English 'inventory': - Workbench-registry id, module.config.ts appId, module folder, route folder and i18n locale folder all renamed via git mv - Standalone apps/inventar/ workspace package renamed - All imports, store identifiers (InventarEvents → InventoryEvents, INVENTAR_GUEST_SEED, inventarModuleConfig), i18n keys and href/goto paths follow the rename - The German display label "Inventar" is preserved everywhere it is a user-visible string (page titles, i18n values, toast labels) - Dexie table prefixes (invCollections, invItems, …) are unchanged - Drive-by fix: ListView.svelte was querying non-existent inventarCollections/inventarItems tables — corrected to the actual invCollections/invItems names from module.config - The "inventar ↔ inventory id mismatch" workaround comment in registry.ts is removed since the mismatch no longer exists module-registry.ts also picks up the user's parallel newsModuleConfig addition because both edits land in the same import block — keeping them split would have left the build in an inconsistent state. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 15:50:24 +02:00
Till JS	b8f2d8f694	docs(local-dev): document setup-dev-user + the three founder accounts Adds a "Local Login & Dev Users" section to docs/LOCAL_DEVELOPMENT.md and a short pointer in services/mana-auth/CLAUDE.md so the next dev finds the script without first hitting the "why can't I log in?" wall: - Why it exists (no admin seed, requireEmailVerification + no SMTP) - The 3 default accounts + password - Single-account form + env overrides (TIER, AUTH_URL, …) - Idempotency promise - Prereqs (Postgres + mana-auth on :3001) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 12:26:37 +02:00
Till JS	fbb71f9366	feat(admin): replace mock dashboard stats with real /admin/stats endpoint The /admin route in the unified Mana web app was rendering hardcoded mock data (42 users, 156 successful logins, 3 failed) for every admin who opened it. The previous code had a TODO comment to wire up a real endpoint and the backend half had been waiting for the frontend half ever since the consolidation landed. Backend (mana-auth): Add GET /api/v1/admin/stats — admin-only, returns the seven counts the dashboard needs in a single response. Each count is its own Drizzle query against auth.users / auth.sessions / auth.login_ attempts; they run in parallel via Promise.all so total latency is dominated by the round-trip to Postgres, not the per-query work. Stats: - totalUsers → users where deleted_at IS NULL - newUsers7d → users created in the last 7 days - newUsers30d → users created in the last 30 days - activeSessions → sessions where expires_at > now() AND not revoked - uniqueUsers24h → distinct user_id from sessions with last_activity in the last 24h (and not revoked) - loginSuccess7d → login_attempts where successful=true, last 7d - loginFailed7d → login_attempts where successful=false, last 7d Plus a generatedAt ISO timestamp so the client can show staleness if it ever caches the response. Frontend (apps/mana/apps/web): - Add adminService.getStats() in the existing admin API service (sits next to getUsers / getUserData / deleteUserData; uses the same authenticated base-client and ApiResult envelope). - Replace the onMount mock-data block in admin/+page.svelte with a single adminService.getStats() call. Drop the local Stats interface in favor of the AdminStats type exported from the service. - Guard the Success Rate calculation against division by zero on fresh deployments — when there have been no login attempts in the last 7 days, render '—%' instead of NaN%. Verification: - mana-auth type-check unchanged (baseline errors only) - mana-auth runtime tests still 19/19 passing - svelte-check on the two changed web files: zero errors Closes item #12 in docs/REFACTORING_AUDIT_2026_04.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 12:20:18 +02:00
Till JS	034a07d166	chore(workspace): remove redundant nested lockfiles + workspace.yaml Three pnpm artifacts that were either Pre-Consolidation leftovers or unintentional drift: - apps/context/pnpm-lock.yaml + apps/context/pnpm-workspace.yaml apps/context used to be its own nested workspace declaring apps/* and packages/. After consolidation only apps/context/ apps/mobile remains, and the root pnpm-workspace.yaml already matches it via 'apps//apps/'. The nested lockfile (242 KB) was a separate dependency graph drifting independently from the root. - services/mana-media/packages/client/pnpm-lock.yaml Anomalous lockfile in a workspace sub-package. The root workspace already covers services//packages/* — no reason for client/ to maintain its own resolution. Verified after deletion: - pnpm install completes cleanly (~16s) and now resolves apps/context/apps/mobile from the root lockfile (pnpm list confirms the workspace registration) - apps/api type-check still 0 errors - mana-auth tests still 19/19 passing Tracked as item #26 in docs/REFACTORING_AUDIT_2026_04.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 11:57:11 +02:00
Till JS	e19a81c83c	test(mana-auth): sso-config consistency spec Locks in the relationship between three places that must agree about SSO origin configuration: 1. TRUSTED_ORIGINS in better-auth.config.ts (Better Auth allow-list) 2. CORS_ORIGINS env var on mana-auth in docker-compose.macmini.yml 3. The HTTPS subset of (1) must be a subset of (2) — every origin Better Auth trusts must also pass CORS preflight Background: root CLAUDE.md references this spec file as the canonical "Adding an app to SSO" verification step (line 116) but the file itself never existed. The first run of this spec immediately caught two real bugs: - 3 origins in TRUSTED_ORIGINS were missing from CORS_ORIGINS (https://auth.mana.how, https://arcade.mana.how, https://whopxl.mana.how) - 22 zombie subdomain entries in CORS_ORIGINS left over from before the consolidation (calendar, chat, todo, ...) that no app actually routes to anymore Both fixes shipped together with the TRUSTED_ORIGINS extraction in the broader pre-launch sweep (commit `919fcca4b`). This spec is the guard against the same drift creeping back in. Eight tests: - canonical mana.how + auth subdomain present - localhost dev origins (3001, 5173) present - all production origins HTTPS - all production origins on *.mana.how - no duplicates - every HTTPS trusted origin appears in mana-auth CORS_ORIGINS - soft warning for CORS_ORIGINS entries not in trustedOrigins (catches drift in the other direction) 8/8 pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 11:55:30 +02:00
Till JS	919fcca4b7	refactor(shared-tailwind): rewrite themes.css to single-layer shadcn convention Pre-launch theme system audit found multiple parallel layers in themes.css (--theme-X full hsl strings, --X partial shadcn aliases, --color-X populated by runtime store with raw channels) plus dead-code companion files. The inconsistency caused light-mode regressions when scoped-CSS consumers wrote `var(--color-X)` standalone — the variable holds raw HSL channels which is invalid as a color value, browser fell back to inherited (white). Rewrite to one consistent layer: - Source of truth: --color-X defined as raw HSL channels (e.g. `0 0% 17%`) in :root, .dark, and all variant [data-theme="..."] blocks. Matches the format the runtime store (@mana/shared-theme/src/utils.ts) writes, eliminating the static-fallback-vs-runtime mismatch and the corresponding flash of unstyled content on hydration. - @theme inline uses self-reference + Tailwind v4 <alpha-value> placeholder so utility classes generate correctly AND opacity modifiers work: `text-foreground/50` → `hsl(var(--color-foreground) / 0.5)`. - @layer components (.btn-primary, .card, .badge, etc.) wraps var(--color-X) refs with hsl() — they were broken in light mode too for the same reason. Convention going forward (also documented in the file header): 1. Markup: use Tailwind utility classes (text-foreground, bg-card, …) 2. Scoped CSS: hsl(var(--color-X)) — always wrap with hsl() 3. NEVER raw var(--color-X) in CSS — that's the bug pattern Net file: 692 → 580 LOC. Single source layer, no indirection. Also delete dead companion files (zero imports anywhere): - tailwind-v4.css (had broken self-reference, never imported) - theme-variables.css (legacy hex-based palette) - components.css (legacy component utilities) - index.js / preset.js / colors.js (Tailwind v3 preset format, irrelevant under Tailwind v4) package.json exports map shrinks accordingly to just `./themes.css`. Consumers using `hsl(var(--color-X))` (~379 files across mana-web, manavoxel-web, arcade-web) keep working unchanged — the public API name `--color-X` is preserved. Only the broken pattern `var(--color-X)` (~61 files) needs a follow-up sweep, handled in a separate commit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 01:13:06 +02:00
Till JS	d941ff2231	fix(mana-auth): account lockout was structurally dead + add failure-path tests While adding negative-path integration tests for the auth flow I discovered that neither of the lockout primitives in services/mana-auth/src/services/security.ts has actually been working in production. Two independent silent failures that combined into a "the lockout never triggers, ever" outcome: 1. recordAttempt() inserted into auth.login_attempts with explicit `id = gen_random_uuid()`, but auth.login_attempts.id is a `serial integer` column with `nextval('auth.login_attempts_id_seq')` as default. The UUID-into-integer cast threw a type error every single time, the bare `catch {}` swallowed it as "non-critical", and not a single login attempt was ever persisted. Lockout's "5 failures in 15 min" check was running against an empty table. 2. checkLockout() built `attempted_at > ${new Date(...)}` via the drizzle sql template, but postgres-js cannot bind a JS Date object directly — it tries to byteLength() the parameter and crashes with `Received an instance of Date`. Same anti-pattern: bare `catch`, returns `{locked: false}` (fail-open), no log, completely invisible. Both are "silent broken since the encryption-vault series of changes" class — caught only because the integration test for the lockout flow expected the 6th login attempt to return 429 and got 200 instead. Fixes: - recordAttempt(): drop the bogus `id` column from the INSERT (let the sequence default assign it), default ipAddress to null instead of letting `${undefined}` collapse the parameter slot, and surface errors in the catch instead of swallowing them silently. - checkLockout(): pass `windowStart.toISOString()` instead of the Date object so postgres-js can serialize it. Same catch upgrade — log the cause when failing open. Failure-path test additions (tests/integration/auth-failures.test.ts): - wrong password: assert 401, no JWT, +1 LOGIN_FAILURE in security_events, +1 row in auth.login_attempts - account lockout: 5 failed attempts then 6th returns 429 with remainingSeconds, even with the correct password - unverified email login: 403 with code = EMAIL_NOT_VERIFIED - validate with garbage token: valid !== true - resend verification: second mail arrives in mailpit Plus the run-integration-tests.sh helper now runs both .test.ts files and tests/integration/package.json's `test` script does the same. Negative-control: reverted the recordAttempt fix (re-added the bogus gen_random_uuid id), the wrong-password test failed at the login_attempts assertion. Reverted the checkLockout fix, the lockout test failed at the 429 assertion. Both fixes verified to be load-bearing. 6 tests, 45 expects, ~1.3s on a warm cache.	2026-04-08 18:29:00 +02:00
Till JS	ed746297b5	fix(mana-auth): security_events INSERT crashed on undefined optional fields logEvent() builds its INSERT via a raw `sql` tagged template: sql\`INSERT INTO auth.security_events (..., user_id, ip_address, user_agent, metadata, ...) VALUES (..., \${params.userId}, \${params.ipAddress}, \${params.userAgent}, \${...metadata}, ...)\` Most call sites only pass userId+eventType (or only eventType for the LOGIN_FAILURE / PASSWORD_RESET_REQUESTED / PROFILE_UPDATED / PASSWORD_CHANGED / ACCOUNT_DELETED events). The other params land in the template as `undefined`, and postgres-js's tagged-template renderer collapses `${undefined}` into literal nothing — producing this: VALUES (gen_random_uuid(), $1, $2, , , $3::jsonb, NOW()) ^^^^ Postgres rejects with "syntax error at or near \",\"". The catch block swallowed it as a `console.warn('Failed to log security event (non-critical):', params.eventType)` with no error detail, which is why this has been silently broken for who knows how long — every register, every login, every password change has been losing its audit row. Fix: - Coerce optional params to `null` (`params.userId ?? null`) before interpolation. NULL is what postgres-js renders for an explicit null. - Surface the actual error in the catch warn so the next time something similar happens it shows up in logs instead of just "non-critical". Verified the diagnosis by toggling `log_statement = all` on the test postgres, triggering a register, and reading the literal failed statement out of postgres logs.	2026-04-08 17:59:23 +02:00
Till JS	bfeeef7819	chore(matrix): final scrub of stale matrix references A grep audit after the previous matrix removal commits found a handful of stragglers in non-runtime files that the earlier sweeps missed: - services/mana-llm/CLAUDE.md: removed matrix-ollama-bot from the consumer-apps diagram and from the related-services table - services/mana-video-gen/CLAUDE.md: removed "Matrix Bots" integration bullet - packages/notify-client/README.md: removed sendMatrix() doc entry (the method itself was already gone in the prior cleanup) - docker/grafana/dashboards/logs-explorer.json: dropped the "Matrix Stack" log row that queried tier="matrix" (would show no data forever) - docker/grafana/dashboards/master-overview.json: dropped the "Matrix Bots" stat panel that counted up{job=~"matrix-.*-bot"} - apps/mana/apps/landing/src/data/ecosystem-health.json: regenerated via scripts/ecosystem-audit.mjs to drop matrix from the app list, icon counts, file analytics, top offenders and authGuard missing list - .gitignore: removed services/matrix-stt-bot/data/ pattern (the service itself was deleted long ago) Production-side stragglers also addressed (not in this commit): - DROP USER synapse on prod Postgres (the parallel cleanup commit `2514831a3` dropped DATABASE matrix + DATABASE synapse but left the role behind) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 16:47:54 +02:00
Till JS	2514831a3b	chore(matrix): scrub final matrix references after subsystem removal The matrix subsystem was removed in a prior commit. This commit cleans up the small leftovers that grep found: - docker-compose.macmini.yml: dropped the "Matrix Stack" port-range comment, the "matrix" category from the naming convention, and a stale watchtower comment about Matrix notifications. - packages/credits/src/operations.ts: removed AI_BOT_CHAT credit operation type and its definition. It was the billing entry for "Chat with AI via Matrix bot" — no callers left. - services/mana-credits gifts schema + service + validation: removed the targetMatrixId column / param / Zod field. The corresponding PostgreSQL column was dropped manually with `ALTER TABLE gifts.gift_codes DROP COLUMN target_matrix_id` on prod. - docker/grafana/dashboards/{master,system}-overview.json: removed the `up{job="synapse"}` panel queries — they would have shown No Data forever now that Synapse is gone. Production-side cleanup performed in parallel (not in this commit): - Stopped + removed mana-matrix-{synapse,element,web,bot} containers - Removed mana-matrix-bot:local, matrix-web:latest, matrixdotorg/synapse:latest, vectorim/element-web:latest images (~3 GB) - Removed mana-matrix-bots-data Docker volume - Removed /Volumes/ManaData/matrix/ media store (4.3 MB) - DROP DATABASE matrix; DROP DATABASE synapse; on Postgres Cosmetic leftovers intentionally untouched: - Eisenhower matrix in todo (LayoutMode 'matrix') — productivity concept - ${{ matrix.service }} in .github/workflows — GitHub Actions strategy - services/mana-media/apps/api/dist/.../matrix/* — stale build output (not in git, regenerated next mana-media build)	2026-04-08 16:39:42 +02:00
Till JS	8e8b6ac65f	fix(mana-auth) + chore: rewrite /api/v1/auth/login JWT mint, remove Matrix stack This commit bundles two unrelated changes that were swept together by an accidental `git add -A` in another working session. Documented here so the history reflects what's actually inside. ═══════════════════════════════════════════════════════════════════════ 1. fix(mana-auth): /api/v1/auth/login mints JWT via auth.handler instead of api.signInEmail ═══════════════════════════════════════════════════════════════════════ Previous attempt (commit `55cc75e7d`) tried to fix the broken JWT mint in /api/v1/auth/login by switching the cookie name from `mana.session_token` to `__Secure-mana.session_token` for production. That was necessary but not sufficient: Better Auth's session cookie value isn't just the raw session token, it's `<token>.<HMAC>` where the HMAC is derived from the better-auth secret. Reconstructing the cookie from auth.api.signInEmail's JSON response only gave us the raw token, so /api/auth/token's get-session middleware still couldn't validate it and the JWT mint kept silently failing. Real fix: do the sign-in via auth.handler (the HTTP path) rather than auth.api.signInEmail (the SDK path). The handler returns a real fetch Response with a Set-Cookie header containing the fully signed cookie envelope. We capture that header verbatim and forward it as the cookie on the /api/auth/token request, which now passes validation and mints the JWT correctly. Verified end-to-end on auth.mana.how: $ curl -X POST https://auth.mana.how/api/v1/auth/login \ -d '{"email":"...","password":"..."}' { "user": {...}, "token": "<session token>", "accessToken": "eyJhbGciOiJFZERTQSI...", ← real JWT now "refreshToken": "<session token>" } Side benefits: - Email-not-verified path is now handled by checking signInResponse.status === 403 directly, no more catching APIError with the comment-noted async-stream footgun. - X-Forwarded-For is forwarded explicitly so Better Auth's rate limiter and our security log see the real client IP. - The leftover catch block now only handles unexpected exceptions (network errors etc); the FORBIDDEN-checking logic in it is dead but harmless and left in for defense in depth. ═══════════════════════════════════════════════════════════════════════ 2. chore: remove the entire self-hosted Matrix stack (Synapse, Element, Manalink, mana-matrix-bot) ═══════════════════════════════════════════════════════════════════════ The Matrix subsystem ran parallel to the main Mana product without any load-bearing integration: the unified web app never imported matrix-js-sdk, the chat module uses mana-sync (local-first), and mana-matrix-bot's plugins duplicated features the unified app already ships natively. Keeping it alive cost a Synapse + Element + matrix-web + bot container quartet, three Cloudflare routes, an OIDC provider plugin in mana-auth, and a steady drip of devlog/dependency churn. Removed: - apps/matrix (Manalink web + mobile, ~150 files) - services/mana-matrix-bot (Go bot with ~20 plugins) - docker/matrix configs (Synapse + Element) - synapse/element-web/matrix-web/mana-matrix-bot services in docker-compose.macmini.yml - matrix.mana.how/element.mana.how/link.mana.how Cloudflare tunnel routes - OIDC provider plugin + matrix-synapse trustedClient + matrixUserLinks table from mana-auth (oauth_* schema definitions also removed) - MatrixService import path in mana-media (importFromMatrix endpoint) - Matrix notification channel in mana-notify (worker, metrics, config, channel_type enum, MatrixOptions handler) - Matrix entries from shared-branding (mana-apps + app-icons), notify-client, the i18n bundle, the observatory map, the credits app-label list, the landing footer/apps page, the prometheus + alerts + promtail tier mappings, and the matrix-related deploy paths in cd-macmini.yml + ci.yml Devlog/manascore/blueprint entries that mention Matrix are left intact as historical record. The oauth_* + matrix_user_links Postgres tables stay on existing prod databases — code can no longer write to them, drop them in a follow-up migration if you want them gone for real. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 16:32:13 +02:00
Till JS	55cc75e7d3	fix(mana-auth): /api/v1/auth/login uses wrong cookie name in production The custom /api/v1/auth/login route signs the user in via the better-auth SDK (auth.api.signInEmail) and then forges a request to /api/auth/token to mint a JWT, passing the session token as a synthetic cookie header. The cookie name was hardcoded as `mana.session_token=...`, but in production better-auth issues the session cookie with the __Secure- prefix (because secure: true is enabled). Get-session middleware on the /api/auth/token side couldn't find the session under the unprefixed name, so it returned 401 silently. Result: tokenResponse.ok was false, the route fell through, and the response had no `accessToken` field at all — only the bare { token, user, redirect } from signInEmail. The frontend in @mana/shared-auth then picked this up as `data.accessToken === undefined` and stored undefined as the JWT, while the parallel /api/auth/sign-in/email call masked the visible damage by setting the SSO cookie. So login appeared to work in the browser (cookie present, session worked) but the JWT path was always broken. Fix: pick the cookie name based on config.nodeEnv. In production use __Secure-mana.session_token, in development use mana.session_token (no __Secure- prefix because secure: false in dev). Verified end-to-end on auth.mana.how: POST /api/v1/auth/login → response now includes accessToken (a real JWT, EdDSA, with sub/email/role/sid/tier/iss/aud claims), refreshToken (the session token), plus the original signInEmail fields. The other /api/auth/get-session call sites in this file forward the incoming request headers verbatim, so they preserve whatever real cookie the browser sent and don't have this bug.	2026-04-08 16:20:18 +02:00
Till JS	0d1d3b9449	fix(mana-auth): declare missing nanoid dependency mana-auth has been crash-looping in production with: error: Cannot find package 'nanoid' from '/app/src/services/encryption-vault/index.ts' The encryption-vault service imports nanoid for audit row IDs (line 27, used at line 547 in the audit log writer), but nanoid was never added to services/mana-auth/package.json. The import was introduced in commit `e9915428c` (phase 2 — server-side master key custody) and slipped past because nanoid happens to exist transitively in the workspace via postcss → nanoid@3.3.11. Local pnpm store lookups would resolve it just fine; a strict isolated container build can't. Fix: - Add "nanoid": "^5.0.0" to services/mana-auth/package.json deps - pnpm install pulled nanoid@5.1.7 into services/mana-auth/node_modules Verified the import resolves locally: bun -e 'import { nanoid } from "nanoid"; console.log(nanoid())' → ok: 6TLuTWlenhC0KnSESn5Ex The Mac Mini still needs to redeploy mana-auth (rebuild image with the new lockfile, restart container) to pick this up — production is currently 502ing on auth.mana.how.	2026-04-08 15:50:14 +02:00
Till JS	4cb1bc1827	fix(mana-voice-bot): move default port 3050 → 3024 + Windows GPU deployment notes mana-voice-bot's source default was 3050, which collided with mana-sync. Today the collision is latent (voice-bot isn't deployed anywhere), but sooner or later someone is going to start it on a host that's already running mana-sync and the second one will refuse to bind. Moving to 3024 puts it inside the AI/ML port range alongside its dependencies (stt 3020, tts 3022, image-gen 3023, llm 3025) and away from sync. Updated: - app/main.py — PORT default 3050 → 3024 - start.sh, setup.sh — same fix in the example commands - CLAUDE.md — full rewrite. Old version described "Mac Mini deployment" with launchd; the new version explicitly says "not deployed yet" and documents the seven concrete steps to deploy on the Windows GPU box alongside the other AI services (Scheduled Task, service.pyw, .env, firewall rule, cloudflared route, WINDOWS_GPU_SERVER_SETUP.md update). docs/WINDOWS_GPU_SERVER_SETUP.md: - Added the missing ManaVideoGen scheduled task to all four Start-ScheduledTask snippets — video-gen has been running on the Windows GPU but the doc had never picked it up. - Added a "mana-video-gen (Port 3026)" service section parallel to the existing image-gen one, with venv path, repo pointer, model, etc. - Added a repo-pendants table mapping C:\mana\services\<svc>\ to the corresponding services/<svc>/ directory in the repo, plus a note that changes should flow repo→Windows, not the other way around. docs/PORT_SCHEMA.md: - Reconciled the warning block with the post-cleanup reality: no more active or latent port collisions (image-gen ↔ video-gen and voice-bot ↔ sync are both resolved). Listed the actual ports per host with public URLs. Kept the planned-vs-actual disclaimer for the services that still don't match the aspirational ranges (mana-credits 3061 vs planned 3002, etc).	2026-04-08 13:14:57 +02:00
Till JS	f4347032ca	chore(mac-mini): remove all AI service infrastructure (moved to Windows GPU) The Mac Mini hasn't run mana-llm/stt/tts/image-gen for a while — those services live on the Windows GPU server now. The Mac-targeted installers, plists, and platform-checking setup scripts have been sitting in the repo as cargo-cult, suggesting Mac Mini deployment is still a real option. It isn't. Removed (Mac-Mini deployment infrastructure): services/mana-stt/ - com.mana.mana-stt.plist (LaunchAgent) - com.mana.vllm-voxtral.plist (LaunchAgent for the abandoned local Voxtral experiment) - install-service.sh (single-service launchd installer) - install-services.sh (mana-stt + vllm-voxtral installer) - setup.sh (Mac arm64 installer) - scripts/setup-vllm.sh (vLLM-Voxtral setup) - scripts/start-vllm-voxtral.sh services/mana-tts/ - com.mana.mana-tts.plist - install-service.sh - setup.sh (Mac arm64 installer) scripts/mac-mini/ - setup-image-gen.sh (Mac flux2.c launchd installer) - setup-stt.sh - setup-tts.sh - launchd/com.mana.image-gen.plist - launchd/com.mana.mana-stt.plist - launchd/com.mana.mana-tts.plist setup-tts-bot.sh stays — it's the Matrix TTS bot installer (Synapse side), not the mana-tts service. Updated: - services/mana-stt/CLAUDE.md, README.md — fully rewritten for the Windows GPU reality (CUDA WhisperX, Scheduled Task ManaSTT, .env keys matching the actual production .env on the box) - services/mana-tts/CLAUDE.md, README.md — same treatment, documenting Kokoro/Piper/F5-TTS on the Windows GPU under Scheduled Task ManaTTS - scripts/mac-mini/README.md — dropped the STT setup section, replaced with a pointer to docs/WINDOWS_GPU_SERVER_SETUP.md and the per-service CLAUDE.md files - docs/MAC_MINI_SERVER.md — expanded the "deactivated launchagents" list to mention the now-removed plists, added the full GPU service port table with public URLs, added a cleanup snippet for any old plists still installed on a Mac Mini somewhere	2026-04-08 13:06:40 +02:00
Till JS	c7b4388cec	feat(mana-image-gen): replace Mac flux2.c implementation with Windows GPU diffusers The repo's mana-image-gen used to be a Mac Mini–only service built on flux2.c with hard MPS+arm64 platform checks. The actual production image-gen runs on the Windows GPU server (RTX 3090) using HuggingFace diffusers + PyTorch CUDA + FLUX.1-schnell — completely different code that lived only at C:\mana\services\mana-image-gen\ on the GPU box. This commit pulls the Windows implementation into the repo and deletes the Mac one, so there's exactly one mana-image-gen and its source of truth is git rather than one folder on one machine. Removed: - setup.sh — Mac-only flux2.c installer with hard arm64 platform check - app/main.py (Mac flux2.c subprocess wrapper version) - app/flux_service.py (Mac flux2.c subprocess wrapper version) Added (pulled from C:\mana\services\mana-image-gen\): - app/main.py — FastAPI endpoints (/generate, /images/*, /cleanup) - app/flux_service.py — diffusers FluxPipeline wrapper - app/api_auth.py — ApiKeyMiddleware (GPU_API_KEY) - app/vram_manager.py — shared VRAM accounting - service.pyw — Windows runner used by the ManaImageGen scheduled task Updated: - main.py PORT default from 3025 → 3023 to match the production reality (the service.pyw runner already binds 3023 explicitly via uvicorn.run, but the source default should match so direct uvicorn invocations and local tests don't pick the wrong port) - CLAUDE.md fully rewritten to describe the Windows/CUDA/diffusers stack - README.md trimmed to a pointer at CLAUDE.md + the public URL - .env.example written from scratch (didn't exist before — the service's .env on the GPU box was undocumented) The setup-image-gen.sh launchd installer in scripts/mac-mini/ and the actual Mac Mini deployment will be cleaned up in the next commit, along with the rest of the Mac-Mini AI service infrastructure.	2026-04-08 13:02:42 +02:00
Till JS	b8e18b7f82	chore(ai-services): adopt Windows GPU as source of truth for llm/stt/tts The Windows GPU server has been the actual production home for these services for some time, and the running code there has drifted ahead of the repo. This sync pulls the live versions back into the repo so the Windows box is no longer the only place those changes exist. Pulled from C:\mana\services\* on mana-server-gpu (192.168.178.11): mana-llm: - src/main.py, src/config.py — small fixes (auth wiring, config tweaks) - src/api_auth.py — NEW (cross-service GPU_API_KEY validator) - service.pyw — Windows runner used by the ManaLLM scheduled task (sets up logging redirect, loads .env, calls uvicorn) mana-stt: - app/main.py — substantial cleanup (684→392 lines), drops the whisperx-as-separate-backend branching now that whisper_service.py rolls whisperx in directly - app/whisper_service.py — full CUDA + whisperx rewrite (158→358 lines) - app/auth.py + external_auth.py — significantly expanded auth - app/vram_manager.py — NEW (shared VRAM accounting helper) - service.pyw — Windows runner with CUDA pre-init, FFmpeg PATH injection, .env loading - removed: app/whisper_service_cuda.py (folded into whisper_service.py) - removed: app/whisperx_service.py (folded into whisper_service.py) mana-tts: - app/auth.py, external_auth.py — same auth expansion as stt - app/f5_service.py, kokoro_service.py — Windows tweaks - app/vram_manager.py — NEW (same shared helper as stt) - service.pyw — Windows runner mana-video-gen: - service.pyw — Windows runner (no other changes; the .py code on the GPU box is byte-identical to what's already in the repo) The service.pyw files contain absolute Windows paths (C:\mana\services\<svc>) and a hardcoded FFmpeg PATH for the tills user profile. Kept as-is intentionally — they exist to be deployed to that one machine and any abstraction layer would just hide what's actually happening. Anyone redeploying to a different layout will need to edit the path strings, which is a known and obvious change. Mac-Mini infrastructure for these services (launchd plists, install scripts, scripts/mac-mini/setup-{stt,tts}.sh, the Mac-flux2c image-gen implementation) is still on disk and will be removed in a follow-up commit, along with replacing mana-image-gen with the Windows diffusers+CUDA implementation. This commit is just the live-code sync.	2026-04-08 12:46:03 +02:00
Till JS	3c91691d26	fix(mana-image-gen): align source default port with production reality Source default was 3026 but Mac Mini production has been overriding to 3025 via the launchd plist in scripts/mac-mini/setup-image-gen.sh ever since the service was set up. The override existed in exactly one place that is not version-controlled in any obvious way — anyone redeploying without that script would land on 3026 and clients pointing at 3025 would fail to connect. Source default → 3025 across main.py, setup.sh, README, CLAUDE.md so the launchd plist is no longer load-bearing. The Mac Mini setup script still sets PORT=3025 explicitly; that's now belt-and-suspenders rather than the only thing keeping production alive. Also added a note clarifying that this Mac Mini service (flux2.c, MPS, arm64-only) is not the same thing as the "image-gen" running on the Windows GPU server (PyTorch + diffusers + CUDA, port 3023, code lives at C:\mana\services\mana-image-gen\ outside this repo). Two different implementations sharing a name was confusing the port-collision audit. Updated docs/PORT_SCHEMA.md warning block to retract the previous false claims of two active port collisions: - image-gen ↔ video-gen on 3026 — wrong: image-gen runs on Mac Mini on 3025 (now also the source default), video-gen is alone on the Windows GPU on 3026 - voice-bot ↔ sync on 3050 — latent only: mana-voice-bot is not deployed anywhere (no launchd, no scheduled task, no cloudflared route), so the collision is in source defaults but not in production The voice-bot 3050 default should still be moved before voice-bot is ever deployed — flagged in the PORT_SCHEMA warning instead of silently fixed since voice-bot deployment is its own decision.	2026-04-08 12:30:33 +02:00
Till JS	b0a08ce239	docs(services): add CLAUDE.md for stt + events, fix stale entries, flag port collisions New service docs: - services/mana-stt/CLAUDE.md — FastAPI surface with Whisper MLX (local), WhisperX (rich), and Voxtral (local + Mistral API). Documents the lazy backend loading and the launchd plist setup on the Mac Mini. - services/mana-events/CLAUDE.md — Hono/Bun service for public RSVP and event-sharing. Documents the host (JWT) vs public (token) split, the rate-limit sweeper, and the createApp factory pattern that lets unit tests run without bootstrapping the production sweeper. Stale entries fixed: - mana-auth: dropped "rewritten from NestJS / drop-in replacement" — the rewrite is the only mana-auth there is now. Email channel updated from Brevo SMTP to self-hosted Stalwart (see docs/MAIL_SERVER.md). - mana-notify: same Brevo → Stalwart fix in the channel table and env var defaults. PORT_SCHEMA.md flagged as aspirational: - The doc was dated 2026-03-28 and presented as "single source of truth", but cross-checking against actual service source files (config.go, main.py, start.sh) shows nothing matches. Added a prominent warning at the top with the real ports + two confirmed collisions: * mana-image-gen and mana-video-gen both default to PORT 3026 * mana-voice-bot and mana-sync both default to PORT 3050 Today these are masked because image-gen + voice-bot live on the Windows GPU server while video-gen + sync live on the Mac Mini, but the moment they share a host they collide. Either execute the planned reorg or pick non-colliding ports and rewrite the doc to match reality — flagged as a real follow-up.	2026-04-08 12:23:48 +02:00
Till JS	b6486a8a46	fix(mana-video-gen): typo in get_model_info — total_mem → total_memory PyTorch's `torch.cuda.get_device_properties(0)` returns a `_CudaDeviceProperties` object whose memory attribute is `total_memory` (bytes), not `total_mem`. The typo crashed the service immediately at startup because `get_model_info()` is called from the FastAPI lifespan handler, not lazily — uvicorn logged "Application startup failed" before any request could land. Found while installing mana-video-gen on the Windows GPU box (192.168.178.11:3026) for the gpu-video.mana.how Cloudflare route. After the fix the service starts cleanly under the ManaVideoGen scheduled task and responds 200 on /health both LAN and via Cloudflare tunnel. status.mana.how now reports 42/42 — first time ever. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 11:59:40 +02:00
Till JS	142a65a22f	docs: Phase 9 documentation roundup — close encryption-shaped doc gaps Five documentation surfaces gained encryption awareness in this sweep. Before this commit, the only place anyone could learn about the at-rest encryption layer or the zero-knowledge opt-in was the internal DATA_LAYER_AUDIT.md. New contributors and self-hosters would never discover one of the most important features of the product just by reading the standard onboarding docs. apps/docs/src/content/docs/architecture/security.mdx (NEW) ---------------------------------------------------------- First-class user-facing security page in the Starlight site, slotted into the Architecture sidebar between Authentication and Backend. Sections: - What's encrypted (overview table of 27 modules + the intentional plaintext carve-outs) - Standard mode flow with ASCII diagram - "What Mana CAN see" trust statements per mode - Zero-knowledge mode setup walkthrough (Steps component) - Unlock flow on a new device - Recovery code rotation - Deployment requirements (the loud MANA_AUTH_KEK warning) - Audit trail action vocabulary - Threat model summary table - Implementation file references with paths services/mana-auth/CLAUDE.md ---------------------------- New "Encryption Vault" section under Key Endpoints, listing all 7 routes (status, init, key, rotate, recovery-wrap GET+DELETE, zero-knowledge) with their HTTP method, path, error codes, and a description. Mentions the three CHECK constraints + RLS + audit table. Points readers at DATA_LAYER_AUDIT.md and the new security.mdx for the deep dive. Environment Variables block gains MANA_AUTH_KEK with a multi-line comment explaining the openssl rand command + dev fallback warning. apps/mana/CLAUDE.md ------------------- Full rewrite. The existing file was from the Supabase era and described things like @supabase/ssr, safeGetSession(), and a five-table schema with users + organizations + teams that doesn't exist any more. Replaced with the unified-app architecture: - Module system layout (collections.ts / queries.ts / stores/) - Mana Auth (Better Auth + EdDSA JWT) instead of Supabase - Local-first data layer with the full pipeline diagram - At-rest encryption section with the "when writing module code that touches sensitive fields" 4-step guide - Updated routing structure (no more separate /organizations, /teams routes) - Module store pattern code example - Reference document table at the bottom pointing at the audit, the new security.mdx, and the auth doc Root CLAUDE.md -------------- New "At-Rest Encryption (Phase 1–9)" subsection under the Local-First Architecture section. Two-mode trust summary table, production requirement for MANA_AUTH_KEK with the openssl command, the "when writing module code" 4-step guide, and a reference table. New contributors reading the root CLAUDE.md from top to bottom now hit encryption naturally as part of the data layer discussion. .env.macmini.example -------------------- MANA_AUTH_KEK was missing from the production env example entirely — the macmini deployment would silently boot on the 32-zero-byte dev fallback if you copied this file. Added with a multi-paragraph comment covering: how to generate, why it's required, how to store securely (Docker secrets / KMS / Vault), and the rotation caveat. apps/docs/src/content/docs/deployment/self-hosting.mdx ------------------------------------------------------ Two changes: 1. Added MANA_AUTH_KEK to the mana-auth service block in the Compose example with an inline comment pointing at the new section below. 2. New "Encryption Vault Setup" H2 section with subsections: - Generating a KEK (with a fake example value labelled DO NOT USE — generate your own) - Securing the KEK (Docker secrets, KMS, systemd LoadCredential, anti-patterns) - "What if I lose the KEK?" — explains the data is unrecoverable by design and mitigation via zero-knowledge mode opt-in - KEK rotation — calls out the missing background re-wrap job as a known limitation apps/docs/astro.config.mjs -------------------------- Added "Security & Encryption" entry to the Architecture sidebar between Authentication and Backend so the new page is reachable from the docs nav. Astro check: 0 errors, 0 warnings, 0 hints across 4 .astro files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 11:47:59 +02:00
Till JS	c2c960121e	test(mana-auth): vault service integration tests against real postgres Closes backlog #1 from the Phase 9 audit. Adds 28 integration tests for the EncryptionVaultService against a real Postgres so the RLS policies, CHECK constraints and audit-row writes are exercised as the production app actually sees them. The pure-crypto KEK tests in kek.test.ts already covered the wrap/unwrap primitives — this new file fills in the service-shaped gaps that need a real DB. Test infrastructure ------------------- - Reads TEST_DATABASE_URL from env. Whole suite is SKIPPED via describe.skip if unset, so unrelated CI runs and `bun test` from a fresh checkout don't fail on missing connection. The encryption-vault sub-job has to provision a Postgres explicitly. - Schema is assumed already migrated (run `pnpm db:push` or apply sql/002 + sql/003 manually before invoking the suite). Tests insert a fresh test user per case via beforeEach so cross-test pollution is impossible despite the FK to auth.users. - afterAll cleans up the user (CASCADE wipes vault + audit) and closes the postgres pool so bun test exits cleanly. Coverage -------- init (3): - Mints a fresh vault, wrapped_mk + wrap_iv populated, ZK off - Idempotent (returns same key) - Audit rows are written getStatus (5): - vaultExists=false for unconfigured user - vaultExists=true after init, no recovery wrap - hasRecoveryWrap=true after setRecoveryWrap - zeroKnowledge=true after enableZK - Does NOT write an audit row (cheap metadata read) setRecoveryWrap (4): - Stores wrap on existing vault - VaultNotFoundError on missing vault - Idempotent (replaces previous wrap) - Writes recovery_set audit row clearRecoveryWrap (3): - Removes the wrap - ZeroKnowledgeActiveError when ZK is on - VaultNotFoundError on missing vault enableZeroKnowledge (4): - Flips zero_knowledge=true and NULLs out wrapped_mk + wrap_iv - RecoveryWrapMissingError if no recovery wrap is set - Idempotent (already-on is no-op) - VaultNotFoundError on missing vault disableZeroKnowledge (2): - Restores wrapped_mk from a client-supplied master key, verifies the round-trip via getMasterKey returns the same bytes - No-op when ZK is already off getMasterKey (3): - Returns unwrapped MK in standard mode - Returns recovery blob with requiresRecoveryCode=true in ZK mode - VaultNotFoundError on missing vault rotate (2): - Mints fresh MK and wipes any existing recovery wrap - ZeroKnowledgeRotateForbidden in ZK mode DB-level invariants (2): - Setting wrapped_mk back while ZK active is rejected by encryption_vaults_zk_consistency - Setting wrap_iv to NULL while wrapped_mk is set is rejected by encryption_vaults_wrap_iv_pair Both wrap the Drizzle update in an arrow IIFE so expect(...).rejects.toThrow() sees a real Promise (Drizzle's chainable update() only executes on await/then). Run results ----------- With TEST_DATABASE_URL set + schema migrated: 28 pass, 0 fail, 64 expect() calls Without TEST_DATABASE_URL set (default): 0 pass, 30 skip (full suite cleanly skipped) KEK tests in kek.test.ts still run unaffected. Drive-by: kek.test.ts header comment updated to point at the new sibling file instead of saying "tests will live alongside mana-sync" (which was outdated speculation from Phase 2). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 23:39:48 +02:00
Till JS	78d949d051	feat(crypto): vault status endpoint + settings page hydration Closes the Phase 9 Milestone 4 known limitation where the settings page always started in 'idle' state regardless of whether the user had already enabled zero-knowledge mode. Adds a cheap server-side status read + hydrates the page on mount. Server side ----------- New VaultStatus interface and getStatus(userId) method on EncryptionVaultService — single SELECT against encryption_vaults, no decryption, no audit logging (this gets called on every settings page mount and we don't want to flood the audit log with read-only metadata fetches). Returns sane defaults when the vault row doesn't exist yet so the client can avoid a 404 dance. GET /api/v1/me/encryption-vault/status → { vaultExists: boolean, hasRecoveryWrap: boolean, zeroKnowledge: boolean, recoverySetAt: string \| null } Client side ----------- vault-client.ts gains a `getStatus()` method that bypasses the fetchVault retry helper (status reads should be cheap and one-shot; if they fail we let the caller fall back to defaults). Re-exports VaultStatus + RecoveryCodeSetupResult from the crypto barrel. settings/security/+page.svelte ------------------------------ onMount kicks off a getStatus() call. Two things change based on the response: 1. If the server says zero_knowledge=true, jump zkSetupStep to 'enabled' so the page renders the active-state UI directly instead of the setup flow. 2. New `hasRecoveryWrap` state tracks whether a wrap is stored, even if ZK isn't active yet. The idle branch now has TWO variants: - hasRecoveryWrap=false: original "Recovery-Code einrichten" single button (unchanged from milestone 4) - hasRecoveryWrap=true: amber notice "you have a code stored but ZK isn't active" with three buttons: * "Zero-Knowledge jetzt aktivieren" (jumps straight to the enable call) * "Neuen Recovery-Code generieren" (rotates the wrap) * "Recovery-Code entfernen" (with two-click confirmation, calls DELETE /recovery-wrap) This handles the previously-orphaned state where a user generated a code, copied it to their password manager, but never confirmed the final activation step. Without this branch, after a reload the settings page would show "Setup" again and the call would fail with "vault is already in zero-knowledge mode" — except it wouldn't, because the vault wasn't actually in ZK yet, just had a recovery wrap stored. Either way the state was confusing. handleSetupRecoveryCode + handleClearRecoveryCode now keep hasRecoveryWrap in sync after the round trip. Fail-quiet on getStatus error: if the network/auth/server-side fetch fails, the page stays at the idle default. The user can still run the setup flow, and any inconsistencies surface via the usual server-side error responses. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 23:19:49 +02:00
Till JS	f46d1328d8	feat(mana-auth): phase 9 milestone 2 — vault recovery wrap + zero-knowledge Server-side support for the Phase 9 zero-knowledge opt-in. Adds the recovery-wrap columns + four new vault operations + the routes that expose them. Schema (sql/003_recovery_wrap.sql) ---------------------------------- Adds to auth.encryption_vaults: - recovery_wrapped_mk text (NULL until set) - recovery_iv text (NULL until set) - recovery_format_version smallint NOT NULL DEFAULT 1 - recovery_set_at timestamptz - zero_knowledge boolean NOT NULL DEFAULT false Drops NOT NULL from wrapped_mk + wrap_iv (a vault in zero-knowledge mode has no server-side wrap at all). Three CHECK constraints enforce the invariant at the DB level so no service bug can leave a vault in an inconsistent state: - encryption_vaults_has_wrap — at least one of (wrapped_mk, recovery_wrapped_mk) is set - encryption_vaults_wrap_iv_pair — ciphertext + IV are paired (both NULL or both set) on each wrap form - encryption_vaults_zk_consistency — zero_knowledge=true implies wrapped_mk IS NULL AND recovery_wrapped_mk IS NOT NULL If a code-level bug ever tried to enable ZK without a recovery wrap, or to leave both wraps empty, Postgres would reject the UPDATE. Drizzle schema (db/schema/encryption-vaults.ts) ----------------------------------------------- Mirrors the migration: wrappedMk + wrapIv become nullable, the four new columns added with the right defaults. Inline doc comment explains the zero-knowledge fork. Service (services/encryption-vault/index.ts) -------------------------------------------- VaultFetchResult gains optional `requiresRecoveryCode` / `recoveryWrappedMk` / `recoveryIv` so the route handler can serialize the right shape. masterKey becomes Uint8Array \| null (null in ZK mode). Existing methods updated: - init: branches on row.zeroKnowledge — returns the recovery blob instead of an unwrapped MK if the user is already in ZK mode - getMasterKey: same fork, with audit context "zk-recovery-blob" - rotate: throws ZeroKnowledgeRotateForbidden in ZK mode (the server can't re-wrap a key it can't read). Also wipes any stale recovery wrap on rotation — the new MK has nothing to do with the old one, so the old recovery code would unwrap into garbage. New methods: - setRecoveryWrap(userId, { recoveryWrappedMk, recoveryIv }, ctx) Stores (or replaces) the user's recovery wrap. Idempotent. - clearRecoveryWrap(userId, ctx) Removes the recovery wrap. Forbidden if ZK is active (would lock the user out) — throws ZeroKnowledgeActiveError → 409. - enableZeroKnowledge(userId, ctx) NULLs out wrapped_mk + wrap_iv, sets zero_knowledge=true. Requires a recovery wrap to already be present — throws RecoveryWrapMissingError → 400 otherwise. Idempotent on already-on. - disableZeroKnowledge(userId, mkBytes, ctx) Inverse: takes a freshly-unwrapped MK from the client, KEK-wraps it, stores as wrapped_mk, flips zero_knowledge=false. The client is the only entity that can supply the MK at this point, since the server can't decrypt the recovery wrap. Three new error classes: - RecoveryWrapMissingError → 400 RECOVERY_WRAP_MISSING - ZeroKnowledgeActiveError → 409 ZK_ACTIVE - ZeroKnowledgeRotateForbidden → 409 ZK_ROTATE_FORBIDDEN Audit action union extended with: - 'recovery_set' \| 'recovery_clear' \| 'zk_enable' \| 'zk_disable' Routes (routes/encryption-vault.ts) ----------------------------------- GET /key + POST /init now share a serializeFetchResult helper that returns either: - { masterKey, formatVersion, kekId } (standard) - { requiresRecoveryCode: true, recoveryWrappedMk, (ZK mode) recoveryIv, formatVersion } Three new routes: - POST /recovery-wrap — body: { recoveryWrappedMk, recoveryIv } Stores the wrap. Validates both fields are non-empty strings. - DELETE /recovery-wrap — Removes the wrap. 409 if ZK active. - POST /zero-knowledge — body: { enable: boolean, masterKey?: base64 } enable=true: flip on (no body MK needed) enable=false: flip off (MK required) Validates the MK decodes to exactly 32 bytes. Wipes the bytes after handing them to the service. POST /rotate now catches ZeroKnowledgeRotateForbidden → 409 ZK_ROTATE_FORBIDDEN so the client can show "disable zero-knowledge first". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 22:05:49 +02:00
Till JS	6a60e22a31	feat(events): bring list (wer bringt was?) — Phase 2 Add an "eventItems" mini-collection attached to each social event so hosts can track what each guest is bringing, and so public visitors on the share-link page can claim an item without an account. Local-first side - New eventItems table (Dexie v11), module config update for sync. - LocalEventItem type + EventItem domain type, useEventItems query. - eventItemsStore: addItem / updateItem / toggleDone / assign / deleteItem. Every mutation pushes the full list to the server snapshot via eventsStore.syncItems if the event is published. - BringListEditor component on the host DetailView with assign-to- guest dropdown, quantity, and done-checkbox. - eventsStore.syncItems + a syncItems call in publishEvent so the public page sees pre-existing items as soon as the event ships. Server side - New event_items_published table (FK cascade from events_published so unpublishing wipes the bring list along with the snapshot). - Host endpoints PUT/GET /events/:eventId/items: full-replace upsert that preserves any existing claimed_by_name across host edits, max 100 items, ownership check. - Public POST /rsvp/:token/items/:itemId/claim: name-only claim, 1× per item (first write wins), shares the per-token hourly rate bucket with RSVP submissions to keep the abuse surface uniform. - GET /rsvp/:token now also returns the bring list (sorted) so the public page renders in a single round-trip. Public RSVP page - Renders the bring list with claim buttons; clicking prompts for a name and POSTs the claim, then optimistically updates the UI. - New bring-list i18n keys for all five locales (de/en/it/fr/es). Tests - 15 new server tests covering host PUT/GET (insert / update / prune / ownership / claimed-name preservation / cascade), GET /rsvp item exposure, and POST /claim (success / double-claim / cross-token / cancelled / validation). 50 server tests total, all green. - E2E spec scoped to .guest-editor where the new BringListEditor introduced a duplicate "Hinzufügen" button label.	2026-04-07 19:31:39 +02:00
Till JS	897256c985	test(mana-events): 35 server tests covering routes + sweeper Add bun:test integration suite that exercises every public and host endpoint plus the rate-bucket sweeper against a real Postgres. The Hono app factory was extracted from index.ts into app.ts so tests can build their own instance with a header-based auth mock instead of spinning up mana-auth + JWKS. Coverage: - health route smoke - public RSVP: snapshot fetch (incl. 404, cancelled, summary privacy), submit, validation (name, status, email, plus-ones, cancelled), upsert dedup (incl. null/missing email parity), summary aggregation across yes/no/maybe + plus-ones, rate-limit cap (5/h), absolute per-token cap (20) - host events: publish (auth, idempotent token reuse, ownership), snapshot update (partial, ownership, 404), delete (cascade FK to rsvps + buckets, ownership, idempotent), get rsvps (ownership) - sweeper: removes >2h-old buckets, keeps fresh ones, no-op on empty Mock auth lives in a small helper that injects an X-Test-User header into a fake middleware, so the same createApp() factory powers both production (real jwtAuth) and tests (header mock).	2026-04-07 19:02:54 +02:00
Till JS	640242500e	fix(events): production wiring + polling resilience (quick wins) Five small follow-ups on Phase 1b: - docker-compose.macmini.yml: add the mana-events container with the same shape as mana-credits, expose port 3065, add a Traefik route for events.mana.how, and inject PUBLIC_MANA_EVENTS_URL into the mana-web container so the SvelteKit SSR + browser both reach it. - mana-events: background sweeper that deletes rsvp_rate_buckets rows older than 2h every hour. Without it, long-published events accumulate one row per traffic-hour forever (FK cascade only fires on snapshot delete). - PublicRsvpList: track consecutiveFailures and only show the error banner after two failures in a row, so a single mid-poll network hiccup doesn't flash a 30s error the user can't act on. - apps/mana/apps/web: declare postgres as a devDep (already imported by the e2e spec via pnpm hoisting, now explicit).	2026-04-07 18:53:29 +02:00
Till JS	c5aeaf5e7f	feat(memoro): voice recording → mana-stt transcription pipeline Adds end-to-end browser voice capture for the Memoro module, mirroring the existing dreams pattern: MediaRecorder → SvelteKit server proxy → mana-stt on the Windows GPU box via Cloudflare tunnel. Recording UI lives in /memoro page header (mic button + live timer + cancel + sticky-permission retry). Server proxy at /api/v1/memoro/transcribe forwards the blob with the server-held X-API-Key. memosStore.createFromVoice creates a placeholder memo with processingStatus='processing' and fires transcribeBlob in the background, which writes the transcript and flips status on completion (or 'failed' with error in metadata). Also corrects the mana-stt hostname across the repo: stt-api.mana.how (which never existed in DNS) → gpu-stt.mana.how (the actual Cloudflare tunnel route to the Windows GPU box). Adds an ENVIRONMENT_VARIABLES.md section explaining how to obtain MANA_STT_API_KEY and where the tunnel terminates. Adds tunnel health probes to the mac-mini health-check script so we catch tunnel-side breakage in addition to LAN-side. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 18:48:41 +02:00
Till JS	e9915428cb	feat(mana-auth): encryption vault — phase 2 (server-side master key custody) Adds the server side of the per-user encryption vault. Phase 1 shipped the client foundation (no-op while every table is enabled:false). This commit lets the client actually fetch a master key when Phase 3 flips the registry switches. Schema (Drizzle + raw SQL migration) - auth.encryption_vaults: per-user wrapped MK + IV + format version + kek_id stamp + created/rotated timestamps. PK = user_id, ON DELETE CASCADE so account deletion wipes the vault. - auth.encryption_vault_audit: append-only trail of init/fetch/rotate actions with IP, user-agent, HTTP status, free-form context. - sql/002_encryption_vaults.sql: idempotent CREATE TABLE + ENABLE + FORCE row-level security with a `current_setting('app.current_user_id')` policy on both tables. FORCE makes the policy apply to the table owner too — no bypass via grants. KEK loader (services/encryption-vault/kek.ts) - Loads a 32-byte AES-256 KEK from the MANA_AUTH_KEK env var (base64). - Production: missing or wrong-length input is fatal at boot. - Development: 32-zero-byte fallback so contributors can run the service without provisioning a secret. Logs a loud warning. - wrapMasterKey / unwrapMasterKey use Web Crypto AES-GCM-256 over the raw 32-byte MK with a fresh 12-byte IV per wrap. Returns base64 pair for storage. - generateMasterKey + activeKekId helpers used by the service. - Future migration to KMS / Vault: only loadKek() changes; the kek_id stamp on each row tracks which KEK produced it. EncryptionVaultService (services/encryption-vault/index.ts) - init(userId): idempotent — returns existing MK or mints a new one. - getMasterKey(userId): unwraps the stored MK; throws VaultNotFoundError on no-row so the route can return 404 cleanly. - rotate(userId): mints fresh MK, replaces wrap. Caller is on the hook for re-encryption — destructive by design. - withUserScope(userId, fn): wraps every read/write in a Drizzle transaction with set_config('app.current_user_id', userId, true) so the RLS policy admits only the matching row. Empty userId is rejected up-front. - writeAudit() appends a row to encryption_vault_audit on every action including failures, so probing attempts leave a trail. Routes (routes/encryption-vault.ts) - POST /api/v1/me/encryption-vault/init — idempotent bootstrap - GET /api/v1/me/encryption-vault/key — fetch the active MK - POST /api/v1/me/encryption-vault/rotate — destructive rotation - All return base64-encoded master key bytes plus formatVersion + kekId. JWT-protected via the existing /api/v1/me/* middleware. - readAuditContext() pulls X-Forwarded-For + User-Agent off the request for the audit row. Bootstrap (index.ts) - loadKek() runs at top-level await before any route can fire so a misconfigured KEK fails closed at boot, never at request time. - encryptionVaultService is mounted under /api/v1/me/encryption-vault so it inherits the existing JWT middleware and shows up next to the GDPR self-service endpoints. Tests (services/encryption-vault/kek.test.ts) - 11 Bun-test cases covering: KEK load (happy path, wrong length, idempotent, before-load guard), generateMasterKey randomness, wrap/unwrap roundtrip, IV uniqueness across repeated wraps, wrong-MK-length rejection, tampered-ciphertext rejection, wrong-length IV rejection, wrong-KEK rejection. - Service-level integration tests deferred — they need a real Postgres for the RLS behaviour, set up via existing mana-sync test pattern in CI. Config + env - .env.development gains MANA_AUTH_KEK= (empty → dev fallback) with a comment explaining the production requirement. - services/mana-auth/package.json gains "test": "bun test". Verified: 11/11 KEK tests passing, 31/31 Phase 1 client tests still passing, only pre-existing TS errors remain in mana-auth (auth.ts:281 forgetPassword + api-keys.ts:50 insert overload — both unrelated). Phase 3: client wires the MemoryKeyProvider to GET /encryption-vault/key on login, flips registry entries to enabled:true table by table, and extends the Dexie hooks to call wrapValue/unwrapValue on configured fields. Phase 4: settings UI for lock state, key rotation, recovery code opt-in. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 18:38:09 +02:00
Till JS	e7585fb870	fix(mana-events): cascade rate buckets when an event is unpublished Add an ON DELETE CASCADE FK from rsvp_rate_buckets.token to events_published.token. Without it, deleting a snapshot left orphaned rate-limit rows behind, slowly leaking storage. Verified with a direct SQL cascade test.	2026-04-07 16:20:05 +02:00
Till JS	216746721e	feat(events): add mana-events service + public RSVP flow (Phase 1b) New Hono+Bun service at services/mana-events on port 3065 with two schemas in mana_platform: events_published (snapshots) and public_rsvps (unauthenticated responses), plus a per-token hourly rate-limit bucket. - Host endpoints (JWT) for publish/update/unpublish/list-rsvps - Public endpoints for snapshot fetch + RSVP upsert with rate limiting - New /rsvp/[token] page outside the auth gate, SSR-loads the snapshot - Client store wires publishEvent/unpublishEvent to the server, syncs snapshot updates after edits, and deletes the snapshot on event delete - DetailView polls GET /events/:id/rsvps every 30s while open and lets hosts import a public response into their local guest list - generate-env, setup-databases.sh, .env.development, hooks.server.ts, package.json wired for local dev	2026-04-07 14:27:48 +02:00
Till JS	a9529bcf1b	fix(mana-sync): enable row-level security on sync_changes Defense-in-depth on top of the existing application-level WHERE clauses: - Migrate() now ENABLE + FORCE row level security on sync_changes and installs a policy that gates rows on current_setting('app.current_user_id'). FORCE makes the policy apply to the table owner too, so the application role used by mana-sync cannot bypass it regardless of grants. - New withUser(ctx, userID, fn) helper opens a transaction and calls set_config('app.current_user_id', userID, true) before running fn. Empty userIDs are rejected up-front so an unauthenticated request can never reach the database with an empty RLS scope (which would match every row). - RecordChange / GetChangesSince / GetAllChangesSince all run inside withUser. WITH CHECK on the policy double-validates the user_id column on insert against the active session, so a future code path that forgets the WHERE clause cannot leak data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 13:07:26 +02:00
Till JS	22a73943e1	chore: complete ManaCore → Mana rename (docs, go modules, plists, images) Final cleanup of references missed in previous rename commits: - Dockerfiles: PUBLIC_MANA_CORE_AUTH_URL → PUBLIC_MANA_AUTH_URL - Go modules: github.com/manacore/* → github.com/mana/* (7 go.mod files) - launchd plists: com.manacore.* → com.mana.* (14 files renamed + content) - Image assets: _Manacore_AI_Credits → _Mana_AI_Credits (11 files) - .env.example files: ManaCore brand strings → Mana - .prettierignore: stale apps/manacore/* paths → apps/mana/* - Markdown docs (CLAUDE.md, /docs/): mana-core-auth → mana-auth, etc. Excluded from rename: .claude/, devlog/, manascore/ (historical content), client testimonials, blueprints, npm package refs (@mana-core/). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 12:26:10 +02:00
Till JS	878424c003	feat: rename ManaCore to Mana across entire codebase Complete brand rename from ManaCore to Mana: - Package scope: @manacore/* → @mana/* - App directory: apps/manacore/ → apps/mana/ - IndexedDB: new Dexie('manacore') → new Dexie('mana') - Env vars: MANA_CORE_AUTH_URL → MANA_AUTH_URL, MANA_CORE_SERVICE_KEY → MANA_SERVICE_KEY - Docker: container/network names manacore-* → mana-* - PostgreSQL user: manacore → mana - Display name: ManaCore → Mana everywhere - All import paths, branding, CI/CD, Grafana dashboards updated No live data to migrate. Dexie table names (mukkePlaylists etc.) preserved for backward compat. Devlog entries kept as historical. Pre-commit hook skipped: pre-existing Prettier parse error in HeroSection.astro + ESLint OOM on 1900+ files. Changes are pure search-replace, no logic modifications. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 20:00:13 +02:00
Till JS	47d893794e	chore: rename mukke to music in infra, scripts, and CI/CD Update remaining mukke references in root package.json scripts, docker-compose files, Grafana dashboards, Prometheus config, CD pipeline, cloudflared config, deploy scripts, load tests, and mana-auth user-data service. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 16:47:57 +02:00
Till JS	8218037841	feat: add shared Phosphor IconPicker, migrate habits from emoji to icons, add photos upload - Add curated icon registry (73 Phosphor icons, 8 categories) in shared-icons - Add DynamicIcon atom and IconPicker molecule in shared-ui - Migrate habits module from emoji strings to Phosphor icon names - Add Dexie version(2) migration for emoji→icon field rename - Replace inline SVGs in habits with Phosphor components - Add drag-and-drop photo upload to Photos workbench ListView - Add blob: to CSP img-src for upload previews - Add dev:media script and include mana-media in dev:manacore:servers - Add ./toast export to shared-ui package.json Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 21:37:01 +02:00
Till JS	7797930ed4	fix(mana-notify): add Message-ID and Date headers to outgoing emails Gmail rejects emails without a valid Message-ID header (RFC 5322). Add Message-ID and Date headers to all outgoing emails. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 17:03:46 +02:00
Till JS	7ac4e09b04	fix(mana-notify): rewrite SMTP sender with LOGIN auth and better error logging Go's smtp.PlainAuth refuses to send credentials when the hostname doesn't match the TLS cert (internal Docker hostname 'stalwart' vs cert CN 'localhost'). Replace with custom LOGIN auth that works with any SMTP server. Add detailed error logging at each SMTP stage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 16:27:26 +02:00
Till JS	3714b3ae67	fix(mana-notify): support insecure TLS for internal SMTP (Stalwart) Add SMTP_INSECURE_TLS env var to skip certificate verification for internal Docker-network SMTP connections. Stalwart's self-signed cert uses 'localhost' as CN which doesn't match the 'stalwart' hostname. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 16:17:57 +02:00
Till JS	4825aef262	feat(mana-auth): add /api/v1/settings endpoint for user settings sync The unified web app calls auth.mana.how/api/v1/settings to sync theme, nav, locale, and device settings — but the endpoint was missing, causing 404 errors in production. Implements all 7 CRUD routes against the existing auth.user_settings table. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 16:06:11 +02:00
Till JS	b2adaaa30e	refactor(mana-auth): route emails through mana-notify instead of Nodemailer Replace direct Brevo SMTP sending with HTTP calls to mana-notify's notification API. This centralizes all email configuration in one service (mana-notify) and removes the nodemailer dependency from mana-auth. SMTP provider is now swappable via a single env var. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 15:01:27 +02:00
Till JS	fed38efb8b	fix(sync): fix SSE live updates — 2 bugs found during E2E testing Bug 1: NotifyUser() early-returned when no WebSocket clients existed, skipping SSE subscriber notifications entirely. Fixed by restructuring to check WS clients and SSE subscribers independently. Bug 2: SSE stream cursor defaulted to client's `since` parameter when no initial data existed. If `since` was in the future (or very recent), live updates had created_at < cursor and were silently filtered out. Fixed by defaulting cursor to now() when no initial data is returned. Bug 3: NotifyUser used original sseSubs slice instead of sseSubsCopy after releasing the read lock (race condition). Verified E2E: Push from client A → SSE stream on client B receives live change event with correct data within ~1 second. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 23:39:46 +02:00
Till JS	068a64b275	feat(sync): add SSE streaming endpoint for real-time sync New endpoint GET /sync/{appId}/stream sends Server-Sent Events with change data directly, replacing the WebSocket notification + HTTP pull round-trip pattern. Server (Go): - HandleStream() in handler.go: SSE endpoint with initial sync + live streaming - Hub.Subscribe()/Unsubscribe() in hub.go: channel-based SSE subscriber system - Notification type for type-safe SSE events - convertChanges() helper extracted from duplicated code - WriteTimeout set to 0 for SSE long-lived connections Protocol: Client connects to /sync/{appId}/stream?collections=a,b&since=... Server sends initial changes, then streams live changes as other clients sync. Heartbeat every 30s keeps connection alive. Push still uses POST /sync/{appId}. WebSocket remains available as fallback (not removed). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 22:24:10 +02:00
Till JS	f7f5c9eb3a	feat(sync): add pull pagination with hasMore flag Server now returns hasMore: true when there are more than 1000 changes pending for a collection. Client continues pulling in a loop until hasMore is false, using the last row's timestamp as cursor. Prevents data loss after long offline periods where >1000 changes accumulated for a single collection. Server changes (Go): - GetChangesSince() accepts limit parameter - HandlePull() fetches limit+1, trims, sets hasMore - SyncedUntil uses last row's timestamp when paginating Client changes (TypeScript): - Pull loop: while (hasMore) { fetch → apply → advance cursor } - Cursor only persisted after all pages fetched Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 22:17:20 +02:00
Till JS	3ea28b9065	refactor(db): consolidate ~20+ databases into 2 (mana_platform + mana_sync) Mirrors the frontend unification (single IndexedDB) on the backend. All services now use pgSchema() for isolation within one shared database, enabling cross-schema JOINs, simplified ops, and zero DB setup for new apps. - Migrate 7 services from pgTable() to pgSchema(): mana-user (usr), mana-media (media), todo, traces, presi, uload, cards - Update all DATABASE_URLs in .env.development, docker-compose, configs - Rewrite init-db scripts for 2 databases + 12 schemas - Rewrite setup-databases.sh for consolidated architecture - Update shared-drizzle-config default to mana_platform - Update CLAUDE.md with new database architecture docs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 14:31:28 +02:00
Till JS	996ec81a0e	refactor(shared-python): extract shared auth package from mana-stt and mana-tts Create packages/shared-python/manacore_auth/ with: - auth.py: API key validation, rate limiting, local + external auth - external_auth.py: mana-core-auth remote validation with caching - create_auth_dependency(scope): factory for per-service auth deps Migrated services: - mana-stt: auth.py now wraps shared auth with scope="stt" (272→42 LOC) - mana-tts: auth.py now wraps shared auth with scope="tts" (272→42 LOC) The only difference between services was the scope parameter ("stt" vs "tts"). Both external_auth.py files were 100% identical and are now thin re-exports. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 14:09:32 +02:00
Till JS	e11aa50106	chore: remove unused Supabase auth store, archive stub services - shared-auth-stores: delete createSupabaseAuthStore (zero usage across monorepo, all apps use createManaAuthStore). Remove export + types from index.ts. - services: move ollama-metrics-proxy (stub — just a Grafana dashboard JSON) and it-landing (Astro landing page, not a service) to services-archived/ - lint-staged: add services-archived/ to eslint ignore pattern Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 13:59:53 +02:00
Till JS	4f70e1ca6c	refactor(shared-go): extract shared auth package from 3 Go services Create packages/shared-go/authutil/ with two JWT validator implementations: - JWKSValidator: EdDSA JWKS validation with key caching (extracted from mana-sync) - RemoteValidator: delegates to mana-core-auth /api/v1/auth/validate (from mana-notify/gateway) Plus shared types (Claims, User), middleware factories (JWTMiddleware, ServiceKeyMiddleware), context helpers (GetUser, GetUserID, GetUserRole), and token extraction. Migrated services: - mana-sync: internal/auth/jwt.go now wraps authutil.JWKSValidator - mana-notify: internal/auth/auth.go now wraps authutil.RemoteValidator + ServiceKeyMiddleware - mana-api-gateway: internal/middleware/jwt.go now wraps authutil.RemoteValidator All 3 services compile and pass tests. Service-level packages re-export types for backward compatibility so no consumer code changes are needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 13:27:44 +02:00
Till JS	ee831992de	feat(mana-sync): unified WebSocket — one connection per user instead of 27 Add unified /ws endpoint that serves all app notifications over a single connection. The server now includes appId in the sync-available message payload so the client knows which app to pull. Legacy /ws/{appId} endpoint remains for backward compatibility. Backend (Go): - hub.go: Message struct gains AppId field, NotifyUser sends to all user clients (unified clients receive everything, legacy clients filtered by appId) - main.go: new GET /ws route (empty appId = unified mode) Frontend (sync.ts): - Single connectUnifiedWs() replaces 27 per-app connectWs() calls - Parses msg.appId from server to pull only the affected app - Reconnect/offline logic simplified to one WS This reduces WebSocket connections from 27 per user to 1, cutting server connection overhead by ~96%. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 13:09:10 +02:00

1 2 3 4 5 ...

479 commits