Commit graph

2654 commits

Author SHA1 Message Date
Till JS
7007140d13 fix(voice): switch to gemma3:12b + few-shot prompt for parse-task
Two related changes that fall out of real end-to-end testing against
the now-working local mana-llm.

1. Default model bumped from gemma3:4b to gemma3:12b for both
   parse-task and parse-habit. The 4b model gets weekday math
   off-by-one ("nächsten Montag" from a Wednesday → 2026-04-14
   instead of 2026-04-13), aggressively shortens titles ("Anna
   anrufen" → "Anrufen"), and frequently paraphrases habit names
   instead of copying verbatim ("Joggen" instead of "Laufen") which
   the verbatim-validation in coerce drops, costing an LLM round-trip
   for nothing. The 12b variant is roughly 10% slower for these
   tiny prompts (~1.1s vs ~1.0s on the GPU box) so the accuracy
   win is essentially free.

2. parse-task prompt rewritten as few-shot. Pure rule descriptions
   were *worse* than simple examples — the long "Rules — read
   carefully" section in the previous prompt actually made the model
   compute next Monday as 2026-04-14 even though a direct "what date
   is next Monday?" prompt to the same model returned 2026-04-13.
   The detailed rules were also priming the model to over-shorten
   titles and over-eagerly tag filler words. Five worked examples
   (including the previously-failing "Anna nächsten Montag anrufen"
   case) plus one novel case ("Mama am Wochenende besuchen") all
   come back correct now.

The deterministic guards in coerce() are kept as a backstop for the
day the GPU box swaps in a weaker model — they're cheap and don't
hurt the happy path.
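A rough sketch of the few-shot structure this commit switches to — the example
wording, helper names, and dates here are illustrative, not the shipped prompt:

```typescript
// Hypothetical shape of the few-shot parse-task prompt. Worked examples
// replace rule prose: the model imitates rather than reasons.
type ParsedTask = {
  title: string;
  dueDate: string | null; // YYYY-MM-DD
  priority: "low" | "medium" | "high" | null;
  labels: string[];
};

// Illustrative shots; the real prompt carries five, including the
// previously-failing "Anna nächsten Montag anrufen" case.
const FEW_SHOT_EXAMPLES: Array<{ transcript: string; parsed: ParsedTask }> = [
  {
    transcript: "Anna nächsten Montag anrufen",
    parsed: { title: "Anna anrufen", dueDate: "2026-04-13", priority: null, labels: [] },
  },
  {
    transcript: "Mülltonnen rausstellen",
    parsed: { title: "Mülltonnen rausstellen", dueDate: null, priority: null, labels: [] },
  },
];

function buildSystemPrompt(today: string): string {
  const shots = FEW_SHOT_EXAMPLES
    .map((ex) => `Input: ${ex.transcript}\nOutput: ${JSON.stringify(ex.parsed)}`)
    .join("\n\n");
  return `Today is ${today}. Extract a task as strict JSON.\n\n${shots}`;
}
```

Note the date-bearing shots only make sense if "today" is pinned in the prompt,
since relative-date answers like 2026-04-13 are anchored to it.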

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:59:32 +02:00
Till JS
68e8897c9c chore(env): default MANA_LLM_URL to llm.mana.how
Same convention as STT_URL — nobody runs mana-llm in local Docker for
dev work, the shared gateway is always reachable, so the path of least
friction is to point at it by default. Devs who want a fully offline
stack can still override the var locally.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:55:01 +02:00
Till JS
b505024f7b fix(voice/parse-task): guard against gemma3:4b hallucinating dueDate + priority
Real end-to-end testing against the now-working local mana-llm
surfaced two model behaviours the prompt couldn't talk down:

1. gemma3:4b stamps today's date on every task that doesn't have a
   real time anchor. "Mülltonnen rausstellen" came back with
   dueDate=2026-04-08 and priority=low even though the prompt
   explicitly said "MUST be null when no date is mentioned". After
   typing "Buy milk" the user would silently get a today-due task,
   which is worse than no parsing at all.

2. The model occasionally returns dueDate as a full ISO timestamp
   ("2026-04-09T14:00:00") when the transcript mentions a time. The
   coerce regex previously matched the prefix and let the timestamp
   through unchanged, which then breaks the YYYY-MM-DD-shaped Dexie
   field downstream.

Fix: deterministic post-processing in coerce. The prompt is also
tightened with explicit "ONLY when…" rules but the guards are the
load-bearing change since gemma3:4b ignores prompt restrictions.

- Strict YYYY-MM-DD extraction: a leading-anchor regex match keeps
  only the date prefix even if the model adds a time component.
- DATE_TRIGGER_PATTERNS: substring scan over the original transcript
  for German + English date words. If the LLM returned a dueDate but
  the transcript has zero matches, drop the date — it was a
  hallucination. False negatives are preferable to false positives:
  letting through a fake date is more annoying than suppressing a
  real one the user can re-type.
- PRIORITY_TRIGGER_PATTERNS: same idea for priority. The model thinks
  taxes are inherently urgent; we don't want to inherit that opinion.

The labels field is left noisy on purpose — "müll", "unbedingt",
"erledigen" all come back from a single transcript and only the ones
that fuzzy-match an existing workspace tag end up on the task, so
filtering filler words at this layer would be wasted work.
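The guard logic might be sketched roughly like this — the trigger lists are
small illustrative samples, not the shipped DATE_TRIGGER_PATTERNS:

```typescript
// Illustrative subset of the trigger word lists (DE + EN).
const DATE_TRIGGERS = ["morgen", "heute", "montag", "wochenende", "tomorrow", "today", "monday", "weekend"];

// Strict YYYY-MM-DD extraction: keep only the leading date prefix,
// even if the model appended a time component.
function extractDate(raw: string | null): string | null {
  if (!raw) return null;
  const m = /^(\d{4}-\d{2}-\d{2})/.exec(raw);
  return m ? m[1] : null;
}

function hasTrigger(transcript: string, patterns: string[]): boolean {
  const t = transcript.toLowerCase();
  return patterns.some((p) => t.includes(p));
}

function guardDueDate(transcript: string, llmDate: string | null): string | null {
  const date = extractDate(llmDate);
  // LLM returned a date but the transcript never mentions one → hallucination.
  if (date && !hasTrigger(transcript, DATE_TRIGGERS)) return null;
  return date;
}
```

The same shape applies to the priority guard with its own pattern list.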

Verified against five transcripts spanning bare/explicit/relative
date in DE + EN. Real LLM round-trip via http://localhost:5173 →
https://llm.mana.how → ollama gemma3:4b. Local mana-llm now reaches
its Ollama backend after the gpu-proxy routing fix in 7f382138a.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:50:19 +02:00
Till JS
bfeeef7819 chore(matrix): final scrub of stale matrix references
A grep audit after the previous matrix removal commits found a handful
of stragglers in non-runtime files that the earlier sweeps missed:

- services/mana-llm/CLAUDE.md: removed matrix-ollama-bot from the
  consumer-apps diagram and from the related-services table
- services/mana-video-gen/CLAUDE.md: removed "Matrix Bots" integration
  bullet
- packages/notify-client/README.md: removed sendMatrix() doc entry
  (the method itself was already gone in the prior cleanup)
- docker/grafana/dashboards/logs-explorer.json: dropped the "Matrix
  Stack" log row that queried tier="matrix" (would show no data forever)
- docker/grafana/dashboards/master-overview.json: dropped the "Matrix
  Bots" stat panel that counted up{job=~"matrix-.*-bot"}
- apps/mana/apps/landing/src/data/ecosystem-health.json: regenerated via
  scripts/ecosystem-audit.mjs to drop matrix from the app list, icon
  counts, file analytics, top offenders and authGuard missing list
- .gitignore: removed services/matrix-stt-bot/data/ pattern (the
  service itself was deleted long ago)

Production-side stragglers also addressed (not in this commit):
- DROP USER synapse on prod Postgres (the parallel cleanup commit
  2514831a3 dropped DATABASE matrix + DATABASE synapse but left the
  role behind)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:47:54 +02:00
Till JS
7f382138a1 fix(mana-llm): route Ollama through gpu-proxy instead of LAN IP
The mana-service-llm container had OLLAMA_URL pointed at the GPU box's
LAN address (192.168.178.11:11434). On the Mac Mini host that route
works fine, but from inside any Colima container the entire
192.168.178.0/24 subnet gets synthesized RST — Colima's VM "claims"
the LAN range without being able to route to it, so every connect()
returns "Connection refused" before a packet ever leaves the box.

mana-llm started cleanly, reported the configured upstream as
"unhealthy", served an empty /v1/models list, and every chat
completion failed with "All connection attempts failed". The most
visible downstream effect: voice quick-add (parse-task, parse-habit)
silently degraded to its no-LLM fallback for everyone hitting the
local stack — same shape as a successful response, no error log,
just no enrichment.

The Mac Mini already runs a gpu-proxy LaunchAgent
(com.mana.gpu-proxy, /Users/mana/gpu-proxy.py) that forwards
127.0.0.1:13434 → 192.168.178.11:11434 alongside several other GPU
service ports. Pointing OLLAMA_URL at host.docker.internal:13434 and
adding the host-gateway extra_hosts mapping puts mana-llm on the
already-running rail. Verified end-to-end: from inside the container,
GET http://host.docker.internal:13434/api/tags now returns the full
model list (gemma3:4b, gemma3:12b, gemma3:27b, qwen2.5-coder:14b,
nomic-embed-text).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:46:14 +02:00
Till JS
da6e2f39da chore(deps): update pnpm-lock after Matrix stack removal
Reflects the removal of apps/matrix and services/mana-matrix-bot from
the workspace plus the dropped @matrix-org/matrix-sdk-crypto-nodejs
override in package.json. Net -365 lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:41:15 +02:00
Till JS
029c7973ef feat(mana/web): pass MANA_LLM_API_KEY from voice parse proxies
The /api/v1/voice/parse-task and /api/v1/voice/parse-habit endpoints
forwarded transcripts to mana-llm without an X-API-Key header. This
worked against the local mana-llm container (no auth) but silently
fell back to the no-LLM path when pointed at gpu-llm.mana.how, which
requires an API key — voice quick-add would look like it was running
in degraded mode forever with no signal that auth was the cause.

Now both endpoints read MANA_LLM_API_KEY from the server-side env and
attach it as X-API-Key when present, mirroring the pattern already
used by /api/v1/voice/transcribe for mana-stt. When the var is empty
the header is omitted, so local Docker setups without auth still work.

Plumbing: generate-env.mjs writes MANA_LLM_URL + MANA_LLM_API_KEY into
apps/mana/apps/web/.env, .env.development gets the new keys with empty
defaults, ENVIRONMENT_VARIABLES.md documents the gateway and where to
get a key.
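The header logic amounts to a few lines; this sketch uses the env var name from
this commit, but the function name is illustrative:

```typescript
// Attach X-API-Key only when MANA_LLM_API_KEY is configured, so
// unauthenticated local Docker setups keep working with no header.
function llmHeaders(apiKey: string | undefined): Record<string, string> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) headers["X-API-Key"] = apiKey;
  return headers;
}
```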

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:40:26 +02:00
Till JS
2514831a3b chore(matrix): scrub final matrix references after subsystem removal
The matrix subsystem was removed in a prior commit. This commit cleans
up the small leftovers that grep found:

- docker-compose.macmini.yml: dropped the "Matrix Stack" port-range
  comment, the "matrix" category from the naming convention, and a
  stale watchtower comment about Matrix notifications.
- packages/credits/src/operations.ts: removed AI_BOT_CHAT credit
  operation type and its definition. It was the billing entry for "Chat
  with AI via Matrix bot" — no callers left.
- services/mana-credits gifts schema + service + validation: removed the
  targetMatrixId column / param / Zod field. The corresponding
  PostgreSQL column was dropped manually with
  `ALTER TABLE gifts.gift_codes DROP COLUMN target_matrix_id` on prod.
- docker/grafana/dashboards/{master,system}-overview.json: removed the
  `up{job="synapse"}` panel queries — they would have shown No Data
  forever now that Synapse is gone.

Production-side cleanup performed in parallel (not in this commit):
- Stopped + removed mana-matrix-{synapse,element,web,bot} containers
- Removed mana-matrix-bot:local, matrix-web:latest,
  matrixdotorg/synapse:latest, vectorim/element-web:latest images (~3 GB)
- Removed mana-matrix-bots-data Docker volume
- Removed /Volumes/ManaData/matrix/ media store (4.3 MB)
- DROP DATABASE matrix; DROP DATABASE synapse; on Postgres

Cosmetic leftovers intentionally untouched:
- Eisenhower matrix in todo (LayoutMode 'matrix') — productivity concept
- ${{ matrix.service }} in .github/workflows — GitHub Actions strategy
- services/mana-media/apps/api/dist/.../matrix/* — stale build output
  (not in git, regenerated next mana-media build)
2026-04-08 16:39:42 +02:00
Till JS
e337243303 test(mana/web): unit tests for voice quick-add matchers + fix habit ranking
Two new test files lock in the matching boundary where free-text LLM
hints meet the user's actual workspace data — that's where bugs hide
silently. Both matchers are now pure-function-shaped (the production
wrappers just feed them Dexie data) so the tests run without
fake-indexeddb or any I/O.

todo: 16 cases for matchLabelsToTagsPure covering exact / case /
diacritic / substring / specificity rules + the "never invent tags"
guarantee.

habits: 11 cases for matchHabitToTranscript including the word-
boundary "Bier vs ausprobiert" false-positive, multi-word matching,
and a real bug the test surfaced on the first run:

  Without specificity ranking, "Tee" would always beat "Grüner Tee"
  because the first matching habit in input order won. The matcher
  now collects all candidates and returns the one with the most
  matched tokens, so multi-word habits beat single-word substrings
  whenever both could fit the transcript.
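The ranking fix can be sketched as a pure function — tokenization and
word-boundary handling are simplified relative to the real matcher:

```typescript
// Collect ALL matching habits, then return the one whose name contributed
// the most matched tokens, so "Grüner Tee" beats "Tee".
function matchHabit(transcript: string, habitNames: string[]): string | null {
  // Unicode-aware split so umlauts don't break words apart.
  const words = transcript.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
  let best: { name: string; tokens: number } | null = null;
  for (const name of habitNames) {
    const tokens = name.toLowerCase().split(/\s+/);
    // Whole-word check: "Bier" must not match inside "ausprobiert".
    if (!tokens.every((t) => words.includes(t))) continue;
    if (!best || tokens.length > best.tokens) best = { name, tokens: tokens.length };
  }
  return best?.name ?? null;
}
```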

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:37:11 +02:00
Till JS
8e8b6ac65f fix(mana-auth) + chore: rewrite /api/v1/auth/login JWT mint, remove Matrix stack
This commit bundles two unrelated changes that were swept together by an
accidental `git add -A` in another working session. Documented here so the
history reflects what's actually inside.

═══════════════════════════════════════════════════════════════════════
1. fix(mana-auth): /api/v1/auth/login mints JWT via auth.handler instead
   of api.signInEmail
═══════════════════════════════════════════════════════════════════════

Previous attempt (commit 55cc75e7d) tried to fix the broken JWT mint in
/api/v1/auth/login by switching the cookie name from `mana.session_token`
to `__Secure-mana.session_token` for production. That was necessary but
not sufficient: Better Auth's session cookie value isn't just the raw
session token, it's `<token>.<HMAC>` where the HMAC is derived from the
better-auth secret. Reconstructing the cookie from auth.api.signInEmail's
JSON response only gave us the raw token, so /api/auth/token's
get-session middleware still couldn't validate it and the JWT mint kept
silently failing.

Real fix: do the sign-in via auth.handler (the HTTP path) rather than
auth.api.signInEmail (the SDK path). The handler returns a real fetch
Response with a Set-Cookie header containing the fully signed cookie
envelope. We capture that header verbatim and forward it as the cookie
on the /api/auth/token request, which now passes validation and mints
the JWT correctly.
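The cookie capture reduces to one small helper — a sketch under the assumption
that auth.handler returns a standard fetch Response; attribute handling is
simplified and the function name is illustrative:

```typescript
// Take "name=value" from a Set-Cookie response header, dropping the
// attributes (Path, HttpOnly, Secure, …) that don't belong in a
// request Cookie header. The value is the full signed envelope
// <token>.<HMAC>, captured verbatim — never reconstructed.
function cookiePairFromSetCookie(setCookie: string): string {
  return setCookie.split(";")[0].trim();
}
```

The pair is then forwarded as the Cookie header on the /api/auth/token request.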

Verified end-to-end on auth.mana.how:

  $ curl -X POST https://auth.mana.how/api/v1/auth/login \
      -d '{"email":"...","password":"..."}'
  {
    "user": {...},
    "token": "<session token>",
    "accessToken": "eyJhbGciOiJFZERTQSI...",   ← real JWT now
    "refreshToken": "<session token>"
  }

Side benefits:
- Email-not-verified path is now handled by checking
  signInResponse.status === 403 directly, no more catching APIError
  with the comment-noted async-stream footgun.
- X-Forwarded-For is forwarded explicitly so Better Auth's rate limiter
  and our security log see the real client IP.
- The leftover catch block now only handles unexpected exceptions
  (network errors etc); the FORBIDDEN-checking logic in it is dead but
  harmless and left in for defense in depth.

═══════════════════════════════════════════════════════════════════════
2. chore: remove the entire self-hosted Matrix stack (Synapse, Element,
   Manalink, mana-matrix-bot)
═══════════════════════════════════════════════════════════════════════

The Matrix subsystem ran parallel to the main Mana product without any
load-bearing integration: the unified web app never imported matrix-js-sdk,
the chat module uses mana-sync (local-first), and mana-matrix-bot's
plugins duplicated features the unified app already ships natively.
Keeping it alive cost a Synapse + Element + matrix-web + bot container
quartet, three Cloudflare routes, an OIDC provider plugin in mana-auth,
and a steady drip of devlog/dependency churn.

Removed:
- apps/matrix (Manalink web + mobile, ~150 files)
- services/mana-matrix-bot (Go bot with ~20 plugins)
- docker/matrix configs (Synapse + Element)
- synapse/element-web/matrix-web/mana-matrix-bot services in
  docker-compose.macmini.yml
- matrix.mana.how/element.mana.how/link.mana.how Cloudflare tunnel routes
- OIDC provider plugin + matrix-synapse trustedClient + matrixUserLinks
  table from mana-auth (oauth_* schema definitions also removed)
- MatrixService import path in mana-media (importFromMatrix endpoint)
- Matrix notification channel in mana-notify (worker, metrics, config,
  channel_type enum, MatrixOptions handler)
- Matrix entries from shared-branding (mana-apps + app-icons),
  notify-client, the i18n bundle, the observatory map, the credits
  app-label list, the landing footer/apps page, the prometheus + alerts
  + promtail tier mappings, and the matrix-related deploy paths in
  cd-macmini.yml + ci.yml

Devlog/manascore/blueprint entries that mention Matrix are left intact
as historical record. The oauth_* + matrix_user_links Postgres tables
stay on existing prod databases — code can no longer write to them, drop
them in a follow-up migration if you want them gone for real.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:32:13 +02:00
Till JS
4eb5dfe4a0 feat(mana/web): named workbench scenes (Home, Deep Work, …)
Users can now define multiple named layouts of the workbench homepage and
switch between them. Each scene holds its own openApps list with per-app
window state (minimized / maximized / size). Scene list syncs cross-device
via mana-sync; the active scene id is per-device (localStorage) so device
A doesn't pull device B into a different scene.

- new `workbenchScenes` Dexie table, registered in manaCoreConfig
- `workbenchScenesStore` (Dexie liveQuery) with scene CRUD + per-scene app
  mutations; auto-seeds a default "Home" scene on first run
- SceneTabs pill bar above the carousel with dnd reorder + context menu
  (rename / duplicate / delete); SceneRenameDialog and a reusable
  ConfirmDialog for the destructive path
- workbench +page.svelte refactored to delegate all openApps mutations to
  the store; the carousel itself is unchanged

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:25:13 +02:00
Till JS
d37483a0f9 feat(todo): map LLM topic hints to existing workspace tags
The parse-task endpoint already returns free-text label hints from the
LLM ("steuern", "haushalt", …). Now the todo store fuzzy-matches each
hint against the user's existing tags via tagCollection and assigns
the matched IDs to the task's metadata.labelIds.

Match policy is intentionally conservative:
- Normalize via NFD strip + lowercase + collapsed whitespace
- Exact normalized match wins
- Substring fallback only for ≥3 char strings (avoids "ab" hitting
  every tag containing "ab")
- Never auto-creates a tag — even if the LLM is sure, an unknown topic
  silently drops, because auto-creating would clutter the user's tag
  list with one-off duplicates from voice transcripts

Both flows pick this up: voice always (transcripts almost always carry
topic hints) and typed only when there's structured payoff, same
asymmetry as before — typed quick-add now also enriches when the LLM
just finds a tag match without a date or priority.
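The match policy above can be sketched as a pure function — names are
illustrative and the substring rule is simplified to one direction:

```typescript
// NFD strip + lowercase + collapsed whitespace, per the match policy.
function normalize(s: string): string {
  return s
    .normalize("NFD")
    .replace(/\p{M}+/gu, "") // strip combining diacritics
    .toLowerCase()
    .replace(/\s+/g, " ")
    .trim();
}

function matchHintToTag(hint: string, tags: Array<{ id: string; name: string }>): string | null {
  const h = normalize(hint);
  // Exact normalized match wins.
  const exact = tags.find((t) => normalize(t.name) === h);
  if (exact) return exact.id;
  // Substring fallback only for hints of ≥3 chars.
  if (h.length >= 3) {
    const sub = tags.find((t) => normalize(t.name).includes(h));
    if (sub) return sub.id;
  }
  return null; // never invent a tag
}
```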

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:21:32 +02:00
Till JS
55cc75e7d3 fix(mana-auth): /api/v1/auth/login uses wrong cookie name in production
The custom /api/v1/auth/login route signs the user in via the
better-auth SDK (auth.api.signInEmail) and then forges a request to
/api/auth/token to mint a JWT, passing the session token as a synthetic
cookie header.

The cookie name was hardcoded as `mana.session_token=...`, but in
production better-auth issues the session cookie with the __Secure-
prefix (because secure: true is enabled). Get-session middleware on the
/api/auth/token side couldn't find the session under the unprefixed
name, so it returned 401 silently. Result: tokenResponse.ok was false,
the route fell through, and the response had no `accessToken` field at
all — only the bare { token, user, redirect } from signInEmail.

The frontend in @mana/shared-auth then picked this up as
`data.accessToken === undefined` and stored undefined as the JWT, while
the parallel /api/auth/sign-in/email call masked the visible damage by
setting the SSO cookie. So login *appeared* to work in the browser
(cookie present, session worked) but the JWT path was always broken.

Fix: pick the cookie name based on config.nodeEnv. In production use
__Secure-mana.session_token, in development use mana.session_token (no
__Secure- prefix because secure: false in dev).
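The fix is essentially one branch — sketched here with an illustrative
function name:

```typescript
// better-auth prefixes the session cookie with __Secure- when
// secure: true (production); dev runs secure: false, no prefix.
function sessionCookieName(nodeEnv: string): string {
  return nodeEnv === "production"
    ? "__Secure-mana.session_token"
    : "mana.session_token";
}
```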

Verified end-to-end on auth.mana.how:
  POST /api/v1/auth/login → response now includes accessToken (a real
  JWT, EdDSA, with sub/email/role/sid/tier/iss/aud claims), refreshToken
  (the session token), plus the original signInEmail fields.

The other /api/auth/get-session call sites in this file forward the
incoming request headers verbatim, so they preserve whatever real cookie
the browser sent and don't have this bug.
2026-04-08 16:20:18 +02:00
Till JS
d8da11a4ff feat(todo): typed quick-add gets the same LLM enrichment as voice
Press Enter on "Steuererklärung morgen 14 Uhr hoch" and the task lands
instantly with your exact text as the title — then a background pass
through /api/v1/voice/parse-task swaps in dueDate + priority once
mana-llm answers. The title only gets rewritten when the LLM actually
finds structured info (dueDate or priority); for plain titles like
"Mülltonnen rausstellen" the typed text is left alone, since silently
"cleaning up" perfectly fine input is more annoying than helpful.

Pulled the parse + STT-then-parse plumbing apart so both flows share
parseTaskText() and only differ in policy: voice always applies the
LLM title (raw transcripts are noisy), typed only when there's
structured payoff.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:12:17 +02:00
Till JS
c32a5a57de feat(todo): LLM-parse spoken tasks into title + dueDate + priority
The previous voice quick-add dumped the whole transcript into the task
title — fine for "Steuererklärung" but useless for "Steuererklärung
morgen 14 Uhr hoch", which should land as title="Steuererklärung",
dueDate=tomorrow, priority="high".

New endpoint /api/v1/voice/parse-task posts the transcript to mana-llm
(gemma3:4b, temperature 0) with a tight system prompt that asks for
strict JSON: { title, dueDate, priority, labels }. The endpoint coerces
the response back into the typed shape and falls through to
{ title: transcript, … } whenever anything goes wrong — mana-llm down,
JSON garbled, network timeout. Voice quick-add must never fail harder
than typed quick-add, so the fallback path is the rule, not the
exception.

Labels come back from the LLM as free-text topic hints and don't yet
map to the workspace's tag IDs — fuzzy matching against existing tags
is a follow-up.
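The fall-through behaviour can be sketched as a wrapper — type and names are
illustrative, not the endpoint's actual shape:

```typescript
type ParsedTask = { title: string; dueDate: string | null; priority: string | null; labels: string[] };

// Any failure in the LLM path (service down, garbled JSON, timeout)
// degrades to the bare transcript as title: the fallback is the rule,
// not the exception.
async function parseTaskSafe(
  transcript: string,
  llmParse: (t: string) => Promise<ParsedTask>,
): Promise<ParsedTask> {
  const fallback: ParsedTask = { title: transcript, dueDate: null, priority: null, labels: [] };
  try {
    const parsed = await llmParse(transcript);
    // A response without a usable title counts as garbled too.
    return parsed && typeof parsed.title === "string" && parsed.title ? parsed : fallback;
  } catch {
    return fallback;
  }
}
```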

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:08:09 +02:00
Till JS
b48c9ff80f refactor(mana/web): migrate dreams + memoro to /api/v1/voice/transcribe
The per-module /api/v1/memoro/transcribe and /api/v1/dreams/transcribe
endpoints were literal copies that proxied to mana-stt. Now that the
generic /api/v1/voice/transcribe endpoint exists (added with notes),
point both stores at it and delete the duplicates. -200 LOC, one place
to update STT auth or response shape from now on.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:05:49 +02:00
Till JS
b841a24e73 feat(todo): voice quick-add in workbench ListView via shared <VoiceCaptureBar>
Speak a task and it lands in the list as a placeholder while mana-stt
transcribes it; the title swaps in once the transcript returns.

No date/priority/label parsing yet — that's a follow-up that needs an
LLM pass over the transcript. For now the whole transcript becomes the
task title and the user can edit inline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:01:50 +02:00
Till JS
9b3d7c7325 feat(notes): voice capture in workbench ListView via shared <VoiceCaptureBar>
Drop a mic into Notes — record, transcribe through the new generic
/api/v1/voice/transcribe proxy (mana-stt), then write the result back
into the placeholder note. The first transcript line becomes the title
when it fits in 80 chars, otherwise a generic 'Sprachnotiz' label.

The inline editor refreshes from the live note while the placeholder
'…' content is still on screen, so a transcript that arrives a moment
after the editor opens shows up automatically without overwriting
anything the user has typed.
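The title rule above is a one-liner; this sketch uses an illustrative
function name:

```typescript
// First transcript line becomes the title when it fits in 80 chars,
// otherwise (or when empty) the generic 'Sprachnotiz' label.
function noteTitleFromTranscript(transcript: string): string {
  const firstLine = transcript.split("\n")[0].trim();
  return firstLine.length > 0 && firstLine.length <= 80 ? firstLine : "Sprachnotiz";
}
```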

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:59:42 +02:00
Till JS
e0e801956a fix(mac-mini): pass MANA_AUTH_KEK through to mana-auth container
mana-auth's config.ts has hard-failed startup since commit e9915428c
(phase 2 encryption vault) when MANA_AUTH_KEK is unset in production.
.env.macmini.example documents the variable, but the docker-compose
service definition for mana-auth never had a corresponding
MANA_AUTH_KEK: ${MANA_AUTH_KEK} line in its environment block, so even
when the variable was set in the host .env, it never reached the
container. Result: every restart since yesterday looped on
"MANA_AUTH_KEK env var is required in production".

Added the env passthrough alongside BETTER_AUTH_SECRET with an inline
comment pointing at the generation command + service CLAUDE.md.

Operator action required on the Mac Mini:
  KEK=$(openssl rand -base64 32)
  echo "MANA_AUTH_KEK=$KEK" >> .env
  ./scripts/mac-mini/build-app.sh mana-auth   # or compose up -d mana-auth

Then back the value up — it cannot be rotated today without re-wrapping
all existing user vaults (no background re-wrap job yet, kek_id column
on encryption_vaults is reserved for the future migration path).
2026-04-08 15:58:19 +02:00
Till JS
079cc39dbc refactor(mana/web): extract shared <VoiceCaptureBar> for module voice capture
Dreams and Memoro had two literal copies of the MediaRecorder boilerplate
plus parallel mic-button markup, error UI, and requireAuth gating. Lift
the recorder + bar into $lib/components/voice and add it to the memoro
workbench ListView (which had no mic at all). New voice-capture features
just drop in <VoiceCaptureBar> with idleLabel/feature/reason/onComplete.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:51:22 +02:00
Till JS
0d1d3b9449 fix(mana-auth): declare missing nanoid dependency
mana-auth has been crash-looping in production with:

    error: Cannot find package 'nanoid' from
    '/app/src/services/encryption-vault/index.ts'

The encryption-vault service imports nanoid for audit row IDs (line 27,
used at line 547 in the audit log writer), but nanoid was never added
to services/mana-auth/package.json. The import was introduced in commit
e9915428c (phase 2 — server-side master key custody) and slipped past
because nanoid happens to exist transitively in the workspace via
postcss → nanoid@3.3.11. Local pnpm store lookups would resolve it just
fine; a strict isolated container build can't.

Fix:
- Add "nanoid": "^5.0.0" to services/mana-auth/package.json deps
- pnpm install pulled nanoid@5.1.7 into services/mana-auth/node_modules

Verified the import resolves locally:
    bun -e 'import { nanoid } from "nanoid"; console.log(nanoid())'
    → ok: 6TLuTWlenhC0KnSESn5Ex

The Mac Mini still needs to redeploy mana-auth (rebuild image with the
new lockfile, restart container) to pick this up — production is
currently 502ing on auth.mana.how.
2026-04-08 15:50:14 +02:00
Till JS
f5678268ff chore(deps): reconcile pnpm-lock with package.json drift
The lockfile had drifted out of sync with two package.json files:

- services/mana-events/package.json declared drizzle-orm, hono, jose,
  postgres, zod, drizzle-kit, typescript — but mana-events was never
  registered as an importer in pnpm-lock.yaml at all. A frozen-lockfile
  install would fail.
- apps/mana/apps/web/package.json had "postgres": "^3.4.9" as a
  devDependency that the lockfile hadn't picked up.

Both are already declared in their package.json — this commit just
locks them in. No new top-level dependencies are introduced.

The rest of the diff is non-substantive churn from running pnpm install
(jiti peer-version flips between 1.21.7 ↔ 2.6.1, expo-font peer
specifier format becoming more explicit). Net diff is −102 lines
despite registering two new importers, because the peer-format
verbose-ification deduplicates a few entries.
2026-04-08 15:41:14 +02:00
Till JS
45958ad885 feat(mana/web): global requireAuth() gate for guest-blocked features
The unified Mana app runs most modules in a "guest mode": you can
open a module, look around, type a quick note, etc. without an
account. But anything that touches an *encrypted* table (dreams
voice capture, memoro recordings, notes, todo, calendar events, …)
needs the user to be logged in — the encryption vault only unlocks
against a Mana Auth session, and writing to those tables without
it throws `VaultLockedError` at the very last step of the action.

Before this commit, every entry point into an encryption-required
action would silently let the guest go through the whole flow
(record audio, wait for transcription, open the dexie write) and
then explode with a stack-trace error. The user lost work and
didn't know why. The dreams voice capture flow surfaced this
during the 2026-04-08 STT debugging session.

The fix is a global imperative gate: `requireAuth({ feature, reason })`.
Call sites await it before the action; it returns immediately if the
user is already authenticated, otherwise pops a global modal that
asks the guest to log in or cancel. Promise-based, so callers
decide what to do with `false` (silent abort, restore state, own
toast).

  $lib/auth/require-auth.svelte.ts          new — store + helper
  $lib/components/auth/AuthRequiredModal.svelte  new — global modal
  routes/+layout.svelte                     mount the modal once
  packages/shared-utils/src/analytics.ts    new ManaEvents.featureBlockedByAuth
                                            event for conversion tracking

Wired into the two voice-capture entry points that actually exhibited
the bug:

  modules/dreams/ListView.svelte  → feature: 'dreams-voice-capture'
  routes/(app)/memoro/+page.svelte → feature: 'memoro-voice-capture'

Both gate on `requireAuth()` BEFORE the mic permission request, so
guests see the friendly "Konto erforderlich" modal instead of
recording → transcribing → crashing.

Design choices documented in detail in the require-auth.svelte.ts
header comment:
  - Imperative function (not a button wrapper component) so it
    works in event handlers, store actions, keyboard shortcuts,
    drag-drop handlers — anywhere async code runs.
  - Single global modal mounted once in the root layout, no
    portal/z-index gymnastics; two simultaneous prompts replace
    each other (the most recent one wins).
  - Checks `authStore.isAuthenticated`, not vault-unlocked state —
    the user-facing concept is "I need an account", not "I need
    a working encryption vault". Vault-unlock failures (network
    error etc.) are a separate bug class with their own UX.
  - The modal navigates to `/login?next=<current path>` so the
    user lands back on the same page after logging in. The
    Promise resolves `false` on navigation; the user re-clicks
    the original button after coming back, and the second click
    sees `isAuthenticated === true` and proceeds without a modal.
    Re-triggering the original action across a navigation cycle
    would require restoring half-recorded mic state — not worth
    the complexity, and the second click is a clean UX.

How to wire a new entry point (4 lines):

    import { requireAuth } from '$lib/auth/require-auth.svelte';

    async function handleCreateThing() {
      const ok = await requireAuth({
        feature: 'create-thing',
        reason: 'Things werden verschlüsselt gespeichert. Dafür brauchst du ein Mana-Konto.',
      });
      if (!ok) return;
      // ...existing logic
    }

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:36:38 +02:00
Till JS
2b4494628e fix(mana/web): unblock voice capture — permissions policy, notification mount, dev SW
Three independent bugs that conspired to make the dreams + memoro mic
buttons completely unusable in production AND in dev. Each one alone
would have been the only blocker; they layered on top of each other so
fixing the top one just exposed the next.

1. Permissions-Policy header blocked the microphone API entirely.
   `packages/shared-utils/src/security-headers.ts` set
   `microphone=()` which means "no origin, including self, may use
   the microphone". `getUserMedia()` throws a `Permissions policy
   violation` and the browser never even shows the permission
   dialog — no amount of OS / browser / site settings can override
   it because the policy blocks the API at the document level.
   Fix: change to `microphone=(self)` so mana.how itself can use
   the API. Camera stays disallowed (no module needs it).

2. Notification permission was requested at layout mount time.
   `(app)/+layout.svelte` called
   `notificationService.requestPermission()` from `onMount()`. Modern
   browsers require permission requests to come from a user gesture
   — calling it without one queues the prompt until the next click.
   That meant the user's FIRST click on any button (in this case the
   dreams "Traum sprechen" mic button) showed the queued notifications
   prompt instead of the action they actually clicked. Worse,
   `getUserMedia()` was then silently dropped because Chrome only
   shows one permission dialog at a time.
   Fix: remove the mount-time call entirely. Notification permission
   must be requested from a button the user explicitly clicks
   ("Benachrichtigungen aktivieren" toggle in Settings or first time
   a reminder is created) — the reminder scheduler still runs without
   permission, it just won't fire OS notifications until granted.

3. vite-plugin-pwa registered a service worker in dev that cached
   the old layout chunks across reloads, so the fix for #2 was
   invisible until the user manually unregistered the SW in DevTools.
   `vite-plugin-pwa` defaults `devEnabled: true`, which is a
   well-known footgun for fast iteration. Production still gets the
   full SW (this only flips dev). The 2026-04-08 mic-button hunt
   took an extra hour for exactly this reason.
   Fix: pass `devEnabled: false` to createPWAConfig in vite.config.ts.
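
Fix #1 is a one-token change to the Permissions-Policy value. A minimal
sketch of the intended header (the helper name here is hypothetical; the
real values live in packages/shared-utils/src/security-headers.ts):

```typescript
// Hypothetical sketch: build the Permissions-Policy header so the
// same-origin document may use the microphone while the camera stays
// blocked for every origin.
function buildPermissionsPolicy(): string {
  const directives: Record<string, string> = {
    microphone: '(self)', // same-origin only — getUserMedia works on mana.how
    camera: '()',         // empty allowlist — blocked for every origin
  };
  return Object.entries(directives)
    .map(([feature, allowlist]) => `${feature}=${allowlist}`)
    .join(', ');
}
```

An empty allowlist `()` disables the feature at the document level before
any OS/browser/site permission check runs; `(self)` restores the normal
permission prompt for same-origin pages.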

Verified: in a fresh incognito tab on `localhost:5173/`, opening the
Dreams app in the workbench and clicking the mic button now shows the
microphone permission dialog directly (no notifications hijack), and
recording → transcription works end-to-end against the production
mana-stt service on the GPU box.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:36:03 +02:00
Till JS
4cb1bc1827 fix(mana-voice-bot): move default port 3050 → 3024 + Windows GPU deployment notes
mana-voice-bot's source default was 3050, which collided with mana-sync.
Today the collision is latent (voice-bot isn't deployed anywhere), but
sooner or later someone is going to start it on a host that's already
running mana-sync and the second one will refuse to bind. Moving to
3024 puts it inside the AI/ML port range alongside its dependencies
(stt 3020, tts 3022, image-gen 3023, llm 3025) and away from sync.

Updated:
- app/main.py — PORT default 3050 → 3024
- start.sh, setup.sh — same fix in the example commands
- CLAUDE.md — full rewrite. Old version described "Mac Mini deployment"
  with launchd; the new version explicitly says "not deployed yet" and
  documents the seven concrete steps to deploy on the Windows GPU box
  alongside the other AI services (Scheduled Task, service.pyw, .env,
  firewall rule, cloudflared route, WINDOWS_GPU_SERVER_SETUP.md update).

docs/WINDOWS_GPU_SERVER_SETUP.md:
- Added the missing ManaVideoGen scheduled task to all four
  Start-ScheduledTask snippets — video-gen has been running on the
  Windows GPU but the doc had never picked it up.
- Added a "mana-video-gen (Port 3026)" service section parallel to the
  existing image-gen one, with venv path, repo pointer, model, etc.
- Added a repo-counterparts table mapping C:\mana\services\<svc>\ to the
  corresponding services/<svc>/ directory in the repo, plus a note that
  changes should flow repo→Windows, not the other way around.

docs/PORT_SCHEMA.md:
- Reconciled the warning block with the post-cleanup reality: no more
  active or latent port collisions (image-gen ↔ video-gen and
  voice-bot ↔ sync are both resolved). Listed the actual ports per host
  with public URLs. Kept the planned-vs-actual disclaimer for the
  services that still don't match the aspirational ranges (mana-credits
  3061 vs planned 3002, etc).
2026-04-08 13:14:57 +02:00
Till JS
f4347032ca chore(mac-mini): remove all AI service infrastructure (moved to Windows GPU)
The Mac Mini hasn't run mana-llm/stt/tts/image-gen for a while — those
services live on the Windows GPU server now. The Mac-targeted
installers, plists, and platform-checking setup scripts have been
sitting in the repo as cargo-cult leftovers, suggesting Mac Mini
deployment is still a real option. It isn't.

Removed (Mac-Mini deployment infrastructure):

services/mana-stt/
- com.mana.mana-stt.plist            (LaunchAgent)
- com.mana.vllm-voxtral.plist        (LaunchAgent for the abandoned local Voxtral experiment)
- install-service.sh                 (single-service launchd installer)
- install-services.sh                (mana-stt + vllm-voxtral installer)
- setup.sh                           (Mac arm64 installer)
- scripts/setup-vllm.sh              (vLLM-Voxtral setup)
- scripts/start-vllm-voxtral.sh

services/mana-tts/
- com.mana.mana-tts.plist
- install-service.sh
- setup.sh                           (Mac arm64 installer)

scripts/mac-mini/
- setup-image-gen.sh                 (Mac flux2.c launchd installer)
- setup-stt.sh
- setup-tts.sh
- launchd/com.mana.image-gen.plist
- launchd/com.mana.mana-stt.plist
- launchd/com.mana.mana-tts.plist

setup-tts-bot.sh stays — it's the Matrix TTS bot installer (Synapse
side), not the mana-tts service.

Updated:
- services/mana-stt/CLAUDE.md, README.md — fully rewritten for the
  Windows GPU reality (CUDA WhisperX, Scheduled Task ManaSTT, .env keys
  matching the actual production .env on the box)
- services/mana-tts/CLAUDE.md, README.md — same treatment, documenting
  Kokoro/Piper/F5-TTS on the Windows GPU under Scheduled Task ManaTTS
- scripts/mac-mini/README.md — dropped the STT setup section, replaced
  with a pointer to docs/WINDOWS_GPU_SERVER_SETUP.md and the per-service
  CLAUDE.md files
- docs/MAC_MINI_SERVER.md — expanded the "deactivated launchagents"
  list to mention the now-removed plists, added the full GPU service
  port table with public URLs, added a cleanup snippet for any old plists
  still installed on a Mac Mini somewhere
2026-04-08 13:06:40 +02:00
Till JS
c7b4388cec feat(mana-image-gen): replace Mac flux2.c implementation with Windows GPU diffusers
The repo's mana-image-gen used to be a Mac Mini–only service built on
flux2.c with hard MPS+arm64 platform checks. The actual production
image-gen runs on the Windows GPU server (RTX 3090) using HuggingFace
diffusers + PyTorch CUDA + FLUX.1-schnell — completely different code
that lived only at C:\mana\services\mana-image-gen\ on the GPU box.

This commit pulls the Windows implementation into the repo and deletes
the Mac one, so there's exactly one mana-image-gen and its source of
truth is git rather than one folder on one machine.

Removed:
- setup.sh — Mac-only flux2.c installer with hard arm64 platform check
- app/main.py (Mac flux2.c subprocess wrapper version)
- app/flux_service.py (Mac flux2.c subprocess wrapper version)

Added (pulled from C:\mana\services\mana-image-gen\):
- app/main.py — FastAPI endpoints (/generate, /images/*, /cleanup)
- app/flux_service.py — diffusers FluxPipeline wrapper
- app/api_auth.py — ApiKeyMiddleware (GPU_API_KEY)
- app/vram_manager.py — shared VRAM accounting
- service.pyw — Windows runner used by the ManaImageGen scheduled task

Updated:
- main.py PORT default from 3025 → 3023 to match the production reality
  (the service.pyw runner already binds 3023 explicitly via uvicorn.run,
  but the source default should match so direct uvicorn invocations and
  local tests don't pick the wrong port)
- CLAUDE.md fully rewritten to describe the Windows/CUDA/diffusers stack
- README.md trimmed to a pointer at CLAUDE.md + the public URL
- .env.example written from scratch (didn't exist before — the service's
  .env on the GPU box was undocumented)

The setup-image-gen.sh launchd installer in scripts/mac-mini/ and the
actual Mac Mini deployment will be cleaned up in the next commit, along
with the rest of the Mac-Mini AI service infrastructure.
2026-04-08 13:02:42 +02:00
Till JS
b8e18b7f82 chore(ai-services): adopt Windows GPU as source of truth for llm/stt/tts
The Windows GPU server has been the actual production home for these
services for some time, and the running code there has drifted ahead of
the repo. This sync pulls the live versions back into the repo so the
Windows box is no longer the only place those changes exist.

Pulled from C:\mana\services\* on mana-server-gpu (192.168.178.11):

mana-llm:
- src/main.py, src/config.py — small fixes (auth wiring, config tweaks)
- src/api_auth.py — NEW (cross-service GPU_API_KEY validator)
- service.pyw — Windows runner used by the ManaLLM scheduled task
  (sets up logging redirect, loads .env, calls uvicorn)

mana-stt:
- app/main.py — substantial cleanup (684→392 lines), drops the
  whisperx-as-separate-backend branching now that whisper_service.py
  rolls whisperx in directly
- app/whisper_service.py — full CUDA + whisperx rewrite (158→358 lines)
- app/auth.py + external_auth.py — significantly expanded auth
- app/vram_manager.py — NEW (shared VRAM accounting helper)
- service.pyw — Windows runner with CUDA pre-init, FFmpeg PATH
  injection, .env loading
- removed: app/whisper_service_cuda.py (folded into whisper_service.py)
- removed: app/whisperx_service.py (folded into whisper_service.py)

mana-tts:
- app/auth.py, external_auth.py — same auth expansion as stt
- app/f5_service.py, kokoro_service.py — Windows tweaks
- app/vram_manager.py — NEW (same shared helper as stt)
- service.pyw — Windows runner

mana-video-gen:
- service.pyw — Windows runner (no other changes; the .py code on the
  GPU box is byte-identical to what's already in the repo)

The service.pyw files contain absolute Windows paths
(C:\mana\services\<svc>) and a hardcoded FFmpeg PATH for the tills user
profile. Kept as-is intentionally — they exist to be deployed to that
one machine and any abstraction layer would just hide what's actually
happening. Anyone redeploying to a different layout will need to edit
the path strings, which is a known and obvious change.

Mac-Mini infrastructure for these services (launchd plists, install
scripts, scripts/mac-mini/setup-{stt,tts}.sh, the Mac-flux2c image-gen
implementation) is still on disk and will be removed in a follow-up
commit, along with replacing mana-image-gen with the Windows
diffusers+CUDA implementation. This commit is just the live-code sync.
2026-04-08 12:46:03 +02:00
Till JS
abe0a21966 refactor(auth-ui): tighten LoginPage UX, a11y, and dead code
LoginPage cleanup:
- Drop dev pre-fill credentials and the secret logo-as-button trick
- Remove duplicate in-component theme toggle; accept isDark as a prop and let the (auth) layout's global theme toggle drive it
- Move passkey CTA below the password form so the primary flow stays primary
- Remove the dead "Angemeldet bleiben" checkbox (was bound but never forwarded to onSignIn)
- Fix the skip-to-form link to use sr-only/focus:not-sr-only so it only appears on keyboard focus
- Fix the "oder" divider to render its before/after hairlines by setting an explicit color on the parent
- Wire focus-visible outlines on all interactive controls
- Bump 0.6 → 0.75 opacity on subtitle text for AA contrast
- Drop opacity-60 from the headerControls wrapper

Robustness:
- Track all setTimeout IDs in a Set and clear them in an effect cleanup so navigation away doesn't fire stale callbacks (success redirects, error shake, focus restore)
- Replace (result as any) casts with the new typed AuthResult fields
- New resolveErrorCode() helper prefers result.errorCode and falls back to legacy string matching, so rate-limit / account-lock detection survives i18n
- WebAuthn Conditional UI: on mount, if PublicKeyCredential.isConditionalMediationAvailable(), call onSignInWithPasskey({ conditional: true }) so passkeys appear inline in the email autofill dropdown
- Extract the dismissible success-banner markup into a {#snippet successBanner} and reuse it for the verified / verification-sent / magic-link-sent cases (~50 lines of duplicate markup out)
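
The resolveErrorCode() idea might look roughly like this (the field names
and the exact code union are assumptions — the real AuthResult lives in
the auth package):

```typescript
// Sketch: prefer the stable machine-readable code, fall back to legacy
// string matching so rate-limit / lock detection survives i18n of the
// human-readable message.
type AuthErrorCode = 'rate_limited' | 'account_locked' | 'invalid_credentials' | 'unknown';

interface AuthResult {
  ok: boolean;
  errorCode?: AuthErrorCode; // new structured field
  error?: string;            // legacy localized message
}

function resolveErrorCode(result: AuthResult): AuthErrorCode {
  if (result.errorCode) return result.errorCode;
  const msg = result.error ?? '';
  if (/too many|rate/i.test(msg)) return 'rate_limited';
  if (/locked/i.test(msg)) return 'account_locked';
  return 'unknown';
}
```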

Page wrappers:
- login/+page.svelte passes isDark={theme.isDark} so the in-app theme store drives both layouts
- register/+page.svelte wraps trackGuestConversion() in queueMicrotask + try/catch so analytics can never block the success redirect
- Drop the dead baseSignupCredits={25} prop from register/+page.svelte (RegisterPage never accepted it)
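
The queueMicrotask + try/catch pattern generalizes to a few lines
(safeCall is a hypothetical name; trackGuestConversion is the repo's
helper, treated here as an opaque, possibly-throwing function):

```typescript
// Sketch: run the analytics call off the critical path and swallow any
// throw so the success redirect can never be blocked by it.
function safeCall(fn: () => void): boolean {
  try {
    fn();
    return true;
  } catch {
    return false; // analytics failure is intentionally silent
  }
}

// usage sketch — defer past the current task, then guard:
//   queueMicrotask(() => safeCall(trackGuestConversion));
```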

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:41:19 +02:00
Till JS
ff7dc5d875 feat(auth): structured error codes + conditional passkey UI
- Add AuthErrorCode union and typed twoFactorRedirect/retryAfter fields on AuthResult so the frontend can branch on stable codes instead of locale-dependent error strings.
- Extend signInWithPasskey with an optional { conditional } flag, threaded through to @simplewebauthn/browser via useBrowserAutofill, so hosts can opt into WebAuthn Conditional UI (passkey suggestions inline in the email autofill dropdown).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:40:51 +02:00
Till JS
3c91691d26 fix(mana-image-gen): align source default port with production reality
Source default was 3026 but Mac Mini production has been overriding to
3025 via the launchd plist in scripts/mac-mini/setup-image-gen.sh ever
since the service was set up. The override existed in exactly one place
that is not version-controlled in any obvious way — anyone redeploying
without that script would land on 3026 and clients pointing at 3025
would fail to connect.

Source default → 3025 across main.py, setup.sh, README, CLAUDE.md so the
launchd plist is no longer load-bearing. The Mac Mini setup script still
sets PORT=3025 explicitly; that's now belt-and-suspenders rather than the
only thing keeping production alive.

Also added a note clarifying that this Mac Mini service (flux2.c, MPS,
arm64-only) is *not* the same thing as the "image-gen" running on the
Windows GPU server (PyTorch + diffusers + CUDA, port 3023, code lives at
C:\mana\services\mana-image-gen\ outside this repo). Two different
implementations sharing a name was confusing the port-collision audit.

Updated docs/PORT_SCHEMA.md warning block to retract the previous false
claims of two active port collisions:

  - image-gen ↔ video-gen on 3026 — wrong: image-gen runs on Mac Mini
    on 3025 (now also the source default), video-gen is alone on the
    Windows GPU on 3026
  - voice-bot ↔ sync on 3050 — latent only: mana-voice-bot is not
    deployed anywhere (no launchd, no scheduled task, no cloudflared
    route), so the collision is in source defaults but not in production

The voice-bot 3050 default should still be moved before voice-bot is
ever deployed — flagged in the PORT_SCHEMA warning instead of silently
fixed since voice-bot deployment is its own decision.
2026-04-08 12:30:33 +02:00
Till JS
b0a08ce239 docs(services): add CLAUDE.md for stt + events, fix stale entries, flag port collisions
New service docs:
- services/mana-stt/CLAUDE.md — FastAPI surface with Whisper MLX (local),
  WhisperX (rich), and Voxtral (local + Mistral API). Documents the lazy
  backend loading and the launchd plist setup on the Mac Mini.
- services/mana-events/CLAUDE.md — Hono/Bun service for public RSVP and
  event-sharing. Documents the host (JWT) vs public (token) split, the
  rate-limit sweeper, and the createApp factory pattern that lets unit
  tests run without bootstrapping the production sweeper.

Stale entries fixed:
- mana-auth: dropped "rewritten from NestJS / drop-in replacement" — the
  rewrite is the only mana-auth there is now. Email channel updated from
  Brevo SMTP to self-hosted Stalwart (see docs/MAIL_SERVER.md).
- mana-notify: same Brevo → Stalwart fix in the channel table and env
  var defaults.

PORT_SCHEMA.md flagged as aspirational:
- The doc was dated 2026-03-28 and presented as "single source of truth",
  but cross-checking against actual service source files (config.go,
  main.py, start.sh) shows nothing matches. Added a prominent warning at
  the top with the real ports + two confirmed collisions:
  * mana-image-gen and mana-video-gen both default to PORT 3026
  * mana-voice-bot and mana-sync both default to PORT 3050
  Today these are masked because image-gen + voice-bot live on the
  Windows GPU server while video-gen + sync live on the Mac Mini, but
  the moment they share a host they collide. Either execute the planned
  reorg or pick non-colliding ports and rewrite the doc to match
  reality — flagged as a real follow-up.
2026-04-08 12:23:48 +02:00
Till JS
a3a47459c6 docs(audit): file-bytes encryption implementation plan + audit roll-up
Two changes:

1. New BACKLOG_FILE_BYTES_ENCRYPTION.md captures everything I'd
   want to know if I were picking up the file-bytes encryption
   work cold in 6 months. ~370 lines, sits next to
   DATA_LAYER_AUDIT.md for discoverability.

   Sections:
   - TL;DR + status (deferred, no production impact yet)
   - Goal + non-goals
   - Threat model delta table (mode-by-mode)
   - Architecture: write path with ASCII flow diagram
   - Architecture: read path with ASCII flow diagram
   - The six hard parts:
     1. Web Crypto AES-GCM doesn't stream → chunked-AEAD wrapper
     2. Multipart uploads need coordinated chunking (S3 5 MB minimum
        vs. our 1 MB AES-GCM chunks)
     3. Resumable uploads + key persistence (new _pendingUploads
        table for the in-flight content key)
     4. No more server-side thumbnails (three options, recommended:
        client-side resize before upload)
     5. Sharing complicates the trust model (URL-fragment key
        sharing, recommended; Mega.nz / Cryptpad pattern)
     6. Migration of existing plaintext files (lazy on-read,
        recommended)
   - Schema delta (sql + Dexie additions)
   - File map (~2200 LoC across 9 new files + 3 touched)
   - Testing strategy (unit + integration + e2e per layer)
   - Out-of-scope items explicitly listed
   - Decision criteria for when to actually do this
   - Five open questions for whoever picks it up
   - Cross-references to related files

   The doc is opinionated where I have a defensible recommendation
   and explicit about uncertainty where I don't.

2. DATA_LAYER_AUDIT.md updates:

   - Backlog "Offen" item #1 (File-Bytes-Encryption) now points
     directly at the new plan doc with a one-line teaser.
   - Backlog "Abgeschlossen" gains a row C for the Conflict
     Visualization UI shipped in ed8ab4483 (was still listed as
     open from the previous audit roll-up).
   - List renumbered: Conflict-UI dropped from "Offen", remaining
     items shifted up.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:17:15 +02:00
Till JS
5581295b12 chore: tidy root files + reorganize a few stale docs
Root file cleanup:
- mac-mini-setup.sh → scripts/mac-mini/bootstrap.sh  (first-time bootstrap
  belongs next to the other mac-mini setup-* scripts)
- test-chat-auth.sh → scripts/test-chat-auth.sh  (ad-hoc smoke test, no
  reason to live in the repo root)
- cloudflared-config.yml stays in root on purpose — it's the single source
  of truth read by scripts/mac-mini/setup-*.sh and scripts/check-status.sh.

Docs:
- docs/POSTMORTEM_2026-04-07.md → docs/postmortems/2026-04-07-memoro-deploy-prod-wipe.md
  (creates the postmortems/ home for future entries; descriptive name)
- docs/future/MAIL_SERVER_MAC_MINI_TEMP.md deleted — what it described
  ("Bereit zur Umsetzung", Stalwart on Mac Mini) is what's actually
  running today, documented in docs/MAIL_SERVER.md. The DEDICATED variant
  in docs/future/ remains since it's still a real future plan.

Root CLAUDE.md fix:
- @mana/local-store description was wrong — claimed it was legacy/standalone
  only, but it's still used by apps/mana/apps/web itself, plus manavoxel,
  arcade, and three shared packages.

Not touched (flagged for follow-up):
- NewAppIdeas/ (344K of "Roblox Reimagined" planning notes in repo root) —
  user decision: archive externally or move under docs/future/
- Doc giants (PROJECT_OVERVIEW 41k, MATRIX_BOT_ARCHITECTURE 36k, etc.) —
  splitting them is its own refactor
- Service CLAUDE.md staleness audit across 18 services — too broad for
  this pass
2026-04-08 12:15:27 +02:00
Till JS
c8ed58b7d1 fix(mana,ui): integrate guest nudge into bottom stack + theme it
The "Gefällt es dir?" guest nudge was a free-floating fixed element at
bottom: 10rem, so it didn't follow the bottom-stack when the PillNav was
collapsed. Move it inside .bottom-stack as the first child so it shares
the stack's reflow.

NotificationBar now uses the elevation system (--color-surface-elevated,
--color-border-strong, --color-foreground) instead of hardcoded rgba so
it adapts to all themes. Bumped the CTA button (shadow + hover lift) and
container (stronger border, layered shadow) to be more visible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:13:05 +02:00
Till JS
b3523f8bdc chore: cleanup leftover dirs from ManaCore→Mana rename + document apps/api
Removed:
- apps/manacore/ — three Svelte files were byte-identical duplicates of
  the apps/mana/ versions, leftover from the 2025 rename. Untracked .env
  files in the same dir were also cleared.
- 21 empty apps/*/apps/web-archived/ directories — leftover from the
  unification move, never tracked in git.
- services/it-landing/ — empty directory, picked up by the services/*
  workspace glob for no reason.
- apps/news/apps/server-archived/ — empty.

Fixed:
- scripts/mac-mini/status.sh: COMPOSE_PROJECT_NAME fallback was still
  manacore-monorepo from before the rename.

Documented:
- Root CLAUDE.md now describes apps/api/ (the @mana/api unified backend)
  as a top-level peer to apps/mana/. It was completely missing from the
  trimmed CLAUDE.md, which made the layout look frontend-only.
2026-04-08 12:12:02 +02:00
Till JS
ed8ab44832 feat(sync): conflict visualization with restore-my-version toast
Closes backlog C from the Phase 9 audit. The data layer has had
real field-level LWW since Sprint 1, but when the server's value
beat a local edit, the user had no way to know. This commit adds
the missing UI piece: a toast that appears whenever applyServerChanges
overwrites a non-empty local field with a strictly newer server
value, with a one-click "restore my version" path.

sync.ts — detection
-------------------
Two new exports:

  - SyncConflictPayload: per-field overwrite event shape
    (tableName, recordId, field, wasLocal, nowServer, localTime,
    serverTime).
  - subscribeSyncConflicts(listener): in-module pub/sub. Returns
    an unsubscribe function.

Both LWW branches in applyServerChanges (insert-as-update and the
canonical update-with-fields path) now call notifyConflict() when:

  1. The server time is STRICTLY greater than (not merely equal to)
     the local field time → there's actually an edit window to lose
  2. The local field value is non-null/undefined → user actually
     typed something to overwrite
  3. The values are not equal (cheap JSON-string compare for objects,
     === for primitives) → there's a real change, not an idempotent
     server replay
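
A pure-function sketch of those three guards (value/time shapes assumed;
the real check lives inline in applyServerChanges):

```typescript
// Sketch: notifyConflict should only fire for a real, strictly newer,
// genuinely different overwrite of a non-empty local field.
function isRealConflict(
  localValue: unknown,
  serverValue: unknown,
  localTime: number,
  serverTime: number,
): boolean {
  if (serverTime <= localTime) return false;                         // 1. strictly newer only
  if (localValue === null || localValue === undefined) return false; // 2. no local edit to lose
  const same =
    typeof localValue === 'object' || typeof serverValue === 'object'
      ? JSON.stringify(localValue) === JSON.stringify(serverValue)   // cheap compare for objects
      : localValue === serverValue;
  return !same;                                                      // 3. a real change
}
```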

Why a custom registry instead of CustomEvent + window.dispatchEvent?
The existing sync-telemetry + quota-detect helpers use
window.dispatchEvent which doesn't work in node-based vitest envs
(no DOM EventTarget). The conflict bus is small enough that a plain
Set<listener> is simpler than polyfilling EventTarget — and the
node test path matters because we need automated coverage of the
detection logic.
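
The plain-Set bus is small enough to sketch in full (shown generically;
the real one is specialized to SyncConflictPayload):

```typescript
// Sketch: in-module pub/sub with no DOM dependency, so it runs in
// node-based vitest environments without an EventTarget polyfill.
type Listener<T> = (payload: T) => void;

function createBus<T>() {
  const listeners = new Set<Listener<T>>();
  return {
    subscribe(listener: Listener<T>): () => void {
      listeners.add(listener);
      return () => listeners.delete(listener); // unsubscribe function
    },
    notify(payload: T): void {
      for (const listener of listeners) listener(payload);
    },
  };
}
```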

conflict-store.svelte.ts — UI state
-----------------------------------
Svelte 5 $state-backed store with three responsibilities:

  1. Coalescing: a SyncConflict is keyed by `${tableName}|${recordId}`,
     so a burst of N field-overwrites on the same record collapses
     into ONE toast with all affected fields underneath. The original
     wasLocal value is preserved across coalescing (we don't clobber
     the user's first typed value if a later field event arrives).

  2. Auto-dismiss: each conflict has a 30s TTL after which it
     evicts itself. Manual dismiss trumps the timer.

  3. Restore: writes wasLocal back to Dexie with a fresh updatedAt
     that beats the server's serverTime, plus a __fieldTimestamps
     patch so the field-LWW pass on the next sync round will let
     our value win. Deferred via setTimeout(0) so it lands AFTER
     applyServerChanges releases its per-table apply lock — running
     before the lock release would silently drop the restore (the
     hook suppression is per-table-set, not per-record).

FIFO eviction at MAX_VISIBLE=8 keeps a bursty server from growing
the visible array unbounded.
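
Coalescing plus FIFO eviction in miniature (shapes abbreviated from the
real store; the Svelte $state wrapping is omitted):

```typescript
// Sketch: conflicts are keyed by table|record, field events accumulate
// on one entry, and the FIRST captured wasLocal per field is preserved.
interface ConflictEvent { tableName: string; recordId: string; field: string; wasLocal: unknown; }
interface SyncConflict { key: string; fields: Map<string, unknown>; }

const visible = new Map<string, SyncConflict>(); // Map iterates in insertion order
const MAX_VISIBLE = 8;

function addConflict(e: ConflictEvent): void {
  const key = `${e.tableName}|${e.recordId}`;
  let conflict = visible.get(key);
  if (!conflict) {
    conflict = { key, fields: new Map() };
    visible.set(key, conflict);
    if (visible.size > MAX_VISIBLE) {
      // FIFO eviction: the first-inserted key is the oldest
      const oldest = visible.keys().next().value as string;
      visible.delete(oldest);
    }
  }
  // don't clobber the user's first captured value for this field
  if (!conflict.fields.has(e.field)) conflict.fields.set(e.field, e.wasLocal);
}
```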

SyncConflictToast.svelte — the UI
---------------------------------
Mounts globally in +layout.svelte. Stacks bottom-right above the
OfflineIndicator. Each toast shows:

  - Module label ("Aufgabe", "Notiz", "Termin", …) derived from a
    table-name → German label map. Unknown tables fall through to
    the bare table name.
  - Field count summary ("Feld »title«" / "3 Felder") — we
    deliberately do NOT render the actual values because some are
    encrypted blobs and decrypting them in the toast would be
    significant complexity for marginal UX gain. The user knows
    what they were just editing.
  - Two buttons: "Wiederherstellen" (calls conflictStore.restore)
    and "Behalten" (calls dismiss).

Slide-in animation, dark-mode-aware styling, role="alertdialog"
for accessibility.

Wiring
------
data-layer-listeners.ts:
  - Imports installConflictListener from conflict-store
  - Calls it from installDataLayerListeners() right after the
    quota + telemetry handlers
  - Adds the disposeConflict() call to the cleanup return

+layout.svelte:
  - Imports SyncConflictToast and mounts it next to SuggestionToast
    so it inherits the same global-overlay positioning context

Tests
-----
Five new integration tests in sync.test.ts cover:

  - Fires when server overwrites a non-empty local field with a
    strictly newer value
  - Does NOT fire when local field is null/undefined (no edit to lose)
  - Does NOT fire when values are equal (idempotent replay)
  - Fires once per overwritten field on a multi-field update
  - Does NOT fire on a timestamp tie (LWW lets server win silently
    when there's no real edit window)

All 25 sync tests + 138 total data-layer tests pass. The new
captureConflicts() helper subscribes via subscribeSyncConflicts()
which works in the node-vitest env without needing a DOM polyfill.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:01:17 +02:00
Till JS
fe3fc9e7e2 docs: trim CLAUDE.md files — remove stale + duplicated guidance
Root CLAUDE.md: 1138 → 169 lines. Removed ghost apps-archived list,
Supabase env examples, duplicate mana-auth row, contradictory "Code
Quality TODO" block. Pushed search/storage/database/landing/manascore
howtos out to docs/ + .claude/guidelines/ pointers.

apps/mana/CLAUDE.md: 259 → 175 lines. Dropped non-existent workbench/
route from the routing diagram. Folded the auth section into a pointer
to root + the mana-specific current-user stamping pattern. Merged the
two module-system sections. Kept the data-flow ASCII diagram and the
encryption 3-step workflow (the part you actually need while writing
stores).
2026-04-08 11:59:51 +02:00
Till JS
b6486a8a46 fix(mana-video-gen): typo in get_model_info — total_mem → total_memory
PyTorch's `torch.cuda.get_device_properties(0)` returns a
`_CudaDeviceProperties` object whose memory attribute is
`total_memory` (bytes), not `total_mem`. The typo crashed the
service immediately at startup because `get_model_info()` is
called from the FastAPI lifespan handler, not lazily — uvicorn
logged "Application startup failed" before any request could land.

Found while installing mana-video-gen on the Windows GPU box
(192.168.178.11:3026) for the gpu-video.mana.how Cloudflare route.
After the fix the service starts cleanly under the ManaVideoGen
scheduled task and responds 200 on /health both LAN and via
Cloudflare tunnel. status.mana.how now reports 42/42 — first time
ever.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 11:59:40 +02:00
Till JS
142a65a22f docs: Phase 9 documentation roundup — close encryption-shaped doc gaps
Five documentation surfaces gained encryption awareness in this
sweep. Before this commit, the only place anyone could learn about
the at-rest encryption layer or the zero-knowledge opt-in was the
internal DATA_LAYER_AUDIT.md. New contributors and self-hosters
would never discover one of the most important features of the
product just by reading the standard onboarding docs.

apps/docs/src/content/docs/architecture/security.mdx (NEW)
----------------------------------------------------------
First-class user-facing security page in the Starlight site,
slotted into the Architecture sidebar between Authentication and
Backend.

Sections:
  - What's encrypted (overview table of 27 modules + the
    intentional plaintext carve-outs)
  - Standard mode flow with ASCII diagram
  - "What Mana CAN see" trust statements per mode
  - Zero-knowledge mode setup walkthrough (Steps component)
  - Unlock flow on a new device
  - Recovery code rotation
  - Deployment requirements (the loud MANA_AUTH_KEK warning)
  - Audit trail action vocabulary
  - Threat model summary table
  - Implementation file references with paths

services/mana-auth/CLAUDE.md
----------------------------
New "Encryption Vault" section under Key Endpoints, listing all 7
routes (status, init, key, rotate, recovery-wrap GET+DELETE,
zero-knowledge) with their HTTP method, path, error codes, and a
description. Mentions the three CHECK constraints + RLS + audit
table. Points readers at DATA_LAYER_AUDIT.md and the new
security.mdx for the deep dive.

Environment Variables block gains MANA_AUTH_KEK with a multi-line
comment explaining the openssl rand command + dev fallback warning.

apps/mana/CLAUDE.md
-------------------
Full rewrite. The existing file was from the Supabase era and
described things like @supabase/ssr, safeGetSession(), and a
five-table schema with users + organizations + teams that doesn't
exist any more. Replaced with the unified-app architecture:

  - Module system layout (collections.ts / queries.ts / stores/)
  - Mana Auth (Better Auth + EdDSA JWT) instead of Supabase
  - Local-first data layer with the full pipeline diagram
  - At-rest encryption section with the "when writing module code
    that touches sensitive fields" 4-step guide
  - Updated routing structure (no more separate /organizations,
    /teams routes)
  - Module store pattern code example
  - Reference document table at the bottom pointing at the audit,
    the new security.mdx, and the auth doc

Root CLAUDE.md
--------------
New "At-Rest Encryption (Phase 1–9)" subsection under the
Local-First Architecture section. Two-mode trust summary table,
production requirement for MANA_AUTH_KEK with the openssl command,
the "when writing module code" 4-step guide, and a reference
table. New contributors reading the root CLAUDE.md from top to
bottom now hit encryption naturally as part of the data layer
discussion.

.env.macmini.example
--------------------
MANA_AUTH_KEK was missing from the production env example
entirely — the macmini deployment would silently boot on the
32-zero-byte dev fallback if you copied this file. Added with a
multi-paragraph comment covering: how to generate, why it's
required, how to store securely (Docker secrets / KMS / Vault),
and the rotation caveat.

apps/docs/src/content/docs/deployment/self-hosting.mdx
------------------------------------------------------
Two changes:

  1. Added MANA_AUTH_KEK to the mana-auth service block in the
     Compose example with an inline comment pointing at the new
     section below.

  2. New "Encryption Vault Setup" H2 section with subsections:
     - Generating a KEK (with a fake example value labelled DO NOT
       USE — generate your own)
     - Securing the KEK (Docker secrets, KMS, systemd
       LoadCredential, anti-patterns)
     - "What if I lose the KEK?" — explains the data is
       unrecoverable by design and mitigation via zero-knowledge
       mode opt-in
     - KEK rotation — calls out the missing background re-wrap
       job as a known limitation

apps/docs/astro.config.mjs
--------------------------
Added "Security & Encryption" entry to the Architecture sidebar
between Authentication and Backend so the new page is reachable
from the docs nav.

Astro check: 0 errors, 0 warnings, 0 hints across 4 .astro files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 11:47:59 +02:00
Till JS
b961453244 docs(audit): roll up Phase 9 backlog sweep
Marks the four backlog items closed in this session — vault service
integration tests, recovery code rotation, pre-wired insert helpers
for future server-pushed records, and boards/boardItems encryption.
Updates the encrypted-tables list to 27 tables.

Updates
-------
1. Sprint table grows by 4 rows (BL1, BL2, BL3+4, BL5) with the
   four backlog commits.

2. Test-Status line bumped:
     21 web test files → 21 web + 2 mana-auth
     78 vitest crypto tests + 39 bun mana-auth tests
     "25+ tables" → "27 tables" (boards + boardItems added)

3. Section 5 encrypted-tables list grows by:
     - boards     (name, description)
     - boardItems (textContent, only when itemType === 'text')
   Both labelled "9 BL" in the Phase column to mark them as
   backlog-sweep additions.

4. "Tabellen ohne Encryption (bewusst)" subsection: removed the
   stale "boards/boardItems are a candidate for later" entry —
   they're encrypted now. Added a redirect note pointing readers
   at Section 6 where the actual decision is recorded.

5. Section 6 ("Backlog") completely restructured. The flat
   "in priority order" list became two subsections:

   "Abgeschlossen (Phase 9 Follow-Up Sweep)" — table with the four
   commits + a one-line "what" notice each. Item 3+4 is explicitly
   marked as a re-frame: the original "server pushes plaintext"
   risk turned out to overstate the problem because the
   generate/upload UIs are TODO stubs. The fix was pre-wired
   insert() helpers, not a server-side rewrite.

   "Offen" — five remaining items, reordered:
     1. File-Bytes-Encryption (NEW: surfaced as "#4b" while
        documenting that filesStore.insert() only protects metadata)
     2. Image-Generation / File-Upload Wire-Up (NEW: ensures the
        future UIs go through the helpers from #3+4)
     3. Conflict Visualization UI (unchanged)
     4. Composite Indexes für Multi-Account (unchanged)
     5. V3 Migration Tests (unchanged)

6. Eckdaten line bumped from "25+ Tabellen aktiv" to "27 Tabellen
   aktiv". Best Practices line for ZK gets the "+ rotate im
   Active-State-Support" suffix.

7. Last-update header bumped to today.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 00:00:52 +02:00
Till JS
a7e5b39ad0 feat(picture): encrypt boards + boardItems
Closes backlog #5 from the Phase 9 audit. Adds two new registry
entries (boards, boardItems) and wraps the boards store + queries
+ search provider so the moodboard names, descriptions and
text-item content are sealed at rest like every other user-typed
field.

Registry
--------
  - boards:    ['name', 'description']
  - boardItems: ['textContent']

Inline comments explain that textContent is only set when
itemType === 'text' (image-type items have it null, encryptRecord
is a pass-through). Coordinates / dimensions / z-index / opacity
stay plaintext for the canvas renderer.

Boards store
------------
  - createBoard: snapshots plaintext for the return value before
    encryptRecord mutates the row in place
  - updateBoard: encrypts the diff before update, then re-fetches +
    decrypts for the return value (so the caller gets plaintext,
    not the ciphertext we just wrote)
  - duplicateBoard: NEW behaviour — explicitly decrypts the
    original board first because the duplicate concatenates "(Kopie)"
    onto the name string. Concatenating onto an "enc:1:..." prefix
    would produce a malformed blob that fails to decrypt later.
    The board items are spread directly because the duplicate
    uses the SAME master key, so the existing ciphertext stays
    valid; encryptRecord is idempotent on already-encrypted strings
    so it's a no-op safety check.
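
The duplicate-safety reasoning above can be sketched with a toy version of
the prefix convention. The XOR "cipher", helper names, and key handling are
stand-ins for illustration — the real encryptRecord/decryptRecord run the
master-key AES pipeline — but the idempotence and pass-through properties
are the same:

```typescript
// Toy sketch (NOT the real crypto): shows why sealing is idempotent and
// why reads pass plaintext through unchanged.
const PREFIX = "enc:1:";

function sealField(plaintext: string, key: number): string {
  // Idempotent: an already-sealed value is returned unchanged, so
  // re-encrypting a duplicated row is a safe no-op.
  if (plaintext.startsWith(PREFIX)) return plaintext;
  const bytes = [...plaintext].map((c) => c.charCodeAt(0) ^ key);
  return PREFIX + Buffer.from(bytes).toString("base64");
}

function openField(stored: string, key: number): string {
  // Pass-through: plaintext rows from the migration window come back as-is.
  if (!stored.startsWith(PREFIX)) return stored;
  const bytes = Buffer.from(stored.slice(PREFIX.length), "base64");
  return [...bytes].map((b) => String.fromCharCode(b ^ key)).join("");
}
```

This is also why duplicateBoard must decrypt first: "(Kopie)" gets appended
to the opened name and the result re-sealed, never appended to the sealed blob.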

Reads
-----
  - useAllBoards: decrypts the visible board set before mapping. The
    item count map only reads structural fields (deletedAt + boardId)
    so it doesn't need a decrypt pass for boardItems.
  - allBoards$ raw observable: same pattern
  - search/providers/picture: decrypts before substring scoring
    against the user query

The unified mana app currently has no UI that renders boardItems
.textContent (the seed data in collections.ts is exported as
PICTURE_GUEST_SEED but never imported anywhere — dead code), so
no item-side reader needs touching for this commit. When a future
canvas editor lands it'll go through the existing decryptRecord
helpers naturally.

78/78 crypto tests still pass (registry shape unchanged at the API
level).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:57:54 +02:00
Till JS
109de61e21 feat(picture,storage): pre-wired insert helpers for future generate/upload flows
Closes backlog #3+4 from the Phase 9 audit. The original framing —
"server-pushed records bypass client-side encryption" — turned out
to overstate the problem after a code audit:

  - apps/mana/apps/web/src/routes/(app)/picture/generate/+page.svelte
    is currently a TODO stub. The handleGenerate() function returns
    "requires connection to Picture-Server (port 3006)" without
    inserting anything.
  - There is no fileTable.add() call site anywhere in the unified
    mana app. File uploads still happen via the standalone storage
    server in apps/storage and arrive via legacy mana-sync push.

So the production code path that would write plaintext images or
files to the user's IndexedDB doesn't yet exist. The risk only
materialises when someone wires up the in-app generate / upload
UI in the unified app.

The right action is to leave behind a clearly-labelled, encryption-
aware insert() helper on each store so the future implementation
has an obvious "do the right thing" path to call. This commit does
exactly that.

picture/stores/images.svelte.ts
-------------------------------
New imagesStore.insert(image: LocalImage) method:
  - Calls encryptRecord('images', image) to seal `prompt` +
    `negativePrompt` (the two registered encrypted fields)
  - Calls imageTable().add(image)
  - Fires the PictureEvents.imageCreated analytic (replaces the
    old plain-table-add path)

A long doc comment on the method explains the architectural
reasoning: the server cannot encrypt under the user's master key
(the key only lives in the browser), so the generation flow MUST
round-trip through the client store even if the AI call itself
happens server-side. The pattern is documented as:

  1. Client posts { prompt, negativePrompt, ... } to image-gen API
  2. Server returns { storagePath, generationId, dimensions, ... }
  3. Client calls imagesStore.insert(...) with both halves
  4. encryptRecord seals the prompt fields before the IndexedDB write
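
The four steps above, as a dependency-injected sketch (names and the response
shape are illustrative assumptions — the real method is imagesStore.insert in
images.svelte.ts, and encryptRecord mutates the row in place):

```typescript
// Sketch of the documented client-side round-trip: server does the AI call,
// client owns the key, so the seal happens on the client before the write.
interface GenerateResponse { storagePath: string; generationId: string }
interface LocalImage extends GenerateResponse { prompt: string; negativePrompt: string }

async function generateAndStore(
  prompt: string,
  negativePrompt: string,
  api: (p: string, n: string) => Promise<GenerateResponse>,
  encryptRecord: (table: string, row: LocalImage) => Promise<void>,
  add: (row: LocalImage) => Promise<void>,
): Promise<LocalImage> {
  // 1+2. Server performs generation, returns only non-sensitive metadata.
  const meta = await api(prompt, negativePrompt);
  // 3. Client merges both halves into the local record.
  const image: LocalImage = { ...meta, prompt, negativePrompt };
  // 4. Prompt fields are sealed under the browser-held master key
  //    BEFORE the IndexedDB write — the server never sees that key.
  await encryptRecord("images", image);
  await add(image);
  return image;
}
```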

The mixed-state guarantee from picture/queries.ts already covers
the migration window where some images came in via legacy
server-side push and others through this path — decryptRecord
passes plaintext through and unwraps ciphertext blobs.

storage/stores/files.svelte.ts
------------------------------
New filesStore.insert(file: LocalFile) method:
  - Calls encryptRecord('files', file) to seal `name` +
    `originalName`
  - Calls fileTable.add(file)

Same architectural reasoning applies. The doc comment also flags a
SEPARATE concern that this commit does NOT address: encrypting the
actual file *bytes* on S3 (so the storage provider can't read the
content) needs streaming AES-GCM and is a much bigger lift. Tracked
as "backlog #4b" in the comment for whoever picks it up next.

(No analytic call yet on the storage side because StorageEvents
doesn't have a fileUploaded() event — the upload UI is unbuilt, so
adding the analytic event is up to whoever lands the UI.)

Pre-existing TS error on line 46 of images.svelte.ts (the
`toggleField(imageTable(), ...)` Drizzle/Dexie type variance bug)
is unchanged — it predates Phase 9 and is not introduced by this
commit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:52:20 +02:00
Till JS
05ae348b12 fix(macmini): blackbox-exporter uses 1.1.1.1/8.8.8.8 directly for DNS
Docker's embedded DNS resolver (127.0.0.11) forwards to the host
resolver, which on the Mac Mini forwards to the home router's
FRITZ!Box DNS. The router keeps a stale negative cache for hours
after a hostname first fails, so any newly added Cloudflare CNAME
(e.g. the GPU public hostnames recreated via the Cloudflare dashboard
during the 2026-04-07 cleanup) appears as "no such host" to the
blackbox probes for the entire negative-cache TTL — even though the
hostname resolves fine via 1.1.1.1 directly the entire time.

Symptom before the fix:
  health-check.sh (uses dig @1.1.1.1)  → All services healthy
  status.mana.how (via blackbox/VM)    → 4 GPU services down

The two views contradicted each other: the public-facing status
page reported four healthy services as down while the operator
runbook reported them as up. Confusing, and exactly the kind of
monitoring discrepancy a launch should not ship with.

Fix: pin the blackbox container to public DNS (Cloudflare + Google)
in compose. Blackbox now resolves directly against 1.1.1.1, bypassing
the home-router negative cache entirely. After the recreate the four
GPU probes flipped from probe_success=0 to probe_success=1 within
one scrape interval, and status.mana.how went from 38/42 to 41/42
(only gpu-video remains down — LTX Video Gen is intentionally not
deployed on the Windows GPU box yet).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:47:57 +02:00
Till JS
24001e9545 feat(vault): rotate recovery code while zero-knowledge is active
Closes backlog #2 from the Phase 9 audit. Lets a user replace their
recovery code without going through the disable→generate→re-enable
dance. Works in BOTH standard and zero-knowledge modes.

vault-client
------------
New rotateRecoveryCode() method on the VaultClient interface.
Returns RecoveryCodeSetupResult, identical shape to setupRecoveryCode.

Branches on the current vault state via getStatus():

  Standard mode:
    Re-fetches the plaintext MK from the server (same path as the
    initial setupRecoveryCode), generates a fresh 32-byte recovery
    secret, derives the new wrap key via HKDF, seals the MK, posts
    the wrap to /recovery-wrap (idempotent server-side, replaces
    the existing row in place).

  Zero-knowledge mode:
    Server can't hand out the plaintext MK any more, so we use the
    cachedUnwrappedMkBytes that unlockWithRecoveryCode stashed when
    the user typed in their old recovery code earlier this session.
    Throws with a clear message if the cache is empty (e.g. user
    landed on the page via init rather than recovery-unlock):
    "sign out and back in with your current recovery code first"
    so the cache gets repopulated.

Both branches:
  - Wipe the raw MK reference after sealing
  - Wipe the recovery secret after format
  - Return the formatted code for the UI to display

The OLD recovery code is now permanently invalid. Using it on a
future unlock attempt will fail with the standard generic
"wrong recovery code" error.

Settings UI
-----------
New rotateStep state machine ('idle' / 'rotated') runs alongside
the existing zkSetupStep so the user can rotate without leaving the
active-state UI.

In the active-mode card (zkSetupStep === 'enabled'):
  - Two side-by-side buttons:
    "🔁 Recovery-Code rotieren" + "Zero-Knowledge-Modus wieder deaktivieren …"
  - When the user clicks rotate, handleRotateRecoveryCode() runs the
    flow and renders an inline "Neuer Recovery-Code" subsection
    (same .recovery-code monospace block + Copy button as the
    initial setup) with explicit warning that the old code is now
    invalid.
  - "Ich habe den neuen Code gesichert" button wipes the displayed
    code and drops back to idle.
  - The disable flow stays available (the rotate UI hides itself
    when the user has clicked into the disable confirmation path).

The 28 vault integration tests still pass (39 total in
encryption-vault/, including the existing 11 KEK tests). The new
rotateRecoveryCode method reuses the already-tested
setRecoveryWrap server endpoint, so no new server-side tests are
needed for this commit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:43:10 +02:00
Till JS
c2c960121e test(mana-auth): vault service integration tests against real postgres
Closes backlog #1 from the Phase 9 audit. Adds 28 integration tests
for the EncryptionVaultService against a real Postgres so the
RLS policies, CHECK constraints and audit-row writes are exercised
as the production app actually sees them. The pure-crypto KEK tests
in kek.test.ts already covered the wrap/unwrap primitives — this
new file fills in the service-shaped gaps that need a real DB.

Test infrastructure
-------------------
- Reads TEST_DATABASE_URL from env. Whole suite is SKIPPED via
  describe.skip if unset, so unrelated CI runs and `bun test` from
  a fresh checkout don't fail on missing connection. The
  encryption-vault sub-job has to provision a Postgres explicitly.
- Schema is assumed already migrated (run `pnpm db:push` or apply
  sql/002 + sql/003 manually before invoking the suite). Tests
  insert a fresh test user per case via beforeEach so cross-test
  pollution is impossible despite the FK to auth.users.
- afterAll cleans up the user (CASCADE wipes vault + audit) and
  closes the postgres pool so bun test exits cleanly.
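
The env-gating decision, extracted as a standalone helper for illustration
(the real suite does this inline with bun:test's describe/describe.skip):

```typescript
// Decide whether the DB-backed integration suite runs: skipped entirely when
// no TEST_DATABASE_URL is provided, so a fresh checkout's `bun test` stays
// green without a provisioned Postgres.
function dbSuiteMode(env: Record<string, string | undefined>): "run" | "skip" {
  return env.TEST_DATABASE_URL ? "run" : "skip";
}

// In the test file itself the pattern is roughly:
//   const describeDb = process.env.TEST_DATABASE_URL ? describe : describe.skip;
//   describeDb("EncryptionVaultService", () => { /* ...28 cases... */ });
```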

Coverage
--------
init (3):
  - Mints a fresh vault, wrapped_mk + wrap_iv populated, ZK off
  - Idempotent (returns same key)
  - Audit rows are written

getStatus (5):
  - vaultExists=false for unconfigured user
  - vaultExists=true after init, no recovery wrap
  - hasRecoveryWrap=true after setRecoveryWrap
  - zeroKnowledge=true after enableZK
  - Does NOT write an audit row (cheap metadata read)

setRecoveryWrap (4):
  - Stores wrap on existing vault
  - VaultNotFoundError on missing vault
  - Idempotent (replaces previous wrap)
  - Writes recovery_set audit row

clearRecoveryWrap (3):
  - Removes the wrap
  - ZeroKnowledgeActiveError when ZK is on
  - VaultNotFoundError on missing vault

enableZeroKnowledge (4):
  - Flips zero_knowledge=true and NULLs out wrapped_mk + wrap_iv
  - RecoveryWrapMissingError if no recovery wrap is set
  - Idempotent (already-on is no-op)
  - VaultNotFoundError on missing vault

disableZeroKnowledge (2):
  - Restores wrapped_mk from a client-supplied master key,
    verifies the round-trip via getMasterKey returns the same bytes
  - No-op when ZK is already off

getMasterKey (3):
  - Returns unwrapped MK in standard mode
  - Returns recovery blob with requiresRecoveryCode=true in ZK mode
  - VaultNotFoundError on missing vault

rotate (2):
  - Mints fresh MK and wipes any existing recovery wrap
  - ZeroKnowledgeRotateForbidden in ZK mode

DB-level invariants (2):
  - Setting wrapped_mk back while ZK active is rejected by
    encryption_vaults_zk_consistency
  - Setting wrap_iv to NULL while wrapped_mk is set is rejected
    by encryption_vaults_wrap_iv_pair
  Both wrap the Drizzle update in an arrow IIFE so
  expect(...).rejects.toThrow() sees a real Promise (Drizzle's
  chainable update() only executes on await/then).
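
Why the arrow IIFE matters can be shown with a minimal stand-in for the lazy
builder (illustrative only — this is not the Drizzle API, just the laziness
that matters for the assertion):

```typescript
// A thenable that only "executes" when something calls .then() — the same
// deferred behaviour as Drizzle's chainable update().
function lazyUpdate(shouldFail: boolean) {
  return {
    then(onFulfilled: (v: string) => void, onRejected: (e: Error) => void) {
      shouldFail
        ? onRejected(new Error("violates encryption_vaults_zk_consistency"))
        : onFulfilled("ok");
    },
  };
}

// Awaiting the thenable inside an async arrow IIFE both triggers execution
// and converts the rejection into a genuine Promise rejection — which is
// what expect(...).rejects.toThrow() needs to observe.
const asRealPromise = (shouldFail: boolean) =>
  (async () => await lazyUpdate(shouldFail))();
```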

Run results
-----------
With TEST_DATABASE_URL set + schema migrated:
  28 pass, 0 fail, 64 expect() calls

Without TEST_DATABASE_URL set (default):
  0 pass, 30 skip (full suite cleanly skipped)
  KEK tests in kek.test.ts still run unaffected.

Drive-by: kek.test.ts header comment updated to point at the new
sibling file instead of saying "tests will live alongside mana-sync"
(which was outdated speculation from Phase 2).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:39:48 +02:00
Till JS
ea165c8b46 docs(audit): roll up Phase 9 in DATA_LAYER_AUDIT.md
Marks the Zero-Knowledge opt-in as live and documents the new
architecture surface so future readers can understand the trust
model without spelunking through six commits.

Updates
-------
1. Sprint table grows from Phase 1–8 to Phase 1–9, adds the six new
   commits (4 milestones + 2 follow-ups: status endpoint + lock-screen
   modal). Test count bumped from 262 to 284 (22 new in recovery.test.ts).

2. Section 5 "Encryption Pipeline" reworked:
   - "Wer hält was?" now has TWO tables — Standard-Modus and
     Zero-Knowledge-Modus — making the trust model difference explicit
   - New "Recovery-Code-Pipeline" subsection with two ASCII flow
     diagrams (setup + unlock) showing every step from "user clicks
     button" to "MK in MemoryKeyProvider"
   - New "Schlüssel- + Datei-Kette für Phase 9" table mapping each
     code path to its file

3. "Was Mana technisch (nicht) sehen kann" rewritten to compare both
   modes side by side. Standard mode keeps the existing
   "theoretically decryptable by KEK operator" disclosure;
   zero-knowledge mode is upgraded to a hard "computationally
   incapable" guarantee — and the trade-off ("Recovery-Code lost =
   data lost") is called out explicitly. The DB CHECK constraint
   that enforces "ZK active ⇒ recovery wrap exists" is mentioned as
   the schema-level safety net.

4. Backlog reordered. Phase 9 is no longer listed as an open item;
   the only true-zero-knowledge follow-up is now item #1 (service
   tests against real Postgres for the four new vault methods,
   analogous to the existing kek.test.ts pattern but needing a
   container DB). Items 2–8 are unchanged from the previous
   roundup.

5. Eckdaten + Best Practices + final production-grade summary all
   reflect the new ZK opt-in. Schwachstelle #4 row updated to
   "Phase 1–9".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:28:06 +02:00
Till JS
a48b2d5841 feat(layout): lock-screen recovery code unlock modal
Closes the second Phase 9 follow-up. When a user has zero-knowledge
mode active and signs in on a new device (or after a session expiry),
the layout's vault-unlock effect lands in the new
'awaiting-recovery-code' state. Previously this was a dead end —
the layout just logged a warning and the rest of the app sat with a
locked vault.

This commit adds the missing UI piece: a non-dismissable modal that
mounts whenever the unlock effect signals 'awaiting-recovery-code'.

RecoveryCodeUnlockModal component
---------------------------------
  - Reads the singleton vault client via getVaultClient()
  - Single text input + submit button
  - On submit:
    1. Calls vaultClient.unlockWithRecoveryCode(input)
    2. On success: clears input, calls onUnlocked() prop → parent
       hides the modal, app boots normally
    3. On RecoveryCodeFormatError: shows a format hint
    4. On any other error (wrong code OR corrupted blob — surfaced
       uniformly so an attacker can't distinguish): shows
       "Recovery-Code falsch, prüfe deine Eingabe"
  - Non-dismissable: there's no Cancel button. Without the recovery
    code the app cannot read encrypted data and would just sit in a
    half-broken state. The user can sign out from the header (the
    auth flow runs above the encryption layer) if they need to bail.
  - Help text at the bottom is honest about the irreversible nature
    of losing the recovery code.

Layout integration
------------------
+layout.svelte:
  - Imports the modal
  - New `needsRecoveryCode = $state(false)` flag
  - The vault-unlock effect now switches on three branches instead
    of just success/failure:
      'unlocked'                → needsRecoveryCode = false
      'awaiting-recovery-code'  → needsRecoveryCode = true (mount modal)
      anything else             → console.warn (unchanged)
  - Logout path also resets needsRecoveryCode so the modal doesn't
    leak across sessions
  - {#if needsRecoveryCode} mounts the component at the bottom of
    the markup (above the existing global toasts and banners)
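
The three-branch switch, extracted as a pure function for illustration
(state names are from this commit; the real code lives inline in the
layout's vault-unlock effect):

```typescript
// Map the vault-unlock outcome to the modal flag: only the
// 'awaiting-recovery-code' state mounts the modal, anything unexpected
// just warns (unchanged behaviour from before this commit).
function needsRecoveryCodeFor(
  unlockState: string,
  warn: (msg: string) => void = console.warn,
): boolean {
  switch (unlockState) {
    case "unlocked":
      return false;
    case "awaiting-recovery-code":
      return true; // mount RecoveryCodeUnlockModal
    default:
      warn(`vault unlock in unexpected state: ${unlockState}`);
      return false;
  }
}
```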

The autofocus warning is suppressed via svelte-ignore — the input
needs immediate focus because it's the only thing the user can
interact with on this surface, and screen-reader users will hear
the modal's accessible name from the role="dialog" + aria-labelledby
binding.

End-to-end smoke flow that now works:
  1. User goes to /settings/security on Device A, enables ZK
  2. User signs out, signs back in on Device B
  3. Layout effect calls vaultClient.unlock() → server returns
     recovery blob → vaultClient state goes to awaiting-recovery-code
  4. Modal mounts, user pastes their recovery code from password
     manager
  5. unlockWithRecoveryCode runs the inline AES-GCM unwrap, imports
     the MK as non-extractable, caches the bytes for a future
     disable, transitions to 'unlocked'
  6. Modal calls onUnlocked → layout dismisses modal → rest of the
     app boots and renders decrypted data

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:24:32 +02:00
Till JS
78d949d051 feat(crypto): vault status endpoint + settings page hydration
Closes the Phase 9 Milestone 4 known limitation where the settings
page always started in 'idle' state regardless of whether the user
had already enabled zero-knowledge mode. Adds a cheap server-side
status read + hydrates the page on mount.

Server side
-----------
New VaultStatus interface and getStatus(userId) method on
EncryptionVaultService — single SELECT against encryption_vaults,
no decryption, no audit logging (this gets called on every settings
page mount and we don't want to flood the audit log with read-only
metadata fetches). Returns sane defaults when the vault row doesn't
exist yet so the client can avoid a 404 dance.

  GET /api/v1/me/encryption-vault/status →
  {
    vaultExists: boolean,
    hasRecoveryWrap: boolean,
    zeroKnowledge: boolean,
    recoverySetAt: string | null
  }

Client side
-----------
vault-client.ts gains a `getStatus()` method that bypasses the
fetchVault retry helper (status reads should be cheap and one-shot;
if they fail we let the caller fall back to defaults). Re-exports
VaultStatus + RecoveryCodeSetupResult from the crypto barrel.
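
The one-shot, fail-quiet read can be sketched like this (the response shape
is from the endpoint above; the injected fetch helper is an assumption for
the sketch):

```typescript
interface VaultStatus {
  vaultExists: boolean;
  hasRecoveryWrap: boolean;
  zeroKnowledge: boolean;
  recoverySetAt: string | null;
}

// Sane defaults so the settings page can avoid a 404 dance and simply
// render the idle setup flow when the status read fails.
const DEFAULT_STATUS: VaultStatus = {
  vaultExists: false,
  hasRecoveryWrap: false,
  zeroKnowledge: false,
  recoverySetAt: null,
};

async function getStatus(fetchJson: () => Promise<VaultStatus>): Promise<VaultStatus> {
  try {
    // One shot, no retry helper: a cheap metadata read on every settings
    // page mount shouldn't hammer a struggling server.
    return await fetchJson();
  } catch {
    // Fail quiet: the caller falls back to the idle default; real errors
    // surface on the mutating calls.
    return DEFAULT_STATUS;
  }
}
```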

settings/security/+page.svelte
------------------------------
onMount kicks off a getStatus() call. Two things change based on
the response:

  1. If the server says zero_knowledge=true, jump zkSetupStep to
     'enabled' so the page renders the active-state UI directly
     instead of the setup flow.

  2. New `hasRecoveryWrap` state tracks whether a wrap is stored,
     even if ZK isn't active yet. The idle branch now has TWO
     variants:

     - hasRecoveryWrap=false: original "Recovery-Code einrichten"
       single button (unchanged from milestone 4)

     - hasRecoveryWrap=true:  amber notice "you have a code stored
       but ZK isn't active" with three buttons:
       * "Zero-Knowledge jetzt aktivieren" (jumps straight to the
         enable call)
       * "Neuen Recovery-Code generieren" (rotates the wrap)
       * "Recovery-Code entfernen" (with two-click confirmation,
         calls DELETE /recovery-wrap)

This handles the previously-orphaned state where a user generated a
code, copied it to their password manager, but never confirmed the
final activation step. Without this branch, after a reload the
settings page would show "Setup" again and the call would fail
with "vault is already in zero-knowledge mode" — except it wouldn't,
because the vault wasn't actually in ZK yet, just had a recovery wrap
stored. Either way the state was confusing.

handleSetupRecoveryCode + handleClearRecoveryCode now keep
hasRecoveryWrap in sync after the round trip.

Fail-quiet on getStatus error: if the network/auth/server-side fetch
fails, the page stays at the idle default. The user can still run
the setup flow, and any inconsistencies surface via the usual
server-side error responses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:19:49 +02:00
Till JS
56312ff579 feat(settings): phase 9 milestone 4 — zero-knowledge UI section
Adds the user-facing setup + management surface for the Phase 9
recovery code + zero-knowledge opt-in. Lives in
/settings/security between the Rotate and Honest-disclosure cards.

Three-step setup flow
---------------------
Step 1 — Generate
  Single button "Recovery-Code einrichten". Disabled unless the
  vault is currently unlocked. Clicks call vaultClient.setupRecoveryCode()
  which mints a fresh 32-byte secret, derives the wrap key, posts
  the sealed wrap to /recovery-wrap, and returns the formatted code.

Step 2 — Display + copy
  Shows the formatted code (1A2B-3C4D-...) in a monospace, user-
  selectable block with a 📋 Copy button. Explicit warning: "Wir
  zeigen ihn dir nur ein einziges Mal." User clicks "Ich habe den
  Code gesichert" to advance.

Step 3 — Confirm
  User has to type (or paste) the code back into a verification
  input. Comparison is case-insensitive and ignores dashes/whitespace
  on both sides so format jitter doesn't punish them. Mismatch shows
  a clear inline error and stays in the same step.
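
The comparison described above, as a small pure helper (the helper name is
illustrative):

```typescript
// Case-insensitive match that ignores dashes and whitespace on both sides,
// so format jitter in the pasted code doesn't punish the user.
function codesMatch(shown: string, typed: string): boolean {
  const norm = (s: string) => s.replace(/[-\s]/g, "").toUpperCase();
  return norm(shown) === norm(typed);
}
```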

Step 4 — Activate
  Final danger confirmation: "Wenn du jetzt aktivierst, löscht der
  Server seine Kopie deines Schlüssels." Click → vaultClient.
  enableZeroKnowledge() → server NULLs out wrapped_mk + wrap_iv,
  state flips to 'enabled', generatedCode is wiped from the closure.

Active state
------------
After enable, the section shows a green "Zero-Knowledge-Modus
aktiv" panel with a "Disable" button. Disabling needs an unlocked
vault (the cached MK bytes from the recovery-code unlock get sent
back to the server for KEK re-wrapping). Two-click confirmation
guards the destructive call.

State machine
-------------
zkSetupStep: 'idle' → 'generated' → 'confirming' → 'enabling' → 'enabled'
plus a `handleResetSetup` escape that clears the in-flight code +
input + error and drops back to 'idle' from any step.

Known limitation: the page state doesn't survive a reload — there
is no GET /encryption-vault/status endpoint yet to query the
server's current zero_knowledge flag, so on a fresh page load we
always start at 'idle' regardless of whether ZK is actually on.
A future commit will add the status endpoint + an onMount call to
hydrate zkSetupStep correctly. For now, the existing
'awaiting-recovery-code' badge from milestone 3 covers the lock-
screen path, and the dashboard sets the right initial state at
unlock time.

Status badge fix from milestone 3 (statusBadge() handling the new
'awaiting-recovery-code' variant) is reused here.

Styles
------
.zk-error      — light red bordered alert for inline errors
.zk-actions    — flex row of buttons (wraps on mobile)
.zk-step       — bordered group with the step heading
.recovery-code — monospace, user-select:all so click+copy works
.recovery-input — monospace input for the confirm step
.btn-ghost     — transparent border-less variant for "Abbrechen"

Dark-mode handling for the new surfaces is in the existing media
query block.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:03:35 +02:00