Commit graph

2635 commits

Author SHA1 Message Date
Till JS
079cc39dbc refactor(mana/web): extract shared <VoiceCaptureBar> for module voice capture
Dreams and Memoro had two literal copies of the MediaRecorder boilerplate
plus parallel mic-button markup, error UI, and requireAuth gating. Lift
the recorder + bar into $lib/components/voice and add it to the memoro
workbench ListView (which had no mic at all). New voice-capture features
just drop in <VoiceCaptureBar> with idleLabel/feature/reason/onComplete.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:51:22 +02:00
Till JS
0d1d3b9449 fix(mana-auth): declare missing nanoid dependency
mana-auth has been crash-looping in production with:

    error: Cannot find package 'nanoid' from
    '/app/src/services/encryption-vault/index.ts'

The encryption-vault service imports nanoid for audit row IDs (line 27,
used at line 547 in the audit log writer), but nanoid was never added
to services/mana-auth/package.json. The import was introduced in commit
e9915428c (phase 2 — server-side master key custody) and slipped past
because nanoid happens to exist transitively in the workspace via
postcss → nanoid@3.3.11. Local pnpm store lookups would resolve it just
fine; a strict isolated container build can't.

Fix:
- Add "nanoid": "^5.0.0" to services/mana-auth/package.json deps
- pnpm install pulled nanoid@5.1.7 into services/mana-auth/node_modules

Verified the import resolves locally:
    bun -e 'import { nanoid } from "nanoid"; console.log(nanoid())'
    → ok: 6TLuTWlenhC0KnSESn5Ex

The Mac Mini still needs to redeploy mana-auth (rebuild image with the
new lockfile, restart container) to pick this up — production is
currently 502ing on auth.mana.how.
2026-04-08 15:50:14 +02:00
Till JS
f5678268ff chore(deps): reconcile pnpm-lock with package.json drift
The lockfile had drifted out of sync with two package.json files:

- services/mana-events/package.json declared drizzle-orm, hono, jose,
  postgres, zod, drizzle-kit, typescript — but mana-events was never
  registered as an importer in pnpm-lock.yaml at all. A frozen-lockfile
  install would fail.
- apps/mana/apps/web/package.json had "postgres": "^3.4.9" as a
  devDependency that the lockfile hadn't picked up.

Both are already declared in their package.json — this commit just
locks them in. No new top-level dependencies are introduced.

The rest of the diff is non-substantive churn from running pnpm install
(jiti peer-version flips between 1.21.7 ↔ 2.6.1, expo-font peer
specifier format becoming more explicit). Net diff is −102 lines
despite registering two new importers, because the more explicit
peer-specifier format lets pnpm deduplicate a few entries.
2026-04-08 15:41:14 +02:00
Till JS
45958ad885 feat(mana/web): global requireAuth() gate for guest-blocked features
The unified Mana app runs most modules in a "guest mode": you can
open a module, look around, type a quick note, etc. without an
account. But anything that touches an *encrypted* table (dreams
voice capture, memoro recordings, notes, todo, calendar events, …)
needs the user to be logged in — the encryption vault only unlocks
against a Mana Auth session, and writing to those tables without
it throws `VaultLockedError` at the very last step of the action.

Before this commit, every entry point into an encryption-required
action would silently let the guest go through the whole flow
(record audio, wait for transcription, open the dexie write) and
then explode with a stack-trace error. The user lost work and
didn't know why. The dreams voice capture flow surfaced this
during the 2026-04-08 STT debugging session.

The fix is a global imperative gate: `requireAuth({ feature, reason })`.
Call sites await it before the action; it returns immediately if the
user is already authenticated, otherwise pops a global modal that
asks the guest to log in or cancel. Promise-based, so callers
decide what to do with `false` (silent abort, restore state, own
toast).

  $lib/auth/require-auth.svelte.ts          new — store + helper
  $lib/components/auth/AuthRequiredModal.svelte  new — global modal
  routes/+layout.svelte                     mount the modal once
  packages/shared-utils/src/analytics.ts    new ManaEvents.featureBlockedByAuth
                                            event for conversion tracking

Wired into the two voice-capture entry points that actually exhibited
the bug:

  modules/dreams/ListView.svelte  → feature: 'dreams-voice-capture'
  routes/(app)/memoro/+page.svelte → feature: 'memoro-voice-capture'

Both gate on `requireAuth()` BEFORE the mic permission request, so
guests see the friendly "Konto erforderlich" ("account required")
modal instead of
recording → transcribing → crashing.

Design choices documented in detail in the require-auth.svelte.ts
header comment:
  - Imperative function (not a button wrapper component) so it
    works in event handlers, store actions, keyboard shortcuts,
    drag-drop handlers — anywhere async code runs.
  - Single global modal mounted once in the root layout, no
    portal/z-index gymnastics; two simultaneous prompts replace
    each other (the most recent one wins).
  - Checks `authStore.isAuthenticated`, not vault-unlocked state —
    the user-facing concept is "I need an account", not "I need
    a working encryption vault". Vault-unlock failures (network
    error etc.) are a separate bug class with their own UX.
  - The modal navigates to `/login?next=<current path>` so the
    user lands back on the same page after logging in. The
    Promise resolves `false` on navigation; the user re-clicks
    the original button after coming back, and the second click
    sees `isAuthenticated === true` and proceeds without a modal.
    Re-triggering the original action across a navigation cycle
    would require restoring half-recorded mic state — not worth
    the complexity, and the second click is a clean UX.
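
A framework-free sketch of the "single pending prompt, most recent wins"
mechanism (all names and the state shape here are assumptions; the real
store is a Svelte 5 $state store in require-auth.svelte.ts):

```typescript
// Hypothetical sketch: `authenticated` stands in for
// authStore.isAuthenticated; `settlePrompt` stands in for the modal's
// log-in / cancel handlers.
type Resolve = (ok: boolean) => void;

let authenticated = false;
let pending: Resolve | null = null;

function requireAuth(): Promise<boolean> {
  if (authenticated) return Promise.resolve(true); // fast path: no modal
  // A newer prompt replaces the older one: the superseded caller
  // resolves `false` and silently aborts its action.
  if (pending) pending(false);
  return new Promise<boolean>((resolve) => {
    pending = resolve;
  });
}

// Called by the modal's "log in" / "cancel" buttons.
function settlePrompt(ok: boolean): void {
  if (pending) pending(ok);
  pending = null;
}
```

Because the caller just awaits a boolean, any async call site (event
handler, store action, keyboard shortcut) can gate itself without knowing
the modal exists.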

How to wire a new entry point (4 lines):

    import { requireAuth } from '$lib/auth/require-auth.svelte';

    async function handleCreateThing() {
      const ok = await requireAuth({
        feature: 'create-thing',
        reason: 'Things werden verschlüsselt gespeichert. Dafür brauchst du ein Mana-Konto.',
      });
      if (!ok) return;
      // ...existing logic
    }

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:36:38 +02:00
Till JS
2b4494628e fix(mana/web): unblock voice capture — permissions policy, notification mount, dev SW
Three independent bugs conspired to make the dreams + memoro mic
buttons completely unusable in production AND in dev. Any one of them
alone would have blocked the feature; they layered on top of each
other, so fixing the top one just exposed the next.

1. Permissions-Policy header blocked the microphone API entirely.
   `packages/shared-utils/src/security-headers.ts` set
   `microphone=()` which means "no origin, including self, may use
   the microphone". `getUserMedia()` throws a `Permissions policy
   violation` and the browser never even shows the permission
   dialog — no amount of OS / browser / site settings can override
   it because the policy blocks the API at the document level.
   Fix: change to `microphone=(self)` so mana.how itself can use
   the API. Camera stays disallowed (no module needs it).

2. Notification permission was requested at layout mount time.
   `(app)/+layout.svelte` called
   `notificationService.requestPermission()` from `onMount()`. Modern
   browsers require permission requests to come from a user gesture
   — calling it without one queues the prompt until the next click.
   That meant the user's FIRST click on any button (in this case the
   dreams "Traum sprechen" ("speak a dream") mic button) showed the
   queued notifications
   prompt instead of the action they actually clicked. Worse,
   `getUserMedia()` was then silently dropped because Chrome only
   shows one permission dialog at a time.
   Fix: remove the mount-time call entirely. Notification permission
   must be requested from a button the user explicitly clicks
   ("Benachrichtigungen aktivieren" / "enable notifications" toggle in
   Settings or the first time
   a reminder is created) — the reminder scheduler still runs without
   permission, it just won't fire OS notifications until granted.

3. vite-plugin-pwa registered a service worker in dev that cached
   the old layout chunks across reloads, so the fix for #2 was
   invisible until the user manually unregistered the SW in DevTools.
   `vite-plugin-pwa` defaults `devEnabled: true`, which is a
   well-known footgun for fast iteration. Production still gets the
   full SW (this only flips dev). The 2026-04-08 mic-button hunt
   took an extra hour for exactly this reason.
   Fix: pass `devEnabled: false` to createPWAConfig in vite.config.ts.
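
The Permissions-Policy change in #1 boils down to one allowlist token
(a sketch; the real value lives in
packages/shared-utils/src/security-headers.ts and the helper name here
is an assumption):

```typescript
// Hypothetical sketch of the header value before/after the fix.
function permissionsPolicy(allowMic: boolean): string {
  // 'microphone=()' is an EMPTY allowlist: no origin, including self,
  // may call getUserMedia(), and no permission dialog ever appears.
  // 'microphone=(self)' allows only the document's own origin.
  const mic = allowMic ? 'microphone=(self)' : 'microphone=()';
  return `${mic}, camera=()`; // camera stays disallowed for every origin
}
```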

Verified: in a fresh incognito tab on `localhost:5173/`, opening the
Dreams app in the workbench and clicking the mic button now shows the
microphone permission dialog directly (no notifications hijack), and
recording → transcription works end-to-end against the production
mana-stt service on the GPU box.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 15:36:03 +02:00
Till JS
4cb1bc1827 fix(mana-voice-bot): move default port 3050 → 3024 + Windows GPU deployment notes
mana-voice-bot's source default was 3050, which collided with mana-sync.
Today the collision is latent (voice-bot isn't deployed anywhere), but
sooner or later someone is going to start it on a host that's already
running mana-sync and the second one will refuse to bind. Moving to
3024 puts it inside the AI/ML port range alongside its dependencies
(stt 3020, tts 3022, image-gen 3023, llm 3025) and away from sync.

Updated:
- app/main.py — PORT default 3050 → 3024
- start.sh, setup.sh — same fix in the example commands
- CLAUDE.md — full rewrite. Old version described "Mac Mini deployment"
  with launchd; the new version explicitly says "not deployed yet" and
  documents the seven concrete steps to deploy on the Windows GPU box
  alongside the other AI services (Scheduled Task, service.pyw, .env,
  firewall rule, cloudflared route, WINDOWS_GPU_SERVER_SETUP.md update).

docs/WINDOWS_GPU_SERVER_SETUP.md:
- Added the missing ManaVideoGen scheduled task to all four
  Start-ScheduledTask snippets — video-gen has been running on the
  Windows GPU but the doc had never picked it up.
- Added a "mana-video-gen (Port 3026)" service section parallel to the
  existing image-gen one, with venv path, repo pointer, model, etc.
- Added a repo-counterparts table mapping C:\mana\services\<svc>\ to the
  corresponding services/<svc>/ directory in the repo, plus a note that
  changes should flow repo→Windows, not the other way around.

docs/PORT_SCHEMA.md:
- Reconciled the warning block with the post-cleanup reality: no more
  active or latent port collisions (image-gen ↔ video-gen and
  voice-bot ↔ sync are both resolved). Listed the actual ports per host
  with public URLs. Kept the planned-vs-actual disclaimer for the
  services that still don't match the aspirational ranges (mana-credits
  3061 vs planned 3002, etc).
2026-04-08 13:14:57 +02:00
Till JS
f4347032ca chore(mac-mini): remove all AI service infrastructure (moved to Windows GPU)
The Mac Mini hasn't run mana-llm/stt/tts/image-gen for a while — those
services live on the Windows GPU server now. The Mac-targeted
installers, plists, and platform-checking setup scripts have been
sitting in the repo as cargo-cult leftovers, suggesting Mac Mini
deployment is
still a real option. It isn't.

Removed (Mac-Mini deployment infrastructure):

services/mana-stt/
- com.mana.mana-stt.plist            (LaunchAgent)
- com.mana.vllm-voxtral.plist        (LaunchAgent for the abandoned local Voxtral experiment)
- install-service.sh                 (single-service launchd installer)
- install-services.sh                (mana-stt + vllm-voxtral installer)
- setup.sh                           (Mac arm64 installer)
- scripts/setup-vllm.sh              (vLLM-Voxtral setup)
- scripts/start-vllm-voxtral.sh

services/mana-tts/
- com.mana.mana-tts.plist
- install-service.sh
- setup.sh                           (Mac arm64 installer)

scripts/mac-mini/
- setup-image-gen.sh                 (Mac flux2.c launchd installer)
- setup-stt.sh
- setup-tts.sh
- launchd/com.mana.image-gen.plist
- launchd/com.mana.mana-stt.plist
- launchd/com.mana.mana-tts.plist

setup-tts-bot.sh stays — it's the Matrix TTS bot installer (Synapse
side), not the mana-tts service.

Updated:
- services/mana-stt/CLAUDE.md, README.md — fully rewritten for the
  Windows GPU reality (CUDA WhisperX, Scheduled Task ManaSTT, .env keys
  matching the actual production .env on the box)
- services/mana-tts/CLAUDE.md, README.md — same treatment, documenting
  Kokoro/Piper/F5-TTS on the Windows GPU under Scheduled Task ManaTTS
- scripts/mac-mini/README.md — dropped the STT setup section, replaced
  with a pointer to docs/WINDOWS_GPU_SERVER_SETUP.md and the per-service
  CLAUDE.md files
- docs/MAC_MINI_SERVER.md — expanded the "deactivated launchagents"
  list to mention the now-removed plists, added the full GPU service
  port table with public URLs, added a cleanup snippet for any old plists
  still installed on a Mac Mini somewhere
2026-04-08 13:06:40 +02:00
Till JS
c7b4388cec feat(mana-image-gen): replace Mac flux2.c implementation with Windows GPU diffusers
The repo's mana-image-gen used to be a Mac Mini–only service built on
flux2.c with hard MPS+arm64 platform checks. The actual production
image-gen runs on the Windows GPU server (RTX 3090) using HuggingFace
diffusers + PyTorch CUDA + FLUX.1-schnell — completely different code
that lived only at C:\mana\services\mana-image-gen\ on the GPU box.

This commit pulls the Windows implementation into the repo and deletes
the Mac one, so there's exactly one mana-image-gen and its source of
truth is git rather than one folder on one machine.

Removed:
- setup.sh — Mac-only flux2.c installer with hard arm64 platform check
- app/main.py (Mac flux2.c subprocess wrapper version)
- app/flux_service.py (Mac flux2.c subprocess wrapper version)

Added (pulled from C:\mana\services\mana-image-gen\):
- app/main.py — FastAPI endpoints (/generate, /images/*, /cleanup)
- app/flux_service.py — diffusers FluxPipeline wrapper
- app/api_auth.py — ApiKeyMiddleware (GPU_API_KEY)
- app/vram_manager.py — shared VRAM accounting
- service.pyw — Windows runner used by the ManaImageGen scheduled task

Updated:
- main.py PORT default from 3025 → 3023 to match the production reality
  (the service.pyw runner already binds 3023 explicitly via uvicorn.run,
  but the source default should match so direct uvicorn invocations and
  local tests don't pick the wrong port)
- CLAUDE.md fully rewritten to describe the Windows/CUDA/diffusers stack
- README.md trimmed to a pointer at CLAUDE.md + the public URL
- .env.example written from scratch (didn't exist before — the service's
  .env on the GPU box was undocumented)

The setup-image-gen.sh launchd installer in scripts/mac-mini/ and the
actual Mac Mini deployment will be cleaned up in the next commit, along
with the rest of the Mac-Mini AI service infrastructure.
2026-04-08 13:02:42 +02:00
Till JS
b8e18b7f82 chore(ai-services): adopt Windows GPU as source of truth for llm/stt/tts
The Windows GPU server has been the actual production home for these
services for some time, and the running code there has drifted ahead of
the repo. This sync pulls the live versions back into the repo so the
Windows box is no longer the only place those changes exist.

Pulled from C:\mana\services\* on mana-server-gpu (192.168.178.11):

mana-llm:
- src/main.py, src/config.py — small fixes (auth wiring, config tweaks)
- src/api_auth.py — NEW (cross-service GPU_API_KEY validator)
- service.pyw — Windows runner used by the ManaLLM scheduled task
  (sets up logging redirect, loads .env, calls uvicorn)

mana-stt:
- app/main.py — substantial cleanup (684→392 lines), drops the
  whisperx-as-separate-backend branching now that whisper_service.py
  rolls whisperx in directly
- app/whisper_service.py — full CUDA + whisperx rewrite (158→358 lines)
- app/auth.py + external_auth.py — significantly expanded auth
- app/vram_manager.py — NEW (shared VRAM accounting helper)
- service.pyw — Windows runner with CUDA pre-init, FFmpeg PATH
  injection, .env loading
- removed: app/whisper_service_cuda.py (folded into whisper_service.py)
- removed: app/whisperx_service.py (folded into whisper_service.py)

mana-tts:
- app/auth.py, external_auth.py — same auth expansion as stt
- app/f5_service.py, kokoro_service.py — Windows tweaks
- app/vram_manager.py — NEW (same shared helper as stt)
- service.pyw — Windows runner

mana-video-gen:
- service.pyw — Windows runner (no other changes; the .py code on the
  GPU box is byte-identical to what's already in the repo)

The service.pyw files contain absolute Windows paths
(C:\mana\services\<svc>) and a hardcoded FFmpeg PATH for the tills user
profile. Kept as-is intentionally — they exist to be deployed to that
one machine and any abstraction layer would just hide what's actually
happening. Anyone redeploying to a different layout will need to edit
the path strings, which is a known and obvious change.

Mac-Mini infrastructure for these services (launchd plists, install
scripts, scripts/mac-mini/setup-{stt,tts}.sh, the Mac-flux2c image-gen
implementation) is still on disk and will be removed in a follow-up
commit, along with replacing mana-image-gen with the Windows
diffusers+CUDA implementation. This commit is just the live-code sync.
2026-04-08 12:46:03 +02:00
Till JS
abe0a21966 refactor(auth-ui): tighten LoginPage UX, a11y, and dead code
LoginPage cleanup:
- Drop dev pre-fill credentials and the secret logo-as-button trick
- Remove duplicate in-component theme toggle; accept isDark as a prop and let the (auth) layout's global theme toggle drive it
- Move passkey CTA below the password form so the primary flow stays primary
- Remove the dead "Angemeldet bleiben" checkbox (was bound but never forwarded to onSignIn)
- Fix the skip-to-form link to use sr-only/focus:not-sr-only so it only appears on keyboard focus
- Fix the "oder" divider to render its before/after hairlines by setting an explicit color on the parent
- Wire focus-visible outlines on all interactive controls
- Bump 0.6 → 0.75 opacity on subtitle text for AA contrast
- Drop opacity-60 from the headerControls wrapper

Robustness:
- Track all setTimeout IDs in a Set and clear them in an effect cleanup so navigation away doesn't fire stale callbacks (success redirects, error shake, focus restore)
- Replace (result as any) casts with the new typed AuthResult fields
- New resolveErrorCode() helper prefers result.errorCode and falls back to legacy string matching, so rate-limit / account-lock detection survives i18n
- WebAuthn Conditional UI: on mount, if PublicKeyCredential.isConditionalMediationAvailable(), call onSignInWithPasskey({ conditional: true }) so passkeys appear inline in the email autofill dropdown
- Extract the dismissible success-banner markup into a {#snippet successBanner} and reuse it for the verified / verification-sent / magic-link-sent cases (~50 lines of duplicate markup out)

Page wrappers:
- login/+page.svelte passes isDark={theme.isDark} so the in-app theme store drives both layouts
- register/+page.svelte wraps trackGuestConversion() in queueMicrotask + try/catch so analytics can never block the success redirect
- Drop the dead baseSignupCredits={25} prop from register/+page.svelte (RegisterPage never accepted it)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:41:19 +02:00
Till JS
ff7dc5d875 feat(auth): structured error codes + conditional passkey UI
- Add AuthErrorCode union and typed twoFactorRedirect/retryAfter fields on AuthResult so the frontend can branch on stable codes instead of locale-dependent error strings.
- Extend signInWithPasskey with an optional { conditional } flag, threaded through to @simplewebauthn/browser via useBrowserAutofill, so hosts can opt into WebAuthn Conditional UI (passkey suggestions inline in the email autofill dropdown).
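
A sketch of the shape this gives the frontend (only errorCode,
retryAfter, and twoFactorRedirect come from this commit; the union
members and legacy patterns below are assumptions for illustration):

```typescript
// Hypothetical union members — the real AuthErrorCode is defined in the
// auth package and may differ.
type AuthErrorCode = 'invalid_credentials' | 'rate_limited' | 'account_locked';

interface AuthResult {
  success: boolean;
  error?: string;            // legacy: locale-dependent message
  errorCode?: AuthErrorCode; // new: stable, i18n-safe code
  retryAfter?: number;       // set alongside 'rate_limited'
  twoFactorRedirect?: string;
}

// Mirrors the resolveErrorCode() idea from the LoginPage cleanup:
// prefer the stable code, fall back to brittle string matching so
// older server responses still classify.
function resolveErrorCode(result: AuthResult): AuthErrorCode | undefined {
  if (result.errorCode) return result.errorCode;
  const msg = result.error ?? '';
  if (/rate.?limit|zu viele/i.test(msg)) return 'rate_limited';
  if (/locked|gesperrt/i.test(msg)) return 'account_locked';
  return undefined;
}
```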

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:40:51 +02:00
Till JS
3c91691d26 fix(mana-image-gen): align source default port with production reality
Source default was 3026 but Mac Mini production has been overriding to
3025 via the launchd plist in scripts/mac-mini/setup-image-gen.sh ever
since the service was set up. The override existed in exactly one place
that is not version-controlled in any obvious way — anyone redeploying
without that script would land on 3026 and clients pointing at 3025
would fail to connect.

Source default → 3025 across main.py, setup.sh, README, CLAUDE.md so the
launchd plist is no longer load-bearing. The Mac Mini setup script still
sets PORT=3025 explicitly; that's now belt-and-suspenders rather than the
only thing keeping production alive.

Also added a note clarifying that this Mac Mini service (flux2.c, MPS,
arm64-only) is *not* the same thing as the "image-gen" running on the
Windows GPU server (PyTorch + diffusers + CUDA, port 3023, code lives at
C:\mana\services\mana-image-gen\ outside this repo). Two different
implementations sharing a name was confusing the port-collision audit.

Updated docs/PORT_SCHEMA.md warning block to retract the previous false
claims of two active port collisions:

  - image-gen ↔ video-gen on 3026 — wrong: image-gen runs on Mac Mini
    on 3025 (now also the source default), video-gen is alone on the
    Windows GPU on 3026
  - voice-bot ↔ sync on 3050 — latent only: mana-voice-bot is not
    deployed anywhere (no launchd, no scheduled task, no cloudflared
    route), so the collision is in source defaults but not in production

The voice-bot 3050 default should still be moved before voice-bot is
ever deployed — flagged in the PORT_SCHEMA warning instead of silently
fixed since voice-bot deployment is its own decision.
2026-04-08 12:30:33 +02:00
Till JS
b0a08ce239 docs(services): add CLAUDE.md for stt + events, fix stale entries, flag port collisions
New service docs:
- services/mana-stt/CLAUDE.md — FastAPI surface with Whisper MLX (local),
  WhisperX (rich), and Voxtral (local + Mistral API). Documents the lazy
  backend loading and the launchd plist setup on the Mac Mini.
- services/mana-events/CLAUDE.md — Hono/Bun service for public RSVP and
  event-sharing. Documents the host (JWT) vs public (token) split, the
  rate-limit sweeper, and the createApp factory pattern that lets unit
  tests run without bootstrapping the production sweeper.

Stale entries fixed:
- mana-auth: dropped "rewritten from NestJS / drop-in replacement" — the
  rewrite is the only mana-auth there is now. Email channel updated from
  Brevo SMTP to self-hosted Stalwart (see docs/MAIL_SERVER.md).
- mana-notify: same Brevo → Stalwart fix in the channel table and env
  var defaults.

PORT_SCHEMA.md flagged as aspirational:
- The doc was dated 2026-03-28 and presented as "single source of truth",
  but cross-checking against actual service source files (config.go,
  main.py, start.sh) shows nothing matches. Added a prominent warning at
  the top with the real ports + two confirmed collisions:
  * mana-image-gen and mana-video-gen both default to PORT 3026
  * mana-voice-bot and mana-sync both default to PORT 3050
  Today these are masked because image-gen + voice-bot live on the
  Windows GPU server while video-gen + sync live on the Mac Mini, but
  the moment they share a host they collide. Either execute the planned
  reorg or pick non-colliding ports and rewrite the doc to match
  reality — flagged as a real follow-up.
2026-04-08 12:23:48 +02:00
Till JS
a3a47459c6 docs(audit): file-bytes encryption implementation plan + audit roll-up
Two changes:

1. New BACKLOG_FILE_BYTES_ENCRYPTION.md captures everything I'd
   want to know if I were picking up the file-bytes encryption
   work cold in 6 months. ~370 lines, sits next to
   DATA_LAYER_AUDIT.md for discoverability.

   Sections:
   - TL;DR + status (deferred, no production impact yet)
   - Goal + non-goals
   - Threat model delta table (mode-by-mode)
   - Architecture: write path with ASCII flow diagram
   - Architecture: read path with ASCII flow diagram
   - The six hard parts:
     1. Web Crypto AES-GCM doesn't stream → chunked-AEAD wrapper
     2. Multipart uploads need coordinated chunking (S3 5 MB minimum
        vs. our 1 MB AES-GCM chunks)
     3. Resumable uploads + key persistence (new _pendingUploads
        table for the in-flight content key)
     4. No more server-side thumbnails (three options, recommended:
        client-side resize before upload)
     5. Sharing complicates the trust model (URL-fragment key
        sharing, recommended; Mega.nz / Cryptpad pattern)
     6. Migration of existing plaintext files (lazy on-read,
        recommended)
   - Schema delta (sql + Dexie additions)
   - File map (~2200 LoC across 9 new files + 3 touched)
   - Testing strategy (unit + integration + e2e per layer)
   - Out-of-scope items explicitly listed
   - Decision criteria for when to actually do this
   - Five open questions for whoever picks it up
   - Cross-references to related files

   The doc is opinionated where I have a defensible recommendation
   and explicit about uncertainty where I don't.

2. DATA_LAYER_AUDIT.md updates:

   - Backlog "Offen" item #1 (File-Bytes-Encryption) now points
     directly at the new plan doc with a one-line teaser.
   - Backlog "Abgeschlossen" gains a row C for the Conflict
     Visualization UI shipped in ed8ab4483 (was still listed as
     open from the previous audit roll-up).
   - List renumbered: Conflict-UI dropped from "Offen", remaining
     items shifted up.
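
The chunked-AEAD wrapper from hard part 1 can be sketched with per-chunk
IVs derived from a chunk counter (a sketch using Node's synchronous
crypto API for brevity; the plan targets Web Crypto AES-GCM, and the IV
layout and chunk size here are assumptions, not the plan's final scheme):

```typescript
import { createCipheriv, createDecipheriv } from 'node:crypto';

// Each 1 MiB chunk is sealed as an independent AES-GCM message whose IV
// encodes the chunk index, so a chunk replayed at the wrong position
// fails to authenticate.
const TAG_LEN = 16; // AES-GCM auth tag, appended to each sealed chunk

function chunkIv(prefix: Buffer, index: number): Buffer {
  const iv = Buffer.alloc(12);           // 96-bit GCM IV
  prefix.copy(iv, 0, 0, 4);              // 4-byte random per-file prefix
  iv.writeBigUInt64BE(BigInt(index), 4); // 8-byte chunk counter
  return iv;
}

function sealChunk(key: Buffer, prefix: Buffer, index: number, plain: Buffer): Buffer {
  const c = createCipheriv('aes-256-gcm', key, chunkIv(prefix, index));
  return Buffer.concat([c.update(plain), c.final(), c.getAuthTag()]);
}

function openChunk(key: Buffer, prefix: Buffer, index: number, sealed: Buffer): Buffer {
  const d = createDecipheriv('aes-256-gcm', key, chunkIv(prefix, index));
  d.setAuthTag(sealed.subarray(sealed.length - TAG_LEN));
  return Buffer.concat([d.update(sealed.subarray(0, sealed.length - TAG_LEN)), d.final()]);
}
```

Coordinating this with S3 multipart (hard part 2) then reduces to
packing whole sealed chunks into parts that meet the 5 MB minimum.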

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:17:15 +02:00
Till JS
5581295b12 chore: tidy root files + reorganize a few stale docs
Root file cleanup:
- mac-mini-setup.sh → scripts/mac-mini/bootstrap.sh  (first-time bootstrap
  belongs next to the other mac-mini setup-* scripts)
- test-chat-auth.sh → scripts/test-chat-auth.sh  (ad-hoc smoke test, no
  reason to live in the repo root)
- cloudflared-config.yml stays in root on purpose — it's the single source
  of truth read by scripts/mac-mini/setup-*.sh and scripts/check-status.sh.

Docs:
- docs/POSTMORTEM_2026-04-07.md → docs/postmortems/2026-04-07-memoro-deploy-prod-wipe.md
  (creates the postmortems/ home for future entries; descriptive name)
- docs/future/MAIL_SERVER_MAC_MINI_TEMP.md deleted — what it described
  ("Bereit zur Umsetzung", Stalwart on Mac Mini) is what's actually
  running today, documented in docs/MAIL_SERVER.md. The DEDICATED variant
  in docs/future/ remains since it's still a real future plan.

Root CLAUDE.md fix:
- @mana/local-store description was wrong — claimed it was legacy/standalone
  only, but it's still used by apps/mana/apps/web itself, plus manavoxel,
  arcade, and three shared packages.

Not touched (flagged for follow-up):
- NewAppIdeas/ (344K of "Roblox Reimagined" planning notes in repo root) —
  user decision: archive externally or move under docs/future/
- Doc giants (PROJECT_OVERVIEW 41k, MATRIX_BOT_ARCHITECTURE 36k, etc.) —
  splitting them is its own refactor
- Service CLAUDE.md staleness audit across 18 services — too broad for
  this pass
2026-04-08 12:15:27 +02:00
Till JS
c8ed58b7d1 fix(mana,ui): integrate guest nudge into bottom stack + theme it
The "Gefällt es dir?" ("Do you like it?") guest nudge was a
free-floating fixed element at
bottom: 10rem, so it didn't follow the bottom-stack when the PillNav was
collapsed. Move it inside .bottom-stack as the first child so it shares
the stack's reflow.

NotificationBar now uses the elevation system (--color-surface-elevated,
--color-border-strong, --color-foreground) instead of hardcoded rgba so
it adapts to all themes. Bumped the CTA button (shadow + hover lift) and
container (stronger border, layered shadow) to be more visible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:13:05 +02:00
Till JS
b3523f8bdc chore: cleanup leftover dirs from ManaCore→Mana rename + document apps/api
Removed:
- apps/manacore/ — three Svelte files were byte-identical duplicates of
  the apps/mana/ versions, leftover from the 2025 rename. Untracked .env
  files in the same dir were also cleared.
- 21 empty apps/*/apps/web-archived/ directories — leftover from the
  unification move, never tracked in git.
- services/it-landing/ — empty directory, picked up by the services/*
  workspace glob for no reason.
- apps/news/apps/server-archived/ — empty.

Fixed:
- scripts/mac-mini/status.sh: COMPOSE_PROJECT_NAME fallback was still
  manacore-monorepo from before the rename.

Documented:
- Root CLAUDE.md now describes apps/api/ (the @mana/api unified backend)
  as a top-level peer to apps/mana/. It was completely missing from the
  trimmed CLAUDE.md, which made the layout look frontend-only.
2026-04-08 12:12:02 +02:00
Till JS
ed8ab44832 feat(sync): conflict visualization with restore-my-version toast
Closes backlog C from the Phase 9 audit. The data layer has had
real field-level LWW since Sprint 1, but when the server's value
beat a local edit, the user had no way to know. This commit adds
the missing UI piece: a toast that appears whenever applyServerChanges
overwrites a non-empty local field with a strictly newer server
value, with a one-click "restore my version" path.

sync.ts — detection
-------------------
Two new exports:

  - SyncConflictPayload: per-field overwrite event shape
    (tableName, recordId, field, wasLocal, nowServer, localTime,
    serverTime).
  - subscribeSyncConflicts(listener): in-module pub/sub. Returns
    an unsubscribe function.

Both LWW branches in applyServerChanges (insert-as-update and the
canonical update-with-fields path) now call notifyConflict() when:

  1. The server time is STRICTLY greater (not equal) than the local
     field time → there's actually an edit window to lose
  2. The local field value is non-null/undefined → user actually
     typed something to overwrite
  3. The values are not equal (cheap JSON-string compare for objects,
     === for primitives) → there's a real change, not an idempotent
     server replay
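
The three conditions boil down to a small pure predicate. A sketch in plain TypeScript (illustrative names, not the actual sync.ts exports):

```typescript
type FieldValue = unknown;

// Cheap compare from the commit: === for primitives,
// JSON-string compare for objects.
function valuesEqual(a: FieldValue, b: FieldValue): boolean {
  if (a === b) return true;
  if (
    typeof a === "object" && a !== null &&
    typeof b === "object" && b !== null
  ) {
    return JSON.stringify(a) === JSON.stringify(b);
  }
  return false;
}

function shouldNotifyConflict(
  localValue: FieldValue,
  serverValue: FieldValue,
  localTime: number,
  serverTime: number,
): boolean {
  if (serverTime <= localTime) return false; // 1. strictly newer only, a tie stays silent
  if (localValue === null || localValue === undefined) return false; // 2. nothing typed, nothing lost
  return !valuesEqual(localValue, serverValue); // 3. real change, not an idempotent replay
}
```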

Why a custom registry instead of CustomEvent + window.dispatchEvent?
The existing sync-telemetry + quota-detect helpers use
window.dispatchEvent which doesn't work in node-based vitest envs
(no DOM EventTarget). The conflict bus is small enough that a plain
Set<listener> is simpler than polyfilling EventTarget — and the
node test path matters because we need automated coverage of the
detection logic.
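
The plain-Set bus is small enough to sketch in full (payload abbreviated; the names mirror, but are not guaranteed to match, the real sync.ts exports):

```typescript
interface SyncConflictPayload {
  tableName: string;
  recordId: string;
  field: string;
}

type ConflictListener = (conflict: SyncConflictPayload) => void;

const conflictListeners = new Set<ConflictListener>();

// Returns an unsubscribe function, mirroring the commit's API shape.
function subscribeSyncConflicts(listener: ConflictListener): () => void {
  conflictListeners.add(listener);
  return () => {
    conflictListeners.delete(listener);
  };
}

// Called from both LWW branches in applyServerChanges.
function notifyConflict(payload: SyncConflictPayload): void {
  for (const listener of conflictListeners) listener(payload);
}
```

No DOM EventTarget involved, which is exactly what makes it usable from node-based vitest.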

conflict-store.svelte.ts — UI state
-----------------------------------
Svelte 5 $state-backed store with three responsibilities:

  1. Coalescing: a SyncConflict is keyed by `${tableName}|${recordId}`,
     so a burst of N field-overwrites on the same record collapses
     into ONE toast with all affected fields underneath. The original
     wasLocal value is preserved across coalescing (we don't clobber
     the user's first typed value if a later field event arrives).

  2. Auto-dismiss: each conflict has a 30s TTL after which it
     evicts itself. Manual dismiss trumps the timer.

  3. Restore: writes wasLocal back to Dexie with a fresh updatedAt
     that beats the server's serverTime, plus a __fieldTimestamps
     patch so the field-LWW pass on the next sync round will let
     our value win. Deferred via setTimeout(0) so it lands AFTER
     applyServerChanges releases its per-table apply lock — running
     before the lock release would silently drop the restore (the
     hook suppression is per-table-set, not per-record).

FIFO eviction at MAX_VISIBLE=8 keeps a bursty server from growing
the visible array unbounded.

SyncConflictToast.svelte — the UI
---------------------------------
Mounts globally in +layout.svelte. Stacks bottom-right above the
OfflineIndicator. Each toast shows:

  - Module label ("Aufgabe", "Notiz", "Termin", …) derived from a
    table-name → German label map. Unknown tables fall through to
    the bare table name.
  - Field count summary ("Feld »title«" / "3 Felder") — we
    deliberately do NOT render the actual values because some are
    encrypted blobs and decrypting them in the toast would be
    significant complexity for marginal UX gain. The user knows
    what they were just editing.
  - Two buttons: "Wiederherstellen" (calls conflictStore.restore)
    and "Behalten" (calls dismiss).

Slide-in animation, dark-mode-aware styling, role="alertdialog"
for accessibility.

Wiring
------
data-layer-listeners.ts:
  - Imports installConflictListener from conflict-store
  - Calls it from installDataLayerListeners() right after the
    quota + telemetry handlers
  - Adds the disposeConflict() call to the cleanup return

+layout.svelte:
  - Imports SyncConflictToast and mounts it next to SuggestionToast
    so it inherits the same global-overlay positioning context

Tests
-----
Five new integration tests in sync.test.ts cover:

  - Fires when server overwrites a non-empty local field with a
    strictly newer value
  - Does NOT fire when local field is null/undefined (no edit to lose)
  - Does NOT fire when values are equal (idempotent replay)
  - Fires once per overwritten field on a multi-field update
  - Does NOT fire on a timestamp tie (LWW lets server win silently
    when there's no real edit window)

All 25 sync tests + 138 total data-layer tests pass. The new
captureConflicts() helper subscribes via subscribeSyncConflicts()
which works in the node-vitest env without needing a DOM polyfill.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 12:01:17 +02:00
Till JS
fe3fc9e7e2 docs: trim CLAUDE.md files — remove stale + duplicated guidance
Root CLAUDE.md: 1138 → 169 lines. Removed ghost apps-archived list,
Supabase env examples, duplicate mana-auth row, contradictory "Code
Quality TODO" block. Pushed search/storage/database/landing/manascore
howtos out to docs/ + .claude/guidelines/ pointers.

apps/mana/CLAUDE.md: 259 → 175 lines. Dropped non-existent workbench/
route from the routing diagram. Folded the auth section into a pointer
to root + the mana-specific current-user stamping pattern. Merged the
two module-system sections. Kept the data-flow ASCII diagram and the
encryption 3-step workflow (the part you actually need while writing
stores).
2026-04-08 11:59:51 +02:00
Till JS
b6486a8a46 fix(mana-video-gen): typo in get_model_info — total_mem → total_memory
PyTorch's `torch.cuda.get_device_properties(0)` returns a
`_CudaDeviceProperties` object whose memory attribute is
`total_memory` (bytes), not `total_mem`. The typo crashed the
service immediately at startup because `get_model_info()` is
called from the FastAPI lifespan handler, not lazily — uvicorn
logged "Application startup failed" before any request could land.

Found while installing mana-video-gen on the Windows GPU box
(192.168.178.11:3026) for the gpu-video.mana.how Cloudflare route.
After the fix the service starts cleanly under the ManaVideoGen
scheduled task and responds 200 on /health both on the LAN and via
the Cloudflare tunnel. status.mana.how now reports 42/42 — first
time ever.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 11:59:40 +02:00
Till JS
142a65a22f docs: Phase 9 documentation roundup — close encryption-shaped doc gaps
Five documentation surfaces gained encryption awareness in this
sweep. Before this commit, the only place anyone could learn about
the at-rest encryption layer or the zero-knowledge opt-in was the
internal DATA_LAYER_AUDIT.md. New contributors and self-hosters
would never discover one of the most important features of the
product just by reading the standard onboarding docs.

apps/docs/src/content/docs/architecture/security.mdx (NEW)
----------------------------------------------------------
First-class user-facing security page in the Starlight site,
slotted into the Architecture sidebar between Authentication and
Backend.

Sections:
  - What's encrypted (overview table of 27 modules + the
    intentional plaintext carve-outs)
  - Standard mode flow with ASCII diagram
  - "What Mana CAN see" trust statements per mode
  - Zero-knowledge mode setup walkthrough (Steps component)
  - Unlock flow on a new device
  - Recovery code rotation
  - Deployment requirements (the loud MANA_AUTH_KEK warning)
  - Audit trail action vocabulary
  - Threat model summary table
  - Implementation file references with paths

services/mana-auth/CLAUDE.md
----------------------------
New "Encryption Vault" section under Key Endpoints, listing all 7
routes (status, init, key, rotate, recovery-wrap GET+DELETE,
zero-knowledge) with their HTTP method, path, error codes, and a
description. Mentions the three CHECK constraints + RLS + audit
table. Points readers at DATA_LAYER_AUDIT.md and the new
security.mdx for the deep dive.

Environment Variables block gains MANA_AUTH_KEK with a multi-line
comment explaining the openssl rand command + dev fallback warning.

apps/mana/CLAUDE.md
-------------------
Full rewrite. The existing file was from the Supabase era and
described things like @supabase/ssr, safeGetSession(), and a
five-table schema with users + organizations + teams that doesn't
exist any more. Replaced with the unified-app architecture:

  - Module system layout (collections.ts / queries.ts / stores/)
  - Mana Auth (Better Auth + EdDSA JWT) instead of Supabase
  - Local-first data layer with the full pipeline diagram
  - At-rest encryption section with the "when writing module code
    that touches sensitive fields" 4-step guide
  - Updated routing structure (no more separate /organizations,
    /teams routes)
  - Module store pattern code example
  - Reference document table at the bottom pointing at the audit,
    the new security.mdx, and the auth doc

Root CLAUDE.md
--------------
New "At-Rest Encryption (Phase 1–9)" subsection under the
Local-First Architecture section. Two-mode trust summary table,
production requirement for MANA_AUTH_KEK with the openssl command,
the "when writing module code" 4-step guide, and a reference
table. New contributors reading the root CLAUDE.md from top to
bottom now hit encryption naturally as part of the data layer
discussion.

.env.macmini.example
--------------------
MANA_AUTH_KEK was missing from the production env example
entirely — the macmini deployment would silently boot on the
32-zero-byte dev fallback if you copied this file. Added with a
multi-paragraph comment covering: how to generate, why it's
required, how to store securely (Docker secrets / KMS / Vault),
and the rotation caveat.

apps/docs/src/content/docs/deployment/self-hosting.mdx
------------------------------------------------------
Two changes:

  1. Added MANA_AUTH_KEK to the mana-auth service block in the
     Compose example with an inline comment pointing at the new
     section below.

  2. New "Encryption Vault Setup" H2 section with subsections:
     - Generating a KEK (with a fake example value labelled DO NOT
       USE — generate your own)
     - Securing the KEK (Docker secrets, KMS, systemd
       LoadCredential, anti-patterns)
     - "What if I lose the KEK?" — explains the data is
       unrecoverable by design and mitigation via zero-knowledge
       mode opt-in
     - KEK rotation — calls out the missing background re-wrap
       job as a known limitation

apps/docs/astro.config.mjs
--------------------------
Added "Security & Encryption" entry to the Architecture sidebar
between Authentication and Backend so the new page is reachable
from the docs nav.

Astro check: 0 errors, 0 warnings, 0 hints across 4 .astro files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 11:47:59 +02:00
Till JS
b961453244 docs(audit): roll up Phase 9 backlog sweep
Marks the four backlog items closed in this session — vault service
integration tests, recovery code rotation, pre-wired insert helpers
for future server-pushed records, and boards/boardItems encryption.
Updates the encrypted-tables list to 27 tables.

Updates
-------
1. Sprint table grows by 4 rows (BL1, BL2, BL3+4, BL5) with the
   four backlog commits.

2. Test-Status line bumped:
     21 web test files → 21 web + 2 mana-auth
     78 vitest crypto tests + 39 bun mana-auth tests
     "25+ tables" → "27 tables" (boards + boardItems added)

3. Section 5 encrypted-tables list grows by:
     - boards     (name, description)
     - boardItems (textContent, only when itemType === 'text')
   Both labelled "9 BL" in the Phase column to mark them as
   backlog-sweep additions.

4. "Tabellen ohne Encryption (bewusst)" subsection: removed the
   stale "boards/boardItems are a candidate for later" entry —
   they're encrypted now. Added a redirect note pointing readers
   at Section 6 where the actual decision is recorded.

5. Section 6 ("Backlog") completely restructured. The flat
   "in priority order" list became two subsections:

   "Abgeschlossen (Phase 9 Follow-Up Sweep)" — table with the four
   commits + a one-line "what" notice each. Item 3+4 is explicitly
   marked as a re-frame: the original "server pushes plaintext"
   risk turned out to overstate the problem because the
   generate/upload UIs are TODO stubs. The fix was pre-wired
   insert() helpers, not a server-side rewrite.

   "Offen" — five remaining items, reordered:
     1. File-Bytes-Encryption (NEW: surfaced as "#4b" while
        documenting that filesStore.insert() only protects metadata)
     2. Image-Generation / File-Upload Wire-Up (NEW: ensures the
        future UIs go through the helpers from #3+4)
     3. Conflict Visualization UI (unchanged)
     4. Composite Indexes für Multi-Account (unchanged)
     5. V3 Migration Tests (unchanged)

6. Eckdaten line bumped from "25+ Tabellen aktiv" to "27 Tabellen
   aktiv". Best Practices line for ZK gets the "+ rotate im
   Active-State-Support" suffix.

7. Last-update header bumped to today.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 00:00:52 +02:00
Till JS
a7e5b39ad0 feat(picture): encrypt boards + boardItems
Closes backlog #5 from the Phase 9 audit. Adds two new registry
entries (boards, boardItems) and wraps the boards store + queries
+ search provider so the moodboard names, descriptions and
text-item content are sealed at rest like every other user-typed
field.

Registry
--------
  - boards:    ['name', 'description']
  - boardItems: ['textContent']

Inline comments explain that textContent is only set when
itemType === 'text' (image-type items have it null, encryptRecord
is a pass-through). Coordinates / dimensions / z-index / opacity
stay plaintext for the canvas renderer.

Boards store
------------
  - createBoard: snapshots plaintext for the return value before
    encryptRecord mutates the row in place
  - updateBoard: encrypts the diff before update, then re-fetches +
    decrypts for the return value (so the caller gets plaintext,
    not the ciphertext we just wrote)
  - duplicateBoard: NEW behaviour — explicitly decrypts the
    original board first because the duplicate concatenates "(Kopie)"
    onto the name string. Concatenating onto an "enc:1:..." prefix
    would produce a malformed blob that fails to decrypt later.
    The board items are spread directly because the duplicate
    uses the SAME master key, so the existing ciphertext stays
    valid; encryptRecord is idempotent on already-encrypted strings
    so it's a no-op safety check.
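
Illustrative only (real sealing is AES-GCM; a reversible base64 prefix stands in here), but it shows why decrypt-first matters:

```typescript
const seal = (plaintext: string): string =>
  `enc:1:${Buffer.from(plaintext, "utf8").toString("base64")}`;

const open = (stored: string): string =>
  stored.startsWith("enc:1:")
    ? Buffer.from(stored.slice("enc:1:".length), "base64").toString("utf8")
    : stored; // plaintext passes through, like decryptRecord

function duplicateName(storedName: string): string {
  // Wrong: `${storedName} (Kopie)` would corrupt the ciphertext blob.
  // Right: decrypt, concatenate, re-encrypt.
  return seal(`${open(storedName)} (Kopie)`);
}
```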

Reads
-----
  - useAllBoards: decrypts the visible board set before mapping. The
    item count map only reads structural fields (deletedAt + boardId)
    so it doesn't need a decrypt pass for boardItems.
  - allBoards$ raw observable: same pattern
  - search/providers/picture: decrypts before substring scoring
    against the user query

The unified mana app currently has no UI that renders boardItems
.textContent (the seed data in collections.ts is exported as
PICTURE_GUEST_SEED but never imported anywhere — dead code), so
no item-side reader needs touching for this commit. When a future
canvas editor lands it'll go through the existing decryptRecord
helpers naturally.

78/78 crypto tests still pass (registry shape unchanged at the API
level).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:57:54 +02:00
Till JS
109de61e21 feat(picture,storage): pre-wired insert helpers for future generate/upload flows
Closes backlog #3+4 from the Phase 9 audit. The original framing —
"server-pushed records bypass client-side encryption" — turned out
to overstate the problem after a code audit:

  - apps/mana/apps/web/src/routes/(app)/picture/generate/+page.svelte
    is currently a TODO stub. The handleGenerate() function returns
    "requires connection to Picture-Server (port 3006)" without
    inserting anything.
  - There is no fileTable.add() call site anywhere in the unified
    mana app. File uploads still happen via the standalone storage
    server in apps/storage and arrive via legacy mana-sync push.

So the production code path that would write plaintext images or
files to the user's IndexedDB doesn't yet exist. The risk only
materialises when someone wires up the in-app generate / upload
UI in the unified app.

The right action is to leave behind a clearly-labelled, encryption-
aware insert() helper on each store so the future implementation
has an obvious "do the right thing" path to call. This commit does
exactly that.

picture/stores/images.svelte.ts
-------------------------------
New imagesStore.insert(image: LocalImage) method:
  - Calls encryptRecord('images', image) to seal `prompt` +
    `negativePrompt` (the two registered encrypted fields)
  - Calls imageTable().add(image)
  - Fires the PictureEvents.imageCreated analytic (replaces the
    old plain-table-add path)

A long doc comment on the method explains the architectural
reasoning: the server cannot encrypt under the user's master key
(the key only lives in the browser), so the generation flow MUST
round-trip through the client store even if the AI call itself
happens server-side. The pattern is documented as:

  1. Client posts { prompt, negativePrompt, ... } to image-gen API
  2. Server returns { storagePath, generationId, dimensions, ... }
  3. Client calls imagesStore.insert(...) with both halves
  4. encryptRecord seals the prompt fields before the IndexedDB write
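
Steps 1–4 sketched as plain TypeScript (types and the encryptRecord stand-in are illustrative, not the real store code; real sealing is AES-GCM under the browser-held master key):

```typescript
interface LocalImage {
  id: string;
  prompt: string;
  negativePrompt: string | null;
  storagePath: string;
}

// Stand-in for encryptRecord('images', row): seals the two registered fields.
function encryptImageFields(row: LocalImage): LocalImage {
  const sealField = (v: string | null) =>
    v === null ? null : `enc:1:${Buffer.from(v, "utf8").toString("base64")}`;
  return {
    ...row,
    prompt: sealField(row.prompt) as string,
    negativePrompt: sealField(row.negativePrompt),
  };
}

// Steps 3+4: the client re-joins server metadata with the user-typed prompt,
// seals the registered fields, then writes to IndexedDB (add() stands in for
// imageTable().add).
async function insertGeneratedImage(
  serverResult: { generationId: string; storagePath: string },
  prompt: string,
  negativePrompt: string | null,
  add: (row: LocalImage) => Promise<void>,
): Promise<void> {
  const sealed = encryptImageFields({
    id: serverResult.generationId,
    prompt,
    negativePrompt,
    storagePath: serverResult.storagePath,
  });
  await add(sealed);
}
```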

The mixed-state guarantee from picture/queries.ts already covers
the migration window where some images came in via legacy
server-side push and others through this path — decryptRecord
passes plaintext through and unwraps ciphertext blobs.

storage/stores/files.svelte.ts
------------------------------
New filesStore.insert(file: LocalFile) method:
  - Calls encryptRecord('files', file) to seal `name` +
    `originalName`
  - Calls fileTable.add(file)

Same architectural reasoning applies. The doc comment also flags a
SEPARATE concern that this commit does NOT address: encrypting the
actual file *bytes* on S3 (so the storage provider can't read the
content) needs streaming AES-GCM and is a much bigger lift. Tracked
as "backlog #4b" in the comment for whoever picks it up next.

(No analytic call yet on the storage side because StorageEvents
doesn't have a fileUploaded() event — the upload UI is unbuilt, so
adding the analytic event is up to whoever lands the UI.)

Pre-existing TS error on line 46 of images.svelte.ts (the
`toggleField(imageTable(), ...)` Drizzle/Dexie type variance bug)
is unchanged — it predates Phase 9 and is not introduced by this
commit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:52:20 +02:00
Till JS
05ae348b12 fix(macmini): blackbox-exporter uses 1.1.1.1/8.8.8.8 directly for DNS
Docker's embedded DNS resolver (127.0.0.11) forwards to the host
resolver, which on the Mac Mini forwards to the home router's
FRITZ!Box DNS. The router keeps a stale negative cache for hours
after a hostname first fails, so any newly added Cloudflare CNAME
(e.g. the GPU public hostnames recreated via the Cloudflare dashboard
during the 2026-04-07 cleanup) appears as "no such host" to the
blackbox probes for the entire negative-cache TTL — even though the
hostname resolves fine via 1.1.1.1 directly the entire time.

Symptom before the fix:
  health-check.sh (uses dig @1.1.1.1)  → All services healthy 
  status.mana.how (via blackbox/VM)    → 4 GPU services down 

The two views contradicted each other: the public-facing status page
reported four healthy services as down while the operator runbook
correctly reported them as up. Confusing, and exactly the kind of
monitoring discrepancy a launch should not ship with.

Fix: pin the blackbox container to public DNS (Cloudflare + Google)
in compose. Blackbox now resolves directly against 1.1.1.1, bypassing
the home-router negative cache entirely. After the recreate the four
GPU probes flipped from probe_success=0 to probe_success=1 within
one scrape interval, and status.mana.how went from 38/42 to 41/42
(only gpu-video remains down — LTX Video Gen is intentionally not
deployed on the Windows GPU box yet).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:47:57 +02:00
Till JS
24001e9545 feat(vault): rotate recovery code while zero-knowledge is active
Closes backlog #2 from the Phase 9 audit. Lets a user replace their
recovery code without going through the disable→generate→re-enable
dance. Works in BOTH standard and zero-knowledge modes.

vault-client
------------
New rotateRecoveryCode() method on the VaultClient interface.
Returns RecoveryCodeSetupResult, identical shape to setupRecoveryCode.

Branches on the current vault state via getStatus():

  Standard mode:
    Re-fetches the plaintext MK from the server (same path as the
    initial setupRecoveryCode), generates a fresh 32-byte recovery
    secret, derives the new wrap key via HKDF, seals the MK, posts
    the wrap to /recovery-wrap (idempotent server-side, replaces
    the existing row in place).

  Zero-knowledge mode:
    Server can't hand out the plaintext MK any more, so we use the
    cachedUnwrappedMkBytes that unlockWithRecoveryCode stashed when
    the user typed in their old recovery code earlier this session.
    Throws with a clear message if the cache is empty (e.g. user
    landed on the page via init rather than recovery-unlock):
    "sign out and back in with your current recovery code first"
    so the cache gets repopulated.

Both branches:
  - Wipe the raw MK reference after sealing
  - Wipe the recovery secret after format
  - Return the formatted code for the UI to display
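
The branch selection reduces to a small helper. A hedged sketch (only the control flow; names are illustrative, and the real client then derives a fresh wrap key via HKDF and seals the MK with AES-GCM):

```typescript
interface ClientVaultState {
  zeroKnowledge: boolean;
  // Stashed by unlockWithRecoveryCode earlier in the session (ZK mode only).
  cachedUnwrappedMkBytes: Uint8Array | null;
}

function pickMasterKeySource(
  state: ClientVaultState,
  fetchPlaintextMkFromServer: () => Uint8Array,
): Uint8Array {
  if (!state.zeroKnowledge) {
    // Standard mode: the server can still hand out the plaintext MK.
    return fetchPlaintextMkFromServer();
  }
  // ZK mode: only the session cache from a prior recovery-code unlock works.
  if (!state.cachedUnwrappedMkBytes) {
    throw new Error("sign out and back in with your current recovery code first");
  }
  return state.cachedUnwrappedMkBytes;
}
```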

The OLD recovery code is now permanently invalid. Using it on a
future unlock attempt will fail with the standard generic
"wrong recovery code" error.

Settings UI
-----------
New rotateStep state machine ('idle' / 'rotated') runs alongside
the existing zkSetupStep so the user can rotate without leaving the
active-state UI.

In the active-mode card (zkSetupStep === 'enabled'):
  - Two side-by-side buttons:
    "🔁 Recovery-Code rotieren" + "Zero-Knowledge-Modus wieder deaktivieren …"
  - When the user clicks rotate, handleRotateRecoveryCode() runs the
    flow and renders an inline "Neuer Recovery-Code" subsection
    (same .recovery-code monospace block + Copy button as the
    initial setup) with explicit warning that the old code is now
    invalid.
  - "Ich habe den neuen Code gesichert" button wipes the displayed
    code and drops back to idle.
  - The disable flow stays available (the rotate UI hides itself
    when the user has clicked into the disable confirmation path).

The 28 vault integration tests still pass (39 total in
encryption-vault/, including the existing 11 KEK tests). The new
rotateRecoveryCode method reuses the already-tested
setRecoveryWrap server endpoint, so no new server-side tests are
needed for this commit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:43:10 +02:00
Till JS
c2c960121e test(mana-auth): vault service integration tests against real postgres
Closes backlog #1 from the Phase 9 audit. Adds 28 integration tests
for the EncryptionVaultService against a real Postgres so the
RLS policies, CHECK constraints and audit-row writes are exercised
as the production app actually sees them. The pure-crypto KEK tests
in kek.test.ts already covered the wrap/unwrap primitives — this
new file fills in the service-shaped gaps that need a real DB.

Test infrastructure
-------------------
- Reads TEST_DATABASE_URL from env. Whole suite is SKIPPED via
  describe.skip if unset, so unrelated CI runs and `bun test` from
  a fresh checkout don't fail on missing connection. The
  encryption-vault sub-job has to provision a Postgres explicitly.
- Schema is assumed already migrated (run `pnpm db:push` or apply
  sql/002 + sql/003 manually before invoking the suite). Tests
  insert a fresh test user per case via beforeEach so cross-test
  pollution is impossible despite the FK to auth.users.
- afterAll cleans up the user (CASCADE wipes vault + audit) and
  closes the postgres pool so bun test exits cleanly.
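
The env gating can be sketched as a runnable selection helper (in the real file this picks between describe and describe.skip from bun:test; both are stubbed here so the selection logic itself runs standalone):

```typescript
type SuiteFn = (name: string, body: () => void) => string;

const runSuite: SuiteFn = (name, body) => {
  body();
  return `ran:${name}`;
};
const skipSuite: SuiteFn = (name) => `skipped:${name}`;

function selectSuite(env: Record<string, string | undefined>): SuiteFn {
  // Whole suite is skipped when no test database is provisioned.
  return env.TEST_DATABASE_URL ? runSuite : skipSuite;
}
```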

Coverage
--------
init (3):
  - Mints a fresh vault, wrapped_mk + wrap_iv populated, ZK off
  - Idempotent (returns same key)
  - Audit rows are written

getStatus (5):
  - vaultExists=false for unconfigured user
  - vaultExists=true after init, no recovery wrap
  - hasRecoveryWrap=true after setRecoveryWrap
  - zeroKnowledge=true after enableZK
  - Does NOT write an audit row (cheap metadata read)

setRecoveryWrap (4):
  - Stores wrap on existing vault
  - VaultNotFoundError on missing vault
  - Idempotent (replaces previous wrap)
  - Writes recovery_set audit row

clearRecoveryWrap (3):
  - Removes the wrap
  - ZeroKnowledgeActiveError when ZK is on
  - VaultNotFoundError on missing vault

enableZeroKnowledge (4):
  - Flips zero_knowledge=true and NULLs out wrapped_mk + wrap_iv
  - RecoveryWrapMissingError if no recovery wrap is set
  - Idempotent (already-on is no-op)
  - VaultNotFoundError on missing vault

disableZeroKnowledge (2):
  - Restores wrapped_mk from a client-supplied master key,
    verifies the round-trip via getMasterKey returns the same bytes
  - No-op when ZK is already off

getMasterKey (3):
  - Returns unwrapped MK in standard mode
  - Returns recovery blob with requiresRecoveryCode=true in ZK mode
  - VaultNotFoundError on missing vault

rotate (2):
  - Mints fresh MK and wipes any existing recovery wrap
  - ZeroKnowledgeRotateForbidden in ZK mode

DB-level invariants (2):
  - Setting wrapped_mk back while ZK active is rejected by
    encryption_vaults_zk_consistency
  - Setting wrap_iv to NULL while wrapped_mk is set is rejected
    by encryption_vaults_wrap_iv_pair
  Both wrap the Drizzle update in an arrow IIFE so
  expect(...).rejects.toThrow() sees a real Promise (Drizzle's
  chainable update() only executes on await/then).
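
The thenable subtlety is worth a sketch (illustrative; the error text is made up and the builder is faked):

```typescript
// Mimics a Drizzle-style chainable builder: a lazy thenable where nothing
// runs until .then is invoked by an await.
const lazyFailingUpdate = () => ({
  then(_resolve: (v: unknown) => void, reject: (e: unknown) => void): void {
    reject(new Error("violates check constraint encryption_vaults_zk_consistency"));
  },
});

// The pattern from the test file: awaiting inside an async arrow IIFE yields
// a genuine Promise for expect(...).rejects.toThrow() to consume.
const asRealPromise = (run: () => unknown): Promise<unknown> =>
  (async () => {
    await run();
  })();
```

Handing the bare builder object to an assertion helper can misbehave; forcing execution through the async arrow always produces a real rejected Promise.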

Run results
-----------
With TEST_DATABASE_URL set + schema migrated:
  28 pass, 0 fail, 64 expect() calls

Without TEST_DATABASE_URL set (default):
  0 pass, 30 skip (full suite cleanly skipped)
  KEK tests in kek.test.ts still run unaffected.

Drive-by: kek.test.ts header comment updated to point at the new
sibling file instead of saying "tests will live alongside mana-sync"
(which was outdated speculation from Phase 2).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:39:48 +02:00
Till JS
ea165c8b46 docs(audit): roll up Phase 9 in DATA_LAYER_AUDIT.md
Marks the Zero-Knowledge opt-in as live and documents the new
architecture surface so future readers can understand the trust
model without spelunking through six commits.

Updates
-------
1. Sprint table grows from Phase 1–8 to Phase 1–9, adds the six new
   commits (4 milestones + 2 follow-ups: status endpoint + lock-screen
   modal). Test count bumped from 262 to 284 (22 new in recovery.test.ts).

2. Section 5 "Encryption Pipeline" reworked:
   - "Wer hält was?" now has TWO tables — Standard-Modus and
     Zero-Knowledge-Modus — making the trust model difference explicit
   - New "Recovery-Code-Pipeline" subsection with two ASCII flow
     diagrams (setup + unlock) showing every step from "user clicks
     button" to "MK in MemoryKeyProvider"
   - New "Schlüssel- + Datei-Kette für Phase 9" table mapping each
     code path to its file

3. "Was Mana technisch (nicht) sehen kann" rewritten to compare both
   modes side by side. Standard mode keeps the existing
   "theoretically decryptable by KEK operator" disclosure;
   zero-knowledge mode is upgraded to a hard "computationally
   incapable" guarantee — and the trade-off ("Recovery-Code lost =
   data lost") is called out explicitly. The DB CHECK constraint
   that enforces "ZK active ⇒ recovery wrap exists" is mentioned as
   the schema-level safety net.

4. Backlog reordered. Phase 9 is no longer listed as an open item;
   the only true-zero-knowledge follow-up is now item #1 (service
   tests against real Postgres for the four new vault methods,
   analogous to the existing kek.test.ts pattern but needing a
   container DB). Items 2–8 are unchanged from the previous
   roundup.

5. Eckdaten + Best Practices + final production-grade summary all
   reflect the new ZK opt-in. Schwachstelle #4 row updated to
   "Phase 1–9".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:28:06 +02:00
Till JS
a48b2d5841 feat(layout): lock-screen recovery code unlock modal
Closes the second Phase 9 follow-up. When a user has zero-knowledge
mode active and signs in on a new device (or after a session expiry),
the layout's vault-unlock effect lands in the new
'awaiting-recovery-code' state. Previously this was a dead end —
the layout just logged a warning and the rest of the app sat with a
locked vault.

This commit adds the missing UI piece: a non-dismissable modal that
mounts whenever the unlock effect signals 'awaiting-recovery-code'.

RecoveryCodeUnlockModal component
---------------------------------
  - Reads the singleton vault client via getVaultClient()
  - Single text input + submit button
  - On submit:
    1. Calls vaultClient.unlockWithRecoveryCode(input)
    2. On success: clears input, calls onUnlocked() prop → parent
       hides the modal, app boots normally
    3. On RecoveryCodeFormatError: shows a format hint
    4. On any other error (wrong code OR corrupted blob — surfaced
       uniformly so an attacker can't distinguish): shows
       "Recovery-Code falsch, prüfe deine Eingabe"
  - Non-dismissable: there's no Cancel button. Without the recovery
    code the app cannot read encrypted data and would just sit in a
    half-broken state. The user can sign out from the header (the
    auth flow runs above the encryption layer) if they need to bail.
  - Help text at the bottom is honest about the irreversible nature
    of losing the recovery code.
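
The submit handling reduces to one error branch plus a deliberately uniform fallback. A sketch (RecoveryCodeFormatError is named in the commit; everything else here is illustrative):

```typescript
class RecoveryCodeFormatError extends Error {}

async function submitRecoveryCode(
  unlock: (code: string) => Promise<void>,
  code: string,
): Promise<{ ok: boolean; message?: string }> {
  try {
    await unlock(code);
    return { ok: true }; // parent hides the modal via onUnlocked()
  } catch (err) {
    if (err instanceof RecoveryCodeFormatError) {
      return { ok: false, message: "format hint" };
    }
    // Wrong code and corrupted blob surface identically on purpose,
    // so an attacker learns nothing from the error shape.
    return { ok: false, message: "Recovery-Code falsch, prüfe deine Eingabe" };
  }
}
```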

Layout integration
------------------
+layout.svelte:
  - Imports the modal
  - New `needsRecoveryCode = $state(false)` flag
  - The vault-unlock effect now switches on three branches instead
    of just success/failure:
      'unlocked'                → needsRecoveryCode = false
      'awaiting-recovery-code'  → needsRecoveryCode = true (mount modal)
      anything else             → console.warn (unchanged)
  - Logout path also resets needsRecoveryCode so the modal doesn't
    leak across sessions
  - {#if needsRecoveryCode} mounts the component at the bottom of
    the markup (above the existing global toasts and banners)

The autofocus warning is suppressed via svelte-ignore — the input
needs immediate focus because it's the only thing the user can
interact with on this surface, and screen-reader users will hear
the modal's accessible name from the role="dialog" + aria-labelledby
binding.

End-to-end smoke flow that now works:
  1. User goes to /settings/security on Device A, enables ZK
  2. User signs out, signs back in on Device B
  3. Layout effect calls vaultClient.unlock() → server returns
     recovery blob → vaultClient state goes to awaiting-recovery-code
  4. Modal mounts, user pastes their recovery code from password
     manager
  5. unlockWithRecoveryCode runs the inline AES-GCM unwrap, imports
     the MK as non-extractable, caches the bytes for a future
     disable, transitions to 'unlocked'
  6. Modal calls onUnlocked → layout dismisses modal → rest of the
     app boots and renders decrypted data

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:24:32 +02:00
Till JS
78d949d051 feat(crypto): vault status endpoint + settings page hydration
Closes the Phase 9 Milestone 4 known limitation where the settings
page always started in 'idle' state regardless of whether the user
had already enabled zero-knowledge mode. Adds a cheap server-side
status read + hydrates the page on mount.

Server side
-----------
New VaultStatus interface and getStatus(userId) method on
EncryptionVaultService — single SELECT against encryption_vaults,
no decryption, no audit logging (this gets called on every settings
page mount and we don't want to flood the audit log with read-only
metadata fetches). Returns sane defaults when the vault row doesn't
exist yet so the client can avoid a 404 dance.

  GET /api/v1/me/encryption-vault/status →
  {
    vaultExists: boolean,
    hasRecoveryWrap: boolean,
    zeroKnowledge: boolean,
    recoverySetAt: string | null
  }

Client side
-----------
vault-client.ts gains a `getStatus()` method that bypasses the
fetchVault retry helper (status reads should be cheap and one-shot;
if they fail we let the caller fall back to defaults). Re-exports
VaultStatus + RecoveryCodeSetupResult from the crypto barrel.

settings/security/+page.svelte
------------------------------
onMount kicks off a getStatus() call. Two things change based on
the response:

  1. If the server says zero_knowledge=true, jump zkSetupStep to
     'enabled' so the page renders the active-state UI directly
     instead of the setup flow.

  2. New `hasRecoveryWrap` state tracks whether a wrap is stored,
     even if ZK isn't active yet. The idle branch now has TWO
     variants:

     - hasRecoveryWrap=false: original "Recovery-Code einrichten"
       single button (unchanged from milestone 4)

     - hasRecoveryWrap=true:  amber notice "you have a code stored
       but ZK isn't active" with three buttons:
       * "Zero-Knowledge jetzt aktivieren" (jumps straight to the
         enable call)
       * "Neuen Recovery-Code generieren" (rotates the wrap)
       * "Recovery-Code entfernen" (with two-click confirmation,
         calls DELETE /recovery-wrap)

This handles the previously-orphaned state where a user generated a
code, copied it to their password manager, but never confirmed the
final activation step. Without this branch, a reload dropped the
settings page back to "Setup" even though a recovery wrap was
already stored — the vault wasn't actually in ZK yet, so re-running
setup wouldn't even error; the page just stopped matching the
server state, which was confusing.

handleSetupRecoveryCode + handleClearRecoveryCode now keep
hasRecoveryWrap in sync after the round trip.

Fail-quiet on getStatus error: if the network/auth/server-side fetch
fails, the page stays at the idle default. The user can still run
the setup flow, and any inconsistencies surface via the usual
server-side error responses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:19:49 +02:00
Till JS
56312ff579 feat(settings): phase 9 milestone 4 — zero-knowledge UI section
Adds the user-facing setup + management surface for the Phase 9
recovery code + zero-knowledge opt-in. Lives in
/settings/security between the Rotate and Honest-disclosure cards.

Four-step setup flow
--------------------
Step 1 — Generate
  Single button "Recovery-Code einrichten". Disabled unless the
  vault is currently unlocked. Clicks call vaultClient.setupRecoveryCode()
  which mints a fresh 32-byte secret, derives the wrap key, posts
  the sealed wrap to /recovery-wrap, and returns the formatted code.

Step 2 — Display + copy
  Shows the formatted code (1A2B-3C4D-...) in a monospace, user-
  selectable block with a 📋 Copy button. Explicit warning: "Wir
  zeigen ihn dir nur ein einziges Mal." User clicks "Ich habe den
  Code gesichert" to advance.

Step 3 — Confirm
  User has to type (or paste) the code back into a verification
  input. Comparison is case-insensitive and ignores dashes/whitespace
  on both sides so format jitter doesn't punish them. Mismatch shows
  a clear inline error and stays in the same step.

Step 4 — Activate
  Final danger confirmation: "Wenn du jetzt aktivierst, löscht der
  Server seine Kopie deines Schlüssels." Click → vaultClient.
  enableZeroKnowledge() → server NULLs out wrapped_mk + wrap_iv,
  state flips to 'enabled', generatedCode is wiped from the closure.
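The tolerant Step 3 comparison can be sketched as below (assumed normalization: strip dashes and whitespace, uppercase both sides before comparing):

```typescript
// Normalize user input so format jitter doesn't punish the user.
function normalizeRecoveryInput(s: string): string {
  return s.replace(/[\s-]/g, "").toUpperCase();
}

// Case-insensitive, dash/whitespace-insensitive comparison.
function codesMatch(typed: string, generated: string): boolean {
  return normalizeRecoveryInput(typed) === normalizeRecoveryInput(generated);
}
```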

Active state
------------
After enable, the section shows a green "Zero-Knowledge-Modus
aktiv" panel with a "Disable" button. Disabling needs an unlocked
vault (the cached MK bytes from the recovery-code unlock get sent
back to the server for KEK re-wrapping). Two-click confirmation
guards the destructive call.

State machine
-------------
zkSetupStep: 'idle' → 'generated' → 'confirming' → 'enabling' → 'enabled'
plus a `handleResetSetup` escape that clears the in-flight code +
input + error and drops back to 'idle' from any step.

Known limitation: the page state doesn't survive a reload — there
is no GET /encryption-vault/status endpoint yet to query the
server's current zero_knowledge flag, so on a fresh page load we
always start at 'idle' regardless of whether ZK is actually on.
A future commit will add the status endpoint + an onMount call to
hydrate zkSetupStep correctly. For now, the existing
'awaiting-recovery-code' badge from milestone 3 covers the lock-
screen path, and the dashboard sets the right initial state at
unlock time.

Status badge fix from milestone 3 (statusBadge() handling the new
'awaiting-recovery-code' variant) is reused here.

Styles
------
.zk-error      — light red bordered alert for inline errors
.zk-actions    — flex row of buttons (wraps on mobile)
.zk-step       — bordered group with the step heading
.recovery-code — monospace, user-select:all so click+copy works
.recovery-input — monospace input for the confirm step
.btn-ghost     — transparent border-less variant for "Abbrechen"

Dark-mode handling for the new surfaces is in the existing media
query block.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:03:35 +02:00
Till JS
6de01937cf feat(vault-client): phase 9 milestone 3 — recovery + zero-knowledge flows
Extends the browser-side vault client with five new methods that
mirror the server-side Phase 9 routes, plus a new
`awaiting-recovery-code` state that pauses the unlock mid-flow
when the server is in zero-knowledge mode.

VaultUnlockState gains a fourth variant
---------------------------------------
  | { status: 'awaiting-recovery-code' }

This is the state the client sits in between calling unlock()
(which received a recovery blob from GET /key) and the user typing
their recovery code into the UI. The settings page status badge
got updated to render this case as "🔑 Recovery-Code erforderlich".

New closure state inside createVaultClient
------------------------------------------
  - pendingRecoveryBlob: stash for the recovery wrap returned by
    GET /key in zero-knowledge mode. unlockWithRecoveryCode reads
    from here so the second round of input doesn't need a re-fetch.
  - cachedUnwrappedMkBytes: kept ONLY when the vault was unlocked
    via the recovery code path AND the user might want to disable
    zero-knowledge later (which needs to hand the MK back to the
    server for KEK re-wrapping). The standard unlock path leaves
    this null because the server already has the KEK wrap. Wiped
    on lock(), on disable success, and on any state transition
    that destroys the master key.

Modified existing methods
-------------------------
  - unlock(): branches on the response shape. If the server returns
    a recovery blob (zero-knowledge mode), stash it via
    awaitRecoveryCode() and return state='awaiting-recovery-code'.
    Otherwise unwrap as before. Same fork applies to the /init
    fallback path.
  - rotate(): if the server somehow returned a ZK shape (it should
    never — rotate is forbidden in ZK mode server-side), bail with
    a server error instead of silently misinterpreting bytes.
  - lock(): also clears pendingRecoveryBlob + wipes
    cachedUnwrappedMkBytes.

New methods (all wired into the returned VaultClient)
-----------------------------------------------------
  - setupRecoveryCode(): generates a fresh 32-byte recovery secret,
    derives the wrap key, re-fetches the active master key in
    extractable form, seals it, posts to /recovery-wrap, returns
    the formatted recovery code for the UI to display. Wipes both
    raw byte references after the seal. Caller is responsible for
    clearing the formatted string from memory once the user has
    confirmed they backed it up.

  - clearRecoveryCode(): DELETE /recovery-wrap. Server enforces the
    "not while ZK is active" rule.

  - enableZeroKnowledge(): POST /zero-knowledge { enable: true }.
    Maps RECOVERY_WRAP_MISSING server response to a clear "set up
    a recovery code first" client error.

  - disableZeroKnowledge(): POST /zero-knowledge { enable: false,
    masterKey: base64 }. Reads the cached MK bytes, base64-encodes,
    sends. Wipes the cache after success.

  - unlockWithRecoveryCode(code): completes the flow that started
    in unlock(). Parses the user-typed code (RecoveryCodeFormatError
    bubbles up if the shape is wrong), derives the wrap key, runs a
    single inline AES-GCM decrypt on the stashed blob (yields both
    the raw bytes for the cache AND a non-extractable runtime key
    for the provider), wipes raw bytes, transitions to 'unlocked'.

    Generic error message on failure ("wrong recovery code or
    corrupted vault") so an attacker can't distinguish wrong-code
    from tampered-blob. Stays in 'awaiting-recovery-code' on
    failure so the user can retry without a re-fetch.

Drive-by stale test fix
-----------------------
aes.test.ts carried a Phase 1 assertion that `tasks` and `events`
return null, written when both were still enabled:false. Phase 7.1 flipped
both tables on, so the assertion has been failing since that
commit. Replaced the test with a stable negative case
(non-existent table name) that doesn't shift with each rollout
phase.

Test results: 78/78 crypto tests pass after the fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 23:01:16 +02:00
Till JS
a55aae6cb5 chore(macmini): infra cleanup — compose env, blackbox mem, prometheus gpu probes
Three Mac Mini infrastructure follow-ups bundled:

1. docker-compose.macmini.yml — drop ghost backend env vars from
   the mana-app-web service (todo, calendar, contacts, chat, storage,
   cards, music, nutriphi `PUBLIC_*_API_URL{,_CLIENT}` plus the memoro
   server URLs). The matching consumers were removed in the earlier
   ghost-API cleanup commits, so these env entries had been wiring
   nothing into the running container for several deploys. Force-
   recreating mana-app-web after pulling this commit will pick up
   the slimmer env automatically.

2. docker-compose.macmini.yml — bump `mana-mon-blackbox` mem_limit
   from 32m to 128m. blackbox-exporter v0.25 sits north of 32m
   under load and was OOM-restart-looping every ~90 seconds, which
   in turn made `status.mana.how` and the prometheus probe metrics
   stale (since the scraper was missing every other window).

3. docker/prometheus/prometheus.yml — split `blackbox-gpu` into two
   jobs:
     - `blackbox-gpu` now probes `/health` via the http_health
       module, because the GPU services (whisper STT, FLUX image
       gen, Coqui TTS) return 401/404 on `/` by design (auth or
       API-only). The previous http_2xx-on-`/` probe was reporting
       all four as down even though they answered `/health` with
       200, which inflated the down count on status.mana.how.
     - `blackbox-gpu-root` keeps the http_2xx-on-`/` probe for
       Ollama, which has no `/health` endpoint but does answer
       2xx on its root.
   Both jobs share the same blackbox-exporter relabel rewrite so
   the targets are routed through the exporter container, not
   scraped directly by VictoriaMetrics.

Verified post-fix: status.mana.how reports 41/42 services up (only
`gpu-video` remains down — LTX Video Gen is intentionally not
deployed yet on the Windows GPU box).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:59:38 +02:00
Till JS
4cfa869f33 docs: PRE_LAUNCH_CLEANUP.md — what we removed before launch and why
Companion document to the pre-launch cleanup commits. Describes every
piece of legacy/dead/deprecated scaffolding that was removed while the
system still has no live users — the cheapest moment to do it.

Each entry follows a fixed shape:
  - What was there
  - Why it had to happen pre-launch (the user-facing risk if done later)
  - What concretely changed
  - LOC / size impact

Thirteen entries land with this commit:

  1. Schema v1–v10 collapsed into a single db.version(1)
  2. setApplyingServerChanges() deprecated shim removed
  3. LocalLabel @deprecated alias renamed to TaskTag
  4. labelsStore backward-compat alias removed
  5. $lib/stores/tags.svelte.ts re-export shim removed
  6. EMOJI_TO_ICON_MAP legacy data-migration fallback removed
  7. useAllEvents() unused calendar query removed
  8. Cross-app search providers lazy-loaded
  9. Bundle analysis findings (web-llm route-isolated, no further work)
 10. Production restoration — 2026-04-07 outage postmortem
 11. Eighteen broken subdomains triaged — 16 fixed, 2 follow-ups
 12. Memoro server detached from mana.how stack
 13. Ghost backend API hostnames removed (12 hostnames + clients)

Plus a "How to add an entry" template for future cleanups.

The two open follow-ups are documented with concrete manual-fix
instructions:
  - stt-api / tts-api 502 — needs Cloudflare Zero Trust dashboard
    cleanup of stale Public Hostname mappings on an old tunnel.
  - gpu-video.mana.how — LTX video generation, planned but not yet
    deployed on the Windows GPU box.

Once the system has launched this document becomes historical and
should not be edited further — new pre-launch cleanups won't be a
thing anymore by definition.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:32:14 +02:00
Till JS
85e38176d8 chore(macmini/scripts): runbook hardening — status diff + ingress walk
Two failures during the 2026-04-07 production outage triage were caused
not by the underlying outage but by `status.sh` and `health-check.sh`
hiding the broken state. Both scripts hardened so the same outage
shape can't reoccur invisibly.

status.sh — compose-vs-running diff
  The old script printed "X containers running / Y total" without
  noticing that some compose-defined containers were never started in
  the first place. The Mac Mini was running 37 of 42 declared
  containers and the script reported "37 running" with no indication
  of the gap — `mana-core-sync` and `mana-api-gateway` were silently
  missing for hours.

  New behaviour: read every service from `docker compose config`,
  diff its `container_name` against `docker ps`, and report each
  declared service whose container is not currently up. The same
  outage state would have been flagged on the very first run.

health-check.sh — public-hostname walk via Cloudflare DNS
  The old script probed ~50 hardcoded `localhost:<port>/health`
  endpoints across Chat, Todo, Calendar, etc. — but the per-app
  HTTP backends those endpoints expected don't exist anymore (the
  ghost-API cleanup removed them entirely). Every probe returned
  HTTP 000 / connection refused, generating a wall of false-positive
  alerts that drowned out the real signal.

  The block was replaced with a dynamic walk of every `hostname:`
  entry in `~/.cloudflared/config.yml`. Each hostname is probed via
  the public Cloudflare tunnel, so DNS gaps, missing tunnel routes,
  502/530 origin failures and timeouts surface as failures the same
  way real users would experience them. On its first run after the
  cleanup it surfaced eighteen previously-invisible hostname failures
  (no DNS, 502, or 530) — every one of them a real production issue.

  DNS resolution intentionally goes through `dig +short HOST @1.1.1.1`
  instead of the local resolver. The Mac Mini's home-router DNS keeps
  a negative cache for hours after the first failed lookup, so newly
  added CNAMEs (like the post-outage sync/media records) appeared as
  "no response" from inside the script for hours even though external
  users saw them resolve immediately. Asking Cloudflare's DNS directly
  gives the script the same view the public internet has.

  The Matrix, Element, GPU-LAN-redundant and monitoring port-by-port
  blocks were removed — the public-hostname walk covers all of them
  via their `*.mana.how` hostnames going through the actual tunnel.

  The "stuck container" detector now ignores `*-init` containers
  (one-shot init pods, Exit 0 = success, intentionally never re-run).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:31:53 +02:00
Till JS
a94abd37e0 chore(macmini): pin COMPOSE_PROJECT_NAME=manacore-monorepo
The Mac Mini's existing containers were originally created under the
project name `manacore-monorepo` (from the historical directory name)
but the current checkout lives in `mana-monorepo`. Without an explicit
pin, every `docker compose up` from this directory spawned a SECOND
project, creating duplicate containers and silent volume conflicts.
The 2026-04-07 outage recovery had to pass `-p manacore-monorepo`
manually for exactly this reason.

Pinning the name in `.env.macmini.example` (which is checked in)
means any fresh checkout that copies it to `.env.macmini` inherits
the right project name automatically. The pin is also live on the
production Mac Mini in `.env` and `.env.macmini` (untracked).

Removing this line WILL break the next deployment — the comment
in the file says so explicitly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:31:25 +02:00
Till JS
171fbd18be chore(mana/web): pre-launch module cleanup — schema collapse, dead code, lazy search
Seven independent pre-launch tidy-ups bundled because they all touch
the same module-layer surface and the larger commit reads more
clearly than seven adjacent two-line PRs.

1. database.ts schema v1–v10 collapsed into a single canonical
   db.version(1). The system has no live users yet, so dropping the
   versioned migration history is the cheapest moment to do it.
   The post-collapse Dexie table set is provably identical to the
   pre-collapse state (asserted by module-registry.test.ts).
   Removed: EMOJI_TO_ICON map + v2 upgrade, v3 timeBlocks data
   migration (~250 LOC of one-shot code), versions 4-10.
   Also dropped the @deprecated `setApplyingServerChanges()` shim
   (replaced by `beginApplyingTables()` weeks ago, no callers).

2. LocalLabel @deprecated alias renamed to TaskTag in the todo
   module and all 11 consumers (board-views, ListView, DetailView,
   QuickAddTask, +page.svelte). The alias was annotated @deprecated
   but had eleven live consumers — exactly the worst kind of dead
   code, the one that grows accidental new consumers via autocomplete
   the longer it stays. Renamed to TaskTag rather than `Tag` to
   avoid colliding with the `Tag` icon from `@mana/shared-icons`.

3. labelsStore backward-compat alias deleted from todo/stores —
   pure dead code with zero consumers.

4. EMOJI_TO_ICON_MAP fallback in habits/queries removed. The
   constant only existed as the in-memory equivalent of the v2
   schema migration that was just deleted; once no record can have
   the old `emoji` field, the fallback can never fire.

5. useAllEvents() in calendar/queries removed. JSDoc itself called
   it out as "for backward compatibility with calendar-specific
   views" — zero external consumers, only the barrel referenced it.

6. $lib/stores/tags.svelte.ts re-export shim deleted. It was a
   20-line pure re-export from @mana/shared-stores with the explicit
   header "for backward compatibility with existing imports".
   Thirteen importers (todo/calendar/contacts/places/zitare ListView
   + DetailView, plus +layout.svelte and the calendar/contacts/tags
   route +page.svelte files) rewritten to import directly.

7. SearchRegistry got `registerLazy(appId, loader)` and the eleven
   per-app providers now register via dynamic `import()`. Spotlight
   search is opened on demand, so the eleven provider chunks stay
   out of the initial JS bundle until the user actually searches.
   Sister benefit: a search filtered to a single appId only loads
   that one provider.

The structural backbone for all of this — the per-module
`module.config.ts` files plus `module-registry.{ts,test.ts}` — was
committed earlier in 5d4123d2b.
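The registerLazy mechanism from item 7 can be sketched as follows. The class and method names come from the commit; the rest of the API is an illustration, not the shipped implementation:

```typescript
type SearchProvider = { search(query: string): Promise<string[]> };
type ProviderLoader = () => Promise<SearchProvider>;

class SearchRegistry {
  private loaders = new Map<string, ProviderLoader>();
  private loaded = new Map<string, Promise<SearchProvider>>();

  // Register a loader; the provider chunk is not imported yet.
  registerLazy(appId: string, loader: ProviderLoader): void {
    this.loaders.set(appId, loader);
  }

  // Load a provider at most once, on first use.
  private load(appId: string): Promise<SearchProvider> {
    let p = this.loaded.get(appId);
    if (!p) {
      p = this.loaders.get(appId)!();
      this.loaded.set(appId, p);
    }
    return p;
  }

  // A search filtered to one appId loads only that provider.
  async search(query: string, appId?: string): Promise<string[]> {
    const ids = appId
      ? (this.loaders.has(appId) ? [appId] : [])
      : [...this.loaders.keys()];
    const providers = await Promise.all(ids.map((id) => this.load(id)));
    const results = await Promise.all(providers.map((p) => p.search(query)));
    return results.flat();
  }
}
```

In the real app the loaders are dynamic `import()` calls, so the bundler splits each provider into its own chunk.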

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:31:08 +02:00
Till JS
3a473897ec chore(mana/web): pre-launch cleanup — remove ghost backend API clients
Twelve `*-api.mana.how` Cloudflare hostnames (todo, calendar, contacts,
chat, storage, cards, music, picture, presi, zitare, clock, context)
plus their matching `lib/api/services/*.ts` clients still existed in
the unified web app even though the per-app HTTP backends had been
gone since the local-first migration. Their tunnel routes pointed at
ports nothing listened on, so every consumer call returned 502 — and
the corresponding `__PUBLIC_*_API_URL__` runtime variables were
silently injected into every page render.

The only live consumer was `qrExportService` (committed separately as
part of the rewrite to read directly from Dexie). Two admin / data-
management pages also imported the types but were already migrated
to the unified `adminService` / `myDataService` clients.

Removed:
- Twenty-four files deleted: the twelve `lib/api/services/*.ts`
  clients plus their `*.test.ts` siblings.
- `services/index.ts` collapsed from a thirteen-symbol re-export
  to just the four genuinely server-bound services
  (`adminService`, `landing`, `myDataService`, `qrExportService`).
- `hooks.server.ts` no longer reads or injects any of the twelve
  `__PUBLIC_*_API_URL__` runtime variables, and the CSP `connect-src`
  list shrank by the same amount. Memoro server URL also removed
  since the unified `memoro` module is fully local-first and never
  hit the standalone server (the docker-compose service stays
  defined for the mobile app).
- `routes/status/+page.server.ts` stops probing the dead per-app
  health endpoints — only `auth`, `sync`, `uload-server`, `media`
  and `llm` remain in the public status page.

The cloudflared tunnel ingress entries for these hostnames were also
removed in `~/.cloudflared/config.yml` on the Mac Mini (not in this
repo) so the formerly-502 responses now return 404 from the edge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:30:24 +02:00
Till JS
c27cb84f28 fix(mana/web): bundle rrule into SSR build to fix /calendar 500
`rrule@2.8.1` ships dual CJS/ESM builds but its `package.json` has no
`exports` field, so the SvelteKit Node adapter resolves it to the CJS
bundle at runtime. The named import `import { RRule } from 'rrule'`
then throws `SyntaxError: Named export 'RRule' not found` whenever
`/calendar` SSRs, which crashed every render of the route in production.

Adding `'rrule'` to `ssr.noExternal` forces Vite to bundle rrule into
the server output, where its CJS↔ESM interop layer handles the named
import correctly. The source files using rrule (`time-blocks/recurrence.ts`
and `calendar/components/CustomRecurrenceBuilder.svelte`) need no change.
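The fix amounts to one config line; a sketch of the relevant vite.config.ts fragment (the surrounding config is assumed, not the repo's actual file):

```typescript
import { sveltekit } from '@sveltejs/kit/vite';
import { defineConfig } from 'vite';

export default defineConfig({
  plugins: [sveltekit()],
  ssr: {
    // Bundle rrule into the server output so Vite's CJS/ESM interop
    // resolves the named `RRule` export at build time.
    noExternal: ['rrule'],
  },
});
```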

Surfaced via the rebuilt `health-check.sh` ingress walk after a
postgres restart cycle pushed mana-app-web into a 500 state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:29:55 +02:00
Till JS
f46d1328d8 feat(mana-auth): phase 9 milestone 2 — vault recovery wrap + zero-knowledge
Server-side support for the Phase 9 zero-knowledge opt-in. Adds the
recovery-wrap columns + four new vault operations + the routes that
expose them.

Schema (sql/003_recovery_wrap.sql)
----------------------------------
Adds to auth.encryption_vaults:

  - recovery_wrapped_mk    text                  (NULL until set)
  - recovery_iv            text                  (NULL until set)
  - recovery_format_version smallint NOT NULL DEFAULT 1
  - recovery_set_at        timestamptz
  - zero_knowledge         boolean NOT NULL DEFAULT false

Drops NOT NULL from wrapped_mk + wrap_iv (a vault in zero-knowledge
mode has no server-side wrap at all).

Three CHECK constraints enforce the invariant at the DB level so no
service bug can leave a vault in an inconsistent state:

  - encryption_vaults_has_wrap         — at least one of (wrapped_mk,
                                          recovery_wrapped_mk) is set
  - encryption_vaults_wrap_iv_pair     — ciphertext + IV are paired
                                          (both NULL or both set) on
                                          each wrap form
  - encryption_vaults_zk_consistency   — zero_knowledge=true implies
                                          wrapped_mk IS NULL AND
                                          recovery_wrapped_mk IS NOT NULL

If a code-level bug ever tried to enable ZK without a recovery wrap,
or to leave both wraps empty, Postgres would reject the UPDATE.
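For illustration, the three constraints translate to this TypeScript predicate — the real enforcement lives in Postgres, as described above:

```typescript
// Nullable wrap columns of auth.encryption_vaults, abbreviated.
interface VaultWrapColumns {
  wrappedMk: string | null;
  wrapIv: string | null;
  recoveryWrappedMk: string | null;
  recoveryIv: string | null;
  zeroKnowledge: boolean;
}

function vaultInvariantsHold(v: VaultWrapColumns): boolean {
  // has_wrap: at least one of the two wrap forms is set
  const hasWrap = v.wrappedMk !== null || v.recoveryWrappedMk !== null;
  // wrap_iv_pair: ciphertext + IV are both NULL or both set, per form
  const ivsPaired =
    (v.wrappedMk === null) === (v.wrapIv === null) &&
    (v.recoveryWrappedMk === null) === (v.recoveryIv === null);
  // zk_consistency: ZK implies no KEK wrap and a present recovery wrap
  const zkConsistent =
    !v.zeroKnowledge || (v.wrappedMk === null && v.recoveryWrappedMk !== null);
  return hasWrap && ivsPaired && zkConsistent;
}
```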

Drizzle schema (db/schema/encryption-vaults.ts)
-----------------------------------------------
Mirrors the migration: wrappedMk + wrapIv become nullable, the four
new columns added with the right defaults. Inline doc comment explains
the zero-knowledge fork.

Service (services/encryption-vault/index.ts)
--------------------------------------------
VaultFetchResult gains optional `requiresRecoveryCode` /
`recoveryWrappedMk` / `recoveryIv` so the route handler can serialize
the right shape. masterKey becomes Uint8Array | null (null in ZK mode).

Existing methods updated:
  - init: branches on row.zeroKnowledge — returns the recovery blob
    instead of an unwrapped MK if the user is already in ZK mode
  - getMasterKey: same fork, with audit context "zk-recovery-blob"
  - rotate: throws ZeroKnowledgeRotateForbidden in ZK mode (the server
    can't re-wrap a key it can't read). Also wipes any stale recovery
    wrap on rotation — the new MK has nothing to do with the old one,
    so the old recovery code would unwrap into garbage.

New methods:
  - setRecoveryWrap(userId, { recoveryWrappedMk, recoveryIv }, ctx)
    Stores (or replaces) the user's recovery wrap. Idempotent.
  - clearRecoveryWrap(userId, ctx)
    Removes the recovery wrap. Forbidden if ZK is active (would lock
    the user out) — throws ZeroKnowledgeActiveError → 409.
  - enableZeroKnowledge(userId, ctx)
    NULLs out wrapped_mk + wrap_iv, sets zero_knowledge=true. Requires
    a recovery wrap to already be present — throws
    RecoveryWrapMissingError → 400 otherwise. Idempotent on already-on.
  - disableZeroKnowledge(userId, mkBytes, ctx)
    Inverse: takes a freshly-unwrapped MK from the client, KEK-wraps
    it, stores as wrapped_mk, flips zero_knowledge=false. The client
    is the only entity that can supply the MK at this point, since
    the server can't decrypt the recovery wrap.

Three new error classes:
  - RecoveryWrapMissingError → 400 RECOVERY_WRAP_MISSING
  - ZeroKnowledgeActiveError → 409 ZK_ACTIVE
  - ZeroKnowledgeRotateForbidden → 409 ZK_ROTATE_FORBIDDEN

Audit action union extended with:
  - 'recovery_set' | 'recovery_clear' | 'zk_enable' | 'zk_disable'

Routes (routes/encryption-vault.ts)
-----------------------------------
GET /key + POST /init now share a serializeFetchResult helper that
returns either:
  - { masterKey, formatVersion, kekId }                 (standard)
  - { requiresRecoveryCode: true, recoveryWrappedMk,    (ZK mode)
      recoveryIv, formatVersion }
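A sketch of the serializeFetchResult fork over those two shapes (field names from the commit; the VaultFetchResult shape is abbreviated and base64 strings stand in for the server-side Uint8Array):

```typescript
interface VaultFetchResult {
  masterKey: string | null; // null in zero-knowledge mode
  formatVersion: number;
  kekId?: string;
  requiresRecoveryCode?: boolean;
  recoveryWrappedMk?: string;
  recoveryIv?: string;
}

function serializeFetchResult(r: VaultFetchResult): Record<string, unknown> {
  if (r.requiresRecoveryCode) {
    // ZK mode: hand the client the recovery blob instead of a key.
    return {
      requiresRecoveryCode: true,
      recoveryWrappedMk: r.recoveryWrappedMk,
      recoveryIv: r.recoveryIv,
      formatVersion: r.formatVersion,
    };
  }
  return { masterKey: r.masterKey, formatVersion: r.formatVersion, kekId: r.kekId };
}
```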

Three new routes:
  - POST   /recovery-wrap   — body: { recoveryWrappedMk, recoveryIv }
                              Stores the wrap. Validates both fields
                              are non-empty strings.
  - DELETE /recovery-wrap   — Removes the wrap. 409 if ZK active.
  - POST   /zero-knowledge  — body: { enable: boolean, masterKey?: base64 }
                              enable=true:  flip on (no body MK needed)
                              enable=false: flip off (MK required)
                              Validates the MK decodes to exactly 32 bytes.
                              Wipes the bytes after handing them to the
                              service.

POST /rotate now catches ZeroKnowledgeRotateForbidden → 409
ZK_ROTATE_FORBIDDEN so the client can show "disable zero-knowledge
first".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:05:49 +02:00
Till JS
2f48f867f1 feat(crypto): phase 9 milestone 1 — recovery code primitives
Foundation for the zero-knowledge opt-in. New crypto/recovery.ts
provides the user-held secret half of the Phase 9 design:

  - generateRecoverySecret() — 32 random bytes (256 bits) from Web
    Crypto CSPRNG
  - formatRecoveryCode() — renders raw bytes as 16 dash-separated
    groups of 4 uppercase hex chars: "1A2B-3C4D-5E6F-..." (79 chars
    total). Copy-pasteable, password-manager-friendly, no language
    dependency.
  - parseRecoveryCode() — tolerant inverse: strips whitespace + any
    dash placement, accepts mixed case, throws RecoveryCodeFormatError
    on wrong length / non-hex (no position-leaking errors)
  - deriveRecoveryWrapKey() — HKDF-SHA256 with empty salt + versioned
    info "mana-recovery-v1" → non-extractable AES-GCM-256 wrap key.
    HKDF (not PBKDF2/scrypt) because the input already has full 256
    bits of entropy — no slow KDF needed.
  - wrapMasterKeyWithRecovery() — exports the master key bytes,
    AES-GCM-encrypts with the recovery wrap key, returns base64
    ciphertext + IV ready for the server. Wipes the raw MK reference
    immediately after sealing.
  - unwrapMasterKeyWithRecovery() — inverse, returns a non-extractable
    CryptoKey. Throws uniformly on wrong code / tampered ciphertext —
    the UI maps both to "wrong recovery code" so an attacker gets no
    side-channel signal about which check failed.
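A simplified sketch of the format/parse pair described above — the shipped crypto/recovery.ts may differ in details:

```typescript
class RecoveryCodeFormatError extends Error {}

function formatRecoveryCode(secret: Uint8Array): string {
  if (secret.length !== 32) throw new RecoveryCodeFormatError("expected 32 bytes");
  const hex = Array.from(secret, (b) => b.toString(16).padStart(2, "0"))
    .join("")
    .toUpperCase();
  return hex.match(/.{4}/g)!.join("-"); // 16 groups of 4 + 15 dashes = 79 chars
}

function parseRecoveryCode(code: string): Uint8Array {
  const hex = code.replace(/[\s-]/g, "").toLowerCase();
  // One generic error for every malformed input — no position leakage.
  if (!/^[0-9a-f]{64}$/.test(hex)) throw new RecoveryCodeFormatError("invalid recovery code");
  const out = new Uint8Array(32);
  for (let i = 0; i < 32; i++) out[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  return out;
}
```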

Why hex over BIP-39?
  - No 2048-word wordlist to bundle (~17 KB even gzipped)
  - 32 random bytes have full 256 bits of entropy on their own — no
    checksum word needed because there's nothing to "validate"
  - Trivially copy-pasteable into any password manager, no language
    dependency, no autocomplete-confusing dictionary words
  - Survives autocorrect (no spaces)

22 tests in recovery.test.ts cover:
  - generation (length, randomness)
  - format (16 groups, uppercase, total 79 chars, wrong-length input)
  - parse (roundtrip, lowercase, whitespace, missing dashes, extra
    dashes, error cases, no position leakage)
  - key derivation (non-extractable, deterministic, wrong-length input)
  - wrap/unwrap roundtrip (with and without format/parse trip)
  - failure modes (wrong code, tampered ciphertext)
  - IV uniqueness (no reuse on repeated wraps)

This is the self-contained foundation. Server-side schema, vault
service extensions, vault-client wire-up and the settings UI all
build on these primitives in subsequent commits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:00:43 +02:00
Till JS
25aabc3f49 docs(audit): roll up Phase 7 + 8 in DATA_LAYER_AUDIT.md
The encryption rollout is complete. Updates the audit doc to reflect
the final state:

  - Encryption-Sprints table grows to Phase 1–8 with the four new
    commits (status roundup, 7.1 timeBlocks-coupled, 7.2 storeless,
    8 storage/picture/music/events)
  - Section 5 encrypted-tables list bumped from 14 to 25+ tables —
    adds tasks, calendar.events, timeBlocks, questions, answers,
    links, documents, meals, files, images, songs, mukkePlaylists,
    socialEvents, eventGuests
  - New "Bewusste Plaintext-Carve-Outs" subsection documents the
    structural fields kept plaintext on purpose (songs.artist for
    browsing aggregations, links.originalUrl for the public redirect
    handler, socialEvents decrypt-before-publish, files/images
    indexed columns where the index is now a no-op, etc.)
  - New "Tabellen ohne Encryption (bewusst)" subsection explains why
    manaLinks, boards, boardItems and the sync/system tables stay
    out of the registry
  - Backlog reordered: the three Phase 7 items are now done, only
    Phase 9 (recovery-code opt-in for true zero-knowledge),
    server-side image/file wrapping, and the boards edge case remain
  - "Test-Status" line + "Best Practices" line + "Eckdaten" line all
    bumped from 22 to 25+ tables

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 21:55:04 +02:00
Till JS
be611cd1ee feat(crypto): phase 8 — encrypt remaining tables (storage, picture, music, events, guests)
Closes the last sweep of registry entries that were stuck on
enabled:false. Each table is corrected to match the actual schema
fields, then flipped on with writers + readers wrapped.

Registry corrections + flips
----------------------------
  - files: was ['name','originalName','notes'] → ['name','originalName']
    LocalFile has no `notes` column. `name` IS indexed but no
    .where('name') call site exists in the app, so encryption is safe
    — the index just becomes a no-op for content lookups.
  - images: was ['prompt','negativePrompt','revisedPrompt','notes']
    → ['prompt','negativePrompt']. Neither revisedPrompt nor notes
    exists on LocalImage. `prompt` is indexed, same caveat as
    files.name.
  - songs: was ['title','artist','album','lyrics','notes']
    → ['title']. lyrics + notes don't exist; artist / album /
    albumArtist / genre stay PLAINTEXT so the album / artist / genre
    browsing views (which aggregate by those fields) don't have to
    decrypt the entire library on every render.
  - mukkePlaylists: kept ['name','description'], now flipped on
  - socialEvents: was ['title','description','notes']
    → ['title','description','location'] (no notes column; location
    is the actually sensitive third field)
  - eventGuests: was ['name','email','phone','notes']
    → ['name','email','phone','note'] (singular `note`, matching the
    schema)
  - manaLinks: REMOVED from registry entirely. Despite the name it's
    the cross-app foreign-key table — sourceAppId / sourceRecordId /
    targetAppId / targetRecordId — with zero user-typed content. The
    Phase 1 placeholder listed label/url/notes which don't exist.
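The corrections above all reduce to editing a per-table allowlist and flipping a boolean. A minimal sketch of that registry shape — the `RegistryEntry` type and helper are illustrative, not the actual Mana source; the entry values mirror the corrected allowlists above:

```typescript
// Hypothetical shape of ENCRYPTION_REGISTRY as described in this commit.
type RegistryEntry = { enabled: boolean; fields: string[] };

const ENCRYPTION_REGISTRY: Record<string, RegistryEntry> = {
  files: { enabled: true, fields: ["name", "originalName"] },
  images: { enabled: true, fields: ["prompt", "negativePrompt"] },
  songs: { enabled: true, fields: ["title"] }, // artist/album/genre stay plaintext for browsing
  mukkePlaylists: { enabled: true, fields: ["name", "description"] },
  socialEvents: { enabled: true, fields: ["title", "description", "location"] },
  eventGuests: { enabled: true, fields: ["name", "email", "phone", "note"] },
  // manaLinks: removed entirely — zero user-typed content
};

// "Flipping a table on" is just enabled:false → enabled:true once the
// allowlist matches the real schema.
function encryptedFields(table: string): string[] {
  const entry = ENCRYPTION_REGISTRY[table];
  return entry && entry.enabled ? entry.fields : [];
}
```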

Storage (files)
---------------
  - storage/stores/files.svelte.ts: renameFile encrypts diff before
    fileTable.update. Other store ops touch only metadata (favorite /
    isDeleted / parent) so they stay unwrapped.
  - storage/queries.ts: useAllFiles decrypts before sort
  - storage/ListView.svelte (Workbench): same decrypt-before-render
  - storage/views/DetailView.svelte (inline editor binds to plaintext)
  - cross-app-queries.useStorageStats: decrypts only the recent slice
    (totalSize stays cheap because it reads plaintext .size)
  - search/providers/storage: decrypts before substring scoring
  - storage/trash/+page.svelte: decrypts the visible deleted set

Picture (images)
----------------
  - No client-side .add for images — they arrive purely via sync, so
    no store-level encryption to add. Reads are wrapped:
  - picture/queries.ts: useAllImages, useArchivedImages, allImages$
  - picture/ListView.svelte (uses prompt as alt text)
  - cross-app-queries.useRecentImages (dashboard widget renders prompt)
  - search/providers/picture: decrypts before substring scoring
  Sync-applied plaintext rows coexist with locally-edited ciphertext
  rows without issue — decryptRecord is per-row idempotent on
  non-encrypted strings.
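That coexistence works because decryption is a pass-through on anything that isn't recognisably ciphertext. A sketch, assuming (hypothetically) that encrypted values carry a version prefix — the real wire format will differ:

```typescript
// Stand-in "crypto": real code would AES-GCM the value. The point here is
// only the marker check that makes decryption a no-op on plaintext.
const CIPHER_PREFIX = "enc:v1:";

function encryptValue(plain: string): string {
  return CIPHER_PREFIX + plain; // stand-in, NOT real encryption
}

function decryptValue(value: string): string {
  // Per-row idempotence: sync-applied plaintext rows fall through untouched.
  if (!value.startsWith(CIPHER_PREFIX)) return value;
  return value.slice(CIPHER_PREFIX.length);
}
```

Because `decryptValue` is a no-op on unmarked strings, a liveQuery can map decryption over a table that holds both server-pushed plaintext and locally-encrypted rows.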

Music (songs + playlists)
-------------------------
  - music/stores/library.svelte.ts: updateMetadata + insert encrypt
    diffs before write
  - music/stores/playlists.svelte.ts: create snapshots plaintext for
    the return value before encryptRecord mutates the row, update
    encrypts diff
  - music/queries.ts: useAllSongs decrypts before title sort,
    useAllPlaylists decrypts before name sort
  - music/ListView.svelte (Workbench)
  - music/views/DetailView.svelte (inline editor)
  - cross-app-queries.useMusicStats decrypts only the recent slice
  - search/providers/music decrypts songs + playlists before scoring

Events (social gatherings + guests)
-----------------------------------
This one needed careful handling because publishEvent is the
exception to the local-only confidentiality model — it intentionally
pushes the event content to a public RSVP page anyone with the link
can read.

  - events/stores/events.svelte.ts:
    - createEvent encrypts before .add
    - updateEvent encrypts the diff before .update
    - publishEvent + syncSnapshotIfPublished now DECRYPT the local row
      before forwarding to eventsApi.publish / .updateSnapshot — the
      server-side public snapshot needs plaintext, by design. The
      privacy contract is: drafts and unpublished events are
      encrypted at rest; the moment you publish, you accept that the
      content becomes readable via the share link.
  - events/stores/guests.svelte.ts: addGuest + updateGuest encrypt
    diff before write. Guests are NEVER pushed to the public
    snapshot, so no decrypt-before-publish path.
  - events/queries.ts: useAllEvents, useUpcomingEvents, usePastEvents,
    useEvent all decrypt the visible socialEvents rows before joining
    with timeBlocks. useGuestsByEvent + useEventGuests decrypt the
    eventGuests rows.
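The publish-path contract sketched in code — the table shape, `decryptRecord`, and `eventsApi` here are stand-ins, with a prefix marker in place of real encryption:

```typescript
type SocialEvent = { id: string; title: string; description: string };

// Stand-in field crypto so the sketch is self-contained.
const P = "enc:v1:";
const enc = (s: string) => P + s;
const dec = (s: string) => (s.startsWith(P) ? s.slice(P.length) : s);

const decryptRecord = (e: SocialEvent): SocialEvent => ({
  ...e,
  title: dec(e.title),
  description: dec(e.description),
});

// Fake API that records what would land on the public snapshot.
const published: SocialEvent[] = [];
const eventsApi = { publish: (e: SocialEvent) => { published.push(e); } };

function publishEvent(localRow: SocialEvent): void {
  // Local row is encrypted at rest; the public RSVP snapshot needs
  // plaintext BY DESIGN — publishing is the explicit opt-out.
  eventsApi.publish(decryptRecord(localRow));
}

publishEvent({ id: "e1", title: enc("Sommerfest"), description: enc("BYOB") });
```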

Phase 8 is the last big sweep. The registry is now ~25 tables on,
~3 left intentionally off (manaLinks because no user content;
boards / boardItems / dreamSymbols partially handled in earlier
phases). The "what's encrypted?" surface should look complete on
the settings/security page.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 21:44:18 +02:00
Till JS
40b7069eb0 feat(crypto): phase 7.2 — encrypt storeless modules (questions, links, documents, meals)
Four storeless modules (five tables) whose writes happen directly from view files
(no central store yet) get the same encryption treatment by wrapping
each .add/.update call site with encryptRecord and each read site
with decryptRecord(s). Registry entries are also corrected to match
the actual schemas — the previous Phase 1 placeholder names guessed
the wrong field names.

Registry corrections + flips
----------------------------
  - meals: was ['description', 'notes', 'aiAnalysis'] → now
    ['description', 'portionSize'] (LocalMeal has neither notes nor
    aiAnalysis in the schema; portionSize is a short user label with
    the same sensitivity as description)
  - documents: was ['title', 'content', 'body'] → now
    ['title', 'content'] (LocalDocument uses content, no body column)
  - links: was ['title', 'description', 'targetUrl'] → now
    ['title', 'description']. originalUrl STAYS PLAINTEXT — the
    public redirect handler resolves shortCode → originalUrl on every
    click, encrypting it would force the redirect path to do an async
    decrypt before issuing the 302
  - questions: was ['title', 'body', 'notes'] → now
    ['title', 'description'] (LocalQuestion uses description)
  - answers: was ['body'] → now ['content'] (LocalAnswer uses content)

All five tables flipped to enabled:true.

Write sites wrapped
-------------------
Each call site builds the row/diff as a typed object, runs
encryptRecord on it, then calls table.add / table.update:

  - questions/views/DetailView.svelte (saveField)
  - questions/[id]/+page.svelte (saveEdit + answer.add)
  - questions/new/+page.svelte (initial create)
  - uload/+page.svelte (createLink + saveEdit)
  - uload/views/DetailView.svelte (saveField)
  - context/documents/+page.svelte (handleCreateDocument)
  - context/documents/[id]/+page.svelte (handleSave with encrypted diff)
  - context/spaces/[id]/+page.svelte (handleCreateDocument)
  - nutriphi/add/+page.svelte (handleSubmit)

Pure metadata writes (toggle pinned, toggle isActive, soft-delete via
deletedAt) are intentionally NOT wrapped — they touch zero encrypted
fields so encryptRecord would be a no-op anyway.
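What "wrapping a call site" looks like in practice — a sketch with a toy in-memory table and a prefix marker standing in for real encryption (the actual code writes through Dexie with the vault key):

```typescript
type Question = { id: string; title: string; description: string; pinned?: boolean };

const P = "enc:v1:"; // stand-in ciphertext marker, NOT real encryption

// Encrypt only the allowlisted string fields of a row/diff.
function encryptRecord<T extends object>(row: T, fields: string[]): T {
  const out = { ...row } as Record<string, unknown>;
  for (const f of fields) {
    if (typeof out[f] === "string") out[f] = P + (out[f] as string);
  }
  return out as unknown as T;
}

// Toy table standing in for Dexie's table.add / table.update.
const rows = new Map<string, Question>();
const questionTable = {
  add: (q: Question) => { rows.set(q.id, q); },
  update: (id: string, diff: Partial<Question>) => {
    rows.set(id, { ...rows.get(id)!, ...diff });
  },
};

// Wrapped write: build the typed object, encrypt, then hand it to the table.
questionTable.add(
  encryptRecord(
    { id: "q1", title: "Why birds?", description: "long text" },
    ["title", "description"],
  ),
);

// Pure metadata write: touches no encrypted field, so no wrapper needed.
questionTable.update("q1", { pinned: true });
```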

Read sites decrypted
--------------------
  - questions/queries.ts: useAllQuestions, useAnswersByQuestion
  - questions/views/DetailView.svelte (liveQuery clone)
  - questions/ListView.svelte (Workbench)
  - uload/queries.ts: allLinks$, useAllLinks, useLinkById
  - uload/views/DetailView.svelte (liveQuery clone)
  - uload/ListView.svelte
  - uload/settings/+page.svelte (decrypts before serializing the
    JSON export — otherwise the user would download ciphertext)
  - context/queries.ts: useAllDocuments, useSpaceDocuments
  - context/ListView.svelte
  - cross-app-queries.useRecentDocuments (dashboard widget)
  - nutriphi/queries.ts: useAllMeals
  - nutriphi/ListView.svelte

The cards/dashboard widget for nutrition only reads m.nutrition (the
plaintext numeric breakdown), so it stays untouched. nutriphi/history
benefits transparently because it consumes useAllMeals which now
decrypts.

Why
---
Closes the second-tier plaintext gaps. The five tables flipped here
were on the registry from day one but stuck behind enabled:false
because no central store existed to hook into. Phase 7.2 takes the
pragmatic approach of wrapping at each call site rather than blocking
on a store extraction refactor — same end result for security, much
smaller diff. A future store consolidation pass can collapse the
duplication without changing the encryption surface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 21:29:32 +02:00
Till JS
c875b4e966 feat(crypto): phase 7.1 — encrypt timeBlocks-coupled tasks + calendar events
Flips three coordinated registry entries to enabled:true at once:

  - tasks: title, description, subtasks, metadata
  - events (calendar): title, description, location
  - timeBlocks: title, description (NEW entry)

These three tables have to move together because the consumer modules
(todo, calendar) denormalize their title/description into a TimeBlock
for cheap calendar rendering. Encrypting only the source records would
still leak the same fields through the timeBlocks hub. Indexed columns
(startDate, endDate, kind, type, sourceModule/sourceId, parentBlockId,
recurrenceDate, isLive, isCompleted, dueDate, priority) all stay
plaintext — the calendar query layer needs them for range scans.

Service layer
-------------
- time-blocks/service.ts: createBlock + updateBlock now route through
  encryptRecord before the Dexie write. startFromScheduled decrypts the
  scheduled block first so the new logged block carries plaintext
  forward instead of an already-encrypted blob (encryptRecord is
  idempotent so this is also defence-in-depth). New decryptBlock helper
  for callers that need plaintext outside a liveQuery.
- todo/stores/tasks.svelte.ts: createTask snapshots the plaintext task
  before encryptRecord mutates it, returns the snapshot to the UI.
  updateTask decrypts the existing row before forwarding task.title as
  a fallback into updateBlock (would otherwise leak ciphertext to the
  linked TimeBlock). updateLabels + updateSubtasks decrypt-merge-encrypt
  so structured fields don't get spliced into a ciphertext blob.
- calendar/stores/events.svelte.ts: encryptRecord wrapped around all
  four event-write paths (create, update, updateSingleInstance,
  updateAllFuture).
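The decrypt-merge-encrypt step for structured fields, sketched — the task shape and helpers are illustrative, with a prefix marker in place of real crypto. Splicing new entries into a ciphertext blob would corrupt it, so the merge has to happen in plaintext:

```typescript
// subtasks stored as a JSON string, encrypted as one field
type Task = { id: string; title: string; subtasks: string };

const P = "enc:v1:"; // stand-in ciphertext marker
const enc = (s: string) => P + s;
const dec = (s: string) => (s.startsWith(P) ? s.slice(P.length) : s);

function updateSubtasks(row: Task, newSubtask: string): Task {
  // 1. decrypt the stored structured field
  const list: string[] = JSON.parse(dec(row.subtasks));
  // 2. merge in plaintext
  list.push(newSubtask);
  // 3. re-encrypt before the write
  return { ...row, subtasks: enc(JSON.stringify(list)) };
}

const stored: Task = {
  id: "t1",
  title: enc("Renew passport"),
  subtasks: enc('["book appointment"]'),
};
const updated = updateSubtasks(stored, "gather documents");
```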

Read paths
----------
Every liveQuery / one-shot read that surfaces title/description/
location through the UI now decrypts after the plaintext-metadata
filter:

  - time-blocks/queries.ts: useAllTimeBlocks, timeBlocksInRange$,
    timeBlocksBySource$, useLiveTimeBlock
  - todo/queries.ts: useAllTasks
  - calendar/queries.ts: useAllCalendarItems (decrypts both the blocks
    and the joined events)
  - cross-app-queries.ts: useOpenTasks, useTodayTasks, useUpcomingTasks,
    useUpcomingEvents
  - dashboard widgets: DayTimelineWidget, ActivityFeedWidget,
    TasksTodayWidget, UpcomingEventsWidget
  - search providers: todo + calendar (substring scoring needs
    plaintext)
  - quick-input adapters: todo + calendar (search-as-you-type)
  - calendar/components/ConflictWarning, CalendarHeader (iCal export
    embeds title in the file)
  - calendar/views/DetailView, todo/views/DetailView (inline editor)
  - api/services/qr-export (the QR snapshot would otherwise ship
    ciphertext)
  - triggers/suggestions (cross-matches habit titles against task /
    event titles)
  - todo/reminder-source (notification body uses task title)
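The "decrypt after the plaintext-metadata filter" ordering those read paths share can be sketched like this (shapes illustrative; the marker stands in for real crypto). The indexed plaintext column does the cheap range scan, and decryption only runs over the survivors:

```typescript
type TimeBlock = { startDate: number; title: string };

const P = "enc:v1:"; // stand-in ciphertext marker
const dec = (s: string) => (s.startsWith(P) ? s.slice(P.length) : s);

function blocksInRange(all: TimeBlock[], from: number, to: number): TimeBlock[] {
  return all
    .filter((b) => b.startDate >= from && b.startDate <= to) // indexed plaintext scan
    .map((b) => ({ ...b, title: dec(b.title) })); // decrypt only the visible slice
}

const day = blocksInRange(
  [
    { startDate: 10, title: P + "Standup" },
    { startDate: 99, title: P + "Dentist" },
  ],
  0,
  50,
);
```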

Habits is implicitly covered: it only writes through createBlock /
updateBlock and only reads block.startDate from the timeBlock side, so
no per-store changes were needed for habits to participate.

Why
---
This closes the last big plaintext gap on the dashboard. tasks +
events + the timeBlocks hub were the highest-value targets after chat
+ contacts because they're the surfaces a casual observer of an
unlocked DB would scan first ("what's this person doing today?"). With
Phase 7.1, the answer to that query is opaque without the master key.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 20:37:59 +02:00
Till JS
4bdf4238ce docs(mana/web): roundup data layer audit through encryption phase 6
Updates DATA_LAYER_AUDIT.md to reflect everything that landed since
the last refresh (which stopped at Sprint 4). The doc is now the
authoritative status surface for both audit-sprint and encryption-
sprint progress.

What's new in the doc:

  Status table (Section 0)
    Adds the missing post-Sprint 4 work and the full encryption phase
    table:
      - Sprint 4+ Listeners (575c5c36f)
      - Test-Fix sprint (ae648650e)
      - Backlog 1/2/3 — Indexed queries V9, SSE pipeline, Activity log
      - Encryption phases 1-6 with commits
    The "tests passing" line bumps to 262/262 across 20 files.

  Architecture diagram (Section 1)
    Shows how a write now flows through encryptRecord BEFORE the
    Dexie hook, and how reads route through decryptRecords on the
    way out of liveQuery. Adds a second diagram for the Encryption
    Pipeline (login → vault unlock → MemoryKeyProvider → wrap/
    unwrap → IndexedDB) that wasn't documented anywhere before.

  File map (Section 1)
    Splits into "Datenschicht" and "Encryption" sub-tables. The
    encryption table lists all 17 new files across crypto/, mana-auth
    services, the settings page and the onboarding banner with a
    one-line purpose for each.

  Eckdaten
    Schema versions 1-10 (was 1-7), and the new "At-Rest-Encryption"
    bullet noting 22+ tables.

  Critical fixes table (Section 2 🔴)
    #4 "Keine Verschlüsselung im Browser" flips from "noch offen" to
    "Encryption Phase 1-6" with the one-line summary.

  🟢 backlog status table
    #13 SSE buffer flips to done via Backlog 2.
    #14 Tombstone cleanup loop flips to done via Sprint 4+.
    #18 Activity log flips to done via Backlog 3.

  New Section 5 — Encryption Pipeline
    Documents the trust model end-to-end:
      - Where each piece lives (mana-auth env KEK, wrapped MK in
        encryption_vaults, browser sessionStorage, IndexedDB blobs)
      - The complete table-by-table list of WHAT is encrypted and
        WHAT stays plaintext, with the per-table reasoning for the
        plaintext exceptions (dreamSymbols.name for indexed lookup,
        cycleDayLogs.symptoms for Set-diff, inventar.invItems.name
        for index, etc.)
      - "Was Mana technisch (nicht) sehen kann" — three-level honest
        disclosure: never / theoretically / structurally

  Section 6 — Backlog
    Reorders by remaining encryption work first:
      1. Phase 7 cross-module title coverage (timeBlocks coupling)
      2. Phase 7 server-pushed records (picture/storage/music)
      3. Phase 7 storeless modules (nutriphi/uload/context/questions)
      4. Phase 8 recovery code opt-in for true zero-knowledge
      5. Conflict viz UI
      6. Composite indexes for multi-account
      7. V3 migration tests

  Stärken (Section 7)
    Adds the encryption-specific properties: a dedicated crypto/
    sub-module decoupled from the sync layer, the vault singleton via
    vault-instance.ts, and the "Vertraulichkeit" (confidentiality)
    dimension added to the final tagline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 20:03:43 +02:00
Till JS
28395b313d docs: GPU tunnel setup, STT env wiring, and 2026-04-07 postmortem
Three docs updates landing the institutional knowledge from today's
Memoro voice recording deploy:

- docs/MAC_MINI_SERVER.md: architecture diagram updated to show the
  two-tunnel setup (cloudflared on the Mac Mini for *.mana.how
  except gpu-*, plus a separate cloudflared running as a Windows
  Service on the GPU box for gpu-*.mana.how). New "GPU Tunnel
  (mana-gpu-server)" section explains how to add hostnames in the
  Cloudflare dashboard, the standard 502 debug ladder (DNS misroute,
  service stopped, scheduled task crashed, missing public hostname),
  and how the API key flows from the Windows .env through Mac Mini
  .env to the mana-web container.

- docs/ENVIRONMENT_VARIABLES.md: STT section updated to reflect that
  MANA_STT_URL/API_KEY are now wired into the mana-web container via
  docker-compose.macmini.yml (committed in 42bd2a3a0). Health-check
  command added; cross-link to MAC_MINI_SERVER.md for the debug ladder.

- docs/POSTMORTEM_2026-04-07.md (new): full incident timeline of
  today's deploy. Six root causes (tunnel never started, DB wiped
  without re-push, untracked module-registry files, uncommitted
  Dockerfile heap bump, missing compose env vars, /offline prerender
  500). Three "what went poorly" honest assessments (premature P0
  alarm, miscounted commits, clumsy stash dance). Action items split
  by priority — high priority is the clean-clone build CI job, which
  would have caught half the issues today.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 19:59:04 +02:00
Till JS
6b8e2c7176 feat(mana/web): encryption phase 6.2/6.3 — settings page + onboarding banner
Two user-facing surfaces for the encryption pipeline that's been
running invisibly since Phase 4. Closes the loop on "we encrypt
your data" by making the claim concrete, verifiable, and rotatable.

vault-instance.ts (new)
  Lazy-singleton wrapper around createVaultClient. The root layout
  was holding a private vault client reference; the settings page
  needs the same instance to call rotate() and read state.
  getVaultClient() builds it on first call from authStore +
  getManaAuthUrl(), reuses it forever after. Phase 3's
  setKeyProvider/getActiveKey wiring means the rest of the data
  layer doesn't need to know about the singleton at all — only
  callers that want to drive lock/unlock/rotate explicitly do.

  +layout.svelte and the new settings/security page both call
  getVaultClient() — the underlying MemoryKeyProvider is shared
  via setKeyProvider, so an unlock from either surface immediately
  reflects in both.
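The lazy-singleton wrapper, sketched — `VaultClient` and `createVaultClient` here are stand-ins for the real module's exports; the first call constructs, every later call reuses the same instance, so both surfaces see shared state:

```typescript
type VaultClient = { locked: boolean; unlock: () => void };

let constructed = 0; // instrumentation for the sketch only
function createVaultClient(): VaultClient {
  constructed += 1;
  const client: VaultClient = {
    locked: true,
    unlock() {
      client.locked = false;
    },
  };
  return client;
}

let instance: VaultClient | null = null;
function getVaultClient(): VaultClient {
  // Build on first call (from authStore + getManaAuthUrl() in the real
  // code), reuse forever after.
  if (!instance) instance = createVaultClient();
  return instance;
}
```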

routes/(app)/settings/security/+page.svelte (new)
  Surface for the encryption vault state. Four sections:

    1. STATUS card with a coloured badge:
       - 🔒 Verschlüsselt (green) when unlocked
       - 🔓 Gesperrt (amber) when locked, plus a "Schlüssel jetzt
         laden" button that calls vaultClient.unlock()
       - error states distinguish auth/network/server with
         localised copy and a retry button

       A 1-second poll mirrors external lock/unlock events
       (logout, manual lock from another tab) so the badge stays
       fresh without a hard refresh. Disposed on unmount.

    2. ENCRYPTED FIELDS list — derived from the registry:
       Object.entries(ENCRYPTION_REGISTRY).filter(enabled).map(...)
       Renders one row per table with the field allowlist visible
       in monospace, plus a count summary at the top. The list is
       always honest: if a registry entry is enabled:false (Phase 7
       targets, server-pushed tables, etc.), it does not appear.

    3. ROTATE card (danger styling):
       Two-step confirm before mutating. Calls vaultClient.rotate()
       which the existing Phase 3 wire already routes through
       /api/v1/me/encryption-vault/rotate. Toast on success/failure.
       Explicitly documents that the old MK is GONE and current
       data is NOT auto-re-encrypted — the user accepts that risk.

    4. HONEST DISCLOSURE section: lists what Mana CAN'T see
       (encrypted blobs), what Mana COULD technically see
       (the wrapped MK if a hosting employee actively reaches for
       the KEK), and what's structurally visible (counts,
       timestamps, relationships). Reads better than any policy
       page because it's anchored in the actual data layout.
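The "always honest" property of the encrypted-fields list falls out of deriving it from the registry rather than hand-maintaining copy. A sketch with illustrative registry contents:

```typescript
type RegistryEntry = { enabled: boolean; fields: string[] };

// Illustrative entries, not the real registry.
const ENCRYPTION_REGISTRY: Record<string, RegistryEntry> = {
  chatMessages: { enabled: true, fields: ["content"] },
  contacts: { enabled: true, fields: ["name", "email", "notes"] },
  tasks: { enabled: false, fields: ["title", "description"] }, // Phase 7 target
};

// One row per enabled table; disabled entries simply never render,
// so the page can't over-claim.
const rows = Object.entries(ENCRYPTION_REGISTRY)
  .filter(([, entry]) => entry.enabled)
  .map(([table, entry]) => ({ table, fields: entry.fields.join(", ") }));
```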

EncryptionIntroBanner.svelte (new)
  One-time onboarding banner that fires on the first vault unlock
  ever on a given device. Uses the localStorage key
  'mana-encryption-intro-dismissed' as the persistent flag. Shows a
  green-bordered card
  bottom-centre explaining at-rest encryption in three sentences,
  with a "Mehr erfahren →" link to /settings/security and an X
  dismiss button.

  Why a banner instead of a toast?
    - Toasts disappear after 3s; a privacy claim deserves longer
      attention.
    - The banner has room for a learn-more link; toasts don't.
    - Dismissing it is an explicit user action, which matches the
      "you understand and accept" social contract.

  Polls vault state every 500ms for up to 30s after mount so it
  fires even if the unlock happens asynchronously after the layout
  finishes rendering. Auto-clears the timer once it shows or after
  the 30s window. SSR-safe: localStorage access is guarded.

  Mounted globally in the root layout next to the existing
  SuggestionToast, OfflineIndicator, PwaUpdatePrompt.

Layout integration
  routes/+layout.svelte:
    - Drops the inline createVaultClient + getManaAuthUrl import
      in favour of getVaultClient() — single source of truth.
    - <EncryptionIntroBanner /> mounted alongside the other
      global UI elements.

Verified: 20 test files, 262/262 tests passing. Pre-existing
TS error in src/routes/(app)/settings/+page.svelte:338
(getSecurityEvents on authStore) is unrelated parallel drift.

Encryption pipeline status: Phase 1-6 complete.
  - 22 tables encrypted at rest covering >85% of user-typed bytes
  - Server-side master key vault with KEK-wrapping (mana-auth)
  - Vault unlock on login, lock on logout
  - Per-record encryptRecord/decryptRecord through every store
  - Settings UI showing status + rotate
  - First-login onboarding banner

Remaining for a hypothetical Phase 7:
  - tasks/calendar.events/habits — title leakage via timeBlocks
  - picture/storage/music — server-pushed, needs API encryption
  - nutriphi/uload/context.documents/questions — store extraction
    needed before they can flow through encryptRecord
  - Recovery code opt-in for true zero-knowledge users (server
    can't even technically decrypt)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 19:54:09 +02:00
Till JS
de33ed8687 fix(mana/web): disable prerender on /offline (FIXME)
The SvelteKit prerender worker throws "Error: 500 /offline" with no
usable stack trace, blocking the production build. Suspected cause: a
module-level side-effect on the shared layout that fails when no
`window` is available — likely from one of the new vault-client or
data-layer-listeners imports that landed in the encryption phase 4-6
sprints.

SSR'ing /offline at request time is harmless — it's just a static
"you're offline" message — so this is a safe workaround that unblocks
the deploy. The real fix is to bisect which import on the offline
codepath throws on the bare server and add a `typeof window` guard
or move it to onMount.
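The proposed real fix, sketched — guard the browser-only access so the module survives import on the bare prerender worker (the key name and function are hypothetical):

```typescript
// Tell TS about the maybe-absent browser global without pulling in lib.dom.
declare const window:
  | { sessionStorage: { getItem(k: string): string | null } }
  | undefined;

// A module-level side effect like sessionStorage.getItem(...) throws where
// no window exists. Guarding it makes the module importable anywhere and
// defers the browser API to runtime.
function readSessionKey(): string | null {
  if (typeof window === "undefined") return null; // SSR / prerender worker
  return window.sessionStorage.getItem("mana-master-key");
}
```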

Without this, the unified mana-web image cannot be rebuilt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 19:50:32 +02:00
Till JS
5d4123d2b0 fix(mana/web): commit module-registry + module.config.ts files (build-critical)
These files have been sitting untracked in working trees on multiple
machines since the unified module-registry refactor. database.ts
imports from $lib/data/module-registry but the file itself was never
git-add'd, so the production build crashes on any clean clone with:

    Could not resolve "./module-registry" from "src/lib/data/database.ts"

Discovered today during the first deploy of the Memoro recording
pipeline: pulling onto the Mac Mini (which had its own untracked copies
of these files in a stash) revealed that origin/main has been silently
broken for clean builds. Fixed by committing the canonical versions:

  - apps/mana/apps/web/src/lib/data/module-registry.ts
  - apps/mana/apps/web/src/lib/data/module-registry.test.ts
  - apps/mana/apps/web/src/lib/modules/{31 modules}/module.config.ts

The events module already had its module.config.ts committed in
6a60e22a3 (events Phase 2), so it isn't included here.

Also bumps apps/mana/apps/web/Dockerfile build heap from 4096 → 8192:
the unified app outgrew the 4 GB ceiling somewhere between Sprint 2
and Sprint 3 of the data layer rewrite, and Vite OOMs while bundling
all 32 module chunks. The bump existed locally on multiple boxes but
was never committed; today's deploy hit the OOM and required restoring
the bump from a stash to make the image rebuild succeed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 19:49:58 +02:00