The custom /api/v1/auth/login route signs the user in via the
better-auth SDK (auth.api.signInEmail) and then forges a request to
/api/auth/token to mint a JWT, passing the session token as a synthetic
cookie header.
The cookie name was hardcoded as `mana.session_token=...`, but in
production better-auth issues the session cookie with the __Secure-
prefix (because secure: true is enabled). Get-session middleware on the
/api/auth/token side couldn't find the session under the unprefixed
name, so it returned 401 silently. Result: tokenResponse.ok was false,
the route fell through, and the response had no `accessToken` field at
all — only the bare { token, user, redirect } from signInEmail.
The frontend in @mana/shared-auth then picked this up as
`data.accessToken === undefined` and stored undefined as the JWT, while
the parallel /api/auth/sign-in/email call masked the visible damage by
setting the SSO cookie. So login *appeared* to work in the browser
(cookie present, session worked) but the JWT path was always broken.
Fix: pick the cookie name based on config.nodeEnv. In production use
__Secure-mana.session_token, in development use mana.session_token (no
__Secure- prefix because secure: false in dev).
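A minimal sketch of the fix, assuming the route reads config.nodeEnv (function names here are illustrative, not the actual route code):

```typescript
// Illustrative sketch: better-auth adds the __Secure- prefix to the session
// cookie when secure: true is enabled, which only happens in production.
function sessionCookieName(nodeEnv: string): string {
  return nodeEnv === "production"
    ? "__Secure-mana.session_token"
    : "mana.session_token";
}

// The synthetic Cookie header forged for the internal /api/auth/token call:
function sessionCookieHeader(nodeEnv: string, sessionToken: string): string {
  return `${sessionCookieName(nodeEnv)}=${sessionToken}`;
}
```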
Verified end-to-end on auth.mana.how:
POST /api/v1/auth/login → response now includes accessToken (a real
JWT, EdDSA, with sub/email/role/sid/tier/iss/aud claims), refreshToken
(the session token), plus the original signInEmail fields.
The other /api/auth/get-session call sites in this file forward the
incoming request headers verbatim, so they preserve whatever real cookie
the browser sent and don't have this bug.
mana-auth has been crash-looping in production with:
error: Cannot find package 'nanoid' from
'/app/src/services/encryption-vault/index.ts'
The encryption-vault service imports nanoid for audit row IDs (line 27,
used at line 547 in the audit log writer), but nanoid was never added
to services/mana-auth/package.json. The import was introduced in commit
e9915428c (phase 2 — server-side master key custody) and slipped past
because nanoid happens to exist transitively in the workspace via
postcss → nanoid@3.3.11. Local pnpm store lookups would resolve it just
fine; a strict isolated container build can't.
Fix:
- Add "nanoid": "^5.0.0" to services/mana-auth/package.json deps
- pnpm install pulled nanoid@5.1.7 into services/mana-auth/node_modules
Verified the import resolves locally:
bun -e 'import { nanoid } from "nanoid"; console.log(nanoid())'
→ ok: 6TLuTWlenhC0KnSESn5Ex
The Mac Mini still needs to redeploy mana-auth (rebuild image with the
new lockfile, restart container) to pick this up — production is
currently 502ing on auth.mana.how.
mana-voice-bot's source default was 3050, which collided with mana-sync.
Today the collision is latent (voice-bot isn't deployed anywhere), but
sooner or later someone is going to start it on a host that's already
running mana-sync and the second one will refuse to bind. Moving to
3024 puts it inside the AI/ML port range alongside its dependencies
(stt 3020, tts 3022, image-gen 3023, llm 3025) and away from sync.
Updated:
- app/main.py — PORT default 3050 → 3024
- start.sh, setup.sh — same fix in the example commands
- CLAUDE.md — full rewrite. Old version described "Mac Mini deployment"
with launchd; the new version explicitly says "not deployed yet" and
documents the seven concrete steps to deploy on the Windows GPU box
alongside the other AI services (Scheduled Task, service.pyw, .env,
firewall rule, cloudflared route, WINDOWS_GPU_SERVER_SETUP.md update).
docs/WINDOWS_GPU_SERVER_SETUP.md:
- Added the missing ManaVideoGen scheduled task to all four
Start-ScheduledTask snippets — video-gen has been running on the
Windows GPU but the doc had never picked it up.
- Added a "mana-video-gen (Port 3026)" service section parallel to the
existing image-gen one, with venv path, repo pointer, model, etc.
- Added a repo-counterparts table mapping C:\mana\services\<svc>\ to the
corresponding services/<svc>/ directory in the repo, plus a note that
changes should flow repo→Windows, not the other way around.
docs/PORT_SCHEMA.md:
- Reconciled the warning block with the post-cleanup reality: no more
active or latent port collisions (image-gen ↔ video-gen and
voice-bot ↔ sync are both resolved). Listed the actual ports per host
with public URLs. Kept the planned-vs-actual disclaimer for the
services that still don't match the aspirational ranges (mana-credits
3061 vs planned 3002, etc).
The Mac Mini hasn't run mana-llm/stt/tts/image-gen for a while — those
services live on the Windows GPU server now. The Mac-targeted
installers, plists, and platform-checking setup scripts have been
sitting in the repo as cargo cult, suggesting Mac Mini deployment is
still a real option. It isn't.
Removed (Mac-Mini deployment infrastructure):
services/mana-stt/
- com.mana.mana-stt.plist (LaunchAgent)
- com.mana.vllm-voxtral.plist (LaunchAgent for the abandoned local Voxtral experiment)
- install-service.sh (single-service launchd installer)
- install-services.sh (mana-stt + vllm-voxtral installer)
- setup.sh (Mac arm64 installer)
- scripts/setup-vllm.sh (vLLM-Voxtral setup)
- scripts/start-vllm-voxtral.sh
services/mana-tts/
- com.mana.mana-tts.plist
- install-service.sh
- setup.sh (Mac arm64 installer)
scripts/mac-mini/
- setup-image-gen.sh (Mac flux2.c launchd installer)
- setup-stt.sh
- setup-tts.sh
- launchd/com.mana.image-gen.plist
- launchd/com.mana.mana-stt.plist
- launchd/com.mana.mana-tts.plist
setup-tts-bot.sh stays — it's the Matrix TTS bot installer (Synapse
side), not the mana-tts service.
Updated:
- services/mana-stt/CLAUDE.md, README.md — fully rewritten for the
Windows GPU reality (CUDA WhisperX, Scheduled Task ManaSTT, .env keys
matching the actual production .env on the box)
- services/mana-tts/CLAUDE.md, README.md — same treatment, documenting
Kokoro/Piper/F5-TTS on the Windows GPU under Scheduled Task ManaTTS
- scripts/mac-mini/README.md — dropped the STT setup section, replaced
with a pointer to docs/WINDOWS_GPU_SERVER_SETUP.md and the per-service
CLAUDE.md files
- docs/MAC_MINI_SERVER.md — expanded the "deactivated launchagents"
list to mention the now-removed plists, added the full GPU service
port table with public URLs, added a cleanup snippet for any old plists
still installed on a Mac Mini somewhere
The repo's mana-image-gen used to be a Mac Mini–only service built on
flux2.c with hard MPS+arm64 platform checks. The actual production
image-gen runs on the Windows GPU server (RTX 3090) using HuggingFace
diffusers + PyTorch CUDA + FLUX.1-schnell — completely different code
that lived only at C:\mana\services\mana-image-gen\ on the GPU box.
This commit pulls the Windows implementation into the repo and deletes
the Mac one, so there's exactly one mana-image-gen and its source of
truth is git rather than one folder on one machine.
Removed:
- setup.sh — Mac-only flux2.c installer with hard arm64 platform check
- app/main.py (Mac flux2.c subprocess wrapper version)
- app/flux_service.py (Mac flux2.c subprocess wrapper version)
Added (pulled from C:\mana\services\mana-image-gen\):
- app/main.py — FastAPI endpoints (/generate, /images/*, /cleanup)
- app/flux_service.py — diffusers FluxPipeline wrapper
- app/api_auth.py — ApiKeyMiddleware (GPU_API_KEY)
- app/vram_manager.py — shared VRAM accounting
- service.pyw — Windows runner used by the ManaImageGen scheduled task
Updated:
- main.py PORT default from 3025 → 3023 to match the production reality
(the service.pyw runner already binds 3023 explicitly via uvicorn.run,
but the source default should match so direct uvicorn invocations and
local tests don't pick the wrong port)
- CLAUDE.md fully rewritten to describe the Windows/CUDA/diffusers stack
- README.md trimmed to a pointer at CLAUDE.md + the public URL
- .env.example written from scratch (didn't exist before — the service's
.env on the GPU box was undocumented)
The setup-image-gen.sh launchd installer in scripts/mac-mini/ and the
actual Mac Mini deployment will be cleaned up in the next commit, along
with the rest of the Mac-Mini AI service infrastructure.
The Windows GPU server has been the actual production home for these
services for some time, and the running code there has drifted ahead of
the repo. This sync pulls the live versions back into the repo so the
Windows box is no longer the only place those changes exist.
Pulled from C:\mana\services\* on mana-server-gpu (192.168.178.11):
mana-llm:
- src/main.py, src/config.py — small fixes (auth wiring, config tweaks)
- src/api_auth.py — NEW (cross-service GPU_API_KEY validator)
- service.pyw — Windows runner used by the ManaLLM scheduled task
(sets up logging redirect, loads .env, calls uvicorn)
mana-stt:
- app/main.py — substantial cleanup (684→392 lines), drops the
whisperx-as-separate-backend branching now that whisper_service.py
rolls whisperx in directly
- app/whisper_service.py — full CUDA + whisperx rewrite (158→358 lines)
- app/auth.py + external_auth.py — significantly expanded auth
- app/vram_manager.py — NEW (shared VRAM accounting helper)
- service.pyw — Windows runner with CUDA pre-init, FFmpeg PATH
injection, .env loading
- removed: app/whisper_service_cuda.py (folded into whisper_service.py)
- removed: app/whisperx_service.py (folded into whisper_service.py)
mana-tts:
- app/auth.py, external_auth.py — same auth expansion as stt
- app/f5_service.py, kokoro_service.py — Windows tweaks
- app/vram_manager.py — NEW (same shared helper as stt)
- service.pyw — Windows runner
mana-video-gen:
- service.pyw — Windows runner (no other changes; the .py code on the
GPU box is byte-identical to what's already in the repo)
The service.pyw files contain absolute Windows paths
(C:\mana\services\<svc>) and a hardcoded FFmpeg PATH for the tills user
profile. Kept as-is intentionally — they exist to be deployed to that
one machine and any abstraction layer would just hide what's actually
happening. Anyone redeploying to a different layout will need to edit
the path strings, which is a known and obvious change.
Mac-Mini infrastructure for these services (launchd plists, install
scripts, scripts/mac-mini/setup-{stt,tts}.sh, the Mac-flux2c image-gen
implementation) is still on disk and will be removed in a follow-up
commit, along with replacing mana-image-gen with the Windows
diffusers+CUDA implementation. This commit is just the live-code sync.
Source default was 3026 but Mac Mini production has been overriding to
3025 via the launchd plist in scripts/mac-mini/setup-image-gen.sh ever
since the service was set up. The override existed in exactly one place
that is not version-controlled in any obvious way — anyone redeploying
without that script would land on 3026 and clients pointing at 3025
would fail to connect.
Source default → 3025 across main.py, setup.sh, README, CLAUDE.md so the
launchd plist is no longer load-bearing. The Mac Mini setup script still
sets PORT=3025 explicitly; that's now belt-and-suspenders rather than the
only thing keeping production alive.
Also added a note clarifying that this Mac Mini service (flux2.c, MPS,
arm64-only) is *not* the same thing as the "image-gen" running on the
Windows GPU server (PyTorch + diffusers + CUDA, port 3023, code lives at
C:\mana\services\mana-image-gen\ outside this repo). Two different
implementations sharing a name was confusing the port-collision audit.
Updated docs/PORT_SCHEMA.md warning block to retract the previous false
claims of two active port collisions:
- image-gen ↔ video-gen on 3026 — wrong: image-gen runs on Mac Mini
on 3025 (now also the source default), video-gen is alone on the
Windows GPU on 3026
- voice-bot ↔ sync on 3050 — latent only: mana-voice-bot is not
deployed anywhere (no launchd, no scheduled task, no cloudflared
route), so the collision is in source defaults but not in production
The voice-bot 3050 default should still be moved before voice-bot is
ever deployed — flagged in the PORT_SCHEMA warning instead of silently
fixed since voice-bot deployment is its own decision.
New service docs:
- services/mana-stt/CLAUDE.md — FastAPI surface with Whisper MLX (local),
WhisperX (rich), and Voxtral (local + Mistral API). Documents the lazy
backend loading and the launchd plist setup on the Mac Mini.
- services/mana-events/CLAUDE.md — Hono/Bun service for public RSVP and
event-sharing. Documents the host (JWT) vs public (token) split, the
rate-limit sweeper, and the createApp factory pattern that lets unit
tests run without bootstrapping the production sweeper.
Stale entries fixed:
- mana-auth: dropped "rewritten from NestJS / drop-in replacement" — the
rewrite is the only mana-auth there is now. Email channel updated from
Brevo SMTP to self-hosted Stalwart (see docs/MAIL_SERVER.md).
- mana-notify: same Brevo → Stalwart fix in the channel table and env
var defaults.
PORT_SCHEMA.md flagged as aspirational:
- The doc was dated 2026-03-28 and presented as "single source of truth",
but cross-checking against actual service source files (config.go,
main.py, start.sh) shows nothing matches. Added a prominent warning at
the top with the real ports + two confirmed collisions:
* mana-image-gen and mana-video-gen both default to PORT 3026
* mana-voice-bot and mana-sync both default to PORT 3050
Today these are masked because image-gen + voice-bot live on the
Windows GPU server while video-gen + sync live on the Mac Mini, but
the moment they share a host they collide. Either execute the planned
reorg or pick non-colliding ports and rewrite the doc to match
reality — flagged as a real follow-up.
PyTorch's `torch.cuda.get_device_properties(0)` returns a
`_CudaDeviceProperties` object whose memory attribute is
`total_memory` (bytes), not `total_mem`. The typo crashed the
service immediately at startup because `get_model_info()` is
called from the FastAPI lifespan handler, not lazily — uvicorn
logged "Application startup failed" before any request could land.
Found while installing mana-video-gen on the Windows GPU box
(192.168.178.11:3026) for the gpu-video.mana.how Cloudflare route.
After the fix the service starts cleanly under the ManaVideoGen
scheduled task and responds 200 on /health both LAN and via
Cloudflare tunnel. status.mana.how now reports 42/42 — first time
ever.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Five documentation surfaces gained encryption awareness in this
sweep. Before this commit, the only place anyone could learn about
the at-rest encryption layer or the zero-knowledge opt-in was the
internal DATA_LAYER_AUDIT.md. New contributors and self-hosters
would never discover one of the most important features of the
product just by reading the standard onboarding docs.
apps/docs/src/content/docs/architecture/security.mdx (NEW)
----------------------------------------------------------
First-class user-facing security page in the Starlight site,
slotted into the Architecture sidebar between Authentication and
Backend.
Sections:
- What's encrypted (overview table of 27 modules + the
intentional plaintext carve-outs)
- Standard mode flow with ASCII diagram
- "What Mana CAN see" trust statements per mode
- Zero-knowledge mode setup walkthrough (Steps component)
- Unlock flow on a new device
- Recovery code rotation
- Deployment requirements (the loud MANA_AUTH_KEK warning)
- Audit trail action vocabulary
- Threat model summary table
- Implementation file references with paths
services/mana-auth/CLAUDE.md
----------------------------
New "Encryption Vault" section under Key Endpoints, listing all 7
routes (status, init, key, rotate, recovery-wrap GET+DELETE,
zero-knowledge) with their HTTP method, path, error codes, and a
description. Mentions the three CHECK constraints + RLS + audit
table. Points readers at DATA_LAYER_AUDIT.md and the new
security.mdx for the deep dive.
Environment Variables block gains MANA_AUTH_KEK with a multi-line
comment explaining the openssl rand command + dev fallback warning.
apps/mana/CLAUDE.md
-------------------
Full rewrite. The existing file was from the Supabase era and
described things like @supabase/ssr, safeGetSession(), and a
five-table schema with users + organizations + teams that doesn't
exist any more. Replaced with the unified-app architecture:
- Module system layout (collections.ts / queries.ts / stores/)
- Mana Auth (Better Auth + EdDSA JWT) instead of Supabase
- Local-first data layer with the full pipeline diagram
- At-rest encryption section with the "when writing module code
that touches sensitive fields" 4-step guide
- Updated routing structure (no more separate /organizations,
/teams routes)
- Module store pattern code example
- Reference document table at the bottom pointing at the audit,
the new security.mdx, and the auth doc
Root CLAUDE.md
--------------
New "At-Rest Encryption (Phase 1–9)" subsection under the
Local-First Architecture section. Two-mode trust summary table,
production requirement for MANA_AUTH_KEK with the openssl command,
the "when writing module code" 4-step guide, and a reference
table. New contributors reading the root CLAUDE.md from top to
bottom now hit encryption naturally as part of the data layer
discussion.
.env.macmini.example
--------------------
MANA_AUTH_KEK was missing from the production env example
entirely — the macmini deployment would silently boot on the
32-zero-byte dev fallback if you copied this file. Added with a
multi-paragraph comment covering: how to generate, why it's
required, how to store securely (Docker secrets / KMS / Vault),
and the rotation caveat.
apps/docs/src/content/docs/deployment/self-hosting.mdx
------------------------------------------------------
Two changes:
1. Added MANA_AUTH_KEK to the mana-auth service block in the
Compose example with an inline comment pointing at the new
section below.
2. New "Encryption Vault Setup" H2 section with subsections:
- Generating a KEK (with a fake example value labelled DO NOT
USE — generate your own)
- Securing the KEK (Docker secrets, KMS, systemd
LoadCredential, anti-patterns)
- "What if I lose the KEK?" — explains the data is
unrecoverable by design and mitigation via zero-knowledge
mode opt-in
- KEK rotation — calls out the missing background re-wrap
job as a known limitation
apps/docs/astro.config.mjs
--------------------------
Added "Security & Encryption" entry to the Architecture sidebar
between Authentication and Backend so the new page is reachable
from the docs nav.
Astro check: 0 errors, 0 warnings, 0 hints across 4 .astro files.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes backlog #1 from the Phase 9 audit. Adds 28 integration tests
for the EncryptionVaultService against a real Postgres so the
RLS policies, CHECK constraints and audit-row writes are exercised
as the production app actually sees them. The pure-crypto KEK tests
in kek.test.ts already covered the wrap/unwrap primitives — this
new file fills in the service-shaped gaps that need a real DB.
Test infrastructure
-------------------
- Reads TEST_DATABASE_URL from env. Whole suite is SKIPPED via
describe.skip if unset, so unrelated CI runs and `bun test` from
a fresh checkout don't fail on missing connection. The
encryption-vault sub-job has to provision a Postgres explicitly.
- Schema is assumed already migrated (run `pnpm db:push` or apply
sql/002 + sql/003 manually before invoking the suite). Tests
insert a fresh test user per case via beforeEach so cross-test
pollution is impossible despite the FK to auth.users.
- afterAll cleans up the user (CASCADE wipes vault + audit) and
closes the postgres pool so bun test exits cleanly.
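The env-gated skip can be sketched like this (pickSuite is an illustrative helper; the real file just branches on process.env.TEST_DATABASE_URL between bun:test's describe and describe.skip):

```typescript
// Sketch of the env-gated suite selection: with TEST_DATABASE_URL unset,
// the whole file runs under the skip variant and a fresh checkout stays green.
type Suite = (name: string, fn: () => void) => void;

function pickSuite(
  env: Record<string, string | undefined>,
  run: Suite,
  skip: Suite
): Suite {
  return env.TEST_DATABASE_URL ? run : skip;
}
```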
Coverage
--------
init (3):
- Mints a fresh vault, wrapped_mk + wrap_iv populated, ZK off
- Idempotent (returns same key)
- Audit rows are written
getStatus (5):
- vaultExists=false for unconfigured user
- vaultExists=true after init, no recovery wrap
- hasRecoveryWrap=true after setRecoveryWrap
- zeroKnowledge=true after enableZK
- Does NOT write an audit row (cheap metadata read)
setRecoveryWrap (4):
- Stores wrap on existing vault
- VaultNotFoundError on missing vault
- Idempotent (replaces previous wrap)
- Writes recovery_set audit row
clearRecoveryWrap (3):
- Removes the wrap
- ZeroKnowledgeActiveError when ZK is on
- VaultNotFoundError on missing vault
enableZeroKnowledge (4):
- Flips zero_knowledge=true and NULLs out wrapped_mk + wrap_iv
- RecoveryWrapMissingError if no recovery wrap is set
- Idempotent (already-on is no-op)
- VaultNotFoundError on missing vault
disableZeroKnowledge (2):
- Restores wrapped_mk from a client-supplied master key,
verifies the round-trip via getMasterKey returns the same bytes
- No-op when ZK is already off
getMasterKey (3):
- Returns unwrapped MK in standard mode
- Returns recovery blob with requiresRecoveryCode=true in ZK mode
- VaultNotFoundError on missing vault
rotate (2):
- Mints fresh MK and wipes any existing recovery wrap
- ZeroKnowledgeRotateForbidden in ZK mode
DB-level invariants (2):
- Setting wrapped_mk back while ZK active is rejected by
encryption_vaults_zk_consistency
- Setting wrap_iv to NULL while wrapped_mk is set is rejected
by encryption_vaults_wrap_iv_pair
Both wrap the Drizzle update in an arrow IIFE so
expect(...).rejects.toThrow() sees a real Promise (Drizzle's
chainable update() only executes on await/then).
Run results
-----------
With TEST_DATABASE_URL set + schema migrated:
28 pass, 0 fail, 64 expect() calls
Without TEST_DATABASE_URL set (default):
0 pass, 30 skip (full suite cleanly skipped)
KEK tests in kek.test.ts still run unaffected.
Drive-by: kek.test.ts header comment updated to point at the new
sibling file instead of saying "tests will live alongside mana-sync"
(which was outdated speculation from Phase 2).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes the Phase 9 Milestone 4 known limitation where the settings
page always started in 'idle' state regardless of whether the user
had already enabled zero-knowledge mode. Adds a cheap server-side
status read + hydrates the page on mount.
Server side
-----------
New VaultStatus interface and getStatus(userId) method on
EncryptionVaultService — single SELECT against encryption_vaults,
no decryption, no audit logging (this gets called on every settings
page mount and we don't want to flood the audit log with read-only
metadata fetches). Returns sane defaults when the vault row doesn't
exist yet so the client can avoid a 404 dance.
GET /api/v1/me/encryption-vault/status →
{
vaultExists: boolean,
hasRecoveryWrap: boolean,
zeroKnowledge: boolean,
recoverySetAt: string | null
}
Client side
-----------
vault-client.ts gains a `getStatus()` method that bypasses the
fetchVault retry helper (status reads should be cheap and one-shot;
if they fail we let the caller fall back to defaults). Re-exports
VaultStatus + RecoveryCodeSetupResult from the crypto barrel.
settings/security/+page.svelte
------------------------------
onMount kicks off a getStatus() call. Two things change based on
the response:
1. If the server says zero_knowledge=true, jump zkSetupStep to
'enabled' so the page renders the active-state UI directly
instead of the setup flow.
2. New `hasRecoveryWrap` state tracks whether a wrap is stored,
even if ZK isn't active yet. The idle branch now has two
variants:
   - hasRecoveryWrap=false: the original "Recovery-Code einrichten"
     ("set up recovery code") single button, unchanged from
     milestone 4
   - hasRecoveryWrap=true: an amber notice ("you have a code stored
     but ZK isn't active") with three buttons:
     * "Zero-Knowledge jetzt aktivieren" ("activate zero-knowledge
       now") — jumps straight to the enable call
     * "Neuen Recovery-Code generieren" ("generate a new recovery
       code") — rotates the wrap
     * "Recovery-Code entfernen" ("remove recovery code") —
       two-click confirmation, calls DELETE /recovery-wrap
This handles the previously orphaned state where a user generated a
code, copied it to their password manager, but never confirmed the
final activation step. Without this branch, the settings page would
show "Setup" again after a reload even though a recovery wrap was
already stored. Re-running the flow wouldn't actually fail (the vault
was never in zero-knowledge mode, it merely had a recovery wrap), but
the half-configured state was confusing either way.
handleSetupRecoveryCode + handleClearRecoveryCode now keep
hasRecoveryWrap in sync after the round trip.
Fail-quiet on getStatus error: if the network/auth/server-side fetch
fails, the page stays at the idle default. The user can still run
the setup flow, and any inconsistencies surface via the usual
server-side error responses.
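The status shape and the fail-quiet fallback can be sketched as follows (field names mirror the route payload described above; readStatus is illustrative — the page does this inline in onMount):

```typescript
interface VaultStatus {
  vaultExists: boolean;
  hasRecoveryWrap: boolean;
  zeroKnowledge: boolean;
  recoverySetAt: string | null;
}

// Defaults used when the vault row doesn't exist yet, and when the status
// fetch fails client-side.
const IDLE_DEFAULTS: VaultStatus = {
  vaultExists: false,
  hasRecoveryWrap: false,
  zeroKnowledge: false,
  recoverySetAt: null,
};

// Fail-quiet read: any network/auth/server error falls back to the idle
// defaults, so the settings page still renders and setup stays usable.
async function readStatus(
  fetchStatus: () => Promise<VaultStatus>
): Promise<VaultStatus> {
  try {
    return await fetchStatus();
  } catch {
    return IDLE_DEFAULTS;
  }
}
```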
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Server-side support for the Phase 9 zero-knowledge opt-in. Adds the
recovery-wrap columns + four new vault operations + the routes that
expose them.
Schema (sql/003_recovery_wrap.sql)
----------------------------------
Adds to auth.encryption_vaults:
- recovery_wrapped_mk text (NULL until set)
- recovery_iv text (NULL until set)
- recovery_format_version smallint NOT NULL DEFAULT 1
- recovery_set_at timestamptz
- zero_knowledge boolean NOT NULL DEFAULT false
Drops NOT NULL from wrapped_mk + wrap_iv (a vault in zero-knowledge
mode has no server-side wrap at all).
Three CHECK constraints enforce the invariant at the DB level so no
service bug can leave a vault in an inconsistent state:
- encryption_vaults_has_wrap — at least one of (wrapped_mk,
  recovery_wrapped_mk) is set
- encryption_vaults_wrap_iv_pair — ciphertext + IV are paired (both
  NULL or both set) on each wrap form
- encryption_vaults_zk_consistency — zero_knowledge=true implies
  wrapped_mk IS NULL AND recovery_wrapped_mk IS NOT NULL
If a code-level bug ever tried to enable ZK without a recovery wrap,
or to leave both wraps empty, Postgres would reject the UPDATE.
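The invariant the three constraints encode, restated as a TS predicate (illustrative only — Postgres enforces this, not application code):

```typescript
interface VaultRow {
  wrappedMk: string | null;
  wrapIv: string | null;
  recoveryWrappedMk: string | null;
  recoveryIv: string | null;
  zeroKnowledge: boolean;
}

function vaultRowConsistent(v: VaultRow): boolean {
  // encryption_vaults_has_wrap: at least one wrap form present
  const hasWrap = v.wrappedMk !== null || v.recoveryWrappedMk !== null;
  // encryption_vaults_wrap_iv_pair: ciphertext and IV both NULL or both set
  const serverPair = (v.wrappedMk === null) === (v.wrapIv === null);
  const recoveryPair = (v.recoveryWrappedMk === null) === (v.recoveryIv === null);
  // encryption_vaults_zk_consistency: ZK implies recovery-only custody
  const zkOk =
    !v.zeroKnowledge ||
    (v.wrappedMk === null && v.recoveryWrappedMk !== null);
  return hasWrap && serverPair && recoveryPair && zkOk;
}
```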
Drizzle schema (db/schema/encryption-vaults.ts)
-----------------------------------------------
Mirrors the migration: wrappedMk + wrapIv become nullable, the four
new columns added with the right defaults. Inline doc comment explains
the zero-knowledge fork.
Service (services/encryption-vault/index.ts)
--------------------------------------------
VaultFetchResult gains optional `requiresRecoveryCode` /
`recoveryWrappedMk` / `recoveryIv` so the route handler can serialize
the right shape. masterKey becomes Uint8Array | null (null in ZK mode).
Existing methods updated:
- init: branches on row.zeroKnowledge — returns the recovery blob
instead of an unwrapped MK if the user is already in ZK mode
- getMasterKey: same fork, with audit context "zk-recovery-blob"
- rotate: throws ZeroKnowledgeRotateForbidden in ZK mode (the server
can't re-wrap a key it can't read). Also wipes any stale recovery
wrap on rotation — the new MK has nothing to do with the old one,
so the old recovery code would unwrap into garbage.
New methods:
- setRecoveryWrap(userId, { recoveryWrappedMk, recoveryIv }, ctx)
Stores (or replaces) the user's recovery wrap. Idempotent.
- clearRecoveryWrap(userId, ctx)
Removes the recovery wrap. Forbidden if ZK is active (would lock
the user out) — throws ZeroKnowledgeActiveError → 409.
- enableZeroKnowledge(userId, ctx)
NULLs out wrapped_mk + wrap_iv, sets zero_knowledge=true. Requires
a recovery wrap to already be present — throws
RecoveryWrapMissingError → 400 otherwise. Idempotent on already-on.
- disableZeroKnowledge(userId, mkBytes, ctx)
Inverse: takes a freshly-unwrapped MK from the client, KEK-wraps
it, stores as wrapped_mk, flips zero_knowledge=false. The client
is the only entity that can supply the MK at this point, since
the server can't decrypt the recovery wrap.
Three new error classes:
- RecoveryWrapMissingError → 400 RECOVERY_WRAP_MISSING
- ZeroKnowledgeActiveError → 409 ZK_ACTIVE
- ZeroKnowledgeRotateForbidden → 409 ZK_ROTATE_FORBIDDEN
Audit action union extended with:
- 'recovery_set' | 'recovery_clear' | 'zk_enable' | 'zk_disable'
Routes (routes/encryption-vault.ts)
-----------------------------------
GET /key + POST /init now share a serializeFetchResult helper that
returns either:
- { masterKey, formatVersion, kekId } (standard mode)
- { requiresRecoveryCode: true, recoveryWrappedMk, recoveryIv,
  formatVersion } (ZK mode)
Three new routes:
- POST /recovery-wrap — body: { recoveryWrappedMk, recoveryIv }.
  Stores the wrap; validates both fields are non-empty strings.
- DELETE /recovery-wrap — removes the wrap; 409 if ZK is active.
- POST /zero-knowledge — body: { enable: boolean, masterKey?: base64 }.
  enable=true flips ZK on (no body MK needed); enable=false flips it
  off (MK required). Validates the MK decodes to exactly 32 bytes and
  wipes the bytes after handing them to the service.
POST /rotate now catches ZeroKnowledgeRotateForbidden → 409
ZK_ROTATE_FORBIDDEN so the client can show "disable zero-knowledge
first".
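A sketch of the shared serializer's branch (field names follow the payloads described above; the actual VaultFetchResult type lives in the service):

```typescript
interface FetchResult {
  masterKey: Uint8Array | null; // null in zero-knowledge mode
  formatVersion: number;
  kekId?: string;
  requiresRecoveryCode?: boolean;
  recoveryWrappedMk?: string;
  recoveryIv?: string;
}

function serializeFetchResult(r: FetchResult) {
  if (r.requiresRecoveryCode) {
    // ZK mode: the server can only hand back the recovery blob.
    return {
      requiresRecoveryCode: true as const,
      recoveryWrappedMk: r.recoveryWrappedMk,
      recoveryIv: r.recoveryIv,
      formatVersion: r.formatVersion,
    };
  }
  // Standard mode: KEK-unwrapped master key, base64 for transport.
  return {
    masterKey: Buffer.from(r.masterKey!).toString("base64"),
    formatVersion: r.formatVersion,
    kekId: r.kekId,
  };
}
```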
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add an "eventItems" mini-collection attached to each social event so
hosts can track what each guest is bringing, and so public visitors
on the share-link page can claim an item without an account.
Local-first side
- New eventItems table (Dexie v11), module config update for sync.
- LocalEventItem type + EventItem domain type, useEventItems query.
- eventItemsStore: addItem / updateItem / toggleDone / assign /
deleteItem. Every mutation pushes the full list to the server
snapshot via eventsStore.syncItems if the event is published.
- BringListEditor component on the host DetailView with assign-to-
guest dropdown, quantity, and done-checkbox.
- eventsStore.syncItems + a syncItems call in publishEvent so the
public page sees pre-existing items as soon as the event ships.
Server side
- New event_items_published table (FK cascade from events_published
so unpublishing wipes the bring list along with the snapshot).
- Host endpoints PUT/GET /events/:eventId/items: full-replace upsert
that preserves any existing claimed_by_name across host edits, max
100 items, ownership check.
- Public POST /rsvp/:token/items/:itemId/claim: name-only claim, 1×
per item (first write wins), shares the per-token hourly rate
bucket with RSVP submissions to keep the abuse surface uniform.
- GET /rsvp/:token now also returns the bring list (sorted) so the
public page renders in a single round-trip.
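The first-write-wins claim in miniature (an in-memory stand-in for the server's conditional UPDATE where claimed_by_name is still NULL; names are illustrative):

```typescript
interface BringItem {
  id: string;
  claimedByName: string | null;
}

// Returns true only for the first claim; later claims see the name set and
// are refused, mirroring the "1× per item, first write wins" rule.
function claimItem(
  items: Map<string, BringItem>,
  itemId: string,
  name: string
): boolean {
  const item = items.get(itemId);
  if (!item || item.claimedByName !== null) return false; // missing or taken
  item.claimedByName = name;
  return true;
}
```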
Public RSVP page
- Renders the bring list with claim buttons; clicking prompts for a
name and POSTs the claim, then optimistically updates the UI.
- New bring-list i18n keys for all five locales (de/en/it/fr/es).
Tests
- 15 new server tests covering host PUT/GET (insert / update / prune /
ownership / claimed-name preservation / cascade), GET /rsvp item
exposure, and POST /claim (success / double-claim / cross-token /
cancelled / validation). 50 server tests total, all green.
- E2E spec selector scoped to .guest-editor, because the new
  BringListEditor introduced a duplicate "Hinzufügen" ("Add") button
  label.
Add bun:test integration suite that exercises every public and host
endpoint plus the rate-bucket sweeper against a real Postgres. The
Hono app factory was extracted from index.ts into app.ts so tests can
build their own instance with a header-based auth mock instead of
spinning up mana-auth + JWKS.
Coverage:
- health route smoke
- public RSVP: snapshot fetch (incl. 404, cancelled, summary
privacy), submit, validation (name, status, email, plus-ones,
cancelled), upsert dedup (incl. null/missing email parity), summary
aggregation across yes/no/maybe + plus-ones, rate-limit cap (5/h),
absolute per-token cap (20)
- host events: publish (auth, idempotent token reuse, ownership),
snapshot update (partial, ownership, 404), delete (cascade FK to
rsvps + buckets, ownership, idempotent), get rsvps (ownership)
- sweeper: removes >2h-old buckets, keeps fresh ones, no-op on empty
Mock auth lives in a small helper that injects an X-Test-User header
into a fake middleware, so the same createApp() factory powers both
production (real jwtAuth) and tests (header mock).
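A framework-agnostic sketch of the factory + auth-mock split (types and names are illustrative, not the Hono ones — createApp takes an auth middleware, so production passes the real jwtAuth and tests pass the header-based fake):

```typescript
interface Req {
  headers: Record<string, string>;
}

// Auth middleware returns a userId when authenticated, null otherwise.
type Auth = (req: Req) => string | null;

// Test double: trust an X-Test-User header instead of verifying a JWT.
const testAuth: Auth = (req) => req.headers["x-test-user"] ?? null;

function createApp(auth: Auth) {
  return {
    handle(req: Req): { status: number; userId?: string } {
      const userId = auth(req);
      return userId ? { status: 200, userId } : { status: 401 };
    },
  };
}
```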
Five small follow-ups on Phase 1b:
- docker-compose.macmini.yml: add the mana-events container with the
same shape as mana-credits, expose port 3065, add a Traefik route
for events.mana.how, and inject PUBLIC_MANA_EVENTS_URL into the
mana-web container so the SvelteKit SSR + browser both reach it.
- mana-events: background sweeper that deletes rsvp_rate_buckets
rows older than 2h every hour. Without it, long-published events
accumulate one row per traffic-hour forever (FK cascade only fires
on snapshot delete).
- PublicRsvpList: track consecutiveFailures and only show the error
banner after two failures in a row, so a single mid-poll network
hiccup doesn't flash a 30s error the user can't act on.
- apps/mana/apps/web: declare postgres as a devDep (already imported
by the e2e spec via pnpm hoisting, now explicit).
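The consecutive-failure gating in PublicRsvpList can be sketched as a tiny state machine; the class and method names are illustrative:

```typescript
// A single failed poll is swallowed; the error banner only appears
// from the second failure in a row, and any success clears the streak.
class PollErrorGate {
  private consecutiveFailures = 0;

  onSuccess(): void {
    this.consecutiveFailures = 0; // any good poll clears the streak
  }

  /** Record a failed poll; returns true when the banner should show. */
  onFailure(): boolean {
    this.consecutiveFailures += 1;
    return this.consecutiveFailures >= 2;
  }
}
```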
Adds end-to-end browser voice capture for the Memoro module, mirroring the
existing dreams pattern: MediaRecorder → SvelteKit server proxy → mana-stt
on the Windows GPU box via Cloudflare tunnel.
Recording UI lives in /memoro page header (mic button + live timer + cancel +
sticky-permission retry). Server proxy at /api/v1/memoro/transcribe forwards
the blob with the server-held X-API-Key. memosStore.createFromVoice creates a
placeholder memo with processingStatus='processing' and fires transcribeBlob
in the background, which writes the transcript and flips status on completion
(or 'failed' with error in metadata).
Also corrects the mana-stt hostname across the repo: stt-api.mana.how (which
never existed in DNS) → gpu-stt.mana.how (the actual Cloudflare tunnel route
to the Windows GPU box). Adds an ENVIRONMENT_VARIABLES.md section explaining
how to obtain MANA_STT_API_KEY and where the tunnel terminates. Adds tunnel
health probes to the mac-mini health-check script so we catch tunnel-side
breakage in addition to LAN-side.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds the server side of the per-user encryption vault. Phase 1 shipped
the client foundation (no-op while every table is enabled:false). This
commit lets the client actually fetch a master key when Phase 3 flips
the registry switches.
Schema (Drizzle + raw SQL migration)
- auth.encryption_vaults: per-user wrapped MK + IV + format version +
kek_id stamp + created/rotated timestamps. PK = user_id, ON DELETE
CASCADE so account deletion wipes the vault.
- auth.encryption_vault_audit: append-only trail of init/fetch/rotate
actions with IP, user-agent, HTTP status, free-form context.
- sql/002_encryption_vaults.sql: idempotent CREATE TABLE + ENABLE +
FORCE row-level security with a `current_setting('app.current_user_id')`
policy on both tables. FORCE makes the policy apply to the table
owner too — no bypass via grants.
KEK loader (services/encryption-vault/kek.ts)
- Loads a 32-byte AES-256 KEK from the MANA_AUTH_KEK env var (base64).
- Production: missing or wrong-length input is fatal at boot.
- Development: 32-zero-byte fallback so contributors can run the
service without provisioning a secret. Logs a loud warning.
- wrapMasterKey / unwrapMasterKey use Web Crypto AES-GCM-256 over the
raw 32-byte MK with a fresh 12-byte IV per wrap. Returns base64
pair for storage.
- generateMasterKey + activeKekId helpers used by the service.
- Future migration to KMS / Vault: only loadKek() changes; the
kek_id stamp on each row tracks which KEK produced it.
EncryptionVaultService (services/encryption-vault/index.ts)
- init(userId): idempotent — returns existing MK or mints a new one.
- getMasterKey(userId): unwraps the stored MK; throws VaultNotFoundError
on no-row so the route can return 404 cleanly.
- rotate(userId): mints fresh MK, replaces wrap. Caller is on the
hook for re-encryption — destructive by design.
- withUserScope(userId, fn): wraps every read/write in a Drizzle
transaction with set_config('app.current_user_id', userId, true)
so the RLS policy admits only the matching row. Empty userId is
rejected up-front.
- writeAudit() appends a row to encryption_vault_audit on every
action including failures, so probing attempts leave a trail.
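withUserScope can be sketched as follows; the db/tx shapes stand in for the Drizzle transaction API:

```typescript
// Every vault read/write runs in a transaction whose first statement
// pins the RLS scope via set_config (transaction-local: third arg true).
type Tx = { exec: (sql: string, params: unknown[]) => Promise<void> };
type Db = { transaction: <T>(cb: (tx: Tx) => Promise<T>) => Promise<T> };

async function withUserScope<T>(
  db: Db,
  userId: string,
  fn: (tx: Tx) => Promise<T>,
): Promise<T> {
  // Reject an empty scope up-front so an unauthenticated caller can
  // never reach the database with an empty app.current_user_id.
  if (!userId) throw new Error("withUserScope: empty userId");
  return db.transaction(async (tx) => {
    await tx.exec("SELECT set_config('app.current_user_id', $1, true)", [userId]);
    return fn(tx); // caller's queries now see only the matching row
  });
}
```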
Routes (routes/encryption-vault.ts)
- POST /api/v1/me/encryption-vault/init — idempotent bootstrap
- GET /api/v1/me/encryption-vault/key — fetch the active MK
- POST /api/v1/me/encryption-vault/rotate — destructive rotation
- All return base64-encoded master key bytes plus formatVersion +
kekId. JWT-protected via the existing /api/v1/me/* middleware.
- readAuditContext() pulls X-Forwarded-For + User-Agent off the
request for the audit row.
Bootstrap (index.ts)
- loadKek() runs at top-level await before any route can fire so a
misconfigured KEK fails closed at boot, never at request time.
- encryptionVaultService is mounted under /api/v1/me/encryption-vault
so it inherits the existing JWT middleware and shows up next to the
GDPR self-service endpoints.
Tests (services/encryption-vault/kek.test.ts)
- 11 Bun-test cases covering: KEK load (happy path, wrong length,
idempotent, before-load guard), generateMasterKey randomness,
wrap/unwrap roundtrip, IV uniqueness across repeated wraps,
wrong-MK-length rejection, tampered-ciphertext rejection,
wrong-length IV rejection, wrong-KEK rejection.
- Service-level integration tests deferred — they need a real
Postgres for the RLS behaviour, set up via existing mana-sync
test pattern in CI.
Config + env
- .env.development gains MANA_AUTH_KEK= (empty → dev fallback)
with a comment explaining the production requirement.
- services/mana-auth/package.json gains "test": "bun test".
Verified: 11/11 KEK tests passing, 31/31 Phase 1 client tests still
passing, only pre-existing TS errors remain in mana-auth (auth.ts:281
forgetPassword + api-keys.ts:50 insert overload — both unrelated).
Phase 3: client wires the MemoryKeyProvider to GET /encryption-vault/key
on login, flips registry entries to enabled:true table by table, and
extends the Dexie hooks to call wrapValue/unwrapValue on configured
fields.
Phase 4: settings UI for lock state, key rotation, recovery code opt-in.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add an ON DELETE CASCADE FK from rsvp_rate_buckets.token to
events_published.token. Without it, deleting a snapshot left orphaned
rate-limit rows behind, slowly leaking storage. Verified with a
direct SQL cascade test.
New Hono+Bun service at services/mana-events on port 3065 with two
schemas in mana_platform: events_published (snapshots) and public_rsvps
(unauthenticated responses), plus a per-token hourly rate-limit bucket.
- Host endpoints (JWT) for publish/update/unpublish/list-rsvps
- Public endpoints for snapshot fetch + RSVP upsert with rate limiting
- New /rsvp/[token] page outside the auth gate, SSR-loads the snapshot
- Client store wires publishEvent/unpublishEvent to the server, syncs
snapshot updates after edits, and deletes the snapshot on event delete
- DetailView polls GET /events/:id/rsvps every 30s while open and lets
hosts import a public response into their local guest list
- generate-env, setup-databases.sh, .env.development, hooks.server.ts,
package.json wired for local dev
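The hourly bucket can be sketched in memory (the service persists the counters in rsvp_rate_buckets); the 5/h figure matches the integration tests above, and the helper names are assumptions:

```typescript
// Per-token hourly rate bucket: one counter per (token, hour) pair,
// incremented on each RSVP attempt and capped per hour.
const HOURLY_CAP = 5; // assumed cap, per the test coverage above

type Buckets = Map<string, number>; // key: `${token}:${hour}`

const hourOf = (now: Date) => Math.floor(now.getTime() / 3_600_000);

function tryConsume(buckets: Buckets, token: string, now: Date): boolean {
  const key = `${token}:${hourOf(now)}`;
  const used = buckets.get(key) ?? 0;
  if (used >= HOURLY_CAP) return false; // over the hourly cap
  buckets.set(key, used + 1);
  return true;
}
```

Keying on the hour is why stale rows accumulate one per traffic-hour and need sweeping.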
Defense-in-depth on top of the existing application-level WHERE clauses:
- Migrate() now ENABLE + FORCE row level security on sync_changes and
installs a policy that gates rows on current_setting('app.current_user_id').
FORCE makes the policy apply to the table owner too, so the application
role used by mana-sync cannot bypass it regardless of grants.
- New withUser(ctx, userID, fn) helper opens a transaction and calls
set_config('app.current_user_id', userID, true) before running fn.
Empty userIDs are rejected up-front so an unauthenticated request can
never reach the database with an empty RLS scope (which would match
every row).
- RecordChange / GetChangesSince / GetAllChangesSince all run inside
withUser. WITH CHECK on the policy double-validates the user_id column
on insert against the active session, so a future code path that
forgets the WHERE clause cannot leak data.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add curated icon registry (73 Phosphor icons, 8 categories) in shared-icons
- Add DynamicIcon atom and IconPicker molecule in shared-ui
- Migrate habits module from emoji strings to Phosphor icon names
- Add Dexie version(2) migration for emoji→icon field rename
- Replace inline SVGs in habits with Phosphor components
- Add drag-and-drop photo upload to Photos workbench ListView
- Add blob: to CSP img-src for upload previews
- Add dev:media script and include mana-media in dev:manacore:servers
- Add ./toast export to shared-ui package.json
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gmail rejects emails without a valid Message-ID header (RFC 5322).
Add Message-ID and Date headers to all outgoing emails.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Go's smtp.PlainAuth refuses to send credentials when the hostname
doesn't match the TLS cert (internal Docker hostname 'stalwart' vs
cert CN 'localhost'). Replace with custom LOGIN auth that works with
any SMTP server. Add detailed error logging at each SMTP stage.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add SMTP_INSECURE_TLS env var to skip certificate verification for
internal Docker-network SMTP connections. Stalwart's self-signed cert
uses 'localhost' as CN which doesn't match the 'stalwart' hostname.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The unified web app calls auth.mana.how/api/v1/settings to sync theme,
nav, locale, and device settings — but the endpoint was missing, causing
404 errors in production. Implements all 7 CRUD routes against the
existing auth.user_settings table.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace direct Brevo SMTP sending with HTTP calls to mana-notify's
notification API. This centralizes all email configuration in one
service (mana-notify) and removes the nodemailer dependency from
mana-auth. SMTP provider is now swappable via a single env var.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bug 1: NotifyUser() early-returned when no WebSocket clients existed,
skipping SSE subscriber notifications entirely. Fixed by restructuring
to check WS clients and SSE subscribers independently.
Bug 2: SSE stream cursor defaulted to client's `since` parameter when
no initial data existed. If `since` was in the future (or very recent),
live updates had created_at < cursor and were silently filtered out.
Fixed by defaulting cursor to now() when no initial data is returned.
Bug 3: NotifyUser used original sseSubs slice instead of sseSubsCopy
after releasing the read lock (race condition).
Verified E2E: Push from client A → SSE stream on client B receives
live change event with correct data within ~1 second.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New endpoint GET /sync/{appId}/stream sends Server-Sent Events with
change data directly, replacing the WebSocket notification + HTTP pull
round-trip pattern.
Server (Go):
- HandleStream() in handler.go: SSE endpoint with initial sync + live streaming
- Hub.Subscribe()/Unsubscribe() in hub.go: channel-based SSE subscriber system
- Notification type for type-safe SSE events
- convertChanges() helper extracted from duplicated code
- WriteTimeout set to 0 for SSE long-lived connections
Protocol: Client connects to /sync/{appId}/stream?collections=a,b&since=...
Server sends initial changes, then streams live changes as other clients sync.
Heartbeat every 30s keeps connection alive. Push still uses POST /sync/{appId}.
WebSocket remains available as fallback (not removed).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Server now returns hasMore: true when there are more than 1000 changes
pending for a collection. Client continues pulling in a loop until
hasMore is false, using the last row's timestamp as cursor.
Prevents data loss after long offline periods where >1000 changes
accumulated for a single collection.
Server changes (Go):
- GetChangesSince() accepts limit parameter
- HandlePull() fetches limit+1, trims, sets hasMore
- SyncedUntil uses last row's timestamp when paginating
Client changes (TypeScript):
- Pull loop: while (hasMore) { fetch → apply → advance cursor }
- Cursor only persisted after all pages fetched
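The client loop can be sketched with the HTTP pull injected; names are illustrative, not the actual sync.ts API:

```typescript
// Paginated pull: keep fetching while hasMore, advancing the cursor
// from each page, and persist the cursor only once at the end.
type Change = { id: string; ts: number };
type PullPage = { changes: Change[]; hasMore: boolean; syncedUntil: number };

async function pullAll(
  fetchPage: (since: number) => Promise<PullPage>,
  apply: (changes: Change[]) => void,
  persistCursor: (ts: number) => void,
  since: number,
): Promise<void> {
  let cursor = since;
  let hasMore = true;
  while (hasMore) {
    const page = await fetchPage(cursor);
    apply(page.changes);
    cursor = page.syncedUntil; // last row's timestamp when paginating
    hasMore = page.hasMore;
  }
  // Persist only after all pages: a crash mid-pull re-pulls from the
  // old cursor instead of silently skipping changes.
  persistCursor(cursor);
}
```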
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mirrors the frontend unification (single IndexedDB) on the backend.
All services now use pgSchema() for isolation within one shared database,
enabling cross-schema JOINs, simplified ops, and zero DB setup for new apps.
- Migrate 7 services from pgTable() to pgSchema(): mana-user (usr),
mana-media (media), todo, traces, presi, uload, cards
- Update all DATABASE_URLs in .env.development, docker-compose, configs
- Rewrite init-db scripts for 2 databases + 12 schemas
- Rewrite setup-databases.sh for consolidated architecture
- Update shared-drizzle-config default to mana_platform
- Update CLAUDE.md with new database architecture docs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Create packages/shared-python/manacore_auth/ with:
- auth.py: API key validation, rate limiting, local + external auth
- external_auth.py: mana-core-auth remote validation with caching
- create_auth_dependency(scope): factory for per-service auth deps
Migrated services:
- mana-stt: auth.py now wraps shared auth with scope="stt" (272→42 LOC)
- mana-tts: auth.py now wraps shared auth with scope="tts" (272→42 LOC)
The only difference between services was the scope parameter ("stt" vs "tts").
Both external_auth.py files were 100% identical and are now thin re-exports.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- shared-auth-stores: delete createSupabaseAuthStore (zero usage across monorepo,
all apps use createManaAuthStore). Remove export + types from index.ts.
- services: move ollama-metrics-proxy (stub — just a Grafana dashboard JSON) and
it-landing (Astro landing page, not a service) to services-archived/
- lint-staged: add services-archived/ to eslint ignore pattern
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add unified /ws endpoint that serves all app notifications over a single connection.
The server now includes appId in the sync-available message payload so the client
knows which app to pull. Legacy /ws/{appId} endpoint remains for backward compatibility.
Backend (Go):
- hub.go: Message struct gains AppId field, NotifyUser sends to all user clients
(unified clients receive everything, legacy clients filtered by appId)
- main.go: new GET /ws route (empty appId = unified mode)
Frontend (sync.ts):
- Single connectUnifiedWs() replaces 27 per-app connectWs() calls
- Parses msg.appId from server to pull only the affected app
- Reconnect/offline logic simplified to one WS
This reduces WebSocket connections from 27 per user to 1, cutting server
connection overhead by ~96%.
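The client-side routing can be sketched as a message handler; the message shape follows the commit, while the function name is illustrative:

```typescript
// One socket for all apps: the server's appId field tells the client
// which app to pull when a sync-available notification arrives.
type SyncMessage = { type: string; appId?: string };

function makeMessageHandler(pullApp: (appId: string) => void) {
  return (raw: string) => {
    const msg: SyncMessage = JSON.parse(raw);
    if (msg.type === "sync-available" && msg.appId) {
      pullApp(msg.appId); // pull only the affected app
    }
  };
}
```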
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- mana-image-gen: change default port from 3025 to 3026 to avoid conflict with mana-llm
- Dashboard widgets (12): replace APP_URLS.{app}.dev/prod with internal route paths (/todo, /calendar, etc.)
and remove target="_blank" since all apps are now internal routes in the unified app
- Home page: use goto() for internal apps, keep window.open() only for external apps (matrix, arcade)
- AppRow: remove unused APP_URLS import
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New GPU service for fast text-to-video generation using LTX-Video (~2B params)
on the RTX 3090. Generates 480p clips in 10-30 seconds, uses ~10GB VRAM.
Includes Cloudflare Tunnel route, Prometheus monitoring, and health checks.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mana-stt: add WhisperX service with CUDA GPU support, speaker diarization, and auto-fallback chain.
mana-notify: add locale fallback and default templates for task reminders.
CD: update deployment pipeline and docker-compose configuration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extract ~120 hardcoded German strings from 14 Svelte components into i18n locale
files using svelte-i18n $t() calls. Add new translation sections (taskForm, filters,
tags, subtasks, durationPicker, kanban, toolbar) across all 5 languages (de/en/fr/es/it).
Also add missing shared common translations for Spanish, French, and Italian
(150+ keys each) in packages/shared-i18n.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Delete apps/memoro/apps/backend/ (NestJS) and apps/memoro/apps/audio-backend/
(NestJS) — all functionality has been ported to the new Hono/Bun servers
(apps/server/ and apps/audio-server/).
Also clean up root and memoro package.json scripts to remove references
to the old @memoro/backend and @memoro/audio-backend packages.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Accessing (error as any)?.body?.code on a Better Auth APIError triggers an internal
async stream read. When the request body contains special chars like '!', the deferred
JSON parse fails as an unhandled rejection that races with the response, causing a 500.
Check only error.status === 'FORBIDDEN', which is a plain string property.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Better Auth uses callbackURL to determine the post-verification redirect target.
Setting only redirectTo left callbackURL at its default of /, which resolved to auth.mana.how/ (404).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>