Commit graph

7 commits

Author SHA1 Message Date
Till JS
e66654068f feat(auth): error-classification layer + passkey end-to-end
Two interlocking fixes driven by a production lockout incident.

## Bug that motivated this

A fresh schema-drift column (auth.users.onboarding_completed_at) made
every Better Auth query crash with Postgres 42703. The /login wrapper
swallowed the non-2xx and mapped it onto a generic "401 Invalid
credentials" AND bumped the password lockout counter — so 5 legit
login attempts against a broken DB would have locked every real user
out of their own account. Same wrapper pattern on /register, /refresh,
/reset-password etc. The 30-minute hunt ended in a one-off repro
script that finally surfaced the real Postgres error.

The user-facing passkey button additionally returned generic 404s on
every login-page mount because the route wasn't registered (the DB
schema existed, the Better Auth plugin wasn't wired).

## Phase 1 — Error classification (services/mana-auth/src/lib/auth-errors)

- 19-code AuthErrorCode taxonomy (INVALID_CREDENTIALS, EMAIL_NOT_VERIFIED,
  ACCOUNT_LOCKED, SERVICE_UNAVAILABLE, PASSKEY_VERIFICATION_FAILED, …)
- classifyFromResponse/classifyFromError handle: Better Auth APIError
  (duck-typed on `name === 'APIError'`), Postgres errors (23505 unique,
  42703/08xxx → infra), ZodError, fetch/ECONNREFUSED network errors,
  bare Error, unknown.
- respondWithError routes the structured response, logs at the right
  level, fires the correct security event, and CRITICALLY only bumps
  the lockout counter for actual credential failures — SERVICE_UNAVAILABLE
  and INTERNAL never touch lockout.
- All 12 endpoints in routes/auth.ts refactored (/login, /register,
  /logout, /session-to-token, /refresh, /validate, /forgot-password,
  /reset-password, /resend-verification, /profile GET+POST,
  /change-email, /change-password, /account DELETE).
- Fixed pre-existing auth.api.forgetPassword typo (→ requestPasswordReset).
- shared-logger + requestLogger middleware wired in index.ts; all
  console.* calls in the service removed.

## Phase 2 — Passkey end-to-end (@better-auth/passkey 1.6+)

- sql/007_passkey_bootstrap.sql: idempotent schema alignment —
  friendly_name→name, +aaguid, transports jsonb→text, +method column
  on login_attempts.
- better-auth.config.ts: passkey plugin wired with rpID/rpName/origin
  from new webauthn config section. rpID defaults to mana.how in prod
  (from COOKIE_DOMAIN), localhost in dev.
- routes/passkeys.ts: 7 wrapper endpoints (capability probe,
  register/options+verify, authenticate/options+verify with JWT mint,
  list, delete, rename). Each routes errors through the classifier;
  authenticate/verify promotes generic INVALID_CREDENTIALS to
  PASSKEY_VERIFICATION_FAILED.
- PasskeyRateLimitService: in-memory per-IP (options: 20/min) and
  per-credential (verify: 10 failures/min → 5 min cooldown) buckets.
  Deliberately separate from the password lockout — different factor,
  different blast radius.
- Client: authService.getPasskeyCapability() async probe, memoised per
  session. authStore.passkeyAvailable reactive state. LoginPage gates
  on === true so a slow probe doesn't flash the button in.
- AuthResult grew a code: AuthErrorCode field; handleAuthError in
  shared-auth prefers the server envelope over the legacy message
  heuristics.

## Tests

- 30 unit tests for the classifier covering every branch (including
  the exact Postgres 42703 shape that started this).
- 9 unit tests for the rate limiter.
- 14 integration tests for the auth routes — the regression test
  explicitly asserts "upstream 500 → 503 + zero lockout bumps".
- 101 tests pass, 0 fail, 30 pre-existing skips unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 01:52:51 +02:00
Till JS
89388fb369 refactor(mana-auth): move enums from public to auth schema
pgEnum() defaults to the public schema. Because
drizzle.config.ts sets schemaFilter: ['auth'], push introspection
never saw the enums and kept re-emitting CREATE TYPE access_tier ...,
failing with 42710. This blocked setup-databases.sh from advancing
mana-auth past the enum declarations and silently masked other drift
(e.g. the new `kind` column on auth.users going un-pushed).

Source side: three enums now live on authSchema via
authSchema.enum(...) instead of pgEnum(...). DB side: migration 006
recreates access_tier / user_role / user_kind inside the auth schema,
repoints auth.users.access_tier and auth.users.role via ::text cast
(preserving all data and defaults), and drops the old public types.

After this, `drizzle-kit push --force` reports "No changes detected"
on a clean DB and the broader `pnpm setup:db` run is green without
workarounds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:36:39 +02:00
Till JS
52f53c844b chore(mana-auth): add 005 persona tables migration
Documents the SQL that was applied manually to match the personas.ts
Drizzle schema introduced in 493db0c3b. Idempotent. See
docs/plans/mana-mcp-and-personas.md for the design. Required because
the spaces tables created alongside personas sit outside the auth
schemaFilter, and pre-existing public enums would otherwise trip
drizzle-kit push (resolved separately in migration 006).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:36:26 +02:00
Till JS
698ffe797c feat(spaces): add spaces pg schema — credentials + module_permissions
Groundwork for server-side Space extensions that must NOT live in Dexie:
  - spaces.credentials         — per-space OAuth tokens, API keys, SMTP
                                 configs. Access tokens are stored
                                 encrypted at rest with the service KEK.
  - spaces.module_permissions  — role × module read/write/admin overrides
                                 on top of the SPACE_MODULE_ALLOWLIST
                                 defaults.

Both tables FK to auth.organizations with ON DELETE CASCADE so deleting
a space drops its credentials and permission overrides automatically.

RLS is intentionally deferred — enabling it now would lock out services
that don't yet pass space context. A follow-up migration turns it on
after mana-api speaks the Spaces protocol end-to-end.

To apply locally: bun run db:push in services/mana-auth, or psql -f
sql/004_spaces.sql against the mana_platform DB.

No runtime code reads these tables yet — they're the scaffolding that
Task-8 (mana-sync) and the eventual social-relay/clubs modules will
consume.

Plan: docs/plans/spaces-foundation.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 16:13:33 +02:00
Till JS
f46d1328d8 feat(mana-auth): phase 9 milestone 2 — vault recovery wrap + zero-knowledge
Server-side support for the Phase 9 zero-knowledge opt-in. Adds the
recovery-wrap columns + four new vault operations + the routes that
expose them.

Schema (sql/003_recovery_wrap.sql)
----------------------------------
Adds to auth.encryption_vaults:

  - recovery_wrapped_mk    text                  (NULL until set)
  - recovery_iv            text                  (NULL until set)
  - recovery_format_version smallint NOT NULL DEFAULT 1
  - recovery_set_at        timestamptz
  - zero_knowledge         boolean NOT NULL DEFAULT false

Drops NOT NULL from wrapped_mk + wrap_iv (a vault in zero-knowledge
mode has no server-side wrap at all).

Three CHECK constraints enforce the invariant at the DB level so no
service bug can leave a vault in an inconsistent state:

  - encryption_vaults_has_wrap         — at least one of (wrapped_mk,
                                          recovery_wrapped_mk) is set
  - encryption_vaults_wrap_iv_pair     — ciphertext + IV are paired
                                          (both NULL or both set) on
                                          each wrap form
  - encryption_vaults_zk_consistency   — zero_knowledge=true implies
                                          wrapped_mk IS NULL AND
                                          recovery_wrapped_mk IS NOT NULL

If a code-level bug ever tried to enable ZK without a recovery wrap,
or to leave both wraps empty, Postgres would reject the UPDATE.

Drizzle schema (db/schema/encryption-vaults.ts)
-----------------------------------------------
Mirrors the migration: wrappedMk + wrapIv become nullable, the four
new columns added with the right defaults. Inline doc comment explains
the zero-knowledge fork.

Service (services/encryption-vault/index.ts)
--------------------------------------------
VaultFetchResult gains optional `requiresRecoveryCode` /
`recoveryWrappedMk` / `recoveryIv` so the route handler can serialize
the right shape. masterKey becomes Uint8Array | null (null in ZK mode).

Existing methods updated:
  - init: branches on row.zeroKnowledge — returns the recovery blob
    instead of an unwrapped MK if the user is already in ZK mode
  - getMasterKey: same fork, with audit context "zk-recovery-blob"
  - rotate: throws ZeroKnowledgeRotateForbidden in ZK mode (the server
    can't re-wrap a key it can't read). Also wipes any stale recovery
    wrap on rotation — the new MK has nothing to do with the old one,
    so the old recovery code would unwrap into garbage.

New methods:
  - setRecoveryWrap(userId, { recoveryWrappedMk, recoveryIv }, ctx)
    Stores (or replaces) the user's recovery wrap. Idempotent.
  - clearRecoveryWrap(userId, ctx)
    Removes the recovery wrap. Forbidden if ZK is active (would lock
    the user out) — throws ZeroKnowledgeActiveError → 409.
  - enableZeroKnowledge(userId, ctx)
    NULLs out wrapped_mk + wrap_iv, sets zero_knowledge=true. Requires
    a recovery wrap to already be present — throws
    RecoveryWrapMissingError → 400 otherwise. Idempotent on already-on.
  - disableZeroKnowledge(userId, mkBytes, ctx)
    Inverse: takes a freshly-unwrapped MK from the client, KEK-wraps
    it, stores as wrapped_mk, flips zero_knowledge=false. The client
    is the only entity that can supply the MK at this point, since
    the server can't decrypt the recovery wrap.

Three new error classes:
  - RecoveryWrapMissingError → 400 RECOVERY_WRAP_MISSING
  - ZeroKnowledgeActiveError → 409 ZK_ACTIVE
  - ZeroKnowledgeRotateForbidden → 409 ZK_ROTATE_FORBIDDEN

Audit action union extended with:
  - 'recovery_set' | 'recovery_clear' | 'zk_enable' | 'zk_disable'

Routes (routes/encryption-vault.ts)
-----------------------------------
GET /key + POST /init now share a serializeFetchResult helper that
returns either:
  - { masterKey, formatVersion, kekId }                 (standard)
  - { requiresRecoveryCode: true, recoveryWrappedMk,    (ZK mode)
      recoveryIv, formatVersion }

Three new routes:
  - POST   /recovery-wrap   — body: { recoveryWrappedMk, recoveryIv }
                              Stores the wrap. Validates both fields
                              are non-empty strings.
  - DELETE /recovery-wrap   — Removes the wrap. 409 if ZK active.
  - POST   /zero-knowledge  — body: { enable: boolean, masterKey?: base64 }
                              enable=true:  flip on (no body MK needed)
                              enable=false: flip off (MK required)
                              Validates the MK decodes to exactly 32 bytes.
                              Wipes the bytes after handing them to the
                              service.

POST /rotate now catches ZeroKnowledgeRotateForbidden → 409
ZK_ROTATE_FORBIDDEN so the client can show "disable zero-knowledge
first".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:05:49 +02:00
Till JS
e9915428cb feat(mana-auth): encryption vault — phase 2 (server-side master key custody)
Adds the server side of the per-user encryption vault. Phase 1 shipped
the client foundation (no-op while every table is enabled:false). This
commit lets the client actually fetch a master key when Phase 3 flips
the registry switches.

Schema (Drizzle + raw SQL migration)
  - auth.encryption_vaults: per-user wrapped MK + IV + format version +
    kek_id stamp + created/rotated timestamps. PK = user_id, ON DELETE
    CASCADE so account deletion wipes the vault.
  - auth.encryption_vault_audit: append-only trail of init/fetch/rotate
    actions with IP, user-agent, HTTP status, free-form context.
  - sql/002_encryption_vaults.sql: idempotent CREATE TABLE + ENABLE +
    FORCE row-level security with a `current_setting('app.current_user_id')`
    policy on both tables. FORCE makes the policy apply to the table
    owner too — no bypass via grants.

KEK loader (services/encryption-vault/kek.ts)
  - Loads a 32-byte AES-256 KEK from the MANA_AUTH_KEK env var (base64).
  - Production: missing or wrong-length input is fatal at boot.
  - Development: 32-zero-byte fallback so contributors can run the
    service without provisioning a secret. Logs a loud warning.
  - wrapMasterKey / unwrapMasterKey use Web Crypto AES-GCM-256 over the
    raw 32-byte MK with a fresh 12-byte IV per wrap. Returns base64
    pair for storage.
  - generateMasterKey + activeKekId helpers used by the service.
  - Future migration to KMS / Vault: only loadKek() changes; the
    kek_id stamp on each row tracks which KEK produced it.

EncryptionVaultService (services/encryption-vault/index.ts)
  - init(userId): idempotent — returns existing MK or mints a new one.
  - getMasterKey(userId): unwraps the stored MK; throws VaultNotFoundError
    on no-row so the route can return 404 cleanly.
  - rotate(userId): mints fresh MK, replaces wrap. Caller is on the
    hook for re-encryption — destructive by design.
  - withUserScope(userId, fn): wraps every read/write in a Drizzle
    transaction with set_config('app.current_user_id', userId, true)
    so the RLS policy admits only the matching row. Empty userId is
    rejected up-front.
  - writeAudit() appends a row to encryption_vault_audit on every
    action including failures, so probing attempts leave a trail.

Routes (routes/encryption-vault.ts)
  - POST /api/v1/me/encryption-vault/init  — idempotent bootstrap
  - GET  /api/v1/me/encryption-vault/key   — fetch the active MK
  - POST /api/v1/me/encryption-vault/rotate — destructive rotation
  - All return base64-encoded master key bytes plus formatVersion +
    kekId. JWT-protected via the existing /api/v1/me/* middleware.
  - readAuditContext() pulls X-Forwarded-For + User-Agent off the
    request for the audit row.

Bootstrap (index.ts)
  - loadKek() runs at top-level await before any route can fire so a
    misconfigured KEK fails closed at boot, never at request time.
  - encryptionVaultService is mounted under /api/v1/me/encryption-vault
    so it inherits the existing JWT middleware and shows up next to the
    GDPR self-service endpoints.

Tests (services/encryption-vault/kek.test.ts)
  - 11 Bun-test cases covering: KEK load (happy path, wrong length,
    idempotent, before-load guard), generateMasterKey randomness,
    wrap/unwrap roundtrip, IV uniqueness across repeated wraps,
    wrong-MK-length rejection, tampered-ciphertext rejection,
    wrong-length IV rejection, wrong-KEK rejection.
  - Service-level integration tests deferred — they need a real
    Postgres for the RLS behaviour, set up via existing mana-sync
    test pattern in CI.

Config + env
  - .env.development gains MANA_AUTH_KEK= (empty → dev fallback)
    with a comment explaining the production requirement.
  - services/mana-auth/package.json gains "test": "bun test".

Verified: 11/11 KEK tests passing, 31/31 Phase 1 client tests still
passing, only pre-existing TS errors remain in mana-auth (auth.ts:281
forgetPassword + api-keys.ts:50 insert overload — both unrelated).

Phase 3: client wires the MemoryKeyProvider to GET /encryption-vault/key
on login, flips registry entries to enabled:true table by table, and
extends the Dexie hooks to call wrapValue/unwrapValue on configured
fields.
Phase 4: settings UI for lock state, key rotation, recovery code opt-in.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 18:38:09 +02:00
Till JS
0d6005dbcc fix(inventar): import FeedbackPage from @manacore/feedback, not shared-ui
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 21:56:19 +02:00