Commit graph

10 commits

Author SHA1 Message Date
Till JS
4cca25ed03 chore(dev): switch all Bun services from --watch to --hot
Fans out the cards-server fix from 08f422340 to every other Bun
service in the monorepo. --hot keeps the port bound across HMR
reloads via globalThis[hmrSymbol]; --watch restarts the process
and races old/new Bun.serve for the port.

Touched: dev:auth, dev:credits, dev:events, dev:analytics,
dev:memoro:server, dev:memoro:audio-server, dev:uload:server in
the root package.json plus the matching `dev` script in each
service's own package.json. All six services already export the
`{ port, fetch }` default that Bun's --hot expects.

Smoke-tested: pnpm dev:cardecky:full boots clean, then touching
auth/credits/cards-server entry files all hot-reload without
dropping their port.

(memoro/apps/audio-server doesn't have a `dev: bun --watch ...`
script in its own package.json, so only the root entry got the
swap there.)
2026-05-08 14:24:24 +02:00
Till JS
8b71d290f8 feat(mana-auth): wire Glitchtip/Sentry error tracking via shared-error-tracking
First server-side error-tracking integration. Pattern mirrors the
client-side one in apps/mana/apps/web/src/hooks.client.ts:

- pull @mana/shared-error-tracking into mana-auth's deps (workspace pkg
  with @sentry/node + a no-op fallback when GLITCHTIP_DSN is unset)
- call initErrorTracking() at the top of services/mana-auth/src/index.ts
  before the rest of the module body executes — this lets Sentry hook
  uncaughtException / unhandledRejection before any Hono handlers register
- wrap app.onError so non-HTTPException throws also flow into
  captureException with path/method/query context. HTTPExceptions are
  intentional 4xx/422 and stay out of the issue list (otherwise every
  401 from a stale session would page somebody at 3am)
- compose: pass GLITCHTIP_DSN_MANA_PLATFORM through as GLITCHTIP_DSN per
  service so each container's events get tagged with serverName='mana-auth'

DSN itself isn't in the repo; lives in .env.macmini on the Mac Mini and
is referenced from the Glitchtip credentials doc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 01:55:07 +02:00
Till JS
e66654068f feat(auth): error-classification layer + passkey end-to-end
Two interlocking fixes driven by a production lockout incident.

## Bug that motivated this

A fresh schema-drift column (auth.users.onboarding_completed_at) made
every Better Auth query crash with Postgres 42703. The /login wrapper
swallowed the non-2xx and mapped it onto a generic "401 Invalid
credentials" AND bumped the password lockout counter — so 5 legit
login attempts against a broken DB would have locked every real user
out of their own account. Same wrapper pattern on /register, /refresh,
/reset-password etc. The 30-minute hunt ended in a one-off repro
script that finally surfaced the real Postgres error.

The user-facing passkey button additionally returned generic 404s on
every login-page mount because the route wasn't registered (the DB
schema existed, the Better Auth plugin wasn't wired).

## Phase 1 — Error classification (services/mana-auth/src/lib/auth-errors)

- 19-code AuthErrorCode taxonomy (INVALID_CREDENTIALS, EMAIL_NOT_VERIFIED,
  ACCOUNT_LOCKED, SERVICE_UNAVAILABLE, PASSKEY_VERIFICATION_FAILED, …)
- classifyFromResponse/classifyFromError handle: Better Auth APIError
  (duck-typed on `name === 'APIError'`), Postgres errors (23505 unique,
  42703/08xxx → infra), ZodError, fetch/ECONNREFUSED network errors,
  bare Error, unknown.
- respondWithError routes the structured response, logs at the right
  level, fires the correct security event, and CRITICALLY only bumps
  the lockout counter for actual credential failures — SERVICE_UNAVAILABLE
  and INTERNAL never touch lockout.
- All 12 endpoints in routes/auth.ts refactored (/login, /register,
  /logout, /session-to-token, /refresh, /validate, /forgot-password,
  /reset-password, /resend-verification, /profile GET+POST,
  /change-email, /change-password, /account DELETE).
- Fixed pre-existing auth.api.forgetPassword typo (→ requestPasswordReset).
- shared-logger + requestLogger middleware wired in index.ts; all
  console.* calls in the service removed.

## Phase 2 — Passkey end-to-end (@better-auth/passkey 1.6+)

- sql/007_passkey_bootstrap.sql: idempotent schema alignment —
  friendly_name→name, +aaguid, transports jsonb→text, +method column
  on login_attempts.
- better-auth.config.ts: passkey plugin wired with rpID/rpName/origin
  from new webauthn config section. rpID defaults to mana.how in prod
  (from COOKIE_DOMAIN), localhost in dev.
- routes/passkeys.ts: 7 wrapper endpoints (capability probe,
  register/options+verify, authenticate/options+verify with JWT mint,
  list, delete, rename). Each routes errors through the classifier;
  authenticate/verify promotes generic INVALID_CREDENTIALS to
  PASSKEY_VERIFICATION_FAILED.
- PasskeyRateLimitService: in-memory per-IP (options: 20/min) and
  per-credential (verify: 10 failures/min → 5 min cooldown) buckets.
  Deliberately separate from the password lockout — different factor,
  different blast radius.
- Client: authService.getPasskeyCapability() async probe, memoised per
  session. authStore.passkeyAvailable reactive state. LoginPage gates
  on === true so a slow probe doesn't flash the button in.
- AuthResult grew a code: AuthErrorCode field; handleAuthError in
  shared-auth prefers the server envelope over the legacy message
  heuristics.

## Tests

- 30 unit tests for the classifier covering every branch (including
  the exact Postgres 42703 shape that started this).
- 9 unit tests for the rate limiter.
- 14 integration tests for the auth routes — the regression test
  explicitly asserts "upstream 500 → 503 + zero lockout bumps".
- 101 tests pass, 0 fail, 30 pre-existing skips unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 01:52:51 +02:00
Till JS
166d6c6ffb feat(spaces): validate space metadata on Better Auth organization hooks
Moves the canonical SpaceType + SPACE_MODULE_ALLOWLIST to @mana/shared-types
(framework-free) so the Bun services can consume them without pulling in
Svelte. shared-branding keeps only the UI-facing labels and descriptions
and re-exports the canonical types for frontend convenience.

Wires two Better Auth organization hooks in mana-auth:
- beforeCreateOrganization asserts metadata.type is a valid SpaceType,
  rejecting the create with a BAD_REQUEST otherwise.
- beforeDeleteOrganization rejects deletion of the personal space.

Covered by bun tests (11 assertions) for the helper module.

No migration and no schema change — type lives in the existing
organization.metadata jsonb column.

Plan: docs/plans/spaces-foundation.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 16:05:38 +02:00
Till JS
9a3025fed8 feat(ai,auth): Mission Grant endpoint + unwrap helper + audit table
Phase 1 of the Mission Key-Grant rollout. Webapp can now request a
wrapped per-mission data key; mana-ai can unwrap and (Phase 2) use it.

mana-auth:
- POST /api/v1/me/ai-mission-grant — HKDF-derives MDK from the user
  master key, RSA-OAEP-2048-wraps with the mana-ai public key, returns
  { wrappedKey, derivation, issuedAt, expiresAt }
- MissionGrantService refuses zero-knowledge users (409 ZK_ACTIVE) and
  returns 503 GRANT_NOT_CONFIGURED when MANA_AI_PUBLIC_KEY_PEM is unset
- TTL clamped to [1h, 30d]

mana-ai:
- configureMissionGrantKey + unwrapMissionGrant with structured failure
  reasons (not-configured / expired / malformed / wrap-rejected)
- mana_ai.decrypt_audit table + RLS policy scoped to
  app.current_user_id — append-only row per server-side decrypt attempt
- MANA_AI_PRIVATE_KEY_PEM env slot; absent = grants silently disabled

No existing behaviour changes: missions without a grant run exactly as
before. Grant flow is wired end-to-end but unused until Phase 2 lands
the encrypted resolver.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 13:41:59 +02:00
Till JS
919fcca4b7 refactor(shared-tailwind): rewrite themes.css to single-layer shadcn convention
Pre-launch theme system audit found multiple parallel layers in themes.css
(--theme-X full hsl strings, --X partial shadcn aliases, --color-X populated
by runtime store with raw channels) plus dead-code companion files. The
inconsistency caused light-mode regressions when scoped-CSS consumers
wrote `var(--color-X)` standalone — the variable holds raw HSL channels
which is invalid as a color value, browser fell back to inherited (white).

Rewrite to one consistent layer:

  - Source of truth: --color-X defined as raw HSL channels (e.g.
    `0 0% 17%`) in :root, .dark, and all variant [data-theme="..."]
    blocks. Matches the format the runtime store
    (@mana/shared-theme/src/utils.ts) writes, eliminating the
    static-fallback-vs-runtime mismatch and the corresponding flash
    of unstyled content on hydration.

  - @theme inline uses self-reference + Tailwind v4 <alpha-value>
    placeholder so utility classes generate correctly AND opacity
    modifiers work: `text-foreground/50` → `hsl(var(--color-foreground) / 0.5)`.

  - @layer components (.btn-primary, .card, .badge, etc.) wraps
    var(--color-X) refs with hsl() — they were broken in light mode
    too for the same reason.

Convention going forward (also documented in the file header):

  1. Markup: use Tailwind utility classes (text-foreground, bg-card, …)
  2. Scoped CSS: hsl(var(--color-X)) — always wrap with hsl()
  3. NEVER raw var(--color-X) in CSS — that's the bug pattern

Net file: 692 → 580 LOC. Single source layer, no indirection.

Also delete dead companion files (zero imports anywhere):
  - tailwind-v4.css (had broken self-reference, never imported)
  - theme-variables.css (legacy hex-based palette)
  - components.css (legacy component utilities)
  - index.js / preset.js / colors.js (Tailwind v3 preset format,
    irrelevant under Tailwind v4)

package.json exports map shrinks accordingly to just `./themes.css`.

Consumers using `hsl(var(--color-X))` (~379 files across mana-web,
manavoxel-web, arcade-web) keep working unchanged — the public API
name `--color-X` is preserved. Only the broken pattern `var(--color-X)`
(~61 files) needs a follow-up sweep, handled in a separate commit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 01:13:06 +02:00
Till JS
0d1d3b9449 fix(mana-auth): declare missing nanoid dependency
mana-auth has been crash-looping in production with:

    error: Cannot find package 'nanoid' from
    '/app/src/services/encryption-vault/index.ts'

The encryption-vault service imports nanoid for audit row IDs (line 27,
used at line 547 in the audit log writer), but nanoid was never added
to services/mana-auth/package.json. The import was introduced in commit
e9915428c (phase 2 — server-side master key custody) and slipped past
because nanoid happens to exist transitively in the workspace via
postcss → nanoid@3.3.11. Local pnpm store lookups would resolve it just
fine; a strict isolated container build can't.

Fix:
- Add "nanoid": "^5.0.0" to services/mana-auth/package.json deps
- pnpm install pulled nanoid@5.1.7 into services/mana-auth/node_modules

Verified the import resolves locally:
    bun -e 'import { nanoid } from "nanoid"; console.log(nanoid())'
    → ok: 6TLuTWlenhC0KnSESn5Ex

The Mac Mini still needs to redeploy mana-auth (rebuild image with the
new lockfile, restart container) to pick this up — production is
currently 502ing on auth.mana.how.
2026-04-08 15:50:14 +02:00
Till JS
e9915428cb feat(mana-auth): encryption vault — phase 2 (server-side master key custody)
Adds the server side of the per-user encryption vault. Phase 1 shipped
the client foundation (no-op while every table is enabled:false). This
commit lets the client actually fetch a master key when Phase 3 flips
the registry switches.

Schema (Drizzle + raw SQL migration)
  - auth.encryption_vaults: per-user wrapped MK + IV + format version +
    kek_id stamp + created/rotated timestamps. PK = user_id, ON DELETE
    CASCADE so account deletion wipes the vault.
  - auth.encryption_vault_audit: append-only trail of init/fetch/rotate
    actions with IP, user-agent, HTTP status, free-form context.
  - sql/002_encryption_vaults.sql: idempotent CREATE TABLE + ENABLE +
    FORCE row-level security with a `current_setting('app.current_user_id')`
    policy on both tables. FORCE makes the policy apply to the table
    owner too — no bypass via grants.

KEK loader (services/encryption-vault/kek.ts)
  - Loads a 32-byte AES-256 KEK from the MANA_AUTH_KEK env var (base64).
  - Production: missing or wrong-length input is fatal at boot.
  - Development: 32-zero-byte fallback so contributors can run the
    service without provisioning a secret. Logs a loud warning.
  - wrapMasterKey / unwrapMasterKey use Web Crypto AES-GCM-256 over the
    raw 32-byte MK with a fresh 12-byte IV per wrap. Returns base64
    pair for storage.
  - generateMasterKey + activeKekId helpers used by the service.
  - Future migration to KMS / Vault: only loadKek() changes; the
    kek_id stamp on each row tracks which KEK produced it.

EncryptionVaultService (services/encryption-vault/index.ts)
  - init(userId): idempotent — returns existing MK or mints a new one.
  - getMasterKey(userId): unwraps the stored MK; throws VaultNotFoundError
    on no-row so the route can return 404 cleanly.
  - rotate(userId): mints fresh MK, replaces wrap. Caller is on the
    hook for re-encryption — destructive by design.
  - withUserScope(userId, fn): wraps every read/write in a Drizzle
    transaction with set_config('app.current_user_id', userId, true)
    so the RLS policy admits only the matching row. Empty userId is
    rejected up-front.
  - writeAudit() appends a row to encryption_vault_audit on every
    action including failures, so probing attempts leave a trail.

Routes (routes/encryption-vault.ts)
  - POST /api/v1/me/encryption-vault/init  — idempotent bootstrap
  - GET  /api/v1/me/encryption-vault/key   — fetch the active MK
  - POST /api/v1/me/encryption-vault/rotate — destructive rotation
  - All return base64-encoded master key bytes plus formatVersion +
    kekId. JWT-protected via the existing /api/v1/me/* middleware.
  - readAuditContext() pulls X-Forwarded-For + User-Agent off the
    request for the audit row.

Bootstrap (index.ts)
  - loadKek() runs at top-level await before any route can fire so a
    misconfigured KEK fails closed at boot, never at request time.
  - encryptionVaultService is mounted under /api/v1/me/encryption-vault
    so it inherits the existing JWT middleware and shows up next to the
    GDPR self-service endpoints.

Tests (services/encryption-vault/kek.test.ts)
  - 11 Bun-test cases covering: KEK load (happy path, wrong length,
    idempotent, before-load guard), generateMasterKey randomness,
    wrap/unwrap roundtrip, IV uniqueness across repeated wraps,
    wrong-MK-length rejection, tampered-ciphertext rejection,
    wrong-length IV rejection, wrong-KEK rejection.
  - Service-level integration tests deferred — they need a real
    Postgres for the RLS behaviour, set up via existing mana-sync
    test pattern in CI.

Config + env
  - .env.development gains MANA_AUTH_KEK= (empty → dev fallback)
    with a comment explaining the production requirement.
  - services/mana-auth/package.json gains "test": "bun test".

Verified: 11/11 KEK tests passing, 31/31 Phase 1 client tests still
passing, only pre-existing TS errors remain in mana-auth (auth.ts:281
forgetPassword + api-keys.ts:50 insert overload — both unrelated).

Phase 3: client wires the MemoryKeyProvider to GET /encryption-vault/key
on login, flips registry entries to enabled:true table by table, and
extends the Dexie hooks to call wrapValue/unwrapValue on configured
fields.
Phase 4: settings UI for lock state, key rotation, recovery code opt-in.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 18:38:09 +02:00
Till JS
b2adaaa30e refactor(mana-auth): route emails through mana-notify instead of Nodemailer
Replace direct Brevo SMTP sending with HTTP calls to mana-notify's
notification API. This centralizes all email configuration in one
service (mana-notify) and removes the nodemailer dependency from
mana-auth. SMTP provider is now swappable via a single env var.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 15:01:27 +02:00
Till JS
61ee1ae269 feat(services): create mana-auth (Hono + Bun) — Phase 5 auth rewrite
Rewrite the central authentication service from NestJS to Hono + Bun.
Uses Better Auth's native fetch-based handler — no Express conversion.

Key architecture changes:
- Better Auth handler mounted directly on Hono (app.all('/api/auth/*'))
- No NestJS DI, modules, guards, decorators — plain TypeScript
- JWT validation via jose (same as extracted services)
- Email via nodemailer (simplified, German templates)
- ~1,400 LOC vs ~11,500 LOC in NestJS (88% reduction)

Service structure:
- auth/better-auth.config.ts — copied from mana-core-auth (framework-agnostic)
- auth/stores.ts — in-memory stores for email redirect URLs
- email/send.ts — nodemailer email functions
- middleware/ — JWT auth, service auth, error handler (shared pattern)
- db/schema/ — copied from mana-core-auth (Drizzle, framework-agnostic)

Port: 3001 (same as mana-core-auth — drop-in replacement)
Database: mana_auth (same DB, same schemas)

Better Auth plugins: Organization, JWT (EdDSA), OIDC Provider,
Two-Factor (TOTP), Magic Link

Note: This is the initial version. Guilds, API keys, Me (GDPR),
security (lockout/audit), and admin endpoints will be added
incrementally. The old mana-core-auth remains until fully replaced.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 02:43:44 +01:00