feat(auth): error-classification layer + passkey end-to-end

Two interlocking fixes driven by a production lockout incident.

## Bug that motivated this

A fresh schema-drift column (auth.users.onboarding_completed_at) made
every Better Auth query crash with Postgres 42703 (undefined_column).
The /login wrapper swallowed the non-2xx response, mapped it onto a
generic "401 Invalid credentials", and bumped the password lockout
counter, so 5 legitimate login attempts against a broken DB would have
locked every real user out of their own account. The same wrapper
pattern was in place on /register, /refresh, /reset-password, etc. The
30-minute hunt ended in a one-off repro script that finally surfaced
the real Postgres error.

The user-facing passkey button additionally triggered generic 404s on
every login-page mount because the route wasn't registered (the DB
schema existed, but the Better Auth plugin wasn't wired).

## Phase 1 — Error classification (services/mana-auth/src/lib/auth-errors)

- 19-code AuthErrorCode taxonomy (INVALID_CREDENTIALS, EMAIL_NOT_VERIFIED,
  ACCOUNT_LOCKED, SERVICE_UNAVAILABLE, PASSKEY_VERIFICATION_FAILED, …)
- classifyFromResponse/classifyFromError handle: Better Auth APIError
  (duck-typed on `name === 'APIError'`), Postgres errors (23505 unique,
  42703/08xxx → infra), ZodError, fetch/ECONNREFUSED network errors,
  bare Error, unknown.
- respondWithError routes the structured response, logs at the right
  level, fires the correct security event, and CRITICALLY only bumps
  the lockout counter for actual credential failures — SERVICE_UNAVAILABLE
  and INTERNAL never touch lockout.
- All 12 endpoints in routes/auth.ts refactored (/login, /register,
  /logout, /session-to-token, /refresh, /validate, /forgot-password,
  /reset-password, /resend-verification, /profile GET+POST,
  /change-email, /change-password, /account DELETE).
- Fixed pre-existing auth.api.forgetPassword typo (→ requestPasswordReset).
- shared-logger + requestLogger middleware wired in index.ts; all
  console.* calls in the service removed.
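
The classification rule in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in auth-errors: `Classified`, `readString`, `readNumber`, the `CONFLICT` code name, the `bumpsLockout` flag, and the numeric `statusCode` field on the duck-typed APIError are all assumptions made for the example.

```typescript
// Hypothetical subset of the 19-code taxonomy. CONFLICT is an illustrative
// name for the 23505 case; the commit message does not name that code.
type AuthErrorCode = 'INVALID_CREDENTIALS' | 'CONFLICT' | 'SERVICE_UNAVAILABLE' | 'INTERNAL';

interface Classified {
  code: AuthErrorCode;
  status: number;
  // Only genuine credential failures may count toward the password lockout.
  bumpsLockout: boolean;
}

// Duck-typed property readers: never trust the error's class, only its shape.
function readString(err: unknown, key: string): string | undefined {
  if (typeof err !== 'object' || err === null) return undefined;
  const value = (err as Record<string, unknown>)[key];
  return typeof value === 'string' ? value : undefined;
}

function readNumber(err: unknown, key: string): number | undefined {
  if (typeof err !== 'object' || err === null) return undefined;
  const value = (err as Record<string, unknown>)[key];
  return typeof value === 'number' ? value : undefined;
}

function classifyFromError(err: unknown): Classified {
  const code = readString(err, 'code');

  // Postgres unique_violation: a client-visible conflict, not an outage.
  if (code === '23505') return { code: 'CONFLICT', status: 409, bumpsLockout: false };

  // undefined_column (42703), connection-exception class (08xxx) and
  // refused sockets are infrastructure failures.
  const isInfra =
    code === '42703' ||
    code === 'ECONNREFUSED' ||
    (code !== undefined && /^08[0-9A-Z]{3}$/.test(code));
  if (isInfra) return { code: 'SERVICE_UNAVAILABLE', status: 503, bumpsLockout: false };

  // Assumption: the upstream APIError exposes a numeric `statusCode`.
  if (readString(err, 'name') === 'APIError' && readNumber(err, 'statusCode') === 401) {
    return { code: 'INVALID_CREDENTIALS', status: 401, bumpsLockout: true };
  }

  // Bare Error, unknown values, everything else.
  return { code: 'INTERNAL', status: 500, bumpsLockout: false };
}
```

With this shape, a caller would increment the lockout counter only when `bumpsLockout` is true, so a 42703 from a drifted schema surfaces as a 503 and leaves the counter untouched.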

## Phase 2 — Passkey end-to-end (@better-auth/passkey 1.6+)

- sql/007_passkey_bootstrap.sql: idempotent schema alignment —
  friendly_name→name, +aaguid, transports jsonb→text, +method column
  on login_attempts.
- better-auth.config.ts: passkey plugin wired with rpID/rpName/origin
  from new webauthn config section. rpID defaults to mana.how in prod
  (from COOKIE_DOMAIN), localhost in dev.
- routes/passkeys.ts: 7 wrapper endpoints (capability probe,
  register/options+verify, authenticate/options+verify with JWT mint,
  list, delete, rename). Each routes errors through the classifier;
  authenticate/verify promotes generic INVALID_CREDENTIALS to
  PASSKEY_VERIFICATION_FAILED.
- PasskeyRateLimitService: in-memory per-IP (options: 20/min) and
  per-credential (verify: 10 failures/min → 5 min cooldown) buckets.
  Deliberately separate from the password lockout — different factor,
  different blast radius.
- Client: authService.getPasskeyCapability() async probe, memoised per
  session. authStore.passkeyAvailable reactive state. LoginPage gates
  on passkeyAvailable === true, so a slow probe doesn't flash the
  button in.
- AuthResult grew a code: AuthErrorCode field; handleAuthError in
  shared-auth prefers the server envelope over the legacy message
  heuristics.
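
The two in-memory buckets described above could look something like the sketch below. It is not the actual PasskeyRateLimitService: the class names, the fixed-window strategy for the per-IP limit, the choice to start the cooldown on the 10th failure inside the window, and the injectable clock are assumptions made for illustration.

```typescript
// Fixed-window counter, e.g. per-IP on the options endpoints (20/min).
class FixedWindowLimiter {
  private readonly hits = new Map<string, { windowStart: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = () => Date.now()) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true if the request is within the limit for the current window.
  allow(key: string): boolean {
    const t = this.now();
    const entry = this.hits.get(key);
    if (entry === undefined || t - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { windowStart: t, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}

// Failure bucket with cooldown, e.g. per-credential on verify
// (10 failures/min, then blocked for 5 min).
class FailureCooldown {
  private readonly entries = new Map<string, { failures: number[]; blockedUntil: number }>();
  private readonly maxFailures: number;
  private readonly windowMs: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(
    maxFailures: number,
    windowMs: number,
    cooldownMs: number,
    now: () => number = () => Date.now(),
  ) {
    this.maxFailures = maxFailures;
    this.windowMs = windowMs;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  isBlocked(key: string): boolean {
    const entry = this.entries.get(key);
    return entry !== undefined && this.now() < entry.blockedUntil;
  }

  recordFailure(key: string): void {
    const t = this.now();
    const entry = this.entries.get(key) ?? { failures: [], blockedUntil: 0 };
    // Keep only failures that are still inside the sliding window.
    entry.failures = entry.failures.filter((ts) => t - ts < this.windowMs);
    entry.failures.push(t);
    if (entry.failures.length >= this.maxFailures) {
      entry.blockedUntil = t + this.cooldownMs;
      entry.failures = [];
    }
    this.entries.set(key, entry);
  }
}
```

Keeping these buckets in their own class, keyed by IP or credential ID rather than by user, is one way to keep passkey throttling from ever feeding the password lockout counter.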

## Tests

- 30 unit tests for the classifier covering every branch (including
  the exact Postgres 42703 shape that started this).
- 9 unit tests for the rate limiter.
- 14 integration tests for the auth routes — the regression test
  explicitly asserts "upstream 500 → 503 + zero lockout bumps".
- 101 tests pass, 0 fail, 30 pre-existing skips unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Till JS 2026-04-24 01:52:51 +02:00
parent b204958007
commit e66654068f
24 changed files with 3450 additions and 552 deletions

@@ -153,7 +153,16 @@ export const jwks = authSchema.table('jwks', {
   createdAt: timestamp('created_at', { withTimezone: true }).defaultNow().notNull(),
 });
-// Passkeys table (WebAuthn credentials)
+// Passkeys table (WebAuthn credentials).
+// Field names match `@better-auth/passkey`'s expected schema so the
+// Drizzle adapter can write/read directly without a translation layer.
+// Notably: the TS field is `credentialID` (capital I/D) even though
+// the SQL column stays snake_case; the plugin dereferences by TS name.
+// `transports` is a comma-separated string (not jsonb) because the
+// plugin stores the AuthenticatorTransport[] as a CSV.
+// `name` (was `friendlyName`) is user-provided.
+// `lastUsedAt` is ours — populated by the wrapper on successful
+// authentication; the plugin itself doesn't touch it.
 export const passkeys = authSchema.table(
   'passkeys',
   {
@@ -161,13 +170,14 @@ export const passkeys = authSchema.table(
     userId: text('user_id')
       .references(() => users.id, { onDelete: 'cascade' })
       .notNull(),
-    credentialId: text('credential_id').unique().notNull(), // base64url-encoded
+    credentialID: text('credential_id').unique().notNull(), // base64url-encoded
     publicKey: text('public_key').notNull(), // base64url-encoded COSE public key
     counter: integer('counter').default(0).notNull(), // signature counter
     deviceType: text('device_type').notNull(), // 'singleDevice' | 'multiDevice'
     backedUp: boolean('backed_up').default(false).notNull(),
-    transports: jsonb('transports').$type<string[]>(), // ['internal', 'hybrid', etc.]
-    friendlyName: text('friendly_name'),
+    transports: text('transports'), // CSV of AuthenticatorTransport values
+    name: text('name'),
+    aaguid: text('aaguid'), // authenticator AAGUID (optional)
     lastUsedAt: timestamp('last_used_at', { withTimezone: true }),
     createdAt: timestamp('created_at', { withTimezone: true }).defaultNow().notNull(),
   },
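
Because `transports` is now a CSV string rather than a jsonb array, any code that reads the column directly needs a small conversion step. The helpers below are a hypothetical sketch of that conversion, not part of this commit: `KNOWN_TRANSPORTS`, `Transport`, `transportsToCsv`, and `csvToTransports` are illustrative names, and the list of accepted values is an assumption based on the WebAuthn transport identifiers.

```typescript
// Assumed set of transport identifiers; unknown values are dropped on read.
const KNOWN_TRANSPORTS = ['ble', 'cable', 'hybrid', 'internal', 'nfc', 'smart-card', 'usb'] as const;
type Transport = (typeof KNOWN_TRANSPORTS)[number];

// Serialise for the text column; an empty or missing list becomes NULL.
function transportsToCsv(transports: readonly Transport[] | undefined): string | null {
  return transports !== undefined && transports.length > 0 ? transports.join(',') : null;
}

// Parse the text column back into a typed array, tolerating NULL and stray spaces.
function csvToTransports(csv: string | null | undefined): Transport[] {
  if (!csv) return [];
  return csv
    .split(',')
    .map((part) => part.trim())
    .filter((part): part is Transport => (KNOWN_TRANSPORTS as readonly string[]).includes(part));
}
```

Dropping unrecognised values on read is a design choice for the sketch; a stricter variant could throw instead, so that a corrupted row is noticed early.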