Mirror of https://github.com/Memo-2023/mana-monorepo.git
Synced 2026-05-14 21:21:10 +02:00
feat(auth): error-classification layer + passkey end-to-end

Two interlocking fixes driven by a production lockout incident.

## Bug that motivated this

A fresh schema-drift column (auth.users.onboarding_completed_at) made
every Better Auth query crash with Postgres 42703. The /login wrapper
swallowed the non-2xx and mapped it onto a generic "401 Invalid
credentials" AND bumped the password lockout counter, so 5 legit login
attempts against a broken DB would have locked every real user out of
their own account. Same wrapper pattern on /register, /refresh,
/reset-password etc. The 30-minute hunt ended in a one-off repro script
that finally surfaced the real Postgres error.

The user-facing passkey button additionally returned generic 404s on
every login-page mount because the route wasn't registered (the DB
schema existed, the Better Auth plugin wasn't wired).

## Phase 1: Error classification (services/mana-auth/src/lib/auth-errors)

- 19-code AuthErrorCode taxonomy (INVALID_CREDENTIALS, EMAIL_NOT_VERIFIED,
  ACCOUNT_LOCKED, SERVICE_UNAVAILABLE, PASSKEY_VERIFICATION_FAILED, …)
- classifyFromResponse/classifyFromError handle: Better Auth APIError
  (duck-typed on `name === 'APIError'`), Postgres errors (23505 unique,
  42703/08xxx → infra), ZodError, fetch/ECONNREFUSED network errors,
  bare Error, unknown.
- respondWithError routes the structured response, logs at the right
  level, fires the correct security event, and CRITICALLY only bumps
  the lockout counter for actual credential failures;
  SERVICE_UNAVAILABLE and INTERNAL never touch lockout.
- All 12 endpoints in routes/auth.ts refactored (/login, /register,
  /logout, /session-to-token, /refresh, /validate, /forgot-password,
  /reset-password, /resend-verification, /profile GET+POST,
  /change-email, /change-password, /account DELETE).
- Fixed pre-existing auth.api.forgetPassword typo (→ requestPasswordReset).
- shared-logger + requestLogger middleware wired in index.ts; all
  console.* calls in the service removed.
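
The lockout rule from Phase 1 can be illustrated with a minimal sketch. This is not the code in services/mana-auth/src/lib/auth-errors: the `Classified` shape, the `countsTowardLockout` flag, the `handleLoginFailure` helper and the reliance on a numeric `statusCode` field are assumptions made for illustration, and only three of the 19 codes are modelled.

```typescript
// Hypothetical sketch: illustrative names only, not the real auth-errors module.
type AuthErrorCode = 'INVALID_CREDENTIALS' | 'SERVICE_UNAVAILABLE' | 'INTERNAL';

interface Classified {
  code: AuthErrorCode;
  httpStatus: number;
  // Assumed flag: true only for genuine credential failures.
  countsTowardLockout: boolean;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null;
}

function classifyFromError(err: unknown): Classified {
  if (isRecord(err)) {
    // Postgres driver errors expose a five-character SQLSTATE in `code`.
    const sqlState = typeof err.code === 'string' ? err.code : '';
    // 42703 = undefined_column (schema drift); class 08 = connection errors.
    if (sqlState === '42703' || /^08[0-9A-Z]{3}$/.test(sqlState)) {
      return { code: 'SERVICE_UNAVAILABLE', httpStatus: 503, countsTowardLockout: false };
    }
    // Duck-typed APIError check; using `statusCode === 401` is an assumption.
    if (err.name === 'APIError' && err.statusCode === 401) {
      return { code: 'INVALID_CREDENTIALS', httpStatus: 401, countsTowardLockout: true };
    }
  }
  // Anything unrecognised is an internal error and must not touch lockout.
  return { code: 'INTERNAL', httpStatus: 500, countsTowardLockout: false };
}

// Stand-in for the persistent lockout counter.
let lockoutBumps = 0;

function handleLoginFailure(err: unknown): number {
  const classified = classifyFromError(err);
  if (classified.countsTowardLockout) {
    lockoutBumps += 1;
  }
  return classified.httpStatus;
}
```

Under these assumptions, an error shaped like the Postgres 42703 failure produces a 503 and leaves the counter alone, while a 401 APIError still counts as a failed attempt.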

## Phase 2: Passkey end-to-end (@better-auth/passkey 1.6+)

- sql/007_passkey_bootstrap.sql: idempotent schema alignment
  (friendly_name→name, +aaguid, transports jsonb→text, +method column
  on login_attempts).
- better-auth.config.ts: passkey plugin wired with rpID/rpName/origin
  from new webauthn config section. rpID defaults to mana.how in prod
  (from COOKIE_DOMAIN), localhost in dev.
- routes/passkeys.ts: 7 wrapper endpoints (capability probe,
  register/options+verify, authenticate/options+verify with JWT mint,
  list, delete, rename). Each routes errors through the classifier;
  authenticate/verify promotes generic INVALID_CREDENTIALS to
  PASSKEY_VERIFICATION_FAILED.
- PasskeyRateLimitService: in-memory per-IP (options: 20/min) and
  per-credential (verify: 10 failures/min → 5 min cooldown) buckets.
  Deliberately separate from the password lockout: different factor,
  different blast radius.
- Client: authService.getPasskeyCapability() async probe, memoised per
  session. authStore.passkeyAvailable reactive state. LoginPage gates
  on === true so a slow probe doesn't flash the button in.
- AuthResult grew a code: AuthErrorCode field; handleAuthError in
  shared-auth prefers the server envelope over the legacy message
  heuristics.

## Tests

- 30 unit tests for the classifier covering every branch (including
  the exact Postgres 42703 shape that started this).
- 9 unit tests for the rate limiter.
- 14 integration tests for the auth routes; the regression test
  explicitly asserts "upstream 500 → 503 + zero lockout bumps".
- 101 tests pass, 0 fail, 30 pre-existing skips unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
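
The per-credential bucket described under Phase 2 (10 failures per minute, then a 5 minute cooldown) might look roughly like the sketch below. The class name `FailureCooldownLimiter`, its methods, and the choice to trip the cooldown on the tenth failure are assumptions for illustration; the real PasskeyRateLimitService may be structured differently. The clock is passed in explicitly so the behaviour is easy to test.

```typescript
// Hypothetical sketch of an in-memory failure bucket with a cooldown.
class FailureCooldownLimiter {
  private readonly maxFailures: number;
  private readonly windowMs: number;
  private readonly cooldownMs: number;
  // Timestamps of recent failures per key (e.g. a credential ID).
  private readonly failures = new Map<string, number[]>();
  // Time at which each blocked key becomes usable again.
  private readonly blockedUntil = new Map<string, number>();

  constructor(maxFailures = 10, windowMs = 60 * 1000, cooldownMs = 5 * 60 * 1000) {
    this.maxFailures = maxFailures;
    this.windowMs = windowMs;
    this.cooldownMs = cooldownMs;
  }

  // Returns true while the key is inside its cooldown period.
  isBlocked(key: string, now: number): boolean {
    const until = this.blockedUntil.get(key);
    if (until === undefined) {
      return false;
    }
    if (now < until) {
      return true;
    }
    this.blockedUntil.delete(key); // Cooldown expired; forget it.
    return false;
  }

  // Records one failed verification and starts a cooldown once the
  // number of failures inside the sliding window reaches the limit.
  recordFailure(key: string, now: number): void {
    const recent = (this.failures.get(key) ?? []).filter(
      (timestamp) => now - timestamp < this.windowMs,
    );
    recent.push(now);
    if (recent.length >= this.maxFailures) {
      this.blockedUntil.set(key, now + this.cooldownMs);
      this.failures.delete(key);
    } else {
      this.failures.set(key, recent);
    }
  }
}
```

Keeping this state separate from the password lockout counter mirrors the commit's stated intent: a burst of failed passkey assertions should slow down that credential without locking the account's password login.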
parent b204958007
commit e66654068f
24 changed files with 3450 additions and 552 deletions

services/mana-auth/sql/007_passkey_bootstrap.sql (new file, 110 lines)
@@ -0,0 +1,110 @@
-- 007_passkey_bootstrap.sql
--
-- Aligns auth.passkeys with the expected schema of
-- `@better-auth/passkey` (1.6+) and extends auth.login_attempts with
-- a `method` column so passkey failures can be bucketed separately
-- from password failures for rate-limit/lockout accounting.
--
-- Idempotent. Safe to re-run against a fresh or partially-migrated
-- dev database. No destructive drops; we only ADD or RENAME.
--
-- Applied via psql (not drizzle-kit push) because:
--   - drizzle-kit push treats column renames as drop + add unless
--     confirmed interactively, which would delete existing passkey
--     rows if there were any;
--   - adding NOT NULL / DEFAULT in a push without a USING clause
--     fails against tables with existing rows.
--
-- Usage (dev):
--   docker exec -i mana-postgres psql -U mana -d mana_platform \
--     < services/mana-auth/sql/007_passkey_bootstrap.sql
--
-- Production: run under migrations tooling once the pattern exists.
-- The mana-auth CLAUDE.md notes the repo convention that hand-
-- authored SQL migrations under sql/ are applied by hand.

BEGIN;

-- ─── Passkey schema alignment ──────────────────────────────────

-- friendly_name → name
-- Better Auth's plugin schema calls the column `name`. Rename
-- without dropping so any rows survive (none expected in dev, but
-- the migration is idempotent regardless).
DO $$
BEGIN
  IF EXISTS (
    SELECT 1 FROM information_schema.columns
    WHERE table_schema = 'auth' AND table_name = 'passkeys'
      AND column_name = 'friendly_name'
  ) AND NOT EXISTS (
    SELECT 1 FROM information_schema.columns
    WHERE table_schema = 'auth' AND table_name = 'passkeys'
      AND column_name = 'name'
  ) THEN
    ALTER TABLE auth.passkeys RENAME COLUMN friendly_name TO name;
  END IF;
END $$;

-- Add aaguid: the authenticator AAGUID is optional in WebAuthn but
-- required by Better Auth's schema. Nullable so existing rows (if
-- any) stay valid.
ALTER TABLE auth.passkeys ADD COLUMN IF NOT EXISTS aaguid text;

-- Convert transports from jsonb to text (CSV of AuthenticatorTransport
-- values). Better Auth stores it as a plain string like
-- "usb,nfc,hybrid"; jsonb would force the plugin to JSON.parse on
-- every read.
--
-- Postgres forbids subqueries directly in ALTER TABLE … USING, so
-- we stage the conversion through a helper function in pg_temp (which
-- can contain subqueries and disappears when the session ends).
DO $$
DECLARE
  current_type text;
BEGIN
  SELECT data_type INTO current_type
  FROM information_schema.columns
  WHERE table_schema = 'auth' AND table_name = 'passkeys'
    AND column_name = 'transports';

  IF current_type = 'jsonb' THEN
    CREATE OR REPLACE FUNCTION pg_temp.jsonb_array_to_csv(j jsonb)
    RETURNS text LANGUAGE sql IMMUTABLE AS $fn$
      SELECT CASE
        WHEN j IS NULL THEN NULL
        WHEN jsonb_typeof(j) = 'array' THEN (
          SELECT string_agg(value, ',')
          FROM jsonb_array_elements_text(j) AS value
        )
        ELSE j::text
      END
    $fn$;

    ALTER TABLE auth.passkeys
      ALTER COLUMN transports TYPE text
      USING (pg_temp.jsonb_array_to_csv(transports));
  END IF;
END $$;

-- ─── Lockout table: method column ──────────────────────────────

-- Bucket login attempts by auth method so passkey + password + 2FA
-- failures can be counted / rate-limited independently. Default
-- 'password' for existing pre-passkey rows, since that is historically
-- what any prior row represented.
ALTER TABLE auth.login_attempts
  ADD COLUMN IF NOT EXISTS method text NOT NULL DEFAULT 'password';

-- Replace the existing (email, attempted_at) index with one that
-- also covers method, so lockout checks filter without a sequential
-- scan. Using IF NOT EXISTS on the new index and dropping the old
-- one afterwards keeps the migration re-runnable.
CREATE INDEX IF NOT EXISTS login_attempts_email_method_time_idx
  ON auth.login_attempts (email, method, attempted_at);

-- The old (email, attempted_at) index becomes redundant once the new
-- one exists (queries on email+method still use the new one).
DROP INDEX IF EXISTS auth.login_attempts_email_attempted_at_idx;

COMMIT;