Commit graph

4 commits

Author SHA1 Message Date
Till JS
79d112657c feat(personas): M5.a — Playwright visual suite scaffold
Smallest possible foundation for the persona-driven visual regression
suite (M5 in docs/plans/mana-mcp-and-personas.md). One flow, two
viewports, one persona — enough to prove the stack end-to-end:
seed-script → mana-auth → API login → cookie injection → web app →
screenshot → disk. Extending is copy-paste per flow.

tests/personas/
  playwright.config.ts
    Own config separate from the root tests/e2e/ suite. Two viewports
    (1440×900 desktop Chrome + Pixel 5 mobile) — more can be added
    once baselines settle without quadrupling the review load.
    Diff threshold 0.2 %, animations disabled, snapshots land under
    __snapshots__/{spec}/{arg}-{project}.png. No auto-webServer —
    the whole point is to catch regressions against the real stack
    the user runs, not a hermetic one; if the stack is down, tests
    fail loud.

  fixtures/persona-auth.ts
    Typed Playwright `test.extend` with a `personaKey` worker option
    and a `personaPage` fixture that returns a pre-logged-in Page
    pointed at `/`. Login is API-side: POST /api/v1/auth/login with
    the deterministic HMAC-SHA256 password, parse Set-Cookie headers,
    inject into the browser context. Derivation is a bit-identical
    mirror of scripts/personas/password.ts and
    services/mana-persona-runner/src/password.ts — a 3-way contract.
    Changing one without the others locks the suite out of every
    persona. PERSONAS map exports all 10 catalog emails for typed
    access.
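
The Set-Cookie step above could look roughly like the sketch below. It is an illustration, not the code in fixtures/persona-auth.ts: `InjectableCookie`, `parseSetCookie`, `toInjectableCookies` and the fallback-domain parameter are hypothetical names, and the output shape is only chosen to resemble what Playwright's `browserContext.addCookies()` accepts.

```typescript
// Hypothetical cookie shape, loosely modelled on what a browser context accepts.
interface InjectableCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  httpOnly: boolean;
  secure: boolean;
}

// Parse ONE Set-Cookie header value. Returns null when there is no name=value pair.
function parseSetCookie(header: string, fallbackDomain: string): InjectableCookie | null {
  const parts = header.split(';').map((part) => part.trim());
  const pair = parts[0] ?? '';
  const eq = pair.indexOf('=');
  if (eq <= 0) return null;

  const cookie: InjectableCookie = {
    name: pair.slice(0, eq),
    value: pair.slice(eq + 1), // keep any '=' padding inside the value
    domain: fallbackDomain, // assumption: host of the login API when Domain is absent
    path: '/',
    httpOnly: false,
    secure: false,
  };

  for (const attr of parts.slice(1)) {
    const idx = attr.indexOf('=');
    const key = (idx === -1 ? attr : attr.slice(0, idx)).trim().toLowerCase();
    const val = idx === -1 ? '' : attr.slice(idx + 1).trim();
    if (key === 'domain' && val) cookie.domain = val;
    else if (key === 'path' && val) cookie.path = val;
    else if (key === 'httponly') cookie.httpOnly = true;
    else if (key === 'secure') cookie.secure = true;
  }
  return cookie;
}

// Parse every header and drop the ones that could not be parsed.
function toInjectableCookies(headers: string[], fallbackDomain: string): InjectableCookie[] {
  return headers
    .map((header) => parseSetCookie(header, fallbackDomain))
    .filter((cookie): cookie is InjectableCookie => cookie !== null);
}
```

The function expects one array entry per Set-Cookie header; splitting a comma-joined header is deliberately left out because `Expires` values contain commas.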

  flows/home.spec.ts
    One smoke flow. Asserts the persona isn't redirected to /login,
    hides any [data-testid="live-time"] so clock widgets don't
    invalidate diffs, captures a full-page screenshot. When this
    goes green, the whole pipeline is plumbed. Copy this file to
    add per-module tours.

  package.json
    @mana/tests-personas workspace. Scripts: `test`, `test:update`,
    `report` (HTML diff viewer).

  README.md
    Prerequisites (stack up + seeded + ideally persona-runner ticked
    once), run recipe, env vars, architecture diagram, extension
    pattern.

root package.json: `pnpm test:personas` + `:update`.
.gitignore: playwright-report-personas/ + test-results/ so generated
artefacts never get committed.

Type-check / list: `playwright test --list` succeeds, 2 tests (one
per viewport) registered for home.spec.ts.

Not attempted in this commit (user action to run the stack):
- Actual baseline capture (needs docker up + db:push + seed:personas
  + ANTHROPIC_API_KEY + diag/tick).
- Additional flows (todo, journal, notes, habits, calendar). They're
  copy-paste per README. Land when the stack is smoked.
- Nightly CI job. Will land once baselines are stable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 14:33:06 +02:00
Till JS
f07eae3c01 feat(personas): M3.b-d — tick loop + Claude Agent SDK + persistence (real)
Previous commit 38dc80654 carries this M3 title but its payload is an
unrelated apps/api/picture change — shared-.git-index race with a
parallel session (see feedback_git_workflow.md). This commit holds the
actual M3.b/c/d code. Leaving the misnamed commit for the user to
re-attribute / revert as they prefer.

Closes the M3 loop from docs/plans/mana-mcp-and-personas.md. The
runner picks up due personas, drives each through Claude + MCP for
one simulated turn, collects actions + ratings, persists through
service-key internal endpoints in mana-auth.

Internal endpoints (mana-auth, service-key-gated)

- GET  /api/v1/internal/personas/due
    Returns personas whose tickCadence + lastActiveAt say they're
    due. Rules (time since lastActiveAt): hourly > 1h, daily > 24h,
    weekdays > 24h on Mon-Fri only. Ordered NULLS FIRST so never-run
    personas go ahead of stale ones.

- POST /api/v1/internal/personas/:id/actions
    Batch ≤ 500. Row ids are deterministic
    `${tickId}-${i}-${toolName}` + ON CONFLICT DO NOTHING so the
    runner can retry a tick without doubling audit rows. Also
    bumps personas.last_active_at so the next /due call sees it.

- POST /api/v1/internal/personas/:id/feedback
    Batch ≤ 100. Row id is `${tickId}-${module}` — natural key is
    one rating per module per tick.
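
As an illustration of the due rules and the deterministic row ids described above, here is a small sketch. It is not the mana-auth query: `isDue`, `actionRowId` and `feedbackRowId` are hypothetical helpers, and treating "weekdays" as "never due on Saturday or Sunday (UTC)" is an assumption about how the rule is meant.

```typescript
type TickCadence = 'hourly' | 'daily' | 'weekdays';

const HOUR_MS = 60 * 60 * 1000;

// Decide whether a persona is due. A null lastActiveAt means "never run".
function isDue(cadence: TickCadence, lastActiveAt: Date | null, now: Date): boolean {
  if (cadence === 'weekdays') {
    const day = now.getUTCDay(); // 0 = Sunday, 6 = Saturday
    if (day === 0 || day === 6) return false; // assumption: weekends are skipped entirely
  }
  if (lastActiveAt === null) return true; // never-run personas are always due
  const elapsedMs = now.getTime() - lastActiveAt.getTime();
  const thresholdMs = cadence === 'hourly' ? HOUR_MS : 24 * HOUR_MS;
  return elapsedMs > thresholdMs;
}

// Deterministic ids: retrying the same tick yields the same ids, so an
// insert with ON CONFLICT DO NOTHING cannot double the rows.
function actionRowId(tickId: string, index: number, toolName: string): string {
  return `${tickId}-${index}-${toolName}`;
}

function feedbackRowId(tickId: string, moduleName: string): string {
  return `${tickId}-${moduleName}`;
}
```

Because the ids depend only on the tick id, the position in the batch and the tool or module name, a retried POST carries exactly the same keys as the first attempt.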

Runner tick pipeline (services/mana-persona-runner/src/runner/)

- claude-session.ts
    Two phases per tick. runMainTurn feeds the persona's system
    prompt + a German "simulate a day" user prompt to Claude Agent
    SDK's query(), with mana-mcp wired in as a streamable-HTTP MCP
    server. We iterate the returned AsyncGenerator and extract
    tool_use blocks into ActionRows; a tool_result with
    is_error=true flips the most recent action. runRatingTurn is a
    fresh query() with tools:[] asking Claude in character to rate
    each used module 1-5 as strict JSON. We parse with tolerance
    for whitespace / fences. Unparseable output becomes a synthetic
    '__parse' feedback row so operators see the failure.

- tick.ts
    Orchestrator. Skips when config.paused. Fetches /due, processes
    in batches of config.concurrency via Promise.allSettled so a
    single persona failure never kills the batch. Returns
    {due, ranSuccessfully, failed[], durationMs}.

- types.ts
    ActionRow + FeedbackRow shapes shared between claude-session
    and the internal client.
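
The batching behaviour described for tick.ts can be sketched as follows. This is a simplified stand-in, not the orchestrator itself: `DuePersona`, `runPersona` and the shape of the `failed` entries are assumptions; only the `{due, ranSuccessfully, failed[], durationMs}` field names come from the description above.

```typescript
interface DuePersona {
  id: string;
}

interface TickResult {
  due: number;
  ranSuccessfully: number;
  failed: { id: string; reason: string }[]; // entry shape is an assumption
  durationMs: number;
}

// Process due personas in batches of `concurrency`. Promise.allSettled
// waits for every persona in a batch, so one rejection never aborts the rest.
async function runTick(
  due: DuePersona[],
  concurrency: number,
  runPersona: (persona: DuePersona) => Promise<void>
): Promise<TickResult> {
  const startedAt = Date.now();
  const result: TickResult = { due: due.length, ranSuccessfully: 0, failed: [], durationMs: 0 };
  const batchSize = Math.max(1, Math.floor(concurrency));

  for (let i = 0; i < due.length; i += batchSize) {
    const batch = due.slice(i, i + batchSize);
    // The async wrapper turns a synchronous throw into a rejection as well.
    const settled = await Promise.allSettled(batch.map(async (persona) => runPersona(persona)));
    settled.forEach((outcome, j) => {
      if (outcome.status === 'fulfilled') {
        result.ranSuccessfully += 1;
      } else {
        result.failed.push({ id: batch[j]?.id ?? 'unknown', reason: String(outcome.reason) });
      }
    });
  }

  result.durationMs = Date.now() - startedAt;
  return result;
}
```

Batches run one after another, so at most `concurrency` persona sessions are open at any time.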

Runner bootstrap (src/index.ts)

- setInterval(config.tickIntervalMs) starts the tick loop on boot.
  tickInFlight guards against overlap when Claude latency >
  interval. If MANA_SERVICE_KEY or ANTHROPIC_API_KEY is missing,
  loop is disabled with a warn line — /health + /diag/login still
  work.
- POST /diag/tick (dev-only) fires one tick on demand, returns
  the result. Avoids waiting a full interval during testing.
- Graceful SIGTERM/SIGINT shutdown clears the interval.
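
A minimal sketch of the overlap guard and interval loop described above. `makeGuardedTick` and `startTickLoop` are hypothetical names; only `tickInFlight` and `tickIntervalMs` are taken from the description, and the real src/index.ts may be structured differently.

```typescript
type TickOutcome = 'started' | 'skipped';

// Wrap a tick so that a new run is skipped while the previous one is still
// in flight (for example when model latency exceeds the interval).
function makeGuardedTick(runTick: () => Promise<unknown>): () => TickOutcome {
  let tickInFlight = false;
  return () => {
    if (tickInFlight) return 'skipped';
    tickInFlight = true;
    Promise.resolve()
      .then(runTick)
      .catch((err: unknown) => console.error('tick failed:', err))
      .finally(() => {
        tickInFlight = false; // always release the guard, even after a failure
      });
    return 'started';
  };
}

// Start the loop and return a stop function that clears the interval.
function startTickLoop(tickIntervalMs: number, runTick: () => Promise<unknown>): () => void {
  const guarded = makeGuardedTick(runTick);
  const timer = setInterval(guarded, tickIntervalMs);
  return () => clearInterval(timer);
}
```

The stop function returned by `startTickLoop` is what a SIGTERM/SIGINT handler would call during shutdown.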

Client

- clients/mana-auth-internal.ts
    X-Service-Key client for the three endpoints above.
    Constructor throws on empty serviceKey — fail loud.

Boot smoke verified: /health returns ok, /diag/tick returns 500 with
descriptive messages when keys are absent, and warning lines are
logged on boot when keys are missing. Type-check green across
mana-auth, tool-registry, mcp, persona-runner.

M3 exit gate is the end-to-end smoke recipe (docker up → db:push →
seed:personas → diag/tick → psql) documented in
services/mana-persona-runner/CLAUDE.md.

M2.d (cross-space family/team memberships) still deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 14:18:31 +02:00
Till JS
493db0c3b2 feat(personas): M2.a-c — persona schemas + admin endpoints + seed pipeline
Continuation of docs/plans/mana-mcp-and-personas.md. Personas are the
auto-test users the M3 runner will drive — they're real Mana users
(kind='persona', tier='founder'), registered through the same Better
Auth pipeline as humans, just stamped differently and metadata-tracked
so the persona-runner knows how to role-play them.

Schemas (auth namespace — personas are 1:1 with users, no reason for a
separate platform.* schema that the plan originally sketched)

- userKindEnum ('human' | 'persona' | 'system') + users.kind column,
  wired into better-auth additionalFields so the JWT/user object carry
  the flag. Default 'human' keeps every existing user untouched.
- auth.personas — 1:1 descriptor (archetype, systemPrompt, moduleMix
  jsonb, tickCadence, lastActiveAt). CASCADE from users.id.
- auth.persona_actions — tick-grouped audit of every tool call the
  runner makes (toolName, inputHash for dedup, result, latency).
- auth.persona_feedback — structured 1-5 ratings per module per tick,
  plus free-text notes. This is where the runner writes the
  self-reflection step at end of each tick.

Admin endpoints (/api/v1/admin/personas, admin-tier-gated)

- POST /            create-or-update by email. Uses auth.api.signUpEmail
                    if the user's new, then stamps kind+tier+verified
                    and upserts the personas row. Idempotent — safe to
                    re-run after catalog edits.
- GET  /            list with 7-day action count per persona.
- GET  /:id         detail + recent 20 actions + per-module feedback
                    aggregate.
- DELETE /:id       hard delete. Refuses non-persona users as
                    defense-in-depth: an admin typo here would cascade
                    through the full user-delete chain.

Catalog + seed pipeline (scripts/personas/)

- catalog.json      10 handwritten personas spanning 7 archetypes
                    (adhd-student, ceo-busy, creative-parent, solo-dev,
                    researcher, freelancer, overwhelmed-newbie).
                    Includes five pairs of personas that will later
                    share family/team spaces (cross-space setup is
                    deferred to M2.d per the plan).
- catalog.ts        zod-validated loader. Refines email to require the
                    @mana.test domain (reserved .test TLD: non-existent,
                    no bounce risk).
- password.ts       deterministic HMAC-SHA256(PERSONA_SEED_SECRET,
                    email). No stored per-persona credentials; the
                    runner re-derives on every login. Refuses the
                    dev-fallback secret in production.
- seed.ts           POST /admin/personas per catalog entry. Flags:
                    --auth=, --jwt=, --dry-run.
- cleanup.ts        Hard-delete every live persona. Warns when the
                    live set drifts from the catalog.
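
The password derivation described for password.ts can be illustrated with Node's built-in crypto module. This is a sketch under assumptions, not the script itself: the hex output encoding, the `DEV_FALLBACK_SECRET` value, the `nodeEnv` parameter and the example email are all hypothetical; only "HMAC-SHA256 keyed with PERSONA_SEED_SECRET over the email" comes from the description above.

```typescript
import { createHmac } from 'node:crypto';

// Hypothetical placeholder; the real dev-fallback value is not shown in this log.
const DEV_FALLBACK_SECRET = 'dev-only-persona-seed';

// Derive a persona password deterministically from the seed secret and email.
// Nothing is stored: any party holding the secret can re-derive it on demand.
function derivePersonaPassword(email: string, secret: string, nodeEnv: string): string {
  if (nodeEnv === 'production' && secret === DEV_FALLBACK_SECRET) {
    throw new Error('refusing to derive persona passwords from the dev-fallback secret');
  }
  // Assumption: hex encoding. Every mirror of this function must use the same one.
  return createHmac('sha256', secret).update(email).digest('hex');
}

// Example with a made-up persona email.
const examplePassword = derivePersonaPassword('persona-one@mana.test', 'local-secret', 'development');
console.log(examplePassword.length); // 64 hex characters for a SHA-256 digest
```

Any change to the key, the message or the output encoding yields a different password, which is why every copy of the derivation has to change together.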

Root package.json:
  pnpm seed:personas
  pnpm seed:personas:cleanup

Extends the ESLint root-ignore list with `scripts/**` so Bun-typed
utility scripts don't fail the typed-parser check they weren't opted
into. Consistent with the rest of scripts/ being .mjs+.sh.

To go live (user action):
  pnpm docker:up
  cd services/mana-auth && bun run db:push
  export MANA_ADMIN_JWT=...
  pnpm seed:personas

M2.d deferred: cross-space (family/team/practice) memberships between
persona pairs. Better Auth's org-invite flow is multi-step and would
roughly double the M2 scope; the persona-runner (M3) can operate in
personal spaces first, shared-space tests land as their own milestone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 13:55:14 +02:00
Till JS
16c8818338 feat(mcp): M1+M1.5 MCP gateway + tool-registry + shared-crypto
Foundation for autonomous Claude-driven testing. Plan:
docs/plans/mana-mcp-and-personas.md.

New packages
- @mana/tool-registry — schema-first ToolSpec<InputSchema, OutputSchema>
  with zod generics, scope ('user-space' | 'admin') and policyHint
  ('read' | 'write' | 'destructive'). sync-client helpers speak the
  mana-sync push/pull protocol directly so RLS and field-level LWW are
  preserved. MasterKeyClient fetches per-user MKs via the existing
  mana-auth GET /api/v1/me/encryption-vault/key endpoint (JWT-gated,
  ZK-aware, already audited) — no new service-key endpoint built.
  ZeroKnowledgeUserError surfaced as a typed throw.
- @mana/shared-crypto — AES-GCM-256 primitives extracted from the web
  app's $lib/data/crypto/aes.ts so the server-side tool handlers and the
  browser produce byte-for-byte identical wire format
  (enc:1:{b64(iv)}.{b64(ct)}). Web app aes.ts now re-exports from
  shared-crypto — 5 existing importers unchanged, svelte-check stays
  green.
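
To make the `enc:1:{b64(iv)}.{b64(ct)}` wire format concrete, here is a sketch using Node's synchronous crypto API. It is not the @mana/shared-crypto source: the 12-byte IV, the standard base64 alphabet, and the convention that the 16-byte GCM tag is appended to the ciphertext (as WebCrypto does) are assumptions, and `encryptField` / `decryptField` are hypothetical names.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

const PREFIX = 'enc:1:';
const IV_BYTES = 12; // assumption: the usual AES-GCM nonce length
const TAG_BYTES = 16; // assumption: tag appended to the ciphertext

// Encrypt a UTF-8 string with a 32-byte key into the enc:1 wire format.
function encryptField(plaintext: string, key: Buffer): string {
  const iv = randomBytes(IV_BYTES);
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  // getAuthTag() is only valid after final(); array elements evaluate in order.
  const body = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final(), cipher.getAuthTag()]);
  return `${PREFIX}${iv.toString('base64')}.${body.toString('base64')}`;
}

// Reverse of encryptField. Throws on a malformed value or a failed auth check.
function decryptField(wire: string, key: Buffer): string {
  if (!wire.startsWith(PREFIX)) throw new Error('not an enc:1 value');
  const [ivB64, bodyB64] = wire.slice(PREFIX.length).split('.');
  if (!ivB64 || !bodyB64) throw new Error('malformed enc:1 value');
  const iv = Buffer.from(ivB64, 'base64');
  const body = Buffer.from(bodyB64, 'base64');
  const decipher = createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(body.subarray(body.length - TAG_BYTES));
  const ciphertext = body.subarray(0, body.length - TAG_BYTES);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}
```

Standard base64 never contains a `.`, so the single dot is an unambiguous separator between the IV and the ciphertext.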

New service
- services/mana-mcp (:3069, Bun/Hono) — MCP Streamable HTTP gateway.
  JWKS auth against mana-auth, per-user session isolation (session-id
  belongs to the user who opened it — cross-user access returns 403),
  admin-scoped tools filtered out before registration. MasterKeyClient
  cached per process with a 5-minute TTL.
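
A per-process cache with a 5-minute TTL, as mentioned for the MasterKeyClient, could be sketched like this. `TtlCache` is a hypothetical helper and keying by user id is an assumption; the real caching in mana-mcp may differ.

```typescript
// Minimal in-memory TTL cache. The clock is injectable so expiry can be tested.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns undefined for missing or expired entries and evicts expired ones.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }
}

// Assumption: one cache per process, keyed by user id, with a 5-minute TTL.
const FIVE_MINUTES_MS = 5 * 60 * 1000;
const masterKeyCache = new TtlCache<string>(FIVE_MINUTES_MS);
masterKeyCache.set('user-123', 'placeholder-key-material');
```

A short TTL bounds how long key material stays in memory after a user's last request while still avoiding a vault round trip on every tool call.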

11 tools registered
- habits.{create,list,update,archive}, spaces.list (plaintext, M1)
- todo.{create,list,complete}, notes.{create,search}, journal.add
  (encrypted — field lists match
  apps/mana/apps/web/src/lib/data/crypto/registry.ts verbatim)

Infra
- Port 3069 added to docs/PORT_SCHEMA.md
- services/mana-mcp/CLAUDE.md with architecture, auth model,
  tool-authoring recipe, local smoke-test steps
- Root CLAUDE.md services list updated

Type-check green across shared-crypto, mana-tool-registry, mana-mcp.
svelte-check on apps/mana/apps/web stays at 0 errors / 0 warnings.
Boot smoke verified: /health returns registry.loaded=true, unauthed
/mcp → 401, invalid-JWT /mcp → 401 with descriptive message.

Decisions locked in for later milestones (per plan D1–D10):
- Personas will be real mana-auth users (users.kind='persona'), no
  service-key bypass (D1, D2)
- Tool-registry is the SSOT; mana-ai and the legacy
  apps/api/src/mcp/server.ts get merged into it in M4 (three current
  parallel tool catalogs collapse to one)
- Persona-runner (:3070) will be a separate service using the Claude
  Agent SDK + MCP client (D5)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 13:18:35 +02:00