managarten/services/mana-persona-runner/CLAUDE.md
Till JS f07eae3c01 feat(personas): M3.b-d — tick loop + Claude Agent SDK + persistence (real)
Previous commit 38dc80654 carries this M3 title but its payload is an
unrelated apps/api/picture change — shared-.git-index race with a
parallel session (see feedback_git_workflow.md). This commit holds the
actual M3.b/c/d code. Leaving the misnamed commit for the user to
re-attribute / revert as they prefer.

Closes the M3 loop from docs/plans/mana-mcp-and-personas.md. The
runner picks up due personas, drives each through Claude + MCP for
one simulated turn, collects actions + ratings, persists through
service-key internal endpoints in mana-auth.

Internal endpoints (mana-auth, service-key-gated)

- GET  /api/v1/internal/personas/due
    Returns personas whose tickCadence + lastActiveAt say they're
    due. Rules: hourly > 1h, daily > 24h, weekdays > 24h mon-fri.
    NULLS FIRST so never-run personas go ahead of stale ones.

- POST /api/v1/internal/personas/:id/actions
    Batch ≤ 500. Row ids are deterministic
    `${tickId}-${i}-${toolName}` + ON CONFLICT DO NOTHING so the
    runner can retry a tick without doubling audit rows. Also
    bumps personas.last_active_at so the next /due call sees it.

- POST /api/v1/internal/personas/:id/feedback
    Batch ≤ 100. Row id is `${tickId}-${module}` — natural key is
    one rating per module per tick.
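The /due cadence rules above can be sketched as a single predicate. This is a minimal illustration, not the real query: the actual check runs in SQL inside mana-auth, and the `Cadence` union and UTC day handling here are assumptions inferred from this doc.

```typescript
// Hypothetical TypeScript mirror of the /due cadence rules.
type Cadence = 'hourly' | 'daily' | 'weekdays';

function isDue(cadence: Cadence, lastActiveAt: Date | null, now: Date): boolean {
  if (lastActiveAt === null) return true; // NULLS FIRST: never-run personas lead
  const HOUR = 3_600_000;
  const ageMs = now.getTime() - lastActiveAt.getTime();
  if (cadence === 'hourly') return ageMs > HOUR;
  const day = now.getUTCDay(); // 0 = Sunday … 6 = Saturday
  if (cadence === 'weekdays' && (day === 0 || day === 6)) return false;
  return ageMs > 24 * HOUR; // daily and weekdays share the 24h rule
}
```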

Runner tick pipeline (services/mana-persona-runner/src/runner/)

- claude-session.ts
    Two phases per tick. runMainTurn feeds the persona's system
    prompt + a German "simulate a day" user prompt to Claude Agent
    SDK's query(), with mana-mcp wired in as a streamable-HTTP MCP
    server. We iterate the returned AsyncGenerator and extract
    tool_use blocks into ActionRows; a tool_result with
    is_error=true flips the most recent action. runRatingTurn is a
    fresh query() with tools:[] asking Claude in character to rate
    each used module 1-5 as strict JSON. We parse with tolerance
    for whitespace / fences. Unparseable output becomes a synthetic
    '__parse' feedback row so operators see the failure.

- tick.ts
    Orchestrator. Skips when config.paused. Fetches /due, processes
    in batches of config.concurrency via Promise.allSettled so a
    single persona failure never kills the batch. Returns
    {due, ranSuccessfully, failed[], durationMs}.

- types.ts
    ActionRow + FeedbackRow shapes shared between claude-session
    and the internal client.
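The tolerant rating parse in runRatingTurn can be sketched roughly as below. The `FeedbackRow` field names follow this doc; the fence-stripping regex and the fallback row's rating value are assumptions, not the real implementation.

```typescript
// Sketch: strip markdown fences / whitespace, JSON.parse, and fall back to a
// synthetic '__parse' row when the model's output is unusable.
interface FeedbackRow { module: string; rating: number; notes?: string }

function parseRatings(raw: string): FeedbackRow[] {
  const cleaned = raw
    .trim()
    .replace(/^```(?:json)?\s*/i, '') // tolerate a leading ```json fence
    .replace(/```\s*$/, '')           // and a trailing fence
    .trim();
  try {
    const parsed = JSON.parse(cleaned) as { ratings?: FeedbackRow[] };
    if (!Array.isArray(parsed.ratings)) throw new Error('missing ratings array');
    return parsed.ratings;
  } catch {
    // Operators see the failure as a row instead of a silently empty tick.
    return [{ module: '__parse', rating: 1, notes: raw.slice(0, 200) }];
  }
}
```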

Runner bootstrap (src/index.ts)

- setInterval(config.tickIntervalMs) starts the tick loop on boot.
  tickInFlight guards against overlap when Claude latency >
  interval. If MANA_SERVICE_KEY or ANTHROPIC_API_KEY is missing,
  loop is disabled with a warn line — /health + /diag/login still
  work.
- POST /diag/tick (dev-only) fires one tick on demand, returns
  the result. Avoids waiting a full interval during testing.
- Graceful SIGTERM/SIGINT shutdown clears the interval.
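The tickInFlight guard is small enough to sketch in full; this is an illustrative shape, not the actual src/index.ts code:

```typescript
// If a tick is still running when the interval fires again, skip the new
// invocation instead of stacking overlapping ticks.
let tickInFlight = false;

async function guardedTick(runTick: () => Promise<void>): Promise<boolean> {
  if (tickInFlight) return false; // previous tick still running — skip
  tickInFlight = true;
  try {
    await runTick();
  } finally {
    tickInFlight = false; // always release, even when the tick throws
  }
  return true;
}
```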

Client

- clients/mana-auth-internal.ts
    X-Service-Key client for the three endpoints above.
    Constructor throws on empty serviceKey — fail loud.
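A rough shape of that fail-loud boundary, assuming an X-Service-Key header and the /due endpoint named above (the real client also carries postActions/postFeedback):

```typescript
// Sketch of the internal client: constructing without a key fails immediately
// rather than producing 401s deep inside a tick.
class ManaAuthInternalClient {
  constructor(private baseUrl: string, private serviceKey: string) {
    if (!serviceKey) throw new Error('MANA_SERVICE_KEY must be set'); // fail loud
  }

  async listDuePersonas(): Promise<unknown> {
    const res = await fetch(`${this.baseUrl}/api/v1/internal/personas/due`, {
      headers: { 'X-Service-Key': this.serviceKey },
    });
    if (!res.ok) throw new Error(`due lookup failed: ${res.status}`);
    return res.json();
  }
}
```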

Boot smoke verified: /health returns ok, /diag/tick 500s with
descriptive messages when keys absent. Warning lines on boot when
keys are missing. Type-check green across mana-auth, tool-registry,
mcp, persona-runner.

M3 exit gate is the end-to-end smoke recipe (docker up → db:push →
seed:personas → diag/tick → psql) documented in
services/mana-persona-runner/CLAUDE.md.

M2.d (cross-space family/team memberships) still deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 14:18:31 +02:00


mana-persona-runner

Tick-loop service that drives the M2 personas through the app via Claude + the mana-mcp gateway. Test infrastructure — not a user-facing service, not deployed to prod until the runner has proven itself in staging.

Plan: docs/plans/mana-mcp-and-personas.md (M3)

Tech Stack

Layer      Technology
Runtime    Bun
Framework  Hono
AI         @anthropic-ai/claude-agent-sdk (native MCP tool-loop)
Tools      mana-mcp (:3069) — Streamable HTTP, per-persona JWT
Upstream   mana-auth (:3001) for login + spaces + action/feedback persistence

Port: 3070

What it does (when the tick loop lands — M3.b)

Every TICK_INTERVAL_MS:

  1. Query auth.personas for rows whose tickCadence + lastActiveAt make them due.
  2. Limit to PERSONA_CONCURRENCY personas in parallel.
  3. For each due persona:
    • Login: POST /api/v1/auth/login with deterministic HMAC-derived password (same algorithm as scripts/personas/password.ts).
    • Resolve space: GET /api/auth/organization/list, pick first personal space.
    • Claude call: @anthropic-ai/claude-agent-sdk with persona.systemPrompt, MCP server wired to :3069, X-Mana-Space pinned to the persona's personal space.
    • Self-reflection: after the tool loop settles, ask Claude in-character to rate each module used (1-5 + note).
    • Persist: POST /api/v1/internal/personas/:id/actions and /feedback on mana-auth (service-key auth).
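The deterministic login password in step 3 could look roughly like the sketch below. This is a hypothetical reconstruction — the real algorithm lives in scripts/personas/password.ts, and the hash choice, hex encoding, and 32-char truncation here are all assumptions.

```typescript
// Hypothetical HMAC-based password derivation: same email + same seed secret
// always yields the same password, so the runner never stores credentials.
import { createHmac } from 'node:crypto';

function derivePersonaPassword(email: string, seedSecret: string): string {
  return createHmac('sha256', seedSecret).update(email).digest('hex').slice(0, 32);
}
```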

Files

  • src/config.ts — env-driven config + production-secret assertion
  • src/clients/auth.ts — login + listSpaces, convenience loginAndResolvePersonalSpace
  • src/clients/mana-auth-internal.ts — X-Service-Key-gated calls: listDuePersonas, postActions, postFeedback
  • src/password.ts — HMAC derivation (mirror of scripts/personas/password.ts, see comment)
  • src/runner/claude-session.ts — per-tick runMainTurn + runRatingTurn on top of @anthropic-ai/claude-agent-sdk
  • src/runner/tick.ts — orchestrator: due → concurrency-limited fan-out → per-persona pipeline
  • src/runner/types.ts — ActionRow/FeedbackRow shapes shared between runner modules
  • src/index.ts — Hono app, /health, /metrics, dev-only /diag/login + /diag/tick

Tick pipeline (M3.b)

setInterval(config.tickIntervalMs)
    │
    ▼
GET  /api/v1/internal/personas/due          (service-key)
    │  due? hourly>1h, daily>24h, weekdays>24h mon-fri
    ▼
for each persona (max concurrency at once):
    │
    POST /api/v1/auth/login                 (persona JWT)
    GET  /api/auth/organization/list        (personal space id)
    │
    ▼
    runMainTurn
      query({ systemPrompt, mcpServers: { mana: {type:'http', url, headers} }, maxTurns })
      for each SDKMessage:
        tool_use block  →  push ActionRow (ok provisional)
        tool_result err →  flip last ActionRow to 'error'
        module prefix   →  modulesUsed.add(module)
    │
    ▼
    runRatingTurn (same systemPrompt, fresh query, tools:[])
      prompt: 'rate each of {modulesUsed} 1-5, respond JSON'
      parse {ratings:[{module,rating,notes}]}  →  FeedbackRow[]
      invalid JSON       →  one synthetic rating row '__parse' as marker
    │
    ▼
POST /api/v1/internal/personas/:id/actions  (idempotent, batch ≤500)
POST /api/v1/internal/personas/:id/feedback (idempotent, batch ≤100)
    │
    ▼
mana-auth writes rows + bumps personas.last_active_at
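The deterministic row ids that make those two POSTs idempotent are trivial to sketch (names taken from this doc):

```typescript
// A retried tick regenerates byte-identical ids, so mana-auth's
// ON CONFLICT DO NOTHING insert silently drops the duplicates.
function actionRowId(tickId: string, index: number, toolName: string): string {
  return `${tickId}-${index}-${toolName}`;
}

function feedbackRowId(tickId: string, module: string): string {
  return `${tickId}-${module}`; // natural key: one rating per module per tick
}
```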

The outer tick wraps each persona in Promise.allSettled, so one failure never kills the batch. Per-persona exceptions become failed: [{persona,error}] entries in the tick result and get logged. tickInFlight guards against overlap when Claude latency exceeds the interval.
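The concurrency-limited fan-out can be sketched generically. This is an illustrative shape, not the real tick.ts; the result field names mirror the tick result described above:

```typescript
// Process items in slices of `concurrency`; Promise.allSettled means one
// rejection is recorded in failed[] instead of aborting the whole batch.
async function runBatches<T>(
  items: T[],
  concurrency: number,
  work: (item: T) => Promise<void>,
): Promise<{ ranSuccessfully: number; failed: { item: T; error: string }[] }> {
  let ranSuccessfully = 0;
  const failed: { item: T; error: string }[] = [];
  for (let i = 0; i < items.length; i += concurrency) {
    const slice = items.slice(i, i + concurrency);
    const results = await Promise.allSettled(slice.map(work));
    results.forEach((r, j) => {
      if (r.status === 'fulfilled') ranSuccessfully++;
      else failed.push({ item: slice[j], error: String(r.reason) });
    });
  }
  return { ranSuccessfully, failed };
}
```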

What's NOT in M3.b (deferred)

  • Precise tool_use_id → tool_result pairing. Today the last action gets flipped to error when a tool_result carries is_error: true. Good enough for the audit dashboard; exact attribution lands when the dashboard needs it.
  • Retries/back-off on Claude 429/5xx. The SDK has some built-in; we do no extra handling. A burst of 429s can fail a batch — next tick picks them up anyway.
  • Prometheus counters. Health + log lines today, counters when the dashboard lands in M6.

Environment Variables

PORT=3070
MANA_AUTH_URL=http://localhost:3001
MANA_MCP_URL=http://localhost:3069

# Service-to-service auth for action/feedback persistence (M3.c).
MANA_SERVICE_KEY=...

# Claude API key the runner uses to drive each persona's turn.
ANTHROPIC_API_KEY=...

# Must match whatever the seed script used when the personas were created.
# In production: rotate together with the seed script's env.
PERSONA_SEED_SECRET=...

# Tick loop (M3.b).
TICK_INTERVAL_MS=60000
PERSONA_CONCURRENCY=2

# Operational kill-switch. When true, the service stays up (health-ok)
# but no ticks fire. Useful during demos or when debugging a persona.
RUNNER_PAUSED=false
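A minimal sketch of how these variables map onto runtime config, using the defaults listed above. The real implementation is src/config.ts (which also asserts production secrets), so treat this as illustrative only:

```typescript
// Env-driven config with the documented defaults; RUNNER_PAUSED is the
// kill-switch — the service stays up but no ticks fire.
interface RunnerConfig {
  port: number;
  tickIntervalMs: number;
  concurrency: number;
  paused: boolean;
}

function loadConfig(env: Record<string, string | undefined>): RunnerConfig {
  return {
    port: Number(env.PORT ?? 3070),
    tickIntervalMs: Number(env.TICK_INTERVAL_MS ?? 60000),
    concurrency: Number(env.PERSONA_CONCURRENCY ?? 2),
    paused: env.RUNNER_PAUSED === 'true',
  };
}
```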

End-to-end smoke (M3 exit gate)

Proves: personas exist, runner picks them up, Claude drives tools via MCP, actions + ratings land in Postgres.

# 1. Stack
pnpm docker:up
cd services/mana-auth && bun run db:push     # applies users.kind + auth.personas* tables
pnpm dev:auth                                 # mana-auth on 3001
pnpm dev:sync                                 # mana-sync on 3050
pnpm --filter @mana/mcp-service dev           # mana-mcp on 3069
pnpm --filter @mana/persona-runner dev        # this service on 3070
    # (boots warning-only if MANA_SERVICE_KEY or ANTHROPIC_API_KEY missing)

# 2. Seed the 10 catalog personas
export MANA_ADMIN_JWT=…            # admin-tier JWT
export PERSONA_SEED_SECRET=…       # any value; must match runner
pnpm seed:personas

# 3. Verify login works
curl -s "localhost:3070/diag/login?email=persona.anna@mana.test" | jq
# → { ok: true, userId: "…", spaceId: "…" }

# 4. Fire a tick manually (dev-only endpoint, avoids waiting the full interval)
export MANA_SERVICE_KEY=…
export ANTHROPIC_API_KEY=…
curl -s -X POST localhost:3070/diag/tick | jq
# → { ok: true, result: { due: 10, ranSuccessfully: N, failed: [], durationMs: … } }

# 5. Inspect what landed
psql -c "SELECT persona_id, tool_name, result FROM auth.persona_actions ORDER BY created_at DESC LIMIT 20;"
psql -c "SELECT persona_id, module, rating, notes FROM auth.persona_feedback ORDER BY created_at DESC LIMIT 20;"

A green run through step 5 is the M3 exit criterion.

Why a separate service (not part of mana-ai)

  • Lifecycle: persona-runner is test infra. Starts and stops with a demo, can be paused without downtime noise. mana-ai is a production worker for real user missions — different risk profile.
  • Observability: mixing them means "is this tick Anna running the suite or a real user running their mission?" becomes a log-filter problem. Separate services give you separate Prometheus scrapes.
  • Tool source: mana-ai today uses an internal tool catalog; persona-runner uses MCP. When M4 unifies both onto @mana/tool-registry, the split still makes sense as two consumers of the same tool surface.