managarten/services/mana-persona-runner/CLAUDE.md
Till JS f07eae3c01 feat(personas): M3.b-d — tick loop + Claude Agent SDK + persistence (real)
Previous commit 38dc80654 carries this M3 title but its payload is an
unrelated apps/api/picture change — shared-.git-index race with a
parallel session (see feedback_git_workflow.md). This commit holds the
actual M3.b/c/d code. Leaving the misnamed commit for the user to
re-attribute / revert as they prefer.

Closes the M3 loop from docs/plans/mana-mcp-and-personas.md. The
runner picks up due personas, drives each through Claude + MCP for
one simulated turn, collects actions + ratings, and persists them
through service-key internal endpoints in mana-auth.

Internal endpoints (mana-auth, service-key-gated)

- GET  /api/v1/internal/personas/due
    Returns personas whose tickCadence + lastActiveAt say they're
    due. Rules: hourly > 1h, daily > 24h, weekdays > 24h mon-fri.
    NULLS FIRST so never-run personas go ahead of stale ones.
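The cadence rules can be sketched in TypeScript; the real check runs as SQL inside mana-auth, so the names and shapes below are illustrative only:

```typescript
// Sketch of the /due cadence rules. Illustrative only: the authoritative
// check is the SQL query behind GET /api/v1/internal/personas/due.
type Cadence = 'hourly' | 'daily' | 'weekdays';

const HOUR_MS = 3_600_000;

function isDue(cadence: Cadence, lastActiveAt: Date | null, now: Date): boolean {
  // Never-run personas are always due (the query orders NULLS FIRST).
  if (lastActiveAt === null) return true;
  const ageMs = now.getTime() - lastActiveAt.getTime();
  switch (cadence) {
    case 'hourly':
      return ageMs > HOUR_MS;
    case 'daily':
      return ageMs > 24 * HOUR_MS;
    case 'weekdays': {
      const day = now.getUTCDay(); // 0 = Sunday … 6 = Saturday
      return day >= 1 && day <= 5 && ageMs > 24 * HOUR_MS;
    }
  }
}
```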

- POST /api/v1/internal/personas/:id/actions
    Batch ≤ 500. Row ids are deterministic
    `${tickId}-${i}-${toolName}` + ON CONFLICT DO NOTHING so the
    runner can retry a tick without doubling audit rows. Also
    bumps personas.last_active_at so the next /due call sees it.
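The id scheme, as a minimal sketch: retrying a tick regenerates identical ids, so the server-side ON CONFLICT DO NOTHING turns duplicate inserts into no-ops.

```typescript
// Deterministic action-row id: same tick + same position + same tool
// always yields the same id, making retries idempotent on the server.
function actionRowId(tickId: string, index: number, toolName: string): string {
  return `${tickId}-${index}-${toolName}`;
}
```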

- POST /api/v1/internal/personas/:id/feedback
    Batch ≤ 100. Row id is `${tickId}-${module}` — natural key is
    one rating per module per tick.

Runner tick pipeline (services/mana-persona-runner/src/runner/)

- claude-session.ts
    Two phases per tick. runMainTurn feeds the persona's system
    prompt + a German "simulate a day" user prompt to Claude Agent
    SDK's query(), with mana-mcp wired in as a streamable-HTTP MCP
    server. We iterate the returned AsyncGenerator and extract
    tool_use blocks into ActionRows; a tool_result with
    is_error=true flips the most recent action. runRatingTurn is a
    fresh query() with tools:[] asking Claude in character to rate
    each used module 1-5 as strict JSON. We parse with tolerance
    for whitespace / fences. Unparseable output becomes a synthetic
    '__parse' feedback row so operators see the failure.
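The extraction loop, reduced to a sketch — the message and block shapes here are assumptions standing in for the SDK's real types; only the logic (push on tool_use, flip the most recent action on an errored tool_result) mirrors what claude-session.ts does:

```typescript
// Simplified stand-ins for the SDK's streamed message/block types.
type Block =
  | { type: 'tool_use'; name: string; input: unknown }
  | { type: 'tool_result'; is_error?: boolean };

interface ActionRow { toolName: string; result: 'ok' | 'error' }

async function collectActions(
  messages: AsyncIterable<{ blocks: Block[] }>,
): Promise<ActionRow[]> {
  const actions: ActionRow[] = [];
  for await (const msg of messages) {
    for (const block of msg.blocks) {
      if (block.type === 'tool_use') {
        // Provisionally 'ok'; a later errored result may flip it.
        actions.push({ toolName: block.name, result: 'ok' });
      } else if (block.type === 'tool_result' && block.is_error) {
        const last = actions.at(-1);
        if (last) last.result = 'error';
      }
    }
  }
  return actions;
}
```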

- tick.ts
    Orchestrator. Skips when config.paused. Fetches /due, processes
    in batches of config.concurrency via Promise.allSettled so a
    single persona failure never kills the batch. Returns
    {due, ranSuccessfully, failed[], durationMs}.
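The fan-out can be sketched as follows (an assumed helper, not the actual tick.ts): Promise.allSettled keeps one rejection from killing the batch, and rejections become failed[] entries.

```typescript
// Concurrency-limited fan-out with allSettled: failures are collected,
// never thrown, so the remaining personas in the batch still run.
async function runBatches<T>(
  items: T[],
  concurrency: number,
  run: (item: T) => Promise<void>,
): Promise<{ ranSuccessfully: number; failed: { item: T; error: string }[] }> {
  let ranSuccessfully = 0;
  const failed: { item: T; error: string }[] = [];
  for (let i = 0; i < items.length; i += concurrency) {
    const batch = items.slice(i, i + concurrency);
    const results = await Promise.allSettled(batch.map(run));
    results.forEach((r, j) => {
      if (r.status === 'fulfilled') ranSuccessfully++;
      else failed.push({ item: batch[j], error: String(r.reason) });
    });
  }
  return { ranSuccessfully, failed };
}
```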

- types.ts
    ActionRow + FeedbackRow shapes shared between claude-session
    and the internal client.

Runner bootstrap (src/index.ts)

- setInterval(config.tickIntervalMs) starts the tick loop on boot.
  tickInFlight guards against overlap when Claude latency >
  interval. If MANA_SERVICE_KEY or ANTHROPIC_API_KEY is missing,
  loop is disabled with a warn line — /health + /diag/login still
  work.
- POST /diag/tick (dev-only) fires one tick on demand, returns
  the result. Avoids waiting a full interval during testing.
- Graceful SIGTERM/SIGINT shutdown clears the interval.
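The overlap guard as a minimal sketch (names illustrative, not the actual index.ts): a tick that fires while the previous one is still running is skipped, not queued.

```typescript
let tickInFlight = false;

// Returns true if the tick ran, false if it was skipped due to overlap.
async function guardedTick(runTick: () => Promise<void>): Promise<boolean> {
  if (tickInFlight) return false; // previous tick still running: skip
  tickInFlight = true;
  try {
    await runTick();
    return true;
  } finally {
    tickInFlight = false;
  }
}
```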

Client

- clients/mana-auth-internal.ts
    X-Service-Key client for the three endpoints above.
    Constructor throws on empty serviceKey — fail loud.
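A simplified sketch of the fail-loud constructor (not the actual client): an empty key throws at construction time rather than surfacing as 401s on the first request.

```typescript
// Sketch: internal client that refuses to exist without a service key.
class ManaAuthInternalClient {
  constructor(
    private readonly baseUrl: string,
    private readonly serviceKey: string,
  ) {
    if (!serviceKey) throw new Error('serviceKey is empty: refusing to construct client');
  }

  // Headers every internal call carries.
  headers(): Record<string, string> {
    return { 'X-Service-Key': this.serviceKey, 'Content-Type': 'application/json' };
  }
}
```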

Boot smoke verified: /health returns ok, /diag/tick 500s with
descriptive messages when keys absent. Warning lines on boot when
keys are missing. Type-check green across mana-auth, tool-registry,
mcp, persona-runner.

M3 exit gate is the end-to-end smoke recipe (docker up → db:push →
seed:personas → diag/tick → psql) documented in
services/mana-persona-runner/CLAUDE.md.

M2.d (cross-space family/team memberships) still deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 14:18:31 +02:00

# mana-persona-runner
Tick-loop service that drives the **M2 personas** through the app via Claude + the **mana-mcp** gateway. Test infrastructure — not a user-facing service, not deployed to prod until the runner has proven itself in staging.
**Plan:** [`docs/plans/mana-mcp-and-personas.md`](../../docs/plans/mana-mcp-and-personas.md) (M3)
## Tech Stack
| Layer | Technology |
|-------|------------|
| **Runtime** | Bun |
| **Framework** | Hono |
| **AI** | `@anthropic-ai/claude-agent-sdk` (native MCP tool-loop) |
| **Tools** | `mana-mcp` (`:3069`) — Streamable HTTP, per-persona JWT |
| **Upstream** | `mana-auth` (`:3001`) for login + spaces + action/feedback persistence |
## Port: 3070
## What it does (when the tick loop lands — M3.b)
Every `TICK_INTERVAL_MS`:
1. Query `auth.personas` for rows whose `tickCadence` + `lastActiveAt` make them due.
2. Limit to `PERSONA_CONCURRENCY` personas in parallel.
3. For each due persona:
- **Login**: `POST /api/v1/auth/login` with deterministic HMAC-derived password (same algorithm as `scripts/personas/password.ts`).
- **Resolve space**: `GET /api/auth/organization/list`, pick first `personal` space.
- **Claude call**: `@anthropic-ai/claude-agent-sdk` with `persona.systemPrompt`, MCP server wired to `:3069`, `X-Mana-Space` pinned to the persona's personal space.
- **Self-reflection**: after the tool loop settles, ask Claude in-character to rate each module used (1-5 + note).
- **Persist**: `POST /api/v1/internal/personas/:id/actions` and `/feedback` on mana-auth (service-key auth).
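The login step's password derivation can be sketched like this; the inputs and encoding here are assumptions for illustration — the authoritative algorithm is `scripts/personas/password.ts`, and the runner's mirror must match it byte-for-byte:

```typescript
import { createHmac } from 'node:crypto';

// Illustrative sketch only: deterministic password from persona email +
// PERSONA_SEED_SECRET. The real derivation lives in scripts/personas/password.ts.
function derivePassword(email: string, seedSecret: string): string {
  return createHmac('sha256', seedSecret).update(email).digest('hex').slice(0, 32);
}
```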
## Files
- `src/config.ts` — env-driven config + production-secret assertion
- `src/clients/auth.ts` — login + listSpaces, convenience `loginAndResolvePersonalSpace`
- `src/clients/mana-auth-internal.ts` — `X-Service-Key`-gated calls: `listDuePersonas`, `postActions`, `postFeedback`
- `src/password.ts` — HMAC derivation (mirror of `scripts/personas/password.ts`, see comment)
- `src/runner/claude-session.ts` — per-tick `runMainTurn` + `runRatingTurn` on top of `@anthropic-ai/claude-agent-sdk`
- `src/runner/tick.ts` — orchestrator: due → concurrency-limited fan-out → per-persona pipeline
- `src/runner/types.ts` — `ActionRow`/`FeedbackRow` shapes shared between runner modules
- `src/index.ts` — Hono app, `/health`, `/metrics`, dev-only `/diag/login` + `/diag/tick`
## Tick pipeline (M3.b)
```
setInterval(config.tickIntervalMs)
  GET /api/v1/internal/personas/due            (service-key)
    due? hourly > 1h, daily > 24h, weekdays > 24h Mon-Fri
  for each persona (max concurrency at once):
    POST /api/v1/auth/login                    (persona JWT)
    GET  /api/auth/organization/list           (personal space id)
    runMainTurn
      query({ systemPrompt, mcpServers: { mana: { type: 'http', url, headers } }, maxTurns })
      for each SDKMessage:
        tool_use block  → push ActionRow (provisional 'ok')
        tool_result err → flip last ActionRow to 'error'
        module prefix   → modulesUsed.add(module)
    runRatingTurn (same systemPrompt, fresh query, tools: [])
      prompt: 'rate each of {modulesUsed} 1-5, respond JSON'
      parse { ratings: [{ module, rating, notes }] } → FeedbackRow[]
      invalid JSON → one synthetic rating row '__parse' as marker
    POST /api/v1/internal/personas/:id/actions   (idempotent, batch ≤ 500)
    POST /api/v1/internal/personas/:id/feedback  (idempotent, batch ≤ 100)
      mana-auth writes rows + bumps personas.last_active_at
```
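The parse-with-tolerance step in the pipeline can be sketched as follows (the shapes are assumptions; only the strip-fences-then-parse, null-on-failure behavior mirrors claude-session.ts):

```typescript
interface Rating { module: string; rating: number; notes?: string }

// Tolerant rating parse: strip markdown fences and surrounding whitespace,
// then JSON.parse. Returns null on failure so the caller can emit the
// synthetic '__parse' feedback row instead of crashing the tick.
function parseRatings(raw: string): Rating[] | null {
  const stripped = raw
    .trim()
    .replace(/^```(?:json)?\s*/i, '')
    .replace(/\s*```$/, '');
  try {
    const parsed = JSON.parse(stripped);
    return Array.isArray(parsed?.ratings) ? parsed.ratings : null;
  } catch {
    return null;
  }
}
```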
The outer tick `Promise.allSettled`s each persona, so one failure never
kills the batch. Per-persona exceptions become `failed: [{persona,error}]`
entries in the tick result and get logged. `tickInFlight` guards against
overlap when Claude latency exceeds the interval.
## What's NOT in M3.b (deferred)
- Precise `tool_use_id` → `tool_result` pairing. Today the last action
gets flipped to `error` when a `tool_result` carries `is_error: true`.
Good enough for the audit dashboard; exact attribution lands when the
dashboard needs it.
- Retries/back-off on Claude 429/5xx. The SDK has some built-in; we do
no extra handling. A burst of 429s can fail a batch — next tick picks
them up anyway.
- Prometheus counters. Health + log lines today, counters when the
dashboard lands in M6.
## Environment Variables
```env
PORT=3070
MANA_AUTH_URL=http://localhost:3001
MANA_MCP_URL=http://localhost:3069
# Service-to-service auth for action/feedback persistence (M3.c).
MANA_SERVICE_KEY=...
# Claude API key the runner uses to drive each persona's turn.
ANTHROPIC_API_KEY=...
# Must match whatever the seed script used when the personas were created.
# In production: rotate together with the seed script's env.
PERSONA_SEED_SECRET=...
# Tick loop (M3.b).
TICK_INTERVAL_MS=60000
PERSONA_CONCURRENCY=2
# Operational kill-switch. When true, the service stays up (health-ok)
# but no ticks fire. Useful during demos or when debugging a persona.
RUNNER_PAUSED=false
```
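A sketch of how these variables might be coerced into config (illustrative; `src/config.ts` is the source of truth and additionally asserts production secrets):

```typescript
// Illustrative env coercion for a subset of the variables above.
// Defaults mirror the documented values; the real config.ts differs.
function loadConfig(env: Record<string, string | undefined>) {
  return {
    port: Number(env.PORT ?? 3070),
    tickIntervalMs: Number(env.TICK_INTERVAL_MS ?? 60_000),
    concurrency: Number(env.PERSONA_CONCURRENCY ?? 2),
    paused: env.RUNNER_PAUSED === 'true', // kill-switch: service stays up, no ticks fire
  };
}
```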
## End-to-end smoke (M3 exit gate)
Proves: personas exist, runner picks them up, Claude drives tools via
MCP, actions + ratings land in Postgres.
```bash
# 1. Stack
pnpm docker:up
cd services/mana-auth && bun run db:push # applies users.kind + auth.personas* tables
pnpm dev:auth # mana-auth on 3001
pnpm dev:sync # mana-sync on 3050
pnpm --filter @mana/mcp-service dev # mana-mcp on 3069
pnpm --filter @mana/persona-runner dev # this service on 3070
# (boots warning-only if MANA_SERVICE_KEY or ANTHROPIC_API_KEY missing)
# 2. Seed the 10 catalog personas
export MANA_ADMIN_JWT=            # admin-tier JWT
export PERSONA_SEED_SECRET=       # any value; must match runner
pnpm seed:personas
# 3. Verify login works
curl -s "localhost:3070/diag/login?email=persona.anna@mana.test" | jq
# → { ok: true, userId: "…", spaceId: "…" }
# 4. Fire a tick manually (dev-only endpoint, avoids waiting the full interval)
export MANA_SERVICE_KEY=
export ANTHROPIC_API_KEY=
curl -s -X POST localhost:3070/diag/tick | jq
# → { ok: true, result: { due: 10, ranSuccessfully: N, failed: [], durationMs: … } }
# 5. Inspect what landed
psql -c "SELECT persona_id, tool_name, result FROM auth.persona_actions ORDER BY created_at DESC LIMIT 20;"
psql -c "SELECT persona_id, module, rating, notes FROM auth.persona_feedback ORDER BY created_at DESC LIMIT 20;"
```
A green run through step 5 is the M3 exit criterion.
## Why a separate service (not part of mana-ai)
- **Lifecycle**: persona-runner is test infra. Starts and stops with a demo, can be paused without downtime noise. mana-ai is a production worker for real user missions — different risk profile.
- **Observability**: mixing them means "is this tick Anna running the suite or a real user running their mission?" becomes a log-filter problem. Separate services give you separate Prometheus scrapes.
- **Tool source**: mana-ai today uses an internal tool catalog; persona-runner uses MCP. When M4 unifies both onto `@mana/tool-registry`, the split still makes sense as two consumers of the same tool surface.