managarten/services/mana-persona-runner/CLAUDE.md
Till JS f07eae3c01 feat(personas): M3.b-d — tick loop + Claude Agent SDK + persistence (real)
Previous commit 38dc80654 carries this M3 title but its payload is an
unrelated apps/api/picture change — shared-.git-index race with a
parallel session (see feedback_git_workflow.md). This commit holds the
actual M3.b/c/d code. Leaving the misnamed commit for the user to
re-attribute / revert as they prefer.

Closes the M3 loop from docs/plans/mana-mcp-and-personas.md. The
runner picks up due personas, drives each through Claude + MCP for
one simulated turn, collects actions + ratings, and persists them
through service-key internal endpoints in mana-auth.

Internal endpoints (mana-auth, service-key-gated)

- GET  /api/v1/internal/personas/due
    Returns personas whose tickCadence + lastActiveAt say they're
    due. Rules: hourly > 1h, daily > 24h, weekdays > 24h mon-fri.
    NULLS FIRST so never-run personas go ahead of stale ones.
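The cadence rules can be sketched in TypeScript; the real check runs as SQL inside mana-auth, so the names and shapes below are illustrative only:

```typescript
// Sketch of the /due cadence rules. Illustrative only: the authoritative
// check is the SQL query behind GET /api/v1/internal/personas/due.
type Cadence = 'hourly' | 'daily' | 'weekdays';

const HOUR_MS = 3_600_000;

function isDue(cadence: Cadence, lastActiveAt: Date | null, now: Date): boolean {
  // Never-run personas are always due (the query orders NULLS FIRST).
  if (lastActiveAt === null) return true;
  const ageMs = now.getTime() - lastActiveAt.getTime();
  switch (cadence) {
    case 'hourly':
      return ageMs > HOUR_MS;
    case 'daily':
      return ageMs > 24 * HOUR_MS;
    case 'weekdays': {
      const day = now.getUTCDay(); // 0 = Sunday … 6 = Saturday
      return day >= 1 && day <= 5 && ageMs > 24 * HOUR_MS;
    }
  }
}
```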

- POST /api/v1/internal/personas/:id/actions
    Batch ≤ 500. Row ids are deterministic
    `${tickId}-${i}-${toolName}` + ON CONFLICT DO NOTHING so the
    runner can retry a tick without doubling audit rows. Also
    bumps personas.last_active_at so the next /due call sees it.
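The id scheme, as a minimal sketch: retrying a tick regenerates identical ids, so the server-side ON CONFLICT DO NOTHING turns duplicate inserts into no-ops.

```typescript
// Deterministic action-row id: same tick + same position + same tool
// always yields the same id, making retries idempotent on the server.
function actionRowId(tickId: string, index: number, toolName: string): string {
  return `${tickId}-${index}-${toolName}`;
}
```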

- POST /api/v1/internal/personas/:id/feedback
    Batch ≤ 100. Row id is `${tickId}-${module}` — natural key is
    one rating per module per tick.

Runner tick pipeline (services/mana-persona-runner/src/runner/)

- claude-session.ts
    Two phases per tick. runMainTurn feeds the persona's system
    prompt + a German "simulate a day" user prompt to Claude Agent
    SDK's query(), with mana-mcp wired in as a streamable-HTTP MCP
    server. We iterate the returned AsyncGenerator and extract
    tool_use blocks into ActionRows; a tool_result with
    is_error=true flips the most recent action. runRatingTurn is a
    fresh query() with tools:[] asking Claude in character to rate
    each used module 1-5 as strict JSON. We parse with tolerance
    for whitespace / fences. Unparseable output becomes a synthetic
    '__parse' feedback row so operators see the failure.
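The extraction loop, reduced to a sketch — the message and block shapes here are assumptions standing in for the SDK's real types; only the logic (push on tool_use, flip the most recent action on an errored tool_result) mirrors what claude-session.ts does:

```typescript
// Simplified stand-ins for the SDK's streamed message/block types.
type Block =
  | { type: 'tool_use'; name: string; input: unknown }
  | { type: 'tool_result'; is_error?: boolean };

interface ActionRow { toolName: string; result: 'ok' | 'error' }

async function collectActions(
  messages: AsyncIterable<{ blocks: Block[] }>,
): Promise<ActionRow[]> {
  const actions: ActionRow[] = [];
  for await (const msg of messages) {
    for (const block of msg.blocks) {
      if (block.type === 'tool_use') {
        // Provisionally 'ok'; a later errored result may flip it.
        actions.push({ toolName: block.name, result: 'ok' });
      } else if (block.type === 'tool_result' && block.is_error) {
        const last = actions.at(-1);
        if (last) last.result = 'error';
      }
    }
  }
  return actions;
}
```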

- tick.ts
    Orchestrator. Skips when config.paused. Fetches /due, processes
    in batches of config.concurrency via Promise.allSettled so a
    single persona failure never kills the batch. Returns
    {due, ranSuccessfully, failed[], durationMs}.
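The fan-out can be sketched as follows (an assumed helper, not the actual tick.ts): Promise.allSettled keeps one rejection from killing the batch, and rejections become failed[] entries.

```typescript
// Concurrency-limited fan-out with allSettled: failures are collected,
// never thrown, so the remaining personas in the batch still run.
async function runBatches<T>(
  items: T[],
  concurrency: number,
  run: (item: T) => Promise<void>,
): Promise<{ ranSuccessfully: number; failed: { item: T; error: string }[] }> {
  let ranSuccessfully = 0;
  const failed: { item: T; error: string }[] = [];
  for (let i = 0; i < items.length; i += concurrency) {
    const batch = items.slice(i, i + concurrency);
    const results = await Promise.allSettled(batch.map(run));
    results.forEach((r, j) => {
      if (r.status === 'fulfilled') ranSuccessfully++;
      else failed.push({ item: batch[j], error: String(r.reason) });
    });
  }
  return { ranSuccessfully, failed };
}
```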

- types.ts
    ActionRow + FeedbackRow shapes shared between claude-session
    and the internal client.

Runner bootstrap (src/index.ts)

- setInterval(config.tickIntervalMs) starts the tick loop on boot.
  tickInFlight guards against overlap when Claude latency >
  interval. If MANA_SERVICE_KEY or ANTHROPIC_API_KEY is missing,
  loop is disabled with a warn line — /health + /diag/login still
  work.
- POST /diag/tick (dev-only) fires one tick on demand, returns
  the result. Avoids waiting a full interval during testing.
- Graceful SIGTERM/SIGINT shutdown clears the interval.
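The overlap guard as a minimal sketch (names illustrative, not the actual index.ts): a tick that fires while the previous one is still running is skipped, not queued.

```typescript
let tickInFlight = false;

// Returns true if the tick ran, false if it was skipped due to overlap.
async function guardedTick(runTick: () => Promise<void>): Promise<boolean> {
  if (tickInFlight) return false; // previous tick still running: skip
  tickInFlight = true;
  try {
    await runTick();
    return true;
  } finally {
    tickInFlight = false;
  }
}
```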

Client

- clients/mana-auth-internal.ts
    X-Service-Key client for the three endpoints above.
    Constructor throws on empty serviceKey — fail loud.
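A simplified sketch of the fail-loud constructor (not the actual client): an empty key throws at construction time rather than surfacing as 401s on the first request.

```typescript
// Sketch: internal client that refuses to exist without a service key.
class ManaAuthInternalClient {
  constructor(
    private readonly baseUrl: string,
    private readonly serviceKey: string,
  ) {
    if (!serviceKey) throw new Error('serviceKey is empty: refusing to construct client');
  }

  // Headers every internal call carries.
  headers(): Record<string, string> {
    return { 'X-Service-Key': this.serviceKey, 'Content-Type': 'application/json' };
  }
}
```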

Boot smoke verified: /health returns ok, /diag/tick 500s with
descriptive messages when keys absent. Warning lines on boot when
keys are missing. Type-check green across mana-auth, tool-registry,
mcp, persona-runner.

M3 exit gate is the end-to-end smoke recipe (docker up → db:push →
seed:personas → diag/tick → psql) documented in
services/mana-persona-runner/CLAUDE.md.

M2.d (cross-space family/team memberships) still deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 14:18:31 +02:00

# mana-persona-runner
Tick-loop service that drives the **M2 personas** through the app via Claude + the **mana-mcp** gateway. Test infrastructure — not a user-facing service, not deployed to prod until the runner has proven itself in staging.
**Plan:** [`docs/plans/mana-mcp-and-personas.md`](../../docs/plans/mana-mcp-and-personas.md) (M3)
## Tech Stack
| Layer | Technology |
|-------|------------|
| **Runtime** | Bun |
| **Framework** | Hono |
| **AI** | `@anthropic-ai/claude-agent-sdk` (native MCP tool-loop) |
| **Tools** | `mana-mcp` (`:3069`) — Streamable HTTP, per-persona JWT |
| **Upstream** | `mana-auth` (`:3001`) for login + spaces + action/feedback persistence |
## Port: 3070
## What it does (when the tick loop lands — M3.b)
Every `TICK_INTERVAL_MS`:
1. Query `auth.personas` for rows whose `tickCadence` + `lastActiveAt` make them due.
2. Limit to `PERSONA_CONCURRENCY` personas in parallel.
3. For each due persona:
- **Login**: `POST /api/v1/auth/login` with deterministic HMAC-derived password (same algorithm as `scripts/personas/password.ts`).
- **Resolve space**: `GET /api/auth/organization/list`, pick first `personal` space.
- **Claude call**: `@anthropic-ai/claude-agent-sdk` with `persona.systemPrompt`, MCP server wired to `:3069`, `X-Mana-Space` pinned to the persona's personal space.
- **Self-reflection**: after the tool loop settles, ask Claude in-character to rate each module used (1-5 + note).
- **Persist**: `POST /api/v1/internal/personas/:id/actions` and `/feedback` on mana-auth (service-key auth).
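The login step's password derivation can be sketched like this; the inputs and encoding here are assumptions for illustration — the authoritative algorithm is `scripts/personas/password.ts`, and the runner's mirror must match it byte-for-byte:

```typescript
import { createHmac } from 'node:crypto';

// Illustrative sketch only: deterministic password from persona email +
// PERSONA_SEED_SECRET. The real derivation lives in scripts/personas/password.ts.
function derivePassword(email: string, seedSecret: string): string {
  return createHmac('sha256', seedSecret).update(email).digest('hex').slice(0, 32);
}
```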
## Files
- `src/config.ts` — env-driven config + production-secret assertion
- `src/clients/auth.ts` — login + listSpaces, convenience `loginAndResolvePersonalSpace`
- `src/clients/mana-auth-internal.ts` — `X-Service-Key`-gated calls: `listDuePersonas`, `postActions`, `postFeedback`
- `src/password.ts` — HMAC derivation (mirror of `scripts/personas/password.ts`, see comment)
- `src/runner/claude-session.ts` — per-tick `runMainTurn` + `runRatingTurn` on top of `@anthropic-ai/claude-agent-sdk`
- `src/runner/tick.ts` — orchestrator: due → concurrency-limited fan-out → per-persona pipeline
- `src/runner/types.ts` — `ActionRow`/`FeedbackRow` shapes shared between runner modules
- `src/index.ts` — Hono app, `/health`, `/metrics`, dev-only `/diag/login` + `/diag/tick`
## Tick pipeline (M3.b)
```
setInterval(config.tickIntervalMs)
  GET /api/v1/internal/personas/due            (service-key)
    due? hourly > 1h, daily > 24h, weekdays > 24h Mon-Fri
  for each persona (max concurrency at once):
    POST /api/v1/auth/login                    (persona JWT)
    GET  /api/auth/organization/list           (personal space id)
    runMainTurn
      query({ systemPrompt, mcpServers: { mana: { type: 'http', url, headers } }, maxTurns })
      for each SDKMessage:
        tool_use block  → push ActionRow (provisional 'ok')
        tool_result err → flip last ActionRow to 'error'
        module prefix   → modulesUsed.add(module)
    runRatingTurn (same systemPrompt, fresh query, tools: [])
      prompt: 'rate each of {modulesUsed} 1-5, respond JSON'
      parse { ratings: [{ module, rating, notes }] } → FeedbackRow[]
      invalid JSON → one synthetic rating row '__parse' as marker
    POST /api/v1/internal/personas/:id/actions   (idempotent, batch ≤ 500)
    POST /api/v1/internal/personas/:id/feedback  (idempotent, batch ≤ 100)
      mana-auth writes rows + bumps personas.last_active_at
```
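The parse-with-tolerance step in the pipeline can be sketched as follows (the shapes are assumptions; only the strip-fences-then-parse, null-on-failure behavior mirrors claude-session.ts):

```typescript
interface Rating { module: string; rating: number; notes?: string }

// Tolerant rating parse: strip markdown fences and surrounding whitespace,
// then JSON.parse. Returns null on failure so the caller can emit the
// synthetic '__parse' feedback row instead of crashing the tick.
function parseRatings(raw: string): Rating[] | null {
  const stripped = raw
    .trim()
    .replace(/^```(?:json)?\s*/i, '')
    .replace(/\s*```$/, '');
  try {
    const parsed = JSON.parse(stripped);
    return Array.isArray(parsed?.ratings) ? parsed.ratings : null;
  } catch {
    return null;
  }
}
```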
The outer tick `Promise.allSettled`s each persona, so one failure never
kills the batch. Per-persona exceptions become `failed: [{persona,error}]`
entries in the tick result and get logged. `tickInFlight` guards against
overlap when Claude latency exceeds the interval.
## What's NOT in M3.b (deferred)
- Precise `tool_use_id` → `tool_result` pairing. Today the last action
gets flipped to `error` when a `tool_result` carries `is_error: true`.
Good enough for the audit dashboard; exact attribution lands when the
dashboard needs it.
- Retries/back-off on Claude 429/5xx. The SDK has some built-in; we do
no extra handling. A burst of 429s can fail a batch — next tick picks
them up anyway.
- Prometheus counters. Health + log lines today, counters when the
dashboard lands in M6.
## Environment Variables
```env
PORT=3070
MANA_AUTH_URL=http://localhost:3001
MANA_MCP_URL=http://localhost:3069
# Service-to-service auth for action/feedback persistence (M3.c).
MANA_SERVICE_KEY=...
# Claude API key the runner uses to drive each persona's turn.
ANTHROPIC_API_KEY=...
# Must match whatever the seed script used when the personas were created.
# In production: rotate together with the seed script's env.
PERSONA_SEED_SECRET=...
# Tick loop (M3.b).
TICK_INTERVAL_MS=60000
PERSONA_CONCURRENCY=2
# Operational kill-switch. When true, the service stays up (health-ok)
# but no ticks fire. Useful during demos or when debugging a persona.
RUNNER_PAUSED=false
```
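A sketch of how these variables might be coerced into config (illustrative; `src/config.ts` is the source of truth and additionally asserts production secrets):

```typescript
// Illustrative env coercion for a subset of the variables above.
// Defaults mirror the documented values; the real config.ts differs.
function loadConfig(env: Record<string, string | undefined>) {
  return {
    port: Number(env.PORT ?? 3070),
    tickIntervalMs: Number(env.TICK_INTERVAL_MS ?? 60_000),
    concurrency: Number(env.PERSONA_CONCURRENCY ?? 2),
    paused: env.RUNNER_PAUSED === 'true', // kill-switch: service stays up, no ticks fire
  };
}
```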
## End-to-end smoke (M3 exit gate)
Proves: personas exist, runner picks them up, Claude drives tools via
MCP, actions + ratings land in Postgres.
```bash
# 1. Stack
pnpm docker:up
cd services/mana-auth && bun run db:push # applies users.kind + auth.personas* tables
pnpm dev:auth # mana-auth on 3001
pnpm dev:sync # mana-sync on 3050
pnpm --filter @mana/mcp-service dev # mana-mcp on 3069
pnpm --filter @mana/persona-runner dev # this service on 3070
# (boots warning-only if MANA_SERVICE_KEY or ANTHROPIC_API_KEY missing)
# 2. Seed the 10 catalog personas
export MANA_ADMIN_JWT=            # admin-tier JWT
export PERSONA_SEED_SECRET=       # any value; must match runner
pnpm seed:personas
# 3. Verify login works
curl -s "localhost:3070/diag/login?email=persona.anna@mana.test" | jq
# → { ok: true, userId: "…", spaceId: "…" }
# 4. Fire a tick manually (dev-only endpoint, avoids waiting the full interval)
export MANA_SERVICE_KEY=
export ANTHROPIC_API_KEY=
curl -s -X POST localhost:3070/diag/tick | jq
# → { ok: true, result: { due: 10, ranSuccessfully: N, failed: [], durationMs: … } }
# 5. Inspect what landed
psql -c "SELECT persona_id, tool_name, result FROM auth.persona_actions ORDER BY created_at DESC LIMIT 20;"
psql -c "SELECT persona_id, module, rating, notes FROM auth.persona_feedback ORDER BY created_at DESC LIMIT 20;"
```
A green run through step 5 is the M3 exit criterion.
## Why a separate service (not part of mana-ai)
- **Lifecycle**: persona-runner is test infra. Starts and stops with a demo, can be paused without downtime noise. mana-ai is a production worker for real user missions — different risk profile.
- **Observability**: mixing them means "is this tick Anna running the suite or a real user running their mission?" becomes a log-filter problem. Separate services give you separate Prometheus scrapes.
- **Tool source**: mana-ai today uses an internal tool catalog; persona-runner uses MCP. When M4 unifies both onto `@mana/tool-registry`, the split still makes sense as two consumers of the same tool surface.