mirror of
https://github.com/Memo-2023/mana-monorepo.git
synced 2026-05-14 20:21:09 +02:00
Previous commit 38dc80654 carries this M3 title, but its payload is an
unrelated apps/api/picture change, the result of a shared-.git-index
race with a parallel session (see feedback_git_workflow.md). This commit
holds the actual M3.b/c/d code. The misnamed commit is left for the user
to re-attribute or revert as they prefer.

Closes the M3 loop from docs/plans/mana-mcp-and-personas.md. The
runner picks up due personas, drives each through Claude + MCP for
one simulated turn, collects actions + ratings, and persists them
through service-key internal endpoints in mana-auth.

Internal endpoints (mana-auth, service-key-gated)
- GET /api/v1/internal/personas/due
  Returns personas whose tickCadence + lastActiveAt say they're
  due. Rules (time since lastActiveAt): hourly > 1h, daily > 24h,
  weekdays > 24h and Mon-Fri only. Ordered NULLS FIRST so never-run
  personas go ahead of stale ones.
- POST /api/v1/internal/personas/:id/actions
  Batch ≤ 500. Row ids are deterministic
  (`${tickId}-${i}-${toolName}`) and inserted with ON CONFLICT DO
  NOTHING, so the runner can retry a tick without doubling audit
  rows. Also bumps personas.last_active_at so the next /due call
  sees it.
- POST /api/v1/internal/personas/:id/feedback
  Batch ≤ 100. Row id is `${tickId}-${module}`; the natural key is
  one rating per module per tick.
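
The retry safety of the actions endpoint can be illustrated with a minimal TypeScript sketch. It assumes a simplified `ActionInput` shape and uses an in-memory `Map` as a stand-in for the table and its ON CONFLICT DO NOTHING clause. The names `actionRowId` and `insertIgnoringConflicts`, and the tool names in the demo, are hypothetical and not taken from mana-auth.

```typescript
// Hypothetical, simplified shape; the real ActionRow lives in runner/types.ts.
interface ActionInput {
  toolName: string;
  result: "ok" | "error";
}

// Deterministic id in the format described above: `${tickId}-${i}-${toolName}`.
function actionRowId(tickId: string, index: number, toolName: string): string {
  return `${tickId}-${index}-${toolName}`;
}

// In-memory stand-in for INSERT ... ON CONFLICT DO NOTHING.
// Returns how many rows were actually inserted.
function insertIgnoringConflicts(
  table: Map<string, ActionInput>,
  tickId: string,
  actions: ActionInput[],
): number {
  let inserted = 0;
  actions.forEach((action, i) => {
    const id = actionRowId(tickId, i, action.toolName);
    if (!table.has(id)) {
      table.set(id, action);
      inserted += 1;
    }
  });
  return inserted;
}

// Posting the same tick twice leaves the table unchanged the second time.
const auditTable = new Map<string, ActionInput>();
const tickBatch: ActionInput[] = [
  { toolName: "todo.create", result: "ok" }, // tool names are made up
  { toolName: "todo.list", result: "ok" },
];
const firstAttempt = insertIgnoringConflicts(auditTable, "tick-1", tickBatch);
const retryAttempt = insertIgnoringConflicts(auditTable, "tick-1", tickBatch);
console.log(firstAttempt, retryAttempt, auditTable.size);
```

Because the id depends only on the tick id, the position in the batch, and the tool name, a retried tick must resend its actions in the same order for the deduplication to hold.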

Runner tick pipeline (services/mana-persona-runner/src/runner/)
- claude-session.ts
  Two phases per tick. runMainTurn feeds the persona's system
  prompt + a German "simulate a day" user prompt to the Claude Agent
  SDK's query(), with mana-mcp wired in as a streamable-HTTP MCP
  server. We iterate the returned AsyncGenerator and extract
  tool_use blocks into ActionRows; a tool_result with
  is_error=true flips the most recent action to 'error'.
  runRatingTurn is a fresh query() with tools:[] asking Claude, in
  character, to rate each used module 1-5 as strict JSON. We parse
  with tolerance for whitespace / fences. Unparseable output becomes
  a synthetic '__parse' feedback row so operators see the failure.
- tick.ts
  Orchestrator. Skips when config.paused. Fetches /due, processes
  in batches of config.concurrency via Promise.allSettled so a
  single persona failure never kills the batch. Returns
  {due, ranSuccessfully, failed[], durationMs}.
- types.ts
  ActionRow + FeedbackRow shapes shared between claude-session
  and the internal client.

Runner bootstrap (src/index.ts)
- setInterval(config.tickIntervalMs) starts the tick loop on boot.
  tickInFlight guards against overlap when Claude latency >
  interval. If MANA_SERVICE_KEY or ANTHROPIC_API_KEY is missing, the
  loop is disabled with a warn line; /health + /diag/login still
  work.
- POST /diag/tick (dev-only) fires one tick on demand and returns
  the result. Avoids waiting a full interval during testing.
- Graceful SIGTERM/SIGINT shutdown clears the interval.
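
The overlap guard can be sketched as follows. This is a minimal illustration under assumptions, not the code in src/index.ts: `guardAgainstOverlap` is a hypothetical wrapper name, and the slow tick is simulated with a timer.

```typescript
// Hypothetical wrapper: runs `tick` unless a previous run is still in flight,
// in which case it reports "skipped" without starting a second run.
function guardAgainstOverlap(
  tick: () => Promise<void>,
): () => Promise<"ran" | "skipped"> {
  let tickInFlight = false;
  return async () => {
    if (tickInFlight) return "skipped";
    tickInFlight = true;
    try {
      await tick();
      return "ran";
    } finally {
      // Always release the guard, even when the tick throws.
      tickInFlight = false;
    }
  };
}

// Demo: a fake tick that takes 20 ms, standing in for slow Claude calls.
let ticksStarted = 0;
const slowTick = async (): Promise<void> => {
  ticksStarted += 1;
  await new Promise((resolve) => setTimeout(resolve, 20));
};
const guardedTick = guardAgainstOverlap(slowTick);

const overlapDemo = (async () => {
  const first = guardedTick(); // starts running
  const second = await guardedTick(); // overlaps with the first call
  const firstResult = await first;
  const third = await guardedTick(); // the first call has finished by now
  return { firstResult, second, third, ticksStarted };
})();
```

In a service, the guarded function would be what gets passed to `setInterval`, and the shutdown handler would call `clearInterval` on the returned timer.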

Client
- clients/mana-auth-internal.ts
  X-Service-Key client for the three endpoints above.
  Constructor throws on empty serviceKey (fail loud).
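
A minimal sketch of the fail-loud constructor and the X-Service-Key header, assuming an injected fetch-like function so it runs without a network. `InternalClientSketch` and `FetchLike` are hypothetical names, and the response shape of /due is left as `unknown` because it is not specified here.

```typescript
// Minimal fetch-like signature (an assumption) so the sketch needs no network.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical class; the real client lives in clients/mana-auth-internal.ts.
class InternalClientSketch {
  private readonly baseUrl: string;
  private readonly serviceKey: string;
  private readonly fetchImpl: FetchLike;

  constructor(baseUrl: string, serviceKey: string, fetchImpl: FetchLike) {
    // Fail loud: a missing key surfaces at construction, not on the first call.
    if (serviceKey.trim() === "") {
      throw new Error("serviceKey must not be empty");
    }
    this.baseUrl = baseUrl;
    this.serviceKey = serviceKey;
    this.fetchImpl = fetchImpl;
  }

  async listDuePersonas(): Promise<unknown> {
    const url = `${this.baseUrl}/api/v1/internal/personas/due`;
    const res = await this.fetchImpl(url, {
      method: "GET",
      headers: { "X-Service-Key": this.serviceKey },
    });
    if (!res.ok) {
      throw new Error(`GET personas/due failed with status ${res.status}`);
    }
    return res.json();
  }
}

// Demo with a fake fetch that records what it was called with.
const seenRequests: { url: string; key: string | undefined }[] = [];
const fakeFetch: FetchLike = async (url, init) => {
  seenRequests.push({ url, key: init.headers["X-Service-Key"] });
  return { ok: true, status: 200, json: async () => [] };
};

let emptyKeyRejected = false;
try {
  new InternalClientSketch("http://localhost:3001", "", fakeFetch);
} catch {
  emptyKeyRejected = true;
}

const clientDemo = new InternalClientSketch(
  "http://localhost:3001",
  "dev-key",
  fakeFetch,
).listDuePersonas();
```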

Boot smoke verified: /health returns ok, /diag/tick 500s with
descriptive messages when keys are absent. Warning lines appear on
boot when keys are missing. Type-check is green across mana-auth,
tool-registry, mcp, persona-runner.

M3 exit gate is the end-to-end smoke recipe (docker up → db:push →
seed:personas → diag/tick → psql) documented in
services/mana-persona-runner/CLAUDE.md.

M2.d (cross-space family/team memberships) still deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
163 lines
7.1 KiB
Markdown
# mana-persona-runner

Tick-loop service that drives the **M2 personas** through the app via Claude + the **mana-mcp** gateway. This is test infrastructure: not a user-facing service, and not deployed to prod until the runner has proven itself in staging.

**Plan:** [`docs/plans/mana-mcp-and-personas.md`](../../docs/plans/mana-mcp-and-personas.md) (M3)

## Tech Stack

| Layer | Technology |
|-------|------------|
| **Runtime** | Bun |
| **Framework** | Hono |
| **AI** | `@anthropic-ai/claude-agent-sdk` (native MCP tool-loop) |
| **Tools** | `mana-mcp` (`:3069`): Streamable HTTP, per-persona JWT |
| **Upstream** | `mana-auth` (`:3001`) for login + spaces + action/feedback persistence |

## Port: 3070

## What it does (when the tick loop lands, M3.b)

Every `TICK_INTERVAL_MS`:

1. Query `auth.personas` for rows whose `tickCadence` + `lastActiveAt` make them due.
2. Limit to `PERSONA_CONCURRENCY` personas in parallel.
3. For each due persona:
   - **Login**: `POST /api/v1/auth/login` with deterministic HMAC-derived password (same algorithm as `scripts/personas/password.ts`).
   - **Resolve space**: `GET /api/auth/organization/list`, pick first `personal` space.
   - **Claude call**: `@anthropic-ai/claude-agent-sdk` with `persona.systemPrompt`, MCP server wired to `:3069`, `X-Mana-Space` pinned to the persona's personal space.
   - **Self-reflection**: after the tool loop settles, ask Claude in-character to rate each module used (1–5 + note).
   - **Persist**: `POST /api/v1/internal/personas/:id/actions` and `/feedback` on mana-auth (service-key auth).
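
The "due" check in step 1 can be sketched as a pure function. This is an illustration under assumptions rather than the mana-auth query: it takes the thresholds from the rules hourly > 1h, daily > 24h, weekdays > 24h on Mon-Fri, evaluates the weekday in UTC, and treats a persona that has never run as due. The `TickCadence` type and `isDue` name are hypothetical.

```typescript
// Hypothetical cadence type mirroring the three cadences named in this document.
type TickCadence = "hourly" | "daily" | "weekdays";

const HOUR_MS = 60 * 60 * 1000;

function isDue(
  cadence: TickCadence,
  lastActiveAt: Date | null,
  now: Date,
): boolean {
  // Assumption: the weekday is evaluated in UTC (0 = Sunday, 6 = Saturday).
  const day = now.getUTCDay();
  if (cadence === "weekdays" && (day === 0 || day === 6)) return false;

  // Assumption: a persona that has never run is due immediately.
  if (lastActiveAt === null) return true;

  const elapsedMs = now.getTime() - lastActiveAt.getTime();
  const thresholdMs = cadence === "hourly" ? HOUR_MS : 24 * HOUR_MS;
  return elapsedMs > thresholdMs;
}

// 2026-05-14 is a Thursday, 2026-05-16 a Saturday.
const thursdayNoon = new Date("2026-05-14T12:00:00Z");
const saturdayNoon = new Date("2026-05-16T12:00:00Z");

console.log(isDue("hourly", new Date("2026-05-14T10:30:00Z"), thursdayNoon));
console.log(isDue("weekdays", new Date("2026-05-14T08:00:00Z"), saturdayNoon));
```

Keeping the rule a pure function of `(cadence, lastActiveAt, now)` makes it easy to test without a database, even if the production check runs in SQL.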

## Files

- `src/config.ts`: env-driven config + production-secret assertion
- `src/clients/auth.ts`: login + listSpaces, convenience `loginAndResolvePersonalSpace`
- `src/clients/mana-auth-internal.ts`: `X-Service-Key`-gated calls: `listDuePersonas`, `postActions`, `postFeedback`
- `src/password.ts`: HMAC derivation (mirror of `scripts/personas/password.ts`, see comment)
- `src/runner/claude-session.ts`: per-tick `runMainTurn` + `runRatingTurn` on top of `@anthropic-ai/claude-agent-sdk`
- `src/runner/tick.ts`: orchestrator: due → concurrency-limited fan-out → per-persona pipeline
- `src/runner/types.ts`: `ActionRow`/`FeedbackRow` shapes shared between runner modules
- `src/index.ts`: Hono app, `/health`, `/metrics`, dev-only `/diag/login` + `/diag/tick`

## Tick pipeline (M3.b)

```
setInterval(config.tickIntervalMs)
        │
        ▼
GET /api/v1/internal/personas/due            (service-key)
        │  due? hourly>1h, daily>24h, weekdays>24h mon-fri
        ▼
for each persona (max concurrency at once):
        │
   POST /api/v1/auth/login                   (persona JWT)
   GET  /api/auth/organization/list          (personal space id)
        │
        ▼
   runMainTurn
     query({ systemPrompt, mcpServers: { mana: {type:'http', url, headers} }, maxTurns })
     for each SDKMessage:
       tool_use block   → push ActionRow (ok provisional)
       tool_result err  → flip last ActionRow to 'error'
       module prefix    → modulesUsed.add(module)
        │
        ▼
   runRatingTurn (same systemPrompt, fresh query, tools:[])
     prompt: 'rate each of {modulesUsed} 1-5, respond JSON'
     parse {ratings:[{module,rating,notes}]} → FeedbackRow[]
     invalid JSON → one synthetic rating row '__parse' as marker
        │
        ▼
POST /api/v1/internal/personas/:id/actions   (idempotent, batch ≤500)
POST /api/v1/internal/personas/:id/feedback  (idempotent, batch ≤100)
        │
        ▼
mana-auth writes rows + bumps personas.last_active_at
```
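
The tolerant parsing step in `runRatingTurn` might look roughly like the sketch below. It is an assumption-laden illustration, not the code in `claude-session.ts`: the `parseRatings` name, the simplified `FeedbackRow` shape, and the rating value `1` on the synthetic `'__parse'` row are all hypothetical choices.

```typescript
// Hypothetical, simplified shape; the real FeedbackRow lives in runner/types.ts.
interface FeedbackRow {
  module: string;
  rating: number;
  notes: string;
}

// Built at runtime so this sketch contains no literal Markdown fence.
const FENCE = "`".repeat(3);
const FENCED_JSON = new RegExp(
  "^" + FENCE + "(?:json)?\\s*([\\s\\S]*?)\\s*" + FENCE + "$",
);

function parseRatings(raw: string): FeedbackRow[] {
  // Tolerate surrounding whitespace and an optional code fence around the JSON.
  const trimmed = raw.trim();
  const fenced = trimmed.match(FENCED_JSON);
  const text = fenced ? fenced[1] : trimmed;
  try {
    const parsed = JSON.parse(text) as { ratings?: unknown };
    if (!Array.isArray(parsed.ratings)) {
      throw new Error("missing ratings array");
    }
    return parsed.ratings.map((item) => {
      const entry = item as Record<string, unknown>;
      const mod = entry.module;
      const rating = entry.rating;
      if (typeof mod !== "string") throw new Error("module must be a string");
      if (typeof rating !== "number" || !Number.isInteger(rating) || rating < 1 || rating > 5) {
        throw new Error("rating must be an integer from 1 to 5");
      }
      const notes = typeof entry.notes === "string" ? entry.notes : "";
      return { module: mod, rating, notes };
    });
  } catch (err) {
    // Assumption: the marker row uses rating 1; the real value may differ.
    return [{ module: "__parse", rating: 1, notes: `unparseable: ${String(err)}` }];
  }
}

const plain = parseRatings('  {"ratings":[{"module":"todo","rating":4,"notes":"fine"}]}  ');
const wrapped = parseRatings(
  FENCE + 'json\n{"ratings":[{"module":"notes","rating":5}]}\n' + FENCE,
);
const broken = parseRatings("Sure! I would rate it 4 out of 5.");
console.log(plain, wrapped, broken);
```

Returning a marker row instead of throwing keeps the tick going and still leaves a visible trace of the failure in the feedback table.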

The outer tick runs each persona under `Promise.allSettled`, so one failure never
kills the batch. Per-persona exceptions become `failed: [{persona,error}]`
entries in the tick result and get logged. `tickInFlight` guards against
overlap when Claude latency exceeds the interval.
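
A minimal sketch of the batch fan-out described above, assuming personas are identified by plain strings and that `runOne` stands for the whole per-persona pipeline. `runInBatches` and the reduced `TickResultSketch` shape (without `durationMs`) are hypothetical; all persona names in the demo except "anna" are made up.

```typescript
// Reduced, hypothetical result shape; the real tick result also has durationMs.
interface TickResultSketch {
  due: number;
  ranSuccessfully: number;
  failed: { persona: string; error: string }[];
}

async function runInBatches(
  personas: string[],
  concurrency: number,
  runOne: (persona: string) => Promise<void>,
): Promise<TickResultSketch> {
  const size = Math.max(1, Math.floor(concurrency)); // never loop with size 0
  const result: TickResultSketch = {
    due: personas.length,
    ranSuccessfully: 0,
    failed: [],
  };
  for (let start = 0; start < personas.length; start += size) {
    const batch = personas.slice(start, start + size);
    // allSettled waits for every persona, whether it resolved or rejected.
    const settled = await Promise.allSettled(batch.map((p) => runOne(p)));
    settled.forEach((outcome, i) => {
      if (outcome.status === "fulfilled") {
        result.ranSuccessfully += 1;
      } else {
        const persona = batch[i] ?? "unknown";
        result.failed.push({ persona, error: String(outcome.reason) });
      }
    });
  }
  return result;
}

// Demo: the second persona fails, the other two still complete.
const batchDemo = runInBatches(["anna", "ben", "cara"], 2, async (persona) => {
  if (persona === "ben") throw new Error("login failed");
});
batchDemo.then((r) => console.log(r));
```

With `Promise.all` the first rejection would reject the whole batch; `Promise.allSettled` reports each outcome separately, which is what lets failures be collected instead of propagated.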

## What's NOT in M3.b (deferred)

- Precise `tool_use_id` ↔ `tool_result` pairing. Today the last action
  gets flipped to `error` when a `tool_result` carries `is_error: true`.
  Good enough for the audit dashboard; exact attribution lands when the
  dashboard needs it.
- Retries/back-off on Claude 429/5xx. The SDK has some built-in handling;
  we add none. A burst of 429s can fail a batch, but the next tick picks
  those personas up anyway.
- Prometheus counters. Health + log lines today, counters when the
  dashboard lands in M6.
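
The current "flip the last action" behaviour from the first bullet, and why it is only approximate, can be shown with a small sketch. The `Block` union is a deliberately simplified, hypothetical stand-in for the SDK's message content; `collectActions` and the tool names are likewise made up for illustration.

```typescript
// Hypothetical, simplified content blocks; the SDK's real message types are richer.
type Block =
  | { type: "tool_use"; name: string }
  | { type: "tool_result"; is_error?: boolean };

interface ActionRow {
  toolName: string;
  result: "ok" | "error";
}

function collectActions(blocks: Block[]): ActionRow[] {
  const actions: ActionRow[] = [];
  for (const block of blocks) {
    if (block.type === "tool_use") {
      // Provisionally "ok" until an error result shows up.
      actions.push({ toolName: block.name, result: "ok" });
    } else if (block.is_error === true) {
      // No tool_use_id pairing: blame whichever action was recorded last.
      const last = actions[actions.length - 1];
      if (last) last.result = "error";
    }
  }
  return actions;
}

// Sequential calls: attribution is correct.
const sequential = collectActions([
  { type: "tool_use", name: "todo.create" },
  { type: "tool_result" },
  { type: "tool_use", name: "todo.list" },
  { type: "tool_result", is_error: true },
]);

// Two calls issued before any result: the error lands on the second action,
// whichever call actually failed.
const interleaved = collectActions([
  { type: "tool_use", name: "todo.create" },
  { type: "tool_use", name: "todo.list" },
  { type: "tool_result", is_error: true },
  { type: "tool_result" },
]);
console.log(sequential, interleaved);
```

Exact attribution would need the actions keyed by `tool_use_id` so that each `tool_result` can update its own row.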

## Environment Variables

```env
PORT=3070
MANA_AUTH_URL=http://localhost:3001
MANA_MCP_URL=http://localhost:3069

# Service-to-service auth for action/feedback persistence (M3.c).
MANA_SERVICE_KEY=...

# Claude API key the runner uses to drive each persona's turn.
ANTHROPIC_API_KEY=...

# Must match whatever the seed script used when the personas were created.
# In production: rotate together with the seed script's env.
PERSONA_SEED_SECRET=...

# Tick loop (M3.b).
TICK_INTERVAL_MS=60000
PERSONA_CONCURRENCY=2

# Operational kill-switch. When true, the service stays up (health-ok)
# but no ticks fire. Useful during demos or when debugging a persona.
RUNNER_PAUSED=false
```
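
An env-driven config loader for these variables could look like the sketch below. It is not the contents of `src/config.ts`: the `RunnerConfig` shape and helper names are hypothetical, the defaults are assumed to equal the example values above, and the production-secret assertion for `PERSONA_SEED_SECRET` is left out because its rules are not described here.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical config shape covering the variables listed above.
interface RunnerConfig {
  port: number;
  manaAuthUrl: string;
  manaMcpUrl: string;
  serviceKey: string | undefined;
  anthropicApiKey: string | undefined;
  tickIntervalMs: number;
  concurrency: number;
  paused: boolean;
}

function positiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadConfig(env: Env): RunnerConfig {
  return {
    // Assumption: defaults mirror the example values in the env block.
    port: positiveInt(env, "PORT", 3070),
    manaAuthUrl: env["MANA_AUTH_URL"] ?? "http://localhost:3001",
    manaMcpUrl: env["MANA_MCP_URL"] ?? "http://localhost:3069",
    // Empty strings count as "not set".
    serviceKey: env["MANA_SERVICE_KEY"] || undefined,
    anthropicApiKey: env["ANTHROPIC_API_KEY"] || undefined,
    tickIntervalMs: positiveInt(env, "TICK_INTERVAL_MS", 60000),
    concurrency: positiveInt(env, "PERSONA_CONCURRENCY", 2),
    paused: env["RUNNER_PAUSED"] === "true",
  };
}

// The tick loop needs both keys; without them the service only serves health/diag.
function tickLoopEnabled(config: RunnerConfig): boolean {
  return config.serviceKey !== undefined && config.anthropicApiKey !== undefined;
}

const defaultConfig = loadConfig({});
const customConfig = loadConfig({
  TICK_INTERVAL_MS: "5000",
  RUNNER_PAUSED: "true",
  MANA_SERVICE_KEY: "k",
  ANTHROPIC_API_KEY: "a",
});
console.log(defaultConfig, tickLoopEnabled(defaultConfig), tickLoopEnabled(customConfig));
```

Taking the environment as a parameter (for example `loadConfig(process.env)` at boot) keeps the parsing testable without mutating global state.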

## End-to-end smoke (M3 exit gate)

Proves: personas exist, runner picks them up, Claude drives tools via
MCP, actions + ratings land in Postgres.

```bash
# 1. Stack (run each dev server in its own terminal, from the repo root)
pnpm docker:up
(cd services/mana-auth && bun run db:push)   # applies users.kind + auth.personas* tables
pnpm dev:auth                                # mana-auth on 3001
pnpm dev:sync                                # mana-sync on 3050
pnpm --filter @mana/mcp-service dev          # mana-mcp on 3069
pnpm --filter @mana/persona-runner dev       # this service on 3070
                                             # (boots warning-only if MANA_SERVICE_KEY or ANTHROPIC_API_KEY missing)

# 2. Seed the 10 catalog personas
export MANA_ADMIN_JWT=…                      # admin-tier JWT
export PERSONA_SEED_SECRET=…                 # any value; must match runner
pnpm seed:personas

# 3. Verify login works
curl -s "localhost:3070/diag/login?email=persona.anna@mana.test" | jq
# → { ok: true, userId: "…", spaceId: "…" }

# 4. Fire a tick manually (dev-only endpoint, avoids waiting the full interval)
# Both keys must be in the runner's own environment: export them in the
# runner's terminal and restart it, since a running process will not see them.
export MANA_SERVICE_KEY=…
export ANTHROPIC_API_KEY=…
curl -s -X POST localhost:3070/diag/tick | jq
# → { ok: true, result: { due: 10, ranSuccessfully: N, failed: [], durationMs: … } }

# 5. Inspect what landed
psql -c "SELECT persona_id, tool_name, result FROM auth.persona_actions ORDER BY created_at DESC LIMIT 20;"
psql -c "SELECT persona_id, module, rating, notes FROM auth.persona_feedback ORDER BY created_at DESC LIMIT 20;"
```

A green run through step 5 is the M3 exit criterion.

## Why a separate service (not part of mana-ai)

- **Lifecycle**: persona-runner is test infra. It starts and stops with a demo and can be paused without downtime noise. mana-ai is a production worker for real user missions, which is a different risk profile.
- **Observability**: mixing them means "is this tick Anna running the suite or a real user running their mission?" becomes a log-filter problem. Separate services give you separate Prometheus scrapes.
- **Tool source**: mana-ai today uses an internal tool catalog; persona-runner uses MCP. When M4 unifies both onto `@mana/tool-registry`, the split still makes sense as two consumers of the same tool surface.