Mirror of https://github.com/Memo-2023/mana-monorepo.git, synced 2026-05-14 19:21:10 +02:00
Previous commit 38dc80654 carries this M3 title but its payload is an
unrelated apps/api/picture change — shared-.git-index race with a
parallel session (see feedback_git_workflow.md). This commit holds the
actual M3.b/c/d code. Leaving the misnamed commit for the user to
re-attribute / revert as they prefer.
Closes the M3 loop from docs/plans/mana-mcp-and-personas.md. The
runner picks up due personas, drives each through Claude + MCP for
one simulated turn, collects actions + ratings, persists through
service-key internal endpoints in mana-auth.
Internal endpoints (mana-auth, service-key-gated)
- GET /api/v1/internal/personas/due
Returns personas whose tickCadence + lastActiveAt say they're
due. Rules: hourly > 1h, daily > 24h, weekdays > 24h mon-fri.
NULLS FIRST so never-run personas go ahead of stale ones.
- POST /api/v1/internal/personas/:id/actions
Batch ≤ 500. Row ids are deterministic
`${tickId}-${i}-${toolName}` + ON CONFLICT DO NOTHING so the
runner can retry a tick without doubling audit rows. Also
bumps personas.last_active_at so the next /due call sees it.
- POST /api/v1/internal/personas/:id/feedback
Batch ≤ 100. Row id is `${tickId}-${module}` — natural key is
one rating per module per tick.
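The due rules can be sketched as a predicate (hypothetical TypeScript mirror — the real check is SQL inside mana-auth, where NULLS FIRST ordering also lives; the weekend behavior for 'weekdays' personas is an assumption here):

```typescript
type Cadence = 'hourly' | 'daily' | 'weekdays';

interface DueCheck {
  tickCadence: Cadence;
  lastActiveAt: Date | null; // null = never run — sorted first via NULLS FIRST
}

// Mirrors the /due rules: hourly > 1h, daily > 24h, weekdays > 24h mon-fri.
function isDue(p: DueCheck, now: Date): boolean {
  if (p.lastActiveAt === null) return true; // never-run personas are always due
  const HOUR = 3_600_000;
  const ageMs = now.getTime() - p.lastActiveAt.getTime();
  if (p.tickCadence === 'hourly') return ageMs > HOUR;
  if (p.tickCadence === 'weekdays') {
    const day = now.getDay(); // 0 = Sunday … 6 = Saturday
    if (day === 0 || day === 6) return false; // assumption: never due on weekends
    return ageMs > 24 * HOUR;
  }
  return ageMs > 24 * HOUR; // daily
}
```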
Runner tick pipeline (services/mana-persona-runner/src/runner/)
- claude-session.ts
Two phases per tick. runMainTurn feeds the persona's system
prompt + a German "simulate a day" user prompt to Claude Agent
SDK's query(), with mana-mcp wired in as a streamable-HTTP MCP
server. We iterate the returned AsyncGenerator and extract
tool_use blocks into ActionRows; a tool_result with
is_error=true flips the most recent action. runRatingTurn is a
fresh query() with tools:[] asking Claude in character to rate
each used module 1-5 as strict JSON. We parse with tolerance
for whitespace / fences. Unparseable output becomes a synthetic
'__parse' feedback row so operators see the failure.
- tick.ts
Orchestrator. Skips when config.paused. Fetches /due, processes
in batches of config.concurrency via Promise.allSettled so a
single persona failure never kills the batch. Returns
{due, ranSuccessfully, failed[], durationMs}.
- types.ts
ActionRow + FeedbackRow shapes shared between claude-session
and the internal client.
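The claude-session fold over SDK messages might look like this — a sketch with simplified block shapes, not the real Claude Agent SDK types:

```typescript
// Simplified content-block shapes; the real SDK message types are richer.
type Block =
  | { type: 'tool_use'; name: string; input: unknown }
  | { type: 'tool_result'; is_error?: boolean };

interface ActionRow {
  id: string; // `${tickId}-${i}-${toolName}` — deterministic so retries dedupe
  toolName: string;
  result: 'ok' | 'error';
}

// tool_use appends a provisional 'ok' row; an erroring tool_result flips
// the most recent row (coarse attribution — see the deferred-work notes).
function collectActions(tickId: string, blocks: Block[]): ActionRow[] {
  const rows: ActionRow[] = [];
  for (const block of blocks) {
    if (block.type === 'tool_use') {
      rows.push({
        id: `${tickId}-${rows.length}-${block.name}`,
        toolName: block.name,
        result: 'ok',
      });
    } else if (block.type === 'tool_result' && block.is_error) {
      const last = rows[rows.length - 1];
      if (last) last.result = 'error';
    }
  }
  return rows;
}
```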
Runner bootstrap (src/index.ts)
- setInterval(config.tickIntervalMs) starts the tick loop on boot.
tickInFlight guards against overlap when Claude latency >
interval. If MANA_SERVICE_KEY or ANTHROPIC_API_KEY is missing,
loop is disabled with a warn line — /health + /diag/login still
work.
- POST /diag/tick (dev-only) fires one tick on demand, returns
the result. Avoids waiting a full interval during testing.
- Graceful SIGTERM/SIGINT shutdown clears the interval.
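A minimal sketch of the overlap guard, with names assumed from the description above:

```typescript
// Overlap guard: if a tick (Claude calls included) outlasts the interval,
// the next timer firing becomes a no-op instead of a concurrent tick.
let tickInFlight = false;

async function guardedTick(runTick: () => Promise<void>): Promise<boolean> {
  if (tickInFlight) return false; // skipped — previous tick still running
  tickInFlight = true;
  try {
    await runTick();
    return true;
  } finally {
    tickInFlight = false;
  }
}
```

Wired up as `const timer = setInterval(() => void guardedTick(tick), config.tickIntervalMs)`, with `clearInterval(timer)` in the SIGTERM/SIGINT handlers.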
Client
- clients/mana-auth-internal.ts
X-Service-Key client for the three endpoints above.
Constructor throws on empty serviceKey — fail loud.
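A rough shape for that client — paths from the endpoint list above, fetch-based since the runner runs on Bun; error handling here is a sketch, not the real implementation:

```typescript
// Hypothetical shape of the X-Service-Key client for the internal endpoints.
class ManaAuthInternalClient {
  constructor(private baseUrl: string, private serviceKey: string) {
    if (!serviceKey) throw new Error('empty service key — refusing to construct'); // fail loud
  }

  private headers(): Record<string, string> {
    return { 'X-Service-Key': this.serviceKey, 'Content-Type': 'application/json' };
  }

  async listDuePersonas(): Promise<unknown[]> {
    const res = await fetch(`${this.baseUrl}/api/v1/internal/personas/due`, {
      headers: this.headers(),
    });
    if (!res.ok) throw new Error(`due: ${res.status}`);
    return res.json();
  }

  async postActions(personaId: string, rows: unknown[]): Promise<void> {
    const res = await fetch(`${this.baseUrl}/api/v1/internal/personas/${personaId}/actions`, {
      method: 'POST',
      headers: this.headers(),
      body: JSON.stringify(rows),
    });
    if (!res.ok) throw new Error(`actions: ${res.status}`);
  }
}
```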
Boot smoke verified: /health returns ok, /diag/tick 500s with
descriptive messages when keys absent. Warning lines on boot when
keys are missing. Type-check green across mana-auth, tool-registry,
mcp, persona-runner.
M3 exit gate is the end-to-end smoke recipe (docker up → db:push →
seed:personas → diag/tick → psql) documented in
services/mana-persona-runner/CLAUDE.md.
M2.d (cross-space family/team memberships) still deferred.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mana-persona-runner
Tick-loop service that drives the M2 personas through the app via Claude + the mana-mcp gateway. Test infrastructure — not a user-facing service, not deployed to prod until the runner has proven itself in staging.
Plan: docs/plans/mana-mcp-and-personas.md (M3)
Tech Stack
| Layer | Technology |
|---|---|
| Runtime | Bun |
| Framework | Hono |
| AI | @anthropic-ai/claude-agent-sdk (native MCP tool-loop) |
| Tools | mana-mcp (:3069) — Streamable HTTP, per-persona JWT |
| Upstream | mana-auth (:3001) for login + spaces + action/feedback persistence |
Port: 3070
What it does (when the tick loop lands — M3.b)
Every TICK_INTERVAL_MS:
- Query `auth.personas` for rows whose `tickCadence` + `lastActiveAt` make them due.
- Limit to `PERSONA_CONCURRENCY` personas in parallel.
- For each due persona:
  - Login: `POST /api/v1/auth/login` with a deterministic HMAC-derived password (same algorithm as `scripts/personas/password.ts`).
  - Resolve space: `GET /api/auth/organization/list`, pick the first `personal` space.
  - Claude call: `@anthropic-ai/claude-agent-sdk` with `persona.systemPrompt`, MCP server wired to :3069, `X-Mana-Space` pinned to the persona's personal space.
  - Self-reflection: after the tool loop settles, ask Claude in-character to rate each module used (1–5 + note).
  - Persist: `POST /api/v1/internal/personas/:id/actions` and `/feedback` on mana-auth (service-key auth).
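The deterministic login password can be sketched as an HMAC over the persona email — hypothetical inputs and encoding; the real derivation lives in `scripts/personas/password.ts`:

```typescript
import { createHmac } from 'node:crypto';

// Assumed mirror of the persona password derivation: HMAC-SHA256 of the
// persona email keyed by PERSONA_SEED_SECRET. Same inputs → same password,
// so seed script and runner agree without storing plaintext anywhere.
function derivePersonaPassword(email: string, seedSecret: string): string {
  return createHmac('sha256', seedSecret).update(email).digest('hex');
}
```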
Files
- `src/config.ts` — env-driven config + production-secret assertion
- `src/clients/auth.ts` — login + listSpaces, convenience `loginAndResolvePersonalSpace`
- `src/clients/mana-auth-internal.ts` — `X-Service-Key`-gated calls: `listDuePersonas`, `postActions`, `postFeedback`
- `src/password.ts` — HMAC derivation (mirror of `scripts/personas/password.ts`, see comment)
- `src/runner/claude-session.ts` — per-tick `runMainTurn` + `runRatingTurn` on top of `@anthropic-ai/claude-agent-sdk`
- `src/runner/tick.ts` — orchestrator: due → concurrency-limited fan-out → per-persona pipeline
- `src/runner/types.ts` — `ActionRow`/`FeedbackRow` shapes shared between runner modules
- `src/index.ts` — Hono app, `/health`, `/metrics`, dev-only `/diag/login` + `/diag/tick`
Tick pipeline (M3.b)
setInterval(config.tickIntervalMs)
│
▼
GET /api/v1/internal/personas/due (service-key)
│ due? hourly>1h, daily>24h, weekdays>24h mon-fri
▼
for each persona (max concurrency at once):
│
POST /api/v1/auth/login (persona JWT)
GET /api/auth/organization/list (personal space id)
│
▼
runMainTurn
query({ systemPrompt, mcpServers: { mana: {type:'http', url, headers} }, maxTurns })
for each SDKMessage:
tool_use block → push ActionRow (ok provisional)
tool_result err → flip last ActionRow to 'error'
module prefix → modulesUsed.add(module)
│
▼
runRatingTurn (same systemPrompt, fresh query, tools:[])
prompt: 'rate each of {modulesUsed} 1-5, respond JSON'
parse {ratings:[{module,rating,notes}]} → FeedbackRow[]
invalid JSON → one synthetic rating row '__parse' as marker
│
▼
POST /api/v1/internal/personas/:id/actions (idempotent, batch ≤500)
POST /api/v1/internal/personas/:id/feedback (idempotent, batch ≤100)
│
▼
mana-auth writes rows + bumps personas.last_active_at
The outer tick runs each persona through Promise.allSettled, so one
failure never kills the batch. Per-persona exceptions become
failed: [{persona, error}] entries in the tick result and get logged.
tickInFlight guards against overlap when Claude latency exceeds the
interval.
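The tolerant rating parse plus the `__parse` fallback could look like this sketch — the fence-stripping regex and the synthetic row's rating value are assumptions:

```typescript
interface FeedbackRow {
  module: string;
  rating: number;
  notes: string;
}

// Tolerant parse of the rating turn: trim whitespace, strip a markdown
// code fence if Claude wrapped the JSON in one, then JSON.parse.
// Unparseable output collapses into a single synthetic '__parse' row so
// the failure is visible in the audit tables instead of vanishing.
function parseRatings(raw: string): FeedbackRow[] {
  const stripped = raw
    .trim()
    .replace(/^```(?:json)?\s*/i, '')
    .replace(/\s*```$/, '');
  try {
    const parsed = JSON.parse(stripped) as { ratings?: FeedbackRow[] };
    if (!Array.isArray(parsed.ratings)) throw new Error('missing ratings[]');
    return parsed.ratings;
  } catch (err) {
    // rating value for the marker row is an assumption
    return [{ module: '__parse', rating: 1, notes: `unparseable rating output: ${String(err)}` }];
  }
}
```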
What's NOT in M3.b (deferred)
- Precise `tool_use_id` ↔ `tool_result` pairing. Today the last action gets flipped to `error` when a `tool_result` carries `is_error: true`. Good enough for the audit dashboard; exact attribution lands when the dashboard needs it.
- Retries/back-off on Claude 429/5xx. The SDK has some built-in; we do no extra handling. A burst of 429s can fail a batch — next tick picks them up anyway.
- Prometheus counters. Health + log lines today, counters when the dashboard lands in M6.
Environment Variables
PORT=3070
MANA_AUTH_URL=http://localhost:3001
MANA_MCP_URL=http://localhost:3069
# Service-to-service auth for action/feedback persistence (M3.c).
MANA_SERVICE_KEY=...
# Claude API key the runner uses to drive each persona's turn.
ANTHROPIC_API_KEY=...
# Must match whatever the seed script used when the personas were created.
# In production: rotate together with the seed script's env.
PERSONA_SEED_SECRET=...
# Tick loop (M3.b).
TICK_INTERVAL_MS=60000
PERSONA_CONCURRENCY=2
# Operational kill-switch. When true, the service stays up (health-ok)
# but no ticks fire. Useful during demos or when debugging a persona.
RUNNER_PAUSED=false
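A sketch of how `src/config.ts` might read these — hypothetical; the exact production-secret assertion condition is assumed:

```typescript
// Env-driven config with defaults matching the table above. In production
// the secrets must be present; in dev the service boots warning-only.
function loadConfig(env: Record<string, string | undefined>) {
  const config = {
    port: Number(env.PORT ?? 3070),
    tickIntervalMs: Number(env.TICK_INTERVAL_MS ?? 60_000),
    concurrency: Number(env.PERSONA_CONCURRENCY ?? 2),
    paused: env.RUNNER_PAUSED === 'true', // kill-switch: service up, no ticks
    serviceKey: env.MANA_SERVICE_KEY ?? '',
    anthropicKey: env.ANTHROPIC_API_KEY ?? '',
  };
  if (env.NODE_ENV === 'production' && (!config.serviceKey || !config.anthropicKey)) {
    throw new Error('MANA_SERVICE_KEY / ANTHROPIC_API_KEY required in production');
  }
  return config;
}
```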
End-to-end smoke (M3 exit gate)
Proves: personas exist, runner picks them up, Claude drives tools via MCP, actions + ratings land in Postgres.
# 1. Stack
pnpm docker:up
cd services/mana-auth && bun run db:push # applies users.kind + auth.personas* tables
pnpm dev:auth # mana-auth on 3001
pnpm dev:sync # mana-sync on 3050
pnpm --filter @mana/mcp-service dev # mana-mcp on 3069
pnpm --filter @mana/persona-runner dev # this service on 3070
# (boots warning-only if MANA_SERVICE_KEY or ANTHROPIC_API_KEY missing)
# 2. Seed the 10 catalog personas
export MANA_ADMIN_JWT=… # admin-tier JWT
export PERSONA_SEED_SECRET=… # any value; must match runner
pnpm seed:personas
# 3. Verify login works
curl -s "localhost:3070/diag/login?email=persona.anna@mana.test" | jq
# → { ok: true, userId: "…", spaceId: "…" }
# 4. Fire a tick manually (dev-only endpoint, avoids waiting the full interval)
export MANA_SERVICE_KEY=…
export ANTHROPIC_API_KEY=…
curl -s -X POST localhost:3070/diag/tick | jq
# → { ok: true, result: { due: 10, ranSuccessfully: N, failed: [], durationMs: … } }
# 5. Inspect what landed
psql -c "SELECT persona_id, tool_name, result FROM auth.persona_actions ORDER BY created_at DESC LIMIT 20;"
psql -c "SELECT persona_id, module, rating, notes FROM auth.persona_feedback ORDER BY created_at DESC LIMIT 20;"
A green run through step 5 is the M3 exit criterion.
Why a separate service (not part of mana-ai)
- Lifecycle: persona-runner is test infra. Starts and stops with a demo, can be paused without downtime noise. mana-ai is a production worker for real user missions — different risk profile.
- Observability: mixing them means "is this tick Anna running the suite or a real user running their mission?" becomes a log-filter problem. Separate services give you separate Prometheus scrapes.
- Tool source: mana-ai today uses an internal tool catalog; persona-runner uses MCP. When M4 unifies both onto
@mana/tool-registry, the split still makes sense as two consumers of the same tool surface.