managarten

till/managarten

Fork 0

mirror of https://github.com/Memo-2023/mana-monorepo.git synced 2026-05-14 20:01:09 +02:00

Commit graph

Author SHA1 Message Date

Author	SHA1	Message	Date
Till JS	eb8fac23ec	fix(personas): exact tool_use_id pairing + CI drift audit Two loose ends from M3/M4: 1. Tool_use_id-based error attribution in the persona-runner ----------------------------------------------------------- The previous collectActionsFromMessage() flipped the most recent ActionRow to 'error' when a tool_result carried is_error:true. That was fine as long as Claude invoked tools strictly in sequence, but when the planner pipelines multiple tools in one turn, a later tool_result carries an earlier tool_use_id — the last-action fallback mis- attributes the error. runMainTurn() now keeps a tool_use_id → action-index Map for the duration of the tick. On tool_use we stash block.id, on tool_result we look up the exact ActionRow via tool_use_id and flip that one. The "flip last" path survives as a pure fallback if a future SDK ever ships a block without an id. 2. New audit:encrypted-tools script ----------------------------------- scripts/audit-encrypted-tools.ts — loads registerAllModules() and apps/mana/…/crypto/registry.ts, diffs every ToolSpec.encryptedFields against the authoritative web-app ENCRYPTION_REGISTRY. Catches three classes of drift: - missing-table : tool declares a table the web-app doesn't encrypt - field-drift : both agree a table is encrypted but the field lists differ (half-encryption in the wire is silent death) - disabled : web-app has enabled:false while the tool still encrypts — advisory warning, not a fail Negative-tested by injecting a deliberate drift on todo.create + todo.list (shortened ENCRYPTED_FIELDS to ['title']); the auditor flagged both tools with full field diffs, restore returned to green. Wired into `pnpm run validate:all` so the contract survives future edits on either side. Fills the M4 audit gap noted in project_mana_mcp_personas.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 15:34:52 +02:00
Till JS	f07eae3c01	feat(personas): M3.b-d — tick loop + Claude Agent SDK + persistence (real) Previous commit `38dc80654` carries this M3 title but its payload is an unrelated apps/api/picture change — shared-.git-index race with a parallel session (see feedback_git_workflow.md). This commit holds the actual M3.b/c/d code. Leaving the misnamed commit for the user to re-attribute / revert as they prefer. Closes the M3 loop from docs/plans/mana-mcp-and-personas.md. The runner picks up due personas, drives each through Claude + MCP for one simulated turn, collects actions + ratings, persists through service-key internal endpoints in mana-auth. Internal endpoints (mana-auth, service-key-gated) - GET /api/v1/internal/personas/due Returns personas whose tickCadence + lastActiveAt say they're due. Rules: hourly > 1h, daily > 24h, weekdays > 24h mon-fri. NULLS FIRST so never-run personas go ahead of stale ones. - POST /api/v1/internal/personas/:id/actions Batch ≤ 500. Row ids are deterministic `${tickId}-${i}-${toolName}` + ON CONFLICT DO NOTHING so the runner can retry a tick without doubling audit rows. Also bumps personas.last_active_at so the next /due call sees it. - POST /api/v1/internal/personas/:id/feedback Batch ≤ 100. Row id is `${tickId}-${module}` — natural key is one rating per module per tick. Runner tick pipeline (services/mana-persona-runner/src/runner/) - claude-session.ts Two phases per tick. runMainTurn feeds the persona's system prompt + a German "simulate a day" user prompt to Claude Agent SDK's query(), with mana-mcp wired in as a streamable-HTTP MCP server. We iterate the returned AsyncGenerator and extract tool_use blocks into ActionRows; a tool_result with is_error=true flips the most recent action. runRatingTurn is a fresh query() with tools:[] asking Claude in character to rate each used module 1-5 as strict JSON. We parse with tolerance for whitespace / fences. Unparseable output becomes a synthetic '__parse' feedback row so operators see the failure. - tick.ts Orchestrator. Skips when config.paused. Fetches /due, processes in batches of config.concurrency via Promise.allSettled so a single persona failure never kills the batch. Returns {due, ranSuccessfully, failed[], durationMs}. - types.ts ActionRow + FeedbackRow shapes shared between claude-session and the internal client. Runner bootstrap (src/index.ts) - setInterval(config.tickIntervalMs) starts the tick loop on boot. tickInFlight guards against overlap when Claude latency > interval. If MANA_SERVICE_KEY or ANTHROPIC_API_KEY is missing, loop is disabled with a warn line — /health + /diag/login still work. - POST /diag/tick (dev-only) fires one tick on demand, returns the result. Avoids waiting a full interval during testing. - Graceful SIGTERM/SIGINT shutdown clears the interval. Client - clients/mana-auth-internal.ts X-Service-Key client for the three endpoints above. Constructor throws on empty serviceKey — fail loud. Boot smoke verified: /health returns ok, /diag/tick 500s with descriptive messages when keys absent. Warning lines on boot when keys are missing. Type-check green across mana-auth, tool-registry, mcp, persona-runner. M3 exit gate is the end-to-end smoke recipe (docker up → db:push → seed:personas → diag/tick → psql) documented in services/mana-persona-runner/CLAUDE.md. M2.d (cross-space family/team memberships) still deferred. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 14:18:31 +02:00
Till JS	a1caeaa7f3	feat(personas): M3.a — scaffold mana-persona-runner service on :3070 First concrete piece of M3 (docs/plans/mana-mcp-and-personas.md). The tick loop itself and the Claude Agent SDK + MCP integration are M3.b; the action/feedback persistence endpoints are M3.c. This commit just stands up the service so the remaining pieces have a shell to land in. Service shape (Bun/Hono on :3070) - src/config.ts Env-driven configuration: auth URL, MCP URL, service key for action/feedback callbacks (M3.c), Anthropic API key, deterministic PERSONA_SEED_SECRET (must match scripts/personas/password.ts so the runner can log back in without any stored credentials), tick interval and concurrency, RUNNER_PAUSED kill-switch. Production start asserts all secrets are set and the dev fallback secret is rotated. - src/password.ts Bit-for-bit identical HMAC-SHA256 password derivation to scripts/personas/password.ts. Duplicated deliberately: the two sides can't share code (one is a repo-root utility script, the other is a workspace service) but must stay in sync — comment at the top calls this out. - src/clients/auth.ts Two upstream calls the runner needs for one tick: POST /auth/login and GET /api/auth/organization/list. loginAndResolvePersonalSpace() wraps both and picks the persona's auto-created personal space as the write target (throws if none exists — Spaces-Foundation should always have seeded one on signup). - src/index.ts Hono app: /health, /metrics (stub), and a dev-only /diag/login endpoint that takes a persona email, derives the password, logs in, resolves the personal space, and returns {userId, spaceId} as an end-to-end sanity check. Disabled in production. No tick loop yet — RUNNER_PAUSED prints an info line on boot, but nothing fires. The dispatcher + Claude Agent SDK + MCP client land in M3.b; the internal POST callbacks into mana-auth for persona_actions / persona_feedback land in M3.c. Infra - Port 3070 added to docs/PORT_SCHEMA.md. - Service listed in root CLAUDE.md next to mana-mcp. - services/mana-persona-runner/CLAUDE.md documents what's built today, what lands in M3.b/c, and the local diag smoke recipe. Boot smoke verified: /health returns ok + paused/interval/concurrency, /diag/login without email returns 400. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 14:00:43 +02:00

Till JS

eb8fac23ec

fix(personas): exact tool_use_id pairing + CI drift audit

Two loose ends from M3/M4:

1. Tool_use_id-based error attribution in the persona-runner
-----------------------------------------------------------
The previous collectActionsFromMessage() flipped the *most recent*
ActionRow to 'error' when a tool_result carried is_error:true. That was
fine as long as Claude invoked tools strictly in sequence, but when
the planner pipelines multiple tools in one turn, a later tool_result
carries an earlier tool_use_id — the last-action fallback mis-
attributes the error.

runMainTurn() now keeps a tool_use_id → action-index Map for the
duration of the tick. On tool_use we stash block.id, on tool_result we
look up the exact ActionRow via tool_use_id and flip that one. The
"flip last" path survives as a pure fallback if a future SDK ever
ships a block without an id.

2. New audit:encrypted-tools script
-----------------------------------
scripts/audit-encrypted-tools.ts — loads registerAllModules() and
apps/mana/…/crypto/registry.ts, diffs every ToolSpec.encryptedFields
against the authoritative web-app ENCRYPTION_REGISTRY.

Catches three classes of drift:
- missing-table : tool declares a table the web-app doesn't encrypt
- field-drift   : both agree a table is encrypted but the field lists
                  differ (half-encryption in the wire is silent death)
- disabled      : web-app has enabled:false while the tool still
                  encrypts — advisory warning, not a fail

Negative-tested by injecting a deliberate drift on todo.create +
todo.list (shortened ENCRYPTED_FIELDS to ['title']); the auditor
flagged both tools with full field diffs, restore returned to green.

Wired into `pnpm run validate:all` so the contract survives future
edits on either side. Fills the M4 audit gap noted in
project_mana_mcp_personas.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-23 15:34:52 +02:00

Till JS

f07eae3c01

feat(personas): M3.b-d — tick loop + Claude Agent SDK + persistence (real)

Previous commit 38dc80654 carries this M3 title but its payload is an
unrelated apps/api/picture change — shared-.git-index race with a
parallel session (see feedback_git_workflow.md). This commit holds the
actual M3.b/c/d code. Leaving the misnamed commit for the user to
re-attribute / revert as they prefer.

Closes the M3 loop from docs/plans/mana-mcp-and-personas.md. The
runner picks up due personas, drives each through Claude + MCP for
one simulated turn, collects actions + ratings, persists through
service-key internal endpoints in mana-auth.

Internal endpoints (mana-auth, service-key-gated)

- GET  /api/v1/internal/personas/due
    Returns personas whose tickCadence + lastActiveAt say they're
    due. Rules: hourly > 1h, daily > 24h, weekdays > 24h mon-fri.
    NULLS FIRST so never-run personas go ahead of stale ones.

- POST /api/v1/internal/personas/:id/actions
    Batch ≤ 500. Row ids are deterministic
    `${tickId}-${i}-${toolName}` + ON CONFLICT DO NOTHING so the
    runner can retry a tick without doubling audit rows. Also
    bumps personas.last_active_at so the next /due call sees it.

- POST /api/v1/internal/personas/:id/feedback
    Batch ≤ 100. Row id is `${tickId}-${module}` — natural key is
    one rating per module per tick.

Runner tick pipeline (services/mana-persona-runner/src/runner/)

- claude-session.ts
    Two phases per tick. runMainTurn feeds the persona's system
    prompt + a German "simulate a day" user prompt to Claude Agent
    SDK's query(), with mana-mcp wired in as a streamable-HTTP MCP
    server. We iterate the returned AsyncGenerator and extract
    tool_use blocks into ActionRows; a tool_result with
    is_error=true flips the most recent action. runRatingTurn is a
    fresh query() with tools:[] asking Claude in character to rate
    each used module 1-5 as strict JSON. We parse with tolerance
    for whitespace / fences. Unparseable output becomes a synthetic
    '__parse' feedback row so operators see the failure.

- tick.ts
    Orchestrator. Skips when config.paused. Fetches /due, processes
    in batches of config.concurrency via Promise.allSettled so a
    single persona failure never kills the batch. Returns
    {due, ranSuccessfully, failed[], durationMs}.

- types.ts
    ActionRow + FeedbackRow shapes shared between claude-session
    and the internal client.

Runner bootstrap (src/index.ts)

- setInterval(config.tickIntervalMs) starts the tick loop on boot.
  tickInFlight guards against overlap when Claude latency >
  interval. If MANA_SERVICE_KEY or ANTHROPIC_API_KEY is missing,
  loop is disabled with a warn line — /health + /diag/login still
  work.
- POST /diag/tick (dev-only) fires one tick on demand, returns
  the result. Avoids waiting a full interval during testing.
- Graceful SIGTERM/SIGINT shutdown clears the interval.

Client

- clients/mana-auth-internal.ts
    X-Service-Key client for the three endpoints above.
    Constructor throws on empty serviceKey — fail loud.

Boot smoke verified: /health returns ok, /diag/tick 500s with
descriptive messages when keys absent. Warning lines on boot when
keys are missing. Type-check green across mana-auth, tool-registry,
mcp, persona-runner.

M3 exit gate is the end-to-end smoke recipe (docker up → db:push →
seed:personas → diag/tick → psql) documented in
services/mana-persona-runner/CLAUDE.md.

M2.d (cross-space family/team memberships) still deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-23 14:18:31 +02:00

Till JS

a1caeaa7f3

feat(personas): M3.a — scaffold mana-persona-runner service on :3070

First concrete piece of M3 (docs/plans/mana-mcp-and-personas.md). The
tick loop itself and the Claude Agent SDK + MCP integration are M3.b;
the action/feedback persistence endpoints are M3.c. This commit just
stands up the service so the remaining pieces have a shell to land in.

Service shape (Bun/Hono on :3070)

- src/config.ts
    Env-driven configuration: auth URL, MCP URL, service key for
    action/feedback callbacks (M3.c), Anthropic API key, deterministic
    PERSONA_SEED_SECRET (must match scripts/personas/password.ts so the
    runner can log back in without any stored credentials), tick
    interval and concurrency, RUNNER_PAUSED kill-switch. Production
    start asserts all secrets are set and the dev fallback secret is
    rotated.

- src/password.ts
    Bit-for-bit identical HMAC-SHA256 password derivation to
    scripts/personas/password.ts. Duplicated deliberately: the two
    sides can't share code (one is a repo-root utility script, the
    other is a workspace service) but must stay in sync — comment
    at the top calls this out.

- src/clients/auth.ts
    Two upstream calls the runner needs for one tick: POST /auth/login
    and GET /api/auth/organization/list. loginAndResolvePersonalSpace()
    wraps both and picks the persona's auto-created personal space as
    the write target (throws if none exists — Spaces-Foundation should
    always have seeded one on signup).

- src/index.ts
    Hono app: /health, /metrics (stub), and a dev-only /diag/login
    endpoint that takes a persona email, derives the password, logs
    in, resolves the personal space, and returns {userId, spaceId} as
    an end-to-end sanity check. Disabled in production.

No tick loop yet — RUNNER_PAUSED prints an info line on boot, but
nothing fires. The dispatcher + Claude Agent SDK + MCP client land in
M3.b; the internal POST callbacks into mana-auth for persona_actions /
persona_feedback land in M3.c.

Infra

- Port 3070 added to docs/PORT_SCHEMA.md.
- Service listed in root CLAUDE.md next to mana-mcp.
- services/mana-persona-runner/CLAUDE.md documents what's built today,
  what lands in M3.b/c, and the local diag smoke recipe.

Boot smoke verified: /health returns ok + paused/interval/concurrency,
/diag/login without email returns 400.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-23 14:00:43 +02:00

3 commits