Commit graph

15 commits

Author SHA1 Message Date
Till JS
72f7978ed4 feat(agent-loop): expose compactionsDone + compactedReminder producer
Closes the loop on M2: when the compactor fires, the LLM needs to know
it's now seeing a <compact-summary> instead of raw turns so it
doesn't waste a turn asking about lost details or re-executing tools
whose responses are gone.

shared-ai:
  - LoopState grows `compactionsDone: number` (cap-1 by current loop
    policy, but shape kept as count for future multi-compact cycles).
  - runPlannerLoop populates it on each reminder-channel call. New
    loop test asserts [0, 1] sequence: round 1 before compaction,
    round 2 after.

mana-ai:
  - New producer `compactedReminder` — fires severity=info when
    compactionsDone >= 1, wrapped in a German one-liner ("frag nicht
    nach verlorenen Details").
  - Injected FIRST in buildReminderChannel so the LLM frames the rest
    of the round with "I'm looking at a summary" context. Metric
    surface stays `{producer='compacted', severity='info'}`.

4 new reminder tests (3 pure producer + 1 composition-ordering) +
1 loop-wiring test. 77 shared-ai, 20 reminders.test.ts — green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:36:21 +02:00
Till JS
3d8214a147 feat(shared-ai): wire compactor into runPlannerLoop (M2.2)
PlannerLoopInput grows an optional compactor:

  compactor?: {
    maxContextTokens: number;
    threshold?: number;        // default 0.92, matches Claude Code wU2
    compact: (messages) => Promise<{ messages, compactedTurns }>;
  }

Before each LLM call the loop checks whether promptTokens+completion
has crossed threshold × maxContextTokens. If yes AND we haven't
compacted this run yet, the callback runs, its returned messages
REPLACE the live history, and compactionsDone flips to 1 so a
runaway tool can't re-trigger.

Design choices:
  - Fires at most ONCE per loop run. If the fresh (compacted)
    history hits the threshold again in the same run, the LLM
    round budget will hit first; better to terminate than to
    recursively compact a summary.
  - No reminder emitted automatically — the caller can wire
    that via reminderChannel by reading compactionsDone from
    LoopState (next PR; compactionsDone isn't exposed yet to
    keep the state surface small).
  - compactor callback is injectable, not hardcoded to
    compactHistory() from compact.ts. Lets mana-ai route the
    compactor LLM call to a cheaper model (Haiku) without
    changing the loop.
  - Zero maxContextTokens → skip silently (same contract as
    shouldCompact()).

Also cleaned up the isParallelSafe non-null-assertion warning by
hoisting the predicate to a local with proper narrowing.

5 new loop tests: below-threshold no-op, single-fire replacement,
once-per-run idempotency, zero-cap bail, no-op when compactor
returns 0 turns. 76 shared-ai tests total, green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:25:35 +02:00
Till JS
13361eb083 feat(shared-ai): compactHistory() — context-window compactor primitive (M2.1)
The Claude-Code wU2 pattern: when token usage hits ~92% of the provider's
context budget, fold all pre-tail turns into a single structured summary
(Goal / Decisions / Tools Called / Current Progress) so subsequent
rounds see a synopsis instead of the raw log.

This commit ships ONLY the primitive. Wiring it into runPlannerLoop
(auto-trigger before the next LLM call when shouldCompact() fires)
is M2.2 so the surface stays small and testable.

New exports from @mana/shared-ai:

  - shouldCompact(totalTokens, maxContextTokens, threshold?)
      → boolean; DEFAULT_COMPACT_THRESHOLD = 0.92, matching Claude Code.
      Bails safely when maxContextTokens is missing (local models often
      don't report usage).

  - compactHistory(messages, { llm, model, keepRecent?, temperature? })
      → { messages, summary, compactedTurns, usage? }
      Preserves: [0]=system, [1]=first user, [last N]=recent turns
      (default 4). Everything between gets sent through the compact
      agent with COMPACT_SYSTEM_PROMPT — a fixed 4-section Markdown
      schema. Temperature default 0.2 because we want summarisation,
      not creativity.

  - parseCompactSummary / renderCompactSummary — round-trip helpers.
      Parser is tolerant (missing sections → empty string) so a partial
      compaction still produces a usable summary.

The summary replaces the middle as a single role='assistant' message
wrapped in <compact-summary> tags. Assistant role (not system) because
some providers reject arbitrary system messages deep in history.

Tests: 17 new across the 4 exports (trigger logic, Markdown round-trip,
structural preservation of anchors + tail, usage passthrough, custom
keepRecent). All 71 shared-ai tests green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:21:10 +02:00
Till JS
8f283726b1 feat(agent-loop): activate retryLoopReminder via LoopState.recentCalls
Extends LoopState with a sliding window of the last N ExecutedCalls
(oldest-first), capped at LOOP_STATE_RECENT_CALLS_WINDOW = 5. The loop
maintains the window automatically; reminderChannel producers read it
without touching internal state.

This activates retryLoopReminder which was shape-only in faa472be9.
The guard now fires end-to-end: when round >= 3 and the tail-2 calls
both returned success:false, the LLM sees a "stop retrying, write a
summary instead" <reminder> on the next turn. The tail-2 check rather
than window-wide is deliberate — a flaky run with intermittent success
(F, F, F, OK, F) is not a retry loop, just flaky tools.

Why window=5: retry loops usually manifest within 2-3 consecutive
rounds; a 5-deep window gives room for burst-detection and
stale-tool heuristics without bloating the reminder channel. Cap
keeps the reminder producers O(5) regardless of loop length.

Tests: 3 new (sliding-window cap + slide + order in shared-ai, retry
composition + budget+retry chain + tail-only heuristic in mana-ai).
Total agent-loop tests now 74 across both packages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:02:40 +02:00
Till JS
e5d230e599 feat(agent-loop): M1 — policy gate + reminder channel + parallel reads
Three Claude-Code-inspired primitives for runPlannerLoop, derived from the
reverse-engineering reports in docs/reports/:

1. **Policy gate** (@mana/tool-registry) — evaluatePolicy() gates every tool
   dispatch: denies admin-scope, denies destructive tools not in the user's
   opt-in list, rate-limits per tool (30/60s default), flags prompt-injection
   markers in freetext without blocking. Wired into mana-mcp with a
   per-user rolling invocation log and POLICY_MODE env (off|log-only|enforce,
   default log-only). mana-ai uses detectInjectionMarker only — tool dispatch
   there is plan-only, so rate-limit/destructive checks don't apply yet.

2. **Reminder channel** (packages/shared-ai/src/planner/loop.ts) — new
   reminderChannel callback in PlannerLoopInput. Called once per round with
   LoopState snapshot (round, toolCallCount, usage, lastCall); returned
   strings wrap in <reminder> tags and inject as transient system messages
   into THIS LLM request only. Never pushed to messages[] — the Claude-Code
   <system-reminder> pattern that keeps the KV-cache prefix stable.

3. **Parallel reads** (loop.ts) — isParallelSafe predicate enables
   Promise.all dispatch when every tool_call in a round is parallel-safe,
   in batches of PARALLEL_TOOL_BATCH_SIZE=10. Any non-safe call downgrades
   the whole round to sequential. messages[] always appends in source
   order, never completion order, so the debug log stays linear.
   Default-off (undefined predicate) preserves pre-M1 behaviour.

Tests: 21 new in tool-registry (policy), 9 new in shared-ai (5 parallel,
4 reminder). All 74 green, type-check clean across 4 packages.

Design/plan: docs/plans/agent-loop-improvements-m1.md
Reports: docs/reports/claude-code-architecture.md,
         docs/reports/mana-agent-improvements-from-claude-code.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 13:56:40 +02:00
Till JS
0d613e1846 feat(ai): thread TokenUsage through runPlannerLoop → mana-ai budget
Carries per-round token counts from the mana-llm response body
(prompt_tokens + completion_tokens) back through LlmCompletionResponse
→ PlannerLoopResult. The loop sums across rounds and exposes a single
aggregate on result.usage.

Lets mana-ai's tick re-activate per-agent daily-token budget tracking
— tokensUsed was stubbed to 0 in the migration commit (6) because the
loop didn't surface usage yet. Now recordTokenUsage + agentTokenUsage24h
get real numbers again, and the mana_ai_tokens_used_total Prometheus
counter is accurate.

Additive only: consumers without usage needs ignore the new field,
and providers that don't return usage produce zeros (not undefined —
the loop still exposes the object so downstream branches stay trivial).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 18:21:34 +02:00
Till JS
5b7564b3a4 test(ai): promote MockLlmClient to a shared @mana/shared-ai export
The runPlannerLoop test file and the webapp's mission-runner test each
had their own inline scripted LLM mock — same interface, diverged
slightly. Consolidates into packages/shared-ai/src/planner/mock-llm.ts
and re-exports from the package root so any consumer can drive the
loop deterministically.

Both existing test files now use the shared client. 5 + 3 tests pass,
44 total in shared-ai still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 18:05:46 +02:00
Till JS
9f7d2f24b3 feat(companion): chat on runPlannerLoop with native function calling
The companion chat had its own ad-hoc 3-round tool-calling pipeline:
build a system prompt with tool descriptions, ask the LLM to emit
```tool JSON blocks, regex-extract, execute, feed back the result as
a synthetic user message. Same fragility class as the old text-JSON
planner — and now unnecessary since mana-llm speaks native function
calling.

Migrates companion/engine.ts to the shared runPlannerLoop, same as
the mission runner (commit 5a) and the server tick (commit 6). Tools
go to the LLM as proper function-schemas; tool_calls come back
structured; the executor runs them directly under USER_ACTOR.

Extends shared-ai/planner/loop.ts with an optional priorMessages[]
input field so the chat can preserve multi-turn history between
turns (missions don't need this and leave it empty).

Deletes the old llm-tasks/companion-chat.ts LlmTask wrapper. Nothing
else imported it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 16:45:33 +02:00
Till JS
0077752456 fix(type-check): clear the last five failures — monorepo type-check is now 76/76 green
After the mobile-app deletion unblocked \`@context/mobile\`, five more
pre-existing failures surfaced across shared packages and two services.
All were silent-masked by the postinstall \`|| true\` for months.

- **shared-ai**: \`planner/loop.ts\` imported \`ToolSchema\` from
  \`../tools/function-schema\`, which only imports (not re-exports) the
  type. Fixed to import from the source (\`../tools/schemas\`).
- **shared-logger**: \`typeof window !== 'undefined'\` blows up under
  tsconfigs that don't include the DOM lib (e.g. uload-server's
  \`bun-types\`-only config), because shared-logger is consumed via
  source import. Replaced with a \`globalThis\`-indirected check that
  compiles under any lib configuration.
- **shared-hono**: \`credits.ts\` returned \`res.json()\` directly as
  \`Promise<T | null>\`. Modern \`@types/node\` / undici types return
  \`unknown\` strictly — cast to \`T\` at the boundary so the generic
  contract is explicit.
- **uload-server**: \`routes/analytics.ts\` + \`routes/email.ts\` still
  imported \`AuthUser\` from a \`middleware/jwt-auth\` module that was
  deleted during the migration to \`@mana/shared-hono\`. Replaced with
  \`AuthVariables\` from shared-hono, which matches the actual context
  shape set by \`authMiddleware()\`.
- **manavoxel/web**: \`guestSeed\` collection entries were wrapped in
  arrow functions, but \`local-store\` expects \`T[]\` directly and
  iterates \`seed.length\` — which on a function is 0. The "guest
  seed" was silently dead; eager-evaluating \`generateGuestWorld()\`
  once and sharing the result fixes both the type and the runtime.

Verified: \`pnpm run type-check\` from the repo root now exits 0 —
76/76 tasks successful, no failures. First fully green state since
well before the postinstall \`|| true\` was introduced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 15:53:07 +02:00
Till JS
4daca8970b feat(shared-ai): runPlannerLoop + compact system prompt for function calling
Introduces the new planner pipeline both the webapp runner and the
mana-ai tick will swap onto in the next commits. Additive for now —
the legacy buildPlannerPrompt + parsePlannerResponse stay exported so
callers can migrate one at a time; they get removed once the last
consumer is gone.

- planner/loop.ts — runPlannerLoop orchestrates a multi-turn chat
  against a caller-supplied LlmClient. Tool-calls from the LLM are
  handed to an onToolCall callback and their results fed back as
  tool-messages. Parallel tool-calls in one turn execute sequentially
  to keep the message log linear for debugging. Stops on assistant
  stop, empty tool_calls, or a hard max-rounds ceiling (default 5).
- planner/system-prompt.ts — new buildSystemPrompt. ~40-line German
  system frame, no tool listing (the SDK-level tools field carries
  the schemas now), no JSON format example, no "please return JSON"
  plea. User frame renders mission + linked inputs + last 3
  iteration summaries, same as before.
- Five test cases covering the loop: immediate stop, single tool
  call with result feedback, parallel calls execute in order, tool
  failures propagate as tool-messages the LLM can react to, and
  maxRounds ceiling fires with the right stopReason.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 15:31:01 +02:00
Till JS
efc7641a60 chore(ai): P2 batch — prompt sync, perf, dedup, scope unification
Six P2 items from the AI Workbench audit:

#7 Prompt ↔ loop budget sync:
  System prompt now says "1 bis 5 Schritte pro Planungsrunde, bis zu 5
  Planungsrunden" — matches MAX_REASONING_LOOP_ITERATIONS. Cross-ref
  comment added to runner.ts.

#9 SceneHeader: useAgents() → useAgent(id):
  Only loads the single bound agent instead of the full agent list.
  Eliminates unnecessary Dexie churn on every scene header render.

#10 Unified scope filter:
  New scope-filter.ts with filterByScopeTagMap() (batch, sync) and
  filterByScopeAsync() (per-record). Both scope-context.ts (AI) and
  scene-scope.svelte.ts (UI) now import from the shared module —
  zero duplicated filter logic.

#11 Research dedup:
  Research input ID changed from `news-research-${Date.now()}` to
  `news-research-${mission.id}` — re-runs overwrite instead of
  appending duplicates.

#12 Kontext injection policy clarified:
  loadAgentKontextAsResolvedInput no longer falls back to the global
  singleton. Comment + code aligned: kontext injection is explicit
  (via input picker), not auto. Dead loadKontextAsResolvedInput
  kept for potential future opt-in auto-inject feature.

Audit doc updated with all items marked DONE.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 16:33:52 +02:00
Till JS
be81d11dc3 feat(ai): SSE streaming for foreground Mission Runner
Enable real-time token streaming during the planner "calling-llm" phase
so the user sees live progress ("empfange Plan… 128 tokens") instead of
a static spinner. The parser still receives the full text once complete —
no partial-JSON risk.

Changes:
- Extract shared SSE parser from playground into @mana/shared-llm/sse-parser
- remote.ts: use stream:true when onToken callback is provided
- AiPlanInput: add optional onToken field (shared-ai)
- ai-plan task: pass onToken through to backend.generate()
- runner.ts: throttled (500ms) phaseDetail updates during streaming
- Playground: refactored to use shared SSE parser

Also includes: AI agent architecture comparison report (docs/reports/)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 12:32:43 +02:00
Till JS
8a5d200c84 fix(ai): bump planner maxTokens 1024→4096 + teach prompt about the loop
Debug log from a "tag 4 notes" mission showed the planner's second-round
response truncated mid-step: it was proposing one add_tag_to_note per
listed note but ran out of tokens halfway through note #2. Parser
rejected the malformed JSON → loop exited with 0 staged, user saw
nothing to approve.

Raising maxTokens to 4096 fits ~15-20 step objects, which covers the
batch-tagging / batch-save pattern the reasoning loop is designed for.

Also updating the system prompt so the planner actually knows about
the loop it's running inside: read-only tools are announced as
auto-executing with outputs visible next turn, and a new rule makes
explicit that batch jobs must emit all write-steps in one plan (because
staging a propose-tool ends the turn). Step count raised 1-5 → 1-10.

Prompt snapshot tests still pass (they check structure, not text).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 00:55:18 +02:00
Till JS
d5c351d63e feat(ai): per-iteration debug log — capture prompt + response + inputs
New local-only Dexie table _aiDebugLog (v20, never synced) holds one
row per mission iteration with the full system+user prompt, raw LLM
response, latency, every ResolvedInput the planner saw, and pre-step
state (kontext-injected? web-research-ok-or-error?). Capped at 50
newest rows.

aiPlanTask always returns the captured prompt/response on AiPlanOutput.
debug; the runner persists it only when isAiDebugEnabled() — toggled
via a checkbox in the Mission detail header (defaults to on in DEV
builds, off in prod, override via localStorage 'mana.ai.debug').

New <AiDebugBlock> component renders below each iteration card:
expandable sections for Pre-Step, Resolved Inputs (each input
individually collapsible), System Prompt, User Prompt, Raw Response,
plus a "📋 JSON" copy-to-clipboard button for bug reports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 20:33:17 +02:00
Till JS
0d90b12d1c feat(shared-ai): extract planner + mission types to @mana/shared-ai
Single source of truth for AI Workbench types shared between the webapp
(Vite/SvelteKit) and the server-side mana-ai Bun service. Prevents the
two runtimes from drifting on prompt shape or mission structure.

- `@mana/shared-ai` package:
  - `actor.ts` — Actor union (user | ai | system) + helpers, mirrors the
    webapp's runtime type so server-side consumers parse incoming actors
    without re-declaring
  - `missions/types.ts` — Mission, MissionCadence, MissionInputRef,
    MissionIteration, PlanStep, MissionState. Adds optional
    `iteration.source: 'browser' | 'server'` to distinguish foreground
    vs server-produced iterations (groundwork for proposal write-back)
  - `planner/prompt.ts` — `buildPlannerPrompt` pure function
  - `planner/parser.ts` — `parsePlannerResponse` strict JSON validator
  - Vitest smoke tests (2) cover prompt → parse round-trip + unknown-
    tool rejection
- Webapp:
  - `missions/types.ts` re-exports from shared-ai, keeps webapp-local
    `MISSIONS_TABLE` constant + `planStepStatusFromProposal` bridge
  - `missions/planner/{types,prompt,parser}.ts` become re-export stubs
    so existing imports keep working unchanged
  - Existing webapp tests (60) continue to pass — the wire code didn't
    move, just its home

Next: mana-ai service imports buildPlannerPrompt/parsePlannerResponse
from shared-ai + wires mana-llm + writes iteration back as a
'source=server' row (tracked in services/mana-ai/CLAUDE.md).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:01:57 +02:00