Commit graph

19 commits

Author SHA1 Message Date
Till JS
101af462a8 feat(shared-ai): LLM-facing task tool wrapper for runSubAgent (M3.2)
Exposes runSubAgent() as a tool the planner LLM can call natively,
matching Claude Code's `Task` tool shape: { subagent_type, description,
prompt } -> single-string summary.

New exports from @mana/shared-ai:

  - TASK_TOOL_NAME = 'task'
  - TASK_TOOL_SCHEMA — ToolSchema ready to drop into a runPlannerLoop
    `tools` array. subagent_type enum = research|plan|general;
    description+prompt required; defaultPolicy: 'auto' (control-flow,
    not a user-data write).
  - createTaskToolHandler(opts) — factory returning:
      - handle(call): structured ToolResult with the sub-agent's
        summary as message + data {subAgentType, toolsCalled,
        rounds, stopReason, usage}
      - cumulativeUsage(): rolled-up TokenUsage across all sub-agent
        invocations — parent budget accounting reads from here
      - invocationCount(): metric-ready counter

Why not in mana-tool-registry: `task` is a loop-internal control-flow
primitive, not a user-data operation. Registry is for habits/notes/etc.
where MCP exposure and space-scoping matter. task never touches mana-
sync and never crosses the MCP boundary.

Recursion guard is defense-in-depth: the primitive throws
SubAgentRecursionError, this handler catches parentDepth >=
MAX_SUB_AGENT_DEPTH up front and returns a structured ToolResult
instead so the LLM sees it as regular tool-feedback.

Exceptions from the sub-agent (provider down, network) get wrapped
as `{ success: false, message: 'Sub-agent failed: ...' }`. The parent
loop's round continues.

14 new tests covering schema shape, recursion rejection, argument
validation (4 cases), happy path with tool dispatch, cumulative
usage tracking across multiple invocations, exception wrapping,
and parent-dispatcher routing.

107 shared-ai tests green total (was 93).

M3.3 consumer wiring follows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 19:05:09 +02:00
Till JS
66b7e08df2 feat(shared-ai): runSubAgent() primitive — Claude-Code I2A pattern (M3.1)
New packages/shared-ai/src/planner/sub-agent.ts implementing the
"one level deep, fresh messages, restricted tools, single-string
return" sub-agent contract from Claude Code's KN5/I2A launcher.

Four invariants enforced at the primitive level:

  1. FRESH messages[] — parent's history never leaks in. The sub-agent
     only sees its own system prompt + the task description. Hundreds
     of scanned files stay inside the sub-agent.
  2. RESTRICTED tool-whitelist — parent's full catalog is filtered
     per SubAgentType ('research' = auto-policy only, 'general' =
     everything, 'plan' = auto-policy + 3-round cap). Custom filter
     overrides the type default.
  3. SINGLE RETURN VALUE — sub-agent returns summary:string for
     the parent to render as task-tool-result. Individual tool calls
     stay in rawResult for debug capture but never cross the boundary.
  4. ONE LEVEL DEEP — MAX_SUB_AGENT_DEPTH = 1. parentDepth >= 1 throws
     SubAgentRecursionError; the consumer task-tool handler will
     also check, this is defense-in-depth.

Model is required (no default) — routing to a cheaper tier like the
compactor does is an explicit decision, not a sneaky default.

Belt-and-suspenders wrapper on onToolCall rejects any tool call
whose name isn't in the whitelist, even if the LLM fabricates one.

14 new tests covering recursion guard, tool filtering per type,
custom filter, whitelist rejection, fresh-messages isolation, usage
roll-up, default summary on max-rounds, type-specific system prompt,
system-prompt override, and end-to-end tool-call -> result -> summary.

93 shared-ai tests green total (was 79).

M3.2 (task tool in registry) and M3.3 (consumer wiring) follow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 18:59:05 +02:00
Till JS
f7536bc0b9 feat(shared-ai): route compactor to Haiku-tier model by default (M2.5)
compactHistory() now defaults to DEFAULT_COMPACT_MODEL =
'google/gemini-2.5-flash-lite' when the caller doesn't override. Lite
is ~3–5x cheaper than gemini-2.5-flash with near-identical
summarisation quality — summarisation doesn't need the same tier as
reasoning + tool-calling, and the compactor fires exactly when token
spend is highest, so the cheaper route saves exactly where it matters.

CompactHistoryOptions.model is now optional. All three consumers
(mana-ai tick, webapp Companion, webapp Mission runner) drop their
explicit gemini-2.5-flash override and let the default apply.

This is the pragmatic M2.5: no mana-llm changes. The "tier" abstraction
(X-Model-Tier header, env-routed aliases) from the Claude-Code report
makes sense only once multiple utility tasks need cheaper routing —
topic-detection, classification, command-injection checks. Today only
the compactor wants it, and a model constant is the simplest contract
that works.

2 new tests (default applied + override honoured). 79 shared-ai tests
green, all three consumers type-check clean. One pre-existing unrelated
type error in apps/mana/apps/web/src/lib/modules/wardrobe/queries.ts
(not touched by this commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 18:26:50 +02:00
Till JS
13361eb083 feat(shared-ai): compactHistory() — context-window compactor primitive (M2.1)
The Claude-Code wU2 pattern: when token usage hits ~92% of the provider's
context budget, fold all pre-tail turns into a single structured summary
(Goal / Decisions / Tools Called / Current Progress) so subsequent
rounds see a synopsis instead of the raw log.

This commit ships ONLY the primitive. Wiring it into runPlannerLoop
(auto-trigger before the next LLM call when shouldCompact() fires)
is M2.2 so the surface stays small and testable.

New exports from @mana/shared-ai:

  - shouldCompact(totalTokens, maxContextTokens, threshold?)
      → boolean; DEFAULT_COMPACT_THRESHOLD = 0.92, matching Claude Code.
      Bails safely when maxContextTokens is missing (local models often
      don't report usage).

  - compactHistory(messages, { llm, model, keepRecent?, temperature? })
      → { messages, summary, compactedTurns, usage? }
      Preserves: [0]=system, [1]=first user, [last N]=recent turns
      (default 4). Everything between gets sent through the compact
      agent with COMPACT_SYSTEM_PROMPT — a fixed 4-section Markdown
      schema. Temperature default 0.2 because we want summarisation,
      not creativity.

  - parseCompactSummary / renderCompactSummary — round-trip helpers.
      Parser is tolerant (missing sections → empty string) so a partial
      compaction still produces a usable summary.

The summary replaces the middle as a single role='assistant' message
wrapped in <compact-summary> tags. Assistant role (not system) because
some providers reject arbitrary system messages deep in history.

Tests: 17 new across the 4 exports (trigger logic, Markdown round-trip,
structural preservation of anchors + tail, usage passthrough, custom
keepRecent). All 71 shared-ai tests green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:21:10 +02:00
Till JS
faa472be91 feat(mana-ai): first live reminder producers — token budget + retry-loop
Wires the M1 reminderChannel into the mana-ai mission runner with two
initial producers in services/mana-ai/src/planner/reminders.ts:

- tokenBudgetReminder — warns at 75% of the agent's daily cap, emits a
  stronger "wrap up NOW" message at/above 100%. Uses pretick usage +
  accumulated round usage so the warning tracks drift during a long
  plan.
- retryLoopReminder — shape is in place (round≥3 + last 2 failures),
  currently limited to the single lastCall LoopState exposes. Extends
  cleanly once LoopState carries the full failure window.

buildReminderChannel composes active producers; the tick hoists
pretickUsage24h so the channel has the baseline. Each round the loop
re-evaluates the producers, so usage drift across rounds surfaces on
the NEXT turn.

Also exports LoopState + ReminderChannel from @mana/shared-ai top-level
so consumers don't need to reach into /planner.

Tests: 13 new bun tests covering thresholds, pretick+round summing,
composition, and per-round re-evaluation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 14:00:04 +02:00
Till JS
0d613e1846 feat(ai): thread TokenUsage through runPlannerLoop → mana-ai budget
Carries per-round token counts from the mana-llm response body
(prompt_tokens + completion_tokens) back through LlmCompletionResponse
→ PlannerLoopResult. The loop sums across rounds and exposes a single
aggregate on result.usage.

Lets mana-ai's tick re-activate per-agent daily-token budget tracking
— tokensUsed was stubbed to 0 in the migration commit (6) because the
loop didn't surface usage yet. Now recordTokenUsage + agentTokenUsage24h
get real numbers again, and the mana_ai_tokens_used_total Prometheus
counter is accurate.

Additive only: consumers without usage needs ignore the new field,
and providers that don't return usage produce zeros (not undefined —
the loop still exposes the object so downstream branches stay trivial).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 18:21:34 +02:00
Till JS
5b7564b3a4 test(ai): promote MockLlmClient to a shared @mana/shared-ai export
The runPlannerLoop test file and the webapp's mission-runner test each
had their own inline scripted LLM mock — same interface, diverged
slightly. Consolidates into packages/shared-ai/src/planner/mock-llm.ts
and re-exports from the package root so any consumer can drive the
loop deterministically.

Both existing test files now use the shared client. 5 + 3 tests pass,
44 total in shared-ai still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 18:05:46 +02:00
Till JS
4daca8970b feat(shared-ai): runPlannerLoop + compact system prompt for function calling
Introduces the new planner pipeline both the webapp runner and the
mana-ai tick will swap onto in the next commits. Additive for now —
the legacy buildPlannerPrompt + parsePlannerResponse stay exported so
callers can migrate one at a time; they get removed once the last
consumer is gone.

- planner/loop.ts — runPlannerLoop orchestrates a multi-turn chat
  against a caller-supplied LlmClient. Tool-calls from the LLM are
  handed to an onToolCall callback and their results fed back as
  tool-messages. Parallel tool-calls in one turn execute sequentially
  to keep the message log linear for debugging. Stops on assistant
  stop, empty tool_calls, or a hard max-rounds ceiling (default 5).
- planner/system-prompt.ts — new buildSystemPrompt. ~40-line German
  system frame, no tool listing (the SDK-level tools field carries
  the schemas now), no JSON format example, no "please return JSON"
  plea. User frame renders mission + linked inputs + last 3
  iteration summaries, same as before.
- Five test cases covering the loop: immediate stop, single tool
  call with result feedback, parallel calls execute in order, tool
  failures propagate as tool-messages the LLM can react to, and
  maxRounds ceiling fires with the right stopReason.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 15:31:01 +02:00
Till JS
4523ab24e3 feat(shared-ai): toolToFunctionSchema — catalog → OpenAI function-spec
Single bridge between the AI_TOOL_CATALOG shape and the wire format every
provider (Gemini, OpenAI-compat, Ollama ≥ 0.3) speaks for native tool
calling. Keeps the catalog as the source of truth — the runner never
reads catalog entries directly; it asks this converter for function-spec
shapes to hand the LLM.

- No _rationale or wrapper-tool injection: the runner doesn't need it
  and the added schema noise would hurt planner quality.
- Throws on unknown parameter types so catalog typos (e.g. "array"
  instead of "string") fail loudly instead of coercing silently.
- Preserves enum constraints; drops the enum key entirely when absent
  so Gemini doesn't reject empty-enum function-declarations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 15:24:36 +02:00
Till JS
fad7f4bea3 feat(ai): guardrail layer — pre/post-plan + pre-execute checks
Add a guardrail system that runs alongside the Mission Runner pipeline
to catch obvious issues before they waste tokens or corrupt data.

Architecture (packages/shared-ai/src/guardrails/):
- types.ts: Guardrail, GuardrailResult, 4 phase interfaces
- builtin.ts: 4 built-in guardrails (always active):
  - input-size-limit: blocks >100K chars of resolved input
  - plan-step-limit: blocks plans with >25 steps (runaway planner)
  - duplicate-destructive-tool: warns if undo_drink called 2x
  - empty-required-params: blocks create_task without title
- runner.ts: runPrePlanGuardrails/runPostPlanGuardrails/runPreExecuteGuardrails

Wired into runner.ts at 3 checkpoints:
- Before deps.plan() — pre-plan check
- After plan received — post-plan check
- Before each stage() call — pre-execute check

Guardrails are synchronous, never hit the network, and produce
clear error messages when they block.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 15:11:34 +02:00
Till JS
d40a61119e refactor(ai): dynamic tool registry — single-source catalog in shared-ai
Introduce AI_TOOL_CATALOG in @mana/shared-ai as the single source of truth
for all 29 tool schemas (17 propose + 12 auto). Both the webapp policy and
the server-side mana-ai planner now derive their tool lists from the catalog
instead of maintaining independent hardcoded copies.

- New: packages/shared-ai/src/tools/schemas.ts — catalog with ToolSchema type
- Rewrite: proposable-tools.ts — derived from catalog instead of hardcoded array
- Rewrite: services/mana-ai/src/planner/tools.ts — 277→30 lines (imports from catalog)
- Simplify: webapp policy.ts — derives AUTO/PROPOSE from catalog defaultPolicy

Adding a new tool now requires 2 files instead of 3-5:
1. Add schema to AI_TOOL_CATALOG (shared-ai)
2. Add execute function in the module's tools.ts (webapp)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 13:06:07 +02:00
Till JS
a08e45ca16 feat(templates): generalise to WorkbenchTemplate + ship Calmness pilot (T1)
First pass of the workbench-templates plan (docs/plans/workbench-
templates.md) — templates are no longer agent-centric but a general
"starter kit" bundle: optional agent + optional scene + optional
missions + optional per-module seeds. Pilot non-AI template "Calmness"
ships alongside.

Shape generalisation (packages/shared-ai/src/agents/templates/types.ts):
- AgentTemplate renamed to WorkbenchTemplate; all fields now optional
  (agent, scene, missions, seeds). Back-compat AgentTemplate alias
  kept so research/context/today keep compiling.
- Added `category: 'ai'|'wellness'|'work'|'lifeEvent'|'delight'` +
  `icon` (for non-agent templates that have no avatar) + `version`
  field (for future update-detection).
- New WorkbenchTemplateSeedItem shape: `{stableId?, data: unknown}`.
  Module-specific seed payloads are typed at the handler side.
- Existing three AI templates nachgezogen: category='ai' (or
  'delight' for today-agent), icon, version='1'.

Seed infrastructure:
- apps/mana/apps/web/src/lib/data/ai/agents/seed-registry.ts — in-
  memory handler map keyed by module name; module-local seed.ts files
  register themselves at import time.
- apps/mana/apps/web/src/lib/modules/meditate/seed.ts — first handler:
  createPreset-based, idempotent via stableId embedded as HTML
  comment in the preset description (T1 pragmatism; T2 adds a proper
  column on the preset schema).
- data/ai/missions/setup.ts pulls `import '$lib/modules/meditate/seed'`
  so the handler is registered before any template is applied.

Applicator upgrades (data/ai/agents/apply-template.ts):
- Agent step now optional — skipped cleanly when template has no
  agent part.
- New step 4: seeds. Walks template.seeds, looks up the handler for
  each module, aggregates per-item outcomes (created/skipped-exists/
  failed) into result.seedOutcomes. Missing handler = warning, not
  fatal. Crypto/encryption unchanged — seeds go through the same
  module stores that module code already uses.
- Result shape gains `seedOutcomes: Record<string, SeedOutcome[]>`
  so the gallery can show "3 new, 1 already there".

Calmness pilot (packages/shared-ai/src/agents/templates/calmness.ts):
- category='wellness', NO agent, scene with meditate/mood/journal/
  sleep apps, two meditate preset seeds:
  * 4-7-8 Atmung (breathing preset)
  * Body-Scan 10min (bodyscan preset with 9 scan steps)
- Each seed has a stableId so re-apply is idempotent.

Gallery updates (routes/(app)/agents/templates/+page.svelte):
- Card avatar falls back to t.icon when no agent. "Agent" chip shows
  only for agent-templates; "N Seeds" chip shows for templates with
  seeds.
- Detail header shows "Workbench-Setup ohne AI-Agent" when no agent.
- New "Seeds" preview section: lists per-module counts + item names.
- Options section gains a "Seed-Daten in Module einpflegen" checkbox.
- Success panel shows seed summary: "3 Seeds neu, 1 bereits
  vorhanden".

Tests: shared-ai 26/26, webapp svelte-check 0 errors, 0 warnings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 01:07:41 +02:00
Till JS
7822340ea0 feat(ai-agents): Template gallery — 3 ready-to-use agent bundles
First pass of the Multi-Agent discoverability UX. A new /agents/
templates route showcases pre-configured agents; clicking one creates
agent + scene + starter mission(s) as a single bundle. Addresses the
"blank form anxiety" + "user doesn't know what agents are for"
observations from the UX brainstorm.

Three templates for v1 (shared-ai/src/agents/templates/):
- 🔍 Recherche-Agent — reads sources one by one, writes a note per
  source, summarizes into a report. Manual-cadence mission; all
  writes propose so user curates.
- 🧭 Kontext-Agent — learns about the user via a weekly check-in.
  Reads kontext/notes/goals, asks 2-3 questions, proposes a diff-
  style context update. Weekly Sunday cadence.
- 🌅 Today-Agent — researches "on this day" history each morning,
  writes a 4-8 line German poem, proposes a journal note. Daily 7am
  cadence. A "delight" agent, not a productive one.

Each template packs (agent config, scene layout, starter mission):
- AgentTemplate type lives in @mana/shared-ai — pure data, no runtime
  imports. Adding a new template = drop a file in templates/ and
  extend ALL_TEMPLATES.
- Template-specific policies derive from the proposable-tool list so
  drift-guard catches divergence from the canonical set.
- Starter missions default to startPaused=true — user sees the
  mission ready-to-go and hits Play when ready. Prevents surprise
  autonomous work on first apply.

Applicator (data/ai/agents/apply-template.ts):
- Creates agent → scene (if template defines one) → missions in
  order. Agent failure = abort; scene/mission failures surface as
  warnings in the result without blocking.
- Duplicate-name handling: falls through to findByName, returns
  existing agent with wasExisting=true; scene is skipped in that
  case to avoid clone-proliferation.

Gallery page /(app)/agents/templates/+page.svelte:
- Three large cards side-by-side (stacks on mobile) with avatar /
  label / tagline / meta chips (Scene, N Missionen).
- Click opens detail panel with full description, scene preview
  (app-ids + widths), mission preview (title / objective / cadence),
  and override checkboxes (create scene, create missions, start
  active vs paused).
- Success panel shows what landed with warnings inline; CTA back to
  workbench.

Discoverability in /ai-agents module:
- Bar now has two buttons: "Aus Template" (primary, goto templates
  route) + "Eigener Agent" (secondary, opens the existing blank-form
  create mode).
- When only the default "Mana" agent exists, render a dashed promo
  banner at the top linking to the template gallery. Disappears as
  soon as the user has a second agent.

Tests: webapp svelte-check 0 errors, 0 warnings. shared-ai 26/26.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 00:36:39 +02:00
Till JS
bc77b36234 feat(agents): Agent CRUD + default bootstrap + Mission.agentId (Phase 2)
Second phase of the Multi-Agent Workbench rollout (docs/plans/
multi-agent-workbench.md). Builds on Phase 1's identity-aware Actor.

Adds the Agent primitive — a named AI persona that owns Missions,
carries its own policy + memory, and (from Phase 3 on) drives the
Workbench lens. Everything is wired; a single user currently has one
"Mana" default agent until the UI (Phase 5) lets them create more.

Shared types (@mana/shared-ai):
- agents/types.ts: Agent, AgentState, DEFAULT_AGENT_ID/NAME constants
- policy/types.ts: AiPolicy + PolicyDecision (moved from webapp so
  Agent.policy can reference it without a runtime dep on the web app)
- missions/types.ts: new optional Mission.agentId field

Webapp data layer:
- data/ai/agents/{types,store,queries,bootstrap}.ts
- Dexie schema v19 adds `agents` table (indexes on state, name,
  [state+name]); sync registered under the existing ai app-id
- Encryption registry: agents.systemPrompt + agents.memory encrypted;
  name/role/avatar/policy stay plaintext for search + UI rendering
- DuplicateAgentNameError thrown at write time (not a Dexie unique
  index — bootstrap races between tabs would otherwise hit
  ConstraintError; store now resolves via getOrCreateAgent)
- bootstrap.ts: ensureDefaultAgent + backfillMissionsAgentId. The
  backfill runs once per device (localStorage sentinel) so missions
  that pre-date the rollout get stamped with the default agent's id.
  Called fire-and-forget from startMissionTick() during layout init.

Runner threading (already merged into d5c351d63 via Till's debug-log
commit that picked up my uncommitted edits):
- runner.ts + server-iteration-staging.ts now resolve mission.agentId
  to the real Agent and build makeAgentActor with agent.name as
  displayName. Missing-agent fallback keeps using LEGACY_AI_PRINCIPAL
  so historical writes still attribute cleanly.

Tests: shared-ai 26/26, mana-ai 35/35, svelte-check 0 errors.
Agent store vitest suite is present but blocked by a pre-existing
\$lib alias resolution issue in the webapp vitest config that
predates this phase (proposals/store.test.ts is broken the same way
on HEAD). Will address separately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 20:35:49 +02:00
Till JS
1771063df4 refactor(actor): identity-aware Actor for Multi-Agent Workbench (Phase 1)
Foundation for the Multi-Agent Workbench roadmap
(docs/plans/multi-agent-workbench.md). Every event, record, and
sync_changes row now carries a principal identity + cached display
name in addition to the three-kind discriminator.

Shape change (source of truth in @mana/shared-ai):
  Before: { kind: 'user' | 'ai' | 'system', ...kind-specific fields }
  After:  discriminated union on kind, with
            - common:  principalId, displayName
            - 'user':  principalId = userId
            - 'ai':    principalId = agentId + missionId/iterationId/rationale
            - 'system': principalId = one of SYSTEM_* sentinel strings
                        ('system:projection', 'system:mission-runner', etc.)

Key design calls (from the plan's Q&A):
- System sub-sources get distinct principalIds (not a shared 'system'
  bucket) — lets Workbench filter + revert distinguish projection
  writes from migration writes from server-iteration writes
- displayName cached on the record so renaming an agent doesn't
  rewrite history
- normalizeActor() compat shim fills principalId/displayName on
  legacy rows with 'legacy:*' sentinels so historical events never
  crash the timeline

New exports:
- BaseActor / UserActor / AiActor / SystemActor (narrowed types)
- makeUserActor, makeAgentActor, makeSystemActor (factories with
  typed return)
- SYSTEM_PROJECTION, SYSTEM_RULE, SYSTEM_MIGRATION, SYSTEM_STREAM,
  SYSTEM_MISSION_RUNNER (principalId constants)
- LEGACY_USER_PRINCIPAL, LEGACY_AI_PRINCIPAL, LEGACY_SYSTEM_PRINCIPAL
- isUserActor / isFromMissionRunner predicates

Webapp:
- data/events/actor.ts now re-exports from shared-ai, keeps runtime
  ambient-context (runAs, getCurrentActor) local
- bindDefaultUser(userId, displayName) lets the auth layer replace
  the legacy placeholder with the real logged-in user actor at login
- Mission runner + server-iteration-staging stamp LEGACY_AI_PRINCIPAL
  as the agentId placeholder — Phase 2 will thread the real agent
- Streaks projection uses makeSystemActor(SYSTEM_PROJECTION)
- All test fixtures migrated to factories

Service:
- mana-ai/db/iteration-writer.ts stamps makeSystemActor(
  SYSTEM_MISSION_RUNNER) instead of the old { kind:'system',
  source:'mission-runner' } shape. Phase 3 will switch this to an
  agent actor per mission.

Tests: 26 shared-ai + 21 webapp vitest + 35 mana-ai — all green.
svelte-check: 0 errors, 0 warnings.

No behavior change; purely a type + shape upgrade. Old sync_changes
rows parse via the normalizeActor compat shim at read time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 20:13:57 +02:00
Till JS
ef47adb7d7 feat(ai-missions): live phase + elapsed + cancel for running iterations
Closes the "iteration is running, no feedback" black hole. The user now
sees, per running iteration:

    Frage Planner · frage Planner an              ⏱ 23s
                                              [Abbrechen]

Phases (\`IterationPhase\`):
  resolving-inputs → calling-llm → parsing-response →
  staging-proposals → finalizing

The runner advances through these via \`setIterationPhase\` between each
await, writing currentPhase + phaseDetail + phaseStartedAt onto the
iteration. UI reads them via Dexie liveQuery — no polling.

Cancel:
- \`requestIterationCancel\` writes cancelRequested=true on the iteration
- runner polls \`isCancelRequested\` between every phase + per stage step
- cancellation finalises as \`failed\` with summary \`'cancelled by user'\`
- UI button is disabled + relabelled "Wird abgebrochen…" until the next
  poll picks it up

Hard timeout: 90 s wall-clock per iteration via Promise.race against a
CancelledError. Wedged backends (e.g. flaky mana-llm) fail fast with
"timeout after 90s" instead of sitting in \`running\` forever.

Elapsed counter is a \$state variable ticking once a second, scoped to
the ListView component — Dexie isn't touched. Auto-cleaned on
component destroy.

shared-ai re-exports \`IterationPhase\` so server-side mana-ai can
inspect the same phase enum (no consumer there yet, but the type is
ready for the run-status endpoint planned in HEALTH page).

77/77 webapp tests still green; svelte-check clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 14:15:48 +02:00
Till JS
6882ffb626 feat(shared-ai): Mission Key-Grant contract + plan for encrypted server-side runs
Foundation for Phase 2+ of the Mission Key-Grant flow: lets mana-ai
execute missions that depend on encrypted inputs (notes/tasks/events/
journal/kontext) without needing an open browser tab. Opt-in per
mission, Zero-Knowledge users excluded.

- Canonical HKDF-SHA256 derivation (scope-bound via tables + recordIds
  in the HKDF info string → scope changes invalidate the grant
  cryptographically, not just via a runtime check)
- Mission.grant field on the shared Mission type
- Golden snapshot + drift-guard test so webapp wrap path and mana-auth
  wrap endpoint can't silently diverge
- Ideas backlog at docs/future/AI_AGENTS_IDEAS.md
- Full rollout plan at docs/plans/ai-mission-key-grant.md
- COMPANION_BRAIN_ARCHITECTURE.md §21 captures the flow + privacy
  guarantees + non-goals

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 13:41:35 +02:00
Till JS
4be5e29bd3 feat(shared-ai): canonical proposable-tool list + drift guard on mana-ai
Makes the webapp's AI policy and the server's tool allow-list physically
impossible to drift. Adds the missing entries the guard caught on first
run: `complete_tasks_by_title`, `visit_place`, `undo_drink` now have
parameter schemas server-side too.

- `packages/shared-ai/src/policy/proposable-tools.ts`
  - `AI_PROPOSABLE_TOOL_NAMES` as `const` array + literal union type
  - `AI_PROPOSABLE_TOOL_SET` for set-membership checks
- Webapp `DEFAULT_AI_POLICY` derives its `propose` entries from the
  shared list via `Object.fromEntries(...)` — adding a tool there is now
  a one-line change in `@mana/shared-ai`
- mana-ai `AI_AVAILABLE_TOOLS`: module-load assertion compares its
  hardcoded names against `AI_PROPOSABLE_TOOL_SET` and throws with a
  pointed error on drift (extras in one direction, missing in the
  other). Service refuses to start on mismatch — better than silent
  degradation.
- Bun test (`tools.test.ts`) runs the same contract plus sanity checks
  (non-empty description, required params carry docs). Vitest policy
  test adds the symmetric check on the webapp side.

All three runtimes now green: webapp 66/66, shared-ai 2/2,
mana-ai 9/9 Bun tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:52:38 +02:00
Till JS
0d90b12d1c feat(shared-ai): extract planner + mission types to @mana/shared-ai
Single source of truth for AI Workbench types shared between the webapp
(Vite/SvelteKit) and the server-side mana-ai Bun service. Prevents the
two runtimes from drifting on prompt shape or mission structure.

- `@mana/shared-ai` package:
  - `actor.ts` — Actor union (user | ai | system) + helpers, mirrors the
    webapp's runtime type so server-side consumers parse incoming actors
    without re-declaring
  - `missions/types.ts` — Mission, MissionCadence, MissionInputRef,
    MissionIteration, PlanStep, MissionState. Adds optional
    `iteration.source: 'browser' | 'server'` to distinguish foreground
    vs server-produced iterations (groundwork for proposal write-back)
  - `planner/prompt.ts` — `buildPlannerPrompt` pure function
  - `planner/parser.ts` — `parsePlannerResponse` strict JSON validator
  - Vitest smoke tests (2) cover prompt → parse round-trip + unknown-
    tool rejection
- Webapp:
  - `missions/types.ts` re-exports from shared-ai, keeps webapp-local
    `MISSIONS_TABLE` constant + `planStepStatusFromProposal` bridge
  - `missions/planner/{types,prompt,parser}.ts` become re-export stubs
    so existing imports keep working unchanged
  - Existing webapp tests (60) continue to pass — the wire code didn't
    move, just its home

Next: mana-ai service imports buildPlannerPrompt/parsePlannerResponse
from shared-ai + wires mana-llm + writes iteration back as a
'source=server' row (tracked in services/mana-ai/CLAUDE.md).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:01:57 +02:00