Four cross-cutting fixes that make the bulk-import worker safe to run
under real production load. All four were called out as live-rollout
risks in the post-ship review of docs/plans/articles-bulk-import.md.
#1 — Same fieldMetaTime bug fixed in mana-ai
The articles fix in 054b9e5be hoists the helper to its own file
`apps/api/src/modules/articles/field-meta.ts`. The same naive
`rowFM[k] >= localTime` LWW comparison existed in three more
projections under services/mana-ai (missions-projection,
snapshot-refresh, agents-projection). Once any F3 stamp lands
beside a legacy-string stamp, the comparison evaluates
`'[object Object]' >= 'ISO-…'` (false) and the older value wins.
New `services/mana-ai/src/db/field-meta.ts` — same helper,
deliberately duplicated (each service treats sync_changes as a
read-only event log; sharing infra across services is out of
scope here). All 61 mana-ai bun tests still pass.
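A minimal sketch of the duplicated helper, assuming the two stamp shapes the bug implies (legacy ISO string vs F3 `{ at }` object) — local names here are illustrative, not the shipped code:

```ts
type FieldStamp = string | { at: string };

// Normalize both stamp formats to one comparable ISO string.
export function fieldMetaTime(stamp: FieldStamp | undefined): string {
  if (stamp === undefined) return '';
  return typeof stamp === 'string' ? stamp : stamp.at;
}

// LWW check that no longer degrades to '[object Object]' >= 'ISO-…'.
export function remoteWins(remote?: FieldStamp, local?: FieldStamp): boolean {
  return fieldMetaTime(remote) >= fieldMetaTime(local);
}
```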
#2 — Stale 'extracting' items recycle
If the worker dies mid-fetch (OOM, pod restart), items stay in
state='extracting' forever and the job never completes. New sweep
at the start of `processOneJob`: items whose lastAttemptAt is
older than 5 minutes get bounced back to 'pending' so the next
tick re-claims them. STALE_EXTRACTING_MS tuned for the 15s
shared-rss fetch + JSDOM-parse worst case.
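Roughly what the sweep amounts to — table and column names are assumptions, not the shipped schema:

```ts
import postgres from 'postgres';

const STALE_EXTRACTING_MS = 5 * 60 * 1000;

// Bounce items stuck in 'extracting' back to 'pending' so the next tick
// re-claims them; returns the count for logging.
async function recycleStaleExtracting(sql: postgres.Sql): Promise<number> {
  const cutoff = new Date(Date.now() - STALE_EXTRACTING_MS);
  const rows = await sql`
    UPDATE article_import_items
    SET state = 'pending'
    WHERE state = 'extracting' AND last_attempt_at < ${cutoff}
    RETURNING id`;
  return rows.length;
}
```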
#3 — Pickup-row GC
Every 30 ticks (~once per minute) the worker hard-deletes
articleExtractPickup rows older than 24h. Without this a stuck
pickup-consumer (all tabs closed, Web-Lock mismatch) would let
sync_changes accumulate without bound. Logs the row count when
non-zero so we can spot stuck consumers in the wild.
#4 — DRY consent-wall heuristic
Identical CONSENT_KEYWORDS + threshold lived in routes.ts AND
import-extractor.ts. Hoisted to
`apps/api/src/modules/articles/consent-wall.ts`; both call sites
now share one heuristic.
Plan: docs/plans/articles-bulk-import.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Dockerfile copied only @mana/shared-{hono,ai,logger}, but
services/mana-ai/package.json also depends on @mana/shared-research and
@mana/tool-registry (and @mana/tool-registry pulls in
@mana/shared-crypto transitively). Without those, pnpm couldn't
resolve the workspace symlinks and the container crashlooped on every
restart with:
error: ENOENT reading "/app/services/mana-ai/node_modules/@mana/tool-registry"
Surfaced today after a manual `gh workflow run cd-macmini.yml -f
service=mana-ai` — mana-ai had never been deployed because no commit
since the CD pipeline started had touched its path. The first real
deploy hit the missing-COPY immediately.
Header comment in the Dockerfile now spells out direct + transitive
workspace deps so future additions don't drift.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes `updatedAt` from the wire protocol and from every Local-prefixed
record type. Replaced by two orthogonal mechanisms — deriveUpdatedAt()
for read-side public-facing values, _updatedAtIndex shadow for indexed
sorts.
Local-side:
- New `_updatedAtIndex` shadow column. Stamped by the Dexie creating /
updating hook on every write. Stripped from the pending-change payload
so it never travels to mana-sync. Indexed in Dexie v53 on the 22 tables
that previously indexed `updatedAt`.
- `deriveUpdatedAt(record)` in sync.ts returns max(__fieldMeta[*].at) so
the public-facing Task / Note / etc. shape keeps an `updatedAt: string`
property without holding it as data.
- Type-converters across ~60 module/queries.ts and types.ts files now
call `deriveUpdatedAt(local)` instead of reading `local.updatedAt`.
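A sketch of the two mechanisms together — the `tasks` table below is a hypothetical stand-in for any of the 22 indexed tables:

```ts
import Dexie from 'dexie';

const db = new Dexie('mana');
db.version(53).stores({ tasks: 'id, _updatedAtIndex' });

// Creating/updating hooks stamp the shadow index on every write; the
// pending-change payload strips it before it reaches mana-sync.
db.table('tasks').hook('creating', (_pk, obj) => {
  (obj as Record<string, unknown>)._updatedAtIndex = new Date().toISOString();
});
db.table('tasks').hook('updating', () => ({
  _updatedAtIndex: new Date().toISOString(),
}));

// Read-side derivation: max of the per-field stamps (ISO strings compare
// lexicographically), so public shapes keep updatedAt without storing it.
export function deriveUpdatedAt(record: {
  __fieldMeta?: Record<string, { at: string }>;
}): string {
  const stamps = Object.values(record.__fieldMeta ?? {}).map((m) => m.at);
  return stamps.reduce((max, at) => (at > max ? at : max), '');
}
```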
Module-store sweep:
- Regex codemod removed `updatedAt: new Date().toISOString()` /
`: now` / `: now()` / `: nowIso()` stamping from 121 store files
(~382 call sites total). Single-property update calls
(`{ updatedAt: now }`) collapsed to `{}`; touch-only patterns
(writing/drafts, writing/generations) kept the call as a no-op
because the hook now stamps `_updatedAtIndex` automatically on
any Dexie modification.
- Local* interfaces stripped of `updatedAt: string` (43 types.ts files).
Public-facing types (Task, Note, Mission, Agent, …) keep
`updatedAt: string` as a computed read-side property.
- Companion's chat conversation now sorts on a real
`lastMessageAt` data field instead of touching `updatedAt`.
- Session-only stores (times/session-alarms, session-countdown-timers)
stamp `updatedAt: now` directly because they're not in Dexie and
have no field-meta layer to derive from.
Sync engine:
- applyServerChanges sets `_updatedAtIndex` itself when applying
server changes (max of server-field times for updates, recordTime
for inserts) so server-replays land orderable.
- Dropped the legacy `localUpdatedAt` fallback — every record now has
`__fieldMeta`; the per-field `at` is the canonical source.
- Soft-delete tombstone path stops stamping `updatedAt: serverTime`,
uses `_updatedAtIndex` instead.
Server-side:
- mana-ai iteration-writer no longer emits `updatedAt` in
sync_changes.data; receivers derive it from the field-meta map.
- mana-sync types: no change (the wire format already uses
`field_meta` / `at` from F1).
Out of scope: backend Drizzle schemas (mana-credits, mana-events, …)
keep their `updated_at` columns. Those are pure server-internal — not
part of the sync_changes / __fieldMeta mechanism F3 cleans up.
Tests + checks:
- 0 svelte-check errors over 7652 files.
- 29/29 sync.test.ts (vitest).
- 61 mana-ai bun tests.
- mana-sync go test ./... cached green.
Plan: docs/plans/sync-field-meta-overhaul.md F3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All 5 milestones landed today in one continuous session: registry,
health cache, fallback router, observability, and consumer migration.
115 service-side tests, validator covers 2538 files.
Final milestone of docs/plans/llm-fallback-aliases.md. Every backend
caller now requests models via the `mana/<class>` alias system instead
of hardcoded `ollama/...` strings. mana-llm resolves aliases through
`services/mana-llm/aliases.yaml` with health-aware fallback (M3) and
emits resolved-model + fallback metrics (M4).
SSOT moved to `packages/shared-ai/src/llm-aliases.ts` so apps/api,
apps/mana/apps/web, and services/mana-ai all import the same
`MANA_LLM` constant via the existing `@mana/shared-ai` workspace
dependency. Three additional sites (memoro-server, mana-events,
mana-research) inline the alias string with a SSOT comment because
they don't pull @mana/shared-ai today.
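Hypothetical shape of the SSOT — the class names come from the call sites below; the concrete alias strings are assumptions beyond the `mana/<class>` format:

```ts
export const MANA_LLM = {
  FAST_TEXT: 'mana/fast-text',
  STRUCTURED: 'mana/structured',
  LONG_FORM: 'mana/long-form',
  REASONING: 'mana/reasoning',
  VISION: 'mana/vision',
} as const;

export type ManaLlmAlias = (typeof MANA_LLM)[keyof typeof MANA_LLM];

// Callers request a class, never a concrete model:
//   complete({ model: MANA_LLM.STRUCTURED, messages })
```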
Migrated 14 sites across 10 files:
- apps/api: writing(LONG_FORM), comic(STRUCTURED), context(FAST_TEXT),
food(VISION), plants(VISION), research orchestrator (3 tiers
collapsed to STRUCTURED+FAST_TEXT/LONG_FORM)
- apps/mana/apps/web: voice/parse-task + parse-habit (STRUCTURED)
- services/mana-ai: planner llm-client + tick.ts (REASONING)
- services/mana-events: website-extractor (STRUCTURED, inlined)
- services/mana-research: mana-llm client (FAST_TEXT, inlined)
- apps/memoro/apps/server: ai.ts (FAST_TEXT, inlined)
Legacy env-vars removed: WRITING_MODEL, COMIC_STORYBOARD_MODEL,
VISION_MODEL, MANA_LLM_DEFAULT_MODEL. The chain in aliases.yaml is
now the single tuning surface; SIGHUP reloads it without redeploys.
New `scripts/validate-llm-strings.mjs` regex-scans 2538 files for
hardcoded `<provider>/<model>` strings and fails the build if any
land outside the SSOT or the explicitly-allowed paths (image-gen
modules, model-inspector code, this validator itself, the registry).
Wired into `validate:all` next to the i18n + theme validators.
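The core of the check, reduced to a sketch (the provider list and the allow-list plumbing are assumptions):

```ts
import { readFileSync } from 'node:fs';

// Flags hardcoded '<provider>/<model>' strings, e.g. 'ollama/llama3'.
const HARDCODED = /['"`](?:ollama|google|openai|anthropic)\/[\w.:-]+['"`]/g;

export function scanFile(path: string): string[] {
  return readFileSync(path, 'utf8').match(HARDCODED) ?? [];
}
// The real validator fails the build when matches land outside the SSOT
// or the explicitly-allowed paths.
```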
Verified: `pnpm validate:llm-strings` clean, `pnpm --filter @mana/api
type-check` clean, `pnpm --filter @mana/ai-service type-check`
clean. Web type-check has 2 pre-existing errors in
SettingsSidebar.svelte (i18n MessageFormatter type drift, last
touched in 988c17a67 — unrelated to this work).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
compactHistory() now defaults to DEFAULT_COMPACT_MODEL =
'google/gemini-2.5-flash-lite' when the caller doesn't override. Lite
is ~3–5x cheaper than gemini-2.5-flash with near-identical
summarisation quality — summarisation doesn't need the same tier as
reasoning + tool-calling, and the compactor fires exactly when token
spend is highest, so the cheaper route saves exactly where it matters.
CompactHistoryOptions.model is now optional. All three consumers
(mana-ai tick, webapp Companion, webapp Mission runner) drop their
explicit gemini-2.5-flash override and let the default apply.
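The defaulting contract in sketch form (option shape trimmed to what this commit describes):

```ts
export const DEFAULT_COMPACT_MODEL = 'google/gemini-2.5-flash-lite';

export interface CompactHistoryOptions {
  model?: string; // optional since this commit; callers may still override
}

export function resolveCompactModel(opts: CompactHistoryOptions = {}): string {
  return opts.model ?? DEFAULT_COMPACT_MODEL;
}
```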
This is the pragmatic M2.5: no mana-llm changes. The "tier" abstraction
(X-Model-Tier header, env-routed aliases) from the Claude-Code report
makes sense only once multiple utility tasks need cheaper routing —
topic-detection, classification, command-injection checks. Today only
the compactor wants it, and a model constant is the simplest contract
that works.
2 new tests (default applied + override honoured). 79 shared-ai tests
green, all three consumers type-check clean. One pre-existing unrelated
type error in apps/mana/apps/web/src/lib/modules/wardrobe/queries.ts
(not touched by this commit).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the loop on M2: when the compactor fires, the LLM needs to know
it's now seeing a <compact-summary> instead of raw turns so it
doesn't waste a turn asking about lost details or re-executing tools
whose responses are gone.
shared-ai:
- LoopState grows `compactionsDone: number` (capped at 1 by current
loop policy, but kept as a count for future multi-compact cycles).
- runPlannerLoop populates it on each reminder-channel call. New
loop test asserts [0, 1] sequence: round 1 before compaction,
round 2 after.
mana-ai:
- New producer `compactedReminder` — fires severity=info when
compactionsDone >= 1, wrapped in a German one-liner ("frag nicht
nach verlorenen Details" — "don't ask about lost details").
- Injected FIRST in buildReminderChannel so the LLM frames the rest
of the round with "I'm looking at a summary" context. Metric
surface stays `{producer='compacted', severity='info'}`.
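Sketch of the producer under the structured {producer, severity, text} shape — the reminder text is a paraphrase, not the shipped string:

```ts
interface Reminder {
  producer: string;
  severity: 'info' | 'warn' | 'escalate';
  text: string;
}

export function compactedReminder(state: { compactionsDone: number }): Reminder | null {
  if (state.compactionsDone < 1) return null;
  return {
    producer: 'compacted',
    severity: 'info',
    // Paraphrased German one-liner: "don't ask about lost details".
    text: 'Du siehst eine <compact-summary> — frag nicht nach verlorenen Details.',
  };
}
```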
4 new reminder tests (3 pure producer + 1 composition-ordering) +
1 loop-wiring test. 77 shared-ai, 20 reminders.test.ts — green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Claude-Code wU2 pattern goes live. Every mission run now passes a
compactor into runPlannerLoop that will fire once if cumulative token
usage crosses 92% of MANA_AI_COMPACT_MAX_CTX (default 1_000_000, the
gemini-2.5-flash ceiling). Override via env for deployments on smaller
models; set to 0 to disable entirely.
The compactor reuses the planner's own LlmClient + gemini-2.5-flash
model for now. When mana-llm grows a Haiku tier we'll route the
compactor there — it's pure summarisation and a cheaper model saves
tokens exactly where they matter.
New metrics:
- mana_ai_compactions_triggered_total — counter, one per firing
- mana_ai_compacted_turns — histogram, how many middle turns got
folded each time (< 3 ⇒ maxCtx is probably misconfigured)
Logs print a 60-char tail of the summary.goal so the "what was this
mission doing again" question survives a compaction.
No new tests here — compactHistory and the loop wiring are already
covered by the 22 tests in shared-ai (M2.1 + M2.2). The 57 existing
mana-ai bun tests stay green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Producers now return structured {producer, severity, text} objects
instead of raw strings. buildReminderChannel collects them, increments
mana_ai_reminders_emitted_total{producer, severity} per emission, and
maps back to strings for the shared-ai loop input (sketch below).
Why structured: the Prometheus label "severity" lets dashboards split
75-99% token-budget warnings (severity=warn) from 100%+ escalations
(severity=escalate) without NLP on the reminder text. Adding a new
producer that emits only info-level state (e.g. stale-sync warning)
falls out for free.
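A minimal sketch of that collect → count → map step (helper name is illustrative):

```ts
import { Counter } from 'prom-client';

interface Reminder {
  producer: string;
  severity: 'info' | 'warn' | 'escalate';
  text: string;
}

const remindersEmitted = new Counter({
  name: 'mana_ai_reminders_emitted_total',
  help: 'Reminders emitted, by producer and severity',
  labelNames: ['producer', 'severity'],
});

// Count each emission, then hand plain strings to the shared-ai loop.
export function toLoopReminders(reminders: Reminder[]): string[] {
  for (const r of reminders) {
    remindersEmitted.inc({ producer: r.producer, severity: r.severity });
  }
  return reminders.map((r) => r.text);
}
```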
Active producer labels today:
- token-budget (warn, escalate)
- retry-loop (warn)
With this plus the scrape job (d087b4744), we can finally answer:
"does the budget warning actually change LLM behaviour?" — correlate
reminders_emitted_total{producer='token-budget'} with
tick_duration_seconds or planner_rounds_histogram.
3 tests updated to assert the new {producer, severity, text} shape
(16 reminder tests total, all green).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends LoopState with a sliding window of the last N ExecutedCalls
(oldest-first), capped at LOOP_STATE_RECENT_CALLS_WINDOW = 5. The loop
maintains the window automatically; reminderChannel producers read it
without touching internal state.
This activates retryLoopReminder, which was shape-only in faa472be9.
The guard now fires end-to-end: when round >= 3 and the tail-2 calls
both returned success:false, the LLM sees a "stop retrying, write a
summary instead" <reminder> on the next turn. The tail-2 check rather
than window-wide is deliberate — a flaky run with intermittent success
(F, F, F, OK, F) is not a retry loop, just flaky tools.
Why window=5: retry loops usually manifest within 2-3 consecutive
rounds; a 5-deep window gives room for burst-detection and
stale-tool heuristics without bloating the reminder channel. Cap
keeps the reminder producers O(5) regardless of loop length.
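Both halves in sketch form — ExecutedCall narrowed to the one field the heuristic reads:

```ts
export const LOOP_STATE_RECENT_CALLS_WINDOW = 5;

interface ExecutedCall { success: boolean }

// Loop side: append oldest-first, drop from the front past the cap.
export function pushRecentCall(window: ExecutedCall[], call: ExecutedCall): void {
  window.push(call);
  if (window.length > LOOP_STATE_RECENT_CALLS_WINDOW) window.shift();
}

// Producer side: only the last two calls decide — (F, F, F, OK, F) is
// flaky tooling, not a retry loop.
export function isRetryLoop(round: number, window: ExecutedCall[]): boolean {
  if (round < 3 || window.length < 2) return false;
  const [prev, last] = window.slice(-2);
  return !prev.success && !last.success;
}
```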
Tests: 3 new (sliding-window cap + slide + order in shared-ai, retry
composition + budget+retry chain + tail-only heuristic in mana-ai).
Total agent-loop tests now 74 across both packages.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mana-mcp:
- Policy-gate section: POLICY_MODE semantics, the four decision
rules, where to find soak metrics during log-only burn-in.
- /metrics section pointing at the Prometheus job.
mana-ai:
- New v0.8 status block: reminderChannel wiring, the two live
producers (tokenBudgetReminder active, retryLoopReminder dormant
pending LoopState extension), why POLICY_MODE here is limited to
freetext inspection, why parallel-reads have no effect until the
tool-registry absorbs the full AI_TOOL_CATALOG (M4 of personas).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the M1 reminderChannel into the mana-ai mission runner with two
initial producers in services/mana-ai/src/planner/reminders.ts:
- tokenBudgetReminder — warns at 75% of the agent's daily cap, emits a
stronger "wrap up NOW" message at/above 100%. Uses pretick usage +
accumulated round usage so the warning tracks drift during a long
plan.
- retryLoopReminder — shape is in place (round≥3 + last 2 failures),
currently limited to the single lastCall LoopState exposes. Extends
cleanly once LoopState carries the full failure window.
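Threshold logic of the budget producer, sketched — input shape and messages are illustrative, and at this point producers still return raw strings:

```ts
interface BudgetInput {
  dailyCap: number;        // agent.maxTokensPerDay
  pretickUsage24h: number; // hoisted once per tick
  roundUsage: number;      // accumulated across rounds
}

export function tokenBudgetReminder(input: BudgetInput): string | null {
  const ratio = (input.pretickUsage24h + input.roundUsage) / input.dailyCap;
  if (ratio >= 1) return 'Daily token budget exceeded — wrap up NOW.';
  if (ratio >= 0.75) {
    return `Daily token budget at ${Math.round(ratio * 100)}% — start wrapping up.`;
  }
  return null;
}
```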
buildReminderChannel composes active producers; the tick hoists
pretickUsage24h so the channel has the baseline. Each round the loop
re-evaluates the producers, so usage drift across rounds surfaces on
the NEXT turn.
Also exports LoopState + ReminderChannel from @mana/shared-ai top-level
so consumers don't need to reach into /planner.
Tests: 13 new bun tests covering thresholds, pretick+round summing,
composition, and per-round re-evaluation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three Claude-Code-inspired primitives for runPlannerLoop, derived from the
reverse-engineering reports in docs/reports/:
1. **Policy gate** (@mana/tool-registry) — evaluatePolicy() gates every tool
dispatch: denies admin-scope, denies destructive tools not in the user's
opt-in list, rate-limits per tool (30/60s default), flags prompt-injection
markers in freetext without blocking. Wired into mana-mcp with a
per-user rolling invocation log and POLICY_MODE env (off|log-only|enforce,
default log-only). mana-ai uses detectInjectionMarker only — tool dispatch
there is plan-only, so rate-limit/destructive checks don't apply yet.
2. **Reminder channel** (packages/shared-ai/src/planner/loop.ts) — new
reminderChannel callback in PlannerLoopInput. Called once per round with
LoopState snapshot (round, toolCallCount, usage, lastCall); returned
strings wrap in <reminder> tags and inject as transient system messages
into THIS LLM request only. Never pushed to messages[] — the Claude-Code
<system-reminder> pattern that keeps the KV-cache prefix stable.
3. **Parallel reads** (loop.ts) — isParallelSafe predicate enables
Promise.all dispatch when every tool_call in a round is parallel-safe,
in batches of PARALLEL_TOOL_BATCH_SIZE=10. Any non-safe call downgrades
the whole round to sequential. messages[] always appends in source
order, never completion order, so the debug log stays linear.
Default-off (undefined predicate) preserves pre-M1 behaviour.
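Sketch of the batched dispatch (call/dispatch types are assumed shapes):

```ts
const PARALLEL_TOOL_BATCH_SIZE = 10;

interface ToolCall { id: string; name: string; args: unknown }
type Dispatch = (call: ToolCall) => Promise<string>;

export async function dispatchRound(
  calls: ToolCall[],
  dispatch: Dispatch,
  isParallelSafe?: (call: ToolCall) => boolean,
): Promise<string[]> {
  // Undefined predicate or any non-safe call → whole round goes sequential.
  if (!isParallelSafe || !calls.every(isParallelSafe)) {
    const out: string[] = [];
    for (const call of calls) out.push(await dispatch(call));
    return out;
  }
  const results: string[] = [];
  for (let i = 0; i < calls.length; i += PARALLEL_TOOL_BATCH_SIZE) {
    const batch = calls.slice(i, i + PARALLEL_TOOL_BATCH_SIZE);
    // Promise.all preserves input order → messages append in source order.
    results.push(...(await Promise.all(batch.map(dispatch))));
  }
  return results;
}
```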
Tests: 21 new in tool-registry (policy), 9 new in shared-ai (5 parallel,
4 reminder). All 74 green, type-check clean across 4 packages.
Design/plan: docs/plans/agent-loop-improvements-m1.md
Reports: docs/reports/claude-code-architecture.md,
docs/reports/mana-agent-improvements-from-claude-code.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Opt-in path for missions that want Gemini Deep Research Max (up to 60 min
per task) instead of the shallow RSS pre-research. Because Max runs well
past a single 60-second tick, the state is carried across ticks:
tick N: submit → INSERT mission_research_jobs row → skip planner
tick N+k: poll → still running → skip planner (metric pending_skips)
tick N+m: poll → completed → inject as ResolvedInput, DELETE row, plan
- ManaResearchClient talks to mana-research's new internal
/v1/internal/research/async endpoints with X-Service-Key +
X-User-Id. Graceful-null on transport errors so a flaky
mana-research never crashes the tick loop.
- New table mana_ai.mission_research_jobs with PK (user_id, mission_id)
— presence is the "pending" flag; delete-on-terminal keeps queries
trivial.
- handleDeepResearch() encapsulates the state machine; planOneMission
now returns a discriminated union (planned | skipped | failed) so
"research pending" isn't miscounted as a parse failure.
- Opt-in at TWO gates to keep cost in check ($3–7/task, 1500 credits
per run):
1. MANA_AI_DEEP_RESEARCH_ENABLED=true server-side (default off)
2. DEEP_RESEARCH_TRIGGER regex matches the mission objective
(strict: "deep research", "tiefe recherche", "umfassende
recherche", "hintergrundrecherche", "deep dive")
Falls back to shallow RSS when either gate fails or the submit
errors upstream.
- Prom metrics: mana_ai_research_jobs_{submitted,completed,failed}_total
labelled by provider, plus _pending_skips_total.
- docker-compose wires MANA_RESEARCH_URL + the opt-in flag and adds
mana-research to depends_on.
- Full write-up with real API response shape (outputs plural, not
OpenAI-style), step-3 MCP-server plan (security-gated, not built),
ops + kill-switch: docs/reports/gemini-deep-research.md.
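The two gates plus the discriminated union, sketched (variant payloads are assumptions):

```ts
const DEEP_RESEARCH_TRIGGER =
  /deep research|tiefe recherche|umfassende recherche|hintergrundrecherche|deep dive/i;

export type PlanOutcome =
  | { kind: 'planned' }
  | { kind: 'skipped'; reason: 'research-pending' | 'gate-closed' }
  | { kind: 'failed'; error: string };

// Gate 1: server-side flag, default off. Gate 2: strict objective match.
export function deepResearchWanted(objective: string): boolean {
  return (
    process.env.MANA_AI_DEEP_RESEARCH_ENABLED === 'true' &&
    DEEP_RESEARCH_TRIGGER.test(objective)
  );
}
```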
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three new counters + one histogram fill the observability gap from
the function-calling migration:
- mana_ai_tool_calls_total{tool, policy, outcome} — incremented once
per tool_call the planner produced. `outcome` is `deferred` on the
server (stub onToolCall records for later client execution);
webapp runner will emit success/failure once it grows its own
Prom surface.
- mana_ai_planner_rounds (histogram, buckets 1..5) — distribution of
rounds consumed per iteration. Runs close to the cap signal a
planner struggling with the mission objective.
- mana_ai_provider_errors_total{provider, kind} — structured errors
surfaced from mana-llm. Kind mirrors the ProviderError hierarchy
added in commit 1 of the migration (blocked/truncated/auth/
rate_limit/capability/unknown).
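Declarations in sketch form — help strings are illustrative; the wiring uses prom-client's stock API:

```ts
import { Counter, Histogram } from 'prom-client';

export const toolCalls = new Counter({
  name: 'mana_ai_tool_calls_total',
  help: 'Tool calls produced by the planner',
  labelNames: ['tool', 'policy', 'outcome'],
});

export const plannerRounds = new Histogram({
  name: 'mana_ai_planner_rounds',
  help: 'Planner rounds consumed per iteration',
  buckets: [1, 2, 3, 4, 5],
});

export const providerErrors = new Counter({
  name: 'mana_ai_provider_errors_total',
  help: 'Structured provider errors surfaced from mana-llm',
  labelNames: ['provider', 'kind'],
});
```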
Plumbing:
- llm-client.ts parses mana-llm's `{detail: {kind, message}}` 4xx/5xx
body shape and re-throws as ProviderCallError carrying the kind.
- tick.ts observes metrics at the natural emission points — rounds
+ per-call counter after runPlannerLoop returns, provider_errors
in the catch block.
Grafana dashboards + status.mana.how already pick up the
collectDefaultMetrics prefix, so these metrics land in the existing
mana-ai panel without scraper changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Carries per-round token counts from the mana-llm response body
(prompt_tokens + completion_tokens) back through LlmCompletionResponse
→ PlannerLoopResult. The loop sums across rounds and exposes a single
aggregate on result.usage.
Lets mana-ai's tick re-activate per-agent daily-token budget tracking
— tokensUsed was stubbed to 0 in the migration commit (6) because the
loop didn't surface usage yet. Now recordTokenUsage + agentTokenUsage24h
get real numbers again, and the mana_ai_tokens_used_total Prometheus
counter is accurate.
Additive only: consumers without usage needs ignore the new field,
and providers that don't return usage produce zeros (not undefined —
the loop still exposes the object so downstream branches stay trivial).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Migrates the background tick from buildPlannerPrompt + PlannerClient +
parsePlannerResponse to the shared runPlannerLoop with native function
calling. Structurally identical to the webapp runner (commit 5a) —
same catalog, same compact system prompt, same multi-turn chat.
Server-specific twist: the `onToolCall` callback is a no-op stub
(returns {success:true, message:'recorded — pending client
application'}). The server has no Dexie access, so it can't actually
execute writes; instead it captures the LLM's chosen tool_calls and
writes them as PlanStep entries on the iteration. The user's client
picks up those planned steps on sync — same shape as before, just
sourced from the LLM's native tool_calls instead of a regex-extracted
JSON block.
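The stub, sketched (PlanStep shape is an assumption):

```ts
interface ToolCall { name: string; args: Record<string, unknown> }
interface PlanStep { tool: string; params: Record<string, unknown> }

const plannedSteps: PlanStep[] = [];

// No Dexie on the server: record the LLM's chosen call as a PlanStep for
// the client to execute on sync, and report success so the loop continues.
export const onToolCall = async (call: ToolCall) => {
  plannedSteps.push({ tool: call.name, params: call.args });
  return { success: true, message: 'recorded — pending client application' };
};
```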
Scope trimmed by the SERVER_TOOLS filter: only propose-default (write)
tools go to the server planner. Read-only tools (list_*, get_*) are
hidden because stubbing a response would let the LLM hallucinate that
it saw real data. Read-then-act chains stay with the foreground
runner, which has a real executor.
Deleted: planner/client.ts (old PlannerClient; replaced by
planner/llm-client.ts). Drift guard in tools.ts collapses into a
SERVER_TOOLS = AI_TOOL_CATALOG.filter(propose) derivation — no more
hand-maintained duplicate list; the contract test now asserts the
inverse round-trip against AI_PROPOSABLE_TOOL_SET.
TODO (follow-up): token usage tracking is temporarily set to 0 because
runPlannerLoop doesn't expose per-message usage yet. Budget
enforcement on the server is effectively disabled until the loop
returns that data — the webapp runner is unaffected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
startGoalTracker was only ever called from tests, so DrinkLogged /
TaskCompleted / MealLogged events never incremented currentValue and
GoalReached never fired — the progress bars were cosmetic. Wire it into
the (app)/+layout idle boot next to startStreakTracker, with matching
teardown in onDestroy.
Also drop <AiProposalInbox module="goals"/> into the module ListView so
create_goal / pause_goal / resume_goal / complete_goal proposals are
reviewable inline (previously only visible in the mission-detail view).
Refresh the tool-coverage tables while we're at it: apps/mana/CLAUDE.md
now reflects the real catalog state (59 tools, 19 modules — was 37/12),
and services/mana-ai/CLAUDE.md shows the correct server-side propose
subset (31 tools, 16 modules). Also fixes a stale 'location_log' →
'get_current_location' typo in the places row.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implement rolling 24h token budget enforcement in the mana-ai tick loop.
Agents with maxTokensPerDay set are now rate-limited server-side.
Changes:
- PlannerClient: extract usage.total_tokens from mana-llm response
- planOneMission: return {plan, tokensUsed} tuple
- tick loop: check getAgentTokenUsage24h() before planning; skip with
'skipped-budget' decision if over limit
- tick loop: record token usage after successful plan via
recordTokenUsage() INSERT into mana_ai.token_usage
- migrate.ts: new mana_ai.token_usage table with rolling window index
- metrics.ts: mana_ai_tokens_used_total counter (by agent_id)
Budget flow:
Agent.maxTokensPerDay = 50000
→ tick checks: SELECT SUM(tokens_used) WHERE ts > now()-24h
→ if sum >= 50000: skip mission, emit skipped-budget metric
→ else: plan mission, INSERT token_usage row
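The check, sketched against the new table (column names from the flow above):

```ts
import postgres from 'postgres';

async function agentTokenUsage24h(sql: postgres.Sql, agentId: string): Promise<number> {
  const [row] = await sql`
    SELECT COALESCE(SUM(tokens_used), 0) AS total
    FROM mana_ai.token_usage
    WHERE agent_id = ${agentId}
      AND ts > now() - interval '24 hours'`;
  return Number(row.total);
}

// if (await agentTokenUsage24h(sql, agent.id) >= agent.maxTokensPerDay)
//   → skip with 'skipped-budget'
```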
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduce AI_TOOL_CATALOG in @mana/shared-ai as the single source of truth
for all 29 tool schemas (17 propose + 12 auto). Both the webapp policy and
the server-side mana-ai planner now derive their tool lists from the catalog
instead of maintaining independent hardcoded copies.
- New: packages/shared-ai/src/tools/schemas.ts — catalog with ToolSchema type
- Rewrite: proposable-tools.ts — derived from catalog instead of hardcoded array
- Rewrite: services/mana-ai/src/planner/tools.ts — 277→30 lines (imports from catalog)
- Simplify: webapp policy.ts — derives AUTO/PROPOSE from catalog defaultPolicy
Adding a new tool now requires 2 files instead of 3-5:
1. Add schema to AI_TOOL_CATALOG (shared-ai)
2. Add execute function in the module's tools.ts (webapp)
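The derivation, sketched — ToolSchema fields beyond `defaultPolicy` are assumptions:

```ts
export interface ToolSchema {
  name: string;
  module: string;
  description: string;
  defaultPolicy: 'propose' | 'auto';
  parameters: Record<string, unknown>;
}

export const AI_TOOL_CATALOG: ToolSchema[] = [
  /* 29 schemas live here */
];

// Both consumers derive — no hand-maintained copies left to drift.
export const PROPOSABLE_TOOLS = AI_TOOL_CATALOG.filter((t) => t.defaultPolicy === 'propose');
export const AUTO_TOOLS = AI_TOOL_CATALOG.filter((t) => t.defaultPolicy === 'auto');
```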
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Catches up all docs with the current state of the AI tool system.
services/mana-ai/CLAUDE.md:
- New v0.6 status section documenting NewsResearchClient,
pre-planning research injection, config.manaApiUrl, and the full
28-tool / 11-module inventory (17 propose + 11 auto).
apps/mana/CLAUDE.md:
- New "Tool Coverage" table in the AI Workbench section listing all
tools per module with their policy (propose vs auto).
- New "Templates" subsection documenting the two-section gallery
(agent vs workbench templates), the seed-handler registry, and
the current handlers (meditate, habits, goals).
- Architecture cross-reference updated to include §23.
docs/architecture/COMPANION_BRAIN_ARCHITECTURE.md:
- §23.2 gains a "Server-Side Research (mana-ai, ab v0.6)" subsection
explaining how NewsResearchClient mirrors the client-side research
pre-step: same endpoints, same trigger regex, but HTTP-direct from
the Docker network instead of SvelteKit-internal.
docs/plans/README.md:
- workbench-templates.md added to the roadmap table (T1 shipped).
- Multi-agent description updated to mention 28 tools + server-side
web-research.
- Architecture cross-reference includes §23.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two major tool expansions — the Recherche-Agent and Today-Agent can
now research the web autonomously (no browser needed), and a future
Meeting-Prep agent can read + create contacts.
=== research_news (server-side execution) ===
The biggest addition: mana-ai can now call mana-api's news-research
endpoints (POST /discover + /search) directly, without a browser.
Infrastructure:
- services/mana-ai/src/planner/news-research-client.ts — full HTTP
client with discover→search pipeline. 15s/30s timeouts. Graceful
null on any failure (network, mana-api down, bad response) so the
tick never crashes from research errors.
- config.manaApiUrl added (default http://localhost:3060); wired in
docker-compose.macmini.yml as http://mana-api:3060 + depends_on
mana-api with service_healthy condition.
Pre-planning research step (cron/tick.ts):
- Before the planner prompt is built, the tick checks if the
mission's objective or conceptMarkdown matches research keywords
(same RESEARCH_TRIGGER regex the webapp uses). When it matches:
* NewsResearchClient.research(objective) runs discovery + search
* Results are injected as a synthetic ResolvedInput with id
'__web-research__' and a formatted markdown context block
* The Planner then sees real article URLs/titles/excerpts and can
reference them in create_note / save_news_article steps
* Log line: "pre-research: N feeds, M articles"
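Sketch of the synthetic-input injection (ResolvedInput narrowed to the fields named above):

```ts
interface Article { title: string; url: string; excerpt: string }
interface ResolvedInput { id: string; content: string }

export function toResearchInput(articles: Article[]): ResolvedInput {
  const lines = articles.map((a) => `- [${a.title}](${a.url}) — ${a.excerpt}`);
  return {
    id: '__web-research__',
    content: `## Web-Research\n${lines.join('\n')}`,
  };
}
// The planner then cites real URLs in create_note / save_news_article steps.
```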
Tool registration:
- research_news added to AI_PROPOSABLE_TOOL_NAMES + mana-ai tools.ts
with params (query, language?, limit?). This lets the planner also
explicitly propose a research step as a PlanStep (in addition to
the pre-planning auto-injection).
=== create_contact ===
- Added to AI_PROPOSABLE_TOOL_NAMES + mana-ai tools.ts with params
(firstName required, lastName/email/phone/company/notes optional).
- Contacts are encrypted at rest; server planner can plan the step
but execution stays on the webapp (same as all propose tools).
Full server-side contact resolution via Key-Grant is a future
enhancement.
- get_contacts added to webapp AUTO_TOOLS so agents can inspect
existing contacts without nagging (read-only, auto-policy).
Module coverage now:
✅ todo (5) ✅ calendar (2) ✅ notes (5) ✅ places (4)
✅ drink (3) ✅ food (2) ✅ news (1) ✅ journal (1)
✅ habits (3) ✅ news-research (1) ✅ contacts (1)
11 modules, 28 tools total (17 propose, 11 auto).
Tests: mana-ai 41/41 (drift-guard passes), shared-ai type-check
clean, webapp svelte-check 0 errors, 0 warnings.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes the three biggest tool-coverage gaps so the shipped agent
templates can actually do their job end-to-end. Before this, the
Recherche-Agent couldn't create notes (only edit), the Today-Agent
couldn't create journal entries, and no habit-related tool was
server-proposable at all.
shared-ai (proposable-tools.ts):
- create_note (notes) — key unlock: Recherche-Agent now creates
per-source notes and the summary report.
- create_journal_entry (journal) — key unlock: Today-Agent proposes
a poem as a journal entry with optional mood.
- create_habit (habits) — agent can suggest new habits.
- log_habit (habits) — agent can log a habit completion for today.
Organized the list with per-module section comments for readability
now that we're at 15 proposable tools.
mana-ai (planner/tools.ts):
- 5 new tool definitions with full parameter schemas:
* create_note (title, content?)
* create_journal_entry (content, title?, mood? enum)
* create_habit (title, icon, color)
* log_habit (habitId, note?)
- Drift-guard contract test passes (41/41) — confirms the mana-ai
tool list is in sync with the shared-ai canonical set.
Webapp (policy.ts):
- get_habits added to AUTO_TOOLS (read-only; agent can inspect
which habits exist without nagging the user for approval).
- list_notes added to AUTO_TOOLS (was already used in the reasoning
loop but missing from the explicit auto-list; the planner default
fell through to 'propose' which was wasteful for a read op).
Module coverage after this change:
✅ todo (5 tools) ✅ calendar (2) ✅ notes (5 incl. create)
✅ places (4) ✅ drink (3) ✅ food (2)
✅ news (1) ✅ journal (1) ✅ habits (3)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Makes the "read all notes and tag them #Natur/#Technologie/…" use case
fully functional. Four new ModuleTool entries in notes/tools.ts:
- list_notes(limit?, query?, includeArchived?) — auto, read-only. Returns
id + title + excerpt so the planner can reference concrete notes
without dumping full bodies.
- update_note(noteId, title?, content?) — proposable. Destructive full
overwrite. Docstring nudges toward append_to_note when applicable.
- append_to_note(noteId, content) — proposable, non-destructive. Handles
the trailing-newline separator so markdown stays clean.
- add_tag_to_note(noteId, tag) — proposable, idempotent, case-insensitive.
Strips leading #, replaces spaces with _, skips if already present.
Exactly the categorization primitive the user asked for.
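The normalization rules from the bullet above, as a sketch:

```ts
// Strip a leading #, replace spaces with _, skip when already present
// (case-insensitive), so repeated proposals stay idempotent.
export function normalizeTag(tag: string): string {
  return tag.replace(/^#/, '').replace(/ /g, '_');
}

export function addTag(existing: string[], tag: string): string[] {
  const next = normalizeTag(tag);
  const present = existing.some((t) => t.toLowerCase() === next.toLowerCase());
  return present ? existing : [...existing, next];
}
```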
All three writes are added to AI_PROPOSABLE_TOOL_NAMES so both the
webapp policy and mana-ai's boot-time drift guard agree (now 11 tools).
Mirrored in services/mana-ai/src/planner/tools.ts.
AiProposalInbox mounted on /notes so approvals land inline in the
notes module too (already appears in the mission-detail cross-module
inbox via the earlier commit).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Multi-Agent Workbench shipped end-to-end (commits 1771063df through
7c89eb625). This commit turns the plan doc into a proper history + post-
mortem and captures the deferred Team-Workbench as its own forward plan
so the architectural breadcrumbs don't rot.
docs/plans/multi-agent-workbench.md:
- Status bumped to ✅ Shipped; every phase checkbox flipped.
- Open-questions section rewritten with the decisions that were
actually made (name-unique via store write-time check, per-source
system principalIds, policy fully migrated, scene binding default-
empty with smart suggestion).
- New "Shipping-Historie" table mapping each phase to its commit, the
number of files touched, and the test outcome.
- New "Lessons Learnt + Follow-Up Ideen" with:
* What went better than expected (L3 Actor cutover, getOrCreate
instead of unique index, displayName caching)
* Thin spots worth revisiting (avatar not on Actor, missing token
counter for budget, no missions list on agent detail, no
drag-reassign, scene binding doesn't drive filters yet)
* Five deferred follow-up projects (team features, agent memory
self-update, agent-to-agent messaging, meta-planner, per-agent
encryption domains)
docs/plans/team-workbench.md (NEW):
- Full forward-looking plan for the deferred Team-Workbench.
- Two use-cases (human multi-user vs multi-agent sharing team
context) with the observation that they share the same infra.
- Decision candidates table (still open — meant as T0 RFC fodder,
not baked in).
- Architecture sketch with data-model deltas over the current
single-user shape.
- Encryption subsection dedicated to the hardest problems: team-key
wrapping per member (reuses Mission-Grant pattern), member-removal
rotation (lazy vs eager), Zero-Knowledge-mode incompatibility.
- T0..T6 phasing (~7 weeks for a clean first-pass).
- Section "Wie Multi-Agent dafür den Weg geebnet hat" enumerating
the four invariants the shipped Phase 0-7 deliberately preserved
to make this plan cheap when it lands.
docs/plans/README.md (NEW):
- Index doc with the AI/Workbench roadmap as an ASCII flow so future
contributors can locate themselves in the sequence without reading
three 400-line plans first.
docs/future/AI_AGENTS_IDEAS.md:
- Header marks Point 1 (encrypted tables) as shipped via the Mission
Grant plan; points 2-8 stay relevant. Cross-link to all three plan
docs so this stays the go-to backlog.
services/mana-ai/CLAUDE.md:
- Design-context header expanded to link to all four related docs
(arch §20-22, both shipped plans, forward team plan, ideas backlog).
No code changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 6 — Multi-Agent observability:
- AI Workbench timeline gets a per-agent filter (dropdown with avatars)
alongside module + mission. TimelineBucket gains agentId +
agentDisplayName, projected off the bucket's first AI actor.
- Bucket header now leads with the agent's avatar + name (lookup via
the live useAgents query so renamed agents reflect instantly) and
falls back to Actor.displayName for deleted agents.
- AiProposalInbox card header replaces the generic Sparkle + "KI
schlägt vor" ("AI suggests") with an agent chip "🤖 Cashflow Watcher
schlägt vor" using the cached Actor.displayName. Ghost-agent label preserved
via the cached displayName even when the agent record is gone.
Phase 7 — Docs:
- docs/architecture/COMPANION_BRAIN_ARCHITECTURE.md §22 added:
data model, identity flow, tick gate order, Scene-Agent binding
semantics, non-goals.
- services/mana-ai/CLAUDE.md status bumped to v0.5 (Multi-Agent
Workbench) with the per-agent runner features + metrics listed.
- apps/mana/CLAUDE.md AI Workbench section rewritten to cover the
Agent primitive, per-agent policy, scene lens, and the updated
timeline header.
Multi-Agent rollout is code-complete end-to-end:
Phase 0 Plan ✓ Phase 4 Policy-per-agent ✓
Phase 1 Actor identity ✓ Phase 5 Agent UI + Scene lens ✓
Phase 2 Agent CRUD ✓ Phase 6 Observability ✓
Phase 3 Tick agent-aware ✓ Phase 7 Docs ✓
Tests: webapp svelte-check 0 errors, 0 warnings.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Until now AiPolicy lived as a user-global setting consulted for every
AI action. With agents as the principal unit of AI behavior, policy
belongs on the agent — different agents can be aggressive about tasks
but conservative about calendar edits, etc.
Webapp (tools/executor.ts):
- When an AI actor invokes a tool, the executor looks up the owning
agent via getAgent(actor.principalId) and passes agent.policy into
resolvePolicy. Falls back to DEFAULT_AI_POLICY when the agent record
is missing (legacy write, deleted agent, race) so no tool call can
silently bypass the propose/deny path.
- resolvePolicy already accepted an optional policy arg, so the call
site change is a single line plus the agent load.
Server (mana-ai):
- ServerAgent gains an optional policy field, projected off the same
plaintext JSONB that the webapp writes.
- Tick loop filters AI_AVAILABLE_TOOLS through filterToolsByAgentPolicy
before passing them to the planner prompt. Resolution order mirrors
the webapp: tools[name] → defaultsByModule → defaultForAi; 'deny'
drops the tool so the LLM never even sees it.
Phase 5 will surface a per-agent policy editor on the agent-detail
UI. Until then all agents inherit DEFAULT_AI_POLICY (baked in during
createAgent), which means no behavior change for existing users —
every tool that was 'propose' before is still 'propose' now, just
reached via agent.policy instead of the user-level singleton.
Tests: mana-ai 41/41, webapp svelte-check clean.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Third phase of the Multi-Agent Workbench. The background mission
runner now respects the owning Agent: agent state gates whether
a mission runs, concurrency is capped per-agent, and server-produced
iterations carry the agent's identity as their Actor.
Data layer:
- db/migrate.ts: new mana_ai.agent_snapshots table (mirrors
mission_snapshots) with indexes on (user_id, last_applied_at) and
a partial index on active agents.
- db/agents-projection.ts: refreshAgentSnapshots (incremental LWW
replay over sync_changes appId='ai' table='agents') +
loadActiveAgents / loadAgent helpers. mergeRaw exported for tests.
- db/missions-projection.ts: ServerMission.agentId + projection
reads the JSONB field (undefined for legacy missions).
Tick integration (cron/tick.ts):
- Refreshes both snapshot tables on every pass (parallel).
- Per-user in-tick agent cache (Map<userId, Map<agentId, Agent>>)
so N missions for one user hit the DB once.
- Gate order: agent archived → skip silently; agent paused → skip;
per-agent maxConcurrentMissions exhausted this tick → defer to next.
All skip paths bump mana_ai_agent_decisions_total{decision}.
- Prompt injection: withAgentContext prepends an <agent_context>
block to the system prompt with the agent's name + role, and
plaintext systemPrompt + memory when available. Ciphertext values
(enc:1:… blobs) are skipped — the server has no key by design. Mirrors
the Mission Grant privacy stance: encrypted context belongs to the
foreground runner.
Iteration writer (db/iteration-writer.ts):
- New optional `agent` + `iterationId` + `rationale` inputs.
- When agent is present, the sync_changes row is stamped with a
makeAgentActor actor (principalId=agentId, displayName=agent.name)
so the webapp timeline groups the write under the right agent.
- Falls back to an AI actor with LEGACY_AI_PRINCIPAL + 'Mana' when
the mission has no owning agent; ultimate fallback to the
mission-runner system actor when iterationId is also missing.
Metrics:
- mana_ai_agent_decisions_total{decision=ran|skipped-paused|
skipped-archived|skipped-concurrency}. Missions without an agent
don't produce this metric — plansWrittenBackTotal is the universal
"did we run" counter.
Tests: 41/41 (was 35) including 6 new cases for the agent LWW merge.
mana-ai type-check clean. Webapp svelte-check: 0 errors (4 unrelated
warnings in a different module).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Foundation for the Multi-Agent Workbench roadmap
(docs/plans/multi-agent-workbench.md). Every event, record, and
sync_changes row now carries a principal identity + cached display
name in addition to the three-kind discriminator.
Shape change (source of truth in @mana/shared-ai):
Before: { kind: 'user' | 'ai' | 'system', ...kind-specific fields }
After: discriminated union on kind, with
- common: principalId, displayName
- 'user': principalId = userId
- 'ai': principalId = agentId + missionId/iterationId/rationale
- 'system': principalId = one of SYSTEM_* sentinel strings
('system:projection', 'system:mission-runner', etc.)
Key design calls (from the plan's Q&A):
- System sub-sources get distinct principalIds (not a shared 'system'
bucket) — lets Workbench filter + revert distinguish projection
writes from migration writes from server-iteration writes
- displayName cached on the record so renaming an agent doesn't
rewrite history
- normalizeActor() compat shim fills principalId/displayName on
legacy rows with 'legacy:*' sentinels so historical events never
crash the timeline
New exports:
- BaseActor / UserActor / AiActor / SystemActor (narrowed types)
- makeUserActor, makeAgentActor, makeSystemActor (factories with
typed return)
- SYSTEM_PROJECTION, SYSTEM_RULE, SYSTEM_MIGRATION, SYSTEM_STREAM,
SYSTEM_MISSION_RUNNER (principalId constants)
- LEGACY_USER_PRINCIPAL, LEGACY_AI_PRINCIPAL, LEGACY_SYSTEM_PRINCIPAL
- isUserActor / isFromMissionRunner predicates
Webapp:
- data/events/actor.ts now re-exports from shared-ai, keeps runtime
ambient-context (runAs, getCurrentActor) local
- bindDefaultUser(userId, displayName) lets the auth layer replace
the legacy placeholder with the real logged-in user actor at login
- Mission runner + server-iteration-staging stamp LEGACY_AI_PRINCIPAL
as the agentId placeholder — Phase 2 will thread the real agent
- Streaks projection uses makeSystemActor(SYSTEM_PROJECTION)
- All test fixtures migrated to factories
Service:
- mana-ai/db/iteration-writer.ts stamps makeSystemActor(
SYSTEM_MISSION_RUNNER) instead of the old { kind:'system',
source:'mission-runner' } shape. Phase 3 will switch this to an
agent actor per mission.
Tests: 26 shared-ai + 21 webapp vitest + 35 mana-ai — all green.
svelte-check: 0 errors, 0 warnings.
No behavior change; purely a type + shape upgrade. Old sync_changes
rows parse via the normalizeActor compat shim at read time.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mission objectives matching /recherch|research|news|finde|suche|aktuelle|neueste/i
trigger a synchronous deep-research call (mana-search + mana-llm via the
existing /api/v1/research/start-sync pipeline) before the planner runs;
the summary plus top-8 source URLs are injected as a synthetic ResolvedInput
so the planner can stage save_news_article proposals against real URLs.
The kontext singleton is auto-attached to every mission's planner input
(decrypted client-side, gated on non-empty content + not already linked).
save_news_article is a new proposable tool routed through articlesStore
.saveFromUrl (Readability via /api/v1/news/extract/save). AiProposalInbox
mounted on /news so the user can approve/reject inline. mana-ai planner
tool list mirrors the new tool to keep the boot-time drift guard happy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
shared-hono declares @mana/shared-logger as a workspace dep. Without
that package in the installer stage, Bun fails at runtime with ENOENT
reading /app/packages/shared-hono/node_modules/@mana/shared-logger.
Caught when mana-ai crash-looped on first boot.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire the Mission Key-Grant feature into the production Mac Mini
compose stack so mana-ai can boot and mana-auth can mint grants.
- New mana-ai service block (port 3066) — 256m mem limit, depends on
postgres + mana-llm, tick interval configurable via
MANA_AI_TICK_INTERVAL_MS / MANA_AI_TICK_ENABLED. Pulls
MANA_AI_PRIVATE_KEY_PEM from env; absent = grants silently disabled.
- mana-auth environment gains MANA_AI_PUBLIC_KEY_PEM (default empty
so existing deployments without the keypair degrade to 503
GRANT_NOT_CONFIGURED rather than failing to boot).
- mana-auth Dockerfile rewritten to the two-stage pnpm+bun pattern
used by mana-credits/mana-events — required now that mana-auth has
a @mana/shared-ai workspace dep. The previous single-stage
Dockerfile with service-scoped build context couldn't resolve any
@mana/* imports; that only worked historically because it fell
through at runtime via a pre-built layer.
- mana-ai Dockerfile copies packages/shared-ai into the installer
stage alongside shared-hono.
The build contexts for mana-auth flip from services/mana-auth to the
repo root. Existing CI/CD paths (scripts/mac-mini/build-app.sh) pass
through to docker compose build and pick up the new context
automatically — no script edits needed.
Flip-on procedure: on the Mac Mini, set MANA_AI_PUBLIC_KEY_PEM +
MANA_AI_PRIVATE_KEY_PEM in .env (already done, see
secrets/mana-ai/README.md on the host), then rebuild mana-auth +
build mana-ai.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 3 — user-facing side of the Mission Key-Grant rollout. Users
can now opt into server-side execution, revoke it, and inspect every
decrypt the runner has performed.
Webapp:
- MissionGrantDialog explains the scope (record count, tables, TTL,
audit visibility, revocation) and calls requestMissionGrant. Error
paths render distinctly for ZK, not-configured, missing vault.
- Mission detail shows a Server-Zugriff (server access) box with a
status pill (aktiv/abgelaufen/nicht erteilt = active/expired/not
granted) and Neu-erteilen (re-grant) + Zurückziehen (revoke)
buttons. Only renders for missions with at least one encrypted-
table input.
- store.ts: setMissionGrant / revokeMissionGrant helpers, Proxy-
stripped like the rest of the store's writes.
- Workbench adds a Timeline/Datenzugriff (data access) tab switch.
Audit tab queries
the new GET /api/v1/me/ai-audit endpoint, renders decrypt events
with color-coded status pills (ok/failed/scope-violation) and
stable reason strings.
- getManaAiUrl() added to api/config for the audit fetch.
mana-ai:
- GET /api/v1/me/ai-audit (JWT-gated via shared-hono authMiddleware)
backed by readDecryptAudit() — withUser + RLS double-gate so a user
can only read their own rows.
- Limit capped at 1000, newest-first.
Missions without a grant continue to work exactly as before; the
grant UI is purely additive.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 2 of Mission Key-Grant. The tick loop now honours a mission's
grant by unwrapping the MDK and passing it + the record allowlist into
the resolvers. Encrypted modules (notes, tasks, calendar, journal,
kontext) resolve server-side instead of returning null.
- crypto/decrypt-value.ts: mirror of the webapp AES-GCM wire format
(enc:1:<iv>.<ct>) — read-only, the server never wraps (sketch after
this list)
- db/resolvers/encrypted.ts: factory + 5 concrete resolvers. A scope
violation bumps a metric + writes a structured audit row; decrypt
failures do the same. Zero-decrypt (no grant, or record absent) =
silent null, no audit noise.
- db/audit.ts: best-effort append to mana_ai.decrypt_audit; write
failures never cascade into tick failures.
- cron/tick.ts: buildResolverContext unwraps grant per mission; MDK
reference only lives for the scope of planOneMission.
- ResolverContext plumbed through resolveServerInputs; existing goals
resolver unchanged semantically.
- Metrics: mana_ai_decrypts_total{table}, mana_ai_grant_skips_total
{reason}, mana_ai_grant_scope_violations_total{table} (alert > 0).
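Read-side sketch of the wire format — base64 encoding of the two segments is an assumption; the real file is crypto/decrypt-value.ts:

```ts
export async function decryptValue(wire: string, mdk: CryptoKey): Promise<string> {
  if (!wire.startsWith('enc:1:')) throw new Error('malformed ciphertext');
  const [ivB64, ctB64] = wire.slice('enc:1:'.length).split('.');
  const iv = Uint8Array.from(Buffer.from(ivB64, 'base64'));
  const ct = Uint8Array.from(Buffer.from(ctB64, 'base64'));
  // Read-only mirror of the webapp's AES-GCM format — the server never wraps.
  const plain = await crypto.subtle.decrypt({ name: 'AES-GCM', iv }, mdk, ct);
  return new TextDecoder().decode(plain);
}
```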
Missions without a grant still run exactly as before — plaintext
resolvers fire, encrypted ones short-circuit to null. No behaviour
regression for existing users.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 1 of the Mission Key-Grant rollout. Webapp can now request a
wrapped per-mission data key; mana-ai can unwrap and (Phase 2) use it.
mana-auth:
- POST /api/v1/me/ai-mission-grant — HKDF-derives MDK from the user
master key, RSA-OAEP-2048-wraps with the mana-ai public key, returns
{ wrappedKey, derivation, issuedAt, expiresAt }
- MissionGrantService refuses zero-knowledge users (409 ZK_ACTIVE) and
returns 503 GRANT_NOT_CONFIGURED when MANA_AI_PUBLIC_KEY_PEM is unset
- TTL clamped to [1h, 30d]
mana-ai:
- configureMissionGrantKey + unwrapMissionGrant with structured failure
reasons (not-configured / expired / malformed / wrap-rejected)
- mana_ai.decrypt_audit table + RLS policy scoped to
app.current_user_id — append-only row per server-side decrypt attempt
- MANA_AI_PRIVATE_KEY_PEM env slot; absent = grants silently disabled
No existing behaviour changes: missions without a grant run exactly as
before. Grant flow is wired end-to-end but unused until Phase 2 lands
the encrypted resolver.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wires mana-ai into the existing observability stack so tick throughput,
plan-failure rates, planner latencies, and snapshot refresh health are
visible in Grafana + Prometheus, and the service's uptime surfaces on
status.mana.how under the "Internal" section.
- `src/metrics.ts` — prom-client Registry with `mana_ai_` prefix.
Counters: ticks_total, plans_produced_total, plans_written_back_total,
parse_failures_total, mission_errors_total, snapshots_new/updated,
snapshot_rows_applied_total, http_requests_total.
Histograms: tick_duration_seconds (0.1–120s),
planner_request_duration_seconds (0.25–60s),
http_request_duration_seconds (0.005–10s).
- `src/index.ts` — HTTP middleware labels every request by
method/path/status; `/metrics` serves the Prometheus text format.
- `src/cron/tick.ts` — increments counters + wraps the tick with
`tickDuration.startTimer()`. Snapshot stats fold through.
- `src/planner/client.ts` — wraps `complete()` in a latency histogram
timer so planner tail latency shows up separately from tick duration.
- `docker/prometheus/prometheus.yml` —
1. New `mana-ai` scrape job against `mana-ai:3066/metrics` (30s).
2. `/health` added to the `blackbox-internal` job so uptime shows on
status.mana.how alongside mana-geocoding.
- `scripts/generate-status-page.sh` — friendly label for the new probe:
`mana-ai:3066/health` → "Mana AI Runner" (generator already iterates
`blackbox-internal`, no other changes needed).
- `package.json` — prom-client ^15.1.3
All 17 Bun tests still pass; tsc clean.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the O(N sync_changes) LWW replay in every tick with an
incremental snapshot table refresh. Each tick now applies only the
delta since the last run, then runs a single indexed SELECT on the
snapshot table to find due missions.
- `db/migrate.ts` — idempotent migration. Creates `mana_ai` schema and
`mana_ai.mission_snapshots` table on boot. Partial index on
active+nextRunAt powers the tick's "due" query.
- `db/snapshot-refresh.ts`
- `refreshSnapshots(sql)` one-pass: joins sync_changes and snapshots
on (user_id, mission_id), picks out pairs whose source max
created_at exceeds the snapshot cursor. Per-pair refresh wrapped
in `withUser` for RLS scoping on the source SELECT.
- Bootstrap: missing snapshot rows seed from a full replay of their
mission's history; subsequent ticks apply only the delta.
- Delete tombstones purge the snapshot row.
- `db/missions-projection.ts` `listDueMissions` — single SELECT against
`mana_ai.mission_snapshots` with an indexed WHERE. Dropped the legacy
cross-user scan + per-user two-phase read (unused now). `mergeAndFilter`
stays for its existing test coverage.
- `cron/tick.ts` calls `refreshSnapshots` before `listDueMissions` and
logs when the refresh actually applied rows. No behaviour change
externally.
- `index.ts` awaits `migrate()` on boot (top-level `await` — Bun
supports it natively).
Closes the last item on the AI-Workbench roadmap's "future work" list.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes the "cross-user scan" caveat on the mission read path. The
earlier implementation pulled every aiMissions row server-wide and
partitioned by user_id in memory — fine for a pre-launch single-user
deploy, not a cross-user infrastructure.
New flow:
1. `listMissionUsers(sql)` — one cross-user DISTINCT query. This is
the ONLY surface that still reads across users; documented as
requiring BYPASSRLS on the service's DB role (or ownership without
FORCE).
2. `listDueMissionsForUser(sql, userId, now)` — RLS-scoped via
`withUser(sql, userId, tx => ...)` just like the write path in
`iteration-writer.ts`. Defense-in-depth: even if the SELECT mis-
filters, RLS drops any row whose user_id doesn't match the session
setting.
3. `listDueMissions(sql, now)` — two-phase composition of the above.
The LWW merge + due-filter logic is factored out into a pure
`mergeAndFilter(rows, userId, now)`. Fully unit-tested (6 Bun cases):
active-due happy-path, future nextRunAt, non-active state, delete
tombstone, multi-row LWW merge, userId stamping.
Matches the pattern already in use for writes (`db/connection.ts:withUser`
+ `db/iteration-writer.ts`). Docstring on `listMissionUsers` spells out
the remaining BYPASSRLS dependency so ops knows what role the service
needs.
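The helper, sketched with postgres.js (the shipped version lives in db/connection.ts):

```ts
import postgres from 'postgres';

export function withUser<T>(
  sql: postgres.Sql,
  userId: string,
  fn: (tx: postgres.TransactionSql) => Promise<T>,
): Promise<T> {
  return sql.begin(async (tx) => {
    // set_config(..., true) behaves like SET LOCAL: scoped to this
    // transaction, so RLS policies on app.current_user_id see the right user.
    await tx`SELECT set_config('app.current_user_id', ${userId}, true)`;
    return fn(tx);
  }) as Promise<T>;
}
```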
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Makes it physically impossible for the webapp's AI policy and the
server's tool allow-list to drift apart. Adds the missing entries the guard caught on first
run: `complete_tasks_by_title`, `visit_place`, `undo_drink` now have
parameter schemas server-side too.
- `packages/shared-ai/src/policy/proposable-tools.ts`
- `AI_PROPOSABLE_TOOL_NAMES` as `const` array + literal union type
- `AI_PROPOSABLE_TOOL_SET` for set-membership checks
- Webapp `DEFAULT_AI_POLICY` derives its `propose` entries from the
shared list via `Object.fromEntries(...)` — adding a tool there is now
a one-line change in `@mana/shared-ai`
- mana-ai `AI_AVAILABLE_TOOLS`: module-load assertion compares its
hardcoded names against `AI_PROPOSABLE_TOOL_SET` and throws with a
pointed error on drift (extras in one direction, missing in the
other). Service refuses to start on mismatch — better than silent
degradation.
- Bun test (`tools.test.ts`) runs the same contract plus sanity checks
(non-empty description, required params carry docs). Vitest policy
test adds the symmetric check on the webapp side.
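The module-load assertion, sketched:

```ts
// Throws at import time, so the service refuses to start on drift.
function assertNoToolDrift(
  available: ReadonlyArray<{ name: string }>,
  canonical: ReadonlySet<string>,
): void {
  const extras = available.map((t) => t.name).filter((n) => !canonical.has(n));
  const missing = [...canonical].filter((n) => !available.some((t) => t.name === n));
  if (extras.length || missing.length) {
    throw new Error(
      `AI tool drift — extra on server: [${extras}] missing on server: [${missing}]`,
    );
  }
}
```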
All three runtimes now green: webapp 66/66, shared-ai 2/2,
mana-ai 9/9 Bun tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Plugs plaintext-safe Mission context into the Planner prompt per tick.
Before this, `resolvedInputs: []` was always passed — the LLM only saw
the mission's concept + objective. Now goals (the only plaintext
category of linked inputs today) resolve and land in the prompt.
Privacy constraint is explicit and documented: tables in the webapp's
encryption registry (notes, kontext, journal, dreams, …) arrive at
`sync_changes.data` as ciphertext — the master key lives in mana-auth
KEK-wrapped and never reaches this service. Resolvers for encrypted
modules therefore don't exist server-side; missions referencing them
should use the foreground runner which decrypts client-side.
- `db/resolvers/types.ts` — ServerInputResolver contract
- `db/resolvers/record-replay.ts` — single-record LWW replay
(tighter WHERE than `missions-projection.ts`, used by all resolvers)
- `db/resolvers/goals.ts` — reads `companionGoals` via replayRecord,
mirrors the webapp's default goalsResolver output shape
- `db/resolvers/index.ts` — registry with `registerServerResolver` /
`unregisterServerResolver` / `resolveServerInputs`. Seeds `goals`.
Drift-tolerant: missions pointing at unregistered modules silently
skip those inputs.
- `cron/tick.ts` — wires `resolveServerInputs(sql, m.inputs, m.userId)`
into the planner input; updates the outdated "stubbed" comment
5 Bun tests over the registry (handled + unhandled + thrown +
mixed cases + seeded default).
Future: expand to plaintext tables if/when more land (habits without
free-text, dashboard configs, tags), or introduce a decrypt-via-auth
sidecar if users opt into server-side access to encrypted content.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Completes the off-tab AI pipeline. mana-ai now writes produced plans
back to `sync_changes` as a server-sourced Mission iteration; the webapp
picks it up on next sync and translates each PlanStep into a local
Proposal via the existing createProposal flow. User sees the resulting
ghost cards in the matching module's AiProposalInbox with full mission
attribution.
Server (mana-ai v0.3):
- `db/connection.ts` — `withUser(sql, userId, fn)` RLS-scoped tx helper
mirroring the Go `withUser` pattern (SET LOCAL app.current_user_id)
- `db/iteration-writer.ts`
- `planToIteration(plan, id, now)` — shared-ai AiPlanOutput → inline
MissionIteration with `source: 'server'` + status='awaiting-review'
- `appendServerIteration(sql, input)` — INSERT sync_changes row with
op=update, data={iterations: [...]} + field_timestamps + actor
JSONB={kind:'system', source:'mission-runner'}
- `cron/tick.ts` — after parse success: build iteration, append to
mission.iterations, persist via appendServerIteration. Stats now
include `plansWrittenBack`.
Actor union:
- `packages/shared-ai/src/actor.ts` + webapp actor: `system.source` gains
`'mission-runner'` so the server's own writes are attributed correctly
and distinguishable from projection/rule writes
Webapp:
- `data/ai/missions/server-iteration-staging.ts`
- `startServerIterationStaging()` subscribes to aiMissions via Dexie
liveQuery; on each Mission update, walks iterations looking for
`source='server'` entries that haven't been staged yet
- For each such iteration: creates a Proposal per PlanStep under
`{kind:'ai', missionId, iterationId, rationale}` so policy + hooks
fire correctly
- Writes proposalIds back into plan[].proposalId + status='staged' so
other tabs and app restarts skip re-staging
- Idempotent: in-memory `processedIterations` Set + durable
proposalId marker
- Wired into (app)/+layout.svelte alongside startMissionTick
- 3 unit tests: translate server iteration → proposal, skip
already-staged, ignore browser iterations
Full pipeline now: user creates Mission in /companion/missions →
mana-ai tick picks it up → calls mana-llm → parses plan →
writes iteration → synced to webapp → staging effect creates
proposals → user approves in /todo (or any module) → task lands with
`{actor: ai, missionId, iterationId, rationale}` attribution.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>