Documents the SQL that was applied manually to match the personas.ts
Drizzle schema introduced in 493db0c3b. Idempotent. See
docs/plans/mana-mcp-and-personas.md for the design. Required because
the spaces tables created alongside personas sit outside the auth
schemaFilter, and pre-existing public enums would otherwise trip
drizzle-kit push (resolved separately in migration 006).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the loop on M2: when the compactor fires, the LLM needs to know
it's now seeing a <compact-summary> instead of raw turns so it
doesn't waste a turn asking about lost details or re-executing tools
whose responses are gone.
shared-ai:
- LoopState grows `compactionsDone: number` (cap-1 by current loop
policy, but shape kept as count for future multi-compact cycles).
- runPlannerLoop populates it on each reminder-channel call. New
loop test asserts [0, 1] sequence: round 1 before compaction,
round 2 after.
mana-ai:
- New producer `compactedReminder` — fires severity=info when
compactionsDone >= 1, wrapped in a German one-liner ("frag nicht
nach verlorenen Details").
- Injected FIRST in buildReminderChannel so the LLM frames the rest
of the round with "I'm looking at a summary" context. Metric
surface stays `{producer='compacted', severity='info'}`.
4 new reminder tests (3 pure producer + 1 composition-ordering) +
1 loop-wiring test. 77 shared-ai, 20 reminders.test.ts — green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two loose ends from M3/M4:
1. Tool_use_id-based error attribution in the persona-runner
-----------------------------------------------------------
The previous collectActionsFromMessage() flipped the *most recent*
ActionRow to 'error' when a tool_result carried is_error:true. That was
fine as long as Claude invoked tools strictly in sequence, but when
the planner pipelines multiple tools in one turn, a later tool_result
carries an earlier tool_use_id — the last-action fallback mis-
attributes the error.
runMainTurn() now keeps a tool_use_id → action-index Map for the
duration of the tick. On tool_use we stash block.id, on tool_result we
look up the exact ActionRow via tool_use_id and flip that one. The
"flip last" path survives as a pure fallback if a future SDK ever
ships a block without an id.
2. New audit:encrypted-tools script
-----------------------------------
scripts/audit-encrypted-tools.ts — loads registerAllModules() and
apps/mana/…/crypto/registry.ts, diffs every ToolSpec.encryptedFields
against the authoritative web-app ENCRYPTION_REGISTRY.
Catches three classes of drift:
- missing-table : tool declares a table the web-app doesn't encrypt
- field-drift : both agree a table is encrypted but the field lists
differ (half-encryption in the wire is silent death)
- disabled : web-app has enabled:false while the tool still
encrypts — advisory warning, not a fail
Negative-tested by injecting a deliberate drift on todo.create +
todo.list (shortened ENCRYPTED_FIELDS to ['title']); the auditor
flagged both tools with full field diffs, restore returned to green.
Wired into `pnpm run validate:all` so the contract survives future
edits on either side. Fills the M4 audit gap noted in
project_mana_mcp_personas.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Claude-Code wU2 pattern goes live. Every mission run now passes a
compactor into runPlannerLoop that will fire once if cumulative token
usage crosses 92% of MANA_AI_COMPACT_MAX_CTX (default 1_000_000, the
gemini-2.5-flash ceiling). Override via env for deployments on smaller
models; set to 0 to disable entirely.
The compactor reuses the planner's own LlmClient + gemini-2.5-flash
model for now. When mana-llm grows a Haiku tier we'll route the
compactor there — it's pure summarisation and a cheaper model saves
tokens exactly where they matter.
New metrics:
- mana_ai_compactions_triggered_total — counter, one per firing
- mana_ai_compacted_turns — histogram, how many middle turns got
folded each time (< 3 ⇒ maxCtx is probably misconfigured)
Logs print a 60-char tail of the summary.goal so the "what was this
mission doing again" question survives a compaction.
No new tests here — compactHistory and the loop wiring are already
covered by the 22 tests in shared-ai (M2.1 + M2.2). The 57 existing
mana-ai bun tests stay green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Superseded by the top-level docker-compose.dev.yml (which defines
searxng + redis as part of the unified dev stack via `pnpm docker:up`).
This per-service file was an artefact from before the unified setup
and no script / doc / README still references it.
An orphan `mana-searxng-dev` + `mana-search-redis-dev` had been running
from this file for ~2 weeks, squatting on the host's port 8080. Every
first `pnpm dev:mana:all` after a cold machine start would fail with
Bind for 0.0.0.0:8080 failed: port is already allocated
because the top-level compose's `mana-searxng` service couldn't take
8080 while the orphan held it. The second invocation silently
"worked" — docker saw the freshly-created mana-searxng container and
skipped the bind step on the idempotent up, leaving it healthy but
only reachable inside the docker network (8080/tcp, no external
publish).
Cleanup already done out-of-band:
docker compose -f services/mana-search/docker-compose.dev.yml down
docker compose -f docker-compose.dev.yml up -d --force-recreate searxng
Deleting the file so a stale `docker compose -f …/mana-search/dev.yml up`
can't resurrect the orphan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Producers now return structured {producer, severity, text} objects
instead of raw strings. buildReminderChannel collects them, increments
mana_ai_reminders_emitted_total{producer, severity} per emission, and
maps back to strings for the shared-ai loop input.
Why structured: the Prometheus label "severity" lets dashboards split
75-99% token-budget warnings (severity=warn) from 100%+ escalations
(severity=escalate) without NLP on the reminder text. Adding a new
producer that emits only info-level state (e.g. stale-sync warning)
falls out for free.
Active producer labels today:
- token-budget (warn, escalate)
- retry-loop (warn)
With this plus the scrape job (d087b4744), we can finally answer:
"does the budget warning actually change LLM behaviour?" — correlate
reminders_emitted_total{producer='token-budget'} with
tick_duration_seconds or planner_rounds_histogram.
3 tests updated to assert the new {producer, severity, text} shape
(16 reminder tests total, all green).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends LoopState with a sliding window of the last N ExecutedCalls
(oldest-first), capped at LOOP_STATE_RECENT_CALLS_WINDOW = 5. The loop
maintains the window automatically; reminderChannel producers read it
without touching internal state.
This activates retryLoopReminder which was shape-only in faa472be9.
The guard now fires end-to-end: when round >= 3 and the tail-2 calls
both returned success:false, the LLM sees a "stop retrying, write a
summary instead" <reminder> on the next turn. The tail-2 check rather
than window-wide is deliberate — a flaky run with intermittent success
(F, F, F, OK, F) is not a retry loop, just flaky tools.
Why window=5: retry loops usually manifest within 2-3 consecutive
rounds; a 5-deep window gives room for burst-detection and
stale-tool heuristics without bloating the reminder channel. Cap
keeps the reminder producers O(5) regardless of loop length.
Tests: 3 new (sliding-window cap + slide + order in shared-ai, retry
composition + budget+retry chain + tail-only heuristic in mana-ai).
Total agent-loop tests now 74 across both packages.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mana-mcp:
- Policy-gate section: POLICY_MODE semantics, the four decision
rules, where to find soak metrics during log-only burn-in.
- /metrics section pointing at the Prometheus job.
mana-ai:
- New v0.8 status block: reminderChannel wiring, the two live
producers (tokenBudgetReminder active, retryLoopReminder dormant
pending LoopState extension), why POLICY_MODE here is limited to
freetext inspection, why parallel-reads have no effect until the
tool-registry absorbs the full AI_TOOL_CATALOG (M4 of personas).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the stub /metrics endpoint with a real prom-client registry
(mana_mcp_ prefix, {service="mana-mcp"} default label). Default
process metrics come along for free.
Policy-gate telemetry is the whole point — without it we can't soak
POLICY_MODE=log-only safely or decide when to flip to enforce. New
counter mana_mcp_policy_decisions_total{decision, reason, mode} buckets
every evaluatePolicy() call:
decision ∈ {allow, deny, flagged}
reason ∈ {admin-scope-not-invokable, destructive-not-allowed,
rate-limit-exceeded, injection-marker, clean, unknown}
mode ∈ {log-only, enforce}
So the rate of "would have been denied" during soak is visible directly
as policy_decisions_total{decision="deny", mode="log-only"}.
Also:
- mana_mcp_tool_invocations_total{tool, outcome} — success |
handler-error | input-invalid. Policy denies are NOT counted here
(they're in policy_decisions_total above); this counter only counts
calls that actually reached the handler or tripped zod validation.
- mana_mcp_tool_duration_seconds histogram per tool/outcome.
Dep: prom-client ^15.1.3 (same version mana-ai pins).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous commit 38dc80654 carries this M3 title but its payload is an
unrelated apps/api/picture change — shared-.git-index race with a
parallel session (see feedback_git_workflow.md). This commit holds the
actual M3.b/c/d code. Leaving the misnamed commit for the user to
re-attribute / revert as they prefer.
Closes the M3 loop from docs/plans/mana-mcp-and-personas.md. The
runner picks up due personas, drives each through Claude + MCP for
one simulated turn, collects actions + ratings, persists through
service-key internal endpoints in mana-auth.
Internal endpoints (mana-auth, service-key-gated)
- GET /api/v1/internal/personas/due
Returns personas whose tickCadence + lastActiveAt say they're
due. Rules: hourly > 1h, daily > 24h, weekdays > 24h mon-fri.
NULLS FIRST so never-run personas go ahead of stale ones.
- POST /api/v1/internal/personas/:id/actions
Batch ≤ 500. Row ids are deterministic
`${tickId}-${i}-${toolName}` + ON CONFLICT DO NOTHING so the
runner can retry a tick without doubling audit rows. Also
bumps personas.last_active_at so the next /due call sees it.
- POST /api/v1/internal/personas/:id/feedback
Batch ≤ 100. Row id is `${tickId}-${module}` — natural key is
one rating per module per tick.
Runner tick pipeline (services/mana-persona-runner/src/runner/)
- claude-session.ts
Two phases per tick. runMainTurn feeds the persona's system
prompt + a German "simulate a day" user prompt to Claude Agent
SDK's query(), with mana-mcp wired in as a streamable-HTTP MCP
server. We iterate the returned AsyncGenerator and extract
tool_use blocks into ActionRows; a tool_result with
is_error=true flips the most recent action. runRatingTurn is a
fresh query() with tools:[] asking Claude in character to rate
each used module 1-5 as strict JSON. We parse with tolerance
for whitespace / fences. Unparseable output becomes a synthetic
'__parse' feedback row so operators see the failure.
- tick.ts
Orchestrator. Skips when config.paused. Fetches /due, processes
in batches of config.concurrency via Promise.allSettled so a
single persona failure never kills the batch. Returns
{due, ranSuccessfully, failed[], durationMs}.
- types.ts
ActionRow + FeedbackRow shapes shared between claude-session
and the internal client.
Runner bootstrap (src/index.ts)
- setInterval(config.tickIntervalMs) starts the tick loop on boot.
tickInFlight guards against overlap when Claude latency >
interval. If MANA_SERVICE_KEY or ANTHROPIC_API_KEY is missing,
loop is disabled with a warn line — /health + /diag/login still
work.
- POST /diag/tick (dev-only) fires one tick on demand, returns
the result. Avoids waiting a full interval during testing.
- Graceful SIGTERM/SIGINT shutdown clears the interval.
Client
- clients/mana-auth-internal.ts
X-Service-Key client for the three endpoints above.
Constructor throws on empty serviceKey — fail loud.
Boot smoke verified: /health returns ok, /diag/tick 500s with
descriptive messages when keys absent. Warning lines on boot when
keys are missing. Type-check green across mana-auth, tool-registry,
mcp, persona-runner.
M3 exit gate is the end-to-end smoke recipe (docker up → db:push →
seed:personas → diag/tick → psql) documented in
services/mana-persona-runner/CLAUDE.md.
M2.d (cross-space family/team memberships) still deferred.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First concrete piece of M3 (docs/plans/mana-mcp-and-personas.md). The
tick loop itself and the Claude Agent SDK + MCP integration are M3.b;
the action/feedback persistence endpoints are M3.c. This commit just
stands up the service so the remaining pieces have a shell to land in.
Service shape (Bun/Hono on :3070)
- src/config.ts
Env-driven configuration: auth URL, MCP URL, service key for
action/feedback callbacks (M3.c), Anthropic API key, deterministic
PERSONA_SEED_SECRET (must match scripts/personas/password.ts so the
runner can log back in without any stored credentials), tick
interval and concurrency, RUNNER_PAUSED kill-switch. Production
start asserts all secrets are set and the dev fallback secret is
rotated.
- src/password.ts
Bit-for-bit identical HMAC-SHA256 password derivation to
scripts/personas/password.ts. Duplicated deliberately: the two
sides can't share code (one is a repo-root utility script, the
other is a workspace service) but must stay in sync — comment
at the top calls this out.
- src/clients/auth.ts
Two upstream calls the runner needs for one tick: POST /auth/login
and GET /api/auth/organization/list. loginAndResolvePersonalSpace()
wraps both and picks the persona's auto-created personal space as
the write target (throws if none exists — Spaces-Foundation should
always have seeded one on signup).
- src/index.ts
Hono app: /health, /metrics (stub), and a dev-only /diag/login
endpoint that takes a persona email, derives the password, logs
in, resolves the personal space, and returns {userId, spaceId} as
an end-to-end sanity check. Disabled in production.
No tick loop yet — RUNNER_PAUSED prints an info line on boot, but
nothing fires. The dispatcher + Claude Agent SDK + MCP client land in
M3.b; the internal POST callbacks into mana-auth for persona_actions /
persona_feedback land in M3.c.
Infra
- Port 3070 added to docs/PORT_SCHEMA.md.
- Service listed in root CLAUDE.md next to mana-mcp.
- services/mana-persona-runner/CLAUDE.md documents what's built today,
what lands in M3.b/c, and the local diag smoke recipe.
Boot smoke verified: /health returns ok + paused/interval/concurrency,
/diag/login without email returns 400.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the M1 reminderChannel into the mana-ai mission runner with two
initial producers in services/mana-ai/src/planner/reminders.ts:
- tokenBudgetReminder — warns at 75% of the agent's daily cap, emits a
stronger "wrap up NOW" message at/above 100%. Uses pretick usage +
accumulated round usage so the warning tracks drift during a long
plan.
- retryLoopReminder — shape is in place (round≥3 + last 2 failures),
currently limited to the single lastCall LoopState exposes. Extends
cleanly once LoopState carries the full failure window.
buildReminderChannel composes active producers; the tick hoists
pretickUsage24h so the channel has the baseline. Each round the loop
re-evaluates the producers, so usage drift across rounds surfaces on
the NEXT turn.
Also exports LoopState + ReminderChannel from @mana/shared-ai top-level
so consumers don't need to reach into /planner.
Tests: 13 new bun tests covering thresholds, pretick+round summing,
composition, and per-round re-evaluation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three Claude-Code-inspired primitives for runPlannerLoop, derived from the
reverse-engineering reports in docs/reports/:
1. **Policy gate** (@mana/tool-registry) — evaluatePolicy() gates every tool
dispatch: denies admin-scope, denies destructive tools not in the user's
opt-in list, rate-limits per tool (30/60s default), flags prompt-injection
markers in freetext without blocking. Wired into mana-mcp with a
per-user rolling invocation log and POLICY_MODE env (off|log-only|enforce,
default log-only). mana-ai uses detectInjectionMarker only — tool dispatch
there is plan-only, so rate-limit/destructive checks don't apply yet.
2. **Reminder channel** (packages/shared-ai/src/planner/loop.ts) — new
reminderChannel callback in PlannerLoopInput. Called once per round with
LoopState snapshot (round, toolCallCount, usage, lastCall); returned
strings wrap in <reminder> tags and inject as transient system messages
into THIS LLM request only. Never pushed to messages[] — the Claude-Code
<system-reminder> pattern that keeps the KV-cache prefix stable.
3. **Parallel reads** (loop.ts) — isParallelSafe predicate enables
Promise.all dispatch when every tool_call in a round is parallel-safe,
in batches of PARALLEL_TOOL_BATCH_SIZE=10. Any non-safe call downgrades
the whole round to sequential. messages[] always appends in source
order, never completion order, so the debug log stays linear.
Default-off (undefined predicate) preserves pre-M1 behaviour.
Tests: 21 new in tool-registry (policy), 9 new in shared-ai (5 parallel,
4 reminder). All 74 green, type-check clean across 4 packages.
Design/plan: docs/plans/agent-loop-improvements-m1.md
Reports: docs/reports/claude-code-architecture.md,
docs/reports/mana-agent-improvements-from-claude-code.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Continuation of docs/plans/mana-mcp-and-personas.md. Personas are the
auto-test users the M3 runner will drive — they're real Mana users
(kind='persona', tier='founder'), registered through the same Better
Auth pipeline as humans, just stamped differently and metadata-tracked
so the persona-runner knows how to role-play them.
Schemas (auth namespace — personas are 1:1 with users, no reason for a
separate platform.* schema that the plan originally sketched)
- userKindEnum ('human' | 'persona' | 'system') + users.kind column,
wired into better-auth additionalFields so the JWT/user object carry
the flag. Default 'human' keeps every existing user untouched.
- auth.personas — 1:1 descriptor (archetype, systemPrompt, moduleMix
jsonb, tickCadence, lastActiveAt). CASCADE from users.id.
- auth.persona_actions — tick-grouped audit of every tool call the
runner makes (toolName, inputHash for dedup, result, latency).
- auth.persona_feedback — structured 1-5 ratings per module per tick,
plus free-text notes. This is where the runner writes the
self-reflection step at end of each tick.
Admin endpoints (/api/v1/admin/personas, admin-tier-gated)
- POST / create-or-update by email. Uses auth.api.signUpEmail
if the user's new, then stamps kind+tier+verified
and upserts the personas row. Idempotent — safe to
re-run after catalog edits.
- GET / list with 7-day action count per persona.
- GET /:id detail + recent 20 actions + per-module feedback
aggregate.
- DELETE /:id hard delete. Refuses non-persona users as
defense-in-depth: an admin typo here would cascade
through the full user-delete chain.
Catalog + seed pipeline (scripts/personas/)
- catalog.json 10 handwritten personas spanning 7 archetypes
(adhd-student, ceo-busy, creative-parent, solo-dev,
researcher, freelancer, overwhelmed-newbie).
Five pairs of personas that will later share
family/team spaces (cross-space setup is deferred
to M2.d per the plan).
- catalog.ts zod-validated loader. Refines email to require
@mana.test TLD — non-existent, no bounce risk.
- password.ts deterministic HMAC-SHA256(PERSONA_SEED_SECRET,
email). No stored per-persona credentials; the
runner re-derives on every login. Refuses the
dev-fallback secret in production.
- seed.ts POST /admin/personas per catalog entry. Flags:
--auth=, --jwt=, --dry-run.
- cleanup.ts Hard-delete every live persona. Warns when the
live set drifts from the catalog.
Root package.json:
pnpm seed:personas
pnpm seed:personas:cleanup
Extends the ESLint root-ignore list with `scripts/**` so Bun-typed
utility scripts don't fail the typed-parser check they weren't opted
into. Consistent with the rest of scripts/ being .mjs+.sh.
To go live (user action):
pnpm docker:up
cd services/mana-auth && bun run db:push
export MANA_ADMIN_JWT=...
pnpm seed:personas
M2.d deferred: cross-space (family/team/practice) memberships between
persona pairs. Better Auth's org-invite flow is multi-step and would
roughly double the M2 scope; the persona-runner (M3) can operate in
personal spaces first, shared-space tests land as their own milestone.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Foundation for autonomous Claude-driven testing. Plan:
docs/plans/mana-mcp-and-personas.md.
New packages
- @mana/tool-registry — schema-first ToolSpec<InputSchema, OutputSchema>
with zod generics, scope ('user-space' | 'admin') and policyHint
('read' | 'write' | 'destructive'). sync-client helpers speak the
mana-sync push/pull protocol directly so RLS and field-level LWW are
preserved. MasterKeyClient fetches per-user MKs via the existing
mana-auth GET /api/v1/me/encryption-vault/key endpoint (JWT-gated,
ZK-aware, already audited) — no new service-key endpoint built.
ZeroKnowledgeUserError surfaced as a typed throw.
- @mana/shared-crypto — AES-GCM-256 primitives extracted from the web
app's $lib/data/crypto/aes.ts so the server-side tool handlers and the
browser produce byte-for-byte identical wire format
(enc:1:{b64(iv)}.{b64(ct)}). Web app aes.ts now re-exports from
shared-crypto — 5 existing importers unchanged, svelte-check stays
green.
New service
- services/mana-mcp (:3069, Bun/Hono) — MCP Streamable HTTP gateway.
JWKS auth against mana-auth, per-user session isolation (session-id
belongs to the user who opened it — cross-user access returns 403),
admin-scoped tools filtered out before registration. MasterKeyClient
cached per process with a 5-minute TTL.
11 tools registered
- habits.{create,list,update,archive}, spaces.list (plaintext, M1)
- todo.{create,list,complete}, notes.{create,search}, journal.add
(encrypted — field lists match
apps/mana/apps/web/src/lib/data/crypto/registry.ts verbatim)
Infra
- Port 3069 added to docs/PORT_SCHEMA.md
- services/mana-mcp/CLAUDE.md with architecture, auth model,
tool-authoring recipe, local smoke-test steps
- Root CLAUDE.md services list updated
Type-check green across shared-crypto, mana-tool-registry, mana-mcp.
svelte-check on apps/mana/apps/web stays at 0 errors / 0 warnings.
Boot smoke verified: /health returns registry.loaded=true, unauthed
/mcp → 401, invalid-JWT /mcp → 401 with descriptive message.
Decisions locked in for later milestones (per plan D1–D10):
- Personas will be real mana-auth users (users.kind='persona'), no
service-key bypass (D1, D2)
- Tool-registry is the SSOT; mana-ai and the legacy
apps/api/src/mcp/server.ts get merged into it in M4 (three current
parallel tool catalogs collapse to one)
- Persona-runner (:3070) will be a separate service using the Claude
Agent SDK + MCP client (D5)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mana-auth's package.json declares @mana/shared-types as a workspace
dependency, but the Dockerfile's install stage never copied its source
into the build context. pnpm then silently failed to create the
workspace symlink under node_modules, and bun hit ENOENT on every
import at runtime: "reading /app/services/mana-auth/node_modules/
@mana/shared-types".
The broken image sat undetected as long as the long-running container
didn't restart. Tonight's deploy recreated it and every mana-auth
container immediately crash-looped — taking mana-api and mana-web
down with it via depends_on.
Same class of bug as 70c62e758 (shared-logger).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the mana-sync event-stream export (GET /backup/export) with a
fully client-driven `.mana` v2 archive: webapp reads Dexie, decrypts
per-field, packages JSONL + manifest, optionally PBKDF2+AES-GCM seals
with a passphrase.
- New: backup/v2/{format,passphrase,export,import}.ts + format.test.ts
(10 tests: round-trip, sealed path, 3 failure modes incl. wrong-
passphrase vs. tamper distinction).
- UI: ExportImportPanel with module multi-select, optional passphrase,
progress + sealed-file detection — replaces the old backup flow in
Settings → MyData.
- Removes services/mana-sync/internal/backup/ and the corresponding
client helpers + v1 tests. No parallel paths, no legacy shim.
- Why client-driven: zero-knowledge users hold their vault key only
client-side, so a server exporter cannot produce plaintext archives;
GDPR Art. 20 portability is better served by plaintext-by-default.
- Cross-account restore works via re-encryption under the target
vault key (no MK transfer needed).
DATA_LAYER_AUDIT.md §8 rewritten to reflect the new architecture.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-commit of c413ab7dd (reverted in c31dcdd66) without the unrelated
files that accidentally got swept into the original stage. Parser
content is identical.
The real Gemini /v1beta/interactions/:id completed shape bit us once
already during the initial smoke-test (we had OpenAI-style nested
`output.message.content[]` coded; reality is a flat `outputs` array
of thought|text|image items, with url_citations that carry no title
and usage fields named `total_input_tokens` rather than `input_tokens`).
This test pins the parser against a synthetic fixture covering the
cases we saw in the wild plus the failure modes that are hard to
provoke from a live API call:
- status dispatch (queued, in_progress, failed, cancelled, incomplete)
- completed body concatenated across text items, skipping thought/image
- empty/missing `outputs` without crashing
- missing usage
- citations deduped by url, hostname extracted as title
- wrong-type annotations and those without url skipped
- real vertexaisearch redirect URLs Gemini emits
- fallback to url as title when the URL is unparseable
- trimming of leading/trailing whitespace
To make this testable I pulled the completed-branch of
pollGeminiDeepResearch into a standalone parseInteractionResponse
helper — same behaviour, now reachable without mocking global fetch.
Also adds the `test` script to package.json so `pnpm --filter
@mana/research-service test` works.
17 pass / 0 fail.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The real Gemini /v1beta/interactions/:id completed shape bit us once
already during the initial smoke-test (we had OpenAI-style nested
`output.message.content[]` coded; reality is a flat `outputs` array
of thought|text|image items, with url_citations that carry no title
and usage fields named `total_input_tokens` rather than `input_tokens`).
This test pins the parser against a synthetic fixture covering the
cases we saw in the wild plus the failure modes that are hard to
provoke from a live API call:
- status dispatch (queued, in_progress, failed, cancelled, incomplete)
- completed body concatenated across text items, skipping thought/image
- empty/missing `outputs` without crashing
- missing usage
- citations deduped by url, hostname extracted as title
- wrong-type annotations and those without url skipped
- real vertexaisearch redirect URLs Gemini emits
- fallback to url as title when the URL is unparseable
- trimming of leading/trailing whitespace
To make this testable I pulled the completed-branch of
pollGeminiDeepResearch into a standalone parseInteractionResponse
helper — same behaviour, now reachable without mocking global fetch.
Also adds the `test` script to package.json so `pnpm --filter
@mana/research-service test` works.
17 pass / 0 fail.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Opt-in path for missions that want Gemini Deep Research Max (up to 60 min
per task) instead of the shallow RSS pre-research. Because Max runs well
past a single 60-second tick, the state is carried across ticks:
tick N: submit → INSERT mission_research_jobs row → skip planner
tick N+k: poll → still running → skip planner (metric pending_skips)
tick N+m: poll → completed → inject as ResolvedInput, DELETE row, plan
- ManaResearchClient talks to mana-research's new internal
/v1/internal/research/async endpoints with X-Service-Key +
X-User-Id. Graceful-null on transport errors so a flaky
mana-research never crashes the tick loop.
- New table mana_ai.mission_research_jobs with PK (user_id, mission_id)
— presence is the "pending" flag; delete-on-terminal keeps queries
trivial.
- handleDeepResearch() encapsulates the state machine; planOneMission
now returns a discriminated union (planned | skipped | failed) so
"research pending" isn't miscounted as a parse failure.
- Opt-in at TWO gates to keep cost in check ($3–7/task, 1500 credits
per run):
1. MANA_AI_DEEP_RESEARCH_ENABLED=true server-side (default off)
2. DEEP_RESEARCH_TRIGGER regex matches the mission objective
(strict: "deep research", "tiefe recherche", "umfassende
recherche", "hintergrundrecherche", "deep dive")
Falls back to shallow RSS when either gate fails or the submit
errors upstream.
- Prom metrics: mana_ai_research_jobs_{submitted,completed,failed}_total
labelled by provider, plus _pending_skips_total.
- docker-compose wires MANA_RESEARCH_URL + the opt-in flag and adds
mana-research to depends_on.
- Full write-up with real API response shape (outputs plural, not
OpenAI-style), step-3 MCP-server plan (security-gated, not built),
ops + kill-switch: docs/reports/gemini-deep-research.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- New providers gemini-deep-research + gemini-deep-research-max on the
Interactions API (preview-04-2026). Submit/poll split, tier parameter
selects between standard (~minutes, $1–3) and max (up to 60 min, $3–7).
- Parser matches the real response shape: flat `outputs` array of
thought|text|image items, url_citation annotations without title,
`usage.total_input_tokens` / `total_output_tokens`.
- Route generalisation: /v1/research/async accepts `provider` with
default 'openai-deep-research' (backward compatible) and dispatches
to the right submit/poll pair.
- New internal service-to-service endpoint /v1/internal/research/async
gated by X-Service-Key + X-User-Id for credit accounting. Enables
mana-ai to drive deep-research jobs on the mission owner's wallet
without requiring a user JWT.
- Pricing: 300 credits (standard) / 1500 credits (max). Conservative
markup over the ~$3/$7 ceiling so the first runs can't surprise us.
- Docs: AGENT_PROVIDER_IDS + pricing + env map + auto-router stay in
sync; CLAUDE.md Phase 3b now current; API_KEYS.md references the
new providers under GOOGLE_GENAI_API_KEY.
Verified with a real smoke test against the Gemini API: submit + poll
both succeed, completed response parsed cleanly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
shared-types/src/index.ts re-exports with explicit .ts extensions
(Tailwind v4 module resolver needs them). TS 5.7 requires consumers
to opt in via allowImportingTsExtensions. The flag only type-checks
when noEmit:true; the NestJS builder also needs
rewriteRelativeImportExtensions so tsc still emits valid JS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two fixes surfaced by the end-to-end smoke test.
1. broadcast-track.ts: inner route paths double-prefixed.
Routes were declared as '/track/open/:token' etc, then
mounted at '/api/v1/track', yielding '/api/v1/track/track/open/:token'
— every tracking endpoint returned 404. Dropped the redundant
'/track/' prefix so the full path is now
'/api/v1/track/{open,click,unsubscribe}/:token' as the
orchestrator + client both expect.
Verified with live curl:
- /track/open/BAD → 200 image/gif 42 bytes (graceful no-signal)
- /track/click/?url missing → 400 missing url
- /track/click?url=javascript: → 400 bad url
- /track/click?url=https://ok + bad token → 302 graceful
- /track/unsubscribe/BAD GET → 400 HTML
- /track/unsubscribe/BAD POST → 400 (RFC 8058)
2. shared-branding/tsconfig.json: allowImportingTsExtensions
missing. shared-types/src/index.ts uses explicit .ts
imports (intentional, for Tailwind's module resolver); any
downstream tsconfig without allowImportingTsExtensions emits
8 errors. shared-auth already had this fix — shared-branding
gets the same treatment. noEmit:true is set, so no rewrite
flag needed.
Verified: shared-branding pnpm check → 0 errors.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The three final pre-dogfood items:
1. drizzle.config: schemaFilter now includes 'broadcast' alongside
'mail'. Without this, `bun run db:push` skipped the broadcast
tables — schema existed in code but not in Postgres. Tested via
db:push + psql \dt (3 tables created: campaigns, events, sends).
2. .env.development: new MANA-MAIL SERVICE section with Stalwart
knobs + broadcast config (tracking secret, rate limits, send
throttle). DEV secret is explicitly labelled non-production —
prod rotates via env.
3. generate-env.mjs: new block writes services/mana-mail/.env on
`pnpm setup:env`. Mirrors the invoices / research / events
pattern. All 16 broadcast/mail vars flow through from SSOT.
Verified end-to-end:
- pnpm setup:env → services/mana-mail/.env contains
BROADCAST_TRACKING_SECRET + rate limits
- bun run src/index.ts → /health returns 200 with the new config
- psql → broadcast.campaigns / events / sends are materialised
Broadcast module is now fully ready to send real mail — nothing
else required before the first dogfood campaign.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the last plan milestone. Users can verify their sending-domain
setup without leaving the broadcast settings page.
Server (mana-mail)
- services/dns-check.ts: parseSpf / parseDkim / parseDmarc are pure
functions. SPF accepts include:<mailDomain>, flags weak (+all) and
wrong (include missing) and multi-record (RFC 7208 §3.2). DKIM
needs v=DKIM1 + a p= public-key segment. DMARC requires v=DMARC1,
flags p=none as weak (monitoring only), ok on quarantine/reject.
All three are case-insensitive.
- lookupTxt(): DNS-over-HTTPS against Cloudflare 1.1.1.1 — avoids
the Bun/container udp-resolver flakiness and works everywhere.
Multi-string TXT (`"a" "b"`) get concatenated before parsing.
- checkDomain(): one call, three parallel DoH lookups, returns a
structured result with suggested copy-paste records scoped to the
user's actual mail domain from config.
- Route: GET /v1/mail/dns-check?domain=&selector= (JWT auth). Zod
validates the domain looks sensible before hitting DoH.
- 16 unit tests covering all three parsers + multi-record edge case.
Client
- api.ts: runDnsCheck(domain, selector?) helper with typed result.
- components/DnsCheckBanner.svelte: derives domain from the default
from-email (after @), calls the check on-demand, renders per-record
status chips (ok / weak / wrong / missing) with messages, exposes
copy-pasteable SPF + DMARC records when anything's off. DKIM setup
is provider-specific so we show a hint rather than a canned record.
Last-check timestamp persists to settings.dnsCheck so the banner
survives a reload without re-hitting the API.
- Wired into SettingsForm between Impressum and Standard-Footer —
where the user is already thinking about "what's required to
actually send".
All checks clean:
- webapp pnpm check: 0 broadcast errors (4 pre-existing articles errors
from parallel Spaces work, unrelated)
- mana-mail tests: 36/36 across tracking-token + link-rewriter + dns-check
- mana-mail build: 2.51 MB (+8 KB for juice — dns-check itself is ~3 KB)
Plan: docs/plans/broadcast-module.md §M8. All 10 milestones now done.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the last two dogfood blockers before real-campaign use.
link-rewriter.ts
- rewriteClickLinks(): walks <a href="http…"> in the HTML body and
replaces each URL with /api/v1/track/click/{token}?url={original}
so clicks go through the tracking endpoint. Regex-based because
Tiptap output is well-formed; returns a count for debugging.
- Leaves mailto: / tel: / anchor fragments alone — wrapping those
breaks the recipient's native handler and accomplishes nothing.
- `skipUrls` param carries the unsubscribe + web-view URLs (already
tracking endpoints themselves) so they don't get double-wrapped.
- 11 unit tests covering http/https rewriting, skip list, non-http
schemes, attribute preservation, multi-link count, quoted-attr
variants, idempotency.
Orchestrator wiring
- substituteUrls now calls rewriteClickLinks after the preview-
placeholder swap and before the open-pixel injection. The
unsubscribe + web-view URLs from this same function are passed
in as skip entries so they survive the pass untouched.
- Constructor gains `sendThrottleMs` param (default 150ms).
- Main send loop awaits sleep(throttleMs) between iterations. 150ms
= ~6/sec = ~360/min, safely below most SMTP provider limits.
100-recipient campaign = ~15s extra wall-clock but that's fine
for MVP (and most campaigns are way smaller).
Config
- New env BROADCAST_SEND_THROTTLE_MS (default 150). Wired from
loadConfig to the orchestrator constructor.
The broadcast module is now functionally complete for dogfooding.
Remaining before a real campaign can actually go out: run
`cd services/mana-mail && bun run db:push` to materialise the
broadcast.* schema tables.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the "could actually dogfood" gap: legal address can be set,
sent campaigns have a proper view with live stats, and the send path
respects DSGVO.
Webapp
- components/SettingsForm.svelte: sender defaults + Impressum (required,
highlighted amber until filled) + footer. Matches the invoices
SenderProfileForm pattern — immediate save, dedicated section per
concern.
- /broadcasts/settings/+page.svelte: mounts the form. ComposeView step
3's "Einstellungen öffnen" CTA now lands somewhere.
- views/DetailView.svelte: read-only view for sent/scheduled/cancelled
campaigns. 5-card stats grid (sent, open, click, bounce, unsub) with
rate percentages. Polls mana-mail every 30s for up to 30 min after
mount, persists back to Dexie via applyServerStatus so the list view
+ widget catch up. Includes a preview of the actual rendered campaign
so "what went out" is visible after the fact.
- /broadcasts/[id]/+page.svelte: DetailView for non-drafts; drafts
bounce to /edit via $effect-triggered goto.
- ListView row-click now routes by status (draft → edit, else → detail).
mana-mail compliance
- Orchestrator loadUnsubscribedEmails(): queries broadcast.sends WHERE
status='unsubscribed' scoped to the user, filters the recipient list
BEFORE any send rows get written. Campaign's totalRecipients reflects
the post-skip count so open rates aren't inflated by "virtual sends".
Skipped count surfaces in result.errors for the UI to show.
- jmap-client.submitEmail: new extraHeaders param. Sets custom headers
via JMAP's `header:<Name>:asText` property convention.
- Orchestrator sets RFC 8058 headers per recipient:
List-Unsubscribe: <https://.../track/unsubscribe/{token}>
List-Unsubscribe-Post: List-Unsubscribe=One-Click
This is what makes Gmail / Apple Mail show their native "Abmelden"
button in the message header (not just a body link).
All checks clean: 0 TS errors, 37/37 webapp tests, 9/9 tracking-token
tests, mana-mail bun build = 2.50 MB.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end send path lives: click "Jetzt senden" in step 4 → client
resolves recipients → POST /v1/mail/bulk-send → mana-mail loops through
JMAP with per-recipient signed URLs → status flips draft → sent.
mana-mail (backend)
- New Postgres schema `broadcast.{campaigns,sends,events}` in Drizzle.
Campaigns + sends keyed on the webapp's local ids so joins are free;
events append-only with send_id FK, dedup at query-time not write-time
so tracking pixel hits don't contend on a transaction.
- tracking-token.ts: HMAC-SHA256 over JSON({campaignId, sendId, nonce}),
base64url.base64url encoded. JSON inner payload instead of delimiter
splits so IDs can contain any character. timingSafeEqual for the HMAC
comparison. 9 unit tests covering roundtrip / tamper / malformed.
- broadcast-orchestrator.ts: takes pre-resolved recipient list, inlines
CSS once via juice (webResources.images=false so no external fetches
slow the loop), per-recipient substitutes `{{unsubscribe_url}}` /
`{{web_view_url}}` + injects open pixel, submits each mail through
the user's own JMAP account. Writes sends rows first (status=queued)
so a crash mid-loop leaves truthful DB state. Returns aggregate
stats + per-email errors.
- Routes: POST /v1/mail/bulk-send (JWT, cap at 5000 recipients via
zod + config), GET /v1/mail/campaigns/:id/events (JWT, aggregates
opens + clicks + unsubscribes with COUNT DISTINCT for the "unique"
metric), GET/POST /v1/track/{open,click,unsubscribe}/:token (public,
no auth, signed URL is the only gate).
- Track routes mounted OUTSIDE /api/v1/mail/* because the JWT
middleware guards that subtree — recipients aren't logged in.
- Config: BROADCAST_TRACKING_SECRET (separate from SERVICE_KEY so the
blast radius of a leak stays narrow),
BROADCAST_MAX_RECIPIENTS_PER_CAMPAIGN (default 5000),
BROADCAST_MAX_RECIPIENTS_PER_HOUR (default 500, not yet enforced).
- Added juice@^11 dependency.
Webapp (client)
- api.ts: sendCampaign() resolves the audience from Dexie contacts,
renders the full email HTML + plaintext with placeholders, POSTs to
mana-mail. Contacts NEVER leave the client decrypted — the server
only sees the flat recipient list the user's client produced.
- fetchCampaignStats() for M7 dashboard/detail polling.
- ComposeView step 4 replaced: confirmation modal with "sicher?"
question, sending state with spinner, done state with delivered
count + expandable per-email error list + "Zur Übersicht" button.
- Status transitions to 'sent' with cached stats after successful
send via applyServerStatus.
Known M4 gaps (fill in M5)
- Open/click/unsubscribe track endpoints return valid responses but
event dedup is rough — one insert per hit, dedup at query time
only. M5 adds windowed IP-hash dedup.
- Synchronous send loop. 100 recipients ≈ 15s blocking. M5/M6 moves
this to an async job queue with SSE progress.
- Each recipient generates a "Sent" folder entry in the user's
Stalwart mailbox. Fine for 50-recipient newsletters, silly for
5000. Phase 2 carves out a dedicated broadcast mailbox.
Plan: docs/plans/broadcast-module.md §M4.
Next: M5 open/click tracking with dedup + rate-limits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three new counters + one histogram fill the observability gap from
the function-calling migration:
- mana_ai_tool_calls_total{tool, policy, outcome} — one tick per
tool_call the planner produced. `outcome` is `deferred` on the
server (stub onToolCall records for later client execution);
webapp runner will emit success/failure once it grows its own
Prom surface.
- mana_ai_planner_rounds (histogram, buckets 1..5) — distribution of
rounds consumed per iteration. Runs close to the cap signal a
planner struggling with the mission objective.
- mana_ai_provider_errors_total{provider, kind} — structured errors
surfaced from mana-llm. Kind mirrors the ProviderError hierarchy
added in commit 1 of the migration (blocked/truncated/auth/
rate_limit/capability/unknown).
Plumbing:
- llm-client.ts parses mana-llm's `{detail: {kind, message}}` 4xx/5xx
body shape and re-throws as ProviderCallError carrying the kind.
- tick.ts observes metrics at the natural emission points — rounds
+ per-call counter after runPlannerLoop returns, provider_errors
in the catch block.
Grafana dashboards + status.mana.how already pick up the
collectDefaultMetrics prefix, so these metrics land in the existing
mana-ai panel without scraper changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the gap between "invite flow UI exists" and "two users in the
same space actually see each other's data". Three pieces land together
because they're meaningless without each other.
mana-auth — new internal endpoint:
GET /api/v1/internal/users/:userId/memberships
Returns [{organizationId, role}, ...] for the user. mana-sync uses
this to populate the multi-member RLS session config.
mana-sync — membership lookup:
new internal/memberships package with an HTTP client + 5 min
per-user cache, fail-open (empty list = pre-Spaces behavior).
Config gets MANA_AUTH_URL (default http://localhost:3001).
Handler.NewHandler takes the Lookup. Every Push/Pull/Stream call
now passes spaceIDsFor(userID) to Store methods.
GetChangesSince + GetAllChangesSince extend their WHERE clause:
WHERE (user_id = $1 OR space_id = ANY($memberSpaces))
so co-members see each other's rows, not just the author.
apps/web — encryption skip for shared-space records:
encryptRecord now checks record.spaceId:
- `_personal:<userId>` sentinel OR no active shared space → encrypt
with user master key (E2E as today).
- Active space resolves to non-personal type AND spaceId matches
that space → skip encryption; write lands plaintext.
decryptRecord is unchanged because its per-field isEncrypted() guard
already passes plaintext through.
Phase-1 compromise: shared-space data is protected by server RLS
only, not E2E. Phase 2 adds per-Space shared keys with per-member
wrap — tracked in docs/plans/spaces-foundation.md.
Plus docs/plans/shared-space-smoketest.md: step-by-step Zwei-User-Test
mit erwarteten Ergebnissen und Debugging-Hinweisen bei Problemen.
Build + go test + web check all green.
Plan: docs/plans/spaces-foundation.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Matches the macmini compose — Google Gemini was already wired in the
provider adapter (commit 2 of the function-calling migration) but the
local dev stack's compose never passed the env through, so the
container booted without the provider and every tool-calling request
fell back to Ollama (unreachable in local dev, LAN-only GPU box).
With this in place the local mana-llm healthcheck reports both
`google` and `openrouter` as healthy and the webapp planner hits
Gemini Flash for real.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Migration from user-level tier to Space-level tier, following the
Spaces foundation plan. User-visible effect: the tier that gates
module access now belongs to the active Space, not the user account.
Personal Spaces inherit the user's old tier on signup so nothing
downgrades.
shared-types:
- New SpaceTier type ('guest' | 'public' | 'beta' | 'alpha' | 'founder').
- New spaceTierMeets(actual, required) helper.
- SpaceMetadata gains an optional `tier` field.
mana-auth:
- createPersonalSpaceFor reads user.accessTier and stamps it into the
personal Space's metadata.tier. A founder-tier user setting up their
first Space keeps founder access in that Space.
- databaseHooks.user.create.after now forwards accessTier into the
personal-space creator.
apps/web (scope layer):
- ActiveSpace gains a required `tier: SpaceTier`; rawToActiveSpace
reads it from organization.metadata, defaulting to 'public' if
missing or invalid.
- New getEffectiveTier(userFallback) helper resolves the tier to use
for gating: prefers the active Space's tier, falls back to the
caller-supplied user tier during the boot window.
apps/web ((app) layout):
- `effectiveTier` $derived replaces every authStore.user?.tier reference
in the layout's access-gating logic (appItems, routeBlocked,
routeTierLabels). AuthGate deeper in the UI keeps using user.tier as
its own fallback — the tier move is additive, not destructive.
What this does NOT do yet:
- The user.accessTier column still exists and is still the initial
source for personal-space tier. Removing it is a later cleanup once
every code path reads through the Space primitive.
- No admin API for setting tier on a Space (PUT /api/v1/admin/spaces/
:id/tier). Follow-up when admin tooling needs it — today admins still
set user.accessTier, which flows to the personal space on next
signup.
Resolves the MANA_APPS-tier-patch workaround memory: future sessions
can adjust tier per Space instead of per User.
0 errors across 7151 files. 10/10 scope tests pass.
Plan: docs/plans/spaces-foundation.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the second RLS policy needed for shared spaces. Users can read
rows in any space they're a member of, in addition to their own rows.
Changes:
- New policy sync_changes_space_member_read (SELECT only) uses
app.current_user_space_ids session config: rows with space_id in
that comma-separated list pass RLS.
- WITH CHECK is not extended — writes still require user_id match, so
only the author can write. Members read, owner/author writes.
- withUser() is now a thin wrapper around withUserAndMemberships(),
which accepts the caller's Space membership list and sets the new
session config alongside app.current_user_id.
- The comma-join is empty-filtered so stray blank entries can't match
rows with literal empty space_id (defense in depth).
Forward-compatible: today every space has exactly one member (the
author), so the membership list is always empty and the new policy
is a no-op — user_id isolation remains the only active guard.
When shared spaces start being used (clubs/teams/brand spaces with
invites), the HTTP handlers will fetch the caller's membership from
mana-auth and pass it to withUserAndMemberships. No migration needed
at that point — the policy is already live.
Subscription fan-out (WS/SSE broadcast to all space members) is still
per-user; that's a follow-up tied to the membership lookup infra.
Go build + existing tests pass.
Plan: docs/plans/spaces-foundation.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Carries per-round token counts from the mana-llm response body
(prompt_tokens + completion_tokens) back through LlmCompletionResponse
→ PlannerLoopResult. The loop sums across rounds and exposes a single
aggregate on result.usage.
Lets mana-ai's tick re-activate per-agent daily-token budget tracking
— tokensUsed was stubbed to 0 in the migration commit (6) because the
loop didn't surface usage yet. Now recordTokenUsage + agentTokenUsage24h
get real numbers again, and the mana_ai_tokens_used_total Prometheus
counter is accurate.
Additive only: consumers without usage needs ignore the new field,
and providers that don't return usage produce zeros (not undefined —
the loop still exposes the object so downstream branches stay trivial).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Server-side:
- sync_changes gains a nullable space_id TEXT column + partial index
on (user_id, space_id, app_id, created_at) WHERE space_id IS NOT NULL.
- RecordChange takes spaceID as a first-class parameter; *string so
empty strings land as real SQL NULL and the partial index skips them.
- ChangeRow + all three SELECTs (GetChangesSince, GetAllChangesSince,
StreamAllUserChanges) propagate space_id through to clients.
- changeFromRow surfaces SpaceID on the wire Change shape.
- New extractSpaceID helper reads the incoming payload — prefers top-
level spaceId, falls back to data.spaceId (inserts) or
fields.spaceId.value (updates). Tolerates pre-v28 clients.
- 6 Go tests cover the helper + round-trip.
Client-side:
- PendingChange gains an optional spaceId.
- Dexie creating hook stamps spaceId from the active record onto the
pending-change row (already set by the v28 scope hook).
- Dexie updating hook reads spaceId from the pre-update record and
stamps it on the pending-change so updates carry space context even
though spaceId itself is immutable and never in `fields`.
- buildChangeset forwards spaceId to the server.
Explicitly NOT in scope this pass:
- RLS remains user_id-scoped; multi-member shared-space reads need a
second policy that joins against auth.members. Follow-up once shared
spaces are actually used — today everything is personal.
- Subscription fan-out is still per-user; fan-out to all members of a
shared space is part of the same follow-up.
Go tests: 6/6 pass. Web type-check clean (0 errors across 7139 files).
Plan: docs/plans/spaces-foundation.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Migrates the background tick from buildPlannerPrompt + PlannerClient +
parsePlannerResponse to the shared runPlannerLoop with native function
calling. Structurally identical to the webapp runner (commit 5a) —
same catalog, same compact system prompt, same multi-turn chat.
Server-specific twist: the ``onToolCall`` callback is a no-op stub
(returns {success:true, message:'recorded — pending client
application'}). The server has no Dexie access, so it can't actually
execute writes; instead it captures the LLM's chosen tool_calls and
writes them as PlanStep entries on the iteration. The user's client
picks up those planned steps on sync — same shape as before, just
sourced from the LLM's native tool_calls instead of a regex-extracted
JSON block.
Scope trimmed by the SERVER_TOOLS filter: only propose-default (write)
tools go to the server planner. Read-only tools (list_*, get_*) are
hidden because stubbing a response would let the LLM hallucinate that
it saw real data. Read-then-act chains stay with the foreground
runner, which has a real executor.
Deleted: planner/client.ts (old PlannerClient; replaced by
planner/llm-client.ts). Drift guard in tools.ts collapses into a
SERVER_TOOLS = AI_TOOL_CATALOG.filter(propose) derivation — no more
hand-maintained duplicate list; the contract test now asserts the
inverse round-trip against AI_PROPOSABLE_TOOL_SET.
TODO (follow-up): token usage tracking is temporarily set to 0 because
runPlannerLoop doesn't expose per-message usage yet. Budget
enforcement on the server is effectively disabled until the loop
returns that data — the webapp runner is unaffected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Groundwork for server-side Space extensions that must NOT live in Dexie:
- spaces.credentials — per-space OAuth tokens, API keys, SMTP
configs. Access tokens are stored
encrypted at rest with the service KEK.
- spaces.module_permissions — role × module read/write/admin overrides
on top of the SPACE_MODULE_ALLOWLIST
defaults.
Both tables FK to auth.organizations with ON DELETE CASCADE so deleting
a space drops its credentials and permission overrides automatically.
RLS is intentionally deferred — enabling it now would lock out services
that don't yet pass space context. A follow-up migration turns it on
after mana-api speaks the Spaces protocol end-to-end.
To apply locally: bun run db:push in services/mana-auth, or psql -f
sql/004_spaces.sql against the mana_platform DB.
No runtime code reads these tables yet — they're the scaffolding that
Task-8 (mana-sync) and the eventual social-relay/clubs modules will
consume.
Plan: docs/plans/spaces-foundation.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires databaseHooks.user.create.after to call createPersonalSpaceFor,
which provisions a Better Auth organization of type='personal' with the
user as owner. Every signup now produces a usable default space — no
UI code needed to bootstrap it.
Details:
- Slug derived from email local-part, lowercase, alphanumerics + hyphens,
max 30 chars, random fallback if nothing usable remains.
- Reserved-slug list (me/admin/api/auth/…) blocks system-route clashes.
- Collision resolver appends -2, -3, … up to 999 before falling back to
a random suffix. Tests cover both the DB-taken and reserved-slug cases
via an injectable SlugTakenLookup (no DB needed for unit tests).
- Idempotent: if a personal space already exists for the user, returns
it instead of creating a duplicate. Guards against retry double-signup.
- Failure propagates — an orphan user without a personal space is worse
than a retry-able signup error.
Existing dev users will need a backfill or a re-provisioning of the dev
DB — new users are unaffected.
12 tests pass (23 total across the spaces module).
Plan: docs/plans/spaces-foundation.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Moves the canonical SpaceType + SPACE_MODULE_ALLOWLIST to @mana/shared-types
(framework-free) so the Bun services can consume them without pulling in
Svelte. shared-branding keeps only the UI-facing labels and descriptions
and re-exports the canonical types for frontend convenience.
Wires two Better Auth organization hooks in mana-auth:
- beforeCreateOrganization asserts metadata.type is a valid SpaceType,
rejecting the create with a BAD_REQUEST otherwise.
- beforeDeleteOrganization rejects deletion of the personal space.
Covered by bun tests (11 assertions) for the helper module.
No migration and no schema change — type lives in the existing
organization.metadata jsonb column.
Plan: docs/plans/spaces-foundation.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends the chat-completions surface so callers can ask any provider
to call named functions and get structured tool_calls back. Wired
through all three provider adapters so the planner and companion can
switch off the fragile JSON-parsing pathway.
- Request: tools[], tool_choice, assistant tool_calls, tool-role
messages with tool_call_id.
- Response: MessageResponse.tool_calls, Choice.finish_reason adds
"tool_calls", DeltaContent streams tool_calls.
- Google provider: Tool(function_declarations=...) build, result
normalised (args dict → JSON string), function_response parts on
a user turn for tool-role messages.
- OpenAI-compat: 1:1 passthrough of the OpenAI spec.
- Ollama: /api/chat passthrough; model-level capability check via a
TOOL_CAPABLE_OLLAMA_PATTERNS whitelist (llama3.1+, qwen2.5+,
mistral, command-r, …) — unsupported models rejected rather than
silently falling back to prose.
- Router: model_supports_tools() check upfront for both streaming
and non-streaming paths; ProviderCapabilityError bubbles as 400.
No silent downgrade. Missing tool support = explicit error.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After yesterday's type-check cascade repair (c34175afa), the root
\`pnpm run type-check\` progressed through 5 more packages but still
stopped on two pre-existing failures:
- \`services/mana-media\` delivery route: \`c.body(transformedBuffer)\`
passed a Node \`Buffer<ArrayBufferLike>\`, but Hono 4.7 types the
body argument as \`Uint8Array<ArrayBuffer>\` (strict — no
ArrayBufferLike). \`Uint8Array.from(buf)\` gives a clean copy with a
fresh \`ArrayBuffer\` backing that the strict type accepts. Runtime
cost for a handful of KB per image transform is negligible next to
the Sharp pipeline that produced the buffer.
- \`packages/shared-llm\`: same rune issue as local-stt + local-llm —
\`store.svelte.ts\` uses \`$state\` and transitively pulls in
\`local-llm/src/svelte.svelte.ts\`. Plain tsc can't resolve Svelte 5
runes. Same treatment: \`type-check\` script explicitly skips with a
message pointing at svelte-check.
Root \`pnpm run type-check\` now reaches \`@context/mobile\`, which has
real code-level type errors (adapter shape mismatches, an RN event-
handler typing drift, and a deleted Supabase module still imported by
\`utils/supabaseTest.ts\`). Those need domain changes, not config
tweaks — out of scope for this repair pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The google provider called response.text after a chat completion and
passed the resulting string downstream unchanged. When Gemini's content
filter, recitation guard, or max_tokens ceiling fired, response.text
quietly returned "" — which the planner then reported as "no JSON block
found", masking the real cause. Empirically this failed in 45 ms on a
simple Quiz mission.
Introduces providers/errors.py with a small ProviderError hierarchy
(Blocked / Truncated / Auth / RateLimit / Capability). google.py now
inspects response.candidates[0].finish_reason and raises the matching
structured error; the non-streaming path maps it to 422/502/429 via a
new except-branch in main.py, and the streaming path surfaces the kind
as the SSE error type. Capability is wired but not yet used — it lands
with the tool-schema passthrough in the next commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "every Drizzle table uses pgSchema" rule was documented in
.claude/guidelines/database.md (added yesterday as part of Concern 5)
but enforced only by convention. A new service could slip a raw
\`pgTable()\` past review and collide in the default \`public\` schema
of \`mana_platform\`, and nothing would surface the mistake until a
production migration failed.
- \`scripts/validate-pg-schema-isolation.mjs\` scans every tracked
TypeScript file under services/, apps/api/, packages/ for call sites
of \`pgTable(\` (not imports — imports can still be useful for types).
Strips comments before matching so doc-examples like "use \`pgTable()\`"
don't trigger false positives.
- Wired as \`pnpm run validate:pg-schema\` and a new CI step in the
validate job (right after the turbo-recursion check). 721 files
scan clean today.
- Removed an unused \`pgTable\` import in mana-subscriptions that would
have been the only import of the symbol remaining after this change.
- Updated .claude/guidelines/database.md — the old verification blurb
said "no automated lint rule yet", now points at the enforcer.
Drift verified: injecting a synthetic \`pgTable('bad', {})\` into
subscriptions.ts failed with a clear file:line violation pointing at
the database guideline.
Closes the "no automated lint rule" gap noted in the database guideline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
startGoalTracker was only ever called from tests, so DrinkLogged /
TaskCompleted / MealLogged events never incremented currentValue and
GoalReached never fired — the progress bars were cosmetic. Wire it into
the (app)/+layout idle boot next to startStreakTracker, with matching
teardown in onDestroy.
Also drop <AiProposalInbox module="goals"/> into the module ListView so
create_goal / pause_goal / resume_goal / complete_goal proposals are
reviewable inline (previously only visible in the mission-detail view).
Refresh the tool-coverage tables while we're at it: apps/mana/CLAUDE.md
now reflects the real catalog state (59 tools, 19 modules — was 37/12),
and services/mana-ai/CLAUDE.md shows the correct server-side propose
subset (31 tools, 16 modules). Also fixes a stale 'location_log' →
'get_current_location' typo in the places row.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TRUSTED_ORIGINS was defined inside better-auth.config.ts, which pulls
in the whole Better Auth stack just to read a list of hostnames. Anyone
who wants to consume the list (infra tooling, compose-env generators,
monitoring) had to either duplicate it or pay the import cost.
- New `sso-origins.ts` — zero-dep module exposing
`PRODUCTION_TRUSTED_ORIGINS` + `LOCAL_TRUSTED_ORIGINS` + the combined
`TRUSTED_ORIGINS` list. This is now the canonical place to add a new
top-level SSO origin.
- `better-auth.config.ts` imports + re-exports so existing consumers
keep working without a touch.
- `sso-config.spec.ts` imports directly from `./sso-origins` (cleaner
coupling) and now HARD-FAILS when mana-auth CORS_ORIGINS contains a
production origin that isn't in trustedOrigins. Previously this was
a `console.warn` only, meaning dead-drift could silently accumulate
and then surface as a confusing runtime auth rejection.
- Root CLAUDE.md "Adding an app to SSO" updated to point at the SSOT
and mention the new hard-fail direction.
No current drift — the mana-auth CORS_ORIGINS already match. The
hardened assertion is defensive for future changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Complete walkthrough per provider — signup URL, free-tier details,
pay-per-use pricing, env-var name, key format — plus sections on where
to paste keys (.env.secrets), BYO-keys vs server-keys, verification
curl commands and troubleshooting (including the cross-service
MANA_SERVICE_KEY mismatch encountered during live testing).
Linked from services/mana-research/CLAUDE.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Google deprecated `gemini-2.0-flash` for new API users — existing
accounts still work, but a freshly-billed key returns 404
"models/gemini-2.0-flash is no longer available to new users". The
working replacement is `gemini-2.5-flash` (same price tier, better
quality, groundingMetadata shape unchanged).
Verified live: the fix produced a real answer with 6 grounding
citations in 2.6s.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Eventbrite shut down their public Event Search API (/v3/events/search)
in 2023. The provider now uses the website extractor pipeline
(mana-research + LLM) to scrape Eventbrite's public search pages.
No API key needed — same pipeline as any website source.
Also adds mana-events to generate-env.mjs for automatic .env generation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- claude-web-search.ts: only send `temperature` when caller explicitly
sets one. Opus 4.7 deprecated the param and returns
400 invalid_request_error "`temperature` is deprecated for this
model." Sonnet/Haiku still accept it, so keep the opt-in path.
- execute-research.ts: log provider errors via console.warn so future
integration failures are visible in stdout. Previously the executor
swallowed the underlying error and only returned a generic errorCode,
which made diagnosing vendor-specific API changes impossible.
Discovered via smoke-testing with a real Anthropic key — the direct
curl worked, but our provider 400'd because Opus 4.7 tightened the
accepted param set.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>