From bb3da78d5c894cafaf5466691a841f80a12f6c51 Mon Sep 17 00:00:00 2001
From: Till JS
Date: Wed, 15 Apr 2026 14:02:47 +0200
Subject: [PATCH] feat(ai): Mission Grant rollout gating — flag, alerts, runbook, user docs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Phase 4 — everything needed to flip the Mission Key-Grant feature on
safely per deployment. No new behaviour; purely operational plumbing.

- PUBLIC_AI_MISSION_GRANTS feature flag (default off). hooks.server.ts
  injects window.__PUBLIC_AI_MISSION_GRANTS__, and api/config.ts exposes
  isMissionGrantsEnabled(). The grant UI (dialog + status box) and the
  Workbench "Datenzugriff" tab are both hidden when the flag is off.
- PUBLIC_MANA_AI_URL added to the injection set so the webapp can reach
  the new audit endpoint from production.
- Prometheus alerts (new mana_ai_alerts group):
  - ManaAIServiceDown (warning, 2m)
  - ManaAIGrantScopeViolation (critical, 0m) — MUST stay at 0; any
    increment pages immediately
  - ManaAIGrantSkipsHigh (warning, 15m) — flags keypair drift
  - ManaAIPlannerParseFailures (warning, 10m) — prompt/LLM drift
- Runbook in docs/plans/ai-mission-key-grant.md: initial keypair
  generation, leak-response procedure (rotate + invalidate all grants +
  audit), scope-violation triage.
- User-facing doc in apps/docs security.mdx: new "AI Mission Grants"
  section with the hard constraints (ZK users blocked, scope changes
  invalidate cryptographically, grants expire unless renewed, revocation
  is one click, the runner only reads) plus an honest threat-model
  comparison column showing where grants shift the tradeoff.

Rollout remaining (not code): generate the keypair on the Mac Mini,
provision MANA_AI_PRIVATE_KEY_PEM + MANA_AI_PUBLIC_KEY_PEM via Docker
secrets, then flip PUBLIC_AI_MISSION_GRANTS=true, starting with till only.
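
The "scope changes invalidate cryptographically" constraint can be illustrated with a minimal sketch. It assumes HKDF-SHA256 from Node's `node:crypto`; the runbook refers to an HKDF binding, but the real key inputs, labels, and encoding used by mana-auth and mana-ai are not part of this patch. `GrantScope`, `deriveScopedKey`, and the `mana-ai-mission-grant:v1:` label are hypothetical.

```typescript
import { createHash, hkdfSync, randomBytes } from "node:crypto";

// Hypothetical grant scope; the real mission/grant types may differ.
interface GrantScope {
  missionId: string;
  tables: string[];
  recordIds: string[];
}

// Canonicalise the scope: de-duplicate and sort so the same set of tables
// and records always yields the same bytes, regardless of input order.
function canonicalScope(scope: GrantScope): string {
  const tables = [...new Set(scope.tables)].sort();
  const recordIds = [...new Set(scope.recordIds)].sort();
  return JSON.stringify({ missionId: scope.missionId, tables, recordIds });
}

// Derive a 32-byte key bound to the scope. The scope is hashed first because
// Node's hkdfSync rejects `info` values longer than 1024 bytes, and a mission
// may reference many records. Any change to the mission ID, table list, or
// record list changes `info` and therefore the derived key.
function deriveScopedKey(masterKey: Uint8Array, scope: GrantScope): Buffer {
  const scopeDigest = createHash("sha256").update(canonicalScope(scope)).digest();
  const info = Buffer.concat([Buffer.from("mana-ai-mission-grant:v1:", "utf8"), scopeDigest]);
  const salt = Buffer.alloc(0); // assumption: no salt; a real design may use one
  return Buffer.from(hkdfSync("sha256", masterKey, salt, info, 32));
}

// Usage: reordering the same records keeps the key; adding a record changes it.
const masterKey = randomBytes(32);
const before = deriveScopedKey(masterKey, {
  missionId: "mission-1",
  tables: ["notes"],
  recordIds: ["note-a", "note-b"],
});
const reordered = deriveScopedKey(masterKey, {
  missionId: "mission-1",
  tables: ["notes"],
  recordIds: ["note-b", "note-a"],
});
const expanded = deriveScopedKey(masterKey, {
  missionId: "mission-1",
  tables: ["notes"],
  recordIds: ["note-a", "note-b", "note-c"],
});
console.log(before.equals(reordered)); // true
console.log(before.equals(expanded)); // false
```

A grant wrapped for the old scope therefore stops matching as soon as the mission's inputs change, which is what forces re-consent rather than a silent scope expansion.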
Co-Authored-By: Claude Opus 4.6 (1M context) --- .../content/docs/architecture/security.mdx | 52 +++++++++++++++ apps/mana/apps/web/src/hooks.server.ts | 8 +++ apps/mana/apps/web/src/lib/api/config.ts | 16 +++++ .../lib/modules/ai-missions/ListView.svelte | 4 +- .../lib/modules/ai-workbench/ListView.svelte | 24 ++++--- docker/prometheus/alerts.yml | 50 ++++++++++++++ docs/plans/ai-mission-key-grant.md | 65 +++++++++++++++++-- 7 files changed, 204 insertions(+), 15 deletions(-) diff --git a/apps/docs/src/content/docs/architecture/security.mdx b/apps/docs/src/content/docs/architecture/security.mdx index c2fba61b2..823bc3a4e 100644 --- a/apps/docs/src/content/docs/architecture/security.mdx +++ b/apps/docs/src/content/docs/architecture/security.mdx @@ -184,6 +184,58 @@ Each row carries the IP address, user-agent, HTTP status code, and a free-form c | User loses recovery code | n/a | ❌ Data lost | | User loses password but vault is in ZK mode | Recovery via password reset | ❌ Data lost (vault is keyed to recovery code) | +## AI Mission Grants (opt-in, per mission) + +By default, AI missions that depend on encrypted data (notes, tasks, +calendar events, journal entries, your Kontext document) run **only +when your browser tab is open** — the background runner on our server +sees ciphertext and physically cannot read them. + +Some missions are more useful when they run continuously, even while +you're offline. For those, you can opt in — per mission, not globally +— to a **Mission Key-Grant**. Here is exactly what that does: + +1. Your browser derives a fresh key that is bound to: + - The mission's ID. + - The specific table names referenced. + - The specific record IDs referenced. +2. The derived key is wrapped with the mana-ai service's public key + and attached to the mission record. +3. 
When the mana-ai runner ticks for that mission, it unwraps the + key in memory, decrypts **only the allowlisted records**, plans + the next iteration, and forgets the key at the end of the tick. +4. Every decrypt is logged. You see the full log under **Workbench + → Datenzugriff**. + +Hard constraints — enforced by the code, not by policy: + +- **Zero-knowledge users cannot issue grants.** The mana-auth server + has no usable master key in ZK mode; the endpoint refuses. +- **Scope changes invalidate the key cryptographically.** Add a new + record to a mission → the derived key is different → the existing + grant stops working → you're prompted to re-consent. It is not + possible for the runner to "silently expand" its scope. +- **Grants expire.** Default lifetime is 7 days, renewed on every + successful run. Missions that go idle lose their grant automatically; + you re-consent on the next edit. +- **Revocation is one click.** The lock icon in the Workbench removes + the grant; the mission keeps its history but stops running + server-side until you re-grant. +- **The runner never writes under a grant** — it only reads. All + changes still go through the normal proposal-approve flow you + control. 
+ +| Threat | Standard | With a Mission Grant | Zero-Knowledge | +|--------|----------|----------------------|----------------| +| Mana operator reads an unrelated record of the same user | ⚠️ Could decrypt with KEK | ⚠️ Could decrypt with KEK (the grant key itself is scoped and cannot) | ✅ Cannot | +| Mana operator reads the granted records of the grant-enabled mission | ⚠️ Could decrypt with KEK | ⚠️ Could decrypt with the grant key + record ciphertext | ✅ Cannot | +| Court order against Mana for the granted-mission records | ⚠️ Could be compelled | ⚠️ Could be compelled (while grant is active) | ✅ Mana physically cannot comply | +| Runner RAM-dump during the 60s tick | ✅ n/a | ⚠️ Could expose the grant key for one tick window | ✅ n/a | + +The tradeoff is deliberate: you exchange a small, scoped privacy +reduction for autonomy on one mission. Missions without a grant keep +the full standard / ZK guarantees. + ## Implementation references For the architectural deep dive, code locations, and the complete rollout history (Phases 1–9 + the backlog sweep), see [`DATA_LAYER_AUDIT.md`](https://github.com/mana-how/mana-monorepo/blob/main/apps/mana/apps/web/src/lib/data/DATA_LAYER_AUDIT.md). diff --git a/apps/mana/apps/web/src/hooks.server.ts b/apps/mana/apps/web/src/hooks.server.ts index 002de6655..85aeb096e 100644 --- a/apps/mana/apps/web/src/hooks.server.ts +++ b/apps/mana/apps/web/src/hooks.server.ts @@ -47,6 +47,12 @@ const PUBLIC_MANA_API_URL_CLIENT = process.env.PUBLIC_MANA_API_URL_CLIENT || process.env.PUBLIC_MANA_API_URL || ''; const PUBLIC_MANA_CREDITS_URL_CLIENT = process.env.PUBLIC_MANA_CREDITS_URL_CLIENT || process.env.PUBLIC_MANA_CREDITS_URL || ''; +const PUBLIC_MANA_AI_URL_CLIENT = + process.env.PUBLIC_MANA_AI_URL_CLIENT || process.env.PUBLIC_MANA_AI_URL || ''; +// Feature flag for the Mission Key-Grant UI (server-side execution of +// encrypted missions). Default off — flip to 'true' per deployment once +// the MANA_AI_PUBLIC/PRIVATE_KEY_PEM pair is provisioned on both services.
+const PUBLIC_AI_MISSION_GRANTS = process.env.PUBLIC_AI_MISSION_GRANTS === 'true' ? 'true' : 'false'; // Map of app subdomains to internal paths const APP_SUBDOMAINS = new Set([ @@ -126,6 +132,8 @@ window.__PUBLIC_MANA_LLM_URL__ = ${JSON.stringify(PUBLIC_MANA_LLM_URL_CLIENT)}; window.__PUBLIC_MANA_EVENTS_URL__ = ${JSON.stringify(PUBLIC_MANA_EVENTS_URL_CLIENT)}; window.__PUBLIC_MANA_API_URL__ = ${JSON.stringify(PUBLIC_MANA_API_URL_CLIENT)}; window.__PUBLIC_MANA_CREDITS_URL__ = ${JSON.stringify(PUBLIC_MANA_CREDITS_URL_CLIENT)}; +window.__PUBLIC_MANA_AI_URL__ = ${JSON.stringify(PUBLIC_MANA_AI_URL_CLIENT)}; +window.__PUBLIC_AI_MISSION_GRANTS__ = ${JSON.stringify(PUBLIC_AI_MISSION_GRANTS)}; window.__PUBLIC_GLITCHTIP_DSN__ = ${JSON.stringify(PUBLIC_GLITCHTIP_DSN)}; `; return injectUmamiAnalytics(html.replace('', `${envScript}`)); diff --git a/apps/mana/apps/web/src/lib/api/config.ts b/apps/mana/apps/web/src/lib/api/config.ts index 2e30ba4df..d1b7b1414 100644 --- a/apps/mana/apps/web/src/lib/api/config.ts +++ b/apps/mana/apps/web/src/lib/api/config.ts @@ -74,6 +74,22 @@ export function getManaAiUrl(): string { return process.env.PUBLIC_MANA_AI_URL || 'http://localhost:3066'; } +/** + * Feature flag for the AI Mission Key-Grant UI. When false, the consent + * dialog, the "Server-Zugriff" box, and the Workbench "Datenzugriff" tab + * are hidden even on missions with encrypted inputs; missions simply stay + * foreground-only. Flip on per deployment once the MANA_AI_PUBLIC/PRIVATE_KEY_PEM + * keypair is provisioned on both mana-auth and mana-ai. + */ +export function isMissionGrantsEnabled(): boolean { + if (browser && typeof window !== 'undefined') { + const flag = (window as unknown as { __PUBLIC_AI_MISSION_GRANTS__?: string }) + .__PUBLIC_AI_MISSION_GRANTS__; + return flag === 'true'; + } + return process.env.PUBLIC_AI_MISSION_GRANTS === 'true'; +} + /** * Get the mana-mail service URL. * Hosts mail threads, send, labels, accounts.
diff --git a/apps/mana/apps/web/src/lib/modules/ai-missions/ListView.svelte b/apps/mana/apps/web/src/lib/modules/ai-missions/ListView.svelte index 1fa229b7c..117e51e75 100644 --- a/apps/mana/apps/web/src/lib/modules/ai-missions/ListView.svelte +++ b/apps/mana/apps/web/src/lib/modules/ai-missions/ListView.svelte @@ -19,6 +19,7 @@ import { productionDeps } from '$lib/data/ai/missions/setup'; import MissionInputPicker from '$lib/components/ai/MissionInputPicker.svelte'; import MissionGrantDialog from '$lib/components/ai/MissionGrantDialog.svelte'; + import { isMissionGrantsEnabled } from '$lib/api/config'; import type { Mission, MissionCadence, MissionInputRef } from '$lib/data/ai/missions/types'; const missions = $derived(useMissions()); @@ -106,6 +107,7 @@ function hasEncryptedInputs(m: Mission): boolean { return m.inputs.some((i) => ENCRYPTED_SERVER_TABLES.has(i.table)); } + const grantsEnabled = $derived(isMissionGrantsEnabled()); function grantStatus(m: Mission): 'none' | 'active' | 'expired' { if (!m.grant) return 'none'; return Date.parse(m.grant.expiresAt) < Date.now() ? 'expired' : 'active'; @@ -305,7 +307,7 @@ {/if} - {#if hasEncryptedInputs(selected)} + {#if grantsEnabled && hasEncryptedInputs(selected)}
🔑 Server-Zugriff diff --git a/apps/mana/apps/web/src/lib/modules/ai-workbench/ListView.svelte b/apps/mana/apps/web/src/lib/modules/ai-workbench/ListView.svelte index 9b18b0bb5..b06a83a44 100644 --- a/apps/mana/apps/web/src/lib/modules/ai-workbench/ListView.svelte +++ b/apps/mana/apps/web/src/lib/modules/ai-workbench/ListView.svelte @@ -8,6 +8,7 @@ import { useMissions } from '$lib/data/ai/missions/queries'; import { revertIteration } from '$lib/data/ai/revert/revert-iteration'; import { fetchDecryptAudit, type AuditRow } from '$lib/data/ai/audit/queries'; + import { isMissionGrantsEnabled } from '$lib/api/config'; import type { DomainEvent } from '$lib/data/events/types'; let moduleFilter = $state(null); @@ -41,6 +42,7 @@ } // ── Tab switcher: timeline ↔ decrypt audit ───────────── + const grantsEnabled = $derived(isMissionGrantsEnabled()); let tab = $state<'timeline' | 'audit'>('timeline'); let auditRows = $state([]); let auditLoading = $state(false); @@ -110,16 +112,18 @@ > Timeline - + {#if grantsEnabled} + + {/if}
diff --git a/docker/prometheus/alerts.yml b/docker/prometheus/alerts.yml index e47506c58..104dab6ba 100644 --- a/docker/prometheus/alerts.yml +++ b/docker/prometheus/alerts.yml @@ -465,3 +465,53 @@ groups: annotations: summary: "LLM responses are slow" description: "LLM p95 latency is {{ $value | humanizeDuration }}." + + - name: mana_ai_alerts + rules: + # mana-ai background runner down + - alert: ManaAIServiceDown + expr: up{job="mana-ai"} == 0 + for: 2m + labels: + severity: warning + annotations: + summary: "mana-ai background runner is down" + description: "mana-ai has been down for 2+ minutes. Missions fall back to the browser-only Runner — users with closed tabs stop receiving proposals." + + # Grant scope violation — MUST remain at 0 in steady state. + # Any increment is a serious signal: either a runtime bug bypassed + # the cryptographic scope binding, or a compromised service tried + # to decrypt outside its allowlist. Page on first occurrence. + - alert: ManaAIGrantScopeViolation + expr: increase(mana_ai_grant_scope_violations_total[5m]) > 0 + for: 0m + labels: + severity: critical + annotations: + summary: "mana-ai Mission Grant scope violation detected" + description: "mana-ai attempted to decrypt a record outside a Mission Grant's allowlist on table {{ $labels.table }}. Steady-state value MUST be 0. Investigate: (1) look for a resolver bug on the named table, (2) check recent grant issuance, (3) dump the most recent rows from mana_ai.decrypt_audit WHERE status='scope-violation'." + + # Chronic grant failures — expired TTLs are fine, but a flood of + # wrap-rejected / malformed / not-configured means the keypair is + # misconfigured or rotated without re-consent. 
+ - alert: ManaAIGrantSkipsHigh + expr: | + sum by (reason) (rate(mana_ai_grant_skips_total{reason!="expired"}[15m])) > 0.1 + for: 15m + labels: + severity: warning + annotations: + summary: "mana-ai grant skips trending high ({{ $labels.reason }})" + description: "mana-ai is skipping grants at {{ $value | humanize }}/s with reason={{ $labels.reason }}. Likely causes: MANA_AI_PRIVATE_KEY_PEM mis-set, keypair out of sync with mana-auth's public key, or client producing malformed grants." + + # Planner parse failures — too many means the prompt / LLM drifted. + - alert: ManaAIPlannerParseFailures + expr: | + sum(rate(mana_ai_parse_failures_total[10m])) + / (sum(rate(mana_ai_plans_produced_total[10m])) + sum(rate(mana_ai_parse_failures_total[10m])) + 0.0001) > 0.2 + for: 10m + labels: + severity: warning + annotations: + summary: "mana-ai planner parse-failure rate high" + description: "{{ $value | humanizePercentage }} of Planner responses failed to parse — prompt drift or LLM degradation likely." diff --git a/docs/plans/ai-mission-key-grant.md b/docs/plans/ai-mission-key-grant.md index 98001bf17..408ac0fa9 100644 --- a/docs/plans/ai-mission-key-grant.md +++ b/docs/plans/ai-mission-key-grant.md @@ -85,10 +85,11 @@ Goal: the user can give or withdraw a grant, and the UX is honest. ### Phase 4 — Rollout (1–2 days) -- [ ] **Feature flag**: `PUBLIC_AI_MISSION_GRANTS=false` default. Dogfood first (till only), then beta tier, then alpha. -- [ ] **Status page**: blackbox probe on `mana-ai` `/health` already exists; additionally alert on `mana_ai_grant_scope_violations_total > 0` (must never happen). -- [ ] **Runbook**: What to do if `MANA_AI_PRIVATE_KEY` leaks? → rotate the keypair, invalidate all grants (a simple `UPDATE aiMissions SET grant=null`), users get re-consent prompts. -- [ ] **Docs update**: [`apps/docs/src/content/docs/architecture/security.mdx`](../../apps/docs/src/content/docs/architecture/security.mdx) — new section "AI Mission Grants".
+- [x] **Feature flag**: `PUBLIC_AI_MISSION_GRANTS=false` default; dialog + audit tab are gated. Dogfood first (till only), then beta tier, then alpha. +- [x] **Alerting**: `ManaAIGrantScopeViolation` (critical, any increment), `ManaAIGrantSkipsHigh` (warning, non-expired skips), `ManaAIPlannerParseFailures` in `docker/prometheus/alerts.yml`. The status-page blackbox probe on `/health` is already running. +- [x] **Runbook**: initial keypair, keypair-leak procedure, and scope-violation response further down in this document. +- [x] **Docs update**: [`apps/docs/src/content/docs/architecture/security.mdx`](../../apps/docs/src/content/docs/architecture/security.mdx) — "AI Mission Grants" section, including extended threat-model rows. +- [ ] **Actually generate the keypair** on the Mac Mini and store it in secrets (not in this repo — out-of-band). --- @@ -131,6 +132,62 @@ Goal: the user can give or withdraw a grant, and the UX is honest. --- +## Runbook + +### Generate the initial keypair (once per deployment) + +```bash +# On the Mac Mini (or another secure workstation): +openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out mana-ai.priv.pem +openssl pkey -in mana-ai.priv.pem -pubout -out mana-ai.pub.pem + +# Export as env vars (Docker Compose env_file / secrets): +# MANA_AI_PRIVATE_KEY_PEM → mana-ai (never outside this service!) +# MANA_AI_PUBLIC_KEY_PEM → mana-auth + +# Then in the webapp's environment (read at runtime by hooks.server.ts): +# PUBLIC_AI_MISSION_GRANTS=true (enables dialog + audit tab) +``` + +Both services log at boot whether the feature is active; the `GET /health` status does not change. + +### "What to do if `MANA_AI_PRIVATE_KEY_PEM` leaks?" + +The private key is the only secret that can decrypt all active grants. If it leaks, an attacker **in possession of the encrypted grant blob + the encrypted records** can reconstruct the plaintext.
The key alone, without the encrypted records, is useless — but that is a thin line; when in doubt, rotate. + +Procedure: + +1. **Generate a new keypair** (see above). Never reuse the old one under any circumstances. +2. Swap **`MANA_AI_PRIVATE_KEY_PEM`** on `mana-ai` → restart the service. From then on, every existing grant fails to unwrap with `wrap-rejected` (the new private key does not match the old wrap). +3. Swap **`MANA_AI_PUBLIC_KEY_PEM`** on `mana-auth` → restart the service. +4. **Invalidate all existing grants** — they are wrapped with the old public key and no longer usable. In Postgres (`grant` is a reserved word, so it must be quoted): + ```sql + UPDATE aiMissions SET "grant" = NULL + WHERE user_id = '' AND "grant" IS NOT NULL; + ``` + (In the Mana model this lives as a `sync_changes` row on `appId='ai'/table='aiMissions'`; a silent migration in the `mana-sync` admin backend is simpler.) +5. **Document the audit trail**: when the leak was discovered, when the keys were swapped, when the grants were invalidated. Post-mortem in `docs/postmortems/`. +6. **Notify users**: missions stay active but run only in the foreground until the user grants access again. This is by design; the re-consent prompt appears automatically on the next mission edit. +7. **Check monitoring**: `rate(mana_ai_grant_skips_total{reason="wrap-rejected"}[5m])` should spike briefly after step 2 (old grants) and return to 0 once all of them have been removed in step 4. + +### Responding to a scope-violation alert + +The Prometheus alert `ManaAIGrantScopeViolation` (critical, see `docker/prometheus/alerts.yml`) fires when `increase(mana_ai_grant_scope_violations_total[5m]) > 0`. The steady-state value must be 0 — every firing is either a bug or an attack. + +1. Read the most recent scope violations: + ```sql + SELECT * FROM mana_ai.decrypt_audit + WHERE status = 'scope-violation' + ORDER BY ts DESC LIMIT 20; + ``` +2. Check `record_id`: does the record actually belong to the user? If not → compromised mission-grant issuance; lock the user's account. +3.
If yes: resolver bug. Check `services/mana-ai/src/db/resolvers/encrypted.ts` — the HKDF binding should make this check redundant. If the runtime check fires, something in the derivation is wrong. +4. Pause the mission temporarily: + ```sql + UPDATE aiMissions SET state = 'paused', "grant" = NULL + WHERE id = ''; + ``` + ## Non-goals - **Zero-knowledge users do not get this.** They stay with the foreground runner. If they want autonomy, they have to turn ZK off — that is the decision ZK entails.
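
The runbook's rotation logic (once `mana-ai` runs with a new private key, every grant wrapped under the old public key fails with `wrap-rejected`) can be sketched as follows. This is a minimal illustration, assuming RSA-OAEP with SHA-256 via `node:crypto` and PEM keys in the SPKI/PKCS#8 formats that the runbook's `openssl genpkey` and `openssl pkey -pubout` commands produce. The padding and hash that mana-auth and mana-ai actually use are not shown in this patch, and `wrapGrantKey`/`unwrapGrantKey` are hypothetical names.

```typescript
import {
  constants,
  generateKeyPairSync,
  privateDecrypt,
  publicEncrypt,
  randomBytes,
} from "node:crypto";

type UnwrapResult =
  | { ok: true; key: Buffer }
  | { ok: false; reason: "wrap-rejected" };

// Generate a keypair in the same PEM formats as the runbook's openssl commands:
// SPKI ("BEGIN PUBLIC KEY") and PKCS#8 ("BEGIN PRIVATE KEY").
function generatePemKeyPair(): { publicKey: string; privateKey: string } {
  return generateKeyPairSync("rsa", {
    modulusLength: 2048,
    publicKeyEncoding: { type: "spki", format: "pem" },
    privateKeyEncoding: { type: "pkcs8", format: "pem" },
  });
}

// Hypothetical mana-auth side: wrap a derived grant key with the public key.
function wrapGrantKey(publicKeyPem: string, grantKey: Buffer): Buffer {
  return publicEncrypt(
    { key: publicKeyPem, padding: constants.RSA_PKCS1_OAEP_PADDING, oaepHash: "sha256" },
    grantKey,
  );
}

// Hypothetical mana-ai side: unwrap with the private key. A mismatched key
// makes OAEP decoding throw, which this sketch reports as "wrap-rejected".
function unwrapGrantKey(privateKeyPem: string, wrapped: Buffer): UnwrapResult {
  try {
    const key = privateDecrypt(
      { key: privateKeyPem, padding: constants.RSA_PKCS1_OAEP_PADDING, oaepHash: "sha256" },
      wrapped,
    );
    return { ok: true, key };
  } catch {
    return { ok: false, reason: "wrap-rejected" };
  }
}

// Rotation: a grant wrapped under the old keypair cannot be unwrapped by the new one.
const oldPair = generatePemKeyPair();
const newPair = generatePemKeyPair();
const grantKey = randomBytes(32);
const wrapped = wrapGrantKey(oldPair.publicKey, grantKey);

const withOld = unwrapGrantKey(oldPair.privateKey, wrapped);
const withNew = unwrapGrantKey(newPair.privateKey, wrapped);
console.log(withOld.ok && withOld.key.equals(grantKey)); // true
console.log(withNew.ok ? "unexpected" : withNew.reason); // wrap-rejected
```

At the crypto layer a wrong key and a corrupted blob fail the same way, so mapping every decrypt error to `wrap-rejected` is a simplification of this sketch; the real service also distinguishes `malformed` and `not-configured`.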