From bb3da78d5c894cafaf5466691a841f80a12f6c51 Mon Sep 17 00:00:00 2001
From: Till JS <tills95@gmail.com>
Date: Wed, 15 Apr 2026 14:02:47 +0200
Subject: [PATCH] =?UTF-8?q?feat(ai):=20Mission=20Grant=20rollout=20gating?=
 =?UTF-8?q?=20=E2=80=94=20flag,=20alerts,=20runbook,=20user=20docs?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Phase 4: everything needed to enable the Mission Key-Grant feature
safely per deployment. No new grant functionality, purely operational
plumbing; with the flag off (the default) the existing grant UI is hidden.

- PUBLIC_AI_MISSION_GRANTS feature flag (default off). hooks.server.ts
  injects window.__PUBLIC_AI_MISSION_GRANTS__, api/config.ts exposes
  isMissionGrantsEnabled(). Grant UI (dialog + status box) and the
  Workbench "Datenzugriff" tab both hide when the flag is off.
- PUBLIC_MANA_AI_URL added to the injection set so the webapp can reach
  the new audit endpoint from production.
- Prometheus alerts (new mana_ai_alerts group):
  - ManaAIServiceDown (warning, 2m)
  - ManaAIGrantScopeViolation (critical, 0m) — MUST stay at 0; any
    increment pages immediately
  - ManaAIGrantSkipsHigh (warning, 15m) — flags keypair drift
  - ManaAIPlannerParseFailures (warning, 10m) — prompt/LLM drift
- Runbook in docs/plans/ai-mission-key-grant.md: initial keypair gen,
  leak-response procedure (rotate + invalidate all grants + audit),
  scope-violation triage.
- User-facing doc in apps/docs security.mdx: new "AI Mission Grants"
  section listing the hard constraints (ZK users blocked, scope changes
  invalidate cryptographically, grants expire, revocation is one click,
  the runner is read-only) plus an honest threat-model comparison column
  showing where grants shift the tradeoff.

Remaining rollout steps (not code): generate the keypair on the Mac Mini,
provision MANA_AI_PRIVATE_KEY_PEM + MANA_AI_PUBLIC_KEY_PEM via Docker
secrets, then flip PUBLIC_AI_MISSION_GRANTS=true, starting with till only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 .../content/docs/architecture/security.mdx    | 52 +++++++++++++++
 apps/mana/apps/web/src/hooks.server.ts        |  8 +++
 apps/mana/apps/web/src/lib/api/config.ts      | 16 +++++
 .../lib/modules/ai-missions/ListView.svelte   |  4 +-
 .../lib/modules/ai-workbench/ListView.svelte  | 24 ++++---
 docker/prometheus/alerts.yml                  | 50 ++++++++++++++
 docs/plans/ai-mission-key-grant.md            | 65 +++++++++++++++++--
 7 files changed, 204 insertions(+), 15 deletions(-)

diff --git a/apps/docs/src/content/docs/architecture/security.mdx b/apps/docs/src/content/docs/architecture/security.mdx
index c2fba61b2..823bc3a4e 100644
--- a/apps/docs/src/content/docs/architecture/security.mdx
+++ b/apps/docs/src/content/docs/architecture/security.mdx
@@ -184,6 +184,58 @@ Each row carries the IP address, user-agent, HTTP status code, and a free-form c
 | User loses recovery code | n/a | ❌ Data lost |
 | User loses password but vault is in ZK mode | Recovery via password reset | ❌ Data lost (vault is keyed to recovery code) |
 
+## AI Mission Grants (opt-in, per mission)
+
+By default, AI missions that depend on encrypted data (notes, tasks,
+calendar events, journal entries, your Kontext document) run **only
+when your browser tab is open**: the background runner on our server
+only sees ciphertext and holds no key to read it.
+
+Some missions are more useful when they run continuously, even while
+you're offline. For those, you can opt in — per mission, not globally
+— to a **Mission Key-Grant**. Here is exactly what that does:
+
+1. A fresh key is derived that is bound to:
+   - The mission's ID.
+   - The specific table names referenced.
+   - The specific record IDs referenced.
+2. The derived key is wrapped with the mana-ai service's public key
+   and attached to the mission record.
+3. When the mana-ai runner ticks for that mission, it unwraps the
+   key in memory, decrypts **only the allowlisted records**, plans
+   the next iteration, and forgets the key at the end of the tick.
+4. Every decrypt is logged. You see the full log under **Workbench
+   → Datenzugriff**.
+
+Hard constraints — enforced by the code, not by policy:
+
+- **Zero-knowledge users cannot issue grants.** The mana-auth server
+  has no usable master key in ZK mode; the endpoint refuses.
+- **Scope changes invalidate the key cryptographically.** Add a new
+  record to a mission → the derived key is different → the existing
+  grant stops working → you're prompted to re-consent. It is not
+  possible for the runner to "silently expand" its scope.
+- **Grants expire.** Default lifetime is 7 days, renewed on every
+  successful run. Missions that go idle lose their grant automatically;
+  you re-consent on the next edit.
+- **Revocation is one click.** The lock icon in the Workbench removes
+  the grant; the mission keeps its history but stops running
+  server-side until you re-grant.
+- **The runner never writes under a grant** — it only reads. All
+  changes still go through the normal proposal-approve flow you
+  control.
+
+| Threat | Standard | With a Mission Grant | Zero-Knowledge |
+|--------|----------|----------------------|----------------|
+| Mana operator reads an unrelated record of the same user | ⚠️ Could decrypt with KEK | ⚠️ Could decrypt with KEK; the grant key itself cannot (it is scoped) | ✅ Cannot |
+| Mana operator reads the granted records of the grant-enabled mission | ⚠️ Could decrypt with KEK | ⚠️ Could decrypt with the grant key + record ciphertext | ✅ Cannot |
+| Court order against Mana for the granted-mission records | ⚠️ Could be compelled | ⚠️ Could be compelled (while grant is active) | ✅ Mana physically cannot comply |
+| Runner RAM dump during the 60 s tick | ✅ n/a (runner holds no key) | ⚠️ Could expose the grant key for one tick window | ✅ n/a |
+
+The tradeoff is deliberate: you exchange a small, scoped privacy
+reduction for autonomy on one mission. Missions without a grant keep
+the full standard / ZK guarantees.
+
 ## Implementation references
 
 For the architectural deep dive, code locations, and the complete rollout history (Phases 1–9 + the backlog sweep), see [`DATA_LAYER_AUDIT.md`](https://github.com/mana-how/mana-monorepo/blob/main/apps/mana/apps/web/src/lib/data/DATA_LAYER_AUDIT.md).
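Note on the security.mdx section: the "Scope changes invalidate the key cryptographically" constraint holds because the scope is an input to the key derivation, and the runbook in `docs/plans/ai-mission-key-grant.md` names HKDF as that binding. Below is a minimal sketch of the idea with Node's `hkdfSync`, assuming a hypothetical `GrantScope` shape, a made-up salt and context label, and random bytes standing in for the user's key material; the real derivation in mana-auth is not part of this patch. The canonical scope is hashed first because Node rejects an HKDF `info` longer than 1024 bytes, which a long record list could exceed.

```typescript
import { createHash, hkdfSync, randomBytes } from 'node:crypto';

// Hypothetical scope descriptor; the real grant payload format is not shown in this patch.
interface GrantScope {
  missionId: string;
  tables: string[];
  recordIds: string[];
}

// Canonicalise the scope so the same logical scope always yields the same bytes,
// regardless of input order. JSON encoding avoids delimiter collisions.
function scopeDigest(scope: GrantScope): Buffer {
  const canonical = JSON.stringify({
    missionId: scope.missionId,
    tables: [...scope.tables].sort(),
    recordIds: [...scope.recordIds].sort(),
  });
  // Hash it: Node caps HKDF `info` at 1024 bytes.
  return createHash('sha256').update(canonical, 'utf8').digest();
}

// Derive a 256-bit key bound to the scope. `masterKey` stands in for the user's
// key material; the salt and the context label are made-up values for this sketch.
function deriveGrantKey(masterKey: Buffer, scope: GrantScope): Buffer {
  const salt = Buffer.from('mana-ai-grant-salt', 'utf8'); // hypothetical
  const info = Buffer.concat([Buffer.from('mana-ai/mission-grant/v1:', 'utf8'), scopeDigest(scope)]);
  return Buffer.from(hkdfSync('sha256', masterKey, salt, info, 32));
}

// Usage: reordering the same scope yields the same key; adding a record does not.
const master = randomBytes(32); // stand-in for the user's key material
const scope: GrantScope = { missionId: 'm-1', tables: ['notes'], recordIds: ['n-1', 'n-2'] };
const original = deriveGrantKey(master, scope);
const sameScope = deriveGrantKey(master, { ...scope, recordIds: ['n-2', 'n-1'] });
const grown = deriveGrantKey(master, { ...scope, recordIds: ['n-1', 'n-2', 'n-3'] });

console.log(original.equals(sameScope)); // true
console.log(original.equals(grown)); // false
```

Sorting makes the derivation order-insensitive; any real scheme also needs a versioned canonical encoding that every client agrees on, otherwise two clients could derive different keys for the same scope.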
diff --git a/apps/mana/apps/web/src/hooks.server.ts b/apps/mana/apps/web/src/hooks.server.ts
index 002de6655..85aeb096e 100644
--- a/apps/mana/apps/web/src/hooks.server.ts
+++ b/apps/mana/apps/web/src/hooks.server.ts
@@ -47,6 +47,12 @@ const PUBLIC_MANA_API_URL_CLIENT =
 	process.env.PUBLIC_MANA_API_URL_CLIENT || process.env.PUBLIC_MANA_API_URL || '';
 const PUBLIC_MANA_CREDITS_URL_CLIENT =
 	process.env.PUBLIC_MANA_CREDITS_URL_CLIENT || process.env.PUBLIC_MANA_CREDITS_URL || '';
+const PUBLIC_MANA_AI_URL_CLIENT =
+	process.env.PUBLIC_MANA_AI_URL_CLIENT || process.env.PUBLIC_MANA_AI_URL || '';
+// Feature flag for the Mission Key-Grant UI (server-side execution of
+// encrypted missions). Default off — flip to 'true' per deployment once
+// the MANA_AI_PUBLIC/PRIVATE_KEY_PEM pair is provisioned on both services.
+const PUBLIC_AI_MISSION_GRANTS = process.env.PUBLIC_AI_MISSION_GRANTS === 'true' ? 'true' : 'false';
 
 // Map of app subdomains to internal paths
 const APP_SUBDOMAINS = new Set([
@@ -126,6 +132,8 @@ window.__PUBLIC_MANA_LLM_URL__ = ${JSON.stringify(PUBLIC_MANA_LLM_URL_CLIENT)};
 window.__PUBLIC_MANA_EVENTS_URL__ = ${JSON.stringify(PUBLIC_MANA_EVENTS_URL_CLIENT)};
 window.__PUBLIC_MANA_API_URL__ = ${JSON.stringify(PUBLIC_MANA_API_URL_CLIENT)};
 window.__PUBLIC_MANA_CREDITS_URL__ = ${JSON.stringify(PUBLIC_MANA_CREDITS_URL_CLIENT)};
+window.__PUBLIC_MANA_AI_URL__ = ${JSON.stringify(PUBLIC_MANA_AI_URL_CLIENT)};
+window.__PUBLIC_AI_MISSION_GRANTS__ = ${JSON.stringify(PUBLIC_AI_MISSION_GRANTS)};
 window.__PUBLIC_GLITCHTIP_DSN__ = ${JSON.stringify(PUBLIC_GLITCHTIP_DSN)};
 </script>`;
 			return injectUmamiAnalytics(html.replace('<head>', `<head>${envScript}`));
diff --git a/apps/mana/apps/web/src/lib/api/config.ts b/apps/mana/apps/web/src/lib/api/config.ts
index 2e30ba4df..d1b7b1414 100644
--- a/apps/mana/apps/web/src/lib/api/config.ts
+++ b/apps/mana/apps/web/src/lib/api/config.ts
@@ -74,6 +74,22 @@ export function getManaAiUrl(): string {
 	return process.env.PUBLIC_MANA_AI_URL || 'http://localhost:3066';
 }
 
+/**
+ * Feature flag for the AI Mission Key-Grant UI. When false, the consent
+ * dialog, the "Server-Zugriff" box and the Workbench "Datenzugriff" tab
+ * are hidden, even on missions with encrypted inputs; those missions stay
+ * foreground-only. Enable per deployment once the MANA_AI_PUBLIC/PRIVATE_KEY_PEM
+ * keypair is provisioned on both mana-auth and mana-ai.
+ */
+export function isMissionGrantsEnabled(): boolean {
+	if (browser && typeof window !== 'undefined') {
+		const flag = (window as unknown as { __PUBLIC_AI_MISSION_GRANTS__?: string })
+			.__PUBLIC_AI_MISSION_GRANTS__;
+		return flag === 'true';
+	}
+	return process.env.PUBLIC_AI_MISSION_GRANTS === 'true';
+}
+
 /**
  * Get the mana-mail service URL.
  * Hosts mail threads, send, labels, accounts.
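Note on the flag plumbing: `hooks.server.ts` normalises `PUBLIC_AI_MISSION_GRANTS` to the string `'true'` or `'false'` before injecting it, and `isMissionGrantsEnabled()` treats only the exact string `'true'` as on. The sketch below restates that logic as pure functions so the edge cases are visible; `normaliseFlag`, `injectedLine` and `resolveMissionGrantsFlag` are hypothetical names, with SvelteKit's `browser` check and the `window`/`process.env` lookups passed in as arguments.

```typescript
// Hypothetical pure restatement of the flag logic in hooks.server.ts + config.ts.

// Server side: anything other than exactly 'true' is normalised to 'false'
// before being injected as window.__PUBLIC_AI_MISSION_GRANTS__.
function normaliseFlag(raw: string | undefined): 'true' | 'false' {
  return raw === 'true' ? 'true' : 'false';
}

// The line the server writes into the <head> script. JSON.stringify turns the
// value into a quoted JS string literal, so the client sees a string, not a boolean.
function injectedLine(raw: string | undefined): string {
  return `window.__PUBLIC_AI_MISSION_GRANTS__ = ${JSON.stringify(normaliseFlag(raw))};`;
}

// Read side: in the browser use the injected global, during SSR the env var.
function resolveMissionGrantsFlag(
  isBrowser: boolean,
  injected: string | undefined,
  serverEnv: string | undefined,
): boolean {
  return isBrowser ? injected === 'true' : serverEnv === 'true';
}

console.log(injectedLine('true')); // window.__PUBLIC_AI_MISSION_GRANTS__ = "true";
console.log(injectedLine('TRUE')); // window.__PUBLIC_AI_MISSION_GRANTS__ = "false";
console.log(resolveMissionGrantsFlag(true, 'true', undefined)); // true
console.log(resolveMissionGrantsFlag(true, undefined, 'true')); // false: the browser ignores process.env
console.log(resolveMissionGrantsFlag(false, undefined, '1')); // false: only the literal 'true' enables
```

The comparison is case-sensitive and fails closed: `TRUE`, `1` or an unset variable all leave the feature off. Because SSR and the hydrated client apply the same strict check, both render the same UI.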
diff --git a/apps/mana/apps/web/src/lib/modules/ai-missions/ListView.svelte b/apps/mana/apps/web/src/lib/modules/ai-missions/ListView.svelte
index 1fa229b7c..117e51e75 100644
--- a/apps/mana/apps/web/src/lib/modules/ai-missions/ListView.svelte
+++ b/apps/mana/apps/web/src/lib/modules/ai-missions/ListView.svelte
@@ -19,6 +19,7 @@
 	import { productionDeps } from '$lib/data/ai/missions/setup';
 	import MissionInputPicker from '$lib/components/ai/MissionInputPicker.svelte';
 	import MissionGrantDialog from '$lib/components/ai/MissionGrantDialog.svelte';
+	import { isMissionGrantsEnabled } from '$lib/api/config';
 	import type { Mission, MissionCadence, MissionInputRef } from '$lib/data/ai/missions/types';
 
 	const missions = $derived(useMissions());
@@ -106,6 +107,7 @@
 	function hasEncryptedInputs(m: Mission): boolean {
 		return m.inputs.some((i) => ENCRYPTED_SERVER_TABLES.has(i.table));
 	}
+	const grantsEnabled = $derived(isMissionGrantsEnabled());
 	function grantStatus(m: Mission): 'none' | 'active' | 'expired' {
 		if (!m.grant) return 'none';
 		return Date.parse(m.grant.expiresAt) < Date.now() ? 'expired' : 'active';
@@ -305,7 +307,7 @@
 			</details>
 		{/if}
 
-		{#if hasEncryptedInputs(selected)}
+		{#if grantsEnabled && hasEncryptedInputs(selected)}
 			<section class="grant-box">
 				<div class="grant-head">
 					<span class="grant-title">🔑 Server-Zugriff</span>
diff --git a/apps/mana/apps/web/src/lib/modules/ai-workbench/ListView.svelte b/apps/mana/apps/web/src/lib/modules/ai-workbench/ListView.svelte
index 9b18b0bb5..b06a83a44 100644
--- a/apps/mana/apps/web/src/lib/modules/ai-workbench/ListView.svelte
+++ b/apps/mana/apps/web/src/lib/modules/ai-workbench/ListView.svelte
@@ -8,6 +8,7 @@
 	import { useMissions } from '$lib/data/ai/missions/queries';
 	import { revertIteration } from '$lib/data/ai/revert/revert-iteration';
 	import { fetchDecryptAudit, type AuditRow } from '$lib/data/ai/audit/queries';
+	import { isMissionGrantsEnabled } from '$lib/api/config';
 	import type { DomainEvent } from '$lib/data/events/types';
 
 	let moduleFilter = $state<string | null>(null);
@@ -41,6 +42,7 @@
 	}
 
 	// ── Tab switcher: timeline ↔ decrypt audit ─────────────
+	const grantsEnabled = $derived(isMissionGrantsEnabled());
 	let tab = $state<'timeline' | 'audit'>('timeline');
 	let auditRows = $state<AuditRow[]>([]);
 	let auditLoading = $state(false);
@@ -110,16 +112,18 @@
 		>
 			Timeline
 		</button>
-		<button
-			type="button"
-			role="tab"
-			class="tab"
-			class:tab-active={tab === 'audit'}
-			aria-selected={tab === 'audit'}
-			onclick={() => (tab = 'audit')}
-		>
-			Datenzugriff
-		</button>
+		{#if grantsEnabled}
+			<button
+				type="button"
+				role="tab"
+				class="tab"
+				class:tab-active={tab === 'audit'}
+				aria-selected={tab === 'audit'}
+				onclick={() => (tab = 'audit')}
+			>
+				Datenzugriff
+			</button>
+		{/if}
 	</div>
 
 	<div class="filters">
diff --git a/docker/prometheus/alerts.yml b/docker/prometheus/alerts.yml
index e47506c58..104dab6ba 100644
--- a/docker/prometheus/alerts.yml
+++ b/docker/prometheus/alerts.yml
@@ -465,3 +465,53 @@ groups:
         annotations:
           summary: "LLM responses are slow"
           description: "LLM p95 latency is {{ $value | humanizeDuration }}."
+
+  - name: mana_ai_alerts
+    rules:
+      # mana-ai background runner down
+      - alert: ManaAIServiceDown
+        expr: up{job="mana-ai"} == 0
+        for: 2m
+        labels:
+          severity: warning
+        annotations:
+          summary: "mana-ai background runner is down"
+          description: "mana-ai has been down for 2+ minutes. Missions fall back to the browser-only Runner — users with closed tabs stop receiving proposals."
+
+      # Grant scope violation — MUST remain at 0 in steady state.
+      # Any increment is a serious signal: either a runtime bug bypassed
+      # the cryptographic scope binding, or a compromised service tried
+      # to decrypt outside its allowlist. Page on first occurrence.
+      - alert: ManaAIGrantScopeViolation
+        expr: increase(mana_ai_grant_scope_violations_total[5m]) > 0
+        for: 0m
+        labels:
+          severity: critical
+        annotations:
+          summary: "mana-ai Mission Grant scope violation detected"
+          description: "mana-ai attempted to decrypt a record outside a Mission Grant's allowlist on table {{ $labels.table }}. Steady-state value MUST be 0. Investigate: (1) look for a resolver bug on the named table, (2) check recent grant issuance, (3) dump the most recent rows from mana_ai.decrypt_audit WHERE status='scope-violation'."
+
+      # Chronic grant failures — expired TTLs are fine, but a flood of
+      # wrap-rejected / malformed / not-configured means the keypair is
+      # misconfigured or rotated without re-consent.
+      - alert: ManaAIGrantSkipsHigh
+        expr: |
+          sum by (reason) (rate(mana_ai_grant_skips_total{reason!="expired"}[15m])) > 0.1
+        for: 15m
+        labels:
+          severity: warning
+        annotations:
+          summary: "mana-ai grant skips trending high ({{ $labels.reason }})"
+          description: "mana-ai is skipping grants at {{ $value | humanize }}/s with reason={{ $labels.reason }}. Likely causes: MANA_AI_PRIVATE_KEY_PEM mis-set, keypair out of sync with mana-auth's public key, or client producing malformed grants."
+
+      # Planner parse failures — too many means the prompt / LLM drifted.
+      - alert: ManaAIPlannerParseFailures
+        expr: |
+          sum(rate(mana_ai_parse_failures_total[10m]))
+          / (sum(rate(mana_ai_plans_produced_total[10m])) + sum(rate(mana_ai_parse_failures_total[10m])) + 0.0001) > 0.2
+        for: 10m
+        labels:
+          severity: warning
+        annotations:
+          summary: "mana-ai planner parse-failure rate high"
+          description: "{{ $value | humanizePercentage }} of Planner responses failed to parse — prompt drift or LLM degradation likely."
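Note on the alert thresholds: they are easier to judge as concrete numbers. The sketch restates two of the expressions in TypeScript (hypothetical helper names; Prometheus evaluates the real rules). The `+ 0.0001` in the planner ratio makes an idle planner evaluate to 0 instead of NaN, and because `rate()` is a per-second average over its window, a skip rate above 0.1/s over 15 minutes means more than 90 non-expired skips in that window.

```typescript
// Hypothetical mirrors of two rule expressions in docker/prometheus/alerts.yml.

// ManaAIPlannerParseFailures:
//   failures / (plans + failures + 0.0001) > 0.2   (all per-second rates over 10m)
function parseFailureRatio(failuresPerSec: number, plansPerSec: number, epsilon = 0.0001): number {
  return failuresPerSec / (plansPerSec + failuresPerSec + epsilon);
}

// ManaAIGrantSkipsHigh: rate() is the per-second average over the window, so
// a threshold rate corresponds to (rate * window length) events in that window.
function skipsInWindow(ratePerSec: number, windowSeconds: number): number {
  return ratePerSec * windowSeconds;
}

console.log(parseFailureRatio(0, 0)); // 0 (no NaN while the planner is idle)
console.log(parseFailureRatio(1, 3) > 0.2); // true: about 25% of responses failed to parse
console.log(parseFailureRatio(1, 9) > 0.2); // false: about 10%
console.log(skipsInWindow(0.1, 15 * 60)); // 90
```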
diff --git a/docs/plans/ai-mission-key-grant.md b/docs/plans/ai-mission-key-grant.md
index 98001bf17..408ac0fa9 100644
--- a/docs/plans/ai-mission-key-grant.md
+++ b/docs/plans/ai-mission-key-grant.md
@@ -85,10 +85,11 @@ Ziel: User kann Grant geben/zurückziehen, UX ist ehrlich.
 
 ### Phase 4 — Rollout (1–2 Tage)
 
-- [ ] **Feature-Flag**: `PUBLIC_AI_MISSION_GRANTS=false` default. Dogfood zuerst (till only), dann beta-tier, dann alpha.
-- [ ] **Status-Page**: blackbox-probe auf `mana-ai` `/health` existiert schon; zusätzlich Alerting auf `mana_ai_grant_scope_violations_total > 0` (darf nie vorkommen).
-- [ ] **Runbook**: Was tun wenn `MANA_AI_PRIVATE_KEY` leaked? → Keypair rotieren, alle Grants invalidieren (simples `UPDATE aiMissions SET grant=null`), User bekommen Re-Consent-Prompts.
-- [ ] **Docs-Update**: [`apps/docs/src/content/docs/architecture/security.mdx`](../../apps/docs/src/content/docs/architecture/security.mdx) — neuer Abschnitt "AI Mission Grants".
+- [x] **Feature flag**: `PUBLIC_AI_MISSION_GRANTS=false` by default; the dialog and the audit tab are gated. Dogfood first (till only), then beta tier, then alpha.
+- [x] **Alerting**: `ManaAIGrantScopeViolation` (critical, any increment), `ManaAIGrantSkipsHigh` (warning, non-expired skips), `ManaAIPlannerParseFailures` in `docker/prometheus/alerts.yml`. The status-page blackbox probe on `/health` is already running.
+- [x] **Runbook**: initial keypair generation, keypair-leak procedure and scope-violation response, further down in this document.
+- [x] **Docs update**: [`apps/docs/src/content/docs/architecture/security.mdx`](../../apps/docs/src/content/docs/architecture/security.mdx): section "AI Mission Grants", including extended threat-model rows.
+- [ ] **Actually generate the keypair** on the Mac Mini and store it in secrets (not in this repo; out-of-band).
 
 ---
 
@@ -131,6 +132,62 @@ Ziel: User kann Grant geben/zurückziehen, UX ist ehrlich.
 
 ---
 
+## Runbook
+
+### Initial keypair generation (once per deployment)
+
+```bash
+# On the Mac Mini (or another secure workstation):
+openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out mana-ai.priv.pem
+openssl pkey -in mana-ai.priv.pem -pubout -out mana-ai.pub.pem
+
+# Provide as env vars (Docker Compose env_file / secrets):
+#   MANA_AI_PRIVATE_KEY_PEM  → mana-ai  (never outside this service!)
+#   MANA_AI_PUBLIC_KEY_PEM   → mana-auth
+
+# Then in the webapp's environment (read by hooks.server.ts at server start):
+#   PUBLIC_AI_MISSION_GRANTS=true  (enables the grant dialog + audit tab)
+```
+
+Both services log at boot whether the feature is active; the `GET /health` status does not change.
+
+### "What to do if `MANA_AI_PRIVATE_KEY_PEM` leaks?"
+
+The private key is the only secret that can unwrap every active grant. If it leaks, an attacker **who also holds the encrypted grant blob and the encrypted records** can reconstruct the plaintext. Without the encrypted records the key alone is useless, but that is a thin boundary; when in doubt, rotate.
+
+Procedure:
+
+1. **Generate a new keypair** (see above). Never reuse the old one.
+2. **Replace `MANA_AI_PRIVATE_KEY_PEM`** on `mana-ai` → restart the service. From then on, every existing grant fails to unwrap with `wrap-rejected` (the new private key does not match the old wrap).
+3. **Replace `MANA_AI_PUBLIC_KEY_PEM`** on `mana-auth` → restart the service.
+4. **Invalidate all existing grants**: they were wrapped with the old public key and are now useless. In Postgres:
+   ```sql
+   UPDATE aiMissions SET "grant" = NULL
+   WHERE "grant" IS NOT NULL; -- all users; GRANT is a reserved word, hence the quotes
+   ```
+   (In the Mana model this lives as a `sync_changes` row on `appId='ai'/table='aiMissions'`; a silent migration through the `mana-sync` admin backend is simpler.)
+5. **Document the audit trail**: when the leak was discovered, when the keys were swapped, when the grants were invalidated. Write a post-mortem in `docs/postmortems/`.
+6. **Notify users**: missions stay active but run only in the foreground until the user grants access again. This is by design; the re-consent prompt appears automatically on the next mission edit.
+7. **Check monitoring**: the rate of `mana_ai_grant_skips_total{reason="wrap-rejected"}` should spike briefly after step 2 (old grants) and drop back to 0 once step 4 has removed them all (the counter itself only grows, so watch its `rate()`).
+
+### Responding to a scope-violation alert
+
+The Prometheus alert `ManaAIGrantScopeViolation` (critical, see `docker/prometheus/alerts.yml`) fires as soon as `mana_ai_grant_scope_violations_total` increases (`increase(...[5m]) > 0`). The steady-state value must be 0; every firing is either a bug or an attack.
+
+1. Read the most recent scope violations:
+   ```sql
+   SELECT * FROM mana_ai.decrypt_audit
+   WHERE status = 'scope-violation'
+   ORDER BY ts DESC LIMIT 20;
+   ```
+2. Check `record_id`: does the record actually belong to the user? If not → compromised mission-grant issuance; block the user.
+3. If it does: resolver bug. Check `services/mana-ai/src/db/resolvers/encrypted.ts`. The HKDF binding should make this runtime check redundant, so if it fires, something in the key derivation is wrong.
+4. Pause the mission temporarily:
+   ```sql
+   UPDATE aiMissions SET state = 'paused', "grant" = NULL
+   WHERE id = '<missionId>';
+   ```
+
 ## Nicht-Ziele
 
 - **Zero-Knowledge-User bekommen das nicht.** Die bleiben beim Foreground-Runner. Wenn sie Autonomie wollen, müssen sie ZK abschalten — das ist die Entscheidung die ZK bedeutet.
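Note on the leak runbook: step 2 relies on a wrapped grant becoming unusable once `mana-ai` runs with a different private key. The sketch below demonstrates that with Node's `crypto` module. It assumes the grant key is wrapped with RSA-OAEP (SHA-256) under the RSA keypair from the runbook; the patch does not show the actual wrap scheme, and `wrapGrantKey`/`unwrapGrantKey` are hypothetical helpers. Mapping the decryption error to `wrap-rejected` mirrors the skip reason the runbook monitors. The generated PEM formats (SPKI public, PKCS#8 private) match what `openssl pkey -pubout` and `openssl genpkey` produce.

```typescript
import { constants, generateKeyPairSync, privateDecrypt, publicEncrypt, randomBytes } from 'node:crypto';

// Assumption: grants are wrapped with RSA-OAEP/SHA-256; the real scheme is not in this patch.
const OAEP = { padding: constants.RSA_PKCS1_OAEP_PADDING, oaepHash: 'sha256' };

type UnwrapResult = { ok: true; key: Buffer } | { ok: false; reason: 'wrap-rejected' };

// Hypothetical helper: what mana-auth would do with MANA_AI_PUBLIC_KEY_PEM.
function wrapGrantKey(publicKeyPem: string, grantKey: Buffer): Buffer {
  return publicEncrypt({ key: publicKeyPem, ...OAEP }, grantKey);
}

// Hypothetical helper: what mana-ai would do with MANA_AI_PRIVATE_KEY_PEM.
function unwrapGrantKey(privateKeyPem: string, wrapped: Buffer): UnwrapResult {
  try {
    return { ok: true, key: privateDecrypt({ key: privateKeyPem, ...OAEP }, wrapped) };
  } catch {
    // Wrong private key or tampered blob: OAEP decoding fails and Node throws.
    return { ok: false, reason: 'wrap-rejected' };
  }
}

// Same formats as `openssl genpkey` (PKCS#8) and `openssl pkey -pubout` (SPKI).
function newKeypair(): { publicKey: string; privateKey: string } {
  return generateKeyPairSync('rsa', {
    modulusLength: 2048,
    publicKeyEncoding: { type: 'spki', format: 'pem' },
    privateKeyEncoding: { type: 'pkcs8', format: 'pem' },
  });
}

const oldPair = newKeypair();
const rotatedPair = newKeypair();
const grantKey = randomBytes(32);
const wrapped = wrapGrantKey(oldPair.publicKey, grantKey);

const beforeRotation = unwrapGrantKey(oldPair.privateKey, wrapped);
const afterRotation = unwrapGrantKey(rotatedPair.privateKey, wrapped);
console.log(beforeRotation.ok && beforeRotation.key.equals(grantKey)); // true
console.log(afterRotation.ok ? 'unwrapped' : afterRotation.reason); // wrap-rejected
```

Grants issued after re-consent are wrapped with the new public key and unwrap normally, which is why step 4 only has to clear the old ones.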