feat(ai): Mission Grant rollout gating — flag, alerts, runbook, user docs

Phase 4 — everything needed to flip the Mission Key-Grant feature on
safely per deployment. No new behaviour; purely operational plumbing.

- PUBLIC_AI_MISSION_GRANTS feature flag (default off). hooks.server.ts
  injects window.__PUBLIC_AI_MISSION_GRANTS__, api/config.ts exposes
  isMissionGrantsEnabled(). Grant UI (dialog + status box) and the
  Workbench "Datenzugriff" tab both hide when the flag is off.
- PUBLIC_MANA_AI_URL added to the injection set so the webapp can reach
  the new audit endpoint from production.
- Prometheus alerts (new mana_ai_alerts group):
  - ManaAIServiceDown (warning, 2m)
  - ManaAIGrantScopeViolation (critical, 0m) — MUST stay at 0; any
    increment pages immediately
  - ManaAIGrantSkipsHigh (warning, 15m) — flags keypair drift
  - ManaAIPlannerParseFailures (warning, 10m) — prompt/LLM drift
- Runbook in docs/plans/ai-mission-key-grant.md: initial keypair gen,
  leak-response procedure (rotate + invalidate all grants + audit),
  scope-violation triage.
- User-facing doc in apps/docs security.mdx: new "AI Mission Grants"
  section with the hard constraints (ZK users blocked, scope changes
  invalidate cryptographically, grants expire, revocation is one click,
  runner is read-only) plus an honest threat-model comparison column
  showing where grants shift the tradeoff.

Rollout remaining (not code): generate keypair on Mac Mini, provision
MANA_AI_PRIVATE_KEY_PEM + MANA_AI_PUBLIC_KEY_PEM via Docker secrets,
flip PUBLIC_AI_MISSION_GRANTS=true starting with till-only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Till JS 2026-04-15 14:02:47 +02:00
parent 74bbfda212
commit bb3da78d5c
7 changed files with 204 additions and 15 deletions


@ -184,6 +184,58 @@ Each row carries the IP address, user-agent, HTTP status code, and a free-form c
| User loses recovery code | n/a | ❌ Data lost |
| User loses password but vault is in ZK mode | Recovery via password reset | ❌ Data lost (vault is keyed to recovery code) |
## AI Mission Grants (opt-in, per mission)
By default, AI missions that depend on encrypted data (notes, tasks,
calendar events, journal entries, your Kontext document) run **only
when your browser tab is open**: the background runner on our server
sees only ciphertext and cannot decrypt it.
Some missions are more useful when they run continuously, even while
you're offline. For those, you can opt in — per mission, not globally
— to a **Mission Key-Grant**. Here is exactly what that does:
1. Your browser derives a fresh key that is bound to:
   - The mission's ID.
   - The specific table names referenced.
   - The specific record IDs referenced.
2. The derived key is wrapped with the mana-ai service's public key
   and attached to the mission record.
3. When the mana-ai runner ticks for that mission, it unwraps the
   key in memory, decrypts **only the allowlisted records**, plans
   the next iteration, and forgets the key at the end of the tick.
4. Every decrypt is logged. You see the full log under **Workbench
   → Datenzugriff**.
Hard constraints — enforced by the code, not by policy:
- **Zero-knowledge users cannot issue grants.** The mana-auth server
has no usable master key in ZK mode; the endpoint refuses.
- **Scope changes invalidate the key cryptographically.** Add a new
record to a mission → the derived key is different → the existing
grant stops working → you're prompted to re-consent. It is not
possible for the runner to "silently expand" its scope.
- **Grants expire.** Default lifetime is 7 days, renewed on every
successful run. Missions that go idle lose their grant automatically;
you re-consent on the next edit.
- **Revocation is one click.** The lock icon in the Workbench removes
the grant; the mission keeps its history but stops running
server-side until you re-grant.
- **The runner never writes under a grant** — it only reads. All
changes still go through the normal proposal-approve flow you
control.
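
The scope binding described above can be illustrated with a small HKDF sketch. This is not the actual implementation: the `InputRef` shape, the `deriveGrantKey` name, the all-zero salt, and the hashed JSON used as the HKDF `info` are assumptions made for illustration, and the sketch uses Node's `node:crypto` rather than the browser's Web Crypto API. It only shows why adding a record to a mission yields a different key.

```typescript
import { createHash, hkdfSync, randomBytes } from 'node:crypto';

// Hypothetical shape of a mission input reference (not taken from the repo).
interface InputRef {
  table: string;
  recordId: string;
}

// Canonicalise the scope so the same set of inputs always produces the same
// bytes, regardless of insertion order. Hashing keeps the HKDF `info` short
// even for missions with many records.
function scopeInfo(missionId: string, inputs: InputRef[]): Buffer {
  const refs = inputs.map((i) => JSON.stringify([i.table, i.recordId])).sort();
  const canonical = JSON.stringify({ missionId, refs });
  return createHash('sha256').update(canonical, 'utf8').digest();
}

// Derive a 32-byte key bound to the mission ID and its exact input set.
function deriveGrantKey(masterKey: Buffer, missionId: string, inputs: InputRef[]): Buffer {
  const salt = Buffer.alloc(32); // fixed all-zero salt, for the sketch only
  return Buffer.from(hkdfSync('sha256', masterKey, salt, scopeInfo(missionId, inputs), 32));
}

const masterKey = randomBytes(32);
const before = deriveGrantKey(masterKey, 'mission-1', [{ table: 'notes', recordId: 'a' }]);
const after = deriveGrantKey(masterKey, 'mission-1', [
  { table: 'notes', recordId: 'a' },
  { table: 'tasks', recordId: 'b' }, // one record added to the scope
]);

// A key derived for the old scope no longer matches the new scope.
console.log(before.equals(after)); // false
```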
| Threat | Standard | With a Mission Grant | Zero-Knowledge |
|--------|----------|----------------------|----------------|
| Mana operator reads an unrelated record of the same user | ⚠️ Could decrypt with KEK | ✅ Cannot — key is scoped | ✅ Cannot |
| Mana operator reads the granted records of the grant-enabled mission | ⚠️ Could decrypt with KEK | ⚠️ Could decrypt with the grant key + record ciphertext | ✅ Cannot |
| Court order against Mana for the granted-mission records | ⚠️ Could be compelled | ⚠️ Could be compelled (while grant is active) | ✅ Mana physically cannot comply |
| Runner RAM-dump during the 60s tick | ⚠️ n/a | ⚠️ Could expose the grant key for one tick window | ✅ n/a |
The tradeoff is deliberate: you exchange a small, scoped privacy
reduction for autonomy on one mission. Missions without a grant keep
the full standard / ZK guarantees.
## Implementation references
For the architectural deep dive, code locations, and the complete rollout history (Phases 1-9 + the backlog sweep), see [`DATA_LAYER_AUDIT.md`](https://github.com/mana-how/mana-monorepo/blob/main/apps/mana/apps/web/src/lib/data/DATA_LAYER_AUDIT.md).


@ -47,6 +47,12 @@ const PUBLIC_MANA_API_URL_CLIENT =
process.env.PUBLIC_MANA_API_URL_CLIENT || process.env.PUBLIC_MANA_API_URL || '';
const PUBLIC_MANA_CREDITS_URL_CLIENT =
process.env.PUBLIC_MANA_CREDITS_URL_CLIENT || process.env.PUBLIC_MANA_CREDITS_URL || '';
const PUBLIC_MANA_AI_URL_CLIENT =
process.env.PUBLIC_MANA_AI_URL_CLIENT || process.env.PUBLIC_MANA_AI_URL || '';
// Feature flag for the Mission Key-Grant UI (server-side execution of
// encrypted missions). Default off — flip to 'true' per deployment once
// the MANA_AI_PUBLIC/PRIVATE_KEY_PEM pair is provisioned on both services.
const PUBLIC_AI_MISSION_GRANTS = process.env.PUBLIC_AI_MISSION_GRANTS === 'true' ? 'true' : 'false';
// Map of app subdomains to internal paths
const APP_SUBDOMAINS = new Set([
@ -126,6 +132,8 @@ window.__PUBLIC_MANA_LLM_URL__ = ${JSON.stringify(PUBLIC_MANA_LLM_URL_CLIENT)};
window.__PUBLIC_MANA_EVENTS_URL__ = ${JSON.stringify(PUBLIC_MANA_EVENTS_URL_CLIENT)};
window.__PUBLIC_MANA_API_URL__ = ${JSON.stringify(PUBLIC_MANA_API_URL_CLIENT)};
window.__PUBLIC_MANA_CREDITS_URL__ = ${JSON.stringify(PUBLIC_MANA_CREDITS_URL_CLIENT)};
window.__PUBLIC_MANA_AI_URL__ = ${JSON.stringify(PUBLIC_MANA_AI_URL_CLIENT)};
window.__PUBLIC_AI_MISSION_GRANTS__ = ${JSON.stringify(PUBLIC_AI_MISSION_GRANTS)};
window.__PUBLIC_GLITCHTIP_DSN__ = ${JSON.stringify(PUBLIC_GLITCHTIP_DSN)};
</script>`;
return injectUmamiAnalytics(html.replace('<head>', `<head>${envScript}`));


@ -74,6 +74,22 @@ export function getManaAiUrl(): string {
return process.env.PUBLIC_MANA_AI_URL || 'http://localhost:3066';
}
/**
 * Feature flag for the AI Mission Key-Grant UI. When false, the consent
 * dialog + "Server-Zugriff" box are hidden even on missions with
 * encrypted inputs; those missions simply stay foreground-only. Flip on
 * per deployment after the MANA_AI_PUBLIC/PRIVATE_KEY_PEM keypair is
 * provisioned on both mana-auth and mana-ai.
 */
export function isMissionGrantsEnabled(): boolean {
	if (browser && typeof window !== 'undefined') {
		const flag = (window as unknown as { __PUBLIC_AI_MISSION_GRANTS__?: string })
			.__PUBLIC_AI_MISSION_GRANTS__;
		return flag === 'true';
	}
	return process.env.PUBLIC_AI_MISSION_GRANTS === 'true';
}
/**
* Get the mana-mail service URL.
* Hosts mail threads, send, labels, accounts.


@ -19,6 +19,7 @@
import { productionDeps } from '$lib/data/ai/missions/setup';
import MissionInputPicker from '$lib/components/ai/MissionInputPicker.svelte';
import MissionGrantDialog from '$lib/components/ai/MissionGrantDialog.svelte';
import { isMissionGrantsEnabled } from '$lib/api/config';
import type { Mission, MissionCadence, MissionInputRef } from '$lib/data/ai/missions/types';
const missions = $derived(useMissions());
@ -106,6 +107,7 @@
function hasEncryptedInputs(m: Mission): boolean {
return m.inputs.some((i) => ENCRYPTED_SERVER_TABLES.has(i.table));
}
const grantsEnabled = $derived(isMissionGrantsEnabled());
function grantStatus(m: Mission): 'none' | 'active' | 'expired' {
if (!m.grant) return 'none';
return Date.parse(m.grant.expiresAt) < Date.now() ? 'expired' : 'active';
@ -305,7 +307,7 @@
</details>
{/if}
{#if hasEncryptedInputs(selected)}
{#if grantsEnabled && hasEncryptedInputs(selected)}
<section class="grant-box">
<div class="grant-head">
<span class="grant-title">🔑 Server-Zugriff</span>


@ -8,6 +8,7 @@
import { useMissions } from '$lib/data/ai/missions/queries';
import { revertIteration } from '$lib/data/ai/revert/revert-iteration';
import { fetchDecryptAudit, type AuditRow } from '$lib/data/ai/audit/queries';
import { isMissionGrantsEnabled } from '$lib/api/config';
import type { DomainEvent } from '$lib/data/events/types';
let moduleFilter = $state<string | null>(null);
@ -41,6 +42,7 @@
}
// ── Tab switcher: timeline ↔ decrypt audit ─────────────
const grantsEnabled = $derived(isMissionGrantsEnabled());
let tab = $state<'timeline' | 'audit'>('timeline');
let auditRows = $state<AuditRow[]>([]);
let auditLoading = $state(false);
@ -110,6 +112,7 @@
>
Timeline
</button>
{#if grantsEnabled}
<button
type="button"
role="tab"
@ -120,6 +123,7 @@
>
Datenzugriff
</button>
{/if}
</div>
<div class="filters">


@ -465,3 +465,53 @@ groups:
        annotations:
          summary: "LLM responses are slow"
          description: "LLM p95 latency is {{ $value | humanizeDuration }}."

  - name: mana_ai_alerts
    rules:
      # mana-ai background runner down
      - alert: ManaAIServiceDown
        expr: up{job="mana-ai"} == 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "mana-ai background runner is down"
          description: "mana-ai has been down for 2+ minutes. Missions fall back to the browser-only Runner; users with closed tabs stop receiving proposals."

      # Grant scope violation: MUST remain at 0 in steady state.
      # Any increment is a serious signal: either a runtime bug bypassed
      # the cryptographic scope binding, or a compromised service tried
      # to decrypt outside its allowlist. Page on first occurrence.
      - alert: ManaAIGrantScopeViolation
        expr: increase(mana_ai_grant_scope_violations_total[5m]) > 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "mana-ai Mission Grant scope violation detected"
          description: "mana-ai attempted to decrypt a record outside a Mission Grant's allowlist on table {{ $labels.table }}. Steady-state value MUST be 0. Investigate: (1) look for a resolver bug on the named table, (2) check recent grant issuance, (3) dump the most recent rows from mana_ai.decrypt_audit WHERE status='scope-violation'."

      # Chronic grant failures: expired TTLs are fine, but a flood of
      # wrap-rejected / malformed / not-configured means the keypair is
      # misconfigured or rotated without re-consent.
      # Grouped by reason so {{ $labels.reason }} resolves in the
      # annotations; the threshold therefore applies per reason.
      - alert: ManaAIGrantSkipsHigh
        expr: |
          sum by (reason) (rate(mana_ai_grant_skips_total{reason!="expired"}[15m])) > 0.1
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "mana-ai grant skips trending high ({{ $labels.reason }})"
          description: "mana-ai is skipping grants at {{ $value | humanize }}/s with reason={{ $labels.reason }}. Likely causes: MANA_AI_PRIVATE_KEY_PEM mis-set, keypair out of sync with mana-auth's public key, or client producing malformed grants."

      # Planner parse failures: too many means the prompt / LLM drifted.
      - alert: ManaAIPlannerParseFailures
        expr: |
          sum(rate(mana_ai_parse_failures_total[10m]))
          / (sum(rate(mana_ai_plans_produced_total[10m])) + sum(rate(mana_ai_parse_failures_total[10m])) + 0.0001) > 0.2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "mana-ai planner parse-failure rate high"
          description: "{{ $value | humanizePercentage }} of Planner responses failed to parse; prompt drift or LLM degradation is likely."
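
The parse-failure expression above is a plain ratio with a small constant added to the denominator. As a worked example, the same arithmetic can be written out in TypeScript. The function names are hypothetical and exist only for this sketch; the inputs stand in for the two summed `rate(...)` values.

```typescript
// failures / (plans + failures + 0.0001), mirroring the alert expression.
// The 0.0001 term keeps the ratio defined (0 rather than NaN) when the
// planner is idle and both rates are 0.
function parseFailureRatio(failuresPerSec: number, plansPerSec: number): number {
  return failuresPerSec / (plansPerSec + failuresPerSec + 0.0001);
}

// The alert condition: more than 20% of planner responses failed to parse.
function exceedsThreshold(failuresPerSec: number, plansPerSec: number): boolean {
  return parseFailureRatio(failuresPerSec, plansPerSec) > 0.2;
}

console.log(parseFailureRatio(0, 0)); // 0
console.log(exceedsThreshold(0.3, 0.9)); // true  (about 25% failures)
console.log(exceedsThreshold(0.1, 0.9)); // false (about 10% failures)
```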


@ -85,10 +85,11 @@ Goal: the user can grant and revoke access, and the UX is honest.
### Phase 4: Rollout (1-2 days)
- [ ] **Feature flag**: `PUBLIC_AI_MISSION_GRANTS=false` by default. Dogfood first (till only), then beta tier, then alpha.
- [ ] **Status page**: a blackbox probe on `mana-ai` `/health` already exists; additionally, alert on `mana_ai_grant_scope_violations_total > 0` (must never happen).
- [ ] **Runbook**: what to do if `MANA_AI_PRIVATE_KEY` leaks? Rotate the keypair, invalidate all grants (a simple `UPDATE aiMissions SET grant=null`), and users get re-consent prompts.
- [ ] **Docs update**: [`apps/docs/src/content/docs/architecture/security.mdx`](../../apps/docs/src/content/docs/architecture/security.mdx), new "AI Mission Grants" section.
- [x] **Feature flag**: `PUBLIC_AI_MISSION_GRANTS=false` by default; the dialog and the audit tab are gated. Dogfood first (till only), then beta tier, then alpha.
- [x] **Alerting**: `ManaAIGrantScopeViolation` (critical, any increment), `ManaAIGrantSkipsHigh` (warning, non-expired skips), `ManaAIPlannerParseFailures` in `docker/prometheus/alerts.yml`. The status-page blackbox probe on `/health` is already running.
- [x] **Runbook**: initial keypair generation, keypair-leak procedure, and scope-violation response, further down in this document.
- [x] **Docs update**: [`apps/docs/src/content/docs/architecture/security.mdx`](../../apps/docs/src/content/docs/architecture/security.mdx), "AI Mission Grants" section including the extended threat-model rows.
- [ ] **Actually generate the keypair** on the Mac Mini and store it in secrets (not in this repo; out-of-band).
---
@ -131,6 +132,62 @@ Goal: the user can grant and revoke access, and the UX is honest.
---
## Runbook
### Generate the initial keypair (once per deployment)
```bash
# On the Mac Mini (or another secure working environment):
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 -out mana-ai.priv.pem
openssl pkey -in mana-ai.priv.pem -pubout -out mana-ai.pub.pem
# Export as env vars (Docker Compose env_file / secrets):
# MANA_AI_PRIVATE_KEY_PEM → mana-ai (never outside that service!)
# MANA_AI_PUBLIC_KEY_PEM → mana-auth
# Then in the webapp build:
# PUBLIC_AI_MISSION_GRANTS=true (enables the dialog + audit tab)
```
Both services log at boot whether the feature is active; the `GET /health` status does not change.
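
To check that the two PEM values actually belong together before flipping the flag, a wrap/unwrap round trip is a cheap test. The sketch below is an illustration under stated assumptions: it assumes the grant key is wrapped with RSA-OAEP (SHA-256), which this document does not specify, and the `wrap`/`unwrap` helpers are hypothetical names, not the services' real API. It generates throwaway keypairs in memory instead of reading the real secrets.

```typescript
import { constants, generateKeyPairSync, privateDecrypt, publicEncrypt, randomBytes } from 'node:crypto';

// Throwaway 2048-bit RSA keypair in the same PEM formats the openssl
// commands produce (PKCS#8 private key, SPKI public key).
function newKeypair() {
  return generateKeyPairSync('rsa', {
    modulusLength: 2048,
    publicKeyEncoding: { type: 'spki', format: 'pem' },
    privateKeyEncoding: { type: 'pkcs8', format: 'pem' },
  });
}

// Assumed wrapping scheme for this sketch: RSA-OAEP with SHA-256.
const oaep = { padding: constants.RSA_PKCS1_OAEP_PADDING, oaepHash: 'sha256' };

function wrap(publicKeyPem: string, key: Buffer): Buffer {
  return publicEncrypt({ key: publicKeyPem, ...oaep }, key);
}

// Returns null when the private key does not match the wrap, which is the
// situation the runbook reports as `wrap-rejected`.
function unwrap(privateKeyPem: string, wrapped: Buffer): Buffer | null {
  try {
    return privateDecrypt({ key: privateKeyPem, ...oaep }, wrapped);
  } catch {
    return null;
  }
}

const current = newKeypair();
const rotated = newKeypair();
const grantKey = randomBytes(32);
const wrapped = wrap(current.publicKey, grantKey);

console.log(unwrap(current.privateKey, wrapped)?.equals(grantKey)); // true
console.log(unwrap(rotated.privateKey, wrapped)); // null
```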
### What to do if `MANA_AI_PRIVATE_KEY_PEM` leaks
The private key is the only secret that can decrypt all active grants. If it leaks, an attacker **who also holds the encrypted grant blob and the encrypted records** can reconstruct the plaintext. Without the encrypted records, the key alone achieves nothing, but that is a thin line; when in doubt, rotate.
Procedure:
1. **Generate a new keypair** (see above). Never reuse the old one under any circumstances.
2. **Swap `MANA_AI_PRIVATE_KEY_PEM`** on `mana-ai`, then restart the service. From now on every existing grant fails to unwrap with `wrap-rejected` (the new private key does not match the old wrap).
3. **Swap `MANA_AI_PUBLIC_KEY_PEM`** on `mana-auth`, then restart the service.
4. **Invalidate all existing grants**: they were wrapped with the old public key and no longer work. In Postgres:
   ```sql
   -- Applies to every user, so there is no user_id filter.
   -- "grant" is quoted because GRANT is a reserved word in PostgreSQL.
   UPDATE aiMissions SET "grant" = NULL
   WHERE "grant" IS NOT NULL;
   ```
   (In the Mana model this lives as a `sync_changes` row on `appId='ai'/table='aiMissions'`; a quiet migration in the `mana-sync` admin backend is simpler.)
5. **Document the audit trail**: when the leak was discovered, when the keys were swapped, when the grants were invalidated. Post-mortem in `docs/postmortems/`.
6. **Notify users**: missions stay active but run only in the foreground until the user grants access again. This is by design; the re-consent prompt appears automatically on the next mission edit.
7. **Check monitoring**: the rate of `mana_ai_grant_skips_total{reason="wrap-rejected"}` should rise briefly after step 2 (old grants) and fall back to 0 once all of them have been removed via step 4.
### Responding to a scope-violation alert
The Prometheus alert `ManaAIGrantScopeViolation` (critical, see `docker/prometheus/alerts.yml`) fires when `increase(mana_ai_grant_scope_violations_total[5m]) > 0`. The steady state must be 0; every firing is either a bug or an attack.
1. Read the most recent scope violations:
   ```sql
   SELECT * FROM mana_ai.decrypt_audit
   WHERE status = 'scope-violation'
   ORDER BY ts DESC LIMIT 20;
   ```
2. Check `record_id`: does the record actually belong to the user? If not, mission-grant creation is compromised; lock the user.
3. If it does: resolver bug. Check `services/mana-ai/src/db/resolvers/encrypted.ts`. The HKDF binding should make the runtime check redundant, so if the runtime check fires, something is wrong in the derivation.
4. Pause the mission temporarily:
   ```sql
   -- "grant" is quoted because GRANT is a reserved word in PostgreSQL.
   UPDATE aiMissions SET state = 'paused', "grant" = NULL
   WHERE id = '<missionId>';
   ```
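
For orientation during triage, here is a minimal sketch of the kind of runtime allowlist check that produces a `scope-violation` audit row. All names (`RecordRef`, `createScopeGuard`, the in-memory `audit` array) are hypothetical; the real check lives in `services/mana-ai/src/db/resolvers/encrypted.ts` and may look quite different. The in-memory array stands in for `mana_ai.decrypt_audit`.

```typescript
interface RecordRef {
  table: string;
  recordId: string;
}

interface AuditEntry extends RecordRef {
  missionId: string;
  status: 'ok' | 'scope-violation';
}

// JSON-encode the pair so table and record names containing separators
// cannot collide.
function refKey(ref: RecordRef): string {
  return JSON.stringify([ref.table, ref.recordId]);
}

// Builds a guard for one mission. Every check is logged, allowed or not,
// so the audit trail shows both successful decrypts and violations.
function createScopeGuard(missionId: string, allowlist: RecordRef[]) {
  const allowed = new Set(allowlist.map(refKey));
  const audit: AuditEntry[] = [];
  return {
    audit,
    check(ref: RecordRef): boolean {
      const ok = allowed.has(refKey(ref));
      audit.push({ missionId, ...ref, status: ok ? 'ok' : 'scope-violation' });
      return ok;
    },
  };
}

const guard = createScopeGuard('mission-1', [{ table: 'notes', recordId: 'a' }]);
console.log(guard.check({ table: 'notes', recordId: 'a' })); // true
console.log(guard.check({ table: 'notes', recordId: 'z' })); // false
console.log(guard.audit.filter((e) => e.status === 'scope-violation').length); // 1
```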
## Non-goals
- **Zero-knowledge users do not get this.** They stay on the foreground runner. If they want autonomy, they have to turn ZK off; that is the choice ZK entails.