From 4f3343560725ffb5e98b78267c55e538ab83aa89 Mon Sep 17 00:00:00 2001
From: Till JS
Date: Tue, 14 Apr 2026 17:48:47 +0200
Subject: [PATCH] docs(sync): document backup/restore pipeline + stability contract
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- DATA_LAYER_AUDIT.md: new section 8 covering the export/import flow
  end-to-end: architecture diagram, .mana format, protocol-stability
  commitments we locked in pre-launch (eventId + schemaVersion + op vocab +
  tombstones-forever), encryption-boundary argument, file map, and the
  remaining backup backlog (M4b, M5, signature, resumable download, dedup
  table).
- services/mana-sync/CLAUDE.md: /backup/export row in API table with
  explicit note that it sits outside the billing gate, new Backup / Restore
  section with format sketch + split between writer.go (pure) and handler.go
  (shim), test-coverage line mentions the backup cases, project-structure
  tree lists backup/*.go, Security section mentions RLS still applies to the
  export path.

No code changes.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .../apps/web/src/lib/data/DATA_LAYER_AUDIT.md | 100 +++++++++++++++++-
 services/mana-sync/CLAUDE.md                  |  28 ++++-
 2 files changed, 121 insertions(+), 7 deletions(-)

diff --git a/apps/mana/apps/web/src/lib/data/DATA_LAYER_AUDIT.md b/apps/mana/apps/web/src/lib/data/DATA_LAYER_AUDIT.md
index 8e1fb9f08..8c1625508 100644
--- a/apps/mana/apps/web/src/lib/data/DATA_LAYER_AUDIT.md
+++ b/apps/mana/apps/web/src/lib/data/DATA_LAYER_AUDIT.md
@@ -34,7 +34,7 @@
 | 2 | mana-auth Server Vault: encryption_vaults + RLS + KEK + 11 Tests | ✅ | `e9915428c` |
 | 3 | Client Wire-up: vault-client, record-helpers, layout integration | ✅ | `354cbcb17` |
 | 4 | Pilot: notes table with 8 end-to-end tests | ✅ | `bed08a1aa` |
-| 5 | Rollout: chat, dreams, memoro, contacts, cycles, finance | ✅ | `af92720a6` |
+| 5 | Rollout: chat, dreams, memoro, contacts, period, finance | ✅ | `af92720a6` |
 | 6.1 | Rollout: cards, presi, inventar, plants | ✅ | `73f294b29` |
 | 6.2 + 6.3 | Settings UI (`/settings/security`) + Encryption Intro Banner | ✅ | `6b8e2c717` |
 | Roundup | DATA_LAYER_AUDIT roll-up before Phase 7 | ✅ | `4bdf4238c` |
@@ -388,8 +388,8 @@ Unlock flow (login on a new device):
 | memoro | `memos` | `title`, `intro`, `transcript` | 5 |
 | | `memories` | `title`, `content` | 5 |
 | contacts | `contacts` | 16 PII fields (firstName, lastName, email, phone, mobile, birthday, address, social, ...) | 5 |
-| cycles | `cycles` | `notes` | 5 |
-| | `cycleDayLogs` | `notes`, `mood` (symptoms plaintext for set diffs) | 5 |
+| period | `period` | `notes` | 5 |
+| | `periodDayLogs` | `notes`, `mood` (symptoms plaintext for set diffs) | 5 |
 | finance | `transactions` | `description`, `note` | 5 |
 | cards | `cards` | `front`, `back` | 6.1 |
 | | `cardDecks` | `name`, `description` | 6.1 |
@@ -422,7 +422,7 @@ Certain fields deliberately stay in plaintext because they are structurally needed
 - **`links.originalUrl`**: the public redirect handler resolves `shortCode → 302` without an async decrypt
 - **`socialEvents` when published**: on publish, the local row is decrypted and pushed as plaintext into the server snapshot (by design: a shareable RSVP page instead of confidentiality)
 - **`dreamSymbols.name`**: used as a unique lookup key in `where('name').equals(...)`
-- **`cycleDayLogs.symptoms`**: string array that is reconciled via set diff in `dayLogsStore.logDay`
+- **`periodDayLogs.symptoms`**: string array that is reconciled via set diff in `dayLogsStore.logDay`
 - **`plants.healthStatus`, `meals.nutrition`**: structured browsing/aggregation fields
 - **`files.name` / `images.prompt`**: indexed in the Dexie schema, but no `.where()` call site uses them; encryption is safe, the index simply becomes a no-op for content lookups
 
@@ -495,8 +495,98 @@ Pre-existing test failures (not caused by this audit work):
 - Lazy sync for rarely used modules (spares connection limits)
 - Full offline support incl. online resume
 - SSE preferred, polling as fallback (with pipelined parser)
-- Clean separation of detection (`quota-detect.ts`) vs. db-aware helpers (`quota.ts`) → no Import-Cycles
+- Clean separation of detection (`quota-detect.ts`) vs. db-aware helpers (`quota.ts`) → no Import-Period
 - The encryption boundary lives in a dedicated `crypto/` sub-module, fully decoupled from the sync layer
 - Vault singleton via `vault-instance.ts`: layout + settings + future UI share the same state
 
 The data layer is now **production-grade** in the dimensions correctness, security, **confidentiality** (incl. an optional **zero-knowledge mode**), robustness, observability, performance, and test coverage.
+
+## 8. Backup & Restore (sync-stream export)
+
+The sync event log is already a clean, LWW-ordered, schema-versioned serialization of all user data, so we use it as the backup format instead of building a second, parallel serializer layer.
+
+### Architecture: one file, both directions
+
+```
+EXPORT                                        IMPORT
+────────────────────────────────────────────  ────────────────────────────────────────────
+mana-sync DB                                  .mana (ZIP)
+  └─ sync_changes WHERE user_id = $1            ├─ events.jsonl ──┐
+       │                                        └─ manifest.json  │ parseBackup()
+       ▼                                                          ▼
+  WriteBackup(w, userID, createdAt, iter)     authStore.user.id match?  ┐
+       │ streams                              eventsSha256 match?       │ validate
+       ├─ events.jsonl (JSON Lines)           schemaVersionMax ≤ client?┘
+       └─ manifest.json                                   │
+                                                          ▼
+                                              iterateEvents() → toSyncChange()
+                                                          │
+                                                          ▼
+                                              applyServerChanges(appId, batch)
+                                                          │ (batches of 300)
+                                                          ▼
+                                              IndexedDB (via Dexie hooks, suppressed)
+```
+
+Same-account restore works without a server round-trip: the events already live on mana-sync, and LWW would dedupe them anyway. Cross-account migration (a different user on a new device) needs the MK transfer path; see the backlog.
+
+### `.mana` file format (version 1)
+
+ZIP archive with exactly two entries, both DEFLATE-compressed:
+
+| Entry | Content |
+| --- | --- |
+| `events.jsonl` | One JSON line per `sync_changes` row, chronological |
+| `manifest.json` | Header with `formatVersion`, `schemaVersion`, `userId`, `eventCount`, `eventsSha256`, `apps[]`, `createdAt`, `schemaVersionMin/Max` |
+
+**Event line**:
+
+```json
+{"eventId":"uuid","schemaVersion":1,"appId":"todo","table":"tasks","id":"task-1","op":"update","data":{...},"fieldTimestamps":{...},"clientId":"...","createdAt":"2026-..."}
+```
+
+Encrypted fields stay ciphertext: for the 27 encryption-registry tables, the `.mana` file is **encrypted at rest**. Plaintext fields (IDs, sort keys, timestamps) remain readable in the file (GDPR portability requirement).
+
+### Protocol stability contract (M2, hardened pre-launch)
+
+As of v1, these fields are immutable in the event shape:
+
+- `eventId: uuid`: stable primary key, client-side dedup
+- `schemaVersion: number`: enables a migration chain for future protocol changes
+- `op: "insert" | "update" | "delete"`: vocabulary frozen
+- `fields` = canonical for LWW merges, `data` = snapshot-only for inserts
+- Tombstones (deletes) stay in `sync_changes` forever; otherwise a backup would not be complete
+
+**Pre-M2 clients** (no `schemaVersion` on the wire) are clamped to v1 server-side. A client with `schemaVersion > MaxSupported` is rejected with a 400.
+
+### Encryption boundary stays intact
+
+The backup path **never touches plaintext**:
+
+1. Field-level ciphertext is already stored encrypted in `sync_changes.data`
+2. `WriteBackup` reads the bytes 1:1 and streams them into the ZIP
+3. The import side calls `applyServerChanges()`, the same path that live sync uses; whatever lands in IndexedDB flows through the normal `decryptRecords()` path on read, not on write
+
+Zero-knowledge users: until the MK transfer path (M5) lands, they can restore themselves (same account, same recovery code already active), but cannot switch accounts without the recovery code.
+
+### Files
+
+| Path | Role |
+| --- | --- |
+| `services/mana-sync/internal/backup/writer.go` | Pure `WriteBackup()`: streaming ZIP + sha256 tee |
+| `services/mana-sync/internal/backup/handler.go` | HTTP shim for `GET /backup/export` (auth-only, no billing gate) |
+| `services/mana-sync/internal/backup/writer_test.go` | 4 Go tests (round-trip, empty, legacy v0 clamping) |
+| `services/mana-sync/internal/store/postgres.go` | `StreamAllUserChanges()`: cursor-free stream over all of a user's events, RLS-scoped |
+| `apps/mana/apps/web/src/lib/data/backup/format.ts` | Hand-rolled ZIP parser + sha256 recompute (uses `pako` for inflate) |
+| `apps/mana/apps/web/src/lib/data/backup/import.ts` | Replay logic: validate → iterate → batch → `applyServerChanges` |
+| `apps/mana/apps/web/src/lib/data/backup/format.test.ts` | 8 Vitest tests for the parser (synthetic PKZIP bytes) |
+| `apps/mana/apps/web/src/lib/api/services/backup.ts` | Browser-side download helper |
+| `apps/mana/apps/web/src/routes/(app)/settings/my-data/+page.svelte` | UI: download + file picker + progress |
+
+### Open items (backup backlog)
+
+- **M5 (cross-account restore)**: populate `manifest.encryption.mkWrap` with the KEK-wrapped MK; new `POST /me/vault/import-mk` in `mana-auth`; zero-knowledge path via recovery-code entry during import
+- **M4b (bulk ingest endpoint)**: `POST /sync/{appId}/ingest` so that imported events also land server-side on the new account (only relevant for cross-account)
+- **Signature**: Ed25519 over `manifest.json` against tampering; today there is only a sha256 over events.jsonl
+- **Resumable download**: multi-GB accounts will eventually become questionable in the browser
+- **`_appliedEventIds` dedup table**: performance optimization for re-import (today LWW does the dedup, but we still process every event)
diff --git a/services/mana-sync/CLAUDE.md b/services/mana-sync/CLAUDE.md
index 5fce1640e..565fb978b 100644
--- a/services/mana-sync/CLAUDE.md
+++ b/services/mana-sync/CLAUDE.md
@@ -132,10 +132,30 @@ Result: title="Buy eggs", completed=true (merged - different fields)
 | `GET /sync/{appId}/stream` | GET | JWT + Billing | SSE stream for real-time changes |
 | `GET /ws` | WS | JWT (in-band) | Unified real-time sync (all apps, one connection) |
 | `GET /ws/{appId}` | WS | JWT (in-band) | Legacy per-app sync notifications |
+| `GET /backup/export` | GET | JWT only | **GDPR-grade full-account export** as `.mana` zip (see below) |
 | `GET /health` | GET | No | Health check with connection stats |
 | `GET /metrics` | GET | No | Prometheus metrics |
 
-**Billing gate**: Push, pull, and stream endpoints are wrapped by a billing middleware that checks the user's sync subscription status via `mana-credits`. Returns **402 Payment Required** if sync is not active. Status is cached for 5 minutes per user. Fail-open: if mana-credits is unreachable, sync is allowed.
+**Billing gate**: Push, pull, and stream endpoints are wrapped by a billing middleware that checks the user's sync subscription status via `mana-credits`. Returns **402 Payment Required** if sync is not active. Status is cached for 5 minutes per user. Fail-open: if mana-credits is unreachable, sync is allowed. **`/backup/export` is intentionally outside the billing gate**: GDPR data portability must always be available.
+
+## Backup / Restore
+
+`GET /backup/export` streams a `.mana` archive (zip) with the user's full `sync_changes` log. Format:
+
+```
+mana-backup-{userId}-{YYYYMMDD-HHMMSS}.mana (application/zip)
+├── events.jsonl  - one SyncChange per line (chronological)
+└── manifest.json - formatVersion, schemaVersion, userId, eventCount,
+                    eventsSha256, apps[], createdAt, schemaVersionMin/Max
+```
+
+The zip is built in a single DB pass: `events.jsonl` is written via `io.MultiWriter(entry, sha256)` so the manifest's `eventsSha256` can be filled without a second scan. The client (web) parses the zip with a hand-rolled reader against `pako` deflate, validates `userId` match + sha256, then replays events through `applyServerChanges()` in 300-event batches per `appId`.
+
+Ciphertext (27 encrypted tables, client-side AES-GCM) passes through untouched, so the archive is effectively encrypted at rest for sensitive fields.
+
+**Protocol stability (v1, pre-launch):** Once this ships, these event fields are append-only: `eventId`, `schemaVersion`, `op`, `fields` (LWW-canonical) / `data` (insert-snapshot). Tombstones stay in `sync_changes` forever so exports remain complete.
+
+**Split**: pure logic lives in `internal/backup/writer.go::WriteBackup(w, userID, createdAt, iter)`. The HTTP handler (`handler.go`) is a thin shim; tests use a slice-backed iterator so they run without Postgres. See `writer_test.go` (4 cases) + `apps/mana/apps/web/src/lib/data/backup/format.test.ts` (8 cases).
 
 ## Database Schema
 
@@ -176,7 +196,7 @@ cd services/mana-sync
 go test ./... -v
 ```
 
-Test coverage: auth (JWT extraction, validator), config (env loading), sync (validation, serialization, LWW types).
+Test coverage: auth (JWT extraction, validator), config (env loading), sync (validation, serialization, LWW types), backup (ZIP writer round-trip + legacy `schema_version=0` clamping + empty-export manifest).
 
 ## Project Structure
 
@@ -186,6 +206,9 @@ services/mana-sync/
 ├── internal/
 │   ├── auth/jwt.go - EdDSA JWT validation via JWKS
 │   ├── auth/jwt_test.go - Token extraction, validator tests
+│   ├── backup/writer.go - Pure ZIP writer for .mana archives (testable without DB)
+│   ├── backup/writer_test.go - 4 cases: round-trip, empty, legacy schema_version=0
+│   ├── backup/handler.go - HTTP shim for GET /backup/export (auth-only)
 │   ├── billing/check.go - Sync billing status checker (cached, fail-open)
 │   ├── config/config.go - Environment variable loading
 │   ├── config/config_test.go - Config defaults and env override tests
@@ -207,6 +230,7 @@ services/mana-sync/
 - Operation types validated (insert/update/delete only)
 - Table and record IDs required on all changes
 - RecordChange failures abort the entire sync (no partial writes)
+- `/backup/export` is auth-only by design (GDPR), but `StreamAllUserChanges` is RLS-scoped to the caller's `user_id` via the same `withUser()` transaction pattern as every other query, so cross-user export is impossible at the DB layer
 
 ## Connected Apps (19)