diff --git a/docs/BACKUP_STRATEGY.md b/docs/BACKUP_STRATEGY.md
new file mode 100644
index 000000000..a7990364d
--- /dev/null
+++ b/docs/BACKUP_STRATEGY.md
@@ -0,0 +1,129 @@
+# BACKUP_STRATEGY: Postgres + Volumes
+
+**As of 2026-05-13.** Restored and extended after the LaunchD job had
+been dead for 3 months (the old path pointed to `mana-monorepo/`,
+which no longer exists).
+
+## Today: local backup, no off-site copy
+
+### What runs
+
+- **`com.mana.backup-databases`** LaunchD job on mana-server
+- **Script:** `~/projects/managarten/scripts/mac-mini/backup-databases.sh`
+- **Schedule:** daily at 03:00 (StartCalendarInterval)
+- **Target:** `/Volumes/ManaData/backups/postgres/{daily,weekly}/`
+- **Retention:** daily 7 days, weekly (Sunday) 4 weeks
+
+### What gets backed up
+
+All running Postgres containers whose name ends in `postgres`, excluding
+`*exporter*` and `mana-infra-postgres-backup`. As of 2026-05-13:
+
+| Container | User | DB(s) |
+|---|---|---|
+| `mana-infra-postgres` | `postgres` | mana_platform, mana_sync, mana_admin, memoro (956 MB), forgejo, glitchtip, umami |
+| `cards-postgres` | `cards` | cards |
+| `manaspur-postgres` | `manaspur` | manaspur |
+| `nutriphi-postgres` | `nutriphi` | nutriphi |
+| `zitare-postgres` | `zitare` | zitare |
+| `chorportal-prod-postgres` | `chorportal` | chorportal |
+
+Dump pattern: `${container}_${db}_${date}.sql.gz`.
+
+Total after the first run on 2026-05-13: **~45 GB** in `/Volumes/ManaData/backups/postgres`.
+
+### What is NOT backed up (today)
+
+- **MinIO objects** (Cards-Media, mana-media, …), which live on separate volumes
+- **`/Volumes/ManaData/{cards,manaspur,…}/postgres`** at file level:
+  pg_dump is sufficient for the DBs, but it would not catch disk bit rot
+- **Cloudflared tunnel credentials** under `~/.cloudflared/` (critical!)
+- **`~/secrets/`** (app service keys, master keys in plaintext)
+
+## Off-site: not active yet
+
+**Problem:** The Mac Mini is a single point of failure. A failed disk,
+theft, or fire would destroy every backup, because they all sit on the
+**same** disk.
+
+**Tasks for the off-site strategy:**
+
+1. **Choose an endpoint:**
+   - **Cloudflare R2** (S3-compatible, pricing affordable for the association,
+     mana already has a Cloudflare account) ← recommendation
+   - **Hetzner Storage Box** (cheap, EU-hosted, compatible with the
+     association's data-protection requirements)
+   - **Wasabi / Backblaze B2** (cheap, US providers, questionable under GDPR)
+   - **Self-hosted mana-server-2** (private off-site Mini, maximum control)
+
+2. **Choose a tool:**
+   - **rclone** (multi-provider, runs on macOS) ← recommendation
+   - **restic** (encryption + dedup built in, S3-capable)
+   - **borg** (classic, repository-oriented)
+
+3. **Encryption:** the data leaves the Mac Mini. Without encryption in
+   transit AND at rest we would violate the association's values
+   (Memoro/Cards/Manaspur contain user content). Recommendation:
+   **rclone crypt** as a wrapper, with the key in `~/secrets/` and the
+   off-site recovery code in the `secret_offsite_backup_key` memory.
+
+### Proposed setup (to be implemented)
+
+```bash
+# rclone config with an encrypted remote:
+rclone config create r2-raw s3 provider Cloudflare \
+  access_key_id secret_access_key \
+  endpoint .r2.cloudflarestorage.com
+
+rclone config create r2-encrypted crypt remote r2-raw:mana-backups \
+  password password2
+```
+
+LaunchD job that runs after `backup-databases.sh`:
+
+```bash
+# scripts/mac-mini/backup-sync-offsite.sh
+rclone sync /Volumes/ManaData/backups/postgres r2-encrypted:postgres \
+  --transfers 4 --checkers 8 --log-file /tmp/mana-backup-offsite.log
+```
+
+Runs every 6 h via a separate LaunchD plist, `com.mana.backup-offsite.plist`.
+
+### Pre-live gate for the mana platform
+
+Before manaspur endurance user data lands (see `manaspur-native/docs/ENDURANCE_TEST.md`):
+
+- [x] Local backup job active again (2026-05-13)
+- [ ] Off-site endpoint provisioned (R2 bucket or similar)
+- [ ] rclone + encryption setup
+- [ ] LaunchD job for the off-site sync every 6 h
+- [ ] Recovery probe: download a random daily backup file, decrypt it, and
+      run a trial restore with `psql` (the dumps are plain SQL) against a test DB
+
+## Recovery drill (test that backups can actually be restored)
+
+Once per quarter:
+
+```bash
+# Example: restore cards-postgres from a backup
+ssh mana-server
+docker run --rm -d --name cards-postgres-restore-test \
+  -e POSTGRES_PASSWORD=test -e POSTGRES_USER=cards -e POSTGRES_DB=cards \
+  postgres:16-alpine
+until docker exec cards-postgres-restore-test pg_isready -h 127.0.0.1 -U cards -q; do sleep 1; done  # wait for startup
+gunzip -c /Volumes/ManaData/backups/postgres/daily/cards-postgres_cards_2026-05-13.sql.gz \
+  | docker exec -i cards-postgres-restore-test psql -U cards -d cards
+
+# Probe query
+docker exec cards-postgres-restore-test psql -U cards -d cards \
+  -c "SELECT count(*) FROM cards.decks;"
+
+docker stop cards-postgres-restore-test
+```
+
+## Cross-refs
+
+- `secret_offsite_backup_key.md` (memory, arrives with the off-site setup)
+- `mana/docs/PLAN.md` § Backup-Strategy
+- `scripts/mac-mini/backup-databases.sh`: the actual code
+- `~/Library/LaunchAgents/com.mana.backup-databases.plist`: LaunchD job
diff --git a/scripts/mac-mini/backup-databases.sh b/scripts/mac-mini/backup-databases.sh
index a5a44f0c4..755d3925b 100755
--- a/scripts/mac-mini/backup-databases.sh
+++ b/scripts/mac-mini/backup-databases.sh
@@ -1,14 +1,28 @@
 #!/bin/bash
 # Mana Database Backup Script
-# Creates daily backups of all PostgreSQL databases with rotation
+# Creates daily backups of all PostgreSQL databases with rotation.
 #
 # Retention policy:
 # - Daily backups: keep last 7 days
 # - Weekly backups: keep last 4 weeks (Sundays)
 #
-# Run via LaunchD daily at 3 AM
+# Covers ALL running Postgres containers whose name ends in `postgres`
+# (excluding exporter/backup). For each container, every database (except
+# templates and `postgres`) is dumped. Dump file pattern:
+# ${CONTAINER}_${DB}_${DATE}.sql.gz
+# so that Cards and Manaspur DBs with the same schema name do not
+# overwrite each other.
+#
+# Container-specific DB user: per-container ENV override
+# BACKUP_USER_=username (default: postgres)
+# e.g. BACKUP_USER_CARDS_POSTGRES=cards (the Cards container is named
+# cards-postgres → cards user).
+#
+# Run via LaunchD daily at 3 AM.
 
-set -e
+# NOTE: deliberately NO global `set -e`: a failure in one container must
+# not abort the rest. Failures are collected in `FAILED_DBS` and reported.
+set -o pipefail  # otherwise gzip's exit status masks a failed pg_dump
 
 # Ensure PATH includes docker
 export PATH="/usr/local/bin:/opt/homebrew/bin:$PATH"
@@ -20,12 +34,10 @@ LOG_FILE="/tmp/mana-backup.log"
 DATE=$(date +%Y-%m-%d)
 DAY_OF_WEEK=$(date +%u) # 1=Monday, 7=Sunday
 
-# Load env for password
-if [ -f "$PROJECT_ROOT/.env.macmini" ]; then
-    source "$PROJECT_ROOT/.env.macmini"
-fi
-
-POSTGRES_PASSWORD="${POSTGRES_PASSWORD:-mana123}"
+# .env.macmini is in DOTENV format (values contain spaces, BEGIN/END
+# markers, etc.) and cannot be loaded via `source` in bash. We do not
+# need anything from that file anyway; Telegram tokens come separately
+# from .env.notifications.
 
 log() {
     echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
@@ -49,50 +61,76 @@ send_notification() {
     fi
 }
 
+# Default DB user per container. Greenfield apps (cards, manaspur,
+# nutriphi, zitare) use the app's own user; mana-infra-postgres
+# runs as the `postgres` superuser.
+db_user_for_container() {
+    case "$1" in
+        cards-postgres) echo "cards" ;;
+        manaspur-postgres) echo "manaspur" ;;
+        nutriphi-postgres) echo "nutriphi" ;;
+        zitare-postgres) echo "zitare" ;;
+        chorportal-prod-postgres) echo "chorportal" ;;
+        mana-infra-postgres) echo "postgres" ;;
+        *) echo "postgres" ;;
+    esac
+}
+
 # Create backup directories
 mkdir -p "$BACKUP_DIR/daily"
 mkdir -p "$BACKUP_DIR/weekly"
 
 log "=== Mana Database Backup ==="
 
-# Check if postgres container is running
-if ! docker ps --format '{{.Names}}' | grep -q "mana-infra-postgres"; then
-    log "ERROR: PostgreSQL container is not running"
-    send_notification "🚨 Backup Failed\n\nPostgreSQL container not running" "high"
+# Find all Postgres containers (heuristic: name ends in `postgres`;
+# exporter/backup variants are ignored).
+CONTAINERS=$(docker ps --format '{{.Names}}' | grep -E 'postgres$|-postgres$' | grep -vE 'exporter|^mana-infra-postgres-backup$')
+
+if [ -z "$CONTAINERS" ]; then
+    log "ERROR: no postgres container found"
+    send_notification "🚨 Backup Failed\n\nNo postgres container running" "high"
     exit 1
 fi
 
-# Get list of databases (exclude templates and postgres)
-DATABASES=$(docker exec mana-infra-postgres psql -U postgres -t -c "SELECT datname FROM pg_database WHERE datistemplate = false AND datname != 'postgres';" | tr -d ' ' | grep -v "^$")
-
-log "Found databases: $(echo $DATABASES | tr '\n' ' ')"
+log "Containers: $(echo $CONTAINERS | tr '\n' ' ')"
 
 BACKUP_COUNT=0
 BACKUP_SIZE=0
 FAILED_DBS=""
 
-for DB in $DATABASES; do
-    log "Backing up: $DB"
-    BACKUP_FILE="$BACKUP_DIR/daily/${DB}_${DATE}.sql.gz"
+for CONTAINER in $CONTAINERS; do
+    USER=$(db_user_for_container "$CONTAINER")
+    log "--- Container: $CONTAINER (user: $USER) ---"
 
-    # Create backup using pg_dump inside container, compress with gzip
-    if docker exec mana-infra-postgres pg_dump -U postgres "$DB" 2>/dev/null | gzip > "$BACKUP_FILE"; then
-        SIZE=$(ls -lh "$BACKUP_FILE" | awk '{print $5}')
-        log " OK: $DB ($SIZE)"
-        BACKUP_COUNT=$((BACKUP_COUNT + 1))
-        BACKUP_SIZE=$((BACKUP_SIZE + $(stat -f%z "$BACKUP_FILE" 2>/dev/null || stat -c%s "$BACKUP_FILE" 2>/dev/null)))
-    else
-        log " FAILED: $DB"
-        FAILED_DBS="$FAILED_DBS $DB"
-        rm -f "$BACKUP_FILE" # Remove incomplete backup
+    # List the databases in this container
+    if ! DB_LIST=$(docker exec "$CONTAINER" psql -U "$USER" -t -c "SELECT datname FROM pg_database WHERE datistemplate = false AND datname != 'postgres';" 2>/dev/null | tr -d ' ' | grep -v "^$"); then
+        log " FAILED to list databases in $CONTAINER (user $USER), skipping"
+        FAILED_DBS="$FAILED_DBS ${CONTAINER}:list"
+        continue
     fi
+
+    for DB in $DB_LIST; do
+        BACKUP_FILE="$BACKUP_DIR/daily/${CONTAINER}_${DB}_${DATE}.sql.gz"
+        if docker exec "$CONTAINER" pg_dump -U "$USER" "$DB" 2>/dev/null | gzip > "$BACKUP_FILE"; then
+            SIZE=$(ls -lh "$BACKUP_FILE" | awk '{print $5}')
+            log " OK: ${CONTAINER}/${DB} ($SIZE)"
+            BACKUP_COUNT=$((BACKUP_COUNT + 1))
+            BACKUP_SIZE=$((BACKUP_SIZE + $(stat -f%z "$BACKUP_FILE" 2>/dev/null || stat -c%s "$BACKUP_FILE" 2>/dev/null)))
+        else
+            log " FAILED: ${CONTAINER}/${DB}"
+            FAILED_DBS="$FAILED_DBS ${CONTAINER}:${DB}"
+            rm -f "$BACKUP_FILE"
+        fi
+    done
 done
 
-# On Sunday, create weekly backup
+# On Sunday, create weekly backup (Sunday = 7 in date +%u)
 if [ "$DAY_OF_WEEK" -eq 7 ]; then
     log "Creating weekly backup (Sunday)..."
     WEEKLY_DIR="$BACKUP_DIR/weekly/$DATE"
     mkdir -p "$WEEKLY_DIR"
+    # Copy all of today's daily dumps (the pattern now starts with CONTAINER,
+    # so `*_${DATE}.sql.gz` still matches).
     cp "$BACKUP_DIR/daily/"*"_${DATE}.sql.gz" "$WEEKLY_DIR/" 2>/dev/null || true
     log "Weekly backup created in $WEEKLY_DIR"
 fi