chore(deploy): auto-apply additive Drizzle schema migrations + RAM headroom for mana-web build

Two CD-pipeline ergonomics fixes that surfaced during the 2026-04-28
schema-drift sweep.

(C) Auto-apply additive Drizzle migrations
========================================
Eight services use Drizzle (mana-auth/-credits/-events/-research/-mail/
-subscriptions/-user/-analytics), but the CD pipeline never ran their
`db:push` script. As a result, 4 schema additions stayed undeployed for
days (auth.users.kind, credits.{sync_subscriptions,reservations},
event_discovery.*) until live PostgresErrors surfaced them.

New `scripts/mac-mini/safe-db-push.sh`:
- Uses `drizzle-kit generate` to write a probe SQL file (does NOT
  apply yet).
- Greps the generated SQL for destructive patterns (DROP TABLE/
  COLUMN/TYPE/SCHEMA/INDEX, ALTER COLUMN ... TYPE, RENAME).
- Refuses to auto-apply if any are found — operator must review and
  run `pnpm db:push --force` manually after pg_dump.
- Otherwise applies via `drizzle-kit push --force` and cleans up the
  probe artifacts.
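
The destructive-pattern check described above could look roughly like
the sketch below. The function name `has_destructive_sql` and the exact
regular expressions are illustrative assumptions, not the contents of
`safe-db-push.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the real script): exits 0 if the SQL file
# contains a statement that should block auto-apply, 1 otherwise.
has_destructive_sql() {
  local sql_file="$1"
  grep -Eiq \
    -e 'DROP[[:space:]]+(TABLE|COLUMN|TYPE|SCHEMA|INDEX)' \
    -e 'ALTER[[:space:]]+COLUMN[[:space:]].*[[:space:]]TYPE[[:space:]]' \
    -e 'RENAME[[:space:]]' \
    "$sql_file"
}
```

A caller would refuse to push when `has_destructive_sql probe.sql`
succeeds and fall through to the push otherwise. A plain grep errs on
the side of refusing: it also matches these keywords inside SQL
comments or string literals.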

CD step "Apply schema migrations" runs between build and container
restart, sourcing each changed service's DATABASE_URL from compose
config (with @postgres → @localhost rewrite for the host runner).
A failure aborts the deploy before the new container starts, so the
old container keeps running against the old schema it was built for.
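
As a rough illustration of the DATABASE_URL lookup and host rewrite,
the sketch below runs the same awk/sed pipeline on sample input. The
helper name `extract_host_db_url`, the credentials, and the port in
the sample are invented for the example; real `docker compose config`
output may be shaped differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read rendered compose YAML on stdin and print the
# first DATABASE_URL with the in-network host swapped for localhost.
extract_host_db_url() {
  awk '/DATABASE_URL:/ {print $2; exit}' | sed 's|@postgres:|@localhost:|'
}

# Sample input shaped like compose config output (values are made up).
extract_host_db_url <<'EOF'
services:
  mana-auth:
    environment:
      DATABASE_URL: postgresql://mana:secret@postgres:5432/mana_platform
EOF
# prints: postgresql://mana:secret@localhost:5432/mana_platform
```

The sed pattern only fires when the host is followed by an explicit
port (`@postgres:`); a URL without a port would pass through unchanged.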

(D) Build-time RAM headroom
========================================
mana-web's Vite build needs 8 GiB of Node heap; Colima's VM is sized
at 12 GiB; ~3.5 GiB of other containers run during deploy. That leaves
roughly 8.5 GiB for the build, barely more than the heap alone. The
2026-04-28 mana-web deploy OOM'd at the Vite step ("cannot allocate
memory") and only succeeded on retry once concurrent traffic settled.
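
The memory budget can be worked through with the approximate figures
quoted in this message (12 GiB VM, ~3.5 GiB of other containers, 8 GiB
heap, ~700 MiB freed by pausing monitoring). The variable names are
illustrative, and the margin is only a rough estimate: it also has to
cover the build's non-heap memory and the VM's own overhead.

```shell
#!/usr/bin/env bash
vm_mib=$((12 * 1024))     # Colima VM size: 12 GiB
others_mib=3584           # ~3.5 GiB of other containers
heap_mib=$((8 * 1024))    # Node heap needed by the Vite build: 8 GiB
freed_mib=700             # ~700 MiB freed by pausing monitoring

margin_before=$((vm_mib - others_mib - heap_mib))
margin_after=$((margin_before + freed_mib))

echo "margin before pause: ${margin_before} MiB"   # 512 MiB
echo "margin after pause: ${margin_after} MiB"     # 1212 MiB
```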

New `scripts/mac-mini/build-memory-headroom.sh`:
- `start`: stops every container matching `^mana-mon-` (the
  observability stack — VictoriaMetrics, Loki, Glitchtip, cAdvisor,
  umami, blackbox, exporters). Frees ~700 MiB.
- `stop`: restores them from the snapshot list captured at start.
- `wrap <cmd>`: pauses, runs `<cmd>`, and always resumes via a trap.
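
The three subcommands could be structured along the lines of the
sketch below. It assumes the standard `docker ps`/`docker stop`/
`docker start` CLI; the snapshot path and the function names are
hypothetical and not taken from `build-memory-headroom.sh`.

```shell
#!/usr/bin/env bash
# Sketch only: SNAPSHOT path and function names are assumptions.
SNAPSHOT="${SNAPSHOT:-/tmp/mana-mon-paused.list}"

headroom_start() {
  # Record the running monitoring containers, then stop them.
  docker ps --filter 'name=^mana-mon-' --format '{{.Names}}' > "$SNAPSHOT"
  if [ -s "$SNAPSHOT" ]; then
    xargs docker stop < "$SNAPSHOT" > /dev/null
  fi
}

headroom_stop() {
  # Restart exactly what was recorded; a second call is a no-op.
  if [ -s "$SNAPSHOT" ]; then
    xargs docker start < "$SNAPSHOT" > /dev/null
  fi
  rm -f "$SNAPSHOT"
}

headroom_wrap() {
  # Subshell so the EXIT trap fires when the command ends, whether it
  # succeeds or fails; the command's exit status is passed through.
  (
    trap headroom_stop EXIT
    headroom_start
    "$@"
  )
}
```

Restoring from a snapshot rather than re-matching `^mana-mon-` means
containers that were already stopped before the build stay stopped.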

CD wraps the build loop with start/stop, but only when mana-web is in
the change set — other services build well below 4 GiB and don't
need the headroom. The monitoring stack resumes before the migration
step so cAdvisor + exporters are back online for the deploy-metrics
collection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Till JS 2026-04-28 16:10:31 +02:00
parent bcc21ca785
commit 698e09b88c
3 changed files with 283 additions and 0 deletions

@@ -290,6 +290,24 @@ jobs:
echo "=== Rebuilding: $SERVICES ==="
fi
# mana-web's Vite build needs 8 GiB of Node heap and Colima's
# VM is sized at 12 GiB. With ~3.5 GiB of other containers
# running, peak RSS occasionally OOMs the build (we hit this
# on 2026-04-28). Pause the non-critical monitoring stack
# for the duration of the build to free ~700 MiB of headroom;
# the EXIT trap set below restores it, even on
# build failure. No-op if mana-web isn't in $SERVICES.
PAUSE_MONITORING=false
if echo " $SERVICES " | grep -q ' mana-web '; then
PAUSE_MONITORING=true
echo "=== Pausing monitoring stack (mana-web build needs RAM headroom) ==="
./scripts/mac-mini/build-memory-headroom.sh start
fi
# Resume monitoring no matter how the build phase exits.
if [ "$PAUSE_MONITORING" = "true" ]; then
trap './scripts/mac-mini/build-memory-headroom.sh stop' EXIT
fi
# Build each service individually to capture build times
BUILD_TIMES=""
for svc in $SERVICES; do
@@ -302,6 +320,44 @@ jobs:
echo " $svc built in ${build_dur}s"
done
# Resume monitoring before the migration / start steps run:
# the deploy-metrics step further down needs cAdvisor +
# exporters back online.
if [ "$PAUSE_MONITORING" = "true" ]; then
./scripts/mac-mini/build-memory-headroom.sh stop
trap - EXIT
fi
# Apply Drizzle schema migrations BEFORE we restart the
# service containers — additive-only, see
# scripts/mac-mini/safe-db-push.sh for the destructive guard.
# If a service has no Drizzle config or no schema diff this is
# a fast no-op. We must source POSTGRES_PASSWORD from the env
# file because the workflow env doesn't carry it.
echo "=== Applying schema migrations ==="
set -a
# shellcheck source=/dev/null
. "$ENV_FILE"
set +a
PG_PASSWORD="${POSTGRES_PASSWORD:-mana123}"
# Most services live in mana_platform; mana-sync (Go, no
# Drizzle) and a handful of others use mana_sync. Per-service
# routing is read straight from compose's DATABASE_URL env.
for svc in $SERVICES; do
# Pull the literal DATABASE_URL from the compose definition,
# then swap host postgres → localhost (we run on the host,
# not inside the docker network).
db_url=$(docker compose -f "$COMPOSE_FILE" --env-file "$ENV_FILE" config "$svc" 2>/dev/null \
| awk '/DATABASE_URL:/ {print $2; exit}' \
| sed 's|@postgres:|@localhost:|')
if [ -z "$db_url" ]; then continue; fi
DATABASE_URL="$db_url" PROJECT_DIR="${{ env.PROJECT_DIR }}" \
./scripts/mac-mini/safe-db-push.sh "$svc" || {
echo "[deploy] safe-db-push failed for $svc — aborting before restart"
exit 1
}
done
# Start all services at once (no rebuild, images already built)
echo "=== Starting services ==="
if [ "$DEPLOY_ALL" == "true" ]; then