Two CD-pipeline ergonomics fixes that surfaced during the 2026-04-28
schema-drift sweep.
(C) Auto-apply additive Drizzle migrations
========================================
8 services use Drizzle (mana-auth/-credits/-events/-research/-mail/
-subscriptions/-user/-analytics) but the CD pipeline never ran their
`db:push` script, so 4 schema additions stayed undeployed for days
(auth.users.kind, credits.{sync_subscriptions,reservations},
event_discovery.*) until live PostgresErrors surfaced them.
New `scripts/mac-mini/safe-db-push.sh`:
- Uses `drizzle-kit generate` to write a probe SQL file (does NOT
apply yet).
- Greps the generated SQL for destructive patterns (DROP TABLE/
COLUMN/TYPE/SCHEMA/INDEX, ALTER COLUMN ... TYPE, RENAME).
- Refuses to auto-apply if any are found — operator must review and
run `pnpm db:push --force` manually after pg_dump.
- Otherwise applies via `drizzle-kit push --force` and cleans up the
probe artifacts.
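The destructive-pattern check can be sketched as follows. This is a
minimal illustration in Python, not the script itself: the regular
expressions are assumptions modelled on the categories listed above,
and the function names are hypothetical.

```python
import re

# Hypothetical pattern list mirroring the categories named above; the
# real script's grep expressions may differ.
DESTRUCTIVE_PATTERNS = [
    r"\bDROP\s+(TABLE|COLUMN|TYPE|SCHEMA|INDEX)\b",
    r"\bALTER\s+COLUMN\b.*\bTYPE\b",
    r"\bRENAME\b",
]
_DESTRUCTIVE = re.compile("|".join(DESTRUCTIVE_PATTERNS), re.IGNORECASE)


def find_destructive_statements(sql: str) -> list[str]:
    """Return every line of the probe SQL that matches a destructive pattern."""
    return [line.strip() for line in sql.splitlines() if _DESTRUCTIVE.search(line)]


def is_safe_to_auto_apply(sql: str) -> bool:
    """Treat a migration as additive only if no line looks destructive."""
    return not find_destructive_statements(sql)
```

The match is deliberately conservative: a column that happens to be
named "type" or "rename" would also trip it, which errs on the side of
manual review.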
CD step "Apply schema migrations" runs between build and container
restart, sourcing each changed service's DATABASE_URL from compose
config (with @postgres → @localhost rewrite for the host runner).
A failure aborts the deploy before the new container starts, so the old
container keeps running against the old schema it was built for.
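The @postgres → @localhost rewrite could look roughly like this. The
function name and the example URL are hypothetical; the sketch assumes
the compose-internal hostname is exactly "postgres" and leaves any
other URL untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_for_host_runner(database_url: str, compose_host: str = "postgres") -> str:
    """Swap the compose-internal DB host for localhost, keeping everything else."""
    parts = urlsplit(database_url)
    if parts.hostname != compose_host:
        return database_url  # already points somewhere the host can reach
    userinfo, _, _ = parts.netloc.rpartition("@")
    port = f":{parts.port}" if parts.port is not None else ""
    netloc = f"{userinfo}@localhost{port}" if userinfo else f"localhost{port}"
    return urlunsplit(parts._replace(netloc=netloc))
```

Parsing the URL instead of doing a plain string replace avoids touching
a password or database name that happens to contain "postgres".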
(D) Build-time RAM headroom
========================================
mana-web's Vite build needs 8 GiB of Node heap, Colima's VM is sized at
12 GiB, and ~3.5 GiB of other containers run during deploy, leaving only
~0.5 GiB of slack above the heap. The 2026-04-28 mana-web deploy OOM'd
at the Vite step ("cannot allocate memory") and only succeeded on retry
once concurrent traffic settled.
New `scripts/mac-mini/build-memory-headroom.sh`:
- `start`: stops every container matching `^mana-mon-` (the
observability stack — VictoriaMetrics, Loki, Glitchtip, cAdvisor,
umami, blackbox, exporters). Frees ~700 MiB.
- `stop`: restores them from the snapshot list captured at start.
- `wrap <cmd>`: pause + run + always-resume via trap.
CD wraps the build loop with start/stop, but only when mana-web is in
the change set — other services build well below 4 GiB and don't
need the headroom. The monitoring stack resumes before the migration
step so cAdvisor + exporters are back online for the deploy-metrics
collection.
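The pause + run + always-resume behaviour of `wrap` maps onto a
try/finally. The sketch below is illustrative Python, not the shell
script: `stop` and `start` are injected callables (in practice they
would shell out to the container runtime), and the function name is
hypothetical.

```python
import re
from contextlib import contextmanager
from typing import Callable, Iterable, Iterator

MONITORING_PATTERN = re.compile(r"^mana-mon-")


@contextmanager
def build_memory_headroom(
    running: Iterable[str],
    stop: Callable[[str], None],
    start: Callable[[str], None],
) -> Iterator[list[str]]:
    """Stop matching containers, yield the snapshot, and always restore it."""
    stopped: list[str] = []
    try:
        for name in running:
            if MONITORING_PATTERN.match(name):
                stop(name)
                stopped.append(name)  # snapshot of what we paused
        yield stopped
    finally:
        # Mirrors the shell trap: runs whether the build succeeded or not.
        for name in stopped:
            start(name)
```

Only containers that were actually stopped are restarted, so anything
that was already down before the build stays down afterwards.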
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs made the Mac Mini auto-deploy silently skip services that had
changed:
1. Diff range was HEAD~1..HEAD, so a push with N commits only checked
the tip. Now uses github.event.before..sha, with a safe fallback to
HEAD~1 when the before SHA is absent (first push, force reset).
2. Service list was still the legacy per-product web/backend apps
(todo-web, chat-web, calendar-web, …) that were consolidated into
`mana-web` + `mana-api` months ago. The unified services didn't
exist in the workflow, so a push touching apps/mana/apps/web or
apps/api never rebuilt them.
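The range selection from item 1 can be sketched like this. It is an
illustration in Python rather than the workflow's shell; `commit_exists`
is a hypothetical hook for the force-reset case, where the before SHA
is set but no longer present in the checkout.

```python
from typing import Callable, Optional

# GitHub reports an all-zero `before` SHA when a branch is first pushed.
ZERO_SHA = "0" * 40


def diff_range(
    before: Optional[str],
    sha: str,
    commit_exists: Callable[[str], bool] = lambda _sha: True,
) -> str:
    """Pick the git diff range for change detection."""
    usable = bool(before) and before != ZERO_SHA and commit_exists(before)
    # Fall back to the tip-only range when `before` cannot be trusted.
    return f"{before}..{sha}" if usable else "HEAD~1..HEAD"
```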
Rewrite:
- Collapse per-service outputs into one `services` output driven by a
SERVICE_SOURCES array (add a new service by adding one line).
- Expanded service surface: mana-ai, mana-research, mana-events,
mana-user, mana-subscriptions, mana-analytics, mana-llm, mana-api,
mana-web, mana-credits, mana-geocoding, manavoxel-web — alongside
the Go services + memoro + landing-builder.
- Removed dead entries: todo/chat/calendar/clock/contacts/music/
storage/memoro-web variants.
- Expanded sveltekit-base trigger (any commit to shared-pwa /
shared-vite-config / root Dockerfile / pnpm-lock forces a base
rebuild — those were invisible before).
- Updated health-check URLs from the running containers' actual host
ports (PORT_SCHEMA.md prose + table disagreed; docker ps wins).
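A one-line-per-service table of this kind might be modelled as below.
The "name:path" entry format, the example paths, and the function name
are assumptions for illustration; the workflow's actual SERVICE_SOURCES
layout is not shown in this message.

```python
# Hypothetical entries in "service:source-path" form.
SERVICE_SOURCES = [
    "mana-web:apps/mana/apps/web",
    "mana-credits:services/mana-credits",
    "mana-auth:services/mana-auth",
]


def changed_services(changed_files: list[str], sources: list[str] = SERVICE_SOURCES) -> list[str]:
    """Return the services whose source path contains a changed file."""
    services: list[str] = []
    for entry in sources:
        name, _, prefix = entry.partition(":")
        prefix = prefix.rstrip("/") + "/"  # avoid matching sibling dirs
        if name not in services and any(p.startswith(prefix) for p in changed_files):
            services.append(name)
    return services
```

Appending the trailing slash keeps a path such as
`services/mana-auth-legacy/...` from triggering a `mana-auth` rebuild.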
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit bundles two unrelated changes that were swept together by an
accidental `git add -A` in another working session. Documented here so the
history reflects what's actually inside.
═══════════════════════════════════════════════════════════════════════
1. fix(mana-auth): /api/v1/auth/login mints JWT via auth.handler instead
of api.signInEmail
═══════════════════════════════════════════════════════════════════════
Previous attempt (commit 55cc75e7d) tried to fix the broken JWT mint in
/api/v1/auth/login by switching the cookie name from `mana.session_token`
to `__Secure-mana.session_token` for production. That was necessary but
not sufficient: Better Auth's session cookie value isn't just the raw
session token, it's `<token>.<HMAC>` where the HMAC is derived from the
better-auth secret. Reconstructing the cookie from auth.api.signInEmail's
JSON response only gave us the raw token, so /api/auth/token's
get-session middleware still couldn't validate it and the JWT mint kept
silently failing.
Real fix: do the sign-in via auth.handler (the HTTP path) rather than
auth.api.signInEmail (the SDK path). The handler returns a real fetch
Response with a Set-Cookie header containing the fully signed cookie
envelope. We capture that header verbatim and forward it as the cookie
on the /api/auth/token request, which now passes validation and mints
the JWT correctly.
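Why the raw token was rejected is easier to see with a toy signed-cookie
scheme. The sketch below is NOT Better Auth's implementation: the
HMAC-SHA256 + base64 encoding and both function names are assumptions
chosen only to illustrate the `<token>.<HMAC>` shape described above.

```python
import base64
import hashlib
import hmac
from typing import Optional


def sign_cookie_value(token: str, secret: str) -> str:
    """Build a `<token>.<signature>` value (illustrative encoding)."""
    mac = hmac.new(secret.encode(), token.encode(), hashlib.sha256).digest()
    return f"{token}.{base64.b64encode(mac).decode()}"


def verify_cookie_value(value: str, secret: str) -> Optional[str]:
    """Return the token if the signature checks out, else None."""
    token, sep, signature = value.rpartition(".")
    if not sep:
        return None  # a bare token has no signature part at all
    expected = sign_cookie_value(token, secret).rpartition(".")[2]
    return token if hmac.compare_digest(signature, expected) else None
```

A cookie rebuilt from the JSON response corresponds to passing the bare
token to `verify_cookie_value`, which fails; forwarding the Set-Cookie
value verbatim corresponds to passing the signed form.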
Verified end-to-end on auth.mana.how:
    $ curl -X POST https://auth.mana.how/api/v1/auth/login \
        -d '{"email":"...","password":"..."}'
    {
      "user": {...},
      "token": "<session token>",
      "accessToken": "eyJhbGciOiJFZERTQSI...",   ← real JWT now
      "refreshToken": "<session token>"
    }
Side benefits:
- Email-not-verified path is now handled by checking
signInResponse.status === 403 directly, no more catching APIError
with the comment-noted async-stream footgun.
- X-Forwarded-For is forwarded explicitly so Better Auth's rate limiter
and our security log see the real client IP.
- The leftover catch block now only handles unexpected exceptions
(network errors etc); the FORBIDDEN-checking logic in it is dead but
harmless and left in for defense in depth.
═══════════════════════════════════════════════════════════════════════
2. chore: remove the entire self-hosted Matrix stack (Synapse, Element,
Manalink, mana-matrix-bot)
═══════════════════════════════════════════════════════════════════════
The Matrix subsystem ran parallel to the main Mana product without any
load-bearing integration: the unified web app never imported matrix-js-sdk,
the chat module uses mana-sync (local-first), and mana-matrix-bot's
plugins duplicated features the unified app already ships natively.
Keeping it alive cost a Synapse + Element + matrix-web + bot container
quartet, three Cloudflare routes, an OIDC provider plugin in mana-auth,
and a steady drip of devlog/dependency churn.
Removed:
- apps/matrix (Manalink web + mobile, ~150 files)
- services/mana-matrix-bot (Go bot with ~20 plugins)
- docker/matrix configs (Synapse + Element)
- synapse/element-web/matrix-web/mana-matrix-bot services in
docker-compose.macmini.yml
- matrix.mana.how/element.mana.how/link.mana.how Cloudflare tunnel routes
- OIDC provider plugin + matrix-synapse trustedClient + matrixUserLinks
table from mana-auth (oauth_* schema definitions also removed)
- MatrixService import path in mana-media (importFromMatrix endpoint)
- Matrix notification channel in mana-notify (worker, metrics, config,
channel_type enum, MatrixOptions handler)
- Matrix entries from shared-branding (mana-apps + app-icons),
notify-client, the i18n bundle, the observatory map, the credits
app-label list, the landing footer/apps page, the prometheus + alerts
+ promtail tier mappings, and the matrix-related deploy paths in
cd-macmini.yml + ci.yml
Devlog/manascore/blueprint entries that mention Matrix are left intact
as historical record. The oauth_* + matrix_user_links Postgres tables
stay on existing prod databases. Code can no longer write to them; drop
them in a follow-up migration if you want them gone for real.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mana-stt: add WhisperX service with CUDA GPU support, speaker diarization, and auto-fallback chain.
mana-notify: add locale fallback and default templates for task reminders.
CD: update deployment pipeline and docker-compose configuration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both apps (zitare and clock) are fully local-first via Dexie.js +
mana-sync. Their NestJS
backends were pure CRUD wrappers (20 + 31 source files) that are no
longer needed.
Changes:
- Add packages/shared-hono: JWT auth via JWKS (jose), Drizzle DB factory,
health route, generic GDPR admin handler, error middleware
- Migrate zitare lists page from fetch() to listsStore (local-first)
- Rewrite clock timers store from API-based to timerCollection (Dexie)
- Update clock +layout.svelte CommandBar search to use local collections
- Remove zitare-backend + clock-backend from docker-compose, CI/CD,
Prometheus, env generation, setup scripts
- Add docs/TECHNOLOGY_AUDIT_2026_03.md with full repo analysis
Net result: -2 Docker containers, -2 ports, -2728 lines of code
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace 21 separate NestJS Matrix bot processes (~2.1 GB RAM, ~4.2 GB Docker images)
with a single Go binary using plugin architecture (8.6 MB binary, ~30 MB RAM).
New services:
- services/mana-matrix-bot/ — Go Matrix bot with 21 plugins (mautrix-go, Redis sessions)
- services/mana-api-gateway-go/ — Go API gateway (rate limiting, API keys, credit billing)
Deleted:
- 21 services/matrix-*-bot/ directories
- packages/bot-services/ and packages/matrix-bot-common/
- Legacy deploy scripts and CI build jobs
Updated:
- docker-compose.macmini.yml: new Go services, legacy bots removed
- CI/CD: change detection + build jobs for Go services
- Root package.json: new dev:matrix, build:matrix, test:matrix scripts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add docker/Dockerfile.sveltekit-base: pre-built base with all 34 shared
packages (mirrors nestjs-base pattern), eliminates redundant COPY/build
steps from individual web Dockerfiles
- Add scripts/mac-mini/build-app.sh: stops monitoring stack before build
to free RAM, auto-restarts on exit (trap cleanup)
- Migrate todo web Dockerfile to use sveltekit-base:local (47 COPY lines
→ 2, 4 build steps → 0)
- Update CD workflow to build sveltekit-base when deploying web apps
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New service that generates static Astro landing pages for organizations
and deploys them to Cloudflare Pages at {slug}.mana.how.
Components:
- Landing Builder Service (NestJS, port 3030) with Astro template
- Admin UI in Manacore web dashboard at /organizations/[id]/landing
- TeamSection + ContactSection for shared-landing-ui
- Two org themes (classic dark, warm light)
- LandingPageConfig types in shared-types
- Docker + CI/CD integration for Mac Mini deployment
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add build context to storage-web in docker-compose (was pulling from
GHCR, now builds locally like other services)
- Add storage-backend and storage-web to CD change detection and deploy
- Fix mukke health check URLs (were using wrong ports 3035/5015)
- Remove hardcoded port from Dockerfile (use PORT env var from compose)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add docker/Dockerfile.nestjs-base with all shared packages pre-built
- Convert 6 backend Dockerfiles (chat, todo, calendar, clock, contacts,
mukke) to inherit from nestjs-base:local
- Fix bugs: duplicate shared-nestjs-setup builds (mukke), unnecessary
shared-error-tracking rebuild in production stage (chat, clock)
- CD pipeline builds base image before services when backends deploy
- Net reduction: 317 lines removed, 112 added (-205 lines)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a Dockerfile, a docker-compose service (port 5100), and Caddy and
cloudflared routing for the WhoPixels game. PORT is now configurable via
env var.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sends a message to a Matrix room when a deploy fails, including
the failing services, commit, deployer, and a link to the logs.
Requires two GitHub Actions secrets:
- DEPLOY_NOTIFY_ROOM_ID: Matrix room ID
- DEPLOY_NOTIFY_BOT_TOKEN: Matrix bot access token
Skips silently if secrets are not configured.
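The notification amounts to one authenticated PUT against the Matrix
client-server API. The sketch below only builds the request and is an
illustration, not the workflow step: the homeserver URL, the message
wording, and the function name are assumptions.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import Request


def build_deploy_failure_request(
    homeserver: str, room_id: str, bot_token: str, txn_id: str,
    services: list[str], commit: str, deployer: str, logs_url: str,
) -> Optional[Request]:
    """Return the PUT request, or None when the secrets are not set."""
    if not room_id or not bot_token:
        return None  # skip silently, as the step does
    body = {
        "msgtype": "m.text",
        "body": (
            f"Deploy failed: {', '.join(services)} "
            f"(commit {commit}, by {deployer}). Logs: {logs_url}"
        ),
    }
    url = (
        f"{homeserver.rstrip('/')}/_matrix/client/v3/rooms/"
        f"{quote(room_id, safe='')}/send/m.room.message/{quote(txn_id, safe='')}"
    )
    return Request(
        url,
        data=json.dumps(body).encode(),
        method="PUT",
        headers={
            "Authorization": f"Bearer {bot_token}",
            "Content-Type": "application/json",
        },
    )
```

Room IDs contain `!` and `:`, so the ID is percent-encoded before it
goes into the path.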
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mukke was missing from the automated deployment pipeline, so changes
to the web app were not being deployed to the Mac Mini server.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove set -euo pipefail from sourced library (breaks caller error handling)
- Replace declare -A associative arrays with string-based lookups (macOS
ships Bash 3.2, which doesn't support declare -A)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instrument the CD pipeline to record per-deploy and per-service metrics
(build time, image size, startup time, health status) into PostgreSQL and
push gauges to Pushgateway. Adds a Grafana dashboard with 13 panels covering
deploy frequency, build performance, service health, and history.
New files:
- scripts/mac-mini/init-deploy-tracking.sql (idempotent DDL)
- scripts/deploy-metrics.sh (bash library for CI)
- docker/grafana/provisioning/datasources/deploy-tracking.yml
- docker/grafana/dashboards/deploy-tracking.json
Modified:
- docker/prometheus/prometheus.yml (pushgateway scrape job)
- .github/workflows/cd-macmini.yml (build/health instrumentation)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a step to the CD pipeline that ensures CALENDAR_ENCRYPTION_KEY
exists in .env.macmini, generating one if missing.
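The generate-if-missing logic can be sketched as follows. This is an
illustration in Python, not the workflow step itself: the key format
(32 random bytes, hex-encoded) and the function name are assumptions.

```python
import secrets
from pathlib import Path


def ensure_env_key(env_file: Path, name: str = "CALENDAR_ENCRYPTION_KEY") -> bool:
    """Append NAME=<random hex> unless a non-empty NAME is already set.

    Returns True when a new key was generated.
    """
    text = env_file.read_text() if env_file.exists() else ""
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == name and value.strip():
            return False  # never overwrite an existing key
    # Start on a fresh line if the file lacks a trailing newline.
    prefix = "" if not text or text.endswith("\n") else "\n"
    with env_file.open("a") as handle:
        handle.write(f"{prefix}{name}={secrets.token_hex(32)}\n")
    return True
```

Leaving an existing value untouched matters here: regenerating the key
on every deploy would make previously encrypted data unreadable.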
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a GitHub Actions workflow that detects changed services on push to
main and automatically rebuilds/restarts only the affected Docker containers
on the Mac Mini. Includes setup guide for the self-hosted runner.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>