Two CD-pipeline ergonomics fixes that surfaced during the 2026-04-28
schema-drift sweep.
(C) Auto-apply additive Drizzle migrations
========================================
8 services use Drizzle (mana-auth/-credits/-events/-research/-mail/
-subscriptions/-user/-analytics) but the CD pipeline never ran their
`db:push` script, so 4 schema additions stayed undeployed for days
(auth.users.kind, credits.{sync_subscriptions,reservations},
event_discovery.*) until live PostgresErrors surfaced them.
New `scripts/mac-mini/safe-db-push.sh`:
- Uses `drizzle-kit generate` to write a probe SQL file (does NOT
apply yet).
- Greps the generated SQL for destructive patterns (DROP TABLE/
COLUMN/TYPE/SCHEMA/INDEX, ALTER COLUMN ... TYPE, RENAME).
- Refuses to auto-apply if any are found — operator must review and
run `pnpm db:push --force` manually after pg_dump.
- Otherwise applies via `drizzle-kit push --force` and cleans up the
probe artifacts.
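A minimal sketch of that flow (default drizzle out-dir and exact
flags are assumptions, not the script verbatim):

    # generate a probe migration but do not apply it yet
    pnpm drizzle-kit generate --name probe
    probe=$(ls -t drizzle/*probe*.sql | head -1)
    # any destructive statement: bail out for operator review
    if grep -Eiq 'DROP (TABLE|COLUMN|TYPE|SCHEMA|INDEX)|ALTER COLUMN .* TYPE|RENAME' "$probe"; then
      echo "destructive SQL in $probe: review, pg_dump, then db:push --force" >&2
      exit 1
    fi
    # additive only: apply, then remove the probe artifacts
    pnpm drizzle-kit push --force
    rm -f "$probe"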
CD step "Apply schema migrations" runs between build and container
restart, sourcing each changed service's DATABASE_URL from compose
config (with @postgres → @localhost rewrite for the host runner).
Failure aborts the deploy before the new container starts; the old
container keeps running against the old schema it still matches.
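Roughly how the step sources and rewrites a URL (service name
illustrative; assumes a compose version with `config --format json`
and jq on the runner):

    url=$(docker compose config --format json \
      | jq -r '.services["mana-credits"].environment.DATABASE_URL')
    DATABASE_URL="${url/@postgres/@localhost}" \
      pnpm --filter mana-credits run db:push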
(D) Build-time RAM headroom
========================================
mana-web's Vite build needs 8 GiB of Node heap; Colima's VM is sized
at 12 GiB; ~3.5 GiB of other containers run during deploy. The 2026-
04-28 mana-web deploy OOM'd at the Vite step ("cannot allocate
memory") and only succeeded on retry once concurrent traffic settled.
New `scripts/mac-mini/build-memory-headroom.sh`:
- `start`: stops every container matching `^mana-mon-` (the
observability stack — VictoriaMetrics, Loki, Glitchtip, cAdvisor,
umami, blackbox, exporters). Frees ~700 MiB.
- `stop`: restores them from the snapshot list captured at start.
- `wrap <cmd>`: pause + run + always-resume via trap.
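The wrap mode boils down to snapshot + trap (sketch, not the script
verbatim):

    snapshot=$(docker ps --filter 'name=^mana-mon-' --format '{{.Names}}')
    resume() { [ -n "$snapshot" ] && echo "$snapshot" | xargs docker start; }
    trap resume EXIT                 # always restore, even on failure
    [ -n "$snapshot" ] && echo "$snapshot" | xargs docker stop
    "$@"                             # the wrapped build command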
CD wraps the build loop with start/stop, but only when mana-web is in
the change set — other services build well below 4 GiB and don't
need the headroom. The monitoring stack resumes before the migration
step so cAdvisor + exporters are back online for the deploy-metrics
collection.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CI workflow accumulated 16 build jobs for apps/services that no
longer exist after the 2026-04 unification (chat, todo, calendar, clock,
contacts, presi, storage, food, skilltree, telegram-stats-bot — all of
their /apps/web and /apps/backend dirs are gone) plus a structurally
broken `build-mana-web` (its Dockerfile starts FROM `sveltekit-base:local`,
an image only the Mac Mini self-hosted runner has). Every push has been
producing red CI runs from these dead jobs while the real production
deploys (cd-macmini.yml) succeeded.
Removed:
- detect-changes per-product outputs + force-build-all branches
- 220 lines of dead per-product detect-logic shell
- 19 lines of per-product summary block
- build-mana-web (broken; CD on Mac Mini covers prod)
- build-{chat,todo,calendar,clock,contacts,presi,storage,food,skilltree}-{web,backend}
- build-telegram-stats-bot
Kept (still build cleanly on ubuntu-latest):
- build-mana-{auth,search,sync,notify,api-gateway,crawler,media,credits}
- validate (PRs)
- auth-integration (PRs)
CI workflow shrank 1290 → 583 lines. The header comment now spells out
which services are in/out of CI and why.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs made the Mac Mini auto-deploy silently miss everything on a
multi-commit push:
1. Diff range was HEAD~1..HEAD, so a push with N commits only checked
the tip. Now uses github.event.before..sha, with a safe fallback to
HEAD~1 when the before SHA is absent (first push, force reset); see
the sketch after this list.
2. Service list was still the legacy per-product web/backend apps
(todo-web, chat-web, calendar-web, …) that were consolidated into
`mana-web` + `mana-api` months ago. The unified services didn't
exist in the workflow, so a push touching apps/mana/apps/web or
apps/api never rebuilt them.
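Sketch of the new range selection (env names here are illustrative
stand-ins for the workflow's github.event.before / github.sha
context values):

    before="${GITHUB_EVENT_BEFORE:-}"
    # first push / force reset: before is all zeros or unknown locally
    if [ -z "$before" ] || ! git cat-file -e "$before" 2>/dev/null; then
      range="HEAD~1..HEAD"
    else
      range="$before..$GITHUB_SHA"
    fi
    changed=$(git diff --name-only "$range")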
Rewrite:
- Collapse per-service outputs into one `services` output driven by a
SERVICE_SOURCES array (add a new service by adding one line; shape
sketched after this list).
- Expanded service surface: mana-ai, mana-research, mana-events,
mana-user, mana-subscriptions, mana-analytics, mana-llm, mana-api,
mana-web, mana-credits, mana-geocoding, manavoxel-web — alongside
the Go services + memoro + landing-builder.
- Removed dead entries: todo/chat/calendar/clock/contacts/music/
storage/memoro-web variants.
- Expanded sveltekit-base trigger (any commit to shared-pwa /
shared-vite-config / root Dockerfile / pnpm-lock forces a base
rebuild — those were invisible before).
- Updated health-check URLs from the running containers' actual host
ports (PORT_SCHEMA.md prose + table disagreed; docker ps wins).
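Hypothetical shape of the array-driven detection (entries
illustrative; `$changed` comes from the range sketch above):

    SERVICE_SOURCES=(
      "mana-web:apps/mana/apps/web"
      "mana-api:apps/api"
      "mana-credits:services/mana-credits"
      # ...one line per service
    )
    services=""
    for entry in "${SERVICE_SOURCES[@]}"; do
      svc="${entry%%:*}" src="${entry#*:}"
      echo "$changed" | grep -q "^$src/" && services="$services $svc"
    done
    echo "services=${services# }" >> "$GITHUB_OUTPUT"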
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`ci.yml` had a `pnpm run validate:monorepo` step that referenced a
script defined nowhere in the repo — CI would fail at that step
whenever the validate job ran. Replacing it with a new bundled
`validate:all` script closes that gap and gives contributors a single
local command that mirrors what CI enforces.
- New `validate:all` chains the three fast repo-invariant checks
(turbo recursion, pgSchema isolation, crypto registry) with fail-fast
semantics; see the chain sketched after this list. Runtime ~1s —
suitable as a pre-push gate.
- `validate:dockerfiles` intentionally left out: its current output
is 41 pre-existing "MISSING" warnings on two web Dockerfiles, which
look like a validator-vs-wildcard-COPY mismatch rather than real
issues. Keeping it as a standalone script so those can be
triaged separately without blocking `validate:all`.
- ci.yml: four separate validate steps collapsed into one. The step
rename also removes the dead `validate:monorepo` call.
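Conceptually the new script is just the three checks joined
fail-fast (wiring assumed):

    pnpm run validate:turbo \
      && pnpm run validate:pg-schema \
      && pnpm run check:crypto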
Verified: `pnpm run validate:all` exits 0 in ~1s — 138 packages
scanned for turbo recursion, 727 TypeScript files for raw pgTable,
190 Dexie tables classified in the crypto registry (85 encrypted,
105 allowlisted plaintext).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "every Drizzle table uses pgSchema" rule was documented in
.claude/guidelines/database.md (added yesterday as part of Concern 5)
but enforced only by convention. A new service could slip a raw
\`pgTable()\` past review and collide in the default \`public\` schema
of \`mana_platform\`, and nothing would surface the mistake until a
production migration failed.
- `scripts/validate-pg-schema-isolation.mjs` scans every tracked
TypeScript file under services/, apps/api/, packages/ for call sites
of `pgTable(` (not imports — imports can still be useful for types).
Strips comments before matching so doc-examples like "use `pgTable()`"
don't trigger false positives. A rough shell equivalent follows this
list.
- Wired as `pnpm run validate:pg-schema` and a new CI step in the
validate job (right after the turbo-recursion check). 721 files
scan clean today.
- Removed an unused `pgTable` import in mana-subscriptions that would
have been the only import of the symbol remaining after this change.
- Updated .claude/guidelines/database.md — the old verification blurb
said "no automated lint rule yet"; it now points at the enforcer.
Drift verified: injecting a synthetic `pgTable('bad', {})` into
subscriptions.ts failed with a clear file:line violation pointing at
the database guideline.
Closes the "no automated lint rule" gap noted in the database guideline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CLAUDE.md flagged this as "CRITICAL" — a child package.json defining
e.g. `"build": "turbo run build"` causes a 10+ minute CI hang with
thousands of duplicate task spawns. The rule was documented but never
enforced, so it re-emerged every couple of months as someone copied a
parent script pattern.
- `scripts/validate-no-recursive-turbo.mjs` walks every tracked
package.json (via `git ls-files`, so node_modules is auto-skipped)
and fails if any non-root package has build/type-check/lint/test/
test:coverage/check scripts containing `turbo run`. `dev` stays
allowed — delegating it from a parent is the intended ergonomic.
A shell approximation of the check follows this list.
- Wired as `pnpm run validate:turbo` + a new CI step in the validate
job (before type-check — fails fast).
- CLAUDE.md §Turborepo updated to point at the enforcer and call out
the full task list (test/test:coverage/check were missing from the
original prose).
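A shell approximation of the same invariant (the real script is
Node; jq here is an assumption):

    for f in $(git ls-files '*package.json'); do
      [ "$f" = "package.json" ] && continue   # root may delegate
      if jq -e '.scripts // {} | to_entries[]
          | select(.key | IN("build","type-check","lint","test","test:coverage","check"))
          | select(.value | contains("turbo run"))' "$f" >/dev/null; then
        echo "recursive turbo in $f" >&2
        exit 1
      fi
    done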
Verified: 138 non-root package.json files scan clean. Drift simulation
(injecting `"build": "turbo run build"` into apps/mana/apps/web) fails
with a clear message pointing at the offending file + script + fix.
This closes audit item #32 from the architecture review.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before: adding a new Dexie table left the encryption decision implicit.
If you forgot to register it, the table silently shipped in plaintext
forever — no error, no warning, no footprint anywhere. The architecture
audit flagged this as the root of Concern 1.
- `scripts/audit-crypto-registry.mjs` parses database.ts's `.stores()`
blocks and registry.ts's entries, then enforces three invariants:
1. Every Dexie table is either in the encryption registry OR in the
new `plaintext-allowlist.ts` — one conscious classification per
table.
2. No dead registry entries (referring to tables that no longer
exist in Dexie).
3. No table appears in both — single authoritative source.
- `plaintext-allowlist.ts` auto-seeded from current state. 105 entries,
each tagged `// TODO: audit` as an invitation to review whether the
table truly holds nothing sensitive. The allowlist is intentionally
a separate file so additions are reviewable on their own (not buried
inside database.ts schema bumps).
- Wired into `pnpm run check:crypto` + CI validate job — a new table
now fails the PR check instead of slipping past review.
- `check:crypto:seed` regenerates the allowlist if ever needed.
Verified: drift simulation (removing aiMissions from the allowlist)
fails the audit with a clear message pointing at the missing
classification. Current state passes: 187 Dexie tables, 82 encrypted,
105 explicit plaintext.
Concern 1 is now fully closed (A: typed registry entries, B: dev-mode
runtime drift check, C: build-time audit enforcing coverage).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI previously ran `pnpm run test || true` — test failures were silently
swallowed with no artifact, so we had no visibility into what was actually
passing across 1,296 test files.
- New `test:coverage` turbo pipeline task + root script; packages that opt
in by declaring their own `test:coverage` get picked up automatically.
- Wired up three high-value Vitest targets: apps/mana/apps/web (main
frontend, ~590 tests), shared-ui (Svelte component library), and
shared-storage (S3 client). Each emits lcov.info + coverage-summary.json
+ browsable HTML.
- apps/mana/apps/web `"test"` was running in watch mode (just `vitest`),
which hangs under turbo orchestration — changed to `vitest run` and
added `test:watch` for the interactive case.
- CI uploads coverage artifacts (14-day retention) regardless of whether
tests passed. `continue-on-error: true` replaces `|| true` so a failed
suite shows up as a warning annotation on the PR rather than being
invisible. Flip to a hard gate once main is green for a full week.
- Testing guideline documents the pattern + the template vitest config
+ the planned 80% threshold.
- ESLint flat-config `vitest.config.ts` ignore only matched at the root;
widened to `**/vitest.config.{ts,js,mjs}` so nested configs don't trip
the project-service parser.
Coverage baseline produced locally:
shared-storage: 91.37% lines (6 files, 123 tests)
shared-ui: 2.87% lines (mostly Svelte components, untested)
apps/mana/web: 9/59 test files fail — pre-existing, not regression
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds four audit scripts (module health, inter-module coupling, per-function
cognitive complexity, D3 treemap) with generated reports under docs/ and
an iframe-embedded workbench app at /admin/complexity. Reports regenerate
weekly via the module-health GitHub Action.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a 13-step integration test that exercises register → email
verification → login → JWT validation → /me/data → encryption-vault
init/key → logout against a real stack of postgres + redis + mailpit +
mana-auth + mana-notify in docker compose.
Verified locally that this catches every regression we hit on
2026-04-08 in well under a second:
- missing nanoid dependency → register endpoint 500
- missing MANA_AUTH_KEK env passthrough → mana-auth never starts
- missing encryption-vault SQL migrations → vault endpoints 500
- wrong cookie name in /api/v1/auth/login → no accessToken in response
- mana-notify SMTP misconfigured → mailpit poll times out
Files:
- docker-compose.test.yml — minimal isolated stack on alt ports
(postgres 5443, redis 6390, mailpit 1026/8026, mana-auth 3091,
mana-notify 3092). Runs alongside the dev stack without collision.
Postgres healthcheck runs a real query rather than just pg_isready
to avoid the race where pg_isready reports healthy while the docker
init scripts are still running on a unix socket (sketched after this
list).
- tests/integration/auth-flow.test.ts — bun test that drives the full
flow via fetch + mailpit's REST API. Cleans up its test user from
postgres in afterAll. Self-contained, no extra deps.
- tests/integration/README.md — what's covered, why it exists, how
to run locally + extend.
- scripts/run-integration-tests.sh — orchestrator. Brings up the
stack, pushes the @mana/auth Drizzle schema, applies the
encryption-vault SQL migrations (002, 003), restarts mana-auth so
it sees the fresh tables, runs the test, tears down on exit.
KEEP_STACK=1 to leave it up for manual mailpit inspection.
- docker-compose.dev.yml — also adds Mailpit as a regular dev service
(ports 1025/8025) so local development can have a working email
capture without spinning up the test stack.
- .github/workflows/ci.yml — new auth-integration job that runs on
every PR. Calls run-integration-tests.sh; on failure dumps
mana-auth + mana-notify logs and the mailpit message queue. Marked
as a required check via the existing PR validation pipeline.
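The healthcheck trick in shell form (user/db names are placeholders):
pg_isready answers on the unix socket while init scripts still run,
but a real query over TCP only succeeds once the final server is up:

    # inside the container: fails during init, passes when truly ready
    psql -h 127.0.0.1 -U postgres -d postgres -c 'SELECT 1' >/dev/null || exit 1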
Reproduced 3 clean runs and 1 negative-control run (removed nanoid
from package.json → mana-auth container exits → script aborts with
non-zero) before committing. Full happy path runs in ~22s on a warm
Docker cache.
This commit bundles two unrelated changes that were swept together by an
accidental `git add -A` in another working session. Documented here so the
history reflects what's actually inside.
═══════════════════════════════════════════════════════════════════════
1. fix(mana-auth): /api/v1/auth/login mints JWT via auth.handler instead
of api.signInEmail
═══════════════════════════════════════════════════════════════════════
Previous attempt (commit 55cc75e7d) tried to fix the broken JWT mint in
/api/v1/auth/login by switching the cookie name from `mana.session_token`
to `__Secure-mana.session_token` for production. That was necessary but
not sufficient: Better Auth's session cookie value isn't just the raw
session token, it's `<token>.<HMAC>` where the HMAC is derived from the
better-auth secret. Reconstructing the cookie from auth.api.signInEmail's
JSON response only gave us the raw token, so /api/auth/token's
get-session middleware still couldn't validate it and the JWT mint kept
silently failing.
Real fix: do the sign-in via auth.handler (the HTTP path) rather than
auth.api.signInEmail (the SDK path). The handler returns a real fetch
Response with a Set-Cookie header containing the fully signed cookie
envelope. We capture that header verbatim and forward it as the cookie
on the /api/auth/token request, which now passes validation and mints
the JWT correctly.
Verified end-to-end on auth.mana.how:
$ curl -X POST https://auth.mana.how/api/v1/auth/login \
    -d '{"email":"...","password":"..."}'
{
  "user": {...},
  "token": "<session token>",
  "accessToken": "eyJhbGciOiJFZERTQSI...",   ← real JWT now
  "refreshToken": "<session token>"
}
Side benefits:
- Email-not-verified path is now handled by checking
signInResponse.status === 403 directly; no more catching APIError
with its comment-noted async-stream footgun.
- X-Forwarded-For is forwarded explicitly so Better Auth's rate limiter
and our security log see the real client IP.
- The leftover catch block now only handles unexpected exceptions
(network errors etc.); the FORBIDDEN-checking logic in it is dead but
harmless and left in for defense in depth.
═══════════════════════════════════════════════════════════════════════
2. chore: remove the entire self-hosted Matrix stack (Synapse, Element,
Manalink, mana-matrix-bot)
═══════════════════════════════════════════════════════════════════════
The Matrix subsystem ran parallel to the main Mana product without any
load-bearing integration: the unified web app never imported matrix-js-sdk,
the chat module uses mana-sync (local-first), and mana-matrix-bot's
plugins duplicated features the unified app already ships natively.
Keeping it alive cost a Synapse + Element + matrix-web + bot container
quartet, three Cloudflare routes, an OIDC provider plugin in mana-auth,
and a steady drip of devlog/dependency churn.
Removed:
- apps/matrix (Manalink web + mobile, ~150 files)
- services/mana-matrix-bot (Go bot with ~20 plugins)
- docker/matrix configs (Synapse + Element)
- synapse/element-web/matrix-web/mana-matrix-bot services in
docker-compose.macmini.yml
- matrix.mana.how/element.mana.how/link.mana.how Cloudflare tunnel routes
- OIDC provider plugin + matrix-synapse trustedClient + matrixUserLinks
table from mana-auth (oauth_* schema definitions also removed)
- MatrixService import path in mana-media (importFromMatrix endpoint)
- Matrix notification channel in mana-notify (worker, metrics, config,
channel_type enum, MatrixOptions handler)
- Matrix entries from shared-branding (mana-apps + app-icons),
notify-client, the i18n bundle, the observatory map, the credits
app-label list, the landing footer/apps page, the prometheus + alerts
+ promtail tier mappings, and the matrix-related deploy paths in
cd-macmini.yml + ci.yml
Devlog/manascore/blueprint entries that mention Matrix are left intact
as historical record. The oauth_* + matrix_user_links Postgres tables
stay on existing prod databases; code can no longer write to them.
Drop them in a follow-up migration if you want them gone for real.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mana-stt: add WhisperX service with CUDA GPU support, speaker diarization, and auto-fallback chain.
mana-notify: add locale fallback and default templates for task reminders.
CD: update deployment pipeline and docker-compose configuration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The mirror workflow runs git pull on the server's working directory,
which fails if there are uncommitted changes (e.g., from manual edits).
Adding git stash before pull prevents this.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Forgejo runner has no macOS binary — Docker-based runner can't access
host filesystem/SSH needed for CD. GitHub CD via native self-hosted
runner handles all deployments. Forgejo remains a push-mirror for
backup and visibility.
- Remove .forgejo/workflows/cd-macmini.yml
- Remove forgejo-runner service from docker-compose
- Update mirror workflow comments
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- .github/workflows/mirror-to-forgejo.yml: on push to main, runner on Mac Mini
pushes to Forgejo via local SSH (localhost:2222). Keeps Forgejo in sync.
- .forgejo/workflows/cd-macmini.yml: deploy step now pulls from forgejo remote
(ssh://localhost:2222) instead of GitHub origin.
Flow: local → git push origin main → GitHub → mirror-to-forgejo runs on Mac Mini
→ pushes to Forgejo → Forgejo CD pipeline → deploys containers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Presi NestJS backend (40 source files, 50 deps) was a CRUD wrapper
around decks, slides, and themes — all now handled by local-first sync.
Only the share-link feature requires server-side state (public URLs
without auth), so a minimal Hono + Bun server replaces the entire
NestJS backend:
- apps/presi/apps/server/ — Hono server with share routes + GDPR admin
Uses @manacore/shared-hono for auth (JWKS), health, admin, errors
- Web app API client stripped to share-only (270 → 90 lines)
- Removed from docker-compose, CI/CD, Prometheus, env generation
- NestJS backend deleted (40 TS files, 8 test specs, 3038 lines)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both apps are fully local-first via Dexie.js + mana-sync. Their NestJS
backends were pure CRUD wrappers (20 + 31 source files) that are no
longer needed.
Changes:
- Add packages/shared-hono: JWT auth via JWKS (jose), Drizzle DB factory,
health route, generic GDPR admin handler, error middleware
- Migrate zitare lists page from fetch() to listsStore (local-first)
- Rewrite clock timers store from API-based to timerCollection (Dexie)
- Update clock +layout.svelte CommandBar search to use local collections
- Remove zitare-backend + clock-backend from docker-compose, CI/CD,
Prometheus, env generation, setup scripts
- Add docs/TECHNOLOGY_AUDIT_2026_03.md with full repo analysis
Net result: -2 Docker containers, -2 ports, -2728 lines of code
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the NestJS mana-search service with a Go implementation for
lower resource usage and faster startup. All 7 API endpoints are 1:1
compatible (search, extract, bulk extract, engines, health, metrics,
cache clear). Uses go-readability for content extraction and
html-to-markdown for Markdown conversion. Redis cache with graceful
degradation, Prometheus metrics, and structured JSON logging.
Binary: 22 MB vs ~200+ MB node_modules.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace 21 separate NestJS Matrix bot processes (~2.1 GB RAM, ~4.2 GB Docker images)
with a single Go binary using plugin architecture (8.6 MB binary, ~30 MB RAM).
New services:
- services/mana-matrix-bot/ — Go Matrix bot with 21 plugins (mautrix-go, Redis sessions)
- services/mana-api-gateway-go/ — Go API gateway (rate limiting, API keys, credit billing)
Deleted:
- 21 services/matrix-*-bot/ directories
- packages/bot-services/ and packages/matrix-bot-common/
- Legacy deploy scripts and CI build jobs
Updated:
- docker-compose.macmini.yml: new Go services, legacy bots removed
- CI/CD: change detection + build jobs for Go services
- Root package.json: new dev:matrix, build:matrix, test:matrix scripts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add docker/Dockerfile.sveltekit-base: pre-built base with all 34 shared
packages (mirrors nestjs-base pattern), eliminates redundant COPY/build
steps from individual web Dockerfiles
- Add scripts/mac-mini/build-app.sh: stops monitoring stack before build
to free RAM, auto-restarts on exit (trap cleanup)
- Migrate todo web Dockerfile to use sveltekit-base:local (47 COPY lines
→ 2, 4 build steps → 0)
- Update CD workflow to build sveltekit-base when deploying web apps
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Validates workspace dependencies and Dockerfile freshness before
Docker builds. Catches missing deps and outdated COPYs in PRs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Validates Dockerfiles and builds 5 representative images in parallel
on PRs. Catches missing COPY statements and build errors before deploy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New service that generates static Astro landing pages for organizations
and deploys them to Cloudflare Pages at {slug}.mana.how.
Components:
- Landing Builder Service (NestJS, port 3030) with Astro template
- Admin UI in Manacore web dashboard at /organizations/[id]/landing
- TeamSection + ContactSection for shared-landing-ui
- Two org themes (classic dark, warm light)
- LandingPageConfig types in shared-types
- Docker + CI/CD integration for Mac Mini deployment
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add build context to storage-web in docker-compose (was pulling from
GHCR, now builds locally like other services)
- Add storage-backend and storage-web to CD change detection and deploy
- Fix mukke health check URLs (were using wrong ports 3035/5015)
- Remove hardcoded port from Dockerfile (use PORT env var from compose)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add format:check step (prettier --check, hard fail)
- Make lint step a hard fail (remove || echo fallback)
- Add test step (soft fail for now — coverage is thin)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add docker/Dockerfile.nestjs-base with all shared packages pre-built
- Convert 6 backend Dockerfiles (chat, todo, calendar, clock, contacts,
mukke) to inherit from nestjs-base:local
- Fix bugs: duplicate shared-nestjs-setup builds (mukke), unnecessary
shared-error-tracking rebuild in production stage (chat, clock)
- CD pipeline builds base image before services when backends deploy
- Net reduction: 317 lines removed, 112 added (-205 lines)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dockerfile, docker-compose service (port 5100), Caddy and cloudflared
routing for the WhoPixels game. PORT is now configurable via env var.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sends a message to a Matrix room when a deploy fails, including
the failing services, commit, deployer, and a link to the logs.
Requires two GitHub Actions secrets:
- DEPLOY_NOTIFY_ROOM_ID: Matrix room ID
- DEPLOY_NOTIFY_BOT_TOKEN: Matrix bot access token
Skips silently if secrets are not configured.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mukke was missing from the automated deployment pipeline, so changes
to the web app were not being deployed to the Mac Mini server.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove set -euo pipefail from sourced library (breaks caller error handling)
- Replace declare -A associative arrays with string-based lookups
(sketch below)
- macOS ships Bash 3.2 which doesn't support declare -A
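The string-based lookup pattern, Bash 3.2-safe (keys/values
illustrative):

    ports="mana-auth=3091 mana-notify=3092"   # was: declare -A ports=(...)
    lookup() {
      local key="$1" pair
      for pair in $ports; do
        case "$pair" in
          "$key="*) echo "${pair#*=}"; return 0 ;;
        esac
      done
      return 1
    }
    lookup mana-auth   # prints 3091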
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instrument the CD pipeline to record per-deploy and per-service metrics
(build time, image size, startup time, health status) into PostgreSQL and
push gauges to Pushgateway. Adds a Grafana dashboard with 13 panels covering
deploy frequency, build performance, service health, and history.
New files:
- scripts/mac-mini/init-deploy-tracking.sql (idempotent DDL)
- scripts/deploy-metrics.sh (bash library for CI)
- docker/grafana/provisioning/datasources/deploy-tracking.yml
- docker/grafana/dashboards/deploy-tracking.json
Modified:
- docker/prometheus/prometheus.yml (pushgateway scrape job)
- .github/workflows/cd-macmini.yml (build/health instrumentation)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a step to the CD pipeline that ensures CALENDAR_ENCRYPTION_KEY
exists in .env.macmini, generating one if missing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a GitHub Actions workflow that detects changed services on push to
main and automatically rebuilds/restarts only the affected Docker containers
on the Mac Mini. Includes setup guide for the self-hosted runner.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add Dockerfile for zitare-backend (multi-stage build, port 3007)
- Add docker-entrypoint.sh for database setup
- Add zitare-backend service to docker-compose.macmini.yml
- Update matrix-zitare-bot to depend on zitare-backend
- Add zitare-backend to CI workflow (change detection + build job)
Storage-backend build was failing on ARM64 due to QEMU emulation
"Illegal instruction" crash when building native dependencies.
Same approach used for matrix-mana-bot and matrix-tts-bot.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
All matrix bots use matrix-bot-sdk which has native dependencies
(cpu-features, ssh2) that cause QEMU emulation failures during CI
arm64 builds. Build amd64 only - can run on arm64 via Rosetta.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>