Two pieces of the same cleanup:
1. build-app.sh now passes `--env-file .env.macmini` explicitly via a
shared COMPOSE_ARGS array. Without it, docker compose silently fell
back to `.env` in the project root — a separate file that happened
to hold MANA_AUTH_KEK and other secrets that `.env.macmini` lacked.
deploy.sh, restart.sh, and the CD workflow already used the flag;
this aligns build-app.sh with the rest. Server-side .env.macmini
was reconciled 2026-04-23 with the union of both files, so the
duplicate `.env` is no longer needed.
2. .env.macmini.example now documents 7 keys the prod stack actually
depends on but that had never been listed: GOOGLE_GEMINI_API_KEY /
GOOGLE_GENAI_API_KEY (SDK aliases for Deep-Research + mana-ai),
MANA_AI_PRIVATE_KEY_PEM / MANA_AI_PUBLIC_KEY_PEM (Mission-Grant
keypair), MANA_AI_DEEP_RESEARCH_ENABLED + PUBLIC_AI_MISSION_GRANTS
(feature flags), MANA_CORE_SERVICE_KEY (legacy alias), and the STT/
TTS internal shared secrets.
Matrix-bot tokens deliberately left undocumented — no Matrix homeserver
in the current running stack.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hit "container name already in use" / "removal in progress" errors
three times during today's Phase 5 deploys. The previous restart
pattern was just `compose up -d --no-deps`, which fails when:
1. A previous interrupted recreate left a stale container under
the canonical name. The new `up` tries to claim the name and
gets a conflict.
2. Compose's recovery from #1 sometimes creates a hash-prefixed
orphan container (`<hash>_<container_name>`), which then
blocks the next clean run too.
3. Even `--force-recreate` can't always handle the case because
the old container is in the middle of being removed when the
new one is being created (race).
Two-step replacement that's reliable across all three failure modes:
Step 1 — `docker compose rm -fs SERVICES`
Stops + force-removes the canonical compose-managed container.
Idempotent: does nothing if already gone. Filters out the
"No stopped containers" log noise so the output stays clean.
Step 2 — orphan sweep via `docker rm -f`
For each service, look up its container_name from the
compose config (falls back to the service name if not set),
then `docker ps -aq --filter name=^${cname}$` for the canonical
one and `name=_${cname}$` for hash-prefixed orphans. Anything
found gets nuked. This catches the case where compose's own
state has lost track of an orphan it created earlier.
Step 3 — `docker compose up -d --no-deps --remove-orphans`
Creates the fresh container. The `--remove-orphans` flag also
silences the "Found orphan containers ([mana-game-whopixels])"
warning we kept seeing — that's a leftover from a removed
service that nobody had cleaned up.
The container_name extraction uses awk on `compose config` output
(verified locally: `mana-web` → `mana-app-web`) so the script doesn't
need a hard-coded service→container mapping.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NOTE: the previous commit 048184bef carried this commit message but
accidentally bundled an unrelated PickerOverlay refactor instead of
this script change (lint-staged stash interaction). This is the
actual fix.
Per-app web Dockerfiles do `FROM sveltekit-base:local` and do NOT
re-COPY packages/shared-* — those packages are baked into the base
image. So a change to packages/shared-utils, packages/shared-ui, etc.
only reaches the live web app if the base image is also rebuilt.
This bit us THREE times on 2026-04-08 alone:
1. CSP fix in shared-utils ('wasm-unsafe-eval') sat unused in
production for over an hour because every `build-app.sh mana-web`
reused the cached base layer with old shared-utils.
2. The BaseListView export in shared-ui after the ListView
consolidation refactor — mana-web's build failed because Rollup
couldn't resolve the new symbol from the stale base.
3. Same shape, different package, repeatedly during the Gemma 4
migration push.
The pattern is identical every time and the manual workaround
(`build-app.sh --base` first) is something you only think to run if
you already know how the layering works. Make the script catch it.
New `is_base_image_stale` helper compares the base image's `Created`
timestamp against the latest git commit touching paths the base image
actually depends on (packages/, docker/Dockerfile.sveltekit-base,
pnpm-lock.yaml). When building any *-web service, if the image is
stale or missing, the base is rebuilt automatically before the
per-app build kicks off, with the triggering commit's oneline
printed for transparency.
Date parsing handles macOS Docker's local-TZ-offset RFC3339 format
(`...+02:00`, not Z). We strip from char 19 onward and parse the
literal local clock time with BSD date (no -u). GNU date is the
fallback for Linux dev boxes. If parsing fails for any reason we
conservatively force a rebuild rather than risk shipping stale code.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit bundles two unrelated changes that were swept together by an
accidental `git add -A` in another working session. Documented here so the
history reflects what's actually inside.
═══════════════════════════════════════════════════════════════════════
1. fix(mana-auth): /api/v1/auth/login mints JWT via auth.handler instead
of api.signInEmail
═══════════════════════════════════════════════════════════════════════
Previous attempt (commit 55cc75e7d) tried to fix the broken JWT mint in
/api/v1/auth/login by switching the cookie name from `mana.session_token`
to `__Secure-mana.session_token` for production. That was necessary but
not sufficient: Better Auth's session cookie value isn't just the raw
session token, it's `<token>.<HMAC>` where the HMAC is derived from the
better-auth secret. Reconstructing the cookie from auth.api.signInEmail's
JSON response only gave us the raw token, so /api/auth/token's
get-session middleware still couldn't validate it and the JWT mint kept
silently failing.
Real fix: do the sign-in via auth.handler (the HTTP path) rather than
auth.api.signInEmail (the SDK path). The handler returns a real fetch
Response with a Set-Cookie header containing the fully signed cookie
envelope. We capture that header verbatim and forward it as the cookie
on the /api/auth/token request, which now passes validation and mints
the JWT correctly.
Verified end-to-end on auth.mana.how:
$ curl -X POST https://auth.mana.how/api/v1/auth/login \
-d '{"email":"...","password":"..."}'
{
"user": {...},
"token": "<session token>",
"accessToken": "eyJhbGciOiJFZERTQSI...", ← real JWT now
"refreshToken": "<session token>"
}
Side benefits:
- Email-not-verified path is now handled by checking
signInResponse.status === 403 directly, no more catching APIError
with the comment-noted async-stream footgun.
- X-Forwarded-For is forwarded explicitly so Better Auth's rate limiter
and our security log see the real client IP.
- The leftover catch block now only handles unexpected exceptions
(network errors etc); the FORBIDDEN-checking logic in it is dead but
harmless and left in for defense in depth.
═══════════════════════════════════════════════════════════════════════
2. chore: remove the entire self-hosted Matrix stack (Synapse, Element,
Manalink, mana-matrix-bot)
═══════════════════════════════════════════════════════════════════════
The Matrix subsystem ran parallel to the main Mana product without any
load-bearing integration: the unified web app never imported matrix-js-sdk,
the chat module uses mana-sync (local-first), and mana-matrix-bot's
plugins duplicated features the unified app already ships natively.
Keeping it alive cost a Synapse + Element + matrix-web + bot container
quartet, three Cloudflare routes, an OIDC provider plugin in mana-auth,
and a steady drip of devlog/dependency churn.
Removed:
- apps/matrix (Manalink web + mobile, ~150 files)
- services/mana-matrix-bot (Go bot with ~20 plugins)
- docker/matrix configs (Synapse + Element)
- synapse/element-web/matrix-web/mana-matrix-bot services in
docker-compose.macmini.yml
- matrix.mana.how/element.mana.how/link.mana.how Cloudflare tunnel routes
- OIDC provider plugin + matrix-synapse trustedClient + matrixUserLinks
table from mana-auth (oauth_* schema definitions also removed)
- MatrixService import path in mana-media (importFromMatrix endpoint)
- Matrix notification channel in mana-notify (worker, metrics, config,
channel_type enum, MatrixOptions handler)
- Matrix entries from shared-branding (mana-apps + app-icons),
notify-client, the i18n bundle, the observatory map, the credits
app-label list, the landing footer/apps page, the prometheus + alerts
+ promtail tier mappings, and the matrix-related deploy paths in
cd-macmini.yml + ci.yml
Devlog/manascore/blueprint entries that mention Matrix are left intact
as historical record. The oauth_* + matrix_user_links Postgres tables
stay on existing prod databases — code can no longer write to them, drop
them in a follow-up migration if you want them gone for real.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Loki was already running but had no log shipper. Adds Promtail to collect
Docker logs from all 66 containers with automatic tier labeling (infra,
auth, core, app, matrix, games) and a Grafana Logs Explorer dashboard.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
build-app.sh now checks available RAM before builds and only stops
monitoring containers when free memory is below 3 GB threshold.
New memory-baseline.sh script measures per-container and per-category
RAM usage for capacity planning.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace 21 separate NestJS Matrix bot processes (~2.1 GB RAM, ~4.2 GB Docker images)
with a single Go binary using plugin architecture (8.6 MB binary, ~30 MB RAM).
New services:
- services/mana-matrix-bot/ — Go Matrix bot with 21 plugins (mautrix-go, Redis sessions)
- services/mana-api-gateway-go/ — Go API gateway (rate limiting, API keys, credit billing)
Deleted:
- 21 services/matrix-*-bot/ directories
- packages/bot-services/ and packages/matrix-bot-common/
- Legacy deploy scripts and CI build jobs
Updated:
- docker-compose.macmini.yml: new Go services, legacy bots removed
- CI/CD: change detection + build jobs for Go services
- Root package.json: new dev:matrix, build:matrix, test:matrix scripts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
docker compose stop with service names can hang due to env var warnings.
Using docker stop/start with container names is more reliable.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add docker/Dockerfile.sveltekit-base: pre-built base with all 34 shared
packages (mirrors nestjs-base pattern), eliminates redundant COPY/build
steps from individual web Dockerfiles
- Add scripts/mac-mini/build-app.sh: stops monitoring stack before build
to free RAM, auto-restarts on exit (trap cleanup)
- Migrate todo web Dockerfile to use sveltekit-base:local (47 COPY lines
→ 2, 4 build steps → 0)
- Update CD workflow to build sveltekit-base when deploying web apps
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>