mirror of
https://github.com/Memo-2023/mana-monorepo.git
synced 2026-05-14 20:21:09 +02:00
docs: PRE_LAUNCH_CLEANUP.md — what we removed before launch and why
Companion document to the pre-launch cleanup commits. Describes every
piece of legacy/dead/deprecated scaffolding that was removed while the
system still has no live users — the cheapest moment to do it.
Each entry follows a fixed shape:
- What was there
- Why it had to happen pre-launch (the user-facing risk if done later)
- What concretely changed
- LOC / size impact
Thirteen entries land with this commit:
1. Schema v1–v10 collapsed into a single db.version(1)
2. setApplyingServerChanges() deprecated shim removed
3. LocalLabel @deprecated alias renamed to TaskTag
4. labelsStore backward-compat alias removed
5. $lib/stores/tags.svelte.ts re-export shim removed
6. EMOJI_TO_ICON_MAP legacy data-migration fallback removed
7. useAllEvents() unused calendar query removed
8. Cross-app search providers lazy-loaded
9. Bundle analysis findings (web-llm route-isolated, no further work)
10. Production restoration — 2026-04-07 outage postmortem
11. Eighteen broken subdomains triaged — 16 fixed, 2 follow-ups
12. Memoro server detached from mana.how stack
13. Ghost backend API hostnames removed (12 hostnames + clients)
Plus a "How to add an entry" template for future cleanups.
The two open follow-ups are documented with concrete manual-fix
instructions:
- stt-api / tts-api 502 — needs Cloudflare Zero Trust dashboard
cleanup of stale Public Hostname mappings on an old tunnel.
- gpu-video.mana.how — LTX video generation, planned but not yet
deployed on the Windows GPU box.
Once the system has launched this document becomes historical and
should not be edited further — new pre-launch cleanups won't be a
thing anymore by definition.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
parent
85e38176d8
commit
4cfa869f33
1 changed file with 630 additions and 0 deletions
630
docs/PRE_LAUNCH_CLEANUP.md
Normal file
|
|
@ -0,0 +1,630 @@
|
|||
# Pre-Launch Cleanup
|
||||
|
||||
This document tracks one-time cleanup operations that are only safe to do
|
||||
**before the system goes live**. After launch, these operations would either
|
||||
break existing user data or require non-trivial migrations to ship safely.
|
||||
|
||||
The system is currently pre-launch — no end users, no production data we
|
||||
need to preserve. That makes this the cheapest moment to delete legacy
|
||||
scaffolding, collapse versioned schemas, and remove backwards-compatibility
|
||||
shims that exist purely to bridge between old and new code paths.
|
||||
|
||||
Each entry below should be checked off as it lands and the corresponding
|
||||
commit linked. Once everything here is done and the system has launched,
|
||||
this document becomes historical and should not be edited further.
|
||||
|
||||
---
|
||||
|
||||
## Mana unified web app — `apps/mana/apps/web`
|
||||
|
||||
### ✅ Collapse Dexie schema versions 1–10 into a single `db.version(1)`
|
||||
|
||||
**File:** `src/lib/data/database.ts`
|
||||
|
||||
**Status:** Done.
|
||||
|
||||
**What was there:** Ten sequential `db.version(N).stores()` blocks accumulated
|
||||
as new modules and indexes shipped. Three of them carried *data migration*
|
||||
upgrade functions (v2 emoji→icon, v3 events/timeEntries/habits/tasks →
|
||||
timeBlocks projection) — large amounts of one-shot code that only ever
|
||||
runs against test data on developer machines.
|
||||
|
||||
**Why it had to happen pre-launch:** Once a real user opens the app and
|
||||
their browser persists Dexie at version 10, that user's IndexedDB will
|
||||
*always* expect to see versions 1–10 declared (even if it never re-runs
|
||||
the upgrade functions). Removing or rewriting old version blocks after
|
||||
that point can corrupt or wipe their local data on the next page load.
|
||||
|
||||
**What changed:**
|
||||
- All ~90 table definitions consolidated into one `db.version(1).stores({...})`
|
||||
block. Each table's index string is the *final* state — i.e. the result
|
||||
of applying every legacy version sequentially.
|
||||
- Removed `EMOJI_TO_ICON` map and the `db.version(2).upgrade()` block.
|
||||
- Removed the `db.version(3).upgrade()` block (the timeBlocks back-fill
|
||||
for events / timeEntries / habits / tasks). The runtime fields
|
||||
(`scheduledBlockId`, `timeBlockId`) and the corresponding indexes still
|
||||
exist; only the one-shot data conversion is gone.
|
||||
- Removed `db.version(4)`–`db.version(10)` blocks; their net effect is
|
||||
baked into the new `db.version(1)`.
|
||||
- Verified by `module-registry.test.ts` that the post-collapse Dexie
|
||||
table set is unchanged from the pre-collapse state.
|
||||
|
||||
**LOC saved:** ~250 lines from `database.ts`.
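
The "final state" of each index string can be derived mechanically by folding the legacy version blocks in order. The sketch below assumes Dexie's usual semantics (a later version only redeclares the tables it changes, and `null` drops a table); the `StoresSpec` type, the `collapseVersions` helper and the sample index strings are hypothetical and not taken from `database.ts`.

```typescript
// Hypothetical shape: one entry per legacy db.version(N).stores({...}) call.
// A string is the table's index spec; null means the table was dropped.
type StoresSpec = Record<string, string | null>;

// Fold the versions in order so that the last declaration of each table wins.
function collapseVersions(versions: StoresSpec[]): Record<string, string> {
  const finalSchema: Record<string, string> = {};
  for (const spec of versions) {
    for (const table of Object.keys(spec)) {
      const indexes = spec[table];
      if (indexes === null) {
        delete finalSchema[table];
      } else {
        finalSchema[table] = indexes;
      }
    }
  }
  return finalSchema;
}

// Illustrative data only; these are not the project's real index strings.
const legacyVersions: StoresSpec[] = [
  { tasks: 'id, title', habits: 'id, emoji' },
  { habits: 'id, icon' },
  { tasks: 'id, title, scheduledBlockId', timeBlocks: 'id, start' },
];

const collapsedSchema = collapseVersions(legacyVersions);
console.log(collapsedSchema);
```

The resulting object is what would be passed to a single `db.version(1).stores({...})` call. Comparing its keys against the pre-collapse table set is the same kind of check the entry attributes to `module-registry.test.ts`.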
|
||||
|
||||
### ✅ Remove `LocalLabel` deprecated type alias
|
||||
|
||||
**Files:** `src/lib/modules/todo/types.ts` (definition), 11 importing files
|
||||
across `src/lib/modules/todo/`, `src/routes/(app)/todo/+page.svelte`.
|
||||
|
||||
**Status:** Done.
|
||||
|
||||
**What was there:** `export type LocalLabel = Tag;` annotated `@deprecated`,
|
||||
kept "for backward compatibility". Eleven files in the todo module still
|
||||
imported `LocalLabel` purely as a type, even though the underlying value
|
||||
came from `@mana/shared-tags`.
|
||||
|
||||
**Why pre-launch:** A `@deprecated` symbol with eleven live consumers is
|
||||
the worst kind — it looks like it's on its way out, but it isn't. The
|
||||
longer it lives the more new files import it out of habit, until removing
|
||||
it becomes a multi-day cross-module rename. Now is the cheap moment.
|
||||
|
||||
**What changed:**
|
||||
- All eleven imports rewritten to import `Tag` from `@mana/shared-tags`
|
||||
directly (or via the existing barrel that re-exports it).
|
||||
- All in-file `LocalLabel` references renamed to `Tag`.
|
||||
- Type alias and `@deprecated` comment removed from `todo/types.ts`.
|
||||
- Removed from `todo/index.ts` barrel export.
|
||||
|
||||
### ✅ Remove `labelsStore` backward-compat alias
|
||||
|
||||
**File:** `src/lib/modules/todo/stores/labels.svelte.ts`
|
||||
|
||||
**Status:** Done.
|
||||
|
||||
**What was there:** A `labelsStore` object exposing `createLabel` /
|
||||
`updateLabel` / `deleteLabel` methods that internally just delegated to
|
||||
`tagMutations` from `@mana/shared-stores`. Carried a `// Backward-compat
|
||||
alias` comment. Zero consumers across the codebase.
|
||||
|
||||
**Why pre-launch:** Pure dead code that exists only to make a removed API
|
||||
look alive in module exports. Confusing for anyone reading the todo store
|
||||
code ("are these two different APIs?"). After launch, dead exports tend
|
||||
to grow accidental consumers via autocomplete.
|
||||
|
||||
**What changed:**
|
||||
- The `labelsStore` const block deleted from `labels.svelte.ts`.
|
||||
- Removed from `todo/index.ts` barrel export.
|
||||
|
||||
### ✅ Collapse `$lib/stores/tags.svelte.ts` re-export shim
|
||||
|
||||
**Files:** `src/lib/stores/tags.svelte.ts` (deleted), 13 importing files
|
||||
across modules and routes.
|
||||
|
||||
**Status:** Done.
|
||||
|
||||
**What was there:** A 20-line file that did nothing but re-export ten
|
||||
symbols from `@mana/shared-stores`. The file's own header explicitly
|
||||
called out that it existed "for backward compatibility with existing
|
||||
imports".
|
||||
|
||||
**Why pre-launch:** A pure re-export shim is the cheapest possible piece
|
||||
of code to delete *now* — every import to it is a mechanical one-line
|
||||
rewrite. After launch, with new modules and new contributors, that small
|
||||
fixup compounds into a permanent indirection layer that nobody touches.
|
||||
|
||||
**What changed:**
|
||||
- All 13 `from '$lib/stores/tags.svelte'` imports rewritten to
|
||||
`from '@mana/shared-stores'`.
|
||||
- File deleted.
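
The "mechanical one-line rewrite" can be pictured as a small codemod over each importing file. This is a minimal sketch, not the tool that was actually used. `rewriteImportSource` is a hypothetical helper, and the sample import line is illustrative.

```typescript
// Rewrites the module specifier of import/export statements that point at `from`.
function rewriteImportSource(code: string, from: string, to: string): string {
  // Escape regex metacharacters such as "$" and "." in the old specifier.
  const escaped = from.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const pattern = new RegExp(`(from\\s+)(['"])${escaped}\\2`, 'g');
  return code.replace(
    pattern,
    (_match: string, prefix: string, quote: string) => `${prefix}${quote}${to}${quote}`,
  );
}

const beforeRewrite = "import { tagMutations } from '$lib/stores/tags.svelte';";
const afterRewrite = rewriteImportSource(
  beforeRewrite,
  '$lib/stores/tags.svelte',
  '@mana/shared-stores',
);
console.log(afterRewrite);
```

Only the specifier changes; the imported symbol names stay as they were, which is why removing a pure re-export shim is cheap while it has few consumers.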
|
||||
|
||||
### ✅ Remove `EMOJI_TO_ICON_MAP` legacy data-migration fallback
|
||||
|
||||
**Files:** `src/lib/modules/habits/types.ts`, `src/lib/modules/habits/queries.ts`,
|
||||
`src/lib/modules/habits/index.ts`.
|
||||
|
||||
**Status:** Done.
|
||||
|
||||
**What was there:** A constant mapping eighteen emoji code points to
|
||||
icon names, plus a fallback expression in `toHabit()` that used it
|
||||
when a record had `emoji` set but not `icon`. This existed because the
|
||||
v2 schema migration (now collapsed away) had renamed the field; the
|
||||
fallback was the in-memory equivalent of that one-shot data migration.
|
||||
|
||||
**Why pre-launch:** Once the v2 upgrade block was removed in the schema
|
||||
collapse above, no record with the old `emoji` field can exist anymore
|
||||
(there are no legacy users). The fallback can never fire. Keeping it
|
||||
around just costs LOC and confuses anyone reading the converter.
|
||||
|
||||
**What changed:**
|
||||
- `EMOJI_TO_ICON_MAP` constant removed from `habits/types.ts`.
|
||||
- `EMOJI_TO_ICON_MAP` import + the `??` fallback chain removed from
|
||||
`habits/queries.ts` — `toHabit()` now reads `local.icon ?? 'star'`.
|
||||
- Removed from `habits/index.ts` barrel export.
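
For orientation, the converter after the change might look roughly like the sketch below. Only the `local.icon ?? 'star'` expression comes from this entry; the `LocalHabit` and `Habit` shapes are assumptions made for the example, and the real types live in `habits/types.ts`.

```typescript
// Hypothetical record shapes, reduced to the fields the example needs.
interface LocalHabit {
  id: string;
  name: string;
  icon?: string;
}

interface Habit {
  id: string;
  name: string;
  icon: string;
}

// With the emoji fallback gone, a missing icon simply defaults to 'star'.
function toHabit(local: LocalHabit): Habit {
  return { id: local.id, name: local.name, icon: local.icon ?? 'star' };
}

const withIcon = toHabit({ id: 'h1', name: 'Run', icon: 'flame' });
const withoutIcon = toHabit({ id: 'h2', name: 'Read' });
console.log(withIcon.icon, withoutIcon.icon);
```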
|
||||
|
||||
### ✅ Remove unused `useAllEvents()` calendar query
|
||||
|
||||
**Files:** `src/lib/modules/calendar/queries.ts`,
|
||||
`src/lib/modules/calendar/index.ts`.
|
||||
|
||||
**Status:** Done.
|
||||
|
||||
**What was there:** A `useAllEvents()` query in the calendar module that
|
||||
its own JSDoc described as "for backward compatibility with
|
||||
calendar-specific views". Zero external consumers — only the barrel
|
||||
export referenced it. The events module has its own unrelated
|
||||
`useAllEvents()` for social events.
|
||||
|
||||
**Why pre-launch:** Same reason as `labelsStore` above — pure dead code
|
||||
with a misleading comment. Eliminating it removes one of two same-named
|
||||
exports across modules, which is a real readability win.
|
||||
|
||||
**What changed:**
|
||||
- `useAllEvents()` definition deleted from `calendar/queries.ts`.
|
||||
- Removed from `calendar/index.ts` barrel export.
|
||||
|
||||
### ✅ Lazy-load cross-app search providers
|
||||
|
||||
**Files:** `src/lib/search/registry.ts`, `src/lib/search/providers/index.ts`.
|
||||
|
||||
**Status:** Done.
|
||||
|
||||
**What was there:** `registerAllProviders()` synchronously imported eleven
|
||||
search provider modules (one per app: todo, calendar, contacts, chat,
|
||||
storage, cards, picture, presi, music, zitare, clock) at the top of the
|
||||
root `(app)` layout. The layout runs on every navigation into the
|
||||
authenticated app, so all eleven providers were part of the initial JS
|
||||
bundle even though spotlight search is opened on demand.
|
||||
|
||||
**Why pre-launch:** This is the obvious "feature opened later" pattern
|
||||
that should never live in the initial bundle. Doing it now is one
|
||||
mechanical edit; doing it later means convincing every contributor who
|
||||
has copy-pasted from the existing eager pattern that lazy is fine.
|
||||
|
||||
**What changed:**
|
||||
- `SearchRegistry` got a `registerLazy(appId, loader)` method. Lazy
|
||||
loaders are kept in a `Map<appId, loader>` and resolved by `search()`
|
||||
on first call (in parallel for all targeted appIds).
|
||||
- `registerAllProviders()` now uses `registerLazy()` with dynamic
|
||||
`import('./<provider>')` calls — Vite splits each provider into its
|
||||
own chunk that the registry awaits the first time the user opens
|
||||
search.
|
||||
- Side benefit: a search filtered to a single appId only loads that one
|
||||
provider chunk.
|
||||
- Removed unused `getProviders()` method on the registry and the unused
|
||||
re-exports of every provider from `providers/index.ts`.
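
A minimal sketch of the lazy registry described above, under stated assumptions: the `SearchResult` and `SearchProvider` interfaces and the stand-in loaders are hypothetical, and the real `SearchRegistry` may differ in detail. In the app each loader would be a dynamic `import()` of a provider module rather than an inline stub.

```typescript
interface SearchResult {
  appId: string;
  title: string;
}

interface SearchProvider {
  search(query: string): Promise<SearchResult[]>;
}

type ProviderLoader = () => Promise<SearchProvider>;

class SearchRegistry {
  private loaders = new Map<string, ProviderLoader>();
  // Cache the in-flight promise so concurrent searches share a single load.
  private loaded = new Map<string, Promise<SearchProvider>>();

  registerLazy(appId: string, loader: ProviderLoader): void {
    this.loaders.set(appId, loader);
  }

  private resolve(appId: string): Promise<SearchProvider> | undefined {
    let pending = this.loaded.get(appId);
    if (!pending) {
      const loader = this.loaders.get(appId);
      if (!loader) return undefined;
      pending = loader();
      this.loaded.set(appId, pending);
    }
    return pending;
  }

  // Resolves only the targeted providers, in parallel, then queries them.
  async search(query: string, appIds?: string[]): Promise<SearchResult[]> {
    const targets = appIds ?? Array.from(this.loaders.keys());
    const pending = targets
      .map((id) => this.resolve(id))
      .filter((p): p is Promise<SearchProvider> => p !== undefined);
    const providers = await Promise.all(pending);
    const results = await Promise.all(providers.map((p) => p.search(query)));
    return ([] as SearchResult[]).concat(...results);
  }
}

// Stand-in loaders that count how often the "chunk" is fetched.
let todoLoads = 0;
const registry = new SearchRegistry();
registry.registerLazy('todo', async () => {
  todoLoads += 1;
  return { search: async (q: string) => [{ appId: 'todo', title: `task matching ${q}` }] };
});
registry.registerLazy('calendar', async () => ({
  search: async (q: string) => [{ appId: 'calendar', title: `event matching ${q}` }],
}));

registry.search('milk', ['todo']).then((results) => console.log(results));
```

Caching the promise instead of the resolved provider is a design choice of this sketch: two searches fired in quick succession then trigger one load rather than two.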
|
||||
|
||||
### Bundle analysis findings (no further action needed)
|
||||
|
||||
Verified after the search-provider lazy-load that the largest client
|
||||
chunks the build produces are already correctly route-isolated by
|
||||
SvelteKit's per-route splitting, so no additional manual lazy-loading
|
||||
work is needed before launch:
|
||||
|
||||
- **6.0 MB chunk (`@mlc-ai/web-llm`)** — only referenced by node 84,
|
||||
which is the `/llm-test` route. The 6 MB only loads if a user visits
|
||||
that route. SvelteKit's per-route splitting handles it correctly.
|
||||
- **816 KB chunk** (chart/monaco/stripe-style heavy libs) — also not
|
||||
referenced by any layout/entry node, so it only loads on the route
|
||||
that uses it.
|
||||
- The entry app references ~257 chunks across the whole route graph
|
||||
(~1.97 MB transitive ceiling unzipped), but those chunks are not all
|
||||
loaded at startup — they are the *universe* the router can lazy-load
|
||||
into as the user navigates.
|
||||
|
||||
The conclusion is that SvelteKit's defaults are doing the structural
|
||||
heavy lifting; only the search registry needed an explicit lazy
|
||||
conversion because it was being eagerly initialized inside the layout
|
||||
script for an on-demand feature.
|
||||
|
||||
### ✅ Remove `setApplyingServerChanges()` deprecated shim
|
||||
|
||||
**File:** `src/lib/data/database.ts`
|
||||
|
||||
**Status:** Done.
|
||||
|
||||
**What was there:** `setApplyingServerChanges(v: boolean)` was the
|
||||
single-flag predecessor of `beginApplyingTables()`. It marked *every*
|
||||
sync-tracked table as "currently applying server changes", which caused a
|
||||
cross-app race: while one app was applying its server pull, writes from a
|
||||
totally different app would silently get dropped from change tracking.
|
||||
The new `beginApplyingTables()` API scopes that flag per touched table.
|
||||
|
||||
The legacy function was kept around solely to avoid breaking any external
|
||||
caller during the migration. Pre-launch is the right moment to delete it:
|
||||
no external callers exist, and no future external callers can show up
|
||||
(the symbol is module-internal, not part of any package export).
|
||||
|
||||
**What changed:**
|
||||
- Function definition removed from `database.ts`.
|
||||
- The accompanying `@deprecated` block comment removed.
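
The difference between the global flag and the per-table scope can be illustrated as follows. This is a sketch under assumptions, not the code in `database.ts`: the signature of `beginApplyingTables()` shown here (table names in, "end" callback out) and the `shouldTrackChange()` helper are hypothetical.

```typescript
// Tables currently receiving server changes. A counter per table lets
// overlapping pulls on the same table nest without clearing each other.
const applyingTables = new Map<string, number>();

// Marks only the touched tables and returns a function that unmarks them.
function beginApplyingTables(tables: string[]): () => void {
  for (const table of tables) {
    applyingTables.set(table, (applyingTables.get(table) ?? 0) + 1);
  }
  let ended = false;
  return () => {
    if (ended) return; // calling the end function twice is a no-op
    ended = true;
    for (const table of tables) {
      const remaining = (applyingTables.get(table) ?? 1) - 1;
      if (remaining <= 0) {
        applyingTables.delete(table);
      } else {
        applyingTables.set(table, remaining);
      }
    }
  };
}

// Change tracking skips a write only if *that* table is being applied.
function shouldTrackChange(table: string): boolean {
  return !applyingTables.has(table);
}

const endTodoPull = beginApplyingTables(['tasks']);
console.log(shouldTrackChange('tasks'), shouldTrackChange('contacts'));
endTodoPull();
console.log(shouldTrackChange('tasks'));
```

With the old single flag, the `contacts` write in this example would have been dropped from change tracking for as long as the todo pull was running.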
|
||||
|
||||
---
|
||||
|
||||
## Production infrastructure (mana.how)
|
||||
|
||||
### ✅ Ghost backend API hostnames — removed
|
||||
|
||||
**Status:** Done. The cleanup landed on 2026-04-07.
|
||||
|
||||
**What was there:** Twelve `*-api.mana.how` Cloudflare Tunnel routes
|
||||
(`todo-api`, `calendar-api`, `contacts-api`, `chat-api`, `storage-api`,
|
||||
`cards-api`, `music-api`, `nutriphi-api`, `picture-api`, `presi-api`,
|
||||
`zitare-api`, `clock-api`) plus their matching `lib/api/services/*.ts`
|
||||
clients in the unified web app, the matching `__PUBLIC_*_API_URL__`
|
||||
runtime injections in `hooks.server.ts`, and the
|
||||
`PUBLIC_*_API_URL_CLIENT` env entries on the `mana-app-web` compose
|
||||
service. None of the underlying containers had existed since the unified
|
||||
local-first migration; only `qrExportService` and a couple of
|
||||
admin / `my-data` pages still imported them, producing permanent HTTP
|
||||
502s through the tunnel.
|
||||
|
||||
**Why pre-launch:** Public hostnames on a tunnel are an implicit
|
||||
contract. Every day longer they live, the higher the chance someone
|
||||
externally bookmarks one or our own future work copies the dead pattern.
|
||||
The fix is mechanical now and breaks nothing because no live code path
|
||||
needed them.
|
||||
|
||||
**What changed:**
|
||||
- `apps/mana/apps/web/src/lib/api/services/qr-export.ts` rewritten to
|
||||
read contacts / events / tasks directly from the local Dexie database
|
||||
(`db.table('contacts')`, `timeBlocks` joined with `events`, `tasks`)
|
||||
instead of going through the per-app HTTP services.
|
||||
- Twelve service files deleted: `todo.ts`, `calendar.ts`, `contacts.ts`,
|
||||
`chat.ts`, `storage.ts`, `cards.ts`, `music.ts`, `picture.ts`,
|
||||
`presi.ts`, `zitare.ts`, `clock.ts`, `context.ts` plus their `*.test.ts`
|
||||
siblings.
|
||||
- `apps/mana/apps/web/src/lib/api/services/index.ts` collapsed from a
|
||||
thirteen-symbol re-export to just the four genuinely server-bound
|
||||
services (`adminService`, `landing`, `myDataService`, `qrExportService`).
|
||||
- `apps/mana/apps/web/src/hooks.server.ts` no longer reads or injects
|
||||
any of the twelve `__PUBLIC_*_API_URL__` runtime variables, and the
|
||||
CSP `connect-src` list shrank by the same amount.
|
||||
- `apps/mana/apps/web/src/routes/status/+page.server.ts` no longer probes
|
||||
the dead per-app health endpoints — only `auth`, `sync`, `uload-server`,
|
||||
`media` and `llm` remain in the public status page.
|
||||
- `docker-compose.macmini.yml` had its 14 ghost
|
||||
`PUBLIC_*_API_URL{,_CLIENT}` env entries on the `mana-app-web` service
|
||||
removed.
|
||||
- `~/.cloudflared/config.yml` on the Mac Mini lost its 16 dead ingress
|
||||
routes (`chat-api`, `todo-api`, `calendar-api`, `clock-api`, `clock-bot`,
|
||||
`contacts-api`, `zitare-api`, `skilltree-api`, `planta-api`, `cards-api`,
|
||||
`storage-api`, `presi-api`, `nutriphi-api`, `photos-api`, `mukke-api`,
|
||||
`picture-api`). The tunnel was reloaded via `kill -HUP <pid>`.
|
||||
- After reload, every former 502 returns 404 from the Cloudflare edge
|
||||
(no ingress route → no origin → 404), confirming the cleanup is live.
|
||||
|
||||
**Follow-up needed for full resolution:**
|
||||
- The DNS CNAME records for all twelve hostnames still resolve at the
|
||||
Cloudflare zone level. They are harmless (the tunnel ignores them),
|
||||
but for full hygiene they should be deleted from the Cloudflare
|
||||
dashboard before launch announcements go out.
|
||||
- The next regular deployment of `mana-app-web` will pick up the
|
||||
removed env vars and the smaller `hooks.server.ts` injection — no
|
||||
forced rebuild was performed during the cleanup.
|
||||
|
||||
### ✅ Eighteen broken subdomains triaged — 15 fixed, 3 known follow-ups
|
||||
|
||||
**Status:** Done (15 fixed). Three follow-ups tracked below.
|
||||
|
||||
**What was there:** The first run of the rebuilt `health-check.sh` (which
|
||||
walks the cloudflared ingress instead of probing hardcoded ports) surfaced
|
||||
eighteen Cloudflare hostnames that were broken in production but had been
|
||||
silently ignored by the old health check. Each was either missing a DNS
|
||||
record at the Cloudflare zone, pointing at a container that wasn't
|
||||
running, or pointing at a port nothing was bound to.
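
"Walking the ingress" means that the list of hostnames to probe comes from the tunnel config itself rather than from a hardcoded list. The real `health-check.sh` is a shell script; the TypeScript below is only a sketch of the idea. The `ingressHostnames` helper, the sample config and its ports are assumptions, and the line-based parsing is deliberately simpler than a full YAML parser.

```typescript
// Extracts every `hostname:` value from a cloudflared config.yml text.
function ingressHostnames(configYaml: string): string[] {
  const hostnames: string[] = [];
  for (const line of configYaml.split('\n')) {
    // Matches "- hostname: foo" and "hostname: foo", quoted or unquoted.
    const match = /^\s*-?\s*hostname:\s*["']?([^\s"'#]+)/.exec(line);
    if (match) hostnames.push(match[1]);
  }
  return hostnames;
}

// Illustrative config; hostnames appear in this document, ports are made up.
const sampleConfig = [
  'tunnel: example-tunnel-id',
  'ingress:',
  '  - hostname: sync.mana.how',
  '    service: http://localhost:1111',
  '  - hostname: "media.mana.how"',
  '    service: http://localhost:2222',
  '  - service: http_status:404',
].join('\n');

const hostnamesToProbe = ingressHostnames(sampleConfig);
console.log(hostnamesToProbe);
```

Each extracted hostname can then be resolved and requested over HTTPS, so a route with no DNS record or no listening origin shows up as a failure instead of being silently skipped.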
|
||||
|
||||
**What changed (six DNS records added):**
|
||||
- `cloudflared tunnel route dns ... {context,credits,memoro,moodlit,questions,subscriptions}.mana.how`
|
||||
- All six already had a working backend on the target port (mana-app-web
|
||||
for the four redirected subdomains, mana-credits for credits,
|
||||
mana-subscriptions for subscriptions); only the public CNAME was
|
||||
missing.
|
||||
|
||||
**What changed (four containers started, three Postgres databases
|
||||
created):**
|
||||
- `docker compose -p manacore-monorepo up -d landings umami manavoxel-web synapse`
|
||||
brought four compose-defined-but-not-running services back.
|
||||
- `mana-mon-umami` initially crashed because database `umami` didn't
|
||||
exist on `mana-infra-postgres` — created it with
|
||||
`CREATE DATABASE umami;` and the container went healthy.
|
||||
- `mana-matrix-synapse` initially crashed because role `synapse` didn't
|
||||
exist either, then because database `matrix` didn't exist —
|
||||
created both:
|
||||
`CREATE USER synapse WITH PASSWORD 'synapse-secure-password';`
|
||||
`CREATE DATABASE matrix OWNER synapse ENCODING 'UTF8' LC_COLLATE 'C' LC_CTYPE 'C' TEMPLATE template0;`
|
||||
- This unblocked `matrix.mana.how`, `stats.mana.how`, `it.mana.how` and
|
||||
the `element.mana.how` route (`mana-matrix-element` was already running
|
||||
by the time it was re-tested).
|
||||
|
||||
**What changed (two more DNS records added):**
|
||||
- `cloudflared tunnel route dns ... docs.mana.how it.mana.how` —
|
||||
needed because `mana-infra-landings` was newly started but its
|
||||
hostnames had never had a CNAME.
|
||||
|
||||
**What changed (three ghost ingress entries cleaned up):**
|
||||
- `stt-api.mana.how` and `tts-api.mana.how` were re-routed from
|
||||
`http://localhost:3020` / `http://localhost:3022` (where nothing
|
||||
listens) to `http://192.168.178.11:3020` / `http://192.168.178.11:3022`
|
||||
(the GPU server, where they actually live). Same fix pattern as the
|
||||
existing `gpu-stt.mana.how` / `gpu-tts.mana.how` routes.
|
||||
- `taktik.mana.how` and `link.mana.how` had no compose service, no
|
||||
backing process and no apparent owner — both ingress entries deleted
|
||||
from `~/.cloudflared/config.yml`.
|
||||
|
||||
**Three follow-ups — two resolved, one needs Cloudflare Dashboard:**
|
||||
|
||||
1. **✅ `manavoxel.mana.how` → 200.** The `manavoxel-web:local` image
|
||||
had a broken `package.json` (empty file → `SyntaxError: Unexpected
|
||||
end of JSON input` on container startup). Fixed by
|
||||
`docker compose -p manacore-monorepo build --no-cache manavoxel-web`
|
||||
followed by `up -d --force-recreate manavoxel-web`. The container
|
||||
went healthy in ~8 seconds and `manavoxel.mana.how` now returns 200.
|
||||
|
||||
2. **✅ `docs.mana.how` and `status.mana.how` → 200.** The
|
||||
`mana-infra-landings` nginx container had `server` blocks for both
|
||||
hostnames pointing at `/srv/landings/docs` and `/srv/landings/status`,
|
||||
but those directories did not exist on the bind mount source
|
||||
(`/Volumes/ManaData/landings/`). Created both directories on the
|
||||
host with minimal placeholder `index.html` files (the directories
|
||||
were already in nginx config but empty). Real content for
|
||||
`status.mana.how` will reappear once `mana-status-gen` is rebuilt
|
||||
(separate broken image follow-up); the placeholder explicitly says
|
||||
so and links to grafana.mana.how / glitchtip.mana.how in the
|
||||
meantime. `docs.mana.how` placeholder points users at git.mana.how
|
||||
for the README/CLAUDE.md docs.
|
||||
|
||||
3. **⚠️ `stt-api.mana.how` and `tts-api.mana.how` → 502 — Cloudflare
|
||||
Dashboard fix needed.** The cloudflared ingress correctly points at
|
||||
the GPU server (`http://192.168.178.11:3020` / `:3022`), the GPU
|
||||
server itself answers on those ports from the Mac Mini's LAN
|
||||
(verified with direct `curl`), the cloudflared process was fully
|
||||
restarted via `launchctl kickstart -k gui/$(id -u)/com.cloudflare.cloudflared`
|
||||
(after discovering that the system-level launchd plist
|
||||
`/Library/LaunchDaemons/com.cloudflare.cloudflared.plist` is broken
|
||||
and runs `cloudflared` with no args — the actual working tunnel
|
||||
runs as the `mana` user via `~/Library/LaunchAgents/com.cloudflare.cloudflared.plist`),
|
||||
and `cloudflared tunnel route dns --overwrite-dns ...` reports
|
||||
"already configured to route to your tunnel" for both hostnames.
|
||||
|
||||
Despite all of that, the Cloudflare edge generates a 502 *before*
|
||||
contacting the origin — the response carries no `cf-cache-status`
|
||||
header, no `nel`/`report-to` headers, and a different `cf-ray`
|
||||
pattern from the working `gpu-stt.mana.how` / `gpu-tts.mana.how`
|
||||
sister hostnames (which point at the *same* origin and answer 200).
|
||||
|
||||
The pattern strongly suggests an old **Public Hostname** mapping
|
||||
for `stt-api.mana.how` / `tts-api.mana.how` still exists in the
|
||||
Cloudflare Zero Trust dashboard, pointing at a deleted tunnel.
|
||||
The `cloudflared` CLI only sees the DNS layer, not the Public Hostname
|
||||
layer, so it reports everything as fine while Cloudflare's edge
|
||||
silently routes the traffic to a tunnel that no longer exists.
|
||||
|
||||
**Manual fix:**
|
||||
1. Open `https://one.dash.cloudflare.com/` → Networks → Tunnels.
|
||||
2. Find any tunnel that has `stt-api.mana.how` or `tts-api.mana.how`
|
||||
under its Public Hostnames list (likely an old archived tunnel,
|
||||
not `mana-server` (`bb0ea86d-...`)).
|
||||
3. Delete those Public Hostname entries from the old tunnel.
|
||||
4. Add them to the active `mana-server` tunnel pointing at
|
||||
`http://192.168.178.11:3020` and `http://192.168.178.11:3022`
|
||||
respectively.
|
||||
5. Wait ~60 seconds for Cloudflare's edge to pick up the new mapping.
|
||||
|
||||
Workaround until then: the same backend is reachable via
|
||||
`gpu-stt.mana.how` / `gpu-tts.mana.how`, which work today.
|
||||
Consumer code that hits `stt-api` / `tts-api` should be repointed
|
||||
at the `gpu-*` hostnames anyway — they are clearer about where the
|
||||
service actually runs.
|
||||
|
||||
**Detour: stuck monitoring containers + broken images suppressed
|
||||
in alerts.** The triage uncovered three more containers that had
|
||||
been silently `Exited (127)` (exit code 127 means "command not found", i.e. a broken image)
|
||||
for the entire 6-hour window the Mac Mini had been up:
|
||||
|
||||
- `mana-status-gen` — generates the `status.mana.how` JSON; broken
|
||||
image. The placeholder `index.html` above is the workaround until
|
||||
this is rebuilt.
|
||||
- `mana-mon-blackbox` — Prometheus blackbox exporter; redundant now
|
||||
that `health-check.sh` walks the cloudflared ingress directly.
|
||||
- `mana-infra-minio-init` — actually a normal one-shot init container
|
||||
(Exit 0 = success), not broken; suppressed because the detector
|
||||
was paging on it.
|
||||
|
||||
The first two were added to the `health-check.sh` stuck-container
|
||||
exclusion list with a comment pointing back to this section, so they
|
||||
stop drowning out real signal until they're rebuilt. The exclusion
|
||||
list should be revisited after each Mac Mini update — anything in
|
||||
it is technical debt by definition.
|
||||
|
||||
**Health-check.sh hardening done as part of this triage:**
|
||||
|
||||
- Switched the public-hostname probe from the local resolver to
|
||||
`dig +short HOST @1.1.1.1`. The Mac Mini's home-router DNS keeps a
|
||||
negative-cache entry for ~hours after the first failed lookup, so
|
||||
newly added CNAMEs (like the six fixes above) appeared as "no
|
||||
response" from inside the script even though external users saw them
|
||||
resolve immediately. Asking Cloudflare's DNS directly gives the
|
||||
script the same view the public internet has.
|
||||
- The matrix / element / monitoring port-by-port sections were
|
||||
removed — the public-hostname walk covers all of them by going
|
||||
through the actual production tunnel rather than guessing internal
|
||||
ports.
|
||||
- The "stuck container" detector now ignores `*-init` containers
|
||||
(one-shot init by design) and the two known-broken monitoring
|
||||
images (`mana-status-gen`, `mana-mon-blackbox`) so the real signal
|
||||
isn't drowned out while their rebuilds are pending.
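
The stuck-container rule from the last bullet can be written down as a small predicate. The actual check lives in the `health-check.sh` shell script; this TypeScript version is a sketch only, and the `ContainerState` shape and the `isStuck` name are assumptions. The two excluded container names are the ones listed above.

```typescript
// Hypothetical, simplified view of a container as the detector sees it.
interface ContainerState {
  name: string;
  status: 'running' | 'exited';
}

// Known-broken images that are excluded until they are rebuilt.
const KNOWN_BROKEN = new Set(['mana-status-gen', 'mana-mon-blackbox']);

function isStuck(container: ContainerState): boolean {
  if (container.status === 'running') return false;
  if (container.name.endsWith('-init')) return false; // one-shot init by design
  if (KNOWN_BROKEN.has(container.name)) return false; // tracked as follow-ups
  return true;
}

const observed: ContainerState[] = [
  { name: 'mana-app-web', status: 'running' },
  { name: 'mana-infra-minio-init', status: 'exited' },
  { name: 'mana-status-gen', status: 'exited' },
  { name: 'mana-core-sync', status: 'exited' },
];

const stuckContainers = observed.filter(isStuck).map((c) => c.name);
console.log(stuckContainers);
```

Keeping the exclusions in one named set makes the technical debt visible: every name in it is a container that should either be rebuilt or removed.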
|
||||
|
||||
### ✅ Memoro server detached from mana.how stack
|
||||
|
||||
**Status:** Done.
|
||||
|
||||
**What was there:** The unified web app injected
|
||||
`window.__PUBLIC_MEMORO_SERVER_URL__` from a `PUBLIC_MEMORO_SERVER_URL`
|
||||
env var, the docker-compose `mana-app-web` service set both the
|
||||
in-network and public client URLs, and the cloudflared tunnel had a
|
||||
`memoro-api.mana.how → http://localhost:3015` ingress route. The actual
|
||||
memoro container (`mana-app-memoro-server`) was *not* running because
|
||||
its compose definition requires `MEMORO_SUPABASE_URL`,
|
||||
`MEMORO_SUPABASE_SERVICE_KEY`, `MEMORO_SERVICE_KEY`,
|
||||
`MANA_CREDITS_SERVICE_KEY`, `AZURE_OPENAI_KEY` (compose name) /
|
||||
`AZURE_OPENAI_API_KEY` (.env name — naming mismatch),
|
||||
`AZURE_OPENAI_DEPLOYMENT` and `GEMINI_API_KEY`, and most of those
|
||||
secrets were never set.
|
||||
|
||||
The interesting fact: the unified web app's `memoro` module
|
||||
(`apps/mana/apps/web/src/lib/modules/memoro/`) is fully local-first.
|
||||
Its recorder, collections, queries, stores and views all read/write
|
||||
the unified Dexie database via `mana-sync`. No file in the module
|
||||
reads `__PUBLIC_MEMORO_SERVER_URL__` or hits `memoro-api.mana.how`.
|
||||
The injected window var was a leftover from the standalone era — chasing
|
||||
secrets to start a server that mana.how doesn't actually call would have
|
||||
been wasted work.
|
||||
|
||||
**Why pre-launch:** Same shape as the ghost API hostnames above. A
|
||||
Cloudflare ingress route to a non-running origin produced permanent 502s,
|
||||
and the env var injection promised a backend that didn't exist. Fixing
|
||||
it is one cleanup; ignoring it would be a permanent footgun.
|
||||
|
||||
**What changed:**
|
||||
- `hooks.server.ts` no longer reads `PUBLIC_MEMORO_SERVER_URL{,_CLIENT}`,
|
||||
no longer injects `window.__PUBLIC_MEMORO_SERVER_URL__`, and the CSP
|
||||
`connect-src` list dropped that origin.
|
||||
- `routes/status/+page.server.ts` no longer probes the memoro server's
|
||||
`/health`.
|
||||
- `docker-compose.macmini.yml` `mana-app-web.environment` lost its
|
||||
`PUBLIC_MEMORO_SERVER_URL` and `PUBLIC_MEMORO_SERVER_URL_CLIENT` lines.
|
||||
- `~/.cloudflared/config.yml` on the Mac Mini lost its
|
||||
`memoro-api.mana.how` ingress entry. Tunnel reloaded with `kill -HUP`.
|
||||
- The `memoro-server` and `memoro-audio-server` compose services
|
||||
themselves were left intact — they remain available for the mobile
|
||||
app team to revive later when they have valid Supabase credentials,
|
||||
but they no longer block mana.how production health.
|
||||
|
||||
**Open follow-up (low priority):**
|
||||
- Long term, the memoro server should be refactored off Supabase onto
|
||||
the unified Postgres + sync architecture so it stops depending on a
|
||||
third-party database for a feature that's already local-first on the
|
||||
client. Tracked separately — not pre-launch critical.
|
||||
- The DNS CNAME for `memoro-api.mana.how` still resolves; same hygiene
|
||||
cleanup as the ghost API hostnames above.
|
||||
|
||||
### Original ghost-API entry (kept for the Git history)
|
||||
|
||||
The `## How to add an entry` block previously listed the ghost
|
||||
API removal as **open** — it is now **done** above. If a future audit
|
||||
finds another similar pattern, follow the same shape: identify the
|
||||
single live consumer, rewrite it to use Dexie or the canonical
|
||||
gateway, then strip the env vars / compose entries / tunnel routes
|
||||
in one batch.
|
||||
|
||||
### ✅ Production restoration — 2026-04-07 outage
|
||||
|
||||
**Status:** Done. Documented for future post-mortem reference.
|
||||
|
||||
**What was wrong:**
|
||||
1. `mana-core-sync` (Go local-first sync server) container was missing.
|
||||
Frontend had no place to push pending changes — silent local writes
|
||||
only.
|
||||
2. `mana-api-gateway` container was missing.
|
||||
3. Four Cloudflare Tunnel hostnames (`sync.mana.how`, `media.mana.how`,
|
||||
`uload-api.mana.how`, `memoro-api.mana.how`) were configured in the
|
||||
tunnel ingress but had **no DNS records on the Cloudflare zone**, so
|
||||
resolution NXDOMAIN'd at the edge. The tunnel had a route to
|
||||
nowhere.
|
||||
4. `mana-sync` Dockerfile pinned `golang:1.23-alpine` but the project's
|
||||
`go.mod` requires `go 1.25.0` — every rebuild attempt failed.
|
||||
5. The same Dockerfile copied `go.mod` directly without staging the
|
||||
`packages/shared-go` workspace replace, so `go mod download` could
|
||||
not resolve `github.com/mana/shared-go`.
|
||||
6. Postgres DB `mana_sync` did not exist on the Mac Mini's Postgres
|
||||
instance — only `mana_platform` had been created. mana-sync booted,
|
||||
pinged the DB, failed, and restart-looped.
|
||||
|
||||
**What changed:**
|
||||
- `services/mana-sync/Dockerfile` rewritten to mirror the
|
||||
`services/mana-api-gateway/Dockerfile` pattern: monorepo root build
|
||||
context, explicit `COPY packages/shared-go/ /shared-go/`,
|
||||
`go mod edit -replace`, golang 1.25-alpine.
|
||||
- `docker-compose.macmini.yml` `mana-sync.build.context` changed from
|
||||
`services/mana-sync` to `.` so the new Dockerfile can see the shared
|
||||
package.
|
||||
- `mana_sync` Postgres database created
|
||||
(`CREATE DATABASE mana_sync;`).
|
||||
- `mana-core-sync` and `mana-api-gateway` containers built and started
|
||||
(`docker compose -p manacore-monorepo up -d mana-sync api-gateway`).
|
||||
- Four Cloudflare Tunnel DNS records added via
|
||||
`cloudflared tunnel route dns bb0ea86d-8253-4a54-838b-107bb7945be9
|
||||
{sync,media,uload-api,memoro-api}.mana.how`.
|
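The DNS step above can be sketched as a small loop. This is only an illustration, not the command history from the recovery: `route_dns` and the `DRY_RUN` switch are hypothetical names introduced here, and the sketch assumes `cloudflared tunnel route dns` is invoked once per hostname.

```bash
#!/usr/bin/env bash
# Sketch: create one tunnel DNS route per hostname.
set -eu

TUNNEL_ID="bb0ea86d-8253-4a54-838b-107bb7945be9"  # tunnel UUID from the entry above
ZONE="mana.how"

# route_dns SUBDOMAIN...
# Hypothetical helper. With DRY_RUN=1 (the default in this sketch) it only
# prints each command; set DRY_RUN=0 to actually call cloudflared.
route_dns() {
  for sub in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "cloudflared tunnel route dns ${TUNNEL_ID} ${sub}.${ZONE}"
    else
      cloudflared tunnel route dns "${TUNNEL_ID}" "${sub}.${ZONE}"
    fi
  done
}

route_dns sync media uload-api memoro-api
```

Running it in dry-run mode first makes it easy to review the exact hostnames before any record is created on the zone.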

**Current health (post-fix and post-ghost-cleanup):**
- `mana.how/`, `/todo`, `/tags`, `/llm-test` → 200
- `auth.mana.how`, `llm.mana.how`, `api.mana.how`, `sync.mana.how`,
  `media.mana.how`, `uload-api.mana.how`, `glitchtip.mana.how` → 200
- `memoro-api.mana.how` → 404 (no ingress route; see the "Memoro server
  detached" entry above; the unified web app does not need it)
- 12 ghost API hostnames (`todo-api`, `calendar-api`, `contacts-api`,
  `chat-api`, `storage-api`, `cards-api`, `music-api`, `nutriphi-api`,
  `picture-api`, `presi-api`, `zitare-api`, `clock-api`) → 404 (no
  ingress route; see the "Ghost backend API hostnames" entry above)

**Root-cause lessons for the runbook (now applied):**

1. **`status.sh` compose-vs-running diff added.** The script now reads
   every service from `docker compose config` and reports any
   `container_name` that is not currently running, instead of just
   printing `X / Y`. The 2026-04-07 outage state would have been flagged
   on the very first run as five missing containers (incl. `mana-sync`
   and `mana-api-gateway`); the original script reported those as
   "37/42 running" with no indication of the gap.
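The compose-vs-running diff from lesson 1 boils down to a set difference between two name lists. A minimal sketch follows; it is not the real `status.sh`. The helper name `missing_containers` and the file-based interface are assumptions made for illustration, and the `awk` extraction in the comment assumes `docker compose config` prints unquoted `container_name:` values.

```bash
#!/usr/bin/env bash
# Sketch: report containers that compose expects but that are not running.
set -eu

# missing_containers EXPECTED_FILE RUNNING_FILE
# Prints every name listed in EXPECTED_FILE that does not appear as a whole
# line in RUNNING_FILE, sorted and de-duplicated.
missing_containers() {
  { grep -Fxv -f "$2" "$1" || true; } | sort -u
}

# Assumed wiring (hypothetical file names):
#   docker compose config | awk '$1 == "container_name:" {print $2}' > expected.txt
#   docker ps --format '{{.Names}}' > running.txt
#   missing_containers expected.txt running.txt
```

Printing names rather than a count is the point of the lesson: "37/42" hides which five containers are gone, while a list of names does not.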

2. **`health-check.sh` walks the cloudflared ingress.** The hardcoded
   port-by-port block (`Chat Backend localhost:3030`,
   `Todo Backend localhost:3031`, …) is gone. The script now reads every
   `hostname:` line from `~/.cloudflared/config.yml` and probes the
   public URL, so DNS gaps, missing tunnel routes, 502/530 origin
   failures and timeouts surface as failures even when the corresponding
   LAN port would have happened to answer. On its first run after the
   cleanup it surfaced eighteen previously invisible failures
   (`context.mana.how`, `credits.mana.how`, `docs.mana.how`,
   `it.mana.how`, `memoro.mana.how`, `moodlit.mana.how`,
   `questions.mana.how`, `subscriptions.mana.how` → no DNS, plus
   `element`, `link`, `manavoxel`, `matrix`, `stats`, `status`,
   `stt-api`, `taktik`, `tts-api` → 502). Each is its own follow-up
   ticket; the script just stopped hiding them.
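A rough sketch of the ingress walk described in lesson 2, under stated assumptions: the function names (`list_hostnames`, `probe`, `classify`, `check_all`), the 10-second timeout, and the rule that `000` or any 5xx counts as a failure are all choices made here for illustration, not taken from the real `health-check.sh`.

```bash
#!/usr/bin/env bash
# Sketch: probe every public hostname listed in a cloudflared config.
set -eu

# list_hostnames CONFIG
# Prints the value of each `hostname:` key, one per line, quotes stripped.
list_hostnames() {
  sed -n 's/^[[:space:]]*-\{0,1\}[[:space:]]*hostname:[[:space:]]*//p' "$1" | tr -d "\"'"
}

# probe HOST
# Prints the HTTP status of https://HOST/. curl reports "000" when it cannot
# resolve or connect, which is how DNS gaps and timeouts show up.
probe() {
  curl -s -o /dev/null -m 10 -w '%{http_code}' "https://$1/" || true
}

# classify CODE
# Assumed policy: no response (000) and any 5xx are failures.
classify() {
  case "$1" in
    000|5??) echo "FAIL" ;;
    *)       echo "ok" ;;
  esac
}

# check_all CONFIG
# Prints "<hostname> <status> <ok|FAIL>" for every ingress hostname.
check_all() {
  list_hostnames "$1" | while IFS= read -r host; do
    code="$(probe "$host")"
    echo "$host $code $(classify "$code")"
  done
}

# Usage (performs real network requests):
#   check_all ~/.cloudflared/config.yml
```

Probing the public URL instead of a LAN port exercises DNS, the tunnel route, and the origin in one request, which is why this approach catches failures the old port-by-port block could not.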

3. **`COMPOSE_PROJECT_NAME=manacore-monorepo` pinned.** The Mac Mini's
   existing containers were created under the old project name
   (`manacore-monorepo`) but the working tree directory is
   `mana-monorepo`. Without a pin, every `docker compose up` from the
   repo root spawns a *second* project, creating duplicate
   container/volume conflicts (the 2026-04-07 recovery had to pass
   `-p manacore-monorepo` manually). The pin now lives in:
   - `.env` and `.env.macmini` on the Mac Mini (so any `docker compose`
     invocation from that working tree picks it up automatically).
   - `.env.macmini.example` in the repo (so a fresh checkout inherits
     the same name with a clear comment about why removing it would
     break the next deployment).
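One way to keep the pin from lesson 3 from silently disappearing is a guard that a deploy script runs before `docker compose up`. The sketch below is hypothetical and not part of the repo: `pinned_project` and `check_pin` are names invented here, and it assumes the env file uses plain `KEY=value` lines.

```bash
#!/usr/bin/env bash
# Sketch: refuse to deploy when the compose project name is not pinned.
set -eu

EXPECTED_PROJECT="manacore-monorepo"

# pinned_project ENV_FILE
# Prints the last COMPOSE_PROJECT_NAME value set in ENV_FILE, or nothing.
pinned_project() {
  sed -n 's/^COMPOSE_PROJECT_NAME=//p' "$1" | tail -n 1
}

# check_pin ENV_FILE
# Succeeds only when ENV_FILE pins the expected project name.
check_pin() {
  [ "$(pinned_project "$1")" = "$EXPECTED_PROJECT" ]
}

# Usage:
#   check_pin .env || { echo "COMPOSE_PROJECT_NAME pin missing" >&2; exit 1; }
```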

4. **`apps/mana/apps/web/Dockerfile` heap bumped 4 GB → 8 GB.** The
   unified app outgrew `--max-old-space-size=4096` after the module
   count crossed ~30; every clean rebuild ran out of memory before
   producing the client bundle. Bumping the build heap is the same
   one-line change we already had to apply locally to run `pnpm build`
   against the monorepo on a developer machine.

---

## How to add an entry

Append a new section under the affected app/service. Each entry should
explain:
1. **What was there:** the old code/structure being removed
2. **Why it had to happen pre-launch:** the user-facing risk if done later
3. **What changed:** the concrete diff
4. **LOC / size impact:** to motivate the next cleanup

Keep entries small and atomic: one concept per section. If a single
change touches several files, list them all under one entry rather than
splitting it.