Root file cleanup:
- mac-mini-setup.sh → scripts/mac-mini/bootstrap.sh (first-time bootstrap
belongs next to the other mac-mini setup-* scripts)
- test-chat-auth.sh → scripts/test-chat-auth.sh (ad-hoc smoke test, no
reason to live in the repo root)
- cloudflared-config.yml stays in root on purpose — it's the single source
of truth read by scripts/mac-mini/setup-*.sh and scripts/check-status.sh.
Docs:
- docs/POSTMORTEM_2026-04-07.md → docs/postmortems/2026-04-07-memoro-deploy-prod-wipe.md
(creates the postmortems/ home for future entries; descriptive name)
- docs/future/MAIL_SERVER_MAC_MINI_TEMP.md deleted — what it described
("Bereit zur Umsetzung", i.e. "ready for implementation", Stalwart on Mac Mini) is what's actually
running today, documented in docs/MAIL_SERVER.md. The DEDICATED variant
in docs/future/ remains since it's still a real future plan.
Root CLAUDE.md fix:
- @mana/local-store description was wrong — claimed it was legacy/standalone
only, but it's still used by apps/mana/apps/web itself, plus manavoxel,
arcade, and three shared packages.
Not touched (flagged for follow-up):
- NewAppIdeas/ (344K of "Roblox Reimagined" planning notes in repo root) —
user decision: archive externally or move under docs/future/
- Doc giants (PROJECT_OVERVIEW 41k, MATRIX_BOT_ARCHITECTURE 36k, etc.) —
splitting them is its own refactor
- Service CLAUDE.md staleness audit across 18 services — too broad for
this pass
Removed:
- apps/manacore/ — three Svelte files were byte-identical duplicates of
the apps/mana/ versions, leftover from the 2025 rename. Untracked .env
files in the same dir were also cleared.
- 21 empty apps/*/apps/web-archived/ directories — leftover from the
unification move, never tracked in git.
- services/it-landing/ — empty directory, picked up by the services/*
workspace glob for no reason.
- apps/news/apps/server-archived/ — empty.
Fixed:
- scripts/mac-mini/status.sh: COMPOSE_PROJECT_NAME fallback was still
manacore-monorepo from before the rename.
Documented:
- Root CLAUDE.md now describes apps/api/ (the @mana/api unified backend)
as a top-level peer to apps/mana/. It was completely missing from the
trimmed CLAUDE.md, which made the layout look frontend-only.
Two failures during the 2026-04-07 production outage triage were caused
not by the underlying outage but by `status.sh` and `health-check.sh`
hiding the broken state. Both scripts were hardened so the same outage
shape can't recur invisibly.
status.sh — compose-vs-running diff
The old script printed "X containers running / Y total" without
noticing that some compose-defined containers were never started in
the first place. The Mac Mini was running 37 of 42 declared
containers and the script reported "37 running" with no indication
of the gap — `mana-core-sync` and `mana-api-gateway` were silently
missing for hours.
New behaviour: read every service from `docker compose config`,
diff its `container_name` against `docker ps`, and report each
declared service whose container is not currently up. The same
outage state would have been flagged on the very first run.
health-check.sh — public-hostname walk via Cloudflare DNS
The old script probed ~50 hardcoded `localhost:<port>/health`
endpoints across Chat, Todo, Calendar, etc. — but the per-app
HTTP backends those endpoints expected don't exist anymore (the
ghost-API cleanup removed them entirely). Every probe returned
HTTP 000 / connection refused, generating a wall of false-positive
alerts that drowned out the real signal.
The block was replaced with a dynamic walk of every `hostname:`
entry in `~/.cloudflared/config.yml`. Each hostname is probed via
the public Cloudflare tunnel, so DNS gaps, missing tunnel routes,
502/530 origin failures and timeouts surface as failures the same
way real users would experience them. On its first run after the
cleanup it surfaced eighteen previously-invisible hostname failures
(no DNS, 502, or 530) — every one of them a real production issue.
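One way to collect the `hostname:` entries is a plain line-based scan. The sketch below is in Python for illustration only; the sample config and its local ports are assumptions, and the real script may well parse the file differently.

```python
import re

# Matches lines such as "  - hostname: mana.how" or "    hostname: mana.how".
HOSTNAME_RE = re.compile(r"^[ \t]*-?[ \t]*hostname:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def extract_hostnames(config_text: str) -> list[str]:
    """Collect every `hostname:` value in order, dropping duplicates and quotes."""
    seen: dict[str, None] = {}
    for host in HOSTNAME_RE.findall(config_text):
        seen.setdefault(host.strip("\"'"), None)
    return list(seen)


# Hypothetical ingress section; the ports are made up for the example.
sample = """\
ingress:
  - hostname: mana.how
    service: http://localhost:3000
  - hostname: status.mana.how
    service: http://localhost:8080
  - service: http_status:404
"""
print(extract_hostnames(sample))
```

Because the list is derived from the tunnel config itself, a hostname added to the tunnel is probed without anyone touching the health check.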
DNS resolution intentionally goes through `dig +short HOST @1.1.1.1`
instead of the local resolver. The Mac Mini's home-router DNS keeps
a negative cache for hours after the first failed lookup, so newly
added CNAMEs (like the post-outage sync/media records) appeared as
"no response" from inside the script for hours even though external
users saw them resolve immediately. Asking Cloudflare's DNS directly
gives the script the same view the public internet has.
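A rough Python equivalent of the resolver bypass, assuming `dig` is installed. The helper names are hypothetical; only the `dig +short HOST @1.1.1.1` invocation comes from the text above.

```python
import subprocess


def dig_command(host: str, resolver: str = "1.1.1.1") -> list[str]:
    """Build the `dig +short HOST @RESOLVER` invocation."""
    return ["dig", "+short", host, f"@{resolver}"]


def parse_dig_short(output: str) -> list[str]:
    """Return answer lines, skipping blanks and dig's ';;' diagnostic lines."""
    lines = (line.strip() for line in output.splitlines())
    return [line for line in lines if line and not line.startswith(";")]


def resolves(host: str) -> bool:
    """Ask the public resolver directly (needs `dig` and network access)."""
    result = subprocess.run(
        dig_command(host), capture_output=True, text=True, timeout=10
    )
    return result.returncode == 0 and bool(parse_dig_short(result.stdout))
```

An empty answer list is treated as "no DNS record", which is the failure the local negative cache used to report incorrectly.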
The Matrix, Element, GPU-LAN-redundant and monitoring port-by-port
blocks were removed — the public-hostname walk covers all of them
via their `*.mana.how` hostnames going through the actual tunnel.
The "stuck container" detector now ignores `*-init` containers
(one-shot init pods, Exit 0 = success, intentionally never re-run).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds end-to-end browser voice capture for the Memoro module, mirroring the
existing dreams pattern: MediaRecorder → SvelteKit server proxy → mana-stt
on the Windows GPU box via Cloudflare tunnel.
Recording UI lives in /memoro page header (mic button + live timer + cancel +
sticky-permission retry). Server proxy at /api/v1/memoro/transcribe forwards
the blob with the server-held X-API-Key. memosStore.createFromVoice creates a
placeholder memo with processingStatus='processing' and fires transcribeBlob
in the background, which writes the transcript and flips status on completion
(or 'failed' with error in metadata).
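The placeholder-then-flip lifecycle can be illustrated with a small Python sketch. The real store is TypeScript; the `Memo` shape, the `'done'` status name, and the injected `transcribe` callable are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Memo:
    # A freshly created voice memo starts as a placeholder.
    processing_status: str = "processing"
    transcript: str = ""
    metadata: dict = field(default_factory=dict)


def transcribe_into(memo: Memo, blob: bytes, transcribe: Callable[[bytes], str]) -> Memo:
    """Write the transcript on success; otherwise mark the memo failed and keep the error."""
    try:
        memo.transcript = transcribe(blob)
        memo.processing_status = "done"  # assumed name for the completed state
    except Exception as exc:
        memo.processing_status = "failed"
        memo.metadata["error"] = str(exc)
    return memo
```

Creating the placeholder first means the memo shows up in the list immediately, and a failed upload leaves a visible record instead of silently disappearing.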
Also corrects the mana-stt hostname across the repo: stt-api.mana.how (which
never existed in DNS) → gpu-stt.mana.how (the actual Cloudflare tunnel route
to the Windows GPU box). Adds an ENVIRONMENT_VARIABLES.md section explaining
how to obtain MANA_STT_API_KEY and where the tunnel terminates. Adds tunnel
health probes to the mac-mini health-check script so we catch tunnel-side
breakage in addition to LAN-side.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a one-tap voice recorder at the top of the Dreams module. Speak
your dream right after waking; the audio is sent through a server-side
proxy to mana-stt, and the transcript appears in the entry as soon as
it lands.
- New /api/v1/dreams/transcribe SvelteKit server route proxies the
upload to mana-stt with the server-held MANA_STT_API_KEY (never
exposed to the browser); validates MIME type, size, and missing config
- Adds MANA_STT_URL + MANA_STT_API_KEY to the mana-web env config in
generate-env.mjs (private, not PUBLIC_ prefixed)
- New DreamRecorder class wraps MediaRecorder with reactive
$state — status, elapsed timer, error; supports cancel
- dreamsStore.createFromVoice creates a placeholder dream with
processingStatus='transcribing' and kicks off the upload
- dreamsStore.transcribeBlob uploads, writes the result back into
the dream, falls back to processingStatus='failed' on errors
- Adds processingStatus + processingError + audioDurationMs to
LocalDream; backwards-compatible defaults in toDream
- Mic button in ListView with idle / requesting / recording
(with elapsed timer + pulsing red) / stopping states
- Cancel button discards the in-flight recording
- Transcribing badge ●●● + failed ! badge on dream rows
- Inline editor shows live transcription status; while it's running
and the user hasn't typed anything, the transcript folds into the
edit buffer as soon as it arrives
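The three proxy-side checks (MIME type, size, missing config) might look roughly like this. It is a Python sketch of the idea only: the size limit, the accepted MIME prefix, and the status codes are assumptions, not values taken from the route.

```python
from typing import Optional

MAX_BYTES = 25 * 1024 * 1024  # hypothetical limit, not taken from the commit
ALLOWED_PREFIX = "audio/"      # assumption: any audio/* type is accepted


def validate_upload(mime: str, size: int, api_key: Optional[str]) -> tuple[int, str]:
    """Return (HTTP status, reason) for an incoming transcription upload."""
    if not api_key:
        # Server misconfiguration: fail before touching the upload.
        return 500, "MANA_STT_API_KEY is not configured"
    if not mime.startswith(ALLOWED_PREFIX):
        return 415, "unsupported media type"
    if size == 0:
        return 400, "empty upload"
    if size > MAX_BYTES:
        return 413, "upload too large"
    return 200, "ok"
```

Checking the config first keeps a missing key from surfacing as a confusing upstream error from mana-stt.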
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New Hono+Bun service at services/mana-events on port 3065 with two
schemas in mana_platform: events_published (snapshots) and public_rsvps
(unauthenticated responses), plus a per-token hourly rate-limit bucket.
- Host endpoints (JWT) for publish/update/unpublish/list-rsvps
- Public endpoints for snapshot fetch + RSVP upsert with rate limiting
- New /rsvp/[token] page outside the auth gate, SSR-loads the snapshot
- Client store wires publishEvent/unpublishEvent to the server, syncs
snapshot updates after edits, and deletes the snapshot on event delete
- DetailView polls GET /events/:id/rsvps every 30s while open and lets
hosts import a public response into their local guest list
- generate-env, setup-databases.sh, .env.development, hooks.server.ts,
package.json wired for local dev
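A per-token hourly bucket can be sketched in a few lines. The service itself is Hono+Bun and persists its bucket in the database; this in-memory Python version only illustrates the counting scheme, and the class name and limit are hypothetical.

```python
from collections import defaultdict


class HourlyRateLimiter:
    """Fixed-window limiter: one counter per (token, hour) pair."""

    def __init__(self, limit: int):
        self.limit = limit
        self.counts: dict[tuple[str, int], int] = defaultdict(int)

    def allow(self, token: str, now: float) -> bool:
        """Count one request for `token` at unix time `now`; False once the hour is full."""
        bucket = (token, int(now // 3600))
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True


limiter = HourlyRateLimiter(limit=2)
print([limiter.allow("abc", t) for t in (0, 10, 20, 3600)])
```

Keying the counter on the hour number means old buckets never need resetting; they simply stop being read.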
The previous startup.sh checked colima status via `colima status | grep running`
and, if that failed, ran `colima stop --force` unconditionally before starting.
This is destructive: a transient status mis-detection can kill a healthy running
VM, and the subsequent start often hangs because of leftover locks/processes.
Triggered today during the ManaCore→Mana rename: reloading the docker-startup
LaunchAgent ran the script, which falsely concluded colima was down, killed the
running VM, and left 12 zombie limactl processes plus a stale disk lock symlink.
The whole production stack (incl. Forgejo) was offline until manual cleanup.
Changes:
- Use `docker info` as the readiness check instead of `colima status` —
it directly tests the thing we care about (docker socket reachable)
- Only do cleanup work when we actually need to start; never SIGKILL a
running VM as a "precaution"
- When we do need to start: reap any zombie limactl/colima processes from
prior failed runs, and clear the stale disk-in-use lock if no process
actually holds it
- Verify successful start with `docker info`, not `colima status`
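The new control flow, sketched in Python for illustration. The step names are placeholders for the shell logic in startup.sh, and the injectable `run` parameter exists only to make the sketch testable.

```python
import subprocess
from typing import Callable


def docker_ready(run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> bool:
    """True when `docker info` succeeds, i.e. the docker socket is reachable."""
    try:
        result = run(["docker", "info"], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def plan_startup(ready: bool) -> list[str]:
    """Steps to perform; cleanup happens only when a start is actually needed."""
    if ready:
        return []  # healthy VM: touch nothing
    return [
        "reap-zombie-limactl",    # leftovers from prior failed runs
        "clear-stale-disk-lock",  # only if no process holds the lock
        "colima-start",
        "verify-docker-info",
    ]
```

The important property is that the ready path is empty: a mis-detection can no longer lead to a forced stop of a running VM.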
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All web app subdomains (chat.mana.how, todo.mana.how, etc.) were removed
when the unified app launched, but monitoring configs still referenced them.
Update blackbox targets to use mana.how/route URLs, remove stale API backend
routes from cloudflared, clean up CORS origins, and fix status page generator
to handle route-based URLs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove standalone app Umami website IDs from .env.development and
generate-env.mjs. Remove injectUmamiAnalytics from all 21 standalone
app hooks.server.ts files. All analytics now flow through the single
ManaCore unified app website ID with module-level segmentation.
Landing page IDs are preserved (separate Astro sites).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mirrors the frontend unification (single IndexedDB) on the backend.
All services now use pgSchema() for isolation within one shared database,
enabling cross-schema JOINs, simplified ops, and zero DB setup for new apps.
- Migrate 7 services from pgTable() to pgSchema(): mana-user (usr),
mana-media (media), todo, traces, presi, uload, cards
- Update all DATABASE_URLs in .env.development, docker-compose, configs
- Rewrite init-db scripts for 2 databases + 12 schemas
- Rewrite setup-databases.sh for consolidated architecture
- Update shared-drizzle-config default to mana_platform
- Update CLAUDE.md with new database architecture docs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New GPU service for fast text-to-video generation using LTX-Video (~2B params)
on the RTX 3090. Generates 480p clips in 10-30 seconds, uses ~10GB VRAM.
Includes Cloudflare Tunnel route, Prometheus monitoring, and health checks.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add ManaCore as first entry in MANA_APPS so the dashboard at mana.how
gets a tier badge. Map mana.how → manacore and inventar → inventory
in subdomain aliases.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Mukke, Photos, Planta, SkillTree, Playground, Arcade to mana-apps.ts
with icons and APP_URLS. Fix manadeck→cards subdomain alias in status
page generator so the tier badge renders for the renamed app.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move release tier info (founder/alpha/beta/public) from a standalone
grid section into the existing service rows as small inline badges
next to each web app name. Cleaner, less visual noise.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parse tier data automatically from mana-apps.ts (awk, read-only volume
mount) so the status page stays in sync without manual updates. Shows
founder/alpha/beta/public cards with per-app development status.
Tier data is also included in status.json for ManaScore consumption.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- generate-status-page.sh now also writes status.json alongside index.html
Format: { updated, summary: {up, total}, services: { appName: bool } }
- nginx status.mana.how serves status.json with CORS headers (public read)
and explicit location block to avoid rewrite to index.html
- ManaScore index page fetches status.json client-side on load and
injects green ● LIVE / red ● DOWN badge next to each app's status chip
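For reference, the status.json shape can be produced like this. The sketch is Python rather than the shell generator; the ISO-8601 timestamp format and the sample app names are assumptions.

```python
import json
from datetime import datetime, timezone


def build_status(services: dict[str, bool], now: datetime) -> dict:
    """Assemble { updated, summary: {up, total}, services: { appName: bool } }."""
    return {
        "updated": now.astimezone(timezone.utc).isoformat(),  # format is an assumption
        "summary": {"up": sum(services.values()), "total": len(services)},
        "services": services,
    }


status = build_status(
    {"todo": True, "chat": False},
    datetime(2026, 4, 7, tzinfo=timezone.utc),
)
print(json.dumps(status, indent=2))
```

A consumer such as the ManaScore page only needs `services[appName]` to pick the LIVE or DOWN badge.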
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace @arcade/backend (NestJS) with @arcade/server (Hono/Bun).
Same two endpoints plus a health check, no auth required (public game generator):
- POST /api/games/generate — AI game generation (Gemini, Claude, GPT)
- POST /api/games/submit — Community game submission via GitHub PR
- GET /health — Health check
This removes the last remaining NestJS backend from the monorepo;
all servers now use Hono + Bun.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- inventar-web: fix mangled icon import in settings page
- skilltree-web: create missing lib/services/storage.ts for export/import
- startup.sh: add umami/synapse DB creation + synapse user setup with C locale
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prevents the internal SSD from filling up if the external SSD is not
mounted or if `colima delete` wiped the datadisk symlink.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds 4 new Tier 3 metrics to the ecosystem health audit script:
- Git Activity: % of apps with commits in the last 30 days (97%)
- A11y Indicators: alt-text coverage, role=dialog, focusTrap (36%)
- Auth Guard Coverage: AuthGate/authGuard presence per app (83%)
- Docker Readiness: Dockerfile present per app (80%)
Overall score changed from 74 to 72 (23 metrics, 135 total weight).
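How a weighted overall score reacts to new metrics can be shown with a small sketch. The formula below (a rounded weighted mean) is an assumption about how the audit script combines metrics, and the weights in the example are made up.

```python
def weighted_score(metrics: list[tuple[float, float]]) -> int:
    """metrics holds (percent 0-100, weight) pairs; returns the rounded weighted mean."""
    total_weight = sum(weight for _, weight in metrics)
    if total_weight == 0:
        return 0
    return round(sum(pct * weight for pct, weight in metrics) / total_weight)


# Hypothetical example: a strong heavy metric plus a weak light one.
print(weighted_score([(80, 2), (50, 1)]))
```

Under this assumption, adding low-scoring metrics such as the 36% A11y indicator pulls the total down even when nothing else changed.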
Dashboard at /manascore/ecosystem updated with new category rows.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
5 new metrics:
- Toast Consistency (100%) — all apps use shared toastStore
- Store Pattern (95%) — 176 Runes stores vs 9 old writable/readable
- Shared Types (62%) — shared-types imports vs local type files
- Dep Freshness (80%) — avg 37 deps per app
- Bundle Config (100%) — all apps have SvelteKit adapter
Ecosystem Health Score: 74/100 (19 metrics total)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Expand ecosystem-audit with:
- Error Boundaries (54%) — +error.svelte + offline page per app
- TypeScript Strict (100%) — strict mode in all apps
- Test Coverage (72%) — apps with at least one test (111 files total)
- PWA Support (2%) — manifest + service worker
- Maintainability (0%) — files under 500 lines (38 files exceed limit)
Dashboard shows file size top offenders and apps without tests.
Overall score adjusted from 76 to 70 with rebalanced weights.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Script scans the monorepo and generates ecosystem-wide consistency
metrics (icon adoption, modal usage, shared packages, etc.).
Outputs ecosystem-health.json for the Ecosystem Health dashboard.
Run: node scripts/ecosystem-audit.mjs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- check-disk-space.sh: always prune dangling images, unused volumes, and
build cache >7 days on every run (not just at critical threshold)
- check-disk-space.sh: auto-remove node_modules if found on server
(never needed — Docker builds inside containers)
- disk-check launchd: reduce interval from 60min to 15min to catch
disk issues faster (yesterday we hit 100% before the hourly check caught it)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- check-disk-space.sh now pushes mac_disk_used_percent + mac_colima_disk_used_gb
to Pushgateway every hour so vmalert can alert on real macOS disk usage
- alerts.yml: replace broken node-exporter disk alerts with Pushgateway-based ones
- master-overview.json: add "Recent Errors (Loki)" section with live error log
stream, error rate timeseries and top error sources barchart
- move-colima-to-external-ssd.sh: guided script to move 200GB Colima VM
datadisk from internal SSD to /Volumes/ManaData (3.6TB external SSD)
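The Pushgateway body is plain Prometheus text format, one `name value` line per metric, sent to a `/metrics/job/<job>` path. A Python sketch of building it follows; the real script is shell, and the helper names and sample values are assumptions.

```python
import shutil


def disk_used_percent(path: str = "/") -> float:
    """Used space on the filesystem holding `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    return round(100 * usage.used / usage.total, 1)


def pushgateway_payload(metrics: dict[str, float]) -> str:
    """Render metrics as text-format lines; the body must end with a newline."""
    return "".join(f"{name} {value}\n" for name, value in metrics.items())


# Made-up sample values; a real run would measure them.
payload = pushgateway_payload(
    {"mac_disk_used_percent": 81.5, "mac_colima_disk_used_gb": 142}
)
print(payload, end="")
```

Pushing from the host covers the case the node-exporter alerts missed: the exporter inside the VM cannot see the macOS disk.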
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
colima delete wipes the entire VM disk on every power cycle, forcing
full image rebuilds. colima stop --force is sufficient to clear stale
process state after a hard shutdown.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The startup script runs `colima delete` on hard shutdown recovery,
wiping the colima.yaml mount config. Then `colima start` only added
/Volumes/ManaData but forgot /Users/mana — causing all file bind-mounts
to appear as empty directories (VirtioFS can't see host files).
This was the root cause of Synapse/SearXNG/Alertmanager/Loki crashing
after the power outage. Now both mounts are always passed explicitly.
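Passing both mounts explicitly amounts to building the start command from a fixed list. This Python sketch is illustrative only: the `:w` (writable) suffix is an assumption, and the other flags the real script passes are omitted.

```python
# Both host paths named in this commit must always be mounted.
REQUIRED_MOUNTS = ["/Users/mana", "/Volumes/ManaData"]


def colima_start_command(mounts: list[str]) -> list[str]:
    """Build `colima start` with one --mount flag per host directory."""
    cmd = ["colima", "start"]
    for path in mounts:
        cmd += ["--mount", f"{path}:w"]  # ':w' marks the mount writable (assumed)
    return cmd


print(" ".join(colima_start_command(REQUIRED_MOUNTS)))
```

Since the flags no longer depend on colima.yaml surviving, a wiped config cannot silently drop a mount.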
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Loki was already running but had no log shipper. Adds Promtail to collect
Docker logs from all 66 containers with automatic tier labeling (infra,
auth, core, app, matrix, games) and a Grafana Logs Explorer dashboard.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Kill Docker Desktop if it auto-started
- Clean stale Colima state from hard shutdown (delete --force)
- Start Colima with VZ, 12GB RAM, VirtioFS
- Restore named volumes from backup if missing
- Start containers with --no-build to skip broken Dockerfiles
- Create missing databases
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Mac Mini has docker at /usr/local/bin/docker, which is not in PATH.
Use the same DOCKER_CMD pattern as build-app.sh.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
build-app.sh now checks available RAM before builds and only stops
monitoring containers when free memory is below 3 GB threshold.
New memory-baseline.sh script measures per-container and per-category
RAM usage for capacity planning.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove set -e to prevent abort on non-critical errors
- Suppress tar errors for volatile TSDB files (VictoriaMetrics)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Deactivate Ollama, FLUX.2, and Telegram Bot LaunchAgents on Mac Mini
- Remove extra_hosts from mana-llm (no longer needs host.docker.internal)
- Update health-check.sh to monitor GPU server services instead of local ones
- Update status.sh to show GPU server status instead of native services
- Rewrite MAC_MINI_SERVER.md: remove ~400 lines of Ollama/FLUX/Bot docs,
add GPU server architecture diagram and deactivation notes
- Update CAPACITY_PLANNING.md with post-offload numbers (~80-150 peak users)
Mac Mini is now a pure hosting server (Web, API, DB, Sync).
All AI workloads run on GPU server (RTX 3090) via LAN.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- .forgejo/workflows/smoke-tests.yml: checks all Go services + web apps
every 6h, fails if any health check fails
- scripts/lighthouse-audit.sh: runs Lighthouse on all 14 web apps
Initial Lighthouse results:
mana.how: Perf:80 A11y:96 BP:100 SEO:92
todo.mana.how: Perf:69 A11y:96 BP:100 SEO:100
chat.mana.how: Perf:83 A11y:100 BP:96 SEO:92
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add mana-user (3062), mana-subscriptions (3063), mana-analytics (3064)
to docker-compose with health checks and traefik labels
- Replace old NestJS Tier 3 app backends (~300 lines) with comment
placeholder for Hono compute servers (need shared Dockerfile)
- Create docker/Dockerfile.hono-server — shared Bun Dockerfile for
all 14 app compute servers (ARG APP for build context)
- Add 6 new databases to setup-databases.sh: mana_auth, mana_credits,
mana_user, mana_subscriptions, mana_analytics, mana_sync
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>