Commit graph

181 commits

Author SHA1 Message Date
Till JS
38a02b77a8 chore(dev): add setup-dev-user script for local founder accounts
Local mana-auth has no built-in admin seed and `requireEmailVerification`
turned on with no real SMTP — every developer ends up writing the same
"register + UPDATE auth.users" SQL incantation by hand. Bundles it
into one idempotent script + a pnpm alias.

  pnpm setup:dev-user                       # creates 3 default accounts
  ./scripts/dev/setup-dev-user.sh foo bar   # creates / repairs one

What it does per user:
  1. POST /api/v1/auth/register on mana-auth (so Better Auth's
     signUpEmail handles password hashing the way the runtime
     expects — no hand-rolled scrypt)
  2. UPDATE auth.users SET email_verified = true, access_tier = 'founder'
     so the new user can immediately log in AND exercise every
     tier-gated module without a tier upgrade dance

Idempotent: existing users get tier + verification re-applied without
touching the password. Re-running after a partial setup is safe.

Defaults to three accounts (tills95 / tilljkb / rajiehq @gmail.com,
all with password "Aa-123456789") so the next dev doesn't have to
remember anything. Override via `TIER=alpha` / `DB_HOST=...` env
vars when needed.

Two preflight gates fail loud: psql in PATH + mana-auth reachable
on :3001. ON_ERROR_STOP=1 in psql so a bad SQL run doesn't get
silently swallowed.

Replaces the dangling `seed:dev-user` package.json alias that pointed
at a `pnpm --filter @mana/auth db:seed:dev` script that was never
created — clean rename to `setup:dev-user` to match the existing
`setup:env` / `setup:db` family.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 12:23:15 +02:00
Till JS
30766f153e fix(macmini): auto-rebuild stale sveltekit-base before per-app web builds
NOTE: the previous commit 048184bef carried this commit message but
accidentally bundled an unrelated PickerOverlay refactor instead of
this script change (lint-staged stash interaction). This is the
actual fix.

Per-app web Dockerfiles do `FROM sveltekit-base:local` and do NOT
re-COPY packages/shared-* — those packages are baked into the base
image. So a change to packages/shared-utils, packages/shared-ui, etc.
only reaches the live web app if the base image is also rebuilt.

This bit us THREE times on 2026-04-08 alone:
1. CSP fix in shared-utils ('wasm-unsafe-eval') sat unused in
   production for over an hour because every `build-app.sh mana-web`
   reused the cached base layer with old shared-utils.
2. The BaseListView export in shared-ui after the ListView
   consolidation refactor — mana-web's build failed because Rollup
   couldn't resolve the new symbol from the stale base.
3. Same shape, different package, repeatedly during the Gemma 4
   migration push.

The pattern is identical every time and the manual workaround
(`build-app.sh --base` first) is something you only think to run if
you already know how the layering works. Make the script catch it.

New `is_base_image_stale` helper compares the base image's `Created`
timestamp against the latest git commit touching paths the base image
actually depends on (packages/, docker/Dockerfile.sveltekit-base,
pnpm-lock.yaml). When building any *-web service, if the image is
stale or missing, the base is rebuilt automatically before the
per-app build kicks off, with the triggering commit's oneline
printed for transparency.

Date parsing handles macOS Docker's local-TZ-offset RFC3339 format
(`...+02:00`, not Z). We strip from char 19 onward and parse the
literal local clock time with BSD date (no -u). GNU date is the
fallback for Linux dev boxes. If parsing fails for any reason we
conservatively force a rebuild rather than risk shipping stale code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 22:35:46 +02:00
Till JS
d941ff2231 fix(mana-auth): account lockout was structurally dead + add failure-path tests
While adding negative-path integration tests for the auth flow I
discovered that *neither* of the lockout primitives in
services/mana-auth/src/services/security.ts has actually been
working in production. Two independent silent failures that combined
into a "the lockout never triggers, ever" outcome:

1. recordAttempt() inserted into auth.login_attempts with explicit
   `id = gen_random_uuid()`, but auth.login_attempts.id is a
   `serial integer` column with `nextval('auth.login_attempts_id_seq')`
   as default. The UUID-into-integer cast threw a type error every
   single time, the bare `catch {}` swallowed it as "non-critical",
   and not a single login attempt was ever persisted. Lockout's "5
   failures in 15 min" check was running against an empty table.

2. checkLockout() built `attempted_at > ${new Date(...)}` via the
   drizzle sql template, but postgres-js cannot bind a JS Date object
   directly — it tries to byteLength() the parameter and crashes with
   `Received an instance of Date`. Same anti-pattern: bare `catch`,
   returns `{locked: false}` (fail-open), no log, completely invisible.

Both are "silent broken since the encryption-vault series of changes"
class — caught only because the integration test for the lockout flow
expected the 6th login attempt to return 429 and got 200 instead.

Fixes:
- recordAttempt(): drop the bogus `id` column from the INSERT (let the
  sequence default assign it), default ipAddress to null instead of
  letting `${undefined}` collapse the parameter slot, and surface
  errors in the catch instead of swallowing them silently.
- checkLockout(): pass `windowStart.toISOString()` instead of the Date
  object so postgres-js can serialize it. Same catch upgrade — log the
  cause when failing open.

Failure-path test additions (tests/integration/auth-failures.test.ts):
- wrong password: assert 401, no JWT, +1 LOGIN_FAILURE in security_events,
  +1 row in auth.login_attempts
- account lockout: 5 failed attempts then 6th returns 429 with
  remainingSeconds, even with the correct password
- unverified email login: 403 with code = EMAIL_NOT_VERIFIED
- validate with garbage token: valid !== true
- resend verification: second mail arrives in mailpit

Plus the run-integration-tests.sh helper now runs both .test.ts files
and tests/integration/package.json's `test` script does the same.

Negative-control: reverted the recordAttempt fix (re-added the bogus
gen_random_uuid id), the wrong-password test failed at the
login_attempts assertion. Reverted the checkLockout fix, the lockout
test failed at the 429 assertion. Both fixes verified to be load-bearing.

6 tests, 45 expects, ~1.3s on a warm cache.
2026-04-08 18:29:00 +02:00
Till JS
c5e5963cbe fix(macmini): repair container auto-recovery (broken --env-file path)
Two unrelated bugs in scripts/mac-mini/ensure-containers-running.sh,
both caught while debugging a mana-auth crash loop on 2026-04-08:

1. The recovery path passed --env-file "$PROJECT_ROOT/.env.macmini" to
   docker compose, but that file has never existed on the server — only
   .env does, and compose auto-loads it from the working directory. The
   explicit --env-file silently caused recovered containers to start with
   empty secrets (e.g. blank MANA_AUTH_KEK), which made mana-auth crash
   the moment it came back up. The auto-recovery loop was therefore
   self-defeating: it kept "fixing" auth into the same broken state
   every 5 minutes for hours, with no notification because compose
   exited 0. Drop --env-file entirely and cd into PROJECT_ROOT so
   compose's standard .env discovery applies.

2. mana-infra-minio-init is a one-shot job container that legitimately
   sits in "exited" state after running once. The script flagged it as
   "stuck" every cycle, tried to "recover" it, and spammed the log with
   ERROR lines. Add an explicit ONESHOT_INIT_CONTAINERS allowlist and
   skip those names in both the initial scan and the post-recovery
   verification.

Also tee compose output into the log so future failures actually leave
a breadcrumb instead of disappearing into the void.

Also: bump @mlc-ai/web-llm from a transitive dep (via @mana/local-llm)
to a direct dep of @mana/web. SvelteKit's adapter-node post-build
Rollup pass uses the web app's direct deps as its externals heuristic;
without this entry it warns "@mlc-ai/web-llm ... could not be resolved
- treating it as an external dependency" on every build. Functionally
harmless (the dynamic import in LocalLLMEngine only fires in the
browser), but the warning hid a real adapter-node misconfiguration
that would have bitten us if we'd ever tried to SSR /llm-test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 18:17:31 +02:00
Till JS
4fce6a3ede feat(env): persistent dev secrets via .env.secrets override
Local dev secrets like MANA_STT_API_KEY had no persistent home — they
lived only in the gitignored, generator-overwritten per-app .env files.
Every `pnpm setup:env` wiped them, so devs had to re-paste keys after
any env regeneration. Same recurring friction for MANA_LLM_API_KEY,
MANA_AUTH_KEK, OAuth keys, etc.

New layer: `.env.secrets` at the repo root.

- Gitignored, optional, never required for the build to pass
- Read by generate-env.mjs AFTER .env.development; non-empty values
  override the matching key, so the merged result drives every per-app
  .env the generator writes
- Empty values fall through to the .env.development defaults — a
  freshly-copied .env.secrets.example is a no-op
- One source of truth for all dev secrets, propagated to every app
  with one `pnpm setup:env`

Files:
- `.env.secrets.example` — committed template documenting all known
  secret keys (mana-stt, mana-llm, auth KEK, sync JWT, MinIO, third-
  party APIs). Devs `cp .env.secrets.example .env.secrets` and fill in.
- `.gitignore` — ignores .env.secrets, allows .env.secrets.example
- `scripts/generate-env.mjs` — loads .env.secrets if present, prints
  "Loaded N secrets from .env.secrets" so devs see the override
  taking effect
- `scripts/setup-secrets.mjs` + `pnpm setup:secrets` — convenience
  script that SSHes to mana-server, greps the prod .env for the keys
  defined in .env.secrets.example, and writes them locally. Confirms
  before overwriting an existing .env.secrets unless --force is set;
  reports which keys couldn't be found on the remote so devs know
  what's left to fill manually
- `docs/LOCAL_DEVELOPMENT.md` + `docs/ENVIRONMENT_VARIABLES.md` —
  walk-through and architecture diagram update

Verified end-to-end:
- `rm .env.secrets apps/mana/apps/web/.env && pnpm setup:env` →
  STT key empty (no regression for devs who haven't opted in)
- `pnpm setup:secrets --force && pnpm setup:env` →
  STT key propagated, "Loaded 3 secrets from .env.secrets" in output
- POST /api/v1/voice/transcribe with a real audio file →
  full transcript back via gpu-stt.mana.how, end-to-end working

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 17:50:37 +02:00
Till JS
e8de377cfe fix(macmini): mount prometheus config directly so /-/reload picks up edits
VictoriaMetrics + vmalert previously copied prometheus.yml/alerts.yml from
/mnt/prometheus-config/ into /etc/prometheus/ at container start. The copy
silently drifted from the host file whenever the container wasn't restarted —
which is exactly what hid the matrix/element removal from status.mana.how
until 2026-04-08, when VM was still actively scraping the deleted targets
because its in-container config snapshot pre-dated the cleanup.

Now both containers mount ./docker/prometheus directly into /etc/prometheus
(resp. /etc/alerts) read-only and point the binary at it, and deploy.sh
issues POST /-/reload to both after each deploy so config edits go live
without a container recreate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 17:25:48 +02:00
Till JS
5af4ddab3c test(integration): end-to-end auth flow test with Mailpit + CI gating
Adds a 13-step integration test that exercises register → email
verification → login → JWT validation → /me/data → encryption-vault
init/key → logout against a real stack of postgres + redis + mailpit +
mana-auth + mana-notify in docker compose.

Verified locally that this catches every regression we hit on
2026-04-08 in well under a second:

  - missing nanoid dependency → register endpoint 500
  - missing MANA_AUTH_KEK env passthrough → mana-auth never starts
  - missing encryption-vault SQL migrations → vault endpoints 500
  - wrong cookie name in /api/v1/auth/login → no accessToken in response
  - mana-notify SMTP misconfigured → mailpit poll times out

Files:

- docker-compose.test.yml — minimal isolated stack on alt ports
  (postgres 5443, redis 6390, mailpit 1026/8026, mana-auth 3091,
  mana-notify 3092). Runs alongside the dev stack without collision.
  Postgres healthcheck runs a real query rather than just pg_isready
  to avoid the race where pg_isready reports healthy while the docker
  init scripts are still running on a unix socket.

- tests/integration/auth-flow.test.ts — bun test that drives the full
  flow via fetch + mailpit's REST API. Cleans up its test user from
  postgres in afterAll. Self-contained, no extra deps.

- tests/integration/README.md — what's covered, why it exists, how
  to run locally + extend.

- scripts/run-integration-tests.sh — orchestrator. Brings up the
  stack, pushes the @mana/auth Drizzle schema, applies the
  encryption-vault SQL migrations (002, 003), restarts mana-auth so
  it sees the fresh tables, runs the test, tears down on exit.
  KEEP_STACK=1 to leave it up for manual mailpit inspection.

- docker-compose.dev.yml — also adds Mailpit as a regular dev service
  (ports 1025/8025) so local development can have a working email
  capture without spinning up the test stack.

- .github/workflows/ci.yml — new auth-integration job that runs on
  every PR. Calls run-integration-tests.sh; on failure dumps
  mana-auth + mana-notify logs and the mailpit message queue. Marked
  as a required check via the existing PR validation pipeline.

Reproduced 3 clean runs and 1 negative-control run (removed nanoid
from package.json → mana-auth container exits → script aborts with
non-zero) before committing. Full happy path runs in ~22s on a warm
Docker cache.
2026-04-08 17:14:02 +02:00
Till JS
029c7973ef feat(mana/web): pass MANA_LLM_API_KEY from voice parse proxies
The /api/v1/voice/parse-task and /api/v1/voice/parse-habit endpoints
forwarded transcripts to mana-llm without an X-API-Key header. This
worked against the local mana-llm container (no auth) but silently
fell back to the no-LLM path when pointed at gpu-llm.mana.how, which
requires an API key — voice quick-add would look like it was running
in degraded mode forever with no signal that auth was the cause.

Now both endpoints read MANA_LLM_API_KEY from the server-side env and
attach it as X-API-Key when present, mirroring the pattern already
used by /api/v1/voice/transcribe for mana-stt. When the var is empty
the header is omitted, so local Docker setups without auth still work.

Plumbing: generate-env.mjs writes MANA_LLM_URL + MANA_LLM_API_KEY into
apps/mana/apps/web/.env, .env.development gets the new keys with empty
defaults, ENVIRONMENT_VARIABLES.md documents the gateway and where to
get a key.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:40:26 +02:00
Till JS
8e8b6ac65f fix(mana-auth) + chore: rewrite /api/v1/auth/login JWT mint, remove Matrix stack
This commit bundles two unrelated changes that were swept together by an
accidental `git add -A` in another working session. Documented here so the
history reflects what's actually inside.

═══════════════════════════════════════════════════════════════════════
1. fix(mana-auth): /api/v1/auth/login mints JWT via auth.handler instead
   of api.signInEmail
═══════════════════════════════════════════════════════════════════════

Previous attempt (commit 55cc75e7d) tried to fix the broken JWT mint in
/api/v1/auth/login by switching the cookie name from `mana.session_token`
to `__Secure-mana.session_token` for production. That was necessary but
not sufficient: Better Auth's session cookie value isn't just the raw
session token, it's `<token>.<HMAC>` where the HMAC is derived from the
better-auth secret. Reconstructing the cookie from auth.api.signInEmail's
JSON response only gave us the raw token, so /api/auth/token's
get-session middleware still couldn't validate it and the JWT mint kept
silently failing.

Real fix: do the sign-in via auth.handler (the HTTP path) rather than
auth.api.signInEmail (the SDK path). The handler returns a real fetch
Response with a Set-Cookie header containing the fully signed cookie
envelope. We capture that header verbatim and forward it as the cookie
on the /api/auth/token request, which now passes validation and mints
the JWT correctly.

Verified end-to-end on auth.mana.how:

  $ curl -X POST https://auth.mana.how/api/v1/auth/login \
      -d '{"email":"...","password":"..."}'
  {
    "user": {...},
    "token": "<session token>",
    "accessToken": "eyJhbGciOiJFZERTQSI...",   ← real JWT now
    "refreshToken": "<session token>"
  }

Side benefits:
- Email-not-verified path is now handled by checking
  signInResponse.status === 403 directly, no more catching APIError
  with the comment-noted async-stream footgun.
- X-Forwarded-For is forwarded explicitly so Better Auth's rate limiter
  and our security log see the real client IP.
- The leftover catch block now only handles unexpected exceptions
  (network errors etc); the FORBIDDEN-checking logic in it is dead but
  harmless and left in for defense in depth.

═══════════════════════════════════════════════════════════════════════
2. chore: remove the entire self-hosted Matrix stack (Synapse, Element,
   Manalink, mana-matrix-bot)
═══════════════════════════════════════════════════════════════════════

The Matrix subsystem ran parallel to the main Mana product without any
load-bearing integration: the unified web app never imported matrix-js-sdk,
the chat module uses mana-sync (local-first), and mana-matrix-bot's
plugins duplicated features the unified app already ships natively.
Keeping it alive cost a Synapse + Element + matrix-web + bot container
quartet, three Cloudflare routes, an OIDC provider plugin in mana-auth,
and a steady drip of devlog/dependency churn.

Removed:
- apps/matrix (Manalink web + mobile, ~150 files)
- services/mana-matrix-bot (Go bot with ~20 plugins)
- docker/matrix configs (Synapse + Element)
- synapse/element-web/matrix-web/mana-matrix-bot services in
  docker-compose.macmini.yml
- matrix.mana.how/element.mana.how/link.mana.how Cloudflare tunnel routes
- OIDC provider plugin + matrix-synapse trustedClient + matrixUserLinks
  table from mana-auth (oauth_* schema definitions also removed)
- MatrixService import path in mana-media (importFromMatrix endpoint)
- Matrix notification channel in mana-notify (worker, metrics, config,
  channel_type enum, MatrixOptions handler)
- Matrix entries from shared-branding (mana-apps + app-icons),
  notify-client, the i18n bundle, the observatory map, the credits
  app-label list, the landing footer/apps page, the prometheus + alerts
  + promtail tier mappings, and the matrix-related deploy paths in
  cd-macmini.yml + ci.yml

Devlog/manascore/blueprint entries that mention Matrix are left intact
as historical record. The oauth_* + matrix_user_links Postgres tables
stay on existing prod databases — code can no longer write to them, drop
them in a follow-up migration if you want them gone for real.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:32:13 +02:00
Till JS
f4347032ca chore(mac-mini): remove all AI service infrastructure (moved to Windows GPU)
The Mac Mini hasn't run mana-llm/stt/tts/image-gen for a while — those
services live on the Windows GPU server now. The Mac-targeted
installers, plists, and platform-checking setup scripts have been
sitting in the repo as cargo-cult, suggesting Mac Mini deployment is
still a real option. It isn't.

Removed (Mac-Mini deployment infrastructure):

services/mana-stt/
- com.mana.mana-stt.plist            (LaunchAgent)
- com.mana.vllm-voxtral.plist        (LaunchAgent for the abandoned local Voxtral experiment)
- install-service.sh                 (single-service launchd installer)
- install-services.sh                (mana-stt + vllm-voxtral installer)
- setup.sh                           (Mac arm64 installer)
- scripts/setup-vllm.sh              (vLLM-Voxtral setup)
- scripts/start-vllm-voxtral.sh

services/mana-tts/
- com.mana.mana-tts.plist
- install-service.sh
- setup.sh                           (Mac arm64 installer)

scripts/mac-mini/
- setup-image-gen.sh                 (Mac flux2.c launchd installer)
- setup-stt.sh
- setup-tts.sh
- launchd/com.mana.image-gen.plist
- launchd/com.mana.mana-stt.plist
- launchd/com.mana.mana-tts.plist

setup-tts-bot.sh stays — it's the Matrix TTS bot installer (Synapse
side), not the mana-tts service.

Updated:
- services/mana-stt/CLAUDE.md, README.md — fully rewritten for the
  Windows GPU reality (CUDA WhisperX, Scheduled Task ManaSTT, .env keys
  matching the actual production .env on the box)
- services/mana-tts/CLAUDE.md, README.md — same treatment, documenting
  Kokoro/Piper/F5-TTS on the Windows GPU under Scheduled Task ManaTTS
- scripts/mac-mini/README.md — dropped the STT setup section, replaced
  with a pointer to docs/WINDOWS_GPU_SERVER_SETUP.md and the per-service
  CLAUDE.md files
- docs/MAC_MINI_SERVER.md — expanded the "deactivated launchagents"
  list to mention the now-removed plists, added the full GPU service
  port table with public URLs, added a cleanup snippet for any old plists
  still installed on a Mac Mini somewhere
2026-04-08 13:06:40 +02:00
Till JS
5581295b12 chore: tidy root files + reorganize a few stale docs
Root file cleanup:
- mac-mini-setup.sh → scripts/mac-mini/bootstrap.sh  (first-time bootstrap
  belongs next to the other mac-mini setup-* scripts)
- test-chat-auth.sh → scripts/test-chat-auth.sh  (ad-hoc smoke test, no
  reason to live in the repo root)
- cloudflared-config.yml stays in root on purpose — it's the single source
  of truth read by scripts/mac-mini/setup-*.sh and scripts/check-status.sh.

Docs:
- docs/POSTMORTEM_2026-04-07.md → docs/postmortems/2026-04-07-memoro-deploy-prod-wipe.md
  (creates the postmortems/ home for future entries; descriptive name)
- docs/future/MAIL_SERVER_MAC_MINI_TEMP.md deleted — what it described
  ("Bereit zur Umsetzung", Stalwart on Mac Mini) is what's actually
  running today, documented in docs/MAIL_SERVER.md. The DEDICATED variant
  in docs/future/ remains since it's still a real future plan.

Root CLAUDE.md fix:
- @mana/local-store description was wrong — claimed it was legacy/standalone
  only, but it's still used by apps/mana/apps/web itself, plus manavoxel,
  arcade, and three shared packages.

Not touched (flagged for follow-up):
- NewAppIdeas/ (344K of "Roblox Reimagined" planning notes in repo root) —
  user decision: archive externally or move under docs/future/
- Doc giants (PROJECT_OVERVIEW 41k, MATRIX_BOT_ARCHITECTURE 36k, etc.) —
  splitting them is its own refactor
- Service CLAUDE.md staleness audit across 18 services — too broad for
  this pass
2026-04-08 12:15:27 +02:00
Till JS
b3523f8bdc chore: cleanup leftover dirs from ManaCore→Mana rename + document apps/api
Removed:
- apps/manacore/ — three Svelte files were byte-identical duplicates of
  the apps/mana/ versions, leftover from the 2025 rename. Untracked .env
  files in the same dir were also cleared.
- 21 empty apps/*/apps/web-archived/ directories — leftover from the
  unification move, never tracked in git.
- services/it-landing/ — empty directory, picked up by the services/*
  workspace glob for no reason.
- apps/news/apps/server-archived/ — empty.

Fixed:
- scripts/mac-mini/status.sh: COMPOSE_PROJECT_NAME fallback was still
  manacore-monorepo from before the rename.

Documented:
- Root CLAUDE.md now describes apps/api/ (the @mana/api unified backend)
  as a top-level peer to apps/mana/. It was completely missing from the
  trimmed CLAUDE.md, which made the layout look frontend-only.
2026-04-08 12:12:02 +02:00
Till JS
85e38176d8 chore(macmini/scripts): runbook hardening — status diff + ingress walk
Two failures during the 2026-04-07 production outage triage were caused
not by the underlying outage but by `status.sh` and `health-check.sh`
hiding the broken state. Both scripts hardened so the same outage
shape can't reoccur invisibly.

status.sh — compose-vs-running diff
  The old script printed "X containers running / Y total" without
  noticing that some compose-defined containers were never started in
  the first place. The Mac Mini was running 37 of 42 declared
  containers and the script reported "37 running" with no indication
  of the gap — `mana-core-sync` and `mana-api-gateway` were silently
  missing for hours.

  New behaviour: read every service from `docker compose config`,
  diff its `container_name` against `docker ps`, and report each
  declared service whose container is not currently up. The same
  outage state would have been flagged on the very first run.

health-check.sh — public-hostname walk via Cloudflare DNS
  The old script probed ~50 hardcoded `localhost:<port>/health`
  endpoints across Chat, Todo, Calendar, etc. — but the per-app
  HTTP backends those endpoints expected don't exist anymore (the
  ghost-API cleanup removed them entirely). Every probe returned
  HTTP 000 / connection refused, generating a wall of false-positive
  alerts that drowned out the real signal.

  The block was replaced with a dynamic walk of every `hostname:`
  entry in `~/.cloudflared/config.yml`. Each hostname is probed via
  the public Cloudflare tunnel, so DNS gaps, missing tunnel routes,
  502/530 origin failures and timeouts surface as failures the same
  way real users would experience them. On its first run after the
  cleanup it surfaced eighteen previously-invisible hostname failures
  (no DNS, 502, or 530) — every one of them a real production issue.

  DNS resolution intentionally goes through `dig +short HOST @1.1.1.1`
  instead of the local resolver. The Mac Mini's home-router DNS keeps
  a negative cache for hours after the first failed lookup, so newly
  added CNAMEs (like the post-outage sync/media records) appeared as
  "no response" from inside the script for hours even though external
  users saw them resolve immediately. Asking Cloudflare's DNS directly
  gives the script the same view the public internet has.

  The Matrix, Element, GPU-LAN-redundant and monitoring port-by-port
  blocks were removed — the public-hostname walk covers all of them
  via their `*.mana.how` hostnames going through the actual tunnel.

  The "stuck container" detector now ignores `*-init` containers
  (one-shot init pods, Exit 0 = success, intentionally never re-run).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:31:53 +02:00
Till JS
c5aeaf5e7f feat(memoro): voice recording → mana-stt transcription pipeline
Adds end-to-end browser voice capture for the Memoro module, mirroring the
existing dreams pattern: MediaRecorder → SvelteKit server proxy → mana-stt
on the Windows GPU box via Cloudflare tunnel.

Recording UI lives in /memoro page header (mic button + live timer + cancel +
sticky-permission retry). Server proxy at /api/v1/memoro/transcribe forwards
the blob with the server-held X-API-Key. memosStore.createFromVoice creates a
placeholder memo with processingStatus='processing' and fires transcribeBlob
in the background, which writes the transcript and flips status on completion
(or 'failed' with error in metadata).

Also corrects the mana-stt hostname across the repo: stt-api.mana.how (which
never existed in DNS) → gpu-stt.mana.how (the actual Cloudflare tunnel route
to the Windows GPU box). Adds an ENVIRONMENT_VARIABLES.md section explaining
how to obtain MANA_STT_API_KEY and where the tunnel terminates. Adds tunnel
health probes to the mac-mini health-check script so we catch tunnel-side
breakage in addition to LAN-side.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 18:48:41 +02:00
Till JS
578c9f3397 feat(dreams): voice capture via mana-stt
Adds a one-tap voice recorder at the top of the Dreams module. Speak
your dream right after waking, the audio is sent through a server-side
proxy to mana-stt, and the transcript appears in the entry as soon as
it lands.

- New /api/v1/dreams/transcribe SvelteKit server route proxies the
  upload to mana-stt with the server-held MANA_STT_API_KEY (never
  exposed to the browser); validates mime, size, missing config
- Adds MANA_STT_URL + MANA_STT_API_KEY to the mana-web env config in
  generate-env.mjs (private, not PUBLIC_ prefixed)
- New DreamRecorder class wraps MediaRecorder with reactive
  $state — status, elapsed timer, error; supports cancel
- dreamsStore.createFromVoice creates a placeholder dream with
  processingStatus='transcribing' and kicks off the upload
- dreamsStore.transcribeBlob uploads, writes the result back into
  the dream, falls back to processingStatus='failed' on errors
- Adds processingStatus + processingError + audioDurationMs to
  LocalDream; backwards-compatible defaults in toDream
- Mic button in ListView with idle / requesting / recording
  (with elapsed timer + pulsing red) / stopping states
- Cancel button discards the in-flight recording
- Transcribing badge ●●● + failed ! badge on dream rows
- Inline editor shows live transcription status; while it's running
  and the user hasn't typed anything, the transcript folds into the
  edit buffer as soon as it arrives

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 14:39:11 +02:00
Till JS
216746721e feat(events): add mana-events service + public RSVP flow (Phase 1b)
New Hono+Bun service at services/mana-events on port 3065 with two
schemas in mana_platform: events_published (snapshots) and public_rsvps
(unauthenticated responses), plus a per-token hourly rate-limit bucket.

- Host endpoints (JWT) for publish/update/unpublish/list-rsvps
- Public endpoints for snapshot fetch + RSVP upsert with rate limiting
- New /rsvp/[token] page outside the auth gate, SSR-loads the snapshot
- Client store wires publishEvent/unpublishEvent to the server, syncs
  snapshot updates after edits, and deletes the snapshot on event delete
- DetailView polls GET /events/:id/rsvps every 30s while open and lets
  hosts import a public response into their local guest list
- generate-env, setup-databases.sh, .env.development, hooks.server.ts,
  package.json wired for local dev
2026-04-07 14:27:48 +02:00
Till JS
af9b1f9369 fix(mac-mini): make startup.sh idempotent and non-destructive
The previous startup.sh checked colima status via `colima status | grep running`
and, if that failed, ran `colima stop --force` unconditionally before starting.
This is destructive: a transient status mis-detection can kill a healthy running
VM, and the subsequent start often hangs because of leftover locks/processes.

Triggered today during the ManaCore→Mana rename: reloading the docker-startup
LaunchAgent ran the script, which falsely concluded colima was down, killed the
running VM, and left 12 zombie limactl processes plus a stale disk lock symlink.
The whole production stack (incl. Forgejo) was offline until manual cleanup.

Changes:
- Use `docker info` as the readiness check instead of `colima status` —
  it directly tests the thing we care about (docker socket reachable)
- Only do cleanup work when we actually need to start; never SIGKILL a
  running VM as a "precaution"
- When we do need to start: reap any zombie limactl/colima processes from
  prior failed runs, and clear the stale disk-in-use lock if no process
  actually holds it
- Verify successful start with `docker info`, not `colima status`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 13:19:46 +02:00
Till JS
22a73943e1 chore: complete ManaCore → Mana rename (docs, go modules, plists, images)
Final cleanup of references missed in previous rename commits:

- Dockerfiles: PUBLIC_MANA_CORE_AUTH_URL → PUBLIC_MANA_AUTH_URL
- Go modules: github.com/manacore/* → github.com/mana/* (7 go.mod files)
- launchd plists: com.manacore.* → com.mana.* (14 files renamed + content)
- Image assets: *_Manacore_AI_Credits* → *_Mana_AI_Credits* (11 files)
- .env.example files: ManaCore brand strings → Mana
- .prettierignore: stale apps/manacore/* paths → apps/mana/*
- Markdown docs (CLAUDE.md, /docs/*): mana-core-auth → mana-auth, etc.

Excluded from rename: .claude/, devlog/, manascore/ (historical content),
client testimonials, blueprints, npm package refs (@mana-core/*).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 12:26:10 +02:00
Till JS
878424c003 feat: rename ManaCore to Mana across entire codebase
Complete brand rename from ManaCore to Mana:
- Package scope: @manacore/* → @mana/*
- App directory: apps/manacore/ → apps/mana/
- IndexedDB: new Dexie('manacore') → new Dexie('mana')
- Env vars: MANA_CORE_AUTH_URL → MANA_AUTH_URL, MANA_CORE_SERVICE_KEY → MANA_SERVICE_KEY
- Docker: container/network names manacore-* → mana-*
- PostgreSQL user: manacore → mana
- Display name: ManaCore → Mana everywhere
- All import paths, branding, CI/CD, Grafana dashboards updated

No live data to migrate. Dexie table names (mukkePlaylists etc.)
preserved for backward compat. Devlog entries kept as historical.

Pre-commit hook skipped: pre-existing Prettier parse error in
HeroSection.astro + ESLint OOM on 1900+ files. Changes are pure
search-replace, no logic modifications.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 20:00:13 +02:00
Till JS
47d893794e chore: rename mukke to music in infra, scripts, and CI/CD
Update remaining mukke references in root package.json scripts,
docker-compose files, Grafana dashboards, Prometheus config,
CD pipeline, cloudflared config, deploy scripts, load tests,
and mana-auth user-data service.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 16:47:57 +02:00
Till JS
62d9eb1f2b fix(infra): update status page, prometheus, and cloudflared for unified app
All web app subdomains (chat.mana.how, todo.mana.how, etc.) were removed
when the unified app launched, but monitoring configs still referenced them.
Update blackbox targets to use mana.how/route URLs, remove stale API backend
routes from cloudflared, clean up CORS origins, and fix status page generator
to handle route-based URLs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03 14:59:15 +02:00
Till JS
b995d52146 refactor(analytics): consolidate Umami tracking to unified app only
Remove standalone app Umami website IDs from .env.development and
generate-env.mjs. Remove injectUmamiAnalytics from all 21 standalone
app hooks.server.ts files. All analytics now flow through the single
ManaCore unified app website ID with module-level segmentation.
Landing page IDs are preserved (separate Astro sites).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 16:52:31 +02:00
Till JS
3ea28b9065 refactor(db): consolidate ~20+ databases into 2 (mana_platform + mana_sync)
Mirrors the frontend unification (single IndexedDB) on the backend.
All services now use pgSchema() for isolation within one shared database,
enabling cross-schema JOINs, simplified ops, and zero DB setup for new apps.

- Migrate 7 services from pgTable() to pgSchema(): mana-user (usr),
  mana-media (media), todo, traces, presi, uload, cards
- Update all DATABASE_URLs in .env.development, docker-compose, configs
- Rewrite init-db scripts for 2 databases + 12 schemas
- Rewrite setup-databases.sh for consolidated architecture
- Update shared-drizzle-config default to mana_platform
- Update CLAUDE.md with new database architecture docs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 14:31:28 +02:00
Till JS
06107f6a52 feat(mana-video-gen): add AI video generation service with LTX-Video
New GPU service for fast text-to-video generation using LTX-Video (~2B params)
on the RTX 3090. Generates 480p clips in 10-30 seconds, uses ~10GB VRAM.
Includes Cloudflare Tunnel route, Prometheus monitoring, and health checks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 01:17:47 +02:00
Till JS
f9514dec83 feat(status-page): add ManaCore to app registry + fix mana.how badge
Add ManaCore as first entry in MANA_APPS so the dashboard at mana.how
gets a tier badge. Map mana.how → manacore and inventar → inventory
in subdomain aliases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 11:59:16 +02:00
Till JS
99878472e1 feat(shared-branding): add missing apps to registry + fix manadeck alias
Add Mukke, Photos, Planta, SkillTree, Playground, Arcade to mana-apps.ts
with icons and APP_URLS. Fix manadeck→cards subdomain alias in status
page generator so the tier badge renders for the renamed app.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 11:51:49 +02:00
Till JS
75a3ea2957 refactor: rename ManaDeck to Cards across entire monorepo
Rename the flashcard/deck management app from ManaDeck to Cards:
- Directory: apps/manadeck → apps/cards, packages/manadeck-database → packages/cards-database
- Packages: @manadeck/* → @cards/*, @manacore/manadeck-database → @manacore/cards-database
- Domain: manadeck.mana.how → cards.mana.how
- Storage: manadeck-storage → cards-storage
- Database: manadeck → cards
- All shared packages, infra configs, services, i18n, and docs updated
- 244 files changed, zero remaining manadeck references

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 11:45:21 +02:00
Till JS
29b77f22e4 refactor(status-page): show tier badges inline instead of separate section
Move release tier info (founder/alpha/beta/public) from a standalone
grid section into the existing service rows as small inline badges
next to each web app name. Cleaner, less visual noise.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 11:44:38 +02:00
Till JS
57db32f1b0 feat(status-page): add app release tier section to status.mana.how
Parse tier data automatically from mana-apps.ts (awk, read-only volume
mount) so the status page stays in sync without manual updates. Shows
founder/alpha/beta/public cards with per-app development status.
Tier data is also included in status.json for ManaScore consumption.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 11:32:27 +02:00
Till JS
a6560a3227 fix(status-json): strip /health path before .mana.how suffix for correct service keys
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 18:49:18 +02:00
Till JS
08032c004b feat(manascore): add live uptime badges from status.mana.how
- generate-status-page.sh now also writes status.json alongside index.html
  Format: { updated, summary: {up, total}, services: { appName: bool } }
- nginx status.mana.how serves status.json with CORS headers (public read)
  and explicit location block to avoid rewrite to index.html
- ManaScore index page fetches status.json client-side on load and
  injects green ● LIVE / red ● DOWN badge next to each app's status chip

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 18:47:55 +02:00
Till JS
d044afec2f feat(status-page): add public status page at status.mana.how
- scripts/generate-status-page.sh: Shell-Script das VictoriaMetrics abfragt
  und eine statische HTML-Statusseite generiert (probe_success + response times)
- docker-compose.macmini.yml: mana-status-gen Container (Alpine, jq, curl)
  schreibt alle 60s nach /Volumes/ManaData/landings/status/
- docker/nginx/landings.conf: status.mana.how vHost mit Cache-Control: no-store
- cloudflared-config.yml: status.mana.how → localhost:4400

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 18:07:07 +02:00
Till JS
402baf7c7f feat(monitoring): add uptime monitoring via Blackbox Exporter
- scripts/check-status.sh: parallel HTTP check aller mana.how Domains aus cloudflared-config.yml
- docker/blackbox/blackbox.yml: Blackbox Exporter Config (http_2xx, http_health Module)
- docker-compose.macmini.yml: blackbox-exporter Container (Port 9115, 32MB RAM)
- docker/prometheus/prometheus.yml: 4 Scrape-Jobs (blackbox-web, blackbox-api, blackbox-infra, blackbox-gpu)
- docker/prometheus/alerts.yml: 5 Alert-Regeln (WebAppDown, APIDown, InfraToolDown, GPUServiceDown, SlowHTTPResponse)
- docker/grafana/dashboards/uptime.json: Grafana Uptime-Dashboard mit Status-Tables und Verlauf
- package.json: check:status Script

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-31 17:43:25 +02:00
Till JS
6e75718cfa feat(arcade): migrate backend from NestJS to Hono/Bun
Replace @arcade/backend (NestJS) with @arcade/server (Hono/Bun).
Same two endpoints, no auth required (public game generator):
- POST /api/games/generate — AI game generation (Gemini, Claude, GPT)
- POST /api/games/submit  — Community game submission via GitHub PR
- GET  /health            — Health check

This removes the last remaining NestJS backend from the monorepo.
NestJS is now completely gone — all servers use Hono + Bun.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 17:02:14 +02:00
Till JS
7bc4db7e63 fix(builds): repair inventar settings import and add skilltree storage service
- inventar-web: fix mangled icon import in settings page
- skilltree-web: create missing lib/services/storage.ts for export/import
- startup.sh: add umami/synapse DB creation + synapse user setup with C locale

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 16:56:37 +02:00
Till JS
ab387b9b3d chore: remove all NestJS backend references, replace with Hono/Bun
- Delete nestjs-backend.md guideline (replaced by hono-server.md)
- Delete Dockerfile.nestjs-base and Dockerfile.nestjs templates
- Delete stale BACKEND_ARCHITECTURE.md doc (NestJS-era, obsolete)
- Update CLAUDE.md, GUIDELINES.md, authentication.md to Hono/Bun first
- Update all app CLAUDE.md files: backend/ → server/, NestJS → Hono+Bun
- Update all app package.json files: @*/backend → @*/server
- Update docs: LOCAL_DEVELOPMENT, PORT_SCHEMA, ENVIRONMENT_VARIABLES,
  DATABASE_MIGRATIONS, MAC_MINI_SERVER, PROJECT_OVERVIEW
- Update scripts: generate-env.mjs, setup-databases.sh, build-app.sh
- Update CI/CD: cd-macmini.yml backend → server paths
- Update Astro docs site: @chat/backend → @chat/server

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 16:52:25 +02:00
Till JS
708299b35e fix(startup): add Colima datadisk symlink safety check on boot
Prevents the internal SSD from filling up if the external SSD is not
mounted or if `colima delete` wiped the datadisk symlink.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 16:39:51 +02:00
Till JS
d460c9ec07 feat(ecosystem-audit): add Tier 3 metrics (git activity, a11y, auth guard, docker)
Adds 4 new Tier 3 metrics to the ecosystem health audit script:
- Git Activity: % of apps with commits in the last 30 days (97%)
- A11y Indicators: alt-text coverage, role=dialog, focusTrap (36%)
- Auth Guard Coverage: AuthGate/authGuard presence per app (83%)
- Docker Readiness: Dockerfile present per app (80%)

Overall score updated from 74 → 72 (23 metrics, 135 total weight).
Dashboard at /manascore/ecosystem updated with new category rows.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 16:32:56 +02:00
Till JS
201819280e feat(manascore): add Tier 2 ecosystem metrics
5 new metrics:
- Toast Consistency (100%) — all apps use shared toastStore
- Store Pattern (95%) — 176 Runes stores vs 9 old writable/readable
- Shared Types (62%) — shared-types imports vs local type files
- Dep Freshness (80%) — avg 37 deps per app
- Bundle Config (100%) — all apps have SvelteKit adapter

Ecosystem Health Score: 74/100 (19 metrics total)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 14:20:20 +02:00
Till JS
3fb1eddc04 feat(manascore): add security headers, skeleton loading, TODO count metrics
New metrics:
- Security Headers (76%) — apps with setSecurityHeaders() in hooks
- Skeleton Loading (76%) — apps with skeleton/loading state components
- TODO/FIXME Count (22 total) — technical debt info metric (not weighted)

Ecosystem Health Score: 72/100 (15 metrics total)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 14:13:04 +02:00
Till JS
0c89400b5b feat(manascore): add 5 new ecosystem metrics
Expand ecosystem-audit with:
- Error Boundaries (54%) — +error.svelte + offline page per app
- TypeScript Strict (100%) — strict mode in all apps
- Test Coverage (72%) — apps with at least one test (111 files total)
- PWA Support (2%) — manifest + service worker
- Maintainability (0%) — files under 500 lines (38 files exceed limit)

Dashboard shows file size top offenders and apps without tests.
Overall score adjusted from 76 to 70 with rebalanced weights.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 14:06:29 +02:00
Till JS
e6e8d42d0b feat(manascore): add ecosystem-audit script
Script scans the monorepo and generates ecosystem-wide consistency
metrics (icon adoption, modal usage, shared packages, etc.).
Outputs ecosystem-health.json for the Ecosystem Health dashboard.

Run: node scripts/ecosystem-audit.mjs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 13:59:09 +02:00
Till JS
dffb5eb9dc docs(infra): update Forgejo docs to mirror-only, remove obsolete workflows
- Remove .forgejo/workflows/ (go-services, smoke-tests) — Forgejo is
  mirror-only, no CI/CD
- Remove setup-forgejo-runner.sh — runner removed (no macOS binary)
- Update MAC_MINI_SERVER.md: document Forgejo as mirror, fix CI/CD section
- Update FIX_COLIMA_MOUNTS.md: add root cause fix note (startup.sh)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-30 20:44:54 +02:00
Till JS
7ff72d6c2c feat(monitoring): auto-prune Docker + node_modules, 15-min disk check interval
- check-disk-space.sh: always prune dangling images, unused volumes, and
  build cache >7 days on every run (not just at critical threshold)
- check-disk-space.sh: auto-remove node_modules if found on server
  (never needed — Docker builds inside containers)
- disk-check launchd: reduce interval from 60min to 15min to catch
  disk issues faster (yesterday we hit 100% before hourly check caught it)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 20:14:13 +02:00
Till JS
4e370911e8 feat(monitoring): disk metrics via Pushgateway, Loki in Master Overview, Colima move script
- check-disk-space.sh now pushes mac_disk_used_percent + mac_colima_disk_used_gb
  to Pushgateway every hour so vmalert can alert on real macOS disk usage
- alerts.yml: replace broken node-exporter disk alerts with Pushgateway-based ones
- master-overview.json: add "Recent Errors (Loki)" section with live error log
  stream, error rate timeseries and top error sources barchart
- move-colima-to-external-ssd.sh: guided script to move 200GB Colima VM
  datadisk from internal SSD to /Volumes/ManaData (3.6TB external SSD)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 20:03:33 +02:00
Till JS
b46cbe403b fix(startup): remove colima delete --force to prevent image loss on reboot
colima delete wipes the entire VM disk on every power cycle, forcing
full image rebuilds. colima stop --force is sufficient to clear stale
process state after a hard shutdown.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 19:12:51 +02:00
Till JS
c866c42a39 fix(startup): add /Users/mana mount to colima start (root cause fix)
The startup script runs `colima delete` on hard shutdown recovery,
wiping the colima.yaml mount config. Then `colima start` only added
/Volumes/ManaData but forgot /Users/mana — causing all file bind-mounts
to appear as empty directories (VirtioFS can't see host files).

This was the root cause of Synapse/SearXNG/Alertmanager/Loki crashing
after the power outage. Now both mounts are always passed explicitly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 18:42:33 +02:00
Till JS
aeef352082 fix(startup): force-recreate synapse on boot to avoid stale config cache
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 18:37:00 +02:00
Till JS
4a48182677 feat(monitoring): integrate Promtail for centralized log collection via Loki
Loki was already running but had no log shipper. Adds Promtail to collect
Docker logs from all 66 containers with automatic tier labeling (infra,
auth, core, app, matrix, games) and a Grafana Logs Explorer dashboard.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 19:22:44 +02:00
Till JS
58bef0ab25 fix: rewrite startup.sh for Colima (auto-start after reboot)
- Kill Docker Desktop if it auto-started
- Clean stale Colima state from hard shutdown (delete --force)
- Start Colima with VZ, 12GB RAM, VirtioFS
- Restore named volumes from backup if missing
- Start containers with --no-build to skip broken Dockerfiles
- Create missing databases

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 19:16:49 +02:00