managarten/docker
Till JS a55aae6cb5 chore(macmini): infra cleanup — compose env, blackbox mem, prometheus gpu probes
Three Mac Mini infrastructure follow-ups bundled:

1. docker-compose.macmini.yml — drop ghost backend env vars from
   the mana-app-web service (todo, calendar, contacts, chat, storage,
   cards, music, nutriphi `PUBLIC_*_API_URL{,_CLIENT}` plus the memoro
   server URLs). The matching consumers were removed in the earlier
   ghost-API cleanup commits, so for several deploys these env
   entries had been set in the running container with nothing
   reading them. Force-recreate mana-app-web after pulling this
   commit to pick up the slimmer env.

2. docker-compose.macmini.yml — bump `mana-mon-blackbox` mem_limit
   from 32m to 128m. blackbox-exporter v0.25 uses more than 32m
   under load and was OOM-restart-looping every ~90 seconds, which
   in turn left `status.mana.how` and the prometheus probe metrics
   stale (every other scrape window was being missed).

3. docker/prometheus/prometheus.yml — split `blackbox-gpu` into two
   jobs:
     - `blackbox-gpu` now probes `/health` via the http_health
       module, because the GPU services (whisper STT, FLUX image
       gen, Coqui TTS) return 401/404 on `/` by design (auth or
       API-only). The previous http_2xx-on-`/` probe was reporting
       all four as down even though they answered `/health` with
       200, which inflated the down count on status.mana.how.
     - `blackbox-gpu-root` keeps the http_2xx-on-`/` probe for
       Ollama, which has no `/health` endpoint but does answer
       2xx on its root.
   Both jobs share the same blackbox-exporter relabel rewrite so
   the targets are routed through the exporter container, not
   scraped directly by VictoriaMetrics.
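
   A minimal sketch of what the split in item 3 could look like in
   docker/prometheus/prometheus.yml. This is an illustration, not the
   committed config: the target hostnames and service ports are
   hypothetical placeholders, the exporter address assumes the
   `mana-mon-blackbox` container listens on blackbox-exporter's
   default port 9115, and 11434 is Ollama's default port. The
   http_health module is assumed to be defined in the blackbox
   exporter's own config.

```yaml
scrape_configs:
  # GPU services that answer 200 on /health but 401/404 on /
  - job_name: blackbox-gpu
    metrics_path: /probe
    params:
      module: [http_health]
    static_configs:
      - targets:
          # hypothetical host and ports, one entry per GPU service
          - http://gpu-box.example:8001/health
          - http://gpu-box.example:8002/health
    relabel_configs: &blackbox_relabel
      # pass the original target to the exporter as ?target=...
      - source_labels: [__address__]
        target_label: __param_target
      # keep the probed URL as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # scrape the exporter container instead of the target itself
      # (assumed address: container name plus default port)
      - target_label: __address__
        replacement: mana-mon-blackbox:9115

  # Ollama has no /health endpoint, so keep probing its root
  - job_name: blackbox-gpu-root
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          # hypothetical host, Ollama default port
          - http://gpu-box.example:11434/
    # same relabel rewrite as blackbox-gpu, reused via a YAML alias
    relabel_configs: *blackbox_relabel
```

   The YAML anchor is only one way to keep the two relabel blocks
   identical; writing the three rules out twice works the same way.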

Verified post-fix: status.mana.how reports 41/42 services up (only
`gpu-video` remains down — LTX Video Gen is intentionally not
deployed yet on the Windows GPU box).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:59:38 +02:00
alert-notifier feat: rename ManaCore to Mana across entire codebase 2026-04-05 20:00:13 +02:00
alertmanager feat: rename ManaCore to Mana across entire codebase 2026-04-05 20:00:13 +02:00
blackbox feat(monitoring): add uptime monitoring via Blackbox Exporter 2026-03-31 17:43:25 +02:00
grafana feat: rename ManaCore to Mana across entire codebase 2026-04-05 20:00:13 +02:00
init-db feat: rename ManaCore to Mana across entire codebase 2026-04-05 20:00:13 +02:00
loki feat(gpu-server): complete GPU server setup with AI services, monitoring, and public access 2026-03-27 21:35:30 +01:00
matrix feat: rename ManaCore to Mana across entire codebase 2026-04-05 20:00:13 +02:00
nginx refactor: rename ManaDeck to Cards across entire monorepo 2026-04-01 11:45:21 +02:00
postgres fix(infra): use postgres -c flags instead of config_file override 2026-03-24 11:42:42 +01:00
prometheus chore(macmini): infra cleanup — compose env, blackbox mem, prometheus gpu probes 2026-04-07 22:59:38 +02:00
promtail feat(monitoring): structured logging, Promtail alignment, GlitchTip config, status page 2026-04-02 17:23:52 +02:00
shared 🐛 fix(docker): add missing build-shared-packages.sh script for Docker builds 2025-12-25 20:51:15 +01:00
templates chore: remove all NestJS backend references, replace with Hono/Bun 2026-03-31 16:52:25 +02:00
Dockerfile.hono-server feat(infra): add docker-compose for new Hono services + DB init 2026-03-28 17:54:24 +01:00
Dockerfile.sveltekit-base feat: rename ManaCore to Mana across entire codebase 2026-04-05 20:00:13 +02:00