Commit graph

2 commits

Author SHA1 Message Date
Till JS
cd888cd54a fix(gpu-box): drop gpu-promtail healthcheck — image has no curl/wget/nc
Promtail v3.0.0 ships a minimal alpine-ish image with only the
promtail binary. The original Mini compose's wget-based healthcheck
errored out with 'executable file not found' on every tick, marking
the container as 'unhealthy' for hours despite Loki actively
receiving logs from it. Restart-policy unless-stopped catches real
crashes anyway, so the healthcheck adds noise without value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 15:13:47 +02:00
Till JS
d8a35afd99 infra(gpu-box): commit GPU-Box compose to repo + Phase 2e docs
The GPU-Box stack has been carrying real production workload since
Phase 2c (monitoring) but only existed as a /srv/mana/docker-compose.gpu-box.yml
on the box itself. If the WSL filesystem dies, none of it is
reproducible. Bring the file into infrastructure/ as the source of
truth (live file on the box must be kept synchronous; manual rsync
for now since there's no CD into the GPU box).

Plus:
- infrastructure/.env.gpu-box.example as the secrets template
- infrastructure/README.md describing what runs there + how the
  Cloudflare-tunnel ingress is API-managed (not config.yml)
- .gitignore for the live infrastructure/.env.gpu-box copy
- MAC_MINI_SERVER.md status-page section now points at the GPU-Box
  setup instead of the long-stopped Mini container
- PLAN_OPTION_C.md: Phase 2e row + GPU-Box service tree update

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 13:28:49 +02:00