From 05ae348b127be5b00a121e30736f9b9e40a602ec Mon Sep 17 00:00:00 2001
From: Till JS
Date: Tue, 7 Apr 2026 23:47:57 +0200
Subject: [PATCH] fix(macmini): blackbox-exporter uses 1.1.1.1/8.8.8.8 directly for DNS
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Docker's embedded DNS resolver (127.0.0.11) forwards to the host
resolver, which on the Mac Mini forwards to the home router (a
FRITZ!Box). The router keeps a stale negative cache for hours after a
hostname first fails, so any newly added Cloudflare CNAME (e.g. the GPU
public hostnames recreated via the Cloudflare dashboard during the
2026-04-07 cleanup) appears as "no such host" to the blackbox probes
for the entire negative-cache TTL, even though the hostname resolves
fine via 1.1.1.1 directly the whole time.

Symptom before the fix:

  health-check.sh (uses dig @1.1.1.1)  → All services healthy ✅
  status.mana.how (via blackbox/VM)    → 4 GPU services down ❌

The two views contradicted each other: the public-facing status page
reported four services as down while the operator runbook reported
them as healthy. That is confusing, and exactly the kind of monitoring
discrepancy a launch should not ship with.

Fix: pin the blackbox container to public DNS (Cloudflare + Google) in
compose. Blackbox now resolves directly against 1.1.1.1 (with 8.8.8.8
as the second resolver), bypassing the home-router negative cache
entirely.

After recreating the container, three of the four GPU probes flipped
from probe_success=0 to probe_success=1 within one scrape interval, and
status.mana.how went from 38/42 to 41/42. Only gpu-video remains down:
LTX Video Gen is intentionally not deployed on the Windows GPU box yet.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 docker-compose.macmini.yml | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/docker-compose.macmini.yml b/docker-compose.macmini.yml
index d853c3038..39eb05e97 100644
--- a/docker-compose.macmini.yml
+++ b/docker-compose.macmini.yml
@@ -1411,6 +1411,17 @@ services:
     container_name: mana-mon-blackbox
     restart: always
     mem_limit: 128m
+    # Use Cloudflare + Google public resolvers instead of Docker's
+    # embedded DNS (127.0.0.11). Docker DNS forwards to the host
+    # resolver which forwards to the home router (FRITZ!Box), and the
+    # router keeps a stale negative cache for hours after a hostname
+    # first fails. New CNAMEs (e.g. fresh GPU public hostnames added
+    # via the Cloudflare dashboard) appear as "no such host" to the
+    # blackbox probes for the entire negative-cache TTL even though
+    # they resolve fine via 1.1.1.1 directly.
+    dns:
+      - 1.1.1.1
+      - 8.8.8.8
     command: ["--config.file=/etc/blackbox/blackbox.yml"]
     volumes:
       - ./docker/blackbox/blackbox.yml:/etc/blackbox/blackbox.yml:ro
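
Reviewer note: the discrepancy described in the commit message (one resolver answering "no such host" while 1.1.1.1 resolves the same name) can be checked without dig. The following is a minimal sketch, not part of this patch: it sends a plain A query over UDP to each resolver and reports the DNS response code. All function names are hypothetical, it uses only the Python standard library, and it assumes ASCII hostnames and resolvers reachable on UDP port 53.

```python
import secrets
import socket
import struct

# Subset of RFC 1035 response codes, enough to tell "resolves" from "no such host".
RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}


def build_query(hostname: str, query_id: int) -> bytes:
    """Encode a minimal DNS query for an A record in RFC 1035 wire format."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is length-prefixed; a zero byte terminates the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def parse_rcode(response: bytes) -> int:
    """Return the RCODE, i.e. the low 4 bits of the header flags word."""
    if len(response) < 12:
        raise ValueError("DNS response shorter than the 12-byte header")
    (flags,) = struct.unpack("!H", response[2:4])
    return flags & 0x000F


def rcode_name(rcode: int) -> str:
    """Map a numeric RCODE to a readable name, falling back to RCODE<n>."""
    return RCODE_NAMES.get(rcode, f"RCODE{rcode}")


def query_rcode(hostname: str, resolver: str, timeout: float = 3.0) -> int:
    """Send one A query to `resolver` over UDP and return the response RCODE."""
    query_id = secrets.randbelow(1 << 16)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname, query_id), (resolver, 53))
        response, _ = sock.recvfrom(512)
    # Reject answers that do not echo our query ID.
    if len(response) < 12 or struct.unpack("!H", response[:2])[0] != query_id:
        raise ValueError("response does not match the query ID")
    return parse_rcode(response)


def compare_resolvers(hostname: str, resolvers: list[str]) -> dict[str, str]:
    """Query every resolver for `hostname` and return {resolver: rcode name}."""
    return {r: rcode_name(query_rcode(hostname, r)) for r in resolvers}
```

For example, `compare_resolvers("gpu-example.mana.how", ["192.168.178.1", "1.1.1.1"])` would query the router and Cloudflare side by side. The hostname here is a placeholder, and 192.168.178.1 is only the FRITZ!Box factory-default address, assumed rather than taken from this setup. If the stale negative cache is the cause, the router would be expected to report NXDOMAIN while 1.1.1.1 reports NOERROR.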