Mirror of https://github.com/Memo-2023/mana-monorepo.git, synced 2026-05-14 22:21:10 +02:00
fix(macmini): blackbox-exporter uses 1.1.1.1/8.8.8.8 directly for DNS
Docker's embedded DNS resolver (127.0.0.11) forwards to the host resolver, which on the Mac Mini forwards to the home router's FRITZ!Box DNS. The router keeps a stale negative cache for hours after a hostname first fails to resolve, so any newly added Cloudflare CNAME (e.g. the GPU public hostnames recreated via the Cloudflare dashboard during the 2026-04-07 cleanup) appears as "no such host" to the blackbox probes for the entire negative-cache TTL, even though the hostname resolves fine via 1.1.1.1 directly the whole time.

Symptom before the fix:

  health-check.sh (uses dig @1.1.1.1)  → All services healthy ✅
  status.mana.how (via blackbox/VM)    → 4 GPU services down ❌

The two views contradicted each other in opposite directions: the public-facing status page reported four healthy services as down, while the operator runbook reported them as up. That is confusing, and exactly the kind of monitoring discrepancy a launch should not ship with.

Fix: pin the blackbox container to public DNS (Cloudflare + Google) in compose. Blackbox now resolves directly against 1.1.1.1, bypassing the home-router negative cache entirely. After the container was recreated, the four GPU probes flipped from probe_success=0 to probe_success=1 within one scrape interval, and status.mana.how went from 38/42 to 41/42 (only gpu-video remains down; LTX Video Gen is intentionally not deployed on the Windows GPU box yet).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
parent 24001e9545
commit 05ae348b12
1 changed file with 11 additions and 0 deletions
@@ -1411,6 +1411,17 @@ services:
     container_name: mana-mon-blackbox
     restart: always
     mem_limit: 128m
+    # Use Cloudflare + Google public resolvers instead of Docker's
+    # embedded DNS (127.0.0.11). Docker DNS forwards to the host
+    # resolver which forwards to the home router (FRITZ!Box), and the
+    # router keeps a stale negative cache for hours after a hostname
+    # first fails. New CNAMEs (e.g. fresh GPU public hostnames added
+    # via the Cloudflare dashboard) appear as "no such host" to the
+    # blackbox probes for the entire negative-cache TTL even though
+    # they resolve fine via 1.1.1.1 directly.
+    dns:
+      - 1.1.1.1
+      - 8.8.8.8
     command: ["--config.file=/etc/blackbox/blackbox.yml"]
     volumes:
       - ./docker/blackbox/blackbox.yml:/etc/blackbox/blackbox.yml:ro
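The resolver discrepancy described in the commit message can be checked from any host with a small probe. The sketch below is a hypothetical helper, not part of the repository, assuming Python 3 and outbound UDP port 53: `rcode_via` sends a raw RFC 1035 A-record query straight to a chosen nameserver, similar in spirit to the `dig @1.1.1.1` call in health-check.sh, while `resolves_via_system` goes through whatever resolver chain the process inherits (in the pre-fix setup above, that chain ended at the router's negative cache). All function names here are illustrative assumptions.

```python
import random
import socket
import struct


def build_query(hostname: str, txid: int) -> bytes:
    """Encode a minimal DNS query for an A record in RFC 1035 wire format."""
    # Header: ID, flags (0x0100 sets RD so the server recurses),
    # QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def rcode_via(server: str, hostname: str, timeout: float = 3.0) -> int:
    """Query `server` directly over UDP (bypassing the local resolver) and return the RCODE."""
    txid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname, txid), (server, 53))
        reply, _ = sock.recvfrom(512)
    rid, flags = struct.unpack("!HH", reply[:4])
    if rid != txid:
        raise RuntimeError("reply does not match our transaction ID")
    # RCODE is the low 4 bits of the flags word: 0 = NOERROR, 3 = NXDOMAIN.
    return flags & 0x000F


def resolves_via_system(hostname: str) -> bool:
    """Resolve through the process's default resolver chain (resolv.conf and friends)."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False
```

For a freshly added hostname, `rcode_via("1.1.1.1", name)` returning 0 while `resolves_via_system(name)` returns False on the same machine is consistent with a stale negative cache somewhere in the local resolver chain, which is the situation the `dns:` pin above works around.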
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue