fix(mana-llm): route Ollama through gpu-proxy instead of LAN IP

The mana-service-llm container had OLLAMA_URL pointed at the GPU box's
LAN address (192.168.178.11:11434). On the Mac Mini host that route
works fine, but from inside any Colima container the entire
192.168.178.0/24 subnet gets synthesized RST — Colima's VM "claims"
the LAN range without being able to route to it, so every connect()
returns "Connection refused" before a packet ever leaves the box.

mana-llm started cleanly, reported the configured upstream as
"unhealthy", served an empty /v1/models list, and every chat
completion failed with "All connection attempts failed". The most
visible downstream effect: voice quick-add (parse-task, parse-habit)
silently degraded to its no-LLM fallback for everyone hitting the
local stack — same shape as a successful response, no error log,
just no enrichment.
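
For context on why the degradation was silent: the quick-add path follows the
usual try-the-LLM-then-fall-back pattern. The sketch below is a hypothetical
reconstruction (the real parse-task handler is not part of this commit, and
ParsedTask / llm.extract_task are made-up names), but it shows how a connection
failure yields a structurally valid, unenriched response with no error logged.

    from dataclasses import dataclass, field

    @dataclass
    class ParsedTask:                       # hypothetical response shape
        title: str
        due: str | None = None
        tags: list[str] = field(default_factory=list)

    async def parse_task(text: str, llm) -> ParsedTask:
        try:
            # With the broken OLLAMA_URL this raised
            # "All connection attempts failed" on every call.
            return await llm.extract_task(text)
        except Exception:
            # Fallback: same shape, no enrichment, nothing in the error log.
            return ParsedTask(title=text.strip())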

The Mac Mini already runs a gpu-proxy LaunchAgent
(com.mana.gpu-proxy, /Users/mana/gpu-proxy.py) that forwards
127.0.0.1:13434 → 192.168.178.11:11434 alongside several other GPU
service ports. Pointing OLLAMA_URL at host.docker.internal:13434 and
adding the host-gateway extra_hosts mapping puts mana-llm on the
already-running rail. Verified end-to-end: from inside the container,
GET http://host.docker.internal:13434/api/tags now returns the full
model list (gemma3:4b, gemma3:12b, gemma3:27b, qwen2.5-coder:14b,
nomic-embed-text).
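
For reference, the forwarding gpu-proxy.py performs can be pictured as a small
asyncio TCP relay. This is only an illustrative sketch under that assumption
(the real LaunchAgent script is not included in this commit and also bridges
other GPU service ports), using the one port pair named above.

    import asyncio

    UPSTREAM = ("192.168.178.11", 11434)   # Ollama on the GPU box
    LISTEN = ("127.0.0.1", 13434)          # reachable from containers via host-gateway

    async def pump(reader, writer):
        try:
            while data := await reader.read(65536):
                writer.write(data)
                await writer.drain()
        finally:
            writer.close()

    async def handle(client_reader, client_writer):
        upstream_reader, upstream_writer = await asyncio.open_connection(*UPSTREAM)
        # Copy bytes in both directions until either side closes.
        await asyncio.gather(
            pump(client_reader, upstream_writer),
            pump(upstream_reader, client_writer),
        )

    async def main():
        server = await asyncio.start_server(handle, *LISTEN)
        async with server:
            await server.serve_forever()

    asyncio.run(main())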

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Author: Till JS
Date:   2026-04-08 16:46:14 +02:00
Parent: da6e2f39da
Commit: 7f382138a1

@@ -952,10 +952,24 @@ services:
     depends_on:
       redis:
         condition: service_healthy
+    # Ollama lives on the Windows GPU box at 192.168.178.11:11434, but
+    # Colima containers can't reach the LAN range — the entire
+    # 192.168.178.0/24 subnet gets synthesized RST from inside any
+    # container, even though the macOS host routes there fine. The
+    # gpu-proxy LaunchAgent on the Mac Mini host (com.mana.gpu-proxy,
+    # see /Users/mana/gpu-proxy.py) bridges 127.0.0.1:13434 → GPU
+    # box's 11434, so we go through host.docker.internal:13434 to
+    # reach Ollama. Without this hop the local mana-llm starts
+    # cleanly but reports an empty model list and every chat
+    # completion fails with "All connection attempts failed", which
+    # cascades into voice quick-add silently degrading to its no-LLM
+    # fallback for everyone hitting the local stack.
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
     environment:
       PORT: 3025
       LOG_LEVEL: info
-      OLLAMA_URL: ${OLLAMA_URL:-http://192.168.178.11:11434}
+      OLLAMA_URL: ${OLLAMA_URL:-http://host.docker.internal:13434}
       OLLAMA_DEFAULT_MODEL: ${OLLAMA_MODEL:-gemma3:12b}
       OLLAMA_TIMEOUT: 120
       REDIS_URL: redis://redis:6379
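
The end-to-end check described above can also be scripted from inside the
container with nothing but the stdlib; the snippet assumes Ollama's usual
/api/tags payload with a top-level "models" list.

    import json
    import urllib.request

    # Run inside the mana-llm container once the extra_hosts mapping is live.
    with urllib.request.urlopen("http://host.docker.internal:13434/api/tags", timeout=10) as resp:
        payload = json.load(resp)

    # Should now print gemma3:4b, gemma3:12b, gemma3:27b, qwen2.5-coder:14b,
    # and nomic-embed-text.
    for model in payload.get("models", []):
        print(model["name"])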