From 7f382138a10336b5b25f01f041bcb5d9a212f559 Mon Sep 17 00:00:00 2001
From: Till JS
Date: Wed, 8 Apr 2026 16:46:14 +0200
Subject: [PATCH] fix(mana-llm): route Ollama through gpu-proxy instead of LAN IP
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The mana-service-llm container had OLLAMA_URL pointed at the GPU box's
LAN address (192.168.178.11:11434). On the Mac Mini host that route
works fine, but from inside any Colima container the entire
192.168.178.0/24 subnet gets synthesized RST — Colima's VM "claims" the
LAN range without being able to route to it, so every connect() returns
"Connection refused" before a packet ever leaves the box.

mana-llm started cleanly, reported the configured upstream as
"unhealthy", served an empty /v1/models list, and every chat completion
failed with "All connection attempts failed". The most visible
downstream effect: voice quick-add (parse-task, parse-habit) silently
degraded to its no-LLM fallback for everyone hitting the local stack —
same shape as a successful response, no error log, just no enrichment.

The Mac Mini already runs a gpu-proxy LaunchAgent (com.mana.gpu-proxy,
/Users/mana/gpu-proxy.py) that forwards 127.0.0.1:13434 →
192.168.178.11:11434 alongside several other GPU service ports.
Pointing OLLAMA_URL at host.docker.internal:13434 and adding the
host-gateway extra_hosts mapping puts mana-llm on the already-running
rail.

Verified end-to-end: from inside the container, GET
http://host.docker.internal:13434/api/tags now returns the full model
list (gemma3:4b, gemma3:12b, gemma3:27b, qwen2.5-coder:14b,
nomic-embed-text).

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 docker-compose.macmini.yml | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/docker-compose.macmini.yml b/docker-compose.macmini.yml
index c7a0b689f..874cea160 100644
--- a/docker-compose.macmini.yml
+++ b/docker-compose.macmini.yml
@@ -952,10 +952,24 @@ services:
     depends_on:
       redis:
         condition: service_healthy
+    # Ollama lives on the Windows GPU box at 192.168.178.11:11434, but
+    # Colima containers can't reach the LAN range — the entire
+    # 192.168.178.0/24 subnet gets synthesized RST from inside any
+    # container, even though the macOS host routes there fine. The
+    # gpu-proxy LaunchAgent on the Mac Mini host (com.mana.gpu-proxy,
+    # see /Users/mana/gpu-proxy.py) bridges 127.0.0.1:13434 → GPU
+    # box's 11434, so we go through host.docker.internal:13434 to
+    # reach Ollama. Without this hop the local mana-llm starts
+    # cleanly but reports an empty model list and every chat
+    # completion fails with "All connection attempts failed", which
+    # cascades into voice quick-add silently degrading to its no-LLM
+    # fallback for everyone hitting the local stack.
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
     environment:
       PORT: 3025
       LOG_LEVEL: info
-      OLLAMA_URL: ${OLLAMA_URL:-http://192.168.178.11:11434}
+      OLLAMA_URL: ${OLLAMA_URL:-http://host.docker.internal:13434}
       OLLAMA_DEFAULT_MODEL: ${OLLAMA_MODEL:-gemma3:12b}
       OLLAMA_TIMEOUT: 120
       REDIS_URL: redis://redis:6379
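
For context on the hop this patch relies on, here is a minimal sketch
of the kind of loopback-to-LAN relay gpu-proxy.py performs. The actual
/Users/mana/gpu-proxy.py is not included in this patch, so the
structure, names, and port map below are illustrative assumptions;
only the 127.0.0.1:13434 → 192.168.178.11:11434 Ollama pair comes from
the commit message above.

    #!/usr/bin/env python3
    # Illustrative loopback -> LAN TCP relay in the spirit of gpu-proxy.py.
    # The real script is not shown in this patch; only the 13434 -> 11434
    # Ollama pair is taken from the commit message, the rest is assumed.
    import asyncio

    LISTEN_HOST = "127.0.0.1"   # loopback on the Mac Mini host
    GPU_BOX = "192.168.178.11"  # Windows GPU box on the LAN

    # local listen port -> remote port on the GPU box; the real proxy
    # forwards several other GPU service ports the same way.
    PORT_MAP = {13434: 11434}

    async def pump(reader, writer):
        # Copy bytes one direction until EOF, then tear down that side.
        try:
            while data := await reader.read(65536):
                writer.write(data)
                await writer.drain()
        finally:
            writer.close()

    def make_handler(remote_port):
        async def handle(client_reader, client_writer):
            # Dial the LAN side, then shuttle bytes both ways until
            # either end hangs up.
            remote_reader, remote_writer = await asyncio.open_connection(
                GPU_BOX, remote_port)
            await asyncio.gather(
                pump(client_reader, remote_writer),
                pump(remote_reader, client_writer),
                return_exceptions=True,  # a reset on one leg shouldn't crash the other
            )
        return handle

    async def main():
        servers = [
            await asyncio.start_server(make_handler(remote), LISTEN_HOST, local)
            for local, remote in PORT_MAP.items()
        ]
        await asyncio.gather(*(s.serve_forever() for s in servers))

    if __name__ == "__main__":
        asyncio.run(main())

With a relay like this listening on the host's loopback, the
host-gateway extra_hosts entry added above lets the container reach it
as host.docker.internal:13434, which is what the new OLLAMA_URL default
points at; the /api/tags check from the commit message exercises that
full path.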