diff --git a/packages/local-llm/CLAUDE.md b/packages/local-llm/CLAUDE.md
index 56383a671..9d40b68d0 100644
--- a/packages/local-llm/CLAUDE.md
+++ b/packages/local-llm/CLAUDE.md
@@ -1,6 +1,8 @@
 # `@mana/local-llm` — Browser-Local LLM Inference
 
-Client-side LLM inference that runs **entirely in the user's browser** via WebGPU. No server roundtrips, no API keys, no data leaving the device. Used by `/llm-test` (developer tool) and the `playground` module in `apps/mana/apps/web`. Not related to `services/mana-llm` (which is the server-side LLM proxy that talks to Ollama, OpenAI, etc.).
+Client-side LLM inference that runs **entirely in the user's browser** via WebGPU. No server roundtrips, no API keys, no data leaving the device. Used by `/llm-test` (developer tool) and the `playground` module in `apps/mana/apps/web`.
+
+**Don't confuse this with the server-side LLM** (`services/mana-llm`). The server-side proxy is what backs the **`mana-server`** and **`cloud`** tiers in `@mana/shared-llm`'s tiered orchestrator — it speaks OpenAI-compatible HTTP and routes to a configured Ollama instance or to Gemini. The Ollama instance is **not** the Mac Mini's local Ollama: traffic goes via `~/gpu-proxy.py` (a Python TCP forwarder running as a LaunchAgent on the Mac Mini host) to the Windows GPU server's Ollama at `192.168.178.11:11434`, where inference runs on the **RTX 3090**. See `docs/MAC_MINI_SERVER.md` and `docs/WINDOWS_GPU_SERVER_SETUP.md` for the full topology. This package (`@mana/local-llm`) is the **only** path that uses the user's own device — `mana-server` and `cloud` both leave the device.
 
 ## What's currently in the box
 
diff --git a/packages/shared-llm/src/backends/mana-server.ts b/packages/shared-llm/src/backends/mana-server.ts
index 474279e54..4ee6dd47e 100644
--- a/packages/shared-llm/src/backends/mana-server.ts
+++ b/packages/shared-llm/src/backends/mana-server.ts
@@ -1,19 +1,38 @@
 /**
  * Mana-server backend — calls services/mana-llm with an Ollama model
  * string. mana-llm's ProviderRouter recognizes plain Ollama model names
- * (no provider prefix) and routes them to the local Ollama instance on
- * the Mac Mini (running on the M4's Metal GPU), with automatic Gemini
- * fallback if Ollama is overloaded.
+ * (no provider prefix) and routes them to its configured Ollama
+ * instance, with automatic Google Gemini fallback if Ollama is
+ * overloaded.
+ *
+ * Where the inference actually runs (subtle, easy to misread):
+ *
+ * The mana-llm container's `OLLAMA_URL` points at
+ * `host.docker.internal:13434`. That is NOT the Mac Mini's local
+ * Ollama — it's a Python TCP forwarder (`~/gpu-proxy.py`, running
+ * as a LaunchAgent on the Mac Mini host) that pipes the traffic to
+ * `192.168.178.11:11434` over the LAN, where Ollama is running on
+ * the Windows GPU server with the RTX 3090 (24 GB VRAM). All
+ * inference happens there, not on the Mac Mini's M4 Metal GPU.
+ *
+ * See docs/MAC_MINI_SERVER.md and docs/WINDOWS_GPU_SERVER_SETUP.md
+ * (specifically the "Auf dem Mac Mini läuft ein TCP-Proxy" section)
+ * for the full topology. The Mac Mini's brew-installed Ollama
+ * binary is NOT on the inference path — it's just a local CLI for
+ * inspecting the proxied daemon.
  *
  * The default model is gemma4:e4b — Google's Gemma 4 "Effective 4B"
  * variant, released 2026-04-02. Same family as @mana/local-llm's
  * browser tier model (Gemma 4 E2B is the smaller sibling) so prompts
  * behave consistently when a task auto-falls between tiers. e4b is
  * the right Mana-Server default because:
- * - 9.6 GB on disk fits comfortably on the M4's 16 GB unified memory
+ * - 9.6 GB on disk fits comfortably on the 3090's 24 GB VRAM
  * - 128K context window covers all current title/summarize tasks
  * - The "Effective 4B" architecture punches well above its weight
  *   class (better than gemma3:4b on most German prompts)
+ * - It's a reasoning model — uses message.reasoning for chain-of-
+ *   thought when given enough max_tokens budget; remote.ts has a
+ *   fallback parser for that field
  * - The tier name we surface in the source label stays "Gemma 4"
  *   family for both browser and mana-server, so the UX is coherent
  */
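
Reviewer note on the `gpu-proxy.py` hop both hunks describe, since it is easy to picture wrong: below is a minimal sketch of what such a forwarder does, written in TypeScript (Node) purely for illustration. The real forwarder is the Python LaunchAgent script on the Mac Mini; only the two addresses (`host.docker.internal:13434` on the Mini, `192.168.178.11:11434` on the Windows box) come from the diff, everything else is assumed.

```ts
// Illustrative only: the real hop is ~/gpu-proxy.py (Python) on the Mac
// Mini. Same idea in Node: accept connections on the port the mana-llm
// container dials via host.docker.internal:13434, and pipe the raw bytes
// to the Windows GPU server's Ollama.
import * as net from "node:net";

const UPSTREAM_HOST = "192.168.178.11"; // Windows GPU server (RTX 3090)
const UPSTREAM_PORT = 11434;            // Ollama on the Windows box
const LISTEN_PORT = 13434;              // what the mana-llm container dials

net
  .createServer((client) => {
    const upstream = net.connect(UPSTREAM_PORT, UPSTREAM_HOST);
    client.pipe(upstream); // request bytes out to the GPU server
    upstream.pipe(client); // response bytes back to the container
    client.on("error", () => upstream.destroy());
    upstream.on("error", () => client.destroy());
  })
  .listen(LISTEN_PORT);
```

Because the proxy is a dumb byte pipe with no HTTP parsing, nothing in mana-llm had to change when inference moved off the Mac Mini's Metal GPU; only the comments above were stale.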
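The routing claim in the comment (plain Ollama model names go to Ollama, with automatic Gemini fallback on overload) lives in mana-llm's ProviderRouter, which this diff does not touch. A hedged sketch of that shape, where `callOllama`, `callGemini`, and the overload test are hypothetical stand-ins rather than actual mana-llm code:

```ts
// Hypothetical sketch of the described routing rule, not mana-llm's code.
// A plain Ollama model name such as "gemma4:e4b" carries no provider
// prefix; a prefixed name would be dispatched to the named provider.
type Chat = { model: string; messages: { role: string; content: string }[] };

async function route(
  req: Chat,
  callOllama: (r: Chat) => Promise<string>,
  callGemini: (r: Chat) => Promise<string>,
): Promise<string> {
  const plainOllamaName = !req.model.includes("/"); // e.g. "gemma4:e4b"
  if (plainOllamaName) {
    try {
      return await callOllama(req);
    } catch (err) {
      // Assumed overload signal; the real check in ProviderRouter may
      // look at HTTP 503s, queue depth, or timeouts instead.
      return callGemini(req);
    }
  }
  return callGemini(req);
}
```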
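The new reasoning-model bullet leans on `message.reasoning` and notes that remote.ts has a fallback parser for it. A minimal sketch of what consuming that field can look like, assuming the OpenAI-compatible message shape the proxy speaks; the inline `<think>` tag branch is an illustrative assumption, not necessarily what remote.ts implements:

```ts
// Assistant message in an OpenAI-compatible chat response. Reasoning
// models such as gemma4:e4b may fill `reasoning` with chain-of-thought
// when the max_tokens budget leaves room for it.
interface AssistantMessage {
  role: "assistant";
  content: string | null;
  reasoning?: string; // dedicated chain-of-thought field, may be absent
}

// Split a completion into the user-visible answer and an optional trace.
// The <think>...</think> branch is illustrative: some runtimes inline the
// trace in the content instead of using a dedicated field.
function splitReasoning(
  msg: AssistantMessage,
): { answer: string; reasoning?: string } {
  const text = msg.content ?? "";
  if (msg.reasoning) {
    return { answer: text, reasoning: msg.reasoning };
  }
  const inline = /^<think>([\s\S]*?)<\/think>\s*/.exec(text);
  if (inline) {
    return { answer: text.slice(inline[0].length), reasoning: inline[1].trim() };
  }
  return { answer: text };
}
```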