managarten/packages/local-llm
Till JS 92f8221bfd docs(shared-llm): correct the mana-server tier topology in code + CLAUDE.md
In commit c9e16243c (the gemma3:4b → gemma4:e4b switch) I sloppily
wrote in the ManaServerBackend docstring that mana-llm "routes them
to the local Ollama instance on the Mac Mini (running on the M4's
Metal GPU)". That is wrong AND it's the exact misconception I had
to debug-out-of earlier the same day.

The actual topology — already documented correctly in
docs/MAC_MINI_SERVER.md and docs/WINDOWS_GPU_SERVER_SETUP.md, I
just didn't read those before writing the docstring:

  mana-llm container's OLLAMA_URL points at host.docker.internal:13434
  → ~/gpu-proxy.py (Python TCP forwarder, LaunchAgent on Mac Mini)
  → 192.168.178.11:11434 (LAN)
  → Ollama on the Windows GPU server (RTX 3090, 24 GB VRAM)
  → Inference

The Mac Mini's brew-installed Ollama binary is NOT on the inference
path. It's just a CLI for inspecting the proxied daemon. Today's
"why does the Mac Mini still have Ollama 0.15.4" puzzle has the
answer "because nothing on the Mac Mini actually runs inference, the
binary version was never load-bearing".

Two doc fixes:

1. packages/shared-llm/src/backends/mana-server.ts
   Replace the lying docstring with the real topology, including a
   pointer to the two MAC_MINI_SERVER.md / WINDOWS_GPU_SERVER_SETUP.md
   sections that document it. Also note that gemma4:e4b is a
   reasoning model that emits message.reasoning when given enough
   tokens (cross-reference to remote.ts's fallback parser).

2. packages/local-llm/CLAUDE.md
   Add a paragraph at the top explaining the difference between
   "@mana/local-llm" (browser tier, on-device) and the @mana/shared-llm
   "mana-server" / "cloud" tiers (services/mana-llm proxy → gpu-proxy.py
   → RTX 3090). This was implicit before — "not related to
   services/mana-llm" — but didn't say where mana-server actually
   goes. Future me reading the doc would still have to dig through
   the docker-compose env to find out.

No code changes — only docstring + markdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:40:34 +02:00
..
src feat(local-llm): Phase 3 — move inference into a Web Worker 2026-04-09 01:27:10 +02:00
CLAUDE.md docs(shared-llm): correct the mana-server tier topology in code + CLAUDE.md 2026-04-09 16:40:34 +02:00
package.json feat(local-llm): swap WebLLM/Qwen for transformers.js + Gemma 4 E2B 2026-04-08 22:22:32 +02:00
tsconfig.json feat(local-llm): add client-side LLM inference package with WebLLM 2026-04-02 01:53:54 +02:00