In commit c9e16243c (the gemma3:4b → gemma4:e4b switch) I sloppily
wrote in the ManaServerBackend docstring that mana-llm "routes them
to the local Ollama instance on the Mac Mini (running on the M4's
Metal GPU)". That is wrong, and it is the exact misconception I had
to debug my way out of earlier the same day.
The actual topology is already documented correctly in
docs/MAC_MINI_SERVER.md and docs/WINDOWS_GPU_SERVER_SETUP.md; I
just didn't read those before writing the docstring:
  mana-llm container's OLLAMA_URL points at host.docker.internal:13434
    → ~/gpu-proxy.py (Python TCP forwarder, LaunchAgent on Mac Mini)
    → 192.168.178.11:11434 (LAN)
    → Ollama on the Windows GPU server (RTX 3090, 24 GB VRAM)
    → Inference
The Mac Mini's brew-installed Ollama binary is NOT on the inference
path. It's just a CLI for inspecting the proxied daemon. Today's
"why does the Mac Mini still have Ollama 0.15.4" puzzle has the
answer "because nothing on the Mac Mini actually runs inference, the
binary version was never load-bearing".
Two doc fixes:
1. packages/shared-llm/src/backends/mana-server.ts
   Replace the lying docstring with the real topology, including a
   pointer to the MAC_MINI_SERVER.md and WINDOWS_GPU_SERVER_SETUP.md
   sections that document it. Also note that gemma4:e4b is a
   reasoning model that emits message.reasoning when given enough
   tokens (cross-reference to remote.ts's fallback parser).
2. packages/local-llm/CLAUDE.md
   Add a paragraph at the top explaining the difference between
   "@mana/local-llm" (browser tier, on-device) and the @mana/shared-llm
   "mana-server" / "cloud" tiers (services/mana-llm proxy → gpu-proxy.py
   → RTX 3090). The doc previously said only "not related to
   services/mana-llm" and never said where mana-server actually
   goes. Future me reading the doc would still have to dig through
   the docker-compose env to find out.
No code changes — only docstring + markdown.
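For reference, a minimal sketch of what a message.reasoning fallback
can look like. This is NOT the code in remote.ts; the names
(extractText, ChatCompletion) and the exact fallback rule (use
reasoning only when content is empty) are assumptions made for
illustration:

```typescript
// Hypothetical response shape: only the fields this sketch reads.
interface ChatMessage {
  content?: string | null;
  reasoning?: string | null;
}

interface ChatCompletion {
  choices: { message: ChatMessage }[];
}

// Assumed rule: prefer message.content; if it is missing or blank,
// fall back to message.reasoning; otherwise return an empty string.
function extractText(res: ChatCompletion): string {
  const msg = res.choices[0]?.message;
  const content = msg?.content?.trim() ?? "";
  if (content.length > 0) return content;
  return msg?.reasoning?.trim() ?? "";
}
```

Whether the real parser should surface raw reasoning text to the
caller or post-process it first is a separate design decision that
this sketch does not settle.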
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two surprises came out of "why do we still use Gemma 3 instead of 4":
1. The hardcoded default in ManaServerBackend was `gemma3:4b`, which
   was even smaller than mana-llm's actual server-side default of
   `gemma3:12b`. My initial guess from docs/LOCAL_LLM_MODELS.md was
   conservative.
2. The mana-llm OLLAMA_URL points at host.docker.internal:13434,
   which is NOT the Mac Mini's local Ollama; it's a Python TCP
   forwarder (~/gpu-proxy.py) that proxies to 192.168.178.11:11434
   on the Windows GPU server. So title generation has been running
   on the RTX 3090 the whole time, not on the M4 Metal GPU. The
   Mac Mini's brew-installed ollama 0.15.4 wasn't even being used
   for inference, only as a CLI to inspect the proxied Ollama.
To get to Gemma 4, both Ollama instances needed an upgrade:
  - Mac Mini brew: 0.15.4 → 0.20.4 (cosmetic, the binary isn't on
                   the inference path; upgraded for consistency)
  - GPU server:    0.18.2 → 0.20.4 via winget. Required restarting
                   the daemon via the OllamaServe scheduled task
                   that was already configured.
Then `ollama pull gemma4:e4b` on the GPU server (9.6 GB, ~10 min on
the LAN). Verified end-to-end via the proxy with a real chat
completion request to mana-llm — gemma4:e4b answered with a clean
4-word German title for a sample voice memo prompt:
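  prompt: "Erstelle einen kurzen 3-Wort Titel für: Es ist ein
           schöner Tag heute am 9. April"
          (English: "Create a short 3-word title for: It is a
           beautiful day today on April 9")
  → "Schöner Tag, neuntes April"
    (English: "Beautiful day, ninth April")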
Changes in this commit:
  packages/shared-llm/src/backends/mana-server.ts
    - defaultModel: 'gemma3:4b' → 'gemma4:e4b'
    - Updated docstring to explain why E4B is the right Mana-Server
      tier default: 9.6 GB on disk, 128K context, "Effective 4B"
      arch punches above its weight class for German prompts, and
      the family stays consistent with the browser tier (Gemma 4
      E2B is the smaller sibling) so the source label and prompt
      behavior remain coherent across tiers.
  apps/mana/apps/web/src/lib/modules/memoro/views/DetailView.svelte
    - TITLE_SOURCE_LABELS map updated:
        browser     → "Auf deinem Gerät (Gemma 4 E2B)" (was "(Gemma 4)")
        mana-server → "Mana-Server (Gemma 4 E4B)" (was "(gemma3:4b)")
    - The label now reflects that BOTH the browser and the mana-server
      tier are running Gemma 4 variants, which is more honest than
      the previous mix.
Did NOT change:
  - The OLLAMA_DEFAULT_MODEL env var in docker-compose.macmini.yml
    (still gemma3:12b). That's the fallback for callers who don't
    specify a model in their request. Our generate-title task always
    sends an explicit model string, so it's unaffected. Bumping the
    global default is a separate decision: it would change behavior
    for the playground module and any other consumer that relies on
    the implicit fallback.
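The explicit-model-vs-fallback behavior described above can be
sketched roughly as follows. resolveModel is a hypothetical helper
written for this message, not a function in mana-llm; it assumes the
server simply substitutes its configured default when the request
carries no model string:

```typescript
// Hypothetical helper: which model would serve a given request?
// `requested` is the model string from the request body (if any),
// `serverDefault` stands in for the OLLAMA_DEFAULT_MODEL value.
function resolveModel(requested: string | undefined, serverDefault: string): string {
  const explicit = requested?.trim();
  // An empty or missing model string falls through to the default.
  return explicit ? explicit : serverDefault;
}

// generate-title always names its model, so the default is never consulted.
const titleModel = resolveModel("gemma4:e4b", "gemma3:12b");

// A caller that omits the model gets whatever the env var says.
const implicitModel = resolveModel(undefined, "gemma3:12b");
```

Under that assumption, changing the env var only affects callers of
the second kind.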
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adding an app to a workbench scene threw DataCloneError. scenesState
is a $state array, so current.openApps was a Svelte 5 proxy and
spreading it into a new array left proxy entries inside; IndexedDB's
structured clone refuses to serialise those. Snapshot before handing
the array to patchScene / createScene so Dexie sees plain objects.
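A framework-free sketch of the failure mode and the fix. In the
Svelte code the snapshot step is presumably the `$state.snapshot`
rune, which only exists inside compiled Svelte files; reactive() and
toPlain() below are hypothetical stand-ins for the deep $state proxy
and for the snapshot, not the project's code:

```typescript
// Tracks every proxy handed out, so the sketch can tell proxies from
// plain objects without relying on any runtime-specific API.
const proxies = new WeakSet<object>();

// Stand-in for a deep reactive proxy: nested objects are wrapped
// lazily whenever they are read through the proxy.
function reactive<T extends object>(target: T): T {
  const proxy = new Proxy(target, {
    get(obj, key, receiver) {
      const value = Reflect.get(obj, key, receiver);
      return typeof value === "object" && value !== null ? reactive(value) : value;
    },
  });
  proxies.add(proxy);
  return proxy;
}

// Stand-in for a snapshot: deep-copies arrays and objects into plain
// values. Handles only JSON-like data (no Dates, Maps, cycles).
function toPlain<T>(value: T): T {
  if (Array.isArray(value)) {
    return value.map((entry) => toPlain(entry)) as unknown as T;
  }
  if (typeof value === "object" && value !== null) {
    const src = value as unknown as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(src)) out[key] = toPlain(src[key]);
    return out as unknown as T;
  }
  return value;
}

// Spreading a proxied array yields a new plain array, but its entries
// were read through the proxy and are therefore still proxies.
const openApps = reactive([{ id: "notes" }]);
const spreadApps = [...openApps, { id: "memoro" }];

// Snapshotting first leaves only plain objects for the storage layer.
const plainApps = toPlain(spreadApps);
```

The design point is that the copy has to be deep: a new outer array
is not enough when the entries themselves are still proxies.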
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>