managarten

mirror of https://github.com/Memo-2023/mana-monorepo.git synced 2026-05-15 01:21:09 +02:00

Till JS 92f8221bfd docs(shared-llm): correct the mana-server tier topology in code + CLAUDE.md

In commit `c9e16243c` (the gemma3:4b → gemma4:e4b switch) I sloppily wrote in the ManaServerBackend docstring that mana-llm "routes them to the local Ollama instance on the Mac Mini (running on the M4's Metal GPU)". That is wrong AND it's the exact misconception I had to debug my way out of earlier the same day.

The actual topology (already documented correctly in docs/MAC_MINI_SERVER.md and docs/WINDOWS_GPU_SERVER_SETUP.md, I just didn't read those before writing the docstring):

    mana-llm container's OLLAMA_URL points at
      host.docker.internal:13434
        → ~/gpu-proxy.py (Python TCP forwarder, LaunchAgent on Mac Mini)
        → 192.168.178.11:11434 (LAN)
        → Ollama on the Windows GPU server (RTX 3090, 24 GB VRAM)
        → Inference

The Mac Mini's brew-installed Ollama binary is NOT on the inference path. It's just a CLI for inspecting the proxied daemon. Today's "why does the Mac Mini still have Ollama 0.15.4" puzzle has the answer "because nothing on the Mac Mini actually runs inference, the binary version was never load-bearing".

Two doc fixes:

1. packages/shared-llm/src/backends/mana-server.ts
   Replace the lying docstring with the real topology, including a pointer to the two MAC_MINI_SERVER.md / WINDOWS_GPU_SERVER_SETUP.md sections that document it. Also note that gemma4:e4b is a reasoning model that emits message.reasoning when given enough tokens (cross-reference to remote.ts's fallback parser).

2. packages/local-llm/CLAUDE.md
   Add a paragraph at the top explaining the difference between "@mana/local-llm" (browser tier, on-device) and the @mana/shared-llm "mana-server" / "cloud" tiers (services/mana-llm proxy → gpu-proxy.py → RTX 3090). This was implicit before ("not related to services/mana-llm") but didn't say where mana-server actually goes. Future me reading the doc would still have to dig through the docker-compose env to find out.

No code changes: only docstring + markdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>		2026-04-09 16:40:34 +02:00
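
The forwarding hop in the topology above can be illustrated with a minimal sketch. This is not the actual ~/gpu-proxy.py, whose contents are not shown here; it only assumes a plain TCP forwarder that listens on port 13434 and relays bytes to 192.168.178.11:11434, as the commit message describes. The names `pipe`, `make_handler`, and `start_forwarder` are hypothetical.

```python
import asyncio

# Addresses taken from the commit message; the real gpu-proxy.py may use
# different bind settings (assumption).
LISTEN_HOST = "0.0.0.0"
LISTEN_PORT = 13434
UPSTREAM_HOST = "192.168.178.11"
UPSTREAM_PORT = 11434


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except OSError:
        # Peer reset or similar network error: stop relaying this direction.
        pass
    finally:
        writer.close()


def make_handler(upstream_host: str, upstream_port: int):
    """Build a per-connection callback that relays to the given upstream."""

    async def handle(client_reader, client_writer):
        try:
            up_reader, up_writer = await asyncio.open_connection(
                upstream_host, upstream_port
            )
        except OSError:
            # Upstream (e.g. the GPU server) is unreachable: drop the client.
            client_writer.close()
            return
        # Relay both directions concurrently until each side reaches EOF.
        await asyncio.gather(
            pipe(client_reader, up_writer),
            pipe(up_reader, client_writer),
        )

    return handle


async def start_forwarder(listen_host, listen_port, upstream_host, upstream_port):
    """Start listening and return the asyncio server object."""
    return await asyncio.start_server(
        make_handler(upstream_host, upstream_port), listen_host, listen_port
    )


async def main() -> None:
    server = await start_forwarder(
        LISTEN_HOST, LISTEN_PORT, UPSTREAM_HOST, UPSTREAM_PORT
    )
    async with server:
        await server.serve_forever()


# To run the forwarder: asyncio.run(main())
```

Under these assumptions, a container whose OLLAMA_URL points at host.docker.internal:13434 would reach the Ollama daemon on the Windows GPU server without any Ollama process running on the Mac Mini itself. The sketch closes the whole connection as soon as either side reaches EOF, which is a simplification; a production forwarder would likely also need timeouts and logging.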
src	feat(local-llm): Phase 3 — move inference into a Web Worker	2026-04-09 01:27:10 +02:00
CLAUDE.md	docs(shared-llm): correct the mana-server tier topology in code + CLAUDE.md	2026-04-09 16:40:34 +02:00
package.json	feat(local-llm): swap WebLLM/Qwen for transformers.js + Gemma 4 E2B	2026-04-08 22:22:32 +02:00
tsconfig.json	feat(local-llm): add client-side LLM inference package with WebLLM	2026-04-02 01:53:54 +02:00