till/managarten

mirror of https://github.com/Memo-2023/mana-monorepo.git synced 2026-05-20 22:46:41 +02:00

Commit graph

Till JS

92f8221bfd

docs(shared-llm): correct the mana-server tier topology in code + CLAUDE.md

In commit c9e16243c (the gemma3:4b → gemma4:e4b switch) I sloppily
wrote in the ManaServerBackend docstring that mana-llm "routes them
to the local Ollama instance on the Mac Mini (running on the M4's
Metal GPU)". That is wrong, and it is the exact misconception I had
to debug my way out of earlier the same day.

The actual topology — already documented correctly in
docs/MAC_MINI_SERVER.md and docs/WINDOWS_GPU_SERVER_SETUP.md, I
just didn't read those before writing the docstring:

  mana-llm container's OLLAMA_URL points at host.docker.internal:13434
  → ~/gpu-proxy.py (Python TCP forwarder, LaunchAgent on Mac Mini)
  → 192.168.178.11:11434 (LAN)
  → Ollama on the Windows GPU server (RTX 3090, 24 GB VRAM)
  → Inference

The Mac Mini's brew-installed Ollama binary is NOT on the inference
path. It's just a CLI for inspecting the proxied daemon. Today's
"why does the Mac Mini still have Ollama 0.15.4" puzzle has the
answer "because nothing on the Mac Mini actually runs inference, the
binary version was never load-bearing".
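
As a rough illustration, the hop chain above can be written down as data. This is a sketch only: the `Hop` type, the `inferencePath` constant and the `inferenceHost` helper are hypothetical names, not code from the repository. The addresses are the ones quoted in this message.

```typescript
// Hypothetical model of the inference path; names are illustrative only.
interface Hop {
  name: string;
  forwardsTo?: string; // next address, where the commit message states one
  runsInference: boolean;
}

const inferencePath: Hop[] = [
  { name: "mana-llm container", forwardsTo: "host.docker.internal:13434", runsInference: false },
  { name: "gpu-proxy.py (Mac Mini)", forwardsTo: "192.168.178.11:11434", runsInference: false },
  { name: "Ollama (Windows GPU server, RTX 3090)", runsInference: true },
];

// Returns the name of the single hop that actually executes the model.
function inferenceHost(path: Hop[]): string {
  const hop = path.find((h) => h.runsInference);
  if (!hop) throw new Error("no hop on this path runs inference");
  return hop.name;
}
```

Nothing on the Mac Mini carries `runsInference: true`, which is the point of the correction.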

Two doc fixes:

1. packages/shared-llm/src/backends/mana-server.ts
   Replace the lying docstring with the real topology, including a
   pointer to the two MAC_MINI_SERVER.md / WINDOWS_GPU_SERVER_SETUP.md
   sections that document it. Also note that gemma4:e4b is a
   reasoning model that emits message.reasoning when given enough
   tokens (cross-reference to remote.ts's fallback parser).

2. packages/local-llm/CLAUDE.md
   Add a paragraph at the top explaining the difference between
   "@mana/local-llm" (browser tier, on-device) and the @mana/shared-llm
   "mana-server" / "cloud" tiers (services/mana-llm proxy → gpu-proxy.py
   → RTX 3090). This was implicit before — "not related to
   services/mana-llm" — but didn't say where mana-server actually
   goes. Future me reading the doc would still have to dig through
   the docker-compose env to find out.
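
A minimal sketch of what a fallback for a reasoning model might look like. The response shape (`content` plus an optional `reasoning` field) and the "last non-empty line" heuristic are assumptions for illustration; `extractText` is a hypothetical helper and not the actual parser in remote.ts.

```typescript
// Assumed message shape: a reasoning model may fill `reasoning` and leave
// `content` empty. This is an illustration, not the shape used in remote.ts.
interface ChatMessage {
  content?: string | null;
  reasoning?: string | null;
}

// Hypothetical helper: prefer `content`, otherwise fall back to the last
// non-empty line of the reasoning trace, otherwise return an empty string.
function extractText(message: ChatMessage): string {
  const content = message.content?.trim();
  if (content) return content;

  const reasoning = message.reasoning ?? "";
  const lines = reasoning
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return lines.length > 0 ? lines[lines.length - 1] : "";
}
```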

No code changes — only docstring + markdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-09 16:40:34 +02:00

Till JS

c9e16243c8

feat(shared-llm): bump mana-server default model to gemma4:e4b

Two surprises came out of "why do we still use Gemma 3 instead of 4":

1. The hardcoded default in ManaServerBackend was `gemma3:4b`, which
   was even smaller than mana-llm's actual server-side default of
   `gemma3:12b`. My initial guess from docs/LOCAL_LLM_MODELS.md was
   conservative.

2. The mana-llm OLLAMA_URL points at host.docker.internal:13434,
   which is NOT the Mac Mini's local Ollama — it's a Python TCP
   forwarder (~/gpu-proxy.py) that proxies to 192.168.178.11:11434
   on the Windows GPU server. So title generation has been running
   on the RTX 3090 the whole time, not on the M4 Metal GPU. The
   Mac Mini's brew-installed ollama 0.15.4 wasn't even being used
   for inference — only as a CLI to inspect the proxied Ollama.

To get to Gemma 4, both Ollama instances needed an upgrade:
  - Mac Mini brew  : 0.15.4 → 0.20.4 (cosmetic, the binary isn't on
                     the inference path; upgraded for consistency)
  - GPU server     : 0.18.2 → 0.20.4 via winget. Required restarting
                     the daemon via the OllamaServe scheduled task
                     that was already configured.

Then `ollama pull gemma4:e4b` on the GPU server (9.6 GB, ~10 min on
the LAN). Verified end-to-end via the proxy with a real chat
completion request to mana-llm — gemma4:e4b answered with a clean
4-word German title for a sample voice memo prompt:

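  prompt: "Erstelle einen kurzen 3-Wort Titel für: Es ist ein
           schöner Tag heute am 9. April"
           (English: "Create a short 3-word title for: It is a
           beautiful day today on April 9")
  → "Schöner Tag, neuntes April"
    (English: "Beautiful day, ninth April")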

Changes in this commit:

  packages/shared-llm/src/backends/mana-server.ts
    - defaultModel: 'gemma3:4b' → 'gemma4:e4b'
    - Updated docstring to explain why E4B is the right Mana-Server
      tier default: 9.6 GB on disk, 128K context, "Effective 4B"
      arch punches above its weight class for German prompts, and
      the family stays consistent with the browser tier (Gemma 4
      E2B is the smaller sibling) so the source label and prompt
      behavior remain coherent across tiers.

  apps/mana/apps/web/src/lib/modules/memoro/views/DetailView.svelte
    - TITLE_SOURCE_LABELS map updated:
        browser     → "Auf deinem Gerät (Gemma 4 E2B)" (was "(Gemma 4)")
        mana-server → "Mana-Server (Gemma 4 E4B)" (was "(gemma3:4b)")
    - The label now reflects that BOTH the browser and the mana-server
      tier are running Gemma 4 variants, which is more honest than
      the previous mix.
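
For orientation, the updated label map could look roughly like this. It is a sketch: the `TitleSource` union covers only the two tiers named in this message, and `titleSourceLabel` is a hypothetical accessor. The real map in DetailView.svelte may have more entries and a different type.

```typescript
// Only the two tiers named in this commit message; the real map may differ.
type TitleSource = "browser" | "mana-server";

const TITLE_SOURCE_LABELS: Record<TitleSource, string> = {
  browser: "Auf deinem Gerät (Gemma 4 E2B)",
  "mana-server": "Mana-Server (Gemma 4 E4B)",
};

// Hypothetical accessor, shown only to make the lookup explicit.
function titleSourceLabel(source: TitleSource): string {
  return TITLE_SOURCE_LABELS[source];
}
```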

Did NOT change:
  - The Ollama OLLAMA_DEFAULT_MODEL env var in docker-compose.macmini.yml
    (still gemma3:12b). That's the fallback for callers who don't
    specify a model in their request. Our generate-title task always
    sends an explicit model string, so it's unaffected. Bumping the
    global default is a separate decision — it would change behavior
    for the playground module and any other consumer that relies on
    the implicit fallback.
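
The reason the env var can stay untouched is the usual "explicit value wins, default is the fallback" rule. A minimal sketch, assuming a request shape with an optional `model` field; `ChatRequest` and `resolveModel` are hypothetical names and do not describe how mana-llm implements the fallback.

```typescript
// Hypothetical request shape; only the optional `model` field matters here.
interface ChatRequest {
  model?: string;
  prompt: string;
}

// An explicit model string wins; the server-side default applies only when
// the caller leaves the model unset or empty.
function resolveModel(request: ChatRequest, serverDefault: string): string {
  return request.model && request.model.length > 0 ? request.model : serverDefault;
}

// The title task names its model, so the default never applies to it.
const titleModel = resolveModel({ model: "gemma4:e4b", prompt: "..." }, "gemma3:12b");
// A caller that omits the model still gets the unchanged default.
const implicitModel = resolveModel({ prompt: "..." }, "gemma3:12b");
```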

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-09 16:06:33 +02:00

Till JS

56065c8537

fix(mana/web): unwrap $state proxy in workbench-scenes Dexie writes

Adding an app to a workbench scene threw DataCloneError. scenesState
is a $state array, so current.openApps was a Svelte 5 proxy and
spreading it into a new array left proxy entries inside; IndexedDB's
structured clone refuses to serialise those. Snapshot before handing
the array to patchScene / createScene so Dexie sees plain objects.
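
The failure mode can be reproduced outside Svelte with a toy deep proxy. Everything below is a stand-in: `reactive` only imitates the way `$state` wraps nested objects, the `proxies` WeakSet exists purely so the example can detect wrapped values, and `snapshot` uses a JSON round-trip where Svelte 5 code would use `$state.snapshot`.

```typescript
// Toy stand-in for Svelte 5 $state proxies; not Svelte's implementation.
const proxies = new WeakSet<object>();

function reactive<T extends object>(target: T): T {
  const proxy = new Proxy(target, {
    get(obj, key, receiver) {
      const value = Reflect.get(obj, key, receiver);
      // Nested objects and arrays are wrapped lazily on access.
      return typeof value === "object" && value !== null ? reactive(value) : value;
    },
  });
  proxies.add(proxy);
  return proxy;
}

// Plain deep copy via JSON; fine for simple, JSON-safe records like these.
function snapshot<T>(value: T): T {
  return JSON.parse(JSON.stringify(value)) as T;
}

const sceneState = reactive({ openApps: [{ id: "notes" }] });

// Spreading makes a new plain array, but its entries are still proxies.
const leaky = [...sceneState.openApps, { id: "calendar" }];

// Snapshotting first yields plain objects all the way down.
const safe = [...snapshot(sceneState.openApps), { id: "calendar" }];
```

Here `proxies.has(leaky[0])` is true while `proxies.has(safe[0])` is false, so only `safe` is suitable for a store that relies on structured clone.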

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-09 00:44:00 +02:00

3 commits