docs(local-llm): document browser-local LLM stack and CSP requirements

Add packages/local-llm/CLAUDE.md as the package-level reference for
browser-local LLM inference. The package went through a non-trivial
engine swap from WebLLM/Qwen to transformers.js/Gemma 4 E2B on
2026-04-08, and the bring-up surfaced enough sharp edges that the
next person (or AI agent) touching this code will save real time
having them written down in one place rather than re-discovering
them error by error.

Captured topics:

- What the package is, what library/model is currently used, and
  the deliberate engine-agnostic API surface that lets future swaps
  stay contained to this package.

- Why we chose transformers.js + Gemma 4 over staying on WebLLM
  (MLC compilation lag for new model architectures) and what the
  return path looks like once MLC ships Gemma 4 builds.

- The seven CSP directives that browser-local inference needs and
  WHY each one is required:
    * script-src: 'wasm-unsafe-eval', cdn.jsdelivr.net, blob:
    * connect-src: huggingface.co + *.huggingface.co + cdn-lfs-*,
                   *.hf.co + cas-bridge.xethub.hf.co (XET CDN),
                   cdn.jsdelivr.net (for the WASM preload fetch)
  Including the subtle "jsDelivr is needed in BOTH script-src and
  connect-src" trap that produces identical-looking error messages
  for two distinct underlying causes.

- The Vite SSR module-cache gotcha: CSP additions made in
  packages/shared-utils/security-headers.ts do NOT hot-reload across
  the workspace package boundary, while additions made directly in
  apps/mana/apps/web/src/hooks.server.ts do. Includes the diagnostic
  pattern (compare which additions show up in the next CSP error
  vs which don't) and the workaround (move them into hooks.server.ts
  via setSecurityHeaders options).

- The two-step tokenization pattern that's mandatory for
  Gemma4Processor: apply_chat_template(tokenize:false) → string, then
  processor.tokenizer(text, return_tensors:'pt'). The collapsed
  apply_chat_template(return_dict:true) path looks shorter but
  produces a malformed input shape and crashes model.generate() deep
  inside the forward pass with "Cannot read properties of null
  (reading 'dims')" — opaque from the call site.

- The transformers.js v4 quirk that model.generate() returns null
  (not a tensor) when a TextStreamer is attached. The streamer is
  the only stable text channel; the engine always attaches one and
  uses the streamer's collected text as the canonical output, with
  a chars/4 fallback for token counts.

- API surface (Svelte 5 example), how to add a new model to the
  registry, deploy notes (no base image rebuild needed for local-llm
  changes alone, but IS needed if shared-utils CSP defaults change),
  browser cache semantics, and hard browser support requirements
  (WebGPU, ~1.5–2 GB VRAM for E2B q4f16, no CPU/WASM fallback).

Also link to the new doc from the root CLAUDE.md Shared Packages
table so people land on it from the standard discovery path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Till JS 2026-04-08 23:27:50 +02:00
parent e8423e7551
commit 7901a5df2f
2 changed files with 190 additions and 0 deletions

View file

@ -145,6 +145,7 @@ MinIO (Docker, S3-compatible) in both local and prod. Console: http://localhost:
| `@mana/shared-theme` | Theme config |
| `@mana/shared-i18n` | i18n |
| `@mana/local-store` | Local-first store primitives — used by unified Mana, manavoxel, arcade, and shared-uload/-stores/-links |
| `@mana/local-llm` | Browser-local LLM inference (transformers.js + Gemma 4 E2B, WebGPU). Powers `/llm-test` and the playground module. See [`packages/local-llm/CLAUDE.md`](packages/local-llm/CLAUDE.md) for the CSP requirements and the transformers.js v4 gotchas. |
## Adding Dependencies