Mirror of https://github.com/Memo-2023/mana-monorepo.git, synced 2026-05-14 20:01:09 +02:00
Add packages/local-llm/CLAUDE.md as the package-level reference for
browser-local LLM inference. The package went through a non-trivial
engine swap from WebLLM/Qwen to transformers.js/Gemma 4 E2B on
2026-04-08, and the bring-up surfaced enough sharp edges that the
next person (or AI agent) touching this code will save real time by
having them written down in one place rather than re-discovering
them error by error.
Captured topics:
- What the package is, what library/model is currently used, and
the deliberate engine-agnostic API surface that lets future swaps
stay contained to this package.
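The engine-agnostic surface mentioned above might look roughly like this. This is a hypothetical sketch, not the package's actual exports: the type and method names are illustrative. The point is that callers depend only on the interface, so an engine swap (WebLLM → transformers.js, or back) stays inside local-llm.

```typescript
// Hypothetical sketch — names are illustrative, not the real API.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface LocalLlmEngine {
  loadModel(modelId: string, onProgress?: (pct: number) => void): Promise<void>;
  generate(
    messages: ChatMessage[],
    onToken: (chunk: string) => void,
  ): Promise<string>; // resolves with the full collected text
}

// Minimal stub implementation, used only to show the call shape:
// no WebLLM or transformers.js types ever leak to the caller.
class StubEngine implements LocalLlmEngine {
  async loadModel(_modelId: string): Promise<void> {
    // a real engine would download/compile the model here
  }
  async generate(
    messages: ChatMessage[],
    onToken: (chunk: string) => void,
  ): Promise<string> {
    const reply = `echo: ${messages[messages.length - 1].content}`;
    for (const ch of reply) onToken(ch); // streaming channel
    return reply;
  }
}
```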
- Why we chose transformers.js + Gemma 4 over staying on WebLLM
(MLC compilation lag for new model architectures) and what the
return path looks like once MLC ships Gemma 4 builds.
- The seven CSP directives that browser-local inference needs and
WHY each one is required:
* script-src: 'wasm-unsafe-eval', cdn.jsdelivr.net, blob:
* connect-src: huggingface.co + *.huggingface.co + cdn-lfs-*,
*.hf.co + cas-bridge.xethub.hf.co (XET CDN),
cdn.jsdelivr.net (for the WASM preload fetch)
Including the subtle "jsDelivr is needed in BOTH script-src and
connect-src" trap that produces identical-looking error messages
for two distinct underlying causes.
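The directive layout above can be sketched as a plain table of sources. This is an illustration of the list in this bullet, not the actual shared-utils builder; the key detail is that cdn.jsdelivr.net must appear in BOTH script-src (executing the CDN-hosted script) and connect-src (the WASM preload fetch) — dropping either produces near-identical CSP violation messages.

```typescript
// Sketch only — directive sources taken from the list above.
const cspSources: Record<string, string[]> = {
  "script-src": ["'self'", "'wasm-unsafe-eval'", "https://cdn.jsdelivr.net", "blob:"],
  "connect-src": [
    "'self'",
    "https://huggingface.co",
    "https://*.huggingface.co",        // model metadata + LFS redirects
    "https://*.hf.co",
    "https://cas-bridge.xethub.hf.co", // XET CDN
    "https://cdn.jsdelivr.net",        // WASM preload fetch
  ],
};

// Serialize into a Content-Security-Policy header value.
const csp = Object.entries(cspSources)
  .map(([directive, sources]) => `${directive} ${sources.join(" ")}`)
  .join("; ");
```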
- The Vite SSR module-cache gotcha: CSP additions made in
packages/shared-utils/security-headers.ts do NOT hot-reload across
the workspace package boundary, while additions made directly in
apps/mana/apps/web/src/hooks.server.ts do. Includes the diagnostic
pattern (compare which additions show up in the next CSP error
vs which don't) and the workaround (move them into hooks.server.ts
via setSecurityHeaders options).
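The workaround might look like this in hooks.server.ts. The import path and the option name are assumptions for illustration, not the real setSecurityHeaders signature:

```typescript
// apps/mana/apps/web/src/hooks.server.ts — illustrative sketch.
// Additions made HERE are picked up by dev-server hot reload; the same
// additions made in packages/shared-utils/security-headers.ts are served
// from Vite's stale SSR module cache across the workspace package boundary.
// Diagnostic: add a marker source in each file and check which one shows
// up in the next CSP violation error.
import { setSecurityHeaders } from '@mana/shared-utils'; // import path assumed

export const handle = setSecurityHeaders({
  // option name assumed; the real options object may differ
  cspAdditions: {
    'script-src': ["'wasm-unsafe-eval'", 'https://cdn.jsdelivr.net', 'blob:'],
    'connect-src': ['https://cdn.jsdelivr.net', 'https://huggingface.co'],
  },
});
```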
- The two-step tokenization pattern that's mandatory for
Gemma4Processor: apply_chat_template(tokenize:false) → string, then
processor.tokenizer(text, return_tensors:'pt'). The collapsed
apply_chat_template(return_dict:true) path looks shorter but
produces a malformed input shape and crashes model.generate() deep
inside the forward pass with "Cannot read properties of null
(reading 'dims')" — opaque from the call site.
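The mandatory two-step pattern sketched against the transformers.js calls named above (model/processor loading elided; option names beyond those quoted in this bullet are standard transformers.js chat-template options, but treat the exact shapes as assumptions):

```typescript
// Step 1 — render the chat template to a plain STRING.
const text = processor.apply_chat_template(messages, {
  tokenize: false,             // the critical flag: do NOT tokenize here
  add_generation_prompt: true,
});

// Step 2 — tokenize that string in a separate call.
const inputs = processor.tokenizer(text, { return_tensors: 'pt' });

// inputs now has the shape model.generate() expects.
const output = await model.generate({ ...inputs, max_new_tokens: 256 });

// AVOID the collapsed one-step form:
//   processor.apply_chat_template(messages, { return_dict: true })
// It looks shorter but yields a malformed input shape, and generate()
// crashes deep in the forward pass with
// "Cannot read properties of null (reading 'dims')".
```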
- The transformers.js v4 quirk that model.generate() returns null
(not a tensor) when a TextStreamer is attached. The streamer is
the only stable text channel; the engine always attaches one and
uses the streamer's collected text as the canonical output, with
a chars/4 fallback for token counts.
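The streamer-as-canonical-output pattern, sketched with the transformers.js TextStreamer API (surrounding model/processor setup elided):

```typescript
import { TextStreamer } from '@huggingface/transformers';

let collected = '';
const streamer = new TextStreamer(processor.tokenizer, {
  skip_prompt: true,
  callback_function: (chunk: string) => { collected += chunk; },
});

// transformers.js v4: with a streamer attached this resolves to null,
// not a tensor — never read the return value.
await model.generate({ ...inputs, max_new_tokens: 256, streamer });

const outputText = collected;                            // canonical output
const tokenEstimate = Math.ceil(outputText.length / 4);  // chars/4 fallback
```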
- API surface (Svelte 5 example), how to add a new model to the
registry, deploy notes (no base image rebuild needed for local-llm
changes alone, but IS needed if shared-utils CSP defaults change),
browser cache semantics, and hard browser support requirements
(WebGPU, ~1.5–2 GB VRAM for E2B q4f16, no CPU/WASM fallback).
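The hard requirement in the last bullet can be enforced with an up-front gate; a minimal sketch (VRAM cannot be queried directly from JS, so the ~1.5–2 GB figure remains a documented requirement rather than a runtime check):

```typescript
// No CPU/WASM fallback exists, so refuse to initialize without WebGPU.
if (!('gpu' in navigator)) {
  throw new Error('local-llm requires WebGPU; navigator.gpu is unavailable');
}
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) {
  throw new Error('WebGPU is present but no GPU adapter was returned');
}
```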
Also link to the new doc from the root CLAUDE.md Shared Packages
table so people land on it from the standard discovery path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
packages/:
- cards-database
- credits
- eslint-config
- feedback
- help
- local-llm
- local-store
- notify-client
- qr-export
- shared-api-client
- shared-auth
- shared-auth-ui
- shared-branding
- shared-config
- shared-drizzle-config
- shared-error-tracking
- shared-errors
- shared-go
- shared-hono
- shared-i18n
- shared-icons
- shared-landing-ui
- shared-links
- shared-llm
- shared-logger
- shared-pwa
- shared-python/manacore_auth
- shared-splitscreen
- shared-storage
- shared-stores
- shared-tags
- shared-tailwind
- shared-theme
- shared-theme-ui
- shared-tsconfig
- shared-types
- shared-ui
- shared-uload
- shared-utils
- shared-vite-config
- spiral-db
- subscriptions
- test-config
- wallpaper-generator