From 1f26aa4f2faf7693b72961b2ccd32f350c9e00ba Mon Sep 17 00:00:00 2001
From: Till JS
Date: Wed, 8 Apr 2026 22:22:32 +0200
Subject: [PATCH] feat(local-llm): swap WebLLM/Qwen for transformers.js + Gemma 4 E2B
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Replace the entire @mana/local-llm engine with a transformers.js-based
implementation backed by Google's Gemma 4 E2B (released 2026-04-02). The
external API of LocalLLMEngine — load(), generate(), prompt(), extractJson(),
classify(), onStatusChange(), isSupported() — is preserved 1:1, so the
/llm-test page, the playground module, and the Svelte 5 reactive bindings in
svelte.svelte.ts need no changes beyond updating the default model key.

Why the engine swap: MLC has not published Gemma 4 builds for WebLLM (and as
of today still hasn't). The webml-community team and HuggingFace's
onnx-community already have Gemma 4 E2B running in the browser via
transformers.js + WebGPU, with a documented Gemma4ForConditionalGeneration
class shipped in @huggingface/transformers v4.0.0. The ONNX route gets us the
latest Google model six days after release instead of waiting on MLC
compilation.

Trade-offs accepted (discussed before this commit):

- transformers.js is a more generic ONNX runtime, so per-token throughput
  will be ~20-40% lower than WebLLM would deliver for the same model size.
  For a 2B model on a modern WebGPU device, that still leaves generation
  comfortably interactive.
- The JS bundle gains ~2-3 MB (the ONNX runtime). Negligible compared to the
  500 MB model download.
- transformers.js v4 is brand new (released alongside Gemma 4), so the
  Gemma4ForConditionalGeneration code path has very little battle testing
  yet. The risk is partially offset by webml-community's reference
  implementation.

What changed, file by file:

- packages/local-llm/package.json: drop @mlc-ai/web-llm, add
  @huggingface/transformers ^4.0.0; bump version 0.1.0 → 0.2.0; rewrite the
  description.
- packages/local-llm/src/types.ts: add a `dtype` field to ModelConfig
  ('fp32' | 'fp16' | 'q8' | 'q4' | 'q4f16') so each model can request the
  quantization that matches its uploaded ONNX shards.
- packages/local-llm/src/models.ts: replace the old Qwen 2.5 + Gemma 2
  registry with a single `gemma-4-e2b` entry pointing at
  onnx-community/gemma-4-E2B-it-ONNX with q4f16 quantization. Future models
  can be added by appending entries — the /llm-test picker reads MODELS
  dynamically and picks them up automatically (a sketch of such an entry is
  at the end of this message).
- packages/local-llm/src/cache.ts: replace the WebLLM-specific
  hasModelInCache helper with a generic Cache API probe that looks for
  `https://huggingface.co/{model_id}/resolve/main/tokenizer.json` in any
  open cache. tokenizer.json is small, downloaded first, and always present,
  so its presence is a reliable proxy for "model has been loaded before".
- packages/local-llm/src/engine.ts: full rewrite. Internally we now hold a
  transformers.js model + processor pair (created via
  AutoProcessor.from_pretrained + Gemma4ForConditionalGeneration.from_pretrained
  with `device: 'webgpu'`), and derive our LoadingStatus union from the
  library's `progress_callback` shape. generate() applies Gemma's chat
  template via the processor, runs model.generate() with an optional
  TextStreamer for streaming, then slices the prompt tokens off the output
  tensor to compute per-call usage. The convenience methods (prompt,
  extractJson, classify) are unchanged because they only call generate()
  under the hood.
- packages/local-llm/src/generate.ts and status.svelte.ts: deleted. These
  were orphaned from a much earlier engine API (they referenced getEngine() /
  subscribe() / LlmState symbols that haven't existed for a while) and were
  never re-exported from index.ts — they only showed up because
  `tsc --noEmit` was crawling the src tree. Their functionality lives in
  engine.ts + svelte.svelte.ts now.
- apps/mana/apps/web/package.json: swap the direct dep from @mlc-ai/web-llm
  to @huggingface/transformers. This is the same trick we used for the
  previous adapter-node externals warning — having it as a direct dep makes
  adapter-node's Rollup pass treat it as external automatically.
- apps/mana/apps/web/vite.config.ts: swap the ssr.external entry from
  @mlc-ai/web-llm to @huggingface/transformers, with a comment explaining
  why so the next person doesn't wonder.
- apps/mana/apps/web/src/routes/(app)/llm-test/+page.svelte: change the
  default selectedModel from 'qwen-2.5-1.5b' to 'gemma-4-e2b'. All other
  model display strings come from the MODELS registry, so this is the single
  hard-coded reference that needed updating.
- pnpm-lock.yaml: regenerated. Confirmed @mlc-ai/web-llm is gone (0
  references) and @huggingface/transformers is in (4 references).

CSP: no header changes needed. We already opened connect-src for
huggingface.co + cdn-lfs.huggingface.co + raw.githubusercontent.com when
fixing the WebLLM blockers earlier today, and 'wasm-unsafe-eval' is already
in script-src — both transformers.js (ONNX runtime) and WebLLM (MLC runtime)
need it. If transformers.js spawns its inference into a Web Worker via a
blob URL we may need to add `worker-src 'self' blob:` once we hit the first
runtime test, but the existing CSP should be enough for the synchronous path.
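If that worker path does materialize, the fix is a one-directive delta, along
these lines (illustrative only; the exact header value depends on what the
app's CSP already allows):

    Content-Security-Policy: ...; worker-src 'self' blob: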
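For reviewers, the preserved public surface in practice. A sketch only:
direct construction is shown for brevity, real callers go through the
svelte.svelte.ts bindings, and the exact export names come from index.ts:

    import { LocalLLMEngine } from '@mana/local-llm';

    const engine = new LocalLLMEngine();
    await engine.load('gemma-4-e2b'); // idempotent; emits LoadingStatus updates

    let streamed = '';
    const result = await engine.generate({
      messages: [{ role: 'user', content: 'Summarize WebGPU in one sentence.' }],
      temperature: 0.7,
      maxTokens: 256,
      onToken: (token) => { streamed += token; }, // optional streaming path
    });
    console.log(result.content, result.usage.total_tokens, result.latencyMs);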
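And the registry sketch promised in the models.ts bullet above. This is a
hypothetical entry: the repo id and size numbers are invented for
illustration, only the shape matches ModelConfig:

    'gemma-4-e4b': {
      modelId: 'onnx-community/gemma-4-E4B-it-ONNX', // hypothetical repo
      displayName: 'Gemma 4 E4B',
      dtype: 'q4f16',
      downloadSizeMb: 1100, // invented
      ramUsageMb: 2600, // invented
    },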
Co-Authored-By: Claude Opus 4.6 (1M context)
---
 apps/mana/apps/web/package.json             |   2 +-
 .../src/routes/(app)/llm-test/+page.svelte  |   2 +-
 apps/mana/apps/web/vite.config.ts           |   7 +-
 packages/local-llm/package.json             |   6 +-
 packages/local-llm/src/cache.ts             |  21 +-
 packages/local-llm/src/engine.ts            | 223 ++++++++++++------
 packages/local-llm/src/generate.ts          | 112 ---------
 packages/local-llm/src/models.ts            |  47 ++--
 packages/local-llm/src/status.svelte.ts     |  22 --
 packages/local-llm/src/types.ts             |  11 +-
 pnpm-lock.yaml                              | 194 +++++++++++++--
 11 files changed, 378 insertions(+), 269 deletions(-)
 delete mode 100644 packages/local-llm/src/generate.ts
 delete mode 100644 packages/local-llm/src/status.svelte.ts

diff --git a/apps/mana/apps/web/package.json b/apps/mana/apps/web/package.json
index 2affc33a3..3f95e97ff 100644
--- a/apps/mana/apps/web/package.json
+++ b/apps/mana/apps/web/package.json
@@ -45,7 +45,7 @@
   },
   "dependencies": {
     "@calc/shared": "workspace:*",
-    "@mlc-ai/web-llm": "^0.2.78",
+    "@huggingface/transformers": "^4.0.0",
     "@mana/credits": "workspace:^",
     "@mana/feedback": "workspace:*",
     "@mana/help": "workspace:*",
diff --git a/apps/mana/apps/web/src/routes/(app)/llm-test/+page.svelte b/apps/mana/apps/web/src/routes/(app)/llm-test/+page.svelte
index e9c8319c1..3b4a0f458 100644
--- a/apps/mana/apps/web/src/routes/(app)/llm-test/+page.svelte
+++ b/apps/mana/apps/web/src/routes/(app)/llm-test/+page.svelte
@@ -42,7 +42,7 @@
   }

   // --- State ---
-  let selectedModel: ModelKey = $state('qwen-2.5-1.5b');
+  let selectedModel: ModelKey = $state('gemma-4-e2b');
   let activeTab: 'chat' | 'extract' | 'classify' | 'compare' | 'benchmark' = $state('chat');
   const supported = isLocalLlmSupported();
   const status = getLocalLlmStatus();
diff --git a/apps/mana/apps/web/vite.config.ts b/apps/mana/apps/web/vite.config.ts
index 815874a96..f4164ebd2 100644
--- a/apps/mana/apps/web/vite.config.ts +++ b/apps/mana/apps/web/vite.config.ts @@ -64,7 +64,12 @@ export default defineConfig({ // into the server build forces Vite's interop layer to handle the // CJS↔ESM mismatch correctly. noExternal: [...MANA_SHARED_PACKAGES, ...APP_SHARED_PACKAGES, 'rrule'], - external: ['@mlc-ai/web-llm'], + // transformers.js is browser-only (uses WebGPU + the Cache API). The + // dynamic import in @mana/local-llm only ever fires client-side, but + // SvelteKit's adapter-node Rollup pass would otherwise warn that the + // import is unresolved at SSR time. Marking it external both silences + // the warning and ensures the SSR bundle never tries to load it. + external: ['@huggingface/transformers'], }, optimizeDeps: { exclude: [...MANA_SHARED_PACKAGES, ...APP_SHARED_PACKAGES], diff --git a/packages/local-llm/package.json b/packages/local-llm/package.json index d79e65d99..acc54e1a5 100644 --- a/packages/local-llm/package.json +++ b/packages/local-llm/package.json @@ -1,8 +1,8 @@ { "name": "@mana/local-llm", - "version": "0.1.0", + "version": "0.2.0", "private": true, - "description": "Client-side LLM inference via WebLLM (Qwen 2.5 1.5B) with Svelte 5 reactive stores", + "description": "Client-side LLM inference via transformers.js (Gemma 4 E2B, WebGPU) with Svelte 5 reactive stores", "main": "./src/index.ts", "types": "./src/index.ts", "exports": { @@ -13,7 +13,7 @@ "clean": "rm -rf dist" }, "dependencies": { - "@mlc-ai/web-llm": "^0.2.78" + "@huggingface/transformers": "^4.0.0" }, "devDependencies": { "@types/node": "^24.10.1", diff --git a/packages/local-llm/src/cache.ts b/packages/local-llm/src/cache.ts index 373d4be1a..3ecee020f 100644 --- a/packages/local-llm/src/cache.ts +++ b/packages/local-llm/src/cache.ts @@ -1,12 +1,23 @@ /** - * Check if a model is cached in the browser's Cache API. - * Wraps @mlc-ai/web-llm's hasModelInCache with a dynamic import - * so it doesn't break SSR/Docker builds. + * Check if a transformers.js model is already cached in the browser. + * + * transformers.js stores HuggingFace shards in the standard Cache API under a + * named cache (default "transformers-cache"). We probe for the model's + * tokenizer.json — it's tiny (~few KB), always present, and downloaded + * first, so its presence is a reliable proxy for "this model has been + * loaded at least once before". */ export async function hasModelInCache(modelId: string): Promise { + if (typeof caches === 'undefined') return false; try { - const { hasModelInCache: check } = await import('@mlc-ai/web-llm'); - return await check(modelId); + const cacheNames = await caches.keys(); + const url = `https://huggingface.co/${modelId}/resolve/main/tokenizer.json`; + for (const name of cacheNames) { + const cache = await caches.open(name); + const match = await cache.match(url); + if (match) return true; + } + return false; } catch { return false; } diff --git a/packages/local-llm/src/engine.ts b/packages/local-llm/src/engine.ts index ce7c7b66a..60a7360c0 100644 --- a/packages/local-llm/src/engine.ts +++ b/packages/local-llm/src/engine.ts @@ -1,17 +1,34 @@ /** - * LocalLLMEngine — WebLLM wrapper for client-side inference. + * LocalLLMEngine — transformers.js wrapper for client-side inference. * - * Lazy-loads the model on first use, caches weights in browser Cache API. - * Provides both one-shot and streaming generation. + * Lazy-loads a HuggingFace ONNX model on first use, caches weights in the + * browser's Cache API, and runs inference on the WebGPU backend. 
+ * + * The default model is Google's Gemma 4 E2B (`onnx-community/gemma-4-E2B-it-ONNX`, + * q4f16). The external API of this class is intentionally identical to the + * previous WebLLM implementation so callers (Svelte stores, /llm-test page, + * playground module) need no changes when the underlying engine swaps. */ -import type { MLCEngine } from '@mlc-ai/web-llm'; import type { ChatMessage, GenerateOptions, GenerateResult, LoadingStatus } from './types'; import type { ModelConfig } from './types'; import { MODELS, DEFAULT_MODEL, type ModelKey } from './models'; +// transformers.js types are minimal here on purpose. The library does not +// publish first-class TS types for every model class, and we never expose +// these objects past this file — the public surface (LocalLLMEngine methods) +// is fully typed via our own GenerateResult / LoadingStatus etc. +type TransformersModule = typeof import('@huggingface/transformers'); + +// eslint-disable-next-line @typescript-eslint/no-explicit-any +type AnyModel = any; +// eslint-disable-next-line @typescript-eslint/no-explicit-any +type AnyProcessor = any; + export class LocalLLMEngine { - private engine: MLCEngine | null = null; + private model: AnyModel = null; + private processor: AnyProcessor = null; + private transformers: TransformersModule | null = null; private loadPromise: Promise | null = null; private currentModel: ModelKey | null = null; private _status: LoadingStatus = { state: 'idle' }; @@ -53,17 +70,17 @@ export class LocalLLMEngine { /** * Load a model. Idempotent — returns immediately if already loaded. - * Model weights are cached in browser Cache API for instant reload. + * Model weights are cached in the browser Cache API for instant reload. */ async load(model: ModelKey = DEFAULT_MODEL): Promise { // Already loaded with this model - if (this.engine && this.currentModel === model) return; + if (this.model && this.currentModel === model) return; // Already loading if (this.loadPromise && this.currentModel === model) return this.loadPromise; // Unload previous model if switching - if (this.engine && this.currentModel !== model) { + if (this.model && this.currentModel !== model) { await this.unload(); } @@ -81,21 +98,60 @@ export class LocalLLMEngine { this.setStatus({ state: 'checking' }); try { - const { CreateMLCEngine } = await import('@mlc-ai/web-llm'); + if (!this.transformers) { + this.transformers = await import('@huggingface/transformers'); + } const config = MODELS[model]; - this.engine = await CreateMLCEngine(config.modelId, { - initProgressCallback: (report) => { - if (report.progress < 1) { - this.setStatus({ - state: 'downloading', - progress: report.progress, - text: report.text, - }); - } else { - this.setStatus({ state: 'loading', text: 'Initializing model...' }); - } - }, + // transformers.js progress callback shape: + // { status: 'initiate'|'download'|'progress'|'done'|'ready', + // name?: string, file?: string, progress?: number, loaded?: number, total?: number } + // We collapse it into our LoadingStatus union. + const progressCallback = (report: { + status: string; + file?: string; + name?: string; + progress?: number; + loaded?: number; + total?: number; + }) => { + const label = report.file ?? report.name ?? ''; + if (report.status === 'progress' || report.status === 'download') { + const pct = typeof report.progress === 'number' ? report.progress : 0; + this.setStatus({ + state: 'downloading', + progress: pct / 100, + text: label + ? 
`Downloading ${label} (${pct.toFixed(0)}%)` + : `Downloading (${pct.toFixed(0)}%)`, + }); + } else if (report.status === 'initiate') { + this.setStatus({ state: 'downloading', progress: 0, text: `Starting ${label}` }); + } else if (report.status === 'done') { + this.setStatus({ state: 'loading', text: label ? `Loaded ${label}` : 'Loaded shard' }); + } + // 'ready' is handled below after both processor + model finish + }; + + // AutoProcessor wraps tokenizer + image/audio preprocessors. For + // our text-only chat path we use the wrapped tokenizer's + // apply_chat_template, but loading the full processor is the + // path the model card documents and avoids architecture-specific + // special-casing. + const { AutoProcessor, Gemma4ForConditionalGeneration } = this.transformers as unknown as { + AutoProcessor: { from_pretrained(id: string, opts?: unknown): Promise }; + Gemma4ForConditionalGeneration: { + from_pretrained(id: string, opts?: unknown): Promise; + }; + }; + + this.processor = await AutoProcessor.from_pretrained(config.modelId, { + progress_callback: progressCallback, + }); + this.model = await Gemma4ForConditionalGeneration.from_pretrained(config.modelId, { + dtype: config.dtype, + device: 'webgpu', + progress_callback: progressCallback, }); this.setStatus({ state: 'ready' }); @@ -108,13 +164,15 @@ export class LocalLLMEngine { } /** - * Unload the model and free memory. + * Unload the model and free GPU memory. */ async unload(): Promise { - if (this.engine) { - await this.engine.unload(); - this.engine = null; - } + // transformers.js doesn't expose an explicit dispose() yet — dropping + // the references and letting the runtime/GC clean up is the + // recommended path. The WebGPU buffers are tied to the model object + // and get released when it's no longer reachable. + this.model = null; + this.processor = null; this.currentModel = null; this.loadPromise = null; this.setStatus({ state: 'idle' }); @@ -124,70 +182,85 @@ export class LocalLLMEngine { * Generate a response. Auto-loads the model if not yet loaded. */ async generate(options: GenerateOptions): Promise { - if (!this.engine) { + if (!this.model || !this.processor) { await this.load(); } const start = performance.now(); - if (options.onToken) { - return this._generateStreaming(options, start); - } - - const response = await this.engine!.chat.completions.create({ - messages: options.messages, - temperature: options.temperature ?? 0.7, - max_tokens: options.maxTokens ?? 1024, - stream: false, + // Apply Gemma's chat template via the processor's tokenizer wrapper. + // `add_generation_prompt: true` appends the tokens that tell the model + // "now generate an assistant turn". + const inputs = await this.processor.apply_chat_template(options.messages, { + add_generation_prompt: true, + return_dict: true, + return_tensor: 'pt', + }); + + const promptTokenCount = this.tensorLength(inputs.input_ids); + + // Streaming via TextStreamer if requested + let streamer: unknown = undefined; + if (options.onToken) { + const transformers = this.transformers as TransformersModule; + // eslint-disable-next-line @typescript-eslint/no-explicit-any + const TextStreamer = (transformers as any).TextStreamer; + streamer = new TextStreamer(this.processor.tokenizer, { + skip_prompt: true, + skip_special_tokens: true, + callback_function: (text: string) => { + options.onToken!(text); + }, + }); + } + + const generated = await this.model.generate({ + ...inputs, + max_new_tokens: options.maxTokens ?? 1024, + temperature: options.temperature ?? 
0.7, + do_sample: (options.temperature ?? 0.7) > 0, + streamer, + }); + + // `generated` is a tensor with shape [batch, seq_len_with_prompt]. + // We slice off the prompt portion to get just the new tokens. + const fullSequence = this.tensorRow(generated, 0); + const newTokens = fullSequence.slice(promptTokenCount); + const completionTokenCount = newTokens.length; + + const content: string = this.processor.tokenizer.decode(newTokens, { + skip_special_tokens: true, }); - const choice = response.choices[0]; return { - content: choice.message.content ?? '', + content, usage: { - prompt_tokens: response.usage?.prompt_tokens ?? 0, - completion_tokens: response.usage?.completion_tokens ?? 0, - total_tokens: response.usage?.total_tokens ?? 0, + prompt_tokens: promptTokenCount, + completion_tokens: completionTokenCount, + total_tokens: promptTokenCount + completionTokenCount, }, latencyMs: Math.round(performance.now() - start), }; } - private async _generateStreaming( - options: GenerateOptions, - start: number - ): Promise { - const chunks: string[] = []; - let usage = { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 }; + /** + * Helper: extract the seq-length of a transformers.js Tensor. + * The tensors expose `.dims` ([batch, seq_len]) and `.data` (TypedArray). + */ + // eslint-disable-next-line @typescript-eslint/no-explicit-any + private tensorLength(tensor: any): number { + if (!tensor || !tensor.dims) return 0; + return tensor.dims[tensor.dims.length - 1]; + } - const stream = await this.engine!.chat.completions.create({ - messages: options.messages, - temperature: options.temperature ?? 0.7, - max_tokens: options.maxTokens ?? 1024, - stream: true, - stream_options: { include_usage: true }, - }); - - for await (const chunk of stream) { - const delta = chunk.choices[0]?.delta?.content; - if (delta) { - chunks.push(delta); - options.onToken!(delta); - } - if (chunk.usage) { - usage = { - prompt_tokens: chunk.usage.prompt_tokens, - completion_tokens: chunk.usage.completion_tokens, - total_tokens: chunk.usage.total_tokens, - }; - } - } - - return { - content: chunks.join(''), - usage, - latencyMs: Math.round(performance.now() - start), - }; + /** + * Helper: extract row N of a 2D tensor as a number array. + */ + // eslint-disable-next-line @typescript-eslint/no-explicit-any + private tensorRow(tensor: any, row: number): number[] { + const seqLen = tensor.dims[tensor.dims.length - 1]; + const start = row * seqLen; + return Array.from(tensor.data.slice(start, start + seqLen)) as number[]; } /** diff --git a/packages/local-llm/src/generate.ts b/packages/local-llm/src/generate.ts deleted file mode 100644 index 2ac7a0112..000000000 --- a/packages/local-llm/src/generate.ts +++ /dev/null @@ -1,112 +0,0 @@ -import { getEngine } from './engine.js'; - -export interface ChatMessage { - role: 'system' | 'user' | 'assistant'; - content: string; -} - -export interface GenerateOptions { - messages: ChatMessage[]; - temperature?: number; - maxTokens?: number; - onToken?: (token: string) => void; -} - -export interface GenerateResult { - content: string; - latencyMs: number; - usage: { - prompt_tokens: number; - completion_tokens: number; - }; -} - -export async function generate(options: GenerateOptions): Promise { - const engine = getEngine(); - if (!engine) throw new Error('No model loaded. 
Call loadLocalLlm() first.'); - - const { messages, temperature = 0.7, maxTokens = 1024, onToken } = options; - const start = performance.now(); - - const reply = await engine.chat.completions.create({ - messages, - temperature, - max_tokens: maxTokens, - stream: !!onToken, - stream_options: onToken ? { include_usage: true } : undefined, - }); - - let content = ''; - let promptTokens = 0; - let completionTokens = 0; - - if (Symbol.asyncIterator in Object(reply)) { - for await (const chunk of reply as AsyncIterable) { - const delta = chunk.choices?.[0]?.delta?.content; - if (delta) { - content += delta; - onToken?.(delta); - } - if (chunk.usage) { - promptTokens = chunk.usage.prompt_tokens ?? 0; - completionTokens = chunk.usage.completion_tokens ?? 0; - } - } - } else { - const completion = reply as any; - content = completion.choices?.[0]?.message?.content ?? ''; - promptTokens = completion.usage?.prompt_tokens ?? 0; - completionTokens = completion.usage?.completion_tokens ?? 0; - } - - const latencyMs = Math.round(performance.now() - start); - - return { - content, - latencyMs, - usage: { prompt_tokens: promptTokens, completion_tokens: completionTokens }, - }; -} - -export async function extractJson(text: string, instruction: string): Promise { - const result = await generate({ - messages: [ - { - role: 'system', - content: - 'You are a JSON extraction assistant. Respond ONLY with valid JSON, no explanation or markdown.', - }, - { - role: 'user', - content: `${instruction}\n\nText:\n${text}`, - }, - ], - temperature: 0.1, - maxTokens: 2048, - }); - - const jsonMatch = result.content.match(/[[{][\s\S]*[}\]]/); - if (!jsonMatch) throw new Error('No JSON found in response'); - return JSON.parse(jsonMatch[0]); -} - -export async function classify(text: string, categories: string[]): Promise { - const result = await generate({ - messages: [ - { - role: 'system', - content: `You are a text classifier. Classify the text into exactly one of these categories: ${categories.join(', ')}. Respond with ONLY the category name, nothing else.`, - }, - { - role: 'user', - content: text, - }, - ], - temperature: 0.1, - maxTokens: 50, - }); - - const response = result.content.trim().toLowerCase(); - const match = categories.find((c) => response.includes(c.toLowerCase())); - return match ?? result.content.trim(); -} diff --git a/packages/local-llm/src/models.ts b/packages/local-llm/src/models.ts index 200219a1a..bf8ab1c69 100644 --- a/packages/local-llm/src/models.ts +++ b/packages/local-llm/src/models.ts @@ -2,40 +2,29 @@ import type { ModelConfig } from './types'; /** * Pre-configured models for client-side inference. - * All models are quantized for browser use via WebLLM/MLC. + * + * All models are ONNX builds loaded via @huggingface/transformers (transformers.js) + * with the WebGPU backend. The default is Google's Gemma 4 E2B — the smallest + * member of the Gemma 4 family released 2026-04-02. E2B stands for "Effective 2B" + * and is multimodal (text + image + audio) at the model level, but our chat-only + * code path only ever passes text. + * + * Adding a new model: pick a HuggingFace ONNX repo (look on huggingface.co/onnx-community + * for community-converted models, or huggingface.co/{org}/{repo}-ONNX for first-party + * builds), confirm it has a `q4f16` quantization in its `onnx/` directory, and add an + * entry below. The /llm-test page picks up new entries automatically. 
*/ export const MODELS = { - /** Default model — fast, good at structured output, multilingual */ - 'qwen-2.5-1.5b': { - modelId: 'Qwen2.5-1.5B-Instruct-q4f16_1-MLC', - displayName: 'Qwen 2.5 1.5B', - downloadSizeMb: 1000, - ramUsageMb: 1800, - }, - /** Smaller variant for low-end devices */ - 'qwen-2.5-0.5b': { - modelId: 'Qwen2.5-0.5B-Instruct-q4f16_1-MLC', - displayName: 'Qwen 2.5 0.5B', - downloadSizeMb: 400, - ramUsageMb: 800, - }, - /** Google Gemma 2 — strong general-purpose model, similar size class to Qwen 1.5B */ - 'gemma-2-2b': { - modelId: 'gemma-2-2b-it-q4f16_1-MLC', - displayName: 'Gemma 2 2B', - downloadSizeMb: 1400, - ramUsageMb: 2200, - }, - /** Google Gemma 2 9B — much higher quality, needs a beefy GPU (~6GB VRAM) */ - 'gemma-2-9b': { - modelId: 'gemma-2-9b-it-q4f16_1-MLC', - displayName: 'Gemma 2 9B', - downloadSizeMb: 5300, - ramUsageMb: 6500, + 'gemma-4-e2b': { + modelId: 'onnx-community/gemma-4-E2B-it-ONNX', + displayName: 'Gemma 4 E2B', + dtype: 'q4f16', + downloadSizeMb: 500, + ramUsageMb: 1500, }, } as const satisfies Record; export type ModelKey = keyof typeof MODELS; -export const DEFAULT_MODEL: ModelKey = 'qwen-2.5-1.5b'; +export const DEFAULT_MODEL: ModelKey = 'gemma-4-e2b'; diff --git a/packages/local-llm/src/status.svelte.ts b/packages/local-llm/src/status.svelte.ts deleted file mode 100644 index b37057d7f..000000000 --- a/packages/local-llm/src/status.svelte.ts +++ /dev/null @@ -1,22 +0,0 @@ -import { subscribe, type LlmState } from './engine.js'; - -/** - * Reactive status wrapper for use in Svelte 5 components. - * Returns an object with a `current` property that updates reactively. - */ -export function getLocalLlmStatus(): { current: LlmState } { - let state = $state({ state: 'idle' }); - - $effect(() => { - const unsub = subscribe((s) => { - state = s; - }); - return unsub; - }); - - return { - get current() { - return state; - }, - }; -} diff --git a/packages/local-llm/src/types.ts b/packages/local-llm/src/types.ts index 1ac245ed5..e9b0c728b 100644 --- a/packages/local-llm/src/types.ts +++ b/packages/local-llm/src/types.ts @@ -33,10 +33,19 @@ export interface GenerateResult { } export interface ModelConfig { - /** WebLLM model identifier */ + /** HuggingFace ONNX repo id, e.g. "onnx-community/gemma-4-E2B-it-ONNX" */ modelId: string; /** Human-readable name */ displayName: string; + /** + * Quantization the transformers.js loader should request. 
Common values: + * - "fp32" — full precision, biggest, only for tiny models + * - "fp16" — half precision, ~50% smaller than fp32 + * - "q8" — 8-bit weights, fp32 activations + * - "q4" — 4-bit weights, fp32 activations + * - "q4f16" — 4-bit weights, fp16 activations (recommended for WebGPU) + */ + dtype: 'fp32' | 'fp16' | 'q8' | 'q4' | 'q4f16'; /** Approximate download size in MB */ downloadSizeMb: number; /** Approximate VRAM/RAM usage in MB */ diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml index a8ebe5073..b0be3bea5 100644 --- a/pnpm-lock.yaml +++ b/pnpm-lock.yaml @@ -933,6 +933,9 @@ importers: '@calc/shared': specifier: workspace:* version: link:../../../calc/packages/shared + '@huggingface/transformers': + specifier: ^4.0.0 + version: 4.0.1 '@mana/credits': specifier: workspace:^ version: link:../../../../packages/credits @@ -1011,9 +1014,6 @@ importers: '@mana/wallpaper-generator': specifier: workspace:* version: link:../../../../packages/wallpaper-generator - '@mlc-ai/web-llm': - specifier: ^0.2.78 - version: 0.2.82 '@types/suncalc': specifier: ^1.9.2 version: 1.9.2 @@ -2680,9 +2680,9 @@ importers: packages/local-llm: dependencies: - '@mlc-ai/web-llm': - specifier: ^0.2.78 - version: 0.2.82 + '@huggingface/transformers': + specifier: ^4.0.0 + version: 4.0.1 devDependencies: '@types/node': specifier: ^24.10.1 @@ -3333,6 +3333,9 @@ importers: services/mana-analytics: dependencies: + '@mana/shared-hono': + specifier: workspace:* + version: link:../../packages/shared-hono drizzle-orm: specifier: ^0.38.3 version: 0.38.4(@opentelemetry/api@1.9.1)(@types/pg@8.6.1)(@types/react@19.2.14)(bun-types@1.3.11)(kysely@0.28.15)(postgres@3.4.9)(react@19.2.0) @@ -3360,6 +3363,9 @@ importers: services/mana-auth: dependencies: + '@mana/shared-hono': + specifier: workspace:* + version: link:../../packages/shared-hono bcryptjs: specifier: ^3.0.2 version: 3.0.3 @@ -3399,6 +3405,9 @@ importers: services/mana-credits: dependencies: + '@mana/shared-hono': + specifier: workspace:* + version: link:../../packages/shared-hono bcryptjs: specifier: ^3.0.2 version: 3.0.3 @@ -3574,6 +3583,9 @@ importers: services/mana-subscriptions: dependencies: + '@mana/shared-hono': + specifier: workspace:* + version: link:../../packages/shared-hono drizzle-orm: specifier: ^0.38.3 version: 0.38.4(@opentelemetry/api@1.9.1)(@types/pg@8.6.1)(@types/react@19.2.14)(bun-types@1.3.11)(kysely@0.28.15)(postgres@3.4.9)(react@19.2.0) @@ -3604,6 +3616,9 @@ importers: services/mana-user: dependencies: + '@mana/shared-hono': + specifier: workspace:* + version: link:../../packages/shared-hono drizzle-orm: specifier: ^0.38.3 version: 0.38.4(@opentelemetry/api@1.9.1)(@types/pg@8.6.1)(@types/react@19.2.14)(bun-types@1.3.11)(kysely@0.28.15)(postgres@3.4.9)(react@19.2.0) @@ -6222,6 +6237,16 @@ packages: '@modelcontextprotocol/sdk': optional: true + '@huggingface/jinja@0.5.6': + resolution: {integrity: sha512-MyMWyLnjqo+KRJYSH7oWNbsOn5onuIvfXYPcc0WOGxU0eHUV7oAYUoQTl2BMdu7ml+ea/bu11UM+EshbeHwtIA==} + engines: {node: '>=18'} + + '@huggingface/tokenizers@0.1.3': + resolution: {integrity: sha512-8rF/RRT10u+kn7YuUbUg0OF30K8rjTc78aHpxT+qJ1uWSqxT1MHi8+9ltwYfkFYJzT/oS+qw3JVfHtNMGAdqyA==} + + '@huggingface/transformers@4.0.1': + resolution: {integrity: sha512-tAQYEy+cnW0ku/NxBSjFXCymi+DZa1/JkoGf4McxjzO36CZZIL/J4TF6X7i/tzs75yTjshUDgsvSz03s2xym2A==} + '@humanfs/core@0.19.1': resolution: {integrity: sha512-5DyQ4+1JEUzejeK1JGICcideyfUbGixgS9jNgex5nqkW+cY7WZhxBigmieN5Qnw9ZosSNVC9KQKyb+GUaGyKUA==} engines: {node: '>=18.18.0'} @@ -6640,9 +6665,6 @@ packages: 
'@mdx-js/mdx@3.1.1': resolution: {integrity: sha512-f6ZO2ifpwAQIpzGWaBQT2TXxPv6z3RBzQKpVftEWN78Vl/YweF1uwussDx8ECAXVtr3Rs89fKyG9YlzUs9DyGQ==} - '@mlc-ai/web-llm@0.2.82': - resolution: {integrity: sha512-ONhW+28PPVSUI1m0RkJcm7suwc47b65i5b/rTEIADq5I22p1+9uf/CBbDPRkkjj1WJB9s8oFp0ywAW0NY1G6fg==} - '@mozilla/readability@0.5.0': resolution: {integrity: sha512-Z+CZ3QaosfFaTqvhQsIktyGrjFjSC0Fa4EMph4mqKnWhmyoGICsV/8QK+8HpXut6zV7zwfWwqDmEjtk1Qf6EgQ==} engines: {node: '>=14.0.0'} @@ -9221,6 +9243,10 @@ packages: engines: {node: '>=0.4.0'} hasBin: true + adm-zip@0.5.17: + resolution: {integrity: sha512-+Ut8d9LLqwEvHHJl1+PIHqoyDxFgVN847JTVM3Izi3xHDWPE4UtzzXysMZQs64DMcrJfBeS/uoEP4AD3HQHnQQ==} + engines: {node: '>=12.0'} + agent-base@7.1.4: resolution: {integrity: sha512-MnA+YT8fwfJPgBx3m60MNqakm30XOkyIoH1y6huTQvC0PwZG7ki8NacLBcrPbNoo8vEZy7Jpuk7+jMO+CUovTQ==} engines: {node: '>= 14'} @@ -9711,6 +9737,10 @@ packages: boolbase@1.0.0: resolution: {integrity: sha512-JZOSA7Mo9sNGB8+UjSgzdLtokWAky1zbztM3WRLCbZ70/3cTANmQmOdR7y2g+J0e2WXywy1yS468tY+IruqEww==} + boolean@3.2.0: + resolution: {integrity: sha512-d0II/GO9uf9lfUHH2BQsjxzRJZBdsjgsBiW4BvhWk/3qoKwQFjIDVN19PfX8F2D/r9PCMTtLWjYVCFrpeYUzsw==} + deprecated: Package no longer supported. Contact Support at https://www.npmjs.com/support for more info. + bowser@2.14.1: resolution: {integrity: sha512-tzPjzCxygAKWFOJP011oxFHs57HzIhOEracIgAePE4pqB3LikALKnSzUyU4MGs9/iCEUuHlAJTjTc5M+u7YEGg==} @@ -10442,6 +10472,9 @@ packages: detect-node-es@1.1.0: resolution: {integrity: sha512-ypdmJU/TbBby2Dxibuv7ZLW3Bs1QEmM7nHjEANfohJLvE0XVujisn1qPJcZxg+qDucsr+bP6fLD1rPS3AhJ7EQ==} + detect-node@2.1.0: + resolution: {integrity: sha512-T0NIuQpnTvFDATNuHN5roPwSBG83rFsuO+MXXH9/3N1eFbn4wcPjttvjMLEPWJ0RGUYgQE7cGgS3tNxbqCGM7g==} + deterministic-object-hash@2.0.2: resolution: {integrity: sha512-KxektNH63SrbfUyDiwXqRb1rLwKt33AmMv+5Nhsw1kqZ13SJBRTgZHtGbE+hH3a1mVW1cz+4pqSWVPAtLVXTzQ==} engines: {node: '>=18'} @@ -11059,6 +11092,9 @@ packages: resolution: {integrity: sha512-p2snDhiLaXe6dahss1LddxqEm+SkuDvV8dnIQG0MWjyHpcMNfXKPE+/Cc0y+PhxJX3A4xGNeFCj5oc0BUh6deg==} engines: {node: '>=0.10'} + es6-error@4.1.1: + resolution: {integrity: sha512-Um/+FxMr9CISWh0bi5Zv0iOD+4cFh5qLeks1qhAopKVAJw3drgKbKySikp7wGhDL0HPeaja0P5ULZrxLkniUVg==} + es6-iterator@2.0.3: resolution: {integrity: sha512-zw4SRzoUkd+cl+ZoE15A9o1oQd920Bb0iOJMQkQhl3jNc03YqVjAhG7scf9C5KWRU/R13Orf588uCC6525o02g==} @@ -12267,6 +12303,9 @@ packages: resolution: {integrity: sha512-f7ccFPK3SXFHpx15UIGyRJ/FJQctuKZ0zVuN3frBo4HnK3cay9VEW0R6yPYFHC0AgqhukPzKjq22t5DmAyqGyw==} engines: {node: '>=16'} + flatbuffers@25.9.23: + resolution: {integrity: sha512-MI1qs7Lo4Syw0EOzUl0xjs2lsoeqFku44KpngfIduHBYvzm8h2+7K8YMQh1JtVVVrUvhLpNwqVi4DERegUJhPQ==} + flatted@3.4.2: resolution: {integrity: sha512-PjDse7RzhcPkIJwy5t7KPWQSZ9cAbzQXcafsetQoD7sOJRQlGikNbx7yZp2OotDnJyrDcbyRq3Ttb18iYOqkxA==} @@ -12490,6 +12529,10 @@ packages: engines: {node: '>=16 || 14 >=14.17'} deprecated: Old versions of glob are not supported, and contain widely publicized security vulnerabilities, which have been fixed in the current version. Please update. 
Support for old versions may be purchased (at exorbitant rates) by contacting i@izs.me + global-agent@3.0.0: + resolution: {integrity: sha512-PT6XReJ+D07JvGoxQMkT6qji/jVNfX/h364XHZOWeRzy64sSFr+xJ5OX7LI3b4MPQzdL4H8Y8M0xzPpsVMwA8Q==} + engines: {node: '>=10.0'} + globals@13.24.0: resolution: {integrity: sha512-AhO5QUcj8llrbG09iWhPU2B204J1xnPeL8kQmVorSsy+Sjj1sk8gIyh6cUocGmH4L0UuhAJy+hJMRA4mgA4mFQ==} engines: {node: '>=8'} @@ -12542,6 +12585,9 @@ packages: resolution: {integrity: sha512-5v6yZd4JK3eMI3FqqCouswVqwugaA9r4dNZB1wwcmrD02QkV5H0y7XBQW8QwQqEaZY1pM9aqORSORhJRdNK44Q==} engines: {node: '>=6.0'} + guid-typescript@1.0.9: + resolution: {integrity: sha512-Y8T4vYhEfwJOTbouREvG+3XDsjr8E3kIr7uf+JZ0BYloFsttiHU0WfvANVsR7TxNUJa/WpCnw/Ino/p+DeBhBQ==} + h3@1.15.11: resolution: {integrity: sha512-L3THSe2MPeBwgIZVSH5zLdBBU90TOxarvhK9d04IDY2AmVS8j2Jz2LIWtwsGOU3lu2I5jCN7FNvVfY2+XyF+mg==} @@ -13366,6 +13412,9 @@ packages: resolution: {integrity: sha512-qtYiSSFlwot9XHtF9bD9c7rwKjr+RecWT//ZnPvSmEjpV5mmPOCN4j8UjY5hbjNkOwZ/jQv3J6R1/pL7RwgMsg==} engines: {node: '>= 0.4'} + json-stringify-safe@5.0.1: + resolution: {integrity: sha512-ZClg6AaYvamvYEE82d3Iyd3vSSIjQ+odgjaTzRuO3s7toCdFKczob2i0zCh7JE8kWn17yvAWhUVxvqGwUalsRA==} + json5@1.0.2: resolution: {integrity: sha512-g1MWMLBiz8FKi1e4w0UyVL3w+iJceWAFBAaBnnGKOpNa5f8TLktkbre1+s6oICydWAm+HRUGTmI+//xv2hvXYA==} hasBin: true @@ -13691,10 +13740,6 @@ packages: resolution: {integrity: sha512-9ie8ItPR6tjY5uYJh8K/Zrv/RMZ5VOlOWvtZdEHYSTFKZfIBPQa9tOAEeAWhd+AnIneLJ22w5fjOYtoutpWq5w==} engines: {node: '>=18'} - loglevel@1.9.2: - resolution: {integrity: sha512-HgMmCqIJSAKqo68l0rS2AanEWfkxaZ5wNiEFb5ggm08lDs9Xl2KxBlX3PTcaD2chBM1gXAYf491/M2Rv8Jwayg==} - engines: {node: '>= 0.6.0'} - long@5.3.2: resolution: {integrity: sha512-mNAgZ1GmyNhD7AuqnTG3/VQ26o760+ZYBPKjPvugO8+nLbYfX6TVpJPseBvopbdY+qpZ/lKUnmEc1LeZYS3QAA==} @@ -13786,6 +13831,10 @@ packages: marky@1.3.0: resolution: {integrity: sha512-ocnPZQLNpvbedwTy9kNrQEsknEfgvcLMvOtz3sFeWApDq1MXH1TqkCIx58xlpESsfwQOnuBO9beyQuNGzVvuhQ==} + matcher@3.0.0: + resolution: {integrity: sha512-OkeDaAZ/bQCxeFAozM55PKcKU0yJMPGifLwV4Qgjitu+5MoAfSQN4lsLJeXZ1b8w0x+/Emda6MZgXS1jvsapng==} + engines: {node: '>=10'} + math-intrinsics@1.1.0: resolution: {integrity: sha512-/IXtbwEk5HTPyEwyKX6hGkYXxM9nbj64B+ilVJnC/R6B0pH5G4V3b0pVbL7DBj4tkhBAppbQUlf6F6Xl9LHu1g==} engines: {node: '>= 0.4'} @@ -14483,6 +14532,19 @@ packages: oniguruma-to-es@4.3.5: resolution: {integrity: sha512-Zjygswjpsewa0NLTsiizVuMQZbp0MDyM6lIt66OxsF21npUDlzpHi1Mgb/qhQdkb+dWFTzJmFbEWdvZgRho8eQ==} + onnxruntime-common@1.24.0-dev.20251116-b39e144322: + resolution: {integrity: sha512-BOoomdHYmNRL5r4iQ4bMvsl2t0/hzVQ3OM3PHD0gxeXu1PmggqBv3puZicEUVOA3AtHHYmqZtjMj9FOfGrATTw==} + + onnxruntime-common@1.24.3: + resolution: {integrity: sha512-GeuPZO6U/LBJXvwdaqHbuUmoXiEdeCjWi/EG7Y1HNnDwJYuk6WUbNXpF6luSUY8yASul3cmUlLGrCCL1ZgVXqA==} + + onnxruntime-node@1.24.3: + resolution: {integrity: sha512-JH7+czbc8ALA819vlTgcV+Q214/+VjGeBHDjX81+ZCD0PCVCIFGFNtT0V4sXG/1JXypKPgScQcB3ij/hk3YnTg==} + os: [win32, darwin, linux] + + onnxruntime-web@1.25.0-dev.20260327-722743c0e2: + resolution: {integrity: sha512-8PXdZy4Ekhg10CLg+cFFt39b4tFDGMRJB6lGjnQL6eA+2boUQYDymZ0gtxiS+H6oIWoCjQp/ziyirvFbaFKfiw==} + open@7.4.2: resolution: {integrity: sha512-MVHddDVweXZF3awtlAS+6pgKLlm/JgxZ90+/NBurBoQctVOOB/zDdVjcyPzQ+0laDGbsWgrRkflI65sQeOgT9Q==} engines: {node: '>=8'} @@ -14754,6 +14816,9 @@ packages: resolution: {integrity: sha512-nDywThFk1i4BQK4twPQ6TA4RT8bDY96yeuCVBWL3ePARCiEKDRSrNGbFIgUJpLp+XeIR65v8ra7WuJOFUBtkMA==} engines: {node: 
'>=8'} + platform@1.3.6: + resolution: {integrity: sha512-fnWVljUchTro6RiCFvCXBbNhJc2NijN7oIQxbwsyL0buWJPG85v81ehlHI9fXrJsMNgTofEoWIQeClKpgxFLrg==} + playwright-core@1.59.1: resolution: {integrity: sha512-HBV/RJg81z5BiiZ9yPzIiClYV/QMsDCKUyogwH9p3MCP6IYjUFu/MActgYAvK0oWyV9NlwM3GLBjADyWgydVyg==} engines: {node: '>=18'} @@ -15732,6 +15797,10 @@ packages: deprecated: Rimraf versions prior to v4 are no longer supported hasBin: true + roarr@2.15.4: + resolution: {integrity: sha512-CHhPh+UNHD2GTXNYhPWLnU8ONHdI+5DI+4EYIAOaiD63rHeYlZvyh8P+in5999TTSFgUYuKUAjzRI4mdh/p+2A==} + engines: {node: '>=8.0'} + rollup@2.80.0: resolution: {integrity: sha512-cIFJOD1DESzpjOBl763Kp1AH7UE/0fcdHe6rZXUdQ9c50uvgigvW97u3IcSeBwOkgqL/PXPBktBCh0KEu5L8XQ==} engines: {node: '>=10.0.0'} @@ -15836,6 +15905,9 @@ packages: resolution: {integrity: sha512-vfD3pmTzGpufjScBh50YHKzEu2lxBWhVEHsNGoEXmCmn2hKGfeNLYMzCJpe8cD7gqX7TJluOVpBkAequ6dgMmA==} engines: {node: '>=4'} + semver-compare@1.0.0: + resolution: {integrity: sha512-YM3/ITh2MJ5MtzaM429anh+x2jiLVjqILF4m4oyQB18W7Ggea7BfqdH/wGMK7dDiMghv/6WG7znWMwUDzJiXow==} + semver@6.3.1: resolution: {integrity: sha512-BR7VvDCVHO+q2xBEWskxS6DJE1qRnb7DxzUrogb71CWoSficBxYsiAGd+Kl0mmq/MprG9yArRkyrQxTO6XjMzA==} hasBin: true @@ -15868,6 +15940,10 @@ packages: resolution: {integrity: sha512-ghgmKt5o4Tly5yEG/UJp8qTd0AN7Xalw4XBtDEKP655B699qMEtra1WlXeE6WIvdEG481JvRxULKsInq/iNysw==} engines: {node: '>=0.10.0'} + serialize-error@7.0.1: + resolution: {integrity: sha512-8I8TjW5KMOKsZQTvoxjuSIa7foAwPWGOts+6o7sgjz41/qMD9VQHEDxi6PBvK2l0MXUmqZyNpUK+T2tQaaElvw==} + engines: {node: '>=10'} + serialize-javascript@6.0.2: resolution: {integrity: sha512-Saa1xPByTTq2gdeFZYLLo+RFE35NHZkAbqZeWNd3BpzppeVisAqpDjcp8dyf6uIvEqJRd46jemmyA4iFIeVk8g==} @@ -16058,6 +16134,9 @@ packages: sprintf-js@1.0.3: resolution: {integrity: sha512-D9cPgkvLlV3t3IzL0D0YLvGA9Ahk4PcvVwUbN0dSGr1aP0Nrt4AEnTUbuGvquEC0mA64Gqt1fzirlRs5ibXx8g==} + sprintf-js@1.1.3: + resolution: {integrity: sha512-Oo+0REFV59/rz3gfJNKQiBlwfHaSESl1pcGyABQsnnIfWOFt6JNj5gCog2U6MLZ//IGYD+nA8nI+mTShREReaA==} + stable-hash@0.0.5: resolution: {integrity: sha512-+L3ccpzibovGXFK+Ap/f8LOS0ahMrHTf3xu7mMLSpEGU0EO9ucaysSylKo9eRDFNhWve/y275iPmIZ4z39a9iA==} @@ -16628,6 +16707,10 @@ packages: resolution: {integrity: sha512-0fr/mIH1dlO+x7TlcMy+bIDqKPsw/70tVyeHW787goQjhmqaZe10uwLujubK9q9Lg6Fiho1KUKDYz0Z7k7g5/g==} engines: {node: '>=4'} + type-fest@0.13.1: + resolution: {integrity: sha512-34R7HTnG0XIJcBSn5XhDd7nNFPRcXYRZrBB2O2jdKqYODldSzBAqzsWoZYYvduky73toYS/ESqxPvkDf/F0XMg==} + engines: {node: '>=10'} + type-fest@0.16.0: resolution: {integrity: sha512-eaBzG6MxNzEn9kiwvtre90cXaNLkmadMWa1zQMs3XORCXNbsH/OewwbxC5ia9dCxIxnTAsSxXJaa/p5y8DlvJg==} engines: {node: '>=10'} @@ -21353,6 +21436,18 @@ snapshots: - supports-color - utf-8-validate + '@huggingface/jinja@0.5.6': {} + + '@huggingface/tokenizers@0.1.3': {} + + '@huggingface/transformers@4.0.1': + dependencies: + '@huggingface/jinja': 0.5.6 + '@huggingface/tokenizers': 0.1.3 + onnxruntime-node: 1.24.3 + onnxruntime-web: 1.25.0-dev.20260327-722743c0e2 + sharp: 0.34.5 + '@humanfs/core@0.19.1': {} '@humanfs/node@0.16.7': @@ -21850,10 +21945,6 @@ snapshots: transitivePeerDependencies: - supports-color - '@mlc-ai/web-llm@0.2.82': - dependencies: - loglevel: 1.9.2 - '@mozilla/readability@0.5.0': {} '@msgpackr-extract/msgpackr-extract-darwin-arm64@3.0.3': @@ -26002,6 +26093,8 @@ snapshots: acorn@8.16.0: {} + adm-zip@0.5.17: {} + agent-base@7.1.4: {} agentkeepalive@4.6.0: @@ -26885,6 +26978,8 @@ snapshots: boolbase@1.0.0: {} + boolean@3.2.0: {} 
+ bowser@2.14.1: {} boxen@8.0.1: @@ -27634,6 +27729,8 @@ snapshots: detect-node-es@1.1.0: {} + detect-node@2.1.0: {} + deterministic-object-hash@2.0.2: dependencies: base-64: 1.0.0 @@ -27973,6 +28070,8 @@ snapshots: esniff: 2.0.1 next-tick: 1.1.0 + es6-error@4.1.1: {} + es6-iterator@2.0.3: dependencies: d: 1.0.2 @@ -30635,6 +30734,8 @@ snapshots: flatted: 3.4.2 keyv: 4.5.4 + flatbuffers@25.9.23: {} + flatted@3.4.2: {} flattie@1.1.1: {} @@ -30888,6 +30989,15 @@ snapshots: minipass: 4.2.8 path-scurry: 1.11.1 + global-agent@3.0.0: + dependencies: + boolean: 3.2.0 + es6-error: 4.1.1 + matcher: 3.0.0 + roarr: 2.15.4 + semver: 7.7.4 + serialize-error: 7.0.1 + globals@13.24.0: dependencies: type-fest: 0.20.2 @@ -30942,6 +31052,8 @@ snapshots: section-matter: 1.0.0 strip-bom-string: 1.0.0 + guid-typescript@1.0.9: {} + h3@1.15.11: dependencies: cookie-es: 1.2.3 @@ -32170,6 +32282,8 @@ snapshots: jsonify: 0.0.1 object-keys: 1.1.1 + json-stringify-safe@5.0.1: {} + json5@1.0.2: dependencies: minimist: 1.2.8 @@ -32449,8 +32563,6 @@ snapshots: strip-ansi: 7.2.0 wrap-ansi: 9.0.2 - loglevel@1.9.2: {} - long@5.3.2: {} longest-streak@3.1.0: {} @@ -32528,6 +32640,10 @@ snapshots: marky@1.3.0: {} + matcher@3.0.0: + dependencies: + escape-string-regexp: 4.0.0 + math-intrinsics@1.1.0: {} mdast-util-definitions@6.0.0: @@ -33786,6 +33902,25 @@ snapshots: regex: 6.1.0 regex-recursion: 6.0.2 + onnxruntime-common@1.24.0-dev.20251116-b39e144322: {} + + onnxruntime-common@1.24.3: {} + + onnxruntime-node@1.24.3: + dependencies: + adm-zip: 0.5.17 + global-agent: 3.0.0 + onnxruntime-common: 1.24.3 + + onnxruntime-web@1.25.0-dev.20260327-722743c0e2: + dependencies: + flatbuffers: 25.9.23 + guid-typescript: 1.0.9 + long: 5.3.2 + onnxruntime-common: 1.24.0-dev.20251116-b39e144322 + platform: 1.3.6 + protobufjs: 7.5.4 + open@7.4.2: dependencies: is-docker: 2.2.1 @@ -34094,6 +34229,8 @@ snapshots: dependencies: find-up: 3.0.0 + platform@1.3.6: {} + playwright-core@1.59.1: {} playwright@1.59.1: @@ -35557,6 +35694,15 @@ snapshots: dependencies: glob: 7.2.3 + roarr@2.15.4: + dependencies: + boolean: 3.2.0 + detect-node: 2.1.0 + globalthis: 1.0.4 + json-stringify-safe: 5.0.1 + semver-compare: 1.0.0 + sprintf-js: 1.1.3 + rollup@2.80.0: optionalDependencies: fsevents: 2.3.3 @@ -35690,6 +35836,8 @@ snapshots: extend-shallow: 2.0.1 kind-of: 6.0.3 + semver-compare@1.0.0: {} + semver@6.3.1: {} semver@7.6.3: {} @@ -35720,6 +35868,10 @@ snapshots: serialize-error@2.1.0: {} + serialize-error@7.0.1: + dependencies: + type-fest: 0.13.1 + serialize-javascript@6.0.2: dependencies: randombytes: 2.1.0 @@ -35974,6 +36126,8 @@ snapshots: sprintf-js@1.0.3: {} + sprintf-js@1.1.3: {} + stable-hash@0.0.5: {} stack-utils@2.0.6: @@ -36604,6 +36758,8 @@ snapshots: type-detect@4.0.8: {} + type-fest@0.13.1: {} + type-fest@0.16.0: {} type-fest@0.20.2: {}