After yesterday's type-check cascade repair (c34175afa), the root
`pnpm run type-check` progressed through 5 more packages but still
stopped on two pre-existing failures:
- `services/mana-media` delivery route: `c.body(transformedBuffer)`
passed a Node `Buffer<ArrayBufferLike>`, but Hono 4.7 types the
body argument as `Uint8Array<ArrayBuffer>` (strict — no
ArrayBufferLike). `Uint8Array.from(buf)` gives a clean copy with a
fresh `ArrayBuffer` backing that the strict type accepts (sketch
after this list). The runtime cost for a handful of KB per image
transform is negligible next to the Sharp pipeline that produced
the buffer.
- `packages/shared-llm`: same rune issue as local-stt + local-llm —
`store.svelte.ts` uses `$state` and transitively pulls in
`local-llm/src/svelte.svelte.ts`. Plain tsc can't resolve Svelte 5
runes. Same treatment: the `type-check` script explicitly skips with
a message pointing at svelte-check.
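Sketch of the mana-media fix (route path and transform() are illustrative,
not the real delivery handler):
  // Sharp's toBuffer() yields a Node Buffer typed as Uint8Array<ArrayBufferLike>;
  // Uint8Array.from() copies it into a fresh ArrayBuffer-backed array that
  // satisfies Hono 4.7's stricter body type.
  import { Hono } from 'hono';

  const app = new Hono();

  app.get('/media/:id', async (c) => {
    const transformedBuffer = await transform(c.req.param('id')); // Buffer<ArrayBufferLike>
    c.header('Content-Type', 'image/webp');
    return c.body(Uint8Array.from(transformedBuffer));            // Uint8Array<ArrayBuffer>
  });

  // Stand-in for the Sharp pipeline; returns a Node Buffer.
  declare function transform(id: string): Promise<Buffer>;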
Root `pnpm run type-check` now reaches `@context/mobile`, which has
real code-level type errors (adapter shape mismatches, an RN event-
handler typing drift, and a deleted Supabase module still imported by
`utils/supabaseTest.ts`). Those need domain changes, not config
tweaks — out of scope for this repair pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move getUserMessage() to the base LlmError class so every error type
gets a German explanation with a clickable settings deep-link:
- TierTooLowError: "Kein KI-Modell aktiviert. Mindestens X benötigt."
- ProviderBlockedError: "… hat die Anfrage blockiert (Inhaltsfilter)."
- BackendUnreachableError: "… ist nicht erreichbar."
- EdgeLoadFailedError: "Browser-Modell konnte nicht geladen werden."
- Generic fallback: also includes the settings link now
The companion engine now catches LlmError (base class) instead of
only NoTierAvailableError, covering all failure modes.
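In sketch form (the deep-link constant, the exact strings, and the engine
wiring below are illustrative, not the real code):
  const SETTINGS_LINK = '[KI-Einstellungen öffnen](/?app=settings#ai-options)';

  class LlmError extends Error {
    // Base implementation: generic German explanation + settings deep-link.
    getUserMessage(): string {
      return `Die KI-Anfrage ist fehlgeschlagen. ${SETTINGS_LINK}`;
    }
  }

  class TierTooLowError extends LlmError {
    constructor(readonly minTier: string) {
      super(`tier too low, need at least ${minTier}`);
    }
    override getUserMessage(): string {
      return `Kein KI-Modell aktiviert. Mindestens "${this.minTier}" benötigt. ${SETTINGS_LINK}`;
    }
  }

  // Companion engine: catching the base class covers every failure mode.
  async function askCompanion(run: () => Promise<string>): Promise<string> {
    try {
      return await run();
    } catch (err) {
      if (err instanceof LlmError) return err.getUserMessage();
      throw err;
    }
  }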
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Error messages now include a clickable Markdown link
"KI-Einstellungen öffnen" that navigates to /?app=settings#ai-options,
which opens the settings panel in the workbench, switches to the AI
tab, and scrolls to the LLM options section.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Track skip reasons per tier in the orchestrator (no-consent,
no-backend, not-available, not-ready, runtime-error) and expose
them via NoTierAvailableError.getUserMessage() with actionable
German text pointing the user to the right settings page.
Before: "No tier could run task 'companion.chat' (attempted: cloud)"
After: "Cloud (Gemini): Cloud-Einwilligung fehlt. Aktiviere sie
unter Einstellungen → KI."
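In sketch form (the SkipReason type name and most of the German strings below
are assumptions; only the no-consent wording comes from the example above):
  type SkipReason =
    'no-consent' | 'no-backend' | 'not-available' | 'not-ready' | 'runtime-error';

  const SKIP_TEXT_DE: Record<SkipReason, string> = {
    'no-consent': 'Cloud-Einwilligung fehlt. Aktiviere sie unter Einstellungen → KI.',
    'no-backend': 'Kein Backend konfiguriert.',
    'not-available': 'Modell auf diesem Gerät nicht verfügbar.',
    'not-ready': 'Modell ist noch nicht geladen.',
    'runtime-error': 'Beim Ausführen ist ein Fehler aufgetreten.',
  };

  class NoTierAvailableError extends Error {
    constructor(
      readonly task: string,
      readonly skipped: ReadonlyMap<string, SkipReason>, // tier label → why it was skipped
    ) {
      super(`No tier could run task '${task}'`);
    }
    getUserMessage(): string {
      return [...this.skipped]
        .map(([tier, reason]) => `${tier}: ${SKIP_TEXT_DE[reason]}`)
        .join('\n');
    }
  }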
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enable real-time token streaming during the planner "calling-llm" phase
so the user sees live progress ("empfange Plan… 128 tokens") instead of
a static spinner. The parser still receives the full text once complete —
no partial-JSON risk.
Changes:
- Extract shared SSE parser from playground into @mana/shared-llm/sse-parser
- remote.ts: use stream:true when onToken callback is provided
- AiPlanInput: add optional onToken field (shared-ai)
- ai-plan task: pass onToken through to backend.generate()
- runner.ts: throttled (500ms) phaseDetail updates during streaming
- Playground: refactored to use shared SSE parser
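Roughly, on the runner side (generate()'s request shape and helper names here
are assumptions, not the real shared-ai types):
  async function generateWithProgress(
    generate: (req: {
      messages: { role: string; content: string }[];
      stream: boolean;
      onToken?: (token: string) => void;
    }) => Promise<{ text: string }>,
    messages: { role: string; content: string }[],
    setPhaseDetail: (detail: string) => void,
  ): Promise<string> {
    let received = 0;
    let lastUpdate = 0;
    const result = await generate({
      messages,
      stream: true,                    // remote.ts only streams when onToken is provided
      onToken: () => {
        received += 1;
        const now = Date.now();
        if (now - lastUpdate >= 500) { // throttled phaseDetail updates
          lastUpdate = now;
          setPhaseDetail(`empfange Plan… ${received} tokens`);
        }
      },
    });
    // The plan parser still receives the complete text; no partial JSON is parsed.
    return result.text;
  }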
Also includes: AI agent architecture comparison report (docs/reports/)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gemini-2.0-flash is deprecated on June 1, 2026. gemini-2.5-flash has
been stable since Q1 2026 with similar pricing ($0.15/$0.60 per 1M
tokens vs $0.10/$0.40 — the pricing table already had the entry).
Three files touched:
- packages/shared-llm/src/backends/cloud.ts — client default
- services/mana-llm/src/config.py — server default
- services/mana-llm/src/providers/google.py — Ollama→Gemini fallback
map + constructor default + deduplicated model list
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Following the shared-icons fix (d5cabed14), audit every workspace
package's src/index.ts for top-level side effects and flag the
ones that are safe to tree-shake:
- Pure TS re-export barrels (types, theme, utils, llm, storage):
"sideEffects": false — lets Vite prune entire submodules when a
consumer only imports a subset of named exports. Matters most for
shared-llm where the orchestrator/BYOK branch isn't needed on
every route.
- Packages that ship .svelte components (branding, ui, links):
"sideEffects": ["**/*.svelte", "**/*.css"] — same tree-shaking
benefit for TS modules, but keeps Svelte component CSS injection
intact.
The state-holding submodules (shared-ui drag-state/toast,
shared-llm store, shared-links mutations) are still evaluated
whenever their exports are referenced, so behaviour is unchanged —
the flag only lets the bundler skip modules that aren't in the
dependency graph at all.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bug: setting taskOverrides['companion.chat'] = 'byok' didn't work
when the user's allowedTiers was empty/['none']. The tier-too-low
check in run() compared task.minTier ('browser') against userMaxTier
('none') and threw TierTooLowError before the override was even read.
Same issue in canRun() and candidateTiers().
Fix: when a per-task override exists, treat it as opt-in to that tier
even if not in the global allowedTiers. The override is the user's
explicit per-task signal — overriding the global default is exactly
what an override is for.
- run(): effectiveMaxTier = max(override, userMaxTier)
- candidateTiers(task, override): adds override to baseTiers
- canRun(): now passes the override to candidateTiers
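Roughly (tier ranks and helper names here are illustrative, not the
orchestrator's exact internals):
  const TIER_RANK: Record<string, number> = {
    none: 0, browser: 1, 'mana-server': 2, byok: 3, cloud: 4,
  };

  function maxTier(a: string, b: string): string {
    return TIER_RANK[a] >= TIER_RANK[b] ? a : b;
  }

  // run(): a per-task override counts as opt-in even when allowedTiers says 'none'.
  function effectiveMaxTier(userMaxTier: string, override?: string): string {
    return override ? maxTier(override, userMaxTier) : userMaxTier;
  }

  // candidateTiers(): the override is appended to the tiers derived from global settings.
  function candidateTiers(baseTiers: string[], override?: string): string[] {
    return override && !baseTiers.includes(override) ? [...baseTiers, override] : baseTiers;
  }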
The Companion chat now correctly uses BYOK when selected from the
toolbar, even if the user hasn't enabled BYOK in their global LLM
settings.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phases 1-3 of BYOK support. Introduces a 5th LLM tier 'byok' that
routes to user-provided API keys via direct browser fetches.
shared-llm additions:
- LlmTier extended with 'byok' (rank 3, between mana-server and cloud)
- ByokBackend: LlmBackend implementation that delegates key lookup
to an app-provided resolver callback, then dispatches to the right
provider adapter
- 4 provider adapters:
- OpenAI (gpt-5, gpt-4o, o1 family)
- Anthropic (Claude Opus/Sonnet/Haiku 4.6) with CORS header
- Gemini (2.5 Pro/Flash) — REST API with different message format
- Mistral — OpenAI-compatible, reuses shared openai-compat adapter
- Pricing table for 20+ models with USD per 1M tokens
- estimateCost() + formatCost() helpers
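In rough sketch form (interfaces simplified; only the provider set and the
resolver-callback idea are from this commit):
  type ByokProvider = 'openai' | 'anthropic' | 'gemini' | 'mistral';

  type ByokKeyResolver = (provider: ByokProvider) => Promise<string | null>;

  type ProviderAdapter = (apiKey: string, model: string, prompt: string) => Promise<string>;

  class ByokBackend {
    constructor(
      private resolveKey: ByokKeyResolver,   // app-provided; keys stay device-local
      private adapters: Record<ByokProvider, ProviderAdapter>,
    ) {}

    async generate(provider: ByokProvider, model: string, prompt: string): Promise<string> {
      const apiKey = await this.resolveKey(provider);
      if (!apiKey) throw new Error(`No API key stored for ${provider}`);
      // Each adapter does a direct browser fetch — the key never touches Mana's server.
      return this.adapters[provider](apiKey, model, prompt);
    }
  }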
Keys stay device-local (IndexedDB in next phase). Browser-direct
fetches mean keys never touch Mana's server.
Updates two existing tier maps (memoro DetailView, SourceBadge) to
include the new tier.
Planning doc at docs/architecture/BYOK_PLAN.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Redis: allkeys-lru → noeviction to prevent silent data loss when memory is full
- mana-media: --watch → --hot to fix an EADDRINUSE crash on Bun HMR reload
- Svelte: build initial values before $state() to avoid state_referenced_locally warnings
in create-app-onboarding.svelte.ts and shared-llm/store.svelte.ts
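For the Svelte item, the pattern in a .svelte.ts module is roughly (names invented):
  // Before (warns state_referenced_locally: a $state variable is read while
  // initialising other state, so only its initial value would be captured):
  //   let settings = $state(loadSettings());
  //   let label = $state(formatLabel(settings));
  // After: build plain initial values first, then wrap them in $state().
  const initialSettings = loadSettings();
  let settings = $state(initialSettings);
  let label = $state(formatLabel(initialSettings));

  declare function loadSettings(): { model: string };
  declare function formatLabel(s: { model: string }): string;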
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In commit c9e16243c (the gemma3:4b → gemma4:e4b switch) I sloppily
wrote in the ManaServerBackend docstring that mana-llm "routes them
to the local Ollama instance on the Mac Mini (running on the M4's
Metal GPU)". That is wrong AND it's the exact misconception I had
to debug my way out of earlier the same day.
The actual topology is already documented correctly in
docs/MAC_MINI_SERVER.md and docs/WINDOWS_GPU_SERVER_SETUP.md — I
just didn't read those before writing the docstring:
mana-llm container's OLLAMA_URL points at host.docker.internal:13434
→ ~/gpu-proxy.py (Python TCP forwarder, LaunchAgent on Mac Mini)
→ 192.168.178.11:11434 (LAN)
→ Ollama on the Windows GPU server (RTX 3090, 24 GB VRAM)
→ Inference
The Mac Mini's brew-installed Ollama binary is NOT on the inference
path. It's just a CLI for inspecting the proxied daemon. Today's
"why does the Mac Mini still have Ollama 0.15.4" puzzle has a simple
answer: nothing on the Mac Mini actually runs inference, so the
binary version was never load-bearing.
Two doc fixes:
1. packages/shared-llm/src/backends/mana-server.ts
Replace the lying docstring with the real topology, including a
pointer to the two MAC_MINI_SERVER.md / WINDOWS_GPU_SERVER_SETUP.md
sections that document it. Also note that gemma4:e4b is a
reasoning model that emits message.reasoning when given enough
tokens (cross-reference to remote.ts's fallback parser).
2. packages/local-llm/CLAUDE.md
Add a paragraph at the top explaining the difference between
"@mana/local-llm" (browser tier, on-device) and the @mana/shared-llm
"mana-server" / "cloud" tiers (services/mana-llm proxy → gpu-proxy.py
→ RTX 3090). This was implicit before — "not related to
services/mana-llm" — but didn't say where mana-server actually
goes. Future me reading the doc would still have to dig through
the docker-compose env to find out.
No code changes — only docstring + markdown.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reasoning-style models (Gemma 4 E4B is the first one we use, but
DeepSeek R1, Gemini 2.5 thinking, etc. behave the same way) split
their output into two fields:
- message.content — the final answer
- message.reasoning — the chain-of-thought leading up to it
When the model is given too few max_tokens to finish reasoning AND
emit content, the response comes back with content="" and reasoning
populated with the half-finished thought. Verified empirically with
gemma4:e4b and `max_tokens: 10` on a "Sage Hi auf Deutsch in einem
Wort" prompt — content was "" while reasoning had "Here's a
thinking process to..." (cut off mid-thought).
For the title task this rarely matters because the system prompt is
directive enough to skip the thinking phase (verified: same gemma4:
e4b returns clean 7-token titles like "Sonnenstrahlen genießen
heute" with the standard system prompt + max_tokens 32). But it's
a real failure mode for any future task that uses a less-directive
prompt or hits a longer reasoning chain.
Defensive fix: prefer message.content first, fall back to
message.reasoning if content is empty. The fallback is a string-or-
nothing operation, no semantic interpretation — if the reasoning
field happens to contain a usable answer fragment, the caller's
cleanup chain (e.g. generateTitleTask's strip-quotes-and-dots
pipeline) will normalize it. If it's truly a half-finished thought,
the caller's runRules fallback still kicks in via the existing
empty-result detection.
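The fallback is roughly (response typing simplified to the two fields that
matter here):
  interface ChatMessage {
    content?: string;
    reasoning?: string; // populated by reasoning-style models (Gemma 4 E4B, R1, ...)
  }

  function extractText(message: ChatMessage): string {
    // Prefer the final answer; fall back to the chain-of-thought only when
    // content came back empty (e.g. max_tokens exhausted mid-reasoning).
    const content = message.content?.trim();
    if (content) return content;
    return message.reasoning?.trim() ?? '';
  }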
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two surprises came out of "why do we still use Gemma 3 instead of 4":
1. The hardcoded default in ManaServerBackend was `gemma3:4b`, which
was even smaller than mana-llm's actual server-side default of
`gemma3:12b`. My initial guess from docs/LOCAL_LLM_MODELS.md was
conservative.
2. The mana-llm OLLAMA_URL points at host.docker.internal:13434,
which is NOT the Mac Mini's local Ollama — it's a Python TCP
forwarder (~/gpu-proxy.py) that proxies to 192.168.178.11:11434
on the Windows GPU server. So title generation has been running
on the RTX 3090 the whole time, not on the M4 Metal GPU. The
Mac Mini's brew-installed ollama 0.15.4 wasn't even being used
for inference — only as a CLI to inspect the proxied Ollama.
To get to Gemma 4, both Ollama instances needed an upgrade:
- Mac Mini brew : 0.15.4 → 0.20.4 (cosmetic, the binary isn't on
the inference path; upgraded for consistency)
- GPU server : 0.18.2 → 0.20.4 via winget. Required restarting
the daemon via the OllamaServe scheduled task
that was already configured.
Then `ollama pull gemma4:e4b` on the GPU server (9.6 GB, ~10 min on
the LAN). Verified end-to-end via the proxy with a real chat
completion request to mana-llm — gemma4:e4b answered with a clean
4-word German title for a sample voice memo prompt:
prompt: "Erstelle einen kurzen 3-Wort Titel für: Es ist ein
schöner Tag heute am 9. April"
→ "Schöner Tag, neuntes April"
Changes in this commit:
packages/shared-llm/src/backends/mana-server.ts
- defaultModel: 'gemma3:4b' → 'gemma4:e4b'
- Updated docstring to explain why E4B is the right Mana-Server
tier default: 9.6 GB on disk, 128K context, "Effective 4B"
arch punches above its weight class for German prompts, and
the family stays consistent with the browser tier (Gemma 4
E2B is the smaller sibling) so the source label and prompt
behavior remain coherent across tiers.
apps/mana/apps/web/src/lib/modules/memoro/views/DetailView.svelte
- TITLE_SOURCE_LABELS map updated:
browser → "Auf deinem Gerät (Gemma 4 E2B)" (was "(Gemma 4)")
mana-server → "Mana-Server (Gemma 4 E4B)" (was "(gemma3:4b)")
- The label now reflects that BOTH the browser and the mana-server
tier are running Gemma 4 variants, which is more honest than
the previous mix.
Did NOT change:
- The Ollama OLLAMA_DEFAULT_MODEL env var in docker-compose.macmini.yml
(still gemma3:12b). That's the fallback for callers who don't
specify a model in their request. Our generate-title task always
sends an explicit model string, so it's unaffected. Bumping the
global default is a separate decision — it would change behavior
for the playground module and any other consumer that relies on
the implicit fallback.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diagnosis from the user's last test pinpointed the bug: mana-llm
returns totalFrames=0 (no SSE frames at all) when called from the
browser, but works perfectly when called via curl from the same host
with the same payload. Two compounding causes:
1. credentials: 'include' in our fetch combined with mana-llm's
CORS headers silently breaks the response body. This is the
classic "Access-Control-Allow-Origin: * + Allow-Credentials: true"
mismatch — browsers reject the response per spec but report it
as a 0-byte success rather than an error.
2. Streaming over CORS adds a second layer of fragility. Even if
credentials weren't an issue, the browser fetch API's response
body for SSE under CORS depends on a specific combination of
server headers we evidently don't have.
Fix: drop both the streaming AND the credentials.
- stream: false in the request body. Single JSON response per call,
much friendlier to the browser fetch API.
- No `credentials` field at all (default 'same-origin' for cross-
origin requests = don't send cookies). mana-llm's API key
middleware accepts anonymous requests, so we don't need to send
any auth context.
- Parse the response as `await res.json()` instead of streaming
SSE chunks. Pull `choice.message.content` (or fall back to
`choice.text` for legacy completions API responses).
- Backwards-compatibility shim for `req.onToken`: if a caller
registered a token callback (legacy chat-style streaming UX),
fire it ONCE with the full content at the end. The current
orchestrator + queue model never consumes per-token streams for
remote tiers, so this is a degraded-but-equivalent path. The
playground module uses its own client and isn't affected.
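The reworked request path, as a sketch (URL and model taken from the curl
verification below; error handling omitted):
  async function chatCompletion(prompt: string, onToken?: (t: string) => void) {
    const res = await fetch('https://llm.mana.how/v1/chat/completions', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // No `credentials` field: the default 'same-origin' sends no cookies
      // cross-origin, which avoids the ACAO:* + credentials rejection above.
      body: JSON.stringify({
        model: 'gemma3:4b',
        messages: [{ role: 'user', content: prompt }],
        max_tokens: 50,
        stream: false,            // single JSON response instead of SSE
      }),
    });

    const json = await res.json();
    const choice = json.choices?.[0];
    const text: string = choice?.message?.content ?? choice?.text ?? '';

    // Backwards-compat shim: legacy token callbacks get the whole text once.
    if (onToken && text) onToken(text);
    return text;
  }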
Verified manually with curl:
$ curl -X POST https://llm.mana.how/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model":"gemma3:4b","messages":[{"role":"user","content":"Hi"}],"max_tokens":50,"stream":false}'
→ returns clean JSON with `choices[0].message.content` populated.
Same call with `stream: true` from the same host also works (full
SSE frames come back). The bug really is browser+credentials
specific, not a service bug.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
User test on the mana-server tier showed Ollama gemma3:4b returning
LITERALLY empty content for the title task, which is much weirder
than the small browser model misbehaving. Three layered fixes plus
diagnostics that will tell us what's actually happening over the
wire next time.
1. remote.ts: SSE diagnostics + liberal field shape
The mana-llm /v1/chat/completions endpoint claims OpenAI
compatibility, but different upstream providers (Ollama, OpenAI,
Gemini) wrap their token text in different field paths inside
the SSE delta. Be liberal in what we accept:
- choice.delta.content (canonical OpenAI)
- choice.delta.text (some Ollama-compat shims)
- choice.message.content (non-streaming response embedded in stream)
- choice.text (legacy completion API)
Plus: count totalFrames + dataFrames + capture firstFrameRaw +
firstFrameParsed during the stream. When `collected` is empty at
the end of the stream, dump all of that to console.warn so the
next test session shows us exactly what mana-llm is sending. This
is the only reliable way to debug "empty completion" without a
network sniffer in the user's browser.
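The field probing is essentially (a sketch; the surrounding SSE read loop is omitted):
  function extractDeltaText(choice: any): string {
    return (
      choice?.delta?.content ??   // canonical OpenAI streaming delta
      choice?.delta?.text ??      // some Ollama-compat shims
      choice?.message?.content ?? // non-streaming response embedded in the stream
      choice?.text ??             // legacy completions API
      ''
    );
  }

  // Diagnostics gathered per stream; dumped via console.warn when `collected`
  // ends up empty so the next test shows the real frame shape.
  interface SseDiagnostics {
    totalFrames: number;
    dataFrames: number;
    firstFrameRaw?: string;
    firstFrameParsed?: unknown;
  }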
2. generate-title.ts: drop few-shot, use simple system+user prompt
The previous few-shot prompt with three `Aufnahme: "..."\nTitel: ...`
examples was apparently too much for Ollama gemma3:4b on the
mana-server tier — it returned literal "" for reasons we don't
fully understand (chat-template confusion with the embedded
quotes? multi-section format? some quirk of how mana-llm formats
the messages for Ollama?). Either way, the failure mode is clear.
Replace with a minimal two-message format:
- system: "Du erzeugst einen kurzen Titel (3-5 Wörter)..."
- user: <transcript>
Same instruction, much simpler shape. Bumped maxTokens 24 → 32
to give the model breathing room.
3. generate-title.ts: rules fallback detects sentence fragments
Even when the LLM fails and we fall through to runRules, the
previous heuristic for medium-length transcripts (10-20 words)
would extract the first 7 words verbatim — which for a typical
"Eine kleine Testaufnahme um zu sehen ob alles funktioniert" memo
produces "Eine kleine Testaufnahme, um zu sehen, ob" as the
"title". That's a sentence fragment ending mid-thought, not a
title. Worse than "Memo vom 9. April 2026".
Add a "looks like a sentence fragment" heuristic: if the last
word of the extracted slice is a German stop-word or article
(und/oder/wenn/ob/zu/um/der/die/das/ein/...) the result is
clearly mid-clause. In that case fall through to dateLabel()
instead of writing the fragment.
Stop-word list is curated to 30 entries — common conjunctions,
articles, prepositions, auxiliaries. Not exhaustive but catches
the typical "first 7 words of a German sentence" failure mode.
After this commit lands, the next test will surface one of the
following in the console:
- the actual delta shape mana-llm is using (so we know if our
parser is wrong or if the model is genuinely silent)
- a real LLM-generated title (if the simpler prompt worked)
- "Memo vom <date>" via the rules fallback (if the LLM still
fails but the rules fragment detection caught the bad slice)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Until now, modules wanting to use the orchestrator had to await each
LLM call inline in their store code. That's fine for foreground tasks
("user clicked summarize") but a non-starter for background work
("auto-tag every new note", "generate a title for every voice memo
after STT finishes"). Background tasks need to:
- Queue up while no LLM tier is ready, then drain when one becomes
available (e.g. user just enabled the browser tier from settings)
- Survive page reloads, browser restarts, and the user navigating
away mid-execution
- Run one at a time without blocking the foreground UI
- Allow modules to subscribe to results reactively without polling
- Retry transient failures (network, model loading) but not
semantic ones (tier-too-low, content blocked)
Phase 4 ships exactly that.
Architecture:
packages/shared-llm/src/queue.ts — LlmTaskQueue class
+ QueuedTask interface (the persistent row shape)
+ EnqueueOptions (refType/refId/priority/maxAttempts)
+ TaskRegistry type (name → LlmTask map)
+ LlmTaskQueueOptions (table + orchestrator + registry +
retryBackoffMs + idleWakeupMs)
Public API:
- enqueue(task, input, opts) → string (returns the queued id)
- get(id), list(filter)
- retry(id), cancel(id), purge(olderThanMs)
- start(), stop() (idempotent processor lifecycle)
apps/mana/apps/web/src/lib/llm-queue.ts — web app singleton
- Dedicated `mana-llm-queue` Dexie database (separate from the
main `mana` IDB; see comment for the rationale: ephemeral
per-device state, no encryption needed, no sync needed, doesn't
belong in the long-frozen `mana` schema)
- Wires up the queue with llmOrchestrator + taskRegistry
- Exposes startLlmQueue() / stopLlmQueue() for the layout hook
apps/mana/apps/web/src/lib/llm-task-registry.ts
- Maps task names → task objects so the queue processor can
look up the implementation when pulling rows off the table.
Closures can't be persisted, so we round-trip via name.
- Currently registers extractDateTask + summarizeTextTask;
module-side tasks land here as we add them.
apps/mana/apps/web/src/routes/(app)/+layout.svelte
- startLlmQueue() in handleAuthReady's Phase A (auth-independent)
so guests + authenticated users both get the queue
- stopLlmQueue() in onDestroy as a fire-and-forget cleanup
Processor loop semantics (the heart of the implementation):
1. On start(), reclaim any 'running' rows from a crashed previous
session — reset them to 'pending'. The orphan recovery is the
reason a crash mid-task doesn't leave the queue stuck.
2. findNextRunnable() picks the highest-priority pending task whose
`notBefore` (retry-backoff timestamp) is in the past. Sort key:
priority desc, then enqueuedAt asc (FIFO within priority).
3. Mark the task running, increment attempts, look up the LlmTask
in the registry, hand it to orchestrator.run().
4. On success: mark done, store result + source + finishedAt.
5. On error:
- TierTooLowError or ProviderBlockedError → fail immediately,
no retry. These are not transient — the user's settings or
the content itself need to change.
- Anything else → if attempts < maxAttempts, reset to pending
with notBefore = now + retryBackoffMs (default 60s). Else
mark failed.
6. When no work is pending, sleep on a Promise that resolves when
either (a) someone calls enqueue() (which fires notifyWakeup),
or (b) idleWakeupMs elapses (default 30s, safety net for any
missed wakeup signal).
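Step 5 in sketch form (row fields follow the QueuedTask shape above; the real
code also records error text, finishedAt, etc.):
  interface QueuedRow {
    id: string;
    attempts: number;
    maxAttempts: number;
  }

  function classifyError(err: unknown): 'permanent' | 'transient' {
    // Settings/content problems won't fix themselves — no point retrying.
    if (err instanceof TierTooLowError || err instanceof ProviderBlockedError) return 'permanent';
    return 'transient'; // network, model still loading, ...
  }

  async function onTaskError(row: QueuedRow, err: unknown, retryBackoffMs = 60_000) {
    if (classifyError(err) === 'permanent' || row.attempts >= row.maxAttempts) {
      await table.update(row.id, { status: 'failed', error: String(err) });
    } else {
      // Back off; findNextRunnable() skips rows whose notBefore is still in the future.
      await table.update(row.id, { status: 'pending', notBefore: Date.now() + retryBackoffMs });
    }
  }

  declare const table: { update(id: string, changes: object): Promise<number> };
  declare class TierTooLowError extends Error {}
  declare class ProviderBlockedError extends Error {}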
Module-side reactive reads use Dexie liveQuery directly on the queue
table — no special subscription API on the queue itself. This is
consistent with how every other Mana module reads its data, so the
mental model stays uniform:
const tags = useLiveQuery(
() => llmQueueDb.tasks
.where({ refType: 'note', refId, taskName: 'common.extractTags' })
.reverse().first(),
[refId]
);
Smoke test: a new "Queue" tab in /llm-test lets you enqueue the
existing extractDate / summarize tasks and watch the live state of
the queue table via liveQuery. The display includes per-row state
badge (pending/running/done/failed), tier source, attempt count,
input/output, and a "Done/failed löschen" button that exercises
purge().
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adding an app to a workbench scene threw a DataCloneError. scenesState
is a $state array, so current.openApps was a Svelte 5 proxy and
spreading it into a new array left proxy entries inside; IndexedDB's
structured clone refuses to serialise those. Snapshot before handing
the array to patchScene / createScene so Dexie sees plain objects.
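The fix is essentially one $state.snapshot call (names per the description above):
  async function addAppToScene(current: { id: string; openApps: unknown[] }, newApp: unknown) {
    // Before (threw DataCloneError: proxy entries survived the spread):
    //   await patchScene(current.id, { openApps: [...current.openApps, newApp] });
    // After: $state.snapshot unwraps the Svelte 5 proxies into plain objects
    // before Dexie's structured clone sees them.
    const plainApps = $state.snapshot(current.openApps);
    await patchScene(current.id, { openApps: [...plainApps, newApp] });
  }

  declare function patchScene(id: string, patch: { openApps: unknown[] }): Promise<void>;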
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The lockfile had grown five (!) different vitest versions over time:
1.6.1, 2.1.9, 3.2.4, 4.1.2 and 4.1.3 — pulled in by various
packages that pinned outdated majors. The mismatch produced the
classic "createDOMElementFilter not found" startup crash because
hoisted @vitest/utils@3.x was loaded by the nested @vitest/runner@4.x.
Bumped every package.json that pinned an old vitest:
- apps/manavoxel/apps/web (^4.1.0 → ^4.1.2)
- apps/matrix/apps/web (^4.1.0 → ^4.1.2)
- apps/memoro/apps/server (^3.0.0 → ^4.1.2)
- apps/nutriphi/packages/shared (^2.1.8 → ^4.1.2)
- packages/qr-export (^3.0.5 → ^4.1.2)
- packages/shared-llm (^2.0.0 → ^4.1.2)
- packages/shared-storage (^4.1.0 → ^4.1.2)
- packages/spiral-db (^1.6.1 → ^4.1.2)
- packages/test-config (^3.0.0 → ^4.1.2)
- packages/wallpaper-generator (^3.0.5 → ^4.1.2)
After a clean pnpm-lock.yaml regenerate, every @vitest/* sub-package
resolves to a single version (4.1.3, picked by semver) — no more
duplicates between hoisted and nested node_modules.
Verified by running:
pnpm --filter @mana/web vitest run src/lib/data/sync.test.ts
→ 20/20 tests passing in 217ms
pnpm --filter @mana/web vitest run src/lib/data/time-blocks/recurrence.test.ts
→ 19/19 tests passing in 198ms
Pre-existing test failures in base-client.test.ts (German error
strings vs English assertions), dashboard.test.ts (widget count
drift), and content/help/index.test.ts (svelte-i18n locale not
initialised in test env) are unrelated and tracked separately.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>