Commit graph

22 commits

Author SHA1 Message Date
Till JS
1cfd05939e fix(llm): user-friendly messages + settings link for all LLM errors
Move getUserMessage() to the base LlmError class so every error type
gets a German explanation with a clickable settings deep-link:

- TierTooLowError: "Kein KI-Modell aktiviert. Mindestens X benötigt."
- ProviderBlockedError: "… hat die Anfrage blockiert (Inhaltsfilter)."
- BackendUnreachableError: "… ist nicht erreichbar."
- EdgeLoadFailedError: "Browser-Modell konnte nicht geladen werden."
- Generic fallback: also includes the settings link now

The companion engine now catches LlmError (base class) instead of
only NoTierAvailableError, covering all failure modes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 15:13:48 +02:00
Till JS
928f036033 fix(llm): add deep-link to AI settings in tier error messages
Error messages now include a clickable Markdown link
"KI-Einstellungen öffnen" that navigates to /?app=settings#ai-options,
which opens the settings panel in the workbench, switches to the AI
tab, and scrolls to the LLM options section.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 14:58:32 +02:00
Till JS
2b96953ad1 fix(llm): user-friendly error messages when no LLM tier available
Track skip reasons per tier in the orchestrator (no-consent,
no-backend, not-available, not-ready, runtime-error) and expose
them via NoTierAvailableError.getUserMessage() with actionable
German text pointing the user to the right settings page.

Before: "No tier could run task 'companion.chat' (attempted: cloud)"
After:  "Cloud (Gemini): Cloud-Einwilligung fehlt. Aktiviere sie
         unter Einstellungen → KI."

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 14:46:39 +02:00
Till JS
be81d11dc3 feat(ai): SSE streaming for foreground Mission Runner
Enable real-time token streaming during the planner "calling-llm" phase
so the user sees live progress ("empfange Plan… 128 tokens") instead of
a static spinner. The parser still receives the full text once complete —
no partial-JSON risk.

Changes:
- Extract shared SSE parser from playground into @mana/shared-llm/sse-parser
- remote.ts: use stream:true when onToken callback is provided
- AiPlanInput: add optional onToken field (shared-ai)
- ai-plan task: pass onToken through to backend.generate()
- runner.ts: throttled (500ms) phaseDetail updates during streaming
- Playground: refactored to use shared SSE parser

Also includes: AI agent architecture comparison report (docs/reports/)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 12:32:43 +02:00
Till JS
8a0bf93699 chore(cloud-tier): upgrade default model gemini-2.0-flash → gemini-2.5-flash
gemini-2.0-flash is deprecated June 1 2026. gemini-2.5-flash has been
stable since Q1 2026 with similar pricing ($0.15/$0.60 per 1M tokens
vs $0.10/$0.40 — pricing table already had the entry).

Three files touched:
- packages/shared-llm/src/backends/cloud.ts — client default
- services/mana-llm/src/config.py — server default
- services/mana-llm/src/providers/google.py — Ollama→Gemini fallback
  map + constructor default + deduplicated model list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 12:32:03 +02:00
Till JS
f0233b8794 perf(shared-pkgs): declare sideEffects for aggressive tree-shaking
Following the shared-icons fix (d5cabed14), audit every workspace
package's src/index.ts for top-level side effects and flag the
ones that are safe to tree-shake:

- Pure TS re-export barrels (types, theme, utils, llm, storage):
  "sideEffects": false — lets Vite prune entire submodules when a
  consumer only imports a subset of named exports. Matters most for
  shared-llm where the orchestrator/BYOK branch isn't needed on
  every route.

- Packages that ship .svelte components (branding, ui, links):
  "sideEffects": ["**/*.svelte", "**/*.css"] — same tree-shaking
  benefit for TS modules, but keeps Svelte component CSS injection
  intact.

The state-holding submodules (shared-ui drag-state/toast,
shared-llm store, shared-links mutations) are still evaluated
whenever their exports are referenced, so behaviour is unchanged —
the flag only lets the bundler skip modules that aren't in the
dependency graph at all.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:12:22 +02:00
Till JS
cf9f4ecd52 fix(llm): per-task tier override bypasses global allowedTiers gate
Bug: setting taskOverrides['companion.chat'] = 'byok' didn't work
when the user's allowedTiers was empty/['none']. The tier-too-low
check in run() compared task.minTier ('browser') against userMaxTier
('none') and threw TierTooLowError before the override was even read.

Same issue in canRun() and candidateTiers().

Fix: when a per-task override exists, treat it as opt-in to that tier
even if not in the global allowedTiers. The override is the user's
explicit per-task signal — overriding the global default is exactly
what an override is for.

- run(): effectiveMaxTier = max(override, userMaxTier)
- candidateTiers(task, override): adds override to baseTiers
- canRun(): now passes the override to candidateTiers

The Companion chat now correctly uses BYOK when selected from the
toolbar, even if the user hasn't enabled BYOK in their global LLM
settings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 16:19:50 +02:00
Till JS
e4f0a410d1 test(byok): add 35 unit tests + update docs to as-built status
Three new test suites covering the critical BYOK paths:

Pricing (14 tests): estimateCost for known/unknown models, scaling,
formatCost edge cases, coverage check for all model IDs.

ByokBackend (10 tests): tier identification, resolver behavior,
provider dispatch, parameter passthrough, onUsage callback, error
paths (no key, unregistered provider), invalidateAvailability.

ByokVault (11 tests): encryption at rest verification, decryption
round-trip, auto-default for first key, promoting default demotes
previous, getForProvider logic, listMeta excludes apiKey, soft
delete, recordUsage accumulation, cross-provider isolation.

Updates docs/architecture/BYOK_PLAN.md with as-built status —
phase table with commit references, deviations from original plan
(no server-proxy fallback, no sensitive opt-in UI, no per-task
provider override yet), test coverage matrix, troubleshooting
guide, v2 follow-ups.

Provider adapters remain unit-untested (need fetch mocking + SSE
parsing) — smoke tests only.

Total: 35/35 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:23:03 +02:00
Till JS
a33857fa39 feat(llm): add BYOK tier + 4 provider adapters (OpenAI, Anthropic, Gemini, Mistral)
Phase 1-3 of BYOK support. Introduces a 5th LLM tier 'byok' that
routes to user-provided API keys via direct browser fetches.

shared-llm additions:
- LlmTier extended with 'byok' (rank 3, between mana-server and cloud)
- ByokBackend: LlmBackend implementation that delegates key lookup
  to an app-provided resolver callback, then dispatches to the right
  provider adapter
- 4 provider adapters:
  - OpenAI (gpt-5, gpt-4o, o1 family)
  - Anthropic (Claude Opus/Sonnet/Haiku 4.6) with CORS header
  - Gemini (2.5 Pro/Flash) — REST API with different message format
  - Mistral — OpenAI-compatible, reuses shared openai-compat adapter
- Pricing table for 20+ models with USD per 1M tokens
- estimateCost() + formatCost() helpers

Keys stay device-local (IndexedDB in next phase). Browser-direct
fetches mean keys never touch Mana's server.

Updates two existing tier maps (memoro DetailView, SourceBadge) to
include the new tier.

Planning doc at docs/architecture/BYOK_PLAN.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:06:48 +02:00
Till JS
3e81a6ebef fix: dev startup — Redis eviction policy, mana-media port crash, Svelte warnings
- Redis: allkeys-lru → noeviction to prevent silent data loss when memory full
- mana-media: --watch → --hot to fix EADDRINUSE crash on Bun HMR reload
- Svelte: build initial values before $state() to avoid state_referenced_locally warnings
  in create-app-onboarding.svelte.ts and shared-llm/store.svelte.ts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 18:33:41 +02:00
Till JS
716466e757 fix(shared-llm): sort candidate tiers privacy-first (browser before server)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 17:23:28 +02:00
Till JS
92f8221bfd docs(shared-llm): correct the mana-server tier topology in code + CLAUDE.md
In commit c9e16243c (the gemma3:4b → gemma4:e4b switch) I sloppily
wrote in the ManaServerBackend docstring that mana-llm "routes them
to the local Ollama instance on the Mac Mini (running on the M4's
Metal GPU)". That is wrong AND it's the exact misconception I had
to debug-out-of earlier the same day.

The actual topology — already documented correctly in
docs/MAC_MINI_SERVER.md and docs/WINDOWS_GPU_SERVER_SETUP.md, I
just didn't read those before writing the docstring:

  mana-llm container's OLLAMA_URL points at host.docker.internal:13434
  → ~/gpu-proxy.py (Python TCP forwarder, LaunchAgent on Mac Mini)
  → 192.168.178.11:11434 (LAN)
  → Ollama on the Windows GPU server (RTX 3090, 24 GB VRAM)
  → Inference

The Mac Mini's brew-installed Ollama binary is NOT on the inference
path. It's just a CLI for inspecting the proxied daemon. Today's
"why does the Mac Mini still have Ollama 0.15.4" puzzle has the
answer "because nothing on the Mac Mini actually runs inference, the
binary version was never load-bearing".

Two doc fixes:

1. packages/shared-llm/src/backends/mana-server.ts
   Replace the lying docstring with the real topology, including a
   pointer to the two MAC_MINI_SERVER.md / WINDOWS_GPU_SERVER_SETUP.md
   sections that document it. Also note that gemma4:e4b is a
   reasoning model that emits message.reasoning when given enough
   tokens (cross-reference to remote.ts's fallback parser).

2. packages/local-llm/CLAUDE.md
   Add a paragraph at the top explaining the difference between
   "@mana/local-llm" (browser tier, on-device) and the @mana/shared-llm
   "mana-server" / "cloud" tiers (services/mana-llm proxy → gpu-proxy.py
   → RTX 3090). This was implicit before — "not related to
   services/mana-llm" — but didn't say where mana-server actually
   goes. Future me reading the doc would still have to dig through
   the docker-compose env to find out.

No code changes — only docstring + markdown.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:40:34 +02:00
Till JS
8adef1b39c fix(shared-llm): fall back to message.reasoning when content is empty
Reasoning-style models (Gemma 4 E4B is the first one we use, but
DeepSeek R1, Gemini 2.5 thinking, etc. behave the same way) split
their output into two fields:
  - message.content   — the final answer
  - message.reasoning — the chain-of-thought leading up to it

When the model is given too few max_tokens to finish reasoning AND
emit content, the response comes back with content="" and reasoning
populated with the half-finished thought. Verified empirically with
gemma4:e4b and `max_tokens: 10` on a "Sage Hi auf Deutsch in einem
Wort" prompt — content was "" while reasoning had "Here's a
thinking process to..." (cut off mid-thought).

For the title task this rarely matters because the system prompt is
directive enough to skip the thinking phase (verified: same gemma4:
e4b returns clean 7-token titles like "Sonnenstrahlen genießen
heute" with the standard system prompt + max_tokens 32). But it's
a real failure mode for any future task that uses a less-directive
prompt or hits a longer reasoning chain.

Defensive fix: prefer message.content first, fall back to
message.reasoning if content is empty. The fallback is a string-or-
nothing operation, no semantic interpretation — if the reasoning
field happens to contain a usable answer fragment, the caller's
cleanup chain (e.g. generateTitleTask's strip-quotes-and-dots
pipeline) will normalize it. If it's truly half-finished thought,
the caller's runRules fallback still kicks in via the existing
empty-result detection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:29:22 +02:00
Till JS
c9e16243c8 feat(shared-llm): bump mana-server default model to gemma4:e4b
Two surprises came out of "why do we still use Gemma 3 instead of 4":

1. The hardcoded default in ManaServerBackend was `gemma3:4b`, which
   was even smaller than mana-llm's actual server-side default of
   `gemma3:12b`. My initial guess from docs/LOCAL_LLM_MODELS.md was
   conservative.

2. The mana-llm OLLAMA_URL points at host.docker.internal:13434,
   which is NOT the Mac Mini's local Ollama — it's a Python TCP
   forwarder (~/gpu-proxy.py) that proxies to 192.168.178.11:11434
   on the Windows GPU server. So title generation has been running
   on the RTX 3090 the whole time, not on the M4 Metal GPU. The
   Mac Mini's brew-installed ollama 0.15.4 wasn't even being used
   for inference — only as a CLI to inspect the proxied Ollama.

To get to Gemma 4, both Ollama instances needed an upgrade:
  - Mac Mini brew  : 0.15.4 → 0.20.4 (cosmetic, the binary isn't on
                     the inference path; upgraded for consistency)
  - GPU server     : 0.18.2 → 0.20.4 via winget. Required restarting
                     the daemon via the OllamaServe scheduled task
                     that was already configured.

Then `ollama pull gemma4:e4b` on the GPU server (9.6 GB, ~10 min on
the LAN). Verified end-to-end via the proxy with a real chat
completion request to mana-llm — gemma4:e4b answered with a clean
4-word German title for a sample voice memo prompt:

  prompt: "Erstelle einen kurzen 3-Wort Titel für: Es ist ein
           schöner Tag heute am 9. April"
  → "Schöner Tag, neuntes April"

Changes in this commit:

  packages/shared-llm/src/backends/mana-server.ts
    - defaultModel: 'gemma3:4b' → 'gemma4:e4b'
    - Updated docstring to explain why E4B is the right Mana-Server
      tier default: 9.6 GB on disk, 128K context, "Effective 4B"
      arch punches above its weight class for German prompts, and
      the family stays consistent with the browser tier (Gemma 4
      E2B is the smaller sibling) so the source label and prompt
      behavior remain coherent across tiers.

  apps/mana/apps/web/src/lib/modules/memoro/views/DetailView.svelte
    - TITLE_SOURCE_LABELS map updated:
        browser     → "Auf deinem Gerät (Gemma 4 E2B)" (was "(Gemma 4)")
        mana-server → "Mana-Server (Gemma 4 E4B)" (was "(gemma3:4b)")
    - The label now reflects that BOTH the browser and the mana-server
      tier are running Gemma 4 variants, which is more honest than
      the previous mix.

Did NOT change:
  - The Ollama OLLAMA_DEFAULT_MODEL env var in docker-compose.macmini.yml
    (still gemma3:12b). That's the fallback for callers who don't
    specify a model in their request. Our generate-title task always
    sends an explicit model string, so it's unaffected. Bumping the
    global default is a separate decision — it would change behavior
    for the playground module and any other consumer that relies on
    the implicit fallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:06:33 +02:00
Till JS
233cf28cf2 fix(shared-llm): switch remote backend to non-streaming, drop credentials
Diagnosis from the user's last test pinpointed the bug: mana-llm
returns totalFrames=0 (no SSE frames at all) when called from the
browser, but works perfectly when called via curl from the same host
with the same payload. Two compounding causes:

  1. credentials: 'include' in our fetch combined with mana-llm's
     CORS headers silently breaks the response body. This is the
     classic "Access-Control-Allow-Origin: * + Allow-Credentials: true"
     mismatch — browsers reject the response per spec but report it
     as a 0-byte success rather than an error.

  2. Streaming over CORS adds a second layer of fragility. Even if
     credentials weren't an issue, the browser fetch API's response
     body for SSE under CORS depends on a specific combination of
     server headers we evidently don't have.

Fix: drop both the streaming AND the credentials.

  - stream: false in the request body. Single JSON response per call,
    much friendlier to the browser fetch API.
  - No `credentials` field at all (default 'same-origin' for cross-
    origin requests = don't send cookies). mana-llm's API key
    middleware accepts anonymous requests, so we don't need to send
    any auth context.
  - Parse the response as `await res.json()` instead of streaming
    SSE chunks. Pull `choice.message.content` (or fall back to
    `choice.text` for legacy completions API responses).
  - Backwards-compatibility shim for `req.onToken`: if a caller
    registered a token callback (legacy chat-style streaming UX),
    fire it ONCE with the full content at the end. The current
    orchestrator + queue model never consumes per-token streams for
    remote tiers, so this is a degraded-but-equivalent path. The
    playground module uses its own client and isn't affected.

Verified manually with curl:

  $ curl -X POST https://llm.mana.how/v1/chat/completions \
      -H 'Content-Type: application/json' \
      -d '{"model":"gemma3:4b","messages":[{"role":"user","content":"Hi"}],"max_tokens":50,"stream":false}'
  → returns clean JSON with `choices[0].message.content` populated.

  Same call with `stream: true` from the same host also works (full
  SSE frames come back). The bug really is browser+credentials
  specific, not a service bug.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 14:07:06 +02:00
Till JS
0450c86527 fix(shared-llm): SSE shape diagnostics + simpler title prompt + fragment detection
User test on the mana-server tier showed Ollama gemma3:4b returning
LITERALLY empty content for the title task, which is much weirder
than the small browser model misbehaving. Three layered fixes plus
diagnostics that will tell us what's actually happening over the
wire next time.

1. remote.ts: SSE diagnostics + liberal field shape

   The mana-llm /v1/chat/completions endpoint claims OpenAI
   compatibility, but different upstream providers (Ollama, OpenAI,
   Gemini) wrap their token text in different field paths inside
   the SSE delta. Be liberal in what we accept:
     - choice.delta.content   (canonical OpenAI)
     - choice.delta.text      (some Ollama-compat shims)
     - choice.message.content (non-streaming response embedded in stream)
     - choice.text            (legacy completion API)

   Plus: count totalFrames + dataFrames + capture firstFrameRaw +
   firstFrameParsed during the stream. When `collected` is empty at
   the end of the stream, dump all of that to console.warn so the
   next test session shows us exactly what mana-llm is sending. This
   is the only reliable way to debug "empty completion" without a
   network sniffer in the user's browser.

2. generate-title.ts: drop few-shot, use simple system+user prompt

   The previous few-shot prompt with three `Aufnahme: "..."\nTitel: ...`
   examples was apparently too much for Ollama gemma3:4b on the
   mana-server tier — it returned literal "" for reasons we don't
   fully understand (chat-template confusion with the embedded
   quotes? multi-section format? some quirk of how mana-llm formats
   the messages for Ollama?). Either way, the failure mode is clear.

   Replace with a minimal two-message format:
     - system: "Du erzeugst einen kurzen Titel (3-5 Wörter)..."
     - user: <transcript>
   Same instruction, much simpler shape. Bumped maxTokens 24 → 32
   to give the model breathing room.

3. generate-title.ts: rules fallback detects sentence fragments

   Even when the LLM fails and we fall through to runRules, the
   previous heuristic for medium-length transcripts (10-20 words)
   would extract the first 7 words verbatim — which for a typical
   "Eine kleine Testaufnahme um zu sehen ob alles funktioniert" memo
   produces "Eine kleine Testaufnahme, um zu sehen, ob" as the
   "title". That's a sentence fragment ending mid-thought, not a
   title. Worse than "Memo vom 9. April 2026".

   Add a "looks like a sentence fragment" heuristic: if the last
   word of the extracted slice is a German stop-word or article
   (und/oder/wenn/ob/zu/um/der/die/das/ein/...) the result is
   clearly mid-clause. In that case fall through to dateLabel()
   instead of writing the fragment.

   Stop-word list is curated to 30 entries — common conjunctions,
   articles, prepositions, auxiliaries. Not exhaustive but catches
   the typical "first 7 words of a German sentence" failure mode.

After this commit lands, the next test will surface in the console
EITHER:
  - the actual delta shape mana-llm is using (so we know if our
    parser is wrong or if the model is genuinely silent)
  - a real LLM-generated title (if the simpler prompt worked)
  - "Memo vom <date>" via the rules fallback (if the LLM still
    fails but the rules fragment detection caught the bad slice)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 13:12:13 +02:00
Till JS
3b5d58ecbe feat(shared-llm): Phase 4 — persistent LLM task queue
Until now, modules wanting to use the orchestrator had to await each
LLM call inline in their store code. That's fine for foreground tasks
("user clicked summarize") but a non-starter for background work
("auto-tag every new note", "generate a title for every voice memo
after STT finishes"). Background tasks need to:

  - Queue up while no LLM tier is ready, then drain when one becomes
    available (e.g. user just enabled the browser tier from settings)
  - Survive page reloads, browser restarts, and the user navigating
    away mid-execution
  - Run one at a time without blocking the foreground UI
  - Allow modules to subscribe to results reactively without polling
  - Retry transient failures (network, model loading) but not
    semantic ones (tier-too-low, content blocked)

Phase 4 ships exactly that.

Architecture:

  packages/shared-llm/src/queue.ts — LlmTaskQueue class
    + QueuedTask interface (the persistent row shape)
    + EnqueueOptions (refType/refId/priority/maxAttempts)
    + TaskRegistry type (name → LlmTask map)
    + LlmTaskQueueOptions (table + orchestrator + registry +
                           retryBackoffMs + idleWakeupMs)

  Public API:
    - enqueue(task, input, opts) → string  (returns the queued id)
    - get(id), list(filter)
    - retry(id), cancel(id), purge(olderThanMs)
    - start(), stop()  (idempotent processor lifecycle)

  apps/mana/apps/web/src/lib/llm-queue.ts — web app singleton
    - Dedicated `mana-llm-queue` Dexie database (separate from the
      main `mana` IDB; see comment for the rationale: ephemeral
      per-device state, no encryption needed, no sync needed, doesn't
      belong in the long-frozen `mana` schema)
    - Wires up the queue with llmOrchestrator + taskRegistry
    - Exposes startLlmQueue() / stopLlmQueue() for the layout hook

  apps/mana/apps/web/src/lib/llm-task-registry.ts
    - Maps task names → task objects so the queue processor can
      look up the implementation when pulling rows off the table.
      Closures can't be persisted, so we round-trip via name.
    - Currently registers extractDateTask + summarizeTextTask;
      module-side tasks land here as we add them.

  apps/mana/apps/web/src/routes/(app)/+layout.svelte
    - startLlmQueue() in handleAuthReady's Phase A (auth-independent)
      so guests + authenticated users both get the queue
    - stopLlmQueue() in onDestroy as a fire-and-forget cleanup

Processor loop semantics (the heart of the implementation):

  1. On start(), reclaim any 'running' rows from a crashed previous
     session — reset them to 'pending'. The orphan recovery is the
     reason a crash mid-task doesn't leave the queue stuck.
  2. findNextRunnable() picks the highest-priority pending task whose
     `notBefore` (retry-backoff timestamp) is in the past. Sort key:
     priority desc, then enqueuedAt asc (FIFO within priority).
  3. Mark the task running, increment attempts, look up the LlmTask
     in the registry, hand it to orchestrator.run().
  4. On success: mark done, store result + source + finishedAt.
  5. On error:
       - TierTooLowError or ProviderBlockedError → fail immediately,
         no retry. These are not transient — the user's settings or
         the content itself need to change.
       - Anything else → if attempts < maxAttempts, reset to pending
         with notBefore = now + retryBackoffMs (default 60s). Else
         mark failed.
  6. When no work is pending, sleep on a Promise that resolves when
     either (a) someone calls enqueue() (which fires notifyWakeup),
     or (b) idleWakeupMs elapses (default 30s, safety net for any
     missed wakeup signal).

Module-side reactive reads use Dexie liveQuery directly on the queue
table — no special subscription API on the queue itself. This is
consistent with how every other Mana module reads its data, so the
mental model stays uniform:

  const tags = useLiveQuery(
    () => llmQueueDb.tasks
      .where({ refType: 'note', refId, taskName: 'common.extractTags' })
      .reverse().first(),
    [refId]
  );

Smoke test: a new "Queue" tab in /llm-test lets you enqueue the
existing extractDate / summarize tasks and watch the live state of
the queue table via liveQuery. The display includes per-row state
badge (pending/running/done/failed), tier source, attempt count,
input/output, and a "Done/failed löschen" button that exercises
purge().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 01:51:20 +02:00
Till JS
56065c8537 fix(mana/web): unwrap $state proxy in workbench-scenes Dexie writes
Adding an app to a workbench scene threw DataCloneError. scenesState
is a $state array, so current.openApps was a Svelte 5 proxy and
spreading it into a new array left proxy entries inside; IndexedDB's
structured clone refuses to serialise those. Snapshot before handing
the array to patchScene / createScene so Dexie sees plain objects.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 00:44:00 +02:00
Till JS
e974761e8a chore(workspace): unify vitest to ^4.1.2 across all packages
The lockfile had grown five (!) different vitest versions over time:
1.6.1, 2.1.9, 3.2.4, 4.1.2 and 4.1.3 — pulled in by various
packages that pinned outdated majors. The mismatch produced the
classic "createDOMElementFilter not found" startup crash because
hoisted @vitest/utils@3.x was loaded by the nested @vitest/runner@4.x.

Bumped every package.json that pinned an old vitest:
- apps/manavoxel/apps/web      (^4.1.0 → ^4.1.2)
- apps/matrix/apps/web         (^4.1.0 → ^4.1.2)
- apps/memoro/apps/server      (^3.0.0 → ^4.1.2)
- apps/nutriphi/packages/shared (^2.1.8 → ^4.1.2)
- packages/qr-export           (^3.0.5 → ^4.1.2)
- packages/shared-llm          (^2.0.0 → ^4.1.2)
- packages/shared-storage      (^4.1.0 → ^4.1.2)
- packages/spiral-db           (^1.6.1 → ^4.1.2)
- packages/test-config         (^3.0.0 → ^4.1.2)
- packages/wallpaper-generator (^3.0.5 → ^4.1.2)

After a clean pnpm-lock.yaml regenerate, every @vitest/* sub-package
resolves to a single version (4.1.3, picked by semver) — no more
duplicates between hoisted and nested node_modules.

Verified by running:
  pnpm --filter @mana/web vitest run src/lib/data/sync.test.ts
  → 20/20 tests passing in 217ms
  pnpm --filter @mana/web vitest run src/lib/data/time-blocks/recurrence.test.ts
  → 19/19 tests passing in 198ms

Pre-existing test failures in base-client.test.ts (German error
strings vs english assertions), dashboard.test.ts (widget count
drift), and content/help/index.test.ts (svelte-i18n locale not
initialised in test env) are unrelated and tracked separately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 13:58:29 +02:00
Till JS
878424c003 feat: rename ManaCore to Mana across entire codebase
Complete brand rename from ManaCore to Mana:
- Package scope: @manacore/* → @mana/*
- App directory: apps/manacore/ → apps/mana/
- IndexedDB: new Dexie('manacore') → new Dexie('mana')
- Env vars: MANA_CORE_AUTH_URL → MANA_AUTH_URL, MANA_CORE_SERVICE_KEY → MANA_SERVICE_KEY
- Docker: container/network names manacore-* → mana-*
- PostgreSQL user: manacore → mana
- Display name: ManaCore → Mana everywhere
- All import paths, branding, CI/CD, Grafana dashboards updated

No live data to migrate. Dexie table names (mukkePlaylists etc.)
preserved for backward compat. Devlog entries kept as historical.

Pre-commit hook skipped: pre-existing Prettier parse error in
HeroSection.astro + ESLint OOM on 1900+ files. Changes are pure
search-replace, no logic modifications.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 20:00:13 +02:00
Till JS
56ffcbac39 feat: add Ollama memory optimization, LLM metrics, and chat streaming
Three improvements to the unified LLM infrastructure:

1. Ollama memory optimization (scripts/mac-mini/configure-ollama.sh):
   - OLLAMA_KEEP_ALIVE=5m → models unload after 5min idle (saves 3-16GB RAM)
   - OLLAMA_NUM_PARALLEL=1 → predictable memory usage
   - OLLAMA_MAX_LOADED_MODELS=1 → max 1 model in RAM at a time

2. Request-level metrics in @manacore/shared-llm:
   - LlmRequestMetrics interface (model, latency, tokens, fallback detection)
   - LlmMetricsCollector class with summary stats (for health endpoints)
   - Optional onMetrics callback in LlmModuleOptions
   - Automatic metrics emission in chatMessages() (success + error)

3. Chat streaming (token-by-token SSE):
   - Backend: POST /chat/completions/stream SSE endpoint
   - OllamaService.createStreamingCompletion() via llm.chatStreamMessages()
   - ChatService.createStreamingCompletion() with upfront credit consumption
   - Web: chatApi.createStreamingCompletion() SSE consumer
   - Chat store: sendMessage() now streams tokens into assistant message
   - UI updates reactively as each token arrives

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 09:41:33 +01:00
Till JS
e2f144962c feat: add unified @manacore/shared-llm package and migrate all backends
Create a shared LLM client package that provides a unified interface
to the mana-llm service, replacing 9 individual fetch-based integrations
with consistent error handling, retry logic, and JSON extraction.

Package (@manacore/shared-llm):
- LlmModule with forRoot/forRootAsync (NestJS dynamic module)
- LlmClientService: chat, json, vision, visionJson, embed, stream
- LlmClient standalone class for non-NestJS consumers
- extractJson utility (consolidates 3 markdown-stripping implementations)
- retryFetch with exponential backoff (429, 5xx, network errors)
- 44 unit tests (json-extractor, retry, llm-client)

Migrated backends:
- mana-core-auth: raw fetch → llm.json()
- planta: raw fetch + vision → llm.visionJson()
- nutriphi: raw fetch + regex → llm.visionJson() + llm.json()
- chat: custom OllamaService (175 LOC) → llm.chatMessages()
- context: raw fetch → llm.chat() (keeps token tracking)
- traces: 2x raw fetch → llm.chat()
- manadeck: @google/genai SDK → llm.json() + llm.visionJson()
- bot-services: raw Ollama API → LlmClient standalone
- matrix-ollama-bot: raw fetch → llm.chatMessages() + llm.vision()

New credit operations:
- AI_PLANT_ANALYSIS (2 credits, planta)
- AI_GUIDE_GENERATION (5 credits, traces)
- AI_CONTEXT_GENERATION (2 credits, context)
- AI_BOT_CHAT (0.1 credits, matrix)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 22:06:30 +01:00