Commit graph

7 commits

Author SHA1 Message Date
Till JS
233cf28cf2 fix(shared-llm): switch remote backend to non-streaming, drop credentials
Diagnosis from the user's last test pinpointed the bug: mana-llm
returns totalFrames=0 (no SSE frames at all) when called from the
browser, but works perfectly when called via curl from the same host
with the same payload. Two compounding causes:

  1. credentials: 'include' in our fetch combined with mana-llm's
     CORS headers silently breaks the response body. This is the
     classic "Access-Control-Allow-Origin: *" plus
     "Access-Control-Allow-Credentials: true" mismatch: browsers
     reject the response per spec but surface it as a 0-byte
     success rather than an error.

  2. Streaming over CORS adds a second layer of fragility. Even if
     credentials weren't an issue, the browser fetch API's response
     body for SSE under CORS depends on a specific combination of
     server headers we evidently don't have.

Fix: drop both the streaming AND the credentials; the resulting
request is sketched after this list.

  - stream: false in the request body. Single JSON response per call,
    much friendlier to the browser fetch API.
  - No `credentials` field at all (fetch defaults to 'same-origin',
    which sends no cookies on a cross-origin request). mana-llm's
    API key middleware accepts anonymous requests, so we don't need
    to send any auth context.
  - Parse the response as `await res.json()` instead of streaming
    SSE chunks. Pull `choice.message.content` (or fall back to
    `choice.text` for legacy completions API responses).
  - Backwards-compatibility shim for `req.onToken`: if a caller
    registered a token callback (legacy chat-style streaming UX),
    fire it ONCE with the full content at the end. The current
    orchestrator + queue model never consumes per-token streams for
    remote tiers, so this is a degraded-but-equivalent path. The
    playground module uses its own client and isn't affected.
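
Roughly, the new request path (a sketch; `baseUrl` and `req` are
illustrative names, not the actual identifiers in remote.ts):

  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // no `credentials` field: the fetch default ('same-origin')
    // sends no cookies cross-origin, sidestepping the ACAO:* mismatch
    body: JSON.stringify({ model, messages, max_tokens: maxTokens, stream: false }),
  });
  const data = await res.json();
  const choice = data.choices?.[0];
  const content: string = choice?.message?.content ?? choice?.text ?? '';
  // back-compat shim: legacy streaming callers get one final "token"
  req.onToken?.(content);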

Verified manually with curl:

  $ curl -X POST https://llm.mana.how/v1/chat/completions \
      -H 'Content-Type: application/json' \
      -d '{"model":"gemma3:4b","messages":[{"role":"user","content":"Hi"}],"max_tokens":50,"stream":false}'
  → returns clean JSON with `choices[0].message.content` populated.

  Same call with `stream: true` from the same host also works (full
  SSE frames come back). The bug really is browser+credentials
  specific, not a service bug.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 14:07:06 +02:00
Till JS
0450c86527 fix(shared-llm): SSE shape diagnostics + simpler title prompt + fragment detection
User test on the mana-server tier showed Ollama gemma3:4b returning
LITERALLY empty content for the title task, which is much weirder
than the small browser model misbehaving. Three layered fixes plus
diagnostics that will tell us what's actually happening over the
wire next time.

1. remote.ts: SSE diagnostics + liberal field shape

   The mana-llm /v1/chat/completions endpoint claims OpenAI
   compatibility, but different upstream providers (Ollama, OpenAI,
   Gemini) wrap their token text in different field paths inside
   the SSE delta. Be liberal in what we accept:
     - choice.delta.content   (canonical OpenAI)
     - choice.delta.text      (some Ollama-compat shims)
     - choice.message.content (non-streaming response embedded in stream)
     - choice.text            (legacy completion API)

   Plus: count totalFrames and dataFrames, and capture firstFrameRaw
   and firstFrameParsed during the stream. When `collected` is empty at
   the end of the stream, dump all of that to console.warn so the
   next test session shows us exactly what mana-llm is sending. This
   is the only reliable way to debug "empty completion" without a
   network sniffer in the user's browser.
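
   Sketched, the per-frame handling looks roughly like this (variable
   names mirror the description above; the actual code may differ):

     let collected = '';
     let totalFrames = 0, dataFrames = 0;
     let firstFrameRaw: string | undefined;
     let firstFrameParsed: unknown;

     function handleFrame(raw: string): void {
       totalFrames++;
       firstFrameRaw ??= raw;
       if (!raw.startsWith('data:')) return;
       const payload = raw.slice(5).trim();
       if (payload === '[DONE]') return;
       dataFrames++;
       const parsed = JSON.parse(payload);
       firstFrameParsed ??= parsed;
       const choice = parsed.choices?.[0];
       // be liberal in what we accept: all four known field shapes
       collected +=
         choice?.delta?.content ??
         choice?.delta?.text ??
         choice?.message?.content ??
         choice?.text ?? '';
     }

     // after the stream ends:
     if (collected === '') {
       console.warn('[remote-llm] empty completion', {
         totalFrames, dataFrames, firstFrameRaw, firstFrameParsed,
       });
     }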

2. generate-title.ts: drop few-shot, use simple system+user prompt

   The previous few-shot prompt with three `Aufnahme: "..."\nTitel: ...`
   ("recording"/"title") examples was apparently too much for Ollama
   gemma3:4b on the mana-server tier: it returned a literal "" for
   reasons we don't fully understand (chat-template confusion with
   the embedded quotes? the multi-section format? some quirk of how
   mana-llm formats the messages for Ollama?). Either way, the
   failure mode is clear.

   Replace with a minimal two-message format:
     - system: "Du erzeugst einen kurzen Titel (3-5 Wörter)..."
     - user: <transcript>
   Same instruction, much simpler shape. Bumped maxTokens 24 → 32
   to give the model breathing room.
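
   In code, the request body is now just (the German system prompt
   reads, roughly, "You produce a short title (3-5 words)...";
   wording elided as in the summary above):

     const messages = [
       { role: 'system',
         content: 'Du erzeugst einen kurzen Titel (3-5 Wörter)...' },
       { role: 'user', content: transcript },
     ];
     // maxTokens bumped 24 → 32 so the model has breathing room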

3. generate-title.ts: rules fallback detects sentence fragments

   Even when the LLM fails and we fall through to runRules, the
   previous heuristic for medium-length transcripts (10-20 words)
   would extract the first 7 words verbatim. For a typical
   "Eine kleine Testaufnahme um zu sehen ob alles funktioniert"
   ("a little test recording to see if everything works") memo,
   this yields "Eine kleine Testaufnahme, um zu sehen, ob" as the
   "title". That's a sentence fragment ending mid-thought, not a
   title. Worse than "Memo vom 9. April 2026".

   Add a "looks like a sentence fragment" heuristic: if the last
   word of the extracted slice is a German stop-word or article
   (und/oder/wenn/ob/zu/um/der/die/das/ein/...) the result is
   clearly mid-clause. In that case fall through to dateLabel()
   instead of writing the fragment.

   Stop-word list is curated to 30 entries — common conjunctions,
   articles, prepositions, auxiliaries. Not exhaustive but catches
   the typical "first 7 words of a German sentence" failure mode.

After this commit lands, the next test will surface in the console
EITHER:
  - the actual delta shape mana-llm is using (so we know if our
    parser is wrong or if the model is genuinely silent)
  - a real LLM-generated title (if the simpler prompt worked)
  - "Memo vom <date>" via the rules fallback (if the LLM still
    fails but the rules fragment detection caught the bad slice)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 13:12:13 +02:00
Till JS
3b5d58ecbe feat(shared-llm): Phase 4 — persistent LLM task queue
Until now, modules wanting to use the orchestrator had to await each
LLM call inline in their store code. That's fine for foreground tasks
("user clicked summarize") but a non-starter for background work
("auto-tag every new note", "generate a title for every voice memo
after STT finishes"). Background tasks need to:

  - Queue up while no LLM tier is ready, then drain when one becomes
    available (e.g. user just enabled the browser tier from settings)
  - Survive page reloads, browser restarts, and the user navigating
    away mid-execution
  - Run one at a time without blocking the foreground UI
  - Allow modules to subscribe to results reactively without polling
  - Retry transient failures (network, model loading) but not
    semantic ones (tier-too-low, content blocked)

Phase 4 ships exactly that.

Architecture:

  packages/shared-llm/src/queue.ts — LlmTaskQueue class
    + QueuedTask interface (the persistent row shape)
    + EnqueueOptions (refType/refId/priority/maxAttempts)
    + TaskRegistry type (name → LlmTask map)
    + LlmTaskQueueOptions (table + orchestrator + registry +
                           retryBackoffMs + idleWakeupMs)

  Public API (usage sketched after the architecture overview):
    - enqueue(task, input, opts) → string  (returns the queued id)
    - get(id), list(filter)
    - retry(id), cancel(id), purge(olderThanMs)
    - start(), stop()  (idempotent processor lifecycle)

  apps/mana/apps/web/src/lib/llm-queue.ts — web app singleton
    - Dedicated `mana-llm-queue` Dexie database (separate from the
      main `mana` IDB; see comment for the rationale: ephemeral
      per-device state, no encryption needed, no sync needed, doesn't
      belong in the long-frozen `mana` schema)
    - Wires up the queue with llmOrchestrator + taskRegistry
    - Exposes startLlmQueue() / stopLlmQueue() for the layout hook

  apps/mana/apps/web/src/lib/llm-task-registry.ts
    - Maps task names → task objects so the queue processor can
      look up the implementation when pulling rows off the table.
      Closures can't be persisted, so we round-trip via name.
    - Currently registers extractDateTask + summarizeTextTask;
      module-side tasks land here as we add them.

  apps/mana/apps/web/src/routes/(app)/+layout.svelte
    - startLlmQueue() in handleAuthReady's Phase A (auth-independent)
      so guests + authenticated users both get the queue
    - stopLlmQueue() in onDestroy as a fire-and-forget cleanup
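
A hedged usage sketch from a module's perspective (the singleton
export name and import paths are assumptions; the option names come
from EnqueueOptions above):

  import { llmQueue } from '$lib/llm-queue';                  // export name assumed
  import { summarizeTextTask } from '$lib/llm-task-registry'; // path assumed

  // persists a row, wakes the processor, returns the queued id
  const id = llmQueue.enqueue(summarizeTextTask, { text }, {
    refType: 'note',
    refId: note.id,
    priority: 0,
    maxAttempts: 3,
  });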

Processor loop semantics (the heart of the implementation; condensed
into code after the list):

  1. On start(), reclaim any 'running' rows from a crashed previous
     session — reset them to 'pending'. The orphan recovery is the
     reason a crash mid-task doesn't leave the queue stuck.
  2. findNextRunnable() picks the highest-priority pending task whose
     `notBefore` (retry-backoff timestamp) is in the past. Sort key:
     priority desc, then enqueuedAt asc (FIFO within priority).
  3. Mark the task running, increment attempts, look up the LlmTask
     in the registry, hand it to orchestrator.run().
  4. On success: mark done, store result + source + finishedAt.
  5. On error:
       - TierTooLowError or ProviderBlockedError → fail immediately,
         no retry. These are not transient — the user's settings or
         the content itself need to change.
       - Anything else → if attempts < maxAttempts, reset to pending
         with notBefore = now + retryBackoffMs (default 60s). Else
         mark failed.
  6. When no work is pending, sleep on a Promise that resolves when
     either (a) someone calls enqueue() (which fires notifyWakeup),
     or (b) idleWakeupMs elapses (default 30s, safety net for any
     missed wakeup signal).
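
Condensed into code, the loop looks roughly like this (helper names
follow the steps above; return shapes are assumptions, not the
actual implementation):

  while (this.running) {
    const task = await this.findNextRunnable();
    if (!task) {
      // sleep until enqueue() fires notifyWakeup, or idleWakeupMs elapses
      await this.sleepUntilWakeupOr(this.idleWakeupMs);
      continue;
    }
    await this.table.update(task.id, { status: 'running', attempts: task.attempts + 1 });
    try {
      const impl = this.registry[task.taskName]; // closures can't be persisted
      const { result, source } = await this.orchestrator.run(impl, task.input);
      await this.table.update(task.id, { status: 'done', result, source, finishedAt: Date.now() });
    } catch (err) {
      const permanent = err instanceof TierTooLowError || err instanceof ProviderBlockedError;
      if (permanent || task.attempts + 1 >= task.maxAttempts) {
        await this.table.update(task.id, { status: 'failed', error: String(err) });
      } else {
        // transient: back off, then become runnable again
        await this.table.update(task.id, {
          status: 'pending',
          notBefore: Date.now() + this.retryBackoffMs,
        });
      }
    }
  }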

Module-side reactive reads use Dexie liveQuery directly on the queue
table — no special subscription API on the queue itself. This is
consistent with how every other Mana module reads its data, so the
mental model stays uniform:

  const tags = useLiveQuery(
    () => llmQueueDb.tasks
      .where({ refType: 'note', refId, taskName: 'common.extractTags' })
      .reverse().first(),
    [refId]
  );

Smoke test: a new "Queue" tab in /llm-test lets you enqueue the
existing extractDate / summarize tasks and watch the live state of
the queue table via liveQuery. The display includes a per-row state
badge (pending/running/done/failed), tier source, attempt count,
input/output, and a "Done/failed löschen" ("delete done/failed")
button that exercises purge().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 01:51:20 +02:00
Till JS
56065c8537 fix(mana/web): unwrap $state proxy in workbench-scenes Dexie writes
Adding an app to a workbench scene threw DataCloneError. scenesState
is a $state array, so current.openApps was a Svelte 5 proxy and
spreading it into a new array left proxy entries inside; IndexedDB's
structured clone refuses to serialise those. Snapshot before handing
the array to patchScene / createScene so Dexie sees plain objects.
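
The fix in miniature ($state.snapshot is the Svelte 5 API for this;
the patchScene call shape is assumed):

  // $state.snapshot() returns a plain deep copy with the reactive
  // proxies stripped, so structured clone can serialise it
  const plainApps = $state.snapshot(current.openApps);
  await patchScene(current.id, { openApps: plainApps });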

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 00:44:00 +02:00
Till JS
878424c003 feat: rename ManaCore to Mana across entire codebase
Complete brand rename from ManaCore to Mana:
- Package scope: @manacore/* → @mana/*
- App directory: apps/manacore/ → apps/mana/
- IndexedDB: new Dexie('manacore') → new Dexie('mana')
- Env vars: MANA_CORE_AUTH_URL → MANA_AUTH_URL, MANA_CORE_SERVICE_KEY → MANA_SERVICE_KEY
- Docker: container/network names manacore-* → mana-*
- PostgreSQL user: manacore → mana
- Display name: ManaCore → Mana everywhere
- All import paths, branding, CI/CD, Grafana dashboards updated

No live data to migrate. Dexie table names (mukkePlaylists etc.)
preserved for backward compat. Devlog entries kept as historical.

Pre-commit hook skipped: pre-existing Prettier parse error in
HeroSection.astro + ESLint OOM on 1900+ files. Changes are pure
search-replace, no logic modifications.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 20:00:13 +02:00
Till JS
56ffcbac39 feat: add Ollama memory optimization, LLM metrics, and chat streaming
Three improvements to the unified LLM infrastructure:

1. Ollama memory optimization (scripts/mac-mini/configure-ollama.sh):
   - OLLAMA_KEEP_ALIVE=5m → models unload after 5min idle (saves 3-16GB RAM)
   - OLLAMA_NUM_PARALLEL=1 → predictable memory usage
   - OLLAMA_MAX_LOADED_MODELS=1 → max 1 model in RAM at a time

2. Request-level metrics in @manacore/shared-llm:
   - LlmRequestMetrics interface (model, latency, tokens, fallback detection)
   - LlmMetricsCollector class with summary stats (for health endpoints)
   - Optional onMetrics callback in LlmModuleOptions
   - Automatic metrics emission in chatMessages() (success + error)
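
   Wired up, this looks roughly like the following (the collector's
   method name is an assumption; only the type names are listed
   above):

     const collector = new LlmMetricsCollector();

     LlmModule.forRoot({
       // fires once per request, on success and on error
       onMetrics: (m: LlmRequestMetrics) => collector.record(m), // record(): assumed
     });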

3. Chat streaming (token-by-token SSE):
   - Backend: POST /chat/completions/stream SSE endpoint
   - OllamaService.createStreamingCompletion() via llm.chatStreamMessages()
   - ChatService.createStreamingCompletion() with upfront credit consumption
   - Web: chatApi.createStreamingCompletion() SSE consumer
   - Chat store: sendMessage() now streams tokens into assistant message
   - UI updates reactively as each token arrives
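
   The web-side consumer, sketched (the request body shape and the
   onToken callback are illustrative):

     const res = await fetch('/chat/completions/stream', {
       method: 'POST',
       headers: { 'Content-Type': 'application/json' },
       body: JSON.stringify({ messages }),
     });
     const reader = res.body!.getReader();
     const decoder = new TextDecoder();
     let buffer = '';
     for (;;) {
       const { done, value } = await reader.read();
       if (done) break;
       buffer += decoder.decode(value, { stream: true });
       // SSE frames are separated by a blank line
       let idx: number;
       while ((idx = buffer.indexOf('\n\n')) !== -1) {
         const frame = buffer.slice(0, idx);
         buffer = buffer.slice(idx + 2);
         if (frame.startsWith('data:')) onToken(frame.slice(5).trim());
       }
     }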

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 09:41:33 +01:00
Till JS
e2f144962c feat: add unified @manacore/shared-llm package and migrate all backends
Create a shared LLM client package that provides a unified interface
to the mana-llm service. It replaces 9 individual fetch-based
integrations with one client offering consistent error handling,
retry logic, and JSON extraction.

Package (@manacore/shared-llm):
- LlmModule with forRoot/forRootAsync (NestJS dynamic module)
- LlmClientService: chat, json, vision, visionJson, embed, stream
- LlmClient standalone class for non-NestJS consumers
- extractJson utility (consolidates 3 markdown-stripping implementations)
- retryFetch with exponential backoff (429, 5xx, network errors)
- 44 unit tests (json-extractor, retry, llm-client)
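
Consumer-side, a migrated call looks roughly like this (the json()
options shape and generic parameter are assumptions):

  import { Injectable } from '@nestjs/common';
  import { LlmClientService } from '@manacore/shared-llm';

  @Injectable()
  export class TagService {
    constructor(private readonly llm: LlmClientService) {}

    async extractTags(text: string) {
      // llm.json() prompts the model and runs extractJson on the reply
      return this.llm.json<{ tags: string[] }>({
        prompt: `Extract tags as a JSON object: ${text}`,
      });
    }
  }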

Migrated backends:
- mana-core-auth: raw fetch → llm.json()
- planta: raw fetch + vision → llm.visionJson()
- nutriphi: raw fetch + regex → llm.visionJson() + llm.json()
- chat: custom OllamaService (175 LOC) → llm.chatMessages()
- context: raw fetch → llm.chat() (keeps token tracking)
- traces: 2x raw fetch → llm.chat()
- manadeck: @google/genai SDK → llm.json() + llm.visionJson()
- bot-services: raw Ollama API → LlmClient standalone
- matrix-ollama-bot: raw fetch → llm.chatMessages() + llm.vision()

New credit operations:
- AI_PLANT_ANALYSIS (2 credits, planta)
- AI_GUIDE_GENERATION (5 credits, traces)
- AI_CONTEXT_GENERATION (2 credits, context)
- AI_BOT_CHAT (0.1 credits, matrix)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 22:06:30 +01:00