managarten/apps/mana/apps
Till JS 7007140d13 fix(voice): switch to gemma3:12b + few-shot prompt for parse-task
Two related changes that fall out of real end-to-end testing against
the now-working local mana-llm.

1. Default model bumped from gemma3:4b to gemma3:12b for both
   parse-task and parse-habit. The 4b model is off by one on weekday
   math ("nächsten Montag" from a Wednesday → 2026-04-14 instead of
   2026-04-13), aggressively shortens titles ("Anna anrufen" →
   "Anrufen"), and frequently paraphrases habit names instead of
   copying them verbatim ("Joggen" instead of "Laufen"). The verbatim
   validation in coerce() then drops the paraphrased name, so the
   LLM round-trip is wasted. The 12b variant is roughly 10% slower
   for these tiny prompts (~1.1s vs ~1.0s on the GPU box), so the
   accuracy win is essentially free.
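
For illustration, a minimal sketch of what a verbatim check on habit names
might look like. The function name, signature, and habit list are
hypothetical and are not taken from the actual coerce() implementation:

```typescript
// Hypothetical sketch; the real coerce() may be shaped differently.
// Returns the habit name only if the model copied it exactly from the
// user's known habits, otherwise null (the result is dropped).
function coerceHabitName(
  candidate: string | null | undefined,
  knownHabits: readonly string[],
): string | null {
  if (typeof candidate !== "string") return null;
  const trimmed = candidate.trim();
  // Verbatim match only: a paraphrase such as "Joggen" is rejected.
  return knownHabits.includes(trimmed) ? trimmed : null;
}

const habits = ["Laufen", "Lesen", "Meditieren"]; // assumed sample data
console.log(coerceHabitName("Laufen", habits)); // "Laufen"
console.log(coerceHabitName("Joggen", habits)); // null
```

With a check like this, a paraphrased name costs a full model call and
yields nothing, which is why a model that copies names verbatim matters.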

2. parse-task prompt rewritten as few-shot. Pure rule descriptions
   were *worse* than simple examples — the long "Rules — read
   carefully" section in the previous prompt actually made the model
   compute next Monday as 2026-04-14 even though a direct "what date
   is next Monday?" prompt to the same model returned 2026-04-13.
   The detailed rules were also priming the model to over-shorten
   titles and over-eagerly tag filler words. The five worked examples
   (including the previously failing "Anna nächsten Montag anrufen"
   case) and one novel case ("Mama am Wochenende besuchen") now all
   come back correct.
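
A rough sketch of how a few-shot prompt of this kind could be assembled.
The instruction line, the JSON output shape (title, dueDate), and the
second example are assumptions for illustration; the actual prompt text
in the repo is not reproduced here:

```typescript
// Hypothetical shapes; the real parse-task output schema may differ.
interface FewShotExample {
  input: string;
  output: { title: string; dueDate: string | null };
}

// Builds a prompt with a short instruction, worked examples, and the
// new transcript left open for the model to complete.
function buildParseTaskPrompt(
  today: string,
  weekday: string,
  examples: readonly FewShotExample[],
  transcript: string,
): string {
  const shots = examples
    .map((ex) => `Input: ${ex.input}\nOutput: ${JSON.stringify(ex.output)}`)
    .join("\n\n");
  return [
    `Today is ${weekday}, ${today}. Convert the input into a JSON task.`,
    shots,
    `Input: ${transcript}\nOutput:`,
  ].join("\n\n");
}

const examples: FewShotExample[] = [
  // Case from the commit message: a Wednesday, so next Monday is 2026-04-13.
  {
    input: "Anna nächsten Montag anrufen",
    output: { title: "Anna anrufen", dueDate: "2026-04-13" },
  },
  // Assumed extra example with no date.
  { input: "Milch kaufen", output: { title: "Milch kaufen", dueDate: null } },
];

const prompt = buildParseTaskPrompt(
  "2026-04-08",
  "Wednesday",
  examples,
  "Mama am Wochenende besuchen",
);
console.log(prompt);
```

The examples carry the rules implicitly (keep the full title, resolve
relative dates against today), so no separate rules section is needed.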

The deterministic guards in coerce() are kept as a backstop for the
day the GPU box swaps in a weaker model — they're cheap and don't
hurt the happy path.
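
The commit does not spell out what the guards check. As one example of a
cheap deterministic backstop, the weekday case above can be computed
directly and compared against the model's answer. The helper name is
hypothetical, and "next" is assumed to mean the first matching weekday
strictly after today:

```typescript
// Hypothetical helper, not the repo's coerce().
// weekday uses Date.prototype.getUTCDay() numbering: 0 = Sunday ... 6 = Saturday.
function nextWeekday(todayIso: string, weekday: number): string {
  const d = new Date(`${todayIso}T00:00:00Z`);
  // Days until the next occurrence strictly after today (1..7).
  const delta = (weekday - d.getUTCDay() + 7) % 7 || 7;
  d.setUTCDate(d.getUTCDate() + delta);
  return d.toISOString().slice(0, 10);
}

// 2026-04-08 is a Wednesday; the following Monday is 2026-04-13.
const expected = nextWeekday("2026-04-08", 1);
const modelDate = "2026-04-14"; // the off-by-one answer described above
const dueDate = modelDate === expected ? modelDate : expected;
console.log(dueDate); // "2026-04-13"
```

Working in UTC on date-only strings sidesteps daylight-saving shifts; a
real guard would also need to decide which relative phrases it covers.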

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:59:32 +02:00
landing chore(matrix): final scrub of stale matrix references 2026-04-08 16:47:54 +02:00
mobile chore: complete ManaCore → Mana rename (docs, go modules, plists, images) 2026-04-07 12:26:10 +02:00
web fix(voice): switch to gemma3:12b + few-shot prompt for parse-task 2026-04-08 16:59:32 +02:00