mirror of
https://github.com/Memo-2023/mana-monorepo.git
synced 2026-05-15 05:41:09 +02:00
Two related changes that fall out of real end-to-end testing against
the now-working local mana-llm.
1. Default model bumped from gemma3:4b to gemma3:12b for both
parse-task and parse-habit. The 4b model gets weekday math off by
one ("nächsten Montag", i.e. "next Monday", from a Wednesday →
2026-04-14 instead of 2026-04-13), aggressively shortens titles
("Anna anrufen" → "Anrufen"), and frequently paraphrases habit
names instead of copying them verbatim ("Joggen" instead of
"Laufen"). The verbatim validation in coerce then drops the
paraphrased name, so the LLM round-trip is wasted. The 12b variant
is roughly 10% slower for these tiny prompts (~1.1s vs ~1.0s on
the GPU box), so the accuracy win is essentially free.
2. parse-task prompt rewritten as few-shot. Pure rule descriptions
were *worse* than simple examples — the long "Rules — read
carefully" section in the previous prompt actually made the model
compute next Monday as 2026-04-14 even though a direct "what date
is next Monday?" prompt to the same model returned 2026-04-13.
The detailed rules were also priming the model to over-shorten
titles and over-eagerly tag filler words. With five worked
examples in the prompt (including the previously failing "Anna
nächsten Montag anrufen" case), all five inputs plus one novel
input that is not among the examples ("Mama am Wochenende
besuchen") now come back correct.
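As an illustration of the few-shot approach described in point 2, here is a minimal Python sketch of how such a prompt could be assembled. It is not the repository's code: the function name `build_parse_task_prompt`, the output fields `title` and `dueDate`, the instruction sentence, and the single example shown are all assumptions, since the actual prompt and its five examples are not reproduced in this message.

```python
import json
from datetime import date

# Hypothetical worked example. The real prompt reportedly contains five;
# only the "Anna nächsten Montag anrufen" case is named in the commit message.
FEW_SHOT_EXAMPLES = [
    {
        "today": "2026-04-08",  # a Wednesday
        "input": "Anna nächsten Montag anrufen",
        "output": {"title": "Anna anrufen", "dueDate": "2026-04-13"},
    },
]


def build_parse_task_prompt(user_input: str, today: date) -> str:
    """Assemble a few-shot prompt: worked examples first, then the real input."""
    lines = ["Extract a task from the input. Answer with JSON only.", ""]
    for example in FEW_SHOT_EXAMPLES:
        lines.append(f"Today: {example['today']}")
        lines.append(f"Input: {example['input']}")
        # ensure_ascii=False keeps umlauts readable in the prompt text.
        lines.append(f"Output: {json.dumps(example['output'], ensure_ascii=False)}")
        lines.append("")
    # The final block has the same shape as the examples but leaves Output open.
    lines.append(f"Today: {today.isoformat()}")
    lines.append(f"Input: {user_input}")
    lines.append("Output:")
    return "\n".join(lines)
```

Each example shows the model the full title being preserved and the date being resolved, instead of describing those behaviours as rules.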
The deterministic guards in coerce() are kept as a backstop for the
day the GPU box swaps in a weaker model — they're cheap and don't
hurt the happy path.
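For illustration only, two deterministic guards of the kind mentioned above might look like the following Python sketch. The real coerce() is not shown in this message, so the function names, the "first matching weekday strictly after today" reading of "nächsten Montag", and the whitespace-trimming in the name check are all assumptions.

```python
from datetime import date, timedelta
from typing import Optional


def next_weekday(today: date, weekday: int) -> date:
    """Return the first date strictly after `today` that falls on `weekday`.

    Weekdays follow date.weekday(): Monday is 0, Sunday is 6.
    """
    days_ahead = (weekday - today.weekday()) % 7
    if days_ahead == 0:
        # Same weekday as today: jump a full week ahead (assumed semantics).
        days_ahead = 7
    return today + timedelta(days=days_ahead)


def match_habit_verbatim(candidate: str, known_habits: list[str]) -> Optional[str]:
    """Accept a habit name only if it matches a known habit exactly.

    Surrounding whitespace is ignored (an assumption); paraphrases such as
    "Joggen" for "Laufen" are rejected by returning None.
    """
    trimmed = candidate.strip()
    return trimmed if trimmed in known_habits else None
```

With these definitions, `next_weekday(date(2026, 4, 8), 0)` yields 2026-04-13 for the Wednesday case from point 1, and `match_habit_verbatim("Joggen", ["Laufen"])` yields `None`.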
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| Name |
|---|
| api |
| calc/packages/shared |
| calendar |
| cards |
| chat |
| citycorners |
| contacts |
| context |
| docs |
| guides |
| inventar |
| mana |
| manavoxel |
| memoro |
| moodlit |
| mukke |
| news |
| nutriphi |
| photos |
| picture |
| planta |
| presi |
| questions |
| skilltree |
| storage |
| times |
| todo |
| traces |
| uload |
| zitare/packages/content |