fix(voice/parse-task): guard against gemma3:4b hallucinating dueDate + priority

Real end-to-end testing against the now-working local mana-llm
surfaced two model behaviours the prompt couldn't talk down:

1. gemma3:4b stamps today's date on every task that doesn't have a
   real time anchor. "Mülltonnen rausstellen" came back with
   dueDate=2026-04-08 and priority=low even though the prompt
   explicitly said "MUST be null when no date is mentioned". After
   typing "Buy milk" the user would silently get a today-due task,
   which is worse than no parsing at all.

2. The model occasionally returns dueDate as a full ISO timestamp
   ("2026-04-09T14:00:00") when the transcript mentions a time. The
   coerce regex previously matched the prefix and let the timestamp
   through unchanged, which then breaks the YYYY-MM-DD-shaped Dexie
   field downstream.

Fix: deterministic post-processing in coerce. The prompt is also
tightened with explicit "ONLY when…" rules but the guards are the
load-bearing change since gemma3:4b ignores prompt restrictions.

- Strict YYYY-MM-DD extraction: a leading-anchor regex match keeps
  only the date prefix even if the model adds a time component.
- DATE_TRIGGER_PATTERNS: substring scan over the original transcript
  for German + English date words. If the LLM returned a dueDate but
  the transcript has zero matches, drop the date — it was a
  hallucination. False positives are preferable to false negatives:
  letting through a fake date is more annoying than suppressing a
  real one the user can re-type.
- PRIORITY_TRIGGER_PATTERNS: same idea for priority. The model thinks
  taxes are inherently urgent; we don't want to inherit that opinion.

The labels field is left noisy on purpose — "müll", "unbedingt",
"erledigen" all come back from a single transcript and only the ones
that fuzzy-match an existing workspace tag end up on the task, so
filtering filler words at this layer would be wasted work.

Verified against five transcripts spanning bare/explicit/relative
date in DE + EN. Real LLM round-trip via http://localhost:5173https://llm.mana.how → ollama gemma3:4b. Local mana-llm now reaches
its Ollama backend after the gpu-proxy routing fix in 7f382138a.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Till JS 2026-04-08 16:50:19 +02:00
parent bfeeef7819
commit b505024f7b

View file

@ -45,28 +45,153 @@ function buildPrompt(transcript: string, language: string): string {
'',
'Extract the following fields and return ONLY a JSON object with these exact keys:',
' - title: short imperative title without filler words (string, required)',
' - dueDate: ISO date YYYY-MM-DD or null if no date is mentioned',
' - priority: "low" | "medium" | "high" | null',
' - labels: array of short topic labels (max 3, lowercase, may be empty)',
' - dueDate: ISO date YYYY-MM-DD',
' - priority: "low" | "medium" | "high"',
' - labels: array of short topic labels (max 3, lowercase)',
'',
'Rules:',
'- Resolve relative dates ("morgen", "tomorrow", "nächsten Montag") against today.',
'- If only a time is mentioned, assume today.',
'- Never invent details. If unsure, use null / empty array.',
'Rules — read carefully, the model often gets these wrong:',
'- dueDate: ONLY set this when the transcript explicitly mentions a',
' date, weekday, or relative time word ("morgen", "tomorrow",',
' "nächsten Montag", "heute Abend", "in zwei Wochen"). For a bare',
' task like "Mülltonnen rausstellen" with no time at all, dueDate',
' MUST be null. Never default to today just because the task feels',
' like a today-thing.',
'- priority: ONLY set this when the transcript uses urgency or',
' importance words ("dringend", "wichtig", "unbedingt", "asap",',
' "low priority", "kann warten"). For a neutral task, priority',
' MUST be null. Never guess from the topic.',
'- labels: ONLY include labels that come directly from concrete',
' topic words in the transcript. For "Mülltonnen rausstellen",',
' "müll" is fine but "haushalt" is a stretch — when in doubt,',
' empty array. Max 3 labels, single words preferred.',
'- Resolve relative dates against today for the dueDate field.',
'- If only a time is mentioned (e.g. "um 14 Uhr"), assume today.',
'- title: a short imperative ("Steuererklärung machen", not',
' "Erinnere mich an die Steuererklärung").',
'- Output JSON only, no markdown, no commentary, no code fences.',
'- Use null (literal, not the string "null") for absent fields.',
'',
`Transcript: ${JSON.stringify(transcript)}`,
].join('\n');
}
/**
* Words that signal "the user actually mentioned a time-anchor".
* We use this to override the LLM when it hallucinates a dueDate from
* a transcript that has no date words at all gemma3:4b is stubborn
* about defaulting to today even when the prompt explicitly says null.
*
* Match is substring-based on the lowercased transcript, so we catch
* "morgens" via "morgen", "heutzutage" via "heut", etc. False positives
* are preferable to false negatives here: if we let through a
* transcript with no real date, the user gets an unwanted dueDate;
* if we suppress the LLM on a transcript that DOES have a date, the
* user just sees no date and can fix it in two clicks.
*/
const DATE_TRIGGER_PATTERNS = [
// German
'heut',
'morgen',
'übermorgen',
'gestern',
'montag',
'dienstag',
'mittwoch',
'donnerstag',
'freitag',
'samstag',
'sonntag',
'wochenende',
'nächste',
'nachste',
'kommende',
'in einer woche',
'in zwei',
'in drei',
'in einem monat',
'um ',
'uhr',
' am ',
// English
'today',
'tomorrow',
'yesterday',
'monday',
'tuesday',
'wednesday',
'thursday',
'friday',
'saturday',
'sunday',
'weekend',
'next ',
'in a week',
'in two ',
'in three ',
'at ',
'pm',
'am ',
'tonight',
];
const PRIORITY_TRIGGER_PATTERNS = [
// German
'dringend',
'wichtig',
'unbedingt',
'sofort',
'asap',
'kann warten',
'eilt',
'priorität',
// English
'urgent',
'important',
'immediately',
'critical',
'low priority',
'high priority',
'whenever',
];
function transcriptMentions(transcript: string, patterns: string[]): boolean {
const lower = transcript.toLowerCase();
return patterns.some((p) => lower.includes(p));
}
function coerce(raw: unknown, transcript: string): ParseResult {
if (!raw || typeof raw !== 'object') return fallback(transcript);
const r = raw as Record<string, unknown>;
const title = typeof r.title === 'string' && r.title.trim() ? r.title.trim() : transcript.trim();
const dueDate =
typeof r.dueDate === 'string' && /^\d{4}-\d{2}-\d{2}/.test(r.dueDate) ? r.dueDate : null;
const priority =
r.priority === 'low' || r.priority === 'medium' || r.priority === 'high' ? r.priority : null;
// Strict YYYY-MM-DD only — strip any time component the model adds
// ("2026-04-09T14:00:00" → "2026-04-09"). Reject anything that
// doesn't start with a 10-char ISO date.
let dueDate: string | null = null;
if (typeof r.dueDate === 'string') {
const m = r.dueDate.match(/^(\d{4}-\d{2}-\d{2})/);
if (m) dueDate = m[1];
}
// Override: if the transcript has zero date trigger words, the
// LLM hallucinated a dueDate. Drop it. This guard exists because
// gemma3:4b consistently emits today's date for plain tasks like
// "Mülltonnen rausstellen" no matter how loudly the prompt says
// "null when no date is mentioned".
if (dueDate && !transcriptMentions(transcript, DATE_TRIGGER_PATTERNS)) {
dueDate = null;
}
let priority: 'low' | 'medium' | 'high' | null = null;
if (r.priority === 'low' || r.priority === 'medium' || r.priority === 'high') {
priority = r.priority;
}
// Same hallucination guard for priority — neutral transcripts
// shouldn't end up "high" because the model thinks taxes are
// inherently urgent.
if (priority && !transcriptMentions(transcript, PRIORITY_TRIGGER_PATTERNS)) {
priority = null;
}
const labels = Array.isArray(r.labels)
? r.labels.filter((l): l is string => typeof l === 'string').slice(0, 3)
: [];