fix(voice/parse-task): guard against gemma3:4b hallucinating dueDate + priority

Real end-to-end testing against the now-working local mana-llm surfaced two model behaviours the prompt couldn't talk down: 1. gemma3:4b stamps today's date on every task that doesn't have a real time anchor. "Mülltonnen rausstellen" came back with dueDate=2026-04-08 and priority=low even though the prompt explicitly said "MUST be null when no date is mentioned". After typing "Buy milk" the user would silently get a today-due task, which is worse than no parsing at all. 2. The model occasionally returns dueDate as a full ISO timestamp ("2026-04-09T14:00:00") when the transcript mentions a time. The coerce regex previously matched the prefix and let the timestamp through unchanged, which then breaks the YYYY-MM-DD-shaped Dexie field downstream. Fix: deterministic post-processing in coerce. The prompt is also tightened with explicit "ONLY when…" rules but the guards are the load-bearing change since gemma3:4b ignores prompt restrictions. - Strict YYYY-MM-DD extraction: a leading-anchor regex match keeps only the date prefix even if the model adds a time component. - DATE_TRIGGER_PATTERNS: substring scan over the original transcript for German + English date words. If the LLM returned a dueDate but the transcript has zero matches, drop the date — it was a hallucination. False positives are preferable to false negatives: letting through a fake date is more annoying than suppressing a real one the user can re-type. - PRIORITY_TRIGGER_PATTERNS: same idea for priority. The model thinks taxes are inherently urgent; we don't want to inherit that opinion. The labels field is left noisy on purpose — "müll", "unbedingt", "erledigen" all come back from a single transcript and only the ones that fuzzy-match an existing workspace tag end up on the task, so filtering filler words at this layer would be wasted work. Verified against five transcripts spanning bare/explicit/relative date in DE + EN. Real LLM round-trip via http://localhost:5173 → https://llm.mana.how → ollama gemma3:4b. Local mana-llm now reaches its Ollama backend after the gpu-proxy routing fix in 7f382138a. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-19 11:41:23 +02:00 · 2026-04-08 16:50:19 +02:00 · 2026-04-08 16:50:19 +02:00 · b505024f7b
commit b505024f7b
parent bfeeef7819
1 changed files with 136 additions and 11 deletions
--- a/apps/mana/apps/web/src/routes/api/v1/voice/parse-task/+server.ts
+++ b/apps/mana/apps/web/src/routes/api/v1/voice/parse-task/+server.ts
@ -45,28 +45,153 @@ function buildPrompt(transcript: string, language: string): string {
 		'',
 		'Extract the following fields and return ONLY a JSON object with these exact keys:',
 		'  - title: short imperative title without filler words (string, required)',
-		'  - dueDate: ISO date YYYY-MM-DD or null if no date is mentioned',
-		'  - priority: "low" | "medium" | "high" | null',
-		'  - labels: array of short topic labels (max 3, lowercase, may be empty)',
+		'  - dueDate: ISO date YYYY-MM-DD',
+		'  - priority: "low" | "medium" | "high"',
+		'  - labels: array of short topic labels (max 3, lowercase)',
 		'',
-		'Rules:',
-		'- Resolve relative dates ("morgen", "tomorrow", "nächsten Montag") against today.',
-		'- If only a time is mentioned, assume today.',
-		'- Never invent details. If unsure, use null / empty array.',
+		'Rules — read carefully, the model often gets these wrong:',
+		'- dueDate: ONLY set this when the transcript explicitly mentions a',
+		'  date, weekday, or relative time word ("morgen", "tomorrow",',
+		'  "nächsten Montag", "heute Abend", "in zwei Wochen"). For a bare',
+		'  task like "Mülltonnen rausstellen" with no time at all, dueDate',
+		'  MUST be null. Never default to today just because the task feels',
+		'  like a today-thing.',
+		'- priority: ONLY set this when the transcript uses urgency or',
+		'  importance words ("dringend", "wichtig", "unbedingt", "asap",',
+		'  "low priority", "kann warten"). For a neutral task, priority',
+		'  MUST be null. Never guess from the topic.',
+		'- labels: ONLY include labels that come directly from concrete',
+		'  topic words in the transcript. For "Mülltonnen rausstellen",',
+		'  "müll" is fine but "haushalt" is a stretch — when in doubt,',
+		'  empty array. Max 3 labels, single words preferred.',
+		'- Resolve relative dates against today for the dueDate field.',
+		'- If only a time is mentioned (e.g. "um 14 Uhr"), assume today.',
+		'- title: a short imperative ("Steuererklärung machen", not',
+		'  "Erinnere mich an die Steuererklärung").',
 		'- Output JSON only, no markdown, no commentary, no code fences.',
+		'- Use null (literal, not the string "null") for absent fields.',
 		'',
 		`Transcript: ${JSON.stringify(transcript)}`,
 	].join('\n');
 }

+/**
+ * Words that signal "the user actually mentioned a time-anchor".
+ * We use this to override the LLM when it hallucinates a dueDate from
+ * a transcript that has no date words at all — gemma3:4b is stubborn
+ * about defaulting to today even when the prompt explicitly says null.
+ *
+ * Match is substring-based on the lowercased transcript, so we catch
+ * "morgens" via "morgen", "heutzutage" via "heut", etc. False positives
+ * are preferable to false negatives here: if we let through a
+ * transcript with no real date, the user gets an unwanted dueDate;
+ * if we suppress the LLM on a transcript that DOES have a date, the
+ * user just sees no date and can fix it in two clicks.
+ */
+const DATE_TRIGGER_PATTERNS = [
+	// German
+	'heut',
+	'morgen',
+	'übermorgen',
+	'gestern',
+	'montag',
+	'dienstag',
+	'mittwoch',
+	'donnerstag',
+	'freitag',
+	'samstag',
+	'sonntag',
+	'wochenende',
+	'nächste',
+	'nachste',
+	'kommende',
+	'in einer woche',
+	'in zwei',
+	'in drei',
+	'in einem monat',
+	'um ',
+	'uhr',
+	' am ',
+	// English
+	'today',
+	'tomorrow',
+	'yesterday',
+	'monday',
+	'tuesday',
+	'wednesday',
+	'thursday',
+	'friday',
+	'saturday',
+	'sunday',
+	'weekend',
+	'next ',
+	'in a week',
+	'in two ',
+	'in three ',
+	'at ',
+	'pm',
+	'am ',
+	'tonight',
+];
+
+const PRIORITY_TRIGGER_PATTERNS = [
+	// German
+	'dringend',
+	'wichtig',
+	'unbedingt',
+	'sofort',
+	'asap',
+	'kann warten',
+	'eilt',
+	'priorität',
+	// English
+	'urgent',
+	'important',
+	'immediately',
+	'critical',
+	'low priority',
+	'high priority',
+	'whenever',
+];
+
+function transcriptMentions(transcript: string, patterns: string[]): boolean {
+	const lower = transcript.toLowerCase();
+	return patterns.some((p) => lower.includes(p));
+}
+
 function coerce(raw: unknown, transcript: string): ParseResult {
 	if (!raw || typeof raw !== 'object') return fallback(transcript);
 	const r = raw as Record<string, unknown>;
 	const title = typeof r.title === 'string' && r.title.trim() ? r.title.trim() : transcript.trim();
-	const dueDate =
-		typeof r.dueDate === 'string' && /^\d{4}-\d{2}-\d{2}/.test(r.dueDate) ? r.dueDate : null;
-	const priority =
-		r.priority === 'low' || r.priority === 'medium' || r.priority === 'high' ? r.priority : null;
+
+	// Strict YYYY-MM-DD only — strip any time component the model adds
+	// ("2026-04-09T14:00:00" → "2026-04-09"). Reject anything that
+	// doesn't start with a 10-char ISO date.
+	let dueDate: string | null = null;
+	if (typeof r.dueDate === 'string') {
+		const m = r.dueDate.match(/^(\d{4}-\d{2}-\d{2})/);
+		if (m) dueDate = m[1];
+	}
+	// Override: if the transcript has zero date trigger words, the
+	// LLM hallucinated a dueDate. Drop it. This guard exists because
+	// gemma3:4b consistently emits today's date for plain tasks like
+	// "Mülltonnen rausstellen" no matter how loudly the prompt says
+	// "null when no date is mentioned".
+	if (dueDate && !transcriptMentions(transcript, DATE_TRIGGER_PATTERNS)) {
+		dueDate = null;
+	}
+
+	let priority: 'low' | 'medium' | 'high' | null = null;
+	if (r.priority === 'low' || r.priority === 'medium' || r.priority === 'high') {
+		priority = r.priority;
+	}
+	// Same hallucination guard for priority — neutral transcripts
+	// shouldn't end up "high" because the model thinks taxes are
+	// inherently urgent.
+	if (priority && !transcriptMentions(transcript, PRIORITY_TRIGGER_PATTERNS)) {
+		priority = null;
+	}
+
 	const labels = Array.isArray(r.labels)
 		? r.labels.filter((l): l is string => typeof l === 'string').slice(0, 3)
 		: [];