mirror of
https://github.com/Memo-2023/mana-monorepo.git
synced 2026-05-19 11:41:23 +02:00
fix(voice/parse-task): guard against gemma3:4b hallucinating dueDate + priority
Real end-to-end testing against the now-working local mana-llm
surfaced two model behaviours the prompt couldn't talk down:
1. gemma3:4b stamps today's date on every task that doesn't have a
real time anchor. "Mülltonnen rausstellen" came back with
dueDate=2026-04-08 and priority=low even though the prompt
explicitly said "MUST be null when no date is mentioned". After
typing "Buy milk" the user would silently get a today-due task,
which is worse than no parsing at all.
2. The model occasionally returns dueDate as a full ISO timestamp
("2026-04-09T14:00:00") when the transcript mentions a time. The
coerce regex previously matched the prefix and let the timestamp
through unchanged, which then breaks the YYYY-MM-DD-shaped Dexie
field downstream.
Fix: deterministic post-processing in coerce. The prompt is also
tightened with explicit "ONLY when…" rules but the guards are the
load-bearing change since gemma3:4b ignores prompt restrictions.
- Strict YYYY-MM-DD extraction: a leading-anchor regex match keeps
only the date prefix even if the model adds a time component.
- DATE_TRIGGER_PATTERNS: substring scan over the original transcript
for German + English date words. If the LLM returned a dueDate but
the transcript has zero matches, drop the date — it was a
hallucination. False positives are preferable to false negatives:
letting through a fake date is more annoying than suppressing a
real one the user can re-type.
- PRIORITY_TRIGGER_PATTERNS: same idea for priority. The model thinks
taxes are inherently urgent; we don't want to inherit that opinion.
The labels field is left noisy on purpose — "müll", "unbedingt",
"erledigen" all come back from a single transcript and only the ones
that fuzzy-match an existing workspace tag end up on the task, so
filtering filler words at this layer would be wasted work.
Verified against five transcripts spanning bare/explicit/relative
date in DE + EN. Real LLM round-trip via http://localhost:5173 →
https://llm.mana.how → ollama gemma3:4b. Local mana-llm now reaches
its Ollama backend after the gpu-proxy routing fix in 7f382138a.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
bfeeef7819
commit
b505024f7b
1 changed files with 136 additions and 11 deletions
|
|
@ -45,28 +45,153 @@ function buildPrompt(transcript: string, language: string): string {
|
|||
'',
|
||||
'Extract the following fields and return ONLY a JSON object with these exact keys:',
|
||||
' - title: short imperative title without filler words (string, required)',
|
||||
' - dueDate: ISO date YYYY-MM-DD or null if no date is mentioned',
|
||||
' - priority: "low" | "medium" | "high" | null',
|
||||
' - labels: array of short topic labels (max 3, lowercase, may be empty)',
|
||||
' - dueDate: ISO date YYYY-MM-DD',
|
||||
' - priority: "low" | "medium" | "high"',
|
||||
' - labels: array of short topic labels (max 3, lowercase)',
|
||||
'',
|
||||
'Rules:',
|
||||
'- Resolve relative dates ("morgen", "tomorrow", "nächsten Montag") against today.',
|
||||
'- If only a time is mentioned, assume today.',
|
||||
'- Never invent details. If unsure, use null / empty array.',
|
||||
'Rules — read carefully, the model often gets these wrong:',
|
||||
'- dueDate: ONLY set this when the transcript explicitly mentions a',
|
||||
' date, weekday, or relative time word ("morgen", "tomorrow",',
|
||||
' "nächsten Montag", "heute Abend", "in zwei Wochen"). For a bare',
|
||||
' task like "Mülltonnen rausstellen" with no time at all, dueDate',
|
||||
' MUST be null. Never default to today just because the task feels',
|
||||
' like a today-thing.',
|
||||
'- priority: ONLY set this when the transcript uses urgency or',
|
||||
' importance words ("dringend", "wichtig", "unbedingt", "asap",',
|
||||
' "low priority", "kann warten"). For a neutral task, priority',
|
||||
' MUST be null. Never guess from the topic.',
|
||||
'- labels: ONLY include labels that come directly from concrete',
|
||||
' topic words in the transcript. For "Mülltonnen rausstellen",',
|
||||
' "müll" is fine but "haushalt" is a stretch — when in doubt,',
|
||||
' empty array. Max 3 labels, single words preferred.',
|
||||
'- Resolve relative dates against today for the dueDate field.',
|
||||
'- If only a time is mentioned (e.g. "um 14 Uhr"), assume today.',
|
||||
'- title: a short imperative ("Steuererklärung machen", not',
|
||||
' "Erinnere mich an die Steuererklärung").',
|
||||
'- Output JSON only, no markdown, no commentary, no code fences.',
|
||||
'- Use null (literal, not the string "null") for absent fields.',
|
||||
'',
|
||||
`Transcript: ${JSON.stringify(transcript)}`,
|
||||
].join('\n');
|
||||
}
|
||||
|
||||
/**
|
||||
* Words that signal "the user actually mentioned a time-anchor".
|
||||
* We use this to override the LLM when it hallucinates a dueDate from
|
||||
* a transcript that has no date words at all — gemma3:4b is stubborn
|
||||
* about defaulting to today even when the prompt explicitly says null.
|
||||
*
|
||||
* Match is substring-based on the lowercased transcript, so we catch
|
||||
* "morgens" via "morgen", "heutzutage" via "heut", etc. False positives
|
||||
* are preferable to false negatives here: if we let through a
|
||||
* transcript with no real date, the user gets an unwanted dueDate;
|
||||
* if we suppress the LLM on a transcript that DOES have a date, the
|
||||
* user just sees no date and can fix it in two clicks.
|
||||
*/
|
||||
const DATE_TRIGGER_PATTERNS = [
|
||||
// German
|
||||
'heut',
|
||||
'morgen',
|
||||
'übermorgen',
|
||||
'gestern',
|
||||
'montag',
|
||||
'dienstag',
|
||||
'mittwoch',
|
||||
'donnerstag',
|
||||
'freitag',
|
||||
'samstag',
|
||||
'sonntag',
|
||||
'wochenende',
|
||||
'nächste',
|
||||
'nachste',
|
||||
'kommende',
|
||||
'in einer woche',
|
||||
'in zwei',
|
||||
'in drei',
|
||||
'in einem monat',
|
||||
'um ',
|
||||
'uhr',
|
||||
' am ',
|
||||
// English
|
||||
'today',
|
||||
'tomorrow',
|
||||
'yesterday',
|
||||
'monday',
|
||||
'tuesday',
|
||||
'wednesday',
|
||||
'thursday',
|
||||
'friday',
|
||||
'saturday',
|
||||
'sunday',
|
||||
'weekend',
|
||||
'next ',
|
||||
'in a week',
|
||||
'in two ',
|
||||
'in three ',
|
||||
'at ',
|
||||
'pm',
|
||||
'am ',
|
||||
'tonight',
|
||||
];
|
||||
|
||||
const PRIORITY_TRIGGER_PATTERNS = [
|
||||
// German
|
||||
'dringend',
|
||||
'wichtig',
|
||||
'unbedingt',
|
||||
'sofort',
|
||||
'asap',
|
||||
'kann warten',
|
||||
'eilt',
|
||||
'priorität',
|
||||
// English
|
||||
'urgent',
|
||||
'important',
|
||||
'immediately',
|
||||
'critical',
|
||||
'low priority',
|
||||
'high priority',
|
||||
'whenever',
|
||||
];
|
||||
|
||||
function transcriptMentions(transcript: string, patterns: string[]): boolean {
|
||||
const lower = transcript.toLowerCase();
|
||||
return patterns.some((p) => lower.includes(p));
|
||||
}
|
||||
|
||||
function coerce(raw: unknown, transcript: string): ParseResult {
|
||||
if (!raw || typeof raw !== 'object') return fallback(transcript);
|
||||
const r = raw as Record<string, unknown>;
|
||||
const title = typeof r.title === 'string' && r.title.trim() ? r.title.trim() : transcript.trim();
|
||||
const dueDate =
|
||||
typeof r.dueDate === 'string' && /^\d{4}-\d{2}-\d{2}/.test(r.dueDate) ? r.dueDate : null;
|
||||
const priority =
|
||||
r.priority === 'low' || r.priority === 'medium' || r.priority === 'high' ? r.priority : null;
|
||||
|
||||
// Strict YYYY-MM-DD only — strip any time component the model adds
|
||||
// ("2026-04-09T14:00:00" → "2026-04-09"). Reject anything that
|
||||
// doesn't start with a 10-char ISO date.
|
||||
let dueDate: string | null = null;
|
||||
if (typeof r.dueDate === 'string') {
|
||||
const m = r.dueDate.match(/^(\d{4}-\d{2}-\d{2})/);
|
||||
if (m) dueDate = m[1];
|
||||
}
|
||||
// Override: if the transcript has zero date trigger words, the
|
||||
// LLM hallucinated a dueDate. Drop it. This guard exists because
|
||||
// gemma3:4b consistently emits today's date for plain tasks like
|
||||
// "Mülltonnen rausstellen" no matter how loudly the prompt says
|
||||
// "null when no date is mentioned".
|
||||
if (dueDate && !transcriptMentions(transcript, DATE_TRIGGER_PATTERNS)) {
|
||||
dueDate = null;
|
||||
}
|
||||
|
||||
let priority: 'low' | 'medium' | 'high' | null = null;
|
||||
if (r.priority === 'low' || r.priority === 'medium' || r.priority === 'high') {
|
||||
priority = r.priority;
|
||||
}
|
||||
// Same hallucination guard for priority — neutral transcripts
|
||||
// shouldn't end up "high" because the model thinks taxes are
|
||||
// inherently urgent.
|
||||
if (priority && !transcriptMentions(transcript, PRIORITY_TRIGGER_PATTERNS)) {
|
||||
priority = null;
|
||||
}
|
||||
|
||||
const labels = Array.isArray(r.labels)
|
||||
? r.labels.filter((l): l is string => typeof l === 'string').slice(0, 3)
|
||||
: [];
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue