fix(generate-title): few-shot prompt + rolling cleanup + date label for short transcripts

User test on browser tier (Gemma 4 E2B) showed two compounding bugs: 1. The LLM produces empty content. The cleanup chain strips it to "" and falls through to runRules. 2. runRules takes the first 7 words of the transcript. For short voice memos like "So erneut eine kleine Testaufnahme hier" (6 words) that means the entire transcript becomes the title — not actually a title, just the recording verbatim. User log: [memoro] enqueued title task ... [generateTitle] LLM returned empty after cleanup, falling back to rules [memoro-llm-watcher] writing title to memo X: "So erneut eine kleine Testaufnahme hier" Three changes to fix the actual quality, not just the empty-string symptom from the previous commit: 1. Rewrite the LLM prompt as few-shot Replace the previous "Du erstellst kurze Titel — kein Markdown, keine Anführungszeichen, keine Vorrede, kein Punkt am Ende" prompt (a wall of negative constraints that small instruct models like Gemma 4 E2B handle poorly) with a few-shot user-only message: Erstelle einen kurzen Titel (3-5 Wörter) für die folgende Aufnahme. Beispiel 1: Aufnahme: "Erinnere mich daran, morgen Vormittag den Müll rauszubringen, bevor die Müllabfuhr kommt." Titel: Erinnerung Müll rausbringen Beispiel 2: ... (Idee Präsentation Demo-Start) Beispiel 3: ... (Steuererklärung 2025) Aufnahme: "<user transcript>" Titel: Small instruct models complete the pattern much more reliably than they obey negative constraints. The expected continuation is just the title text, no punctuation, no markdown, no preamble. 2. Rolling cleanup that won't go to empty The previous cleanup chain (`.trim().replace(quotes).replace(dots).trim()`) could end up with "" if the model emitted only `.` or `**.**` or similar. Replace with a four-stage chain that picks the FIRST non-empty stage from the bottom up: trimmed = result.content.trim() stripFences = first line only (kills any model rambling) stripQuotes = strip surrounding quotes/markdown markers stripDots = strip trailing dots cleaned = stripDots || stripQuotes || stripFences || trimmed This way "Test." → "Test" but `"."` → `"."` (kept as-is rather than stripped to empty). The runRules fallback only fires when the model truly emits nothing usable in any stage. 3. runRules is smarter about short transcripts For voice memos with ≤8 words in the first sentence, the "title" would just be the whole transcript echoed back. That's not useful. The new threshold: short transcripts get a date label instead ("Memo vom 9. April 2026"), longer ones still get the first-N-words snippet. The threshold is empirical — short voice memos benefit from a date marker, longer ones can spare a few words for a snippet. Extracted dateLabel() to a module-scope function so both rulesImpl (for empty/short transcripts) and the watcher's last-resort backstop can format dates consistently. Diagnostic: log the RAW LLM output before cleanup so the next test session shows exactly what Gemma is producing. If the model is still emitting only punctuation despite the few-shot prompt, the log will show `"\n"` or `"."` and we'll know the bug is in the inference path rather than the cleanup. After this commit, the user-visible result for a 6-word transcript on the browser tier should be: - LLM produces something real ("Test der Sprachaufnahme") → write it - LLM produces nothing → rules → "Memo vom 9. April 2026" - both fail somehow → watcher's date backstop → same - never the verbatim transcript Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-17 00:39:41 +02:00 · 2026-04-09 13:03:16 +02:00 · 2026-04-09 13:03:16 +02:00 · d8591b320b
commit d8591b320b
parent ea8ca13d37
1 changed files with 75 additions and 30 deletions
--- a/apps/mana/apps/web/src/lib/llm-tasks/generate-title.ts
+++ b/apps/mana/apps/web/src/lib/llm-tasks/generate-title.ts
@ -21,16 +21,41 @@ export interface GenerateTitleInput {

 export type GenerateTitleOutput = string;

+/** Date-based fallback label, e.g. "Memo vom 9. April 2026". Used when
+ *  the input is too short to produce a meaningful first-sentence title
+ *  or when rulesImpl gets called on empty/garbled input. */
+function dateLabel(): string {
+	const today = new Date();
+	const formatted = today.toLocaleDateString('de', {
+		day: 'numeric',
+		month: 'long',
+		year: 'numeric',
+	});
+	return `Memo vom ${formatted}`;
+}
+
 /** Deterministic first-sentence heuristic. Extracted to a module-scope
 *  function so runLlm can call it as a fallback when the LLM returns
 *  empty or whitespace-only output (which happens when the model emits
- *  only a `.` or special tokens that get stripped by skip_special_tokens). */
+ *  only a `.` or special tokens that get stripped by skip_special_tokens).
+ *
+ *  For very short transcripts (≤8 words), the "first sentence" IS the
+ *  whole transcript — using it as a title means the user sees their
+ *  recording verbatim, which isn't a title. In that case fall through
+ *  to the date label. The threshold is empirical: short voice memos
+ *  benefit from a date marker, longer ones can spare a few words for
+ *  a snippet. */
 function rulesImpl(input: GenerateTitleInput): string {
 	const text = input.text.trim();
-	if (!text) return 'Ohne Titel';
+	if (!text) return dateLabel();

 	// Take the first sentence — split on .!? or newline.
 	const firstSentence = text.split(/[.!?\n]/)[0]?.trim() ?? text;
+	const wordCount = firstSentence.split(/\s+/).filter(Boolean).length;
+
+	// Short transcripts: a date label is more honest than echoing the
+	// transcript back verbatim as if it were a title.
+	if (wordCount <= 8) return dateLabel();

 	// Cap at ~60 chars / maxWords words, whichever comes first.
 	const maxWords = input.maxWords ?? 7;
@ -41,7 +66,7 @@ function rulesImpl(input: GenerateTitleInput): string {
 		candidate = candidate.slice(0, 57).trimEnd() + '…';
 	}

-	return candidate || 'Ohne Titel';
+	return candidate || dateLabel();
 }

 export const generateTitleTask: LlmTask<GenerateTitleInput, GenerateTitleOutput> = {
@ -51,40 +76,60 @@ export const generateTitleTask: LlmTask<GenerateTitleInput, GenerateTitleOutput>
 	displayLabel: 'Titel automatisch erzeugen',

 	async runLlm(input, backend: LlmBackend): Promise<GenerateTitleOutput> {
-		const maxWords = input.maxWords ?? 7;
-		const language = input.language ?? 'de';
+		// Few-shot prompt — small instruct models like Gemma 4 E2B respond
+		// far better to "here's the pattern, complete the next one" than
+		// to a list of negative constraints ("no markdown, no quotes, no
+		// vorrede..."). The model just sees the structure and continues
+		// it. Empirically this produces real titles instead of single
+		// punctuation marks or empty special-token-only outputs.
+		const userMessage = `Erstelle einen kurzen, aussagekräftigen Titel (3-5 Wörter) für die folgende Sprachaufnahme.
+
+Beispiel 1:
+Aufnahme: "Erinnere mich daran, morgen Vormittag den Müll rauszubringen, bevor die Müllabfuhr kommt."
+Titel: Erinnerung Müll rausbringen
+
+Beispiel 2:
+Aufnahme: "Ich hatte heute eine Idee für die Präsentation nächste Woche, vielleicht sollten wir mit einer Demo anfangen statt mit Folien."
+Titel: Idee Präsentation Demo-Start
+
+Beispiel 3:
+Aufnahme: "Notiz für mich, ich muss noch die Steuererklärung für 2025 fertig machen, Belege liegen schon im Ordner."
+Titel: Steuererklärung 2025
+
+Aufnahme: "${input.text.slice(0, 2000).replace(/"/g, "'")}"
+Titel:`;
+
 		const result = await backend.generate({
 			taskName: generateTitleTask.name,
 			contentClass: generateTitleTask.contentClass,
-			messages: [
-				{
-					role: 'system',
-					content: `Du erstellst kurze, aussagekräftige Titel (max. ${maxWords} Wörter) für Texte. Sprache: ${language}. Antworte AUSSCHLIESSLICH mit dem Titel — kein Markdown, keine Anführungszeichen, keine Vorrede, kein Punkt am Ende.`,
-				},
-				{
-					role: 'user',
-					content: input.text.slice(0, 4000), // cap context for speed
-				},
-			],
-			temperature: 0.5,
-			maxTokens: 32,
+			messages: [{ role: 'user', content: userMessage }],
+			temperature: 0.4,
+			maxTokens: 24,
 		});

-		// Defensive: strip surrounding quotes / markdown / trailing dots in
-		// case the model didn't fully respect the system prompt.
-		const cleaned = result.content
-			.trim()
-			.replace(/^["'`*_]+|["'`*_]+$/g, '')
-			.replace(/\.+$/, '')
-			.trim();
+		// Log the raw model output BEFORE cleanup so the next test
+		// session can show us exactly what Gemma is producing if it
+		// still misbehaves.
+		console.info('[generateTitle] raw LLM output:', JSON.stringify(result.content));
+
+		// Cleanup chain — but rolled back if any step would empty the
+		// result. We'd rather keep a slightly imperfect title (with
+		// quotes, with a trailing dot) than lose it entirely.
+		const trimmed = result.content.trim();
+		const stripFences = trimmed.split('\n')[0]?.trim() ?? trimmed; // first line only
+		const stripQuotes = stripFences.replace(/^["'`*_]+|["'`*_]+$/g, '').trim();
+		const stripDots = stripQuotes.replace(/\.+$/, '').trim();
+
+		// Walk the chain and pick the first non-empty stage. This way
+		// even if the model emits `"."` we still get something via the
+		// trimmed stage; only if EVERY stage is empty do we fall through
+		// to the rules implementation.
+		const cleaned = stripDots || stripQuotes || stripFences || trimmed;

-		// LLM produced nothing usable (empty content, only punctuation,
-		// only special tokens that got stripped, etc.) — fall back to the
-		// deterministic rules implementation so the user gets *something*.
-		// Without this fallback the watcher writes "" to memo.title and the
-		// user sees an empty placeholder forever.
 		if (!cleaned) {
-			console.info('[generateTitle] LLM returned empty after cleanup, falling back to rules');
+			console.info(
+				'[generateTitle] LLM returned empty after all cleanup stages, falling back to rules'
+			);
 			return rulesImpl(input);
 		}