From 8adef1b39c507a4dfe5994e5e0234f5f39e4967a Mon Sep 17 00:00:00 2001
From: Till JS
Date: Thu, 9 Apr 2026 16:29:22 +0200
Subject: [PATCH] fix(shared-llm): fall back to message.reasoning when content
 is empty
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Reasoning-style models (Gemma 4 E4B is the first one we use, but
DeepSeek R1, Gemini 2.5 thinking, etc. behave the same way) split
their output into two fields:

- message.content — the final answer
- message.reasoning — the chain-of-thought leading up to it

When the model is given too few max_tokens to finish reasoning AND
emit content, the response comes back with content="" and reasoning
populated with the half-finished thought. Verified empirically with
gemma4:e4b and `max_tokens: 10` on a "Sage Hi auf Deutsch in einem
Wort" ("say hi in German in one word") prompt — content was "" while
reasoning had "Here's a thinking process to..." (cut off mid-thought).

For the title task this rarely matters because the system prompt is
directive enough to skip the thinking phase (verified: the same
gemma4:e4b returns clean 7-token titles like "Sonnenstrahlen genießen
heute" ("enjoying sunshine today") with the standard system prompt +
max_tokens 32). But it's a real failure mode for any future task that
uses a less-directive prompt or hits a longer reasoning chain.

Defensive fix: prefer message.content first, fall back to
message.reasoning if content is empty. Because the empty case is
literally content="" (an empty string, not a missing field), the
fallback uses `||` rather than `??`, which would not fall through on
"". The fallback is a string-or-nothing operation, no semantic
interpretation — if the reasoning field happens to contain a usable
answer fragment, the caller's cleanup chain (e.g. generateTitleTask's
strip-quotes-and-dots pipeline) will normalize it. If it's a truly
half-finished thought, the caller's runRules fallback still kicks in
via the existing empty-result detection.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 packages/shared-llm/src/backends/remote.ts | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/packages/shared-llm/src/backends/remote.ts b/packages/shared-llm/src/backends/remote.ts
index d6f97dfe8..80385de78 100644
--- a/packages/shared-llm/src/backends/remote.ts
+++ b/packages/shared-llm/src/backends/remote.ts
@@ -104,7 +104,7 @@ export async function callManaLlmStreaming(
   let json: {
     choices?: Array<{
-      message?: { content?: string };
+      message?: { content?: string; reasoning?: string };
       text?: string;
     }>;
     usage?: { prompt_tokens?: number; completion_tokens?: number };
   };
@@ -116,8 +116,17 @@
     throw new BackendUnreachableError(tier, res.status, 'invalid JSON response');
   }
 
+  // Field ordering: prefer the canonical OpenAI `message.content` first.
+  // If that's empty AND `message.reasoning` is set, fall back to it —
+  // reasoning models like Gemma 4 emit their thought process there
+  // when given too few tokens to also produce a final answer (we hit
+  // this with max_tokens=10 / no system prompt: content was "" while
+  // reasoning had the half-finished thought). For our title task this
+  // rarely happens because the system prompt is directive, but the
+  // fallback is cheap and protects against future tasks that might
+  // trigger longer reasoning chains.
   const choice = json.choices?.[0];
-  const content = choice?.message?.content ?? choice?.text ?? '';
+  const content = choice?.message?.content || choice?.message?.reasoning || choice?.text || '';
 
   if (!content) {
     console.warn(`[shared-llm:${tier}] empty completion content`, { model, json });
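
Reviewer note (illustration only, not part of the patch): a minimal
TypeScript sketch of the fallback semantics, assuming the
OpenAI-compatible response shape from the diff above. `pickContent`
and the sample response are hypothetical names/data for this note;
the real code inlines the expression in callManaLlmStreaming.

    // Mirrors the patched line in remote.ts: `||` (not `??`) so an
    // explicit empty string "" falls through to the reasoning field.
    type Choice = {
      message?: { content?: string; reasoning?: string };
      text?: string;
    };

    function pickContent(choice: Choice | undefined): string {
      return (
        choice?.message?.content ||
        choice?.message?.reasoning ||
        choice?.text ||
        ''
      );
    }

    // A truncated reasoning-model response (max_tokens too low):
    const truncated: Choice = {
      message: { content: '', reasoning: "Here's a thinking process to..." },
    };
    console.log(pickContent(truncated)); // -> "Here's a thinking process to..."
    // With `??` the fallback would never fire here: ?? only falls
    // through on null/undefined, and content is the empty string "".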