Mirror of https://github.com/Memo-2023/mana-monorepo.git (synced 2026-05-14 20:01:09 +02:00)
fix(shared-llm): fall back to message.reasoning when content is empty
Reasoning-style models (Gemma 4 E4B is the first one we use, but DeepSeek R1, Gemini 2.5 thinking, etc. behave the same way) split their output into two fields:

- message.content: the final answer
- message.reasoning: the chain-of-thought leading up to it

When the model is given too few max_tokens to finish reasoning AND emit content, the response comes back with content="" and reasoning populated with the half-finished thought. Verified empirically with gemma4:e4b and `max_tokens: 10` on a "Sage Hi auf Deutsch in einem Wort" prompt (German for "Say hi in German in one word"): content was "" while reasoning had "Here's a thinking process to..." (cut off mid-thought).

For the title task this rarely matters because the system prompt is directive enough to skip the thinking phase (verified: the same gemma4:e4b returns clean 7-token titles like "Sonnenstrahlen genießen heute", roughly "enjoying sunshine today", with the standard system prompt + max_tokens 32). But it's a real failure mode for any future task that uses a less-directive prompt or hits a longer reasoning chain.

Defensive fix: prefer message.content first, fall back to message.reasoning if content is empty. The fallback is a string-or-nothing operation, no semantic interpretation: if the reasoning field happens to contain a usable answer fragment, the caller's cleanup chain (e.g. generateTitleTask's strip-quotes-and-dots pipeline) will normalize it. If it's a truly half-finished thought, the caller's runRules fallback still kicks in via the existing empty-result detection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
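For concreteness, a sketch (TypeScript) of the failure-mode response described above; the field values paraphrase the gemma4:e4b experiment and the token counts are illustrative, not a verbatim capture:

// Sketch of the OpenAI-style chat response in the failure mode: the model
// spent its entire max_tokens budget on reasoning, so `content` came back
// empty. Values are paraphrased from the experiment above, not a real log.
const truncatedResponse = {
  choices: [
    {
      message: {
        content: '',                                   // final answer never emitted
        reasoning: "Here's a thinking process to...",  // cut off mid-thought
      },
    },
  ],
  usage: { prompt_tokens: 16, completion_tokens: 10 }, // counts illustrative
};

// Before this commit, extraction stopped at the empty string, so the
// caller saw "" and had to rely entirely on its empty-result fallback.
const oldContent = truncatedResponse.choices[0]?.message?.content ?? '';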
This commit is contained in:
parent
b2db42bb26
commit
8adef1b39c
1 changed file with 11 additions and 2 deletions
@@ -104,7 +104,7 @@ export async function callManaLlmStreaming(
   let json: {
     choices?: Array<{
-      message?: { content?: string };
+      message?: { content?: string; reasoning?: string };
       text?: string;
     }>;
     usage?: { prompt_tokens?: number; completion_tokens?: number };
   };
@@ -116,8 +116,17 @@ export async function callManaLlmStreaming(
     throw new BackendUnreachableError(tier, res.status, 'invalid JSON response');
   }

+  // Field ordering: prefer the canonical OpenAI `message.content` first.
+  // If that's empty AND `message.reasoning` is set, fall back to it —
+  // reasoning models like Gemma 4 emit their thought process there
+  // when given too few tokens to also produce a final answer (we hit
+  // this with max_tokens=10 / no system prompt: content was "" while
+  // reasoning had the half-finished thought). For our title task this
+  // rarely happens because the system prompt is directive, but the
+  // fallback is cheap and protects against future tasks that might
+  // trigger longer reasoning chains.
   const choice = json.choices?.[0];
-  const content = choice?.message?.content ?? choice?.text ?? '';
+  const content = choice?.message?.content || choice?.message?.reasoning || choice?.text || '';

   if (!content) {
     console.warn(`[shared-llm:${tier}] empty completion content`, { model, json });
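As a quick sanity check of the new extraction ordering (a sketch, not part of the commit; extractContent is a hypothetical helper that mirrors the new expression):

// Hypothetical helper mirroring the new extraction expression. `||` (not
// `??`) is what makes an empty-string content fall through to reasoning.
type Choice = { message?: { content?: string; reasoning?: string }; text?: string };

function extractContent(choice: Choice | undefined): string {
  return choice?.message?.content || choice?.message?.reasoning || choice?.text || '';
}

// Normal case: content wins, reasoning is ignored.
console.assert(extractContent({ message: { content: 'Hallo', reasoning: 'thinking...' } }) === 'Hallo');

// Truncated-reasoning case: content is "", reasoning carries the fragment.
console.assert(
  extractContent({ message: { content: '', reasoning: "Here's a thinking process" } }) ===
    "Here's a thinking process"
);

// Legacy completions shape: only `text` is set.
console.assert(extractContent({ text: 'Hallo' }) === 'Hallo');

// Nothing usable anywhere: empty string, so the caller's existing
// empty-result detection (runRules fallback) still kicks in.
console.assert(extractContent(undefined) === '');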