From 8a5d200c84b907dfae817a17cb4129e237a98b27 Mon Sep 17 00:00:00 2001
From: Till JS <tills95@gmail.com>
Date: Thu, 16 Apr 2026 00:55:18 +0200
Subject: [PATCH] =?UTF-8?q?fix(ai):=20bump=20planner=20maxTokens=201024?=
 =?UTF-8?q?=E2=86=924096=20+=20teach=20prompt=20about=20the=20loop?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The debug log from a "tag 4 notes" mission showed the planner's
second-round response truncated mid-step: it was proposing one
add_tag_to_note per listed note but ran out of tokens halfway through
note #2. The parser rejected the malformed JSON, so the loop exited
with 0 steps staged and the user saw nothing to approve.

Raising maxTokens to 4096 fits ~15-20 step objects, which covers the
batch-tagging / batch-save pattern the reasoning loop is designed for.
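
The "~15-20 step objects" figure can be sanity-checked with a rough
back-of-the-envelope sketch. The helper name and the per-step token
averages (200 and 270) are assumptions picked to match that range;
they are not measured values from the debug log.

```typescript
// Hypothetical helper: how many step objects fit into a completion budget.
// tokensPerStep is an assumed average size of one step (tool name, params,
// rationale); overheadTokens covers the JSON wrapper around the steps.
function stepsThatFit(
  maxTokens: number,
  tokensPerStep: number,
  overheadTokens = 0,
): number {
  if (tokensPerStep <= 0) {
    throw new RangeError('tokensPerStep must be positive');
  }
  return Math.max(0, Math.floor((maxTokens - overheadTokens) / tokensPerStep));
}

// Assumed averages: lean steps ~200 tokens, rich rationales ~270 tokens.
const newBudget = [stepsThatFit(4096, 270), stepsThatFit(4096, 200)]; // [15, 20]
const oldBudget = [stepsThatFit(1024, 270), stepsThatFit(1024, 200)]; // [3, 5]
```

Under the same assumptions the old 1024 budget tops out at 3-5 steps,
which lines up with the truncation seen on multi-step plans.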

Also update the system prompt so the planner knows about the loop it
runs inside: read-only tools are announced as auto-executing, with
their outputs visible in the next round, and a new rule makes explicit
that batch jobs must emit all write steps in one plan (because staging
a propose-tool ends the turn). The allowed step count per plan is
raised from 1-5 to 1-10.
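
The control flow the prompt now describes can be sketched roughly as
follows. Every name here (Step, PlanFn, runReasoningLoop, list_notes,
the params shape, the round limit) is hypothetical and only illustrates
the rule; it is not the actual loop implementation. Only
add_tag_to_note and the list_*/get_* prefixes come from the prompt.

```typescript
// Hypothetical types, for illustration only.
type Step = { tool: string; params: Record<string, unknown>; rationale: string };
type PlanFn = (intermediateResults: string[]) => Step[];
type LoopResult = { staged: Step[]; rounds: number };

// Assumption: read-only tools are recognisable by prefix, as in the
// prompt's "list_*, get_*" examples.
const isReadOnly = (tool: string): boolean =>
  tool.startsWith('list_') || tool.startsWith('get_');

function runReasoningLoop(
  plan: PlanFn,
  runReadOnly: (step: Step) => string,
  maxRounds = 5,
): LoopResult {
  const results: string[] = [];
  for (let round = 1; round <= maxRounds; round++) {
    const steps = plan(results);
    const writes = steps.filter((s) => !isReadOnly(s.tool));
    if (writes.length > 0) {
      // Staging write proposals ends the turn, so a batch job has to put
      // every write step into this one plan. Simplification: read-only
      // steps in a mixed plan are ignored here.
      return { staged: writes, rounds: round };
    }
    if (steps.length === 0) return { staged: [], rounds: round };
    // Read-only steps run automatically; their outputs feed the next round.
    for (const s of steps) results.push(runReadOnly(s));
  }
  return { staged: [], rounds: maxRounds };
}

// Usage: round 1 lists notes, round 2 proposes one tag call per note.
const demoPlan: PlanFn = (results) =>
  results.length === 0
    ? [{ tool: 'list_notes', params: {}, rationale: 'find the notes to tag' }]
    : ['n1', 'n2', 'n3', 'n4'].map((id) => ({
        tool: 'add_tag_to_note',
        params: { noteId: id, tag: 'demo' },
        rationale: `tag note ${id}`,
      }));

const demo = runReasoningLoop(demoPlan, () => 'n1,n2,n3,n4');
// demo.staged.length is 4 and demo.rounds is 2
```

If the planner emitted only the first add_tag_to_note in round 2, the
loop would still return after staging it, which is why the new rule
asks for the whole batch at once.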

Prompt snapshot tests still pass (they check structure, not text).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 apps/mana/apps/web/src/lib/llm-tasks/ai-plan.ts | 7 ++++++-
 packages/shared-ai/src/planner/prompt.ts        | 9 +++++----
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/apps/mana/apps/web/src/lib/llm-tasks/ai-plan.ts b/apps/mana/apps/web/src/lib/llm-tasks/ai-plan.ts
index f4fe7a390..ee7104fbc 100644
--- a/apps/mana/apps/web/src/lib/llm-tasks/ai-plan.ts
+++ b/apps/mana/apps/web/src/lib/llm-tasks/ai-plan.ts
@@ -45,7 +45,12 @@ export const aiPlanTask: LlmTask<AiPlanInput, AiPlanOutput> = {
 				{ role: 'user', content: user },
 			],
 			temperature: 0.3,
-			maxTokens: 1024,
+			// 1024 truncates mid-response when the planner proposes 3+ steps with
+			// rich rationales. The reasoning loop amplifies this because a
+			// single round can legitimately stage one step per listed item
+			// (e.g. 10 notes → 10 add_tag_to_note calls). 4096 fits ~15-20
+			// step objects while staying fast on the browser tier.
+			maxTokens: 4096,
 		});
 
 		// Always populate debug payload (cheap — strings already in memory).
diff --git a/packages/shared-ai/src/planner/prompt.ts b/packages/shared-ai/src/planner/prompt.ts
index 319a112d9..69939f3be 100644
--- a/packages/shared-ai/src/planner/prompt.ts
+++ b/packages/shared-ai/src/planner/prompt.ts
@@ -40,13 +40,14 @@ function buildSystemPrompt(input: AiPlanInput): string {
 
 	return `Du bist eine KI, die im Auftrag des Nutzers an einer langlebigen Mission arbeitet.
 
-Dein Job: aus dem aktuellen Mission-Kontext einen kurzen, konkreten Plan ableiten — 1 bis 5 Schritte, jeder ein Tool-Aufruf auf Nutzerdaten. Jeder Schritt MUSS eine Begründung haben (rationale), die der Nutzer in der Review-UI sieht.
+Dein Job: aus dem aktuellen Mission-Kontext einen konkreten Plan ableiten — 1 bis 10 Schritte, jeder ein Tool-Aufruf auf Nutzerdaten. Jeder Schritt MUSS eine Begründung haben (rationale), die der Nutzer in der Review-UI sieht.
 
 Wichtige Regeln:
 1. Nutze NUR Tools aus der Liste unten. Unbekannte Tools → Plan invalide.
-2. Jeder Step wird als Proposal gestaged — der Nutzer approved oder rejected. Du schreibst nie direkt.
-3. Berücksichtige das Feedback aus vorherigen Iterationen (unten im User-Prompt). Wenn ein Vorschlag rejected wurde, wiederhole ihn nicht ohne Änderung.
-4. Antworte AUSSCHLIESSLICH mit einem JSON-Block in folgendem Format, keine Prosa davor/danach:
+2. Read-only Tools (z.B. list_*, get_*) laufen automatisch — ihre Ausgabe siehst du in der nächsten Planungsrunde als "Zwischenergebnisse" und kannst dann darauf aufbauend schreibende Tools vorschlagen. Write-Tools (create_*, update_*, add_tag_to_note, etc.) werden dem Nutzer zur Approval vorgelegt.
+3. Wenn ein Batch-Job ansteht (z.B. "tagge alle Notizen"), gib alle Einzel-Calls in EINEM plan zurück — du kriegst nach propose-Tools keinen weiteren Turn.
+4. Berücksichtige das Feedback aus vorherigen Iterationen (unten im User-Prompt). Wenn ein Vorschlag rejected wurde, wiederhole ihn nicht ohne Änderung.
+5. Antworte AUSSCHLIESSLICH mit einem JSON-Block in folgendem Format, keine Prosa davor/danach:
 
 \`\`\`json
 {