Debug log from a "tag 4 notes" mission showed the planner's second-round
response truncated mid-step: it was proposing one add_tag_to_note per
listed note but ran out of tokens halfway through note #2. Parser
rejected the malformed JSON → loop exited with 0 staged, user saw
nothing to approve.
Raising maxTokens to 4096 fits ~15-20 step objects, which covers the
batch-tagging / batch-save pattern the reasoning loop is designed for.
Also updating the system prompt so the planner actually knows about
the loop it's running inside: read-only tools are announced as
auto-executing with outputs visible next turn, and a new rule makes
explicit that batch jobs must emit all write-steps in one plan (because
staging a propose-tool ends the turn). Step count raised 1-5 → 1-10.
Prompt snapshot tests still pass (they check structure, not text).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>