feat(shared-ai): route compactor to Haiku-tier model by default (M2.5)

compactHistory() now defaults to DEFAULT_COMPACT_MODEL =
'google/gemini-2.5-flash-lite' when the caller doesn't override. Lite
is ~3–5x cheaper than gemini-2.5-flash with near-identical
summarisation quality — summarisation doesn't need the same tier as
reasoning + tool-calling, and the compactor fires exactly when token
spend is highest, so the cheaper route saves exactly where it matters.

CompactHistoryOptions.model is now optional. All three consumers
(mana-ai tick, webapp Companion, webapp Mission runner) drop their
explicit gemini-2.5-flash override and let the default apply.
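The new contract can be sketched roughly as follows. `DEFAULT_COMPACT_MODEL`, `CompactHistoryOptions`, and the model strings are taken from this commit; the option shape and the fallback helper are assumptions for illustration, not the real shared-ai implementation:

```typescript
// Hypothetical sketch of the shared-ai default-model contract.
// DEFAULT_COMPACT_MODEL and the optional `model` field mirror the
// commit description; everything else here is assumed.
const DEFAULT_COMPACT_MODEL = 'google/gemini-2.5-flash-lite';

interface CompactHistoryOptions {
  llm: unknown;   // LlmClient in the real codebase
  model?: string; // now optional; falls back to the default below
}

function resolveCompactModel(opts: CompactHistoryOptions): string {
  // A caller override wins; otherwise route to the cheap tier.
  return opts.model ?? DEFAULT_COMPACT_MODEL;
}

console.log(resolveCompactModel({ llm: null }));
// → 'google/gemini-2.5-flash-lite'
console.log(resolveCompactModel({ llm: null, model: 'google/gemini-2.5-flash' }));
// → 'google/gemini-2.5-flash'
```

The `??` fallback is the whole change from a consumer's point of view: dropping the explicit `model:` line is enough to pick up the cheaper route.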

This is the pragmatic M2.5: no mana-llm changes. The "tier" abstraction
(X-Model-Tier header, env-routed aliases) from the Claude-Code report
makes sense only once multiple utility tasks need cheaper routing —
topic-detection, classification, command-injection checks. Today only
the compactor wants it, and a model constant is the simplest contract
that works.
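For reference, the deferred tier design might look roughly like this. Only the `X-Model-Tier` header name comes from the report cited above; the tier names, alias map, and resolver are assumptions, and none of this ships in this commit:

```typescript
// Hypothetical sketch of the deferred "tier" abstraction — NOT part of
// this commit. 'X-Model-Tier' is named in the commit message; the tier
// names and env-routed alias map are assumed for illustration.
type ModelTier = 'utility' | 'primary';

const TIER_ALIASES: Record<ModelTier, string> = {
  utility: 'google/gemini-2.5-flash-lite', // summarisation, classification
  primary: 'google/gemini-2.5-flash',      // reasoning + tool-calling
};

function tierHeaders(tier: ModelTier): Record<string, string> {
  // mana-llm would read this header and resolve it to an alias.
  return { 'X-Model-Tier': tier };
}

function resolveTier(tier: ModelTier): string {
  return TIER_ALIASES[tier];
}
```

With one consumer, this indirection buys nothing over a model constant; it pays off only once several utility tasks share the cheap route.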

2 new tests (default applied + override honoured). 79 shared-ai tests
green, all three consumers type-check clean. One pre-existing unrelated
type error in apps/mana/apps/web/src/lib/modules/wardrobe/queries.ts
(not touched by this commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Till JS 2026-04-23 18:26:50 +02:00
parent 2769241de3
commit f7536bc0b9
7 changed files with 83 additions and 16 deletions


@@ -281,16 +281,15 @@ async function runMissionInner(
   // prior steps in the same round.
   isParallelSafe: (name) => AI_TOOL_CATALOG_BY_NAME.get(name)?.defaultPolicy === 'auto',
   // Fold older turns into a compact-summary at 92% of
-  // maxContextTokens. Same LlmClient + model as the
-  // planner; one extra LLM call, but only when usage
-  // actually approaches the ceiling.
+  // maxContextTokens. compactHistory defaults to
+  // DEFAULT_COMPACT_MODEL (gemini-2.5-flash-lite) —
+  // cheaper than the planner's primary model, which
+  // matters because the compactor fires exactly when
+  // token spend is highest.
   compactor: {
     maxContextTokens: COMPACT_MAX_CTX,
     compact: async (msgs) => {
-      const res = await compactHistory(msgs, {
-        llm: deps.llm,
-        model: deps.model ?? 'google/gemini-2.5-flash',
-      });
+      const res = await compactHistory(msgs, { llm: deps.llm });
       return { messages: res.messages, compactedTurns: res.compactedTurns };
     },
   },


@@ -123,12 +123,14 @@ export async function runCompanionChat(
   // user-visible intent order in the proposal inbox.
   isParallelSafe: (name) => AI_TOOL_CATALOG_BY_NAME.get(name)?.defaultPolicy === 'auto',
   // Fold the middle of messages into a compact-summary at
-  // 92% of the model's context window. Mirrors the mana-ai
-  // wiring; one call to the same LLM client, same model.
+  // 92% of the model's context window. compactHistory
+  // defaults to DEFAULT_COMPACT_MODEL (gemini-2.5-flash-lite)
+  // — cheaper than the planner's own model. Summarisation
+  // doesn't need the same tier as reasoning.
   compactor: {
     maxContextTokens: COMPACT_MAX_CTX,
     compact: async (msgs) => {
-      const res = await compactHistory(msgs, { llm, model: 'google/gemini-2.5-flash' });
+      const res = await compactHistory(msgs, { llm });
       return { messages: res.messages, compactedTurns: res.compactedTurns };
     },
   },