feat(shared-ai): route compactor to Haiku-tier model by default (M2.5)

compactHistory() now defaults to DEFAULT_COMPACT_MODEL =
'google/gemini-2.5-flash-lite' when the caller doesn't override. Lite
is ~3–5x cheaper than gemini-2.5-flash with near-identical
summarisation quality — summarisation doesn't need the same tier as
reasoning + tool-calling, and the compactor fires precisely when token
spend is highest, so the cheaper route saves where it matters most.

CompactHistoryOptions.model is now optional. All three consumers
(mana-ai tick, webapp Companion, webapp Mission runner) drop their
explicit gemini-2.5-flash override and let the default apply.
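The defaulting contract can be sketched as follows. Names match the
commit (DEFAULT_COMPACT_MODEL, CompactHistoryOptions); the option shape
and the resolveCompactModel helper are illustrative assumptions, not the
actual shared-ai code:

```typescript
// Sketch of the M2.5 default, assuming this option shape.
export const DEFAULT_COMPACT_MODEL = 'google/gemini-2.5-flash-lite';

export interface CompactHistoryOptions {
  llm: (prompt: string, model: string) => Promise<string>;
  // Optional since M2.5; callers that omit it get the cheaper Lite tier.
  model?: string;
}

// Hypothetical helper showing the resolution rule: caller override wins,
// otherwise fall back to DEFAULT_COMPACT_MODEL.
export function resolveCompactModel(opts: CompactHistoryOptions): string {
  return opts.model ?? DEFAULT_COMPACT_MODEL;
}
```

This is the whole contract the two new tests pin down: omit `model` and
the Lite default applies; pass `model` and the override is honoured.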

This is the pragmatic M2.5: no mana-llm changes. The "tier" abstraction
(X-Model-Tier header, env-routed aliases) from the Claude-Code report
makes sense only once multiple utility tasks need cheaper routing —
topic-detection, classification, command-injection checks. Today only
the compactor wants it, and a model constant is the simplest contract
that works.

Two new tests (default applied + override honoured). 79 shared-ai tests
green, all three consumers type-check clean. One pre-existing unrelated
type error in apps/mana/apps/web/src/lib/modules/wardrobe/queries.ts
(not touched by this commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Till JS 2026-04-23 18:26:50 +02:00
parent 2769241de3
commit f7536bc0b9
7 changed files with 83 additions and 16 deletions

@@ -396,15 +396,19 @@ async function planOneMission(
   const plannerModel = 'google/gemini-2.5-flash';
   // Claude-Code wU2 pattern: fold the middle of messages into a structured
-  // summary once cumulative tokens cross 92% of maxContextTokens. Uses
-  // the same LLM + model as the planner itself; later we can route this
-  // to a cheaper model (Haiku tier) when mana-llm supports it.
+  // summary once cumulative tokens cross 92% of maxContextTokens.
+  //
+  // compactHistory defaults to DEFAULT_COMPACT_MODEL
+  // (gemini-2.5-flash-lite) — cheaper than the planner's own model.
+  // Summarisation doesn't need the same reasoning tier as tool-calling,
+  // and the compactor runs exactly when token spend is highest, so the
+  // cheaper route saves tokens where they matter.
   const compactor =
     config.compactMaxContextTokens > 0
       ? {
           maxContextTokens: config.compactMaxContextTokens,
           compact: async (msgs: Parameters<typeof compactHistory>[0]) => {
-            const result = await compactHistory(msgs, { llm, model: plannerModel });
+            const result = await compactHistory(msgs, { llm });
             if (result.compactedTurns > 0) {
               compactionsTriggeredTotal.inc();
               compactedTurnsHistogram.observe(result.compactedTurns);