managarten/services/mana-ai
Till JS 1d3794f96c feat(mana-ai): Prometheus metrics for tool-calls, loop rounds, provider errors
Three new counters + one histogram fill the observability gap from
the function-calling migration:

- mana_ai_tool_calls_total{tool, policy, outcome} — one tick per
  tool_call the planner produced. `outcome` is `deferred` on the
  server (stub onToolCall records for later client execution);
  webapp runner will emit success/failure once it grows its own
  Prom surface.
- mana_ai_planner_rounds (histogram, buckets 1..5) — distribution of
  rounds consumed per iteration. Runs close to the cap signal a
  planner struggling with the mission objective.
- mana_ai_provider_errors_total{provider, kind} — structured errors
  surfaced from mana-llm. Kind mirrors the ProviderError hierarchy
  added in commit 1 of the migration (blocked/truncated/auth/
  rate_limit/capability/unknown).

Plumbing:
- llm-client.ts parses mana-llm's `{detail: {kind, message}}` 4xx/5xx
  body shape and re-throws as ProviderCallError carrying the kind.
- tick.ts observes metrics at the natural emission points — rounds
  + per-call counter after runPlannerLoop returns, provider_errors
  in the catch block.

Grafana dashboards + status.mana.how already pick up the
collectDefaultMetrics prefix, so these metrics land in the existing
mana-ai panel without scraper changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 20:48:29 +02:00
..
src feat(mana-ai): Prometheus metrics for tool-calls, loop rounds, provider errors 2026-04-20 20:48:29 +02:00
CLAUDE.md fix(goals): start GoalTracker on boot + surface AI proposals inline 2026-04-20 14:24:39 +02:00
Dockerfile fix(infra): include shared-logger in mana-ai + mana-auth Dockerfile installers 2026-04-15 14:34:08 +02:00
package.json feat(mana-ai): OpenTelemetry tracing + Grafana Tempo backend 2026-04-16 15:21:23 +02:00
tsconfig.json feat(mana-ai): scaffold server-side Mission Runner (v0.1) 2026-04-14 23:48:30 +02:00