managarten

mirror of https://github.com/Memo-2023/mana-monorepo.git synced 2026-05-14 22:21:10 +02:00

Till JS 004b3b7fca chore(observability): Grafana dashboard for agent-loop metrics	2026-04-23 18:09:32 +02:00
dashboards	chore(observability): Grafana dashboard for agent-loop metrics	2026-04-23 18:09:32 +02:00
provisioning	feat(mana-ai): OpenTelemetry tracing + Grafana Tempo backend	2026-04-16 15:21:23 +02:00

Till JS 004b3b7fca chore(observability): Grafana dashboard for agent-loop metrics

One focused dashboard covering the M1+M2 instrumentation in a single
view. Sections top-to-bottom:

  1. Service Health — mana-mcp + mana-ai up/down, 1h deny rate,
     compactions/h. The deny rate is the single most important
     number during POLICY_MODE=log-only soak: a non-zero
     deny/min in log-only means real traffic that enforce mode
     would reject.

  2. Policy Gate (mana-mcp)
     - Decisions / sec by outcome (allow/deny/flagged)
     - Deny reasons breakdown — the soak signal for flipping to
       enforce. If one reason dominates, address it before the flip.
     - Tool invocations / sec by outcome (success / handler-error /
       input-invalid)
     - Top 10 invoked tools (24h) — usage heatmap for prioritising
       which tools deserve the best policy-hint tuning.
     - Handler p50/p95/p99 latency per tool.

  3. Reminder Channel (mana-ai)
     - Rate by producer (token-budget, retry-loop, compacted)
     - Rate by severity. The interesting signal is whether
       warn/escalate trend DOWN over time — it means the LLM is
       actually reacting to the hints. If warn stays flat, the
       producer wording probably isn't landing.

  4. Context Compactor (mana-ai)
     - Triggers/h cumulative
     - Turns folded per compaction (p50/p95). Values < 3 flag
       MANA_AI_COMPACT_MAX_CTX misconfig — the threshold is firing
       on already-short histories.

  5. Mission Runner Baseline — tick duration + planner rounds for
     correlation (e.g. "did enabling the compactor change mean
     tick duration?").
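
The soak signals described in sections 1, 2 and 4 can be sketched as plain
checks. This is a minimal Python illustration, not the dashboard's actual
queries: the function names, the input shapes, the 0.5 dominance threshold and
the example reason names are all assumptions. Only the "fewer than 3 turns
folded" cut-off and the reading of a non-zero deny rate in log-only mode come
from the commit message above.

```python
from collections import Counter
from statistics import median


def deny_rate_per_min(deny_count_start, deny_count_end, window_seconds):
    """Average denies per minute between two readings of an increasing counter."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    # Clamp at zero so a counter reset (e.g. a service restart) is not
    # reported as a negative rate.
    delta = max(deny_count_end - deny_count_start, 0.0)
    return delta / (window_seconds / 60.0)


def dominant_deny_reason(reason_counts, share_threshold=0.5):
    """Return the deny reason holding at least `share_threshold` of all denies, else None."""
    total = sum(reason_counts.values())
    if total == 0:
        return None
    reason, count = Counter(reason_counts).most_common(1)[0]
    return reason if count / total >= share_threshold else None


def compactor_looks_misconfigured(turns_folded, min_expected=3):
    """Flag a compactor that typically folds fewer than `min_expected` turns (median / p50)."""
    if not turns_folded:
        return False  # No compactions observed, so nothing to judge.
    return median(turns_folded) < min_expected
```

For example, `deny_rate_per_min(10, 40, 600)` describes 30 denies over ten
minutes; any value above zero during a log-only soak points at traffic that
enforce mode would reject. `dominant_deny_reason({"scope-missing": 8,
"rate-limit": 2})` uses hypothetical reason names to show one reason
dominating.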

Dashboard provisioning already auto-loads anything in
/var/lib/grafana/dashboards
(docker/grafana/provisioning/dashboards/default.yml), so this is live
after the next Grafana restart. Dashboard UID: agent-loop.
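
To confirm that a dashboard file with the expected UID is present in the
provisioned directory before restarting Grafana, a small check like the
following may help. It is a sketch under assumptions: it presumes each
dashboard is a standalone JSON file with a top-level `uid` field, and the
helper name `find_dashboard_uids` is hypothetical rather than part of this
repository.

```python
import json
from pathlib import Path


def find_dashboard_uids(directory):
    """Map each dashboard UID found in *.json files directly under `directory` to its path."""
    uids = {}
    for path in sorted(Path(directory).glob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            # Skip unreadable or malformed files rather than failing the whole scan.
            continue
        if isinstance(data, dict) and isinstance(data.get("uid"), str):
            uids[data["uid"]] = path
    return uids
```

A call such as `"agent-loop" in find_dashboard_uids("/var/lib/grafana/dashboards")`
would then report whether the new dashboard file is in place.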

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
