mirror of
https://github.com/Memo-2023/mana-monorepo.git
synced 2026-05-14 22:21:10 +02:00
One focused dashboard covering the M1+M2 instrumentation in a single
view. Sections top-to-bottom:
1. Service Health: mana-mcp + mana-ai up/down, 1h deny rate,
compactions/h. The deny rate is the single most important
number during the POLICY_MODE=log-only soak: any non-zero
deny rate in log-only mode is real traffic that enforce mode
would reject.
2. Policy Gate (mana-mcp)
- Decisions / sec by outcome (allow/deny/flagged)
- Deny reasons breakdown — the soak signal for flipping to
enforce. If one reason dominates, address it before the flip.
- Tool invocations / sec by outcome (success / handler-error /
input-invalid)
- Top 10 invoked tools (24h) — usage heatmap for prioritising
which tools deserve the best policy-hint tuning.
- Handler p50/p95/p99 latency per tool.
3. Reminder Channel (mana-ai)
- Rate by producer (token-budget, retry-loop, compacted)
- Rate by severity. The interesting signal is whether
warn/escalate rates trend down over time, which would mean the
LLM is actually reacting to the hints. If warn stays flat, the
producer wording probably isn't landing.
4. Context Compactor (mana-ai)
- Triggers/h cumulative
- Turns folded per compaction (p50/p95). Values below 3 suggest
a MANA_AI_COMPACT_MAX_CTX misconfiguration: the threshold is
firing on already-short histories.
5. Mission Runner Baseline — tick duration + planner rounds for
correlation (e.g. "did enabling the compactor change mean
tick duration?").
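The soak checks described in sections 1, 2 and 4 can be sketched offline as below. This is only an illustration, not the dashboard's actual queries: the counter values, the deny-reason names, the 0.5 dominance share and the helper names are all hypothetical; only the outcome labels (allow/deny/flagged) and the "below 3 turns" rule come from the description above.

```python
from collections import Counter


def deny_rate(decisions: Counter) -> float:
    """Fraction of policy decisions whose outcome is 'deny' (0.0 with no traffic)."""
    total = sum(decisions.values())
    return decisions.get("deny", 0) / total if total else 0.0


def dominant_reason(reasons: Counter, share: float = 0.5):
    """Return (reason, fraction) if one deny reason exceeds `share` of all denies.

    The 0.5 default is an assumed threshold, not a value from the dashboard.
    """
    total = sum(reasons.values())
    if total == 0:
        return None
    reason, count = reasons.most_common(1)[0]
    fraction = count / total
    return (reason, fraction) if fraction > share else None


def short_fold_fraction(turns_folded, min_turns: int = 3) -> float:
    """Fraction of compactions that folded fewer than `min_turns` turns."""
    if not turns_folded:
        return 0.0
    short = sum(1 for turns in turns_folded if turns < min_turns)
    return short / len(turns_folded)


# Hypothetical one-hour sample taken during a log-only soak.
decisions = Counter(allow=940, deny=50, flagged=10)
reasons = Counter({"rate-limit": 40, "scope-missing": 10})  # made-up reason names
folds = [2, 5, 8, 1]

print(f"deny rate: {deny_rate(decisions):.1%}")
print(f"dominant deny reason: {dominant_reason(reasons)}")
print(f"short compactions: {short_fold_fraction(folds):.0%}")
```

A non-zero deny rate together with a dominant reason would point at the one thing to fix before flipping to enforce; a high share of short compactions would point at the compactor threshold instead.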
Dashboard provisioning already auto-loads anything in
/var/lib/grafana/dashboards
(docker/grafana/provisioning/dashboards/default.yml), so this is
live after the next Grafana restart. Dashboard UID: agent-loop.
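Before restarting Grafana, one way to confirm the new dashboard file carries the expected UID is to scan the provisioned directory for the top-level "uid" field of each JSON file. The sketch below is an assumption-laden illustration: the helper name `dashboard_uids` and the file name `agent-loop.json` are hypothetical, and the demo writes to a temporary directory rather than the real dashboards path.

```python
import json
import tempfile
from pathlib import Path


def dashboard_uids(directory):
    """Map each *.json file name in `directory` to its top-level "uid" (None if absent)."""
    uids = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            uids[path.name] = json.load(handle).get("uid")
    return uids


# Demo against a throwaway directory; the file name is a made-up example.
with tempfile.TemporaryDirectory() as tmp:
    stub = {"uid": "agent-loop", "title": "Agent Loop", "panels": []}
    (Path(tmp) / "agent-loop.json").write_text(json.dumps(stub), encoding="utf-8")
    found = dashboard_uids(tmp)

print(found)
```

Pointing `dashboard_uids` at the real dashboards directory would also surface duplicate or missing UIDs across files.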
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| Name |
|---|
| dashboards |
| provisioning |