managarten/services/mana-mcp/src/metrics.ts
Till JS c94ab01c69 feat(mana-mcp): Prometheus metrics for policy gate + tool invocations
Replaces the stub /metrics endpoint with a real prom-client registry
(mana_mcp_ prefix, {service="mana-mcp"} default label). Default
process metrics come along for free.

Policy-gate telemetry is the whole point — without it we can't soak
POLICY_MODE=log-only safely or decide when to flip to enforce. New
counter mana_mcp_policy_decisions_total{decision, reason, mode} buckets
every evaluatePolicy() call:

  decision ∈ {allow, deny, flagged}
  reason   ∈ {admin-scope-not-invokable, destructive-not-allowed,
              rate-limit-exceeded, injection-marker, clean, unknown}
  mode     ∈ {log-only, enforce}

So the rate of "would have been denied" during soak is visible directly
as mana_mcp_policy_decisions_total{decision="deny", mode="log-only"}.
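
The label scheme above can be sketched at a call site roughly as follows. This is a minimal sketch, not the code from this commit: `PolicyResult`, `CounterLike`, `recordPolicyDecision`, and `shouldBlock` are hypothetical names, and the fallback to `clean` / `unknown` when no reason is set is an assumption. The exported prom-client counter should fit `CounterLike`, since `Counter#inc` accepts a label object.

```typescript
// Label values copied from the commit message above.
type Decision = 'allow' | 'deny' | 'flagged';
type Reason =
  | 'admin-scope-not-invokable'
  | 'destructive-not-allowed'
  | 'rate-limit-exceeded'
  | 'injection-marker'
  | 'clean'
  | 'unknown';
type Mode = 'log-only' | 'enforce';

interface PolicyLabels {
  decision: Decision;
  reason: Reason;
  mode: Mode;
}

// Hypothetical minimal counter shape, so the sketch runs without prom-client.
interface CounterLike<L> {
  inc(labels: L): void;
}

// Hypothetical result shape for evaluatePolicy(); the real type is not shown.
interface PolicyResult {
  decision: Decision;
  reason?: Reason;
}

// Records exactly one increment per policy evaluation and returns the labels used.
function recordPolicyDecision(
  counter: CounterLike<PolicyLabels>,
  result: PolicyResult,
  mode: Mode,
): PolicyLabels {
  // Assumption: a missing reason maps to `clean` for allows, `unknown` otherwise.
  const fallback: Reason = result.decision === 'allow' ? 'clean' : 'unknown';
  const labels: PolicyLabels = {
    decision: result.decision,
    reason: result.reason ?? fallback,
    mode,
  };
  counter.inc(labels);
  return labels;
}

// In log-only mode a deny is counted but the call still proceeds.
function shouldBlock(result: PolicyResult, mode: Mode): boolean {
  return mode === 'enforce' && result.decision === 'deny';
}
```

Recording the decision before consulting `shouldBlock` is what makes log-only denies show up in the counter even though nothing is blocked.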

Also:
  - mana_mcp_tool_invocations_total{tool, outcome} — success |
    handler-error | input-invalid. Policy denies are NOT counted here
    (they're in policy_decisions_total above); this counter only counts
    calls that actually reached the handler or tripped zod validation.
  - mana_mcp_tool_duration_seconds histogram per tool/outcome.
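
A wrapper that feeds both tool metrics could look roughly like the sketch below. Everything here is hypothetical (`instrumentTool`, `InvocationCounter`, `DurationHistogram`, and the `parse` callback standing in for zod validation); it only illustrates how one call yields one `outcome` for both the counter and the histogram. `Date.now()` is used for brevity, so resolution is limited to milliseconds.

```typescript
type Outcome = 'success' | 'handler-error' | 'input-invalid';

interface ToolLabels {
  tool: string;
  outcome: Outcome;
}

// Hypothetical minimal shapes; prom-client's Counter#inc(labels) and
// Histogram#observe(labels, value) should be compatible with them.
interface InvocationCounter {
  inc(labels: ToolLabels): void;
}
interface DurationHistogram {
  observe(labels: ToolLabels, seconds: number): void;
}

// Runs one tool call that has already passed the policy gate.
// `parse` stands in for schema validation and must throw on bad input.
async function instrumentTool<A, R>(
  tool: string,
  metrics: { invocations: InvocationCounter; duration: DurationHistogram },
  parse: (raw: unknown) => A,
  handler: (args: A) => Promise<R>,
  raw: unknown,
): Promise<R> {
  const startedMs = Date.now();
  let parsed = false;
  let outcome: Outcome = 'success';
  try {
    const args = parse(raw);
    parsed = true;
    return await handler(args);
  } catch (err) {
    // A throw before parsing finished means the arguments were malformed.
    outcome = parsed ? 'handler-error' : 'input-invalid';
    throw err;
  } finally {
    // Same label set for both metrics, recorded exactly once per call.
    const labels: ToolLabels = { tool, outcome };
    metrics.invocations.inc(labels);
    metrics.duration.observe(labels, (Date.now() - startedMs) / 1000);
  }
}
```

Policy denies never enter this wrapper, which matches the note above that they are counted only in the policy-decisions counter.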

Dep: prom-client ^15.1.3 (same version mana-ai pins).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 14:23:08 +02:00


/**
 * Prometheus metrics, exported on GET /metrics.
 *
 * Mirrors the shape of `services/mana-ai/src/metrics.ts` so Grafana and
 * status.mana.how recognise this service without special-casing. Metric
 * names use the `mana_mcp_*` prefix; labels stay low-cardinality on
 * purpose. `tool` has the most distinct values of any label, but it is
 * bounded by a fixed registry (~20 tools today), so it's acceptable.
 */
import { Counter, Histogram, Registry, collectDefaultMetrics } from 'prom-client';
export const register = new Registry();
register.setDefaultLabels({ service: 'mana-mcp' });
collectDefaultMetrics({ register, prefix: 'mana_mcp_' });
// ── Policy gate ──────────────────────────────────────────────
/**
 * One sample per `evaluatePolicy()` call.
 *
 * Labels:
 *   - `decision`: `allow` | `deny` | `flagged` (flagged = allow with a
 *                 reminder, e.g. freetext injection marker hit)
 *   - `reason`:   `admin-scope-not-invokable` | `destructive-not-allowed`
 *                 | `rate-limit-exceeded` | `injection-marker`
 *                 | `clean` (no reason applied; for dashboards)
 *                 | `unknown`
 *   - `mode`:     `log-only` | `enforce`; lets us diff how many
 *                 decisions WOULD block vs. actually blocked during soak
 */
export const policyDecisionsTotal = new Counter({
  name: 'mana_mcp_policy_decisions_total',
  help: 'Tool-policy gate decisions, bucketed by outcome and reason.',
  labelNames: ['decision', 'reason', 'mode'] as const,
  registers: [register],
});
// ── Tool invocations ─────────────────────────────────────────
/**
 * Every tool call that makes it past the policy gate lands here. `outcome`
 * is `success` | `handler-error` | `input-invalid` so dashboards can
 * differentiate "tool ran but failed" from "LLM sent malformed args".
 * Policy-denied calls are NOT counted here, since they never reach the
 * handler; they are visible under
 * `mana_mcp_policy_decisions_total{decision="deny"}`.
 */
export const toolInvocationsTotal = new Counter({
  name: 'mana_mcp_tool_invocations_total',
  help: 'Tool handler invocations (after policy gate).',
  labelNames: ['tool', 'outcome'] as const,
  registers: [register],
});
export const toolDuration = new Histogram({
  name: 'mana_mcp_tool_duration_seconds',
  help: 'Handler wall-clock latency per tool.',
  labelNames: ['tool', 'outcome'] as const,
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
  registers: [register],
});
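
Prometheus histogram buckets are cumulative: one observation increments every bucket whose upper bound (`le`) is at least the observed value, plus the implicit `+Inf` bucket. The sketch below illustrates that rule for the bucket list used by `toolDuration`; `bucketsIncrementedBy` is a hypothetical helper for illustration only and is not part of prom-client.

```typescript
// Bucket upper bounds in seconds, copied from the toolDuration definition.
const BUCKETS: number[] = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10];

// Returns the `le` label values whose cumulative counts increase when a
// single observation of `seconds` is recorded.
function bucketsIncrementedBy(seconds: number): string[] {
  const hit = BUCKETS.filter((upper) => seconds <= upper).map(String);
  // Every observation also lands in the implicit +Inf bucket.
  return [...hit, '+Inf'];
}

// A 3 s handler call touches only the two largest finite buckets.
console.log(bucketsIncrementedBy(3)); // [ '5', '10', '+Inf' ]

// Anything slower than 10 s is visible only in +Inf.
console.log(bucketsIncrementedBy(12)); // [ '+Inf' ]
```

One consequence of this layout is that quantile estimates derived from these buckets cannot resolve latencies above 10 seconds.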