Mirror of https://github.com/Memo-2023/mana-monorepo.git, synced 2026-05-14 18:41:08 +02:00
feat(mana-ai): Prometheus /metrics endpoint + status.mana.how integration
Wires mana-ai into the existing observability stack so tick throughput,
plan-failure rates, planner latencies, and snapshot refresh health are
visible in Grafana + Prometheus, and the service's uptime surfaces on
status.mana.how under the "Internal" section.
- `src/metrics.ts` — prom-client Registry with `mana_ai_` prefix.
Counters: ticks_total, plans_produced_total, plans_written_back_total,
parse_failures_total, mission_errors_total, snapshots_new/updated,
snapshot_rows_applied_total, http_requests_total.
Histograms: tick_duration_seconds (0.1–120s),
planner_request_duration_seconds (0.25–60s),
http_request_duration_seconds (0.005–10s).
- `src/index.ts` — HTTP middleware labels every request by
method/path/status; `/metrics` serves the Prometheus text format.
- `src/cron/tick.ts` — increments counters + wraps the tick with
`tickDuration.startTimer()`. Snapshot stats fold through.
- `src/planner/client.ts` — wraps `complete()` in a latency histogram
timer so planner tail latency shows up separately from tick duration.
- `docker/prometheus/prometheus.yml` —
1. New `mana-ai` scrape job against `mana-ai:3066/metrics` (30s).
2. `/health` added to the `blackbox-internal` job so uptime shows on
status.mana.how alongside mana-geocoding.
- `scripts/generate-status-page.sh` — friendly label for the new probe:
`mana-ai:3066/health` → "Mana AI Runner" (generator already iterates
`blackbox-internal`, no other changes needed).
- `package.json` — prom-client ^15.1.3
All 17 Bun tests still pass; tsc clean.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
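
The `tickDuration.startTimer()` pattern mentioned for `src/cron/tick.ts` and `src/planner/client.ts` can be illustrated with a small dependency-free sketch. This is not the code from the commit, which uses prom-client; `SketchHistogram` and `timed` are hypothetical names, and the bucket bounds are left to the caller because the commit message only gives the overall ranges. It is meant to show what a histogram timer records and how the result looks in the Prometheus text format.

```typescript
// Hand-rolled stand-in for a prom-client Histogram (illustrative only).
class SketchHistogram {
  private readonly name: string;
  private readonly help: string;
  private readonly buckets: { le: number; count: number }[];
  private readonly now: () => number;
  private sum = 0;
  private total = 0;

  // `now` is an injectable millisecond clock so the timer can be tested.
  constructor(name: string, help: string, bounds: number[], now: () => number = () => Date.now()) {
    this.name = name;
    this.help = help;
    this.buckets = bounds.map((le) => ({ le, count: 0 }));
    this.now = now;
  }

  // Record one observation, in seconds.
  observe(seconds: number): void {
    this.sum += seconds;
    this.total += 1;
    // Buckets are cumulative: bump every bucket whose upper bound covers the value.
    for (const bucket of this.buckets) {
      if (seconds <= bucket.le) bucket.count += 1;
    }
  }

  // Start timing; calling the returned function records the elapsed seconds.
  startTimer(): () => number {
    const start = this.now();
    return () => {
      const seconds = (this.now() - start) / 1000;
      this.observe(seconds);
      return seconds;
    };
  }

  // Render the series in the Prometheus text exposition format.
  render(): string {
    const lines = [
      `# HELP ${this.name} ${this.help}`,
      `# TYPE ${this.name} histogram`,
      ...this.buckets.map((b) => `${this.name}_bucket{le="${b.le}"} ${b.count}`),
      `${this.name}_bucket{le="+Inf"} ${this.total}`,
      `${this.name}_sum ${this.sum}`,
      `${this.name}_count ${this.total}`,
    ];
    return lines.join("\n") + "\n";
  }
}

// Wrap any async unit of work (a tick, a planner call) in a duration timer.
async function timed<T>(histogram: SketchHistogram, work: () => Promise<T>): Promise<T> {
  const end = histogram.startTimer();
  try {
    return await work();
  } finally {
    end(); // record the duration even when the work throws
  }
}
```

Timing the planner call with its own histogram, separately from the tick, is what lets planner tail latency be read independently of the overall tick duration, as the commit message describes.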
parent 767b64cdd4
commit 0bf01f434e
9 changed files with 184 additions and 3 deletions
@@ -123,6 +123,15 @@ scrape_configs:
     metrics_path: '/metrics'
     scrape_interval: 30s
 
+  # Mana AI Service (Bun) — background Mission Runner for the AI Workbench.
+  # Exposes tick stats, planner-request latencies, snapshot refresh
+  # counters, and standard HTTP metrics at /metrics.
+  - job_name: 'mana-ai'
+    static_configs:
+      - targets: ['mana-ai:3066']
+    metrics_path: '/metrics'
+    scrape_interval: 30s
+
   # ============================================
   # GPU Server (Windows PC, LAN: 192.168.178.11)
   # ============================================
@@ -297,6 +306,8 @@ scrape_configs:
           # Upstream Pelias health, proxied through the wrapper so the
           # blackbox-exporter doesn't need host.docker.internal access.
           - http://mana-geocoding:3018/health/pelias
+          # mana-ai (Mission Runner) — internal-only, no CF tunnel.
+          - http://mana-ai:3066/health
     relabel_configs:
       - source_labels: [__address__]
         target_label: __param_target
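
The two endpoints this config targets, `/metrics` for the scrape job and `/health` for the blackbox probe, together with the per-request labelling by method/path/status described in the commit message, might look roughly like the following dependency-free sketch. The request/response types, `LabelledCounter`, `withMetrics`, and the `"ok"` health body are all assumptions made for illustration; the actual `src/index.ts` uses prom-client and its own HTTP layer, neither of which is shown in this diff.

```typescript
// Minimal request/response shapes so the sketch needs no framework (assumed types).
type SketchRequest = { method: string; path: string };
type SketchResponse = { status: number; body: string };
type Handler = (req: SketchRequest) => SketchResponse;

// Labelled counter: one time series per distinct label combination.
// Label values are not escaped here; a real exporter must escape quotes and backslashes.
class LabelledCounter {
  private readonly name: string;
  private readonly series = new Map<string, number>();

  constructor(name: string) {
    this.name = name;
  }

  inc(labels: Record<string, string>): void {
    const key = Object.entries(labels)
      .map(([k, v]) => `${k}="${v}"`)
      .join(",");
    this.series.set(key, (this.series.get(key) ?? 0) + 1);
  }

  // Render in the Prometheus text exposition format.
  render(): string {
    const lines = [`# TYPE ${this.name} counter`];
    this.series.forEach((value, key) => {
      lines.push(`${this.name}{${key}} ${value}`);
    });
    return lines.join("\n") + "\n";
  }
}

const httpRequests = new LabelledCounter("mana_ai_http_requests_total");

// Middleware: run the inner handler, then count the request by method/path/status.
function withMetrics(inner: Handler): Handler {
  return (req) => {
    const res = inner(req);
    httpRequests.inc({ method: req.method, path: req.path, status: String(res.status) });
    return res;
  };
}

// Router exposing the two paths that Prometheus and the blackbox exporter request.
const app: Handler = withMetrics((req) => {
  if (req.path === "/health") return { status: 200, body: "ok" }; // body is an assumption
  if (req.path === "/metrics") return { status: 200, body: httpRequests.render() };
  return { status: 404, body: "not found" };
});
```

One design point worth noting: labelling by raw path creates a new time series for every distinct path requested, so a production middleware would normally label by matched route rather than by the raw URL.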