managarten/services/mana-llm/tests
Till JS 8a49e3ffd5 feat(mana-llm): M4 — observability, debug endpoints, SIGHUP reload
- `X-Mana-LLM-Resolved: <provider>/<model>` header on non-streaming
  responses. Streaming clients read the same info from each chunk's
  `model` field, because SSE headers are sent before the fallback chain
  is walked and so cannot carry the resolved target.
- Three new Prometheus metrics:
  `mana_llm_alias_resolved_total{alias, target}` (which concrete model
  an alias resolved to, counted per request),
  `mana_llm_fallback_total{from_model, to_model, reason}` (each
  fallback transition), `mana_llm_provider_healthy{provider}` (gauge,
  mirrors the circuit-breaker state).
- New debug endpoints: `GET /v1/aliases` (registry inspection — chain
  + description per alias, useful for confirming SIGHUP reloads),
  `GET /v1/health` (full per-provider liveness snapshot — failure
  counter, last error, unhealthy-until backoff).
- `kill -HUP <pid>` reloads `aliases.yaml`. Parse errors leave the
  previous good state in memory and log the rejection.
- `ProviderHealthCache.add_listener()` for cache→metrics decoupling:
  the gauge is updated via a transition-only listener wired in main.py
  rather than the cache importing prometheus_client itself.
- Request-side metrics now use the requested model string; success-side
  metrics use the resolved one, so `mana_llm_llm_requests_total{provider="ollama",
  model="gemma3:12b"}` reflects actual upstream load even when callers
  requested an alias such as `mana/long-form`.
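
The reload behaviour described above (parse errors keep the previous good
state and log the rejection) can be sketched roughly as follows. This is a
minimal illustration, not the service's code: `ReloadableAliases`,
`parse_aliases` and `install_sighup_reload` are hypothetical names, the
alias table shape is assumed, and `json.loads` stands in for a YAML parser
so the sketch stays stdlib-only.

```python
import json
import logging
import signal
from typing import Callable, Dict, List

log = logging.getLogger("mana-llm.aliases")

# Assumed shape: alias -> ordered fallback chain of "<provider>/<model>".
Aliases = Dict[str, List[str]]


def parse_aliases(text: str) -> Aliases:
    """Parse and validate an alias table (JSON here as a stand-in for YAML)."""
    data = json.loads(text)
    if not isinstance(data, dict) or not all(
        isinstance(chain, list) for chain in data.values()
    ):
        raise ValueError("expected a mapping of alias -> list of models")
    return data


class ReloadableAliases:
    """Holds the last alias table that parsed successfully."""

    def __init__(
        self, load_text: Callable[[], str], parse: Callable[[str], Aliases]
    ) -> None:
        self._load_text = load_text
        self._parse = parse
        # At startup there is no previous state, so errors propagate.
        self.current: Aliases = parse(load_text())

    def reload(self) -> bool:
        """Swap in a freshly parsed table; keep the old one on any error."""
        try:
            candidate = self._parse(self._load_text())
        except Exception as exc:
            log.error("alias reload rejected, keeping previous state: %s", exc)
            return False
        # Single assignment: readers see either the old or the new table.
        self.current = candidate
        return True


def install_sighup_reload(registry: ReloadableAliases) -> None:
    """Reload on `kill -HUP <pid>` (Unix only; call from the main thread)."""
    signal.signal(signal.SIGHUP, lambda signum, frame: registry.reload())
```

Parsing into a separate `candidate` before assigning is what keeps a broken
file from ever replacing the in-memory table.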

16 new observability tests (test_m4_observability.py): listener
fire-on-transition semantics, exception isolation, multiple listeners,
counter increments, gauge writes, end-to-end alias→metric flow,
response shape of `/v1/aliases` and `/v1/health`, and `response.model`
carrying the resolved target after fallback. Total suite: 115/115 in 1.6s.
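
The listener semantics these tests cover (fire only on a health transition,
isolate listener exceptions, support several listeners) might look roughly
like the sketch below. `HealthCacheSketch`, `set_health` and the
`(provider, healthy)` listener signature are assumptions for illustration;
only the `add_listener()` name comes from the commit. Treating the first
observation of a provider as a transition is likewise an assumption.

```python
import logging
from typing import Callable, Dict, List

log = logging.getLogger("mana-llm.health")

# Assumed listener signature: (provider name, is healthy).
Listener = Callable[[str, bool], None]


class HealthCacheSketch:
    """Tracks per-provider health and notifies listeners on changes only."""

    def __init__(self) -> None:
        self._healthy: Dict[str, bool] = {}
        self._listeners: List[Listener] = []

    def add_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def set_health(self, provider: str, healthy: bool) -> None:
        previous = self._healthy.get(provider)  # None if never seen
        self._healthy[provider] = healthy
        if previous == healthy:
            return  # no transition, so no notification
        for listener in self._listeners:
            try:
                listener(provider, healthy)
            except Exception:
                # One failing listener must not block the others
                # or break the health update itself.
                log.exception("health listener failed; continuing")


# Usage: a plain dict stands in for the Prometheus gauge, so the cache
# itself never needs to import a metrics library.
gauge_values: Dict[str, int] = {}
cache = HealthCacheSketch()
cache.add_listener(lambda p, ok: gauge_values.__setitem__(p, 1 if ok else 0))
cache.set_health("ollama", False)
```

In the real service the listener would presumably write to the
`mana_llm_provider_healthy{provider}` gauge instead of a dict.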

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 20:52:28 +02:00
__init__.py feat(mana-llm): add central LLM abstraction service 2026-01-29 22:01:00 +01:00
test_aliases.py feat(mana-llm): M1 — AliasRegistry + aliases.yaml SSOT 2026-04-26 20:23:51 +02:00
test_api.py feat(mana-llm): add central LLM abstraction service 2026-01-29 22:01:00 +01:00
test_health.py feat(mana-llm): M2 — ProviderHealthCache + background probe loop 2026-04-26 20:29:57 +02:00
test_health_probe.py feat(mana-llm): M2 — ProviderHealthCache + background probe loop 2026-04-26 20:29:57 +02:00
test_m4_observability.py feat(mana-llm): M4 — observability, debug endpoints, SIGHUP reload 2026-04-26 20:52:28 +02:00
test_providers.py feat(mana-llm): M3 — health-aware router with alias + chain fallback 2026-04-26 20:44:16 +02:00
test_router_fallback.py feat(mana-llm): M3 — health-aware router with alias + chain fallback 2026-04-26 20:44:16 +02:00
test_streaming.py feat(mana-llm): add central LLM abstraction service 2026-01-29 22:01:00 +01:00