managarten/services/mana-llm
Till JS 59557e62d7 feat(mana-llm): M2 — ProviderHealthCache + background probe loop
Per-provider liveness with circuit-breaker semantics. The router (M3)
will read `is_healthy()` to skip dead providers in a chain; the probe
loop and the call-site fallback handler write state via
`mark_healthy` / `mark_unhealthy`.

State machine: 1st failure stays healthy (transient blips happen);
2nd consecutive failure trips the breaker and sets a 60s backoff
window during which `is_healthy → False`. After the window the
provider is half-open again — next call exercises it, success
resets, failure re-arms.

HealthProbe is the background asyncio.Task that pings every
registered provider every 30s with a 3s timeout. Probes run
concurrently per tick and one bad probe can't sink the loop. Probe
functions are injected (`{name: async-fn}`) so this module stays
decoupled from the provider classes — the wiring lives in main.py
where we already know which providers are configured.

32 new tests (FakeClock for deterministic backoff timing, slow-probe
helpers for parallelism + timeout, lifecycle tests for start/stop
idempotency and tick-after-error survival). 64/64 alias+health tests
green.

Not yet wired into the request path — that's M3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 20:29:57 +02:00
..
src feat(mana-llm): M2 — ProviderHealthCache + background probe loop 2026-04-26 20:29:57 +02:00
tests feat(mana-llm): M2 — ProviderHealthCache + background probe loop 2026-04-26 20:29:57 +02:00
.env.example chore(ai-services): adopt Windows GPU as source of truth for llm/stt/tts 2026-04-08 12:46:03 +02:00
.gitignore feat(mana-llm): add central LLM abstraction service 2026-01-29 22:01:00 +01:00
aliases.yaml feat(mana-llm): M1 — AliasRegistry + aliases.yaml SSOT 2026-04-26 20:23:51 +02:00
CLAUDE.md chore(matrix): final scrub of stale matrix references 2026-04-08 16:47:54 +02:00
docker-compose.dev.yml feat(mana-llm): add central LLM abstraction service 2026-01-29 22:01:00 +01:00
docker-compose.yml chore(mana-llm): thread GOOGLE_API_KEY + default model into local compose 2026-04-20 20:42:21 +02:00
Dockerfile feat(mana-llm): add central LLM abstraction service 2026-01-29 22:01:00 +01:00
pyproject.toml feat(mana-llm): M1 — AliasRegistry + aliases.yaml SSOT 2026-04-26 20:23:51 +02:00
requirements.txt feat(mana-llm): M1 — AliasRegistry + aliases.yaml SSOT 2026-04-26 20:23:51 +02:00
service.pyw chore(ai-services): adopt Windows GPU as source of truth for llm/stt/tts 2026-04-08 12:46:03 +02:00
start.sh feat(llm-playground): add model comparison feature 2026-01-31 23:30:16 +01:00