managarten/services
Till JS 3046da3b19 feat(mana-llm): M3 — health-aware router with alias + chain fallback
Replaces the old Ollama→Google special-case auto-fallback with a unified
pipeline. The caller passes either a direct provider/model string or an
alias from the `mana/` namespace, and the router resolves it to a chain.
The router then walks the chain in order, skipping providers that
ProviderHealthCache (from M2) reports as unhealthy and trying each
remaining entry. On a retryable error it marks the provider unhealthy
and falls through to the next entry.
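
A minimal sketch of the chain walk described above, not the actual
ProviderRouter code. The alias table, the model names, the `HealthCache`
stand-in and the `route` / `resolve` helpers are all hypothetical; only
the control flow (resolve, skip unhealthy, try, mark, fall through) is
taken from this message.

```python
from dataclasses import dataclass, field
from typing import Callable


class RetryableError(Exception):
    """Stand-in for the retryable error family (hypothetical name)."""


class NoHealthyProviderError(Exception):
    """Simplified stand-in; only carries the attempt log."""

    def __init__(self, attempts: list[tuple[str, str]]):
        super().__init__(f"no healthy provider: {attempts}")
        self.attempts = attempts


@dataclass
class HealthCache:
    """Toy stand-in for ProviderHealthCache; the real interface may differ."""

    unhealthy: set[str] = field(default_factory=set)

    def is_healthy(self, provider: str) -> bool:
        return provider not in self.unhealthy

    def mark_unhealthy(self, provider: str) -> None:
        self.unhealthy.add(provider)

    def mark_healthy(self, provider: str) -> None:
        self.unhealthy.discard(provider)


# Hypothetical alias table; the real one is loaded from aliases.yaml.
ALIASES = {"mana/fast": ["ollama/model-a", "google/model-b"]}


def resolve(model: str) -> list[str]:
    """Aliases expand to a chain; direct provider/model strings bypass them."""
    if model.startswith("mana/"):
        return ALIASES[model]
    return [model]


def route(model: str, call: Callable[[str], str], cache: HealthCache) -> str:
    attempts: list[tuple[str, str]] = []
    for entry in resolve(model):
        provider = entry.split("/", 1)[0]
        if not cache.is_healthy(provider):
            attempts.append((entry, "skipped: unhealthy"))
            continue
        try:
            result = call(entry)
        except RetryableError as exc:
            # Retryable failure: poison the cache and try the next entry.
            cache.mark_unhealthy(provider)
            attempts.append((entry, f"retryable: {exc}"))
            continue
        cache.mark_healthy(provider)
        return result
    raise NoHealthyProviderError(attempts)
```

Any exception other than `RetryableError` leaves `route` untouched by the
cache, which matches the propagation rule described next.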

Retryable: ConnectError, ReadTimeout, RemoteProtocolError, 5xx,
ProviderRateLimitError. Propagated (don't fall back, don't poison the
cache): ProviderCapabilityError, ProviderAuthError, ProviderBlockedError,
4xx, unknown exception types. The cache stays "what the network told us
about this provider's liveness" — caller errors don't muddy that signal.
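
The retryable / propagated split above could be expressed as a single
predicate. This is a sketch under assumptions: the exception classes are
local stand-ins that only reuse the names from this message, and
`HTTPStatusError` with its `status_code` attribute is a hypothetical
wrapper, not the service's real type.

```python
# Local stand-ins so the sketch runs without the service or an HTTP client.
class ConnectError(Exception): ...
class ReadTimeout(Exception): ...
class RemoteProtocolError(Exception): ...
class ProviderRateLimitError(Exception): ...
class ProviderCapabilityError(Exception): ...
class ProviderAuthError(Exception): ...
class ProviderBlockedError(Exception): ...


class HTTPStatusError(Exception):
    """Hypothetical wrapper carrying the upstream HTTP status code."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


RETRYABLE_TYPES = (
    ConnectError,
    ReadTimeout,
    RemoteProtocolError,
    ProviderRateLimitError,
)


def is_retryable(exc: BaseException) -> bool:
    """True if the router should mark the provider unhealthy and fall back."""
    if isinstance(exc, RETRYABLE_TYPES):
        return True
    if isinstance(exc, HTTPStatusError):
        return 500 <= exc.status_code <= 599
    # Capability, auth, blocked, 4xx and unknown types all propagate.
    return False
```

Defaulting to "not retryable" means a new, unclassified exception type
surfaces to the caller instead of silently marking a provider unhealthy.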

Streaming: pre-first-byte fallback only. Once a chunk has been yielded
the provider is committed; mid-stream errors propagate as-is so we
don't splice two voices into one output.
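
A sketch of pre-first-byte fallback, using synchronous generators for
brevity (the service itself presumably streams asynchronously). The
names `stream_with_fallback` and `open_stream` are hypothetical, and the
final `RuntimeError` stands in for `NoHealthyProviderError`.

```python
from collections.abc import Callable, Iterator


class RetryableError(Exception):
    """Stand-in for the retryable error family (hypothetical name)."""


def stream_with_fallback(
    chain: list[str],
    open_stream: Callable[[str], Iterator[str]],
) -> Iterator[str]:
    attempts: list[tuple[str, str]] = []
    for entry in chain:
        try:
            stream = iter(open_stream(entry))
            first = next(stream)  # nothing has been yielded to the caller yet
        except StopIteration:
            # Empty stream: the provider answered, so commit without retry.
            return
        except RetryableError as exc:
            # Failed before the first chunk: safe to try the next entry.
            attempts.append((entry, str(exc)))
            continue
        # From here on the provider is committed; errors raised while
        # draining the rest of the stream propagate to the caller as-is.
        yield first
        yield from stream
        return
    raise RuntimeError(f"no healthy provider: {attempts}")
```

Pulling the first chunk inside the `try` is what separates "failed to
start" from "failed mid-stream": only the former is caught.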

`NoHealthyProviderError` (HTTP 503) carries a structured attempt log —
each chain entry shows up as `(model, reason)` so the cause of a 503
is visible in the response and metrics, not only in service logs.
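
One way such an error could carry its attempt log, as a sketch only. The
`Attempt` dataclass, the `to_response_body` helper and the JSON field
names are assumptions for illustration; the message above only states
that each entry appears as `(model, reason)` and that the status is 503.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    model: str   # chain entry, e.g. "provider/model"
    reason: str  # why this entry was skipped or failed


class NoHealthyProviderError(Exception):
    status_code = 503

    def __init__(self, attempts: list[Attempt]):
        self.attempts = list(attempts)
        summary = ", ".join(f"{a.model}: {a.reason}" for a in self.attempts)
        super().__init__(f"no healthy provider ({summary})")

    def to_response_body(self) -> dict:
        """Hypothetical response payload exposing the attempt log."""
        return {
            "error": "no_healthy_provider",
            "attempts": [
                {"model": a.model, "reason": a.reason} for a in self.attempts
            ],
        }
```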

main.py wires the lifespan: aliases.yaml is loaded, ProviderHealthCache
created, ProviderRouter takes both as constructor deps, HealthProbe
spawned with cheap HTTP probes per configured provider (Ollama
/api/tags, OpenAI-compat /v1/models with Bearer header). Google is
skipped — google-genai SDK has no obvious cheap probe; the call-site
fallback handles real errors.
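
The per-provider probes could look roughly like this. It is a sketch
using only the standard library; `build_probe`, `probe`, the `kind`
strings and the 2-second timeout are hypothetical, and HealthProbe's
real client and scheduling are not shown in this message. Only the two
paths and the Bearer header come from the text above.

```python
import urllib.error
import urllib.request
from typing import Optional


def build_probe(
    kind: str, base_url: str, api_key: Optional[str] = None
) -> Optional[urllib.request.Request]:
    """Return a cheap GET request for the provider, or None to skip probing."""
    base = base_url.rstrip("/")
    if kind == "ollama":
        return urllib.request.Request(f"{base}/api/tags", method="GET")
    if kind == "openai-compat":
        headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
        return urllib.request.Request(
            f"{base}/v1/models", headers=headers, method="GET"
        )
    # Anything else (Google in this commit) has no cheap probe configured.
    return None


def probe(request: urllib.request.Request, timeout: float = 2.0) -> bool:
    """True if the endpoint answered with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, TimeoutError, OSError):
        # Connection failures, timeouts and HTTP error statuses all count
        # as "not healthy" for probing purposes.
        return False
```

Usage would be along the lines of
`probe(build_probe("ollama", "http://localhost:11434"))`, with the base
URL taken from whatever the deployment configures.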

22 new router tests (test_router_fallback.py): chain walking, capability
& auth propagation, 5xx vs 4xx differentiation, rate-limit retry,
all-fail → NoHealthyProviderError, direct provider strings bypass
aliases, streaming pre-first-byte fallback, mid-stream-failure does
NOT fall back, empty stream commits without retry, cache feedback on
success/failure/non-retryable. Existing test_providers.py updated for
the new constructor signature; all 99 service tests green via the dev
container (Python 3.12).

Legacy purged: `_ollama_concurrent`, `_ollama_health_cache`,
`_can_fallback_to_google`, `_should_use_ollama`, `_fallback_to_google`,
`_get_ollama_health_cached` all gone. The `auto_fallback_enabled` /
`ollama_max_concurrent` settings remain in config.py for now (M5 will
remove them along with the per-feature env-var overrides).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 20:44:16 +02:00
mana-ai feat(shared-ai): route compactor to Haiku-tier model by default (M2.5) 2026-04-23 18:26:50 +02:00
mana-analytics refactor(shared-tailwind): rewrite themes.css to single-layer shadcn convention 2026-04-09 01:13:06 +02:00
mana-api-gateway chore: complete ManaCore → Mana rename (docs, go modules, plists, images) 2026-04-07 12:26:10 +02:00
mana-auth feat(auth): error-classification layer + passkey end-to-end 2026-04-24 01:52:51 +02:00
mana-crawler fix(mana-crawler): default DATABASE_URL to mana_platform in dev 2026-04-15 18:18:19 +02:00
mana-credits feat(credits): add 2-phase debit (reserve/commit/refund) 2026-04-17 14:41:41 +02:00
mana-events fix(events): Eventbrite provider — switch from dead API to web scraping 2026-04-18 16:51:58 +02:00
mana-geocoding test(geocoding): add unit tests + end-to-end smoke test script 2026-04-11 20:21:18 +02:00
mana-image-gen feat(mana-image-gen): replace Mac flux2.c implementation with Windows GPU diffusers 2026-04-08 13:02:42 +02:00
mana-landing-builder fix(tsconfig): unblock shared-types consumers 2026-04-21 18:53:55 +02:00
mana-llm feat(mana-llm): M3 — health-aware router with alias + chain fallback 2026-04-26 20:44:16 +02:00
mana-mail fix(broadcast): track route paths + shared-branding tsconfig 2026-04-21 18:30:47 +02:00
mana-mcp docs(mana-mcp,mana-ai): CLAUDE.md coverage for M1 agent-loop primitives 2026-04-23 14:25:14 +02:00
mana-media fix(mana-media): HEIC uploads from Chrome — sniff + transcode at the edge 2026-04-25 13:46:13 +02:00
mana-notify fix(mana-auth) + chore: rewrite /api/v1/auth/login JWT mint, remove Matrix stack 2026-04-08 16:32:13 +02:00
mana-persona-runner fix(personas): exact tool_use_id pairing + CI drift audit 2026-04-23 15:34:52 +02:00
mana-research test(mana-research): fixture-based tests for Gemini poll-response parser 2026-04-22 18:44:21 +02:00
mana-search chore(docker): drop obsolete services/mana-search/docker-compose.dev.yml 2026-04-23 15:27:19 +02:00
mana-stt chore(mac-mini): remove all AI service infrastructure (moved to Windows GPU) 2026-04-08 13:06:40 +02:00
mana-subscriptions chore(db): enforce pgSchema isolation with a lint script 2026-04-20 14:45:59 +02:00
mana-sync feat(backup): client-driven v2 snapshot export, drop server-side backup 2026-04-22 18:46:29 +02:00
mana-tts feat(profile): voice interview with pre-rendered TTS audio + Orpheus/Zonos backends 2026-04-17 15:22:52 +02:00
mana-user refactor(shared-tailwind): rewrite themes.css to single-layer shadcn convention 2026-04-09 01:13:06 +02:00
mana-video-gen chore(matrix): final scrub of stale matrix references 2026-04-08 16:47:54 +02:00
mana-voice-bot fix(mana-voice-bot): move default port 3050 → 3024 + Windows GPU deployment notes 2026-04-08 13:14:57 +02:00
news-ingester refactor(shared-rss): extract RSS parsing + Readability into one package 2026-04-15 22:30:44 +02:00