managarten

till/managarten

mirror of https://github.com/Memo-2023/mana-monorepo.git synced 2026-05-14 20:21:09 +02:00

Commit graph

Author	SHA1	Message	Date

Till JS

3046da3b19

feat(mana-llm): M3 — health-aware router with alias + chain fallback

Replaces the old Ollama→Google special-case auto-fallback with a
unified pipeline. The caller passes either a direct provider/model or an
alias from the `mana/` namespace; the router resolves it to a chain and
walks that chain, skipping providers that ProviderHealthCache (from M2)
reports as unhealthy. Each remaining entry is tried in order; a
retryable error marks the provider unhealthy and falls through to the
next entry.
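
A minimal sketch of the alias resolution and chain walk described above. This is not the code from this commit: the `ALIASES` table, the model names, `HealthCache`, `RetryableError`, `resolve_chain` and `route` are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical alias table; the real aliases.yaml is not shown in the commit.
ALIASES = {
    "mana/fast": ["ollama/small-model", "groq/small-model"],
}


class RetryableError(Exception):
    """Stand-in for connect errors, timeouts, 5xx and rate limits."""


@dataclass
class HealthCache:
    """Simplified stand-in for ProviderHealthCache."""
    unhealthy: set = field(default_factory=set)

    def is_healthy(self, provider: str) -> bool:
        return provider not in self.unhealthy

    def mark_unhealthy(self, provider: str) -> None:
        self.unhealthy.add(provider)

    def mark_healthy(self, provider: str) -> None:
        self.unhealthy.discard(provider)


def resolve_chain(model: str) -> list:
    """Aliases expand to a chain; a direct provider/model is a chain of one."""
    if model.startswith("mana/"):
        return list(ALIASES[model])  # an unknown alias raises KeyError here
    return [model]


def route(model: str, call: Callable[[str], str], cache: HealthCache):
    """Walk the chain, skipping unhealthy providers and falling through
    on retryable errors. Returns the result plus the attempt log."""
    attempts = []
    for entry in resolve_chain(model):
        provider = entry.split("/", 1)[0]
        if not cache.is_healthy(provider):
            attempts.append((entry, "skipped: unhealthy"))
            continue
        try:
            result = call(entry)
        except RetryableError as exc:
            cache.mark_unhealthy(provider)
            attempts.append((entry, f"retryable: {exc}"))
            continue
        cache.mark_healthy(provider)
        return result, attempts
    raise RuntimeError(f"no healthy provider: {attempts}")
```

Any exception other than `RetryableError` leaves `route` untouched by the cache logic and propagates to the caller, which matches the propagation rule in the commit message.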

Retryable: ConnectError, ReadTimeout, RemoteProtocolError, 5xx,
ProviderRateLimitError. Propagated (don't fall back, don't poison the
cache): ProviderCapabilityError, ProviderAuthError, ProviderBlockedError,
4xx, unknown exception types. The cache stays "what the network told us
about this provider's liveness" — caller errors don't muddy that signal.
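
The retryable/propagated split could be expressed as a single predicate, sketched below. The exception classes are local stand-ins that only reuse the names from the commit message (in the real service the transport errors presumably come from the HTTP client), and `UpstreamHTTPError` with its `status_code` attribute is a hypothetical carrier for HTTP status codes.

```python
# Local stand-ins; only the names are taken from the commit message.
class ConnectError(Exception): ...
class ReadTimeout(Exception): ...
class RemoteProtocolError(Exception): ...
class ProviderRateLimitError(Exception): ...
class ProviderCapabilityError(Exception): ...
class ProviderAuthError(Exception): ...
class ProviderBlockedError(Exception): ...


class UpstreamHTTPError(Exception):
    """Hypothetical wrapper for a non-2xx upstream response."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


RETRYABLE_TYPES = (
    ConnectError,
    ReadTimeout,
    RemoteProtocolError,
    ProviderRateLimitError,
)


def is_retryable(exc: BaseException) -> bool:
    """True only for errors that say something about provider liveness."""
    if isinstance(exc, RETRYABLE_TYPES):
        return True
    if isinstance(exc, UpstreamHTTPError):
        return 500 <= exc.status_code < 600  # 5xx retry, 4xx propagate
    # Capability, auth, blocked and unknown exception types all propagate.
    return False
```

Defaulting to "not retryable" for unknown types keeps programming errors from marking a healthy provider as down.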

Streaming: pre-first-byte fallback only. Once a chunk has been yielded
the provider is committed; mid-stream errors propagate as-is so we
don't splice two voices into one output.
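
Pre-first-byte fallback can be sketched as an async generator that pulls the first chunk inside a `try` and everything after it outside. The names `stream_with_fallback`, `fake_provider`, `collect` and `RetryableError` are assumptions for this sketch, not identifiers from the commit.

```python
import asyncio
from typing import AsyncIterator, Callable, Sequence


class RetryableError(Exception):
    """Stand-in for the retryable error family."""


async def stream_with_fallback(
    chain: Sequence[str],
    open_stream: Callable[[str], AsyncIterator[str]],
) -> AsyncIterator[str]:
    for entry in chain:
        stream = open_stream(entry)
        try:
            # Only the first chunk is fetched under the fallback guard.
            first = await stream.__anext__()
        except StopAsyncIteration:
            return  # an empty stream commits without retry
        except RetryableError:
            continue  # nothing yielded yet, so the next entry may be tried
        yield first
        # Committed: errors from here on propagate to the caller as-is.
        async for chunk in stream:
            yield chunk
        return
    raise RuntimeError("every provider failed before its first chunk")


async def fake_provider(entry: str) -> AsyncIterator[str]:
    """Toy provider: one entry is down, one fails mid-stream."""
    if entry == "down/model":
        raise RetryableError("connect failed")
    yield "Hello"
    if entry == "flaky/model":
        raise RetryableError("connection dropped mid-stream")
    yield ", world"


async def collect(chain: Sequence[str]) -> list:
    return [c async for c in stream_with_fallback(chain, fake_provider)]
```

With this toy provider, `collect(["down/model", "ok/model"])` falls back before any output, while `collect(["flaky/model", "ok/model"])` raises after the first chunk instead of switching providers.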

`NoHealthyProviderError` (HTTP 503) carries a structured attempt log —
each chain entry shows up as `(model, reason)` so the cause of a 503
is visible in the response and metrics, not only in service logs.
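
One possible shape for an error that carries its attempt log is sketched here. Only the class name, the 503 status and the `(model, reason)` pairing come from the commit message; the `Attempt` dataclass, the `to_response_body` helper and the JSON field names are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Attempt:
    model: str
    reason: str


class NoHealthyProviderError(Exception):
    """Raised when every entry in the chain was skipped or failed."""

    status_code = 503

    def __init__(self, attempts: Iterable[Attempt]) -> None:
        self.attempts = list(attempts)
        summary = "; ".join(f"{a.model}: {a.reason}" for a in self.attempts)
        super().__init__(f"no healthy provider ({summary})")

    def to_response_body(self) -> dict:
        """Hypothetical JSON body so the cause is visible to the caller."""
        return {
            "error": "no_healthy_provider",
            "attempts": [
                {"model": a.model, "reason": a.reason} for a in self.attempts
            ],
        }
```

Keeping the attempts as structured data rather than a formatted string lets the same list feed both the HTTP response and per-reason metrics.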

main.py wires the lifespan: aliases.yaml is loaded, ProviderHealthCache
created, ProviderRouter takes both as constructor deps, HealthProbe
spawned with cheap HTTP probes per configured provider (Ollama
/api/tags, OpenAI-compat /v1/models with Bearer header). Google is
skipped — google-genai SDK has no obvious cheap probe; the call-site
fallback handles real errors.
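
The per-provider probe targets could be derived from configuration roughly as follows. The sketch only builds the request descriptions and sends nothing; `ProbeSpec`, `build_probes`, the config keys (`kind`, `base_url`, `api_key`) and the assumption that `base_url` excludes the `/v1` prefix are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProbeSpec:
    provider: str
    url: str
    headers: dict = field(default_factory=dict)


def build_probes(providers: dict) -> list:
    """Map each configured provider to a cheap GET it can be probed with."""
    specs = []
    for name, cfg in providers.items():
        kind = cfg["kind"]
        if kind == "google":
            # No cheap probe; real call errors are handled at the call site.
            continue
        base = cfg["base_url"].rstrip("/")
        if kind == "ollama":
            specs.append(ProbeSpec(name, f"{base}/api/tags"))
        elif kind == "openai_compat":
            headers = {"Authorization": f"Bearer {cfg['api_key']}"}
            specs.append(ProbeSpec(name, f"{base}/v1/models", headers))
    return specs
```

A background task could then issue each probe with a short timeout and write the result into the health cache.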

22 new router tests (test_router_fallback.py): chain walking, capability
& auth propagation, 5xx vs 4xx differentiation, rate-limit retry,
all-fail → NoHealthyProviderError, direct provider strings bypass
aliases, streaming pre-first-byte fallback, mid-stream-failure does
NOT fall back, empty stream commits without retry, cache feedback on
success/failure/non-retryable. Existing test_providers.py updated for
the new constructor signature; all 99 service tests green via the dev
container (Python 3.12).

Legacy purged: `_ollama_concurrent`, `_ollama_health_cache`,
`_can_fallback_to_google`, `_should_use_ollama`, `_fallback_to_google`,
`_get_ollama_health_cached` all gone. The `auto_fallback_enabled` /
`ollama_max_concurrent` settings remain in config.py for now (M5 will
remove them along with the per-feature env-var overrides).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-26 20:44:16 +02:00

Till-JS

1495dbe476

✨ feat(mana-llm): add central LLM abstraction service

Python/FastAPI service providing unified OpenAI-compatible API for
Ollama and cloud LLM providers (OpenRouter, Groq, Together).

Features:
- Chat completions with streaming (SSE)
- Vision/multimodal support
- Embeddings generation
- Multi-provider routing (provider/model format)
- Prometheus metrics
- Optional Redis caching
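
The provider/model format from the feature list can be parsed by splitting on the first slash only, since some upstream model identifiers contain slashes themselves. `split_model` and the example strings are illustrative, and rejecting a string without a provider prefix is an assumption (the service may instead apply a default provider).

```python
def split_model(spec: str) -> tuple:
    """Split 'provider/model' on the first '/' and validate both parts."""
    provider, sep, model = spec.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {spec!r}")
    return provider, model
```

For example, `split_model("openrouter/vendor/some-model")` yields the provider `openrouter` and the model `vendor/some-model`.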

2026-01-29 22:01:00 +01:00

2 commits