feat(mana-llm): add Google Gemini fallback provider with auto-routing

Add Google Gemini as a fallback provider that activates automatically
when Ollama is overloaded or unavailable, ensuring LLM requests always
succeed even under load.

New provider (src/providers/google.py):
- Full LLMProvider implementation using google-genai SDK
- Chat completions (streaming + non-streaming)
- Vision/multimodal support (base64 images)
- Embeddings via text-embedding-004
- Model mapping: Ollama models → Gemini equivalents
  (gemma3:4b → gemini-2.0-flash, llava:7b → gemini-2.0-flash, etc.)
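The model mapping above can be sketched as a simple lookup with a default; the dict and function names here are assumptions for illustration, not the actual contents of src/providers/google.py:

```python
# Hypothetical sketch of the Ollama → Gemini model mapping described in the
# commit message. Only the two pairs named above are taken from the source;
# the fallback default mirrors GOOGLE_DEFAULT_MODEL.
OLLAMA_TO_GEMINI = {
    "gemma3:4b": "gemini-2.0-flash",
    "llava:7b": "gemini-2.0-flash",
}

def map_model(ollama_model: str, default: str = "gemini-2.0-flash") -> str:
    """Return the Gemini equivalent for an Ollama model name."""
    return OLLAMA_TO_GEMINI.get(ollama_model, default)
```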

Auto-fallback routing (src/providers/router.py):
- Concurrent request tracking for Ollama (OLLAMA_MAX_CONCURRENT=3)
- When Ollama's concurrent request count exceeds the max: route to Google automatically
- When Ollama fails: retry on Google with model mapping
- Health check caching (5s TTL) to avoid hammering Ollama
- Non-Ollama providers (openrouter, groq, together) are never fallback-routed
- Fallback info included in /health endpoint response
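The routing rules above (count in-flight Ollama requests, divert to Google past the threshold) can be sketched as below; the class and method names are assumptions, not the actual router.py API:

```python
import threading

class FallbackRouter:
    """Minimal sketch of the auto-fallback routing described above:
    track Ollama's in-flight requests and route to Google once the
    OLLAMA_MAX_CONCURRENT threshold is reached. Names are hypothetical."""

    def __init__(self, max_concurrent: int = 3, fallback_enabled: bool = True):
        self.max_concurrent = max_concurrent
        self.fallback_enabled = fallback_enabled
        self._in_flight = 0
        self._lock = threading.Lock()

    def acquire(self) -> str:
        """Choose a provider for the next request; count Ollama requests."""
        with self._lock:
            if self.fallback_enabled and self._in_flight >= self.max_concurrent:
                return "google"  # Ollama saturated: divert to fallback
            self._in_flight += 1
            return "ollama"

    def release(self, provider: str) -> None:
        """Mark an Ollama request as finished."""
        if provider == "ollama":
            with self._lock:
                self._in_flight -= 1
```

The retry-on-failure path (re-issuing a failed Ollama request against Google with the model mapping applied) would sit in the caller around `acquire`/`release`.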

New config (src/config.py):
- GOOGLE_API_KEY: enables Google provider
- GOOGLE_DEFAULT_MODEL: default gemini-2.0-flash
- AUTO_FALLBACK_ENABLED: toggle fallback (default: true)
- OLLAMA_MAX_CONCURRENT: concurrent request threshold (default: 3)
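The four settings above could be read from the environment roughly as follows; the function name and plain-dict shape are assumptions (the actual src/config.py may use a settings class):

```python
import os

def load_fallback_config() -> dict:
    """Sketch of loading the new settings. Variable names and defaults
    come from the commit message; the parsing logic is an assumption."""
    return {
        "google_api_key": os.getenv("GOOGLE_API_KEY", ""),
        "google_default_model": os.getenv("GOOGLE_DEFAULT_MODEL", "gemini-2.0-flash"),
        "auto_fallback_enabled": os.getenv("AUTO_FALLBACK_ENABLED", "true").lower() == "true",
        "ollama_max_concurrent": int(os.getenv("OLLAMA_MAX_CONCURRENT", "3")),
    }
```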

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Author: Till JS
Date: 2026-03-23 22:44:09 +01:00
Commit: 45063b88be (parent 28286d126c)
5 changed files with 430 additions and 19 deletions


@@ -1779,6 +1779,10 @@ services:
       OPENROUTER_API_KEY: ${OPENROUTER_API_KEY:-}
       GROQ_API_KEY: ${GROQ_API_KEY:-}
       TOGETHER_API_KEY: ${TOGETHER_API_KEY:-}
+      GOOGLE_API_KEY: ${GOOGLE_API_KEY:-}
+      GOOGLE_DEFAULT_MODEL: gemini-2.0-flash
+      AUTO_FALLBACK_ENABLED: "true"
+      OLLAMA_MAX_CONCURRENT: 3
       CORS_ORIGINS: https://playground.mana.how,https://mana.how,https://chat.mana.how
     extra_hosts:
       - "host.docker.internal:host-gateway"