managarten/services/mana-llm/pyproject.toml
Till JS 45063b88be feat(mana-llm): add Google Gemini fallback provider with auto-routing
Add Google Gemini as a fallback provider that activates automatically
when Ollama is overloaded or unavailable, ensuring LLM requests always
succeed even under load.

New provider (src/providers/google.py):
- Full LLMProvider implementation using google-genai SDK
- Chat completions (streaming + non-streaming)
- Vision/multimodal support (base64 images)
- Embeddings via text-embedding-004
- Model mapping: Ollama models → Gemini equivalents
  (gemma3:4b → gemini-2.0-flash, llava:7b → gemini-2.0-flash, etc.)

Auto-fallback routing (src/providers/router.py):
- Concurrent request tracking for Ollama (OLLAMA_MAX_CONCURRENT=3)
- When Ollama concurrent > max: route to Google automatically
- When Ollama fails: retry on Google with model mapping
- Health check caching (5s TTL) to avoid hammering Ollama
- Non-Ollama providers (openrouter, groq, together) are never fallback-routed
- Fallback info included in /health endpoint response

New config (src/config.py):
- GOOGLE_API_KEY: enables Google provider
- GOOGLE_DEFAULT_MODEL: default gemini-2.0-flash
- AUTO_FALLBACK_ENABLED: toggle fallback (default: true)
- OLLAMA_MAX_CONCURRENT: concurrent request threshold (default: 3)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 22:44:09 +01:00

39 lines
811 B
TOML

[project]
name = "mana-llm"
version = "0.1.0"
description = "Central LLM abstraction service for Ollama and OpenAI-compatible APIs"
requires-python = ">=3.11"
dependencies = [
"fastapi>=0.115.0",
"uvicorn[standard]>=0.32.0",
"pydantic>=2.10.0",
"pydantic-settings>=2.6.0",
"httpx>=0.28.0",
"sse-starlette>=2.2.0",
"redis>=5.2.0",
"prometheus-client>=0.21.0",
"google-genai>=1.0.0",
]
[project.optional-dependencies]
dev = [
"pytest>=8.3.0",
"pytest-asyncio>=0.24.0",
"pytest-httpx>=0.35.0",
"ruff>=0.8.0",
]
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.ruff]
line-length = 100
target-version = "py311"
[tool.ruff.lint]
select = ["E", "F", "I", "W"]
[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]