Commit graph

6 commits

Author SHA1 Message Date
Till JS
45063b88be feat(mana-llm): add Google Gemini fallback provider with auto-routing
Add Google Gemini as a fallback provider that activates automatically
when Ollama is overloaded or unavailable, so that LLM requests can still
be served under load.

New provider (src/providers/google.py):
- Full LLMProvider implementation using google-genai SDK
- Chat completions (streaming + non-streaming)
- Vision/multimodal support (base64 images)
- Embeddings via text-embedding-004
- Model mapping: Ollama models → Gemini equivalents
  (gemma3:4b → gemini-2.0-flash, llava:7b → gemini-2.0-flash, etc.)
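The model-mapping step amounts to a lookup table with a default. Below is a minimal sketch that assumes a plain dict. Only the `gemma3:4b` and `llava:7b` entries come from this commit. `OLLAMA_TO_GEMINI`, `map_to_gemini`, and the stripping of an `ollama/` prefix are hypothetical names and behavior.

```python
# Hypothetical mapping table: only these two entries are stated in the commit;
# the rest of the real table in src/providers/google.py is not shown.
OLLAMA_TO_GEMINI: dict[str, str] = {
    "gemma3:4b": "gemini-2.0-flash",
    "llava:7b": "gemini-2.0-flash",
}

# Mirrors the documented default of GOOGLE_DEFAULT_MODEL.
DEFAULT_GEMINI_MODEL = "gemini-2.0-flash"


def map_to_gemini(ollama_model: str, default: str = DEFAULT_GEMINI_MODEL) -> str:
    """Return the Gemini model to use when an Ollama request is rerouted."""
    # Assumption: accept both "gemma3:4b" and "ollama/gemma3:4b" (provider/model form).
    name = ollama_model.removeprefix("ollama/")
    # Unmapped models fall back to the default Gemini model.
    return OLLAMA_TO_GEMINI.get(name, default)
```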

Auto-fallback routing (src/providers/router.py):
- Concurrent request tracking for Ollama (OLLAMA_MAX_CONCURRENT=3)
- When concurrent Ollama requests exceed OLLAMA_MAX_CONCURRENT: route to Google automatically
- When Ollama fails: retry on Google with model mapping
- Health check caching (5s TTL) to avoid hammering Ollama
- Non-Ollama providers (openrouter, groq, together) are never fallback-routed
- Fallback info included in /health endpoint response
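The routing rules above can be sketched as a small router with a TTL health cache. This is a simplified, synchronous illustration, not the code in src/providers/router.py. `HealthCache`, `FallbackRouter`, the `call(provider, model)` callback, and the exact overload boundary are all assumptions. Here, a new request that would push in-flight Ollama calls past the limit counts as overflow.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class HealthCache:
    """Remembers the last Ollama health probe result for `ttl` seconds."""

    def __init__(self, probe: Callable[[], bool], ttl: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.probe = probe
        self.ttl = ttl
        self.clock = clock
        self._value = False
        self._checked_at: Optional[float] = None

    def is_healthy(self) -> bool:
        now = self.clock()
        if self._checked_at is None or now - self._checked_at >= self.ttl:
            self._value = self.probe()  # probe Ollama at most once per TTL window
            self._checked_at = now
        return self._value


class FallbackRouter:
    """Hypothetical router: overflow or failed Ollama calls go to Google."""

    def __init__(self, health: HealthCache, to_gemini: Callable[[str], str],
                 max_concurrent: int = 3, fallback_enabled: bool = True) -> None:
        self.health = health
        self.to_gemini = to_gemini  # Ollama model name -> Gemini model name
        self.max_concurrent = max_concurrent
        # Assumption: True only when AUTO_FALLBACK_ENABLED is on and GOOGLE_API_KEY is set.
        self.fallback_enabled = fallback_enabled
        self.ollama_in_flight = 0

    def complete(self, provider: str, model: str,
                 call: Callable[[str, str], T]) -> T:
        # Non-Ollama providers (openrouter, groq, together) are never rerouted.
        if provider != "ollama" or not self.fallback_enabled:
            return call(provider, model)
        # At capacity (this request would exceed the limit) or unhealthy: use Google.
        if self.ollama_in_flight >= self.max_concurrent or not self.health.is_healthy():
            return call("google", self.to_gemini(model))
        self.ollama_in_flight += 1
        try:
            return call("ollama", model)
        except Exception:
            pass  # Ollama failed: fall through to the Google retry below
        finally:
            self.ollama_in_flight -= 1  # release the slot before any retry
        return call("google", self.to_gemini(model))
```

In an async FastAPI service, the counter is safe to update on the event loop as long as no `await` separates the check from the increment. Threaded code would need a lock. A streaming response would also have to hold its slot until the stream finishes, not just until the first chunk.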

New config (src/config.py):
- GOOGLE_API_KEY: enables Google provider
- GOOGLE_DEFAULT_MODEL: default gemini-2.0-flash
- AUTO_FALLBACK_ENABLED: toggle fallback (default: true)
- OLLAMA_MAX_CONCURRENT: concurrent request threshold (default: 3)
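The commit does not say how src/config.py reads these variables. The real code may use a settings library such as pydantic-settings. Below is a minimal standard-library sketch of the four variables and their stated defaults. `FallbackSettings` and the accepted boolean spellings are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _env_bool(value: str) -> bool:
    # Assumption: these spellings count as "true"; anything else is false.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class FallbackSettings:
    google_api_key: Optional[str]
    google_default_model: str
    auto_fallback_enabled: bool
    ollama_max_concurrent: int

    @property
    def google_enabled(self) -> bool:
        # The Google provider is only enabled when an API key is present.
        return bool(self.google_api_key)

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "FallbackSettings":
        return cls(
            google_api_key=env.get("GOOGLE_API_KEY") or None,  # empty string -> None
            google_default_model=env.get("GOOGLE_DEFAULT_MODEL", "gemini-2.0-flash"),
            auto_fallback_enabled=_env_bool(env.get("AUTO_FALLBACK_ENABLED", "true")),
            ollama_max_concurrent=int(env.get("OLLAMA_MAX_CONCURRENT", "3")),
        )
```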

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 22:44:09 +01:00
Till-JS
aba79f5c16 fix(mana-llm): fix SSE double data prefix causing message parsing issues
EventSourceResponse from sse-starlette adds its own 'data:' prefix,
so we should yield dicts with a 'data' key instead of pre-formatted
SSE strings. This was causing 'data: data:' double prefixes and
backticks appearing in chat messages.
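The bug can be reproduced without a server by using a simplified model of the wrapper. This sketch does not use sse-starlette itself. `frame_sse` stands in for the framing that EventSourceResponse performs, under the assumption that every payload line gets its own `data: ` prefix. The two generator names are hypothetical.

```python
import json
from typing import Iterable, Iterator, Union


def frame_sse(item: Union[dict, str]) -> str:
    """Simplified stand-in for an SSE wrapper: it always adds its own 'data: '."""
    data = item["data"] if isinstance(item, dict) else str(item)
    lines = data.splitlines() or [""]
    return "".join(f"data: {line}\n" for line in lines) + "\n"


def chunks_buggy(tokens: Iterable[str]) -> Iterator[str]:
    for tok in tokens:
        # Pre-formatted SSE string: the wrapper prepends a second "data: ".
        yield f"data: {json.dumps({'content': tok})}\n\n"


def chunks_fixed(tokens: Iterable[str]) -> Iterator[dict]:
    for tok in tokens:
        # Hand the wrapper only the payload and let it do the framing.
        yield {"data": json.dumps({"content": tok})}
```

With the buggy generator, `frame_sse(next(chunks_buggy(["hi"])))` starts with `data: data: `. The fixed generator yields a single clean `data: {"content": "hi"}` event.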

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 15:29:11 +01:00
Till-JS
d605366460 feat(llm-playground): add model comparison feature
- Add modality detection (text/vision/code) to models store
- Create comparison store for parallel multi-model streaming
- Add ModelModalityFilter and ModelComparisonSelector components
- Add ComparisonResponseCard with metrics (duration, tokens, t/s)
- Add ComparisonMessageBubble for side-by-side response view
- Integrate comparison mode into ChatInput, MessageList, Sidebar
- Add dev:full script to start mana-llm + playground together
- Add start.sh script for mana-llm Python service
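The playground itself is a Svelte app, so the following is only an illustration of the comparison metrics (duration, tokens, t/s) and the parallel fan-out, written in Python. `ComparisonResult`, `run_one`, `compare`, and the `stream(model, prompt)` async generator are hypothetical. The sketch also treats one streamed chunk as one token, which is an assumption. Real token counts should come from the provider's usage data.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import AsyncIterator, Callable, Dict, List

Stream = Callable[[str, str], AsyncIterator[str]]


@dataclass
class ComparisonResult:
    model: str
    text: str
    tokens: int
    duration_s: float

    @property
    def tokens_per_second(self) -> float:
        # Guard against a zero-length measurement.
        return self.tokens / self.duration_s if self.duration_s > 0 else 0.0


async def run_one(model: str, stream: Stream, prompt: str) -> ComparisonResult:
    start = time.perf_counter()
    parts: List[str] = []
    async for chunk in stream(model, prompt):
        parts.append(chunk)  # assumption: roughly one chunk per token
    elapsed = time.perf_counter() - start
    return ComparisonResult(model, "".join(parts), len(parts), elapsed)


async def compare(models: List[str], stream: Stream, prompt: str) -> Dict[str, ComparisonResult]:
    # Stream from every selected model concurrently; gather preserves input order.
    results = await asyncio.gather(*(run_one(m, stream, prompt) for m in models))
    return {r.model: r for r in results}
```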
2026-01-31 23:30:16 +01:00
Till-JS
fdba0e3425 feat(llm-playground): add production deployment with auth
- Add Dockerfile for multi-stage Docker build
- Add mana-core-auth integration with login/register pages
- Add auth store using Svelte 5 runes
- Add protected route layout with auth guard
- Add health endpoint for container health checks
- Add runtime URL injection via hooks.server.ts
- Add logout button to header
- Update docker-compose.macmini.yml with llm-playground service
- Update cloudflared-config.yml with playground.mana.how route
- Update mana-llm CORS config for playground domain
- Update generate-env.mjs with auth URL variable

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 18:15:02 +01:00
Till-JS
3edbd0cb26 chore: update dependencies and mana-llm improvements
- Update pnpm-lock.yaml with matrix bot dependencies
- Add environment variables to generate-env.mjs
- Improve mana-llm config and ollama provider

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-30 17:50:58 +01:00
Till-JS
1495dbe476 feat(mana-llm): add central LLM abstraction service
Python/FastAPI service providing a unified, OpenAI-compatible API for
Ollama and cloud LLM providers (OpenRouter, Groq, Together).

Features:
- Chat completions with streaming (SSE)
- Vision/multimodal support
- Embeddings generation
- Multi-provider routing (provider/model format)
- Prometheus metrics
- Optional Redis caching
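Routing with the "provider/model" format comes down to splitting on the first slash. Below is a minimal sketch. `parse_model` is hypothetical, and so are two behaviors: an unprefixed or unknown-prefixed name defaults to Ollama, and the split happens only at the first slash. Splitting at the first slash keeps OpenRouter-style model names, which contain their own slash, intact.

```python
from typing import Tuple

# Providers named in this commit; the real service may register more.
KNOWN_PROVIDERS = {"ollama", "openrouter", "groq", "together"}


def parse_model(spec: str, default_provider: str = "ollama") -> Tuple[str, str]:
    """Split 'provider/model' into (provider, model)."""
    provider, sep, model = spec.partition("/")  # split at the first slash only
    if sep and provider in KNOWN_PROVIDERS:
        return provider, model
    # No known prefix: treat the whole string as a model for the default provider.
    return default_provider, spec
```

For example, `parse_model("openrouter/meta-llama/llama-3-8b")` yields `("openrouter", "meta-llama/llama-3-8b")`, and a bare `gemma3:4b` resolves to `("ollama", "gemma3:4b")`.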
2026-01-29 22:01:00 +01:00