managarten

mirror of https://github.com/Memo-2023/mana-monorepo.git synced 2026-05-20 11:03:38 +02:00

Author	SHA1	Message	Date
Till JS	3be4612f04	fix(mana-llm): google-genai v1.73 keyword-only Part.from_text() google-genai >=1.70 changed Part.from_text() from positional to keyword-only argument. The production container installed v1.73.1 and crashed on startup with "Part.from_text() takes 1 positional argument but 2 were given". Fix: Part.from_text(msg.content) → Part.from_text(text=msg.content) Tested live: curl https://llm.mana.how/v1/chat/completions with model=google/gemini-2.5-flash returns correct response. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 12:47:23 +02:00
Till JS	8a0bf93699	chore(cloud-tier): upgrade default model gemini-2.0-flash → gemini-2.5-flash gemini-2.0-flash will be deprecated on June 1, 2026. gemini-2.5-flash has been stable since Q1 2026 with similar pricing ($0.15/$0.60 per 1M tokens for gemini-2.5-flash vs $0.10/$0.40 for gemini-2.0-flash; the pricing table already had the entry). Three files touched: - packages/shared-llm/src/backends/cloud.ts: client default - services/mana-llm/src/config.py: server default - services/mana-llm/src/providers/google.py: Ollama→Gemini fallback map, constructor default, and deduplicated model list Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 12:32:03 +02:00
Till JS	5520f1385e	fix(mana-llm): add response_format to ChatCompletionRequest model The first iteration of the Ollama response_format passthrough crashed with 'ChatCompletionRequest object has no attribute response_format' because the Pydantic request model didn't declare the field at all — incoming response_format from OpenAI-compatible clients was being silently dropped at the parsing layer before the provider could see it. Fix: declare a typed ResponseFormat sub-model with the two OpenAI shapes ('json_object' and 'json_schema'), add it as an optional field on ChatCompletionRequest, and let the Ollama provider read it directly without defensive getattr fallbacks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 18:50:54 +02:00
Till JS	3ef095aaff	fix(mana-llm/ollama): pass response_format to Ollama + strip markdown fences The Ollama provider was completely ignoring `response_format` from the incoming OpenAI-compatible request. Two consequences: 1. Clients that asked for `{"type":"json_object"}` or `{"type":"json_schema",...}` got back JSON wrapped in ```json ... ``` markdown fences, because Ollama defaults to conversational output. 2. Strict downstream parsers (Vercel AI SDK `generateObject`, manual `JSON.parse`) failed to decode the response and threw, even though the underlying JSON was valid inside the fences. Fix: when response_format is set, translate it to Ollama's native `format` field: - `{"type":"json_object"}` → `format: "json"` - `{"type":"json_schema","json_schema":{"schema":{...}}}` → `format: <the schema dict>` (Ollama 0.5+ supports full JSON schemas in the format field) Defensive belt-and-suspenders: a small `_strip_json_fences` helper runs after the Ollama response is decoded and removes any leftover ```json ... ``` wrapping. Some older vision models still wrap output in fences even when `format` is set; this catches them. Streaming path is unchanged because the nutriphi/planta refactor uses non-streaming `generateObject`. Streaming structured output with Ollama deserves its own pass when someone actually needs it. Discovered during the AI SDK + Zod refactor smoke test — neither the old nor the new vision routes ever returned validated JSON locally because of this bug. Production uses Google Gemini directly via fallback so the issue was masked there. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 18:12:01 +02:00
Till JS	45063b88be	feat(mana-llm): add Google Gemini fallback provider with auto-routing Add Google Gemini as a fallback provider that activates automatically when Ollama is overloaded or unavailable, ensuring LLM requests always succeed even under load. New provider (src/providers/google.py): - Full LLMProvider implementation using google-genai SDK - Chat completions (streaming + non-streaming) - Vision/multimodal support (base64 images) - Embeddings via text-embedding-004 - Model mapping: Ollama models → Gemini equivalents (gemma3:4b → gemini-2.0-flash, llava:7b → gemini-2.0-flash, etc.) Auto-fallback routing (src/providers/router.py): - Concurrent request tracking for Ollama (OLLAMA_MAX_CONCURRENT=3) - When Ollama concurrent > max: route to Google automatically - When Ollama fails: retry on Google with model mapping - Health check caching (5s TTL) to avoid hammering Ollama - Non-Ollama providers (openrouter, groq, together) are never fallback-routed - Fallback info included in /health endpoint response New config (src/config.py): - GOOGLE_API_KEY: enables Google provider - GOOGLE_DEFAULT_MODEL: default gemini-2.0-flash - AUTO_FALLBACK_ENABLED: toggle fallback (default: true) - OLLAMA_MAX_CONCURRENT: concurrent request threshold (default: 3) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-23 22:44:09 +01:00
Till-JS	3edbd0cb26	chore: update dependencies and mana-llm improvements - Update pnpm-lock.yaml with matrix bot dependencies - Add environment variables to generate-env.mjs - Improve mana-llm config and ollama provider Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-30 17:50:58 +01:00
Till-JS	1495dbe476	✨ feat(mana-llm): add central LLM abstraction service Python/FastAPI service providing unified OpenAI-compatible API for Ollama and cloud LLM providers (OpenRouter, Groq, Together). Features: - Chat completions with streaming (SSE) - Vision/multimodal support - Embeddings generation - Multi-provider routing (provider/model format) - Prometheus metrics - Optional Redis caching	2026-01-29 22:01:00 +01:00
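
The response_format handling described in commit 3ef095aaff can be sketched roughly as follows. This is a minimal illustration and not the code from the repository: the function names `to_ollama_format` and `strip_json_fences`, the regular expression, and the fallback to `"json"` when a `json_schema` request carries no schema are all assumptions made for the example. Only the two mappings (`json_object` to `"json"`, `json_schema` to the schema dict) and the idea of a fence-stripping helper come from the commit message.

```python
import json
import re
from typing import Optional, Union

# Three backticks, built programmatically so this listing contains no literal fence.
FENCE = "`" * 3

# Matches a response that is wrapped, as a whole, in a Markdown code fence
# with an optional "json" language tag.
_FENCE_RE = re.compile(
    r"^\s*" + FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE + r"\s*$",
    re.DOTALL | re.IGNORECASE,
)


def to_ollama_format(response_format: Optional[dict]) -> Union[str, dict, None]:
    """Translate an OpenAI-style response_format into a value for Ollama's `format` field."""
    if not response_format:
        return None
    kind = response_format.get("type")
    if kind == "json_object":
        return "json"
    if kind == "json_schema":
        # OpenAI-compatible clients nest the schema under json_schema.schema.
        schema = response_format.get("json_schema", {}).get("schema")
        # Assumption for this sketch: fall back to plain JSON mode if no schema is given.
        return schema if isinstance(schema, dict) else "json"
    return None


def strip_json_fences(text: str) -> str:
    """Remove a surrounding Markdown fence, if any; otherwise return the text unchanged."""
    match = _FENCE_RE.match(text)
    return match.group(1) if match else text


# A fenced reply, as a model might still produce even with `format` set.
fenced_reply = FENCE + 'json\n{"calories": 120}\n' + FENCE
parsed = json.loads(strip_json_fences(fenced_reply))
```

Applying the fence stripping after decoding keeps strict parsers such as `JSON.parse` or `generateObject` working even when a model ignores the `format` hint, which is the belt-and-suspenders behavior the commit describes.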

7 commits
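
The auto-fallback routing from commit 45063b88be (concurrent request tracking with a threshold, plus retry on the fallback provider with model mapping) might look something like the sketch below. It is an illustration under stated assumptions, not the contents of src/providers/router.py: the `FallbackRouter` class, the `Handler` type, and the stub providers are hypothetical, and health-check caching is left out. The threshold of 3 and the gemma3:4b mapping come from the commit messages, with the mapping target taken from the later upgrade to gemini-2.5-flash.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# Hypothetical provider signature: takes a model name, returns the completion text.
Handler = Callable[[str], Awaitable[str]]


class FallbackRouter:
    """Route to a primary provider, falling back when it is overloaded or fails."""

    def __init__(self, primary: Handler, fallback: Handler,
                 model_map: Dict[str, str], default_fallback_model: str,
                 max_concurrent: int = 3) -> None:
        self._primary = primary
        self._fallback = fallback
        self._model_map = model_map
        self._default_fallback_model = default_fallback_model
        self._max_concurrent = max_concurrent
        self._in_flight = 0  # primary requests currently running

    async def complete(self, model: str) -> str:
        fallback_model = self._model_map.get(model, self._default_fallback_model)
        # Accepting this request would exceed the limit: go straight to the fallback.
        if self._in_flight >= self._max_concurrent:
            return await self._fallback(fallback_model)
        self._in_flight += 1
        try:
            return await self._primary(model)
        except Exception:
            pass  # primary failed; retry on the fallback below
        finally:
            self._in_flight -= 1  # release the slot before any fallback call
        return await self._fallback(fallback_model)


# Stand-in providers for the demo; real ones would call Ollama and Gemini.
async def slow_primary(model: str) -> str:
    await asyncio.sleep(0.01)
    return "ollama:" + model


async def failing_primary(model: str) -> str:
    raise RuntimeError("primary unavailable")


async def stub_fallback(model: str) -> str:
    return "google:" + model


MODEL_MAP = {"gemma3:4b": "gemini-2.5-flash"}


async def run_burst(router: FallbackRouter, n: int) -> list:
    # Fire n requests at once to exercise the concurrency threshold.
    return await asyncio.gather(*(router.complete("gemma3:4b") for _ in range(n)))
```

With `max_concurrent=3`, a burst of four simultaneous requests sends three to the primary and one to the fallback. The slot is released in `finally` before the fallback call so that a failed primary request does not keep counting against the limit while the fallback is running.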