Till JS | 56ffcbac39
feat: add Ollama memory optimization, LLM metrics, and chat streaming
Three improvements to the unified LLM infrastructure:
1. Ollama memory optimization (scripts/mac-mini/configure-ollama.sh):
- OLLAMA_KEEP_ALIVE=5m → models unload after 5 min idle (saves 3-16 GB of RAM)
- OLLAMA_NUM_PARALLEL=1 → predictable memory usage
- OLLAMA_MAX_LOADED_MODELS=1 → max 1 model in RAM at a time
2. Request-level metrics in @manacore/shared-llm (sketched after this list):
- LlmRequestMetrics interface (model, latency, tokens, fallback detection)
- LlmMetricsCollector class with summary stats (for health endpoints)
- Optional onMetrics callback in LlmModuleOptions
- Automatic metrics emission in chatMessages() (success + error)
3. Chat streaming (token-by-token SSE; consumer sketch after this list):
- Backend: POST /chat/completions/stream SSE endpoint
- OllamaService.createStreamingCompletion() via llm.chatStreamMessages()
- ChatService.createStreamingCompletion() with upfront credit consumption
- Web: chatApi.createStreamingCompletion() SSE consumer
- Chat store: sendMessage() now streams tokens into assistant message
- UI updates reactively as each token arrives
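A minimal sketch of the metrics plumbing. The type and class names and the onMetrics hook come from this commit; the individual field names (beyond model, latency, tokens, and fallback detection, which the message lists) and the summary shape are assumptions:

    // Hypothetical field names; the commit only lists model, latency,
    // tokens, and fallback detection.
    interface LlmRequestMetrics {
      model: string;
      latencyMs: number;
      promptTokens?: number;
      completionTokens?: number;
      usedFallback: boolean;
      error?: string;
    }

    class LlmMetricsCollector {
      private samples: LlmRequestMetrics[] = [];

      record(m: LlmRequestMetrics): void {
        this.samples.push(m);
      }

      // Summary stats for a health endpoint: request count, mean latency,
      // and how often the fallback model was used.
      summary() {
        const n = this.samples.length;
        return {
          requests: n,
          meanLatencyMs:
            n === 0 ? 0 : this.samples.reduce((s, m) => s + m.latencyMs, 0) / n,
          fallbackRate:
            n === 0 ? 0 : this.samples.filter((m) => m.usedFallback).length / n,
        };
      }
    }

    const collector = new LlmMetricsCollector();
    // Wiring the optional hook (the onMetrics option name is from the commit):
    // LlmModule.forRoot({ ..., onMetrics: (m) => collector.record(m) })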
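And a sketch of the web-side consumer. Native EventSource cannot send a POST body, so this assumes a fetch-based reader over the response stream; the `data: ...` / `data: [DONE]` framing and the token field name are assumptions, not confirmed by the commit:

    // Reads POST /chat/completions/stream and feeds tokens to the caller.
    async function streamCompletion(
      body: unknown,
      onToken: (token: string) => void,
    ): Promise<void> {
      const res = await fetch('/chat/completions/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(body),
      });
      if (!res.ok || !res.body) throw new Error(`stream failed: ${res.status}`);

      const reader = res.body.getReader();
      const decoder = new TextDecoder();
      let buffer = '';
      for (;;) {
        const { done, value } = await reader.read();
        if (done) break;
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split('\n');
        buffer = lines.pop() ?? ''; // keep any trailing partial line
        for (const line of lines) {
          if (!line.startsWith('data: ')) continue;
          const payload = line.slice('data: '.length).trim();
          if (payload === '[DONE]') return; // sentinel assumed
          onToken(JSON.parse(payload).token); // field name assumed
        }
      }
    }

In the chat store, sendMessage() would pass an onToken callback that appends each token to the in-progress assistant message, which is what makes the UI update reactively as tokens arrive.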
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24 09:41:33 +01:00
Till JS | e2f144962c
feat: add unified @manacore/shared-llm package and migrate all backends
Create a shared LLM client package that provides a unified interface
to the mana-llm service, replacing 9 per-service integrations (raw
fetch calls, plus one vendor SDK) with consistent error handling,
retry logic, and JSON extraction.
Package (@manacore/shared-llm):
- LlmModule with forRoot/forRootAsync (NestJS dynamic module; registration sketch below)
- LlmClientService: chat, json, vision, visionJson, embed, stream
- LlmClient standalone class for non-NestJS consumers
- extractJson utility (consolidates 3 markdown-stripping implementations; sketch below)
- retryFetch with exponential backoff on 429, 5xx, and network errors (sketch below)
- 44 unit tests (json-extractor, retry, llm-client)
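Three hedged sketches to make the package shape concrete. First, module registration: forRoot itself is named above, but the option names (baseUrl, defaultModel) and their values are assumptions:

    import { Module } from '@nestjs/common';
    import { LlmModule } from '@manacore/shared-llm';

    @Module({
      imports: [
        LlmModule.forRoot({
          baseUrl: process.env.MANA_LLM_URL ?? 'http://localhost:8080', // assumed
          defaultModel: 'llama3', // assumed
        }),
      ],
    })
    export class AppModule {}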
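Second, the likely shape of extractJson, given that it consolidates three markdown-stripping implementations; the exact signature and fallback behavior are assumptions:

    // Pulls a JSON value out of an LLM reply that may wrap it in prose
    // or in a ```json fenced block.
    function extractJson<T = unknown>(raw: string): T {
      // Prefer the contents of a fenced block when one is present.
      const fence = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
      const candidate = (fence ? fence[1] : raw).trim();
      try {
        return JSON.parse(candidate) as T;
      } catch {
        // Fall back to the widest {...} or [...] span in the text.
        const span = candidate.match(/[\[{][\s\S]*[\]}]/);
        if (!span) throw new Error('no JSON found in LLM response');
        return JSON.parse(span[0]) as T;
      }
    }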
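Third, retryFetch with exactly the retry conditions listed above (429, 5xx, network errors); the attempt count and backoff base are assumptions:

    async function retryFetch(
      url: string,
      init?: RequestInit,
      maxAttempts = 3, // assumed default
    ): Promise<Response> {
      let lastError: unknown;
      for (let attempt = 0; attempt < maxAttempts; attempt++) {
        if (attempt > 0) {
          // Exponential backoff: 500 ms, 1 s, 2 s, ...
          await new Promise((r) => setTimeout(r, 500 * 2 ** (attempt - 1)));
        }
        try {
          const res = await fetch(url, init);
          // Retry only rate limits and server errors; hand back anything else.
          if (res.status !== 429 && res.status < 500) return res;
          lastError = new Error(`HTTP ${res.status}`);
        } catch (err) {
          lastError = err; // network error → retry
        }
      }
      throw lastError;
    }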
Migrated backends (a before/after sketch follows this list):
- mana-core-auth: raw fetch → llm.json()
- planta: raw fetch + vision → llm.visionJson()
- nutriphi: raw fetch + regex → llm.visionJson() + llm.json()
- chat: custom OllamaService (175 LOC) → llm.chatMessages()
- context: raw fetch → llm.chat() (keeps token tracking)
- traces: 2x raw fetch → llm.chat()
- manadeck: @google/genai SDK → llm.json() + llm.visionJson()
- bot-services: raw Ollama API → LlmClient standalone
- matrix-ollama-bot: raw fetch → llm.chatMessages() + llm.vision()
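A hedged before/after for one of the raw fetch → llm.json() migrations. The injected LlmClientService and the json() method name come from this commit; the old call shape, the consumer service, and the generic signature are assumptions:

    import { Injectable } from '@nestjs/common';
    import { LlmClientService } from '@manacore/shared-llm';

    interface RiskVerdict { allowed: boolean; reason: string } // illustrative

    // Before: each backend carried its own fetch + retry + fence-stripping:
    //   const res = await fetch(`${LLM_URL}/chat`, { method: 'POST', ... });
    //   const verdict = JSON.parse(stripFences(await res.text()));

    @Injectable()
    export class SignupReviewService { // hypothetical consumer
      constructor(private readonly llm: LlmClientService) {}

      async review(prompt: string): Promise<RiskVerdict> {
        // Retries, fallback handling, and JSON extraction now live in
        // @manacore/shared-llm instead of being reimplemented here.
        return this.llm.json<RiskVerdict>(prompt);
      }
    }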
New credit operations (cost table sketched below):
- AI_PLANT_ANALYSIS (2 credits, planta)
- AI_GUIDE_GENERATION (5 credits, traces)
- AI_CONTEXT_GENERATION (2 credits, context)
- AI_BOT_CHAT (0.1 credits, matrix)
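For reference, the same operations as a cost table; the commit does not show how the credits system represents them, so this const-map shape is an assumption:

    // Values mirror the commit message; the shape is hypothetical.
    const CREDIT_COSTS = {
      AI_PLANT_ANALYSIS: 2,     // planta
      AI_GUIDE_GENERATION: 5,   // traces
      AI_CONTEXT_GENERATION: 2, // context
      AI_BOT_CHAT: 0.1,         // matrix
    } as const;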
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23 22:06:30 +01:00