managarten

till/managarten

Fork 0

mirror of https://github.com/Memo-2023/mana-monorepo.git synced 2026-05-14 23:01:09 +02:00

Commit graph

Author	SHA1	Message	Date
Till JS	56ffcbac39	feat: add Ollama memory optimization, LLM metrics, and chat streaming Three improvements to the unified LLM infrastructure: 1. Ollama memory optimization (scripts/mac-mini/configure-ollama.sh): - OLLAMA_KEEP_ALIVE=5m → models unload after 5min idle (saves 3-16GB RAM) - OLLAMA_NUM_PARALLEL=1 → predictable memory usage - OLLAMA_MAX_LOADED_MODELS=1 → max 1 model in RAM at a time 2. Request-level metrics in @manacore/shared-llm: - LlmRequestMetrics interface (model, latency, tokens, fallback detection) - LlmMetricsCollector class with summary stats (for health endpoints) - Optional onMetrics callback in LlmModuleOptions - Automatic metrics emission in chatMessages() (success + error) 3. Chat streaming (token-by-token SSE): - Backend: POST /chat/completions/stream SSE endpoint - OllamaService.createStreamingCompletion() via llm.chatStreamMessages() - ChatService.createStreamingCompletion() with upfront credit consumption - Web: chatApi.createStreamingCompletion() SSE consumer - Chat store: sendMessage() now streams tokens into assistant message - UI updates reactively as each token arrives Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-24 09:41:33 +01:00
Till JS	e2f144962c	feat: add unified @manacore/shared-llm package and migrate all backends Create a shared LLM client package that provides a unified interface to the mana-llm service, replacing 9 individual fetch-based integrations with consistent error handling, retry logic, and JSON extraction. Package (@manacore/shared-llm): - LlmModule with forRoot/forRootAsync (NestJS dynamic module) - LlmClientService: chat, json, vision, visionJson, embed, stream - LlmClient standalone class for non-NestJS consumers - extractJson utility (consolidates 3 markdown-stripping implementations) - retryFetch with exponential backoff (429, 5xx, network errors) - 44 unit tests (json-extractor, retry, llm-client) Migrated backends: - mana-core-auth: raw fetch → llm.json() - planta: raw fetch + vision → llm.visionJson() - nutriphi: raw fetch + regex → llm.visionJson() + llm.json() - chat: custom OllamaService (175 LOC) → llm.chatMessages() - context: raw fetch → llm.chat() (keeps token tracking) - traces: 2x raw fetch → llm.chat() - manadeck: @google/genai SDK → llm.json() + llm.visionJson() - bot-services: raw Ollama API → LlmClient standalone - matrix-ollama-bot: raw fetch → llm.chatMessages() + llm.vision() New credit operations: - AI_PLANT_ANALYSIS (2 credits, planta) - AI_GUIDE_GENERATION (5 credits, traces) - AI_CONTEXT_GENERATION (2 credits, context) - AI_BOT_CHAT (0.1 credits, matrix) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-23 22:06:30 +01:00

Author

SHA1

Message

Date

Till JS

56ffcbac39

feat: add Ollama memory optimization, LLM metrics, and chat streaming

Three improvements to the unified LLM infrastructure:

1. Ollama memory optimization (scripts/mac-mini/configure-ollama.sh):
   - OLLAMA_KEEP_ALIVE=5m → models unload after 5min idle (saves 3-16GB RAM)
   - OLLAMA_NUM_PARALLEL=1 → predictable memory usage
   - OLLAMA_MAX_LOADED_MODELS=1 → max 1 model in RAM at a time

2. Request-level metrics in @manacore/shared-llm:
   - LlmRequestMetrics interface (model, latency, tokens, fallback detection)
   - LlmMetricsCollector class with summary stats (for health endpoints)
   - Optional onMetrics callback in LlmModuleOptions
   - Automatic metrics emission in chatMessages() (success + error)

3. Chat streaming (token-by-token SSE):
   - Backend: POST /chat/completions/stream SSE endpoint
   - OllamaService.createStreamingCompletion() via llm.chatStreamMessages()
   - ChatService.createStreamingCompletion() with upfront credit consumption
   - Web: chatApi.createStreamingCompletion() SSE consumer
   - Chat store: sendMessage() now streams tokens into assistant message
   - UI updates reactively as each token arrives

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-24 09:41:33 +01:00

Till JS

e2f144962c

feat: add unified @manacore/shared-llm package and migrate all backends

Create a shared LLM client package that provides a unified interface
to the mana-llm service, replacing 9 individual fetch-based integrations
with consistent error handling, retry logic, and JSON extraction.

Package (@manacore/shared-llm):
- LlmModule with forRoot/forRootAsync (NestJS dynamic module)
- LlmClientService: chat, json, vision, visionJson, embed, stream
- LlmClient standalone class for non-NestJS consumers
- extractJson utility (consolidates 3 markdown-stripping implementations)
- retryFetch with exponential backoff (429, 5xx, network errors)
- 44 unit tests (json-extractor, retry, llm-client)

Migrated backends:
- mana-core-auth: raw fetch → llm.json()
- planta: raw fetch + vision → llm.visionJson()
- nutriphi: raw fetch + regex → llm.visionJson() + llm.json()
- chat: custom OllamaService (175 LOC) → llm.chatMessages()
- context: raw fetch → llm.chat() (keeps token tracking)
- traces: 2x raw fetch → llm.chat()
- manadeck: @google/genai SDK → llm.json() + llm.visionJson()
- bot-services: raw Ollama API → LlmClient standalone
- matrix-ollama-bot: raw fetch → llm.chatMessages() + llm.vision()

New credit operations:
- AI_PLANT_ANALYSIS (2 credits, planta)
- AI_GUIDE_GENERATION (5 credits, traces)
- AI_CONTEXT_GENERATION (2 credits, context)
- AI_BOT_CHAT (0.1 credits, matrix)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-23 22:06:30 +01:00

2 commits