# mana-llm
Central LLM abstraction service providing a unified OpenAI-compatible API for Ollama and cloud LLM providers.
## Overview
mana-llm acts as a central gateway for all LLM requests in the monorepo, providing:
- Unified OpenAI-compatible API
- Provider routing (Ollama, OpenRouter, Groq, Together)
- Streaming via Server-Sent Events (SSE)
- Vision/multimodal support
- Embeddings generation
- Prometheus metrics
## Architecture
```
┌────────────────────────────────────────────────────────────────────┐
│                           Consumer Apps                            │
│         chat-backend │ mana web │ todo (LLM enrich) │ etc.         │
└──────────────────────────────────┬─────────────────────────────────┘
                                   │ HTTP/SSE
┌────────────────────────────────────────────────────────────────────┐
│                        mana-llm (Port 3025)                        │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                 │
│  │   Router    │  │    Cache    │  │   Metrics   │                 │
│  │ (Provider)  │  │   (Redis)   │  │ (Prometheus)│                 │
│  └──────┬──────┘  └─────────────┘  └─────────────┘                 │
│         │                                                          │
│  ┌──────┴──────────────────────────────────────┐                   │
│  │              Provider Adapters              │                   │
│  │ ┌──────────┐ ┌──────────┐ ┌──────────────┐  │                   │
│  │ │  Ollama  │ │  OpenAI  │ │  OpenRouter  │  │                   │
│  │ │ Adapter  │ │ Adapter  │ │   Adapter    │  │                   │
│  │ └──────────┘ └──────────┘ └──────────────┘  │                   │
│  └─────────────────────────────────────────────┘                   │
└────────────────────────────────────────────────────────────────────┘
```
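All adapters sit behind the abstract provider interface in `src/providers/base.py`. As a rough sketch of the shape such an interface typically takes (method names and signatures here are illustrative, not the actual ones from `base.py`):
```python
# Hypothetical sketch of the provider abstraction -- the real interface
# lives in src/providers/base.py and may differ in names and signatures.
from abc import ABC, abstractmethod
from typing import Any, AsyncIterator


class LLMProvider(ABC):
    """Common surface the Ollama/OpenAI-compatible adapters implement."""

    @abstractmethod
    async def chat(self, model: str, messages: list[dict[str, Any]]) -> dict[str, Any]:
        """Return a full (non-streaming) chat completion response."""

    @abstractmethod
    def chat_stream(self, model: str, messages: list[dict[str, Any]]) -> AsyncIterator[dict[str, Any]]:
        """Yield completion chunks for SSE streaming."""

    @abstractmethod
    async def embed(self, model: str, input: str | list[str]) -> list[list[float]]:
        """Return embedding vectors for the given input."""
```
The router only ever talks to this interface, which is what keeps adding a new provider a matter of writing one adapter module.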
## Quick Start
### Prerequisites
- Python 3.11+
- Ollama running locally (http://localhost:11434)
- Redis (optional, for caching)
### Development
```bash
cd services/mana-llm
# Create virtual environment
python -m venv venv
source venv/bin/activate # or venv\Scripts\activate on Windows
# Install dependencies
pip install -r requirements.txt
# Copy environment file
cp .env.example .env
# Start Redis (optional)
docker-compose -f docker-compose.dev.yml up -d
# Run service
python -m uvicorn src.main:app --port 3025 --reload
```
### Docker
```bash
# Full stack (mana-llm + Redis)
docker-compose up -d
# View logs
docker-compose logs -f mana-llm
```
## API Endpoints
### Chat Completions
```bash
# Non-streaming
curl -X POST http://localhost:3025/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/gemma3:4b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'

# Streaming (SSE)
curl -X POST http://localhost:3025/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/gemma3:4b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```
### Vision/Multimodal
```bash
curl -X POST http://localhost:3025/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/llava:7b",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
      ]
    }]
  }'
```
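Building the base64 data URL by hand is error-prone. A small Python helper can assemble the payload; this sketch uses the standard library plus `requests`, which is assumed to be available and is not necessarily a project dependency:
```python
# Sketch: send a local image to the vision endpoint as a base64 data URL.
import base64
import requests

with open("photo.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("ascii")

resp = requests.post(
    "http://localhost:3025/v1/chat/completions",
    json={
        "model": "ollama/llava:7b",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    },
)
print(resp.json()["choices"][0]["message"]["content"])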
### Models
```bash
# List all models
curl http://localhost:3025/v1/models
# Get specific model
curl http://localhost:3025/v1/models/ollama/gemma3:4b
```
### Embeddings
```bash
curl -X POST http://localhost:3025/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/nomic-embed-text",
    "input": "Text to embed"
  }'
```
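A typical use of the embeddings endpoint is similarity search. A minimal sketch, assuming the response follows the OpenAI shape with vectors under `data[i].embedding`:
```python
# Sketch: embed two texts and compare them with cosine similarity.
# Assumes `requests` is available and an OpenAI-style response shape.
import math
import requests

def embed(text: str) -> list[float]:
    resp = requests.post(
        "http://localhost:3025/v1/embeddings",
        json={"model": "ollama/nomic-embed-text", "input": text},
    )
    return resp.json()["data"][0]["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

print(cosine(embed("a cat on a mat"), embed("a kitten on a rug")))
```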
### Health & Metrics
```bash
# Health check
curl http://localhost:3025/health
# Prometheus metrics
curl http://localhost:3025/metrics
```
## Provider Routing
Models use the format `provider/model`:
| Model | Provider | Target |
|-------|----------|--------|
| `ollama/gemma3:4b` | Ollama | localhost:11434 |
| `ollama/llava:7b` | Ollama | localhost:11434 |
| `openrouter/meta-llama/llama-3.1-8b-instruct` | OpenRouter | openrouter.ai/api |
| `groq/llama-3.1-8b-instant` | Groq | api.groq.com |
| `together/meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo` | Together | api.together.xyz |
**Default:** If no provider prefix is given (e.g., `gemma3:4b`), Ollama is used.
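The routing rule is simple enough to sketch. The following illustrates the documented behavior only; it is not the actual code in `src/providers/router.py`:
```python
# Illustration of the documented routing rule, not the real router code.
KNOWN_PROVIDERS = {"ollama", "openrouter", "groq", "together"}

def split_model(model: str) -> tuple[str, str]:
    """Split 'provider/model' into (provider, model); default to Ollama."""
    prefix, _, rest = model.partition("/")
    if rest and prefix in KNOWN_PROVIDERS:
        return prefix, rest
    return "ollama", model  # no known prefix -> Ollama (e.g. "gemma3:4b")

assert split_model("ollama/gemma3:4b") == ("ollama", "gemma3:4b")
assert split_model("gemma3:4b") == ("ollama", "gemma3:4b")
# Only the first slash separates provider from model, so nested model
# paths like OpenRouter's pass through intact:
assert split_model("openrouter/meta-llama/llama-3.1-8b-instruct") == \
    ("openrouter", "meta-llama/llama-3.1-8b-instruct")
```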
## Configuration
Environment variables (see `.env.example`):
| Variable | Default | Description |
|----------|---------|-------------|
| `PORT` | 3025 | Service port |
| `LOG_LEVEL` | info | Logging level |
| `OLLAMA_URL` | http://localhost:11434 | Ollama server URL |
| `OLLAMA_DEFAULT_MODEL` | gemma3:4b | Default Ollama model |
| `OLLAMA_TIMEOUT` | 120 | Ollama request timeout (seconds) |
| `OPENROUTER_API_KEY` | - | OpenRouter API key |
| `GROQ_API_KEY` | - | Groq API key |
| `TOGETHER_API_KEY` | - | Together API key |
| `REDIS_URL` | - | Redis URL for caching |
| `CACHE_TTL` | 3600 | Cache TTL in seconds |
| `CORS_ORIGINS` | localhost | Allowed CORS origins |
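`src/config.py` loads these via pydantic-settings. A minimal sketch of what such a settings class looks like, with field names mirroring the table above (the real class may be structured differently):
```python
# Sketch of a pydantic-settings class matching the table above; the
# actual src/config.py may differ.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    port: int = 3025
    log_level: str = "info"
    ollama_url: str = "http://localhost:11434"
    ollama_default_model: str = "gemma3:4b"
    ollama_timeout: int = 120  # seconds
    openrouter_api_key: str | None = None
    groq_api_key: str | None = None
    together_api_key: str | None = None
    redis_url: str | None = None
    cache_ttl: int = 3600  # seconds
    cors_origins: str = "localhost"

settings = Settings()  # reads PORT, OLLAMA_URL, ... from env / .env
```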
## Project Structure
```
services/mana-llm/
├── src/
│   ├── main.py              # FastAPI app entry point
│   ├── config.py            # Settings via pydantic-settings
│   ├── providers/
│   │   ├── base.py          # Abstract provider interface
│   │   ├── ollama.py        # Ollama provider
│   │   ├── openai_compat.py # OpenAI-compatible provider
│   │   └── router.py        # Provider routing logic
│   ├── models/
│   │   ├── requests.py      # Request Pydantic models
│   │   └── responses.py     # Response Pydantic models
│   ├── streaming/
│   │   └── sse.py           # SSE response handling
│   └── utils/
│       ├── cache.py         # Redis caching
│       └── metrics.py       # Prometheus metrics
├── tests/
│   ├── test_api.py          # API endpoint tests
│   ├── test_providers.py    # Provider tests
│   └── test_streaming.py    # Streaming tests
├── Dockerfile
├── docker-compose.yml
├── docker-compose.dev.yml
├── requirements.txt
├── pyproject.toml
└── .env.example
```
## Testing
```bash
# Run tests
pytest
# Run with coverage
pytest --cov=src
# Run specific test file
pytest tests/test_providers.py -v
```
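New endpoint tests follow the usual FastAPI `TestClient` pattern. A hedged sketch (the real tests in `tests/test_api.py` may assert different response shapes; the `/health` and `/v1/models` payloads below are assumptions):
```python
# Sketch of endpoint tests; response shapes are assumed, not confirmed.
from fastapi.testclient import TestClient
from src.main import app

client = TestClient(app)

def test_health():
    resp = client.get("/health")
    assert resp.status_code == 200

def test_list_models():
    resp = client.get("/v1/models")
    assert resp.status_code == 200
    assert "data" in resp.json()  # assumes an OpenAI-style list envelope
```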
## Integration Example
### TypeScript/Node.js Client
```typescript
// Using fetch
const response = await fetch('http://localhost:3025/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'ollama/gemma3:4b',
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: false,
  }),
});
const data = await response.json();
console.log(data.choices[0].message.content);
```
### Streaming with fetch
`EventSource` only supports GET requests, so streaming completions are read with `fetch` and a `ReadableStream` reader:
```typescript
const response = await fetch('http://localhost:3025/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'ollama/gemma3:4b',
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: true,
  }),
});

const reader = response.body?.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader!.read();
  if (done) break;
  const chunk = decoder.decode(value);
  // Naive parse: assumes each SSE event arrives within a single chunk.
  const lines = chunk.split('\n').filter(line => line.startsWith('data: '));
  for (const line of lines) {
    const data = line.slice(6);
    if (data === '[DONE]') break;
    const parsed = JSON.parse(data);
    const content = parsed.choices[0]?.delta?.content;
    if (content) process.stdout.write(content);
  }
}
```
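The same stream can be consumed from Python. A sketch using `httpx` (assumed available; any HTTP client with response streaming works):
```python
# Sketch: consume the SSE stream from Python with httpx (assumed installed).
import json
import httpx

payload = {
    "model": "ollama/gemma3:4b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,
}

with httpx.stream(
    "POST", "http://localhost:3025/v1/chat/completions",
    json=payload, timeout=None,
) as r:
    for line in r.iter_lines():
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        delta = json.loads(data)["choices"][0].get("delta", {})
        if "content" in delta:
            print(delta["content"], end="", flush=True)
```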
## Related Services
| Service | Port | Description |
|---------|------|-------------|
| mana-tts | 3022 | Text-to-speech service |
| mana-stt | 3023 | Speech-to-text service |
| mana-search | 3021 | Web search & extraction |