A grep audit after the previous matrix removal commits found a handful
of stragglers in non-runtime files that the earlier sweeps missed:
- services/mana-llm/CLAUDE.md: removed matrix-ollama-bot from the
consumer-apps diagram and from the related-services table
- services/mana-video-gen/CLAUDE.md: removed "Matrix Bots" integration
bullet
- packages/notify-client/README.md: removed sendMatrix() doc entry
(the method itself was already gone in the prior cleanup)
- docker/grafana/dashboards/logs-explorer.json: dropped the "Matrix
Stack" log row that queried tier="matrix" (would show no data forever)
- docker/grafana/dashboards/master-overview.json: dropped the "Matrix
Bots" stat panel that counted up{job=~"matrix-.*-bot"}
- apps/mana/apps/landing/src/data/ecosystem-health.json: regenerated via
scripts/ecosystem-audit.mjs to drop matrix from the app list, icon
counts, file analytics, top offenders and authGuard missing list
- .gitignore: removed services/matrix-stt-bot/data/ pattern (the
service itself was deleted long ago)
Production-side stragglers also addressed (not in this commit):
- DROP USER synapse on prod Postgres (the parallel cleanup commit
2514831a3 dropped DATABASE matrix + DATABASE synapse but left the
role behind)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# mana-llm

Central LLM abstraction service providing a unified OpenAI-compatible API for Ollama and cloud LLM providers.

## Overview

mana-llm acts as a central gateway for all LLM requests in the monorepo, providing:

- Unified OpenAI-compatible API
- Provider routing (Ollama, OpenRouter, Groq, Together)
- Streaming via Server-Sent Events (SSE)
- Vision/multimodal support
- Embeddings generation
- Prometheus metrics

## Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                            Consumer Apps                            │
│         chat-backend │ mana web │ todo (LLM enrich) │ etc.          │
└────────────────────────────────┬────────────────────────────────────┘
                                 │ HTTP/SSE
                                 ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        mana-llm (Port 3025)                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                  │
│  │   Router    │  │    Cache    │  │   Metrics   │                  │
│  │ (Provider)  │  │   (Redis)   │  │ (Prometheus)│                  │
│  └──────┬──────┘  └─────────────┘  └─────────────┘                  │
│         │                                                           │
│  ┌──────┴──────────────────────────────────────────┐                │
│  │                Provider Adapters                │                │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────────┐   │                │
│  │  │  Ollama  │  │  OpenAI  │  │  OpenRouter  │   │                │
│  │  │ Adapter  │  │ Adapter  │  │   Adapter    │   │                │
│  │  └──────────┘  └──────────┘  └──────────────┘   │                │
│  └─────────────────────────────────────────────────┘                │
└─────────────────────────────────────────────────────────────────────┘
```

## Quick Start

### Prerequisites

- Python 3.11+
- Ollama running locally (http://localhost:11434)
- Redis (optional, for caching)

### Development

```bash
cd services/mana-llm

# Create virtual environment
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows

# Install dependencies
pip install -r requirements.txt

# Copy environment file
cp .env.example .env

# Start Redis (optional)
docker-compose -f docker-compose.dev.yml up -d

# Run service
python -m uvicorn src.main:app --port 3025 --reload
```

### Docker

```bash
# Full stack (mana-llm + Redis)
docker-compose up -d

# View logs
docker-compose logs -f mana-llm
```

## API Endpoints

### Chat Completions

```bash
# Non-streaming
curl -X POST http://localhost:3025/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/gemma3:4b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'

# Streaming (SSE)
curl -X POST http://localhost:3025/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/gemma3:4b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
```
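
Because the endpoint is OpenAI-compatible, the official `openai` Python client should also work when pointed at the service. A minimal sketch, assuming the gateway implements enough of the OpenAI surface and does not validate the API key (the `api_key` value here is a placeholder, not a real credential):

```python
# Sketch: pointing the official openai client at mana-llm.
# Assumption: the gateway ignores or accepts any api_key value.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3025/v1", api_key="unused")

resp = client.chat.completions.create(
    model="ollama/gemma3:4b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```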

### Vision/Multimodal

```bash
curl -X POST http://localhost:3025/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/llava:7b",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
      ]
    }]
  }'
```

### Models

```bash
# List all models
curl http://localhost:3025/v1/models

# Get specific model
curl http://localhost:3025/v1/models/ollama/gemma3:4b
```

### Embeddings

```bash
curl -X POST http://localhost:3025/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ollama/nomic-embed-text",
    "input": "Text to embed"
  }'
```

### Health & Metrics

```bash
# Health check
curl http://localhost:3025/health

# Prometheus metrics
curl http://localhost:3025/metrics
```
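
The metrics come from `src/utils/metrics.py` (see Project Structure below). As a rough illustration of the kind of instrumentation involved, here is a `prometheus_client` sketch; the metric names and labels are hypothetical, not the service's actual ones:

```python
# Illustrative prometheus_client instrumentation; the real metric
# names in src/utils/metrics.py may differ.
from prometheus_client import Counter, Histogram

REQUESTS = Counter(
    "mana_llm_requests_total",      # hypothetical name
    "LLM requests by provider and model",
    ["provider", "model"],
)
LATENCY = Histogram(
    "mana_llm_request_seconds",     # hypothetical name
    "End-to-end request latency",
    ["provider"],
)

# Usage inside a request handler:
# REQUESTS.labels(provider="ollama", model="gemma3:4b").inc()
# with LATENCY.labels(provider="ollama").time():
#     ...  # forward the request to the provider adapter
```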

## Provider Routing

Models use the format `provider/model`:

| Model | Provider | Target |
|-------|----------|--------|
| `ollama/gemma3:4b` | Ollama | localhost:11434 |
| `ollama/llava:7b` | Ollama | localhost:11434 |
| `openrouter/meta-llama/llama-3.1-8b-instruct` | OpenRouter | api.openrouter.ai |
| `groq/llama-3.1-8b-instant` | Groq | api.groq.com |
| `together/meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo` | Together | api.together.xyz |

**Default:** If no provider prefix is given (e.g., `gemma3:4b`), Ollama is used.
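
A minimal sketch of what the prefix parsing could look like; the actual logic lives in `src/providers/router.py` and may differ. Note that only the first slash can separate the provider, since OpenRouter model IDs contain slashes of their own:

```python
# Illustrative provider-prefix parsing; not the real router.py code.
KNOWN_PROVIDERS = {"ollama", "openrouter", "groq", "together"}

def parse_model(model: str) -> tuple[str, str]:
    """Split 'provider/model' on the first slash; default to Ollama."""
    prefix, _, rest = model.partition("/")
    if rest and prefix in KNOWN_PROVIDERS:
        return prefix, rest
    return "ollama", model  # no known prefix -> Ollama default

assert parse_model("gemma3:4b") == ("ollama", "gemma3:4b")
assert parse_model("openrouter/meta-llama/llama-3.1-8b-instruct") == (
    "openrouter", "meta-llama/llama-3.1-8b-instruct")
```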

## Configuration

Environment variables (see `.env.example`):

| Variable | Default | Description |
|----------|---------|-------------|
| `PORT` | 3025 | Service port |
| `LOG_LEVEL` | info | Logging level |
| `OLLAMA_URL` | http://localhost:11434 | Ollama server URL |
| `OLLAMA_DEFAULT_MODEL` | gemma3:4b | Default Ollama model |
| `OLLAMA_TIMEOUT` | 120 | Ollama request timeout (seconds) |
| `OPENROUTER_API_KEY` | - | OpenRouter API key |
| `GROQ_API_KEY` | - | Groq API key |
| `TOGETHER_API_KEY` | - | Together API key |
| `REDIS_URL` | - | Redis URL for caching |
| `CACHE_TTL` | 3600 | Cache TTL in seconds |
| `CORS_ORIGINS` | localhost | Allowed CORS origins |
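
Since `config.py` loads these via pydantic-settings (see Project Structure below), the settings class plausibly maps the table one-to-one. A sketch under that assumption; field names and grouping in the real `src/config.py` may differ:

```python
# Sketch of a pydantic-settings class matching the table above;
# the actual src/config.py may be organized differently.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Env var matching is case-insensitive, so PORT -> port, etc.
    model_config = SettingsConfigDict(env_file=".env")

    port: int = 3025
    log_level: str = "info"
    ollama_url: str = "http://localhost:11434"
    ollama_default_model: str = "gemma3:4b"
    ollama_timeout: int = 120  # seconds
    openrouter_api_key: str | None = None
    groq_api_key: str | None = None
    together_api_key: str | None = None
    redis_url: str | None = None
    cache_ttl: int = 3600  # seconds
    cors_origins: str = "localhost"

settings = Settings()
```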

## Project Structure

```
services/mana-llm/
├── src/
│   ├── main.py              # FastAPI app entry point
│   ├── config.py            # Settings via pydantic-settings
│   ├── providers/
│   │   ├── base.py          # Abstract provider interface
│   │   ├── ollama.py        # Ollama provider
│   │   ├── openai_compat.py # OpenAI-compatible provider
│   │   └── router.py        # Provider routing logic
│   ├── models/
│   │   ├── requests.py      # Request Pydantic models
│   │   └── responses.py     # Response Pydantic models
│   ├── streaming/
│   │   └── sse.py           # SSE response handling
│   └── utils/
│       ├── cache.py         # Redis caching
│       └── metrics.py       # Prometheus metrics
├── tests/
│   ├── test_api.py          # API endpoint tests
│   ├── test_providers.py    # Provider tests
│   └── test_streaming.py    # Streaming tests
├── Dockerfile
├── docker-compose.yml
├── docker-compose.dev.yml
├── requirements.txt
├── pyproject.toml
└── .env.example
```
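
A rough sketch of what the abstract interface in `providers/base.py` might look like; the method names and signatures here are assumptions for illustration, not the real API:

```python
# Hypothetical shape of src/providers/base.py; the actual interface
# may use different method names and types.
from abc import ABC, abstractmethod
from typing import AsyncIterator

class Provider(ABC):
    @abstractmethod
    async def chat(self, model: str, messages: list[dict], **kwargs) -> dict:
        """Return a full OpenAI-style chat completion response."""

    @abstractmethod
    def chat_stream(
        self, model: str, messages: list[dict], **kwargs
    ) -> AsyncIterator[dict]:
        """Yield OpenAI-style completion chunks for SSE streaming."""

    @abstractmethod
    async def embeddings(self, model: str, input: str | list[str]) -> dict:
        """Return an OpenAI-style embeddings response."""
```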

## Testing

```bash
# Run tests
pytest

# Run with coverage
pytest --cov=src

# Run specific test file
pytest tests/test_providers.py -v
```

## Integration Example

### TypeScript/Node.js Client

```typescript
// Using fetch
const response = await fetch('http://localhost:3025/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'ollama/gemma3:4b',
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: false,
  }),
});

const data = await response.json();
console.log(data.choices[0].message.content);
```

### Streaming with fetch

`EventSource` only supports GET requests, so the SSE stream is read manually from the fetch response body:

```typescript
const response = await fetch('http://localhost:3025/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'ollama/gemma3:4b',
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: true,
  }),
});

const reader = response.body!.getReader();
const decoder = new TextDecoder();
let buffer = '';

outer: while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Buffer across reads: an SSE line can be split between network chunks.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? ''; // keep the trailing partial line

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice(6);
    if (data === '[DONE]') break outer;

    const parsed = JSON.parse(data);
    const content = parsed.choices[0]?.delta?.content;
    if (content) process.stdout.write(content);
  }
}
```

## Related Services

| Service | Port | Description |
|---------|------|-------------|
| mana-tts | 3022 | Text-to-speech service |
| mana-stt | 3023 | Speech-to-text service |
| mana-search | 3021 | Web search & extraction |