Mirror of https://github.com/Memo-2023/mana-monorepo.git (synced 2026-05-14 18:41:08 +02:00)
refactor: remove local AI services from Mac Mini, GPU-only architecture
- Deactivate Ollama, FLUX.2, and Telegram Bot LaunchAgents on Mac Mini
- Remove extra_hosts from mana-llm (no longer needs host.docker.internal)
- Update health-check.sh to monitor GPU server services instead of local ones
- Update status.sh to show GPU server status instead of native services
- Rewrite MAC_MINI_SERVER.md: remove ~400 lines of Ollama/FLUX/Bot docs, add GPU server architecture diagram and deactivation notes
- Update CAPACITY_PLANNING.md with post-offload numbers (~80-150 peak users)

Mac Mini is now a pure hosting server (Web, API, DB, Sync). All AI workloads run on the GPU server (RTX 3090) via LAN.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent 99f15955fe
commit b45ddbbb83

5 changed files with 109 additions and 369 deletions
**mana-llm service (Docker Compose)**

@@ -1311,8 +1311,6 @@ services:
       AUTO_FALLBACK_ENABLED: "true"
       OLLAMA_MAX_CONCURRENT: 5
       CORS_ORIGINS: https://playground.mana.how,https://mana.how,https://chat.mana.how
-    extra_hosts:
-      - "host.docker.internal:host-gateway"
     ports:
       - "3020:3020"
     healthcheck:
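With the host-gateway mapping gone, mana-llm must reach the GPU server directly over the LAN. A minimal sanity-check sketch (the container name `mana-llm` and the presence of curl in the image are assumptions):

```bash
# Sketch: confirm the container reaches the GPU server's Ollama
# without host.docker.internal (assumes a container named "mana-llm"
# whose image ships curl).
docker exec mana-llm curl -sf http://192.168.178.11:11434/api/version \
  && echo "GPU Ollama reachable over LAN" \
  || echo "GPU Ollama NOT reachable"
```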
**CAPACITY_PLANNING.md**

@@ -40,13 +40,10 @@ As of: 2026-03-28
 | Misc (Watchtower, Landing Builder, LLM) | 4 | ~0.5 GB |
 | **Total** | **61** | **~10.6 GB** |
 
-### Native Services
+### Native Services (deactivated since 2026-03-28)
 
-| Service | RAM (idle) | RAM (active) |
-|---------|-----------|--------------|
-| Ollama (Gemma 3 4B) | ~0 MB (unloaded after 5 min) | ~3.3 GB |
-| Ollama (Gemma 3 27B) | ~0 MB | ~16 GB (the entire RAM!) |
-| FLUX.2 klein | ~0.5 GB | ~2 GB |
+Ollama, FLUX.2, and the Telegram Bot have been migrated to the GPU server.
+No native AI services remain on the Mac Mini.
 
 ### RAM Budget
 
@@ -55,11 +52,11 @@ Available: 16.0 GB
 Docker containers:  -10.6 GB
 macOS overhead:      -1.5 GB
 ─────────────────────────────
-Free:                 3.9 GB  ← for Ollama, builds, peaks
+Free for builds/peaks: 3.9 GB ← stable, no Ollama contention
 ```
 
-**Critical:** With Ollama active (3.3 GB for the 4B model), only ~0.6 GB remains for peaks.
-The build script therefore stops 13 monitoring containers (~2 GB) before building.
+No more RAM contention with LLM models. The build script only needs to stop the
+monitoring containers for large multi-app builds.
 
 ## Capacity Estimate by Workload Type
 
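The ~10.6 GB container figure can be re-measured on the box at any time; a quick sketch using `docker stats`:

```bash
# Sketch: sum the current RAM usage of all running containers
# and compare against the ~10.6 GB budget above.
docker stats --no-stream --format '{{.MemUsage}}' \
  | awk '{v = $1
          if (v ~ /GiB/)      { sub(/GiB/, "", v); s += v }
          else if (v ~ /MiB/) { sub(/MiB/, "", v); s += v / 1024 }
         } END { printf "Containers total: %.1f GiB\n", s }'
```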
@@ -81,39 +78,41 @@ Apps like Todo, Calendar, Clock, Zitare, Contacts, etc.
 | API requests/sec | **~100-200** | NestJS/Hono can handle more, the DB is the limit |
 | Bottleneck | PostgreSQL connections + RAM | |
 
-### Tier 3: AI Workloads (Ollama, FLUX.2)
+### Tier 3: AI Workloads (GPU server, RTX 3090)
 
 | Metric | Value | Rationale |
 |--------|-------|-----------|
-| Concurrent LLM requests | **1** | OLLAMA_NUM_PARALLEL=1, model occupies 3-16 GB |
-| LLM throughput | **~53 tokens/sec** (4B) | ~260 tokens/sec prompt processing |
-| Image generation | **1 concurrent** | ~1.5s per 1024x1024 image |
-| Bottleneck | **RAM** (Ollama + containers compete) | |
+| Concurrent LLM requests | **5** | OLLAMA_MAX_CONCURRENT=5, 24 GB VRAM |
+| LLM throughput | **~80-100 tokens/sec** (12B) | CUDA considerably faster than Metal |
+| Image generation | **3-5 concurrent** | ~0.5s per 1024x1024 image |
+| Bottleneck | **VRAM (24 GB)** | But no conflict with hosting |
 
-### Overall Estimate
+### Overall Estimate (after GPU offload)
 
 | Scenario | Max. concurrent users |
 |----------|-----------------------|
 | Local-first apps only | ~200 |
-| Mixed (local-first + API) | ~50-100 |
-| With active LLM usage | ~20-30 |
-| Peak (all services + LLM + image gen) | **~10-20** |
+| Mixed (local-first + API) | ~100-150 |
+| With active LLM usage | ~80-120 |
+| Peak (all services + LLM + image gen) | **~80-150** |
 
 ## Bottleneck Analysis
 
 | Rank | Bottleneck | Impact | Remedy |
 |------|-----------|--------|--------|
-| 1 | **RAM (16 GB)** | Ollama + containers fight over memory | RAM upgrade (newer Mac Mini) or GPU server for LLM |
-| 2 | **Cloudflare Tunnel latency** | ~4s TTFB for first requests | CDN/Workers for static assets |
-| 3 | **PostgreSQL connections** | Max 20 per service, shared DB | Connection pooling (PgBouncer) |
-| 4 | **Single server** | No failover, no horizontal scaling | Second Mac Mini or cloud burst |
+| 1 | **Cloudflare Tunnel latency** | ~4s TTFB for first requests | CDN/Workers for static assets |
+| 2 | **PostgreSQL connections** | Max 20 per service, shared DB | Connection pooling (PgBouncer) |
+| 3 | **Single server** | No failover, no horizontal scaling | Second Mac Mini or cloud burst |
+| 4 | **GPU server LAN latency** | <1 ms, negligible | No action needed |
 
 ## Scaling Roadmap
 
 ### Phase 1: Optimization (0 EUR)
 
 - [x] Attach GPU server via LAN → all AI load moved off the Mac Mini
 - [x] Ollama/FLUX.2/Telegram Bot deactivated on the Mac Mini
 - [x] Signup limit implemented (MAX_DAILY_SIGNUPS, default: unlimited)
 - [x] Health checks and status.sh switched to the GPU server
 - [ ] Set up PgBouncer for connection pooling
 - [ ] Cloudflare cache rules for static assets
 - [ ] Enable the signup limit (5/day) in .env on the server
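The Tier 3 concurrency figures are easy to spot-check from any LAN host; a rough sketch (model name `gemma3:12b` as in the architecture diagram further down; whether the sixth request actually queues depends on the parallelism settings of the Ollama instance on the GPU box, since OLLAMA_MAX_CONCURRENT is read by mana-llm):

```bash
# Sketch: start 6 generations at once and compare wall times;
# with a concurrency limit of 5, the sixth should take visibly longer.
for i in $(seq 1 6); do
  ( time curl -s http://192.168.178.11:11434/api/generate -d '{
      "model": "gemma3:12b",
      "prompt": "Answer with one word: ping?",
      "stream": false
    }' >/dev/null ) 2>&1 | grep real &
done
wait
```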
**MAC_MINI_SERVER.md**

@@ -18,45 +18,37 @@ Cloudflare Tunnel (cloudflared)
 ┌─────────────────────────────────────────────────────────────┐
 │ Mac Mini M4 (mana-server)                                   │
 │                                                             │
-│  ┌─────────────────┐  ┌─────────────────┐  ┌────────────┐   │
-│  │   PostgreSQL    │  │      Redis      │  │   Ollama   │   │
-│  │    (Docker)     │  │    (Docker)     │  │  (native)  │   │
-│  └─────────────────┘  └─────────────────┘  └────────────┘   │
+│  ┌─────────────────┐  ┌─────────────────┐                   │
+│  │   PostgreSQL    │  │      Redis      │                   │
+│  │    (Docker)     │  │    (Docker)     │                   │
+│  └─────────────────┘  └─────────────────┘                   │
 │                                                             │
 │  ┌─────────────────────────────────────────────────────┐    │
-│  │                  Docker Container                   │    │
+│  │          Docker Container (~61 services)            │    │
 │  │ ├── mana-core-auth (Port 3001)                      │    │
 │  │ ├── dashboard-web (Port 5173)                       │    │
 │  │ ├── chat-backend (Port 3002)                        │    │
 │  │ ├── chat-web (Port 3000)                            │    │
 │  │ ├── todo-backend (Port 3018)                        │    │
 │  │ ├── todo-web (Port 5188)                            │    │
 │  │ ├── calendar-backend (Port 3016)                    │    │
 │  │ ├── calendar-web (Port 5186)                        │    │
 │  │ ├── clock-backend (Port 3017)                       │    │
-│  │ └── clock-web (Port 5187)                           │    │
+│  │ ├── clock-web (Port 5187)                           │    │
+│  │ ├── mana-sync (Go) (Port 3050)                      │    │
+│  │ ├── mana-llm (Port 3020)                            │    │
+│  │ └── ... (19 web apps, core services, monitoring)    │    │
 │  └─────────────────────────────────────────────────────┘    │
 │                          ▲                                  │
-│                          │ host.docker.internal:11434       │
+│                          │                                  │
+│                          │ LAN (192.168.178.11)             │
 │                          ▼                                  │
-│  ┌─────────────────────────────────────────────────────┐    │
-│  │        Ollama (Port 11434) - Gemma 3 4B             │    │
-│  │    ~53 t/s Generation | Metal GPU Acceleration      │    │
-│  └─────────────────────────────────────────────────────┘    │
-│                                                             │
 │  ┌─────────────────────────────────────────────────────┐    │
-│  │                   Native Services                   │    │
-│  │ ├── Ollama (Port 11434) - LLM                       │    │
-│  │ ├── Mana Image Gen (Port 3025) - FLUX.2 klein       │    │
-│  │ └── Telegram Ollama Bot (Port 3301) - Chat Bot      │    │
+│  │     GPU Server (Windows, RTX 3090, 24 GB VRAM)      │    │
+│  │ ├── Ollama (Port 11434) - gemma3:12b                │    │
+│  │ ├── STT (Whisper) (Port 3020)                       │    │
+│  │ ├── TTS (Port 3022)                                 │    │
+│  │ └── Image Gen (FLUX) (Port 3023)                    │    │
 │  └─────────────────────────────────────────────────────┘    │
 │                                                             │
 │  ┌─────────────────────────────────────────────────────┐    │
 │  │             LaunchAgents (Autostart)                │    │
 │  │ ├── cloudflared (Tunnel)                            │    │
-│  │ ├── ollama (LLM Service)                            │    │
-│  │ ├── mana-image-gen (image generation)               │    │
-│  │ ├── telegram-ollama-bot (Chat Bot)                  │    │
 │  │ ├── docker-startup (containers at boot)             │    │
 │  │ └── health-check (every 5 minutes)                  │    │
 │  └─────────────────────────────────────────────────────┘    │
@@ -166,14 +158,14 @@ launchctl load ~/Library/LaunchAgents/com.manacore.docker-startup.plist
 
 ## Autostart Configuration
 
-Five LaunchAgents keep the system running automatically:
+Three LaunchAgents keep the system running automatically:
 
 ### 1. Cloudflare Tunnel
 
 **File:** `~/Library/LaunchAgents/com.cloudflare.cloudflared.plist`
 
 - Starts at login
 - Keeps the tunnel to Cloudflare open
 - Automatic restart on crash
 
 ### 2. Docker Container Startup
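Whether exactly these three agents remain active can be checked with `launchctl`; a small sketch using the labels from the plist names above:

```bash
# Sketch: list the agents that should still be loaded after the migration.
launchctl list | grep -E 'com.cloudflare.cloudflared|com.manacore.docker-startup|com.manacore.health-check'
```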
@@ -182,32 +174,23 @@ Five LaunchAgents keep the system running automatically:
 
 - Starts at login
 - Waits for Docker Desktop
 - Runs `docker compose up -d`
 - Creates missing databases automatically
 
 ### 3. Health Check
 
 **File:** `~/Library/LaunchAgents/com.manacore.health-check.plist`
 
 - Runs every 5 minutes
 - Checks all services (HTTP + Docker)
 - Sends notifications on failures
 
-### 4. Ollama
+### Deactivated LaunchAgents
 
-**File:** `~/Library/LaunchAgents/homebrew.mxcl.ollama.plist`
-
-- Starts at login
-- LLM server on port 11434
-- Metal GPU acceleration
-
-### 5. Telegram Ollama Bot
-
-**File:** `~/Library/LaunchAgents/com.manacore.telegram-ollama-bot.plist`
-
-- Starts at login
-- Telegram bot on port 3301
-- Connects to Ollama for LLM requests
+These LaunchAgents have been deactivated since the GPU server migration:
+- `homebrew.mxcl.ollama.plist` — LLM runs on the GPU server
+- `com.manacore.image-gen.plist` — image generation runs on the GPU server
+- `com.manacore.telegram-ollama-bot.plist` — bot deactivated
 
 ### Re-running Setup
 
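How the deactivation itself might look on the Mac Mini, as a hedged sketch (that `brew services stop` manages the Homebrew Ollama agent is an assumption):

```bash
# Sketch: unload the migrated agents so they no longer start at login.
/opt/homebrew/bin/brew services stop ollama   # assumed to manage homebrew.mxcl.ollama.plist
launchctl unload ~/Library/LaunchAgents/com.manacore.image-gen.plist
launchctl unload ~/Library/LaunchAgents/com.manacore.telegram-ollama-bot.plist
```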
@@ -514,17 +497,40 @@ docker image prune -a
 | `deploy.sh` | Pulls new images and restarts |
 | `build-app.sh` | Builds individual apps (stops monitoring to free RAM) |
 
-## Ollama (Local AI)
-
-Ollama runs natively on the Mac Mini for local LLM inference (classification, text analysis, etc.).
-
-### Hardware
+## Hardware
 
 - **Chip:** Apple M4 (10 cores)
 - **RAM:** 16 GB unified memory
 - **Internal SSD:** 228 GB
 - **External SSD:** 4 TB (ManaData)
 
+## AI Workloads (GPU Server)
+
+All AI services (LLM, image generation, STT, TTS) run on the Windows GPU server (RTX 3090, 24 GB VRAM) at `192.168.178.11`. The Mac Mini is a pure hosting server for web, API, DB, and sync.
+
+| Service | GPU server port | Access from Docker |
+|---------|-----------------|--------------------|
+| Ollama (LLM) | 11434 | `http://192.168.178.11:11434` |
+| STT (Whisper) | 3020 | `http://192.168.178.11:3020` |
+| TTS | 3022 | `http://192.168.178.11:3022` |
+| Image Gen | 3023 | `http://192.168.178.11:3023` |
+
+All values can be overridden via env vars (`OLLAMA_URL`, `STT_SERVICE_URL`, `TTS_SERVICE_URL`, `IMAGE_GEN_SERVICE_URL`).
+
+Cloud fallback if the GPU server goes down: `mana-llm` has `AUTO_FALLBACK_ENABLED=true` (OpenRouter, Groq, Google).
+
+### Ollama/FLUX.2 on the Mac Mini (deactivated)
+
+Ollama and FLUX.2 used to be installed locally but have been deactivated since 2026-03-28. The models remain on the SSD as a backup:
+- `/Volumes/ManaData/ollama/` (~58 GB)
+- `/Volumes/ManaData/flux2/` (~15 GB)
+
+To reactivate if needed:
+```bash
+brew services start ollama
+launchctl load ~/Library/LaunchAgents/com.manacore.image-gen.plist
+```
 
 ## External 4 TB SSD
 
 The external SSD is used for persistent data - both for large files (AI models) and for critical databases (PostgreSQL, MinIO).
 
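To point a single service at a different GPU box, only the env vars from the AI workloads section above need to change; a sketch where the hostname `gpu2.lan` and the image name `mana-llm:latest` are hypothetical:

```bash
# Sketch: override the default GPU endpoints for one container run.
# The defaults are the 192.168.178.11 values from the table above.
docker run --rm \
  -e OLLAMA_URL=http://gpu2.lan:11434 \
  -e IMAGE_GEN_SERVICE_URL=http://gpu2.lan:3023 \
  mana-llm:latest
```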
@@ -583,13 +589,13 @@ The following services use direct SSD mounts (no Docker volume):
 | PostgreSQL | `/Volumes/ManaData/postgres` | `volumes: - /Volumes/ManaData/postgres:/var/lib/postgresql/data` |
 | MinIO | `/Volumes/ManaData/minio` | `volumes: - /Volumes/ManaData/minio:/data` |
 
-### Symlinks (for native services)
+### Symlinks (archived, for backup models)
 
-| Original | Symlink |
-|----------|---------|
-| `~/.ollama` | `/Volumes/ManaData/ollama` |
-| `~/stt-models` | `/Volumes/ManaData/stt-models` |
-| `~/flux2` | `/Volumes/ManaData/flux2` |
+| Original | Symlink | Status |
+|----------|---------|--------|
+| `~/.ollama` | `/Volumes/ManaData/ollama` | Deactivated (GPU server) |
+| `~/stt-models` | `/Volumes/ManaData/stt-models` | Deactivated (GPU server) |
+| `~/flux2` | `/Volumes/ManaData/flux2` | Deactivated (GPU server) |
 
 ### Checking the SSD
 
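Even with the native services off, these symlinks still decide where the backup models live; a quick sketch to verify they still resolve to the SSD:

```bash
# Sketch: confirm the archived symlinks still point at /Volumes/ManaData.
for link in ~/.ollama ~/stt-models ~/flux2; do
  printf '%s -> %s\n' "$link" "$(readlink "$link")"
done
```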
@@ -626,275 +632,6 @@ Docker Desktop needs "Full Disk Access" for SSD mounts:
 System Settings → Privacy & Security → Full Disk Access → Docker.app ✅
 ```
 
-### Installation
-
-```bash
-# Already installed via Homebrew
-/opt/homebrew/bin/brew install ollama
-/opt/homebrew/bin/brew services start ollama
-```
-
-### Configuration
-
-**LaunchAgent:** `~/Library/LaunchAgents/homebrew.mxcl.ollama.plist`
-
-Optimizations already enabled:
-- `OLLAMA_KEEP_ALIVE=5m` - unload models from RAM after 5 min of inactivity (saves 3-16 GB)
-- `OLLAMA_FLASH_ATTENTION=1` - faster attention computation
-- `OLLAMA_KV_CACHE_TYPE=q8_0` - more efficient KV cache
-- `OLLAMA_NUM_PARALLEL=1` - max 1 parallel request (predictable RAM)
-- `OLLAMA_MAX_LOADED_MODELS=1` - max 1 model in RAM at a time
-
-Setup script: `./scripts/mac-mini/configure-ollama.sh`
-
-### Storage Location
-
-The models live on the external 4 TB SSD for more space:
-- **Path:** `/Volumes/ManaData/ollama/models`
-- **Symlink:** `~/.ollama -> /Volumes/ManaData/ollama`
-
-### Available Models
-
-| Model | Size | Type | Performance | Purpose |
-|-------|------|------|-------------|---------|
-| gemma3:4b | 3.3 GB | Text | ~53 t/s | Default - fast |
-| gemma3:12b | 8 GB | Text | ~30 t/s | Recommended - good balance |
-| gemma3:27b | 16 GB | Text | ~15 t/s | Best quality |
-| phi3.5:latest | 2.2 GB | Text | ~60 t/s | Microsoft - compact |
-| ministral-3:3b | 3 GB | Text | ~55 t/s | Mistral mini |
-| llava:7b | 4.7 GB | Vision | ~25 t/s | Image understanding |
-| qwen3-vl:4b | 3.3 GB | Vision | ~40 t/s | Vision-language |
-| deepseek-ocr:latest | 6.7 GB | Vision | ~20 t/s | OCR & documents |
-| qwen2.5-coder:7b | 4.7 GB | Code | ~35 t/s | Code generation |
-| qwen2.5-coder:14b | 10 GB | Code | ~20 t/s | Extended code generation |
-
-See [OLLAMA_MODELS.md](./OLLAMA_MODELS.md) for details on adding new models.
-
-```bash
-# List models
-/opt/homebrew/bin/ollama list
-
-# Pull a new model
-/opt/homebrew/bin/ollama pull gemma3:12b
-```
-
-### Performance (measured)
-
-| Metric | Value |
-|--------|-------|
-| Text generation | ~53 tokens/sec |
-| Prompt processing | ~260 tokens/sec |
-| Latency (short request) | ~0.4 sec |
-
-### API Access
-
-**Local endpoint:** `http://localhost:11434`
-
-```bash
-# Generate API
-curl http://localhost:11434/api/generate -d '{
-  "model": "gemma3:4b",
-  "prompt": "Classify: newsletter or spam?",
-  "stream": false
-}'
-
-# OpenAI-compatible API
-curl http://localhost:11434/v1/chat/completions -d '{
-  "model": "gemma3:4b",
-  "messages": [{"role": "user", "content": "Hello"}]
-}'
-```
-
-### Access from Docker Containers
-
-Docker containers can reach Ollama via `host.docker.internal`:
-
-```bash
-# From inside a container
-curl http://host.docker.internal:11434/api/generate -d '...'
-```
-
-Or via Docker Compose environment variables:
-```yaml
-environment:
-  OLLAMA_URL: http://host.docker.internal:11434
-```
-
-### Ollama Management
-
-```bash
-# Service status
-/opt/homebrew/bin/brew services info ollama
-
-# Restart the service
-/opt/homebrew/bin/brew services restart ollama
-
-# Check logs
-tail -f /opt/homebrew/var/log/ollama.log
-
-# Remove a model
-/opt/homebrew/bin/ollama rm gemma3:4b
-```
-
-### Troubleshooting
-
-```bash
-# Check whether Ollama is running
-curl http://localhost:11434/api/version
-
-# Check GPU usage (should use Metal)
-/opt/homebrew/bin/ollama ps
-
-# If problems persist: restart the service
-/opt/homebrew/bin/brew services restart ollama
-```
-
-## Mana Image Generation (FLUX.2 klein)
-
-Local image generation with FLUX.2 klein 4B via flux2.c.
-
-### Service Info
-
-| | |
-|--|--|
-| **Port** | 3025 |
-| **Health** | http://localhost:3025/health |
-| **Code** | `services/mana-image-gen/` |
-| **Model** | FLUX.2 klein 4B (4 billion parameters) |
-| **License** | Apache 2.0 (commercial use allowed) |
-
-### Installation
-
-```bash
-# Run the setup script (installs flux2.c + model)
-./scripts/mac-mini/setup-image-gen.sh
-```
-
-The script:
-1. Compiles flux2.c with MPS support
-2. Downloads the FLUX.2 klein 4B model (~16 GB)
-3. Sets up the Python environment
-4. Creates a LaunchAgent for autostart
-
-### Performance
-
-| Resolution | Steps | Time |
-|------------|-------|------|
-| 512x512 | 4 | ~0.3s |
-| 1024x1024 | 4 | ~0.8s |
-| 1024x1024 | 8 | ~1.5s |
-
-### API Access
-
-**Local endpoint:** `http://localhost:3025`
-
-```bash
-# Health check
-curl http://localhost:3025/health
-
-# Generate an image
-curl -X POST http://localhost:3025/generate \
-  -H "Content-Type: application/json" \
-  -d '{"prompt": "A cat in space", "width": 1024, "height": 1024}'
-
-# Fetch the image
-curl http://localhost:3025/images/{filename} --output image.png
-```
-
-### Access from Docker Containers
-
-```yaml
-environment:
-  IMAGE_GEN_SERVICE_URL: http://host.docker.internal:3025
-```
-
-### Management
-
-```bash
-# Show logs
-tail -f /tmp/manacore-image-gen.log
-
-# Restart the service
-launchctl unload ~/Library/LaunchAgents/com.manacore.image-gen.plist
-launchctl load ~/Library/LaunchAgents/com.manacore.image-gen.plist
-
-# Check status
-launchctl list | grep image-gen
-```
-
-## Telegram Ollama Bot
-
-Telegram bot for interacting with the local Ollama LLM.
-
-### Bot Info
-
-| | |
-|--|--|
-| **Telegram** | [@chat_mana_bot](https://t.me/chat_mana_bot) |
-| **Port** | 3301 |
-| **Health** | http://localhost:3301/health |
-| **Code** | `services/telegram-ollama-bot/` |
-
-### Telegram Commands
-
-| Command | Description |
-|---------|-------------|
-| `/start` | Show help |
-| `/help` | All commands |
-| `/models` | Show available models |
-| `/model [name]` | Switch model |
-| `/mode [mode]` | Change system prompt |
-| `/clear` | Clear chat history |
-| `/status` | Check Ollama status |
-
-### Modes
-
-| Mode | Description |
-|------|-------------|
-| `default` | General assistant |
-| `classify` | Text classification |
-| `summarize` | Summaries |
-| `translate` | Translations |
-| `code` | Programming help |
-
-### LaunchAgent
-
-**File:** `~/Library/LaunchAgents/com.manacore.telegram-ollama-bot.plist`
-
-- Starts automatically at login
-- Restart on crash (KeepAlive)
-- Logs: `~/Library/Logs/telegram-ollama-bot.log`
-
-### Bot Management
-
-```bash
-# Check status
-curl http://localhost:3301/health
-
-# Show logs
-tail -f ~/Library/Logs/telegram-ollama-bot.log
-
-# Restart the bot
-launchctl stop com.manacore.telegram-ollama-bot
-launchctl start com.manacore.telegram-ollama-bot
-
-# Start the bot manually (for debugging)
-cd ~/projects/manacore-monorepo/services/telegram-ollama-bot
-TELEGRAM_BOT_TOKEN=xxx OLLAMA_URL=http://localhost:11434 node dist/main.js
-```
-
-### Updating the Bot
-
-```bash
-cd ~/projects/manacore-monorepo
-git pull
-cd services/telegram-ollama-bot
-pnpm install
-pnpm build
-launchctl stop com.manacore.telegram-ollama-bot
-launchctl start com.manacore.telegram-ollama-bot
-```
-
 ## Matrix (GDPR-compliant Messaging)
 
 Matrix is a GDPR-compliant alternative to Telegram for bot communication.
@@ -913,7 +650,7 @@ All Matrix bots run as Docker containers and are deployed via GHCR (GitHub Container Registry)
 | Bot | Port | Description |
 |-----|------|-------------|
 | matrix-mana-bot | 4010 | Gateway - all features in one bot |
-| matrix-ollama-bot | 4011 | AI chat via local Ollama |
+| matrix-ollama-bot | 4011 | AI chat via GPU-server Ollama |
 | matrix-stats-bot | 4012 | Server statistics & monitoring |
 | matrix-project-doc-bot | 4013 | Project documentation from photos/voice/text |
 | matrix-todo-bot | 4014 | Task management |
@@ -973,12 +710,13 @@ See [MATRIX_SELF_HOSTING.md](./MATRIX_SELF_HOSTING.md) for detailed instructions
 ## Setup Chronology
 
 1. **Docker Setup** - PostgreSQL, Redis, app containers
 2. **Cloudflare Tunnel** - public reachability
 3. **SSH via Cloudflare Access** - secure remote access
 4. **LaunchAgents** - autostart at boot
 5. **Health Checks** - automatic monitoring
 6. **Telegram Notifications** - alerts on failures
 7. **Email Notifications** - redundant notification channel
-8. **Ollama** - local LLM inference (Gemma 3 4B)
-9. **Telegram Ollama Bot** - chat interface for Ollama
+8. ~~**Ollama** - local LLM inference~~ → migrated to the GPU server (2026-03-28)
+9. ~~**Telegram Ollama Bot**~~ → deactivated (2026-03-28)
 10. **Matrix Synapse** - GDPR-compliant messaging
+11. **GPU Server Offload** - all AI workloads on the RTX 3090 (2026-03-28)
**health-check.sh**

@@ -254,11 +254,17 @@ check_service "Photos Web" "http://localhost:5019/health"
 
 echo ""
 echo "Core Services:"
 # API Gateway disabled - no GHCR image, no Dockerfile
 check_service "Search Service" "http://localhost:3020/api/v1/health"
 check_service "Media Service" "http://localhost:3015/api/v1/health"
-check_service "LLM Service" "http://localhost:3025/health"
+
+echo ""
+echo "GPU Server (192.168.178.11):"
+check_service "GPU Ollama" "http://192.168.178.11:11434/api/version" 3
+check_service "GPU STT" "http://192.168.178.11:3020/health" 3
+check_service "GPU TTS" "http://192.168.178.11:3022/health" 3
+check_service "GPU Image Gen" "http://192.168.178.11:3023/health" 3
 
 echo ""
 echo "Matrix:"
 check_service "Synapse" "http://localhost:4000/health"
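The new third argument is a per-check timeout for the remote GPU host. The actual helper lives elsewhere in health-check.sh; a plausible shape consistent with these calls would be:

```bash
# Sketch (assumed implementation): check_service NAME URL [TIMEOUT].
# The real function in health-check.sh may differ.
check_service() {
    local name=$1
    local url=$2
    local timeout=${3:-2}
    if curl -sf --max-time "$timeout" "$url" >/dev/null 2>&1; then
        echo "[OK]   $name"
    else
        echo "[FAIL] $name"
    fi
}
```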
**status.sh**

@@ -46,7 +46,6 @@ check_launchd() {
 check_launchd "com.cloudflare.cloudflared" "Cloudflared Tunnel"
 check_launchd "com.manacore.docker-startup" "Docker Startup"
 check_launchd "com.manacore.health-check" "Health Check (5min)"
-check_launchd "com.manacore.stt" "STT Service (Whisper/Voxtral)"
 
 # ============================================
 # Docker Status
@@ -85,25 +84,25 @@ if docker info >/dev/null 2>&1; then
 fi
 
 # ============================================
-# Native Services (non-Docker)
+# GPU Server (192.168.178.11)
 # ============================================
 echo ""
-echo -e "${BOLD}Native Services:${NC}"
+echo -e "${BOLD}GPU Server (192.168.178.11):${NC}"
 
-# Ollama
-if curl -s --max-time 2 http://localhost:11434/api/tags >/dev/null 2>&1; then
-    OLLAMA_MODELS=$(curl -s http://localhost:11434/api/tags | grep -o '"name":"[^"]*"' | wc -l | tr -d ' ')
-    echo -e "  ${GREEN}[Running]${NC} Ollama (${OLLAMA_MODELS} models)"
-else
-    echo -e "  ${YELLOW}[Stopped]${NC} Ollama"
-fi
+check_gpu_service() {
+    local name=$1
+    local url=$2
+    if curl -s --max-time 3 "$url" >/dev/null 2>&1; then
+        echo -e "  ${GREEN}[Running]${NC} $name"
+    else
+        echo -e "  ${YELLOW}[Offline]${NC} $name"
+    fi
+}
 
-# STT Service
-if curl -s --max-time 2 http://localhost:3020/health >/dev/null 2>&1; then
-    echo -e "  ${GREEN}[Running]${NC} STT Service (port 3020)"
-else
-    echo -e "  ${YELLOW}[Stopped]${NC} STT Service"
-fi
+check_gpu_service "Ollama (LLM)" "http://192.168.178.11:11434/api/version"
+check_gpu_service "STT (Whisper)" "http://192.168.178.11:3020/health"
+check_gpu_service "TTS" "http://192.168.178.11:3022/health"
+check_gpu_service "Image Gen (FLUX)" "http://192.168.178.11:3023/health"
 
 # ============================================
 # Network/Tunnel Status