diff --git a/docs/PORT_SCHEMA.md b/docs/PORT_SCHEMA.md index 26d0a28ee..bad18d39d 100644 --- a/docs/PORT_SCHEMA.md +++ b/docs/PORT_SCHEMA.md @@ -1,48 +1,49 @@ # Port Schema -> ⚠️ **ASPIRATIONAL — does not match running services as of 2026-04-08.** +> ⚠️ **PARTIALLY ASPIRATIONAL.** The clean range layout below +> (3000–3009 core, 3010–3019 infra, 3020–3029 AI/ML, …) was drafted +> 2026-03-28 as a target state. Many services do match it; many don't. +> Authoritative per-service ports live in each `services/*/CLAUDE.md` +> alongside the source defaults in `app/main.py` / `config.ts` / etc. > -> This document describes a *planned* reorganization of port assignments -> into clean ranges (3000–3009 core, 3010–3019 infra, 3020–3029 AI/ML, …). -> The reorg has not been executed: the actual ports services bind to -> live in their `app/main.py` / `start.sh` / `config.ts`. Per-service -> ports are documented in each `services/*/CLAUDE.md`. +> ### Real ports today (2026-04-08) > -> ### Real ports today +> **Windows GPU server (`192.168.178.11`):** +> - mana-stt `3020` (Scheduled Task `ManaSTT`, public: `gpu-stt.mana.how`) +> - mana-tts `3022` (Task `ManaTTS`, public: `gpu-tts.mana.how`) +> - mana-image-gen `3023` (Task `ManaImageGen`, public: `gpu-img.mana.how`) +> - mana-llm `3025` (Task `ManaLLM`, public: `gpu-llm.mana.how`) +> - mana-video-gen `3026` (Task `ManaVideoGen`, public: `gpu-video.mana.how`) +> - Ollama `11434` (public: `gpu-ollama.mana.how`) > -> **Mac Mini:** +> **Mac Mini (production):** > - mana-auth `3001` -> - mana-stt `3020` (Mac Mini local instance, MLX) -> - mana-image-gen `3025` (Mac Mini, flux2.c, MPS — separate from the -> Windows GPU image-gen on `gpu-img.mana.how` which lives outside the repo) +> - mana-media `3015` +> - mana-search `3021` (overlaps with the planned range slot, not a host +> collision since search runs on Mac Mini and stt runs on the GPU box) +> - mana-crawler `3023` (same — Mac Mini, no host collision with image-gen on GPU) +> - 
mana-notify `3040` > - mana-sync `3050` -> - mana-search `3021`, mana-notify `3040`, mana-crawler `3023`, -> mana-media `3015` > - mana-credits `3061`, mana-user `3062`, mana-subscriptions `3063`, > mana-analytics `3064`, mana-events `3065` > -> **Windows GPU server (`192.168.178.11`):** -> - mana-llm `3025` -> - mana-stt `3020` -> - mana-tts `3022` -> - image-gen (Windows variant, **not the repo's `mana-image-gen`**) `3023` -> - mana-video-gen `3026` -> - Ollama `11434` +> **Not deployed:** `mana-voice-bot` (default port `3024`, no scheduled +> task, no cloudflared route, no launchd plist). > -> ### No production collisions today, but two latent ones in source defaults +> No production port collisions exist today. The two latent collisions +> that PORT_SCHEMA.md previously warned about (image-gen ↔ llm on +> 3025, voice-bot ↔ sync on 3050) were resolved on 2026-04-08 by: +> - Consolidating `mana-image-gen` into the Windows-only diffusers +> variant on port 3023 (the Mac flux2.c variant was deleted) +> - Moving `mana-voice-bot`'s source default from 3050 to 3024 > -> | Latent collision | Why it doesn't bite | What to watch for | -> |---|---|---| -> | mana-image-gen and mana-llm both use `3025` | Different machines (Mac Mini vs Windows GPU); mana-image-gen `setup.sh` hard-fails outside macOS arm64 so it can't be deployed onto the Windows GPU by accident | Don't try to run mana-image-gen and mana-llm on the same host | -> | mana-voice-bot defaults to `3050`, mana-sync also `3050` | mana-voice-bot is not deployed anywhere yet (no launchd plist, no Scheduled Task, no cloudflared route) | Pick a free port for mana-voice-bot before deploying it — current default will collide with mana-sync wherever sync runs | -> -> The previous version of this warning claimed two **active** collisions -> (image-gen ↔ video-gen on 3026, voice-bot ↔ sync on 3050). 
That was -> wrong: image-gen on Mac Mini was overridden to 3025 via a launchd plist -> (now also the source default — see commit history), and voice-bot isn't -> running anywhere. +> Some services still don't match the planned range layout below +> (mana-credits is at 3061 not 3002, mana-user 3062 not 3004, etc). +> Either execute the move and update this doc, or accept reality and +> rewrite the planned tables to reflect what's actually running. **Originally drafted:** 2026-03-28 +**Reality reconciled:** 2026-04-08 ## Principles diff --git a/docs/WINDOWS_GPU_SERVER_SETUP.md b/docs/WINDOWS_GPU_SERVER_SETUP.md index f6ce8441c..d96c98dfb 100644 --- a/docs/WINDOWS_GPU_SERVER_SETUP.md +++ b/docs/WINDOWS_GPU_SERVER_SETUP.md @@ -30,6 +30,7 @@ Start-ScheduledTask -TaskName "ManaLLM" Start-ScheduledTask -TaskName "ManaSTT" Start-ScheduledTask -TaskName "ManaTTS" Start-ScheduledTask -TaskName "ManaImageGen" +Start-ScheduledTask -TaskName "ManaVideoGen" ``` Wenn Schritt 9 (Server-Modus) korrekt konfiguriert ist, sollte der PC: @@ -415,13 +416,37 @@ Text-to-Speech mit mehreren Backends: Bildgenerierung mit FLUX.1-schnell (12B Parameter) via HuggingFace diffusers. 
- **Verzeichnis**: `C:\mana\services\mana-image-gen\` +- **Repo-Pendant**: [`services/mana-image-gen/`](../services/mana-image-gen/) — `service.pyw`, `app/main.py`, `app/flux_service.py`, `app/api_auth.py`, `app/vram_manager.py` - **venv**: `C:\mana\venvs\image-gen\` (PyTorch 2.5.1+cu121) -- **Config**: `C:\mana\services\mana-image-gen\.env` +- **Config**: `C:\mana\services\mana-image-gen\.env` (siehe `services/mana-image-gen/.env.example`) - **Log**: `C:\mana\services\mana-image-gen\service.log` - **Autostart**: Windows Scheduled Task "ManaImageGen" (AtLogOn) - **Modell**: FLUX.1-schnell (Apache 2.0, 4-bit quantisiert via BitsAndBytes) - **HuggingFace**: Erfordert Login + Lizenzakzeptanz für gated Model +### mana-video-gen (Port 3026) + +Videogenerierung mit LTX-Video (~2B Parameter) via HuggingFace diffusers + CUDA. + +- **Verzeichnis**: `C:\mana\services\mana-video-gen\` +- **Repo-Pendant**: [`services/mana-video-gen/`](../services/mana-video-gen/) — `service.pyw`, `app/main.py`, `app/ltx_service.py`, `setup.sh`, `requirements.txt` +- **venv**: `C:\mana\venvs\video-gen\` (PyTorch + CUDA + diffusers) +- **Config**: `C:\mana\services\mana-video-gen\.env` +- **Log**: `C:\mana\services\mana-video-gen\service.log` +- **Autostart**: Windows Scheduled Task "ManaVideoGen" (AtLogOn) +- **Modell**: LTX-Video (Lightricks) +- **HuggingFace**: HF_TOKEN erforderlich für Model-Download + +### Repo-Pendants der anderen GPU-Services + +| Windows-Pfad | Repo-Pfad | +|---|---| +| `C:\mana\services\mana-llm\` | [`services/mana-llm/`](../services/mana-llm/) | +| `C:\mana\services\mana-stt\` | [`services/mana-stt/`](../services/mana-stt/) | +| `C:\mana\services\mana-tts\` | [`services/mana-tts/`](../services/mana-tts/) | + +Jeder Service hat im Repo eine `service.pyw` Datei — das ist der Runner, den die Scheduled Tasks aufrufen. Änderungen an einem Service sollten primär im Repo gemacht und dann auf die Windows-Box gespiegelt werden, nicht andersrum. 
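
After a reboot it is worth confirming that all five GPU services actually came up. A minimal smoke-test sketch (hypothetical helper, not part of the repo; the ports come from this document, but the `GET /health` route is an assumption for these services):

```python
# Hypothetical smoke test for the five GPU services; ports are taken from
# this document, the GET /health route is an assumption.
import urllib.request

GPU_SERVICES = {
    "mana-stt": 3020,
    "mana-tts": 3022,
    "mana-image-gen": 3023,
    "mana-llm": 3025,
    "mana-video-gen": 3026,
}

def check_services(host: str = "192.168.178.11", timeout: float = 3.0) -> dict:
    """Probe each service's /health endpoint; never raises."""
    results = {}
    for name, port in GPU_SERVICES.items():
        try:
            with urllib.request.urlopen(
                f"http://{host}:{port}/health", timeout=timeout
            ) as resp:
                results[name] = f"HTTP {resp.status}"
        except OSError as exc:  # urllib.error.URLError is an OSError subclass
            results[name] = f"unreachable: {exc.__class__.__name__}"
    return results
```

Run it from a machine inside the LAN; a healthy service should report `HTTP 200`, everything else shows up as `unreachable`.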
+ ### Management-Skripte ```powershell @@ -439,6 +464,7 @@ Start-ScheduledTask -TaskName "ManaLLM" Start-ScheduledTask -TaskName "ManaSTT" Start-ScheduledTask -TaskName "ManaTTS" Start-ScheduledTask -TaskName "ManaImageGen" +Start-ScheduledTask -TaskName "ManaVideoGen" # Alle Scheduled Tasks auf einmal anzeigen Get-ScheduledTask -TaskName "Mana*" | Format-Table TaskName, State @@ -738,6 +764,7 @@ Start-ScheduledTask -TaskName "ManaLLM" Start-ScheduledTask -TaskName "ManaSTT" Start-ScheduledTask -TaskName "ManaTTS" Start-ScheduledTask -TaskName "ManaImageGen" +Start-ScheduledTask -TaskName "ManaVideoGen" # Status prüfen python C:\mana\status.py diff --git a/services/mana-voice-bot/CLAUDE.md b/services/mana-voice-bot/CLAUDE.md index da877c06f..81b6ece94 100644 --- a/services/mana-voice-bot/CLAUDE.md +++ b/services/mana-voice-bot/CLAUDE.md @@ -1,132 +1,108 @@ -# CLAUDE.md - Mana Voice Bot +# mana-voice-bot -## Service Overview +German voice-to-voice assistant. Wires together STT (mana-stt), an LLM (Ollama via mana-llm), and TTS (Edge TTS cloud or mana-tts) into a single end-to-end audio pipeline. -German voice-to-voice assistant combining: -- **STT**: Whisper via mana-stt (Port 3020) -- **LLM**: Ollama with Gemma/Qwen (Port 11434) -- **TTS**: Edge TTS (Microsoft, cloud API) +> ⚠️ **Not deployed yet.** This service exists in the repo and runs +> locally for development, but it has no Scheduled Task on the Windows +> GPU server, no launchd plist, no Cloudflare Tunnel hostname, and no +> entry in the production startup scripts. When you're ready to deploy +> it, target the Windows GPU server alongside the other AI services +> (`C:\mana\services\mana-voice-bot\`, Scheduled Task `ManaVoiceBot`, +> `service.pyw` runner, public URL `gpu-voice.mana.how` via the existing +> Mac Mini cloudflared+gpu-proxy chain). 
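
Whenever that deployment happens, first confirm that nothing on the target host already holds the chosen port. A minimal stdlib-only probe (hypothetical helper, not part of the service):

```python
# Hypothetical pre-deploy check: can we still bind the intended port?
import socket

def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """True if no listener currently holds host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

# On the target box: port_is_free(3024) should be True before deploying;
# if it is False, pick another free slot in the AI/ML range.
```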
-**Port**: 3050 +## Tech Stack -## Architecture +| Layer | Technology | +|-------|------------| +| **Runtime** | Python 3.11 + uvicorn | +| **Framework** | FastAPI | +| **STT** | Whisper via mana-stt | +| **LLM** | Ollama via mana-llm (Gemma/Qwen) | +| **TTS** | Edge TTS (Microsoft cloud) — could move to mana-tts later | -``` -Audio Input → Whisper (STT) → Ollama (LLM) → Edge TTS → Audio Output - ↓ ↓ ↓ ↓ - [WAV/MP3] [German Text] [Response] [MP3 Audio] -``` +## Port: 3024 -## Commands +> The default was `3050` until 2026-04-08. That collided with `mana-sync` +> on the Mac Mini and was a latent footgun for any future deployment +> that put both on the same host. Moved to 3024 to fit in the AI/ML +> port range alongside mana-stt (3020), mana-tts (3022), mana-image-gen +> (3023), and mana-llm (3025). + +## Quick Start (local dev) ```bash -# Setup +cd services/mana-voice-bot ./setup.sh - -# Development -source venv/bin/activate -uvicorn app.main:app --host 0.0.0.0 --port 3050 --reload - -# Production ./start.sh - -# Test -curl http://localhost:3050/health +# or directly: +uvicorn app.main:app --host 0.0.0.0 --port 3024 --reload ``` ## API Endpoints -| Endpoint | Method | Description | -|----------|--------|-------------| -| `/health` | GET | Service health check | -| `/voices` | GET | List German TTS voices | -| `/models` | GET | List available Ollama models | -| `/transcribe` | POST | Audio → Text (STT only) | -| `/chat` | POST | Text → Text (LLM only) | -| `/chat/audio` | POST | Text → Audio (LLM + TTS) | -| `/tts` | POST | Text → Audio (TTS only) | -| `/voice` | POST | Audio → Audio (Full pipeline) | -| `/voice/metadata` | POST | Audio → JSON (Full pipeline, no audio) | +| Method | Path | Description | +|--------|------|-------------| +| GET | `/health` | Service health check | +| GET | `/voices` | List German TTS voices | +| GET | `/models` | List available Ollama models | +| POST | `/transcribe` | Audio → text (STT only) | +| POST | `/chat` | Text → text (LLM 
only) | +| POST | `/chat/audio` | Text → audio (LLM + TTS) | +| POST | `/tts` | Text → audio (TTS only) | +| POST | `/voice` | Audio → audio (full pipeline) | +| POST | `/voice/metadata` | Audio → JSON (full pipeline, no audio response) | -## Usage Examples +## Pipeline -### Full Voice Pipeline -```bash -# Record audio and send to voice bot -curl -X POST http://localhost:3050/voice \ - -F "audio=@input.wav" \ - -F "model=gemma3:4b" \ - -F "voice=de-DE-ConradNeural" \ - -o response.mp3 ``` - -### Text to Audio -```bash -curl -X POST http://localhost:3050/chat/audio \ - -H "Content-Type: application/json" \ - -d '{"message": "Was ist die Hauptstadt von Deutschland?", "voice": "de-DE-KatjaNeural"}' \ - -o response.mp3 -``` - -### TTS Only -```bash -curl -X POST http://localhost:3050/tts \ - -F "text=Hallo, wie geht es dir?" \ - -F "voice=de-DE-ConradNeural" \ - -o hello.mp3 +Audio in → Whisper (STT) → Ollama (LLM) → Edge TTS → Audio out + ↓ ↓ ↓ + [German text] [Response] [MP3 audio] ``` ## German Voices | Voice ID | Description | |----------|-------------| -| `de-DE-ConradNeural` | Male - Professional (Default) | -| `de-DE-KatjaNeural` | Female - Natural | -| `de-DE-AmalaNeural` | Female - Friendly | -| `de-DE-BerndNeural` | Male - Calm | -| `de-DE-ChristophNeural` | Male - News | -| `de-DE-ElkeNeural` | Female - Warm | -| `de-DE-KillianNeural` | Male - Casual | -| `de-DE-KlarissaNeural` | Female - Cheerful | -| `de-DE-KlausNeural` | Male - Storyteller | -| `de-DE-LouisaNeural` | Female - Assistant | -| `de-DE-TanjaNeural` | Female - Business | +| `de-DE-ConradNeural` | Male, professional (default) | +| `de-DE-KatjaNeural` | Female, natural | +| `de-DE-AmalaNeural` | Female, friendly | +| `de-DE-BerndNeural` | Male, calm | +| `de-DE-ChristophNeural` | Male, news | +| `de-DE-ElkeNeural` | Female, warm | +| `de-DE-KillianNeural` | Male, casual | +| `de-DE-KlarissaNeural` | Female, cheerful | +| `de-DE-KlausNeural` | Male, storyteller | +| `de-DE-LouisaNeural` | Female, 
assistant | +| `de-DE-TanjaNeural` | Female, business | -## Environment Variables +## Configuration | Variable | Default | Description | |----------|---------|-------------| -| `PORT` | `3050` | Service port | +| `PORT` | `3024` | Service port | | `STT_URL` | `http://localhost:3020` | mana-stt URL | | `OLLAMA_URL` | `http://localhost:11434` | Ollama URL | | `DEFAULT_MODEL` | `gemma3:4b` | Default LLM model | | `DEFAULT_VOICE` | `de-DE-ConradNeural` | Default TTS voice | | `SYSTEM_PROMPT` | (German assistant) | LLM system prompt | -## Dependencies +## Performance budget -- `fastapi` - Web framework -- `uvicorn` - ASGI server -- `aiohttp` - Async HTTP client -- `edge-tts` - Microsoft TTS -- `python-multipart` - File uploads +Typical latency on the GPU server: +- STT (Whisper): 0.5–2 s +- LLM (Gemma 4B): 1–5 s +- TTS (Edge): 0.3–0.5 s +- **Total**: 2–7 s -## Performance +## When you actually deploy this -Typical latency breakdown: -- STT (Whisper): 0.5-2s -- LLM (Gemma 4B): 1-5s -- TTS (Edge): 0.3-0.5s -- **Total**: 2-7s - -## Mac Mini Deployment - -```bash -# On Mac Mini -cd ~/projects/mana-monorepo/services/mana-voice-bot -./setup.sh -./start.sh - -# Or with launchd (autostart) -# See scripts/mac-mini/setup-voice-bot.sh -``` +1. Copy the directory to `C:\mana\services\mana-voice-bot\` on `mana-server-gpu` +2. Create the venv (`C:\mana\venvs\voice-bot\`) and install requirements +3. Write a `service.pyw` runner mirroring the other AI services (loads `.env`, redirects stdout/stderr to `service.log`, calls `uvicorn.run(... port=3024)`) +4. Create the Windows Scheduled Task `ManaVoiceBot` (AtLogOn) pointing at `service.pyw` +5. Add the firewall rule (`New-NetFirewallRule -DisplayName "Mana-Voice-Bot" -Direction Inbound -LocalPort 3024 -Protocol TCP -Action Allow`) +6. Add the cloudflared route in `cloudflared-config.yml`: + `- hostname: gpu-voice.mana.how → service: http://192.168.178.11:3024` +7. 
Update `docs/WINDOWS_GPU_SERVER_SETUP.md` with the new task diff --git a/services/mana-voice-bot/app/main.py b/services/mana-voice-bot/app/main.py index 5115c98ed..afb5b56fc 100644 --- a/services/mana-voice-bot/app/main.py +++ b/services/mana-voice-bot/app/main.py @@ -32,7 +32,7 @@ logging.basicConfig( logger = logging.getLogger(__name__) # Configuration -PORT = int(os.getenv("PORT", "3050")) +PORT = int(os.getenv("PORT", "3024")) STT_URL = os.getenv("STT_URL", "http://localhost:3020") OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434") DEFAULT_MODEL = os.getenv("DEFAULT_MODEL", "gemma3:4b") diff --git a/services/mana-voice-bot/setup.sh b/services/mana-voice-bot/setup.sh index 23a24b824..7a791b03a 100755 --- a/services/mana-voice-bot/setup.sh +++ b/services/mana-voice-bot/setup.sh @@ -21,7 +21,7 @@ echo "Setup complete!" echo "" echo "To start the service:" echo " source venv/bin/activate" -echo " uvicorn app.main:app --host 0.0.0.0 --port 3050 --reload" +echo " uvicorn app.main:app --host 0.0.0.0 --port 3024 --reload" echo "" echo "Or use the start script:" echo " ./start.sh" diff --git a/services/mana-voice-bot/start.sh b/services/mana-voice-bot/start.sh index db60e7e3a..4eb9ca6da 100755 --- a/services/mana-voice-bot/start.sh +++ b/services/mana-voice-bot/start.sh @@ -4,7 +4,7 @@ cd "$(dirname "$0")" source venv/bin/activate -export PORT=${PORT:-3050} +export PORT=${PORT:-3024} export STT_URL=${STT_URL:-http://localhost:3020} export OLLAMA_URL=${OLLAMA_URL:-http://localhost:11434} export DEFAULT_MODEL=${DEFAULT_MODEL:-gemma3:4b}
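
The port layout this change set leaves behind can be sanity-checked mechanically: collisions only matter per host, so grouping the documented assignments by (host, port) should yield no duplicates. A sketch (assignments transcribed from the PORT_SCHEMA.md reality section above; the script itself is hypothetical, not part of the repo):

```python
# Hypothetical audit: detect per-host port collisions in the documented
# assignments. The same port on different hosts is fine by design.
from collections import defaultdict

ASSIGNMENTS = [
    # (host, service, port) — transcribed from PORT_SCHEMA.md, 2026-04-08
    ("mac-mini", "mana-auth", 3001),
    ("mac-mini", "mana-media", 3015),
    ("mac-mini", "mana-search", 3021),
    ("mac-mini", "mana-crawler", 3023),
    ("mac-mini", "mana-notify", 3040),
    ("mac-mini", "mana-sync", 3050),
    ("mac-mini", "mana-credits", 3061),
    ("mac-mini", "mana-user", 3062),
    ("mac-mini", "mana-subscriptions", 3063),
    ("mac-mini", "mana-analytics", 3064),
    ("mac-mini", "mana-events", 3065),
    ("gpu-server", "mana-stt", 3020),
    ("gpu-server", "mana-tts", 3022),
    ("gpu-server", "mana-image-gen", 3023),
    ("gpu-server", "mana-llm", 3025),
    ("gpu-server", "mana-video-gen", 3026),
    ("gpu-server", "ollama", 11434),
]

def collisions(assignments):
    """Return {(host, port): [services]} for every port claimed twice on one host."""
    claims = defaultdict(list)
    for host, service, port in assignments:
        claims[(host, port)].append(service)
    return {key: svcs for key, svcs in claims.items() if len(svcs) > 1}

print(collisions(ASSIGNMENTS))  # → {} (no per-host collisions)

# The old voice-bot default would have collided wherever sync runs:
print(collisions(ASSIGNMENTS + [("mac-mini", "mana-voice-bot", 3050)]))
# → {('mac-mini', 3050): ['mana-sync', 'mana-voice-bot']}
```

Re-running the audit whenever a source default changes is cheap; the second call shows exactly why the old 3050 default was worth moving before deployment.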