Mirror of https://github.com/Memo-2023/mana-monorepo.git, synced 2026-05-14 20:21:09 +02:00
fix(mana-voice-bot): move default port 3050 → 3024 + Windows GPU deployment notes
mana-voice-bot's source default was 3050, which collided with mana-sync. Today the collision is latent (voice-bot isn't deployed anywhere), but sooner or later someone is going to start it on a host that's already running mana-sync and the second one will refuse to bind. Moving to 3024 puts it inside the AI/ML port range alongside its dependencies (stt 3020, tts 3022, image-gen 3023, llm 3025) and away from sync.

Updated:

- app/main.py — PORT default 3050 → 3024
- start.sh, setup.sh — same fix in the example commands
- CLAUDE.md — full rewrite. The old version described "Mac Mini deployment" with launchd; the new version explicitly says "not deployed yet" and documents the seven concrete steps to deploy on the Windows GPU box alongside the other AI services (Scheduled Task, service.pyw, .env, firewall rule, cloudflared route, WINDOWS_GPU_SERVER_SETUP.md update).

docs/WINDOWS_GPU_SERVER_SETUP.md:

- Added the missing ManaVideoGen scheduled task to all four Start-ScheduledTask snippets — video-gen has been running on the Windows GPU, but the doc had never picked it up.
- Added a "mana-video-gen (Port 3026)" service section parallel to the existing image-gen one, with venv path, repo pointer, model, etc.
- Added a repo-counterparts table mapping C:\mana\services\<svc>\ to the corresponding services/<svc>/ directory in the repo, plus a note that changes should flow repo → Windows, not the other way around.

docs/PORT_SCHEMA.md:

- Reconciled the warning block with the post-cleanup reality: no more active or latent port collisions (image-gen ↔ video-gen and voice-bot ↔ sync are both resolved). Listed the actual ports per host with public URLs. Kept the planned-vs-actual disclaimer for the services that still don't match the aspirational ranges (mana-credits 3061 vs planned 3002, etc.).
Parent: f4347032ca
Commit: 4cb1bc1827
6 changed files with 135 additions and 131 deletions
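The "second one will refuse to bind" failure mode the commit message describes can be demonstrated in a few lines. This is a generic sketch, not project code: port 0 stands in for whichever service (e.g. mana-sync) grabbed the port first.

```python
import errno
import socket

# First "service" binds and listens; port 0 lets the OS pick a free port.
first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))
first.listen()
port = first.getsockname()[1]

# Second "service" configured with the same default port refuses to bind.
second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind(("127.0.0.1", port))
    raise AssertionError("expected the second bind to fail")
except OSError as exc:
    # EADDRINUSE is exactly what a second uvicorn on a taken port reports.
    assert exc.errno == errno.EADDRINUSE
    print("second bind failed: EADDRINUSE")
finally:
    second.close()
    first.close()
```

This is why the collision stays latent until both services land on one host: across two machines the same number is harmless.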
docs/PORT_SCHEMA.md

````diff
@@ -1,48 +1,49 @@
 # Port Schema
 
-> ⚠️ **ASPIRATIONAL — does not match running services as of 2026-04-08.**
+> ⚠️ **PARTIALLY ASPIRATIONAL.** The clean range layout below
+> (3000–3009 core, 3010–3019 infra, 3020–3029 AI/ML, …) was drafted
+> 2026-03-28 as a target state. Many services do match it; many don't.
+> Authoritative per-service ports live in each `services/*/CLAUDE.md`
+> alongside the source defaults in `app/main.py` / `config.ts` / etc.
 >
-> This document describes a *planned* reorganization of port assignments
-> into clean ranges (3000–3009 core, 3010–3019 infra, 3020–3029 AI/ML, …).
-> The reorg has not been executed: the actual ports services bind to
-> live in their `app/main.py` / `start.sh` / `config.ts`. Per-service
-> ports are documented in each `services/*/CLAUDE.md`.
->
-> ### Real ports today
+> ### Real ports today (2026-04-08)
 >
-> **Mac Mini:**
+> **Windows GPU server (`192.168.178.11`):**
+> - mana-stt `3020` (Scheduled Task `ManaSTT`, public: `gpu-stt.mana.how`)
+> - mana-tts `3022` (Task `ManaTTS`, public: `gpu-tts.mana.how`)
+> - mana-image-gen `3023` (Task `ManaImageGen`, public: `gpu-img.mana.how`)
+> - mana-llm `3025` (Task `ManaLLM`, public: `gpu-llm.mana.how`)
+> - mana-video-gen `3026` (Task `ManaVideoGen`, public: `gpu-video.mana.how`)
+> - Ollama `11434` (public: `gpu-ollama.mana.how`)
 >
+> **Mac Mini (production):**
 > - mana-auth `3001`
-> - mana-stt `3020` (Mac Mini local instance, MLX)
-> - mana-image-gen `3025` (Mac Mini, flux2.c, MPS — separate from the
-> Windows GPU image-gen on `gpu-img.mana.how` which lives outside the repo)
+> - mana-media `3015`
+> - mana-search `3021` (overlaps with the planned range slot, not a host
+> collision since search runs on Mac Mini and stt runs on the GPU box)
+> - mana-crawler `3023` (same — Mac Mini, no host collision with image-gen on GPU)
+> - mana-notify `3040`
 > - mana-sync `3050`
-> - mana-search `3021`, mana-notify `3040`, mana-crawler `3023`,
-> mana-media `3015`
 > - mana-credits `3061`, mana-user `3062`, mana-subscriptions `3063`,
 > mana-analytics `3064`, mana-events `3065`
 >
-> **Windows GPU server (`192.168.178.11`):**
-> - mana-llm `3025`
-> - mana-stt `3020`
-> - mana-tts `3022`
-> - image-gen (Windows variant, **not the repo's `mana-image-gen`**) `3023`
-> - mana-video-gen `3026`
-> - Ollama `11434`
+> **Not deployed:** `mana-voice-bot` (default port `3024`, no scheduled
+> task, no cloudflared route, no launchd plist).
 >
-> ### No production collisions today, but two latent ones in source defaults
+> No production port collisions exist today. The two latent collisions
+> that PORT_SCHEMA.md previously warned about (image-gen ↔ video-gen on
+> 3026, voice-bot ↔ sync on 3050) were resolved on 2026-04-08 by:
+> - Moving the only `mana-image-gen` to be the Windows-only diffusers
+> variant on port 3023 (the Mac flux2.c variant was deleted)
+> - Moving `mana-voice-bot`'s source default from 3050 to 3024
 >
-> | Latent collision | Why it doesn't bite | What to watch for |
-> |---|---|---|
-> | mana-image-gen and mana-llm both use `3025` | Different machines (Mac Mini vs Windows GPU); mana-image-gen `setup.sh` hard-fails outside macOS arm64 so it can't be deployed onto the Windows GPU by accident | Don't try to run mana-image-gen and mana-llm on the same host |
-> | mana-voice-bot defaults to `3050`, mana-sync also `3050` | mana-voice-bot is not deployed anywhere yet (no launchd plist, no Scheduled Task, no cloudflared route) | Pick a free port for mana-voice-bot before deploying it — current default will collide with mana-sync wherever sync runs |
->
-> The previous version of this warning claimed two **active** collisions
-> (image-gen ↔ video-gen on 3026, voice-bot ↔ sync on 3050). That was
-> wrong: image-gen on Mac Mini was overridden to 3025 via a launchd plist
-> (now also the source default — see commit history), and voice-bot isn't
-> running anywhere.
+> Some services still don't match the planned range layout below
+> (mana-credits is at 3061 not 3002, mana-user 3062 not 3004, etc).
+> Either execute the move and update this doc, or accept reality and
+> rewrite the planned tables to reflect what's actually running.
 
 **Originally drafted:** 2026-03-28
+**Reality reconciled:** 2026-04-08
 
 ## Principles
````
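The "no per-host collisions" claim in the reconciled PORT_SCHEMA.md is easy to keep honest mechanically. The following is a hypothetical helper, not part of the repo; the port table is transcribed from the "Real ports today" section above, and only same-host duplicates are flagged (mana-crawler and mana-image-gen sharing 3023 across hosts is fine, which is exactly the doc's point).

```python
from collections import defaultdict

# Per-host port table, transcribed from the reconciled PORT_SCHEMA.md.
PORTS = {
    "windows-gpu": {
        "mana-stt": 3020, "mana-tts": 3022, "mana-image-gen": 3023,
        "mana-llm": 3025, "mana-video-gen": 3026, "ollama": 11434,
    },
    "mac-mini": {
        "mana-auth": 3001, "mana-media": 3015, "mana-search": 3021,
        "mana-crawler": 3023, "mana-notify": 3040, "mana-sync": 3050,
        "mana-credits": 3061, "mana-user": 3062, "mana-subscriptions": 3063,
        "mana-analytics": 3064, "mana-events": 3065,
    },
}

def collisions(ports):
    """Return (host, port, services) for every port claimed twice on one host."""
    found = []
    for host, services in ports.items():
        by_port = defaultdict(list)
        for service, port in services.items():
            by_port[port].append(service)
        found += [(host, p, names) for p, names in sorted(by_port.items())
                  if len(names) > 1]
    return found

# Matches the doc: no per-host collisions after the 2026-04-08 cleanup.
assert collisions(PORTS) == []
```

A check like this could run in CI whenever a `services/*/CLAUDE.md` port changes.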
docs/WINDOWS_GPU_SERVER_SETUP.md

````diff
@@ -30,6 +30,7 @@ Start-ScheduledTask -TaskName "ManaLLM"
 Start-ScheduledTask -TaskName "ManaSTT"
 Start-ScheduledTask -TaskName "ManaTTS"
 Start-ScheduledTask -TaskName "ManaImageGen"
+Start-ScheduledTask -TaskName "ManaVideoGen"
 ```
 
 If step 9 (server mode) is configured correctly, the PC should:
````
````diff
@@ -415,13 +416,37 @@ Text-to-Speech with multiple backends:
 Image generation with FLUX.1-schnell (12B parameters) via HuggingFace diffusers.
 
 - **Directory**: `C:\mana\services\mana-image-gen\`
+- **Repo counterpart**: [`services/mana-image-gen/`](../services/mana-image-gen/) — `service.pyw`, `app/main.py`, `app/flux_service.py`, `app/api_auth.py`, `app/vram_manager.py`
 - **venv**: `C:\mana\venvs\image-gen\` (PyTorch 2.5.1+cu121)
-- **Config**: `C:\mana\services\mana-image-gen\.env`
+- **Config**: `C:\mana\services\mana-image-gen\.env` (see `services/mana-image-gen/.env.example`)
 - **Log**: `C:\mana\services\mana-image-gen\service.log`
 - **Autostart**: Windows Scheduled Task "ManaImageGen" (AtLogOn)
 - **Model**: FLUX.1-schnell (Apache 2.0, 4-bit quantized via BitsAndBytes)
 - **HuggingFace**: Requires login + license acceptance for the gated model
 
+### mana-video-gen (Port 3026)
+
+Video generation with LTX-Video (~2B parameters) via HuggingFace diffusers + CUDA.
+
+- **Directory**: `C:\mana\services\mana-video-gen\`
+- **Repo counterpart**: [`services/mana-video-gen/`](../services/mana-video-gen/) — `service.pyw`, `app/main.py`, `app/ltx_service.py`, `setup.sh`, `requirements.txt`
+- **venv**: `C:\mana\venvs\video-gen\` (PyTorch + CUDA + diffusers)
+- **Config**: `C:\mana\services\mana-video-gen\.env`
+- **Log**: `C:\mana\services\mana-video-gen\service.log`
+- **Autostart**: Windows Scheduled Task "ManaVideoGen" (AtLogOn)
+- **Model**: LTX-Video (Lightricks)
+- **HuggingFace**: HF_TOKEN required for the model download
+
+### Repo counterparts of the other GPU services
+
+| Windows path | Repo path |
+|---|---|
+| `C:\mana\services\mana-llm\` | [`services/mana-llm/`](../services/mana-llm/) |
+| `C:\mana\services\mana-stt\` | [`services/mana-stt/`](../services/mana-stt/) |
+| `C:\mana\services\mana-tts\` | [`services/mana-tts/`](../services/mana-tts/) |
+
+Every service has a `service.pyw` file in the repo — that is the runner the Scheduled Tasks invoke. Changes to a service should primarily be made in the repo and then mirrored to the Windows box, not the other way around.
+
 ### Management scripts
 
 ```powershell
````
````diff
@@ -439,6 +464,7 @@ Start-ScheduledTask -TaskName "ManaLLM"
 Start-ScheduledTask -TaskName "ManaSTT"
 Start-ScheduledTask -TaskName "ManaTTS"
 Start-ScheduledTask -TaskName "ManaImageGen"
+Start-ScheduledTask -TaskName "ManaVideoGen"
 
 # Show all scheduled tasks at once
 Get-ScheduledTask -TaskName "Mana*" | Format-Table TaskName, State
````
````diff
@@ -738,6 +764,7 @@ Start-ScheduledTask -TaskName "ManaLLM"
 Start-ScheduledTask -TaskName "ManaSTT"
 Start-ScheduledTask -TaskName "ManaTTS"
 Start-ScheduledTask -TaskName "ManaImageGen"
+Start-ScheduledTask -TaskName "ManaVideoGen"
 
 # Check status
 python C:\mana\status.py
````
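`C:\mana\status.py` itself is not part of this diff, so as an illustration only: a status check in its spirit could poll each service's `/health` endpoint on localhost. Everything here is an assumption (the service list is transcribed from the doc; the injectable `fetch` and function names are invented for the sketch), not the real script.

```python
from urllib.error import URLError
from urllib.request import urlopen

# Hypothetical port map for the GPU box, transcribed from the sections above.
SERVICES = {"mana-stt": 3020, "mana-tts": 3022, "mana-image-gen": 3023,
            "mana-llm": 3025, "mana-video-gen": 3026}

def default_fetch(url, timeout=2.0):
    """Return True if the URL answers with HTTP 200."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False

def status(services, fetch=default_fetch):
    """Map service name -> 'up'/'down' by probing /health on localhost."""
    return {name: ("up" if fetch(f"http://localhost:{port}/health") else "down")
            for name, port in services.items()}

if __name__ == "__main__":
    for name, state in status(SERVICES).items():
        print(f"{name:18} {state}")
```

Injecting `fetch` keeps the probe logic testable without any service running.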
services/mana-voice-bot/CLAUDE.md

````diff
@@ -1,132 +1,108 @@
-# CLAUDE.md - Mana Voice Bot
+# mana-voice-bot
 
-## Service Overview
+German voice-to-voice assistant. Wires together STT (mana-stt), an LLM (Ollama via mana-llm), and TTS (Edge TTS cloud or mana-tts) into a single end-to-end audio pipeline.
 
-German voice-to-voice assistant combining:
-- **STT**: Whisper via mana-stt (Port 3020)
-- **LLM**: Ollama with Gemma/Qwen (Port 11434)
-- **TTS**: Edge TTS (Microsoft, cloud API)
+> ⚠️ **Not deployed yet.** This service exists in the repo and runs
+> locally for development, but it has no Scheduled Task on the Windows
+> GPU server, no launchd plist, no Cloudflare Tunnel hostname, and no
+> entry in the production startup scripts. When you're ready to deploy
+> it, target the Windows GPU server alongside the other AI services
+> (`C:\mana\services\mana-voice-bot\`, Scheduled Task `ManaVoiceBot`,
+> `service.pyw` runner, public URL `gpu-voice.mana.how` via the existing
+> Mac Mini cloudflared+gpu-proxy chain).
 
-**Port**: 3050
+## Tech Stack
 
-## Architecture
+| Layer | Technology |
+|-------|------------|
+| **Runtime** | Python 3.11 + uvicorn |
+| **Framework** | FastAPI |
+| **STT** | Whisper via mana-stt |
+| **LLM** | Ollama via mana-llm (Gemma/Qwen) |
+| **TTS** | Edge TTS (Microsoft cloud) — could move to mana-tts later |
 
-```
-Audio Input → Whisper (STT) → Ollama (LLM) → Edge TTS → Audio Output
-     ↓              ↓              ↓             ↓
-  [WAV/MP3]    [German Text]   [Response]   [MP3 Audio]
-```
+## Port: 3024
 
-## Commands
+> The default was `3050` until 2026-04-08. That collided with `mana-sync`
+> on the Mac Mini and was a latent footgun for any future deployment
+> that put both on the same host. Moved to 3024 to fit in the AI/ML
+> port range alongside mana-stt (3020), mana-tts (3022), mana-image-gen
+> (3023), and mana-llm (3025).
+
+## Quick Start (local dev)
 
 ```bash
-# Setup
+cd services/mana-voice-bot
 ./setup.sh
 
-# Development
-source venv/bin/activate
-uvicorn app.main:app --host 0.0.0.0 --port 3050 --reload
-
-# Production
 ./start.sh
-
-# Test
-curl http://localhost:3050/health
+# or directly:
+uvicorn app.main:app --host 0.0.0.0 --port 3024 --reload
 ```
 
 ## API Endpoints
 
-| Endpoint | Method | Description |
-|----------|--------|-------------|
-| `/health` | GET | Service health check |
-| `/voices` | GET | List German TTS voices |
-| `/models` | GET | List available Ollama models |
-| `/transcribe` | POST | Audio → Text (STT only) |
-| `/chat` | POST | Text → Text (LLM only) |
-| `/chat/audio` | POST | Text → Audio (LLM + TTS) |
-| `/tts` | POST | Text → Audio (TTS only) |
-| `/voice` | POST | Audio → Audio (Full pipeline) |
-| `/voice/metadata` | POST | Audio → JSON (Full pipeline, no audio) |
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/health` | Service health check |
+| GET | `/voices` | List German TTS voices |
+| GET | `/models` | List available Ollama models |
+| POST | `/transcribe` | Audio → text (STT only) |
+| POST | `/chat` | Text → text (LLM only) |
+| POST | `/chat/audio` | Text → audio (LLM + TTS) |
+| POST | `/tts` | Text → audio (TTS only) |
+| POST | `/voice` | Audio → audio (full pipeline) |
+| POST | `/voice/metadata` | Audio → JSON (full pipeline, no audio response) |
 
-## Usage Examples
+## Pipeline
 
-### Full Voice Pipeline
-```bash
-# Record audio and send to voice bot
-curl -X POST http://localhost:3050/voice \
-  -F "audio=@input.wav" \
-  -F "model=gemma3:4b" \
-  -F "voice=de-DE-ConradNeural" \
-  -o response.mp3
-```
-
-### Text to Audio
-```bash
-curl -X POST http://localhost:3050/chat/audio \
-  -H "Content-Type: application/json" \
-  -d '{"message": "Was ist die Hauptstadt von Deutschland?", "voice": "de-DE-KatjaNeural"}' \
-  -o response.mp3
-```
-
-### TTS Only
-```bash
-curl -X POST http://localhost:3050/tts \
-  -F "text=Hallo, wie geht es dir?" \
-  -F "voice=de-DE-ConradNeural" \
-  -o hello.mp3
-```
+```
+Audio in → Whisper (STT) → Ollama (LLM) → Edge TTS → Audio out
+                ↓              ↓             ↓
+         [German text]     [Response]   [MP3 audio]
+```
 
 ## German Voices
 
 | Voice ID | Description |
 |----------|-------------|
-| `de-DE-ConradNeural` | Male - Professional (Default) |
-| `de-DE-KatjaNeural` | Female - Natural |
-| `de-DE-AmalaNeural` | Female - Friendly |
-| `de-DE-BerndNeural` | Male - Calm |
-| `de-DE-ChristophNeural` | Male - News |
-| `de-DE-ElkeNeural` | Female - Warm |
-| `de-DE-KillianNeural` | Male - Casual |
-| `de-DE-KlarissaNeural` | Female - Cheerful |
-| `de-DE-KlausNeural` | Male - Storyteller |
-| `de-DE-LouisaNeural` | Female - Assistant |
-| `de-DE-TanjaNeural` | Female - Business |
+| `de-DE-ConradNeural` | Male, professional (default) |
+| `de-DE-KatjaNeural` | Female, natural |
+| `de-DE-AmalaNeural` | Female, friendly |
+| `de-DE-BerndNeural` | Male, calm |
+| `de-DE-ChristophNeural` | Male, news |
+| `de-DE-ElkeNeural` | Female, warm |
+| `de-DE-KillianNeural` | Male, casual |
+| `de-DE-KlarissaNeural` | Female, cheerful |
+| `de-DE-KlausNeural` | Male, storyteller |
+| `de-DE-LouisaNeural` | Female, assistant |
+| `de-DE-TanjaNeural` | Female, business |
 
-## Environment Variables
+## Configuration
 
 | Variable | Default | Description |
 |----------|---------|-------------|
-| `PORT` | `3050` | Service port |
+| `PORT` | `3024` | Service port |
 | `STT_URL` | `http://localhost:3020` | mana-stt URL |
 | `OLLAMA_URL` | `http://localhost:11434` | Ollama URL |
 | `DEFAULT_MODEL` | `gemma3:4b` | Default LLM model |
 | `DEFAULT_VOICE` | `de-DE-ConradNeural` | Default TTS voice |
 | `SYSTEM_PROMPT` | (German assistant) | LLM system prompt |
 
-## Dependencies
+## Performance budget
 
-- `fastapi` - Web framework
-- `uvicorn` - ASGI server
-- `aiohttp` - Async HTTP client
-- `edge-tts` - Microsoft TTS
-- `python-multipart` - File uploads
+Typical latency on the GPU server:
+- STT (Whisper): 0.5–2 s
+- LLM (Gemma 4B): 1–5 s
+- TTS (Edge): 0.3–0.5 s
+- **Total**: 2–7 s
 
-## Performance
+## When you actually deploy this
 
-Typical latency breakdown:
-- STT (Whisper): 0.5-2s
-- LLM (Gemma 4B): 1-5s
-- TTS (Edge): 0.3-0.5s
-- **Total**: 2-7s
-
-## Mac Mini Deployment
-
-```bash
-# On Mac Mini
-cd ~/projects/mana-monorepo/services/mana-voice-bot
-./setup.sh
-./start.sh
-
-# Or with launchd (autostart)
-# See scripts/mac-mini/setup-voice-bot.sh
-```
+1. Copy the directory to `C:\mana\services\mana-voice-bot\` on `mana-server-gpu`
+2. Create the venv (`C:\mana\venvs\voice-bot\`) and install requirements
+3. Write a `service.pyw` runner mirroring the other AI services (loads `.env`, redirects stdout/stderr to `service.log`, calls `uvicorn.run(... port=3024)`)
+4. Create the Windows Scheduled Task `ManaVoiceBot` (AtLogOn) pointing at `service.pyw`
+5. Add the firewall rule (`New-NetFirewallRule -DisplayName "Mana-Voice-Bot" -Direction Inbound -LocalPort 3024 -Protocol TCP -Action Allow`)
+6. Add the cloudflared route in `cloudflared-config.yml`:
+   `- hostname: gpu-voice.mana.how → service: http://192.168.178.11:3024`
+7. Update `docs/WINDOWS_GPU_SERVER_SETUP.md` with the new task
````
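The rewritten CLAUDE.md asks for a `service.pyw` runner that loads `.env`, redirects output to `service.log`, and calls `uvicorn.run(... port=3024)`. The repo's actual runners are not shown in this commit, so the following is only a sketch of that described behavior; the `--serve` flag, the minimal `.env` parser, and the file layout are assumptions, not the real runner.

```python
# service.pyw — hypothetical runner sketch for mana-voice-bot on Windows.
import os
import sys
from pathlib import Path

HERE = Path(__file__).resolve().parent

def load_env(path):
    """Minimal .env parser: KEY=VALUE lines; blank lines and '#' comments ignored."""
    env = {}
    if path.exists():
        for line in path.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip()
    return env

def main():
    # Apply .env before importing the app so module-level config picks it up.
    os.environ.update(load_env(HERE / ".env"))
    # A .pyw process has no console; send stdout/stderr to service.log instead.
    log = open(HERE / "service.log", "a", buffering=1, encoding="utf-8")
    sys.stdout = sys.stderr = log
    import uvicorn
    from app.main import app
    uvicorn.run(app, host="0.0.0.0", port=int(os.getenv("PORT", "3024")))

# Assumed invocation: the Scheduled Task runs `pythonw.exe service.pyw --serve`.
if __name__ == "__main__" and "--serve" in sys.argv:
    main()
```

Keeping the uvicorn import inside `main()` means the module can be imported (e.g. for tests) without the server dependencies installed.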
services/mana-voice-bot/app/main.py

````diff
@@ -32,7 +32,7 @@ logging.basicConfig(
 logger = logging.getLogger(__name__)
 
 # Configuration
-PORT = int(os.getenv("PORT", "3050"))
+PORT = int(os.getenv("PORT", "3024"))
 STT_URL = os.getenv("STT_URL", "http://localhost:3020")
 OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")
 DEFAULT_MODEL = os.getenv("DEFAULT_MODEL", "gemma3:4b")
````
services/mana-voice-bot/setup.sh

````diff
@@ -21,7 +21,7 @@ echo "Setup complete!"
 echo ""
 echo "To start the service:"
 echo "  source venv/bin/activate"
-echo "  uvicorn app.main:app --host 0.0.0.0 --port 3050 --reload"
+echo "  uvicorn app.main:app --host 0.0.0.0 --port 3024 --reload"
 echo ""
 echo "Or use the start script:"
 echo "  ./start.sh"
````
services/mana-voice-bot/start.sh

````diff
@@ -4,7 +4,7 @@
 cd "$(dirname "$0")"
 source venv/bin/activate
 
-export PORT=${PORT:-3050}
+export PORT=${PORT:-3024}
 export STT_URL=${STT_URL:-http://localhost:3020}
 export OLLAMA_URL=${OLLAMA_URL:-http://localhost:11434}
 export DEFAULT_MODEL=${DEFAULT_MODEL:-gemma3:4b}
````