fix(mana-voice-bot): move default port 3050 → 3024 + Windows GPU deployment notes

mana-voice-bot's source default was 3050, which collided with mana-sync.
Today the collision is latent (voice-bot isn't deployed anywhere), but
sooner or later someone is going to start it on a host that's already
running mana-sync and the second one will refuse to bind. Moving to
3024 puts it inside the AI/ML port range alongside its dependencies
(stt 3020, tts 3022, image-gen 3023, llm 3025) and away from sync.

Updated:
- app/main.py — PORT default 3050 → 3024
- start.sh, setup.sh — same fix in the example commands
- CLAUDE.md — full rewrite. Old version described "Mac Mini deployment"
  with launchd; the new version explicitly says "not deployed yet" and
  documents the seven concrete steps to deploy on the Windows GPU box
  alongside the other AI services (Scheduled Task, service.pyw, .env,
  firewall rule, cloudflared route, WINDOWS_GPU_SERVER_SETUP.md update).

docs/WINDOWS_GPU_SERVER_SETUP.md:
- Added the missing ManaVideoGen scheduled task to all four
  Start-ScheduledTask snippets — video-gen has been running on the
  Windows GPU but the doc had never picked it up.
- Added a "mana-video-gen (Port 3026)" service section parallel to the
  existing image-gen one, with venv path, repo pointer, model, etc.
- Added a repo-counterparts table mapping C:\mana\services\<svc>\ to the
  corresponding services/<svc>/ directory in the repo, plus a note that
  changes should flow repo→Windows, not the other way around.

docs/PORT_SCHEMA.md:
- Reconciled the warning block with the post-cleanup reality: no more
  active or latent port collisions (image-gen ↔ video-gen and
  voice-bot ↔ sync are both resolved). Listed the actual ports per host
  with public URLs. Kept the planned-vs-actual disclaimer for the
  services that still don't match the aspirational ranges (mana-credits
  3061 vs planned 3002, etc).
Till JS 2026-04-08 13:14:57 +02:00
parent f4347032ca
commit 4cb1bc1827
6 changed files with 135 additions and 131 deletions

CLAUDE.md

@@ -1,132 +1,108 @@
# CLAUDE.md - Mana Voice Bot
# mana-voice-bot
## Service Overview
German voice-to-voice assistant. Wires together STT (mana-stt), an LLM (Ollama via mana-llm), and TTS (Edge TTS cloud or mana-tts) into a single end-to-end audio pipeline.
German voice-to-voice assistant combining:
- **STT**: Whisper via mana-stt (Port 3020)
- **LLM**: Ollama with Gemma/Qwen (Port 11434)
- **TTS**: Edge TTS (Microsoft, cloud API)
> ⚠️ **Not deployed yet.** This service exists in the repo and runs
> locally for development, but it has no Scheduled Task on the Windows
> GPU server, no launchd plist, no Cloudflare Tunnel hostname, and no
> entry in the production startup scripts. When you're ready to deploy
> it, target the Windows GPU server alongside the other AI services
> (`C:\mana\services\mana-voice-bot\`, Scheduled Task `ManaVoiceBot`,
> `service.pyw` runner, public URL `gpu-voice.mana.how` via the existing
> Mac Mini cloudflared+gpu-proxy chain).
**Port**: 3050
## Tech Stack
## Architecture
| Layer | Technology |
|-------|------------|
| **Runtime** | Python 3.11 + uvicorn |
| **Framework** | FastAPI |
| **STT** | Whisper via mana-stt |
| **LLM** | Ollama via mana-llm (Gemma/Qwen) |
| **TTS** | Edge TTS (Microsoft cloud) — could move to mana-tts later |
```
Audio Input → Whisper (STT) → Ollama (LLM) → Edge TTS → Audio Output
     ↓              ↓               ↓             ↓
 [WAV/MP3]    [German Text]     [Response]    [MP3 Audio]
```
## Port: 3024
## Commands
> The default was `3050` until 2026-04-08. That collided with `mana-sync`
> on the Mac Mini and was a latent footgun for any future deployment
> that put both on the same host. Moved to 3024 to fit in the AI/ML
> port range alongside mana-stt (3020), mana-tts (3022), mana-image-gen
> (3023), and mana-llm (3025).
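The collision this note describes is the standard "address already in use" failure: whichever service binds the shared port second refuses to start. A minimal illustration (hypothetical sketch; it uses an OS-assigned port rather than 3050 itself):

```python
import socket

# First "service" grabs a port, as mana-sync would hold 3050.
first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(("127.0.0.1", 0))      # 0 = let the OS pick a free port
port = first.getsockname()[1]
first.listen()

# Second "service" tries the same port, as voice-bot would have on 3050.
second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind(("127.0.0.1", port))
    collided = False
except OSError:                   # EADDRINUSE: the second bind is refused
    collided = True
finally:
    second.close()
    first.close()

print(collided)  # → True
```

uvicorn fails the same way at startup, just wrapped in its own error message, which is why the port move matters even before deployment.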
## Quick Start (local dev)
```bash
# Setup
cd services/mana-voice-bot
./setup.sh
# Development
source venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3050 --reload
# Production
./start.sh
# Test
curl http://localhost:3050/health
# or directly:
uvicorn app.main:app --host 0.0.0.0 --port 3024 --reload
```
## API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Service health check |
| `/voices` | GET | List German TTS voices |
| `/models` | GET | List available Ollama models |
| `/transcribe` | POST | Audio → Text (STT only) |
| `/chat` | POST | Text → Text (LLM only) |
| `/chat/audio` | POST | Text → Audio (LLM + TTS) |
| `/tts` | POST | Text → Audio (TTS only) |
| `/voice` | POST | Audio → Audio (Full pipeline) |
| `/voice/metadata` | POST | Audio → JSON (Full pipeline, no audio) |
| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Service health check |
| GET | `/voices` | List German TTS voices |
| GET | `/models` | List available Ollama models |
| POST | `/transcribe` | Audio → text (STT only) |
| POST | `/chat` | Text → text (LLM only) |
| POST | `/chat/audio` | Text → audio (LLM + TTS) |
| POST | `/tts` | Text → audio (TTS only) |
| POST | `/voice` | Audio → audio (full pipeline) |
| POST | `/voice/metadata` | Audio → JSON (full pipeline, no audio response) |
## Usage Examples
## Pipeline
### Full Voice Pipeline
```bash
# Record audio and send to voice bot
curl -X POST http://localhost:3050/voice \
-F "audio=@input.wav" \
-F "model=gemma3:4b" \
-F "voice=de-DE-ConradNeural" \
-o response.mp3
```
### Text to Audio
```bash
curl -X POST http://localhost:3050/chat/audio \
-H "Content-Type: application/json" \
-d '{"message": "Was ist die Hauptstadt von Deutschland?", "voice": "de-DE-KatjaNeural"}' \
-o response.mp3
```
### TTS Only
```bash
curl -X POST http://localhost:3050/tts \
-F "text=Hallo, wie geht es dir?" \
-F "voice=de-DE-ConradNeural" \
  -o hello.mp3
```

```
Audio in → Whisper (STT) → Ollama (LLM) → Edge TTS → Audio out
                ↓               ↓            ↓
          [German text]    [Response]   [MP3 audio]
```
## German Voices
| Voice ID | Description |
|----------|-------------|
| `de-DE-ConradNeural` | Male - Professional (Default) |
| `de-DE-KatjaNeural` | Female - Natural |
| `de-DE-AmalaNeural` | Female - Friendly |
| `de-DE-BerndNeural` | Male - Calm |
| `de-DE-ChristophNeural` | Male - News |
| `de-DE-ElkeNeural` | Female - Warm |
| `de-DE-KillianNeural` | Male - Casual |
| `de-DE-KlarissaNeural` | Female - Cheerful |
| `de-DE-KlausNeural` | Male - Storyteller |
| `de-DE-LouisaNeural` | Female - Assistant |
| `de-DE-TanjaNeural` | Female - Business |
| `de-DE-ConradNeural` | Male, professional (default) |
| `de-DE-KatjaNeural` | Female, natural |
| `de-DE-AmalaNeural` | Female, friendly |
| `de-DE-BerndNeural` | Male, calm |
| `de-DE-ChristophNeural` | Male, news |
| `de-DE-ElkeNeural` | Female, warm |
| `de-DE-KillianNeural` | Male, casual |
| `de-DE-KlarissaNeural` | Female, cheerful |
| `de-DE-KlausNeural` | Male, storyteller |
| `de-DE-LouisaNeural` | Female, assistant |
| `de-DE-TanjaNeural` | Female, business |
## Environment Variables
## Configuration
| Variable | Default | Description |
|----------|---------|-------------|
| `PORT` | `3050` | Service port |
| `PORT` | `3024` | Service port |
| `STT_URL` | `http://localhost:3020` | mana-stt URL |
| `OLLAMA_URL` | `http://localhost:11434` | Ollama URL |
| `DEFAULT_MODEL` | `gemma3:4b` | Default LLM model |
| `DEFAULT_VOICE` | `de-DE-ConradNeural` | Default TTS voice |
| `SYSTEM_PROMPT` | (German assistant) | LLM system prompt |
## Dependencies
## Performance budget
- `fastapi` - Web framework
- `uvicorn` - ASGI server
- `aiohttp` - Async HTTP client
- `edge-tts` - Microsoft TTS
- `python-multipart` - File uploads
Typical latency on the GPU server:
- STT (Whisper): 0.5–2 s
- LLM (Gemma 4B): 1–5 s
- TTS (Edge): 0.3–0.5 s
- **Total**: 2–7 s
## Performance
## When you actually deploy this
Typical latency breakdown:
- STT (Whisper): 0.5-2s
- LLM (Gemma 4B): 1-5s
- TTS (Edge): 0.3-0.5s
- **Total**: 2-7s
## Mac Mini Deployment
```bash
# On Mac Mini
cd ~/projects/mana-monorepo/services/mana-voice-bot
./setup.sh
./start.sh
# Or with launchd (autostart)
# See scripts/mac-mini/setup-voice-bot.sh
```
1. Copy the directory to `C:\mana\services\mana-voice-bot\` on `mana-server-gpu`
2. Create the venv (`C:\mana\venvs\voice-bot\`) and install requirements
3. Write a `service.pyw` runner mirroring the other AI services (loads `.env`, redirects stdout/stderr to `service.log`, calls `uvicorn.run(... port=3024)`)
4. Create the Windows Scheduled Task `ManaVoiceBot` (AtLogOn) pointing at `service.pyw`
5. Add the firewall rule (`New-NetFirewallRule -DisplayName "Mana-Voice-Bot" -Direction Inbound -LocalPort 3024 -Protocol TCP -Action Allow`)
6. Add the cloudflared route in `cloudflared-config.yml`:
`- hostname: gpu-voice.mana.how → service: http://192.168.178.11:3024`
7. Update `docs/WINDOWS_GPU_SERVER_SETUP.md` with the new task
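The runner in step 3 can be sketched as follows. This is a hedged sketch, not a copy of an existing runner: the paths and the minimal `.env` parsing are assumptions modeled on the description above, and the other services' `service.pyw` files remain the source of truth.

```python
# service.pyw — sketch of the step-3 runner: load .env, redirect output to
# service.log (pythonw.exe has no console), then start uvicorn on port 3024.
import os
import sys
from pathlib import Path

BASE = Path(__file__).resolve().parent

def load_env(path: Path) -> dict:
    """Parse simple KEY=VALUE lines from a .env file, skipping blanks/comments."""
    env = {}
    if path.exists():
        for line in path.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip()
    return env

def main():
    os.environ.update(load_env(BASE / ".env"))
    # No console under pythonw.exe, so capture everything in service.log.
    log = open(BASE / "service.log", "a", buffering=1)
    sys.stdout = sys.stderr = log
    import uvicorn  # imported late so load_env is usable without uvicorn installed
    uvicorn.run("app.main:app", host="0.0.0.0",
                port=int(os.environ.get("PORT", "3024")))

# The Scheduled Task runs this file under pythonw.exe, which would invoke:
# main()
```

If the existing services use python-dotenv instead of hand-rolled parsing, mirror that; the important parts are the log redirection and the 3024 port default matching `app/main.py`.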

app/main.py

@@ -32,7 +32,7 @@ logging.basicConfig(
logger = logging.getLogger(__name__)
# Configuration
PORT = int(os.getenv("PORT", "3050"))
PORT = int(os.getenv("PORT", "3024"))
STT_URL = os.getenv("STT_URL", "http://localhost:3020")
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")
DEFAULT_MODEL = os.getenv("DEFAULT_MODEL", "gemma3:4b")

setup.sh

@@ -21,7 +21,7 @@ echo "Setup complete!"
echo ""
echo "To start the service:"
echo " source venv/bin/activate"
echo " uvicorn app.main:app --host 0.0.0.0 --port 3050 --reload"
echo " uvicorn app.main:app --host 0.0.0.0 --port 3024 --reload"
echo ""
echo "Or use the start script:"
echo " ./start.sh"

start.sh

@@ -4,7 +4,7 @@
cd "$(dirname "$0")"
source venv/bin/activate
export PORT=${PORT:-3050}
export PORT=${PORT:-3024}
export STT_URL=${STT_URL:-http://localhost:3020}
export OLLAMA_URL=${OLLAMA_URL:-http://localhost:11434}
export DEFAULT_MODEL=${DEFAULT_MODEL:-gemma3:4b}