From f4347032ca8e50d882ceb734348ba8d7af613769 Mon Sep 17 00:00:00 2001
From: Till JS
Date: Wed, 8 Apr 2026 13:06:40 +0200
Subject: [PATCH] chore(mac-mini): remove all AI service infrastructure (moved
 to Windows GPU)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The Mac Mini hasn't run mana-llm/stt/tts/image-gen for a while — those
services live on the Windows GPU server now. The Mac-targeted installers,
plists, and platform-checking setup scripts have been sitting in the repo
as cargo cult, suggesting Mac Mini deployment is still a real option. It
isn't.

Removed (Mac Mini deployment infrastructure):

services/mana-stt/
- com.mana.mana-stt.plist (LaunchAgent)
- com.mana.vllm-voxtral.plist (LaunchAgent for the abandoned local Voxtral experiment)
- install-service.sh (single-service launchd installer)
- install-services.sh (mana-stt + vllm-voxtral installer)
- setup.sh (Mac arm64 installer)
- scripts/setup-vllm.sh (vLLM Voxtral setup)
- scripts/start-vllm-voxtral.sh

services/mana-tts/
- com.mana.mana-tts.plist
- install-service.sh
- setup.sh (Mac arm64 installer)

scripts/mac-mini/
- setup-image-gen.sh (Mac flux2.c launchd installer)
- setup-stt.sh
- setup-tts.sh
- launchd/com.mana.image-gen.plist
- launchd/com.mana.mana-stt.plist
- launchd/com.mana.mana-tts.plist

setup-tts-bot.sh stays — it's the Matrix TTS bot installer (Synapse side),
not the mana-tts service.
Updated:
- services/mana-stt/CLAUDE.md, README.md — fully rewritten for the Windows
  GPU reality (CUDA WhisperX, Scheduled Task ManaSTT, .env keys matching
  the actual production .env on the box)
- services/mana-tts/CLAUDE.md, README.md — same treatment, documenting
  Kokoro/Piper/F5-TTS on the Windows GPU under Scheduled Task ManaTTS
- scripts/mac-mini/README.md — dropped the STT setup section, replaced
  with a pointer to docs/WINDOWS_GPU_SERVER_SETUP.md and the per-service
  CLAUDE.md files
- docs/MAC_MINI_SERVER.md — expanded the "deactivated LaunchAgents" list
  to mention the now-removed plists, added the full GPU service port table
  with public URLs, added a cleanup snippet for any old plists still
  installed on a Mac Mini somewhere
---
 docs/MAC_MINI_SERVER.md                        |  50 ++--
 scripts/mac-mini/README.md                     |  36 +--
 .../mac-mini/launchd/com.mana.image-gen.plist  |  53 ----
 .../mac-mini/launchd/com.mana.mana-stt.plist   |  39 ---
 .../mac-mini/launchd/com.mana.mana-tts.plist   |  39 ---
 scripts/mac-mini/setup-image-gen.sh            | 198 ---------------
 scripts/mac-mini/setup-stt.sh                  | 153 -----------
 scripts/mac-mini/setup-tts.sh                  | 172 -------------
 services/mana-stt/CLAUDE.md                    | 101 ++++----
 services/mana-stt/README.md                    | 182 ++------------
 services/mana-stt/com.mana.mana-stt.plist      |  39 ---
 services/mana-stt/com.mana.vllm-voxtral.plist  |  41 ---
 services/mana-stt/install-service.sh           |  45 ----
 services/mana-stt/install-services.sh          |  84 -------
 services/mana-stt/scripts/setup-vllm.sh        |  83 ------
 .../mana-stt/scripts/start-vllm-voxtral.sh     |  41 ---
 services/mana-stt/setup.sh                     | 123 ---------
 services/mana-tts/CLAUDE.md                    | 190 +++++++-------
 services/mana-tts/README.md                    | 237 ++----------------
 services/mana-tts/com.mana.mana-tts.plist      |  39 ---
 services/mana-tts/install-service.sh           |  45 ----
 services/mana-tts/setup.sh                     | 150 ----------
 22 files changed, 226 insertions(+), 1914 deletions(-)
 delete mode 100644 scripts/mac-mini/launchd/com.mana.image-gen.plist
 delete mode 100644 scripts/mac-mini/launchd/com.mana.mana-stt.plist
 delete mode 100644 scripts/mac-mini/launchd/com.mana.mana-tts.plist
 delete mode 100755 scripts/mac-mini/setup-image-gen.sh
 delete mode 100755 scripts/mac-mini/setup-stt.sh
 delete mode 100755 scripts/mac-mini/setup-tts.sh
 delete mode 100644 services/mana-stt/com.mana.mana-stt.plist
 delete mode 100644 services/mana-stt/com.mana.vllm-voxtral.plist
 delete mode 100755 services/mana-stt/install-service.sh
 delete mode 100755 services/mana-stt/install-services.sh
 delete mode 100755 services/mana-stt/scripts/setup-vllm.sh
 delete mode 100755 services/mana-stt/scripts/start-vllm-voxtral.sh
 delete mode 100755 services/mana-stt/setup.sh
 delete mode 100644 services/mana-tts/com.mana.mana-tts.plist
 delete mode 100755 services/mana-tts/install-service.sh
 delete mode 100755 services/mana-tts/setup.sh

diff --git a/docs/MAC_MINI_SERVER.md b/docs/MAC_MINI_SERVER.md
index 96f5c962f..f741377ad 100644
--- a/docs/MAC_MINI_SERVER.md
+++ b/docs/MAC_MINI_SERVER.md
@@ -318,13 +318,29 @@ Three LaunchAgents provide automatic operation:
 - Checks all services (HTTP + Docker)
 - Sends notifications on failures
 
-### Deactivated LaunchAgents
+### Deactivated / removed LaunchAgents
 
-These LaunchAgents have been deactivated since the GPU-server migration:
-- `homebrew.mxcl.ollama.plist` — LLM runs on the GPU server
-- `com.mana.image-gen.plist` — image generation runs on the GPU server
+Since the GPU-server migration, no AI services run on the Mac Mini
+anymore. The corresponding LaunchAgents are deactivated and their repo
+templates have been removed:
+- `homebrew.mxcl.ollama.plist` — LLM runs on the GPU server (`gpu-llm.mana.how`)
+- `com.mana.image-gen.plist` — removed; image-gen runs as
+  Scheduled Task `ManaImageGen` on the GPU server (`gpu-img.mana.how`)
+- `com.mana.mana-stt.plist` — removed; STT as Task `ManaSTT`
+- `com.mana.mana-tts.plist` — removed; TTS as Task `ManaTTS`
+- `com.mana.vllm-voxtral.plist` — removed; vLLM Voxtral no longer used
 - `com.mana.telegram-ollama-bot.plist` — bot deactivated
 
+If old plists are still installed on a Mac Mini somewhere:
+
+```bash
+launchctl unload ~/Library/LaunchAgents/com.mana.image-gen.plist 2>/dev/null
+launchctl unload ~/Library/LaunchAgents/com.mana.mana-stt.plist 2>/dev/null
+launchctl unload ~/Library/LaunchAgents/com.mana.mana-tts.plist 2>/dev/null
+launchctl unload ~/Library/LaunchAgents/com.mana.vllm-voxtral.plist 2>/dev/null
+rm -f ~/Library/LaunchAgents/com.mana.{image-gen,mana-stt,mana-tts,vllm-voxtral}.plist
+```
+
 ### Re-running setup
 
 If the LaunchAgents need to be set up again:
@@ -684,28 +700,28 @@ docker image prune -a
 
 All AI services (LLM, image generation, STT, TTS) run on the Windows
 GPU server (RTX 3090, 24 GB VRAM) at `192.168.178.11`. The Mac Mini is a
 pure hosting server for web, API, DB, and sync.
-| Service | GPU-Server Port | Access from Docker |
-|---------|----------------|-------------------|
-| Ollama (LLM) | 11434 | `http://192.168.178.11:11434` |
-| STT (Whisper) | 3020 | `http://192.168.178.11:3020` |
-| TTS | 3022 | `http://192.168.178.11:3022` |
-| Image Gen | 3023 | `http://192.168.178.11:3023` |
+| Service | GPU-Server Port | Access from Docker | Public URL |
+|---------|----------------|-------------------|------------|
+| mana-llm | 3025 | `http://192.168.178.11:3025` | `gpu-llm.mana.how` |
+| mana-stt (Whisper) | 3020 | `http://192.168.178.11:3020` | `gpu-stt.mana.how` |
+| mana-tts | 3022 | `http://192.168.178.11:3022` | `gpu-tts.mana.how` |
+| mana-image-gen | 3023 | `http://192.168.178.11:3023` | `gpu-img.mana.how` |
+| mana-video-gen | 3026 | `http://192.168.178.11:3026` | `gpu-video.mana.how` |
+| Ollama | 11434 | `http://192.168.178.11:11434` | `gpu-ollama.mana.how` |
+
+Repo counterparts: `services/mana-{llm,stt,tts,image-gen,video-gen}/` — the `service.pyw` runners are executed directly on the Windows box as Scheduled Tasks.
 
 All values can be overridden via env vars (`OLLAMA_URL`, `STT_SERVICE_URL`, `TTS_SERVICE_URL`, `IMAGE_GEN_SERVICE_URL`).
 
 Cloud fallback on GPU-server outage: `mana-llm` has `AUTO_FALLBACK_ENABLED=true` (OpenRouter, Groq, Google).
 
-### Ollama/FLUX.2 on the Mac Mini (deactivated)
+### Ollama/FLUX.2 Mac Mini leftovers (deactivated)
 
-Ollama and FLUX.2 used to be installed locally but have been deactivated since 2026-03-28. The models are still on the SSD as a backup:
+Ollama and the old Mac Mini FLUX.2 (`flux2.c` MPS) used to be installed locally and have been deactivated since 2026-03-28. The related repo setup scripts (`scripts/mac-mini/setup-image-gen.sh`, launchd plists) were removed on 2026-04-08; the models may still be on the SSD as a backup:
 
 - `/Volumes/ManaData/ollama/` (~58 GB)
 - `/Volumes/ManaData/flux2/` (~15 GB)
 
-To reactivate if needed:
-```bash
-brew services start ollama
-launchctl load ~/Library/LaunchAgents/com.mana.image-gen.plist
-```
+If you still find them on an old Mac Mini, just delete them — they no longer run and aren't needed anywhere.
 
 ## External 4TB SSD
 
diff --git a/scripts/mac-mini/README.md b/scripts/mac-mini/README.md
index f63fbe457..6c02a3351 100644
--- a/scripts/mac-mini/README.md
+++ b/scripts/mac-mini/README.md
@@ -23,7 +23,6 @@ cd ~/projects/mana-monorepo
 | Script | Purpose |
 |--------|---------|
 | `setup-autostart.sh` | Configure automatic startup on boot (run once) |
-| `setup-stt.sh` | Setup STT service (Whisper + Voxtral) |
 | `startup.sh` | Main startup script (called by launchd) |
 | `health-check.sh` | Check all services health |
 | `status.sh` | Show full system status |
@@ -257,29 +256,18 @@ ollama list
 ollama pull gemma3:4b
 ```
 
-### STT Service (Speech-to-Text)
+### AI Services (STT, TTS, LLM, Image-Gen, Video-Gen)
 
-The STT service provides Whisper and Voxtral transcription:
+These have moved off the Mac Mini entirely. They run on the Windows GPU
+server (`mana-server-gpu`) as Windows Scheduled Tasks. See
+[`docs/WINDOWS_GPU_SERVER_SETUP.md`](../../docs/WINDOWS_GPU_SERVER_SETUP.md)
+for setup, and the per-service `services/mana-{stt,tts,llm,image-gen,video-gen}/CLAUDE.md`
+files for endpoint details.
-```bash -# Setup (first time) -./scripts/mac-mini/setup-stt.sh +Public URLs (proxied via Cloudflare Tunnel + the Mac Mini gpu-proxy): -# Check status -curl http://localhost:3020/health - -# Transcribe audio -curl -X POST http://localhost:3020/transcribe \ - -F "file=@audio.mp3" \ - -F "language=de" - -# View logs -tail -f /tmp/mana-stt.log -``` - -**Available endpoints:** -- `POST /transcribe` - Whisper transcription (recommended) -- `POST /transcribe/voxtral` - Voxtral transcription -- `POST /transcribe/auto` - Auto-select model -- `GET /health` - Health check -- `GET /models` - List available models +- `https://gpu-stt.mana.how` +- `https://gpu-tts.mana.how` +- `https://gpu-llm.mana.how` +- `https://gpu-img.mana.how` +- `https://gpu-video.mana.how` diff --git a/scripts/mac-mini/launchd/com.mana.image-gen.plist b/scripts/mac-mini/launchd/com.mana.image-gen.plist deleted file mode 100644 index b8355adc6..000000000 --- a/scripts/mac-mini/launchd/com.mana.image-gen.plist +++ /dev/null @@ -1,53 +0,0 @@ - - - - - Label - com.mana.image-gen - ProgramArguments - - /Users/mana/projects/mana-monorepo/services/mana-image-gen/.venv/bin/python3 - -m - uvicorn - app.main:app - --host - 0.0.0.0 - --port - 3025 - - WorkingDirectory - /Users/mana/projects/mana-monorepo/services/mana-image-gen - EnvironmentVariables - - PATH - /opt/homebrew/bin:/Users/mana/projects/mana-monorepo/services/mana-image-gen/.venv/bin:/usr/local/bin:/usr/bin:/bin - HOME - /Users/mana - PORT - 3025 - FLUX_BINARY - /Users/mana/flux2/flux - FLUX_MODEL_DIR - /Users/mana/flux2/model - DEFAULT_STEPS - 4 - GENERATION_TIMEOUT - 300 - CORS_ORIGINS - https://mana.how - - RunAtLoad - - KeepAlive - - SuccessfulExit - - Crashed - - - StandardOutPath - /tmp/mana-image-gen.log - StandardErrorPath - /tmp/mana-image-gen.error.log - - diff --git a/scripts/mac-mini/launchd/com.mana.mana-stt.plist b/scripts/mac-mini/launchd/com.mana.mana-stt.plist deleted file mode 100644 index 9271a5668..000000000 --- 
a/scripts/mac-mini/launchd/com.mana.mana-stt.plist +++ /dev/null @@ -1,39 +0,0 @@ - - - - - Label - com.mana.mana-stt - - ProgramArguments - - /bin/bash - -c - cd /Users/mana/projects/mana-monorepo/services/mana-stt && set -a && source .env && set +a && .venv/bin/uvicorn app.main:app --host 0.0.0.0 --port 3020 - - - WorkingDirectory - /Users/mana/projects/mana-monorepo/services/mana-stt - - EnvironmentVariables - - PATH - /opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin - - - RunAtLoad - - - KeepAlive - - - StandardOutPath - /Users/mana/logs/mana-stt.log - - StandardErrorPath - /Users/mana/logs/mana-stt.error.log - - ThrottleInterval - 10 - - diff --git a/scripts/mac-mini/launchd/com.mana.mana-tts.plist b/scripts/mac-mini/launchd/com.mana.mana-tts.plist deleted file mode 100644 index 084e39afb..000000000 --- a/scripts/mac-mini/launchd/com.mana.mana-tts.plist +++ /dev/null @@ -1,39 +0,0 @@ - - - - - Label - com.mana.mana-tts - - ProgramArguments - - /bin/bash - -c - cd /Users/mana/projects/mana-monorepo/services/mana-tts && set -a && source .env && set +a && .venv/bin/uvicorn app.main:app --host 0.0.0.0 --port 3022 - - - WorkingDirectory - /Users/mana/projects/mana-monorepo/services/mana-tts - - EnvironmentVariables - - PATH - /opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin - - - RunAtLoad - - - KeepAlive - - - StandardOutPath - /Users/mana/logs/mana-tts.log - - StandardErrorPath - /Users/mana/logs/mana-tts.error.log - - ThrottleInterval - 10 - - diff --git a/scripts/mac-mini/setup-image-gen.sh b/scripts/mac-mini/setup-image-gen.sh deleted file mode 100755 index 9de8e8c55..000000000 --- a/scripts/mac-mini/setup-image-gen.sh +++ /dev/null @@ -1,198 +0,0 @@ -#!/bin/bash -# Setup script for Mana Image Generation as a launchd service on Mac Mini -# Run this on the Mac Mini server to install and start the image generation service - -set -e - -SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" -REPO_DIR="$(cd "$SCRIPT_DIR/../.." 
&& pwd)" -SERVICE_DIR="$REPO_DIR/services/mana-image-gen" -PLIST_NAME="com.mana.image-gen" -PLIST_PATH="$HOME/Library/LaunchAgents/$PLIST_NAME.plist" - -# flux2.c paths (in home directory, no sudo required) -FLUX_BINARY="$HOME/flux2/flux" -FLUX_MODEL_DIR="$HOME/flux2/model" - -echo "==========================================" -echo "Mana Image Generation - Mac Mini Setup" -echo "==========================================" -echo "" -echo "Service directory: $SERVICE_DIR" -echo "Plist path: $PLIST_PATH" -echo "Flux binary: $FLUX_BINARY" -echo "Flux model: $FLUX_MODEL_DIR" -echo "" - -# Verify service directory exists -if [[ ! -d "$SERVICE_DIR" ]]; then - echo "Error: Service directory not found: $SERVICE_DIR" - exit 1 -fi - -# Run main setup if venv doesn't exist or flux2.c not installed -if [[ ! -d "$SERVICE_DIR/.venv" ]] || [[ ! -x "$FLUX_BINARY" ]]; then - echo "Running setup (installs flux2.c + Python environment)..." - echo "" - "$SERVICE_DIR/setup.sh" - echo "" -fi - -# Verify flux2.c is available -if [[ ! -x "$FLUX_BINARY" ]]; then - echo "Error: flux2.c not found at $FLUX_BINARY" - echo "Please run setup.sh first to install flux2.c" - exit 1 -fi - -if [[ ! -d "$FLUX_MODEL_DIR" ]]; then - echo "Error: Model not found at $FLUX_MODEL_DIR" - echo "Please download the FLUX.2 klein 4B model" - exit 1 -fi - -# Create LaunchAgents directory if needed -mkdir -p "$HOME/Library/LaunchAgents" - -# Unload existing service if running -if launchctl list | grep -q "$PLIST_NAME"; then - echo "Stopping existing service..." - launchctl unload "$PLIST_PATH" 2>/dev/null || true -fi - -# Create plist file -echo "Creating launchd plist..." 
-cat > "$PLIST_PATH" << EOF - - - - - Label - $PLIST_NAME - - ProgramArguments - - $SERVICE_DIR/.venv/bin/uvicorn - app.main:app - --host - 0.0.0.0 - --port - 3025 - - - WorkingDirectory - $SERVICE_DIR - - EnvironmentVariables - - PATH - /opt/homebrew/bin:$SERVICE_DIR/.venv/bin:/usr/local/bin:/usr/bin:/bin - PORT - 3025 - FLUX_BINARY - $FLUX_BINARY - FLUX_MODEL_DIR - $FLUX_MODEL_DIR - DEFAULT_STEPS - 4 - DEFAULT_WIDTH - 1024 - DEFAULT_HEIGHT - 1024 - GENERATION_TIMEOUT - 120 - CORS_ORIGINS - https://mana.how,http://localhost:5173 - - - RunAtLoad - - - KeepAlive - - SuccessfulExit - - Crashed - - - - ThrottleInterval - 10 - - StandardOutPath - /tmp/mana-image-gen.log - - StandardErrorPath - /tmp/mana-image-gen.error.log - - -EOF - -echo "Plist created: $PLIST_PATH" - -# Load service -echo "" -echo "Loading service..." -launchctl load "$PLIST_PATH" - -# Wait for startup -echo "Waiting for service to start..." -sleep 3 - -# Check if running -if launchctl list | grep -q "$PLIST_NAME"; then - echo "Service loaded successfully!" -else - echo "Warning: Service may not have loaded correctly." - echo "Check logs: tail -f /tmp/mana-image-gen.log" -fi - -# Health check -echo "" -echo "Running health check..." -sleep 2 - -if curl -s http://localhost:3025/health | grep -q "healthy\|degraded"; then - echo "Health check passed!" - echo "" - curl -s http://localhost:3025/health | python3 -m json.tool -else - echo "Health check failed. Service may still be starting." - echo "Try again in a few seconds: curl http://localhost:3025/health" -fi - -echo "" -echo "==========================================" -echo "Setup Complete!" 
-echo "==========================================" -echo "" -echo "Service management commands:" -echo "" -echo " # View logs" -echo " tail -f /tmp/mana-image-gen.log" -echo "" -echo " # Stop service" -echo " launchctl unload $PLIST_PATH" -echo "" -echo " # Start service" -echo " launchctl load $PLIST_PATH" -echo "" -echo " # Restart service" -echo " launchctl unload $PLIST_PATH && launchctl load $PLIST_PATH" -echo "" -echo " # Check status" -echo " launchctl list | grep $PLIST_NAME" -echo "" -echo "Test endpoints:" -echo "" -echo " # Health check" -echo " curl http://localhost:3025/health" -echo "" -echo " # Model info" -echo " curl http://localhost:3025/models" -echo "" -echo " # Generate image" -echo " curl -X POST http://localhost:3025/generate \\" -echo " -H 'Content-Type: application/json' \\" -echo " -d '{\"prompt\": \"A cat in space\"}'" -echo "" diff --git a/scripts/mac-mini/setup-stt.sh b/scripts/mac-mini/setup-stt.sh deleted file mode 100755 index 398246ce2..000000000 --- a/scripts/mac-mini/setup-stt.sh +++ /dev/null @@ -1,153 +0,0 @@ -#!/bin/bash -# Setup STT Service on Mac Mini -# Creates launchd service for auto-start - -set -e - -SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" -REPO_DIR="$(cd "$SCRIPT_DIR/../.." && pwd)" -STT_DIR="$REPO_DIR/services/mana-stt" -PLIST_NAME="com.mana.stt" -PLIST_PATH="$HOME/Library/LaunchAgents/$PLIST_NAME.plist" - -echo "==============================================" -echo " Mana STT Service Setup (Mac Mini)" -echo "==============================================" -echo "" - -# Check if STT service directory exists -if [ ! -d "$STT_DIR" ]; then - echo "Error: STT service directory not found at $STT_DIR" - exit 1 -fi - -# Run the main setup script first -echo "1. Running STT service setup..." -cd "$STT_DIR" -if [ ! -d ".venv" ]; then - echo " Installing dependencies..." 
- ./setup.sh -else - echo " Virtual environment already exists" - echo " Skipping dependency installation" -fi - -# Create launchd plist -echo "" -echo "2. Creating launchd service..." - -cat > "$PLIST_PATH" << EOF - - - - - Label - $PLIST_NAME - - ProgramArguments - - $STT_DIR/.venv/bin/uvicorn - app.main:app - --host - 0.0.0.0 - --port - 3020 - - - WorkingDirectory - $STT_DIR - - EnvironmentVariables - - PATH - /opt/homebrew/bin:$STT_DIR/.venv/bin:/usr/local/bin:/usr/bin:/bin - PORT - 3020 - WHISPER_MODEL - large-v3 - PRELOAD_MODELS - false - CORS_ORIGINS - https://mana.how - - - RunAtLoad - - - KeepAlive - - SuccessfulExit - - Crashed - - - - ThrottleInterval - 10 - - StandardOutPath - /tmp/mana-stt.log - - StandardErrorPath - /tmp/mana-stt.error.log - - -EOF - -echo " Created: $PLIST_PATH" - -# Unload if already loaded -echo "" -echo "3. Loading launchd service..." -launchctl unload "$PLIST_PATH" 2>/dev/null || true -launchctl load "$PLIST_PATH" - -# Wait for service to start -sleep 2 - -# Check if service is running -echo "" -echo "4. Checking service status..." -if launchctl list | grep -q "$PLIST_NAME"; then - echo " Service is running" - - # Check health endpoint - sleep 3 - if curl -s http://localhost:3020/health > /dev/null 2>&1; then - echo " Health check passed" - HEALTH=$(curl -s http://localhost:3020/health) - echo " $HEALTH" - else - echo " Warning: Health check failed (service may still be starting)" - echo " Check logs: tail -f /tmp/mana-stt.log" - fi -else - echo " Warning: Service may not be running" - echo " Check logs: tail -f /tmp/mana-stt.error.log" -fi - -echo "" -echo "==============================================" -echo " STT Service Setup Complete!" 
-echo "==============================================" -echo "" -echo "Service URL: http://localhost:3020" -echo "" -echo "Useful commands:" -echo " # View logs" -echo " tail -f /tmp/mana-stt.log" -echo "" -echo " # Restart service" -echo " launchctl kickstart -k gui/\$(id -u)/$PLIST_NAME" -echo "" -echo " # Stop service" -echo " launchctl unload $PLIST_PATH" -echo "" -echo " # Start service" -echo " launchctl load $PLIST_PATH" -echo "" -echo " # Test transcription" -echo " curl -X POST http://localhost:3020/transcribe \\" -echo " -F 'file=@audio.mp3' \\" -echo " -F 'language=de'" -echo "" diff --git a/scripts/mac-mini/setup-tts.sh b/scripts/mac-mini/setup-tts.sh deleted file mode 100755 index 4fe28c23a..000000000 --- a/scripts/mac-mini/setup-tts.sh +++ /dev/null @@ -1,172 +0,0 @@ -#!/bin/bash -# Setup script for Mana TTS as a launchd service on Mac Mini -# Run this on the Mac Mini server to install and start the TTS service - -set -e - -SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" -REPO_DIR="$(cd "$SCRIPT_DIR/../.." && pwd)" -SERVICE_DIR="$REPO_DIR/services/mana-tts" -PLIST_NAME="com.mana.tts" -PLIST_PATH="$HOME/Library/LaunchAgents/$PLIST_NAME.plist" - -echo "==========================================" -echo "Mana TTS - Mac Mini Setup" -echo "==========================================" -echo "" -echo "Service directory: $SERVICE_DIR" -echo "Plist path: $PLIST_PATH" -echo "" - -# Verify service directory exists -if [[ ! -d "$SERVICE_DIR" ]]; then - echo "Error: Service directory not found: $SERVICE_DIR" - exit 1 -fi - -# Run main setup if venv doesn't exist -if [[ ! -d "$SERVICE_DIR/.venv" ]]; then - echo "Virtual environment not found. Running setup..." - echo "" - "$SERVICE_DIR/setup.sh" - echo "" -fi - -# Create LaunchAgents directory if needed -mkdir -p "$HOME/Library/LaunchAgents" - -# Unload existing service if running -if launchctl list | grep -q "$PLIST_NAME"; then - echo "Stopping existing service..." 
- launchctl unload "$PLIST_PATH" 2>/dev/null || true -fi - -# Create plist file -echo "Creating launchd plist..." -cat > "$PLIST_PATH" << EOF - - - - - Label - $PLIST_NAME - - ProgramArguments - - $SERVICE_DIR/.venv/bin/uvicorn - app.main:app - --host - 0.0.0.0 - --port - 3022 - - - WorkingDirectory - $SERVICE_DIR - - EnvironmentVariables - - PATH - /opt/homebrew/bin:$SERVICE_DIR/.venv/bin:/usr/local/bin:/usr/bin:/bin - PORT - 3022 - PRELOAD_MODELS - false - MAX_TEXT_LENGTH - 1000 - CORS_ORIGINS - https://mana.how - - - RunAtLoad - - - KeepAlive - - SuccessfulExit - - Crashed - - - - ThrottleInterval - 10 - - StandardOutPath - /tmp/mana-tts.log - - StandardErrorPath - /tmp/mana-tts.error.log - - -EOF - -echo "Plist created: $PLIST_PATH" - -# Load service -echo "" -echo "Loading service..." -launchctl load "$PLIST_PATH" - -# Wait for startup -echo "Waiting for service to start..." -sleep 3 - -# Check if running -if launchctl list | grep -q "$PLIST_NAME"; then - echo "Service loaded successfully!" -else - echo "Warning: Service may not have loaded correctly." - echo "Check logs: tail -f /tmp/mana-tts.log" -fi - -# Health check -echo "" -echo "Running health check..." -sleep 2 - -if curl -s http://localhost:3022/health | grep -q "healthy"; then - echo "Health check passed!" - echo "" - curl -s http://localhost:3022/health | python3 -m json.tool -else - echo "Health check failed. Service may still be starting." - echo "Try again in a few seconds: curl http://localhost:3022/health" -fi - -echo "" -echo "==========================================" -echo "Setup Complete!" 
-echo "=========================================="
-echo ""
-echo "Service management commands:"
-echo ""
-echo "  # View logs"
-echo "  tail -f /tmp/mana-tts.log"
-echo ""
-echo "  # Stop service"
-echo "  launchctl unload $PLIST_PATH"
-echo ""
-echo "  # Start service"
-echo "  launchctl load $PLIST_PATH"
-echo ""
-echo "  # Restart service"
-echo "  launchctl unload $PLIST_PATH && launchctl load $PLIST_PATH"
-echo ""
-echo "  # Check status"
-echo "  launchctl list | grep $PLIST_NAME"
-echo ""
-echo "Test endpoints:"
-echo ""
-echo "  # Health check"
-echo "  curl http://localhost:3022/health"
-echo ""
-echo "  # List voices"
-echo "  curl http://localhost:3022/voices"
-echo ""
-echo "  # Synthesize with Kokoro"
-echo "  curl -X POST http://localhost:3022/synthesize/kokoro \\"
-echo "    -H 'Content-Type: application/json' \\"
-echo "    -d '{\"text\": \"Hello world\", \"voice\": \"af_heart\"}' \\"
-echo "    --output test.wav"
-echo ""
diff --git a/services/mana-stt/CLAUDE.md b/services/mana-stt/CLAUDE.md
index 6d91d86a1..0a98c2386 100644
--- a/services/mana-stt/CLAUDE.md
+++ b/services/mana-stt/CLAUDE.md
@@ -1,79 +1,96 @@
 # mana-stt
 
-Speech-to-Text service for the Mana ecosystem. Runs on the Mac Mini M4 (Apple Silicon) and exposes a small FastAPI surface that wraps multiple Whisper backends plus Mistral's hosted Voxtral API.
+Speech-to-Text microservice. Wraps Whisper (CUDA, with WhisperX for word-level timestamps + diarization), local Voxtral via vLLM, and Mistral's hosted Voxtral API behind a small FastAPI surface. Lives on the Windows GPU server (`mana-server-gpu`, RTX 3090).
+
+> ⚠️ **Earlier history**: this directory used to contain Mac-Mini-targeted
+> code (Whisper Lightning MLX, com.mana.mana-stt.plist launchd setup,
+> setup.sh with Apple-Silicon checks). That all moved to the Windows
+> GPU box and was removed from the repo. If you're looking for the MLX
+> path, see git history.
 ## Tech Stack
 
 | Layer | Technology |
 |-------|------------|
-| **Runtime** | Python 3.11 + uvicorn |
+| **Runtime** | Python 3.11 + uvicorn (Windows) |
 | **Framework** | FastAPI |
-| **Local model** | Whisper Large V3 via [`lightning-whisper-mlx`](https://github.com/mustafaaljadery/lightning-whisper-mlx) (Apple MLX) |
-| **Local model (rich)** | WhisperX for word-level timestamps + diarization |
-| **Cloud model** | Mistral Voxtral Mini API |
-| **Optional** | vLLM Voxtral (GPU) — see `vllm_service.py` |
-| **Auth** | JWT validation via mana-auth (`external_auth.py`) + API key fallback (`auth.py`) |
-| **Process supervision** | launchd via `com.mana.mana-stt.plist` |
+| **Whisper** | `whisperx` on CUDA (large-v3 + word alignment + pyannote diarization) |
+| **Voxtral (local)** | vLLM serving Voxtral 3B/4B/24B (`vllm_service.py`) |
+| **Voxtral (cloud)** | Mistral API (`voxtral_api_service.py`) |
+| **Auth** | Per-key + internal-key API auth (`app/auth.py`, JWT via mana-auth in `app/external_auth.py`) |
+| **VRAM** | Shared `vram_manager.py` accountant — coordinated with mana-tts and mana-image-gen so multiple GPU services don't OOM each other |
+| **Process supervision** | Windows Scheduled Task `ManaSTT` (AtLogOn) |
 
 ## Port: 3020
 
-## Quick Start
+## Where it runs
 
-```bash
-cd services/mana-stt
-./setup.sh          # Create venv + install
-.venv/bin/uvicorn app.main:app --host 0.0.0.0 --port 3020
-```
+| Host | Path on disk | Entrypoint |
+|------|--------------|------------|
+| Windows GPU server (`192.168.178.11`) | `C:\mana\services\mana-stt\` | `service.pyw` via Scheduled Task `ManaSTT` |
 
-Production runs via launchd on the Mac Mini — `install-service.sh` (single service) or `install-services.sh` (mana-stt + vllm-voxtral together).
+Public URL: `https://gpu-stt.mana.how` (via Cloudflare Tunnel + Mac Mini gpu-proxy).
 ## API Endpoints
 
 | Method | Path | Description |
 |--------|------|-------------|
 | GET | `/health` | Liveness + which backends are loaded |
-| GET | `/models` | List available STT models |
-| POST | `/transcribe` | Whisper MLX (default, fastest local) |
-| POST | `/transcribe/whisperx` | WhisperX with word-level timestamps + diarization |
-| POST | `/transcribe/voxtral` | Local Voxtral (vLLM) |
-| POST | `/transcribe/voxtral/api` | Mistral Voxtral API (cloud) |
-| POST | `/transcribe/auto` | Tries WhisperX first, falls back to Whisper MLX |
+| GET | `/models` | Available STT models |
+| POST | `/transcribe` | Whisper (WhisperX, default) — multipart `file` + optional `language` |
+| POST | `/transcribe/voxtral` | Local Voxtral via vLLM |
+| POST | `/transcribe/auto` | Routing helper — picks the best backend for the input |
 
-All `/transcribe*` endpoints accept multipart `file` upload + optional `language` form field. Auth via `Authorization: Bearer <token>` or `X-API-Key`.
+All endpoints (except `/health`) require `Authorization: Bearer <token>`. Tokens are validated against `API_KEYS` (per-app keys) or `INTERNAL_API_KEY` (no rate limit), and JWTs from mana-auth are also accepted via `external_auth.py`.
 ## Backends (`app/`)
 
 | File | What it loads |
 |------|---------------|
-| `whisper_service.py` | Whisper Large V3 via MLX (local, default) |
-| `whisper_service_cuda.py` | CUDA Whisper (only used on Windows GPU server) |
-| `whisperx_service.py` | WhisperX with diarization (local, slower, richer output) |
-| `voxtral_service.py` | Local Voxtral via vLLM (optional, needs the second launchd job) |
-| `voxtral_api_service.py` | Mistral hosted Voxtral API (cloud) |
-| `vllm_service.py` | vLLM client primitives shared with Voxtral |
-| `auth.py` | API key auth (fallback path) |
-| `external_auth.py` | JWT auth via mana-auth public key |
+| `whisper_service.py` | WhisperX on CUDA (large-v3 + alignment + pyannote diarization) |
+| `voxtral_service.py` | Local Voxtral via vLLM (slower start, richer multilingual) |
+| `voxtral_api_service.py` | Mistral hosted Voxtral API (cloud, no GPU needed) |
+| `vllm_service.py` | vLLM client primitives shared by Voxtral |
+| `vram_manager.py` | Shared VRAM accounting — same module also used by mana-tts and mana-image-gen |
+| `auth.py` | API-key auth (internal + per-app keys) |
+| `external_auth.py` | JWT validation via mana-auth |
 
-Backends are loaded lazily during the FastAPI lifespan and reported by `/health`. Missing dependencies (e.g. CUDA on Mac) are tolerated — the service starts without them.
+Backends are loaded lazily during the FastAPI lifespan and reported by `/health`.
 
-## Configuration
-
-Reads from `services/mana-stt/.env` (loaded by the launchd plist's `set -a; source .env; set +a`). Relevant variables:
+## Configuration (`.env` on the Windows GPU box)
 
 ```env
 PORT=3020
-MANA_AUTH_URL=http://localhost:3001  # JWKS source for JWT verification
-MISTRAL_API_KEY=...                  # only needed for /transcribe/voxtral/api
-STT_API_KEY=...                      # legacy API key fallback
+WHISPER_MODEL=large-v3
+WHISPER_DEVICE=cuda
+WHISPER_COMPUTE_TYPE=float16
+WHISPER_DEFAULT_LANGUAGE=de
+PRELOAD_MODELS=true
+USE_VLLM=false
+HF_TOKEN=...                         # required for pyannote diarization models
+REQUIRE_AUTH=true
+API_KEYS=sk-app1:app1,sk-app2:app2
+INTERNAL_API_KEY=...                 # cross-service, no rate limit
+CORS_ORIGINS=https://mana.how,https://chat.mana.how
 ```
 
 ## Operations
 
-- **Logs**: launchd writes to `~/Library/Logs/mana-stt.{out,err}.log` (see plist)
-- **Metrics**: Prometheus endpoint at `/metrics` if enabled in config; Grafana dashboard JSON checked in at `grafana-dashboard.json`
-- **Restart**: `launchctl kickstart -k gui/$(id -u)/com.mana.mana-stt`
+```powershell
+# Status
+Get-ScheduledTask -TaskName "ManaSTT" | Format-List TaskName, State
+Get-NetTCPConnection -LocalPort 3020 -State Listen
+
+# Restart
+Stop-ScheduledTask -TaskName "ManaSTT"
+Start-ScheduledTask -TaskName "ManaSTT"
+
+# Logs
+Get-Content C:\mana\services\mana-stt\service.log -Tail 50
+```
 
 ## Reference
 
-- `services/mana-stt/README.md` — user-facing setup, model download instructions, language coverage
-- `docs/LOCAL_STT_MODELS.md` — WER comparisons, model size/quality tradeoffs
+- `docs/WINDOWS_GPU_SERVER_SETUP.md` — Windows box setup, scheduled tasks, firewall, Cloudflare tunnel
+- `docs/LOCAL_STT_MODELS.md` — model comparisons (WER, latency, language coverage)
+- `services/mana-stt/grafana-dashboard.json` — Prometheus metrics dashboard
diff --git a/services/mana-stt/README.md b/services/mana-stt/README.md
index 7d76ce525..8e4abf5f1 100644
--- a/services/mana-stt/README.md
+++ b/services/mana-stt/README.md
@@ -1,185 +1,31 @@
 # Mana STT Service
 
-Speech-to-Text API service with **Whisper (Lightning MLX)** and **Voxtral (Mistral API)**.
+Speech-to-Text API service running on the Windows GPU server (`mana-server-gpu`, RTX 3090). Wraps **WhisperX** (CUDA, large-v3 + word alignment + pyannote diarization), local **Voxtral via vLLM**, and the hosted **Mistral Voxtral API**.
 
-Optimized for Mac Mini M4 (Apple Silicon).
+For architecture, deployment, configuration, and operations see [`CLAUDE.md`](./CLAUDE.md) and [`docs/WINDOWS_GPU_SERVER_SETUP.md`](../../docs/WINDOWS_GPU_SERVER_SETUP.md). -## Architecture +## Port: 3020 -``` - ┌─────────────────────┐ - │ mana-stt (3020) │ - │ FastAPI │ - └─────────┬───────────┘ - │ - ┌─────────────────┼─────────────────┐ - ▼ ▼ ▼ - ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ - │ Whisper │ │ Voxtral API │ │ vLLM │ - │ MLX (Local) │ │ (Mistral) │ │ (Optional) │ - └──────────────┘ └──────────────┘ └──────────────┘ -``` +## Public URL -## Features - -- **Whisper Large V3** - Best quality, 99+ languages, German WER 6-9% (local, MLX) -- **Voxtral Mini** - Mistral API, speaker diarization support (cloud) -- **Apple Silicon Optimized** - Uses MLX for fast local inference -- **Automatic Fallback** - Falls back between backends automatically -- **REST API** - Simple HTTP endpoints for integration - -## Quick Start - -### Installation - -```bash -cd services/mana-stt -./setup.sh -``` - -### Run Locally - -```bash -source .venv/bin/activate -uvicorn app.main:app --host 0.0.0.0 --port 3020 -``` - -### Setup as System Service (Mac Mini) - -```bash -./scripts/mac-mini/setup-stt.sh -``` +`https://gpu-stt.mana.how` (via Cloudflare Tunnel + Mac Mini gpu-proxy) ## API Endpoints | Endpoint | Method | Description | |----------|--------|-------------| -| `/health` | GET | Health check | +| `/health` | GET | Health check + which backends are loaded | | `/models` | GET | List available models | -| `/transcribe` | POST | Whisper transcription | -| `/transcribe/voxtral` | POST | Voxtral transcription | -| `/transcribe/auto` | POST | Auto-select best model | +| `/transcribe` | POST | Whisper / WhisperX transcription | +| `/transcribe/voxtral` | POST | Voxtral transcription (local vLLM) | +| `/transcribe/auto` | POST | Auto-select best backend for the input | -## Usage Examples +All endpoints (except `/health`) require `Authorization: Bearer <api-key>`. 
-### Transcribe with Whisper (Recommended) +## Quick Test ```bash -curl -X POST http://localhost:3020/transcribe \ - -F "file=@recording.mp3" \ - -F "language=de" -``` - -Response: -```json -{ - "text": "Das ist ein Beispieltext...", - "language": "de", - "model": "whisper-large-v3-turbo" -} -``` - -### Transcribe with Voxtral - -```bash -curl -X POST http://localhost:3020/transcribe/voxtral \ - -F "file=@recording.mp3" \ - -F "language=de" -``` - -### Auto-Select Model - -```bash -curl -X POST http://localhost:3020/transcribe/auto \ - -F "file=@recording.mp3" \ - -F "prefer=whisper" -``` - -## Configuration - -Environment variables: - -| Variable | Default | Description | -|----------|---------|-------------| -| `PORT` | `3020` | API server port | -| `WHISPER_MODEL` | `large-v3` | Default Whisper model | -| `PRELOAD_MODELS` | `false` | Load models on startup | -| `CORS_ORIGINS` | `https://mana.how,...` | Allowed CORS origins | -| `MISTRAL_API_KEY` | - | Required for Voxtral API | -| `USE_VLLM` | `false` | Enable vLLM backend (experimental) | -| `VLLM_URL` | `http://localhost:8100` | vLLM server URL | - -## Supported Audio Formats - -- MP3, WAV, M4A, FLAC, OGG, WebM, MP4 -- Max file size: 100MB -- Any sample rate (automatically resampled to 16kHz) - -## Model Comparison - -| Model | German WER | Speed | VRAM | License | -|-------|------------|-------|------|---------| -| Whisper Large V3 Turbo | 6-9% | Fast | ~6 GB | MIT | -| Voxtral Mini (3B) | 8-12% | Medium | ~4 GB | Apache 2.0 | - -## Logs - -```bash -# Service logs -tail -f /tmp/mana-stt.log - -# Error logs -tail -f /tmp/mana-stt.error.log -``` - -## Troubleshooting - -### Model Download Slow - -First run downloads ~1.6 GB for Whisper and ~6 GB for Voxtral. Be patient. 
- -### Out of Memory - -Reduce batch size or use smaller model: -```bash -export WHISPER_MODEL=medium -``` - -### MPS Not Available - -Ensure PyTorch is installed with MPS support: -```bash -pip install torch torchvision torchaudio -python -c "import torch; print(torch.backends.mps.is_available())" -``` - -## Integration - -### From Chat Backend (NestJS) - -```typescript -const formData = new FormData(); -formData.append('file', audioBuffer, 'recording.webm'); -formData.append('language', 'de'); - -const response = await fetch('http://localhost:3020/transcribe', { - method: 'POST', - body: formData, -}); - -const { text } = await response.json(); -``` - -### From SvelteKit Web - -```typescript -const formData = new FormData(); -formData.append('file', audioBlob, 'recording.webm'); - -const response = await fetch('https://gpu-stt.mana.how/transcribe', { - method: 'POST', - body: formData, -}); - -const { text } = await response.json(); +curl -F "file=@audio.wav" -F "language=de" \ + -H "Authorization: Bearer $INTERNAL_API_KEY" \ + https://gpu-stt.mana.how/transcribe ``` diff --git a/services/mana-stt/com.mana.mana-stt.plist b/services/mana-stt/com.mana.mana-stt.plist deleted file mode 100644 index 9271a5668..000000000 --- a/services/mana-stt/com.mana.mana-stt.plist +++ /dev/null @@ -1,39 +0,0 @@ - - - - - Label - com.mana.mana-stt - - ProgramArguments - - /bin/bash - -c - cd /Users/mana/projects/mana-monorepo/services/mana-stt && set -a && source .env && set +a && .venv/bin/uvicorn app.main:app --host 0.0.0.0 --port 3020 - - - WorkingDirectory - /Users/mana/projects/mana-monorepo/services/mana-stt - - EnvironmentVariables - - PATH - /opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin - - - RunAtLoad - - - KeepAlive - - - StandardOutPath - /Users/mana/logs/mana-stt.log - - StandardErrorPath - /Users/mana/logs/mana-stt.error.log - - ThrottleInterval - 10 - - diff --git a/services/mana-stt/com.mana.vllm-voxtral.plist b/services/mana-stt/com.mana.vllm-voxtral.plist deleted 
file mode 100644 index 197e41921..000000000 --- a/services/mana-stt/com.mana.vllm-voxtral.plist +++ /dev/null @@ -1,41 +0,0 @@ - - - - - Label - com.mana.vllm-voxtral - - ProgramArguments - - /bin/bash - -c - cd /Users/mana/projects/mana-monorepo/services/mana-stt && ./scripts/start-vllm-voxtral.sh - - - WorkingDirectory - /Users/mana/projects/mana-monorepo/services/mana-stt - - EnvironmentVariables - - PATH - /opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin - VLLM_PORT - 8100 - - - RunAtLoad - - - KeepAlive - - - StandardOutPath - /Users/mana/logs/vllm-voxtral.log - - StandardErrorPath - /Users/mana/logs/vllm-voxtral.error.log - - ThrottleInterval - 30 - - diff --git a/services/mana-stt/install-service.sh b/services/mana-stt/install-service.sh deleted file mode 100755 index 55a5200dc..000000000 --- a/services/mana-stt/install-service.sh +++ /dev/null @@ -1,45 +0,0 @@ -#!/bin/bash -# Install mana-stt as a launchd service on macOS -# Run this script on the Mac Mini server - -set -e - -SERVICE_NAME="com.mana.mana-stt" -PLIST_FILE="$SERVICE_NAME.plist" -SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" -LAUNCH_AGENTS_DIR="$HOME/Library/LaunchAgents" -LOG_DIR="$HOME/logs" - -echo "Installing mana-stt launchd service..." - -# Create logs directory -mkdir -p "$LOG_DIR" - -# Stop existing service if running -if launchctl list | grep -q "$SERVICE_NAME"; then - echo "Stopping existing service..." - launchctl unload "$LAUNCH_AGENTS_DIR/$PLIST_FILE" 2>/dev/null || true -fi - -# Copy plist to LaunchAgents -cp "$SCRIPT_DIR/$PLIST_FILE" "$LAUNCH_AGENTS_DIR/" - -# Load the service -echo "Loading service..." -launchctl load "$LAUNCH_AGENTS_DIR/$PLIST_FILE" - -# Check status -sleep 2 -if launchctl list | grep -q "$SERVICE_NAME"; then - echo "Service installed and running!" 
- echo "" - echo "Useful commands:" - echo " View logs: tail -f $LOG_DIR/mana-stt.log" - echo " View errors: tail -f $LOG_DIR/mana-stt.error.log" - echo " Stop: launchctl unload $LAUNCH_AGENTS_DIR/$PLIST_FILE" - echo " Start: launchctl load $LAUNCH_AGENTS_DIR/$PLIST_FILE" - echo " Health check: curl http://localhost:3020/health" -else - echo "ERROR: Service failed to start. Check logs at $LOG_DIR/mana-stt.error.log" - exit 1 -fi diff --git a/services/mana-stt/install-services.sh b/services/mana-stt/install-services.sh deleted file mode 100755 index e5cd3dfbb..000000000 --- a/services/mana-stt/install-services.sh +++ /dev/null @@ -1,84 +0,0 @@ -#!/bin/bash -# Install mana-stt and vllm-voxtral as launchd services on macOS -# Run this script on the Mac Mini server - -set -e - -SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" -LAUNCH_AGENTS_DIR="$HOME/Library/LaunchAgents" -LOG_DIR="$HOME/logs" - -echo "============================================" -echo "Installing Mana STT Services" -echo "============================================" -echo "" - -# Create logs directory -mkdir -p "$LOG_DIR" - -install_service() { - local service_name="$1" - local plist_file="$service_name.plist" - - echo "Installing $service_name..." - - # Stop existing service if running - if launchctl list | grep -q "$service_name"; then - echo " Stopping existing service..." - launchctl unload "$LAUNCH_AGENTS_DIR/$plist_file" 2>/dev/null || true - fi - - # Copy plist to LaunchAgents - cp "$SCRIPT_DIR/$plist_file" "$LAUNCH_AGENTS_DIR/" - - # Load the service - echo " Loading service..." - launchctl load "$LAUNCH_AGENTS_DIR/$plist_file" - - sleep 2 - if launchctl list | grep -q "$service_name"; then - echo " ✓ $service_name installed and running" - else - echo " ✗ $service_name failed to start" - return 1 - fi -} - -# Install vLLM first (STT depends on it) -install_service "com.mana.vllm-voxtral" - -# Wait for vLLM to initialize -echo "" -echo "Waiting for vLLM server to initialize..." 
-for i in {1..30}; do - if curl -s http://localhost:8100/health > /dev/null 2>&1; then - echo " ✓ vLLM server is ready" - break - fi - if [ $i -eq 30 ]; then - echo " ! vLLM server not responding yet (may still be loading model)" - fi - sleep 2 -done - -# Install STT service -echo "" -install_service "com.mana.mana-stt" - -echo "" -echo "============================================" -echo "Installation complete!" -echo "============================================" -echo "" -echo "Services:" -echo " vLLM Voxtral: http://localhost:8100" -echo " Mana STT: http://localhost:3020" -echo "" -echo "Useful commands:" -echo " View vLLM logs: tail -f $LOG_DIR/vllm-voxtral.log" -echo " View STT logs: tail -f $LOG_DIR/mana-stt.log" -echo " Health check: curl http://localhost:3020/health" -echo "" -echo "Stop all:" -echo " launchctl unload $LAUNCH_AGENTS_DIR/com.mana.vllm-voxtral.plist" -echo " launchctl unload $LAUNCH_AGENTS_DIR/com.mana.mana-stt.plist" diff --git a/services/mana-stt/scripts/setup-vllm.sh b/services/mana-stt/scripts/setup-vllm.sh deleted file mode 100755 index c6a6ad48f..000000000 --- a/services/mana-stt/scripts/setup-vllm.sh +++ /dev/null @@ -1,83 +0,0 @@ -#!/bin/bash -# Setup vLLM for Voxtral on Mac Mini M4 -# -# vLLM runs in CPU mode on macOS (no CUDA), but still provides -# the optimized inference pipeline for Voxtral models. -# -# Usage: ./scripts/setup-vllm.sh - -set -e - -SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" -SERVICE_DIR="$(dirname "$SCRIPT_DIR")" -VENV_DIR="$SERVICE_DIR/.venv-vllm" - -echo "============================================" -echo "vLLM Setup for Voxtral on Mac Mini M4" -echo "============================================" -echo "" - -# Check Python version -PYTHON_VERSION=$(python3 --version 2>&1 | awk '{print $2}') -PYTHON_MAJOR=$(echo $PYTHON_VERSION | cut -d. -f1) -PYTHON_MINOR=$(echo $PYTHON_VERSION | cut -d. 
-f2) - -if [[ "$PYTHON_MAJOR" -lt 3 ]] || [[ "$PYTHON_MAJOR" -eq 3 && "$PYTHON_MINOR" -lt 10 ]]; then - echo "Error: Python 3.10+ required (found $PYTHON_VERSION)" - exit 1 -fi -echo "Python version: $PYTHON_VERSION" - -# Create separate venv for vLLM (to avoid conflicts with whisper) -echo "" -echo "Creating virtual environment for vLLM..." -python3 -m venv "$VENV_DIR" -source "$VENV_DIR/bin/activate" - -# Upgrade pip -pip install --upgrade pip --quiet - -# Install vLLM with audio support -echo "" -echo "Installing vLLM with audio support..." -echo "This may take a few minutes..." - -# Install uv for faster package installation -pip install uv --quiet - -# Install vLLM with audio support (nightly for best Voxtral support) -uv pip install "vllm[audio]>=0.10.0" --extra-index-url https://wheels.vllm.ai/nightly 2>&1 || { - echo "Nightly install failed, trying stable..." - uv pip install "vllm[audio]>=0.10.0" -} - -# Install mistral-common with audio -uv pip install "mistral-common[audio]>=1.8.1" - -echo "" -echo "============================================" -echo "Installation complete!" 
-echo "============================================" -echo "" -echo "To start Voxtral Mini 3B server:" -echo " source $VENV_DIR/bin/activate" -echo " vllm serve mistralai/Voxtral-Mini-3B-2507 \\" -echo " --tokenizer_mode mistral \\" -echo " --config_format mistral \\" -echo " --load_format mistral \\" -echo " --host 0.0.0.0 \\" -echo " --port 8100" -echo "" -echo "To start Voxtral Realtime 4B server:" -echo " source $VENV_DIR/bin/activate" -echo " vllm serve mistralai/Voxtral-Mini-4B-Realtime-2602 \\" -echo " --host 0.0.0.0 \\" -echo " --port 8100" -echo "" -echo "API Endpoint: http://localhost:8100/v1/audio/transcriptions" -echo "" -echo "Test with:" -echo " curl http://localhost:8100/v1/audio/transcriptions \\" -echo " -F file=@test.mp3 \\" -echo " -F model=mistralai/Voxtral-Mini-3B-2507 \\" -echo " -F language=de" diff --git a/services/mana-stt/scripts/start-vllm-voxtral.sh b/services/mana-stt/scripts/start-vllm-voxtral.sh deleted file mode 100755 index 70259d59a..000000000 --- a/services/mana-stt/scripts/start-vllm-voxtral.sh +++ /dev/null @@ -1,41 +0,0 @@ -#!/bin/bash -# Start vLLM server for Voxtral -# -# Usage: ./scripts/start-vllm-voxtral.sh [model] -# model: "3b" (default) or "4b" for Realtime - -set -e - -SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" -SERVICE_DIR="$(dirname "$SCRIPT_DIR")" -VENV_DIR="$SERVICE_DIR/.venv-vllm" -MODEL="${1:-3b}" -PORT="${VLLM_PORT:-8100}" - -# Activate venv -source "$VENV_DIR/bin/activate" - -echo "Starting vLLM Voxtral server..." 
-echo "Port: $PORT" - -if [[ "$MODEL" == "4b" || "$MODEL" == "realtime" ]]; then - echo "Model: Voxtral Mini 4B Realtime" - exec vllm serve mistralai/Voxtral-Mini-4B-Realtime-2602 \ - --host 0.0.0.0 \ - --port "$PORT" \ - --max-model-len 4096 \ - --max-num-batched-tokens 4096 \ - --enforce-eager -else - echo "Model: Voxtral Mini 3B" - # CPU mode needs smaller context and batched tokens - exec vllm serve mistralai/Voxtral-Mini-3B-2507 \ - --tokenizer_mode mistral \ - --config_format mistral \ - --load_format mistral \ - --host 0.0.0.0 \ - --port "$PORT" \ - --max-model-len 4096 \ - --max-num-batched-tokens 4096 \ - --enforce-eager -fi diff --git a/services/mana-stt/setup.sh b/services/mana-stt/setup.sh deleted file mode 100755 index f7c878bd3..000000000 --- a/services/mana-stt/setup.sh +++ /dev/null @@ -1,123 +0,0 @@ -#!/bin/bash -# Mana STT Service Setup Script -# For Mac Mini M4 (Apple Silicon) - -set -e - -SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" -VENV_DIR="$SCRIPT_DIR/.venv" -PYTHON_VERSION="3.11" - -echo "==============================================" -echo " Mana STT Service Setup" -echo " Whisper (Lightning MLX) + Voxtral" -echo "==============================================" -echo "" - -# Check if running on macOS -if [[ "$(uname)" != "Darwin" ]]; then - echo "Warning: This script is optimized for macOS (Apple Silicon)" -fi - -# Check for Apple Silicon -if [[ "$(uname -m)" != "arm64" ]]; then - echo "Warning: Not running on Apple Silicon. MLX optimizations won't work." -fi - -# Check Python version -echo "1. Checking Python installation..." -if command -v python3.11 &> /dev/null; then - PYTHON_CMD="python3.11" -elif command -v python3 &> /dev/null; then - PYTHON_CMD="python3" - PY_VERSION=$($PYTHON_CMD --version 2>&1 | cut -d' ' -f2 | cut -d'.' -f1,2) - echo " Found Python $PY_VERSION" -else - echo "Error: Python 3 not found. 
Please install Python 3.11+" - echo " brew install python@3.11" - exit 1 -fi - -# Create virtual environment -echo "" -echo "2. Creating virtual environment..." -if [ -d "$VENV_DIR" ]; then - echo " Virtual environment already exists at $VENV_DIR" - read -p " Recreate? (y/N) " -n 1 -r - echo - if [[ $REPLY =~ ^[Yy]$ ]]; then - rm -rf "$VENV_DIR" - $PYTHON_CMD -m venv "$VENV_DIR" - echo " Virtual environment recreated" - fi -else - $PYTHON_CMD -m venv "$VENV_DIR" - echo " Virtual environment created at $VENV_DIR" -fi - -# Activate virtual environment -source "$VENV_DIR/bin/activate" - -# Upgrade pip -echo "" -echo "3. Upgrading pip..." -pip install --upgrade pip wheel setuptools - -# Install dependencies -echo "" -echo "4. Installing dependencies..." -echo " This may take several minutes (downloading large models)..." - -# Install PyTorch with MPS support first -pip install torch torchvision torchaudio - -# Install MLX for Apple Silicon -pip install mlx - -# Install other dependencies -pip install -r "$SCRIPT_DIR/requirements.txt" - -# Install scipy for audio resampling (needed by Voxtral) -pip install scipy - -echo "" -echo "5. Verifying installation..." - -# Test imports -python -c "import torch; print(f' PyTorch {torch.__version__} - MPS available: {torch.backends.mps.is_available()}')" -python -c "import mlx; print(f' MLX installed')" 2>/dev/null || echo " MLX not available (CPU fallback)" -python -c "import fastapi; print(f' FastAPI {fastapi.__version__}')" - -echo "" -echo "6. Downloading Whisper model (large-v3)..." -echo " This will download ~2.9 GB on first run..." -# Pre-download the model -python -c " -from lightning_whisper_mlx import LightningWhisperMLX -print(' Initializing Whisper model...') -whisper = LightningWhisperMLX(model='large-v3', batch_size=12) -print(' Whisper model ready!') -" || echo " Note: Model will be downloaded on first transcription request" - -echo "" -echo "==============================================" -echo " Setup Complete!" 
-echo "==============================================" -echo "" -echo "To start the STT service:" -echo "" -echo " cd $SCRIPT_DIR" -echo " source .venv/bin/activate" -echo " uvicorn app.main:app --host 0.0.0.0 --port 3020" -echo "" -echo "Or use the systemd/launchd service (recommended for production):" -echo "" -echo " ./scripts/mac-mini/setup-stt.sh" -echo "" -echo "API Endpoints:" -echo " POST /transcribe - Whisper transcription" -echo " POST /transcribe/voxtral - Voxtral transcription" -echo " POST /transcribe/auto - Auto-select best model" -echo " GET /health - Health check" -echo " GET /models - List available models" -echo "" diff --git a/services/mana-tts/CLAUDE.md b/services/mana-tts/CLAUDE.md index 6951e048b..78319c0da 100644 --- a/services/mana-tts/CLAUDE.md +++ b/services/mana-tts/CLAUDE.md @@ -1,125 +1,115 @@ -# CLAUDE.md - Mana TTS Service +# mana-tts -## Service Overview +Text-to-Speech microservice. Wraps Kokoro (English presets), Piper (German, local ONNX), and F5-TTS (voice cloning) behind a small FastAPI surface. Lives on the Windows GPU server (`mana-server-gpu`, RTX 3090). -Text-to-Speech microservice using MLX-optimized models for Apple Silicon: +> ⚠️ **Earlier history**: this directory used to contain MLX-optimized +> Mac-Mini code (`f5-tts-mlx`, `mlx-audio`, `setup.sh` with Apple Silicon +> checks, `com.mana.mana-tts.plist` launchd setup). All of that moved to +> the Windows GPU box and was removed from the repo. If you need the +> MLX path, see git history. 
-- **Port**: 3022 -- **Framework**: Python + FastAPI -- **Models**: Kokoro-82M (fast), F5-TTS (voice cloning) +## Tech Stack -## Commands +| Layer | Technology | +|-------|------------| +| **Runtime** | Python 3.11 + uvicorn (Windows) | +| **Framework** | FastAPI | +| **English (preset)** | Kokoro-82M (`kokoro_service.py`) | +| **German (local)** | Piper ONNX with `kerstin_low.onnx` and `thorsten_medium.onnx` voices (`piper_service.py`) | +| **Voice cloning** | F5-TTS on CUDA (`f5_service.py`) | +| **Audio I/O** | `soundfile`, `pydub` | +| **Auth** | Per-key + internal-key API auth (`auth.py`) + JWT via mana-auth (`external_auth.py`) | +| **VRAM** | Shared `vram_manager.py` (same module as mana-stt + mana-image-gen) | +| **Process supervision** | Windows Scheduled Task `ManaTTS` (AtLogOn) | -```bash -# Setup -./setup.sh +## Port: 3022 -# Development -source .venv/bin/activate -uvicorn app.main:app --host 0.0.0.0 --port 3022 --reload +## Where it runs -# Production (Mac Mini) -../../scripts/mac-mini/setup-tts.sh +| Host | Path on disk | Entrypoint | +|------|--------------|------------| +| Windows GPU server (`192.168.178.11`) | `C:\mana\services\mana-tts\` | `service.pyw` via Scheduled Task `ManaTTS` | -# Test -curl http://localhost:3022/health +Public URL: `https://gpu-tts.mana.how`. 
-# English (Kokoro) -curl -X POST http://localhost:3022/synthesize/kokoro \ - -H "Content-Type: application/json" \ - -d '{"text": "Hello world", "voice": "af_heart"}' \ - --output test_en.wav +## API Endpoints -# German (Piper) - use /synthesize/auto -curl -X POST http://localhost:3022/synthesize/auto \ - -H "Content-Type: application/json" \ - -d '{"text": "Hallo Welt", "voice": "de_kerstin"}' \ - --output test_de.wav +| Method | Path | Description | +|--------|------|-------------| +| GET | `/health` | Liveness + which backends are loaded | +| GET | `/models` | Available TTS models | +| GET | `/voices` | List all voices (preset + custom) | +| POST | `/voices` | Register a custom voice (reference audio + transcript) | +| DELETE | `/voices/{voice_id}` | Delete a custom voice | +| POST | `/synthesize/kokoro` | Kokoro synthesis (English presets) | +| POST | `/synthesize` | F5-TTS voice cloning | +| POST | `/synthesize/auto` | Routing helper — picks the right backend for the requested voice | + +All non-health endpoints require `Authorization: Bearer <api-key>` (per-app key, internal key, or mana-auth JWT). + +## Voices + +### Kokoro-82M (English presets) +~300 MB download. 30+ preset English voices. Fast, no reference audio needed. + +### Piper (German, local ONNX) +~63 MB per voice. 100% local, GDPR-compliant. Available: +- `de_kerstin` (female, default) +- `de_thorsten` (male) + +Fallback to Edge TTS cloud voices if Piper isn't loaded. + +### F5-TTS (voice cloning) +~6 GB. Requires reference audio + transcript. Higher quality, slower. Custom voices live in `voices/` (reference audio + transcript per voice ID). + +## Configuration (`.env` on the Windows GPU box) + +```env +PORT=3022 +PRELOAD_MODELS=false +MAX_TEXT_LENGTH=1000 +REQUIRE_AUTH=true +API_KEYS=sk-app1:app1,sk-app2:app2 +INTERNAL_API_KEY=... 
+CORS_ORIGINS=https://mana.how,https://chat.mana.how ``` -## File Structure +## Code layout ``` services/mana-tts/ ├── app/ │ ├── __init__.py -│ ├── main.py # FastAPI endpoints -│ ├── kokoro_service.py # Kokoro TTS (English preset voices) -│ ├── piper_service.py # Piper TTS (German voices, local) -│ ├── f5_service.py # F5-TTS (voice cloning) -│ ├── voice_manager.py # Custom voice registry -│ └── audio_utils.py # Audio format conversion -├── piper_voices/ # Piper voice models (.onnx) -├── voices/ # Custom F5 voice storage -├── mlx_models/ # MLX model cache -├── setup.sh # Setup script -├── requirements.txt -└── README.md +│ ├── main.py # FastAPI endpoints +│ ├── kokoro_service.py # Kokoro (English presets) +│ ├── piper_service.py # Piper (German, local ONNX) +│ ├── f5_service.py # F5-TTS (voice cloning, CUDA) +│ ├── voice_manager.py # Custom voice registry +│ ├── audio_utils.py # Format conversion, resampling +│ ├── auth.py # API-key auth +│ ├── external_auth.py # JWT validation via mana-auth +│ └── vram_manager.py # Shared VRAM accountant +└── service.pyw # Windows runner (used by ManaTTS scheduled task) ``` -## API Endpoints +The Piper voice ONNX files live alongside the service on the GPU box (`C:\mana\services\mana-tts\piper_voices\*.onnx`) — too big to commit, downloaded once during setup. 
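The shared `vram_manager.py` is only described here as VRAM accounting across mana-stt, mana-tts, and mana-image-gen. A minimal sketch of what such an accountant could look like; the class and method names are hypothetical, not the module's real API:

```python
# Hypothetical sketch of a shared VRAM accountant for the RTX 3090 services.
# Names (VRAMManager, reserve, release) are assumptions; see vram_manager.py.

class VRAMBudgetExceeded(RuntimeError):
    """Raised when a reservation would overcommit the GPU."""


class VRAMManager:
    def __init__(self, total_mb: int = 24_000):  # 3090 has ~24 GB
        self.total_mb = total_mb
        self.reservations: dict[str, int] = {}

    def used_mb(self) -> int:
        return sum(self.reservations.values())

    def free_mb(self) -> int:
        return self.total_mb - self.used_mb()

    def reserve(self, model: str, mb: int) -> None:
        """Claim `mb` MB for `model`, or raise if the budget is exhausted."""
        if self.used_mb() + mb > self.total_mb:
            raise VRAMBudgetExceeded(
                f"{model} needs {mb} MB, only {self.free_mb()} MB free"
            )
        self.reservations[model] = mb

    def release(self, model: str) -> None:
        self.reservations.pop(model, None)
```

In practice the real module would also need cross-process coordination, since the three services run as separate scheduled tasks; this sketch only shows the in-process bookkeeping idea.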
-| Endpoint | Method | Purpose | -|----------|--------|---------| -| `/health` | GET | Health check | -| `/models` | GET | Model info | -| `/voices` | GET | List all voices | -| `/voices` | POST | Register custom voice | -| `/voices/{id}` | DELETE | Delete custom voice | -| `/synthesize/kokoro` | POST | Kokoro synthesis | -| `/synthesize` | POST | F5-TTS voice cloning | -| `/synthesize/auto` | POST | Auto-select model | +## Operations -## Models +```powershell +# Status +Get-ScheduledTask -TaskName "ManaTTS" | Format-List TaskName, State +Get-NetTCPConnection -LocalPort 3022 -State Listen -### Kokoro-82M (English) -- ~300 MB download -- 30+ preset English voices -- Fast inference -- No reference audio needed +# Restart +Stop-ScheduledTask -TaskName "ManaTTS" +Start-ScheduledTask -TaskName "ManaTTS" -### Piper TTS (German) -- ~63 MB per voice model -- 100% local, GDPR-compliant -- Fast inference on CPU -- Available voices: - - `de_kerstin` - Female (default) - - `de_thorsten` - Male -- Fallback to Edge TTS (cloud) if Piper unavailable: - - `de_katja` - Female (cloud) - - `de_conrad` - Male (cloud) - - `de_amala` - Female young (cloud) - - `de_florian` - Male young (cloud) +# Logs +Get-Content C:\mana\services\mana-tts\service.log -Tail 50 +``` -### F5-TTS (Voice Cloning) -- ~6 GB download -- Voice cloning capability -- Requires reference audio + transcript -- Higher quality, slower +## Reference -## Environment Variables - -| Variable | Default | Description | -|----------|---------|-------------| -| `PORT` | `3022` | Service port | -| `PRELOAD_MODELS` | `false` | Load on startup | -| `MAX_TEXT_LENGTH` | `1000` | Max chars | -| `CORS_ORIGINS` | (production URLs) | CORS config | - -## Key Dependencies - -- `fastapi` - Web framework -- `f5-tts-mlx` - Voice cloning model -- `mlx-audio` - Kokoro implementation -- `mlx` - Apple Silicon ML framework -- `piper-tts` - German TTS (local) -- `edge-tts` - German TTS fallback (cloud) -- `soundfile` - Audio I/O -- `pydub` - MP3 
conversion - -## Development Notes - -- Models load lazily on first request (unless `PRELOAD_MODELS=true`) -- Custom voices stored in `voices/` with reference audio + transcript -- Singleton pattern for model instances -- Audio returned as raw bytes with headers for metadata +- `docs/WINDOWS_GPU_SERVER_SETUP.md` — Windows box setup, scheduled tasks, firewall, Cloudflare tunnel +- `docs/PORT_SCHEMA.md` — port assignments across services diff --git a/services/mana-tts/README.md b/services/mana-tts/README.md index 15f936ca6..fa99f7039 100644 --- a/services/mana-tts/README.md +++ b/services/mana-tts/README.md @@ -1,237 +1,36 @@ # Mana TTS -Text-to-Speech microservice with voice cloning support, optimized for Apple Silicon. +Text-to-Speech microservice running on the Windows GPU server (`mana-server-gpu`, RTX 3090). Wraps **Kokoro** (English presets), **Piper** (German, local ONNX), and **F5-TTS** (CUDA voice cloning). -## Features +For architecture, deployment, configuration, and operations see [`CLAUDE.md`](./CLAUDE.md) and [`docs/WINDOWS_GPU_SERVER_SETUP.md`](../../docs/WINDOWS_GPU_SERVER_SETUP.md). 
-- **Kokoro TTS**: Fast preset voices (~300 MB model) -- **F5-TTS**: Voice cloning with reference audio (~6 GB model) -- **MLX Optimized**: Runs efficiently on Apple Silicon -- **REST API**: FastAPI with OpenAPI documentation +## Port: 3022 -## Quick Start +## Public URL -### Setup - -```bash -# Run setup script -./setup.sh - -# Or manually -python3.11 -m venv .venv -source .venv/bin/activate -pip install -r requirements.txt -``` - -### Start Service - -```bash -source .venv/bin/activate -uvicorn app.main:app --host 0.0.0.0 --port 3022 -``` - -### Test - -```bash -# Health check -curl http://localhost:3022/health - -# Synthesize with Kokoro -curl -X POST http://localhost:3022/synthesize/kokoro \ - -H "Content-Type: application/json" \ - -d '{"text": "Hello world", "voice": "af_heart"}' \ - --output test.wav - -# Play audio (macOS) -afplay test.wav -``` +`https://gpu-tts.mana.how` (via Cloudflare Tunnel + Mac Mini gpu-proxy) ## API Endpoints -### Health & Info - | Endpoint | Method | Description | |----------|--------|-------------| -| `/health` | GET | Health check | -| `/models` | GET | Available models | -| `/voices` | GET | All available voices | - -### Synthesis - -| Endpoint | Method | Description | -|----------|--------|-------------| -| `/synthesize/kokoro` | POST | Kokoro preset voices | +| `/health` | GET | Health check + which backends are loaded | +| `/models` | GET | List available models | +| `/voices` | GET | List preset + custom voices | +| `/voices` | POST | Register a custom voice (reference audio + transcript) | +| `/voices/{id}` | DELETE | Delete a custom voice | +| `/synthesize/kokoro` | POST | Kokoro (English presets) | | `/synthesize` | POST | F5-TTS voice cloning | -| `/synthesize/auto` | POST | Auto-select model | +| `/synthesize/auto` | POST | Auto-select best backend for the requested voice | -### Voice Management +All non-health endpoints require `Authorization: Bearer <api-key>`. 
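`/synthesize/auto` is documented only as "auto-select best backend for the requested voice". A plausible routing rule can be inferred from the voice lists in this README (Kokoro presets like `af_heart`, Piper's `de_*` voices, registered custom voices for F5-TTS); this sketch encodes that inference and is not necessarily how `app/main.py` actually routes:

```python
# Sketch of /synthesize/auto routing, inferred from the documented voice
# names. The real dispatch logic in app/main.py may differ.

KOKORO_PREFIXES = ("af_", "am_", "bf_", "bm_")   # English preset families
PIPER_VOICES = {"de_kerstin", "de_thorsten"}     # local German ONNX voices


def pick_backend(voice: str, custom_voices: set[str]) -> str:
    """Map a requested voice name to a synthesis backend."""
    if voice in custom_voices:
        return "f5"      # registered custom voice -> F5-TTS voice cloning
    if voice in PIPER_VOICES:
        return "piper"   # German, fully local
    if voice.startswith(KOKORO_PREFIXES):
        return "kokoro"  # English preset
    raise ValueError(f"unknown voice: {voice}")
```

For example, `pick_backend("de_kerstin", set())` routes to Piper, while a voice ID previously registered via `POST /voices` routes to F5-TTS.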
-| Endpoint | Method | Description |
-|----------|--------|-------------|
-| `/voices` | POST | Register custom voice |
-| `/voices/{id}` | DELETE | Delete custom voice |
-
-## Synthesis Examples
-
-### Kokoro (Fast Preset Voices)
+## Quick Test
 
 ```bash
-curl -X POST http://localhost:3022/synthesize/kokoro \
+curl -X POST https://gpu-tts.mana.how/synthesize/kokoro \
+  -H "Authorization: Bearer $INTERNAL_API_KEY" \
   -H "Content-Type: application/json" \
-  -d '{
-    "text": "Welcome to Mana TTS, your personal voice synthesis service.",
-    "voice": "af_heart",
-    "speed": 1.0,
-    "output_format": "wav"
-  }' \
-  --output output.wav
+  -d '{"text":"Hello world","voice":"af_heart"}' \
+  --output test.wav
 ```
-
-### F5-TTS (Voice Cloning)
-
-```bash
-# With reference audio upload
-curl -X POST http://localhost:3022/synthesize \
-  -F "text=Hello, this is a cloned voice speaking." \
-  -F "reference_audio=@reference.wav" \
-  -F "reference_text=This is what the reference audio says." \
-  -F "output_format=wav" \
-  --output cloned.wav
-
-# With registered voice
-curl -X POST http://localhost:3022/synthesize \
-  -F "text=Hello from my registered voice." \
-  -F "voice_id=my_custom_voice" \
-  --output output.wav
-```
-
-### Auto-Select
-
-```bash
-# Uses Kokoro for preset voices, F5-TTS for custom
-curl -X POST http://localhost:3022/synthesize/auto \
-  -H "Content-Type: application/json" \
-  -d '{"text": "Auto-selected synthesis", "voice": "af_bella"}' \
-  --output output.wav
-```
-
-## Available Kokoro Voices
-
-### American Female
-- `af_heart` - Warm, emotional (default)
-- `af_alloy` - Neutral, professional
-- `af_bella` - Friendly, approachable
-- `af_jessica` - Confident, clear
-- `af_nicole` - Bright, energetic
-- `af_nova` - Modern, dynamic
-- `af_sarah` - Warm, conversational
-- ... and more
-
-### American Male
-- `am_adam` - Deep, authoritative
-- `am_echo` - Resonant, clear
-- `am_eric` - Professional, neutral
-- `am_michael` - Warm, trustworthy
-- ... and more
-
-### British Female
-- `bf_alice` - Refined, elegant
-- `bf_emma` - Clear, professional
-- `bf_lily` - Soft, gentle
-
-### British Male
-- `bm_daniel` - Classic, authoritative
-- `bm_fable` - Storyteller, expressive
-- `bm_george` - Traditional, clear
-
-## Voice Registration
-
-Register a custom voice for F5-TTS voice cloning:
-
-```bash
-curl -X POST http://localhost:3022/voices \
-  -F "voice_id=my_voice" \
-  -F "name=My Custom Voice" \
-  -F "description=A sample voice for testing" \
-  -F "transcript=Hello, this is the text spoken in the reference audio." \
-  -F "reference_audio=@my_reference.wav"
-```
-
-Pre-defined voices can also be placed in the `voices/` directory:
-
-```
-voices/
-└── my_voice/
-    ├── reference.wav    # Reference audio (required)
-    ├── transcript.txt   # Transcript of reference (required)
-    └── metadata.json    # Name and description (optional)
-```
-
-## Configuration
-
-| Variable | Default | Description |
-|----------|---------|-------------|
-| `PORT` | `3022` | API port |
-| `PRELOAD_MODELS` | `false` | Load models on startup |
-| `MAX_TEXT_LENGTH` | `1000` | Max characters per request |
-| `CORS_ORIGINS` | `https://mana.how,...` | Allowed CORS origins |
-| `F5_MODEL` | `lucasnewman/f5-tts-mlx` | F5-TTS model |
-| `KOKORO_MODEL` | `mlx-community/Kokoro-82M-bf16` | Kokoro model |
-
-## Mac Mini Deployment
-
-```bash
-# Install and start as launchd service
-../../scripts/mac-mini/setup-tts.sh
-
-# Service management
-launchctl list | grep com.mana.tts
-launchctl unload ~/Library/LaunchAgents/com.mana.tts.plist
-launchctl load ~/Library/LaunchAgents/com.mana.tts.plist
-
-# View logs
-tail -f /tmp/mana-tts.log
-```
-
-## Requirements
-
-- Python 3.10+
-- macOS with Apple Silicon (recommended)
-- ~7 GB disk space for models
-- 16 GB RAM recommended
-- ffmpeg (for MP3 output)
-
-## Troubleshooting
-
-### Models Not Loading
-
-```bash
-# Check MLX installation
-python -c "import mlx; print(mlx.__version__)"
-
-# Check mlx-audio
-python -c "import mlx_audio; print('OK')"
-
-# Check f5-tts-mlx
-python -c "from f5_tts_mlx import F5TTS; print('OK')"
-```
-
-### MP3 Output Not Working
-
-```bash
-# Install ffmpeg
-brew install ffmpeg
-
-# Verify
-ffmpeg -version
-```
-
-### Memory Issues
-
-- Reduce `MAX_TEXT_LENGTH` for less memory usage
-- Set `PRELOAD_MODELS=false` for lazy loading
-- F5-TTS requires ~6 GB, Kokoro ~500 MB
-
-## API Documentation
-
-When running, visit http://localhost:3022/docs for interactive API documentation.
diff --git a/services/mana-tts/com.mana.mana-tts.plist b/services/mana-tts/com.mana.mana-tts.plist
deleted file mode 100644
index 084e39afb..000000000
--- a/services/mana-tts/com.mana.mana-tts.plist
+++ /dev/null
@@ -1,39 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
-<plist version="1.0">
-<dict>
-    <key>Label</key>
-    <string>com.mana.mana-tts</string>
-
-    <key>ProgramArguments</key>
-    <array>
-        <string>/bin/bash</string>
-        <string>-c</string>
-        <string>cd /Users/mana/projects/mana-monorepo/services/mana-tts &amp;&amp; set -a &amp;&amp; source .env &amp;&amp; set +a &amp;&amp; .venv/bin/uvicorn app.main:app --host 0.0.0.0 --port 3022</string>
-    </array>
-
-    <key>WorkingDirectory</key>
-    <string>/Users/mana/projects/mana-monorepo/services/mana-tts</string>
-
-    <key>EnvironmentVariables</key>
-    <dict>
-        <key>PATH</key>
-        <string>/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin</string>
-    </dict>
-
-    <key>RunAtLoad</key>
-    <true/>
-
-    <key>KeepAlive</key>
-    <true/>
-
-    <key>StandardOutPath</key>
-    <string>/Users/mana/logs/mana-tts.log</string>
-
-    <key>StandardErrorPath</key>
-    <string>/Users/mana/logs/mana-tts.error.log</string>
-
-    <key>ThrottleInterval</key>
-    <integer>10</integer>
-</dict>
-</plist>
diff --git a/services/mana-tts/install-service.sh b/services/mana-tts/install-service.sh
deleted file mode 100755
index 15a153af1..000000000
--- a/services/mana-tts/install-service.sh
+++ /dev/null
@@ -1,45 +0,0 @@
-#!/bin/bash
-# Install mana-tts as a launchd service on macOS
-# Run this script on the Mac Mini server
-
-set -e
-
-SERVICE_NAME="com.mana.mana-tts"
-PLIST_FILE="$SERVICE_NAME.plist"
-SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
-LAUNCH_AGENTS_DIR="$HOME/Library/LaunchAgents"
-LOG_DIR="$HOME/logs"
-
-echo "Installing mana-tts launchd service..."
-
-# Create logs directory
-mkdir -p "$LOG_DIR"
-
-# Stop existing service if running
-if launchctl list | grep -q "$SERVICE_NAME"; then
-    echo "Stopping existing service..."
-    launchctl unload "$LAUNCH_AGENTS_DIR/$PLIST_FILE" 2>/dev/null || true
-fi
-
-# Copy plist to LaunchAgents
-cp "$SCRIPT_DIR/$PLIST_FILE" "$LAUNCH_AGENTS_DIR/"
-
-# Load the service
-echo "Loading service..."
-launchctl load "$LAUNCH_AGENTS_DIR/$PLIST_FILE"
-
-# Check status
-sleep 2
-if launchctl list | grep -q "$SERVICE_NAME"; then
-    echo "Service installed and running!"
-    echo ""
-    echo "Useful commands:"
-    echo "  View logs:    tail -f $LOG_DIR/mana-tts.log"
-    echo "  View errors:  tail -f $LOG_DIR/mana-tts.error.log"
-    echo "  Stop:         launchctl unload $LAUNCH_AGENTS_DIR/$PLIST_FILE"
-    echo "  Start:        launchctl load $LAUNCH_AGENTS_DIR/$PLIST_FILE"
-    echo "  Health check: curl http://localhost:3022/health"
-else
-    echo "ERROR: Service failed to start. Check logs at $LOG_DIR/mana-tts.error.log"
-    exit 1
-fi
diff --git a/services/mana-tts/setup.sh b/services/mana-tts/setup.sh
deleted file mode 100755
index 280bfa625..000000000
--- a/services/mana-tts/setup.sh
+++ /dev/null
@@ -1,150 +0,0 @@
-#!/bin/bash
-# Setup script for Mana TTS service
-# Optimized for Apple Silicon (MLX)
-
-set -e
-
-SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
-VENV_DIR="$SCRIPT_DIR/.venv"
-PYTHON_VERSION="3.11"
-
-echo "=========================================="
-echo "Mana TTS Setup"
-echo "=========================================="
-echo ""
-
-# Check platform
-if [[ "$(uname)" != "Darwin" ]]; then
-    echo "Warning: This service is optimized for macOS with Apple Silicon."
-    echo "Some features may not work on other platforms."
-    echo ""
-fi
-
-# Check for Apple Silicon
-if [[ "$(uname -m)" != "arm64" ]]; then
-    echo "Warning: This service is optimized for Apple Silicon (arm64)."
-    echo "Performance may be reduced on Intel Macs."
-    echo ""
-fi
-
-# Find Python
-if command -v python3.11 &> /dev/null; then
-    PYTHON_CMD="python3.11"
-elif command -v python3 &> /dev/null; then
-    PYTHON_CMD="python3"
-else
-    echo "Error: Python 3 not found. Please install Python 3.11 or later."
-    exit 1
-fi
-
-echo "Using Python: $PYTHON_CMD"
-$PYTHON_CMD --version
-echo ""
-
-# Check Python version
-PYTHON_MAJOR=$($PYTHON_CMD -c "import sys; print(sys.version_info.major)")
-PYTHON_MINOR=$($PYTHON_CMD -c "import sys; print(sys.version_info.minor)")
-
-if [[ $PYTHON_MAJOR -lt 3 ]] || [[ $PYTHON_MINOR -lt 10 ]]; then
-    echo "Error: Python 3.10 or later required. Found $PYTHON_MAJOR.$PYTHON_MINOR"
-    exit 1
-fi
-
-# Create or recreate virtual environment
-if [[ -d "$VENV_DIR" ]]; then
-    echo "Virtual environment exists at $VENV_DIR"
-    read -p "Recreate it? (y/N) " -n 1 -r
-    echo ""
-    if [[ $REPLY =~ ^[Yy]$ ]]; then
-        echo "Removing existing virtual environment..."
-        rm -rf "$VENV_DIR"
-        echo "Creating new virtual environment..."
-        $PYTHON_CMD -m venv "$VENV_DIR"
-    fi
-else
-    echo "Creating virtual environment..."
-    $PYTHON_CMD -m venv "$VENV_DIR"
-fi
-
-# Activate virtual environment
-echo "Activating virtual environment..."
-source "$VENV_DIR/bin/activate"
-
-# Upgrade pip
-echo ""
-echo "Upgrading pip..."
-pip install --upgrade pip
-
-# Install dependencies
-echo ""
-echo "Installing dependencies..."
-pip install -r "$SCRIPT_DIR/requirements.txt"
-
-# Install ffmpeg check (for MP3 support)
-echo ""
-echo "Checking for ffmpeg (required for MP3 output)..."
-if command -v ffmpeg &> /dev/null; then
-    echo "ffmpeg found: $(which ffmpeg)"
-else
-    echo "Warning: ffmpeg not found. MP3 output will not work."
-    echo "Install with: brew install ffmpeg"
-fi
-
-# Verify installations
-echo ""
-echo "Verifying installations..."
-
-# Test FastAPI
-python -c "import fastapi; print(f'FastAPI {fastapi.__version__}')" || {
-    echo "Error: FastAPI not installed correctly"
-    exit 1
-}
-
-# Test soundfile
-python -c "import soundfile; print(f'soundfile {soundfile.__version__}')" || {
-    echo "Error: soundfile not installed correctly"
-    exit 1
-}
-
-# Test MLX (on Apple Silicon)
-if [[ "$(uname -m)" == "arm64" ]]; then
-    python -c "import mlx; print(f'MLX {mlx.__version__}')" || {
-        echo "Warning: MLX not installed correctly. TTS may not work."
-    }
-fi
-
-# Test mlx-audio
-python -c "import mlx_audio; print('mlx-audio installed')" 2>/dev/null || {
-    echo "Warning: mlx-audio not imported successfully."
-    echo "You may need to install it manually or models won't load."
-}
-
-# Create directories
-echo ""
-echo "Creating required directories..."
-mkdir -p "$SCRIPT_DIR/voices"
-mkdir -p "$SCRIPT_DIR/mlx_models"
-
-echo ""
-echo "=========================================="
-echo "Setup Complete!"
-echo "=========================================="
-echo ""
-echo "To start the service:"
-echo ""
-echo "  cd $SCRIPT_DIR"
-echo "  source .venv/bin/activate"
-echo "  uvicorn app.main:app --host 0.0.0.0 --port 3022"
-echo ""
-echo "Or for development with auto-reload:"
-echo ""
-echo "  uvicorn app.main:app --host 0.0.0.0 --port 3022 --reload"
-echo ""
-echo "Test the service:"
-echo ""
-echo "  curl http://localhost:3022/health"
-echo ""
-echo "For Mac Mini deployment, run:"
-echo ""
-echo "  ./../../scripts/mac-mini/setup-tts.sh"
-echo ""