chore(cutover): remove services/mana-tts/ — moved to mana-platform

Live containers on the Mac Mini build out of `../mana/services/mana-tts/`
since the 8-Doppel-Cutover commit (774852ba2). Smoke test green
2026-05-08 — health endpoints, JWKS, login flow, and Stripe webhook all
reachable from the new build path. Removing the now-stale duplicate.

Was 148K in this repo, gone now. Active code lives in
`Code/mana/services/mana-tts/` (see ../mana/CLAUDE.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Till JS 2026-05-08 18:53:53 +02:00
parent 3c4a6d4f69
commit 6103d4d2d9
19 changed files with 0 additions and 3360 deletions

@@ -1,36 +0,0 @@
# Mana TTS Service Configuration
# Copy to .env and adjust values as needed
# Server
PORT=3022
# Models
# Set to true to preload models on startup (slower startup, faster first request)
PRELOAD_MODELS=false
# Text Limits
MAX_TEXT_LENGTH=1000
# CORS Origins (comma-separated)
CORS_ORIGINS=https://mana.how,https://chat.mana.how,http://localhost:5173
# ===========================================
# Authentication
# ===========================================
# Enable API key authentication (default: true for production)
REQUIRE_AUTH=true
# API Keys (comma-separated, format: key:name)
# Example: sk-abc123:myapp,sk-def456:testuser
API_KEYS=
# Internal API key (no rate limit, for internal services)
# Generate with: openssl rand -hex 32
INTERNAL_API_KEY=
# Rate Limiting
# Requests per window per API key
RATE_LIMIT_REQUESTS=60
# Window size in seconds
RATE_LIMIT_WINDOW=60

@@ -1,127 +0,0 @@
# mana-tts
Text-to-Speech microservice. Wraps Kokoro (English presets), Piper (German, local ONNX), Orpheus-3B (German, high-quality pre-generation), Zonos (expressive multilingual), and F5-TTS (voice cloning) behind a small FastAPI surface. Lives on the Windows GPU server (`mana-server-gpu`, RTX 3090).
> ⚠️ **Earlier history**: this directory used to contain MLX-optimized
> Mac-Mini code (`f5-tts-mlx`, `mlx-audio`, `setup.sh` with Apple Silicon
> checks, `com.mana.mana-tts.plist` launchd setup). All of that moved to
> the Windows GPU box and was removed from the repo. If you need the
> MLX path, see git history.
## Tech Stack
| Layer | Technology |
|-------|------------|
| **Runtime** | Python 3.11 + uvicorn (Windows) |
| **Framework** | FastAPI |
| **English (preset)** | Kokoro-82M (`kokoro_service.py`) |
| **German (local)** | Piper ONNX with `kerstin_low.onnx` and `thorsten_medium.onnx` voices (`piper_service.py`) |
| **German (high-quality)** | Orpheus-3B German finetune (`orpheus_service.py`) — best for pre-generation |
| **Multilingual (expressive)** | Zonos v0.1 by Zyphra (`zonos_service.py`) — emotion control, 200k hours training |
| **Voice cloning** | F5-TTS on CUDA (`f5_service.py`) |
| **Audio I/O** | `soundfile`, `pydub` |
| **Auth** | Per-key + internal-key API auth (`auth.py`) + JWT via mana-auth (`external_auth.py`) |
| **VRAM** | Shared `vram_manager.py` (same module as mana-stt + mana-image-gen) |
| **Process supervision** | Windows Scheduled Task `ManaTTS` (AtLogOn) |
## Port: 3022
## Where it runs
| Host | Path on disk | Entrypoint |
|------|--------------|------------|
| Windows GPU server (`192.168.178.11`) | `C:\mana\services\mana-tts\` | `service.pyw` via Scheduled Task `ManaTTS` |
Public URL: `https://gpu-tts.mana.how`.
## API Endpoints
| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Liveness + which backends are loaded |
| GET | `/models` | Available TTS models |
| GET | `/voices` | List all voices (preset + custom) |
| POST | `/voices` | Register a custom voice (reference audio + transcript) |
| DELETE | `/voices/{voice_id}` | Delete a custom voice |
| POST | `/synthesize/kokoro` | Kokoro synthesis (English presets) |
| POST | `/synthesize` | F5-TTS voice cloning |
| POST | `/synthesize/orpheus` | Orpheus synthesis (German, high-quality, pre-generation) |
| POST | `/synthesize/zonos` | Zonos synthesis (multilingual, expressive, emotion control) |
| POST | `/synthesize/auto` | Routing helper — picks the right backend for the requested voice |
All non-health endpoints require `Authorization: Bearer <token>` (per-app key, internal key, or mana-auth JWT).
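`/synthesize/auto` picks a backend from the requested voice ID. The real routing table lives in `main.py` (not shown here); the following is a plausible sketch inferred from the voice naming in this document — the prefixes and the fall-through to F5-TTS are assumptions, not the actual code:

```python
def route_backend(voice: str) -> str:
    """Map a voice ID to a TTS backend (illustrative only)."""
    if voice.startswith(("af_", "am_", "bf_", "bm_")):
        return "kokoro"   # English preset voices
    if voice.startswith("de_"):
        return "piper"    # German local ONNX voices
    if voice in ("tara", "leo", "emma"):
        return "orpheus"  # Orpheus built-in German speakers
    return "f5"           # custom/cloned voices fall through to F5-TTS

print(route_backend("af_heart"))  # kokoro
```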
## Voices
### Kokoro-82M (English presets)
~300 MB download. 30+ preset English voices. Fast, no reference audio needed.
### Piper (German, local ONNX)
~63 MB per voice. 100% local, GDPR-compliant. Available:
- `de_kerstin` (female, default)
- `de_thorsten` (male)
Fallback to Edge TTS cloud voices if Piper isn't loaded.
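The Edge TTS fallback amounts to an availability check before synthesis. A minimal sketch — the mapping below is an assumption for illustration (Edge voice IDs do follow the `de-DE-*Neural` pattern, but the service's actual pairing is not shown in this diff):

```python
# Hypothetical mapping from local Piper voices to Edge TTS cloud fallbacks
EDGE_FALLBACK = {
    "de_kerstin": "de-DE-KatjaNeural",
    "de_thorsten": "de-DE-ConradNeural",
}

def pick_german_voice(requested: str, piper_loaded: bool) -> tuple[str, str]:
    """Return (engine, voice): Piper when loaded, else the Edge TTS fallback."""
    if piper_loaded:
        return ("piper", requested)
    return ("edge", EDGE_FALLBACK.get(requested, "de-DE-KatjaNeural"))
```

Note that falling back to a cloud voice trades away the "100% local, GDPR-compliant" property of the Piper path.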
### Orpheus-3B German (high-quality pre-generation)
~8 GB VRAM. German finetune (`Kartoffel/Orpheus-3B_german_natural-v0.1`). Natural intonation, built-in speaker voices (tara, leo, emma, ...). Best quality for pre-generating static audio files. Not real-time.
### Zonos v0.1 (expressive multilingual)
~5 GB VRAM. By Zyphra, trained on 200k hours. Explicit German support. Fine-grained control: emotion (neutral/friendly/warm/curious), speaking rate, pitch variation. Can clone voices from 5s reference audio.
### F5-TTS (voice cloning)
~6 GB. Requires reference audio + transcript. Higher quality, slower. Custom voices live in `voices/` (reference audio + transcript per voice ID).
## Configuration (`.env` on the Windows GPU box)
```env
PORT=3022
PRELOAD_MODELS=false
MAX_TEXT_LENGTH=1000
REQUIRE_AUTH=true
API_KEYS=sk-app1:app1,sk-app2:app2
INTERNAL_API_KEY=...
CORS_ORIGINS=https://mana.how,https://chat.mana.how
```
## Code layout
```
services/mana-tts/
├── app/
│ ├── __init__.py
│ ├── main.py # FastAPI endpoints
│ ├── kokoro_service.py # Kokoro (English presets)
│ ├── piper_service.py # Piper (German, local ONNX)
│ ├── f5_service.py # F5-TTS (voice cloning, CUDA)
│ ├── orpheus_service.py # Orpheus-3B German (high-quality)
│ ├── zonos_service.py # Zonos v0.1 (expressive multilingual)
│ ├── voice_manager.py # Custom voice registry
│ ├── audio_utils.py # Format conversion, resampling
│ ├── auth.py # API-key auth
│ ├── external_auth.py # JWT validation via mana-auth
│ └── vram_manager.py # Shared VRAM accountant
└── service.pyw # Windows runner (used by ManaTTS scheduled task)
```
The Piper voice ONNX files live alongside the service on the GPU box (`C:\mana\services\mana-tts\piper_voices\*.onnx`) — too big to commit, downloaded once during setup.
## Operations
```powershell
# Status
Get-ScheduledTask -TaskName "ManaTTS" | Format-List TaskName, State
Get-NetTCPConnection -LocalPort 3022 -State Listen
# Restart
Stop-ScheduledTask -TaskName "ManaTTS"
Start-ScheduledTask -TaskName "ManaTTS"
# Logs
Get-Content C:\mana\services\mana-tts\service.log -Tail 50
```
## Reference
- `docs/WINDOWS_GPU_SERVER_SETUP.md` — Windows box setup, scheduled tasks, firewall, Cloudflare tunnel
- `docs/PORT_SCHEMA.md` — port assignments across services

@@ -1,36 +0,0 @@
# Mana TTS
Text-to-Speech microservice running on the Windows GPU server (`mana-server-gpu`, RTX 3090). Wraps **Kokoro** (English presets), **Piper** (German, local ONNX), and **F5-TTS** (CUDA voice cloning).
For architecture, deployment, configuration, and operations see [`CLAUDE.md`](./CLAUDE.md) and [`docs/WINDOWS_GPU_SERVER_SETUP.md`](../../docs/WINDOWS_GPU_SERVER_SETUP.md).
## Port: 3022
## Public URL
`https://gpu-tts.mana.how` (via Cloudflare Tunnel + Mac Mini gpu-proxy)
## API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check + which backends are loaded |
| `/models` | GET | List available models |
| `/voices` | GET | List preset + custom voices |
| `/voices` | POST | Register a custom voice (reference audio + transcript) |
| `/voices/{id}` | DELETE | Delete a custom voice |
| `/synthesize/kokoro` | POST | Kokoro (English presets) |
| `/synthesize` | POST | F5-TTS voice cloning |
| `/synthesize/auto` | POST | Auto-select best backend for the requested voice |
All non-health endpoints require `Authorization: Bearer <token>`.
## Quick Test
```bash
curl -X POST https://gpu-tts.mana.how/synthesize/kokoro \
-H "Authorization: Bearer $INTERNAL_API_KEY" \
-H "Content-Type: application/json" \
-d '{"text":"Hello world","voice":"af_heart"}' \
--output test.wav
```

@@ -1,224 +0,0 @@
"""
Audio conversion utilities for the TTS service.
Handles format conversion between WAV and MP3.
"""
import io
import logging
import tempfile
from pathlib import Path
from typing import Optional
import numpy as np
import soundfile as sf
logger = logging.getLogger(__name__)
# Supported output formats
SUPPORTED_FORMATS = ["wav", "mp3"]
DEFAULT_FORMAT = "wav"
DEFAULT_SAMPLE_RATE = 24000
def audio_to_wav_bytes(
audio_data: np.ndarray,
sample_rate: int = DEFAULT_SAMPLE_RATE,
) -> bytes:
"""
Convert numpy audio array to WAV bytes.
Args:
audio_data: Audio samples as numpy array
sample_rate: Sample rate in Hz
Returns:
WAV file as bytes
"""
buffer = io.BytesIO()
sf.write(buffer, audio_data, sample_rate, format="WAV")
buffer.seek(0)
return buffer.read()
def audio_to_mp3_bytes(
audio_data: np.ndarray,
sample_rate: int = DEFAULT_SAMPLE_RATE,
bitrate: str = "192k",
) -> bytes:
"""
Convert numpy audio array to MP3 bytes.
Requires ffmpeg to be installed.
Args:
audio_data: Audio samples as numpy array
sample_rate: Sample rate in Hz
bitrate: MP3 bitrate (e.g., "128k", "192k", "320k")
Returns:
MP3 file as bytes
"""
try:
from pydub import AudioSegment
except ImportError:
logger.error("pydub not installed, falling back to WAV")
return audio_to_wav_bytes(audio_data, sample_rate)
# First convert to WAV
wav_bytes = audio_to_wav_bytes(audio_data, sample_rate)
# Then convert to MP3 using pydub
try:
audio_segment = AudioSegment.from_wav(io.BytesIO(wav_bytes))
buffer = io.BytesIO()
audio_segment.export(buffer, format="mp3", bitrate=bitrate)
buffer.seek(0)
return buffer.read()
except Exception as e:
logger.error(f"MP3 conversion failed: {e}, falling back to WAV")
return wav_bytes
def convert_audio(
audio_data: np.ndarray,
sample_rate: int = DEFAULT_SAMPLE_RATE,
output_format: str = DEFAULT_FORMAT,
) -> tuple[bytes, str]:
"""
Convert audio data to the specified format.
Args:
audio_data: Audio samples as numpy array
sample_rate: Sample rate in Hz
output_format: Output format ("wav" or "mp3")
Returns:
Tuple of (audio bytes, content type)
"""
output_format = output_format.lower()
if output_format not in SUPPORTED_FORMATS:
logger.warning(f"Unsupported format '{output_format}', using WAV")
output_format = "wav"
if output_format == "mp3":
return audio_to_mp3_bytes(audio_data, sample_rate), "audio/mpeg"
else:
return audio_to_wav_bytes(audio_data, sample_rate), "audio/wav"
def get_content_type(format: str) -> str:
"""Get MIME content type for audio format."""
content_types = {
"wav": "audio/wav",
"mp3": "audio/mpeg",
}
return content_types.get(format.lower(), "audio/wav")
def load_reference_audio(
file_path: str | Path,
) -> tuple[np.ndarray, int]:
"""
Load reference audio file for voice cloning.
Args:
file_path: Path to the audio file
Returns:
Tuple of (audio data as numpy array, sample rate)
"""
audio_data, sample_rate = sf.read(file_path)
# Convert to mono if stereo
if len(audio_data.shape) > 1:
audio_data = np.mean(audio_data, axis=1)
return audio_data, sample_rate
def resample_audio(
audio_data: np.ndarray,
original_sr: int,
target_sr: int = DEFAULT_SAMPLE_RATE,
) -> np.ndarray:
"""
Resample audio to target sample rate.
Args:
audio_data: Audio samples as numpy array
original_sr: Original sample rate
target_sr: Target sample rate
Returns:
Resampled audio data
"""
if original_sr == target_sr:
return audio_data
from scipy import signal
# Calculate resampling ratio
num_samples = int(len(audio_data) * target_sr / original_sr)
resampled = signal.resample(audio_data, num_samples)
return resampled.astype(np.float32)
def normalize_audio(
audio_data: np.ndarray,
target_db: float = -3.0,
) -> np.ndarray:
"""
Normalize audio to target dB level.
Args:
audio_data: Audio samples as numpy array
target_db: Target peak level in dB
Returns:
Normalized audio data
"""
# Calculate current peak
peak = np.max(np.abs(audio_data))
if peak == 0:
return audio_data
# Calculate target peak from dB
target_peak = 10 ** (target_db / 20)
# Apply gain
gain = target_peak / peak
return audio_data * gain
def save_temp_audio(
audio_bytes: bytes,
suffix: str = ".wav",
) -> str:
"""
Save audio bytes to a temporary file.
Args:
audio_bytes: Audio data as bytes
suffix: File extension
Returns:
Path to temporary file
"""
with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
tmp.write(audio_bytes)
return tmp.name
def cleanup_temp_file(file_path: str) -> None:
"""
Clean up a temporary file.
Args:
file_path: Path to the file to delete
"""
try:
Path(file_path).unlink()
except Exception:
pass # Silent cleanup failure
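As a quick sanity check on the gain math in `normalize_audio` above (stdlib only, no numpy — editor's worked example):

```python
# A -3 dBFS target corresponds to a linear peak of 10 ** (-3 / 20) ≈ 0.708
target_peak = 10 ** (-3.0 / 20)

samples = [0.1, -0.5, 0.25]
peak = max(abs(s) for s in samples)   # current peak: 0.5
gain = target_peak / peak
normalized = [s * gain for s in samples]
# The loudest sample now sits exactly at the -3 dB target peak
```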

@@ -1,271 +0,0 @@
"""
API Key Authentication for ManaCore STT Service
Supports two authentication modes:
1. Local API keys: Configured via environment variables
2. External API keys: Validated via mana-core-auth service (when EXTERNAL_AUTH_ENABLED=true)
Usage:
# Local keys
API_KEYS=sk-key1:name1,sk-key2:name2
INTERNAL_API_KEY=sk-internal-xxx
# External auth (for user-created keys via mana.how)
EXTERNAL_AUTH_ENABLED=true
MANA_CORE_AUTH_URL=http://localhost:3001
"""
import os
import time
import logging
from typing import Optional
from collections import defaultdict
from dataclasses import dataclass, field
from fastapi import HTTPException, Security, Request
from fastapi.security import APIKeyHeader
from .external_auth import (
is_external_auth_enabled,
validate_api_key_external,
ExternalValidationResult,
)
logger = logging.getLogger(__name__)
# Configuration
API_KEYS_ENV = os.getenv("API_KEYS", "") # Format: "sk-key1:name1,sk-key2:name2"
INTERNAL_API_KEY = os.getenv("INTERNAL_API_KEY", "") # Unlimited internal key
REQUIRE_AUTH = os.getenv("REQUIRE_AUTH", "true").lower() == "true"
RATE_LIMIT_REQUESTS = int(os.getenv("RATE_LIMIT_REQUESTS", "60")) # Per minute
RATE_LIMIT_WINDOW = int(os.getenv("RATE_LIMIT_WINDOW", "60")) # Seconds
@dataclass
class APIKey:
"""API Key with metadata."""
key: str
name: str
is_internal: bool = False
rate_limit: int = RATE_LIMIT_REQUESTS # Requests per window
@dataclass
class RateLimitInfo:
"""Rate limit tracking per key."""
requests: list = field(default_factory=list)
def is_allowed(self, limit: int, window: int) -> bool:
"""Check if request is allowed within rate limit."""
now = time.time()
# Remove old requests outside window
self.requests = [t for t in self.requests if now - t < window]
if len(self.requests) >= limit:
return False
self.requests.append(now)
return True
def remaining(self, limit: int, window: int) -> int:
"""Get remaining requests in current window."""
now = time.time()
self.requests = [t for t in self.requests if now - t < window]
return max(0, limit - len(self.requests))
# Parse API keys from environment
def _parse_api_keys() -> dict[str, APIKey]:
"""Parse API keys from environment variables."""
keys = {}
# Parse comma-separated keys
if API_KEYS_ENV:
for entry in API_KEYS_ENV.split(","):
entry = entry.strip()
if ":" in entry:
key, name = entry.split(":", 1)
else:
key, name = entry, "default"
keys[key.strip()] = APIKey(key=key.strip(), name=name.strip())
# Add internal key with no rate limit
if INTERNAL_API_KEY:
keys[INTERNAL_API_KEY] = APIKey(
key=INTERNAL_API_KEY,
name="internal",
is_internal=True,
rate_limit=999999, # Effectively unlimited
)
return keys
# Global state
_api_keys = _parse_api_keys()
_rate_limits: dict[str, RateLimitInfo] = defaultdict(RateLimitInfo)
# Security scheme
api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)
@dataclass
class AuthResult:
"""Result of authentication check."""
authenticated: bool
key_name: Optional[str] = None
is_internal: bool = False
rate_limit_remaining: Optional[int] = None
user_id: Optional[str] = None # Set when using external auth
async def verify_api_key(
request: Request,
api_key: Optional[str] = Security(api_key_header),
) -> AuthResult:
"""
Verify API key and check rate limits.
Supports two authentication modes:
1. External auth via mana-core-auth (for sk_live_ keys)
2. Local auth via environment variables
Returns AuthResult with authentication status.
Raises HTTPException if auth fails or rate limited.
"""
# Skip auth for health and docs endpoints
path = request.url.path
if path in ["/health", "/docs", "/openapi.json", "/redoc"]:
return AuthResult(authenticated=True, key_name="public")
# If auth not required, allow all
if not REQUIRE_AUTH:
return AuthResult(authenticated=True, key_name="anonymous")
# Check for API key
if not api_key:
logger.warning(f"Missing API key for {path} from {request.client.host if request.client else 'unknown'}")
raise HTTPException(
status_code=401,
detail="Missing API key. Provide X-API-Key header.",
headers={"WWW-Authenticate": "ApiKey"},
)
# Try external auth first for sk_live_ keys (user-created keys via mana.how)
if api_key.startswith("sk_live_") and is_external_auth_enabled():
external_result = await validate_api_key_external(api_key, "stt")
if external_result is not None:
if external_result.valid:
# Use rate limits from external auth
rate_info = _rate_limits[api_key]
limit = external_result.rate_limit_requests
window = external_result.rate_limit_window
if not rate_info.is_allowed(limit, window):
remaining = rate_info.remaining(limit, window)
logger.warning(f"Rate limit exceeded for external key")
raise HTTPException(
status_code=429,
detail=f"Rate limit exceeded. Try again in {window} seconds.",
headers={
"X-RateLimit-Limit": str(limit),
"X-RateLimit-Remaining": str(remaining),
"X-RateLimit-Reset": str(int(time.time()) + window),
"Retry-After": str(window),
},
)
remaining = rate_info.remaining(limit, window)
logger.debug(f"Authenticated external request from user {external_result.user_id} to {path}")
return AuthResult(
authenticated=True,
key_name="external",
is_internal=False,
rate_limit_remaining=remaining,
user_id=external_result.user_id,
)
else:
# External auth returned invalid
logger.warning(f"External auth failed: {external_result.error}")
raise HTTPException(
status_code=401,
detail=external_result.error or "Invalid API key.",
headers={"WWW-Authenticate": "ApiKey"},
)
# If external_result is None, fall through to local auth
# Local auth: Validate key against environment variables
if api_key not in _api_keys:
logger.warning(f"Invalid API key attempt for {path}")
raise HTTPException(
status_code=401,
detail="Invalid API key.",
headers={"WWW-Authenticate": "ApiKey"},
)
key_info = _api_keys[api_key]
# Check rate limit (skip for internal keys)
if not key_info.is_internal:
rate_info = _rate_limits[api_key]
if not rate_info.is_allowed(key_info.rate_limit, RATE_LIMIT_WINDOW):
remaining = rate_info.remaining(key_info.rate_limit, RATE_LIMIT_WINDOW)
logger.warning(f"Rate limit exceeded for key '{key_info.name}'")
raise HTTPException(
status_code=429,
detail=f"Rate limit exceeded. Try again in {RATE_LIMIT_WINDOW} seconds.",
headers={
"X-RateLimit-Limit": str(key_info.rate_limit),
"X-RateLimit-Remaining": str(remaining),
"X-RateLimit-Reset": str(int(time.time()) + RATE_LIMIT_WINDOW),
"Retry-After": str(RATE_LIMIT_WINDOW),
},
)
remaining = rate_info.remaining(key_info.rate_limit, RATE_LIMIT_WINDOW)
else:
remaining = None
logger.debug(f"Authenticated request from '{key_info.name}' to {path}")
return AuthResult(
authenticated=True,
key_name=key_info.name,
is_internal=key_info.is_internal,
rate_limit_remaining=remaining,
)
def get_api_key_stats() -> dict:
"""Get statistics about API keys (for admin endpoint)."""
stats = {
"total_keys": len(_api_keys),
"auth_required": REQUIRE_AUTH,
"rate_limit": {
"requests_per_window": RATE_LIMIT_REQUESTS,
"window_seconds": RATE_LIMIT_WINDOW,
},
"keys": [],
}
for key, info in _api_keys.items():
# Don't expose actual keys, just metadata
masked_key = key[:8] + "..." if len(key) > 8 else "***"
rate_info = _rate_limits.get(key, RateLimitInfo())
stats["keys"].append({
"name": info.name,
"key_prefix": masked_key,
"is_internal": info.is_internal,
"requests_in_window": len(rate_info.requests),
"remaining": rate_info.remaining(info.rate_limit, RATE_LIMIT_WINDOW),
})
return stats
def reload_api_keys():
"""Reload API keys from environment (for runtime updates)."""
global _api_keys
_api_keys = _parse_api_keys()
logger.info(f"Reloaded {len(_api_keys)} API keys")

@@ -1,145 +0,0 @@
"""
External API Key Validation via mana-core-auth
When EXTERNAL_AUTH_ENABLED=true, API keys are validated against the
central mana-core-auth service. This allows users to create and manage
API keys from the mana.how web interface.
Results are cached for 5 minutes to reduce load on the auth service.
"""
import os
import time
import logging
import httpx
from typing import Optional
from dataclasses import dataclass
logger = logging.getLogger(__name__)
# Configuration
EXTERNAL_AUTH_ENABLED = os.getenv("EXTERNAL_AUTH_ENABLED", "false").lower() == "true"
MANA_CORE_AUTH_URL = os.getenv("MANA_CORE_AUTH_URL", "http://localhost:3001")
API_KEY_CACHE_TTL = int(os.getenv("API_KEY_CACHE_TTL", "300")) # 5 minutes
EXTERNAL_AUTH_TIMEOUT = float(os.getenv("EXTERNAL_AUTH_TIMEOUT", "5.0")) # seconds
@dataclass
class ExternalValidationResult:
"""Result from external API key validation."""
valid: bool
user_id: Optional[str] = None
scopes: Optional[list] = None
rate_limit_requests: int = 60
rate_limit_window: int = 60
error: Optional[str] = None
cached_at: float = 0.0
# In-memory cache for validation results
# Key: API key, Value: ExternalValidationResult
_validation_cache: dict[str, ExternalValidationResult] = {}
def is_external_auth_enabled() -> bool:
"""Check if external authentication is enabled."""
return EXTERNAL_AUTH_ENABLED
def _get_cached_result(api_key: str) -> Optional[ExternalValidationResult]:
"""Get cached validation result if still valid."""
result = _validation_cache.get(api_key)
if result and (time.time() - result.cached_at) < API_KEY_CACHE_TTL:
return result
return None
def _cache_result(api_key: str, result: ExternalValidationResult):
"""Cache a validation result."""
result.cached_at = time.time()
_validation_cache[api_key] = result
# Clean up old entries periodically (keep cache size manageable)
if len(_validation_cache) > 1000:
now = time.time()
expired_keys = [
k for k, v in _validation_cache.items()
if (now - v.cached_at) >= API_KEY_CACHE_TTL
]
for k in expired_keys:
del _validation_cache[k]
async def validate_api_key_external(api_key: str, scope: str) -> Optional[ExternalValidationResult]:
"""
Validate an API key against mana-core-auth service.
Args:
api_key: The API key to validate (e.g., "sk_live_...")
scope: The required scope (e.g., "stt" or "tts")
Returns:
ExternalValidationResult if external auth is enabled and the key was validated.
None if external auth is disabled or the service is unavailable (fallback to local).
"""
if not EXTERNAL_AUTH_ENABLED:
return None
# Check cache first
cached = _get_cached_result(api_key)
if cached:
logger.debug(f"Using cached validation result for key prefix: {api_key[:12]}...")
# Check scope against cached result
if cached.valid and cached.scopes and scope not in cached.scopes:
return ExternalValidationResult(
valid=False,
error=f"API key does not have scope: {scope}",
)
return cached
# Call mana-core-auth validation endpoint
try:
async with httpx.AsyncClient(timeout=EXTERNAL_AUTH_TIMEOUT) as client:
response = await client.post(
f"{MANA_CORE_AUTH_URL}/api/v1/api-keys/validate",
json={"apiKey": api_key, "scope": scope},
)
if response.status_code == 200:
data = response.json()
result = ExternalValidationResult(
valid=data.get("valid", False),
user_id=data.get("userId"),
scopes=data.get("scopes", []),
rate_limit_requests=data.get("rateLimit", {}).get("requests", 60),
rate_limit_window=data.get("rateLimit", {}).get("window", 60),
error=data.get("error"),
)
_cache_result(api_key, result)
return result
else:
logger.warning(
f"External auth returned status {response.status_code}: {response.text}"
)
# Don't cache errors - allow retry
return ExternalValidationResult(
valid=False,
error=f"Auth service returned {response.status_code}",
)
except httpx.TimeoutException:
logger.warning("External auth service timeout - falling back to local auth")
return None
except httpx.ConnectError:
logger.warning("Cannot connect to external auth service - falling back to local auth")
return None
except Exception as e:
logger.error(f"External auth error: {e}")
return None
def clear_cache():
"""Clear the validation cache (for testing or runtime updates)."""
global _validation_cache
_validation_cache.clear()
logger.info("External auth cache cleared")

@@ -1,178 +0,0 @@
"""
F5-TTS Service for voice cloning synthesis.
CUDA version using f5-tts PyTorch package.
"""
import logging
import os
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional
import numpy as np
logger = logging.getLogger(__name__)
# Global singleton for lazy initialization
_f5_api = None
# Default model
DEFAULT_F5_MODEL = os.getenv("F5_MODEL", "F5-TTS")
# Default generation parameters
DEFAULT_STEPS = 32
DEFAULT_CFG_STRENGTH = 2.0
DEFAULT_SWAY_COEF = -1.0
DEFAULT_SPEED = 1.0
@dataclass
class F5Result:
"""Result from F5-TTS synthesis."""
audio: np.ndarray
sample_rate: int
duration: float
voice_id: Optional[str] = None
def get_f5_model(model_name: str = DEFAULT_F5_MODEL):
"""Get or create F5-TTS API instance (singleton pattern)."""
global _f5_api
if _f5_api is not None:
return _f5_api
logger.info(f"Loading F5-TTS model: {model_name}")
try:
from f5_tts.api import F5TTS
_f5_api = F5TTS(model_type="F5-TTS")
logger.info("F5-TTS model loaded successfully (CUDA)")
return _f5_api
except ImportError as e:
logger.error(f"Failed to import f5_tts: {e}")
raise RuntimeError(
"f5-tts not installed. Run: pip install f5-tts"
)
except Exception as e:
logger.error(f"Failed to load F5-TTS model: {e}")
raise
def is_f5_loaded() -> bool:
"""Check if F5-TTS model is currently loaded."""
return _f5_api is not None
async def synthesize_f5(
text: str,
reference_audio_path: str,
reference_text: str,
duration: Optional[float] = None,
steps: int = DEFAULT_STEPS,
cfg_strength: float = DEFAULT_CFG_STRENGTH,
sway_coef: float = DEFAULT_SWAY_COEF,
speed: float = DEFAULT_SPEED,
model_name: str = DEFAULT_F5_MODEL,
) -> F5Result:
"""
Synthesize speech using F5-TTS with voice cloning.
Args:
text: Text to synthesize
reference_audio_path: Path to reference audio file
reference_text: Transcript of the reference audio
duration: Target duration in seconds (auto-calculated if None)
steps: Number of diffusion steps
cfg_strength: Classifier-free guidance strength
sway_coef: Sway sampling coefficient
speed: Speech speed multiplier
model_name: Model identifier
Returns:
F5Result with audio data
"""
import asyncio
api = get_f5_model(model_name)
logger.info(
f"Synthesizing with F5-TTS: text_length={len(text)}, "
f"ref_audio={reference_audio_path}, steps={steps}"
)
try:
# F5-TTS API infer method (runs synchronously, offload to thread)
loop = asyncio.get_event_loop()
def _generate():
wav, sr, _ = api.infer(
ref_file=reference_audio_path,
ref_text=reference_text,
gen_text=text,
nfe_step=steps,
cfg_strength=cfg_strength,
sway_sampling_coeff=sway_coef,
speed=speed,
)
return wav, sr
audio, sample_rate = await loop.run_in_executor(None, _generate)
# Convert to numpy if needed
if not isinstance(audio, np.ndarray):
audio = np.array(audio, dtype=np.float32)
# Calculate duration
audio_duration = len(audio) / sample_rate
logger.info(f"F5-TTS synthesis complete: duration={audio_duration:.2f}s")
return F5Result(
audio=audio,
sample_rate=sample_rate,
duration=audio_duration,
)
except Exception as e:
logger.error(f"F5-TTS synthesis failed: {e}")
raise RuntimeError(f"Voice cloning synthesis failed: {e}")
async def synthesize_f5_from_bytes(
text: str,
reference_audio_bytes: bytes,
reference_text: str,
audio_extension: str = ".wav",
**kwargs,
) -> F5Result:
"""Synthesize speech using F5-TTS with reference audio as bytes."""
with tempfile.NamedTemporaryFile(suffix=audio_extension, delete=False) as tmp:
tmp.write(reference_audio_bytes)
tmp_path = tmp.name
try:
result = await synthesize_f5(
text=text,
reference_audio_path=tmp_path,
reference_text=reference_text,
**kwargs,
)
return result
finally:
try:
Path(tmp_path).unlink()
except Exception:
pass
def estimate_duration(text: str, speed: float = 1.0) -> float:
"""Estimate audio duration from text."""
words = len(text) / 5
minutes = words / 150
seconds = minutes * 60
return seconds / speed

@@ -1,165 +0,0 @@
"""
Kokoro TTS Service for fast preset voice synthesis.
CUDA version using kokoro PyTorch package.
"""
import logging
from dataclasses import dataclass
from typing import Optional
import numpy as np
logger = logging.getLogger(__name__)
# Global singleton for lazy initialization
_kokoro_pipeline = None
# Default model
DEFAULT_KOKORO_MODEL = "hexgrad/Kokoro-82M"
# Available Kokoro voices (American Female/Male, British Female/Male)
KOKORO_VOICES = {
# American Female voices
"af_heart": "American Female - Heart (warm, emotional)",
"af_alloy": "American Female - Alloy (neutral, professional)",
"af_aoede": "American Female - Aoede (clear, articulate)",
"af_bella": "American Female - Bella (friendly, approachable)",
"af_jessica": "American Female - Jessica (confident, clear)",
"af_kore": "American Female - Kore (calm, measured)",
"af_nicole": "American Female - Nicole (bright, energetic)",
"af_nova": "American Female - Nova (modern, dynamic)",
"af_river": "American Female - River (smooth, flowing)",
"af_sarah": "American Female - Sarah (warm, conversational)",
"af_sky": "American Female - Sky (light, airy)",
# American Male voices
"am_adam": "American Male - Adam (deep, authoritative)",
"am_echo": "American Male - Echo (resonant, clear)",
"am_eric": "American Male - Eric (professional, neutral)",
"am_fenrir": "American Male - Fenrir (strong, commanding)",
"am_liam": "American Male - Liam (friendly, casual)",
"am_michael": "American Male - Michael (warm, trustworthy)",
"am_onyx": "American Male - Onyx (deep, smooth)",
"am_puck": "American Male - Puck (playful, light)",
# British Female voices
"bf_alice": "British Female - Alice (refined, elegant)",
"bf_emma": "British Female - Emma (clear, professional)",
"bf_isabella": "British Female - Isabella (sophisticated, warm)",
"bf_lily": "British Female - Lily (soft, gentle)",
# British Male voices
"bm_daniel": "British Male - Daniel (classic, authoritative)",
"bm_fable": "British Male - Fable (storyteller, expressive)",
"bm_george": "British Male - George (traditional, clear)",
"bm_lewis": "British Male - Lewis (modern, approachable)",
}
DEFAULT_VOICE = "af_heart"
@dataclass
class KokoroResult:
"""Result from Kokoro TTS synthesis."""
audio: np.ndarray
sample_rate: int
voice: str
duration: float
def get_kokoro_model(model_name: str = DEFAULT_KOKORO_MODEL):
"""Get or create Kokoro pipeline instance (singleton pattern)."""
global _kokoro_pipeline
if _kokoro_pipeline is not None:
return _kokoro_pipeline
logger.info(f"Loading Kokoro model: {model_name}")
try:
from kokoro import KPipeline
_kokoro_pipeline = KPipeline(lang_code="a") # 'a' for American English
logger.info("Kokoro pipeline loaded successfully")
return _kokoro_pipeline
except ImportError as e:
logger.error(f"Failed to import kokoro: {e}")
raise RuntimeError(
"kokoro not installed. Run: pip install kokoro"
)
except Exception as e:
logger.error(f"Failed to load Kokoro model: {e}")
raise
def is_kokoro_loaded() -> bool:
"""Check if Kokoro model is currently loaded."""
return _kokoro_pipeline is not None
def get_available_voices() -> dict[str, str]:
"""Get dictionary of available Kokoro voices."""
return KOKORO_VOICES.copy()
async def synthesize_kokoro(
text: str,
voice: str = DEFAULT_VOICE,
speed: float = 1.0,
model_name: str = DEFAULT_KOKORO_MODEL,
) -> KokoroResult:
"""
Synthesize speech using Kokoro TTS.
Args:
text: Text to synthesize
voice: Voice ID from KOKORO_VOICES
speed: Speech speed multiplier (0.5-2.0)
model_name: Model identifier
Returns:
KokoroResult with audio data
"""
# Validate voice
if voice not in KOKORO_VOICES:
logger.warning(f"Unknown voice '{voice}', using default '{DEFAULT_VOICE}'")
voice = DEFAULT_VOICE
# Clamp speed to valid range
speed = max(0.5, min(2.0, speed))
# Get model
pipeline = get_kokoro_model(model_name)
logger.info(f"Synthesizing with Kokoro: voice={voice}, speed={speed}, text_length={len(text)}")
try:
# Generate audio using kokoro pipeline
audio_chunks = []
sample_rate = 24000 # Kokoro default
for result in pipeline(text, voice=voice, speed=speed):
# result is a KPipelineResult with .audio (tensor) and .graphemes/.phonemes
audio_np = result.audio.numpy()
audio_chunks.append(audio_np)
# Concatenate all chunks
if audio_chunks:
full_audio = np.concatenate(audio_chunks)
else:
raise RuntimeError("No audio generated")
# Calculate duration from audio length
total_duration = len(full_audio) / sample_rate
logger.info(f"Kokoro synthesis complete: duration={total_duration:.2f}s")
return KokoroResult(
audio=full_audio,
sample_rate=sample_rate,
voice=voice,
duration=total_duration,
)
except Exception as e:
logger.error(f"Kokoro synthesis failed: {e}")
raise RuntimeError(f"TTS synthesis failed: {e}")
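The chunk handling in `synthesize_kokoro` reduces to concatenating per-segment arrays and dividing total samples by the sample rate; a numpy sketch of that bookkeeping (names are illustrative):

```python
import numpy as np

SAMPLE_RATE = 24000  # Kokoro's default output rate

def assemble(chunks: list[np.ndarray]) -> tuple[np.ndarray, float]:
    """Concatenate streamed audio chunks and compute duration in seconds."""
    if not chunks:
        raise RuntimeError("No audio generated")
    audio = np.concatenate(chunks)
    return audio, len(audio) / SAMPLE_RATE
```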


@@ -1,844 +0,0 @@
"""
Mana TTS - Text-to-Speech Microservice
Provides TTS synthesis using:
- Kokoro: Fast preset voices
- F5-TTS: Voice cloning with reference audio
- Piper: Local German voices (with Edge TTS fallback)
- Orpheus / Zonos: High-quality and expressive German synthesis
Optimized for Apple Silicon (MLX).
"""
import logging
import os
from contextlib import asynccontextmanager
from pathlib import Path
from typing import Optional
from fastapi import FastAPI, HTTPException, UploadFile, File, Form, Response, Depends
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel, Field
from .auth import verify_api_key, AuthResult, REQUIRE_AUTH
from .audio_utils import convert_audio, SUPPORTED_FORMATS, cleanup_temp_file, save_temp_audio
from .kokoro_service import (
synthesize_kokoro,
get_kokoro_model,
is_kokoro_loaded,
KOKORO_VOICES,
DEFAULT_VOICE as DEFAULT_KOKORO_VOICE,
DEFAULT_KOKORO_MODEL,
)
from .f5_service import (
synthesize_f5,
synthesize_f5_from_bytes,
get_f5_model,
is_f5_loaded,
DEFAULT_F5_MODEL,
)
from .voice_manager import get_voice_manager, CustomVoice
from .piper_service import (
synthesize_piper,
PIPER_VOICES,
is_piper_loaded,
)
from .orpheus_service import (
synthesize_orpheus,
is_orpheus_loaded,
ORPHEUS_VOICES,
DEFAULT_VOICE as DEFAULT_ORPHEUS_VOICE,
)
from .zonos_service import (
synthesize_zonos,
is_zonos_loaded,
EMOTION_PRESETS as ZONOS_EMOTIONS,
)
# Configure logging
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger(__name__)
# Configuration from environment
PORT = int(os.getenv("PORT", "3022"))
PRELOAD_MODELS = os.getenv("PRELOAD_MODELS", "false").lower() == "true"
MAX_TEXT_LENGTH = int(os.getenv("MAX_TEXT_LENGTH", "1000"))
CORS_ORIGINS = os.getenv(
"CORS_ORIGINS",
"https://mana.how,https://chat.mana.how,https://todo.mana.how,http://localhost:5173",
).split(",")
# Supported audio extensions for uploads
SUPPORTED_AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}
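The configuration block above is plain environment parsing. A sketch of the `CORS_ORIGINS` handling under the same comma-separated convention (the whitespace stripping is an addition the original split does not do):

```python
import os

def parse_origins(default: str = "http://localhost:5173") -> list[str]:
    """Split CORS_ORIGINS on commas, falling back to a default list."""
    raw = os.getenv("CORS_ORIGINS", default)
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```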
@asynccontextmanager
async def lifespan(app: FastAPI):
"""Application lifespan manager for startup/shutdown."""
logger.info(f"Starting Mana TTS service on port {PORT}")
# Initialize voice manager (scans voices directory)
voice_manager = get_voice_manager()
logger.info(f"Voice manager initialized with {len(voice_manager.list_voices())} custom voices")
if PRELOAD_MODELS:
logger.info("Pre-loading models (PRELOAD_MODELS=true)...")
try:
get_kokoro_model()
logger.info("Kokoro model pre-loaded")
except Exception as e:
logger.warning(f"Failed to pre-load Kokoro: {e}")
try:
get_f5_model()
logger.info("F5-TTS model pre-loaded")
except Exception as e:
logger.warning(f"Failed to pre-load F5-TTS: {e}")
else:
logger.info("Models will be loaded on first request (lazy loading)")
yield
logger.info("Shutting down Mana TTS service")
# Create FastAPI app
app = FastAPI(
title="Mana TTS",
description="Text-to-Speech service with voice cloning support",
version="1.0.0",
lifespan=lifespan,
)
# CORS middleware
app.add_middleware(
CORSMiddleware,
allow_origins=CORS_ORIGINS,
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# ============================================================================
# Request/Response Models
# ============================================================================
class KokoroRequest(BaseModel):
"""Request for Kokoro TTS synthesis."""
text: str = Field(..., description="Text to synthesize", max_length=5000)
voice: str = Field(DEFAULT_KOKORO_VOICE, description="Voice ID")
speed: float = Field(1.0, ge=0.5, le=2.0, description="Speech speed")
output_format: str = Field("wav", description="Output format (wav, mp3)")
class AutoRequest(BaseModel):
"""Request for auto-selection TTS synthesis."""
text: str = Field(..., description="Text to synthesize", max_length=5000)
voice: Optional[str] = Field(None, description="Voice ID (Kokoro preset or registered)")
speed: float = Field(1.0, ge=0.5, le=2.0, description="Speech speed")
output_format: str = Field("wav", description="Output format (wav, mp3)")
class RegisterVoiceRequest(BaseModel):
"""Request to register a new custom voice."""
voice_id: str = Field(..., description="Unique voice identifier", min_length=2, max_length=50)
name: str = Field(..., description="Display name")
description: str = Field("", description="Voice description")
transcript: str = Field(..., description="Transcript of the reference audio")
class HealthResponse(BaseModel):
"""Health check response."""
status: str
service: str
models_loaded: dict
auth_required: bool
class ModelsResponse(BaseModel):
"""Available models response."""
kokoro: dict
f5: dict
class VoiceInfo(BaseModel):
"""Voice information."""
id: str
name: str
description: str
type: str # "kokoro" or "f5_custom"
class VoicesResponse(BaseModel):
"""Available voices response."""
kokoro_voices: list[VoiceInfo]
custom_voices: list[VoiceInfo]
class VoiceRegisteredResponse(BaseModel):
"""Response after registering a voice."""
voice_id: str
message: str
class VoiceDeletedResponse(BaseModel):
"""Response after deleting a voice."""
voice_id: str
message: str
# ============================================================================
# Health & Info Endpoints
# ============================================================================
@app.get("/health", response_model=HealthResponse)
async def health_check():
"""Check service health and model status."""
return HealthResponse(
status="healthy",
service="mana-tts",
models_loaded={
"kokoro": is_kokoro_loaded(),
"f5": is_f5_loaded(),
"orpheus": is_orpheus_loaded(),
"zonos": is_zonos_loaded(),
},
auth_required=REQUIRE_AUTH,
)
@app.get("/models", response_model=ModelsResponse)
async def get_models(auth: AuthResult = Depends(verify_api_key)):
"""Get information about available models."""
return ModelsResponse(
kokoro={
"name": "Kokoro-82M",
"description": "Fast TTS with preset voices",
"model_id": DEFAULT_KOKORO_MODEL,
"loaded": is_kokoro_loaded(),
"voice_count": len(KOKORO_VOICES),
},
f5={
"name": "F5-TTS",
"description": "Voice cloning with reference audio",
"model_id": DEFAULT_F5_MODEL,
"loaded": is_f5_loaded(),
"supports_cloning": True,
},
)
# ============================================================================
# Voice Management Endpoints
# ============================================================================
@app.get("/voices", response_model=VoicesResponse)
async def get_voices(auth: AuthResult = Depends(verify_api_key)):
"""Get all available voices."""
# Kokoro preset voices
kokoro_voices = [
VoiceInfo(
id=voice_id,
name=voice_id,
description=description,
type="kokoro",
)
for voice_id, description in KOKORO_VOICES.items()
]
# Custom voices from voice manager
voice_manager = get_voice_manager()
custom_voices = [
VoiceInfo(
id=voice.id,
name=voice.name,
description=voice.description,
type="f5_custom",
)
for voice in voice_manager.list_voices()
]
return VoicesResponse(
kokoro_voices=kokoro_voices,
custom_voices=custom_voices,
)
@app.post("/voices", response_model=VoiceRegisteredResponse)
async def register_voice(
voice_id: str = Form(..., description="Unique voice identifier"),
name: str = Form(..., description="Display name"),
description: str = Form("", description="Voice description"),
transcript: str = Form(..., description="Transcript of the reference audio"),
reference_audio: UploadFile = File(..., description="Reference audio file"),
auth: AuthResult = Depends(verify_api_key),
):
"""
Register a new custom voice for F5-TTS voice cloning.
Requires:
- Reference audio file (WAV, MP3, M4A, FLAC, OGG)
- Transcript of what is said in the audio
"""
# Validate file extension
if reference_audio.filename:
ext = Path(reference_audio.filename).suffix.lower()
if ext not in SUPPORTED_AUDIO_EXTENSIONS:
raise HTTPException(
status_code=400,
detail=f"Unsupported audio format. Use one of: {SUPPORTED_AUDIO_EXTENSIONS}",
)
else:
ext = ".wav"
# Read audio bytes
audio_bytes = await reference_audio.read()
if len(audio_bytes) == 0:
raise HTTPException(status_code=400, detail="Audio file is empty")
if len(audio_bytes) > 50 * 1024 * 1024: # 50 MB limit
raise HTTPException(status_code=400, detail="Audio file too large (max 50 MB)")
# Register voice
voice_manager = get_voice_manager()
try:
voice_manager.register_voice(
voice_id=voice_id,
name=name,
description=description,
audio_bytes=audio_bytes,
transcript=transcript,
audio_extension=ext,
)
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
return VoiceRegisteredResponse(
voice_id=voice_id,
message=f"Voice '{voice_id}' registered successfully",
)
@app.delete("/voices/{voice_id}", response_model=VoiceDeletedResponse)
async def delete_voice(voice_id: str, auth: AuthResult = Depends(verify_api_key)):
"""Delete a registered custom voice."""
voice_manager = get_voice_manager()
if not voice_manager.delete_voice(voice_id):
raise HTTPException(status_code=404, detail=f"Voice '{voice_id}' not found")
return VoiceDeletedResponse(
voice_id=voice_id,
message=f"Voice '{voice_id}' deleted successfully",
)
# ============================================================================
# Kokoro TTS Endpoint
# ============================================================================
@app.post("/synthesize/kokoro")
async def synthesize_with_kokoro(
request: KokoroRequest,
auth: AuthResult = Depends(verify_api_key),
):
"""
Synthesize speech using Kokoro with preset voices.
Fast synthesis with high-quality preset voices.
"""
# Validate text length
if len(request.text) > MAX_TEXT_LENGTH:
raise HTTPException(
status_code=400,
detail=f"Text exceeds maximum length of {MAX_TEXT_LENGTH} characters",
)
if not request.text.strip():
raise HTTPException(status_code=400, detail="Text cannot be empty")
# Validate output format
output_format = request.output_format.lower()
if output_format not in SUPPORTED_FORMATS:
raise HTTPException(
status_code=400,
detail=f"Unsupported format. Use one of: {SUPPORTED_FORMATS}",
)
try:
# Synthesize
result = await synthesize_kokoro(
text=request.text,
voice=request.voice,
speed=request.speed,
)
# Convert to requested format
audio_bytes, content_type = convert_audio(
result.audio,
result.sample_rate,
output_format,
)
# Return audio response
return Response(
content=audio_bytes,
media_type=content_type,
headers={
"X-Voice": result.voice,
"X-Duration": str(result.duration),
"X-Sample-Rate": str(result.sample_rate),
},
)
except RuntimeError as e:
raise HTTPException(status_code=500, detail=str(e))
except Exception as e:
logger.error(f"Kokoro synthesis error: {e}")
raise HTTPException(status_code=500, detail=f"Synthesis failed: {e}")
# ============================================================================
# F5-TTS Endpoint
# ============================================================================
@app.post("/synthesize")
async def synthesize_with_f5(
text: str = Form(..., description="Text to synthesize"),
voice_id: Optional[str] = Form(None, description="Registered voice ID"),
reference_audio: Optional[UploadFile] = File(None, description="Reference audio for cloning"),
reference_text: Optional[str] = Form(None, description="Transcript of reference audio"),
output_format: str = Form("wav", description="Output format (wav, mp3)"),
speed: float = Form(1.0, ge=0.5, le=2.0, description="Speech speed"),
steps: int = Form(32, ge=8, le=64, description="Diffusion steps"),
auth: AuthResult = Depends(verify_api_key),
):
"""
Synthesize speech using F5-TTS with voice cloning.
Provide either:
- voice_id: Use a pre-registered voice
- reference_audio + reference_text: Clone voice from audio sample
"""
# Validate text
if len(text) > MAX_TEXT_LENGTH:
raise HTTPException(
status_code=400,
detail=f"Text exceeds maximum length of {MAX_TEXT_LENGTH} characters",
)
if not text.strip():
raise HTTPException(status_code=400, detail="Text cannot be empty")
# Validate output format
output_format = output_format.lower()
if output_format not in SUPPORTED_FORMATS:
raise HTTPException(
status_code=400,
detail=f"Unsupported format. Use one of: {SUPPORTED_FORMATS}",
)
voice_manager = get_voice_manager()
ref_audio_path: Optional[str] = None
ref_text: Optional[str] = None
temp_file_path: Optional[str] = None
try:
# Option 1: Use registered voice
if voice_id:
voice = voice_manager.get_voice(voice_id)
if not voice:
raise HTTPException(
status_code=404,
detail=f"Voice '{voice_id}' not found. Register it first or provide reference audio.",
)
ref_audio_path = voice.audio_path
ref_text = voice.transcript
# Option 2: Use uploaded reference audio
elif reference_audio and reference_text:
# Get file extension
ext = ".wav"
if reference_audio.filename:
ext = Path(reference_audio.filename).suffix.lower()
if ext not in SUPPORTED_AUDIO_EXTENSIONS:
raise HTTPException(
status_code=400,
detail=f"Unsupported audio format. Use one of: {SUPPORTED_AUDIO_EXTENSIONS}",
)
# Read and save to temp file
audio_bytes = await reference_audio.read()
if len(audio_bytes) == 0:
raise HTTPException(status_code=400, detail="Reference audio is empty")
temp_file_path = save_temp_audio(audio_bytes, suffix=ext)
ref_audio_path = temp_file_path
ref_text = reference_text
else:
raise HTTPException(
status_code=400,
detail="Provide either voice_id or reference_audio + reference_text",
)
# Synthesize with F5-TTS
result = await synthesize_f5(
text=text,
reference_audio_path=ref_audio_path,
reference_text=ref_text,
speed=speed,
steps=steps,
)
# Convert to requested format
audio_bytes, content_type = convert_audio(
result.audio,
result.sample_rate,
output_format,
)
# Return audio response
return Response(
content=audio_bytes,
media_type=content_type,
headers={
"X-Model": "f5-tts",
"X-Voice-ID": voice_id or "custom",
"X-Duration": str(result.duration),
"X-Sample-Rate": str(result.sample_rate),
},
)
except HTTPException:
raise
except RuntimeError as e:
raise HTTPException(status_code=500, detail=str(e))
except Exception as e:
logger.error(f"F5-TTS synthesis error: {e}")
raise HTTPException(status_code=500, detail=f"Voice cloning synthesis failed: {e}")
finally:
# Clean up temp file
if temp_file_path:
cleanup_temp_file(temp_file_path)
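The `finally` block above is the standard temp-file lifecycle: persist the upload, hand a path to the synthesizer, and always delete it afterwards. A stdlib sketch of that pattern (`process_fn` stands in for `synthesize_f5`):

```python
import os
import tempfile

def with_temp_audio(data: bytes, process_fn, suffix: str = ".wav"):
    """Write bytes to a temp file, call process_fn(path), always clean up."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return process_fn(path)
    finally:
        # Cleanup runs whether process_fn succeeded or raised
        if os.path.exists(path):
            os.remove(path)
```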
# ============================================================================
# Orpheus TTS Endpoint (German, high-quality)
# ============================================================================
class OrpheusRequest(BaseModel):
"""Request for Orpheus TTS synthesis."""
text: str = Field(..., description="Text to synthesize (German)", max_length=5000)
voice: str = Field(DEFAULT_ORPHEUS_VOICE, description="Speaker voice")
output_format: str = Field("wav", description="Output format (wav, mp3)")
temperature: float = Field(0.6, ge=0.1, le=1.5, description="Sampling temperature")
@app.post("/synthesize/orpheus")
async def synthesize_with_orpheus(
request: OrpheusRequest,
auth: AuthResult = Depends(verify_api_key),
):
"""
Synthesize German speech using Orpheus TTS.
High-quality German synthesis with natural intonation.
Not optimized for real-time; designed for pre-generation.
"""
if not request.text.strip():
raise HTTPException(status_code=400, detail="Text cannot be empty")
if len(request.text) > MAX_TEXT_LENGTH:
raise HTTPException(
status_code=400,
detail=f"Text exceeds maximum length of {MAX_TEXT_LENGTH} characters",
)
output_format = request.output_format.lower()
if output_format not in SUPPORTED_FORMATS:
raise HTTPException(
status_code=400,
detail=f"Unsupported format. Use one of: {SUPPORTED_FORMATS}",
)
try:
result = await synthesize_orpheus(
text=request.text,
voice=request.voice,
temperature=request.temperature,
)
audio_bytes, content_type = convert_audio(
result.audio,
result.sample_rate,
output_format,
)
return Response(
content=audio_bytes,
media_type=content_type,
headers={
"X-Model": "orpheus-german",
"X-Voice": result.voice,
"X-Duration": str(result.duration),
"X-Sample-Rate": str(result.sample_rate),
},
)
except RuntimeError as e:
raise HTTPException(status_code=500, detail=str(e))
except Exception as e:
logger.error(f"Orpheus synthesis error: {e}")
raise HTTPException(status_code=500, detail=f"Orpheus synthesis failed: {e}")
# ============================================================================
# Zonos TTS Endpoint (Multilingual, expressive)
# ============================================================================
class ZonosRequest(BaseModel):
"""Request for Zonos TTS synthesis."""
text: str = Field(..., description="Text to synthesize", max_length=5000)
language: str = Field("de", description="Language code")
emotion: str = Field("friendly", description="Emotion preset: neutral, friendly, warm, curious")
speaking_rate: float = Field(13.0, ge=5.0, le=25.0, description="Phonemes per second")
pitch_std: float = Field(20.0, ge=5.0, le=50.0, description="Pitch variation in Hz")
output_format: str = Field("wav", description="Output format (wav, mp3)")
@app.post("/synthesize/zonos")
async def synthesize_with_zonos(
request: ZonosRequest,
auth: AuthResult = Depends(verify_api_key),
):
"""
Synthesize speech using Zonos TTS by Zyphra.
Expressive multilingual synthesis with emotion control.
Trained on 200k hours of speech, with explicit German support.
"""
if not request.text.strip():
raise HTTPException(status_code=400, detail="Text cannot be empty")
if len(request.text) > MAX_TEXT_LENGTH:
raise HTTPException(
status_code=400,
detail=f"Text exceeds maximum length of {MAX_TEXT_LENGTH} characters",
)
output_format = request.output_format.lower()
if output_format not in SUPPORTED_FORMATS:
raise HTTPException(
status_code=400,
detail=f"Unsupported format. Use one of: {SUPPORTED_FORMATS}",
)
if request.emotion not in ZONOS_EMOTIONS:
raise HTTPException(
status_code=400,
detail=f"Unknown emotion. Use one of: {list(ZONOS_EMOTIONS.keys())}",
)
try:
result = await synthesize_zonos(
text=request.text,
language=request.language,
emotion=request.emotion,
speaking_rate=request.speaking_rate,
pitch_std=request.pitch_std,
)
audio_bytes, content_type = convert_audio(
result.audio,
result.sample_rate,
output_format,
)
return Response(
content=audio_bytes,
media_type=content_type,
headers={
"X-Model": "zonos-v0.1",
"X-Emotion": result.emotion,
"X-Duration": str(result.duration),
"X-Sample-Rate": str(result.sample_rate),
},
)
except RuntimeError as e:
raise HTTPException(status_code=500, detail=str(e))
except Exception as e:
logger.error(f"Zonos synthesis error: {e}")
raise HTTPException(status_code=500, detail=f"Zonos synthesis failed: {e}")
# ============================================================================
# Auto-Selection Endpoint
# ============================================================================
@app.post("/synthesize/auto")
async def synthesize_auto(
request: AutoRequest,
auth: AuthResult = Depends(verify_api_key),
):
"""
Auto-select the best TTS model based on voice parameter.
- If voice is a Kokoro preset: Use Kokoro
- If voice is a registered custom voice: Use F5-TTS
- If no voice specified: Use Kokoro with default voice
"""
# Validate text
if len(request.text) > MAX_TEXT_LENGTH:
raise HTTPException(
status_code=400,
detail=f"Text exceeds maximum length of {MAX_TEXT_LENGTH} characters",
)
if not request.text.strip():
raise HTTPException(status_code=400, detail="Text cannot be empty")
# Determine which model to use
voice = request.voice or DEFAULT_KOKORO_VOICE
# Check if it's a Kokoro voice
if voice in KOKORO_VOICES:
kokoro_request = KokoroRequest(
text=request.text,
voice=voice,
speed=request.speed,
output_format=request.output_format,
)
        return await synthesize_with_kokoro(kokoro_request, auth)
# Check if it's a Piper/German voice
if voice in PIPER_VOICES:
try:
# Convert speed to length_scale (inverse relationship)
# speed > 1 means faster, so length_scale < 1
length_scale = 1.0 / request.speed
result = await synthesize_piper(
text=request.text,
voice=voice,
length_scale=length_scale,
)
# Convert to requested format
output_format = request.output_format.lower()
audio_bytes, content_type = convert_audio(
result.audio,
result.sample_rate,
output_format,
)
return Response(
content=audio_bytes,
media_type=content_type,
headers={
"X-Model": "piper",
"X-Voice": voice,
"X-Duration": str(result.duration),
"X-Sample-Rate": str(result.sample_rate),
},
)
except Exception as e:
logger.error(f"Piper synthesis error: {e}")
raise HTTPException(status_code=500, detail=f"German voice synthesis failed: {e}")
# Check if it's a registered custom voice
voice_manager = get_voice_manager()
if voice_manager.voice_exists(voice):
        # Call F5-TTS directly with the registered voice's reference audio and transcript
custom_voice = voice_manager.get_voice(voice)
try:
result = await synthesize_f5(
text=request.text,
reference_audio_path=custom_voice.audio_path,
reference_text=custom_voice.transcript,
speed=request.speed,
)
# Convert to requested format
output_format = request.output_format.lower()
audio_bytes, content_type = convert_audio(
result.audio,
result.sample_rate,
output_format,
)
return Response(
content=audio_bytes,
media_type=content_type,
headers={
"X-Model": "f5-tts",
"X-Voice-ID": voice,
"X-Duration": str(result.duration),
"X-Sample-Rate": str(result.sample_rate),
},
)
except Exception as e:
logger.error(f"F5-TTS auto synthesis error: {e}")
raise HTTPException(status_code=500, detail=f"Voice synthesis failed: {e}")
# Unknown voice - fall back to Kokoro with default
logger.warning(f"Unknown voice '{voice}', falling back to Kokoro default")
kokoro_request = KokoroRequest(
text=request.text,
voice=DEFAULT_KOKORO_VOICE,
speed=request.speed,
output_format=request.output_format,
)
    return await synthesize_with_kokoro(kokoro_request, auth)
# ============================================================================
# Error Handler
# ============================================================================
@app.exception_handler(Exception)
async def global_exception_handler(request, exc):
    """Handle uncaught exceptions."""
    import json  # serialize safely: the exception text may contain quotes

    logger.error(f"Unhandled exception: {exc}")
    return Response(
        content=json.dumps({"error": "Internal server error", "detail": str(exc)}),
        status_code=500,
        media_type="application/json",
    )
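The global handler must emit valid JSON even when the exception text contains quotes or backslashes; serializing with `json.dumps` guarantees that, where hand-built f-strings do not. A sketch:

```python
import json

def error_body(exc: Exception) -> str:
    """Serialize an exception into a JSON error payload safely."""
    return json.dumps({"error": "Internal server error", "detail": str(exc)})
```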
# ============================================================================
# Main
# ============================================================================
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=PORT)


@@ -1,229 +0,0 @@
"""
Orpheus TTS: high-quality German speech synthesis.
Uses the Orpheus-TTS model with a German finetune for natural-sounding
interview question generation. Not optimized for real-time; quality first.
Model: Kartoffel_Orpheus-3B_german_natural-v0.1 (HuggingFace)
VRAM: ~8 GB (fits comfortably on RTX 3090 alongside other models)
"""
import logging
import asyncio
from dataclasses import dataclass
from typing import Optional
import numpy as np
logger = logging.getLogger(__name__)
# Lazy-loaded model state
_model = None
_tokenizer = None
_loaded = False
MODEL_ID = "Vishalshendge3198/orpheus-3b-tts-german-emotional-merged"
SAMPLE_RATE = 24000
# Available voices (Orpheus built-in speaker tags)
ORPHEUS_VOICES = {
"tara": "Female, warm and clear (default)",
"leah": "Female, soft and friendly",
"jess": "Female, energetic",
"leo": "Male, calm and professional",
"dan": "Male, deep and warm",
"mia": "Female, young and bright",
"zac": "Male, confident",
"emma": "Female, neutral",
}
DEFAULT_VOICE = "tara"
@dataclass
class OrpheusResult:
audio: np.ndarray
sample_rate: int
duration: float
voice: str
def is_orpheus_loaded() -> bool:
return _loaded
def get_orpheus_model():
"""Load the Orpheus German model (lazy, first call only)."""
global _model, _tokenizer, _loaded
if _loaded:
return _model, _tokenizer
logger.info(f"Loading Orpheus German model: {MODEL_ID}")
try:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
_tokenizer = AutoTokenizer.from_pretrained(
MODEL_ID,
trust_remote_code=True,
)
_model = AutoModelForCausalLM.from_pretrained(
MODEL_ID,
torch_dtype=torch.bfloat16,
device_map="cuda",
trust_remote_code=True,
)
_model.eval()
_loaded = True
logger.info("Orpheus German model loaded successfully")
return _model, _tokenizer
except Exception as e:
logger.error(f"Failed to load Orpheus model: {e}")
raise RuntimeError(f"Failed to load Orpheus model: {e}")
def unload_orpheus():
"""Free VRAM by unloading the model."""
global _model, _tokenizer, _loaded
import torch
if _model is not None:
del _model
_model = None
if _tokenizer is not None:
del _tokenizer
_tokenizer = None
_loaded = False
torch.cuda.empty_cache()
logger.info("Orpheus model unloaded")
async def synthesize_orpheus(
text: str,
voice: str = DEFAULT_VOICE,
temperature: float = 0.6,
top_p: float = 0.95,
max_new_tokens: int = 4096,
) -> OrpheusResult:
"""
Synthesize German speech using Orpheus TTS.
Returns OrpheusResult with audio as numpy float32 array.
"""
    loop = asyncio.get_running_loop()  # preferred over get_event_loop() inside coroutines
return await loop.run_in_executor(
None,
_synthesize_sync,
text,
voice,
temperature,
top_p,
max_new_tokens,
)
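`synthesize_orpheus` offloads the blocking `generate` call to a worker thread so the event loop stays responsive; the pattern in isolation, with a trivial stand-in for the GPU-bound work:

```python
import asyncio

def blocking_work(x: int) -> int:
    """Stand-in for a CPU/GPU-bound synthesis call."""
    return x * 2

async def run_offloaded(x: int) -> int:
    # None selects the default ThreadPoolExecutor
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_work, x)
```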
def _synthesize_sync(
text: str,
voice: str,
temperature: float,
top_p: float,
max_new_tokens: int,
) -> OrpheusResult:
"""Synchronous synthesis (runs in thread pool)."""
import torch
model, tokenizer = get_orpheus_model()
# Orpheus uses a specific prompt format with speaker tags
prompt = f"<|speaker:{voice}|>{text}"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
outputs = model.generate(
**inputs,
max_new_tokens=max_new_tokens,
temperature=temperature,
top_p=top_p,
do_sample=True,
)
# Extract audio tokens (model-specific decoding)
audio_tokens = outputs[0][inputs["input_ids"].shape[1]:]
# Decode audio tokens to waveform
# Orpheus uses a SNAC-based codec — tokens map to audio via the model's decode method
if hasattr(model, "decode_audio"):
audio_np = model.decode_audio(audio_tokens).cpu().numpy().flatten()
else:
# Fallback: use the tokenizer's decode if model doesn't have decode_audio
# This handles different Orpheus model versions
audio_np = _decode_orpheus_tokens(audio_tokens, model)
duration = len(audio_np) / SAMPLE_RATE
return OrpheusResult(
audio=audio_np,
sample_rate=SAMPLE_RATE,
duration=duration,
voice=voice,
)
def _decode_orpheus_tokens(tokens, model) -> np.ndarray:
"""
Decode Orpheus audio tokens using SNAC codec.
Orpheus generates special audio tokens that need to be decoded
through the SNAC vocoder to produce the final waveform.
"""
import torch
try:
from snac import SNAC
snac = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").to(model.device)
# Filter to audio-only tokens (above text vocab range)
audio_token_ids = tokens[tokens >= 128256].tolist()
if not audio_token_ids:
logger.warning("No audio tokens generated")
return np.zeros(SAMPLE_RATE, dtype=np.float32) # 1s silence
# Orpheus interleaves 3 codebook levels: [c1, c2, c3, c1, c2, c3, ...]
# Redistribute into separate codebook tensors
codes_0, codes_1, codes_2 = [], [], []
for i, token_id in enumerate(audio_token_ids):
# Offset tokens back to codebook range
code = token_id - 128256
level = i % 3
if level == 0:
codes_0.append(code)
elif level == 1:
codes_1.append(code)
else:
codes_2.append(code)
# Trim to equal lengths
min_len = min(len(codes_0), len(codes_1), len(codes_2))
if min_len == 0:
return np.zeros(SAMPLE_RATE, dtype=np.float32)
codes = [
torch.tensor(codes_0[:min_len], device=model.device).unsqueeze(0),
torch.tensor(codes_1[:min_len], device=model.device).unsqueeze(0),
torch.tensor(codes_2[:min_len], device=model.device).unsqueeze(0),
]
with torch.no_grad():
audio = snac.decode(codes).squeeze().cpu().numpy()
return audio.astype(np.float32)
except ImportError:
logger.error("snac package not installed — pip install snac")
raise RuntimeError("snac package required for Orpheus audio decoding")
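The redistribution loop above is the key step: Orpheus emits the three SNAC codebook levels interleaved as `[c1, c2, c3, c1, ...]`, and they must be shifted back into codebook range, split per level, and trimmed to equal length. A pure-Python sketch of that reshaping:

```python
def deinterleave(tokens: list[int], offset: int = 128256) -> list[list[int]]:
    """Split interleaved 3-level codec tokens into per-codebook lists."""
    levels: list[list[int]] = [[], [], []]
    for i, t in enumerate(tokens):
        levels[i % 3].append(t - offset)   # shift back into codebook range
    n = min(len(level) for level in levels)  # trim ragged tail to equal length
    return [level[:n] for level in levels]
```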


@@ -1,385 +0,0 @@
"""
German TTS Service - Piper TTS (local, fast) with Edge TTS fallback.
Primary: Piper TTS - 100% local, GDPR-compliant (DSGVO), very fast
Fallback: Edge TTS - Cloud-based (Microsoft), high quality but sends data externally
"""
import logging
import tempfile
import os
import asyncio
from dataclasses import dataclass
from typing import Optional
from pathlib import Path
import numpy as np
import soundfile as sf
logger = logging.getLogger(__name__)
# Paths for Piper models
PIPER_VOICES_DIR = Path(__file__).parent.parent / "piper_voices"
# Available German voices
PIPER_VOICES = {
# === LOCAL PIPER VOICES (Primary - 100% local) ===
"de_thorsten": {
"type": "piper",
"model": "thorsten_medium.onnx",
"name": "Thorsten",
"description": "Deutsche Männerstimme (lokal, schnell)",
"language": "de",
"gender": "male",
"local": True,
},
"de_kerstin": {
"type": "piper",
"model": "kerstin_low.onnx",
"name": "Kerstin",
"description": "Deutsche Frauenstimme (lokal, schnell)",
"language": "de",
"gender": "female",
"local": True,
},
# === EDGE TTS VOICES (Fallback - Cloud) ===
"de_katja": {
"type": "edge",
"edge_voice": "de-DE-KatjaNeural",
"name": "Katja",
"description": "Deutsche Frauenstimme (Cloud)",
"language": "de",
"gender": "female",
"local": False,
},
"de_conrad": {
"type": "edge",
"edge_voice": "de-DE-ConradNeural",
"name": "Conrad",
"description": "Deutsche Männerstimme (Cloud)",
"language": "de",
"gender": "male",
"local": False,
},
"de_amala": {
"type": "edge",
"edge_voice": "de-DE-AmalaNeural",
"name": "Amala",
"description": "Deutsche Frauenstimme jung (Cloud)",
"language": "de",
"gender": "female",
"local": False,
},
"de_florian": {
"type": "edge",
"edge_voice": "de-DE-FlorianNeural",
"name": "Florian",
"description": "Deutsche Männerstimme jung (Cloud)",
"language": "de",
"gender": "male",
"local": False,
},
# Legacy alias - maps to local Thorsten
"de_anna": {
"type": "piper",
"model": "thorsten_medium.onnx",
"name": "Anna (→ Thorsten)",
"description": "Alias für Thorsten (lokal)",
"language": "de",
"gender": "male",
"local": True,
},
}
DEFAULT_PIPER_VOICE = "de_thorsten"
# Cached Piper voice instances (one per model)
_piper_voices: dict = {}
_piper_available = None
_edge_available = None
def _get_piper_model_path(model_name: str) -> Path:
"""Get full path to a Piper model."""
return PIPER_VOICES_DIR / model_name
def check_piper_available() -> bool:
"""Check if Piper TTS is available."""
global _piper_available
if _piper_available is not None:
return _piper_available
try:
from piper import PiperVoice
model_path = _get_piper_model_path("thorsten_medium.onnx")
if model_path.exists():
_piper_available = True
logger.info(f"Piper TTS available with model: {model_path}")
else:
_piper_available = False
logger.warning(f"Piper model not found: {model_path}")
except ImportError as e:
_piper_available = False
logger.warning(f"Piper TTS not installed: {e}")
return _piper_available
def _check_edge_available() -> bool:
"""Check if Edge TTS is available."""
global _edge_available
if _edge_available is not None:
return _edge_available
try:
import edge_tts
_edge_available = True
logger.info("Edge TTS available as fallback")
except ImportError:
_edge_available = False
logger.warning("Edge TTS not installed")
return _edge_available
def is_piper_loaded() -> bool:
"""Check if any TTS is available."""
return check_piper_available() or _check_edge_available()
def _get_piper_voice(model_name: str = "thorsten_medium.onnx"):
"""Get or create cached Piper voice instance for a specific model."""
global _piper_voices
if model_name in _piper_voices:
return _piper_voices[model_name]
if not check_piper_available():
return None
try:
from piper import PiperVoice
model_path = _get_piper_model_path(model_name)
config_path = _get_piper_model_path(f"{model_name}.json")
logger.info(f"Loading Piper voice from {model_path}")
voice = PiperVoice.load(str(model_path), str(config_path))
_piper_voices[model_name] = voice
logger.info(f"Piper voice {model_name} loaded successfully")
return voice
except Exception as e:
logger.error(f"Failed to load Piper voice {model_name}: {e}")
return None
@dataclass
class PiperSynthesisResult:
"""Result of TTS synthesis."""
audio: np.ndarray
sample_rate: int
duration: float
voice: str
async def _synthesize_with_piper(
text: str,
voice_id: str = "de_thorsten",
length_scale: float = 1.0,
) -> PiperSynthesisResult:
"""Synthesize using local Piper TTS."""
# Get the model name for this voice
voice_config = PIPER_VOICES.get(voice_id, PIPER_VOICES["de_thorsten"])
model_name = voice_config.get("model", "thorsten_medium.onnx")
piper_voice = _get_piper_voice(model_name)
if piper_voice is None:
raise RuntimeError(f"Piper voice {voice_id} not available")
logger.debug(f"Piper synthesizing with {voice_id}: \"{text[:50]}...\"")
# Piper uses length_scale directly (1.0 = normal, >1 = slower)
# Run in thread pool to not block async
    loop = asyncio.get_running_loop()
def _synth():
audio_data = []
for audio_chunk in piper_voice.synthesize_stream_raw(text, length_scale=length_scale):
audio_data.append(audio_chunk)
return b"".join(audio_data)
audio_bytes = await loop.run_in_executor(None, _synth)
# Convert to numpy (16-bit PCM)
audio = np.frombuffer(audio_bytes, dtype=np.int16).astype(np.float32) / 32768.0
sample_rate = piper_voice.config.sample_rate
duration = len(audio) / sample_rate
logger.debug(f"Piper synthesis complete: {duration:.2f}s, {sample_rate}Hz")
return PiperSynthesisResult(
audio=audio,
sample_rate=sample_rate,
duration=duration,
voice=voice_id,
)
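The raw Piper output is 16-bit PCM; the normalization step above can be sketched in isolation (illustrative values, not service code):

```python
import numpy as np

# Sketch of the PCM normalization used above: little-endian int16 bytes
# from Piper are scaled into float32 in [-1.0, 1.0).
pcm_bytes = np.array([0, 16384, -32768], dtype=np.int16).tobytes()
audio = np.frombuffer(pcm_bytes, dtype=np.int16).astype(np.float32) / 32768.0
# audio → [0.0, 0.5, -1.0]
```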
async def _synthesize_with_edge(
text: str,
edge_voice: str,
length_scale: float = 1.0,
) -> PiperSynthesisResult:
"""Synthesize using Edge TTS (cloud fallback)."""
import edge_tts
logger.debug(f"Edge TTS synthesizing: \"{text[:50]}...\" with voice={edge_voice}")
# Convert length_scale to rate string
rate_percent = int((1.0 / length_scale - 1.0) * 100)
rate_str = f"{rate_percent:+d}%"
with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp_file:
tmp_path = tmp_file.name
try:
communicate = edge_tts.Communicate(text, edge_voice, rate=rate_str)
await communicate.save(tmp_path)
audio, sample_rate = sf.read(tmp_path)
if len(audio.shape) > 1:
audio = audio.mean(axis=1)
audio = audio.astype(np.float32)
duration = len(audio) / sample_rate
logger.debug(f"Edge TTS synthesis complete: {duration:.2f}s, {sample_rate}Hz")
return PiperSynthesisResult(
audio=audio,
sample_rate=sample_rate,
duration=duration,
voice=edge_voice,
)
finally:
if os.path.exists(tmp_path):
os.unlink(tmp_path)
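The length_scale-to-rate mapping above can be factored out and checked on a few values. A sketch; `length_scale_to_edge_rate` is an illustrative helper name, not part of the service:

```python
def length_scale_to_edge_rate(length_scale: float) -> str:
    """Mirror of the conversion above: Piper-style length_scale
    (>1 = slower) to an Edge TTS rate string like '-50%'."""
    rate_percent = int((1.0 / length_scale - 1.0) * 100)
    return f"{rate_percent:+d}%"

print(length_scale_to_edge_rate(1.0))   # +0%
print(length_scale_to_edge_rate(2.0))   # -50%
print(length_scale_to_edge_rate(0.5))   # +100%
```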
async def synthesize_piper(
text: str,
voice: str = DEFAULT_PIPER_VOICE,
length_scale: float = 1.0,
) -> PiperSynthesisResult:
"""
Synthesize speech - uses local Piper if available, falls back to Edge TTS.
Args:
text: Text to synthesize
voice: Voice ID (e.g., "de_thorsten", "de_katja")
length_scale: Speed control (1.0 = normal, >1 = slower, <1 = faster)
Returns:
PiperSynthesisResult with audio data
"""
if not text.strip():
raise ValueError("Text cannot be empty")
# Get voice config
if voice not in PIPER_VOICES:
logger.warning(f"Unknown voice: {voice}, using default {DEFAULT_PIPER_VOICE}")
voice = DEFAULT_PIPER_VOICE
voice_config = PIPER_VOICES[voice]
voice_type = voice_config.get("type", "piper")
# Try local Piper first for piper-type voices
if voice_type == "piper" and check_piper_available():
try:
return await _synthesize_with_piper(text, voice, length_scale)
except Exception as e:
logger.warning(f"Piper synthesis failed, trying Edge fallback: {e}")
# Use Edge TTS for edge-type voices or as fallback
if _check_edge_available():
edge_voice = voice_config.get("edge_voice", "de-DE-ConradNeural")
if voice_type == "piper":
# Fallback: use appropriate Edge voice based on gender
gender = voice_config.get("gender", "male")
edge_voice = "de-DE-KatjaNeural" if gender == "female" else "de-DE-ConradNeural"
return await _synthesize_with_edge(text, edge_voice, length_scale)
raise RuntimeError("No TTS backend available (neither Piper nor Edge TTS)")
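The fallback rule in `synthesize_piper` can be summarized as a small pure function. A sketch under the voice-config shape used above; `pick_fallback_edge_voice` is an illustrative name:

```python
def pick_fallback_edge_voice(voice_config: dict) -> str:
    """Sketch of the fallback rule above: edge-type voices keep their
    configured Edge voice; piper-type voices fall back to a
    gender-matched German Edge voice."""
    if voice_config.get("type", "piper") == "edge":
        return voice_config.get("edge_voice", "de-DE-ConradNeural")
    gender = voice_config.get("gender", "male")
    return "de-DE-KatjaNeural" if gender == "female" else "de-DE-ConradNeural"
```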
def list_piper_voices() -> list[dict]:
"""List all available German voices."""
voices = []
piper_available = check_piper_available()
edge_available = _check_edge_available()
for voice_id, config in PIPER_VOICES.items():
# Skip legacy alias
if voice_id == "de_anna":
continue
voice_type = config.get("type", "piper")
is_available = (voice_type == "piper" and piper_available) or \
(voice_type == "edge" and edge_available)
voices.append({
"id": voice_id,
"name": config["name"],
"description": config["description"],
"language": config["language"],
"gender": config.get("gender", "unknown"),
"local": config.get("local", False),
"installed": is_available,
"loaded": is_available,
})
# Sort: local voices first
voices.sort(key=lambda v: (not v["local"], v["id"]))
return voices
def get_piper_voice(voice_id: str) -> Optional[dict]:
"""Get voice configuration by ID."""
if voice_id not in PIPER_VOICES:
return None
config = PIPER_VOICES[voice_id]
voice_type = config.get("type", "piper")
piper_available = check_piper_available()
edge_available = _check_edge_available()
is_available = (voice_type == "piper" and piper_available) or \
(voice_type == "edge" and edge_available)
return {
"id": voice_id,
"name": config["name"],
"description": config["description"],
"language": config["language"],
"gender": config.get("gender", "unknown"),
"local": config.get("local", False),
"installed": is_available,
"loaded": is_available,
}
async def download_piper_voice(voice_id: str) -> bool:
"""Check if voice is available."""
if voice_id not in PIPER_VOICES:
return False
config = PIPER_VOICES[voice_id]
voice_type = config.get("type", "piper")
if voice_type == "piper":
return check_piper_available()
elif voice_type == "edge":
return _check_edge_available()
return False


@@ -1,275 +0,0 @@
"""
Voice Manager for registering and managing custom voices.
Handles pre-defined voices from the voices/ directory and runtime-registered voices.
"""
import json
import logging
import os
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import Optional
logger = logging.getLogger(__name__)
# Base directory for voices
VOICES_DIR = Path(__file__).parent.parent / "voices"
# Registry file for custom voices
REGISTRY_FILE = VOICES_DIR / "registry.json"
@dataclass
class CustomVoice:
"""Custom voice registration."""
id: str
name: str
description: str
audio_path: str
transcript: str
created_at: str # ISO format timestamp
class VoiceManager:
"""Manages custom voice registrations for F5-TTS."""
def __init__(self, voices_dir: Path = VOICES_DIR):
self.voices_dir = voices_dir
self.registry_file = voices_dir / "registry.json"
self._voices: dict[str, CustomVoice] = {}
self._load_registry()
self._scan_predefined_voices()
def _load_registry(self) -> None:
"""Load voice registry from disk."""
if not self.registry_file.exists():
logger.info("No voice registry found, starting fresh")
return
try:
with open(self.registry_file, "r") as f:
data = json.load(f)
for voice_id, voice_data in data.items():
# Verify audio file exists
if Path(voice_data["audio_path"]).exists():
self._voices[voice_id] = CustomVoice(**voice_data)
else:
logger.warning(
f"Voice '{voice_id}' audio file not found: {voice_data['audio_path']}"
)
logger.info(f"Loaded {len(self._voices)} custom voices from registry")
except Exception as e:
logger.error(f"Failed to load voice registry: {e}")
def _save_registry(self) -> None:
"""Save voice registry to disk."""
try:
data = {
voice_id: asdict(voice)
for voice_id, voice in self._voices.items()
}
with open(self.registry_file, "w") as f:
json.dump(data, f, indent=2)
logger.info("Voice registry saved")
except Exception as e:
logger.error(f"Failed to save voice registry: {e}")
def _scan_predefined_voices(self) -> None:
"""Scan voices directory for pre-defined voices."""
if not self.voices_dir.exists():
return
# Look for voice directories with audio + transcript
for voice_dir in self.voices_dir.iterdir():
if not voice_dir.is_dir():
continue
voice_id = voice_dir.name
if voice_id in self._voices:
continue # Already registered
# Look for audio file
audio_file = None
for ext in [".wav", ".mp3", ".m4a", ".flac"]:
candidate = voice_dir / f"reference{ext}"
if candidate.exists():
audio_file = candidate
break
# Look for transcript
transcript_file = voice_dir / "transcript.txt"
if not transcript_file.exists():
continue
if not audio_file:
logger.warning(f"No reference audio found in {voice_dir}")
continue
# Load transcript
try:
transcript = transcript_file.read_text().strip()
except Exception as e:
logger.warning(f"Failed to read transcript for {voice_id}: {e}")
continue
# Load metadata if exists
metadata_file = voice_dir / "metadata.json"
name = voice_id
description = f"Pre-defined voice: {voice_id}"
if metadata_file.exists():
try:
with open(metadata_file, "r") as f:
metadata = json.load(f)
name = metadata.get("name", name)
description = metadata.get("description", description)
except Exception:
pass
# Register pre-defined voice
from datetime import datetime
self._voices[voice_id] = CustomVoice(
id=voice_id,
name=name,
description=description,
audio_path=str(audio_file),
transcript=transcript,
created_at=datetime.now().isoformat(),
)
logger.info(f"Found pre-defined voice: {voice_id}")
def register_voice(
self,
voice_id: str,
name: str,
description: str,
audio_bytes: bytes,
transcript: str,
audio_extension: str = ".wav",
) -> CustomVoice:
"""
Register a new custom voice.
Args:
voice_id: Unique voice identifier
name: Display name
description: Voice description
audio_bytes: Reference audio data
transcript: Transcript of the reference audio
audio_extension: Audio file extension
Returns:
Registered CustomVoice
Raises:
ValueError: If voice_id already exists
"""
if voice_id in self._voices:
raise ValueError(f"Voice '{voice_id}' already exists")
# Validate voice_id format
if not voice_id.replace("_", "").replace("-", "").isalnum():
raise ValueError("Voice ID must be alphanumeric (with _ or -)")
# Create voice directory
voice_dir = self.voices_dir / voice_id
voice_dir.mkdir(parents=True, exist_ok=True)
# Save audio file
audio_path = voice_dir / f"reference{audio_extension}"
with open(audio_path, "wb") as f:
f.write(audio_bytes)
# Save transcript
transcript_file = voice_dir / "transcript.txt"
with open(transcript_file, "w") as f:
f.write(transcript)
# Create voice entry
from datetime import datetime
voice = CustomVoice(
id=voice_id,
name=name,
description=description,
audio_path=str(audio_path),
transcript=transcript,
created_at=datetime.now().isoformat(),
)
# Save metadata
metadata_file = voice_dir / "metadata.json"
with open(metadata_file, "w") as f:
json.dump(
{"name": name, "description": description},
f,
indent=2,
)
# Add to registry
self._voices[voice_id] = voice
self._save_registry()
logger.info(f"Registered new voice: {voice_id}")
return voice
def get_voice(self, voice_id: str) -> Optional[CustomVoice]:
"""Get a voice by ID."""
return self._voices.get(voice_id)
def delete_voice(self, voice_id: str) -> bool:
"""
Delete a custom voice.
Args:
voice_id: Voice to delete
Returns:
True if deleted, False if not found
"""
if voice_id not in self._voices:
return False
voice = self._voices[voice_id]
# Remove voice directory
voice_dir = self.voices_dir / voice_id
if voice_dir.exists():
import shutil
try:
shutil.rmtree(voice_dir)
except Exception as e:
logger.error(f"Failed to delete voice directory: {e}")
# Remove from registry
del self._voices[voice_id]
self._save_registry()
logger.info(f"Deleted voice: {voice_id}")
return True
def list_voices(self) -> list[CustomVoice]:
"""List all registered custom voices."""
return list(self._voices.values())
def voice_exists(self, voice_id: str) -> bool:
"""Check if a voice exists."""
return voice_id in self._voices
# Global singleton instance
_voice_manager: Optional[VoiceManager] = None
def get_voice_manager() -> VoiceManager:
"""Get the global VoiceManager instance."""
global _voice_manager
if _voice_manager is None:
_voice_manager = VoiceManager()
return _voice_manager
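The voice-ID check inside `register_voice` relies on `str.isalnum` after stripping the two allowed separators. A standalone sketch (`is_valid_voice_id` is an illustrative name):

```python
def is_valid_voice_id(voice_id: str) -> bool:
    """Sketch of the validation in register_voice: alphanumeric plus
    '_' and '-' only. Empty or separator-only IDs are rejected too,
    because ''.isalnum() is False."""
    return voice_id.replace("_", "").replace("-", "").isalnum()

print(is_valid_voice_id("till_de-1"))   # True
print(is_valid_voice_id("bad id!"))     # False
```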


@@ -1,114 +0,0 @@
"""
VRAM Manager: automatic model unloading after idle timeout.
Tracks last usage time per model and unloads after configurable timeout.
Designed for shared GPU environments (multiple services on one RTX 3090).
Usage in a service:
from vram_manager import VramManager
vram = VramManager(idle_timeout=300) # 5 min
# Before using a model
vram.touch()
# Call periodically (e.g., from health check or background task)
    vram.check_and_unload(unload_fn=my_unload_function)
"""
import os
import time
import logging
import threading
from typing import Optional, Callable
logger = logging.getLogger(__name__)
DEFAULT_IDLE_TIMEOUT = int(os.getenv("VRAM_IDLE_TIMEOUT", "300")) # 5 minutes
class VramManager:
def __init__(self, idle_timeout: int = DEFAULT_IDLE_TIMEOUT, service_name: str = "unknown"):
self.idle_timeout = idle_timeout
self.service_name = service_name
self.last_used: float = 0.0
self.model_loaded: bool = False
self._lock = threading.Lock()
self._timer: Optional[threading.Timer] = None
def touch(self):
"""Mark the model as recently used. Call before/after each inference."""
with self._lock:
self.last_used = time.time()
self.model_loaded = True
self._schedule_check()
def mark_loaded(self):
"""Mark that a model has been loaded into VRAM."""
with self._lock:
self.model_loaded = True
self.last_used = time.time()
self._schedule_check()
logger.info(f"[{self.service_name}] Model loaded, idle timeout: {self.idle_timeout}s")
def mark_unloaded(self):
"""Mark that a model has been unloaded from VRAM."""
with self._lock:
self.model_loaded = False
if self._timer:
self._timer.cancel()
self._timer = None
logger.info(f"[{self.service_name}] Model unloaded, VRAM freed")
def is_idle(self) -> bool:
"""Check if the model has been idle longer than the timeout."""
if not self.model_loaded:
return False
return (time.time() - self.last_used) > self.idle_timeout
def seconds_until_unload(self) -> Optional[float]:
"""Seconds until the model will be unloaded, or None if not loaded."""
if not self.model_loaded:
return None
remaining = self.idle_timeout - (time.time() - self.last_used)
return max(0, remaining)
def check_and_unload(self, unload_fn: Callable[[], None]) -> bool:
"""Check if idle and unload if so. Returns True if unloaded."""
if self.is_idle():
logger.info(f"[{self.service_name}] Idle for >{self.idle_timeout}s, unloading model...")
try:
unload_fn()
self.mark_unloaded()
return True
except Exception as e:
logger.error(f"[{self.service_name}] Failed to unload: {e}")
return False
def _schedule_check(self):
"""Schedule an idle check after the timeout period."""
if self._timer:
self._timer.cancel()
self._timer = threading.Timer(
self.idle_timeout + 5, # Small buffer
self._auto_check,
)
self._timer.daemon = True
self._timer.start()
def _auto_check(self):
"""Auto-triggered idle check (called by timer)."""
# This is just a log — actual unloading needs the unload_fn
# which depends on the service. The service should call check_and_unload.
if self.is_idle():
logger.info(f"[{self.service_name}] Model idle for >{self.idle_timeout}s — ready to unload")
def status(self) -> dict:
"""Get current VRAM manager status."""
return {
"model_loaded": self.model_loaded,
"idle_seconds": round(time.time() - self.last_used, 1) if self.model_loaded else None,
"idle_timeout": self.idle_timeout,
"seconds_until_unload": round(self.seconds_until_unload(), 1) if self.model_loaded else None,
}
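The idle bookkeeping above is plain timestamp arithmetic; a minimal sketch with a pretend last-use timestamp 400 s in the past (names chosen for illustration):

```python
import time

# Minimal sketch of the arithmetic VramManager implements:
# a model last touched 400 s ago with a 300 s idle timeout.
idle_timeout = 300
last_used = time.time() - 400

is_idle = (time.time() - last_used) > idle_timeout
seconds_until_unload = max(0, idle_timeout - (time.time() - last_used))
# is_idle → True, seconds_until_unload → 0
```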


@@ -1,205 +0,0 @@
"""
Zonos TTS: expressive multilingual speech synthesis by Zyphra.
Trained on 200k hours of speech data with explicit German support.
Fine-grained control over pitch, speaking rate, and emotions.
Model: Zyphra/Zonos-v0.1-transformer (HuggingFace)
VRAM: ~5 GB (fits comfortably on RTX 3090)
"""
import logging
import asyncio
import os
from dataclasses import dataclass
from typing import Optional
import numpy as np
# Disable torch.compile (requires MSVC cl.exe on Windows which we don't have)
os.environ["TORCHDYNAMO_DISABLE"] = "1"
logger = logging.getLogger(__name__)
# Lazy-loaded model state
_model = None
_loaded = False
MODEL_ID = "Zyphra/Zonos-v0.1-transformer"
SAMPLE_RATE = 44100 # Zonos outputs 44.1 kHz audio
# Emotion presets for the interview context
EMOTION_PRESETS = {
"neutral": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0], # neutral dominant
"friendly": [0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5], # happiness + neutral
"warm": [0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.7], # slight warmth
"curious": [0.2, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.7], # interested
}
DEFAULT_EMOTION = "friendly"
@dataclass
class ZonosResult:
audio: np.ndarray
sample_rate: int
duration: float
emotion: str
def is_zonos_loaded() -> bool:
return _loaded
def get_zonos_model():
"""Load the Zonos model (lazy, first call only)."""
global _model, _loaded
if _loaded:
return _model
logger.info(f"Loading Zonos model: {MODEL_ID}")
try:
import torch
# Zonos provides its own loader
# Try the official zonos package first, fall back to transformers
try:
from zonos.model import Zonos
_model = Zonos.from_pretrained(MODEL_ID, device="cuda")
except ImportError:
# If zonos package not installed, use transformers
logger.info("zonos package not found, trying transformers loading")
from transformers import AutoModel
_model = AutoModel.from_pretrained(
MODEL_ID,
torch_dtype=torch.float32,
trust_remote_code=True,
).to("cuda")
_loaded = True
logger.info("Zonos model loaded successfully")
return _model
except Exception as e:
logger.error(f"Failed to load Zonos model: {e}")
raise RuntimeError(f"Failed to load Zonos model: {e}")
def unload_zonos():
"""Free VRAM by unloading the model."""
global _model, _loaded
import torch
if _model is not None:
del _model
_model = None
_loaded = False
torch.cuda.empty_cache()
logger.info("Zonos model unloaded")
async def synthesize_zonos(
text: str,
language: str = "de",
emotion: str = DEFAULT_EMOTION,
speaking_rate: float = 13.0,
pitch_std: float = 20.0,
speaker_audio: Optional[bytes] = None,
) -> ZonosResult:
"""
Synthesize speech using Zonos TTS.
Args:
text: Text to synthesize
language: Language code (default: 'de' for German)
emotion: Emotion preset name or custom emotion vector
speaking_rate: Speaking rate in phonemes/sec (default 13.0, range ~8-20)
pitch_std: Pitch variation in Hz (default 20.0, range ~5-50)
speaker_audio: Optional reference audio bytes for voice cloning
Returns ZonosResult with audio as numpy float32 array.
"""
    loop = asyncio.get_running_loop()
return await loop.run_in_executor(
None,
_synthesize_sync,
text,
language,
emotion,
speaking_rate,
pitch_std,
speaker_audio,
)
def _synthesize_sync(
text: str,
language: str,
emotion: str,
speaking_rate: float,
pitch_std: float,
speaker_audio: Optional[bytes],
) -> ZonosResult:
"""Synchronous synthesis (runs in thread pool)."""
import torch
    # NOTE: assumes the official zonos package; the transformers fallback
    # path in get_zonos_model does not provide make_cond_dict
    from zonos.conditioning import make_cond_dict
model = get_zonos_model()
# Resolve emotion preset
emotion_values = EMOTION_PRESETS.get(emotion, EMOTION_PRESETS["friendly"])
# Build speaker embedding if reference audio provided
speaker_embedding = None
if speaker_audio:
speaker_embedding = _embed_speaker(speaker_audio, model)
# Map language codes: Zonos expects espeak language codes like 'de' or 'en-us'
lang_map = {"de": "de", "en": "en-us", "fr": "fr-fr", "es": "es", "it": "it"}
espeak_lang = lang_map.get(language, language)
# Build conditioning using Zonos's own helper
cond = make_cond_dict(
text=text,
language=espeak_lang,
emotion=emotion_values,
speaking_rate=speaking_rate,
pitch_std=pitch_std,
speaker=speaker_embedding,
)
# Generate
with torch.no_grad():
conditioning = model.prepare_conditioning(cond)
codes = model.generate(conditioning)
audio = model.autoencoder.decode(codes).squeeze().cpu().numpy()
audio = audio.astype(np.float32)
duration = len(audio) / SAMPLE_RATE
return ZonosResult(
audio=audio,
sample_rate=SAMPLE_RATE,
duration=duration,
emotion=emotion,
)
def _embed_speaker(audio_bytes: bytes, model) -> "torch.Tensor":
"""Create speaker embedding from reference audio bytes."""
import torch
import io
import soundfile as sf
audio_data, sr = sf.read(io.BytesIO(audio_bytes))
if len(audio_data.shape) > 1:
audio_data = audio_data.mean(axis=1) # mono
audio_tensor = torch.tensor(audio_data, dtype=torch.float32, device="cuda").unsqueeze(0)
return model.make_speaker_embedding(audio_tensor, sr)
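The emotion handling in `_synthesize_sync` is a dictionary lookup with a fallback to "friendly". A standalone sketch using two of the 8-dim preset vectors from above (last slot = neutral; `resolve_emotion` is an illustrative name):

```python
EMOTION_PRESETS = {
    "neutral":  [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    "friendly": [0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5],
}

def resolve_emotion(name: str) -> list[float]:
    """Unknown preset names fall back to 'friendly', mirroring
    the lookup in _synthesize_sync."""
    return EMOTION_PRESETS.get(name, EMOTION_PRESETS["friendly"])
```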


@@ -1,35 +0,0 @@
# Web Framework
fastapi>=0.115.0
uvicorn[standard]>=0.34.0
python-multipart>=0.0.20
# TTS Models (MLX optimized for Apple Silicon)
f5-tts-mlx>=0.2.6
mlx-audio>=0.1.0
mlx>=0.21.0
# Kokoro dependencies (phonemizer)
misaki[en]>=0.9.0
# Audio Processing
soundfile>=0.13.0
scipy>=1.11.0
numpy>=1.26.0
pydub>=0.25.1
tqdm>=4.67.0
# Utilities
aiofiles>=24.1.0
# External Auth (mana-core-auth integration)
httpx>=0.27.0
# ── Orpheus TTS (German high-quality) ──
# Uses transformers + SNAC codec for audio decoding
transformers>=4.44.0
snac>=1.2.0
torch>=2.1.0
# ── Zonos TTS (expressive multilingual by Zyphra) ──
# Install via: pip install git+https://github.com/Zyphra/Zonos.git
# (the 'zonos' package pulls its own deps including torch, encodec, etc.)


@@ -1,74 +0,0 @@
#!/usr/bin/env bash
#
# Compare Orpheus vs Zonos vs Piper for German interview questions.
# Run this after both models are installed on the GPU box.
#
# Usage: ./compare-german-tts.sh [TTS_URL] [API_KEY]
#
# Generates WAV files in ./comparison/ for side-by-side listening.
set -euo pipefail
TTS_URL="${1:-https://gpu-tts.mana.how}"
API_KEY="${2:-${MANA_TTS_API_KEY:-}}"
OUT="./comparison"
mkdir -p "$OUT"
# Sample interview questions (subset)
QUESTIONS=(
"Was machst du beruflich?"
"Wo lebst du?"
"Welche Sprachen sprichst du?"
"Erzähl kurz von dir."
"Wann stehst du normalerweise auf?"
"Was sind deine Interessen und Hobbys?"
"Was sind deine aktuellen Ziele?"
)
AUTH_HEADER=""
if [ -n "$API_KEY" ]; then
AUTH_HEADER="Authorization: Bearer $API_KEY"
fi
echo "=== German TTS Comparison ==="
echo "Server: $TTS_URL"
echo "Output: $OUT/"
echo ""
for i in "${!QUESTIONS[@]}"; do
q="${QUESTIONS[$i]}"
idx=$(printf "%02d" $((i + 1)))
echo "[$idx] \"$q\""
# Piper (baseline)
echo " → Piper..."
curl -s -X POST "$TTS_URL/synthesize/auto" \
${AUTH_HEADER:+-H "$AUTH_HEADER"} \
-H "Content-Type: application/json" \
-d "{\"text\": \"$q\", \"voice\": \"de_kerstin\"}" \
-o "$OUT/${idx}_piper.wav" 2>/dev/null || echo " ✗ Piper failed"
# Orpheus
echo " → Orpheus..."
curl -s -X POST "$TTS_URL/synthesize/orpheus" \
${AUTH_HEADER:+-H "$AUTH_HEADER"} \
-H "Content-Type: application/json" \
-d "{\"text\": \"$q\", \"voice\": \"tara\"}" \
-o "$OUT/${idx}_orpheus.wav" 2>/dev/null || echo " ✗ Orpheus failed"
# Zonos (friendly)
echo " → Zonos..."
curl -s -X POST "$TTS_URL/synthesize/zonos" \
${AUTH_HEADER:+-H "$AUTH_HEADER"} \
-H "Content-Type: application/json" \
-d "{\"text\": \"$q\", \"language\": \"de\", \"emotion\": \"friendly\"}" \
-o "$OUT/${idx}_zonos.wav" 2>/dev/null || echo " ✗ Zonos failed"
echo ""
done
echo "Done! Compare files in $OUT/"
echo ""
echo "Quick listen (macOS):"
echo " for f in $OUT/01_*.wav; do echo \"\$f\"; afplay \"\$f\"; sleep 1; done"


@@ -1,17 +0,0 @@
"""mana-tts service runner."""
import os
import sys
os.chdir(r"C:\mana\services\mana-tts")
sys.path.insert(0, r"C:\mana\services\mana-tts")
# Load .env file
from dotenv import load_dotenv
load_dotenv(r"C:\mana\services\mana-tts\.env")
# Redirect stdout/stderr to log file
log = open(r"C:\mana\services\mana-tts\service.log", "w", buffering=1)
sys.stdout = log
sys.stderr = log
import uvicorn
uvicorn.run("app.main:app", host="0.0.0.0", port=3022, log_level="info")