mirror of
https://github.com/Memo-2023/mana-monorepo.git
synced 2026-05-14 20:21:09 +02:00
🌐 feat: add i18n support to 6 web apps
Add internationalization (DE + EN) to previously missing apps:
- todo: task management translations
- skilltree: skill/XP system translations
- nutriphi: nutrition tracking translations
- planta: plant care translations
- questions: research app translations
- matrix: chat client translations (layout integration)
Each app includes:
- svelte-i18n setup with SSR support
- localStorage persistence ({app}_locale pattern)
- i18n loading state in +layout.svelte
- German (default) and English translations
Updated CONSISTENCY_REPORT.md to mark i18n task as complete.
Also includes:
- mana-tts service placeholder files
This commit is contained in:
parent a938ed86d4
commit 5a0815708c

35 changed files with 3440 additions and 56 deletions
services/mana-tts/CLAUDE.md (new file, 100 lines)
@@ -0,0 +1,100 @@
# CLAUDE.md - Mana TTS Service

## Service Overview

Text-to-Speech microservice using MLX-optimized models for Apple Silicon:

- **Port**: 3022
- **Framework**: Python + FastAPI
- **Models**: Kokoro-82M (fast), F5-TTS (voice cloning)

## Commands

```bash
# Setup
./setup.sh

# Development
source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3022 --reload

# Production (Mac Mini)
../../scripts/mac-mini/setup-tts.sh

# Test
curl http://localhost:3022/health
curl -X POST http://localhost:3022/synthesize/kokoro \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "af_heart"}' \
  --output test.wav
```

## File Structure

```
services/mana-tts/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI endpoints
│   ├── kokoro_service.py    # Kokoro TTS (preset voices)
│   ├── f5_service.py        # F5-TTS (voice cloning)
│   ├── voice_manager.py     # Custom voice registry
│   └── audio_utils.py       # Audio format conversion
├── voices/                  # Custom voice storage
├── mlx_models/              # Model cache
├── setup.sh                 # Setup script
├── requirements.txt
└── README.md
```

## API Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/health` | GET | Health check |
| `/models` | GET | Model info |
| `/voices` | GET | List all voices |
| `/voices` | POST | Register custom voice |
| `/voices/{id}` | DELETE | Delete custom voice |
| `/synthesize/kokoro` | POST | Kokoro synthesis |
| `/synthesize` | POST | F5-TTS voice cloning |
| `/synthesize/auto` | POST | Auto-select model |

## Models

### Kokoro-82M
- ~300 MB download
- 30+ preset voices
- Fast inference
- No reference audio needed

### F5-TTS
- ~6 GB download
- Voice cloning capability
- Requires reference audio + transcript
- Higher quality, slower

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `PORT` | `3022` | Service port |
| `PRELOAD_MODELS` | `false` | Load on startup |
| `MAX_TEXT_LENGTH` | `1000` | Max chars per request |
| `CORS_ORIGINS` | (production URLs) | CORS config |

## Development Notes

- Models load lazily on first request (unless `PRELOAD_MODELS=true`)
- Custom voices stored in `voices/` with reference audio + transcript
- Singleton pattern for model instances
- Audio returned as raw bytes with headers for metadata
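The "singleton pattern for model instances" noted above is a module-level cache keyed by model name, the same shape `kokoro_service.py` and `f5_service.py` use below. A minimal, dependency-free sketch, with `load_model` standing in for the real (expensive) MLX loaders:

```python
_model = None
_model_name = None


def load_model(name: str) -> dict:
    """Stand-in for the real model load (hypothetical, for illustration)."""
    return {"name": name}


def get_model(name: str = "default-model") -> dict:
    """Return the cached model, loading it only on first use or name change."""
    global _model, _model_name
    if _model is not None and _model_name == name:
        return _model  # cache hit: no reload
    _model = load_model(name)
    _model_name = name
    return _model


# Two calls with the same name return the identical instance.
a = get_model("kokoro")
b = get_model("kokoro")
```

Requesting a different model name evicts the cached instance, which matches the `_f5_model_name`/`_kokoro_model_name` checks in the services below.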
services/mana-tts/README.md (new file, 237 lines)
@@ -0,0 +1,237 @@
# Mana TTS

Text-to-Speech microservice with voice cloning support, optimized for Apple Silicon.

## Features

- **Kokoro TTS**: Fast preset voices (~300 MB model)
- **F5-TTS**: Voice cloning with reference audio (~6 GB model)
- **MLX Optimized**: Runs efficiently on Apple Silicon
- **REST API**: FastAPI with OpenAPI documentation

## Quick Start

### Setup

```bash
# Run setup script
./setup.sh

# Or manually
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

### Start Service

```bash
source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3022
```

### Test

```bash
# Health check
curl http://localhost:3022/health

# Synthesize with Kokoro
curl -X POST http://localhost:3022/synthesize/kokoro \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "af_heart"}' \
  --output test.wav

# Play audio (macOS)
afplay test.wav
```

## API Endpoints

### Health & Info

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/models` | GET | Available models |
| `/voices` | GET | All available voices |

### Synthesis

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/synthesize/kokoro` | POST | Kokoro preset voices |
| `/synthesize` | POST | F5-TTS voice cloning |
| `/synthesize/auto` | POST | Auto-select model |

### Voice Management

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/voices` | POST | Register custom voice |
| `/voices/{id}` | DELETE | Delete custom voice |

## Synthesis Examples

### Kokoro (Fast Preset Voices)

```bash
curl -X POST http://localhost:3022/synthesize/kokoro \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to Mana TTS, your personal voice synthesis service.",
    "voice": "af_heart",
    "speed": 1.0,
    "output_format": "wav"
  }' \
  --output output.wav
```

### F5-TTS (Voice Cloning)

```bash
# With reference audio upload
curl -X POST http://localhost:3022/synthesize \
  -F "text=Hello, this is a cloned voice speaking." \
  -F "reference_audio=@reference.wav" \
  -F "reference_text=This is what the reference audio says." \
  -F "output_format=wav" \
  --output cloned.wav

# With registered voice
curl -X POST http://localhost:3022/synthesize \
  -F "text=Hello from my registered voice." \
  -F "voice_id=my_custom_voice" \
  --output output.wav
```

### Auto-Select

```bash
# Uses Kokoro for preset voices, F5-TTS for custom
curl -X POST http://localhost:3022/synthesize/auto \
  -H "Content-Type: application/json" \
  -d '{"text": "Auto-selected synthesis", "voice": "af_bella"}' \
  --output output.wav
```

## Available Kokoro Voices

### American Female
- `af_heart` - Warm, emotional (default)
- `af_alloy` - Neutral, professional
- `af_bella` - Friendly, approachable
- `af_jessica` - Confident, clear
- `af_nicole` - Bright, energetic
- `af_nova` - Modern, dynamic
- `af_sarah` - Warm, conversational
- ... and more

### American Male
- `am_adam` - Deep, authoritative
- `am_echo` - Resonant, clear
- `am_eric` - Professional, neutral
- `am_michael` - Warm, trustworthy
- ... and more

### British Female
- `bf_alice` - Refined, elegant
- `bf_emma` - Clear, professional
- `bf_lily` - Soft, gentle

### British Male
- `bm_daniel` - Classic, authoritative
- `bm_fable` - Storyteller, expressive
- `bm_george` - Traditional, clear

## Voice Registration

Register a custom voice for F5-TTS voice cloning:

```bash
curl -X POST http://localhost:3022/voices \
  -F "voice_id=my_voice" \
  -F "name=My Custom Voice" \
  -F "description=A sample voice for testing" \
  -F "transcript=Hello, this is the text spoken in the reference audio." \
  -F "reference_audio=@my_reference.wav"
```

Pre-defined voices can also be placed in the `voices/` directory:

```
voices/
└── my_voice/
    ├── reference.wav     # Reference audio (required)
    ├── transcript.txt    # Transcript of reference (required)
    └── metadata.json     # Name and description (optional)
```
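The optional `metadata.json` is not spelled out here; a plausible minimal shape, mirroring the `name` and `description` fields the `/voices` endpoint accepts (the exact schema lives in `voice_manager.py`, so these field names are an assumption), would be:

```json
{
  "name": "My Custom Voice",
  "description": "A sample voice for testing"
}
```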
## Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| `PORT` | `3022` | API port |
| `PRELOAD_MODELS` | `false` | Load models on startup |
| `MAX_TEXT_LENGTH` | `1000` | Max characters per request |
| `CORS_ORIGINS` | `https://mana.how,...` | Allowed CORS origins |
| `F5_MODEL` | `lucasnewman/f5-tts-mlx` | F5-TTS model |
| `KOKORO_MODEL` | `mlx-community/Kokoro-82M-bf16` | Kokoro model |

## Mac Mini Deployment

```bash
# Install and start as launchd service
../../scripts/mac-mini/setup-tts.sh

# Service management
launchctl list | grep com.manacore.tts
launchctl unload ~/Library/LaunchAgents/com.manacore.tts.plist
launchctl load ~/Library/LaunchAgents/com.manacore.tts.plist

# View logs
tail -f /tmp/manacore-tts.log
```

## Requirements

- Python 3.10+
- macOS with Apple Silicon (recommended)
- ~7 GB disk space for models
- 16 GB RAM recommended
- ffmpeg (for MP3 output)

## Troubleshooting

### Models Not Loading

```bash
# Check MLX installation
python -c "import mlx; print(mlx.__version__)"

# Check mlx-audio
python -c "import mlx_audio; print('OK')"

# Check f5-tts-mlx
python -c "from f5_tts_mlx import F5TTS; print('OK')"
```

### MP3 Output Not Working

```bash
# Install ffmpeg
brew install ffmpeg

# Verify
ffmpeg -version
```

### Memory Issues

- Reduce `MAX_TEXT_LENGTH` for less memory usage
- Set `PRELOAD_MODELS=false` for lazy loading
- F5-TTS requires ~6 GB, Kokoro ~500 MB

## API Documentation

When running, visit http://localhost:3022/docs for interactive API documentation.
services/mana-tts/app/__init__.py (new file, empty)

services/mana-tts/app/audio_utils.py (new file, 224 lines)
@@ -0,0 +1,224 @@
"""
|
||||
Audio conversion utilities for the TTS service.
|
||||
Handles format conversion between WAV and MP3.
|
||||
"""
|
||||
|
||||
import io
|
||||
import logging
|
||||
import tempfile
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
import numpy as np
|
||||
import soundfile as sf
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Supported output formats
|
||||
SUPPORTED_FORMATS = ["wav", "mp3"]
|
||||
DEFAULT_FORMAT = "wav"
|
||||
DEFAULT_SAMPLE_RATE = 24000
|
||||
|
||||
|
||||
def audio_to_wav_bytes(
|
||||
audio_data: np.ndarray,
|
||||
sample_rate: int = DEFAULT_SAMPLE_RATE,
|
||||
) -> bytes:
|
||||
"""
|
||||
Convert numpy audio array to WAV bytes.
|
||||
|
||||
Args:
|
||||
audio_data: Audio samples as numpy array
|
||||
sample_rate: Sample rate in Hz
|
||||
|
||||
Returns:
|
||||
WAV file as bytes
|
||||
"""
|
||||
buffer = io.BytesIO()
|
||||
sf.write(buffer, audio_data, sample_rate, format="WAV")
|
||||
buffer.seek(0)
|
||||
return buffer.read()
|
||||
|
||||
|
||||
def audio_to_mp3_bytes(
|
||||
audio_data: np.ndarray,
|
||||
sample_rate: int = DEFAULT_SAMPLE_RATE,
|
||||
bitrate: str = "192k",
|
||||
) -> bytes:
|
||||
"""
|
||||
Convert numpy audio array to MP3 bytes.
|
||||
Requires ffmpeg to be installed.
|
||||
|
||||
Args:
|
||||
audio_data: Audio samples as numpy array
|
||||
sample_rate: Sample rate in Hz
|
||||
bitrate: MP3 bitrate (e.g., "128k", "192k", "320k")
|
||||
|
||||
Returns:
|
||||
MP3 file as bytes
|
||||
"""
|
||||
try:
|
||||
from pydub import AudioSegment
|
||||
except ImportError:
|
||||
logger.error("pydub not installed, falling back to WAV")
|
||||
return audio_to_wav_bytes(audio_data, sample_rate)
|
||||
|
||||
# First convert to WAV
|
||||
wav_bytes = audio_to_wav_bytes(audio_data, sample_rate)
|
||||
|
||||
# Then convert to MP3 using pydub
|
||||
try:
|
||||
audio_segment = AudioSegment.from_wav(io.BytesIO(wav_bytes))
|
||||
buffer = io.BytesIO()
|
||||
audio_segment.export(buffer, format="mp3", bitrate=bitrate)
|
||||
buffer.seek(0)
|
||||
return buffer.read()
|
||||
except Exception as e:
|
||||
logger.error(f"MP3 conversion failed: {e}, falling back to WAV")
|
||||
return wav_bytes
|
||||
|
||||
|
||||
def convert_audio(
|
||||
audio_data: np.ndarray,
|
||||
sample_rate: int = DEFAULT_SAMPLE_RATE,
|
||||
output_format: str = DEFAULT_FORMAT,
|
||||
) -> tuple[bytes, str]:
|
||||
"""
|
||||
Convert audio data to the specified format.
|
||||
|
||||
Args:
|
||||
audio_data: Audio samples as numpy array
|
||||
sample_rate: Sample rate in Hz
|
||||
output_format: Output format ("wav" or "mp3")
|
||||
|
||||
Returns:
|
||||
Tuple of (audio bytes, content type)
|
||||
"""
|
||||
output_format = output_format.lower()
|
||||
|
||||
if output_format not in SUPPORTED_FORMATS:
|
||||
logger.warning(f"Unsupported format '{output_format}', using WAV")
|
||||
output_format = "wav"
|
||||
|
||||
if output_format == "mp3":
|
||||
return audio_to_mp3_bytes(audio_data, sample_rate), "audio/mpeg"
|
||||
else:
|
||||
return audio_to_wav_bytes(audio_data, sample_rate), "audio/wav"
|
||||
|
||||
|
||||
def get_content_type(format: str) -> str:
|
||||
"""Get MIME content type for audio format."""
|
||||
content_types = {
|
||||
"wav": "audio/wav",
|
||||
"mp3": "audio/mpeg",
|
||||
}
|
||||
return content_types.get(format.lower(), "audio/wav")
|
||||
|
||||
|
||||
def load_reference_audio(
|
||||
file_path: str | Path,
|
||||
) -> tuple[np.ndarray, int]:
|
||||
"""
|
||||
Load reference audio file for voice cloning.
|
||||
|
||||
Args:
|
||||
file_path: Path to the audio file
|
||||
|
||||
Returns:
|
||||
Tuple of (audio data as numpy array, sample rate)
|
||||
"""
|
||||
audio_data, sample_rate = sf.read(file_path)
|
||||
|
||||
# Convert to mono if stereo
|
||||
if len(audio_data.shape) > 1:
|
||||
audio_data = np.mean(audio_data, axis=1)
|
||||
|
||||
return audio_data, sample_rate
|
||||
|
||||
|
||||
def resample_audio(
|
||||
audio_data: np.ndarray,
|
||||
original_sr: int,
|
||||
target_sr: int = DEFAULT_SAMPLE_RATE,
|
||||
) -> np.ndarray:
|
||||
"""
|
||||
Resample audio to target sample rate.
|
||||
|
||||
Args:
|
||||
audio_data: Audio samples as numpy array
|
||||
original_sr: Original sample rate
|
||||
target_sr: Target sample rate
|
||||
|
||||
Returns:
|
||||
Resampled audio data
|
||||
"""
|
||||
if original_sr == target_sr:
|
||||
return audio_data
|
||||
|
||||
from scipy import signal
|
||||
|
||||
# Calculate resampling ratio
|
||||
num_samples = int(len(audio_data) * target_sr / original_sr)
|
||||
resampled = signal.resample(audio_data, num_samples)
|
||||
|
||||
return resampled.astype(np.float32)
|
||||
|
||||
|
||||
def normalize_audio(
|
||||
audio_data: np.ndarray,
|
||||
target_db: float = -3.0,
|
||||
) -> np.ndarray:
|
||||
"""
|
||||
Normalize audio to target dB level.
|
||||
|
||||
Args:
|
||||
audio_data: Audio samples as numpy array
|
||||
target_db: Target peak level in dB
|
||||
|
||||
Returns:
|
||||
Normalized audio data
|
||||
"""
|
||||
# Calculate current peak
|
||||
peak = np.max(np.abs(audio_data))
|
||||
|
||||
if peak == 0:
|
||||
return audio_data
|
||||
|
||||
# Calculate target peak from dB
|
||||
target_peak = 10 ** (target_db / 20)
|
||||
|
||||
# Apply gain
|
||||
gain = target_peak / peak
|
||||
return audio_data * gain
|
||||
|
||||
|
||||
def save_temp_audio(
|
||||
audio_bytes: bytes,
|
||||
suffix: str = ".wav",
|
||||
) -> str:
|
||||
"""
|
||||
Save audio bytes to a temporary file.
|
||||
|
||||
Args:
|
||||
audio_bytes: Audio data as bytes
|
||||
suffix: File extension
|
||||
|
||||
Returns:
|
||||
Path to temporary file
|
||||
"""
|
||||
with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
|
||||
tmp.write(audio_bytes)
|
||||
return tmp.name
|
||||
|
||||
|
||||
def cleanup_temp_file(file_path: str) -> None:
|
||||
"""
|
||||
Clean up a temporary file.
|
||||
|
||||
Args:
|
||||
file_path: Path to the file to delete
|
||||
"""
|
||||
try:
|
||||
Path(file_path).unlink()
|
||||
except Exception:
|
||||
pass # Silent cleanup failure
|
||||
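As a quick check of the dB-to-gain arithmetic in `normalize_audio`: a target of -3 dB corresponds to a linear peak of 10^(-3/20) ≈ 0.708. A dependency-free sketch of the same gain computation (plain lists instead of numpy, for illustration only):

```python
def normalize_peak(samples: list[float], target_db: float = -3.0) -> list[float]:
    """Scale samples so the absolute peak sits at 10 ** (target_db / 20)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return samples  # silence: nothing to scale
    target_peak = 10 ** (target_db / 20)  # -3 dB -> ~0.708 linear
    gain = target_peak / peak
    return [s * gain for s in samples]


# Input peak 0.5 is scaled up to ~0.708; relative shape is preserved.
out = normalize_peak([0.1, -0.5, 0.25])
```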
services/mana-tts/app/f5_service.py (new file, 208 lines)
@@ -0,0 +1,208 @@
"""
|
||||
F5-TTS Service for voice cloning synthesis.
|
||||
Uses f5-tts-mlx optimized for Apple Silicon.
|
||||
"""
|
||||
|
||||
import logging
|
||||
import os
|
||||
import tempfile
|
||||
from dataclasses import dataclass
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
import numpy as np
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Global singleton for lazy initialization
|
||||
_f5_model = None
|
||||
_f5_model_name = None
|
||||
|
||||
# Default model
|
||||
DEFAULT_F5_MODEL = os.getenv("F5_MODEL", "lucasnewman/f5-tts-mlx")
|
||||
|
||||
# Default generation parameters
|
||||
DEFAULT_DURATION = 10.0 # seconds
|
||||
DEFAULT_STEPS = 32
|
||||
DEFAULT_CFG_STRENGTH = 2.0
|
||||
DEFAULT_SWAY_COEF = -1.0
|
||||
DEFAULT_SPEED = 1.0
|
||||
|
||||
|
||||
@dataclass
|
||||
class F5Result:
|
||||
"""Result from F5-TTS synthesis."""
|
||||
|
||||
audio: np.ndarray
|
||||
sample_rate: int
|
||||
duration: float
|
||||
voice_id: Optional[str] = None
|
||||
|
||||
|
||||
def get_f5_model(model_name: str = DEFAULT_F5_MODEL):
|
||||
"""
|
||||
Get or create F5-TTS model instance (singleton pattern).
|
||||
|
||||
Args:
|
||||
model_name: HuggingFace model identifier
|
||||
|
||||
Returns:
|
||||
F5TTS model instance
|
||||
"""
|
||||
global _f5_model, _f5_model_name
|
||||
|
||||
# Return existing model if same model name
|
||||
if _f5_model is not None and _f5_model_name == model_name:
|
||||
return _f5_model
|
||||
|
||||
logger.info(f"Loading F5-TTS model: {model_name}")
|
||||
|
||||
try:
|
||||
from f5_tts_mlx import F5TTS
|
||||
|
||||
_f5_model = F5TTS(model_name=model_name)
|
||||
_f5_model_name = model_name
|
||||
logger.info("F5-TTS model loaded successfully")
|
||||
return _f5_model
|
||||
|
||||
except ImportError as e:
|
||||
logger.error(f"Failed to import f5_tts_mlx: {e}")
|
||||
raise RuntimeError(
|
||||
"f5-tts-mlx not installed. Run: pip install f5-tts-mlx"
|
||||
)
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to load F5-TTS model: {e}")
|
||||
raise
|
||||
|
||||
|
||||
def is_f5_loaded() -> bool:
|
||||
"""Check if F5-TTS model is currently loaded."""
|
||||
return _f5_model is not None
|
||||
|
||||
|
||||
async def synthesize_f5(
|
||||
text: str,
|
||||
reference_audio_path: str,
|
||||
reference_text: str,
|
||||
duration: Optional[float] = None,
|
||||
steps: int = DEFAULT_STEPS,
|
||||
cfg_strength: float = DEFAULT_CFG_STRENGTH,
|
||||
sway_coef: float = DEFAULT_SWAY_COEF,
|
||||
speed: float = DEFAULT_SPEED,
|
||||
model_name: str = DEFAULT_F5_MODEL,
|
||||
) -> F5Result:
|
||||
"""
|
||||
Synthesize speech using F5-TTS with voice cloning.
|
||||
|
||||
Args:
|
||||
text: Text to synthesize
|
||||
reference_audio_path: Path to reference audio file
|
||||
reference_text: Transcript of the reference audio
|
||||
duration: Target duration in seconds (auto-calculated if None)
|
||||
steps: Number of diffusion steps
|
||||
cfg_strength: Classifier-free guidance strength
|
||||
sway_coef: Sway sampling coefficient
|
||||
speed: Speech speed multiplier
|
||||
model_name: HuggingFace model identifier
|
||||
|
||||
Returns:
|
||||
F5Result with audio data
|
||||
"""
|
||||
# Get model
|
||||
model = get_f5_model(model_name)
|
||||
|
||||
logger.info(
|
||||
f"Synthesizing with F5-TTS: text_length={len(text)}, "
|
||||
f"ref_audio={reference_audio_path}, steps={steps}"
|
||||
)
|
||||
|
||||
try:
|
||||
# Generate audio
|
||||
audio, sample_rate = model.generate(
|
||||
text=text,
|
||||
ref_audio_path=reference_audio_path,
|
||||
ref_audio_text=reference_text,
|
||||
duration=duration,
|
||||
steps=steps,
|
||||
cfg_strength=cfg_strength,
|
||||
sway_coef=sway_coef,
|
||||
speed=speed,
|
||||
)
|
||||
|
||||
# Calculate duration
|
||||
audio_duration = len(audio) / sample_rate
|
||||
|
||||
logger.info(f"F5-TTS synthesis complete: duration={audio_duration:.2f}s")
|
||||
|
||||
return F5Result(
|
||||
audio=audio,
|
||||
sample_rate=sample_rate,
|
||||
duration=audio_duration,
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"F5-TTS synthesis failed: {e}")
|
||||
raise RuntimeError(f"Voice cloning synthesis failed: {e}")
|
||||
|
||||
|
||||
async def synthesize_f5_from_bytes(
|
||||
text: str,
|
||||
reference_audio_bytes: bytes,
|
||||
reference_text: str,
|
||||
audio_extension: str = ".wav",
|
||||
**kwargs,
|
||||
) -> F5Result:
|
||||
"""
|
||||
Synthesize speech using F5-TTS with reference audio as bytes.
|
||||
|
||||
Args:
|
||||
text: Text to synthesize
|
||||
reference_audio_bytes: Reference audio as bytes
|
||||
reference_text: Transcript of the reference audio
|
||||
audio_extension: File extension for temp file
|
||||
**kwargs: Additional arguments passed to synthesize_f5
|
||||
|
||||
Returns:
|
||||
F5Result with audio data
|
||||
"""
|
||||
# Save reference audio to temp file
|
||||
with tempfile.NamedTemporaryFile(
|
||||
suffix=audio_extension,
|
||||
delete=False,
|
||||
) as tmp:
|
||||
tmp.write(reference_audio_bytes)
|
||||
tmp_path = tmp.name
|
||||
|
||||
try:
|
||||
result = await synthesize_f5(
|
||||
text=text,
|
||||
reference_audio_path=tmp_path,
|
||||
reference_text=reference_text,
|
||||
**kwargs,
|
||||
)
|
||||
return result
|
||||
finally:
|
||||
# Clean up temp file
|
||||
try:
|
||||
Path(tmp_path).unlink()
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
|
||||
def estimate_duration(text: str, speed: float = 1.0) -> float:
|
||||
"""
|
||||
Estimate audio duration from text.
|
||||
|
||||
Args:
|
||||
text: Text to synthesize
|
||||
speed: Speech speed multiplier
|
||||
|
||||
Returns:
|
||||
Estimated duration in seconds
|
||||
"""
|
||||
# Rough estimate: ~150 words per minute at normal speed
|
||||
# Average word length: ~5 characters
|
||||
words = len(text) / 5
|
||||
minutes = words / 150
|
||||
seconds = minutes * 60
|
||||
return seconds / speed
|
||||
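The 150-wpm heuristic in `estimate_duration` works out to 750 characters per minute of speech (150 words x 5 chars), i.e. 12.5 characters per second. A standalone check of that arithmetic:

```python
def estimate_duration(text: str, speed: float = 1.0) -> float:
    """Same heuristic as above: ~5 chars per word, ~150 words per minute."""
    words = len(text) / 5
    return (words / 150) * 60 / speed


# 750 characters -> 150 words -> one minute at normal speed.
one_minute = estimate_duration("x" * 750)
# Doubling the speed halves the estimate.
half_minute = estimate_duration("x" * 750, speed=2.0)
```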
services/mana-tts/app/kokoro_service.py (new file, 187 lines)
@@ -0,0 +1,187 @@
"""
|
||||
Kokoro TTS Service for fast preset voice synthesis.
|
||||
Uses mlx-audio's Kokoro implementation optimized for Apple Silicon.
|
||||
"""
|
||||
|
||||
import logging
|
||||
from dataclasses import dataclass
|
||||
from typing import Optional
|
||||
|
||||
import numpy as np
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Global singleton for lazy initialization
|
||||
_kokoro_model = None
|
||||
_kokoro_model_name = None
|
||||
|
||||
# Default model
|
||||
DEFAULT_KOKORO_MODEL = "mlx-community/Kokoro-82M-bf16"
|
||||
|
||||
# Available Kokoro voices (American Female/Male, British Female/Male)
|
||||
KOKORO_VOICES = {
|
||||
# American Female voices
|
||||
"af_heart": "American Female - Heart (warm, emotional)",
|
||||
"af_alloy": "American Female - Alloy (neutral, professional)",
|
||||
"af_aoede": "American Female - Aoede (clear, articulate)",
|
||||
"af_bella": "American Female - Bella (friendly, approachable)",
|
||||
"af_jessica": "American Female - Jessica (confident, clear)",
|
||||
"af_kore": "American Female - Kore (calm, measured)",
|
||||
"af_nicole": "American Female - Nicole (bright, energetic)",
|
||||
"af_nova": "American Female - Nova (modern, dynamic)",
|
||||
"af_river": "American Female - River (smooth, flowing)",
|
||||
"af_sarah": "American Female - Sarah (warm, conversational)",
|
||||
"af_sky": "American Female - Sky (light, airy)",
|
||||
# American Male voices
|
||||
"am_adam": "American Male - Adam (deep, authoritative)",
|
||||
"am_echo": "American Male - Echo (resonant, clear)",
|
||||
"am_eric": "American Male - Eric (professional, neutral)",
|
||||
"am_fenrir": "American Male - Fenrir (strong, commanding)",
|
||||
"am_liam": "American Male - Liam (friendly, casual)",
|
||||
"am_michael": "American Male - Michael (warm, trustworthy)",
|
||||
"am_onyx": "American Male - Onyx (deep, smooth)",
|
||||
"am_puck": "American Male - Puck (playful, light)",
|
||||
# British Female voices
|
||||
"bf_alice": "British Female - Alice (refined, elegant)",
|
||||
"bf_emma": "British Female - Emma (clear, professional)",
|
||||
"bf_isabella": "British Female - Isabella (sophisticated, warm)",
|
||||
"bf_lily": "British Female - Lily (soft, gentle)",
|
||||
# British Male voices
|
||||
"bm_daniel": "British Male - Daniel (classic, authoritative)",
|
||||
"bm_fable": "British Male - Fable (storyteller, expressive)",
|
||||
"bm_george": "British Male - George (traditional, clear)",
|
||||
"bm_lewis": "British Male - Lewis (modern, approachable)",
|
||||
}
|
||||
|
||||
DEFAULT_VOICE = "af_heart"
|
||||
|
||||
|
||||
@dataclass
|
||||
class KokoroResult:
|
||||
"""Result from Kokoro TTS synthesis."""
|
||||
|
||||
audio: np.ndarray
|
||||
sample_rate: int
|
||||
voice: str
|
||||
duration: float
|
||||
|
||||
|
||||
def get_kokoro_model(model_name: str = DEFAULT_KOKORO_MODEL):
|
||||
"""
|
||||
Get or create Kokoro model instance (singleton pattern).
|
||||
|
||||
Args:
|
||||
model_name: HuggingFace model identifier
|
||||
|
||||
Returns:
|
||||
Kokoro model instance
|
||||
"""
|
||||
global _kokoro_model, _kokoro_model_name
|
||||
|
||||
# Return existing model if same model name
|
||||
if _kokoro_model is not None and _kokoro_model_name == model_name:
|
||||
return _kokoro_model
|
||||
|
||||
logger.info(f"Loading Kokoro model: {model_name}")
|
||||
|
||||
try:
|
||||
from mlx_audio.tts import load
|
||||
|
||||
_kokoro_model = load(model_name)
|
||||
_kokoro_model_name = model_name
|
||||
logger.info("Kokoro model loaded successfully")
|
||||
return _kokoro_model
|
||||
|
||||
except ImportError as e:
|
||||
logger.error(f"Failed to import mlx_audio: {e}")
|
||||
raise RuntimeError(
|
||||
"mlx-audio not installed. Run: pip install mlx-audio"
|
||||
)
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to load Kokoro model: {e}")
|
||||
raise
|
||||
|
||||
|
||||
def is_kokoro_loaded() -> bool:
|
||||
"""Check if Kokoro model is currently loaded."""
|
||||
return _kokoro_model is not None
|
||||
|
||||
|
||||
def get_available_voices() -> dict[str, str]:
|
||||
"""Get dictionary of available Kokoro voices."""
|
||||
return KOKORO_VOICES.copy()
|
||||
|
||||
|
||||
async def synthesize_kokoro(
|
||||
text: str,
|
||||
voice: str = DEFAULT_VOICE,
|
||||
speed: float = 1.0,
|
||||
model_name: str = DEFAULT_KOKORO_MODEL,
|
||||
) -> KokoroResult:
|
||||
"""
|
||||
Synthesize speech using Kokoro TTS.
|
||||
|
||||
Args:
|
||||
text: Text to synthesize
|
||||
voice: Voice ID from KOKORO_VOICES
|
||||
speed: Speech speed multiplier (0.5-2.0)
|
||||
model_name: HuggingFace model identifier
|
||||
|
||||
Returns:
|
||||
KokoroResult with audio data
|
||||
"""
|
||||
# Validate voice
|
||||
if voice not in KOKORO_VOICES:
|
||||
logger.warning(f"Unknown voice '{voice}', using default '{DEFAULT_VOICE}'")
|
||||
voice = DEFAULT_VOICE
|
||||
|
||||
# Clamp speed to valid range
|
||||
speed = max(0.5, min(2.0, speed))
|
||||
|
||||
# Get model
|
||||
model = get_kokoro_model(model_name)
|
||||
|
||||
logger.info(f"Synthesizing with Kokoro: voice={voice}, speed={speed}, text_length={len(text)}")
|
||||
|
||||
try:
|
||||
# Generate audio using mlx-audio's generate method
|
||||
# Returns a generator of GenerationResult objects
|
||||
result_gen = model.generate(
|
||||
text=text,
|
||||
voice=voice,
|
||||
speed=speed,
|
||||
)
|
||||
|
||||
# Collect all audio chunks from the generator
|
||||
audio_chunks = []
|
||||
sample_rate = 24000 # Default, will be updated from result
|
||||
|
||||
for result in result_gen:
|
||||
# Each result has audio, sample_rate, audio_duration (string)
|
||||
sample_rate = result.sample_rate
|
||||
|
||||
# Convert MLX array to numpy
|
||||
audio_np = np.array(result.audio, dtype=np.float32)
|
||||
audio_chunks.append(audio_np)
|
||||
|
||||
# Concatenate all chunks
|
||||
if audio_chunks:
|
||||
full_audio = np.concatenate(audio_chunks)
|
||||
else:
|
||||
raise RuntimeError("No audio generated")
|
||||
|
||||
# Calculate duration from audio length
|
||||
total_duration = len(full_audio) / sample_rate
|
||||
|
||||
logger.info(f"Kokoro synthesis complete: duration={total_duration:.2f}s")
|
||||
|
||||
return KokoroResult(
|
||||
audio=full_audio,
|
||||
sample_rate=sample_rate,
|
||||
voice=voice,
|
||||
duration=total_duration,
|
||||
)
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Kokoro synthesis failed: {e}")
|
||||
raise RuntimeError(f"TTS synthesis failed: {e}")
|
||||
services/mana-tts/app/main.py (new file, 625 lines)
@@ -0,0 +1,625 @@
"""
|
||||
Mana TTS - Text-to-Speech Microservice
|
||||
|
||||
Provides TTS synthesis using:
|
||||
- Kokoro: Fast preset voices
|
||||
- F5-TTS: Voice cloning with reference audio
|
||||
|
||||
Optimized for Apple Silicon (MLX).
|
||||
"""
|
||||
|
||||
import logging
|
||||
import os
|
||||
from contextlib import asynccontextmanager
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
from fastapi import FastAPI, HTTPException, UploadFile, File, Form, Response
|
||||
from fastapi.middleware.cors import CORSMiddleware
|
||||
from pydantic import BaseModel, Field
|
||||
|
||||
from .audio_utils import convert_audio, SUPPORTED_FORMATS, cleanup_temp_file, save_temp_audio
|
||||
from .kokoro_service import (
|
||||
synthesize_kokoro,
|
||||
get_kokoro_model,
|
||||
is_kokoro_loaded,
|
||||
KOKORO_VOICES,
|
||||
DEFAULT_VOICE as DEFAULT_KOKORO_VOICE,
|
||||
DEFAULT_KOKORO_MODEL,
|
||||
)
|
||||
from .f5_service import (
|
||||
synthesize_f5,
|
||||
synthesize_f5_from_bytes,
|
||||
get_f5_model,
|
||||
is_f5_loaded,
|
||||
DEFAULT_F5_MODEL,
|
||||
)
|
||||
from .voice_manager import get_voice_manager, CustomVoice
|
||||
|
||||
# Configure logging
|
||||
logging.basicConfig(
|
||||
level=logging.INFO,
|
||||
format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
|
||||
)
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Configuration from environment
|
||||
PORT = int(os.getenv("PORT", "3022"))
|
||||
PRELOAD_MODELS = os.getenv("PRELOAD_MODELS", "false").lower() == "true"
|
||||
MAX_TEXT_LENGTH = int(os.getenv("MAX_TEXT_LENGTH", "1000"))
|
||||
CORS_ORIGINS = os.getenv(
|
||||
"CORS_ORIGINS",
|
||||
"https://mana.how,https://chat.mana.how,https://todo.mana.how,http://localhost:5173",
|
||||
).split(",")
|
||||
|
||||
# Supported audio extensions for uploads
|
||||
SUPPORTED_AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}
|
||||
|
||||
|
||||
@asynccontextmanager
|
||||
async def lifespan(app: FastAPI):
|
||||
"""Application lifespan manager for startup/shutdown."""
|
||||
logger.info(f"Starting Mana TTS service on port {PORT}")
|
||||
|
||||
# Initialize voice manager (scans voices directory)
|
||||
voice_manager = get_voice_manager()
|
||||
logger.info(f"Voice manager initialized with {len(voice_manager.list_voices())} custom voices")
|
||||
|
||||
if PRELOAD_MODELS:
|
||||
logger.info("Pre-loading models (PRELOAD_MODELS=true)...")
|
||||
try:
|
||||
get_kokoro_model()
|
||||
logger.info("Kokoro model pre-loaded")
|
||||
except Exception as e:
|
||||
logger.warning(f"Failed to pre-load Kokoro: {e}")
|
||||
|
||||
try:
|
||||
get_f5_model()
|
||||
logger.info("F5-TTS model pre-loaded")
|
||||
except Exception as e:
|
||||
logger.warning(f"Failed to pre-load F5-TTS: {e}")
|
||||
else:
|
||||
logger.info("Models will be loaded on first request (lazy loading)")
|
||||
|
||||
yield
|
||||
|
||||
logger.info("Shutting down Mana TTS service")
|
||||
|
||||
|
||||
# Create FastAPI app
|
||||
app = FastAPI(
|
||||
title="Mana TTS",
|
||||
description="Text-to-Speech service with voice cloning support",
|
||||
version="1.0.0",
|
||||
lifespan=lifespan,
|
||||
)
|
||||
|
||||
# CORS middleware
|
||||
app.add_middleware(
|
||||
CORSMiddleware,
|
||||
allow_origins=CORS_ORIGINS,
|
||||
allow_credentials=True,
|
||||
allow_methods=["*"],
|
||||
allow_headers=["*"],
|
||||
)
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Request/Response Models
|
||||
# ============================================================================
|
||||
|
||||
|
||||
class KokoroRequest(BaseModel):
|
||||
"""Request for Kokoro TTS synthesis."""
|
||||
|
||||
text: str = Field(..., description="Text to synthesize", max_length=5000)
|
||||
voice: str = Field(DEFAULT_KOKORO_VOICE, description="Voice ID")
|
||||
speed: float = Field(1.0, ge=0.5, le=2.0, description="Speech speed")
|
||||
output_format: str = Field("wav", description="Output format (wav, mp3)")
|
||||
|
||||
|
||||
class AutoRequest(BaseModel):
|
||||
"""Request for auto-selection TTS synthesis."""
|
||||
|
||||
text: str = Field(..., description="Text to synthesize", max_length=5000)
|
||||
voice: Optional[str] = Field(None, description="Voice ID (Kokoro preset or registered)")
|
||||
speed: float = Field(1.0, ge=0.5, le=2.0, description="Speech speed")
|
||||
output_format: str = Field("wav", description="Output format (wav, mp3)")
|
||||
|
||||
|
||||
class RegisterVoiceRequest(BaseModel):
|
||||
"""Request to register a new custom voice."""
|
||||
|
||||
voice_id: str = Field(..., description="Unique voice identifier", min_length=2, max_length=50)
|
||||
name: str = Field(..., description="Display name")
|
||||
description: str = Field("", description="Voice description")
|
||||
transcript: str = Field(..., description="Transcript of the reference audio")
|
||||
|
||||
|
||||
class HealthResponse(BaseModel):
|
||||
"""Health check response."""
|
||||
|
||||
status: str
|
||||
service: str
|
||||
models_loaded: dict
|
||||
|
||||
|
||||
class ModelsResponse(BaseModel):
|
||||
"""Available models response."""
|
||||
|
||||
kokoro: dict
|
||||
f5: dict
|
||||
|
||||
|
||||
class VoiceInfo(BaseModel):
|
||||
"""Voice information."""
|
||||
|
||||
id: str
|
||||
name: str
|
||||
description: str
|
||||
type: str # "kokoro" or "f5_custom"
|
||||
|
||||
|
||||
class VoicesResponse(BaseModel):
|
||||
"""Available voices response."""
|
||||
|
||||
kokoro_voices: list[VoiceInfo]
|
||||
custom_voices: list[VoiceInfo]
|
||||
|
||||
|
||||
class VoiceRegisteredResponse(BaseModel):
|
||||
"""Response after registering a voice."""
|
||||
|
||||
voice_id: str
|
||||
message: str
|
||||
|
||||
|
||||
class VoiceDeletedResponse(BaseModel):
|
||||
"""Response after deleting a voice."""
|
||||
|
||||
voice_id: str
|
||||
message: str
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Health & Info Endpoints
|
||||
# ============================================================================
|
||||
|
||||
|
||||
@app.get("/health", response_model=HealthResponse)
|
||||
async def health_check():
|
||||
"""Check service health and model status."""
|
||||
return HealthResponse(
|
||||
status="healthy",
|
||||
service="mana-tts",
|
||||
models_loaded={
|
||||
"kokoro": is_kokoro_loaded(),
|
||||
"f5": is_f5_loaded(),
|
||||
},
|
||||
)
|
||||
|
||||
|
||||
@app.get("/models", response_model=ModelsResponse)
|
||||
async def get_models():
|
||||
"""Get information about available models."""
|
||||
return ModelsResponse(
|
||||
kokoro={
|
||||
"name": "Kokoro-82M",
|
||||
"description": "Fast TTS with preset voices",
|
||||
"model_id": DEFAULT_KOKORO_MODEL,
|
||||
"loaded": is_kokoro_loaded(),
|
||||
"voice_count": len(KOKORO_VOICES),
|
||||
},
|
||||
f5={
|
||||
"name": "F5-TTS",
|
||||
"description": "Voice cloning with reference audio",
|
||||
"model_id": DEFAULT_F5_MODEL,
|
||||
"loaded": is_f5_loaded(),
|
||||
"supports_cloning": True,
|
||||
},
|
||||
)
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Voice Management Endpoints
|
||||
# ============================================================================
|
||||
|
||||
|
||||
@app.get("/voices", response_model=VoicesResponse)
|
||||
async def get_voices():
|
||||
"""Get all available voices."""
|
||||
# Kokoro preset voices
|
||||
kokoro_voices = [
|
||||
VoiceInfo(
|
||||
id=voice_id,
|
||||
name=voice_id,
|
||||
description=description,
|
||||
type="kokoro",
|
||||
)
|
||||
for voice_id, description in KOKORO_VOICES.items()
|
||||
]
|
||||
|
||||
# Custom voices from voice manager
|
||||
voice_manager = get_voice_manager()
|
||||
custom_voices = [
|
||||
VoiceInfo(
|
||||
id=voice.id,
|
||||
name=voice.name,
|
||||
description=voice.description,
|
||||
type="f5_custom",
|
||||
)
|
||||
for voice in voice_manager.list_voices()
|
||||
]
|
||||
|
||||
return VoicesResponse(
|
||||
kokoro_voices=kokoro_voices,
|
||||
custom_voices=custom_voices,
|
||||
)
|
||||
|
||||
|
||||
@app.post("/voices", response_model=VoiceRegisteredResponse)
|
||||
async def register_voice(
|
||||
voice_id: str = Form(..., description="Unique voice identifier"),
|
||||
name: str = Form(..., description="Display name"),
|
||||
description: str = Form("", description="Voice description"),
|
||||
transcript: str = Form(..., description="Transcript of the reference audio"),
|
||||
reference_audio: UploadFile = File(..., description="Reference audio file"),
|
||||
):
|
||||
"""
|
||||
Register a new custom voice for F5-TTS voice cloning.
|
||||
|
||||
Requires:
|
||||
- Reference audio file (WAV, MP3, M4A, FLAC, OGG)
|
||||
- Transcript of what is said in the audio
|
||||
"""
|
||||
# Validate file extension
|
||||
if reference_audio.filename:
|
||||
ext = Path(reference_audio.filename).suffix.lower()
|
||||
if ext not in SUPPORTED_AUDIO_EXTENSIONS:
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail=f"Unsupported audio format. Use one of: {SUPPORTED_AUDIO_EXTENSIONS}",
|
||||
)
|
||||
else:
|
||||
ext = ".wav"
|
||||
|
||||
# Read audio bytes
|
||||
audio_bytes = await reference_audio.read()
|
||||
|
||||
if len(audio_bytes) == 0:
|
||||
raise HTTPException(status_code=400, detail="Audio file is empty")
|
||||
|
||||
if len(audio_bytes) > 50 * 1024 * 1024: # 50 MB limit
|
||||
raise HTTPException(status_code=400, detail="Audio file too large (max 50 MB)")
|
||||
|
||||
# Register voice
|
||||
voice_manager = get_voice_manager()
|
||||
try:
|
||||
voice_manager.register_voice(
|
||||
voice_id=voice_id,
|
||||
name=name,
|
||||
description=description,
|
||||
audio_bytes=audio_bytes,
|
||||
transcript=transcript,
|
||||
audio_extension=ext,
|
||||
)
|
||||
except ValueError as e:
|
||||
raise HTTPException(status_code=400, detail=str(e))
|
||||
|
||||
return VoiceRegisteredResponse(
|
||||
voice_id=voice_id,
|
||||
message=f"Voice '{voice_id}' registered successfully",
|
||||
)
|
||||
|
||||
|
||||
@app.delete("/voices/{voice_id}", response_model=VoiceDeletedResponse)
|
||||
async def delete_voice(voice_id: str):
|
||||
"""Delete a registered custom voice."""
|
||||
voice_manager = get_voice_manager()
|
||||
|
||||
if not voice_manager.delete_voice(voice_id):
|
||||
raise HTTPException(status_code=404, detail=f"Voice '{voice_id}' not found")
|
||||
|
||||
return VoiceDeletedResponse(
|
||||
voice_id=voice_id,
|
||||
message=f"Voice '{voice_id}' deleted successfully",
|
||||
)
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Kokoro TTS Endpoint
|
||||
# ============================================================================
|
||||
|
||||
|
||||
@app.post("/synthesize/kokoro")
|
||||
async def synthesize_with_kokoro(request: KokoroRequest):
|
||||
"""
|
||||
Synthesize speech using Kokoro with preset voices.
|
||||
|
||||
Fast synthesis with high-quality preset voices.
|
||||
"""
|
||||
# Validate text length
|
||||
if len(request.text) > MAX_TEXT_LENGTH:
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail=f"Text exceeds maximum length of {MAX_TEXT_LENGTH} characters",
|
||||
)
|
||||
|
||||
if not request.text.strip():
|
||||
raise HTTPException(status_code=400, detail="Text cannot be empty")
|
||||
|
||||
# Validate output format
|
||||
output_format = request.output_format.lower()
|
||||
if output_format not in SUPPORTED_FORMATS:
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail=f"Unsupported format. Use one of: {SUPPORTED_FORMATS}",
|
||||
)
|
||||
|
||||
try:
|
||||
# Synthesize
|
||||
result = await synthesize_kokoro(
|
||||
text=request.text,
|
||||
voice=request.voice,
|
||||
speed=request.speed,
|
||||
)
|
||||
|
||||
# Convert to requested format
|
||||
audio_bytes, content_type = convert_audio(
|
||||
result.audio,
|
||||
result.sample_rate,
|
||||
output_format,
|
||||
)
|
||||
|
||||
# Return audio response
|
||||
return Response(
|
||||
content=audio_bytes,
|
||||
media_type=content_type,
|
||||
headers={
|
||||
"X-Voice": result.voice,
|
||||
"X-Duration": str(result.duration),
|
||||
"X-Sample-Rate": str(result.sample_rate),
|
||||
},
|
||||
)
|
||||
|
||||
except RuntimeError as e:
|
||||
raise HTTPException(status_code=500, detail=str(e))
|
||||
except Exception as e:
|
||||
logger.error(f"Kokoro synthesis error: {e}")
|
||||
raise HTTPException(status_code=500, detail=f"Synthesis failed: {e}")
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# F5-TTS Endpoint
|
||||
# ============================================================================
|
||||
|
||||
|
||||
@app.post("/synthesize")
|
||||
async def synthesize_with_f5(
|
||||
text: str = Form(..., description="Text to synthesize"),
|
||||
voice_id: Optional[str] = Form(None, description="Registered voice ID"),
|
||||
reference_audio: Optional[UploadFile] = File(None, description="Reference audio for cloning"),
|
||||
reference_text: Optional[str] = Form(None, description="Transcript of reference audio"),
|
||||
output_format: str = Form("wav", description="Output format (wav, mp3)"),
|
||||
speed: float = Form(1.0, ge=0.5, le=2.0, description="Speech speed"),
|
||||
steps: int = Form(32, ge=8, le=64, description="Diffusion steps"),
|
||||
):
|
||||
"""
|
||||
Synthesize speech using F5-TTS with voice cloning.
|
||||
|
||||
Provide either:
|
||||
- voice_id: Use a pre-registered voice
|
||||
- reference_audio + reference_text: Clone voice from audio sample
|
||||
"""
|
||||
# Validate text
|
||||
if len(text) > MAX_TEXT_LENGTH:
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail=f"Text exceeds maximum length of {MAX_TEXT_LENGTH} characters",
|
||||
)
|
||||
|
||||
if not text.strip():
|
||||
raise HTTPException(status_code=400, detail="Text cannot be empty")
|
||||
|
||||
# Validate output format
|
||||
output_format = output_format.lower()
|
||||
if output_format not in SUPPORTED_FORMATS:
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail=f"Unsupported format. Use one of: {SUPPORTED_FORMATS}",
|
||||
)
|
||||
|
||||
voice_manager = get_voice_manager()
|
||||
ref_audio_path: Optional[str] = None
|
||||
ref_text: Optional[str] = None
|
||||
temp_file_path: Optional[str] = None
|
||||
|
||||
try:
|
||||
# Option 1: Use registered voice
|
||||
if voice_id:
|
||||
voice = voice_manager.get_voice(voice_id)
|
||||
if not voice:
|
||||
raise HTTPException(
|
||||
status_code=404,
|
||||
detail=f"Voice '{voice_id}' not found. Register it first or provide reference audio.",
|
||||
)
|
||||
ref_audio_path = voice.audio_path
|
||||
ref_text = voice.transcript
|
||||
|
||||
# Option 2: Use uploaded reference audio
|
||||
elif reference_audio and reference_text:
|
||||
# Get file extension
|
||||
ext = ".wav"
|
||||
if reference_audio.filename:
|
||||
ext = Path(reference_audio.filename).suffix.lower()
|
||||
if ext not in SUPPORTED_AUDIO_EXTENSIONS:
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail=f"Unsupported audio format. Use one of: {SUPPORTED_AUDIO_EXTENSIONS}",
|
||||
)
|
||||
|
||||
# Read and save to temp file
|
||||
audio_bytes = await reference_audio.read()
|
||||
if len(audio_bytes) == 0:
|
||||
raise HTTPException(status_code=400, detail="Reference audio is empty")
|
||||
|
||||
temp_file_path = save_temp_audio(audio_bytes, suffix=ext)
|
||||
ref_audio_path = temp_file_path
|
||||
ref_text = reference_text
|
||||
|
||||
else:
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail="Provide either voice_id or reference_audio + reference_text",
|
||||
)
|
||||
|
||||
# Synthesize with F5-TTS
|
||||
result = await synthesize_f5(
|
||||
text=text,
|
||||
reference_audio_path=ref_audio_path,
|
||||
reference_text=ref_text,
|
||||
speed=speed,
|
||||
steps=steps,
|
||||
)
|
||||
|
||||
# Convert to requested format
|
||||
audio_bytes, content_type = convert_audio(
|
||||
result.audio,
|
||||
result.sample_rate,
|
||||
output_format,
|
||||
)
|
||||
|
||||
# Return audio response
|
||||
return Response(
|
||||
content=audio_bytes,
|
||||
media_type=content_type,
|
||||
headers={
|
||||
"X-Model": "f5-tts",
|
||||
"X-Voice-ID": voice_id or "custom",
|
||||
"X-Duration": str(result.duration),
|
||||
"X-Sample-Rate": str(result.sample_rate),
|
||||
},
|
||||
)
|
||||
|
||||
except HTTPException:
|
||||
raise
|
||||
except RuntimeError as e:
|
||||
raise HTTPException(status_code=500, detail=str(e))
|
||||
except Exception as e:
|
||||
logger.error(f"F5-TTS synthesis error: {e}")
|
||||
raise HTTPException(status_code=500, detail=f"Voice cloning synthesis failed: {e}")
|
||||
finally:
|
||||
# Clean up temp file
|
||||
if temp_file_path:
|
||||
cleanup_temp_file(temp_file_path)
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Auto-Selection Endpoint
|
||||
# ============================================================================
|
||||
|
||||
|
||||
@app.post("/synthesize/auto")
|
||||
async def synthesize_auto(request: AutoRequest):
|
||||
"""
|
||||
Auto-select the best TTS model based on voice parameter.
|
||||
|
||||
- If voice is a Kokoro preset: Use Kokoro
|
||||
- If voice is a registered custom voice: Use F5-TTS
|
||||
- If no voice specified: Use Kokoro with default voice
|
||||
"""
|
||||
# Validate text
|
||||
if len(request.text) > MAX_TEXT_LENGTH:
|
||||
raise HTTPException(
|
||||
status_code=400,
|
||||
detail=f"Text exceeds maximum length of {MAX_TEXT_LENGTH} characters",
|
||||
)
|
||||
|
||||
if not request.text.strip():
|
||||
raise HTTPException(status_code=400, detail="Text cannot be empty")
|
||||
|
||||
# Determine which model to use
|
||||
voice = request.voice or DEFAULT_KOKORO_VOICE
|
||||
|
||||
# Check if it's a Kokoro voice
|
||||
if voice in KOKORO_VOICES:
|
||||
kokoro_request = KokoroRequest(
|
||||
text=request.text,
|
||||
voice=voice,
|
||||
speed=request.speed,
|
||||
output_format=request.output_format,
|
||||
)
|
||||
return await synthesize_with_kokoro(kokoro_request)
|
||||
|
||||
# Check if it's a registered custom voice
|
||||
voice_manager = get_voice_manager()
|
||||
if voice_manager.voice_exists(voice):
|
||||
# Use F5-TTS with registered voice
|
||||
# Create a form-like context for the F5 endpoint
|
||||
custom_voice = voice_manager.get_voice(voice)
|
||||
try:
|
||||
result = await synthesize_f5(
|
||||
text=request.text,
|
||||
reference_audio_path=custom_voice.audio_path,
|
||||
reference_text=custom_voice.transcript,
|
||||
speed=request.speed,
|
||||
)
|
||||
|
||||
# Convert to requested format
|
||||
output_format = request.output_format.lower()
|
||||
audio_bytes, content_type = convert_audio(
|
||||
result.audio,
|
||||
result.sample_rate,
|
||||
output_format,
|
||||
)
|
||||
|
||||
return Response(
|
||||
content=audio_bytes,
|
||||
media_type=content_type,
|
||||
headers={
|
||||
"X-Model": "f5-tts",
|
||||
"X-Voice-ID": voice,
|
||||
"X-Duration": str(result.duration),
|
||||
"X-Sample-Rate": str(result.sample_rate),
|
||||
},
|
||||
)
|
||||
except Exception as e:
|
||||
logger.error(f"F5-TTS auto synthesis error: {e}")
|
||||
raise HTTPException(status_code=500, detail=f"Voice synthesis failed: {e}")
|
||||
|
||||
# Unknown voice - fall back to Kokoro with default
|
||||
logger.warning(f"Unknown voice '{voice}', falling back to Kokoro default")
|
||||
kokoro_request = KokoroRequest(
|
||||
text=request.text,
|
||||
voice=DEFAULT_KOKORO_VOICE,
|
||||
speed=request.speed,
|
||||
output_format=request.output_format,
|
||||
)
|
||||
return await synthesize_with_kokoro(kokoro_request)
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Error Handler
|
||||
# ============================================================================
|
||||
|
||||
|
||||
@app.exception_handler(Exception)
|
||||
async def global_exception_handler(request, exc):
|
||||
"""Handle uncaught exceptions."""
|
||||
logger.error(f"Unhandled exception: {exc}")
|
||||
return Response(
|
||||
content=f'{{"error": "Internal server error", "detail": "{str(exc)}"}}',
|
||||
status_code=500,
|
||||
media_type="application/json",
|
||||
)
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Main
|
||||
# ============================================================================
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
import uvicorn
|
||||
|
||||
uvicorn.run(app, host="0.0.0.0", port=PORT)
|
||||
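The `/synthesize/auto` selection order (Kokoro preset first, then registered custom voice, then fallback to the Kokoro default) can be sketched as a pure function. This is an illustrative restatement only: `KOKORO_VOICES`, `CUSTOM_VOICES`, and the voice IDs below are hypothetical stand-ins for the service's real state, not its actual data.

```python
from typing import Optional, Tuple

# Hypothetical stand-ins for service state (illustration only)
KOKORO_VOICES = {"af_heart": "preset voice"}   # Kokoro preset IDs
CUSTOM_VOICES = {"memo_clone"}                 # voices registered for F5-TTS cloning
DEFAULT_KOKORO_VOICE = "af_heart"


def pick_engine(voice: Optional[str]) -> Tuple[str, str]:
    """Mirror the auto-selection order: preset, then custom, then fallback."""
    resolved = voice or DEFAULT_KOKORO_VOICE
    if resolved in KOKORO_VOICES:
        return ("kokoro", resolved)
    if resolved in CUSTOM_VOICES:
        return ("f5-tts", resolved)
    # Unknown voice: fall back to the Kokoro default (the endpoint logs a warning here)
    return ("kokoro", DEFAULT_KOKORO_VOICE)


print(pick_engine("memo_clone"))  # ('f5-tts', 'memo_clone')
print(pick_engine("unknown"))     # ('kokoro', 'af_heart')
```

Note that the fallback means an unknown voice never returns a 404 from this endpoint; only the explicit `/synthesize` (F5) route rejects unknown `voice_id` values.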
services/mana-tts/app/voice_manager.py (new file, +275 lines)
"""
|
||||
Voice Manager for registering and managing custom voices.
|
||||
Handles pre-defined voices from the voices/ directory and runtime-registered voices.
|
||||
"""
|
||||
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
from dataclasses import dataclass, asdict
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Base directory for voices
|
||||
VOICES_DIR = Path(__file__).parent.parent / "voices"
|
||||
|
||||
# Registry file for custom voices
|
||||
REGISTRY_FILE = VOICES_DIR / "registry.json"
|
||||
|
||||
|
||||
@dataclass
|
||||
class CustomVoice:
|
||||
"""Custom voice registration."""
|
||||
|
||||
id: str
|
||||
name: str
|
||||
description: str
|
||||
audio_path: str
|
||||
transcript: str
|
||||
created_at: str # ISO format timestamp
|
||||
|
||||
|
||||
class VoiceManager:
|
||||
"""Manages custom voice registrations for F5-TTS."""
|
||||
|
||||
def __init__(self, voices_dir: Path = VOICES_DIR):
|
||||
self.voices_dir = voices_dir
|
||||
self.registry_file = voices_dir / "registry.json"
|
||||
self._voices: dict[str, CustomVoice] = {}
|
||||
self._load_registry()
|
||||
self._scan_predefined_voices()
|
||||
|
||||
def _load_registry(self) -> None:
|
||||
"""Load voice registry from disk."""
|
||||
if not self.registry_file.exists():
|
||||
logger.info("No voice registry found, starting fresh")
|
||||
return
|
||||
|
||||
try:
|
||||
with open(self.registry_file, "r") as f:
|
||||
data = json.load(f)
|
||||
|
||||
for voice_id, voice_data in data.items():
|
||||
# Verify audio file exists
|
||||
if Path(voice_data["audio_path"]).exists():
|
||||
self._voices[voice_id] = CustomVoice(**voice_data)
|
||||
else:
|
||||
logger.warning(
|
||||
f"Voice '{voice_id}' audio file not found: {voice_data['audio_path']}"
|
||||
)
|
||||
|
||||
logger.info(f"Loaded {len(self._voices)} custom voices from registry")
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to load voice registry: {e}")
|
||||
|
||||
def _save_registry(self) -> None:
|
||||
"""Save voice registry to disk."""
|
||||
try:
|
||||
data = {
|
||||
voice_id: asdict(voice)
|
||||
for voice_id, voice in self._voices.items()
|
||||
}
|
||||
with open(self.registry_file, "w") as f:
|
||||
json.dump(data, f, indent=2)
|
||||
logger.info("Voice registry saved")
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to save voice registry: {e}")
|
||||
|
||||
def _scan_predefined_voices(self) -> None:
|
||||
"""Scan voices directory for pre-defined voices."""
|
||||
if not self.voices_dir.exists():
|
||||
return
|
||||
|
||||
# Look for voice directories with audio + transcript
|
||||
for voice_dir in self.voices_dir.iterdir():
|
||||
if not voice_dir.is_dir():
|
||||
continue
|
||||
|
||||
voice_id = voice_dir.name
|
||||
if voice_id in self._voices:
|
||||
continue # Already registered
|
||||
|
||||
# Look for audio file
|
||||
audio_file = None
|
||||
for ext in [".wav", ".mp3", ".m4a", ".flac"]:
|
||||
candidate = voice_dir / f"reference{ext}"
|
||||
if candidate.exists():
|
||||
audio_file = candidate
|
||||
break
|
||||
|
||||
# Look for transcript
|
||||
transcript_file = voice_dir / "transcript.txt"
|
||||
if not transcript_file.exists():
|
||||
continue
|
||||
|
||||
if not audio_file:
|
||||
logger.warning(f"No reference audio found in {voice_dir}")
|
||||
continue
|
||||
|
||||
# Load transcript
|
||||
try:
|
||||
transcript = transcript_file.read_text().strip()
|
||||
except Exception as e:
|
||||
logger.warning(f"Failed to read transcript for {voice_id}: {e}")
|
||||
continue
|
||||
|
||||
# Load metadata if exists
|
||||
metadata_file = voice_dir / "metadata.json"
|
||||
name = voice_id
|
||||
description = f"Pre-defined voice: {voice_id}"
|
||||
|
||||
if metadata_file.exists():
|
||||
try:
|
||||
with open(metadata_file, "r") as f:
|
||||
metadata = json.load(f)
|
||||
name = metadata.get("name", name)
|
||||
description = metadata.get("description", description)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# Register pre-defined voice
|
||||
from datetime import datetime
|
||||
|
||||
self._voices[voice_id] = CustomVoice(
|
||||
id=voice_id,
|
||||
name=name,
|
||||
description=description,
|
||||
audio_path=str(audio_file),
|
||||
transcript=transcript,
|
||||
created_at=datetime.now().isoformat(),
|
||||
)
|
||||
logger.info(f"Found pre-defined voice: {voice_id}")
|
||||
|
||||
def register_voice(
|
||||
self,
|
||||
voice_id: str,
|
||||
name: str,
|
||||
description: str,
|
||||
audio_bytes: bytes,
|
||||
transcript: str,
|
||||
audio_extension: str = ".wav",
|
||||
) -> CustomVoice:
|
||||
"""
|
||||
Register a new custom voice.
|
||||
|
||||
Args:
|
||||
voice_id: Unique voice identifier
|
||||
name: Display name
|
||||
description: Voice description
|
||||
audio_bytes: Reference audio data
|
||||
transcript: Transcript of the reference audio
|
||||
audio_extension: Audio file extension
|
||||
|
||||
Returns:
|
||||
Registered CustomVoice
|
||||
|
||||
Raises:
|
||||
ValueError: If voice_id already exists
|
||||
"""
|
||||
if voice_id in self._voices:
|
||||
raise ValueError(f"Voice '{voice_id}' already exists")
|
||||
|
||||
# Validate voice_id format
|
||||
if not voice_id.replace("_", "").replace("-", "").isalnum():
|
||||
raise ValueError("Voice ID must be alphanumeric (with _ or -)")
|
||||
|
||||
# Create voice directory
|
||||
voice_dir = self.voices_dir / voice_id
|
||||
voice_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Save audio file
|
||||
audio_path = voice_dir / f"reference{audio_extension}"
|
||||
with open(audio_path, "wb") as f:
|
||||
f.write(audio_bytes)
|
||||
|
||||
# Save transcript
|
||||
transcript_file = voice_dir / "transcript.txt"
|
||||
with open(transcript_file, "w") as f:
|
||||
f.write(transcript)
|
||||
|
||||
# Create voice entry
|
||||
from datetime import datetime
|
||||
|
||||
voice = CustomVoice(
|
||||
id=voice_id,
|
||||
name=name,
|
||||
description=description,
|
||||
audio_path=str(audio_path),
|
||||
transcript=transcript,
|
||||
created_at=datetime.now().isoformat(),
|
||||
)
|
||||
|
||||
# Save metadata
|
||||
metadata_file = voice_dir / "metadata.json"
|
||||
with open(metadata_file, "w") as f:
|
||||
json.dump(
|
||||
{"name": name, "description": description},
|
||||
f,
|
||||
indent=2,
|
||||
)
|
||||
|
||||
# Add to registry
|
||||
self._voices[voice_id] = voice
|
||||
self._save_registry()
|
||||
|
||||
logger.info(f"Registered new voice: {voice_id}")
|
||||
return voice
|
||||
|
||||
def get_voice(self, voice_id: str) -> Optional[CustomVoice]:
|
||||
"""Get a voice by ID."""
|
||||
return self._voices.get(voice_id)
|
||||
|
||||
def delete_voice(self, voice_id: str) -> bool:
|
||||
"""
|
||||
Delete a custom voice.
|
||||
|
||||
Args:
|
||||
voice_id: Voice to delete
|
||||
|
||||
Returns:
|
||||
True if deleted, False if not found
|
||||
"""
|
||||
if voice_id not in self._voices:
|
||||
return False
|
||||
|
||||
voice = self._voices[voice_id]
|
||||
|
||||
# Remove voice directory
|
||||
voice_dir = self.voices_dir / voice_id
|
||||
if voice_dir.exists():
|
||||
import shutil
|
||||
|
||||
try:
|
||||
shutil.rmtree(voice_dir)
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to delete voice directory: {e}")
|
||||
|
||||
# Remove from registry
|
||||
del self._voices[voice_id]
|
||||
self._save_registry()
|
||||
|
||||
logger.info(f"Deleted voice: {voice_id}")
|
||||
return True
|
||||
|
||||
def list_voices(self) -> list[CustomVoice]:
|
||||
"""List all registered custom voices."""
|
||||
return list(self._voices.values())
|
||||
|
||||
def voice_exists(self, voice_id: str) -> bool:
|
||||
"""Check if a voice exists."""
|
||||
return voice_id in self._voices
|
||||
|
||||
|
||||
# Global singleton instance
|
||||
_voice_manager: Optional[VoiceManager] = None
|
||||
|
||||
|
||||
def get_voice_manager() -> VoiceManager:
|
||||
"""Get the global VoiceManager instance."""
|
||||
global _voice_manager
|
||||
if _voice_manager is None:
|
||||
_voice_manager = VoiceManager()
|
||||
return _voice_manager
|
||||
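The `voice_id` format check in `register_voice` doubles as path safety, since the ID becomes a directory name under `voices/`. A minimal restatement of that exact rule:

```python
def is_valid_voice_id(voice_id: str) -> bool:
    """Same rule as register_voice: alphanumeric plus '_' and '-' only."""
    # Strip the two allowed separators, then require the rest to be alphanumeric;
    # dots and slashes (and therefore path traversal like "../x") never pass.
    return voice_id.replace("_", "").replace("-", "").isalnum()


print(is_valid_voice_id("my_voice-2"))  # True
print(is_valid_voice_id("../escape"))   # False
```

Because `"".isalnum()` is `False`, IDs consisting only of `_`/`-` (or the empty string) are rejected as well.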
services/mana-tts/requirements.txt (new file, +22 lines)
# Web Framework
fastapi>=0.115.0
uvicorn[standard]>=0.34.0
python-multipart>=0.0.20

# TTS Models (MLX optimized for Apple Silicon)
f5-tts-mlx>=0.2.6
mlx-audio>=0.1.0
mlx>=0.21.0

# Kokoro dependencies (phonemizer)
misaki[en]>=0.9.0

# Audio Processing
soundfile>=0.13.0
scipy>=1.11.0
numpy>=1.26.0
pydub>=0.25.1
tqdm>=4.67.0

# Utilities
aiofiles>=24.1.0
services/mana-tts/setup.sh (new executable file, +150 lines)
#!/bin/bash
# Setup script for Mana TTS service
# Optimized for Apple Silicon (MLX)

set -e

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
VENV_DIR="$SCRIPT_DIR/.venv"
PYTHON_VERSION="3.11"

echo "=========================================="
echo "Mana TTS Setup"
echo "=========================================="
echo ""

# Check platform
if [[ "$(uname)" != "Darwin" ]]; then
    echo "Warning: This service is optimized for macOS with Apple Silicon."
    echo "Some features may not work on other platforms."
    echo ""
fi

# Check for Apple Silicon
if [[ "$(uname -m)" != "arm64" ]]; then
    echo "Warning: This service is optimized for Apple Silicon (arm64)."
    echo "Performance may be reduced on Intel Macs."
    echo ""
fi

# Find Python
if command -v python3.11 &> /dev/null; then
    PYTHON_CMD="python3.11"
elif command -v python3 &> /dev/null; then
    PYTHON_CMD="python3"
else
    echo "Error: Python 3 not found. Please install Python 3.11 or later."
    exit 1
fi

echo "Using Python: $PYTHON_CMD"
$PYTHON_CMD --version
echo ""

# Check Python version (compare major first so e.g. a 4.x would not be rejected)
PYTHON_MAJOR=$($PYTHON_CMD -c "import sys; print(sys.version_info.major)")
PYTHON_MINOR=$($PYTHON_CMD -c "import sys; print(sys.version_info.minor)")

if [[ $PYTHON_MAJOR -lt 3 ]] || { [[ $PYTHON_MAJOR -eq 3 ]] && [[ $PYTHON_MINOR -lt 10 ]]; }; then
    echo "Error: Python 3.10 or later required. Found $PYTHON_MAJOR.$PYTHON_MINOR"
    exit 1
fi

# Create or recreate virtual environment
if [[ -d "$VENV_DIR" ]]; then
    echo "Virtual environment exists at $VENV_DIR"
    read -p "Recreate it? (y/N) " -n 1 -r
    echo ""
    if [[ $REPLY =~ ^[Yy]$ ]]; then
        echo "Removing existing virtual environment..."
        rm -rf "$VENV_DIR"
        echo "Creating new virtual environment..."
        $PYTHON_CMD -m venv "$VENV_DIR"
    fi
else
    echo "Creating virtual environment..."
    $PYTHON_CMD -m venv "$VENV_DIR"
fi

# Activate virtual environment
echo "Activating virtual environment..."
source "$VENV_DIR/bin/activate"

# Upgrade pip
echo ""
echo "Upgrading pip..."
pip install --upgrade pip

# Install dependencies
echo ""
echo "Installing dependencies..."
pip install -r "$SCRIPT_DIR/requirements.txt"

# Check for ffmpeg (required for MP3 support)
echo ""
echo "Checking for ffmpeg (required for MP3 output)..."
if command -v ffmpeg &> /dev/null; then
    echo "ffmpeg found: $(which ffmpeg)"
else
    echo "Warning: ffmpeg not found. MP3 output will not work."
    echo "Install with: brew install ffmpeg"
fi

# Verify installations
echo ""
echo "Verifying installations..."

# Test FastAPI
python -c "import fastapi; print(f'FastAPI {fastapi.__version__}')" || {
    echo "Error: FastAPI not installed correctly"
    exit 1
}

# Test soundfile
python -c "import soundfile; print(f'soundfile {soundfile.__version__}')" || {
    echo "Error: soundfile not installed correctly"
    exit 1
}

# Test MLX (on Apple Silicon)
if [[ "$(uname -m)" == "arm64" ]]; then
    python -c "import mlx; print(f'MLX {mlx.__version__}')" || {
        echo "Warning: MLX not installed correctly. TTS may not work."
    }
fi

# Test mlx-audio
python -c "import mlx_audio; print('mlx-audio installed')" 2>/dev/null || {
    echo "Warning: mlx-audio did not import successfully."
    echo "You may need to install it manually or models won't load."
}

# Create directories
echo ""
echo "Creating required directories..."
mkdir -p "$SCRIPT_DIR/voices"
mkdir -p "$SCRIPT_DIR/mlx_models"

echo ""
echo "=========================================="
echo "Setup Complete!"
echo "=========================================="
echo ""
echo "To start the service:"
echo ""
echo "  cd $SCRIPT_DIR"
echo "  source .venv/bin/activate"
echo "  uvicorn app.main:app --host 0.0.0.0 --port 3022"
echo ""
echo "Or for development with auto-reload:"
echo ""
echo "  uvicorn app.main:app --host 0.0.0.0 --port 3022 --reload"
echo ""
echo "Test the service:"
echo ""
echo "  curl http://localhost:3022/health"
echo ""
echo "For Mac Mini deployment, run:"
echo ""
echo "  ./../../scripts/mac-mini/setup-tts.sh"
echo ""
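The script's Python version gate is, in intent, a lexicographic comparison of `(major, minor)` against `(3, 10)`. A minimal Python restatement of that intent, useful as a sanity check on the bash logic:

```python
def version_ok(major: int, minor: int) -> bool:
    """True when (major, minor) is at least (3, 10), compared tuple-wise."""
    # Tuple comparison handles the major version first, so a hypothetical
    # Python 4.5 passes even though its minor (5) is below 10.
    return (major, minor) >= (3, 10)


print(version_ok(3, 11))  # True
print(version_ok(3, 9))   # False
```

Comparing tuple-wise avoids the classic pitfall of testing the minor version alone, which would misjudge any future major-version bump.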
services/mana-tts/voices/.gitkeep (new file, empty)