managarten/services/mana-tts/CLAUDE.md

# CLAUDE.md - Mana TTS Service

## Service Overview

Text-to-Speech microservice using MLX-optimized models for Apple Silicon:

- **Port**: 3022
- **Framework**: Python + FastAPI
- **Models**: Kokoro-82M (fast), F5-TTS (voice cloning)

## Commands

```bash
# Setup
./setup.sh

# Development
source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3022 --reload

# Production (Mac Mini)
../../scripts/mac-mini/setup-tts.sh

# Test
curl http://localhost:3022/health

# English (Kokoro)
curl -X POST http://localhost:3022/synthesize/kokoro \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "af_heart"}' \
  --output test_en.wav

# German (Piper) - use /synthesize/auto
curl -X POST http://localhost:3022/synthesize/auto \
  -H "Content-Type: application/json" \
  -d '{"text": "Hallo Welt", "voice": "de_kerstin"}' \
  --output test_de.wav
```

## File Structure

```
services/mana-tts/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI endpoints
│   ├── kokoro_service.py    # Kokoro TTS (English preset voices)
│   ├── piper_service.py     # Piper TTS (German voices, local)
│   ├── f5_service.py        # F5-TTS (voice cloning)
│   ├── voice_manager.py     # Custom voice registry
│   └── audio_utils.py       # Audio format conversion
├── piper_voices/            # Piper voice models (.onnx)
├── voices/                  # Custom F5 voice storage
├── mlx_models/             # MLX model cache
├── setup.sh                # Setup script
├── requirements.txt
└── README.md
```

## API Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/health` | GET | Health check |
| `/models` | GET | Model info |
| `/voices` | GET | List all voices |
| `/voices` | POST | Register custom voice |
| `/voices/{id}` | DELETE | Delete custom voice |
| `/synthesize/kokoro` | POST | Kokoro synthesis |
| `/synthesize` | POST | F5-TTS voice cloning |
| `/synthesize/auto` | POST | Auto-select model |

## Models

### Kokoro-82M (English)
- ~300 MB download
- 30+ preset English voices
- Fast inference
- No reference audio needed

### Piper TTS (German)
- ~63 MB per voice model
- 100% local, GDPR-compliant
- Fast inference on CPU
- Available voices:
  - `de_kerstin` - Female (default)
  - `de_thorsten` - Male
- Fallback to Edge TTS (cloud) if Piper unavailable:
  - `de_katja` - Female (cloud)
  - `de_conrad` - Male (cloud)
  - `de_amala` - Female young (cloud)
  - `de_florian` - Male young (cloud)

### F5-TTS (Voice Cloning)
- ~6 GB download
- Voice cloning capability
- Requires reference audio + transcript
- Higher quality, slower

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `PORT` | `3022` | Service port |
| `PRELOAD_MODELS` | `false` | Load on startup |
| `MAX_TEXT_LENGTH` | `1000` | Max chars |
| `CORS_ORIGINS` | (production URLs) | CORS config |

## Key Dependencies

- `fastapi` - Web framework
- `f5-tts-mlx` - Voice cloning model
- `mlx-audio` - Kokoro implementation
- `mlx` - Apple Silicon ML framework
- `piper-tts` - German TTS (local)
- `edge-tts` - German TTS fallback (cloud)
- `soundfile` - Audio I/O
- `pydub` - MP3 conversion

## Development Notes

- Models load lazily on first request (unless `PRELOAD_MODELS=true`)
- Custom voices stored in `voices/` with reference audio + transcript
- Singleton pattern for model instances
- Audio returned as raw bytes with headers for metadata