managarten/services/mana-tts/README.md

# Mana TTS

Text-to-Speech microservice with voice cloning support, optimized for Apple Silicon.

## Features

- **Kokoro TTS**: Fast preset voices (~300 MB model)
- **F5-TTS**: Voice cloning with reference audio (~6 GB model)
- **MLX Optimized**: Runs efficiently on Apple Silicon
- **REST API**: FastAPI with OpenAPI documentation

## Quick Start

### Setup

```bash
# Run setup script
./setup.sh

# Or set up manually (any Python 3.10+ works; 3.11 shown)
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

### Start Service

```bash
source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3022
```

### Test

```bash
# Health check
curl http://localhost:3022/health

# Synthesize with Kokoro
curl -X POST http://localhost:3022/synthesize/kokoro \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "af_heart"}' \
  --output test.wav

# Play audio (macOS)
afplay test.wav
```
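When scripting against the service, it can help to wait for `/health` to respond before sending synthesis requests. The sketch below polls the endpoint using only the Python standard library. It assumes that `/health` answers with HTTP 200 once the service is up; the response body is not documented here, so it is ignored. The helper names `is_healthy` and `wait_until_healthy` are illustrative and not part of the service.

```python
import http.client
import time
import urllib.error
import urllib.request


def is_healthy(base_url="http://localhost:3022", timeout=2.0):
    """Return True if GET /health answers with HTTP 200 (assumed success code)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or non-2xx status: treat as not ready.
        return False


def wait_until_healthy(probe=is_healthy, attempts=30, delay=1.0, sleep=time.sleep):
    """Call probe() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_healthy()` with the defaults polls `http://localhost:3022/health` for roughly 30 seconds before giving up.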

## API Endpoints

### Health & Info

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/models` | GET | Available models |
| `/voices` | GET | All available voices |

### Synthesis

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/synthesize/kokoro` | POST | Kokoro preset voices |
| `/synthesize` | POST | F5-TTS voice cloning |
| `/synthesize/auto` | POST | Auto-select model |

### Voice Management

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/voices` | POST | Register custom voice |
| `/voices/{id}` | DELETE | Delete custom voice |

## Synthesis Examples

### Kokoro (Fast Preset Voices)

```bash
curl -X POST http://localhost:3022/synthesize/kokoro \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to Mana TTS, your personal voice synthesis service.",
    "voice": "af_heart",
    "speed": 1.0,
    "output_format": "wav"
  }' \
  --output output.wav
```
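The same request can be issued from Python. This is a minimal standard-library sketch that mirrors the `curl` call above; it assumes, as the `--output` flag suggests, that the response body is the raw audio. The function names `build_kokoro_request` and `synthesize_to_file` are hypothetical helpers, not part of the service.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3022"


def build_kokoro_request(text, voice="af_heart", speed=1.0,
                         output_format="wav", base_url=BASE_URL):
    """Build a POST request for /synthesize/kokoro with a JSON body."""
    payload = {
        "text": text,
        "voice": voice,
        "speed": speed,
        "output_format": output_format,
    }
    return urllib.request.Request(
        f"{base_url}/synthesize/kokoro",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request, path, timeout=60):
    """Send the request and write the response body (assumed audio) to path."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

With the service running, `synthesize_to_file(build_kokoro_request("Hello world"), "output.wav")` should produce the same file as the `curl` example.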

### F5-TTS (Voice Cloning)

```bash
# With reference audio upload
curl -X POST http://localhost:3022/synthesize \
  -F "text=Hello, this is a cloned voice speaking." \
  -F "reference_audio=@reference.wav" \
  -F "reference_text=This is what the reference audio says." \
  -F "output_format=wav" \
  --output cloned.wav

# With registered voice
curl -X POST http://localhost:3022/synthesize \
  -F "text=Hello from my registered voice." \
  -F "voice_id=my_custom_voice" \
  --output output.wav
```
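The `-F` flags above send a `multipart/form-data` body. For clients that cannot shell out to `curl`, the sketch below builds an equivalent body by hand with the Python standard library. The field names (`text`, `reference_audio`, `reference_text`, `output_format`) come from the example above; the helper names, the `audio/wav` content type, and the fixed `reference.wav` filename are assumptions made for illustration.

```python
import urllib.request
import uuid


def encode_multipart(fields, files, boundary=None):
    """Encode text fields and file parts as a multipart/form-data body.

    fields: {name: value}
    files:  {name: (filename, data_bytes, content_type)}
    Returns (body_bytes, content_type_header).
    """
    boundary = boundary or uuid.uuid4().hex
    body = bytearray()
    for name, value in fields.items():
        body += f"--{boundary}\r\n".encode()
        body += f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode()
        body += str(value).encode("utf-8") + b"\r\n"
    for name, (filename, data, content_type) in files.items():
        body += f"--{boundary}\r\n".encode()
        disposition = f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
        body += disposition.encode()
        body += f"Content-Type: {content_type}\r\n\r\n".encode()
        body += data + b"\r\n"
    body += f"--{boundary}--\r\n".encode()
    return bytes(body), f"multipart/form-data; boundary={boundary}"


def build_clone_request(text, reference_wav, reference_text,
                        output_format="wav", base_url="http://localhost:3022"):
    """Build a POST /synthesize request that uploads reference audio bytes."""
    fields = {
        "text": text,
        "reference_text": reference_text,
        "output_format": output_format,
    }
    files = {"reference_audio": ("reference.wav", reference_wav, "audio/wav")}
    body, content_type = encode_multipart(fields, files)
    return urllib.request.Request(
        f"{base_url}/synthesize",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

The returned request can be passed to `urllib.request.urlopen`, and the response body written to disk in the same way as `--output cloned.wav`.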

### Auto-Select

```bash
# Uses Kokoro for preset voices and F5-TTS for custom voices
curl -X POST http://localhost:3022/synthesize/auto \
  -H "Content-Type: application/json" \
  -d '{"text": "Auto-selected synthesis", "voice": "af_bella"}' \
  --output output.wav
```
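A client that prefers to call the model-specific endpoints directly could apply the same rule itself. The sketch below is only an illustration of that rule (preset voice goes to Kokoro, anything else to F5-TTS); how `/synthesize/auto` actually decides is not documented here. The function name `choose_endpoint` and the small `PRESET_VOICES` sample are hypothetical.

```python
# A sample of preset voice IDs; a real client might fetch the full list
# from GET /voices instead (response shape not documented here).
PRESET_VOICES = frozenset({"af_heart", "af_bella", "am_adam", "bf_emma", "bm_george"})


def choose_endpoint(voice, preset_voices=PRESET_VOICES):
    """Return the Kokoro endpoint for preset voices, the F5-TTS endpoint otherwise."""
    if voice in preset_voices:
        return "/synthesize/kokoro"
    return "/synthesize"
```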

## Available Kokoro Voices

### American Female
- `af_heart` - Warm, emotional (default)
- `af_alloy` - Neutral, professional
- `af_bella` - Friendly, approachable
- `af_jessica` - Confident, clear
- `af_nicole` - Bright, energetic
- `af_nova` - Modern, dynamic
- `af_sarah` - Warm, conversational
- ... and more

### American Male
- `am_adam` - Deep, authoritative
- `am_echo` - Resonant, clear
- `am_eric` - Professional, neutral
- `am_michael` - Warm, trustworthy
- ... and more

### British Female
- `bf_alice` - Refined, elegant
- `bf_emma` - Clear, professional
- `bf_lily` - Soft, gentle

### British Male
- `bm_daniel` - Classic, authoritative
- `bm_fable` - Storyteller, expressive
- `bm_george` - Traditional, clear
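The voice IDs listed above appear to follow a two-letter prefix convention: the first letter for accent (`a` American, `b` British) and the second for gender (`f` female, `m` male). The sketch below decodes that prefix. The convention is inferred from the groupings in this README rather than from a documented specification, and `describe_voice` is a hypothetical helper.

```python
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "Female", "m": "Male"}


def describe_voice(voice_id):
    """Split an ID such as 'bm_fable' into accent, gender, and name."""
    prefix, separator, name = voice_id.partition("_")
    if not separator or len(prefix) != 2 or not name:
        raise ValueError(f"not a preset-style voice ID: {voice_id!r}")
    accent_code, gender_code = prefix
    if accent_code not in ACCENTS or gender_code not in GENDERS:
        raise ValueError(f"unknown voice prefix: {prefix!r}")
    return {
        "accent": ACCENTS[accent_code],
        "gender": GENDERS[gender_code],
        "name": name,
    }
```

IDs that do not match the pattern, such as a registered custom voice, raise `ValueError`.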

## Voice Registration

Register a custom voice for F5-TTS voice cloning:

```bash
curl -X POST http://localhost:3022/voices \
  -F "voice_id=my_voice" \
  -F "name=My Custom Voice" \
  -F "description=A sample voice for testing" \
  -F "transcript=Hello, this is the text spoken in the reference audio." \
  -F "reference_audio=@my_reference.wav"
```

Pre-defined voices can also be placed in the `voices/` directory:

```
voices/
└── my_voice/
    ├── reference.wav       # Reference audio (required)
    ├── transcript.txt      # Transcript of reference (required)
    └── metadata.json       # Name and description (optional)
```
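Before starting the service, it may be worth checking that each voice folder contains the required files. The sketch below does so for the layout shown above; the helper names are hypothetical, and it only checks that the files exist, not that the audio or transcript is valid.

```python
from pathlib import Path

REQUIRED_FILES = ("reference.wav", "transcript.txt")


def missing_voice_files(voice_dir):
    """Return the required file names that are absent from one voice folder."""
    voice_dir = Path(voice_dir)
    return [name for name in REQUIRED_FILES if not (voice_dir / name).is_file()]


def find_incomplete_voices(voices_root="voices"):
    """Map each incomplete voice folder name to its list of missing files."""
    root = Path(voices_root)
    if not root.is_dir():
        return {}
    report = {}
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            missing = missing_voice_files(entry)
            if missing:
                report[entry.name] = missing
    return report
```

An empty result from `find_incomplete_voices()` means every folder under `voices/` has both required files.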

## Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| `PORT` | `3022` | API port |
| `PRELOAD_MODELS` | `false` | Load models on startup |
| `MAX_TEXT_LENGTH` | `1000` | Max characters per request |
| `CORS_ORIGINS` | `https://mana.how,...` | Allowed CORS origins |
| `F5_MODEL` | `lucasnewman/f5-tts-mlx` | F5-TTS model |
| `KOKORO_MODEL` | `mlx-community/Kokoro-82M-bf16` | Kokoro model |
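As an illustration of how these variables and defaults fit together, the sketch below reads them from the environment into a small settings object. It is not the service's actual configuration code. Two assumptions are made: `CORS_ORIGINS` is treated as a comma-separated list, and because its default is truncated in the table above, the sketch leaves it empty when unset.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    port: int
    preload_models: bool
    max_text_length: int
    cors_origins: tuple
    f5_model: str
    kokoro_model: str


def load_settings(env=None):
    """Build Settings from environment variables, using the documented defaults."""
    env = os.environ if env is None else env
    truthy = {"1", "true", "yes", "on"}  # accepted spellings are an assumption
    origins = tuple(
        origin.strip()
        for origin in env.get("CORS_ORIGINS", "").split(",")
        if origin.strip()
    )
    return Settings(
        port=int(env.get("PORT", "3022")),
        preload_models=env.get("PRELOAD_MODELS", "false").strip().lower() in truthy,
        max_text_length=int(env.get("MAX_TEXT_LENGTH", "1000")),
        cors_origins=origins,  # real default is elided in the table above
        f5_model=env.get("F5_MODEL", "lucasnewman/f5-tts-mlx"),
        kokoro_model=env.get("KOKORO_MODEL", "mlx-community/Kokoro-82M-bf16"),
    )
```

Passing a plain dictionary, for example `load_settings({"PORT": "8080"})`, makes the function easy to exercise without touching the real environment.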

## Mac Mini Deployment

```bash
# Install and start as launchd service
../../scripts/mac-mini/setup-tts.sh

# Check whether the service is loaded
launchctl list | grep com.manacore.tts

# Stop the service
launchctl unload ~/Library/LaunchAgents/com.manacore.tts.plist

# Start the service
launchctl load ~/Library/LaunchAgents/com.manacore.tts.plist

# View logs
tail -f /tmp/manacore-tts.log
```

## Requirements

- Python 3.10+
- macOS with Apple Silicon (recommended)
- ~7 GB disk space for models
- 16 GB RAM recommended
- ffmpeg (for MP3 output)

## Troubleshooting

### Models Not Loading

```bash
# Check MLX installation
python -c "import mlx; print(mlx.__version__)"

# Check mlx-audio
python -c "import mlx_audio; print('OK')"

# Check f5-tts-mlx
python -c "from f5_tts_mlx import F5TTS; print('OK')"
```
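The three checks above can be combined into one script that reports which packages are missing without importing them. This is a sketch using `importlib.util.find_spec` from the standard library; the module names are taken from the commands above, and `check_modules` is a hypothetical helper. Note that finding a module does not guarantee that importing it will succeed.

```python
import importlib.util

REQUIRED_MODULES = ("mlx", "mlx_audio", "f5_tts_mlx")


def check_modules(names=REQUIRED_MODULES):
    """Return {module_name: True/False} depending on whether it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def format_report(status):
    """Render the status mapping as one 'name: OK/MISSING' line per module."""
    return "\n".join(
        f"{name}: {'OK' if found else 'MISSING'}" for name, found in status.items()
    )
```

Running `print(format_report(check_modules()))` inside the activated virtual environment lists each dependency with its status.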

### MP3 Output Not Working

```bash
# Install ffmpeg
brew install ffmpeg

# Verify
ffmpeg -version
```
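The same check can be run from Python, for example in a startup or deployment script. The sketch below assumes the service locates `ffmpeg` through `PATH`, which is not confirmed here; `mp3_supported` is a hypothetical helper.

```python
import shutil


def mp3_supported(which=shutil.which):
    """Return True if an ffmpeg executable is found on PATH."""
    return which("ffmpeg") is not None
```

If this returns `False` in the environment that launches the service, requesting WAV output avoids the dependency on ffmpeg.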

### Memory Issues

- Lower `MAX_TEXT_LENGTH` to reduce memory usage per request
- Keep `PRELOAD_MODELS=false` (the default) so that models are loaded lazily
- F5-TTS requires ~6 GB of memory, Kokoro ~500 MB
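With a lower `MAX_TEXT_LENGTH`, longer inputs have to be split on the client side and synthesized in several requests. The sketch below shows one way to do that: it breaks text at sentence boundaries and keeps every chunk within the limit. `split_text` is a hypothetical helper, and the sentence-splitting rule (whitespace after `.`, `!`, or `?`) is a simplification.

```python
import re


def split_text(text, max_length=1000):
    """Split text into chunks of at most max_length characters.

    Sentences are kept whole where possible; a single sentence longer than
    max_length is cut into fixed-size pieces.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        # Hard-split any sentence that cannot fit in one chunk.
        while len(sentence) > max_length:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_length])
            sentence = sentence[max_length:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_length:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own synthesis request and the resulting audio files concatenated afterwards.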

## API Documentation

While the service is running, visit http://localhost:3022/docs for interactive API documentation.