managarten/services/mana-tts
Till JS b8e18b7f82 chore(ai-services): adopt Windows GPU as source of truth for llm/stt/tts
The Windows GPU server has been the actual production home for these
services for some time, and the running code there has drifted ahead of
the repo. This sync pulls the live versions back into the repo so the
Windows box is no longer the only place those changes exist.

Pulled from C:\mana\services\* on mana-server-gpu (192.168.178.11):

mana-llm:
- src/main.py, src/config.py — small fixes (auth wiring, config tweaks)
- src/api_auth.py — NEW (cross-service GPU_API_KEY validator)
- service.pyw — Windows runner used by the ManaLLM scheduled task
  (sets up logging redirect, loads .env, calls uvicorn)

mana-stt:
- app/main.py — substantial cleanup (684→392 lines), drops the
  whisperx-as-separate-backend branching now that whisper_service.py
  rolls whisperx in directly
- app/whisper_service.py — full CUDA + whisperx rewrite (158→358 lines)
- app/auth.py + external_auth.py — significantly expanded auth
- app/vram_manager.py — NEW (shared VRAM accounting helper)
- service.pyw — Windows runner with CUDA pre-init, FFmpeg PATH
  injection, .env loading
- removed: app/whisper_service_cuda.py (folded into whisper_service.py)
- removed: app/whisperx_service.py (folded into whisper_service.py)

mana-tts:
- app/auth.py, external_auth.py — same auth expansion as stt
- app/f5_service.py, kokoro_service.py — Windows tweaks
- app/vram_manager.py — NEW (same shared helper as stt)
- service.pyw — Windows runner

mana-video-gen:
- service.pyw — Windows runner (no other changes; the .py code on the
  GPU box is byte-identical to what's already in the repo)

The service.pyw files contain absolute Windows paths
(C:\mana\services\<svc>) and a hardcoded FFmpeg PATH for the tills user
profile. Kept as-is intentionally — they exist to be deployed to that
one machine and any abstraction layer would just hide what's actually
happening. Anyone redeploying to a different layout will need to edit
the path strings, which is a known and obvious change.

Mac-Mini infrastructure for these services (launchd plists, install
scripts, scripts/mac-mini/setup-{stt,tts}.sh, the Mac-flux2c image-gen
implementation) is still on disk and will be removed in a follow-up
commit, along with replacing mana-image-gen with the Windows
diffusers+CUDA implementation. This commit is just the live-code sync.
2026-04-08 12:46:03 +02:00
app chore(ai-services): adopt Windows GPU as source of truth for llm/stt/tts 2026-04-08 12:46:03 +02:00
voices 🌐 feat: add i18n support to 6 web apps 2026-01-29 14:48:35 +01:00
.env.example chore: complete ManaCore → Mana rename (docs, go modules, plists, images) 2026-04-07 12:26:10 +02:00
CLAUDE.md 📝 docs(tts): document German voice support (Piper/Kerstin) 2026-02-14 12:21:40 +01:00
com.mana.mana-tts.plist chore: complete ManaCore → Mana rename (docs, go modules, plists, images) 2026-04-07 12:26:10 +02:00
install-service.sh feat: rename ManaCore to Mana across entire codebase 2026-04-05 20:00:13 +02:00
README.md chore: complete ManaCore → Mana rename (docs, go modules, plists, images) 2026-04-07 12:26:10 +02:00
requirements.txt feat(auth): add API key management for STT/TTS services 2026-02-12 02:12:05 +01:00
service.pyw chore(ai-services): adopt Windows GPU as source of truth for llm/stt/tts 2026-04-08 12:46:03 +02:00
setup.sh 🌐 feat: add i18n support to 6 web apps 2026-01-29 14:48:35 +01:00

Mana TTS

Text-to-Speech microservice with voice cloning support, optimized for Apple Silicon.

Features

  • Kokoro TTS: Fast preset voices (~300 MB model)
  • F5-TTS: Voice cloning with reference audio (~6 GB model)
  • MLX Optimized: Runs efficiently on Apple Silicon
  • REST API: FastAPI with OpenAPI documentation

Quick Start

Setup

# Run setup script
./setup.sh

# Or manually
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Start Service

source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3022

Test

# Health check
curl http://localhost:3022/health

# Synthesize with Kokoro
curl -X POST http://localhost:3022/synthesize/kokoro \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "af_heart"}' \
  --output test.wav

# Play audio (macOS)
afplay test.wav
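
On machines without afplay, the downloaded file can be sanity-checked instead of played. The sketch below uses only the Python standard library and assumes the service returns PCM-encoded WAV, which this README does not confirm; the helper names are illustrative.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Parse WAV bytes and return basic properties.

    Raises wave.Error if the bytes are not a PCM WAV file.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate if rate else 0.0,
        }


def make_silence(seconds: float = 0.5, rate: int = 24000) -> bytes:
    """Build a silent mono 16-bit WAV in memory (demo input only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    # With a real file: describe_wav(open("test.wav", "rb").read())
    print(describe_wav(make_silence()))
```

A non-zero duration and a plausible sample rate suggest the synthesis call returned audio rather than an error body.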

API Endpoints

Health & Info

Endpoint    Method    Description
/health     GET       Health check
/models     GET       Available models
/voices     GET       All available voices

Synthesis

Endpoint              Method    Description
/synthesize/kokoro    POST      Kokoro preset voices
/synthesize           POST      F5-TTS voice cloning
/synthesize/auto      POST      Auto-select model

Voice Management

Endpoint        Method    Description
/voices         POST      Register custom voice
/voices/{id}    DELETE    Delete custom voice

Synthesis Examples

Kokoro (Fast Preset Voices)

curl -X POST http://localhost:3022/synthesize/kokoro \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to Mana TTS, your personal voice synthesis service.",
    "voice": "af_heart",
    "speed": 1.0,
    "output_format": "wav"
  }' \
  --output output.wav
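
The same request can be issued from Python. This is a minimal client sketch using only the standard library; the field names mirror the curl example above, while the function names and the 60-second timeout are assumptions.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3022"  # port from this README; adjust per deployment


def build_kokoro_request(text, voice="af_heart", speed=1.0, output_format="wav"):
    """Build (but do not send) the POST request for /synthesize/kokoro."""
    payload = {
        "text": text,
        "voice": voice,
        "speed": speed,
        "output_format": output_format,
    }
    return urllib.request.Request(
        f"{BASE_URL}/synthesize/kokoro",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, path, **kwargs):
    """Send the request and write the returned audio bytes to `path`."""
    request = build_kokoro_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as resp:
        audio = resp.read()
    with open(path, "wb") as fh:
        fh.write(audio)
    return len(audio)
```

Calling synthesize_to_file("Hello world", "output.wav") against a running service should produce the same file as the curl command.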

F5-TTS (Voice Cloning)

# With reference audio upload
curl -X POST http://localhost:3022/synthesize \
  -F "text=Hello, this is a cloned voice speaking." \
  -F "reference_audio=@reference.wav" \
  -F "reference_text=This is what the reference audio says." \
  -F "output_format=wav" \
  --output cloned.wav

# With registered voice
curl -X POST http://localhost:3022/synthesize \
  -F "text=Hello from my registered voice." \
  -F "voice_id=my_custom_voice" \
  --output output.wav
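
The -F flags make curl send multipart/form-data. For clients without a multipart helper, the body can be assembled by hand. The sketch below is one way to do that with the standard library; the form field names come from the curl examples, everything else (function names, the audio/wav content type) is an assumption.

```python
import urllib.request
import uuid


def encode_multipart(fields, files):
    """Encode text fields and file parts as multipart/form-data.

    fields: dict of name -> str
    files:  dict of name -> (filename, bytes, content_type)
    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    for name, (filename, data, ctype) in files.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode(),
            f"Content-Type: {ctype}".encode(),
            b"",
            data,
        ]
    # Closing boundary, followed by a final CRLF.
    lines += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def build_clone_request(text, ref_path, ref_text, base_url="http://localhost:3022"):
    """Build the POST /synthesize request with an uploaded reference clip."""
    with open(ref_path, "rb") as fh:
        audio = fh.read()
    body, ctype = encode_multipart(
        {"text": text, "reference_text": ref_text, "output_format": "wav"},
        {"reference_audio": ("reference.wav", audio, "audio/wav")},
    )
    return urllib.request.Request(
        f"{base_url}/synthesize",
        data=body,
        headers={"Content-Type": ctype},
        method="POST",
    )
```

For a registered voice, pass {"text": ..., "voice_id": ...} as fields and an empty files dict.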

Auto-Select

# Uses Kokoro for preset voices, F5-TTS for custom
curl -X POST http://localhost:3022/synthesize/auto \
  -H "Content-Type: application/json" \
  -d '{"text": "Auto-selected synthesis", "voice": "af_bella"}' \
  --output output.wav
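
The routing rule in the comment above can be pictured as a simple lookup. The service's actual selection logic is not shown in this README, so the following is only a guess at its shape; pick_backend and KOKORO_PRESETS are hypothetical names.

```python
# Preset voice IDs copied from the voice lists in this README. Those lists
# end with "... and more", so this set is knowingly incomplete.
KOKORO_PRESETS = {
    "af_heart", "af_alloy", "af_bella", "af_jessica", "af_nicole",
    "af_nova", "af_sarah",
    "am_adam", "am_echo", "am_eric", "am_michael",
    "bf_alice", "bf_emma", "bf_lily",
    "bm_daniel", "bm_fable", "bm_george",
}


def pick_backend(voice: str) -> str:
    """Return "kokoro" for a known preset voice, otherwise "f5"."""
    return "kokoro" if voice in KOKORO_PRESETS else "f5"


if __name__ == "__main__":
    print(pick_backend("af_bella"))         # preset voice
    print(pick_backend("my_custom_voice"))  # anything else is treated as custom
```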

Available Kokoro Voices

American Female

  • af_heart - Warm, emotional (default)
  • af_alloy - Neutral, professional
  • af_bella - Friendly, approachable
  • af_jessica - Confident, clear
  • af_nicole - Bright, energetic
  • af_nova - Modern, dynamic
  • af_sarah - Warm, conversational
  • ... and more

American Male

  • am_adam - Deep, authoritative
  • am_echo - Resonant, clear
  • am_eric - Professional, neutral
  • am_michael - Warm, trustworthy
  • ... and more

British Female

  • bf_alice - Refined, elegant
  • bf_emma - Clear, professional
  • bf_lily - Soft, gentle

British Male

  • bm_daniel - Classic, authoritative
  • bm_fable - Storyteller, expressive
  • bm_george - Traditional, clear
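
Judging from the groupings above, the two-letter prefix of a voice ID appears to encode accent (a = American, b = British) and gender (f = female, m = male). That convention is inferred from the lists rather than documented here, so the decoder below is a sketch under that assumption.

```python
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "Female", "m": "Male"}


def describe_voice(voice_id: str):
    """Split a preset voice ID such as "bm_fable" into (accent, gender, name)."""
    prefix, sep, name = voice_id.partition("_")
    valid = (
        bool(sep)
        and bool(name)
        and len(prefix) == 2
        and prefix[0] in ACCENTS
        and prefix[1] in GENDERS
    )
    if not valid:
        raise ValueError(f"not a recognised preset voice ID: {voice_id!r}")
    return ACCENTS[prefix[0]], GENDERS[prefix[1]], name
```

A client could use this to group voices returned by GET /voices in a picker UI.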

Voice Registration

Register a custom voice for F5-TTS voice cloning:

curl -X POST http://localhost:3022/voices \
  -F "voice_id=my_voice" \
  -F "name=My Custom Voice" \
  -F "description=A sample voice for testing" \
  -F "transcript=Hello, this is the text spoken in the reference audio." \
  -F "reference_audio=@my_reference.wav"

Pre-defined voices can also be placed in the voices/ directory:

voices/
└── my_voice/
    ├── reference.wav       # Reference audio (required)
    ├── transcript.txt      # Transcript of reference (required)
    └── metadata.json       # Name and description (optional)
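
A loader for this layout might look like the sketch below. It treats the directory name as the voice ID and assumes metadata.json holds "name" and "description" keys; both points are assumptions, since the README specifies only the file names.

```python
import json
from pathlib import Path


def load_voice_dirs(root):
    """Scan `root` for voice folders and return {voice_id: info}.

    Folders missing reference.wav or transcript.txt are skipped,
    because both files are marked as required.
    """
    voices = {}
    for entry in sorted(Path(root).iterdir()):
        if not entry.is_dir():
            continue
        audio = entry / "reference.wav"
        transcript = entry / "transcript.txt"
        if not (audio.is_file() and transcript.is_file()):
            continue
        meta_path = entry / "metadata.json"
        meta = {}
        if meta_path.is_file():
            meta = json.loads(meta_path.read_text(encoding="utf-8"))
        voices[entry.name] = {
            "reference_audio": audio,
            "transcript": transcript.read_text(encoding="utf-8").strip(),
            "name": meta.get("name", entry.name),  # fall back to the folder name
            "description": meta.get("description", ""),
        }
    return voices
```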

Configuration

Variable           Default                          Description
PORT               3022                             API port
PRELOAD_MODELS     false                            Load models on startup
MAX_TEXT_LENGTH    1000                             Max characters per request
CORS_ORIGINS       https://mana.how,...             Allowed CORS origins
F5_MODEL           lucasnewman/f5-tts-mlx           F5-TTS model
KOKORO_MODEL       mlx-community/Kokoro-82M-bf16    Kokoro model
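
One way to read these variables is shown below. The defaults are the ones in the table; the Settings class, the accepted truthy spellings, and the comma-splitting of CORS_ORIGINS are assumptions, not the service's actual config code. The table truncates the CORS default ("https://mana.how,..."), so no default list is reproduced here.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    port: int = 3022
    preload_models: bool = False
    max_text_length: int = 1000
    cors_origins: tuple = ()  # default list is elided in the README
    f5_model: str = "lucasnewman/f5-tts-mlx"
    kokoro_model: str = "mlx-community/Kokoro-82M-bf16"


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    truthy = {"1", "true", "yes"}  # assumed spellings
    origins = tuple(
        o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()
    )
    return Settings(
        port=int(env.get("PORT", 3022)),
        preload_models=env.get("PRELOAD_MODELS", "false").strip().lower() in truthy,
        max_text_length=int(env.get("MAX_TEXT_LENGTH", 1000)),
        cors_origins=origins,
        f5_model=env.get("F5_MODEL", Settings.f5_model),
        kokoro_model=env.get("KOKORO_MODEL", Settings.kokoro_model),
    )
```

Passing a plain dict instead of os.environ keeps the function easy to exercise in tests.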

Mac Mini Deployment

# Install and start as launchd service
../../scripts/mac-mini/setup-tts.sh

# Service management
launchctl list | grep com.mana.tts
launchctl unload ~/Library/LaunchAgents/com.mana.tts.plist
launchctl load ~/Library/LaunchAgents/com.mana.tts.plist

# View logs
tail -f /tmp/mana-tts.log

Requirements

  • Python 3.10+
  • macOS with Apple Silicon (recommended)
  • ~7 GB disk space for models
  • 16 GB RAM recommended
  • ffmpeg (for MP3 output)

Troubleshooting

Models Not Loading

# Check MLX installation
python -c "import mlx.core as mx; print(mx.__version__)"

# Check mlx-audio
python -c "import mlx_audio; print('OK')"

# Check f5-tts-mlx
python -c "from f5_tts_mlx import F5TTS; print('OK')"
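
The three checks can be combined into one script. The sketch below only tests whether each package can be located, without importing it, so it is quicker than the one-liners but will not catch an installation that is present yet broken. The module names are taken from the commands above; check_modules is an illustrative name.

```python
import importlib.util

REQUIRED = ("mlx", "mlx_audio", "f5_tts_mlx")


def check_modules(names=REQUIRED):
    """Return {module_name: True/False} for whether each module is findable."""
    status = {}
    for name in names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            status[name] = False
    return status


if __name__ == "__main__":
    for module, found in check_modules().items():
        print(f"{module}: {'OK' if found else 'MISSING'}")
```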

MP3 Output Not Working

# Install ffmpeg
brew install ffmpeg

# Verify
ffmpeg -version
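
The same check can be done from Python, for example in a startup script that should warn before the first MP3 request fails. This is a sketch with a hypothetical helper name; it assumes ffmpeg is expected on PATH.

```python
import shutil
import subprocess


def ffmpeg_version(executable="ffmpeg"):
    """Return the first line of `ffmpeg -version`, or None if not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    version = ffmpeg_version()
    print(version if version is not None else "ffmpeg not found on PATH")
```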

Memory Issues

  • Lower MAX_TEXT_LENGTH to reduce memory use per request
  • Keep PRELOAD_MODELS=false so models are loaded lazily rather than at startup
  • Expect roughly 6 GB of memory for F5-TTS and about 500 MB for Kokoro
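
If MAX_TEXT_LENGTH is lowered, clients with longer input need to split it across several requests. The helper below is a client-side sketch, not part of the service: it packs whole sentences into chunks no longer than the limit and hard-splits any single sentence that exceeds it. The sentence-boundary regex is a simple assumption and will not suit every language.

```python
import re


def split_text(text: str, max_len: int = 1000) -> list:
    """Split `text` into chunks of at most `max_len` characters."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single over-long sentence is cut at the limit.
        while len(sentence) > max_len:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_len])
            sentence = sentence[max_len:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_len:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own synthesis request and the resulting audio concatenated on the client.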

API Documentation

While the service is running, interactive API documentation is available at http://localhost:3022/docs.