managarten/services/mana-tts/CLAUDE.md
Till JS f4347032ca chore(mac-mini): remove all AI service infrastructure (moved to Windows GPU)
The Mac Mini hasn't run mana-llm/stt/tts/image-gen for a while — those
services live on the Windows GPU server now. The Mac-targeted
installers, plists, and platform-checking setup scripts have been
sitting in the repo as cargo-cult, suggesting Mac Mini deployment is
still a real option. It isn't.

Removed (Mac-Mini deployment infrastructure):

services/mana-stt/
- com.mana.mana-stt.plist            (LaunchAgent)
- com.mana.vllm-voxtral.plist        (LaunchAgent for the abandoned local Voxtral experiment)
- install-service.sh                 (single-service launchd installer)
- install-services.sh                (mana-stt + vllm-voxtral installer)
- setup.sh                           (Mac arm64 installer)
- scripts/setup-vllm.sh              (vLLM-Voxtral setup)
- scripts/start-vllm-voxtral.sh

services/mana-tts/
- com.mana.mana-tts.plist
- install-service.sh
- setup.sh                           (Mac arm64 installer)

scripts/mac-mini/
- setup-image-gen.sh                 (Mac flux2.c launchd installer)
- setup-stt.sh
- setup-tts.sh
- launchd/com.mana.image-gen.plist
- launchd/com.mana.mana-stt.plist
- launchd/com.mana.mana-tts.plist

setup-tts-bot.sh stays — it's the Matrix TTS bot installer (Synapse
side), not the mana-tts service.

Updated:
- services/mana-stt/CLAUDE.md, README.md — fully rewritten for the
  Windows GPU reality (CUDA WhisperX, Scheduled Task ManaSTT, .env keys
  matching the actual production .env on the box)
- services/mana-tts/CLAUDE.md, README.md — same treatment, documenting
  Kokoro/Piper/F5-TTS on the Windows GPU under Scheduled Task ManaTTS
- scripts/mac-mini/README.md — dropped the STT setup section, replaced
  with a pointer to docs/WINDOWS_GPU_SERVER_SETUP.md and the per-service
  CLAUDE.md files
- docs/MAC_MINI_SERVER.md — expanded the "deactivated launchagents"
  list to mention the now-removed plists, added the full GPU service
  port table with public URLs, added a cleanup snippet for any old plists
  still installed on a Mac Mini somewhere
2026-04-08 13:06:40 +02:00

mana-tts

Text-to-Speech microservice. Wraps Kokoro (English presets), Piper (German, local ONNX), and F5-TTS (voice cloning) behind a small FastAPI surface. Lives on the Windows GPU server (mana-server-gpu, RTX 3090).

⚠️ Earlier history: this directory used to contain MLX-optimized Mac Mini code (f5-tts-mlx, mlx-audio, setup.sh with Apple Silicon checks, com.mana.mana-tts.plist launchd setup). The service has since moved to the Windows GPU box, and the Mac-specific code was removed from the repo. If you need the MLX path, see git history.

Tech Stack

| Layer | Technology |
|---|---|
| Runtime | Python 3.11 + uvicorn (Windows) |
| Framework | FastAPI |
| English (preset) | Kokoro-82M (`kokoro_service.py`) |
| German (local) | Piper ONNX with `kerstin_low.onnx` and `thorsten_medium.onnx` voices (`piper_service.py`) |
| Voice cloning | F5-TTS on CUDA (`f5_service.py`) |
| Audio I/O | soundfile, pydub |
| Auth | Per-key + internal-key API auth (`auth.py`) + JWT via mana-auth (`external_auth.py`) |
| VRAM | Shared `vram_manager.py` (same module as mana-stt + mana-image-gen) |
| Process supervision | Windows Scheduled Task ManaTTS (AtLogOn) |

Port: 3022

Where it runs

| Host | Path on disk | Entrypoint |
|---|---|---|
| Windows GPU server (192.168.178.11) | `C:\mana\services\mana-tts\` | `service.pyw` via Scheduled Task ManaTTS |

Public URL: https://gpu-tts.mana.how.

API Endpoints

| Method | Path | Description |
|---|---|---|
| GET | `/health` | Liveness + which backends are loaded |
| GET | `/models` | Available TTS models |
| GET | `/voices` | List all voices (preset + custom) |
| POST | `/voices` | Register a custom voice (reference audio + transcript) |
| DELETE | `/voices/{voice_id}` | Delete a custom voice |
| POST | `/synthesize/kokoro` | Kokoro synthesis (English presets) |
| POST | `/synthesize` | F5-TTS voice cloning |
| POST | `/synthesize/auto` | Routing helper: picks the right backend for the requested voice |

All non-health endpoints require Authorization: Bearer <token> (per-app key, internal key, or mana-auth JWT).
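
To illustrate the auth requirement above, here is a minimal sketch that builds an authenticated request with Python's standard library. The request schema is not documented in this file, so the JSON body, the field names (`text`, `voice`), and the voice ID `example_voice` are assumptions for illustration only; check `app/main.py` for the real schema.

```python
import json
import urllib.request

BASE_URL = "https://gpu-tts.mana.how"  # public URL from this document


def build_request(path: str, token: str, payload: dict) -> urllib.request.Request:
    """Build a POST request carrying the Bearer token that non-health endpoints require."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical payload: the field names are assumptions, not the documented schema.
req = build_request(
    "/synthesize/kokoro",
    "sk-app1",
    {"text": "Hello", "voice": "example_voice"},
)
# Sending is left to the caller, e.g. urllib.request.urlopen(req, timeout=30).
```

The token can be a per-app key, the internal key, or a mana-auth JWT; the header format is the same in all three cases.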

Voices

Kokoro-82M (English presets)

~300 MB download. 30+ preset English voices. Fast, no reference audio needed.

Piper (German, local ONNX)

~63 MB per voice. 100% local, GDPR-compliant. Available:

  • de_kerstin (female, default)
  • de_thorsten (male)

Falls back to Edge TTS cloud voices if Piper isn't loaded.

F5-TTS (voice cloning)

~6 GB. Requires reference audio + transcript. Higher quality, slower. Custom voices live in voices/ (reference audio + transcript per voice ID).
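
The split between the three backends suggests how a routing helper like `/synthesize/auto` might decide. The sketch below is an assumed rule, not the service's actual code: the function name `pick_backend`, the `Backend` enum, and the "everything else is a Kokoro preset" default are all hypothetical. Only the two Piper voice IDs come from this document.

```python
from enum import Enum


class Backend(Enum):
    KOKORO = "kokoro"
    PIPER = "piper"
    F5 = "f5"


# Piper voice IDs listed in this document.
PIPER_VOICES = {"de_kerstin", "de_thorsten"}


def pick_backend(voice_id: str, custom_voices: set[str]) -> Backend:
    """Choose a backend for a voice ID (illustrative rule, not the service's code)."""
    if voice_id in custom_voices:
        return Backend.F5  # registered custom voices are cloned by F5-TTS
    if voice_id in PIPER_VOICES:
        return Backend.PIPER  # German voices run on local Piper ONNX models
    return Backend.KOKORO  # assumption: anything else is treated as a Kokoro preset
```

For example, `pick_backend("de_thorsten", set())` returns `Backend.PIPER`, while a voice ID registered through `POST /voices` would be routed to F5-TTS. Checking custom voices first means a custom voice can never be shadowed by a preset with the same ID.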

Configuration (.env on the Windows GPU box)

PORT=3022
PRELOAD_MODELS=false
MAX_TEXT_LENGTH=1000
REQUIRE_AUTH=true
API_KEYS=sk-app1:app1,sk-app2:app2
INTERNAL_API_KEY=...
CORS_ORIGINS=https://mana.how,https://chat.mana.how
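
The list-valued settings above can be read with a few small helpers. This is a sketch under assumptions: it treats each `API_KEYS` entry as `key:app-name` (inferred from the `sk-app1:app1` example) and `CORS_ORIGINS` as a comma-separated list. The helper names are hypothetical, and the service's real parsing may differ.

```python
def parse_api_keys(raw: str) -> dict[str, str]:
    """Parse 'key:name,key:name' into {key: name}, skipping empty entries."""
    keys: dict[str, str] = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue
        key, sep, name = entry.partition(":")
        if not sep or not key or not name:
            raise ValueError(f"malformed API_KEYS entry: {entry!r}")
        keys[key.strip()] = name.strip()
    return keys


def parse_bool(raw: str) -> bool:
    """Interpret common truthy spellings; everything else is False."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def parse_origins(raw: str) -> list[str]:
    """Split a comma-separated origin list, dropping blanks."""
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```

With the values shown above, `parse_api_keys("sk-app1:app1,sk-app2:app2")` yields `{"sk-app1": "app1", "sk-app2": "app2"}`. Raising on malformed entries makes a typo in `.env` fail at startup instead of silently dropping a key.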

Code layout

services/mana-tts/
├── app/
│   ├── __init__.py
│   ├── main.py             # FastAPI endpoints
│   ├── kokoro_service.py   # Kokoro (English presets)
│   ├── piper_service.py    # Piper (German, local ONNX)
│   ├── f5_service.py       # F5-TTS (voice cloning, CUDA)
│   ├── voice_manager.py    # Custom voice registry
│   ├── audio_utils.py      # Format conversion, resampling
│   ├── auth.py             # API-key auth
│   ├── external_auth.py    # JWT validation via mana-auth
│   └── vram_manager.py     # Shared VRAM accountant
└── service.pyw             # Windows runner (used by ManaTTS scheduled task)

The Piper voice ONNX files live alongside the service on the GPU box (C:\mana\services\mana-tts\piper_voices\*.onnx) — too big to commit, downloaded once during setup.

Operations

# Status
Get-ScheduledTask -TaskName "ManaTTS" | Format-List TaskName, State
Get-NetTCPConnection -LocalPort 3022 -State Listen

# Restart
Stop-ScheduledTask -TaskName "ManaTTS"
Start-ScheduledTask -TaskName "ManaTTS"

# Logs
Get-Content C:\mana\services\mana-tts\service.log -Tail 50
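
From another machine, liveness can also be checked over HTTP via `GET /health`, which needs no token. The following is a minimal sketch using only the standard library. It checks the status code only, because the shape of the `/health` response body is not documented here; the function name `is_healthy` is hypothetical.

```python
import urllib.request


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError subclass OSError, so this covers refused
        # connections, timeouts, and non-2xx responses.
        return False
```

For example, `is_healthy("https://gpu-tts.mana.how")` checks the public URL. A `False` result only means the endpoint did not answer with 200; use the scheduled-task and log commands above to find out why.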

Reference

  • docs/WINDOWS_GPU_SERVER_SETUP.md — Windows box setup, scheduled tasks, firewall, Cloudflare tunnel
  • docs/PORT_SCHEMA.md — port assignments across services