Till JS 8823cc0bf0 feat(profile): voice interview with pre-rendered TTS audio + Orpheus/Zonos backends
Voice-based interview for the profile module — users choose between text,
voice (question read aloud + mic for answer), or conversation mode (fully
automatic flow with auto-save).

Interview audio:
- 92 pre-rendered MP3 files (23 questions × 4 voices) via Edge TTS
- Voices: Seraphina (DE-f), Florian (DE-m), Leni (CH-f), Jan (CH-m)
- User picks voice via dropdown, persisted in localStorage
- Web Speech API fallback for missing audio files

Profile UI:
- Interview hero block on overview with 3 start modes (text/voice/conversation)
- Voice/conversation toggle + voice picker in interview view
- Mic button on text/textarea/tags inputs for per-question voice input
- Conversation mode: auto-save + auto-advance after STT transcription
- Recording/transcribing/speaking state indicators

mana-tts service:
- New Orpheus TTS backend (German finetune, SNAC codec)
- New Zonos TTS backend (Zyphra, 200k hours, emotion control)
- Endpoints: POST /synthesize/orpheus, POST /synthesize/zonos
- espeak-ng installed on GPU server for Zonos phonemizer
- Compare script for side-by-side voice quality testing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 15:22:52 +02:00

Mana TTS

Text-to-Speech microservice running on the Windows GPU server (mana-server-gpu, RTX 3090). Wraps Kokoro (English presets), Piper (German, local ONNX), and F5-TTS (CUDA voice cloning).

For architecture, deployment, configuration, and operations, see CLAUDE.md and docs/WINDOWS_GPU_SERVER_SETUP.md.

Port: 3022

Public URL

https://gpu-tts.mana.how (via Cloudflare Tunnel + Mac Mini gpu-proxy)

API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check + which backends are loaded |
| `/models` | GET | List available models |
| `/voices` | GET | List preset + custom voices |
| `/voices` | POST | Register a custom voice (reference audio + transcript) |
| `/voices/{id}` | DELETE | Delete a custom voice |
| `/synthesize/kokoro` | POST | Kokoro (English presets) |
| `/synthesize` | POST | F5-TTS voice cloning |
| `/synthesize/auto` | POST | Auto-select best backend for the requested voice |

All endpoints except `/health` require an `Authorization: Bearer <token>` header.
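
The endpoint table and the auth rule can be wrapped in a couple of shell helpers, so callers do not repeat the base URL and token on every request. This is a minimal sketch under stated assumptions: `TTS_BASE_URL`, `tts_build_args` and `tts_call` are hypothetical names introduced here, the token is read from `INTERNAL_API_KEY` as in the Quick Test, and the `http://localhost:3022` override assumes the service is reachable locally on its documented port when called from the GPU server itself.

```bash
#!/usr/bin/env bash
# Hypothetical client helpers for mana-tts (a sketch, not part of the service).
# Default is the public URL; on the GPU host you might set TTS_BASE_URL=http://localhost:3022
# (assumption: the service listens locally on port 3022).
TTS_BASE_URL="${TTS_BASE_URL:-https://gpu-tts.mana.how}"

# Fill the global array CURL_ARGS for METHOD and PATH.
# /health is sent without credentials; every other path gets the bearer token.
# If INTERNAL_API_KEY is unset, the ${...:?} expansion aborts the script with an error.
tts_build_args() {
  local method="$1" path="$2"
  CURL_ARGS=(-sS --fail -X "$method" "${TTS_BASE_URL%/}${path}")
  if [ "$path" != "/health" ]; then
    CURL_ARGS+=(-H "Authorization: Bearer ${INTERNAL_API_KEY:?INTERNAL_API_KEY is not set}")
  fi
}

# tts_call METHOD PATH [extra curl args...]: build the arguments and run curl.
tts_call() {
  tts_build_args "$1" "$2"
  shift 2
  curl "${CURL_ARGS[@]}" "$@"
}

# Example calls (need network access and, except for /health, a valid token):
#   tts_call GET /health
#   tts_call GET /voices
#   tts_call DELETE /voices/my-voice   # "my-voice" is a placeholder id
```

Keeping argument construction in a separate function makes the auth rule easy to check without touching the network; request bodies for the POST endpoints can be passed through as extra curl arguments.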

Quick Test

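```bash
# --fail makes curl exit non-zero on HTTP errors instead of saving the error body as test.wav
curl -sS --fail -X POST https://gpu-tts.mana.how/synthesize/kokoro \
  -H "Authorization: Bearer $INTERNAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text":"Hello world","voice":"af_heart"}' \
  --output test.wav
```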
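
The Quick Test saves whatever the server returns to a file, so it can help to confirm the result is actually WAV audio. The sketch below does that by checking the `RIFF`/`WAVE` header bytes, and applies the same pattern to `/synthesize/auto`. Assumptions not confirmed by this README: `/synthesize/auto` accepts the same `{"text", "voice"}` JSON body as `/synthesize/kokoro` and answers with WAV audio; `is_wav` and `tts_auto` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Succeeds if FILE is non-empty and starts with a RIFF....WAVE header
# (bytes 1-4 are "RIFF", bytes 9-12 are "WAVE").
is_wav() {
  local file="$1"
  [ -s "$file" ] || return 1
  [ "$(head -c 4 "$file")" = "RIFF" ] &&
    [ "$(tail -c +9 "$file" | head -c 4)" = "WAVE" ]
}

# tts_auto TEXT VOICE OUT: synthesize TEXT with VOICE via /synthesize/auto into OUT.
# Assumption: same request body and WAV response as /synthesize/kokoro; check CLAUDE.md.
# TEXT and VOICE are pasted into the JSON body without escaping,
# so they must not contain double quotes or backslashes.
tts_auto() {
  local text="$1" voice="$2" out="$3"
  curl -sS --fail -X POST "${TTS_BASE_URL:-https://gpu-tts.mana.how}/synthesize/auto" \
    -H "Authorization: Bearer ${INTERNAL_API_KEY:?INTERNAL_API_KEY is not set}" \
    -H "Content-Type: application/json" \
    -d "{\"text\":\"$text\",\"voice\":\"$voice\"}" \
    --output "$out" || return 1
  if ! is_wav "$out"; then
    echo "tts_auto: $out does not look like WAV audio" >&2
    return 1
  fi
}
```

After the Quick Test, `is_wav test.wav && echo ok` checks the saved file; `tts_auto "Hello world" af_heart hello.wav` exercises the auto-selection endpoint under the assumptions above.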