mirror of
https://github.com/Memo-2023/mana-monorepo.git
synced 2026-05-14 20:21:09 +02:00
Voice-based interview for the profile module: users choose between text, voice (question read aloud + mic for answer), or conversation mode (fully automatic flow with auto-save).

Interview audio:
- 92 pre-rendered MP3 files (23 questions × 4 voices) via Edge TTS
- Voices: Seraphina (DE-f), Florian (DE-m), Leni (CH-f), Jan (CH-m)
- User picks voice via dropdown, persisted in localStorage
- Web Speech API fallback for missing audio files

Profile UI:
- Interview hero block on overview with 3 start modes (text/voice/conversation)
- Voice/conversation toggle + voice picker in interview view
- Mic button on text/textarea/tags inputs for per-question voice input
- Conversation mode: auto-save + auto-advance after STT transcription
- Recording/transcribing/speaking state indicators

mana-tts service:
- New Orpheus TTS backend (German finetune, SNAC codec)
- New Zonos TTS backend (Zyphra, 200k hours, emotion control)
- Endpoints: POST /synthesize/orpheus, POST /synthesize/zonos
- espeak-ng installed on GPU server for Zonos phonemizer
- Compare script for side-by-side voice quality testing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mana TTS
Text-to-Speech microservice running on the Windows GPU server (mana-server-gpu, RTX 3090). Wraps Kokoro (English presets), Piper (German, local ONNX), and F5-TTS (CUDA voice cloning).
For architecture, deployment, configuration, and operations see CLAUDE.md and docs/WINDOWS_GPU_SERVER_SETUP.md.
Port: 3022
Public URL
https://gpu-tts.mana.how (via Cloudflare Tunnel + Mac Mini gpu-proxy)
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check + which backends are loaded |
| `/models` | GET | List available models |
| `/voices` | GET | List preset + custom voices |
| `/voices` | POST | Register a custom voice (reference audio + transcript) |
| `/voices/{id}` | DELETE | Delete a custom voice |
| `/synthesize/kokoro` | POST | Kokoro (English presets) |
| `/synthesize` | POST | F5-TTS voice cloning |
| `/synthesize/auto` | POST | Auto-select best backend for the requested voice |
All endpoints except `/health` require an `Authorization: Bearer <token>` header.
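The authentication rule can be sketched in Python as a small request builder. This is a minimal illustration, not the service's own client: `build_request` is a hypothetical helper, and the only facts taken from this README are the public URL, the endpoint paths, and the bearer-token requirement on every endpoint other than `/health`.

```python
import urllib.request

BASE_URL = "https://gpu-tts.mana.how"  # public URL from this README


def build_request(path, method="GET", token=None, base_url=BASE_URL):
    """Build (but do not send) a request for one of the endpoints above.

    Hypothetical helper: attaches the bearer token to every endpoint
    except /health, which the README lists as unauthenticated.
    """
    headers = {}
    if path != "/health":
        if not token:
            raise ValueError(f"{path} requires a bearer token")
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(base_url + path, headers=headers, method=method)
```

Passing the returned object to `urllib.request.urlopen` would send it; building the request separately keeps the header logic easy to inspect without network access.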
Quick Test
    curl -X POST https://gpu-tts.mana.how/synthesize/kokoro \
      -H "Authorization: Bearer $INTERNAL_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"text":"Hello world","voice":"af_heart"}' \
      --output test.wav
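For callers that prefer Python over curl, the same request might look like the sketch below, using only the standard library. The function names are hypothetical, and the code assumes the response body is the raw audio (as the `--output test.wav` flag in the curl command suggests); the README does not document the response format.

```python
import json
import urllib.request


def build_kokoro_request(text, voice, token, base_url="https://gpu-tts.mana.how"):
    """Build the same POST request as the curl command (hypothetical helper)."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/synthesize/kokoro",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text, voice, out_path, token):
    """Send the request and write the response body to out_path.

    Assumption: the body is the audio itself, not JSON wrapping it.
    """
    req = build_kokoro_request(text, voice, token)
    with urllib.request.urlopen(req, timeout=60) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

Calling `synthesize_to_file("Hello world", "af_heart", "test.wav", os.environ["INTERNAL_API_KEY"])` (after `import os`) would mirror the curl invocation.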