mirror of
https://github.com/Memo-2023/mana-monorepo.git
synced 2026-05-20 03:41:25 +02:00
The Windows GPU server has been the actual production home for these
services for some time, and the running code there has drifted ahead of
the repo. This sync pulls the live versions back into the repo so the
Windows box is no longer the only place those changes exist.
Pulled from C:\mana\services\* on mana-server-gpu (192.168.178.11):
mana-llm:
- src/main.py, src/config.py — small fixes (auth wiring, config tweaks)
- src/api_auth.py — NEW (cross-service GPU_API_KEY validator)
- service.pyw — Windows runner used by the ManaLLM scheduled task
(sets up logging redirect, loads .env, calls uvicorn)
mana-stt:
- app/main.py — substantial cleanup (684→392 lines), drops the
whisperx-as-separate-backend branching now that whisper_service.py
rolls whisperx in directly
- app/whisper_service.py — full CUDA + whisperx rewrite (158→358 lines)
- app/auth.py + external_auth.py — significantly expanded auth
- app/vram_manager.py — NEW (shared VRAM accounting helper)
- service.pyw — Windows runner with CUDA pre-init, FFmpeg PATH
injection, .env loading
- removed: app/whisper_service_cuda.py (folded into whisper_service.py)
- removed: app/whisperx_service.py (folded into whisper_service.py)
mana-tts:
- app/auth.py, external_auth.py — same auth expansion as stt
- app/f5_service.py, kokoro_service.py — Windows tweaks
- app/vram_manager.py — NEW (same shared helper as stt)
- service.pyw — Windows runner
mana-video-gen:
- service.pyw — Windows runner (no other changes; the .py code on the
GPU box is byte-identical to what's already in the repo)
The service.pyw files contain absolute Windows paths
(C:\mana\services\<svc>) and a hardcoded FFmpeg PATH for the tills user
profile. Kept as-is intentionally — they exist to be deployed to that
one machine and any abstraction layer would just hide what's actually
happening. Anyone redeploying to a different layout will need to edit
the path strings, which is a known and obvious change.
Mac-Mini infrastructure for these services (launchd plists, install
scripts, scripts/mac-mini/setup-{stt,tts}.sh, the Mac-flux2c image-gen
implementation) is still on disk and will be removed in a follow-up
commit, along with replacing mana-image-gen with the Windows
diffusers+CUDA implementation. This commit is just the live-code sync.
- app
- voices
- .env.example
- CLAUDE.md
- com.mana.mana-tts.plist
- install-service.sh
- README.md
- requirements.txt
- service.pyw
- setup.sh
Mana TTS
Text-to-Speech microservice with voice cloning support, optimized for Apple Silicon.
Features
- Kokoro TTS: Fast preset voices (~300 MB model)
- F5-TTS: Voice cloning with reference audio (~6 GB model)
- MLX Optimized: Runs efficiently on Apple Silicon
- REST API: FastAPI with OpenAPI documentation
Quick Start
Setup
# Run setup script
./setup.sh
# Or manually
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Start Service
source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3022
Test
# Health check
curl http://localhost:3022/health
# Synthesize with Kokoro
curl -X POST http://localhost:3022/synthesize/kokoro \
-H "Content-Type: application/json" \
-d '{"text": "Hello world", "voice": "af_heart"}' \
--output test.wav
# Play audio (macOS)
afplay test.wav
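The same Kokoro request can be issued from Python. The following is a minimal sketch using only the standard library; it assumes the service listens on localhost:3022 as in the commands above and that the response body is the raw audio. The helper names `build_kokoro_request` and `synthesize_to_file` are hypothetical and not part of the service.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3022"  # host and port from the Quick Start commands


def build_kokoro_request(text, voice="af_heart", base_url=BASE_URL):
    """Build (but do not send) a POST request for /synthesize/kokoro."""
    payload = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/synthesize/kokoro",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, path, voice="af_heart"):
    """Send the request and write the returned bytes to `path`.

    Assumption: the response body is the audio file itself, as implied by
    the `--output test.wav` flag in the curl example.
    """
    with urllib.request.urlopen(build_kokoro_request(text, voice)) as resp:
        audio = resp.read()
    with open(path, "wb") as fh:
        fh.write(audio)
    return len(audio)
```

With the service running, `synthesize_to_file("Hello world", "test.wav")` should produce the same file as the curl command.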
API Endpoints
Health & Info
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /models | GET | Available models |
| /voices | GET | All available voices |
Synthesis
| Endpoint | Method | Description |
|---|---|---|
| /synthesize/kokoro | POST | Kokoro preset voices |
| /synthesize | POST | F5-TTS voice cloning |
| /synthesize/auto | POST | Auto-select model |
Voice Management
| Endpoint | Method | Description |
|---|---|---|
| /voices | POST | Register custom voice |
| /voices/{id} | DELETE | Delete custom voice |
Synthesis Examples
Kokoro (Fast Preset Voices)
curl -X POST http://localhost:3022/synthesize/kokoro \
-H "Content-Type: application/json" \
-d '{
"text": "Welcome to Mana TTS, your personal voice synthesis service.",
"voice": "af_heart",
"speed": 1.0,
"output_format": "wav"
}' \
--output output.wav
F5-TTS (Voice Cloning)
# With reference audio upload
curl -X POST http://localhost:3022/synthesize \
-F "text=Hello, this is a cloned voice speaking." \
-F "reference_audio=@reference.wav" \
-F "reference_text=This is what the reference audio says." \
-F "output_format=wav" \
--output cloned.wav
# With registered voice
curl -X POST http://localhost:3022/synthesize \
-F "text=Hello from my registered voice." \
-F "voice_id=my_custom_voice" \
--output output.wav
Auto-Select
# Uses Kokoro for preset voices, F5-TTS for custom
curl -X POST http://localhost:3022/synthesize/auto \
-H "Content-Type: application/json" \
-d '{"text": "Auto-selected synthesis", "voice": "af_bella"}' \
--output output.wav
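The routing rule in the comment above (Kokoro for preset voices, F5-TTS for custom ones) can be pictured as a simple membership test. This is only an illustrative sketch: the actual logic in the service is not shown in this README, and both the function name `select_backend` and the preset set below are assumptions.

```python
# Hypothetical subset of Kokoro preset IDs, taken from the voice list in this README.
KOKORO_PRESETS = {"af_heart", "af_bella", "am_adam", "bf_emma", "bm_george"}


def select_backend(voice, presets=KOKORO_PRESETS):
    """Return "kokoro" for a known preset voice, otherwise "f5".

    Assumption: any voice ID that is not a preset refers to a registered
    custom voice and is therefore handled by F5-TTS.
    """
    return "kokoro" if voice in presets else "f5"
```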
Available Kokoro Voices
American Female
- af_heart - Warm, emotional (default)
- af_alloy - Neutral, professional
- af_bella - Friendly, approachable
- af_jessica - Confident, clear
- af_nicole - Bright, energetic
- af_nova - Modern, dynamic
- af_sarah - Warm, conversational
- ... and more
American Male
- am_adam - Deep, authoritative
- am_echo - Resonant, clear
- am_eric - Professional, neutral
- am_michael - Warm, trustworthy
- ... and more
British Female
- bf_alice - Refined, elegant
- bf_emma - Clear, professional
- bf_lily - Soft, gentle
British Male
- bm_daniel - Classic, authoritative
- bm_fable - Storyteller, expressive
- bm_george - Traditional, clear
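The voice IDs above follow a two-letter prefix pattern: the first letter matches the accent group (a for American, b for British) and the second the gender group (f or m). A small sketch of decoding that prefix is shown below. It is inferred from the grouping in this README only; `describe_voice` is a hypothetical helper, and prefixes beyond the four listed here are not covered.

```python
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "Female", "m": "Male"}


def describe_voice(voice_id):
    """Split an ID such as "af_heart" into (accent, gender, name)."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"not a preset-style voice ID: {voice_id!r}")
    try:
        return ACCENTS[prefix[0]], GENDERS[prefix[1]], name
    except KeyError:
        # Prefix letters outside the groups listed in this README.
        raise ValueError(f"unknown voice prefix: {prefix!r}") from None
```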
Voice Registration
Register a custom voice for F5-TTS voice cloning:
curl -X POST http://localhost:3022/voices \
-F "voice_id=my_voice" \
-F "name=My Custom Voice" \
-F "description=A sample voice for testing" \
-F "transcript=Hello, this is the text spoken in the reference audio." \
-F "reference_audio=@my_reference.wav"
Pre-defined voices can also be placed in the voices/ directory:
voices/
└── my_voice/
├── reference.wav # Reference audio (required)
├── transcript.txt # Transcript of reference (required)
└── metadata.json # Name and description (optional)
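Before dropping a folder into voices/, it can help to check that it matches the layout above. The sketch below validates one voice directory with the standard library. The function name `load_voice_dir` and the returned dictionary shape are assumptions for illustration, not the service's own loader.

```python
import json
from pathlib import Path

REQUIRED_FILES = ("reference.wav", "transcript.txt")  # marked "required" above


def load_voice_dir(voice_dir):
    """Validate a voice directory and return its contents as a dict."""
    voice_dir = Path(voice_dir)
    missing = [n for n in REQUIRED_FILES if not (voice_dir / n).is_file()]
    if missing:
        raise FileNotFoundError(f"{voice_dir.name}: missing {', '.join(missing)}")

    # metadata.json is optional; fall back to an empty dict when absent.
    meta_path = voice_dir / "metadata.json"
    metadata = {}
    if meta_path.is_file():
        metadata = json.loads(meta_path.read_text(encoding="utf-8"))

    return {
        "voice_id": voice_dir.name,  # assumption: the folder name is the voice ID
        "reference_audio": voice_dir / "reference.wav",
        "transcript": (voice_dir / "transcript.txt").read_text(encoding="utf-8").strip(),
        "metadata": metadata,
    }
```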
Configuration
| Variable | Default | Description |
|---|---|---|
| PORT | 3022 | API port |
| PRELOAD_MODELS | false | Load models on startup |
| MAX_TEXT_LENGTH | 1000 | Max characters per request |
| CORS_ORIGINS | https://mana.how,... | Allowed CORS origins |
| F5_MODEL | lucasnewman/f5-tts-mlx | F5-TTS model |
| KOKORO_MODEL | mlx-community/Kokoro-82M-bf16 | Kokoro model |
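As an illustration of how these variables could be read, here is a minimal sketch that loads them from the environment with the defaults from the table. The `Settings` class and `load_settings` function are hypothetical, and the accepted truthy strings for PRELOAD_MODELS are an assumption. Because the CORS_ORIGINS default is truncated in the table, the sketch uses an empty list rather than guessing the full value.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    port: int
    preload_models: bool
    max_text_length: int
    cors_origins: list
    f5_model: str
    kokoro_model: str


def load_settings(env=None):
    """Read settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    origins = env.get("CORS_ORIGINS", "")  # real default is elided in this README
    preload = env.get("PRELOAD_MODELS", "false").strip().lower()
    return Settings(
        port=int(env.get("PORT", "3022")),
        preload_models=preload in ("1", "true", "yes"),  # assumed truthy values
        max_text_length=int(env.get("MAX_TEXT_LENGTH", "1000")),
        cors_origins=[o.strip() for o in origins.split(",") if o.strip()],
        f5_model=env.get("F5_MODEL", "lucasnewman/f5-tts-mlx"),
        kokoro_model=env.get("KOKORO_MODEL", "mlx-community/Kokoro-82M-bf16"),
    )
```

Passing a plain dict instead of the real environment makes the defaults easy to check in isolation.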
Mac Mini Deployment
# Install and start as launchd service
../../scripts/mac-mini/setup-tts.sh
# Service management
launchctl list | grep com.mana.tts
launchctl unload ~/Library/LaunchAgents/com.mana.tts.plist
launchctl load ~/Library/LaunchAgents/com.mana.tts.plist
# View logs
tail -f /tmp/mana-tts.log
Requirements
- Python 3.10+
- macOS with Apple Silicon (recommended)
- ~7 GB disk space for models
- 16 GB RAM recommended
- ffmpeg (for MP3 output)
Troubleshooting
Models Not Loading
# Check MLX installation
python -c "import mlx; print(mlx.__version__)"
# Check mlx-audio
python -c "import mlx_audio; print('OK')"
# Check f5-tts-mlx
python -c "from f5_tts_mlx import F5TTS; print('OK')"
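The three checks above can also be run in one pass without importing the packages, which avoids loading heavy native libraries just to see whether they are installed. A minimal sketch follows, with `check_modules` as a hypothetical helper name. It only reports whether each top-level module can be found, not whether it imports cleanly.

```python
import importlib.util


def check_modules(names=("mlx", "mlx_audio", "f5_tts_mlx")):
    """Return {module_name: True/False} depending on whether it is installed."""
    return {name: importlib.util.find_spec(name) is not None for name in names}
```

Calling `check_modules()` inside the activated virtual environment shows which of the three packages is missing.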
MP3 Output Not Working
# Install ffmpeg
brew install ffmpeg
# Verify
ffmpeg -version
Memory Issues
- Reduce MAX_TEXT_LENGTH for less memory usage
- Set PRELOAD_MODELS=false for lazy loading
- F5-TTS requires ~6 GB, Kokoro ~500 MB
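When MAX_TEXT_LENGTH is lowered, longer inputs have to be split on the client side before they are sent. The sketch below shows one way to do that, preferring sentence boundaries. It assumes the limit is counted in characters, as the configuration table states; `split_text` is a hypothetical helper, not part of the service.

```python
import re


def split_text(text, max_length=1000):
    """Split text into chunks of at most max_length characters."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is itself longer than the limit.
        while len(sentence) > max_length:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_length])
            sentence = sentence[max_length:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_length:
            current = candidate
        else:
            # Adding this sentence would exceed the limit: start a new chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized in its own request and the resulting audio files concatenated afterwards.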
API Documentation
When running, visit http://localhost:3022/docs for interactive API documentation.