mirror of
https://github.com/Memo-2023/mana-monorepo.git
synced 2026-05-14 23:01:09 +02:00
Create packages/shared-python/manacore_auth/ with:
- auth.py: API key validation, rate limiting, local + external auth
- external_auth.py: mana-core-auth remote validation with caching
- create_auth_dependency(scope): factory for per-service auth deps
Migrated services:
- mana-stt: auth.py now wraps shared auth with scope="stt" (272→42 LOC)
- mana-tts: auth.py now wraps shared auth with scope="tts" (272→42 LOC)
The only difference between services was the scope parameter ("stt" vs "tts").
Both external_auth.py files were 100% identical and are now thin re-exports.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mana TTS
Text-to-Speech microservice with voice cloning support, optimized for Apple Silicon.
Features
- Kokoro TTS: Fast preset voices (~300 MB model)
- F5-TTS: Voice cloning with reference audio (~6 GB model)
- MLX Optimized: Runs efficiently on Apple Silicon
- REST API: FastAPI with OpenAPI documentation
Quick Start
Setup
# Run setup script
./setup.sh
# Or manually
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Start Service
source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3022
Test
# Health check
curl http://localhost:3022/health
# Synthesize with Kokoro
curl -X POST http://localhost:3022/synthesize/kokoro \
-H "Content-Type: application/json" \
-d '{"text": "Hello world", "voice": "af_heart"}' \
--output test.wav
# Play audio (macOS)
afplay test.wav
API Endpoints
Health & Info
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /models | GET | Available models |
| /voices | GET | All available voices |
Synthesis
| Endpoint | Method | Description |
|---|---|---|
| /synthesize/kokoro | POST | Kokoro preset voices |
| /synthesize | POST | F5-TTS voice cloning |
| /synthesize/auto | POST | Auto-select model |
Voice Management
| Endpoint | Method | Description |
|---|---|---|
| /voices | POST | Register custom voice |
| /voices/{id} | DELETE | Delete custom voice |
Synthesis Examples
Kokoro (Fast Preset Voices)
curl -X POST http://localhost:3022/synthesize/kokoro \
-H "Content-Type: application/json" \
-d '{
"text": "Welcome to Mana TTS, your personal voice synthesis service.",
"voice": "af_heart",
"speed": 1.0,
"output_format": "wav"
}' \
--output output.wav
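The same Kokoro call can be made from Python. The following is a minimal sketch using only the standard library; it assumes the service is reachable at http://localhost:3022 and that the JSON fields match the curl example above. The helper names `build_kokoro_request` and `synthesize_to_file` are hypothetical and not part of the service.

```python
import json
import urllib.request

# Assumption: the service runs locally on the default port from this README.
BASE_URL = "http://localhost:3022"


def build_kokoro_request(text, voice="af_heart", speed=1.0,
                         output_format="wav", base_url=BASE_URL):
    """Build (but do not send) a POST request for /synthesize/kokoro."""
    payload = {
        "text": text,
        "voice": voice,
        "speed": speed,
        "output_format": output_format,
    }
    return urllib.request.Request(
        url=f"{base_url}/synthesize/kokoro",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request, path, timeout=60):
    """Send the request and write the returned audio bytes to `path`."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

With the service running, `synthesize_to_file(build_kokoro_request("Hello world"), "output.wav")` should write the audio to output.wav, mirroring the curl command.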
F5-TTS (Voice Cloning)
# With reference audio upload
curl -X POST http://localhost:3022/synthesize \
-F "text=Hello, this is a cloned voice speaking." \
-F "reference_audio=@reference.wav" \
-F "reference_text=This is what the reference audio says." \
-F "output_format=wav" \
--output cloned.wav
# With registered voice
curl -X POST http://localhost:3022/synthesize \
-F "text=Hello from my registered voice." \
-F "voice_id=my_custom_voice" \
--output output.wav
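The `-F` flags in the curl examples send multipart/form-data. As a rough sketch of the same voice cloning request in Python without third-party packages, the body can be assembled by hand. The form field names are taken from the curl example; the helpers `encode_multipart` and `build_clone_request` are hypothetical, and the `audio/wav` content type is an assumption.

```python
import uuid
import urllib.request


def encode_multipart(fields, files):
    """Encode form fields and files as multipart/form-data.

    fields: dict of field name -> str value
    files:  dict of field name -> (filename, content bytes, content type)
    Returns (body bytes, Content-Type header value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    for name, (filename, content, content_type) in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_clone_request(text, reference_audio, reference_text,
                        base_url="http://localhost:3022"):
    """Build (but do not send) a POST /synthesize request with reference audio."""
    fields = {
        "text": text,
        "reference_text": reference_text,
        "output_format": "wav",
    }
    # Assumption: the reference clip is WAV, as in the curl example.
    files = {"reference_audio": ("reference.wav", reference_audio, "audio/wav")}
    body, content_type = encode_multipart(fields, files)
    return urllib.request.Request(
        url=f"{base_url}/synthesize",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

The request can be sent with `urllib.request.urlopen(request)` once the service is running; the response body is the synthesized audio.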
Auto-Select
# Uses Kokoro for preset voices, F5-TTS for custom
curl -X POST http://localhost:3022/synthesize/auto \
-H "Content-Type: application/json" \
-d '{"text": "Auto-selected synthesis", "voice": "af_bella"}' \
--output output.wav
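The README does not document how /synthesize/auto decides which model to use. The sketch below only illustrates the stated idea (Kokoro for preset voices, F5-TTS for custom ones) as a client-side helper. The function `pick_endpoint` is hypothetical, and the preset set contains only the voices named in this README, so it is incomplete.

```python
# Voices named in this README; the full preset list is longer ("... and more").
KOKORO_PRESETS = {
    "af_heart", "af_alloy", "af_bella", "af_jessica", "af_nicole",
    "af_nova", "af_sarah",
    "am_adam", "am_echo", "am_eric", "am_michael",
    "bf_alice", "bf_emma", "bf_lily",
    "bm_daniel", "bm_fable", "bm_george",
}


def pick_endpoint(voice, registered_voice_ids=()):
    """Return the endpoint a client would call for the given voice.

    Assumption: registered custom voices take priority over presets.
    This is an illustration, not the service's actual selection logic.
    """
    if voice in registered_voice_ids:
        return "/synthesize"          # F5-TTS voice cloning
    if voice in KOKORO_PRESETS:
        return "/synthesize/kokoro"   # Kokoro preset voice
    raise ValueError(f"unknown voice: {voice!r}")
```

For example, `pick_endpoint("af_bella")` selects the Kokoro endpoint, while `pick_endpoint("my_custom_voice", {"my_custom_voice"})` selects the F5-TTS endpoint.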
Available Kokoro Voices
American Female
- af_heart - Warm, emotional (default)
- af_alloy - Neutral, professional
- af_bella - Friendly, approachable
- af_jessica - Confident, clear
- af_nicole - Bright, energetic
- af_nova - Modern, dynamic
- af_sarah - Warm, conversational
- ... and more
American Male
- am_adam - Deep, authoritative
- am_echo - Resonant, clear
- am_eric - Professional, neutral
- am_michael - Warm, trustworthy
- ... and more
British Female
- bf_alice - Refined, elegant
- bf_emma - Clear, professional
- bf_lily - Soft, gentle
British Male
- bm_daniel - Classic, authoritative
- bm_fable - Storyteller, expressive
- bm_george - Traditional, clear
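The voice IDs above follow a visible pattern: the first letter of the prefix marks the accent (a = American, b = British) and the second the gender (f = female, m = male). A small sketch that decodes this pattern is shown below. The helper `describe_voice` is hypothetical, and the mapping is inferred only from the four groups listed here, so other prefixes may exist.

```python
# Mapping inferred from the voice groups listed in this README.
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "Female", "m": "Male"}


def describe_voice(voice_id):
    """Split a Kokoro voice ID such as 'af_heart' into accent, gender, name."""
    prefix, separator, name = voice_id.partition("_")
    if (
        not separator
        or not name
        or len(prefix) != 2
        or prefix[0] not in ACCENTS
        or prefix[1] not in GENDERS
    ):
        raise ValueError(f"unrecognized voice ID: {voice_id!r}")
    return {
        "accent": ACCENTS[prefix[0]],
        "gender": GENDERS[prefix[1]],
        "name": name,
    }
```

For example, `describe_voice("bm_george")` yields British, Male, and the name `george`.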
Voice Registration
Register a custom voice for F5-TTS voice cloning:
curl -X POST http://localhost:3022/voices \
-F "voice_id=my_voice" \
-F "name=My Custom Voice" \
-F "description=A sample voice for testing" \
-F "transcript=Hello, this is the text spoken in the reference audio." \
-F "reference_audio=@my_reference.wav"
Pre-defined voices can also be placed in the voices/ directory:
voices/
└── my_voice/
├── reference.wav # Reference audio (required)
├── transcript.txt # Transcript of reference (required)
└── metadata.json # Name and description (optional)
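Before starting the service, it can help to check that a pre-defined voice folder matches the layout above. The following sketch verifies the two required files and, if present, that metadata.json parses as JSON. The function `check_voice_dir` is hypothetical, and the empty-transcript check is an added assumption rather than a documented rule.

```python
import json
from pathlib import Path

# File names taken from the directory layout in this README.
REQUIRED_FILES = ("reference.wav", "transcript.txt")


def check_voice_dir(voice_dir):
    """Return a list of problems found in a voice directory (empty if none)."""
    voice_dir = Path(voice_dir)
    problems = []
    for name in REQUIRED_FILES:
        if not (voice_dir / name).is_file():
            problems.append(f"missing required file: {name}")

    transcript = voice_dir / "transcript.txt"
    # Assumption: an empty transcript is not useful for voice cloning.
    if transcript.is_file() and not transcript.read_text(encoding="utf-8").strip():
        problems.append("transcript.txt is empty")

    metadata = voice_dir / "metadata.json"  # optional
    if metadata.is_file():
        try:
            json.loads(metadata.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"metadata.json is not valid JSON: {exc}")
    return problems
```

Calling `check_voice_dir("voices/my_voice")` returns an empty list when the folder is complete.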
Configuration
| Variable | Default | Description |
|---|---|---|
| PORT | 3022 | API port |
| PRELOAD_MODELS | false | Load models on startup |
| MAX_TEXT_LENGTH | 1000 | Max characters per request |
| CORS_ORIGINS | https://mana.how,... | Allowed CORS origins |
| F5_MODEL | lucasnewman/f5-tts-mlx | F5-TTS model |
| KOKORO_MODEL | mlx-community/Kokoro-82M-bf16 | Kokoro model |
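As an illustration of how these variables might be read, the sketch below loads them from the environment with the defaults from the table. It is not the service's actual configuration code: `load_config` is a hypothetical helper, the accepted truthy strings are an assumption, and CORS_ORIGINS is treated as a comma-separated list with no default because the default shown above is truncated.

```python
import os

# Defaults copied from the configuration table (CORS_ORIGINS omitted: truncated).
DEFAULTS = {
    "PORT": "3022",
    "PRELOAD_MODELS": "false",
    "MAX_TEXT_LENGTH": "1000",
    "F5_MODEL": "lucasnewman/f5-tts-mlx",
    "KOKORO_MODEL": "mlx-community/Kokoro-82M-bf16",
}

# Assumption: which strings count as "true" is not documented.
TRUTHY = {"1", "true", "yes", "on"}


def load_config(env=None):
    """Read settings from `env` (defaults to os.environ) into typed values."""
    env = os.environ if env is None else env
    values = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    cors = env.get("CORS_ORIGINS")
    return {
        "port": int(values["PORT"]),
        "preload_models": values["PRELOAD_MODELS"].strip().lower() in TRUTHY,
        "max_text_length": int(values["MAX_TEXT_LENGTH"]),
        "f5_model": values["F5_MODEL"],
        "kokoro_model": values["KOKORO_MODEL"],
        # Assumption: origins are comma-separated; None means "not set".
        "cors_origins": (
            [origin.strip() for origin in cors.split(",") if origin.strip()]
            if cors else None
        ),
    }
```

Passing a plain dict, for example `load_config({"PORT": "8080"})`, makes the parsing easy to try without touching the real environment.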
Mac Mini Deployment
# Install and start as launchd service
../../scripts/mac-mini/setup-tts.sh
# Service management
launchctl list | grep com.manacore.tts
launchctl unload ~/Library/LaunchAgents/com.manacore.tts.plist
launchctl load ~/Library/LaunchAgents/com.manacore.tts.plist
# View logs
tail -f /tmp/manacore-tts.log
Requirements
- Python 3.10+
- macOS with Apple Silicon (recommended)
- ~7 GB disk space for models
- 16 GB RAM recommended
- ffmpeg (for MP3 output)
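A quick way to compare a machine against this list is a small standard-library check. The sketch below is an assumption-laden convenience, not part of the project: `check_prerequisites` is a hypothetical helper, it measures free disk space on the current directory's volume, and it does not check RAM.

```python
import platform
import shutil
import sys

GIB = 1024 ** 3


def check_prerequisites(path="."):
    """Return a dict of requirement name -> bool for the current machine."""
    return {
        "python_3_10_plus": sys.version_info >= (3, 10),
        # Apple Silicon Macs report Darwin / arm64.
        "apple_silicon": (
            platform.system() == "Darwin" and platform.machine() == "arm64"
        ),
        # ~7 GB of free space for the models, measured on `path`'s volume.
        "disk_7gb_free": shutil.disk_usage(path).free >= 7 * GIB,
        # ffmpeg is only needed for MP3 output.
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }
```

Printing the result of `check_prerequisites()` shows at a glance which items still need attention.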
Troubleshooting
Models Not Loading
# Check MLX installation
python -c "import mlx; print(mlx.__version__)"
# Check mlx-audio
python -c "import mlx_audio; print('OK')"
# Check f5-tts-mlx
python -c "from f5_tts_mlx import F5TTS; print('OK')"
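The three checks above can also be combined into one script that reports missing packages without importing them, which avoids loading MLX just to test for its presence. This is a minimal sketch; `missing_modules` is a hypothetical helper, and the module names are the ones used in the commands above.

```python
import importlib.util

# Top-level module names from the troubleshooting commands in this README.
REQUIRED_MODULES = ("mlx", "mlx_audio", "f5_tts_mlx")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [
        name for name in modules
        if importlib.util.find_spec(name) is None
    ]
```

Run inside the activated virtual environment, `missing_modules()` returns an empty list when all three packages are installed.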
MP3 Output Not Working
# Install ffmpeg
brew install ffmpeg
# Verify
ffmpeg -version
Memory Issues
- Reduce MAX_TEXT_LENGTH for less memory usage
- Set PRELOAD_MODELS=false for lazy loading
- F5-TTS requires ~6 GB, Kokoro ~500 MB
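If MAX_TEXT_LENGTH is lowered, longer inputs have to be split on the client side and synthesized one piece at a time. The sketch below shows one possible approach: it splits on sentence endings and falls back to a hard cut for sentences longer than the limit. The function `split_text` is hypothetical, and the sentence-splitting rule is a simple heuristic, not something the service provides.

```python
import re


def split_text(text, max_chars=1000):
    """Split text into chunks of at most `max_chars`, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at fixed positions.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own synthesis request and the resulting audio files joined afterwards.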
API Documentation
When the service is running, visit http://localhost:3022/docs for interactive API documentation.