mirror of https://github.com/Memo-2023/mana-monorepo.git synced 2026-05-14 23:01:09 +02:00

Mana TTS

Text-to-Speech microservice with voice cloning support, optimized for Apple Silicon.

Features

Kokoro TTS: Fast preset voices (~300 MB model)
F5-TTS: Voice cloning with reference audio (~6 GB model)
MLX Optimized: Runs efficiently on Apple Silicon
REST API: FastAPI with OpenAPI documentation

Quick Start

Setup

# Run setup script
./setup.sh

# Or manually
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Start Service

source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3022

Test

# Health check
curl http://localhost:3022/health

# Synthesize with Kokoro
curl -X POST http://localhost:3022/synthesize/kokoro \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "af_heart"}' \
  --output test.wav

# Play audio (macOS)
afplay test.wav

API Endpoints

Health & Info

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/models` | GET | Available models |
| `/voices` | GET | All available voices |

Synthesis

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/synthesize/kokoro` | POST | Kokoro preset voices |
| `/synthesize` | POST | F5-TTS voice cloning |
| `/synthesize/auto` | POST | Auto-select model |

Voice Management

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/voices` | POST | Register custom voice |
| `/voices/{id}` | DELETE | Delete custom voice |

Synthesis Examples

Kokoro (Fast Preset Voices)

curl -X POST http://localhost:3022/synthesize/kokoro \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to Mana TTS, your personal voice synthesis service.",
    "voice": "af_heart",
    "speed": 1.0,
    "output_format": "wav"
  }' \
  --output output.wav
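The same request can be sketched in Python using only the standard library. The field names come from the curl example above; the helper names `build_kokoro_request` and `synthesize_to_file` are hypothetical, and the sketch assumes the response body is the raw audio, as the `--output` flag in the curl call suggests.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3022"  # default port used throughout this README


def build_kokoro_request(text, voice="af_heart", speed=1.0,
                         output_format="wav", base_url=BASE_URL):
    """Build (but do not send) a POST request for /synthesize/kokoro."""
    payload = {
        "text": text,
        "voice": voice,
        "speed": speed,
        "output_format": output_format,
    }
    return urllib.request.Request(
        f"{base_url}/synthesize/kokoro",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request, path, timeout=60):
    """Send the request and write the response body to path.

    Assumes the service answers with the audio bytes directly.
    Returns the number of bytes written.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

With the service running, `synthesize_to_file(build_kokoro_request("Welcome to Mana TTS."), "output.wav")` should produce the same file as the curl command.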

F5-TTS (Voice Cloning)

# With reference audio upload
curl -X POST http://localhost:3022/synthesize \
  -F "text=Hello, this is a cloned voice speaking." \
  -F "reference_audio=@reference.wav" \
  -F "reference_text=This is what the reference audio says." \
  -F "output_format=wav" \
  --output cloned.wav

# With registered voice
curl -X POST http://localhost:3022/synthesize \
  -F "text=Hello from my registered voice." \
  -F "voice_id=my_custom_voice" \
  --output output.wav
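The `-F` flags above send `multipart/form-data`. A minimal Python sketch of the upload variant, again with the standard library only, might look like the following. The form field names are taken from the curl example; `encode_multipart` and `build_clone_request` are hypothetical helpers, and the `audio/wav` content type for the reference file is an assumption.

```python
import urllib.request
import uuid


def encode_multipart(fields, files=None):
    """Encode text fields and files as a multipart/form-data body.

    fields: dict mapping field name -> str value
    files:  dict mapping field name -> (filename, bytes, content_type)
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        parts.append(b"")
        parts.append(value.encode("utf-8"))
    for name, (filename, content, content_type) in (files or {}).items():
        parts.append(f"--{boundary}".encode())
        parts.append(
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"'.encode()
        )
        parts.append(f"Content-Type: {content_type}".encode())
        parts.append(b"")
        parts.append(content)
    parts.append(f"--{boundary}--".encode())
    parts.append(b"")
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def build_clone_request(text, reference_audio, reference_text,
                        output_format="wav",
                        base_url="http://localhost:3022"):
    """Build (but do not send) a voice-cloning request for /synthesize."""
    body, content_type = encode_multipart(
        {
            "text": text,
            "reference_text": reference_text,
            "output_format": output_format,
        },
        # Content type of the reference file is an assumption.
        {"reference_audio": ("reference.wav", reference_audio, "audio/wav")},
    )
    return urllib.request.Request(
        f"{base_url}/synthesize",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

The request can be sent with `urllib.request.urlopen`; for a registered voice, the same encoder works with a `voice_id` field and no files.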

Auto-Select

# Uses Kokoro for preset voices, F5-TTS for custom voices
curl -X POST http://localhost:3022/synthesize/auto \
  -H "Content-Type: application/json" \
  -d '{"text": "Auto-selected synthesis", "voice": "af_bella"}' \
  --output output.wav
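The routing rule described in the comment above can be illustrated with a small sketch. This is not the service's implementation: `select_backend` is a hypothetical function, the preset set contains only the voices named in this README, and checking custom voices first is an arbitrary choice made for the example.

```python
# Kokoro preset voices named in this README (the README notes there are more).
KOKORO_PRESETS = {
    "af_heart", "af_alloy", "af_bella", "af_jessica", "af_nicole",
    "af_nova", "af_sarah",
    "am_adam", "am_echo", "am_eric", "am_michael",
    "bf_alice", "bf_emma", "bf_lily",
    "bm_daniel", "bm_fable", "bm_george",
}


def select_backend(voice, custom_voice_ids=()):
    """Return "f5-tts" for a registered custom voice, "kokoro" for a preset.

    Raises ValueError when the voice is neither.
    """
    if voice in custom_voice_ids:
        return "f5-tts"
    if voice in KOKORO_PRESETS:
        return "kokoro"
    raise ValueError(f"unknown voice: {voice!r}")
```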

Available Kokoro Voices

American Female

af_heart - Warm, emotional (default)
af_alloy - Neutral, professional
af_bella - Friendly, approachable
af_jessica - Confident, clear
af_nicole - Bright, energetic
af_nova - Modern, dynamic
af_sarah - Warm, conversational
... and more

American Male

am_adam - Deep, authoritative
am_echo - Resonant, clear
am_eric - Professional, neutral
am_michael - Warm, trustworthy
... and more

British Female

bf_alice - Refined, elegant
bf_emma - Clear, professional
bf_lily - Soft, gentle

British Male

bm_daniel - Classic, authoritative
bm_fable - Storyteller, expressive
bm_george - Traditional, clear
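The voice IDs above appear to follow a two-letter prefix pattern: the first letter for the accent (`a` American, `b` British) and the second for the gender (`f` female, `m` male). The sketch below decodes that pattern. It is inferred from the listing only, `describe_voice` is a hypothetical helper, and voices outside this list may use prefixes it does not cover.

```python
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "Female", "m": "Male"}


def describe_voice(voice_id):
    """Split an ID such as "af_heart" into accent, gender, and name.

    Raises ValueError when the ID does not match the inferred pattern.
    """
    prefix, separator, name = voice_id.partition("_")
    if not separator or len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice ID format: {voice_id!r}")
    accent = ACCENTS.get(prefix[0])
    gender = GENDERS.get(prefix[1])
    if accent is None or gender is None:
        raise ValueError(f"unknown voice prefix: {prefix!r}")
    return {"accent": accent, "gender": gender, "name": name}
```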

Voice Registration

curl -X POST http://localhost:3022/voices \
  -F "voice_id=my_voice" \
  -F "name=My Custom Voice" \
  -F "description=A sample voice for testing" \
  -F "transcript=Hello, this is the text spoken in the reference audio." \
  -F "reference_audio=@my_reference.wav"

Pre-defined voices can also be placed in the voices/ directory:

voices/
└── my_voice/
    ├── reference.wav       # Reference audio (required)
    ├── transcript.txt      # Transcript of reference (required)
    └── metadata.json       # Name and description (optional)
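Before starting the service, the layout above can be checked with a short script. This is a sketch under the assumptions stated in the tree (two required files, one optional JSON file); `check_voice_dir` and `check_all_voices` are hypothetical helpers, and the service may apply additional checks of its own.

```python
import json
from pathlib import Path

REQUIRED_FILES = ("reference.wav", "transcript.txt")


def check_voice_dir(voice_dir):
    """Return a list of problems found in one voice directory (empty if none)."""
    voice_dir = Path(voice_dir)
    problems = [
        f"missing {name}"
        for name in REQUIRED_FILES
        if not (voice_dir / name).is_file()
    ]
    # metadata.json is optional, but if present it should at least parse.
    metadata_path = voice_dir / "metadata.json"
    if metadata_path.is_file():
        try:
            json.loads(metadata_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"metadata.json is not valid JSON: {exc.msg}")
    return problems


def check_all_voices(voices_root="voices"):
    """Map each subdirectory name under voices_root to its list of problems."""
    root = Path(voices_root)
    return {
        entry.name: check_voice_dir(entry)
        for entry in sorted(root.iterdir())
        if entry.is_dir()
    }
```

Running `check_all_voices()` from the service directory returns an empty list for every voice that has both required files.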

Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| `PORT` | `3022` | API port |
| `PRELOAD_MODELS` | `false` | Load models on startup |
| `MAX_TEXT_LENGTH` | `1000` | Max characters per request |
| `CORS_ORIGINS` | `https://mana.how,...` | Allowed CORS origins |
| `F5_MODEL` | `lucasnewman/f5-tts-mlx` | F5-TTS model |
| `KOKORO_MODEL` | `mlx-community/Kokoro-82M-bf16` | Kokoro model |
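As an illustration of how these variables might be read, the sketch below loads them from the environment with the defaults from the table. It is not the service's actual configuration code: `Settings` and `load_settings` are hypothetical names, the accepted boolean spellings are an assumption, and because the `CORS_ORIGINS` default is truncated in the table, the sketch falls back to an empty list rather than guessing it.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    port: int
    preload_models: bool
    max_text_length: int
    cors_origins: tuple
    f5_model: str
    kokoro_model: str


def _as_bool(value):
    """Interpret common truthy spellings (an assumption, not documented)."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=None):
    """Read settings from env (defaults to os.environ)."""
    env = os.environ if env is None else env
    # The real CORS_ORIGINS default is elided in the table, so none is assumed.
    origins = env.get("CORS_ORIGINS", "")
    return Settings(
        port=int(env.get("PORT", "3022")),
        preload_models=_as_bool(env.get("PRELOAD_MODELS", "false")),
        max_text_length=int(env.get("MAX_TEXT_LENGTH", "1000")),
        cors_origins=tuple(o.strip() for o in origins.split(",") if o.strip()),
        f5_model=env.get("F5_MODEL", "lucasnewman/f5-tts-mlx"),
        kokoro_model=env.get("KOKORO_MODEL", "mlx-community/Kokoro-82M-bf16"),
    )
```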

Mac Mini Deployment

# Install and start as launchd service
../../scripts/mac-mini/setup-tts.sh

# Service management
launchctl list | grep com.manacore.tts
launchctl unload ~/Library/LaunchAgents/com.manacore.tts.plist
launchctl load ~/Library/LaunchAgents/com.manacore.tts.plist

# View logs
tail -f /tmp/manacore-tts.log

Requirements

Python 3.10+
macOS with Apple Silicon (recommended)
~7 GB disk space for models
16 GB RAM recommended
ffmpeg (for MP3 output)

Troubleshooting

Models Not Loading

# Check MLX installation
python -c "import mlx; print(mlx.__version__)"

# Check mlx-audio
python -c "import mlx_audio; print('OK')"

# Check f5-tts-mlx
python -c "from f5_tts_mlx import F5TTS; print('OK')"
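The three checks above can be combined into one script that reports which packages cannot be found. This is a minimal sketch: `find_missing` is a hypothetical helper, and `importlib.util.find_spec` only confirms that a module is locatable, so a package that is installed but fails during import still needs the `python -c` checks above.

```python
import importlib.util

# Module names taken from the checks above.
TTS_MODULES = ("mlx", "mlx_audio", "f5_tts_mlx")


def find_missing(modules=TTS_MODULES):
    """Return the names of top-level modules that cannot be located."""
    return [
        name for name in modules
        if importlib.util.find_spec(name) is None
    ]
```

Inside the activated virtual environment, `find_missing()` returning an empty list means all three packages are installed.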

MP3 Output Not Working

# Install ffmpeg
brew install ffmpeg

# Verify
ffmpeg -version

Memory Issues

Reduce MAX_TEXT_LENGTH to lower memory usage
Set PRELOAD_MODELS=false so models load lazily on first use instead of at startup
F5-TTS requires ~6 GB, Kokoro ~500 MB
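With a lower MAX_TEXT_LENGTH, longer texts have to be split on the client side before they are sent. The sketch below shows one way to do that by packing whole sentences into chunks under the limit. `split_text` is a hypothetical helper, not part of the service, and the sentence detection is a simple punctuation heuristic.

```python
import re


def split_text(text, max_chars=1000):
    """Split text into chunks of at most max_chars characters.

    Sentences are kept whole where possible; a single sentence longer
    than max_chars is cut at the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        # Hard-split any sentence that cannot fit in one chunk.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized with a separate request and the resulting audio files concatenated afterwards.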

API Documentation

While the service is running, visit http://localhost:3022/docs for the interactive API documentation.