mirror of https://github.com/Memo-2023/mana-monorepo.git synced 2026-05-20 03:41:25 +02:00


Till JS b8e18b7f82 chore(ai-services): adopt Windows GPU as source of truth for llm/stt/tts

The Windows GPU server has been the actual production home for these services for some time, and the running code there has drifted ahead of the repo. This sync pulls the live versions back into the repo so the Windows box is no longer the only place those changes exist.

Pulled from C:\mana\services\* on mana-server-gpu (192.168.178.11):

mana-llm:
- src/main.py, src/config.py: small fixes (auth wiring, config tweaks)
- src/api_auth.py: NEW (cross-service GPU_API_KEY validator)
- service.pyw: Windows runner used by the ManaLLM scheduled task (sets up logging redirect, loads .env, calls uvicorn)

mana-stt:
- app/main.py: substantial cleanup (684→392 lines), drops the whisperx-as-separate-backend branching now that whisper_service.py rolls whisperx in directly
- app/whisper_service.py: full CUDA + whisperx rewrite (158→358 lines)
- app/auth.py + external_auth.py: significantly expanded auth
- app/vram_manager.py: NEW (shared VRAM accounting helper)
- service.pyw: Windows runner with CUDA pre-init, FFmpeg PATH injection, .env loading
- removed: app/whisper_service_cuda.py (folded into whisper_service.py)
- removed: app/whisperx_service.py (folded into whisper_service.py)

mana-tts:
- app/auth.py, external_auth.py: same auth expansion as stt
- app/f5_service.py, kokoro_service.py: Windows tweaks
- app/vram_manager.py: NEW (same shared helper as stt)
- service.pyw: Windows runner

mana-video-gen:
- service.pyw: Windows runner (no other changes; the .py code on the GPU box is byte-identical to what's already in the repo)

The service.pyw files contain absolute Windows paths (C:\mana\services\<svc>) and a hardcoded FFmpeg PATH for the tills user profile. They are kept as-is intentionally: they exist to be deployed to that one machine, and any abstraction layer would just hide what's actually happening. Anyone redeploying to a different layout will need to edit the path strings, which is a known and obvious change.

Mac-Mini infrastructure for these services (launchd plists, install scripts, scripts/mac-mini/setup-{stt,tts}.sh, the Mac-flux2c image-gen implementation) is still on disk and will be removed in a follow-up commit, along with replacing mana-image-gen with the Windows diffusers+CUDA implementation. This commit is just the live-code sync.

2026-04-08 12:46:03 +02:00
app	chore(ai-services): adopt Windows GPU as source of truth for llm/stt/tts	2026-04-08 12:46:03 +02:00
voices	🌐 feat: add i18n support to 6 web apps	2026-01-29 14:48:35 +01:00
.env.example	chore: complete ManaCore → Mana rename (docs, go modules, plists, images)	2026-04-07 12:26:10 +02:00
CLAUDE.md	📝 docs(tts): document German voice support (Piper/Kerstin)	2026-02-14 12:21:40 +01:00
com.mana.mana-tts.plist	chore: complete ManaCore → Mana rename (docs, go modules, plists, images)	2026-04-07 12:26:10 +02:00
install-service.sh	feat: rename ManaCore to Mana across entire codebase	2026-04-05 20:00:13 +02:00
README.md	chore: complete ManaCore → Mana rename (docs, go modules, plists, images)	2026-04-07 12:26:10 +02:00
requirements.txt	✨ feat(auth): add API key management for STT/TTS services	2026-02-12 02:12:05 +01:00
service.pyw	chore(ai-services): adopt Windows GPU as source of truth for llm/stt/tts	2026-04-08 12:46:03 +02:00
setup.sh	🌐 feat: add i18n support to 6 web apps	2026-01-29 14:48:35 +01:00

README.md

Mana TTS

Text-to-Speech microservice with voice cloning support, optimized for Apple Silicon.

Features

- Kokoro TTS: Fast preset voices (~300 MB model)
- F5-TTS: Voice cloning with reference audio (~6 GB model)
- MLX Optimized: Runs efficiently on Apple Silicon
- REST API: FastAPI with OpenAPI documentation

Quick Start

Setup

```bash
# Run setup script
./setup.sh

# Or manually
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Start Service

```bash
source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3022
```

Test

```bash
# Health check
curl http://localhost:3022/health

# Synthesize with Kokoro
curl -X POST http://localhost:3022/synthesize/kokoro \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "af_heart"}' \
  --output test.wav

# Play audio (macOS)
afplay test.wav
```
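With `PRELOAD_MODELS=false` the service may take a while to become useful, so scripts that drive it can poll `/health` before sending work. The sketch below is a hypothetical client helper (`wait_for_health`, `http_ok` and the retry parameters are illustrative names, not part of the service); it only assumes that `/health` answers HTTP 200 once the API is up.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:3022"  # default PORT from the Configuration table


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to url answers with HTTP 200 (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status (HTTPError).
        return False


def wait_for_health(
    base_url: str = BASE_URL,
    attempts: int = 10,
    delay: float = 1.0,
    check: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll {base_url}/health until it succeeds or the attempts run out."""
    url = f"{base_url}/health"
    for attempt in range(attempts):
        if check(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # no sleep after the final attempt
    return False
```

Usage: call `wait_for_health()` once before the first synthesis request; the `check` parameter exists so the polling logic can be exercised without a running server.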

API Endpoints

Health & Info

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/models` | GET | Available models |
| `/voices` | GET | All available voices |

Synthesis

| Endpoint | Method | Description |
|---|---|---|
| `/synthesize/kokoro` | POST | Kokoro preset voices |
| `/synthesize` | POST | F5-TTS voice cloning |
| `/synthesize/auto` | POST | Auto-select model |

Voice Management

| Endpoint | Method | Description |
|---|---|---|
| `/voices` | POST | Register custom voice |
| `/voices/{id}` | DELETE | Delete custom voice |

Synthesis Examples

Kokoro (Fast Preset Voices)

```bash
curl -X POST http://localhost:3022/synthesize/kokoro \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to Mana TTS, your personal voice synthesis service.",
    "voice": "af_heart",
    "speed": 1.0,
    "output_format": "wav"
  }' \
  --output output.wav
```
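The same Kokoro call can be made from Python with only the standard library. This is a minimal client sketch, assuming the JSON fields shown in the curl example (`text`, `voice`, `speed`, `output_format`) and that the response body is the raw audio file; `kokoro_request` and `synthesize_kokoro` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3022"


def kokoro_request(text: str, voice: str = "af_heart", speed: float = 1.0,
                   output_format: str = "wav",
                   base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /synthesize/kokoro request with the JSON body from the curl example."""
    body = json.dumps({
        "text": text,
        "voice": voice,
        "speed": speed,
        "output_format": output_format,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/synthesize/kokoro",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_kokoro(text: str, out_path: str, **kwargs) -> None:
    """Send the request and write the returned audio bytes to out_path."""
    req = kokoro_request(text, **kwargs)
    # Generous timeout: the first request may trigger lazy model loading.
    with urllib.request.urlopen(req, timeout=120) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

Usage: `synthesize_kokoro("Hello world", "output.wav", voice="af_bella")`. Building the request separately from sending it keeps the payload easy to inspect.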

F5-TTS (Voice Cloning)

```bash
# With reference audio upload
curl -X POST http://localhost:3022/synthesize \
  -F "text=Hello, this is a cloned voice speaking." \
  -F "reference_audio=@reference.wav" \
  -F "reference_text=This is what the reference audio says." \
  -F "output_format=wav" \
  --output cloned.wav

# With registered voice
curl -X POST http://localhost:3022/synthesize \
  -F "text=Hello from my registered voice." \
  -F "voice_id=my_custom_voice" \
  --output output.wav
```
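Unlike the Kokoro endpoint, F5-TTS cloning takes `multipart/form-data` (what `curl -F` sends), which the Python standard library has no ready-made encoder for. The sketch below hand-encodes the form, assuming the field names from the curl example (`text`, `reference_audio`, `reference_text`, `output_format`); `encode_multipart` and `clone_request` are hypothetical helpers, and the filename is not escaped, so it must not contain quotes.

```python
import urllib.request
import uuid
from typing import Dict, Optional, Tuple

BASE_URL = "http://localhost:3022"


def encode_multipart(fields: Dict[str, str],
                     files: Optional[Dict[str, Tuple[str, bytes, str]]] = None,
                     boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode text fields and files as multipart/form-data.

    files maps field name -> (filename, content bytes, MIME type).
    Returns (body, value for the Content-Type header).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [f"--{boundary}".encode(),
                  f'Content-Disposition: form-data; name="{name}"'.encode(),
                  b"",  # blank line separates part headers from the value
                  value.encode("utf-8")]
    for name, (filename, content, mime) in (files or {}).items():
        parts += [f"--{boundary}".encode(),
                  f'Content-Disposition: form-data; name="{name}"; '
                  f'filename="{filename}"'.encode(),
                  f"Content-Type: {mime}".encode(),
                  b"",
                  content]
    parts += [f"--{boundary}--".encode(), b""]  # closing boundary + final CRLF
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def clone_request(text: str, reference_wav: bytes, reference_text: str,
                  output_format: str = "wav",
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /synthesize request that clones the voice in reference_wav."""
    body, content_type = encode_multipart(
        {"text": text, "reference_text": reference_text,
         "output_format": output_format},
        {"reference_audio": ("reference.wav", reference_wav, "audio/wav")},
    )
    return urllib.request.Request(f"{base_url}/synthesize", data=body,
                                  headers={"Content-Type": content_type},
                                  method="POST")
```

The same encoder could build the body for `POST /voices` registration by passing that endpoint's fields instead.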

Auto-Select

```bash
# Uses Kokoro for preset voices, F5-TTS for custom
curl -X POST http://localhost:3022/synthesize/auto \
  -H "Content-Type: application/json" \
  -d '{"text": "Auto-selected synthesis", "voice": "af_bella"}' \
  --output output.wav
```
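The routing rule described above (Kokoro for preset voices, F5-TTS for custom ones) can be sketched as a simple membership test. This is an illustration under assumptions, not the service's actual code: `choose_backend` is a hypothetical name, and the preset set is only the subset listed in the Kokoro voice list, whereas a real client would fetch the full list from `GET /voices`.

```python
from typing import AbstractSet

# Assumption: subset of the Kokoro preset IDs from the voice list in this README.
KOKORO_PRESETS = frozenset({
    "af_heart", "af_alloy", "af_bella", "af_jessica", "af_nicole", "af_nova",
    "af_sarah", "am_adam", "am_echo", "am_eric", "am_michael",
    "bf_alice", "bf_emma", "bf_lily", "bm_daniel", "bm_fable", "bm_george",
})


def choose_backend(voice: str, presets: AbstractSet[str] = KOKORO_PRESETS) -> str:
    """Return "kokoro" for a known preset voice ID, otherwise "f5" (custom voice)."""
    return "kokoro" if voice in presets else "f5"
```

Anything not recognised as a preset is treated as a registered custom voice, which matches the comment in the curl example.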

Available Kokoro Voices

American Female

- `af_heart`: Warm, emotional (default)
- `af_alloy`: Neutral, professional
- `af_bella`: Friendly, approachable
- `af_jessica`: Confident, clear
- `af_nicole`: Bright, energetic
- `af_nova`: Modern, dynamic
- `af_sarah`: Warm, conversational
- ... and more

American Male

- `am_adam`: Deep, authoritative
- `am_echo`: Resonant, clear
- `am_eric`: Professional, neutral
- `am_michael`: Warm, trustworthy
- ... and more

British Female

- `bf_alice`: Refined, elegant
- `bf_emma`: Clear, professional
- `bf_lily`: Soft, gentle

British Male

- `bm_daniel`: Classic, authoritative
- `bm_fable`: Storyteller, expressive
- `bm_george`: Traditional, clear
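The grouping above follows a naming pattern: the first letter of the prefix gives the accent (`a` American, `b` British) and the second the gender (`f` female, `m` male). A small parser makes that explicit; `parse_voice_id` and `VoiceInfo` are hypothetical names, and only the two accents listed in this README are covered.

```python
from typing import NamedTuple

# Prefix letters inferred from the voice groups listed in this README.
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "Female", "m": "Male"}


class VoiceInfo(NamedTuple):
    accent: str
    gender: str
    name: str


def parse_voice_id(voice_id: str) -> VoiceInfo:
    """Split a Kokoro voice ID like "bf_emma" into accent, gender and name."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"not a Kokoro-style voice ID: {voice_id!r}")
    try:
        return VoiceInfo(ACCENTS[prefix[0]], GENDERS[prefix[1]], name)
    except KeyError:
        raise ValueError(f"unknown accent/gender prefix: {prefix!r}") from None
```

Usage: `parse_voice_id("bm_fable")` yields a British male voice named `fable`; IDs outside this pattern raise `ValueError`.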

Voice Registration

```bash
curl -X POST http://localhost:3022/voices \
  -F "voice_id=my_voice" \
  -F "name=My Custom Voice" \
  -F "description=A sample voice for testing" \
  -F "transcript=Hello, this is the text spoken in the reference audio." \
  -F "reference_audio=@my_reference.wav"
```

Pre-defined voices can also be placed in the voices/ directory:

```
voices/
└── my_voice/
    ├── reference.wav       # Reference audio (required)
    ├── transcript.txt      # Transcript of reference (required)
    └── metadata.json       # Name and description (optional)
```
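A loader for the directory layout above could look like the following. This is a sketch, not the service's own loader: `load_voice_dir` is a hypothetical name, the `name`/`description` keys in `metadata.json` are assumed from the layout comment, and silently skipping folders that lack a required file is a design choice of the sketch (the service may instead report an error).

```python
import json
from pathlib import Path
from typing import Dict, List


def load_voice_dir(root: Path) -> List[Dict[str, str]]:
    """Collect voices from root/<voice_id>/ folders.

    reference.wav and transcript.txt are required; metadata.json is optional
    and may supply "name" and "description".
    """
    voices = []
    for voice_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        ref = voice_dir / "reference.wav"
        transcript = voice_dir / "transcript.txt"
        if not (ref.is_file() and transcript.is_file()):
            continue  # incomplete folder: skip it (assumption)
        meta: Dict[str, str] = {}
        meta_path = voice_dir / "metadata.json"
        if meta_path.is_file():
            meta = json.loads(meta_path.read_text(encoding="utf-8"))
        voices.append({
            "voice_id": voice_dir.name,  # folder name doubles as the voice ID
            "name": meta.get("name", voice_dir.name),
            "description": meta.get("description", ""),
            "transcript": transcript.read_text(encoding="utf-8").strip(),
            "reference_audio": str(ref),
        })
    return voices
```

Using the folder name as the voice ID keeps it consistent with the `voice_id` field used when registering voices over the API.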

Configuration

| Variable | Default | Description |
|---|---|---|
| `PORT` | `3022` | API port |
| `PRELOAD_MODELS` | `false` | Load models on startup |
| `MAX_TEXT_LENGTH` | `1000` | Max characters per request |
| `CORS_ORIGINS` | `https://mana.how,...` | Allowed CORS origins |
| `F5_MODEL` | `lucasnewman/f5-tts-mlx` | F5-TTS model |
| `KOKORO_MODEL` | `mlx-community/Kokoro-82M-bf16` | Kokoro model |
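The table can be read as a typed settings object. This sketch is not the service's config module: `Settings`, `load_settings` and the accepted boolean spellings are assumptions, and because the `CORS_ORIGINS` default is elided in the table, the sketch defaults it to an empty list rather than guessing the remaining origins. `CORS_ORIGINS` is assumed to be comma-separated, as the default value suggests.

```python
import os
from dataclasses import dataclass, field
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class Settings:
    port: int = 3022
    preload_models: bool = False
    max_text_length: int = 1000
    cors_origins: List[str] = field(default_factory=list)  # real default elided
    f5_model: str = "lucasnewman/f5-tts-mlx"
    kokoro_model: str = "mlx-community/Kokoro-82M-bf16"


def _as_bool(value: str) -> bool:
    # Assumed truthy spellings; anything else counts as false.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the variables from the table, falling back to their defaults."""
    env = os.environ if env is None else env
    d = Settings()
    origins = env.get("CORS_ORIGINS", "")
    return Settings(
        port=int(env.get("PORT", str(d.port))),
        preload_models=_as_bool(env.get("PRELOAD_MODELS", "false")),
        max_text_length=int(env.get("MAX_TEXT_LENGTH", str(d.max_text_length))),
        cors_origins=[o.strip() for o in origins.split(",") if o.strip()],
        f5_model=env.get("F5_MODEL", d.f5_model),
        kokoro_model=env.get("KOKORO_MODEL", d.kokoro_model),
    )
```

Passing an explicit mapping instead of reading `os.environ` directly makes the parsing easy to test.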

Mac Mini Deployment

```bash
# Install and start as launchd service
../../scripts/mac-mini/setup-tts.sh

# Service management
launchctl list | grep com.mana.tts
launchctl unload ~/Library/LaunchAgents/com.mana.tts.plist
launchctl load ~/Library/LaunchAgents/com.mana.tts.plist

# View logs
tail -f /tmp/mana-tts.log
```

Requirements

- Python 3.10+
- macOS with Apple Silicon (recommended)
- ~7 GB disk space for models
- 16 GB RAM recommended
- ffmpeg (for MP3 output)

Troubleshooting

Models Not Loading

```bash
# Check MLX installation (the version is exposed on mlx.core)
python -c "import mlx.core as mx; print(mx.__version__)"

# Check mlx-audio
python -c "import mlx_audio; print('OK')"

# Check f5-tts-mlx
python -c "from f5_tts_mlx import F5TTS; print('OK')"
```

MP3 Output Not Working

```bash
# Install ffmpeg
brew install ffmpeg

# Verify
ffmpeg -version
```
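The checks in this Troubleshooting section can be rolled into one quick report. In this sketch (`check_environment` is a hypothetical name), `importlib.util.find_spec` only locates each package without importing it, so it catches missing installs but not import-time failures; the `python -c` commands above remain the stronger test.

```python
import importlib.util
import shutil
from typing import Dict

# Top-level module names used by the checks above.
MODULES = ("mlx", "mlx_audio", "f5_tts_mlx")


def check_environment() -> Dict[str, bool]:
    """Report which TTS packages are installed and whether ffmpeg is on PATH."""
    report = {name: importlib.util.find_spec(name) is not None for name in MODULES}
    report["ffmpeg"] = shutil.which("ffmpeg") is not None  # needed for MP3 output
    return report
```

Usage: `print(check_environment())` prints one `True`/`False` entry per dependency.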

Memory Issues

- Reduce `MAX_TEXT_LENGTH` to lower memory usage per request
- Set `PRELOAD_MODELS=false` so models are loaded lazily
- F5-TTS requires ~6 GB, Kokoro ~500 MB
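A lower `MAX_TEXT_LENGTH` means longer texts have to be split before they are sent. The sketch below is a hypothetical client-side workaround, not something the service provides: `chunk_text` greedily packs sentences (split naively after `.`, `!` or `?`) into chunks no longer than the limit and hard-splits any single sentence that is still too long. Sending each chunk as its own request and joining the resulting audio is left out.

```python
import re
from typing import List

# Naive sentence boundary: whitespace that follows ., ! or ?
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_text(text: str, max_len: int = 1000) -> List[str]:
    """Split text into chunks of at most max_len characters."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    chunks: List[str] = []
    current = ""
    for sentence in _SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        # A sentence longer than the limit is cut into max_len-sized pieces.
        while len(sentence) > max_len:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_len])
            sentence = sentence[max_len:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_len:
            current = candidate  # sentence still fits in the current chunk
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting at sentence boundaries rather than at a fixed character count keeps each request's prosody natural; the hard split is only a fallback.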

API Documentation

When running, visit http://localhost:3022/docs for interactive API documentation.