mirror of
https://github.com/Memo-2023/mana-monorepo.git
synced 2026-05-14 20:21:09 +02:00
chore(mac-mini): remove all AI service infrastructure (moved to Windows GPU)
The Mac Mini hasn't run mana-llm/stt/tts/image-gen for a while — those services live on the Windows GPU server now. The Mac-targeted installers, plists, and platform-checking setup scripts have been sitting in the repo as cargo-cult, suggesting Mac Mini deployment is still a real option. It isn't.

Removed (Mac-Mini deployment infrastructure):

services/mana-stt/
- com.mana.mana-stt.plist (LaunchAgent)
- com.mana.vllm-voxtral.plist (LaunchAgent for the abandoned local Voxtral experiment)
- install-service.sh (single-service launchd installer)
- install-services.sh (mana-stt + vllm-voxtral installer)
- setup.sh (Mac arm64 installer)
- scripts/setup-vllm.sh (vLLM-Voxtral setup)
- scripts/start-vllm-voxtral.sh

services/mana-tts/
- com.mana.mana-tts.plist
- install-service.sh
- setup.sh (Mac arm64 installer)

scripts/mac-mini/
- setup-image-gen.sh (Mac flux2.c launchd installer)
- setup-stt.sh
- setup-tts.sh
- launchd/com.mana.image-gen.plist
- launchd/com.mana.mana-stt.plist
- launchd/com.mana.mana-tts.plist

setup-tts-bot.sh stays — it's the Matrix TTS bot installer (Synapse side), not the mana-tts service.

Updated:
- services/mana-stt/CLAUDE.md, README.md — fully rewritten for the Windows GPU reality (CUDA WhisperX, Scheduled Task ManaSTT, .env keys matching the actual production .env on the box)
- services/mana-tts/CLAUDE.md, README.md — same treatment, documenting Kokoro/Piper/F5-TTS on the Windows GPU under Scheduled Task ManaTTS
- scripts/mac-mini/README.md — dropped the STT setup section, replaced with a pointer to docs/WINDOWS_GPU_SERVER_SETUP.md and the per-service CLAUDE.md files
- docs/MAC_MINI_SERVER.md — expanded the "deactivated launchagents" list to mention the now-removed plists, added the full GPU service port table with public URLs, added a cleanup snippet for any old plists still installed on a Mac Mini somewhere
This commit is contained in:
parent c7b4388cec
commit f4347032ca

22 changed files with 226 additions and 1914 deletions

services/mana-stt/CLAUDE.md
@@ -1,79 +1,96 @@
# mana-stt

Speech-to-Text service for the Mana ecosystem. Runs on the Mac Mini M4 (Apple Silicon) and exposes a small FastAPI surface that wraps multiple Whisper backends plus Mistral's hosted Voxtral API.
Speech-to-Text microservice. Wraps Whisper (CUDA, with WhisperX for word-level timestamps + diarization), local Voxtral via vLLM, and Mistral's hosted Voxtral API behind a small FastAPI surface. Lives on the Windows GPU server (`mana-server-gpu`, RTX 3090).

> ⚠️ **Earlier history**: this directory used to contain Mac-Mini–targeted
> code (Whisper Lightning MLX, com.mana.mana-stt.plist launchd setup,
> setup.sh with Apple-Silicon checks). That all moved to the Windows
> GPU box and was removed from the repo. If you're looking for the MLX
> path, see git history.

## Tech Stack

| Layer | Technology |
|-------|------------|
| **Runtime** | Python 3.11 + uvicorn |
| **Runtime** | Python 3.11 + uvicorn (Windows) |
| **Framework** | FastAPI |
| **Local model** | Whisper Large V3 via [`lightning-whisper-mlx`](https://github.com/mustafaaljadery/lightning-whisper-mlx) (Apple MLX) |
| **Local model (rich)** | WhisperX for word-level timestamps + diarization |
| **Cloud model** | Mistral Voxtral Mini API |
| **Optional** | vLLM Voxtral (GPU) — see `vllm_service.py` |
| **Auth** | JWT validation via mana-auth (`external_auth.py`) + API key fallback (`auth.py`) |
| **Process supervision** | launchd via `com.mana.mana-stt.plist` |
| **Whisper** | `whisperx` on CUDA (large-v3 + word alignment + pyannote diarization) |
| **Voxtral (local)** | vLLM serving Voxtral 3B/4B/24B (`vllm_service.py`) |
| **Voxtral (cloud)** | Mistral API (`voxtral_api_service.py`) |
| **Auth** | Per-key + internal-key API auth (`app/auth.py`, JWT via mana-auth in `app/external_auth.py`) |
| **VRAM** | Shared `vram_manager.py` accountant — coordinated with mana-tts and mana-image-gen so multiple GPU services don't OOM each other |
| **Process supervision** | Windows Scheduled Task `ManaSTT` (AtLogOn) |
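The VRAM row above describes a shared accountant that keeps co-resident GPU services from OOM-ing each other. A minimal sketch of the reservation idea (the class name, method names, and the 24 GB figure are illustrative; this is not the real `vram_manager.py` API):

```python
import threading

class VramLedger:
    """Toy VRAM accountant: services reserve budget before loading a model."""

    def __init__(self, total_mb: int):
        self.total_mb = total_mb
        self.used_mb = 0
        self._lock = threading.Lock()

    def reserve(self, service: str, mb: int) -> bool:
        # Refuse the reservation up front instead of letting CUDA OOM later.
        with self._lock:
            if self.used_mb + mb > self.total_mb:
                return False
            self.used_mb += mb
            return True

    def release(self, service: str, mb: int) -> None:
        with self._lock:
            self.used_mb = max(0, self.used_mb - mb)

ledger = VramLedger(total_mb=24_000)                 # RTX 3090-sized budget
assert ledger.reserve("mana-stt", 10_000)            # fits
assert not ledger.reserve("mana-image-gen", 16_000)  # would overcommit
```

A real implementation would also need cross-process coordination, since the three services are separate processes (e.g. a lock file or a small broker).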

## Port: 3020

## Quick Start
## Where it runs

```bash
cd services/mana-stt
./setup.sh            # Create venv + install
.venv/bin/uvicorn app.main:app --host 0.0.0.0 --port 3020
```

| Host | Path on disk | Entrypoint |
|------|--------------|------------|
| Windows GPU server (`192.168.178.11`) | `C:\mana\services\mana-stt\` | `service.pyw` via Scheduled Task `ManaSTT` |

Production runs via launchd on the Mac Mini — `install-service.sh` (single service) or `install-services.sh` (mana-stt + vllm-voxtral together).
Public URL: `https://gpu-stt.mana.how` (via Cloudflare Tunnel + Mac Mini gpu-proxy).

## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Liveness + which backends are loaded |
| GET | `/models` | List available STT models |
| POST | `/transcribe` | Whisper MLX (default, fastest local) |
| POST | `/transcribe/whisperx` | WhisperX with word-level timestamps + diarization |
| POST | `/transcribe/voxtral` | Local Voxtral (vLLM) |
| POST | `/transcribe/voxtral/api` | Mistral Voxtral API (cloud) |
| POST | `/transcribe/auto` | Tries WhisperX first, falls back to Whisper MLX |
| GET | `/models` | Available STT models |
| POST | `/transcribe` | Whisper (WhisperX, default) — multipart `file` + optional `language` |
| POST | `/transcribe/voxtral` | Local Voxtral via vLLM |
| POST | `/transcribe/auto` | Routing helper — picks the best backend for the input |

All `/transcribe*` endpoints accept multipart `file` upload + optional `language` form field. Auth via `Authorization: Bearer <jwt>` or `X-API-Key`.
All endpoints (except `/health`) require `Authorization: Bearer <token>`. Tokens are validated against `API_KEYS` (per-app keys) or `INTERNAL_API_KEY` (no rate limit), and JWTs from mana-auth are also accepted via `external_auth.py`.
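The token rules just described can be sketched like this (the function names are hypothetical; the real logic lives in `app/auth.py` and `app/external_auth.py`):

```python
def parse_api_keys(raw: str) -> dict:
    """Parse the API_KEYS format shown above: 'key:app,key:app'."""
    pairs = (item.split(":", 1) for item in raw.split(",") if item)
    return {key: app for key, app in pairs}

def resolve_token(token: str, api_keys: dict, internal_key: str):
    """Return (app_name, rate_limited) or None for an unknown token."""
    if token == internal_key:
        return ("internal", False)   # cross-service key, no rate limit
    if token in api_keys:
        return (api_keys[token], True)
    return None                      # would fall through to JWT validation

keys = parse_api_keys("sk-app1:app1,sk-app2:app2")
assert resolve_token("sk-app2", keys, "sk-internal") == ("app2", True)
assert resolve_token("sk-internal", keys, "sk-internal") == ("internal", False)
assert resolve_token("bogus", keys, "sk-internal") is None
```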

## Backends (`app/`)

| File | What it loads |
|------|---------------|
| `whisper_service.py` | Whisper Large V3 via MLX (local, default) |
| `whisper_service_cuda.py` | CUDA Whisper (only used on Windows GPU server) |
| `whisperx_service.py` | WhisperX with diarization (local, slower, richer output) |
| `voxtral_service.py` | Local Voxtral via vLLM (optional, needs the second launchd job) |
| `voxtral_api_service.py` | Mistral hosted Voxtral API (cloud) |
| `vllm_service.py` | vLLM client primitives shared with Voxtral |
| `auth.py` | API key auth (fallback path) |
| `external_auth.py` | JWT auth via mana-auth public key |
| `whisper_service.py` | WhisperX on CUDA (large-v3 + alignment + pyannote diarization) |
| `voxtral_service.py` | Local Voxtral via vLLM (slower start, richer multilingual) |
| `voxtral_api_service.py` | Mistral hosted Voxtral API (cloud, no GPU needed) |
| `vllm_service.py` | vLLM client primitives shared by Voxtral |
| `vram_manager.py` | Shared VRAM accounting — same module also used by mana-tts and mana-image-gen |
| `auth.py` | API-key auth (internal + per-app keys) |
| `external_auth.py` | JWT validation via mana-auth |

Backends are loaded lazily during the FastAPI lifespan and reported by `/health`. Missing dependencies (e.g. CUDA on Mac) are tolerated — the service starts without them.
Backends are loaded lazily during the FastAPI lifespan and reported by `/health`.

## Configuration

Reads from `services/mana-stt/.env` (loaded by the launchd plist's `set -a; source .env; set +a`). Relevant variables:
## Configuration (`.env` on the Windows GPU box)

```env
PORT=3020
MANA_AUTH_URL=http://localhost:3001   # JWKS source for JWT verification
MISTRAL_API_KEY=...                   # only needed for /transcribe/voxtral/api
STT_API_KEY=...                       # legacy API key fallback
WHISPER_MODEL=large-v3
WHISPER_DEVICE=cuda
WHISPER_COMPUTE_TYPE=float16
WHISPER_DEFAULT_LANGUAGE=de
PRELOAD_MODELS=true
USE_VLLM=false
HF_TOKEN=...                          # required for pyannote diarization models
REQUIRE_AUTH=true
API_KEYS=sk-app1:app1,sk-app2:app2
INTERNAL_API_KEY=...                  # cross-service, no rate limit
CORS_ORIGINS=https://mana.how,https://chat.mana.how
```
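For illustration, a tiny loader that applies the defaults documented in the README to an environment mapping (the key names match the `.env` above; the helper itself is hypothetical, not part of the service):

```python
import os

def load_stt_config(env=os.environ) -> dict:
    """Pick the STT-relevant keys out of an env mapping, with defaults."""
    def flag(name: str, default: str) -> bool:
        return env.get(name, default).strip().lower() == "true"

    return {
        "port": int(env.get("PORT", "3020")),
        "whisper_model": env.get("WHISPER_MODEL", "large-v3"),
        "whisper_device": env.get("WHISPER_DEVICE", "cuda"),
        "compute_type": env.get("WHISPER_COMPUTE_TYPE", "float16"),
        "default_language": env.get("WHISPER_DEFAULT_LANGUAGE", "de"),
        "preload_models": flag("PRELOAD_MODELS", "false"),
        "use_vllm": flag("USE_VLLM", "false"),
        "require_auth": flag("REQUIRE_AUTH", "true"),
    }

cfg = load_stt_config({"PORT": "3020", "PRELOAD_MODELS": "true"})
assert cfg["port"] == 3020 and cfg["preload_models"] is True
```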

## Operations

- **Logs**: launchd writes to `~/Library/Logs/mana-stt.{out,err}.log` (see plist)
- **Metrics**: Prometheus endpoint at `/metrics` if enabled in config; Grafana dashboard JSON checked in at `grafana-dashboard.json`
- **Restart**: `launchctl kickstart -k gui/$(id -u)/com.mana.mana-stt`

```powershell
# Status
Get-ScheduledTask -TaskName "ManaSTT" | Format-List TaskName, State
Get-NetTCPConnection -LocalPort 3020 -State Listen

# Restart
Stop-ScheduledTask -TaskName "ManaSTT"
Start-ScheduledTask -TaskName "ManaSTT"

# Logs
Get-Content C:\mana\services\mana-stt\service.log -Tail 50
```

## Reference

- `services/mana-stt/README.md` — user-facing setup, model download instructions, language coverage
- `docs/LOCAL_STT_MODELS.md` — WER comparisons, model size/quality tradeoffs
- `docs/WINDOWS_GPU_SERVER_SETUP.md` — Windows box setup, scheduled tasks, firewall, Cloudflare tunnel
- `docs/LOCAL_STT_MODELS.md` — model comparisons (WER, latency, language coverage)
- `services/mana-stt/grafana-dashboard.json` — Prometheus metrics dashboard

services/mana-stt/README.md
@@ -1,185 +1,31 @@
# Mana STT Service

Speech-to-Text API service with **Whisper (Lightning MLX)** and **Voxtral (Mistral API)**.
Speech-to-Text API service running on the Windows GPU server (`mana-server-gpu`, RTX 3090). Wraps **WhisperX** (CUDA, large-v3 + word alignment + pyannote diarization), local **Voxtral via vLLM**, and the hosted **Mistral Voxtral API**.

Optimized for Mac Mini M4 (Apple Silicon).
For architecture, deployment, configuration, and operations see [`CLAUDE.md`](./CLAUDE.md) and [`docs/WINDOWS_GPU_SERVER_SETUP.md`](../../docs/WINDOWS_GPU_SERVER_SETUP.md).

## Architecture
## Port: 3020

```
            ┌─────────────────────┐
            │  mana-stt (3020)    │
            │  FastAPI            │
            └─────────┬───────────┘
                      │
    ┌─────────────────┼─────────────────┐
    ▼                 ▼                 ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Whisper      │ │ Voxtral API  │ │ vLLM         │
│ MLX (Local)  │ │ (Mistral)    │ │ (Optional)   │
└──────────────┘ └──────────────┘ └──────────────┘
```

## Public URL

## Features

- **Whisper Large V3** - Best quality, 99+ languages, German WER 6-9% (local, MLX)
- **Voxtral Mini** - Mistral API, speaker diarization support (cloud)
- **Apple Silicon Optimized** - Uses MLX for fast local inference
- **Automatic Fallback** - Falls back between backends automatically
- **REST API** - Simple HTTP endpoints for integration

## Quick Start

### Installation

```bash
cd services/mana-stt
./setup.sh
```

### Run Locally

```bash
source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3020
```

### Setup as System Service (Mac Mini)

```bash
./scripts/mac-mini/setup-stt.sh
```

`https://gpu-stt.mana.how` (via Cloudflare Tunnel + Mac Mini gpu-proxy)

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/health` | GET | Health check + which backends are loaded |
| `/models` | GET | List available models |
| `/transcribe` | POST | Whisper transcription |
| `/transcribe/voxtral` | POST | Voxtral transcription |
| `/transcribe/auto` | POST | Auto-select best model |
| `/transcribe` | POST | Whisper / WhisperX transcription |
| `/transcribe/voxtral` | POST | Voxtral transcription (local vLLM) |
| `/transcribe/auto` | POST | Auto-select best backend for the input |

## Usage Examples
All endpoints (except `/health`) require `Authorization: Bearer <token>`.

### Transcribe with Whisper (Recommended)
## Quick Test

```bash
curl -X POST http://localhost:3020/transcribe \
  -F "file=@recording.mp3" \
  -F "language=de"
```

Response:
```json
{
  "text": "Das ist ein Beispieltext...",
  "language": "de",
  "model": "whisper-large-v3-turbo"
}
```

### Transcribe with Voxtral

```bash
curl -X POST http://localhost:3020/transcribe/voxtral \
  -F "file=@recording.mp3" \
  -F "language=de"
```

### Auto-Select Model

```bash
curl -X POST http://localhost:3020/transcribe/auto \
  -F "file=@recording.mp3" \
  -F "prefer=whisper"
```

## Configuration

Environment variables:

| Variable | Default | Description |
|----------|---------|-------------|
| `PORT` | `3020` | API server port |
| `WHISPER_MODEL` | `large-v3` | Default Whisper model |
| `PRELOAD_MODELS` | `false` | Load models on startup |
| `CORS_ORIGINS` | `https://mana.how,...` | Allowed CORS origins |
| `MISTRAL_API_KEY` | - | Required for Voxtral API |
| `USE_VLLM` | `false` | Enable vLLM backend (experimental) |
| `VLLM_URL` | `http://localhost:8100` | vLLM server URL |

## Supported Audio Formats

- MP3, WAV, M4A, FLAC, OGG, WebM, MP4
- Max file size: 100MB
- Any sample rate (automatically resampled to 16kHz)

## Model Comparison

| Model | German WER | Speed | VRAM | License |
|-------|------------|-------|------|---------|
| Whisper Large V3 Turbo | 6-9% | Fast | ~6 GB | MIT |
| Voxtral Mini (3B) | 8-12% | Medium | ~4 GB | Apache 2.0 |

## Logs

```bash
# Service logs
tail -f /tmp/mana-stt.log

# Error logs
tail -f /tmp/mana-stt.error.log
```

## Troubleshooting

### Model Download Slow

First run downloads ~1.6 GB for Whisper and ~6 GB for Voxtral. Be patient.

### Out of Memory

Reduce batch size or use a smaller model:
```bash
export WHISPER_MODEL=medium
```

### MPS Not Available

Ensure PyTorch is installed with MPS support:
```bash
pip install torch torchvision torchaudio
python -c "import torch; print(torch.backends.mps.is_available())"
```

## Integration

### From Chat Backend (NestJS)

```typescript
const formData = new FormData();
formData.append('file', audioBuffer, 'recording.webm');
formData.append('language', 'de');

const response = await fetch('http://localhost:3020/transcribe', {
  method: 'POST',
  body: formData,
});

const { text } = await response.json();
```

### From SvelteKit Web

```typescript
const formData = new FormData();
formData.append('file', audioBlob, 'recording.webm');

const response = await fetch('https://gpu-stt.mana.how/transcribe', {
  method: 'POST',
  body: formData,
});

const { text } = await response.json();
```

```bash
curl -F "file=@audio.wav" -F "language=de" \
  -H "Authorization: Bearer $INTERNAL_API_KEY" \
  https://gpu-stt.mana.how/transcribe
```

services/mana-stt/com.mana.mana-stt.plist (deleted)
@@ -1,39 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.mana.mana-stt</string>

    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>-c</string>
        <string>cd /Users/mana/projects/mana-monorepo/services/mana-stt && set -a && source .env && set +a && .venv/bin/uvicorn app.main:app --host 0.0.0.0 --port 3020</string>
    </array>

    <key>WorkingDirectory</key>
    <string>/Users/mana/projects/mana-monorepo/services/mana-stt</string>

    <key>EnvironmentVariables</key>
    <dict>
        <key>PATH</key>
        <string>/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin</string>
    </dict>

    <key>RunAtLoad</key>
    <true/>

    <key>KeepAlive</key>
    <true/>

    <key>StandardOutPath</key>
    <string>/Users/mana/logs/mana-stt.log</string>

    <key>StandardErrorPath</key>
    <string>/Users/mana/logs/mana-stt.error.log</string>

    <key>ThrottleInterval</key>
    <integer>10</integer>
</dict>
</plist>

services/mana-stt/com.mana.vllm-voxtral.plist (deleted)
@@ -1,41 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.mana.vllm-voxtral</string>

    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>-c</string>
        <string>cd /Users/mana/projects/mana-monorepo/services/mana-stt && ./scripts/start-vllm-voxtral.sh</string>
    </array>

    <key>WorkingDirectory</key>
    <string>/Users/mana/projects/mana-monorepo/services/mana-stt</string>

    <key>EnvironmentVariables</key>
    <dict>
        <key>PATH</key>
        <string>/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin</string>
        <key>VLLM_PORT</key>
        <string>8100</string>
    </dict>

    <key>RunAtLoad</key>
    <true/>

    <key>KeepAlive</key>
    <true/>

    <key>StandardOutPath</key>
    <string>/Users/mana/logs/vllm-voxtral.log</string>

    <key>StandardErrorPath</key>
    <string>/Users/mana/logs/vllm-voxtral.error.log</string>

    <key>ThrottleInterval</key>
    <integer>30</integer>
</dict>
</plist>

services/mana-stt/install-service.sh (deleted)
@@ -1,45 +0,0 @@
#!/bin/bash
# Install mana-stt as a launchd service on macOS
# Run this script on the Mac Mini server

set -e

SERVICE_NAME="com.mana.mana-stt"
PLIST_FILE="$SERVICE_NAME.plist"
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
LAUNCH_AGENTS_DIR="$HOME/Library/LaunchAgents"
LOG_DIR="$HOME/logs"

echo "Installing mana-stt launchd service..."

# Create logs directory
mkdir -p "$LOG_DIR"

# Stop existing service if running
if launchctl list | grep -q "$SERVICE_NAME"; then
    echo "Stopping existing service..."
    launchctl unload "$LAUNCH_AGENTS_DIR/$PLIST_FILE" 2>/dev/null || true
fi

# Copy plist to LaunchAgents
cp "$SCRIPT_DIR/$PLIST_FILE" "$LAUNCH_AGENTS_DIR/"

# Load the service
echo "Loading service..."
launchctl load "$LAUNCH_AGENTS_DIR/$PLIST_FILE"

# Check status
sleep 2
if launchctl list | grep -q "$SERVICE_NAME"; then
    echo "Service installed and running!"
    echo ""
    echo "Useful commands:"
    echo "  View logs:    tail -f $LOG_DIR/mana-stt.log"
    echo "  View errors:  tail -f $LOG_DIR/mana-stt.error.log"
    echo "  Stop:         launchctl unload $LAUNCH_AGENTS_DIR/$PLIST_FILE"
    echo "  Start:        launchctl load $LAUNCH_AGENTS_DIR/$PLIST_FILE"
    echo "  Health check: curl http://localhost:3020/health"
else
    echo "ERROR: Service failed to start. Check logs at $LOG_DIR/mana-stt.error.log"
    exit 1
fi

services/mana-stt/install-services.sh (deleted)
@@ -1,84 +0,0 @@
#!/bin/bash
# Install mana-stt and vllm-voxtral as launchd services on macOS
# Run this script on the Mac Mini server

set -e

SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
LAUNCH_AGENTS_DIR="$HOME/Library/LaunchAgents"
LOG_DIR="$HOME/logs"

echo "============================================"
echo "Installing Mana STT Services"
echo "============================================"
echo ""

# Create logs directory
mkdir -p "$LOG_DIR"

install_service() {
    local service_name="$1"
    local plist_file="$service_name.plist"

    echo "Installing $service_name..."

    # Stop existing service if running
    if launchctl list | grep -q "$service_name"; then
        echo "  Stopping existing service..."
        launchctl unload "$LAUNCH_AGENTS_DIR/$plist_file" 2>/dev/null || true
    fi

    # Copy plist to LaunchAgents
    cp "$SCRIPT_DIR/$plist_file" "$LAUNCH_AGENTS_DIR/"

    # Load the service
    echo "  Loading service..."
    launchctl load "$LAUNCH_AGENTS_DIR/$plist_file"

    sleep 2
    if launchctl list | grep -q "$service_name"; then
        echo "  ✓ $service_name installed and running"
    else
        echo "  ✗ $service_name failed to start"
        return 1
    fi
}

# Install vLLM first (STT depends on it)
install_service "com.mana.vllm-voxtral"

# Wait for vLLM to initialize
echo ""
echo "Waiting for vLLM server to initialize..."
for i in {1..30}; do
    if curl -s http://localhost:8100/health > /dev/null 2>&1; then
        echo "  ✓ vLLM server is ready"
        break
    fi
    if [ $i -eq 30 ]; then
        echo "  ! vLLM server not responding yet (may still be loading model)"
    fi
    sleep 2
done

# Install STT service
echo ""
install_service "com.mana.mana-stt"

echo ""
echo "============================================"
echo "Installation complete!"
echo "============================================"
echo ""
echo "Services:"
echo "  vLLM Voxtral: http://localhost:8100"
echo "  Mana STT:     http://localhost:3020"
echo ""
echo "Useful commands:"
echo "  View vLLM logs: tail -f $LOG_DIR/vllm-voxtral.log"
echo "  View STT logs:  tail -f $LOG_DIR/mana-stt.log"
echo "  Health check:   curl http://localhost:3020/health"
echo ""
echo "Stop all:"
echo "  launchctl unload $LAUNCH_AGENTS_DIR/com.mana.vllm-voxtral.plist"
echo "  launchctl unload $LAUNCH_AGENTS_DIR/com.mana.mana-stt.plist"

services/mana-stt/scripts/setup-vllm.sh (deleted)
@@ -1,83 +0,0 @@
#!/bin/bash
# Setup vLLM for Voxtral on Mac Mini M4
#
# vLLM runs in CPU mode on macOS (no CUDA), but still provides
# the optimized inference pipeline for Voxtral models.
#
# Usage: ./scripts/setup-vllm.sh

set -e

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SERVICE_DIR="$(dirname "$SCRIPT_DIR")"
VENV_DIR="$SERVICE_DIR/.venv-vllm"

echo "============================================"
echo "vLLM Setup for Voxtral on Mac Mini M4"
echo "============================================"
echo ""

# Check Python version
PYTHON_VERSION=$(python3 --version 2>&1 | awk '{print $2}')
PYTHON_MAJOR=$(echo $PYTHON_VERSION | cut -d. -f1)
PYTHON_MINOR=$(echo $PYTHON_VERSION | cut -d. -f2)

if [[ "$PYTHON_MAJOR" -lt 3 ]] || [[ "$PYTHON_MAJOR" -eq 3 && "$PYTHON_MINOR" -lt 10 ]]; then
    echo "Error: Python 3.10+ required (found $PYTHON_VERSION)"
    exit 1
fi
echo "Python version: $PYTHON_VERSION"

# Create separate venv for vLLM (to avoid conflicts with whisper)
echo ""
echo "Creating virtual environment for vLLM..."
python3 -m venv "$VENV_DIR"
source "$VENV_DIR/bin/activate"

# Upgrade pip
pip install --upgrade pip --quiet

# Install vLLM with audio support
echo ""
echo "Installing vLLM with audio support..."
echo "This may take a few minutes..."

# Install uv for faster package installation
pip install uv --quiet

# Install vLLM with audio support (nightly for best Voxtral support)
uv pip install "vllm[audio]>=0.10.0" --extra-index-url https://wheels.vllm.ai/nightly 2>&1 || {
    echo "Nightly install failed, trying stable..."
    uv pip install "vllm[audio]>=0.10.0"
}

# Install mistral-common with audio
uv pip install "mistral-common[audio]>=1.8.1"

echo ""
echo "============================================"
echo "Installation complete!"
echo "============================================"
echo ""
echo "To start Voxtral Mini 3B server:"
echo "  source $VENV_DIR/bin/activate"
echo "  vllm serve mistralai/Voxtral-Mini-3B-2507 \\"
echo "    --tokenizer_mode mistral \\"
echo "    --config_format mistral \\"
echo "    --load_format mistral \\"
echo "    --host 0.0.0.0 \\"
echo "    --port 8100"
echo ""
echo "To start Voxtral Realtime 4B server:"
echo "  source $VENV_DIR/bin/activate"
echo "  vllm serve mistralai/Voxtral-Mini-4B-Realtime-2602 \\"
echo "    --host 0.0.0.0 \\"
echo "    --port 8100"
echo ""
echo "API Endpoint: http://localhost:8100/v1/audio/transcriptions"
echo ""
echo "Test with:"
echo "  curl http://localhost:8100/v1/audio/transcriptions \\"
echo "    -F file=@test.mp3 \\"
echo "    -F model=mistralai/Voxtral-Mini-3B-2507 \\"
echo "    -F language=de"

services/mana-stt/scripts/start-vllm-voxtral.sh (deleted)
@@ -1,41 +0,0 @@
#!/bin/bash
# Start vLLM server for Voxtral
#
# Usage: ./scripts/start-vllm-voxtral.sh [model]
#   model: "3b" (default) or "4b" for Realtime

set -e

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SERVICE_DIR="$(dirname "$SCRIPT_DIR")"
VENV_DIR="$SERVICE_DIR/.venv-vllm"
MODEL="${1:-3b}"
PORT="${VLLM_PORT:-8100}"

# Activate venv
source "$VENV_DIR/bin/activate"

echo "Starting vLLM Voxtral server..."
echo "Port: $PORT"

if [[ "$MODEL" == "4b" || "$MODEL" == "realtime" ]]; then
    echo "Model: Voxtral Mini 4B Realtime"
    exec vllm serve mistralai/Voxtral-Mini-4B-Realtime-2602 \
        --host 0.0.0.0 \
        --port "$PORT" \
        --max-model-len 4096 \
        --max-num-batched-tokens 4096 \
        --enforce-eager
else
    echo "Model: Voxtral Mini 3B"
    # CPU mode needs smaller context and batched tokens
    exec vllm serve mistralai/Voxtral-Mini-3B-2507 \
        --tokenizer_mode mistral \
        --config_format mistral \
        --load_format mistral \
        --host 0.0.0.0 \
        --port "$PORT" \
        --max-model-len 4096 \
        --max-num-batched-tokens 4096 \
        --enforce-eager
fi

services/mana-stt/setup.sh (deleted)
@@ -1,123 +0,0 @@
#!/bin/bash
# Mana STT Service Setup Script
# For Mac Mini M4 (Apple Silicon)

set -e

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
VENV_DIR="$SCRIPT_DIR/.venv"
PYTHON_VERSION="3.11"

echo "=============================================="
echo "  Mana STT Service Setup"
echo "  Whisper (Lightning MLX) + Voxtral"
echo "=============================================="
echo ""

# Check if running on macOS
if [[ "$(uname)" != "Darwin" ]]; then
    echo "Warning: This script is optimized for macOS (Apple Silicon)"
fi

# Check for Apple Silicon
if [[ "$(uname -m)" != "arm64" ]]; then
    echo "Warning: Not running on Apple Silicon. MLX optimizations won't work."
fi

# Check Python version
echo "1. Checking Python installation..."
if command -v python3.11 &> /dev/null; then
    PYTHON_CMD="python3.11"
elif command -v python3 &> /dev/null; then
    PYTHON_CMD="python3"
    PY_VERSION=$($PYTHON_CMD --version 2>&1 | cut -d' ' -f2 | cut -d'.' -f1,2)
    echo "   Found Python $PY_VERSION"
else
    echo "Error: Python 3 not found. Please install Python 3.11+"
    echo "  brew install python@3.11"
    exit 1
fi

# Create virtual environment
echo ""
echo "2. Creating virtual environment..."
if [ -d "$VENV_DIR" ]; then
    echo "   Virtual environment already exists at $VENV_DIR"
    read -p "   Recreate? (y/N) " -n 1 -r
    echo
    if [[ $REPLY =~ ^[Yy]$ ]]; then
        rm -rf "$VENV_DIR"
        $PYTHON_CMD -m venv "$VENV_DIR"
        echo "   Virtual environment recreated"
    fi
else
    $PYTHON_CMD -m venv "$VENV_DIR"
    echo "   Virtual environment created at $VENV_DIR"
fi

# Activate virtual environment
source "$VENV_DIR/bin/activate"

# Upgrade pip
echo ""
echo "3. Upgrading pip..."
pip install --upgrade pip wheel setuptools

# Install dependencies
echo ""
echo "4. Installing dependencies..."
echo "   This may take several minutes (downloading large models)..."

# Install PyTorch with MPS support first
pip install torch torchvision torchaudio

# Install MLX for Apple Silicon
pip install mlx

# Install other dependencies
pip install -r "$SCRIPT_DIR/requirements.txt"

# Install scipy for audio resampling (needed by Voxtral)
pip install scipy

echo ""
echo "5. Verifying installation..."

# Test imports
python -c "import torch; print(f'   PyTorch {torch.__version__} - MPS available: {torch.backends.mps.is_available()}')"
python -c "import mlx; print(f'   MLX installed')" 2>/dev/null || echo "   MLX not available (CPU fallback)"
python -c "import fastapi; print(f'   FastAPI {fastapi.__version__}')"

echo ""
echo "6. Downloading Whisper model (large-v3)..."
echo "   This will download ~2.9 GB on first run..."
# Pre-download the model
python -c "
from lightning_whisper_mlx import LightningWhisperMLX
print('   Initializing Whisper model...')
whisper = LightningWhisperMLX(model='large-v3', batch_size=12)
print('   Whisper model ready!')
" || echo "   Note: Model will be downloaded on first transcription request"

echo ""
echo "=============================================="
echo "  Setup Complete!"
echo "=============================================="
echo ""
echo "To start the STT service:"
echo ""
echo "  cd $SCRIPT_DIR"
echo "  source .venv/bin/activate"
echo "  uvicorn app.main:app --host 0.0.0.0 --port 3020"
echo ""
echo "Or use the systemd/launchd service (recommended for production):"
echo ""
echo "  ./scripts/mac-mini/setup-stt.sh"
echo ""
echo "API Endpoints:"
echo "  POST /transcribe         - Whisper transcription"
echo "  POST /transcribe/voxtral - Voxtral transcription"
echo "  POST /transcribe/auto    - Auto-select best model"
echo "  GET  /health             - Health check"
echo "  GET  /models             - List available models"
echo ""
@ -1,125 +1,115 @@
# CLAUDE.md - Mana TTS Service
# mana-tts

## Service Overview
Text-to-Speech microservice. Wraps Kokoro (English presets), Piper (German, local ONNX), and F5-TTS (voice cloning) behind a small FastAPI surface. Lives on the Windows GPU server (`mana-server-gpu`, RTX 3090).

Text-to-Speech microservice using MLX-optimized models for Apple Silicon:
> ⚠️ **Earlier history**: this directory used to contain MLX-optimized
> Mac-Mini code (`f5-tts-mlx`, `mlx-audio`, `setup.sh` with Apple Silicon
> checks, `com.mana.mana-tts.plist` launchd setup). All of that moved to
> the Windows GPU box and was removed from the repo. If you need the
> MLX path, see git history.

- **Port**: 3022
- **Framework**: Python + FastAPI
- **Models**: Kokoro-82M (fast), F5-TTS (voice cloning)
## Tech Stack

## Commands
| Layer | Technology |
|-------|------------|
| **Runtime** | Python 3.11 + uvicorn (Windows) |
| **Framework** | FastAPI |
| **English (preset)** | Kokoro-82M (`kokoro_service.py`) |
| **German (local)** | Piper ONNX with `kerstin_low.onnx` and `thorsten_medium.onnx` voices (`piper_service.py`) |
| **Voice cloning** | F5-TTS on CUDA (`f5_service.py`) |
| **Audio I/O** | `soundfile`, `pydub` |
| **Auth** | Per-key + internal-key API auth (`auth.py`) + JWT via mana-auth (`external_auth.py`) |
| **VRAM** | Shared `vram_manager.py` (same module as mana-stt + mana-image-gen) |
| **Process supervision** | Windows Scheduled Task `ManaTTS` (AtLogOn) |

```bash
# Setup
./setup.sh
## Port: 3022

# Development
source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3022 --reload
## Where it runs

# Production (Mac Mini)
../../scripts/mac-mini/setup-tts.sh
| Host | Path on disk | Entrypoint |
|------|--------------|------------|
| Windows GPU server (`192.168.178.11`) | `C:\mana\services\mana-tts\` | `service.pyw` via Scheduled Task `ManaTTS` |

# Test
curl http://localhost:3022/health
Public URL: `https://gpu-tts.mana.how`.

# English (Kokoro)
curl -X POST http://localhost:3022/synthesize/kokoro \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "af_heart"}' \
  --output test_en.wav
## API Endpoints

# German (Piper) - use /synthesize/auto
curl -X POST http://localhost:3022/synthesize/auto \
  -H "Content-Type: application/json" \
  -d '{"text": "Hallo Welt", "voice": "de_kerstin"}' \
  --output test_de.wav
| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Liveness + which backends are loaded |
| GET | `/models` | Available TTS models |
| GET | `/voices` | List all voices (preset + custom) |
| POST | `/voices` | Register a custom voice (reference audio + transcript) |
| DELETE | `/voices/{voice_id}` | Delete a custom voice |
| POST | `/synthesize/kokoro` | Kokoro synthesis (English presets) |
| POST | `/synthesize` | F5-TTS voice cloning |
| POST | `/synthesize/auto` | Routing helper — picks the right backend for the requested voice |

All non-health endpoints require `Authorization: Bearer <token>` (per-app key, internal key, or mana-auth JWT).

## Voices

### Kokoro-82M (English presets)
~300 MB download. 30+ preset English voices. Fast, no reference audio needed.

### Piper (German, local ONNX)
~63 MB per voice. 100% local, GDPR-compliant. Available:
- `de_kerstin` (female, default)
- `de_thorsten` (male)

Fallback to Edge TTS cloud voices if Piper isn't loaded.

### F5-TTS (voice cloning)
~6 GB. Requires reference audio + transcript. Higher quality, slower. Custom voices live in `voices/` (reference audio + transcript per voice ID).

## Configuration (`.env` on the Windows GPU box)

```env
PORT=3022
PRELOAD_MODELS=false
MAX_TEXT_LENGTH=1000
REQUIRE_AUTH=true
API_KEYS=sk-app1:app1,sk-app2:app2
INTERNAL_API_KEY=...
CORS_ORIGINS=https://mana.how,https://chat.mana.how
```
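As an aside, the `API_KEYS` value is a comma-separated list of `key:app` pairs. A minimal sketch of how that format decomposes in plain bash (the variable value is the example from above, not a real key; this is not code from the service):

```shell
# Illustration only: splitting the API_KEYS "key:app" pairs.
API_KEYS="sk-app1:app1,sk-app2:app2"
IFS=',' read -ra pairs <<< "$API_KEYS"
for pair in "${pairs[@]}"; do
  key="${pair%%:*}"   # bearer token a client would send
  app="${pair#*:}"    # app name requests are attributed to
  echo "$app -> $key"
done
```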

## File Structure
## Code layout

```
services/mana-tts/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI endpoints
│   ├── kokoro_service.py    # Kokoro TTS (English preset voices)
│   ├── piper_service.py     # Piper TTS (German voices, local)
│   ├── f5_service.py        # F5-TTS (voice cloning)
│   ├── voice_manager.py     # Custom voice registry
│   └── audio_utils.py       # Audio format conversion
├── piper_voices/            # Piper voice models (.onnx)
├── voices/                  # Custom F5 voice storage
├── mlx_models/              # MLX model cache
├── setup.sh                 # Setup script
├── requirements.txt
└── README.md
│   ├── main.py              # FastAPI endpoints
│   ├── kokoro_service.py    # Kokoro (English presets)
│   ├── piper_service.py     # Piper (German, local ONNX)
│   ├── f5_service.py        # F5-TTS (voice cloning, CUDA)
│   ├── voice_manager.py     # Custom voice registry
│   ├── audio_utils.py       # Format conversion, resampling
│   ├── auth.py              # API-key auth
│   ├── external_auth.py     # JWT validation via mana-auth
│   └── vram_manager.py      # Shared VRAM accountant
└── service.pyw              # Windows runner (used by ManaTTS scheduled task)
```

## API Endpoints
The Piper voice ONNX files live alongside the service on the GPU box (`C:\mana\services\mana-tts\piper_voices\*.onnx`) — too big to commit, downloaded once during setup.
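Re-fetching those voices on a fresh box might look like the dry-run below. The upstream repo and paths are an assumption (the `rhasspy/piper-voices` collection on Hugging Face); only the local filenames are taken from this document:

```shell
# Dry-run sketch: prints the download commands instead of running them.
# Upstream paths are assumed, not confirmed by this repo.
BASE="https://huggingface.co/rhasspy/piper-voices/resolve/main"
cmds=$(for spec in \
  "kerstin_low.onnx de/de_DE/kerstin/low/de_DE-kerstin-low.onnx" \
  "thorsten_medium.onnx de/de_DE/thorsten/medium/de_DE-thorsten-medium.onnx"; do
  set -- $spec                      # $1 = local name, $2 = upstream path
  echo "curl -L -o piper_voices/$1 $BASE/$2"
done)
printf '%s\n' "$cmds"
```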

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/health` | GET | Health check |
| `/models` | GET | Model info |
| `/voices` | GET | List all voices |
| `/voices` | POST | Register custom voice |
| `/voices/{id}` | DELETE | Delete custom voice |
| `/synthesize/kokoro` | POST | Kokoro synthesis |
| `/synthesize` | POST | F5-TTS voice cloning |
| `/synthesize/auto` | POST | Auto-select model |
## Operations

## Models
```powershell
# Status
Get-ScheduledTask -TaskName "ManaTTS" | Format-List TaskName, State
Get-NetTCPConnection -LocalPort 3022 -State Listen

### Kokoro-82M (English)
- ~300 MB download
- 30+ preset English voices
- Fast inference
- No reference audio needed
# Restart
Stop-ScheduledTask -TaskName "ManaTTS"
Start-ScheduledTask -TaskName "ManaTTS"

### Piper TTS (German)
- ~63 MB per voice model
- 100% local, GDPR-compliant
- Fast inference on CPU
- Available voices:
  - `de_kerstin` - Female (default)
  - `de_thorsten` - Male
- Fallback to Edge TTS (cloud) if Piper unavailable:
  - `de_katja` - Female (cloud)
  - `de_conrad` - Male (cloud)
  - `de_amala` - Female young (cloud)
  - `de_florian` - Male young (cloud)
# Logs
Get-Content C:\mana\services\mana-tts\service.log -Tail 50
```

### F5-TTS (Voice Cloning)
- ~6 GB download
- Voice cloning capability
- Requires reference audio + transcript
- Higher quality, slower
## Reference

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `PORT` | `3022` | Service port |
| `PRELOAD_MODELS` | `false` | Load on startup |
| `MAX_TEXT_LENGTH` | `1000` | Max chars |
| `CORS_ORIGINS` | (production URLs) | CORS config |
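Since `MAX_TEXT_LENGTH` caps a single request at 1000 characters, a longer text has to be split client-side before sending. A minimal bash sketch of that chunking (the sample text and the chunk loop are illustrative, not part of the service):

```shell
# Hypothetical client-side helper: split long input into <=1000-char chunks.
MAX=1000
text=$(printf 'x%.0s' {1..2500})   # stand-in for 2500 chars of real text
sent=0
while [ -n "$text" ]; do
  chunk="${text:0:MAX}"            # next chunk, at most MAX chars
  text="${text:MAX}"               # remainder
  # ...POST "$chunk" to /synthesize/auto here...
  sent=$((sent + 1))
done
echo "would send $sent requests"
```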

## Key Dependencies

- `fastapi` - Web framework
- `f5-tts-mlx` - Voice cloning model
- `mlx-audio` - Kokoro implementation
- `mlx` - Apple Silicon ML framework
- `piper-tts` - German TTS (local)
- `edge-tts` - German TTS fallback (cloud)
- `soundfile` - Audio I/O
- `pydub` - MP3 conversion

## Development Notes

- Models load lazily on first request (unless `PRELOAD_MODELS=true`)
- Custom voices stored in `voices/` with reference audio + transcript
- Singleton pattern for model instances
- Audio returned as raw bytes with headers for metadata
- `docs/WINDOWS_GPU_SERVER_SETUP.md` — Windows box setup, scheduled tasks, firewall, Cloudflare tunnel
- `docs/PORT_SCHEMA.md` — port assignments across services

@ -1,237 +1,36 @@
# Mana TTS

Text-to-Speech microservice with voice cloning support, optimized for Apple Silicon.
Text-to-Speech microservice running on the Windows GPU server (`mana-server-gpu`, RTX 3090). Wraps **Kokoro** (English presets), **Piper** (German, local ONNX), and **F5-TTS** (CUDA voice cloning).

## Features
For architecture, deployment, configuration, and operations see [`CLAUDE.md`](./CLAUDE.md) and [`docs/WINDOWS_GPU_SERVER_SETUP.md`](../../docs/WINDOWS_GPU_SERVER_SETUP.md).

- **Kokoro TTS**: Fast preset voices (~300 MB model)
- **F5-TTS**: Voice cloning with reference audio (~6 GB model)
- **MLX Optimized**: Runs efficiently on Apple Silicon
- **REST API**: FastAPI with OpenAPI documentation
## Port: 3022

## Quick Start
## Public URL

### Setup

```bash
# Run setup script
./setup.sh

# Or manually
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

### Start Service

```bash
source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3022
```

### Test

```bash
# Health check
curl http://localhost:3022/health

# Synthesize with Kokoro
curl -X POST http://localhost:3022/synthesize/kokoro \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "af_heart"}' \
  --output test.wav

# Play audio (macOS)
afplay test.wav
```
`https://gpu-tts.mana.how` (via Cloudflare Tunnel + Mac Mini gpu-proxy)

## API Endpoints

### Health & Info

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/models` | GET | Available models |
| `/voices` | GET | All available voices |

### Synthesis

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/synthesize/kokoro` | POST | Kokoro preset voices |
| `/health` | GET | Health check + which backends are loaded |
| `/models` | GET | List available models |
| `/voices` | GET | List preset + custom voices |
| `/voices` | POST | Register a custom voice (reference audio + transcript) |
| `/voices/{id}` | DELETE | Delete a custom voice |
| `/synthesize/kokoro` | POST | Kokoro (English presets) |
| `/synthesize` | POST | F5-TTS voice cloning |
| `/synthesize/auto` | POST | Auto-select model |
| `/synthesize/auto` | POST | Auto-select best backend for the requested voice |

### Voice Management
All non-health endpoints require `Authorization: Bearer <token>`.

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/voices` | POST | Register custom voice |
| `/voices/{id}` | DELETE | Delete custom voice |

## Synthesis Examples

### Kokoro (Fast Preset Voices)
## Quick Test

```bash
curl -X POST http://localhost:3022/synthesize/kokoro \
curl -X POST https://gpu-tts.mana.how/synthesize/kokoro \
  -H "Authorization: Bearer $INTERNAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Welcome to Mana TTS, your personal voice synthesis service.",
    "voice": "af_heart",
    "speed": 1.0,
    "output_format": "wav"
  }' \
  --output output.wav
  -d '{"text":"Hello world","voice":"af_heart"}' \
  --output test.wav
```

### F5-TTS (Voice Cloning)

```bash
# With reference audio upload
curl -X POST http://localhost:3022/synthesize \
  -F "text=Hello, this is a cloned voice speaking." \
  -F "reference_audio=@reference.wav" \
  -F "reference_text=This is what the reference audio says." \
  -F "output_format=wav" \
  --output cloned.wav

# With registered voice
curl -X POST http://localhost:3022/synthesize \
  -F "text=Hello from my registered voice." \
  -F "voice_id=my_custom_voice" \
  --output output.wav
```

### Auto-Select

```bash
# Uses Kokoro for preset voices, F5-TTS for custom
curl -X POST http://localhost:3022/synthesize/auto \
  -H "Content-Type: application/json" \
  -d '{"text": "Auto-selected synthesis", "voice": "af_bella"}' \
  --output output.wav
```

## Available Kokoro Voices

### American Female
- `af_heart` - Warm, emotional (default)
- `af_alloy` - Neutral, professional
- `af_bella` - Friendly, approachable
- `af_jessica` - Confident, clear
- `af_nicole` - Bright, energetic
- `af_nova` - Modern, dynamic
- `af_sarah` - Warm, conversational
- ... and more

### American Male
- `am_adam` - Deep, authoritative
- `am_echo` - Resonant, clear
- `am_eric` - Professional, neutral
- `am_michael` - Warm, trustworthy
- ... and more

### British Female
- `bf_alice` - Refined, elegant
- `bf_emma` - Clear, professional
- `bf_lily` - Soft, gentle

### British Male
- `bm_daniel` - Classic, authoritative
- `bm_fable` - Storyteller, expressive
- `bm_george` - Traditional, clear

## Voice Registration

Register a custom voice for F5-TTS voice cloning:

```bash
curl -X POST http://localhost:3022/voices \
  -F "voice_id=my_voice" \
  -F "name=My Custom Voice" \
  -F "description=A sample voice for testing" \
  -F "transcript=Hello, this is the text spoken in the reference audio." \
  -F "reference_audio=@my_reference.wav"
```

Pre-defined voices can also be placed in the `voices/` directory:

```
voices/
└── my_voice/
    ├── reference.wav      # Reference audio (required)
    ├── transcript.txt     # Transcript of reference (required)
    └── metadata.json      # Name and description (optional)
```
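Scaffolding such a pre-defined voice could look like this. `demo_voice` and the metadata fields are made-up example values, and the required `reference.wav` is real audio you have to supply yourself:

```shell
# Sketch: pre-seed a voice directory matching the layout above.
mkdir -p voices/demo_voice
printf 'Hello, this is the text spoken in the reference audio.\n' \
  > voices/demo_voice/transcript.txt
printf '{"name": "Demo Voice", "description": "Example entry"}\n' \
  > voices/demo_voice/metadata.json
# ...copy your real reference audio to voices/demo_voice/reference.wav...
ls voices/demo_voice
```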

## Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| `PORT` | `3022` | API port |
| `PRELOAD_MODELS` | `false` | Load models on startup |
| `MAX_TEXT_LENGTH` | `1000` | Max characters per request |
| `CORS_ORIGINS` | `https://mana.how,...` | Allowed CORS origins |
| `F5_MODEL` | `lucasnewman/f5-tts-mlx` | F5-TTS model |
| `KOKORO_MODEL` | `mlx-community/Kokoro-82M-bf16` | Kokoro model |

## Mac Mini Deployment

```bash
# Install and start as launchd service
../../scripts/mac-mini/setup-tts.sh

# Service management
launchctl list | grep com.mana.tts
launchctl unload ~/Library/LaunchAgents/com.mana.tts.plist
launchctl load ~/Library/LaunchAgents/com.mana.tts.plist

# View logs
tail -f /tmp/mana-tts.log
```

## Requirements

- Python 3.10+
- macOS with Apple Silicon (recommended)
- ~7 GB disk space for models
- 16 GB RAM recommended
- ffmpeg (for MP3 output)

## Troubleshooting

### Models Not Loading

```bash
# Check MLX installation
python -c "import mlx; print(mlx.__version__)"

# Check mlx-audio
python -c "import mlx_audio; print('OK')"

# Check f5-tts-mlx
python -c "from f5_tts_mlx import F5TTS; print('OK')"
```

### MP3 Output Not Working

```bash
# Install ffmpeg
brew install ffmpeg

# Verify
ffmpeg -version
```

### Memory Issues

- Reduce `MAX_TEXT_LENGTH` for less memory usage
- Set `PRELOAD_MODELS=false` for lazy loading
- F5-TTS requires ~6 GB, Kokoro ~500 MB

## API Documentation

When running, visit http://localhost:3022/docs for interactive API documentation.

@ -1,39 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.mana.mana-tts</string>

    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>-c</string>
        <string>cd /Users/mana/projects/mana-monorepo/services/mana-tts && set -a && source .env && set +a && .venv/bin/uvicorn app.main:app --host 0.0.0.0 --port 3022</string>
    </array>

    <key>WorkingDirectory</key>
    <string>/Users/mana/projects/mana-monorepo/services/mana-tts</string>

    <key>EnvironmentVariables</key>
    <dict>
        <key>PATH</key>
        <string>/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin</string>
    </dict>

    <key>RunAtLoad</key>
    <true/>

    <key>KeepAlive</key>
    <true/>

    <key>StandardOutPath</key>
    <string>/Users/mana/logs/mana-tts.log</string>

    <key>StandardErrorPath</key>
    <string>/Users/mana/logs/mana-tts.error.log</string>

    <key>ThrottleInterval</key>
    <integer>10</integer>
</dict>
</plist>
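For a Mac Mini that still has any of the now-removed LaunchAgents installed, a cleanup along the lines of the snippet mentioned in the commit message could look like this dry-run (labels taken from the deleted plists; it only prints the commands, so drop the `echo`s to actually unload and delete):

```shell
# Dry-run cleanup of the removed Mac-Mini LaunchAgents.
for label in com.mana.mana-tts com.mana.mana-stt \
             com.mana.vllm-voxtral com.mana.image-gen; do
  plist="$HOME/Library/LaunchAgents/$label.plist"
  echo "launchctl unload $plist"
  echo "rm -f $plist"
done
```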
@ -1,45 +0,0 @@
#!/bin/bash
# Install mana-tts as a launchd service on macOS
# Run this script on the Mac Mini server

set -e

SERVICE_NAME="com.mana.mana-tts"
PLIST_FILE="$SERVICE_NAME.plist"
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
LAUNCH_AGENTS_DIR="$HOME/Library/LaunchAgents"
LOG_DIR="$HOME/logs"

echo "Installing mana-tts launchd service..."

# Create logs directory
mkdir -p "$LOG_DIR"

# Stop existing service if running
if launchctl list | grep -q "$SERVICE_NAME"; then
    echo "Stopping existing service..."
    launchctl unload "$LAUNCH_AGENTS_DIR/$PLIST_FILE" 2>/dev/null || true
fi

# Copy plist to LaunchAgents
cp "$SCRIPT_DIR/$PLIST_FILE" "$LAUNCH_AGENTS_DIR/"

# Load the service
echo "Loading service..."
launchctl load "$LAUNCH_AGENTS_DIR/$PLIST_FILE"

# Check status
sleep 2
if launchctl list | grep -q "$SERVICE_NAME"; then
    echo "Service installed and running!"
    echo ""
    echo "Useful commands:"
    echo "  View logs:    tail -f $LOG_DIR/mana-tts.log"
    echo "  View errors:  tail -f $LOG_DIR/mana-tts.error.log"
    echo "  Stop:         launchctl unload $LAUNCH_AGENTS_DIR/$PLIST_FILE"
    echo "  Start:        launchctl load $LAUNCH_AGENTS_DIR/$PLIST_FILE"
    echo "  Health check: curl http://localhost:3022/health"
else
    echo "ERROR: Service failed to start. Check logs at $LOG_DIR/mana-tts.error.log"
    exit 1
fi

@ -1,150 +0,0 @@
#!/bin/bash
# Setup script for Mana TTS service
# Optimized for Apple Silicon (MLX)

set -e

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
VENV_DIR="$SCRIPT_DIR/.venv"
PYTHON_VERSION="3.11"

echo "=========================================="
echo "Mana TTS Setup"
echo "=========================================="
echo ""

# Check platform
if [[ "$(uname)" != "Darwin" ]]; then
    echo "Warning: This service is optimized for macOS with Apple Silicon."
    echo "Some features may not work on other platforms."
    echo ""
fi

# Check for Apple Silicon
if [[ "$(uname -m)" != "arm64" ]]; then
    echo "Warning: This service is optimized for Apple Silicon (arm64)."
    echo "Performance may be reduced on Intel Macs."
    echo ""
fi

# Find Python
if command -v python3.11 &> /dev/null; then
    PYTHON_CMD="python3.11"
elif command -v python3 &> /dev/null; then
    PYTHON_CMD="python3"
else
    echo "Error: Python 3 not found. Please install Python 3.11 or later."
    exit 1
fi

echo "Using Python: $PYTHON_CMD"
$PYTHON_CMD --version
echo ""

# Check Python version
PYTHON_MAJOR=$($PYTHON_CMD -c "import sys; print(sys.version_info.major)")
PYTHON_MINOR=$($PYTHON_CMD -c "import sys; print(sys.version_info.minor)")

if [[ $PYTHON_MAJOR -lt 3 ]] || [[ $PYTHON_MINOR -lt 10 ]]; then
    echo "Error: Python 3.10 or later required. Found $PYTHON_MAJOR.$PYTHON_MINOR"
    exit 1
fi

# Create or recreate virtual environment
if [[ -d "$VENV_DIR" ]]; then
    echo "Virtual environment exists at $VENV_DIR"
    read -p "Recreate it? (y/N) " -n 1 -r
    echo ""
    if [[ $REPLY =~ ^[Yy]$ ]]; then
        echo "Removing existing virtual environment..."
        rm -rf "$VENV_DIR"
        echo "Creating new virtual environment..."
        $PYTHON_CMD -m venv "$VENV_DIR"
    fi
else
    echo "Creating virtual environment..."
    $PYTHON_CMD -m venv "$VENV_DIR"
fi

# Activate virtual environment
echo "Activating virtual environment..."
source "$VENV_DIR/bin/activate"

# Upgrade pip
echo ""
echo "Upgrading pip..."
pip install --upgrade pip

# Install dependencies
echo ""
echo "Installing dependencies..."
pip install -r "$SCRIPT_DIR/requirements.txt"

# Check for ffmpeg (for MP3 support)
echo ""
echo "Checking for ffmpeg (required for MP3 output)..."
if command -v ffmpeg &> /dev/null; then
    echo "ffmpeg found: $(which ffmpeg)"
else
    echo "Warning: ffmpeg not found. MP3 output will not work."
    echo "Install with: brew install ffmpeg"
fi

# Verify installations
echo ""
echo "Verifying installations..."

# Test FastAPI
python -c "import fastapi; print(f'FastAPI {fastapi.__version__}')" || {
    echo "Error: FastAPI not installed correctly"
    exit 1
}

# Test soundfile
python -c "import soundfile; print(f'soundfile {soundfile.__version__}')" || {
    echo "Error: soundfile not installed correctly"
    exit 1
}

# Test MLX (on Apple Silicon)
if [[ "$(uname -m)" == "arm64" ]]; then
    python -c "import mlx; print(f'MLX {mlx.__version__}')" || {
        echo "Warning: MLX not installed correctly. TTS may not work."
    }
fi

# Test mlx-audio
python -c "import mlx_audio; print('mlx-audio installed')" 2>/dev/null || {
    echo "Warning: mlx-audio not imported successfully."
    echo "You may need to install it manually or models won't load."
}

# Create directories
echo ""
echo "Creating required directories..."
mkdir -p "$SCRIPT_DIR/voices"
mkdir -p "$SCRIPT_DIR/mlx_models"

echo ""
echo "=========================================="
echo "Setup Complete!"
echo "=========================================="
echo ""
echo "To start the service:"
echo ""
echo "  cd $SCRIPT_DIR"
echo "  source .venv/bin/activate"
echo "  uvicorn app.main:app --host 0.0.0.0 --port 3022"
echo ""
echo "Or for development with auto-reload:"
echo ""
echo "  uvicorn app.main:app --host 0.0.0.0 --port 3022 --reload"
echo ""
echo "Test the service:"
echo ""
echo "  curl http://localhost:3022/health"
echo ""
echo "For Mac Mini deployment, run:"
echo ""
echo "  ./../../scripts/mac-mini/setup-tts.sh"
echo ""