The Mac Mini hasn't run mana-llm/stt/tts/image-gen for a while; those services live on the Windows GPU server now. The Mac-targeted installers, plists, and platform-checking setup scripts have been sitting in the repo as cargo-cult, suggesting Mac Mini deployment is still a real option. It isn't.

Removed (Mac-Mini deployment infrastructure):

services/mana-stt/
- com.mana.mana-stt.plist (LaunchAgent)
- com.mana.vllm-voxtral.plist (LaunchAgent for the abandoned local Voxtral experiment)
- install-service.sh (single-service launchd installer)
- install-services.sh (mana-stt + vllm-voxtral installer)
- setup.sh (Mac arm64 installer)
- scripts/setup-vllm.sh (vLLM-Voxtral setup)
- scripts/start-vllm-voxtral.sh

services/mana-tts/
- com.mana.mana-tts.plist
- install-service.sh
- setup.sh (Mac arm64 installer)

scripts/mac-mini/
- setup-image-gen.sh (Mac flux2.c launchd installer)
- setup-stt.sh
- setup-tts.sh
- launchd/com.mana.image-gen.plist
- launchd/com.mana.mana-stt.plist
- launchd/com.mana.mana-tts.plist

setup-tts-bot.sh stays: it's the Matrix TTS bot installer (Synapse side), not the mana-tts service.

Updated:
- services/mana-stt/CLAUDE.md, README.md: fully rewritten for the Windows GPU reality (CUDA WhisperX, Scheduled Task ManaSTT, .env keys matching the actual production .env on the box)
- services/mana-tts/CLAUDE.md, README.md: same treatment, documenting Kokoro/Piper/F5-TTS on the Windows GPU under Scheduled Task ManaTTS
- scripts/mac-mini/README.md: dropped the STT setup section, replaced with a pointer to docs/WINDOWS_GPU_SERVER_SETUP.md and the per-service CLAUDE.md files
- docs/MAC_MINI_SERVER.md: expanded the "deactivated launchagents" list to mention the now-removed plists, added the full GPU service port table with public URLs, added a cleanup snippet for any old plists still installed on a Mac Mini somewhere
mana-stt
Speech-to-Text microservice. Wraps Whisper (CUDA, with WhisperX for word-level timestamps + diarization), local Voxtral via vLLM, and Mistral's hosted Voxtral API behind a small FastAPI surface. Lives on the Windows GPU server (mana-server-gpu, RTX 3090).
⚠️ Earlier history: this directory used to contain Mac-Mini-targeted code (Whisper Lightning MLX, com.mana.mana-stt.plist launchd setup, setup.sh with Apple-Silicon checks). The service has since moved to the Windows GPU box, and the Mac-targeted code was removed from the repo. If you're looking for the MLX path, see git history.
Tech Stack
| Layer | Technology |
|---|---|
| Runtime | Python 3.11 + uvicorn (Windows) |
| Framework | FastAPI |
| Whisper | whisperx on CUDA (large-v3 + word alignment + pyannote diarization) |
| Voxtral (local) | vLLM serving Voxtral 3B/4B/24B (vllm_service.py) |
| Voxtral (cloud) | Mistral API (voxtral_api_service.py) |
| Auth | Per-key + internal-key API auth (app/auth.py, JWT via mana-auth in app/external_auth.py) |
| VRAM | Shared vram_manager.py accountant — coordinated with mana-tts and mana-image-gen so multiple GPU services don't OOM each other |
| Process supervision | Windows Scheduled Task ManaSTT (AtLogOn) |
Port: 3020
Where it runs
| Host | Path on disk | Entrypoint |
|---|---|---|
| Windows GPU server (192.168.178.11) | C:\mana\services\mana-stt\ | service.pyw via Scheduled Task ManaSTT |
Public URL: https://gpu-stt.mana.how (via Cloudflare Tunnel + Mac Mini gpu-proxy).
API Endpoints
| Method | Path | Description |
|---|---|---|
| GET | /health | Liveness + which backends are loaded |
| GET | /models | Available STT models |
| POST | /transcribe | Whisper (WhisperX, default): multipart file + optional language |
| POST | /transcribe/voxtral | Local Voxtral via vLLM |
| POST | /transcribe/auto | Routing helper: picks the best backend for the input |
All endpoints (except /health) require Authorization: Bearer <token>. Tokens are validated against API_KEYS (per-app keys) or INTERNAL_API_KEY (no rate limit), and JWTs from mana-auth are also accepted via external_auth.py.
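As a rough illustration of the request shape described above, here is a minimal client sketch that builds an authenticated multipart POST to /transcribe using only the Python standard library. The form field names `file` and `language` are assumptions (the table only says "multipart file + optional language"), and `build_transcribe_request` is a hypothetical helper, not part of the service.

```python
import uuid
import urllib.request
from typing import Optional

BASE_URL = "https://gpu-stt.mana.how"  # public URL listed in this document


def build_transcribe_request(audio: bytes, filename: str, token: str,
                             language: Optional[str] = None,
                             base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to /transcribe.

    The field names "file" and "language" are assumed, not confirmed.
    """
    boundary = uuid.uuid4().hex
    parts = []
    if language is not None:
        # Plain text form field for the optional language hint.
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="language"\r\n\r\n'
             f"{language}\r\n").encode()
        )
    # Binary form field carrying the audio payload.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         f"Content-Type: application/octet-stream\r\n\r\n").encode()
        + audio + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        f"{base_url}/transcribe",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the response schema is not documented here, so inspect the returned body before relying on any particular field.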
Backends (app/)
| File | What it loads |
|---|---|
| whisper_service.py | WhisperX on CUDA (large-v3 + alignment + pyannote diarization) |
| voxtral_service.py | Local Voxtral via vLLM (slower start, richer multilingual) |
| voxtral_api_service.py | Mistral hosted Voxtral API (cloud, no GPU needed) |
| vllm_service.py | vLLM client primitives shared by Voxtral |
| vram_manager.py | Shared VRAM accounting; the same module is also used by mana-tts and mana-image-gen |
| auth.py | API-key auth (internal + per-app keys) |
| external_auth.py | JWT validation via mana-auth |
Backends are loaded lazily during the FastAPI lifespan and reported by /health.
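The load-on-first-use pattern and the /health report can be sketched roughly as follows. This is an illustrative stand-in, not the service's actual code: `BackendRegistry`, its methods, and the shape of the health payload are all hypothetical, and the loaders here are placeholders rather than real model constructors.

```python
from typing import Any, Callable, Dict


class BackendRegistry:
    """Hypothetical registry: each backend is constructed on first use."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], Any]] = {}
        self._loaded: Dict[str, Any] = {}

    def register(self, name: str, loader: Callable[[], Any]) -> None:
        # Store the factory only; nothing is loaded yet.
        self._loaders[name] = loader

    def get(self, name: str) -> Any:
        # Load once, then reuse the cached instance.
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]

    def preload(self) -> None:
        # Eagerly load everything, e.g. what a preload flag might trigger
        # at startup (an assumption about how such a flag would be used).
        for name in self._loaders:
            self.get(name)

    def health(self) -> Dict[str, Any]:
        # Report which registered backends are currently loaded.
        return {
            "status": "ok",
            "backends": {name: name in self._loaded for name in self._loaders},
        }


registry = BackendRegistry()
registry.register("whisper", lambda: object())   # placeholder loader
registry.register("voxtral", lambda: object())   # placeholder loader
```

Calling `registry.get("whisper")` and then `registry.health()` would show `whisper` as loaded and `voxtral` as not yet loaded.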
Configuration (.env on the Windows GPU box)
```env
PORT=3020
WHISPER_MODEL=large-v3
WHISPER_DEVICE=cuda
WHISPER_COMPUTE_TYPE=float16
WHISPER_DEFAULT_LANGUAGE=de
PRELOAD_MODELS=true
USE_VLLM=false
HF_TOKEN=...                    # required for pyannote diarization models
REQUIRE_AUTH=true
API_KEYS=sk-app1:app1,sk-app2:app2
INTERNAL_API_KEY=...            # cross-service, no rate limit
CORS_ORIGINS=https://mana.how,https://chat.mana.how
```
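To make the key settings concrete, here is a minimal sketch of how `API_KEYS` and `INTERNAL_API_KEY` might be interpreted. It assumes the `key:app` comma-separated format suggested by the sample values above; `parse_api_keys` and `classify_token` are hypothetical names, and the JWT path through mana-auth is not covered.

```python
import hmac
from typing import Dict, Optional


def parse_api_keys(raw: str) -> Dict[str, str]:
    """Parse "key:app,key:app" into {key: app} (format assumed from the sample)."""
    keys: Dict[str, str] = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue
        key, _, app = entry.partition(":")
        # Fall back to the key itself if no app name is given.
        keys[key] = app or key
    return keys


def classify_token(token: str, api_keys: Dict[str, str],
                   internal_key: Optional[str]) -> Optional[str]:
    """Return "internal", the matching app name, or None for an unknown token."""
    # compare_digest avoids leaking match position through timing.
    if internal_key and hmac.compare_digest(token.encode(), internal_key.encode()):
        return "internal"
    for key, app in api_keys.items():
        if hmac.compare_digest(token.encode(), key.encode()):
            return app
    return None
```

A caller could use the returned label to decide whether rate limiting applies, since the internal key is described as exempt from it.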
Operations
```powershell
# Status
Get-ScheduledTask -TaskName "ManaSTT" | Format-List TaskName, State
Get-NetTCPConnection -LocalPort 3020 -State Listen

# Restart
Stop-ScheduledTask -TaskName "ManaSTT"
Start-ScheduledTask -TaskName "ManaSTT"

# Logs
Get-Content C:\mana\services\mana-stt\service.log -Tail 50
```
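The listening-port check above runs on the Windows box itself. From another machine on the LAN, a roughly equivalent reachability probe can be sketched in Python; `is_listening` is a hypothetical helper, and it only confirms that a TCP connection succeeds, not that the service is healthy.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Host and port as listed in this document.
    print(is_listening("192.168.178.11", 3020))
```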
Reference
- docs/WINDOWS_GPU_SERVER_SETUP.md: Windows box setup, scheduled tasks, firewall, Cloudflare tunnel
- docs/LOCAL_STT_MODELS.md: model comparisons (WER, latency, language coverage)
- services/mana-stt/grafana-dashboard.json: Prometheus metrics dashboard