managarten/services/mana-stt/CLAUDE.md
Till JS f4347032ca chore(mac-mini): remove all AI service infrastructure (moved to Windows GPU)
The Mac Mini hasn't run mana-llm/stt/tts/image-gen for a while — those
services live on the Windows GPU server now. The Mac-targeted
installers, plists, and platform-checking setup scripts have been
sitting in the repo as cargo-cult leftovers, suggesting Mac Mini
deployment is still a real option. It isn't.

Removed (Mac-Mini deployment infrastructure):

services/mana-stt/
- com.mana.mana-stt.plist            (LaunchAgent)
- com.mana.vllm-voxtral.plist        (LaunchAgent for the abandoned local Voxtral experiment)
- install-service.sh                 (single-service launchd installer)
- install-services.sh                (mana-stt + vllm-voxtral installer)
- setup.sh                           (Mac arm64 installer)
- scripts/setup-vllm.sh              (vLLM-Voxtral setup)
- scripts/start-vllm-voxtral.sh

services/mana-tts/
- com.mana.mana-tts.plist
- install-service.sh
- setup.sh                           (Mac arm64 installer)

scripts/mac-mini/
- setup-image-gen.sh                 (Mac flux2.c launchd installer)
- setup-stt.sh
- setup-tts.sh
- launchd/com.mana.image-gen.plist
- launchd/com.mana.mana-stt.plist
- launchd/com.mana.mana-tts.plist

setup-tts-bot.sh stays — it's the Matrix TTS bot installer (Synapse
side), not the mana-tts service.

Updated:
- services/mana-stt/CLAUDE.md, README.md — fully rewritten for the
  Windows GPU reality (CUDA WhisperX, Scheduled Task ManaSTT, .env keys
  matching the actual production .env on the box)
- services/mana-tts/CLAUDE.md, README.md — same treatment, documenting
  Kokoro/Piper/F5-TTS on the Windows GPU under Scheduled Task ManaTTS
- scripts/mac-mini/README.md — dropped the STT setup section, replaced
  with a pointer to docs/WINDOWS_GPU_SERVER_SETUP.md and the per-service
  CLAUDE.md files
- docs/MAC_MINI_SERVER.md — expanded the "deactivated launchagents"
  list to mention the now-removed plists, added the full GPU service
  port table with public URLs, added a cleanup snippet for any old plists
  still installed on a Mac Mini somewhere
2026-04-08 13:06:40 +02:00


# mana-stt

Speech-to-Text microservice. Wraps Whisper (CUDA, with WhisperX for word-level timestamps + diarization), local Voxtral via vLLM, and Mistral's hosted Voxtral API behind a small FastAPI surface. Lives on the Windows GPU server (mana-server-gpu, RTX 3090).

⚠️ Earlier history: this directory used to contain Mac-Mini-targeted code (Whisper Lightning MLX, com.mana.mana-stt.plist launchd setup, setup.sh with Apple-Silicon checks). That all moved to the Windows GPU box and was removed from the repo. If you're looking for the MLX path, see git history.

## Tech Stack

| Layer | Technology |
|---|---|
| Runtime | Python 3.11 + uvicorn (Windows) |
| Framework | FastAPI |
| Whisper | whisperx on CUDA (large-v3 + word alignment + pyannote diarization) |
| Voxtral (local) | vLLM serving Voxtral 3B/4B/24B (`vllm_service.py`) |
| Voxtral (cloud) | Mistral API (`voxtral_api_service.py`) |
| Auth | Per-key + internal-key API auth (`app/auth.py`), JWT via mana-auth in `app/external_auth.py` |
| VRAM | Shared `vram_manager.py` accountant — coordinated with mana-tts and mana-image-gen so multiple GPU services don't OOM each other |
| Process supervision | Windows Scheduled Task `ManaSTT` (AtLogOn) |

Port: 3020

## Where it runs

| Host | Path on disk | Entrypoint |
|---|---|---|
| Windows GPU server (192.168.178.11) | `C:\mana\services\mana-stt\` | `service.pyw` via Scheduled Task `ManaSTT` |

Public URL: https://gpu-stt.mana.how (via Cloudflare Tunnel + Mac Mini gpu-proxy).

## API Endpoints

| Method | Path | Description |
|---|---|---|
| GET | `/health` | Liveness + which backends are loaded |
| GET | `/models` | Available STT models |
| POST | `/transcribe` | Whisper (WhisperX, default) — multipart `file` + optional `language` |
| POST | `/transcribe/voxtral` | Local Voxtral via vLLM |
| POST | `/transcribe/auto` | Routing helper — picks the best backend for the input |

All endpoints (except `/health`) require `Authorization: Bearer <token>`. Tokens are validated against `API_KEYS` (per-app keys) or `INTERNAL_API_KEY` (no rate limit), and JWTs from mana-auth are also accepted via `external_auth.py`.
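
A minimal Python client sketch for the Whisper endpoint. The multipart field names `file` and `language` follow the endpoint table above, but the response shape and defaults are assumptions, not verified against the service code:

```python
def auth_header(token: str) -> dict[str, str]:
    """Build the bearer header that every endpoint except /health requires."""
    return {"Authorization": f"Bearer {token}"}

def transcribe(audio_path: str, token: str,
               base_url: str = "https://gpu-stt.mana.how") -> dict:
    """POST an audio file to /transcribe and return the parsed JSON reply."""
    import requests  # third-party; imported here so the sketch stays self-contained
    with open(audio_path, "rb") as f:
        resp = requests.post(
            f"{base_url}/transcribe",
            headers=auth_header(token),
            files={"file": f},        # multipart upload
            data={"language": "de"},  # optional; omit to let the service decide
            timeout=300,              # long audio can take a while on WhisperX
        )
    resp.raise_for_status()
    return resp.json()
```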

## Backends (`app/`)

| File | What it loads |
|---|---|
| `whisper_service.py` | WhisperX on CUDA (large-v3 + alignment + pyannote diarization) |
| `voxtral_service.py` | Local Voxtral via vLLM (slower start, richer multilingual) |
| `voxtral_api_service.py` | Mistral hosted Voxtral API (cloud, no GPU needed) |
| `vllm_service.py` | vLLM client primitives shared by Voxtral |
| `vram_manager.py` | Shared VRAM accounting — same module also used by mana-tts and mana-image-gen |
| `auth.py` | API-key auth (internal + per-app keys) |
| `external_auth.py` | JWT validation via mana-auth |
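
The shared VRAM accounting deserves a sketch. The following is an illustrative pattern, not the actual `vram_manager.py` API: each service reserves a budget before loading a model and releases it on unload, so concurrent loads across mana-stt, mana-tts, and mana-image-gen fail fast instead of OOMing the GPU:

```python
import threading
from contextlib import contextmanager

class VRAMAccountant:
    """Illustrative shared VRAM budget (not the real vram_manager.py API)."""

    def __init__(self, total_mb: int):
        self.total_mb = total_mb
        self.used_mb = 0
        self._lock = threading.Lock()

    @contextmanager
    def reserve(self, mb: int, owner: str):
        """Reserve `mb` of VRAM for `owner`; raise instead of overcommitting."""
        with self._lock:
            if self.used_mb + mb > self.total_mb:
                raise MemoryError(f"{owner}: {mb} MB would exceed the budget")
            self.used_mb += mb
        try:
            yield
        finally:
            with self._lock:      # release on unload, even after an error
                self.used_mb -= mb
```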

Backends are loaded lazily during the FastAPI lifespan and reported by /health.
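
The lazy-loading idea reduces to a small registry (a sketch only — the real wiring lives in the FastAPI lifespan and the `app/` service modules): each backend has a loader that runs at most once, on first use, and `/health` can report whatever is currently loaded:

```python
from typing import Callable

class LazyBackends:
    """Load each backend at most once, on first use (illustrative sketch)."""

    def __init__(self, loaders: dict[str, Callable[[], object]]):
        self._loaders = loaders                 # name -> model-loading function
        self._loaded: dict[str, object] = {}    # filled lazily

    def get(self, name: str) -> object:
        if name not in self._loaded:            # first use triggers the load
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]

    def loaded(self) -> list[str]:
        """Names of backends loaded so far, e.g. for a /health response."""
        return sorted(self._loaded)
```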

## Configuration (`.env` on the Windows GPU box)

```ini
PORT=3020
WHISPER_MODEL=large-v3
WHISPER_DEVICE=cuda
WHISPER_COMPUTE_TYPE=float16
WHISPER_DEFAULT_LANGUAGE=de
PRELOAD_MODELS=true
USE_VLLM=false
HF_TOKEN=...                    # required for pyannote diarization models
REQUIRE_AUTH=true
API_KEYS=sk-app1:app1,sk-app2:app2
INTERNAL_API_KEY=...            # cross-service, no rate limit
CORS_ORIGINS=https://mana.how,https://chat.mana.how
```
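
For illustration, the `API_KEYS` value above is a comma-separated list of `key:app` pairs; a parser consistent with that format (the real one lives in `app/auth.py` and may differ) could look like:

```python
def parse_api_keys(raw: str) -> dict[str, str]:
    """Turn 'sk-app1:app1,sk-app2:app2' into {'sk-app1': 'app1', ...}."""
    pairs = (item.split(":", 1) for item in raw.split(",") if item.strip())
    return {key.strip(): app.strip() for key, app in pairs}
```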

## Operations

```powershell
# Status
Get-ScheduledTask -TaskName "ManaSTT" | Format-List TaskName, State
Get-NetTCPConnection -LocalPort 3020 -State Listen

# Restart
Stop-ScheduledTask -TaskName "ManaSTT"
Start-ScheduledTask -TaskName "ManaSTT"

# Logs
Get-Content C:\mana\services\mana-stt\service.log -Tail 50
```

## Reference

- docs/WINDOWS_GPU_SERVER_SETUP.md — Windows box setup, scheduled tasks, firewall, Cloudflare tunnel
- docs/LOCAL_STT_MODELS.md — model comparisons (WER, latency, language coverage)
- services/mana-stt/grafana-dashboard.json — Prometheus metrics dashboard