managarten/services/mana-stt/CLAUDE.md
Till JS f4347032ca chore(mac-mini): remove all AI service infrastructure (moved to Windows GPU)
The Mac Mini hasn't run mana-llm/stt/tts/image-gen for a while — those
services live on the Windows GPU server now. The Mac-targeted
installers, plists, and platform-checking setup scripts have been
sitting in the repo as cargo-cult leftovers, suggesting Mac Mini
deployment is still a real option. It isn't.

Removed (Mac-Mini deployment infrastructure):

services/mana-stt/
- com.mana.mana-stt.plist            (LaunchAgent)
- com.mana.vllm-voxtral.plist        (LaunchAgent for the abandoned local Voxtral experiment)
- install-service.sh                 (single-service launchd installer)
- install-services.sh                (mana-stt + vllm-voxtral installer)
- setup.sh                           (Mac arm64 installer)
- scripts/setup-vllm.sh              (vLLM-Voxtral setup)
- scripts/start-vllm-voxtral.sh

services/mana-tts/
- com.mana.mana-tts.plist
- install-service.sh
- setup.sh                           (Mac arm64 installer)

scripts/mac-mini/
- setup-image-gen.sh                 (Mac flux2.c launchd installer)
- setup-stt.sh
- setup-tts.sh
- launchd/com.mana.image-gen.plist
- launchd/com.mana.mana-stt.plist
- launchd/com.mana.mana-tts.plist

setup-tts-bot.sh stays — it's the Matrix TTS bot installer (Synapse
side), not the mana-tts service.

Updated:
- services/mana-stt/CLAUDE.md, README.md — fully rewritten for the
  Windows GPU reality (CUDA WhisperX, Scheduled Task ManaSTT, .env keys
  matching the actual production .env on the box)
- services/mana-tts/CLAUDE.md, README.md — same treatment, documenting
  Kokoro/Piper/F5-TTS on the Windows GPU under Scheduled Task ManaTTS
- scripts/mac-mini/README.md — dropped the STT setup section, replaced
  with a pointer to docs/WINDOWS_GPU_SERVER_SETUP.md and the per-service
  CLAUDE.md files
- docs/MAC_MINI_SERVER.md — expanded the "deactivated launchagents"
  list to mention the now-removed plists, added the full GPU service
  port table with public URLs, added a cleanup snippet for any old plists
  still installed on a Mac Mini somewhere
2026-04-08 13:06:40 +02:00


# mana-stt

Speech-to-Text microservice. Wraps Whisper (CUDA, with WhisperX for word-level timestamps + diarization), local Voxtral via vLLM, and Mistral's hosted Voxtral API behind a small FastAPI surface. Lives on the Windows GPU server (mana-server-gpu, RTX 3090).

⚠️ Earlier history: this directory used to contain Mac-Mini-targeted code (Whisper Lightning MLX, com.mana.mana-stt.plist launchd setup, setup.sh with Apple-Silicon checks). That all moved to the Windows GPU box and was removed from the repo. If you're looking for the MLX path, see git history.

## Tech Stack

| Layer | Technology |
|---|---|
| Runtime | Python 3.11 + uvicorn (Windows) |
| Framework | FastAPI |
| Whisper | whisperx on CUDA (large-v3 + word alignment + pyannote diarization) |
| Voxtral (local) | vLLM serving Voxtral 3B/4B/24B (`vllm_service.py`) |
| Voxtral (cloud) | Mistral API (`voxtral_api_service.py`) |
| Auth | Per-key + internal-key API auth (`app/auth.py`), JWT via mana-auth in `app/external_auth.py` |
| VRAM | Shared `vram_manager.py` accountant — coordinated with mana-tts and mana-image-gen so multiple GPU services don't OOM each other |
| Process supervision | Windows Scheduled Task `ManaSTT` (AtLogOn) |

Port: 3020

## Where it runs

| Host | Path on disk | Entrypoint |
|---|---|---|
| Windows GPU server (192.168.178.11) | `C:\mana\services\mana-stt\` | `service.pyw` via Scheduled Task `ManaSTT` |

Public URL: https://gpu-stt.mana.how (via Cloudflare Tunnel + Mac Mini gpu-proxy).

## API Endpoints

| Method | Path | Description |
|---|---|---|
| GET | `/health` | Liveness + which backends are loaded |
| GET | `/models` | Available STT models |
| POST | `/transcribe` | Whisper (WhisperX, default) — multipart `file` + optional `language` |
| POST | `/transcribe/voxtral` | Local Voxtral via vLLM |
| POST | `/transcribe/auto` | Routing helper — picks the best backend for the input |

All endpoints (except `/health`) require `Authorization: Bearer <token>`. Tokens are validated against `API_KEYS` (per-app keys) or `INTERNAL_API_KEY` (no rate limit), and JWTs from mana-auth are also accepted via `external_auth.py`.
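
A minimal Python client sketch for the Whisper endpoint. The multipart field names `file` and `language` follow the endpoint table above, but the response shape and defaults are assumptions, not verified against the service code:

```python
def auth_header(token: str) -> dict[str, str]:
    """Build the bearer header that every endpoint except /health requires."""
    return {"Authorization": f"Bearer {token}"}

def transcribe(audio_path: str, token: str,
               base_url: str = "https://gpu-stt.mana.how") -> dict:
    """POST an audio file to /transcribe and return the parsed JSON reply."""
    import requests  # third-party; imported here so the sketch stays self-contained
    with open(audio_path, "rb") as f:
        resp = requests.post(
            f"{base_url}/transcribe",
            headers=auth_header(token),
            files={"file": f},        # multipart upload
            data={"language": "de"},  # optional; omit to let the service decide
            timeout=300,              # long audio can take a while on WhisperX
        )
    resp.raise_for_status()
    return resp.json()
```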

## Backends (`app/`)

| File | What it loads |
|---|---|
| `whisper_service.py` | WhisperX on CUDA (large-v3 + alignment + pyannote diarization) |
| `voxtral_service.py` | Local Voxtral via vLLM (slower start, richer multilingual) |
| `voxtral_api_service.py` | Mistral hosted Voxtral API (cloud, no GPU needed) |
| `vllm_service.py` | vLLM client primitives shared by Voxtral |
| `vram_manager.py` | Shared VRAM accounting — same module also used by mana-tts and mana-image-gen |
| `auth.py` | API-key auth (internal + per-app keys) |
| `external_auth.py` | JWT validation via mana-auth |
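
The shared VRAM accounting deserves a sketch. The following is an illustrative pattern, not the actual `vram_manager.py` API: each service reserves a budget before loading a model and releases it on unload, so concurrent loads across mana-stt, mana-tts, and mana-image-gen fail fast instead of OOMing the GPU:

```python
import threading
from contextlib import contextmanager

class VRAMAccountant:
    """Illustrative shared VRAM budget (not the real vram_manager.py API)."""

    def __init__(self, total_mb: int):
        self.total_mb = total_mb
        self.used_mb = 0
        self._lock = threading.Lock()

    @contextmanager
    def reserve(self, mb: int, owner: str):
        """Reserve `mb` of VRAM for `owner`; raise instead of overcommitting."""
        with self._lock:
            if self.used_mb + mb > self.total_mb:
                raise MemoryError(f"{owner}: {mb} MB would exceed the budget")
            self.used_mb += mb
        try:
            yield
        finally:
            with self._lock:      # release on unload, even after an error
                self.used_mb -= mb
```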

Backends are loaded lazily during the FastAPI lifespan and reported by /health.
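
The lazy-loading idea reduces to a small registry (a sketch only — the real wiring lives in the FastAPI lifespan and the `app/` service modules): each backend has a loader that runs at most once, on first use, and `/health` can report whatever is currently loaded:

```python
from typing import Callable

class LazyBackends:
    """Load each backend at most once, on first use (illustrative sketch)."""

    def __init__(self, loaders: dict[str, Callable[[], object]]):
        self._loaders = loaders                 # name -> model-loading function
        self._loaded: dict[str, object] = {}    # filled lazily

    def get(self, name: str) -> object:
        if name not in self._loaded:            # first use triggers the load
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]

    def loaded(self) -> list[str]:
        """Names of backends loaded so far, e.g. for a /health response."""
        return sorted(self._loaded)
```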

## Configuration (`.env` on the Windows GPU box)

```ini
PORT=3020
WHISPER_MODEL=large-v3
WHISPER_DEVICE=cuda
WHISPER_COMPUTE_TYPE=float16
WHISPER_DEFAULT_LANGUAGE=de
PRELOAD_MODELS=true
USE_VLLM=false
HF_TOKEN=...                    # required for pyannote diarization models
REQUIRE_AUTH=true
API_KEYS=sk-app1:app1,sk-app2:app2
INTERNAL_API_KEY=...            # cross-service, no rate limit
CORS_ORIGINS=https://mana.how,https://chat.mana.how
```
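
For illustration, the `API_KEYS` value above is a comma-separated list of `key:app` pairs; a parser consistent with that format (the real one lives in `app/auth.py` and may differ) could look like:

```python
def parse_api_keys(raw: str) -> dict[str, str]:
    """Turn 'sk-app1:app1,sk-app2:app2' into {'sk-app1': 'app1', ...}."""
    pairs = (item.split(":", 1) for item in raw.split(",") if item.strip())
    return {key.strip(): app.strip() for key, app in pairs}
```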

## Operations

```powershell
# Status
Get-ScheduledTask -TaskName "ManaSTT" | Format-List TaskName, State
Get-NetTCPConnection -LocalPort 3020 -State Listen

# Restart
Stop-ScheduledTask -TaskName "ManaSTT"
Start-ScheduledTask -TaskName "ManaSTT"

# Logs
Get-Content C:\mana\services\mana-stt\service.log -Tail 50
```

## Reference

- docs/WINDOWS_GPU_SERVER_SETUP.md — Windows box setup, scheduled tasks, firewall, Cloudflare tunnel
- docs/LOCAL_STT_MODELS.md — model comparisons (WER, latency, language coverage)
- services/mana-stt/grafana-dashboard.json — Prometheus metrics dashboard