managarten/services/mana-stt/CLAUDE.md
Till JS f4347032ca chore(mac-mini): remove all AI service infrastructure (moved to Windows GPU)
The Mac Mini hasn't run mana-llm/stt/tts/image-gen for a while — those
services live on the Windows GPU server now. The Mac-targeted
installers, plists, and platform-checking setup scripts have been
sitting in the repo as cargo-cult artifacts, suggesting Mac Mini deployment is
still a real option. It isn't.

Removed (Mac-Mini deployment infrastructure):

services/mana-stt/
- com.mana.mana-stt.plist            (LaunchAgent)
- com.mana.vllm-voxtral.plist        (LaunchAgent for the abandoned local Voxtral experiment)
- install-service.sh                 (single-service launchd installer)
- install-services.sh                (mana-stt + vllm-voxtral installer)
- setup.sh                           (Mac arm64 installer)
- scripts/setup-vllm.sh              (vLLM-Voxtral setup)
- scripts/start-vllm-voxtral.sh

services/mana-tts/
- com.mana.mana-tts.plist
- install-service.sh
- setup.sh                           (Mac arm64 installer)

scripts/mac-mini/
- setup-image-gen.sh                 (Mac flux2.c launchd installer)
- setup-stt.sh
- setup-tts.sh
- launchd/com.mana.image-gen.plist
- launchd/com.mana.mana-stt.plist
- launchd/com.mana.mana-tts.plist

setup-tts-bot.sh stays — it's the Matrix TTS bot installer (Synapse
side), not the mana-tts service.

Updated:
- services/mana-stt/CLAUDE.md, README.md — fully rewritten for the
  Windows GPU reality (CUDA WhisperX, Scheduled Task ManaSTT, .env keys
  matching the actual production .env on the box)
- services/mana-tts/CLAUDE.md, README.md — same treatment, documenting
  Kokoro/Piper/F5-TTS on the Windows GPU under Scheduled Task ManaTTS
- scripts/mac-mini/README.md — dropped the STT setup section, replaced
  with a pointer to docs/WINDOWS_GPU_SERVER_SETUP.md and the per-service
  CLAUDE.md files
- docs/MAC_MINI_SERVER.md — expanded the "deactivated launchagents"
  list to mention the now-removed plists, added the full GPU service
  port table with public URLs, added a cleanup snippet for any old plists
  still installed on a Mac Mini somewhere
2026-04-08 13:06:40 +02:00


# mana-stt
Speech-to-Text microservice. Wraps Whisper (CUDA, with WhisperX for word-level timestamps + diarization), local Voxtral via vLLM, and Mistral's hosted Voxtral API behind a small FastAPI surface. Lives on the Windows GPU server (`mana-server-gpu`, RTX 3090).
> ⚠️ **Earlier history**: this directory used to contain Mac-Mini-targeted
> code (Whisper Lightning MLX, com.mana.mana-stt.plist launchd setup,
> setup.sh with Apple-Silicon checks). That all moved to the Windows
> GPU box and was removed from the repo. If you're looking for the MLX
> path, see git history.
## Tech Stack
| Layer | Technology |
|-------|------------|
| **Runtime** | Python 3.11 + uvicorn (Windows) |
| **Framework** | FastAPI |
| **Whisper** | `whisperx` on CUDA (large-v3 + word alignment + pyannote diarization) |
| **Voxtral (local)** | vLLM serving Voxtral 3B/4B/24B (`vllm_service.py`) |
| **Voxtral (cloud)** | Mistral API (`voxtral_api_service.py`) |
| **Auth** | Per-key + internal-key API auth (`app/auth.py`, JWT via mana-auth in `app/external_auth.py`) |
| **VRAM** | Shared `vram_manager.py` accountant — coordinated with mana-tts and mana-image-gen so multiple GPU services don't OOM each other |
| **Process supervision** | Windows Scheduled Task `ManaSTT` (AtLogOn) |
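
The shared VRAM accountant deserves a quick illustration: each GPU service reserves an estimated budget from a common ledger before loading a model and releases it on unload, so mana-stt, mana-tts, and mana-image-gen don't OOM each other. The class below is a minimal sketch of that idea, not the real `vram_manager.py` (names and footprint numbers are illustrative):

```python
import threading

class VramAccountant:
    """Toy sketch of a shared VRAM ledger (names and numbers hypothetical)."""

    def __init__(self, total_mib: int):
        self.total_mib = total_mib
        self.reserved: dict[str, int] = {}
        self._lock = threading.Lock()

    def free_mib(self) -> int:
        return self.total_mib - sum(self.reserved.values())

    def reserve(self, service: str, mib: int) -> bool:
        """Reserve a budget before loading a model; False means back off."""
        with self._lock:
            if mib > self.free_mib():
                return False  # caller should unload something or wait
            self.reserved[service] = self.reserved.get(service, 0) + mib
            return True

    def release(self, service: str) -> None:
        """Give the budget back after unloading the model."""
        with self._lock:
            self.reserved.pop(service, None)

# An RTX 3090 has 24 GiB of VRAM:
gpu = VramAccountant(total_mib=24_000)
gpu.reserve("mana-stt", 6_000)   # illustrative WhisperX large-v3 footprint
gpu.reserve("mana-tts", 4_000)
```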
## Port: 3020
## Where it runs
| Host | Path on disk | Entrypoint |
|------|--------------|------------|
| Windows GPU server (`192.168.178.11`) | `C:\mana\services\mana-stt\` | `service.pyw` via Scheduled Task `ManaSTT` |
Public URL: `https://gpu-stt.mana.how` (via Cloudflare Tunnel + Mac Mini gpu-proxy).
## API Endpoints
| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Liveness + which backends are loaded |
| GET | `/models` | Available STT models |
| POST | `/transcribe` | Whisper (WhisperX, default) — multipart `file` + optional `language` |
| POST | `/transcribe/voxtral` | Local Voxtral via vLLM |
| POST | `/transcribe/auto` | Routing helper — picks the best backend for the input |
All endpoints (except `/health`) require `Authorization: Bearer <token>`. Tokens are validated against `API_KEYS` (per-app keys) or `INTERNAL_API_KEY` (no rate limit), and JWTs from mana-auth are also accepted via `external_auth.py`.
## Backends (`app/`)
| File | What it loads |
|------|---------------|
| `whisper_service.py` | WhisperX on CUDA (large-v3 + alignment + pyannote diarization) |
| `voxtral_service.py` | Local Voxtral via vLLM (slower start, richer multilingual) |
| `voxtral_api_service.py` | Mistral hosted Voxtral API (cloud, no GPU needed) |
| `vllm_service.py` | vLLM client primitives shared by Voxtral |
| `vram_manager.py` | Shared VRAM accounting — same module also used by mana-tts and mana-image-gen |
| `auth.py` | API-key auth (internal + per-app keys) |
| `external_auth.py` | JWT validation via mana-auth |
Backends are loaded lazily during the FastAPI lifespan and reported by `/health`.
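
The lazy-loading pattern can be sketched as a registry that defers each backend's (potentially slow) model load until first use, while letting `/health` report which backends are currently resident. This is a simplified stand-in, not the actual module layout:

```python
from typing import Any, Callable

class BackendRegistry:
    """Simplified lazy-loading registry (illustrative, not the real code)."""

    def __init__(self):
        self._factories: dict[str, Callable[[], Any]] = {}
        self._loaded: dict[str, Any] = {}

    def register(self, name: str, factory: Callable[[], Any]) -> None:
        self._factories[name] = factory          # nothing loaded yet

    def get(self, name: str) -> Any:
        if name not in self._loaded:             # load on first request only
            self._loaded[name] = self._factories[name]()
        return self._loaded[name]

    def health(self) -> dict[str, bool]:
        """Roughly what /health could report: which backends are resident."""
        return {name: name in self._loaded for name in self._factories}

backends = BackendRegistry()
backends.register("whisper", lambda: "whisperx-large-v3 (stub)")
backends.register("voxtral", lambda: "vllm-voxtral (stub)")
```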
## Configuration (`.env` on the Windows GPU box)
```env
PORT=3020
WHISPER_MODEL=large-v3
WHISPER_DEVICE=cuda
WHISPER_COMPUTE_TYPE=float16
WHISPER_DEFAULT_LANGUAGE=de
PRELOAD_MODELS=true
USE_VLLM=false
HF_TOKEN=... # required for pyannote diarization models
REQUIRE_AUTH=true
API_KEYS=sk-app1:app1,sk-app2:app2
INTERNAL_API_KEY=... # cross-service, no rate limit
CORS_ORIGINS=https://mana.how,https://chat.mana.how
```
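
The `API_KEYS` value pairs each key with an app name (`key:app`, comma-separated). Parsing it, roughly what `app/auth.py` has to do before it can validate a Bearer token, could look like this (the function name is an assumption):

```python
def parse_api_keys(raw: str) -> dict[str, str]:
    """Parse API_KEYS ("sk-app1:app1,sk-app2:app2") into key -> app name.

    Sketch of the documented .env format; not the actual app/auth.py code.
    """
    keys: dict[str, str] = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue
        key, _, app = entry.partition(":")
        keys[key] = app
    return keys

keys = parse_api_keys("sk-app1:app1,sk-app2:app2")
```

A presented `Authorization: Bearer <token>` is then accepted if the token is in this mapping (attributed to its app for rate limiting) or equals `INTERNAL_API_KEY`.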
## Operations
```powershell
# Status
Get-ScheduledTask -TaskName "ManaSTT" | Format-List TaskName, State
Get-NetTCPConnection -LocalPort 3020 -State Listen
# Restart
Stop-ScheduledTask -TaskName "ManaSTT"
Start-ScheduledTask -TaskName "ManaSTT"
# Logs
Get-Content C:\mana\services\mana-stt\service.log -Tail 50
```
## Reference
- `docs/WINDOWS_GPU_SERVER_SETUP.md` — Windows box setup, scheduled tasks, firewall, Cloudflare tunnel
- `docs/LOCAL_STT_MODELS.md` — model comparisons (WER, latency, language coverage)
- `services/mana-stt/grafana-dashboard.json` — Prometheus metrics dashboard