Mirror of https://github.com/Memo-2023/mana-monorepo.git (synced 2026-05-17 10:39:40 +02:00)

The Mac Mini hasn't run mana-llm/stt/tts/image-gen for a while — those services live on the Windows GPU server now. The Mac-targeted installers, plists, and platform-checking setup scripts have been sitting in the repo as cargo-cult leftovers, suggesting Mac Mini deployment is still a real option. It isn't.

Removed (Mac-Mini deployment infrastructure):

services/mana-stt/
- com.mana.mana-stt.plist (LaunchAgent)
- com.mana.vllm-voxtral.plist (LaunchAgent for the abandoned local Voxtral experiment)
- install-service.sh (single-service launchd installer)
- install-services.sh (mana-stt + vllm-voxtral installer)
- setup.sh (Mac arm64 installer)
- scripts/setup-vllm.sh (vLLM-Voxtral setup)
- scripts/start-vllm-voxtral.sh

services/mana-tts/
- com.mana.mana-tts.plist
- install-service.sh
- setup.sh (Mac arm64 installer)

scripts/mac-mini/
- setup-image-gen.sh (Mac flux2.c launchd installer)
- setup-stt.sh
- setup-tts.sh
- launchd/com.mana.image-gen.plist
- launchd/com.mana.mana-stt.plist
- launchd/com.mana.mana-tts.plist

setup-tts-bot.sh stays — it's the Matrix TTS bot installer (Synapse side), not the mana-tts service.

Updated:

- services/mana-stt/CLAUDE.md, README.md — fully rewritten for the Windows GPU reality (CUDA WhisperX, Scheduled Task ManaSTT, .env keys matching the actual production .env on the box)
- services/mana-tts/CLAUDE.md, README.md — same treatment, documenting Kokoro/Piper/F5-TTS on the Windows GPU under Scheduled Task ManaTTS
- scripts/mac-mini/README.md — dropped the STT setup section, replaced with a pointer to docs/WINDOWS_GPU_SERVER_SETUP.md and the per-service CLAUDE.md files
- docs/MAC_MINI_SERVER.md — expanded the "deactivated LaunchAgents" list to mention the now-removed plists, added the full GPU service port table with public URLs, and added a cleanup snippet for any old plists still installed on a Mac Mini somewhere
# mana-stt

Speech-to-Text microservice. Wraps Whisper (CUDA, with WhisperX for word-level timestamps + diarization), local Voxtral via vLLM, and Mistral's hosted Voxtral API behind a small FastAPI surface. Lives on the Windows GPU server (`mana-server-gpu`, RTX 3090).

> ⚠️ **Earlier history**: this directory used to contain Mac-Mini–targeted
> code (Whisper Lightning MLX, com.mana.mana-stt.plist launchd setup,
> setup.sh with Apple-Silicon checks). That all moved to the Windows
> GPU box and was removed from the repo. If you're looking for the MLX
> path, see git history.
## Tech Stack

| Layer | Technology |
|-------|------------|
| **Runtime** | Python 3.11 + uvicorn (Windows) |
| **Framework** | FastAPI |
| **Whisper** | `whisperx` on CUDA (large-v3 + word alignment + pyannote diarization) |
| **Voxtral (local)** | vLLM serving Voxtral 3B/4B/24B (`vllm_service.py`) |
| **Voxtral (cloud)** | Mistral API (`voxtral_api_service.py`) |
| **Auth** | Per-key + internal-key API auth (`app/auth.py`, JWT via mana-auth in `app/external_auth.py`) |
| **VRAM** | Shared `vram_manager.py` accountant — coordinated with mana-tts and mana-image-gen so multiple GPU services don't OOM each other |
| **Process supervision** | Windows Scheduled Task `ManaSTT` (AtLogOn) |
## Port: 3020

## Where it runs

| Host | Path on disk | Entrypoint |
|------|--------------|------------|
| Windows GPU server (`192.168.178.11`) | `C:\mana\services\mana-stt\` | `service.pyw` via Scheduled Task `ManaSTT` |

Public URL: `https://gpu-stt.mana.how` (via Cloudflare Tunnel + Mac Mini gpu-proxy).
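As a quick reachability check, a minimal probe of `/health` might look like the sketch below (the URL and timeout are placeholders, not part of the service):

```python
# Hypothetical reachability check for mana-stt; URL and timeout are placeholders.
import json
import urllib.request

STT_URL = "https://gpu-stt.mana.how"   # or http://192.168.178.11:3020 inside the LAN

with urllib.request.urlopen(f"{STT_URL}/health", timeout=10) as resp:
    print(json.load(resp))             # reports liveness and which backends are loaded
```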
## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Liveness + which backends are loaded |
| GET | `/models` | Available STT models |
| POST | `/transcribe` | Whisper (WhisperX, default) — multipart `file` + optional `language` |
| POST | `/transcribe/voxtral` | Local Voxtral via vLLM |
| POST | `/transcribe/auto` | Routing helper — picks the best backend for the input |

All endpoints (except `/health`) require `Authorization: Bearer <token>`. Tokens are validated against `API_KEYS` (per-app keys) or `INTERNAL_API_KEY` (no rate limit), and JWTs from mana-auth are also accepted via `external_auth.py`.
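A minimal client call could look like the sketch below (assumes the `requests` package; the URL, key, file name, and language value are placeholders, and the exact response shape depends on the WhisperX backend):

```python
# Hypothetical client for POST /transcribe; URL, key, and file are placeholders.
import requests

STT_URL = "http://192.168.178.11:3020"
API_KEY = "sk-app1"                       # one of the API_KEYS entries, or INTERNAL_API_KEY

with open("meeting.wav", "rb") as audio:
    resp = requests.post(
        f"{STT_URL}/transcribe",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": ("meeting.wav", audio, "audio/wav")},
        data={"language": "de"},          # optional, see WHISPER_DEFAULT_LANGUAGE
        timeout=300,
    )
resp.raise_for_status()
print(resp.json())
```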
## Backends (`app/`)

| File | What it loads |
|------|---------------|
| `whisper_service.py` | WhisperX on CUDA (large-v3 + alignment + pyannote diarization) |
| `voxtral_service.py` | Local Voxtral via vLLM (slower start, richer multilingual) |
| `voxtral_api_service.py` | Mistral hosted Voxtral API (cloud, no GPU needed) |
| `vllm_service.py` | vLLM client primitives shared by Voxtral |
| `vram_manager.py` | Shared VRAM accounting — same module also used by mana-tts and mana-image-gen |
| `auth.py` | API-key auth (internal + per-app keys) |
| `external_auth.py` | JWT validation via mana-auth |

Backends are loaded lazily during the FastAPI lifespan and reported by `/health`.
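A rough illustration of that pattern is sketched below; this is not the actual `app/` code, the loader function is a placeholder, and only the load-on-first-use idea with an optional warm-up at startup is shown:

```python
# Illustration of the lazy-loading pattern, not the actual app/ code.
# The loader is a placeholder; env names mirror the configuration section below.
import os
from contextlib import asynccontextmanager
from fastapi import FastAPI

backends: dict[str, object] = {}

def load_whisperx():                      # placeholder for the real WhisperX loader
    return object()

def get_backend(name: str, loader) -> object:
    # Load on first use, then reuse the cached instance.
    if name not in backends:
        backends[name] = loader()
    return backends[name]

@asynccontextmanager
async def lifespan(app: FastAPI):
    if os.getenv("PRELOAD_MODELS", "true") == "true":
        get_backend("whisper", load_whisperx)   # warm the default backend at startup
    yield
    backends.clear()                            # drop references on shutdown

app = FastAPI(lifespan=lifespan)

@app.get("/health")
async def health():
    return {"status": "ok", "backends": sorted(backends)}
```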
## Configuration (`.env` on the Windows GPU box)

```env
PORT=3020
WHISPER_MODEL=large-v3
WHISPER_DEVICE=cuda
WHISPER_COMPUTE_TYPE=float16
WHISPER_DEFAULT_LANGUAGE=de
PRELOAD_MODELS=true
USE_VLLM=false
HF_TOKEN=... # required for pyannote diarization models
REQUIRE_AUTH=true
API_KEYS=sk-app1:app1,sk-app2:app2
INTERNAL_API_KEY=... # cross-service, no rate limit
CORS_ORIGINS=https://mana.how,https://chat.mana.how
```
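The `API_KEYS` value is a comma-separated list of `key:app` pairs. One way to parse that format is sketched below (an illustration; `app/auth.py` may do this differently):

```python
# Illustrative parsing of the API_KEYS format above; app/auth.py may differ.
import os

raw = os.getenv("API_KEYS", "sk-app1:app1,sk-app2:app2")

# "sk-app1:app1,sk-app2:app2" -> {"sk-app1": "app1", "sk-app2": "app2"}
api_keys: dict[str, str] = dict(
    pair.split(":", 1) for pair in raw.split(",") if pair.strip()
)

def app_for_token(token: str) -> str | None:
    """Return the app name for a bearer token, or None if the key is unknown."""
    return api_keys.get(token)
```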
## Operations

```powershell
# Status
Get-ScheduledTask -TaskName "ManaSTT" | Format-List TaskName, State
Get-NetTCPConnection -LocalPort 3020 -State Listen

# Restart
Stop-ScheduledTask -TaskName "ManaSTT"
Start-ScheduledTask -TaskName "ManaSTT"

# Logs
Get-Content C:\mana\services\mana-stt\service.log -Tail 50
```
## Reference

- `docs/WINDOWS_GPU_SERVER_SETUP.md` — Windows box setup, scheduled tasks, firewall, Cloudflare tunnel
- `docs/LOCAL_STT_MODELS.md` — model comparisons (WER, latency, language coverage)
- `services/mana-stt/grafana-dashboard.json` — Prometheus metrics dashboard