managarten/services/mana-stt/CLAUDE.md
Till JS b0a08ce239 docs(services): add CLAUDE.md for stt + events, fix stale entries, flag port collisions
New service docs:
- services/mana-stt/CLAUDE.md — FastAPI surface with Whisper MLX (local),
  WhisperX (rich), and Voxtral (local + Mistral API). Documents the lazy
  backend loading and the launchd plist setup on the Mac Mini.
- services/mana-events/CLAUDE.md — Hono/Bun service for public RSVP and
  event-sharing. Documents the host (JWT) vs public (token) split, the
  rate-limit sweeper, and the createApp factory pattern that lets unit
  tests run without bootstrapping the production sweeper.

Stale entries fixed:
- mana-auth: dropped "rewritten from NestJS / drop-in replacement" — the
  rewrite is the only mana-auth there is now. Email channel updated from
  Brevo SMTP to self-hosted Stalwart (see docs/MAIL_SERVER.md).
- mana-notify: same Brevo → Stalwart fix in the channel table and env
  var defaults.

PORT_SCHEMA.md flagged as aspirational:
- The doc was dated 2026-03-28 and presented as "single source of truth",
  but cross-checking against actual service source files (config.go,
  main.py, start.sh) shows nothing matches. Added a prominent warning at
  the top with the real ports + two confirmed collisions:
  * mana-image-gen and mana-video-gen both default to PORT 3026
  * mana-voice-bot and mana-sync both default to PORT 3050
  Today these are masked because image-gen + voice-bot live on the
  Windows GPU server while video-gen + sync live on the Mac Mini, but
  the moment they share a host they collide. Either execute the planned
  reorg or pick non-colliding ports and rewrite the doc to match
  reality — flagged as a real follow-up.
2026-04-08 12:23:48 +02:00

mana-stt

Speech-to-Text service for the Mana ecosystem. Runs on the Mac Mini M4 (Apple Silicon) and exposes a small FastAPI surface that wraps several local backends (Whisper MLX, WhisperX, and an optional vLLM-hosted Voxtral) plus Mistral's hosted Voxtral API.

Tech Stack

| Layer | Technology |
|---|---|
| Runtime | Python 3.11 + uvicorn |
| Framework | FastAPI |
| Local model | Whisper Large V3 via lightning-whisper-mlx (Apple MLX) |
| Local model (rich) | WhisperX for word-level timestamps + diarization |
| Cloud model | Mistral Voxtral Mini API |
| Optional | vLLM Voxtral (GPU), see vllm_service.py |
| Auth | JWT validation via mana-auth (external_auth.py) + API key fallback (auth.py) |
| Process supervision | launchd via com.mana.mana-stt.plist |

Port: 3020

Quick Start

```bash
cd services/mana-stt
./setup.sh                                          # Create venv + install
.venv/bin/uvicorn app.main:app --host 0.0.0.0 --port 3020
```

In production the service runs under launchd on the Mac Mini. Install it with install-service.sh (mana-stt only) or install-services.sh (mana-stt and vllm-voxtral together).

API Endpoints

| Method | Path | Description |
|---|---|---|
| GET | /health | Liveness + which backends are loaded |
| GET | /models | List available STT models |
| POST | /transcribe | Whisper MLX (default, fastest local) |
| POST | /transcribe/whisperx | WhisperX with word-level timestamps + diarization |
| POST | /transcribe/voxtral | Local Voxtral (vLLM) |
| POST | /transcribe/voxtral/api | Mistral Voxtral API (cloud) |
| POST | /transcribe/auto | Tries WhisperX first, falls back to Whisper MLX |
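
The fallback order of /transcribe/auto can be pictured with a minimal sketch like the one below. It is not taken from the service source: the `Backend` callable shape, the `transcribe_auto` name, and the result dict are assumptions made for illustration only.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence

# Hypothetical shape: a backend takes raw audio bytes plus an optional
# language code and returns a result dict. The real services may differ.
Backend = Callable[[bytes, Optional[str]], dict]


def transcribe_auto(
    audio: bytes,
    language: Optional[str],
    backends: Sequence[tuple[str, Backend]],
) -> dict:
    """Try each backend in order and return the first successful result."""
    errors: list[str] = []
    for name, backend in backends:
        try:
            result = backend(audio, language)
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(f"{name}: {exc}")
            continue
        return {"backend": name, **result}
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Stand-ins for the real backends, used only to show the ordering.
def fake_whisperx(audio: bytes, language: Optional[str]) -> dict:
    raise RuntimeError("whisperx not loaded")


def fake_whisper_mlx(audio: bytes, language: Optional[str]) -> dict:
    return {"text": "hello", "language": language or "en"}


result = transcribe_auto(
    b"\x00\x01",
    "en",
    [("whisperx", fake_whisperx), ("whisper-mlx", fake_whisper_mlx)],
)
```

Passing the backends as an ordered list keeps the preference (WhisperX first, Whisper MLX second) in one place and makes the fallback easy to exercise in tests with stand-in callables.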

All /transcribe* endpoints accept a multipart file upload plus an optional language form field. Authenticate with either Authorization: Bearer <jwt> or an X-API-Key header.
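
As a rough client-side illustration, the sketch below builds such a multipart request using only the Python standard library. The upload field name `file` is an assumption (the text above only says "multipart file upload"), and `build_transcribe_request` is a hypothetical helper, not part of the service.

```python
import urllib.request
import uuid
from typing import Optional


def build_transcribe_request(
    base_url: str,
    audio: bytes,
    filename: str,
    token: str,
    language: Optional[str] = None,
    path: str = "/transcribe",
) -> urllib.request.Request:
    """Build a multipart/form-data POST for one of the /transcribe* endpoints."""
    boundary = uuid.uuid4().hex
    parts = [
        # File part. The field name "file" is an assumption.
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + audio
        + b"\r\n"
    ]
    if language is not None:
        # Optional "language" form field, as described above.
        parts.append(
            (
                f"--{boundary}\r\n"
                'Content-Disposition: form-data; name="language"\r\n\r\n'
                f"{language}\r\n"
            ).encode()
        )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


req = build_transcribe_request(
    "http://localhost:3020", b"RIFF", "clip.wav", "<jwt>", language="de"
)
# Sending it needs a running service, for example:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode())
```

To use the API key path instead, replace the Authorization header with X-API-Key. Pass a different `path` (for example /transcribe/whisperx) to target another backend.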

Backends (app/)

| File | What it loads |
|---|---|
| whisper_service.py | Whisper Large V3 via MLX (local, default) |
| whisper_service_cuda.py | CUDA Whisper (only used on Windows GPU server) |
| whisperx_service.py | WhisperX with diarization (local, slower, richer output) |
| voxtral_service.py | Local Voxtral via vLLM (optional, needs the second launchd job) |
| voxtral_api_service.py | Mistral hosted Voxtral API (cloud) |
| vllm_service.py | vLLM client primitives shared with Voxtral |
| auth.py | API key auth (fallback path) |
| external_auth.py | JWT auth via mana-auth public key |
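
The split between external_auth.py (JWT) and auth.py (API key fallback) might look roughly like the sketch below. This is an illustration under assumptions, not the actual code: `authenticate` and the injected `verify_jwt` callable are hypothetical names, and headers are modelled as a plain case-sensitive mapping.

```python
import hmac
from typing import Callable, Mapping, Optional


def authenticate(
    headers: Mapping[str, str],
    verify_jwt: Callable[[str], dict],
    api_key: Optional[str],
) -> dict:
    """Return an identity dict, or raise PermissionError."""
    auth = headers.get("Authorization", "")
    if auth.startswith("Bearer "):
        # verify_jwt is expected to raise on an invalid or expired token.
        claims = verify_jwt(auth[len("Bearer "):])
        return {"method": "jwt", "claims": claims}

    # Fallback path: compare the presented key in constant time.
    presented = headers.get("X-API-Key")
    if api_key and presented and hmac.compare_digest(
        presented.encode(), api_key.encode()
    ):
        return {"method": "api_key"}

    raise PermissionError("missing or invalid credentials")
```

Injecting `verify_jwt` keeps the key-fetching and signature checks out of this function, so the fallback ordering can be tested without a running mana-auth.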

Backends are loaded lazily during the FastAPI lifespan and reported by /health. Missing dependencies (e.g. CUDA on Mac) are tolerated — the service starts without them.
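
One way to get this tolerant startup behaviour is sketched below. It assumes each backend can be represented by an importable module and uses stdlib module names as stand-ins. `make_lifespan` and `load_backends` are hypothetical names, and the real service may track backend state differently.

```python
import asyncio
import importlib
from contextlib import asynccontextmanager
from types import SimpleNamespace
from typing import Mapping


def load_backends(modules: Mapping[str, str]):
    """Import each backend module; record failures instead of raising."""
    loaded, skipped = {}, {}
    for name, module_path in modules.items():
        try:
            loaded[name] = importlib.import_module(module_path)
        except ImportError as exc:  # e.g. a CUDA-only dependency on a Mac
            skipped[name] = str(exc)
    return loaded, skipped


def make_lifespan(modules: Mapping[str, str]):
    """Return a lifespan context manager usable as FastAPI(lifespan=...)."""

    @asynccontextmanager
    async def lifespan(app):
        app.state.backends, app.state.skipped = load_backends(modules)
        yield  # the application serves requests here
        app.state.backends.clear()

    return lifespan


async def _demo():
    # SimpleNamespace stands in for a FastAPI app, which also exposes .state.
    app = SimpleNamespace(state=SimpleNamespace())
    lifespan = make_lifespan({"present": "json", "absent": "no_such_backend_module"})
    async with lifespan(app):
        return sorted(app.state.backends), sorted(app.state.skipped)


loaded_names, skipped_names = asyncio.run(_demo())
```

A /health handler could then report the keys of `app.state.backends` as loaded and `app.state.skipped` as unavailable, which matches the "starts without them" behaviour described above.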

Configuration

The service reads its configuration from services/mana-stt/.env, which the launchd plist loads via set -a; source .env; set +a. Relevant variables:

```bash
PORT=3020
MANA_AUTH_URL=http://localhost:3001     # JWKS source for JWT verification
MISTRAL_API_KEY=...                     # only needed for /transcribe/voxtral/api
STT_API_KEY=...                         # legacy API key fallback
```
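
On the Python side, these variables could be gathered into a typed settings object along the following lines. This is a sketch, not the service's actual config module: `Settings` and `load_settings` are hypothetical, and treating the values shown above as defaults is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    port: int
    mana_auth_url: str
    mistral_api_key: Optional[str]  # only needed for /transcribe/voxtral/api
    stt_api_key: Optional[str]      # legacy API key fallback


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping (defaults are assumptions)."""
    return Settings(
        port=int(env.get("PORT", "3020")),
        mana_auth_url=env.get("MANA_AUTH_URL", "http://localhost:3001").rstrip("/"),
        # Empty strings are treated the same as unset.
        mistral_api_key=env.get("MISTRAL_API_KEY") or None,
        stt_api_key=env.get("STT_API_KEY") or None,
    )


settings = load_settings({"PORT": "3020", "MISTRAL_API_KEY": ""})
```

Taking the environment as a parameter makes the loader testable without mutating `os.environ`.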

Operations

  • Logs: launchd writes to ~/Library/Logs/mana-stt.{out,err}.log (see plist)
  • Metrics: Prometheus endpoint at /metrics if enabled in config; Grafana dashboard JSON checked in at grafana-dashboard.json
  • Restart: launchctl kickstart -k gui/$(id -u)/com.mana.mana-stt
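
After a restart it can be handy to wait until /health answers again. The helper below is a minimal sketch of such a poll. It is not part of the repository, and it only checks for an HTTP 200, since the shape of the /health response body is not specified here.

```python
import time
import urllib.request
from typing import Callable


def wait_until_healthy(
    base_url: str,
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    opener: Callable = urllib.request.urlopen,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll GET /health until it returns 200 or the timeout elapses."""
    url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Covers connection refused as well as urllib's URLError/HTTPError.
            pass
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Example, assuming the service listens on the documented port:
#   ok = wait_until_healthy("http://localhost:3020")
```

The `opener` and `sleep` parameters exist only so the loop can be tested without a network or real delays.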

Reference

  • services/mana-stt/README.md — user-facing setup, model download instructions, language coverage
  • docs/LOCAL_STT_MODELS.md — WER comparisons, model size/quality tradeoffs