# Mana STT Service

Speech-to-Text API service with Whisper (Lightning MLX) and Voxtral (Mistral API).

Optimized for Mac Mini M4 (Apple Silicon).

## Architecture

```
                    ┌─────────────────────┐
                    │   mana-stt (3020)   │
                    │    FastAPI          │
                    └─────────┬───────────┘
                              │
            ┌─────────────────┼─────────────────┐
            ▼                 ▼                 ▼
    ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
    │   Whisper    │  │  Voxtral API │  │   vLLM       │
    │  MLX (Local) │  │  (Mistral)   │  │ (Optional)   │
    └──────────────┘  └──────────────┘  └──────────────┘
```

## Features

- **Whisper Large V3** - Best quality, 99+ languages, German WER 6-9% (local, MLX)
- **Voxtral Mini** - Mistral API, speaker diarization support (cloud)
- **Apple Silicon Optimized** - Uses MLX for fast local inference
- **Automatic Fallback** - Falls back between backends automatically
- **REST API** - Simple HTTP endpoints for integration
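
The automatic-fallback behaviour can be sketched as a simple try-in-order loop. This is an illustrative sketch only; the backend names, call signatures, and error type below are assumptions, not the actual mana-stt internals.

```python
from typing import Callable

class BackendError(Exception):
    """Raised when a backend cannot serve a request (model missing, API down, ...)."""

def transcribe_with_fallback(
    audio: bytes,
    backends: list[tuple[str, Callable[[bytes], str]]],
) -> tuple[str, str]:
    """Try each backend in order; return (backend_name, text) from the first success."""
    errors: list[str] = []
    for name, backend in backends:
        try:
            return name, backend(audio)
        except BackendError as exc:
            errors.append(f"{name}: {exc}")
    raise BackendError("all backends failed: " + "; ".join(errors))
```

With backends ordered `[whisper, voxtral]`, a local Whisper failure falls through to the Voxtral API transparently.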

## Quick Start

### Installation

```bash
cd services/mana-stt
./setup.sh
```

### Run Locally

```bash
source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3020
```

### Setup as System Service (Mac Mini)

```bash
./scripts/mac-mini/setup-stt.sh
```

## API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/models` | GET | List available models |
| `/transcribe` | POST | Whisper transcription |
| `/transcribe/voxtral` | POST | Voxtral transcription |
| `/transcribe/auto` | POST | Auto-select best model |

## Usage Examples

### Transcribe with Whisper

```bash
curl -X POST http://localhost:3020/transcribe \
  -F "file=@recording.mp3" \
  -F "language=de"
```

Response:

```json
{
  "text": "Das ist ein Beispieltext...",
  "language": "de",
  "model": "whisper-large-v3-turbo"
}
```
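
The same request can be made from Python with only the standard library. The hand-rolled multipart encoder below is a simplified sketch (fixed boundary, whole file read into memory); in practice a library such as `requests` handles this for you.

```python
import json
import urllib.request

def encode_multipart(fields: dict[str, str], file_field: str,
                     filename: str, data: bytes,
                     boundary: str = "mana-stt-boundary") -> tuple[bytes, str]:
    """Build a multipart/form-data body and its Content-Type header value."""
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        (f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
         f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n').encode()
        + data + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"

def transcribe(path: str, language: str = "de",
               url: str = "http://localhost:3020/transcribe") -> dict:
    """POST an audio file to mana-stt and return the parsed JSON response."""
    with open(path, "rb") as f:
        audio = f.read()
    body, ctype = encode_multipart({"language": language}, "file", path, audio)
    req = urllib.request.Request(url, data=body, headers={"Content-Type": ctype})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```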

### Transcribe with Voxtral

```bash
curl -X POST http://localhost:3020/transcribe/voxtral \
  -F "file=@recording.mp3" \
  -F "language=de"
```

### Auto-Select Model

```bash
curl -X POST http://localhost:3020/transcribe/auto \
  -F "file=@recording.mp3" \
  -F "prefer=whisper"
```

## Configuration

Environment variables:

| Variable | Default | Description |
|---|---|---|
| `PORT` | `3020` | API server port |
| `WHISPER_MODEL` | `large-v3` | Default Whisper model |
| `PRELOAD_MODELS` | `false` | Load models on startup |
| `CORS_ORIGINS` | `https://mana.how,...` | Allowed CORS origins |
| `MISTRAL_API_KEY` | - | Required for Voxtral API |
| `USE_VLLM` | `false` | Enable vLLM backend (experimental) |
| `VLLM_URL` | `http://localhost:8100` | vLLM server URL |
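
Read into a config dict, the table above corresponds to something like the following. Variable names and defaults are taken from the table; the actual config module of mana-stt may look different.

```python
import os
from typing import Mapping

def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve mana-stt settings from environment variables with documented defaults."""
    return {
        "port": int(env.get("PORT", "3020")),
        "whisper_model": env.get("WHISPER_MODEL", "large-v3"),
        "preload_models": env.get("PRELOAD_MODELS", "false").lower() == "true",
        "use_vllm": env.get("USE_VLLM", "false").lower() == "true",
        "vllm_url": env.get("VLLM_URL", "http://localhost:8100"),
        "mistral_api_key": env.get("MISTRAL_API_KEY"),  # required only for Voxtral
    }
```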

## Supported Audio Formats

- MP3, WAV, M4A, FLAC, OGG, WebM, MP4
- Max file size: 100MB
- Any sample rate (automatically resampled to 16kHz)
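
The format and size limits above can be checked on the client before uploading. This is an illustrative sketch, not the service's actual validation code.

```python
ALLOWED_EXTENSIONS = {"mp3", "wav", "m4a", "flac", "ogg", "webm", "mp4"}
MAX_BYTES = 100 * 1024 * 1024  # 100 MB limit from the list above

def validate_upload(filename: str, size: int) -> None:
    """Raise ValueError if the file would be rejected by the limits above."""
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported format: .{ext}")
    if size > MAX_BYTES:
        raise ValueError("file exceeds 100 MB limit")
```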

## Model Comparison

| Model | German WER | Speed | VRAM | License |
|---|---|---|---|---|
| Whisper Large V3 Turbo | 6-9% | Fast | ~6 GB | MIT |
| Voxtral Mini (3B) | 8-12% | Medium | ~4 GB | Apache 2.0 |

## Logs

```bash
# Service logs
tail -f /tmp/mana-stt.log

# Error logs
tail -f /tmp/mana-stt.error.log
```

## Troubleshooting

### Model Download Slow

The first run downloads ~1.6 GB for Whisper and ~6 GB for Voxtral, so the initial request can take a while. Be patient.

### Out of Memory

Reduce the batch size or use a smaller model:

```bash
export WHISPER_MODEL=medium
```

### MPS Not Available

Ensure PyTorch is installed with MPS support:

```bash
pip install torch torchvision torchaudio
python -c "import torch; print(torch.backends.mps.is_available())"
```

## Integration

### From Chat Backend (NestJS)

```ts
const formData = new FormData();
// Node 18+: wrap the raw Buffer in a Blob so the native FormData accepts it
formData.append('file', new Blob([audioBuffer]), 'recording.webm');
formData.append('language', 'de');

const response = await fetch('http://localhost:3020/transcribe', {
  method: 'POST',
  body: formData,
});

const { text } = await response.json();
```

### From SvelteKit Web

```ts
const formData = new FormData();
formData.append('file', audioBlob, 'recording.webm');

const response = await fetch('https://gpu-stt.mana.how/transcribe', {
  method: 'POST',
  body: formData,
});

const { text } = await response.json();
```