# Mana STT Service

Speech-to-Text API service with Whisper (Lightning MLX) and Voxtral (Mistral API). Optimized for the Mac Mini M4 (Apple Silicon).
## Architecture

```
          ┌─────────────────────┐
          │   mana-stt (3020)   │
          │       FastAPI       │
          └─────────┬───────────┘
                    │
   ┌────────────────┼────────────────┐
   ▼                ▼                ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│   Whisper    │ │ Voxtral API  │ │     vLLM     │
│ MLX (Local)  │ │  (Mistral)   │ │  (Optional)  │
└──────────────┘ └──────────────┘ └──────────────┘
```
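The backends shown above are loaded lazily, so startup stays fast and unused backends never pay their import cost. A minimal sketch of that pattern (the `get_backend` function and its return values are illustrative, not the actual implementation):

```python
import functools


# Each backend is initialized at most once, on first request; until then
# nothing heavy is imported or loaded.
@functools.lru_cache(maxsize=None)
def get_backend(name: str) -> dict:
    if name == "whisper":
        # the heavy model import/load would happen here, exactly once
        return {"name": "whisper", "loaded": True}
    if name == "voxtral":
        return {"name": "voxtral", "loaded": True}
    raise ValueError(f"unknown backend: {name}")


b1 = get_backend("whisper")
b2 = get_backend("whisper")
print(b1 is b2)  # True: the second call hits the cache
```

Because `lru_cache` memoizes by argument, repeated requests for the same backend reuse the already-loaded instance.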
## Features

- **Whisper Large V3** - Best quality, 99+ languages, German WER 6-9% (local, MLX)
- **Voxtral Mini** - Mistral API, speaker diarization support (cloud)
- **Apple Silicon Optimized** - Uses MLX for fast local inference
- **Automatic Fallback** - Falls back between backends automatically
- **REST API** - Simple HTTP endpoints for integration
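The automatic fallback amounts to trying backends in priority order and returning the first success. A hedged sketch of that logic (function and backend names here are illustrative stubs, not the service's actual code):

```python
from typing import Callable


def transcribe_with_fallback(
    audio: bytes,
    backends: list[tuple[str, Callable[[bytes], str]]],
) -> dict:
    """Try each backend in priority order; return the first success."""
    errors: dict[str, str] = {}
    for name, transcribe in backends:
        try:
            return {"text": transcribe(audio), "model": name}
        except Exception as exc:  # one backend failing must not fail the request
            errors[name] = str(exc)
    raise RuntimeError(f"all backends failed: {errors}")


# Stub backends: the local model "fails" to load, the cloud API answers.
def whisper_mlx(audio: bytes) -> str:
    raise RuntimeError("model not loaded")


def voxtral_api(audio: bytes) -> str:
    return "Das ist ein Beispieltext"


result = transcribe_with_fallback(
    b"fake-audio",
    [("whisper-large-v3", whisper_mlx), ("voxtral-mini", voxtral_api)],
)
print(result["model"])  # voxtral-mini
```

Collecting per-backend errors means a total failure still reports *why* each backend was skipped.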
## Quick Start

### Installation

```bash
cd services/mana-stt
./setup.sh
```

### Run Locally

```bash
source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3020
```

### Setup as System Service (Mac Mini)

```bash
./scripts/mac-mini/setup-stt.sh
```
## API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/models` | GET | List available models |
| `/transcribe` | POST | Whisper transcription |
| `/transcribe/voxtral` | POST | Voxtral transcription |
| `/transcribe/auto` | POST | Auto-select best model |
## Usage Examples

### Transcribe with Whisper (Recommended)

```bash
curl -X POST http://localhost:3020/transcribe \
  -F "file=@recording.mp3" \
  -F "language=de"
```

Response:

```json
{
  "text": "Das ist ein Beispieltext...",
  "language": "de",
  "model": "whisper-large-v3-turbo"
}
```

### Transcribe with Voxtral

```bash
curl -X POST http://localhost:3020/transcribe/voxtral \
  -F "file=@recording.mp3" \
  -F "language=de"
```

### Auto-Select Model

```bash
curl -X POST http://localhost:3020/transcribe/auto \
  -F "file=@recording.mp3" \
  -F "prefer=whisper"
```
## Configuration

Environment variables:

| Variable | Default | Description |
|---|---|---|
| `PORT` | `3020` | API server port |
| `WHISPER_MODEL` | `large-v3` | Default Whisper model |
| `PRELOAD_MODELS` | `false` | Load models on startup |
| `CORS_ORIGINS` | `https://mana.how,...` | Allowed CORS origins |
| `MISTRAL_API_KEY` | - | Required for Voxtral API |
| `USE_VLLM` | `false` | Enable vLLM backend (experimental) |
| `VLLM_URL` | `http://localhost:8100` | vLLM server URL |
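A sketch of how such an environment-driven config could be read, with the defaults from the table above (the `Settings` class itself is illustrative; the `CORS_ORIGINS` default here is truncated to the first entry shown in the table):

```python
import os


class Settings:
    """Read service config from the environment with the documented defaults."""

    def __init__(self, env: dict = None):
        env = dict(os.environ) if env is None else env
        self.port = int(env.get("PORT", "3020"))
        self.whisper_model = env.get("WHISPER_MODEL", "large-v3")
        self.preload_models = env.get("PRELOAD_MODELS", "false").lower() == "true"
        # comma-separated origin list; only the first default origin is shown here
        self.cors_origins = [
            o.strip() for o in env.get("CORS_ORIGINS", "https://mana.how").split(",") if o.strip()
        ]
        self.mistral_api_key = env.get("MISTRAL_API_KEY")  # only needed for Voxtral
        self.use_vllm = env.get("USE_VLLM", "false").lower() == "true"
        self.vllm_url = env.get("VLLM_URL", "http://localhost:8100")


s = Settings({"PORT": "3020", "PRELOAD_MODELS": "true"})
print(s.port, s.preload_models)  # 3020 True
```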
## Supported Audio Formats
- MP3, WAV, M4A, FLAC, OGG, WebM, MP4
- Max file size: 100MB
- Any sample rate (automatically resampled to 16kHz)
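A caller can fail fast on these limits before uploading. This is an illustrative client-side check, not the server's own validation code:

```python
import os

# Limits mirroring the list above: accepted container formats and the 100 MB cap.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".webm", ".mp4"}
MAX_BYTES = 100 * 1024 * 1024  # 100 MB


def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file would be rejected; otherwise return None."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        raise ValueError(f"file too large: {size_bytes} bytes (max {MAX_BYTES})")


validate_upload("recording.mp3", 5_000_000)  # passes silently
```

No sample-rate check is needed on the client: the service resamples everything to 16 kHz itself.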
## Model Comparison
| Model | German WER | Speed | VRAM | License |
|---|---|---|---|---|
| Whisper Large V3 Turbo | 6-9% | Fast | ~6 GB | MIT |
| Voxtral Mini (3B) | 8-12% | Medium | ~4 GB | Apache 2.0 |
## Logs

```bash
# Service logs
tail -f /tmp/mana-stt.log

# Error logs
tail -f /tmp/mana-stt.error.log
```
## Troubleshooting

### Model Download Slow

First run downloads ~1.6 GB for Whisper and ~6 GB for Voxtral. Be patient.

### Out of Memory

Reduce batch size or use a smaller model:

```bash
export WHISPER_MODEL=medium
```

### MPS Not Available

Ensure PyTorch is installed with MPS support:

```bash
pip install torch torchvision torchaudio
python -c "import torch; print(torch.backends.mps.is_available())"
```
## Integration

### From Chat Backend (NestJS)

```typescript
const formData = new FormData();
formData.append('file', audioBuffer, 'recording.webm');
formData.append('language', 'de');

const response = await fetch('http://localhost:3020/transcribe', {
  method: 'POST',
  body: formData,
});
const { text } = await response.json();
```

### From SvelteKit Web

```typescript
const formData = new FormData();
formData.append('file', audioBlob, 'recording.webm');

const response = await fetch('https://gpu-stt.mana.how/transcribe', {
  method: 'POST',
  body: formData,
});
const { text } = await response.json();
```