📝 docs(mana-stt): document Whisper + Mistral API architecture

- Disable vLLM by default (has issues on macOS CPU)
- Use Mistral API for Voxtral transcription (cloud-based)
- Keep Whisper-MLX for local transcription
- Update README with architecture diagram

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Commit 21d50d1e0b (parent 7c9c2645e3)
Author: Till-JS
Date: 2026-02-11 16:34:03 +01:00
2 changed files with 27 additions and 7 deletions


@@ -1,14 +1,31 @@
# ManaCore STT Service
-Speech-to-Text API service with **Whisper (Lightning MLX)** and **Voxtral Mini**.
+Speech-to-Text API service with **Whisper (Lightning MLX)** and **Voxtral (Mistral API)**.
Optimized for Mac Mini M4 (Apple Silicon).
## Architecture
```
┌─────────────────────┐
│ mana-stt (3020) │
│ FastAPI │
└─────────┬───────────┘
┌─────────────────┼─────────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Whisper │ │ Voxtral API │ │ vLLM │
│ MLX (Local) │ │ (Mistral) │ │ (Optional) │
└──────────────┘ └──────────────┘ └──────────────┘
```
## Features
-- **Whisper Large V3 Turbo** - Best quality, 99+ languages, German WER 6-9%
-- **Voxtral Mini (3B)** - Mistral AI, Apache 2.0, 8 languages including German
-- **Apple Silicon Optimized** - Uses MLX for 10x faster inference
+- **Whisper Large V3** - Best quality, 99+ languages, German WER 6-9% (local, MLX)
+- **Voxtral Mini** - Mistral API, speaker diarization support (cloud)
+- **Apple Silicon Optimized** - Uses MLX for fast local inference
- **Automatic Fallback** - Falls back to an alternate backend when the primary one fails
- **REST API** - Simple HTTP endpoints for integration
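The "Automatic Fallback" feature above might look like the following sketch; the backend callables are stand-ins, since the real implementation is not shown in this diff:

```python
from typing import Callable

def transcribe_with_fallback(audio: bytes,
                             backends: list[Callable[[bytes], str]]) -> str:
    """Try each backend in order; return the first successful result.
    (Hypothetical helper, not the service's actual function.)"""
    errors: list[Exception] = []
    for backend in backends:
        try:
            return backend(audio)
        except Exception as exc:  # a real service would catch narrower errors
            errors.append(exc)
    raise RuntimeError(f"all backends failed: {errors}")

# Dummy backends for illustration: the first fails, the second succeeds.
def failing(_audio: bytes) -> str:
    raise ConnectionError("backend down")

def working(_audio: bytes) -> str:
    return "hello"
```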
## Quick Start
@@ -85,9 +102,12 @@ Environment variables:
| Variable | Default | Description |
|----------|---------|-------------|
| `PORT` | `3020` | API server port |
-| `WHISPER_MODEL` | `large-v3-turbo` | Default Whisper model |
+| `WHISPER_MODEL` | `large-v3` | Default Whisper model |
| `PRELOAD_MODELS` | `false` | Load models on startup |
| `CORS_ORIGINS` | `https://mana.how,...` | Allowed CORS origins |
| `MISTRAL_API_KEY` | - | Required for Voxtral API |
| `USE_VLLM` | `false` | Enable vLLM backend (experimental) |
| `VLLM_URL` | `http://localhost:8100` | vLLM server URL |
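Reading these variables follows the same `os.getenv` pattern the service's config module uses below; a minimal sketch with the names and defaults from the table (the `require_mistral_key` helper is hypothetical):

```python
import os

PORT = int(os.getenv("PORT", "3020"))
WHISPER_MODEL = os.getenv("WHISPER_MODEL", "large-v3")
PRELOAD_MODELS = os.getenv("PRELOAD_MODELS", "false").lower() == "true"
MISTRAL_API_KEY = os.getenv("MISTRAL_API_KEY")  # no default: required for Voxtral

def require_mistral_key() -> str:
    """Fail fast with a clear error when the Voxtral backend is used
    without credentials. (Illustrative helper, not service code.)"""
    if not MISTRAL_API_KEY:
        raise RuntimeError("MISTRAL_API_KEY must be set to use the Voxtral backend")
    return MISTRAL_API_KEY
```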
## Supported Audio Formats


@@ -32,9 +32,9 @@ CORS_ORIGINS = os.getenv(
"https://mana.how,https://chat.mana.how,http://localhost:5173"
).split(",")
-# vLLM configuration
+# vLLM configuration (disabled by default - has issues on macOS CPU)
VLLM_URL = os.getenv("VLLM_URL", "http://localhost:8100")
-USE_VLLM = os.getenv("USE_VLLM", "true").lower() == "true"
+USE_VLLM = os.getenv("USE_VLLM", "false").lower() == "true"
# Response models