📝 docs(mana-stt): document Whisper + Mistral API architecture

- Disable vLLM by default (has issues on macOS CPU)
- Use Mistral API for Voxtral transcription (cloud-based)
- Keep Whisper-MLX for local transcription
- Update README with architecture diagram

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Commit 21d50d1e0b (parent 7c9c2645e3)
Author: Till-JS
Date: 2026-02-11 16:34:03 +01:00
2 changed files with 27 additions and 7 deletions


@@ -1,14 +1,31 @@
# ManaCore STT Service
-Speech-to-Text API service with **Whisper (Lightning MLX)** and **Voxtral Mini**.
+Speech-to-Text API service with **Whisper (Lightning MLX)** and **Voxtral (Mistral API)**.
Optimized for Mac Mini M4 (Apple Silicon).
## Architecture
```
┌─────────────────────┐
│ mana-stt (3020) │
│ FastAPI │
└─────────┬───────────┘
┌─────────────────┼─────────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Whisper │ │ Voxtral API │ │ vLLM │
│ MLX (Local) │ │ (Mistral) │ │ (Optional) │
└──────────────┘ └──────────────┘ └──────────────┘
```
## Features
-- **Whisper Large V3 Turbo** - Best quality, 99+ languages, German WER 6-9%
-- **Voxtral Mini (3B)** - Mistral AI, Apache 2.0, 8 languages including German
-- **Apple Silicon Optimized** - Uses MLX for 10x faster inference
+- **Whisper Large V3** - Best quality, 99+ languages, German WER 6-9% (local, MLX)
+- **Voxtral Mini** - Mistral API, speaker diarization support (cloud)
+- **Apple Silicon Optimized** - Uses MLX for fast local inference
- **Automatic Fallback** - Falls back to an alternate backend when the primary one fails
- **REST API** - Simple HTTP endpoints for integration
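The "Automatic Fallback" feature above might look like the following sketch; the backend callables are stand-ins, since the real implementation is not shown in this diff:

```python
from typing import Callable

def transcribe_with_fallback(audio: bytes,
                             backends: list[Callable[[bytes], str]]) -> str:
    """Try each backend in order; return the first successful result.
    (Hypothetical helper, not the service's actual function.)"""
    errors: list[Exception] = []
    for backend in backends:
        try:
            return backend(audio)
        except Exception as exc:  # a real service would catch narrower errors
            errors.append(exc)
    raise RuntimeError(f"all backends failed: {errors}")

# Dummy backends for illustration: the first fails, the second succeeds.
def failing(_audio: bytes) -> str:
    raise ConnectionError("backend down")

def working(_audio: bytes) -> str:
    return "hello"
```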
## Quick Start
@@ -85,9 +102,12 @@ Environment variables:
| Variable | Default | Description |
|----------|---------|-------------|
| `PORT` | `3020` | API server port |
-| `WHISPER_MODEL` | `large-v3-turbo` | Default Whisper model |
+| `WHISPER_MODEL` | `large-v3` | Default Whisper model |
| `PRELOAD_MODELS` | `false` | Load models on startup |
| `CORS_ORIGINS` | `https://mana.how,...` | Allowed CORS origins |
| `MISTRAL_API_KEY` | - | Required for Voxtral API |
| `USE_VLLM` | `false` | Enable vLLM backend (experimental) |
| `VLLM_URL` | `http://localhost:8100` | vLLM server URL |
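Reading these variables follows the same `os.getenv` pattern the service's config module uses below; a minimal sketch with the names and defaults from the table (the `require_mistral_key` helper is hypothetical):

```python
import os

PORT = int(os.getenv("PORT", "3020"))
WHISPER_MODEL = os.getenv("WHISPER_MODEL", "large-v3")
PRELOAD_MODELS = os.getenv("PRELOAD_MODELS", "false").lower() == "true"
MISTRAL_API_KEY = os.getenv("MISTRAL_API_KEY")  # no default: required for Voxtral

def require_mistral_key() -> str:
    """Fail fast with a clear error when the Voxtral backend is used
    without credentials. (Illustrative helper, not service code.)"""
    if not MISTRAL_API_KEY:
        raise RuntimeError("MISTRAL_API_KEY must be set to use the Voxtral backend")
    return MISTRAL_API_KEY
```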
## Supported Audio Formats


@@ -32,9 +32,9 @@ CORS_ORIGINS = os.getenv(
"https://mana.how,https://chat.mana.how,http://localhost:5173"
).split(",")
-# vLLM configuration
+# vLLM configuration (disabled by default - has issues on macOS CPU)
VLLM_URL = os.getenv("VLLM_URL", "http://localhost:8100")
-USE_VLLM = os.getenv("USE_VLLM", "true").lower() == "true"
+USE_VLLM = os.getenv("USE_VLLM", "false").lower() == "true"
# Response models