From 21d50d1e0b07668bb26e5c18a571d624e54e2a82 Mon Sep 17 00:00:00 2001
From: Till-JS <101404291+Till-JS@users.noreply.github.com>
Date: Wed, 11 Feb 2026 16:34:03 +0100
Subject: [PATCH] =?UTF-8?q?=F0=9F=93=9D=20docs(mana-stt):=20document=20Whi?=
 =?UTF-8?q?sper=20+=20Mistral=20API=20architecture?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Disable vLLM by default (has issues on macOS CPU)
- Use Mistral API for Voxtral transcription (cloud-based)
- Keep Whisper-MLX for local transcription
- Update README with architecture diagram

Co-Authored-By: Claude Opus 4.5
---
 services/mana-stt/README.md   | 30 +++++++++++++++++++++++++-----
 services/mana-stt/app/main.py |  4 ++--
 2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/services/mana-stt/README.md b/services/mana-stt/README.md
index 8ea9642f5..ba214a62f 100644
--- a/services/mana-stt/README.md
+++ b/services/mana-stt/README.md
@@ -1,14 +1,31 @@
 # ManaCore STT Service
 
-Speech-to-Text API service with **Whisper (Lightning MLX)** and **Voxtral Mini**.
+Speech-to-Text API service with **Whisper (Lightning MLX)** and **Voxtral (Mistral API)**.
 
 Optimized for Mac Mini M4 (Apple Silicon).
 
+## Architecture
+
+```
+             ┌─────────────────────┐
+             │   mana-stt (3020)   │
+             │       FastAPI       │
+             └──────────┬──────────┘
+                        │
+       ┌────────────────┼────────────────┐
+       ▼                ▼                ▼
+┌──────────────┐ ┌──────────────┐ ┌──────────────┐
+│   Whisper    │ │ Voxtral API  │ │     vLLM     │
+│ MLX (Local)  │ │  (Mistral)   │ │  (Optional)  │
+└──────────────┘ └──────────────┘ └──────────────┘
+```
+
 ## Features
 
-- **Whisper Large V3 Turbo** - Best quality, 99+ languages, German WER 6-9%
-- **Voxtral Mini (3B)** - Mistral AI, Apache 2.0, 8 languages including German
-- **Apple Silicon Optimized** - Uses MLX for 10x faster inference
+- **Whisper Large V3** - Best quality, 99+ languages, German WER 6-9% (local, MLX)
+- **Voxtral Mini** - Mistral API, speaker diarization support (cloud)
+- **Apple Silicon Optimized** - Uses MLX for fast local inference
+- **Automatic Fallback** - Switches to an alternate backend when the primary fails
 - **REST API** - Simple HTTP endpoints for integration
 
 ## Quick Start
@@ -85,9 +102,12 @@ Environment variables:
 
 | Variable | Default | Description |
 |----------|---------|-------------|
 | `PORT` | `3020` | API server port |
-| `WHISPER_MODEL` | `large-v3-turbo` | Default Whisper model |
+| `WHISPER_MODEL` | `large-v3` | Default Whisper model |
 | `PRELOAD_MODELS` | `false` | Load models on startup |
 | `CORS_ORIGINS` | `https://mana.how,...` | Allowed CORS origins |
+| `MISTRAL_API_KEY` | - | Required for Voxtral API |
+| `USE_VLLM` | `false` | Enable vLLM backend (experimental) |
+| `VLLM_URL` | `http://localhost:8100` | vLLM server URL |
 
 ## Supported Audio Formats

diff --git a/services/mana-stt/app/main.py b/services/mana-stt/app/main.py
index 5423f044e..f5e0a5e3e 100644
--- a/services/mana-stt/app/main.py
+++ b/services/mana-stt/app/main.py
@@ -32,9 +32,9 @@ CORS_ORIGINS = os.getenv(
     "https://mana.how,https://chat.mana.how,http://localhost:5173"
 ).split(",")
 
-# vLLM configuration
+# vLLM configuration (disabled by default - has issues on macOS CPU)
 VLLM_URL = os.getenv("VLLM_URL", "http://localhost:8100")
-USE_VLLM = os.getenv("USE_VLLM", "true").lower() == "true"
+USE_VLLM = os.getenv("USE_VLLM", "false").lower() == "true"
 
 # Response models
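Reviewer note: the `USE_VLLM` change in `main.py` relies on the `os.getenv(...).lower() == "true"` idiom, which treats anything other than the exact string `"true"` (case-insensitive) as off. A minimal standalone sketch of that pattern (the `env_flag` helper is hypothetical, not part of the service):

```python
import os

def env_flag(name: str, default: str = "false") -> bool:
    """Parse a boolean env var the way the patch does: only the exact
    string "true" (case-insensitive) enables the feature."""
    return os.getenv(name, default).lower() == "true"

# With USE_VLLM unset, the patch's new default "false" keeps vLLM disabled.
os.environ.pop("USE_VLLM", None)
use_vllm = env_flag("USE_VLLM")          # False

# An operator can still opt in explicitly.
os.environ["USE_VLLM"] = "True"
use_vllm_opt_in = env_flag("USE_VLLM")   # True
```

Note the quirk of this idiom: values like `"1"` or `"yes"` also count as off, which is worth keeping in mind when documenting the `USE_VLLM` variable.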
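Reviewer note: the README now advertises "Automatic Fallback" between the three backends, but the patch does not show that logic. A hedged sketch of one way such a fallback chain could look — `transcribe_with_fallback` and the two stub backends are illustrative names, not the service's actual API:

```python
from typing import Callable, List, Tuple

def transcribe_with_fallback(
    audio: bytes,
    backends: List[Tuple[str, Callable[[bytes], str]]],
) -> Tuple[str, str]:
    """Try each (name, transcribe_fn) pair in order; return the name and
    transcript from the first backend that succeeds. Raise if all fail."""
    errors = []
    for name, fn in backends:
        try:
            return name, fn(audio)
        except Exception as exc:  # collect the error and try the next backend
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all STT backends failed: " + "; ".join(errors))

# Ordering mirrors the patch's intent: local Whisper-MLX first,
# the Mistral-hosted Voxtral API second, vLLM only when enabled.
def whisper_mlx(audio: bytes) -> str:
    raise ConnectionError("model not loaded")  # simulate a local failure

def voxtral_api(audio: bytes) -> str:
    return "hallo welt"  # simulate a successful cloud response

backend, text = transcribe_with_fallback(
    b"...", [("whisper-mlx", whisper_mlx), ("voxtral-api", voxtral_api)]
)
# backend == "voxtral-api", text == "hallo welt"
```

Collecting per-backend errors before raising keeps the final exception useful for debugging, since a silent fallback can otherwise mask a misconfigured `MISTRAL_API_KEY`.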