From 21d50d1e0b07668bb26e5c18a571d624e54e2a82 Mon Sep 17 00:00:00 2001
From: Till-JS <101404291+Till-JS@users.noreply.github.com>
Date: Wed, 11 Feb 2026 16:34:03 +0100
Subject: [PATCH] =?UTF-8?q?=F0=9F=93=9D=20docs(mana-stt):=20document=20Whi?=
 =?UTF-8?q?sper=20+=20Mistral=20API=20architecture?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Disable vLLM by default (has issues on macOS CPU)
- Use Mistral API for Voxtral transcription (cloud-based)
- Keep Whisper-MLX for local transcription
- Update README with architecture diagram

Co-Authored-By: Claude Opus 4.5
---
 services/mana-stt/README.md   | 30 +++++++++++++++++++++++++-----
 services/mana-stt/app/main.py |  4 ++--
 2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/services/mana-stt/README.md b/services/mana-stt/README.md
index 8ea9642f5..ba214a62f 100644
--- a/services/mana-stt/README.md
+++ b/services/mana-stt/README.md
@@ -1,14 +1,31 @@
 # ManaCore STT Service
 
-Speech-to-Text API service with **Whisper (Lightning MLX)** and **Voxtral Mini**.
+Speech-to-Text API service with **Whisper (Lightning MLX)** and **Voxtral (Mistral API)**.
 
 Optimized for Mac Mini M4 (Apple Silicon).
 
+## Architecture
+
+```
+             ┌─────────────────────┐
+             │   mana-stt (3020)   │
+             │       FastAPI       │
+             └──────────┬──────────┘
+                        │
+       ┌────────────────┼────────────────┐
+       ▼                ▼                ▼
+┌──────────────┐ ┌──────────────┐ ┌──────────────┐
+│   Whisper    │ │ Voxtral API  │ │     vLLM     │
+│ MLX (Local)  │ │  (Mistral)   │ │  (Optional)  │
+└──────────────┘ └──────────────┘ └──────────────┘
+```
+
 ## Features
 
-- **Whisper Large V3 Turbo** - Best quality, 99+ languages, German WER 6-9%
-- **Voxtral Mini (3B)** - Mistral AI, Apache 2.0, 8 languages including German
-- **Apple Silicon Optimized** - Uses MLX for 10x faster inference
+- **Whisper Large V3** - Best quality, 99+ languages, German WER 6-9% (local, MLX)
+- **Voxtral Mini** - Mistral API, speaker diarization support (cloud)
+- **Apple Silicon Optimized** - Uses MLX for fast local inference
+- **Automatic Fallback** - Switches to an alternate backend when the primary fails
 - **REST API** - Simple HTTP endpoints for integration
 
 ## Quick Start
@@ -85,9 +102,12 @@ Environment variables:
 
 | Variable | Default | Description |
 |----------|---------|-------------|
 | `PORT` | `3020` | API server port |
-| `WHISPER_MODEL` | `large-v3-turbo` | Default Whisper model |
+| `WHISPER_MODEL` | `large-v3` | Default Whisper model |
 | `PRELOAD_MODELS` | `false` | Load models on startup |
 | `CORS_ORIGINS` | `https://mana.how,...` | Allowed CORS origins |
+| `MISTRAL_API_KEY` | - | Required for Voxtral API |
+| `USE_VLLM` | `false` | Enable vLLM backend (experimental) |
+| `VLLM_URL` | `http://localhost:8100` | vLLM server URL |
 
 ## Supported Audio Formats

diff --git a/services/mana-stt/app/main.py b/services/mana-stt/app/main.py
index 5423f044e..f5e0a5e3e 100644
--- a/services/mana-stt/app/main.py
+++ b/services/mana-stt/app/main.py
@@ -32,9 +32,9 @@ CORS_ORIGINS = os.getenv(
     "https://mana.how,https://chat.mana.how,http://localhost:5173"
 ).split(",")
 
-# vLLM configuration
+# vLLM configuration (disabled by default - has issues on macOS CPU)
 VLLM_URL = os.getenv("VLLM_URL", "http://localhost:8100")
-USE_VLLM = os.getenv("USE_VLLM", "true").lower() == "true"
+USE_VLLM = os.getenv("USE_VLLM", "false").lower() == "true"
 
 # Response models
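Reviewer note: the `USE_VLLM` change in `main.py` relies on the `os.getenv(...).lower() == "true"` idiom, which treats anything other than the exact string `"true"` (case-insensitive) as off. A minimal standalone sketch of that pattern (the `env_flag` helper is hypothetical, not part of the service):

```python
import os

def env_flag(name: str, default: str = "false") -> bool:
    """Parse a boolean env var the way the patch does: only the exact
    string "true" (case-insensitive) enables the feature."""
    return os.getenv(name, default).lower() == "true"

# With USE_VLLM unset, the patch's new default "false" keeps vLLM disabled.
os.environ.pop("USE_VLLM", None)
use_vllm = env_flag("USE_VLLM")          # False

# An operator can still opt in explicitly.
os.environ["USE_VLLM"] = "True"
use_vllm_opt_in = env_flag("USE_VLLM")   # True
```

Note the quirk of this idiom: values like `"1"` or `"yes"` also count as off, which is worth keeping in mind when documenting the `USE_VLLM` variable.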
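Reviewer note: the README now advertises "Automatic Fallback" between the three backends, but the patch does not show that logic. A hedged sketch of one way such a fallback chain could look — `transcribe_with_fallback` and the two stub backends are illustrative names, not the service's actual API:

```python
from typing import Callable, List, Tuple

def transcribe_with_fallback(
    audio: bytes,
    backends: List[Tuple[str, Callable[[bytes], str]]],
) -> Tuple[str, str]:
    """Try each (name, transcribe_fn) pair in order; return the name and
    transcript from the first backend that succeeds. Raise if all fail."""
    errors = []
    for name, fn in backends:
        try:
            return name, fn(audio)
        except Exception as exc:  # collect the error and try the next backend
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all STT backends failed: " + "; ".join(errors))

# Ordering mirrors the patch's intent: local Whisper-MLX first,
# the Mistral-hosted Voxtral API second, vLLM only when enabled.
def whisper_mlx(audio: bytes) -> str:
    raise ConnectionError("model not loaded")  # simulate a local failure

def voxtral_api(audio: bytes) -> str:
    return "hallo welt"  # simulate a successful cloud response

backend, text = transcribe_with_fallback(
    b"...", [("whisper-mlx", whisper_mlx), ("voxtral-api", voxtral_api)]
)
# backend == "voxtral-api", text == "hallo welt"
```

Collecting per-backend errors before raising keeps the final exception useful for debugging, since a silent fallback can otherwise mask a misconfigured `MISTRAL_API_KEY`.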