# ManaCore STT Service
Speech-to-Text API service with **Whisper (Lightning MLX)** and **Voxtral (Mistral API)**.
Optimized for Mac Mini M4 (Apple Silicon).
## Architecture
```
             ┌─────────────────────┐
             │   mana-stt (3020)   │
             │       FastAPI       │
             └──────────┬──────────┘
       ┌────────────────┼────────────────┐
       ▼                ▼                ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│   Whisper    │ │ Voxtral API  │ │     vLLM     │
│ MLX (Local)  │ │  (Mistral)   │ │  (Optional)  │
└──────────────┘ └──────────────┘ └──────────────┘
```
## Features
- **Whisper Large V3** - Best quality, 99+ languages, German WER 6-9% (local, MLX)
- **Voxtral Mini** - Mistral API, speaker diarization support (cloud)
- **Apple Silicon Optimized** - Uses MLX for fast local inference
- **Automatic Fallback** - Switches to another backend automatically if the preferred one fails
- **REST API** - Simple HTTP endpoints for integration
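The fallback behaviour can be pictured as a simple chain: try each backend in order and return the first successful transcript. The following Python sketch is illustrative only; the function and backend names are assumptions, not the service's actual internals.

```python
from typing import Callable

def transcribe_with_fallback(audio_path: str,
                             backends: list[Callable[[str], str]]) -> str:
    """Try each backend in order; return the first successful transcript."""
    errors = []
    for backend in backends:
        try:
            return backend(audio_path)
        except Exception as exc:  # a real service would catch narrower errors
            errors.append(f"{backend.__name__}: {exc}")
    raise RuntimeError("All STT backends failed: " + "; ".join(errors))
```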
## Quick Start
### Installation
```bash
cd services/mana-stt
./setup.sh
```
### Run Locally
```bash
source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3020
```
### Setup as System Service (Mac Mini)
```bash
./scripts/mac-mini/setup-stt.sh
```
## API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/models` | GET | List available models |
| `/transcribe` | POST | Whisper transcription |
| `/transcribe/voxtral` | POST | Voxtral transcription |
| `/transcribe/auto` | POST | Auto-select best model |
## Usage Examples
### Transcribe with Whisper (Recommended)
```bash
curl -X POST http://localhost:3020/transcribe \
  -F "file=@recording.mp3" \
  -F "language=de"
```
Response:
```json
{
  "text": "Das ist ein Beispieltext...",
  "language": "de",
  "model": "whisper-large-v3-turbo"
}
```
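For callers that prefer Python over curl, the same request can be sketched with the `requests` library. The helper names here are illustrative; only the endpoint, form fields, and response shape come from the example above.

```python
import requests

def parse_transcript(payload: dict) -> str:
    """Extract the transcript text from a /transcribe response body."""
    return payload["text"]

def transcribe(path: str, language: str = "de",
               base_url: str = "http://localhost:3020") -> str:
    # Assumes the service is running locally (see Quick Start)
    with open(path, "rb") as f:
        resp = requests.post(f"{base_url}/transcribe",
                             files={"file": f},
                             data={"language": language})
    resp.raise_for_status()
    return parse_transcript(resp.json())
```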
### Transcribe with Voxtral
```bash
curl -X POST http://localhost:3020/transcribe/voxtral \
  -F "file=@recording.mp3" \
  -F "language=de"
```
### Auto-Select Model
```bash
curl -X POST http://localhost:3020/transcribe/auto \
  -F "file=@recording.mp3" \
  -F "prefer=whisper"
```
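The `prefer` parameter presumably establishes a try order across the available backends. A hypothetical sketch of that selection (the ordering logic is an assumption about the endpoint's behaviour, not its documented implementation):

```python
def choose_backend(prefer: str, available: set[str]) -> str:
    """Pick the preferred backend if available, else the next known one."""
    order = [prefer] + [m for m in ("whisper", "voxtral") if m != prefer]
    for model in order:
        if model in available:
            return model
    raise RuntimeError("No STT backend available")
```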
## Configuration
Environment variables:
| Variable | Default | Description |
|----------|---------|-------------|
| `PORT` | `3020` | API server port |
| `WHISPER_MODEL` | `large-v3` | Default Whisper model |
| `PRELOAD_MODELS` | `false` | Load models on startup |
| `CORS_ORIGINS` | `https://mana.how,...` | Allowed CORS origins |
| `MISTRAL_API_KEY` | - | Required for Voxtral API |
| `USE_VLLM` | `false` | Enable vLLM backend (experimental) |
| `VLLM_URL` | `http://localhost:8100` | vLLM server URL |
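Reading these variables in Python with the defaults from the table might look like this (an illustrative helper, not the service's actual settings code):

```python
import os

def load_settings(env: dict) -> dict:
    """Parse STT-service settings from environment-style key/value pairs."""
    return {
        "port": int(env.get("PORT", "3020")),
        "whisper_model": env.get("WHISPER_MODEL", "large-v3"),
        "preload_models": env.get("PRELOAD_MODELS", "false").lower() == "true",
        "use_vllm": env.get("USE_VLLM", "false").lower() == "true",
        "vllm_url": env.get("VLLM_URL", "http://localhost:8100"),
        "mistral_api_key": env.get("MISTRAL_API_KEY"),  # required for Voxtral
    }

settings = load_settings(dict(os.environ))
```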
## Supported Audio Formats
- MP3, WAV, M4A, FLAC, OGG, WebM, MP4
- Max file size: 100MB
- Any sample rate (automatically resampled to 16kHz)
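A client can pre-check an upload against these limits before sending it. This is an illustrative sketch of a client-side guard; the service also validates uploads server-side.

```python
from pathlib import Path

# Formats and size limit taken from the list above
ALLOWED = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".webm", ".mp4"}
MAX_BYTES = 100 * 1024 * 1024  # 100 MB

def check_upload(path: str, size_bytes: int) -> bool:
    """Return True if the file extension and size look acceptable."""
    return Path(path).suffix.lower() in ALLOWED and 0 < size_bytes <= MAX_BYTES
```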
## Model Comparison
| Model | German WER | Speed | VRAM | License |
|-------|------------|-------|------|---------|
| Whisper Large V3 Turbo | 6-9% | Fast | ~6 GB | MIT |
| Voxtral Mini (3B) | 8-12% | Medium | ~4 GB | Apache 2.0 |
## Logs
```bash
# Service logs
tail -f /tmp/manacore-stt.log
# Error logs
tail -f /tmp/manacore-stt.error.log
```
## Troubleshooting
### Model Download Slow
First run downloads ~1.6 GB for Whisper and ~6 GB for Voxtral. Be patient.
### Out of Memory
Reduce the batch size or use a smaller model:
```bash
export WHISPER_MODEL=medium
```
### MPS Not Available
Ensure PyTorch is installed with MPS support:
```bash
pip install torch torchvision torchaudio
python -c "import torch; print(torch.backends.mps.is_available())"
```
## Integration
### From Chat Backend (NestJS)
```typescript
const formData = new FormData();
// Native FormData (Node 18+) expects a Blob rather than a raw Buffer
formData.append('file', new Blob([audioBuffer]), 'recording.webm');
formData.append('language', 'de');

const response = await fetch('http://localhost:3020/transcribe', {
  method: 'POST',
  body: formData,
});
const { text } = await response.json();
```
### From SvelteKit Web
```typescript
const formData = new FormData();
formData.append('file', audioBlob, 'recording.webm');

const response = await fetch('https://stt-api.mana.how/transcribe', {
  method: 'POST',
  body: formData,
});
const { text } = await response.json();
```