mirror of
https://github.com/Memo-2023/mana-monorepo.git
synced 2026-05-14 21:21:10 +02:00
- Add api_keys schema in mana-core-auth with SHA-256 hashing - Create NestJS module with CRUD endpoints and validation - Add external auth module to STT/TTS for sk_live_ key validation - Create web UI page at /api-keys for key management - Support rate limiting per key with configurable limits - Cache validation results for 5 minutes to reduce auth service load Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> |
||
|---|---|---|
| .. | ||
| app | ||
| scripts | ||
| .env.example | ||
| com.manacore.mana-stt.plist | ||
| com.manacore.vllm-voxtral.plist | ||
| grafana-dashboard.json | ||
| install-service.sh | ||
| install-services.sh | ||
| README.md | ||
| requirements.txt | ||
| setup.sh | ||
ManaCore STT Service
Speech-to-Text API service with Whisper (Lightning MLX) and Voxtral (Mistral API).
Optimized for Mac Mini M4 (Apple Silicon).
Architecture
┌─────────────────────┐
│ mana-stt (3020) │
│ FastAPI │
└─────────┬───────────┘
│
┌─────────────────┼─────────────────┐
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Whisper │ │ Voxtral API │ │ vLLM │
│ MLX (Local) │ │ (Mistral) │ │ (Optional) │
└──────────────┘ └──────────────┘ └──────────────┘
Features
- Whisper Large V3 - Best quality, 99+ languages, German WER 6-9% (local, MLX)
- Voxtral Mini - Mistral API, speaker diarization support (cloud)
- Apple Silicon Optimized - Uses MLX for fast local inference
- Automatic Fallback - Falls back between backends automatically
- REST API - Simple HTTP endpoints for integration
Quick Start
Installation
cd services/mana-stt
./setup.sh
Run Locally
source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3020
Setup as System Service (Mac Mini)
./scripts/mac-mini/setup-stt.sh
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
/health |
GET | Health check |
/models |
GET | List available models |
/transcribe |
POST | Whisper transcription |
/transcribe/voxtral |
POST | Voxtral transcription |
/transcribe/auto |
POST | Auto-select best model |
Usage Examples
Transcribe with Whisper (Recommended)
curl -X POST http://localhost:3020/transcribe \
-F "file=@recording.mp3" \
-F "language=de"
Response:
{
"text": "Das ist ein Beispieltext...",
"language": "de",
"model": "whisper-large-v3-turbo"
}
Transcribe with Voxtral
curl -X POST http://localhost:3020/transcribe/voxtral \
-F "file=@recording.mp3" \
-F "language=de"
Auto-Select Model
curl -X POST http://localhost:3020/transcribe/auto \
-F "file=@recording.mp3" \
-F "prefer=whisper"
Configuration
Environment variables:
| Variable | Default | Description |
|---|---|---|
PORT |
3020 |
API server port |
WHISPER_MODEL |
large-v3 |
Default Whisper model |
PRELOAD_MODELS |
false |
Load models on startup |
CORS_ORIGINS |
https://mana.how,... |
Allowed CORS origins |
MISTRAL_API_KEY |
- | Required for Voxtral API |
USE_VLLM |
false |
Enable vLLM backend (experimental) |
VLLM_URL |
http://localhost:8100 |
vLLM server URL |
Supported Audio Formats
- MP3, WAV, M4A, FLAC, OGG, WebM, MP4
- Max file size: 100MB
- Any sample rate (automatically resampled to 16kHz)
Model Comparison
| Model | German WER | Speed | VRAM | License |
|---|---|---|---|---|
| Whisper Large V3 Turbo | 6-9% | Fast | ~6 GB | MIT |
| Voxtral Mini (3B) | 8-12% | Medium | ~4 GB | Apache 2.0 |
Logs
# Service logs
tail -f /tmp/manacore-stt.log
# Error logs
tail -f /tmp/manacore-stt.error.log
Troubleshooting
Model Download Slow
First run downloads ~1.6 GB for Whisper and ~6 GB for Voxtral. Be patient.
Out of Memory
Reduce batch size or use smaller model:
export WHISPER_MODEL=medium
MPS Not Available
Ensure PyTorch is installed with MPS support:
pip install torch torchvision torchaudio
python -c "import torch; print(torch.backends.mps.is_available())"
Integration
From Chat Backend (NestJS)
const formData = new FormData();
formData.append('file', audioBuffer, 'recording.webm');
formData.append('language', 'de');
const response = await fetch('http://localhost:3020/transcribe', {
method: 'POST',
body: formData,
});
const { text } = await response.json();
From SvelteKit Web
const formData = new FormData();
formData.append('file', audioBlob, 'recording.webm');
const response = await fetch('https://stt-api.mana.how/transcribe', {
method: 'POST',
body: formData,
});
const { text } = await response.json();