ManaCore STT Service
Speech-to-Text API service with Whisper (Lightning MLX) and Voxtral Mini.
Optimized for Mac Mini M4 (Apple Silicon).
Features
- Whisper Large V3 Turbo - Best quality, 99+ languages, German WER 6-9%
- Voxtral Mini (3B) - Mistral AI, Apache 2.0, 8 languages including German
- Apple Silicon Optimized - Uses MLX for 10x faster inference
- REST API - Simple HTTP endpoints for integration
Quick Start
Installation
cd services/mana-stt
./setup.sh
Run Locally
source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3020
Setup as System Service (Mac Mini)
./scripts/mac-mini/setup-stt.sh
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Health check |
| /models | GET | List available models |
| /transcribe | POST | Whisper transcription |
| /transcribe/voxtral | POST | Voxtral transcription |
| /transcribe/auto | POST | Auto-select best model |
Usage Examples
Transcribe with Whisper (Recommended)
curl -X POST http://localhost:3020/transcribe \
-F "file=@recording.mp3" \
-F "language=de"
Response:
{
"text": "Das ist ein Beispieltext...",
"language": "de",
"model": "whisper-large-v3-turbo"
}
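The same request can be made from Python using only the standard library. This is a hypothetical client sketch (not part of this repo): it hand-builds the multipart/form-data body that `curl -F` produces and posts it to the assumed default `http://localhost:3020`.

```python
import io
import json
import mimetypes
import urllib.request
import uuid


def build_multipart(fields: dict, file_field: str, filename: str, payload: bytes):
    """Build a multipart/form-data body equivalent to curl's -F flags.

    Returns (boundary, body_bytes)."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()
    # Plain form fields, e.g. language=de
    for name, value in fields.items():
        buf.write(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The file part, with a guessed Content-Type
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    buf.write(
        f'--{boundary}\r\nContent-Disposition: form-data; '
        f'name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n".encode()
    )
    buf.write(payload)
    buf.write(f"\r\n--{boundary}--\r\n".encode())
    return boundary, buf.getvalue()


def transcribe(filename: str, payload: bytes, language: str = "de",
               base_url: str = "http://localhost:3020") -> dict:
    """POST audio bytes to the /transcribe endpoint and return the parsed JSON."""
    boundary, body = build_multipart({"language": language}, "file", filename, payload)
    req = urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(req) as resp:  # transcription may take a while
        return json.loads(resp.read())
```

In practice a library such as `requests` (`files={"file": ...}`) does the same with less code; the manual version is shown so the example stays dependency-free.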
Transcribe with Voxtral
curl -X POST http://localhost:3020/transcribe/voxtral \
-F "file=@recording.mp3" \
-F "language=de"
Auto-Select Model
curl -X POST http://localhost:3020/transcribe/auto \
-F "file=@recording.mp3" \
-F "prefer=whisper"
Configuration
Environment variables:
| Variable | Default | Description |
|---|---|---|
| PORT | 3020 | API server port |
| WHISPER_MODEL | large-v3-turbo | Default Whisper model |
| PRELOAD_MODELS | false | Load models on startup |
| CORS_ORIGINS | https://mana.how,... | Allowed CORS origins |
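An illustrative sketch of how these variables could be resolved with the documented defaults (the actual app code may differ; `load_settings` is a hypothetical helper, not this repo's API):

```python
import os


def load_settings(env=os.environ) -> dict:
    """Resolve service settings from environment variables,
    falling back to the defaults documented above."""
    return {
        "port": int(env.get("PORT", "3020")),
        "whisper_model": env.get("WHISPER_MODEL", "large-v3-turbo"),
        # Any of "1"/"true"/"yes" (case-insensitive) enables preloading.
        "preload_models": env.get("PRELOAD_MODELS", "false").lower()
        in ("1", "true", "yes"),
        # The README truncates the real default list ("https://mana.how,...");
        # a single placeholder origin is used here.
        "cors_origins": [
            o.strip()
            for o in env.get("CORS_ORIGINS", "https://mana.how").split(",")
            if o.strip()
        ],
    }
```

Passing a plain dict as `env` makes the function easy to test without touching the process environment.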
Supported Audio Formats
- MP3, WAV, M4A, FLAC, OGG, WebM, MP4
- Max file size: 100MB
- Any sample rate (automatically resampled to 16kHz)
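The format and size limits above can be enforced client-side before uploading. A minimal sketch (hypothetical helper, not part of the service):

```python
# Formats and size limit as documented above.
ALLOWED_EXTENSIONS = {"mp3", "wav", "m4a", "flac", "ogg", "webm", "mp4"}
MAX_FILE_BYTES = 100 * 1024 * 1024  # 100 MB


def validate_upload(filename: str, size_bytes: int):
    """Return an error message, or None if the upload is acceptable."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext not in ALLOWED_EXTENSIONS:
        return f"unsupported format: .{ext or '?'}"
    if size_bytes > MAX_FILE_BYTES:
        return "file exceeds 100 MB limit"
    return None  # sample rate needs no check; the service resamples to 16 kHz
```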
Model Comparison
| Model | German WER | Speed | VRAM | License |
|---|---|---|---|---|
| Whisper Large V3 Turbo | 6-9% | Fast | ~6 GB | MIT |
| Voxtral Mini (3B) | 8-12% | Medium | ~4 GB | Apache 2.0 |
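The table suggests how `/transcribe/auto` might weigh the trade-off. A sketch of plausible selection logic, assuming a `prefer` parameter as shown in the usage examples (the actual service logic and the exact Voxtral language set are assumptions here):

```python
# Assumed set of Voxtral Mini's 8 supported languages (ISO 639-1 codes).
VOXTRAL_LANGUAGES = {"en", "de", "fr", "es", "it", "pt", "nl", "hi"}


def pick_model(language: str, prefer: str = "whisper") -> str:
    """Choose a model for /transcribe/auto based on preference and language."""
    if prefer == "voxtral" and language in VOXTRAL_LANGUAGES:
        return "voxtral-mini-3b"
    # Whisper Large V3 Turbo has the lower German WER (6-9% vs 8-12%)
    # and covers 99+ languages, so it is the safe default.
    return "whisper-large-v3-turbo"
```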
Logs
# Service logs
tail -f /tmp/manacore-stt.log
# Error logs
tail -f /tmp/manacore-stt.error.log
Troubleshooting
Model Download Slow
First run downloads ~1.6 GB for Whisper and ~6 GB for Voxtral. Be patient.
Out of Memory
Reduce the batch size or switch to a smaller model:
export WHISPER_MODEL=medium
MPS Not Available
Ensure PyTorch is installed with MPS support:
pip install torch torchvision torchaudio
python -c "import torch; print(torch.backends.mps.is_available())"
Integration
From Chat Backend (NestJS)
const formData = new FormData();
formData.append('file', audioBuffer, 'recording.webm');
formData.append('language', 'de');
const response = await fetch('http://localhost:3020/transcribe', {
method: 'POST',
body: formData,
});
const { text } = await response.json();
From SvelteKit Web
const formData = new FormData();
formData.append('file', audioBlob, 'recording.webm');
const response = await fetch('https://stt-api.mana.how/transcribe', {
method: 'POST',
body: formData,
});
const { text } = await response.json();
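Fallback
If the local service is unreachable, callers can chain providers (e.g. fall back to a hosted STT API). A hypothetical Python wrapper, with the provider callables injected so the ordering logic stays testable:

```python
from typing import Callable, List, Tuple


def transcribe_with_fallback(
    audio: bytes,
    providers: List[Tuple[str, Callable[[bytes], str]]],
) -> Tuple[str, str]:
    """Try each (name, transcribe_fn) provider in order.

    Returns (text, provider_name) from the first provider that succeeds;
    raises RuntimeError if every provider fails."""
    last_err = None
    for name, fn in providers:
        try:
            return fn(audio), name
        except Exception as err:  # connection error, timeout, HTTP 5xx, ...
            last_err = err
    raise RuntimeError("all STT providers failed") from last_err
```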