# ManaCore STT Service

Speech-to-Text API service with Whisper (Lightning MLX) and Voxtral (Mistral API). Optimized for Mac Mini M4 (Apple Silicon).
## Architecture

```
               ┌───────────────────┐
               │  mana-stt (3020)  │
               │      FastAPI      │
               └─────────┬─────────┘
                         │
       ┌─────────────────┼─────────────────┐
       ▼                 ▼                 ▼
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   Whisper    │  │ Voxtral API  │  │     vLLM     │
│ MLX (Local)  │  │  (Mistral)   │  │  (Optional)  │
└──────────────┘  └──────────────┘  └──────────────┘
```
## Features

- **Whisper Large V3** - Best quality, 99+ languages, German WER 6-9% (local, MLX)
- **Voxtral Mini** - Mistral API, speaker diarization support (cloud)
- **Apple Silicon Optimized** - Uses MLX for fast local inference
- **Automatic Fallback** - Switches to another backend automatically if one fails
- **REST API** - Simple HTTP endpoints for integration
## Quick Start

### Installation

```bash
cd services/mana-stt
./setup.sh
```

### Run Locally

```bash
source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3020
```

### Setup as System Service (Mac Mini)

```bash
./scripts/mac-mini/setup-stt.sh
```
## API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/models` | GET | List available models |
| `/transcribe` | POST | Whisper transcription |
| `/transcribe/voxtral` | POST | Voxtral transcription |
| `/transcribe/auto` | POST | Auto-select best model |
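Before sending audio, the two GET endpoints make a quick smoke test. A minimal TypeScript sketch, assuming Node 18+ for the built-in `fetch`; the health and model payload shapes are not documented here, so the results are just logged as-is:

```typescript
// Quick smoke test against the two GET endpoints from the table above.
// Assumes Node 18+ (built-in fetch). Payload shapes are not documented
// here, so the responses are just logged.
const BASE_URL = 'http://localhost:3020';

async function checkService(): Promise<void> {
  const health = await fetch(`${BASE_URL}/health`);
  if (!health.ok) throw new Error(`STT service unhealthy: HTTP ${health.status}`);
  console.log('health:', await health.json());

  const models = await fetch(`${BASE_URL}/models`);
  console.log('models:', await models.json());
}

checkService().catch(console.error);
```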
## Usage Examples

### Transcribe with Whisper (Recommended)

```bash
curl -X POST http://localhost:3020/transcribe \
  -F "file=@recording.mp3" \
  -F "language=de"
```
Response:

```json
{
  "text": "Das ist ein Beispieltext...",
  "language": "de",
  "model": "whisper-large-v3-turbo"
}
```
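For typed callers, the documented fields map onto a small interface. A sketch based only on the response shown above; the service may return additional fields (e.g. segments or timings) that are not listed here:

```typescript
// Derived from the documented /transcribe response above.
// The real response may carry extra fields; verify against a live call.
interface TranscribeResponse {
  text: string;      // transcribed text
  language: string;  // detected or requested language code, e.g. "de"
  model: string;     // backend model id, e.g. "whisper-large-v3-turbo"
}
```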
### Transcribe with Voxtral

```bash
curl -X POST http://localhost:3020/transcribe/voxtral \
  -F "file=@recording.mp3" \
  -F "language=de"
```
### Auto-Select Model

```bash
curl -X POST http://localhost:3020/transcribe/auto \
  -F "file=@recording.mp3" \
  -F "prefer=whisper"
```
## Configuration

Environment variables:

| Variable | Default | Description |
|---|---|---|
| `PORT` | `3020` | API server port |
| `WHISPER_MODEL` | `large-v3` | Default Whisper model |
| `PRELOAD_MODELS` | `false` | Load models on startup |
| `CORS_ORIGINS` | `https://mana.how,...` | Allowed CORS origins |
| `MISTRAL_API_KEY` | - | Required for Voxtral API |
| `USE_VLLM` | `false` | Enable vLLM backend (experimental) |
| `VLLM_URL` | `http://localhost:8100` | vLLM server URL |
## Supported Audio Formats

- MP3, WAV, M4A, FLAC, OGG, WebM, MP4
- Max file size: 100 MB
- Any sample rate (automatically resampled to 16 kHz)
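Since oversized or unsupported files will be rejected, a client can check them before uploading. A minimal sketch mirroring the limits listed above; the extension check is only a heuristic:

```typescript
// Pre-upload checks mirroring the documented limits above.
const MAX_BYTES = 100 * 1024 * 1024; // 100 MB
const ALLOWED_EXTENSIONS = ['mp3', 'wav', 'm4a', 'flac', 'ogg', 'webm', 'mp4'];

function validateAudioFile(file: File): void {
  const ext = file.name.split('.').pop()?.toLowerCase() ?? '';
  if (!ALLOWED_EXTENSIONS.includes(ext)) {
    throw new Error(`Unsupported format: .${ext}`);
  }
  if (file.size > MAX_BYTES) {
    throw new Error(`File too large: ${(file.size / 1e6).toFixed(1)} MB (max 100 MB)`);
  }
}
```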
## Model Comparison
| Model | German WER | Speed | VRAM | License |
|---|---|---|---|---|
| Whisper Large V3 Turbo | 6-9% | Fast | ~6 GB | MIT |
| Voxtral Mini (3B) | 8-12% | Medium | ~4 GB | Apache 2.0 |
## Logs

```bash
# Service logs
tail -f /tmp/manacore-stt.log

# Error logs
tail -f /tmp/manacore-stt.error.log
```
## Troubleshooting

### Model Download Slow

The first run downloads ~1.6 GB for Whisper and ~6 GB for Voxtral. Be patient.

### Out of Memory

Reduce the batch size or switch to a smaller model:

```bash
export WHISPER_MODEL=medium
```

### MPS Not Available

Ensure PyTorch is installed with MPS support:

```bash
pip install torch torchvision torchaudio
python -c "import torch; print(torch.backends.mps.is_available())"
```
## Integration

### From Chat Backend (NestJS)

```typescript
// Assumes Node 18+ (built-in fetch/FormData/Blob). The raw Buffer is
// wrapped in a Blob, since the native FormData API only accepts
// strings and Blobs as values.
const formData = new FormData();
formData.append('file', new Blob([audioBuffer]), 'recording.webm');
formData.append('language', 'de');

const response = await fetch('http://localhost:3020/transcribe', {
  method: 'POST',
  body: formData,
});
const { text } = await response.json();
```
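Long recordings can take a while to transcribe, so it is worth guarding the request with a timeout. A sketch using `AbortController`; the 60-second budget is an arbitrary assumption, not a documented limit:

```typescript
// Abort the upload if transcription exceeds a fixed time budget.
// The 60s default is an assumption; tune it to your audio lengths.
async function transcribeWithTimeout(
  formData: FormData,
  timeoutMs = 60_000,
): Promise<string> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetch('http://localhost:3020/transcribe', {
      method: 'POST',
      body: formData,
      signal: controller.signal,
    });
    if (!response.ok) throw new Error(`STT error: HTTP ${response.status}`);
    const { text } = (await response.json()) as { text: string };
    return text;
  } finally {
    clearTimeout(timer);
  }
}
```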
### From SvelteKit Web

```typescript
const formData = new FormData();
formData.append('file', audioBlob, 'recording.webm');

const response = await fetch('https://stt-api.mana.how/transcribe', {
  method: 'POST',
  body: formData,
});
const { text } = await response.json();
```
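The `audioBlob` above typically comes from the browser's `MediaRecorder` API. A minimal capture sketch; `audio/webm` matches the filename used above, but codec support varies by browser (Safari, for instance, may record MP4 instead):

```typescript
// Record a short microphone clip and return it as a Blob for upload.
// audio/webm matches the 'recording.webm' filename above; browser
// support for this mimeType varies.
async function recordClip(durationMs: number): Promise<Blob> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });
  const chunks: BlobPart[] = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);

  const stopped = new Promise<void>((resolve) => (recorder.onstop = () => resolve()));
  recorder.start();
  setTimeout(() => recorder.stop(), durationMs);
  await stopped;

  stream.getTracks().forEach((t) => t.stop()); // release the microphone
  return new Blob(chunks, { type: 'audio/webm' });
}
```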