# ManaCore STT Service

Speech-to-Text API service with Whisper (Lightning MLX) and Voxtral (Mistral API). Optimized for Mac Mini M4 (Apple Silicon).
## Architecture

```
               ┌───────────────────┐
               │  mana-stt (3020)  │
               │      FastAPI      │
               └─────────┬─────────┘
                         │
       ┌─────────────────┼─────────────────┐
       ▼                 ▼                 ▼
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   Whisper    │  │ Voxtral API  │  │     vLLM     │
│ MLX (Local)  │  │  (Mistral)   │  │  (Optional)  │
└──────────────┘  └──────────────┘  └──────────────┘
```
## Features

- **Whisper Large V3** - Best quality, 99+ languages, German WER 6-9% (local, MLX)
- **Voxtral Mini** - Mistral API, speaker diarization support (cloud)
- **Apple Silicon Optimized** - Uses MLX for fast local inference
- **Automatic Fallback** - Switches to another backend automatically if one fails
- **REST API** - Simple HTTP endpoints for integration
## Quick Start

### Installation

```bash
cd services/mana-stt
./setup.sh
```

### Run Locally

```bash
source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3020
```

### Setup as System Service (Mac Mini)

```bash
./scripts/mac-mini/setup-stt.sh
```
## API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/models` | GET | List available models |
| `/transcribe` | POST | Whisper transcription |
| `/transcribe/voxtral` | POST | Voxtral transcription |
| `/transcribe/auto` | POST | Auto-select best model |
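Before sending audio, the two GET endpoints make a quick smoke test. A minimal TypeScript sketch, assuming Node 18+ for the built-in `fetch`; the health and model payload shapes are not documented here, so the results are just logged as-is:

```typescript
// Quick smoke test against the two GET endpoints from the table above.
// Assumes Node 18+ (built-in fetch). Payload shapes are not documented
// here, so the responses are just logged.
const BASE_URL = 'http://localhost:3020';

async function checkService(): Promise<void> {
  const health = await fetch(`${BASE_URL}/health`);
  if (!health.ok) throw new Error(`STT service unhealthy: HTTP ${health.status}`);
  console.log('health:', await health.json());

  const models = await fetch(`${BASE_URL}/models`);
  console.log('models:', await models.json());
}

checkService().catch(console.error);
```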
## Usage Examples

### Transcribe with Whisper (Recommended)

```bash
curl -X POST http://localhost:3020/transcribe \
  -F "file=@recording.mp3" \
  -F "language=de"
```
Response:

```json
{
  "text": "Das ist ein Beispieltext...",
  "language": "de",
  "model": "whisper-large-v3-turbo"
}
```
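For typed callers, the documented fields map onto a small interface. A sketch based only on the response shown above; the service may return additional fields (e.g. segments or timings) that are not listed here:

```typescript
// Derived from the documented /transcribe response above.
// The real response may carry extra fields; verify against a live call.
interface TranscribeResponse {
  text: string;      // transcribed text
  language: string;  // detected or requested language code, e.g. "de"
  model: string;     // backend model id, e.g. "whisper-large-v3-turbo"
}
```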
### Transcribe with Voxtral

```bash
curl -X POST http://localhost:3020/transcribe/voxtral \
  -F "file=@recording.mp3" \
  -F "language=de"
```
### Auto-Select Model

```bash
curl -X POST http://localhost:3020/transcribe/auto \
  -F "file=@recording.mp3" \
  -F "prefer=whisper"
```
## Configuration

Environment variables:

| Variable | Default | Description |
|---|---|---|
| `PORT` | `3020` | API server port |
| `WHISPER_MODEL` | `large-v3` | Default Whisper model |
| `PRELOAD_MODELS` | `false` | Load models on startup |
| `CORS_ORIGINS` | `https://mana.how,...` | Allowed CORS origins |
| `MISTRAL_API_KEY` | - | Required for Voxtral API |
| `USE_VLLM` | `false` | Enable vLLM backend (experimental) |
| `VLLM_URL` | `http://localhost:8100` | vLLM server URL |
## Supported Audio Formats

- MP3, WAV, M4A, FLAC, OGG, WebM, MP4
- Max file size: 100 MB
- Any sample rate (automatically resampled to 16 kHz)
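Since oversized or unsupported files will be rejected, a client can check them before uploading. A minimal sketch mirroring the limits listed above; the extension check is only a heuristic:

```typescript
// Pre-upload checks mirroring the documented limits above.
const MAX_BYTES = 100 * 1024 * 1024; // 100 MB
const ALLOWED_EXTENSIONS = ['mp3', 'wav', 'm4a', 'flac', 'ogg', 'webm', 'mp4'];

function validateAudioFile(file: File): void {
  const ext = file.name.split('.').pop()?.toLowerCase() ?? '';
  if (!ALLOWED_EXTENSIONS.includes(ext)) {
    throw new Error(`Unsupported format: .${ext}`);
  }
  if (file.size > MAX_BYTES) {
    throw new Error(`File too large: ${(file.size / 1e6).toFixed(1)} MB (max 100 MB)`);
  }
}
```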
## Model Comparison
| Model | German WER | Speed | VRAM | License |
|---|---|---|---|---|
| Whisper Large V3 Turbo | 6-9% | Fast | ~6 GB | MIT |
| Voxtral Mini (3B) | 8-12% | Medium | ~4 GB | Apache 2.0 |
## Logs

```bash
# Service logs
tail -f /tmp/manacore-stt.log

# Error logs
tail -f /tmp/manacore-stt.error.log
```
## Troubleshooting

### Model Download Slow

The first run downloads ~1.6 GB for Whisper and ~6 GB for Voxtral. Be patient.

### Out of Memory

Reduce the batch size or switch to a smaller model:

```bash
export WHISPER_MODEL=medium
```

### MPS Not Available

Ensure PyTorch is installed with MPS support:

```bash
pip install torch torchvision torchaudio
python -c "import torch; print(torch.backends.mps.is_available())"
```
## Integration

### From Chat Backend (NestJS)

```typescript
// Assumes Node 18+ (built-in fetch/FormData/Blob). The raw Buffer is
// wrapped in a Blob, since the native FormData API only accepts
// strings and Blobs as values.
const formData = new FormData();
formData.append('file', new Blob([audioBuffer]), 'recording.webm');
formData.append('language', 'de');

const response = await fetch('http://localhost:3020/transcribe', {
  method: 'POST',
  body: formData,
});
const { text } = await response.json();
```
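Long recordings can take a while to transcribe, so it is worth guarding the request with a timeout. A sketch using `AbortController`; the 60-second budget is an arbitrary assumption, not a documented limit:

```typescript
// Abort the upload if transcription exceeds a fixed time budget.
// The 60s default is an assumption; tune it to your audio lengths.
async function transcribeWithTimeout(
  formData: FormData,
  timeoutMs = 60_000,
): Promise<string> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetch('http://localhost:3020/transcribe', {
      method: 'POST',
      body: formData,
      signal: controller.signal,
    });
    if (!response.ok) throw new Error(`STT error: HTTP ${response.status}`);
    const { text } = (await response.json()) as { text: string };
    return text;
  } finally {
    clearTimeout(timer);
  }
}
```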
### From SvelteKit Web

```typescript
const formData = new FormData();
formData.append('file', audioBlob, 'recording.webm');

const response = await fetch('https://stt-api.mana.how/transcribe', {
  method: 'POST',
  body: formData,
});
const { text } = await response.json();
```
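The `audioBlob` above typically comes from the browser's `MediaRecorder` API. A minimal capture sketch; `audio/webm` matches the filename used above, but codec support varies by browser (Safari, for instance, may record MP4 instead):

```typescript
// Record a short microphone clip and return it as a Blob for upload.
// audio/webm matches the 'recording.webm' filename above; browser
// support for this mimeType varies.
async function recordClip(durationMs: number): Promise<Blob> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });
  const chunks: BlobPart[] = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);

  const stopped = new Promise<void>((resolve) => (recorder.onstop = () => resolve()));
  recorder.start();
  setTimeout(() => recorder.stop(), durationMs);
  await stopped;

  stream.getTracks().forEach((t) => t.stop()); // release the microphone
  return new Blob(chunks, { type: 'audio/webm' });
}
```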