mirror of https://github.com/Memo-2023/mana-monorepo.git synced 2026-05-14 22:01:09 +02:00


Till JS b8e18b7f82 chore(ai-services): adopt Windows GPU as source of truth for llm/stt/tts

The Windows GPU server has been the actual production home for these services for some time, and the running code there has drifted ahead of the repo. This sync pulls the live versions back into the repo so the Windows box is no longer the only place those changes exist.

Pulled from C:\mana\services\* on mana-server-gpu (192.168.178.11):

mana-llm:
- src/main.py, src/config.py: small fixes (auth wiring, config tweaks)
- src/api_auth.py: NEW (cross-service GPU_API_KEY validator)
- service.pyw: Windows runner used by the ManaLLM scheduled task (sets up logging redirect, loads .env, calls uvicorn)

mana-stt:
- app/main.py: substantial cleanup (684→392 lines), drops the whisperx-as-separate-backend branching now that whisper_service.py rolls whisperx in directly
- app/whisper_service.py: full CUDA + whisperx rewrite (158→358 lines)
- app/auth.py + external_auth.py: significantly expanded auth
- app/vram_manager.py: NEW (shared VRAM accounting helper)
- service.pyw: Windows runner with CUDA pre-init, FFmpeg PATH injection, .env loading
- removed: app/whisper_service_cuda.py (folded into whisper_service.py)
- removed: app/whisperx_service.py (folded into whisper_service.py)

mana-tts:
- app/auth.py, external_auth.py: same auth expansion as stt
- app/f5_service.py, kokoro_service.py: Windows tweaks
- app/vram_manager.py: NEW (same shared helper as stt)
- service.pyw: Windows runner

mana-video-gen:
- service.pyw: Windows runner (no other changes; the .py code on the GPU box is byte-identical to what's already in the repo)

The service.pyw files contain absolute Windows paths (C:\mana\services\<svc>) and a hardcoded FFmpeg PATH for the tills user profile. Kept as-is intentionally: they exist to be deployed to that one machine, and any abstraction layer would just hide what's actually happening. Anyone redeploying to a different layout will need to edit the path strings, which is a known and obvious change.

Mac-Mini infrastructure for these services (launchd plists, install scripts, scripts/mac-mini/setup-{stt,tts}.sh, the Mac-flux2c image-gen implementation) is still on disk and will be removed in a follow-up commit, along with replacing mana-image-gen with the Windows diffusers+CUDA implementation. This commit is just the live-code sync.

2026-04-08 12:46:03 +02:00
app	chore(ai-services): adopt Windows GPU as source of truth for llm/stt/tts	2026-04-08 12:46:03 +02:00
scripts	🐛 fix(mana-stt): adjust vLLM config for CPU mode	2026-02-11 16:14:14 +01:00
.env.example	chore: complete ManaCore → Mana rename (docs, go modules, plists, images)	2026-04-07 12:26:10 +02:00
CLAUDE.md	docs(services): add CLAUDE.md for stt + events, fix stale entries, flag port collisions	2026-04-08 12:23:48 +02:00
com.mana.mana-stt.plist	chore: complete ManaCore → Mana rename (docs, go modules, plists, images)	2026-04-07 12:26:10 +02:00
com.mana.vllm-voxtral.plist	chore: complete ManaCore → Mana rename (docs, go modules, plists, images)	2026-04-07 12:26:10 +02:00
grafana-dashboard.json	feat: rename ManaCore to Mana across entire codebase	2026-04-05 20:00:13 +02:00
install-service.sh	feat: rename ManaCore to Mana across entire codebase	2026-04-05 20:00:13 +02:00
install-services.sh	feat: rename ManaCore to Mana across entire codebase	2026-04-05 20:00:13 +02:00
README.md	feat(memoro): voice recording → mana-stt transcription pipeline	2026-04-07 18:48:41 +02:00
requirements-cuda.txt	chore: complete ManaCore → Mana rename (docs, go modules, plists, images)	2026-04-07 12:26:10 +02:00
requirements.txt	chore: complete ManaCore → Mana rename (docs, go modules, plists, images)	2026-04-07 12:26:10 +02:00
service.pyw	chore(ai-services): adopt Windows GPU as source of truth for llm/stt/tts	2026-04-08 12:46:03 +02:00
setup.sh	feat: rename ManaCore to Mana across entire codebase	2026-04-05 20:00:13 +02:00

README.md

Mana STT Service

Speech-to-Text API service with Whisper (Lightning MLX) and Voxtral (Mistral API).

Optimized for Mac Mini M4 (Apple Silicon).

Architecture

                    ┌─────────────────────┐
                    │   mana-stt (3020)   │
                    │    FastAPI          │
                    └─────────┬───────────┘
                              │
            ┌─────────────────┼─────────────────┐
            ▼                 ▼                 ▼
    ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
    │   Whisper    │  │  Voxtral API │  │   vLLM       │
    │  MLX (Local) │  │  (Mistral)   │  │ (Optional)   │
    └──────────────┘  └──────────────┘  └──────────────┘

Features

Whisper Large V3 - Best quality, 99+ languages, German WER 6-9% (local, MLX)
Voxtral Mini - Mistral API, speaker diarization support (cloud)
Apple Silicon Optimized - Uses MLX for fast local inference
Automatic Fallback - Falls back between backends automatically
REST API - Simple HTTP endpoints for integration

Quick Start

Installation

cd services/mana-stt
./setup.sh

Run Locally

source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3020
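Model loading can make startup slow, so a script or test harness that starts uvicorn may want to wait until the service answers before sending audio. A minimal TypeScript sketch (Node 18+, global `fetch`), assuming only that `/health` returns an HTTP 2xx status once the service is ready; the response body is not inspected because its format is not documented here. `waitForHealthy` and its parameters are hypothetical names.

```typescript
// Hypothetical helper: poll GET /health until it returns a 2xx status.
// Only `res.ok` is checked; the response body format is not assumed.
type FetchLike = (url: string) => Promise<{ ok: boolean }>;

async function waitForHealthy(
  baseUrl: string,
  attempts = 30,
  delayMs = 1000,
  fetchFn: FetchLike = (url) => fetch(url),
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      const res = await fetchFn(`${baseUrl}/health`);
      if (res.ok) return true;
    } catch {
      // Connection refused while uvicorn is still starting up: keep polling.
    }
    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}

// Usage: if (!(await waitForHealthy("http://localhost:3020"))) process.exit(1);
```

The fetch function is injectable so the polling logic can be exercised without a running server.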

Setup as System Service (Mac Mini)

./scripts/mac-mini/setup-stt.sh

API Endpoints

Endpoint	Method	Description
`/health`	GET	Health check
`/models`	GET	List available models
`/transcribe`	POST	Whisper transcription
`/transcribe/voxtral`	POST	Voxtral transcription
`/transcribe/auto`	POST	Auto-select best model
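Client code that talks to several of these endpoints can keep the table in one typed place. A minimal TypeScript sketch: the paths and methods are copied from the endpoint table, while `STT_ENDPOINTS` and `sttUrl` are hypothetical names.

```typescript
// Hypothetical typed mirror of the mana-stt endpoint table.
const STT_ENDPOINTS = {
  health: { method: "GET", path: "/health" },
  models: { method: "GET", path: "/models" },
  whisper: { method: "POST", path: "/transcribe" },
  voxtral: { method: "POST", path: "/transcribe/voxtral" },
  auto: { method: "POST", path: "/transcribe/auto" },
} as const;

type SttEndpoint = keyof typeof STT_ENDPOINTS;

// Join a base URL (with or without trailing slashes) and an endpoint path.
// Plain concatenation keeps any path prefix in baseUrl, unlike new URL(path, base).
function sttUrl(baseUrl: string, endpoint: SttEndpoint): string {
  return baseUrl.replace(/\/+$/, "") + STT_ENDPOINTS[endpoint].path;
}
```

With `as const`, a typo such as `sttUrl(base, "voxtal")` is rejected at compile time.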

Usage Examples

Transcribe with Whisper (Recommended)

curl -X POST http://localhost:3020/transcribe \
  -F "file=@recording.mp3" \
  -F "language=de"

Response:

{
  "text": "Das ist ein Beispieltext...",
  "language": "de",
  "model": "whisper-large-v3-turbo"
}
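The response is a small JSON object. A minimal TypeScript sketch of a type for it plus a runtime check, assuming only the three fields shown in the example (`text`, `language`, `model`); the service may return more fields, which this guard ignores. `TranscriptionResponse` and `isTranscriptionResponse` are hypothetical names.

```typescript
// Shape of the example response; extra fields are tolerated but not typed.
interface TranscriptionResponse {
  text: string;
  language: string;
  model: string;
}

// Runtime guard so callers fail loudly on unexpected payloads instead of
// passing an undefined `text` further into the app.
function isTranscriptionResponse(value: unknown): value is TranscriptionResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.text === "string" &&
    typeof v.language === "string" &&
    typeof v.model === "string"
  );
}
```

Typical use: `const body: unknown = await response.json(); if (!isTranscriptionResponse(body)) throw new Error("unexpected response");`.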

Transcribe with Voxtral

curl -X POST http://localhost:3020/transcribe/voxtral \
  -F "file=@recording.mp3" \
  -F "language=de"

Auto-Select Model

curl -X POST http://localhost:3020/transcribe/auto \
  -F "file=@recording.mp3" \
  -F "prefer=whisper"
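The same multipart request can be built from TypeScript. A minimal sketch for Node 18+ or the browser (global `FormData` and `Blob`), using the `file` and `prefer` field names from the curl example; `"voxtral"` as the second `prefer` value is an assumption based on the two backends, and `buildAutoForm` is a hypothetical helper.

```typescript
// "whisper" appears in the curl example; "voxtral" is an assumed second value.
type Preference = "whisper" | "voxtral";

// Build the multipart body for POST /transcribe/auto.
// Takes a Blob because Node's built-in FormData rejects raw Buffers when a
// filename is given; wrap Node buffers with `new Blob([buffer])` first.
function buildAutoForm(audio: Blob, filename: string, prefer: Preference): FormData {
  const form = new FormData();
  form.append("file", audio, filename);
  form.append("prefer", prefer);
  return form;
}
```

Pass the result as `body` to `fetch("http://localhost:3020/transcribe/auto", { method: "POST", body })`; as with curl, `fetch` sets the multipart boundary header itself.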

Configuration

Environment variables:

Variable	Default	Description
`PORT`	`3020`	API server port
`WHISPER_MODEL`	`large-v3`	Default Whisper model
`PRELOAD_MODELS`	`false`	Load models on startup
`CORS_ORIGINS`	`https://mana.how,...`	Allowed CORS origins
`MISTRAL_API_KEY`	-	Required for Voxtral API
`USE_VLLM`	`false`	Enable vLLM backend (experimental)
`VLLM_URL`	`http://localhost:8100`	vLLM server URL
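A Node-based deploy check or wrapper that needs the same settings can read them with the defaults from the table. A minimal TypeScript sketch; `readSttConfig` is hypothetical, treating only the string `"true"` (case-insensitive) as enabling `PRELOAD_MODELS` and `USE_VLLM` is an assumption about how the Python service parses booleans, and `CORS_ORIGINS` is left out because its documented default is elided.

```typescript
// Hypothetical reader for the mana-stt environment variables, using the
// defaults from the configuration table. Boolean parsing is an assumption.
interface SttConfig {
  port: number;
  whisperModel: string;
  preloadModels: boolean;
  mistralApiKey?: string; // required only for the Voxtral API
  useVllm: boolean;
  vllmUrl: string;
}

type Env = Record<string, string | undefined>;

// Assumed rule: only "true" (any case) enables a flag.
function flag(value: string | undefined): boolean {
  return value?.toLowerCase() === "true";
}

function readSttConfig(env: Env = process.env): SttConfig {
  const port = Number(env.PORT ?? "3020");
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    port,
    whisperModel: env.WHISPER_MODEL ?? "large-v3",
    preloadModels: flag(env.PRELOAD_MODELS),
    mistralApiKey: env.MISTRAL_API_KEY || undefined, // empty string counts as unset
    useVllm: flag(env.USE_VLLM),
    vllmUrl: env.VLLM_URL ?? "http://localhost:8100",
  };
}
```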

Supported Audio Formats

MP3, WAV, M4A, FLAC, OGG, WebM, MP4
Max file size: 100 MB
Any sample rate (automatically resampled to 16 kHz)
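Rejecting unsupported files on the client saves an upload round trip. A minimal TypeScript sketch, assuming the format list maps directly to file extensions and that the 100 MB limit means 100 × 1024 × 1024 bytes (the service may count differently); `checkUpload` is a hypothetical helper, and the server stays the final authority. Sample rate needs no client-side handling since the service resamples.

```typescript
// Extensions corresponding to the supported formats listed above.
const SUPPORTED_EXTENSIONS = new Set(["mp3", "wav", "m4a", "flac", "ogg", "webm", "mp4"]);
// Assumption: the 100 MB limit interpreted as 100 MiB.
const MAX_UPLOAD_BYTES = 100 * 1024 * 1024;

// Returns null if the file looks acceptable, otherwise a human-readable reason.
function checkUpload(filename: string, sizeBytes: number): string | null {
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : "";
  if (!SUPPORTED_EXTENSIONS.has(ext)) {
    return `Unsupported format: .${ext || "(none)"}`;
  }
  if (sizeBytes > MAX_UPLOAD_BYTES) {
    return `File too large: ${sizeBytes} bytes (max ${MAX_UPLOAD_BYTES})`;
  }
  return null;
}
```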

Model Comparison

Model	German WER	Speed	VRAM	License
Whisper Large V3 Turbo	6-9%	Fast	~6 GB	MIT
Voxtral Mini (3B)	8-12%	Medium	~4 GB	Apache 2.0

Logs

# Service logs
tail -f /tmp/mana-stt.log

# Error logs
tail -f /tmp/mana-stt.error.log

Troubleshooting

Model Download Slow

First run downloads ~1.6 GB for Whisper and ~6 GB for Voxtral. Be patient.

Out of Memory

Reduce the batch size or use a smaller model:

export WHISPER_MODEL=medium

MPS Not Available

Ensure PyTorch is installed with MPS support:

pip install torch torchvision torchaudio
python -c "import torch; print(torch.backends.mps.is_available())"

Integration

From Chat Backend (NestJS)

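// audioBuffer is a Node.js Buffer with the recorded audio. Node's built-in
// FormData only accepts a Blob/File when a filename is passed, so wrap it first.
const formData = new FormData();
formData.append('file', new Blob([audioBuffer], { type: 'audio/webm' }), 'recording.webm');
formData.append('language', 'de');

const response = await fetch('http://localhost:3020/transcribe', {
  method: 'POST',
  body: formData, // fetch sets the multipart Content-Type and boundary itself
});

if (!response.ok) {
  throw new Error(`mana-stt returned ${response.status}: ${await response.text()}`);
}

const { text } = await response.json();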

From SvelteKit Web

const formData = new FormData();
formData.append('file', audioBlob, 'recording.webm');

const response = await fetch('https://gpu-stt.mana.how/transcribe', {
  method: 'POST',
  body: formData,
});

if (!response.ok) {
  throw new Error(`Transcription failed with HTTP ${response.status}`);
}

const { text } = await response.json();
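Long recordings can take a while to upload and transcribe, so a browser client may want to give up after a fixed time instead of hanging. A minimal sketch using the standard `AbortSignal.timeout()`; the 120-second default and the `transcribeWithTimeout` name are hypothetical, not settings of mana-stt.

```typescript
// Hypothetical wrapper: POST audio to mana-stt and abort if no response
// arrives within timeoutMs. fetchFn is injectable for testing.
async function transcribeWithTimeout(
  url: string,
  audio: Blob,
  timeoutMs = 120_000,
  fetchFn: typeof fetch = fetch,
): Promise<string> {
  const formData = new FormData();
  formData.append("file", audio, "recording.webm");
  const response = await fetchFn(url, {
    method: "POST",
    body: formData,
    signal: AbortSignal.timeout(timeoutMs), // fetch rejects with a "TimeoutError" on expiry
  });
  if (!response.ok) {
    throw new Error(`Transcription failed with HTTP ${response.status}`);
  }
  const { text } = (await response.json()) as { text: string };
  return text;
}
```

Usage: `const text = await transcribeWithTimeout('https://gpu-stt.mana.how/transcribe', audioBlob);`.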