Till-JS bf0fa04e7e feat(stt): add speech-to-text service for Mac Mini
Add mana-stt service with Whisper and Voxtral support for local
transcription. Includes setup script and launchd integration for
automatic startup on Mac Mini server.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 01:33:10 +01:00

ManaCore STT Service

Speech-to-Text API service with Whisper (Lightning MLX) and Voxtral Mini.

Optimized for Mac Mini M4 (Apple Silicon).

Features

  • Whisper Large V3 Turbo - Best quality, 99+ languages, German WER 6-9%
  • Voxtral Mini (3B) - Mistral AI, Apache 2.0, 8 languages including German
  • Apple Silicon Optimized - Uses MLX for 10x faster inference
  • REST API - Simple HTTP endpoints for integration

Quick Start

Installation

cd services/mana-stt
./setup.sh

Run Locally

source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3020

Setup as System Service (Mac Mini)

./scripts/mac-mini/setup-stt.sh

API Endpoints

| Endpoint            | Method | Description            |
|---------------------|--------|------------------------|
| /health             | GET    | Health check           |
| /models             | GET    | List available models  |
| /transcribe         | POST   | Whisper transcription  |
| /transcribe/voxtral | POST   | Voxtral transcription  |
| /transcribe/auto    | POST   | Auto-select best model |
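
A client may want to wait until the service answers on /health before sending audio, for example right after the system service starts. The following is a minimal Python sketch, assuming only that /health returns HTTP 200 when the service is up (the response body is not documented here, so it is ignored). `http_health_check` and `wait_until_healthy` are hypothetical helper names, and the base URL is the local default from Quick Start.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_health_check(base_url: str = "http://localhost:3020") -> bool:
    """Return True if GET /health answers with HTTP 200 (the body is ignored)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as "not ready".
        return False


def wait_until_healthy(
    check: Callable[[], bool] = http_health_check,
    attempts: int = 10,
    delay_s: float = 1.0,
) -> bool:
    """Poll `check` up to `attempts` times, sleeping `delay_s` between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

The check is passed in as a callable so the polling loop can be exercised without a running server.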

Usage Examples

Transcribe with Whisper

curl -X POST http://localhost:3020/transcribe \
  -F "file=@recording.mp3" \
  -F "language=de"

Response:

{
  "text": "Das ist ein Beispieltext...",
  "language": "de",
  "model": "whisper-large-v3-turbo"
}
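
The same request can be made from Python. The sketch below uses only the standard library and builds the multipart body by hand; it assumes the form field names from the curl call (`file`, `language`) and the response fields from the example response (`text`, `language`, `model`). `Transcription`, `build_multipart`, `parse_transcription`, and `transcribe` are hypothetical names chosen for illustration, not part of the service.

```python
import json
import mimetypes
import urllib.request
import uuid
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Transcription:
    text: str
    language: str
    model: str


def build_multipart(fields: dict[str, str], filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode text fields plus one file part named "file" as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    )
    parts.append(file_header.encode() + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def parse_transcription(raw: str) -> Transcription:
    """Parse the JSON response shown in the example above."""
    payload = json.loads(raw)
    return Transcription(payload["text"], payload["language"], payload["model"])


def transcribe(path: str, language: str = "de", base_url: str = "http://localhost:3020") -> Transcription:
    """POST an audio file to /transcribe and return the parsed result."""
    audio = Path(path)
    body, content_type = build_multipart({"language": language}, audio.name, audio.read_bytes())
    request = urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return parse_transcription(resp.read().decode("utf-8"))
```

With the service running locally, `transcribe("recording.mp3").text` would hold the transcript. An HTTP client library with built-in multipart support would shorten this considerably; the standard library is used here only to keep the sketch dependency-free.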

Transcribe with Voxtral

curl -X POST http://localhost:3020/transcribe/voxtral \
  -F "file=@recording.mp3" \
  -F "language=de"

Auto-Select Model

curl -X POST http://localhost:3020/transcribe/auto \
  -F "file=@recording.mp3" \
  -F "prefer=whisper"

Configuration

Environment variables:

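| Variable       | Default              | Description            |
|----------------|----------------------|------------------------|
| PORT           | 3020                 | API server port        |
| WHISPER_MODEL  | large-v3-turbo       | Default Whisper model  |
| PRELOAD_MODELS | false                | Load models on startup |
| CORS_ORIGINS   | https://mana.how,... | Allowed CORS origins   |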
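
As an illustration of how these variables and defaults fit together, here is a small Python sketch that reads them into a settings object. It is not the service's actual configuration code: `Settings` and `load_settings` are hypothetical names, the accepted spellings for a true `PRELOAD_MODELS` are an assumption, and because the default `CORS_ORIGINS` list is truncated above, the sketch falls back to the single origin that is shown.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    port: int
    whisper_model: str
    preload_models: bool
    cors_origins: tuple[str, ...]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the documented variables from `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    # Assumption: which spellings count as "true" is a guess, not documented behavior.
    preload = env.get("PRELOAD_MODELS", "false").strip().lower() in {"1", "true", "yes"}
    # The documented default list is truncated, so only the origin it shows is used here.
    origins = env.get("CORS_ORIGINS", "https://mana.how")
    return Settings(
        port=int(env.get("PORT", "3020")),
        whisper_model=env.get("WHISPER_MODEL", "large-v3-turbo"),
        preload_models=preload,
        # Assumption: origins are comma-separated, as the default value suggests.
        cors_origins=tuple(o.strip() for o in origins.split(",") if o.strip()),
    )
```

Passing a plain dict instead of relying on `os.environ` makes the defaults easy to check in isolation.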

Supported Audio Formats

  • MP3, WAV, M4A, FLAC, OGG, WebM, MP4
  • Max file size: 100MB
  • Any sample rate (automatically resampled to 16kHz)
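
A client can check these limits before uploading to avoid a wasted request. The sketch below is a minimal Python example, assuming the format is judged by file extension and that "100MB" means 100 * 1024 * 1024 bytes; the service may apply different rules. `check_upload` is a hypothetical helper name.

```python
from pathlib import Path

# Extensions matching the formats listed above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".webm", ".mp4"}
# Assumption: "100MB" is read as 100 * 1024 * 1024 bytes; the service may count differently.
MAX_FILE_BYTES = 100 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes (limit {MAX_FILE_BYTES})")
    return problems
```

For example, `check_upload("recording.mp3", 1024)` returns an empty list, while a `.txt` file yields one "unsupported format" entry.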

Model Comparison

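| Model                  | German WER | Speed  | VRAM  | License    |
|------------------------|------------|--------|-------|------------|
| Whisper Large V3 Turbo | 6-9%       | Fast   | ~6 GB | MIT        |
| Voxtral Mini (3B)      | 8-12%      | Medium | ~4 GB | Apache 2.0 |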
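
The comparison can also be expressed as data, which is handy when a client has to choose an endpoint itself. The Python sketch below picks the model with the lowest German WER that fits a given memory budget. The figures are the approximate values from the comparison, and the selection rule is purely illustrative: it is not a description of what /transcribe/auto does. `ModelInfo`, `MODELS`, and `pick_model` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelInfo:
    name: str
    endpoint: str
    german_wer_pct: tuple[float, float]  # (low, high) range from the comparison
    approx_memory_gb: float


MODELS = [
    ModelInfo("Whisper Large V3 Turbo", "/transcribe", (6.0, 9.0), 6.0),
    ModelInfo("Voxtral Mini (3B)", "/transcribe/voxtral", (8.0, 12.0), 4.0),
]


def pick_model(available_memory_gb: float) -> Optional[ModelInfo]:
    """Pick the lowest-WER model that fits the memory budget, or None if none fits."""
    fitting = [m for m in MODELS if m.approx_memory_gb <= available_memory_gb]
    # Compare on the lower bound of the WER range; ties keep list order.
    return min(fitting, key=lambda m: m.german_wer_pct[0], default=None)
```

With these numbers, a budget of 8 GB selects Whisper, 5 GB selects Voxtral, and 2 GB selects nothing.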

Logs

# Service logs
tail -f /tmp/manacore-stt.log

# Error logs
tail -f /tmp/manacore-stt.error.log

Troubleshooting

Model Download Slow

The first run downloads ~1.6 GB for Whisper and ~6 GB for Voxtral, so expect a delay before the first transcription completes.

Out of Memory

Reduce the batch size or use a smaller model:

export WHISPER_MODEL=medium

MPS Not Available

Ensure PyTorch is installed with MPS support:

pip install torch torchvision torchaudio
python -c "import torch; print(torch.backends.mps.is_available())"

Integration

From Chat Backend (NestJS)

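// With Node's built-in FormData (Node 18+), a file part must be a Blob or File
// when a filename is passed, so wrap the Buffer before appending it.
const formData = new FormData();
formData.append('file', new Blob([audioBuffer], { type: 'audio/webm' }), 'recording.webm');
formData.append('language', 'de');

const response = await fetch('http://localhost:3020/transcribe', {
  method: 'POST',
  body: formData,
});

// Surface HTTP errors instead of failing later while reading the body.
if (!response.ok) {
  throw new Error(`STT request failed with status ${response.status}`);
}

const { text } = await response.json();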

From SvelteKit Web

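// audioBlob is the recorded audio as a Blob (for example from MediaRecorder).
const formData = new FormData();
formData.append('file', audioBlob, 'recording.webm');

const response = await fetch('https://stt-api.mana.how/transcribe', {
  method: 'POST',
  body: formData,
});

// Surface HTTP errors instead of failing later while reading the body.
if (!response.ok) {
  throw new Error(`STT request failed with status ${response.status}`);
}

const { text } = await response.json();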