Commit graph

13 commits

Author SHA1 Message Date
Till JS
16e0d99c5a feat(gpu-server): complete GPU server setup with AI services, monitoring, and public access
- Set up 5 AI services on Windows GPU server (RTX 3090):
  - mana-llm (Port 3025): OpenAI-compatible LLM gateway via Ollama
  - mana-stt (Port 3020): WhisperX with word timestamps + speaker diarization
  - mana-tts (Port 3022): Kokoro (EN) + Edge TTS (DE) + Piper (local DE)
  - mana-image-gen (Port 3023): FLUX.2 klein 4B image generation
  - Ollama (Port 11434): gemma3:4b/12b, qwen2.5-coder:14b, nomic-embed-text

- Add @manacore/shared-gpu TypeScript client package with SttClient, TtsClient, ImageClient
- Add CUDA-compatible whisper_service using faster-whisper for Windows
- Configure public access via Cloudflare Tunnel (gpu-llm/stt/tts/img.mana.how)
- Add Loki log aggregator (Docker on Mac Mini) + log shipper on GPU server
- Add GPU scrape targets to Prometheus/VictoriaMetrics config
- Add Grafana Loki datasource for GPU service logs
- Add health check with auto-restart, log rotation, and log shipping
- Document complete setup: Always-On config, troubleshooting, architecture

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 21:35:30 +01:00
Till-JS
8b6ff0c679 feat(auth): add API key management for STT/TTS services
- Add api_keys schema in mana-core-auth with SHA-256 hashing
- Create NestJS module with CRUD endpoints and validation
- Add external auth module to STT/TTS for sk_live_ key validation
- Create web UI page at /api-keys for key management
- Support rate limiting per key with configurable limits
- Cache validation results for 5 minutes to reduce auth service load

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-12 02:12:05 +01:00
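
The hashing and caching behaviour described in 8b6ff0c679 could look roughly like the sketch below. It assumes keys are stored only as SHA-256 hex digests and that validation results are cached for 300 seconds. `generate_api_key`, `hash_api_key`, `CachedValidator`, and the `lookup` callback are hypothetical names for illustration, not the actual mana-core-auth API.

```python
import hashlib
import secrets
import time

KEY_PREFIX = "sk_live_"  # prefix named in the commit message


def generate_api_key() -> str:
    # Hypothetical helper: random key with the sk_live_ prefix.
    return KEY_PREFIX + secrets.token_urlsafe(32)


def hash_api_key(key: str) -> str:
    # Only the SHA-256 digest would be persisted, never the raw key.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


class CachedValidator:
    """Caches validation results to reduce load on the auth service."""

    def __init__(self, lookup, ttl_seconds=300.0, clock=time.monotonic):
        self.lookup = lookup      # assumed callable: digest -> bool
        self.ttl = ttl_seconds    # 5 minutes, as in the commit message
        self.clock = clock
        self._cache = {}          # digest -> (result, expires_at)

    def validate(self, key: str) -> bool:
        if not key.startswith(KEY_PREFIX):
            return False
        digest = hash_api_key(key)
        now = self.clock()
        cached = self._cache.get(digest)
        if cached is not None and now < cached[1]:
            return cached[0]
        result = bool(self.lookup(digest))
        self._cache[digest] = (result, now + self.ttl)
        return result
```

Caching by digest rather than by raw key keeps the plaintext key out of the in-memory cache as well as out of storage.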
Till-JS
898f5d2112 🔧 chore(stt,tts): update launchd plists to load .env files
Source .env file before starting uvicorn to enable API key auth
and other environment-based configuration.

Removes hardcoded PORT values in favor of .env configuration.
2026-02-12 01:44:46 +01:00
Till-JS
aab304fc95 🔒️ feat(stt,tts): add API key authentication with rate limiting
Add auth.py module to both STT and TTS services with:
- API key validation via X-API-Key header
- Rate limiting with sliding window (requests per minute)
- Internal API key option for unlimited access
- Environment variable configuration

All protected endpoints now require authentication.
Public endpoints (/health, /docs) remain accessible.
2026-02-11 18:04:22 +01:00
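
As an illustration of the sliding-window rate limiting mentioned in aab304fc95, here is a minimal sketch. It assumes a per-key limit in requests per minute and an optional internal key that bypasses the limit. The class name, parameters, and in-memory storage are assumptions and do not reproduce the actual auth.py module.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allows at most `limit` requests per key within the last `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0,
                 internal_key: Optional[str] = None, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.internal_key = internal_key  # hypothetical unlimited-access key
        self.clock = clock
        self._hits = defaultdict(deque)   # api_key -> timestamps of accepted requests

    def allow(self, api_key: str) -> bool:
        if self.internal_key is not None and api_key == self.internal_key:
            return True  # internal callers are not rate limited
        now = self.clock()
        hits = self._hits[api_key]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A request handler would call `allow()` with the value of the X-API-Key header and reject the request when it returns `False`.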
Till-JS
21d50d1e0b 📝 docs(mana-stt): document Whisper + Mistral API architecture
- Disable vLLM by default (has issues on macOS CPU)
- Use Mistral API for Voxtral transcription (cloud-based)
- Keep Whisper-MLX for local transcription
- Update README with architecture diagram

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 16:34:03 +01:00
Till-JS
7c9c2645e3 🐛 fix(mana-stt): adjust vLLM config for CPU mode
- Reduce max-model-len to 4096 for CPU compatibility
- Add max-num-batched-tokens matching the context size
- Add enforce-eager for stable CPU inference

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 16:14:14 +01:00
Till-JS
60394076e5 feat(mana-stt): add vLLM integration for Voxtral transcription
- Add vllm_service.py as proxy to vLLM server for Voxtral 3B/4B
- Add voxtral_api_service.py for Mistral API fallback
- Update main.py with /transcribe/voxtral endpoint using vLLM
- Add /transcribe/auto endpoint with automatic fallback chain
- Create setup-vllm.sh and start-vllm-voxtral.sh scripts
- Add launchd plist files for Mac Mini deployment
- Add install-services.sh for automated service installation

Architecture:
- vLLM server runs Voxtral models on port 8100
- mana-stt proxies to vLLM with Mistral API fallback
- Fallback chain: vLLM -> Mistral API

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 16:10:00 +01:00
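
The fallback chain described in 60394076e5 (vLLM first, then the Mistral API) can be sketched as an ordered list of backends tried in turn. The function name and the shape of the backend callables are assumptions for illustration; the real /transcribe/auto endpoint may be structured differently.

```python
from typing import Callable, List, Tuple

Backend = Tuple[str, Callable[[bytes], str]]  # (name, transcribe function)


def transcribe_with_fallback(audio: bytes, backends: List[Backend]) -> Tuple[str, str]:
    """Tries each backend in order and returns (backend_name, transcript)."""
    errors = []
    for name, transcribe in backends:
        try:
            return name, transcribe(audio)
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

With backends ordered as `[("vllm", ...), ("mistral-api", ...)]`, the cloud API is only contacted when the local vLLM server raises an error.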
Till-JS
6402f287e8 feat(telegram-bot): add local STT support and Prometheus metrics
- Fix telegram_user_id column type (integer -> bigint) for large user IDs
- Add local STT support via mana-stt service (Whisper MLX + Voxtral)
- Add STT provider config (local/openai) with fallback support
- Add Grafana dashboard for mana-stt service metrics
- Add ollama-metrics-proxy for LLM metrics collection
- Add Grafana dashboard for Ollama LLM metrics

Services added/updated:
- telegram-project-doc-bot: local STT integration
- mana-stt: Grafana dashboard
- ollama-metrics-proxy: new service for Ollama metrics

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 16:51:09 +01:00
Till-JS
bff80b552a fix(stt): remove unsupported add_generation_prompt kwarg
2026-01-27 03:24:43 +01:00
Till-JS
a2233dc366 fix(stt): properly encode audio as base64 for Voxtral
2026-01-27 02:13:34 +01:00
Till-JS
49255ac794 fix(stt): use correct AutoModel for Voxtral multimodal architecture
2026-01-27 01:58:32 +01:00
Till-JS
92a700ac7e fix(stt): change default model to large-v3 (large-v3-turbo not supported by lightning-whisper-mlx)
2026-01-27 01:36:49 +01:00
Till-JS
bf0fa04e7e feat(stt): add speech-to-text service for Mac Mini
Add mana-stt service with Whisper and Voxtral support for local
transcription. Includes setup script and launchd integration for
automatic startup on Mac Mini server.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 01:33:10 +01:00