- Disable vLLM by default (has issues on macOS CPU)
- Use Mistral API for Voxtral transcription (cloud-based)
- Keep Whisper-MLX for local transcription
- Update README with architecture diagram
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Reduce max-model-len to 4096 for CPU compatibility
- Add max-num-batched-tokens matching the context size
- Add enforce-eager for stable CPU inference
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add vllm_service.py as proxy to vLLM server for Voxtral 3B/4B
- Add voxtral_api_service.py for Mistral API fallback
- Update main.py with /transcribe/voxtral endpoint using vLLM
- Add /transcribe/auto endpoint with automatic fallback chain
- Create setup-vllm.sh and start-vllm-voxtral.sh scripts
- Add launchd plist files for Mac Mini deployment
- Add install-services.sh for automated service installation
Architecture:
- vLLM server runs Voxtral models on port 8100
- mana-stt proxies to vLLM with Mistral API fallback
- Fallback chain: vLLM -> Mistral API
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix telegram_user_id column type (integer -> bigint) for large user IDs
- Add local STT support via mana-stt service (Whisper MLX + Voxtral)
- Add STT provider config (local/openai) with fallback support
- Add Grafana dashboard for mana-stt service metrics
- Add ollama-metrics-proxy for LLM metrics collection
- Add Grafana dashboard for Ollama LLM metrics
Services added/updated:
- telegram-project-doc-bot: local STT integration
- mana-stt: Grafana dashboard
- ollama-metrics-proxy: new service for Ollama metrics
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add mana-stt service with Whisper and Voxtral support for local
transcription. Includes setup script and launchd integration for
automatic startup on Mac Mini server.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>