till/managarten

mirror of https://github.com/Memo-2023/mana-monorepo.git synced 2026-05-15 20:59:40 +02:00

Commit graph

Author	SHA1	Message	Date

Till-JS

7c9c2645e3

🐛 fix(mana-stt): adjust vLLM config for CPU mode

- Reduce max-model-len to 4096 for CPU compatibility
- Add max-num-batched-tokens matching the context size
- Add enforce-eager for stable CPU inference

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-11 16:14:14 +01:00
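
The three CPU-mode settings named in commit 7c9c2645e3 correspond to the vLLM server options `--max-model-len`, `--max-num-batched-tokens`, and `--enforce-eager`. The following is a minimal sketch of how such a command line might be assembled, not the contents of the repository's start script. The function name `build_vllm_cpu_args` and the model placeholder are assumptions for illustration; port 8100 is taken from the architecture notes in the earlier commit.

```python
import shlex


def build_vllm_cpu_args(model: str, context_len: int = 4096, port: int = 8100) -> list[str]:
    """Assemble a `vllm serve` command line for CPU inference (illustrative helper)."""
    return [
        "vllm", "serve", model,
        "--port", str(port),
        # Context window reduced to 4096 tokens for CPU compatibility.
        "--max-model-len", str(context_len),
        # Batched-token budget set to match the context size.
        "--max-num-batched-tokens", str(context_len),
        # Eager execution, as the commit adds for stable CPU inference.
        "--enforce-eager",
    ]


if __name__ == "__main__":
    # "<voxtral-model-id>" is a placeholder; the commit does not name the model ID.
    print(shlex.join(build_vllm_cpu_args("<voxtral-model-id>")))
```

Deriving both limits from one `context_len` parameter keeps them equal, which is what the commit describes.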

Till-JS

60394076e5

✨ feat(mana-stt): add vLLM integration for Voxtral transcription

- Add vllm_service.py as proxy to vLLM server for Voxtral 3B/4B
- Add voxtral_api_service.py for Mistral API fallback
- Update main.py with /transcribe/voxtral endpoint using vLLM
- Add /transcribe/auto endpoint with automatic fallback chain
- Create setup-vllm.sh and start-vllm-voxtral.sh scripts
- Add launchd plist files for Mac Mini deployment
- Add install-services.sh for automated service installation

Architecture:
- vLLM server runs Voxtral models on port 8100
- mana-stt proxies to vLLM with Mistral API fallback
- Fallback chain: vLLM -> Mistral API

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-11 16:10:00 +01:00
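
The fallback chain described in commit 60394076e5 (vLLM first, then the Mistral API) can be sketched as an ordered list of backends tried in turn. This is a hedged illustration of the pattern only: `transcribe_auto`, `TranscriptionError`, and the stub backends are hypothetical names, and the actual code in `main.py`, `vllm_service.py`, and `voxtral_api_service.py` may be structured differently.

```python
from typing import Callable, Sequence

# A backend is a (name, function) pair; the function maps audio bytes to text.
Backend = tuple[str, Callable[[bytes], str]]


class TranscriptionError(Exception):
    """Raised when every backend in the chain has failed."""


def transcribe_auto(audio: bytes, backends: Sequence[Backend]) -> tuple[str, str]:
    """Try each backend in order and return (backend_name, transcript) from the first success."""
    errors = []
    for name, backend in backends:
        try:
            return name, backend(audio)
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(f"{name}: {exc}")
    raise TranscriptionError("all backends failed: " + "; ".join(errors))


# Stub backends standing in for the real services (assumptions, no network calls).
def vllm_stub(audio: bytes) -> str:
    raise ConnectionError("vLLM server not reachable on port 8100")


def mistral_api_stub(audio: bytes) -> str:
    return "hello world"


if __name__ == "__main__":
    chain = [("vllm", vllm_stub), ("mistral-api", mistral_api_stub)]
    print(transcribe_auto(b"\x00\x01", chain))
```

Returning the backend name alongside the transcript lets a caller report which service actually handled the request.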

2 commits