Till-JS f4d8ed491c feat(mana-voice-bot): add German voice-to-voice assistant service
Complete voice pipeline combining:
- STT: Whisper (mana-stt)
- LLM: Ollama (Gemma/Qwen)
- TTS: Edge TTS (15 German voices)

Endpoints:
- /voice - Full audio-to-audio pipeline
- /chat/audio - Text-to-audio
- /tts - Direct TTS
- /transcribe - STT only

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 02:21:13 +01:00


CLAUDE.md - Mana Voice Bot

Service Overview

German voice-to-voice assistant combining:

  • STT: Whisper via mana-stt (Port 3020)
  • LLM: Ollama with Gemma/Qwen (Port 11434)
  • TTS: Edge TTS (Microsoft, cloud API)

Port: 3050

Architecture

```text
Audio Input → Whisper (STT) → Ollama (LLM) → Edge TTS → Audio Output
     ↓              ↓              ↓              ↓
  [WAV/MP3]    [German Text]  [Response]     [MP3 Audio]
```
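As drawn, the pipeline is sequential: each stage waits for the previous stage's output. The following is a minimal sketch of that orchestration. It assumes each stage is exposed as an async callable. The names `run_voice_pipeline`, `PipelineResult` and the `fake_*` stages are hypothetical and do not come from the service's code. The demo uses offline stand-ins instead of calling mana-stt, Ollama or Edge TTS.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict

# Hypothetical stage signatures (not the service's real internals).
SttFn = Callable[[bytes], Awaitable[str]]    # audio bytes -> German transcript
LlmFn = Callable[[str], Awaitable[str]]      # transcript -> assistant reply
TtsFn = Callable[[str], Awaitable[bytes]]    # reply text -> MP3 bytes


@dataclass
class PipelineResult:
    transcript: str
    response: str
    audio: bytes
    timings: Dict[str, float] = field(default_factory=dict)


async def run_voice_pipeline(audio: bytes, stt: SttFn, llm: LlmFn, tts: TtsFn) -> PipelineResult:
    """Run STT -> LLM -> TTS one after another and record each stage's wall-clock time."""
    timings: Dict[str, float] = {}

    start = time.perf_counter()
    transcript = await stt(audio)
    timings["stt"] = time.perf_counter() - start

    start = time.perf_counter()
    response = await llm(transcript)
    timings["llm"] = time.perf_counter() - start

    start = time.perf_counter()
    speech = await tts(response)
    timings["tts"] = time.perf_counter() - start

    return PipelineResult(transcript, response, speech, timings)


# Demo with offline stand-ins; real stages would call mana-stt, Ollama and Edge TTS.
async def fake_stt(audio: bytes) -> str:
    return "Was ist die Hauptstadt von Deutschland?"


async def fake_llm(prompt: str) -> str:
    return "Berlin."


async def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


result = asyncio.run(run_voice_pipeline(b"fake-wav-bytes", fake_stt, fake_llm, fake_tts))
print(result.response, sorted(result.timings))
```

Passing the stages in as parameters keeps the orchestration testable without a running STT server, Ollama instance or network access.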

Commands

```bash
# Setup
./setup.sh

# Development
source venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3050 --reload

# Production
./start.sh

# Test
curl http://localhost:3050/health
```
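After `./start.sh`, a script may need to wait until the service answers before sending audio. The following is a minimal standard-library sketch that polls `/health`. It only checks for HTTP 200, because this document does not describe the response body. The function names and the retry defaults are hypothetical. The `probe` parameter lets the polling logic run without a live server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:3050/health"


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # server answered with a 4xx/5xx status
        return err.code
    except OSError:  # connection refused, DNS failure, timeout (URLError is an OSError)
        return 0


def wait_until_healthy(
    url: str = HEALTH_URL,
    attempts: int = 10,
    delay: float = 1.0,
    probe: Callable[[str], int] = http_status,
) -> bool:
    """Poll the health endpoint until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```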

API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Service health check |
| `/voices` | GET | List German TTS voices |
| `/models` | GET | List available Ollama models |
| `/transcribe` | POST | Audio → Text (STT only) |
| `/chat` | POST | Text → Text (LLM only) |
| `/chat/audio` | POST | Text → Audio (LLM + TTS) |
| `/tts` | POST | Text → Audio (TTS only) |
| `/voice` | POST | Audio → Audio (full pipeline) |
| `/voice/metadata` | POST | Audio → JSON (full pipeline, no audio) |

Usage Examples

Full Voice Pipeline

```bash
# Record audio and send to voice bot
curl -X POST http://localhost:3050/voice \
  -F "audio=@input.wav" \
  -F "model=gemma3:4b" \
  -F "voice=de-DE-ConradNeural" \
  -o response.mp3
```
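The same `/voice` call can be made from Python without third-party packages. The following sketch builds the multipart/form-data body that `curl -F` produces by hand. It assumes the service accepts standard multipart uploads with the field names from the curl example (`audio`, `model`, `voice`). `encode_multipart` and `build_voice_request` are hypothetical helpers, not part of the service.

```python
import mimetypes
import urllib.request
import uuid
from typing import Dict, Tuple


def encode_multipart(fields: Dict[str, str], files: Dict[str, Tuple[str, bytes]]) -> Tuple[bytes, str]:
    """Encode text fields and file parts as multipart/form-data; return (body, content type)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    for name, (filename, data) in files.items():
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode(),
            f"Content-Type: {ctype}".encode(),
            b"",
            data,
        ]
    parts += [f"--{boundary}--".encode(), b""]  # closing boundary plus trailing CRLF
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def build_voice_request(audio: bytes, filename: str = "input.wav", model: str = "gemma3:4b",
                        voice: str = "de-DE-ConradNeural",
                        base_url: str = "http://localhost:3050") -> urllib.request.Request:
    """Build (but do not send) the POST /voice request from the curl example."""
    body, content_type = encode_multipart(
        {"model": model, "voice": voice},
        {"audio": (filename, audio)},
    )
    return urllib.request.Request(
        f"{base_url}/voice", data=body, method="POST", headers={"Content-Type": content_type}
    )


# Usage against a running service:
#   from pathlib import Path
#   req = build_voice_request(Path("input.wav").read_bytes())
#   with urllib.request.urlopen(req, timeout=60) as resp:
#       Path("response.mp3").write_bytes(resp.read())
```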

Text to Audio

```bash
curl -X POST http://localhost:3050/chat/audio \
  -H "Content-Type: application/json" \
  -d '{"message": "Was ist die Hauptstadt von Deutschland?", "voice": "de-DE-KatjaNeural"}' \
  -o response.mp3
```
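`/chat/audio` takes a JSON body rather than form fields. The following is a minimal standard-library sketch of the same request. It only uses the two keys shown in the curl example (`message`, `voice`); any other fields the endpoint may accept are not covered. `build_chat_audio_request` and `save_audio` are hypothetical names, and the 60-second timeout is an arbitrary, generous choice that leaves room for the LLM stage.

```python
import json
import urllib.request
from pathlib import Path


def build_chat_audio_request(message: str, voice: str = "de-DE-KatjaNeural",
                             base_url: str = "http://localhost:3050") -> urllib.request.Request:
    """Build the POST /chat/audio request with the same JSON body as the curl example."""
    payload = json.dumps({"message": message, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/audio",
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def save_audio(req: urllib.request.Request, out_path: str, timeout: float = 60.0) -> None:
    """Send the request and write the returned audio bytes (MP3, per the example) to disk."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        Path(out_path).write_bytes(resp.read())


# Usage against a running service:
#   save_audio(build_chat_audio_request("Was ist die Hauptstadt von Deutschland?"), "response.mp3")
```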

TTS Only

```bash
curl -X POST http://localhost:3050/tts \
  -F "text=Hallo, wie geht es dir?" \
  -F "voice=de-DE-ConradNeural" \
  -o hello.mp3
```

German Voices

| Voice ID | Description |
|---|---|
| `de-DE-ConradNeural` | Male - Professional (Default) |
| `de-DE-KatjaNeural` | Female - Natural |
| `de-DE-AmalaNeural` | Female - Friendly |
| `de-DE-BerndNeural` | Male - Calm |
| `de-DE-ChristophNeural` | Male - News |
| `de-DE-ElkeNeural` | Female - Warm |
| `de-DE-KillianNeural` | Male - Casual |
| `de-DE-KlarissaNeural` | Female - Cheerful |
| `de-DE-KlausNeural` | Male - Storyteller |
| `de-DE-LouisaNeural` | Female - Assistant |
| `de-DE-TanjaNeural` | Female - Business |
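A client may want to reject an unknown voice ID before making a TTS request. The following sketch mirrors the table above as a lookup and falls back to the documented default. It is a hypothetical client-side helper (`resolve_voice`, `GERMAN_VOICES`). `GET /voices` remains the authoritative list at runtime.

```python
from typing import Dict, Optional

# Voice IDs copied from the table above; GET /voices is the authoritative list at runtime.
GERMAN_VOICES: Dict[str, str] = {
    "de-DE-ConradNeural": "Male - Professional",
    "de-DE-KatjaNeural": "Female - Natural",
    "de-DE-AmalaNeural": "Female - Friendly",
    "de-DE-BerndNeural": "Male - Calm",
    "de-DE-ChristophNeural": "Male - News",
    "de-DE-ElkeNeural": "Female - Warm",
    "de-DE-KillianNeural": "Male - Casual",
    "de-DE-KlarissaNeural": "Female - Cheerful",
    "de-DE-KlausNeural": "Male - Storyteller",
    "de-DE-LouisaNeural": "Female - Assistant",
    "de-DE-TanjaNeural": "Female - Business",
}
DEFAULT_VOICE = "de-DE-ConradNeural"


def resolve_voice(requested: Optional[str] = None) -> str:
    """Return the requested voice ID, the default when none is given, or raise if unknown."""
    if not requested:
        return DEFAULT_VOICE
    if requested not in GERMAN_VOICES:
        raise ValueError(f"unknown voice {requested!r}; expected one of {sorted(GERMAN_VOICES)}")
    return requested
```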

Environment Variables

| Variable | Default | Description |
|---|---|---|
| `PORT` | `3050` | Service port |
| `STT_URL` | `http://localhost:3020` | mana-stt URL |
| `OLLAMA_URL` | `http://localhost:11434` | Ollama URL |
| `DEFAULT_MODEL` | `gemma3:4b` | Default LLM model |
| `DEFAULT_VOICE` | `de-DE-ConradNeural` | Default TTS voice |
| `SYSTEM_PROMPT` | (German assistant prompt) | LLM system prompt |
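The following sketch reads these variables with the defaults from the table. It is illustrative only: the service's actual settings code may be structured differently, and `Settings`/`load_settings` are hypothetical names. The built-in German system prompt is not reproduced in this document, so an unset `SYSTEM_PROMPT` is represented as `None`. Passing `env` explicitly makes the loader testable without touching the real environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    port: int
    stt_url: str
    ollama_url: str
    default_model: str
    default_voice: str
    system_prompt: Optional[str]  # None means "use the built-in German assistant prompt"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the defaults from the table."""
    return Settings(
        port=int(env.get("PORT", "3050")),
        # Trailing slashes are stripped so endpoint paths can be appended safely.
        stt_url=env.get("STT_URL", "http://localhost:3020").rstrip("/"),
        ollama_url=env.get("OLLAMA_URL", "http://localhost:11434").rstrip("/"),
        default_model=env.get("DEFAULT_MODEL", "gemma3:4b"),
        default_voice=env.get("DEFAULT_VOICE", "de-DE-ConradNeural"),
        system_prompt=env.get("SYSTEM_PROMPT"),
    )
```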

Dependencies

  • fastapi - Web framework
  • uvicorn - ASGI server
  • aiohttp - Async HTTP client
  • edge-tts - Microsoft TTS
  • python-multipart - File uploads

Performance

Typical latency breakdown:

  • STT (Whisper): 0.5-2s
  • LLM (Gemma 4B): 1-5s
  • TTS (Edge): 0.3-0.5s
  • Total: roughly 2-7.5s (the stages run one after another, so the ranges add up)
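Summing the per-stage ranges gives the end-to-end bounds, assuming the three stages run strictly one after another as in the architecture diagram. Overhead outside the stages, such as uploading the audio, is not counted. The stage figures below are copied from the list above. `end_to_end_range` is a hypothetical helper.

```python
from typing import Dict, Tuple

# Per-stage latency ranges in seconds, copied from the list above.
STAGE_LATENCY_S: Dict[str, Tuple[float, float]] = {
    "stt": (0.5, 2.0),   # Whisper via mana-stt
    "llm": (1.0, 5.0),   # Gemma 4B via Ollama
    "tts": (0.3, 0.5),   # Edge TTS
}


def end_to_end_range(stages: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Sequential stages: best and worst cases are the sums of the per-stage bounds."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


low, high = end_to_end_range(STAGE_LATENCY_S)
print(f"{low:.1f}-{high:.1f} s")  # prints "1.8-7.5 s"
```

The LLM stage dominates both bounds, so choosing a smaller model has the largest effect on total latency.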

Mac Mini Deployment

```bash
# On Mac Mini
cd ~/projects/manacore-monorepo/services/mana-voice-bot
./setup.sh
./start.sh

# Or with launchd (autostart)
# See scripts/mac-mini/setup-voice-bot.sh
```