mirror of https://github.com/Memo-2023/mana-monorepo.git synced 2026-05-16 22:39:41 +02:00

Till-JS e357f9f292 feat(matrix-stt-bot): add speech-to-text Matrix bot

- New bot that transcribes voice messages to text
- Uses mana-stt service (Whisper/Voxtral) for transcription
- Supports German and English with auto-detection
- Commands: !language, !model, !status, !help
- Runs on port 3024

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-14 14:29:34 +01:00


Matrix STT Bot - Claude Code Guidelines

Overview

Matrix STT Bot transcribes audio and voice messages and posts the transcription back to the room as a text message. It uses the mana-stt service (port 3020) for transcription.

Tech Stack

Framework: NestJS 10
Matrix: matrix-bot-sdk
STT Backend: mana-stt service (Whisper, Voxtral)

Commands

# Development
pnpm install
pnpm start:dev        # Start with hot reload

# Build
pnpm build            # Production build

# Type check
pnpm type-check       # Check TypeScript types

Project Structure

services/matrix-stt-bot/
├── src/
│   ├── main.ts               # Application entry point (port 3024)
│   ├── app.module.ts         # Root module
│   ├── config/
│   │   └── configuration.ts  # Configuration & help text
│   ├── bot/
│   │   ├── bot.module.ts
│   │   └── matrix.service.ts # Matrix client & message handler
│   └── stt/
│       ├── stt.module.ts
│       └── stt.service.ts    # mana-stt API client
├── Dockerfile
└── package.json

Bot Commands

Command	Description
`!help` / `!hilfe`	Show help text
`!language [de|en|auto]`	Change transcription language
`!model [whisper|voxtral|auto]`	Change STT model
`!status`	Show current settings
(voice message)	Transcribe to text
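The command table above could be parsed along these lines. This is a minimal sketch, not the bot's actual handler: `parseCommand` and the `Command` type are hypothetical names, and treating a missing argument as "no value" and unknown commands as ignorable are assumptions.

```typescript
// Hypothetical command parser; names are illustrative, not taken from the bot's source.
type Language = 'de' | 'en' | 'auto';
type Model = 'whisper' | 'voxtral' | 'auto';

type Command =
  | { kind: 'help' }
  | { kind: 'status' }
  | { kind: 'language'; value?: Language }
  | { kind: 'model'; value?: Model }
  | { kind: 'invalid'; reason: string };

const LANGUAGES: readonly string[] = ['de', 'en', 'auto'];
const MODELS: readonly string[] = ['whisper', 'voxtral', 'auto'];

// Returns null for text that is not a recognized command.
function parseCommand(body: string): Command | null {
  const trimmed = body.trim();
  if (!trimmed.startsWith('!')) return null;
  const [name, arg] = trimmed.slice(1).toLowerCase().split(/\s+/);
  switch (name) {
    case 'help':
    case 'hilfe':
      return { kind: 'help' };
    case 'status':
      return { kind: 'status' };
    case 'language':
      // Assumption: no argument means "no change requested".
      if (arg === undefined) return { kind: 'language' };
      return LANGUAGES.includes(arg)
        ? { kind: 'language', value: arg as Language }
        : { kind: 'invalid', reason: `unknown language: ${arg}` };
    case 'model':
      if (arg === undefined) return { kind: 'model' };
      return MODELS.includes(arg)
        ? { kind: 'model', value: arg as Model }
        : { kind: 'invalid', reason: `unknown model: ${arg}` };
    default:
      return null;
  }
}
```

For example, `parseCommand('!language en')` yields a `language` command with value `en`, while plain chat text yields `null`.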

Message Flow

1. User sends a voice/audio message
2. Bot receives the event via matrix-bot-sdk
3. Bot downloads the audio from Matrix
4. STT service transcribes the audio
5. Bot sends the text back to the room
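The flow above can be sketched as a single handler. This is an illustration under stated assumptions, not the code in `matrix.service.ts`: `FlowDeps` and `handleMessage` are hypothetical names, the event is assumed to be unencrypted (media URL in `content.url`), and the three dependencies are injected so the sketch runs without a homeserver. With matrix-bot-sdk, `download` and `sendText` would presumably wrap `MatrixClient.downloadContent` and `MatrixClient.sendText`.

```typescript
// Only the fields of a Matrix message event that this sketch reads.
interface MessageEvent {
  sender: string;
  content: { msgtype?: string; url?: string; body?: string };
}

// Hypothetical ports standing in for the Matrix client and the STT client.
interface FlowDeps {
  download(mxcUrl: string): Promise<Uint8Array>;
  transcribe(audio: Uint8Array, sender: string): Promise<string>;
  sendText(roomId: string, text: string): Promise<void>;
}

// Returns true if the event was an audio message and a transcription was sent.
async function handleMessage(
  roomId: string,
  event: MessageEvent,
  deps: FlowDeps,
): Promise<boolean> {
  const url = event.content.url;
  // Voice and audio messages use msgtype "m.audio"; everything else is skipped.
  if (event.content.msgtype !== 'm.audio' || typeof url !== 'string') {
    return false;
  }
  const audio = await deps.download(url);                     // download audio from Matrix
  const text = await deps.transcribe(audio, event.sender);    // transcribe via STT service
  await deps.sendText(roomId, text);                          // send text back to the room
  return true;
}
```

Passing the sender to `transcribe` leaves room for per-user language and model settings.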

Environment Variables

# Server
PORT=3024

# Matrix
MATRIX_HOMESERVER_URL=http://localhost:8008
MATRIX_ACCESS_TOKEN=syt_xxx
MATRIX_ALLOWED_ROOMS=!roomid:matrix.mana.how
MATRIX_STORAGE_PATH=./data/bot-storage.json

# STT Service
STT_URL=http://localhost:3020

# Defaults
DEFAULT_LANGUAGE=de
DEFAULT_MODEL=whisper
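A loader for these variables might look like the sketch below. It is not the contents of `configuration.ts`: `BotConfig` and `loadConfig` are hypothetical names, the fallback values simply reuse the examples above, and treating `MATRIX_ALLOWED_ROOMS` as a comma-separated list and `MATRIX_ACCESS_TOKEN` as required are assumptions.

```typescript
// Hypothetical config shape; field names are illustrative.
interface BotConfig {
  port: number;
  matrix: {
    homeserverUrl: string;
    accessToken: string;
    allowedRooms: string[];
    storagePath: string;
  };
  stt: { url: string };
  defaults: { language: string; model: string };
}

function loadConfig(env: Record<string, string | undefined>): BotConfig {
  const accessToken = env['MATRIX_ACCESS_TOKEN'];
  // Assumption: the bot cannot start without an access token.
  if (!accessToken) {
    throw new Error('MATRIX_ACCESS_TOKEN is required');
  }
  return {
    port: Number.parseInt(env['PORT'] ?? '3024', 10),
    matrix: {
      homeserverUrl: env['MATRIX_HOMESERVER_URL'] ?? 'http://localhost:8008',
      accessToken,
      // Assumption: several room IDs are separated by commas.
      allowedRooms: (env['MATRIX_ALLOWED_ROOMS'] ?? '')
        .split(',')
        .map((room) => room.trim())
        .filter((room) => room.length > 0),
      storagePath: env['MATRIX_STORAGE_PATH'] ?? './data/bot-storage.json',
    },
    stt: { url: env['STT_URL'] ?? 'http://localhost:3020' },
    defaults: {
      language: env['DEFAULT_LANGUAGE'] ?? 'de',
      model: env['DEFAULT_MODEL'] ?? 'whisper',
    },
  };
}
```

In a NestJS service this would typically be called as `loadConfig(process.env)`.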

STT API Integration

The bot sends audio to mana-stt for transcription:

// Default Whisper endpoint
POST /transcribe
FormData: file=audio.ogg, language=de

// Voxtral endpoint (with speaker diarization)
POST /transcribe/voxtral
FormData: file=audio.ogg, language=de

// Auto-select endpoint
POST /transcribe/auto
FormData: file=audio.ogg, prefer=whisper

// Response
{
  "text": "Das ist der transkribierte Text...",
  "language": "de",
  "model": "whisper-large-v3-turbo",
  "duration": 3.5
}
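A client for the three endpoints could be sketched as follows. This is not the code in `stt.service.ts`: `endpointFor`, `formFields`, and `transcribe` are hypothetical names, the form fields mirror the examples above (whether `/transcribe/auto` also accepts `language` is not stated here), and the sketch assumes a runtime with global `fetch`, `FormData`, and `Blob`, such as Node 18+.

```typescript
type Model = 'whisper' | 'voxtral' | 'auto';

// Mirrors the response example above.
interface TranscriptionResult {
  text: string;
  language: string;
  model: string;
  duration: number;
}

// Maps the selected model to the mana-stt path listed above.
function endpointFor(model: Model): string {
  switch (model) {
    case 'whisper':
      return '/transcribe';
    case 'voxtral':
      return '/transcribe/voxtral';
    case 'auto':
      return '/transcribe/auto';
  }
}

// Builds the text fields sent alongside the file, following the examples above.
function formFields(model: Model, language: string): Record<string, string> {
  return model === 'auto' ? { prefer: 'whisper' } : { language };
}

async function transcribe(
  baseUrl: string,
  audio: Blob,
  model: Model,
  language: string,
): Promise<TranscriptionResult> {
  const form = new FormData();
  form.append('file', audio, 'audio.ogg');
  for (const [key, value] of Object.entries(formFields(model, language))) {
    form.append(key, value);
  }
  const response = await fetch(new URL(endpointFor(model), baseUrl), {
    method: 'POST',
    body: form,
  });
  if (!response.ok) {
    throw new Error(`mana-stt responded with HTTP ${response.status}`);
  }
  return (await response.json()) as TranscriptionResult;
}
```

A call such as `transcribe('http://localhost:3020', audioBlob, 'voxtral', 'de')` would post to `/transcribe/voxtral`.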

Available Models

Model	Description
`whisper`	Whisper Large V3 (local, fast, 99+ languages)
`voxtral`	Voxtral Mini (cloud, speaker diarization)
`auto`	Automatic model selection

Supported Languages

Code	Language
`de`	German (default)
`en`	English
`auto`	Automatic detection

Supported Audio Formats

OGG, MP3, WAV, M4A, FLAC, WebM, Opus
Matrix voice messages (typically OGG/Opus)

Docker

# Build
docker build -f services/matrix-stt-bot/Dockerfile -t matrix-stt-bot .

# Run
docker run -p 3024:3024 \
  -e MATRIX_HOMESERVER_URL=http://synapse:8008 \
  -e MATRIX_ACCESS_TOKEN=syt_xxx \
  -e STT_URL=http://mana-stt:3020 \
  -v matrix-stt-bot-data:/app/data \
  matrix-stt-bot

Health Check

curl http://localhost:3024/health

Dependencies

mana-stt: Must be running on port 3020 (or configured via STT_URL)
Matrix homeserver: Synapse or compatible homeserver

User Settings

Settings are stored in memory, keyed by Matrix user ID:

Language selection persists during bot runtime
Model selection persists during bot runtime
Settings reset when bot restarts
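The behavior described above matches a plain `Map` keyed by user ID. The sketch below is one possible shape, not the bot's actual implementation; `UserSettingsStore` and its methods are hypothetical names.

```typescript
interface UserSettings {
  language: 'de' | 'en' | 'auto';
  model: 'whisper' | 'voxtral' | 'auto';
}

// In-memory store: contents live only as long as the process does.
class UserSettingsStore {
  private readonly byUser = new Map<string, UserSettings>();
  private readonly defaults: UserSettings;

  constructor(defaults: UserSettings) {
    this.defaults = defaults;
  }

  // Returns the user's settings, falling back to the defaults.
  get(userId: string): UserSettings {
    return { ...this.defaults, ...this.byUser.get(userId) };
  }

  // Applies a partial change (for example from !language or !model).
  update(userId: string, patch: Partial<UserSettings>): UserSettings {
    const next = { ...this.get(userId), ...patch };
    this.byUser.set(userId, next);
    return next;
  }
}
```

Because nothing is written to disk, creating a new store on startup is what resets every user to `DEFAULT_LANGUAGE` and `DEFAULT_MODEL`.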

Testing

# 1. Ensure mana-stt is running
curl http://localhost:3020/health

# 2. Start the bot
cd services/matrix-stt-bot
pnpm start:dev

# 3. Check bot health
curl http://localhost:3024/health

# 4. In Matrix:
#    - Invite bot to a room
#    - Send a voice message
#    - Receive text transcription

Related Services

Service	Port	Description
mana-stt	3020	STT backend service
matrix-tts-bot	3023	Text-to-speech bot (the reverse of this bot)
mana-tts	3022	TTS backend service
