mirror of
https://github.com/Memo-2023/mana-monorepo.git
synced 2026-05-18 23:21:24 +02:00
- New bot that transcribes voice messages to text
- Uses mana-stt service (Whisper/Voxtral) for transcription
- Supports German and English with auto-detection
- Commands: !language, !model, !status, !help
- Runs on port 3024

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
189 lines
4.1 KiB
Markdown
# Matrix STT Bot - Claude Code Guidelines

## Overview

Matrix STT Bot transcribes audio and voice messages and posts the result back to the room as a text message. It uses the mana-stt service (port 3020) for transcription.

## Tech Stack

- **Framework**: NestJS 10
- **Matrix**: matrix-bot-sdk
- **STT Backend**: mana-stt service (Whisper, Voxtral)

## Commands

```bash
# Development
pnpm install
pnpm start:dev    # Start with hot reload

# Build
pnpm build        # Production build

# Type check
pnpm type-check   # Check TypeScript types
```

## Project Structure

```
services/matrix-stt-bot/
├── src/
│   ├── main.ts               # Application entry point (port 3024)
│   ├── app.module.ts         # Root module
│   ├── config/
│   │   └── configuration.ts  # Configuration & help text
│   ├── bot/
│   │   ├── bot.module.ts
│   │   └── matrix.service.ts # Matrix client & message handler
│   └── stt/
│       ├── stt.module.ts
│       └── stt.service.ts    # mana-stt API client
├── Dockerfile
└── package.json
```

## Bot Commands

| Command | Description |
|---------|-------------|
| `!help` / `!hilfe` | Show help text |
| `!language [de\|en\|auto]` | Change transcription language |
| `!model [whisper\|voxtral\|auto]` | Change STT model |
| `!status` | Show current settings |
| (voice message) | Transcribe to text |

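The command table above could be parsed along these lines. This is a minimal sketch, not the bot's actual handler: `parseCommand`, `ParsedCommand`, and the choice to report unknown commands or arguments as `invalid` are assumptions made for illustration.

```typescript
// Hypothetical types; the real handler in matrix.service.ts may differ.
type Language = "de" | "en" | "auto";
type Model = "whisper" | "voxtral" | "auto";

type ParsedCommand =
  | { kind: "help" }
  | { kind: "status" }
  | { kind: "language"; value?: Language }
  | { kind: "model"; value?: Model }
  | { kind: "invalid"; reason: string }
  | { kind: "none" }; // not a command, e.g. ordinary chat text

const LANGUAGES: readonly string[] = ["de", "en", "auto"];
const MODELS: readonly string[] = ["whisper", "voxtral", "auto"];

function parseCommand(body: string): ParsedCommand {
  const trimmed = body.trim();
  if (!trimmed.startsWith("!")) return { kind: "none" };

  // "!language en" -> name "language", arg "en"
  const [name, arg] = trimmed.slice(1).toLowerCase().split(/\s+/);

  switch (name) {
    case "help":
    case "hilfe":
      return { kind: "help" };
    case "status":
      return { kind: "status" };
    case "language":
      // No argument: leave the value unset and let the caller decide what to do.
      if (arg === undefined) return { kind: "language" };
      return LANGUAGES.includes(arg)
        ? { kind: "language", value: arg as Language }
        : { kind: "invalid", reason: `unknown language: ${arg}` };
    case "model":
      if (arg === undefined) return { kind: "model" };
      return MODELS.includes(arg)
        ? { kind: "model", value: arg as Model }
        : { kind: "invalid", reason: `unknown model: ${arg}` };
    default:
      return { kind: "invalid", reason: `unknown command: !${name}` };
  }
}
```

Returning a tagged union keeps the parsing step free of side effects, so the message handler can decide separately how to reply to each case.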
## Message Flow

1. User sends a voice/audio message
2. Bot receives the event via matrix-bot-sdk
3. Bot downloads the audio from Matrix
4. STT service transcribes the audio
5. Bot sends the text message back to the room

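The steps above can be sketched as a single handler. This is an illustration under assumptions, not the code in `matrix.service.ts`: `FlowDeps`, `isAudioMessage`, and `handleRoomMessage` are hypothetical names, the dependencies are injected so the flow can be exercised without a homeserver, and only unencrypted `m.audio` events carrying an `mxc://` URL are considered.

```typescript
// Minimal shape of the Matrix event fields this sketch reads.
interface RoomMessageEvent {
  sender: string;
  content: { msgtype?: string; url?: string; body?: string };
}

// Hypothetical dependencies, one per external step of the flow.
interface FlowDeps {
  download(mxcUrl: string): Promise<Uint8Array>;          // step 3
  transcribe(audio: Uint8Array): Promise<string>;         // step 4
  sendText(roomId: string, text: string): Promise<void>;  // step 5
}

// Audio and voice messages use the m.audio msgtype in the Matrix spec.
function isAudioMessage(event: RoomMessageEvent): boolean {
  return event.content.msgtype === "m.audio" && typeof event.content.url === "string";
}

// Returns true if the event was an audio message and a transcription was sent.
async function handleRoomMessage(
  roomId: string,
  event: RoomMessageEvent,
  deps: FlowDeps,
): Promise<boolean> {
  if (!isAudioMessage(event)) return false;

  const audio = await deps.download(event.content.url as string);
  const text = await deps.transcribe(audio);
  await deps.sendText(roomId, text);
  return true;
}
```

With matrix-bot-sdk, `download` could wrap `client.downloadContent(url)` and `sendText` could wrap `client.sendText(roomId, text)`, with the handler registered on the client's `room.message` event.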
## Environment Variables

```env
# Server
PORT=3024

# Matrix
MATRIX_HOMESERVER_URL=http://localhost:8008
MATRIX_ACCESS_TOKEN=syt_xxx
MATRIX_ALLOWED_ROOMS=!roomid:matrix.mana.how
MATRIX_STORAGE_PATH=./data/bot-storage.json

# STT Service
STT_URL=http://localhost:3020

# Defaults
DEFAULT_LANGUAGE=de
DEFAULT_MODEL=whisper
```

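These variables might be read into a typed object roughly as follows. This is a sketch, not the contents of `configuration.ts`: `BotConfig` and `loadConfig` are hypothetical names, the fallback values are simply the example values above, and treating `MATRIX_ALLOWED_ROOMS` as a comma-separated list and `MATRIX_ACCESS_TOKEN` as required are assumptions.

```typescript
interface BotConfig {
  port: number;
  matrix: {
    homeserverUrl: string;
    accessToken: string;
    allowedRooms: string[];
    storagePath: string;
  };
  stt: { url: string };
  defaults: { language: string; model: string };
}

// Takes the environment as a parameter so it can be tested without touching process.env.
function loadConfig(env: Record<string, string | undefined>): BotConfig {
  const accessToken = env.MATRIX_ACCESS_TOKEN;
  if (!accessToken) {
    // Assumption: the bot cannot log in without a token, so fail early.
    throw new Error("MATRIX_ACCESS_TOKEN is required");
  }

  return {
    port: Number.parseInt(env.PORT ?? "3024", 10),
    matrix: {
      homeserverUrl: env.MATRIX_HOMESERVER_URL ?? "http://localhost:8008",
      accessToken,
      // Assumption: multiple rooms are separated by commas.
      allowedRooms: (env.MATRIX_ALLOWED_ROOMS ?? "")
        .split(",")
        .map((room) => room.trim())
        .filter((room) => room.length > 0),
      storagePath: env.MATRIX_STORAGE_PATH ?? "./data/bot-storage.json",
    },
    stt: { url: env.STT_URL ?? "http://localhost:3020" },
    defaults: {
      language: env.DEFAULT_LANGUAGE ?? "de",
      model: env.DEFAULT_MODEL ?? "whisper",
    },
  };
}
```

In a Node process this would be called as `loadConfig(process.env)`.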
## STT API Integration

The bot sends audio to mana-stt for transcription:

```
// Default Whisper endpoint
POST /transcribe
FormData: file=audio.ogg, language=de

// Voxtral endpoint (with speaker diarization)
POST /transcribe/voxtral
FormData: file=audio.ogg, language=de

// Auto-select endpoint
POST /transcribe/auto
FormData: file=audio.ogg, prefer=whisper

// Response
{
  "text": "Das ist der transkribierte Text...",
  "language": "de",
  "model": "whisper-large-v3-turbo",
  "duration": 3.5
}
```

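A client for these endpoints could look roughly like this. It is a sketch under assumptions, not the code in `stt.service.ts`: `endpointFor`, `buildForm`, and `transcribeAudio` are hypothetical names, the language value is forwarded unchanged, and the global `fetch`, `FormData`, and `Blob` of Node 18+ are used instead of an HTTP library.

```typescript
type SttModel = "whisper" | "voxtral" | "auto";

// Response shape as shown in the example above.
interface TranscriptionResult {
  text: string;
  language: string;
  model: string;
  duration: number;
}

// Maps the selected model to the endpoint path listed above.
function endpointFor(model: SttModel): string {
  switch (model) {
    case "voxtral":
      return "/transcribe/voxtral";
    case "auto":
      return "/transcribe/auto";
    default:
      return "/transcribe";
  }
}

// Builds the multipart body: `file` plus either `language` or, for auto, `prefer`.
function buildForm(audio: Blob, model: SttModel, language: string): FormData {
  const form = new FormData();
  form.append("file", audio, "audio.ogg");
  if (model === "auto") {
    form.append("prefer", "whisper");
  } else {
    form.append("language", language);
  }
  return form;
}

async function transcribeAudio(
  baseUrl: string,
  audio: Blob,
  model: SttModel,
  language: string,
): Promise<TranscriptionResult> {
  const response = await fetch(new URL(endpointFor(model), baseUrl), {
    method: "POST",
    body: buildForm(audio, model, language),
  });
  if (!response.ok) {
    throw new Error(`mana-stt returned HTTP ${response.status}`);
  }
  return (await response.json()) as TranscriptionResult;
}
```

For example, `transcribeAudio("http://localhost:3020", blob, "whisper", "de")` would post to `/transcribe` on a locally running mana-stt.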
## Available Models

| Model | Description |
|-------|-------------|
| `whisper` | Whisper Large V3 (local, fast, 99+ languages) |
| `voxtral` | Voxtral Mini (cloud, speaker diarization) |
| `auto` | Automatic model selection |

## Supported Languages

| Code | Language |
|------|----------|
| `de` | German (default) |
| `en` | English |
| `auto` | Automatic detection |

## Supported Audio Formats

- OGG, MP3, WAV, M4A, FLAC, WebM, Opus
- Matrix voice messages (typically OGG/Opus)

## Docker

```bash
# Build
docker build -f services/matrix-stt-bot/Dockerfile -t matrix-stt-bot .

# Run
docker run -p 3024:3024 \
  -e MATRIX_HOMESERVER_URL=http://synapse:8008 \
  -e MATRIX_ACCESS_TOKEN=syt_xxx \
  -e STT_URL=http://mana-stt:3020 \
  -v matrix-stt-bot-data:/app/data \
  matrix-stt-bot
```

## Health Check

```bash
curl http://localhost:3024/health
```

## Dependencies

- **mana-stt**: Must be running on port 3020 (or at the address configured via `STT_URL`)
- **Matrix homeserver**: Synapse or a compatible homeserver

## User Settings

Settings are stored in memory, keyed by Matrix user ID:
- Language selection persists while the bot is running
- Model selection persists while the bot is running
- Settings reset when the bot restarts

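An in-memory store of this kind can be as small as a `Map` keyed by user ID. The sketch below is illustrative only: `UserSettingsStore` and its method names are hypothetical, and falling back to the configured defaults for unknown users is an assumption.

```typescript
interface UserSettings {
  language: "de" | "en" | "auto";
  model: "whisper" | "voxtral" | "auto";
}

class UserSettingsStore {
  // Lives only in process memory, so a restart clears every entry.
  private readonly byUser = new Map<string, UserSettings>();
  private readonly defaults: UserSettings;

  constructor(defaults: UserSettings) {
    this.defaults = defaults;
  }

  // Users who never changed anything get a copy of the defaults.
  get(userId: string): UserSettings {
    return this.byUser.get(userId) ?? { ...this.defaults };
  }

  // Merges a partial change (e.g. only the language) into the user's settings.
  update(userId: string, patch: Partial<UserSettings>): UserSettings {
    const next: UserSettings = { ...this.get(userId), ...patch };
    this.byUser.set(userId, next);
    return next;
  }
}
```

For instance, `store.update("@alice:matrix.mana.how", { language: "en" })` would change only that user's language and leave the model at its default; the user ID here is a made-up example.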
## Testing

```bash
# 1. Ensure mana-stt is running
curl http://localhost:3020/health

# 2. Start the bot
cd services/matrix-stt-bot
pnpm start:dev

# 3. Check bot health
curl http://localhost:3024/health

# 4. In Matrix:
#    - Invite bot to a room
#    - Send a voice message
#    - Receive text transcription
```

## Related Services

| Service | Port | Description |
|---------|------|-------------|
| mana-stt | 3020 | STT backend service |
| matrix-tts-bot | 3023 | Text-to-speech bot (the inverse of this bot) |
| mana-tts | 3022 | TTS backend service |