managarten/services/matrix-stt-bot/CLAUDE.md

# Matrix STT Bot - Claude Code Guidelines

## Overview

Matrix STT Bot converts audio/voice messages to text and sends them back as text messages. Uses the mana-stt service (port 3020) for transcription.

## Tech Stack

- **Framework**: NestJS 10
- **Matrix**: matrix-bot-sdk
- **STT Backend**: mana-stt service (Whisper, Voxtral)

## Commands

```bash
# Development
pnpm install
pnpm start:dev        # Start with hot reload

# Build
pnpm build            # Production build

# Type check
pnpm type-check       # Check TypeScript types
```

## Project Structure

```
services/matrix-stt-bot/
├── src/
│   ├── main.ts               # Application entry point (port 3024)
│   ├── app.module.ts         # Root module
│   ├── config/
│   │   └── configuration.ts  # Configuration & help text
│   ├── bot/
│   │   ├── bot.module.ts
│   │   └── matrix.service.ts # Matrix client & message handler
│   └── stt/
│       ├── stt.module.ts
│       └── stt.service.ts    # mana-stt API client
├── Dockerfile
└── package.json
```

## Bot Commands

| Command | Description |
|---------|-------------|
| `!help` / `!hilfe` | Show help text |
| `!language [de\|en\|auto]` | Change transcription language |
| `!model [whisper\|voxtral\|auto]` | Change STT model |
| `!status` | Show current settings |
| (voice message) | Transcribe to text |

## Message Flow

1. User sends voice/audio message
2. Bot receives via matrix-bot-sdk
3. Audio downloaded from Matrix
4. STT service transcribes audio
5. Text message sent back to room

## Environment Variables

```env
# Server
PORT=3024

# Matrix
MATRIX_HOMESERVER_URL=http://localhost:8008
MATRIX_ACCESS_TOKEN=syt_xxx
MATRIX_ALLOWED_ROOMS=!roomid:matrix.mana.how
MATRIX_STORAGE_PATH=./data/bot-storage.json

# STT Service
STT_URL=http://localhost:3020

# Defaults
DEFAULT_LANGUAGE=de
DEFAULT_MODEL=whisper
```

## STT API Integration

The bot sends audio to mana-stt for transcription:

```typescript
// Default Whisper endpoint
POST /transcribe
FormData: file=audio.ogg, language=de

// Voxtral endpoint (with speaker diarization)
POST /transcribe/voxtral
FormData: file=audio.ogg, language=de

// Auto-select endpoint
POST /transcribe/auto
FormData: file=audio.ogg, prefer=whisper

// Response
{
  "text": "Das ist der transkribierte Text...",
  "language": "de",
  "model": "whisper-large-v3-turbo",
  "duration": 3.5
}
```

## Available Models

| Model | Description |
|-------|-------------|
| `whisper` | Whisper Large V3 (local, fast, 99+ languages) |
| `voxtral` | Voxtral Mini (cloud, speaker diarization) |
| `auto` | Automatic model selection |

## Supported Languages

| Code | Language |
|------|----------|
| `de` | German (default) |
| `en` | English |
| `auto` | Automatic detection |

## Supported Audio Formats

- OGG, MP3, WAV, M4A, FLAC, WebM, Opus
- Matrix voice messages (typically OGG/Opus)

## Docker

```bash
# Build
docker build -f services/matrix-stt-bot/Dockerfile -t matrix-stt-bot .

# Run
docker run -p 3024:3024 \
  -e MATRIX_HOMESERVER_URL=http://synapse:8008 \
  -e MATRIX_ACCESS_TOKEN=syt_xxx \
  -e STT_URL=http://mana-stt:3020 \
  -v matrix-stt-bot-data:/app/data \
  matrix-stt-bot
```

## Health Check

```bash
curl http://localhost:3024/health
```

## Dependencies

- **mana-stt**: Must be running on port 3020 (or configured via `STT_URL`)
- **Matrix homeserver**: Synapse or compatible homeserver

## User Settings

Settings are stored in-memory per Matrix user ID:
- Language selection persists during bot runtime
- Model selection persists during bot runtime
- Settings reset when bot restarts

## Testing

```bash
# 1. Ensure mana-stt is running
curl http://localhost:3020/health

# 2. Start the bot
cd services/matrix-stt-bot
pnpm start:dev

# 3. Check bot health
curl http://localhost:3024/health

# 4. In Matrix:
#    - Invite bot to a room
#    - Send a voice message
#    - Receive text transcription
```

## Related Services

| Service | Port | Description |
|---------|------|-------------|
| mana-stt | 3020 | STT backend service |
| matrix-tts-bot | 3023 | Text-to-speech bot (reverse of this) |
| mana-tts | 3022 | TTS backend service |