managarten/services/mana-video-gen/CLAUDE.md
Till JS bfeeef7819 chore(matrix): final scrub of stale matrix references
A grep audit after the previous matrix removal commits found a handful
of stragglers in non-runtime files that the earlier sweeps missed:

- services/mana-llm/CLAUDE.md: removed matrix-ollama-bot from the
  consumer-apps diagram and from the related-services table
- services/mana-video-gen/CLAUDE.md: removed "Matrix Bots" integration
  bullet
- packages/notify-client/README.md: removed sendMatrix() doc entry
  (the method itself was already gone in the prior cleanup)
- docker/grafana/dashboards/logs-explorer.json: dropped the "Matrix
  Stack" log row that queried tier="matrix" (would show no data forever)
- docker/grafana/dashboards/master-overview.json: dropped the "Matrix
  Bots" stat panel that counted up{job=~"matrix-.*-bot"}
- apps/mana/apps/landing/src/data/ecosystem-health.json: regenerated via
  scripts/ecosystem-audit.mjs to drop matrix from the app list, icon
  counts, file analytics, top offenders and authGuard missing list
- .gitignore: removed services/matrix-stt-bot/data/ pattern (the
  service itself was deleted long ago)

Production-side stragglers also addressed (not in this commit):
- DROP USER synapse on prod Postgres (the parallel cleanup commit
  2514831a3 dropped DATABASE matrix + DATABASE synapse but left the
  role behind)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 16:47:54 +02:00


# CLAUDE.md - Mana Video Generation Service
## Service Overview
AI video generation microservice using LTX-Video via HuggingFace diffusers:
- **Port**: 3026
- **Framework**: Python + FastAPI
- **Model**: LTX-Video (~2B params, Lightricks)
- **Backend**: diffusers + PyTorch CUDA
- **Target Hardware**: NVIDIA RTX 3090 (24 GB VRAM)
## Features
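- **Fast generation**: roughly 8-45 seconds per clip on an RTX 3090, depending on resolution, frame count, and steps (~20 seconds at the default 704x480, 81 frames, 30 steps)
- **Text-to-video**: 480p-720p, up to ~6.4 seconds (161 frames at 25 fps)
- **Low VRAM**: ~10 GB, leaving room for other GPU services
- **Lazy model loading**: Model loads on the first request and stays loaded until `/unload` is called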
- **VRAM management**: POST /unload to free GPU memory for other services
- **MP4 output**: Direct video file serving
## Commands
```bash
# Setup (installs PyTorch CUDA + diffusers + LTX-Video)
chmod +x setup.sh && ./setup.sh
# Development
source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3026 --reload
# Test
curl http://localhost:3026/health
curl -X POST http://localhost:3026/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A cat walking in a garden"}' | jq
# Free VRAM (e.g. before running image generation)
curl -X POST http://localhost:3026/unload
```
## File Structure
```
services/mana-video-gen/
├── app/
│   ├── __init__.py
│   ├── main.py          # FastAPI endpoints
│   └── ltx_service.py   # LTX-Video diffusers pipeline
├── setup.sh             # Setup script (CUDA + Python deps)
├── requirements.txt
├── .env.example
└── CLAUDE.md
```
## API Endpoints
| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/health` | GET | Health check + GPU info |
| `/models` | GET | Model info |
| `/generate` | POST | Generate video from text prompt |
| `/videos/{filename}` | GET | Serve generated video |
| `/videos/{filename}` | DELETE | Delete video |
| `/unload` | POST | Unload model, free VRAM |
| `/cleanup` | POST | Clean old videos |
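The endpoint table maps onto a thin client. The sketch below is hypothetical: the `VideoGenClient` class, the injectable `fetchFn`, and the assumption that every endpoint answers with a JSON body are not part of the service docs. The `unknown` return types reflect that the `/health`, `/models`, and `/cleanup` payloads are not documented here; the class encodes only the paths and HTTP methods from the table.

```typescript
// Minimal fetch/response shapes so the client can be exercised with a fake fetch.
interface HttpResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
interface RequestInitLike {
  method?: string;
  headers?: Record<string, string>;
  body?: string;
}
type FetchLike = (url: string, init?: RequestInitLike) => Promise<HttpResponseLike>;

// Hypothetical wrapper over the documented mana-video-gen endpoints.
class VideoGenClient {
  private readonly baseUrl: string;
  private readonly fetchFn: FetchLike;

  constructor(baseUrl: string, fetchFn: FetchLike = fetch) {
    // Drop a trailing slash so "/health" etc. join cleanly.
    this.baseUrl = baseUrl.replace(/\/$/, '');
    this.fetchFn = fetchFn;
  }

  // Sends the request, rejects on non-2xx, and parses the body as JSON (assumed).
  private async call(method: string, path: string, body?: unknown): Promise<unknown> {
    const init: RequestInitLike = { method };
    if (body !== undefined) {
      init.headers = { 'Content-Type': 'application/json' };
      init.body = JSON.stringify(body);
    }
    const res = await this.fetchFn(`${this.baseUrl}${path}`, init);
    if (!res.ok) throw new Error(`${method} ${path} failed with HTTP ${res.status}`);
    return res.json();
  }

  health() { return this.call('GET', '/health'); }
  models() { return this.call('GET', '/models'); }
  generate(request: { prompt: string } & Record<string, unknown>) {
    return this.call('POST', '/generate', request);
  }
  // `filename` is the last segment of `video_url`, e.g. "abc123.mp4".
  videoUrl(filename: string) { return `${this.baseUrl}/videos/${encodeURIComponent(filename)}`; }
  deleteVideo(filename: string) { return this.call('DELETE', `/videos/${encodeURIComponent(filename)}`); }
  unload() { return this.call('POST', '/unload'); }
  cleanup() { return this.call('POST', '/cleanup'); }
}
```

Injecting `fetchFn` keeps the class testable without a running GPU server; in normal use the global `fetch` is picked up as the default.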
## Generate Request
```json
{
  "prompt": "A timelapse of a flower blooming",
  "negative_prompt": "blurry, low quality",
  "width": 704,
  "height": 480,
  "num_frames": 81,
  "fps": 25,
  "steps": 30,
  "guidance_scale": 7.5,
  "seed": null
}
```
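The service's own input validation is not documented here, but a client can pre-check a request before paying for a GPU round trip. This sketch assumes the `MAX_PROMPT_LENGTH` and `MAX_FRAMES` defaults from the environment table and applies the shape LTX-Video is designed around: width and height divisible by 32 and a frame count of the form 8k + 1 (81 = 8 * 10 + 1). `GenerateRequest` and `validateGenerateRequest` are hypothetical names; every field except `prompt` is typed as optional because the curl example sends only a prompt.

```typescript
// Request body for POST /generate (hypothetical typing of the JSON above).
interface GenerateRequest {
  prompt: string;
  negative_prompt?: string;
  width?: number;
  height?: number;
  num_frames?: number;
  fps?: number;
  steps?: number;
  guidance_scale?: number;
  seed?: number | null;
}

// Assumed to match the deployment's MAX_PROMPT_LENGTH / MAX_FRAMES defaults.
const MAX_PROMPT_LENGTH = 2000;
const MAX_FRAMES = 161;

// Returns human-readable problems; an empty array means the request looks valid.
function validateGenerateRequest(req: GenerateRequest): string[] {
  const errors: string[] = [];
  if (req.prompt.trim().length === 0) errors.push('prompt must not be empty');
  if (req.prompt.length > MAX_PROMPT_LENGTH) {
    errors.push(`prompt exceeds ${MAX_PROMPT_LENGTH} characters`);
  }
  const dims: Array<[string, number | undefined]> = [['width', req.width], ['height', req.height]];
  for (const [name, value] of dims) {
    if (value !== undefined && (!Number.isInteger(value) || value <= 0 || value % 32 !== 0)) {
      errors.push(`${name} must be a positive multiple of 32`);
    }
  }
  if (req.num_frames !== undefined) {
    const n = req.num_frames;
    if (!Number.isInteger(n) || n < 1 || (n - 1) % 8 !== 0) {
      errors.push('num_frames must be of the form 8k + 1');
    } else if (n > MAX_FRAMES) {
      errors.push(`num_frames exceeds ${MAX_FRAMES}`);
    }
  }
  return errors;
}
```

A strict multiple-of-32 check rejects a 720-pixel height; if the service rounds dimensions itself, loosen that check accordingly.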
## Generate Response
```json
{
  "success": true,
  "video_url": "/videos/abc123.mp4",
  "prompt": "A timelapse of a flower blooming",
  "width": 704,
  "height": 480,
  "num_frames": 81,
  "fps": 25,
  "duration": 3.24,
  "steps": 30,
  "seed": 42,
  "generation_time": 18.5
}
```
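A typed view of the response makes the relative `video_url` harder to misuse. Two observations from the sample, not documented guarantees: `duration` equals `num_frames / fps` (81 / 25 = 3.24), and `seed` reports a concrete value even though the request sent `null`. The interface, the `isGenerateResponse` guard, and `absoluteVideoUrl` are hypothetical helpers. How failures are reported (a `success: false` body or an HTTP error status) is not specified here, so the guard treats anything without `success: true` as unusable.

```typescript
// Shape of a successful POST /generate response, copied from the sample above.
interface GenerateResponse {
  success: boolean;
  video_url: string;
  prompt: string;
  width: number;
  height: number;
  num_frames: number;
  fps: number;
  duration: number;
  steps: number;
  seed: number;
  generation_time: number;
}

// Narrows an untyped JSON body; checks only the fields a caller needs next.
function isGenerateResponse(body: unknown): body is GenerateResponse {
  if (typeof body !== 'object' || body === null) return false;
  const b = body as Record<string, unknown>;
  return b.success === true && typeof b.video_url === 'string' && typeof b.seed === 'number';
}

// `video_url` is relative to the service, so resolve it against the base URL.
function absoluteVideoUrl(baseUrl: string, res: GenerateResponse): string {
  return new URL(res.video_url, baseUrl).toString();
}
```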
## Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `PORT` | `3026` | Service port |
| `LTX_MODEL_ID` | `Lightricks/LTX-Video` | HuggingFace model ID |
| `DEVICE` | `cuda` | PyTorch device |
| `DEFAULT_WIDTH` | `704` | Default video width |
| `DEFAULT_HEIGHT` | `480` | Default video height |
| `DEFAULT_NUM_FRAMES` | `81` | Default frame count (~3.2s) |
| `DEFAULT_FPS` | `25` | Default framerate |
| `DEFAULT_STEPS` | `30` | Default inference steps |
| `DEFAULT_GUIDANCE_SCALE` | `7.5` | Default CFG scale |
| `GENERATION_TIMEOUT` | `600` | Timeout in seconds |
| `MAX_PROMPT_LENGTH` | `2000` | Max prompt chars |
| `MAX_FRAMES` | `161` | Max frames (~6.4s) |
| `CORS_ORIGINS` | (production URLs) | CORS config |
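The `~3.2s` and `~6.4s` annotations come from dividing frames by the frame rate: 81 / 25 = 3.24 s and 161 / 25 = 6.44 s. To request a clip by duration instead of frame count, a client can round to the nearest frame count of the form 8k + 1 (the pattern every frame count in this file follows) and clamp to `MAX_FRAMES`. `framesForDuration` is a hypothetical helper that assumes the default `MAX_FRAMES` of 161.

```typescript
// Assumed to match the deployment's MAX_FRAMES (default 161).
const MAX_FRAMES = 161;

// Nearest frame count of the form 8k + 1 (k >= 1) for `seconds` at `fps`,
// capped at the largest such count that does not exceed MAX_FRAMES.
function framesForDuration(seconds: number, fps: number): number {
  const maxK = Math.floor((MAX_FRAMES - 1) / 8);
  const k = Math.min(maxK, Math.max(1, Math.round((seconds * fps - 1) / 8)));
  return 8 * k + 1;
}
```

For example, asking for 3.2 s at 25 fps yields the default 81 frames, and anything past ~6.4 s is capped at 161.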
## Model Details
### LTX-Video
- **Parameters**: ~2 billion
- **License**: Lightricks Open License (commercial use allowed)
- **Download size**: ~4 GB (auto-downloaded on first use)
- **VRAM usage**: ~10 GB
- **Optimal settings**: 704x480, 30 steps, 7.5 guidance
- **Speed on RTX 3090**: ~8-45 seconds per clip depending on resolution, frames, and steps (see Performance below)
## VRAM Management
The GPU server runs multiple AI services. LTX-Video uses ~10 GB VRAM:
- Model loads lazily on first `/generate` request
- Use `POST /unload` to free VRAM when not generating videos
- Other services (mana-image-gen, mana-stt, mana-tts) share the same GPU
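- `enable_model_cpu_offload()` keeps the pipeline components (text encoder, transformer, VAE) on the CPU and moves each one to the GPU only while it runs, lowering peak VRAM at some cost in speed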
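Because the GPU is shared, a caller about to run another heavy job (image generation, for example) can release the video model first. This is a sketch under assumptions: `withVideoVramReleased` is a hypothetical helper, it assumes `POST /unload` succeeds even when the model is not loaded, and it does not coordinate with a `/generate` request already in flight, so serializing GPU jobs remains the caller's responsibility. The next `/generate` call reloads the model lazily.

```typescript
interface HttpResponseLike {
  ok: boolean;
  status: number;
}
type PostFn = (url: string, init: { method: 'POST' }) => Promise<HttpResponseLike>;

// Unloads LTX-Video via POST /unload, then runs `task` (e.g. an image-gen call).
async function withVideoVramReleased<T>(
  videoGenBaseUrl: string,
  task: () => Promise<T>,
  post: PostFn = fetch,
): Promise<T> {
  const res = await post(`${videoGenBaseUrl}/unload`, { method: 'POST' });
  if (!res.ok) throw new Error(`/unload failed with HTTP ${res.status}`);
  return task();
}
```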
## Performance (RTX 3090)
| Resolution | Frames | Steps | Time |
|------------|--------|-------|------|
| 512x320 | 41 | 20 | ~8s |
| 704x480 | 81 | 30 | ~20s |
| 704x480 | 41 | 20 | ~10s |
| 1280x720 | 41 | 30 | ~45s |
## Integration
Used by:
- **Picture App** — video generation alongside images
- **Chat App** — inline video generation
### Example (TypeScript)
```typescript
const response = await fetch('http://192.168.178.11:3026/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'Ocean waves crashing on rocks at sunset',
    width: 704,
    height: 480,
    num_frames: 81,
  }),
});
if (!response.ok) {
  throw new Error(`Video generation failed: HTTP ${response.status}`);
}
const result = await response.json();
// video_url is relative (e.g. "/videos/abc123.mp4"), so prefix the service origin
const videoUrl = `http://192.168.178.11:3026${result.video_url}`;
```
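The example above waits as long as the underlying HTTP stack allows. A variant that rejects on non-2xx responses and gives up on the client side after a fixed time can be sketched as follows. The 600 000 ms default mirrors `GENERATION_TIMEOUT` (600 s); `generateWithTimeout` and the injectable `fetchFn` are hypothetical, and aborting only stops the client from waiting, it does not necessarily cancel the job on the server.

```typescript
interface HttpResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string; signal: AbortSignal },
) => Promise<HttpResponseLike>;

// POST /generate with a client-side deadline enforced through AbortController.
async function generateWithTimeout(
  baseUrl: string,
  body: { prompt: string } & Record<string, unknown>,
  timeoutMs = 600_000, // assumed to mirror the GENERATION_TIMEOUT default
  fetchFn: FetchLike = fetch,
): Promise<unknown> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchFn(`${baseUrl}/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
      signal: controller.signal,
    });
    if (!res.ok) throw new Error(`POST /generate failed with HTTP ${res.status}`);
    return await res.json();
  } finally {
    // Always clear the timer so a fast response does not leave it pending.
    clearTimeout(timer);
  }
}
```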