mirror of
https://github.com/Memo-2023/mana-monorepo.git
synced 2026-05-14 20:01:09 +02:00
A grep audit after the previous matrix removal commits found a handful
of stragglers in non-runtime files that the earlier sweeps missed:
- services/mana-llm/CLAUDE.md: removed matrix-ollama-bot from the
consumer-apps diagram and from the related-services table
- services/mana-video-gen/CLAUDE.md: removed "Matrix Bots" integration
bullet
- packages/notify-client/README.md: removed sendMatrix() doc entry
(the method itself was already gone in the prior cleanup)
- docker/grafana/dashboards/logs-explorer.json: dropped the "Matrix
Stack" log row that queried tier="matrix" (would show no data forever)
- docker/grafana/dashboards/master-overview.json: dropped the "Matrix
Bots" stat panel that counted up{job=~"matrix-.*-bot"}
- apps/mana/apps/landing/src/data/ecosystem-health.json: regenerated via
scripts/ecosystem-audit.mjs to drop matrix from the app list, icon
counts, file analytics, top offenders and authGuard missing list
- .gitignore: removed services/matrix-stt-bot/data/ pattern (the
service itself was deleted long ago)
Production-side stragglers also addressed (not in this commit):
- DROP USER synapse on prod Postgres (the parallel cleanup commit
2514831a3 dropped DATABASE matrix + DATABASE synapse but left the
role behind)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4.6 KiB
CLAUDE.md - Mana Video Generation Service
Service Overview
AI video generation microservice using LTX-Video via HuggingFace diffusers:
- Port: 3026
- Framework: Python + FastAPI
- Model: LTX-Video (~2B params, Lightricks)
- Backend: diffusers + PyTorch CUDA
- Target Hardware: NVIDIA RTX 3090 (24 GB VRAM)
Features
- Fast generation: 10-30 seconds per clip on RTX 3090
- Text-to-video: 480p-720p, up to ~6 seconds
- Low VRAM: ~10 GB — leaves room for other GPU services
- Lazy model loading: Model loads on first request, stays in VRAM
- VRAM management: POST /unload to free GPU memory for other services
- MP4 output: Direct video file serving
Commands
# Setup (installs PyTorch CUDA + diffusers + LTX-Video)
chmod +x setup.sh && ./setup.sh
# Development
source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3026 --reload
# Test
curl http://localhost:3026/health
curl -X POST http://localhost:3026/generate \
-H "Content-Type: application/json" \
-d '{"prompt": "A cat walking in a garden"}' | jq
# Free VRAM (e.g. before running image generation)
curl -X POST http://localhost:3026/unload
File Structure
services/mana-video-gen/
├── app/
│ ├── __init__.py
│ ├── main.py # FastAPI endpoints
│ └── ltx_service.py # LTX-Video diffusers pipeline
├── setup.sh # Setup script (CUDA + Python deps)
├── requirements.txt
├── .env.example
└── CLAUDE.md
API Endpoints
| Endpoint | Method | Purpose |
|---|---|---|
| /health | GET | Health check + GPU info |
| /models | GET | Model info |
| /generate | POST | Generate video from text prompt |
| /videos/{filename} | GET | Serve generated video |
| /videos/{filename} | DELETE | Delete video |
| /unload | POST | Unload model, free VRAM |
| /cleanup | POST | Clean old videos |
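For callers written in Python, the endpoints above could be wrapped roughly as follows. This is a minimal sketch using only the standard library; the helper names (`build_generate_request`, `build_unload_request`, `send`) and the `localhost` base URL are assumptions for illustration, not part of the service. `send` assumes the endpoint answers with a JSON body, which the source only shows for `/generate`.

```python
import json
import urllib.request

# Assumption: a local development instance on the documented port.
BASE_URL = "http://localhost:3026"


def build_generate_request(prompt: str, base_url: str = BASE_URL, **options) -> urllib.request.Request:
    """Build (but do not send) a POST /generate request.

    Extra keyword arguments (e.g. num_frames=41) are passed through as JSON fields.
    """
    payload = {"prompt": prompt, **options}
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_unload_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /unload request with an empty body."""
    return urllib.request.Request(url=f"{base_url}/unload", data=b"", method="POST")


def send(request: urllib.request.Request, timeout: float = 600.0) -> dict:
    """Send a prepared request and decode a JSON response body.

    The 600 s default mirrors the documented GENERATION_TIMEOUT default.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the payload easy to inspect or log before a long-running generation starts.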
Generate Request
{
"prompt": "A timelapse of a flower blooming",
"negative_prompt": "blurry, low quality",
"width": 704,
"height": 480,
"num_frames": 81,
"fps": 25,
"steps": 30,
"guidance_scale": 7.5,
"seed": null
}
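The request body above can be checked on the client side before it is sent. The sketch below models the fields as a dataclass and validates them against the documented `MAX_PROMPT_LENGTH` and `MAX_FRAMES` defaults. The class name, the `validate` method, and the empty default for `negative_prompt` are assumptions; the service's real validation rules may differ.

```python
from dataclasses import dataclass
from typing import List, Optional

# Documented defaults from the environment variable table.
MAX_PROMPT_LENGTH = 2000
MAX_FRAMES = 161


@dataclass
class GenerateRequest:
    """Client-side mirror of the /generate request body (hypothetical helper)."""

    prompt: str
    negative_prompt: str = ""  # assumption: the service default is not documented
    width: int = 704
    height: int = 480
    num_frames: int = 81
    fps: int = 25
    steps: int = 30
    guidance_scale: float = 7.5
    seed: Optional[int] = None  # None lets the service choose a seed

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the request looks sane."""
        errors: List[str] = []
        if not self.prompt.strip():
            errors.append("prompt must not be empty")
        if len(self.prompt) > MAX_PROMPT_LENGTH:
            errors.append(f"prompt exceeds {MAX_PROMPT_LENGTH} characters")
        if not 1 <= self.num_frames <= MAX_FRAMES:
            errors.append(f"num_frames must be between 1 and {MAX_FRAMES}")
        if self.fps <= 0:
            errors.append("fps must be positive")
        if self.width <= 0 or self.height <= 0:
            errors.append("width and height must be positive")
        return errors
```

For example, `GenerateRequest(prompt="A timelapse of a flower blooming").validate()` returns an empty list, while a request with `num_frames=200` reports one problem.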
Generate Response
{
"success": true,
"video_url": "/videos/abc123.mp4",
"prompt": "A timelapse of a flower blooming",
"width": 704,
"height": 480,
"num_frames": 81,
"fps": 25,
"duration": 3.24,
"steps": 30,
"seed": 42,
"generation_time": 18.5
}
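The `duration` value in the response is consistent with the frame count divided by the frame rate (81 frames at 25 fps). A small helper makes the relationship explicit; the function name and the rounding to two decimals are assumptions chosen to match the example, not confirmed service behaviour.

```python
def clip_duration(num_frames: int, fps: int) -> float:
    """Clip length in seconds, rounded to two decimals (assumed rounding)."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return round(num_frames / fps, 2)


# The documented default and maximum frame counts at 25 fps.
default_duration = clip_duration(81, 25)   # 3.24 seconds
maximum_duration = clip_duration(161, 25)  # 6.44 seconds
```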
Environment Variables
| Variable | Default | Description |
|---|---|---|
| PORT | 3026 | Service port |
| LTX_MODEL_ID | Lightricks/LTX-Video | HuggingFace model ID |
| DEVICE | cuda | PyTorch device |
| DEFAULT_WIDTH | 704 | Default video width |
| DEFAULT_HEIGHT | 480 | Default video height |
| DEFAULT_NUM_FRAMES | 81 | Default frame count (~3.2s) |
| DEFAULT_FPS | 25 | Default framerate |
| DEFAULT_STEPS | 30 | Default inference steps |
| DEFAULT_GUIDANCE_SCALE | 7.5 | Default CFG scale |
| GENERATION_TIMEOUT | 600 | Timeout in seconds |
| MAX_PROMPT_LENGTH | 2000 | Max prompt chars |
| MAX_FRAMES | 161 | Max frames (~6.4s) |
| CORS_ORIGINS | (production URLs) | CORS config |
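One way to read these variables with their documented defaults is sketched below. The `Settings` class and `load_settings` function are hypothetical names and do not describe how `app/main.py` actually loads its configuration. `CORS_ORIGINS` is left out because its default is not spelled out in the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Typed view of the documented environment variables (hypothetical helper)."""

    port: int
    model_id: str
    device: str
    default_width: int
    default_height: int
    default_num_frames: int
    default_fps: int
    default_steps: int
    default_guidance_scale: float
    generation_timeout: int
    max_prompt_length: int
    max_frames: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from env, falling back to the documented defaults."""
    return Settings(
        port=int(env.get("PORT", "3026")),
        model_id=env.get("LTX_MODEL_ID", "Lightricks/LTX-Video"),
        device=env.get("DEVICE", "cuda"),
        default_width=int(env.get("DEFAULT_WIDTH", "704")),
        default_height=int(env.get("DEFAULT_HEIGHT", "480")),
        default_num_frames=int(env.get("DEFAULT_NUM_FRAMES", "81")),
        default_fps=int(env.get("DEFAULT_FPS", "25")),
        default_steps=int(env.get("DEFAULT_STEPS", "30")),
        default_guidance_scale=float(env.get("DEFAULT_GUIDANCE_SCALE", "7.5")),
        generation_timeout=int(env.get("GENERATION_TIMEOUT", "600")),
        max_prompt_length=int(env.get("MAX_PROMPT_LENGTH", "2000")),
        max_frames=int(env.get("MAX_FRAMES", "161")),
    )
```

Passing the environment as a mapping keeps the loader easy to exercise without touching the real process environment, for example `load_settings({"PORT": "4000"})`.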
Model Details
LTX-Video
- Parameters: ~2 billion
- License: Lightricks Open License (commercial use allowed)
- Download size: ~4 GB (auto-downloaded on first use)
- VRAM usage: ~10 GB
- Optimal settings: 704x480, 30 steps, 7.5 guidance
- Speed on RTX 3090: 10-30 seconds per clip
VRAM Management
The GPU server runs multiple AI services. LTX-Video uses ~10 GB VRAM:
- Model loads lazily on first /generate request
- Use POST /unload to free VRAM when not generating videos
- Other services (mana-image-gen, mana-stt, mana-tts) share the same GPU
- enable_model_cpu_offload() moves unused layers to CPU automatically
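The lazy-load and unload behaviour described above can be pictured with a small holder object. This is a sketch under stated assumptions, not the contents of `ltx_service.py`: the `LazyModel` class is hypothetical, and the loader is passed in as a plain callable so the example runs without a GPU. In the real service the loader would presumably build the diffusers pipeline, and unloading would likely also release PyTorch's cached CUDA memory.

```python
import gc
import threading
from typing import Any, Callable, Optional


class LazyModel:
    """Holds a model that is created on first use and can be dropped on demand."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader  # assumption: builds and returns the pipeline
        self._model: Optional[Any] = None
        self._lock = threading.Lock()  # guards against concurrent first requests

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> Any:
        """Return the model, loading it on the first call only."""
        with self._lock:
            if self._model is None:
                self._model = self._loader()
            return self._model

    def unload(self) -> bool:
        """Drop the model reference; returns False if nothing was loaded."""
        with self._lock:
            if self._model is None:
                return False
            self._model = None
        # Collect the dropped object now. A CUDA-backed service would
        # presumably also call torch.cuda.empty_cache() at this point.
        gc.collect()
        return True
```

A `/generate` handler would call `get()` and a `/unload` handler would call `unload()`; the next `get()` after an unload triggers a fresh load.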
Performance (RTX 3090)
| Resolution | Frames | Steps | Time |
|---|---|---|---|
| 512x320 | 41 | 20 | ~8s |
| 704x480 | 81 | 30 | ~20s |
| 704x480 | 41 | 20 | ~10s |
| 1280x720 | 41 | 30 | ~45s |
Integration
Used by:
- Picture App — video generation alongside images
- Chat App — inline video generation
Example (TypeScript)
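// Request a clip; fields left out of the body fall back to the service defaults.
const response = await fetch('http://192.168.178.11:3026/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'Ocean waves crashing on rocks at sunset',
    width: 704,
    height: 480,
    num_frames: 81,
  }),
});
// Fail early on HTTP errors instead of parsing an error body as a result.
if (!response.ok) {
  throw new Error(`Video generation failed: HTTP ${response.status}`);
}
const result = await response.json();
// video_url is a path relative to the service root, so prefix the host.
const videoUrl = `http://192.168.178.11:3026${result.video_url}`;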