CLAUDE.md - Mana Video Generation Service
Service Overview
AI video generation microservice using LTX-Video via HuggingFace diffusers:
- Port: 3026
- Framework: Python + FastAPI
- Model: LTX-Video (~2B params, Lightricks)
- Backend: diffusers + PyTorch CUDA
- Target Hardware: NVIDIA RTX 3090 (24 GB VRAM)
Features
- Fast generation: 10-30 seconds per clip on RTX 3090
- Text-to-video: 480p-720p, up to ~6 seconds
- Low VRAM: ~10 GB — leaves room for other GPU services
- Lazy model loading: Model loads on first request, stays in VRAM
- VRAM management: POST /unload to free GPU memory for other services
- MP4 output: Direct video file serving
Commands
# Setup (installs PyTorch CUDA + diffusers + LTX-Video)
chmod +x setup.sh && ./setup.sh
# Development
source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3026 --reload
# Test
curl http://localhost:3026/health
curl -X POST http://localhost:3026/generate \
-H "Content-Type: application/json" \
-d '{"prompt": "A cat walking in a garden"}' | jq
# Free VRAM (e.g. before running image generation)
curl -X POST http://localhost:3026/unload
File Structure
services/mana-video-gen/
├── app/
│ ├── __init__.py
│ ├── main.py # FastAPI endpoints
│ └── ltx_service.py # LTX-Video diffusers pipeline
├── setup.sh # Setup script (CUDA + Python deps)
├── requirements.txt
├── .env.example
└── CLAUDE.md
API Endpoints
| Endpoint | Method | Purpose |
|---|---|---|
| `/health` | GET | Health check + GPU info |
| `/models` | GET | Model info |
| `/generate` | POST | Generate video from text prompt |
| `/videos/{filename}` | GET | Serve generated video |
| `/videos/{filename}` | DELETE | Delete video |
| `/unload` | POST | Unload model, free VRAM |
| `/cleanup` | POST | Clean old videos |
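The endpoints can also be called from Python instead of curl. The following is a minimal sketch using only the standard library. `BASE_URL`, `build_request`, and `call` are hypothetical names for illustration, and the sketch assumes every endpoint it is used with returns a JSON body.

```python
import json
import urllib.request
from typing import Optional

# Assumption: the service runs locally on its default port.
BASE_URL = "http://localhost:3026"


def build_request(method: str, path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an HTTP request for one of the endpoints."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if payload is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def call(method: str, path: str, payload: Optional[dict] = None, timeout: float = 600.0) -> dict:
    """Send the request and decode the response, assuming it is JSON."""
    request = build_request(method, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `call("GET", "/health")` checks status, `call("POST", "/generate", {"prompt": "A cat walking in a garden"})` starts a generation, and `call("POST", "/unload")` frees VRAM. The 600-second timeout mirrors the documented `GENERATION_TIMEOUT` default.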
Generate Request
{
"prompt": "A timelapse of a flower blooming",
"negative_prompt": "blurry, low quality",
"width": 704,
"height": 480,
"num_frames": 81,
"fps": 25,
"steps": 30,
"guidance_scale": 7.5,
"seed": null
}
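A client may want to check a request against the documented limits before sending it. The sketch below builds the request body with the documented defaults and rejects prompts longer than `MAX_PROMPT_LENGTH` or frame counts above `MAX_FRAMES`. `build_generate_payload` is a hypothetical helper, and whether the service itself rejects or clamps out-of-range values is an assumption not confirmed here.

```python
from typing import Optional

# Documented defaults from the Environment Variables table.
MAX_PROMPT_LENGTH = 2000
MAX_FRAMES = 161


def build_generate_payload(
    prompt: str,
    negative_prompt: str = "",
    width: int = 704,
    height: int = 480,
    num_frames: int = 81,
    fps: int = 25,
    steps: int = 30,
    guidance_scale: float = 7.5,
    seed: Optional[int] = None,
) -> dict:
    """Return a /generate request body, raising ValueError on obvious limit violations."""
    if not prompt:
        raise ValueError("prompt must not be empty")
    if len(prompt) > MAX_PROMPT_LENGTH:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_LENGTH} characters")
    if not 1 <= num_frames <= MAX_FRAMES:
        raise ValueError(f"num_frames must be between 1 and {MAX_FRAMES}")
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "width": width,
        "height": height,
        "num_frames": num_frames,
        "fps": fps,
        "steps": steps,
        "guidance_scale": guidance_scale,
        "seed": seed,  # None serializes to JSON null
    }
```

Calling `build_generate_payload("A timelapse of a flower blooming", negative_prompt="blurry, low quality")` reproduces the request shown above.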
Generate Response
{
"success": true,
"video_url": "/videos/abc123.mp4",
"prompt": "A timelapse of a flower blooming",
"width": 704,
"height": 480,
"num_frames": 81,
"fps": 25,
"duration": 3.24,
"steps": 30,
"seed": 42,
"generation_time": 18.5
}
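The `duration` value in the response is consistent with dividing the frame count by the frame rate: 81 frames at 25 fps gives 3.24 seconds. A small sketch of that arithmetic follows. `clip_duration` is a hypothetical helper, and that the service computes the field exactly this way is an assumption.

```python
def clip_duration(num_frames: int, fps: int) -> float:
    """Clip length in seconds, rounded to two decimals."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return round(num_frames / fps, 2)


print(clip_duration(81, 25))   # 3.24 (the default request)
print(clip_duration(161, 25))  # 6.44 (the documented MAX_FRAMES at the default fps)
```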
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `PORT` | `3026` | Service port |
| `LTX_MODEL_ID` | `Lightricks/LTX-Video` | HuggingFace model ID |
| `DEVICE` | `cuda` | PyTorch device |
| `DEFAULT_WIDTH` | `704` | Default video width |
| `DEFAULT_HEIGHT` | `480` | Default video height |
| `DEFAULT_NUM_FRAMES` | `81` | Default frame count (~3.2s) |
| `DEFAULT_FPS` | `25` | Default framerate |
| `DEFAULT_STEPS` | `30` | Default inference steps |
| `DEFAULT_GUIDANCE_SCALE` | `7.5` | Default CFG scale |
| `GENERATION_TIMEOUT` | `600` | Timeout in seconds |
| `MAX_PROMPT_LENGTH` | `2000` | Max prompt chars |
| `MAX_FRAMES` | `161` | Max frames (~6.4s) |
| `CORS_ORIGINS` | (production URLs) | CORS config |
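One way to read these variables with their documented defaults is a small settings loader. This is a sketch, not the service's actual configuration code: `Settings` and `load_settings` are hypothetical names, and `CORS_ORIGINS` is left out because its default value is not spelled out here.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    port: int
    model_id: str
    device: str
    default_width: int
    default_height: int
    default_num_frames: int
    default_fps: int
    default_steps: int
    default_guidance_scale: float
    generation_timeout: int
    max_prompt_length: int
    max_frames: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read each variable from env, falling back to the documented default."""
    return Settings(
        port=int(env.get("PORT", "3026")),
        model_id=env.get("LTX_MODEL_ID", "Lightricks/LTX-Video"),
        device=env.get("DEVICE", "cuda"),
        default_width=int(env.get("DEFAULT_WIDTH", "704")),
        default_height=int(env.get("DEFAULT_HEIGHT", "480")),
        default_num_frames=int(env.get("DEFAULT_NUM_FRAMES", "81")),
        default_fps=int(env.get("DEFAULT_FPS", "25")),
        default_steps=int(env.get("DEFAULT_STEPS", "30")),
        default_guidance_scale=float(env.get("DEFAULT_GUIDANCE_SCALE", "7.5")),
        generation_timeout=int(env.get("GENERATION_TIMEOUT", "600")),
        max_prompt_length=int(env.get("MAX_PROMPT_LENGTH", "2000")),
        max_frames=int(env.get("MAX_FRAMES", "161")),
    )
```

Passing a plain dict, for example `load_settings({"MAX_FRAMES": "97"})`, makes the loader easy to exercise without touching the real environment.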
Model Details
LTX-Video
- Parameters: ~2 billion
- License: Lightricks Open License (commercial use allowed)
- Download size: ~4 GB (auto-downloaded on first use)
- VRAM usage: ~10 GB
- Optimal settings: 704x480, 30 steps, 7.5 guidance
- Speed on RTX 3090: 10-30 seconds per clip
VRAM Management
The GPU server runs multiple AI services. LTX-Video uses ~10 GB VRAM:
- Model loads lazily on first `/generate` request
- Use `POST /unload` to free VRAM when not generating videos
- Other services (mana-image-gen, mana-stt, mana-tts) share the same GPU
- `enable_model_cpu_offload()` moves idle pipeline components to CPU automatically
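The load-on-first-request and unload-on-demand behaviour can be sketched as a small wrapper around a loader function. This illustrates the pattern only and is not the contents of `ltx_service.py`: `LazyModel`, `loader`, and `on_unload` are hypothetical names.

```python
import threading
from typing import Any, Callable, Optional


class LazyModel:
    """Holds a model that is created on first use and can be dropped again."""

    def __init__(self, loader: Callable[[], Any], on_unload: Optional[Callable[[], None]] = None):
        self._loader = loader        # builds the model; called at most once per load
        self._on_unload = on_unload  # optional cleanup hook run after the model is dropped
        self._model: Any = None
        self._lock = threading.Lock()

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def get(self) -> Any:
        """Return the model, loading it first if necessary."""
        with self._lock:
            if self._model is None:
                self._model = self._loader()
            return self._model

    def unload(self) -> bool:
        """Drop the model; return True if something was actually unloaded."""
        with self._lock:
            if self._model is None:
                return False
            self._model = None
            if self._on_unload is not None:
                self._on_unload()
            return True
```

In a service like this one, the loader would plausibly build the diffusers pipeline and call `enable_model_cpu_offload()` on it, and the cleanup hook would plausibly run `gc.collect()` followed by `torch.cuda.empty_cache()` so the freed memory is returned to the GPU. Both are assumptions about the implementation. The lock keeps two concurrent first requests from loading the model twice.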
Performance (RTX 3090)
| Resolution | Frames | Steps | Time |
|---|---|---|---|
| 512x320 | 41 | 20 | ~8s |
| 704x480 | 81 | 30 | ~20s |
| 704x480 | 41 | 20 | ~10s |
| 1280x720 | 41 | 30 | ~45s |
Integration
Used by:
- Picture App — video generation alongside images
- Chat App — inline video generation
- Matrix Bots — video generation via chat commands
Example (TypeScript)
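// Request a clip from the video service on the GPU server
const response = await fetch('http://192.168.178.11:3026/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'Ocean waves crashing on rocks at sunset',
    width: 704,
    height: 480,
    num_frames: 81,
  }),
});
// fetch does not reject on HTTP error statuses, so check explicitly
if (!response.ok) {
  throw new Error(`Video generation failed: HTTP ${response.status}`);
}
const result = await response.json();
// video_url is a path relative to the service root
const videoUrl = `http://192.168.178.11:3026${result.video_url}`;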