CLAUDE.md - Mana Video Generation Service

Service Overview

AI video generation microservice using LTX-Video via HuggingFace diffusers:

  • Port: 3026
  • Framework: Python + FastAPI
  • Model: LTX-Video (~2B params, Lightricks)
  • Backend: diffusers + PyTorch CUDA
  • Target Hardware: NVIDIA RTX 3090 (24 GB VRAM)

Features

  • Fast generation: 10-30 seconds per clip on RTX 3090
  • Text-to-video: 480p-720p, up to ~6 seconds
  • Low VRAM: ~10 GB — leaves room for other GPU services
  • Lazy model loading: Model loads on first request, stays in VRAM
  • VRAM management: POST /unload to free GPU memory for other services
  • MP4 output: Direct video file serving

Commands

# Setup (installs PyTorch CUDA + diffusers + LTX-Video)
chmod +x setup.sh && ./setup.sh

# Development
source .venv/bin/activate
uvicorn app.main:app --host 0.0.0.0 --port 3026 --reload

# Test
curl http://localhost:3026/health
curl -X POST http://localhost:3026/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "A cat walking in a garden"}' | jq

# Free VRAM (e.g. before running image generation)
curl -X POST http://localhost:3026/unload

File Structure

services/mana-video-gen/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI endpoints
│   └── ltx_service.py       # LTX-Video diffusers pipeline
├── setup.sh                 # Setup script (CUDA + Python deps)
├── requirements.txt
├── .env.example
└── CLAUDE.md

API Endpoints

Endpoint           Method  Purpose
/health            GET     Health check + GPU info
/models            GET     Model info
/generate          POST    Generate video from text prompt
/videos/(unknown)  GET     Serve generated video
/videos/(unknown)  DELETE  Delete video
/unload            POST    Unload model, free VRAM
/cleanup           POST    Clean old videos
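For orientation, the endpoints above can be wrapped in a thin client. This is a sketch, not shipped code: the `VideoGenClient` class, its method names, and the stdlib `urllib` usage are illustrative assumptions; only the paths and verbs come from the table.

```python
import json
import urllib.request


class VideoGenClient:
    """Minimal client for the mana-video-gen HTTP API (illustrative sketch)."""

    def __init__(self, base_url: str = "http://localhost:3026"):
        self.base_url = base_url.rstrip("/")

    def _url(self, path: str) -> str:
        return f"{self.base_url}{path}"

    def generate(self, prompt: str, **overrides) -> dict:
        """POST /generate; blocks until the clip is rendered (10-45 s)."""
        payload = {"prompt": prompt, **overrides}
        req = urllib.request.Request(
            self._url("/generate"),
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=600) as resp:
            return json.load(resp)

    def unload(self) -> None:
        """POST /unload to free ~10 GB of VRAM for other GPU services."""
        req = urllib.request.Request(self._url("/unload"), method="POST")
        urllib.request.urlopen(req, timeout=30).close()
```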

Generate Request

{
  "prompt": "A timelapse of a flower blooming",
  "negative_prompt": "blurry, low quality",
  "width": 704,
  "height": 480,
  "num_frames": 81,
  "fps": 25,
  "steps": 30,
  "guidance_scale": 7.5,
  "seed": null
}
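All fields except `prompt` are optional and fall back to the server-side defaults listed under Environment Variables. A hedged sketch of building a request client-side with those defaults and limits (the clamping/validation behaviour shown here is an assumption, not the service's actual validation code):

```python
# Defaults and limits mirror the Environment Variables table in this doc.
MAX_PROMPT_LENGTH = 2000
MAX_FRAMES = 161

DEFAULTS = {
    "negative_prompt": "",
    "width": 704,
    "height": 480,
    "num_frames": 81,
    "fps": 25,
    "steps": 30,
    "guidance_scale": 7.5,
    "seed": None,
}


def build_generate_request(prompt: str, **overrides) -> dict:
    """Fill in documented defaults and enforce the documented limits."""
    if len(prompt) > MAX_PROMPT_LENGTH:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_LENGTH} characters")
    payload = {"prompt": prompt, **DEFAULTS, **overrides}
    if payload["num_frames"] > MAX_FRAMES:
        raise ValueError(f"num_frames exceeds {MAX_FRAMES}")
    return payload
```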

Generate Response

{
  "success": true,
  "video_url": "/videos/abc123.mp4",
  "prompt": "A timelapse of a flower blooming",
  "width": 704,
  "height": 480,
  "num_frames": 81,
  "fps": 25,
  "duration": 3.24,
  "steps": 30,
  "seed": 42,
  "generation_time": 18.5
}
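The `duration` field follows directly from the frame count and framerate; 81 frames at 25 fps gives the 3.24 s shown above:

```python
def clip_duration(num_frames: int, fps: int) -> float:
    """Clip length in seconds: frames divided by framerate."""
    return num_frames / fps
```

The same arithmetic explains the limits below: the default 81 frames is ~3.2 s, and MAX_FRAMES=161 caps clips at ~6.4 s.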

Environment Variables

Variable                Default               Description
PORT                    3026                  Service port
LTX_MODEL_ID            Lightricks/LTX-Video  HuggingFace model ID
DEVICE                  cuda                  PyTorch device
DEFAULT_WIDTH           704                   Default video width
DEFAULT_HEIGHT          480                   Default video height
DEFAULT_NUM_FRAMES      81                    Default frame count (~3.2 s)
DEFAULT_FPS             25                    Default framerate
DEFAULT_STEPS           30                    Default inference steps
DEFAULT_GUIDANCE_SCALE  7.5                   Default CFG scale
GENERATION_TIMEOUT      600                   Generation timeout (seconds)
MAX_PROMPT_LENGTH       2000                  Max prompt length (characters)
MAX_FRAMES              161                   Max frames (~6.4 s)
CORS_ORIGINS            (production URLs)     CORS config
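These variables can be pinned in the service's `.env`. A sample fragment mirroring the table (the CORS origin is a placeholder, not a real production value):

```shell
PORT=3026
LTX_MODEL_ID=Lightricks/LTX-Video
DEVICE=cuda
DEFAULT_WIDTH=704
DEFAULT_HEIGHT=480
DEFAULT_NUM_FRAMES=81
DEFAULT_FPS=25
GENERATION_TIMEOUT=600
CORS_ORIGINS=https://example.invalid
```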

Model Details

LTX-Video

  • Parameters: ~2 billion
  • License: Lightricks Open License (commercial use allowed)
  • Download size: ~4 GB (auto-downloaded on first use)
  • VRAM usage: ~10 GB
  • Optimal settings: 704x480, 30 steps, 7.5 guidance
  • Speed on RTX 3090: 10-30 seconds per clip

VRAM Management

The GPU server runs multiple AI services. LTX-Video uses ~10 GB VRAM:

  • Model loads lazily on first /generate request
  • Use POST /unload to free VRAM when not generating videos
  • Other services (mana-image-gen, mana-stt, mana-tts) share the same GPU
  • enable_model_cpu_offload() moves unused layers to CPU automatically
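The lazy-load / unload lifecycle above can be sketched generically. This is not the actual `ltx_service.py`: the `LazyModel` wrapper, its names, and the loader callable are illustrative assumptions; in the real service the loader would build the diffusers pipeline and `unload()` would also release CUDA memory.

```python
from typing import Callable, Optional


class LazyModel:
    """Load-on-first-use wrapper with explicit unload, mirroring the
    /generate (lazy load) and /unload (free VRAM) behaviour."""

    def __init__(self, loader: Callable[[], object]):
        self._loader = loader          # e.g. a pipeline factory
        self._model: Optional[object] = None
        self.load_count = 0            # for observability/testing

    def get(self) -> object:
        if self._model is None:        # first request pays the load cost
            self._model = self._loader()
            self.load_count += 1
        return self._model

    def unload(self) -> None:
        # The real service would additionally call torch.cuda.empty_cache()
        # here so other GPU services can claim the freed VRAM.
        self._model = None
```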

Performance (RTX 3090)

Resolution  Frames  Steps  Time
512x320     41      20     ~8 s
704x480     81      30     ~20 s
704x480     41      20     ~10 s
1280x720    41      30     ~45 s

Integration

Used by:

  • Picture App — video generation alongside images
  • Chat App — inline video generation

Example (TypeScript)

// Generation can take 10-45 s; consider an AbortController timeout.
const response = await fetch('http://192.168.178.11:3026/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: 'Ocean waves crashing on rocks at sunset',
    width: 704,
    height: 480,
    num_frames: 81,
  }),
});

if (!response.ok) {
  throw new Error(`Video generation failed: ${response.status}`);
}

const result = await response.json();
const videoUrl = `http://192.168.178.11:3026${result.video_url}`;