feat(mana-image-gen): replace Mac flux2.c implementation with Windows GPU diffusers

The repo's mana-image-gen used to be a Mac Mini–only service built on
flux2.c with hard MPS+arm64 platform checks. The actual production
image-gen runs on the Windows GPU server (RTX 3090) using HuggingFace
diffusers + PyTorch CUDA + FLUX.1-schnell — completely different code
that lived only at C:\mana\services\mana-image-gen\ on the GPU box.

This commit pulls the Windows implementation into the repo and deletes
the Mac one, so there's exactly one mana-image-gen and its source of
truth is git rather than one folder on one machine.

Removed:
- setup.sh — Mac-only flux2.c installer with hard arm64 platform check
- app/main.py (Mac flux2.c subprocess wrapper version)
- app/flux_service.py (Mac flux2.c subprocess wrapper version)

Added (pulled from C:\mana\services\mana-image-gen\):
- app/main.py — FastAPI endpoints (/generate, /images/*, /cleanup)
- app/flux_service.py — diffusers FluxPipeline wrapper
- app/api_auth.py — ApiKeyMiddleware (GPU_API_KEY)
- app/vram_manager.py — shared VRAM accounting
- service.pyw — Windows runner used by the ManaImageGen scheduled task

Updated:
- main.py PORT default from 3025 → 3023 to match the production reality
  (the service.pyw runner already binds 3023 explicitly via uvicorn.run,
  but the source default should match so direct uvicorn invocations and
  local tests don't pick the wrong port)
- CLAUDE.md fully rewritten to describe the Windows/CUDA/diffusers stack
- README.md trimmed to a pointer at CLAUDE.md + the public URL
- .env.example written from scratch (didn't exist before — the service's
  .env on the GPU box was undocumented)

The setup-image-gen.sh launchd installer in scripts/mac-mini/ and the
actual Mac Mini deployment will be cleaned up in the next commit, along
with the rest of the Mac Mini AI service infrastructure.
Till JS · 2026-04-08 13:02:42 +02:00
parent b8e18b7f82 · commit c7b4388cec
9 changed files with 562 additions and 607 deletions

services/mana-image-gen/.env.example (new file)

@@ -0,0 +1,25 @@
# Mana Image Generation — Windows GPU server only
# Server
PORT=3023
# Model
IMAGE_MODEL_ID=black-forest-labs/FLUX.1-schnell
# Generation defaults
DEFAULT_STEPS=4
DEFAULT_WIDTH=1024
DEFAULT_HEIGHT=1024
MAX_STEPS=8
GUIDANCE_SCALE=0.0
GENERATION_TIMEOUT=120
# Output (where generated images are written)
OUTPUT_DIR=C:\mana\services\mana-image-gen\outputs
# CORS
CORS_ORIGINS=https://mana.how,https://chat.mana.how,http://localhost:5173
# Cross-service auth — enforced by ApiKeyMiddleware in app/api_auth.py.
# Same key as mana-llm. Generate with: openssl rand -hex 32
GPU_API_KEY=
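
A consumer of this file can fail fast when the key is missing (hedged sketch, not shipped code; assumes `python-dotenv`). Worth knowing: `ApiKeyMiddleware` disables auth entirely when `GPU_API_KEY` is empty, so an empty value fails open rather than closed.

```python
# Hypothetical startup check: load .env and refuse to run without a key,
# because app/api_auth.py turns auth OFF when GPU_API_KEY is empty.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory, like service.pyw does

if not os.getenv("GPU_API_KEY"):
    raise SystemExit("GPU_API_KEY is empty; generate one: openssl rand -hex 32")
```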

services/mana-image-gen/CLAUDE.md

@@ -1,200 +1,147 @@
-# CLAUDE.md - Mana Image Generation Service
-## Service Overview
-AI image generation microservice using FLUX.2 klein 4B model via flux2.c:
-- **Port**: 3025
-- **Host**: Mac Mini only — `setup.sh` hard-fails on anything other than macOS arm64
-- **Framework**: Python + FastAPI
-- **Model**: FLUX.2 klein 4B (Black Forest Labs)
-- **Backend**: flux2.c (Pure C, MPS accelerated)
-> ⚠️ **Two image-gen services exist with the same name.** This one is the
-> Mac Mini implementation in the repo (flux2.c, MPS, Apple Silicon only).
-> The Windows GPU server runs a *separate* image-gen on `gpu-img.mana.how`
-> (port 3023, PyTorch + diffusers + CUDA) whose code lives outside the
-> repo at `C:\mana\services\mana-image-gen\` on the GPU box. See
-> `docs/WINDOWS_GPU_SERVER_SETUP.md` for that one.
-## Features
-- **Sub-second generation** on Apple Silicon (M4)
-- **Memory efficient**: ~4-5 GB RAM usage (memory-mapped weights)
-- **Apache 2.0 license**: Commercially usable
-- **4 sampling steps**: Optimized for speed
-- **1024x1024 default resolution**
-## Commands
-```bash
-# Setup (installs flux2.c + downloads model)
-./setup.sh
-# Development
-source .venv/bin/activate
-FLUX_BINARY=/opt/flux2/flux FLUX_MODEL_DIR=/opt/flux2/model \
-  uvicorn app.main:app --host 0.0.0.0 --port 3025 --reload
-# Production
-../../scripts/mac-mini/setup-image-gen.sh
-# Test
-curl http://localhost:3025/health
-curl -X POST http://localhost:3025/generate \
-  -H "Content-Type: application/json" \
-  -d '{"prompt": "A cat in space"}' | jq
-```
-## File Structure
-```
-services/mana-image-gen/
-├── app/
-│   ├── __init__.py
-│   ├── main.py            # FastAPI endpoints
-│   └── flux_service.py    # flux2.c subprocess wrapper
-├── setup.sh               # Setup script
-├── requirements.txt
-├── CLAUDE.md
-└── README.md
-```
-## API Endpoints
-| Endpoint | Method | Purpose |
-|----------|--------|---------|
-| `/health` | GET | Health check |
-| `/models` | GET | Model info |
-| `/generate` | POST | Generate image |
-| `/images/{filename}` | GET | Serve generated image |
-| `/images/{filename}` | DELETE | Delete image |
-| `/cleanup` | POST | Clean old images |
-## Generate Request
-```json
-{
-  "prompt": "A beautiful sunset over mountains",
-  "width": 1024,
-  "height": 1024,
-  "steps": 4,
-  "seed": -1,
-  "output_format": "png"
-}
-```
-## Generate Response
-```json
-{
-  "success": true,
-  "image_url": "/images/abc123.png",
-  "prompt": "A beautiful sunset over mountains",
-  "width": 1024,
-  "height": 1024,
-  "steps": 4,
-  "seed": 42,
-  "generation_time": 0.85
-}
-```
-## Environment Variables
-| Variable | Default | Description |
-|----------|---------|-------------|
-| `PORT` | `3025` | Service port |
-| `FLUX_BINARY` | `/opt/flux2/flux` | Path to flux2.c binary |
-| `FLUX_MODEL_DIR` | `/opt/flux2/model` | Path to model weights |
-| `DEFAULT_STEPS` | `4` | Default sampling steps |
-| `DEFAULT_WIDTH` | `1024` | Default image width |
-| `DEFAULT_HEIGHT` | `1024` | Default image height |
-| `GENERATION_TIMEOUT` | `120` | Timeout in seconds |
-| `MAX_PROMPT_LENGTH` | `2000` | Max prompt chars |
-| `CORS_ORIGINS` | (production URLs) | CORS config |
-## Model Details
-### FLUX.2 klein 4B
-- **Parameters**: 4 billion
-- **License**: Apache 2.0 (commercial use allowed)
-- **Download size**: ~16 GB
-- **RAM usage**: ~4-5 GB (memory-mapped)
-- **Optimal steps**: 4 (distilled model)
-- **Release**: January 2026
-## Integration with Other Apps
-The service is designed to be used by:
-- **Picture App** (`apps/picture/`) - AI image generation platform
-- **Chat App** (`apps/chat/`) - Inline image generation
-- **Matrix Bots** - Image generation via chat commands
-- **API Gateway** - Public API access
-### Example Integration (TypeScript)
-```typescript
-const response = await fetch('http://localhost:3025/generate', {
-  method: 'POST',
-  headers: { 'Content-Type': 'application/json' },
-  body: JSON.stringify({
-    prompt: 'A futuristic city at night',
-    width: 1024,
-    height: 1024,
-  }),
-});
-const result = await response.json();
-const imageUrl = `http://localhost:3025${result.image_url}`;
-```
-## Dependencies
-- `fastapi` - Web framework
-- `uvicorn` - ASGI server
-- `pillow` - Image processing
-- `flux2.c` - Native binary (installed separately)
-## Performance
-On Mac Mini M4 (16 GB):
-| Resolution | Steps | Time |
-|------------|-------|------|
-| 512x512 | 4 | ~0.3s |
-| 1024x1024 | 4 | ~0.8s |
-| 1024x1024 | 8 | ~1.5s |
-## Troubleshooting
-### flux2.c not found
-```bash
-# Verify installation
-ls -la /opt/flux2/flux
-# Reinstall
-sudo rm -rf /opt/flux2
-./setup.sh
-```
-### Model not found
-```bash
-# Check model directory
-ls -la /opt/flux2/model/
-# Re-download
-cd /opt/flux2/src
-./download-model.sh /opt/flux2/model
-```
-### Out of memory
-- Reduce resolution to 512x512
-- Close other applications
-- The 16 GB Mac Mini should handle 1024x1024 fine
-### Slow generation
-- Ensure MPS build was used: `make mps`
-- Check Metal GPU is being used
-- Reduce steps (4 is optimal for klein)
+# mana-image-gen
+AI image generation microservice using FLUX models via HuggingFace `diffusers` on NVIDIA CUDA. Lives on the Windows GPU server (`mana-server-gpu`, RTX 3090).
+> ⚠️ **Earlier history**: this directory used to contain a Mac Mini-only
+> implementation built on `flux2.c` (MPS, Apple Silicon arm64). That
+> version was removed when the service moved fully onto the Windows GPU.
+> If you're looking for the old code, see git history before this commit.
+## Tech Stack
+| Layer | Technology |
+|-------|------------|
+| **Runtime** | Python 3.11 + uvicorn (Windows) |
+| **Framework** | FastAPI |
+| **Inference** | HuggingFace `diffusers` + PyTorch CUDA |
+| **Default model** | FLUX.1-schnell (BFL, Apache 2.0, 4-step distilled) |
+| **GPU** | NVIDIA RTX 3090 (24 GB VRAM) |
+| **Auth** | `GPU_API_KEY` middleware (`app/api_auth.py`) |
+| **Process supervision** | Windows Scheduled Task `ManaImageGen` (AtLogOn) |
+## Port: 3023
+## Where it runs
+| Host | Path on disk | Entrypoint |
+|------|--------------|------------|
+| Windows GPU server (`192.168.178.11`) | `C:\mana\services\mana-image-gen\` | `service.pyw` via Scheduled Task `ManaImageGen` |
+The service is exposed publicly via Cloudflare Tunnel + the Mac Mini TCP-proxy (`gpu-proxy.py`):
+```
+Internet → Cloudflare → Mac Mini (gpu-proxy.py) → 192.168.178.11:3023
+```
+Public URL: `https://gpu-img.mana.how`
+## Quick Start (Windows GPU)
+```powershell
+# As tills on mana-server-gpu
+cd C:\mana\services\mana-image-gen
+C:\mana\venvs\image-gen\Scripts\python.exe service.pyw
+# Or kick the scheduled task
+Start-ScheduledTask -TaskName "ManaImageGen"
+# Health
+curl http://localhost:3023/health
+```
+The Scheduled Task runs:
+```
+Execute:    C:\mana\venvs\image-gen\Scripts\python.exe
+Arguments:  C:\mana\services\mana-image-gen\service.pyw
+WorkingDir: C:\mana\services\mana-image-gen
+```
+## API Endpoints
+| Method | Path | Description |
+|--------|------|-------------|
+| GET | `/health` | Liveness + GPU + model status |
+| GET | `/models` | Loaded model info |
+| POST | `/generate` | Generate an image (returns `{image_url, ...}`) |
+| GET | `/images/{filename}` | Serve a generated image |
+| DELETE | `/images/{filename}` | Delete a generated image |
+| POST | `/cleanup?max_age_hours=24` | Sweep old images |
+All non-health endpoints are gated by `ApiKeyMiddleware` — clients must send an `X-API-Key: $GPU_API_KEY` header or `?api_key=` query parameter (verification details in `app/api_auth.py`).
+### Generate request
+```json
+{
+  "prompt": "A futuristic city skyline at sunset",
+  "width": 1024,
+  "height": 1024,
+  "steps": 4,
+  "seed": -1
+}
+```
+## Code layout
+```
+services/mana-image-gen/
+├── app/
+│   ├── __init__.py
+│   ├── main.py            # FastAPI endpoints
+│   ├── flux_service.py    # diffusers pipeline + generate_image()
+│   ├── api_auth.py        # ApiKeyMiddleware (GPU_API_KEY)
+│   └── vram_manager.py    # shared VRAM accounting helper
+└── service.pyw            # Windows runner (used by Scheduled Task)
+```
+## Configuration (`.env` on the Windows GPU box)
+```env
+PORT=3023
+IMAGE_MODEL_ID=black-forest-labs/FLUX.1-schnell
+DEFAULT_STEPS=4
+DEFAULT_WIDTH=1024
+DEFAULT_HEIGHT=1024
+MAX_STEPS=8
+GUIDANCE_SCALE=0.0
+GENERATION_TIMEOUT=120
+OUTPUT_DIR=C:\mana\services\mana-image-gen\outputs
+CORS_ORIGINS=https://mana.how,https://chat.mana.how
+GPU_API_KEY=...   # cross-service auth, also used by mana-llm
+```
+The `service.pyw` runner loads `.env` from the service directory before
+starting uvicorn.
+## Operations
+```powershell
+# Status
+Get-ScheduledTask -TaskName "ManaImageGen" | Format-List TaskName, State
+Get-NetTCPConnection -LocalPort 3023 -State Listen
+# Restart
+Stop-ScheduledTask -TaskName "ManaImageGen"
+Start-ScheduledTask -TaskName "ManaImageGen"
+# Logs
+Get-Content C:\mana\services\mana-image-gen\service.log -Tail 50
+```
+## Model details
+| Field | Value |
+|-------|-------|
+| Model | `black-forest-labs/FLUX.1-schnell` |
+| Parameters | ~12B |
+| License | Apache 2.0 (commercial use OK) |
+| Weights size | ~24 GB on disk |
+| VRAM footprint | ~12 GB (with the default precision/optimization settings) |
+| Optimal sampling steps | 4 (distilled "schnell" variant) |
+| HuggingFace gate | Requires HF login + license accept |
+## Reference
+- `docs/WINDOWS_GPU_SERVER_SETUP.md` — full Windows GPU box setup, all
+  AI services, scheduled task setup, firewall rules, Cloudflare tunnel
+- `docs/PORT_SCHEMA.md` — port assignments across services
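
To make the new endpoint table concrete, a client round-trip looks roughly like this (illustrative sketch, not repo code; assumes `requests`, a valid `GPU_API_KEY`, and the `X-API-Key` header from `app/api_auth.py`):

```python
# Illustrative client round-trip: POST /generate, then fetch the bytes
# from /images/{filename}. Field names follow the endpoint table above.
import os

import requests

BASE = "https://gpu-img.mana.how"
HEADERS = {"X-API-Key": os.environ["GPU_API_KEY"]}

resp = requests.post(
    f"{BASE}/generate",
    headers=HEADERS,
    json={"prompt": "A futuristic city skyline at sunset", "steps": 4},
    timeout=130,  # a bit above the service's GENERATION_TIMEOUT=120
)
resp.raise_for_status()
image_url = resp.json()["image_url"]  # e.g. "/images/abc123.png"

img = requests.get(f"{BASE}{image_url}", headers=HEADERS, timeout=30)
img.raise_for_status()
with open(image_url.rsplit("/", 1)[-1], "wb") as f:
    f.write(img.content)
```

The 130 s client timeout is simply the service's `GENERATION_TIMEOUT=120` plus slack for model load.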

services/mana-image-gen/README.md

@@ -1,109 +1,31 @@
 # Mana Image Generation Service
-Local AI image generation using **FLUX.2 klein 4B** model via flux2.c.
-## Features
-- **Fast**: Sub-second generation on Apple Silicon
-- **Efficient**: ~4-5 GB RAM (memory-mapped weights)
-- **Open**: Apache 2.0 license (commercial use)
-- **Local**: 100% on-device, no API keys needed
-## Requirements
-- macOS with Apple Silicon (M1/M2/M3/M4)
-- 16 GB RAM minimum
-- ~20 GB disk space (model + binary)
-- Python 3.11+
-## Quick Start
-```bash
-# 1. Run setup (installs flux2.c + downloads model)
-./setup.sh
-# 2. Start the service
-source .venv/bin/activate
-FLUX_BINARY=/opt/flux2/flux FLUX_MODEL_DIR=/opt/flux2/model \
-  uvicorn app.main:app --host 0.0.0.0 --port 3025
-# 3. Generate an image
-curl -X POST http://localhost:3025/generate \
-  -H "Content-Type: application/json" \
-  -d '{"prompt": "A cat wearing sunglasses"}' | jq
-```
-## API
-### Generate Image
-```bash
-POST /generate
-Content-Type: application/json
-{
-  "prompt": "A beautiful mountain landscape",
-  "width": 1024,
-  "height": 1024,
-  "steps": 4,
-  "seed": -1,
-  "output_format": "png"
-}
-```
-Response:
-```json
-{
-  "success": true,
-  "image_url": "/images/abc123.png",
-  "prompt": "A beautiful mountain landscape",
-  "width": 1024,
-  "height": 1024,
-  "steps": 4,
-  "seed": 42,
-  "generation_time": 0.85
-}
-```
-### Get Image
-```bash
-GET /images/{filename}
-```
-### Health Check
-```bash
-GET /health
-```
-### Model Info
-```bash
-GET /models
-```
-## Environment Variables
-| Variable | Default | Description |
-|----------|---------|-------------|
-| `PORT` | `3025` | Service port |
-| `FLUX_BINARY` | `/opt/flux2/flux` | flux2.c binary path |
-| `FLUX_MODEL_DIR` | `/opt/flux2/model` | Model weights path |
-| `DEFAULT_STEPS` | `4` | Sampling steps |
-| `DEFAULT_WIDTH` | `1024` | Default width |
-| `DEFAULT_HEIGHT` | `1024` | Default height |
-## Model
-**FLUX.2 klein 4B** by Black Forest Labs (January 2026)
-- 4 billion parameters
-- Apache 2.0 license
-- Optimized for 4 sampling steps
-- Sub-second inference on consumer GPUs
-## Credits
-- [flux2.c](https://github.com/antirez/flux2.c) - Pure C implementation by antirez
-- [Black Forest Labs](https://bfl.ai) - FLUX.2 model
+AI image generation via **FLUX.1-schnell** (HuggingFace `diffusers` + PyTorch CUDA). Runs on the Windows GPU server (`mana-server-gpu`, NVIDIA RTX 3090).
+For architecture, deployment, and operations, see [`CLAUDE.md`](./CLAUDE.md) and [`docs/WINDOWS_GPU_SERVER_SETUP.md`](../../docs/WINDOWS_GPU_SERVER_SETUP.md).
+## Port: 3023
+## Public URL
+`https://gpu-img.mana.how` (via Cloudflare Tunnel + Mac Mini gpu-proxy)
+## Quick start
+```bash
+curl https://gpu-img.mana.how/health
+curl -X POST https://gpu-img.mana.how/generate \
+  -H "X-API-Key: $GPU_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"prompt":"A serene mountain lake at dawn","width":1024,"height":1024,"steps":4}'
+```
+## Model
+| Field | Value |
+|-------|-------|
+| Model | `black-forest-labs/FLUX.1-schnell` |
+| License | Apache 2.0 |
+| Sampling | 4 steps (distilled) |
+| VRAM | ~12 GB |

services/mana-image-gen/app/api_auth.py (new file)

@@ -0,0 +1,53 @@
"""
Simple API Key Authentication Middleware for GPU Services.
Checks X-API-Key header or ?api_key query parameter.
Skips auth for /health, /docs, /openapi.json, /redoc, /metrics endpoints.
Environment variables:
GPU_API_KEY: Required API key (if empty, auth is disabled)
GPU_REQUIRE_AUTH: Enable/disable auth (default: true if GPU_API_KEY is set)
"""
import os
import logging
from fastapi import Request
from fastapi.responses import JSONResponse
from starlette.middleware.base import BaseHTTPMiddleware
logger = logging.getLogger(__name__)
GPU_API_KEY = os.getenv("GPU_API_KEY", "")
GPU_REQUIRE_AUTH = os.getenv("GPU_REQUIRE_AUTH", "true" if GPU_API_KEY else "false").lower() == "true"
# Endpoints that don't require auth
PUBLIC_PATHS = {"/health", "/docs", "/openapi.json", "/redoc", "/metrics"}
class ApiKeyMiddleware(BaseHTTPMiddleware):
async def dispatch(self, request: Request, call_next):
# Skip auth if disabled
if not GPU_REQUIRE_AUTH or not GPU_API_KEY:
return await call_next(request)
# Skip auth for public endpoints
if request.url.path in PUBLIC_PATHS:
return await call_next(request)
# Check API key from header or query param
api_key = request.headers.get("X-API-Key") or request.query_params.get("api_key")
if not api_key:
return JSONResponse(
status_code=401,
content={"detail": "Missing API key. Provide X-API-Key header."},
)
if api_key != GPU_API_KEY:
logger.warning(f"Invalid API key attempt from {request.client.host if request.client else 'unknown'}")
return JSONResponse(
status_code=401,
content={"detail": "Invalid API key."},
)
return await call_next(request)
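
The gating is easy to verify end-to-end with FastAPI's TestClient (hedged test sketch, not shipped with the repo; the import path `app.api_auth` mirrors how `main.py` imports the middleware):

```python
# Test sketch for ApiKeyMiddleware. GPU_API_KEY is read once at import
# time, so it must be set BEFORE app.api_auth is imported.
import os

os.environ["GPU_API_KEY"] = "test-key"

from fastapi import FastAPI
from fastapi.testclient import TestClient

from app.api_auth import ApiKeyMiddleware  # import after setting the env var

app = FastAPI()

@app.get("/health")
def health():
    return {"ok": True}

@app.get("/models")
def models():
    return {"model": "stub"}

app.add_middleware(ApiKeyMiddleware)
client = TestClient(app)

assert client.get("/health").status_code == 200  # public path, no key needed
assert client.get("/models").status_code == 401  # gated path, no key
assert client.get("/models", headers={"X-API-Key": "test-key"}).status_code == 200
assert client.get("/models", params={"api_key": "test-key"}).status_code == 200
```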

services/mana-image-gen/app/flux_service.py

@@ -1,14 +1,18 @@
 """
-FLUX.2 klein Image Generation Service
-Uses flux2.c (Pure C implementation) for image generation.
-Optimized for Apple Silicon with MPS acceleration.
+Image Generation Service - CUDA version
+Supports multiple models via HuggingFace diffusers:
+- FLUX.2 klein 4B (default): Fast, ~13GB VRAM, best quality/speed ratio
+- SDXL-Turbo: Fast fallback, 6GB, ungated
+- FLUX.1-schnell: 12B params, 23GB, gated
+Optimized for NVIDIA RTX 3090 (24GB VRAM).
 """
 import asyncio
 import logging
 import os
-import tempfile
+import time
 import uuid
 from dataclasses import dataclass
 from pathlib import Path
@@ -17,23 +21,83 @@ from typing import Optional
 logger = logging.getLogger(__name__)
 # Configuration
-FLUX_BINARY = os.getenv("FLUX_BINARY", os.path.expanduser("~/flux2/flux"))
-FLUX_MODEL_DIR = os.getenv("FLUX_MODEL_DIR", os.path.expanduser("~/flux2/model"))
+MODEL_ID = os.getenv("IMAGE_MODEL_ID", "black-forest-labs/FLUX.2-klein-4B")
 DEFAULT_STEPS = int(os.getenv("DEFAULT_STEPS", "4"))
 DEFAULT_WIDTH = int(os.getenv("DEFAULT_WIDTH", "1024"))
 DEFAULT_HEIGHT = int(os.getenv("DEFAULT_HEIGHT", "1024"))
-DEFAULT_SEED = int(os.getenv("DEFAULT_SEED", "-1"))  # -1 = random
-GENERATION_TIMEOUT = int(os.getenv("GENERATION_TIMEOUT", "300"))  # seconds (first load takes ~90s)
+GENERATION_TIMEOUT = int(os.getenv("GENERATION_TIMEOUT", "300"))
+GUIDANCE_SCALE = float(os.getenv("GUIDANCE_SCALE", "0.0"))
 # Output directory for generated images
-OUTPUT_DIR = Path(os.getenv("OUTPUT_DIR", "/tmp/mana-image-gen"))
+OUTPUT_DIR = Path(os.getenv("OUTPUT_DIR", "C:/mana/services/mana-image-gen/output"))
 OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
+# Known model configs
+MODEL_CONFIGS = {
+    "black-forest-labs/FLUX.2-klein-4B": {
+        "pipeline_class": "Flux2KleinPipeline",
+        "model_name": "FLUX.2-klein-4B",
+        "parameters": "4 billion",
+        "license": "FLUX.2 Community License",
+        "torch_dtype": "bfloat16",
+        "guidance_scale": 4.0,
+        "default_steps": 4,
+    },
+    "black-forest-labs/FLUX.2-klein-9B": {
+        "pipeline_class": "Flux2KleinPipeline",
+        "model_name": "FLUX.2-klein-9B",
+        "parameters": "9 billion",
+        "license": "FLUX.2 Community License",
+        "torch_dtype": "bfloat16",
+        "guidance_scale": 4.0,
+        "default_steps": 4,
+    },
+    "stabilityai/sdxl-turbo": {
+        "pipeline_class": "AutoPipelineForText2Image",
+        "model_name": "SDXL-Turbo",
+        "parameters": "3.5 billion",
+        "license": "Stability AI Community License",
+        "torch_dtype": "float16",
+        "guidance_scale": 0.0,
+        "default_steps": 4,
+    },
+    "black-forest-labs/FLUX.1-schnell": {
+        "pipeline_class": "FluxPipeline",
+        "model_name": "FLUX.1-schnell",
+        "parameters": "12 billion",
+        "license": "Apache 2.0",
+        "torch_dtype": "float16",
+        "guidance_scale": 0.0,
+        "default_steps": 4,
+    },
+}
+# Global pipeline instance (lazy loaded)
+_pipeline = None
+# VRAM management — unload FLUX after 5 min idle (frees ~13GB)
+from app.vram_manager import VramManager
+_vram = VramManager(
+    idle_timeout=int(os.getenv("VRAM_IDLE_TIMEOUT", "300")),
+    service_name="mana-image-gen",
+)
+def unload_pipeline():
+    """Unload FLUX pipeline from GPU to free VRAM."""
+    global _pipeline
+    if _pipeline is not None:
+        import torch
+        del _pipeline
+        _pipeline = None
+        torch.cuda.empty_cache()
+        _vram.mark_unloaded()
+        logger.info("FLUX pipeline unloaded, VRAM freed")
 @dataclass
 class GenerationResult:
     """Result of image generation."""
     image_path: str
     prompt: str
     width: int
@@ -43,25 +107,99 @@ class GenerationResult:
     generation_time: float
+def _load_pipeline():
+    """Load the image generation pipeline (called once, lazy)."""
+    global _pipeline
+    if _pipeline is not None:
+        return _pipeline
+    logger.info(f"Loading model: {MODEL_ID}")
+    load_start = time.time()
+    import torch
+    config = MODEL_CONFIGS.get(MODEL_ID, {})
+    pipeline_class = config.get("pipeline_class", "AutoPipelineForText2Image")
+    dtype_str = config.get("torch_dtype", "float16")
+    dtype = torch.bfloat16 if dtype_str == "bfloat16" else torch.float16
+    if pipeline_class == "Flux2KleinPipeline":
+        from diffusers import Flux2KleinPipeline
+        _pipeline = Flux2KleinPipeline.from_pretrained(
+            MODEL_ID,
+            torch_dtype=dtype,
+        )
+        _pipeline.to("cuda")
+    elif pipeline_class == "FluxPipeline":
+        from diffusers import FluxPipeline
+        _pipeline = FluxPipeline.from_pretrained(
+            MODEL_ID,
+            torch_dtype=dtype,
+        )
+        _pipeline.enable_model_cpu_offload()
+    else:
+        from diffusers import AutoPipelineForText2Image
+        _pipeline = AutoPipelineForText2Image.from_pretrained(
+            MODEL_ID,
+            torch_dtype=dtype,
+            variant="fp16",
+        )
+        _pipeline.to("cuda")
+    load_time = time.time() - load_start
+    logger.info(f"Model loaded in {load_time:.1f}s")
+    _vram.mark_loaded()
+    return _pipeline
 def is_flux_available() -> bool:
-    """Check if flux2.c binary and model are available."""
-    binary_exists = Path(FLUX_BINARY).exists()
-    model_exists = Path(FLUX_MODEL_DIR).exists()
-    return binary_exists and model_exists
+    """Check if image generation is available."""
+    try:
+        import torch
+        import diffusers
+        return torch.cuda.is_available()
+    except ImportError:
+        return False
 def get_flux_info() -> dict:
-    """Get information about the flux installation."""
+    """Get information about the model."""
+    import torch
+    loaded = _pipeline is not None
+    gpu_name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else "N/A"
+    vram_used = torch.cuda.memory_allocated(0) / (1024**3) if torch.cuda.is_available() else 0
+    config = MODEL_CONFIGS.get(MODEL_ID, {})
     return {
-        "binary": FLUX_BINARY,
-        "binary_exists": Path(FLUX_BINARY).exists(),
-        "model_dir": FLUX_MODEL_DIR,
-        "model_exists": Path(FLUX_MODEL_DIR).exists(),
-        "model_name": "FLUX.2-klein-4B",
-        "parameters": "4 billion",
-        "license": "Apache 2.0",
+        "model_id": MODEL_ID,
+        "model_name": config.get("model_name", MODEL_ID.split("/")[-1]),
+        "parameters": config.get("parameters", "unknown"),
+        "license": config.get("license", "unknown"),
+        "backend": "diffusers (CUDA)",
+        "gpu": gpu_name,
+        "gpu_vram_used_gb": round(vram_used, 2),
+        "loaded": loaded,
         "default_steps": DEFAULT_STEPS,
         "default_resolution": f"{DEFAULT_WIDTH}x{DEFAULT_HEIGHT}",
+        "vram": _vram.status(),
+    }
+def get_vram_status() -> dict:
+    """Get VRAM manager status."""
+    import torch
+    vram_allocated = torch.cuda.memory_allocated(0) / (1024**3) if torch.cuda.is_available() else 0
+    vram_reserved = torch.cuda.memory_reserved(0) / (1024**3) if torch.cuda.is_available() else 0
+    vram_total = torch.cuda.get_device_properties(0).total_memory / (1024**3) if torch.cuda.is_available() else 0
+    return {
+        "gpu_vram_allocated_gb": round(vram_allocated, 2),
+        "gpu_vram_reserved_gb": round(vram_reserved, 2),
+        "gpu_vram_total_gb": round(vram_total, 2),
+        "model": _vram.status(),
     }
@@ -73,110 +211,76 @@ async def generate_image(
     seed: Optional[int] = None,
     output_format: str = "png",
 ) -> GenerationResult:
-    """
-    Generate an image using FLUX.2 klein via flux2.c.
-    Args:
-        prompt: Text prompt for image generation
-        width: Image width (default 1024)
-        height: Image height (default 1024)
-        steps: Number of sampling steps (default 4)
-        seed: Random seed (-1 for random)
-        output_format: Output format (png, jpg)
-    Returns:
-        GenerationResult with image path and metadata
-    Raises:
-        RuntimeError: If flux2.c is not available or generation fails
-    """
-    if not is_flux_available():
-        raise RuntimeError(
-            f"flux2.c not available. Binary: {FLUX_BINARY}, Model: {FLUX_MODEL_DIR}"
-        )
+    """Generate an image from a text prompt."""
+    import torch
+    # Check idle unload first
+    _vram.check_and_unload(unload_pipeline)
+    # Load pipeline (lazy — reloads if previously unloaded)
+    loop = asyncio.get_event_loop()
+    pipe = await loop.run_in_executor(None, _load_pipeline)
     # Generate unique output filename
     image_id = str(uuid.uuid4())[:8]
     output_path = OUTPUT_DIR / f"{image_id}.{output_format}"
-    # Use provided seed or generate random
-    actual_seed = seed if seed is not None and seed >= 0 else -1
+    # Set seed
+    if seed is not None and seed >= 0:
+        generator = torch.Generator("cuda").manual_seed(seed)
+        actual_seed = seed
+    else:
+        actual_seed = torch.randint(0, 2**32, (1,)).item()
+        generator = torch.Generator("cuda").manual_seed(actual_seed)
-    # Build flux2.c command
-    cmd = [
-        FLUX_BINARY,
-        "-d", FLUX_MODEL_DIR,
-        "-p", prompt,
-        "-o", str(output_path),
-        "-W", str(width),
-        "-H", str(height),
-        "-s", str(steps),
-    ]
-    if actual_seed >= 0:
-        cmd.extend(["-S", str(actual_seed)])
-    logger.info(f"Running flux2.c: {' '.join(cmd[:6])}...")
-    import time
+    # Get guidance scale from config
+    config = MODEL_CONFIGS.get(MODEL_ID, {})
+    guidance = GUIDANCE_SCALE if GUIDANCE_SCALE > 0 else config.get("guidance_scale", 0.0)
+    logger.info(f"Generating: {width}x{height}, {steps} steps, seed={actual_seed}")
     start_time = time.time()
-    try:
-        # Run flux2.c as subprocess
-        process = await asyncio.create_subprocess_exec(
-            *cmd,
-            stdout=asyncio.subprocess.PIPE,
-            stderr=asyncio.subprocess.PIPE,
-        )
-        stdout, stderr = await asyncio.wait_for(
-            process.communicate(),
-            timeout=GENERATION_TIMEOUT,
-        )
-        generation_time = time.time() - start_time
-        if process.returncode != 0:
-            error_msg = stderr.decode() if stderr else "Unknown error"
-            logger.error(f"flux2.c failed: {error_msg}")
-            raise RuntimeError(f"Image generation failed: {error_msg}")
-        # Verify output file exists
-        if not output_path.exists():
-            raise RuntimeError("Image generation completed but output file not found")
-        # Parse seed from output if random
-        parsed_seed = actual_seed
-        if stdout:
-            output_text = stdout.decode()
-            # flux2.c outputs "seed: 12345" when using random seed
-            for line in output_text.split("\n"):
-                if line.startswith("seed:"):
-                    try:
-                        parsed_seed = int(line.split(":")[1].strip())
-                    except (ValueError, IndexError):
-                        pass
-        logger.info(
-            f"Image generated: {output_path} ({width}x{height}, {steps} steps, {generation_time:.2f}s)"
-        )
-        return GenerationResult(
-            image_path=str(output_path),
-            prompt=prompt,
-            width=width,
-            height=height,
-            steps=steps,
-            seed=parsed_seed,
-            generation_time=generation_time,
-        )
-    except asyncio.TimeoutError:
-        logger.error(f"Image generation timed out after {GENERATION_TIMEOUT}s")
-        raise RuntimeError(f"Generation timed out after {GENERATION_TIMEOUT} seconds")
-    except Exception as e:
-        logger.error(f"Image generation error: {e}")
-        raise
+    def _generate():
+        with torch.inference_mode():
+            result = pipe(
+                prompt=prompt,
+                width=width,
+                height=height,
+                num_inference_steps=steps,
+                generator=generator,
+                guidance_scale=guidance,
+            )
+            return result.images[0]
+    try:
+        image = await asyncio.wait_for(
+            loop.run_in_executor(None, _generate),
+            timeout=GENERATION_TIMEOUT,
+        )
+    except asyncio.TimeoutError:
+        raise RuntimeError(f"Generation timed out after {GENERATION_TIMEOUT}s")
+    generation_time = time.time() - start_time
+    # Save image
+    if output_format == "jpg":
+        image.save(output_path, "JPEG", quality=95)
+    else:
+        image.save(output_path, "PNG")
+    _vram.touch()
+    logger.info(f"Generated: {output_path} ({width}x{height}, {steps} steps, {generation_time:.2f}s)")
+    return GenerationResult(
+        image_path=str(output_path),
+        prompt=prompt,
+        width=width,
+        height=height,
+        steps=steps,
+        seed=actual_seed,
+        generation_time=generation_time,
+    )
 def cleanup_image(image_path: str) -> bool:
@@ -193,8 +297,6 @@ def cleanup_image(image_path: str) -> bool:
 def cleanup_old_images(max_age_hours: int = 24) -> int:
     """Clean up images older than max_age_hours."""
-    import time
     cleaned = 0
     cutoff = time.time() - (max_age_hours * 3600)
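
For exercising the new module outside FastAPI, a smoke test along these lines should work on the GPU box (hypothetical, not part of the commit; assumes the weights are already cached and the service directory is on `PYTHONPATH`):

```python
# Hypothetical smoke test for app/flux_service.py; run on the CUDA box.
import asyncio

from app.flux_service import generate_image, get_flux_info, is_flux_available

async def main():
    assert is_flux_available(), "torch/diffusers missing or no CUDA device"
    print(get_flux_info())  # model id, GPU name, VRAM usage, idle status
    result = await generate_image(
        prompt="A lighthouse in dense fog",
        width=512,
        height=512,
        steps=4,
    )
    print(result.image_path, f"{result.generation_time:.2f}s seed={result.seed}")

asyncio.run(main())
```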

services/mana-image-gen/app/main.py

@@ -21,6 +21,7 @@ from fastapi.middleware.cors import CORSMiddleware
 from fastapi.responses import FileResponse
 from pydantic import BaseModel, Field
+from .api_auth import ApiKeyMiddleware
 from .flux_service import (
     generate_image,
     is_flux_available,
@@ -40,7 +41,7 @@ logging.basicConfig(
 logger = logging.getLogger(__name__)
 # Configuration from environment
-PORT = int(os.getenv("PORT", "3025"))
+PORT = int(os.getenv("PORT", "3023"))
 MAX_PROMPT_LENGTH = int(os.getenv("MAX_PROMPT_LENGTH", "2000"))
 MIN_DIMENSION = int(os.getenv("MIN_DIMENSION", "256"))
 MAX_DIMENSION = int(os.getenv("MAX_DIMENSION", "2048"))
@@ -87,6 +88,7 @@ app.add_middleware(
     allow_methods=["*"],
     allow_headers=["*"],
 )
+app.add_middleware(ApiKeyMiddleware)
 # ============================================================================
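
One thing worth knowing about this wiring: Starlette treats the middleware added last as the outermost layer, so `ApiKeyMiddleware` (added after CORS) sees requests before `CORSMiddleware` does, which can matter for browser preflight `OPTIONS` requests to non-public paths. A minimal, self-contained demonstration of the ordering (illustrative only, not repo code):

```python
# Demo: Starlette runs the LAST-added middleware first on requests.
from fastapi import FastAPI, Request
from fastapi.testclient import TestClient
from starlette.middleware.base import BaseHTTPMiddleware

calls = []

def tag(name: str):
    """Build a middleware class that records its name as requests pass."""
    class Tagger(BaseHTTPMiddleware):
        async def dispatch(self, request: Request, call_next):
            calls.append(name)
            return await call_next(request)
    return Tagger

app = FastAPI()

@app.get("/ping")
def ping():
    return {"ok": True}

app.add_middleware(tag("cors-stand-in"))  # added first:  inner layer
app.add_middleware(tag("auth-stand-in"))  # added second: outer layer

TestClient(app).get("/ping")
print(calls)  # ['auth-stand-in', 'cors-stand-in'] (last added runs first)
```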

services/mana-image-gen/app/vram_manager.py (new file)

@@ -0,0 +1,114 @@
"""
VRAM Manager Automatic model unloading after idle timeout.
Tracks last usage time per model and unloads after configurable timeout.
Designed for shared GPU environments (multiple services on one RTX 3090).
Usage in a service:
from vram_manager import VramManager
vram = VramManager(idle_timeout=300) # 5 min
# Before using a model
vram.touch()
# Call periodically (e.g., from health check or background task)
vram.check_and_unload(unload_fn=my_unload_function)
"""
import os
import time
import logging
import threading
from typing import Optional, Callable
logger = logging.getLogger(__name__)
DEFAULT_IDLE_TIMEOUT = int(os.getenv("VRAM_IDLE_TIMEOUT", "300")) # 5 minutes
class VramManager:
def __init__(self, idle_timeout: int = DEFAULT_IDLE_TIMEOUT, service_name: str = "unknown"):
self.idle_timeout = idle_timeout
self.service_name = service_name
self.last_used: float = 0.0
self.model_loaded: bool = False
self._lock = threading.Lock()
self._timer: Optional[threading.Timer] = None
def touch(self):
"""Mark the model as recently used. Call before/after each inference."""
with self._lock:
self.last_used = time.time()
self.model_loaded = True
self._schedule_check()
def mark_loaded(self):
"""Mark that a model has been loaded into VRAM."""
with self._lock:
self.model_loaded = True
self.last_used = time.time()
self._schedule_check()
logger.info(f"[{self.service_name}] Model loaded, idle timeout: {self.idle_timeout}s")
def mark_unloaded(self):
"""Mark that a model has been unloaded from VRAM."""
with self._lock:
self.model_loaded = False
if self._timer:
self._timer.cancel()
self._timer = None
logger.info(f"[{self.service_name}] Model unloaded, VRAM freed")
def is_idle(self) -> bool:
"""Check if the model has been idle longer than the timeout."""
if not self.model_loaded:
return False
return (time.time() - self.last_used) > self.idle_timeout
def seconds_until_unload(self) -> Optional[float]:
"""Seconds until the model will be unloaded, or None if not loaded."""
if not self.model_loaded:
return None
remaining = self.idle_timeout - (time.time() - self.last_used)
return max(0, remaining)
def check_and_unload(self, unload_fn: Callable[[], None]) -> bool:
"""Check if idle and unload if so. Returns True if unloaded."""
if self.is_idle():
logger.info(f"[{self.service_name}] Idle for >{self.idle_timeout}s, unloading model...")
try:
unload_fn()
self.mark_unloaded()
return True
except Exception as e:
logger.error(f"[{self.service_name}] Failed to unload: {e}")
return False
def _schedule_check(self):
"""Schedule an idle check after the timeout period."""
if self._timer:
self._timer.cancel()
self._timer = threading.Timer(
self.idle_timeout + 5, # Small buffer
self._auto_check,
)
self._timer.daemon = True
self._timer.start()
def _auto_check(self):
"""Auto-triggered idle check (called by timer)."""
# This is just a log — actual unloading needs the unload_fn
# which depends on the service. The service should call check_and_unload.
if self.is_idle():
logger.info(f"[{self.service_name}] Model idle for >{self.idle_timeout}s — ready to unload")
def status(self) -> dict:
"""Get current VRAM manager status."""
return {
"model_loaded": self.model_loaded,
"idle_seconds": round(time.time() - self.last_used, 1) if self.model_loaded else None,
"idle_timeout": self.idle_timeout,
"seconds_until_unload": round(self.seconds_until_unload(), 1) if self.model_loaded else None,
}
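
The docstring's usage pattern, spelled out as a runnable sketch (assumes the module is importable as `app.vram_manager`, matching the import in `flux_service.py`):

```python
# Runnable sketch of the VramManager lifecycle with a dummy unload_fn.
import time

from app.vram_manager import VramManager

state = {"unloaded": False}

def fake_unload():
    # Stand-in for the real unload (del pipeline + torch.cuda.empty_cache()).
    state["unloaded"] = True

vram = VramManager(idle_timeout=1, service_name="demo")
vram.mark_loaded()    # model just loaded into VRAM
vram.touch()          # call around each inference
print(vram.status())  # model_loaded=True, seconds_until_unload ~1

time.sleep(1.5)       # let it go idle past the timeout
print(vram.check_and_unload(fake_unload))  # True: unload_fn was called
print(state["unloaded"], vram.status()["model_loaded"])  # True False
```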

services/mana-image-gen/service.pyw (new file)

@@ -0,0 +1,17 @@
"""mana-image-gen service runner."""
import os
import sys
os.chdir(r"C:\mana\services\mana-image-gen")
sys.path.insert(0, r"C:\mana\services\mana-image-gen")
# Load .env file
from dotenv import load_dotenv
load_dotenv(r"C:\mana\services\mana-image-gen\.env")
# Redirect stdout/stderr to log file
log = open(r"C:\mana\services\mana-image-gen\service.log", "w", buffering=1)
sys.stdout = log
sys.stderr = log
import uvicorn
uvicorn.run("app.main:app", host="0.0.0.0", port=3023, log_level="info")
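
One operational caveat: the runner opens `service.log` with mode `"w"`, so every restart truncates the previous run's log. If history across restarts matters, a rotating handler is one alternative (sketch only, not what the repo does):

```python
# Alternative logging setup (sketch): keep a few rotated 5 MB files
# instead of truncating service.log on every start.
import logging
from logging.handlers import RotatingFileHandler

handler = RotatingFileHandler(
    r"C:\mana\services\mana-image-gen\service.log",
    maxBytes=5 * 1024 * 1024,
    backupCount=3,
)
logging.basicConfig(level=logging.INFO, handlers=[handler])
```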

services/mana-image-gen/setup.sh (deleted)

@@ -1,227 +0,0 @@
#!/bin/bash
# Setup script for Mana Image Generation service
# Installs flux2.c and FLUX.2 klein 4B model
# Optimized for Apple Silicon (MPS)
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
VENV_DIR="$SCRIPT_DIR/.venv"
FLUX_DIR="/opt/flux2"
MODEL_DIR="$FLUX_DIR/model"
echo "=========================================="
echo "Mana Image Generation Setup"
echo "=========================================="
echo ""
# Check platform
if [[ "$(uname)" != "Darwin" ]]; then
echo "Error: This service requires macOS with Apple Silicon."
echo "flux2.c uses MPS (Metal Performance Shaders) for acceleration."
exit 1
fi
# Check for Apple Silicon
if [[ "$(uname -m)" != "arm64" ]]; then
echo "Error: This service requires Apple Silicon (arm64)."
echo "flux2.c is optimized for M1/M2/M3/M4 chips."
exit 1
fi
echo "Platform: macOS $(sw_vers -productVersion) on $(uname -m)"
echo ""
# ============================================
# Step 1: Install flux2.c
# ============================================
echo "Step 1: Installing flux2.c"
echo "----------------------------------------"
# Check if flux2.c already exists
if [[ -f "$FLUX_DIR/flux" ]]; then
echo "flux2.c already installed at $FLUX_DIR/flux"
echo "To reinstall, remove the directory first: sudo rm -rf $FLUX_DIR"
else
echo "Creating installation directory..."
sudo mkdir -p "$FLUX_DIR"
sudo chown $(whoami) "$FLUX_DIR"
# Clone flux2.c repository
echo "Cloning flux2.c repository..."
cd "$FLUX_DIR"
git clone https://github.com/antirez/flux2.c.git src
cd src
# Build with MPS support (Apple Silicon optimized)
echo "Building flux2.c with MPS acceleration..."
make mps
# Move binary to parent directory
cp flux "$FLUX_DIR/flux"
chmod +x "$FLUX_DIR/flux"
echo "flux2.c installed successfully!"
fi
# Verify binary
if [[ -x "$FLUX_DIR/flux" ]]; then
echo "Binary: $FLUX_DIR/flux"
else
echo "Error: flux2.c binary not found or not executable"
exit 1
fi
echo ""
# ============================================
# Step 2: Download FLUX.2 klein 4B model
# ============================================
echo "Step 2: Downloading FLUX.2 klein 4B model"
echo "----------------------------------------"
echo "Note: This will download ~16GB of model weights"
echo ""
if [[ -d "$MODEL_DIR" ]] && [[ -f "$MODEL_DIR/flux.safetensors" ]]; then
echo "Model already downloaded at $MODEL_DIR"
else
mkdir -p "$MODEL_DIR"
cd "$FLUX_DIR/src"
# Run the model download script
if [[ -f "./download-model.sh" ]]; then
echo "Running download script..."
./download-model.sh "$MODEL_DIR"
else
echo "Downloading model manually..."
# flux2.c expects the model in a specific format
# The model includes:
# - flux.safetensors (main weights)
# - qwen3-4b.safetensors (text encoder)
# - ae.safetensors (autoencoder)
echo "Please run the following commands manually:"
echo ""
echo " cd $FLUX_DIR/src"
echo " ./download-model.sh $MODEL_DIR"
echo ""
echo "Or download from Hugging Face:"
echo " https://huggingface.co/black-forest-labs/FLUX.2-klein-4B"
echo ""
fi
fi
echo ""
# ============================================
# Step 3: Setup Python environment
# ============================================
echo "Step 3: Setting up Python environment"
echo "----------------------------------------"
# Find Python
if command -v python3.11 &> /dev/null; then
PYTHON_CMD="python3.11"
elif command -v python3 &> /dev/null; then
PYTHON_CMD="python3"
else
echo "Error: Python 3 not found. Please install Python 3.11 or later."
exit 1
fi
echo "Using Python: $PYTHON_CMD"
$PYTHON_CMD --version
echo ""
# Create virtual environment
if [[ -d "$VENV_DIR" ]]; then
echo "Virtual environment exists at $VENV_DIR"
read -p "Recreate it? (y/N) " -n 1 -r
echo ""
if [[ $REPLY =~ ^[Yy]$ ]]; then
rm -rf "$VENV_DIR"
$PYTHON_CMD -m venv "$VENV_DIR"
fi
else
echo "Creating virtual environment..."
$PYTHON_CMD -m venv "$VENV_DIR"
fi
# Activate and install dependencies
source "$VENV_DIR/bin/activate"
pip install --upgrade pip
pip install -r "$SCRIPT_DIR/requirements.txt"
echo ""
# ============================================
# Step 4: Create output directory
# ============================================
echo "Step 4: Creating output directory"
echo "----------------------------------------"
OUTPUT_DIR="/tmp/mana-image-gen"
mkdir -p "$OUTPUT_DIR"
echo "Output directory: $OUTPUT_DIR"
echo ""
# ============================================
# Step 5: Test flux2.c
# ============================================
echo "Step 5: Testing flux2.c"
echo "----------------------------------------"
if [[ -x "$FLUX_DIR/flux" ]] && [[ -d "$MODEL_DIR" ]]; then
echo "Testing image generation..."
TEST_OUTPUT="$OUTPUT_DIR/test_setup.png"
# Quick test with low resolution
"$FLUX_DIR/flux" -d "$MODEL_DIR" -p "A simple test image" -o "$TEST_OUTPUT" -W 256 -H 256 -s 2 2>/dev/null && {
echo "Test successful! Generated: $TEST_OUTPUT"
rm -f "$TEST_OUTPUT"
} || {
echo "Warning: Test generation failed. Model may not be fully downloaded."
echo "Please ensure the model is complete before using the service."
}
else
echo "Skipping test - flux2.c or model not ready"
fi
echo ""
# ============================================
# Done
# ============================================
echo "=========================================="
echo "Setup Complete!"
echo "=========================================="
echo ""
echo "Configuration:"
echo " FLUX_BINARY: $FLUX_DIR/flux"
echo " FLUX_MODEL_DIR: $MODEL_DIR"
echo " OUTPUT_DIR: $OUTPUT_DIR"
echo ""
echo "To start the service:"
echo ""
echo " cd $SCRIPT_DIR"
echo " source .venv/bin/activate"
echo " FLUX_BINARY=$FLUX_DIR/flux FLUX_MODEL_DIR=$MODEL_DIR uvicorn app.main:app --host 0.0.0.0 --port 3025"
echo ""
echo "Or for development with auto-reload:"
echo ""
echo " FLUX_BINARY=$FLUX_DIR/flux FLUX_MODEL_DIR=$MODEL_DIR uvicorn app.main:app --host 0.0.0.0 --port 3025 --reload"
echo ""
echo "Test the service:"
echo ""
echo " curl http://localhost:3025/health"
echo " curl -X POST http://localhost:3025/generate \\"
echo " -H 'Content-Type: application/json' \\"
echo " -d '{\"prompt\": \"A cat wearing sunglasses\"}'"
echo ""