managarten/docker/prometheus/prometheus.yml
Till JS 0b44acdde1
chore(mana): remove uload from the unified app
uLoad has been migrated to Code/uload/ as a standalone Hono+Bun server
(see mana/docs/playbooks/ULOAD_GREENFIELD.md, υ-0..υ-7 done).
Live at:
- uload.mana.how → :3108 (SvelteKit web, standalone)
- uload-api.mana.how → :3107 (Hono API, own Postgres DB in the
  `uload` schema)
- ulo.ad → :3107 (short-redirect domain)

Deleted / dismantled:
- Module: apps/mana/.../modules/uload + routes + locales
- Top level: apps/uload/ (old SvelteKit web + Hono server code)
- docker-compose.macmini.yml uload-server service block (the old
  container on :3070 was replaced by the standalone stack on :3107)
- mana-web env: PUBLIC_ULOAD_SERVER_URL / _CLIENT in compose +
  hooks.server.ts (env injection, window.__ export, CSP connectSrc),
  status/+page.server.ts service list
- prometheus uload-server scrape job + mana.how/uload probe
- shared-branding APP_BRANDING.uload + APP_ICONS.uload + MANA_APPS
  uload entry + UloadLogo
- spiral-db MANA_APP_INDEX.uload (=21)
- shared-types/spaces 5× 'uload' module entries in the space lists
- Registries: app-registry/apps.ts (Uload registerApp + DownloadSimple
  icon + header), categories, help-content, module-registry,
  splitscreen, hooks.server APP_SUBDOMAINS, data/tools/init
- package.json dev:uload:* + deploy:landing:uload scripts
- i18n: uload in apps/{de,en,es,fr,it}.json

What STAYS:
- cloudflared `uload.mana.how` → :3108, `uload-api.mana.how` → :3107,
  `ulo.ad` → :3107 (standalone routes)
- docker-compose mana-auth CORS_ORIGINS uload.mana.how + ulo.ad
  (SSO for standalone)

Dexie v67:
- drops links + uloadTags + uloadFolders + linkTags

mana-web svelte-check 0/0 (7256 files), snapshot test 10/10.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 16:14:40 +02:00

# Mana Prometheus Configuration
# Scrapes metrics from all services
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Load alerting rules
rule_files:
  - /etc/prometheus/alerts.yml

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

scrape_configs:
  # Prometheus self-monitoring
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Host system metrics via node-exporter
  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        replacement: 'mac-mini'

  # Docker container metrics via cAdvisor
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

  # PostgreSQL metrics
  - job_name: 'postgres'
    static_configs:
      - targets: ['postgres-exporter:9187']

  # Redis metrics
  - job_name: 'redis'
    static_configs:
      - targets: ['redis-exporter:9121']

  # ============================================
  # Core Services (Hono/Bun + Go)
  # ============================================

  # Auth Service
  - job_name: 'mana-auth'
    static_configs:
      - targets: ['mana-auth:3001']
    metrics_path: '/metrics'
    scrape_interval: 30s

  # Credits Service
  - job_name: 'mana-credits'
    static_configs:
      - targets: ['mana-credits:3002']
    metrics_path: '/metrics'
    scrape_interval: 30s

  # User Service
  - job_name: 'mana-user'
    static_configs:
      - targets: ['mana-user:3062']
    metrics_path: '/metrics'
    scrape_interval: 30s

  # Subscriptions Service
  - job_name: 'mana-subscriptions'
    static_configs:
      - targets: ['mana-subscriptions:3063']
    metrics_path: '/metrics'
    scrape_interval: 30s

  # Analytics Service
  - job_name: 'mana-analytics'
    static_configs:
      - targets: ['mana-analytics:3064']
    metrics_path: '/metrics'
    scrape_interval: 30s

  # LLM Service
  - job_name: 'mana-llm'
    static_configs:
      - targets: ['mana-llm:3020']
    metrics_path: '/metrics'
    scrape_interval: 15s

  # Mana Search Service
  - job_name: 'mana-search'
    static_configs:
      - targets: ['mana-search:3012']
    metrics_path: '/metrics'
    scrape_interval: 30s

  # Mana Media Service
  - job_name: 'mana-media'
    static_configs:
      - targets: ['mana-media:3011']
    metrics_path: '/metrics'
    scrape_interval: 30s

  # SKIPPED (Phase 2f-3): mana-ai moved to the GPU box; no public /metrics endpoint via the tunnel.
  # blackbox-api probes mana-ai.mana.how/health instead.

  # Mana MCP Gateway (Bun, :3069): exposes the shared tool-registry
  # over Streamable HTTP to external agents. Emits policy-gate
  # decisions (`mana_mcp_policy_decisions_total{decision,reason,mode}`)
  # and per-tool invocation metrics. Critical during the POLICY_MODE
  # log-only soak period to decide when it's safe to flip to enforce.
  - job_name: 'mana-mcp'
    static_configs:
      - targets: ['mana-mcp:3069']
    metrics_path: '/metrics'
    scrape_interval: 30s
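  # Example (a sketch, not part of the scrape config): a PromQL query that
  # could help during the log-only soak to see how policy decisions break
  # down. It assumes only the metric and label names mentioned in the
  # comment above; the 5m window is an arbitrary choice.
  #
  #   sum by (decision, reason) (rate(mana_mcp_policy_decisions_total[5m]))
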
  # ============================================
  # GPU Server (Windows PC, LAN: 192.168.178.11)
  # ============================================

  # GPU: LLM Gateway
  - job_name: 'gpu-llm'
    static_configs:
      - targets: ['192.168.178.11:3025']
        labels:
          instance: 'gpu-server'
    metrics_path: '/metrics'
    scrape_interval: 15s

  # GPU: Speech-to-Text (WhisperX)
  - job_name: 'gpu-stt'
    static_configs:
      - targets: ['192.168.178.11:3020']
        labels:
          instance: 'gpu-server'
    metrics_path: '/health'
    scrape_interval: 30s

  # GPU: Text-to-Speech
  - job_name: 'gpu-tts'
    static_configs:
      - targets: ['192.168.178.11:3022']
        labels:
          instance: 'gpu-server'
    metrics_path: '/health'
    scrape_interval: 30s

  # GPU: Image Generation (FLUX.2)
  - job_name: 'gpu-image-gen'
    static_configs:
      - targets: ['192.168.178.11:3023']
        labels:
          instance: 'gpu-server'
    metrics_path: '/health'
    scrape_interval: 30s

  # GPU: Video Generation (LTX-Video)
  - job_name: 'gpu-video-gen'
    static_configs:
      - targets: ['192.168.178.11:3026']
        labels:
          instance: 'gpu-server'
    metrics_path: '/health'
    scrape_interval: 30s

  # ============================================
  # Go Infrastructure Services
  # ============================================

  # API Gateway (Go)
  - job_name: 'mana-api-gateway'
    static_configs:
      - targets: ['mana-api-gateway:3016']
    metrics_path: '/metrics'
    scrape_interval: 15s

  # Sync Server (Go): local-first data sync
  - job_name: 'mana-sync'
    static_configs:
      - targets: ['mana-core-sync:3051']
    metrics_path: '/metrics'
    scrape_interval: 30s

  # Notification Service (Go): email, push, webhook
  - job_name: 'mana-notify'
    static_configs:
      - targets: ['mana-core-notify:3013']
    metrics_path: '/metrics'
    scrape_interval: 30s

  # Crawler Service (Go)
  - job_name: 'mana-crawler'
    static_configs:
      - targets: ['mana-crawler:3014']
    metrics_path: '/metrics'
    scrape_interval: 30s

  # ============================================
  # Blackbox Exporter: HTTP Uptime Probes
  # ============================================

  # Web Apps (Unified Mana app at mana.how + standalone games)
  - job_name: 'blackbox-web'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          # Unified Mana app (all modules as routes)
          - https://mana.how
          - https://mana.how/chat
          - https://mana.how/todo
          - https://mana.how/calendar
          - https://mana.how/contacts
          - https://mana.how/times
          - https://mana.how/photos
          - https://mana.how/picture
          - https://mana.how/storage
          - https://mana.how/presi
          - https://mana.how/calc
          - https://mana.how/quotes
          - https://mana.how/cards
          - https://mana.how/skilltree
          - https://mana.how/music
          - https://mana.how/moodlit
          # mana.how/context: module was dropped on 2026-04-29 (commit 1815139dc), probe removed
          - https://mana.how/questions
          - https://mana.how/notes
          - https://mana.how/habits
          - https://mana.how/guides
          - https://mana.how/inventory
          - https://mana.how/body
          - https://mana.how/journal
          - https://mana.how/dreams
          - https://mana.how/firsts
          - https://mana.how/period
          - https://mana.how/events
          - https://mana.how/finance
          - https://mana.how/places
          # mana.how/who: does not exist in the unified app; Who runs as a standalone stack on who.mana.how
          - https://mana.how/mail
          - https://mana.how/playground
          # ─── Standalone apps / games (separate containers, own tunnel hostnames) ───
          - https://manavoxel.mana.how
          # Memoro standalone stack (Phase 2 of the mana e.V. platform migration)
          - https://memoro.mana.how
          # Cardecky standalone (Phase 1 spinoff from the unified mana module, 2026-05-06)
          - https://cardecky.mana.how
          # Who? game (standalone Bun stack on the Mac Mini, running natively under PM2)
          - https://who.mana.how/cantina
          # npm registry (mana e.V. platform repo, Verdaccio)
          - https://npm.mana.how
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115
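  # How the three relabel steps above fit together (illustrative note; the
  # same pattern repeats in every blackbox-* job):
  #   1. __address__ (the listed URL) is copied into the `target` URL parameter.
  #   2. That URL also becomes the `instance` label, so series are keyed by site.
  #   3. __address__ is overwritten so Prometheus contacts the exporter itself.
  # For the first target, the resulting scrape request is roughly:
  #   http://blackbox-exporter:9115/probe?module=http_2xx&target=https://mana.how
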
  # API Health Endpoints (only services with running containers)
  - job_name: 'blackbox-api'
    metrics_path: /probe
    params:
      module: [http_health]
    static_configs:
      - targets:
          - https://auth.mana.how/health
          - https://api.mana.how/health
          # Memoro standalone API + Audio (Phase 2 platform migration)
          - https://memoro-api.mana.how/health
          - https://memoro-audio.mana.how/health
          - https://mana-ai.mana.how/health
          - https://research.mana.how/health
          # who.mana.how API on /api/decks: root is 404 by design (Phaser-Cantina mounts at /cantina)
          - https://who-api.mana.how/api/decks
          # Verein backoffice (mana e.V. Plattform); only /health returns 200, root is auth-walled
          - https://admin.mana.how/health
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115
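  # The `http_health` module referenced by this job is defined in the
  # blackbox exporter's own config file, which is not shown here. A minimal
  # sketch of what such a module might look like (every value below is an
  # assumption, not the project's actual definition):
  #
  #   modules:
  #     http_health:
  #       prober: http
  #       timeout: 5s
  #       http:
  #         method: GET
  #         valid_status_codes: [200]
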
  # Internal-only services (not exposed via Cloudflare).
  # Probed over the Docker network so the blackbox exporter reaches
  # them by container name.
  - job_name: 'blackbox-internal'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          # mana-geocoding's own health (Hono wrapper)
          - http://mana-geocoding:3018/health
          # Upstream photon-self health, proxied through the wrapper so the
          # blackbox-exporter doesn't need host.docker.internal access.
          - http://mana-geocoding:3018/health/photon-self
          # mana-ai (Mission Runner) has been probed via its public hostname
          # since Phase 2f-3, because it now runs on the GPU box.
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115

  # Infrastructure & Monitoring Tools
  - job_name: 'blackbox-infra'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://git.mana.how
          - https://grafana.mana.how
          - https://stats.mana.how
          - https://glitchtip.mana.how
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115

  # GPU Server Services: probe /health, not /
  # The GPU services (whisper STT, TTS, FLUX image gen) only return 2xx
  # on /health; their root path returns 401/403/404 by design (auth or
  # API-only). Ollama is the exception: its / returns 200, but it has
  # no /health endpoint, so we keep it on / via a separate target.
  - job_name: 'blackbox-gpu'
    metrics_path: /probe
    params:
      module: [http_health]
    static_configs:
      - targets:
          - https://gpu-stt.mana.how/health
          - https://gpu-tts.mana.how/health
          - https://gpu-img.mana.how/health
          - https://gpu-video.mana.how/health
          - https://gpu-llm.mana.how/health
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115

  - job_name: 'blackbox-gpu-root'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://gpu-ollama.mana.how
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115

  # ============================================
  # Pushgateway (deploy metrics, batch jobs)
  # ============================================
  - job_name: 'pushgateway'
    honor_labels: true
    static_configs:
      - targets: ['pushgateway:9091']