managarten

mirror of https://github.com/Memo-2023/mana-monorepo.git synced 2026-05-14 21:21:10 +02:00

Author	SHA1	Message	Date
Till JS	d087b4744a	chore(observability): scrape mana-mcp at :3069 Pairs with `c94ab01c6` which added the real /metrics endpoint. Without a scrape job the policy_decisions_total counter has nowhere to go and the soak period is flying blind. 30s interval to match mana-ai. Same job shape as mana-ai — any Grafana dashboard that auto-discovers services via labels will pick this up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-23 14:24:13 +02:00
Till JS	484761e475	fix(docker): remove deleted subscriptions pkg + add shared-ai to sveltekit-base packages/subscriptions was deleted in the credits-merge cleanup; the COPY line in the base Dockerfile broke every subsequent --no-cache build. Also adds packages/shared-ai which was missing (webapp depends on it since the Multi-Agent Workbench rollout). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 16:15:01 +02:00
Till JS	76577869e1	feat(mana-ai): OpenTelemetry tracing + Grafana Tempo backend Add distributed tracing to the mana-ai background runner so mission execution can be visualized end-to-end in Grafana. Instrumentation (services/mana-ai/): - tracing.ts: OTel provider setup with OTLP/HTTP exporter, withSpan() helper - tick.ts: tick.planMission span with mission/agent/user attributes - client.ts: planner.complete span with LLM model, tokens, latency Infrastructure: - docker/tempo/tempo.yaml: Grafana Tempo config (OTLP HTTP on 4318) - docker-compose: tempo service + tempo_data volume + mana-ai env var - docker/grafana/provisioning/datasources/tempo.yml: auto-provisioned Trace flow: tick.planMission (root span) └── planner.complete (child span) ├── llm.model = "gpt-4o-mini" ├── llm.tokens.total = 1234 └── llm.response.length = 567 Enable: set OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 View: Grafana → Explore → Tempo datasource Also fixes: removed broken @mana/subscriptions workspace ref from arcade. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 15:21:23 +02:00
Till JS	bb3da78d5c	feat(ai): Mission Grant rollout gating — flag, alerts, runbook, user docs Phase 4 — everything needed to flip the Mission Key-Grant feature on safely per deployment. No new behaviour; purely operational plumbing. - PUBLIC_AI_MISSION_GRANTS feature flag (default off). hooks.server.ts injects window.__PUBLIC_AI_MISSION_GRANTS__, api/config.ts exposes isMissionGrantsEnabled(). Grant UI (dialog + status box) and the Workbench "Datenzugriff" tab both hide when the flag is off. - PUBLIC_MANA_AI_URL added to the injection set so the webapp can reach the new audit endpoint from production. - Prometheus alerts (new mana_ai_alerts group): - ManaAIServiceDown (warning, 2m) - ManaAIGrantScopeViolation (critical, 0m) — MUST stay at 0; any increment pages immediately - ManaAIGrantSkipsHigh (warning, 15m) — flags keypair drift - ManaAIPlannerParseFailures (warning, 10m) — prompt/LLM drift - Runbook in docs/plans/ai-mission-key-grant.md: initial keypair gen, leak-response procedure (rotate + invalidate all grants + audit), scope-violation triage. - User-facing doc in apps/docs security.mdx: new "AI Mission Grants" section with the three hard constraints (ZK users blocked, scope changes invalidate cryptographically, revocation is one click) plus an honest threat-model comparison column showing where grants shift the tradeoff. Rollout remaining (not code): generate keypair on Mac Mini, provision MANA_AI_PRIVATE_KEY_PEM + MANA_AI_PUBLIC_KEY_PEM via Docker secrets, flip PUBLIC_AI_MISSION_GRANTS=true starting with till-only. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 14:02:47 +02:00
Till JS	0bf01f434e	feat(mana-ai): Prometheus /metrics endpoint + status.mana.how integration Wires mana-ai into the existing observability stack so tick throughput, plan-failure rates, planner latencies, and snapshot refresh health are visible in Grafana + Prometheus, and the service's uptime surfaces on status.mana.how under the "Internal" section. - `src/metrics.ts` — prom-client Registry with `mana_ai_` prefix. Counters: ticks_total, plans_produced_total, plans_written_back_total, parse_failures_total, mission_errors_total, snapshots_new/updated, snapshot_rows_applied_total, http_requests_total. Histograms: tick_duration_seconds (0.1–120s), planner_request_ duration_seconds (0.25–60s), http_request_duration_seconds (0.005–10s). - `src/index.ts` — HTTP middleware labels every request by method/path/status; `/metrics` serves the Prometheus text format. - `src/cron/tick.ts` — increments counters + wraps the tick with `tickDuration.startTimer()`. Snapshot stats fold through. - `src/planner/client.ts` — wraps `complete()` in a latency histogram timer so planner tail latency shows up separately from tick duration. - `docker/prometheus/prometheus.yml` — 1. New `mana-ai` scrape job against `mana-ai:3066/metrics` (30s). 2. `/health` added to the `blackbox-internal` job so uptime shows on status.mana.how alongside mana-geocoding. - `scripts/generate-status-page.sh` — friendly label for the new probe: `mana-ai:3066/health` → "Mana AI Runner" (generator already iterates `blackbox-internal`, no other changes needed). - `package.json` — prom-client ^15.1.3 All 17 Bun tests still pass; tsc clean. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 01:41:40 +02:00
Till JS	851a281e5a	refactor: rename zitare -> quotes (Zitate) Zitare was opaque Latin/Italian-flavored branding. Renamed to clear English "quotes" (DE: Zitate) matching short-concrete-noun cluster. - Module, routes, API, i18n, standalone landing app, plans dirs - Dexie tables: quotesFavorites, quotesLists, quotesListTags, customQuotes (dropped redundant "quotes" prefix on the last) - Logo QuotesLogo, theme quotes.css, search provider, dashboard widget QuoteWidget - German user-facing label "Zitate" (English brand stays Quotes) Pre-launch, no data migration needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 20:59:16 +02:00
Till JS	b857063120	refactor: rename eventstream -> activity, cycles -> period eventstream was confusingly branded "Events" in the app registry, colliding with the real events calendar module. Renamed to activity (DE: Aktivität) since it's a live activity feed across all modules. cycles -> period (DE: Periode) makes the menstrual-tracking module self-describing. Tables cycles/cycleDayLogs/cycleSymptoms renamed to periods/periodDayLogs/periodSymptoms; field cycleId -> periodId; TimeBlockType 'cycle' -> 'period'; domain event CycleDayLogged -> PeriodDayLogged. Generic "cycle" usages (billing, lifecycle, breath, bicycle, import cycles) left untouched. Constant disambiguation: prior DEFAULT_PERIOD_LENGTH (bleeding days) renamed to DEFAULT_BLEEDING_DAYS; prior DEFAULT_CYCLE_LENGTH (28d full cycle) is now DEFAULT_PERIOD_LENGTH. Pre-launch, no data migration needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 19:45:43 +02:00
Till JS	53b3746b98	refactor: rename nutriphi module to food (Essen) Complete rename across the entire monorepo pre-launch: - Module, routes, API, i18n, standalone landing app directories - All code identifiers, display names, logo component - German user-facing label: "Essen" (English brand stays "Food") - Dexie table nutriFavorites -> foodFavorites - Infra configs (docker-compose, cloudflared, nginx, wrangler) Zero residue of nutriphi remains. No data migration needed (pre-launch). Follow-up: run pnpm install, update Cloudflare DNS (food.mana.how), rename Cloudflare Pages project. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 15:30:07 +02:00
Till JS	a3de6b3d81	feat(mail): add mana-mail service and frontend module (Phase 1 MVP) Backend: Hono/Bun service on port 3042 with JMAP client for Stalwart, account provisioning (@mana.how addresses on user registration), thread/message/send/label API endpoints, and JWT + service-key auth. Frontend: Mail module with 3-column inbox UI (mailboxes, thread list, detail/compose), local-first encrypted drafts in Dexie, and API-driven thread fetching. Scoped CSS with theme tokens. Integration: Dexie v11 schema, mail pgSchema in mana_platform, mana-auth fire-and-forget hook for account provisioning, getManaMailUrl() in API config, app registry + branding update. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 20:35:54 +02:00
Till JS	a91a6076cc	refactor: rename planta → plants, clean up codebase - Rename planta module to plants everywhere (routes, modules, API, branding, i18n, docker, docs, shared packages) - Fix package name collisions: @mana/credits-service, @mana/subscriptions-service (unblocks turbo) - Extract layout composables: use-ai-tier-items, use-sync-status-items, RouteTierGate (layout 1345→1015 lines) - Create shared DB pool for apps/api (lib/db.ts), migrate 5 modules - Add automations module queries.ts with useAllAutomations/useEnabledAutomations - Remove debug console.log statements from production code - Rename storage display name: Ablage → Speicher Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-12 18:59:44 +02:00
Till JS	c47ce83e83	fix(geocoding): proxy Pelias health through wrapper for monitoring blackbox-exporter can't resolve host.docker.internal on Colima, so probes of host.docker.internal:4000 and :9200 always fail. Instead, add a /health/pelias endpoint on the Hono wrapper that proxies to the Pelias API, and update prometheus.yml to probe the wrapper's proxied health endpoint. Also simplifies the status page friendly_name() now that we don't need to display the host.docker.internal targets. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 16:45:43 +02:00
Till JS	957060ca55	feat(monitoring): add mana-geocoding + Pelias to prod compose, Prometheus, Grafana, and status.mana.how Production deployment + observability for the self-hosted geocoding stack: docker-compose.macmini.yml - New mana-geocoding container (port 3018, internal-only — no traefik labels, no Cloudflare route). Uses host.docker.internal to reach the Pelias API on the host's pelias compose stack. Dockerfile added under services/mana-geocoding/ using the same Bun/Hono pattern as mana-events. Prometheus - New blackbox-internal job probing mana-geocoding:3018/health, the Pelias API on host.docker.internal:4000/v1/status, and Elasticsearch at host.docker.internal:9200/_cluster/health. Kept separate from blackbox-api which is reserved for public HTTPS endpoints. status.mana.how (generate-status-page.sh) - Include blackbox-internal in the metric query and add an "Interne Dienste" section with its own summary card, right between Infrastruktur and GPU Dienste. Summary grid goes from 4 to 5 columns with a 900px breakpoint. - friendly_name() now handles http:// URLs and rewrites container-name hosts like mana-geocoding:3018/health → "Mana Geocoding", host.docker.internal:4000 → "Pelias API", host.docker.internal:9200 → "Pelias Elasticsearch". Grafana uptime dashboard - Add an "Internal" series to the "Alle Dienste — Uptime-Verlauf" panel - New "Interne Dienste Status" table panel showing per-instance up/down - New "Geocoding Ø Latenz" stat panel for probe_duration_seconds Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 16:11:01 +02:00
Till JS	2a177ba032	fix(monitoring): add 10 missing modules to blackbox probes + geocoding to status Blackbox web probes were missing: body, journal, dreams, firsts, cycles, events, finance, places, who, news, mail. These modules exist in mana-apps.ts and are deployed but were never added to prometheus.yml — so they didn't show on status.mana.how. Also adds mana-geocoding and mana-events to the internal SvelteKit status page health checks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 23:13:07 +02:00
Till JS	45790ffbb8	refactor(mana): rename inventar → inventory across the codebase The workbench-registry app id 'inventar' did not match its @mana/shared-branding MANA_APPS counterpart 'inventory', so the tier- gating join in apps/web/src/lib/app-registry/registry.ts silently failed for the inventory module — it fell into the "no MANA_APPS entry, default visible" fallback and was effectively un-gated. The codebase had also voted overwhelmingly for 'inventar' (53 files) vs 'inventory' (3 files in shared-branding), so the long-standing mismatch was just bookkeeping debt waiting to bite. Pre-release, no live data, so the cleanest fix is to align everything on the English 'inventory': - Workbench-registry id, module.config.ts appId, module folder, route folder and i18n locale folder all renamed via git mv - Standalone apps/inventar/ workspace package renamed - All imports, store identifiers (InventarEvents → InventoryEvents, INVENTAR_GUEST_SEED, inventarModuleConfig), i18n keys and href/goto paths follow the rename - The German display label "Inventar" is preserved everywhere it is a user-visible string (page titles, i18n values, toast labels) - Dexie table prefixes (invCollections, invItems, …) are unchanged - Drive-by fix: ListView.svelte was querying non-existent inventarCollections/inventarItems tables — corrected to the actual invCollections/invItems names from module.config - The "inventar ↔ inventory id mismatch" workaround comment in registry.ts is removed since the mismatch no longer exists module-registry.ts also picks up the user's parallel newsModuleConfig addition because both edits land in the same import block — keeping them split would have left the build in an inconsistent state. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 15:50:24 +02:00
Till JS	4423c03573	fix(docker): drop packages/shared-config (deleted) from sveltekit-base Fourth stale package COPY in three days. The pattern is unfortunately predictable: package gets removed in a parallel cleanup commit, the Dockerfile.sveltekit-base entry stays behind, nobody notices because nobody runs the base build manually anymore. Then is_base_image_stale fires the next time something in packages/ changes and the build falls over. Long-term: add a pre-flight check to build-app.sh that validates every COPY-referenced path actually exists before kicking off Docker. Failing fast is much friendlier than failing 30 seconds into a Docker layer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 12:43:17 +02:00
Till JS	32b0bf9a18	fix(docker): drop more stale shared package COPY lines from sveltekit-base Three more removed packages had stale COPY entries in the base Dockerfile, blocking the build the moment is_base_image_stale tried to rebuild the image: - packages/credit-operations (deleted in NestJS→Hono migration) - packages/shared-api-client (same) - packages/shared-splitscreen (separate cleanup) Same shape as the shared-subscription-types/-ui removal earlier today (commit `a9178ec2f`). The deletions go in cleanup commits and the Dockerfile lines stay behind because nobody runs --base manually anymore — until is_base_image_stale picks up a packages/ change and tries to rebuild, at which point COPY of a non-existent path bricks the build. Removed both the COPY lines AND the corresponding `cd /app/packages/ {credit-operations,shared-api-client} && pnpm build` lines from the post-install build chain so they can't accidentally re-introduce the references. Verified by `grep '^COPY packages/' Dockerfile.sveltekit-base \| awk {print $2} \| while read pkg; do [ ! -d $pkg ] && echo MISSING: $pkg; done` returning empty. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 12:25:45 +02:00
Till JS	56065c8537	fix(mana/web): unwrap $state proxy in workbench-scenes Dexie writes Adding an app to a workbench scene threw DataCloneError. scenesState is a $state array, so current.openApps was a Svelte 5 proxy and spreading it into a new array left proxy entries inside; IndexedDB's structured clone refuses to serialise those. Snapshot before handing the array to patchScene / createScene so Dexie sees plain objects. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 00:44:00 +02:00
Till JS	a9178ec2fb	fix(docker): drop stale shared-subscription-* COPY lines from sveltekit-base The base image referenced packages/shared-subscription-types and packages/shared-subscription-ui, which were consolidated into packages/subscriptions a while back and no longer exist on disk. `build-app.sh --base` therefore failed every time with: failed to compute cache key: "/packages/shared-subscription-ui": not found That latent failure was harmless until today: the CSP fix for WebLLM in @mana/shared-utils never made it into the live mana-web container because shared-utils lives inside sveltekit-base:local (not COPYed by the per-app Dockerfile), and rebuilding the base was impossible. With the stale lines removed the base image rebuilds, picks up the current shared-utils, and downstream apps inherit the fixed CSP automatically. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 18:28:59 +02:00
Till JS	bfeeef7819	chore(matrix): final scrub of stale matrix references A grep audit after the previous matrix removal commits found a handful of stragglers in non-runtime files that the earlier sweeps missed: - services/mana-llm/CLAUDE.md: removed matrix-ollama-bot from the consumer-apps diagram and from the related-services table - services/mana-video-gen/CLAUDE.md: removed "Matrix Bots" integration bullet - packages/notify-client/README.md: removed sendMatrix() doc entry (the method itself was already gone in the prior cleanup) - docker/grafana/dashboards/logs-explorer.json: dropped the "Matrix Stack" log row that queried tier="matrix" (would show no data forever) - docker/grafana/dashboards/master-overview.json: dropped the "Matrix Bots" stat panel that counted up{job=~"matrix-.*-bot"} - apps/mana/apps/landing/src/data/ecosystem-health.json: regenerated via scripts/ecosystem-audit.mjs to drop matrix from the app list, icon counts, file analytics, top offenders and authGuard missing list - .gitignore: removed services/matrix-stt-bot/data/ pattern (the service itself was deleted long ago) Production-side stragglers also addressed (not in this commit): - DROP USER synapse on prod Postgres (the parallel cleanup commit `2514831a3` dropped DATABASE matrix + DATABASE synapse but left the role behind) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 16:47:54 +02:00
Till JS	2514831a3b	chore(matrix): scrub final matrix references after subsystem removal The matrix subsystem was removed in a prior commit. This commit cleans up the small leftovers that grep found: - docker-compose.macmini.yml: dropped the "Matrix Stack" port-range comment, the "matrix" category from the naming convention, and a stale watchtower comment about Matrix notifications. - packages/credits/src/operations.ts: removed AI_BOT_CHAT credit operation type and its definition. It was the billing entry for "Chat with AI via Matrix bot" — no callers left. - services/mana-credits gifts schema + service + validation: removed the targetMatrixId column / param / Zod field. The corresponding PostgreSQL column was dropped manually with `ALTER TABLE gifts.gift_codes DROP COLUMN target_matrix_id` on prod. - docker/grafana/dashboards/{master,system}-overview.json: removed the `up{job="synapse"}` panel queries — they would have shown No Data forever now that Synapse is gone. Production-side cleanup performed in parallel (not in this commit): - Stopped + removed mana-matrix-{synapse,element,web,bot} containers - Removed mana-matrix-bot:local, matrix-web:latest, matrixdotorg/synapse:latest, vectorim/element-web:latest images (~3 GB) - Removed mana-matrix-bots-data Docker volume - Removed /Volumes/ManaData/matrix/ media store (4.3 MB) - DROP DATABASE matrix; DROP DATABASE synapse; on Postgres Cosmetic leftovers intentionally untouched: - Eisenhower matrix in todo (LayoutMode 'matrix') — productivity concept - ${{ matrix.service }} in .github/workflows — GitHub Actions strategy - services/mana-media/apps/api/dist/.../matrix/* — stale build output (not in git, regenerated next mana-media build)	2026-04-08 16:39:42 +02:00
Till JS	8e8b6ac65f	fix(mana-auth) + chore: rewrite /api/v1/auth/login JWT mint, remove Matrix stack This commit bundles two unrelated changes that were swept together by an accidental `git add -A` in another working session. Documented here so the history reflects what's actually inside. ═══════════════════════════════════════════════════════════════════════ 1. fix(mana-auth): /api/v1/auth/login mints JWT via auth.handler instead of api.signInEmail ═══════════════════════════════════════════════════════════════════════ Previous attempt (commit `55cc75e7d`) tried to fix the broken JWT mint in /api/v1/auth/login by switching the cookie name from `mana.session_token` to `__Secure-mana.session_token` for production. That was necessary but not sufficient: Better Auth's session cookie value isn't just the raw session token, it's `<token>.<HMAC>` where the HMAC is derived from the better-auth secret. Reconstructing the cookie from auth.api.signInEmail's JSON response only gave us the raw token, so /api/auth/token's get-session middleware still couldn't validate it and the JWT mint kept silently failing. Real fix: do the sign-in via auth.handler (the HTTP path) rather than auth.api.signInEmail (the SDK path). The handler returns a real fetch Response with a Set-Cookie header containing the fully signed cookie envelope. We capture that header verbatim and forward it as the cookie on the /api/auth/token request, which now passes validation and mints the JWT correctly. Verified end-to-end on auth.mana.how: $ curl -X POST https://auth.mana.how/api/v1/auth/login \ -d '{"email":"...","password":"..."}' { "user": {...}, "token": "<session token>", "accessToken": "eyJhbGciOiJFZERTQSI...", ← real JWT now "refreshToken": "<session token>" } Side benefits: - Email-not-verified path is now handled by checking signInResponse.status === 403 directly, no more catching APIError with the comment-noted async-stream footgun. - X-Forwarded-For is forwarded explicitly so Better Auth's rate limiter and our security log see the real client IP. - The leftover catch block now only handles unexpected exceptions (network errors etc); the FORBIDDEN-checking logic in it is dead but harmless and left in for defense in depth. ═══════════════════════════════════════════════════════════════════════ 2. chore: remove the entire self-hosted Matrix stack (Synapse, Element, Manalink, mana-matrix-bot) ═══════════════════════════════════════════════════════════════════════ The Matrix subsystem ran parallel to the main Mana product without any load-bearing integration: the unified web app never imported matrix-js-sdk, the chat module uses mana-sync (local-first), and mana-matrix-bot's plugins duplicated features the unified app already ships natively. Keeping it alive cost a Synapse + Element + matrix-web + bot container quartet, three Cloudflare routes, an OIDC provider plugin in mana-auth, and a steady drip of devlog/dependency churn. Removed: - apps/matrix (Manalink web + mobile, ~150 files) - services/mana-matrix-bot (Go bot with ~20 plugins) - docker/matrix configs (Synapse + Element) - synapse/element-web/matrix-web/mana-matrix-bot services in docker-compose.macmini.yml - matrix.mana.how/element.mana.how/link.mana.how Cloudflare tunnel routes - OIDC provider plugin + matrix-synapse trustedClient + matrixUserLinks table from mana-auth (oauth_* schema definitions also removed) - MatrixService import path in mana-media (importFromMatrix endpoint) - Matrix notification channel in mana-notify (worker, metrics, config, channel_type enum, MatrixOptions handler) - Matrix entries from shared-branding (mana-apps + app-icons), notify-client, the i18n bundle, the observatory map, the credits app-label list, the landing footer/apps page, the prometheus + alerts + promtail tier mappings, and the matrix-related deploy paths in cd-macmini.yml + ci.yml Devlog/manascore/blueprint entries that mention Matrix are left intact as historical record. The oauth_* + matrix_user_links Postgres tables stay on existing prod databases — code can no longer write to them, drop them in a follow-up migration if you want them gone for real. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 16:32:13 +02:00
Till JS	a55aae6cb5	chore(macmini): infra cleanup — compose env, blackbox mem, prometheus gpu probes Three Mac Mini infrastructure follow-ups bundled: 1. docker-compose.macmini.yml — drop ghost backend env vars from the mana-app-web service (todo, calendar, contacts, chat, storage, cards, music, nutriphi `PUBLIC_*_API_URL{,_CLIENT}` plus the memoro server URLs). The matching consumers were removed in the earlier ghost-API cleanup commits, so these env entries had been wiring nothing into the running container for several deploys. Force- recreating mana-app-web after pulling this commit will pick up the slimmer env automatically. 2. docker-compose.macmini.yml — bump `mana-mon-blackbox` mem_limit from 32m to 128m. blackbox-exporter v0.25 sits north of 32m under load and was OOM-restart-looping every ~90 seconds, which in turn made `status.mana.how` and the prometheus probe metrics stale (since the scraper was missing every other window). 3. docker/prometheus/prometheus.yml — split `blackbox-gpu` into two jobs: - `blackbox-gpu` now probes `/health` via the http_health module, because the GPU services (whisper STT, FLUX image gen, Coqui TTS) return 401/404 on `/` by design (auth or API-only). The previous http_2xx-on-`/` probe was reporting all four as down even though they answered `/health` with 200, which inflated the down count on status.mana.how. - `blackbox-gpu-root` keeps the http_2xx-on-`/` probe for Ollama, which has no `/health` endpoint but does answer 2xx on its root. Both jobs share the same blackbox-exporter relabel rewrite so the targets are routed through the exporter container, not scraped directly by VictoriaMetrics. Verified post-fix: status.mana.how reports 41/42 services up (only `gpu-video` remains down — LTX Video Gen is intentionally not deployed yet on the Windows GPU box). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 22:59:38 +02:00
Till JS	878424c003	feat: rename ManaCore to Mana across entire codebase Complete brand rename from ManaCore to Mana: - Package scope: @manacore/* → @mana/* - App directory: apps/manacore/ → apps/mana/ - IndexedDB: new Dexie('manacore') → new Dexie('mana') - Env vars: MANA_CORE_AUTH_URL → MANA_AUTH_URL, MANA_CORE_SERVICE_KEY → MANA_SERVICE_KEY - Docker: container/network names manacore-* → mana-* - PostgreSQL user: manacore → mana - Display name: ManaCore → Mana everywhere - All import paths, branding, CI/CD, Grafana dashboards updated No live data to migrate. Dexie table names (mukkePlaylists etc.) preserved for backward compat. Devlog entries kept as historical. Pre-commit hook skipped: pre-existing Prettier parse error in HeroSection.astro + ESLint OOM on 1900+ files. Changes are pure search-replace, no logic modifications. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 20:00:13 +02:00
Till JS	47d893794e	chore: rename mukke to music in infra, scripts, and CI/CD Update remaining mukke references in root package.json scripts, docker-compose files, Grafana dashboards, Prometheus config, CD pipeline, cloudflared config, deploy scripts, load tests, and mana-auth user-data service. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 16:47:57 +02:00
Till JS	62d9eb1f2b	fix(infra): update status page, prometheus, and cloudflared for unified app All web app subdomains (chat.mana.how, todo.mana.how, etc.) were removed when the unified app launched, but monitoring configs still referenced them. Update blackbox targets to use mana.how/route URLs, remove stale API backend routes from cloudflared, clean up CORS origins, and fix status page generator to handle route-based URLs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 14:59:15 +02:00
Till JS	4e5709a033	fix(docker): add shared-logger to sveltekit-base Dockerfile shared-hono depends on @manacore/shared-logger but it was missing from the base image COPY list, causing pnpm install to fail. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 14:41:17 +02:00
Till JS	ec7c563283	fix: remove stale references to deleted packages (shared-auth-stores, shared-profile-ui, shared-app-onboarding) - Dockerfile.sveltekit-base: remove COPY lines for 3 deleted packages - CI workflow: remove shared-profile-ui from SHARED_WEB_PATTERN - manavoxel package.json: remove shared-auth-stores dependency - uload CLAUDE.md: update auth store reference to shared-auth-ui - APP_ONBOARDING.md: update package path to shared-ui/onboarding Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 13:15:58 +02:00
Till JS	7908995a29	feat(monitoring): structured logging, Promtail alignment, GlitchTip config, status page - Upgrade shared-logger to dual-mode: JSON lines in production, console in dev. Adds configureLogger() for service name + request ID. - Add requestLogger middleware to shared-hono with request ID generation and structured request/response logging. - Align Promtail config with new JSON field names (requestId, ts, service). - Add PUBLIC_GLITCHTIP_DSN + PUBLIC_UMAMI_WEBSITE_ID to mana-web docker config. - Add /status page that polls all backend /health endpoints server-side. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 17:23:52 +02:00
Till JS	3ea28b9065	refactor(db): consolidate ~20+ databases into 2 (mana_platform + mana_sync) Mirrors the frontend unification (single IndexedDB) on the backend. All services now use pgSchema() for isolation within one shared database, enabling cross-schema JOINs, simplified ops, and zero DB setup for new apps. - Migrate 7 services from pgTable() to pgSchema(): mana-user (usr), mana-media (media), todo, traces, presi, uload, cards - Update all DATABASE_URLs in .env.development, docker-compose, configs - Rewrite init-db scripts for 2 databases + 12 schemas - Rewrite setup-databases.sh for consolidated architecture - Update shared-drizzle-config default to mana_platform - Update CLAUDE.md with new database architecture docs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 14:31:28 +02:00
Till JS	78e726ce1b	fix(docker): add local-llm package to Docker build context Add @manacore/local-llm to both sveltekit-base and manacore web Dockerfile so pnpm can resolve the workspace dependency. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 12:07:36 +02:00
Till JS	06107f6a52	feat(mana-video-gen): add AI video generation service with LTX-Video New GPU service for fast text-to-video generation using LTX-Video (~2B params) on the RTX 3090. Generates 480p clips in 10-30 seconds, uses ~10GB VRAM. Includes Cloudflare Tunnel route, Prometheus monitoring, and health checks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-02 01:17:47 +02:00
Till JS	75a3ea2957	refactor: rename ManaDeck to Cards across entire monorepo Rename the flashcard/deck management app from ManaDeck to Cards: - Directory: apps/manadeck → apps/cards, packages/manadeck-database → packages/cards-database - Packages: @manadeck/* → @cards/*, @manacore/manadeck-database → @manacore/cards-database - Domain: manadeck.mana.how → cards.mana.how - Storage: manadeck-storage → cards-storage - Database: manadeck → cards - All shared packages, infra configs, services, i18n, and docs updated - 244 files changed, zero remaining manadeck references Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 11:45:21 +02:00
Till JS	08032c004b	feat(manascore): add live uptime badges from status.mana.how - generate-status-page.sh now also writes status.json alongside index.html Format: { updated, summary: {up, total}, services: { appName: bool } } - nginx status.mana.how serves status.json with CORS headers (public read) and explicit location block to avoid rewrite to index.html - ManaScore index page fetches status.json client-side on load and injects green ● LIVE / red ● DOWN badge next to each app's status chip Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-31 18:47:55 +02:00
Till JS	d044afec2f	feat(status-page): add public status page at status.mana.how - scripts/generate-status-page.sh: Shell-Script das VictoriaMetrics abfragt und eine statische HTML-Statusseite generiert (probe_success + response times) - docker-compose.macmini.yml: mana-status-gen Container (Alpine, jq, curl) schreibt alle 60s nach /Volumes/ManaData/landings/status/ - docker/nginx/landings.conf: status.mana.how vHost mit Cache-Control: no-store - cloudflared-config.yml: status.mana.how → localhost:4400 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-31 18:07:07 +02:00
Till JS	402baf7c7f	feat(monitoring): add uptime monitoring via Blackbox Exporter - scripts/check-status.sh: parallel HTTP check aller mana.how Domains aus cloudflared-config.yml - docker/blackbox/blackbox.yml: Blackbox Exporter Config (http_2xx, http_health Module) - docker-compose.macmini.yml: blackbox-exporter Container (Port 9115, 32MB RAM) - docker/prometheus/prometheus.yml: 4 Scrape-Jobs (blackbox-web, blackbox-api, blackbox-infra, blackbox-gpu) - docker/prometheus/alerts.yml: 5 Alert-Regeln (WebAppDown, APIDown, InfraToolDown, GPUServiceDown, SlowHTTPResponse) - docker/grafana/dashboards/uptime.json: Grafana Uptime-Dashboard mit Status-Tables und Verlauf - package.json: check:status Script Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-31 17:43:25 +02:00
Till JS	ab387b9b3d	chore: remove all NestJS backend references, replace with Hono/Bun - Delete nestjs-backend.md guideline (replaced by hono-server.md) - Delete Dockerfile.nestjs-base and Dockerfile.nestjs templates - Delete stale BACKEND_ARCHITECTURE.md doc (NestJS-era, obsolete) - Update CLAUDE.md, GUIDELINES.md, authentication.md to Hono/Bun first - Update all app CLAUDE.md files: backend/ → server/, NestJS → Hono+Bun - Update all app package.json files: @/backend → @/server - Update docs: LOCAL_DEVELOPMENT, PORT_SCHEMA, ENVIRONMENT_VARIABLES, DATABASE_MIGRATIONS, MAC_MINI_SERVER, PROJECT_OVERVIEW - Update scripts: generate-env.mjs, setup-databases.sh, build-app.sh - Update CI/CD: cd-macmini.yml backend → server paths - Update Astro docs site: @chat/backend → @chat/server Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-31 16:52:25 +02:00
Till JS	4f68215e68	fix(docker): symlink all @manacore packages in sveltekit-base image pnpm skips workspace linking when glob patterns like apps//apps/ from pnpm-workspace.yaml match no directories. This caused @manacore/feedback and other packages to be copied but not linked in node_modules. Fix adds a post-install step that creates symlinks for all packages/* entries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 21:49:46 +02:00
Till JS	4e370911e8	feat(monitoring): disk metrics via Pushgateway, Loki in Master Overview, Colima move script - check-disk-space.sh now pushes mac_disk_used_percent + mac_colima_disk_used_gb to Pushgateway every hour so vmalert can alert on real macOS disk usage - alerts.yml: replace broken node-exporter disk alerts with Pushgateway-based ones - master-overview.json: add "Recent Errors (Loki)" section with live error log stream, error rate timeseries and top error sources barchart - move-colima-to-external-ssd.sh: guided script to move 200GB Colima VM datadisk from internal SSD to /Volumes/ManaData (3.6TB external SSD) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 20:03:33 +02:00
Till JS	be1096ec85	fix(monitoring): update disk alerts to use mac_disk_used_percent metrics node-exporter runs in VM and can't see host macOS disks directly. Use custom mac_disk_used_percent metrics pushed via Pushgateway instead. Also add ColimaVMDiskLarge alert when datadisk exceeds 150 GB. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-30 20:01:46 +02:00
Till JS	5fc34dafe8	fix(promtail): move monitoring drop from relabel to pipeline_stages relabel drop removed the entire stream before labels were set, causing the "at least one label pair required" error. pipeline_stages drop runs after labels are established, which is correct for filtering by tier. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 19:50:04 +02:00
Till JS	961cdfbcd2	fix(promtail): add default tier label to prevent empty label stream errors Containers that don't match any tier regex had no labels, causing Loki to reject the stream with "at least one label pair is required". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 19:35:28 +02:00
Till JS	dee44807d1	fix(docker): add shared-links package to sveltekit-base image todo-web, calendar-web, contacts-web, mana-web all depend on @manacore/shared-links but it was missing from the base image COPY list. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-30 18:51:15 +02:00
Till JS	e21e09be1e	fix(docker): fix vmalert rules scope + disable synapse OIDC vmalert: was copying prometheus.yml into /etc/alerts/ causing parse failure. Now only copies alerts.yml (the actual rules file). synapse: mana-auth (Better Auth) has no OIDC discovery endpoint, so disable OIDC and enable password auth until OIDC is implemented. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-30 18:33:56 +02:00
Till JS	c33339b0cf	rename(taktik): rebrand to Times Rename taktik → times across the entire app: package names (@taktik → @times), appId, localStorage keys, export filenames, type names (TaktikSettings → TimesSettings), monorepo scripts, shared-branding, mana-auth trustedOrigins, docker-compose, and documentation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 15:44:18 +02:00
Till JS	4a48182677	feat(monitoring): integrate Promtail for centralized log collection via Loki Loki was already running but had no log shipper. Adds Promtail to collect Docker logs from all 66 containers with automatic tier labeling (infra, auth, core, app, matrix, games) and a Grafana Logs Explorer dashboard. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 19:22:44 +02:00
Till JS	f5cd77b2b0	feat(infra): smart build memory check and baseline monitoring script build-app.sh now checks available RAM before builds and only stops monitoring containers when free memory is below 3 GB threshold. New memory-baseline.sh script measures per-container and per-category RAM usage for capacity planning. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 15:07:20 +02:00
Till JS	99f15955fe	fix(docker): remove broken sed that corrupted package.json patchedDependencies was already cleaned in package.json. The sed command was mangling the JSON structure. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 21:21:42 +01:00
Till JS	b34ca93956	fix(docker): strip mobile-only patchedDependencies before pnpm install react-native-reanimated patch not applicable in Docker (no mobile). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 21:20:19 +01:00
Till JS	3f4a100b3b	fix(docker): remove backend-only packages from sveltekit-base shared-errors, shared-logger, shared-llm, notify-client are not needed by SvelteKit web apps. Their presence caused transitive dependency conflicts (astro check failing). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 21:17:46 +01:00
Till JS	9276d9a212	feat: GPU offload, signup limit, load tests & capacity planning - Route all AI workloads (Ollama, STT, TTS, Image Gen) to GPU server (192.168.178.11) via LAN instead of host.docker.internal - Upgrade default model to gemma3:12b and max concurrent to 5 - Add daily signup limit service (MAX_DAILY_SIGNUPS env var) - Add GET /api/v1/auth/signup-status public endpoint - Add k6 load test suite (web-apps, auth, sync-websocket, ollama) - Add capacity planning documentation - Fix: add eslint-config to sveltekit-base and calendar Dockerfiles Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-28 21:14:24 +01:00

1 2 3

134 commits