/(app)/invoices/[id] route bundle drops from **534 KB → 18.6 KB** by
moving PDF rendering behind dynamic imports.
Changes:
- views/DetailView.svelte: `await import('../pdf/renderer')` inside
renderPdf() + downloadPdf(), cached in a module-local ref.
- components/SendModal.svelte: same for openAndDownload().
- pdf/scor.ts (new): generateSCORReference extracted so the
invoices store can derive a reference string without pulling
swissqrbill/svg + pdf-lib into the list-view bundle.
- pdf/qr-bill.ts: re-exports generateSCORReference from scor.ts
for backward compatibility.
- stores/invoices.svelte.ts: imports from ../pdf/scor (light) instead
of ../pdf/qr-bill (heavy).
- index.ts: drop re-export of the PDF renderer from the module
barrel so `import ... from '$lib/modules/invoices'` never drags
pdf-lib in.
The heavy chunk (pdf-lib + swissqrbill, ~576 KB) now only loads when
a user actually opens an invoice detail — list views, create flow, and
all other routes stay lean.
20/20 qr-bill tests pass; svelte-check clean.
Bonus: scripts/audit-icon-usage.mjs (+ pnpm run audit:icon-usage)
audits @mana/shared-icons imports. Reveals 204 distinct icons across
the codebase, 199 of them at default weight but paying for all 6
Phosphor weights. Biggest offender: app-registry/apps.ts with 69
static icon imports accounting for ~290 KB of the shared 466 KB icon
chunk. Migration path for that is documented in
docs/optimizable/bundle-analysis.md §2 — next session's work.
docs/optimizable/bundle-analysis.md is also updated with investigation notes
on the root (app) layout (260 KB): start/stop lifecycle hooks to defer via
idleCallback.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#6 test coverage (pivot to reporting): 34/653 tests currently fail
(in-flight spaces-foundation migrations). Hard coverage thresholds
aren't enforceable until the suite is green, so this session ships a
file-presence audit instead of line-coverage gates.
- scripts/audit-test-coverage.mjs — counts .svelte + .ts source files
vs .test.ts + .spec.ts per module. Reports total ratio, lists
modules with 0 tests + ≥3 source files (prioritised by size).
- pnpm run audit:test-coverage wires it into audit:*.
- docs/optimizable/test-health.md — state + prevention path + top
untested modules ranked by impact.
Current baseline: 2.6% file-level coverage. 66/78 modules have zero
tests. Biggest untested: times (32 src), articles (29), events (27),
inventory + skilltree (20 each).
#8 audit:all: single entry point for the reporting audits. Runs
port-drift + i18n-coverage + test-coverage in --summary mode. Distinct
from validate:all (which is gates, not reports).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each services/*/CLAUDE.md declares `## Port: NNNN` — the authoritative
per-service port spec (docs/PORT_SCHEMA.md is explicitly partially
aspirational). This audit verifies:
1. Declared port appears as a literal in the service's own source
(catches: moved port in code but forgot to update CLAUDE.md).
2. No two services claim the same port (catches: accidental
collision when scaffolding new services).
Current state: ✓ 15 services, all declared ports found in code, zero
collisions (mana-auth/geocoding/stt/tts/image-gen/voice-bot/mail/
credits/user/subscriptions/analytics/events/news-ingester/ai/research).
Report-only; not a CI gate. Run with `pnpm run audit:port-drift`.
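In outline, the check looks roughly like this (a sketch: collectSourceText()
is a hypothetical stand-in for reading the service's tracked sources):

  import { existsSync, readFileSync, readdirSync } from 'node:fs';

  const declared = new Map<string, number>(); // service name → declared port

  for (const service of readdirSync('services')) {
    const mdPath = `services/${service}/CLAUDE.md`;
    if (!existsSync(mdPath)) continue;
    const match = readFileSync(mdPath, 'utf8').match(/^## Port:\s*(\d+)/m);
    if (match) declared.set(service, Number(match[1]));
  }

  const claims = new Map<number, string>();
  for (const [service, port] of declared) {
    // 1. Declared port must appear as a literal in the service's own source.
    const source = collectSourceText(`services/${service}`); // hypothetical helper
    if (!source.includes(String(port))) console.error(`${service}: port ${port} not found in code`);
    // 2. No two services may claim the same port.
    const other = claims.get(port);
    if (other) console.error(`${service} and ${other} both declare port ${port}`);
    claims.set(port, service);
  }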
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PlayView used Tailwind palette classes for game-status feedback:
- bg-emerald-500/10 + text-emerald-300 (won) → bg-success/10 + text-success
- bg-amber-500/10 + text-amber-300 (lost) → bg-warning/10 + text-warning
- border-red-500/20 + bg-red-500/10 + text-red-300 (error) → border-error/20 + bg-error/10 + text-error
- placeholder-white/30 focus:border-purple-400/50 → placeholder:text-muted-foreground/60 focus:border-primary/50
Semantic status now tracks the theme (errors are red in dark, darker red
in light, etc.) instead of being fixed hex ramps.
The `bg-purple-500` / `bg-purple-500/30` / `hover:bg-purple-600` classes
on the user's chat bubble and submit buttons STAY — purple is the who
module's primary identity colour (historical-deck accent `#a855f7` is
semantically the same hue). Documented in brand-literals.md §who.
Also harden two validators against mid-rename states where git ls-files
returns paths that aren't on disk yet — both now skip unreadable files
instead of crashing the pre-commit hook (caught while migrating who).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Translation infrastructure (@mana/shared-i18n + svelte-i18n + 35
per-module locale files with ~3500 lines across de/en/it/fr/es) is fully
wired, but 65/78 modules still hardcode German in .svelte templates
rather than calling {$_('module.key')}.
Adds:
- scripts/audit-i18n-coverage.mjs — scans lib/modules/**/*.svelte for
hardcoded German keywords (Abbrechen, Speichern, Löschen, etc.) in
files that don't import $_(). Reports per-module hit counts,
bucket (FULL/PARTIAL/NONE), and whether the locale file exists.
Supports --summary and --top N flags.
- pnpm run audit:i18n-coverage wires it into the audit:* family
(reporting only, not a CI gate — existing debt would fail
validate:all otherwise).
- docs/optimizable/i18n-migration-inventory.md — priority list,
per-module workflow, and prevention plan.
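The detection heuristic, roughly (keyword list abbreviated; the real script
also reports buckets and locale-file presence):

  import { readFileSync } from 'node:fs';

  const GERMAN_KEYWORDS = /\b(Abbrechen|Speichern|Löschen|Bearbeiten|Hinzufügen)\b/g;

  function hardcodedGermanHits(sveltePath: string): number {
    const source = readFileSync(sveltePath, 'utf8');
    // Files that already pull in svelte-i18n count as (at least partially) migrated.
    if (/from ['"]svelte-i18n['"]/.test(source)) return 0;
    return [...source.matchAll(GERMAN_KEYWORDS)].length;
  }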
Top offenders: broadcast (26 hits), articles (24), events (23),
invoices (22), quiz (20), stretch (20), library (19), profile (17),
skilltree (15, PARTIAL), calendar (14, PARTIAL). Modules without a
locale file (broadcast/articles/events/invoices/…) need the locale
stubs scaffolded first.
Real string migration is per-site careful work (key naming, 5-language
parity, UI visual QA) and is left for per-module follow-up sessions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The plan-doc commits 129971ffc + 9db044178 dropped the
audit-theme-tokens → validate-theme-variables rename, the
validate-theme-tokens → validate-theme-utilities rename, the new
validate-theme-parity script, brand-literals.md, and the corresponding
package.json + lint-staged.config.js + themes.css wiring. The files
still existed on disk (git mv changes survived) but were untracked.
Restore the validator suite so `pnpm run validate:all` works again:
- validate:theme-variables (CSS var names: --muted → --color-muted)
- validate:theme-utilities (Tailwind: no white/N, no neutral palette)
- validate:theme-parity (every --color-* in :root ⇔ .dark + each
[data-theme="..."])
All three wired into validate:all and lint-staged. `pnpm run validate:all`
is clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sweep 98 `transition-all` occurrences across 62 files and replace with
targeted Tailwind transition utilities. Motivation:
1. `transition-all` animates every property, including CSS custom-
property-backed colours. On first paint the vars may not have
resolved yet, producing the P5 "white-on-white until first
interaction" rendering bug. The same bug hit food/moodlit ListViews
in the earlier theme migration.
2. Specific transitions also perform better — no layout-property
interpolation overhead.
Codemod scripts/migrate-transition-all.mjs classifies each class
attribute by its sibling classes and picks one of:
- `transition-opacity` — icon fade on group-hover
- `transition-[width]` — progress-bar width anim
- `transition-[transform,colors,box-shadow]` — scaled buttons/cards
- `transition-[border-color,box-shadow]` — card hover:border+shadow
- `transition-colors` — default (card/row hover)
91 / 98 auto-classified, 7 hand-migrated:
- EntryItem → transition-[box-shadow] (ring fade)
- NutritionProgressWidget → transition-[stroke-dashoffset,stroke]
- OnboardingModal → transition-[width,background-color]
- times/reports (3×) → transition-[width] / -[height] (bar anims)
- presi/present → transition-[width,background-color] (dots)
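The classification is, roughly, a sibling-class lookup (the patterns below
are a simplified illustration of the heuristic, not the codemod's exact rules):

  function pickTransition(siblings: string[]): string {
    const has = (re: RegExp) => siblings.some((cls) => re.test(cls));
    if (has(/^group-hover:opacity-/)) return 'transition-opacity';
    if (has(/^(hover|active):scale-/)) return 'transition-[transform,colors,box-shadow]';
    if (has(/^hover:border-/) && has(/^hover:shadow-/)) return 'transition-[border-color,box-shadow]';
    // width-animated bars are detected separately and get transition-[width]
    return 'transition-colors';
  }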
svelte-check clean with 0 errors; validate:all green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace raw white-alpha Tailwind utilities (text-white/x, bg-white/x,
border-white/x) with canonical theme tokens (text-foreground, bg-muted,
border-border, etc.) in cards, context, food, moodlit, storage, music
ListViews. Replace hardcoded hex badge/dot/phase colors in ai-missions
with success/warning/error/primary tokens.
Fix two transition-all bugs (food:160, moodlit:223) that prevented CSS
custom property colors from resolving on first paint under theme switches.
Add scripts/validate-theme-tokens.mjs to prevent regression; run via
pnpm run validate:theme-tokens. Not yet in validate:all — 12 modules
still use raw white utilities (citycorners, guides, inventory, memoro,
picture, plants, playground, presi, questions, times, uload, who).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The three final pre-dogfood items:
1. drizzle.config: schemaFilter now includes 'broadcast' alongside
'mail'. Without this, `bun run db:push` skipped the broadcast
tables — schema existed in code but not in Postgres. Tested via
db:push + psql \dt (3 tables created: campaigns, events, sends).
2. .env.development: new MANA-MAIL SERVICE section with Stalwart
knobs + broadcast config (tracking secret, rate limits, send
throttle). DEV secret is explicitly labelled non-production —
prod rotates via env.
3. generate-env.mjs: new block writes services/mana-mail/.env on
`pnpm setup:env`. Mirrors the invoices / research / events
pattern. All 16 broadcast/mail vars flow through from SSOT.
Verified end-to-end:
- pnpm setup:env → services/mana-mail/.env contains
BROADCAST_TRACKING_SECRET + rate limits
- bun run src/index.ts → /health returns 200 with the new config
- psql → broadcast.campaigns / events / sends are materialised
Broadcast module is now fully ready to send real mail — nothing
else required before the first dogfood campaign.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "every Drizzle table uses pgSchema" rule was documented in
.claude/guidelines/database.md (added yesterday as part of Concern 5)
but enforced only by convention. A new service could slip a raw
`pgTable()` past review and collide in the default `public` schema
of `mana_platform`, and nothing would surface the mistake until a
production migration failed.
- `scripts/validate-pg-schema-isolation.mjs` scans every tracked
TypeScript file under services/, apps/api/, packages/ for call sites
of `pgTable(` (not imports — imports can still be useful for types).
Strips comments before matching so doc-examples like "use `pgTable()`"
don't trigger false positives.
- Wired as `pnpm run validate:pg-schema` and a new CI step in the
validate job (right after the turbo-recursion check). 721 files
scan clean today.
- Removed an unused `pgTable` import in mana-subscriptions that would
have been the only import of the symbol remaining after this change.
- Updated .claude/guidelines/database.md — the old verification blurb
said "no automated lint rule yet", now points at the enforcer.
Drift verified: injecting a synthetic `pgTable('bad', {})` into
subscriptions.ts failed with a clear file:line violation pointing at
the database guideline.
Closes the "no automated lint rule" gap noted in the database guideline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CLAUDE.md flagged this as "CRITICAL" — a child package.json defining
e.g. `"build": "turbo run build"` causes a 10+ minute CI hang with
thousands of duplicate task spawns. The rule was documented but never
enforced, so it re-emerged every couple of months as someone copied a
parent script pattern.
- `scripts/validate-no-recursive-turbo.mjs` walks every tracked
package.json (via `git ls-files`, so node_modules is auto-skipped)
and fails if any non-root package has build/type-check/lint/test/
test:coverage/check scripts containing `turbo run`. `dev` stays
allowed — delegating it from a parent is the intended ergonomic.
- Wired as `pnpm run validate:turbo` + a new CI step in the validate
job (before type-check — fails fast).
- CLAUDE.md §Turborepo updated to point at the enforcer and call out
the full task list (test/test:coverage/check were missing from the
original prose).
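In outline (guarded script names as listed above; walking git ls-files output
keeps node_modules out by construction):

  import { execSync } from 'node:child_process';
  import { readFileSync } from 'node:fs';

  const GUARDED = ['build', 'type-check', 'lint', 'test', 'test:coverage', 'check'];

  const manifests = execSync('git ls-files', { encoding: 'utf8' })
    .split('\n')
    .filter((p) => p.split('/').pop() === 'package.json' && p !== 'package.json'); // root may delegate

  const violations: string[] = [];
  for (const file of manifests) {
    const pkg = JSON.parse(readFileSync(file, 'utf8'));
    for (const name of GUARDED) {
      const script: string | undefined = pkg.scripts?.[name];
      if (script?.includes('turbo run')) violations.push(`${file}: "${name}": "${script}"`);
    }
  }
  if (violations.length > 0) {
    console.error(violations.join('\n'));
    process.exit(1);
  }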
Verified: 138 non-root package.json files scan clean. Drift simulation
(injecting \`"build": "turbo run build"\` into apps/mana/apps/web) fails
with a clear message pointing at the offending file + script + fix.
This closes audit item #32 from the architecture review.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before: adding a new Dexie table left the encryption decision implicit.
If you forgot to register it, the table silently shipped in plaintext
forever — no error, no warning, no footprint anywhere. The architecture
audit flagged this as the root of Concern 1.
- `scripts/audit-crypto-registry.mjs` parses database.ts's `.stores()`
blocks and registry.ts's entries, then enforces three invariants:
1. Every Dexie table is either in the encryption registry OR in the
new `plaintext-allowlist.ts` — one conscious classification per
table.
2. No dead registry entries (referring to tables that no longer
exist in Dexie).
3. No table appears in both — single authoritative source.
- `plaintext-allowlist.ts` auto-seeded from current state. 105 entries,
each tagged `// TODO: audit` as an invitation to review whether the
table truly holds nothing sensitive. The allowlist is intentionally
a separate file so additions are reviewable on their own (not buried
inside database.ts schema bumps).
- Wired into `pnpm run check:crypto` + CI validate job — a new table
now fails the PR check instead of slipping past review.
- `check:crypto:seed` regenerates the allowlist if ever needed.
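The three invariants, in outline (the real audit derives these sets by
parsing database.ts and registry.ts; set parameters here are illustrative):

  function auditCryptoRegistry(
    dexieTables: Set<string>,
    encrypted: Set<string>,
    plaintext: Set<string>,
  ): string[] {
    const violations: string[] = [];
    for (const table of dexieTables) {
      if (!encrypted.has(table) && !plaintext.has(table))
        violations.push(`${table}: unclassified; add to the registry or plaintext-allowlist`);
      if (encrypted.has(table) && plaintext.has(table))
        violations.push(`${table}: listed in both; keep exactly one classification`);
    }
    for (const table of [...encrypted, ...plaintext]) {
      if (!dexieTables.has(table)) violations.push(`${table}: dead entry, table no longer in Dexie`);
    }
    return violations;
  }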
Verified: drift simulation (removing aiMissions from the allowlist)
fails the audit with a clear message pointing at the missing
classification. Current state passes: 187 Dexie tables, 82 encrypted,
105 explicit plaintext.
Concern 1 is now fully closed (A: typed registry entries, B: dev-mode
runtime drift check, C: build-time audit enforcing coverage).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Eventbrite shut down their public Event Search API (/v3/events/search)
in 2023. The provider now uses the website extractor pipeline
(mana-research + LLM) to scrape Eventbrite's public search pages.
No API key needed — same pipeline as any website source.
Also adds mana-events to generate-env.mjs for automatic .env generation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New Bun/Hono service on port 3068 that bundles many web-research providers
behind a unified interface for side-by-side comparison. All eval runs
persist in research.* (mana_platform) so quality can be reviewed later.
Providers (Phase 1+2):
search: searxng, duckduckgo, brave, tavily, exa, serper
extract: readability (via mana-search), jina-reader, firecrawl
Endpoints:
POST /v1/search, /v1/search/compare — single + fan-out
POST /v1/extract, /v1/extract/compare — single + fan-out
GET /v1/runs, /v1/runs/:id — history
POST /v1/runs/:run/results/:id/rate — manual eval
GET /v1/providers, /v1/providers/health — catalog + readiness
Auto-routing: when `provider` is omitted, queries are classified via regex
(fast path, 0ms) with optional mana-llm fallback, then routed to the first
available provider for that query type (news → tavily, academic → exa,
semantic → exa, etc.).
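A sketch of the fast path (keyword lists and fallback order are illustrative;
the optional mana-llm fallback is omitted):

  type QueryType = 'news' | 'academic' | 'semantic' | 'general';

  function classifyQuery(q: string): QueryType {
    if (/\b(news|breaking|latest|today)\b/i.test(q)) return 'news';
    if (/\b(paper|study|arxiv|doi|citation)\b/i.test(q)) return 'academic';
    if (/\b(similar to|related to|like)\b/i.test(q)) return 'semantic';
    return 'general';
  }

  const ROUTING: Record<QueryType, string[]> = {
    news: ['tavily', 'brave', 'searxng'],
    academic: ['exa', 'tavily'],
    semantic: ['exa'],
    general: ['searxng', 'duckduckgo'],
  };

  function pickProvider(q: string, available: Set<string>): string {
    const candidates = ROUTING[classifyQuery(q)];
    return candidates.find((p) => available.has(p)) ?? 'searxng';
  }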
Credits: server-key calls go through mana-credits reserve → commit/refund
so failed provider calls don't charge the user. BYO-keys supported via
research.provider_configs (UI arrives in Phase 4).
Cache: Redis with graceful degradation (1h TTL for search, 24h for
extract). Pay-per-use APIs only — no subscription-gated providers.
Docs: docs/plans/mana-research-service.md + docs/reports/web-research-capabilities.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three naming conventions had drifted through the monorepo (--muted, --theme-*,
--color-*). Only the last is defined in the Mana theme; the others silently
fell back to nothing and stopped tracking theme variants. Today's cleanup
migrated ~100 files, but nothing stopped the drift from creeping back.
- scripts/audit-theme-tokens.mjs scans ~3k source files and fails if any
references a bare shadcn token or a --theme-* prefix, with an allowlist
for known-literal module brand colors (news-research, agent templates)
- wire into pnpm script and lint-staged (runs once per commit touching
*.{svelte,css}, ignores per-file args)
- design-ux.md guideline: fix stale --color-destructive entry (Mana uses
--color-error), add explicit "never bare tokens" warning with examples
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The shared-hono credits client returns DEFAULT_BALANCE=1000 when
/api/v1/internal/credits/balance/:userId responds with no row, so
local-dev accounts silently diverge from production — credit-gated
flows look free in dev and only blow up after deploy. Seeding a
real credits.balances row makes the fallback unreachable and the
dev stack exercises the same code path as prod.
Default is 10_000 credits (overridable via CREDITS env var) and
is applied alongside the existing tier + role + sync-gift upserts,
so setup-dev-user.sh stays a single idempotent pass. Existing dev
accounts (tills95, tilljkb, rajiehq) were backfilled manually
once; re-running the script won't clobber a higher balance
because the ON CONFLICT uses GREATEST.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Admin-gated backend endpoints (e.g. POST /api/v1/admin/sync/:id/gift,
GET /api/v1/admin/users/:id/tier) check auth.users.role === 'admin',
which is orthogonal to access_tier. The script was already lifting
every dev account to tier=founder but left role at the 'user'
default, so founders couldn't exercise the admin UI flows against
their local stack. Wire role alongside tier (both via env-overridable
defaults) and reflect it in the success output so re-runs surface
what's being applied.
Backfilled the existing three dev accounts (tills95, tilljkb,
rajiehq) to role=admin manually once; re-running the script now is
idempotent.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two halves of the same "why is sync inactive in dev" fix:
- package.json: new dev:credits script and mana-credits added to
the dev:mana:servers concurrently group. The service was never
started by pnpm dev:mana:all, so the frontend's
GET /api/v1/sync/status failed, syncBilling.load() caught the
error and defaulted to inactive — while mana-sync (Go) was
actually fail-open on the billing check, making the UI
indicator lie about the backend state.
- scripts/dev/setup-dev-user.sh: after the existing
email-verify + tier-lift UPDATE, upsert a row into
credits.sync_subscriptions with is_gifted=true. Mirrors what
POST /api/v1/admin/sync/:id/gift would do, so every new dev
user gets Cloud Sync from the first login without a separate
admin call. The credits schema lives inside mana_platform, so
no new database needed — just a second statement in the same
psql heredoc.
Existing dev users (tills95, tilljkb, rajiehq) were backfilled
manually with the same INSERT … ON CONFLICT DO UPDATE once;
future runs of setup-dev-user.sh stay idempotent.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wires mana-ai into the existing observability stack so tick throughput,
plan-failure rates, planner latencies, and snapshot refresh health are
visible in Grafana + Prometheus, and the service's uptime surfaces on
status.mana.how under the "Internal" section.
- `src/metrics.ts` — prom-client Registry with `mana_ai_` prefix.
Counters: ticks_total, plans_produced_total, plans_written_back_total,
parse_failures_total, mission_errors_total, snapshots_new/updated,
snapshot_rows_applied_total, http_requests_total.
Histograms: tick_duration_seconds (0.1–120s), planner_request_duration_seconds
(0.25–60s), http_request_duration_seconds (0.005–10s).
- `src/index.ts` — HTTP middleware labels every request by
method/path/status; `/metrics` serves the Prometheus text format.
- `src/cron/tick.ts` — increments counters + wraps the tick with
`tickDuration.startTimer()`. Snapshot stats fold through.
- `src/planner/client.ts` — wraps `complete()` in a latency histogram
timer so planner tail latency shows up separately from tick duration.
- `docker/prometheus/prometheus.yml` —
1. New `mana-ai` scrape job against `mana-ai:3066/metrics` (30s).
2. `/health` added to the `blackbox-internal` job so uptime shows on
status.mana.how alongside mana-geocoding.
- `scripts/generate-status-page.sh` — friendly label for the new probe:
`mana-ai:3066/health` → "Mana AI Runner" (generator already iterates
`blackbox-internal`, no other changes needed).
- `package.json` — prom-client ^15.1.3
All 17 Bun tests still pass; tsc clean.
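The metrics.ts shape, roughly (prom-client API; two of the instruments shown,
buckets abbreviated):

  import { Counter, Histogram, Registry } from 'prom-client';

  export const registry = new Registry();

  export const ticksTotal = new Counter({
    name: 'mana_ai_ticks_total',
    help: 'Planner ticks executed',
    registers: [registry],
  });

  export const tickDuration = new Histogram({
    name: 'mana_ai_tick_duration_seconds',
    help: 'Wall-clock duration of one tick',
    buckets: [0.1, 1, 5, 15, 60, 120],
    registers: [registry],
  });

  // In the tick loop:
  //   const end = tickDuration.startTimer();
  //   ...run the tick...
  //   ticksTotal.inc(); end();
  // The /metrics route returns `await registry.metrics()` with registry.contentType.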
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds four audit scripts (module health, inter-module coupling, per-function
cognitive complexity, D3 treemap) with generated reports under docs/ and
an iframe-embedded workbench app at /admin/complexity. Reports regenerate
weekly via the module-health GitHub Action.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reserves port 3042 in PORT_SCHEMA.md, adds mail pgSchema to
setup-databases.sh and init-db scripts, installs mana-mail workspace
dependencies.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The presi module's schema was defined inline in routes.ts but had no
working db:push mechanism — the old references to @presi/server and
@presi/backend no longer exist after consolidation. Extracts schema
into its own file, adds a dedicated drizzle config, and updates the
setup script so tables are actually created.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The TIER_JSON generator used a multi-line awk script embedded in a
$() command substitution with escaped double quotes inside a
single-quoted awk program. Alpine's ash shell refused to parse this,
reporting "syntax error: unterminated quoted string". Under set -e
the syntax error killed the script BEFORE the jq call that writes
status.json, so the file stopped updating after our monitoring changes
triggered a full re-parse cycle.
Replace the awk block with a portable while-read shell loop that ash
handles cleanly. Verified with both `bash -n` and `alpine:3.20 sh -n`.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
blackbox-exporter can't resolve host.docker.internal on Colima, so
probes of host.docker.internal:4000 and :9200 always fail. Instead,
add a /health/pelias endpoint on the Hono wrapper that proxies to
the Pelias API, and update prometheus.yml to probe the wrapper's
proxied health endpoint.
Also simplifies the status page friendly_name() now that we don't
need to display the host.docker.internal targets.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Production deployment + observability for the self-hosted geocoding stack:
**docker-compose.macmini.yml**
- New mana-geocoding container (port 3018, internal-only — no traefik
labels, no Cloudflare route). Uses host.docker.internal to reach the
Pelias API on the host's pelias compose stack. Dockerfile added under
services/mana-geocoding/ using the same Bun/Hono pattern as mana-events.
**Prometheus**
- New blackbox-internal job probing mana-geocoding:3018/health, the
Pelias API on host.docker.internal:4000/v1/status, and Elasticsearch
at host.docker.internal:9200/_cluster/health. Kept separate from
blackbox-api which is reserved for public HTTPS endpoints.
**status.mana.how (generate-status-page.sh)**
- Include blackbox-internal in the metric query and add an "Interne
Dienste" section with its own summary card, right between Infrastruktur
and GPU Dienste. Summary grid goes from 4 to 5 columns with a
900px breakpoint.
- friendly_name() now handles http:// URLs and rewrites container-name
hosts like mana-geocoding:3018/health → "Mana Geocoding",
host.docker.internal:4000 → "Pelias API",
host.docker.internal:9200 → "Pelias Elasticsearch".
**Grafana uptime dashboard**
- Add an "Internal" series to the "Alle Dienste — Uptime-Verlauf" panel
- New "Interne Dienste Status" table panel showing per-instance up/down
- New "Geocoding Ø Latenz" stat panel for probe_duration_seconds
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- docker-compose: add empty default for AZURE_OPENAI_API_KEY to suppress
Docker Compose "variable is not set" warning
- setup-databases.sh: detect when pnpm filter matches no packages and
report "Skipped" instead of false "Schema pushed" success
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hit "container name already in use" / "removal in progress" errors
three times during today's Phase 5 deploys. The previous restart
pattern was just `compose up -d --no-deps`, which fails when:
1. A previous interrupted recreate left a stale container under
the canonical name. The new `up` tries to claim the name and
gets a conflict.
2. Compose's recovery from #1 sometimes creates a hash-prefixed
orphan container (`<hash>_<container_name>`), which then
blocks the next clean run too.
3. Even `--force-recreate` can't always handle the case because
the old container is in the middle of being removed when the
new one is being created (race).
Three-step replacement that's reliable across all three failure modes:
Step 1 — `docker compose rm -fs SERVICES`
Stops + force-removes the canonical compose-managed container.
Idempotent: does nothing if already gone. Filters out the
"No stopped containers" log noise so the output stays clean.
Step 2 — orphan sweep via `docker rm -f`
For each service, look up its container_name from the
compose config (falls back to the service name if not set),
then `docker ps -aq --filter name=^${cname}$` for the canonical
one and `name=_${cname}$` for hash-prefixed orphans. Anything
found gets nuked. This catches the case where compose's own
state has lost track of an orphan it created earlier.
Step 3 — `docker compose up -d --no-deps --remove-orphans`
Creates the fresh container. The `--remove-orphans` flag also
silences the "Found orphan containers ([mana-game-whopixels])"
warning we kept seeing — that's a leftover from a removed
service that nobody had cleaned up.
The container_name extraction uses awk on `compose config` output
(verified locally: `mana-web` → `mana-app-web`) so the script doesn't
need a hard-coded service→container mapping.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds scripts/validate-cloudflared-config.mjs — a node-only validator
that lint-staged runs whenever cloudflared-config.yml is staged. The
goal is to catch the same failure modes that
`cloudflared tunnel ingress validate` would catch on the server, but
without requiring cloudflared to be installed on every dev box.
Checks:
- YAML parses
- tunnel: is a uuid
- credentials-file: ends with .json and contains the tunnel id
(warning when it doesn't — likely an out-of-sync remnant from a
previous rebuild, exactly the failure mode that bit us in the
first locally-managed switch)
- ingress: is a non-empty array
- every rule except the last has both hostname AND service
- the LAST rule is the catch-all `service: http_status:NNN`
- no duplicate hostnames (the most common copy-paste mistake)
- service URLs look like http(s):// / ssh:// / http_status:NNN
/ unix:/ / hello_world
- hostnames are lowercase dot-separated DNS labels (no spaces, no
weird characters)
Wired into lint-staged.config.js with a single glob entry; the
existing eslint + prettier flow is unchanged.
Tested against the live cloudflared-config.yml (passes, 51 hostnames)
and a synthetic broken file (catches all 6 categories of error +
the credentials-file/tunnel id drift warning).
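A few of the checks in outline (assuming a YAML parser such as the yaml
package; the real script's wording and coverage differ):

  import { readFileSync } from 'node:fs';
  import { parse } from 'yaml';

  const cfg = parse(readFileSync('cloudflared-config.yml', 'utf8'));
  const errors: string[] = [];
  const rules: Array<{ hostname?: string; service?: string }> = cfg?.ingress ?? [];

  if (rules.length === 0) errors.push('ingress: must be a non-empty array');

  const last = rules[rules.length - 1];
  if (!/^http_status:\d{3}$/.test(last?.service ?? '')) {
    errors.push('last ingress rule must be the http_status:NNN catch-all');
  }

  const seen = new Set<string>();
  for (const rule of rules.slice(0, -1)) {
    if (!rule.hostname || !rule.service) errors.push('every non-final rule needs hostname AND service');
    if (rule.hostname && seen.has(rule.hostname)) errors.push(`duplicate hostname: ${rule.hostname}`);
    if (rule.hostname) seen.add(rule.hostname);
  }

  if (errors.length > 0) {
    console.error(errors.join('\n'));
    process.exit(1);
  }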
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two improvements to scripts/mac-mini/rebuild-tunnel.sh based on what
the first prod run actually surfaced.
═══ 1. Apex domain auto-fix via Cloudflare API ═══
`cloudflared tunnel route dns` cannot route the apex of a zone
(error code 1003: "An A, AAAA, or CNAME record with that host already
exists"). The CLI has no command to delete those records. The first
rebuild left mana.how returning 530 because the script silently
failed to route it and we had to fix the apex manually in the
dashboard.
The new `apex_route_via_api()` helper:
- Detects apex hostnames by dot count (one dot → two-label name)
- Uses $CLOUDFLARE_API_TOKEN if available
- Resolves the zone id by name
- Deletes any existing A / AAAA / CNAME records on the apex
- Creates a fresh proxied CNAME pointing at <tunnel>.cfargotunnel.com
- Cloudflare's CNAME flattening at the apex makes this work
transparently
If $CLOUDFLARE_API_TOKEN is not set, the script logs a warning at the
top of step 6 and falls back to the old behavior (route fails, user
fixes the apex manually). The token needs Zone:DNS:Edit on the
target zone.
═══ 2. Smarter HTTP verification ═══
The first run reported "5 hosts down (404/000)" but those were all
backend services without a root handler — credits/media/llm/mana-api
all return 404 at `/` and 200 at `/health`. The verify pass was
flagging healthy services as down and made the rebuild look more
broken than it was.
New `probe_host()` tries `/health` first, falls back to `/` only if
/health returned 4xx, and prefers a 2xx/3xx root response over a 4xx
/health. `probe_is_down()` only counts 5xx and 000 (libcurl error)
as failures — anything in 1xx-4xx means the request reached the
origin and the tunnel routing is correct, which is the actual thing
the verify pass cares about. `probe_label()` adds a one-word health
summary so the verify log reads "200 ok" / "401 auth required" /
"404 routed (no handler)" / "530 tunnel error" instead of just bare
status codes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two related AI-infrastructure hardenings landing together because both
touch the same nutriphi/planta route definitions:
═══ 1. Wire-format schema versioning ═══
Adds AI_SCHEMA_VERSION + AiResponseEnvelope<T> in @mana/shared-types so
every AI structured-output endpoint speaks a single envelope dialect:
{ schemaVersion: '1', data: <validated object> }
Backend wraps via a small `envelope()` helper in each module's routes.ts;
frontend api.ts unwraps via `unwrapEnvelope<T>()` which throws an
AiSchemaVersionMismatchError if the server returns a version this
client wasn't compiled against.
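In sketch form (the type and helper names are the ones listed above; the
bodies are illustrative):

  export const AI_SCHEMA_VERSION = '1' as const;

  export interface AiResponseEnvelope<T> {
    schemaVersion: string;
    data: T;
  }

  export class AiSchemaVersionMismatchError extends Error {
    constructor(received: string) {
      super(`AI schema version mismatch: server sent ${received}, client expects ${AI_SCHEMA_VERSION}`);
    }
  }

  // Backend side (routes.ts):
  export function envelope<T>(data: T): AiResponseEnvelope<T> {
    return { schemaVersion: AI_SCHEMA_VERSION, data };
  }

  // Frontend side (api.ts):
  export function unwrapEnvelope<T>(body: AiResponseEnvelope<T>): T {
    if (body.schemaVersion !== AI_SCHEMA_VERSION) {
      throw new AiSchemaVersionMismatchError(body.schemaVersion);
    }
    return body.data;
  }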
Why this matters before launch:
- Catches stale-cache scenarios immediately ("client v1 talking to
server v2") with an actionable error in the network panel, not a
cascade of "field is undefined" bugs further down the stack
- Forces explicit version bumps when we make non-additive schema
changes — the bump rules are documented inline next to the constant
- Cheap to remove if it ever feels overkill: drop the envelope() call
on the backend and the unwrapEnvelope on the frontend, ~10 lines
═══ 2. Anthropic prompt-caching directive (forward-compat) ═══
Adds `providerOptions: { anthropic: { cacheControl: { type: 'ephemeral' } } }`
on the system message in nutriphi + planta routes via a SYSTEM_CACHE_HINT
constant. This is a NO-OP today because:
- mana-llm currently routes to Gemini, not Claude
- Our system prompts are ~50 tokens, well under Anthropic's 1024-token
cache minimum
Kept anyway because it's ~5 lines per route and lights up automatically
when either condition flips (e.g. when we add per-user dietary preferences
as system context, pushing prompts past the threshold). The day we point
mana-llm at Claude Sonnet, every existing call site already has caching
enabled — no scavenger hunt through the routes.
System messages had to migrate from the `system:` shorthand to a full
messages[] entry to attach providerOptions, which is a tiny readability
loss but the only way to get per-message metadata into the AI SDK.
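The per-route shape, roughly (SYSTEM_CACHE_HINT is the constant named above;
prompt and input strings are placeholders):

  const SYSTEM_CACHE_HINT = {
    anthropic: { cacheControl: { type: 'ephemeral' as const } },
  };

  const systemPrompt = 'placeholder for the module system prompt';
  const userInput = 'placeholder for the user transcript';

  const messages = [
    {
      role: 'system' as const,
      content: systemPrompt,
      providerOptions: SYSTEM_CACHE_HINT, // no-op on Gemini, activates once Claude + ≥1024 tokens
    },
    { role: 'user' as const, content: userInput },
  ];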
═══ Tests ═══
13 new cases in apps/mana/apps/web/.../nutriphi/ai-schemas.test.ts cover:
- AI_SCHEMA_VERSION presence + AiSchemaVersionMismatchError shape
- MealAnalysisSchema acceptance/rejection (confidence bounds, missing
nutrients, optional food fields, default empty arrays)
- PlantIdentificationSchema (every-field-optional design, defaults,
confidence range)
(Test file lives in the web app rather than packages/shared-types
because the latter has no test runner configured — adding vitest there
just for these would be overkill.)
Total nutriphi + planta suite: 62/62 passing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reconciles the in-repo cloudflared-config.yml with the actually-loaded
ingress map on the Mac Mini production tunnel — the previous repo file
was missing 30+ hostnames (per-app subdomains, mana-api, sync, llm,
media, credits, subscriptions, etc.) because it was last updated
before the unified Mana web app rollout. Adds the new mana-api.mana.how
ingress for apps/api on port 3060 so the unified backend has a public
client URL for the SvelteKit web app's PUBLIC_MANA_API_URL_CLIENT.
Drops the dead matrix.mana.how / element.mana.how routes — the matrix
subsystem was removed in 2514831a3 and those services no longer exist.
Adds scripts/mac-mini/sync-tunnel-config.sh — the one-command flow for
shipping a tunnel-config change: pull on the server, validate the
yaml, kickstart cloudflared via launchctl. setup-cloudflared-service.sh
already wires the launchd plist with --config <repo-path> pointing at
this file, so a fresh Mac Mini install + setup script + sync script
gives you a fully reproducible tunnel.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds the services/news-ingester Bun service that pulls 25 public RSS/JSON
feeds into news.curated_articles every 15 min, with Mozilla Readability
fallback for thin RSS bodies and 30-day retention. apps/api /feed is
rewritten to read from the new pool table directly instead of the
sync_changes hack, with topics/lang/since/limit/offset query params.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The workbench-registry app id 'inventar' did not match its
@mana/shared-branding MANA_APPS counterpart 'inventory', so the tier-
gating join in apps/web/src/lib/app-registry/registry.ts silently
failed for the inventory module — it fell into the "no MANA_APPS
entry, default visible" fallback and was effectively un-gated. The
codebase had also voted overwhelmingly for 'inventar' (53 files) vs
'inventory' (3 files in shared-branding), so the long-standing
mismatch was just bookkeeping debt waiting to bite.
Pre-release, no live data, so the cleanest fix is to align everything
on the English 'inventory':
- Workbench-registry id, module.config.ts appId, module folder, route
folder and i18n locale folder all renamed via git mv
- Standalone apps/inventar/ workspace package renamed
- All imports, store identifiers (InventarEvents → InventoryEvents,
INVENTAR_GUEST_SEED, inventarModuleConfig), i18n keys and href/goto
paths follow the rename
- The German display label "Inventar" is preserved everywhere it is a
user-visible string (page titles, i18n values, toast labels)
- Dexie table prefixes (invCollections, invItems, …) are unchanged
- Drive-by fix: ListView.svelte was querying non-existent
inventarCollections/inventarItems tables — corrected to the actual
invCollections/invItems names from module.config
- The "inventar ↔ inventory id mismatch" workaround comment in
registry.ts is removed since the mismatch no longer exists
module-registry.ts also picks up the user's parallel newsModuleConfig
addition because both edits land in the same import block — keeping
them split would have left the build in an inconsistent state.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Local mana-auth has no built-in admin seed and `requireEmailVerification`
turned on with no real SMTP — every developer ends up writing the same
"register + UPDATE auth.users" SQL incantation by hand. Bundles it
into one idempotent script + a pnpm alias.
pnpm setup:dev-user # creates 3 default accounts
./scripts/dev/setup-dev-user.sh foo bar # creates / repairs one
What it does per user:
1. POST /api/v1/auth/register on mana-auth (so Better Auth's
signUpEmail handles password hashing the way the runtime
expects — no hand-rolled scrypt)
2. UPDATE auth.users SET email_verified = true, access_tier = 'founder'
so the new user can immediately log in AND exercise every
tier-gated module without a tier upgrade dance
Idempotent: existing users get tier + verification re-applied without
touching the password. Re-running after a partial setup is safe.
Defaults to three accounts (tills95 / tilljkb / rajiehq @gmail.com,
all with password "Aa-123456789") so the next dev doesn't have to
remember anything. Override via `TIER=alpha` / `DB_HOST=...` env
vars when needed.
Two preflight gates fail loud: psql in PATH + mana-auth reachable
on :3001. ON_ERROR_STOP=1 in psql so a bad SQL run doesn't get
silently swallowed.
Replaces the dangling `seed:dev-user` package.json alias that pointed
at a `pnpm --filter @mana/auth db:seed:dev` script that was never
created — clean rename to `setup:dev-user` to match the existing
`setup:env` / `setup:db` family.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NOTE: the previous commit 048184bef carried this commit message but
accidentally bundled an unrelated PickerOverlay refactor instead of
this script change (lint-staged stash interaction). This is the
actual fix.
Per-app web Dockerfiles do `FROM sveltekit-base:local` and do NOT
re-COPY packages/shared-* — those packages are baked into the base
image. So a change to packages/shared-utils, packages/shared-ui, etc.
only reaches the live web app if the base image is also rebuilt.
This bit us THREE times on 2026-04-08 alone:
1. CSP fix in shared-utils ('wasm-unsafe-eval') sat unused in
production for over an hour because every `build-app.sh mana-web`
reused the cached base layer with old shared-utils.
2. The BaseListView export in shared-ui after the ListView
consolidation refactor — mana-web's build failed because Rollup
couldn't resolve the new symbol from the stale base.
3. Same shape, different package, repeatedly during the Gemma 4
migration push.
The pattern is identical every time and the manual workaround
(`build-app.sh --base` first) is something you only think to run if
you already know how the layering works. Make the script catch it.
New `is_base_image_stale` helper compares the base image's `Created`
timestamp against the latest git commit touching paths the base image
actually depends on (packages/, docker/Dockerfile.sveltekit-base,
pnpm-lock.yaml). When building any *-web service, if the image is
stale or missing, the base is rebuilt automatically before the
per-app build kicks off, with the triggering commit's oneline
printed for transparency.
Date parsing handles macOS Docker's local-TZ-offset RFC3339 format
(`...+02:00`, not Z). We strip from char 19 onward and parse the
literal local clock time with BSD date (no -u). GNU date is the
fallback for Linux dev boxes. If parsing fails for any reason we
conservatively force a rebuild rather than risk shipping stale code.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
While adding negative-path integration tests for the auth flow I
discovered that *neither* of the lockout primitives in
services/mana-auth/src/services/security.ts has actually been
working in production. Two independent silent failures that combined
into a "the lockout never triggers, ever" outcome:
1. recordAttempt() inserted into auth.login_attempts with explicit
`id = gen_random_uuid()`, but auth.login_attempts.id is a
`serial integer` column with `nextval('auth.login_attempts_id_seq')`
as default. The UUID-into-integer cast threw a type error every
single time, the bare `catch {}` swallowed it as "non-critical",
and not a single login attempt was ever persisted. Lockout's "5
failures in 15 min" check was running against an empty table.
2. checkLockout() built `attempted_at > ${new Date(...)}` via the
drizzle sql template, but postgres-js cannot bind a JS Date object
directly — it tries to byteLength() the parameter and crashes with
`Received an instance of Date`. Same anti-pattern: bare `catch`,
returns `{locked: false}` (fail-open), no log, completely invisible.
Both are "silent broken since the encryption-vault series of changes"
class — caught only because the integration test for the lockout flow
expected the 6th login attempt to return 429 and got 200 instead.
Fixes:
- recordAttempt(): drop the bogus `id` column from the INSERT (let the
sequence default assign it), default ipAddress to null instead of
letting `${undefined}` collapse the parameter slot, and surface
errors in the catch instead of swallowing them silently.
- checkLockout(): pass `windowStart.toISOString()` instead of the Date
object so postgres-js can serialize it. Same catch upgrade — log the
cause when failing open.
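The corrected window query, in sketch form (drizzle sql template over
postgres-js; the failure-flag column name is illustrative):

  import { sql } from 'drizzle-orm';
  import type { PostgresJsDatabase } from 'drizzle-orm/postgres-js';

  async function failedAttemptsInWindow(db: PostgresJsDatabase, email: string): Promise<number> {
    const windowStart = new Date(Date.now() - 15 * 60 * 1000);
    // postgres-js cannot bind a raw Date here, so serialize to ISO text first.
    const rows = await db.execute(sql`
      SELECT count(*)::int AS failures
      FROM auth.login_attempts
      WHERE email = ${email}
        AND success = false
        AND attempted_at > ${windowStart.toISOString()}
    `);
    return Number(rows[0]?.failures ?? 0);
  }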
Failure-path test additions (tests/integration/auth-failures.test.ts):
- wrong password: assert 401, no JWT, +1 LOGIN_FAILURE in security_events,
+1 row in auth.login_attempts
- account lockout: 5 failed attempts then 6th returns 429 with
remainingSeconds, even with the correct password
- unverified email login: 403 with code = EMAIL_NOT_VERIFIED
- validate with garbage token: valid !== true
- resend verification: second mail arrives in mailpit
Plus the run-integration-tests.sh helper now runs both .test.ts files
and tests/integration/package.json's `test` script does the same.
Negative-control: reverted the recordAttempt fix (re-added the bogus
gen_random_uuid id), the wrong-password test failed at the
login_attempts assertion. Reverted the checkLockout fix, the lockout
test failed at the 429 assertion. Both fixes verified to be load-bearing.
6 tests, 45 expects, ~1.3s on a warm cache.
Two unrelated bugs in scripts/mac-mini/ensure-containers-running.sh,
both caught while debugging a mana-auth crash loop on 2026-04-08:
1. The recovery path passed --env-file "$PROJECT_ROOT/.env.macmini" to
docker compose, but that file has never existed on the server — only
.env does, and compose auto-loads it from the working directory. The
explicit --env-file silently caused recovered containers to start with
empty secrets (e.g. blank MANA_AUTH_KEK), which made mana-auth crash
the moment it came back up. The auto-recovery loop was therefore
self-defeating: it kept "fixing" auth into the same broken state
every 5 minutes for hours, with no notification because compose
exited 0. Drop --env-file entirely and cd into PROJECT_ROOT so
compose's standard .env discovery applies.
2. mana-infra-minio-init is a one-shot job container that legitimately
sits in "exited" state after running once. The script flagged it as
"stuck" every cycle, tried to "recover" it, and spammed the log with
ERROR lines. Add an explicit ONESHOT_INIT_CONTAINERS allowlist and
skip those names in both the initial scan and the post-recovery
verification.
Also tee compose output into the log so future failures actually leave
a breadcrumb instead of disappearing into the void.
Also: bump @mlc-ai/web-llm from a transitive dep (via @mana/local-llm)
to a direct dep of @mana/web. SvelteKit's adapter-node post-build
Rollup pass uses the web app's direct deps as its externals heuristic;
without this entry it warns "@mlc-ai/web-llm ... could not be resolved
- treating it as an external dependency" on every build. Functionally
harmless (the dynamic import in LocalLLMEngine only fires in the
browser), but the warning hid a real adapter-node misconfiguration
that would have bitten us if we'd ever tried to SSR /llm-test.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Local dev secrets like MANA_STT_API_KEY had no persistent home — they
lived only in the gitignored, generator-overwritten per-app .env files.
Every `pnpm setup:env` wiped them, so devs had to re-paste keys after
any env regeneration. Same recurring friction for MANA_LLM_API_KEY,
MANA_AUTH_KEK, OAuth keys, etc.
New layer: `.env.secrets` at the repo root.
- Gitignored, optional, never required for the build to pass
- Read by generate-env.mjs AFTER .env.development; non-empty values
override the matching key, so the merged result drives every per-app
.env the generator writes
- Empty values fall through to the .env.development defaults — a
freshly-copied .env.secrets.example is a no-op
- One source of truth for all dev secrets, propagated to every app
with one `pnpm setup:env`
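The override merge, in outline (helper name illustrative; the real
generate-env.mjs handles quoting and more):

  import { existsSync, readFileSync } from 'node:fs';

  function parseEnvFile(path: string): Record<string, string> {
    if (!existsSync(path)) return {};
    const entries = readFileSync(path, 'utf8')
      .split('\n')
      .filter((line) => line.includes('=') && !line.trimStart().startsWith('#'))
      .map((line) => {
        const idx = line.indexOf('=');
        return [line.slice(0, idx).trim(), line.slice(idx + 1).trim()] as const;
      });
    return Object.fromEntries(entries);
  }

  const merged = { ...parseEnvFile('.env.development') };
  const secrets = parseEnvFile('.env.secrets');
  let loaded = 0;
  for (const [key, value] of Object.entries(secrets)) {
    if (value !== '') {
      merged[key] = value; // non-empty secret wins over the default
      loaded += 1;
    }
  }
  if (loaded > 0) console.log(`Loaded ${loaded} secrets from .env.secrets`);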
Files:
- `.env.secrets.example` — committed template documenting all known
secret keys (mana-stt, mana-llm, auth KEK, sync JWT, MinIO, third-
party APIs). Devs `cp .env.secrets.example .env.secrets` and fill in.
- `.gitignore` — ignores .env.secrets, allows .env.secrets.example
- `scripts/generate-env.mjs` — loads .env.secrets if present, prints
"Loaded N secrets from .env.secrets" so devs see the override
taking effect
- `scripts/setup-secrets.mjs` + `pnpm setup:secrets` — convenience
script that SSHes to mana-server, greps the prod .env for the keys
defined in .env.secrets.example, and writes them locally. Confirms
before overwriting an existing .env.secrets unless --force is set;
reports which keys couldn't be found on the remote so devs know
what's left to fill manually
- `docs/LOCAL_DEVELOPMENT.md` + `docs/ENVIRONMENT_VARIABLES.md` —
walk-through and architecture diagram update
Verified end-to-end:
- `rm .env.secrets apps/mana/apps/web/.env && pnpm setup:env` →
STT key empty (no regression for devs who haven't opted in)
- `pnpm setup:secrets --force && pnpm setup:env` →
STT key propagated, "Loaded 3 secrets from .env.secrets" in output
- POST /api/v1/voice/transcribe with a real audio file →
full transcript back via gpu-stt.mana.how, end-to-end working
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
VictoriaMetrics + vmalert previously copied prometheus.yml/alerts.yml from
/mnt/prometheus-config/ into /etc/prometheus/ at container start. The copy
silently drifted from the host file whenever the container wasn't restarted —
which is exactly what hid the matrix/element removal from status.mana.how
until 2026-04-08, when VM was still actively scraping the deleted targets
because its in-container config snapshot pre-dated the cleanup.
Now both containers mount ./docker/prometheus directly into /etc/prometheus
(resp. /etc/alerts) read-only and point the binary at it, and deploy.sh
issues POST /-/reload to both after each deploy so config edits go live
without a container recreate.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a 13-step integration test that exercises register → email
verification → login → JWT validation → /me/data → encryption-vault
init/key → logout against a real stack of postgres + redis + mailpit +
mana-auth + mana-notify in docker compose.
Verified locally that this catches every regression we hit on
2026-04-08 in well under a second:
- missing nanoid dependency → register endpoint 500
- missing MANA_AUTH_KEK env passthrough → mana-auth never starts
- missing encryption-vault SQL migrations → vault endpoints 500
- wrong cookie name in /api/v1/auth/login → no accessToken in response
- mana-notify SMTP misconfigured → mailpit poll times out
Files:
- docker-compose.test.yml — minimal isolated stack on alt ports
(postgres 5443, redis 6390, mailpit 1026/8026, mana-auth 3091,
mana-notify 3092). Runs alongside the dev stack without collision.
Postgres healthcheck runs a real query rather than just pg_isready
to avoid the race where pg_isready reports healthy while the docker
init scripts are still running on a unix socket.
- tests/integration/auth-flow.test.ts — bun test that drives the full
flow via fetch + mailpit's REST API. Cleans up its test user from
postgres in afterAll. Self-contained, no extra deps.
- tests/integration/README.md — what's covered, why it exists, how
to run locally + extend.
- scripts/run-integration-tests.sh — orchestrator. Brings up the
stack, pushes the @mana/auth Drizzle schema, applies the
encryption-vault SQL migrations (002, 003), restarts mana-auth so
it sees the fresh tables, runs the test, tears down on exit.
KEEP_STACK=1 to leave it up for manual mailpit inspection.
- docker-compose.dev.yml — also adds Mailpit as a regular dev service
(ports 1025/8025) so local development can have a working email
capture without spinning up the test stack.
- .github/workflows/ci.yml — new auth-integration job that runs on
every PR. Calls run-integration-tests.sh; on failure dumps
mana-auth + mana-notify logs and the mailpit message queue. Marked
as a required check via the existing PR validation pipeline.
Reproduced 3 clean runs and 1 negative-control run (removed nanoid
from package.json → mana-auth container exits → script aborts with
non-zero) before committing. Full happy path runs in ~22s on a warm
Docker cache.
The /api/v1/voice/parse-task and /api/v1/voice/parse-habit endpoints
forwarded transcripts to mana-llm without an X-API-Key header. This
worked against the local mana-llm container (no auth) but silently
fell back to the no-LLM path when pointed at gpu-llm.mana.how, which
requires an API key — voice quick-add would look like it was running
in degraded mode forever with no signal that auth was the cause.
Now both endpoints read MANA_LLM_API_KEY from the server-side env and
attach it as X-API-Key when present, mirroring the pattern already
used by /api/v1/voice/transcribe for mana-stt. When the var is empty
the header is omitted, so local Docker setups without auth still work.
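The conditional header, in sketch form (endpoint path is illustrative; env
var names are the ones added here):

  async function forwardToLlm(transcript: string): Promise<Response> {
    const llmApiKey = process.env.MANA_LLM_API_KEY ?? '';
    return fetch(`${process.env.MANA_LLM_URL}/v1/parse`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // Only attach auth when a key is configured; local no-auth containers keep working.
        ...(llmApiKey ? { 'X-API-Key': llmApiKey } : {}),
      },
      body: JSON.stringify({ transcript }),
    });
  }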
Plumbing: generate-env.mjs writes MANA_LLM_URL + MANA_LLM_API_KEY into
apps/mana/apps/web/.env, .env.development gets the new keys with empty
defaults, ENVIRONMENT_VARIABLES.md documents the gateway and where to
get a key.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit bundles two unrelated changes that were swept together by an
accidental `git add -A` in another working session. Documented here so the
history reflects what's actually inside.
═══════════════════════════════════════════════════════════════════════
1. fix(mana-auth): /api/v1/auth/login mints JWT via auth.handler instead
of api.signInEmail
═══════════════════════════════════════════════════════════════════════
Previous attempt (commit 55cc75e7d) tried to fix the broken JWT mint in
/api/v1/auth/login by switching the cookie name from `mana.session_token`
to `__Secure-mana.session_token` for production. That was necessary but
not sufficient: Better Auth's session cookie value isn't just the raw
session token, it's `<token>.<HMAC>` where the HMAC is derived from the
better-auth secret. Reconstructing the cookie from auth.api.signInEmail's
JSON response only gave us the raw token, so /api/auth/token's
get-session middleware still couldn't validate it and the JWT mint kept
silently failing.
Real fix: do the sign-in via auth.handler (the HTTP path) rather than
auth.api.signInEmail (the SDK path). The handler returns a real fetch
Response with a Set-Cookie header containing the fully signed cookie
envelope. We capture that header verbatim and forward it as the cookie
on the /api/auth/token request, which now passes validation and mints
the JWT correctly.
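In sketch form (auth is the existing Better Auth instance; the request origin
is a placeholder and endpoint paths follow Better Auth's default basePath):

  async function loginAndMintJwt(email: string, password: string, clientIp: string) {
    const signInResponse = await auth.handler(
      new Request('http://internal/api/auth/sign-in/email', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json', 'X-Forwarded-For': clientIp },
        body: JSON.stringify({ email, password }),
      }),
    );
    // The Set-Cookie header carries the fully signed <token>.<HMAC> cookie value.
    const setCookie = signInResponse.headers.get('set-cookie') ?? '';

    // Forward it verbatim so the get-session middleware behind /api/auth/token
    // validates the session and mints the JWT.
    const tokenResponse = await auth.handler(
      new Request('http://internal/api/auth/token', { headers: { cookie: setCookie } }),
    );
    return tokenResponse.json();
  }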
Verified end-to-end on auth.mana.how:
$ curl -X POST https://auth.mana.how/api/v1/auth/login \
-d '{"email":"...","password":"..."}'
{
"user": {...},
"token": "<session token>",
"accessToken": "eyJhbGciOiJFZERTQSI...", ← real JWT now
"refreshToken": "<session token>"
}
Side benefits:
- Email-not-verified path is now handled by checking
signInResponse.status === 403 directly, no more catching APIError
with the comment-noted async-stream footgun.
- X-Forwarded-For is forwarded explicitly so Better Auth's rate limiter
and our security log see the real client IP.
- The leftover catch block now only handles unexpected exceptions
(network errors etc); the FORBIDDEN-checking logic in it is dead but
harmless and left in for defense in depth.
═══════════════════════════════════════════════════════════════════════
2. chore: remove the entire self-hosted Matrix stack (Synapse, Element,
Manalink, mana-matrix-bot)
═══════════════════════════════════════════════════════════════════════
The Matrix subsystem ran parallel to the main Mana product without any
load-bearing integration: the unified web app never imported matrix-js-sdk,
the chat module uses mana-sync (local-first), and mana-matrix-bot's
plugins duplicated features the unified app already ships natively.
Keeping it alive cost a Synapse + Element + matrix-web + bot container
quartet, three Cloudflare routes, an OIDC provider plugin in mana-auth,
and a steady drip of devlog/dependency churn.
Removed:
- apps/matrix (Manalink web + mobile, ~150 files)
- services/mana-matrix-bot (Go bot with ~20 plugins)
- docker/matrix configs (Synapse + Element)
- synapse/element-web/matrix-web/mana-matrix-bot services in
docker-compose.macmini.yml
- matrix.mana.how/element.mana.how/link.mana.how Cloudflare tunnel routes
- OIDC provider plugin + matrix-synapse trustedClient + matrixUserLinks
table from mana-auth (oauth_* schema definitions also removed)
- MatrixService import path in mana-media (importFromMatrix endpoint)
- Matrix notification channel in mana-notify (worker, metrics, config,
channel_type enum, MatrixOptions handler)
- Matrix entries from shared-branding (mana-apps + app-icons),
notify-client, the i18n bundle, the observatory map, the credits
app-label list, the landing footer/apps page, the prometheus + alerts
+ promtail tier mappings, and the matrix-related deploy paths in
cd-macmini.yml + ci.yml
Devlog/manascore/blueprint entries that mention Matrix are left intact
as historical record. The oauth_* + matrix_user_links Postgres tables
stay on existing prod databases — code can no longer write to them, drop
them in a follow-up migration if you want them gone for real.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Mac Mini hasn't run mana-llm/stt/tts/image-gen for a while — those
services live on the Windows GPU server now. The Mac-targeted
installers, plists, and platform-checking setup scripts have been
sitting in the repo as cargo-cult, suggesting Mac Mini deployment is
still a real option. It isn't.
Removed (Mac-Mini deployment infrastructure):
services/mana-stt/
- com.mana.mana-stt.plist (LaunchAgent)
- com.mana.vllm-voxtral.plist (LaunchAgent for the abandoned local Voxtral experiment)
- install-service.sh (single-service launchd installer)
- install-services.sh (mana-stt + vllm-voxtral installer)
- setup.sh (Mac arm64 installer)
- scripts/setup-vllm.sh (vLLM-Voxtral setup)
- scripts/start-vllm-voxtral.sh
services/mana-tts/
- com.mana.mana-tts.plist
- install-service.sh
- setup.sh (Mac arm64 installer)
scripts/mac-mini/
- setup-image-gen.sh (Mac flux2.c launchd installer)
- setup-stt.sh
- setup-tts.sh
- launchd/com.mana.image-gen.plist
- launchd/com.mana.mana-stt.plist
- launchd/com.mana.mana-tts.plist
setup-tts-bot.sh stays — it's the Matrix TTS bot installer (Synapse
side), not the mana-tts service.
Updated:
- services/mana-stt/CLAUDE.md, README.md — fully rewritten for the
Windows GPU reality (CUDA WhisperX, Scheduled Task ManaSTT, .env keys
matching the actual production .env on the box)
- services/mana-tts/CLAUDE.md, README.md — same treatment, documenting
Kokoro/Piper/F5-TTS on the Windows GPU under Scheduled Task ManaTTS
- scripts/mac-mini/README.md — dropped the STT setup section, replaced
with a pointer to docs/WINDOWS_GPU_SERVER_SETUP.md and the per-service
CLAUDE.md files
- docs/MAC_MINI_SERVER.md — expanded the "deactivated launchagents"
list to mention the now-removed plists, added the full GPU service
port table with public URLs, added a cleanup snippet for any old plists
still installed on a Mac Mini somewhere
Root file cleanup:
- mac-mini-setup.sh → scripts/mac-mini/bootstrap.sh (first-time bootstrap
belongs next to the other mac-mini setup-* scripts)
- test-chat-auth.sh → scripts/test-chat-auth.sh (ad-hoc smoke test, no
reason to live in the repo root)
- cloudflared-config.yml stays in root on purpose — it's the single source
of truth read by scripts/mac-mini/setup-*.sh and scripts/check-status.sh.
Docs:
- docs/POSTMORTEM_2026-04-07.md → docs/postmortems/2026-04-07-memoro-deploy-prod-wipe.md
(creates the postmortems/ home for future entries; descriptive name)
- docs/future/MAIL_SERVER_MAC_MINI_TEMP.md deleted — what it described
("Bereit zur Umsetzung", Stalwart on Mac Mini) is what's actually
running today, documented in docs/MAIL_SERVER.md. The DEDICATED variant
in docs/future/ remains since it's still a real future plan.
Root CLAUDE.md fix:
- @mana/local-store description was wrong — claimed it was legacy/standalone
only, but it's still used by apps/mana/apps/web itself, plus manavoxel,
arcade, and three shared packages.
Not touched (flagged for follow-up):
- NewAppIdeas/ (344K of "Roblox Reimagined" planning notes in repo root) —
user decision: archive externally or move under docs/future/
- Doc giants (PROJECT_OVERVIEW 41k, MATRIX_BOT_ARCHITECTURE 36k, etc.) —
splitting them is its own refactor
- Service CLAUDE.md staleness audit across 18 services — too broad for
this pass