Commit graph

226 commits

Author SHA1 Message Date
Till JS
8a882a3760 feat(wardrobe,picture): Google Nano Banana as a Try-On option
Add Google's Gemini image edit family (Nano Banana) as a user-
selectable model for Wardrobe Try-On next to the existing OpenAI
path. Three concrete choices now expose themselves in the Solo and
Outfit Try-On buttons:

  - openai/gpt-image-2          (default, falls back to gpt-image-1
                                 server-side when the org isn't
                                 verified)
  - google/gemini-3-pro-image-preview   (Nano Banana Pro — premium
                                 identity / character consistency)
  - google/gemini-3.1-flash-image-preview (Nano Banana 2 — newest,
                                 fast, cheapest)

All three accept multi-image refs (face + body + garment) through
the same /api/v1/picture/generate-with-reference endpoint; the only
differences are the provider-specific request/response shape and
the model-id routing.

Server (apps/api/src/modules/picture/routes.ts):
- Guard now accepts `openai/*` and `google/*` prefixes and rejects
  everything else as "not supported for edits". Each provider's key
  is validated separately so missing GEMINI_API_KEY doesn't break
  OpenAI calls and vice versa.
- New `callGeminiEdits(modelName)` helper mirrors the shape of
  callOpenAiEdits: encodes the normalized PNG refs as base64
  inline_data parts, POSTs to
  generativelanguage.googleapis.com/v1beta/models/{model}:generateContent
  with responseModalities=["TEXT","IMAGE"] and imageConfig
  (aspectRatio + imageSize), pulls the generated image out of
  candidates[].content.parts[].inlineData.
- Our internal size strings map cleanly: 1024x1024 → 1:1 / 1K,
  1024x1536 → 2:3 / 1K, 1536x1024 → 3:2 / 1K. Gemini 1K is enough
  for the thumbnail sizes Wardrobe renders; going higher bloats
  payload without visible gain.
- creditsFor() gains a google/ branch proportional to upstream
  pricing (pro ≈ 18, 3.1-flash ≈ 6, 2.5-flash ≈ 5).
- Response `model` reports `${provider}/${modelUsed}` so the picture
  row's model metadata is accurate across providers.

Client (apps/mana/apps/web/src/lib/modules/wardrobe):
- api/try-on.ts: export `TryOnModel` union + `DEFAULT_TRY_ON_MODEL`.
  RunGarmentTryOnParams / RunOutfitTryOnParams gain an optional
  `model` field, threaded through `callGenerateWithReference`.
- components/TryOnModelPicker.svelte: new segmented control, three
  options with label + one-line hint. Grid-auto-fits so it reflows
  on the narrow workbench card.
- components/GarmentTryOnButton.svelte + TryOnButton.svelte: both
  mount the picker above the Sparkle CTA. `estimatedCredits` on the
  button label updates live when the user switches model so the
  cost signal matches what the server will actually charge.

Env (scripts/generate-env.mjs): GEMINI_API_KEY and GOOGLE_API_KEY
now propagate from the root `.env.development` into `apps/api/.env`
so mana-api can pick them up at boot. The route reads GEMINI_API_KEY
with GOOGLE_API_KEY as fallback, matching how mana-llm ships today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 16:04:21 +02:00
Till JS
be8f5618c6 chore(setup:db): surface drizzle-kit errors instead of catch-all
push_schema used to print "Failed (may not have db:push script)" for
every non-zero exit, lumping real failures (stuck rename prompts,
pre-existing public enums) in with missing scripts. Now it prints the
real exit code and tails the last 5 lines of drizzle-kit output so the
root cause is visible without re-running by hand.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:36:18 +02:00
Till JS
eb8fac23ec fix(personas): exact tool_use_id pairing + CI drift audit
Two loose ends from M3/M4:

1. Tool_use_id-based error attribution in the persona-runner
-----------------------------------------------------------
The previous collectActionsFromMessage() flipped the *most recent*
ActionRow to 'error' when a tool_result carried is_error:true. That was
fine as long as Claude invoked tools strictly in sequence, but when
the planner pipelines multiple tools in one turn, a later tool_result
carries an earlier tool_use_id — the last-action fallback mis-
attributes the error.

runMainTurn() now keeps a tool_use_id → action-index Map for the
duration of the tick. On tool_use we stash block.id, on tool_result we
look up the exact ActionRow via tool_use_id and flip that one. The
"flip last" path survives as a pure fallback if a future SDK ever
ships a block without an id.

2. New audit:encrypted-tools script
-----------------------------------
scripts/audit-encrypted-tools.ts — loads registerAllModules() and
apps/mana/…/crypto/registry.ts, diffs every ToolSpec.encryptedFields
against the authoritative web-app ENCRYPTION_REGISTRY.

Catches three classes of drift:
- missing-table : tool declares a table the web-app doesn't encrypt
- field-drift   : both agree a table is encrypted but the field lists
                  differ (half-encryption in the wire is silent death)
- disabled      : web-app has enabled:false while the tool still
                  encrypts — advisory warning, not a fail

Negative-tested by injecting a deliberate drift on todo.create +
todo.list (shortened ENCRYPTED_FIELDS to ['title']); the auditor
flagged both tools with full field diffs, restore returned to green.

Wired into `pnpm run validate:all` so the contract survives future
edits on either side. Fills the M4 audit gap noted in
project_mana_mcp_personas.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:34:52 +02:00
Till JS
493db0c3b2 feat(personas): M2.a-c — persona schemas + admin endpoints + seed pipeline
Continuation of docs/plans/mana-mcp-and-personas.md. Personas are the
auto-test users the M3 runner will drive — they're real Mana users
(kind='persona', tier='founder'), registered through the same Better
Auth pipeline as humans, just stamped differently and metadata-tracked
so the persona-runner knows how to role-play them.

Schemas (auth namespace — personas are 1:1 with users, no reason for a
separate platform.* schema that the plan originally sketched)

- userKindEnum ('human' | 'persona' | 'system') + users.kind column,
  wired into better-auth additionalFields so the JWT/user object carry
  the flag. Default 'human' keeps every existing user untouched.
- auth.personas — 1:1 descriptor (archetype, systemPrompt, moduleMix
  jsonb, tickCadence, lastActiveAt). CASCADE from users.id.
- auth.persona_actions — tick-grouped audit of every tool call the
  runner makes (toolName, inputHash for dedup, result, latency).
- auth.persona_feedback — structured 1-5 ratings per module per tick,
  plus free-text notes. This is where the runner writes the
  self-reflection step at end of each tick.

Admin endpoints (/api/v1/admin/personas, admin-tier-gated)

- POST /            create-or-update by email. Uses auth.api.signUpEmail
                    if the user's new, then stamps kind+tier+verified
                    and upserts the personas row. Idempotent — safe to
                    re-run after catalog edits.
- GET  /            list with 7-day action count per persona.
- GET  /:id         detail + recent 20 actions + per-module feedback
                    aggregate.
- DELETE /:id       hard delete. Refuses non-persona users as
                    defense-in-depth: an admin typo here would cascade
                    through the full user-delete chain.

Catalog + seed pipeline (scripts/personas/)

- catalog.json      10 handwritten personas spanning 7 archetypes
                    (adhd-student, ceo-busy, creative-parent, solo-dev,
                    researcher, freelancer, overwhelmed-newbie).
                    Five pairs of personas that will later share
                    family/team spaces (cross-space setup is deferred
                    to M2.d per the plan).
- catalog.ts        zod-validated loader. Refines email to require
                    @mana.test TLD — non-existent, no bounce risk.
- password.ts       deterministic HMAC-SHA256(PERSONA_SEED_SECRET,
                    email). No stored per-persona credentials; the
                    runner re-derives on every login. Refuses the
                    dev-fallback secret in production.
- seed.ts           POST /admin/personas per catalog entry. Flags:
                    --auth=, --jwt=, --dry-run.
- cleanup.ts        Hard-delete every live persona. Warns when the
                    live set drifts from the catalog.

Root package.json:
  pnpm seed:personas
  pnpm seed:personas:cleanup

Extends the ESLint root-ignore list with `scripts/**` so Bun-typed
utility scripts don't fail the typed-parser check they weren't opted
into. Consistent with the rest of scripts/ being .mjs+.sh.

To go live (user action):
  pnpm docker:up
  cd services/mana-auth && bun run db:push
  export MANA_ADMIN_JWT=...
  pnpm seed:personas

M2.d deferred: cross-space (family/team/practice) memberships between
persona pairs. Better Auth's org-invite flow is multi-step and would
roughly double the M2 scope; the persona-runner (M3) can operate in
personal spaces first, shared-space tests land as their own milestone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 13:55:14 +02:00
Till JS
f719d1768f chore(infra): unify prod deploy on .env.macmini + document missing keys
Two pieces of the same cleanup:

1. build-app.sh now passes `--env-file .env.macmini` explicitly via a
   shared COMPOSE_ARGS array. Without it, docker compose silently fell
   back to `.env` in the project root — a separate file that happened
   to hold MANA_AUTH_KEK and other secrets that `.env.macmini` lacked.
   deploy.sh, restart.sh, and the CD workflow already used the flag;
   this aligns build-app.sh with the rest. Server-side .env.macmini
   was reconciled 2026-04-23 with the union of both files, so the
   duplicate `.env` is no longer needed.

2. .env.macmini.example now documents 7 keys the prod stack actually
   depends on but that had never been listed: GOOGLE_GEMINI_API_KEY /
   GOOGLE_GENAI_API_KEY (SDK aliases for Deep-Research + mana-ai),
   MANA_AI_PRIVATE_KEY_PEM / MANA_AI_PUBLIC_KEY_PEM (Mission-Grant
   keypair), MANA_AI_DEEP_RESEARCH_ENABLED + PUBLIC_AI_MISSION_GRANTS
   (feature flags), MANA_CORE_SERVICE_KEY (legacy alias), and the STT/
   TTS internal shared secrets.

Matrix-bot tokens deliberately left undocumented — no Matrix homeserver
in the current running stack.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 13:01:29 +02:00
Till JS
3a68a63728 feat(picture,api): GPT-Image-2 image generation
Adds a third provider path to /api/v1/picture/generate that calls OpenAI
gpt-image-2 when model starts with "openai/". Supports n=1..4 batch
generation with character continuity, base64 response decoded server-side
and uploaded to mana-media for dedup + thumbnails. Credit cost scales
by quality (low=3, medium=10, high=25) × n.

Env plumbing:
- scripts/generate-env.mjs: new apps/api/.env stanza propagates
  OPENAI_API_KEY + REPLICATE_API_TOKEN from .env.secrets
- .env.macmini.example: documents OPENAI_API_KEY for prod

Frontend /picture/generate: model + quality + aspect-ratio + batch-count
selectors, real fetch with auth, persists each image via imagesStore.insert
(encrypted + synced). Wrapped in ModuleShell variant=fill with back-arrow
to /picture and a live credit badge in the header actions slot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 00:37:15 +02:00
Till JS
4d5a96e21b perf(invoices): lazy-load pdf-lib + swissqrbill, -516 KB on route
/(app)/invoices/[id] route bundle drops from **534 KB → 18.6 KB** by
moving PDF rendering behind dynamic imports.

Changes:
  - views/DetailView.svelte: `await import('../pdf/renderer')` inside
    renderPdf() + downloadPdf(), cached in a module-local ref.
  - components/SendModal.svelte: same for openAndDownload().
  - pdf/scor.ts (new): generateSCORReference extracted so the
    invoices store can derive a reference string without pulling
    swissqrbill/svg + pdf-lib into the list-view bundle.
  - pdf/qr-bill.ts: re-exports generateSCORReference from scor.ts
    for backward compatibility.
  - stores/invoices.svelte.ts: imports from ../pdf/scor (light) instead
    of ../pdf/qr-bill (heavy).
  - index.ts: drop re-export of the PDF renderer from the module
    barrel so `import ... from '$lib/modules/invoices'` never drags
    pdf-lib in.

The heavy chunk (pdf-lib + swissqrbill, ~576 KB) now only loads when
a user actually opens an invoice detail — list views, create flow, and
all other routes stay lean.

20/20 qr-bill tests pass; svelte-check clean.

Bonus: scripts/audit-icon-usage.mjs (+ pnpm run audit:icon-usage)
audits @mana/shared-icons imports. Reveals 204 distinct icons across
the codebase, 199 of them at default weight but paying for all 6
Phosphor weights. Biggest offender: app-registry/apps.ts with 69
static icon imports accounting for ~290 KB of the shared 466 KB icon
chunk. Migration path for that is documented in
docs/optimizable/bundle-analysis.md §2 — next session's work.

docs/optimizable/bundle-analysis.md also updated with the root (app)
layout (260 KB) investigation notes (start/stop lifecycle hooks to
defer via idleCallback).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 18:03:53 +02:00
Till JS
3b85d7d3d2 chore(bundle): add bundle-size audit + snapshot inventory
scripts/audit-bundle.mjs reads `.svelte-kit/output/client/_app/immutable`
after a prod build and reports:
  - Total size + category breakdown (entry / nodes / chunks / workers /
    assets).
  - Top N JS files with content heuristics (transformers.js, zxcvbn,
    tiptap, pdf-lib, swissqrbill, rrule, suncalc, Phosphor icon paths,
    Vite __vite__mapDeps metadata, etc).
  - Route mapping for `nodes/*.js` by parsing the server manifest's
    `leaf:` entries, so node 118 is identified as /(app)/invoices/[id].
  - ⚠ flag on chunks/ ≥ 200 KB (shared, potentially eager).

Current snapshot (docs/optimizable/bundle-analysis.md):
  entry   92 KB  |  nodes   2.77 MB  |  chunks   5.59 MB
  workers 22.3 MB (ONNX WASM, lazy)  |  total   31.8 MB

Already healthy:
  - 92 KB entry (no critical-path bloat).
  - 22 MB transformers.js WASM is worker-scoped — only fetched on first
    /llm-test or memoro voice use.
  - zxcvbn (1.25 MB combined dict + keyboard graphs) correctly behind a
    dynamic import in PasswordStrength.svelte.

Follow-up opportunities logged:
  1. /invoices/[id] = 534 KB — split swissqrbill + pdf-lib via dynamic
     import.
  2. @mana/shared-icons = 317 + 149 KB SVG path chunks — migrate to
     tree-shakable per-icon imports or lazy-load.
  3. Root (app) layout = 260 KB — check for module bleed into shared
     shell.

Report-only. Run with `pnpm run audit:bundle`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 17:52:08 +02:00
Till JS
68c0eb2892 chore(test + audit): add test-coverage audit + wire audit:all
#6 test coverage (pivot to reporting): 34/653 tests currently fail
(in-flight spaces-foundation migrations). Hard coverage thresholds
aren't enforceable until the suite is green, so this session ships a
file-presence audit instead of line-coverage gates.

  - scripts/audit-test-coverage.mjs — counts .svelte + .ts source files
    vs .test.ts + .spec.ts per module. Reports total ratio, lists
    modules with 0 tests + ≥3 source files (prioritised by size).
  - pnpm run audit:test-coverage  wires it into audit:*.
  - docs/optimizable/test-health.md — state + prevention path + top
    untested modules ranked by impact.

Current baseline: 2.6% file-level coverage. 66/78 modules have zero
tests. Biggest untested: times (32 src), articles (29), events (27),
inventory + skilltree (20 each).

#8 audit:all: single entry point for the reporting audits. Runs
port-drift + i18n-coverage + test-coverage in --summary mode. Distinct
from validate:all (which is gates, not reports).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 17:38:12 +02:00
Till JS
4d91e2daad chore(services): add port-drift audit
Each services/*/CLAUDE.md declares `## Port: NNNN` — the authoritative
per-service port spec (docs/PORT_SCHEMA.md is explicitly partially
aspirational). This audit verifies:

  1. Declared port appears as a literal in the service's own source
     (catches: moved port in code but forgot to update CLAUDE.md).
  2. No two services claim the same port (catches: accidental
     collision when scaffolding new services).

Current state: ✓ 15 services, all declared ports found in code, zero
collisions (mana-auth/geocoding/stt/tts/image-gen/voice-bot/mail/
credits/user/subscriptions/analytics/events/news-ingester/ai/research).

Report-only; not a CI gate. Run with `pnpm run audit:port-drift`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 17:22:37 +02:00
Till JS
52af8c0cec refactor(theming): migrate who semantic colours to theme tokens
PlayView used Tailwind palette classes for game-status feedback:

  bg-emerald-500/10 + text-emerald-300   (won)    → bg-success/10 + text-success
  bg-amber-500/10 + text-amber-300       (lost)   → bg-warning/10 + text-warning
  border-red-500/20 + bg-red-500/10 +
    text-red-300                         (error)  → border-error/20 + bg-error/10 + text-error
  placeholder-white/30 focus:border-purple-400/50 → placeholder:text-muted-foreground/60 focus:border-primary/50

Semantic status now tracks the theme (errors are red in dark, darker red
in light, etc.) instead of being fixed hex ramps.

The `bg-purple-500` / `bg-purple-500/30` / `hover:bg-purple-600` classes
on the user's chat bubble and submit buttons STAY — purple is the who
module's primary identity colour (historical-deck accent `#a855f7` is
semantically the same hue). Documented in brand-literals.md §who.

Also harden two validators against mid-rename states where git ls-files
returns paths that aren't on disk yet — both now skip unreadable files
instead of crashing the pre-commit hook (caught while migrating who).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 17:19:53 +02:00
Till JS
eec369bd04 chore(i18n): add coverage audit + migration inventory
Translation infrastructure (@mana/shared-i18n + svelte-i18n + 35
per-module locale files with ~3500 lines across de/en/it/fr/es) is fully
wired, but 65/78 modules still hardcode German in .svelte templates
rather than calling {$_('module.key')}.

Adds:
  - scripts/audit-i18n-coverage.mjs — scans lib/modules/**/*.svelte for
    hardcoded German keywords (Abbrechen, Speichern, Löschen, etc.) in
    files that don't import $_(). Reports per-module hit counts,
    bucket (FULL/PARTIAL/NONE), and whether the locale file exists.
    Supports --summary and --top N flags.
  - pnpm run audit:i18n-coverage  wires it into the audit:* family
    (reporting only, not a CI gate — existing debt would fail
    validate:all otherwise).
  - docs/optimizable/i18n-migration-inventory.md — priority list,
    per-module workflow, and prevention plan.

Top offenders: broadcast (26 hits), articles (24), events (23),
invoices (22), quiz (20), stretch (20), library (19), profile (17),
skilltree (15, PARTIAL), calendar (14, PARTIAL). Modules without a
locale file (broadcast/articles/events/invoices/…) need the locale
stubs scaffolded first.

Real string migration is per-site careful work (key naming, 5-language
parity, UI visual QA) and is left for per-module follow-up sessions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 17:16:55 +02:00
Till JS
430aa30cbf refactor(theming): re-apply theme validator suite after parallel rebase
The plan-doc commits 129971ffc + 9db044178 dropped the
audit-theme-tokens → validate-theme-variables rename, the
validate-theme-tokens → validate-theme-utilities rename, the new
validate-theme-parity script, brand-literals.md, and the corresponding
package.json + lint-staged.config.js + themes.css wiring. The files
still existed on disk (git mv changes survived) but were untracked.

Restore the validator suite so `pnpm run validate:all` works again:
  - validate:theme-variables (CSS var names: --muted → --color-muted)
  - validate:theme-utilities (Tailwind: no white/N, no neutral palette)
  - validate:theme-parity    (every --color-* in :root ⇔ .dark + each
                              [data-theme="..."])

All three wired into validate:all and lint-staged. `pnpm run validate:all`
is clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 17:07:48 +02:00
Till JS
ea71d3c215 refactor(theming): replace transition-all with specific transitions
Sweep 98 `transition-all` occurrences across 62 files and replace with
targeted Tailwind transition utilities. Motivation:

1. `transition-all` animates every property, including CSS custom-
   property-backed colours. On first paint the vars may not have
   resolved yet, producing the P5 "white-on-white until first
   interaction" rendering bug. The same bug hit food/moodlit ListViews
   in the earlier theme migration.

2. Specific transitions also perform better — no layout-property
   interpolation overhead.

Codemod scripts/migrate-transition-all.mjs classifies each class
attribute by its sibling classes and picks one of:

  - `transition-opacity`                     — icon fade on group-hover
  - `transition-[width]`                     — progress-bar width anim
  - `transition-[transform,colors,box-shadow]` — scaled buttons/cards
  - `transition-[border-color,box-shadow]`   — card hover:border+shadow
  - `transition-colors`                      — default (card/row hover)

91 / 98 auto-classified, 7 hand-migrated:

  - EntryItem              → transition-[box-shadow]      (ring fade)
  - NutritionProgressWidget → transition-[stroke-dashoffset,stroke]
  - OnboardingModal        → transition-[width,background-color]
  - times/reports (3×)     → transition-[width] / -[height] (bar anims)
  - presi/present          → transition-[width,background-color] (dots)

svelte-check clean with 0 errors; validate:all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 15:57:49 +02:00
Till JS
7d6a340b13 refactor(theming): migrate remaining 738 token violations across routes + components
Expand validate-theme-tokens.mjs scope from ListViews only to all
lib/modules/**/*.svelte and routes/(app)/**/*.svelte. Add a second rule
banning the neutral Tailwind palette (gray/slate/zinc/neutral/stone-N)
— these should be theme tokens (bg-card, bg-muted, text-foreground,
text-muted-foreground, border-border) instead.

Apply one-shot codemod (scripts/migrate-theme-tokens.mjs) that
replaces:
  bg-gray-800/900        → bg-card
  bg-gray-600/700        → bg-muted (with opacity preserved)
  border-gray-600..900   → border-border
  text-gray-800/900      → text-foreground
  text-gray-300          → text-foreground/90
  text-gray-400/500/700  → text-muted-foreground
  placeholder-gray-*     → placeholder:text-muted-foreground/60
  bg/border-white/N      → bg-muted/N, border-border/N
  text-white/70-90       → text-foreground
  text-white/40-60       → text-muted-foreground
  text-white/10-30       → text-muted-foreground/70

42 files touched; biggest: presi/deck/[id] (91 subs), uload/analytics
(58), uload/+page (53), presi/+page (47), who/PlayView (35),
skilltree/Edit+AddXpModal (28 each), context/* (115 across 4 pages),
uload/links+tags (50 across 2).

Brand-literal overlays in moodlit/components/mood/{MoodFullscreen,
MoodCard,CreateMoodDialog}.svelte stay unmigrated — they render on
vivid colour gradients. Validator exempts these 3 files from the
white-alpha rule; they still obey the neutral-palette rule.

Result: 527 files pass validate:theme-tokens; svelte-check clean with
0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 15:42:55 +02:00
Till JS
a2a43b1d5a refactor(theming): migrate 6 ListViews + ai-missions badges to theme tokens
Replace raw white-alpha Tailwind utilities (text-white/x, bg-white/x,
border-white/x) with canonical theme tokens (text-foreground, bg-muted,
border-border, etc.) in cards, context, food, moodlit, storage, music
ListViews. Replace hardcoded hex badge/dot/phase colors in ai-missions
with success/warning/error/primary tokens.

Fix two transition-all bugs (food:160, moodlit:223) that prevented CSS
custom property colors from resolving on first paint under theme switches.

Add scripts/validate-theme-tokens.mjs to prevent regression; run via
pnpm run validate:theme-tokens. Not yet in validate:all — 12 modules
still use raw white utilities (citycorners, guides, inventory, memoro,
picture, plants, playground, presi, questions, times, uload, who).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 15:23:55 +02:00
Till JS
1861e89d45 chore(broadcast): wire mana-mail into env pipeline + push schema
The three final pre-dogfood items:

1. drizzle.config: schemaFilter now includes 'broadcast' alongside
   'mail'. Without this, `bun run db:push` skipped the broadcast
   tables — schema existed in code but not in Postgres. Tested via
   db:push + psql \dt (3 tables created: campaigns, events, sends).

2. .env.development: new MANA-MAIL SERVICE section with Stalwart
   knobs + broadcast config (tracking secret, rate limits, send
   throttle). DEV secret is explicitly labelled non-production —
   prod rotates via env.

3. generate-env.mjs: new block writes services/mana-mail/.env on
   `pnpm setup:env`. Mirrors the invoices / research / events
   pattern. All 16 broadcast/mail vars flow through from SSOT.

Verified end-to-end:
- pnpm setup:env → services/mana-mail/.env contains
  BROADCAST_TRACKING_SECRET + rate limits
- bun run src/index.ts → /health returns 200 with the new config
- psql → broadcast.campaigns / events / sends are materialised

Broadcast module is now fully ready to send real mail — nothing
else required before the first dogfood campaign.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 16:21:57 +02:00
Till JS
5ec1dfc747 chore(db): enforce pgSchema isolation with a lint script
The "every Drizzle table uses pgSchema" rule was documented in
.claude/guidelines/database.md (added yesterday as part of Concern 5)
but enforced only by convention. A new service could slip a raw
\`pgTable()\` past review and collide in the default \`public\` schema
of \`mana_platform\`, and nothing would surface the mistake until a
production migration failed.

- \`scripts/validate-pg-schema-isolation.mjs\` scans every tracked
  TypeScript file under services/, apps/api/, packages/ for call sites
  of \`pgTable(\` (not imports — imports can still be useful for types).
  Strips comments before matching so doc-examples like "use \`pgTable()\`"
  don't trigger false positives.
- Wired as \`pnpm run validate:pg-schema\` and a new CI step in the
  validate job (right after the turbo-recursion check). 721 files
  scan clean today.
- Removed an unused \`pgTable\` import in mana-subscriptions that would
  have been the only import of the symbol remaining after this change.
- Updated .claude/guidelines/database.md — the old verification blurb
  said "no automated lint rule yet", now points at the enforcer.

Drift verified: injecting a synthetic \`pgTable('bad', {})\` into
subscriptions.ts failed with a clear file:line violation pointing at
the database guideline.

Closes the "no automated lint rule" gap noted in the database guideline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:45:59 +02:00
Till JS
1eda3f5395 chore(turbo): lint against recursive \turbo run\ calls in child packages
CLAUDE.md flagged this as "CRITICAL" — a child package.json defining
e.g. \`"build": "turbo run build"\` causes a 10+ minute CI hang with
thousands of duplicate task spawns. The rule was documented but never
enforced, so it re-emerged every couple of months as someone copied a
parent script pattern.

- \`scripts/validate-no-recursive-turbo.mjs\` walks every tracked
  package.json (via \`git ls-files\`, so node_modules is auto-skipped)
  and fails if any non-root package has build/type-check/lint/test/
  test:coverage/check scripts containing \`turbo run\`. \`dev\` stays
  allowed — delegating it from a parent is the intended ergonomic.
- Wired as \`pnpm run validate:turbo\` + a new CI step in the validate
  job (before type-check — fails fast).
- CLAUDE.md §Turborepo updated to point at the enforcer and call out
  the full task list (test/test:coverage/check were missing from the
  original prose).

Verified: 138 non-root package.json files scan clean. Drift simulation
(injecting \`"build": "turbo run build"\` into apps/mana/apps/web) fails
with a clear message pointing at the offending file + script + fix.

This closes audit item #32 from the architecture review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:39:32 +02:00
Till JS
c7af693c6d feat(crypto): Phase C — build-time registry ↔ Dexie audit
Before: adding a new Dexie table left the encryption decision implicit.
If you forgot to register it, the table silently shipped in plaintext
forever — no error, no warning, no footprint anywhere. The architecture
audit flagged this as the root of Concern 1.

- `scripts/audit-crypto-registry.mjs` parses database.ts's `.stores()`
  blocks and registry.ts's entries, then enforces three invariants:
    1. Every Dexie table is either in the encryption registry OR in the
       new `plaintext-allowlist.ts` — one conscious classification per
       table.
    2. No dead registry entries (referring to tables that no longer
       exist in Dexie).
    3. No table appears in both — single authoritative source.
- `plaintext-allowlist.ts` auto-seeded from current state. 105 entries,
  each tagged `// TODO: audit` as an invitation to review whether the
  table truly holds nothing sensitive. The allowlist is intentionally
  a separate file so additions are reviewable on their own (not buried
  inside database.ts schema bumps).
- Wired into `pnpm run check:crypto` + CI validate job — a new table
  now fails the PR check instead of slipping past review.
- `check:crypto:seed` regenerates the allowlist if ever needed.

Verified: drift simulation (removing aiMissions from the allowlist)
fails the audit with a clear message pointing at the missing
classification. Current state passes: 187 Dexie tables, 82 encrypted,
105 explicit plaintext.

Concern 1 is now fully closed (A: typed registry entries, B: dev-mode
runtime drift check, C: build-time audit enforcing coverage).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 14:36:32 +02:00
Till JS
97abd251e3 fix(events): Eventbrite provider — switch from dead API to web scraping
Eventbrite shut down their public Event Search API (/v3/events/search)
in 2023. The provider now uses the website extractor pipeline
(mana-research + LLM) to scrape Eventbrite's public search pages.
No API key needed — same pipeline as any website source.

Also adds mana-events to generate-env.mjs for automatic .env generation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 16:51:58 +02:00
Till JS
2bdb48bdd1 feat(research): add mana-research service — Phase 1 + 2
New Bun/Hono service on port 3068 that bundles many web-research providers
behind a unified interface for side-by-side comparison. All eval runs
persist in research.* (mana_platform) so quality can be reviewed later.

Providers (Phase 1+2):
  search:  searxng, duckduckgo, brave, tavily, exa, serper
  extract: readability (via mana-search), jina-reader, firecrawl

Endpoints:
  POST /v1/search, /v1/search/compare       — single + fan-out
  POST /v1/extract, /v1/extract/compare     — single + fan-out
  GET  /v1/runs, /v1/runs/:id               — history
  POST /v1/runs/:run/results/:id/rate       — manual eval
  GET  /v1/providers, /v1/providers/health  — catalog + readiness

Auto-routing: when `provider` is omitted, queries are classified via regex
(fast path, 0ms) with optional mana-llm fallback, then routed to the first
available provider for that query type (news → tavily, academic → exa,
semantic → exa, etc.).

Credits: server-key calls go through mana-credits reserve → commit/refund
so failed provider calls don't charge the user. BYO-keys supported via
research.provider_configs (UI arrives in Phase 4).

Cache: Redis with graceful degradation (1h TTL for search, 24h for
extract). Pay-per-use APIs only — no subscription-gated providers.

Docs: docs/plans/mana-research-service.md + docs/reports/web-research-capabilities.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:42:25 +02:00
Till JS
fc028fa8f0 chore(lint): audit:theme-tokens guard against bare --muted / --theme-* drift
Three naming conventions had drifted through the monorepo (--muted, --theme-*,
--color-*). Only the last is defined in the Mana theme; the others silently
fell back to nothing and stopped tracking theme variants. Today's cleanup
migrated ~100 files, but nothing stopped the drift from creeping back.

- scripts/audit-theme-tokens.mjs scans ~3k source files and fails if any
  references a bare shadcn token or a --theme-* prefix, with an allowlist
  for known-literal module brand colors (news-research, agent templates)
- wire into pnpm script and lint-staged (runs once per commit touching
  *.{svelte,css}, ignores per-file args)
- design-ux.md guideline: fix stale --color-destructive entry (Mana uses
  --color-error), add explicit "never bare tokens" warning with examples

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 00:58:13 +02:00
Till JS
4c8034f9d0 chore(dev): seed real credit balance in setup-dev-user.sh
The shared-hono credits client returns DEFAULT_BALANCE=1000 when
/api/v1/internal/credits/balance/:userId responds with no row, so
local-dev accounts silently diverge from production — credit-gated
flows look free in dev and only blow up after deploy. Seeding a
real credits.balances row makes the fallback unreachable and the
dev stack exercises the same code path as prod.

Default is 10_000 credits (overridable via CREDITS env var) and
is applied alongside the existing tier + role + sync-gift upserts,
so setup-dev-user.sh stays a single idempotent pass. Existing dev
accounts (tills95, tilljkb, rajiehq) were backfilled manually
once; re-running the script won't clobber a higher balance
because the ON CONFLICT uses GREATEST.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 18:51:39 +02:00
Till JS
53fb3232f3 chore(dev): also grant role=admin in setup-dev-user.sh
Admin-gated backend endpoints (e.g. POST /api/v1/admin/sync/:id/gift,
GET /api/v1/admin/users/:id/tier) check auth.users.role === 'admin',
which is orthogonal to access_tier. The script was already lifting
every dev account to tier=founder but left role at the 'user'
default, so founders couldn't exercise the admin UI flows against
their local stack. Wire role alongside tier (both via env-overridable
defaults) and reflect it in the success output so re-runs surface
what's being applied.

Backfilled the existing three dev accounts (tills95, tilljkb,
rajiehq) to role=admin manually once; re-running the script now is
idempotent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 18:18:31 +02:00
Till JS
0ef650d94a chore(dev): run mana-credits locally and gift sync to dev users
Two halves of the same "why is sync inactive in dev" fix:

- package.json: new dev:credits script and mana-credits added to
  the dev:mana:servers concurrently group. The service was never
  started by pnpm dev:mana:all, so the frontend's
  GET /api/v1/sync/status failed, syncBilling.load() caught the
  error and defaulted to inactive — while mana-sync (Go) was
  actually fail-open on the billing check, making the UI
  indicator lie about the backend state.
- scripts/dev/setup-dev-user.sh: after the existing
  email-verify + tier-lift UPDATE, upsert a row into
  credits.sync_subscriptions with is_gifted=true. Mirrors what
  POST /api/v1/admin/sync/:id/gift would do, so every new dev
  user gets Cloud Sync from the first login without a separate
  admin call. The credits schema lives inside mana_platform, so
  no new database needed — just a second statement in the same
  psql heredoc.

Existing dev users (tills95, tilljkb, rajiehq) were backfilled
manually with the same INSERT … ON CONFLICT DO UPDATE once;
future runs of setup-dev-user.sh stay idempotent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 18:09:26 +02:00
Till JS
0bf01f434e feat(mana-ai): Prometheus /metrics endpoint + status.mana.how integration
Wires mana-ai into the existing observability stack so tick throughput,
plan-failure rates, planner latencies, and snapshot refresh health are
visible in Grafana + Prometheus, and the service's uptime surfaces on
status.mana.how under the "Internal" section.

- `src/metrics.ts` — prom-client Registry with `mana_ai_` prefix.
  Counters: ticks_total, plans_produced_total, plans_written_back_total,
  parse_failures_total, mission_errors_total, snapshots_new/updated,
  snapshot_rows_applied_total, http_requests_total.
  Histograms: tick_duration_seconds (0.1–120s), planner_request_
  duration_seconds (0.25–60s), http_request_duration_seconds (0.005–10s).
- `src/index.ts` — HTTP middleware labels every request by
  method/path/status; `/metrics` serves the Prometheus text format.
- `src/cron/tick.ts` — increments counters + wraps the tick with
  `tickDuration.startTimer()`. Snapshot stats fold through.
- `src/planner/client.ts` — wraps `complete()` in a latency histogram
  timer so planner tail latency shows up separately from tick duration.
- `docker/prometheus/prometheus.yml` —
  1. New `mana-ai` scrape job against `mana-ai:3066/metrics` (30s).
  2. `/health` added to the `blackbox-internal` job so uptime shows on
     status.mana.how alongside mana-geocoding.
- `scripts/generate-status-page.sh` — friendly label for the new probe:
  `mana-ai:3066/health` → "Mana AI Runner" (generator already iterates
  `blackbox-internal`, no other changes needed).
- `package.json` — prom-client ^15.1.3

All 17 Bun tests still pass; tsc clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 01:41:40 +02:00
Till JS
851a281e5a refactor: rename zitare -> quotes (Zitate)
Zitare was opaque Latin/Italian-flavored branding. Renamed to clear
English "quotes" (DE: Zitate) matching short-concrete-noun cluster.

- Module, routes, API, i18n, standalone landing app, plans dirs
- Dexie tables: quotesFavorites, quotesLists, quotesListTags,
  customQuotes (dropped redundant "quotes" prefix on the last)
- Logo QuotesLogo, theme quotes.css, search provider, dashboard
  widget QuoteWidget
- German user-facing label "Zitate" (English brand stays Quotes)

Pre-launch, no data migration needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 20:59:16 +02:00
Till JS
7c1c6cd54c chore(audit): module complexity reports + workbench map
Adds four audit scripts (module health, inter-module coupling, per-function
cognitive complexity, D3 treemap) with generated reports under docs/ and
an iframe-embedded workbench app at /admin/complexity. Reports regenerate
weekly via the module-health GitHub Action.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 19:47:42 +02:00
Till JS
53b3746b98 refactor: rename nutriphi module to food (Essen)
Complete rename across the entire monorepo pre-launch:
- Module, routes, API, i18n, standalone landing app directories
- All code identifiers, display names, logo component
- German user-facing label: "Essen" (English brand stays "Food")
- Dexie table nutriFavorites -> foodFavorites
- Infra configs (docker-compose, cloudflared, nginx, wrangler)

Zero residue of nutriphi remains. No data migration needed (pre-launch).

Follow-up: run pnpm install, update Cloudflare DNS
(food.mana.how), rename Cloudflare Pages project.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:30:07 +02:00
Till JS
a2f05409a4 chore(mail): add infra — port 3042, DB schema setup, pnpm install
Reserves port 3042 in PORT_SCHEMA.md, adds mail pgSchema to
setup-databases.sh and init-db scripts, installs mana-mail workspace
dependencies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 20:42:12 +02:00
Till JS
a9c51517eb fix(presi): wire up db:push for presi schema via @mana/api
The presi module's schema was defined inline in routes.ts but had no
working db:push mechanism — the old references to @presi/server and
@presi/backend no longer exist after consolidation. Extracts schema
into its own file, adds a dedicated drizzle config, and updates the
setup script so tables are actually created.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 14:32:44 +02:00
Till JS
3ac3a4bae4 fix(status-page): drop set -e — heredoc subshells trigger silent exits 2026-04-11 17:33:35 +02:00
Till JS
6f975a5cbe fix(status-page): use ${TIER_APPS:-} for set -u safety 2026-04-11 17:27:09 +02:00
Till JS
56a9811263 fix(status-page): replace multi-line awk with shell loop for ash compatibility
The TIER_JSON generator used a multi-line awk script embedded in a
\$() command substitution with escaped double quotes inside a
single-quoted awk program. Alpine's ash shell refused to parse this,
reporting "syntax error: unterminated quoted string". Under set -e
the syntax error killed the script BEFORE the jq call that writes
status.json, so the file stopped updating after our monitoring changes
triggered a full re-parse cycle.

Replace the awk block with a portable while-read shell loop that ash
handles cleanly. Verified with both `bash -n` and `alpine:3.20 sh -n`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:23:29 +02:00
Till JS
c47ce83e83 fix(geocoding): proxy Pelias health through wrapper for monitoring
blackbox-exporter can't resolve host.docker.internal on Colima, so
probes of host.docker.internal:4000 and :9200 always fail. Instead,
add a /health/pelias endpoint on the Hono wrapper that proxies to
the Pelias API, and update prometheus.yml to probe the wrapper's
proxied health endpoint.

Also simplifies the status page friendly_name() now that we don't
need to display the host.docker.internal targets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 16:45:43 +02:00
Till JS
957060ca55 feat(monitoring): add mana-geocoding + Pelias to prod compose, Prometheus, Grafana, and status.mana.how
Production deployment + observability for the self-hosted geocoding stack:

**docker-compose.macmini.yml**
- New mana-geocoding container (port 3018, internal-only — no traefik
  labels, no Cloudflare route). Uses host.docker.internal to reach the
  Pelias API on the host's pelias compose stack. Dockerfile added under
  services/mana-geocoding/ using the same Bun/Hono pattern as mana-events.

**Prometheus**
- New blackbox-internal job probing mana-geocoding:3018/health, the
  Pelias API on host.docker.internal:4000/v1/status, and Elasticsearch
  at host.docker.internal:9200/_cluster/health. Kept separate from
  blackbox-api which is reserved for public HTTPS endpoints.

**status.mana.how (generate-status-page.sh)**
- Include blackbox-internal in the metric query and add an "Interne
  Dienste" section with its own summary card, right between Infrastruktur
  and GPU Dienste. Summary grid goes from 4 to 5 columns with a
  900px breakpoint.
- friendly_name() now handles http:// URLs and rewrites container-name
  hosts like mana-geocoding:3018/health → "Mana Geocoding",
  host.docker.internal:4000 → "Pelias API",
  host.docker.internal:9200 → "Pelias Elasticsearch".

**Grafana uptime dashboard**
- Add an "Internal" series to the "Alle Dienste — Uptime-Verlauf" panel
- New "Interne Dienste Status" table panel showing per-instance up/down
- New "Geocoding Ø Latenz" stat panel for probe_duration_seconds

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 16:11:01 +02:00
Till JS
5647b2f8ae fix(dx): suppress AZURE_OPENAI_API_KEY warning, honest db:push reporting
- docker-compose: add empty default for AZURE_OPENAI_API_KEY to suppress
  Docker Compose "variable is not set" warning
- setup-databases.sh: detect when pnpm filter matches no packages and
  report "Skipped" instead of false "Schema pushed" success

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 18:36:35 +02:00
Till JS
83eaf71e9f fix(macmini): clean up container conflicts in build-app.sh restart cycle
Some checks are pending
CI / Build mana-api-gateway (push) Blocked by required conditions
CI / Build mana-crawler (push) Blocked by required conditions
CI / Build mana-media (push) Blocked by required conditions
CI / Build mana-credits (push) Blocked by required conditions
CI / Build mana-web (push) Blocked by required conditions
CI / Build chat-backend (push) Blocked by required conditions
CI / Build chat-web (push) Blocked by required conditions
CI / Build todo-backend (push) Blocked by required conditions
CI / Build todo-web (push) Blocked by required conditions
CI / Build calendar-backend (push) Blocked by required conditions
CI / Build calendar-web (push) Blocked by required conditions
CI / Build clock-web (push) Blocked by required conditions
CI / Build contacts-backend (push) Blocked by required conditions
CI / Build contacts-web (push) Blocked by required conditions
CI / Build presi-web (push) Blocked by required conditions
CI / Build storage-backend (push) Blocked by required conditions
CI / Build storage-web (push) Blocked by required conditions
CI / Build telegram-stats-bot (push) Blocked by required conditions
CI / Build nutriphi-backend (push) Blocked by required conditions
CI / Build nutriphi-web (push) Blocked by required conditions
CI / Build skilltree-web (push) Blocked by required conditions
Docker Validate / Validate Dockerfiles (push) Waiting to run
Docker Validate / Build calendar-web (push) Blocked by required conditions
Docker Validate / Build todo-backend (push) Blocked by required conditions
Docker Validate / Build todo-web (push) Blocked by required conditions
Docker Validate / Build zitare-web (push) Blocked by required conditions
Docker Validate / Build mana-auth (push) Blocked by required conditions
Docker Validate / Build mana-sync (push) Blocked by required conditions
Docker Validate / Build mana-media (push) Blocked by required conditions
Mirror to Forgejo / Push to Forgejo (push) Waiting to run
Hit "container name already in use" / "removal in progress" errors
three times during today's Phase 5 deploys. The previous restart
pattern was just `compose up -d --no-deps`, which fails when:

  1. A previous interrupted recreate left a stale container under
     the canonical name. The new `up` tries to claim the name and
     gets a conflict.
  2. Compose's recovery from #1 sometimes creates a hash-prefixed
     orphan container (`<hash>_<container_name>`), which then
     blocks the next clean run too.
  3. Even `--force-recreate` can't always handle the case because
     the old container is in the middle of being removed when the
     new one is being created (race).

Two-step replacement that's reliable across all three failure modes:

  Step 1 — `docker compose rm -fs SERVICES`
    Stops + force-removes the canonical compose-managed container.
    Idempotent: does nothing if already gone. Filters out the
    "No stopped containers" log noise so the output stays clean.

  Step 2 — orphan sweep via `docker rm -f`
    For each service, look up its container_name from the
    compose config (falls back to the service name if not set),
    then `docker ps -aq --filter name=^${cname}$` for the canonical
    one and `name=_${cname}$` for hash-prefixed orphans. Anything
    found gets nuked. This catches the case where compose's own
    state has lost track of an orphan it created earlier.

  Step 3 — `docker compose up -d --no-deps --remove-orphans`
    Creates the fresh container. The `--remove-orphans` flag also
    silences the "Found orphan containers ([mana-game-whopixels])"
    warning we kept seeing — that's a leftover from a removed
    service that nobody had cleaned up.

The container_name extraction uses awk on `compose config` output
(verified locally: `mana-web` → `mana-app-web`) so the script doesn't
need a hard-coded service→container mapping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 20:22:52 +02:00
Till JS
6c0f88f5a2 chore(infra): pre-commit validator for cloudflared-config.yml
Adds scripts/validate-cloudflared-config.mjs — a node-only validator
that lint-staged runs whenever cloudflared-config.yml is staged. The
goal is to catch the same failure modes that
`cloudflared tunnel ingress validate` would catch on the server, but
without requiring cloudflared to be installed on every dev box.

Checks:
  - YAML parses
  - tunnel: is a uuid
  - credentials-file: ends with .json and contains the tunnel id
    (warning when it doesn't — likely an out-of-sync remnant from a
    previous rebuild, exactly the failure mode that bit us in the
    first locally-managed switch)
  - ingress: is a non-empty array
  - every rule except the last has both hostname AND service
  - the LAST rule is the catch-all `service: http_status:NNN`
  - no duplicate hostnames (the most common copy-paste mistake)
  - service URLs look like http(s):// / ssh:// / http_status:NNN
    / unix:/ / hello_world
  - hostnames are lowercase dot-separated DNS labels (no spaces, no
    weird characters)

Wired into lint-staged.config.js with a single glob entry; the
existing eslint + prettier flow is unchanged.

Tested against the live cloudflared-config.yml (passes, 51 hostnames)
and a synthetic broken file (catches all 6 categories of error +
the credentials-file/tunnel id drift warning).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 18:02:51 +02:00
Till JS
77b2d1eb32 chore(infra): smarter tunnel rebuild — apex via API + sane probes
Two improvements to scripts/mac-mini/rebuild-tunnel.sh based on what
the first prod run actually surfaced.

═══ 1. Apex domain auto-fix via Cloudflare API ═══

`cloudflared tunnel route dns` cannot route the apex of a zone
(error code 1003: "An A, AAAA, or CNAME record with that host already
exists"). The CLI has no command to delete those records. The first
rebuild left mana.how returning 530 because the script silently
failed to route it and we had to fix the apex manually in the
dashboard.

The new `apex_route_via_api()` helper:
  - Detects apex hostnames by dot count (one dot → two-label name)
  - Uses $CLOUDFLARE_API_TOKEN if available
  - Resolves the zone id by name
  - Deletes any existing A / AAAA / CNAME records on the apex
  - Creates a fresh proxied CNAME pointing at <tunnel>.cfargotunnel.com
  - Cloudflare's CNAME flattening at the apex makes this work
    transparently

If $CLOUDFLARE_API_TOKEN is not set, the script logs a warning at the
top of step 6 and falls back to the old behavior (route fails, user
fixes the apex manually). The token needs Zone:DNS:Edit on the
target zone.

═══ 2. Smarter HTTP verification ═══

The first run reported "5 hosts down (404/000)" but those were all
backend services without a root handler — credits/media/llm/mana-api
all return 404 at `/` and 200 at `/health`. The verify pass was
flagging healthy services as down and made the rebuild look more
broken than it was.

New `probe_host()` tries `/health` first, falls back to `/` only if
/health returned 4xx, and prefers a 2xx/3xx root response over a 4xx
/health. `probe_is_down()` only counts 5xx and 000 (libcurl error)
as failures — anything in 1xx-4xx means the request reached the
origin and the tunnel routing is correct, which is the actual thing
the verify pass cares about. `probe_label()` adds a one-word health
summary so the verify log reads "200 ok" / "401 auth required" /
"404 routed (no handler)" / "530 tunnel error" instead of just bare
status codes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 17:52:40 +02:00
Till JS
bd231cd689 feat(api/web): wire-format envelope versioning + Anthropic prompt-cache hints
Two related AI-infrastructure hardenings landing together because both
touch the same nutriphi/planta route definitions:

═══ 1. Wire-format schema versioning ═══

Adds AI_SCHEMA_VERSION + AiResponseEnvelope<T> in @mana/shared-types so
every AI structured-output endpoint speaks a single envelope dialect:

    { schemaVersion: '1', data: <validated object> }

Backend wraps via a small `envelope()` helper in each module's routes.ts;
frontend api.ts unwraps via `unwrapEnvelope<T>()` which throws an
AiSchemaVersionMismatchError if the server returns a version this
client wasn't compiled against.

Why this matters before launch:
  - Catches stale-cache scenarios immediately ("client v1 talking to
    server v2") with an actionable error in the network panel, not a
    cascade of "field is undefined" bugs further down the stack
  - Forces explicit version bumps when we make non-additive schema
    changes — the bump rules are documented inline next to the constant
  - Cheap to remove if it ever feels overkill: drop the envelope() call
    on the backend and the unwrapEnvelope on the frontend, ~10 lines

═══ 2. Anthropic prompt-caching directive (forward-compat) ═══

Adds `providerOptions: { anthropic: { cacheControl: { type: 'ephemeral' } } }`
on the system message in nutriphi + planta routes via a SYSTEM_CACHE_HINT
constant. This is a NO-OP today because:
  - mana-llm currently routes to Gemini, not Claude
  - Our system prompts are ~50 tokens, well under Anthropic's 1024-token
    cache minimum

Kept anyway because it's ~5 lines per route and lights up automatically
when either condition flips (e.g. when we add per-user dietary preferences
as system context, pushing prompts past the threshold). The day we point
mana-llm at Claude Sonnet, every existing call site already has caching
enabled — no scavenger hunt through the routes.

System messages had to migrate from the `system:` shorthand to a full
messages[] entry to attach providerOptions, which is a tiny readability
loss but the only way to get per-message metadata into the AI SDK.

═══ Tests ═══

13 new cases in apps/mana/apps/web/.../nutriphi/ai-schemas.test.ts cover:
  - AI_SCHEMA_VERSION presence + AiSchemaVersionMismatchError shape
  - MealAnalysisSchema acceptance/rejection (confidence bounds, missing
    nutrients, optional food fields, default empty arrays)
  - PlantIdentificationSchema (every-field-optional design, defaults,
    confidence range)

(Test file lives in the web app rather than packages/shared-types
because the latter has no test runner configured — adding vitest there
just for these would be overkill.)

Total nutriphi + planta suite: 62/62 passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 17:17:18 +02:00
Till JS
3993400013 chore(infra): make cloudflared-config.yml the single source of truth
Reconciles the in-repo cloudflared-config.yml with the actually-loaded
ingress map on the Mac Mini production tunnel — the previous repo file
was missing 30+ hostnames (per-app subdomains, mana-api, sync, llm,
media, credits, subscriptions, etc.) because it was last updated
before the unified Mana web app rollout. Adds the new mana-api.mana.how
ingress for apps/api on port 3060 so the unified backend has a public
client URL for the SvelteKit web app's PUBLIC_MANA_API_URL_CLIENT.

Drops the dead matrix.mana.how / element.mana.how routes — the matrix
subsystem was removed in 2514831a3 and those services no longer exist.

Adds scripts/mac-mini/sync-tunnel-config.sh — the one-command flow for
shipping a tunnel-config change: pull on the server, validate the
yaml, kickstart cloudflared via launchctl. setup-cloudflared-service.sh
already wires the launchd plist with --config <repo-path> pointing at
this file, so a fresh Mac Mini install + setup script + sync script
gives you a fully reproducible tunnel.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:37:21 +02:00
Till JS
9ef97a1877 feat(news): backend ingester service + curated feed API
Adds the services/news-ingester Bun service that pulls 25 public RSS/JSON
feeds into news.curated_articles every 15 min, with Mozilla Readability
fallback for thin RSS bodies and 30-day retention. apps/api /feed is
rewritten to read from the new pool table directly instead of the
sync_changes hack, with topics/lang/since/limit/offset query params.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:53:26 +02:00
Till JS
45790ffbb8 refactor(mana): rename inventar → inventory across the codebase
The workbench-registry app id 'inventar' did not match its
@mana/shared-branding MANA_APPS counterpart 'inventory', so the tier-
gating join in apps/web/src/lib/app-registry/registry.ts silently
failed for the inventory module — it fell into the "no MANA_APPS
entry, default visible" fallback and was effectively un-gated. The
codebase had also voted overwhelmingly for 'inventar' (53 files) vs
'inventory' (3 files in shared-branding), so the long-standing
mismatch was just bookkeeping debt waiting to bite.

Pre-release, no live data, so the cleanest fix is to align everything
on the English 'inventory':

- Workbench-registry id, module.config.ts appId, module folder, route
  folder and i18n locale folder all renamed via git mv
- Standalone apps/inventar/ workspace package renamed
- All imports, store identifiers (InventarEvents → InventoryEvents,
  INVENTAR_GUEST_SEED, inventarModuleConfig), i18n keys and href/goto
  paths follow the rename
- The German display label "Inventar" is preserved everywhere it is a
  user-visible string (page titles, i18n values, toast labels)
- Dexie table prefixes (invCollections, invItems, …) are unchanged
- Drive-by fix: ListView.svelte was querying non-existent
  inventarCollections/inventarItems tables — corrected to the actual
  invCollections/invItems names from module.config
- The "inventar ↔ inventory id mismatch" workaround comment in
  registry.ts is removed since the mismatch no longer exists

module-registry.ts also picks up the user's parallel newsModuleConfig
addition because both edits land in the same import block — keeping
them split would have left the build in an inconsistent state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:50:24 +02:00
Till JS
38a02b77a8 chore(dev): add setup-dev-user script for local founder accounts
Local mana-auth has no built-in admin seed and `requireEmailVerification`
turned on with no real SMTP — every developer ends up writing the same
"register + UPDATE auth.users" SQL incantation by hand. Bundles it
into one idempotent script + a pnpm alias.

  pnpm setup:dev-user                       # creates 3 default accounts
  ./scripts/dev/setup-dev-user.sh foo bar   # creates / repairs one

What it does per user:
  1. POST /api/v1/auth/register on mana-auth (so Better Auth's
     signUpEmail handles password hashing the way the runtime
     expects — no hand-rolled scrypt)
  2. UPDATE auth.users SET email_verified = true, access_tier = 'founder'
     so the new user can immediately log in AND exercise every
     tier-gated module without a tier upgrade dance

Idempotent: existing users get tier + verification re-applied without
touching the password. Re-running after a partial setup is safe.

Defaults to three accounts (tills95 / tilljkb / rajiehq @gmail.com,
all with password "Aa-123456789") so the next dev doesn't have to
remember anything. Override via `TIER=alpha` / `DB_HOST=...` env
vars when needed.

Two preflight gates fail loud: psql in PATH + mana-auth reachable
on :3001. ON_ERROR_STOP=1 in psql so a bad SQL run doesn't get
silently swallowed.

Replaces the dangling `seed:dev-user` package.json alias that pointed
at a `pnpm --filter @mana/auth db:seed:dev` script that was never
created — clean rename to `setup:dev-user` to match the existing
`setup:env` / `setup:db` family.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 12:23:15 +02:00
Till JS
30766f153e fix(macmini): auto-rebuild stale sveltekit-base before per-app web builds
NOTE: the previous commit 048184bef carried this commit message but
accidentally bundled an unrelated PickerOverlay refactor instead of
this script change (lint-staged stash interaction). This is the
actual fix.

Per-app web Dockerfiles do `FROM sveltekit-base:local` and do NOT
re-COPY packages/shared-* — those packages are baked into the base
image. So a change to packages/shared-utils, packages/shared-ui, etc.
only reaches the live web app if the base image is also rebuilt.

This bit us THREE times on 2026-04-08 alone:
1. CSP fix in shared-utils ('wasm-unsafe-eval') sat unused in
   production for over an hour because every `build-app.sh mana-web`
   reused the cached base layer with old shared-utils.
2. The BaseListView export in shared-ui after the ListView
   consolidation refactor — mana-web's build failed because Rollup
   couldn't resolve the new symbol from the stale base.
3. Same shape, different package, repeatedly during the Gemma 4
   migration push.

The pattern is identical every time and the manual workaround
(`build-app.sh --base` first) is something you only think to run if
you already know how the layering works. Make the script catch it.

New `is_base_image_stale` helper compares the base image's `Created`
timestamp against the latest git commit touching paths the base image
actually depends on (packages/, docker/Dockerfile.sveltekit-base,
pnpm-lock.yaml). When building any *-web service, if the image is
stale or missing, the base is rebuilt automatically before the
per-app build kicks off, with the triggering commit's oneline
printed for transparency.

Date parsing handles macOS Docker's local-TZ-offset RFC3339 format
(`...+02:00`, not Z). We strip from char 19 onward and parse the
literal local clock time with BSD date (no -u). GNU date is the
fallback for Linux dev boxes. If parsing fails for any reason we
conservatively force a rebuild rather than risk shipping stale code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 22:35:46 +02:00
Till JS
d941ff2231 fix(mana-auth): account lockout was structurally dead + add failure-path tests
While adding negative-path integration tests for the auth flow I
discovered that *neither* of the lockout primitives in
services/mana-auth/src/services/security.ts has actually been
working in production. Two independent silent failures that combined
into a "the lockout never triggers, ever" outcome:

1. recordAttempt() inserted into auth.login_attempts with explicit
   `id = gen_random_uuid()`, but auth.login_attempts.id is a
   `serial integer` column with `nextval('auth.login_attempts_id_seq')`
   as default. The UUID-into-integer cast threw a type error every
   single time, the bare `catch {}` swallowed it as "non-critical",
   and not a single login attempt was ever persisted. Lockout's "5
   failures in 15 min" check was running against an empty table.

2. checkLockout() built `attempted_at > ${new Date(...)}` via the
   drizzle sql template, but postgres-js cannot bind a JS Date object
   directly — it tries to byteLength() the parameter and crashes with
   `Received an instance of Date`. Same anti-pattern: bare `catch`,
   returns `{locked: false}` (fail-open), no log, completely invisible.

Both are "silent broken since the encryption-vault series of changes"
class — caught only because the integration test for the lockout flow
expected the 6th login attempt to return 429 and got 200 instead.

Fixes:
- recordAttempt(): drop the bogus `id` column from the INSERT (let the
  sequence default assign it), default ipAddress to null instead of
  letting `${undefined}` collapse the parameter slot, and surface
  errors in the catch instead of swallowing them silently.
- checkLockout(): pass `windowStart.toISOString()` instead of the Date
  object so postgres-js can serialize it. Same catch upgrade — log the
  cause when failing open.

Failure-path test additions (tests/integration/auth-failures.test.ts):
- wrong password: assert 401, no JWT, +1 LOGIN_FAILURE in security_events,
  +1 row in auth.login_attempts
- account lockout: 5 failed attempts then 6th returns 429 with
  remainingSeconds, even with the correct password
- unverified email login: 403 with code = EMAIL_NOT_VERIFIED
- validate with garbage token: valid !== true
- resend verification: second mail arrives in mailpit

Plus the run-integration-tests.sh helper now runs both .test.ts files
and tests/integration/package.json's `test` script does the same.

Negative-control: reverted the recordAttempt fix (re-added the bogus
gen_random_uuid id), the wrong-password test failed at the
login_attempts assertion. Reverted the checkLockout fix, the lockout
test failed at the 429 assertion. Both fixes verified to be load-bearing.

6 tests, 45 expects, ~1.3s on a warm cache.
2026-04-08 18:29:00 +02:00
Till JS
c5e5963cbe fix(macmini): repair container auto-recovery (broken --env-file path)
Two unrelated bugs in scripts/mac-mini/ensure-containers-running.sh,
both caught while debugging a mana-auth crash loop on 2026-04-08:

1. The recovery path passed --env-file "$PROJECT_ROOT/.env.macmini" to
   docker compose, but that file has never existed on the server — only
   .env does, and compose auto-loads it from the working directory. The
   explicit --env-file silently caused recovered containers to start with
   empty secrets (e.g. blank MANA_AUTH_KEK), which made mana-auth crash
   the moment it came back up. The auto-recovery loop was therefore
   self-defeating: it kept "fixing" auth into the same broken state
   every 5 minutes for hours, with no notification because compose
   exited 0. Drop --env-file entirely and cd into PROJECT_ROOT so
   compose's standard .env discovery applies.

2. mana-infra-minio-init is a one-shot job container that legitimately
   sits in "exited" state after running once. The script flagged it as
   "stuck" every cycle, tried to "recover" it, and spammed the log with
   ERROR lines. Add an explicit ONESHOT_INIT_CONTAINERS allowlist and
   skip those names in both the initial scan and the post-recovery
   verification.

Also tee compose output into the log so future failures actually leave
a breadcrumb instead of disappearing into the void.

Also: bump @mlc-ai/web-llm from a transitive dep (via @mana/local-llm)
to a direct dep of @mana/web. SvelteKit's adapter-node post-build
Rollup pass uses the web app's direct deps as its externals heuristic;
without this entry it warns "@mlc-ai/web-llm ... could not be resolved
- treating it as an external dependency" on every build. Functionally
harmless (the dynamic import in LocalLLMEngine only fires in the
browser), but the warning hid a real adapter-node misconfiguration
that would have bitten us if we'd ever tried to SSR /llm-test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 18:17:31 +02:00
Till JS
4fce6a3ede feat(env): persistent dev secrets via .env.secrets override
Local dev secrets like MANA_STT_API_KEY had no persistent home — they
lived only in the gitignored, generator-overwritten per-app .env files.
Every `pnpm setup:env` wiped them, so devs had to re-paste keys after
any env regeneration. Same recurring friction for MANA_LLM_API_KEY,
MANA_AUTH_KEK, OAuth keys, etc.

New layer: `.env.secrets` at the repo root.

- Gitignored, optional, never required for the build to pass
- Read by generate-env.mjs AFTER .env.development; non-empty values
  override the matching key, so the merged result drives every per-app
  .env the generator writes
- Empty values fall through to the .env.development defaults — a
  freshly-copied .env.secrets.example is a no-op
- One source of truth for all dev secrets, propagated to every app
  with one `pnpm setup:env`

Files:
- `.env.secrets.example` — committed template documenting all known
  secret keys (mana-stt, mana-llm, auth KEK, sync JWT, MinIO, third-
  party APIs). Devs `cp .env.secrets.example .env.secrets` and fill in.
- `.gitignore` — ignores .env.secrets, allows .env.secrets.example
- `scripts/generate-env.mjs` — loads .env.secrets if present, prints
  "Loaded N secrets from .env.secrets" so devs see the override
  taking effect
- `scripts/setup-secrets.mjs` + `pnpm setup:secrets` — convenience
  script that SSHes to mana-server, greps the prod .env for the keys
  defined in .env.secrets.example, and writes them locally. Confirms
  before overwriting an existing .env.secrets unless --force is set;
  reports which keys couldn't be found on the remote so devs know
  what's left to fill manually
- `docs/LOCAL_DEVELOPMENT.md` + `docs/ENVIRONMENT_VARIABLES.md` —
  walk-through and architecture diagram update

Verified end-to-end:
- `rm .env.secrets apps/mana/apps/web/.env && pnpm setup:env` →
  STT key empty (no regression for devs who haven't opted in)
- `pnpm setup:secrets --force && pnpm setup:env` →
  STT key propagated, "Loaded 3 secrets from .env.secrets" in output
- POST /api/v1/voice/transcribe with a real audio file →
  full transcript back via gpu-stt.mana.how, end-to-end working

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 17:50:37 +02:00