Commit graph

640 commits

Author SHA1 Message Date
Till JS
39a6f42209 fix(mana-credits): correct pnpm workspace filter (@mana/credits-service, not @mana/credits)
Build was succeeding-by-luck because the wrong filter resolved to
nothing → pnpm installed all workspace deps. After Phase 3.A added
the new grant route, the install pruning must have changed enough
that the build started failing with /app/node_modules: not found.
Fix the filter to match the real package name.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:27:18 +02:00
Till JS
dbe24acfc4 feat(feedback,credits): community-credit grants — +5 submit / +500 ship / +25 reaction-match
Phase 3.A des feedback-rewards-and-identity-Plans. Direkter Reziprozitäts-
Loop: User kriegt sofort etwas zurück fürs Mitwirken, Originalwunsch-
Eulen werden beim Ship belohnt, Reagierer kriegen einen Anteil.

mana-credits:
- Neuer Endpoint POST /api/v1/internal/credits/grant + grantCredits()
  Service-Methode mit Idempotency via metadata.referenceId.
- transaction_type-Enum erweitert um 'grant' (eigener Typ statt
  Mismatch mit 'refund').
- Migration 0001_grant_transaction_type.sql + partial-Index auf
  metadata->>'referenceId' für O(log n) Idempotency-Lookup.

mana-analytics:
- FeedbackService stempelt sofort +5 Credits beim createFeedback (top-
  level only, Replies bekommen nichts), wenn Mindest-20-Zeichen erfüllt
  und Rate-Limit (10/User/24h via feedback_grant_log) nicht überschritten.
- adminUpdate triggert beim FRISCHEN Übergang nach 'completed':
  +500 Credits an Original-Wisher + +25 an alle, die mit 👍 oder 🚀
  reagiert haben. Doppel-Pay strukturell unmöglich via referenceId
  (`<id>_shipped`, `<id>_reaction_<userId>`).
- Founder-Whitelist via FEEDBACK_FOUNDER_USER_IDS env (verhindert
  Self-Reward).
- Drop voteCount-Spalte (durch reactions/score seit 0002 ersetzt).
- Migration 0003_grant_log_drop_vote_count.sql idempotent, lokal +
  prod eingespielt.

Plan: docs/plans/feedback-rewards-and-identity.md (Phase 3.A-3.F).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:13:46 +02:00
Till JS
099cac4a01 feat(auth): explicit bootstrap-singletons endpoint + idempotent functions (F4 robust)
The F4 server-side singleton bootstrap was fire-and-forget at signup
time — a transient mana_sync outage during registration would leave the
user with no singleton and only the in-store `getOrCreateLocalDoc()`
fallback to race on the first write. The signup-hook is still the
happy-path zero-latency bootstrap; this commit adds a deliberate
reconciliation path that converges on every boot.

- Idempotent `bootstrapUserSingletons` / `bootstrapSpaceSingletons`:
  both functions now existence-check sync_changes before INSERT and
  return boolean (true=inserted, false=skipped).
- New endpoint `POST /api/v1/me/bootstrap-singletons` — JWT-gated under
  the existing `/api/v1/me/*` prefix. Provisions the caller's
  userContext and the kontextDoc for every Space they're a member of.
  Returns `{ ok, bootstrapped: { userContext, spaces: { id: bool } } }`.
- Webapp `(app)/+layout.svelte` calls the endpoint once per
  authenticated boot, after `restoreClientIdFromDexie()` and before
  `createUnifiedSync.startAll()`. Best-effort; failures swallow into a
  console warning and the in-store fallback still covers the rare
  race window.

Plan: docs/plans/sync-field-meta-overhaul.md (F4-robust row).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 01:38:14 +02:00
Till JS
3df7391905 feat(auth): bootstrap per-Space kontextDoc on Space-creation (F4 follow-up)
Symmetrically extends the F4 server-side singleton bootstrap to the
per-Space `kontextDoc`. Every Space-creation — Personal at signup and
brand/club/family/team/practice via the org plugin — now writes an empty
kontextDoc row straight into mana_sync.sync_changes with origin='system',
client_id='system:bootstrap'. Fresh clients pull the row instead of
racing on a local insert that the next pull would clobber.

- New `bootstrapSpaceSingletons(spaceId, ownerUserId, syncSql)` in
  services/mana-auth/src/services/bootstrap-singletons.ts; shared
  `buildFieldMeta` helper extracted.
- `createBetterAuth(databaseUrl, syncDatabaseUrl, webauthn)` now takes
  the sync-DB URL and lazy-creates a module-scoped postgres pool for
  the bootstrap inserts.
- Hook into `databaseHooks.user.create.after` (only on `created: true`
  from createPersonalSpaceFor) and `organizationHooks.afterCreateOrganization`.
- Webapp `kontextStore.ensureDoc()` made private as `getOrCreateLocalDoc()` —
  same fallback role as userContextStore's after F5. Public API is now just
  setContent + appendContent.

Plan: docs/plans/sync-field-meta-overhaul.md (F4-fu row in Shipping Log).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 01:21:31 +02:00
Till JS
5387681936 fix(mana-analytics): copy shared-logger into installer (transitive workspace dep)
shared-hono depends on @mana/shared-logger; without it, the bun runtime
crashes on first import with ENOENT for the workspace symlink target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:52:33 +02:00
Till JS
db01ddccc7 fix(mana-analytics): multi-stage Dockerfile that resolves workspace deps
Switches the build context to repo-root so the pnpm-workspace install
can pull in @mana/shared-hono. Mirrors the mana-auth/mana-ai pattern
(node+pnpm installer stage → bun runtime stage).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:50:25 +02:00
Till JS
8b0a943e71 feat(mana-analytics): pseudonym + reactions + public feed + admin
Macht mana-analytics zur Backend-Heimat des Public-Community-Hubs.

- Pseudonym: createDisplayHash(userId, secret) + generateDisplayName()
  produzieren deterministisch "Wachsame Eule #4528" pro User. 100
  Adjektive × 80 Tieren × 10000 Suffixe → ~80M Kombinationen. 7 unit
  tests, 0 PII im Output.
- Schema-Erweiterung user_feedback: display_hash, display_name,
  module_context, parent_id (1-level Threading), reactions jsonb
  (cached emoji→count), score (cached weighted sort).
- feedback_votes ersetzt durch feedback_reactions (Slack-Pattern:
  unique(feedback_id, user_id, emoji), pro User mehrere Emojis möglich).
- Service: createFeedback stempelt display_hash + display_name. Neue
  Methoden getPublicFeed (redacted), getReplies, toggleReaction
  (rebuilt reactions+score). Admin-Methoden adminListAll/adminUpdate
  founder-tier-gated im Route-Layer.
- Routes:
  /api/v1/public/feedback/*  — anonymous reads (kein Auth, kein
                              userId/displayHash/deviceInfo im Output)
  /api/v1/feedback/*         — auth-required Submit/React/Manage,
                              plus :id/replies, :id/react, /admin
- Config: neuer FEEDBACK_PSEUDONYM_SECRET-Env-Var seedet die Hashes;
  Rotation re-keyt nur künftige Pseudonyme, alte Records bleiben stabil.

Migration 0002_public-community-foundation.sql idempotent, lokal +
prod (mana-server) eingespielt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 00:00:35 +02:00
Till JS
c07db300b0 feat(sync): F4 — server-side singleton bootstrap
Closes the userContext race-on-first-mount that surfaced as a
"10 fields overwritten" conflict toast pre-F2. Adds a fire-and-forget
hook in the /register flow that writes the per-user `userContext`
singleton straight into `mana_sync.sync_changes` with
`client_id='system:bootstrap'` and `origin='system'`.

Behavior:
- On successful `signUpEmail`, `bootstrapUserSingletons(userId, syncSql)`
  inserts a `profile/userContext` row with the empty-default shape that
  mirrors the webapp's `emptyUserContext()` factory in
  `apps/mana/apps/web/src/lib/modules/profile/types.ts`.
- The receiving client treats the change as origin='server-replay'
  on apply (per F2 conflict-gate), so no toasts on first pull.
- Failure is logged but does not abort registration — the webapp's
  existing `ensureDoc()` fallback still works during the F4→F5
  transition.

Module-scoped postgres pool (max=2 connections) lazy-initialized on
first signUp; reused for the lifetime of the process. Same pattern as
`UserDataService.getSyncSql`.

Out of scope for F4:
- `kontextDoc` is per-Space (not per-user) — bootstrap there will be
  hooked into the Space-creation flow, not /register. The webapp's
  `ensureDoc()` for kontextDoc stays as-is for now.
- Webapp `ensureDoc()` removal is F5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:18:54 +02:00
Till JS
6bb9d77be9 feat(sync): F3 — drop updatedAt as a synced data field
Removes `updatedAt` from the wire protocol and from every Local-prefixed
record type. Replaced by two orthogonal mechanisms — deriveUpdatedAt()
for read-side public-facing values, _updatedAtIndex shadow for indexed
sorts.

Local-side:
- New `_updatedAtIndex` shadow column. Stamped by the Dexie creating /
  updating hook on every write. Stripped from the pending-change payload
  so it never travels to mana-sync. Indexed in Dexie v53 on the 22 tables
  that previously indexed `updatedAt`.
- `deriveUpdatedAt(record)` in sync.ts returns max(__fieldMeta[*].at) so
  the public-facing Task / Note / etc. shape keeps an `updatedAt: string`
  property without holding it as data.
- Type-converters across ~60 module/queries.ts and types.ts files now
  call `deriveUpdatedAt(local)` instead of reading `local.updatedAt`.

Module-store sweep:
- Regex codemod removed `updatedAt: new Date().toISOString()` /
  `: now` / `: now()` / `: nowIso()` stamping from 121 store files
  (~382 call sites total). Single-property update calls
  (`{ updatedAt: now }`) collapsed to `{}`; touch-only patterns
  (writing/drafts, writing/generations) kept the call as a no-op
  because the hook now stamps `_updatedAtIndex` automatically on
  any Dexie modification.
- Local* interfaces stripped of `updatedAt: string` (43 types.ts files).
  Public-facing types (Task, Note, Mission, Agent, …) keep
  `updatedAt: string` as a computed read-side property.
- Companion's chat conversation now sorts on a real
  `lastMessageAt` data field instead of touching `updatedAt`.
- Session-only stores (times/session-alarms, session-countdown-timers)
  stamp `updatedAt: now` directly because they're not in Dexie and
  have no field-meta layer to derive from.

Sync engine:
- applyServerChanges sets `_updatedAtIndex` itself when applying
  server changes (max of server-field times for updates, recordTime
  for inserts) so server-replays land orderable.
- Dropped the legacy `localUpdatedAt` fallback — every record now has
  `__fieldMeta`, the per-field at is the canonical source.
- Soft-delete tombstone path stops stamping `updatedAt: serverTime`,
  uses `_updatedAtIndex` instead.

Server-side:
- mana-ai iteration-writer no longer emits `updatedAt` in
  sync_changes.data; receivers derive it from the field-meta map.
- mana-sync types: no change (the wire format already uses
  `field_meta` / `at` from F1).

Out of scope: backend Drizzle schemas (mana-credits, mana-events, …)
keep their `updated_at` columns. Those are pure server-internal — not
part of the sync_changes / __fieldMeta mechanism F3 cleans up.

Tests + checks:
- 0 svelte-check errors over 7652 files.
- 29/29 sync.test.ts (vitest).
- 61 mana-ai bun tests.
- mana-sync go test ./... cached green.

Plan: docs/plans/sync-field-meta-overhaul.md F3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 23:12:22 +02:00
Till JS
f10439e369 fix(mana-analytics): point migration at public.feedback_status, not feedback.*
Drizzle's pgEnum() ohne pgSchema-Wrap landet immer in public — der
schemaFilter versteckt das nur im Diff (siehe Repo-Memory:
"Drizzle enums with schemaFilter must use pgSchema().enum()"). Die
Tabelle feedback.user_feedback referenziert die Enums quer aus public,
das funktioniert; aber die ALTER-TYPE-Statements in der ursprünglichen
Migration zielten auf feedback.feedback_status / feedback.feedback_category
und hätten damit nichts gefunden.

Lokal verifiziert (mana_platform.public.feedback_status,
mana_platform.public.feedback_category):
- 6 Status-Werte umbenannt → submitted/under_review/planned/in_progress/completed/declined
- Default-Status auf 'submitted'
- Category 'onboarding-wish' hinzugefügt
- Re-Run idempotent (DO-Blöcke + ADD VALUE IF NOT EXISTS)

Mittelfristig sollte feedbackSchema.enum(...) verwendet werden, damit
Enums tatsächlich im feedback-Namespace landen — eigener Refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 21:55:45 +02:00
Till JS
ba6274edbe refactor(feedback): align package + DB enums, plan central hub
Macht @mana/feedback zur SSOT für alle Nutzer-Feedback-Categories und
-Status — Voraussetzung dafür, dass Onboarding-Wishes, NPS, Churn-Feedback
etc. künftig dort landen.

- Status-Enum: DB-Werte umbenannt new/reviewed/done/rejected →
  submitted/under_review/completed/declined (Package gewinnt). PG≥10
  ALTER TYPE … RENAME VALUE ist non-destructive.
- Category 'praise' ins Package aufgenommen (war nur in DB).
- Category 'onboarding-wish' neu in Package + DB für den Wish-Step.
- Default status in DB: 'new' → 'submitted'.
- CreateFeedbackInput.isPublic optional → Service reicht durch, default
  bleibt true; private Categories wie onboarding-wish setzen false.
- Schema-Datei mit SSOT-Kommentar versehen, der Drift in Zukunft verhindert.

Hand-authored Migration unter services/mana-analytics/drizzle/0001_*.sql
weil drizzle-kit push Enum-Werte nicht zuverlässig umbenennt. Manuell
einspielen vor nächstem db:push:

  psql "\$DATABASE_URL" -f services/mana-analytics/drizzle/0001_align-feedback-enums.sql

Plan in docs/plans/feedback-hub.md (Phase 0–4); Phase 0 + 1 jetzt, 2-4
deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 21:52:25 +02:00
Till JS
7766ea5021 docs(plans): mark llm-fallback-aliases SHIPPED, add M-by-M commit table
All 5 milestones landed today in one continuous session: registry,
health cache, fallback router, observability, and consumer migration.
115 service-side tests, validator covers 2538 files.
2026-04-26 21:27:57 +02:00
Till JS
fea3adf5fe feat(llm-aliases): M5 — migrate consumers to MANA_LLM aliases
Final milestone of docs/plans/llm-fallback-aliases.md. Every backend
caller now requests models via the `mana/<class>` alias system instead
of hardcoded `ollama/...` strings. mana-llm resolves aliases through
`services/mana-llm/aliases.yaml` with health-aware fallback (M3) and
emits resolved-model + fallback metrics (M4).

SSOT moved to `packages/shared-ai/src/llm-aliases.ts` so apps/api,
apps/mana/apps/web, and services/mana-ai all import the same
`MANA_LLM` constant via the existing `@mana/shared-ai` workspace
dependency. Three additional sites (memoro-server, mana-events,
mana-research) inline the alias string with a SSOT comment because
they don't pull @mana/shared-ai today.

Migrated 14 sites across 10 files:
- apps/api: writing(LONG_FORM), comic(STRUCTURED), context(FAST_TEXT),
  food(VISION), plants(VISION), research orchestrator (3 tiers
  collapsed to STRUCTURED+FAST_TEXT/LONG_FORM)
- apps/mana/apps/web: voice/parse-task + parse-habit (STRUCTURED)
- services/mana-ai: planner llm-client + tick.ts (REASONING)
- services/mana-events: website-extractor (STRUCTURED, inlined)
- services/mana-research: mana-llm client (FAST_TEXT, inlined)
- apps/memoro/apps/server: ai.ts (FAST_TEXT, inlined)

Legacy env-vars removed: WRITING_MODEL, COMIC_STORYBOARD_MODEL,
VISION_MODEL, MANA_LLM_DEFAULT_MODEL. The chain in aliases.yaml is
now the single tuning surface; SIGHUP reloads it without redeploys.

New `scripts/validate-llm-strings.mjs` regex-scans 2538 files for
hardcoded `<provider>/<model>` strings and fails the build if any
land outside the SSOT or the explicitly-allowed paths (image-gen
modules, model-inspector code, this validator itself, the registry).
Wired into `validate:all` next to the i18n + theme validators.

Verified: `pnpm validate:llm-strings` clean, `pnpm --filter @mana/api
type-check` clean, `pnpm --filter @mana/ai-service type-check`
clean. Web type-check has 2 pre-existing errors in
SettingsSidebar.svelte (i18n MessageFormatter type drift, last
touched in 988c17a67 — unrelated to this work).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 21:26:03 +02:00
Till JS
8a49e3ffd5 feat(mana-llm): M4 — observability, debug endpoints, SIGHUP reload
- `X-Mana-LLM-Resolved: <provider>/<model>` header on non-streaming
  responses. Streaming clients read the same info from each chunk's
  `model` field (SSE headers go out before the chain is walked).
- Three new Prometheus metrics: `mana_llm_alias_resolved_total{alias,
  target}` (which concrete model an alias resolved to per request),
  `mana_llm_fallback_total{from_model, to_model, reason}` (each
  fallback transition), `mana_llm_provider_healthy{provider}` (gauge,
  mirrors the circuit-breaker).
- New debug endpoints: `GET /v1/aliases` (registry inspection — chain
  + description per alias, useful for confirming SIGHUP reloads),
  `GET /v1/health` (full per-provider liveness snapshot — failure
  counter, last error, unhealthy-until backoff).
- `kill -HUP <pid>` reloads `aliases.yaml`. Parse errors leave the
  previous good state in memory and log the rejection.
- `ProviderHealthCache.add_listener()` for cache→metrics decoupling:
  the gauge is updated via a transition-only listener wired in main.py
  rather than the cache importing prometheus_client itself.
- Request-side metrics now use the requested model string, success-side
  uses the resolved one. So `mana_llm_llm_requests_total{provider="ollama",
  model="gemma3:12b"}` reflects actual upstream load even when callers
  used `mana/long-form` aliases.

16 new observability tests (test_m4_observability.py): listener
fire-on-transition semantics, exception-isolation, multi-listener,
counter increments, gauge writes, end-to-end alias→metric flow,
v1/aliases + v1/health endpoint shape, response.model carries the
resolved target after fallback. Total suite: 115/115 in 1.6s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 20:52:28 +02:00
Till JS
3046da3b19 feat(mana-llm): M3 — health-aware router with alias + chain fallback
Replaces the old Ollama→Google special-case auto-fallback with the
unified pipeline: caller passes either a direct provider/model or an
alias from the `mana/` namespace; the router resolves to a chain and
walks it skipping unhealthy providers (per ProviderHealthCache from M2),
trying each entry, marking provider unhealthy on retryable errors and
falling through to the next.

Retryable: ConnectError, ReadTimeout, RemoteProtocolError, 5xx,
ProviderRateLimitError. Propagated (don't fall back, don't poison the
cache): ProviderCapabilityError, ProviderAuthError, ProviderBlockedError,
4xx, unknown exception types. The cache stays "what the network told us
about this provider's liveness" — caller errors don't muddy that signal.

Streaming: pre-first-byte fallback only. Once a chunk has been yielded
the provider is committed; mid-stream errors propagate as-is so we
don't splice two voices into one output.

`NoHealthyProviderError` (HTTP 503) carries a structured attempt log —
each chain entry shows up as `(model, reason)` so the cause of a 503
is visible in the response and metrics, not only in service logs.

main.py wires the lifespan: aliases.yaml is loaded, ProviderHealthCache
created, ProviderRouter takes both as constructor deps, HealthProbe
spawned with cheap HTTP probes per configured provider (Ollama
/api/tags, OpenAI-compat /v1/models with Bearer header). Google is
skipped — google-genai SDK has no obvious cheap probe; the call-site
fallback handles real errors.

22 new router tests (test_router_fallback.py): chain walking, capability
& auth propagation, 5xx vs 4xx differentiation, rate-limit retry,
all-fail → NoHealthyProviderError, direct provider strings bypass
aliases, streaming pre-first-byte fallback, mid-stream-failure does
NOT fall back, empty stream commits without retry, cache feedback on
success/failure/non-retryable. Existing test_providers.py updated for
the new constructor signature; all 99 service tests green via the dev
container (Python 3.12).

Legacy purged: `_ollama_concurrent`, `_ollama_health_cache`,
`_can_fallback_to_google`, `_should_use_ollama`, `_fallback_to_google`,
`_get_ollama_health_cached` all gone. The `auto_fallback_enabled` /
`ollama_max_concurrent` settings remain in config.py for now (M5 will
remove them along with the per-feature env-var overrides).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 20:44:16 +02:00
Till JS
59557e62d7 feat(mana-llm): M2 — ProviderHealthCache + background probe loop
Per-provider liveness with circuit-breaker semantics. The router (M3)
will read `is_healthy()` to skip dead providers in a chain; the probe
loop and the call-site fallback handler write state via
`mark_healthy` / `mark_unhealthy`.

State machine: 1st failure stays healthy (transient blips happen);
2nd consecutive failure trips the breaker and sets a 60s backoff
window during which `is_healthy → False`. After the window the
provider is half-open again — next call exercises it, success
resets, failure re-arms.

HealthProbe is the background asyncio.Task that pings every
registered provider every 30s with a 3s timeout. Probes run
concurrently per tick and one bad probe can't sink the loop. Probe
functions are injected (`{name: async-fn}`) so this module stays
decoupled from the provider classes — the wiring lives in main.py
where we already know which providers are configured.

32 new tests (FakeClock for deterministic backoff timing, slow-probe
helpers for parallelism + timeout, lifecycle tests for start/stop
idempotency and tick-after-error survival). 64/64 alias+health tests
green.

Not yet wired into the request path — that's M3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 20:29:57 +02:00
Till JS
dff8629e1d feat(mana-llm): M1 — AliasRegistry + aliases.yaml SSOT
First milestone of the LLM-fallback plan (docs/plans/llm-fallback-aliases.md).
Introduces the `mana/<class>` namespace; the registry parses + validates
aliases.yaml at startup and reloads on demand. Schema-rejects empty
chains, missing provider prefixes, alias names outside the reserved
namespace, default→unknown references, etc.

Reload semantics: parse error keeps the previous good state in memory
so a typo + SIGHUP doesn't take the service down.

5 aliases ship with the initial config: fast-text, long-form, structured,
reasoning, vision. Each chain ends with a cloud provider so the system
keeps working when the GPU server is offline.

32 unit tests covering happy path, schema validation, namespace check,
reload safety, and a guard that the shipped aliases.yaml itself parses.
M2 (health-cache + probe-loop) and M3 (router fallback execution) build
on this; aliases are not yet wired into the request path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 20:23:51 +02:00
Till JS
dff02d24a9 fix(mana-media): HEIC uploads from Chrome — sniff + transcode at the edge
iPhone HEIC photos uploaded through Chrome on macOS landed as
`mimeType: application/octet-stream` because Chrome doesn't recognise
the HEIC MIME and `file.type` was empty. The transform endpoint then
refused with `Transform only supported for images` (HTTP 400) and
the wardrobe Try-On flow surfaced this as `mana-media transform
failed for <id>: HTTP 400`. Even fixing the MIME wouldn't have been
enough — sharp's prebuilt binary ships the heif container format
without a HEVC decoder plugin (libde265 is omitted for patent
reasons), so the actual decode would still throw.

Three-part fix at the upload edge:

1. New `services/sniff.ts` — magic-byte sniffer for image MIMEs.
   Reads the first ~16 bytes and recognises JPEG, PNG, GIF, WebP,
   BMP, TIFF, HEIC, HEIF, AVIF. Returns `null` for everything else
   so the caller can fall back to whatever the browser claimed.

2. Upload route — sniffs every upload before passing the buffer to
   `uploadService.upload`. Trusts magic bytes over `file.type` so
   Chrome's empty-type HEIC still lands with `image/heic`. Removes
   the entire class of `application/octet-stream` rows for files
   that are obviously images.

3. HEIC/HEIF transcoded to JPEG at upload via the new
   `heic-convert` dependency (pure-JS WASM, no system libs needed).
   The original buffer is replaced with the JPEG bytes, the MIME
   becomes `image/jpeg`, and the filename's `.HEIC` extension is
   rewritten to `.jpg`. Downstream code (process pipeline, transform
   endpoint, sharp) then deals exclusively with formats sharp can
   actually decode. Failure path returns HTTP 500 with a clear
   `HEIC conversion failed` error so the client knows it wasn't a
   generic crash.

Bonus, transform endpoint hardening: `mimeType.startsWith('image/')`
gate now also accepts a row whose stored MIME is wrong (legacy
`application/octet-stream` from before this fix) when the actual
bytes sniff as an image. Lets old broken rows still serve where
the format itself is decodable; the upload-side fix prevents new
ones from existing.

Sharp 0.33 on this machine reports `heif: 1.18.2` for the container
but rejects the actual HEVC compressed bitstream — confirmed by the
exact error string `No decoding plugin installed for this
compression format (11.6003)`. Going through `heic-convert` first
sidesteps that entirely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 13:46:13 +02:00
Till JS
e66654068f feat(auth): error-classification layer + passkey end-to-end
Two interlocking fixes driven by a production lockout incident.

## Bug that motivated this

A fresh schema-drift column (auth.users.onboarding_completed_at) made
every Better Auth query crash with Postgres 42703. The /login wrapper
swallowed the non-2xx and mapped it onto a generic "401 Invalid
credentials" AND bumped the password lockout counter — so 5 legit
login attempts against a broken DB would have locked every real user
out of their own account. Same wrapper pattern on /register, /refresh,
/reset-password etc. The 30-minute hunt ended in a one-off repro
script that finally surfaced the real Postgres error.

The user-facing passkey button additionally returned generic 404s on
every login-page mount because the route wasn't registered (the DB
schema existed, the Better Auth plugin wasn't wired).

## Phase 1 — Error classification (services/mana-auth/src/lib/auth-errors)

- 19-code AuthErrorCode taxonomy (INVALID_CREDENTIALS, EMAIL_NOT_VERIFIED,
  ACCOUNT_LOCKED, SERVICE_UNAVAILABLE, PASSKEY_VERIFICATION_FAILED, …)
- classifyFromResponse/classifyFromError handle: Better Auth APIError
  (duck-typed on `name === 'APIError'`), Postgres errors (23505 unique,
  42703/08xxx → infra), ZodError, fetch/ECONNREFUSED network errors,
  bare Error, unknown.
- respondWithError routes the structured response, logs at the right
  level, fires the correct security event, and CRITICALLY only bumps
  the lockout counter for actual credential failures — SERVICE_UNAVAILABLE
  and INTERNAL never touch lockout.
- All 12 endpoints in routes/auth.ts refactored (/login, /register,
  /logout, /session-to-token, /refresh, /validate, /forgot-password,
  /reset-password, /resend-verification, /profile GET+POST,
  /change-email, /change-password, /account DELETE).
- Fixed pre-existing auth.api.forgetPassword typo (→ requestPasswordReset).
- shared-logger + requestLogger middleware wired in index.ts; all
  console.* calls in the service removed.

## Phase 2 — Passkey end-to-end (@better-auth/passkey 1.6+)

- sql/007_passkey_bootstrap.sql: idempotent schema alignment —
  friendly_name→name, +aaguid, transports jsonb→text, +method column
  on login_attempts.
- better-auth.config.ts: passkey plugin wired with rpID/rpName/origin
  from new webauthn config section. rpID defaults to mana.how in prod
  (from COOKIE_DOMAIN), localhost in dev.
- routes/passkeys.ts: 7 wrapper endpoints (capability probe,
  register/options+verify, authenticate/options+verify with JWT mint,
  list, delete, rename). Each routes errors through the classifier;
  authenticate/verify promotes generic INVALID_CREDENTIALS to
  PASSKEY_VERIFICATION_FAILED.
- PasskeyRateLimitService: in-memory per-IP (options: 20/min) and
  per-credential (verify: 10 failures/min → 5 min cooldown) buckets.
  Deliberately separate from the password lockout — different factor,
  different blast radius.
- Client: authService.getPasskeyCapability() async probe, memoised per
  session. authStore.passkeyAvailable reactive state. LoginPage gates
  on === true so a slow probe doesn't flash the button in.
- AuthResult grew a code: AuthErrorCode field; handleAuthError in
  shared-auth prefers the server envelope over the legacy message
  heuristics.

## Tests

- 30 unit tests for the classifier covering every branch (including
  the exact Postgres 42703 shape that started this).
- 9 unit tests for the rate limiter.
- 14 integration tests for the auth routes — the regression test
  explicitly asserts "upstream 500 → 503 + zero lockout bumps".
- 101 tests pass, 0 fail, 30 pre-existing skips unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 01:52:51 +02:00
Till JS
5aecf8b90d feat(onboarding): M2 — route guard + shell + Screen 1 (name)
- PATCH /api/v1/me/profile in mana-auth (name, image with 1–80 char
  validation) — powers the Screen-1 save
- (app)/+layout.svelte:
  * isOnboarding derived from pathname
  * handleAuthReady loads onboardingStatus, redirects brand-new users
    to /onboarding/name (fire-and-forget so sync/data-layer init keeps
    running in parallel)
  * chrome (PillNav, wallpaper, bottom-stack) hidden in onboarding mode;
    AuthGate still wraps so the flow enforces authentication
- /onboarding/+layout.svelte: full-viewport shell with progress dots
  (1/3, 2/3, 3/3) and a skip-all that marks the flow complete and
  sends the user home
- /onboarding/+page.svelte: redirects bare entry to /onboarding/name
- /onboarding/name/+page.svelte: text input (1–40 chars), Enter = Weiter,
  skip falls back to email local-part so Screen 2's greeting is never
  empty

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 22:49:52 +02:00
Till JS
5a92e1168b feat(onboarding): M1 — data model + endpoints + client store
- auth.users: new nullable `onboarding_completed_at` column
- new /api/v1/me/onboarding routes: GET, POST /complete, PATCH /reset
- onboardingStatus Svelte store in the web app that reads/writes via
  those endpoints (no JWT claim so completing the flow takes effect
  without a token re-mint)
- docs/plans/onboarding-flow.md adjusted: no backfill (launch without
  existing users), better-auth `name` clarified, 7 templates including
  "Arbeit" confirmed

Foundation for the 3-screen first-login flow (Name → Look → Templates).
No UI and no route guard yet — those ship in M2 when the redirect target
actually exists. Schema change is a pure column-add, applied via
`pnpm --filter @mana/auth db:push`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 22:24:49 +02:00
Till JS
f7536bc0b9 feat(shared-ai): route compactor to Haiku-tier model by default (M2.5)
compactHistory() now defaults to DEFAULT_COMPACT_MODEL =
'google/gemini-2.5-flash-lite' when the caller doesn't override. Lite
is ~3–5x cheaper than gemini-2.5-flash with near-identical
summarisation quality — summarisation doesn't need the same tier as
reasoning + tool-calling, and the compactor fires exactly when token
spend is highest, so the cheaper route saves exactly where it matters.

CompactHistoryOptions.model is now optional. All three consumers
(mana-ai tick, webapp Companion, webapp Mission runner) drop their
explicit gemini-2.5-flash override and let the default apply.

This is the pragmatic M2.5: no mana-llm changes. The "tier" abstraction
(X-Model-Tier header, env-routed aliases) from the Claude-Code report
makes sense only once multiple utility tasks need cheaper routing —
topic-detection, classification, command-injection checks. Today only
the compactor wants it, and a model constant is the simplest contract
that works.

2 new tests (default applied + override honoured). 79 shared-ai tests
green, all three consumers type-check clean. One pre-existing unrelated
type error in apps/mana/apps/web/src/lib/modules/wardrobe/queries.ts
(not touched by this commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 18:26:50 +02:00
Till JS
89388fb369 refactor(mana-auth): move enums from public to auth schema
pgEnum() defaults to the public schema. Because
drizzle.config.ts sets schemaFilter: ['auth'], push introspection
never saw the enums and kept re-emitting CREATE TYPE access_tier ...,
failing with 42710. This blocked setup-databases.sh from advancing
mana-auth past the enum declarations and silently masked other drift
(e.g. the new `kind` column on auth.users going un-pushed).

Source side: three enums now live on authSchema via
authSchema.enum(...) instead of pgEnum(...). DB side: migration 006
recreates access_tier / user_role / user_kind inside the auth schema,
repoints auth.users.access_tier and auth.users.role via ::text cast
(preserving all data and defaults), and drops the old public types.

After this, `drizzle-kit push --force` reports "No changes detected"
on a clean DB and the broader `pnpm setup:db` run is green without
workarounds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:36:39 +02:00
Till JS
52f53c844b chore(mana-auth): add 005 persona tables migration
Documents the SQL that was applied manually to match the personas.ts
Drizzle schema introduced in 493db0c3b. Idempotent. See
docs/plans/mana-mcp-and-personas.md for the design. Required because
the spaces tables created alongside personas sit outside the auth
schemaFilter, and pre-existing public enums would otherwise trip
drizzle-kit push (resolved separately in migration 006).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:36:26 +02:00
Till JS
72f7978ed4 feat(agent-loop): expose compactionsDone + compactedReminder producer
Closes the loop on M2: when the compactor fires, the LLM needs to know
it's now seeing a <compact-summary> instead of raw turns so it
doesn't waste a turn asking about lost details or re-executing tools
whose responses are gone.

shared-ai:
  - LoopState grows `compactionsDone: number` (cap-1 by current loop
    policy, but shape kept as count for future multi-compact cycles).
  - runPlannerLoop populates it on each reminder-channel call. New
    loop test asserts [0, 1] sequence: round 1 before compaction,
    round 2 after.

mana-ai:
  - New producer `compactedReminder` — fires severity=info when
    compactionsDone >= 1, wrapped in a German one-liner ("frag nicht
    nach verlorenen Details").
  - Injected FIRST in buildReminderChannel so the LLM frames the rest
    of the round with "I'm looking at a summary" context. Metric
    surface stays `{producer='compacted', severity='info'}`.

4 new reminder tests (3 pure producer + 1 composition-ordering) +
1 loop-wiring test. 77 shared-ai, 20 reminders.test.ts — green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:36:21 +02:00
Till JS
eb8fac23ec fix(personas): exact tool_use_id pairing + CI drift audit
Two loose ends from M3/M4:

1. Tool_use_id-based error attribution in the persona-runner
-----------------------------------------------------------
The previous collectActionsFromMessage() flipped the *most recent*
ActionRow to 'error' when a tool_result carried is_error:true. That was
fine as long as Claude invoked tools strictly in sequence, but when
the planner pipelines multiple tools in one turn, a later tool_result
carries an earlier tool_use_id — the last-action fallback mis-
attributes the error.

runMainTurn() now keeps a tool_use_id → action-index Map for the
duration of the tick. On tool_use we stash block.id, on tool_result we
look up the exact ActionRow via tool_use_id and flip that one. The
"flip last" path survives as a pure fallback if a future SDK ever
ships a block without an id.

2. New audit:encrypted-tools script
-----------------------------------
scripts/audit-encrypted-tools.ts — loads registerAllModules() and
apps/mana/…/crypto/registry.ts, diffs every ToolSpec.encryptedFields
against the authoritative web-app ENCRYPTION_REGISTRY.

Catches three classes of drift:
- missing-table : tool declares a table the web-app doesn't encrypt
- field-drift   : both agree a table is encrypted but the field lists
                  differ (half-encryption in the wire is silent death)
- disabled      : web-app has enabled:false while the tool still
                  encrypts — advisory warning, not a fail

Negative-tested by injecting a deliberate drift on todo.create +
todo.list (shortened ENCRYPTED_FIELDS to ['title']); the auditor
flagged both tools with full field diffs, restore returned to green.

Wired into `pnpm run validate:all` so the contract survives future
edits on either side. Fills the M4 audit gap noted in
project_mana_mcp_personas.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:34:52 +02:00
Till JS
83a4606a9a feat(mana-ai): wire context-window compactor into mission runner (M2.3)
The Claude-Code wU2 pattern goes live. Every mission run now passes a
compactor into runPlannerLoop that will fire once if cumulative token
usage crosses 92% of MANA_AI_COMPACT_MAX_CTX (default 1_000_000, the
gemini-2.5-flash ceiling). Override via env for deployments on smaller
models; set to 0 to disable entirely.

The compactor reuses the planner's own LlmClient + gemini-2.5-flash
model for now. When mana-llm grows a Haiku tier we'll route the
compactor there — it's pure summarisation and a cheaper model saves
tokens exactly where they matter.

New metrics:
  - mana_ai_compactions_triggered_total — counter, one per firing
  - mana_ai_compacted_turns — histogram, how many middle turns got
    folded each time (< 3 ⇒ maxCtx is probably misconfigured)

Logs print a 60-char tail of the summary.goal so the "what was this
mission doing again" question survives a compaction.

No new tests here — compactHistory and the loop wiring are already
covered by the 22 tests in shared-ai (M2.1 + M2.2). The 57 existing
mana-ai bun tests stay green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:28:20 +02:00
Till JS
5a5e24f582 chore(docker): drop obsolete services/mana-search/docker-compose.dev.yml
Superseded by the top-level docker-compose.dev.yml (which defines
searxng + redis as part of the unified dev stack via `pnpm docker:up`).
This per-service file was an artefact from before the unified setup
and no script / doc / README still references it.

An orphan `mana-searxng-dev` + `mana-search-redis-dev` had been running
from this file for ~2 weeks, squatting on the host's port 8080. Every
first `pnpm dev:mana:all` after a cold machine start would fail with

  Bind for 0.0.0.0:8080 failed: port is already allocated

because the top-level compose's `mana-searxng` service couldn't take
8080 while the orphan held it. The second invocation silently
"worked" — docker saw the freshly-created mana-searxng container and
skipped the bind step on the idempotent up, leaving it healthy but
only reachable inside the docker network (8080/tcp, no external
publish).

Cleanup already done out-of-band:
  docker compose -f services/mana-search/docker-compose.dev.yml down
  docker compose -f docker-compose.dev.yml up -d --force-recreate searxng

Deleting the file so a stale `docker compose -f …/mana-search/dev.yml up`
can't resurrect the orphan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:27:19 +02:00
Till JS
3edf680ea0 feat(mana-ai): telemetry for reminder producers (mana_ai_reminders_emitted_total)
Producers now return structured {producer, severity, text} objects
instead of raw strings. buildReminderChannel collects them, increments
mana_ai_reminders_emitted_total{producer, severity} per emission, and
maps back to strings for the shared-ai loop input.

Why structured: the Prometheus label "severity" lets dashboards split
75-99% token-budget warnings (severity=warn) from 100%+ escalations
(severity=escalate) without NLP on the reminder text. Adding a new
producer that emits only info-level state (e.g. stale-sync warning)
falls out for free.

Active producer labels today:
  - token-budget (warn, escalate)
  - retry-loop (warn)

With this plus the scrape job (d087b4744), we can finally answer:
"does the budget warning actually change LLM behaviour?" — correlate
reminders_emitted_total{producer='token-budget'} with
tick_duration_seconds or planner_rounds_histogram.

3 tests updated to assert the new {producer, severity, text} shape
(16 reminder tests total, all green).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:10:27 +02:00
Till JS
8f283726b1 feat(agent-loop): activate retryLoopReminder via LoopState.recentCalls
Extends LoopState with a sliding window of the last N ExecutedCalls
(oldest-first), capped at LOOP_STATE_RECENT_CALLS_WINDOW = 5. The loop
maintains the window automatically; reminderChannel producers read it
without touching internal state.

This activates retryLoopReminder which was shape-only in faa472be9.
The guard now fires end-to-end: when round >= 3 and the tail-2 calls
both returned success:false, the LLM sees a "stop retrying, write a
summary instead" <reminder> on the next turn. The tail-2 check rather
than window-wide is deliberate — a flaky run with intermittent success
(F, F, F, OK, F) is not a retry loop, just flaky tools.

Why window=5: retry loops usually manifest within 2-3 consecutive
rounds; a 5-deep window gives room for burst-detection and
stale-tool heuristics without bloating the reminder channel. Cap
keeps the reminder producers O(5) regardless of loop length.

Tests: 3 new (sliding-window cap + slide + order in shared-ai, retry
composition + budget+retry chain + tail-only heuristic in mana-ai).
Total agent-loop tests now 74 across both packages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 15:02:40 +02:00
Till JS
25c3bb6cdf docs(mana-mcp,mana-ai): CLAUDE.md coverage for M1 agent-loop primitives
mana-mcp:
  - Policy-gate section: POLICY_MODE semantics, the four decision
    rules, where to find soak metrics during log-only burn-in.
  - /metrics section pointing at the Prometheus job.

mana-ai:
  - New v0.8 status block: reminderChannel wiring, the two live
    producers (tokenBudgetReminder active, retryLoopReminder dormant
    pending LoopState extension), why POLICY_MODE here is limited to
    freetext inspection, why parallel-reads have no effect until the
    tool-registry absorbs the full AI_TOOL_CATALOG (M4 of personas).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 14:25:14 +02:00
Till JS
c94ab01c69 feat(mana-mcp): Prometheus metrics for policy gate + tool invocations
Replaces the stub /metrics endpoint with a real prom-client registry
(mana_mcp_ prefix, {service="mana-mcp"} default label). Default
process metrics come along for free.

Policy-gate telemetry is the whole point — without it we can't soak
POLICY_MODE=log-only safely or decide when to flip to enforce. New
counter mana_mcp_policy_decisions_total{decision, reason, mode} buckets
every evaluatePolicy() call:

  decision ∈ {allow, deny, flagged}
  reason   ∈ {admin-scope-not-invokable, destructive-not-allowed,
              rate-limit-exceeded, injection-marker, clean, unknown}
  mode     ∈ {log-only, enforce}

So the rate of "would have been denied" during soak is visible directly
as policy_decisions_total{decision="deny", mode="log-only"}.

Also:
  - mana_mcp_tool_invocations_total{tool, outcome} — success |
    handler-error | input-invalid. Policy denies are NOT counted here
    (they're in policy_decisions_total above); this counter only counts
    calls that actually reached the handler or tripped zod validation.
  - mana_mcp_tool_duration_seconds histogram per tool/outcome.

Dep: prom-client ^15.1.3 (same version mana-ai pins).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 14:23:08 +02:00
Till JS
f07eae3c01 feat(personas): M3.b-d — tick loop + Claude Agent SDK + persistence (real)
Previous commit 38dc80654 carries this M3 title but its payload is an
unrelated apps/api/picture change — shared-.git-index race with a
parallel session (see feedback_git_workflow.md). This commit holds the
actual M3.b/c/d code. Leaving the misnamed commit for the user to
re-attribute / revert as they prefer.

Closes the M3 loop from docs/plans/mana-mcp-and-personas.md. The
runner picks up due personas, drives each through Claude + MCP for
one simulated turn, collects actions + ratings, persists through
service-key internal endpoints in mana-auth.

Internal endpoints (mana-auth, service-key-gated)

- GET  /api/v1/internal/personas/due
    Returns personas whose tickCadence + lastActiveAt say they're
    due. Rules: hourly > 1h, daily > 24h, weekdays > 24h mon-fri.
    NULLS FIRST so never-run personas go ahead of stale ones.

- POST /api/v1/internal/personas/:id/actions
    Batch ≤ 500. Row ids are deterministic
    `${tickId}-${i}-${toolName}` + ON CONFLICT DO NOTHING so the
    runner can retry a tick without doubling audit rows. Also
    bumps personas.last_active_at so the next /due call sees it.

- POST /api/v1/internal/personas/:id/feedback
    Batch ≤ 100. Row id is `${tickId}-${module}` — natural key is
    one rating per module per tick.

Runner tick pipeline (services/mana-persona-runner/src/runner/)

- claude-session.ts
    Two phases per tick. runMainTurn feeds the persona's system
    prompt + a German "simulate a day" user prompt to Claude Agent
    SDK's query(), with mana-mcp wired in as a streamable-HTTP MCP
    server. We iterate the returned AsyncGenerator and extract
    tool_use blocks into ActionRows; a tool_result with
    is_error=true flips the most recent action. runRatingTurn is a
    fresh query() with tools:[] asking Claude in character to rate
    each used module 1-5 as strict JSON. We parse with tolerance
    for whitespace / fences. Unparseable output becomes a synthetic
    '__parse' feedback row so operators see the failure.

- tick.ts
    Orchestrator. Skips when config.paused. Fetches /due, processes
    in batches of config.concurrency via Promise.allSettled so a
    single persona failure never kills the batch. Returns
    {due, ranSuccessfully, failed[], durationMs}.

- types.ts
    ActionRow + FeedbackRow shapes shared between claude-session
    and the internal client.

Runner bootstrap (src/index.ts)

- setInterval(config.tickIntervalMs) starts the tick loop on boot.
  tickInFlight guards against overlap when Claude latency >
  interval. If MANA_SERVICE_KEY or ANTHROPIC_API_KEY is missing,
  loop is disabled with a warn line — /health + /diag/login still
  work.
- POST /diag/tick (dev-only) fires one tick on demand, returns
  the result. Avoids waiting a full interval during testing.
- Graceful SIGTERM/SIGINT shutdown clears the interval.

Client

- clients/mana-auth-internal.ts
    X-Service-Key client for the three endpoints above.
    Constructor throws on empty serviceKey — fail loud.

Boot smoke verified: /health returns ok, /diag/tick 500s with
descriptive messages when keys absent. Warning lines on boot when
keys are missing. Type-check green across mana-auth, tool-registry,
mcp, persona-runner.

M3 exit gate is the end-to-end smoke recipe (docker up → db:push →
seed:personas → diag/tick → psql) documented in
services/mana-persona-runner/CLAUDE.md.

M2.d (cross-space family/team memberships) still deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 14:18:31 +02:00
Till JS
a1caeaa7f3 feat(personas): M3.a — scaffold mana-persona-runner service on :3070
First concrete piece of M3 (docs/plans/mana-mcp-and-personas.md). The
tick loop itself and the Claude Agent SDK + MCP integration are M3.b;
the action/feedback persistence endpoints are M3.c. This commit just
stands up the service so the remaining pieces have a shell to land in.

Service shape (Bun/Hono on :3070)

- src/config.ts
    Env-driven configuration: auth URL, MCP URL, service key for
    action/feedback callbacks (M3.c), Anthropic API key, deterministic
    PERSONA_SEED_SECRET (must match scripts/personas/password.ts so the
    runner can log back in without any stored credentials), tick
    interval and concurrency, RUNNER_PAUSED kill-switch. Production
    start asserts all secrets are set and the dev fallback secret is
    rotated.

- src/password.ts
    Bit-for-bit identical HMAC-SHA256 password derivation to
    scripts/personas/password.ts. Duplicated deliberately: the two
    sides can't share code (one is a repo-root utility script, the
    other is a workspace service) but must stay in sync — comment
    at the top calls this out.

- src/clients/auth.ts
    Two upstream calls the runner needs for one tick: POST /auth/login
    and GET /api/auth/organization/list. loginAndResolvePersonalSpace()
    wraps both and picks the persona's auto-created personal space as
    the write target (throws if none exists — Spaces-Foundation should
    always have seeded one on signup).

- src/index.ts
    Hono app: /health, /metrics (stub), and a dev-only /diag/login
    endpoint that takes a persona email, derives the password, logs
    in, resolves the personal space, and returns {userId, spaceId} as
    an end-to-end sanity check. Disabled in production.

No tick loop yet — RUNNER_PAUSED prints an info line on boot, but
nothing fires. The dispatcher + Claude Agent SDK + MCP client land in
M3.b; the internal POST callbacks into mana-auth for persona_actions /
persona_feedback land in M3.c.

Infra

- Port 3070 added to docs/PORT_SCHEMA.md.
- Service listed in root CLAUDE.md next to mana-mcp.
- services/mana-persona-runner/CLAUDE.md documents what's built today,
  what lands in M3.b/c, and the local diag smoke recipe.

Boot smoke verified: /health returns ok + paused/interval/concurrency,
/diag/login without email returns 400.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 14:00:43 +02:00
Till JS
faa472be91 feat(mana-ai): first live reminder producers — token budget + retry-loop
Wires the M1 reminderChannel into the mana-ai mission runner with two
initial producers in services/mana-ai/src/planner/reminders.ts:

- tokenBudgetReminder — warns at 75% of the agent's daily cap, emits a
  stronger "wrap up NOW" message at/above 100%. Uses pretick usage +
  accumulated round usage so the warning tracks drift during a long
  plan.
- retryLoopReminder — shape is in place (round≥3 + last 2 failures),
  currently limited to the single lastCall LoopState exposes. Extends
  cleanly once LoopState carries the full failure window.

buildReminderChannel composes active producers; the tick hoists
pretickUsage24h so the channel has the baseline. Each round the loop
re-evaluates the producers, so usage drift across rounds surfaces on
the NEXT turn.

Also exports LoopState + ReminderChannel from @mana/shared-ai top-level
so consumers don't need to reach into /planner.

Tests: 13 new bun tests covering thresholds, pretick+round summing,
composition, and per-round re-evaluation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 14:00:04 +02:00
Till JS
e5d230e599 feat(agent-loop): M1 — policy gate + reminder channel + parallel reads
Three Claude-Code-inspired primitives for runPlannerLoop, derived from the
reverse-engineering reports in docs/reports/:

1. **Policy gate** (@mana/tool-registry) — evaluatePolicy() gates every tool
   dispatch: denies admin-scope, denies destructive tools not in the user's
   opt-in list, rate-limits per tool (30/60s default), flags prompt-injection
   markers in freetext without blocking. Wired into mana-mcp with a
   per-user rolling invocation log and POLICY_MODE env (off|log-only|enforce,
   default log-only). mana-ai uses detectInjectionMarker only — tool dispatch
   there is plan-only, so rate-limit/destructive checks don't apply yet.

2. **Reminder channel** (packages/shared-ai/src/planner/loop.ts) — new
   reminderChannel callback in PlannerLoopInput. Called once per round with
   LoopState snapshot (round, toolCallCount, usage, lastCall); returned
   strings wrap in <reminder> tags and inject as transient system messages
   into THIS LLM request only. Never pushed to messages[] — the Claude-Code
   <system-reminder> pattern that keeps the KV-cache prefix stable.

3. **Parallel reads** (loop.ts) — isParallelSafe predicate enables
   Promise.all dispatch when every tool_call in a round is parallel-safe,
   in batches of PARALLEL_TOOL_BATCH_SIZE=10. Any non-safe call downgrades
   the whole round to sequential. messages[] always appends in source
   order, never completion order, so the debug log stays linear.
   Default-off (undefined predicate) preserves pre-M1 behaviour.

Tests: 21 new in tool-registry (policy), 9 new in shared-ai (5 parallel,
4 reminder). All 74 green, type-check clean across 4 packages.

Design/plan: docs/plans/agent-loop-improvements-m1.md
Reports: docs/reports/claude-code-architecture.md,
         docs/reports/mana-agent-improvements-from-claude-code.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 13:56:40 +02:00
Till JS
493db0c3b2 feat(personas): M2.a-c — persona schemas + admin endpoints + seed pipeline
Continuation of docs/plans/mana-mcp-and-personas.md. Personas are the
auto-test users the M3 runner will drive — they're real Mana users
(kind='persona', tier='founder'), registered through the same Better
Auth pipeline as humans, just stamped differently and metadata-tracked
so the persona-runner knows how to role-play them.

Schemas (auth namespace — personas are 1:1 with users, no reason for a
separate platform.* schema that the plan originally sketched)

- userKindEnum ('human' | 'persona' | 'system') + users.kind column,
  wired into better-auth additionalFields so the JWT/user object carry
  the flag. Default 'human' keeps every existing user untouched.
- auth.personas — 1:1 descriptor (archetype, systemPrompt, moduleMix
  jsonb, tickCadence, lastActiveAt). CASCADE from users.id.
- auth.persona_actions — tick-grouped audit of every tool call the
  runner makes (toolName, inputHash for dedup, result, latency).
- auth.persona_feedback — structured 1-5 ratings per module per tick,
  plus free-text notes. This is where the runner writes the
  self-reflection step at end of each tick.

Admin endpoints (/api/v1/admin/personas, admin-tier-gated)

- POST /            create-or-update by email. Uses auth.api.signUpEmail
                    if the user's new, then stamps kind+tier+verified
                    and upserts the personas row. Idempotent — safe to
                    re-run after catalog edits.
- GET  /            list with 7-day action count per persona.
- GET  /:id         detail + recent 20 actions + per-module feedback
                    aggregate.
- DELETE /:id       hard delete. Refuses non-persona users as
                    defense-in-depth: an admin typo here would cascade
                    through the full user-delete chain.

Catalog + seed pipeline (scripts/personas/)

- catalog.json      10 handwritten personas spanning 7 archetypes
                    (adhd-student, ceo-busy, creative-parent, solo-dev,
                    researcher, freelancer, overwhelmed-newbie).
                    Five pairs of personas that will later share
                    family/team spaces (cross-space setup is deferred
                    to M2.d per the plan).
- catalog.ts        zod-validated loader. Refines email to require
                    @mana.test TLD — non-existent, no bounce risk.
- password.ts       deterministic HMAC-SHA256(PERSONA_SEED_SECRET,
                    email). No stored per-persona credentials; the
                    runner re-derives on every login. Refuses the
                    dev-fallback secret in production.
- seed.ts           POST /admin/personas per catalog entry. Flags:
                    --auth=, --jwt=, --dry-run.
- cleanup.ts        Hard-delete every live persona. Warns when the
                    live set drifts from the catalog.

Root package.json:
  pnpm seed:personas
  pnpm seed:personas:cleanup

Extends the ESLint root-ignore list with `scripts/**` so Bun-typed
utility scripts don't fail the typed-parser check they weren't opted
into. Consistent with the rest of scripts/ being .mjs+.sh.

To go live (user action):
  pnpm docker:up
  cd services/mana-auth && bun run db:push
  export MANA_ADMIN_JWT=...
  pnpm seed:personas

M2.d deferred: cross-space (family/team/practice) memberships between
persona pairs. Better Auth's org-invite flow is multi-step and would
roughly double the M2 scope; the persona-runner (M3) can operate in
personal spaces first, shared-space tests land as their own milestone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 13:55:14 +02:00
Till JS
16c8818338 feat(mcp): M1+M1.5 MCP gateway + tool-registry + shared-crypto
Foundation for autonomous Claude-driven testing. Plan:
docs/plans/mana-mcp-and-personas.md.

New packages
- @mana/tool-registry — schema-first ToolSpec<InputSchema, OutputSchema>
  with zod generics, scope ('user-space' | 'admin') and policyHint
  ('read' | 'write' | 'destructive'). sync-client helpers speak the
  mana-sync push/pull protocol directly so RLS and field-level LWW are
  preserved. MasterKeyClient fetches per-user MKs via the existing
  mana-auth GET /api/v1/me/encryption-vault/key endpoint (JWT-gated,
  ZK-aware, already audited) — no new service-key endpoint built.
  ZeroKnowledgeUserError surfaced as a typed throw.
- @mana/shared-crypto — AES-GCM-256 primitives extracted from the web
  app's $lib/data/crypto/aes.ts so the server-side tool handlers and the
  browser produce byte-for-byte identical wire format
  (enc:1:{b64(iv)}.{b64(ct)}). Web app aes.ts now re-exports from
  shared-crypto — 5 existing importers unchanged, svelte-check stays
  green.

New service
- services/mana-mcp (:3069, Bun/Hono) — MCP Streamable HTTP gateway.
  JWKS auth against mana-auth, per-user session isolation (session-id
  belongs to the user who opened it — cross-user access returns 403),
  admin-scoped tools filtered out before registration. MasterKeyClient
  cached per process with a 5-minute TTL.

11 tools registered
- habits.{create,list,update,archive}, spaces.list (plaintext, M1)
- todo.{create,list,complete}, notes.{create,search}, journal.add
  (encrypted — field lists match
  apps/mana/apps/web/src/lib/data/crypto/registry.ts verbatim)

Infra
- Port 3069 added to docs/PORT_SCHEMA.md
- services/mana-mcp/CLAUDE.md with architecture, auth model,
  tool-authoring recipe, local smoke-test steps
- Root CLAUDE.md services list updated

Type-check green across shared-crypto, mana-tool-registry, mana-mcp.
svelte-check on apps/mana/apps/web stays at 0 errors / 0 warnings.
Boot smoke verified: /health returns registry.loaded=true, unauthed
/mcp → 401, invalid-JWT /mcp → 401 with descriptive message.

Decisions locked in for later milestones (per plan D1–D10):
- Personas will be real mana-auth users (users.kind='persona'), no
  service-key bypass (D1, D2)
- Tool-registry is the SSOT; mana-ai and the legacy
  apps/api/src/mcp/server.ts get merged into it in M4 (three current
  parallel tool catalogs collapse to one)
- Persona-runner (:3070) will be a separate service using the Claude
  Agent SDK + MCP client (D5)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 13:18:35 +02:00
Till JS
c1498c1099 fix(infra): include shared-types in mana-auth Dockerfile installer
mana-auth's package.json declares @mana/shared-types as a workspace
dependency, but the Dockerfile's install stage never copied its source
into the build context. pnpm then silently failed to create the
workspace symlink under node_modules, and bun hit ENOENT on every
import at runtime: "reading /app/services/mana-auth/node_modules/
@mana/shared-types".

The broken image sat undetected as long as the long-running container
didn't restart. Tonight's deploy recreated it and every mana-auth
container immediately crash-looped — taking mana-api and mana-web
down with it via depends_on.

Same class of bug as 70c62e758 (shared-logger).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 02:31:20 +02:00
Till JS
fd1ea47075 feat(backup): client-driven v2 snapshot export, drop server-side backup
Replaces the mana-sync event-stream export (GET /backup/export) with a
fully client-driven `.mana` v2 archive: webapp reads Dexie, decrypts
per-field, packages JSONL + manifest, optionally PBKDF2+AES-GCM seals
with a passphrase.

- New: backup/v2/{format,passphrase,export,import}.ts + format.test.ts
  (10 tests: round-trip, sealed path, 3 failure modes incl. wrong-
  passphrase vs. tamper distinction).
- UI: ExportImportPanel with module multi-select, optional passphrase,
  progress + sealed-file detection — replaces the old backup flow in
  Settings → MyData.
- Removes services/mana-sync/internal/backup/ and the corresponding
  client helpers + v1 tests. No parallel paths, no legacy shim.
- Why client-driven: zero-knowledge users hold their vault key only
  client-side, so a server exporter cannot produce plaintext archives;
  GDPR Art. 20 portability is better served by plaintext-by-default.
- Cross-account restore works via re-encryption under the target
  vault key (no MK transfer needed).

DATA_LAYER_AUDIT.md §8 rewritten to reflect the new architecture.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 18:46:29 +02:00
Till JS
3a7bc7f1c3 test(mana-research): fixture-based tests for Gemini poll-response parser
Re-commit of c413ab7dd (reverted in c31dcdd66) without the unrelated
files that accidentally got swept into the original stage. Parser
content is identical.

The real Gemini /v1beta/interactions/:id completed shape bit us once
already during the initial smoke-test (we had OpenAI-style nested
`output.message.content[]` coded; reality is a flat `outputs` array
of thought|text|image items, with url_citations that carry no title
and usage fields named `total_input_tokens` rather than `input_tokens`).

This test pins the parser against a synthetic fixture covering the
cases we saw in the wild plus the failure modes that are hard to
provoke from a live API call:

  - status dispatch (queued, in_progress, failed, cancelled, incomplete)
  - completed body concatenated across text items, skipping thought/image
  - empty/missing `outputs` without crashing
  - missing usage
  - citations deduped by url, hostname extracted as title
  - wrong-type annotations and those without url skipped
  - real vertexaisearch redirect URLs Gemini emits
  - fallback to url as title when the URL is unparseable
  - trimming of leading/trailing whitespace

To make this testable I pulled the completed-branch of
pollGeminiDeepResearch into a standalone parseInteractionResponse
helper — same behaviour, now reachable without mocking global fetch.

Also adds the `test` script to package.json so `pnpm --filter
@mana/research-service test` works.

17 pass / 0 fail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 18:44:21 +02:00
Till JS
c31dcdd66c Revert "test(mana-research): fixture-based tests for Gemini poll-response parser"
This reverts commit c413ab7dd3.
2026-04-22 18:43:48 +02:00
Till JS
c413ab7dd3 test(mana-research): fixture-based tests for Gemini poll-response parser
The real Gemini /v1beta/interactions/:id completed shape bit us once
already during the initial smoke-test (we had OpenAI-style nested
`output.message.content[]` coded; reality is a flat `outputs` array
of thought|text|image items, with url_citations that carry no title
and usage fields named `total_input_tokens` rather than `input_tokens`).

This test pins the parser against a synthetic fixture covering the
cases we saw in the wild plus the failure modes that are hard to
provoke from a live API call:

  - status dispatch (queued, in_progress, failed, cancelled, incomplete)
  - completed body concatenated across text items, skipping thought/image
  - empty/missing `outputs` without crashing
  - missing usage
  - citations deduped by url, hostname extracted as title
  - wrong-type annotations and those without url skipped
  - real vertexaisearch redirect URLs Gemini emits
  - fallback to url as title when the URL is unparseable
  - trimming of leading/trailing whitespace

To make this testable I pulled the completed-branch of
pollGeminiDeepResearch into a standalone parseInteractionResponse
helper — same behaviour, now reachable without mocking global fetch.

Also adds the `test` script to package.json so `pnpm --filter
@mana/research-service test` works.

17 pass / 0 fail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 18:34:33 +02:00
Till JS
2a18cb5ee4 feat(mana-ai): v0.7 — cross-tick Deep Research Max pre-planning
Opt-in path for missions that want Gemini Deep Research Max (up to 60 min
per task) instead of the shallow RSS pre-research. Because Max runs well
past a single 60-second tick, the state is carried across ticks:

  tick N:   submit → INSERT mission_research_jobs row → skip planner
  tick N+k: poll → still running → skip planner (metric pending_skips)
  tick N+m: poll → completed → inject as ResolvedInput, DELETE row, plan

- ManaResearchClient talks to mana-research's new internal
  /v1/internal/research/async endpoints with X-Service-Key +
  X-User-Id. Graceful-null on transport errors so a flaky
  mana-research never crashes the tick loop.
- New table mana_ai.mission_research_jobs with PK (user_id, mission_id)
  — presence is the "pending" flag; delete-on-terminal keeps queries
  trivial.
- handleDeepResearch() encapsulates the state machine; planOneMission
  now returns a discriminated union (planned | skipped | failed) so
  "research pending" isn't miscounted as a parse failure.
- Opt-in at TWO gates to keep cost in check ($3–7/task, 1500 credits
  per run):
    1. MANA_AI_DEEP_RESEARCH_ENABLED=true server-side (default off)
    2. DEEP_RESEARCH_TRIGGER regex matches the mission objective
       (strict: "deep research", "tiefe recherche", "umfassende
       recherche", "hintergrundrecherche", "deep dive")
  Falls back to shallow RSS when either gate fails or the submit
  errors upstream.
- Prom metrics: mana_ai_research_jobs_{submitted,completed,failed}_total
  labelled by provider, plus _pending_skips_total.
- docker-compose wires MANA_RESEARCH_URL + the opt-in flag and adds
  mana-research to depends_on.
- Full write-up with real API response shape (outputs plural, not
  OpenAI-style), step-3 MCP-server plan (security-gated, not built),
  ops + kill-switch: docs/reports/gemini-deep-research.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 17:56:06 +02:00
Till JS
f10a95e842 feat(mana-research): add Gemini 3.1 Pro Deep Research async providers
- New providers gemini-deep-research + gemini-deep-research-max on the
  Interactions API (preview-04-2026). Submit/poll split, tier parameter
  selects between standard (~minutes, $1–3) and max (up to 60 min, $3–7).
- Parser matches the real response shape: flat `outputs` array of
  thought|text|image items, url_citation annotations without title,
  `usage.total_input_tokens` / `total_output_tokens`.
- Route generalisation: /v1/research/async accepts `provider` with
  default 'openai-deep-research' (backward compatible) and dispatches
  to the right submit/poll pair.
- New internal service-to-service endpoint /v1/internal/research/async
  gated by X-Service-Key + X-User-Id for credit accounting. Enables
  mana-ai to drive deep-research jobs on the mission owner's wallet
  without requiring a user JWT.
- Pricing: 300 credits (standard) / 1500 credits (max). Conservative
  markup over the ~$3/$7 ceiling so the first runs can't surprise us.
- Docs: AGENT_PROVIDER_IDS + pricing + env map + auto-router stay in
  sync; CLAUDE.md Phase 3b now current; API_KEYS.md references the
  new providers under GOOGLE_GENAI_API_KEY.

Verified with a real smoke test against the Gemini API: submit + poll
both succeed, completed response parsed cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 17:55:30 +02:00
Till JS
177734a860 fix(tsconfig): unblock shared-types consumers
shared-types/src/index.ts re-exports with explicit .ts extensions
(Tailwind v4 module resolver needs them). TS 5.7 requires consumers
to opt in via allowImportingTsExtensions. The flag only type-checks
when noEmit:true; the NestJS builder also needs
rewriteRelativeImportExtensions so tsc still emits valid JS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 18:53:55 +02:00
Till JS
12be75e6a6 fix(broadcast): track route paths + shared-branding tsconfig
Two fixes surfaced by the end-to-end smoke test.

1. broadcast-track.ts: inner route paths double-prefixed.
   Routes were declared as '/track/open/:token' etc, then
   mounted at '/api/v1/track', yielding '/api/v1/track/track/open/:token'
   — every tracking endpoint returned 404. Dropped the redundant
   '/track/' prefix so the full path is now
   '/api/v1/track/{open,click,unsubscribe}/:token' as the
   orchestrator + client both expect.

   Verified with live curl:
   - /track/open/BAD → 200 image/gif 42 bytes (graceful no-signal)
   - /track/click/?url missing → 400 missing url
   - /track/click?url=javascript: → 400 bad url
   - /track/click?url=https://ok + bad token → 302 graceful
   - /track/unsubscribe/BAD GET → 400 HTML
   - /track/unsubscribe/BAD POST → 400 (RFC 8058)

2. shared-branding/tsconfig.json: allowImportingTsExtensions
   missing. shared-types/src/index.ts uses explicit .ts
   imports (intentional, for Tailwind's module resolver); any
   downstream tsconfig without allowImportingTsExtensions emits
   8 errors. shared-auth already had this fix — shared-branding
   gets the same treatment. noEmit:true is set, so no rewrite
   flag needed.

   Verified: shared-branding pnpm check → 0 errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 18:30:47 +02:00
Till JS
1861e89d45 chore(broadcast): wire mana-mail into env pipeline + push schema
The three final pre-dogfood items:

1. drizzle.config: schemaFilter now includes 'broadcast' alongside
   'mail'. Without this, `bun run db:push` skipped the broadcast
   tables — schema existed in code but not in Postgres. Tested via
   db:push + psql \dt (3 tables created: campaigns, events, sends).

2. .env.development: new MANA-MAIL SERVICE section with Stalwart
   knobs + broadcast config (tracking secret, rate limits, send
   throttle). DEV secret is explicitly labelled non-production —
   prod rotates via env.

3. generate-env.mjs: new block writes services/mana-mail/.env on
   `pnpm setup:env`. Mirrors the invoices / research / events
   pattern. All 16 broadcast/mail vars flow through from SSOT.

Verified end-to-end:
- pnpm setup:env → services/mana-mail/.env contains
  BROADCAST_TRACKING_SECRET + rate limits
- bun run src/index.ts → /health returns 200 with the new config
- psql → broadcast.campaigns / events / sends are materialised

Broadcast module is now fully ready to send real mail — nothing
else required before the first dogfood campaign.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 16:21:57 +02:00
Till JS
260dd312a9 feat(broadcast): M8 DNS auth check (SPF / DKIM / DMARC)
Closes the last plan milestone. Users can verify their sending-domain
setup without leaving the broadcast settings page.

Server (mana-mail)
- services/dns-check.ts: parseSpf / parseDkim / parseDmarc are pure
  functions. SPF accepts include:<mailDomain>, flags weak (+all) and
  wrong (include missing) and multi-record (RFC 7208 §3.2). DKIM
  needs v=DKIM1 + a p= public-key segment. DMARC requires v=DMARC1,
  flags p=none as weak (monitoring only), ok on quarantine/reject.
  All three are case-insensitive.
- lookupTxt(): DNS-over-HTTPS against Cloudflare 1.1.1.1 — avoids
  the Bun/container udp-resolver flakiness and works everywhere.
  Multi-string TXT (`"a" "b"`) get concatenated before parsing.
- checkDomain(): one call, three parallel DoH lookups, returns a
  structured result with suggested copy-paste records scoped to the
  user's actual mail domain from config.
- Route: GET /v1/mail/dns-check?domain=&selector= (JWT auth). Zod
  validates the domain looks sensible before hitting DoH.
- 16 unit tests covering all three parsers + multi-record edge case.

Client
- api.ts: runDnsCheck(domain, selector?) helper with typed result.
- components/DnsCheckBanner.svelte: derives domain from the default
  from-email (after @), calls the check on-demand, renders per-record
  status chips (ok / weak / wrong / missing) with messages, exposes
  copy-pasteable SPF + DMARC records when anything's off. DKIM setup
  is provider-specific so we show a hint rather than a canned record.
  Last-check timestamp persists to settings.dnsCheck so the banner
  survives a reload without re-hitting the API.
- Wired into SettingsForm between Impressum and Standard-Footer —
  where the user is already thinking about "what's required to
  actually send".

All checks clean:
- webapp pnpm check: 0 broadcast errors (4 pre-existing articles errors
  from parallel Spaces work, unrelated)
- mana-mail tests: 36/36 across tracking-token + link-rewriter + dns-check
- mana-mail build: 2.51 MB (+8 KB for juice — dns-check itself is ~3 KB)

Plan: docs/plans/broadcast-module.md §M8. All 10 milestones now done.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 15:48:03 +02:00
Till JS
a312d98f09 feat(broadcast): click-link tracking + send throttle
Closes the last two dogfood blockers before real-campaign use.

link-rewriter.ts
- rewriteClickLinks(): walks <a href="http…"> in the HTML body and
  replaces each URL with /api/v1/track/click/{token}?url={original}
  so clicks go through the tracking endpoint. Regex-based because
  Tiptap output is well-formed; returns a count for debugging.
- Leaves mailto: / tel: / anchor fragments alone — wrapping those
  breaks the recipient's native handler and accomplishes nothing.
- `skipUrls` param carries the unsubscribe + web-view URLs (already
  tracking endpoints themselves) so they don't get double-wrapped.
- 11 unit tests covering http/https rewriting, skip list, non-http
  schemes, attribute preservation, multi-link count, quoted-attr
  variants, idempotency.

Orchestrator wiring
- substituteUrls now calls rewriteClickLinks after the preview-
  placeholder swap and before the open-pixel injection. The
  unsubscribe + web-view URLs from this same function are passed
  in as skip entries so they survive the pass untouched.
- Constructor gains `sendThrottleMs` param (default 150ms).
- Main send loop awaits sleep(throttleMs) between iterations. 150ms
  = ~6/sec = ~360/min, safely below most SMTP provider limits.
  100-recipient campaign = ~15s extra wall-clock but that's fine
  for MVP (and most campaigns are way smaller).

Config
- New env BROADCAST_SEND_THROTTLE_MS (default 150). Wired from
  loadConfig to the orchestrator constructor.

The broadcast module is now functionally complete for dogfooding.
Remaining before a real campaign can actually go out: run
`cd services/mana-mail && bun run db:push` to materialise the
broadcast.* schema tables.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 15:07:58 +02:00