- DATA_LAYER_AUDIT.md: new section 8 covering the export/import flow
end-to-end — architecture diagram, .mana format, protocol-stability
commitments we locked in pre-launch (eventId + schemaVersion + op
vocab + tombstones-forever), encryption-boundary argument, file
map, and the remaining backup backlog (M4b, M5, signature,
resumable download, dedup table).
- services/mana-sync/CLAUDE.md: /backup/export row in API table with
explicit note that it sits outside the billing gate, new Backup /
Restore section with format sketch + split between writer.go (pure)
and handler.go (shim), test-coverage line mentions the backup cases,
project-structure tree lists backup/*.go, Security section mentions
RLS still applies to the export path.
No code changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Refactor: HTTP handler becomes a thin shim over a pure WriteBackup(w,
userID, createdAt, iter) function. RowIterator abstracts the store, so
tests feed synthetic ChangeRow slices and production feeds
StreamAllUserChanges. Zero behavior change in production — same bytes
on the wire.
Tests (all pass):
- TestWriteBackup_Roundtrip: three rows across two apps, assert zip has
2 entries, events.jsonl has 3 JSON lines in order, insert omits
fieldTimestamps, update surfaces them, manifest apps are sorted,
eventsSha256 equals a recomputed sha of the decompressed body.
- TestWriteBackup_EmptyUser: empty userID refused up-front.
- TestWriteBackup_NoRows: zero-row export still produces a valid zip
with an empty events.jsonl and a manifest with eventCount=0 and a
non-empty sha (sha of empty input).
- TestWriteBackup_DefaultsSchemaVersionZeroRowsToOne: legacy rows with
schema_version=0 clamp to 1 so the manifest never claims a protocol
version that never existed.
Paired with the vitest zip parser suite on the TS side, this closes
the Go-writes / JS-reads round-trip without needing live mana-sync.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Recovering three files dropped when a parallel terminal reset past the
original M1 commit:
- cmd/server/main.go: register GET /backup/export outside billingMiddleware
- lib/api/services/backup.ts: browser-side downloadBackup() helper
- settings/my-data/+page.svelte: "Backup & Wiederherstellung" section
Pairs with the earlier backup handler + schema_version work already on
main (79996f946). With this commit the endpoint is actually reachable
end-to-end and the download button works.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- sync_changes gains schema_version column (default 1, idempotent ADD)
- Change/Changeset carry schemaVersion; server refuses > MaxSupported
- server->client changes now carry eventId + schemaVersion so the
restore path can dedup via eventId and route through a migration
chain keyed on schemaVersion
- backup JSONL gains schemaVersion per line
Pre-M2 clients (omit the field) are treated as v1 for compatibility.
This is the stability contract we commit to before launch: once v1
events are in the wild, all future builds must replay them forward.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Admins can now grant Cloud Sync to users without charging credits. Gifted
rows carry is_gifted=true plus gifted_by/gifted_at audit columns; the
billing cron skips them, and /activate and /deactivate refuse to touch
them. New endpoints POST/DELETE /api/v1/admin/sync/:userId/gift.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backend: Hono/Bun service on port 3042 with JMAP client for Stalwart,
account provisioning (@mana.how addresses on user registration),
thread/message/send/label API endpoints, and JWT + service-key auth.
Frontend: Mail module with 3-column inbox UI (mailboxes, thread list,
detail/compose), local-first encrypted drafts in Dexie, and API-driven
thread fetching. Scoped CSS with theme tokens.
Integration: Dexie v11 schema, mail pgSchema in mana_platform,
mana-auth fire-and-forget hook for account provisioning,
getManaMailUrl() in API config, app registry + branding update.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
**Unit tests (`bun test`, 42 checks, 0 deps)**
- `src/lib/__tests__/category-map.test.ts` locks in the Pelias→
PlaceCategory priority resolution. Covers the ambiguous multi-category
case (food beats retail for restaurants, transit beats professional
for car rentals, transport:rail still maps to transit, …), the simple
single-category paths, the layer-hint fallback, and regression cases
from real Konstanz/Stuttgart/Köln venues observed during deploy
verification.
- `src/lib/__tests__/cache.test.ts` covers LRU eviction order, TTL
expiry, move-to-end on get (so frequently-read entries survive
eviction), size tracking, and typed-value storage.
**Smoke test (`./scripts/smoke-test.sh` or `bun run test:smoke`)**
End-to-end curls against a running service, aimed at post-deploy
verification. Health endpoints, forward (venue + street fallback),
focus biasing, reverse geocoding, cache hit. 9 checks total.
Wired up as `test:smoke` in package.json so it runs alongside the
unit tests. Verified working: 42/42 unit tests green locally, 9/9
smoke checks green against the live Mac Mini deployment.
CLAUDE.md Testing section rewritten to reflect the new test layers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After the 2026-04-11 production deploy, several non-obvious gotchas
surfaced that needed documenting:
- Forward search: autocomplete→search fallback explained, so future-me
knows why the handler hits two Pelias endpoints for address-style
queries.
- Pelias infra: corrected object counts (13.4M actual, not 22M), noted
the libpostal RAM surprise (~1.9 GB, much larger than Pelias docs
suggest), and added real per-container RAM numbers from production.
- pelias.json: document that we dropped placeholder/pip/interpolation
(not just how to run them) and why the cleaner degradation matters.
- Wrapper gotchas section: Bun idleTimeout, Colima bind-mount cache
staleness, and the host.docker.internal-from-blackbox workaround.
- /health/pelias endpoint is now listed in the API table since it's
the integration point with blackbox monitoring.
- Testing section added — explicitly "no automated tests yet", with a
curl-based manual smoke test set a human can run after changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pelias /autocomplete deliberately excludes the address layer as a
performance optimization, so queries like "Marktstätte Konstanz"
(street + locality) return 0 venue matches even though they're clearly
in the index. /search covers all layers including addresses and streets.
Query /autocomplete first (fast, fuzzy, great for venue names), and if
it returns nothing, try /search. Best of both worlds: quick matches for
"Konzil Restaurant" plus reliable matches for street addresses.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two production follow-ups surfaced after the deploy:
1. Pelias API was emitting continuous `ENOTFOUND placeholder`, `pip`,
`interpolation` errors because we declared those services in
pelias.json but never actually run them (we don't need WOF
admin lookup or street interpolation for the DACH use case).
Removed the stale entries — Pelias degrades cleanly to
libpostal-only parsing, which is what we want.
2. Bun.serve's default idleTimeout is 10s, which is too tight for
cold Pelias queries hitting Elasticsearch. Raise to 60s so
first-query-after-idle doesn't get cut off.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
blackbox-exporter can't resolve host.docker.internal on Colima, so
probes of host.docker.internal:4000 and :9200 always fail. Instead,
add a /health/pelias endpoint on the Hono wrapper that proxies to
the Pelias API, and update prometheus.yml to probe the wrapper's
proxied health endpoint.
Also simplifies the status page friendly_name() now that we don't
need to display the host.docker.internal targets.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Port 4400 collides with mana-infra-landings (status.mana.how nginx)
on the production mac mini. libpostal is only reached internally by
pelias-api over the pelias compose network anyway — no host binding
needed. Use expose instead of ports to drop the host mapping.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use node:22-alpine + pnpm to install workspace dependencies, then copy
node_modules into the bun runtime stage. This resolves @mana/shared-hono
which depends on @mana/shared-logger (transitive workspace dep).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bun install doesn't read pnpm-workspace.yaml, so workspace dependencies
like @mana/shared-hono can't be resolved. Switch to pnpm install with
--filter to install only mana-credits and its workspace deps.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous version chained cd + bun install with || fallback, which
left CWD in services/mana-credits after the first attempt and caused the
fallback cd to fail. Use WORKDIR directives instead — each step starts
from a known absolute path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Production deployment + observability for the self-hosted geocoding stack:
**docker-compose.macmini.yml**
- New mana-geocoding container (port 3018, internal-only — no traefik
labels, no Cloudflare route). Uses host.docker.internal to reach the
Pelias API on the host's pelias compose stack. Dockerfile added under
services/mana-geocoding/ using the same Bun/Hono pattern as mana-events.
**Prometheus**
- New blackbox-internal job probing mana-geocoding:3018/health, the
Pelias API on host.docker.internal:4000/v1/status, and Elasticsearch
at host.docker.internal:9200/_cluster/health. Kept separate from
blackbox-api which is reserved for public HTTPS endpoints.
**status.mana.how (generate-status-page.sh)**
- Include blackbox-internal in the metric query and add an "Interne
Dienste" section with its own summary card, right between Infrastruktur
and GPU Dienste. Summary grid goes from 4 to 5 columns with a
900px breakpoint.
- friendly_name() now handles http:// URLs and rewrites container-name
hosts like mana-geocoding:3018/health → "Mana Geocoding",
host.docker.internal:4000 → "Pelias API",
host.docker.internal:9200 → "Pelias Elasticsearch".
**Grafana uptime dashboard**
- Add an "Internal" series to the "Alle Dienste — Uptime-Verlauf" panel
- New "Interne Dienste Status" table panel showing per-instance up/down
- New "Geocoding Ø Latenz" stat panel for probe_duration_seconds
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Expand services/mana-geocoding/CLAUDE.md with:
- The Pelias API patch (geojsonify_place_details.js) that forces the
category field to always be returned, with regeneration instructions
- The priority-ordered Pelias→PlaceCategory mapping and verified
example mappings from the DACH index
- A full initial-import walkthrough covering the non-obvious gotchas
(analysis-icu plugin, dach-latest → planet-latest rename, adminLookup
disabled, leveldbpath, libpostal config object form, boundary.country
single-value constraint)
Also register mana-geocoding in the root services list.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Dockerfile only copied services/mana-sync, but go.mod has a replace
directive pointing to ../../packages/shared-go which needs to be in the
build context. Switch context to repo root and copy both packages.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pelias hides the 'category' field from API responses unless the
caller filters by categories=... explicitly — a default intended for
keyword search that strips category metadata from address queries.
Patch the Pelias API's geojsonify_place_details.js so the category
array is returned on every feature (food, retail, transport, …),
mounted into the container as a read-only volume override.
Rewrite category-map.ts to map Pelias' OSM taxonomy to our 7
PlaceCategories using a priority-ordered list so a restaurant
tagged ['food','retail','nightlife'] resolves to 'food' (the most
specific), not 'shopping'.
Verified with Konstanz test queries:
Konzil Restaurant → food
Bahnhof Konstanz → transit
Physiotherapie-Schule → work
MX-Park → leisure
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After importing 22M OSM objects for the DACH extract:
- Disable adminLookup (no WOF data needed for address search)
- Configure leveldb path inside the data volume
- Specify planet-latest.osm.pbf as the import filename
- Convert libpostal service config from string to object form
- Drop boundary.country default — Pelias only accepts a single
country value, and our index only contains DACH data anyway
Verified forward + reverse geocoding work end-to-end for Konstanz
test queries via the mana-geocoding wrapper on port 3018.
Known limitation: OSM category/type (amenity:restaurant etc.) is
not yet populated in Pelias responses — will require whitelisting
those tags in the importer config and re-running the import.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Dockerfile copied only its own package.json, causing bun install to
fail on @mana/shared-hono workspace dependency. Now copies workspace root
package.json and shared-hono/shared-types packages.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New mana-geocoding service (port 3018) wraps a self-hosted Pelias
instance with LRU caching and OSM→PlaceCategory auto-mapping.
All geocoding queries stay within our infrastructure — no user
location data leaves the network.
Places module integration:
- Address autocomplete search in ListView (creates place with
name, coords, address, category in one step)
- Address search + reverse geocoding button in DetailView
- Auto-fill address via reverse geocoding during tracking
- OSM category mapping (amenity:restaurant→food, shop:*→shopping, etc.)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add MANA_CREDITS_URL and MANA_SERVICE_KEY to configuration table
- Document billing gate on sync endpoints (402 behavior, 5min cache, fail-open)
- Add billing/check.go to project structure
- Add stream endpoint to API table
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cloud Sync is now a paid feature: 30 credits/month (90/quarter, 360/year).
Users start in local-only mode and opt-in via Settings > Cloud Sync.
1 Credit = 1 Cent, so sync costs ~0.30€/month.
When credits run out, sync is paused (not deleted) and an in-app banner
prompts the user to top up. Local data is always preserved.
Backend (mana-credits):
- New sync_subscriptions table in credits schema
- SyncBillingService with activate/deactivate/chargeRecurring
- User-facing routes: GET/POST /api/v1/sync/{status,activate,deactivate,change-interval}
- Internal routes for server-side checks and cron triggers
Frontend (mana web):
- Sync API client + reactive sync-billing store
- syncEnabled parameter gates createUnifiedSync() — sync only starts when active
- Settings sync page with interval selection and activate/deactivate
- Pause banner in app layout when credits insufficient
Also: removed CALDAV_SYNC/GOOGLE_SYNC operations (not needed),
updated CLOUD_SYNC cost from 5 to 30 credits/month.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The credit system was overengineered for the local-first architecture:
- Productivity micro-credits (task/event/contact creation at 0.02 credits) made no sense
since these operations happen locally in IndexedDB with zero server cost and were never enforced
- Guild pool system (6 DB tables, spending limits, membership checks) had no active users
- Gift system had 5 types (simple/personalized/split/first_come/riddle) when 2 suffice
Now credits are only charged for operations that actually cost money: AI API calls and
premium features (sync, exports). This makes the value proposition clear to users.
Changes:
- Remove 8 productivity operations + CreditCategory.PRODUCTIVITY from @mana/credits
- Delete guild pool service, routes, schema (3 files); remove guild refs from 8 backend files
- Simplify gifts to simple + personalized only; remove bcrypt/riddle/portions logic
- Update all frontend pages (credits dashboard, gift create/redeem, public gift page)
- Update shared-hono consumeCredits() to remove creditSource parameter
- Update mana-credits CLAUDE.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Redis: allkeys-lru → noeviction to prevent silent data loss when memory full
- mana-media: --watch → --hot to fix EADDRINUSE crash on Bun HMR reload
- Svelte: build initial values before $state() to avoid state_referenced_locally warnings
in create-app-onboarding.svelte.ts and shared-llm/store.svelte.ts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The media schema/tables were never created on fresh deploys because
mana-media only shipped a `db:push` script and nothing ever ran it
in the container. Result: every upload returned 500 the moment a
new environment came up (just hit prod again on mana.how).
- Add `db:generate` + `db:migrate` scripts and a migrate.ts runner
- Generate the initial migration covering media/media_references/
media_thumbnails (matches what was already on local + prod, which
were stamped manually so the migrator skips on existing deploys)
- Call runMigrations() at startup in src/index.ts so future fresh
containers self-bootstrap. Idempotent — drizzle tracks state in
drizzle.__drizzle_migrations.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The first iteration of the Ollama response_format passthrough crashed
with 'ChatCompletionRequest object has no attribute response_format'
because the Pydantic request model didn't declare the field at all —
incoming response_format from OpenAI-compatible clients was being
silently dropped at the parsing layer before the provider could see it.
Fix: declare a typed ResponseFormat sub-model with the two OpenAI shapes
('json_object' and 'json_schema'), add it as an optional field on
ChatCompletionRequest, and let the Ollama provider read it directly
without defensive getattr fallbacks.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Ollama provider was completely ignoring `response_format` from the
incoming OpenAI-compatible request. Two consequences:
1. Clients that asked for `{"type":"json_object"}` or
`{"type":"json_schema",...}` got back JSON wrapped in
```json ... ``` markdown fences, because Ollama defaults to
conversational output.
2. Strict downstream parsers (Vercel AI SDK `generateObject`,
manual `JSON.parse`) failed to decode the response and threw,
even though the underlying JSON was valid inside the fences.
Fix: when response_format is set, translate it to Ollama's native
`format` field:
- `{"type":"json_object"}` → `format: "json"`
- `{"type":"json_schema","json_schema":{"schema":{...}}}`
→ `format: <the schema dict>` (Ollama 0.5+ supports full JSON
schemas in the format field)
Defensive belt-and-suspenders: a small `_strip_json_fences` helper
runs after the Ollama response is decoded and removes any leftover
```json ... ``` wrapping. Some older vision models still wrap
output in fences even when `format` is set; this catches them.
Streaming path is unchanged because the nutriphi/planta refactor uses
non-streaming `generateObject`. Streaming structured output with
Ollama deserves its own pass when someone actually needs it.
Discovered during the AI SDK + Zod refactor smoke test — neither the
old nor the new vision routes ever returned validated JSON locally
because of this bug. Production uses Google Gemini directly via
fallback so the issue was masked there.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
JSDOM throws CSS / parser errors from detached parse5 callbacks that
escape every try/catch in the call stack and even bun's
process.on('uncaughtException') handlers — leaving the daemon stuck
crash-looping past the first bad page in source #4 (heise) without
ever making forward progress.
Set FULL_TEXT_THRESHOLD_WORDS = 0 so we never call into Readability.
Sources that ship full RSS bodies (Tagesschau, Spiegel, BBC, …) are
unaffected. Title-only sources (Hacker News) keep the row with an
empty content field; the reader already falls back to "Original
öffnen ↗" in that case.
Re-enabling extraction in a worker thread is left for a follow-up.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
JSDOM's CSS parser throws on plenty of real-world pages and the error
escapes every try/catch in the buildRow → ingestSource chain because
it fires from a parse5 callback that runs after JSDOM has returned.
In the prod container this killed the process on the first bad page,
docker restarted it, and it crash-looped on the same first source
forever — no progress past tech.
Two-layer fix: a silent VirtualConsole on every JSDOM instance to
swallow CSS / resource errors at the source, plus process-level
uncaughtException + unhandledRejection handlers that log and continue
so any future async escape can't kill the daemon either.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Was copied verbatim from mana-credits' template but not actually
imported anywhere in src/. Removing it lets the Docker build's bun
install resolve from npm only — workspace:* refs need the full
monorepo context which the Dockerfile doesn't copy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds the services/news-ingester Bun service that pulls 25 public RSS/JSON
feeds into news.curated_articles every 15 min, with Mozilla Readability
fallback for thin RSS bodies and 30-day retention. apps/api /feed is
rewritten to read from the new pool table directly instead of the
sync_changes hack, with topics/lang/since/limit/offset query params.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The workbench-registry app id 'inventar' did not match its
@mana/shared-branding MANA_APPS counterpart 'inventory', so the tier-
gating join in apps/web/src/lib/app-registry/registry.ts silently
failed for the inventory module — it fell into the "no MANA_APPS
entry, default visible" fallback and was effectively un-gated. The
codebase had also voted overwhelmingly for 'inventar' (53 files) vs
'inventory' (3 files in shared-branding), so the long-standing
mismatch was just bookkeeping debt waiting to bite.
Pre-release, no live data, so the cleanest fix is to align everything
on the English 'inventory':
- Workbench-registry id, module.config.ts appId, module folder, route
folder and i18n locale folder all renamed via git mv
- Standalone apps/inventar/ workspace package renamed
- All imports, store identifiers (InventarEvents → InventoryEvents,
INVENTAR_GUEST_SEED, inventarModuleConfig), i18n keys and href/goto
paths follow the rename
- The German display label "Inventar" is preserved everywhere it is a
user-visible string (page titles, i18n values, toast labels)
- Dexie table prefixes (invCollections, invItems, …) are unchanged
- Drive-by fix: ListView.svelte was querying non-existent
inventarCollections/inventarItems tables — corrected to the actual
invCollections/invItems names from module.config
- The "inventar ↔ inventory id mismatch" workaround comment in
registry.ts is removed since the mismatch no longer exists
module-registry.ts also picks up the user's parallel newsModuleConfig
addition because both edits land in the same import block — keeping
them split would have left the build in an inconsistent state.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a "Local Login & Dev Users" section to docs/LOCAL_DEVELOPMENT.md
and a short pointer in services/mana-auth/CLAUDE.md so the next dev
finds the script without first hitting the "why can't I log in?" wall:
- Why it exists (no admin seed, requireEmailVerification + no SMTP)
- The 3 default accounts + password
- Single-account form + env overrides (TIER, AUTH_URL, …)
- Idempotency promise
- Prereqs (Postgres + mana-auth on :3001)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The /admin route in the unified Mana web app was rendering hardcoded
mock data (42 users, 156 successful logins, 3 failed) for every
admin who opened it. The previous code had a TODO comment to wire
up a real endpoint and the backend half had been waiting for the
frontend half ever since the consolidation landed.
Backend (mana-auth):
Add GET /api/v1/admin/stats — admin-only, returns the seven counts
the dashboard needs in a single response. Each count is its own
Drizzle query against auth.users / auth.sessions / auth.login_
attempts; they run in parallel via Promise.all so total latency is
dominated by the round-trip to Postgres, not the per-query work.
Stats:
- totalUsers → users where deleted_at IS NULL
- newUsers7d → users created in the last 7 days
- newUsers30d → users created in the last 30 days
- activeSessions → sessions where expires_at > now() AND not revoked
- uniqueUsers24h → distinct user_id from sessions with last_activity
in the last 24h (and not revoked)
- loginSuccess7d → login_attempts where successful=true, last 7d
- loginFailed7d → login_attempts where successful=false, last 7d
Plus a generatedAt ISO timestamp so the client can show staleness
if it ever caches the response.
Frontend (apps/mana/apps/web):
- Add adminService.getStats() in the existing admin API service
(sits next to getUsers / getUserData / deleteUserData; uses the
same authenticated base-client and ApiResult envelope).
- Replace the onMount mock-data block in admin/+page.svelte with
a single adminService.getStats() call. Drop the local Stats
interface in favor of the AdminStats type exported from the
service.
- Guard the Success Rate calculation against division by zero on
fresh deployments — when there have been no login attempts in
the last 7 days, render '—%' instead of NaN%.
Verification:
- mana-auth type-check unchanged (baseline errors only)
- mana-auth runtime tests still 19/19 passing
- svelte-check on the two changed web files: zero errors
Closes item #12 in docs/REFACTORING_AUDIT_2026_04.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three pnpm artifacts that were either Pre-Consolidation leftovers or
unintentional drift:
- apps/context/pnpm-lock.yaml + apps/context/pnpm-workspace.yaml
apps/context used to be its own nested workspace declaring
apps/* and packages/*. After consolidation only apps/context/
apps/mobile remains, and the root pnpm-workspace.yaml already
matches it via 'apps/*/apps/*'. The nested lockfile (242 KB)
was a separate dependency graph drifting independently from
the root.
- services/mana-media/packages/client/pnpm-lock.yaml
Anomalous lockfile in a workspace sub-package. The root
workspace already covers services/*/packages/* — no reason
for client/ to maintain its own resolution.
Verified after deletion:
- pnpm install completes cleanly (~16s) and now resolves
apps/context/apps/mobile from the root lockfile (pnpm list
confirms the workspace registration)
- apps/api type-check still 0 errors
- mana-auth tests still 19/19 passing
Tracked as item #26 in docs/REFACTORING_AUDIT_2026_04.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Locks in the relationship between three places that must agree about
SSO origin configuration:
1. TRUSTED_ORIGINS in better-auth.config.ts (Better Auth allow-list)
2. CORS_ORIGINS env var on mana-auth in docker-compose.macmini.yml
3. The HTTPS subset of (1) must be a subset of (2) — every origin
Better Auth trusts must also pass CORS preflight
Background: root CLAUDE.md references this spec file as the canonical
"Adding an app to SSO" verification step (line 116) but the file
itself never existed. The first run of this spec immediately caught
two real bugs:
- 3 origins in TRUSTED_ORIGINS were missing from CORS_ORIGINS
(https://auth.mana.how, https://arcade.mana.how, https://whopxl.mana.how)
- 22 zombie subdomain entries in CORS_ORIGINS left over from before
the consolidation (calendar, chat, todo, ...) that no app actually
routes to anymore
Both fixes shipped together with the TRUSTED_ORIGINS extraction in
the broader pre-launch sweep (commit 919fcca4b). This spec is the
guard against the same drift creeping back in.
Eight tests:
- canonical mana.how + auth subdomain present
- localhost dev origins (3001, 5173) present
- all production origins HTTPS
- all production origins on *.mana.how
- no duplicates
- every HTTPS trusted origin appears in mana-auth CORS_ORIGINS
- soft warning for CORS_ORIGINS entries not in trustedOrigins
(catches drift in the other direction)
8/8 pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pre-launch theme system audit found multiple parallel layers in themes.css
(--theme-X full hsl strings, --X partial shadcn aliases, --color-X populated
by runtime store with raw channels) plus dead-code companion files. The
inconsistency caused light-mode regressions when scoped-CSS consumers
wrote `var(--color-X)` standalone — the variable holds raw HSL channels
which is invalid as a color value, browser fell back to inherited (white).
Rewrite to one consistent layer:
- Source of truth: --color-X defined as raw HSL channels (e.g.
`0 0% 17%`) in :root, .dark, and all variant [data-theme="..."]
blocks. Matches the format the runtime store
(@mana/shared-theme/src/utils.ts) writes, eliminating the
static-fallback-vs-runtime mismatch and the corresponding flash
of unstyled content on hydration.
- @theme inline uses self-reference + Tailwind v4 <alpha-value>
placeholder so utility classes generate correctly AND opacity
modifiers work: `text-foreground/50` → `hsl(var(--color-foreground) / 0.5)`.
- @layer components (.btn-primary, .card, .badge, etc.) wraps
var(--color-X) refs with hsl() — they were broken in light mode
too for the same reason.
Convention going forward (also documented in the file header):
1. Markup: use Tailwind utility classes (text-foreground, bg-card, …)
2. Scoped CSS: hsl(var(--color-X)) — always wrap with hsl()
3. NEVER raw var(--color-X) in CSS — that's the bug pattern
Net file: 692 → 580 LOC. Single source layer, no indirection.
Also delete dead companion files (zero imports anywhere):
- tailwind-v4.css (had broken self-reference, never imported)
- theme-variables.css (legacy hex-based palette)
- components.css (legacy component utilities)
- index.js / preset.js / colors.js (Tailwind v3 preset format,
irrelevant under Tailwind v4)
package.json exports map shrinks accordingly to just `./themes.css`.
Consumers using `hsl(var(--color-X))` (~379 files across mana-web,
manavoxel-web, arcade-web) keep working unchanged — the public API
name `--color-X` is preserved. Only the broken pattern `var(--color-X)`
(~61 files) needs a follow-up sweep, handled in a separate commit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
While adding negative-path integration tests for the auth flow I
discovered that *neither* of the lockout primitives in
services/mana-auth/src/services/security.ts has actually been
working in production. Two independent silent failures that combined
into a "the lockout never triggers, ever" outcome:
1. recordAttempt() inserted into auth.login_attempts with explicit
`id = gen_random_uuid()`, but auth.login_attempts.id is a
`serial integer` column with `nextval('auth.login_attempts_id_seq')`
as default. The UUID-into-integer cast threw a type error every
single time, the bare `catch {}` swallowed it as "non-critical",
and not a single login attempt was ever persisted. Lockout's "5
failures in 15 min" check was running against an empty table.
2. checkLockout() built `attempted_at > ${new Date(...)}` via the
drizzle sql template, but postgres-js cannot bind a JS Date object
directly — it tries to byteLength() the parameter and crashes with
`Received an instance of Date`. Same anti-pattern: bare `catch`,
returns `{locked: false}` (fail-open), no log, completely invisible.
Both are "silent broken since the encryption-vault series of changes"
class — caught only because the integration test for the lockout flow
expected the 6th login attempt to return 429 and got 200 instead.
Fixes:
- recordAttempt(): drop the bogus `id` column from the INSERT (let the
sequence default assign it), default ipAddress to null instead of
letting `${undefined}` collapse the parameter slot, and surface
errors in the catch instead of swallowing them silently.
- checkLockout(): pass `windowStart.toISOString()` instead of the Date
object so postgres-js can serialize it. Same catch upgrade — log the
cause when failing open.
Failure-path test additions (tests/integration/auth-failures.test.ts):
- wrong password: assert 401, no JWT, +1 LOGIN_FAILURE in security_events,
+1 row in auth.login_attempts
- account lockout: 5 failed attempts then 6th returns 429 with
remainingSeconds, even with the correct password
- unverified email login: 403 with code = EMAIL_NOT_VERIFIED
- validate with garbage token: valid !== true
- resend verification: second mail arrives in mailpit
Plus the run-integration-tests.sh helper now runs both .test.ts files
and tests/integration/package.json's `test` script does the same.
Negative-control: reverted the recordAttempt fix (re-added the bogus
gen_random_uuid id), the wrong-password test failed at the
login_attempts assertion. Reverted the checkLockout fix, the lockout
test failed at the 429 assertion. Both fixes verified to be load-bearing.
6 tests, 45 expects, ~1.3s on a warm cache.
logEvent() builds its INSERT via a raw `sql` tagged template:
sql\`INSERT INTO auth.security_events
(..., user_id, ip_address, user_agent, metadata, ...)
VALUES (..., \${params.userId}, \${params.ipAddress},
\${params.userAgent}, \${...metadata}, ...)\`
Most call sites only pass userId+eventType (or only eventType for the
LOGIN_FAILURE / PASSWORD_RESET_REQUESTED / PROFILE_UPDATED /
PASSWORD_CHANGED / ACCOUNT_DELETED events). The other params land in
the template as `undefined`, and postgres-js's tagged-template renderer
collapses `${undefined}` into literal nothing — producing this:
VALUES (gen_random_uuid(), $1, $2, , , $3::jsonb, NOW())
^^^^
Postgres rejects with "syntax error at or near \",\"". The catch block
swallowed it as a `console.warn('Failed to log security event
(non-critical):', params.eventType)` with no error detail, which is why
this has been silently broken for who knows how long — every register,
every login, every password change has been losing its audit row.
Fix:
- Coerce optional params to `null` (`params.userId ?? null`) before
interpolation. NULL is what postgres-js renders for an explicit null.
- Surface the actual error in the catch warn so the next time something
similar happens it shows up in logs instead of just "non-critical".
Verified the diagnosis by toggling `log_statement = all` on the test
postgres, triggering a register, and reading the literal failed
statement out of postgres logs.
A grep audit after the previous matrix removal commits found a handful
of stragglers in non-runtime files that the earlier sweeps missed:
- services/mana-llm/CLAUDE.md: removed matrix-ollama-bot from the
consumer-apps diagram and from the related-services table
- services/mana-video-gen/CLAUDE.md: removed "Matrix Bots" integration
bullet
- packages/notify-client/README.md: removed sendMatrix() doc entry
(the method itself was already gone in the prior cleanup)
- docker/grafana/dashboards/logs-explorer.json: dropped the "Matrix
Stack" log row that queried tier="matrix" (would show no data forever)
- docker/grafana/dashboards/master-overview.json: dropped the "Matrix
Bots" stat panel that counted up{job=~"matrix-.*-bot"}
- apps/mana/apps/landing/src/data/ecosystem-health.json: regenerated via
scripts/ecosystem-audit.mjs to drop matrix from the app list, icon
counts, file analytics, top offenders and authGuard missing list
- .gitignore: removed services/matrix-stt-bot/data/ pattern (the
service itself was deleted long ago)
Production-side stragglers also addressed (not in this commit):
- DROP USER synapse on prod Postgres (the parallel cleanup commit
2514831a3 dropped DATABASE matrix + DATABASE synapse but left the
role behind)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The matrix subsystem was removed in a prior commit. This commit cleans
up the small leftovers that grep found:
- docker-compose.macmini.yml: dropped the "Matrix Stack" port-range
comment, the "matrix" category from the naming convention, and a
stale watchtower comment about Matrix notifications.
- packages/credits/src/operations.ts: removed AI_BOT_CHAT credit
operation type and its definition. It was the billing entry for "Chat
with AI via Matrix bot" — no callers left.
- services/mana-credits gifts schema + service + validation: removed the
targetMatrixId column / param / Zod field. The corresponding
PostgreSQL column was dropped manually with
`ALTER TABLE gifts.gift_codes DROP COLUMN target_matrix_id` on prod.
- docker/grafana/dashboards/{master,system}-overview.json: removed the
`up{job="synapse"}` panel queries — they would have shown No Data
forever now that Synapse is gone.
Production-side cleanup performed in parallel (not in this commit):
- Stopped + removed mana-matrix-{synapse,element,web,bot} containers
- Removed mana-matrix-bot:local, matrix-web:latest,
matrixdotorg/synapse:latest, vectorim/element-web:latest images (~3 GB)
- Removed mana-matrix-bots-data Docker volume
- Removed /Volumes/ManaData/matrix/ media store (4.3 MB)
- DROP DATABASE matrix; DROP DATABASE synapse; on Postgres
Cosmetic leftovers intentionally untouched:
- Eisenhower matrix in todo (LayoutMode 'matrix') — productivity concept
- ${{ matrix.service }} in .github/workflows — GitHub Actions strategy
- services/mana-media/apps/api/dist/.../matrix/* — stale build output
(not in git, regenerated next mana-media build)
This commit bundles two unrelated changes that were swept together by an
accidental `git add -A` in another working session. Documented here so the
history reflects what's actually inside.
═══════════════════════════════════════════════════════════════════════
1. fix(mana-auth): /api/v1/auth/login mints JWT via auth.handler instead
of api.signInEmail
═══════════════════════════════════════════════════════════════════════
Previous attempt (commit 55cc75e7d) tried to fix the broken JWT mint in
/api/v1/auth/login by switching the cookie name from `mana.session_token`
to `__Secure-mana.session_token` for production. That was necessary but
not sufficient: Better Auth's session cookie value isn't just the raw
session token, it's `<token>.<HMAC>` where the HMAC is derived from the
better-auth secret. Reconstructing the cookie from auth.api.signInEmail's
JSON response only gave us the raw token, so /api/auth/token's
get-session middleware still couldn't validate it and the JWT mint kept
silently failing.
Real fix: do the sign-in via auth.handler (the HTTP path) rather than
auth.api.signInEmail (the SDK path). The handler returns a real fetch
Response with a Set-Cookie header containing the fully signed cookie
envelope. We capture that header verbatim and forward it as the cookie
on the /api/auth/token request, which now passes validation and mints
the JWT correctly.
Verified end-to-end on auth.mana.how:
$ curl -X POST https://auth.mana.how/api/v1/auth/login \
-d '{"email":"...","password":"..."}'
{
"user": {...},
"token": "<session token>",
"accessToken": "eyJhbGciOiJFZERTQSI...", ← real JWT now
"refreshToken": "<session token>"
}
Side benefits:
- Email-not-verified path is now handled by checking
signInResponse.status === 403 directly, no more catching APIError
with the comment-noted async-stream footgun.
- X-Forwarded-For is forwarded explicitly so Better Auth's rate limiter
and our security log see the real client IP.
- The leftover catch block now only handles unexpected exceptions
(network errors etc); the FORBIDDEN-checking logic in it is dead but
harmless and left in for defense in depth.
═══════════════════════════════════════════════════════════════════════
2. chore: remove the entire self-hosted Matrix stack (Synapse, Element,
Manalink, mana-matrix-bot)
═══════════════════════════════════════════════════════════════════════
The Matrix subsystem ran parallel to the main Mana product without any
load-bearing integration: the unified web app never imported matrix-js-sdk,
the chat module uses mana-sync (local-first), and mana-matrix-bot's
plugins duplicated features the unified app already ships natively.
Keeping it alive cost a Synapse + Element + matrix-web + bot container
quartet, three Cloudflare routes, an OIDC provider plugin in mana-auth,
and a steady drip of devlog/dependency churn.
Removed:
- apps/matrix (Manalink web + mobile, ~150 files)
- services/mana-matrix-bot (Go bot with ~20 plugins)
- docker/matrix configs (Synapse + Element)
- synapse/element-web/matrix-web/mana-matrix-bot services in
docker-compose.macmini.yml
- matrix.mana.how/element.mana.how/link.mana.how Cloudflare tunnel routes
- OIDC provider plugin + matrix-synapse trustedClient + matrixUserLinks
table from mana-auth (oauth_* schema definitions also removed)
- MatrixService import path in mana-media (importFromMatrix endpoint)
- Matrix notification channel in mana-notify (worker, metrics, config,
channel_type enum, MatrixOptions handler)
- Matrix entries from shared-branding (mana-apps + app-icons),
notify-client, the i18n bundle, the observatory map, the credits
app-label list, the landing footer/apps page, the prometheus + alerts
+ promtail tier mappings, and the matrix-related deploy paths in
cd-macmini.yml + ci.yml
Devlog/manascore/blueprint entries that mention Matrix are left intact
as historical record. The oauth_* + matrix_user_links Postgres tables
stay on existing prod databases — code can no longer write to them, drop
them in a follow-up migration if you want them gone for real.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The custom /api/v1/auth/login route signs the user in via the
better-auth SDK (auth.api.signInEmail) and then forges a request to
/api/auth/token to mint a JWT, passing the session token as a synthetic
cookie header.
The cookie name was hardcoded as `mana.session_token=...`, but in
production better-auth issues the session cookie with the __Secure-
prefix (because secure: true is enabled). Get-session middleware on the
/api/auth/token side couldn't find the session under the unprefixed
name, so it returned 401 silently. Result: tokenResponse.ok was false,
the route fell through, and the response had no `accessToken` field at
all — only the bare { token, user, redirect } from signInEmail.
The frontend in @mana/shared-auth then picked this up as
`data.accessToken === undefined` and stored undefined as the JWT, while
the parallel /api/auth/sign-in/email call masked the visible damage by
setting the SSO cookie. So login *appeared* to work in the browser
(cookie present, session worked) but the JWT path was always broken.
Fix: pick the cookie name based on config.nodeEnv. In production use
__Secure-mana.session_token, in development use mana.session_token (no
__Secure- prefix because secure: false in dev).
Verified end-to-end on auth.mana.how:
POST /api/v1/auth/login → response now includes accessToken (a real
JWT, EdDSA, with sub/email/role/sid/tier/iss/aud claims), refreshToken
(the session token), plus the original signInEmail fields.
The other /api/auth/get-session call sites in this file forward the
incoming request headers verbatim, so they preserve whatever real cookie
the browser sent and don't have this bug.