**Unit tests (`bun test`, 42 checks, 0 deps)**
- `src/lib/__tests__/category-map.test.ts` locks in the Pelias→
PlaceCategory priority resolution. Covers the ambiguous multi-category
case (food beats retail for restaurants, transit beats professional
for car rentals, transport:rail still maps to transit, …), the simple
single-category paths, the layer-hint fallback, and regression cases
from real Konstanz/Stuttgart/Köln venues observed during deploy
verification.
- `src/lib/__tests__/cache.test.ts` covers LRU eviction order, TTL
expiry, move-to-end on get (so frequently-read entries survive
eviction), size tracking, and typed-value storage.
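For reference, a minimal LRU+TTL sketch of the behavior those checks pin down (illustrative, not the actual src/lib/cache.ts API):

```ts
// Minimal LRU + TTL cache. Map preserves insertion order, so the first
// key is always the least recently used one.
class LruCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();
  constructor(private maxSize: number, private ttlMs: number) {}

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) { // TTL expiry
      this.store.delete(key);
      return undefined;
    }
    this.store.delete(key); // move-to-end on get:
    this.store.set(key, entry); // re-insert as newest
    return entry.value;
  }

  set(key: string, value: V): void {
    if (this.store.has(key)) this.store.delete(key);
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    if (this.store.size > this.maxSize) {
      // evict the least recently used entry (first in insertion order)
      const oldest = this.store.keys().next().value!;
      this.store.delete(oldest);
    }
  }

  get size(): number {
    return this.store.size;
  }
}
```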
**Smoke test (`./scripts/smoke-test.sh` or `bun run test:smoke`)**
End-to-end curls against a running service, aimed at post-deploy
verification. Health endpoints, forward (venue + street fallback),
focus biasing, reverse geocoding, cache hit. 9 checks total.
Wired up as `test:smoke` in package.json so it runs alongside the
unit tests. Verified working: 42/42 unit tests green locally, 9/9
smoke checks green against the live Mac Mini deployment.
CLAUDE.md Testing section rewritten to reflect the new test layers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After the 2026-04-11 production deploy, several non-obvious gotchas
surfaced that needed documenting:
- Forward search: autocomplete→search fallback explained, so future-me
knows why the handler hits two Pelias endpoints for address-style
queries.
- Pelias infra: corrected object counts (13.4M actual, not 22M), noted
the libpostal RAM surprise (~1.9 GB, much larger than Pelias docs
suggest), and added real per-container RAM numbers from production.
- pelias.json: document that we dropped placeholder/pip/interpolation
(rather than just how to run them) and why the clean libpostal-only
degradation matters.
- Wrapper gotchas section: Bun idleTimeout, Colima bind-mount cache
staleness, and the host.docker.internal-from-blackbox workaround.
- /health/pelias endpoint is now listed in the API table since it's
the integration point with blackbox monitoring.
- Testing section added — explicitly "no automated tests yet", with a
curl-based manual smoke test set a human can run after changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pelias /autocomplete deliberately excludes the address layer as a
performance optimization, so queries like "Marktstätte Konstanz"
(street + locality) return 0 venue matches even though they're clearly
in the index. /search covers all layers including addresses and streets.
Query /autocomplete first (fast, fuzzy, great for venue names), and if
it returns nothing, try /search. Best of both worlds: quick matches for
"Konzil Restaurant" plus reliable matches for street addresses.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two production follow-ups surfaced after the deploy:
1. Pelias API was emitting continuous `ENOTFOUND placeholder`, `pip`,
`interpolation` errors because we declared those services in
pelias.json but never actually ran them (we don't need WOF
admin lookup or street interpolation for the DACH use case).
Removed the stale entries — Pelias degrades cleanly to
libpostal-only parsing, which is what we want.
2. Bun.serve's default idleTimeout is 10s, which is too tight for
cold Pelias queries hitting Elasticsearch. Raise to 60s so
first-query-after-idle doesn't get cut off.
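In Bun.serve terms (the value is in seconds; real fetch handler elided):

```ts
// idleTimeout is in seconds; the default of 10 cut off cold Pelias
// queries that had to warm Elasticsearch first.
Bun.serve({
  port: 3018,
  idleTimeout: 60,
  fetch(req) {
    return new Response("ok"); // placeholder handler
  },
});
```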
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
blackbox-exporter can't resolve host.docker.internal on Colima, so
probes of host.docker.internal:4000 and :9200 always fail. Instead,
add a /health/pelias endpoint on the Hono wrapper that proxies to
the Pelias API, and update prometheus.yml to probe the wrapper's
proxied health endpoint.
Also simplifies the status page friendly_name() now that we don't
need to display the host.docker.internal targets.
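Sketch of the proxy route (env var name illustrative; the real wrapper reaches Pelias via host.docker.internal):

```ts
import { Hono } from "hono";

const app = new Hono();
// Illustrative env var; defaults to the host-side Pelias API.
const PELIAS_URL = process.env.PELIAS_URL ?? "http://host.docker.internal:4000";

app.get("/health/pelias", async (c) => {
  try {
    const res = await fetch(`${PELIAS_URL}/v1/status`);
    return c.json({ ok: res.ok }, res.ok ? 200 : 502);
  } catch {
    // Pelias unreachable: report down so blackbox sees a failure.
    return c.json({ ok: false }, 502);
  }
});
```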
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Port 4400 collides with mana-infra-landings (status.mana.how nginx)
on the production mac mini. libpostal is only reached internally by
pelias-api over the pelias compose network anyway — no host binding
needed. Use expose instead of ports to drop the host mapping.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use node:22-alpine + pnpm to install workspace dependencies, then copy
node_modules into the bun runtime stage. This resolves @mana/shared-hono
which depends on @mana/shared-logger (transitive workspace dep).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bun install doesn't read pnpm-workspace.yaml, so workspace dependencies
like @mana/shared-hono can't be resolved. Switch to pnpm install with
--filter to install only mana-credits and its workspace deps.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous version chained cd + bun install with || fallback, which
left CWD in services/mana-credits after the first attempt and caused the
fallback cd to fail. Use WORKDIR directives instead — each step starts
from a known absolute path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Production deployment + observability for the self-hosted geocoding stack:
**docker-compose.macmini.yml**
- New mana-geocoding container (port 3018, internal-only — no traefik
labels, no Cloudflare route). Uses host.docker.internal to reach the
Pelias API on the host's pelias compose stack. Dockerfile added under
services/mana-geocoding/ using the same Bun/Hono pattern as mana-events.
**Prometheus**
- New blackbox-internal job probing mana-geocoding:3018/health, the
Pelias API on host.docker.internal:4000/v1/status, and Elasticsearch
at host.docker.internal:9200/_cluster/health. Kept separate from
blackbox-api which is reserved for public HTTPS endpoints.
**status.mana.how (generate-status-page.sh)**
- Include blackbox-internal in the metric query and add an "Interne
Dienste" section with its own summary card, right between Infrastruktur
and GPU Dienste. Summary grid goes from 4 to 5 columns with a
900px breakpoint.
- friendly_name() now handles http:// URLs and rewrites container-name
hosts like mana-geocoding:3018/health → "Mana Geocoding",
host.docker.internal:4000 → "Pelias API",
host.docker.internal:9200 → "Pelias Elasticsearch".
**Grafana uptime dashboard**
- Add an "Internal" series to the "Alle Dienste — Uptime-Verlauf" panel
- New "Interne Dienste Status" table panel showing per-instance up/down
- New "Geocoding Ø Latenz" stat panel for probe_duration_seconds
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Expand services/mana-geocoding/CLAUDE.md with:
- The Pelias API patch (geojsonify_place_details.js) that forces the
category field to always be returned, with regeneration instructions
- The priority-ordered Pelias→PlaceCategory mapping and verified
example mappings from the DACH index
- A full initial-import walkthrough covering the non-obvious gotchas
(analysis-icu plugin, dach-latest → planet-latest rename, adminLookup
disabled, leveldbpath, libpostal config object form, boundary.country
single-value constraint)
Also register mana-geocoding in the root services list.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Dockerfile only copied services/mana-sync, but go.mod has a replace
directive pointing to ../../packages/shared-go which needs to be in the
build context. Switch context to repo root and copy both packages.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pelias hides the 'category' field from API responses unless the
caller filters by categories=... explicitly — a default intended for
keyword search that strips category metadata from address queries.
Patch the Pelias API's geojsonify_place_details.js so the category
array is returned on every feature (food, retail, transport, …),
mounted into the container as a read-only volume override.
Rewrite category-map.ts to map Pelias' OSM taxonomy to our 7
PlaceCategories using a priority-ordered list so a restaurant
tagged ['food','retail','nightlife'] resolves to 'food' (the most
specific), not 'shopping'.
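In sketch form (the real table lives in category-map.ts; the exact priority entries here are illustrative):

```ts
type PlaceCategory =
  | "food" | "shopping" | "transit" | "work" | "leisure" | "health" | "other";

// Ordered most-specific-first: the first priority entry that matches
// any of the feature's Pelias categories wins.
const PRIORITY: Array<[prefix: string, category: PlaceCategory]> = [
  ["food", "food"],
  ["nightlife", "leisure"],
  ["transport", "transit"],
  ["health", "health"],
  ["retail", "shopping"],
  ["professional", "work"],
];

function resolveCategory(peliasCategories: string[]): PlaceCategory {
  for (const [prefix, category] of PRIORITY) {
    if (peliasCategories.some((c) => c === prefix || c.startsWith(`${prefix}:`))) {
      return category; // handles both 'transport' and 'transport:rail'
    }
  }
  return "other";
}

// ['food','retail','nightlife'] → 'food', not 'shopping'
```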
Verified with Konstanz test queries:
Konzil Restaurant → food
Bahnhof Konstanz → transit
Physiotherapie-Schule → work
MX-Park → leisure
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After importing 22M OSM objects for the DACH extract:
- Disable adminLookup (no WOF data needed for address search)
- Configure leveldb path inside the data volume
- Specify planet-latest.osm.pbf as the import filename
- Convert libpostal service config from string to object form
- Drop boundary.country default — Pelias only accepts a single
country value, and our index only contains DACH data anyway
Verified forward + reverse geocoding work end-to-end for Konstanz
test queries via the mana-geocoding wrapper on port 3018.
Known limitation: OSM category/type (amenity:restaurant etc.) is
not yet populated in Pelias responses — will require whitelisting
those tags in the importer config and re-running the import.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Dockerfile copied only its own package.json, causing bun install to
fail on @mana/shared-hono workspace dependency. Now copies workspace root
package.json and shared-hono/shared-types packages.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New mana-geocoding service (port 3018) wraps a self-hosted Pelias
instance with LRU caching and OSM→PlaceCategory auto-mapping.
All geocoding queries stay within our infrastructure — no user
location data leaves the network.
Places module integration:
- Address autocomplete search in ListView (creates place with
name, coords, address, category in one step)
- Address search + reverse geocoding button in DetailView
- Auto-fill address via reverse geocoding during tracking
- OSM category mapping (amenity:restaurant→food, shop:*→shopping, etc.)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add MANA_CREDITS_URL and MANA_SERVICE_KEY to configuration table
- Document billing gate on sync endpoints (402 behavior, 5min cache,
  fail-open; see the sketch after this list)
- Add billing/check.go to project structure
- Add stream endpoint to API table
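The gate itself lives in Go (billing/check.go); this TypeScript sketch only illustrates the documented behavior, with illustrative route and header names:

```ts
import type { MiddlewareHandler } from "hono";

// 5-minute cache of billing answers so the gate doesn't call
// mana-credits on every sync request.
const cache = new Map<string, { active: boolean; expiresAt: number }>();
const TTL_MS = 5 * 60 * 1000;

export const billingGate: MiddlewareHandler = async (c, next) => {
  const userId = c.req.header("x-user-id") ?? ""; // however auth identifies the caller
  let entry = cache.get(userId);
  if (!entry || Date.now() >= entry.expiresAt) {
    let active = true; // fail-open default if mana-credits is unreachable
    try {
      const res = await fetch(
        `${process.env.MANA_CREDITS_URL}/internal/sync/status?userId=${userId}`,
        { headers: { "x-service-key": process.env.MANA_SERVICE_KEY ?? "" } },
      );
      active = ((await res.json()) as { active: boolean }).active;
    } catch {
      // fail-open: sync availability must not depend on billing uptime
    }
    entry = { active, expiresAt: Date.now() + TTL_MS };
    cache.set(userId, entry);
  }
  if (!entry.active) return c.json({ error: "sync subscription inactive" }, 402);
  await next();
};
```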
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cloud Sync is now a paid feature: 30 credits/month (90/quarter, 360/year).
Users start in local-only mode and opt-in via Settings > Cloud Sync.
1 Credit = 1 Cent, so sync costs ~0.30€/month.
When credits run out, sync is paused (not deleted) and an in-app banner
prompts the user to top up. Local data is always preserved.
Backend (mana-credits):
- New sync_subscriptions table in credits schema
- SyncBillingService with activate/deactivate/chargeRecurring
- User-facing routes: GET/POST /api/v1/sync/{status,activate,deactivate,change-interval}
- Internal routes for server-side checks and cron triggers
Frontend (mana web):
- Sync API client + reactive sync-billing store
- syncEnabled parameter gates createUnifiedSync() — sync only starts when active
- Settings sync page with interval selection and activate/deactivate
- Pause banner in app layout when credits insufficient
Also: removed CALDAV_SYNC/GOOGLE_SYNC operations (not needed),
updated CLOUD_SYNC cost from 5 to 30 credits/month.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The credit system was overengineered for the local-first architecture:
- Productivity micro-credits (task/event/contact creation at 0.02 credits) made no sense
since these operations happen locally in IndexedDB with zero server cost and were never enforced
- Guild pool system (6 DB tables, spending limits, membership checks) had no active users
- Gift system had 5 types (simple/personalized/split/first_come/riddle) when 2 suffice
Now credits are only charged for operations that actually cost money: AI API calls and
premium features (sync, exports). This makes the value proposition clear to users.
Changes:
- Remove 8 productivity operations + CreditCategory.PRODUCTIVITY from @mana/credits
- Delete guild pool service, routes, schema (3 files); remove guild refs from 8 backend files
- Simplify gifts to simple + personalized only; remove bcrypt/riddle/portions logic
- Update all frontend pages (credits dashboard, gift create/redeem, public gift page)
- Update shared-hono consumeCredits() to remove creditSource parameter
- Update mana-credits CLAUDE.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Redis: allkeys-lru → noeviction to prevent silent data loss when memory full
- mana-media: --watch → --hot to fix EADDRINUSE crash on Bun HMR reload
- Svelte: build initial values before $state() to avoid state_referenced_locally warnings
in create-app-onboarding.svelte.ts and shared-llm/store.svelte.ts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The media schema/tables were never created on fresh deploys because
mana-media only shipped a `db:push` script and nothing ever ran it
in the container. Result: every upload returned 500 the moment a
new environment came up (just hit prod again on mana.how).
- Add `db:generate` + `db:migrate` scripts and a migrate.ts runner
- Generate the initial migration covering media/media_references/
media_thumbnails (matches what was already on local + prod, which
were stamped manually so the migrator skips on existing deploys)
- Call runMigrations() at startup in src/index.ts so future fresh
containers self-bootstrap. Idempotent — drizzle tracks state in
drizzle.__drizzle_migrations.
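runMigrations() is a thin wrapper over the stock drizzle migrator, roughly (migrations folder path illustrative):

```ts
import { drizzle } from "drizzle-orm/postgres-js";
import { migrate } from "drizzle-orm/postgres-js/migrator";
import postgres from "postgres";

// Idempotent: drizzle records applied migrations in
// drizzle.__drizzle_migrations and skips anything already stamped.
export async function runMigrations(): Promise<void> {
  const client = postgres(process.env.DATABASE_URL!, { max: 1 });
  const db = drizzle(client);
  await migrate(db, { migrationsFolder: "./drizzle" });
  await client.end();
}
```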
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The first iteration of the Ollama response_format passthrough crashed
with 'ChatCompletionRequest object has no attribute response_format'
because the Pydantic request model didn't declare the field at all —
incoming response_format from OpenAI-compatible clients was being
silently dropped at the parsing layer before the provider could see it.
Fix: declare a typed ResponseFormat sub-model with the two OpenAI shapes
('json_object' and 'json_schema'), add it as an optional field on
ChatCompletionRequest, and let the Ollama provider read it directly
without defensive getattr fallbacks.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Ollama provider was completely ignoring `response_format` from the
incoming OpenAI-compatible request. Two consequences:
1. Clients that asked for `{"type":"json_object"}` or
`{"type":"json_schema",...}` got back JSON wrapped in
```json ... ``` markdown fences, because Ollama defaults to
conversational output.
2. Strict downstream parsers (Vercel AI SDK `generateObject`,
manual `JSON.parse`) failed to decode the response and threw,
even though the underlying JSON was valid inside the fences.
Fix: when response_format is set, translate it to Ollama's native
`format` field:
- `{"type":"json_object"}` → `format: "json"`
- `{"type":"json_schema","json_schema":{"schema":{...}}}`
→ `format: <the schema dict>` (Ollama 0.5+ supports full JSON
schemas in the format field)
Defensive belt-and-suspenders: a small `_strip_json_fences` helper
runs after the Ollama response is decoded and removes any leftover
```json ... ``` wrapping. Some older vision models still wrap
output in fences even when `format` is set; this catches them.
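The provider is Python; for reference, the two moving parts rendered as a TypeScript sketch:

```ts
// Translate an OpenAI-style response_format into Ollama's `format` field.
type ResponseFormat =
  | { type: "json_object" }
  | { type: "json_schema"; json_schema: { schema: Record<string, unknown> } };

function toOllamaFormat(rf: ResponseFormat): string | Record<string, unknown> {
  // "json" forces JSON mode; a schema object (Ollama 0.5+) constrains it.
  return rf.type === "json_object" ? "json" : rf.json_schema.schema;
}

// Belt-and-suspenders: strip a ```json fence only if it wraps the
// entire payload.
function stripJsonFences(text: string): string {
  const m = text.trim().match(/^```(?:json)?\s*\n([\s\S]*?)\n```$/);
  return m ? m[1] : text;
}

stripJsonFences('```json\n{"a":1}\n```'); // → '{"a":1}'
stripJsonFences('{"a":1}');               // unchanged
```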
Streaming path is unchanged because the nutriphi/planta refactor uses
non-streaming `generateObject`. Streaming structured output with
Ollama deserves its own pass when someone actually needs it.
Discovered during the AI SDK + Zod refactor smoke test — neither the
old nor the new vision routes ever returned validated JSON locally
because of this bug. Production uses Google Gemini directly via
fallback so the issue was masked there.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
JSDOM throws CSS / parser errors from detached parse5 callbacks that
escape every try/catch in the call stack and even bun's
process.on('uncaughtException') handlers — leaving the daemon
crash-looping on the first bad page in source #4 (heise) without ever
making forward progress.
Set FULL_TEXT_THRESHOLD_WORDS = 0 so we never call into Readability.
Sources that ship full RSS bodies (Tagesschau, Spiegel, BBC, …) are
unaffected. Title-only sources (Hacker News) keep the row with an
empty content field; the reader already falls back to "Original
öffnen ↗" in that case.
Re-enabling extraction in a worker thread is left for a follow-up.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
JSDOM's CSS parser throws on plenty of real-world pages and the error
escapes every try/catch in the buildRow → ingestSource chain because
it fires from a parse5 callback that runs after JSDOM has returned.
In the prod container this killed the process on the first bad page,
docker restarted it, and it crash-looped on the same first source
forever — no progress past tech.
Two-layer fix: a silent VirtualConsole on every JSDOM instance to
swallow CSS / resource errors at the source, plus process-level
uncaughtException + unhandledRejection handlers that log and continue
so any future async escape can't kill the daemon either.
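Roughly (jsdom's VirtualConsole is the real API; handler bodies illustrative):

```ts
import { JSDOM, VirtualConsole } from "jsdom";

// Layer 1: a VirtualConsole with no listeners attached discards
// jsdomError events (CSS parse failures, resource errors) instead of
// printing or escalating them.
const virtualConsole = new VirtualConsole();

function parse(html: string): JSDOM {
  return new JSDOM(html, { virtualConsole });
}

// Layer 2: anything that still escapes async (detached parse5
// callbacks) gets logged and survived instead of killing the daemon.
process.on("uncaughtException", (err) => {
  console.error("uncaughtException (continuing):", err);
});
process.on("unhandledRejection", (reason) => {
  console.error("unhandledRejection (continuing):", reason);
});
```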
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Was copied verbatim from mana-credits' template but not actually
imported anywhere in src/. Removing it lets the Docker build's bun
install resolve from npm only — workspace:* refs need the full
monorepo context which the Dockerfile doesn't copy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds the services/news-ingester Bun service that pulls 25 public RSS/JSON
feeds into news.curated_articles every 15 min, with Mozilla Readability
fallback for thin RSS bodies and 30-day retention. apps/api /feed is
rewritten to read from the new pool table directly instead of the
sync_changes hack, with topics/lang/since/limit/offset query params.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The workbench-registry app id 'inventar' did not match its
@mana/shared-branding MANA_APPS counterpart 'inventory', so the tier-
gating join in apps/web/src/lib/app-registry/registry.ts silently
failed for the inventory module — it fell into the "no MANA_APPS
entry, default visible" fallback and was effectively un-gated. The
codebase had also voted overwhelmingly for 'inventar' (53 files) vs
'inventory' (3 files in shared-branding), so the long-standing
mismatch was just bookkeeping debt waiting to bite.
Pre-release, no live data, so the cleanest fix is to align everything
on the English 'inventory':
- Workbench-registry id, module.config.ts appId, module folder, route
folder and i18n locale folder all renamed via git mv
- Standalone apps/inventar/ workspace package renamed
- All imports, store identifiers (InventarEvents → InventoryEvents,
INVENTAR_GUEST_SEED, inventarModuleConfig), i18n keys and href/goto
paths follow the rename
- The German display label "Inventar" is preserved everywhere it is a
user-visible string (page titles, i18n values, toast labels)
- Dexie table prefixes (invCollections, invItems, …) are unchanged
- Drive-by fix: ListView.svelte was querying non-existent
inventarCollections/inventarItems tables — corrected to the actual
invCollections/invItems names from module.config
- The "inventar ↔ inventory id mismatch" workaround comment in
registry.ts is removed since the mismatch no longer exists
module-registry.ts also picks up the user's parallel newsModuleConfig
addition because both edits land in the same import block — keeping
them split would have left the build in an inconsistent state.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a "Local Login & Dev Users" section to docs/LOCAL_DEVELOPMENT.md
and a short pointer in services/mana-auth/CLAUDE.md so the next dev
finds the script without first hitting the "why can't I log in?" wall:
- Why it exists (no admin seed, requireEmailVerification + no SMTP)
- The 3 default accounts + password
- Single-account form + env overrides (TIER, AUTH_URL, …)
- Idempotency promise
- Prereqs (Postgres + mana-auth on :3001)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The /admin route in the unified Mana web app was rendering hardcoded
mock data (42 users, 156 successful logins, 3 failed) for every
admin who opened it. The previous code had a TODO comment to wire
up a real endpoint and the backend half had been waiting for the
frontend half ever since the consolidation landed.
Backend (mana-auth):
Add GET /api/v1/admin/stats — admin-only, returns the seven counts
the dashboard needs in a single response. Each count is its own
Drizzle query against auth.users / auth.sessions / auth.login_attempts;
they run in parallel via Promise.all so total latency is dominated by
the round-trip to Postgres, not the per-query work.
Stats:
- totalUsers → users where deleted_at IS NULL
- newUsers7d → users created in the last 7 days
- newUsers30d → users created in the last 30 days
- activeSessions → sessions where expires_at > now() AND not revoked
- uniqueUsers24h → distinct user_id from sessions with last_activity
in the last 24h (and not revoked)
- loginSuccess7d → login_attempts where successful=true, last 7d
- loginFailed7d → login_attempts where successful=false, last 7d
Plus a generatedAt ISO timestamp so the client can show staleness
if it ever caches the response.
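Approximate shape of the handler (drizzle handles and column names assumed from the stats list above, typed loosely for the sketch; three of the seven counts shown):

```ts
import { and, eq, gt, isNull, sql } from "drizzle-orm";

// db, users, loginAttempts are the service's existing drizzle handles.
async function getAdminStats(db: any, users: any, loginAttempts: any) {
  const d7 = new Date(Date.now() - 7 * 24 * 3600 * 1000);
  // Each count is its own small query; Promise.all runs them
  // concurrently so total latency ≈ one Postgres round-trip, not seven.
  const [[total], [ok7], [fail7]] = await Promise.all([
    db.select({ n: sql<number>`count(*)` }).from(users)
      .where(isNull(users.deletedAt)),
    db.select({ n: sql<number>`count(*)` }).from(loginAttempts)
      .where(and(eq(loginAttempts.successful, true), gt(loginAttempts.attemptedAt, d7))),
    db.select({ n: sql<number>`count(*)` }).from(loginAttempts)
      .where(and(eq(loginAttempts.successful, false), gt(loginAttempts.attemptedAt, d7))),
  ]);
  return {
    totalUsers: total.n,
    loginSuccess7d: ok7.n,
    loginFailed7d: fail7.n,
    generatedAt: new Date().toISOString(), // lets the client show staleness
  };
}
```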
Frontend (apps/mana/apps/web):
- Add adminService.getStats() in the existing admin API service
(sits next to getUsers / getUserData / deleteUserData; uses the
same authenticated base-client and ApiResult envelope).
- Replace the onMount mock-data block in admin/+page.svelte with
a single adminService.getStats() call. Drop the local Stats
interface in favor of the AdminStats type exported from the
service.
- Guard the Success Rate calculation against division by zero on
fresh deployments — when there have been no login attempts in
the last 7 days, render '—%' instead of NaN%.
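i.e. something like:

```ts
const total = stats.loginSuccess7d + stats.loginFailed7d;
// '—%' instead of NaN% on fresh deployments with zero attempts
const successRate =
  total === 0 ? "—%" : `${Math.round((stats.loginSuccess7d / total) * 100)}%`;
```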
Verification:
- mana-auth type-check unchanged (baseline errors only)
- mana-auth runtime tests still 19/19 passing
- svelte-check on the two changed web files: zero errors
Closes item #12 in docs/REFACTORING_AUDIT_2026_04.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three pnpm artifacts that were either pre-consolidation leftovers or
unintentional drift:
- apps/context/pnpm-lock.yaml + apps/context/pnpm-workspace.yaml
apps/context used to be its own nested workspace declaring
apps/* and packages/*. After consolidation only apps/context/
apps/mobile remains, and the root pnpm-workspace.yaml already
matches it via 'apps/*/apps/*'. The nested lockfile (242 KB)
was a separate dependency graph drifting independently from
the root.
- services/mana-media/packages/client/pnpm-lock.yaml
Anomalous lockfile in a workspace sub-package. The root
workspace already covers services/*/packages/* — no reason
for client/ to maintain its own resolution.
Verified after deletion:
- pnpm install completes cleanly (~16s) and now resolves
apps/context/apps/mobile from the root lockfile (pnpm list
confirms the workspace registration)
- apps/api type-check still 0 errors
- mana-auth tests still 19/19 passing
Tracked as item #26 in docs/REFACTORING_AUDIT_2026_04.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Locks in the two config locations that must agree about SSO origin
configuration, plus the invariant tying them together:
1. TRUSTED_ORIGINS in better-auth.config.ts (Better Auth allow-list)
2. CORS_ORIGINS env var on mana-auth in docker-compose.macmini.yml
3. The HTTPS subset of (1) must be a subset of (2) — every origin
Better Auth trusts must also pass CORS preflight
Background: root CLAUDE.md references this spec file as the canonical
"Adding an app to SSO" verification step (line 116) but the file
itself never existed. The first run of this spec immediately caught
two real bugs:
- 3 origins in TRUSTED_ORIGINS were missing from CORS_ORIGINS
(https://auth.mana.how, https://arcade.mana.how, https://whopxl.mana.how)
- 22 zombie subdomain entries in CORS_ORIGINS left over from before
the consolidation (calendar, chat, todo, ...) that no app actually
routes to anymore
Both fixes shipped together with the TRUSTED_ORIGINS extraction in
the broader pre-launch sweep (commit 919fcca4b). This spec is the
guard against the same drift creeping back in.
Eight tests:
- canonical mana.how + auth subdomain present
- localhost dev origins (3001, 5173) present
- all production origins HTTPS
- all production origins on *.mana.how
- no duplicates
- every HTTPS trusted origin appears in mana-auth CORS_ORIGINS
- soft warning for CORS_ORIGINS entries not in trustedOrigins
(catches drift in the other direction)
8/8 pass.
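Core of the subset assertion, roughly (bun:test; the origin loaders are illustrative — the real spec parses better-auth.config.ts and docker-compose.macmini.yml):

```ts
import { describe, expect, test } from "bun:test";
import { corsOrigins, trustedOrigins } from "./load-origins"; // illustrative

describe("SSO origin configuration", () => {
  test("every HTTPS trusted origin passes CORS preflight", () => {
    const https = trustedOrigins.filter((o) => o.startsWith("https://"));
    for (const origin of https) {
      expect(corsOrigins).toContain(origin);
    }
  });

  test("no duplicate trusted origins", () => {
    expect(new Set(trustedOrigins).size).toBe(trustedOrigins.length);
  });
});
```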
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pre-launch theme system audit found multiple parallel layers in themes.css
(--theme-X full hsl strings, --X partial shadcn aliases, --color-X populated
by runtime store with raw channels) plus dead-code companion files. The
inconsistency caused light-mode regressions when scoped-CSS consumers
wrote `var(--color-X)` standalone — the variable holds raw HSL channels
which is invalid as a color value, so the browser fell back to inherited (white).
Rewrite to one consistent layer:
- Source of truth: --color-X defined as raw HSL channels (e.g.
`0 0% 17%`) in :root, .dark, and all variant [data-theme="..."]
blocks. Matches the format the runtime store
(@mana/shared-theme/src/utils.ts) writes, eliminating the
static-fallback-vs-runtime mismatch and the corresponding flash
of unstyled content on hydration.
- @theme inline uses self-reference + Tailwind v4 <alpha-value>
placeholder so utility classes generate correctly AND opacity
modifiers work: `text-foreground/50` → `hsl(var(--color-foreground) / 0.5)`.
- @layer components (.btn-primary, .card, .badge, etc.) wraps
var(--color-X) refs with hsl() — they were broken in light mode
too for the same reason.
Convention going forward (also documented in the file header):
1. Markup: use Tailwind utility classes (text-foreground, bg-card, …)
2. Scoped CSS: hsl(var(--color-X)) — always wrap with hsl()
3. NEVER raw var(--color-X) in CSS — that's the bug pattern
Net file: 692 → 580 LOC. Single source layer, no indirection.
Also delete dead companion files (zero imports anywhere):
- tailwind-v4.css (had broken self-reference, never imported)
- theme-variables.css (legacy hex-based palette)
- components.css (legacy component utilities)
- index.js / preset.js / colors.js (Tailwind v3 preset format,
irrelevant under Tailwind v4)
package.json exports map shrinks accordingly to just `./themes.css`.
Consumers using `hsl(var(--color-X))` (~379 files across mana-web,
manavoxel-web, arcade-web) keep working unchanged — the public API
name `--color-X` is preserved. Only the broken pattern `var(--color-X)`
(~61 files) needs a follow-up sweep, handled in a separate commit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
While adding negative-path integration tests for the auth flow I
discovered that *neither* of the lockout primitives in
services/mana-auth/src/services/security.ts has actually been
working in production. Two independent silent failures that combined
into a "the lockout never triggers, ever" outcome:
1. recordAttempt() inserted into auth.login_attempts with explicit
`id = gen_random_uuid()`, but auth.login_attempts.id is a
`serial integer` column with `nextval('auth.login_attempts_id_seq')`
as default. The UUID-into-integer cast threw a type error every
single time, the bare `catch {}` swallowed it as "non-critical",
and not a single login attempt was ever persisted. Lockout's "5
failures in 15 min" check was running against an empty table.
2. checkLockout() built `attempted_at > ${new Date(...)}` via the
drizzle sql template, but postgres-js cannot bind a JS Date object
directly — it tries to byteLength() the parameter and crashes with
`Received an instance of Date`. Same anti-pattern: bare `catch`,
returns `{locked: false}` (fail-open), no log, completely invisible.
Both are "silent broken since the encryption-vault series of changes"
class — caught only because the integration test for the lockout flow
expected the 6th login attempt to return 429 and got 200 instead.
Fixes:
- recordAttempt(): drop the bogus `id` column from the INSERT (let the
sequence default assign it), default ipAddress to null instead of
letting `${undefined}` collapse the parameter slot, and surface
errors in the catch instead of swallowing them silently.
- checkLockout(): pass `windowStart.toISOString()` instead of the Date
object so postgres-js can serialize it. Same catch upgrade — log the
cause when failing open.
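Both fixes in miniature (variable and column names assumed from the description above; error handling elided):

```ts
import { sql } from "drizzle-orm";

// recordAttempt: no explicit id, so the serial column's sequence
// default assigns it. ipAddress coerced to null so ${undefined} can't
// collapse a parameter slot.
await db.execute(sql`
  INSERT INTO auth.login_attempts (email, successful, ip_address, attempted_at)
  VALUES (${email}, ${success}, ${ipAddress ?? null}, NOW())
`);

// checkLockout: per the bug above, postgres-js crashed binding a raw
// JS Date here, so serialize it first.
const windowStart = new Date(Date.now() - 15 * 60 * 1000);
const rows = await db.execute(sql`
  SELECT count(*) AS n FROM auth.login_attempts
  WHERE email = ${email}
    AND successful = false
    AND attempted_at > ${windowStart.toISOString()}
`);
```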
Failure-path test additions (tests/integration/auth-failures.test.ts):
- wrong password: assert 401, no JWT, +1 LOGIN_FAILURE in security_events,
+1 row in auth.login_attempts
- account lockout: 5 failed attempts then 6th returns 429 with
remainingSeconds, even with the correct password
- unverified email login: 403 with code = EMAIL_NOT_VERIFIED
- validate with garbage token: valid !== true
- resend verification: second mail arrives in mailpit
Plus the run-integration-tests.sh helper now runs both .test.ts files
and tests/integration/package.json's `test` script does the same.
Negative-control: reverted the recordAttempt fix (re-added the bogus
gen_random_uuid id), the wrong-password test failed at the
login_attempts assertion. Reverted the checkLockout fix, the lockout
test failed at the 429 assertion. Both fixes verified to be load-bearing.
6 tests, 45 expects, ~1.3s on a warm cache.
logEvent() builds its INSERT via a raw `sql` tagged template:
sql`INSERT INTO auth.security_events
  (..., user_id, ip_address, user_agent, metadata, ...)
  VALUES (..., ${params.userId}, ${params.ipAddress},
    ${params.userAgent}, ${...metadata}, ...)`
Most call sites only pass userId+eventType (or only eventType for the
LOGIN_FAILURE / PASSWORD_RESET_REQUESTED / PROFILE_UPDATED /
PASSWORD_CHANGED / ACCOUNT_DELETED events). The other params land in
the template as `undefined`, and postgres-js's tagged-template renderer
collapses `${undefined}` into literal nothing — producing this:
VALUES (gen_random_uuid(), $1, $2, , , $3::jsonb, NOW())
^^^^
Postgres rejects with "syntax error at or near \",\"". The catch block
swallowed it as a `console.warn('Failed to log security event
(non-critical):', params.eventType)` with no error detail, which is why
this has been silently broken for who knows how long — every register,
every login, every password change has been losing its audit row.
Fix:
- Coerce optional params to `null` (`params.userId ?? null`) before
interpolation. NULL is what postgres-js renders for an explicit null.
- Surface the actual error in the catch warn so the next time something
similar happens it shows up in logs instead of just "non-critical".
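So the interpolations become (full column list filled in illustratively; params/sql as in security.ts):

```ts
// Optional params coerced to null: postgres-js renders an explicit null
// as NULL, while undefined used to collapse the parameter slot entirely.
await sql`INSERT INTO auth.security_events
  (id, event_type, user_id, ip_address, user_agent, metadata, created_at)
  VALUES (gen_random_uuid(), ${params.eventType}, ${params.userId ?? null},
          ${params.ipAddress ?? null}, ${params.userAgent ?? null},
          ${JSON.stringify(params.metadata ?? {})}::jsonb, NOW())`;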
Verified the diagnosis by toggling `log_statement = all` on the test
postgres, triggering a register, and reading the literal failed
statement out of postgres logs.
A grep audit after the previous matrix removal commits found a handful
of stragglers in non-runtime files that the earlier sweeps missed:
- services/mana-llm/CLAUDE.md: removed matrix-ollama-bot from the
consumer-apps diagram and from the related-services table
- services/mana-video-gen/CLAUDE.md: removed "Matrix Bots" integration
bullet
- packages/notify-client/README.md: removed sendMatrix() doc entry
(the method itself was already gone in the prior cleanup)
- docker/grafana/dashboards/logs-explorer.json: dropped the "Matrix
Stack" log row that queried tier="matrix" (would show no data forever)
- docker/grafana/dashboards/master-overview.json: dropped the "Matrix
Bots" stat panel that counted up{job=~"matrix-.*-bot"}
- apps/mana/apps/landing/src/data/ecosystem-health.json: regenerated via
scripts/ecosystem-audit.mjs to drop matrix from the app list, icon
counts, file analytics, top offenders and authGuard missing list
- .gitignore: removed services/matrix-stt-bot/data/ pattern (the
service itself was deleted long ago)
Production-side stragglers also addressed (not in this commit):
- DROP USER synapse on prod Postgres (the parallel cleanup commit
2514831a3 dropped DATABASE matrix + DATABASE synapse but left the
role behind)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The matrix subsystem was removed in a prior commit. This commit cleans
up the small leftovers that grep found:
- docker-compose.macmini.yml: dropped the "Matrix Stack" port-range
comment, the "matrix" category from the naming convention, and a
stale watchtower comment about Matrix notifications.
- packages/credits/src/operations.ts: removed AI_BOT_CHAT credit
operation type and its definition. It was the billing entry for "Chat
with AI via Matrix bot" — no callers left.
- services/mana-credits gifts schema + service + validation: removed the
targetMatrixId column / param / Zod field. The corresponding
PostgreSQL column was dropped manually with
`ALTER TABLE gifts.gift_codes DROP COLUMN target_matrix_id` on prod.
- docker/grafana/dashboards/{master,system}-overview.json: removed the
`up{job="synapse"}` panel queries — they would have shown No Data
forever now that Synapse is gone.
Production-side cleanup performed in parallel (not in this commit):
- Stopped + removed mana-matrix-{synapse,element,web,bot} containers
- Removed mana-matrix-bot:local, matrix-web:latest,
matrixdotorg/synapse:latest, vectorim/element-web:latest images (~3 GB)
- Removed mana-matrix-bots-data Docker volume
- Removed /Volumes/ManaData/matrix/ media store (4.3 MB)
- DROP DATABASE matrix; DROP DATABASE synapse; on Postgres
Cosmetic leftovers intentionally untouched:
- Eisenhower matrix in todo (LayoutMode 'matrix') — productivity concept
- ${{ matrix.service }} in .github/workflows — GitHub Actions strategy
- services/mana-media/apps/api/dist/.../matrix/* — stale build output
(not in git, regenerated next mana-media build)
This commit bundles two unrelated changes that were swept together by an
accidental `git add -A` in another working session. Documented here so the
history reflects what's actually inside.
═══════════════════════════════════════════════════════════════════════
1. fix(mana-auth): /api/v1/auth/login mints JWT via auth.handler instead
of api.signInEmail
═══════════════════════════════════════════════════════════════════════
Previous attempt (commit 55cc75e7d) tried to fix the broken JWT mint in
/api/v1/auth/login by switching the cookie name from `mana.session_token`
to `__Secure-mana.session_token` for production. That was necessary but
not sufficient: Better Auth's session cookie value isn't just the raw
session token, it's `<token>.<HMAC>` where the HMAC is derived from the
better-auth secret. Reconstructing the cookie from auth.api.signInEmail's
JSON response only gave us the raw token, so /api/auth/token's
get-session middleware still couldn't validate it and the JWT mint kept
silently failing.
Real fix: do the sign-in via auth.handler (the HTTP path) rather than
auth.api.signInEmail (the SDK path). The handler returns a real fetch
Response with a Set-Cookie header containing the fully signed cookie
envelope. We capture that header verbatim and forward it as the cookie
on the /api/auth/token request, which now passes validation and mints
the JWT correctly.
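In outline (auth.handler is Better Auth's fetch-style handler; URLs and body shapes follow its HTTP API, everything else elided):

```ts
// auth = betterAuth(...) instance from better-auth.config.ts.
// Sign in via the HTTP path so Better Auth itself signs the cookie.
const signInResponse = await auth.handler(
  new Request("http://internal/api/auth/sign-in/email", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ email, password }),
  }),
);

// The fully signed cookie envelope (<token>.<HMAC>) lives here:
const setCookie = signInResponse.headers.get("set-cookie") ?? "";

// Forward it verbatim so /api/auth/token's get-session middleware
// can validate the session and mint the JWT.
const tokenResponse = await auth.handler(
  new Request("http://internal/api/auth/token", {
    headers: { cookie: setCookie },
  }),
);
```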
Verified end-to-end on auth.mana.how:
$ curl -X POST https://auth.mana.how/api/v1/auth/login \
-d '{"email":"...","password":"..."}'
{
"user": {...},
"token": "<session token>",
"accessToken": "eyJhbGciOiJFZERTQSI...", ← real JWT now
"refreshToken": "<session token>"
}
Side benefits:
- Email-not-verified path is now handled by checking
signInResponse.status === 403 directly, no more catching APIError
with the comment-noted async-stream footgun.
- X-Forwarded-For is forwarded explicitly so Better Auth's rate limiter
and our security log see the real client IP.
- The leftover catch block now only handles unexpected exceptions
(network errors etc); the FORBIDDEN-checking logic in it is dead but
harmless and left in for defense in depth.
═══════════════════════════════════════════════════════════════════════
2. chore: remove the entire self-hosted Matrix stack (Synapse, Element,
Manalink, mana-matrix-bot)
═══════════════════════════════════════════════════════════════════════
The Matrix subsystem ran parallel to the main Mana product without any
load-bearing integration: the unified web app never imported matrix-js-sdk,
the chat module uses mana-sync (local-first), and mana-matrix-bot's
plugins duplicated features the unified app already ships natively.
Keeping it alive cost a Synapse + Element + matrix-web + bot container
quartet, three Cloudflare routes, an OIDC provider plugin in mana-auth,
and a steady drip of devlog/dependency churn.
Removed:
- apps/matrix (Manalink web + mobile, ~150 files)
- services/mana-matrix-bot (Go bot with ~20 plugins)
- docker/matrix configs (Synapse + Element)
- synapse/element-web/matrix-web/mana-matrix-bot services in
docker-compose.macmini.yml
- matrix.mana.how/element.mana.how/link.mana.how Cloudflare tunnel routes
- OIDC provider plugin + matrix-synapse trustedClient + matrixUserLinks
table from mana-auth (oauth_* schema definitions also removed)
- MatrixService import path in mana-media (importFromMatrix endpoint)
- Matrix notification channel in mana-notify (worker, metrics, config,
channel_type enum, MatrixOptions handler)
- Matrix entries from shared-branding (mana-apps + app-icons),
notify-client, the i18n bundle, the observatory map, the credits
app-label list, the landing footer/apps page, the prometheus + alerts
+ promtail tier mappings, and the matrix-related deploy paths in
cd-macmini.yml + ci.yml
Devlog/manascore/blueprint entries that mention Matrix are left intact
as historical record. The oauth_* + matrix_user_links Postgres tables
stay on existing prod databases — code can no longer write to them, drop
them in a follow-up migration if you want them gone for real.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The custom /api/v1/auth/login route signs the user in via the
better-auth SDK (auth.api.signInEmail) and then forges a request to
/api/auth/token to mint a JWT, passing the session token as a synthetic
cookie header.
The cookie name was hardcoded as `mana.session_token=...`, but in
production better-auth issues the session cookie with the __Secure-
prefix (because secure: true is enabled). Get-session middleware on the
/api/auth/token side couldn't find the session under the unprefixed
name, so it returned 401 silently. Result: tokenResponse.ok was false,
the route fell through, and the response had no `accessToken` field at
all — only the bare { token, user, redirect } from signInEmail.
The frontend in @mana/shared-auth then picked this up as
`data.accessToken === undefined` and stored undefined as the JWT, while
the parallel /api/auth/sign-in/email call masked the visible damage by
setting the SSO cookie. So login *appeared* to work in the browser
(cookie present, session worked) but the JWT path was always broken.
Fix: pick the cookie name based on config.nodeEnv. In production use
__Secure-mana.session_token, in development use mana.session_token (no
__Secure- prefix because secure: false in dev).
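i.e.:

```ts
const cookieName =
  config.nodeEnv === "production"
    ? "__Secure-mana.session_token" // secure: true in prod adds the prefix
    : "mana.session_token";         // no prefix in dev (secure: false)
```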
Verified end-to-end on auth.mana.how:
POST /api/v1/auth/login → response now includes accessToken (a real
JWT, EdDSA, with sub/email/role/sid/tier/iss/aud claims), refreshToken
(the session token), plus the original signInEmail fields.
The other /api/auth/get-session call sites in this file forward the
incoming request headers verbatim, so they preserve whatever real cookie
the browser sent and don't have this bug.
mana-auth has been crash-looping in production with:
error: Cannot find package 'nanoid' from
'/app/src/services/encryption-vault/index.ts'
The encryption-vault service imports nanoid for audit row IDs (line 27,
used at line 547 in the audit log writer), but nanoid was never added
to services/mana-auth/package.json. The import was introduced in commit
e9915428c (phase 2 — server-side master key custody) and slipped past
because nanoid happens to exist transitively in the workspace via
postcss → nanoid@3.3.11. Local pnpm store lookups would resolve it just
fine; a strict isolated container build can't.
Fix:
- Add "nanoid": "^5.0.0" to services/mana-auth/package.json deps
- pnpm install pulled nanoid@5.1.7 into services/mana-auth/node_modules
Verified the import resolves locally:
bun -e 'import { nanoid } from "nanoid"; console.log(nanoid())'
→ ok: 6TLuTWlenhC0KnSESn5Ex
The Mac Mini still needs to redeploy mana-auth (rebuild image with the
new lockfile, restart container) to pick this up — production is
currently 502ing on auth.mana.how.
mana-voice-bot's source default was 3050, which collided with mana-sync.
Today the collision is latent (voice-bot isn't deployed anywhere), but
sooner or later someone is going to start it on a host that's already
running mana-sync and the second one will refuse to bind. Moving to
3024 puts it inside the AI/ML port range alongside its dependencies
(stt 3020, tts 3022, image-gen 3023, llm 3025) and away from sync.
Updated:
- app/main.py — PORT default 3050 → 3024
- start.sh, setup.sh — same fix in the example commands
- CLAUDE.md — full rewrite. Old version described "Mac Mini deployment"
with launchd; the new version explicitly says "not deployed yet" and
documents the seven concrete steps to deploy on the Windows GPU box
alongside the other AI services (Scheduled Task, service.pyw, .env,
firewall rule, cloudflared route, WINDOWS_GPU_SERVER_SETUP.md update).
docs/WINDOWS_GPU_SERVER_SETUP.md:
- Added the missing ManaVideoGen scheduled task to all four
Start-ScheduledTask snippets — video-gen has been running on the
Windows GPU but the doc had never picked it up.
- Added a "mana-video-gen (Port 3026)" service section parallel to the
existing image-gen one, with venv path, repo pointer, model, etc.
- Added a repo-pendants table mapping C:\mana\services\<svc>\ to the
corresponding services/<svc>/ directory in the repo, plus a note that
changes should flow repo→Windows, not the other way around.
docs/PORT_SCHEMA.md:
- Reconciled the warning block with the post-cleanup reality: no more
active or latent port collisions (image-gen ↔ video-gen and
voice-bot ↔ sync are both resolved). Listed the actual ports per host
with public URLs. Kept the planned-vs-actual disclaimer for the
services that still don't match the aspirational ranges (mana-credits
3061 vs planned 3002, etc).
The Mac Mini hasn't run mana-llm/stt/tts/image-gen for a while — those
services live on the Windows GPU server now. The Mac-targeted
installers, plists, and platform-checking setup scripts have been
sitting in the repo as cargo-cult, suggesting Mac Mini deployment is
still a real option. It isn't.
Removed (Mac-Mini deployment infrastructure):
services/mana-stt/
- com.mana.mana-stt.plist (LaunchAgent)
- com.mana.vllm-voxtral.plist (LaunchAgent for the abandoned local Voxtral experiment)
- install-service.sh (single-service launchd installer)
- install-services.sh (mana-stt + vllm-voxtral installer)
- setup.sh (Mac arm64 installer)
- scripts/setup-vllm.sh (vLLM-Voxtral setup)
- scripts/start-vllm-voxtral.sh
services/mana-tts/
- com.mana.mana-tts.plist
- install-service.sh
- setup.sh (Mac arm64 installer)
scripts/mac-mini/
- setup-image-gen.sh (Mac flux2.c launchd installer)
- setup-stt.sh
- setup-tts.sh
- launchd/com.mana.image-gen.plist
- launchd/com.mana.mana-stt.plist
- launchd/com.mana.mana-tts.plist
setup-tts-bot.sh stays — it's the Matrix TTS bot installer (Synapse
side), not the mana-tts service.
Updated:
- services/mana-stt/CLAUDE.md, README.md — fully rewritten for the
Windows GPU reality (CUDA WhisperX, Scheduled Task ManaSTT, .env keys
matching the actual production .env on the box)
- services/mana-tts/CLAUDE.md, README.md — same treatment, documenting
Kokoro/Piper/F5-TTS on the Windows GPU under Scheduled Task ManaTTS
- scripts/mac-mini/README.md — dropped the STT setup section, replaced
with a pointer to docs/WINDOWS_GPU_SERVER_SETUP.md and the per-service
CLAUDE.md files
- docs/MAC_MINI_SERVER.md — expanded the "deactivated launchagents"
list to mention the now-removed plists, added the full GPU service
port table with public URLs, added a cleanup snippet for any old plists
still installed on a Mac Mini somewhere
The repo's mana-image-gen used to be a Mac Mini–only service built on
flux2.c with hard MPS+arm64 platform checks. The actual production
image-gen runs on the Windows GPU server (RTX 3090) using HuggingFace
diffusers + PyTorch CUDA + FLUX.1-schnell — completely different code
that lived only at C:\mana\services\mana-image-gen\ on the GPU box.
This commit pulls the Windows implementation into the repo and deletes
the Mac one, so there's exactly one mana-image-gen and its source of
truth is git rather than one folder on one machine.
Removed:
- setup.sh — Mac-only flux2.c installer with hard arm64 platform check
- app/main.py (Mac flux2.c subprocess wrapper version)
- app/flux_service.py (Mac flux2.c subprocess wrapper version)
Added (pulled from C:\mana\services\mana-image-gen\):
- app/main.py — FastAPI endpoints (/generate, /images/*, /cleanup)
- app/flux_service.py — diffusers FluxPipeline wrapper
- app/api_auth.py — ApiKeyMiddleware (GPU_API_KEY)
- app/vram_manager.py — shared VRAM accounting
- service.pyw — Windows runner used by the ManaImageGen scheduled task
Updated:
- main.py PORT default from 3025 → 3023 to match the production reality
(the service.pyw runner already binds 3023 explicitly via uvicorn.run,
but the source default should match so direct uvicorn invocations and
local tests don't pick the wrong port)
- CLAUDE.md fully rewritten to describe the Windows/CUDA/diffusers stack
- README.md trimmed to a pointer at CLAUDE.md + the public URL
- .env.example written from scratch (didn't exist before — the service's
.env on the GPU box was undocumented)
The setup-image-gen.sh launchd installer in scripts/mac-mini/ and the
actual Mac Mini deployment will be cleaned up in the next commit, along
with the rest of the Mac-Mini AI service infrastructure.
The Windows GPU server has been the actual production home for these
services for some time, and the running code there has drifted ahead of
the repo. This sync pulls the live versions back into the repo so the
Windows box is no longer the only place those changes exist.
Pulled from C:\mana\services\* on mana-server-gpu (192.168.178.11):
mana-llm:
- src/main.py, src/config.py — small fixes (auth wiring, config tweaks)
- src/api_auth.py — NEW (cross-service GPU_API_KEY validator)
- service.pyw — Windows runner used by the ManaLLM scheduled task
(sets up logging redirect, loads .env, calls uvicorn)
mana-stt:
- app/main.py — substantial cleanup (684→392 lines), drops the
whisperx-as-separate-backend branching now that whisper_service.py
rolls whisperx in directly
- app/whisper_service.py — full CUDA + whisperx rewrite (158→358 lines)
- app/auth.py + external_auth.py — significantly expanded auth
- app/vram_manager.py — NEW (shared VRAM accounting helper)
- service.pyw — Windows runner with CUDA pre-init, FFmpeg PATH
injection, .env loading
- removed: app/whisper_service_cuda.py (folded into whisper_service.py)
- removed: app/whisperx_service.py (folded into whisper_service.py)
mana-tts:
- app/auth.py, external_auth.py — same auth expansion as stt
- app/f5_service.py, kokoro_service.py — Windows tweaks
- app/vram_manager.py — NEW (same shared helper as stt)
- service.pyw — Windows runner
mana-video-gen:
- service.pyw — Windows runner (no other changes; the .py code on the
GPU box is byte-identical to what's already in the repo)
The service.pyw files contain absolute Windows paths
(C:\mana\services\<svc>) and a hardcoded FFmpeg PATH for the tills user
profile. Kept as-is intentionally — they exist to be deployed to that
one machine and any abstraction layer would just hide what's actually
happening. Anyone redeploying to a different layout will need to edit
the path strings, which is a known and obvious change.
Mac-Mini infrastructure for these services (launchd plists, install
scripts, scripts/mac-mini/setup-{stt,tts}.sh, the Mac-flux2c image-gen
implementation) is still on disk and will be removed in a follow-up
commit, along with replacing mana-image-gen with the Windows
diffusers+CUDA implementation. This commit is just the live-code sync.
Source default was 3026 but Mac Mini production has been overriding to
3025 via the launchd plist in scripts/mac-mini/setup-image-gen.sh ever
since the service was set up. The override existed in exactly one place
that is not version-controlled in any obvious way — anyone redeploying
without that script would land on 3026 and clients pointing at 3025
would fail to connect.
Source default → 3025 across main.py, setup.sh, README, CLAUDE.md so the
launchd plist is no longer load-bearing. The Mac Mini setup script still
sets PORT=3025 explicitly; that's now belt-and-suspenders rather than the
only thing keeping production alive.
Also added a note clarifying that this Mac Mini service (flux2.c, MPS,
arm64-only) is *not* the same thing as the "image-gen" running on the
Windows GPU server (PyTorch + diffusers + CUDA, port 3023, code lives at
C:\mana\services\mana-image-gen\ outside this repo). Two different
implementations sharing a name kept confusing the port-collision audit.
Updated docs/PORT_SCHEMA.md warning block to retract the previous false
claims of two active port collisions:
- image-gen ↔ video-gen on 3026 — wrong: image-gen runs on Mac Mini
on 3025 (now also the source default), video-gen is alone on the
Windows GPU on 3026
- voice-bot ↔ sync on 3050 — latent only: mana-voice-bot is not
deployed anywhere (no launchd, no scheduled task, no cloudflared
route), so the collision is in source defaults but not in production
The voice-bot 3050 default should still be moved before voice-bot is
ever deployed — flagged in the PORT_SCHEMA warning instead of silently
fixed since voice-bot deployment is its own decision.
New service docs:
- services/mana-stt/CLAUDE.md — FastAPI surface with Whisper MLX (local),
WhisperX (rich), and Voxtral (local + Mistral API). Documents the lazy
backend loading and the launchd plist setup on the Mac Mini.
- services/mana-events/CLAUDE.md — Hono/Bun service for public RSVP and
event-sharing. Documents the host (JWT) vs public (token) split, the
rate-limit sweeper, and the createApp factory pattern that lets unit
tests run without bootstrapping the production sweeper.
Stale entries fixed:
- mana-auth: dropped "rewritten from NestJS / drop-in replacement" — the
rewrite is the only mana-auth there is now. Email channel updated from
Brevo SMTP to self-hosted Stalwart (see docs/MAIL_SERVER.md).
- mana-notify: same Brevo → Stalwart fix in the channel table and env
var defaults.
PORT_SCHEMA.md flagged as aspirational:
- The doc was dated 2026-03-28 and presented as "single source of truth",
but cross-checking against actual service source files (config.go,
main.py, start.sh) shows nothing matches. Added a prominent warning at
the top with the real ports + two confirmed collisions:
* mana-image-gen and mana-video-gen both default to PORT 3026
* mana-voice-bot and mana-sync both default to PORT 3050
Today these are masked because image-gen + voice-bot live on the
Windows GPU server while video-gen + sync live on the Mac Mini, but
the moment they share a host they collide. Either execute the planned
reorg or pick non-colliding ports and rewrite the doc to match
reality — flagged as a real follow-up.
PyTorch's `torch.cuda.get_device_properties(0)` returns a
`_CudaDeviceProperties` object whose memory attribute is
`total_memory` (bytes), not `total_mem`. The typo crashed the
service immediately at startup because `get_model_info()` is
called from the FastAPI lifespan handler, not lazily — uvicorn
logged "Application startup failed" before any request could land.
Found while installing mana-video-gen on the Windows GPU box
(192.168.178.11:3026) for the gpu-video.mana.how Cloudflare route.
After the fix the service starts cleanly under the ManaVideoGen
scheduled task and responds 200 on /health both LAN and via
Cloudflare tunnel. status.mana.how now reports 42/42 — first time
ever.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>