**Unit tests (`bun test`, 42 checks, 0 deps)**
- `src/lib/__tests__/category-map.test.ts` locks in the Pelias→
PlaceCategory priority resolution. Covers the ambiguous multi-category
case (food beats retail for restaurants, transit beats professional
for car rentals, transport:rail still maps to transit, …), the simple
single-category paths, the layer-hint fallback, and regression cases
from real Konstanz/Stuttgart/Köln venues observed during deploy
verification.
- `src/lib/__tests__/cache.test.ts` covers LRU eviction order, TTL
expiry, move-to-end on get (so frequently-read entries survive
eviction), size tracking, and typed-value storage.
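For reference, a minimal LRU+TTL sketch of the behavior those checks pin down (illustrative, not the actual src/lib/cache.ts API):

```ts
// Minimal LRU + TTL cache. Map preserves insertion order, so the first
// key is always the least recently used one.
class LruCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();
  constructor(private maxSize: number, private ttlMs: number) {}

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) { // TTL expiry
      this.store.delete(key);
      return undefined;
    }
    this.store.delete(key); // move-to-end on get:
    this.store.set(key, entry); // re-insert as newest
    return entry.value;
  }

  set(key: string, value: V): void {
    if (this.store.has(key)) this.store.delete(key);
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    if (this.store.size > this.maxSize) {
      // evict the least recently used entry (first in insertion order)
      const oldest = this.store.keys().next().value!;
      this.store.delete(oldest);
    }
  }

  get size(): number {
    return this.store.size;
  }
}
```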
**Smoke test (`./scripts/smoke-test.sh` or `bun run test:smoke`)**
End-to-end curls against a running service, aimed at post-deploy
verification. Health endpoints, forward (venue + street fallback),
focus biasing, reverse geocoding, cache hit. 9 checks total.
Wired up as `test:smoke` in package.json so it runs alongside the
unit tests. Verified working: 42/42 unit tests green locally, 9/9
smoke checks green against the live Mac Mini deployment.
CLAUDE.md Testing section rewritten to reflect the new test layers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After the 2026-04-11 production deploy, several non-obvious gotchas
surfaced that needed documenting:
- Forward search: autocomplete→search fallback explained, so future-me
knows why the handler hits two Pelias endpoints for address-style
queries.
- Pelias infra: corrected object counts (13.4M actual, not 22M), noted
the libpostal RAM surprise (~1.9 GB, much larger than Pelias docs
suggest), and added real per-container RAM numbers from production.
- pelias.json: document that we dropped placeholder/pip/interpolation
(rather than just how to run them) and why the clean libpostal-only
degradation matters.
- Wrapper gotchas section: Bun idleTimeout, Colima bind-mount cache
staleness, and the host.docker.internal-from-blackbox workaround.
- /health/pelias endpoint is now listed in the API table since it's
the integration point with blackbox monitoring.
- Testing section added — explicitly "no automated tests yet", with a
curl-based manual smoke test set a human can run after changes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pelias /autocomplete deliberately excludes the address layer as a
performance optimization, so queries like "Marktstätte Konstanz"
(street + locality) return 0 venue matches even though they're clearly
in the index. /search covers all layers including addresses and streets.
Query /autocomplete first (fast, fuzzy, great for venue names), and if
it returns nothing, try /search. Best of both worlds: quick matches for
"Konzil Restaurant" plus reliable matches for street addresses.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two production follow-ups surfaced after the deploy:
1. Pelias API was emitting continuous `ENOTFOUND placeholder`, `pip`,
`interpolation` errors because we declared those services in
pelias.json but never actually ran them (we don't need WOF
admin lookup or street interpolation for the DACH use case).
Removed the stale entries — Pelias degrades cleanly to
libpostal-only parsing, which is what we want.
2. Bun.serve's default idleTimeout is 10s, which is too tight for
cold Pelias queries hitting Elasticsearch. Raise to 60s so
first-query-after-idle doesn't get cut off.
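In Bun.serve terms (the value is in seconds; real fetch handler elided):

```ts
// idleTimeout is in seconds; the default of 10 cut off cold Pelias
// queries that had to warm Elasticsearch first.
Bun.serve({
  port: 3018,
  idleTimeout: 60,
  fetch(req) {
    return new Response("ok"); // placeholder handler
  },
});
```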
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
blackbox-exporter can't resolve host.docker.internal on Colima, so
probes of host.docker.internal:4000 and :9200 always fail. Instead,
add a /health/pelias endpoint on the Hono wrapper that proxies to
the Pelias API, and update prometheus.yml to probe the wrapper's
proxied health endpoint.
Also simplifies the status page friendly_name() now that we don't
need to display the host.docker.internal targets.
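Sketch of the proxy route (env var name illustrative; the real wrapper reaches Pelias via host.docker.internal):

```ts
import { Hono } from "hono";

const app = new Hono();
// Illustrative env var; defaults to the host-side Pelias API.
const PELIAS_URL = process.env.PELIAS_URL ?? "http://host.docker.internal:4000";

app.get("/health/pelias", async (c) => {
  try {
    const res = await fetch(`${PELIAS_URL}/v1/status`);
    return c.json({ ok: res.ok }, res.ok ? 200 : 502);
  } catch {
    // Pelias unreachable: report down so blackbox sees a failure.
    return c.json({ ok: false }, 502);
  }
});
```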
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Port 4400 collides with mana-infra-landings (status.mana.how nginx)
on the production mac mini. libpostal is only reached internally by
pelias-api over the pelias compose network anyway — no host binding
needed. Use expose instead of ports to drop the host mapping.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use node:22-alpine + pnpm to install workspace dependencies, then copy
node_modules into the bun runtime stage. This resolves @mana/shared-hono
which depends on @mana/shared-logger (transitive workspace dep).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bun install doesn't read pnpm-workspace.yaml, so workspace dependencies
like @mana/shared-hono can't be resolved. Switch to pnpm install with
--filter to install only mana-credits and its workspace deps.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous version chained cd + bun install with || fallback, which
left CWD in services/mana-credits after the first attempt and caused the
fallback cd to fail. Use WORKDIR directives instead — each step starts
from a known absolute path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Production deployment + observability for the self-hosted geocoding stack:
**docker-compose.macmini.yml**
- New mana-geocoding container (port 3018, internal-only — no traefik
labels, no Cloudflare route). Uses host.docker.internal to reach the
Pelias API on the host's pelias compose stack. Dockerfile added under
services/mana-geocoding/ using the same Bun/Hono pattern as mana-events.
**Prometheus**
- New blackbox-internal job probing mana-geocoding:3018/health, the
Pelias API on host.docker.internal:4000/v1/status, and Elasticsearch
at host.docker.internal:9200/_cluster/health. Kept separate from
blackbox-api which is reserved for public HTTPS endpoints.
**status.mana.how (generate-status-page.sh)**
- Include blackbox-internal in the metric query and add an "Interne
Dienste" section with its own summary card, right between Infrastruktur
and GPU Dienste. Summary grid goes from 4 to 5 columns with a
900px breakpoint.
- friendly_name() now handles http:// URLs and rewrites container-name
hosts like mana-geocoding:3018/health → "Mana Geocoding",
host.docker.internal:4000 → "Pelias API",
host.docker.internal:9200 → "Pelias Elasticsearch".
**Grafana uptime dashboard**
- Add an "Internal" series to the "Alle Dienste — Uptime-Verlauf" panel
- New "Interne Dienste Status" table panel showing per-instance up/down
- New "Geocoding Ø Latenz" stat panel for probe_duration_seconds
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Expand services/mana-geocoding/CLAUDE.md with:
- The Pelias API patch (geojsonify_place_details.js) that forces the
category field to always be returned, with regeneration instructions
- The priority-ordered Pelias→PlaceCategory mapping and verified
example mappings from the DACH index
- A full initial-import walkthrough covering the non-obvious gotchas
(analysis-icu plugin, dach-latest → planet-latest rename, adminLookup
disabled, leveldbpath, libpostal config object form, boundary.country
single-value constraint)
Also register mana-geocoding in the root services list.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Dockerfile only copied services/mana-sync, but go.mod has a replace
directive pointing to ../../packages/shared-go which needs to be in the
build context. Switch context to repo root and copy both packages.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pelias hides the 'category' field from API responses unless the
caller filters by categories=... explicitly — a default intended for
keyword search that strips category metadata from address queries.
Patch the Pelias API's geojsonify_place_details.js so the category
array is returned on every feature (food, retail, transport, …),
mounted into the container as a read-only volume override.
Rewrite category-map.ts to map Pelias' OSM taxonomy to our 7
PlaceCategories using a priority-ordered list so a restaurant
tagged ['food','retail','nightlife'] resolves to 'food' (the most
specific), not 'shopping'.
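In sketch form (the real table lives in category-map.ts; the exact priority entries here are illustrative):

```ts
type PlaceCategory =
  | "food" | "shopping" | "transit" | "work" | "leisure" | "health" | "other";

// Ordered most-specific-first: the first priority entry that matches
// any of the feature's Pelias categories wins.
const PRIORITY: Array<[prefix: string, category: PlaceCategory]> = [
  ["food", "food"],
  ["nightlife", "leisure"],
  ["transport", "transit"],
  ["health", "health"],
  ["retail", "shopping"],
  ["professional", "work"],
];

function resolveCategory(peliasCategories: string[]): PlaceCategory {
  for (const [prefix, category] of PRIORITY) {
    if (peliasCategories.some((c) => c === prefix || c.startsWith(`${prefix}:`))) {
      return category; // handles both 'transport' and 'transport:rail'
    }
  }
  return "other";
}

// ['food','retail','nightlife'] → 'food', not 'shopping'
```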
Verified with Konstanz test queries:
Konzil Restaurant → food
Bahnhof Konstanz → transit
Physiotherapie-Schule → work
MX-Park → leisure
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After importing 22M OSM objects for the DACH extract:
- Disable adminLookup (no WOF data needed for address search)
- Configure leveldb path inside the data volume
- Specify planet-latest.osm.pbf as the import filename
- Convert libpostal service config from string to object form
- Drop boundary.country default — Pelias only accepts a single
country value, and our index only contains DACH data anyway
Verified forward + reverse geocoding work end-to-end for Konstanz
test queries via the mana-geocoding wrapper on port 3018.
Known limitation: OSM category/type (amenity:restaurant etc.) is
not yet populated in Pelias responses — will require whitelisting
those tags in the importer config and re-running the import.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Dockerfile copied only its own package.json, causing bun install to
fail on @mana/shared-hono workspace dependency. Now copies workspace root
package.json and shared-hono/shared-types packages.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New mana-geocoding service (port 3018) wraps a self-hosted Pelias
instance with LRU caching and OSM→PlaceCategory auto-mapping.
All geocoding queries stay within our infrastructure — no user
location data leaves the network.
Places module integration:
- Address autocomplete search in ListView (creates place with
name, coords, address, category in one step)
- Address search + reverse geocoding button in DetailView
- Auto-fill address via reverse geocoding during tracking
- OSM category mapping (amenity:restaurant→food, shop:*→shopping, etc.)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add MANA_CREDITS_URL and MANA_SERVICE_KEY to configuration table
- Document billing gate on sync endpoints (402 behavior, 5min cache,
  fail-open; see the sketch after this list)
- Add billing/check.go to project structure
- Add stream endpoint to API table
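The gate itself lives in Go (billing/check.go); this TypeScript sketch only illustrates the documented behavior, with illustrative route and header names:

```ts
import type { MiddlewareHandler } from "hono";

// 5-minute cache of billing answers so the gate doesn't call
// mana-credits on every sync request.
const cache = new Map<string, { active: boolean; expiresAt: number }>();
const TTL_MS = 5 * 60 * 1000;

export const billingGate: MiddlewareHandler = async (c, next) => {
  const userId = c.req.header("x-user-id") ?? ""; // however auth identifies the caller
  let entry = cache.get(userId);
  if (!entry || Date.now() >= entry.expiresAt) {
    let active = true; // fail-open default if mana-credits is unreachable
    try {
      const res = await fetch(
        `${process.env.MANA_CREDITS_URL}/internal/sync/status?userId=${userId}`,
        { headers: { "x-service-key": process.env.MANA_SERVICE_KEY ?? "" } },
      );
      active = ((await res.json()) as { active: boolean }).active;
    } catch {
      // fail-open: sync availability must not depend on billing uptime
    }
    entry = { active, expiresAt: Date.now() + TTL_MS };
    cache.set(userId, entry);
  }
  if (!entry.active) return c.json({ error: "sync subscription inactive" }, 402);
  await next();
};
```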
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cloud Sync is now a paid feature: 30 credits/month (90/quarter, 360/year).
Users start in local-only mode and opt-in via Settings > Cloud Sync.
1 Credit = 1 Cent, so sync costs ~0.30€/month.
When credits run out, sync is paused (not deleted) and an in-app banner
prompts the user to top up. Local data is always preserved.
Backend (mana-credits):
- New sync_subscriptions table in credits schema
- SyncBillingService with activate/deactivate/chargeRecurring
- User-facing routes: GET/POST /api/v1/sync/{status,activate,deactivate,change-interval}
- Internal routes for server-side checks and cron triggers
Frontend (mana web):
- Sync API client + reactive sync-billing store
- syncEnabled parameter gates createUnifiedSync() — sync only starts when active
- Settings sync page with interval selection and activate/deactivate
- Pause banner in app layout when credits insufficient
Also: removed CALDAV_SYNC/GOOGLE_SYNC operations (not needed),
updated CLOUD_SYNC cost from 5 to 30 credits/month.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The credit system was overengineered for the local-first architecture:
- Productivity micro-credits (task/event/contact creation at 0.02 credits) made no sense
since these operations happen locally in IndexedDB with zero server cost and were never enforced
- Guild pool system (6 DB tables, spending limits, membership checks) had no active users
- Gift system had 5 types (simple/personalized/split/first_come/riddle) when 2 suffice
Now credits are only charged for operations that actually cost money: AI API calls and
premium features (sync, exports). This makes the value proposition clear to users.
Changes:
- Remove 8 productivity operations + CreditCategory.PRODUCTIVITY from @mana/credits
- Delete guild pool service, routes, schema (3 files); remove guild refs from 8 backend files
- Simplify gifts to simple + personalized only; remove bcrypt/riddle/portions logic
- Update all frontend pages (credits dashboard, gift create/redeem, public gift page)
- Update shared-hono consumeCredits() to remove creditSource parameter
- Update mana-credits CLAUDE.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Redis: allkeys-lru → noeviction to prevent silent data loss when memory full
- mana-media: --watch → --hot to fix EADDRINUSE crash on Bun HMR reload
- Svelte: build initial values before $state() to avoid state_referenced_locally warnings
in create-app-onboarding.svelte.ts and shared-llm/store.svelte.ts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The media schema/tables were never created on fresh deploys because
mana-media only shipped a `db:push` script and nothing ever ran it
in the container. Result: every upload returned 500 the moment a
new environment came up (just hit prod again on mana.how).
- Add `db:generate` + `db:migrate` scripts and a migrate.ts runner
- Generate the initial migration covering media/media_references/
media_thumbnails (matches what was already on local + prod, which
were stamped manually so the migrator skips on existing deploys)
- Call runMigrations() at startup in src/index.ts so future fresh
containers self-bootstrap. Idempotent — drizzle tracks state in
drizzle.__drizzle_migrations.
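runMigrations() is a thin wrapper over the stock drizzle migrator, roughly (migrations folder path illustrative):

```ts
import { drizzle } from "drizzle-orm/postgres-js";
import { migrate } from "drizzle-orm/postgres-js/migrator";
import postgres from "postgres";

// Idempotent: drizzle records applied migrations in
// drizzle.__drizzle_migrations and skips anything already stamped.
export async function runMigrations(): Promise<void> {
  const client = postgres(process.env.DATABASE_URL!, { max: 1 });
  const db = drizzle(client);
  await migrate(db, { migrationsFolder: "./drizzle" });
  await client.end();
}
```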
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The first iteration of the Ollama response_format passthrough crashed
with 'ChatCompletionRequest object has no attribute response_format'
because the Pydantic request model didn't declare the field at all —
incoming response_format from OpenAI-compatible clients was being
silently dropped at the parsing layer before the provider could see it.
Fix: declare a typed ResponseFormat sub-model with the two OpenAI shapes
('json_object' and 'json_schema'), add it as an optional field on
ChatCompletionRequest, and let the Ollama provider read it directly
without defensive getattr fallbacks.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Ollama provider was completely ignoring `response_format` from the
incoming OpenAI-compatible request. Two consequences:
1. Clients that asked for `{"type":"json_object"}` or
`{"type":"json_schema",...}` got back JSON wrapped in
```json ... ``` markdown fences, because Ollama defaults to
conversational output.
2. Strict downstream parsers (Vercel AI SDK `generateObject`,
manual `JSON.parse`) failed to decode the response and threw,
even though the underlying JSON was valid inside the fences.
Fix: when response_format is set, translate it to Ollama's native
`format` field:
- `{"type":"json_object"}` → `format: "json"`
- `{"type":"json_schema","json_schema":{"schema":{...}}}`
→ `format: <the schema dict>` (Ollama 0.5+ supports full JSON
schemas in the format field)
Defensive belt-and-suspenders: a small `_strip_json_fences` helper
runs after the Ollama response is decoded and removes any leftover
```json ... ``` wrapping. Some older vision models still wrap
output in fences even when `format` is set; this catches them.
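The provider is Python; for reference, the two moving parts rendered as a TypeScript sketch:

```ts
// Translate an OpenAI-style response_format into Ollama's `format` field.
type ResponseFormat =
  | { type: "json_object" }
  | { type: "json_schema"; json_schema: { schema: Record<string, unknown> } };

function toOllamaFormat(rf: ResponseFormat): string | Record<string, unknown> {
  // "json" forces JSON mode; a schema object (Ollama 0.5+) constrains it.
  return rf.type === "json_object" ? "json" : rf.json_schema.schema;
}

// Belt-and-suspenders: strip a ```json fence only if it wraps the
// entire payload.
function stripJsonFences(text: string): string {
  const m = text.trim().match(/^```(?:json)?\s*\n([\s\S]*?)\n```$/);
  return m ? m[1] : text;
}

stripJsonFences('```json\n{"a":1}\n```'); // → '{"a":1}'
stripJsonFences('{"a":1}');               // unchanged
```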
Streaming path is unchanged because the nutriphi/planta refactor uses
non-streaming `generateObject`. Streaming structured output with
Ollama deserves its own pass when someone actually needs it.
Discovered during the AI SDK + Zod refactor smoke test — neither the
old nor the new vision routes ever returned validated JSON locally
because of this bug. Production uses Google Gemini directly via
fallback so the issue was masked there.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
JSDOM throws CSS / parser errors from detached parse5 callbacks that
escape every try/catch in the call stack and even bun's
process.on('uncaughtException') handlers — leaving the daemon
crash-looping on the first bad page in source #4 (heise) without ever
making forward progress.
Set FULL_TEXT_THRESHOLD_WORDS = 0 so we never call into Readability.
Sources that ship full RSS bodies (Tagesschau, Spiegel, BBC, …) are
unaffected. Title-only sources (Hacker News) keep the row with an
empty content field; the reader already falls back to "Original
öffnen ↗" in that case.
Re-enabling extraction in a worker thread is left for a follow-up.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
JSDOM's CSS parser throws on plenty of real-world pages and the error
escapes every try/catch in the buildRow → ingestSource chain because
it fires from a parse5 callback that runs after JSDOM has returned.
In the prod container this killed the process on the first bad page,
docker restarted it, and it crash-looped on the same first source
forever — no progress past tech.
Two-layer fix: a silent VirtualConsole on every JSDOM instance to
swallow CSS / resource errors at the source, plus process-level
uncaughtException + unhandledRejection handlers that log and continue
so any future async escape can't kill the daemon either.
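Roughly (jsdom's VirtualConsole is the real API; handler bodies illustrative):

```ts
import { JSDOM, VirtualConsole } from "jsdom";

// Layer 1: a VirtualConsole with no listeners attached discards
// jsdomError events (CSS parse failures, resource errors) instead of
// printing or escalating them.
const virtualConsole = new VirtualConsole();

function parse(html: string): JSDOM {
  return new JSDOM(html, { virtualConsole });
}

// Layer 2: anything that still escapes async (detached parse5
// callbacks) gets logged and survived instead of killing the daemon.
process.on("uncaughtException", (err) => {
  console.error("uncaughtException (continuing):", err);
});
process.on("unhandledRejection", (reason) => {
  console.error("unhandledRejection (continuing):", reason);
});
```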
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Was copied verbatim from mana-credits' template but not actually
imported anywhere in src/. Removing it lets the Docker build's bun
install resolve from npm only — workspace:* refs need the full
monorepo context which the Dockerfile doesn't copy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds the services/news-ingester Bun service that pulls 25 public RSS/JSON
feeds into news.curated_articles every 15 min, with Mozilla Readability
fallback for thin RSS bodies and 30-day retention. apps/api /feed is
rewritten to read from the new pool table directly instead of the
sync_changes hack, with topics/lang/since/limit/offset query params.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The workbench-registry app id 'inventar' did not match its
@mana/shared-branding MANA_APPS counterpart 'inventory', so the tier-
gating join in apps/web/src/lib/app-registry/registry.ts silently
failed for the inventory module — it fell into the "no MANA_APPS
entry, default visible" fallback and was effectively un-gated. The
codebase had also voted overwhelmingly for 'inventar' (53 files) vs
'inventory' (3 files in shared-branding), so the long-standing
mismatch was just bookkeeping debt waiting to bite.
Pre-release, no live data, so the cleanest fix is to align everything
on the English 'inventory':
- Workbench-registry id, module.config.ts appId, module folder, route
folder and i18n locale folder all renamed via git mv
- Standalone apps/inventar/ workspace package renamed
- All imports, store identifiers (InventarEvents → InventoryEvents,
INVENTAR_GUEST_SEED, inventarModuleConfig), i18n keys and href/goto
paths follow the rename
- The German display label "Inventar" is preserved everywhere it is a
user-visible string (page titles, i18n values, toast labels)
- Dexie table prefixes (invCollections, invItems, …) are unchanged
- Drive-by fix: ListView.svelte was querying non-existent
inventarCollections/inventarItems tables — corrected to the actual
invCollections/invItems names from module.config
- The "inventar ↔ inventory id mismatch" workaround comment in
registry.ts is removed since the mismatch no longer exists
module-registry.ts also picks up the user's parallel newsModuleConfig
addition because both edits land in the same import block — keeping
them split would have left the build in an inconsistent state.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds a "Local Login & Dev Users" section to docs/LOCAL_DEVELOPMENT.md
and a short pointer in services/mana-auth/CLAUDE.md so the next dev
finds the script without first hitting the "why can't I log in?" wall:
- Why it exists (no admin seed, requireEmailVerification + no SMTP)
- The 3 default accounts + password
- Single-account form + env overrides (TIER, AUTH_URL, …)
- Idempotency promise
- Prereqs (Postgres + mana-auth on :3001)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The /admin route in the unified Mana web app was rendering hardcoded
mock data (42 users, 156 successful logins, 3 failed) for every
admin who opened it. The previous code had a TODO comment to wire
up a real endpoint and the backend half had been waiting for the
frontend half ever since the consolidation landed.
Backend (mana-auth):
Add GET /api/v1/admin/stats — admin-only, returns the seven counts
the dashboard needs in a single response. Each count is its own
Drizzle query against auth.users / auth.sessions / auth.login_attempts;
they run in parallel via Promise.all so total latency is dominated by
the round-trip to Postgres, not the per-query work.
Stats:
- totalUsers → users where deleted_at IS NULL
- newUsers7d → users created in the last 7 days
- newUsers30d → users created in the last 30 days
- activeSessions → sessions where expires_at > now() AND not revoked
- uniqueUsers24h → distinct user_id from sessions with last_activity
in the last 24h (and not revoked)
- loginSuccess7d → login_attempts where successful=true, last 7d
- loginFailed7d → login_attempts where successful=false, last 7d
Plus a generatedAt ISO timestamp so the client can show staleness
if it ever caches the response.
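Approximate shape of the handler (drizzle handles and column names assumed from the stats list above, typed loosely for the sketch; three of the seven counts shown):

```ts
import { and, eq, gt, isNull, sql } from "drizzle-orm";

// db, users, loginAttempts are the service's existing drizzle handles.
async function getAdminStats(db: any, users: any, loginAttempts: any) {
  const d7 = new Date(Date.now() - 7 * 24 * 3600 * 1000);
  // Each count is its own small query; Promise.all runs them
  // concurrently so total latency ≈ one Postgres round-trip, not seven.
  const [[total], [ok7], [fail7]] = await Promise.all([
    db.select({ n: sql<number>`count(*)` }).from(users)
      .where(isNull(users.deletedAt)),
    db.select({ n: sql<number>`count(*)` }).from(loginAttempts)
      .where(and(eq(loginAttempts.successful, true), gt(loginAttempts.attemptedAt, d7))),
    db.select({ n: sql<number>`count(*)` }).from(loginAttempts)
      .where(and(eq(loginAttempts.successful, false), gt(loginAttempts.attemptedAt, d7))),
  ]);
  return {
    totalUsers: total.n,
    loginSuccess7d: ok7.n,
    loginFailed7d: fail7.n,
    generatedAt: new Date().toISOString(), // lets the client show staleness
  };
}
```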
Frontend (apps/mana/apps/web):
- Add adminService.getStats() in the existing admin API service
(sits next to getUsers / getUserData / deleteUserData; uses the
same authenticated base-client and ApiResult envelope).
- Replace the onMount mock-data block in admin/+page.svelte with
a single adminService.getStats() call. Drop the local Stats
interface in favor of the AdminStats type exported from the
service.
- Guard the Success Rate calculation against division by zero on
fresh deployments — when there have been no login attempts in
the last 7 days, render '—%' instead of NaN%.
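i.e. something like:

```ts
const total = stats.loginSuccess7d + stats.loginFailed7d;
// '—%' instead of NaN% on fresh deployments with zero attempts
const successRate =
  total === 0 ? "—%" : `${Math.round((stats.loginSuccess7d / total) * 100)}%`;
```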
Verification:
- mana-auth type-check unchanged (baseline errors only)
- mana-auth runtime tests still 19/19 passing
- svelte-check on the two changed web files: zero errors
Closes item #12 in docs/REFACTORING_AUDIT_2026_04.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three pnpm artifacts that were either pre-consolidation leftovers or
unintentional drift:
- apps/context/pnpm-lock.yaml + apps/context/pnpm-workspace.yaml
apps/context used to be its own nested workspace declaring
apps/* and packages/*. After consolidation only apps/context/
apps/mobile remains, and the root pnpm-workspace.yaml already
matches it via 'apps/*/apps/*'. The nested lockfile (242 KB)
was a separate dependency graph drifting independently from
the root.
- services/mana-media/packages/client/pnpm-lock.yaml
Anomalous lockfile in a workspace sub-package. The root
workspace already covers services/*/packages/* — no reason
for client/ to maintain its own resolution.
Verified after deletion:
- pnpm install completes cleanly (~16s) and now resolves
apps/context/apps/mobile from the root lockfile (pnpm list
confirms the workspace registration)
- apps/api type-check still 0 errors
- mana-auth tests still 19/19 passing
Tracked as item #26 in docs/REFACTORING_AUDIT_2026_04.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Locks in the two config locations that must agree about SSO origin
configuration, plus the invariant tying them together:
1. TRUSTED_ORIGINS in better-auth.config.ts (Better Auth allow-list)
2. CORS_ORIGINS env var on mana-auth in docker-compose.macmini.yml
3. The HTTPS subset of (1) must be a subset of (2) — every origin
Better Auth trusts must also pass CORS preflight
Background: root CLAUDE.md references this spec file as the canonical
"Adding an app to SSO" verification step (line 116) but the file
itself never existed. The first run of this spec immediately caught
two real bugs:
- 3 origins in TRUSTED_ORIGINS were missing from CORS_ORIGINS
(https://auth.mana.how, https://arcade.mana.how, https://whopxl.mana.how)
- 22 zombie subdomain entries in CORS_ORIGINS left over from before
the consolidation (calendar, chat, todo, ...) that no app actually
routes to anymore
Both fixes shipped together with the TRUSTED_ORIGINS extraction in
the broader pre-launch sweep (commit 919fcca4b). This spec is the
guard against the same drift creeping back in.
Eight tests:
- canonical mana.how + auth subdomain present
- localhost dev origins (3001, 5173) present
- all production origins HTTPS
- all production origins on *.mana.how
- no duplicates
- every HTTPS trusted origin appears in mana-auth CORS_ORIGINS
- soft warning for CORS_ORIGINS entries not in trustedOrigins
(catches drift in the other direction)
8/8 pass.
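Core of the subset assertion, roughly (bun:test; the origin loaders are illustrative — the real spec parses better-auth.config.ts and docker-compose.macmini.yml):

```ts
import { describe, expect, test } from "bun:test";
import { corsOrigins, trustedOrigins } from "./load-origins"; // illustrative

describe("SSO origin configuration", () => {
  test("every HTTPS trusted origin passes CORS preflight", () => {
    const https = trustedOrigins.filter((o) => o.startsWith("https://"));
    for (const origin of https) {
      expect(corsOrigins).toContain(origin);
    }
  });

  test("no duplicate trusted origins", () => {
    expect(new Set(trustedOrigins).size).toBe(trustedOrigins.length);
  });
});
```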
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pre-launch theme system audit found multiple parallel layers in themes.css
(--theme-X full hsl strings, --X partial shadcn aliases, --color-X populated
by runtime store with raw channels) plus dead-code companion files. The
inconsistency caused light-mode regressions when scoped-CSS consumers
wrote `var(--color-X)` standalone — the variable holds raw HSL channels
which is invalid as a color value, so the browser fell back to inherited (white).
Rewrite to one consistent layer:
- Source of truth: --color-X defined as raw HSL channels (e.g.
`0 0% 17%`) in :root, .dark, and all variant [data-theme="..."]
blocks. Matches the format the runtime store
(@mana/shared-theme/src/utils.ts) writes, eliminating the
static-fallback-vs-runtime mismatch and the corresponding flash
of unstyled content on hydration.
- @theme inline uses self-reference + Tailwind v4 <alpha-value>
placeholder so utility classes generate correctly AND opacity
modifiers work: `text-foreground/50` → `hsl(var(--color-foreground) / 0.5)`.
- @layer components (.btn-primary, .card, .badge, etc.) wraps
var(--color-X) refs with hsl() — they were broken in light mode
too for the same reason.
Convention going forward (also documented in the file header):
1. Markup: use Tailwind utility classes (text-foreground, bg-card, …)
2. Scoped CSS: hsl(var(--color-X)) — always wrap with hsl()
3. NEVER raw var(--color-X) in CSS — that's the bug pattern
Net file: 692 → 580 LOC. Single source layer, no indirection.
Also delete dead companion files (zero imports anywhere):
- tailwind-v4.css (had broken self-reference, never imported)
- theme-variables.css (legacy hex-based palette)
- components.css (legacy component utilities)
- index.js / preset.js / colors.js (Tailwind v3 preset format,
irrelevant under Tailwind v4)
package.json exports map shrinks accordingly to just `./themes.css`.
Consumers using `hsl(var(--color-X))` (~379 files across mana-web,
manavoxel-web, arcade-web) keep working unchanged — the public API
name `--color-X` is preserved. Only the broken pattern `var(--color-X)`
(~61 files) needs a follow-up sweep, handled in a separate commit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
While adding negative-path integration tests for the auth flow I
discovered that *neither* of the lockout primitives in
services/mana-auth/src/services/security.ts has actually been
working in production. Two independent silent failures that combined
into a "the lockout never triggers, ever" outcome:
1. recordAttempt() inserted into auth.login_attempts with explicit
`id = gen_random_uuid()`, but auth.login_attempts.id is a
`serial integer` column with `nextval('auth.login_attempts_id_seq')`
as default. The UUID-into-integer cast threw a type error every
single time, the bare `catch {}` swallowed it as "non-critical",
and not a single login attempt was ever persisted. Lockout's "5
failures in 15 min" check was running against an empty table.
2. checkLockout() built `attempted_at > ${new Date(...)}` via the
drizzle sql template, but postgres-js cannot bind a JS Date object
directly — it tries to byteLength() the parameter and crashes with
`Received an instance of Date`. Same anti-pattern: bare `catch`,
returns `{locked: false}` (fail-open), no log, completely invisible.
Both are "silent broken since the encryption-vault series of changes"
class — caught only because the integration test for the lockout flow
expected the 6th login attempt to return 429 and got 200 instead.
Fixes:
- recordAttempt(): drop the bogus `id` column from the INSERT (let the
sequence default assign it), default ipAddress to null instead of
letting `${undefined}` collapse the parameter slot, and surface
errors in the catch instead of swallowing them silently.
- checkLockout(): pass `windowStart.toISOString()` instead of the Date
object so postgres-js can serialize it. Same catch upgrade — log the
cause when failing open.
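Both fixes in miniature (variable and column names assumed from the description above; error handling elided):

```ts
import { sql } from "drizzle-orm";

// recordAttempt: no explicit id, so the serial column's sequence
// default assigns it. ipAddress coerced to null so ${undefined} can't
// collapse a parameter slot.
await db.execute(sql`
  INSERT INTO auth.login_attempts (email, successful, ip_address, attempted_at)
  VALUES (${email}, ${success}, ${ipAddress ?? null}, NOW())
`);

// checkLockout: per the bug above, postgres-js crashed binding a raw
// JS Date here, so serialize it first.
const windowStart = new Date(Date.now() - 15 * 60 * 1000);
const rows = await db.execute(sql`
  SELECT count(*) AS n FROM auth.login_attempts
  WHERE email = ${email}
    AND successful = false
    AND attempted_at > ${windowStart.toISOString()}
`);
```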
Failure-path test additions (tests/integration/auth-failures.test.ts):
- wrong password: assert 401, no JWT, +1 LOGIN_FAILURE in security_events,
+1 row in auth.login_attempts
- account lockout: 5 failed attempts then 6th returns 429 with
remainingSeconds, even with the correct password
- unverified email login: 403 with code = EMAIL_NOT_VERIFIED
- validate with garbage token: valid !== true
- resend verification: second mail arrives in mailpit
Plus the run-integration-tests.sh helper now runs both .test.ts files
and tests/integration/package.json's `test` script does the same.
Negative-control: reverted the recordAttempt fix (re-added the bogus
gen_random_uuid id), the wrong-password test failed at the
login_attempts assertion. Reverted the checkLockout fix, the lockout
test failed at the 429 assertion. Both fixes verified to be load-bearing.
6 tests, 45 expects, ~1.3s on a warm cache.
logEvent() builds its INSERT via a raw `sql` tagged template:
sql`INSERT INTO auth.security_events
  (..., user_id, ip_address, user_agent, metadata, ...)
  VALUES (..., ${params.userId}, ${params.ipAddress},
    ${params.userAgent}, ${...metadata}, ...)`
Most call sites only pass userId+eventType (or only eventType for the
LOGIN_FAILURE / PASSWORD_RESET_REQUESTED / PROFILE_UPDATED /
PASSWORD_CHANGED / ACCOUNT_DELETED events). The other params land in
the template as `undefined`, and postgres-js's tagged-template renderer
collapses `${undefined}` into literal nothing — producing this:
VALUES (gen_random_uuid(), $1, $2, , , $3::jsonb, NOW())
^^^^
Postgres rejects with "syntax error at or near \",\"". The catch block
swallowed it as a `console.warn('Failed to log security event
(non-critical):', params.eventType)` with no error detail, which is why
this has been silently broken for who knows how long — every register,
every login, every password change has been losing its audit row.
Fix:
- Coerce optional params to `null` (`params.userId ?? null`) before
interpolation. NULL is what postgres-js renders for an explicit null.
- Surface the actual error in the catch warn so the next time something
similar happens it shows up in logs instead of just "non-critical".
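So the interpolations become (full column list filled in illustratively; params/sql as in security.ts):

```ts
// Optional params coerced to null: postgres-js renders an explicit null
// as NULL, while undefined used to collapse the parameter slot entirely.
await sql`INSERT INTO auth.security_events
  (id, event_type, user_id, ip_address, user_agent, metadata, created_at)
  VALUES (gen_random_uuid(), ${params.eventType}, ${params.userId ?? null},
          ${params.ipAddress ?? null}, ${params.userAgent ?? null},
          ${JSON.stringify(params.metadata ?? {})}::jsonb, NOW())`;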
Verified the diagnosis by toggling `log_statement = all` on the test
postgres, triggering a register, and reading the literal failed
statement out of postgres logs.
A grep audit after the previous matrix removal commits found a handful
of stragglers in non-runtime files that the earlier sweeps missed:
- services/mana-llm/CLAUDE.md: removed matrix-ollama-bot from the
consumer-apps diagram and from the related-services table
- services/mana-video-gen/CLAUDE.md: removed "Matrix Bots" integration
bullet
- packages/notify-client/README.md: removed sendMatrix() doc entry
(the method itself was already gone in the prior cleanup)
- docker/grafana/dashboards/logs-explorer.json: dropped the "Matrix
Stack" log row that queried tier="matrix" (would show no data forever)
- docker/grafana/dashboards/master-overview.json: dropped the "Matrix
Bots" stat panel that counted up{job=~"matrix-.*-bot"}
- apps/mana/apps/landing/src/data/ecosystem-health.json: regenerated via
scripts/ecosystem-audit.mjs to drop matrix from the app list, icon
counts, file analytics, top offenders and authGuard missing list
- .gitignore: removed services/matrix-stt-bot/data/ pattern (the
service itself was deleted long ago)
Production-side stragglers also addressed (not in this commit):
- DROP USER synapse on prod Postgres (the parallel cleanup commit
2514831a3 dropped DATABASE matrix + DATABASE synapse but left the
role behind)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The matrix subsystem was removed in a prior commit. This commit cleans
up the small leftovers that grep found:
- docker-compose.macmini.yml: dropped the "Matrix Stack" port-range
comment, the "matrix" category from the naming convention, and a
stale watchtower comment about Matrix notifications.
- packages/credits/src/operations.ts: removed AI_BOT_CHAT credit
operation type and its definition. It was the billing entry for "Chat
with AI via Matrix bot" — no callers left.
- services/mana-credits gifts schema + service + validation: removed the
targetMatrixId column / param / Zod field. The corresponding
PostgreSQL column was dropped manually with
`ALTER TABLE gifts.gift_codes DROP COLUMN target_matrix_id` on prod.
- docker/grafana/dashboards/{master,system}-overview.json: removed the
`up{job="synapse"}` panel queries — they would have shown No Data
forever now that Synapse is gone.
Production-side cleanup performed in parallel (not in this commit):
- Stopped + removed mana-matrix-{synapse,element,web,bot} containers
- Removed mana-matrix-bot:local, matrix-web:latest,
matrixdotorg/synapse:latest, vectorim/element-web:latest images (~3 GB)
- Removed mana-matrix-bots-data Docker volume
- Removed /Volumes/ManaData/matrix/ media store (4.3 MB)
- DROP DATABASE matrix; DROP DATABASE synapse; on Postgres
Cosmetic leftovers intentionally untouched:
- Eisenhower matrix in todo (LayoutMode 'matrix') — productivity concept
- ${{ matrix.service }} in .github/workflows — GitHub Actions strategy
- services/mana-media/apps/api/dist/.../matrix/* — stale build output
(not in git, regenerated next mana-media build)
This commit bundles two unrelated changes that were swept together by an
accidental `git add -A` in another working session. Documented here so the
history reflects what's actually inside.
═══════════════════════════════════════════════════════════════════════
1. fix(mana-auth): /api/v1/auth/login mints JWT via auth.handler instead
of api.signInEmail
═══════════════════════════════════════════════════════════════════════
Previous attempt (commit 55cc75e7d) tried to fix the broken JWT mint in
/api/v1/auth/login by switching the cookie name from `mana.session_token`
to `__Secure-mana.session_token` for production. That was necessary but
not sufficient: Better Auth's session cookie value isn't just the raw
session token, it's `<token>.<HMAC>` where the HMAC is derived from the
better-auth secret. Reconstructing the cookie from auth.api.signInEmail's
JSON response only gave us the raw token, so /api/auth/token's
get-session middleware still couldn't validate it and the JWT mint kept
silently failing.
Real fix: do the sign-in via auth.handler (the HTTP path) rather than
auth.api.signInEmail (the SDK path). The handler returns a real fetch
Response with a Set-Cookie header containing the fully signed cookie
envelope. We capture that header verbatim and forward it as the cookie
on the /api/auth/token request, which now passes validation and mints
the JWT correctly.
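In outline (auth.handler is Better Auth's fetch-style handler; URLs and body shapes follow its HTTP API, everything else elided):

```ts
// auth = betterAuth(...) instance from better-auth.config.ts.
// Sign in via the HTTP path so Better Auth itself signs the cookie.
const signInResponse = await auth.handler(
  new Request("http://internal/api/auth/sign-in/email", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ email, password }),
  }),
);

// The fully signed cookie envelope (<token>.<HMAC>) lives here:
const setCookie = signInResponse.headers.get("set-cookie") ?? "";

// Forward it verbatim so /api/auth/token's get-session middleware
// can validate the session and mint the JWT.
const tokenResponse = await auth.handler(
  new Request("http://internal/api/auth/token", {
    headers: { cookie: setCookie },
  }),
);
```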
Verified end-to-end on auth.mana.how:
$ curl -X POST https://auth.mana.how/api/v1/auth/login \
-d '{"email":"...","password":"..."}'
{
"user": {...},
"token": "<session token>",
"accessToken": "eyJhbGciOiJFZERTQSI...", ← real JWT now
"refreshToken": "<session token>"
}
Side benefits:
- Email-not-verified path is now handled by checking
signInResponse.status === 403 directly, no more catching APIError
with the comment-noted async-stream footgun.
- X-Forwarded-For is forwarded explicitly so Better Auth's rate limiter
and our security log see the real client IP.
- The leftover catch block now only handles unexpected exceptions
(network errors etc); the FORBIDDEN-checking logic in it is dead but
harmless and left in for defense in depth.
═══════════════════════════════════════════════════════════════════════
2. chore: remove the entire self-hosted Matrix stack (Synapse, Element,
Manalink, mana-matrix-bot)
═══════════════════════════════════════════════════════════════════════
The Matrix subsystem ran parallel to the main Mana product without any
load-bearing integration: the unified web app never imported matrix-js-sdk,
the chat module uses mana-sync (local-first), and mana-matrix-bot's
plugins duplicated features the unified app already ships natively.
Keeping it alive cost a Synapse + Element + matrix-web + bot container
quartet, three Cloudflare routes, an OIDC provider plugin in mana-auth,
and a steady drip of devlog/dependency churn.
Removed:
- apps/matrix (Manalink web + mobile, ~150 files)
- services/mana-matrix-bot (Go bot with ~20 plugins)
- docker/matrix configs (Synapse + Element)
- synapse/element-web/matrix-web/mana-matrix-bot services in
docker-compose.macmini.yml
- matrix.mana.how/element.mana.how/link.mana.how Cloudflare tunnel routes
- OIDC provider plugin + matrix-synapse trustedClient + matrixUserLinks
table from mana-auth (oauth_* schema definitions also removed)
- MatrixService import path in mana-media (importFromMatrix endpoint)
- Matrix notification channel in mana-notify (worker, metrics, config,
channel_type enum, MatrixOptions handler)
- Matrix entries from shared-branding (mana-apps + app-icons),
notify-client, the i18n bundle, the observatory map, the credits
app-label list, the landing footer/apps page, the prometheus + alerts
+ promtail tier mappings, and the matrix-related deploy paths in
cd-macmini.yml + ci.yml
Devlog/manascore/blueprint entries that mention Matrix are left intact
as historical record. The oauth_* + matrix_user_links Postgres tables
stay on existing prod databases — code can no longer write to them, drop
them in a follow-up migration if you want them gone for real.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The custom /api/v1/auth/login route signs the user in via the
better-auth SDK (auth.api.signInEmail) and then forges a request to
/api/auth/token to mint a JWT, passing the session token as a synthetic
cookie header.
The cookie name was hardcoded as `mana.session_token=...`, but in
production better-auth issues the session cookie with the __Secure-
prefix (because secure: true is enabled). Get-session middleware on the
/api/auth/token side couldn't find the session under the unprefixed
name, so it returned 401 silently. Result: tokenResponse.ok was false,
the route fell through, and the response had no `accessToken` field at
all — only the bare { token, user, redirect } from signInEmail.
The frontend in @mana/shared-auth then picked this up as
`data.accessToken === undefined` and stored undefined as the JWT, while
the parallel /api/auth/sign-in/email call masked the visible damage by
setting the SSO cookie. So login *appeared* to work in the browser
(cookie present, session worked) but the JWT path was always broken.
Fix: pick the cookie name based on config.nodeEnv. In production use
__Secure-mana.session_token, in development use mana.session_token (no
__Secure- prefix because secure: false in dev).
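i.e.:

```ts
const cookieName =
  config.nodeEnv === "production"
    ? "__Secure-mana.session_token" // secure: true in prod adds the prefix
    : "mana.session_token";         // no prefix in dev (secure: false)
```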
Verified end-to-end on auth.mana.how:
POST /api/v1/auth/login → response now includes accessToken (a real
JWT, EdDSA, with sub/email/role/sid/tier/iss/aud claims), refreshToken
(the session token), plus the original signInEmail fields.
The other /api/auth/get-session call sites in this file forward the
incoming request headers verbatim, so they preserve whatever real cookie
the browser sent and don't have this bug.
mana-auth has been crash-looping in production with:
error: Cannot find package 'nanoid' from
'/app/src/services/encryption-vault/index.ts'
The encryption-vault service imports nanoid for audit row IDs (line 27,
used at line 547 in the audit log writer), but nanoid was never added
to services/mana-auth/package.json. The import was introduced in commit
e9915428c (phase 2 — server-side master key custody) and slipped past
because nanoid happens to exist transitively in the workspace via
postcss → nanoid@3.3.11. Local pnpm store lookups would resolve it just
fine; a strict isolated container build can't.
Fix:
- Add "nanoid": "^5.0.0" to services/mana-auth/package.json deps
- pnpm install pulled nanoid@5.1.7 into services/mana-auth/node_modules
Verified the import resolves locally:
bun -e 'import { nanoid } from "nanoid"; console.log(nanoid())'
→ ok: 6TLuTWlenhC0KnSESn5Ex
The Mac Mini still needs to redeploy mana-auth (rebuild image with the
new lockfile, restart container) to pick this up — production is
currently 502ing on auth.mana.how.
mana-voice-bot's source default was 3050, which collided with mana-sync.
Today the collision is latent (voice-bot isn't deployed anywhere), but
sooner or later someone is going to start it on a host that's already
running mana-sync and the second one will refuse to bind. Moving to
3024 puts it inside the AI/ML port range alongside its dependencies
(stt 3020, tts 3022, image-gen 3023, llm 3025) and away from sync.
Updated:
- app/main.py — PORT default 3050 → 3024
- start.sh, setup.sh — same fix in the example commands
- CLAUDE.md — full rewrite. Old version described "Mac Mini deployment"
with launchd; the new version explicitly says "not deployed yet" and
documents the seven concrete steps to deploy on the Windows GPU box
alongside the other AI services (Scheduled Task, service.pyw, .env,
firewall rule, cloudflared route, WINDOWS_GPU_SERVER_SETUP.md update).
docs/WINDOWS_GPU_SERVER_SETUP.md:
- Added the missing ManaVideoGen scheduled task to all four
Start-ScheduledTask snippets — video-gen has been running on the
Windows GPU but the doc had never picked it up.
- Added a "mana-video-gen (Port 3026)" service section parallel to the
existing image-gen one, with venv path, repo pointer, model, etc.
- Added a repo-pendants table mapping C:\mana\services\<svc>\ to the
corresponding services/<svc>/ directory in the repo, plus a note that
changes should flow repo→Windows, not the other way around.
docs/PORT_SCHEMA.md:
- Reconciled the warning block with the post-cleanup reality: no more
active or latent port collisions (image-gen ↔ video-gen and
voice-bot ↔ sync are both resolved). Listed the actual ports per host
with public URLs. Kept the planned-vs-actual disclaimer for the
services that still don't match the aspirational ranges (mana-credits
3061 vs planned 3002, etc).
The Mac Mini hasn't run mana-llm/stt/tts/image-gen for a while — those
services live on the Windows GPU server now. The Mac-targeted
installers, plists, and platform-checking setup scripts have been
sitting in the repo as cargo-cult, suggesting Mac Mini deployment is
still a real option. It isn't.
Removed (Mac-Mini deployment infrastructure):
services/mana-stt/
- com.mana.mana-stt.plist (LaunchAgent)
- com.mana.vllm-voxtral.plist (LaunchAgent for the abandoned local Voxtral experiment)
- install-service.sh (single-service launchd installer)
- install-services.sh (mana-stt + vllm-voxtral installer)
- setup.sh (Mac arm64 installer)
- scripts/setup-vllm.sh (vLLM-Voxtral setup)
- scripts/start-vllm-voxtral.sh
services/mana-tts/
- com.mana.mana-tts.plist
- install-service.sh
- setup.sh (Mac arm64 installer)
scripts/mac-mini/
- setup-image-gen.sh (Mac flux2.c launchd installer)
- setup-stt.sh
- setup-tts.sh
- launchd/com.mana.image-gen.plist
- launchd/com.mana.mana-stt.plist
- launchd/com.mana.mana-tts.plist
setup-tts-bot.sh stays — it's the Matrix TTS bot installer (Synapse
side), not the mana-tts service.
Updated:
- services/mana-stt/CLAUDE.md, README.md — fully rewritten for the
Windows GPU reality (CUDA WhisperX, Scheduled Task ManaSTT, .env keys
matching the actual production .env on the box)
- services/mana-tts/CLAUDE.md, README.md — same treatment, documenting
Kokoro/Piper/F5-TTS on the Windows GPU under Scheduled Task ManaTTS
- scripts/mac-mini/README.md — dropped the STT setup section, replaced
with a pointer to docs/WINDOWS_GPU_SERVER_SETUP.md and the per-service
CLAUDE.md files
- docs/MAC_MINI_SERVER.md — expanded the "deactivated launchagents"
list to mention the now-removed plists, added the full GPU service
port table with public URLs, added a cleanup snippet for any old plists
still installed on a Mac Mini somewhere
The repo's mana-image-gen used to be a Mac Mini–only service built on
flux2.c with hard MPS+arm64 platform checks. The actual production
image-gen runs on the Windows GPU server (RTX 3090) using HuggingFace
diffusers + PyTorch CUDA + FLUX.1-schnell — completely different code
that lived only at C:\mana\services\mana-image-gen\ on the GPU box.
This commit pulls the Windows implementation into the repo and deletes
the Mac one, so there's exactly one mana-image-gen and its source of
truth is git rather than one folder on one machine.
Removed:
- setup.sh — Mac-only flux2.c installer with hard arm64 platform check
- app/main.py (Mac flux2.c subprocess wrapper version)
- app/flux_service.py (Mac flux2.c subprocess wrapper version)
Added (pulled from C:\mana\services\mana-image-gen\):
- app/main.py — FastAPI endpoints (/generate, /images/*, /cleanup)
- app/flux_service.py — diffusers FluxPipeline wrapper
- app/api_auth.py — ApiKeyMiddleware (GPU_API_KEY)
- app/vram_manager.py — shared VRAM accounting
- service.pyw — Windows runner used by the ManaImageGen scheduled task
Updated:
- main.py PORT default from 3025 → 3023 to match the production reality
(the service.pyw runner already binds 3023 explicitly via uvicorn.run,
but the source default should match so direct uvicorn invocations and
local tests don't pick the wrong port)
- CLAUDE.md fully rewritten to describe the Windows/CUDA/diffusers stack
- README.md trimmed to a pointer at CLAUDE.md + the public URL
- .env.example written from scratch (didn't exist before — the service's
.env on the GPU box was undocumented)
The setup-image-gen.sh launchd installer in scripts/mac-mini/ and the
actual Mac Mini deployment will be cleaned up in the next commit, along
with the rest of the Mac-Mini AI service infrastructure.
The Windows GPU server has been the actual production home for these
services for some time, and the running code there has drifted ahead of
the repo. This sync pulls the live versions back into the repo so the
Windows box is no longer the only place those changes exist.
Pulled from C:\mana\services\* on mana-server-gpu (192.168.178.11):
mana-llm:
- src/main.py, src/config.py — small fixes (auth wiring, config tweaks)
- src/api_auth.py — NEW (cross-service GPU_API_KEY validator)
- service.pyw — Windows runner used by the ManaLLM scheduled task
(sets up logging redirect, loads .env, calls uvicorn)
mana-stt:
- app/main.py — substantial cleanup (684→392 lines), drops the
whisperx-as-separate-backend branching now that whisper_service.py
rolls whisperx in directly
- app/whisper_service.py — full CUDA + whisperx rewrite (158→358 lines)
- app/auth.py + external_auth.py — significantly expanded auth
- app/vram_manager.py — NEW (shared VRAM accounting helper)
- service.pyw — Windows runner with CUDA pre-init, FFmpeg PATH
injection, .env loading
- removed: app/whisper_service_cuda.py (folded into whisper_service.py)
- removed: app/whisperx_service.py (folded into whisper_service.py)
mana-tts:
- app/auth.py, external_auth.py — same auth expansion as stt
- app/f5_service.py, kokoro_service.py — Windows tweaks
- app/vram_manager.py — NEW (same shared helper as stt)
- service.pyw — Windows runner
mana-video-gen:
- service.pyw — Windows runner (no other changes; the .py code on the
GPU box is byte-identical to what's already in the repo)
The service.pyw files contain absolute Windows paths
(C:\mana\services\<svc>) and a hardcoded FFmpeg PATH for the tills user
profile. Kept as-is intentionally — they exist to be deployed to that
one machine and any abstraction layer would just hide what's actually
happening. Anyone redeploying to a different layout will need to edit
the path strings, which is a known and obvious change.
Mac-Mini infrastructure for these services (launchd plists, install
scripts, scripts/mac-mini/setup-{stt,tts}.sh, the Mac-flux2c image-gen
implementation) is still on disk and will be removed in a follow-up
commit, along with replacing mana-image-gen with the Windows
diffusers+CUDA implementation. This commit is just the live-code sync.
Source default was 3026 but Mac Mini production has been overriding to
3025 via the launchd plist in scripts/mac-mini/setup-image-gen.sh ever
since the service was set up. The override existed in exactly one place
that is not version-controlled in any obvious way — anyone redeploying
without that script would land on 3026 and clients pointing at 3025
would fail to connect.
Source default → 3025 across main.py, setup.sh, README, CLAUDE.md so the
launchd plist is no longer load-bearing. The Mac Mini setup script still
sets PORT=3025 explicitly; that's now belt-and-suspenders rather than the
only thing keeping production alive.
Also added a note clarifying that this Mac Mini service (flux2.c, MPS,
arm64-only) is *not* the same thing as the "image-gen" running on the
Windows GPU server (PyTorch + diffusers + CUDA, port 3023, code lives at
C:\mana\services\mana-image-gen\ outside this repo). Two different
implementations sharing a name kept confusing the port-collision audit.
Updated docs/PORT_SCHEMA.md warning block to retract the previous false
claims of two active port collisions:
- image-gen ↔ video-gen on 3026 — wrong: image-gen runs on Mac Mini
on 3025 (now also the source default), video-gen is alone on the
Windows GPU on 3026
- voice-bot ↔ sync on 3050 — latent only: mana-voice-bot is not
deployed anywhere (no launchd, no scheduled task, no cloudflared
route), so the collision is in source defaults but not in production
The voice-bot 3050 default should still be moved before voice-bot is
ever deployed — flagged in the PORT_SCHEMA warning instead of silently
fixed since voice-bot deployment is its own decision.
New service docs:
- services/mana-stt/CLAUDE.md — FastAPI surface with Whisper MLX (local),
WhisperX (rich), and Voxtral (local + Mistral API). Documents the lazy
backend loading and the launchd plist setup on the Mac Mini.
- services/mana-events/CLAUDE.md — Hono/Bun service for public RSVP and
event-sharing. Documents the host (JWT) vs public (token) split, the
rate-limit sweeper, and the createApp factory pattern that lets unit
tests run without bootstrapping the production sweeper.
Stale entries fixed:
- mana-auth: dropped "rewritten from NestJS / drop-in replacement" — the
rewrite is the only mana-auth there is now. Email channel updated from
Brevo SMTP to self-hosted Stalwart (see docs/MAIL_SERVER.md).
- mana-notify: same Brevo → Stalwart fix in the channel table and env
var defaults.
PORT_SCHEMA.md flagged as aspirational:
- The doc was dated 2026-03-28 and presented as "single source of truth",
but cross-checking against actual service source files (config.go,
main.py, start.sh) shows nothing matches. Added a prominent warning at
the top with the real ports + two confirmed collisions:
* mana-image-gen and mana-video-gen both default to PORT 3026
* mana-voice-bot and mana-sync both default to PORT 3050
Today these are masked because image-gen + voice-bot live on the
Windows GPU server while video-gen + sync live on the Mac Mini, but
the moment they share a host they collide. Either execute the planned
reorg or pick non-colliding ports and rewrite the doc to match
reality — flagged as a real follow-up.
PyTorch's `torch.cuda.get_device_properties(0)` returns a
`_CudaDeviceProperties` object whose memory attribute is
`total_memory` (bytes), not `total_mem`. The typo crashed the
service immediately at startup because `get_model_info()` is
called from the FastAPI lifespan handler, not lazily — uvicorn
logged "Application startup failed" before any request could land.
Found while installing mana-video-gen on the Windows GPU box
(192.168.178.11:3026) for the gpu-video.mana.how Cloudflare route.
After the fix the service starts cleanly under the ManaVideoGen
scheduled task and responds 200 on /health both LAN and via
Cloudflare tunnel. status.mana.how now reports 42/42 — first time
ever.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>