till/managarten

mirror of https://github.com/Memo-2023/mana-monorepo.git synced 2026-05-14 22:41:09 +02:00

Commit graph

Author	SHA1	Message	Date

Till JS

7f6b41654e

test(load): k6 script for the unified apps/api server

The pre-launch consolidation collapsed 17+ per-product backends into
the single Hono/Bun process at apps/api. That makes apps/api the
single point of failure for every authenticated module call the
unified Mana web app makes — a missing index, a hot-path allocation
in auth middleware, or rate-limiter contention degrades all 16
modules at once. The other scripts in load-tests/ already cover
mana-auth, mana-sync, mana-llm and the SvelteKit frontends, but
apps/api itself was unmeasured. This is that missing piece.

What it tests
-------------
A weighted mixed workload that walks the full middleware stack
(CORS → request logger → rate limit → auth → router → handler)
plus a representative range of handler shapes:

  25%  GET /health                            (no auth, baseline)
  20%  GET /api/v1/moodlit/presets            (auth + in-memory return)
  15%  GET /api/v1/chat/models                (auth + DB read)
  20%  POST /api/v1/calendar/events/expand    (auth + Zod + RRULE compute)
  12%  POST /api/v1/todo/compute/next-occurrence
                                              (auth + Zod + rrule lib)
   8%  POST /api/v1/todo/compute/validate     (auth + Zod + validation)
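
A weighted mix like the one above could be selected per iteration roughly
as follows. This is a minimal sketch, not the committed script: `ROUTES`,
`pickRoute`, the `cls` field, and the injectable `rand` parameter are
illustrative names, the mapping of routes to classes is an assumption, and
request bodies are omitted.

```javascript
// Hypothetical route table; weights and paths are taken from the list above.
const ROUTES = [
  { weight: 25, method: "GET", path: "/health", cls: "health" },
  { weight: 20, method: "GET", path: "/api/v1/moodlit/presets", cls: "authed_get" },
  { weight: 15, method: "GET", path: "/api/v1/chat/models", cls: "authed_get" },
  { weight: 20, method: "POST", path: "/api/v1/calendar/events/expand", cls: "authed_post" },
  { weight: 12, method: "POST", path: "/api/v1/todo/compute/next-occurrence", cls: "authed_post" },
  { weight: 8, method: "POST", path: "/api/v1/todo/compute/validate", cls: "authed_post" },
];

const TOTAL_WEIGHT = ROUTES.reduce((sum, r) => sum + r.weight, 0);

// Pick a route with probability proportional to its weight.
// `rand` must return a number in [0, 1); it defaults to Math.random.
function pickRoute(rand = Math.random) {
  let ticket = rand() * TOTAL_WEIGHT;
  for (const route of ROUTES) {
    ticket -= route.weight;
    if (ticket < 0) return route;
  }
  return ROUTES[ROUTES.length - 1]; // guard against floating-point edge cases
}
```

Passing `rand` explicitly keeps the picker deterministic in unit tests; a
k6 VU would simply call `pickRoute()` once per iteration.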

Deliberately no write endpoints — those would conflate write
amplification with API-server cost. The compute routes here all run
in <50ms warm; what we're measuring is the overhead the unified
server adds on top of pure handler work.

Per-route-class p95 budgets via tags:

  health      < 100ms
  authed_get  < 300ms
  authed_post < 500ms
  global      p95 < 500ms, p99 < 2s
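
In k6, budgets like these are usually expressed as thresholds on
`http_req_duration`, filtered by a request tag. The sketch below assumes
the tag is named `route_class`; the commit only says "via tags", so the
actual tag name may differ.

```javascript
// Tag name `route_class` is an assumption; the budgets come from the list above.
// k6 reports http_req_duration in milliseconds.
const thresholds = {
  "http_req_duration{route_class:health}": ["p(95)<100"],
  "http_req_duration{route_class:authed_get}": ["p(95)<300"],
  "http_req_duration{route_class:authed_post}": ["p(95)<500"],
  http_req_duration: ["p(95)<500", "p(99)<2000"], // global budget
};
```

In a k6 script this object would sit under `export const options = { thresholds }`,
and each request would carry the matching tag in its params.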

Application-level error rate (4xx + 5xx + check failures) must stay
under 1%; otherwise the run exits non-zero, so it's CI-gateable.
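
The error definition above can be sketched as a pure function. The names
`isAppError`, `errorRate`, and `ERROR_BUDGET` are hypothetical; in a k6
script the same boolean would typically be fed into a custom Rate metric
with a `rate<0.01` threshold.

```javascript
const ERROR_BUDGET = 0.01; // 1%, from the text above

// A sample counts as an application-level error if the status is 4xx/5xx
// or any check on the response failed.
function isAppError(status, checksPassed) {
  return status >= 400 || !checksPassed;
}

// Fraction of samples flagged as errors (what a Rate metric would track).
function errorRate(samples) {
  if (samples.length === 0) return 0;
  const failed = samples.filter((s) => isAppError(s.status, s.checksPassed)).length;
  return failed / samples.length;
}
```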

Auth setup
----------
apps/api requires JWT on every /api/* route. setup() acquires a
token once before VUs start hammering and shares it for the run.
Three sources tried in order:

  1. $MANA_API_TOKEN  (CI passes a pre-minted token)
  2. login at $TEST_EMAIL / $TEST_PASSWORD
  3. register a fresh account on the fly

Bails with a clear error message if all three fail.
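
The fallback order could look roughly like this. It is a sketch under
assumptions: `resolveToken`, `login`, and `register` are hypothetical
names, and the two helpers are injected so the logic stays independent of
the HTTP calls the real script makes. In k6, `env` would be `__ENV` and
the function would be called from `setup()`.

```javascript
// `login` and `register` are hypothetical helpers that return a JWT string,
// or null/undefined on failure.
function resolveToken(env, { login, register }) {
  // 1. Pre-minted token, e.g. supplied by CI.
  if (env.MANA_API_TOKEN) return env.MANA_API_TOKEN;

  // 2. Log in with existing test credentials, if both are set.
  if (env.TEST_EMAIL && env.TEST_PASSWORD) {
    const token = login(env.TEST_EMAIL, env.TEST_PASSWORD);
    if (token) return token;
  }

  // 3. Register a throwaway account.
  const fresh = register();
  if (fresh) return fresh;

  throw new Error(
    "Could not obtain an API token: set MANA_API_TOKEN, or TEST_EMAIL and " +
      "TEST_PASSWORD, or make sure registration is reachable."
  );
}
```

Whatever `setup()` returns is handed to every VU iteration, so the token is
acquired once per run rather than once per request.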

Load profile
------------
4 minutes total: 30s warmup → 2m sustained @ 50 VUs → 1m peak @ 100 VUs
→ 30s cooldown. Override with --vus / --duration as usual.
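
Expressed as k6 `stages`, the profile might look like the sketch below.
The warmup and cooldown targets (ramp to 50, ramp to 0) and the reading of
"peak" as a one-minute ramp to 100 are assumptions; the commit message
only gives durations and the 50/100 VU levels.

```javascript
const stages = [
  { duration: "30s", target: 50 },  // warmup: ramp up to 50 VUs (assumed target)
  { duration: "2m", target: 50 },   // sustained load at 50 VUs
  { duration: "1m", target: 100 },  // peak: move to 100 VUs
  { duration: "30s", target: 0 },   // cooldown: ramp down (assumed target)
];

// Convert a duration string to seconds; handles only the "Ns" / "Nm" forms used here.
function toSeconds(duration) {
  const match = /^(\d+)(s|m)$/.exec(duration);
  if (!match) throw new Error(`Unsupported duration: ${duration}`);
  return Number(match[1]) * (match[2] === "m" ? 60 : 1);
}

const totalSeconds = stages.reduce((sum, s) => sum + toSeconds(s.duration), 0); // 240
```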

Closes item #23 in docs/REFACTORING_AUDIT_2026_04.md.

Follow-ups not in this commit:
  - Wire into .github/workflows/daily-tests.yml (requires standing
    up the apps/api stack in the runner — bigger lift, separate PR)
  - Per-module thresholds once we have a few real runs and know
    where the natural baseline sits

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-09 12:27:54 +02:00

Till JS

9276d9a212

feat: GPU offload, signup limit, load tests & capacity planning

- Route all AI workloads (Ollama, STT, TTS, Image Gen) to GPU server
  (192.168.178.11) via LAN instead of host.docker.internal
- Upgrade default model to gemma3:12b and max concurrent to 5
- Add daily signup limit service (MAX_DAILY_SIGNUPS env var)
- Add GET /api/v1/auth/signup-status public endpoint
- Add k6 load test suite (web-apps, auth, sync-websocket, ollama)
- Add capacity planning documentation
- Fix: add eslint-config to sveltekit-base and calendar Dockerfiles
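
A daily signup cap of the kind listed above can be sketched as a counter
that resets when the calendar day changes. Everything here is
illustrative: `createSignupLimiter`, the in-memory storage, the UTC day
boundary, and the shape returned by `status()` are assumptions, not the
service's actual implementation or the response format of
/api/v1/auth/signup-status.

```javascript
// Hypothetical in-memory limiter; a real service would likely persist the count.
// `now` is injectable so the day rollover can be tested.
function createSignupLimiter(maxDaily, now = () => new Date()) {
  let day = null;
  let count = 0;

  // Reset the counter whenever the UTC date (YYYY-MM-DD) changes.
  function roll() {
    const today = now().toISOString().slice(0, 10);
    if (today !== day) {
      day = today;
      count = 0;
    }
  }

  return {
    // Returns true and counts the signup if the daily limit is not yet reached.
    tryRegister() {
      roll();
      if (count >= maxDaily) return false;
      count += 1;
      return true;
    },
    // Snapshot that a public status endpoint could expose.
    status() {
      roll();
      return { limit: maxDaily, used: count, open: count < maxDaily };
    },
  };
}
```

The limit would come from configuration, for example
`createSignupLimiter(Number(process.env.MAX_DAILY_SIGNUPS))`.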

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-28 21:14:24 +01:00

2 commits