Commit graph

2 commits

Author SHA1 Message Date
Till JS
77b2d1eb32 chore(infra): smarter tunnel rebuild — apex via API + sane probes
Two improvements to scripts/mac-mini/rebuild-tunnel.sh based on what
the first prod run actually surfaced.

═══ 1. Apex domain auto-fix via Cloudflare API ═══

`cloudflared tunnel route dns` cannot route the apex of a zone when a
record already sits there (error code 1003: "An A, AAAA, or CNAME
record with that host already exists"), and the CLI has no command to
delete those records. The first rebuild left mana.how returning 530:
the script silently failed to route it, and we had to fix the apex
manually in the dashboard.

The new `apex_route_via_api()` helper:
  - Detects apex hostnames by dot count (one dot → two-label name)
  - Uses $CLOUDFLARE_API_TOKEN if available
  - Resolves the zone id by name
  - Deletes any existing A / AAAA / CNAME records on the apex
  - Creates a fresh proxied CNAME pointing at <tunnel>.cfargotunnel.com
  - Cloudflare's CNAME flattening at the apex makes this work
    transparently

If $CLOUDFLARE_API_TOKEN is not set, the script logs a warning at the
top of step 6 and falls back to the old behavior (route fails, user
fixes the apex manually). The token needs Zone:DNS:Edit on the
target zone.
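
For reviewers, a rough sketch of the flow the helper follows, written
in TypeScript rather than the script's bash. `CfCall`, `isApex` and
`apexRouteViaApi` are illustrative names, not the script's; `CfCall` is
an assumed wrapper that sends an authenticated request to the
Cloudflare v4 API and returns the parsed JSON body. The dot-count check
is the heuristic described above and would misclassify zones under
multi-label suffixes such as example.co.uk.

```typescript
// Assumed transport: adds the Bearer token, returns the parsed JSON body.
type CfCall = (method: string, path: string, body?: unknown) => Promise<{ result: any }>;

// One dot means a two-label name such as "mana.how".
function isApex(hostname: string): boolean {
  return hostname.split('.').length === 2;
}

async function apexRouteViaApi(
  call: CfCall,
  hostname: string,
  tunnelId: string,
): Promise<'skipped' | 'routed'> {
  if (!isApex(hostname)) return 'skipped';

  // Resolve the zone id by name.
  const zones = await call('GET', `/zones?name=${encodeURIComponent(hostname)}`);
  const zoneId: string | undefined = zones.result[0]?.id;
  if (!zoneId) throw new Error(`zone not found: ${hostname}`);

  // Delete any A / AAAA / CNAME record that already sits on the apex.
  const records = await call(
    'GET',
    `/zones/${zoneId}/dns_records?name=${encodeURIComponent(hostname)}`,
  );
  for (const rec of records.result as Array<{ id: string; type: string }>) {
    if (rec.type === 'A' || rec.type === 'AAAA' || rec.type === 'CNAME') {
      await call('DELETE', `/zones/${zoneId}/dns_records/${rec.id}`);
    }
  }

  // Create a fresh proxied CNAME; Cloudflare flattens it at the apex.
  await call('POST', `/zones/${zoneId}/dns_records`, {
    type: 'CNAME',
    name: hostname,
    content: `${tunnelId}.cfargotunnel.com`,
    proxied: true,
  });
  return 'routed';
}
```

Non-apex hostnames return 'skipped' so the caller can keep using
`cloudflared tunnel route dns` for them.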

═══ 2. Smarter HTTP verification ═══

The first run reported "5 hosts down (404/000)", but those were all
backend services without a root handler: credits/media/llm/mana-api
all return 404 at `/` and 200 at `/health`. The verify pass flagged
healthy services as down, which made the rebuild look more broken
than it was.

New `probe_host()` tries `/health` first, falls back to `/` only if
/health returned 4xx, and prefers a 2xx/3xx root response over a 4xx
/health. `probe_is_down()` only counts 5xx and 000 (libcurl error)
as failures — anything in 1xx-4xx means the request reached the
origin and the tunnel routing is correct, which is the actual thing
the verify pass cares about. `probe_label()` adds a short health
summary so the verify log reads "200 ok" / "401 auth required" /
"404 routed (no handler)" / "530 tunnel error" instead of just bare
status codes.
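
The decision rules above, sketched in TypeScript for readability (the
script itself is bash around curl). `GetStatus` is an assumed stand-in
for the curl call and returns 0 where curl prints 000. Only the four
labels quoted above come from the script; the remaining labels are
placeholders.

```typescript
// Assumed stand-in for the curl call: returns the HTTP status for a path,
// or 0 when no HTTP response arrived (curl's "000").
type GetStatus = (path: string) => number;

// Try /health first; consult / only when /health answered 4xx, and
// prefer the root status only if it is 2xx/3xx.
function probeHost(getStatus: GetStatus): number {
  const health = getStatus('/health');
  if (health < 400 || health >= 500) return health;
  const root = getStatus('/');
  return root >= 200 && root < 400 ? root : health;
}

// Only 5xx and 000 count as down; 1xx-4xx means the origin was reached.
function probeIsDown(code: number): boolean {
  return code === 0 || code >= 500;
}

function probeLabel(code: number): string {
  const shown = code === 0 ? '000' : String(code);
  let summary: string;
  if (code === 0) summary = 'no response'; // placeholder label
  else if (code === 530) summary = 'tunnel error';
  else if (code >= 500) summary = 'server error'; // placeholder label
  else if (code === 401) summary = 'auth required';
  else if (code === 404) summary = 'routed (no handler)';
  else if (code >= 200 && code < 400) summary = 'ok';
  else summary = 'routed'; // placeholder label
  return `${shown} ${summary}`;
}
```

With these rules a service that answers 404 at both paths is labelled
"404 routed (no handler)" and is not counted as down.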

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 17:52:40 +02:00
Till JS
bd231cd689 feat(api/web): wire-format envelope versioning + Anthropic prompt-cache hints
Two related AI-infrastructure hardenings landing together because both
touch the same nutriphi/planta route definitions:

═══ 1. Wire-format schema versioning ═══

Adds AI_SCHEMA_VERSION + AiResponseEnvelope<T> in @mana/shared-types so
every AI structured-output endpoint speaks a single envelope dialect:

    { schemaVersion: '1', data: <validated object> }

Backend wraps via a small `envelope()` helper in each module's routes.ts;
frontend api.ts unwraps via `unwrapEnvelope<T>()` which throws an
AiSchemaVersionMismatchError if the server returns a version this
client wasn't compiled against.
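
A minimal sketch of how the pieces could fit together. The names match
the ones above, but the bodies and the error's fields are assumptions
for illustration; the real definitions live in @mana/shared-types and
the per-module routes.ts / api.ts files.

```typescript
const AI_SCHEMA_VERSION = '1' as const;

interface AiResponseEnvelope<T> {
  schemaVersion: string;
  data: T;
}

class AiSchemaVersionMismatchError extends Error {
  expected: string;
  received: unknown;

  constructor(expected: string, received: unknown) {
    super(`AI schema version mismatch: client expects ${expected}, server sent ${String(received)}`);
    // Keep instanceof working even when compiled down to ES5.
    Object.setPrototypeOf(this, AiSchemaVersionMismatchError.prototype);
    this.name = 'AiSchemaVersionMismatchError';
    this.expected = expected;
    this.received = received;
  }
}

// Backend side: wrap an already validated object.
function envelope<T>(data: T): AiResponseEnvelope<T> {
  return { schemaVersion: AI_SCHEMA_VERSION, data };
}

// Frontend side: refuse any version this client was not compiled against.
function unwrapEnvelope<T>(body: unknown): T {
  const candidate = body as Partial<AiResponseEnvelope<T>> | null;
  if (!candidate || candidate.schemaVersion !== AI_SCHEMA_VERSION) {
    throw new AiSchemaVersionMismatchError(AI_SCHEMA_VERSION, candidate?.schemaVersion);
  }
  return candidate.data as T;
}
```

In this sketch a response without an envelope fails the same way as one
with the wrong version, so a stale server surfaces as one named error.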

Why this matters before launch:
  - Catches stale-cache scenarios immediately ("client v1 talking to
    server v2") with an actionable error in the network panel, not a
    cascade of "field is undefined" bugs further down the stack
  - Forces explicit version bumps when we make non-additive schema
    changes — the bump rules are documented inline next to the constant
  - Cheap to remove if it ever feels overkill: drop the envelope() call
    on the backend and the unwrapEnvelope on the frontend, ~10 lines

═══ 2. Anthropic prompt-caching directive (forward-compat) ═══

Adds `providerOptions: { anthropic: { cacheControl: { type: 'ephemeral' } } }`
on the system message in nutriphi + planta routes via a SYSTEM_CACHE_HINT
constant. This is a NO-OP today because:
  - mana-llm currently routes to Gemini, not Claude
  - Our system prompts are ~50 tokens, well under Anthropic's 1024-token
    cache minimum

Kept anyway because it's ~5 lines per route and lights up automatically
when either condition flips (e.g. when we add per-user dietary preferences
as system context, pushing prompts past the threshold). The day we point
mana-llm at Claude Sonnet, every existing call site already has caching
enabled — no scavenger hunt through the routes.

System messages had to migrate from the `system:` shorthand to a full
messages[] entry to attach providerOptions, which is a tiny readability
loss but the only way to get per-message metadata into the AI SDK.
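
Roughly what the migrated call sites look like, as a self-contained
sketch. `ChatMessage` and `ProviderOptions` are local stand-ins for the
AI SDK's message types, `buildMessages` is a hypothetical helper, and
the exact shape of SYSTEM_CACHE_HINT in the routes is assumed here to
be the providerOptions value itself.

```typescript
// Local stand-ins for the AI SDK types so the sketch compiles on its own.
type ProviderOptions = Record<string, Record<string, unknown>>;

interface ChatMessage {
  role: 'system' | 'user';
  content: string;
  providerOptions?: ProviderOptions;
}

// Assumed shape of the shared constant: the providerOptions value.
const SYSTEM_CACHE_HINT: ProviderOptions = {
  anthropic: { cacheControl: { type: 'ephemeral' } },
};

// The system prompt becomes a full messages[] entry so that it can
// carry per-message metadata, which the `system:` shorthand cannot.
function buildMessages(systemPrompt: string, userText: string): ChatMessage[] {
  return [
    { role: 'system', content: systemPrompt, providerOptions: SYSTEM_CACHE_HINT },
    { role: 'user', content: userText },
  ];
}
```

The resulting array is what would be passed as `messages` in place of
the former `system:` string.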

═══ Tests ═══

13 new cases in apps/mana/apps/web/.../nutriphi/ai-schemas.test.ts cover:
  - AI_SCHEMA_VERSION presence + AiSchemaVersionMismatchError shape
  - MealAnalysisSchema acceptance/rejection (confidence bounds, missing
    nutrients, optional food fields, default empty arrays)
  - PlantIdentificationSchema (every-field-optional design, defaults,
    confidence range)

(Test file lives in the web app rather than packages/shared-types
because the latter has no test runner configured — adding vitest there
just for these would be overkill.)

Total nutriphi + planta suite: 62/62 passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 17:17:18 +02:00