# Website Builder: Observability

_Shipped 2026-04-23 (M7)._

Every metric below is exposed on mana-api's `/metrics` scrape endpoint (port 3060, unauthenticated; it relies on the reverse proxy to keep it off the public internet).

## Metrics

### Write path

| Metric | Type | Labels | What it tells you |
|---|---|---|---|
| `mana_api_website_publish_total` | Counter | `result` = success \| slug_taken \| invalid \| error | Publish-attempt outcome mix. |
| `mana_api_website_publish_duration_seconds` | Histogram | (none) | End-to-end publish latency (validation + transaction). |
| `mana_api_website_domain_verify_total` | Counter | `result` = verified \| failed | Custom-domain DNS check outcomes. |

### Public surface

| Metric | Type | Labels | What it tells you |
|---|---|---|---|
| `mana_api_website_public_reads_total` | Counter | `result` = hit \| not_found | Anonymous reads of `/public/sites/:slug`. |
| `mana_api_website_public_read_age_seconds` | Histogram | (none) | Age of the served snapshot at read time. A bimodal distribution (many <10s AND many >1h) tells you the edge cache is working. |
| `mana_api_website_host_resolve_total` | Counter | `result` = hit \| miss \| error | Custom-host → slug resolutions from the SvelteKit hook. |
| `mana_api_website_submissions_total` | Counter | `result` = received \| spam \| rate_limit \| not_found \| invalid | Form submissions received. |
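
To make the `result` label pattern in the tables above concrete, here is a minimal sketch of a labelled counter and the lines it would produce on a scrape. The `LabelledCounter` class is a hypothetical stand-in written for illustration; it is not mana-api's actual metrics code, which may well use an existing client library instead.

```typescript
// Hypothetical stand-in for a metrics client; not the project's actual code.
type PublishResult = "success" | "slug_taken" | "invalid" | "error";

class LabelledCounter<L extends string> {
  private readonly name: string;
  private readonly labelName: string;
  private readonly counts = new Map<L, number>();

  constructor(name: string, labelName: string) {
    this.name = name;
    this.labelName = labelName;
  }

  // Counters only ever go up, so negative amounts are rejected.
  inc(label: L, amount: number = 1): void {
    if (amount < 0) throw new RangeError("counters cannot decrease");
    this.counts.set(label, (this.counts.get(label) ?? 0) + amount);
  }

  // Render in the Prometheus text exposition format, one line per label value.
  render(): string {
    const lines: string[] = [`# TYPE ${this.name} counter`];
    this.counts.forEach((value, label) => {
      lines.push(`${this.name}{${this.labelName}="${label}"} ${value}`);
    });
    return lines.join("\n");
  }
}

const publishTotal = new LabelledCounter<PublishResult>(
  "mana_api_website_publish_total",
  "result",
);
publishTotal.inc("success");
publishTotal.inc("success");
publishTotal.inc("slug_taken");
console.log(publishTotal.render());
```

Keeping every outcome on one metric name with a single `result` label is what lets the success-rate style queries divide one label value by the sum over all of them.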

## Quick PromQL queries

**Publish success rate (30 min rolling):**

```promql
sum(rate(mana_api_website_publish_total{result="success"}[30m]))
/
sum(rate(mana_api_website_publish_total[30m]))
```

**p95 publish latency:**

```promql
histogram_quantile(0.95, sum by (le) (rate(mana_api_website_publish_duration_seconds_bucket[10m])))
```

**Custom-host resolve hit rate (production target: >98% once bindings stabilise):**

```promql
sum(rate(mana_api_website_host_resolve_total{result="hit"}[5m]))
/
sum(rate(mana_api_website_host_resolve_total[5m]))
```

**Spam-to-received ratio (form submissions):**

```promql
sum(rate(mana_api_website_submissions_total{result="spam"}[1h]))
/
sum(rate(mana_api_website_submissions_total{result=~"received|spam"}[1h]))
```

## Alerts (recommended)

- **`website-publish-failure-spike`**: fires when `rate(mana_api_website_publish_total{result="error"}[10m]) > 0.1/s`. Indicates DB trouble or an unhandled exception path.
- **`website-public-cold`**: fires when `rate(mana_api_website_public_reads_total[1h]) > 10/s AND rate(mana_api_website_public_read_age_seconds_bucket{le="10"}[1h]) / rate(mana_api_website_public_read_age_seconds_count[1h]) > 0.5`. The `le` label exists only on the histogram's `_bucket` series, so the numerator reads the bucket rather than `_count`. More than half the traffic hitting fresh snapshots means the edge cache isn't doing its job, usually because of Cloudflare config drift.
- **`website-domain-verify-failed-burst`**: fires when `increase(mana_api_website_domain_verify_total{result="failed"}[1h]) > 20`. Either ops broke the DNS target (CNAME not pointing anywhere) or one frustrated user is retrying over and over.
- **`website-form-spam-storm`**: fires when `rate(mana_api_website_submissions_total{result="spam"}[5m]) > 1/s`. The honeypot is holding, but a motivated attacker might move on to CAPTCHA-busting next.
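
The `website-public-cold` condition is easier to reason about with numbers in hand. The sketch below, written under the assumption that the histogram has a bucket boundary at exactly 10 seconds, shows how the fresh-read fraction falls out of cumulative bucket counts. The `HistogramSnapshot` type, the function names, and the bucket boundaries are all hypothetical; the counts stand for increases over the alert window, not lifetime totals.

```typescript
// Cumulative histogram snapshot: each bucket counts observations <= le.
// Prometheus histograms always carry a final +Inf bucket equal to the total count.
interface HistogramSnapshot {
  buckets: { le: number; count: number }[];
}

// Fraction of observations at or below the given bucket boundary.
function fractionAtOrBelow(h: HistogramSnapshot, le: number): number {
  const total = h.buckets.find((b) => b.le === Infinity)?.count ?? 0;
  const bucket = h.buckets.find((b) => b.le === le);
  if (bucket === undefined) throw new Error(`no bucket with le=${le}`);
  return total === 0 ? 0 : bucket.count / total;
}

// Mirrors the alert: enough traffic AND more than half of it served fresh.
function publicColdFires(readsPerSecond: number, h: HistogramSnapshot): boolean {
  return readsPerSecond > 10 && fractionAtOrBelow(h, 10) > 0.5;
}

// Hypothetical window: 1000 reads, 600 of them served a snapshot younger than 10s.
const windowSnapshot: HistogramSnapshot = {
  buckets: [
    { le: 10, count: 600 },
    { le: 3600, count: 700 },
    { le: Infinity, count: 1000 },
  ],
};

console.log(fractionAtOrBelow(windowSnapshot, 10)); // 0.6
console.log(publicColdFires(12, windowSnapshot)); // true
console.log(publicColdFires(5, windowSnapshot)); // false: too little traffic to judge
```

The traffic floor matters: on a quiet site a handful of fresh reads would push the fraction over 0.5 without saying anything about the cache.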

## Dashboard

Grafana dashboard lives at `grafana.internal/d/website-builder` (add it to the existing "Mana Services" folder). Panels: publish volume + outcome mix, publish latency heatmap, submissions/spam split, host-resolve hit ratio, domain-verify trend.

## Orphan-asset GC

A read-only scan script lives at `apps/api/scripts/gc-website-assets.ts`. Run it manually for now:

```bash
MANA_SERVICE_KEY=… DATABASE_URL=… bun apps/api/scripts/gc-website-assets.ts
```

The script:

1. Walks every `published_snapshots.blob` and `submissions.payload` to collect referenced `mediaId`s.
2. Asks mana-media for everything scoped to `app=website`.
3. Reports items older than 30 days that aren't referenced anywhere.

**Current status: report-only.** No deletion. After 2–3 weeks of production reports showing that the candidate list is stable and contains no false positives, we will enable a `--delete` flag in a follow-up commit.
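
The three steps of the scan can be sketched as two pure functions. This is an illustration only, not the contents of `gc-website-assets.ts`: the `MediaItem` shape, the function names, and the assumption that references appear as string values under a `mediaId` key anywhere in the JSON are all hypothetical.

```typescript
// Hypothetical shape of what mana-media returns for app=website.
interface MediaItem {
  id: string;
  createdAt: Date;
}

const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

// Step 1: recursively collect every string stored under a `mediaId` key.
function collectMediaIds(value: unknown, out: Set<string> = new Set()): Set<string> {
  if (Array.isArray(value)) {
    for (const item of value) collectMediaIds(item, out);
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      if (key === "mediaId" && typeof child === "string") out.add(child);
      else collectMediaIds(child, out);
    }
  }
  return out;
}

// Step 3: report-only. Returns candidates and deletes nothing.
function findOrphans(
  referenced: ReadonlySet<string>,
  items: MediaItem[],
  now: Date,
): MediaItem[] {
  const cutoff = now.getTime() - THIRTY_DAYS_MS;
  return items.filter(
    (m) => m.createdAt.getTime() < cutoff && !referenced.has(m.id),
  );
}

// Made-up data standing in for a snapshot blob and the mana-media listing (step 2).
const sampleBlob = {
  blocks: [
    { type: "image", mediaId: "a" },
    { type: "group", children: [{ type: "image", mediaId: "b" }] },
  ],
};
const sampleItems: MediaItem[] = [
  { id: "a", createdAt: new Date("2026-01-01T00:00:00Z") }, // old but referenced
  { id: "c", createdAt: new Date("2026-03-01T00:00:00Z") }, // old and unreferenced
  { id: "d", createdAt: new Date("2026-04-25T00:00:00Z") }, // unreferenced but recent
];
const sampleNow = new Date("2026-05-01T00:00:00Z");

const orphanIds = findOrphans(collectMediaIds(sampleBlob), sampleItems, sampleNow)
  .map((m) => m.id);
console.log(orphanIds); // [ 'c' ]
```

The 30-day age floor is what keeps freshly uploaded assets, which may belong to a draft that has not been published yet, off the candidate list.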

## Future (M7.x)

- Per-site view counts. These would require a cheap counter table (`website.site_views { site_id, day, count }`) incremented from the public-read handler. Skipped in the M7 first pass because the analytics block already covers the per-visit needs; add it when someone asks for a dashboard inside the editor.
- Cloudflare hostname status reconciliation. Once the CF SaaS API is wired, a periodic poller should compare our `custom_domains.status` against CF's `hostname.ssl.status` and flag drift.
- Submission-payload retention job. Fields are kept indefinitely today; when target-delivery lands (M4.x), the job will run after delivery and null the payload, keeping only IDs + status.