Add Google's Gemini image edit family (Nano Banana) as a user-
selectable model for Wardrobe Try-On next to the existing OpenAI
path. Three concrete choices now expose themselves in the Solo and
Outfit Try-On buttons:
- openai/gpt-image-2 (default, falls back to gpt-image-1
server-side when the org isn't
verified)
- google/gemini-3-pro-image-preview (Nano Banana Pro — premium
identity / character consistency)
- google/gemini-3.1-flash-image-preview (Nano Banana 2 — newest,
fast, cheapest)
All three accept multi-image refs (face + body + garment) through
the same /api/v1/picture/generate-with-reference endpoint; the only
differences are the provider-specific request/response shape and
the model-id routing.
Server (apps/api/src/modules/picture/routes.ts):
- Guard now accepts `openai/*` and `google/*` prefixes and rejects
everything else as "not supported for edits". Each provider's key
is validated separately so missing GEMINI_API_KEY doesn't break
OpenAI calls and vice versa.
- New `callGeminiEdits(modelName)` helper mirrors the shape of
callOpenAiEdits: encodes the normalized PNG refs as base64
inline_data parts, POSTs to
generativelanguage.googleapis.com/v1beta/models/{model}:generateContent
with responseModalities=["TEXT","IMAGE"] and imageConfig
(aspectRatio + imageSize), pulls the generated image out of
candidates[].content.parts[].inlineData.
- Our internal size strings map cleanly: 1024x1024 → 1:1 / 1K,
1024x1536 → 2:3 / 1K, 1536x1024 → 3:2 / 1K. Gemini 1K is enough
for the thumbnail sizes Wardrobe renders; going higher bloats
payload without visible gain.
- creditsFor() gains a google/ branch proportional to upstream
pricing (pro ≈ 18, 3.1-flash ≈ 6, 2.5-flash ≈ 5).
- Response `model` reports `${provider}/${modelUsed}` so the picture
row's model metadata is accurate across providers.
Client (apps/mana/apps/web/src/lib/modules/wardrobe):
- api/try-on.ts: export `TryOnModel` union + `DEFAULT_TRY_ON_MODEL`.
RunGarmentTryOnParams / RunOutfitTryOnParams gain an optional
`model` field, threaded through `callGenerateWithReference`.
- components/TryOnModelPicker.svelte: new segmented control, three
options with label + one-line hint. Grid-auto-fits so it reflows
on the narrow workbench card.
- components/GarmentTryOnButton.svelte + TryOnButton.svelte: both
mount the picker above the Sparkle CTA. `estimatedCredits` on the
button label updates live when the user switches model so the
cost signal matches what the server will actually charge.
Env (scripts/generate-env.mjs): GEMINI_API_KEY and GOOGLE_API_KEY
now propagate from the root `.env.development` into `apps/api/.env`
so mana-api can pick them up at boot. The route reads GEMINI_API_KEY
with GOOGLE_API_KEY as fallback, matching how mana-llm ships today.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Neues Comic-Modul: aus Text-Inputs (Journal / Notes / Writing / Library
/ Calendar) entsteht ein mehrseitiger Comic, generiert mit gpt-image-2
über die bestehende /picture/generate-with-reference-Route. Plan in
docs/plans/comic-module.md (M1–M5 + optional M6–M8).
M1 schafft die Datenschicht ohne UI:
- Dexie v44 `comicStories` (space-scoped, Indices createdAt/style/
isFavorite/isArchived). Story hält `panelImageIds: string[]` und
`panelMeta: Record<panelImageId, {caption, dialogue, promptUsed,
sourceInput?}>` — Panels selbst sind picture.images-Rows mit
comicStoryId + comicPanelIndex Back-Refs.
- Fünf Stil-Presets (comic / manga / cartoon / graphic-novel / webtoon)
mit Prompt-Prefix-Templates in styles.ts; composePanelPrompt webt
Stil + Panel-Prompt + Caption + Dialog zusammen. Sprechblasen
werden von gpt-image-2 direkt ins Bild gerendert — kein SVG-Overlay.
- Encryption-Registry-Eintrag: title / description / storyContext /
tags / panelMeta als JSON-Blob. Struktur (id, style, character-
MediaIds, panelImageIds, Flags, visibility) bleibt plaintext.
- Module-Registry registriert appId='comic', verifyMediaOwnership auf
der /picture/generate-with-reference-Route akzeptiert jetzt
['me', 'wardrobe', 'comic'] — 'comic'-Slot ist reserviert für M6+
Anchor-/Backdrop-Uploads.
- Space-Allowlist: comic in brand (Marken-Storys), club (Vereins-
geschichte), family (Kinder-Abenteuer), team (Release-Comics),
practice (Patienten-Aufklärung). Personal via '*'-Sentinel.
- mana-apps.ts Eintrag mit comic-Icon (Sprechblase + Lightning-Bolt,
f97316→dc2626 Gradient). Lokal tier='guest' mit LOCAL TIER PATCH-
Comment wie Wardrobe, canonical ist 'beta'.
Visibility-System von Anfang an adopted (setVisibility-Methode im
Store, unlistedToken-Generierung inklusive). appendPanel() als
Vorarbeit für M2 bereits da, ohne Aufrufer.
5 Encryption-Roundtrip-Tests grün (panelMeta nested JSON, leeres
panelMeta, partielle panelMeta ohne sourceInput, null-description).
pnpm run check + validate:all sauber (207 Dexie-Tabellen klassifiziert,
comicStories unter den 106 encrypted).
Kein UI, keine Panel-Generierung, keine MCP-Tools — alles M2/M3/M5.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
gpt-image-1 answered the last Try-On attempt with
invalid_image_file: Invalid image file or mode for image 2
because one of the references (face/body/garment) was in a format or
color mode OpenAI's edits endpoint rejects — typical culprits are
HEIC from iPhones, CMYK JPEG, palette-mode PNG, APNG, or JPEG with an
ICC profile gpt-image-1 doesn't honour. mana-media stores originals
verbatim so whatever the user uploaded is what we were forwarding.
Route the references through mana-media's existing on-the-fly
/transform endpoint (format=png, w/h=1024, fit=inside) which pipes
the buffer through sharp server-side. One call per ref, all run in
parallel, same latency budget as before. Output is guaranteed
- PNG / RGB (or RGBA if the source had alpha, which gpt-image-1 accepts),
- no more than 1024 px on the longest side → well under OpenAI's
4 MB/image cap,
- aspect-ratio-preserving (fit=inside) so a portrait body photo
doesn't get squished into a square.
New helper `getMediaBufferAsPng(mediaId, longestSide)` in lib/media.ts
encapsulates the transform-URL build. The Try-On path in the picture
route now uses it instead of `getMediaBuffer`; all Blob filenames
pin to `.png` since the buffer is already normalized.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OpenAI started gating gpt-image-2 behind per-organization verification
(platform.openai.com/settings/organization/general → Verify Organization,
propagation up to 15 min). Unverified orgs get:
"Your organization must be verified to use the model gpt-image-2"
Keeps Try-On broken until the user completes that manual step. Since
the edits endpoint is identical across gpt-image-1 and gpt-image-2
(same image[] multi-ref, same size/quality/n params), detect that
specific rejection and retry once with gpt-image-1.
- buildFormData(modelName) + callOpenAiEdits(modelName) extracted so
the retry is a one-line re-invoke with the fallback model instead
of a duplicated fetch block.
- needsGptImage1Fallback() matches /verified to use the model/i in
the error body AND checks the attempted model was actually
gpt-image-2 — an explicit openai/gpt-image-1 request stays on 1.
- Response now reports `model: openai/${modelUsed}` so the
picture.images row records whichever model actually produced the
image (matters for future re-generation / audit).
Credits unchanged: our flat 3/10/25-per-quality tariff applies to all
openai/* paths. Slight over-charge for the gpt-image-1 fallback until
the user verifies, then gpt-image-2 takes over automatically.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The try-on path POST'd N reference images as repeated `image` fields in
the multipart body. OpenAI's edits endpoint answers that with
`duplicate_parameter: Duplicate parameter: 'image'. You provided
multiple values for this parameter, whereas only one is allowed. If
you are trying to provide a list of values, use the array syntax
instead e.g. 'image[]=<value>'.`
Switch to the array-syntax field name `image[]`, which OpenAI accepts
for cardinality ≥ 1 (no branching needed for the single-ref case).
Also surface the underlying error from the three 502 branches
(ownership-check, media-fetch, OpenAI call) into both the server log
(structured console.error with refIds + openai body) and the response
`detail` field. The client's callGenerateWithReference now prepends
`detail` to the thrown message so the user sees the concrete reason
in-module instead of a generic "Try-On fehlgeschlagen (502)".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
M4 of docs/plans/wardrobe-module.md — the loop closes. A user with at
least a face-ref in the active space can click "Anprobieren" on an
outfit detail page; the client composes a reference call against the
existing M3 `/generate-with-reference` endpoint, persists the result
into the Picture gallery with a `wardrobeOutfitId` back-reference,
and pins a `lastTryOn` snapshot on the outfit so its card instantly
shows the AI preview next time.
Server side — picture/routes.ts:
- verifyMediaOwnership now accepts `apps: string | readonly string[]`.
Under the hood it runs one list() per app-tag and unions the owned
set before the missing-id check. Preserves the 500-row per-app
sanity cap. Single-tag callers unchanged — it's an additive widen.
- Picture /generate-with-reference passes `['me', 'wardrobe']` so
face/body portraits (me-images) and garment photos (wardrobe) can
ride in the same referenceMediaIds array. Anything outside those
two tags still 404s — no expansion of the trust surface.
Client side — wardrobe/api/try-on.ts:
- `runOutfitTryOn({ outfit, garments, faceRefMediaId, bodyRefMediaId?, ... })`
composes the ref list (face → body → up to 6 garments, respecting
the 8-slot server cap), picks portrait 1024x1536 by default (or
1024x1024 in accessory-only mode), and POSTs with
`model='openai/gpt-image-2'`, `quality='medium'`, `n=1`. One render
per click; multi-variant is a future Generator-style extension.
- Default prompts are composed in DE from the outfit meta (name +
occasion); callers can override via `prompt`. Accessory-only mode
uses a tighter studio-portrait phrasing since the fullbody ref is
dropped there.
- `isAccessoryOnlyOutfit()` helper — iff every garment is in
FACE_ONLY_CATEGORIES, skip body-ref and render square. Covers the
Brille-Try-On headline use case.
- On success: inserts a `picture.images` row with generationMode=
'reference', referenceImageIds, and wardrobeOutfitId set; then
calls wardrobeOutfitsStore.setLastTryOn() with imageId + imageUrl
so OutfitCard + DetailOutfitView immediately flip to the AI cover.
TryOnButton — wardrobe/components/TryOnButton.svelte:
- Three states: ready (click to render), missing-references (shows
UserCircle + link to /profile/me-images, with the right hint for
accessory-only vs. fullbody), loading (spinner).
- Credit estimate on the button (10c medium quality).
- Hints: accessory-only, too-many-garments (>6, over server cap),
and non-personal-space disclosure — the family-space case gets its
own sentence since "Try-On rendert dich, nicht dein Kind" is
non-obvious.
- Reads face-ref/body-ref via useImageByPrimary (space-scoped after
the v40 meImages migration — brand/club/family spaces need their
own references uploaded).
UI wiring:
- DetailOutfitView replaces the M3 stub button with <TryOnButton/>.
The existing "Try-On Verlauf"-Strip already reads
`useOutfitTryOns(outfit.id)` which filters `picture.images` by
wardrobeOutfitId — it lights up automatically on first render.
Not in M4 (punted to follow-ups):
- Solo-garment try-on on DetailGarmentView ("nur diese Brille auf
mein Gesicht"). Plan called it out as optional; the outfit flow
already covers it when the outfit contains only that one garment.
- Multi-variant rendering (n=2/4). Usable "show me 3 looks" needs a
picker UI on top, not just a param bump.
- Quality + prompt override in the button. A power-user panel can
come later; default medium + auto-prompt keeps M4's click-to-try-on
one-tap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
M1 of docs/plans/wardrobe-module.md — pure data layer + backend plumbing,
zero UI (that's M2). A user can now hold a digital wardrobe per space:
brand merch, club Trikots, family Kleiderschrank, team Kostüme, practice
Dresscode, and personal closet all live as separate pools under the same
Dexie tables, space-scoped like tags/scenes/agents after Phase 2c.
Data model — two tables, no join:
- wardrobeGarments (Dexie v41): single clothing items / accessories.
Indexed on `category` + `createdAt` + `isArchived`. Encrypted:
name/brand/color/size/material/tags/notes. Plaintext: category,
mediaIds, counters, timestamps — all indexed or structural.
`mediaIds[0]` is the primary photo used for try-on; additional
ids are alternate views (back, detail) for M7.
- wardrobeOutfits (Dexie v41): named compositions referencing
garment ids. Encrypted: name/description/tags. Plaintext:
garmentIds (FK array), occasion (closed enum — useful for
undecrypted filtering), season, booleans, lastTryOn snapshot.
- picture.images gains `wardrobeOutfitId?: string | null` as a
plaintext back-reference. Try-on results land in the Picture
gallery like any other generation; the outfit detail view
queries them via this id rather than maintaining a third table.
Space scope:
- `wardrobe` added to all five explicit allowlists in shared-types/
spaces.ts (personal is wildcard, no edit needed). Each space type
gets a one-line comment explaining the real-world use case.
- App registry: `wardrobe` entry in shared-branding/mana-apps.ts
with a rose→fuchsia gradient icon (T-shirt on hanger silhouette),
color #e11d48, tier 'beta', status 'beta'.
- Module registry: wardrobeModuleConfig imported + appended to
MODULE_CONFIGS so SYNC_APP_MAP picks it up automatically.
Backend:
- MAX_REFERENCE_IMAGES bumped 4 → 8 in picture/generate-with-
reference (plus the client-side default in ReferenceImagePicker).
Justified with a comment: face + body + top + bottom + shoes +
outerwear + 2 accessories = 8. Cost doesn't scale with ref count
(OpenAI bills per output), so the bump is a pure capability
expansion with no credit-side risk.
- New POST /api/v1/wardrobe/garments/upload wraps uploadImageToMedia
with app='wardrobe'. Registered under /api/v1/wardrobe in index.ts.
Pattern 1:1 with the profile/me-images/upload endpoint; tier-gating
falls out of wardrobe NOT being in RESOURCE_MODULES (tier='guest'
works — consistent with picture's plain CRUD).
Stores emit domain events (WardrobeGarmentAdded, WardrobeOutfitCreated,
WardrobeOutfitTryOn, etc.) so later mana-ai missions can observe
activity without polling.
No UI in this commit. M2 (Garments-Grundlayer) wires the route + grid
+ upload-zone; M3 the Outfit composer; M4 the Try-On integration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the M3 loop from docs/plans/mana-mcp-and-personas.md. The
runner now picks up due personas, drives them through Claude + MCP
for one simulated turn, collects actions + ratings, and persists
them through service-key internal endpoints in mana-auth.
Internal endpoints (mana-auth, service-key-gated)
- GET /api/v1/internal/personas/due
Returns personas whose tickCadence + lastActiveAt say they're
due. Rules: hourly > 1h, daily > 24h, weekdays > 24h mon-fri.
NULLS FIRST so never-run personas go ahead of stale ones.
- POST /api/v1/internal/personas/:id/actions
Batch ≤ 500. Row ids are deterministic
(`${tickId}-${i}-${toolName}`) + ON CONFLICT DO NOTHING so the
runner can retry a tick without doubling audit rows. Also
bumps personas.last_active_at so the next /due call sees it.
- POST /api/v1/internal/personas/:id/feedback
Batch ≤ 100. Row id is `${tickId}-${module}` — natural key is
one rating per module per tick.
Runner tick pipeline (services/mana-persona-runner/src/runner/)
- claude-session.ts
Two phases per tick. runMainTurn feeds the persona's system
prompt + a German "simulate a day" user prompt to Claude Agent
SDK's query(), with mana-mcp wired in as a streamable-HTTP MCP
server. We iterate the returned AsyncGenerator and extract
tool_use blocks into ActionRows; tool_result with is_error=true
flips the most recent action. runRatingTurn is a fresh query()
with tools:[] asking Claude in character to rate each used
module 1-5 as strict JSON, which we parse with tolerance for
surrounding whitespace / fences. Unparseable output becomes a
synthetic '__parse' feedback row so operators see the failure.
- tick.ts
Orchestrator. Skips if config.paused. Fetches /due, processes
in batches of config.concurrency (Promise.allSettled so one
failure doesn't kill the batch), returns {due, ranSuccessfully,
failed[], durationMs}.
- types.ts
ActionRow and FeedbackRow shapes shared between claude-session
and the internal client; mirrors the mana-auth schema but in
narrow plain TS for the wire.
Runner bootstrap (src/index.ts)
- setInterval(config.tickIntervalMs) starts the tick loop on boot.
tickInFlight guards against overlap when Claude latency > interval.
If MANA_SERVICE_KEY or ANTHROPIC_API_KEY is missing, loop is
disabled with a warn line — /health still works, /diag/login
still works.
- New dev-only POST /diag/tick fires a single tick on demand and
returns the result, so you can verify without waiting 60 s.
- Graceful SIGTERM/SIGINT shutdown clears the interval.
Client
- clients/mana-auth-internal.ts
X-Service-Key client for the three endpoints above. Constructor
throws if serviceKey is empty — fail loud, not silent.
Boot smoke: /health + /diag/tick both return descriptive 500s when
keys are absent, 200/JSON when present. Warning lines show up on
boot for missing keys. Type-check green across mana-auth, tool-
registry, mcp, persona-runner.
End-to-end smoke recipe (docker up → db:push → seed:personas →
diag/tick → psql) documented in
services/mana-persona-runner/CLAUDE.md. That's the M3 exit gate.
M2.d (cross-space family/team memberships) still deferred.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a third provider path to /api/v1/picture/generate that calls OpenAI
gpt-image-2 when model starts with "openai/". Supports n=1..4 batch
generation with character continuity, base64 response decoded server-side
and uploaded to mana-media for dedup + thumbnails. Credit cost scales
by quality (low=3, medium=10, high=25) × n.
Env plumbing:
- scripts/generate-env.mjs: new apps/api/.env stanza propagates
OPENAI_API_KEY + REPLICATE_API_TOKEN from .env.secrets
- .env.macmini.example: documents OPENAI_API_KEY for prod
Frontend /picture/generate: model + quality + aspect-ratio + batch-count
selectors, real fetch with auth, persists each image via imagesStore.insert
(encrypted + synced). Wrapped in ModuleShell variant=fill with back-arrow
to /picture and a live credit badge in the header actions slot.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-launch theme system audit found multiple parallel layers in themes.css
(--theme-X full hsl strings, --X partial shadcn aliases, --color-X populated
by runtime store with raw channels) plus dead-code companion files. The
inconsistency caused light-mode regressions when scoped-CSS consumers
wrote `var(--color-X)` standalone — the variable holds raw HSL channels
which is invalid as a color value, browser fell back to inherited (white).
Rewrite to one consistent layer:
- Source of truth: --color-X defined as raw HSL channels (e.g.
`0 0% 17%`) in :root, .dark, and all variant [data-theme="..."]
blocks. Matches the format the runtime store
(@mana/shared-theme/src/utils.ts) writes, eliminating the
static-fallback-vs-runtime mismatch and the corresponding flash
of unstyled content on hydration.
- @theme inline uses self-reference + Tailwind v4 <alpha-value>
placeholder so utility classes generate correctly AND opacity
modifiers work: `text-foreground/50` → `hsl(var(--color-foreground) / 0.5)`.
- @layer components (.btn-primary, .card, .badge, etc.) wraps
var(--color-X) refs with hsl() — they were broken in light mode
too for the same reason.
Convention going forward (also documented in the file header):
1. Markup: use Tailwind utility classes (text-foreground, bg-card, …)
2. Scoped CSS: hsl(var(--color-X)) — always wrap with hsl()
3. NEVER raw var(--color-X) in CSS — that's the bug pattern
Net file: 692 → 580 LOC. Single source layer, no indirection.
Also delete dead companion files (zero imports anywhere):
- tailwind-v4.css (had broken self-reference, never imported)
- theme-variables.css (legacy hex-based palette)
- components.css (legacy component utilities)
- index.js / preset.js / colors.js (Tailwind v3 preset format,
irrelevant under Tailwind v4)
package.json exports map shrinks accordingly to just `./themes.css`.
Consumers using `hsl(var(--color-X))` (~379 files across mana-web,
manavoxel-web, arcade-web) keep working unchanged — the public API
name `--color-X` is preserved. Only the broken pattern `var(--color-X)`
(~61 files) needs a follow-up sweep, handled in a separate commit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Picture, Contacts, Planta, Storage, and NutriPhi image uploads now go
through mana-media instead of directly to S3. This enables SHA-256
deduplication, automatic thumbnail generation, EXIF extraction, and
makes all images visible in the Photos gallery. Non-image files (PDFs,
audio, docs) continue to use shared-storage directly. SVG avatars in
Contacts also stay on shared-storage since Sharp can't process SVGs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complete consolidation of all 15 app servers into one Hono/Bun process.
Modules added: chat, context, picture, storage, todo, planta, nutriphi,
guides, moodlit, news, traces, presi
Total: 15 modules, one server, one port (3050), ~2400 LOC.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>