Commit graph

4 commits

Till JS
e77ae5d5eb feat(who): add character dossier system for staged fact disclosure
Pre-researched dossiers (37 JSON files, DE+EN) replace the old
personality strings as the source of truth for the Who guessing game.
A strong cloud LLM (Gemini 2.5 Flash) generates structured facts per
character — voice, values, achievements, anecdotes, relationships,
forbidden-early-words, and three-stage hints — so the small runtime
model (gemma3:4b) gets only what it needs per turn instead of raw
personality text that leaks the identity immediately.

- dossier-types.ts: Zod schema + TS types for CharacterDossier
- dossier-loader.ts: boot-time loader with validation + coverage report
- generate-who-dossiers.ts: one-shot generator script (Google Gemini
  or local mana-llm fallback, idempotent, --force/--id flags)
- 37 dossier JSON files in data/dossiers/
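
A dependency-free sketch of the loader's coverage check (the real dossier-types.ts validates with a Zod schema; the field names below are guesses from the fact categories listed above, not the actual schema):

```typescript
// Minimal dossier shape, guessed from the fact categories the generator
// produces; the real CharacterDossier (Zod) schema is richer.
type DossierSketch = {
  id: string;
  lang: "de" | "en";
  hints: [string, string, string]; // three disclosure stages
};

// Boot-time coverage report: which character ids still lack a dossier?
function missingDossiers(
  characterIds: string[],
  dossiers: DossierSketch[],
): string[] {
  const have = new Set(dossiers.map((d) => d.id));
  return characterIds.filter((id) => !have.has(id));
}
```

A loader like this can log the returned ids at boot so a missing dossier is caught before the first game starts, rather than at runtime.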

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 17:40:16 +02:00
Till JS
9d1b25130d fix(api/who): server-side validation of [IDENTITY_REVEALED] sentinel
The user asked "bist du kopernikus?" ("are you Copernicus?") while
playing Galileo. The LLM correctly responded "Kopernikus? ... aber
nicht meiner!" ("Copernicus? ... but not mine!") — and then appended
[IDENTITY_REVEALED] anyway. The game flipped to "won in 2 messages"
and revealed Galileo's name, even though the guess was wrong.

This is gemma3:4b being lazy about the sentinel rule: any time the
user says "bist du <name>?" ("are you <name>?"), the model is biased
toward emitting the sentinel because the prompt mentions "errät den
Namen" ("guesses the name"). Weaker LLMs in general struggle to
follow strict negative instructions when the trigger word is right
there in the input.

Fix in three layers:

1. Server-side validation (the real safety net). When the LLM
   emits [IDENTITY_REVEALED], independently verify that the user's
   CURRENT message contains the canonical character name (or one
   of its significant parts) using the same matchesName helper
   the explicit /guess endpoint uses. If the LLM emitted but the
   user didn't actually name this character, strip the sentinel,
   log a who.sentinel_false_positive, and treat the reply as a
   normal turn. The legit cases — user actually said the right
   name — still flow through cleanly.

2. matchesName improvements. The previous logic only matched a
   single-word guess against name parts; "bist du leonardo?" ("are
   you Leonardo?") would fall through and miss a real win. Rewritten
   to:
     a) exact normalized match
     b) guess contains the full name as substring
     c) guess contains any significant name part as a WHOLE WORD
   Plus a Set for the guessWords lookup so it's O(1) per part.

3. Tighter system prompt. Added an explicit "Sentinel-Regel"
   (sentinel rule) section with two FALSCH (wrong) examples ("bist
   du Tesla?" while playing Edison, "bist du ein Erfinder?" / "are
   you an inventor?") and two KORREKT (correct) examples. This
   doesn't fix the false-positive rate at the model level, but it
   reduces it.
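
The rewritten matcher in layer 2 might look like this (a sketch: the normalization and the length cutoff for "significant" name parts are assumptions, the real helper may differ):

```typescript
// Sketch of the rewritten matcher. Diacritic stripping via NFD
// decomposition; the >= 3 cutoff for "significant" parts is an assumption.
function normalize(s: string): string {
  return s
    .toLowerCase()
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, ""); // strip diacritics: "é" -> "e"
}

function matchesName(guess: string, canonicalName: string): boolean {
  const g = normalize(guess);
  const name = normalize(canonicalName);
  if (g === name) return true;       // a) exact normalized match
  if (g.includes(name)) return true; // b) guess contains the full name
  const guessWords = new Set(g.split(/[^a-z0-9]+/)); // O(1) lookup per part
  return name
    .split(/\s+/)
    .filter((part) => part.length >= 3)    // skip particles like "da"
    .some((part) => guessWords.has(part)); // c) part as a WHOLE word
}
```

The whole-word check in (c) is what keeps "curiosity" from matching "Curie" while still letting "bist du leonardo?" win.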

Layer 1 is the load-bearing one — even if the LLM emits the
sentinel for the wrong reason, the server gates the reveal on
ground truth.
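
That ground-truth gate can be sketched as follows (sentinel handling only; the name check itself is the matchesName helper, abstracted here as a boolean, and the function name is hypothetical):

```typescript
// Sketch of layer 1. The real handler computes `userNamedCharacter` by
// running matchesName(userMessage, canonicalName); names here are guesses.
const SENTINEL = "[IDENTITY_REVEALED]";

function gateReveal(
  reply: string,
  userNamedCharacter: boolean,
): { reply: string; identityRevealed: boolean } {
  if (!reply.includes(SENTINEL)) {
    return { reply, identityRevealed: false };
  }
  // Strip the sentinel from the text shown to the player either way.
  const cleaned = reply.split(SENTINEL).join("").trim();
  if (userNamedCharacter) {
    return { reply: cleaned, identityRevealed: true };
  }
  // LLM emitted the sentinel but the user never named this character:
  // log who.sentinel_false_positive and treat the reply as a normal turn.
  return { reply: cleaned, identityRevealed: false };
}
```
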

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 17:21:14 +02:00
Till JS
51f408755c fix(api/who): use /v1/chat/completions path for mana-llm
The who module's chat endpoint was returning 502 to the browser
because mana-api called /api/v1/chat/completions on mana-llm and
got 404 — mana-llm exposes the OpenAI-compatible /v1/chat/completions
path with no /api/ prefix.

This is the same bug the research module had until commit 63a91e36a
fixed its path. The chat module (apps/api/src/modules/chat/routes.ts)
still has the wrong path — flagged as a follow-up.

Diagnostic from inside the mana-api container:
  /v1/chat/completions       → 422 (right path, empty body)
  /api/v1/chat/completions   → 404 (wrong path)

mana-api log line that flagged it:
  who.llm_non_200 status:404
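
The fix itself is just the base path. A sketch of building the URL (the base address is a placeholder, not the real service config):

```typescript
// Hypothetical base address; the real value comes from mana-api's config.
const MANA_LLM_BASE = "http://mana-llm:8000";

// mana-llm serves the OpenAI-compatible surface at /v1/*, no /api prefix.
const chatCompletionsUrl = new URL(
  "/v1/chat/completions",
  MANA_LLM_BASE,
).toString();
// The old code effectively built ".../api/v1/chat/completions", which 404s.
```
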

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:48:09 +02:00
Till JS
74b5808496 feat(api): who module — LLM character-guessing endpoint cluster
Server side of the who module. Four endpoints under /api/v1/who/*:

  POST /chat
    Hot path. Body: { gameId, characterId, message, history[] }.
    Looks up character by id (server-side only — clients never see
    personalities), builds a system prompt instructing the LLM to
    roleplay the figure WITHOUT revealing its name and to append
    [IDENTITY_REVEALED] when the player has guessed correctly,
    forwards to mana-llm. Response: { reply, identityRevealed,
    characterName? } — characterName only present on win.

    Same credit pattern as the chat module: validateCredits + consume
    after the LLM call succeeds. Operation 'AI_WHO': 0.1 credits for
    local models, 5 credits for cloud models.

  POST /random
    Picks a random character from a deck and returns just the id +
    category + difficulty. Frontend uses this to start a new game
    without ever knowing the personality pool. Server-side
    randomness so a determined attacker can't predict picks.

  POST /guess
    Explicit "I think it's X" submission. Fallback path for when
    the LLM forgets to emit the sentinel even though the player
    clearly said the right name. Deterministic lowercase substring
    match against the canonical name (with diacritic stripping +
    last-name-only matching for unambiguous figures like "Tesla").

  GET /decks
    Public deck catalogue with counts and category labels. Zero
    sensitive data — never leaks names or personalities. Used by
    the picker UI on mount.
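
The /chat wire shapes described above, sketched as TypeScript types (field names follow the commit text; the actual route schema may differ):

```typescript
interface WhoChatRequest {
  gameId: string;
  characterId: string;
  message: string;
  history: { role: "user" | "assistant"; content: string }[];
}

interface WhoChatResponse {
  reply: string;
  identityRevealed: boolean;
  characterName?: string; // only present on a win
}

// Hypothetical helper illustrating the "characterName only on win" rule:
// the losing branch omits the key entirely instead of sending undefined.
function toChatResponse(
  reply: string,
  identityRevealed: boolean,
  characterName: string,
): WhoChatResponse {
  return identityRevealed
    ? { reply, identityRevealed, characterName }
    : { reply, identityRevealed };
}
```
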

data/characters.ts holds 37 characters: the original 26 from
whopixels verbatim + 11 new for the antiquity / women / inventors
decks. Each entry is in one or more decks via a `decks` array, so
e.g. Marie Curie shows up in both `historical` and `women`. Adding
a new character is one entry.
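
A sketch of what such an entry and the deck lookup look like (every field other than `decks` is an assumption, and personalities are deliberately omitted since they never leave the server):

```typescript
// Field names besides `decks` are guesses at the real characters.ts shape.
interface WhoCharacter {
  id: string;
  name: string;
  decks: string[]; // one or more decks per character
}

const characters: WhoCharacter[] = [
  { id: "marie-curie", name: "Marie Curie", decks: ["historical", "women"] },
  { id: "nikola-tesla", name: "Nikola Tesla", decks: ["inventors"] }, // hypothetical
];

// One membership array instead of duplicated per-deck lists.
const inDeck = (deck: string): WhoCharacter[] =>
  characters.filter((c) => c.decks.includes(deck));
```
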

The system prompt is the carefully tested German prompt from the
original whopixels server.js — it tells the LLM to respond in the
language the user writes in, give subtle hints, never directly say
"I am X", and emit the sentinel only on a correct guess.

The explicit-guess matcher catches three patterns:
  1. Exact normalized match ("Marie Curie" === "marie curie")
  2. Last-name-only ("Curie" matches "Marie Curie")
  3. Guess-contains-name ("I think it's Marie Curie" → contains)

Closes Phase A.1 of docs/WHO_MODULE.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 13:09:46 +02:00