managarten/apps/api
Till JS 9d1b25130d fix(api/who): server-side validation of [IDENTITY_REVEALED] sentinel
The user asked "bist du kopernikus?" ("are you Copernicus?") while
the LLM was playing Galileo. The LLM correctly responded
"Kopernikus? ... aber nicht meiner!" ("... but not mine!"), and then
appended [IDENTITY_REVEALED] anyway. The game flipped to "won in 2
messages" with Galileo's name revealed, even though the guess was
wrong.

This is gemma3:4b being lazy about the sentinel rule: any time the
user says "bist du <name>?", the model is biased toward emitting
the sentinel because the prompt mentions "errät den Namen". Weaker
LLMs in general struggle to follow strict negative instructions
when the trigger word is right there in the input.

Fix in three layers:

1. Server-side validation (the real safety net). When the LLM
   emits [IDENTITY_REVEALED], independently verify that the user's
   CURRENT message contains the canonical character name (or one
   of its significant parts) using the same matchesName helper
   the explicit /guess endpoint uses. If the LLM emitted the sentinel
   but the user didn't actually name this character, strip it,
   log a who.sentinel_false_positive, and treat the reply as a
   normal turn. The legit cases — user actually said the right
   name — still flow through cleanly.

2. matchesName improvements. The previous logic only matched a
   single-word guess against name parts; "bist du leonardo?" would
   fall through and miss a real win. Rewritten to:
     a) exact normalized match
     b) guess contains the full name as substring
     c) guess contains any significant name part as a WHOLE WORD
   Plus a Set for the guessWords lookup so it's O(1) per part.

3. Tighter system prompt. Added explicit "Sentinel-Regel" section
   with two FALSCH examples ("bist du Tesla?" while playing Edison,
   "bist du ein Erfinder?") and two KORREKT examples. This doesn't
   eliminate false positives at the model level, but it reduces them.
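
The three matching rules in layer 2 can be sketched roughly as below. This is an illustration only, not the helper from src: the name matchesNameSketch, the normalize function, and the MIN_PART_LENGTH threshold for what counts as a "significant" part are all assumptions.

```typescript
// Hypothetical sketch of the matching rules; the real matchesName may differ.
const MIN_PART_LENGTH = 3; // assumption: shorter parts ("da", "of") are ignored

// Lowercase, strip diacritics, and collapse punctuation to single spaces.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
}

function matchesNameSketch(guess: string, canonicalName: string): boolean {
  const g = normalize(guess);
  const n = normalize(canonicalName);
  if (g === "" || n === "") return false;

  // a) exact normalized match
  if (g === n) return true;

  // b) guess contains the full name as a substring
  if (g.includes(n)) return true;

  // c) guess contains any significant name part as a whole word;
  //    the Set makes each per-part lookup O(1)
  const guessWords = new Set(g.split(" "));
  return n
    .split(" ")
    .filter((part) => part.length >= MIN_PART_LENGTH)
    .some((part) => guessWords.has(part));
}
```

Under these assumptions, "bist du leonardo?" matches "Leonardo da Vinci" through rule (c), while "bist du kopernikus?" does not match "Galileo Galilei", and a guess like "edisonian" does not match "Thomas Edison" because rule (c) compares whole words only.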

Layer 1 is the load-bearing one — even if the LLM emits the
sentinel for the wrong reason, the server gates the reveal on
ground truth.
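
A minimal sketch of the layer 1 gate, under stated assumptions: gateSentinelSketch, GateResult, and NameMatcher are hypothetical names, console.warn stands in for whatever logger the server actually uses, and stripping the sentinel from the displayed reply in the legitimate case is also an assumption. The matcher is passed in so the sketch stays independent of the real matchesName helper.

```typescript
// Hypothetical sketch of the server-side gate; not the code from this commit.
const SENTINEL = "[IDENTITY_REVEALED]";

type NameMatcher = (message: string, canonicalName: string) => boolean;

interface GateResult {
  reply: string;          // reply text with the sentinel removed
  revealed: boolean;      // true only when the server confirms the guess
  falsePositive: boolean; // sentinel emitted without a matching guess
}

function gateSentinelSketch(
  llmReply: string,
  userMessage: string,
  canonicalName: string,
  matches: NameMatcher,
): GateResult {
  const emitted = llmReply.includes(SENTINEL);
  const reply = llmReply.split(SENTINEL).join("").trim();

  // No sentinel: an ordinary turn, nothing to verify.
  if (!emitted) return { reply, revealed: false, falsePositive: false };

  // Sentinel present: trust it only if the user's CURRENT message names
  // the character, checked independently of the LLM.
  const confirmed = matches(userMessage, canonicalName);
  if (!confirmed) {
    // Stand-in for the real structured log event.
    console.warn("who.sentinel_false_positive", { userMessage });
  }
  return { reply, revealed: confirmed, falsePositive: !confirmed };
}
```

With a matcher that rejects "bist du kopernikus?" for Galileo, the reply from the bug report comes back with the sentinel stripped and revealed set to false, so the turn is handled like any other.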

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 17:21:14 +02:00
drizzle/research feat(questions): deep-research module — mana-search + mana-llm pipeline 2026-04-08 22:15:35 +02:00
src fix(api/who): server-side validation of [IDENTITY_REVEALED] sentinel 2026-04-09 17:21:14 +02:00
Dockerfile fix(api/Dockerfile): switch builder stage to node:20-alpine 2026-04-09 14:10:59 +02:00
drizzle.config.ts feat(questions): deep-research module — mana-search + mana-llm pipeline 2026-04-08 22:15:35 +02:00
package.json feat(shared-types): add Zod schemas for AI structured outputs 2026-04-09 16:59:28 +02:00
tsconfig.json feat(api): create unified API server with first 3 modules 2026-04-02 21:12:15 +02:00