Commit graph

12 commits

Author SHA1 Message Date
Till JS
fea3adf5fe feat(llm-aliases): M5 — migrate consumers to MANA_LLM aliases
Final milestone of docs/plans/llm-fallback-aliases.md. Every backend
caller now requests models via the `mana/<class>` alias system instead
of hardcoded `ollama/...` strings. mana-llm resolves aliases through
`services/mana-llm/aliases.yaml` with health-aware fallback (M3) and
emits resolved-model + fallback metrics (M4).

SSOT moved to `packages/shared-ai/src/llm-aliases.ts` so apps/api,
apps/mana/apps/web, and services/mana-ai all import the same
`MANA_LLM` constant via the existing `@mana/shared-ai` workspace
dependency. Three additional sites (memoro-server, mana-events,
mana-research) inline the alias string with a SSOT comment because
they don't pull @mana/shared-ai today.

Migrated 14 sites across 10 files:
- apps/api: writing(LONG_FORM), comic(STRUCTURED), context(FAST_TEXT),
  food(VISION), plants(VISION), research orchestrator (3 tiers
  collapsed to STRUCTURED+FAST_TEXT/LONG_FORM)
- apps/mana/apps/web: voice/parse-task + parse-habit (STRUCTURED)
- services/mana-ai: planner llm-client + tick.ts (REASONING)
- services/mana-events: website-extractor (STRUCTURED, inlined)
- services/mana-research: mana-llm client (FAST_TEXT, inlined)
- apps/memoro/apps/server: ai.ts (FAST_TEXT, inlined)

Legacy env-vars removed: WRITING_MODEL, COMIC_STORYBOARD_MODEL,
VISION_MODEL, MANA_LLM_DEFAULT_MODEL. The chain in aliases.yaml is
now the single tuning surface; SIGHUP reloads it without redeploys.

New `scripts/validate-llm-strings.mjs` regex-scans 2538 files for
hardcoded `<provider>/<model>` strings and fails the build if any
land outside the SSOT or the explicitly-allowed paths (image-gen
modules, model-inspector code, this validator itself, the registry).
Wired into `validate:all` next to the i18n + theme validators.

Verified: `pnpm validate:llm-strings` clean, `pnpm --filter @mana/api
type-check` clean, `pnpm --filter @mana/ai-service type-check`
clean. Web type-check has 2 pre-existing errors in
SettingsSidebar.svelte (i18n MessageFormatter type drift, last
touched in 988c17a67 — unrelated to this work).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 21:26:03 +02:00
Till JS
3a7bc7f1c3 test(mana-research): fixture-based tests for Gemini poll-response parser
Re-commit of c413ab7dd (reverted in c31dcdd66) without the unrelated
files that accidentally got swept into the original stage. Parser
content is identical.

The real Gemini /v1beta/interactions/:id completed shape bit us once
already during the initial smoke-test (we had OpenAI-style nested
`output.message.content[]` coded; reality is a flat `outputs` array
of thought|text|image items, with url_citations that carry no title
and usage fields named `total_input_tokens` rather than `input_tokens`).

This test pins the parser against a synthetic fixture covering the
cases we saw in the wild plus the failure modes that are hard to
provoke from a live API call:

  - status dispatch (queued, in_progress, failed, cancelled, incomplete)
  - completed body concatenated across text items, skipping thought/image
  - empty/missing `outputs` without crashing
  - missing usage
  - citations deduped by url, hostname extracted as title
  - wrong-type annotations and those without url skipped
  - real vertexaisearch redirect URLs Gemini emits
  - fallback to url as title when the URL is unparseable
  - trimming of leading/trailing whitespace

To make this testable I pulled the completed-branch of
pollGeminiDeepResearch into a standalone parseInteractionResponse
helper — same behaviour, now reachable without mocking global fetch.

Also adds the `test` script to package.json so `pnpm --filter
@mana/research-service test` works.

17 pass / 0 fail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 18:44:21 +02:00
Till JS
c31dcdd66c Revert "test(mana-research): fixture-based tests for Gemini poll-response parser"
This reverts commit c413ab7dd3.
2026-04-22 18:43:48 +02:00
Till JS
c413ab7dd3 test(mana-research): fixture-based tests for Gemini poll-response parser
The real Gemini /v1beta/interactions/:id completed shape bit us once
already during the initial smoke-test (we had OpenAI-style nested
`output.message.content[]` coded; reality is a flat `outputs` array
of thought|text|image items, with url_citations that carry no title
and usage fields named `total_input_tokens` rather than `input_tokens`).

This test pins the parser against a synthetic fixture covering the
cases we saw in the wild plus the failure modes that are hard to
provoke from a live API call:

  - status dispatch (queued, in_progress, failed, cancelled, incomplete)
  - completed body concatenated across text items, skipping thought/image
  - empty/missing `outputs` without crashing
  - missing usage
  - citations deduped by url, hostname extracted as title
  - wrong-type annotations and those without url skipped
  - real vertexaisearch redirect URLs Gemini emits
  - fallback to url as title when the URL is unparseable
  - trimming of leading/trailing whitespace

To make this testable I pulled the completed-branch of
pollGeminiDeepResearch into a standalone parseInteractionResponse
helper — same behaviour, now reachable without mocking global fetch.

Also adds the `test` script to package.json so `pnpm --filter
@mana/research-service test` works.

17 pass / 0 fail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 18:34:33 +02:00
Till JS
f10a95e842 feat(mana-research): add Gemini 3.1 Pro Deep Research async providers
- New providers gemini-deep-research + gemini-deep-research-max on the
  Interactions API (preview-04-2026). Submit/poll split, tier parameter
  selects between standard (~minutes, $1–3) and max (up to 60 min, $3–7).
- Parser matches the real response shape: flat `outputs` array of
  thought|text|image items, url_citation annotations without title,
  `usage.total_input_tokens` / `total_output_tokens`.
- Route generalisation: /v1/research/async accepts `provider` with
  default 'openai-deep-research' (backward compatible) and dispatches
  to the right submit/poll pair.
- New internal service-to-service endpoint /v1/internal/research/async
  gated by X-Service-Key + X-User-Id for credit accounting. Enables
  mana-ai to drive deep-research jobs on the mission owner's wallet
  without requiring a user JWT.
- Pricing: 300 credits (standard) / 1500 credits (max). Conservative
  markup over the ~$3/$7 ceiling so the first runs can't surprise us.
- Docs: AGENT_PROVIDER_IDS + pricing + env map + auto-router stay in
  sync; CLAUDE.md Phase 3b now current; API_KEYS.md references the
  new providers under GOOGLE_GENAI_API_KEY.

Verified with a real smoke test against the Gemini API: submit + poll
both succeed, completed response parsed cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 17:55:30 +02:00
Till JS
8dd3dbc9e5 docs(mana-research): step-by-step API_KEYS.md setup guide
Complete walkthrough per provider — signup URL, free-tier details,
pay-per-use pricing, env-var name, key format — plus sections on where
to paste keys (.env.secrets), BYO-keys vs server-keys, verification
curl commands and troubleshooting (including the cross-service
MANA_SERVICE_KEY mismatch encountered during live testing).

Linked from services/mana-research/CLAUDE.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 18:50:06 +02:00
Till JS
85537cb92a fix(research): default Gemini to 2.5-flash (2.0-flash deprecated for new users)
Google deprecated `gemini-2.0-flash` for new API users — existing
accounts still work, but a freshly-billed key returns 404
"models/gemini-2.0-flash is no longer available to new users". The
working replacement is `gemini-2.5-flash` (same price tier, better
quality, groundingMetadata shape unchanged).

Verified live: the fix produced a real answer with 6 grounding
citations in 2.6s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 17:01:24 +02:00
Till JS
536fc89050 fix(research): Claude Opus 4.7 rejects temperature param + log executor errors
- claude-web-search.ts: only send `temperature` when caller explicitly
  sets one. Opus 4.7 deprecated the param and returns
  400 invalid_request_error "`temperature` is deprecated for this
  model." Sonnet/Haiku still accept it, so keep the opt-in path.
- execute-research.ts: log provider errors via console.warn so future
  integration failures are visible in stdout. Previously the executor
  swallowed the underlying error and only returned a generic errorCode,
  which made diagnosing vendor-specific API changes impossible.

Discovered via smoke-testing with a real Anthropic key — the direct
curl worked, but our provider 400'd because Opus 4.7 tightened the
accepted param set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 16:36:22 +02:00
Till JS
7d120225dc feat(research): Phase 3b openai-deep-research async + BYO-keys CRUD & UI
Two backlog items landed in one commit because an earlier amend in a
parallel terminal dropped the initial Phase 3b commit and the BYO-keys
work was blocked on the same wiring.

openai-deep-research (async):
- New research.async_jobs table persists the OpenAI response.id, query,
  reservation, and cached result/error.
- POST /v1/research/async reserves credits, submits to the Responses API
  with background=true, returns a taskId. Submit failure refunds.
- GET /v1/research/async/:taskId polls upstream, commits the reservation
  on completion, refunds on failure, short-circuits for terminal states.
- GET /v1/research/async lists the user's async tasks.

BYO-keys:
- research.provider_configs CRUD at /v1/provider-configs. Keys are masked
  (••••last4) on read so the raw secret never re-transits to the browser.
  Currently stored plaintext with a TODO for AES-GCM-256 via the shared
  KEK — single call site in storage/configs.ts.decryptKey().
- New frontend route /research-lab/keys lets the user paste a key per
  provider, toggle enabled, and set daily/monthly credit budgets.
- ListView grew a 🔑 link in the header.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:43:12 +02:00
Till JS
8f0a74b2e7 feat(research-lab): tier gate (beta+), 1–5 star ratings, run detail route
- Branding: research-lab registered in @mana/shared-branding with requiredTier: 'beta' + a custom flask-on-purple icon, so guest/public users are filtered out of the workbench picker.
- Backend: compare routes now return resultId alongside each CompareEntry so the frontend can wire ratings to the eval_results rows in research.*.
- Frontend: click-to-rate stars in CompareColumn (persists via POST /v1/runs/:runId/results/:resultId/rate), recent-run list rows are now buttons that navigate to /research-lab/runs/[id], and the detail route reconstructs CompareEntry shapes from eval_results + reuses CompareColumn for a full read-only view of any past run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:28:02 +02:00
Till JS
49f315f6be feat(research): Phase 3a — 4 sync research agents
Adds Perplexity Sonar, Claude web_search, OpenAI Responses, and Gemini
Grounding as ResearchAgents behind the same comparison interface as the
search and extract providers.

New endpoints:
  POST /v1/research          — single-agent (or auto-routed to the first
                               provider with a configured key)
  POST /v1/research/compare  — fan-out across N agents, persist all
                               answers + citations in research.eval_*

Each agent normalizes its native response into a common AgentAnswer shape
(answer text + citations[] + tokenUsage), storing the provider's raw
response alongside for later inspection. Implementations use direct HTTP
against each vendor's public API — no SDK deps added.

Auto-routing preference: perplexity-sonar → gemini-grounding →
openai-responses → claude-web-search → (openai-deep-research stubbed for
Phase 3b). Credits orchestration reuses the search/extract executor
pattern (reserve → call → commit/refund).

Deferred to Phase 3b: openai-deep-research (async job queue), migration
of mana-ai + mana-api news-research to call this service directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:06:12 +02:00
Till JS
2bdb48bdd1 feat(research): add mana-research service — Phase 1 + 2
New Bun/Hono service on port 3068 that bundles many web-research providers
behind a unified interface for side-by-side comparison. All eval runs
persist in research.* (mana_platform) so quality can be reviewed later.

Providers (Phase 1+2):
  search:  searxng, duckduckgo, brave, tavily, exa, serper
  extract: readability (via mana-search), jina-reader, firecrawl

Endpoints:
  POST /v1/search, /v1/search/compare       — single + fan-out
  POST /v1/extract, /v1/extract/compare     — single + fan-out
  GET  /v1/runs, /v1/runs/:id               — history
  POST /v1/runs/:run/results/:id/rate       — manual eval
  GET  /v1/providers, /v1/providers/health  — catalog + readiness

Auto-routing: when `provider` is omitted, queries are classified via regex
(fast path, 0ms) with optional mana-llm fallback, then routed to the first
available provider for that query type (news → tavily, academic → exa,
semantic → exa, etc.).

Credits: server-key calls go through mana-credits reserve → commit/refund
so failed provider calls don't charge the user. BYO-keys supported via
research.provider_configs (UI arrives in Phase 4).

Cache: Redis with graceful degradation (1h TTL for search, 24h for
extract). Pay-per-use APIs only — no subscription-gated providers.

Docs: docs/plans/mana-research-service.md + docs/reports/web-research-capabilities.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:42:25 +02:00