Commit graph

5 commits

Author SHA1 Message Date
Till JS
536fc89050 fix(research): Claude Opus 4.7 rejects temperature param + log executor errors
- claude-web-search.ts: only send `temperature` when caller explicitly
  sets one. Opus 4.7 deprecated the param and returns
  400 invalid_request_error "`temperature` is deprecated for this
  model." Sonnet/Haiku still accept it, so keep the opt-in path.
- execute-research.ts: log provider errors via console.warn so future
  integration failures are visible in stdout. Previously the executor
  swallowed the underlying error and only returned a generic errorCode,
  which made diagnosing vendor-specific API changes impossible.

Discovered via smoke-testing with a real Anthropic key — the direct
curl worked, but our provider 400'd because Opus 4.7 tightened the
accepted param set.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 16:36:22 +02:00
Till JS
7d120225dc feat(research): Phase 3b openai-deep-research async + BYO-keys CRUD & UI
Two backlog items landed in one commit because an earlier amend in a
parallel terminal dropped the initial Phase 3b commit and the BYO-keys
work was blocked on the same wiring.

openai-deep-research (async):
- New research.async_jobs table persists the OpenAI response.id, query,
  reservation, and cached result/error.
- POST /v1/research/async reserves credits, submits to the Responses API
  with background=true, returns a taskId. Submit failure refunds.
- GET /v1/research/async/:taskId polls upstream, commits the reservation
  on completion, refunds on failure, short-circuits for terminal states.
- GET /v1/research/async lists the user's async tasks.

BYO-keys:
- research.provider_configs CRUD at /v1/provider-configs. Keys are masked
  (••••last4) on read so the raw secret never re-transits to the browser.
  Currently stored plaintext with a TODO for AES-GCM-256 via the shared
  KEK — single call site in storage/configs.ts.decryptKey().
- New frontend route /research-lab/keys lets the user paste a key per
  provider, toggle enabled, and set daily/monthly credit budgets.
- ListView grew a 🔑 link in the header.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:43:12 +02:00
Till JS
8f0a74b2e7 feat(research-lab): tier gate (beta+), 1–5 star ratings, run detail route
- Branding: research-lab registered in @mana/shared-branding with requiredTier: 'beta' + a custom flask-on-purple icon, so guest/public users are filtered out of the workbench picker.
- Backend: compare routes now return resultId alongside each CompareEntry so the frontend can wire ratings to the eval_results rows in research.*.
- Frontend: click-to-rate stars in CompareColumn (persists via POST /v1/runs/:runId/results/:resultId/rate), recent-run list rows are now buttons that navigate to /research-lab/runs/[id], and the detail route reconstructs CompareEntry shapes from eval_results + reuses CompareColumn for a full read-only view of any past run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:28:02 +02:00
Till JS
49f315f6be feat(research): Phase 3a — 4 sync research agents
Adds Perplexity Sonar, Claude web_search, OpenAI Responses, and Gemini
Grounding as ResearchAgents behind the same comparison interface as the
search and extract providers.

New endpoints:
  POST /v1/research          — single-agent (or auto-routed to the first
                               provider with a configured key)
  POST /v1/research/compare  — fan-out across N agents, persist all
                               answers + citations in research.eval_*

Each agent normalizes its native response into a common AgentAnswer shape
(answer text + citations[] + tokenUsage), storing the provider's raw
response alongside for later inspection. Implementations use direct HTTP
against each vendor's public API — no SDK deps added.

Auto-routing preference: perplexity-sonar → gemini-grounding →
openai-responses → claude-web-search → (openai-deep-research stubbed for
Phase 3b). Credits orchestration reuses the search/extract executor
pattern (reserve → call → commit/refund).

Deferred to Phase 3b: openai-deep-research (async job queue), migration
of mana-ai + mana-api news-research to call this service directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 15:06:12 +02:00
Till JS
2bdb48bdd1 feat(research): add mana-research service — Phase 1 + 2
New Bun/Hono service on port 3068 that bundles many web-research providers
behind a unified interface for side-by-side comparison. All eval runs
persist in research.* (mana_platform) so quality can be reviewed later.

Providers (Phase 1+2):
  search:  searxng, duckduckgo, brave, tavily, exa, serper
  extract: readability (via mana-search), jina-reader, firecrawl

Endpoints:
  POST /v1/search, /v1/search/compare       — single + fan-out
  POST /v1/extract, /v1/extract/compare     — single + fan-out
  GET  /v1/runs, /v1/runs/:id               — history
  POST /v1/runs/:run/results/:id/rate       — manual eval
  GET  /v1/providers, /v1/providers/health  — catalog + readiness

Auto-routing: when `provider` is omitted, queries are classified via regex
(fast path, 0ms) with optional mana-llm fallback, then routed to the first
available provider for that query type (news → tavily, academic → exa,
semantic → exa, etc.).

Credits: server-key calls go through mana-credits reserve → commit/refund
so failed provider calls don't charge the user. BYO-keys supported via
research.provider_configs (UI arrives in Phase 4).

Cache: Redis with graceful degradation (1h TTL for search, 24h for
extract). Pay-per-use APIs only — no subscription-gated providers.

Docs: docs/plans/mana-research-service.md + docs/reports/web-research-capabilities.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 14:42:25 +02:00