managarten

mirror of https://github.com/Memo-2023/mana-monorepo.git synced 2026-05-14 22:41:09 +02:00

Author	SHA1	Message	Date
Till JS	fea3adf5fe	feat(llm-aliases): M5 — migrate consumers to MANA_LLM aliases Final milestone of docs/plans/llm-fallback-aliases.md. Every backend caller now requests models via the `mana/<class>` alias system instead of hardcoded `ollama/...` strings. mana-llm resolves aliases through `services/mana-llm/aliases.yaml` with health-aware fallback (M3) and emits resolved-model + fallback metrics (M4). SSOT moved to `packages/shared-ai/src/llm-aliases.ts` so apps/api, apps/mana/apps/web, and services/mana-ai all import the same `MANA_LLM` constant via the existing `@mana/shared-ai` workspace dependency. Three additional sites (memoro-server, mana-events, mana-research) inline the alias string with a SSOT comment because they don't pull @mana/shared-ai today. Migrated 14 sites across 10 files: - apps/api: writing(LONG_FORM), comic(STRUCTURED), context(FAST_TEXT), food(VISION), plants(VISION), research orchestrator (3 tiers collapsed to STRUCTURED+FAST_TEXT/LONG_FORM) - apps/mana/apps/web: voice/parse-task + parse-habit (STRUCTURED) - services/mana-ai: planner llm-client + tick.ts (REASONING) - services/mana-events: website-extractor (STRUCTURED, inlined) - services/mana-research: mana-llm client (FAST_TEXT, inlined) - apps/memoro/apps/server: ai.ts (FAST_TEXT, inlined) Legacy env-vars removed: WRITING_MODEL, COMIC_STORYBOARD_MODEL, VISION_MODEL, MANA_LLM_DEFAULT_MODEL. The chain in aliases.yaml is now the single tuning surface; SIGHUP reloads it without redeploys. New `scripts/validate-llm-strings.mjs` regex-scans 2538 files for hardcoded `<provider>/<model>` strings and fails the build if any land outside the SSOT or the explicitly-allowed paths (image-gen modules, model-inspector code, this validator itself, the registry). Wired into `validate:all` next to the i18n + theme validators. Verified: `pnpm validate:llm-strings` clean, `pnpm --filter @mana/api type-check` clean, `pnpm --filter @mana/ai-service type-check` clean. Web type-check has 2 pre-existing errors in SettingsSidebar.svelte (i18n MessageFormatter type drift, last touched in `988c17a67` — unrelated to this work). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 21:26:03 +02:00
Till JS	3a7bc7f1c3	test(mana-research): fixture-based tests for Gemini poll-response parser Re-commit of `c413ab7dd` (reverted in `c31dcdd66`) without the unrelated files that accidentally got swept into the original stage. Parser content is identical. The real Gemini /v1beta/interactions/:id completed shape bit us once already during the initial smoke-test (we had OpenAI-style nested `output.message.content[]` coded; reality is a flat `outputs` array of thought|text|image items, with url_citations that carry no title and usage fields named `total_input_tokens` rather than `input_tokens`). This test pins the parser against a synthetic fixture covering the cases we saw in the wild plus the failure modes that are hard to provoke from a live API call: - status dispatch (queued, in_progress, failed, cancelled, incomplete) - completed body concatenated across text items, skipping thought/image - empty/missing `outputs` without crashing - missing usage - citations deduped by url, hostname extracted as title - wrong-type annotations and those without url skipped - real vertexaisearch redirect URLs Gemini emits - fallback to url as title when the URL is unparseable - trimming of leading/trailing whitespace To make this testable I pulled the completed-branch of pollGeminiDeepResearch into a standalone parseInteractionResponse helper — same behaviour, now reachable without mocking global fetch. Also adds the `test` script to package.json so `pnpm --filter @mana/research-service test` works. 17 pass / 0 fail. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 18:44:21 +02:00
Till JS	c31dcdd66c	Revert "test(mana-research): fixture-based tests for Gemini poll-response parser" This reverts commit `c413ab7dd3`.	2026-04-22 18:43:48 +02:00
Till JS	c413ab7dd3	test(mana-research): fixture-based tests for Gemini poll-response parser The real Gemini /v1beta/interactions/:id completed shape bit us once already during the initial smoke-test (we had OpenAI-style nested `output.message.content[]` coded; reality is a flat `outputs` array of thought|text|image items, with url_citations that carry no title and usage fields named `total_input_tokens` rather than `input_tokens`). This test pins the parser against a synthetic fixture covering the cases we saw in the wild plus the failure modes that are hard to provoke from a live API call: - status dispatch (queued, in_progress, failed, cancelled, incomplete) - completed body concatenated across text items, skipping thought/image - empty/missing `outputs` without crashing - missing usage - citations deduped by url, hostname extracted as title - wrong-type annotations and those without url skipped - real vertexaisearch redirect URLs Gemini emits - fallback to url as title when the URL is unparseable - trimming of leading/trailing whitespace To make this testable I pulled the completed-branch of pollGeminiDeepResearch into a standalone parseInteractionResponse helper — same behaviour, now reachable without mocking global fetch. Also adds the `test` script to package.json so `pnpm --filter @mana/research-service test` works. 17 pass / 0 fail. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 18:34:33 +02:00
Till JS	f10a95e842	feat(mana-research): add Gemini 3.1 Pro Deep Research async providers - New providers gemini-deep-research + gemini-deep-research-max on the Interactions API (preview-04-2026). Submit/poll split, tier parameter selects between standard (~minutes, $1–3) and max (up to 60 min, $3–7). - Parser matches the real response shape: flat `outputs` array of thought|text|image items, url_citation annotations without title, `usage.total_input_tokens` / `total_output_tokens`. - Route generalisation: /v1/research/async accepts `provider` with default 'openai-deep-research' (backward compatible) and dispatches to the right submit/poll pair. - New internal service-to-service endpoint /v1/internal/research/async gated by X-Service-Key + X-User-Id for credit accounting. Enables mana-ai to drive deep-research jobs on the mission owner's wallet without requiring a user JWT. - Pricing: 300 credits (standard) / 1500 credits (max). Conservative markup over the ~$3/$7 ceiling so the first runs can't surprise us. - Docs: AGENT_PROVIDER_IDS + pricing + env map + auto-router stay in sync; CLAUDE.md Phase 3b now current; API_KEYS.md references the new providers under GOOGLE_GENAI_API_KEY. Verified with a real smoke test against the Gemini API: submit + poll both succeed, completed response parsed cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-22 17:55:30 +02:00
Till JS	8dd3dbc9e5	docs(mana-research): step-by-step API_KEYS.md setup guide Complete walkthrough per provider — signup URL, free-tier details, pay-per-use pricing, env-var name, key format — plus sections on where to paste keys (.env.secrets), BYO-keys vs server-keys, verification curl commands and troubleshooting (including the cross-service MANA_SERVICE_KEY mismatch encountered during live testing). Linked from services/mana-research/CLAUDE.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 18:50:06 +02:00
Till JS	85537cb92a	fix(research): default Gemini to 2.5-flash (2.0-flash deprecated for new users) Google deprecated `gemini-2.0-flash` for new API users — existing accounts still work, but a freshly-billed key returns 404 "models/gemini-2.0-flash is no longer available to new users". The working replacement is `gemini-2.5-flash` (same price tier, better quality, groundingMetadata shape unchanged). Verified live: the fix produced a real answer with 6 grounding citations in 2.6s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 17:01:24 +02:00
Till JS	536fc89050	fix(research): Claude Opus 4.7 rejects `temperature` param + log executor errors - claude-web-search.ts: only send `temperature` when caller explicitly sets one. Opus 4.7 deprecated the param and returns 400 invalid_request_error "`temperature` is deprecated for this model." Sonnet/Haiku still accept it, so keep the opt-in path. - execute-research.ts: log provider errors via console.warn so future integration failures are visible in stdout. Previously the executor swallowed the underlying error and only returned a generic errorCode, which made diagnosing vendor-specific API changes impossible. Discovered via smoke-testing with a real Anthropic key — the direct curl worked, but our provider 400'd because Opus 4.7 tightened the accepted param set. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 16:36:22 +02:00
Till JS	7d120225dc	feat(research): Phase 3b openai-deep-research async + BYO-keys CRUD & UI Two backlog items landed in one commit because an earlier amend in a parallel terminal dropped the initial Phase 3b commit and the BYO-keys work was blocked on the same wiring. openai-deep-research (async): - New research.async_jobs table persists the OpenAI response.id, query, reservation, and cached result/error. - POST /v1/research/async reserves credits, submits to the Responses API with background=true, returns a taskId. Submit failure refunds. - GET /v1/research/async/:taskId polls upstream, commits the reservation on completion, refunds on failure, short-circuits for terminal states. - GET /v1/research/async lists the user's async tasks. BYO-keys: - research.provider_configs CRUD at /v1/provider-configs. Keys are masked (••••last4) on read so the raw secret never re-transits to the browser. Currently stored plaintext with a TODO for AES-GCM-256 via the shared KEK — single call site in storage/configs.ts.decryptKey(). - New frontend route /research-lab/keys lets the user paste a key per provider, toggle enabled, and set daily/monthly credit budgets. - ListView grew a 🔑 link in the header. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 15:43:12 +02:00
Till JS	8f0a74b2e7	feat(research-lab): tier gate (beta+), 1–5 star ratings, run detail route - Branding: research-lab registered in @mana/shared-branding with requiredTier: 'beta' + a custom flask-on-purple icon, so guest/public users are filtered out of the workbench picker. - Backend: compare routes now return resultId alongside each CompareEntry so the frontend can wire ratings to the eval_results rows in research.*. - Frontend: click-to-rate stars in CompareColumn (persists via POST /v1/runs/:runId/results/:resultId/rate), recent-run list rows are now buttons that navigate to /research-lab/runs/[id], and the detail route reconstructs CompareEntry shapes from eval_results + reuses CompareColumn for a full read-only view of any past run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 15:28:02 +02:00
Till JS	49f315f6be	feat(research): Phase 3a — 4 sync research agents Adds Perplexity Sonar, Claude web_search, OpenAI Responses, and Gemini Grounding as ResearchAgents behind the same comparison interface as the search and extract providers. New endpoints: POST /v1/research — single-agent (or auto-routed to the first provider with a configured key) POST /v1/research/compare — fan-out across N agents, persist all answers + citations in research.eval_* Each agent normalizes its native response into a common AgentAnswer shape (answer text + citations[] + tokenUsage), storing the provider's raw response alongside for later inspection. Implementations use direct HTTP against each vendor's public API — no SDK deps added. Auto-routing preference: perplexity-sonar → gemini-grounding → openai-responses → claude-web-search → (openai-deep-research stubbed for Phase 3b). Credits orchestration reuses the search/extract executor pattern (reserve → call → commit/refund). Deferred to Phase 3b: openai-deep-research (async job queue), migration of mana-ai + mana-api news-research to call this service directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 15:06:12 +02:00
Till JS	2bdb48bdd1	feat(research): add mana-research service — Phase 1 + 2 New Bun/Hono service on port 3068 that bundles many web-research providers behind a unified interface for side-by-side comparison. All eval runs persist in research.* (mana_platform) so quality can be reviewed later. Providers (Phase 1+2): search: searxng, duckduckgo, brave, tavily, exa, serper extract: readability (via mana-search), jina-reader, firecrawl Endpoints: POST /v1/search, /v1/search/compare — single + fan-out POST /v1/extract, /v1/extract/compare — single + fan-out GET /v1/runs, /v1/runs/:id — history POST /v1/runs/:run/results/:id/rate — manual eval GET /v1/providers, /v1/providers/health — catalog + readiness Auto-routing: when `provider` is omitted, queries are classified via regex (fast path, 0ms) with optional mana-llm fallback, then routed to the first available provider for that query type (news → tavily, academic → exa, semantic → exa, etc.). Credits: server-key calls go through mana-credits reserve → commit/refund so failed provider calls don't charge the user. BYO-keys supported via research.provider_configs (UI arrives in Phase 4). Cache: Redis with graceful degradation (1h TTL for search, 24h for extract). Pay-per-use APIs only — no subscription-gated providers. Docs: docs/plans/mana-research-service.md + docs/reports/web-research-capabilities.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 14:42:25 +02:00
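
Commit fea3adf5fe describes `mana/<class>` aliases that resolve through a chain with health-aware fallback. The log does not show the `aliases.yaml` schema or the resolver itself, so the following is only a minimal sketch of that idea. The `AliasTable` shape, the `resolveAlias` name, and the `isHealthy` callback are all hypothetical.

```typescript
// Hypothetical alias table: alias name -> ordered fallback chain of models.
// The real aliases.yaml schema is not shown in the commit log.
type AliasTable = Record<string, string[]>;

// Which model was picked, and whether it was not the first in the chain.
type Resolution = { model: string; usedFallback: boolean };

// Walk the chain in order and return the first model reported healthy.
function resolveAlias(
  aliases: AliasTable,
  alias: string,
  isHealthy: (model: string) => boolean, // assumed health probe
): Resolution {
  const chain = aliases[alias];
  if (!chain || chain.length === 0) {
    throw new Error(`unknown alias: ${alias}`);
  }
  for (let i = 0; i < chain.length; i++) {
    const model = chain[i] as string;
    if (isHealthy(model)) {
      return { model, usedFallback: i > 0 };
    }
  }
  throw new Error(`no healthy model for alias: ${alias}`);
}
```

Returning `usedFallback` alongside the model is one way a caller could feed the resolved-model and fallback metrics the commit mentions. How mana-llm actually records them is not stated in the log.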

12 commits
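
The parser behaviour pinned by commits 3a7bc7f1c3 and f10a95e842 (flat `outputs` array, text items concatenated, citations deduped by URL with the hostname as title) can be sketched roughly as below. This is not the repository's `parseInteractionResponse`. The type names, the result shape, the newline joiner, and the mapping of statuses to `pending`/`failed` are assumptions. Only the field names `outputs`, `url_citation`, `total_input_tokens`, and `total_output_tokens` come from the commit messages.

```typescript
// Assumed shapes; only the field names quoted in the commit messages are known.
type OutputItem = {
  type: string; // "thought" | "text" | "image"
  text?: string;
  annotations?: Array<{ type?: string; url?: string }>;
};
type Interaction = {
  status: string;
  outputs?: OutputItem[];
  usage?: { total_input_tokens?: number; total_output_tokens?: number };
};
type Citation = { url: string; title: string };
type Parsed =
  | { state: "pending" }
  | { state: "failed"; status: string }
  | { state: "done"; answer: string; citations: Citation[]; inputTokens?: number; outputTokens?: number };

// Hostname as title; fall back to the raw string when the URL cannot be parsed.
function titleFor(url: string): string {
  try {
    return new URL(url).hostname;
  } catch (e) {
    return url;
  }
}

function parseInteractionResponse(body: Interaction): Parsed {
  // Assumed dispatch: queued/in_progress keep polling, anything but completed is terminal.
  if (body.status === "queued" || body.status === "in_progress") return { state: "pending" };
  if (body.status !== "completed") return { state: "failed", status: body.status };

  const outputs = body.outputs ?? []; // tolerate empty or missing outputs
  const parts: string[] = [];
  const seen: Record<string, boolean> = {};
  const citations: Citation[] = [];

  for (const item of outputs) {
    // Only text items contribute to the answer; thought and image items are skipped.
    if (item.type === "text" && typeof item.text === "string") parts.push(item.text);
    for (const a of item.annotations ?? []) {
      // Skip wrong-type annotations and those without a url.
      if (a.type !== "url_citation" || typeof a.url !== "string" || a.url === "") continue;
      if (seen[a.url]) continue; // dedupe by url
      seen[a.url] = true;
      citations.push({ url: a.url, title: titleFor(a.url) });
    }
  }

  return {
    state: "done",
    answer: parts.join("\n").trim(), // joiner is an assumption
    citations,
    inputTokens: body.usage?.total_input_tokens, // undefined when usage is missing
    outputTokens: body.usage?.total_output_tokens,
  };
}
```

Keeping the function free of `fetch` matches the motivation given in the commit message: a fixture object can be passed in directly, so no network mocking is needed.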