Catches the service-level package.json files that the previous
sweep (4cca25ed0) missed — they don't appear in any dev:*:full
orchestrator but get invoked when someone runs `pnpm --filter
@mana/<service> dev` directly.
Touched: mana-geocoding, mana-mail, mana-subscriptions, mana-mcp,
news-ingester, mana-persona-runner, mana-research, mana-user,
plus apps/memoro (server + audio-server).
mana-ai stays on --watch on purpose: its entry uses an explicit
`Bun.serve({...})` call instead of `export default { port,
fetch }`, plus a SIGTERM/SIGINT handler that calls
`server.stop()`. --hot would replace the module without releasing
the old server reference and produce exactly the EADDRINUSE we're
trying to avoid. If mana-ai gets refactored to the standard
default-export shape, flip its dev script too.
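A sketch of the two entry shapes in question (illustrative, not the
real mana entry files):

```typescript
// Shape A: default-export entry. Safe under `bun --hot`, which swaps
// the module while Bun itself keeps ownership of the listening socket.
const defaultExportShape = {
  port: 3068,
  fetch(_req: Request): Response {
    return new Response("ok");
  },
};
export default defaultExportShape;

// Shape B: explicit Bun.serve. The module owns the server reference,
// so `--hot` reloads the module without calling server.stop() and the
// fresh copy binds the same port against a still-live listener:
// EADDRINUSE. Under `--watch` the whole process restarts, so the
// SIGTERM/SIGINT handlers get to run server.stop() first.
//
//   const server = Bun.serve({ port: 3068, fetch: () => new Response("ok") });
//   const shutdown = () => server.stop();
//   process.on("SIGTERM", shutdown);
//   process.on("SIGINT", shutdown);
```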
Final milestone of docs/plans/llm-fallback-aliases.md. Every backend
caller now requests models via the `mana/<class>` alias system instead
of hardcoded `ollama/...` strings. mana-llm resolves aliases through
`services/mana-llm/aliases.yaml` with health-aware fallback (M3) and
emits resolved-model + fallback metrics (M4).
SSOT moved to `packages/shared-ai/src/llm-aliases.ts` so apps/api,
apps/mana/apps/web, and services/mana-ai all import the same
`MANA_LLM` constant via the existing `@mana/shared-ai` workspace
dependency. Three additional sites (memoro-server, mana-events,
mana-research) inline the alias string with an SSOT comment because
they don't pull @mana/shared-ai today.
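An illustrative sketch of the SSOT constant shape; the exact class
names are assumptions inferred from the tiers mentioned in this log
(FAST_TEXT, STRUCTURED, LONG_FORM, REASONING, VISION):

```typescript
// Hypothetical sketch of packages/shared-ai/src/llm-aliases.ts.
// Callers request MANA_LLM.STRUCTURED etc. instead of a hardcoded
// "ollama/..." string; mana-llm resolves the alias at request time.
export const MANA_LLM = {
  FAST_TEXT: "mana/fast-text",
  STRUCTURED: "mana/structured",
  LONG_FORM: "mana/long-form",
  REASONING: "mana/reasoning",
  VISION: "mana/vision",
} as const;

export type ManaLlmClass = keyof typeof MANA_LLM;
export type ManaLlmAlias = (typeof MANA_LLM)[ManaLlmClass];
```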
Migrated 14 sites across 10 files:
- apps/api: writing(LONG_FORM), comic(STRUCTURED), context(FAST_TEXT),
food(VISION), plants(VISION), research orchestrator (3 tiers
collapsed to STRUCTURED+FAST_TEXT/LONG_FORM)
- apps/mana/apps/web: voice/parse-task + parse-habit (STRUCTURED)
- services/mana-ai: planner llm-client + tick.ts (REASONING)
- services/mana-events: website-extractor (STRUCTURED, inlined)
- services/mana-research: mana-llm client (FAST_TEXT, inlined)
- apps/memoro/apps/server: ai.ts (FAST_TEXT, inlined)
Legacy env-vars removed: WRITING_MODEL, COMIC_STORYBOARD_MODEL,
VISION_MODEL, MANA_LLM_DEFAULT_MODEL. The chain in aliases.yaml is
now the single tuning surface; SIGHUP reloads it without redeploys.
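A minimal sketch of the SIGHUP reload path (names are illustrative;
the real service parses services/mana-llm/aliases.yaml):

```typescript
// Alias -> fallback chain, e.g. "mana/fast-text" -> ["ollama/a", "ollama/b"].
type AliasChains = Record<string, string[]>;

let aliasChains: AliasChains = {};

// Install the loader once at boot; SIGHUP re-runs it in place so the
// chain can be re-tuned without a redeploy.
function installReload(load: () => AliasChains): void {
  aliasChains = load();
  process.on("SIGHUP", () => {
    aliasChains = load();
  });
}

function resolveAlias(alias: string): string[] {
  return aliasChains[alias] ?? []; // empty chain = unknown alias
}
```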
New `scripts/validate-llm-strings.mjs` regex-scans 2538 files for
hardcoded `<provider>/<model>` strings and fails the build if any
land outside the SSOT or the explicitly-allowed paths (image-gen
modules, model-inspector code, this validator itself, the registry).
Wired into `validate:all` next to the i18n + theme validators.
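The core of the scan can be sketched like this (the regex and allowed
paths here are illustrative, not the real validator's):

```typescript
// Hypothetical reduction of scripts/validate-llm-strings.mjs: flag any
// quoted "<provider>/<model>" literal outside the allow-listed paths.
const MODEL_STRING = /["'`](?:ollama|openai|anthropic|google)\/[\w.:-]+["'`]/g;

const ALLOWED = [
  /scripts\/validate-llm-strings/, // the validator itself
  /llm-aliases\.ts$/,              // the SSOT
];

function findViolations(file: string, source: string): string[] {
  if (ALLOWED.some((p) => p.test(file))) return [];
  return source.match(MODEL_STRING) ?? [];
}
```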
Verified: `pnpm validate:llm-strings` clean, `pnpm --filter @mana/api
type-check` clean, `pnpm --filter @mana/ai-service type-check`
clean. Web type-check has 2 pre-existing errors in
SettingsSidebar.svelte (i18n MessageFormatter type drift, last
touched in 988c17a67 — unrelated to this work).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-commit of c413ab7dd (reverted in c31dcdd66) without the unrelated
files that accidentally got swept into the original stage. Parser
content is identical.
The real Gemini /v1beta/interactions/:id completed shape bit us once
already during the initial smoke-test: we had coded an OpenAI-style
nested `output.message.content[]`, while the real response is a flat
`outputs` array of thought|text|image items, with url_citations that
carry no title and usage fields named `total_input_tokens` rather than
`input_tokens`.
This test pins the parser against a synthetic fixture covering the
cases we saw in the wild plus the failure modes that are hard to
provoke from a live API call:
- status dispatch (queued, in_progress, failed, cancelled, incomplete)
- completed body concatenated across text items, skipping thought/image
- empty/missing `outputs` without crashing
- missing usage
- citations deduped by url, hostname extracted as title
- wrong-type annotations and those without url skipped
- real vertexaisearch redirect URLs Gemini emits
- fallback to url as title when the URL is unparseable
- trimming of leading/trailing whitespace
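The citation bullets above can be sketched as one normalisation pass
(illustrative names; the real helper lives in the service's parser):

```typescript
interface Citation { url: string; title: string; }

function normalizeCitations(
  annotations: Array<{ type?: string; url?: string }>,
): Citation[] {
  const seen = new Set<string>();
  const out: Citation[] = [];
  for (const a of annotations) {
    // wrong-type annotations and those without a url are skipped
    if (a.type !== "url_citation" || !a.url) continue;
    // dedupe by url
    if (seen.has(a.url)) continue;
    seen.add(a.url);
    let title: string;
    try {
      title = new URL(a.url).hostname; // hostname stands in for the missing title
    } catch {
      title = a.url; // fall back to the raw url when it won't parse
    }
    out.push({ url: a.url, title });
  }
  return out;
}
```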
To make this testable I pulled the completed-branch of
pollGeminiDeepResearch into a standalone parseInteractionResponse
helper — same behaviour, now reachable without mocking global fetch.
Also adds the `test` script to package.json so `pnpm --filter
@mana/research-service test` works.
17 pass / 0 fail.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- New providers gemini-deep-research + gemini-deep-research-max on the
Interactions API (preview-04-2026). Submit/poll split, tier parameter
selects between standard (~minutes, $1–3) and max (up to 60 min, $3–7).
- Parser matches the real response shape: flat `outputs` array of
thought|text|image items, url_citation annotations without title,
`usage.total_input_tokens` / `total_output_tokens`.
- Route generalisation: /v1/research/async accepts `provider` with
default 'openai-deep-research' (backward compatible) and dispatches
to the right submit/poll pair.
- New internal service-to-service endpoint /v1/internal/research/async
gated by X-Service-Key + X-User-Id for credit accounting. Enables
mana-ai to drive deep-research jobs on the mission owner's wallet
without requiring a user JWT.
- Pricing: 300 credits (standard) / 1500 credits (max). Conservative
markup over the ~$3/$7 ceiling so the first runs can't surprise us.
- Docs: AGENT_PROVIDER_IDS + pricing + env map + auto-router stay in
sync; CLAUDE.md Phase 3b now current; API_KEYS.md references the
new providers under GOOGLE_GENAI_API_KEY.
Verified with a real smoke test against the Gemini API: submit + poll
both succeed, completed response parsed cleanly.
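A sketch of what a caller-side request to the internal endpoint could
look like; the header names come from this log, while the body fields
and base path are assumptions about the service contract:

```typescript
// Hypothetical request builder for /v1/internal/research/async.
function buildInternalResearchRequest(
  query: string,
  userId: string,
  serviceKey: string,
) {
  return {
    path: "/v1/internal/research/async",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-Service-Key": serviceKey, // gates service-to-service access
        "X-User-Id": userId,         // credits land on this user's wallet
      },
      body: JSON.stringify({ query, provider: "gemini-deep-research" }),
    },
  };
}
```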
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Complete walkthrough per provider — signup URL, free-tier details,
pay-per-use pricing, env-var name, key format — plus sections on where
to paste keys (.env.secrets), BYO-keys vs server-keys, verification
curl commands and troubleshooting (including the cross-service
MANA_SERVICE_KEY mismatch encountered during live testing).
Linked from services/mana-research/CLAUDE.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Google deprecated `gemini-2.0-flash` for new API users — existing
accounts still work, but a freshly-billed key returns 404
"models/gemini-2.0-flash is no longer available to new users". The
working replacement is `gemini-2.5-flash` (same price tier, better
quality, groundingMetadata shape unchanged).
Verified live: the fix produced a real answer with 6 grounding
citations in 2.6s.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- claude-web-search.ts: only send `temperature` when caller explicitly
sets one. Opus 4.7 deprecated the param and returns
400 invalid_request_error "`temperature` is deprecated for this
model." Sonnet/Haiku still accept it, so keep the opt-in path.
- execute-research.ts: log provider errors via console.warn so future
integration failures are visible in stdout. Previously the executor
swallowed the underlying error and only returned a generic errorCode,
which made diagnosing vendor-specific API changes impossible.
Discovered via smoke-testing with a real Anthropic key — the direct
curl worked, but our provider 400'd because Opus 4.7 tightened the
accepted param set.
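The opt-in handling can be sketched as follows (field and function
names are illustrative):

```typescript
interface SearchOptions { temperature?: number; }

// Only forward temperature when the caller explicitly set one; Opus 4.7
// returns 400 invalid_request_error when the param is present at all,
// while Sonnet/Haiku callers can still opt in.
function buildMessagesPayload(model: string, opts: SearchOptions) {
  const payload: Record<string, unknown> = { model, max_tokens: 4096 };
  if (opts.temperature !== undefined) payload.temperature = opts.temperature;
  return payload;
}
```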
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two backlog items landed in one commit because an earlier amend in a
parallel terminal dropped the initial Phase 3b commit, and the BYO-keys
work was blocked on the same wiring.
openai-deep-research (async):
- New research.async_jobs table persists the OpenAI response.id, query,
reservation, and cached result/error.
- POST /v1/research/async reserves credits, submits to the Responses API
with background=true, returns a taskId. Submit failure refunds.
- GET /v1/research/async/:taskId polls upstream, commits the reservation
on completion, refunds on failure, short-circuits for terminal states.
- GET /v1/research/async lists the user's async tasks.
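The reserve/submit/refund ordering above can be sketched like this
(function names are placeholders; the real calls go through
mana-credits and the OpenAI Responses API):

```typescript
// Reserve credits first; a failed submit refunds the reservation, and
// a successful submit leaves it open to be committed on poll completion.
async function submitAsyncResearch(
  reserve: () => Promise<string>,          // returns reservationId
  submit: () => Promise<string>,           // returns upstream response.id
  refund: (reservationId: string) => Promise<void>,
) {
  const reservationId = await reserve();
  try {
    const responseId = await submit();
    return { reservationId, responseId };
  } catch (err) {
    await refund(reservationId);           // submit failure refunds
    throw err;
  }
}
```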
BYO-keys:
- research.provider_configs CRUD at /v1/provider-configs. Keys are masked
  (••••last4) on read so the raw secret never travels back to the browser.
  Currently stored plaintext with a TODO for AES-GCM-256 via the shared
  KEK — single call site in storage/configs.ts.decryptKey().
- New frontend route /research-lab/keys lets the user paste a key per
provider, toggle enabled, and set daily/monthly credit budgets.
- ListView grew a 🔑 link in the header.
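The read-side masking amounts to something like this (function name is
illustrative):

```typescript
// Render the ••••last4 shape described above; the raw key stays
// server-side and only this masked form is ever serialized to the UI.
function maskKey(raw: string): string {
  return `••••${raw.slice(-4)}`;
}
```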
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Branding: research-lab registered in @mana/shared-branding with
  requiredTier: 'beta' + a custom flask-on-purple icon, so guest/public
  users are filtered out of the workbench picker.
- Backend: compare routes now return resultId alongside each CompareEntry
  so the frontend can wire ratings to the eval_results rows in research.*.
- Frontend: click-to-rate stars in CompareColumn (persists via
  POST /v1/runs/:runId/results/:resultId/rate), recent-run list rows are
  now buttons that navigate to /research-lab/runs/[id], and the detail
  route reconstructs CompareEntry shapes from eval_results + reuses
  CompareColumn for a full read-only view of any past run.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds Perplexity Sonar, Claude web_search, OpenAI Responses, and Gemini
Grounding as ResearchAgents behind the same comparison interface as the
search and extract providers.
New endpoints:
POST /v1/research — single-agent (or auto-routed to the first
provider with a configured key)
POST /v1/research/compare — fan-out across N agents, persist all
answers + citations in research.eval_*
Each agent normalizes its native response into a common AgentAnswer shape
(answer text + citations[] + tokenUsage), storing the provider's raw
response alongside for later inspection. Implementations use direct HTTP
against each vendor's public API — no SDK deps added.
Auto-routing preference: perplexity-sonar → gemini-grounding →
openai-responses → claude-web-search → (openai-deep-research stubbed for
Phase 3b). Credits orchestration reuses the search/extract executor
pattern (reserve → call → commit/refund).
Deferred to Phase 3b: openai-deep-research (async job queue), migration
of mana-ai + mana-api news-research to call this service directly.
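The common shape and the preference order can be sketched as
(field names inferred from this log, so treat them as assumptions):

```typescript
// Every agent normalises its vendor response into this shape, keeping
// the raw payload alongside for later inspection.
interface AgentAnswer {
  answer: string;
  citations: Array<{ url: string; title: string }>;
  tokenUsage?: { inputTokens: number; outputTokens: number };
  raw: unknown;
}

// Auto-routing: first provider in preference order with a configured key.
const PREFERENCE = [
  "perplexity-sonar",
  "gemini-grounding",
  "openai-responses",
  "claude-web-search",
] as const;

function pickProvider(configured: Set<string>): string | undefined {
  return PREFERENCE.find((p) => configured.has(p));
}
```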
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New Bun/Hono service on port 3068 that bundles many web-research providers
behind a unified interface for side-by-side comparison. All eval runs
persist in research.* (mana_platform) so quality can be reviewed later.
Providers (Phase 1+2):
search: searxng, duckduckgo, brave, tavily, exa, serper
extract: readability (via mana-search), jina-reader, firecrawl
Endpoints:
POST /v1/search, /v1/search/compare — single + fan-out
POST /v1/extract, /v1/extract/compare — single + fan-out
GET /v1/runs, /v1/runs/:id — history
POST /v1/runs/:run/results/:id/rate — manual eval
GET /v1/providers, /v1/providers/health — catalog + readiness
Auto-routing: when `provider` is omitted, queries are classified via regex
(fast path, 0ms) with optional mana-llm fallback, then routed to the first
available provider for that query type (news → tavily, academic → exa,
semantic → exa, etc.).
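The regex fast path can be sketched like this (the patterns here are
illustrative stand-ins, not the service's real rules):

```typescript
type QueryType = "news" | "academic" | "semantic" | "general";

// First matching pattern wins; unmatched queries fall through to
// "general", where the optional mana-llm fallback could refine the call.
const PATTERNS: Array<[QueryType, RegExp]> = [
  ["news", /\b(today|latest|breaking|this week)\b/i],
  ["academic", /\b(paper|study|arxiv|peer.?reviewed)\b/i],
];

function classifyQuery(q: string): QueryType {
  for (const [type, re] of PATTERNS) if (re.test(q)) return type;
  return "general";
}
```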
Credits: server-key calls go through mana-credits reserve → commit/refund
so failed provider calls don't charge the user. BYO-keys supported via
research.provider_configs (UI arrives in Phase 4).
Cache: Redis with graceful degradation (1h TTL for search, 24h for
extract). Pay-per-use APIs only — no subscription-gated providers.
Docs: docs/plans/mana-research-service.md + docs/reports/web-research-capabilities.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>