managarten/services/mana-research/CLAUDE.md

# mana-research

Web research orchestration service. Bundles 16+ providers (search, extract, agent) behind one interface. Pay-per-use APIs only, integrated with `mana-credits` 2-phase debit.

**Plan:** [`docs/plans/mana-research-service.md`](../../docs/plans/mana-research-service.md)
**Related analysis:** [`docs/reports/web-research-capabilities.md`](../../docs/reports/web-research-capabilities.md)
**API keys setup guide:** [`API_KEYS.md`](./API_KEYS.md): step-by-step setup per provider, with pricing and signup URLs

## Tech Stack

| Layer | Technology |
|-------|------------|
| **Runtime** | Bun |
| **Framework** | Hono |
| **Database** | PostgreSQL + Drizzle ORM (`research.*` schema in `mana_platform`) |
| **Cache** | Redis (ioredis, graceful degradation) |
| **Auth** | JWT via JWKS from mana-auth, plus `X-Service-Key` for service-to-service |

## Quick Start

```bash
# From repo root: ensure postgres + redis are up, then run
pnpm --filter @mana/research-service dev

# Database schema (creates research.* tables)
cd services/mana-research
bun run db:push
bun run db:studio
```

## Port: 3068

## Phases

- **Phase 1** ✅ — 4 search providers (`searxng`, `duckduckgo`, `brave`, `tavily`), `/v1/search`, `/v1/search/compare`, `/v1/runs`, `/v1/providers`, `mana-credits` reserve/commit/refund.
- **Phase 2** ✅ — +2 search providers (`exa`, `serper`), 3 extract providers (`readability`, `jina-reader`, `firecrawl`), `/v1/extract`, `/v1/extract/compare`, query classifier + auto-router, `/v1/providers/health`.
- **Phase 3a** ✅ — 4 sync research agents (`perplexity-sonar`, `claude-web-search`, `openai-responses`, `gemini-grounding`), `/v1/research`, `/v1/research/compare`, agent auto-router.
- **Phase 3b (current)** ✅ — async agents `openai-deep-research`, `gemini-deep-research`, `gemini-deep-research-max` via `research.async_jobs` queue. User-facing `/v1/research/async`, service-to-service `/v1/internal/research/async` (used by mana-ai's cross-tick deep-research flow). See [`docs/reports/gemini-deep-research.md`](../../docs/reports/gemini-deep-research.md).
- **Phase 4** — Research Lab UI + Settings for BYO-keys.

## API Endpoints

### User-facing (JWT auth)

| Method | Path | Description |
|---|---|---|
| POST | `/api/v1/search` | Single-provider search, or auto-routed if `provider` omitted. Body: `{ query, provider?, options?, useLlmClassifier? }`. |
| POST | `/api/v1/search/compare` | Fan-out to N providers (max 5), persist eval_run. Body: `{ query, providers[], options? }`. |
| POST | `/api/v1/extract` | Single-provider extract, auto-routed if `provider` omitted. Body: `{ url, provider?, options? }`. |
| POST | `/api/v1/extract/compare` | Fan-out to N extract providers (max 4). Body: `{ url, providers[], options? }`. |
| POST | `/api/v1/research` | Single-agent research. Auto-routed if `provider` omitted. Body: `{ query, provider?, options? }`. |
| POST | `/api/v1/research/compare` | Fan-out to N agents (max 4). Body: `{ query, providers[], options? }`. |
| GET | `/api/v1/runs` | List user's eval runs. Query: `?limit=50&offset=0`. |
| GET | `/api/v1/runs/:id` | Run + all results. |
| POST | `/api/v1/runs/:runId/results/:resultId/rate` | Body: `{ rating: 1-5, notes? }`. |
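
A minimal sketch of preparing a call to `/api/v1/search` from TypeScript. It assumes the service is reachable at `http://localhost:3068` and that the JWT is sent as a `Bearer` token in the `Authorization` header (the table above only says "JWT auth", so the header scheme is an assumption). `buildSearchRequest`, `SearchBody`, and `PreparedRequest` are illustrative names, not part of the service.

```typescript
// Body fields mirror the endpoint table; `options` stays opaque because
// its shape is not documented in this file.
interface SearchBody {
  query: string;
  provider?: string; // omit to let the service auto-route
  options?: Record<string, unknown>;
  useLlmClassifier?: boolean;
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: assembles URL, headers, and JSON body for the call.
function buildSearchRequest(
  baseUrl: string,
  jwt: string,
  body: SearchBody,
): PreparedRequest {
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: `${baseUrl.replace(/\/+$/, "")}/api/v1/search`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Assumption: standard Bearer scheme for the mana-auth JWT.
      Authorization: `Bearer ${jwt}`,
    },
    body: JSON.stringify(body),
  };
}
```

The result can be passed to `fetch(req.url, req)`. The same shape should work for `/api/v1/extract` and `/api/v1/research` with the bodies listed in the table; the response format is not documented here, so inspect it before relying on specific fields.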

### Public

| Method | Path | Description |
|---|---|---|
| GET | `/health` | Health check. |
| GET | `/metrics` | Prometheus stub (metrics not wired up yet). |
| GET | `/api/v1/providers` | List registered providers + capabilities + pricing. |
| GET | `/api/v1/providers/health` | Per-provider readiness check (`free` / `ready` / `needs-key`). |
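
One plausible reading of the three readiness states, sketched in TypeScript: `free` for providers that need no key, `ready` when the required key is set, `needs-key` when it is missing or empty. This interpretation is an assumption, and `readiness` and `PROVIDER_KEYS` are hypothetical names; the actual check may do more (for example, probe the upstream API).

```typescript
type Readiness = "free" | "ready" | "needs-key";

// Key requirements copied from the provider tables in this document.
// `null` means no key is needed. Providers with an optional key
// (jina-reader) are left out because their mapping is not documented.
const PROVIDER_KEYS: Record<string, string | null> = {
  searxng: null,
  duckduckgo: null,
  readability: null,
  brave: "BRAVE_API_KEY",
  tavily: "TAVILY_API_KEY",
  exa: "EXA_API_KEY",
  serper: "SERPER_API_KEY",
  firecrawl: "FIRECRAWL_API_KEY",
};

function readiness(
  provider: string,
  env: Record<string, string | undefined>,
): Readiness {
  const keyName = PROVIDER_KEYS[provider];
  if (keyName === undefined) {
    throw new Error(`unknown provider: ${provider}`);
  }
  if (keyName === null) return "free";
  // An empty value (e.g. `BRAVE_API_KEY=` in .env) counts as missing.
  return env[keyName] ? "ready" : "needs-key";
}
```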

### Service-to-service (X-Service-Key)

All `/api/v1/internal/*` routes require `X-Service-Key: <MANA_SERVICE_KEY>`. Endpoints that touch per-user state additionally require `X-User-Id: <userId>` so credit reservations + eval-run rows land on the right user.

| Method | Path | Description |
|---|---|---|
| GET | `/api/v1/internal/health` | Placeholder health probe. |
| POST | `/api/v1/internal/research/async` | Submit async research job. Body: `{ query, provider, options? }` where `provider ∈ { openai-deep-research, gemini-deep-research, gemini-deep-research-max }`. Requires `X-User-Id`. |
| GET | `/api/v1/internal/research/async/:id` | Poll status / read completed result. Requires `X-User-Id` (same user as submit). |

Caller today: **mana-ai** (`ManaResearchClient`), which fires deep-research-max tasks from the tick-loop's pre-planning step for missions that opt in via `DEEP_RESEARCH_TRIGGER`.
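
A sketch of how a service-side caller might prepare the submit and poll requests, using only the headers, paths, and body fields listed above. The helper names (`internalHeaders`, `submitAsyncJob`, `pollAsyncJob`) are hypothetical and do not describe the real `ManaResearchClient`; the response shape (job id field, status values) is not documented here and is deliberately left out.

```typescript
// Providers accepted by the async endpoint, per the table above.
type AsyncProvider =
  | "openai-deep-research"
  | "gemini-deep-research"
  | "gemini-deep-research-max";

interface PreparedCall {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

// The service key authenticates the caller; the user id attributes
// credit reservations and eval-run rows to the right user.
function internalHeaders(serviceKey: string, userId: string): Record<string, string> {
  return { "X-Service-Key": serviceKey, "X-User-Id": userId };
}

function submitAsyncJob(
  baseUrl: string,
  serviceKey: string,
  userId: string,
  query: string,
  provider: AsyncProvider,
): PreparedCall {
  return {
    url: `${baseUrl}/api/v1/internal/research/async`,
    method: "POST",
    headers: {
      ...internalHeaders(serviceKey, userId),
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query, provider }),
  };
}

// Polling must use the same user id as the submit call.
function pollAsyncJob(
  baseUrl: string,
  serviceKey: string,
  userId: string,
  jobId: string,
): PreparedCall {
  return {
    url: `${baseUrl}/api/v1/internal/research/async/${encodeURIComponent(jobId)}`,
    method: "GET",
    headers: internalHeaders(serviceKey, userId),
  };
}
```

Each `PreparedCall` can be handed to `fetch(call.url, call)`. Since the max tier can run for up to 60 minutes, a caller would presumably poll on a generous interval rather than in a tight loop.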

## Providers

### Search (6)

| Provider | Key | Cost | Notes |
|---|---|---|---|
| `searxng` | — | 0 | Wraps `mana-search` (SearXNG). Self-hosted. |
| `duckduckgo` | — | 0 | Instant Answer API. Rate-limited. |
| `brave` | `BRAVE_API_KEY` | 5 | $5/1k PAYG. Independent index. |
| `tavily` | `TAVILY_API_KEY` | 8 | Agent-optimized, returns content. |
| `exa` | `EXA_API_KEY` | 6 | Semantic/neural, best for papers + semantic similarity. |
| `serper` | `SERPER_API_KEY` | 1 | Google SERP as JSON. $0.30–1/1k. |

### Extract (3)

| Provider | Key | Cost | Notes |
|---|---|---|---|
| `readability` | — | 0 | Wraps `mana-search /extract` (go-readability). |
| `jina-reader` | optional `JINA_API_KEY` | 1 | `r.jina.ai`, JS-rendering + PDF, Markdown out. |
| `firecrawl` | `FIRECRAWL_API_KEY` | 10 | Playwright-based, best for JS-heavy sites. Self-hostable. |

### Research Agents (4 sync, 3 async)

| Provider | Key | Cost | Notes |
|---|---|---|---|
| `perplexity-sonar` | `PERPLEXITY_API_KEY` | 50 | 4 models: sonar, sonar-pro, sonar-reasoning, sonar-deep-research. Best plug-and-play. |
| `gemini-grounding` | `GOOGLE_GENAI_API_KEY` | 100 | Gemini + Google Search grounding. Single-step. |
| `openai-responses` | `OPENAI_API_KEY` | 200 | Responses API with `web_search_preview` tool. Multi-step. |
| `claude-web-search` | `ANTHROPIC_API_KEY` | 200 | Claude + `web_search_20250305` tool, up to 5 searches/call. |
| `openai-deep-research` | `OPENAI_API_KEY` | 1000 | async, returns taskId to poll. |
| `gemini-deep-research` | `GOOGLE_GENAI_API_KEY` | 300 | async, Gemini 3.1 Pro preview (04-2026). Standard tier, ~minutes. |
| `gemini-deep-research-max` | `GOOGLE_GENAI_API_KEY` | 1500 | async, Gemini 3.1 Pro preview (04-2026). Max tier, up to 60 min, deep synthesis. |

## Auto-routing

When `provider` is omitted from `POST /api/v1/search`, the service classifies the query with a regex pass (fast path, ~0ms) and, if `useLlmClassifier: true` is set, with the LLM classifier as well. It then picks the first available provider from `SEARCH_ROUTE_MAP[type]`:

- `news` → tavily, brave, serper, searxng, duckduckgo
- `general` → brave, tavily, serper, searxng
- `semantic` → exa, tavily, brave
- `academic` → exa, searxng, brave
- `code` → exa, serper, brave
- `conversational` → tavily, brave, serper

Extract auto-routing prefers `firecrawl` (best quality) → `jina-reader` → `readability`.
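
The "first available provider" selection can be sketched as follows. The route lists are copied from above; `pickSearchProvider`, `pickExtractProvider`, `EXTRACT_ROUTE`, and the `isAvailable` callback are hypothetical names, and the classifier itself is omitted because its regexes are not documented here.

```typescript
type QueryType =
  | "news"
  | "general"
  | "semantic"
  | "academic"
  | "code"
  | "conversational";

// Preference order per query type, as listed in this section.
const SEARCH_ROUTE_MAP: Record<QueryType, string[]> = {
  news: ["tavily", "brave", "serper", "searxng", "duckduckgo"],
  general: ["brave", "tavily", "serper", "searxng"],
  semantic: ["exa", "tavily", "brave"],
  academic: ["exa", "searxng", "brave"],
  code: ["exa", "serper", "brave"],
  conversational: ["tavily", "brave", "serper"],
};

// Extract routing uses a single quality-ordered list.
const EXTRACT_ROUTE: string[] = ["firecrawl", "jina-reader", "readability"];

// Returns the first provider in the route that is currently usable,
// or undefined when none is.
function pickSearchProvider(
  type: QueryType,
  isAvailable: (provider: string) => boolean,
): string | undefined {
  return SEARCH_ROUTE_MAP[type].find(isAvailable);
}

function pickExtractProvider(
  isAvailable: (provider: string) => boolean,
): string | undefined {
  return EXTRACT_ROUTE.find(isAvailable);
}
```

With only the key-less providers available, a `news` query would fall through to `searxng`, while `semantic` would find no match because its route contains only keyed providers. How the service responds in that case is not specified here.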

## Credits Integration

Server-key mode uses `mana-credits` 2-phase debit:

```
reserve → provider call → (commit on success | refund on error)
```

BYO-key mode bypasses credits entirely (user brings their own API key, Phase 4 UI).
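
The reserve/commit/refund flow could look roughly like this in TypeScript. `CreditsClient` and `withCredits` are hypothetical: the real `mana-credits` API (method names, arguments, reservation id format) is not described in this file, so treat this as a sketch of the control flow only.

```typescript
// Assumed minimal surface of a credits client.
interface CreditsClient {
  reserve(userId: string, amount: number): Promise<string>; // returns a reservation id
  commit(reservationId: string): Promise<void>;
  refund(reservationId: string): Promise<void>;
}

// Runs a provider call inside a 2-phase debit:
// reserve first, refund if the call fails, commit if it succeeds.
async function withCredits<T>(
  credits: CreditsClient,
  userId: string,
  cost: number,
  call: () => Promise<T>,
): Promise<T> {
  const reservationId = await credits.reserve(userId, cost);

  let result: T;
  try {
    result = await call();
  } catch (err) {
    // Provider failed: release the reserved credits and surface the error.
    await credits.refund(reservationId);
    throw err;
  }

  // Commit sits outside the try block so a commit failure is not
  // answered with a refund for a call that actually succeeded.
  await credits.commit(reservationId);
  return result;
}
```

In BYO-key mode a caller would skip this wrapper and invoke the provider directly.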

Pricing map: `src/lib/pricing.ts`.

## Database

Schema `research` in `mana_platform`:

- `eval_runs` — one per request (`single`/`compare`/`auto` mode).
- `eval_results` — one per provider response. Raw + normalized output, latency, cost, optional user rating.
- `provider_configs` — per-user BYO-key + budget. `userId=null` reserved for server defaults.
- `provider_stats` — rolled-up daily metrics for admin dashboard + auto-router.

All eval runs are **permanent** by design: a durable history of provider results is the point of the comparison engine.

## Environment Variables

```env
PORT=3068
DATABASE_URL=postgresql://mana:devpassword@localhost:5432/mana_platform
REDIS_URL=redis://localhost:6379
MANA_AUTH_URL=http://localhost:3001
MANA_LLM_URL=http://localhost:3025
MANA_CREDITS_URL=http://localhost:3061
MANA_SEARCH_URL=http://localhost:3021
MANA_SERVICE_KEY=dev-service-key
CACHE_TTL_SECONDS=3600
CORS_ORIGINS=http://localhost:5173

# Provider keys (optional in dev — providers without keys are unavailable)
BRAVE_API_KEY=
TAVILY_API_KEY=
EXA_API_KEY=
SERPER_API_KEY=
JINA_API_KEY=
FIRECRAWL_API_KEY=
SCRAPINGBEE_API_KEY=
PERPLEXITY_API_KEY=
ANTHROPIC_API_KEY=
OPENAI_API_KEY=
GOOGLE_GENAI_API_KEY=
```
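
A sketch of reading these variables into a typed config object, using the dev values above as defaults. `loadConfig` and `ResearchConfig` are illustrative names rather than the service's actual config module, and treating `CORS_ORIGINS` as a comma-separated list is an assumption based on the plural name.

```typescript
interface ResearchConfig {
  port: number;
  databaseUrl: string;
  redisUrl: string;
  cacheTtlSeconds: number;
  corsOrigins: string[];
  configuredProviderKeys: string[]; // names of provider keys that are set
}

// Provider key names as listed in the env block above.
const PROVIDER_KEY_NAMES: string[] = [
  "BRAVE_API_KEY", "TAVILY_API_KEY", "EXA_API_KEY", "SERPER_API_KEY",
  "JINA_API_KEY", "FIRECRAWL_API_KEY", "SCRAPINGBEE_API_KEY",
  "PERPLEXITY_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY",
  "GOOGLE_GENAI_API_KEY",
];

function toPositiveInt(raw: string | undefined, fallback: number, name: string): number {
  if (raw === undefined || raw === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return n;
}

// Takes the environment as a parameter (e.g. process.env or Bun.env)
// so it can be exercised without touching globals.
function loadConfig(env: Record<string, string | undefined>): ResearchConfig {
  return {
    port: toPositiveInt(env["PORT"], 3068, "PORT"),
    databaseUrl:
      env["DATABASE_URL"] ?? "postgresql://mana:devpassword@localhost:5432/mana_platform",
    redisUrl: env["REDIS_URL"] ?? "redis://localhost:6379",
    cacheTtlSeconds: toPositiveInt(env["CACHE_TTL_SECONDS"], 3600, "CACHE_TTL_SECONDS"),
    // Assumption: comma-separated origins.
    corsOrigins: (env["CORS_ORIGINS"] ?? "http://localhost:5173")
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s.length > 0),
    // Empty values count as "not configured".
    configuredProviderKeys: PROVIDER_KEY_NAMES.filter((name) => (env[name] ?? "") !== ""),
  };
}
```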