feat(research): add mana-research service — Phase 1 + 2

New Bun/Hono service on port 3068 that bundles many web-research providers
behind a unified interface for side-by-side comparison. All eval runs
persist in research.* (mana_platform) so quality can be reviewed later.

Providers (Phase 1+2):
  search:  searxng, duckduckgo, brave, tavily, exa, serper
  extract: readability (via mana-search), jina-reader, firecrawl

Endpoints:
  POST /v1/search, /v1/search/compare       — single + fan-out
  POST /v1/extract, /v1/extract/compare     — single + fan-out
  GET  /v1/runs, /v1/runs/:id               — history
  POST /v1/runs/:run/results/:id/rate       — manual eval
  GET  /v1/providers, /v1/providers/health  — catalog + readiness

Auto-routing: when `provider` is omitted, queries are classified via regex
(fast path, 0ms) with optional mana-llm fallback, then routed to the first
available provider for that query type (news → tavily, academic → exa,
semantic → exa, etc.).

Credits: server-key calls go through mana-credits reserve → commit/refund
so failed provider calls don't charge the user. BYO-keys supported via
research.provider_configs (UI arrives in Phase 4).

Cache: Redis with graceful degradation (1h TTL for search, 24h for
extract). Pay-per-use APIs only — no subscription-gated providers.

Docs: docs/plans/mana-research-service.md + docs/reports/web-research-capabilities.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Till JS 2026-04-17 14:42:25 +02:00
parent 004fc0b2fd
commit 2bdb48bdd1
56 changed files with 4431 additions and 298 deletions


@ -0,0 +1,149 @@
# mana-research
Web research orchestration service. Bundles 16+ providers (search, extract, agent) behind one interface. Pay-per-use APIs only, integrated with `mana-credits` 2-phase debit.
**Plan:** [`docs/plans/mana-research-service.md`](../../docs/plans/mana-research-service.md)
**Related analysis:** [`docs/reports/web-research-capabilities.md`](../../docs/reports/web-research-capabilities.md)
## Tech Stack
| Layer | Technology |
|-------|------------|
| **Runtime** | Bun |
| **Framework** | Hono |
| **Database** | PostgreSQL + Drizzle ORM (`research.*` schema in `mana_platform`) |
| **Cache** | Redis (ioredis, graceful degradation) |
| **Auth** | JWT via JWKS from mana-auth, plus `X-Service-Key` for service-to-service |
## Quick Start
```bash
# From repo root: ensure postgres + redis are up, then run
pnpm --filter @mana/research-service dev
# Database schema (creates research.* tables)
cd services/mana-research
bun run db:push
bun run db:studio
```
## Port: 3068
## Phases
- **Phase 1** ✅ — 4 search providers (`searxng`, `duckduckgo`, `brave`, `tavily`), `/v1/search`, `/v1/search/compare`, `/v1/runs`, `/v1/providers`, `mana-credits` reserve/commit/refund.
- **Phase 2 (current)** ✅ — +2 search providers (`exa`, `serper`), 3 extract providers (`readability`, `jina-reader`, `firecrawl`), `/v1/extract`, `/v1/extract/compare`, query classifier + auto-router, `/v1/providers/health`.
- **Phase 3** — Research agents (`perplexity-sonar`, `claude-web-search`, `openai-responses`, `gemini-grounding`, `openai-deep-research`). mana-ai migration to use this service.
- **Phase 4** — Research Lab UI + Settings for BYO-keys.
## API Endpoints
### User-facing (JWT auth)
| Method | Path | Description |
|---|---|---|
| POST | `/api/v1/search` | Single-provider search, or auto-routed if `provider` omitted. Body: `{ query, provider?, options?, useLlmClassifier? }`. |
| POST | `/api/v1/search/compare` | Fan-out to N providers (max 5), persist eval_run. Body: `{ query, providers[], options? }`. |
| POST | `/api/v1/extract` | Single-provider extract, auto-routed if `provider` omitted. Body: `{ url, provider?, options? }`. |
| POST | `/api/v1/extract/compare` | Fan-out to N extract providers (max 4). Body: `{ url, providers[], options? }`. |
| GET | `/api/v1/runs` | List user's eval runs. Query: `?limit=50&offset=0`. |
| GET | `/api/v1/runs/:id` | Run + all results. |
| POST | `/api/v1/runs/:runId/results/:resultId/rate` | Body: `{ rating: 1-5, notes? }`. |
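As an illustration of the request shape for the compare endpoint, here is a minimal sketch of a client-side helper. `buildCompareRequest` is a hypothetical name, and the `Authorization: Bearer` header is an assumption (the table only states that these routes use JWT auth); the path, body fields, and the 5-provider limit come from the table above.

```typescript
// Hypothetical client helper, not part of the service code.
interface CompareBody {
  query: string;
  providers: string[];
  options?: Record<string, unknown>;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Limit stated for /api/v1/search/compare in the endpoint table.
const MAX_COMPARE_PROVIDERS = 5;

function buildCompareRequest(baseUrl: string, jwt: string, body: CompareBody): PreparedRequest {
  if (body.providers.length === 0 || body.providers.length > MAX_COMPARE_PROVIDERS) {
    throw new RangeError(`providers must contain 1-${MAX_COMPARE_PROVIDERS} entries`);
  }
  return {
    url: `${baseUrl}/api/v1/search/compare`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // Assumption: the JWT is sent as a bearer token.
        Authorization: `Bearer ${jwt}`,
      },
      body: JSON.stringify(body),
    },
  };
}
```

The returned `url` and `init` can be passed directly to `fetch(url, init)`.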
### Public
| Method | Path | Description |
|---|---|---|
| GET | `/health` | Health check. |
| GET | `/metrics` | Prometheus stub (wired up later). |
| GET | `/api/v1/providers` | List registered providers + capabilities + pricing. |
| GET | `/api/v1/providers/health` | Per-provider readiness check (`free` / `ready` / `needs-key`). |
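The three readiness states can be read as a simple decision over "does the provider need a key" and "is a server key configured". The sketch below is one plausible mapping, not the actual route handler (which is not shown here); `ProviderInfo` and `readinessOf` are hypothetical names, and keying the lookup by provider id is a simplification.

```typescript
type Readiness = 'free' | 'ready' | 'needs-key';

// Hypothetical minimal view of a registered provider.
interface ProviderInfo {
  id: string;
  requiresApiKey: boolean;
}

// Assumption: serverKeys is keyed by provider id for simplicity.
function readinessOf(
  provider: ProviderInfo,
  serverKeys: Record<string, string | undefined>
): Readiness {
  if (!provider.requiresApiKey) return 'free';
  return serverKeys[provider.id] ? 'ready' : 'needs-key';
}
```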
### Service-to-service (X-Service-Key)
Reserved for Phase 3 when `mana-ai` migrates to call this service directly. `/api/v1/internal/health` exists as a placeholder.
## Providers
### Search (6)
| Provider | Key | Cost | Notes |
|---|---|---|---|
| `searxng` | — | 0 | Wraps `mana-search` (SearXNG). Self-hosted. |
| `duckduckgo` | — | 0 | Instant Answer API. Rate-limited. |
| `brave` | `BRAVE_API_KEY` | 5 | $5/1k PAYG. Independent index. |
| `tavily` | `TAVILY_API_KEY` | 8 | Agent-optimized, returns content. |
| `exa` | `EXA_API_KEY` | 6 | Semantic/neural, best for papers + semantic similarity. |
| `serper` | `SERPER_API_KEY` | 1 | Google SERP as JSON. $0.301/1k. |
### Extract (3)
| Provider | Key | Cost | Notes |
|---|---|---|---|
| `readability` | — | 0 | Wraps `mana-search /extract` (go-readability). |
| `jina-reader` | optional `JINA_API_KEY` | 1 | `r.jina.ai`, JS-rendering + PDF, Markdown out. |
| `firecrawl` | `FIRECRAWL_API_KEY` | 10 | Playwright-based, best for JS-heavy sites. Self-hostable. |
## Auto-routing
When `provider` is omitted from `POST /api/v1/search`, the service classifies the query via regex (fast path, ~0ms) and optionally the LLM (`useLlmClassifier: true`), then picks the first available provider from `SEARCH_ROUTE_MAP[type]`:
- `news` → tavily, brave, serper, searxng, duckduckgo
- `general` → brave, tavily, serper, searxng
- `semantic` → exa, tavily, brave
- `academic` → exa, searxng, brave
- `code` → exa, serper, brave
- `conversational` → tavily, brave, serper
Extract auto-routing prefers `firecrawl` (best quality) → `jina-reader` → `readability`.
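The routing step for search can be sketched roughly as follows. The route lists are copied from the bullets above; the regex patterns and the `classify` / `pickProvider` names are illustrative assumptions, since the real classifier rules are not shown in this document.

```typescript
type QueryType = 'news' | 'general' | 'semantic' | 'academic' | 'code' | 'conversational';

// Route lists as documented in this section.
const SEARCH_ROUTE_MAP: Record<QueryType, string[]> = {
  news: ['tavily', 'brave', 'serper', 'searxng', 'duckduckgo'],
  general: ['brave', 'tavily', 'serper', 'searxng'],
  semantic: ['exa', 'tavily', 'brave'],
  academic: ['exa', 'searxng', 'brave'],
  code: ['exa', 'serper', 'brave'],
  conversational: ['tavily', 'brave', 'serper'],
};

// Hypothetical patterns, checked in order; the first match wins.
const RULES: Array<[QueryType, RegExp]> = [
  ['news', /\b(news|latest|today|breaking)\b/i],
  ['academic', /\b(paper|study|arxiv|doi)\b/i],
  ['code', /\b(error|stack trace|npm|typescript|python)\b/i],
  ['semantic', /\b(similar to|alternatives to)\b/i],
  ['conversational', /^(who|what|why|how|when|where)\b.*\?$/i],
];

function classify(query: string): QueryType {
  for (const [type, pattern] of RULES) {
    if (pattern.test(query)) return type;
  }
  return 'general'; // fallback when no rule matches
}

// Returns the first provider in the route list that is currently available.
function pickProvider(type: QueryType, available: Set<string>): string | undefined {
  return SEARCH_ROUTE_MAP[type].find((id) => available.has(id));
}
```

With only `brave` and `searxng` available, a `news` query would fall through `tavily` and land on `brave`.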
## Credits Integration
Server-key mode uses `mana-credits` 2-phase debit:
```
reserve → provider call → (commit on success | refund on error)
```
BYO-key mode bypasses credits entirely (user brings their own API key, Phase 4 UI).
Pricing map: `src/lib/pricing.ts`.
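A minimal sketch of the reserve → commit/refund flow, assuming a client with the same three operations as the credits client in this service. `withReservation` and the `Credits` interface are hypothetical names introduced for illustration; the executors in this commit inline the same steps rather than using a helper like this.

```typescript
// Hypothetical minimal interface mirroring reserve/commit/refund.
interface Credits {
  reserve(userId: string, amount: number, reason: string): Promise<{ reservationId: string }>;
  commit(reservationId: string): Promise<unknown>;
  refund(reservationId: string): Promise<unknown>;
}

async function withReservation<T>(
  credits: Credits,
  userId: string,
  price: number,
  reason: string,
  call: () => Promise<T>
): Promise<T> {
  // Phase 1: hold the credits before touching the paid provider.
  const { reservationId } = await credits.reserve(userId, price, reason);
  let result: T;
  try {
    result = await call();
  } catch (err) {
    // Provider failed: release the hold so the user is not charged.
    await credits.refund(reservationId).catch((e) => console.warn('refund failed:', e));
    throw err;
  }
  // Phase 2: the provider succeeded, so finalize the charge.
  // A commit failure is logged, not thrown, so the caller still gets the result.
  await credits.commit(reservationId).catch((e) => console.warn('commit failed:', e));
  return result;
}
```

Keeping the commit outside the `try` block means a failed commit never triggers a refund of a call that actually succeeded.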
## Database
Schema `research` in `mana_platform`:
- `eval_runs` — one per request (`single`/`compare`/`auto` mode).
- `eval_results` — one per provider response. Raw + normalized output, latency, cost, optional user rating.
- `provider_configs` — per-user BYO-key + budget. `userId=null` reserved for server defaults.
- `provider_stats` — rolled-up daily metrics for admin dashboard + auto-router.
All eval runs are **permanent** by design — this is the comparison engine's point.
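Because `provider_stats` stores running totals rather than averages, consumers such as a dashboard or the auto-router would derive rates at read time. The sketch below shows that arithmetic; the field names follow the schema in this commit, while `summarizeStats` is a hypothetical helper and not part of the service.

```typescript
// Subset of a provider_stats row (field names as in the Drizzle schema).
interface ProviderStatRow {
  providerId: string;
  day: string;
  totalCalls: number;
  totalLatencyMs: number;
  successCount: number;
  errorCount: number;
}

interface StatSummary {
  avgLatencyMs: number | null;
  successRate: number | null;
}

// Hypothetical helper: derive per-day averages from the stored totals.
function summarizeStats(row: ProviderStatRow): StatSummary {
  if (row.totalCalls === 0) {
    // No calls that day: averages are undefined, not zero.
    return { avgLatencyMs: null, successRate: null };
  }
  return {
    avgLatencyMs: row.totalLatencyMs / row.totalCalls,
    successRate: row.successCount / row.totalCalls,
  };
}
```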
## Environment Variables
```env
PORT=3068
DATABASE_URL=postgresql://mana:devpassword@localhost:5432/mana_platform
REDIS_URL=redis://localhost:6379
MANA_AUTH_URL=http://localhost:3001
MANA_LLM_URL=http://localhost:3025
MANA_CREDITS_URL=http://localhost:3061
MANA_SEARCH_URL=http://localhost:3021
MANA_SERVICE_KEY=dev-service-key
CACHE_TTL_SECONDS=3600
CORS_ORIGINS=http://localhost:5173
# Provider keys (optional in dev — providers without keys are unavailable)
BRAVE_API_KEY=
TAVILY_API_KEY=
EXA_API_KEY=
SERPER_API_KEY=
JINA_API_KEY=
FIRECRAWL_API_KEY=
SCRAPINGBEE_API_KEY=
PERPLEXITY_API_KEY=
ANTHROPIC_API_KEY=
OPENAI_API_KEY=
GOOGLE_GENAI_API_KEY=
```


@ -0,0 +1,37 @@
# Install stage: use node + pnpm to resolve workspace dependencies
FROM node:22-alpine AS installer
RUN corepack enable && corepack prepare pnpm@9.15.0 --activate
WORKDIR /app
COPY package.json pnpm-lock.yaml pnpm-workspace.yaml ./
COPY services/mana-research/package.json ./services/mana-research/
COPY packages/shared-hono ./packages/shared-hono
COPY packages/shared-logger ./packages/shared-logger
COPY packages/shared-types ./packages/shared-types
COPY packages/shared-research ./packages/shared-research
RUN pnpm install --filter @mana/research-service... --no-frozen-lockfile --ignore-scripts
# Runtime stage: bun
FROM oven/bun:1 AS production
WORKDIR /app
COPY --from=installer /app/node_modules ./node_modules
COPY --from=installer /app/services/mana-research/node_modules ./services/mana-research/node_modules
COPY --from=installer /app/packages ./packages
COPY services/mana-research/package.json ./services/mana-research/
COPY services/mana-research/src ./services/mana-research/src
COPY services/mana-research/tsconfig.json services/mana-research/drizzle.config.ts ./services/mana-research/
WORKDIR /app/services/mana-research
EXPOSE 3068
HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
CMD bun -e "fetch('http://localhost:3068/health').then(r => process.exit(r.ok ? 0 : 1)).catch(() => process.exit(1))"
CMD ["bun", "run", "src/index.ts"]


@ -0,0 +1,11 @@
import { defineConfig } from 'drizzle-kit';
export default defineConfig({
schema: './src/db/schema/*.ts',
out: './drizzle',
dialect: 'postgresql',
dbCredentials: {
url: process.env.DATABASE_URL || 'postgresql://mana:devpassword@localhost:5432/mana_platform',
},
schemaFilter: ['research'],
});


@ -0,0 +1,28 @@
{
"name": "@mana/research-service",
"version": "0.1.0",
"private": true,
"type": "module",
"scripts": {
"dev": "bun run --watch src/index.ts",
"start": "bun run src/index.ts",
"db:push": "drizzle-kit push",
"db:generate": "drizzle-kit generate",
"db:studio": "drizzle-kit studio",
"type-check": "tsc --noEmit"
},
"dependencies": {
"@mana/shared-hono": "workspace:*",
"@mana/shared-research": "workspace:*",
"hono": "^4.7.0",
"drizzle-orm": "^0.38.3",
"postgres": "^3.4.5",
"ioredis": "^5.4.1",
"jose": "^6.1.2",
"zod": "^3.24.0"
},
"devDependencies": {
"drizzle-kit": "^0.30.4",
"typescript": "^5.9.3"
}
}


@ -0,0 +1,68 @@
/**
* HTTP client for mana-credits. Uses the internal Reserve/Commit/Refund endpoints
* (added in this phase; see services/mana-credits/src/routes/internal.ts).
*/
export interface CreditsClientConfig {
baseUrl: string;
serviceKey: string;
}
export interface ReservationResponse {
reservationId: string;
balance: number;
}
export class CreditsClient {
constructor(private config: CreditsClientConfig) {}
private headers() {
return {
'Content-Type': 'application/json',
'X-Service-Key': this.config.serviceKey,
'X-App-Id': 'mana-research',
};
}
async getBalance(userId: string): Promise<{ balance: number }> {
const res = await fetch(
`${this.config.baseUrl}/api/v1/internal/credits/balance/${encodeURIComponent(userId)}`,
{ headers: this.headers() }
);
if (!res.ok) throw new Error(`credits.balance failed: ${res.status}`);
return res.json() as Promise<{ balance: number }>;
}
async reserve(userId: string, amount: number, reason: string): Promise<ReservationResponse> {
const res = await fetch(`${this.config.baseUrl}/api/v1/internal/credits/reserve`, {
method: 'POST',
headers: this.headers(),
body: JSON.stringify({ userId, amount, reason }),
});
if (!res.ok) {
const body = await res.text();
throw new Error(`credits.reserve failed: ${res.status} ${body}`);
}
return res.json() as Promise<ReservationResponse>;
}
async commit(reservationId: string, description?: string): Promise<{ success: boolean }> {
const res = await fetch(`${this.config.baseUrl}/api/v1/internal/credits/commit`, {
method: 'POST',
headers: this.headers(),
body: JSON.stringify({ reservationId, description }),
});
if (!res.ok) throw new Error(`credits.commit failed: ${res.status}`);
return res.json() as Promise<{ success: boolean }>;
}
async refund(reservationId: string): Promise<{ success: boolean }> {
const res = await fetch(`${this.config.baseUrl}/api/v1/internal/credits/refund-reservation`, {
method: 'POST',
headers: this.headers(),
body: JSON.stringify({ reservationId }),
});
if (!res.ok) throw new Error(`credits.refund failed: ${res.status}`);
return res.json() as Promise<{ success: boolean }>;
}
}


@ -0,0 +1,46 @@
/**
* HTTP client for the mana-llm OpenAI-compatible chat completions endpoint.
* Used by the query classifier (short prompt → JSON tag).
*/
export interface ChatMessage {
role: 'system' | 'user' | 'assistant';
content: string;
}
export class ManaLlmClient {
constructor(private baseUrl: string) {}
async chat(
messages: ChatMessage[],
opts: { model?: string; maxTokens?: number; temperature?: number; signal?: AbortSignal } = {}
): Promise<{ content: string; tokenUsage?: { input: number; output: number } }> {
const res = await fetch(`${this.baseUrl}/v1/chat/completions`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
model: opts.model ?? 'ollama/gemma3:4b',
messages,
max_tokens: opts.maxTokens ?? 256,
temperature: opts.temperature ?? 0.2,
}),
signal: opts.signal,
});
if (!res.ok) {
throw new Error(`mana-llm returned ${res.status}`);
}
type ChatResponse = {
choices?: Array<{ message?: { content?: string } }>;
usage?: { prompt_tokens?: number; completion_tokens?: number };
};
const data = (await res.json()) as ChatResponse;
const content = data.choices?.[0]?.message?.content ?? '';
const tokenUsage = data.usage
? {
input: data.usage.prompt_tokens ?? 0,
output: data.usage.completion_tokens ?? 0,
}
: undefined;
return { content, tokenUsage };
}
}


@ -0,0 +1,91 @@
/**
* HTTP client for mana-search (Go service on port 3021).
* Used by the SearXNGProvider and ReadabilityProvider wrappers.
*/
import { ProviderError } from '../lib/errors';
export interface ManaSearchHit {
url: string;
title: string;
snippet: string;
engine?: string;
score?: number;
publishedDate?: string;
category?: string;
}
export interface ManaSearchResponse {
results: ManaSearchHit[];
meta: {
query: string;
totalResults: number;
engines: string[];
cached: boolean;
duration: number;
};
}
export interface ManaExtractResponse {
success: boolean;
content?: {
title: string;
text: string;
markdown?: string;
author?: string;
publishedDate?: string;
siteName?: string;
wordCount: number;
};
}
export class ManaSearchClient {
constructor(private baseUrl: string) {}
async search(
query: string,
options: {
limit?: number;
categories?: string[];
language?: string;
signal?: AbortSignal;
} = {}
): Promise<ManaSearchResponse> {
const res = await fetch(`${this.baseUrl}/api/v1/search`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
query,
options: {
limit: options.limit ?? 10,
categories: options.categories,
language: options.language ?? 'de-DE',
},
}),
signal: options.signal,
});
if (!res.ok) {
throw new ProviderError('searxng', `mana-search returned ${res.status}`);
}
return res.json() as Promise<ManaSearchResponse>;
}
async extract(
url: string,
options: { maxLength?: number; signal?: AbortSignal } = {}
): Promise<ManaExtractResponse> {
const res = await fetch(`${this.baseUrl}/api/v1/extract`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
url,
options: { maxLength: options.maxLength ?? 50000, includeMarkdown: true },
}),
signal: options.signal,
});
if (!res.ok) {
throw new ProviderError('readability', `mana-search extract returned ${res.status}`);
}
return res.json() as Promise<ManaExtractResponse>;
}
}


@ -0,0 +1,68 @@
/**
* Application configuration loaded from environment variables.
*/
export interface Config {
port: number;
databaseUrl: string;
redisUrl: string;
manaAuthUrl: string;
manaLlmUrl: string;
manaCreditsUrl: string;
manaSearchUrl: string;
serviceKey: string;
cors: { origins: string[] };
cacheTtlSeconds: number;
providerKeys: {
brave?: string;
tavily?: string;
exa?: string;
serper?: string;
perplexity?: string;
anthropic?: string;
openai?: string;
googleGenai?: string;
jina?: string;
firecrawl?: string;
scrapingbee?: string;
};
}
export function loadConfig(): Config {
const requiredEnv = (key: string, fallback?: string): string => {
const value = process.env[key] || fallback;
if (!value) throw new Error(`Missing required env var: ${key}`);
return value;
};
return {
port: parseInt(process.env.PORT || '3068', 10),
databaseUrl: requiredEnv(
'DATABASE_URL',
'postgresql://mana:devpassword@localhost:5432/mana_platform'
),
redisUrl: process.env.REDIS_URL || 'redis://localhost:6379',
manaAuthUrl: requiredEnv('MANA_AUTH_URL', 'http://localhost:3001'),
manaLlmUrl: requiredEnv('MANA_LLM_URL', 'http://localhost:3025'),
manaCreditsUrl: requiredEnv('MANA_CREDITS_URL', 'http://localhost:3061'),
manaSearchUrl: requiredEnv('MANA_SEARCH_URL', 'http://localhost:3021'),
serviceKey: requiredEnv('MANA_SERVICE_KEY', 'dev-service-key'),
cors: {
origins: (process.env.CORS_ORIGINS || 'http://localhost:5173').split(','),
},
cacheTtlSeconds: parseInt(process.env.CACHE_TTL_SECONDS || '3600', 10),
providerKeys: {
brave: process.env.BRAVE_API_KEY,
tavily: process.env.TAVILY_API_KEY,
exa: process.env.EXA_API_KEY,
serper: process.env.SERPER_API_KEY,
perplexity: process.env.PERPLEXITY_API_KEY,
anthropic: process.env.ANTHROPIC_API_KEY,
openai: process.env.OPENAI_API_KEY,
googleGenai: process.env.GOOGLE_GENAI_API_KEY,
jina: process.env.JINA_API_KEY,
firecrawl: process.env.FIRECRAWL_API_KEY,
scrapingbee: process.env.SCRAPINGBEE_API_KEY,
},
};
}


@ -0,0 +1,15 @@
import { drizzle } from 'drizzle-orm/postgres-js';
import postgres from 'postgres';
import * as schema from './schema/index';
let db: ReturnType<typeof drizzle<typeof schema>> | null = null;
export function getDb(databaseUrl: string) {
if (!db) {
const client = postgres(databaseUrl, { max: 10 });
db = drizzle(client, { schema });
}
return db;
}
export type Database = ReturnType<typeof getDb>;


@ -0,0 +1 @@
export * from './research';


@ -0,0 +1,131 @@
/**
* Research schema: provider configs, eval runs, results, and stats.
*
* Lives in mana_platform DB under the `research` pgSchema.
* All userId columns are text without FK (separate ownership from auth.users).
*/
import {
pgSchema,
uuid,
integer,
text,
timestamp,
jsonb,
boolean,
real,
index,
uniqueIndex,
primaryKey,
pgEnum,
} from 'drizzle-orm/pg-core';
export const researchSchema = pgSchema('research');
export const billingModeEnum = pgEnum('research_billing_mode', [
'server-key',
'byo-key',
'free',
'mixed',
]);
export const runCategoryEnum = pgEnum('research_run_category', ['search', 'extract', 'agent']);
export const runModeEnum = pgEnum('research_run_mode', ['single', 'compare', 'auto']);
/** A single research request: one query, one or more providers. */
export const evalRuns = researchSchema.table(
'eval_runs',
{
id: uuid('id').primaryKey().defaultRandom(),
userId: text('user_id'),
query: text('query').notNull(),
queryType: text('query_type'),
mode: runModeEnum('mode').notNull(),
category: runCategoryEnum('category').notNull(),
providersRequested: text('providers_requested').array().notNull(),
billingMode: billingModeEnum('billing_mode').notNull(),
totalCostCredits: integer('total_cost_credits').notNull().default(0),
createdAt: timestamp('created_at', { withTimezone: true }).defaultNow().notNull(),
},
(t) => ({
userIdx: index('eval_runs_user_idx').on(t.userId, t.createdAt),
queryIdx: index('eval_runs_query_idx').on(t.query),
})
);
/** One provider response per run. */
export const evalResults = researchSchema.table(
'eval_results',
{
id: uuid('id').primaryKey().defaultRandom(),
runId: uuid('run_id')
.notNull()
.references(() => evalRuns.id, { onDelete: 'cascade' }),
providerId: text('provider_id').notNull(),
success: boolean('success').notNull(),
latencyMs: integer('latency_ms').notNull(),
costCredits: integer('cost_credits').notNull().default(0),
billingMode: billingModeEnum('billing_mode').notNull(),
cacheHit: boolean('cache_hit').notNull().default(false),
rawResponse: jsonb('raw_response'),
normalizedResult: jsonb('normalized_result'),
errorCode: text('error_code'),
errorMessage: text('error_message'),
userRating: integer('user_rating'),
userNotes: text('user_notes'),
createdAt: timestamp('created_at', { withTimezone: true }).defaultNow().notNull(),
},
(t) => ({
runIdx: index('eval_results_run_idx').on(t.runId),
providerIdx: index('eval_results_provider_idx').on(t.providerId, t.createdAt),
})
);
/** Per-user BYO-key config + budgets. `userId=null` reserved for server-default row. */
export const providerConfigs = researchSchema.table(
'provider_configs',
{
id: uuid('id').primaryKey().defaultRandom(),
userId: text('user_id'),
providerId: text('provider_id').notNull(),
apiKeyEncrypted: text('api_key_encrypted'),
enabled: boolean('enabled').notNull().default(true),
dailyBudgetCredits: integer('daily_budget_credits'),
monthlyBudgetCredits: integer('monthly_budget_credits'),
createdAt: timestamp('created_at', { withTimezone: true }).defaultNow().notNull(),
updatedAt: timestamp('updated_at', { withTimezone: true }).defaultNow().notNull(),
},
(t) => ({
userProviderUnique: uniqueIndex('provider_configs_user_provider_unique').on(
t.userId,
t.providerId
),
})
);
/** Aggregated per-day stats for Admin dashboard + auto-router. */
export const providerStats = researchSchema.table(
'provider_stats',
{
providerId: text('provider_id').notNull(),
day: text('day').notNull(),
totalCalls: integer('total_calls').notNull().default(0),
totalLatencyMs: integer('total_latency_ms').notNull().default(0),
totalCostCredits: integer('total_cost_credits').notNull().default(0),
successCount: integer('success_count').notNull().default(0),
errorCount: integer('error_count').notNull().default(0),
avgUserRating: real('avg_user_rating'),
ratingCount: integer('rating_count').notNull().default(0),
},
(t) => ({
pk: primaryKey({ columns: [t.providerId, t.day] }),
})
);
export type EvalRun = typeof evalRuns.$inferSelect;
export type NewEvalRun = typeof evalRuns.$inferInsert;
export type EvalResult = typeof evalResults.$inferSelect;
export type NewEvalResult = typeof evalResults.$inferInsert;
export type ProviderConfig = typeof providerConfigs.$inferSelect;
export type ProviderStat = typeof providerStats.$inferSelect;


@ -0,0 +1,25 @@
import type { ProviderId } from '@mana/shared-research';
import type { Config } from '../config';
/**
* Maps a ProviderId to the corresponding env-key slot on Config.providerKeys.
* Extract/agent providers that share a key with search (e.g. openai agents)
* reuse the same slot.
*/
export function mapEnvKey(providerId: ProviderId): keyof Config['providerKeys'] {
const map: Partial<Record<ProviderId, keyof Config['providerKeys']>> = {
brave: 'brave',
tavily: 'tavily',
exa: 'exa',
serper: 'serper',
'perplexity-sonar': 'perplexity',
'claude-web-search': 'anthropic',
'openai-responses': 'openai',
'openai-deep-research': 'openai',
'gemini-grounding': 'googleGenai',
'jina-reader': 'jina',
firecrawl: 'firecrawl',
scrapingbee: 'scrapingbee',
};
return map[providerId] ?? 'brave';
}


@ -0,0 +1,150 @@
/**
* Extract-side executor. Same shape as executeSearch but for URL extraction.
*/
import type {
BillingMode,
ExtractedContent,
ExtractOptions,
ExtractProvider,
ProviderId,
ProviderMeta,
} from '@mana/shared-research';
import type { CreditsClient } from '../clients/mana-credits';
import type { Config } from '../config';
import { ProviderNotConfiguredError } from '../lib/errors';
import { priceFor } from '../lib/pricing';
import type { ConfigStorage } from '../storage/configs';
import { cacheGet, cacheKey, cacheSet } from '../lib/cache';
import { mapEnvKey } from './env-map';
export interface ExecuteExtractInput {
provider: ExtractProvider;
url: string;
options: ExtractOptions;
userId: string;
signal?: AbortSignal;
}
export interface ExecuteExtractOutput {
success: boolean;
data?: { content: ExtractedContent };
meta: ProviderMeta;
}
export interface ExecutorDeps {
credits: CreditsClient;
configs: ConfigStorage;
config: Config;
}
export async function executeExtract(
input: ExecuteExtractInput,
deps: ExecutorDeps
): Promise<ExecuteExtractOutput> {
const { provider, url, options, userId, signal } = input;
const providerId = provider.id;
const t0 = performance.now();
// Resolve API key (BYO → server → none)
let apiKey: string | null = null;
let billingMode: BillingMode = 'free';
if (provider.requiresApiKey) {
const userConfig = await deps.configs.getForUser(userId, providerId);
if (userConfig?.enabled && userConfig.apiKeyEncrypted) {
apiKey = await deps.configs.decryptKey(userConfig);
if (apiKey) billingMode = 'byo-key';
}
if (!apiKey) {
apiKey = deps.config.providerKeys[mapEnvKey(providerId)] ?? null;
if (apiKey) billingMode = 'server-key';
}
if (!apiKey) {
return makeError(providerId, t0, new ProviderNotConfiguredError(providerId));
}
} else if (providerId === 'jina-reader' && deps.config.providerKeys.jina) {
// jina-reader is zero-auth but a key lifts the rate limit
apiKey = deps.config.providerKeys.jina;
}
const price = billingMode === 'server-key' ? priceFor(providerId, 'extract') : 0;
const ckey = cacheKey('extract', providerId, url, options);
const cached = await cacheGet<{ content: ExtractedContent }>(ckey);
if (cached) {
return {
success: true,
data: cached,
meta: {
provider: providerId,
category: 'extract',
latencyMs: Math.round(performance.now() - t0),
costCredits: 0,
cacheHit: true,
billingMode,
},
};
}
let reservationId: string | null = null;
if (price > 0 && billingMode === 'server-key') {
try {
const reservation = await deps.credits.reserve(
userId,
price,
`research:${providerId}:extract`
);
reservationId = reservation.reservationId;
} catch (err) {
return makeError(providerId, t0, err as Error);
}
}
try {
const res = await provider.extract(url, options, { apiKey, userId, signal });
await cacheSet(ckey, { content: res.content }, deps.config.cacheTtlSeconds * 24);
if (reservationId) {
await deps.credits
.commit(reservationId, `extract ${providerId}`)
.catch((err) => console.warn('[executor] commit failed:', err));
}
return {
success: true,
data: { content: res.content },
meta: {
provider: providerId,
category: 'extract',
latencyMs: Math.round(performance.now() - t0),
costCredits: price,
cacheHit: false,
billingMode,
},
};
} catch (err) {
if (reservationId) {
await deps.credits
.refund(reservationId)
.catch((refundErr) => console.warn('[executor] refund failed:', refundErr));
}
return makeError(providerId, t0, err as Error);
}
}
function makeError(providerId: ProviderId, t0: number, err: Error): ExecuteExtractOutput {
const code = (err as { code?: string }).code ?? err.name ?? 'ERROR';
return {
success: false,
meta: {
provider: providerId,
category: 'extract',
latencyMs: Math.round(performance.now() - t0),
costCredits: 0,
cacheHit: false,
billingMode: 'free',
errorCode: code,
},
};
}


@ -0,0 +1,153 @@
/**
* Core execution path: resolve key → reserve credits → call provider →
* commit/refund → persist result.
*
* Used by both /v1/search (single) and /v1/search/compare (fan-out).
*/
import type {
BillingMode,
ProviderId,
ProviderMeta,
SearchHit,
SearchOptions,
SearchProvider,
} from '@mana/shared-research';
import type { CreditsClient } from '../clients/mana-credits';
import type { Config } from '../config';
import { ProviderNotConfiguredError } from '../lib/errors';
import { priceFor } from '../lib/pricing';
import type { ConfigStorage } from '../storage/configs';
import { cacheGet, cacheKey, cacheSet } from '../lib/cache';
import { mapEnvKey } from './env-map';
export interface ExecuteSearchInput {
provider: SearchProvider;
query: string;
options: SearchOptions;
userId: string;
signal?: AbortSignal;
}
export interface ExecuteSearchOutput {
success: boolean;
data?: { results: SearchHit[] };
meta: ProviderMeta;
}
export interface ExecutorDeps {
credits: CreditsClient;
configs: ConfigStorage;
config: Config;
}
export async function executeSearch(
input: ExecuteSearchInput,
deps: ExecutorDeps
): Promise<ExecuteSearchOutput> {
const { provider, query, options, userId, signal } = input;
const providerId = provider.id;
const t0 = performance.now();
// Resolve API key (BYO first, then server)
let apiKey: string | null = null;
let billingMode: BillingMode = 'free';
if (provider.requiresApiKey) {
const userConfig = await deps.configs.getForUser(userId, providerId);
if (userConfig?.enabled && userConfig.apiKeyEncrypted) {
apiKey = await deps.configs.decryptKey(userConfig);
if (apiKey) billingMode = 'byo-key';
}
if (!apiKey) {
apiKey = deps.config.providerKeys[mapEnvKey(providerId)] ?? null;
if (apiKey) billingMode = 'server-key';
}
if (!apiKey) {
return makeError(providerId, t0, new ProviderNotConfiguredError(providerId));
}
}
const price = billingMode === 'server-key' ? priceFor(providerId, 'search') : 0;
// Cache check
const ckey = cacheKey('search', providerId, query, options);
const cached = await cacheGet<{ results: SearchHit[] }>(ckey);
if (cached) {
return {
success: true,
data: cached,
meta: {
provider: providerId,
category: 'search',
latencyMs: Math.round(performance.now() - t0),
costCredits: 0,
cacheHit: true,
billingMode,
},
};
}
// Reserve credits for paid server-key calls
let reservationId: string | null = null;
if (price > 0 && billingMode === 'server-key') {
try {
const reservation = await deps.credits.reserve(
userId,
price,
`research:${providerId}:search`
);
reservationId = reservation.reservationId;
} catch (err) {
return makeError(providerId, t0, err as Error);
}
}
// Execute provider
try {
const res = await provider.search(query, options, { apiKey, userId, signal });
await cacheSet(ckey, { results: res.results }, deps.config.cacheTtlSeconds);
if (reservationId) {
await deps.credits
.commit(reservationId, `search ${providerId}`)
.catch((err) => console.warn('[executor] commit failed:', err));
}
return {
success: true,
data: { results: res.results },
meta: {
provider: providerId,
category: 'search',
latencyMs: Math.round(performance.now() - t0),
costCredits: price,
cacheHit: false,
billingMode,
},
};
} catch (err) {
if (reservationId) {
await deps.credits
.refund(reservationId)
.catch((refundErr) => console.warn('[executor] refund failed:', refundErr));
}
return makeError(providerId, t0, err as Error);
}
}
function makeError(providerId: ProviderId, t0: number, err: Error): ExecuteSearchOutput {
const code = (err as { code?: string }).code ?? err.name ?? 'ERROR';
return {
success: false,
meta: {
provider: providerId,
category: 'search',
latencyMs: Math.round(performance.now() - t0),
costCredits: 0,
cacheHit: false,
billingMode: 'free',
errorCode: code,
},
};
}


@ -0,0 +1,101 @@
/**
* mana-research: Web Research Provider Orchestration
*
* Bundles search/extract/agent providers behind a unified interface.
* Phase 1: 4 search providers (SearXNG, DuckDuckGo, Brave, Tavily) with
* credits + cache + eval-run persistence.
*
* Port: 3068. See docs/plans/mana-research-service.md.
*/
import { Hono } from 'hono';
import { cors } from 'hono/cors';
import { loadConfig } from './config';
import { getDb } from './db/connection';
import { serviceErrorHandler } from '@mana/shared-hono';
import { jwtAuth } from './middleware/jwt-auth';
import { serviceAuth } from './middleware/service-auth';
import { healthRoutes } from './routes/health';
import { createSearchRoutes } from './routes/search';
import { createExtractRoutes } from './routes/extract';
import { createProvidersRoutes } from './routes/providers';
import { createRunsRoutes } from './routes/runs';
import { buildRegistry } from './providers/registry';
import { RunStorage } from './storage/runs';
import { ConfigStorage } from './storage/configs';
import { CreditsClient } from './clients/mana-credits';
import { ManaSearchClient } from './clients/mana-search';
import { ManaLlmClient } from './clients/mana-llm';
import { initCache } from './lib/cache';
// ─── Bootstrap ──────────────────────────────────────────────
const config = loadConfig();
const db = getDb(config.databaseUrl);
initCache(config.redisUrl);
const manaSearch = new ManaSearchClient(config.manaSearchUrl);
const manaLlm = new ManaLlmClient(config.manaLlmUrl);
const credits = new CreditsClient({
baseUrl: config.manaCreditsUrl,
serviceKey: config.serviceKey,
});
const runStorage = new RunStorage(db);
const configStorage = new ConfigStorage(db);
const registry = buildRegistry({ manaSearch });
const executorDeps = {
credits,
configs: configStorage,
config,
};
// ─── App ────────────────────────────────────────────────────
const app = new Hono();
app.onError(serviceErrorHandler);
app.use(
'*',
cors({
origin: config.cors.origins,
credentials: true,
})
);
// Health (no auth)
app.route('/health', healthRoutes);
// Metrics stub (no auth) — will be populated in Phase 2 with prometheus-style output
app.get('/metrics', (c) => c.text('# mana-research metrics stub\n'));
// Providers catalog (no auth — callers often query this to build UIs)
app.route('/api/v1/providers', createProvidersRoutes(registry, config));
// User-facing research (JWT auth)
app.use('/api/v1/search/*', jwtAuth(config.manaAuthUrl));
app.route(
'/api/v1/search',
createSearchRoutes(registry, runStorage, executorDeps, config, manaLlm)
);
app.use('/api/v1/extract/*', jwtAuth(config.manaAuthUrl));
app.route('/api/v1/extract', createExtractRoutes(registry, runStorage, executorDeps, config));
app.use('/api/v1/runs/*', jwtAuth(config.manaAuthUrl));
app.route('/api/v1/runs', createRunsRoutes(runStorage));
// Service-to-service (X-Service-Key auth) — wired up in Phase 3 when mana-ai migrates
app.use('/api/v1/internal/*', serviceAuth(config.serviceKey));
app.get('/api/v1/internal/health', (c) => c.json({ ok: true }));
// ─── Start ──────────────────────────────────────────────────
console.log(`mana-research starting on port ${config.port}...`);
export default {
port: config.port,
fetch: app.fetch,
};

@ -0,0 +1,59 @@
/**
* Redis cache wrapper. Graceful degradation: if Redis is down, cacheGet()
* returns null (miss) and cacheSet() is a no-op, so the service still works.
*/
import Redis from 'ioredis';
import { createHash } from 'node:crypto';
let redis: Redis | null = null;
export function initCache(redisUrl: string) {
if (redis) return redis;
redis = new Redis(redisUrl, {
lazyConnect: true,
maxRetriesPerRequest: 2,
enableOfflineQueue: false,
});
redis.on('error', (err) => {
console.warn('[cache] redis error:', err.message);
});
redis.connect().catch((err) => {
console.warn('[cache] connect failed, running without cache:', err.message);
});
return redis;
}
export function cacheKey(
category: string,
providerId: string,
query: string,
opts: unknown
): string {
const h = createHash('sha256');
h.update(providerId);
h.update('\0');
h.update(query);
h.update('\0');
h.update(JSON.stringify(opts ?? {}));
return `research:${category}:${providerId}:${h.digest('hex').slice(0, 32)}`;
}
export async function cacheGet<T>(key: string): Promise<T | null> {
if (!redis || redis.status !== 'ready') return null;
try {
const raw = await redis.get(key);
return raw ? (JSON.parse(raw) as T) : null;
} catch {
return null;
}
}
export async function cacheSet<T>(key: string, value: T, ttlSeconds: number): Promise<void> {
if (!redis || redis.status !== 'ready') return;
try {
await redis.setex(key, ttlSeconds, JSON.stringify(value));
} catch {
/* ignore */
}
}
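
A note on `cacheKey`: `JSON.stringify` serializes object keys in insertion order, so two option objects with the same fields in a different order hash to different keys and miss each other's cache entries. The following is a minimal sketch of one way to avoid that, assuming options are plain JSON-like values. `stableStringify` and `stableCacheKey` are hypothetical names and not part of the service.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical helper: serialize JSON-like values with object keys sorted,
// so logically equal option objects always produce the same string.
function stableStringify(value: unknown): string {
  if (value === undefined) return 'null'; // same as JSON.stringify inside arrays
  if (value === null || typeof value !== 'object') {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`;
  }
  const obj = value as Record<string, unknown>;
  const entries = Object.keys(obj)
    .filter((k) => obj[k] !== undefined) // JSON.stringify also drops undefined fields
    .sort()
    .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
  return `{${entries.join(',')}}`;
}

// Same key layout as cacheKey() above, but with order-independent options.
function stableCacheKey(
  category: string,
  providerId: string,
  query: string,
  opts: unknown
): string {
  const h = createHash('sha256');
  h.update(providerId);
  h.update('\0');
  h.update(query);
  h.update('\0');
  h.update(stableStringify(opts ?? {}));
  return `research:${category}:${providerId}:${h.digest('hex').slice(0, 32)}`;
}
```

With this variant, `{ limit: 5, language: 'de-DE' }` and `{ language: 'de-DE', limit: 5 }` map to the same cache entry. Whether that matters in practice depends on how callers build their option objects.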

@ -0,0 +1,54 @@
/**
* Custom errors for mana-research. All extend HTTPException so `serviceErrorHandler`
* from @mana/shared-hono renders them with status + legacy `{ statusCode, message, details }`.
*/
import { HTTPException } from 'hono/http-exception';
export class BadRequestError extends HTTPException {
constructor(message: string, cause?: Record<string, unknown>) {
super(400, { message, cause });
}
}
export class UnauthorizedError extends HTTPException {
constructor(message = 'Unauthorized') {
super(401, { message });
}
}
export class NotFoundError extends HTTPException {
constructor(message = 'Not found') {
super(404, { message });
}
}
export class InsufficientCreditsError extends HTTPException {
constructor(
public readonly required: number,
public readonly available: number
) {
super(402, {
message: 'Insufficient credits',
cause: { required, available },
});
}
}
export class ProviderError extends HTTPException {
constructor(providerId: string, message: string, status: 500 | 502 | 503 = 502) {
super(status, {
message: `Provider "${providerId}" error: ${message}`,
cause: { providerId },
});
}
}
export class ProviderNotConfiguredError extends HTTPException {
constructor(providerId: string) {
super(501, {
message: `Provider "${providerId}" is not configured — no API key available`,
cause: { providerId },
});
}
}

@ -0,0 +1,16 @@
/**
* Hono Context Variables typing for mana-research.
*
* Allows `c.get('user')` / `c.set('user', ...)` to be typed throughout the
* service. Used via `new Hono<{ Variables: HonoVariables }>()`.
*/
import type { AuthUser } from '../middleware/jwt-auth';
export interface HonoVariables {
user: AuthUser;
appId?: string;
service?: boolean;
}
export type HonoEnv = { Variables: HonoVariables };

@ -0,0 +1,42 @@
/**
* Provider pricing in credits. 1 credit ≈ 1 cent EUR (matches mana-credits).
*
* Keep in sync with docs/plans/mana-research-service.md §2. Review quarterly.
* Prices as of 2026-04-17.
*/
import type { ProviderId } from '@mana/shared-research';
export const PROVIDER_PRICING: Record<
ProviderId,
{ search?: number; extract?: number; research?: number }
> = {
// Search providers
searxng: { search: 0 },
duckduckgo: { search: 0 },
brave: { search: 5 },
tavily: { search: 8 },
exa: { search: 6 },
serper: { search: 1 },
// Extract providers
readability: { extract: 0 },
'jina-reader': { extract: 1 },
firecrawl: { extract: 10 },
scrapingbee: { extract: 8 },
// Research agents
'perplexity-sonar': { research: 50 },
'claude-web-search': { research: 200 },
'openai-responses': { research: 200 },
'gemini-grounding': { research: 100 },
'openai-deep-research': { research: 1000 },
};
export function priceFor(
providerId: ProviderId,
operation: 'search' | 'extract' | 'research'
): number {
const entry = PROVIDER_PRICING[providerId];
return entry?.[operation] ?? 0;
}
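
To illustrate how `priceFor` composes for a fan-out request, here is a minimal sketch that totals the worst-case credit cost of a compare call. It copies three entries from the table above so that it runs on its own. `estimateCompareCost` is a hypothetical helper, and how the executor actually reserves credits is not shown in this file.

```typescript
type Operation = 'search' | 'extract' | 'research';
type Pricing = Record<string, Partial<Record<Operation, number>>>;

// Subset of PROVIDER_PRICING, copied here so the sketch is self-contained.
const PRICING: Pricing = {
  searxng: { search: 0 },
  brave: { search: 5 },
  tavily: { search: 8 },
};

// Same lookup rule as priceFor() above: unknown provider or operation costs 0.
function priceFor(providerId: string, operation: Operation): number {
  return PRICING[providerId]?.[operation] ?? 0;
}

// Hypothetical helper: upper bound in credits, assuming every provider call
// succeeds and none is served from cache.
function estimateCompareCost(providerIds: string[], operation: Operation): number {
  return providerIds.reduce((sum, id) => sum + priceFor(id, operation), 0);
}

console.log(estimateCompareCost(['searxng', 'brave', 'tavily'], 'search')); // 13
```

Because `priceFor` falls back to 0, a typo in a provider ID or a missing table entry silently prices the call as free, which is worth keeping in mind when new providers are added.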

@ -0,0 +1,54 @@
/**
* JWT Authentication Middleware
*
* Validates Bearer tokens via JWKS from mana-auth. Mirrors mana-credits pattern.
*/
import type { MiddlewareHandler } from 'hono';
import { createRemoteJWKSet, jwtVerify } from 'jose';
import { UnauthorizedError } from '../lib/errors';
let jwks: ReturnType<typeof createRemoteJWKSet> | null = null;
function getJwks(authUrl: string) {
if (!jwks) {
jwks = createRemoteJWKSet(new URL('/api/auth/jwks', authUrl));
}
return jwks;
}
export interface AuthUser {
userId: string;
email: string;
role: string;
tier?: string;
}
export function jwtAuth(authUrl: string): MiddlewareHandler {
return async (c, next) => {
const authHeader = c.req.header('Authorization');
if (!authHeader?.startsWith('Bearer ')) {
throw new UnauthorizedError('Missing or invalid Authorization header');
}
const token = authHeader.slice(7);
try {
const { payload } = await jwtVerify(token, getJwks(authUrl), {
issuer: authUrl,
audience: 'mana',
});
const user: AuthUser = {
userId: payload.sub || '',
email: (payload.email as string) || '',
role: (payload.role as string) || 'user',
tier: payload.tier as string | undefined,
};
c.set('user', user);
await next();
} catch {
throw new UnauthorizedError('Invalid or expired token');
}
};
}

@ -0,0 +1,22 @@
/**
* Service-to-Service Authentication Middleware
*
* Validates X-Service-Key header for backend-to-backend calls.
* Used by /internal/* routes and for calls from mana-ai.
*/
import type { MiddlewareHandler } from 'hono';
import { UnauthorizedError } from '../lib/errors';
export function serviceAuth(serviceKey: string): MiddlewareHandler {
return async (c, next) => {
const key = c.req.header('X-Service-Key');
if (!key || key !== serviceKey) {
throw new UnauthorizedError('Invalid or missing service key');
}
const appId = c.req.header('X-App-Id') || 'unknown';
c.set('appId', appId);
c.set('service', true);
await next();
};
}
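
The `key !== serviceKey` check above is an ordinary string comparison, which can in principle leak timing information about the expected key. A hedged sketch of a constant-time alternative using `timingSafeEqual` from `node:crypto` follows. `safeEqual` is a hypothetical helper, and hashing both sides first is just one way to satisfy the equal-length requirement of `timingSafeEqual`. The sketch assumes the runtime provides `node:crypto`.

```typescript
import { createHash, timingSafeEqual } from 'node:crypto';

// Hypothetical helper: compare two secrets without an early exit.
// Both inputs are hashed to fixed-length digests, because timingSafeEqual
// throws when its arguments differ in length.
function safeEqual(a: string, b: string): boolean {
  const da = createHash('sha256').update(a).digest();
  const db = createHash('sha256').update(b).digest();
  return timingSafeEqual(da, db);
}
```

In the middleware this would replace the inequality test, for example `if (!key || !safeEqual(key, serviceKey))`.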

@ -0,0 +1,86 @@
/**
* Firecrawl: Playwright-based JS rendering + LLM-friendly Markdown output.
* Docs: https://docs.firecrawl.dev/api-reference/endpoint/scrape
*
* Pay-per-use credits. Self-hostable via Docker (then set FIRECRAWL_API_URL to
* your own instance and any non-empty key works).
*/
import type { ExtractProvider } from '@mana/shared-research';
import { ProviderError, ProviderNotConfiguredError } from '../../lib/errors';
interface FirecrawlApiResponse {
success: boolean;
data?: {
markdown?: string;
html?: string;
metadata?: {
title?: string;
description?: string;
language?: string;
sourceURL?: string;
author?: string;
publishedTime?: string;
ogSiteName?: string;
};
};
error?: string;
}
export function createFirecrawlProvider(apiUrl = 'https://api.firecrawl.dev'): ExtractProvider {
return {
id: 'firecrawl',
requiresApiKey: true,
capabilities: {
jsRendering: true,
pdfSupport: true,
markdownOutput: true,
},
async extract(url, options, ctx) {
if (!ctx.apiKey) throw new ProviderNotConfiguredError('firecrawl');
const t0 = performance.now();
const res = await fetch(`${apiUrl}/v1/scrape`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
Authorization: `Bearer ${ctx.apiKey}`,
},
body: JSON.stringify({
url,
formats: ['markdown'],
onlyMainContent: true,
timeout: options.timeoutMs ?? 30000,
}),
signal: ctx.signal,
});
if (!res.ok) {
const body = await res.text().catch(() => '');
throw new ProviderError('firecrawl', `HTTP ${res.status} ${body.slice(0, 200)}`);
}
const data = (await res.json()) as FirecrawlApiResponse;
if (!data.success || !data.data) {
throw new ProviderError('firecrawl', data.error ?? 'extraction failed');
}
const md = (data.data.markdown ?? '').slice(0, options.maxLength ?? 50000);
const meta = data.data.metadata ?? {};
return {
content: {
url: meta.sourceURL ?? url,
title: meta.title ?? '',
content: md,
excerpt: meta.description,
author: meta.author,
siteName: meta.ogSiteName,
publishedAt: meta.publishedTime,
wordCount: md.split(/\s+/).filter(Boolean).length,
providerRaw: data,
},
rawLatencyMs: Math.round(performance.now() - t0),
};
},
};
}

@ -0,0 +1,66 @@
/**
* Jina Reader: `https://r.jina.ai/<url>` returns extracted Markdown.
* Free tier: 1M tokens/month without key. Paid tier via `JINA_API_KEY` lifts rate limit.
*
* The service is markedly better than plain Readability on JS-heavy sites.
*/
import type { ExtractProvider } from '@mana/shared-research';
import { ProviderError } from '../../lib/errors';
export function createJinaReaderProvider(): ExtractProvider {
return {
id: 'jina-reader',
requiresApiKey: false,
capabilities: {
jsRendering: true,
pdfSupport: true,
markdownOutput: true,
},
async extract(url, options, ctx) {
const t0 = performance.now();
const readerUrl = `https://r.jina.ai/${url}`;
const headers: Record<string, string> = {
Accept: 'application/json',
'X-Return-Format': 'markdown',
};
if (ctx.apiKey) headers.Authorization = `Bearer ${ctx.apiKey}`;
if (options.timeoutMs) headers['X-Timeout'] = String(Math.round(options.timeoutMs / 1000));
const res = await fetch(readerUrl, { headers, signal: ctx.signal });
if (!res.ok) {
const body = await res.text().catch(() => '');
throw new ProviderError('jina-reader', `HTTP ${res.status} ${body.slice(0, 200)}`);
}
type JinaResponse = {
data?: {
title?: string;
url?: string;
content?: string;
description?: string;
publishedTime?: string;
};
};
const data = (await res.json()) as JinaResponse;
const d = data.data ?? {};
const content = (d.content ?? '').slice(0, options.maxLength ?? 50000);
const wordCount = content.split(/\s+/).filter(Boolean).length;
return {
content: {
url: d.url ?? url,
title: d.title ?? '',
content,
excerpt: d.description,
publishedAt: d.publishedTime,
wordCount,
providerRaw: data,
},
rawLatencyMs: Math.round(performance.now() - t0),
};
},
};
}

@ -0,0 +1,43 @@
/**
* Readability Extract Provider: wraps mana-search /api/v1/extract (go-readability).
* Free (self-hosted), no JS rendering. Good baseline for simple HTML.
*/
import type { ExtractProvider } from '@mana/shared-research';
import { ProviderError } from '../../lib/errors';
import type { ManaSearchClient } from '../../clients/mana-search';
export function createReadabilityProvider(client: ManaSearchClient): ExtractProvider {
return {
id: 'readability',
requiresApiKey: false,
capabilities: {
jsRendering: false,
pdfSupport: false,
markdownOutput: true,
},
async extract(url, options, ctx) {
const t0 = performance.now();
const res = await client.extract(url, { maxLength: options.maxLength, signal: ctx.signal });
if (!res.success || !res.content) {
throw new ProviderError('readability', 'extraction failed');
}
const c = res.content;
return {
content: {
url,
title: c.title,
content: c.markdown ?? c.text,
author: c.author,
siteName: c.siteName,
publishedAt: c.publishedDate,
wordCount: c.wordCount,
providerRaw: res,
},
rawLatencyMs: Math.round(performance.now() - t0),
};
},
};
}

@ -0,0 +1,85 @@
/**
* Provider registry: maps provider IDs to their instances + metadata.
*/
import type {
AgentProviderId,
ExtractProvider,
ExtractProviderId,
ProviderId,
SearchProvider,
SearchProviderId,
} from '@mana/shared-research';
import { BadRequestError } from '../lib/errors';
import type { ManaSearchClient } from '../clients/mana-search';
import { createBraveProvider } from './search/brave';
import { createDuckDuckGoProvider } from './search/duckduckgo';
import { createExaProvider } from './search/exa';
import { createSearxngProvider } from './search/searxng';
import { createSerperProvider } from './search/serper';
import { createTavilyProvider } from './search/tavily';
import { createFirecrawlProvider } from './extract/firecrawl';
import { createJinaReaderProvider } from './extract/jina-reader';
import { createReadabilityProvider } from './extract/readability';
export interface ProviderRegistry {
search: Map<SearchProviderId, SearchProvider>;
extract: Map<ExtractProviderId, ExtractProvider>;
}
export function buildRegistry(deps: { manaSearch: ManaSearchClient }): ProviderRegistry {
const search = new Map<SearchProviderId, SearchProvider>();
search.set('searxng', createSearxngProvider(deps.manaSearch));
search.set('duckduckgo', createDuckDuckGoProvider());
search.set('brave', createBraveProvider());
search.set('tavily', createTavilyProvider());
search.set('exa', createExaProvider());
search.set('serper', createSerperProvider());
const extract = new Map<ExtractProviderId, ExtractProvider>();
extract.set('readability', createReadabilityProvider(deps.manaSearch));
extract.set('jina-reader', createJinaReaderProvider());
extract.set('firecrawl', createFirecrawlProvider());
return { search, extract };
}
export function getSearchProvider(registry: ProviderRegistry, id: string): SearchProvider {
const provider = registry.search.get(id as SearchProviderId);
if (!provider) {
throw new BadRequestError(`Unknown search provider: ${id}`, {
available: [...registry.search.keys()],
});
}
return provider;
}
export function getExtractProvider(registry: ProviderRegistry, id: string): ExtractProvider {
const provider = registry.extract.get(id as ExtractProviderId);
if (!provider) {
throw new BadRequestError(`Unknown extract provider: ${id}`, {
available: [...registry.extract.keys()],
});
}
return provider;
}
export function listProviders(registry: ProviderRegistry) {
return {
search: [...registry.search.values()].map((p) => ({
id: p.id,
category: 'search' as const,
requiresApiKey: p.requiresApiKey,
capabilities: p.capabilities,
})),
extract: [...registry.extract.values()].map((p) => ({
id: p.id,
category: 'extract' as const,
requiresApiKey: p.requiresApiKey,
capabilities: p.capabilities,
})),
agent: [] as Array<{ id: AgentProviderId; category: 'agent'; requiresApiKey: boolean }>,
};
}
export type { ProviderId };

@ -0,0 +1,74 @@
/**
* Brave Search API provider.
* Docs: https://api.search.brave.com/app/documentation/web-search/get-started
*
* Pay-per-use (Data for Search plan, $5 / 1k queries). Requires X-Subscription-Token header.
*/
import type { SearchProvider } from '@mana/shared-research';
import { ProviderError, ProviderNotConfiguredError } from '../../lib/errors';
interface BraveApiResponse {
web?: {
results?: Array<{
url: string;
title: string;
description: string;
age?: string;
page_age?: string;
profile?: { name?: string };
}>;
};
}
export function createBraveProvider(): SearchProvider {
return {
id: 'brave',
requiresApiKey: true,
capabilities: {
webSearch: true,
newsSearch: true,
},
async search(query, options, ctx) {
if (!ctx.apiKey) throw new ProviderNotConfiguredError('brave');
const t0 = performance.now();
const params = new URLSearchParams({
q: query,
count: String(options.limit ?? 10),
});
if (options.language) params.set('search_lang', options.language.split('-')[0]);
if (options.safeSearch !== undefined) {
params.set('safesearch', ['off', 'moderate', 'strict'][options.safeSearch] || 'moderate');
}
const res = await fetch(`https://api.search.brave.com/res/v1/web/search?${params}`, {
headers: {
Accept: 'application/json',
'X-Subscription-Token': ctx.apiKey,
},
signal: ctx.signal,
});
if (!res.ok) {
const body = await res.text().catch(() => '');
throw new ProviderError('brave', `HTTP ${res.status} ${body.slice(0, 200)}`);
}
const data = (await res.json()) as BraveApiResponse;
const webResults = data.web?.results ?? [];
return {
results: webResults.map((r) => ({
url: r.url,
title: r.title,
snippet: r.description,
publishedAt: r.page_age ?? r.age,
author: r.profile?.name,
providerRaw: r,
})),
rawLatencyMs: Math.round(performance.now() - t0),
};
},
};
}

@ -0,0 +1,68 @@
/**
* DuckDuckGo provider: uses the DDG Instant Answer API (api.duckduckgo.com,
* format=json), which returns an abstract plus related topics rather than a
* full result page. Zero-auth, zero-cost, but heavily rate-limited in practice.
*
* For Phase 1 we keep this minimal: good as a free fallback / sanity check.
*/
import type { SearchProvider, SearchHit } from '@mana/shared-research';
import { ProviderError } from '../../lib/errors';
interface DDGInstantAnswer {
Heading?: string;
Abstract?: string;
AbstractURL?: string;
RelatedTopics?: Array<{
Text?: string;
FirstURL?: string;
Icon?: unknown;
}>;
}
export function createDuckDuckGoProvider(): SearchProvider {
return {
id: 'duckduckgo',
requiresApiKey: false,
capabilities: {
webSearch: true,
},
async search(query, options, ctx) {
const t0 = performance.now();
const url = `https://api.duckduckgo.com/?q=${encodeURIComponent(query)}&format=json&no_html=1&no_redirect=1`;
const res = await fetch(url, {
signal: ctx.signal,
headers: { 'User-Agent': 'Mozilla/5.0 (mana-research)' },
});
if (!res.ok) {
throw new ProviderError('duckduckgo', `HTTP ${res.status}`);
}
const data = (await res.json()) as DDGInstantAnswer;
const hits: SearchHit[] = [];
if (data.AbstractURL && data.Abstract) {
hits.push({
url: data.AbstractURL,
title: data.Heading ?? query,
snippet: data.Abstract,
providerRaw: data,
});
}
const limit = options.limit ?? 10;
for (const topic of data.RelatedTopics ?? []) {
// Check the cap before pushing so the abstract hit counts toward the limit.
if (hits.length >= limit) break;
if (!topic.FirstURL || !topic.Text) continue;
hits.push({
url: topic.FirstURL,
title: topic.Text.slice(0, 80),
snippet: topic.Text,
providerRaw: topic,
});
}
return {
results: hits,
rawLatencyMs: Math.round(performance.now() - t0),
};
},
};
}

@ -0,0 +1,76 @@
/**
* Exa (formerly Metaphor): semantic/neural search.
* Docs: https://docs.exa.ai/reference/search
*
* Best for "find similar to this", academic papers, long-tail technical queries.
* Pay-per-use, ~$0.001-0.01/query depending on options.
*/
import type { SearchProvider } from '@mana/shared-research';
import { ProviderError, ProviderNotConfiguredError } from '../../lib/errors';
interface ExaApiResponse {
results: Array<{
id: string;
url: string;
title: string;
publishedDate?: string;
author?: string;
score?: number;
text?: string;
}>;
}
export function createExaProvider(): SearchProvider {
return {
id: 'exa',
requiresApiKey: true,
capabilities: {
webSearch: true,
semanticSearch: true,
scholarSearch: true,
contentInResults: true,
},
async search(query, options, ctx) {
if (!ctx.apiKey) throw new ProviderNotConfiguredError('exa');
const t0 = performance.now();
const res = await fetch('https://api.exa.ai/search', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-api-key': ctx.apiKey,
},
body: JSON.stringify({
query,
numResults: Math.min(options.limit ?? 10, 25),
type: 'neural',
useAutoprompt: true,
contents: { text: { maxCharacters: 2000 } },
}),
signal: ctx.signal,
});
if (!res.ok) {
const body = await res.text().catch(() => '');
throw new ProviderError('exa', `HTTP ${res.status} ${body.slice(0, 200)}`);
}
const data = (await res.json()) as ExaApiResponse;
return {
results: data.results.map((r) => ({
url: r.url,
title: r.title,
snippet: r.text?.slice(0, 300) ?? '',
content: r.text,
publishedAt: r.publishedDate,
author: r.author,
score: r.score,
providerRaw: r,
})),
rawLatencyMs: Math.round(performance.now() - t0),
};
},
};
}

@ -0,0 +1,48 @@
/**
* SearXNG provider: wraps mana-search (Go service on port 3021).
* Free, self-hosted, no API key. Always available.
*/
import type { SearchProvider } from '@mana/shared-research';
import type { ManaSearchClient } from '../../clients/mana-search';
export function createSearxngProvider(client: ManaSearchClient): SearchProvider {
return {
id: 'searxng',
requiresApiKey: false,
capabilities: {
webSearch: true,
newsSearch: true,
scholarSearch: true,
},
async search(query, options, ctx) {
const t0 = performance.now();
const categoryMap: Record<string, string> = {
general: 'general',
news: 'news',
science: 'science',
it: 'it',
};
const categories = options.categories?.map((c) => categoryMap[c]).filter(Boolean);
const res = await client.search(query, {
limit: options.limit,
categories,
language: options.language,
signal: ctx.signal,
});
return {
results: res.results.map((r) => ({
url: r.url,
title: r.title,
snippet: r.snippet,
publishedAt: r.publishedDate,
score: r.score,
providerRaw: r,
})),
rawLatencyMs: Math.round(performance.now() - t0),
};
},
};
}

@ -0,0 +1,78 @@
/**
* Serper: Google SERP as JSON.
* Docs: https://serper.dev/
*
* Good for classic Google search coverage (incl. People Also Ask, Knowledge Panel).
* $0.30-1 / 1k queries. Pay-per-use.
*/
import type { SearchProvider } from '@mana/shared-research';
import { ProviderError, ProviderNotConfiguredError } from '../../lib/errors';
interface SerperApiResponse {
organic?: Array<{
title: string;
link: string;
snippet: string;
date?: string;
position?: number;
}>;
answerBox?: {
title?: string;
answer?: string;
snippet?: string;
link?: string;
};
}
export function createSerperProvider(): SearchProvider {
return {
id: 'serper',
requiresApiKey: true,
capabilities: {
webSearch: true,
newsSearch: true,
},
async search(query, options, ctx) {
if (!ctx.apiKey) throw new ProviderNotConfiguredError('serper');
const t0 = performance.now();
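// Language tag such as "de-DE": language code first, optional region second.
const [lang, region] = (options.language ?? 'de-DE').toLowerCase().split('-');
const res = await fetch('https://google.serper.dev/search', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'X-API-KEY': ctx.apiKey,
},
body: JSON.stringify({
q: query,
num: Math.min(options.limit ?? 10, 20),
gl: region || lang, // country; falls back to the language code when no region is given
hl: lang, // interface language
}),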
signal: ctx.signal,
});
if (!res.ok) {
const body = await res.text().catch(() => '');
throw new ProviderError('serper', `HTTP ${res.status} ${body.slice(0, 200)}`);
}
const data = (await res.json()) as SerperApiResponse;
const organic = data.organic ?? [];
return {
results: organic.map((r) => ({
url: r.link,
title: r.title,
snippet: r.snippet,
publishedAt: r.date,
score: r.position ? 1 - r.position / 100 : undefined,
providerRaw: r,
})),
rawLatencyMs: Math.round(performance.now() - t0),
};
},
};
}

@ -0,0 +1,75 @@
/**
* Tavily Search API: optimized for LLM agents. Returns extracted content
* alongside links, which is why we map it to `content` on SearchHit.
*
* Docs: https://docs.tavily.com/docs/rest-api/api-reference
* Billing: credit-packs (pay-per-use, no subscription lock-in).
*/
import type { SearchProvider } from '@mana/shared-research';
import { ProviderError, ProviderNotConfiguredError } from '../../lib/errors';
interface TavilyApiResponse {
query: string;
answer?: string;
results: Array<{
url: string;
title: string;
content: string;
score: number;
published_date?: string;
}>;
}
export function createTavilyProvider(): SearchProvider {
return {
id: 'tavily',
requiresApiKey: true,
capabilities: {
webSearch: true,
newsSearch: true,
contentInResults: true,
},
async search(query, options, ctx) {
if (!ctx.apiKey) throw new ProviderNotConfiguredError('tavily');
const t0 = performance.now();
const topic = options.categories?.includes('news') ? 'news' : 'general';
const maxResults = Math.min(options.limit ?? 10, 20);
const res = await fetch('https://api.tavily.com/search', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
api_key: ctx.apiKey,
query,
topic,
max_results: maxResults,
include_answer: false,
search_depth: 'basic',
}),
signal: ctx.signal,
});
if (!res.ok) {
const body = await res.text().catch(() => '');
throw new ProviderError('tavily', `HTTP ${res.status} ${body.slice(0, 200)}`);
}
const data = (await res.json()) as TavilyApiResponse;
return {
results: data.results.map((r) => ({
url: r.url,
title: r.title,
snippet: r.content.slice(0, 300),
content: r.content,
publishedAt: r.published_date,
score: r.score,
providerRaw: r,
})),
rawLatencyMs: Math.round(performance.now() - t0),
};
},
};
}

@ -0,0 +1,70 @@
/**
* Auto-router: maps QueryType + available providers to an ordered preference
* list. The first provider in that list that either needs no API key or has
* one configured wins.
*/
import type { SearchProviderId, ExtractProviderId } from '@mana/shared-research';
import type { Config } from '../config';
import type { QueryType } from './classify';
export const SEARCH_ROUTE_MAP: Record<QueryType, SearchProviderId[]> = {
news: ['tavily', 'brave', 'serper', 'searxng', 'duckduckgo'],
general: ['brave', 'tavily', 'serper', 'searxng', 'duckduckgo'],
semantic: ['exa', 'tavily', 'brave', 'searxng'],
academic: ['exa', 'searxng', 'brave', 'tavily'],
code: ['exa', 'serper', 'brave', 'searxng'],
conversational: ['tavily', 'brave', 'serper', 'searxng'],
};
export const EXTRACT_ROUTE_DEFAULT: ExtractProviderId[] = [
'firecrawl',
'jina-reader',
'readability',
];
/**
* Pick the first provider in `preferences` that has a configured key (or
* doesn't need one). Returns `null` if no provider is usable; the caller
* should then fall back to a free provider.
*/
export function pickSearchProvider(
preferences: SearchProviderId[],
config: Config,
alwaysAvailable: Set<SearchProviderId> = new Set(['searxng', 'duckduckgo'])
): SearchProviderId | null {
const envMap: Record<SearchProviderId, keyof Config['providerKeys'] | null> = {
searxng: null,
duckduckgo: null,
brave: 'brave',
tavily: 'tavily',
exa: 'exa',
serper: 'serper',
};
for (const id of preferences) {
if (alwaysAvailable.has(id)) return id;
const envKey = envMap[id];
if (envKey && config.providerKeys[envKey]) return id;
}
return null;
}
export function pickExtractProvider(
preferences: ExtractProviderId[],
config: Config,
alwaysAvailable: Set<ExtractProviderId> = new Set(['readability', 'jina-reader'])
): ExtractProviderId | null {
const envMap: Record<ExtractProviderId, keyof Config['providerKeys'] | null> = {
readability: null,
'jina-reader': 'jina',
firecrawl: 'firecrawl',
scrapingbee: 'scrapingbee',
};
for (const id of preferences) {
if (alwaysAvailable.has(id)) return id;
const envKey = envMap[id];
if (envKey && config.providerKeys[envKey]) return id;
}
return null;
}
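
The selection rule can be illustrated with a small stand-alone sketch. It copies two preference lists from `SEARCH_ROUTE_MAP` and replaces `Config['providerKeys']` with a plain map; `pick`, `FREE`, and the placeholder key are hypothetical and exist only for this example.

```typescript
type SearchProviderId = 'searxng' | 'duckduckgo' | 'brave' | 'tavily' | 'exa' | 'serper';

// Preference lists copied from SEARCH_ROUTE_MAP above (two query types only).
const ROUTES: Record<'news' | 'academic', SearchProviderId[]> = {
  news: ['tavily', 'brave', 'serper', 'searxng', 'duckduckgo'],
  academic: ['exa', 'searxng', 'brave', 'tavily'],
};

// Providers that need no key, mirroring the default `alwaysAvailable` set.
const FREE = new Set<SearchProviderId>(['searxng', 'duckduckgo']);

// Simplified stand-in for pickSearchProvider: `keys` is a hypothetical map of
// provider ID to configured API key.
function pick(
  preferences: SearchProviderId[],
  keys: Partial<Record<SearchProviderId, string>>
): SearchProviderId | null {
  for (const id of preferences) {
    if (FREE.has(id) || keys[id]) return id;
  }
  return null;
}

const keys = { exa: 'exa-key-placeholder' };
console.log(pick(ROUTES.academic, keys)); // exa
console.log(pick(ROUTES.news, keys)); // searxng
```

One consequence of this rule is that entries listed after the first always-available provider are never selected. In the `academic` list, for instance, `brave` and `tavily` come after `searxng` and therefore cannot win, even when their keys are configured.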

@ -0,0 +1,107 @@
/**
* Query classifier: maps a free-text query to a QueryType.
*
* Hybrid strategy:
* 1. Regex fast path (no network, ~0ms): catches the obvious cases
* (URLs, "paper"/"arxiv", "news"/"latest", "github"/"code", etc.)
* 2. Optional mana-llm fallback for ambiguous queries. Off by default;
* callers opt in via `useLlm: true` when latency is OK.
*/
import type { ManaLlmClient } from '../clients/mana-llm';
export type QueryType = 'news' | 'general' | 'semantic' | 'academic' | 'code' | 'conversational';
export interface ClassifyOptions {
useLlm?: boolean;
signal?: AbortSignal;
llm?: ManaLlmClient;
}
const NEWS_PATTERNS = [
/\b(latest|recent|news|heute|aktuell|today|yesterday|breaking|gerade|neueste|this week)\b/i,
];
const ACADEMIC_PATTERNS = [
/\b(paper|arxiv|research|study|studie|journal|doi|citation|pubmed|nature|science)\b/i,
];
const CODE_PATTERNS = [
/\b(github|code|function|library|framework|npm package|pip install|error:|exception:|stack trace)\b/i,
/[a-z_]+\([^)]*\)/i,
];
const CONVERSATIONAL_PATTERNS = [
/^(how|why|what|when|where|who|can you|could you|should i|is there|erklär|explain|zusammenfass)/i,
/\?\s*$/,
];
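const SEMANTIC_PATTERNS = [
// \b is ASCII-only in JavaScript, so it never matches before "ä".
// Unicode-aware lookarounds keep the German alternatives reachable.
/(?<![\p{L}\p{N}_])(similar to|like this|related to|ähnlich|vergleichbar|find sites like)(?![\p{L}\p{N}_])/iu,
/^https?:\/\//,
];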
export function classifyRegex(query: string): { type: QueryType; confidence: number } {
const q = query.trim();
for (const p of SEMANTIC_PATTERNS) if (p.test(q)) return { type: 'semantic', confidence: 0.9 };
for (const p of ACADEMIC_PATTERNS) if (p.test(q)) return { type: 'academic', confidence: 0.8 };
for (const p of CODE_PATTERNS) if (p.test(q)) return { type: 'code', confidence: 0.8 };
for (const p of NEWS_PATTERNS) if (p.test(q)) return { type: 'news', confidence: 0.7 };
for (const p of CONVERSATIONAL_PATTERNS)
if (p.test(q)) return { type: 'conversational', confidence: 0.6 };
return { type: 'general', confidence: 0.4 };
}
const LLM_PROMPT = `You are a query classifier. Given a user search query, respond with exactly one word from this list: news, general, semantic, academic, code, conversational.
Guidelines:
- news: current events, latest updates, breaking stories
- academic: research papers, scientific literature, DOIs, arXiv
- code: programming questions, libraries, errors, GitHub
- semantic: "find similar to", URL-based, related-to queries
- conversational: open-ended questions, "how does X work"
- general: anything else
Respond with just the label, no punctuation.`;
export async function classify(
query: string,
opts: ClassifyOptions = {}
): Promise<{ type: QueryType; confidence: number; source: 'regex' | 'llm' }> {
const regex = classifyRegex(query);
if (!opts.useLlm || !opts.llm || regex.confidence >= 0.8) {
return { ...regex, source: 'regex' };
}
try {
const { content } = await opts.llm.chat(
[
{ role: 'system', content: LLM_PROMPT },
{ role: 'user', content: query },
],
{ maxTokens: 10, temperature: 0, signal: opts.signal }
);
const raw = content
.trim()
.toLowerCase()
.replace(/[^a-z]/g, '');
const valid: QueryType[] = [
'news',
'general',
'semantic',
'academic',
'code',
'conversational',
];
if ((valid as string[]).includes(raw)) {
return { type: raw as QueryType, confidence: 0.9, source: 'llm' };
}
} catch {
/* fall through */
}
return { ...regex, source: 'regex' };
}
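
The interaction between the regex confidences and the `regex.confidence >= 0.8` short-circuit in `classify` determines which query types can ever reach the LLM. A minimal sketch that derives this from the values used in `classifyRegex` is shown below; `REGEX_CONFIDENCE`, `LLM_SKIP_THRESHOLD`, and `llmEligibleTypes` are hypothetical names introduced for illustration.

```typescript
type QueryType = 'news' | 'general' | 'semantic' | 'academic' | 'code' | 'conversational';

// Confidence values returned by classifyRegex() above, in its check order.
const REGEX_CONFIDENCE: Record<QueryType, number> = {
  semantic: 0.9,
  academic: 0.8,
  code: 0.8,
  news: 0.7,
  conversational: 0.6,
  general: 0.4,
};

// classify() skips the LLM when the regex confidence is at or above this value.
const LLM_SKIP_THRESHOLD = 0.8;

// Query types whose regex result may still be overridden by the LLM
// (only when the caller passes useLlm: true and an llm client).
function llmEligibleTypes(): QueryType[] {
  return (Object.keys(REGEX_CONFIDENCE) as QueryType[]).filter(
    (t) => REGEX_CONFIDENCE[t] < LLM_SKIP_THRESHOLD
  );
}

console.log(llmEligibleTypes().join(', ')); // news, conversational, general
```

Under these values, a query that the regex labels `semantic`, `academic`, or `code` is never sent to mana-llm, even with `useLlm: true`.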

@ -0,0 +1,157 @@
/**
* POST /v1/extract: single-provider extract (auto-routed when `provider` is omitted)
* POST /v1/extract/compare: fan-out to up to 4 providers
*/
import { Hono } from 'hono';
import { z } from 'zod';
import type { ExtractedContent } from '@mana/shared-research';
import { EXTRACT_PROVIDER_IDS, extractOptionsSchema } from '@mana/shared-research';
import type { ExecutorDeps } from '../executor/execute-extract';
import { executeExtract } from '../executor/execute-extract';
import type { HonoEnv } from '../lib/hono-env';
import type { ProviderRegistry } from '../providers/registry';
import { getExtractProvider } from '../providers/registry';
import type { RunStorage } from '../storage/runs';
import { BadRequestError } from '../lib/errors';
import type { Config } from '../config';
import { EXTRACT_ROUTE_DEFAULT, pickExtractProvider } from '../router/auto-route';
const MAX_COMPARE_PROVIDERS = 4;
const extractBodySchema = z.object({
url: z.string().url(),
provider: z.enum(EXTRACT_PROVIDER_IDS).optional(),
options: extractOptionsSchema.optional(),
});
const extractCompareBodySchema = z.object({
url: z.string().url(),
providers: z.array(z.enum(EXTRACT_PROVIDER_IDS)).min(1).max(MAX_COMPARE_PROVIDERS),
options: extractOptionsSchema.optional(),
});
export function createExtractRoutes(
registry: ProviderRegistry,
storage: RunStorage,
deps: ExecutorDeps,
config: Config
) {
return new Hono<HonoEnv>()
.post('/', async (c) => {
const user = c.get('user');
const body = extractBodySchema.parse(await c.req.json());
const providerId =
body.provider ?? pickExtractProvider(EXTRACT_ROUTE_DEFAULT, config) ?? 'readability';
const provider = getExtractProvider(registry, providerId);
const run = await storage.createRun({
userId: user.userId,
query: body.url,
mode: body.provider ? 'single' : 'auto',
category: 'extract',
providersRequested: [providerId],
billingMode: provider.requiresApiKey ? 'server-key' : 'free',
});
const out = await executeExtract(
{
provider,
url: body.url,
options: body.options ?? {},
userId: user.userId,
},
deps
);
await storage.addResult({
runId: run.id,
providerId,
success: out.success,
latencyMs: out.meta.latencyMs,
costCredits: out.meta.costCredits,
billingMode: out.meta.billingMode,
cacheHit: out.meta.cacheHit,
normalizedResult: out.data ?? null,
errorCode: out.meta.errorCode ?? null,
});
if (out.meta.costCredits > 0) {
await storage.finalizeRunCost(run.id, out.meta.costCredits);
}
return c.json({
runId: run.id,
url: body.url,
provider: providerId,
success: out.success,
data: out.data,
meta: out.meta,
});
})
.post('/compare', async (c) => {
const user = c.get('user');
const body = extractCompareBodySchema.parse(await c.req.json());
if (new Set(body.providers).size !== body.providers.length) {
throw new BadRequestError('Duplicate providers in request');
}
const providers = body.providers.map((id) => getExtractProvider(registry, id));
const anyPaid = providers.some((p) => p.requiresApiKey);
const run = await storage.createRun({
userId: user.userId,
query: body.url,
mode: 'compare',
category: 'extract',
providersRequested: body.providers,
billingMode: anyPaid ? 'mixed' : 'free',
});
const settled = await Promise.all(
providers.map((provider) =>
executeExtract(
{
provider,
url: body.url,
options: body.options ?? {},
userId: user.userId,
},
deps
)
)
);
let totalCost = 0;
for (let i = 0; i < providers.length; i++) {
const out = settled[i];
totalCost += out.meta.costCredits;
await storage.addResult({
runId: run.id,
providerId: providers[i].id,
success: out.success,
latencyMs: out.meta.latencyMs,
costCredits: out.meta.costCredits,
billingMode: out.meta.billingMode,
cacheHit: out.meta.cacheHit,
normalizedResult: out.data ?? null,
errorCode: out.meta.errorCode ?? null,
});
}
if (totalCost > 0) await storage.finalizeRunCost(run.id, totalCost);
return c.json({
runId: run.id,
url: body.url,
results: providers.map((provider, i) => ({
provider: provider.id,
success: settled[i].success,
data: settled[i].data as { content: ExtractedContent } | undefined,
meta: settled[i].meta,
})),
});
});
}
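
The compare handler above does three pieces of bookkeeping around the fan-out: it rejects duplicate provider ids, derives a run-level billing mode, and sums the per-provider cost. A minimal sketch of just that bookkeeping follows, without Hono or storage. All type and function names here are hypothetical, and the duplicate check uses `indexOf` as a stand-in for the `Set` size comparison in the route.

```typescript
// Illustrative sketch only: types and helper names are hypothetical.
interface ProviderInfo {
  id: string;
  requiresApiKey: boolean;
}

interface Outcome {
  success: boolean;
  costCredits: number;
}

// True when any id occurs more than once. Equivalent in result to comparing
// a Set's size with the array length, as the route does.
function hasDuplicates(ids: string[]): boolean {
  return ids.some((id, i) => ids.indexOf(id) !== i);
}

// Run-level billing mode: 'mixed' as soon as one provider needs an API key.
function runBillingMode(providers: ProviderInfo[]): 'mixed' | 'free' {
  return providers.some((p) => p.requiresApiKey) ? 'mixed' : 'free';
}

// Sum the credits over all provider outcomes, whether they succeeded or not.
function totalCost(outcomes: Outcome[]): number {
  return outcomes.reduce((sum, o) => sum + o.costCredits, 0);
}
```

Note that the billing mode is decided before any provider runs, while the total cost is only known afterwards, which is why the route writes it in a separate `finalizeRunCost` step.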

@ -0,0 +1,10 @@
import { Hono } from 'hono';
export const healthRoutes = new Hono().get('/', (c) =>
c.json({
status: 'ok',
service: 'mana-research',
version: '0.1.0',
timestamp: new Date().toISOString(),
})
);

@ -0,0 +1,77 @@
/**
* GET /v1/providers: registered providers with capabilities + pricing
* GET /v1/providers/health: which providers are currently usable (key present)
*/
import { Hono } from 'hono';
import type { ProviderRegistry } from '../providers/registry';
import { listProviders } from '../providers/registry';
import { PROVIDER_PRICING } from '../lib/pricing';
import type { Config } from '../config';
export function createProvidersRoutes(registry: ProviderRegistry, config: Config) {
return new Hono()
.get('/', (c) => {
const list = listProviders(registry);
return c.json({
search: list.search.map((p) => ({ ...p, pricing: PROVIDER_PRICING[p.id] })),
extract: list.extract.map((p) => ({ ...p, pricing: PROVIDER_PRICING[p.id] })),
agent: list.agent,
});
})
.get('/health', (c) => {
const keys = config.providerKeys;
type Entry = {
id: string;
category: 'search' | 'extract' | 'agent';
requiresApiKey: boolean;
serverKeyAvailable: boolean;
status: 'ready' | 'needs-key' | 'free';
};
const check = (
id: string,
requiresKey: boolean,
serverKeyPresent: boolean
): Entry['status'] => {
if (!requiresKey) return 'free';
return serverKeyPresent ? 'ready' : 'needs-key';
};
const keyMap: Record<string, boolean> = {
brave: !!keys.brave,
tavily: !!keys.tavily,
exa: !!keys.exa,
serper: !!keys.serper,
'jina-reader': !!keys.jina,
firecrawl: !!keys.firecrawl,
scrapingbee: !!keys.scrapingbee,
};
const list = listProviders(registry);
const entries: Entry[] = [
...list.search.map((p) => ({
id: p.id,
category: 'search' as const,
requiresApiKey: p.requiresApiKey,
serverKeyAvailable: !!keyMap[p.id],
status: check(p.id, p.requiresApiKey, !!keyMap[p.id]),
})),
...list.extract.map((p) => ({
id: p.id,
category: 'extract' as const,
requiresApiKey: p.requiresApiKey,
serverKeyAvailable: !!keyMap[p.id],
status: check(p.id, p.requiresApiKey, !!keyMap[p.id]),
})),
];
return c.json({
providers: entries,
summary: {
ready: entries.filter((e) => e.status === 'ready' || e.status === 'free').length,
total: entries.length,
},
});
});
}
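
The health endpoint above classifies each provider as `free`, `ready`, or `needs-key`, and counts the first two as usable. A minimal sketch of that rule in isolation follows. `ProviderMeta`, `statusFor`, and `countUsable` are hypothetical names, and the key map is a plain object standing in for `config.providerKeys`.

```typescript
// Illustrative sketch only: names are hypothetical, not from the commit.
type Status = 'ready' | 'needs-key' | 'free';

interface ProviderMeta {
  id: string;
  requiresApiKey: boolean;
}

// A provider that needs no key is always 'free'; otherwise its status
// depends on whether a server-side key is present for its id.
function statusFor(p: ProviderMeta, keyPresent: { [id: string]: boolean }): Status {
  if (!p.requiresApiKey) return 'free';
  return keyPresent[p.id] ? 'ready' : 'needs-key';
}

// Count providers that can be called right now ('ready' or 'free').
function countUsable(providers: ProviderMeta[], keyPresent: { [id: string]: boolean }): number {
  return providers.filter((p) => statusFor(p, keyPresent) !== 'needs-key').length;
}
```

A provider id that is missing from the key map is treated the same as one whose key is absent, so a newly registered paid provider reports `needs-key` until its key is wired in.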

@ -0,0 +1,44 @@
/**
* /v1/runs: user's saved eval runs + per-result rating.
*/
import { Hono } from 'hono';
import { z } from 'zod';
import type { HonoEnv } from '../lib/hono-env';
import { BadRequestError, NotFoundError } from '../lib/errors';
import type { RunStorage } from '../storage/runs';
const rateSchema = z.object({
rating: z.number().int().min(1).max(5),
notes: z.string().max(2000).optional(),
});
export function createRunsRoutes(storage: RunStorage) {
return new Hono<HonoEnv>()
.get('/', async (c) => {
const user = c.get('user');
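const limitRaw = parseInt(c.req.query('limit') || '50', 10);
const offsetRaw = parseInt(c.req.query('offset') || '0', 10);
// Non-numeric input parses to NaN, which Math.min would pass through; use defaults instead.
const limit = Number.isNaN(limitRaw) ? 50 : Math.min(Math.max(limitRaw, 1), 200);
const offset = Number.isNaN(offsetRaw) ? 0 : Math.max(offsetRaw, 0);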
const runs = await storage.listRuns(user.userId, limit, offset);
return c.json({ runs });
})
.get('/:id', async (c) => {
const user = c.get('user');
const id = c.req.param('id');
const out = await storage.getRunWithResults(id, user.userId);
if (!out) throw new NotFoundError('Run not found');
return c.json(out);
})
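// Ownership is verified through the result's parent run inside rateResult;
// the :runId path segment itself is not checked against the result.
.post('/:runId/results/:resultId/rate', async (c) => {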
const user = c.get('user');
const body = rateSchema.parse(await c.req.json());
const ok = await storage.rateResult(
c.req.param('resultId'),
user.userId,
body.rating,
body.notes
);
if (!ok) throw new BadRequestError('Cannot rate this result');
return c.json({ success: true });
});
}
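
The list route reads `limit` and `offset` from the query string, where values may be missing, non-numeric, or out of range. A minimal sketch of a defensive parser for such parameters follows. `parsePageParam` is a hypothetical helper, and the offset cap of 1000000 in the usage note is an assumption chosen only for illustration.

```typescript
// Illustrative sketch only: parsePageParam is a hypothetical helper.
// Parse an integer query parameter, falling back to a default when the value
// is missing or not numeric, and clamping the result into [min, max].
function parsePageParam(
  raw: string | undefined,
  fallback: number,
  min: number,
  max: number
): number {
  const n = parseInt(raw === undefined ? '' : raw, 10);
  if (isNaN(n)) return fallback;
  return Math.min(Math.max(n, min), max);
}
```

A route could then call `parsePageParam(c.req.query('limit'), 50, 1, 200)` and `parsePageParam(c.req.query('offset'), 0, 0, 1000000)` so that the storage layer never receives `NaN` or a negative number.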

@ -0,0 +1,178 @@
/**
* POST /v1/search: single-provider search (or auto-routed if provider omitted)
* POST /v1/search/compare: fan-out to N providers, persist as eval_run
*/
import { Hono } from 'hono';
import { z } from 'zod';
import type { ProviderId, SearchHit } from '@mana/shared-research';
import { SEARCH_PROVIDER_IDS, searchOptionsSchema } from '@mana/shared-research';
import type { ExecutorDeps } from '../executor/execute';
import { executeSearch } from '../executor/execute';
import type { HonoEnv } from '../lib/hono-env';
import type { ProviderRegistry } from '../providers/registry';
import { getSearchProvider } from '../providers/registry';
import type { RunStorage } from '../storage/runs';
import { BadRequestError } from '../lib/errors';
import type { Config } from '../config';
import { SEARCH_ROUTE_MAP, pickSearchProvider } from '../router/auto-route';
import { classify } from '../router/classify';
import type { ManaLlmClient } from '../clients/mana-llm';
const MAX_COMPARE_PROVIDERS = 5;
const searchBodySchema = z.object({
query: z.string().min(1).max(1000),
provider: z.enum(SEARCH_PROVIDER_IDS).optional(),
options: searchOptionsSchema.optional(),
useLlmClassifier: z.boolean().optional(),
});
const compareBodySchema = z.object({
query: z.string().min(1).max(1000),
providers: z.array(z.enum(SEARCH_PROVIDER_IDS)).min(1).max(MAX_COMPARE_PROVIDERS),
options: searchOptionsSchema.optional(),
});
export function createSearchRoutes(
registry: ProviderRegistry,
storage: RunStorage,
deps: ExecutorDeps,
config: Config,
llm: ManaLlmClient
) {
return new Hono<HonoEnv>()
.post('/', async (c) => {
const user = c.get('user');
const body = searchBodySchema.parse(await c.req.json());
// Auto-route when no explicit provider
let providerId: ProviderId;
let queryType: string | undefined;
let mode: 'single' | 'auto' = 'single';
if (body.provider) {
providerId = body.provider;
} else {
mode = 'auto';
const cls = await classify(body.query, {
useLlm: body.useLlmClassifier,
llm,
});
queryType = cls.type;
const picked = pickSearchProvider(SEARCH_ROUTE_MAP[cls.type], config);
providerId = picked ?? 'searxng';
}
const provider = getSearchProvider(registry, providerId);
const run = await storage.createRun({
userId: user.userId,
query: body.query,
queryType,
mode,
category: 'search',
providersRequested: [providerId],
billingMode: provider.requiresApiKey ? 'server-key' : 'free',
});
const out = await executeSearch(
{
provider,
query: body.query,
options: body.options ?? {},
userId: user.userId,
},
deps
);
await storage.addResult({
runId: run.id,
providerId,
success: out.success,
latencyMs: out.meta.latencyMs,
costCredits: out.meta.costCredits,
billingMode: out.meta.billingMode,
cacheHit: out.meta.cacheHit,
normalizedResult: out.data ?? null,
errorCode: out.meta.errorCode ?? null,
});
if (out.meta.costCredits > 0) {
await storage.finalizeRunCost(run.id, out.meta.costCredits);
}
return c.json({
runId: run.id,
query: body.query,
provider: providerId,
queryType,
success: out.success,
data: out.data,
meta: out.meta,
});
})
.post('/compare', async (c) => {
const user = c.get('user');
const body = compareBodySchema.parse(await c.req.json());
if (new Set(body.providers).size !== body.providers.length) {
throw new BadRequestError('Duplicate providers in request');
}
const providers = body.providers.map((id) => getSearchProvider(registry, id));
const anyPaid = providers.some((p) => p.requiresApiKey);
const run = await storage.createRun({
userId: user.userId,
query: body.query,
mode: 'compare',
category: 'search',
providersRequested: body.providers,
billingMode: anyPaid ? 'mixed' : 'free',
});
const settled = await Promise.all(
providers.map((provider) =>
executeSearch(
{
provider,
query: body.query,
options: body.options ?? {},
userId: user.userId,
},
deps
)
)
);
let totalCost = 0;
for (let i = 0; i < providers.length; i++) {
const out = settled[i];
totalCost += out.meta.costCredits;
await storage.addResult({
runId: run.id,
providerId: providers[i].id,
success: out.success,
latencyMs: out.meta.latencyMs,
costCredits: out.meta.costCredits,
billingMode: out.meta.billingMode,
cacheHit: out.meta.cacheHit,
normalizedResult: out.data ?? null,
errorCode: out.meta.errorCode ?? null,
});
}
if (totalCost > 0) await storage.finalizeRunCost(run.id, totalCost);
return c.json({
runId: run.id,
query: body.query,
results: providers.map((provider, i) => ({
provider: provider.id,
success: settled[i].success,
data: settled[i].data as { results: SearchHit[] } | undefined,
meta: settled[i].meta,
})),
});
});
}
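
When `provider` is omitted, the search route classifies the query, looks up candidate providers for that query type, and falls back to `searxng` if none is available. A minimal sketch of the selection step follows. The real `SEARCH_ROUTE_MAP` and `pickSearchProvider` are not shown in this part of the diff, so the route table below is an assumption: only the first choices for news and academic follow the commit message (news → tavily, academic → exa), and "available" is assumed to mean "a server key is configured".

```typescript
// Illustrative sketch only: the route table and helper are assumptions.
type ProviderId = 'searxng' | 'duckduckgo' | 'brave' | 'tavily' | 'exa' | 'serper';
type QueryType = 'news' | 'academic' | 'general';

// Ordered candidates per query type; the first available one wins.
const ROUTES: { [K in QueryType]: ProviderId[] } = {
  news: ['tavily', 'brave'],
  academic: ['exa', 'serper'],
  general: ['brave', 'serper'],
};

// Return the first candidate with a configured key, or 'searxng' as the
// fallback when no keyed provider is available.
function pickProvider(type: QueryType, keys: { [id: string]: string | undefined }): ProviderId {
  for (const id of ROUTES[type]) {
    if (keys[id]) return id;
  }
  return 'searxng';
}
```

Keeping the candidate lists ordered means a missing key degrades routing gradually (second choice, then the fallback) instead of failing the request.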

@ -0,0 +1,31 @@
/**
* Access to research.provider_configs: per-user BYO-key + budget storage.
*
* Phase 1: plaintext storage with a TODO for encryption in Phase 4 when the
* Settings UI ships. Column name is still `apiKeyEncrypted` so the schema
* doesn't need a rename later.
*/
import { and, eq } from 'drizzle-orm';
import type { Database } from '../db/connection';
import { providerConfigs } from '../db/schema/research';
import type { ProviderConfig } from '../db/schema/research';
export class ConfigStorage {
constructor(private db: Database) {}
async getForUser(userId: string, providerId: string): Promise<ProviderConfig | null> {
const [row] = await this.db
.select()
.from(providerConfigs)
.where(and(eq(providerConfigs.userId, userId), eq(providerConfigs.providerId, providerId)))
.limit(1);
return row ?? null;
}
async decryptKey(config: ProviderConfig): Promise<string | null> {
// TODO (Phase 4): AES-GCM-256 decryption via MANA_RESEARCH_KEK or mana-auth KEK wrapping.
// Phase 1: plaintext passthrough — BYO-key UX isn't built yet, so this path stays unused.
return config.apiKeyEncrypted ?? null;
}
}

@ -0,0 +1,105 @@
/**
* Persist eval runs + results to research.* tables.
*/
import { desc, eq, sql } from 'drizzle-orm';
import type { Database } from '../db/connection';
import { evalResults, evalRuns, providerStats } from '../db/schema/research';
import type { EvalRun, EvalResult, NewEvalRun, NewEvalResult } from '../db/schema/research';
export class RunStorage {
constructor(private db: Database) {}
async createRun(input: NewEvalRun): Promise<EvalRun> {
const [row] = await this.db.insert(evalRuns).values(input).returning();
return row;
}
async addResult(input: NewEvalResult): Promise<EvalResult> {
const [row] = await this.db.insert(evalResults).values(input).returning();
// Fire-and-forget stats rollup (no error propagation)
this.bumpStats(input).catch((err) => console.warn('[storage] stats rollup failed:', err));
return row;
}
async finalizeRunCost(runId: string, totalCostCredits: number): Promise<void> {
await this.db.update(evalRuns).set({ totalCostCredits }).where(eq(evalRuns.id, runId));
}
async listRuns(userId: string, limit = 50, offset = 0) {
return this.db
.select()
.from(evalRuns)
.where(eq(evalRuns.userId, userId))
.orderBy(desc(evalRuns.createdAt))
.limit(limit)
.offset(offset);
}
async getRunWithResults(runId: string, userId: string) {
const [run] = await this.db.select().from(evalRuns).where(eq(evalRuns.id, runId)).limit(1);
if (!run || run.userId !== userId) return null;
const results = await this.db
.select()
.from(evalResults)
.where(eq(evalResults.runId, runId))
.orderBy(evalResults.providerId);
return { run, results };
}
async rateResult(
resultId: string,
userId: string,
rating: number,
notes?: string
): Promise<boolean> {
// Verify ownership via join
const [result] = await this.db
.select({ runUserId: evalRuns.userId, resultId: evalResults.id })
.from(evalResults)
.innerJoin(evalRuns, eq(evalResults.runId, evalRuns.id))
.where(eq(evalResults.id, resultId))
.limit(1);
if (!result || result.runUserId !== userId) return false;
await this.db
.update(evalResults)
.set({ userRating: rating, userNotes: notes })
.where(eq(evalResults.id, resultId));
return true;
}
private async bumpStats(result: NewEvalResult): Promise<void> {
// UTC calendar day (YYYY-MM-DD); together with providerId it keys the rollup row.
const day = new Date().toISOString().slice(0, 10);
const success = result.success ? 1 : 0;
const error = result.success ? 0 : 1;
await this.db
.insert(providerStats)
.values({
providerId: result.providerId,
day,
totalCalls: 1,
totalLatencyMs: result.latencyMs,
totalCostCredits: result.costCredits ?? 0,
successCount: success,
errorCount: error,
})
.onConflictDoUpdate({
target: [providerStats.providerId, providerStats.day],
set: {
totalCalls: sql`${providerStats.totalCalls} + 1`,
totalLatencyMs: sql`${providerStats.totalLatencyMs} + ${result.latencyMs}`,
totalCostCredits: sql`${providerStats.totalCostCredits} + ${result.costCredits ?? 0}`,
successCount: sql`${providerStats.successCount} + ${success}`,
errorCount: sql`${providerStats.errorCount} + ${error}`,
},
});
}
}
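
The `bumpStats` upsert above keeps one row per provider and day: the first result inserts the row, and every later result increments its counters. A minimal in-memory sketch of the same rollup follows, with a plain object standing in for the `provider_stats` table. The `StatsRow` and `ResultInput` shapes and the `bump` function are hypothetical and only mirror the column names used above.

```typescript
// Illustrative sketch only: an in-memory stand-in for the SQL upsert.
interface StatsRow {
  totalCalls: number;
  totalLatencyMs: number;
  totalCostCredits: number;
  successCount: number;
  errorCount: number;
}

interface ResultInput {
  providerId: string;
  success: boolean;
  latencyMs: number;
  costCredits?: number;
}

// Insert a fresh row for (providerId, day) or increment the existing one.
function bump(table: { [key: string]: StatsRow }, r: ResultInput, day: string): void {
  const key = r.providerId + '|' + day;
  const row = table[key] || {
    totalCalls: 0,
    totalLatencyMs: 0,
    totalCostCredits: 0,
    successCount: 0,
    errorCount: 0,
  };
  row.totalCalls += 1;
  row.totalLatencyMs += r.latencyMs;
  row.totalCostCredits += r.costCredits === undefined ? 0 : r.costCredits;
  if (r.success) {
    row.successCount += 1;
  } else {
    row.errorCount += 1;
  }
  table[key] = row;
}
```

Storing totals rather than averages keeps the update a pure increment; mean latency can be derived later as `totalLatencyMs / totalCalls`.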

@ -0,0 +1,18 @@
{
"compilerOptions": {
"target": "ESNext",
"module": "ESNext",
"moduleResolution": "bundler",
"strict": true,
"esModuleInterop": true,
"skipLibCheck": true,
"outDir": "dist",
"rootDir": "src",
"declaration": true,
"types": ["bun-types"],
"paths": {
"@/*": ["./src/*"]
}
},
"include": ["src/**/*.ts"]
}