feat(research): add mana-research service — Phase 1 + 2

New Bun/Hono service on port 3068 that bundles many web-research providers
behind a unified interface for side-by-side comparison. All eval runs
persist in research.* (mana_platform) so quality can be reviewed later.

Providers (Phase 1+2):
  search:  searxng, duckduckgo, brave, tavily, exa, serper
  extract: readability (via mana-search), jina-reader, firecrawl

Endpoints:
  POST /v1/search, /v1/search/compare       — single + fan-out
  POST /v1/extract, /v1/extract/compare     — single + fan-out
  GET  /v1/runs, /v1/runs/:id               — history
  POST /v1/runs/:run/results/:id/rate       — manual eval
  GET  /v1/providers, /v1/providers/health  — catalog + readiness

Auto-routing: when `provider` is omitted, queries are classified via regex
(fast path, 0ms) with optional mana-llm fallback, then routed to the first
available provider for that query type (news → tavily, academic → exa,
semantic → exa, etc.).

Credits: server-key calls go through mana-credits reserve → commit/refund
so failed provider calls don't charge the user. BYO-keys supported via
research.provider_configs (UI arrives in Phase 4).

Cache: Redis with graceful degradation (1h TTL for search, 24h for
extract). Pay-per-use APIs only — no subscription-gated providers.

Docs: docs/plans/mana-research-service.md + docs/reports/web-research-capabilities.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Till JS 2026-04-17 14:42:25 +02:00
parent 004fc0b2fd
commit 2bdb48bdd1
56 changed files with 4431 additions and 298 deletions

View file

@ -381,6 +381,61 @@ services:
- "traefik.http.routers.mana-credits.tls=true"
- "traefik.http.services.mana-credits.loadbalancer.server.port=3002"
mana-research:
build:
context: .
dockerfile: services/mana-research/Dockerfile
image: mana-research:local
container_name: mana-research
restart: always
mem_limit: 256m
depends_on:
postgres:
condition: service_healthy
redis:
condition: service_started
mana-credits:
condition: service_healthy
mana-search:
condition: service_started
environment:
TZ: Europe/Berlin
NODE_ENV: production
PORT: 3068
DATABASE_URL: postgresql://postgres:${POSTGRES_PASSWORD:-mana123}@postgres:5432/mana_platform
REDIS_URL: redis://redis:6379
MANA_AUTH_URL: http://mana-auth:3001
MANA_LLM_URL: http://mana-llm:3025
MANA_CREDITS_URL: http://mana-credits:3002
MANA_SEARCH_URL: http://mana-search:3021
MANA_SERVICE_KEY: ${MANA_SERVICE_KEY}
CACHE_TTL_SECONDS: 3600
BRAVE_API_KEY: ${BRAVE_API_KEY:-}
TAVILY_API_KEY: ${TAVILY_API_KEY:-}
EXA_API_KEY: ${EXA_API_KEY:-}
SERPER_API_KEY: ${SERPER_API_KEY:-}
JINA_API_KEY: ${JINA_API_KEY:-}
FIRECRAWL_API_KEY: ${FIRECRAWL_API_KEY:-}
SCRAPINGBEE_API_KEY: ${SCRAPINGBEE_API_KEY:-}
PERPLEXITY_API_KEY: ${PERPLEXITY_API_KEY:-}
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-}
OPENAI_API_KEY: ${OPENAI_API_KEY:-}
GOOGLE_GENAI_API_KEY: ${GOOGLE_GENAI_API_KEY:-}
CORS_ORIGINS: https://mana.how,https://chat.mana.how,https://research.mana.how
ports:
- "3068:3068"
healthcheck:
test: ["CMD", "bun", "-e", "fetch('http://127.0.0.1:3068/health').then(r=>process.exit(r.ok?0:1)).catch(()=>process.exit(1))"]
interval: 120s
timeout: 10s
retries: 3
start_period: 15s
labels:
- "traefik.enable=true"
- "traefik.http.routers.mana-research.rule=Host(`research.mana.how`)"
- "traefik.http.routers.mana-research.tls=true"
- "traefik.http.services.mana-research.loadbalancer.server.port=3068"
mana-events:
build:
context: services/mana-events