feat(questions): deep-research module — mana-search + mana-llm pipeline

End-to-end deep-research feature for the questions module: a fire-and-
forget orchestrator in apps/api that plans sub-queries with mana-llm,
retrieves sources via mana-search (with optional Readability extraction),
and streams a structured synthesis back to the web app over SSE.

Backend (apps/api/src/modules/research):
- schema.ts: pgSchema('research') with research_results + sources
- orchestrator.ts: three-phase pipeline (plan / retrieve / synthesise)
  with depth-aware config (quick=1×, standard=3×, deep=6× sub-queries)
- pubsub.ts: in-process event bus, single-node, swappable for Redis
- routes.ts: POST /start (202, fire-and-forget), GET /:id/stream (SSE),
  POST /start-sync (test only), GET /:id, GET /:id/sources
- Credit gating via @mana/shared-hono/credits — validate up-front,
  consume best-effort on `done`. Failed runs cost nothing.
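
For reference, the single-node event bus described for pubsub.ts could look roughly like the sketch below. Only the existence of an in-process bus that is swappable for Redis is stated in this commit, so the names here (ResearchEvent, EventBus, InProcessBus) are hypothetical.

```typescript
// Hypothetical event shape; the real pipeline's event types are not shown here.
type ResearchEvent = { type: string; payload?: unknown };
type Listener = (event: ResearchEvent) => void;

// Narrow interface so a Redis-backed implementation could replace this one later.
interface EventBus {
  subscribe(topic: string, listener: Listener): () => void;
  publish(topic: string, event: ResearchEvent): number;
}

class InProcessBus implements EventBus {
  private readonly topics = new Map<string, Set<Listener>>();

  // Registers a listener and returns an unsubscribe function.
  subscribe(topic: string, listener: Listener): () => void {
    const listeners = this.topics.get(topic) ?? new Set<Listener>();
    this.topics.set(topic, listeners);
    listeners.add(listener);
    return () => {
      listeners.delete(listener);
      // Drop empty topics so finished research runs do not leak map entries.
      if (listeners.size === 0 && this.topics.get(topic) === listeners) {
        this.topics.delete(topic);
      }
    };
  }

  // Delivers synchronously to a snapshot of listeners; returns how many were called.
  publish(topic: string, event: ResearchEvent): number {
    const snapshot = Array.from(this.topics.get(topic) ?? []);
    for (const listener of snapshot) listener(event);
    return snapshot.length;
  }
}
```

An SSE route would subscribe with the research id as the topic and call the returned function when the client disconnects.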

Helpers (apps/api/src/lib):
- llm.ts: llmJson() + llmStream() over mana-llm OpenAI-compat API
- search.ts: webSearch() + bulkExtract() over mana-search Go service
- responses.ts: shared errorResponse / listResponse / validationError

Schema deployment:
- drizzle.config.ts (research-scoped) + drizzle/research/0000_init.sql
  hand-authored migration, deployable via psql -f or drizzle-kit push.
- drizzle-kit added as devDep with db:generate / db:push scripts.

Web client (apps/mana/apps/web/src/lib/api/research.ts):
- Typed start() / get() / listSources() / streamProgress(). The stream
  uses fetch + ReadableStream (not EventSource) so we can attach the
  JWT via Authorization header. Special-cases 402 for friendly toast.
- New PUBLIC_MANA_API_URL plumbing in hooks.server.ts + config.ts.
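
A minimal sketch of the fetch + ReadableStream approach, assuming the server emits standard `event:` / `data:` frames separated by blank lines. The function names, the URL parameter, and the error messages are illustrative and not taken from research.ts.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Splits a text buffer into complete SSE frames; the unfinished tail is returned as `rest`.
function drainSseBuffer(buffer: string): { events: SseEvent[]; rest: string } {
  const frames = buffer.replace(/\r\n/g, '\n').split('\n\n');
  const rest = frames.pop() ?? '';
  const events: SseEvent[] = [];
  for (const frame of frames) {
    let event = 'message';
    const data: string[] = [];
    for (const line of frame.split('\n')) {
      if (line.startsWith('event:')) event = line.slice(6).trim();
      else if (line.startsWith('data:')) data.push(line.slice(5).replace(/^ /, ''));
    }
    if (data.length > 0) events.push({ event, data: data.join('\n') });
  }
  return { events, rest };
}

// Hypothetical client helper: EventSource cannot set request headers, fetch can.
async function streamProgressSketch(
  url: string,
  jwt: string,
  onEvent: (e: SseEvent) => void,
  signal?: AbortSignal
): Promise<void> {
  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${jwt}`, Accept: 'text/event-stream' },
    signal,
  });
  // Assumption: 402 signals insufficient credits, so the UI can show a friendlier message.
  if (res.status === 402) throw new Error('insufficient credits');
  if (!res.ok || !res.body) throw new Error(`stream failed with status ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const drained = drainSseBuffer(buffer);
    buffer = drained.rest;
    drained.events.forEach(onEvent);
  }
}
```

Keeping the parser separate from the network call makes the framing logic testable without a server.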

Module store (modules/questions/stores/answers.svelte.ts):
- New write-side store with createManual / startResearch / accept /
  softDelete. startResearch creates an optimistic empty answer, opens
  the SSE stream, debounces token deltas in 100ms batches into the
  encrypted local row, and on `done` replaces the streamed text with
  the parsed { summary, keyPoints, followUps } payload + citations
  resolved against research.sources.id.
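
One reading of "100ms batches" is sketched below: the first delta opens a 100 ms window, later deltas are appended, and a single write happens when the window closes. DeltaBatcher and its write callback are hypothetical names, and the real store may batch differently.

```typescript
class DeltaBatcher {
  private pending = '';
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly write: (batch: string) => void;
  private readonly intervalMs: number;

  constructor(write: (batch: string) => void, intervalMs = 100) {
    this.write = write;
    this.intervalMs = intervalMs;
  }

  // Buffers a token delta; schedules one flush per window instead of one write per token.
  push(delta: string): void {
    this.pending += delta;
    if (this.timer === null) {
      this.timer = setTimeout(() => this.flush(), this.intervalMs);
    }
  }

  // Writes whatever is buffered right away; call this on `done` or on cancel.
  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.pending === '') return;
    const batch = this.pending;
    this.pending = '';
    this.write(batch);
  }
}
```

Calling flush() before applying the final `done` payload ensures no buffered delta is written after the structured answer has replaced the streamed text.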

Citation rendering (modules/questions/components/AnswerCitations.svelte):
- Tokenises [n] markers in the answer body into clickable pills with
  hover popovers showing title / host / snippet / external link.
- Lazy-loaded via a session-scoped source cache (stores/sources.svelte.ts)
  that deduplicates concurrent fetches.
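
The [n] tokenisation can be illustrated with a small pure function. This is a sketch, not the component's code: Segment and tokeniseCitations are made-up names, and the actual component may treat edge cases such as [0] or adjacent markers differently.

```typescript
type Segment =
  | { kind: 'text'; value: string }
  | { kind: 'citation'; index: number };

// Splits an answer body into plain-text runs and numeric [n] citation markers.
function tokeniseCitations(body: string): Segment[] {
  const segments: Segment[] = [];
  const marker = /\[(\d+)\]/g;
  let cursor = 0;
  let match: RegExpExecArray | null;
  while ((match = marker.exec(body)) !== null) {
    if (match.index > cursor) {
      segments.push({ kind: 'text', value: body.slice(cursor, match.index) });
    }
    segments.push({ kind: 'citation', index: Number(match[1]) });
    cursor = match.index + match[0].length;
  }
  if (cursor < body.length) {
    segments.push({ kind: 'text', value: body.slice(cursor) });
  }
  return segments;
}
```

A renderer would map `citation` segments to pills and look up the matching source by index, leaving `text` segments untouched.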

UI (routes/(app)/questions/[id]/+page.svelte):
- Recherche card with three-state button (start / cancel / re-run),
  animated phase indicator, source counter.
- Confirmation dialog warning about web/LLM transmission since the
  question itself is locally encrypted.
- Toasts for success / error / cancel via @mana/shared-ui/toast.
- Re-run flow soft-deletes prior research-driven answers but keeps
  manual ones intact.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Till JS 2026-04-08 22:15:35 +02:00
parent 30787e36d2
commit e82851985b
18 changed files with 2221 additions and 4 deletions

apps/api/src/lib/llm.ts Normal file

@@ -0,0 +1,175 @@
/**
 * Thin client for the mana-llm gateway.
 *
 * Two helpers, deliberately small:
 *
 *   llmJson()    non-streaming, parses the model response as JSON.
 *                Used for plan/structuring steps where we need a typed object.
 *
 *   llmStream()  streaming, calls onToken() for each delta and returns
 *                the full concatenated text at the end. Used for synthesis.
 *
 * mana-llm exposes an OpenAI-compatible /api/v1/chat/completions endpoint
 * (see services/mana-llm). Models are namespaced as `provider/model`, e.g.
 * `ollama/gemma3:4b`, `openrouter/meta-llama/llama-3.1-70b-instruct`.
 *
 * Internal service-to-service calls: no auth on the wire (private network).
 */

const LLM_URL = process.env.MANA_LLM_URL || 'http://localhost:3025';

export interface LlmMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

export interface LlmJsonOptions {
  model: string;
  system?: string;
  user: string;
  temperature?: number;
  maxTokens?: number;
}

export interface LlmStreamOptions {
  model: string;
  system?: string;
  user: string;
  temperature?: number;
  maxTokens?: number;
  onToken: (delta: string) => void | Promise<void>;
  signal?: AbortSignal;
}

export class LlmError extends Error {
  constructor(
    message: string,
    public readonly status?: number,
    public readonly body?: string
  ) {
    super(message);
    this.name = 'LlmError';
  }
}

function buildMessages(system: string | undefined, user: string): LlmMessage[] {
  const msgs: LlmMessage[] = [];
  if (system) msgs.push({ role: 'system', content: system });
  msgs.push({ role: 'user', content: user });
  return msgs;
}

/**
 * Call the LLM and parse the response as JSON.
 *
 * Strips markdown code fences if the model wraps its output in ```json ... ```.
 * Throws LlmError on a non-2xx response, missing content, or content that
 * isn't valid JSON. Network-level failures propagate as thrown by fetch().
 */
export async function llmJson<T = unknown>(opts: LlmJsonOptions): Promise<T> {
  const res = await fetch(`${LLM_URL}/api/v1/chat/completions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: opts.model,
      messages: buildMessages(opts.system, opts.user),
      temperature: opts.temperature ?? 0.2,
      max_tokens: opts.maxTokens ?? 1000,
      response_format: { type: 'json_object' },
    }),
  });
  if (!res.ok) {
    const body = await res.text().catch(() => '');
    throw new LlmError(`mana-llm returned ${res.status}`, res.status, body);
  }
  const data = (await res.json()) as {
    choices?: Array<{ message?: { content?: string } }>;
  };
  const raw = data.choices?.[0]?.message?.content;
  if (!raw) throw new LlmError('mana-llm response missing content');
  const cleaned = stripCodeFence(raw);
  try {
    return JSON.parse(cleaned) as T;
  } catch (err) {
    throw new LlmError(
      `mana-llm returned non-JSON content: ${(err as Error).message}`,
      undefined,
      raw
    );
  }
}

/**
 * Call the LLM in streaming mode. Invokes onToken() for each delta and
 * returns the full concatenated text once the stream completes.
 *
 * Parses OpenAI-style SSE: lines beginning with `data:` and the
 * sentinel `data: [DONE]`. Errors thrown by onToken() propagate to the caller.
 */
export async function llmStream(opts: LlmStreamOptions): Promise<string> {
  const res = await fetch(`${LLM_URL}/api/v1/chat/completions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: opts.model,
      messages: buildMessages(opts.system, opts.user),
      temperature: opts.temperature ?? 0.5,
      max_tokens: opts.maxTokens ?? 2000,
      stream: true,
    }),
    signal: opts.signal,
  });
  if (!res.ok || !res.body) {
    const body = await res.text().catch(() => '');
    throw new LlmError(`mana-llm stream returned ${res.status}`, res.status, body);
  }
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  let full = '';

  // Extracts the content delta from one SSE line. Returns undefined for
  // non-data lines, the [DONE] sentinel, and malformed frames.
  const parseDelta = (line: string): string | undefined => {
    if (!line.startsWith('data:')) return undefined;
    const payload = line.slice(5).trim();
    if (!payload || payload === '[DONE]') return undefined;
    try {
      const chunk = JSON.parse(payload) as {
        choices?: Array<{ delta?: { content?: string } }>;
      };
      return chunk.choices?.[0]?.delta?.content;
    } catch {
      // Ignore malformed frames (keepalives, comments, etc.).
      return undefined;
    }
  };

  while (true) {
    const { done, value } = await reader.read();
    // At end of stream, flush the decoder so buffered bytes are not lost.
    buffer += done ? decoder.decode() : decoder.decode(value, { stream: true });
    // SSE frames are separated by blank lines, but mana-llm forwards
    // line-by-line, so split on \n and keep the last (possibly partial) line.
    // At end of stream that last line is final and is processed as well.
    const lines = buffer.split('\n');
    buffer = done ? '' : (lines.pop() ?? '');
    for (const line of lines) {
      const delta = parseDelta(line);
      if (delta) {
        full += delta;
        // Outside the JSON try/catch so onToken failures are not swallowed.
        await opts.onToken(delta);
      }
    }
    if (done) break;
  }
  return full;
}

function stripCodeFence(text: string): string {
  const trimmed = text.trim();
  if (!trimmed.startsWith('```')) return trimmed;
  // ```json\n...\n``` or ```\n...\n```
  const withoutOpen = trimmed.replace(/^```(?:json)?\s*\n?/, '');
  return withoutOpen.replace(/\n?```\s*$/, '');
}
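
The line-buffering step in llmStream() is easier to reason about in isolation. The sketch below restates it as a pure function (extractDeltas and LineBuffer are illustrative names, not part of llm.ts) to show that a JSON payload split across two network chunks is only parsed once its terminating newline arrives.

```typescript
interface LineBuffer {
  partial: string;
}

// Appends a decoded chunk to the buffer and returns the content deltas found in
// every complete line; the unfinished tail is carried over to the next call.
function extractDeltas(state: LineBuffer, chunk: string): string[] {
  const lines = (state.partial + chunk).split('\n');
  state.partial = lines.pop() ?? '';
  const deltas: string[] = [];
  for (const line of lines) {
    if (!line.startsWith('data:')) continue;
    const payload = line.slice(5).trim();
    if (!payload || payload === '[DONE]') continue;
    try {
      const parsed = JSON.parse(payload) as {
        choices?: Array<{ delta?: { content?: string } }>;
      };
      const delta = parsed.choices?.[0]?.delta?.content;
      if (delta) deltas.push(delta);
    } catch {
      // Malformed frame: skip it, as llmStream() does.
    }
  }
  return deltas;
}

// Usage: the first chunk ends mid-payload, so nothing is emitted until the second.
const demoState: LineBuffer = { partial: '' };
const demoFirst = extractDeltas(demoState, 'data: {"choices":[{"delta":{"content":"Hel');
const demoSecond = extractDeltas(demoState, 'lo"}}]}\n\ndata: [DONE]\n');
console.log(demoFirst, demoSecond);
```

Because only complete lines are parsed, chunk boundaries chosen by the network never affect which deltas the caller sees.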