feat(questions): deep-research module — mana-search + mana-llm pipeline

End-to-end deep-research feature for the questions module: a
fire-and-forget orchestrator in apps/api that plans sub-queries with
mana-llm, retrieves sources via mana-search (with optional Readability
extraction), and streams a structured synthesis back to the web app
over SSE.

Backend (apps/api/src/modules/research):
- schema.ts: pgSchema('research') with research_results + sources
- orchestrator.ts: three-phase pipeline (plan / retrieve / synthesise)
  with depth-aware config (quick=1×, standard=3×, deep=6× sub-queries)
- pubsub.ts: in-process event bus, single-node, swappable for Redis
- routes.ts: POST /start (202, fire-and-forget), GET /:id/stream (SSE),
  POST /start-sync (test only), GET /:id, GET /:id/sources
- Credit gating via @mana/shared-hono/credits: validate up-front,
  consume best-effort on `done`. Failed runs cost nothing.

Helpers (apps/api/src/lib):
- llm.ts: llmJson() + llmStream() over mana-llm OpenAI-compat API
- search.ts: webSearch() + bulkExtract() over mana-search Go service
- responses.ts: shared errorResponse / listResponse / validationError

Schema deployment:
- drizzle.config.ts (research-scoped) + drizzle/research/0000_init.sql
  hand-authored migration, deployable via psql -f or drizzle-kit push.
- drizzle-kit added as devDep with db:generate / db:push scripts.

Web client (apps/mana/apps/web/src/lib/api/research.ts):
- Typed start() / get() / listSources() / streamProgress(). The stream
  uses fetch + ReadableStream (not EventSource) so we can attach the
  JWT via Authorization header. Special-cases 402 for friendly toast.
- New PUBLIC_MANA_API_URL plumbing in hooks.server.ts + config.ts.

Module store (modules/questions/stores/answers.svelte.ts):
- New write-side store with createManual / startResearch / accept /
  softDelete. startResearch creates an optimistic empty answer, opens
  the SSE stream, debounces token deltas in 100ms batches into the
  encrypted local row, and on `done` replaces the streamed text with
  the parsed { summary, keyPoints, followUps } payload + citations
  resolved against research.sources.id.

Citation rendering (modules/questions/components/AnswerCitations.svelte):
- Tokenises [n] markers in the answer body into clickable pills with
  hover popovers showing title / host / snippet / external link.
- Lazy-loaded via a session-scoped source cache
  (stores/sources.svelte.ts) that deduplicates concurrent fetches.

UI (routes/(app)/questions/[id]/+page.svelte):
- Recherche card with three-state button (start / cancel / re-run),
  animated phase indicator, source counter.
- Confirmation dialog warning about web/LLM transmission since the
  question itself is locally encrypted.
- Toasts for success / error / cancel via @mana/shared-ui/toast.
- Re-run flow soft-deletes prior research-driven answers but keeps
  manual ones intact.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
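
The commit message says the web client reads the progress stream with fetch + ReadableStream instead of EventSource, which means the client has to split the text/event-stream body into events itself. The following is a minimal sketch of that framing step, assuming standard SSE framing (frames separated by a blank line, `event:` and `data:` fields). `parseSseBuffer` and the event names in the usage note are hypothetical and are not taken from this commit.

```typescript
// Sketch only: parseSseBuffer is a hypothetical helper, not code from this commit.
interface SseEvent {
	event: string; // value of the "event:" field, "message" when absent
	data: string; // all "data:" lines of the frame joined with "\n"
}

function parseSseBuffer(buffer: string): { events: SseEvent[]; rest: string } {
	const frames = buffer.replace(/\r\n/g, '\n').split('\n\n');
	// The last piece has no terminating blank line yet, so keep it for the next chunk.
	const rest = frames.pop() ?? '';
	const events: SseEvent[] = [];

	for (const frame of frames) {
		let event = 'message';
		const dataLines: string[] = [];
		for (const line of frame.split('\n')) {
			// Skip blank lines and comment / keep-alive lines (those starting with ':').
			if (line === '' || line.charAt(0) === ':') continue;
			const colon = line.indexOf(':');
			const field = colon === -1 ? line : line.slice(0, colon);
			let value = colon === -1 ? '' : line.slice(colon + 1);
			if (value.charAt(0) === ' ') value = value.slice(1); // strip one leading space
			if (field === 'event') event = value;
			else if (field === 'data') dataLines.push(value);
		}
		// Frames without any data line (for example pure comments) are not dispatched.
		if (dataLines.length > 0) events.push({ event, data: dataLines.join('\n') });
	}
	return { events, rest };
}
```

A reader loop would decode each chunk, call `parseSseBuffer(rest + chunk)`, handle the returned events (for instance a hypothetical `token` or `done` event), and carry `rest` over to the next read.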
2026-05-18 08:49:39 +02:00 · 2026-04-08 22:15:35 +02:00 · 2026-04-08 22:15:35 +02:00 · e82851985b
commit e82851985b
parent 30787e36d2
18 changed files with 2221 additions and 4 deletions
--- a/apps/api/src/modules/research/schema.ts
+++ b/apps/api/src/modules/research/schema.ts
@@ -0,0 +1,73 @@
+/**
+ * Research module — DB schema (Drizzle / pgSchema 'research')
+ *
+ * Server-side store for deep-research runs orchestrated by apps/api.
+ * Lives in mana_platform under its own pgSchema.
+ *
+ * - research_results: one row per research run, holds plan + final synthesis
+ * - sources:          one row per web source consumed by a run
+ *
+ * The local-first questions module references research_results.id from
+ * LocalAnswer.researchResultId; sources are fetched on-demand via the API
+ * and never mirrored into IndexedDB (they're public web content).
+ */
+
+import { drizzle } from 'drizzle-orm/postgres-js';
+import postgres from 'postgres';
+import { pgSchema, uuid, text, timestamp, integer, jsonb } from 'drizzle-orm/pg-core';
+
+const DATABASE_URL =
+	process.env.DATABASE_URL ?? 'postgresql://mana:devpassword@localhost:5432/mana_platform';
+
+export const researchSchema = pgSchema('research');
+
+/**
+ * One row per research run. Created in `planning` state immediately on
+ * /start, then updated as the orchestrator advances through phases.
+ */
+export const researchResults = researchSchema.table('research_results', {
+	id: uuid('id').defaultRandom().primaryKey(),
+	userId: text('user_id').notNull(),
+	questionId: text('question_id').notNull(), // mirrors local LocalQuestion.id (UUID)
+	depth: text('depth').notNull(), // 'quick' | 'standard' | 'deep'
+	status: text('status').notNull(), // 'planning' | 'searching' | 'extracting' | 'synthesizing' | 'done' | 'error'
+	subQueries: jsonb('sub_queries').$type<string[]>(),
+	summary: text('summary'),
+	keyPoints: jsonb('key_points').$type<string[]>(),
+	followUpQuestions: jsonb('follow_up_questions').$type<string[]>(),
+	errorMessage: text('error_message'),
+	startedAt: timestamp('started_at', { withTimezone: true }).defaultNow().notNull(),
+	finishedAt: timestamp('finished_at', { withTimezone: true }),
+});
+
+/**
+ * Sources consumed during a research run. Rank reflects ordering in the
+ * synthesis prompt so citation [n] in the summary maps to sources[n-1].
+ */
+export const sources = researchSchema.table('sources', {
+	id: uuid('id').defaultRandom().primaryKey(),
+	researchResultId: uuid('research_result_id')
+		.notNull()
+		.references(() => researchResults.id, { onDelete: 'cascade' }),
+	url: text('url').notNull(),
+	title: text('title'),
+	snippet: text('snippet'),
+	extractedContent: text('extracted_content'),
+	category: text('category'),
+	rank: integer('rank').notNull(),
+	createdAt: timestamp('created_at', { withTimezone: true }).defaultNow().notNull(),
+});
+
+const connection = postgres(DATABASE_URL, { max: 5, idle_timeout: 20 });
+export const db = drizzle(connection, { schema: { researchResults, sources } });
+
+export type ResearchResult = typeof researchResults.$inferSelect;
+export type Source = typeof sources.$inferSelect;
+export type ResearchDepth = 'quick' | 'standard' | 'deep';
+export type ResearchStatus =
+	| 'planning'
+	| 'searching'
+	| 'extracting'
+	| 'synthesizing'
+	| 'done'
+	| 'error';
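
The comment on the `sources` table states that citation [n] in the summary maps to sources[n-1] once the sources are ordered by `rank`. Below is a minimal sketch of how a consumer might resolve those markers. `SourceLike` and `resolveCitations` are hypothetical names, the shape is a reduced stand-in for the inferred `Source` type, and sorting by `rank` avoids assuming whether ranks start at 0 or 1.

```typescript
// Sketch only: a reduced, hypothetical stand-in for the Drizzle-inferred Source type.
interface SourceLike {
	id: string;
	url: string;
	title: string | null;
	rank: number;
}

interface ResolvedCitation {
	marker: number; // the n in "[n]"
	source: SourceLike;
}

function resolveCitations(summary: string, sources: SourceLike[]): ResolvedCitation[] {
	// Order by rank so that index n-1 corresponds to citation [n].
	const ordered = sources.slice().sort((a, b) => a.rank - b.rank);
	const seen: Record<number, boolean> = {};
	const resolved: ResolvedCitation[] = [];
	const pattern = /\[(\d+)\]/g;
	let match: RegExpExecArray | null;

	while ((match = pattern.exec(summary)) !== null) {
		const marker = Number(match[1]);
		if (seen[marker]) continue; // report each marker once, in first-seen order
		seen[marker] = true;
		const source = ordered[marker - 1];
		// Markers pointing past the end of the source list are ignored.
		if (source) resolved.push({ marker, source });
	}
	return resolved;
}
```

Markers with no matching source are dropped here rather than raised as errors, on the assumption that an LLM-written summary may occasionally cite an index that does not exist.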