chore(articles): polish pass — schema cleanup, MAX cap, filters, docs (#8,#9,#13,#15,#18,#20)

Polish pass on top of the bulk-import rollout. Five contained items.

#8 + #9 — Dexie v60 schema cleanup
   - Drop articleImportJobs.leasedBy + .leasedUntil. They were defined
     on the original v57 schema as a soft-lease handshake, but the
     worker uses pg_try_advisory_xact_lock and never wrote them.
     Local-* type + projection fields stripped.
   - Drop the standalone `state` index on articleImportItems.
     [jobId+state] covers the worker's hot query; the state-solo
     index had no call site.
   Both changes are lossless — Dexie just drops the declarations from
   the schema; existing rows still carry the dead nulls (zombies)
   until the next full row-rewrite. Not worth a hard migration for
   two never-written columns.
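For reference, the hard migration we decided against would be a one-shot row-rewrite in Dexie's `.upgrade()` hook. A minimal sketch; the `purgeLeaseColumns` helper is hypothetical, not in the codebase:

```typescript
// Hypothetical: the hard migration this commit skips. A one-shot
// row-rewrite that deletes the zombie lease fields from every
// existing articleImportJobs row.
function purgeLeaseColumns(row: Record<string, unknown>): void {
  delete row.leasedBy;
  delete row.leasedUntil;
}

// Wiring it into the v60 version bump would look like (Dexie API):
//
//   db.version(60)
//     .stores({ articleImportJobs: 'id, status, [spaceId+status], _updatedAtIndex' })
//     .upgrade((tx) =>
//       tx.table('articleImportJobs').toCollection().modify(purgeLeaseColumns)
//     );
```

The `.upgrade()` callback runs once per existing row during the version transition, which is exactly the write amplification the commit avoids by tolerating the zombies.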

#15 — MAX_URLS_PER_JOB hard cap (200)
   articleImportsStore.createJob() throws if the URL list exceeds the
   cap. BulkImportForm surfaces the limit in the live counter chip
   and disables the submit when over. The worker can chew through any
   N, but at high counts the UI gets unwieldy (no virtualisation) and
   wall-clock duration climbs into the multi-hour range. 200 is a
   pragmatic ceiling — Pocket-export dumps average 50–150.
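A caller sitting on more than 200 URLs can split client-side before calling createJob(), one job per slice. A minimal sketch; `chunkUrls` is a hypothetical helper, not part of this commit:

```typescript
// Hypothetical helper: split an oversized URL list into cap-sized
// slices, so each slice fits under the per-job hard cap.
const MAX_URLS_PER_JOB = 200;

function chunkUrls(urls: string[], cap = MAX_URLS_PER_JOB): string[][] {
  const chunks: string[][] = [];
  for (let i = 0; i < urls.length; i += cap) {
    chunks.push(urls.slice(i, i + cap));
  }
  return chunks;
}

// Usage sketch: one createJob() call per chunk.
//   for (const chunk of chunkUrls(allUrls)) {
//     await articleImportsStore.createJob(chunk);
//   }
```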

#13 — Filter tabs in JobsList
   Pill-style tabs above the list: Alle / Aktiv / Fertig / Mit Fehlern,
   each with the row count. Disabled when the bucket is empty so the
   user only sees actionable filters. The "Mit Fehlern" filter
   (errorCount > 0) is the most valuable for triage.
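The bucket logic reduces to two checks on the projected job row. A standalone sketch of the counting (field names match ArticleImportJob; the pure function itself is illustrative — the component keeps these as individual $derived values):

```typescript
// Sketch of the JobsList bucket counts as one pure function.
type JobStatus = 'queued' | 'running' | 'paused' | 'done' | 'cancelled';
interface JobLike {
  status: JobStatus;
  errorCount: number;
}

function bucketCounts(jobs: JobLike[]) {
  return {
    all: jobs.length,
    active: jobs.filter(
      (j) => j.status === 'queued' || j.status === 'running' || j.status === 'paused'
    ).length,
    done: jobs.filter((j) => j.status === 'done').length,
    // "Mit Fehlern" keys off the counter, not the status: a job can
    // finish as 'done' and still carry failed items.
    errors: jobs.filter((j) => j.errorCount > 0).length,
  };
}
```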

#18 — apps/mana/CLAUDE.md
   - Articles row added to the Tool Coverage table (5 propose +
     1 auto, including the new auto-policy import_articles_from_urls).
   - New "Articles bulk-import" section after the AI Workbench part:
     pipeline diagram, table list, actor + metrics + cap pointers.

#20 — ARTICLES_IMPORT_WORKER_DISABLED env var documented
   New row under "Mana API — Articles Bulk-Import Worker" in
   docs/ENVIRONMENT_VARIABLES.md.
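How apps/api consumes the flag is not shown in this commit; a plausible read, assuming the usual truthy-string convention (only the variable name comes from the commit, the parsing is a guess):

```typescript
// Hypothetical sketch of consuming the documented kill switch.
// The real apps/api parsing may differ; only the name
// ARTICLES_IMPORT_WORKER_DISABLED comes from the commit.
function isWorkerDisabled(env: Record<string, string | undefined>): boolean {
  const raw = env.ARTICLES_IMPORT_WORKER_DISABLED?.trim().toLowerCase();
  return raw === '1' || raw === 'true' || raw === 'yes';
}

// Worker tick guard:
//   if (isWorkerDisabled(process.env)) return; // skip the 2s tick entirely
```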

Plan: docs/plans/articles-bulk-import.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Till JS 2026-04-29 02:42:46 +02:00
parent ace1b706e6
commit e37c008a7a
9 changed files with 218 additions and 22 deletions

@ -39,8 +39,6 @@ export interface ImportJobRow {
spaceId: string | null;
totalUrls: number;
status: 'queued' | 'running' | 'paused' | 'done' | 'cancelled';
leasedBy: string | null;
leasedUntil: string | null;
startedAt: string | null;
finishedAt: string | null;
savedCount: number;
@ -192,8 +190,6 @@ function projectJob(userId: string, recordId: string, merged: Row | null): Impor
spaceId: optStr(merged.spaceId),
totalUrls,
status,
leasedBy: optStr(merged.leasedBy),
leasedUntil: optStr(merged.leasedUntil),
startedAt: optStr(merged.startedAt),
finishedAt: optStr(merged.finishedAt),
savedCount: num(merged.savedCount) ?? 0,

@ -275,6 +275,7 @@ Agents interact with the app through tools — each one either auto (executes si
| food | — | `nutrition_summary`, `log_meal` |
| news | `save_news_article` | — |
| news-research | `research_news` | — |
| articles | `save_article`, `archive_article`, `tag_article`, `add_article_highlight`, `import_articles_from_urls` (auto) | `list_articles` |
| journal | `create_journal_entry` | — |
| habits | `create_habit`, `log_habit` | `get_habits` |
| contacts | `create_contact` | `get_contacts` |
@ -304,6 +305,36 @@ Each template bundles: optional agent + optional scene layout + optional starter
Full architecture (Planner prompt + parser in `@mana/shared-ai`, server-side runner, Postgres actor column, materialized snapshots, Multi-Agent gating, server-side web-research, Prometheus metrics + status.mana.how integration): [`docs/architecture/COMPANION_BRAIN_ARCHITECTURE.md`](../../docs/architecture/COMPANION_BRAIN_ARCHITECTURE.md) §20 (AI Workbench) + §21 (Mission Grants) + §22 (Multi-Agent Workbench).
## Articles bulk-import
Background pipeline that ingests N URLs into a user's reading list as
one Job, with the same encryption + scope semantics as a single-URL
save. Same shape as the AI mission runner: state lives in
`sync_changes`, a server-side worker projects + writes back, the
client encrypts the final article.
```
client createJob(urls)
→ bulkAdd articleImportItems(state='pending') + articleImportJobs(queued)
→ sync push → mana_sync.sync_changes
→ apps/api worker tick (every 2s, advisory-lock-gated)
→ extractFromUrl (shared-rss / Readability)
→ write articleExtractPickup row + flip item → 'extracted'
→ sync pull → liveQuery
→ consume-pickup encryptRecord + articleTable.add
→ flip item → 'saved' (or 'duplicate' / 'consent-wall')
→ delete pickup row
→ server flips job → 'done', emits ArticleImportFinished
```
Tables: `articleImportJobs`, `articleImportItems`, `articleExtractPickup`
(all plaintext-allowlisted — see `data/crypto/plaintext-allowlist.ts`).
Actor on every server-write: `system:articles-import-worker`. Worker
metrics under `mana_api_articles_import_*`. Hard cap of 200 URLs per
job (`MAX_URLS_PER_JOB` in `modules/articles/stores/imports.svelte`).
Plan: [`docs/plans/articles-bulk-import.md`](../../docs/plans/articles-bulk-import.md).
## Reference Documents
| Path | Purpose |

@ -1465,6 +1465,34 @@ db.version(59).stores({
documentTags: null,
});
// v60 — Articles bulk-import schema cleanup.
// Two changes, both lossless:
//
// 1. articleImportJobs: drop the unused `leasedBy`/`leasedUntil`
// columns. They were on the original v57 schema as a soft-lease
// handshake, but the worker uses pg_try_advisory_xact_lock
// instead and never wrote them. Dexie's index list shrinks but
// no data is migrated — the columns simply disappear from
// future writes; existing rows still carry them as zombies (a
//    one-shot row-rewrite to delete the fields would be a hard
//    migration; not worth it for two never-written nulls).
// 2. articleImportItems: drop the standalone `state` index.
// `[jobId+state]` covers the only hot query (worker's per-job
// pending scan). The state-solo index had no call site —
// retryFailed uses [jobId+state]. Trimming the index list saves
// a bit of write amplification.
//
// Kept on the schema (not dropped here): `idx` standalone index on
// articleImportItems. It's also unused right now, but the
// JobDetailView currently sorts items in JS via .sort((a,b)=>a.idx-b.idx);
// if that view ever switches to a server-side ordered scan we'd want
// the index back, and re-adding indexes after the fact is more
// painful than keeping a small one around.
db.version(60).stores({
articleImportJobs: 'id, status, [spaceId+status], _updatedAtIndex',
articleImportItems: 'id, jobId, [jobId+state], idx',
});
// ─── Sync Routing ──────────────────────────────────────────
// SYNC_APP_MAP, TABLE_TO_SYNC_NAME, TABLE_TO_APP, SYNC_NAME_TO_TABLE,
// toSyncName() and fromSyncName() are now derived from per-module

@ -7,13 +7,14 @@
-->
<script lang="ts">
import { goto } from '$app/navigation';
import { articleImportsStore, parseUrls } from '../stores/imports.svelte';
import { articleImportsStore, MAX_URLS_PER_JOB, parseUrls } from '../stores/imports.svelte';
let raw = $state('');
let busy = $state(false);
let error = $state<string | null>(null);
const parsed = $derived(parseUrls(raw));
const overLimit = $derived(parsed.valid.length > MAX_URLS_PER_JOB);
async function handleSubmit() {
if (busy) return;
@ -21,6 +22,10 @@
error = 'Mindestens eine gültige URL einfügen.';
return;
}
if (overLimit) {
error = `Maximal ${MAX_URLS_PER_JOB} URLs pro Job. Splitte den Import in mehrere Jobs.`;
return;
}
busy = true;
error = null;
try {
@ -51,7 +56,9 @@
></textarea>
<div class="counter-row" aria-live="polite">
<span class="counter counter-valid">{parsed.valid.length} gültig</span>
<span class="counter counter-valid" class:counter-overlimit={overLimit}>
{parsed.valid.length} gültig{overLimit ? ` / max ${MAX_URLS_PER_JOB}` : ''}
</span>
{#if parsed.duplicates.length > 0}
<span class="counter counter-dup">{parsed.duplicates.length} doppelt (übersprungen)</span>
{/if}
@ -60,6 +67,12 @@
{/if}
</div>
{#if overLimit}
<p class="error" role="alert">
Zu viele URLs ({parsed.valid.length}). Maximal {MAX_URLS_PER_JOB} pro Job — splitte den Import.
</p>
{/if}
{#if parsed.invalid.length > 0}
<details class="invalid-details">
<summary>Ungültige Zeilen anzeigen ({parsed.invalid.length})</summary>
@ -80,7 +93,7 @@
type="button"
class="primary"
onclick={handleSubmit}
disabled={busy || parsed.valid.length === 0}
disabled={busy || parsed.valid.length === 0 || overLimit}
>
{#if busy}Erstelle Job…{:else}{parsed.valid.length} URLs importieren{/if}
</button>
@ -144,6 +157,10 @@
background: color-mix(in srgb, #16a34a 12%, transparent);
color: #16a34a;
}
.counter-overlimit {
background: rgba(239, 68, 68, 0.12);
color: #ef4444;
}
.counter-dup {
background: color-mix(in srgb, #f59e0b 12%, transparent);
color: #b45309;

@ -9,8 +9,30 @@
import { useImportJobs } from '../queries';
import type { ArticleImportJob } from '../types';
type Filter = 'all' | 'active' | 'done' | 'errors';
const jobs$ = useImportJobs();
const jobs = $derived(jobs$.value);
const allJobs = $derived(jobs$.value);
let filter = $state<Filter>('all');
const activeCount = $derived(
allJobs.filter((j) => j.status === 'queued' || j.status === 'running' || j.status === 'paused')
.length
);
const doneCount = $derived(allJobs.filter((j) => j.status === 'done').length);
const errorCount = $derived(allJobs.filter((j) => j.errorCount > 0).length);
const visibleJobs = $derived(
filter === 'all'
? allJobs
: filter === 'active'
? allJobs.filter(
(j) => j.status === 'queued' || j.status === 'running' || j.status === 'paused'
)
: filter === 'done'
? allJobs.filter((j) => j.status === 'done')
: allJobs.filter((j) => j.errorCount > 0)
);
function progress(job: ArticleImportJob): string {
const done = job.savedCount + job.duplicateCount + job.errorCount;
@ -33,11 +55,53 @@
}
</script>
{#if jobs.length > 0}
{#if allJobs.length > 0}
<section class="jobs-list">
<h2>Bisherige Imports</h2>
<header class="list-header">
<h2>Bisherige Imports</h2>
<nav class="filter-tabs" aria-label="Filter">
<button
type="button"
class="tab"
class:tab-active={filter === 'all'}
onclick={() => (filter = 'all')}
>
Alle ({allJobs.length})
</button>
<button
type="button"
class="tab"
class:tab-active={filter === 'active'}
onclick={() => (filter = 'active')}
disabled={activeCount === 0}
>
Aktiv ({activeCount})
</button>
<button
type="button"
class="tab"
class:tab-active={filter === 'done'}
onclick={() => (filter = 'done')}
disabled={doneCount === 0}
>
Fertig ({doneCount})
</button>
<button
type="button"
class="tab"
class:tab-active={filter === 'errors'}
onclick={() => (filter = 'errors')}
disabled={errorCount === 0}
>
Mit Fehlern ({errorCount})
</button>
</nav>
</header>
{#if visibleJobs.length === 0}
<p class="empty-filter">Keine Jobs in dieser Ansicht.</p>
{/if}
<ul>
{#each jobs as job (job.id)}
{#each visibleJobs as job (job.id)}
<button type="button" class="row" onclick={() => goto(`/articles/import/${job.id}`)}>
<span class="status status-{job.status}">{statusLabel(job.status)}</span>
<span class="progress">{progress(job)}</span>
@ -63,10 +127,54 @@
margin: 1.5rem auto 0;
padding: 0 1.5rem;
}
.list-header {
display: flex;
gap: 0.85rem;
align-items: baseline;
flex-wrap: wrap;
margin-bottom: 0.65rem;
}
.jobs-list h2 {
margin: 0 0 0.65rem 0;
margin: 0;
font-size: 1.05rem;
}
.filter-tabs {
display: flex;
gap: 0.25rem;
flex-wrap: wrap;
}
.tab {
padding: 0.18rem 0.55rem;
border-radius: 999px;
border: 1px solid var(--color-border, rgba(0, 0, 0, 0.12));
background: transparent;
color: var(--color-text-muted, #64748b);
font: inherit;
font-size: 0.78rem;
cursor: pointer;
}
.tab:hover:not(:disabled) {
border-color: color-mix(in srgb, #f97316 60%, transparent);
color: inherit;
}
.tab:disabled {
opacity: 0.4;
cursor: not-allowed;
}
.tab-active {
background: #f97316;
color: white;
border-color: #f97316;
}
.tab-active:hover:not(:disabled) {
background: #ea580c;
color: white;
}
.empty-filter {
margin: 0.5rem 0 0 0;
color: var(--color-text-muted, #64748b);
font-size: 0.85rem;
}
.jobs-list ul {
list-style: none;
margin: 0;

@ -285,8 +285,6 @@ export function toImportJob(local: LocalArticleImportJob): ArticleImportJob {
id: local.id,
totalUrls: local.totalUrls,
status: local.status,
leasedBy: local.leasedBy ?? null,
leasedUntil: local.leasedUntil ?? null,
startedAt: local.startedAt ?? null,
finishedAt: local.finishedAt ?? null,
savedCount: local.savedCount ?? 0,

@ -26,6 +26,17 @@ import type {
// (BulkImportForm, tools.ts) keep working unchanged.
export { parseUrls, type ParsedUrls };
/**
* Hard cap on the URL count per job. The worker can chew through any
* number of items, but at very high counts the UI becomes unwieldy
* (JobDetailView is a flat list, no virtualisation yet) and the
* worst-case wall-clock duration climbs into the multi-hour range
* (50 URLs ≈ 5–10 min at concurrency 3, scales linearly). 200 is a
* pragmatic ceiling — real reading-list dumps from Pocket exports
* average 50–150 items.
*/
export const MAX_URLS_PER_JOB = 200;
export const articleImportsStore = {
/**
* Create a job with N items, all in state='pending'. Returns the
@ -39,14 +50,17 @@ export const articleImportsStore = {
if (urls.length === 0) {
throw new Error('createJob: empty url list');
}
if (urls.length > MAX_URLS_PER_JOB) {
throw new Error(
`createJob: too many URLs (${urls.length}). Max ${MAX_URLS_PER_JOB} pro Job — splitte den Import in mehrere Jobs.`
);
}
const jobId = crypto.randomUUID();
const job: LocalArticleImportJob = {
id: jobId,
totalUrls: urls.length,
status: 'queued',
leasedBy: null,
leasedUntil: null,
startedAt: null,
finishedAt: null,
savedCount: 0,

@ -165,10 +165,6 @@ export type ArticleImportItemState =
export interface LocalArticleImportJob extends BaseRecord {
totalUrls: number;
status: ArticleImportJobStatus;
/** Worker lease — workerId of the apps/api instance that claimed the job. */
leasedBy: string | null;
/** ISO timestamp; lease is dead once `leasedUntil < now`. */
leasedUntil: string | null;
startedAt: string | null;
finishedAt: string | null;
/** Counters mirror the per-item terminal states. Cache for fast list
@ -178,6 +174,10 @@ export interface LocalArticleImportJob extends BaseRecord {
duplicateCount: number;
errorCount: number;
warningCount: number;
// NOTE: `leasedBy` + `leasedUntil` were defined on the original
// schema as a soft-lease handshake but the worker uses
// pg_try_advisory_xact_lock instead, so they were never written.
// Removed in Dexie v60 — see database.ts.
}
export interface LocalArticleImportItem extends BaseRecord {
@ -227,8 +227,6 @@ export interface ArticleImportJob {
id: string;
totalUrls: number;
status: ArticleImportJobStatus;
leasedBy: string | null;
leasedUntil: string | null;
startedAt: string | null;
finishedAt: string | null;
savedCount: number;