Polish-pass on top of the bulk-import rollout. Five contained items.
#8 + #9 — Dexie v60 schema cleanup
- Drop articleImportJobs.leasedBy + .leasedUntil. They were defined
on the original v57 schema as a soft-lease handshake, but the
worker uses pg_try_advisory_xact_lock and never wrote them.
Local-* type + projection row stripped.
- Drop the standalone `state` index on articleImportItems.
[jobId+state] covers the worker's hot query; the state-solo
index had no call site.
Both changes are lossless: the fields are simply no longer declared
or written for new rows, while existing rows still carry the dead
nulls (zombies) until the next full row rewrite. Not worth a hard
migration for two never-written columns.
#15 — MAX_URLS_PER_JOB hard cap (200)
articleImportsStore.createJob() throws if the URL list exceeds the
cap. BulkImportForm surfaces the limit in the live counter chip
and disables the submit when over. The worker can chew through any
N, but at high counts the UI gets unwieldy (no virtualisation) and
wall-clock duration climbs into multiple hours. 200 is a pragmatic
ceiling; Pocket-export dumps average 50–150.
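
A minimal sketch of the cap check described above. Only MAX_URLS_PER_JOB and the value 200 come from this commit; `assertWithinCap` and `counterChip` are hypothetical names standing in for the logic inside createJob() and BulkImportForm, whose real signatures are not shown here.

```typescript
// Hard cap from this commit.
const MAX_URLS_PER_JOB = 200;

// Hypothetical guard: the store-side check that makes createJob() throw.
function assertWithinCap(urls: readonly string[]): void {
  if (urls.length > MAX_URLS_PER_JOB) {
    throw new Error(
      `Too many URLs: ${urls.length} exceeds the cap of ${MAX_URLS_PER_JOB}`
    );
  }
}

// Hypothetical view-model for the live counter chip and the submit button.
function counterChip(count: number): { label: string; submitDisabled: boolean } {
  return {
    label: `${count} / ${MAX_URLS_PER_JOB}`,
    // The source only states "disabled when over"; disabling on an empty
    // list is an extra assumption of this sketch.
    submitDisabled: count === 0 || count > MAX_URLS_PER_JOB,
  };
}
```

Keeping the check in the store as well as in the form means a caller that bypasses the UI still hits the limit.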
#13 — Filter-Tabs in JobsList
Pill-style tabs above the list: Alle / Aktiv / Fertig / Mit Fehlern
(All / Active / Done / With errors),
each with the row count. Disabled when the bucket is empty so the
user only sees actionable filters. The "Mit Fehlern" filter
(errorCount > 0) is the most valuable for triage.
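
The tab logic could be sketched roughly as follows. The `errorCount > 0` predicate and the four labels come from this commit; the `JobRow` shape, the `'queued'` status, and the definition of "Aktiv" as "not done" are assumptions made for illustration only.

```typescript
// Assumed status set; only 'running' and 'done' appear in the commit log.
type JobStatus = 'queued' | 'running' | 'done';

interface JobRow {
  id: string;
  status: JobStatus;
  errorCount: number;
}

type TabKey = 'Alle' | 'Aktiv' | 'Fertig' | 'Mit Fehlern';

// One predicate per tab; "Aktiv" as "anything not done" is a guess.
const TAB_FILTERS: Record<TabKey, (job: JobRow) => boolean> = {
  'Alle': () => true,
  'Aktiv': (job) => job.status !== 'done',
  'Fertig': (job) => job.status === 'done',
  'Mit Fehlern': (job) => job.errorCount > 0,
};

// Each tab carries its row count and is disabled when its bucket is empty.
function buildTabs(
  jobs: readonly JobRow[]
): { key: TabKey; count: number; disabled: boolean }[] {
  return (Object.keys(TAB_FILTERS) as TabKey[]).map((key) => {
    const count = jobs.filter(TAB_FILTERS[key]).length;
    return { key, count, disabled: count === 0 };
  });
}
```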
#18 — apps/mana/CLAUDE.md
- Articles row added to the Tool Coverage table (5 propose +
1 auto, including the new auto-policy import_articles_from_urls).
- New "Articles bulk-import" section after the AI Workbench part:
pipeline diagram, table list, actor + metrics + cap pointers.
#20 — ARTICLES_IMPORT_WORKER_DISABLED env var documented
New row under "Mana API — Articles Bulk-Import Worker" in
docs/ENVIRONMENT_VARIABLES.md.
Plan: docs/plans/articles-bulk-import.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#5 — SYSTEM_ARTICLES_IMPORT_WORKER hoisted into @mana/shared-ai
The worker built its actor inline, bypassing the SystemSource union
that's the blessed list for system-write principals. Now uses
makeSystemActor(SYSTEM_ARTICLES_IMPORT_WORKER) like every other
server-side system writer (mission-runner, projection, …).
#7 — sync-db helper hoisted out of mcp/ into lib/
Implementation moved to apps/api/src/lib/sync-db.ts; mcp/sync-db.ts
is a re-export shim so existing MCP imports keep working. Articles
bulk-import + future modules import from lib/ directly — no more
"articles depending on mcp" layering smell.
#11 — Prometheus metrics for the worker
New counters + histogram in lib/metrics.ts under
mana_api_articles_import_*:
- ticks_total{result=processed|skipped|error}
- items_total{result=extracted|error|consent_wall|cancelled}
- extract_duration_seconds (histogram, 0.25–30s buckets)
- jobs_completed_total{result=done}
- pickup_gc_rows_total
Worker tick + extractor instrumented at the right transition points.
A pickup_gc_rows_total that keeps increasing in steady state signals
a stuck consumer somewhere, which makes it a useful operator alert.
Plan: docs/plans/articles-bulk-import.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four cross-cutting fixes that make the bulk-import worker safe to run
under real production load. All four were called out as live-rollout
risks in the post-ship review of docs/plans/articles-bulk-import.md.
#1 — Same fieldMetaTime bug fixed in mana-ai
The articles fix in 054b9e5be hoists the helper to its own file
`apps/api/src/modules/articles/field-meta.ts`. The same naive
`rowFM[k] >= localTime` LWW comparison existed in three more
projections under services/mana-ai (missions-projection,
snapshot-refresh, agents-projection). Once a legacy-string stamp is
compared against an F3 object stamp, the comparison evaluates
`'ISO-…' >= '[object Object]'` (false) and the older value wins.
New `services/mana-ai/src/db/field-meta.ts` — same helper,
deliberately duplicated (each service treats sync_changes as a
read-only event log; sharing infra across services is out of
scope here). All 61 mana-ai bun tests still pass.
#2 — Stale 'extracting' items recycle
If the worker dies mid-fetch (OOM, pod restart), items stay in
state='extracting' forever and the job never completes. New sweep
at the start of `processOneJob`: items whose lastAttemptAt is
older than 5 minutes get bounced back to 'pending' so the next
tick re-claims them. STALE_EXTRACTING_MS tuned for the 15s
shared-rss fetch + JSDOM-parse worst case.
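
An in-memory sketch of the sweep, assuming items carry an ISO `lastAttemptAt`. STALE_EXTRACTING_MS and the 5-minute threshold come from this commit; the `ImportItem` shape, the full state list, and `recycleStaleExtracting` are hypothetical, and the real sweep works on rows inside `processOneJob` rather than on a plain array.

```typescript
// 5 minutes, per the commit message.
const STALE_EXTRACTING_MS = 5 * 60 * 1000;

// Assumed state set; the log mentions pending, extracting, extracted, saved.
type ItemState = 'pending' | 'extracting' | 'extracted' | 'saved' | 'error';

interface ImportItem {
  id: string;
  state: ItemState;
  lastAttemptAt: string | null; // ISO timestamp of the last claim
}

// Bounce items stuck in 'extracting' back to 'pending' so the next tick
// can re-claim them. Items without a lastAttemptAt are left untouched.
function recycleStaleExtracting(items: readonly ImportItem[], nowMs: number): ImportItem[] {
  return items.map((item): ImportItem => {
    if (item.state !== 'extracting' || item.lastAttemptAt === null) return item;
    const ageMs = nowMs - Date.parse(item.lastAttemptAt);
    return ageMs > STALE_EXTRACTING_MS ? { ...item, state: 'pending' } : item;
  });
}
```

Passing `nowMs` in, instead of calling `Date.now()` inside, keeps the sweep deterministic under test.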
#3 — Pickup-row GC
Every 30 ticks (~once per minute) the worker hard-deletes
articleExtractPickup rows older than 24h. Without this a stuck
pickup-consumer (all tabs closed, Web-Lock mismatch) would let
sync_changes accumulate without bound. Logs the row count when
non-zero so we can spot stuck consumers in the wild.
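
The cadence and cutoff can be sketched as below. The 30-tick interval and the 24h age limit come from this commit ("once per minute" suggests a tick of roughly 2 seconds, which is an inference); `PickupRow`, `shouldRunGc`, and `splitExpired` are hypothetical names, and the real worker deletes rows in the database rather than filtering an array.

```typescript
const GC_EVERY_N_TICKS = 30;                   // per the commit message
const PICKUP_MAX_AGE_MS = 24 * 60 * 60 * 1000; // 24h

// Assumed row shape; only the table name articleExtractPickup is from the log.
interface PickupRow {
  id: string;
  createdAt: string; // ISO timestamp
}

// Run the GC on every 30th tick, skipping tick 0.
function shouldRunGc(tickCount: number): boolean {
  return tickCount > 0 && tickCount % GC_EVERY_N_TICKS === 0;
}

// Partition rows into those to keep and those older than the cutoff.
function splitExpired(
  rows: readonly PickupRow[],
  nowMs: number
): { keep: PickupRow[]; expired: PickupRow[] } {
  const keep: PickupRow[] = [];
  const expired: PickupRow[] = [];
  for (const row of rows) {
    const ageMs = nowMs - Date.parse(row.createdAt);
    (ageMs > PICKUP_MAX_AGE_MS ? expired : keep).push(row);
  }
  return { keep, expired };
}
```

Logging `expired.length` only when it is non-zero matches the behaviour described above and keeps the healthy path quiet.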
#4 — DRY consent-wall heuristic
Identical CONSENT_KEYWORDS + threshold lived in routes.ts AND
import-extractor.ts. Hoisted to
`apps/api/src/modules/articles/consent-wall.ts`; both call sites
now share one heuristic.
Plan: docs/plans/articles-bulk-import.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live-test caught it: the worker projects sync_changes via field-level
LWW, comparing `field_meta[k]` directly. But field_meta is two-shaped
on the wire:
- Legacy plaintext writes: { state: '2026-04-28T…' }
- Field-meta-overhaul writes: { state: { at, actor, origin } }
The naive `rowFM[k] >= localTime` worked for the all-legacy case, but
once a client write (legacy string) followed a worker write (F3
object), the comparison evaluated `'2026-04-28T…' >= '[object …]'`
and the projection silently kept the older value. Live symptom: an
item that was correctly flipped to 'saved' on the client was reported
back as 'extracted' by the projection.
Fix: `fieldMetaTime()` helper that pulls the ISO string out of either
shape; both write paths now compare apples-to-apples.
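
A sketch of what such a helper might look like, together with the failing comparison it replaces. The two wire shapes and the name `fieldMetaTime` come from this commit; the exact types, the empty-string fallback, the string-typed `actor`, and `incomingWins` are assumptions. It also assumes all stamps are fixed-width UTC ISO strings, so lexicographic order matches chronological order.

```typescript
// Legacy writes carry a bare ISO string; F3 writes carry an object.
type FieldStamp = string | { at: string; actor?: string; origin?: string };

// Pull the ISO timestamp out of either shape ('' when nothing usable exists).
function fieldMetaTime(stamp: FieldStamp | null | undefined): string {
  if (typeof stamp === 'string') return stamp;
  if (stamp && typeof stamp.at === 'string') return stamp.at;
  return '';
}

// Field-level LWW: the incoming write wins unless it is strictly older.
function incomingWins(
  incoming: FieldStamp | undefined,
  current: FieldStamp | undefined
): boolean {
  return fieldMetaTime(incoming) >= fieldMetaTime(current);
}

// Worker write (F3 object) followed five seconds later by a client write
// (legacy string). The 'origin' value is a placeholder.
const workerStamp: FieldStamp = {
  at: '2026-04-28T10:00:00.000Z',
  actor: 'system:articles-import-worker',
  origin: 'server',
};
const clientStamp: FieldStamp = '2026-04-28T10:00:05.000Z';

// Naive comparison: the object coerces to '[object Object]', and '2' sorts
// before '[', so the newer client write loses.
const naiveResult: boolean = (clientStamp as any) >= (workerStamp as any);

// With the helper, both sides are ISO strings and the newer write wins.
const fixedResult: boolean = incomingWins(clientStamp, workerStamp);
```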
Verified end-to-end:
- Synthetic job + item written into sync_changes
- runTickOnce() → claim → extractFromUrl(example.com) → pickup row
with title='Example Domain', wordCount=16, actor=
system:articles-import-worker
- Item transitions pending → extracting → extracted
- Simulated client write 'saved'
- Next tick rolls counters: savedCount 0→1, status running→done,
finishedAt stamped
Plan: docs/plans/articles-bulk-import.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>