Commit graph

3 commits

Author SHA1 Message Date
Till JS
b297f68ee4 fix(articles, mana-ai): rollout-block hardening for sync_changes projections
Four cross-cutting fixes that make the bulk-import worker safe to run
under real production load. All four were called out as live-rollout
risks in the post-ship review of docs/plans/articles-bulk-import.md.

#1 — Same fieldMetaTime bug fixed in mana-ai
   The articles fix in 054b9e5be hoists the helper to its own file
   `apps/api/src/modules/articles/field-meta.ts`. The same naive
   `rowFM[k] >= localTime` LWW comparison existed in three more
   projections under services/mana-ai (missions-projection,
   snapshot-refresh, agents-projection). Once any F3 stamp lands
   beside a legacy-string stamp, the comparison evaluates
   `'[object Object]' >= 'ISO-…'` (false) and the older value wins.
   New `services/mana-ai/src/db/field-meta.ts` — same helper,
   deliberately duplicated (each service treats sync_changes as a
   read-only event log; sharing infra across services is out of
   scope here). All 61 mana-ai bun tests still pass.

#2 — Stale 'extracting' items recycle
   If the worker dies mid-fetch (OOM, pod restart), items stay in
   state='extracting' forever and the job never completes. New sweep
   at the start of `processOneJob`: items whose lastAttemptAt is
   older than 5 minutes get bounced back to 'pending' so the next
   tick re-claims them. STALE_EXTRACTING_MS tuned for the 15s
   shared-rss fetch + JSDOM-parse worst case.

#3 — Pickup-row GC
   Every 30 ticks (~once per minute) the worker hard-deletes
   articleExtractPickup rows older than 24h. Without this a stuck
   pickup-consumer (all tabs closed, Web-Lock mismatch) would let
   sync_changes accumulate without bound. Logs the row count when
   non-zero so we can spot stuck consumers in the wild.

#4 — DRY consent-wall heuristic
   Identical CONSENT_KEYWORDS + threshold lived in routes.ts AND
   import-extractor.ts. Hoisted to
   `apps/api/src/modules/articles/consent-wall.ts`; both call sites
   now share one heuristic.

Plan: docs/plans/articles-bulk-import.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 00:53:39 +02:00
Till JS
efe1810b04 feat(articles): browser-HTML bookmarklet + consent-wall detection + auto-save
Three intertwined improvements so the "save an article" flow actually
works on real-world sites, not just bloggy happy-path URLs.

=== Consent-wall detection ===

apps/api/src/modules/articles/routes.ts: the /extract response now
includes `warning: 'probable_consent_wall'` when the extracted text
is both short (<300 words) AND contains cookie-dialog vocabulary
(Cookies zustimmen / cookie consent / Zustimmung / accept all cookies
/ enable javascript / privacy center / Datenschutzeinstellungen). The
server still returns whatever it got so the client can decide; it just
flags it as probably-not-the-article.

Frontend surfaces that warning prominently instead of silently
persisting a "Cookies zustimmen…" blob as the article body.

=== Browser-HTML extract path ===

Server-side: new POST /api/v1/articles/extract/html endpoint accepting
{ url, html }, running @mana/shared-rss's extractFromHtml on the
caller-supplied HTML. 10 MiB payload cap. Same response shape as
/extract, including the consent-wall warning (in case the bookmarklet
fires before the user dismisses the dialog).

Client-side: new extractFromHtml() in api.ts with the same 25s
timeout + typed network-error mapping as extractArticle.

AddUrlForm gains a postMessage handshake: when loaded with
?source=bookmarklet, it posts `mana-ready` to window.opener and
listens one-shot for `mana-html` with { url, html, title } from the
opener's tab. The HTML goes straight to our own /extract/html
endpoint — same-origin, carries the user's auth cookie. No CORS, no
form-submission CSP tango, no cross-origin token smuggling. If
nothing arrives within 30s we surface a clear error instead of
hanging.

Settings page adds a second "browser-HTML" bookmarklet (marked as
"Empfohlen") alongside the legacy URL bookmarklet. New snippet opens
/articles/add?source=bookmarklet in a new tab, waits for mana-ready,
then postMessages the tab's documentElement.outerHTML over. 15s
safety timeout.

This bypasses cookie-consent walls and soft paywalls because the
HTML already comes from the user's own authenticated, consented
browser tab.

=== Auto-save after successful extract ===

Previously every save path had a two-click UX: preview → confirm.
Now on clean extract the preview skips straight to persist + navigate
to the reader. Consent-wall warning is the only fallback that pauses
the flow — the user gets a "Trotzdem speichern" button to opt into
saving a teaser anyway.

Button in the manual input row is renamed "Vorschau abrufen" → "Speichern"
since it's now the commit action, not the inspect action. Loading-block
messaging distinguishes "Server extrahiert…" vs "Speichere in deine
Leseliste… Gleich weiter zum Reader."

Net click count:
  Bookmarklet v1/v2 on working site:  2 clicks → 1 click
  Manual paste:                        2 clicks → 1 click
  Consent-wall fallback:              2 clicks (explicit "Trotzdem")
  Duplicate:                          2 clicks ("Zum gespeicherten
                                        Artikel")

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 15:29:53 +02:00
Till JS
3357e88a1c feat(articles): new read-it-later module — save / read / highlight
Pocket-style module for saving arbitrary web URLs, extracting readable
content server-side via @mana/shared-rss (Readability + JSDOM), and
storing it AES-GCM encrypted in IndexedDB for offline reading.

M1 skeleton: Dexie v33 (articles, articleHighlights, articleTags),
crypto registry entries, module registration, app-registry entry with
orange icon, empty-state ListView. articleTags is a pure junction
into the existing globalTags system (appId 'tags') — same pattern as
noteTags, eventTags, placeTags.

M2 URL save + reader: POST /api/v1/articles/extract (one endpoint,
not two — client caches the preview payload to avoid a double
server fetch). AddUrlForm with scope-aware dedupe, DetailView with
ReaderView typography shell (serif/sans, light/sepia/dark, size
slider), auto-tracked reading progress with scroll restore.

M3 highlights: TreeWalker-based plain-text offset resolution
(lib/offsets.ts), highlights store, floating HighlightMenu with
create + edit modes, HighlightLayer orchestrator that wraps/unwraps
highlight spans whenever highlights or htmlVersion changes. Four
colours (yellow/green/blue/pink), optional notes, click-to-edit,
dark-mode-aware overlay colours.

Drive-by: removed stale 'pendingProposals' entry from the plaintext
allowlist — the table was dropped in Dexie v29 and the allowlist
audit was flagging it as a dead entry.

Plan: docs/plans/articles-module.md. M4 (tags + filter + progress),
M5 (news:type='saved' migration), M6 (AI tools), M7 (share target),
M8 (highlights view + stats) still open.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 16:20:23 +02:00