Commit graph

5 commits

Author SHA1 Message Date
Till JS
b768a0ffce refactor(shared-rss): extract RSS parsing + Readability into one package
news-ingester and apps/api both shipped their own copy of rss-parser
+ jsdom + Readability glue. Single source now in packages/shared-rss.
Adds discoverFeeds (rel=alternate + common-paths probe) and validateFeed
which News Research will use. JSDOM virtualConsole is silenced once,
in the package, instead of in two parallel call sites.

- packages/shared-rss: parse, extract, discover, validate
- services/news-ingester: drop local parsers, depend on @mana/shared-rss
- apps/api: drop @mozilla/readability + jsdom direct deps, use shared

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 22:30:44 +02:00
Till JS
52159ee07a fix(news-ingester): disable Readability fallback to break crash loop
JSDOM throws CSS / parser errors from detached parse5 callbacks that
escape every try/catch in the call stack and even bun's
process.on('uncaughtException') handlers — leaving the daemon stuck
crash-looping past the first bad page in source #4 (heise) without
ever making forward progress.

Set FULL_TEXT_THRESHOLD_WORDS = 0 so we never call into Readability.
Sources that ship full RSS bodies (Tagesschau, Spiegel, BBC, …) are
unaffected. Title-only sources (Hacker News) keep the row with an
empty content field; the reader already falls back to "Original
öffnen ↗" in that case.

Re-enabling extraction in a worker thread is left for a follow-up.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:21:09 +02:00
Till JS
dad174a631 fix(news-ingester): silence JSDOM CSS errors + add process-level safety net
JSDOM's CSS parser throws on plenty of real-world pages and the error
escapes every try/catch in the buildRow → ingestSource chain because
it fires from a parse5 callback that runs after JSDOM has returned.
In the prod container this killed the process on the first bad page,
docker restarted it, and it crash-looped on the same first source
forever — no progress past tech.

Two-layer fix: a silent VirtualConsole on every JSDOM instance to
swallow CSS / resource errors at the source, plus process-level
uncaughtException + unhandledRejection handlers that log and continue
so any future async escape can't kill the daemon either.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:15:46 +02:00
Till JS
68d1bda7e5 fix(news-ingester): drop unused @mana/shared-hono workspace dep
Was copied verbatim from mana-credits' template but not actually
imported anywhere in src/. Removing it lets the Docker build's bun
install resolve from npm only — workspace:* refs need the full
monorepo context which the Dockerfile doesn't copy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 16:11:58 +02:00
Till JS
9ef97a1877 feat(news): backend ingester service + curated feed API
Adds the services/news-ingester Bun service that pulls 25 public RSS/JSON
feeds into news.curated_articles every 15 min, with Mozilla Readability
fallback for thin RSS bodies and 30-day retention. apps/api /feed is
rewritten to read from the new pool table directly instead of the
sync_changes hack, with topics/lang/since/limit/offset query params.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 15:53:26 +02:00