mirror of
https://github.com/Memo-2023/mana-monorepo.git
synced 2026-05-18 06:09:41 +02:00
news-ingester: marked as DEPRECATED (cutover to mana-news-pool)
Some checks are pending
CD Mac Mini / Detect Changes (push) Waiting to run
CD Mac Mini / Deploy (push) Blocked by required conditions
CI / Detect Changes (push) Waiting to run
CI / Validate (push) Waiting to run
CI / Build mana-search (push) Blocked by required conditions
CI / Build mana-sync (push) Blocked by required conditions
CI / Build mana-api-gateway (push) Blocked by required conditions
CI / Build mana-crawler (push) Blocked by required conditions
Docker Validate / Validate Dockerfiles (push) Waiting to run
Docker Validate / Build calendar-web (push) Blocked by required conditions
Docker Validate / Build quotes-web (push) Blocked by required conditions
Docker Validate / Build todo-backend (push) Blocked by required conditions
Docker Validate / Build todo-web (push) Blocked by required conditions
Docker Validate / Build mana-auth (push) Blocked by required conditions
Docker Validate / Build mana-sync (push) Blocked by required conditions
Docker Validate / Build mana-media (push) Blocked by required conditions
Mirror to Forgejo / Push to Forgejo (push) Waiting to run
CLAUDE.md rewritten: the service description had been misleading since the 2026-05-17 cutover (it referred to the container on :3066, which no longer runs, and said 'unified mana-api reads from the same table', where there is now an HTTP proxy).

Clear drop conditions for the whole directory are documented:
- mana-news-pool stable for 30 days (~2026-06-17)
- old news.curated_articles schema dropped

Until then, do not touch it: the source tree serves as a reference for the last managarten-owned source list.
This commit is contained in:
parent
5c47de8dd2
commit
501055a76c
1 changed files with 26 additions and 89 deletions
|
|
@@ -1,100 +1,37 @@
|
|||
# news-ingester
|
||||
# news-ingester — DEPRECATED 2026-05-17
|
||||
|
||||
Pulls public RSS/JSON feeds into `news.curated_articles` for the News Hub
module in the unified Mana app. The unified `mana-api` reads from the
same table to serve `GET /api/v1/news/feed`.
|
||||
> **This service was replaced on 2026-05-17 by
> [`mana-news-pool`](https://git.mana.how/mana/mana) (platform port 3079,
> own DB `mana_news_pool`, schema `pool.curated_articles`).**
|
||||
|
||||
## Tech Stack
|
||||
The `news-ingester:3066` container no longer runs.
`managarten/apps/api/src/modules/news/routes.ts` has been an HTTP proxy
to `MANA_NEWS_POOL_URL=http://mana-news-pool:3079` since commit `ad97c5362`.
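As a rough illustration of what such an HTTP proxy can look like, here is a minimal sketch using only the standard `fetch`, `Request`, and `URL` APIs. It is not the code from `routes.ts`: the function names are hypothetical, and it assumes the pool service exposes the same path and query string as the incoming request, which the README does not state. Only the `MANA_NEWS_POOL_URL` value is taken from the text above.

```typescript
// Value documented in the README for MANA_NEWS_POOL_URL.
const DEFAULT_POOL_URL = "http://mana-news-pool:3079";

// Hypothetical helper: keep path and query string, swap only the origin.
// Assumes the upstream uses the same paths as the incoming request.
function buildUpstreamUrl(incoming: string, poolBase: string = DEFAULT_POOL_URL): string {
  const src = new URL(incoming);
  return new URL(src.pathname + src.search, poolBase).toString();
}

// Hypothetical handler: forward the request and return the upstream response.
async function proxyToNewsPool(req: Request, poolBase: string = DEFAULT_POOL_URL): Promise<Response> {
  const headers = new Headers(req.headers);
  headers.delete("host"); // let fetch set the Host header for the upstream

  const hasBody = req.method !== "GET" && req.method !== "HEAD";
  return fetch(buildUpstreamUrl(req.url, poolBase), {
    method: req.method,
    headers,
    body: hasBody ? await req.arrayBuffer() : undefined,
  });
}
```

Keeping the URL rewrite in its own pure function makes the routing decision easy to check without a running upstream.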
|
||||
|
||||
| Layer | Technology |
|-------|------------|
| Runtime | Bun |
| Framework | Hono (only for health/status/manual trigger) |
| Database | PostgreSQL + Drizzle ORM (schema `news` in `mana_platform`) |
| Parsing | `rss-parser` for RSS/Atom, `@mozilla/readability` + `jsdom` for full-text fallback |
|
||||
The source list, ingest logic, and conventions now live in
`mana/services/mana-news-pool/` (see the `CLAUDE.md` there).
|
||||
|
||||
## Port: 3066
|
||||
## What is still here, and why
|
||||
|
||||
## What it does
|
||||
- **Source tree as reference**: `services/news-ingester/src/sources.ts`
  is the source list as of 2026-05-16. If anyone wants to check the drift
  between old and new sources in retrospect, this is the last
  managarten-owned version.
- **Dockerfile + package.json**: document the old pattern. They can be
  dropped during sprint cleanup.
|
||||
|
||||
On startup and every `TICK_INTERVAL_MS` (default 15 min):
|
||||
## Drop plan
|
||||
|
||||
1. For each source in `src/sources.ts`, fetch the feed (RSS or HN JSON).
2. Normalize items and dedupe by `sha256(originalUrl)` against the
   `url_hash` unique index, so re-runs are safe.
3. If the feed body has fewer than 200 words, fall back to Mozilla
   Readability against the original URL to get the full article text.
4. Insert into `news.curated_articles` with topic + source slug from the
   source definition. Topic classification is **static** (per-source);
   we do not run any content classifier.
5. Prune rows older than 30 days at the end of each tick.
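The decision points in steps 2, 3, and 5 of the old tick can be sketched as three small pure functions. This is an illustration under stated assumptions, not the ingester's actual code: the function names are hypothetical, the hex encoding of the hash is assumed, and "words" are assumed to be whitespace-separated tokens. Only the SHA-256 over the original URL, the 200-word threshold, and the 30-day retention come from the list above.

```typescript
import { createHash } from "node:crypto";

const MIN_FEED_WORDS = 200; // threshold from step 3
const RETENTION_DAYS = 30;  // retention from step 5

// Step 2: dedupe key. Hex encoding is an assumption; the README only says sha256(originalUrl).
function urlHash(originalUrl: string): string {
  return createHash("sha256").update(originalUrl).digest("hex");
}

// Step 3: fall back to full-text extraction when the feed body is too short.
// Counting whitespace-separated tokens is an assumed definition of "words".
function needsFullText(feedBody: string): boolean {
  const words = feedBody.trim().split(/\s+/).filter(Boolean);
  return words.length < MIN_FEED_WORDS;
}

// Step 5: rows older than this timestamp would be pruned at the end of a tick.
function pruneCutoff(now: Date): Date {
  return new Date(now.getTime() - RETENTION_DAYS * 24 * 60 * 60 * 1000);
}
```

Because the hash is deterministic, inserting the same URL twice collides on the unique index, which is what makes re-running a tick safe.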
|
||||
This directory can be deleted entirely once:
|
||||
|
||||
## API
|
||||
1. `mana-news-pool` has run stably for 30 days (~2026-06-17).
2. The old `mana_platform.news.curated_articles` schema has been dropped
   (see memory `project_news_pool_old_schema_drop`).
|
||||
|
||||
| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Healthcheck, returns 503 if Postgres is unreachable |
| GET | `/status` | Last tick result (sources, counts, duration) |
| POST | `/ingest/run` | Trigger an ingest tick now (returns immediately) |
|
||||
Until then: do not touch; it documents the cutover path.
|
||||
|
||||
No auth — service is internal-only behind the docker network.
|
||||
## Cross-Refs
|
||||
|
||||
## Adding a source
|
||||
|
||||
1. Append to `SOURCES` in `src/sources.ts` with a stable `slug`, type
   (`rss` or `hn`), URL, topic, and language.
2. Mirror the slug + name into the unified web app's onboarding picker
   at `apps/mana/apps/web/src/lib/modules/news/sources-meta.ts` so users
   can opt out of it. **Slugs must match**, because user blocklists
   reference them.
3. Restart the container and run `curl -X POST http://localhost:3066/ingest/run`
   to populate immediately.
|
||||
|
||||
## Topics
|
||||
|
||||
The seven shipped topics are: `tech`, `wissenschaft`, `weltgeschehen`,
`wirtschaft`, `kultur`, `gesundheit`, `politik`. Adding a new topic
means updating the `Topic` union in `src/sources.ts` AND the matching
type in the unified web app's `news/types.ts`.
|
||||
|
||||
## Database
|
||||
|
||||
Schema: `news` in `mana_platform`. Single table `curated_articles`,
indexed on `(topic, published_at)`, `(language, published_at)`,
`source_slug`, and `ingested_at`.

`bun run db:push` pushes the schema. The schema is intentionally NOT
referenced from `apps/api`: `apps/api/src/modules/news/routes.ts`
queries the table via raw SQL to keep the API service free of a Drizzle
schema dependency on this service.
|
||||
|
||||
## Environment Variables
|
||||
|
||||
```env
PORT=3066
DATABASE_URL=postgresql://mana:devpassword@localhost:5432/mana_platform
TICK_INTERVAL_MS=900000 # 15 minutes
RUN_ON_STARTUP=true
```
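For reference, a minimal sketch of how these variables could be read into a typed config object. The `IngesterConfig` shape and the `loadConfig` name are hypothetical, and treating `DATABASE_URL` as required and `RUN_ON_STARTUP` as defaulting to `true` are assumptions. The port 3066 and the 15-minute tick default are the values documented in this README.

```typescript
// Hypothetical config shape; field names are not taken from the service's source.
interface IngesterConfig {
  port: number;
  databaseUrl: string;
  tickIntervalMs: number;
  runOnStartup: boolean;
}

// Takes the environment as a plain object so it can be exercised without process.env.
function loadConfig(env: Record<string, string | undefined>): IngesterConfig {
  const databaseUrl = env.DATABASE_URL;
  if (!databaseUrl) {
    // Assumption: the service cannot run without a database.
    throw new Error("DATABASE_URL is required");
  }
  return {
    port: Number(env.PORT ?? 3066),                          // documented port
    databaseUrl,
    tickIntervalMs: Number(env.TICK_INTERVAL_MS ?? 900_000), // documented default: 15 minutes
    runOnStartup: (env.RUN_ON_STARTUP ?? "true") === "true", // assumed default
  };
}
```

In a Bun or Node process this would be called as `loadConfig(process.env)`.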
|
||||
|
||||
## Local Dev
|
||||
|
||||
```bash
cd services/news-ingester
bun install
bun run db:push          # creates news.curated_articles
bun run dev              # starts on :3066, ticks immediately
curl -X POST http://localhost:3066/ingest/run
curl http://localhost:3066/status | jq
```
|
||||
|
||||
## Privacy / Legal
|
||||
|
||||
Only public RSS feeds intended for syndication are ingested. The
`User-Agent` is `ManaNewsIngester/1.0 (+https://mana.how/news)` so site
owners can identify and contact us. Per-source rate limit is implicit
(15 min interval × ~30 items/source = ~2 req/min/source).

User reading behavior is **not** tracked here. Personalization happens
client-side in the unified Mana app's local IndexedDB; the ingester
only knows what was published, not what was read.
|
||||
- `mana/services/mana-news-pool/CLAUDE.md`: new service
- `managarten/apps/api/src/modules/news/routes.ts`: proxy implementation
- `mana/docs/MICROSERVICES_KANDIDATEN.md`: Lift-B plan
|
||||