mirror of
https://github.com/Memo-2023/mana-monorepo.git
synced 2026-05-16 02:19:41 +02:00
Three intertwined improvements so the "save an article" flow actually
works on real-world sites, not just bloggy happy-path URLs.
=== Consent-wall detection ===
apps/api/src/modules/articles/routes.ts: the /extract response now
includes `warning: 'probable_consent_wall'` when the extracted text
is both short (<300 words) AND contains cookie-dialog vocabulary
(Cookies zustimmen / cookie consent / Zustimmung / accept all cookies
/ enable javascript / privacy center / Datenschutzeinstellungen). The
server still returns whatever it got so the client can decide; it just
flags it as probably-not-the-article.
Frontend surfaces that warning prominently instead of silently
persisting a "Cookies zustimmen…" blob as the article body.
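The heuristic described above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual code in routes.ts — the names `CONSENT_VOCAB`, `MIN_WORDS`, and `looksLikeConsentWall` are hypothetical:

```typescript
// Sketch of the consent-wall heuristic: short text AND consent vocabulary.
const CONSENT_VOCAB = [
  'cookies zustimmen',
  'cookie consent',
  'zustimmung',
  'accept all cookies',
  'enable javascript',
  'privacy center',
  'datenschutzeinstellungen',
];

const MIN_WORDS = 300;

function looksLikeConsentWall(text: string): boolean {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  if (words >= MIN_WORDS) return false; // long extractions are assumed genuine
  const lower = text.toLowerCase();
  return CONSENT_VOCAB.some((phrase) => lower.includes(phrase));
}
```

Both conditions must hold, so a short but legitimate article without the dialog vocabulary is not flagged.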
=== Browser-HTML extract path ===
Server-side: new POST /api/v1/articles/extract/html endpoint accepting
{ url, html }, running @mana/shared-rss's extractFromHtml on the
caller-supplied HTML. 10 MiB payload cap. Same response shape as
/extract, including the consent-wall warning (in case the bookmarklet
fires before the user dismisses the dialog).
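The 10 MiB cap presumably has to be measured in bytes, not string length; a minimal sketch of such a check (the constant and helper names are illustrative, not the actual server code):

```typescript
// Sketch of the /extract/html payload cap. Byte length matters:
// multi-byte UTF-8 pages would otherwise slip past a .length check.
const MAX_HTML_BYTES = 10 * 1024 * 1024; // 10 MiB

function htmlWithinCap(html: string): boolean {
  return new TextEncoder().encode(html).length <= MAX_HTML_BYTES;
}
```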
Client-side: new extractFromHtml() in api.ts with the same 25s
timeout + typed network-error mapping as extractArticle.
AddUrlForm gains a postMessage handshake: when loaded with
?source=bookmarklet, it posts `mana-ready` to window.opener and
listens one-shot for `mana-html` with { url, html, title } from the
opener's tab. The HTML goes straight to our own /extract/html
endpoint — same-origin, carries the user's auth cookie. No CORS, no
form-submission CSP tango, no cross-origin token smuggling. If
nothing arrives within 30s we surface a clear error instead of
hanging.
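Since the message arrives via postMessage from an arbitrary opener, AddUrlForm needs to validate the payload before trusting it. A hypothetical sketch of that validation — the exact message shape and the names `ManaHtmlMessage` / `parseManaHtml` are assumptions, not the actual implementation:

```typescript
// Sketch: validate the one-shot `mana-html` payload from the opener tab.
interface ManaHtmlMessage {
  type: 'mana-html';
  url: string;
  html: string;
  title?: string;
}

function parseManaHtml(data: unknown): ManaHtmlMessage | null {
  if (typeof data !== 'object' || data === null) return null;
  const msg = data as Record<string, unknown>;
  if (msg.type !== 'mana-html') return null;
  if (typeof msg.url !== 'string' || typeof msg.html !== 'string') return null;
  return {
    type: 'mana-html',
    url: msg.url,
    html: msg.html,
    title: typeof msg.title === 'string' ? msg.title : undefined,
  };
}
```

A real listener would also check `event.origin` before parsing, and remove itself after the first valid message.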
Settings page adds a second "browser-HTML" bookmarklet (marked as
"Empfohlen") alongside the legacy URL bookmarklet. New snippet opens
/articles/add?source=bookmarklet in a new tab, waits for mana-ready,
then postMessages the tab's documentElement.outerHTML over. 15s
safety timeout.
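A sketch of how the settings page might assemble that bookmarklet URL. Everything here is illustrative — the function name, the minified snippet, and the `{ type: 'mana-html', … }` message shape are assumptions about the actual generated code (it also assumes `appOrigin` contains no quote characters):

```typescript
// Sketch: build the browser-HTML bookmarklet. Opens the add page,
// waits for `mana-ready`, then posts the current tab's HTML over,
// with the 15s safety timeout described above.
function buildHtmlBookmarklet(appOrigin: string): string {
  const src =
    `(()=>{const w=open('${appOrigin}/articles/add?source=bookmarklet');` +
    `const t=setTimeout(()=>removeEventListener('message',h),15000);` +
    `const h=(e)=>{if(e.origin==='${appOrigin}'&&e.data==='mana-ready'){` +
    `clearTimeout(t);removeEventListener('message',h);` +
    `w.postMessage({type:'mana-html',url:location.href,title:document.title,` +
    `html:document.documentElement.outerHTML},'${appOrigin}');}};` +
    `addEventListener('message',h);})()`;
  return `javascript:${encodeURIComponent(src)}`;
}
```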
This bypasses cookie-consent walls and soft paywalls because the
HTML already comes from the user's own authenticated, consented
browser tab.
=== Auto-save after successful extract ===
Previously every save path had a two-click UX: preview → confirm.
Now on clean extract the preview skips straight to persist + navigate
to the reader. Consent-wall warning is the only fallback that pauses
the flow — the user gets a "Trotzdem speichern" button to opt into
saving a teaser anyway.
Button in the manual input row is renamed "Vorschau abrufen" → "Speichern"
since it's now the commit action, not the inspect action. Loading-block
messaging distinguishes "Server extrahiert…" vs "Speichere in deine
Leseliste… Gleich weiter zum Reader."
Net click count:
Bookmarklet v1/v2 on working site: 2 clicks → 1 click
Manual paste: 2 clicks → 1 click
Consent-wall fallback: 2 clicks (explicit "Trotzdem")
Duplicate: 2 clicks ("Zum gespeicherten Artikel")
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
131 lines
4.3 KiB
TypeScript
/**
 * Articles API client — talks to apps/api `/api/v1/articles/*`.
 *
 * A single endpoint (`POST /extract`) responds with the Readability
 * result. Both the preview (AddUrlForm) and the direct save paths
 * share the same call; the client chooses whether to show the result
 * or immediately persist.
 *
 * Auth + base-URL handling mirrors news/api.ts — see that file for the
 * full rationale on why we read `getManaApiUrl()` and
 * `authStore.getValidToken()` instead of the cookie/env shortcuts.
 */

import { authStore } from '$lib/stores/auth.svelte';
import { getManaApiUrl } from '$lib/api/config';

async function authHeader(): Promise<Record<string, string>> {
  const token = await authStore.getValidToken();
  return token ? { Authorization: `Bearer ${token}` } : {};
}

export interface ExtractedArticle {
  originalUrl: string;
  title: string;
  excerpt: string | null;
  content: string;
  htmlContent: string;
  author: string | null;
  siteName: string | null;
  wordCount: number;
  readingTimeMinutes: number;
  /**
   * Server-side quality flag. Today only `'probable_consent_wall'` is
   * emitted: the extracted text was suspiciously short AND contained
   * consent-dialog vocabulary, which typically means the server's
   * anonymous fetch hit a GDPR interstitial instead of the article.
   * The client uses this to offer the bookmarklet-v2 (browser-HTML)
   * path without silently persisting garbage.
   */
  warning?: 'probable_consent_wall';
}

/**
 * Hard client-side timeout for the extract roundtrip. The server's
 * own Readability fetch has a 15s timeout + a few seconds of JSDOM
 * parse overhead; anything past 25s on the wire is almost certainly a
 * dead server or a stuck network path, not a slow article. Without
 * this, AddUrlForm's loader just sat there forever when the API was
 * unreachable — hence the bookmarklet-lands-on-loader bug.
 */
const EXTRACT_TIMEOUT_MS = 25_000;

export async function extractArticle(
  url: string,
  fetchImpl: typeof fetch = fetch
): Promise<ExtractedArticle> {
  let response: Response;
  try {
    response = await fetchImpl(`${getManaApiUrl()}/api/v1/articles/extract`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        ...(await authHeader()),
      },
      body: JSON.stringify({ url }),
      signal: AbortSignal.timeout(EXTRACT_TIMEOUT_MS),
    });
  } catch (err) {
    if (err instanceof DOMException && err.name === 'TimeoutError') {
      throw new Error(
        `Server antwortet nicht (nach ${EXTRACT_TIMEOUT_MS / 1000}s). Läuft apps/api?`
      );
    }
    if (err instanceof TypeError) {
      // Network-layer failure (connection refused, DNS, offline).
      throw new Error(
        `Server nicht erreichbar. Prüf dass apps/api läuft — pnpm run mana:dev startet beides.`
      );
    }
    throw err;
  }
  if (!response.ok) {
    const text = await response.text();
    throw new Error(`extractArticle failed: ${response.status} ${text}`);
  }
  return (await response.json()) as ExtractedArticle;
}

/**
 * Extract from an HTML payload the browser already has. Used by the
 * bookmarklet-v2 flow — the user's browser already dealt with the
 * cookie-consent wall, so we skip the server-side fetch entirely.
 *
 * The HTML cap is 10 MiB on the server; the browser sends
 * `document.documentElement.outerHTML`, which for typical article
 * pages is 200-800 KB, well under the limit.
 */
export async function extractFromHtml(
  url: string,
  html: string,
  fetchImpl: typeof fetch = fetch
): Promise<ExtractedArticle> {
  let response: Response;
  try {
    response = await fetchImpl(`${getManaApiUrl()}/api/v1/articles/extract/html`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        ...(await authHeader()),
      },
      body: JSON.stringify({ url, html }),
      signal: AbortSignal.timeout(EXTRACT_TIMEOUT_MS),
    });
  } catch (err) {
    if (err instanceof DOMException && err.name === 'TimeoutError') {
      throw new Error(
        `Server antwortet nicht (nach ${EXTRACT_TIMEOUT_MS / 1000}s). Läuft apps/api?`
      );
    }
    if (err instanceof TypeError) {
      // Network-layer failure (connection refused, DNS, offline).
      throw new Error(
        `Server nicht erreichbar. Prüf dass apps/api läuft — pnpm run mana:dev startet beides.`
      );
    }
    throw err;
  }
  if (!response.ok) {
    const text = await response.text();
    throw new Error(`extractFromHtml failed: ${response.status} ${text}`);
  }
  return (await response.json()) as ExtractedArticle;
}
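The `fetchImpl` parameter on both functions makes them easy to exercise without a live server (e.g. with SvelteKit's load `fetch` or a test double). A hypothetical canned-fetch helper — the name `cannedFetch` is illustrative, not part of the codebase:

```typescript
// Sketch: build a stub fetch that returns a canned JSON body, suitable
// for passing as the fetchImpl argument of extractArticle/extractFromHtml.
function cannedFetch(payload: unknown, status = 200): typeof fetch {
  return async () =>
    new Response(JSON.stringify(payload), {
      status,
      headers: { 'Content-Type': 'application/json' },
    });
}
```

For example, `extractArticle('https://example.com', cannedFetch(fakeArticle))` resolves with `fakeArticle`, while `cannedFetch({}, 500)` drives the `!response.ok` branch.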