Commit graph

2 commits

Author SHA1 Message Date
Till JS
2bbcf14aba chore(geocoding): remove Pelias + close 3 bypass paths to public Nominatim
Pelias was retired from the Mac mini on 2026-04-28; photon-self
(self-hosted Photon on mana-gpu) has been the live primary since then.
This removes the now-dead Pelias adapter, config, tests, and the
services/mana-geocoding/pelias/ stack — the entire compose file, the
geojsonify_place_details.js patch, the setup.sh import script.

Provider chain is now `photon-self → photon → nominatim`. The chain
keeps its `privacy: 'local' | 'public'` split, sensitive-query
blocking, coord quantization, and aggressive caching unchanged.
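
A minimal sketch of the ordered fallback described above, assuming each provider exposes an async `search()` and that both a thrown error and an empty answer fall through to the next provider. `ChainProvider`, `ChainOutcome`, `searchChain`, and the stub providers are hypothetical names for illustration; only the provider order and the `privacy` field come from this commit.

```typescript
// Hypothetical shapes for illustration; only the provider names, their order,
// and the `privacy` field come from the commit message.
interface ChainProvider {
  name: string;
  privacy: 'local' | 'public';
  search(query: string): Promise<string[]>;
}

interface ChainOutcome {
  results: string[];
  answeredBy: string | null; // null when every provider failed or came back empty
}

// Walk the providers in order and return the first non-empty answer.
// Assumption: a thrown error and an empty result both fall through.
async function searchChain(
  providers: ChainProvider[],
  query: string,
): Promise<ChainOutcome> {
  for (const provider of providers) {
    try {
      const results = await provider.search(query);
      if (results.length > 0) {
        return { results, answeredBy: provider.name };
      }
    } catch {
      // Provider unreachable or erroring: try the next one in the chain.
    }
  }
  return { results: [], answeredBy: null };
}

// Stub providers standing in for the real adapters.
const stub = (
  name: string,
  privacy: 'local' | 'public',
  answer: () => string[],
): ChainProvider => ({
  name,
  privacy,
  search: async () => answer(),
});

const chain: ChainProvider[] = [
  stub('photon-self', 'local', () => {
    throw new Error('unreachable');
  }),
  stub('photon', 'public', () => ['Konstanz, Germany']),
  stub('nominatim', 'public', () => ['Konstanz, Deutschland']),
];

// With photon-self down, the first public provider answers.
searchChain(chain, 'Konstanz').then((outcome) => console.log(outcome.answeredBy)); // logs "photon"
```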

Three direct calls to nominatim.openstreetmap.org that bypassed
mana-geocoding are closed. Two now route through the wrapper and one
is removed:

- citycorners/add-city + citycorners/cities/[slug]/add use the shared
  searchAddress() client (browser → same-origin proxy → mana-geocoding
  → photon-self).
- memoro mobile drops its OSM reverse-geocoding fallback entirely;
  Expo's on-device reverse-geocoding stays as the sole path. Routing
  through the wrapper would require a memoro-server proxy endpoint —
  a follow-up if Expo's quality proves insufficient.

Other behavioral changes:

- CACHE_PUBLIC_TTL_MS dropped from 7d to 1h. The long TTL was a
  privacy-amplification trick from the Pelias era. With photon-self
  serving the bulk of traffic it mostly caused harm: a transient
  cross-LAN blip would pin cached public-fallback answers for days.
  1h gives quick recovery.
- /health/pelias renamed to /health/photon-self; prometheus blackbox
  config + status-page generator updated.
- mana-geocoding container no longer needs `extra_hosts:
  host.docker.internal:host-gateway` (it was only needed while
  Pelias ran on the host network).

113 tests passing. CLAUDE.md rewritten to reflect the post-Pelias
architecture.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 22:12:26 +02:00
Till JS
bcc21ca785 feat(geocoding): privacy hardening — sensitive-query block + coord
quantization + extended cache TTL for public answers

Three independent defenses limit what public geocoding APIs (Photon,
Nominatim) can learn from our outbound traffic:

1. **Sensitive-query block** (`lib/sensitive-query.ts`)
   Queries matching the medical/mental-health/crisis-service keyword
   list (Hausarzt, Psychiater, Klinikum, HIV, Frauenhaus, …) are
   never forwarded to public APIs. The chain detects sensitivity at
   the route layer and runs the search in localOnly mode — providers
   with `privacy: 'public'` are filtered out before iteration begins.
   When no local provider is available (e.g. Pelias is stopped), a
   sensitive query returns ok:true with results:[] and notice:
   'sensitive_local_unavailable' so the UI can show a sensible
   message instead of "no results".

   The keyword list is documented inline. False negatives are the
   real risk, since a missed keyword sends the query to a public
   API. A false positive only costs an empty result when no local
   provider is up, which is the better trade-off.

2. **Coordinate quantization** (`lib/privacy.ts`)
   - Forward-search focus.lat/lon: rounded to 2 decimals (~1.1 km).
     Enough for the bias to work, hides exact GPS.
   - Reverse-geocoding lat/lon: rounded to 3 decimals (~110 m).
     City-block resolution, sufficient for "what's near me?" while
     avoiding reverse-geocoding the user's exact front door.
   Pelias always gets full precision; quantization applies only on
   the way out to public APIs. New `privacy: 'local' | 'public'`
   field on the GeocodingProvider interface drives this.

3. **Extended cache TTL for public answers**
   New `cache.publicTtlMs` config option, default 7 days (vs. 24h
   for local-provider answers). LRU cache extended with optional
   `ttlOverrideMs` per entry. Same query from N users → 1 outbound
   request to Photon/Nominatim. Strongest privacy lever we have
   over public providers (we can't change their logging, only the
   rate at which we feed them queries).
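
The per-entry TTL override from item 3 can be sketched roughly as follows. This is an illustrative LRU built on `Map` insertion order, not the actual cache in this repo: `TtlLruCache`, its constructor arguments, and the injectable clock are assumptions. Only the `ttlOverrideMs` idea and the 24h / 7d figures come from the message above.

```typescript
interface TtlEntry<V> {
  value: V;
  expiresAt: number; // absolute timestamp in ms
}

// Hypothetical LRU cache with an optional per-entry TTL override.
class TtlLruCache<V> {
  private entries = new Map<string, TtlEntry<V>>();
  private maxEntries: number;
  private defaultTtlMs: number;
  private now: () => number;

  constructor(maxEntries: number, defaultTtlMs: number, now: () => number = () => Date.now()) {
    this.maxEntries = maxEntries;
    this.defaultTtlMs = defaultTtlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    // Re-insert so the key becomes most recently used (Map keeps insertion order).
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V, ttlOverrideMs?: number): void {
    const ttl = ttlOverrideMs ?? this.defaultTtlMs;
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.now() + ttl });
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}

const HOUR_MS = 60 * 60 * 1000;
const LOCAL_TTL_MS = 24 * HOUR_MS; // default for local-provider answers
const PUBLIC_TTL_MS = 7 * 24 * HOUR_MS; // cache.publicTtlMs default in this commit

const geocodeCache = new TtlLruCache<string[]>(1000, LOCAL_TTL_MS);
geocodeCache.set('search:konstanz', ['Konstanz'], PUBLIC_TTL_MS); // answer came from a public provider
```

A longer TTL on public answers means repeated queries are served from the cache instead of generating new outbound requests, which is the effect item 3 relies on.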

Threat coverage:
   ✓ User IP / identity hidden (already true — wrapper is the proxy)
   ✓ Exact GPS hidden (quantization)
   ✓ Sensitive query content protected (block)
   ~ Non-sensitive query content visible (acceptable trade-off)
   ~ Aggregate profiling reduced ~10–100× (cache)
   ✗ TLS-level traffic analysis, compelled disclosure (out of scope)
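
The quantization rule behind the "Exact GPS hidden" line can be sketched as below. One degree of latitude is roughly 111 km, so 2 decimals give about 1.1 km and 3 decimals about 110 m (longitude spacing is smaller away from the equator). The function and type names here are hypothetical and not taken from `lib/privacy.ts`; only the decimal counts and the local/public split come from the commit message.

```typescript
interface LatLon {
  lat: number;
  lon: number;
}

// Decimal places sent to public providers, per the commit message.
const FORWARD_FOCUS_DECIMALS = 2; // ~1.1 km
const REVERSE_DECIMALS = 3; // ~110 m

// Round a number to a fixed count of decimal places.
function roundTo(value: number, decimals: number): number {
  const factor = 10 ** decimals;
  return Math.round(value * factor) / factor;
}

// Hypothetical helper: local providers get full precision,
// public providers get rounded coordinates.
function quantizeForProvider(
  coords: LatLon,
  kind: 'forward' | 'reverse',
  privacy: 'local' | 'public',
): LatLon {
  if (privacy === 'local') return coords;
  const decimals = kind === 'forward' ? FORWARD_FOCUS_DECIMALS : REVERSE_DECIMALS;
  return { lat: roundTo(coords.lat, decimals), lon: roundTo(coords.lon, decimals) };
}

const exact: LatLon = { lat: 47.663219, lon: 9.175432 };
console.log(quantizeForProvider(exact, 'forward', 'public')); // { lat: 47.66, lon: 9.18 }
console.log(quantizeForProvider(exact, 'reverse', 'public')); // { lat: 47.663, lon: 9.175 }
console.log(quantizeForProvider(exact, 'reverse', 'local')); // unchanged
```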

Tests: 141 (was 115). New coverage:
- privacy.test.ts: quantization rules (locks the privacy claim)
- sensitive-query.test.ts: positive matches across categories +
  documented false positives we accept
- chain.test.ts: localOnly mode end-to-end including the load-
  bearing assertion that public providers' search() must NEVER be
  called when the chain is in localOnly mode (no race window)
- cache.test.ts: per-entry ttlOverride longer + shorter than default
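
The localOnly guarantee exercised by chain.test.ts can be illustrated with a small sketch: if public providers are removed before iteration begins, the chain has nothing public left to call. The keyword matcher below is an assumption (case-insensitive substring match over a handful of the keywords named above), and `isSensitiveQuery`, `selectProviders`, and `ProviderInfo` are hypothetical names rather than the contents of `lib/sensitive-query.ts`.

```typescript
// Sample keywords from the commit message, lowercased; the real list is longer.
const SENSITIVE_KEYWORDS: readonly string[] = [
  'hausarzt',
  'psychiater',
  'klinikum',
  'hiv',
  'frauenhaus',
];

// Assumed strategy: case-insensitive substring match, which over-matches on purpose.
function isSensitiveQuery(query: string): boolean {
  const normalized = query.toLowerCase();
  return SENSITIVE_KEYWORDS.some((keyword) => normalized.includes(keyword));
}

interface ProviderInfo {
  name: string;
  privacy: 'local' | 'public';
}

// Decide which providers the chain may iterate over for this query.
// Public providers are dropped up front when the query is sensitive.
function selectProviders<P extends ProviderInfo>(
  providers: P[],
  query: string,
): { localOnly: boolean; candidates: P[] } {
  const localOnly = isSensitiveQuery(query);
  const candidates = localOnly
    ? providers.filter((p) => p.privacy === 'local')
    : providers;
  return { localOnly, candidates };
}

const providers: ProviderInfo[] = [
  { name: 'pelias', privacy: 'local' },
  { name: 'photon', privacy: 'public' },
  { name: 'nominatim', privacy: 'public' },
];

console.log(selectProviders(providers, 'Hausarzt Konstanz').candidates.map((p) => p.name)); // [ 'pelias' ]
console.log(selectProviders(providers, 'Konstanz').candidates.length); // 3
```

With substring matching, a query such as "Stadtarchiv Konstanz" is also flagged because "archiv" contains "hiv". That is the kind of false positive this design tolerates in exchange for fewer false negatives.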

Live smoke test verified end-to-end:
- "Hausarzt Konstanz" with Pelias down → no public API call,
  notice: 'sensitive_local_unavailable'
- "Konstanz" → falls through to Photon, notice: 'fallback_used'
- Reverse with high-precision GPS → Photon receives quantized
  coords, returns city-block-level result
2026-04-28 16:04:56 +02:00