Commit graph

2 commits

Author SHA1 Message Date
Till JS
bcc21ca785 feat(geocoding): privacy hardening — sensitive-query block + coord
quantization + extended cache TTL for public answers

Three independent defenses limit what public geocoding APIs (Photon,
Nominatim) can learn from our outbound traffic:

1. **Sensitive-query block** (`lib/sensitive-query.ts`)
   Queries matching the medical/mental-health/crisis-service keyword
   list (Hausarzt, Psychiater, Klinikum, HIV, Frauenhaus, …) are
   never forwarded to public APIs. The chain detects sensitivity at
   the route layer and runs the search in localOnly mode — providers
   with `privacy: 'public'` are filtered out before iteration begins.
   When no local provider is available (Pelias stopped), a sensitive
   query returns ok:true with results:[] and notice:
   'sensitive_local_unavailable' so the UI can show a sensible
   message instead of "no results".

   The keyword list is documented inline. False negatives are the
   risk; false positives just produce a 0-result UX hit (better
   trade-off).

2. **Coordinate quantization** (`lib/privacy.ts`)
   Forward-search focus.lat/lon: rounded to 2 decimals (~1.1km).
     Enough for the bias to work, hides exact GPS.
   Reverse-geocoding lat/lon: rounded to 3 decimals (~110m).
     City-block resolution — sufficient for "what's near me?",
     avoids reverse-geocoding the user's exact front door.
   Pelias always gets full precision; quantization only on the way
   out to public APIs. New `privacy: 'local' | 'public'` field on
   the GeocodingProvider interface drives this.

3. **Extended cache TTL for public answers**
   New `cache.publicTtlMs` config option, default 7 days (vs. 24h
   for local-provider answers). LRU cache extended with optional
   `ttlOverrideMs` per entry. Same query from N users → 1 outbound
   request to Photon/Nominatim. Strongest privacy lever we have
   over public providers (we can't change their logging, only the
   rate at which we feed them queries).

Threat coverage:
   ✓ User IP / identity hidden (already true — wrapper is the proxy)
   ✓ Exact GPS hidden (quantization)
   ✓ Sensitive query content protected (block)
   ~ Non-sensitive query content visible (acceptable trade-off)
   ~ Aggregate profiling reduced ~10–100× (cache)
   ✗ TLS-level traffic analysis, compelled disclosure (out of scope)

Tests: 141 (was 115). New coverage:
- privacy.test.ts: quantization rules (locks the privacy claim)
- sensitive-query.test.ts: positive matches across categories +
  documented false positives we accept
- chain.test.ts: localOnly mode end-to-end including the load-
  bearing assertion that public providers' search() must NEVER be
  called when the chain is in localOnly mode (no race window)
- cache.test.ts: per-entry ttlOverride longer + shorter than default

Live smoke verified end-to-end:
- "Hausarzt Konstanz" with Pelias down → no public API call,
  notice: 'sensitive_local_unavailable'
- "Konstanz" → falls through to Photon, notice: 'fallback_used'
- Reverse with high-precision GPS → Photon receives quantized
  coords, returns city-block-level result
2026-04-28 16:04:56 +02:00
Till JS
f1e4a39644 feat(geocoding): provider chain with Photon + Nominatim fallbacks
mana-geocoding now tries Pelias first, falls back to public Photon
(komoot.io) and finally to public Nominatim (OSM) when Pelias is
unhealthy or unreachable. The Places module's address lookup keeps
working even when the Pelias container is stopped — which it currently
is on the Mac mini, freeing 3 GB of RAM until Pelias gets migrated to
the GPU server.

Architecture:

  ProviderChain ─ tries providers in priority order, stops on first
                  success. A clean empty-results answer is definitive
                  (don't burn through public-API budget on a query that
                  legitimately has no match). Only network errors / 5xx
                  / 429 trigger fallthrough.

  HealthCache  ─ per-provider, 30s TTL. A failed health probe or a
                  failed search marks the provider unhealthy and skips
                  it for the rest of the cache window. Lazy refresh —
                  no background pinger.

  RateLimiter  ─ single-token FIFO queue, 1100ms gap by default.
                  Used to enforce Nominatim's 1 req/sec policy. Handles
                  abort during inter-task wait by releasing the busy
                  flag so later tasks aren't blocked.

Provider details:

  pelias    — primary, self-hosted DACH index, full OSM taxonomy in
              `peliasCategories`, no rate limit
  photon    — public komoot endpoint, GeoJSON shape, raw `osm_key:
              osm_value` mapped via lib/osm-category-map.ts. Faster
              than Nominatim, no advertised rate limit but be polite.
  nominatim — public OSM endpoint, strict 1 req/sec via the limiter,
              custom User-Agent required (otherwise 403). Last
              resort — fallback for when Photon is also down.

Response shape changes (additive only — existing callers keep
working):

  - results[].provider: 'pelias' | 'photon' | 'nominatim'
  - results[].peliasCategories: only present when Pelias served the
    request (was already absent on Pelias-API patch failures)
  - top-level provider: <name> + tried: <name[]> on success/error
  - new endpoint: GET /health/providers — per-provider snapshot

Configuration via env (defaults shipped):

  GEOCODING_PROVIDERS=pelias,photon,nominatim   # order matters
  PROVIDER_TIMEOUT_MS=5000
  PROVIDER_HEALTH_CACHE_MS=30000
  PHOTON_API_URL=https://photon.komoot.io
  NOMINATIM_API_URL=https://nominatim.openstreetmap.org
  NOMINATIM_USER_AGENT=mana-geocoding/1.0 (+https://mana.how; ...)
  NOMINATIM_INTERVAL_MS=1100

Testing: 115 tests green (was 42). New coverage:
  - osm-category-map.test.ts (47 cases over food/transit/shopping/
    leisure/work/other priority resolution)
  - rate-limiter.test.ts (FIFO, abort-during-wait, abort-during-sleep)
  - chain.test.ts (failover, empty-results-stops, health-cache,
    snapshot)
  - photon-normalizer.test.ts and nominatim-normalizer.test.ts (lock
    the wire-format mapping for both fallback providers)

Live smoke against public Photon verified — both /search and /reverse
return correctly normalized results with provider="photon" when Pelias
is unreachable.
2026-04-28 15:21:11 +02:00