Commit graph

19 commits

Author SHA1 Message Date
Till JS
8a5fad34df fix(geocoding): bump PROVIDER_TIMEOUT_MS to 20s for cold cross-LAN
Cold-start fetches from the mana-geocoding container to photon-self
on mana-gpu (over WSL2 mirrored networking) consistently take >10s on
the first probe and ~2s once warm. The previous 8s default caused the
chain to false-mark photon-self unhealthy on every cold path, leaking
to public photon for the next 30s health-cache window — and pinning
the public-photon answer in the 7d cache (now shortened to 1h).

Also wires the docker-compose macmini env to honor PROVIDER_TIMEOUT_MS
and CACHE_PUBLIC_TTL_MS overrides so production picks up the new
values without a code rebuild.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 22:19:21 +02:00
Till JS
2bbcf14aba chore(geocoding): remove Pelias + close 3 bypass paths to public Nominatim
Pelias was retired from the Mac mini on 2026-04-28; photon-self
(self-hosted Photon on mana-gpu) has been the live primary since then.
This removes the now-dead Pelias adapter, config, tests, and the
services/mana-geocoding/pelias/ stack — the entire compose file, the
geojsonify_place_details.js patch, the setup.sh import script.

Provider chain is now `photon-self → photon → nominatim`. The chain
keeps its `privacy: 'local' | 'public'` split, sensitive-query
blocking, coord quantization, and aggressive caching unchanged.

Three direct calls to nominatim.openstreetmap.org that bypassed
mana-geocoding now route through the wrapper:

- citycorners/add-city + citycorners/cities/[slug]/add use the shared
  searchAddress() client (browser → same-origin proxy → mana-geocoding
  → photon-self).
- memoro mobile drops its OSM reverse-geocoding fallback entirely;
  Expo's on-device reverse-geocoding stays as the sole path. Routing
  through the wrapper would require a memoro-server proxy endpoint —
  a follow-up if Expo's quality proves insufficient.

Other behavioral changes:

- CACHE_PUBLIC_TTL_MS dropped from 7d → 1h. The long TTL was a
  privacy-amplification trick from the Pelias era; with photon-self
  serving the bulk of traffic, a transient cross-LAN blip was pinning
  cached fallback answers for days. 1h gives quick recovery.
- /health/pelias renamed to /health/photon-self; prometheus blackbox
  config + status-page generator updated.
- mana-geocoding container no longer needs `extra_hosts:
  host.docker.internal:host-gateway` (was only there for the
  Pelias-on-host-network era).

113 tests passing. CLAUDE.md rewritten to reflect the post-Pelias
architecture.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 22:12:26 +02:00
Till JS
fc49198992 docs(geocoding): post-migration log + Photon weekly-refresh operator scripts
- Decision report: status flipped to MIGRATED; added migration log with
  five WSL2 gotchas (bzip2 missing, no official Photon image,
  firewall=true blocks cross-LAN, vmIdleTimeout=-1 ineffective,
  PowerShell pre-expansion of bash $(...)) and resource snapshot.
- mana-geocoding CLAUDE.md: PHOTON_SELF_API_URL note now reflects live
  primary status on mana-gpu since 2026-04-28.
- photon-self/: operator scripts for the weekly DB refresh — update.sh
  (atomic-swap with rollback), systemd unit + timer (Sun 03:30 +30min
  jitter, Persistent=true), README with re-installation instructions
  for DR. Currently installed and enabled on mana-gpu.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 21:31:37 +02:00
Till JS
153ad8049c feat(geocoding): support dual-Photon (self-hosted + public) for GPU
migration

The chain now distinguishes two Photon instances:
  photon-self  privacy: 'local'   (self-hosted on mana-gpu)
  photon       privacy: 'public'  (komoot.io, last-resort fallback)

Both wrap the same `PhotonProvider` class with different config — only
the URL, name, and privacy stance differ. The new ProviderName variant
'photon-self' lets the chain track per-provider health for them
independently (a single 'photon' slot would collide in the health
Map).

Opt-in registration: `photon-self` is only built when
PHOTON_SELF_API_URL is set in the env. When unset (current state),
the chain has the same shape as before — full backward compat. After
the GPU migration, flipping the env-var on is the only deploy step
needed:
  PHOTON_SELF_API_URL=http://192.168.178.11:2322

Default chain order updated to:
  photon-self,pelias,photon,nominatim
  ^^^^^^^^^^^ silently skipped if not registered (env unset)

The privacy guarantee is structural: photon-self carries privacy:
'local', so the existing sensitive-query block from the previous
hardening commit now has a real local backend post-migration —
medical/crisis-service queries get real results instead of the
"sensitive_local_unavailable" notice.

Tests: 148 (was 141). New coverage:
- src/__tests__/app.test.ts: createChain registration logic — verifies
  photon-self appears iff PHOTON_SELF_API_URL is set, ordering
  honored, GEOCODING_PROVIDERS env-var filter respected
- providers/__tests__/photon-normalizer.test.ts: provider field
  carries 'photon' or 'photon-self' based on the call argument

Recon of mana-gpu (2026-04-28): Windows 11 Pro Build 26200, 64 GB
RAM (56 GB free), 739 GB disk free, no WSL2/Docker yet, no native
GPU services running. Setup plan documented in
docs/runbooks/photon-on-mana-gpu.md (3–4 h, ~1 h of which is
download/unpack waiting).
2026-04-28 17:19:04 +02:00
Till JS
bcc21ca785 feat(geocoding): privacy hardening — sensitive-query block + coord
quantization + extended cache TTL for public answers

Three independent defenses limit what public geocoding APIs (Photon,
Nominatim) can learn from our outbound traffic:

1. **Sensitive-query block** (`lib/sensitive-query.ts`)
   Queries matching the medical/mental-health/crisis-service keyword
   list (Hausarzt, Psychiater, Klinikum, HIV, Frauenhaus, …) are
   never forwarded to public APIs. The chain detects sensitivity at
   the route layer and runs the search in localOnly mode — providers
   with `privacy: 'public'` are filtered out before iteration begins.
   When no local provider is available (Pelias stopped), a sensitive
   query returns ok:true with results:[] and notice:
   'sensitive_local_unavailable' so the UI can show a sensible
   message instead of "no results".

   The keyword list is documented inline. False negatives are the
   risk; false positives just produce a 0-result UX hit (better
   trade-off).

2. **Coordinate quantization** (`lib/privacy.ts`)
   Forward-search focus.lat/lon: rounded to 2 decimals (~1.1km).
     Enough for the bias to work, hides exact GPS.
   Reverse-geocoding lat/lon: rounded to 3 decimals (~110m).
     City-block resolution — sufficient for "what's near me?",
     avoids reverse-geocoding the user's exact front door.
   Pelias always gets full precision; quantization only on the way
   out to public APIs. New `privacy: 'local' | 'public'` field on
   the GeocodingProvider interface drives this.

3. **Extended cache TTL for public answers**
   New `cache.publicTtlMs` config option, default 7 days (vs. 24h
   for local-provider answers). LRU cache extended with optional
   `ttlOverrideMs` per entry. Same query from N users → 1 outbound
   request to Photon/Nominatim. Strongest privacy lever we have
   over public providers (we can't change their logging, only the
   rate at which we feed them queries).

Threat coverage:
   ✓ User IP / identity hidden (already true — wrapper is the proxy)
   ✓ Exact GPS hidden (quantization)
   ✓ Sensitive query content protected (block)
   ~ Non-sensitive query content visible (acceptable trade-off)
   ~ Aggregate profiling reduced ~10–100× (cache)
   ✗ TLS-level traffic analysis, compelled disclosure (out of scope)

Tests: 141 (was 115). New coverage:
- privacy.test.ts: quantization rules (locks the privacy claim)
- sensitive-query.test.ts: positive matches across categories +
  documented false positives we accept
- chain.test.ts: localOnly mode end-to-end including the load-
  bearing assertion that public providers' search() must NEVER be
  called when the chain is in localOnly mode (no race window)
- cache.test.ts: per-entry ttlOverride longer + shorter than default

Live smoke verified end-to-end:
- "Hausarzt Konstanz" with Pelias down → no public API call,
  notice: 'sensitive_local_unavailable'
- "Konstanz" → falls through to Photon, notice: 'fallback_used'
- Reverse with high-precision GPS → Photon receives quantized
  coords, returns city-block-level result
2026-04-28 16:04:56 +02:00
Till JS
9a0cf5b676 fix(geocoding): bump PROVIDER_TIMEOUT_MS default 5s → 8s
First-probe DNS+TLS handshake against Nominatim can take >5s on a
cold start (verified locally: 642ms warm, sometimes 5-8s cold). The
old 5s default false-marked Nominatim unhealthy and the 30s health-
cache then locked us into a fallback-of-fallback gap. 8s gives
enough headroom for cold-start while still cutting off actually-
stuck connections.

Photon and Pelias don't hit this — Photon's CDN is consistently
sub-second and Pelias is on localhost / LAN. Only the public
Nominatim path warranted the bump, but the timeout is per-provider
shared so we adjust it globally.

Existing PROVIDER_TIMEOUT_MS env override still wins.
2026-04-28 15:39:09 +02:00
Till JS
f1e4a39644 feat(geocoding): provider chain with Photon + Nominatim fallbacks
mana-geocoding now tries Pelias first, falls back to public Photon
(komoot.io) and finally to public Nominatim (OSM) when Pelias is
unhealthy or unreachable. The Places module's address lookup keeps
working even when the Pelias container is stopped — which it currently
is on the Mac mini, freeing 3 GB of RAM until Pelias gets migrated to
the GPU server.

Architecture:

  ProviderChain ─ tries providers in priority order, stops on first
                  success. A clean empty-results answer is definitive
                  (don't burn through public-API budget on a query that
                  legitimately has no match). Only network errors / 5xx
                  / 429 trigger fallthrough.

  HealthCache  ─ per-provider, 30s TTL. A failed health probe or a
                  failed search marks the provider unhealthy and skips
                  it for the rest of the cache window. Lazy refresh —
                  no background pinger.

  RateLimiter  ─ single-token FIFO queue, 1100ms gap by default.
                  Used to enforce Nominatim's 1 req/sec policy. Handles
                  abort during inter-task wait by releasing the busy
                  flag so later tasks aren't blocked.

Provider details:

  pelias    — primary, self-hosted DACH index, full OSM taxonomy in
              `peliasCategories`, no rate limit
  photon    — public komoot endpoint, GeoJSON shape, raw `osm_key:
              osm_value` mapped via lib/osm-category-map.ts. Faster
              than Nominatim, no advertised rate limit but be polite.
  nominatim — public OSM endpoint, strict 1 req/sec via the limiter,
              custom User-Agent required (otherwise 403). Last
              resort — fallback for when Photon is also down.

Response shape changes (additive only — existing callers keep
working):

  - results[].provider: 'pelias' | 'photon' | 'nominatim'
  - results[].peliasCategories: only present when Pelias served the
    request (was already absent on Pelias-API patch failures)
  - top-level provider: <name> + tried: <name[]> on success/error
  - new endpoint: GET /health/providers — per-provider snapshot

Configuration via env (defaults shipped):

  GEOCODING_PROVIDERS=pelias,photon,nominatim   # order matters
  PROVIDER_TIMEOUT_MS=5000
  PROVIDER_HEALTH_CACHE_MS=30000
  PHOTON_API_URL=https://photon.komoot.io
  NOMINATIM_API_URL=https://nominatim.openstreetmap.org
  NOMINATIM_USER_AGENT=mana-geocoding/1.0 (+https://mana.how; ...)
  NOMINATIM_INTERVAL_MS=1100

Testing: 115 tests green (was 42). New coverage:
  - osm-category-map.test.ts (47 cases over food/transit/shopping/
    leisure/work/other priority resolution)
  - rate-limiter.test.ts (FIFO, abort-during-wait, abort-during-sleep)
  - chain.test.ts (failover, empty-results-stops, health-cache,
    snapshot)
  - photon-normalizer.test.ts and nominatim-normalizer.test.ts (lock
    the wire-format mapping for both fallback providers)

Live smoke against public Photon verified — both /search and /reverse
return correctly normalized results with provider="photon" when Pelias
is unreachable.
2026-04-28 15:21:11 +02:00
Till JS
286e273b18 test(geocoding): add unit tests + end-to-end smoke test script
**Unit tests (`bun test`, 42 checks, 0 deps)**

- `src/lib/__tests__/category-map.test.ts` locks in the Pelias→
  PlaceCategory priority resolution. Covers the ambiguous multi-category
  case (food beats retail for restaurants, transit beats professional
  for car rentals, transport:rail still maps to transit, …), the simple
  single-category paths, the layer-hint fallback, and regression cases
  from real Konstanz/Stuttgart/Köln venues observed during deploy
  verification.
- `src/lib/__tests__/cache.test.ts` covers LRU eviction order, TTL
  expiry, move-to-end on get (so frequently-read entries survive
  eviction), size tracking, and typed-value storage.

**Smoke test (`./scripts/smoke-test.sh` or `bun run test:smoke`)**

End-to-end curls against a running service, aimed at post-deploy
verification. Health endpoints, forward (venue + street fallback),
focus biasing, reverse geocoding, cache hit. 9 checks total.

Wired up as `test:smoke` in package.json so it runs alongside the
unit tests. Verified working: 42/42 unit tests green locally, 9/9
smoke checks green against the live Mac Mini deployment.

CLAUDE.md Testing section rewritten to reflect the new test layers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 20:21:18 +02:00
Till JS
32d9f25e7f docs(geocoding): update CLAUDE.md with deploy lessons learned
After the 2026-04-11 production deploy, several non-obvious gotchas
surfaced that needed documenting:

- Forward search: autocomplete→search fallback explained, so future-me
  knows why the handler hits two Pelias endpoints for address-style
  queries.
- Pelias infra: corrected object counts (13.4M actual, not 22M), noted
  the libpostal RAM surprise (~1.9 GB, much larger than Pelias docs
  suggest), and added real per-container RAM numbers from production.
- pelias.json: document that we dropped placeholder/pip/interpolation
  (not just how to run them) and why the cleaner degradation matters.
- Wrapper gotchas section: Bun idleTimeout, Colima bind-mount cache
  staleness, and the host.docker.internal-from-blackbox workaround.
- /health/pelias endpoint is now listed in the API table since it's
  the integration point with blackbox monitoring.
- Testing section added — explicitly "no automated tests yet", with a
  curl-based manual smoke test set a human can run after changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:59:34 +02:00
Till JS
69ce4c2c25 feat(geocoding): fall back to Pelias /search when /autocomplete is empty
Pelias /autocomplete deliberately excludes the address layer as a
performance optimization, so queries like "Marktstätte Konstanz"
(street + locality) return 0 venue matches even though they're clearly
in the index. /search covers all layers including addresses and streets.

Query /autocomplete first (fast, fuzzy, great for venue names), and if
it returns nothing, try /search. Best of both worlds: quick matches for
"Konzil Restaurant" plus reliable matches for street addresses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:54:57 +02:00
Till JS
020f327503 fix(geocoding): drop unused Pelias services, raise Bun idleTimeout
Two production follow-ups surfaced after the deploy:

1. Pelias API was emitting continuous `ENOTFOUND placeholder`, `pip`,
   `interpolation` errors because we declared those services in
   pelias.json but never actually run them (we don't need WOF
   admin lookup or street interpolation for the DACH use case).
   Removed the stale entries — Pelias degrades cleanly to
   libpostal-only parsing, which is what we want.

2. Bun.serve's default idleTimeout is 10s, which is too tight for
   cold Pelias queries hitting Elasticsearch. Raise to 60s so
   first-query-after-idle doesn't get cut off.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:41:57 +02:00
Till JS
c47ce83e83 fix(geocoding): proxy Pelias health through wrapper for monitoring
blackbox-exporter can't resolve host.docker.internal on Colima, so
probes of host.docker.internal:4000 and :9200 always fail. Instead,
add a /health/pelias endpoint on the Hono wrapper that proxies to
the Pelias API, and update prometheus.yml to probe the wrapper's
proxied health endpoint.

Also simplifies the status page friendly_name() now that we don't
need to display the host.docker.internal targets.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 16:45:43 +02:00
Till JS
6977d189ab fix(geocoding): don't bind libpostal to host port 4400
Port 4400 collides with mana-infra-landings (status.mana.how nginx)
on the production mac mini. libpostal is only reached internally by
pelias-api over the pelias compose network anyway — no host binding
needed. Use expose instead of ports to drop the host mapping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 16:41:26 +02:00
Till JS
957060ca55 feat(monitoring): add mana-geocoding + Pelias to prod compose, Prometheus, Grafana, and status.mana.how
Production deployment + observability for the self-hosted geocoding stack:

**docker-compose.macmini.yml**
- New mana-geocoding container (port 3018, internal-only — no traefik
  labels, no Cloudflare route). Uses host.docker.internal to reach the
  Pelias API on the host's pelias compose stack. Dockerfile added under
  services/mana-geocoding/ using the same Bun/Hono pattern as mana-events.

**Prometheus**
- New blackbox-internal job probing mana-geocoding:3018/health, the
  Pelias API on host.docker.internal:4000/v1/status, and Elasticsearch
  at host.docker.internal:9200/_cluster/health. Kept separate from
  blackbox-api which is reserved for public HTTPS endpoints.

**status.mana.how (generate-status-page.sh)**
- Include blackbox-internal in the metric query and add an "Interne
  Dienste" section with its own summary card, right between Infrastruktur
  and GPU Dienste. Summary grid goes from 4 to 5 columns with a
  900px breakpoint.
- friendly_name() now handles http:// URLs and rewrites container-name
  hosts like mana-geocoding:3018/health → "Mana Geocoding",
  host.docker.internal:4000 → "Pelias API",
  host.docker.internal:9200 → "Pelias Elasticsearch".

**Grafana uptime dashboard**
- Add an "Internal" series to the "Alle Dienste — Uptime-Verlauf" panel
- New "Interne Dienste Status" table panel showing per-instance up/down
- New "Geocoding Ø Latenz" stat panel for probe_duration_seconds

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 16:11:01 +02:00
Till JS
f7de9fdf2d docs(geocoding): document the Pelias category patch + import gotchas
Expand services/mana-geocoding/CLAUDE.md with:
- The Pelias API patch (geojsonify_place_details.js) that forces the
  category field to always be returned, with regeneration instructions
- The priority-ordered Pelias→PlaceCategory mapping and verified
  example mappings from the DACH index
- A full initial-import walkthrough covering the non-obvious gotchas
  (analysis-icu plugin, dach-latest → planet-latest rename, adminLookup
  disabled, leveldbpath, libpostal config object form, boundary.country
  single-value constraint)

Also register mana-geocoding in the root services list.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 15:50:40 +02:00
Till JS
e82b5c1449 feat(geocoding): auto-categorize places via Pelias taxonomy
Pelias hides the 'category' field from API responses unless the
caller filters by categories=... explicitly — a default intended for
keyword search that strips category metadata from address queries.

Patch the Pelias API's geojsonify_place_details.js so the category
array is returned on every feature (food, retail, transport, …),
mounted into the container as a read-only volume override.

Rewrite category-map.ts to map Pelias' OSM taxonomy to our 7
PlaceCategories using a priority-ordered list so a restaurant
tagged ['food','retail','nightlife'] resolves to 'food' (the most
specific), not 'shopping'.

Verified with Konstanz test queries:
  Konzil Restaurant        → food
  Bahnhof Konstanz         → transit
  Physiotherapie-Schule    → work
  MX-Park                  → leisure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 15:48:24 +02:00
Till JS
1943a1d13c fix(geocoding): Pelias config for DACH-only import + single-country filter
After importing 22M OSM objects for the DACH extract:
- Disable adminLookup (no WOF data needed for address search)
- Configure leveldb path inside the data volume
- Specify planet-latest.osm.pbf as the import filename
- Convert libpostal service config from string to object form
- Drop boundary.country default — Pelias only accepts a single
  country value, and our index only contains DACH data anyway

Verified forward + reverse geocoding work end-to-end for Konstanz
test queries via the mana-geocoding wrapper on port 3018.

Known limitation: OSM category/type (amenity:restaurant etc.) is
not yet populated in Pelias responses — will require whitelisting
those tags in the importer config and re-running the import.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 04:58:55 +02:00
Till JS
82f58e44fa A11y 2026-04-10 23:04:39 +02:00
Till JS
a47a7bfdba feat(places): add self-hosted geocoding with Pelias (DACH)
New mana-geocoding service (port 3018) wraps a self-hosted Pelias
instance with LRU caching and OSM→PlaceCategory auto-mapping.
All geocoding queries stay within our infrastructure — no user
location data leaves the network.

Places module integration:
- Address autocomplete search in ListView (creates place with
  name, coords, address, category in one step)
- Address search + reverse geocoding button in DetailView
- Auto-fill address via reverse geocoding during tracking
- OSM category mapping (amenity:restaurant→food, shop:*→shopping, etc.)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 23:02:25 +02:00