managarten/docs/reports/geocoding-self-hosting-2026-04-28.md
Till JS fc49198992 docs(geocoding): post-migration log + Photon weekly-refresh operator scripts
- Decision report: status flipped to MIGRATED; added migration log with
  five WSL2 gotchas (bzip2 missing, no official Photon image,
  firewall=true blocks cross-LAN, vmIdleTimeout=-1 ineffective,
  PowerShell pre-expansion of bash $(...)) and resource snapshot.
- mana-geocoding CLAUDE.md: PHOTON_SELF_API_URL note now reflects live
  primary status on mana-gpu since 2026-04-28.
- photon-self/: operator scripts for the weekly DB refresh — update.sh
  (atomic-swap with rollback), systemd unit + timer (Sun 03:30 +30min
  jitter, Persistent=true), README with re-installation instructions
  for DR. Currently installed and enabled on mana-gpu.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 21:31:37 +02:00


# Geocoding Self-Hosting — Decision Report
**Status:** ✅ MIGRATED — Photon-on-mana-gpu live since 2026-04-28 19:27 CEST
**Date:** 2026-04-28
**Context:** Pelias was retired from the Mac mini on 2026-04-28 (3 GB RAM was crushing the host into 8.6 GB swap). The wrapper now serves all queries through public Photon + Nominatim, with sensitive-query blocking + coord quantization as privacy mitigations. We need a self-hosted geocoder back in the chain so sensitive queries (`Hausarzt`, `Klinikum`, …) don't return zero results when the user actually wants them, and so we don't depend on a third party for routine address lookups.
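
The coordinate quantization mentioned above can be pictured as rounding a position onto a coarse grid before it is sent to a public provider. The following is only a sketch: the helper name `quantizeCoord` and the two-decimal default are assumptions for illustration, not the wrapper's actual code.

```typescript
interface LatLon {
  lat: number;
  lon: number;
}

// Hypothetical helper: round a coordinate to `decimals` decimal places.
// The default of 2 is an assumed value; the wrapper's real setting may differ.
function quantizeCoord(coord: LatLon, decimals: number = 2): LatLon {
  const factor = Math.pow(10, decimals);
  return {
    lat: Math.round(coord.lat * factor) / factor,
    lon: Math.round(coord.lon * factor) / factor,
  };
}

// A point in Konstanz, coarsened before it leaves our network.
const coarse = quantizeCoord({ lat: 47.66033, lon: 9.17582 });
console.log(coarse.lat, coarse.lon); // 47.66 9.18
```

At two decimals the latitude resolution is roughly 1.1 km, which is usually enough for "near me" results while hiding the exact address.
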
---
## TL;DR
**Self-host [Photon](https://github.com/komoot/photon) (Europe-wide) on `mana-gpu`.**
- **Disk:** ~80 GB unpacked (we have it on the GPU server)
- **RAM:** 4-8 GB Java heap (negligible vs the Mac mini's 3 GB Pelias overhead)
- **Setup:** download a pre-built tarball from GraphHopper, `docker run`, point the wrapper at it. **No PBF import, no patching, no Elasticsearch container to babysit.**
- **Updates:** weekly re-download of the latest dump, ~30 min of cron + `docker restart`
- **Maintenance:** single Java process, no schema migration, no admin lookups, no sensitive config
This replaces Pelias entirely. Once it's running, Photon becomes a **`privacy: 'local'`** provider and the sensitive-query block now has a real local backend to fall back to — meaning users can search for medical/crisis services without hitting public OSM services at all.
Pelias does not return.
---
## Decision criteria
In rough priority order:
1. **Privacy fit** — must serve sensitive queries (Hausarzt, Psychiater, …) without leaking to a third party. Means we need a `privacy: 'local'` provider.
2. **Operational cost** — every minute spent on geocoding is a minute not spent on Mana itself. Setup, updates, recovery from breakage.
3. **Resource fit** — must coexist with STT/TTS/Image-Gen/Video-Gen/Ollama on the GPU server without GPU-pass-through conflicts.
4. **DACH data quality** — German addresses + venue names. Compound-word handling ("Münsterplatz"), umlauts, postcode formats.
5. **API surface** — autocomplete (typing-fast suggestions), forward search, reverse geocoding. Categories nice-to-have.
6. **Reuse of existing wrapper code** — we already have provider adapters for Pelias, Photon, Nominatim. Anything that doesn't match one of those means new code.
---
## Candidates
### 1. Pelias (current, retired)
| | |
|---|---|
| **RAM** | ~3.2 GB (libpostal: 2 GB, ES: 1.2 GB, API: 100 MB) |
| **Disk** | ~5 GB ES index |
| **Setup** | 4 docker services + manual `dach-latest.osm.pbf` rename + `analysis-icu` plugin install + 30-45 min import + patched `geojsonify_place_details.js` |
| **Updates** | Manual re-import (30-45 min) every few weeks |
| **Wire format** | Multi-tag categories (`food/retail/nightlife`) — richest of the three |
| **Privacy** | `local` (self-hosted) |
| **Pre-built data** | None — must run the importer |
**Verdict:** the multi-tag taxonomy is genuinely useful but everything else is friction. The patched JS file (overriding `condition: checkCategoryParam` → `condition: () => true`) is a permanent maintenance liability — it has to be regenerated on every Pelias API image bump. There is no operational reason to bring Pelias back.
### 2. Nominatim
| | |
|---|---|
| **RAM** | 12 GB during import for Germany alone; 2 GB minimum to run; 128 GB recommended for planet |
| **Disk** | **~100 GB for Germany alone** (per [user reports](https://github.com/mediagis/nominatim-docker/discussions/265)); 1 TB for planet |
| **Setup** | One docker-compose (Postgres + Nominatim worker), 8-12 h import for Germany |
| **Updates** | OSM replication via differential updates (continuous) |
| **Wire format** | `class:type` raw OSM tags (already mapped in our `osm-category-map.ts`) |
| **Privacy** | `local` |
| **Pre-built data** | None — must run the importer |
**Verdict:** the disk number is the killer. **100 GB for Germany alone** is wildly disproportionate for our use case (mostly DACH addresses + restaurant names), driven by the flatnode file plus the rich admin-boundary indexing Nominatim does. The 8-12 h import is also bad — every geographic data refresh becomes a half-day operation. Used by OSM itself and Wikipedia, so quality is unquestionable, but the resource fit is wrong for a side service.
### 3. Photon (recommended)
| | |
|---|---|
| **RAM** | 4-8 GB Java heap, configurable via `-Xmx`; a planet-wide deployment recommends 64 GB, but Europe runs comfortably on 6-8 GB |
| **Disk** | **5.8 GB for Germany dump (compressed), 30.6 GB for full Europe v1.x dump** ([GraphHopper downloads](https://download1.graphhopper.com/public/europe/index.html)). Unpacks to ~80 GB for Europe. |
| **Setup** | `docker run`, mount the unpacked dump, expose port 2322. **No PBF import.** |
| **Updates** | **Weekly pre-built dumps from GraphHopper.** Download new tar.bz2, restart. ~30 min total operator time. |
| **Wire format** | `osm_key:osm_value` raw OSM tags (already mapped) |
| **Privacy** | `local` once self-hosted |
| **Pre-built data** | **Yes — country, region, and planet, refreshed weekly** |
**Verdict:** the "pre-built index" is the deciding feature. It collapses the entire data-pipeline complexity that Pelias and Nominatim ask us to manage. Java 21 + embedded OpenSearch in a single process. The wire format already matches our existing `PhotonProvider` adapter — switching from "public Photon" to "self-hosted Photon" is literally an env-var change.
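
To illustrate why the `osm_key:osm_value` wire format needs no new adapter code, here is a minimal sketch of a tag-to-category lookup. The `osm_key` and `osm_value` property names are what Photon returns on each feature; the table entries and the `categoryFor` helper are illustrative assumptions and do not reproduce the real `osm-category-map.ts`.

```typescript
// Subset of the properties Photon attaches to each GeoJSON feature.
interface PhotonProperties {
  osm_key: string;
  osm_value: string;
  name?: string;
}

// Illustrative entries only (assumed); the real table lives in osm-category-map.ts.
const CATEGORY_BY_TAG: Record<string, string> = {
  "amenity:restaurant": "food",
  "amenity:doctors": "health",
  "shop:supermarket": "retail",
};

// Join key and value into one lookup tag; unknown tags fall back to "other".
function categoryFor(props: PhotonProperties): string {
  const tag = `${props.osm_key}:${props.osm_value}`;
  return CATEGORY_BY_TAG[tag] ?? "other";
}

console.log(categoryFor({ osm_key: "amenity", osm_value: "restaurant" })); // food
```

Because the lookup depends only on the raw OSM tags, it behaves the same whether the feature came from public Photon or from the self-hosted instance.
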
---
## Resource comparison summary
| Tool | Setup time | RAM (steady) | Disk | Update mechanism | Maintenance burden |
|---|---|---|---|---|---|
| **Pelias DACH** | 30-45 min import + patch hack | 3.2 GB | 5 GB | Manual re-import | High (4 containers, JS patch) |
| **Nominatim Germany** | 8-12 h import | 2-4 GB | **~100 GB** | OSM replication | Medium (Postgres tuning) |
| **Photon Europe** | 5-10 min download | 4-8 GB | 30 GB → 80 GB unpacked | **Weekly tarball** | Low (1 container, no DB) |
| **Photon Germany** | 2-5 min download | 2-4 GB | 5.8 GB → ~15 GB unpacked | Weekly tarball | Low |
For DACH+ scope, Photon-Germany is the lightest option that still covers all our users. Photon-Europe is the only-marginally-heavier option that future-proofs against any non-DACH user (events module, travel scenarios).
---
## Privacy implications
Currently the wrapper has two `privacy: 'public'` providers (Photon, Nominatim) and zero `local` ones (Pelias is stopped). A sensitive query like "Hausarzt Konstanz" returns 0 results with `notice: 'sensitive_local_unavailable'` — privacy-correct but UX-painful.
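
The current behaviour can be sketched as follows. The term list uses only the examples named in this report, and the function names, the substring match, and the `publicLookup` callback are assumptions; the wrapper's real detection logic is not shown here.

```typescript
// Terms taken from this report's examples; the real list is assumed to be longer.
const SENSITIVE_TERMS: readonly string[] = ["hausarzt", "klinikum", "psychiater"];

function isSensitiveQuery(query: string): boolean {
  const normalized = query.toLowerCase();
  return SENSITIVE_TERMS.some((term) => normalized.indexOf(term) !== -1);
}

interface GeocodeResponse {
  results: string[];
  notice?: "sensitive_local_unavailable";
}

// With no local provider available, a sensitive query is answered empty
// instead of being forwarded to a public provider.
function answerWithoutLocalProvider(
  query: string,
  publicLookup: (q: string) => string[], // hypothetical stand-in for the public chain
): GeocodeResponse {
  if (isSensitiveQuery(query)) {
    return { results: [], notice: "sensitive_local_unavailable" };
  }
  return { results: publicLookup(query) };
}
```
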
**After self-hosting Photon on `mana-gpu`:**
- Photon-self-hosted is registered with `privacy: 'local'`
- The sensitive-query block now has a real backend → users get results without their query leaving our network
- Public Photon and Nominatim can stay in the chain as last-resort `privacy: 'public'` fallbacks for obscure non-DACH queries
- OR drop them entirely — we no longer need third-party fallbacks if our own Photon is reliable
**Recommendation:** keep public Photon as a second-tier `public` fallback, drop public Nominatim. The chain becomes:
```
1. self-hosted Photon (mana-gpu)   privacy: local
2. public Photon (komoot.io)       privacy: public   ← only when self-hosted is down
                                                       AND query isn't sensitive
```
This gives us belt-and-suspenders: even if a Pelias/Photon migration breaks something, sensitive queries still hold the privacy line because the chain filters public providers in `localOnly` mode regardless of which one is up.
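
A minimal sketch of that filtering rule, assuming a simplified provider record (`ProviderEntry` and `eligibleProviders` are hypothetical names, not the wrapper's actual types). The point is that the privacy filter runs before health is considered, so an outage of the local provider can never promote a public one for a sensitive query.

```typescript
type Privacy = "local" | "public";

interface ProviderEntry {
  name: string;
  privacy: Privacy;
  healthy: boolean;
}

// Returns the providers that may be tried, in chain order.
// In localOnly mode, public providers are removed regardless of health.
function eligibleProviders(chain: ProviderEntry[], localOnly: boolean): ProviderEntry[] {
  return chain
    .filter((p) => !localOnly || p.privacy === "local")
    .filter((p) => p.healthy);
}

const exampleChain: ProviderEntry[] = [
  { name: "self_photon", privacy: "local", healthy: false }, // simulate an outage
  { name: "photon", privacy: "public", healthy: true },
];

// Sensitive query during the outage: nothing is eligible, so nothing leaks.
console.log(eligibleProviders(exampleChain, true).length); // 0
// Ordinary query during the outage: falls through to public Photon.
console.log(eligibleProviders(exampleChain, false).map((p) => p.name).join(",")); // photon
```
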
---
## Migration plan
Estimated total time: **3-4 hours**, of which ~1 h is download/unpack waiting time. Most of it is one-off setup that won't be repeated.
### Phase 1 — GPU server prep (1.5 h, requires physical access)
1. Verify `mana-gpu` has ≥ 100 GB free disk on a fast SSD. Photon Java heap is GC-sensitive; spinning rust would hurt latency.
2. Install **Docker Desktop for Windows** with WSL2 backend. (WSL2 is more compatible with the Java 21 + OpenSearch stack than native Hyper-V containers.)
3. Verify existing GPU services (Ollama, image-gen, video-gen, STT, TTS) still work after Docker Desktop install — Hyper-V mode can briefly conflict with CUDA. Run a quick STT inference smoke as the canary.
4. Open inbound TCP 2322 in Windows Firewall, restricted to LAN only.
### Phase 2 — Photon container (45 min, ~30 min of which is download)
1. `mkdir D:\photon-data` (or wherever you've got space)
2. Download from GraphHopper:
```powershell
cd D:\photon-data
# curl.exe, not curl: in Windows PowerShell, `curl` is an alias for Invoke-WebRequest
curl.exe -O https://download1.graphhopper.com/public/europe/photon-db-europe-1.0-latest.tar.bz2
tar -xjf photon-db-europe-1.0-latest.tar.bz2
```
(Country-only is also viable — start with Germany if you want to get something running fast and switch to Europe later.)
3. Run Photon:
```powershell
docker run -d --name photon -p 2322:2322 `
-v D:\photon-data\photon_data:/photon/photon_data `
komoot/photon
```
4. Smoke test from the GPU server:
```powershell
curl.exe "http://localhost:2322/api?q=Konstanz&limit=2"
```
### Phase 3 — Wire it into the wrapper (30 min)
In `services/mana-geocoding/.env` (or `docker-compose.macmini.yml`'s mana-geocoding env block):
```env
GEOCODING_PROVIDERS=self_photon,photon
PHOTON_API_URL=http://192.168.178.11:2322 # self_photon points here
# Keep PHOTON_API_URL_PUBLIC=https://photon.komoot.io as last-resort
```
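
How the wrapper reads `GEOCODING_PROVIDERS` is not shown in this report. As a sketch under that caveat, an ordered, validated parse could look like this; `parseProviderOrder` and the `nominatim` key are assumptions, while `self_photon` and `photon` come from the env block above.

```typescript
// Provider keys this sketch accepts. "nominatim" is an assumed key.
const KNOWN_PROVIDERS = ["self_photon", "photon", "nominatim"] as const;
type ProviderName = (typeof KNOWN_PROVIDERS)[number];

// Turn "self_photon,photon" into an ordered list, rejecting unknown names
// so a typo fails at startup rather than silently shortening the chain.
function parseProviderOrder(raw: string): ProviderName[] {
  const order: ProviderName[] = [];
  for (const part of raw.split(",")) {
    const candidate = part.trim();
    if (candidate === "") continue;
    const match = KNOWN_PROVIDERS.filter((k) => k === candidate)[0];
    if (match === undefined) {
      throw new Error(`Unknown geocoding provider: ${candidate}`);
    }
    if (order.indexOf(match) === -1) order.push(match); // ignore duplicates, keep first position
  }
  return order;
}

console.log(parseProviderOrder("self_photon,photon").join(" > ")); // self_photon > photon
```
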
In `services/mana-geocoding/src/app.ts`, register a second Photon provider with `privacy: 'local'` (a small refactor — the existing `PhotonProvider` class takes config, just instantiate twice).
In `services/mana-geocoding/src/providers/photon.ts`, expose `privacy` as a constructor argument so the same class can serve both roles.
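
The refactor described above might look roughly like this. It is a sketch, not the contents of `photon.ts`: the config shape and the `searchUrl` method are assumptions, and the real class does more than build URLs. The `/api` endpoint with `q` and `limit` parameters matches the smoke test in Phase 2.

```typescript
type Privacy = "local" | "public";

// Hypothetical config shape for illustration.
interface PhotonProviderConfig {
  name: string;
  baseUrl: string;
  privacy: Privacy;
}

class PhotonProvider {
  readonly name: string;
  readonly privacy: Privacy;
  private readonly baseUrl: string;

  constructor(config: PhotonProviderConfig) {
    this.name = config.name;
    this.privacy = config.privacy;
    this.baseUrl = config.baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  }

  // Build the forward-search URL for this instance.
  searchUrl(query: string, limit: number = 5): string {
    return `${this.baseUrl}/api?q=${encodeURIComponent(query)}&limit=${limit}`;
  }
}

// Same class, instantiated twice with different privacy levels.
const selfHosted = new PhotonProvider({
  name: "self_photon",
  baseUrl: "http://192.168.178.11:2322",
  privacy: "local",
});
const publicPhoton = new PhotonProvider({
  name: "photon",
  baseUrl: "https://photon.komoot.io",
  privacy: "public",
});

console.log(selfHosted.privacy, publicPhoton.privacy); // local public
```
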
Tests: extend `chain.test.ts` to verify the order pelias-class → photon-class → public Photon → public Nominatim.
### Phase 4 — Validate + cut over (30 min)
1. Deploy the updated wrapper to mana-server.
2. Smoke: `curl https://mana.how/api/v1/geocode/search?q=Hausarzt+Konstanz` should now return real results (was empty before this work).
3. Health: `curl https://mana.how/api/v1/geocode/health/providers` should show `self_photon: healthy`.
4. Watch latency for 24 h via the existing Prometheus probes.
5. Pelias container can be deleted from Mac mini (`docker compose -f services/mana-geocoding/pelias/docker-compose.yml down -v`) — frees 5 GB disk + the Docker volume.
### Phase 5 — Maintenance baseline (10 min/week)
1. Cron job on mana-gpu: every Sunday night, download the latest Photon dump, unpack to a sibling directory, swap-symlink, restart container. ~30 min unattended.
2. Keep CLAUDE.md in `services/mana-geocoding/` updated when the topology changes.
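
The weekly refresh sequence in step 1 can be sketched as below. The installed job is a shell script that is not reproduced in this report, so this only models the order of operations in TypeScript. The `RefreshOps` interface, the directory names, and the health check as the rollback trigger are all assumptions.

```typescript
// Every operation is injected so the sequence can be exercised
// without touching disk or Docker.
interface RefreshOps {
  downloadAndUnpack(targetDir: string): void; // fetch the new dump into a sibling directory
  pointSymlinkAt(dir: string): void;          // repoint the symlink the container mounts
  restartContainer(): void;
  isHealthy(): boolean;                       // e.g. a smoke query against port 2322
}

function weeklyRefresh(
  ops: RefreshOps,
  currentDir: string,
  nextDir: string,
): "swapped" | "rolled_back" {
  ops.downloadAndUnpack(nextDir); // the live directory is untouched while this runs
  ops.pointSymlinkAt(nextDir);
  ops.restartContainer();
  if (ops.isHealthy()) {
    return "swapped";
  }
  // The new dump is unusable: point back at the previous directory and restart again.
  ops.pointSymlinkAt(currentDir);
  ops.restartContainer();
  return "rolled_back";
}

// Dry run that only logs each step.
const dryRun: RefreshOps = {
  downloadAndUnpack: (dir) => console.log(`unpack into ${dir}`),
  pointSymlinkAt: (dir) => console.log(`symlink -> ${dir}`),
  restartContainer: () => console.log("restart photon"),
  isHealthy: () => true,
};
console.log(weeklyRefresh(dryRun, "photon_data.old", "photon_data.new")); // swapped
```

Unpacking into a sibling directory means the old and new data sit side by side during the refresh, so the disk needs headroom for both until the old copy is deleted.
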
---
## Open questions
1. **GPU server RAM** — we don't know the actual amount. If it's <16 GB, drop to Photon-Germany only and skip Europe.
2. **Backup strategy** — Photon's data is reproducible (download from GraphHopper anytime), so no backup needed. Confirm this assumption — if GraphHopper goes away, we lose the easy-update path.
3. **Reverse-geocode quality** — Photon's reverse implementation is OK but not its strongest feature. If we see degraded reverse results vs the old Pelias setup, we can layer a tiny Nominatim instance on top later. Not worth doing pre-emptively.
4. **Cross-LAN latency** — adds 5-20 ms vs the old localhost setup. Acceptable; cache TTL stays 24 h for local provider.
---
## Why not other tools
- **Mimirsbrunn** (Pelias-derived): less maintained, French/Spanish focus, smaller community. No win over Photon.
- **Gisgraphy:** Java + Postgres, similar resource profile to Nominatim, less actively maintained than either Nominatim or Photon. No win.
- **OpenAddresses + custom indexer:** months of work, and we'd be the only users. Hard pass.
- **Self-hosted Mapbox:** doesn't exist as such; their offering requires their cloud.
- **Paid API as a backup tier (MapTiler / OpenCage):** still worth adding later as a 4th tier behind self-hosted-Photon + public-fallbacks. Not blocking.
---
## What this avoids
- **Re-running the Pelias import pipeline.** That alone would have been 45-90 min of operator time per data refresh.
- **The libpostal RAM tax.** Photon does its own address parsing without libpostal's 2 GB model.
- **The patched JS file.** Photon returns OSM tags by default; no API patch needed.
- **A second Postgres tenant.** Nominatim would force one. Photon is fully self-contained.
- **Public-API dependency for the warm path.** Photon-self-hosted is privacy-clean for ALL queries, not just sensitive ones.
---
## Sources
- [Photon GitHub repo & README](https://github.com/komoot/photon) — hardware requirements, Java 21+, OpenSearch backend
- [GraphHopper Photon downloads (Europe)](https://download1.graphhopper.com/public/europe/index.html) — 30.6 GB Europe v1.x; 5.8 GB Germany v1.x; weekly refresh
- [Nominatim 5.3.2 Installation docs](https://nominatim.org/release-docs/latest/admin/Installation/) — 128 GB RAM recommended planet, 1 TB disk
- [mediagis/nominatim-docker discussion #265](https://github.com/mediagis/nominatim-docker/discussions/265) — Germany-import resource reports (12 GB RAM, ~100 GB disk, 8-12 h)
- [Photon OpenSearch wiki page](https://wiki.openstreetmap.org/wiki/Photon) — region scoping, memory tuning
- Internal: [`services/mana-geocoding/CLAUDE.md`](../../services/mana-geocoding/CLAUDE.md) for the current Pelias setup we're replacing
---
## Migration log + lessons learned (2026-04-28)
The migration ran from 17:42 to 19:27 CEST — about 1 h 45 min, almost
all of which was unattended download/unpack waiting time (29 GB tarball
+ 80 GB unpack). Went smoother than the runbook estimated except for
five WSL2-specific gotchas:
### What worked first try
- **WSL2 install via SSH:** `winget install Microsoft.WSL` followed by
`wsl --install Ubuntu-24.04 --no-launch` — fully unattended, no
interactive prompts, including the previously-painful first-run user
setup (the `--no-launch` flag combined with `--user root` for
follow-up commands skipped the wizard entirely).
- **Docker Engine in WSL2 (instead of Docker Desktop):** apt install
`docker-ce` from the official repo, then run as systemd service.
Headless, no GUI session needed — much cleaner for SSH-driven
setup than Docker Desktop.
- **WSL2 Mirrored Networking** (Win11 22H2+): the Linux distro shares
the Windows host's LAN IP. Photon listens on
`192.168.178.11:2322` directly — no `netsh interface portproxy`
forwarding. Just one Windows Defender Firewall rule and the Mac
mini reaches it.
- **Photon Europe pre-built tarball** (29 GB compressed → ~80 GB
unpacked) downloaded at ~9 MB/s sustained, unpacked at ~80 MB/s.
No PBF import, no Elasticsearch tuning, no patch hacks.
### Five gotchas worth documenting
1. **`bzip2` is not installed by default in Ubuntu 24.04 minimal.**
`tar -xjf` fails with `bzip2: Cannot exec`. Fix: `apt install bzip2`
before unpacking. Took ~15 minutes to spot because the script's
`set -e` exited cleanly after the failure.
2. **No official Photon Docker image.** Komoot publishes a JAR but
no `komoot/photon` on Docker Hub. Solution: run the JAR inside
`eclipse-temurin:21-jre` with the data dir + JAR mounted in.
Cleaner than community images (which lag the upstream version).
3. **`firewall=true` in `.wslconfig` blocks cross-LAN inbound.**
The first nginx-on-:2322 cross-LAN test worked. After enabling
`firewall=true` (intended to harden Hyper-V firewall), Photon
became unreachable from the Mac mini even though the Windows
Defender rule allowed it. Removing the line fixed it instantly.
The Hyper-V firewall layer in WSL2 is a separate, stricter pass
that the Windows-side rule doesn't cover.
4. **`vmIdleTimeout=-1` does NOT prevent WSL2 idle-shutdown** on
Win11 26200. The VM still shuts down ~60 s after the last SSH
session closes, killing the Photon container. Workaround that
actually works: a Windows Task Scheduler task at boot that runs
`wsl -d Ubuntu-24.04 --user root -- /bin/sleep infinity`. Holds
the VM open permanently. Survives reboots.
5. **PowerShell quoting + bash inside `wsl ... -- bash -c "..."`.**
`$(dpkg --print-architecture)` and `$(lsb_release -cs)` got
pre-expanded by PowerShell on the Windows side, breaking the
Docker apt sources line. Fix: write the install script to a file,
transfer via scp, run via `wsl ... bash /mnt/c/temp/script.sh`.
No quoting layers to fight.
### Resource snapshot post-migration
- **mana-gpu:** Photon container 391 MB / 31 GB (1.2 %) memory at
steady state, 290 % CPU during initial OpenSearch shard recovery,
near-zero CPU at idle. Disk: 80 GB unpacked photon_data + 29 GB
tarball still on disk (kept for debugging — can be removed).
- **mana-server:** mana-geocoding container unchanged in resource
use; chain just routes to a different upstream. Cross-LAN
per-request latency added: ~5-15 ms.
### Cutover verification
- `provider: "photon-self"` confirmed on both `/search` and `/reverse`
endpoints from inside mana-geocoding container and externally via
`https://mana.how/api/v1/geocode/...`.
- Sensitive query "Hausarzt Konstanz" now returns real results
(`Hausarztpraxis am Tannenhof, Am Tannenhof 2, 78464 Konstanz`)
instead of the previous `notice: 'sensitive_local_unavailable'`
empty response. Privacy stance maintained: the query never leaves
our infra.
- Public Photon + public Nominatim stay registered as last-resort
`privacy: 'public'` fallbacks. Health-snapshot shows them as
`healthy: false, ageMs: null` — they're never probed because
`photon-self` is healthy.
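
Because `healthy: false, ageMs: null` means "never probed" rather than "down", dashboards or alerts reading the health snapshot may want to tell the two apart. A small sketch, assuming each snapshot entry carries only the two fields quoted above (`describeHealth` and the label names are hypothetical):

```typescript
interface ProviderHealth {
  healthy: boolean;
  ageMs: number | null; // null: no probe has run yet
}

type HealthLabel = "healthy" | "unhealthy" | "unprobed";

// A provider that was never probed is reported as "unprobed",
// not as a failure, even though its `healthy` flag is false.
function describeHealth(entry: ProviderHealth): HealthLabel {
  if (entry.ageMs === null) return "unprobed";
  return entry.healthy ? "healthy" : "unhealthy";
}

console.log(describeHealth({ healthy: false, ageMs: null })); // unprobed
```
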