From fc49198992571b503319d1d4b19a7ff9ccfc7d12 Mon Sep 17 00:00:00 2001 From: Till JS Date: Tue, 28 Apr 2026 21:31:08 +0200 Subject: [PATCH] docs(geocoding): post-migration log + Photon weekly-refresh operator scripts MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Decision report: status flipped to MIGRATED; added migration log with five WSL2 gotchas (bzip2 missing, no official Photon image, firewall=true blocks cross-LAN, vmIdleTimeout=-1 ineffective, PowerShell pre-expansion of bash $(...)) and resource snapshot. - mana-geocoding CLAUDE.md: PHOTON_SELF_API_URL note now reflects live primary status on mana-gpu since 2026-04-28. - photon-self/: operator scripts for the weekly DB refresh — update.sh (atomic-swap with rollback), systemd unit + timer (Sun 03:30 +30min jitter, Persistent=true), README with re-installation instructions for DR. Currently installed and enabled on mana-gpu. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../geocoding-self-hosting-2026-04-28.md | 90 ++++++++++++++++++- services/mana-geocoding/CLAUDE.md | 8 +- services/mana-geocoding/photon-self/README.md | 78 ++++++++++++++++ .../photon-self/photon-update.service | 10 +++ .../photon-self/photon-update.sh | 60 +++++++++++++ .../photon-self/photon-update.timer | 13 +++ 6 files changed, 254 insertions(+), 5 deletions(-) create mode 100644 services/mana-geocoding/photon-self/README.md create mode 100644 services/mana-geocoding/photon-self/photon-update.service create mode 100755 services/mana-geocoding/photon-self/photon-update.sh create mode 100644 services/mana-geocoding/photon-self/photon-update.timer diff --git a/docs/reports/geocoding-self-hosting-2026-04-28.md b/docs/reports/geocoding-self-hosting-2026-04-28.md index d9843daff..e4effa752 100644 --- a/docs/reports/geocoding-self-hosting-2026-04-28.md +++ b/docs/reports/geocoding-self-hosting-2026-04-28.md @@ -1,6 +1,6 @@ # Geocoding Self-Hosting — Decision Report -**Status:** Recommendation — 
pending migration +**Status:** ✅ MIGRATED — Photon-on-mana-gpu live since 2026-04-28 19:27 CEST **Date:** 2026-04-28 **Context:** Pelias was retired from the Mac mini on 2026-04-28 (3 GB RAM was crushing the host into 8.6 GB swap). The wrapper now serves all queries through public Photon + Nominatim, with sensitive-query blocking + coord quantization as privacy mitigations. We need a self-hosted geocoder back in the chain so sensitive queries (`Hausarzt`, `Klinikum`, …) don't return zero results when the user actually wants them, and so we don't depend on a third party for routine address lookups. @@ -217,3 +217,91 @@ Tests: extend `chain.test.ts` to verify the order pelias-class → photon-class - [mediagis/nominatim-docker discussion #265](https://github.com/mediagis/nominatim-docker/discussions/265) — Germany-import resource reports (12 GB RAM, ~100 GB disk, 8–12 h) - [Photon OpenSearch wiki page](https://wiki.openstreetmap.org/wiki/Photon) — region scoping, memory tuning - Internal: [`services/mana-geocoding/CLAUDE.md`](../../services/mana-geocoding/CLAUDE.md) for the current Pelias setup we're replacing + +--- + +## Migration log + lessons learned (2026-04-28) + +The migration ran from 17:42 to 19:27 CEST — about 1 h 45 min, almost +all of which was unattended download/unpack waiting time (29 GB tarball ++ 80 GB unpack). Went smoother than the runbook estimated except for +five WSL2-specific gotchas: + +### What worked first try + +- **WSL2 install via SSH:** `winget install Microsoft.WSL` followed by + `wsl --install Ubuntu-24.04 --no-launch` — fully unattended, no + interactive prompts, including the previously-painful first-run user + setup (the `--no-launch` flag combined with `--user root` for + follow-up commands skipped the wizard entirely). +- **Docker Engine in WSL2 (instead of Docker Desktop):** apt install + `docker-ce` from the official repo, then run as systemd service. 
+ Headless, no GUI session needed — much cleaner for SSH-driven + setup than Docker Desktop. +- **WSL2 Mirrored Networking** (Win11 22H2+): the Linux distro shares + the Windows host's LAN IP. Photon listens on + `192.168.178.11:2322` directly — no `netsh interface portproxy` + forwarding. Just one Windows Defender Firewall rule and the Mac + mini reaches it. +- **Photon Europe pre-built tarball** (29 GB compressed → ~80 GB + unpacked) downloaded at ~9 MB/s sustained, unpacked at ~80 MB/s. + No PBF import, no Elasticsearch tuning, no patch hacks. + +### Five gotchas worth documenting + +1. **`bzip2` is not installed by default in Ubuntu 24.04 minimal.** + `tar -xjf` fails with `bzip2: Cannot exec`. Fix: `apt install bzip2` + before unpacking. Took ~15 minutes to spot because `set -e` aborted + the script right after the failure, with no message of its own. + +2. **No official Photon Docker image.** Komoot publishes a JAR but + no `komoot/photon` on Docker Hub. Solution: run the JAR inside + `eclipse-temurin:21-jre` with the data dir + JAR mounted in. + Cleaner than community images (which lag the upstream version). + +3. **`firewall=true` in `.wslconfig` blocks cross-LAN inbound.** + The first nginx-on-:2322 cross-LAN test worked. After enabling + `firewall=true` (intended to harden Hyper-V firewall), Photon + became unreachable from the Mac mini even though the Windows + Defender rule allowed it. Removing the line fixed it instantly. + The Hyper-V firewall layer in WSL2 is a separate, stricter pass + that the Windows-side rule doesn't cover. + +4. **`vmIdleTimeout=-1` does NOT prevent WSL2 idle-shutdown** on + Win11 26200. The VM still shuts down ~60 s after the last SSH + session closes, killing the Photon container. Workaround that + actually works: a Windows Task Scheduler task at boot that runs + `wsl -d Ubuntu-24.04 --user root -- /bin/sleep infinity`. Holds + the VM open permanently. Survives reboots. + +5. **PowerShell quoting + bash inside `wsl ... 
-- bash -c "..."`.** + `$(dpkg --print-architecture)` and `$(lsb_release -cs)` got + pre-expanded by PowerShell on the Windows side, breaking the + Docker apt sources line. Fix: write the install script to a file, + transfer via scp, run via `wsl ... bash /mnt/c/temp/script.sh`. + No quoting layers to fight. + +### Resource snapshot post-migration + +- **mana-gpu:** Photon container 391 MB / 31 GB (1.2 %) memory at + steady state, 290 % CPU during initial OpenSearch shard recovery, + near-zero CPU at idle. Disk: 80 GB unpacked photon_data + 29 GB + tarball still on disk (kept for debugging — can be removed). +- **mana-server:** mana-geocoding container unchanged in resource + use; chain just routes to a different upstream. Cross-LAN + per-request latency added: ~5–15 ms. + +### Cutover verification + +- `provider: "photon-self"` confirmed on both `/search` and `/reverse` + endpoints from inside mana-geocoding container and externally via + `https://mana.how/api/v1/geocode/...`. +- Sensitive query "Hausarzt Konstanz" now returns real results + (`Hausarztpraxis am Tannenhof, Am Tannenhof 2, 78464 Konstanz`) + instead of the previous `notice: 'sensitive_local_unavailable'` + empty response. Privacy stance maintained: the query never leaves + our infra. +- Public Photon + public Nominatim stay registered as last-resort + `privacy: 'public'` fallbacks. Health-snapshot shows them as + `healthy: false, ageMs: null` — they're never probed because + `photon-self` is healthy. 
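The provider check from the cutover verification can be scripted. What follows is a minimal sketch, assuming the wrapper's JSON response carries a `provider` string as quoted in the log. The helper name `provider_of` and the sample payload are hypothetical, and the extraction uses grep/sed rather than a JSON parser, so it is only suitable as a quick smoke test.

```shell
# provider_of: read a geocoding JSON response on stdin and print the value
# of the first "provider" field found. Hypothetical helper, not part of the repo.
provider_of() {
  grep -o '"provider"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Example with a made-up payload (real responses contain more fields):
# printf '%s\n' '{"provider":"photon-self","results":[]}' | provider_of
```

Piping the body of a `/search` or `/reverse` request into the function and comparing the result with `photon-self` gives a one-line cutover check.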
diff --git a/services/mana-geocoding/CLAUDE.md b/services/mana-geocoding/CLAUDE.md index 9fd0d1ecd..16bcb7cf0 100644 --- a/services/mana-geocoding/CLAUDE.md +++ b/services/mana-geocoding/CLAUDE.md @@ -159,10 +159,10 @@ GEOCODING_PROVIDERS=photon-self,pelias,photon,nominatim PROVIDER_TIMEOUT_MS=8000 # per-provider request timeout (cold-start safe) PROVIDER_HEALTH_CACHE_MS=30000 # health-cache TTL — skip dead providers -# --- Self-hosted Photon (privacy: 'local', primary post-migration) ---- -# Set this to point at the GPU-server-hosted Photon. When unset, the -# `photon-self` slot is not registered and the chain falls back to -# public providers as before. +# --- Self-hosted Photon (privacy: 'local', PRIMARY since 2026-04-28) -- +# Live on mana-gpu (Windows 11, WSL2-Ubuntu, Docker, Photon Europe-wide +# Java JAR + OpenSearch). Cross-LAN reach via WSL2 mirrored networking. +# Set in .env.macmini; flows into the container via docker-compose env. PHOTON_SELF_API_URL=http://192.168.178.11:2322 # --- Pelias (legacy, currently stopped — privacy: 'local') ------------ diff --git a/services/mana-geocoding/photon-self/README.md b/services/mana-geocoding/photon-self/README.md new file mode 100644 index 000000000..8afefe475 --- /dev/null +++ b/services/mana-geocoding/photon-self/README.md @@ -0,0 +1,78 @@ +# photon-self — operator files for the GPU-server Photon + +Source-of-truth copies of the scripts and systemd units that run on +`mana-gpu` to host the self-hosted Photon. Versioned here so the setup +can be rebuilt in DR scenarios without recreating from memory. 
+ +## What lives where + +| File | Where it runs | Purpose | +|---|---|---| +| `photon-update.sh` | inside the WSL2 Ubuntu distro on mana-gpu, at `/usr/local/bin/photon-update.sh` | Weekly index refresh — download new tarball, atomic swap, restart container, rollback on failure | +| `photon-update.service` | `/etc/systemd/system/photon-update.service` | Oneshot wrapper that invokes the script | +| `photon-update.timer` | `/etc/systemd/system/photon-update.timer` | Sunday 03:30 + 30-min jitter, persistent across reboots | + +## Re-installation (after a clean Windows reinstall etc.) + +After you've followed [docs/runbooks/photon-on-mana-gpu.md](../../../docs/runbooks/photon-on-mana-gpu.md) +to get WSL2 + Docker + the initial Photon container running: + +```bash +# Run inside WSL2 Ubuntu as root: +cp /mnt/c/path/to/repo/services/mana-geocoding/photon-self/photon-update.sh \ + /usr/local/bin/photon-update.sh +chmod +x /usr/local/bin/photon-update.sh + +cp /mnt/c/path/to/repo/services/mana-geocoding/photon-self/photon-update.service \ + /etc/systemd/system/ +cp /mnt/c/path/to/repo/services/mana-geocoding/photon-self/photon-update.timer \ + /etc/systemd/system/ + +systemctl daemon-reload +systemctl enable --now photon-update.timer +systemctl list-timers photon-update.timer # verify next run +``` + +## Manual trigger + +To force a refresh outside the schedule: + +```bash +# Inside WSL2 Ubuntu as root +systemctl start photon-update.service +journalctl -u photon-update.service -f # watch progress +tail -f /var/log/photon-update.log # script-level detail +``` + +## What the update script does + +``` +1. curl new tarball → /opt/photon-data/photon-db.tar.bz2.new +2. Verify size ≥ 25 GB (sanity guard against truncated downloads) +3. tar -xjf into /opt/photon-data/photon_data.new +4. docker stop photon +5. mv old → .old, mv new → live (atomic-ish — both renames in same FS) +6. docker start photon +7. 
Poll /api?q=Konstanz for up to 180 s + - On success: rm -rf .old (cleanup) + - On failure: rollback (mv live → .bad, mv .old → live, restart) +``` + +The rollback path is the load-bearing part — a corrupted GraphHopper +dump or a Photon version-mismatch can otherwise leave the service in a +non-serving state until the operator notices. + +## Why systemd timer instead of cron + +WSL has supported systemd since release 0.67.6, and current Ubuntu +images enable it by default. Timers give us: + +- `Persistent=true` — runs missed jobs at next boot if the GPU server + was off Sunday morning. Cron just skips them. +- `RandomizedDelaySec=30min` — delays each run by a random 0–30 min so we + don't hit the GraphHopper CDN at the same instant as other self-hosters. +- `journalctl -u photon-update` — structured logs in one place. +- Status-checkable with `systemctl list-timers`. + +The downside (more files on disk than a single crontab entry) is +negligible. diff --git a/services/mana-geocoding/photon-self/photon-update.service b/services/mana-geocoding/photon-self/photon-update.service new file mode 100644 index 000000000..95d33ff4e --- /dev/null +++ b/services/mana-geocoding/photon-self/photon-update.service @@ -0,0 +1,10 @@ +[Unit] +Description=Weekly Photon DB refresh from GraphHopper +After=docker.service network-online.target +Wants=network-online.target + +[Service] +Type=oneshot +ExecStart=/usr/local/bin/photon-update.sh +# Exit status 0 is already the only success code, so this line is a no-op: a failed run is marked failed and next week's timer tries again +SuccessExitStatus=0 diff --git a/services/mana-geocoding/photon-self/photon-update.sh b/services/mana-geocoding/photon-self/photon-update.sh new file mode 100755 index 000000000..6881aac06 --- /dev/null +++ b/services/mana-geocoding/photon-self/photon-update.sh @@ -0,0 +1,60 @@ +#!/bin/bash +# Weekly Photon DB update. 
+# - Downloads the latest tarball from GraphHopper +# - Verifies size before swapping (don't replace good data with a partial) +# - Atomic swap via mv → restart container +# - Keeps the previous version as .old until the smoke test passes + +set -euo pipefail + +DATA_DIR=/opt/photon-data +URL=https://download1.graphhopper.com/public/europe/photon-db-europe-1.0-latest.tar.bz2 +MIN_SIZE=$((25 * 1024 * 1024 * 1024)) # 25 GB minimum, real one is ~29 GB +LOG=/var/log/photon-update.log + +exec >>"$LOG" 2>&1 +echo "=== $(date -Iseconds) — photon-update starting ===" + +cd "$DATA_DIR" + +# Download to .new — don't touch the live tarball +curl --silent --show-error --output photon-db.tar.bz2.new "$URL" +ACTUAL=$(stat -c %s photon-db.tar.bz2.new) +if [ "$ACTUAL" -lt "$MIN_SIZE" ]; then + echo "FAIL: downloaded tarball only $((ACTUAL / 1024 / 1024)) MB, expected ≥25 GB. Aborting." + rm -f photon-db.tar.bz2.new + exit 1 +fi +echo "Downloaded $((ACTUAL / 1024 / 1024)) MB OK" + +# Unpack to a fresh dir alongside the live one +rm -rf photon_data.new +mkdir photon_data.new +tar -xjf photon-db.tar.bz2.new -C photon_data.new --strip-components=1 + +# Stop the container, swap, restart +docker stop photon +mv photon_data photon_data.old || true +mv photon_data.new photon_data +mv photon-db.tar.bz2 photon-db.tar.bz2.old || true +mv photon-db.tar.bz2.new photon-db.tar.bz2 +docker start photon + +# Wait for it to come up + smoke +for i in $(seq 1 90); do + if curl -fsS --max-time 2 "http://localhost:2322/api?q=Konstanz" >/dev/null 2>&1; then + echo "OK — Photon ready on poll $i of 90 (2 s interval) with new index" + # Cleanup old version on success + rm -rf photon_data.old photon-db.tar.bz2.old + echo "=== $(date -Iseconds) — photon-update complete ===" + exit 0 + fi + sleep 2 +done + +echo "FAIL — Photon did not become ready within 180 s after swap. Rolling back." 
+docker stop photon +mv photon_data photon_data.bad +mv photon_data.old photon_data +docker start photon +exit 1 diff --git a/services/mana-geocoding/photon-self/photon-update.timer b/services/mana-geocoding/photon-self/photon-update.timer new file mode 100644 index 000000000..12ddfc4d8 --- /dev/null +++ b/services/mana-geocoding/photon-self/photon-update.timer @@ -0,0 +1,13 @@ +[Unit] +Description=Trigger weekly Photon DB refresh + +[Timer] +# Sunday 03:30 — outside likely-usage hours +OnCalendar=Sun *-*-* 03:30:00 +# Run on next boot if the system was off at the scheduled time +Persistent=true +# Spread across 30 min so we don't hammer GraphHopper at exactly :30:00 +RandomizedDelaySec=30min + +[Install] +WantedBy=timers.target
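The swap-and-rollback steps of `photon-update.sh` can be isolated into two small functions, which makes them testable against throwaway directories. This is a sketch under assumptions, not the installed script: the names `swap_in` and `roll_back` are hypothetical, and unlike the script the rollback refuses to run when no `.old` copy exists and clears a stale `.bad` directory first.

```shell
# swap_in NEW LIVE
# Move LIVE aside to LIVE.old (if LIVE exists), then move NEW into LIVE's place.
# Hypothetical helper; both paths must be on the same filesystem for mv to be a rename.
swap_in() {
  local new=$1 live=$2
  [ -d "$new" ] || return 1        # nothing to swap in
  rm -rf "$live.old"               # keep at most one previous version
  if [ -d "$live" ]; then
    mv "$live" "$live.old" || return 1
  fi
  mv "$new" "$live"
}

# roll_back LIVE
# Park the current LIVE as LIVE.bad and restore LIVE.old.
# Returns non-zero without touching anything when there is no LIVE.old.
roll_back() {
  local live=$1
  [ -d "$live.old" ] || return 1
  rm -rf "$live.bad"               # drop a stale .bad so mv cannot nest into it
  if [ -d "$live" ]; then
    mv "$live" "$live.bad" || return 1
  fi
  mv "$live.old" "$live"
}
```

In the update script, `docker stop photon` would come before either call and `docker start photon` after it, for example `swap_in photon_data.new photon_data` in place of the two data-directory `mv` lines.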