diff --git a/services/mana-geocoding/CLAUDE.md b/services/mana-geocoding/CLAUDE.md index 0a24e0f22..cfdde7217 100644 --- a/services/mana-geocoding/CLAUDE.md +++ b/services/mana-geocoding/CLAUDE.md @@ -1,6 +1,6 @@ # mana-geocoding -Self-hosted geocoding service. Wraps a local Pelias instance (DACH region) with caching and automatic OSM → PlaceCategory mapping. All geocoding queries stay within our infrastructure — no user location data leaves the network. +Geocoding service for the Places module. **Provider-chain architecture** — tries a self-hosted Pelias first, falls back to public Photon (komoot) and then public Nominatim (OSM) when Pelias is unhealthy or unreachable. All Pelias-served queries stay on our infrastructure; fallback queries leak the search string (or, for reverse lookups, the coordinates) to a public OSM endpoint. ## Tech Stack @@ -8,9 +8,11 @@ Self-hosted geocoding service. Wraps a local Pelias instance (DACH region) with |-------|------------| | **Runtime** | Bun | | **Framework** | Hono | -| **Geocoding** | Pelias (self-hosted, Elasticsearch-backed) | -| **Data** | OpenStreetMap DACH extract (DE/AT/CH) | -| **Caching** | In-memory LRU (5000 entries, 24h TTL) | +| **Primary geocoder** | Pelias (self-hosted, Elasticsearch-backed) | +| **Fallback 1** | [Photon](https://photon.komoot.io) (public, no fixed rate limit advertised; fair use, heavy usage is throttled) | +| **Fallback 2** | [Nominatim](https://nominatim.openstreetmap.org) (public, 1 req/sec strict) | +| **Data** | OpenStreetMap DACH extract (DE/AT/CH) for Pelias; global OSM for the public fallbacks | +| **Caching** | In-memory LRU (5000 entries, 24h TTL) — applies to all provider answers | ## Port: 3018 @@ -145,26 +147,65 @@ docker run --rm pelias/api:latest cat /code/pelias/api/helper/geojsonify_place_d docker compose up -d --force-recreate api ``` -## Architecture - -``` -Client (Places module) - → mana-geocoding (Hono, port 3018) - → LRU cache check - → Pelias API (port 4000) [patched — see above] - → Elasticsearch (port 9200) -``` - ## Configuration ```env PORT=3018 
-PELIAS_API_URL=http://localhost:4000/v1 + +# --- Provider chain (tried in order) ---------------------------------- +GEOCODING_PROVIDERS=pelias,photon,nominatim +PROVIDER_TIMEOUT_MS=5000 # per-provider request timeout +PROVIDER_HEALTH_CACHE_MS=30000 # health-cache TTL — skip dead providers + +# --- Pelias (primary) ------------------------------------------------- +PELIAS_API_URL=http://pelias-api:4000/v1 + +# --- Photon (fallback 1) ---------------------------------------------- +PHOTON_API_URL=https://photon.komoot.io + +# --- Nominatim (fallback 2) ------------------------------------------- +NOMINATIM_API_URL=https://nominatim.openstreetmap.org +NOMINATIM_USER_AGENT=mana-geocoding/1.0 (+https://mana.how; kontakt@memoro.ai) +NOMINATIM_INTERVAL_MS=1100 # >= 1000 to honor 1 req/sec policy + +# --- Misc ------------------------------------------------------------- CORS_ORIGINS=http://localhost:5173,https://mana.how CACHE_MAX_ENTRIES=5000 CACHE_TTL_MS=86400000 ``` +To **disable a provider**, drop it from `GEOCODING_PROVIDERS`. To run with +no Pelias at all (e.g. while it's being migrated), set +`GEOCODING_PROVIDERS=photon,nominatim`. The chain ordering is honored +exactly — the first listed provider is tried first. + +## Provider-chain semantics + +The `ProviderChain` (`src/providers/chain.ts`) iterates providers in +priority order and stops on the first success. A provider that returns +**zero results successfully** stops the chain — we don't waste public-API +budget on a query that legitimately doesn't match. Only failed requests +(unreachable, timeout, 5xx, 429) cause fallthrough. + +Per-provider health is cached for `PROVIDER_HEALTH_CACHE_MS` (default 30s). +A failed health probe or a failed search marks the provider unhealthy and +skips it for the rest of the cache window. The next request after the cache +expires re-probes lazily — there is no background health pinger.
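The chain semantics above can be sketched in TypeScript. This is a simplified, hypothetical `ChainSketch`, not the actual `src/providers/chain.ts` (which is not shown in this diff): the types are trimmed to what the example needs, treating a thrown error as `unreachable` is an assumption, and the injectable `now` clock exists only to make the cache window testable.

```typescript
// Simplified, hypothetical types; the real ones live in src/providers/types.ts.
type ProviderName = 'pelias' | 'photon' | 'nominatim';

// Assumption: only the two failure kinds seen in chain.test.ts are modeled.
type ProviderResponse<R> =
  | { ok: true; results: R[] }
  | { ok: false; kind: 'unreachable' | 'rate_limited'; status?: number };

interface Provider<R> {
  readonly name: ProviderName;
  health(): Promise<boolean>;
  search(q: string): Promise<ProviderResponse<R>>;
}

interface ChainResult<R> {
  ok: boolean;
  results: R[];
  provider?: ProviderName; // set only when some provider answered
  tried: ProviderName[]; // providers actually queried, in order
}

class ChainSketch<R> {
  private healthCache = new Map<ProviderName, { healthy: boolean; checkedAt: number }>();

  constructor(
    private readonly providers: Provider<R>[],
    private readonly healthCacheMs: number,
    private readonly now: () => number = Date.now // injectable clock for tests
  ) {}

  // Lazy health check: probe only when there is no fresh cache entry.
  private async isHealthy(p: Provider<R>): Promise<boolean> {
    const cached = this.healthCache.get(p.name);
    if (cached && this.now() - cached.checkedAt < this.healthCacheMs) {
      return cached.healthy;
    }
    let healthy: boolean;
    try {
      healthy = await p.health();
    } catch {
      healthy = false; // a throwing probe counts as a failed probe
    }
    this.healthCache.set(p.name, { healthy, checkedAt: this.now() });
    return healthy;
  }

  async search(q: string): Promise<ChainResult<R>> {
    const tried: ProviderName[] = [];
    for (const p of this.providers) {
      // Unhealthy providers are skipped, not counted as tried.
      if (!(await this.isHealthy(p))) continue;
      tried.push(p.name);

      let res: ProviderResponse<R>;
      try {
        res = await p.search(q);
      } catch {
        res = { ok: false, kind: 'unreachable' };
      }

      // Any successful answer, including zero results, ends the chain.
      if (res.ok) return { ok: true, results: res.results, provider: p.name, tried };

      // A failed search marks the provider unhealthy for the rest of the window.
      this.healthCache.set(p.name, { healthy: false, checkedAt: this.now() });
    }
    return { ok: false, results: [], tried };
  }
}
```

A provider skipped because of a cached unhealthy state never shows up in `tried`, which is the same expectation the chain tests encode (`tried: ['photon']` when Pelias's health probe fails).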
+ +``` +Client (Places module) + → mana-geocoding (Hono, port 3018) + → LRU cache (24h TTL) ← hit: ~0 ms + → Provider chain + 1. Pelias ← reachable: 50–200 ms (DACH index, fully featured) + 2. Photon ← fallback: 200–500 ms public, partial features + 3. Nominatim ← last resort: 200–800 ms + 1 req/sec queue +``` + +The response body includes `provider: 'pelias' | 'photon' | 'nominatim'` +and `tried: ProviderName[]` so the caller can render an "approximate match" +hint when a fallback served the request. + ## Pelias Infrastructure The Pelias stack runs as a separate docker-compose in `pelias/`: @@ -263,15 +304,22 @@ bun test ``` - `src/lib/__tests__/category-map.test.ts` — Pelias→PlaceCategory - priority resolution. Covers the multi-category ambiguity (food beats - retail for a restaurant, transport beats professional for a car rental, - …), single-category mappings, layer-hint fallback, and real-world - venue categories observed from the DACH index during the 2026-04-11 - deploy verification. + priority resolution. +- `src/lib/__tests__/osm-category-map.test.ts` — raw OSM-tag→PlaceCategory + mapping used by Photon + Nominatim (since they emit `class:type` rather + than Pelias's curated taxonomy). - `src/lib/__tests__/cache.test.ts` — LRU eviction order, TTL expiry, move-to-end on `get`, size tracking. +- `src/lib/__tests__/rate-limiter.test.ts` — single-token rate limiter + (used to enforce Nominatim's 1 req/sec policy). FIFO order, abort + cleanup, busy-flag release on aborted interval-wait. +- `src/providers/__tests__/chain.test.ts` — provider chain failover, health + cache, "stop on empty results" semantics. +- `src/providers/__tests__/photon-normalizer.test.ts` and + `nominatim-normalizer.test.ts` — locking the wire-format mapping for the + two public fallback providers. -As of the 2026-04-11 deploy: **42 tests, all green**. +As of the 2026-04-28 fallback rollout: **115 tests, all green**. 
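On the client side, the `provider` and `tried` response fields described earlier are enough to decide whether to show the "approximate match" hint. A minimal sketch, assuming a hypothetical `GeocodeMeta` shape in which `provider` is absent when no provider answered; the helper name and hint wording are illustrative and not part of this PR.

```typescript
type ProviderName = 'pelias' | 'photon' | 'nominatim';

// Hypothetical subset of the geocode response body.
interface GeocodeMeta {
  provider?: ProviderName; // assumption: absent when every provider failed
  tried: ProviderName[];
}

// Returns a short hint when a public fallback served the request, or null
// when Pelias answered or when nothing answered at all.
function approximateMatchHint(meta: GeocodeMeta): string | null {
  const { provider, tried } = meta;
  if (provider === undefined || provider === 'pelias') return null;
  // Providers that were attempted (and failed) before the one that answered.
  const failed = tried.filter((p) => p !== provider);
  return failed.length > 0
    ? `Approximate match via ${provider} (${failed.join(', ')} unavailable)`
    : `Approximate match via ${provider}`;
}
```

Keying the hint off `provider` rather than `tried.length` matters: a Pelias instance skipped by the health cache never appears in `tried`, so a Photon answer can arrive with `tried: ['photon']` and still deserve the hint.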
### Smoke test (`bun run test:smoke`) @@ -293,17 +341,25 @@ geocoding for Konstanz and München, cache hit on repeat. 9 checks. ``` src/ -├── index.ts # Bootstrap -├── app.ts # Hono app factory -├── config.ts # Environment config +├── index.ts # Bootstrap +├── app.ts # Hono app factory + chain wiring +├── config.ts # Environment config (incl. provider list) ├── routes/ -│ ├── geocode.ts # Forward + reverse endpoints with caching -│ └── health.ts +│ ├── geocode.ts # Forward + reverse, delegates to chain +│ └── health.ts # /health, /health/pelias, /health/providers +├── providers/ +│ ├── types.ts # GeocodingProvider interface, shared shape +│ ├── chain.ts # Failover orchestrator + health cache +│ ├── pelias.ts # Primary: self-hosted DACH Pelias +│ ├── photon.ts # Fallback 1: photon.komoot.io +│ └── nominatim.ts # Fallback 2: nominatim.openstreetmap.org └── lib/ - ├── cache.ts # LRU cache with TTL - └── category-map.ts # OSM → PlaceCategory mapping + ├── cache.ts # LRU cache with TTL (provider-agnostic) + ├── category-map.ts # Pelias-taxonomy → PlaceCategory + ├── osm-category-map.ts # Raw OSM `class:type` → PlaceCategory + └── rate-limiter.ts # Single-token limiter (used by Nominatim) pelias/ -├── docker-compose.yml # Pelias stack -├── pelias.json # Pelias config (DACH region) -└── setup.sh # Initial data import script +├── docker-compose.yml # Pelias stack +├── pelias.json # Pelias config (DACH region) +└── setup.sh # Initial data import script ``` diff --git a/services/mana-geocoding/src/app.ts b/services/mana-geocoding/src/app.ts index 8f5fed328..84ed637cf 100644 --- a/services/mana-geocoding/src/app.ts +++ b/services/mana-geocoding/src/app.ts @@ -6,10 +6,18 @@ import { Hono } from 'hono'; import { cors } from 'hono/cors'; import type { Config } from './config'; -import { createHealthRoutes } from './routes/health'; +import { RateLimiter } from './lib/rate-limiter'; +import { ProviderChain } from './providers/chain'; +import { NominatimProvider } from 
'./providers/nominatim'; +import { PeliasProvider } from './providers/pelias'; +import { PhotonProvider } from './providers/photon'; +import type { GeocodingProvider, ProviderName } from './providers/types'; import { createGeocodeRoutes } from './routes/geocode'; +import { createHealthRoutes } from './routes/health'; export function createApp(config: Config): Hono { + const chain = createChain(config); + const app = new Hono(); app.onError((err, c) => { @@ -25,8 +33,62 @@ export function createApp(config: Config): Hono { }) ); - app.route('/health', createHealthRoutes(config)); - app.route('/api/v1/geocode', createGeocodeRoutes(config)); + app.route('/health', createHealthRoutes(config, chain)); + app.route('/api/v1/geocode', createGeocodeRoutes(config, chain)); return app; } + +/** + * Build the provider chain from config. The order of `config.providers.enabled` + * is honored — providers earlier in the list are tried first. A disabled + * provider is simply not registered, not skipped at runtime. 
+ */ +export function createChain(config: Config): ProviderChain { + const built = new Map<ProviderName, GeocodingProvider>(); + + built.set( + 'pelias', + new PeliasProvider({ + apiUrl: config.pelias.apiUrl, + timeoutMs: config.providers.timeoutMs, + }) + ); + + built.set( + 'photon', + new PhotonProvider({ + apiUrl: config.photon.apiUrl, + timeoutMs: config.providers.timeoutMs, + }) + ); + + const nominatimLimiter = new RateLimiter(config.nominatim.intervalMs); + built.set( + 'nominatim', + new NominatimProvider( + { + apiUrl: config.nominatim.apiUrl, + userAgent: config.nominatim.userAgent, + timeoutMs: config.providers.timeoutMs, + }, + nominatimLimiter + ) + ); + + const ordered = config.providers.enabled + .map((name) => built.get(name)) + .filter((p): p is GeocodingProvider => p !== undefined); + + return new ProviderChain({ + providers: ordered, + healthCacheMs: config.providers.healthCacheMs, + log: (level, msg, meta) => { + if (level === 'warn') { + console.warn('[geocoding-chain]', msg, meta ?? ''); + } else { + console.log('[geocoding-chain]', msg, meta ?? ''); + } + }, + }); +} diff --git a/services/mana-geocoding/src/config.ts b/services/mana-geocoding/src/config.ts index bbfec13b1..04bb88bd7 100644 --- a/services/mana-geocoding/src/config.ts +++ b/services/mana-geocoding/src/config.ts @@ -2,12 +2,25 @@ * Application configuration loaded from environment variables. */ +import type { ProviderName } from './providers/types'; + export interface Config { port: number; pelias: { /** Pelias API base URL (the API container, not the placeholder service) */ apiUrl: string; }; + photon: { + /** Photon base URL (defaults to public komoot endpoint) */ + apiUrl: string; + }; + nominatim: { + apiUrl: string; + userAgent: string; + /** Inter-request gap in ms. Public Nominatim policy is 1 req/sec — we + * default to 1100 ms to leave headroom against clock drift.
*/ + intervalMs: number; + }; cors: { origins: string[]; }; @@ -17,6 +30,16 @@ export interface Config { /** TTL in milliseconds (default: 24h — geocoding results rarely change) */ ttlMs: number; }; + providers: { + /** Order matters — the chain tries them top-down. Anything not in + * this list is disabled. */ + enabled: ProviderName[]; + /** TTL for the per-provider health cache. */ + healthCacheMs: number; + /** Wall-clock timeout per provider attempt (a slow provider falls + * through to the next one). */ + timeoutMs: number; + }; } export function loadConfig(): Config { @@ -25,6 +48,16 @@ export function loadConfig(): Config { pelias: { apiUrl: process.env.PELIAS_API_URL || 'http://localhost:4000/v1', }, + photon: { + apiUrl: process.env.PHOTON_API_URL || 'https://photon.komoot.io', + }, + nominatim: { + apiUrl: process.env.NOMINATIM_API_URL || 'https://nominatim.openstreetmap.org', + userAgent: + process.env.NOMINATIM_USER_AGENT || + 'mana-geocoding/1.0 (+https://mana.how; kontakt@memoro.ai)', + intervalMs: parseInt(process.env.NOMINATIM_INTERVAL_MS || '1100', 10), + }, cors: { origins: (process.env.CORS_ORIGINS || 'http://localhost:5173').split(','), }, @@ -32,5 +65,24 @@ export function loadConfig(): Config { maxEntries: parseInt(process.env.CACHE_MAX_ENTRIES || '5000', 10), ttlMs: parseInt(process.env.CACHE_TTL_MS || String(24 * 60 * 60 * 1000), 10), }, + providers: { + enabled: parseProviderList(process.env.GEOCODING_PROVIDERS, [ + 'pelias', + 'photon', + 'nominatim', + ]), + healthCacheMs: parseInt(process.env.PROVIDER_HEALTH_CACHE_MS || '30000', 10), + timeoutMs: parseInt(process.env.PROVIDER_TIMEOUT_MS || '5000', 10), + }, }; } + +function parseProviderList(raw: string | undefined, fallback: ProviderName[]): ProviderName[] { + if (!raw) return fallback; + const valid: ProviderName[] = ['pelias', 'photon', 'nominatim']; + const parsed = raw + .split(',') + .map((s) => s.trim().toLowerCase()) + .filter((s): s is ProviderName => (valid as 
string[]).includes(s)); + return parsed.length > 0 ? parsed : fallback; +} diff --git a/services/mana-geocoding/src/lib/__tests__/osm-category-map.test.ts b/services/mana-geocoding/src/lib/__tests__/osm-category-map.test.ts new file mode 100644 index 000000000..42b34a4b1 --- /dev/null +++ b/services/mana-geocoding/src/lib/__tests__/osm-category-map.test.ts @@ -0,0 +1,155 @@ +/** + * Unit tests for the raw-OSM-tag → PlaceCategory mapper. + * + * Covers the cases Photon and Nominatim emit for typical DACH queries. + * The Pelias mapper has its own tests in category-map.test.ts; this file + * tests *only* the raw-OSM-tag path used by the public-API fallbacks. + */ + +import { describe, expect, it } from 'bun:test'; +import { mapOsmTagToPlaceCategory } from '../osm-category-map'; + +describe('mapOsmTagToPlaceCategory', () => { + describe('food (highest priority)', () => { + it('amenity:restaurant → food', () => { + expect(mapOsmTagToPlaceCategory('amenity', 'restaurant')).toBe('food'); + }); + it('amenity:cafe → food', () => { + expect(mapOsmTagToPlaceCategory('amenity', 'cafe')).toBe('food'); + }); + it('amenity:bar → food (not leisure)', () => { + // Bars sit at the food/leisure boundary. We pick food because the + // Places UI groups bars next to restaurants visually. + expect(mapOsmTagToPlaceCategory('amenity', 'bar')).toBe('food'); + }); + it('shop:bakery → food (not shopping)', () => { + // Bakery is technically `shop` in OSM but functionally food. We + // special-case the shop subtypes that are food. 
+ expect(mapOsmTagToPlaceCategory('shop', 'bakery')).toBe('food'); + }); + it('shop:butcher → food', () => { + expect(mapOsmTagToPlaceCategory('shop', 'butcher')).toBe('food'); + }); + }); + + describe('transit', () => { + it('public_transport:station → transit', () => { + expect(mapOsmTagToPlaceCategory('public_transport', 'station')).toBe('transit'); + }); + it('public_transport (any value) → transit', () => { + // Any value of public_transport falls under transit + expect(mapOsmTagToPlaceCategory('public_transport', 'platform')).toBe('transit'); + expect(mapOsmTagToPlaceCategory('public_transport', 'stop_position')).toBe('transit'); + }); + it('railway:station → transit', () => { + expect(mapOsmTagToPlaceCategory('railway', 'station')).toBe('transit'); + }); + it('railway:tram_stop → transit', () => { + expect(mapOsmTagToPlaceCategory('railway', 'tram_stop')).toBe('transit'); + }); + it('highway:bus_stop → transit', () => { + expect(mapOsmTagToPlaceCategory('highway', 'bus_stop')).toBe('transit'); + }); + it('aeroway:aerodrome → transit', () => { + expect(mapOsmTagToPlaceCategory('aeroway', 'aerodrome')).toBe('transit'); + }); + it('amenity:car_rental → transit', () => { + // Matches Pelias mapper's "car_rental → transit" decision + expect(mapOsmTagToPlaceCategory('amenity', 'car_rental')).toBe('transit'); + }); + }); + + describe('shopping (after food, so bakery/butcher fall to food first)', () => { + it('shop:supermarket → shopping', () => { + expect(mapOsmTagToPlaceCategory('shop', 'supermarket')).toBe('shopping'); + }); + it('shop:clothes → shopping', () => { + expect(mapOsmTagToPlaceCategory('shop', 'clothes')).toBe('shopping'); + }); + it('shop:electronics → shopping', () => { + expect(mapOsmTagToPlaceCategory('shop', 'electronics')).toBe('shopping'); + }); + it('amenity:marketplace → shopping', () => { + expect(mapOsmTagToPlaceCategory('amenity', 'marketplace')).toBe('shopping'); + }); + }); + + describe('leisure', () => { + it('leisure:park → leisure', 
() => { + expect(mapOsmTagToPlaceCategory('leisure', 'park')).toBe('leisure'); + }); + it('tourism:attraction → leisure', () => { + expect(mapOsmTagToPlaceCategory('tourism', 'attraction')).toBe('leisure'); + }); + it('amenity:cinema → leisure', () => { + expect(mapOsmTagToPlaceCategory('amenity', 'cinema')).toBe('leisure'); + }); + it('amenity:theatre → leisure', () => { + expect(mapOsmTagToPlaceCategory('amenity', 'theatre')).toBe('leisure'); + }); + it('amenity:nightclub → leisure', () => { + expect(mapOsmTagToPlaceCategory('amenity', 'nightclub')).toBe('leisure'); + }); + it('sport:tennis → leisure', () => { + expect(mapOsmTagToPlaceCategory('sport', 'tennis')).toBe('leisure'); + }); + }); + + describe('work', () => { + it('amenity:school → work', () => { + expect(mapOsmTagToPlaceCategory('amenity', 'school')).toBe('work'); + }); + it('amenity:university → work', () => { + expect(mapOsmTagToPlaceCategory('amenity', 'university')).toBe('work'); + }); + it('amenity:bank → work', () => { + expect(mapOsmTagToPlaceCategory('amenity', 'bank')).toBe('work'); + }); + it('amenity:townhall → work', () => { + expect(mapOsmTagToPlaceCategory('amenity', 'townhall')).toBe('work'); + }); + it('office:* → work', () => { + expect(mapOsmTagToPlaceCategory('office', 'company')).toBe('work'); + expect(mapOsmTagToPlaceCategory('office', 'lawyer')).toBe('work'); + }); + }); + + describe('other (health/religion/unknown)', () => { + it('amenity:hospital → other', () => { + // Health goes to other (matches Pelias mapper) + expect(mapOsmTagToPlaceCategory('amenity', 'hospital')).toBe('other'); + }); + it('amenity:pharmacy → other', () => { + expect(mapOsmTagToPlaceCategory('amenity', 'pharmacy')).toBe('other'); + }); + it('healthcare:doctor → other', () => { + expect(mapOsmTagToPlaceCategory('healthcare', 'doctor')).toBe('other'); + }); + it('amenity:place_of_worship → other', () => { + expect(mapOsmTagToPlaceCategory('amenity', 'place_of_worship')).toBe('other'); + }); + it('unknown 
class → other', () => { + expect(mapOsmTagToPlaceCategory('weirdkey', 'weirdvalue')).toBe('other'); + }); + it('undefined inputs → other', () => { + expect(mapOsmTagToPlaceCategory()).toBe('other'); + expect(mapOsmTagToPlaceCategory(undefined, undefined)).toBe('other'); + expect(mapOsmTagToPlaceCategory('amenity')).toBe('other'); // amenity without value + }); + it('place:city → other (no street/road match)', () => { + // Address-layer responses fall through to other + expect(mapOsmTagToPlaceCategory('place', 'city')).toBe('other'); + }); + }); + + describe('priority — value-specific entries beat key-only entries', () => { + it('shop:bakery is food, but shop:somethingElse is shopping', () => { + expect(mapOsmTagToPlaceCategory('shop', 'bakery')).toBe('food'); + expect(mapOsmTagToPlaceCategory('shop', 'supermarket')).toBe('shopping'); + }); + it('amenity:cinema is leisure, but amenity:marketplace is shopping', () => { + expect(mapOsmTagToPlaceCategory('amenity', 'cinema')).toBe('leisure'); + expect(mapOsmTagToPlaceCategory('amenity', 'marketplace')).toBe('shopping'); + }); + }); +}); diff --git a/services/mana-geocoding/src/lib/__tests__/rate-limiter.test.ts b/services/mana-geocoding/src/lib/__tests__/rate-limiter.test.ts new file mode 100644 index 000000000..6cd4df540 --- /dev/null +++ b/services/mana-geocoding/src/lib/__tests__/rate-limiter.test.ts @@ -0,0 +1,95 @@ +/** + * Tests for the single-token rate limiter. + * + * The hot properties: FIFO ordering, inter-task gap honored, abort + * removes from queue without blocking later tasks. 
+ */ + +import { describe, expect, it } from 'bun:test'; +import { RateLimiter } from '../rate-limiter'; + +describe('RateLimiter', () => { + it('runs a single task immediately', async () => { + const lim = new RateLimiter(10); + const start = Date.now(); + const result = await lim.run(async () => 42); + const elapsed = Date.now() - start; + expect(result).toBe(42); + expect(elapsed).toBeLessThan(20); // No initial wait + }); + + it('spaces successive tasks by intervalMs', async () => { + const lim = new RateLimiter(50); + const start = Date.now(); + await lim.run(async () => 1); + await lim.run(async () => 2); + const elapsed = Date.now() - start; + // Second task waits ~50ms before starting. Allow a little jitter. + expect(elapsed).toBeGreaterThanOrEqual(45); + expect(elapsed).toBeLessThan(150); + }); + + it('preserves FIFO order under concurrent calls', async () => { + const lim = new RateLimiter(20); + const order: number[] = []; + const tasks = [1, 2, 3, 4].map((n) => + lim.run(async () => { + order.push(n); + return n; + }) + ); + await Promise.all(tasks); + expect(order).toEqual([1, 2, 3, 4]); + }); + + it('reports pending count', async () => { + const lim = new RateLimiter(50); + // First task takes the slot — kick it off but don't await yet + const t1 = lim.run(async () => { + await new Promise((r) => setTimeout(r, 30)); + return 1; + }); + // Schedule two more — they queue + const t2 = lim.run(async () => 2); + const t3 = lim.run(async () => 3); + // Tiny delay so t1 has acquired the lock + await new Promise((r) => setTimeout(r, 5)); + expect(lim.pending).toBe(2); + await Promise.all([t1, t2, t3]); + expect(lim.pending).toBe(0); + }); + + it('aborts a queued task without breaking later ones', async () => { + const lim = new RateLimiter(40); + const t1 = lim.run(async () => 'first'); + + const ctrl = new AbortController(); + const t2 = lim.run(async () => 'second', ctrl.signal); + const t3 = lim.run(async () => 'third'); + + // Tiny delay to ensure t1 is 
running and t2/t3 are queued + await new Promise((r) => setTimeout(r, 5)); + ctrl.abort(); + + // t2 should reject with abort + await expect(t2).rejects.toThrow(/aborted/); + // t1 + t3 still resolve + expect(await t1).toBe('first'); + expect(await t3).toBe('third'); + }); + + it('aborts during interval-wait without breaking later tasks', async () => { + const lim = new RateLimiter(80); + await lim.run(async () => 'warmup'); // sets nextSlotAt = now + 80 + + const ctrl = new AbortController(); + const t1 = lim.run(async () => 'next', ctrl.signal); + // While t1 is sleeping in the interval-wait, abort it + setTimeout(() => ctrl.abort(), 10); + await expect(t1).rejects.toThrow(/aborted/); + + // Verify the limiter is still functional + const t2 = await lim.run(async () => 'after'); + expect(t2).toBe('after'); + }); +}); diff --git a/services/mana-geocoding/src/lib/osm-category-map.ts b/services/mana-geocoding/src/lib/osm-category-map.ts new file mode 100644 index 000000000..cb7a28890 --- /dev/null +++ b/services/mana-geocoding/src/lib/osm-category-map.ts @@ -0,0 +1,116 @@ +/** + * Maps raw OSM `class:type` tags (Photon's `osm_key:osm_value`, + * Nominatim's `class:type`) to our 7 PlaceCategories. + * + * Pelias has a curated multi-category taxonomy (`food`, `retail`, + * `transport`, …) that we map via `category-map.ts`. Photon and Nominatim + * return raw OSM tags instead — `amenity:restaurant`, `shop:supermarket`, + * `public_transport:station`, etc. — so they need a different lookup. + * + * The list below is intentionally narrow: it only covers tags we actually + * see in real Photon/Nominatim responses for DACH queries. Anything else + * falls through to `other`, which matches the Pelias mapper's behavior for + * unknown categories. + * + * If a query returns a tag we don't handle, that's the signal to add it + * here — not to try to enumerate all 1000+ OSM types. 
+ */ + +import type { PlaceCategory } from './category-map'; + +interface Tag { + key: string; + value?: string; +} + +/** + * Priority-ordered: first match wins. More-specific entries (with a + * `value`) come before generic key-only entries. Matches Pelias's + * "food beats retail" priority intent. + */ +const OSM_RULES: Array<{ match: Tag; category: PlaceCategory }> = [ + // ── Food (highest priority — restaurants are food, even when also + // tagged amenity or shop) ─────────────────────────────────────── + { match: { key: 'amenity', value: 'restaurant' }, category: 'food' }, + { match: { key: 'amenity', value: 'cafe' }, category: 'food' }, + { match: { key: 'amenity', value: 'fast_food' }, category: 'food' }, + { match: { key: 'amenity', value: 'bar' }, category: 'food' }, + { match: { key: 'amenity', value: 'pub' }, category: 'food' }, + { match: { key: 'amenity', value: 'biergarten' }, category: 'food' }, + { match: { key: 'amenity', value: 'food_court' }, category: 'food' }, + { match: { key: 'amenity', value: 'ice_cream' }, category: 'food' }, + { match: { key: 'shop', value: 'bakery' }, category: 'food' }, + { match: { key: 'shop', value: 'butcher' }, category: 'food' }, + { match: { key: 'shop', value: 'confectionery' }, category: 'food' }, + + // ── Transit ─────────────────────────────────────────────────────── + { match: { key: 'public_transport' }, category: 'transit' }, + { match: { key: 'railway', value: 'station' }, category: 'transit' }, + { match: { key: 'railway', value: 'halt' }, category: 'transit' }, + { match: { key: 'railway', value: 'tram_stop' }, category: 'transit' }, + { match: { key: 'highway', value: 'bus_stop' }, category: 'transit' }, + { match: { key: 'aeroway' }, category: 'transit' }, + { match: { key: 'amenity', value: 'bus_station' }, category: 'transit' }, + { match: { key: 'amenity', value: 'taxi' }, category: 'transit' }, + { match: { key: 'amenity', value: 'ferry_terminal' }, category: 'transit' }, + { match: { key: 
'amenity', value: 'car_rental' }, category: 'transit' }, + { match: { key: 'amenity', value: 'parking' }, category: 'transit' }, + + // ── Shopping (after food so bakery/butcher don't fall here) ────── + { match: { key: 'shop' }, category: 'shopping' }, + { match: { key: 'amenity', value: 'marketplace' }, category: 'shopping' }, + + // ── Leisure / entertainment ────────────────────────────────────── + { match: { key: 'leisure' }, category: 'leisure' }, + { match: { key: 'tourism' }, category: 'leisure' }, + { match: { key: 'amenity', value: 'cinema' }, category: 'leisure' }, + { match: { key: 'amenity', value: 'theatre' }, category: 'leisure' }, + { match: { key: 'amenity', value: 'nightclub' }, category: 'leisure' }, + { match: { key: 'amenity', value: 'arts_centre' }, category: 'leisure' }, + { match: { key: 'sport' }, category: 'leisure' }, + + // ── Work-ish ───────────────────────────────────────────────────── + { match: { key: 'amenity', value: 'school' }, category: 'work' }, + { match: { key: 'amenity', value: 'university' }, category: 'work' }, + { match: { key: 'amenity', value: 'college' }, category: 'work' }, + { match: { key: 'amenity', value: 'kindergarten' }, category: 'work' }, + { match: { key: 'amenity', value: 'library' }, category: 'work' }, + { match: { key: 'amenity', value: 'bank' }, category: 'work' }, + { match: { key: 'amenity', value: 'post_office' }, category: 'work' }, + { match: { key: 'amenity', value: 'courthouse' }, category: 'work' }, + { match: { key: 'amenity', value: 'townhall' }, category: 'work' }, + { match: { key: 'amenity', value: 'embassy' }, category: 'work' }, + { match: { key: 'office' }, category: 'work' }, + + // ── Health / religion → other (matches Pelias mapper) ─────────── + { match: { key: 'amenity', value: 'hospital' }, category: 'other' }, + { match: { key: 'amenity', value: 'clinic' }, category: 'other' }, + { match: { key: 'amenity', value: 'doctors' }, category: 'other' }, + { match: { key: 'amenity', value: 
'pharmacy' }, category: 'other' }, + { match: { key: 'amenity', value: 'dentist' }, category: 'other' }, + { match: { key: 'amenity', value: 'veterinary' }, category: 'other' }, + { match: { key: 'healthcare' }, category: 'other' }, + { match: { key: 'amenity', value: 'place_of_worship' }, category: 'other' }, + { match: { key: 'amenity', value: 'grave_yard' }, category: 'other' }, + + // Address-layer responses (no POI tag, just a place or road match) + // pass `place`/`highway` here; apart from bus stops they fall through to other +]; + +/** + * Map a single OSM `class:type` pair to a PlaceCategory. + * + * @param key Photon's `osm_key` or Nominatim's `class` (e.g. `amenity`) + * @param value Photon's `osm_value` or Nominatim's `type` (e.g. `restaurant`) + */ +export function mapOsmTagToPlaceCategory(key?: string, value?: string): PlaceCategory { + if (!key) return 'other'; + + for (const rule of OSM_RULES) { + if (rule.match.key !== key) continue; + if (rule.match.value && rule.match.value !== value) continue; + return rule.category; + } + + return 'other'; +} diff --git a/services/mana-geocoding/src/lib/rate-limiter.ts b/services/mana-geocoding/src/lib/rate-limiter.ts new file mode 100644 index 000000000..ba9138987 --- /dev/null +++ b/services/mana-geocoding/src/lib/rate-limiter.ts @@ -0,0 +1,96 @@ +/** + * Single-token rate limiter. Used for Nominatim's strict 1-req/sec policy. + * + * Why not p-queue / bottleneck: those are great packages but the surface + * we need is tiny (one slot, fixed interval, FIFO) and we want to keep + * the wrapper dependency-light. This is a few dozen lines of code with a tight + * test surface. + * + * Behavior: + * - At most 1 task running at a time. + * - Between successive task starts: at least `intervalMs` elapses. + * - Tasks queue in FIFO order. No prioritization, no skipping. + * - Caller can pass an `AbortSignal` to drop their slot if they no + * longer want the answer (e.g. the wrapper's overall timeout fired).
+ */ + +export class RateLimiter { + private queue: Array<() => void> = []; + private nextSlotAt = 0; + private busy = false; + + constructor(private readonly intervalMs: number) {} + + async run<T>(task: () => Promise<T>, signal?: AbortSignal): Promise<T> { + await this.acquire(signal); + try { + return await task(); + } finally { + this.release(); + } + } + + private async acquire(signal?: AbortSignal): Promise<void> { + // Wait for the previous task to release the slot. The lock is + // implemented as a queue of resume-functions; release() pops one. + // We need a stable reference to remove from the queue on abort — + // a named closure works because we push and splice the same one. + if (this.busy) { + await new Promise<void>((resolve, reject) => { + const entry = () => { + signal?.removeEventListener('abort', onAbort); + resolve(); + }; + const onAbort = () => { + const idx = this.queue.indexOf(entry); + if (idx >= 0) this.queue.splice(idx, 1); + reject(new Error('aborted')); + }; + signal?.addEventListener('abort', onAbort, { once: true }); + this.queue.push(entry); + }); + } + this.busy = true; + + // Honor the inter-task gap. Even if the previous task ran fast, + // we space starts at least `intervalMs` apart. + const wait = this.nextSlotAt - Date.now(); + if (wait > 0) { + try { + await sleep(wait, signal); + } catch (e) { + // Aborted during the inter-task wait. We've already claimed + // the busy flag — release it so the next queued task can + // proceed instead of deadlocking.
+ this.release(); + throw e; + } + } + + this.nextSlotAt = Date.now() + this.intervalMs; + } + + private release(): void { + const next = this.queue.shift(); + this.busy = !!next; + if (next) next(); + } + + get pending(): number { + return this.queue.length; + } +} + +function sleep(ms: number, signal?: AbortSignal): Promise<void> { + return new Promise((resolve, reject) => { + const t = setTimeout(resolve, ms); + signal?.addEventListener( + 'abort', + () => { + clearTimeout(t); + reject(new Error('aborted')); + }, + { once: true } + ); + }); +} diff --git a/services/mana-geocoding/src/providers/__tests__/chain.test.ts b/services/mana-geocoding/src/providers/__tests__/chain.test.ts new file mode 100644 index 000000000..18ad82e04 --- /dev/null +++ b/services/mana-geocoding/src/providers/__tests__/chain.test.ts @@ -0,0 +1,244 @@ +/** + * Tests for the provider chain — failover, health-cache, fall-through + * semantics. Uses fake providers so we don't hit any real backend. + */ + +import { beforeEach, describe, expect, it } from 'bun:test'; +import { ProviderChain } from '../chain'; +import type { + GeocodingProvider, + GeocodingResult, + ProviderName, + ProviderResponse, + ReverseRequest, + SearchRequest, +} from '../types'; + +class FakeProvider implements GeocodingProvider { + calls = { search: 0, reverse: 0, health: 0 }; + healthCalls: number[] = []; + + constructor( + readonly name: ProviderName, + private behavior: { + search?: () => Promise<ProviderResponse>; + reverse?: () => Promise<ProviderResponse>; + health?: () => Promise<boolean>; + } = {} + ) {} + + async search(_req: SearchRequest): Promise<ProviderResponse> { + this.calls.search++; + return this.behavior.search ? this.behavior.search() : okResults(this.name); + } + + async reverse(_req: ReverseRequest): Promise<ProviderResponse> { + this.calls.reverse++; + return this.behavior.reverse ? this.behavior.reverse() : okResults(this.name); + } + + async health(): Promise<boolean> { + this.calls.health++; + this.healthCalls.push(Date.now()); + return this.behavior.health ?
this.behavior.health() : true; + } +} + +function okResults(provider: ProviderName, count = 1): ProviderResponse { + const results: GeocodingResult[] = Array.from({ length: count }, (_, i) => ({ + label: `${provider} result ${i}`, + name: `name-${i}`, + latitude: 47.66 + i * 0.01, + longitude: 9.17 + i * 0.01, + address: { city: 'Konstanz' }, + category: 'other', + confidence: 0.9, + provider, + })); + return { ok: true, results }; +} + +const SEARCH: SearchRequest = { q: 'test', limit: 5, lang: 'de' }; + +describe('ProviderChain — happy path', () => { + it('returns the first provider that succeeds', async () => { + const a = new FakeProvider('pelias'); + const b = new FakeProvider('photon'); + const chain = new ProviderChain({ + providers: [a, b], + healthCacheMs: 60_000, + }); + const res = await chain.search(SEARCH); + expect(res.ok).toBe(true); + expect(res.provider).toBe('pelias'); + expect(res.tried).toEqual(['pelias']); + expect(a.calls.search).toBe(1); + expect(b.calls.search).toBe(0); + }); + + it('honors the providers array order', async () => { + const photon = new FakeProvider('photon'); + const pelias = new FakeProvider('pelias'); + // photon first this time + const chain = new ProviderChain({ + providers: [photon, pelias], + healthCacheMs: 60_000, + }); + const res = await chain.search(SEARCH); + expect(res.provider).toBe('photon'); + expect(pelias.calls.search).toBe(0); + }); +}); + +describe('ProviderChain — failover', () => { + it('falls through on unreachable, returns next provider', async () => { + const a = new FakeProvider('pelias', { + search: async () => ({ ok: false, kind: 'unreachable', status: 503 }), + }); + const b = new FakeProvider('photon'); + const chain = new ProviderChain({ providers: [a, b], healthCacheMs: 60_000 }); + const res = await chain.search(SEARCH); + expect(res.ok).toBe(true); + expect(res.provider).toBe('photon'); + expect(res.tried).toEqual(['pelias', 'photon']); + }); + + it('falls through on rate_limited', async () 
=> { + const a = new FakeProvider('photon', { + search: async () => ({ ok: false, kind: 'rate_limited', status: 429 }), + }); + const b = new FakeProvider('nominatim'); + const chain = new ProviderChain({ providers: [a, b], healthCacheMs: 60_000 }); + const res = await chain.search(SEARCH); + expect(res.provider).toBe('nominatim'); + }); + + it('STOPS on empty results — does not consume fallback budget', async () => { + // A clean empty answer is definitive: don't burn through public APIs. + const a = new FakeProvider('pelias', { + search: async () => ({ ok: true, results: [] }), + }); + const b = new FakeProvider('photon'); + const chain = new ProviderChain({ providers: [a, b], healthCacheMs: 60_000 }); + const res = await chain.search(SEARCH); + expect(res.ok).toBe(true); + expect(res.provider).toBe('pelias'); + expect(res.results).toEqual([]); + expect(b.calls.search).toBe(0); + }); + + it('returns ok:false when all providers fail', async () => { + const a = new FakeProvider('pelias', { + search: async () => ({ ok: false, kind: 'unreachable' }), + }); + const b = new FakeProvider('photon', { + search: async () => ({ ok: false, kind: 'unreachable' }), + }); + const chain = new ProviderChain({ providers: [a, b], healthCacheMs: 60_000 }); + const res = await chain.search(SEARCH); + expect(res.ok).toBe(false); + expect(res.results).toEqual([]); + expect(res.tried).toEqual(['pelias', 'photon']); + }); +}); + +describe('ProviderChain — health cache', () => { + it('skips a provider whose health probe returned false', async () => { + const dead = new FakeProvider('pelias', { health: async () => false }); + const alive = new FakeProvider('photon'); + const chain = new ProviderChain({ providers: [dead, alive], healthCacheMs: 60_000 }); + const res = await chain.search(SEARCH); + expect(res.tried).toEqual(['photon']); // pelias was skipped, not tried + expect(dead.calls.search).toBe(0); + expect(dead.calls.health).toBe(1); + }); + + it('caches health for healthCacheMs — 
only one probe per window', async () => { + const a = new FakeProvider('pelias'); + const chain = new ProviderChain({ providers: [a], healthCacheMs: 60_000 }); + await chain.search(SEARCH); + await chain.search(SEARCH); + await chain.search(SEARCH); + expect(a.calls.health).toBe(1); // health probed once, then cached + expect(a.calls.search).toBe(3); + }); + + it('marks provider unhealthy when search fails, skipping it next time', async () => { + let failNext = true; + const flaky = new FakeProvider('pelias', { + search: async () => (failNext ? { ok: false, kind: 'unreachable' } : okResults('pelias')), + }); + const alive = new FakeProvider('photon'); + const chain = new ProviderChain({ providers: [flaky, alive], healthCacheMs: 60_000 }); + + // First call: pelias fails → cached unhealthy → photon serves + const r1 = await chain.search(SEARCH); + expect(r1.provider).toBe('photon'); + expect(r1.tried).toEqual(['pelias', 'photon']); + + // Second call: pelias is in unhealthy cache, not tried at all + failNext = false; // would now succeed but never gets called + const r2 = await chain.search(SEARCH); + expect(r2.provider).toBe('photon'); + expect(r2.tried).toEqual(['photon']); + expect(flaky.calls.search).toBe(1); + }); + + it('refreshes health after cache expires', async () => { + const dead = new FakeProvider('pelias', { health: async () => false }); + const alive = new FakeProvider('photon'); + // 1ms cache for fast test + const chain = new ProviderChain({ providers: [dead, alive], healthCacheMs: 1 }); + await chain.search(SEARCH); + await new Promise((r) => setTimeout(r, 5)); + await chain.search(SEARCH); + // Health re-probed after expiry + expect(dead.calls.health).toBe(2); + }); + + it('clearHealthCache forces re-probe', async () => { + const a = new FakeProvider('pelias'); + const chain = new ProviderChain({ providers: [a], healthCacheMs: 60_000 }); + await chain.search(SEARCH); + expect(a.calls.health).toBe(1); + chain.clearHealthCache(); + await 
chain.search(SEARCH); + expect(a.calls.health).toBe(2); + }); +}); + +describe('ProviderChain — getHealthSnapshot', () => { + it('reports per-provider health + age', async () => { + const ok = new FakeProvider('pelias'); + const dead = new FakeProvider('photon', { health: async () => false }); + const chain = new ProviderChain({ providers: [ok, dead], healthCacheMs: 60_000 }); + await chain.search(SEARCH); + const snap = chain.getHealthSnapshot(); + expect(snap).toHaveLength(2); + expect(snap[0]).toMatchObject({ name: 'pelias', healthy: true }); + expect(snap[1]).toMatchObject({ name: 'photon', healthy: false }); + expect(snap[0].ageMs).toBeLessThan(1000); + }); + + it('reports Infinity age for never-probed providers', async () => { + const a = new FakeProvider('pelias'); + const chain = new ProviderChain({ providers: [a], healthCacheMs: 60_000 }); + const snap = chain.getHealthSnapshot(); + expect(snap[0].ageMs).toBe(Infinity); + expect(snap[0].healthy).toBe(false); // unknown defaults to unhealthy + }); +}); + +describe('ProviderChain — reverse', () => { + it('uses the same provider order for reverse', async () => { + const a = new FakeProvider('pelias', { + reverse: async () => ({ ok: false, kind: 'unreachable' }), + }); + const b = new FakeProvider('photon'); + const chain = new ProviderChain({ providers: [a, b], healthCacheMs: 60_000 }); + const res = await chain.reverse({ lat: '47.66', lon: '9.17', lang: 'de' }); + expect(res.provider).toBe('photon'); + expect(b.calls.reverse).toBe(1); + expect(b.calls.search).toBe(0); + }); +}); diff --git a/services/mana-geocoding/src/providers/__tests__/nominatim-normalizer.test.ts b/services/mana-geocoding/src/providers/__tests__/nominatim-normalizer.test.ts new file mode 100644 index 000000000..e9bbb2559 --- /dev/null +++ b/services/mana-geocoding/src/providers/__tests__/nominatim-normalizer.test.ts @@ -0,0 +1,150 @@ +/** + * Tests for normalizing Nominatim's flat-JSON shape into our GeocodingResult. 
+ * + * Nominatim differs from Photon/Pelias in three subtle ways we lock in: + * 1. Lat/lon are STRINGS, not numbers — the normalizer must parseFloat. + * 2. Display name is a comma-noisy hierarchy ("Konzil, Hafenstraße, + * Konstanz, Konstanz, Regierungsbezirk Freiburg, Baden-Württemberg, + * Germany"). We build our own label from `address.*` instead. + * 3. Venue name lives under `address.amenity|shop|tourism|...` depending + * on the OSM class. We probe each in priority order. + */ + +import { describe, expect, it } from 'bun:test'; +import { normalizeNominatimResult } from '../nominatim'; + +describe('normalizeNominatimResult', () => { + it('parses string lat/lon into numbers', () => { + const result = normalizeNominatimResult({ + lat: '47.6634', + lon: '9.1758', + class: 'amenity', + type: 'restaurant', + display_name: 'Konzil, Konstanz, Germany', + address: { road: 'Hafenstraße', amenity: 'Konzil', country: 'Germany' }, + }); + expect(typeof result.latitude).toBe('number'); + expect(typeof result.longitude).toBe('number'); + expect(result.latitude).toBeCloseTo(47.6634, 4); + expect(result.longitude).toBeCloseTo(9.1758, 4); + }); + + it('extracts venue name from address.amenity for restaurants', () => { + const result = normalizeNominatimResult({ + lat: '47.66', + lon: '9.17', + class: 'amenity', + type: 'restaurant', + address: { amenity: 'Konzil Restaurant', city: 'Konstanz', country: 'Germany' }, + }); + expect(result.name).toBe('Konzil Restaurant'); + expect(result.category).toBe('food'); + }); + + it('extracts venue name from address.shop for retail', () => { + const result = normalizeNominatimResult({ + lat: '47.66', + lon: '9.17', + class: 'shop', + type: 'supermarket', + address: { shop: 'Edeka', road: 'Marktstätte', city: 'Konstanz' }, + }); + expect(result.name).toBe('Edeka'); + expect(result.category).toBe('shopping'); + }); + + it('falls back to top-level name when no address.* venue name', () => { + const result = normalizeNominatimResult({ + 
lat: '47.66', + lon: '9.17', + class: 'place', + type: 'city', + name: 'Konstanz', + address: { city: 'Konstanz', country: 'Germany' }, + }); + expect(result.name).toBe('Konstanz'); + }); + + it('handles a pure street result (no venue name) without crashing', () => { + const result = normalizeNominatimResult({ + lat: '47.665', + lon: '9.176', + class: 'highway', + type: 'residential', + display_name: + 'Münsterplatz, Altstadt, Konstanz, Regierungsbezirk Freiburg, Baden-Württemberg, Germany', + address: { road: 'Münsterplatz', city: 'Konstanz', country: 'Germany', postcode: '78462' }, + }); + expect(result.name).toBe(''); + expect(result.label).toBe('Münsterplatz, 78462 Konstanz, Germany'); + expect(result.category).toBe('other'); + }); + + it('uses display_name as ultimate fallback when nothing structured', () => { + const result = normalizeNominatimResult({ + lat: '47.0', + lon: '9.0', + display_name: 'Some, comma, separated, label', + }); + expect(result.label).toBe('Some, comma, separated, label'); + }); + + it('city falls through town → village → hamlet for rural addresses', () => { + const village = normalizeNominatimResult({ + lat: '47.0', + lon: '9.0', + address: { village: 'Kleinkleckersdorf', country: 'Germany' }, + }); + expect(village.address.city).toBe('Kleinkleckersdorf'); + + const hamlet = normalizeNominatimResult({ + lat: '47.0', + lon: '9.0', + address: { hamlet: 'Mini-Weiler', country: 'Germany' }, + }); + expect(hamlet.address.city).toBe('Mini-Weiler'); + }); + + it('uses neutral 0.5 confidence when importance is missing', () => { + const result = normalizeNominatimResult({ + lat: '47.0', + lon: '9.0', + class: 'amenity', + type: 'restaurant', + }); + expect(result.confidence).toBe(0.5); + }); + + it('uses importance score when present', () => { + const result = normalizeNominatimResult({ + lat: '47.0', + lon: '9.0', + class: 'amenity', + type: 'restaurant', + importance: 0.83, + }); + expect(result.confidence).toBeCloseTo(0.83, 2); + }); + + 
it('marks results with provider:nominatim', () => { + const result = normalizeNominatimResult({ + lat: '47.0', + lon: '9.0', + class: 'place', + type: 'city', + }); + expect(result.provider).toBe('nominatim'); + }); + + it('does not set peliasCategories', () => { + // Consumer side keys off the absence of this field as a "fallback + // provider" signal. + const result = normalizeNominatimResult({ + lat: '47.0', + lon: '9.0', + class: 'amenity', + type: 'restaurant', + }); + expect(result.peliasCategories).toBeUndefined(); + }); +}); diff --git a/services/mana-geocoding/src/providers/__tests__/photon-normalizer.test.ts b/services/mana-geocoding/src/providers/__tests__/photon-normalizer.test.ts new file mode 100644 index 000000000..6f08a635c --- /dev/null +++ b/services/mana-geocoding/src/providers/__tests__/photon-normalizer.test.ts @@ -0,0 +1,127 @@ +/** + * Tests for normalizing Photon's GeoJSON shape into our GeocodingResult. + * + * Real-world fixtures captured from photon.komoot.io for DACH queries. + * The mapping logic is the brittle part — a Photon response shape change + * (different `osm_key` casing, missing `housenumber`, …) would break our + * Places UI, so we lock the shape with these tests. 
+ */ + +import { describe, expect, it } from 'bun:test'; +import { normalizePhotonFeature } from '../photon'; + +describe('normalizePhotonFeature', () => { + it('maps a restaurant with full address fields → food', () => { + const result = normalizePhotonFeature({ + type: 'Feature', + geometry: { type: 'Point', coordinates: [9.1758, 47.6634] }, + properties: { + osm_id: 12345, + osm_type: 'N', + osm_key: 'amenity', + osm_value: 'restaurant', + name: 'Konzil', + country: 'Germany', + city: 'Konstanz', + postcode: '78462', + street: 'Hafenstraße', + housenumber: '2', + importance: 0.78, + }, + }); + + expect(result.name).toBe('Konzil'); + expect(result.latitude).toBeCloseTo(47.6634, 4); + expect(result.longitude).toBeCloseTo(9.1758, 4); + expect(result.category).toBe('food'); + expect(result.address).toEqual({ + street: 'Hafenstraße', + houseNumber: '2', + postalCode: '78462', + city: 'Konstanz', + state: undefined, + country: 'Germany', + }); + expect(result.confidence).toBeCloseTo(0.78, 2); + expect(result.provider).toBe('photon'); + // peliasCategories deliberately absent for non-Pelias providers + expect(result.peliasCategories).toBeUndefined(); + }); + + it('builds label from structured fields', () => { + const result = normalizePhotonFeature({ + type: 'Feature', + geometry: { type: 'Point', coordinates: [11.575, 48.137] }, + properties: { + osm_key: 'railway', + osm_value: 'station', + name: 'München Hauptbahnhof', + country: 'Germany', + city: 'München', + postcode: '80335', + }, + }); + expect(result.label).toBe('München Hauptbahnhof, 80335 München, Germany'); + expect(result.category).toBe('transit'); + }); + + it('falls back to district when city is missing (rural addresses)', () => { + const result = normalizePhotonFeature({ + type: 'Feature', + geometry: { type: 'Point', coordinates: [10.0, 48.5] }, + properties: { + osm_key: 'place', + osm_value: 'hamlet', + name: 'Tiny-Hamlet', + country: 'Germany', + district: 'Some-District', + postcode: '12345', + }, 
+ }); + expect(result.address.city).toBe('Some-District'); + }); + + it('uses neutral 0.5 confidence when importance is missing', () => { + const result = normalizePhotonFeature({ + type: 'Feature', + geometry: { type: 'Point', coordinates: [9.0, 47.0] }, + properties: { osm_key: 'place', osm_value: 'city', name: 'X' }, + }); + expect(result.confidence).toBe(0.5); + }); + + it('handles a pure address (no name) gracefully', () => { + const result = normalizePhotonFeature({ + type: 'Feature', + geometry: { type: 'Point', coordinates: [9.176, 47.665] }, + properties: { + osm_key: 'place', + osm_value: 'house', + street: 'Münsterplatz', + housenumber: '5', + postcode: '78462', + city: 'Konstanz', + country: 'Germany', + }, + }); + expect(result.name).toBe(''); + // Label still composes from the available parts + expect(result.label).toBe('Münsterplatz 5, 78462 Konstanz, Germany'); + expect(result.category).toBe('other'); // place:house → other + }); + + it('coordinates: Photon emits [lon, lat] — normalizer must NOT swap', () => { + // Catches the all-too-easy lon/lat flip when porting from Pelias. + const result = normalizePhotonFeature({ + type: 'Feature', + geometry: { type: 'Point', coordinates: [9.1758, 47.6634] }, + properties: { osm_key: 'place', osm_value: 'city', name: 'Konstanz' }, + }); + // 9.x is longitude (close to 9°E), 47.x is latitude (close to 47°N). + // A swap would land us in northern Somalia instead of Lake Constance. + expect(result.longitude).toBeGreaterThan(8); + expect(result.longitude).toBeLessThan(10); + expect(result.latitude).toBeGreaterThan(47); + expect(result.latitude).toBeLessThan(48); + }); +}); diff --git a/services/mana-geocoding/src/providers/chain.ts b/services/mana-geocoding/src/providers/chain.ts new file mode 100644 index 000000000..c55be87eb --- /dev/null +++ b/services/mana-geocoding/src/providers/chain.ts @@ -0,0 +1,140 @@ +/** + * Provider chain — tries providers in priority order until one answers.
+ * + * Failure handling: + * - `ok: false` (network/5xx/429) → fall through to next provider + * - `ok: true` with empty results → STOP (don't burn through public APIs + * for a query that legitimately doesn't match) + * - `ok: true` with results → return (result caching happens in the + * caller, not in the chain) + * + * Health-cache: + * Calling each provider's `health()` per-request would add an RTT to + * every search. Instead we cache health for `healthCacheMs` and skip + * providers that were last seen unhealthy. A skipped provider isn't + * tried again until the cache entry expires, at which point we probe + * it before the next request (lazy refresh). + */ + +import type { + GeocodingProvider, + GeocodingResult, + ProviderName, + ProviderResponse, + ReverseRequest, + SearchRequest, +} from './types'; + +export interface ChainConfig { + providers: GeocodingProvider[]; + /** TTL for the per-provider health cache. */ + healthCacheMs: number; + /** Optional logger. The chain never logs on its own: pass e.g. a + * console.warn wrapper so a flaky fallback shows up in logs without + * polluting happy-path output. */ + log?: (level: 'info' | 'warn', msg: string, meta?: Record<string, unknown>) => void; +} + +interface HealthEntry { + healthy: boolean; + checkedAt: number; +} + +export interface ChainResponse { + ok: boolean; + provider?: ProviderName; + results: GeocodingResult[]; + /** Names of providers that were actually called, in order, including + * the one that answered. Providers skipped via the health cache are + * omitted. Useful for telemetry (`x-geocoding-tried` response header).
*/ + tried: ProviderName[]; +} + +export class ProviderChain { + private health = new Map<ProviderName, HealthEntry>(); + + constructor(private readonly config: ChainConfig) {} + + async search(req: SearchRequest, signal?: AbortSignal): Promise<ChainResponse> { + return this.run(req, signal, (p, r, s) => p.search(r as SearchRequest, s)); + } + + async reverse(req: ReverseRequest, signal?: AbortSignal): Promise<ChainResponse> { + return this.run(req, signal, (p, r, s) => p.reverse(r as ReverseRequest, s)); + } + + private async run( + req: SearchRequest | ReverseRequest, + signal: AbortSignal | undefined, + call: ( + provider: GeocodingProvider, + req: SearchRequest | ReverseRequest, + signal?: AbortSignal + ) => Promise<ProviderResponse> + ): Promise<ChainResponse> { + const tried: ProviderName[] = []; + + for (const provider of this.config.providers) { + // Caller gave up (e.g. client disconnected): stop here. Any failure + // from now on says nothing about provider health. + if (signal?.aborted) break; + + if (!(await this.isHealthy(provider, signal))) { + continue; + } + + tried.push(provider.name); + const result = await call(provider, req, signal); + + if (result.ok) { + // Success — even if results=[], that's a definitive answer. + return { ok: true, provider: provider.name, results: result.results, tried }; + } + + // An abort during the call is not the provider's fault. Don't mark + // it unhealthy, or one cancelled request sidelines it for the window. + if (signal?.aborted) break; + + // Failure — mark unhealthy and fall through. + this.health.set(provider.name, { healthy: false, checkedAt: Date.now() }); + this.config.log?.('warn', `${provider.name} failed`, { + kind: result.kind, + status: result.status, + error: result.error, + }); + } + + return { ok: false, results: [], tried }; + } + + /** + * Health-cache lookup with lazy refresh. Returns true if the provider + * is believed to be reachable; probes the actual backend if the cache + * entry is missing or stale. + */ + private async isHealthy(provider: GeocodingProvider, signal?: AbortSignal): Promise<boolean> { + const cached = this.health.get(provider.name); + const now = Date.now(); + if (cached && now - cached.checkedAt < this.config.healthCacheMs) { + return cached.healthy; + } + + // Stale or missing — refresh.
The probe is awaited inline, so each + // expired entry adds one round-trip to the request that finds it. On + // cold start every entry is missing, so the first request probes each + // provider it reaches before one answers. + const healthy = await provider.health(signal); + // A probe cut short by the caller's abort is inconclusive: don't + // cache it, or one cancelled request could sideline the provider. + if (signal?.aborted) return false; + this.health.set(provider.name, { healthy, checkedAt: now }); + if (!healthy) { + this.config.log?.('warn', `${provider.name} health check failed`); + } + return healthy; + } + + /** Snapshot of provider health, for /health endpoint reporting. */ + getHealthSnapshot(): Array<{ name: ProviderName; healthy: boolean; ageMs: number }> { + const now = Date.now(); + return this.config.providers.map((p) => { + const entry = this.health.get(p.name); + return { + name: p.name, + healthy: entry?.healthy ?? false, + ageMs: entry ? now - entry.checkedAt : Infinity, + }; + }); + } + + /** Force a re-probe on the next request. Useful in tests. */ + clearHealthCache(): void { + this.health.clear(); + } +} diff --git a/services/mana-geocoding/src/providers/nominatim.ts b/services/mana-geocoding/src/providers/nominatim.ts new file mode 100644 index 000000000..37e5edd25 --- /dev/null +++ b/services/mana-geocoding/src/providers/nominatim.ts @@ -0,0 +1,244 @@ +/** + * Nominatim provider — public OSM endpoint at nominatim.openstreetmap.org. + * + * Strict 1 req/sec limit per the OSMF usage policy. The provider takes a + * `RateLimiter` so a per-process Nominatim queue can be shared across + * search/reverse. A custom `User-Agent` is required (Nominatim returns + * 403 to default-UA fetches). + * + * Compared to Pelias/Photon, Nominatim returns a single flat array + * rather than GeoJSON. We adapt the shape and synthesize a confidence + * score from `importance`.
+ * + * https://nominatim.org/release-docs/develop/api/Search/ + * https://operations.osmfoundation.org/policies/nominatim/ + */ + +import { mapOsmTagToPlaceCategory } from '../lib/osm-category-map'; +import type { RateLimiter } from '../lib/rate-limiter'; +import type { + GeocodingProvider, + GeocodingResult, + ProviderResponse, + ReverseRequest, + SearchRequest, +} from './types'; + +export interface NominatimConfig { + apiUrl: string; + userAgent: string; + timeoutMs: number; +} + +export class NominatimProvider implements GeocodingProvider { + readonly name = 'nominatim' as const; + + constructor( + private readonly config: NominatimConfig, + private readonly limiter: RateLimiter + ) {} + + async search(req: SearchRequest, signal?: AbortSignal): Promise<ProviderResponse> { + const params = new URLSearchParams({ + q: req.q.trim(), + format: 'json', + addressdetails: '1', + limit: String(req.limit), + 'accept-language': req.lang, + }); + + try { + const json = await this.limiter.run( + () => this.fetchJson<NominatimSearchResult[]>(`/search?${params}`, signal), + signal + ); + if (!json.ok) { + return { + ok: false, + kind: json.status === 429 ? 'rate_limited' : 'unreachable', + status: json.status, + }; + } + return { ok: true, results: json.data.map(normalizeNominatimResult) }; + } catch (e) { + return { ok: false, kind: 'unreachable', error: errorMessage(e) }; + } + } + + async reverse(req: ReverseRequest, signal?: AbortSignal): Promise<ProviderResponse> { + const params = new URLSearchParams({ + lat: req.lat, + lon: req.lon, + format: 'json', + addressdetails: '1', + 'accept-language': req.lang, + }); + + try { + const json = await this.limiter.run( + () => this.fetchJson<NominatimSearchResult>(`/reverse?${params}`, signal), + signal + ); + if (!json.ok) { + return { + ok: false, + kind: json.status === 429 ? 'rate_limited' : 'unreachable', + status: json.status, + }; + } + // /reverse returns a single object rather than an array.
Nominatim + // also returns `{ error: 'Unable to geocode' }` with status 200 + // when no result was found — treat that as an empty success. + const single = json.data; + if (!single || (single as unknown as { error?: string }).error) { + return { ok: true, results: [] }; + } + return { ok: true, results: [normalizeNominatimResult(single)] }; + } catch (e) { + return { ok: false, kind: 'unreachable', error: errorMessage(e) }; + } + } + + async health(signal?: AbortSignal): Promise<boolean> { + try { + // Nominatim exposes /status as a no-rate-limit health page. + // Use a fresh fetch (don't go through the limiter) so a backed-up + // search queue doesn't make health checks artificially fail. + const res = await fetch(`${this.config.apiUrl}/status?format=json`, { + signal: combineSignals(signal, AbortSignal.timeout(this.config.timeoutMs)), + headers: { 'User-Agent': this.config.userAgent }, + }); + return res.ok; + } catch { + return false; + } + } + + private async fetchJson<T>( + path: string, + signal?: AbortSignal + ): Promise<{ ok: true; status: number; data: T } | { ok: false; status: number }> { + const res = await fetch(`${this.config.apiUrl}${path}`, { + signal: combineSignals(signal, AbortSignal.timeout(this.config.timeoutMs)), + headers: { 'User-Agent': this.config.userAgent }, + }); + if (!res.ok) return { ok: false, status: res.status }; + const data = (await res.json()) as T; + return { ok: true, status: res.status, data }; + } +} + +// --- Nominatim native types --- + +interface NominatimSearchResult { + place_id?: number; + osm_type?: string; + osm_id?: number; + lat: string; + lon: string; + display_name?: string; + /** OSM `class` (amenity, shop, …) */ + class?: string; + /** OSM `type` (restaurant, supermarket, …) */ + type?: string; + /** Top-level name when present (venue queries). For pure addresses Nominatim + * doesn't fill this; the normalizer then tries the venue keys under `address.*`.
*/ + name?: string; + importance?: number; + address?: { + road?: string; + house_number?: string; + postcode?: string; + city?: string; + town?: string; + village?: string; + hamlet?: string; + state?: string; + country?: string; + country_code?: string; + // Nominatim returns the venue name under one of these keys depending + // on the OSM class. We try them in order. + amenity?: string; + shop?: string; + tourism?: string; + leisure?: string; + building?: string; + }; +} + +export function normalizeNominatimResult(r: NominatimSearchResult): GeocodingResult { + const lat = parseFloat(r.lat); + const lon = parseFloat(r.lon); + const a = r.address ?? {}; + + // Nominatim's `display_name` is a comma-separated label that includes + // hierarchy noise (county, district, region) we don't want. Build our + // own from the structured fields when available; fall back to display_name. + const venueName = r.name || a.amenity || a.shop || a.tourism || a.leisure || ''; + const street = a.road; + const city = a.city || a.town || a.village || a.hamlet; + const label = buildNominatimLabel({ + venueName, + street, + houseNumber: a.house_number, + postalCode: a.postcode, + city, + country: a.country, + fallbackDisplayName: r.display_name, + }); + + return { + label, + name: venueName, + latitude: lat, + longitude: lon, + address: { + street, + houseNumber: a.house_number, + postalCode: a.postcode, + city, + state: a.state, + country: a.country, + }, + category: mapOsmTagToPlaceCategory(r.class, r.type), + confidence: typeof r.importance === 'number' ? 
r.importance : 0.5, + provider: 'nominatim', + }; +} + +interface LabelParts { + venueName: string; + street?: string; + houseNumber?: string; + postalCode?: string; + city?: string; + country?: string; + fallbackDisplayName?: string; +} + +function buildNominatimLabel(parts: LabelParts): string { + const streetLine = [parts.street, parts.houseNumber].filter(Boolean).join(' '); + const cityLine = [parts.postalCode, parts.city].filter(Boolean).join(' '); + const composed = [parts.venueName, streetLine, cityLine, parts.country] + .filter((part) => part && part.length > 0) + .join(', '); + return composed || parts.fallbackDisplayName || ''; +} + +function errorMessage(e: unknown): string { + return e instanceof Error ? e.message : String(e); +} + +function combineSignals(...signals: Array<AbortSignal | undefined>): AbortSignal { + const real = signals.filter((s): s is AbortSignal => !!s); + if (real.length === 1) return real[0]; + const ctrl = new AbortController(); + for (const s of real) { + if (s.aborted) { + ctrl.abort(s.reason); + break; + } + s.addEventListener('abort', () => ctrl.abort(s.reason), { once: true }); + } + return ctrl.signal; +} diff --git a/services/mana-geocoding/src/providers/pelias.ts b/services/mana-geocoding/src/providers/pelias.ts new file mode 100644 index 000000000..ad3c703c2 --- /dev/null +++ b/services/mana-geocoding/src/providers/pelias.ts @@ -0,0 +1,177 @@ +/** + * Pelias provider — primary backend, self-hosted with the DACH OSM index. + * + * Forward search uses /autocomplete first (fast venue match) and falls + * back to /search if autocomplete returns zero features (autocomplete + * deliberately excludes the address layer for perf).
+ */ + +import { mapPeliasToPlaceCategory } from '../lib/category-map'; +import type { + GeocodingProvider, + GeocodingResult, + ProviderResponse, + ReverseRequest, + SearchRequest, +} from './types'; + +export interface PeliasConfig { + apiUrl: string; + timeoutMs: number; +} + +export class PeliasProvider implements GeocodingProvider { + readonly name = 'pelias' as const; + + constructor(private readonly config: PeliasConfig) {} + + async search(req: SearchRequest, signal?: AbortSignal): Promise<ProviderResponse> { + const params = new URLSearchParams({ + text: req.q.trim(), + size: String(req.limit), + lang: req.lang, + }); + if (req.focusLat && req.focusLon) { + params.set('focus.point.lat', req.focusLat); + params.set('focus.point.lon', req.focusLon); + } + + // /autocomplete first (fast venue match), then /search if empty. + // Both attempts share one deadline signal, so a cumulatively slow + // Pelias still falls through to the next provider within timeoutMs + // instead of taking up to twice that. + const deadline = combineSignals(signal, AbortSignal.timeout(this.config.timeoutMs)); + try { + const ac = await this.fetch(`/autocomplete?${params}`, deadline); + if (!ac.ok) return { ok: false, kind: 'unreachable', status: ac.status }; + let features = ac.features; + + if (features.length === 0) { + const s = await this.fetch(`/search?${params}`, deadline); + if (s.ok) features = s.features; + // /search returning a non-OK after /autocomplete returned OK-but-empty + // is a clean zero-results answer, not a fall-through. We trust the + // successful autocomplete probe.
+ } + + return { ok: true, results: features.map(normalizePeliasFeature) }; + } catch (e) { + return { ok: false, kind: 'unreachable', error: errorMessage(e) }; + } + } + + async reverse(req: ReverseRequest, signal?: AbortSignal): Promise<ProviderResponse> { + const params = new URLSearchParams({ + 'point.lat': req.lat, + 'point.lon': req.lon, + size: '3', + lang: req.lang, + }); + + try { + const r = await this.fetch(`/reverse?${params}`, signal); + if (!r.ok) return { ok: false, kind: 'unreachable', status: r.status }; + return { ok: true, results: r.features.map(normalizePeliasFeature) }; + } catch (e) { + return { ok: false, kind: 'unreachable', error: errorMessage(e) }; + } + } + + async health(signal?: AbortSignal): Promise<boolean> { + try { + const url = `${this.config.apiUrl}/status`; + const res = await fetch(url, { + signal: combineSignals(signal, AbortSignal.timeout(this.config.timeoutMs)), + }); + // /v1/status doesn't exist on every Pelias version — a 404 still + // means the server is up. Anything else (5xx, ECONNREFUSED, timeout) + // is unhealthy. + return res.ok || res.status === 404; + } catch { + return false; + } + } + + private async fetch( + path: string, + signal?: AbortSignal + ): Promise<{ ok: boolean; status: number; features: PeliasFeature[] }> { + const res = await fetch(`${this.config.apiUrl}${path}`, { + signal: combineSignals(signal, AbortSignal.timeout(this.config.timeoutMs)), + }); + if (!res.ok) return { ok: false, status: res.status, features: [] }; + const data = (await res.json()) as PeliasResponse; + return { ok: true, status: res.status, features: data.features ??
[] }; + } +} + +// --- Pelias native types --- + +interface PeliasResponse { + type: 'FeatureCollection'; + features: PeliasFeature[]; +} + +interface PeliasFeature { + type: 'Feature'; + geometry: { + type: 'Point'; + coordinates: [number, number]; // [lon, lat] + }; + properties: { + id?: string; + name?: string; + label?: string; + confidence?: number; + layer?: string; + street?: string; + housenumber?: string; + postalcode?: string; + locality?: string; + region?: string; + country?: string; + category?: string[]; + }; +} + +export function normalizePeliasFeature(feature: PeliasFeature): GeocodingResult { + const props = feature.properties; + const [lon, lat] = feature.geometry.coordinates; + + return { + label: props.label || props.name || '', + name: props.name || '', + latitude: lat, + longitude: lon, + address: { + street: props.street, + houseNumber: props.housenumber, + postalCode: props.postalcode, + city: props.locality, + state: props.region, + country: props.country, + }, + category: mapPeliasToPlaceCategory(props.category, props.layer), + peliasCategories: props.category, + confidence: props.confidence ?? 0, + provider: 'pelias', + }; +} + +function errorMessage(e: unknown): string { + return e instanceof Error ? e.message : String(e); +} + +/** Combine an external AbortSignal with our own timeout signal. AbortSignal.any + * exists in Bun but TS typing is patchy across runtimes — small helper. 
*/ +function combineSignals(...signals: Array<AbortSignal | undefined>): AbortSignal { + const real = signals.filter((s): s is AbortSignal => !!s); + if (real.length === 1) return real[0]; + const ctrl = new AbortController(); + for (const s of real) { + if (s.aborted) { + ctrl.abort(s.reason); + break; + } + s.addEventListener('abort', () => ctrl.abort(s.reason), { once: true }); + } + return ctrl.signal; +} diff --git a/services/mana-geocoding/src/providers/photon.ts b/services/mana-geocoding/src/providers/photon.ts new file mode 100644 index 000000000..c0d4be113 --- /dev/null +++ b/services/mana-geocoding/src/providers/photon.ts @@ -0,0 +1,207 @@ +/** + * Photon provider — komoot's public photon.komoot.io. + * + * Photon is built on top of an OSM index (Elasticsearch + Nominatim + * importer). The HTTP shape is GeoJSON FeatureCollection with `properties` + * holding `osm_key`/`osm_value` raw OSM tags + structured address fields. + * + * Compared to Pelias: + * + No rate limit advertised, but be a polite neighbor: short timeouts, + * no retries, cache aggressively. + * + Reverse geocoding takes bare `lat`/`lon` query params, not Pelias's + * `point.lat`/`point.lon`. Easy to mix up if not careful. + * - No `confidence` field. We approximate from `importance` (0–1) when + * present, else 0.5 as a neutral default. + * - No DACH-specific tuning — German venue names sometimes lose umlauts + * in display labels. Acceptable for a fallback.
*/ + +import { mapOsmTagToPlaceCategory } from '../lib/osm-category-map'; +import type { + GeocodingProvider, + GeocodingResult, + ProviderResponse, + ReverseRequest, + SearchRequest, +} from './types'; + +export interface PhotonConfig { + apiUrl: string; + timeoutMs: number; +} + +export class PhotonProvider implements GeocodingProvider { + readonly name = 'photon' as const; + + constructor(private readonly config: PhotonConfig) {} + + async search(req: SearchRequest, signal?: AbortSignal): Promise<ProviderResponse> { + const params = new URLSearchParams({ + q: req.q.trim(), + limit: String(req.limit), + lang: req.lang, + }); + if (req.focusLat && req.focusLon) { + params.set('lat', req.focusLat); + params.set('lon', req.focusLon); + } + + try { + const res = await this.fetch(`/api?${params}`, signal); + if (!res.ok) { + return { + ok: false, + kind: res.status === 429 ? 'rate_limited' : 'unreachable', + status: res.status, + }; + } + return { ok: true, results: res.features.map(normalizePhotonFeature) }; + } catch (e) { + return { ok: false, kind: 'unreachable', error: errorMessage(e) }; + } + } + + async reverse(req: ReverseRequest, signal?: AbortSignal): Promise<ProviderResponse> { + // Photon takes bare lat/lon params, not point.lat/point.lon. Easy footgun. + const params = new URLSearchParams({ + lat: req.lat, + lon: req.lon, + lang: req.lang, + }); + + try { + const res = await this.fetch(`/reverse?${params}`, signal); + if (!res.ok) { + return { + ok: false, + kind: res.status === 429 ? 'rate_limited' : 'unreachable', + status: res.status, + }; + } + return { ok: true, results: res.features.map(normalizePhotonFeature) }; + } catch (e) { + return { ok: false, kind: 'unreachable', error: errorMessage(e) }; + } + } + + async health(signal?: AbortSignal): Promise<boolean> { + try { + // Tiny probe: search for Konstanz, a place name Photon should + // always know. We don't care about the content, only the HTTP + // status. 200/empty is fine; anything else marks unhealthy.
+ const res = await fetch(`${this.config.apiUrl}/api?q=Konstanz&limit=1`, { + signal: combineSignals(signal, AbortSignal.timeout(this.config.timeoutMs)), + }); + return res.ok; + } catch { + return false; + } + } + + private async fetch( + path: string, + signal?: AbortSignal + ): Promise<{ ok: boolean; status: number; features: PhotonFeature[] }> { + const res = await fetch(`${this.config.apiUrl}${path}`, { + signal: combineSignals(signal, AbortSignal.timeout(this.config.timeoutMs)), + }); + if (!res.ok) return { ok: false, status: res.status, features: [] }; + const data = (await res.json()) as PhotonResponse; + return { ok: true, status: res.status, features: data.features ?? [] }; + } +} + +// --- Photon native types --- + +interface PhotonResponse { + type: 'FeatureCollection'; + features: PhotonFeature[]; +} + +interface PhotonFeature { + type: 'Feature'; + geometry: { + type: 'Point'; + coordinates: [number, number]; // [lon, lat] + }; + properties: { + osm_id?: number; + osm_type?: string; // N | W | R + osm_key?: string; // amenity, shop, … + osm_value?: string; // restaurant, supermarket, … + name?: string; + country?: string; + state?: string; + county?: string; + city?: string; + district?: string; + street?: string; + housenumber?: string; + postcode?: string; + extent?: [number, number, number, number]; + /** 0–1 importance score (Nominatim's importance, propagated by Photon). */ + importance?: number; + /** Used by /reverse to summarise the match — not always populated. 
*/ + type?: string; + }; +} + +export function normalizePhotonFeature(f: PhotonFeature): GeocodingResult { + const props = f.properties; + const [lon, lat] = f.geometry.coordinates; + + const label = buildPhotonLabel(props); + const category = mapOsmTagToPlaceCategory(props.osm_key, props.osm_value); + + return { + label, + name: props.name || '', + latitude: lat, + longitude: lon, + address: { + street: props.street, + houseNumber: props.housenumber, + postalCode: props.postcode, + city: props.city || props.district || props.county, + state: props.state, + country: props.country, + }, + category, + // peliasCategories deliberately omitted — Photon has osm_key:osm_value + // but the consumer side keys off the absence of this field as a + // "result came from a fallback" signal. + confidence: typeof props.importance === 'number' ? props.importance : 0.5, + provider: 'photon', + }; +} + +/** Photon doesn't return a single `display_name` like Nominatim — we + * build one from the structured fields. Order matches a typical German + * postal address: "Name, Straße Nr, PLZ Ort, Land". */ +function buildPhotonLabel(props: PhotonFeature['properties']): string { + const streetLine = [props.street, props.housenumber].filter(Boolean).join(' '); + const cityLine = [props.postcode, props.city || props.district || props.county] + .filter(Boolean) + .join(' '); + return [props.name, streetLine, cityLine, props.country] + .filter((part) => part && part.length > 0) + .join(', '); +} + +function errorMessage(e: unknown): string { + return e instanceof Error ? 
e.message : String(e); +} + +function combineSignals(...signals: Array<AbortSignal | undefined>): AbortSignal { + const real = signals.filter((s): s is AbortSignal => !!s); + if (real.length === 1) return real[0]; + const ctrl = new AbortController(); + for (const s of real) { + if (s.aborted) { + ctrl.abort(s.reason); + break; + } + s.addEventListener('abort', () => ctrl.abort(s.reason), { once: true }); + } + return ctrl.signal; +} diff --git a/services/mana-geocoding/src/providers/types.ts b/services/mana-geocoding/src/providers/types.ts new file mode 100644 index 000000000..89d343b55 --- /dev/null +++ b/services/mana-geocoding/src/providers/types.ts @@ -0,0 +1,84 @@ +/** + * Provider-chain types — shared interface every geocoding backend speaks. + * + * The chain (`./chain.ts`) iterates over registered providers in priority + * order until one returns a non-failure result. Each provider must + * normalize its native response into the shared `GeocodingResult` shape so + * the rest of the wrapper (cache, routes, clients) doesn't care which + * backend served the request. + */ + +import type { PlaceCategory } from '../lib/category-map'; + +/** Normalized result returned to the client. */ +export interface GeocodingResult { + /** Display name (e.g. "Münster Café, Münsterplatz 3, Konstanz") */ + label: string; + /** Short name (e.g. "Münster Café") */ + name: string; + latitude: number; + longitude: number; + /** Structured address components */ + address: { + street?: string; + houseNumber?: string; + postalCode?: string; + city?: string; + state?: string; + country?: string; + }; + /** Our Places category, derived from the provider's native taxonomy. */ + category: PlaceCategory; + /** Raw Pelias categories (food, retail, transport, …) — only present + * when the result came from Pelias. Photon/Nominatim don't have an + * equivalent multi-tag taxonomy. */ + peliasCategories?: string[]; + /** Confidence score 0–1.
Pelias provides this natively; Photon/Nominatim + * approximate it from `importance`. */ + confidence: number; + /** Which provider answered — useful for telemetry + UI hints + * ("approximate match" badge for fallback providers). */ + provider: ProviderName; +} + +export type ProviderName = 'pelias' | 'photon' | 'nominatim'; + +export interface SearchRequest { + q: string; + limit: number; + lang: string; + focusLat?: string; + focusLon?: string; +} + +export interface ReverseRequest { + lat: string; + lon: string; + lang: string; +} + +/** + * A provider answers one of three ways: + * - `{ ok: true, results }` — backend reachable, returned its best guess + * (which may be `[]` if no match was found — a clean zero is still a + * successful answer, not a fallthrough trigger) + * - `{ ok: false, kind: 'unreachable' }` — network / 5xx / timeout + * - `{ ok: false, kind: 'rate_limited' }` — 429 from public APIs + * + * The chain falls through on `ok: false` only. An empty `results` array + * stops the chain — otherwise an obscure address that legitimately doesn't + * match would needlessly hit every public API down the list. + */ +export type ProviderResponse = + | { ok: true; results: GeocodingResult[] } + | { ok: false; kind: 'unreachable' | 'rate_limited'; status?: number; error?: string }; + +export interface GeocodingProvider { + readonly name: ProviderName; + search(req: SearchRequest, signal?: AbortSignal): Promise<ProviderResponse>; + reverse(req: ReverseRequest, signal?: AbortSignal): Promise<ProviderResponse>; + /** Cheap probe — `true` means the backend is reachable right now. + * The chain caches this result for `healthCacheMs` so we don't add a + * per-request RTT to every search.
*/ + health(signal?: AbortSignal): Promise<boolean>; +} diff --git a/services/mana-geocoding/src/routes/geocode.ts b/services/mana-geocoding/src/routes/geocode.ts index 8777f3ced..5fb4acead 100644 --- a/services/mana-geocoding/src/routes/geocode.ts +++ b/services/mana-geocoding/src/routes/geocode.ts @@ -1,46 +1,27 @@ /** - * Geocoding routes — thin proxy to Pelias with caching and - * OSM category mapping. + * Geocoding routes — thin proxy to the provider chain with caching. * * Endpoints: * GET /api/v1/geocode/search?q=...&limit=5 — forward (autocomplete) * GET /api/v1/geocode/reverse?lat=...&lon=... — reverse + * GET /api/v1/geocode/stats — cache + provider stats */ import { Hono } from 'hono'; import type { Config } from '../config'; import { LRUCache } from '../lib/cache'; -import { mapPeliasToPlaceCategory, type PlaceCategory } from '../lib/category-map'; +import type { ProviderChain } from '../providers/chain'; +import type { GeocodingResult, ProviderName } from '../providers/types'; -/** Normalized result returned to the client */ -export interface GeocodingResult { - /** Display name (e.g. "Münster Café, Münsterplatz 3, Konstanz") */ - label: string; - /** Short name (e.g.
"Münster Café") */ - name: string; - latitude: number; - longitude: number; - /** Structured address components */ - address: { - street?: string; - houseNumber?: string; - postalCode?: string; - city?: string; - state?: string; - country?: string; - }; - /** Our Places category, derived from Pelias taxonomy */ - category: PlaceCategory; - /** Raw Pelias categories (food, retail, transport, …) */ - peliasCategories?: string[]; - /** Pelias confidence score 0-1 */ - confidence: number; +interface CachedAnswer { + results: GeocodingResult[]; + provider: ProviderName | undefined; } -export function createGeocodeRoutes(config: Config) { +export function createGeocodeRoutes(config: Config, chain: ProviderChain) { const app = new Hono(); - const searchCache = new LRUCache(config.cache.maxEntries, config.cache.ttlMs); - const reverseCache = new LRUCache(config.cache.maxEntries, config.cache.ttlMs); + const searchCache = new LRUCache(config.cache.maxEntries, config.cache.ttlMs); + const reverseCache = new LRUCache(config.cache.maxEntries, config.cache.ttlMs); /** * Forward geocoding / autocomplete @@ -60,52 +41,24 @@ export function createGeocodeRoutes(config: Config) { const cacheKey = `${q}|${limit}|${lang}|${focusLat}|${focusLon}`; const cached = searchCache.get(cacheKey); if (cached) { - return c.json({ results: cached, cached: true }); + return c.json({ + results: cached.results, + cached: true, + provider: cached.provider, + }); } - // Note: we don't set boundary.country — the Pelias index only - // contains DACH data, so everything is implicitly DE/AT/CH. 
- const params = new URLSearchParams({ - text: q.trim(), - size: String(limit), - lang, + const response = await chain.search({ q, limit, lang, focusLat, focusLon }); + if (!response.ok) { + return c.json({ results: [], error: 'geocoding_unavailable', tried: response.tried }, 502); + } + + searchCache.set(cacheKey, { results: response.results, provider: response.provider }); + return c.json({ + results: response.results, + provider: response.provider, + tried: response.tried, }); - - // Bias results towards a focus point (user's current location) - if (focusLat && focusLon) { - params.set('focus.point.lat', focusLat); - params.set('focus.point.lon', focusLon); - } - - // Query Pelias /autocomplete first (fast, fuzzy, good for venue names). - // Autocomplete intentionally excludes the address layer as a perf - // optimization, so if it returns nothing we fall back to /search which - // covers streets/addresses too. This gives us the best of both worlds: - // quick venue matches for names like "Konzil Restaurant" AND reliable - // address matches for queries like "Marktstätte Konstanz". 
- let features: PeliasFeature[] = []; - const autocompleteRes = await fetch(`${config.pelias.apiUrl}/autocomplete?${params}`); - if (autocompleteRes.ok) { - const data = (await autocompleteRes.json()) as PeliasResponse; - features = data.features; - } - - if (features.length === 0) { - const searchRes = await fetch(`${config.pelias.apiUrl}/search?${params}`); - if (searchRes.ok) { - const data = (await searchRes.json()) as PeliasResponse; - features = data.features; - } else if (!autocompleteRes.ok) { - console.error( - `Pelias error: autocomplete=${autocompleteRes.status} search=${searchRes.status}` - ); - return c.json({ results: [], error: 'geocoding_unavailable' }, 502); - } - } - - const results = features.map(normalizePeliasFeature); - searchCache.set(cacheKey, results); - return c.json({ results }); }); /** @@ -128,91 +81,37 @@ export function createGeocodeRoutes(config: Config) { const cached = reverseCache.get(cacheKey); if (cached) { - return c.json({ results: cached, cached: true }); + return c.json({ + results: cached.results, + cached: true, + provider: cached.provider, + }); } - const params = new URLSearchParams({ - 'point.lat': roundedLat, - 'point.lon': roundedLon, - size: '3', - lang, - }); - - const response = await fetch(`${config.pelias.apiUrl}/reverse?${params}`); + const response = await chain.reverse({ lat: roundedLat, lon: roundedLon, lang }); if (!response.ok) { - console.error(`Pelias reverse error: ${response.status} ${response.statusText}`); - return c.json({ results: [], error: 'geocoding_unavailable' }, 502); + return c.json({ results: [], error: 'geocoding_unavailable', tried: response.tried }, 502); } - const data = (await response.json()) as PeliasResponse; - const results = data.features.map(normalizePeliasFeature); - - reverseCache.set(cacheKey, results); - return c.json({ results }); + reverseCache.set(cacheKey, { results: response.results, provider: response.provider }); + return c.json({ + results: response.results, + 
provider: response.provider, + tried: response.tried, + }); }); /** - * Cache stats (for monitoring) + * Cache + provider stats (for monitoring + manual debug). * GET /stats */ app.get('/stats', (c) => { return c.json({ searchCacheSize: searchCache.size, reverseCacheSize: reverseCache.size, + providers: chain.getHealthSnapshot(), }); }); return app; } - -// --- Pelias response types --- - -interface PeliasResponse { - type: 'FeatureCollection'; - features: PeliasFeature[]; -} - -interface PeliasFeature { - type: 'Feature'; - geometry: { - type: 'Point'; - coordinates: [number, number]; // [lon, lat] - }; - properties: { - id?: string; - name?: string; - label?: string; - confidence?: number; - layer?: string; - street?: string; - housenumber?: string; - postalcode?: string; - locality?: string; - region?: string; - country?: string; - category?: string[]; - }; -} - -function normalizePeliasFeature(feature: PeliasFeature): GeocodingResult { - const props = feature.properties; - const [lon, lat] = feature.geometry.coordinates; - - return { - label: props.label || props.name || '', - name: props.name || '', - latitude: lat, - longitude: lon, - address: { - street: props.street, - houseNumber: props.housenumber, - postalCode: props.postalcode, - city: props.locality, - state: props.region, - country: props.country, - }, - category: mapPeliasToPlaceCategory(props.category, props.layer), - peliasCategories: props.category, - confidence: props.confidence ?? 
0, - }; -} diff --git a/services/mana-geocoding/src/routes/health.ts b/services/mana-geocoding/src/routes/health.ts index 5ea71c50d..26ec9cc7d 100644 --- a/services/mana-geocoding/src/routes/health.ts +++ b/services/mana-geocoding/src/routes/health.ts @@ -1,33 +1,65 @@ import { Hono } from 'hono'; import type { Config } from '../config'; +import type { ProviderChain } from '../providers/chain'; -export function createHealthRoutes(config: Config) { +export function createHealthRoutes(config: Config, chain: ProviderChain) { const app = new Hono(); /** Wrapper health — is our Hono server up? */ app.get('/', (c) => c.json({ status: 'ok', service: 'mana-geocoding' })); /** - * Upstream Pelias health. Proxies a request to the Pelias API and - * Elasticsearch cluster health so monitoring can reach them without - * needing `extra_hosts: host.docker.internal` on the blackbox exporter. + * Upstream Pelias health. Proxies a request to the Pelias API so + * monitoring can reach it without `extra_hosts: host.docker.internal` + * on the blackbox exporter. + * + * Backwards-compatible: existing prometheus probes against this + * endpoint keep working. Now reports `degraded` (200) instead of `down` + * (503) when Pelias is unreachable but a fallback provider is healthy + * — the system can still serve queries, just slower / less precise. */ app.get('/pelias', async (c) => { try { - // Pelias API responds to /v1/status with a JSON error for unknown - // path but a 200 means the server is alive. Any other response code - // or a timeout means Pelias is unreachable. const res = await fetch(`${config.pelias.apiUrl}/status`, { signal: AbortSignal.timeout(5000), }); if (!res.ok && res.status !== 404) { - return c.json({ status: 'degraded', upstream: res.status }, 503); + return c.json( + { status: 'degraded', upstream: res.status, fallbackAvailable: chainHasFallback(chain) }, + chainHasFallback(chain) ? 
200 : 503 + ); } return c.json({ status: 'ok', upstream: 'pelias-api' }); } catch (e) { - return c.json({ status: 'down', error: e instanceof Error ? e.message : 'unknown' }, 503); + return c.json( + { + status: chainHasFallback(chain) ? 'degraded' : 'down', + error: e instanceof Error ? e.message : 'unknown', + fallbackAvailable: chainHasFallback(chain), + }, + chainHasFallback(chain) ? 200 : 503 + ); } }); + /** + * Provider-chain status — per-provider health snapshot. + * GET /providers + */ + app.get('/providers', (c) => { + return c.json({ + providers: chain.getHealthSnapshot(), + }); + }); + return app; } + +/** + * Check if any non-Pelias provider is currently believed healthy. Used + * to soften /pelias health to "degraded" instead of "down" when a + * fallback can still serve traffic. + */ +function chainHasFallback(chain: ProviderChain): boolean { + return chain.getHealthSnapshot().some((p) => p.name !== 'pelias' && p.healthy); +}