Till JS 7e931b1c6d refactor(services): rename Go services, remove -go suffix
mana-search-go → mana-search
mana-notify-go → mana-notify
mana-crawler-go → mana-crawler
mana-api-gateway-go → mana-api-gateway

Legacy NestJS versions are deleted, suffix no longer needed.
Updated all references in docker-compose, CLAUDE.md, package.json,
Forgejo workflows, and service package.json files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 10:18:40 +01:00


# mana-crawler (Go)
Go web crawler that replaces the legacy NestJS mana-crawler. Jobs run on a goroutine-based worker pool instead of BullMQ.
## Architecture
- **Language:** Go 1.25
- **HTML Parsing:** goquery (jQuery-like selectors)
- **Robots.txt:** temoto/robotstxt with 24h cache
- **Job Queue:** Goroutine worker pool + channels (replaces BullMQ)
- **Database:** PostgreSQL (pgx v5)
- **Port:** 3023
## Endpoints
- `POST /api/v1/crawl` — Start crawl job
- `GET /api/v1/crawl` — List jobs
- `GET /api/v1/crawl/{jobId}` — Job status
- `GET /api/v1/crawl/{jobId}/results` — Paginated results
- `DELETE /api/v1/crawl/{jobId}` — Cancel job
- `GET /health` — Health check
- `GET /metrics` — Prometheus metrics
## Commands
```bash
go run ./cmd/server # Dev
go build ./cmd/server # Build
go test ./... # Test
```