managarten/services/mana-crawler-go/CLAUDE.md
Till JS 64f7f768eb feat(infra): add Go web crawler (mana-crawler-go)
Goroutine-based crawler replacing NestJS mana-crawler:
- goquery for HTML parsing (title, content, links, metadata)
- robots.txt checker with 24h cache
- Worker pool with configurable concurrency + rate limiting
- PostgreSQL for job/result storage
- Same API surface: POST/GET/DELETE /api/v1/crawl

11 MB binary, ~15 MB Docker image vs ~200 MB NestJS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 22:10:45 +01:00

mana-crawler (Go)

Go web crawler replacing the NestJS mana-crawler. It uses a goroutine-based worker pool instead of BullMQ.

Architecture

  • Language: Go 1.25
  • HTML Parsing: goquery (jQuery-like selectors)
  • Robots.txt: temoto/robotstxt with 24h cache
  • Job Queue: Goroutine worker pool + channels (replaces BullMQ)
  • Database: PostgreSQL (pgx v5)
  • Port: 3023

Endpoints

  • POST /api/v1/crawl — Start crawl job
  • GET /api/v1/crawl — List jobs
  • GET /api/v1/crawl/{jobId} — Job status
  • GET /api/v1/crawl/{jobId}/results — Paginated results
  • DELETE /api/v1/crawl/{jobId} — Cancel job
  • GET /health — Health check
  • GET /metrics — Prometheus metrics

Commands

go run ./cmd/server    # Dev
go build ./cmd/server  # Build
go test ./...          # Test