managarten/services/mana-crawler-go/go.mod
Till JS 64f7f768eb feat(infra): add Go web crawler (mana-crawler-go)
Goroutine-based crawler replacing NestJS mana-crawler:
- goquery for HTML parsing (title, content, links, metadata)
- robots.txt checker with 24h cache
- Worker pool with configurable concurrency + rate limiting
- PostgreSQL for job/result storage
- Same API surface: POST/GET/DELETE /api/v1/crawl

11 MB binary, ~15 MB Docker image vs ~200 MB NestJS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 22:10:45 +01:00


module github.com/manacore/mana-crawler

go 1.25.0

require (
	github.com/PuerkitoBio/goquery v1.12.0
	github.com/jackc/pgx/v5 v5.9.1
	github.com/rs/cors v1.11.1
	github.com/temoto/robotstxt v1.1.2
)

require (
	github.com/andybalholm/cascadia v1.3.3 // indirect
	github.com/jackc/pgpassfile v1.0.0 // indirect
	github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761 // indirect
	github.com/jackc/puddle/v2 v2.2.2 // indirect
	golang.org/x/net v0.52.0 // indirect
	golang.org/x/sync v0.20.0 // indirect
	golang.org/x/text v0.35.0 // indirect
)