mirror of
https://github.com/Memo-2023/mana-monorepo.git
synced 2026-05-17 12:09:41 +02:00
Goroutine-based crawler replacing NestJS mana-crawler:
- goquery for HTML parsing (title, content, links, metadata)
- robots.txt checker with 24h cache
- Worker pool with configurable concurrency + rate limiting
- PostgreSQL for job/result storage
- Same API surface: POST/GET/DELETE /api/v1/crawl

11 MB binary, ~15 MB Docker image vs ~200 MB NestJS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
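The worker pool with configurable concurrency and rate limiting mentioned in the commit message could look roughly like the sketch below. It is not the repository's actual code: the names `crawlAll`, `workers`, `perSecond`, and `fetch` are hypothetical, and `fetch` stands in for the real HTTP request and goquery parsing step. Only the standard library is used.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// crawlAll processes every URL using a fixed number of worker goroutines.
// perSecond caps the combined request rate of all workers and must be > 0.
// fetch is a hypothetical placeholder for the real download + parse step.
func crawlAll(urls []string, workers, perSecond int, fetch func(string) string) map[string]string {
	jobs := make(chan string)
	results := make(map[string]string, len(urls))
	var mu sync.Mutex
	var wg sync.WaitGroup

	// A single shared ticker serves as a simple global rate limiter:
	// each job has to receive one tick before it may run.
	ticker := time.NewTicker(time.Second / time.Duration(perSecond))
	defer ticker.Stop()

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				<-ticker.C // wait for the next rate-limit slot
				body := fetch(u)
				mu.Lock()
				results[u] = body
				mu.Unlock()
			}
		}()
	}

	// Feed the workers, then signal that no more jobs will arrive.
	for _, u := range urls {
		jobs <- u
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	pages := crawlAll(
		[]string{"https://example.com/a", "https://example.com/b"},
		2,  // concurrent workers
		10, // at most about 10 requests per second overall
		func(u string) string { return "fetched " + u },
	)
	fmt.Println(len(pages))
}
```

A shared ticker keeps the sketch short. A production crawler would more likely limit the rate per host, so that one slow or strict site does not throttle the others.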
20 lines
555 B
Modula-2
module github.com/manacore/mana-crawler

go 1.25.0

require (
	github.com/PuerkitoBio/goquery v1.12.0
	github.com/jackc/pgx/v5 v5.9.1
	github.com/rs/cors v1.11.1
	github.com/temoto/robotstxt v1.1.2
)

require (
	github.com/andybalholm/cascadia v1.3.3 // indirect
	github.com/jackc/pgpassfile v1.0.0 // indirect
	github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761 // indirect
	github.com/jackc/puddle/v2 v2.2.2 // indirect
	golang.org/x/net v0.52.0 // indirect
	golang.org/x/sync v0.20.0 // indirect
	golang.org/x/text v0.35.0 // indirect
)
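
The module depends on github.com/temoto/robotstxt, and the commit message mentions a robots.txt checker with a 24h cache. A minimal sketch of such a time-limited cache follows, assuming one entry per host. The type `robotsCache`, its fields, and the `load` callback are hypothetical and not taken from the repository. Parsing is left out so that the example needs only the standard library.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cacheEntry holds one host's robots.txt body and the time it was loaded.
type cacheEntry struct {
	body    string
	fetched time.Time
}

// robotsCache is a hypothetical per-host cache with a fixed time-to-live.
type robotsCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	load    func(host string) string // placeholder for the real HTTP fetch
	entries map[string]cacheEntry
}

func newRobotsCache(ttl time.Duration, load func(string) string) *robotsCache {
	return &robotsCache{ttl: ttl, load: load, entries: make(map[string]cacheEntry)}
}

// get returns the cached body while it is younger than ttl and
// reloads it through load otherwise.
func (c *robotsCache) get(host string) string {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[host]; ok && time.Since(e.fetched) < c.ttl {
		return e.body
	}
	body := c.load(host)
	c.entries[host] = cacheEntry{body: body, fetched: time.Now()}
	return body
}

func main() {
	loads := 0
	cache := newRobotsCache(24*time.Hour, func(host string) string {
		loads++
		return "User-agent: *\nDisallow: /private"
	})
	cache.get("example.com")
	cache.get("example.com") // served from the cache, no second load
	fmt.Println(loads)
}
```

Holding the mutex during `load` keeps the sketch simple but serializes fetches for different hosts. A real implementation would probably lock per host or deduplicate in-flight requests.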