Writing web scrapers in Go with the Colly framework
Colly is a web scraping framework for the Go programming language. Its feature set largely overlaps with that of the Scrapy framework from the Python ecosystem:
- Built-in concurrency.
- Cookie handling.
- Caching of HTTP response data.
- Automatic heeding of robots.txt rules.
- Automatic throttling of outgoing traffic.
Furthermore, Colly supports distributed scraping out of the box through a Redis-based task queue, and it can be integrated with Google App Engine. This makes it a viable choice for large-scale web scraping projects.