Plurrrr

Sun 10 Apr 2022

Writing web scrapers in Go with Colly framework

Colly is a web scraping framework for the Go programming language. Its feature set largely overlaps with that of the Scrapy framework from the Python ecosystem:

  • Built-in concurrency.
  • Cookie handling.
  • Caching of HTTP response data.
  • Automatic heeding of robots.txt rules.
  • Automatic throttling of outgoing traffic.

Furthermore, Colly supports distributed scraping out of the box through a Redis-based task queue and can be integrated with Google App Engine, which makes it a viable choice for large-scale web scraping projects.

Source: Writing web scrapers in Go with Colly framework.

Singleton is a bad idea

Design patterns are a great way to think about interactions among classes. But the classic Singleton pattern is bad: you shouldn’t use it and there are better options.

The classic Singleton pattern is a class that always gives you the same object when you create an instance of it. It’s used to ensure that all users of a class are using the same object.
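
As a rough illustration of the pattern described above, here is a minimal Python sketch. The names Settings, PlainSettings, and settings are hypothetical and do not come from the article. The second half shows one commonly suggested alternative, an ordinary class with a single shared module-level instance; it is included as an assumption about what a "better option" might look like, not as the article's own recommendation.

```python
class Settings:
    """Classic Singleton: every call to Settings() returns the same object."""

    _instance = None

    def __new__(cls):
        # Create the object only on the first call, then keep reusing it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.values = {}
        return cls._instance


a = Settings()
b = Settings()
a.values["debug"] = True
same_object = a is b          # both names refer to one object
shared_values = b.values      # state set through `a` is visible through `b`


# A possible alternative (an assumption, for illustration only):
# an ordinary class plus one shared instance that callers import and use.
class PlainSettings:
    def __init__(self):
        self.values = {}


settings = PlainSettings()    # the shared instance most code would use
scratch = PlainSettings()     # tests can still create independent instances
```

With the plain class, "calling the class" means what it usually does, creating a new object, while code that wants the shared state simply uses the one named instance.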

Source: Singleton is a bad idea, an article by Ned Batchelder.

A basic introduction to NumPy's einsum

The einsum function is one of NumPy’s jewels. It can often outperform familiar array functions in terms of speed and memory efficiency, thanks to its expressive power and smart loops. On the downside, it can take a little while to understand the notation and sometimes a few attempts to apply it correctly to a tricky problem.
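
To give a feel for the notation, here is a small sketch using np.einsum. The arrays A and B are made up for illustration. Each letter in the subscript string labels an axis: a letter repeated across inputs is multiplied along, and a letter missing from the output (after "->") is summed over.

```python
import numpy as np

# Example arrays, chosen only for illustration.
A = np.arange(6).reshape(2, 3)    # shape (2, 3)
B = np.arange(12).reshape(3, 4)   # shape (3, 4)

# "ij,jk->ik": multiply along the shared axis j and sum it out,
# which is ordinary matrix multiplication.
matmul = np.einsum("ij,jk->ik", A, B)

# "ij->i": j is absent from the output, so each row is summed.
row_sums = np.einsum("ij->i", A)

# "ij->ji": no axis is dropped, the axes are only reordered (transpose).
transposed = np.einsum("ij->ji", A)

# "ii->": a repeated letter on one input picks the diagonal;
# the empty output sums it, giving the trace.
trace = np.einsum("ii->", np.eye(3))
```

The same results are available from A @ B, A.sum(axis=1), A.T, and np.trace; the appeal of einsum is that one notation covers all of them.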

Source: A basic introduction to NumPy's einsum, an article by Alex Riley.