Plurrrr

week 14, 2022

Writing web scrapers in Go with Colly framework

Colly is a web scraping framework for the Go programming language. The feature set of Colly largely overlaps with that of the Scrapy framework from the Python ecosystem:

  • Built-in concurrency.
  • Cookie handling.
  • Caching of HTTP response data.
  • Automatic heeding of robots.txt rules.
  • Automatic throttling of outgoing traffic.

Furthermore, Colly supports distributed scraping out-of-the-box through a Redis-based task queue and can be integrated with Google App Engine. This makes it a viable choice for large-scale web scraping projects.

Source: Writing web scrapers in Go with Colly framework.

Singleton is a bad idea

Design patterns are a great way to think about interactions among classes. But the classic Singleton pattern is bad: you shouldn’t use it and there are better options.

The classic Singleton pattern is a class which always gives you the same object when you create an instance of the class. It’s used to ensure that all users of a class are using the same object.
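To make that description concrete, here is a minimal Python sketch of the classic pattern. The Settings class and its values attribute are hypothetical names picked for illustration; they are not taken from the article.

```python
class Settings:
    """Classic Singleton: every instantiation yields the same object."""

    _instance = None  # the single shared instance, created on first use

    def __new__(cls):
        # Create the object only once; later calls return the cached one.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.values = {}
        return cls._instance


first = Settings()
second = Settings()
first.values["debug"] = True

# Both names refer to one object, so state set via one is seen via the other.
print(first is second)
print(second.values)
```

Note that Settings() looks like it creates a fresh object but never does after the first call, which is the kind of surprise the pattern brings with it.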

Source: Singleton is a bad idea, an article by Ned Batchelder.

A basic introduction to NumPy's einsum

The einsum function is one of NumPy’s jewels. It can often outperform familiar array functions in terms of speed and memory efficiency, thanks to its expressive power and smart loops. On the downside, it can take a little while to understand the notation and sometimes a few attempts to apply it correctly to a tricky problem.
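As a small taste of the notation, here is a sketch that assumes NumPy is installed. The arrays A and B are made up for illustration. Each letter labels an axis, a letter shared between inputs is multiplied along, and a letter missing from the output is summed over.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)    # shape (2, 3)
B = np.arange(12).reshape(3, 4)   # shape (3, 4)

# "ij,jk->ik": multiply along the shared axis j and sum it away,
# which is an ordinary matrix product.
product = np.einsum("ij,jk->ik", A, B)

# "ij->i": j is absent from the output, so each row is summed.
row_sums = np.einsum("ij->i", A)

# "ij->ji": no axis is dropped, the axes are only reordered (transpose).
transposed = np.einsum("ij->ji", A)

print(product.shape, row_sums, transposed.shape)
```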

Source: A basic introduction to NumPy's einsum, an article by Alex Riley.

The Unexpected Importance of the Trailing Slash

For many using Unix-derived systems today, we take for granted that /some/path and /some/path/ are the same. Most shells will even add a trailing slash for you when you press the Tab key after the name of a directory or a symbolic link to one.

However, many programs treat these two paths as subtly different in certain cases, which I outline below, as all three have tripped me up in various ways.
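The three cases from the article are not part of this excerpt, but the general point is easy to reproduce with Python's own path helpers. This is only an illustration of one tool treating the two spellings differently; posixpath is used so that the result is the same on every platform.

```python
import posixpath

without_slash = "/some/path"
with_slash = "/some/path/"

# The last component is "path" in one case and empty in the other.
print(posixpath.basename(without_slash))  # path
print(posixpath.basename(with_slash))     # (empty string)

# The parent directory differs as well.
print(posixpath.dirname(without_slash))   # /some
print(posixpath.dirname(with_slash))      # /some/path

# Normalizing first makes both spellings equivalent.
print(posixpath.normpath(with_slash) == without_slash)  # True
```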

Source: The Unexpected Importance of the Trailing Slash, an article by Jacob Adams.

The ordering operators

Perl has two operators, cmp and <=>, which are basically never seen outside of sort blocks.

That doesn’t mean you can’t use them elsewhere, though. Certainly sort and these operators were designed to work seamlessly together but there isn’t anything sort-specific about the operators per se, and in some contexts they can be the most appropriate solution.
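Python has no such operators, but a three-way comparison is easy to sketch, and it shows why one can be handy outside of sorting. The helpers compare, sign and describe below are hypothetical names, not from the article. In Perl, <=> compares numerically and cmp compares as strings; in this sketch the type of the arguments decides.

```python
from functools import cmp_to_key


def compare(a, b):
    """Three-way comparison: -1 if a < b, 0 if equal, 1 if a > b."""
    return (a > b) - (a < b)


def sign(n):
    """Use outside of sorting: the sign of a number."""
    return compare(n, 0)


def describe(a, b):
    """Use outside of sorting: pick a word by indexing with the result."""
    return ("less than", "equal to", "greater than")[compare(a, b) + 1]


print(compare(2, 10))      # numbers, like <=>
print(compare("2", "10"))  # strings, like cmp
print(sign(-7), describe(3, 3))

# And the familiar use, as a sort comparator.
print(sorted([3, 1, 2], key=cmp_to_key(compare)))
```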

Source: The ordering operators.

Firefox DNS-over-HTTPS

When you type a web address or domain name into your address bar (example: www.mozilla.org), your browser sends a request over the Internet to look up the IP address for that website. Traditionally, this request is sent to servers over a plain text connection. This connection is not encrypted, making it easy for third parties to see what website you’re about to access. DNS-over-HTTPS (DoH) works differently. It sends the domain name you typed to a DoH-compatible DNS server using an encrypted HTTPS connection instead of a plain text one. This prevents third parties from seeing what websites you are trying to access.
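To show what such a request looks like, here is a sketch that builds the URL for the GET form of a DoH query as described in RFC 8484: a regular DNS query message, base64url-encoded without padding, passed in the dns parameter. The resolver URL is a placeholder, nothing is sent over the network, and the excerpt does not say which request form Firefox itself uses.

```python
import base64
import struct


def build_dns_query(hostname, qtype=1):
    """Build a minimal DNS query message (qtype 1 = A record)."""
    # Header: ID 0, flags 0x0100 (recursion desired), one question.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels ending in a zero byte.
    name = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    )
    question = name + b"\x00" + struct.pack("!HH", qtype, 1)  # class 1 = IN
    return header + question


def doh_get_url(resolver_url, hostname):
    """Return the URL for an RFC 8484 GET query (base64url, no padding)."""
    message = build_dns_query(hostname)
    encoded = base64.urlsafe_b64encode(message).rstrip(b"=").decode("ascii")
    return f"{resolver_url}?dns={encoded}"


# Placeholder resolver; a real one would be fetched over HTTPS.
print(doh_get_url("https://doh.example.net/dns-query", "www.mozilla.org"))
```

Because the whole exchange travels inside HTTPS, an observer on the network sees a connection to the resolver but not the name being looked up.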

Source: Firefox DNS-over-HTTPS.

Why Literate Programming Might Help You Write Better Code

Literate programming is an approach to programming in which the code is explained using natural language alongside the source code. This is distinct from related practices such as documentation or code comments; there, the code is primary, with commentary and explanation being secondary. In literate programming, however, explanation has equal billing with the code itself.

Source: Why Literate Programming Might Help You Write Better Code, an article by Richard Gall.

A Deep Dive Into Go's Concurrency

Go is known for its first-class support for concurrency, or the ability for a program to deal with multiple things at once. Running code concurrently is becoming a more critical part of programming as computers move from running a single code stream faster to running more streams simultaneously.

Source: A Deep Dive Into Go's Concurrency, an article by Kevin Vogel.

YAML: The Missing Battery in Python

Python is often marketed as a batteries-included language because it comes with almost everything you’d ever expect from a programming language. This statement is mostly true, as the standard library and the external modules cover a broad spectrum of programming needs. However, Python lacks built-in support for the YAML data format, commonly used for configuration and serialization, despite clear similarities between the two languages.

In this tutorial, you’ll learn how to work with YAML in Python using the available third-party libraries, with a focus on PyYAML. If you’re new to YAML or haven’t used it in a while, then you’ll have a chance to take a quick crash course before diving deeper into the topic.

Source: YAML: The Missing Battery in Python, an article by Bartosz Zaczyński.

Mental models for learning Rust

Let us not beat around the bush: Rust is not easy to learn.

I think it took me nearly 1 year of full-time programming in Rust to become proficient and no longer have to read the documentation every 5 lines of code. It's a looong journey but absolutely worth it.

It requires you to re-think all the mental models you learned while using other programming languages.

This is why I thought it could be interesting to share how I adapted my programming habits when working with Rust over the years.

Source: Mental models for learning Rust, an article by Sylvain Kerkour.

What’s New in Emacs 28.1?

It’s that time again: there’s a new major version of Emacs and, with it, a treasure trove of new features and changes.

Notable features include the formal inclusion of native compilation, a technique that will greatly speed up your Emacs experience.

A critical issue surrounding the use of ligatures was also fixed; without the fix, you couldn’t use ligatures in Emacs 27 without crashes. So that’s good news indeed.

Source: What's New in Emacs 28.1?, an article by Mickey Petersen.

Writing a NetBSD kernel module

Kernel modules are object files used to extend an operating system’s kernel functionality at run time.

In this post, we’ll look at implementing a simple character device driver as a kernel module in NetBSD. Once it is loaded, userspace processes will be able to write an arbitrary byte string to the device, and on every successive read expect a cryptographically-secure pseudorandom permutation of the original byte string.
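The behavior of the device can be mimicked in userspace. The sketch below is not the kernel module from the article; it is a Python stand-in that shuffles a byte string with Fisher-Yates, drawing indices from the operating system's CSPRNG via the secrets module. The function name secure_permutation is hypothetical.

```python
import secrets


def secure_permutation(data: bytes) -> bytes:
    """Return a random permutation of data using a CSPRNG."""
    buf = bytearray(data)
    # Fisher-Yates: walk from the end, swapping each position with a
    # uniformly chosen position at or before it.
    for i in range(len(buf) - 1, 0, -1):
        j = secrets.randbelow(i + 1)
        buf[i], buf[j] = buf[j], buf[i]
    return bytes(buf)


original = b"hello, kernel"
# Each "read" yields a new permutation of the same bytes.
for _ in range(3):
    print(secure_permutation(original))
```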

Source: Writing a NetBSD kernel module, an article by Saurav Sachidanand.

PIPEFAIL: How a missing shell option slowed Cloudflare down

At Cloudflare, we’re used to being the fastest in the world. However, for approximately 30 minutes last December, Cloudflare was slow. Between 20:10 and 20:40 UTC on December 16, 2021, web requests served by Cloudflare were artificially delayed by up to five seconds before being processed. This post tells the story of how a missing shell option called “pipefail” slowed Cloudflare down.
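What the option does is easy to demonstrate. By default the exit status of a pipeline is that of its last command, so a failure earlier in the pipe goes unnoticed; with pipefail set, the pipeline fails if any command in it fails. The sketch below assumes bash is available on the PATH, and the pipeline "false | true" is a stand-in, not the script from the incident.

```python
import subprocess


def pipeline_status(script: str) -> int:
    """Run a script with bash and return its exit status."""
    return subprocess.run(["bash", "-c", script]).returncode


# Without pipefail, only the last command (true) counts.
default_status = pipeline_status("false | true")

# With pipefail, the failure of the first command is reported.
pipefail_status = pipeline_status("set -o pipefail; false | true")

print(default_status, pipefail_status)
```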

Source: PIPEFAIL: How a missing shell option slowed Cloudflare down, an article by Alex Forster.

Ballooning spiders rely on electric fields to generate lift

In 1832, Charles Darwin witnessed hundreds of ballooning spiders landing on the HMS Beagle while some 60 miles offshore. Ballooning is a phenomenon that's been known since at least the days of Aristotle—and immortalized in E.B. White's children's classic Charlotte's Web—but scientists have only recently made progress in gaining a better understanding of its underlying physics.

Source: No air currents required: Ballooning spiders rely on electric fields to generate lift, an article by Jennifer Ouellette.

Go Generics for Field Level Database Encryption

I have been working on a project that needs a REST API with a SQL database. The API creates, updates and retrieves objects from the database, such as:

  • user information
  • transactions
  • user accounts
  • credit card details
  • other

This project has some specific data security requirements for storage. We needed to encrypt certain fields on these data tables using a different encryption key per user and per account. The biggest challenge was building a Go library to support this sort of complex per-field encryption. I wanted to make a nice way of encrypting and decrypting a Go struct without adding verbose code in the API. There needed to be a simple way of managing the many different encryption keys that will be handled in each API request.

Source: Go Generics for Field Level Database Encryption, an article by Josh Wales.

Some thoughts on Go's unusual approach to identifier visibility

The obvious nice thing about Go's approach is that you're never in doubt about whether an identifier is public or not when you read code. If it starts with upper case, it's public; otherwise, it's package private. This doesn't mean that a public identifier is supposed to be used generally, but at least it's clear to everyone that it could be. In other languages, you may have to consult the definition of the identifier, or perhaps a section of code that lists exported identifiers.
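The rule is simple enough to state in a line of code. Here is a Python sketch of the check: the Go specification defines "upper case" as Unicode category Lu, and the function name is_exported is a hypothetical helper. It only looks at the name itself, not at where the identifier is declared.

```python
import unicodedata


def is_exported(identifier: str) -> bool:
    """True if the name starts with a Unicode upper case letter (Lu)."""
    return bool(identifier) and unicodedata.category(identifier[0]) == "Lu"


for name in ("Println", "println", "_Hidden", "Ärger"):
    visibility = "public" if is_exported(name) else "package private"
    print(f"{name}: {visibility}")
```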

Source: Some thoughts on Go's unusual approach to identifier visibility, an article by Chris Siebenmann.

Those HTML Attributes You Never Use

But there is a whole bunch of lesser-used attributes that I was sure I’d forgotten about, and probably a whole bunch of attributes I didn’t even know existed. This post is the result of my research, and I hope you’ll find some of these useful to you, as you build HTML pages in the coming months.

Source: Those HTML Attributes You Never Use, an article by Louis Lazaris.