Colly is a web scraping framework for the Go programming language. Its feature set largely overlaps with that of the Scrapy framework from the Python ecosystem:
- Built-in concurrency.
- Cookie handling.
- Caching of HTTP response data.
- Automatic heeding of robots.txt rules.
- Automatic throttling of outgoing traffic.
Furthermore, Colly supports distributed scraping out of the box through a Redis-based task queue and can be integrated with Google App Engine. This makes it a viable choice for large-scale web scraping projects.
Design patterns are a great way to think about interactions among classes. But the classic Singleton pattern is bad: you shouldn’t use it and there are better options.
The classic Singleton pattern is a class which always gives you the same object when you create an instance of the class. It’s used to ensure that all users of a class are using the same object.
Source: Singleton is a bad idea, an article by Ned Batchelder.
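For readers who have not met the pattern, the classic Singleton described above could be sketched in Python roughly as follows. The class name is hypothetical and the code is only an illustration of the idea, not an excerpt from the article.

```python
class Singleton:
    """Classic Singleton: every instantiation returns the same object."""

    _instance = None

    def __new__(cls):
        # Create the object only on the first call; later calls reuse it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance


a = Singleton()
b = Singleton()
print(a is b)  # True
```

Calling the class looks like it creates something new, yet both names refer to one shared object.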
The einsum function is one of NumPy’s jewels. It can often outperform familiar array functions in terms of speed and memory efficiency, thanks to its expressive power and smart loops. On the downside, it can take a little while to understand the notation and sometimes a few attempts to apply it correctly to a tricky problem.
Source: A basic introduction to NumPy's einsum, an article by Alex Riley.
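To give a feel for the notation mentioned above, here is a small sketch using NumPy's einsum on made-up arrays. The array values and variable names are assumptions chosen for illustration, not examples from the article.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # shape (2, 3)
b = np.arange(12).reshape(3, 4)  # shape (3, 4)

# "ij,jk->ik": multiply along the shared axis j and sum over it,
# which is ordinary matrix multiplication.
product = np.einsum("ij,jk->ik", a, b)

# "ij->i": j is missing from the output, so it is summed away (row sums).
row_sums = np.einsum("ij->i", a)

# "ij->ji": nothing is summed; the axes are only reordered (transpose).
transposed = np.einsum("ij->ji", a)
```

Letters repeated across inputs are multiplied together, and letters left out of the output are summed over.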
For many using Unix-derived systems today, we take for granted that /some/path and /some/path/ are the same. Most shells will even add a trailing slash for you when you press the Tab key after the name of a directory or a symbolic link to one.
However, many programs treat these two paths as subtly different in certain cases, which I outline below, as all three have tripped me up in various ways.
Source: The Unexpected Importance of the Trailing Slash, an article by Jacob Adams.
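One place where the two spellings visibly diverge is Python's standard library. This is offered as an illustration of the general point and is not necessarily one of the three cases the article covers. The sketch uses posixpath so that it behaves the same on any platform.

```python
import posixpath
from pathlib import PurePosixPath

# posixpath splits at the final slash, so a trailing slash changes
# what counts as the last component.
name_plain = posixpath.basename("/some/path")    # "path"
name_slash = posixpath.basename("/some/path/")   # ""

parent_plain = posixpath.dirname("/some/path")   # "/some"
parent_slash = posixpath.dirname("/some/path/")  # "/some/path"

# pathlib, by contrast, drops the trailing slash and treats both alike.
same = PurePosixPath("/some/path/") == PurePosixPath("/some/path")  # True
```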
Perl has two operators, cmp and <=>, which are basically never seen outside of sort. That doesn’t mean you can’t use them elsewhere, though. Certainly sort and these operators were designed to work seamlessly together, but there isn’t anything sort-specific about the operators per se, and in some contexts they can be the most appropriate solution.
Source: The ordering operators.
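As a rough Python analogue (not Perl, and not code from the article), a three-way comparison can be written as a small function that returns -1, 0 or 1. The function name and sample data are hypothetical. Perl chooses between numeric and string comparison by operator, whereas here the operand types decide.

```python
from functools import cmp_to_key


def compare(a, b):
    """Three-way comparison: -1 if a < b, 0 if equal, 1 if a > b."""
    return (a > b) - (a < b)


# Inside a sort, which is where such comparisons are usually seen.
numbers = sorted([10, 9, 100], key=cmp_to_key(compare))

# Outside a sort: the sign of the result can drive other logic directly.
labels = {-1: "older", 0: "same", 1: "newer"}
status = labels[compare((2, 1, 0), (2, 0, 5))]

print(numbers, status)  # [9, 10, 100] newer
```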
AsyncSSH is a Python package which provides an asynchronous client and server implementation of the SSHv2 protocol on top of the Python 3.6+ asyncio framework.
When you type a web address or domain name into your address bar (example: www.mozilla.org), your browser sends a request over the Internet to look up the IP address for that website. Traditionally, this request is sent to servers over a plain text connection. This connection is not encrypted, making it easy for third parties to see what website you’re about to access. DNS-over-HTTPS (DoH) works differently. It sends the domain name you typed to a DoH-compatible DNS server using an encrypted HTTPS connection instead of a plain text one. This prevents third parties from seeing what websites you are trying to access.
Source: Firefox DNS-over-HTTPS.
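To make the mechanism a little more concrete, the sketch below builds the URL for a DoH GET request in the style of RFC 8484. It does not send anything and says nothing about how Firefox itself implements DoH. The resolver address and the function names are hypothetical.

```python
import base64
import struct


def build_dns_query(hostname: str) -> bytes:
    """Build a minimal DNS query (type A, class IN) in wire format."""
    # Header: ID 0, flags 0x0100 (recursion desired), one question.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # Name: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 (A record), QCLASS 1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def doh_get_url(resolver_url: str, hostname: str) -> str:
    """Return the URL of a DoH GET request for the given hostname."""
    # The query is base64url-encoded with the padding removed.
    encoded = base64.urlsafe_b64encode(build_dns_query(hostname)).rstrip(b"=")
    return f"{resolver_url}?dns={encoded.decode('ascii')}"


# Hypothetical resolver endpoint; a real client would request this URL over
# HTTPS with the header "Accept: application/dns-message".
url = doh_get_url("https://doh.example/dns-query", "www.example.com")
print(url)
```

Because the whole query travels inside an HTTPS request, an observer on the network sees only a connection to the resolver, not the name being looked up.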
Literate programming is an approach to programming in which the code is explained using natural language alongside the source code. This is distinct from related practices such as documentation or code comments; there, the code is primary, with commentary and explanation being secondary. In literate programming, however, explanation has equal billing with the code itself.
Source: Why Literate Programming Might Help You Write Better Code, an article by Richard Gall.
Go is known for its first-class support for concurrency, or the ability for a program to deal with multiple things at once. Running code concurrently is becoming a more critical part of programming as computers move from running a single code stream faster to running more streams simultaneously.
Source: A Deep Dive Into Go's Concurrency, an article by Kevin Vogel.
Nix flakes allow you to expose NixOS modules. NixOS modules are templates for system configuration and they are the basis of how you configure NixOS. Today we're going to take our Nix flake from the last article and write a NixOS module for it so that we can deploy it to a container running locally. In the next post we will deploy this to a server.
Source: Nix Flakes: Exposing and using NixOS Modules, an article by Christine Dodrill.
Python is often marketed as a batteries-included language because it comes with almost everything you’d ever expect from a programming language. This statement is mostly true, as the standard library and the external modules cover a broad spectrum of programming needs. However, Python lacks built-in support for the YAML data format, commonly used for configuration and serialization, despite clear similarities between the two languages.
In this tutorial, you’ll learn how to work with YAML in Python using the available third-party libraries, with a focus on PyYAML. If you’re new to YAML or haven’t used it in a while, then you’ll have a chance to take a quick crash course before diving deeper into the topic.
Source: YAML: The Missing Battery in Python, an article by Bartosz Zaczyński.
Let us not beat around the bush: Rust is not easy to learn.
I think it took me nearly 1 year of full-time programming in Rust to become proficient and no longer have to read the documentation every 5 lines of code. It's a looong journey but absolutely worth it.
It requires you to re-think all the mental models you learned while using other programming languages.
This is why I thought it could be interesting to share how I adapted my programming habits when working with Rust over the years.
Source: Mental models for learning Rust, an article by Sylvain Kerkour.
It’s that time again: there’s a new major version of Emacs and, with it, a treasure trove of new features and changes.
Notable features include the formal inclusion of native compilation, a technique that will greatly speed up your Emacs experience.
A critical issue surrounding the use of ligatures has also been fixed; without the fix, you couldn’t use ligatures in Emacs 27 without crashes. So that’s good news as well.
Source: What's New in Emacs 28.1?, an article by Mickey Petersen.
Kernel modules are object files used to extend an operating system’s kernel functionality at run time.
In this post, we’ll look at implementing a simple character device driver as a kernel module in NetBSD. Once it is loaded, userspace processes will be able to write an arbitrary byte string to the device, and on every successive read expect a cryptographically-secure pseudorandom permutation of the original byte string.
Source: Writing a NetBSD kernel module, an article by Saurav Sachidanand.
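The behaviour the device is meant to have can be mimicked in userspace with a few lines of Python. This is only a stand-in for the described write and read semantics, not kernel code and not the article's implementation. The class name is hypothetical.

```python
import secrets


class ShuffleDevice:
    """Userspace stand-in: write stores bytes, read returns them permuted."""

    def __init__(self):
        self._data = b""
        # SystemRandom draws from the operating system's secure random source.
        self._rng = secrets.SystemRandom()

    def write(self, data: bytes) -> None:
        self._data = bytes(data)

    def read(self) -> bytes:
        shuffled = list(self._data)
        self._rng.shuffle(shuffled)  # shuffles the list in place
        return bytes(shuffled)


dev = ShuffleDevice()
dev.write(b"hello")
out = dev.read()  # some ordering of the same five bytes
```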
At Cloudflare, we’re used to being the fastest in the world. However, for approximately 30 minutes last December, Cloudflare was slow. Between 20:10 and 20:40 UTC on December 16, 2021, web requests served by Cloudflare were artificially delayed by up to five seconds before being processed. This post tells the story of how a missing shell option called “pipefail” slowed Cloudflare down.
Source: PIPEFAIL: How a missing shell option slowed Cloudflare down, an article by Alex Forster.
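The effect of the option is easy to reproduce. The sketch below assumes bash is available on the PATH and runs a deliberately failing pipeline with and without pipefail. It is a generic demonstration, not the script involved in the incident.

```python
import subprocess


def pipeline_status(script: str) -> int:
    """Run a script with bash and return its exit status."""
    return subprocess.run(["bash", "-c", script]).returncode


# Default: a pipeline's status is that of its last command, so the
# failure of `false` is hidden by the success of `true`.
without_pipefail = pipeline_status("false | true")

# With pipefail, the pipeline fails if any command in it fails.
with_pipefail = pipeline_status("set -o pipefail; false | true")

print(without_pipefail, with_pipefail)  # 0 1
```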
In this article I present a step-by-step walkthrough of my photography workflow. I won't go through all the details of every piece of software I mention, since they have their own manuals and documentation for that; instead, I will highlight the operations I do.
Source: My free-software photography workflow, an article by Fidel Ramos.
Formatted string literals - also called f-strings - have been around since Python 3.6, so we all know what they are and how to use them. There are, however, some facts and handy features of f-strings that you might not know about. So, let's take a tour of some awesome f-string features that you'll want to use in your everyday coding.
Source: Python f-strings Are More Powerful Than You Might Think, an article by Martin Heinz.
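A few f-string features of this kind are sketched below. The selection and the sample values are assumptions for illustration and may not match the ones the article covers.

```python
from datetime import date

user = "Ada"
total = 1234567.891
day = date(2022, 4, 4)
width = 6

debug = f"{user=}"           # "user='Ada'" (needs Python 3.8+)
grouped = f"{total:,.2f}"    # "1,234,567.89"
stamp = f"{day:%Y-%m-%d}"    # "2022-04-04"
padded = f"{user:>{width}}"  # "   Ada" (nested field sets the width)
hexed = f"{255:#x}"          # "0xff"
```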
In 1832, Charles Darwin witnessed hundreds of ballooning spiders landing on the HMS Beagle while some 60 miles offshore. Ballooning is a phenomenon that's been known since at least the days of Aristotle—and immortalized in E.B. White's children's classic Charlotte's Web—but scientists have only recently made progress in gaining a better understanding of its underlying physics.
Source: No air currents required: Ballooning spiders rely on electric fields to generate lift, an article by Jennifer Ouellette.
I have been working on a project that needs a REST API with a SQL database. The API creates, updates and retrieves objects from the database, such as:
- user information
- user accounts
- credit card details
This project has some specific data security requirements for storage. We needed to encrypt certain fields on these data tables using a different encryption key per account. The biggest challenge was building a Go library to support this sort of complex per-field encryption. I wanted to make a nice way of encrypting and decrypting a Go struct without adding verbose code in the API. There needed to be a simple way of managing the many different encryption keys that will be handled in each API request.
Source: Go Generics for Field Level Database Encryption, an article by Josh Wales.
The obvious nice thing about Go's approach is that you're never in doubt about whether an identifier is public or not when you read code. If it starts with upper case, it's public; otherwise, it's package private. This doesn't mean that a public identifier is supposed to be used generally, but at least it's clear to everyone that it could be. In other languages, you may have to consult the definition of the identifier, or perhaps a section of code that lists exported identifiers.
Source: Some thoughts on Go's unusual approach to identifier visibility, an article by Chris Siebenmann.
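The capitalisation rule is simple enough to state as a one-line check. The Python sketch below mirrors it under the assumption that only the first character matters. Go additionally requires the identifier to be declared at package level or to be a field or method name, which this hypothetical helper does not model.

```python
import unicodedata


def is_exported(identifier: str) -> bool:
    """Return True if a Go identifier's spelling marks it as exported."""
    # Go's rule: the first character is a Unicode uppercase letter (category Lu).
    return bool(identifier) and unicodedata.category(identifier[0]) == "Lu"


print(is_exported("Println"))  # True
print(is_exported("println"))  # False
print(is_exported("_Helper"))  # False: an underscore is not an uppercase letter
```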
But there is a whole bunch of lesser-used attributes that I was sure I’d forgotten about, and probably a whole bunch of attributes I didn’t even know existed. This post is the result of my research, and I hope you’ll find some of these useful to you, as you build HTML pages in the coming months.
Source: Those HTML Attributes You Never Use, an article by Louis Lazaris.