week 36, 2020

How to pick more beautiful colors for your data visualizations

Choosing good colors for your charts is hard. This article tries to make it easier.

I want you to feel more confident in your color choices. And if you have no sense for colors at all, here’s my attempt to help you find good ones anyway. We’ll talk about common color mistakes I see out there in the wild, and how to avoid them.

This is not the right article for you if you’re trying to find good gradients or shades. But if you need to find beautiful, distinctive colors for different categories (e.g. continents, industries, bird species) for your line charts, pie charts, stacked bar charts, etc., read on.

Source: How to pick more beautiful colors for your data visualizations, an article by Lisa Charlotte Rost.

Understanding Python Package Distribution Types

If you’ve done much Python development you’re probably familiar with importing dependencies using pip, or even easy_install, if you’ve been at this for a while. Whether you were aware of it or not, these dependencies likely came from the public Python Package Index (PyPI) or perhaps an internal mirror of the PyPI repository that is hosted by your company.

What you may not have been aware of is how these dependencies are actually packaged, delivered, and installed, and the differences between the different distribution types available for Python.

Source: Understanding Python Package Distribution Types, an article by Andrew Scott.
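
Source distributions and wheels are the two distribution types you are most likely to meet on PyPI, and the file name alone usually tells them apart. Here is a minimal sketch of that check. It is not code from the article: the function name and the example file names are made up for illustration, eggs and other legacy formats are deliberately not handled, and treating "none" plus "any" as "pure Python" is a simplifying assumption.

```python
def distribution_type(filename):
    """Guess the distribution type of a package file from its name alone."""
    if filename.endswith(".whl"):
        # Wheel file names have the shape
        # name-version[-build]-pythontag-abitag-platformtag.whl
        parts = filename[: -len(".whl")].split("-")
        if len(parts) not in (5, 6):
            raise ValueError(f"not a valid wheel file name: {filename}")
        abi_tag, platform_tag = parts[-2], parts[-1]
        # Assumption: no ABI requirement and no platform requirement
        # means the wheel contains only Python code.
        if abi_tag == "none" and platform_tag == "any":
            return "pure-Python wheel"
        return "platform wheel"
    if filename.endswith((".tar.gz", ".zip")):
        # Source distributions are plain archives named name-version.
        return "source distribution"
    raise ValueError(f"unrecognised distribution file: {filename}")
```

Calling distribution_type("example_pkg-1.0.0-py3-none-any.whl") classifies the hypothetical file as a pure-Python wheel, while the same name ending in .tar.gz is classified as a source distribution.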

Auto Linking on iOS & macOS

When object files get linked at the final build stage, the linker needs to know which libraries to link against. For example, if you add #import <AppKit/AppKit.h> to an implementation file, you need to also add -framework AppKit to the linker flags.

Auto Linking aims to remove the latter step, i.e., it aims to derive the library linker flags from the import statements in your code. Developers do not need to add any framework/library linker flags anymore; they can just start using any framework by importing.

Source: Auto Linking on iOS & macOS, an article by Milen Dzhumerov.

Content Jumping (and How To Avoid It)

Few things are as annoying on the web as having the page layout unexpectedly change or shift while you’re trying to view or interact with it. Whether you’re attempting to read an article as it wriggles around in front of you, or you try to click a link only to have another one push it out of the way and take you off to somewhere unexpected, it’s always frustrating.

Source: Content Jumping (and How To Avoid It), an article by Brandon Smith.

Even in Go, concurrency is still not easy (with an example)

Go is famous for making concurrency easy, through good language support for goroutines. Except what Go makes easy is only one level of concurrency, the nuts and bolts level of making your code do things concurrently and communicating back and forth through channels. Making it do the right things concurrently is still up to you, and unfortunately Go doesn't currently provide a lot of standard library support for correctly implemented standard concurrency patterns.

For example, one common need is for a limited amount of concurrency; you want to do several things at once, but only so many of them. At the moment this is up to you to implement on top of goroutines, channels, and things like the sync package. This is not as easy as it looks, and quite competent people can make mistakes here. As it happens, I have an example ready to hand today.

Source: Even in Go, concurrency is still not easy (with an example), an article by Chris Siebenmann.
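
To make "a limited amount of concurrency" concrete, here is a minimal sketch of the same need in Python rather than Go. It is not the example from the article and says nothing about Go's sync package. It leans on the standard library's ThreadPoolExecutor, whose max_workers argument caps how many tasks run at once. The names run_limited and measure_peak are made up for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_limited(tasks, limit):
    """Run zero-argument callables with at most `limit` running at once.

    Results come back in the same order as `tasks`.
    """
    with ThreadPoolExecutor(max_workers=limit) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]


def measure_peak(task_count, limit):
    """Run dummy tasks and report the highest concurrency observed."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def task(n):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for real work
        with lock:
            running -= 1
        return n * n

    # Bind n at definition time so each callable keeps its own value.
    tasks = [lambda n=n: task(n) for n in range(task_count)]
    results = run_limited(tasks, limit)
    return results, peak
```

With measure_peak(10, 3), the observed peak should never exceed 3, because the pool itself enforces the limit instead of hand-written bookkeeping.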

Computed goto for efficient dispatch tables

Recently, while idly browsing through the source code of Python, I came upon an interesting comment in the bytecode VM implementation (Python/ceval.c) about using the computed gotos extension of GCC [1]. Driven by curiosity, I decided to code a simple example to evaluate the difference between using a computed goto and a traditional switch statement for a simple VM. This post is a summary of my findings.

Source: Computed goto for efficient dispatch tables, an article by Eli Bendersky.
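
Python has no goto, so the GCC extension itself cannot be shown here. What can be sketched is the underlying contrast for a toy VM: dispatching through a chain of comparisons versus jumping through a table indexed by opcode. The opcodes and function names below are hypothetical and not taken from the article, and the sketch makes no claim about relative speed.

```python
# Hypothetical opcodes for a one-register toy VM.
OP_HALT, OP_INC, OP_DEC, OP_MUL2 = range(4)


def run_if_chain(code, value=0):
    """Dispatch by comparing the opcode against each case in turn."""
    for op in code:
        if op == OP_HALT:
            break
        elif op == OP_INC:
            value += 1
        elif op == OP_DEC:
            value -= 1
        elif op == OP_MUL2:
            value *= 2
        else:
            raise ValueError(f"unknown opcode: {op}")
    return value


# Table of handlers, indexed directly by opcode (OP_HALT has no handler).
HANDLERS = [
    None,
    lambda v: v + 1,  # OP_INC
    lambda v: v - 1,  # OP_DEC
    lambda v: v * 2,  # OP_MUL2
]


def run_table(code, value=0):
    """Dispatch by indexing a table; assumes every opcode is valid."""
    for op in code:
        if op == OP_HALT:
            break
        value = HANDLERS[op](value)
    return value
```

Both functions compute the same result for the same program. The table version replaces the sequence of comparisons with a single indexed lookup, which is the idea a computed goto expresses directly in C.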

How Emacs beat vi in the Editor Wars

In these dark times, we are all in sore need of good news. Thankfully, I can report some: Emacs has defeated vi in the Editor Wars!

Some people, laughably, believe that vi is more popular than Emacs. Fortunately, these fools are completely wrong, and it is easily proven.

Source: How Emacs beat vi in the Editor Wars, an article by Trevor Jim.

On Modern Hardware the Min-Max Heap beats a Binary Heap

The heap is a data structure that I use all the time and that others somehow use rarely. (I once had a coworker tell me that he knew some code was mine because it used a heap.) Recently I was writing code that could really benefit from using a heap (as most code can) but I needed to be able to pop items from both ends. So I read up on double-ended priority queues and how to implement them. These are rare, but the most common implementation is the “Interval Heap” that can be explained quickly, has clean code and is only slightly slower than a binary heap. But there is an alternative called the “Min-Max Heap” that doesn’t have pretty code, but it has shorter dependency chains, which is important on modern hardware. As a result it often ends up faster than a binary heap, even though it allows you to pop from both ends. Which means there might be no reason to ever use a binary heap again.

Source: On Modern Hardware the Min-Max Heap beats a Binary Heap, an article by Malte Skarupke.
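
For readers who have not met the structure before, here is a minimal textbook-style sketch of a min-max heap in Python. It is not the article's implementation and is not tuned for speed, so it says nothing about the performance claim. The class and method names are my own. Even levels of the tree are min levels, odd levels are max levels, so the smallest item sits at the root and the largest is one of the root's children.

```python
class MinMaxHeap:
    """Double-ended priority queue stored in a single list."""

    def __init__(self):
        self._h = []

    def __len__(self):
        return len(self._h)

    @staticmethod
    def _on_min_level(i):
        # Level of index i is floor(log2(i + 1)); even levels are min levels.
        return ((i + 1).bit_length() - 1) % 2 == 0

    def _swap(self, i, j):
        self._h[i], self._h[j] = self._h[j], self._h[i]

    def push(self, item):
        h = self._h
        h.append(item)
        i = len(h) - 1
        if i == 0:
            return
        parent = (i - 1) // 2
        min_level = self._on_min_level(i)
        # If the new item is on the wrong side of its parent, swap them and
        # continue on the parent's kind of level.
        if min_level and h[i] > h[parent]:
            self._swap(i, parent)
            i, min_level = parent, False
        elif not min_level and h[i] < h[parent]:
            self._swap(i, parent)
            i, min_level = parent, True
        # Bubble up two levels at a time, comparing with grandparents only.
        while i >= 3:
            gp = ((i - 1) // 2 - 1) // 2
            if (h[i] < h[gp]) if min_level else (h[i] > h[gp]):
                self._swap(i, gp)
                i = gp
            else:
                break

    def _sift_down(self, i):
        h, n = self._h, len(self._h)
        min_level = self._on_min_level(i)

        def before(a, b):
            return a < b if min_level else a > b

        while 2 * i + 1 < n:
            # Candidates are the children and grandchildren of i.
            cands = [c for c in (2 * i + 1, 2 * i + 2, 4 * i + 3,
                                 4 * i + 4, 4 * i + 5, 4 * i + 6) if c < n]
            m = cands[0]
            for c in cands[1:]:
                if before(h[c], h[m]):
                    m = c
            if not before(h[m], h[i]):
                break
            self._swap(i, m)
            if m <= 2 * i + 2:  # m was a direct child: nothing below to fix
                break
            parent = (m - 1) // 2
            if before(h[parent], h[m]):
                self._swap(m, parent)
            i = m

    def pop_min(self):
        h = self._h
        last = h.pop()  # raises IndexError when the heap is empty
        if not h:
            return last
        top, h[0] = h[0], last
        self._sift_down(0)
        return top

    def pop_max(self):
        h = self._h
        if len(h) <= 1:
            return h.pop()  # raises IndexError when the heap is empty
        i = 1 if len(h) == 2 or h[1] >= h[2] else 2
        last = h.pop()
        if i == len(h):  # the largest item was also the last one
            return last
        top, h[i] = h[i], last
        self._sift_down(i)
        return top
```

After pushing 5, 3, 7, 1 and 9, pop_max() returns 9 and pop_min() returns 1, both from the same list.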

PNG and Hidden Pixels

Over the last few months, I've been seeing an increase in a third type of hidden pixels: PNG padding. Almost all of these sightings have been associated with steganographic challenges, capture-the-flag forensic contests, and similar puzzles. (Ever since people began staying home due to COVID-19, there seems to have been an increased interest in security, steganography, and related topics.)

Source: PNG and Hidden Pixels, an article by Dr. Neal Krawetz.

The Three-Body Problem

Netflix on Tuesday announced that David Benioff and D.B. Weiss, the showrunners of HBO’s Game of Thrones, are working on an adaptation of Chinese science fiction author Liu Cixin’s The Three-Body Problem trilogy as their first major project since signing exclusive contracts with the streaming service last year.

“Liu Cixin’s trilogy is the most ambitious science-fiction series we’ve read, taking readers on a journey from the 1960s until the end of time, from life on our pale blue dot to the distant fringes of the universe. We look forward to spending the next years of our lives bringing this to life for audiences around the world,” reads the duo’s joint statement.

Source: Game of Thrones showrunners are adapting The Three-Body Problem as first major Netflix project, an article by Nick Statt.

I look forward to this series as I enjoyed the three books a lot; I even recommended them to my friend Simon.

Web Scraping 101 with Python

In this post, which can be read as a follow-up to our ultimate web scraping guide, we will cover almost all the tools Python offers you to web scrape. We will go from the more basic to the most advanced one and will cover the pros and cons of each. Of course, we won't be able to cover every aspect of every tool we discuss, but this post should be enough to give you a good idea of which tool does what, and when to use which.

Source: Web Scraping 101 with Python, an article by Kevin Sahin.
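
At the basic end of that range, the standard library alone can already pull links out of a page. The following is a minimal sketch and not code from the article: the class name, the function name and the HTML used to try it are made up, and it only parses a string that has already been downloaded, so fetching the page is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href, such as <a name="top">.
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all absolute link targets found in `html`."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Given a snippet containing a relative link to /about and the base URL https://example.com/blog/, extract_links returns the absolute form https://example.com/about.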


Today I noticed that the sunflower that grew from a seed dropped by a bird was mostly open, so I took a photo of it. Because the strong winds recently bent it, Alice has attached it to the fence that's covered in ivy.

Sunflower against a backdrop of ivy.

Effective testing for machine learning systems

Working as a core maintainer for PyTorch Lightning, I've grown a strong appreciation for the value of tests in software development. As I've been spinning up a new project at work, I've been spending a fair amount of time thinking about how we should test machine learning systems. A couple weeks ago, one of my coworkers sent me a fascinating paper on the topic which inspired me to dig in, collect my thoughts, and write this blog post.

Source: Effective testing for machine learning systems, an article by Jeremy Jordan.

Training PyTorch models with differential privacy

We are releasing Opacus, a new high-speed library for training PyTorch models with differential privacy (DP) that’s more scalable than existing state-of-the-art methods. Differential privacy is a mathematically rigorous framework for quantifying the anonymization of sensitive data. It’s often used in analytics, with growing interest in the machine learning (ML) community. With the release of Opacus, we hope to provide an easier path for researchers and engineers to adopt differential privacy in ML, as well as to accelerate DP research in the field.

Source: Introducing Opacus: A high-speed library for training PyTorch models with differential privacy, an article by Davide Testuggine and Ilya Mironov.