week 40, 2022

Python Preloaded


The startup time of CPython, including loading big libraries like PyTorch or TensorFlow, is too slow. On slow file systems, I have seen startup times of 10-20 seconds when such imports are included.

Very simple idea:

Keep the state of CPython right after we imported the big libraries and make it available instantly when needed. After loading the state, we can go on to run an arbitrary Python script (we can use runpy).

Source: Python Preloaded, an article by Albert Zeyer.

How to Gradually Add Types for Third Party Packages

Hynek Schlawack recently described graduality as Python’s super power: the ability to prototype in the REPL, and gradually add linting, type checking, and other practices to refine your code into maintainable, production-ready software. You can also apply graduality within tools, activating checks one at a time and fixing the resulting errors as you go.

One place you can apply graduality with Mypy is in the type hints for third party packages. The default course of action is to add type hints for a whole package at once. You do this either through the package itself if it has a py.typed file, or with a *-stubs package (both specified in PEP 561). But when you add full hints for a package used widely in your code base, you might find yourself facing an insurmountable mountain of errors.

Instead, you can gradually add type hints for the package, piece by piece, fixing small batches of errors as you go. This iterative approach is more psychologically pleasing, and it reduces the chance for mistakes.

Source: Python Type Hints: How to Gradually Add Types for Third Party Packages, an article by Adam Johnson.

Lánczos interpolation explained

Lánczos interpolation is one of the most popular methods to resize images, together with linear and cubic interpolation. I’ve spent a lot of time staring at images resampled with Lánczos, and a few years ago, I wondered where it came from. While many sources evaluate interpolation filters visually, I couldn’t find a good explanation of how Lánczos interpolation is derived. So here it is!
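
For orientation, the kernel in its commonly cited form is L(x) = sinc(x) * sinc(x/a) for |x| < a and 0 otherwise, where sinc(x) = sin(pi x)/(pi x) and a is the window size (often 2 or 3). The sketch below applies it to a one-dimensional signal. The function names are hypothetical, and dividing by the sum of the weights is a common practical choice rather than something the excerpt states.

```python
import math


def lanczos_kernel(x, a=3):
    """Windowed sinc: sinc(x) * sinc(x / a) inside (-a, a), zero outside."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def lanczos_interpolate(samples, t, a=3):
    """Estimate an evenly sampled signal at fractional position t."""
    center = math.floor(t)
    total = 0.0
    weight_sum = 0.0
    # Only the 2*a samples nearest to t have a nonzero weight.
    for i in range(center - a + 1, center + a + 1):
        if 0 <= i < len(samples):
            w = lanczos_kernel(t - i, a)
            total += w * samples[i]
            weight_sum += w
    return total / weight_sum
```

Resizing an image applies the same weighting along rows and then along columns.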

Source: Lánczos interpolation explained.

Don’t make databases available on the public internet

The folks at just published an excellent review of PostgreSQL security, with a startling conclusion: the vast majority of PostgreSQL connections that are happening over the public internet are insecure, due to a combination of server misconfigurations and most clients unfortunately defaulting to unsafe settings.

In short: most Postgres clients either don’t enforce TLS at all on the connections to servers, or enforce that a TLS handshake happens but don’t verify that the certificate is valid and matches the expected hostname. What this means in practice is that those connections can be trivially interposed by anyone sitting between the client and server - a classic Machine in the Middle (MitM) attack.
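
For clients built on libpq, the strict behavior is opt-in: sslmode=verify-full requires TLS, validates the certificate chain, and checks that the certificate matches the hostname. A minimal sketch of building such a connection URI follows. The helper name, host, user, and CA file path are placeholders, not values from the article.

```python
from urllib.parse import quote, urlencode


def strict_postgres_uri(host, dbname, user, ca_file, port=5432):
    """Build a libpq connection URI that refuses unverified TLS connections."""
    params = urlencode({
        "sslmode": "verify-full",  # require TLS and verify chain plus hostname
        "sslrootcert": ca_file,    # CA bundle used to validate the server
    })
    return f"postgresql://{quote(user, safe='')}@{host}:{port}/{quote(dbname, safe='')}?{params}"


uri = strict_postgres_uri("db.example.com", "app", "app_user", "/etc/ssl/certs/ca.pem")
```

The weaker modes are the ones the excerpt warns about: some encrypt only opportunistically, and sslmode=require encrypts without checking who is on the other end.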

Source: Don’t make databases available on the public internet, an article by David Anderson.

Moving files in ZSH - The wonderful world of zmv

From time to time I find myself trying to move a batch of files that have a similar pattern in their names but don't quite match an easy-to-write glob pattern. In the past, I used to write quick and dirty scripts (usually in shell script, nothing fancy) to make it easier to move these files around. A few months ago I discovered zmv, a zsh function that is much better than plain old mv for moving files around. Since an example is worth a thousand blog posts, let's jump right into it.

Source: Moving files in ZSH - The wonderful world of zmv, an article by Filipe Kiss.

Future Proofing SQL with Carefully Placed Errors

Backward compatibility is straightforward. You have full control over new code and you have full knowledge of past data and APIs. Forward compatibility is more challenging. You have full control over new code, but you don't know how data is going to change in the future, and what types of API you're going to have to support.

There are many best practices for maintaining backward and forward compatibility in application code, but it's not very commonly mentioned in relation to SQL. SQL is used to produce critical business information for applications and decision-making, so there's no reason it shouldn't benefit from similar practices.
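
One way to picture a "carefully placed error" is a query that fails loudly when it meets a value it was not written for, instead of silently falling through to a default. The sketch below is an assumption about the general technique, not code from the article. It uses SQLite through Python's sqlite3 module, and the assert_never function, the payment table, and the fee values are made up for illustration.

```python
import sqlite3


def assert_never(value):
    """SQL-callable guard: reaching it means a value was not handled."""
    raise ValueError(f"unhandled value: {value!r}")


def fee_report(conn):
    """Return (id, fee_cents) rows; fail loudly on an unknown payment method."""
    return conn.execute(
        """
        SELECT id,
               CASE method
                   WHEN 'card' THEN 30
                   WHEN 'cash' THEN 0
                   ELSE assert_never(method)  -- the carefully placed error
               END AS fee_cents
        FROM payment
        ORDER BY id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.create_function("assert_never", 1, assert_never)
conn.execute("CREATE TABLE payment (id INTEGER PRIMARY KEY, method TEXT)")
conn.executemany("INSERT INTO payment VALUES (?, ?)", [(1, "card"), (2, "cash")])
```

With only known methods in the table the report runs normally. Once a row with a new method appears, the query raises an sqlite3 error rather than producing a report with a silently wrong fee.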

Source: Future Proofing SQL with Carefully Placed Errors, an article by Haki Benita.

Routing PostgreSQL queries

Scaling databases is hard. However, perhaps the lowest hanging fruit is introducing read-only replicas.

A typical load balancing requirement is to route all "logical" read-only queries to a read-only instance. This requirement can be implemented in 2 ways:

  1. Create two database clients (read-write and read-only) and pass them around the application as needed.
  2. Use a middleware to assign each query to a connection pool based on the query itself.
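
The second option can be sketched as a small routing function. This is a deliberately naive heuristic under stated assumptions, not the article's implementation: the pool names, the choose_pool function, and the regular expressions are hypothetical.

```python
import re

# Statements that start like a read and carry no row-locking clause.
_READ_ONLY_PREFIX = re.compile(r"^\s*(select|show)\b", re.IGNORECASE)
_LOCKING_CLAUSE = re.compile(
    r"\bfor\s+(update|share|no\s+key\s+update|key\s+share)\b", re.IGNORECASE
)


def choose_pool(sql, in_transaction=False):
    """Return the name of the pool a query should be sent to."""
    if in_transaction:
        # Keep every statement of a transaction on the same (primary) connection.
        return "read_write"
    if _READ_ONLY_PREFIX.match(sql) and not _LOCKING_CLAUSE.search(sql):
        return "read_only"
    return "read_write"
```

A text-based check like this cannot see side effects hidden inside a SELECT, such as a call to a function that writes, so a real router needs either an explicit opt-out or a stricter rule for what counts as read-only.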

Source: Routing PostgreSQL queries between read-write & read-only instances, an article by Gajus Kuizinas.

Single Pass Recursion in Rust

This is the third post in a three-post series. In the first post we developed a stack-safe, ergonomic, and concise method for working with recursive data structures (using a simple expression language as an example). In the second post we made it fully generic, providing a set of generic tools for expanding and collapsing any recursive data structure in Rust.

In this post we will see how to combine these two things - expanding a structure and collapsing it at the same time, performing both operations in a single pass. In the process, we will gain the ability to write arbitrary recursive functions over traditional boxed-pointer recursive structures (instead of the novel RecursiveTree type introduced in my previous post) while retaining stack safety.

Source: Single Pass Recursion in Rust, an article by Inanna Malick.

Don't worry (about writing Haskell)

As we all know, static type systems are great to ensure correctness of our programs. Sadly, in industry many people are forced to work in languages with a weak type system, such as Haskell. What should you do in such a situation? Quit your job? Give up and despair? Perhaps, but I have another suggestion that I’d like to explain in this post: use our tool agda2hs.

Source: Don't worry (about writing Haskell), be happy (writing Agda instead)!, an article by Jesper Cockx.

Hard Mode Rust

This post is a case study of writing a Rust application using only a minimal, artificially constrained API (e.g., no dynamic memory allocation). It assumes a fair bit of familiarity with the language.

Source: Hard Mode Rust, an article by Aleksey Kladov.

Rust's Result Type is Cool

If you've worked with Rust before, you know how different its error handling story is from most other languages. The Rust Programming Language explains the two primary ways of raising errors, panicking and the Result type, and how you can propagate the Result type with the ? operator to make recoverable errors explicit without interfering with the happy path in a certain function.

Source: Rust's Result Type is Cool, an article by Eshan Singh.

Just commit more!

Over new years this past year I made dura. It’s like auto-backup for Git. It tries to stay out of the way until you’re in a panic, trying to figure out how to rescue your repository from a thoughtless git reset --hard. It makes background commits, real Git commits that you don’t normally have to see in the log, by committing to a different branch than the one you have checked out. Overall, it’s been a blast. I’ve learned a lot from the contributors, like how to write well-formed Rust as well as a bit about Nix.

One recurring question has been, “why don’t you just commit more”?

It’s not a bad question. I clearly went through a lot of effort to build a tool in Rust. I could’ve changed my own behavior. I guess it bugged me how many hours were being wasted on rescuing repositories around the world when the answer is so easy: just commit more.

Source: Just commit more!, an article by Tim Kellogg.

Category Theory

Category theory has come to occupy a central position in contemporary mathematics and theoretical computer science, and is also applied to mathematical physics. Roughly, it is a general mathematical theory of structures and of systems of structures. As category theory is still evolving, its functions are correspondingly developing, expanding and multiplying. At minimum, it is a powerful language, or conceptual framework, allowing us to see the universal components of a family of structures of a given kind, and how structures of different kinds are interrelated. Category theory is both an interesting object of philosophical study, and a potentially powerful formal tool for philosophical investigations of concepts such as space, system, and even truth. It can be applied to the study of logical systems in which case category theory is called “categorical doctrines” at the syntactic, proof-theoretic, and semantic levels. Category theory even leads to a different theoretical conception of set and, as such, to a possible alternative to the standard set theoretical foundation for mathematics. As such, it raises many issues about mathematical ontology and epistemology. Category theory thus affords philosophers and logicians much to use and reflect upon.

Source: Category Theory.

Nix language basics

The Nix language is used to declare packages and configurations to be built by Nix.

It is a domain-specific, purely functional, lazily evaluated, dynamically typed programming language.

Source: Nix language basics.