week 40, 2022

Sun 09 Oct 2022

Python Preloaded

Problem:

The startup time of CPython, including loading big libraries like PyTorch or TensorFlow, is too slow. On slow file systems, I have seen startup times of 10-20 seconds including such imports.

Very simple idea:

Keep the state of CPython right after we imported the big libraries and make it available instantly when needed. When loading the state, we can continue to run any random Python script (we can use runpy).

Source: Python Preloaded, an article by Albert Zeyer.
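
A minimal sketch of the runpy half of that idea, not taken from the article: it imports a list of modules once and then runs an arbitrary script inside the same warm interpreter. The helper names `preload` and `run_script` are hypothetical, and `json` and `decimal` are only stand-ins for the heavy libraries. Keeping that warm state available across separate invocations is the hard part and is not shown here.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path


def preload(module_names):
    """Import the expensive modules once so they stay cached in sys.modules."""
    for name in module_names:
        importlib.import_module(name)


def run_script(path):
    """Run a script in the already-warm interpreter, as if it were __main__.

    Returns the globals the script defined.
    """
    return runpy.run_path(str(path), run_name="__main__")


if __name__ == "__main__":
    preload(["json", "decimal"])  # stand-ins for PyTorch or TensorFlow
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "job.py"  # hypothetical script to run
        script.write_text("import json\nresult = json.dumps({'ok': True})\n")
        namespace = run_script(script)
        print(namespace["result"])
```

The script's `import json` is served from `sys.modules`, so it does not pay the import cost a second time.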

python

How to Gradually Add Types for Third Party Packages

Hynek Schlawack recently described graduality as Python’s super power: the ability to prototype in the REPL, and gradually add linting, type checking, and other practices to refine your code into maintainable, production-ready software. You can also apply graduality within tools, activating checks one at a time and fixing the resulting errors as you go.

One place you can apply graduality with Mypy is in the type hints for third party packages. The default course of action is to add type hints for a whole package at once. You do this either through the package itself if it has a py.typed file, or with a *-stubs package (both specified in PEP 561). But when you add full hints for a package used widely in your code base, you might find yourself facing an insurmountable mountain of errors.

Instead, you can gradually add type hints for the package, piece by piece, fixing small batches of errors as you go. This iterative approach is more psychologically pleasing, and it reduces the chance for mistakes.

Source: Python Type Hints: How to Gradually Add Types for Third Party Packages, an article by Adam Johnson.

python

Lánczos interpolation explained

Lánczos interpolation is one of the most popular methods to resize images, together with linear and cubic interpolation. I’ve spent a lot of time staring at images resampled with Lánczos, and a few years ago, I wondered where it came from. While many sources evaluate interpolation filters visually, I couldn’t find a good explanation of how Lánczos interpolation is derived. So here it is!

Source: Lánczos interpolation explained.

computer science

Sat 08 Oct 2022

My class is bigger than your class

A lot has been written about metaclasses, and most of it is unintelligible. Suffice it to say that metaclasses allow you to define methods for the class itself, instead of for its instances.

Source: My class is bigger than your class, an article by Erez Shinan.
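
To make the "methods for the class itself" point concrete, here is a small sketch that is not from the article. The names `Describable`, `describe`, and `Widget` are made up for illustration.

```python
class Describable(type):
    """A metaclass: methods defined here belong to classes, not instances."""

    def describe(cls):
        # cls is the class being described (for example Widget)
        return f"class {cls.__name__}"


class Widget(metaclass=Describable):
    pass


print(Widget.describe())               # called on the class itself
print(hasattr(Widget(), "describe"))   # instances do not see metaclass methods
```

Attribute lookup on an instance searches the class and its bases but not the metaclass, which is why `describe` is reachable only through `Widget`.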

python

Don’t make databases available on the public internet

The folks at bit.io just published an excellent review of PostgreSQL security, with a startling conclusion: the vast majority of PostgreSQL connections that are happening over the public internet are insecure, due to a combination of server misconfigurations and most clients unfortunately defaulting to unsafe settings.

In short: most Postgres clients either don’t enforce TLS at all on the connections to servers, or enforce that a TLS handshake happens but don’t verify that the certificate is valid and matches the expected hostname. What this means in practice is that those connections can be trivially interposed by anyone sitting between the client and server - a classic Machine in the Middle (MitM) attack.

Source: Don’t make databases available on the public internet, an article by David Anderson.
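
As an illustration of asking a client for both checks, the sketch below builds a libpq-style connection URI with `sslmode=verify-full`, the libpq setting that requires TLS and verifies the certificate chain and hostname. It is not from the article. The helper name `strict_tls_dsn`, the host, the user, and the certificate path are all hypothetical.

```python
from urllib.parse import quote, urlencode


def strict_tls_dsn(host, dbname, user, root_cert, port=5432):
    """Build a PostgreSQL URI that demands a verified TLS connection."""
    params = urlencode({
        "sslmode": "verify-full",   # require TLS, verify the chain and the hostname
        "sslrootcert": root_cert,   # CA bundle used to verify the server certificate
    })
    return f"postgresql://{quote(user)}@{host}:{port}/{quote(dbname)}?{params}"


# Placeholder values, for illustration only
print(strict_tls_dsn("db.example.com", "app", "alice", "/etc/ssl/certs/ca.pem"))
```

Whether a given driver honours these parameters depends on the driver, so treat the URI as a starting point and check the client's documentation.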

Moving files in ZSH - The wonderful world of zmv

From time to time I find myself trying to move a batch of files that have a similar pattern in their names but don't quite match an easy-to-write glob pattern. In the past, I used to write quick and dirty scripts (usually in shell script, nothing fancy) to make it easier to move these files around. A few months ago I discovered zmv, a zsh function that is much better than plain old mv to move files around. Since an example is worth a thousand blog posts, let's jump right into it.

Source: Moving files in ZSH - The wonderful world of zmv, an article by Filipe Kiss.

shell

Fri 07 Oct 2022

Future Proofing SQL with Carefully Placed Errors

Backward compatibility is straightforward. You have full control over new code and you have full knowledge of past data and APIs. Forward compatibility is more challenging. You have full control over new code, but you don't know how data is going to change in the future, and what types of API you're going to have to support.

There are many best practices for maintaining backward and forward compatibility in application code, but it's not very commonly mentioned in relation to SQL. SQL is used to produce critical business information for applications and decision-making, so there's no reason it shouldn't benefit from similar practices.

Source: Future Proofing SQL with Carefully Placed Errors, an article by Haki Benita.

database

Routing PostgreSQL queries

Scaling databases is hard. However, perhaps the lowest hanging fruit is introducing read-only replicas.

A typical load balancing requirement is to route all "logical" read-only queries to a read-only instance. This requirement can be implemented in 2 ways:

Create two database clients (read-write and read-only) and pass them around the application as needed.

Use a middleware to assign each query to a connection pool based on the query itself.

Source: Routing PostgreSQL queries between read-write & read-only instances, an article by Gajus Kuizinas.
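
The second option can be sketched as a function that inspects the SQL text and names a pool. This is a deliberately naive illustration, not the article's implementation. The function name `choose_pool` and the pool labels are hypothetical, and text matching like this will misroute queries such as a SELECT that calls a function with side effects or a SELECT ... INTO.

```python
import re

# Statements that start with SELECT or SHOW are treated as read-only candidates.
READ_ONLY = re.compile(r"^\s*(select|show)\b", re.IGNORECASE)
# SELECT ... FOR UPDATE / FOR SHARE takes row locks, so it must go to the primary.
LOCKING = re.compile(r"\bfor\s+(update|share)\b", re.IGNORECASE)


def choose_pool(sql):
    """Return the name of the pool that should run this query."""
    if READ_ONLY.match(sql) and not LOCKING.search(sql):
        return "read-only"
    # Anything unrecognised defaults to the read-write pool, the safe choice.
    return "read-write"


print(choose_pool("SELECT id FROM users"))
print(choose_pool("UPDATE users SET name = 'x'"))
```

Defaulting to the read-write pool means a misclassified query costs some load on the primary rather than failing on a replica.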

postgres

Single Pass Recursion in Rust

This is the third post in a three-post series. In the first post we developed a stack-safe, ergonomic, and concise method for working with recursive data structures (using a simple expression language as an example). In the second post we made it fully generic, providing a set of generic tools for expanding and collapsing any recursive data structure in Rust.

In this post we will see how to combine these two things - expanding a structure and collapsing it at the same time, performing both operations in a single pass. In the process, we will gain the ability to write arbitrary recursive functions over traditional boxed-pointer recursive structures (instead of the novel RecursiveTree type introduced in my previous post) while retaining stack safety.

Source: Single Pass Recursion in Rust, an article by Inanna Malick.

rust

Thu 06 Oct 2022

Don't worry (about writing Haskell)

As we all know, static type systems are great to ensure correctness of our programs. Sadly, in industry many people are forced to work in languages with a weak type system, such as Haskell. What should you do in such a situation? Quit your job? Give up and despair? Perhaps, but I have another suggestion that I’d like to explain in this post: use our tool agda2hs.

Source: Don't worry (about writing Haskell), be happy (writing Agda instead)!, an article by Jesper Cockx.

haskell

Partitioning in Postgres, 2022 edition

Partitioned tables aren’t an everyday go-to, but are invaluable in some cases, particularly when you have a high-volume table that’s expected to keep growing.

Source: Partitioning in Postgres, 2022 edition.

postgres

Hard Mode Rust

This post is a case study of writing a Rust application using only a minimal, artificially constrained API (e.g., no dynamic memory allocation). It assumes a fair bit of familiarity with the language.

Source: Hard Mode Rust, an article by Aleksey Kladov.

rust

Wed 05 Oct 2022

Rust's Result Type is Cool

If you've worked with Rust before, you know how different its error handling story is from most other languages. The Rust Programming Language explains the two primary ways of raising errors, panicking and the Result type, and how you can propagate the Result type with the ? operator to make recoverable errors explicit without interfering with the happy path in a certain function.

Source: Rust's Result Type is Cool, an article by Eshan Singh.

rust

Just commit more!

Over New Year's this past year I made dura. It’s like auto-backup for Git. It tries to stay out of the way until you’re in a panic, trying to figure out how to rescue your repository from a thoughtless git reset --hard. It makes background commits, real Git commits that you don’t normally have to see in the log, by committing to a different branch than the one you have checked out. Overall, it’s been a blast. I’ve learned a lot from the contributors, like how to write well-formed Rust as well as a bit about Nix.

One recurring question has been, “why don’t you just commit more”?

It’s not a bad question. I clearly went through a lot of effort to build a tool in Rust. I could’ve changed my own behavior. I guess it bugged me how many hours were being wasted on rescuing repositories around the world when the answer is so easy: just commit more.

Source: Just commit more!, an article by Tim Kellogg.

Postgres: a better message queue than Kafka?

Today I’m going to talk about why we made the unconventional decision to build our logging system on top of Postgres, what worked well, what didn’t work well, and how we did it.

Source: Postgres: a better message queue than Kafka?, an article by Pete Hunt.

postgres

Tue 04 Oct 2022

Category Theory

Category theory has come to occupy a central position in contemporary mathematics and theoretical computer science, and is also applied to mathematical physics. Roughly, it is a general mathematical theory of structures and of systems of structures. As category theory is still evolving, its functions are correspondingly developing, expanding and multiplying. At minimum, it is a powerful language, or conceptual framework, allowing us to see the universal components of a family of structures of a given kind, and how structures of different kinds are interrelated. Category theory is both an interesting object of philosophical study, and a potentially powerful formal tool for philosophical investigations of concepts such as space, system, and even truth. It can be applied to the study of logical systems in which case category theory is called “categorical doctrines” at the syntactic, proof-theoretic, and semantic levels. Category theory even leads to a different theoretical conception of set and, as such, to a possible alternative to the standard set theoretical foundation for mathematics. As such, it raises many issues about mathematical ontology and epistemology. Category theory thus affords philosophers and logicians much to use and reflect upon.

Source: Category Theory.

Functional programming in Go

Now that generics have come to Go, what real-world code can we actually write using type parameters? Are they of any practical use in programs? Are there some things that we couldn’t easily write in Go before generics?

Source: Functional programming in Go, an article by John Arundel.

Nix language basics

The Nix language is used to declare packages and configurations to be built by Nix.

It is a domain-specific, purely functional, lazily evaluated, dynamically typed programming language.

Source: Nix language basics.

Mon 03 Oct 2022

Writing a toy DNS Server in Rust using Trust DNS

Ever wondered how you can write a DNS server in Rust? No? Well, too bad, I'm telling you anyways. But don't worry, this is going to be a fun one.

Source: Writing a toy DNS Server in Rust using Trust DNS, an article by Patrick Elsen.

rust

The new wave of Javascript web frameworks

Make sense of the proliferation of new Javascript web frameworks. A deep dive into the problems at scale and the recent evolution of innovation.

Source: The new wave of Javascript web frameworks.

javascript

Software engineering practices

Gergely Orosz started a Twitter conversation asking about recommended “software engineering practices” for development teams.

(I really like his rejection of the term “best practices” here: I always feel it’s prescriptive and misguiding to announce something as “best”.)

I decided to flesh some of my replies out into a longer post.

Source: Software engineering practices, an article by Simon Willison.

software development