week 01, 2022

Saving a Third of Our Memory by Re-ordering Go Struct Fields

We had an application at one of my previous companies that typically ran with ~2GB in memory at any given time, but by simply changing the order of some uint variables we managed to drop the memory usage to less than 1.4GB. Let’s dive into how inefficient field ordering in Go structs can have a huge impact on the memory footprint of a program.

Source: Saving a Third of Our Memory by Re-ordering Go Struct Fields, an article by Lane Wagner.
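
The effect comes from alignment padding, and it is not specific to Go. The sketch below is not taken from the article: it uses Python's ctypes module, which follows the platform's C layout rules, as a stand-in for Go's struct layout. The names Padded and Compact are made up for illustration.

```python
import ctypes


class Padded(ctypes.Structure):
    # Small field, large field, small field: the compiler has to insert
    # padding so that the 8-byte field starts on an aligned address.
    _fields_ = [
        ("a", ctypes.c_uint8),
        ("b", ctypes.c_uint64),
        ("c", ctypes.c_uint8),
    ]


class Compact(ctypes.Structure):
    # Same three fields, largest first: the two small fields share
    # what would otherwise be trailing padding.
    _fields_ = [
        ("b", ctypes.c_uint64),
        ("a", ctypes.c_uint8),
        ("c", ctypes.c_uint8),
    ]


padded_size = ctypes.sizeof(Padded)
compact_size = ctypes.sizeof(Compact)
print("padded: ", padded_size, "bytes")
print("compact:", compact_size, "bytes")
```

On a typical 64-bit platform the first layout should come out larger than the second, even though both hold the same ten bytes of data. Multiplied across millions of live values, that difference is the kind of saving the article describes.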

Hashing is not encryption

In a job interview years ago, the interviewer asked me to explain the difference between encryption, encoding, and hashing. At the time I was working for a company that specialized in encryption, so I took knowing the difference for granted.

It wasn’t until much later that I understood how easily most folks can confuse the three topics for one another. Let’s take a look at each in turn.

Source: Hashing is not encryption, an article by Eric Mann.
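
As a rough illustration that is not taken from the article, the sketch below contrasts two of the three ideas using only the Python standard library: Base64 encoding is reversible by anyone, while a SHA-256 hash is a fixed-length digest with no inverse operation. Encryption is left out because it needs a key and a cryptography library; the sample message is an arbitrary assumption.

```python
import base64
import hashlib

message = b"correct horse battery staple"

# Encoding: a reversible change of representation, no secret involved.
encoded = base64.b64encode(message)
decoded = base64.b64decode(encoded)

# Hashing: a one-way, fixed-length digest. The same input always gives
# the same digest, and there is no function that turns it back.
digest = hashlib.sha256(message).hexdigest()
other_digest = hashlib.sha256(message + b"!").hexdigest()

print("encoded:", encoded.decode("ascii"))
print("decoded matches original:", decoded == message)
print("digest length (hex chars):", len(digest))
print("digests differ after a one-byte change:", digest != other_digest)
```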

Where is every IP Address?

IPinfo builds and sells IPv4 and IPv6 address metadata. This is available either by API, file download or as a Snowflake dataset. When you present an IP address, it'll offer that IP's physical location and ownership information. You can also see if it's used as a VPN or Tor endpoint, is owned by a hosting company and which domain names have been pointed at it.

Source: Where is every IP Address?, an article by Mark Litwintschik.

Modeling uncertainty with PyTorch

Understanding and modeling uncertainty surrounding a machine learning prediction is of critical importance to any production model. It provides a handle to deal with cases where the model strays too far away from its domain of applicability, into territories where using the prediction would be inaccurate or downright dangerous. Think medical diagnosis or self-driving cars.

Source: Modeling uncertainty with PyTorch, an article by Romain Strock.

systemd, 10 years later: a historical and technical retrospective

10 years ago, systemd was announced and swiftly rose to become one of the most persistently controversial and polarizing pieces of software in recent history, especially in the GNU/Linux world. The quality and nature of debate has not improved in the least from the major flame wars around 2012-2014, and systemd still remains poorly understood and understudied at both a technical and a social level despite paradoxically having disproportionate levels of attention focused on it.

Source: systemd, 10 years later: a historical and technical retrospective.

Tracing in Linux and macOS

If you’re coming from Linux, you may be familiar with the ptrace family of commands — strace and ltrace. If you’re coming from macOS, you may have had brief encounters with dtruss or dtrace, instead.

If you haven’t heard of them before or haven’t had the chance to play with them, this post is for you. I’m going to show you what they do and why they are important tools to know.

Source: Tracing in Linux and macOS, an article by Patrick Elsen.

Shell Eval

In this post, we will perform a few experiments to see the usefulness of the eval command for a particular scenario in a POSIX-compliant shell.

Source: Shell Eval, an article by Susam Pal.

Ranges and suffering

If you're familiar with Python, you probably like Rust's ranges a lot. They're generally tidy, are lots more concise than writing out range(...) all the time, and are a ton better than magic syntax for slicing (thanks for that one, Guido).

Unfortunately, the redeeming qualities of Rust's range types stop there. Behind a friendly face lurks what is perhaps the single biggest collection of infuriating design choices in Rust's entire standard library.

Source: Ranges and suffering.

Why might you run your own DNS server?

One of the things that makes DNS difficult to understand is that it’s decentralized. There are thousands (maybe hundreds of thousands? I don’t know!) of authoritative nameservers, and at least 10 million resolvers. And they’re running lots of different software! All these different servers running different software means that there’s a lot of inconsistency in how DNS works, which can cause all kinds of frustrating problems.

Source: Why might you run your own DNS server?, an article by Julia Evans.

Three Kinds of Polymorphism in Rust

When faced with a situation where you're writing code that should work across a few different kinds of values without knowing what they are ahead of time, Rust asks slightly more of you than many languages do. Dynamic languages will let you pass in anything, of course, as long as the code works when it's run. Java/C# would ask for an interface or a superclass. Duck-typed languages like Go or TypeScript would want some structural type: an object type with a particular set of properties, for instance.

Rust is different. In Rust there are three main approaches for handling this situation, and each has its own advantages and disadvantages.

Source: Three Kinds of Polymorphism in Rust, an article by Brandon Smith.

Passing runtime data to AWK

In order for one language to cooperate with another usefully via embedded programs in this way, data of some sort needs to be passed between them at runtime, and here there are a few traps with syntax that may catch out unwary shell programmers. We’ll go through a simple example showing the problems, and demonstrate a few potential solutions.

Source: Passing runtime data to AWK, an article by Tom Ryder.

Bashing JSON into Shape with SQLite

It is clear that most of the world has decided that they want to use JSON for their public-facing API endpoints. However, most of the time you will need to deal with storage engines that don't deal with JSON very well. This can be confusing to deal with because you need to fit a square peg into a round hole.

However, SQLite added JSON functions to allow you to munge and modify JSON data in whatever creative ways you want. You can use these and SQLite triggers in order to automatically massage JSON into whatever kind of tables you want. Throw in upserts and you'll be able to make things even more automated.

Source: Bashing JSON into Shape with SQLite, an article by Christine Dodrill.
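
A minimal sketch of the idea, not taken from the article: it uses Python's built-in sqlite3 module, SQLite's json_extract function and an upsert to fold JSON documents into an ordinary table. The users table and the sample documents are invented for illustration, triggers are left out to keep it short, and it assumes the bundled SQLite has the JSON functions and upsert support available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Pull fields out of the JSON document and upsert them into typed columns.
# A second document with the same id updates the row instead of failing.
UPSERT = """
INSERT INTO users (id, name)
VALUES (json_extract(:doc, '$.id'), json_extract(:doc, '$.name'))
ON CONFLICT (id) DO UPDATE SET name = excluded.name
"""

documents = (
    '{"id": 1, "name": "Ada"}',
    '{"id": 1, "name": "Ada Lovelace"}',
    '{"id": 2, "name": "Grace"}',
)
for doc in documents:
    conn.execute(UPSERT, {"doc": doc})

rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(rows)
```

The same INSERT could live inside an AFTER INSERT trigger on a table of raw JSON bodies, which is the more automated setup the excerpt alludes to.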

Neural Network From Scratch

In this edition of Napkin Math, we'll invoke the spirit of the Napkin Math series to establish a mental model for how a neural network works by building one from scratch. In a future issue we will do napkin math on performance, as establishing the first-principle understanding is plenty of ground to cover for today!

Source: Neural Network From Scratch, an article by Simon Hørup Eskildsen.

In Defense of Async: Function Colors Are Rusty

async was controversial from its inception; it’s still controversial today; and in this post I am throwing my own 2 cents into this controversy, in defense of the feature. I am only going to try to counter one particular line of criticism here, and I don’t anticipate I’ll cover all the nuance of it – this is a multifaceted issue, and I have a day job. I am also going to assume for this post that you have some understanding of how async works, but if you don’t, or just want a refresher, I heartily recommend the Tokio tutorial.

Source: In Defense of Async: Function Colors Are Rusty, an article by Jimmy Hartzell.

Optimizing the size of the Go binary

If you have ever written in Go, then the size of the resulting binaries could not escape your attention. Of course, in the age of gigabit links and terabyte drives, this shouldn’t be a big problem. Still, there are situations when you want the size of the binary to be as small as possible, and at the same time you do not want to part with Go.

Source: Optimizing the size of the Go binary.

Profiling and Analyzing Performance of Python Programs

Profiling is integral to any code and performance optimization. Any experience and skill in performance optimization that you might already have will not be very useful if you don't know where to apply it. Therefore, finding bottlenecks in your applications can help you solve performance issues quickly with very little overall effort.

In this article we will look at the tools and techniques that can help us narrow down our focus and find bottlenecks both for CPU and memory consumption, as well as how to implement easy (almost zero-effort) solutions to performance issues in cases where even well targeted code changes won't help anymore.

Source: Profiling and Analyzing Performance of Python Programs, an article by Martin Heinz.
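
As a small taste of what CPU profiling looks like, here is a sketch using the standard library's cProfile and pstats modules. It is not taken from the article, and slow_sum is a deliberately naive, made-up function that gives the profiler something to measure.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Sum of squares computed the slow way, one iteration at a time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# Collect timing data only around the code we care about.
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(200_000)
profiler.disable()

# Render the five most expensive entries by cumulative time into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

For a whole script, running it with python -m cProfile -s cumulative gives a similar table without changing the code.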

2021 in review: unsupervised brain models

We’re in a golden age of merging AI and neuroscience. No longer tied to conventional publication venues with year-long turnaround times, our field is moving at record speed. As 2021 draws to a close, I wanted to take some time to zoom out and review a recent trend in neuro-AI, the move toward unsupervised learning to explain representations in different brain areas.

Source: 2021 in review: unsupervised brain models, an article by Patrick Mineault.

Visualizing Bayes Theorem

I recently came up with what I think is an intuitive way to explain Bayes’ Theorem. I searched Google for a while and could not find any article that explains it in this particular way.

Of course there’s the Wikipedia page, that long article by Yudkowsky, and a bunch of other explanations and tutorials. But none of them have any pictures. So without further ado, and with all the chutzpah I can gather, here goes my explanation.

Source: Visualizing Bayes Theorem, an article by Oscar Bonilla.
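
For readers who prefer numbers to pictures, the theorem states P(A|B) = P(B|A) * P(A) / P(B). The sketch below is not from the article; it works through a made-up diagnostic test where the prevalence (1%), the true positive rate (90%) and the false positive rate (5%) are all assumed values chosen for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(A|B) given P(A), P(B|A) and P(B|not A)."""
    # P(B) by the law of total probability.
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical test: 1% prevalence, 90% true positives, 5% false positives.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(f"P(condition | positive test) = {p:.3f}")  # prints 0.154
```

Even with a fairly accurate test, a positive result here leaves only about a 15% chance of having the condition, because true positives from the rare group are outnumbered by false positives from the much larger healthy group.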