Where is every IP Address?

IPinfo builds and sells IPv4 and IPv6 address metadata. This is available either by API, file download or as a Snowflake dataset. When you present an IP address, it'll offer that IP's physical location and ownership information. You can also see if it's used as a VPN or Tor endpoint, is owned by a hosting company and which domain names have been pointed at it.

Source: Where is every IP Address?, an article by Mark Litwintschik.

Modeling uncertainty with PyTorch

Understanding and modeling uncertainty surrounding a machine learning prediction is of critical importance to any production model. It provides a handle to deal with cases where the model strays too far away from its domain of applicability, into territories where using the prediction would be inaccurate or downright dangerous. Think medical diagnosis or self-driving cars.

Source: Modeling uncertainty with PyTorch, an article by Romain Strock.

systemd, 10 years later: a historical and technical retrospective

10 years ago, systemd was announced and swiftly rose to become one of the most persistently controversial and polarizing pieces of software in recent history, and especially in the GNU/Linux world. The quality and nature of debate has not improved in the least from the major flame wars around 2012-2014, and systemd still remains poorly understood and understudied from both a technical and social level despite paradoxically having disproportionate levels of attention focused on it.

Source: systemd, 10 years later: a historical and technical retrospective.

Tracing in Linux and macOS

If you’re coming from Linux, you may be familiar with the ptrace family of commands — strace and ltrace. If you’re coming from macOS, you may have had brief encounters with dtruss or dtrace, instead.

If you haven’t heard of them before or haven’t had the chance to play with them, this post is for you. I’m going to show you what they do and why they are important tools to know.

Source: Tracing in Linux and macOS, an article by Patrick Elsen.

Shell Eval

In this post, we will perform a few experiments to see the usefulness of the eval command for a particular scenario in a POSIX-compliant shell.

Source: Shell Eval, an article by Susam Pal.

Ranges and suffering

If you're familiar with Python, you probably like Rust's ranges a lot. They're generally tidy, are lots more concise than writing out range(...) all the time, and are a ton better than magic syntax for slicing (thanks for that one, Guido).

Unfortunately, the redeeming qualities of Rust's range types stop there. Behind a friendly face lurks what is perhaps the single biggest collection of infuriating design choices in Rust's entire standard library.

Source: Ranges and suffering.

Why might you run your own DNS server?

One of the things that makes DNS difficult to understand is that it’s decentralized. There are thousands (maybe hundreds of thousands? I don’t know!) of authoritative nameservers, and at least 10 million resolvers. And they’re running lots of different software! All these different servers running software means that there’s a lot of inconsistency in how DNS works, which can cause all kinds of frustrating problems.

Source: Why might you run your own DNS server?, an article by Julia Evans.

Three Kinds of Polymorphism in Rust

When faced with a situation where you're writing code that should work across a few different kinds of values without knowing what they are ahead of time, Rust asks slightly more of you than many languages do. Dynamic languages will let you pass in anything, of course, as long as the code works when it's run. Java/C# would ask for an interface or a superclass. Duck-typed languages like Go or TypeScript would want some structural type: an object type with a particular set of properties, for instance.

Rust is different. In Rust there are three main approaches for handling this situation, and each has its own advantages and disadvantages.

Source: Three Kinds of Polymorphism in Rust, an article by Brandon Smith.

Passing runtime data to AWK

In order for one language to cooperate with another usefully via embedded programs in this way, data of some sort needs to be passed between them at runtime, and here there are a few traps with syntax that may catch out unwary shell programmers. We’ll go through a simple example showing the problems, and demonstrate a few potential solutions.

Source: Passing runtime data to AWK, an article by Tom Ryder.

Bashing JSON into Shape with SQLite

It is clear that most of the world has decided that they want to use JSON for their public-facing API endpoints. However, most of the time you will need to deal with storage engines that don't deal with JSON very well. This can be confusing to deal with because you need to fit a square peg into a round hole.

However, SQLite added JSON functions to allow you to munge and modify JSON data in whatever creative ways you want. You can use these and SQLite triggers in order to automatically massage JSON into whatever kind of tables you want. Throw in upserts and you'll be able to make things even more automated.

Source: Bashing JSON into Shape with SQLite, an article by Christine Dodrill.
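
To make the idea concrete, here is a minimal sketch of the pattern the excerpt describes, written by me rather than taken from the article. It uses Python's built-in sqlite3 module and assumes the bundled SQLite has the JSON functions and upsert support (3.24 or newer). The table names raw_events and users, the hits column, and the ingest helper are all hypothetical.

```python
import json
import sqlite3


def build_db():
    """Create an in-memory database with a JSON landing table and a typed table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Landing table: raw JSON documents go in here as text.
        CREATE TABLE raw_events (body TEXT NOT NULL);

        -- Typed table that the trigger keeps up to date.
        CREATE TABLE users (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            hits INTEGER NOT NULL DEFAULT 1
        );

        -- On every insert of raw JSON, pull out the fields and upsert them.
        CREATE TRIGGER raw_events_to_users AFTER INSERT ON raw_events
        BEGIN
            INSERT INTO users (id, name)
            VALUES (json_extract(NEW.body, '$.id'),
                    json_extract(NEW.body, '$.name'))
            ON CONFLICT (id) DO UPDATE
                SET name = excluded.name,
                    hits = hits + 1;
        END;
        """
    )
    return conn


def ingest(conn, payload):
    """Store one JSON document; the trigger does the reshaping."""
    conn.execute("INSERT INTO raw_events (body) VALUES (?)", (json.dumps(payload),))


if __name__ == "__main__":
    db = build_db()
    ingest(db, {"id": 1, "name": "Ada"})
    ingest(db, {"id": 2, "name": "Grace"})
    ingest(db, {"id": 1, "name": "Ada L."})  # same id: updates the existing row
    for row in db.execute("SELECT id, name, hits FROM users ORDER BY id"):
        print(row)
```

The application only ever inserts raw JSON; the trigger and the upsert decide whether that becomes a new row or an update to an existing one.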

Neural Network From Scratch

In this edition of Napkin Math, we'll invoke the spirit of the Napkin Math series to establish a mental model for how a neural network works by building one from scratch. In a future issue we will do napkin math on performance, as establishing the first-principle understanding is plenty of ground to cover for today!

Source: Neural Network From Scratch, an article by Simon Hørup Eskildsen.

In Defense of Async: Function Colors Are Rusty

async was controversial from its inception; it’s still controversial today; and in this post I am throwing my own 2 cents into this controversy, in defense of the feature. I am only going to try to counter one particular line of criticism here, and I don’t anticipate I’ll cover all the nuance of it – this is a multifaceted issue, and I have a day job. I am also going to assume for this post that you have some understanding of how async works, but if you don’t, or just want a refresher I heartily recommend the Tokio tutorial.

Source: In Defense of Async: Function Colors Are Rusty, an article by Jimmy Hartzell.

Optimizing the size of the Go binary

If you have ever written in Go, then the size of the resulting binaries could not escape your attention. Of course, in the age of gigabit links and terabyte drives, this shouldn’t be a big problem. Still, there are situations when you want the size of the binary to be as small as possible, and at the same time you do not want to part with Go.

Source: Optimizing the size of the Go binary.

Profiling and Analyzing Performance of Python Programs

Profiling is integral to any code and performance optimization. Any experience and skill in performance optimization that you might already have will not be very useful if you don't know where to apply it. Therefore, finding bottlenecks in your applications can help you solve performance issues quickly with very little overall effort.

In this article we will look at the tools and techniques that can help us narrow down our focus and find bottlenecks both for CPU and memory consumption, as well as how to implement easy (almost zero-effort) solutions to performance issues in cases where even well targeted code changes won't help anymore.

Source: Profiling and Analyzing Performance of Python Programs, an article by Martin Heinz.
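
As a small illustration of finding a bottleneck before optimizing, here is a sketch using cProfile and pstats from the Python standard library. It is my own example and not code from the article; slow_sum, fast_sum, workload and profile_report are hypothetical names.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Sum of squares below n, computed the slow way with a loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum(n):
    """The same sum of squares below n, using the closed-form formula."""
    return (n - 1) * n * (2 * n - 1) // 6


def workload():
    """A toy program with one obvious hot spot."""
    return slow_sum(200_000), fast_sum(200_000)


def profile_report(func, limit=5):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_report(workload))
```

In the report, slow_sum should account for nearly all of the time, which is the cue to spend the optimization effort there and nowhere else.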

2021 in review: unsupervised brain models

We’re in a golden age of merging AI and neuroscience. No longer tied to conventional publication venues with year-long turnaround times, our field is moving at record speed. As 2021 draws to a close, I wanted to take some time to zoom out and review a recent trend in neuro-AI, the move toward unsupervised learning to explain representations in different brain areas.

Source: 2021 in review: unsupervised brain models, an article by Patrick Mineault.

Visualizing Bayes Theorem

I recently came up with what I think is an intuitive way to explain Bayes’ Theorem. I searched on Google for a while and could not find any article that explains it in this particular way.

Of course there’s the Wikipedia page, that long article by Yudkowsky, and a bunch of other explanations and tutorials. But none of them have any pictures. So without further ado, and with all the chutzpah I can gather, here goes my explanation.

Source: Visualizing Bayes Theorem, an article by Oscar Bonilla.

Almost Always Unsigned

The need for signed integer arithmetic is often misplaced as most integers never represent negative values within a program. The indexing of arrays and iteration count of a loop reflect this concept as well. There should be a propensity to use unsigned integers more often than signed, yet despite this, most code incorrectly chooses to use signed integers almost exclusively.

Source: Almost Always Unsigned, an article by Dale Weiler.

Go Fuzzing

Fuzzing is a type of automated testing which continuously manipulates inputs to a program to find bugs. Go fuzzing uses coverage guidance to intelligently walk through the code being fuzzed to find and report failures to the user. Since it can reach edge cases which humans often miss, fuzz testing can be particularly valuable for finding security exploits and vulnerabilities.

Source: Go Fuzzing.

Databass, Part 1: Queries

It's been a while since my last language series on this blog, but I figured I shouldn't let an entire calendar year go by without doing some technical writing here. This time we'll be working on creating a toy relational database in the vein of Tutorial D, as described in Databases, Types, and The Relational Model: The Third Manifesto by C.J. Date and Hugh Darwen. However, instead of creating a full database language with its own syntax, we're going to embed the database language in Haskell. In particular, we're going to try and get ghc to ensure that queries are well typed as opposed to writing our own type checker.

Source: Databass, Part 1: Queries, an article by Joseph Morag.

The Modern Guide to OAuth

I know what you are thinking: is this really another guide to OAuth 2.0?

Well, yes and no. This guide is different than most of the others out there because it covers all of the ways that we actually use OAuth. It also covers all of the details you need to be an OAuth expert without reading all the specifications or writing your own OAuth server. This document is based on hundreds of conversations and client implementations as well as our experience building FusionAuth, an OAuth server which has been downloaded over a million times.

Source: The Modern Guide to OAuth, an article by Brian Pontarelli and Dan Moore.

James’s OpenBSD setup notes

These are my personal notes on installing, setting up, and using OpenBSD on two Thinkpads (an X220 and a T400). They’re applicable to OpenBSD-current as at 2020-09-05 (somewhere between OpenBSD versions 6.7 and 6.8) - please bear in mind that some things may have changed if you’re using a different version.

Source: James's OpenBSD setup notes.

Daddy's Home (2015)

Brad Whitaker is a radio host trying to get his stepchildren to love him and call him Dad. But his plans turn upside down when their biological father, Dusty Mayron, returns.

In the evening Adam, Alice and I watched Daddy's Home. I liked the movie and give it a 7 out of 10. I also liked the soundtrack: Here Comes Your Man (Pixies), Self Esteem (The Offspring), and Hate to Say I Told You So (The Hives).

My Setup for Self-Hosting Dozens of Web Applications

There are nearly infinite options available for hosting software today and more come out every day. However, many articles and guides you'll find online for this kind of thing are either from public cloud providers or companies with massive infrastructure, complex application needs, and huge amounts of traffic.

I wanted to write this up mostly to share the decisions I made for the architecture and why I've done things the way I have. Although my needs are much smaller-scale and I don't currently charge any money for anything I'm running, I still want to provide the best possible experience for my sites' users and protect all the work I've put into my projects.

Source: My Setup for Self-Hosting Dozens of Web Applications + Services on a Single Server, an article by Casey Primozic.

How to back up your Git repositories

Making backups is important. You don’t want to lose all your information because of a broken device or a stolen account. One proposed solution is the 3–2–1 method (3 copies, on at least 2 different devices, and 1 of them off-site) and you should make at least one full backup every year (that could match the World Backup Day). What to back up is up to you. You can back up your contacts, emails, messages, social network content… and your code.

Backing up code is a bit of a tricky question. Most people host their code on their computer, probably with Git and maybe on Github. But having one copy is having no copies. You don’t want to depend on Github exclusively for your code, and it is wise to have at least one extra copy. The question is then, how to make that extra copy.

Source: How to back up your Git repositories, an article by Alberto de Murga.

The Matrix (1999)

When a beautiful stranger leads computer hacker Neo to a forbidding underworld, he discovers the shocking truth--the life he knows is the elaborate deception of an evil cyber-intelligence.

In the afternoon we watched The Matrix. I like this movie a lot and give it a solid 8.5 out of 10.

Consider SQLite

If you were creating a web app from scratch today, what database would you use? Probably the most frequent answer I see to this is Postgres, although there is a wide range of common answers: MySQL, MariaDB, Microsoft SQL Server, MongoDB, etc. Today I want you to consider: what if SQLite would do just fine?

Source: Consider SQLite, an article by Wesley Aptekar-Cassels.

5 lessons learned when I TDD an algorithm in JavaScript

I just found out that Uncle Bob wrote an article about TDDing the Diamond Square algorithm (yea, I’m slow on catching up sometimes).

In the article, Uncle Bob leads us through a way to TDD an algorithm. It’s pretty nice and gave me a few insights as to how to mock and test in intervals until the algorithm emerges.

Problem is – Uncle Bob’s code is not JavaScript!!!

So I sat down and reimplemented the algorithm. This time, I used the lessons learned from Uncle Bob’s article.

Source: 5 lessons learned when I TDD an algorithm in JavaScript, an article by Yonatan Kra.

Optimizing Postgres Queries at Scale

Heap's thousands of customers can build queries in the Heap UI to answer almost any question about how users are using their product. Optimizing all of these queries across all our customers presents special challenges you wouldn't typically encounter if you were optimizing the performance of a small set of queries within a typical app.

This post is about why this scale requires us to conduct performance experiments to optimize our SQL, and it details how we conduct those experiments.

Source: Optimizing Postgres Queries at Scale, an article by Matt Dupree.

Predictive CPU isolation of containers at Netflix

Because microprocessors are so fast, computer architecture design has evolved towards adding various levels of caching between compute units and the main memory, in order to hide the latency of bringing the bits to the brains. However, the key insight here is that these caches are partially shared among the CPUs, which means that perfect performance isolation of co-hosted containers is not possible. If the container running on the core next to your container suddenly decides to fetch a lot of data from the RAM, it will inevitably result in more cache misses for you (and hence a potential performance degradation).

Source: Predictive CPU isolation of containers at Netflix, an article by Benoit Rostykus and Gabriel Hartmann.

Using PostgreSQL and SQL to Randomly Sample Data

In the last post of this series we introduced trying to model fire probability in Northern California based on weather data. We showed how to use SQL to do data shaping and preparation. We ended with a data set that was ready with all the fire occurrences and weather data in a single table almost prepped for logistic regression.

There is now one more step: sample the data. If you have worked with logistic regression before you know you should try to balance the number of occurrences (1) with absences (0). To do this we are going to sample out from the non_fire_weather equal to the count in fire_weather and then combine them into one table.

Source: Using PostgreSQL and SQL to Randomly Sample Data, an article by Steve Pousty.
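
The balancing step in the excerpt can be sketched roughly as follows. This is my own illustration, not the article's PostgreSQL code: it runs the SQL against an in-memory SQLite database through Python's sqlite3 module. The table names fire_weather and non_fire_weather come from the excerpt; the temperature column, the model_data table and the helper functions are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Build a tiny database: 3 fire rows and 10 non-fire rows (made-up values)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fire_weather (temperature REAL)")
    conn.execute("CREATE TABLE non_fire_weather (temperature REAL)")
    conn.executemany(
        "INSERT INTO fire_weather VALUES (?)", [(35.0,), (38.5,), (41.0,)]
    )
    conn.executemany(
        "INSERT INTO non_fire_weather VALUES (?)",
        [(float(t),) for t in range(10, 30, 2)],
    )
    return conn


def build_balanced_sample(conn):
    """Combine all fire rows (1) with an equally sized random sample of non-fire rows (0)."""
    fire_count = conn.execute("SELECT count(*) FROM fire_weather").fetchone()[0]

    conn.execute("DROP TABLE IF EXISTS model_data")
    conn.execute("CREATE TABLE model_data (temperature REAL, fire INTEGER)")

    # Every occurrence is kept and labelled 1.
    conn.execute("INSERT INTO model_data SELECT temperature, 1 FROM fire_weather")

    # Absences are shuffled and cut off at the number of occurrences.
    conn.execute(
        "INSERT INTO model_data "
        "SELECT temperature, 0 FROM non_fire_weather "
        "ORDER BY random() LIMIT ?",
        (fire_count,),
    )
    return fire_count


if __name__ == "__main__":
    db = make_demo_db()
    build_balanced_sample(db)
    for row in db.execute("SELECT fire, count(*) FROM model_data GROUP BY fire"):
        print(row)
```

Ordering by random() reads and sorts the whole table, which is fine for a sketch but may be too slow for large tables.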

Image (2014)

When Eva, a young journalist, films a documentary about the mean streets in Brussels she soon gets involved in the life of a young Moroccan guy.

Esme and I watched Image. I liked the movie and give it a 7.5 out of 10.