Plurrrr

week 38, 2021

A Brief History of Markov Chains

One of the most common simple techniques for generating text is a Markov chain. The algorithm takes an input text or texts, divides it into tokens (usually letters or words), and generates new text based on the statistics of short sequences of those tokens. If you're interested in the technical details, Allison Parrish has a good tutorial on implementing Markov chains in Python.

This technique has been around for a very long time, but it's slightly unclear when it was first developed. The following is a slightly sketchy work-in-progress timeline of what I've been able to work out.

Source: A Brief History of Markov Chains, an article by Martin O'Leary.
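
A minimal sketch of the technique described in the excerpt, assuming word tokens and a chain of order 2. The names `build_model` and `generate` are my own for illustration and are not taken from the article or from Allison Parrish's tutorial.

```python
import random
from collections import defaultdict


def build_model(text, order=2):
    """Map each run of `order` words to the words observed right after it."""
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        model[state].append(words[i + order])
    return model


def generate(model, length=20, seed=None):
    """Walk the chain from a random starting state, emitting at most `length` words."""
    rng = random.Random(seed)
    state = rng.choice(list(model))
    output = list(state)
    while len(output) < length:
        followers = model.get(state)
        if not followers:  # dead end: this state only occurred at the very end of the text
            break
        # Followers that occurred more often appear more often in the list,
        # so a uniform choice respects the observed statistics.
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output[:length])
```

Calling `generate(build_model(some_text))` returns a short string whose consecutive word pairs all occurred somewhere in `some_text`.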

What Makes a Good Changelog

Changelogs are important communication tools, and should be made for people to enjoy reading. There should always be some level of editing, and tailoring the message to who you’re talking to, and what you're talking about. Relying on strict commit styles and auto-generating tools limits that amount of tailoring, even if the tool allows you to customize the end user output.

Source: What Makes a Good Changelog, an article by Zeno Rocha and Herbert Lui.

The GIL and its effects on Python multithreading

As you probably know, the GIL stands for the Global Interpreter Lock, and its job is to make the CPython interpreter thread-safe. The GIL allows only one OS thread to execute Python bytecode at any given time, and the consequence of this is that it's not possible to speed up CPU-intensive Python code by distributing the work among multiple threads. This is, however, not the only negative effect of the GIL. The GIL introduces overhead that makes multi-threaded programs slower, and what is more surprising, it can even have an impact on I/O-bound threads.

In this post I'd like to tell you more about non-obvious effects of the GIL. Along the way, we'll discuss what the GIL really is, why it exists, how it works, and how it's going to affect Python concurrency in the future.

Source: Python behind the scenes #13: the GIL and its effects on Python multithreading, an article by Victor Skvortsov.
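
A small experiment to try the claim about CPU-intensive code yourself: it times the same pure-Python loop run twice in a row and run in two threads. The function names and the loop size are my own assumptions, not code from the article. On a CPython build with the GIL the two timings are usually close, but exact numbers depend on the machine and the Python version.

```python
import threading
import time


def countdown(n):
    """Pure-Python CPU-bound loop; the running thread needs the GIL to execute it."""
    while n > 0:
        n -= 1


def time_sequential(n, runs=2):
    """Run the loop `runs` times one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(runs):
        countdown(n)
    return time.perf_counter() - start


def time_threaded(n, runs=2):
    """Run the loop in `runs` threads at once and return the elapsed seconds."""
    threads = [threading.Thread(target=countdown, args=(n,)) for _ in range(runs)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 5_000_000
    print(f"sequential: {time_sequential(n):.2f}s")
    print(f"threaded:   {time_threaded(n):.2f}s")
```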

Make Python Run Faster: A Machine Learning Perspective

Python has a great ecosystem for machine learning, but deep learning is computationally intensive and Python is slow. In this post, I will discuss different ways that helped to make my code run faster, more specifically in physics simulation and reinforcement learning for character animations. Nevertheless, most of the tips are applicable to all computationally intensive programs.

With that, here are the 7 ways to make any Python code run faster:

  1. Make Your Machine Run Faster
  2. Try Different Python Versions and Distributions
  3. Profile and Optimize
  4. Be Mindful of Type Conversions
  5. Be Strategic About Memory Allocations
  6. Be Clever When Writing If-Statements
  7. Be Cautious When Using Packages

Source: Make Python Run Faster: A Machine Learning Perspective.
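
For tip 3, a minimal profiling sketch using the standard library's `cProfile` and `pstats`. The helper `profile_call` and the deliberately naive `slow_sum_of_squares` are hypothetical examples of mine, not code from the article.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """A deliberately naive loop to give the profiler something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)  # show the 5 most expensive entries
    return result, stream.getvalue()
```

Printing the report from `profile_call(slow_sum_of_squares, 1_000_000)` shows where the time went before you start optimizing.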

How Replication Works in MySQL

MySQL has a feature known as replication where data from one database server (referred to as the source or primary) is copied to other database servers (replicas). Conceptually, it’s pretty straightforward. All of your data mutations (INSERT, UPDATE, DELETE, ALTER, etc.) are done against the primary database. Those commands are then copied and applied to the replicas.

Source: How Replication Works in MySQL, an article by Ryan Siemens.
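
A toy model of that idea, assuming in-memory SQLite databases stand in for the MySQL servers and a plain Python list stands in for the replication log. The `Primary` and `Replica` classes are hypothetical and do not mirror MySQL's actual mechanism or API; they only show mutations being recorded on the primary and replayed in order on a replica.

```python
import sqlite3


class Primary:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.log = []  # ordered record of every mutation applied here

    def execute(self, sql, params=()):
        """Apply a mutation locally, then record it for the replicas."""
        self.db.execute(sql, params)
        self.db.commit()
        self.log.append((sql, params))


class Replica:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.position = 0  # index of the next log entry to apply

    def catch_up(self, primary):
        """Replay every mutation recorded since the last catch-up, in order."""
        for sql, params in primary.log[self.position:]:
            self.db.execute(sql, params)
        self.db.commit()
        self.position = len(primary.log)
```

Until `catch_up` runs, the replica serves stale data, which is a small-scale picture of replication lag.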

Python as a build tool

Normally, when starting a Java project (or any other programming project, really), you don’t want to reinvent the wheel. You go with the de-facto build system, folder structure, environment etc. The ones that the rest of the world is using.

Yet, both Skija and JWM are built using Python scripts instead of more traditional Ant/Maven/Gradle/SBT. Why? Let’s find out!

Source: Python as a build tool.

Ocean Prey

An off-duty Coast Guardsman is fishing with his family when he calls in some suspicious behavior from a nearby boat. It's a snazzy craft, slick and outfitted with extra horsepower, and is zipping along until it slows to pick up a surfaced diver . . . a diver who was apparently alone, without his own boat, in the middle of the ocean. None of it makes sense unless there's something hinky going on, and his hunch is proved right when all three Guardsmen who come out to investigate are shot and killed.

They're federal officers killed on the job, which means the case is the FBI's turf. When the FBI's investigation stalls out, they call in Lucas Davenport. And when his case turns lethal, Davenport will need to bring in every asset he can claim, including a detective with a fundamentally criminal mind: Virgil Flowers.

In the evening I started reading Ocean Prey, a Lucas Davenport and Virgil Flowers novel by John Sandford.

Designing state machines in Rust

In Rust, you often hear about state machines. Futures are state machines! I thought it would be cool to read more about it. I came across this blog post (funnily enough, by a friend and mentor of mine) which really helped me! I highly recommend reading it. In this post, I'm just noting the part I found relevant. The example here is from her blog post.

Source: Designing state machines in Rust, an article by Senyo Simpson.

Taming Go’s Memory Usage

A couple months ago, we faced a question many young startups face. Should we rewrite our system in Rust?

At the time of the decision, we were a Go and Python shop. The tool we’re building passively watches API traffic to provide “one-click,” API-centric visibility, by analyzing the API traffic. Our users run an agent that sends API traffic data to our cloud for analysis. Our users were using us to watch more and more traffic in staging and production—and they were starting to complain about the memory usage.

This led me to spend 25 days in the depths of despair and the details of Go memory management, trying to get our memory footprint to an acceptable level. This was no easy feat, as Go is a memory-managed language with limited ability to tune garbage collection.

Source: Taming Go’s Memory Usage, or How We Avoided Rewriting Our Client in Rust, an article by Mark Gritter.

The Actor Reentrancy Problem in Swift

The first time I saw the WWDC presentation about actors, I was thrilled with what they are capable of and how they will change the way we write asynchronous code in the near future. By using actors, writing asynchronous code that is free from data races and deadlocks has never been easier.

All that aside, that doesn’t mean that actors are free from threading issues. If we are not careful enough, we might accidentally introduce a reentrancy problem when using actors.

Source: The Actor Reentrancy Problem in Swift, an article by Lee Kah Seng.

Everyone’s a (Perl) critic, and you can be too!

The perlcritic tool is often your first defense against “awkward, hard to read, error-prone, or unconventional constructs in your code,” per its description. It’s part of a class of programs historically known as linters, so-called because like a clothes dryer machine’s lint trap, they “detect small errors with big effects.” (Another such linter is perltidy, which I’ve referenced in the past.)

Source: Everyone’s a (Perl) critic, and you can be too!, an article by Mark Gardner.

Inspecting coredumps like it's 2021

A coredump is a snapshot of a process’s memory that is usually created by the kernel when a crash happens. These can be fairly helpful to find out which part of the code broke by looking at the backtrace or finding any kind of corruption by introspecting the memory itself. Unfortunately it can be a bit tedious to work with these. This article aims to give an overview of helpful tools & tricks to leverage the full power of coredumps on Nix-based systems.

Source: Inspecting coredumps like it's 2021, an article by Maximilian Bosch.

Python Plotting for Exploratory Data Analysis

Plotting is an essential component of data analysis. As a data scientist, I spend a significant amount of my time making simple plots to understand complex data sets (exploratory data analysis) and help others understand them (presentations).

In particular, I make a lot of bar charts (including histograms), line plots (including time series), scatter plots, and density plots from data in Pandas data frames. I often want to facet these on various categorical variables and layer them on a common grid.

Source: Python Plotting for Exploratory Data Analysis, an article by Tim Hopper.

Filtering With PiHole and Podman

I’ve long been a fan of filtering at the DNS level. The approach that I’ll outline here is similar to what our devices did at Luma. It is not perfect by any means, but if it is set up well then it provides a chokepoint that has a solid return on investment for the amount of time and effort that it takes to stand up. It isn’t going to keep any sophisticated actors at bay, but it does demonstrably improve performance and has a good chance of reducing the headaches of administrating a network and the systems on it for a connected family.

Source: It's Always DNS. Filtering With PiHole and Podman, an article by Daniel Peck.

Currying

Currying is an advanced technique of working with functions. It’s used not only in JavaScript, but in other languages as well.

Currying is a transformation of functions that translates a function from callable as f(a, b, c) into callable as f(a)(b)(c).

Currying doesn’t call a function. It just transforms it.

Source: Currying, an article by Ilya Kantor.
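
The article uses JavaScript; the same transformation can be sketched in Python. This `curry` helper is my own illustration, and it assumes the wrapped function takes a fixed number of positional parameters, at least one.

```python
import inspect


def curry(func):
    """Turn f(a, b, c) into f(a)(b)(c); assumes one or more positional parameters."""
    arity = len(inspect.signature(func).parameters)

    def collect(args):
        # Only call the original function once every argument has been supplied.
        if len(args) >= arity:
            return func(*args)
        return lambda arg: collect(args + (arg,))

    return collect(())


def volume(a, b, c):
    return a * b * c


curried_volume = curry(volume)
```

Here `curried_volume(2)(3)(4)` gives the same result as `volume(2, 3, 4)`, and `curried_volume(2)(3)` is a reusable function that still waits for `c`.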

Data Compression With Arithmetic Coding

Arithmetic coding is a common algorithm used in both lossless and lossy data compression algorithms.

It is an entropy encoding technique, in which the frequently seen symbols are encoded with fewer bits than rarely seen symbols. It has some advantages over well-known techniques such as Huffman coding. This article will describe the CACM87 implementation of arithmetic coding in detail, giving you a good understanding of all the details needed to implement it.

Source: Data Compression With Arithmetic Coding, an article by Mark Nelson.
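
To get a feel for the idea, here is a toy version that uses exact fractions. It is not the CACM87 implementation the article describes, which works with fixed-width integers and emits bits incrementally; the function names and the fixed probability table are assumptions of mine. It also assumes the probabilities sum to 1 and that the decoder is told the message length.

```python
from fractions import Fraction


def build_ranges(probabilities):
    """Give each symbol a sub-interval of [0, 1) whose width is its probability."""
    ranges, low = {}, Fraction(0)
    for symbol, p in probabilities.items():
        ranges[symbol] = (low, low + p)
        low += p
    return ranges


def encode(message, ranges):
    """Narrow [low, high) once per symbol and return a number inside the final interval."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low
        sym_low, sym_high = ranges[symbol]
        low, high = low + width * sym_low, low + width * sym_high
    return (low + high) / 2


def decode(code, length, ranges):
    """Recover `length` symbols by repeatedly locating `code` and rescaling it."""
    out = []
    for _ in range(length):
        for symbol, (sym_low, sym_high) in ranges.items():
            if sym_low <= code < sym_high:
                out.append(symbol)
                code = (code - sym_low) / (sym_high - sym_low)
                break
    return "".join(out)
```

Frequent symbols own wider sub-intervals, so encoding them shrinks the interval less, and a wider final interval needs fewer bits to pin down. That is where the compression comes from.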

Edge

Behind the well-known U.S. security organizations—the FBI and CIA among them—lies a heavily guarded, anonymous government agency dedicated to intelligence surveillance and to a highly specialized brand of citizen protection.

Shock waves of alarm ripple through the clandestine agency when Washington, D.C., police detective Ryan Kessler inexplicably becomes the target of Henry Loving, a seasoned, ruthless “lifter” hired to obtain information using whatever means necessary. While Loving is deft at torture, his expertise lies in getting an “edge” on his victim—leverage—usually by kidnapping or threatening family until the “primary” caves under pressure.

In the evening I started reading Edge by Jeffery Deaver.