Plurrrr

a tumblelog
week 38, 2020

How efficient is deduplication?

How efficient is Tarsnap's deduplication? It depends on what kind of data you are backing up, and how often. Let's look at a few examples, produced via the --print-stats option.

Source: Tarsnap - Deduplication examples.
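
To get a feel for why the answer depends on the data and on how often you back up, here is a minimal sketch of chunk-level deduplication. It is not Tarsnap's algorithm: the fixed chunk size, the in-memory `store` dictionary and the `backup` helper are all assumptions made up for this illustration.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, for illustration only


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def backup(data: bytes, store: dict) -> int:
    """Add data to the chunk store and return the number of new bytes stored."""
    new_bytes = 0
    for chunk in split_chunks(data):
        key = hashlib.sha256(chunk).hexdigest()
        if key not in store:  # only chunks not seen before cost space
            store[key] = chunk
            new_bytes += len(chunk)
    return new_bytes


store = {}
# Ten distinct chunks on the first day, the same data plus one new chunk on the second.
monday = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))
tuesday = monday + bytes([10]) * CHUNK_SIZE

first = backup(monday, store)
second = backup(tuesday, store)
print(f"first backup stored {first} bytes, second stored {second} bytes")
```

In this toy setup the second backup only pays for the one chunk that changed, which is the effect the statistics in the linked examples quantify for real data.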

FreeBSD Subversion to Git Migration: Pt 1 Why?

There's a number of factors motivating the change. We'll explore the reasons, from long term viability of Subversion, to wider support for tools that will make the project better. Today I'll enumerate these points. There are some logistical points around how the decision was made. I'll not get into the politics about how we got here, though. While interesting for insiders who like to argue and quibble, they are no more relevant to the larger community than the color of the delivery truck that delivered groceries to your grocer this morning (even if it had the latest episode of a cool, scrappy cartoon cat that was involved in a multi-year arc wooing the love of his life by buying food at this store).

Source: FreeBSD Subversion to Git Migration: Pt 1 Why?, an article by Warner Losh.

Artificial Neural Networks — The Activation Function

In a neural network, numeric data points, called inputs, are fed into the neurons in the input layer. Each neuron has a weight, and multiplying the input number with the weight gives the output of the neuron, which is transferred to the next layer.

The activation function is a mathematical “gate” in between the input feeding the current neuron and its output going to the next layer. It can be as simple as a step function that turns the neuron output on and off, depending on a rule or threshold. Or it can be a transformation that maps the input signals into output signals that are needed for the neural network to function.

Source: Artificial Neural Networks — The Activation Function, an article by Ahmad Haddad.
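
The two kinds of "gate" described in the excerpt can be sketched in a few lines of Python. This follows the excerpt's simplified single-input neuron (input times weight, no bias and no sum over several inputs); the function names and the default threshold of 0 are assumptions for this illustration.

```python
import math


def step(x: float, threshold: float = 0.0) -> float:
    """Step activation: the neuron is fully on or fully off."""
    return 1.0 if x >= threshold else 0.0


def sigmoid(x: float) -> float:
    """Sigmoid activation: a smooth mapping of any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(value: float, weight: float, activation) -> float:
    """Multiply the input by the weight, then pass it through the gate."""
    return activation(value * weight)


for value in (-2.0, 0.0, 2.0):
    print(value, neuron_output(value, 0.5, step), neuron_output(value, 0.5, sigmoid))
```

The step function throws away how far the input is from the threshold, while the sigmoid keeps that information as a value between 0 and 1.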

Another sunflower

In the afternoon I took a photo of another sunflower flowering. This is the second one in our garden; the first one opened on the 31st of August 2020.

Another sunflower flowering in our garden.

In the above photo you can see the first one, which has finished flowering (top, center), and another one that's close to fully open (bottom, center). The plants grew from seeds dropped by birds, most likely western jackdaws (Coloeus monedula), as they were very common in our garden when we fed seeds to the birds.

Announcing the Error Handling Project Group

Today we are announcing the formation of a new project group under the libs team, focused on error handling!

Some of the goals this project group will be working on include:

  1. Defining and codifying common error handling terminology.
  2. Generating consensus on current error handling best practices.
  3. Identifying pain points that exist in Rust’s error handling story.
  4. Communicating current error handling best practices.
  5. Consolidating the Rust error handling ecosystem.

Source: Announcing the Error Handling Project Group, an article by Sean Chen.

ugit: DIY Git in Python

Welcome aboard! We're going to implement Git in Python to learn more about how Git works on the inside.

This tutorial is different from most Git internals tutorials because we're not going to talk about Git only with words but also with code! We're going to write in Python as we go.

This is not a tutorial on using Git! To follow along I advise that you have working knowledge of Git. If you're a newcomer to Git, this tutorial is probably not the best place to start your Git journey. I suggest coming back here after you've used Git a bit and you're comfortable with making commits, branching, merging, pushing and pulling.

Source: Git Internals - Learn by Building Your Own Git.
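
As a small taste of what "Git with code" can look like, here is a sketch of how Git itself derives the ID of a blob object: the SHA-1 of a short header followed by the file contents. This is not taken from the tutorial, which may organize things differently; the function name `git_blob_id` is made up for this example.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the object ID Git assigns to a blob with the given contents."""
    # Git hashes "blob <size in bytes>\0" followed by the raw contents.
    header = b"blob " + str(len(data)).encode() + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


# The empty blob; compare with `git hash-object` on an empty file.
print(git_blob_id(b""))
```

Because the ID depends only on the contents, two files with identical bytes map to the same object, whatever their names are.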

Memory optimizations for Go systems

Despite its growing popularity as a systems language, Go programs are susceptible to severe performance regressions at large scale. In systems with high memory usage, garbage collection (GC) can cause performance regressions by cannibalizing resources from the main program. The goal of this post is to help you understand:

  • How Go GC works at a high level? Why would it impact your system’s performance?
  • What causes GC pressure (more resources spent on GC)?
  • How to determine if GC pressure is the cause of your performance problems?
  • How to measure and profile your program’s heap usage?
  • How to identify which part of the code is the culprit?
  • What are some steps you can take to lower heap usage and GC pressure?

Source: Memory optimizations for Go systems, an article by Nishant Roy.

Editing for cleaner merges

As we add new code to existing files, it feels natural to append – add new functions at the end of the file, new requires at the end of the list of requires, and so on. This approach introduces some friction, and in this post I'll share some pointers for improved editing and git workflow.

Source: Editing for cleaner merges, an article by Christian Johansen.

Golang is not Ready for Enterprise Systems yet and Here’s Why

An enterprise application is a long-lived, reliable system, having a lot of persisting data for many years. Nowadays the world of Golang is not providing possibilities to build systems in the way that enterprise systems are built.

Source: Golang is not Ready for Enterprise Systems yet and Here’s Why, an article by Dmitry Afonkin.

GitHub CLI 1.0 is now available

GitHub CLI brings GitHub to your terminal. It reduces context switching, helps you focus, and enables you to more easily script and create your own workflows. Earlier this year, we announced the beta of GitHub CLI. Since we released the beta, users have created over 250,000 pull requests, performed over 350,000 merges, and created over 20,000 issues with GitHub CLI. We’ve received so much thoughtful feedback, and today GitHub CLI is out of beta and available to download on Windows, macOS, and Linux.

With GitHub CLI 1.0, you can:

  • Run your entire GitHub workflow from the terminal, from issues through releases
  • Call the GitHub API to script nearly any action, and set a custom alias for any command
  • Connect to GitHub Enterprise Server in addition to GitHub.com

Source: GitHub CLI 1.0 is now available, an article by Amanda Pinsker.

New in Thunderbird 78

This article describes some of the major changes visible to users in Thunderbird version 78. Full details of all the changes can be found in the Thunderbird release notes from 78.0 and up.

Source: New in Thunderbird 78.

About dialog of Thunderbird 78.2.2 on macOS Mojave.

In the evening I installed this update of Thunderbird. I think it will take some time to get used to the new icons; they pop out more than the old ones in my opinion.

I also couldn't get email from some accounts. Changing the value of security.tls.version.min to 1 fixed this. To do so I opened the Config Editor via the General section of Preferences (scroll to the bottom). Thanks to user ermspv for explaining this in a comment on Hacker News.

Optional chaining (?.)

The optional chaining operator (?.) permits reading the value of a property located deep within a chain of connected objects without having to expressly validate that each reference in the chain is valid. The ?. operator functions similarly to the . chaining operator, except that instead of causing an error if a reference is nullish (null or undefined), the expression short-circuits with a return value of undefined. When used with function calls, it returns undefined if the given function does not exist.

Source: Optional chaining (?.).

Challenging LR Parsing

This post is a direct response to Which Parsing Approach?. If you haven’t read that article, do it now — it is the best short survey of the lay of the land of modern parsing techniques. I agree with the conclusion — LR parsing is the way to go if you want to do parsing “properly”. I reasoned the same a couple of years ago: Modern Parser Generator.

However, and here’s the catch, rust-analyzer uses a hand-written recursive descent / Pratt parser. One of the reasons for that is that I find existing LR parser generators inadequate for production grade compiler/IDE. In this article, I want to list specific challenges for the authors of LR parser generators.

Source: Challenging LR Parsing.

Downsampling Data with Postgres Window Functions

SQL window functions are one of those topics that never quite clicked for me in the abstract, but I’ve been able to take advantage of them on a few recent projects and wanted to share some of what I’ve learned.

Source: Downsampling Data with Postgres Window Functions, an article by Chris Toomey.

Brotli vs Gzip Compression

According to a Google study, 40% of people abandon a website that takes more than 3 seconds to load and a 1-second delay in page response can result in a 7% reduction in conversions. Yes, every second matters! And we saved around 2.5 seconds (90th percentile) and 1.2 seconds (50th percentile) by using Brotli compression over gzip compression for our Javascript and CSS files.

Source: Brotli vs Gzip Compression. How we improved our latency by 37%, an article by Ankit Jain AJ.

Which Parsing Approach?

We all know that parsing is an important part of designing and implementing programming languages, but it’s the equivalent of Brussels sprouts: good for the diet, but a taste that only a select few enjoy. Unfortunately, I’ve come to realise that our general distaste for parsing is problematic. While many of us think that we’ve absorbed the advances of the 1960s into our collective understanding, I fear that we have regressed, and that we are often making inappropriate decisions about parsing. If that sounds accusatory, I don’t mean it to be: I spent over 20 years assuming that parsing is easy and that I didn’t need to understand it properly in order to use it well. Alas, reality has been a cruel teacher, and in this post I want to share some of the lessons I’ve been forced to slowly learn and acknowledge.

Source: Which Parsing Approach?, an article by Laurence Tratt.

How HTTPS Works

Have you ever wondered why a green lock icon appears on your browser URL bar? And why is it important? We did too, and this comic is for you!

Source: How HTTPS works.

Hidden Gems of PostgreSQL 13

Each release has a lot of these "hidden gems" -- features that may not jump off the page, but can have a big impact when you actually need them. Postgres 13 is no exception: some of these features make it easier to write queries, add additional layers of security, or help you to avoid downtime.

So what are the hidden gems of PostgreSQL 13?

Source: Hidden Gems of PostgreSQL 13, an article by Jonathan S. Katz.

Docker—An Introduction to Container Orchestration

In this installment we are going to look at “container orchestration” for Docker. In the previous installment, we just looked at how to run an individual container. However, most applications are a combination of services which are orchestrated together to make an application.

While in theory all the pieces of an application could be built into a single container, it is better to split an application into its relevant services and run a separate container for each service. There are several reasons for this, but the biggest one is scalability.

Source: Part 4: Docker—An Introduction to Container Orchestration, an article by Jonathan Bartlett.

The State of SwiftUI

Apple released SwiftUI last year, and it’s been an exciting and wild ride. With iOS 14, a lot of the rough edges have been smoothed out — is SwiftUI finally ready for production?

Source: The State of SwiftUI, an article by Peter Steinberger.

3 lesser-known ways of using Swift enums

An enumeration (enum) is a very powerful type in Swift. I use it a lot in my code. Today I'm going to introduce you to some techniques around Swift enumeration that you might not be aware of.

Source: 3 lesser-known ways of using Swift enums, an article by Sarun Wongpatcharapakorn.

The Hows and Whys of Regression Analysis

Machine learning experts have borrowed the methods of regression analysis from math because they allow the making of predictions with as little as just one known variable (as well as multiple variables). They’re useful for financial analysis, weather forecasting, medical diagnosis, and many other fields.

Source: The Hows and Whys of Regression Analysis.
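
The "just one known variable" case is simple linear regression. A minimal sketch using ordinary least squares in plain Python, assuming made-up data points that lie exactly on y = 2x + 1; the helper name `fit_line` is hypothetical and not from the article.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept with ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up observations for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 5.0 + intercept  # predict y for an unseen x
print(slope, intercept, prediction)
```

With real, noisy data the fitted line no longer passes through every point, and the same formulas then give the line that minimizes the squared vertical distances.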

Plot With Pandas: Python Data Visualization for Beginners

Whether you’re just getting to know a dataset or preparing to publish your findings, visualization is an essential tool. Python’s popular data analysis library, pandas, provides several different options for visualizing your data with .plot(). Even if you’re at the beginning of your pandas journey, you’ll soon be creating basic plots that will yield valuable insights into your data.

In this tutorial, you’ll learn:

  • What the different types of pandas plots are and when to use them
  • How to get an overview of your dataset with a histogram
  • How to discover correlation with a scatter plot
  • How to analyze different categories and their ratios

Source: Plot With Pandas: Python Data Visualization for Beginners, an article by Reka Horvath.