Plurrrr

week 50, 2020

Bulk loading into PostgreSQL: Options and comparison

You have a file, possibly a huge CSV, and you want to import its content into your database. There are lots of options to do this, but how would you decide which one to use? More often than not the question is how much time the bulk load would take. I found myself doing the same a few days back when I wanted to design a data ingestion process for PostgreSQL where we needed to bulk load around 250GB of data from CSV files every 24 hours.

Source: Bulk loading into PostgreSQL: Options and comparison, an article by Muhammad Usama.

Use polling for resiliency

How should two microservices communicate?

Let’s analyze two different communication patterns:

  • Polling: Service B periodically queries Service A for the current state of the users and updates its local storage.
  • Event driven: Service A publishes in a queue every time user information is updated. Service B consumes the updates to stay up to date.

Using TLA+ we are going to model the two different patterns to see how well they fit our requirements. In our specification, we are also going to take into account unexpected failures that can affect the system and show how to model them in TLA+.

Source: Use polling for resiliency, an article by Georgios Chinis.
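
To make the difference between the two patterns concrete, here is a toy Python simulation. It is not the article's TLA+ specification, only a sketch under stated assumptions: the only failure modelled is a lost event message, and the names simulate, drop_rate and poll_every are made up for this illustration.

```python
import random


def simulate(steps, drop_rate, poll_every, seed=0):
    """Return (polling_in_sync, events_in_sync) after `steps` updates.

    Assumption for this sketch: the only failure is a lost event message.
    """
    rng = random.Random(seed)
    service_a = {}   # source of truth: user id -> latest version
    b_polling = {}   # Service B's copy, refreshed by polling
    b_events = {}    # Service B's copy, refreshed by events

    for step in range(1, steps + 1):
        user = rng.randrange(5)
        service_a[user] = step  # Service A updates a user

        # Event driven: the update reaches B only if the message survives.
        if rng.random() >= drop_rate:
            b_events[user] = step

        # Polling: every `poll_every` steps B copies A's current state.
        if step % poll_every == 0:
            b_polling = dict(service_a)

    return b_polling == service_a, b_events == service_a


if __name__ == "__main__":
    print(simulate(steps=100, drop_rate=0.0, poll_every=10))
    print(simulate(steps=100, drop_rate=1.0, poll_every=10))
```

With no message loss both copies end up in sync. When every message is lost, the event-driven copy stays empty while the polling copy catches up at the next poll, which is the self-healing property the title hints at.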

Become shell literate

Shell literacy is one of the most important skills you ought to possess as a programmer. The Unix shell is one of the most powerful ideas ever put to code, and should be second nature to you as a programmer. No other tool is nearly as effective at commanding your computer to perform complex tasks quickly — or at storing them as scripts you can use later.

Source: Become shell literate, an article by Drew DeVault.

Regex literals optimization

The regex literals optimization avoids running the regex engine on parts of the input text that cannot possibly ever match the regex.

An example of a regex this can be applied to is \w+@\w+\.\w+, where the algorithm quickly finds the first @, then matches \w+ backwards to find the start of the match, and then matches \w+\.\w+ forward to find the end of the match. It then finds the second @, starting from the end of the previous match, and so on. This is a fairly naive (and incorrect) implementation, but it gives the idea of how it works.

Source: Regex literals optimization, an article by Esteban C. Borsani.
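
The steps described in the excerpt can be sketched in Python as follows. Like the description itself, this is a naive illustration for this one pattern and not the implementation from the article; the function name find_emails is made up.

```python
import re

WORD = re.compile(r"\w")        # a single word character
TAIL = re.compile(r"\w+\.\w+")  # the part of the pattern after the "@"


def find_emails(text):
    """Find matches of \\w+@\\w+\\.\\w+ by searching for the literal "@" first."""
    matches = []
    pos = 0  # the backward scan never goes before this index
    while True:
        at = text.find("@", pos)  # fast literal search, no regex engine
        if at == -1:
            break
        # Match \w+ backwards from the "@" to find the start of the match.
        start = at
        while start > pos and WORD.match(text[start - 1]):
            start -= 1
        # Match \w+\.\w+ forwards from the "@" to find the end of the match.
        tail = TAIL.match(text, at + 1)
        if start < at and tail:
            matches.append(text[start:tail.end()])
            pos = tail.end()  # continue after the previous match
        else:
            pos = at + 1      # no match around this "@", try the next one
    return matches


if __name__ == "__main__":
    print(find_emails("mail foo@bar.com or a@b.c now"))
    # ['foo@bar.com', 'a@b.c']
```

The regex engine only runs on the short stretch after each @, so input without any @ is skipped with a plain substring search.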

How Python object system works

As we know from the previous parts of this series, the execution of a Python program consists of two major steps:

  1. The CPython compiler translates Python code to bytecode.
  2. The CPython VM executes the bytecode.

We've been focusing on the second step for quite a while. In part 4 we looked at the evaluation loop, a place where Python bytecode gets executed. And in part 5 we studied how the VM executes the instructions that are used to implement variables. What we haven't covered yet is how the VM actually computes something. We postponed this question because to answer it, we first need to understand how the most fundamental part of the language works. Today, we'll study the Python object system.
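
The two steps can be observed from Python itself with the built-in compile() and exec() functions and the standard dis module. This is a small sketch; the variable names are arbitrary and the exact instruction names differ between CPython versions.

```python
import dis

source = "x = 1 + 2"

# Step 1: the compiler translates Python code to a code object (bytecode).
code = compile(source, "<example>", "exec")

# Collect the instruction names; the exact list depends on the CPython version.
opnames = [instruction.opname for instruction in dis.get_instructions(code)]

# Step 2: the VM executes the bytecode.
namespace = {}
exec(code, namespace)

if __name__ == "__main__":
    dis.dis(code)          # human-readable bytecode listing
    print(namespace["x"])  # 3
```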

Our Christmas tree

In the evening we decorated the Christmas tree that was delivered by the end of the afternoon to our house. Because the tree was quite large we had to move around our furniture to make space.

Our Christmas tree.

Cameras and Lenses

Pictures have always been a meaningful part of the human experience. From the first cave drawings, to sketches and paintings, to modern photography, we’ve mastered the art of recording what we see.

Cameras and the lenses inside them may seem a little mystifying. In this blog post I’d like to explain not only how they work, but also how adjusting a few tunable parameters can produce fairly different results.

Source: Cameras and Lenses, an article by Bartosz Ciechanowski.

Dave goes back to Mac

I am writing this post from my new 13” Macbook Pro with an Apple M1 Chip. If you’ve been following the five year long #davegoeswindows saga, then this might come as a sudden surprise. I will be honest, it comes as a surprise to me too. The decision was a bit impulsive but my dev environment was blocking me and my time and patience were not a luxury I could afford. I’m living that aluminium utopia dongle life now and will stick to this for the foreseeable future.

Source: Dave goes back to Mac, an article by Dave Rupert.

The Saints of Salvation

Humanity is struggling to hold out against a hostile takeover by an alien race that claims to be on a religious mission to bring all sentient life to its God at the End of Time. But while billions of cocooned humans fill the holds of the Olyix’s deadly arkships, humankind is playing an even longer game than the aliens may have anticipated. From an ultra-secret spy mission to one of the grandest battles ever seen, no strategy is off the table. Will a plan millennia in the making finally be enough to defeat this seemingly unstoppable enemy? And what secrets are the Olyix truly hiding in their most zealously protected stronghold?

In the evening I started reading The Saints of Salvation, book 3 in the Salvation Sequence by Peter F. Hamilton. I liked the previous two books a lot, so I have high expectations of the third and final book in the series.

The mythical “fast” web page

Web performance can mean a lot of different things to a lot of different people. Fundamentally, it’s a question of how fast a web page is. But fast to whom?

When this page loaded moments ago, was it fast? If so, congratulations, you had a fast experience. So ask yourself, does that make this a fast page? Not so fast! Just because you had a fast experience doesn’t mean everyone else does too. You might even revisit this page and have yourself a slow experience.

Source: The mythical “fast” web page, an article by Rick Viscomi.

vipe

vipe allows you to run your editor in the middle of a unix pipeline and edit the data that is being piped between programs. Your editor will have the full data being piped from command1 loaded into it, and when you close it, that data will be piped into command2.

Source: vipe(1) — moreutils

A Sad Day

Today I noticed that the female Aphonopelma seemanni I keep was in a death curl; it was either dying or already dead. The day before I had noticed it was leaking hemolymph from the top of its abdomen close to the pedicel. I couldn't see any damage and had no idea why this was happening.

Aphonopelma seemanni leaking hemolymph.

When I inspected my other tarantulas I noticed another loss: the Caribena versicolor I keep was also in a death curl. That's two in a single day 😢.

The Math and Algorithms of Secret Santa

Secret Santa is a traditional Christmas gift exchanging scheme in which each member of a group is randomly and anonymously assigned another member to give a Christmas gift to (usually by drawing names from a container). It is not valid for a person to be assigned to themself (if someone were to draw their own name, for example, all the names should be returned to the jar and the drawing process restarted).

Given a group of a certain size, how many different ways are there to make valid assignments? What is the probability that at least one person will draw their own name? What is the probability that two people will draw each other’s names? What is a good way to have a computer make the assignments while guaranteeing they are generated with equal probability among all possible assignments?

It turns out that these questions about secret santa present good motivation for exploring some of the fundamental concepts in combinatorics (the math of counting). In the sections below we will take a look at a bit of that math and algorithms that allow us to answer the questions we posed above. The final section presents a simple command-line program that allows generating and anonymously sending secret santa assignments via email so that we no longer need to go through the tedious ordeal of drawing names from a hat.

Source: Deranged Sinterklaas: The Math and Algorithms of Secret Santa.
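
A valid assignment is a permutation in which nobody is mapped to themselves, known as a derangement. The Python sketch below counts derangements with the standard recurrence D(n) = (n - 1)(D(n - 1) + D(n - 2)) and draws an assignment by reshuffling until nobody has their own name, mirroring the restart rule in the excerpt. It is an illustration only and not necessarily the algorithm or program from the article; all function names are made up, and the names passed in are assumed to be distinct.

```python
import math
import random


def count_derangements(n):
    """Number of valid assignments for n people: D(0) = 1, D(1) = 0."""
    if n == 0:
        return 1
    previous, current = 1, 0  # D(0), D(1)
    for k in range(2, n + 1):
        previous, current = current, (k - 1) * (previous + current)
    return current


def prob_self_draw(n):
    """Probability that at least one person draws their own name."""
    return 1 - count_derangements(n) / math.factorial(n)


def draw_assignment(names, rng=random):
    """Draw a giver -> receiver mapping by shuffling until it is valid.

    Every shuffle is equally likely, so every valid assignment is too.
    Assumes the names are distinct.
    """
    if len(names) < 2:
        raise ValueError("need at least two people")
    while True:
        receivers = list(names)
        rng.shuffle(receivers)
        if all(giver != receiver for giver, receiver in zip(names, receivers)):
            return dict(zip(names, receivers))


if __name__ == "__main__":
    print(count_derangements(4))  # 9
    print(draw_assignment(["Ann", "Bob", "Cem", "Dee"]))
```

For larger groups prob_self_draw(n) approaches 1 - 1/e, about 0.63, so on average fewer than three shuffles are needed before a valid assignment comes up.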

Closures vs pureness

The other day I was looking at a nested map/reduce/filter constellation which had a bunch of nesting, therefore there were a lot of closures. This colleague had an interesting question: "In PHP, usually we can tell the interpreter that a function is relying on something from outside of the function with the use keyword, so e.g. we could tell at one level of the nesting that a function is not only relying on its input, but something from the outside (closure). Is there a way to do this in JavaScript?"

Source: Closures vs pureness, an article by Adam Nagy.

Announcing the Atheris Python Fuzzer

Fuzz testing is a well-known technique for uncovering programming errors. Many of these detectable errors have serious security implications. Google has found thousands of security vulnerabilities and other bugs using this technique. Fuzzing is traditionally used on native languages such as C or C++, but last year, we built a new Python fuzzing engine. Today, we’re releasing the Atheris fuzzing engine as open source.

Source: Announcing the Atheris Python Fuzzer.

Mastering the Terminal to Improve Development Speed

When I was new to programming, there was nothing more impressive than watching an expert navigate around a terminal. They could be doing something as simple as editing a text file, but from the outside perspective, it was awe-inspiring. A wizard at the keys, churning out lines of code without the need to even glance at their mouse. Fast forward several years, and I have slowly acquired the art of the terminal. In this post, I will share several techniques that can be used to speed up development processes. We will specifically cover topics such as grep, tmux, aliasing, and several others. Let’s get started!

Source: How to effectively use the Command Line!, an article by Keith Galli.