Beating grep, When Python can’t thread, and UTF-8 in Haskell

Thu 28 Apr 2022

Beating grep with Go

I set out to beat macOS Monterey's default grep (2.6.0-FreeBSD) in a microbenchmark that represents my daily file searching. I chose Go because I like writing Go programs.

Source: Beating grep with Go, an article by Andrew Healey.

When Python can’t thread: a deep-dive into the GIL’s impact

Unfortunately, in many cases Python can only run one thread at a time, due to what’s known as the Global Interpreter Lock (“GIL”). Other times it can run multiple threads just fine: it all depends on the specific usage patterns.
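The sketch below makes the contrast concrete. It assumes a standard CPython build with the GIL enabled; free-threaded builds behave differently. The helpers `count_down`, `nap`, and `timed` are hypothetical names chosen for illustration, not code from the article. A pure-Python loop keeps the GIL while it runs, so spreading it over four threads usually gains nothing. `time.sleep` releases the GIL while it waits, so four sleeping threads overlap.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_down(n: int) -> int:
    """CPU-bound pure-Python loop (hypothetical helper): holds the GIL while running."""
    while n > 0:
        n -= 1
    return n


def nap(seconds: float) -> float:
    """Blocking wait (hypothetical helper): time.sleep releases the GIL meanwhile."""
    time.sleep(seconds)
    return seconds


def timed(func, args, workers: int) -> tuple[list, float]:
    """Map func over args with `workers` threads; return results and wall time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(func, args))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    cpu_jobs = [2_000_000] * 4
    _, one = timed(count_down, cpu_jobs, workers=1)
    _, four = timed(count_down, cpu_jobs, workers=4)
    _, naps = timed(nap, [0.25] * 4, workers=4)
    print(f"count_down, 1 thread:    {one:.2f}s")
    # On a GIL-enabled build this is typically no faster than the 1-thread run.
    print(f"count_down, 4 threads:   {four:.2f}s")
    # The sleeps overlap, so this should land near 0.25s rather than 1.0s.
    print(f"nap(0.25) x4, 4 threads: {naps:.2f}s")
```

Exact timings depend on the machine. The point is the ratio between the runs, not the absolute numbers.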

But which usage patterns allow parallelism, and which don’t? Naive mental models will give you inaccurate answers. So in this article you’ll build a practical mental model of how the GIL works:

We’ll start by going through a series of increasingly accurate mental models of how the GIL works.

Then, we’ll see how our new, more accurate mental model can help you predict where and whether parallelism bottlenecks will occur.
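One practical way to predict parallelism is to ask whether the hot loop runs as Python bytecode or inside C code that releases the GIL. Python's `hashlib` documentation states that the GIL is released for data larger than 2047 bytes. The following is a minimal sketch, not taken from the article, that measures the thread-pool speedup for SHA-256 versus a hypothetical pure-Python checksum (`py_checksum`) over similar inputs. The helper names are illustrative assumptions.

```python
import hashlib
import os
import time
from concurrent.futures import ThreadPoolExecutor


def sha256_hex(data: bytes) -> str:
    # hashlib documents that it releases the GIL for inputs over 2047 bytes,
    # so several threads can hash on separate cores at once.
    return hashlib.sha256(data).hexdigest()


def py_checksum(data: bytes) -> int:
    # Hypothetical toy rolling checksum: a pure-Python loop that holds
    # the GIL for every byte it processes.
    total = 0
    for byte in data:
        total = (total * 31 + byte) & 0xFFFFFFFF
    return total


def thread_speedup(func, inputs: list[bytes], workers: int = 4) -> float:
    """Serial wall time divided by thread-pool wall time for the same work."""
    start = time.perf_counter()
    serial = [func(x) for x in inputs]
    serial_time = time.perf_counter() - start
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        threaded = list(pool.map(func, inputs))
    threaded_time = time.perf_counter() - start
    if serial != threaded:
        raise RuntimeError("threaded results differ from serial results")
    return serial_time / threaded_time


if __name__ == "__main__":
    large = [os.urandom(16 * 1024 * 1024) for _ in range(4)]  # four 16 MiB buffers
    medium = [os.urandom(300_000) for _ in range(4)]
    # On a multi-core machine with a GIL build, sha256 typically shows a speedup
    # above 1x, while the pure-Python loop stays near 1x.
    print(f"sha256:      {thread_speedup(sha256_hex, large):.1f}x")
    print(f"py_checksum: {thread_speedup(py_checksum, medium):.1f}x")
```

This is a rough heuristic. Lock contention, memory bandwidth, and other effects the article examines can still shift the results.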

Source: When Python can’t thread: a deep-dive into the GIL’s impact, an article by Itamar Turner-Trauring.


So Long Surrogates: How we moved to UTF-8 in Haskell

We released a blazingly fast Aho-Corasick implementation, written in Haskell, in 2019. This implementation was based on UTF-16 strings, since Haskell's text library used UTF-16 for its internal string representation. However, the most recent major update of text changed its internal string representation from UTF-16 to UTF-8. This is good news for us: most of our customers’ data is ASCII, which takes one byte per character in UTF-8 instead of two in UTF-16, so this update will cut our memory consumption in half. The big problem, though, is that our highly optimized string search library alfred-margaret assumes that its input is encoded as UTF-16 and uses that assumption to cut a few corners to improve performance. In this post we will illustrate the challenges we encountered implementing UTF-8 support in alfred-margaret and also give some insights into how we optimized our Haskell code for maximal performance.
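To illustrate what changes when a matcher works on UTF-8 instead of UTF-16, here is a textbook Aho-Corasick automaton run over UTF-8 bytes. It is a Python sketch, not alfred-margaret's Haskell implementation. The function names `build_automaton` and `find_all` are hypothetical. Matching happens on bytes, so every hit must be converted from a byte offset back to a character offset. The sketch does that by counting non-continuation bytes, relying on the fact that every UTF-8 code point starts with exactly one byte that is not of the form `10xxxxxx`.

```python
from collections import deque
from typing import Iterator


def build_automaton(patterns: list[bytes]):
    """Build the trie (goto), failure links, and output lists for Aho-Corasick."""
    goto: list[dict[int, int]] = [{}]
    fail: list[int] = [0]
    out: list[list[int]] = [[]]
    for idx, pattern in enumerate(patterns):
        state = 0
        for byte in pattern:
            if byte not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][byte] = len(goto) - 1
            state = goto[state][byte]
        out[state].append(idx)
    # Breadth-first pass: each failure link points at the longest proper
    # suffix of the node's path that is also a path in the trie.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for byte, child in goto[state].items():
            queue.append(child)
            f = fail[state]
            while f and byte not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(byte, 0)
            out[child] = out[child] + out[fail[child]]
    return goto, fail, out


def find_all(text: str, needles: list[str]) -> Iterator[tuple[int, str]]:
    """Yield (character offset, needle) for every match, overlaps included."""
    haystack = text.encode("utf-8")
    patterns = [n.encode("utf-8") for n in needles]
    if any(not p for p in patterns):
        raise ValueError("needles must be non-empty")
    goto, fail, out = build_automaton(patterns)
    state = 0
    for end, byte in enumerate(haystack):
        while state and byte not in goto[state]:
            state = fail[state]
        state = goto[state].get(byte, 0)
        for idx in out[state]:
            start = end - len(patterns[idx]) + 1
            # Byte offset -> code-point offset: count bytes that are not
            # UTF-8 continuation bytes (10xxxxxx). O(n) per match; fine for a sketch.
            chars = sum(1 for b in haystack[:start] if b & 0xC0 != 0x80)
            yield chars, needles[idx]


if __name__ == "__main__":
    print(sorted(find_all("naïve café naïve", ["naïve", "café", "é"])))
    # [(0, 'naïve'), (6, 'café'), (9, 'é'), (11, 'naïve')]
    sample = "mostly ascii log line"
    # ASCII text: 1 byte per character in UTF-8, 2 in UTF-16.
    print(len(sample.encode("utf-8")), len(sample.encode("utf-16-le")))  # 21 42
```

Because valid UTF-8 is self-synchronizing, a match of a valid UTF-8 needle can only begin on a code-point boundary. That is why the offset conversion above never lands mid-character.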

Source: So Long Surrogates: How we moved to UTF-8 in Haskell, an article by Paul Brinkmeier.