week 27, 2023

Nix shell template

Nix shells are the best tool for creating software development environments right now. This article provides a template to get you started with Nix shells from scratch, and explains how to add common features.

Source: Nix shell template, an article by Victor Engmark.

TIL - IN is not the same as ANY

Not exactly from today, rather from a month or two ago, but still on my “noteworthy list”. So after a remarkably long quiet period of no surprises (Postgres doesn’t generally surprise one badly), I managed to learn something controversial - a thing considered generally good, using ANY instead of IN-list in this case, can have downsides nevertheless!

Source: TIL - IN is not the same as ANY, an article by Kaarel Moppel.

Corinna in the Perl Core

It’s been a years-long, painful process, but with the release of Perl v5.38.0, the first bits of Corinna have been added to the Perl core. For those who have not been following along, Corinna is a project to add a new object system to the Perl core. Note that it’s not taking anything away from Perl; it’s adding a core object system for better memory consumption, performance, and elegance.

Source: Corinna in the Perl Core, an article by Curtis “Ovid” Poe.

Demystifying Text Data with the unstructured Python Library

In the world of data, textual data stands out as being particularly complex. It doesn’t fall into neat rows and columns like numerical data does. As a side project, I’m in the process of developing my own personal AI assistant. The objective is to use the data within my notes and documents to answer my questions. The important benefit is that all data processing will occur locally on my computer, ensuring that no documents are uploaded to the cloud and that my documents remain private.

To handle such unstructured data, I’ve found the unstructured Python library to be extremely useful. It’s a flexible tool that works with various document formats, including Markdown, XML, and HTML documents.

Source: Demystifying Text Data with the unstructured Python Library (+alternatives), an article by Saeed Esmaili.

Image Upscaling Using Neural Networks

Do you remember those classic scenes from the CSI TV series? A detective, peering at a pixelated image from a surveillance camera, instructs the tech whiz, "zoom enhance". With a few keystrokes, the blurry image transforms, revealing a perfectly clear license plate. We've all had a good laugh at that, dismissing it as pure Hollywood bullshit, right?

Source: Image Upscaling Using Neural Networks.

Regex engine internals as a library

Over the last several years, I’ve rewritten Rust’s regex crate to enable better internal composition, and to make it easier to add optimizations while maintaining correctness. In the course of this rewrite I created a new crate, regex-automata, which exposes much of the regex crate internals as their own APIs for others to use. To my knowledge, this is the first regex library to expose its internals to the degree done in regex-automata as a separately versioned library.

This blog post discusses the problems that led to the rewrite, how the rewrite solved them and a guided tour of regex-automata’s API.

Source: Regex engine internals as a library, an article by Andrew Gallant.

Mastering Intermediate Linux Commands for Server Management

As a sysadmin, you often come across complex tasks that require more than just basic commands. That’s why it’s important to learn some intermediate-level Linux commands that can make your work easier and more efficient.

These commands can help you automate repetitive tasks, manage processes, and monitor system performance, among other things. In this article, we will explore some of these commands and their usage.

Source: Mastering Intermediate Linux Commands for Efficient Server Management, an article by Akash Rajpurohit.

Most Tests Should Be Generated

Traditional testing wisdom eventually invokes the test pyramid, which is a guide to the proportion of tests to write along the isolation / integration spectrum. There’s an eternal debate about what the best proportion should be at each level, but interestingly it’s always presented with the assumption that test cases are hand-written. We should also think about test generation as a dimension, and if I were to draw a pyramid about it I’d place generated tests on the bottom and hand-written scenarios on top, i.e. most tests should be generated.

Source: Most Tests Should Be Generated, an article by Alex Weisberger.
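To make the idea of generated tests concrete, here is a minimal sketch that is not taken from the article. It assumes a toy run-length encoder as the code under test (`rle_encode`, `rle_decode`, `generate_cases`, and `check_round_trip` are all hypothetical names) and uses only the standard library's seeded random generator to produce inputs, checking a round-trip property instead of hand-picked cases.

```python
import random


def rle_encode(text):
    """Toy code under test: collapse runs of equal characters into (char, count) pairs."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def rle_decode(runs):
    """Inverse of rle_encode."""
    return "".join(ch * count for ch, count in runs)


def generate_cases(count, seed=0):
    """Yield random short strings; a fixed seed keeps failures reproducible."""
    rng = random.Random(seed)
    for _ in range(count):
        length = rng.randint(0, 20)
        # A two-letter alphabet makes long runs likely, which is the interesting case.
        yield "".join(rng.choice("ab") for _ in range(length))


def check_round_trip(count=500):
    """Property: decoding an encoded string returns the original string."""
    for case in generate_cases(count):
        assert rle_decode(rle_encode(case)) == case, case
    return count


if __name__ == "__main__":
    print(f"{check_round_trip()} generated cases passed")
```

A dedicated property-based testing library would add input shrinking and richer generators; the sketch only shows the shape of the approach.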

Two Ways to Turbo-Charge tox

The traditional way to speed up tox runs is running it as tox run-parallel (née tox --parallel or just tox -p). And while it’s currently broken in tox 4 for some users (yours truly included), it’s a great feature that Nox is sorely lacking.

But there are more ways, and I’d like to share two of them with you. Neither method makes much difference in CIs like GitHub Actions (just like tox run-parallel, mind you!), but they can do wonders for your local development. Which is where I have the least patience, so let’s dive right in!

Source: Two Ways to Turbo-Charge tox, an article by Hynek Schlawack.

Demystifying Pratt Parsers

Pratt parsers are a beautiful way of solving the operator precedence problem:

How can an expression like 1+2-3*4+5/6^7-8*9 be parsed to meet the expectations of your PEMDAS-trained brain? Where do you put the parentheses? What goes first?

Source: Demystifying Pratt Parsers, an article by Martin Janiczek.
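As a rough illustration of the technique, and not the article's own code, the sketch below is a small Pratt-style parser that answers the "where do you put the parentheses?" question by returning a fully parenthesized string. The binding-power table and the names `tokenize`, `parse`, and `parenthesize` are assumptions made for this example; it handles only non-negative integers, the five binary operators, and parentheses, and does no error handling.

```python
import re

# (left, right) binding powers; higher binds tighter.
# Left-associative operators have right > left.
# '^' is right-associative, so its right power is lower than its left power.
BINDING_POWER = {
    "+": (1, 2),
    "-": (1, 2),
    "*": (3, 4),
    "/": (3, 4),
    "^": (6, 5),
}


def tokenize(text):
    """Split the input into integer literals, operators, and parentheses."""
    return re.findall(r"\d+|[-+*/^()]", text)


def parse(tokens, min_bp=0):
    """Consume tokens from the front and return a fully parenthesized string."""
    token = tokens.pop(0)
    if token == "(":
        left = parse(tokens, 0)
        tokens.pop(0)  # discard the matching ')'
    else:
        left = token

    while tokens and tokens[0] in BINDING_POWER:
        op = tokens[0]
        left_bp, right_bp = BINDING_POWER[op]
        if left_bp < min_bp:
            break  # the operator binds too loosely; let the caller handle it
        tokens.pop(0)
        right = parse(tokens, right_bp)
        left = f"({left} {op} {right})"
    return left


def parenthesize(text):
    return parse(tokenize(text))


if __name__ == "__main__":
    print(parenthesize("1+2-3*4+5/6^7-8*9"))
    # prints ((((1 + 2) - (3 * 4)) + (5 / (6 ^ 7))) - (8 * 9))
```

The whole precedence and associativity policy lives in the table: swapping the two numbers for `^` would make it left-associative without touching the parsing loop.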

How to think about async/await in Rust

Some documentation of Rust async and await has presented it as a seamless alternative to threads. Just sprinkle these keywords through your code and get concurrency that scales better! I think this is very misleading. An async fn is a different thing from a normal Rust fn, and you need to think about different things to write correct code in each case.

This post presents a different way of looking at async that I think is more useful, and less likely to lead to cancellation-related bugs.

Source: How to think about async/await in Rust, an article by Cliff L. Biffle.

Joins 13 Ways

Relational (inner) joins are really common in the world of databases, and one weird thing about them is that it seems like everyone has a different idea of what they are. In this post I’ve aggregated a bunch of different definitions, ways of thinking about them, and ways of implementing them that will hopefully be interesting. They’re not without redundancy, some of them are arguably the same, but I think they’re all interesting perspectives nonetheless.

Source: Joins 13 Ways, an article by Justin Jaffray.
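Two of the most familiar ways to implement an inner join can be sketched in a few lines. This is an illustration under assumptions, not code from the article: rows are plain dictionaries, the join is an equi-join on a single shared column, and `nested_loop_join`, `hash_join`, and the sample `users` and `orders` tables are hypothetical.

```python
from collections import defaultdict


def nested_loop_join(left, right, key):
    """Compare every left row with every right row: O(len(left) * len(right))."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def hash_join(left, right, key):
    """Index the right side by key once, then probe it for each left row."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Hypothetical sample data for illustration only.
users = [
    {"user_id": 1, "name": "Ada"},
    {"user_id": 2, "name": "Grace"},
    {"user_id": 3, "name": "Edsger"},
]
orders = [
    {"user_id": 1, "item": "book"},
    {"user_id": 1, "item": "pen"},
    {"user_id": 3, "item": "lamp"},
]

if __name__ == "__main__":
    for row in hash_join(users, orders, "user_id"):
        print(row)
```

Both functions return the same rows in the same order here; they differ only in how much work they do to find the matches.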