week 17, 2023

Sun 30 Apr 2023

PEP 647 – User-Defined Type Guards

This PEP specifies a way for programs to influence conditional type narrowing employed by a type checker based on runtime checks.

Source: PEP 647 – User-Defined Type Guards, an article by Eric Traut.

python

LLM Sandboxing: Early Lessons Learned

About two weeks ago, we launched our research project and text-based AI (sandbox) escape game Doublespeak.chat. We give OpenAI’s Large Language Model (LLM, A.K.A. ChatGPT) a secret to keep: its name. The player’s goal is to extract that secret name. We believe we'll never win the cat-and-mouse game, but we can all have fun trying!

This post details some lessons learned in the first two weeks since Doublespeak.chat's release.

Source: LLM Sandboxing: Early Lessons Learned, an article by Matt Hamilton.

machine learning

Modern perfect hashing for strings

Looking up in a static set of strings is a common problem we encounter when parsing any textual formats. Such sets are often keywords of a programming language or protocol.

Parsing HTTP verbs appeared to be the fastest when we use a compile-time trie: a series of nested switch statements. I could not believe that a perfect hash function is not better, and that led to a novel hashing approach that is based on the instruction PEXT (Parallel Bits Extract).

Source: Modern perfect hashing for strings, an article by Wojciech Muła.
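
A rough sketch of the general idea, not the article's implementation: PEXT is emulated in software, and a brute-force search looks for a small bit mask that maps each keyword to a distinct table slot. The names pext, key_word, find_mask and lookup, the four-verb keyword set, and the choice to hash only the first four bytes are all assumptions of mine.

```python
from itertools import combinations


def pext(value: int, mask: int) -> int:
    """Software emulation of PEXT: pack the bits of value selected by mask."""
    result = 0
    out_bit = 0
    while mask:
        lowest = mask & -mask      # isolate the lowest set bit of the mask
        if value & lowest:
            result |= 1 << out_bit
        out_bit += 1
        mask &= mask - 1           # clear that bit and continue
    return result


def key_word(key: str) -> int:
    """First four bytes of key as a little-endian integer, zero padded."""
    return int.from_bytes(key.encode("utf-8")[:4].ljust(4, b"\0"), "little")


def find_mask(keys: list[str]) -> int:
    """Brute-force the smallest mask whose extracted bits differ for every key."""
    words = [key_word(key) for key in keys]
    if len(set(words)) != len(words):
        raise ValueError("keys collide in their first four bytes")
    for bit_count in range(1, 33):
        for positions in combinations(range(32), bit_count):
            mask = sum(1 << position for position in positions)
            if len({pext(word, mask) for word in words}) == len(words):
                return mask
    raise ValueError("no mask found")


VERBS = ["GET", "PUT", "POST", "HEAD"]   # assumed keyword set
MASK = find_mask(VERBS)
TABLE = [None] * (1 << bin(MASK).count("1"))
for verb in VERBS:
    TABLE[pext(key_word(verb), MASK)] = verb


def lookup(text: str):
    """Return text if it is one of VERBS, else None."""
    candidate = TABLE[pext(key_word(text), MASK)]
    # The extracted bits only select a slot; a full compare confirms the match.
    return candidate if candidate == text else None
```

The final string comparison is needed because inputs outside the keyword set can land in an occupied slot.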

computer science

Use singular nouns for database table names

A common debate in relational database circles is whether the names of tables should be singular or plural. If you have a table that stores users, should the table be called user or users?

Source: Use singular nouns for database table names, an article by Lawrence Kesteloot.

database

Fijn Weekend (2023)

A close-knit group of friends gather in French Burgundy to finally scatter ashes of their dead friend. When the widow unexpectedly brings her new boyfriend, the weekend takes a turn, and all friendships and relationships are put on edge.

In the evening Esme and I watched Fijn Weekend. I liked the movie somewhat and give it a 6 out of 10.

Sat 29 Apr 2023

Some remarks on Large Language Models

I assume you have heard of ChatGPT, maybe played with it a little, and were impressed by it (or tried very hard not to be). And that you also heard that it is "a large language model". And maybe that it "solved natural language understanding". Here is a short personal perspective on this (and similar) models, and on where we stand with respect to language understanding.

Source: Some remarks on Large Language Models, an article by Yoav Goldberg.

machine learning

Exciting SQLite Improvements Since 2020

Let's take a look at some of the exciting improvements and refinements that SQLite has seen since 2020. This list focuses on changes related to the supported SQL instructions and the CLI.

Source: Exciting SQLite Improvements Since 2020, an article by Adrian Sieber.

sqlite

Nope (2022)

The residents of a lonely gulch in inland California bear witness to an uncanny and chilling discovery.

In the evening Adam and I watched Nope. To me the movie had a slow start but it got better and in the end I liked the movie. I give it a 7 out of 10.

Fri 28 Apr 2023

Beautiful Branchless Binary Search

I read a blog post by Alex Muscar, “Beautiful Binary Search in D”. It describes a binary search called “Shar’s algorithm”. I’d never heard of it and it’s impossible to google, but looking at the algorithm I couldn’t help but think “this is branchless.” And who knew that there could be a branchless binary search? So I did the work to translate it into an algorithm for C++ iterators, no longer requiring one-based indexing or fixed-size arrays.

Source: Beautiful Branchless Binary Search, an article by Malte Skarupke.
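
To see the shape of such a search, here is a Python sketch of a lower-bound search that only ever steps by powers of two. It is my own reading of the idea, not the article's C++ code, and the name branchless_lower_bound is an assumption. Python will not compile this to branch-free machine code; the point is only that the loop body has no if statement and that the iteration count depends on the length alone.

```python
def branchless_lower_bound(items, value):
    """Index of the first element in sorted items that is not less than value."""
    length = len(items)
    if length == 0:
        return 0
    begin = 0
    step = 1 << (length.bit_length() - 1)      # largest power of two <= length
    if step != length and items[step] < value:
        # The answer lies past index step: search a power-of-two sized
        # window that ends exactly at the end of the sequence.
        length -= step + 1
        if length == 0:
            return len(items)
        step = 1 << (length - 1).bit_length()  # smallest power of two >= length
        begin = len(items) - step
    step //= 2
    while step:
        # The comparison result (0 or 1) scales the step, so no if is needed.
        begin += step * (items[begin + step] < value)
        step //= 2
    return begin + (items[begin] < value)
```

For sorted input the result should agree with bisect.bisect_left from the standard library.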

Pinging Locations

So how does an attacker find a system to attack? One approach is to send out a ping packet to every IPv4 address and listen for the reply (pong). If there's a reply, then there is a server and they can queue up the address for a subsequent attack. A single host with a gigabit connection can scan all of the IPv4 range in a few hours. (There are some scanning techniques that can go even faster, finishing in minutes.)

Source: Pinging Locations, an article by Neal Krawetz.

Performance Excuses Debunked

Whenever I point out that a common software practice is bad for performance, arguments ensue. That’s good! People should argue about these things. It helps illuminate both sides of the issue. It’s productive, and it leads to a better understanding of how software performance fits into the priorities of our industry.

What's not good is that some segments of the developer community don’t even want to have discussions, let alone arguments, about software performance. Among certain developers, there is a pervasive attitude that software simply doesn't have performance concerns anymore. They believe we are past the point in software development history where anyone should still be thinking about performance.

Source: Performance Excuses Debunked, an article by Casey Muratori.

software development

Tetris (2023)

The story of how one of the world's most popular video games found its way to players around the globe. Businessman Henk Rogers and Tetris inventor Alexey Pajitnov join forces in the USSR, risking it all to bring Tetris to the masses.

In the evening Adam, Alice, Esme, and I watched Tetris. I liked the movie a lot and give it an 8 out of 10.

Thu 27 Apr 2023

Why is OAuth still hard in 2023?

You might conclude that, armed with a client library, you would be able to implement OAuth for any API in about 10 minutes. Or at least in an hour.

If you manage, please email us — we’d like to treat you to a delicious dinner and hear how you did it.

Source: Why is OAuth still hard in 2023?, an article by Robin Guldener.

security

The Part of PostgreSQL We Hate the Most

But as much as we love PostgreSQL at OtterTune, certain aspects of it are not great. So instead of writing yet another blog article like everyone else touting the awesomeness of everyone’s favorite elephant-themed DBMS, we want to discuss the one major thing that sucks: how PostgreSQL implements multi-version concurrency control (MVCC).

Source: The Part of PostgreSQL We Hate the Most, an article by Bohan Zhang and Andy Pavlo.

postgres

urllib3 v2.0.0 is now generally available

It's my honor to present the next major release of urllib3. This major release has been in progress since 2020 and will be the foundation of future improvements to the package. Everyone on our team of contributors is excited to finally share what we've accomplished with you all.

Source: urllib3 v2.0.0 is now generally available, an article by Seth Michael Larson.

python

Wed 26 Apr 2023

More thoughts on a bootstrappable GHC

The bootstrappable builds project tries to find ways of building all our software from source, without relying on binary artifacts. A noble goal, and one that is often thwarted by languages with self-hosting compilers, like GHC: In order to build GHC, you need GHC. A Pull Request against nixpkgs, adding first steps of the bootstrapping pipeline, reminded me of the issue with GHC, about which I have noted down some thoughts before, and I played around a bit more.

Source: More thoughts on a bootstrappable GHC, an article by Joachim Breitner.

haskell

JavaScript eval security best practices

JavaScript's eval() function is a powerful tool that can execute code stored as a string. However, it also poses a security risk when used improperly.

Source: JavaScript eval security best practices, an article by Oscar Salazar.

javascript

Etcd: The Unsung Hero of Kubernetes

I have worked on various projects that involve container orchestration using Kubernetes. When I was working on a managed Kubernetes service project, I got a chance to dive deep into etcd, one of the most critical components of Kubernetes that makes it so powerful and reliable. In this article, I will take a deep dive into etcd and explore its key features, its role in Kubernetes, and best practices for implementing it.

Source: Etcd: The Unsung Hero of Kubernetes.

kubernetes

65 (2023)

An astronaut crash lands on a mysterious planet only to discover he's not alone.

In the evening Adam, Alice, Esme, and I watched 65. The movie was OK and I give it a 6 out of 10.

Tue 25 Apr 2023

Nine ways to shoot yourself in the foot with PostgreSQL

The common thread linking most of these gotchas is scalability. They're things that won't affect you while your database is small. But if one day you want your database not to be small, it pays to think about them in advance. Otherwise they'll come back and bite you later, potentially when it's least convenient. Plus in many cases it's less work to do the right thing from the start than it is to change a working system to do the right thing later on.

Source: Nine ways to shoot yourself in the foot with PostgreSQL, an article by Phil Booth.

postgres

chatgpt-shell updates

About a month ago, I posted about an experiment to build a ChatGPT Emacs shell using comint mode. Since then, it's turned into a package of sorts, evolving with user feedback and pull requests.

Source: chatgpt-shell updates, an article by Álvaro Ramírez.

emacs

PostgreSQL Indexes Can Hurt You

The summary in simple words: Indexes are not cheap. There is a cost, and the cost can be manifold. Indexes are not always good, and sequential scans are not always bad, either. My humble advice is to avoid trying to improve individual queries as the first step because it is a slippery slope. A top-down approach to tuning the system yields better results, starting from tuning the host machine, operating system, PostgreSQL parameters, schema, etc. An objective “cost-benefit analysis” is important before creating an index.

Source: PostgreSQL Indexes Can Hurt You: Negative Effects and the Costs Involved, an article by Jobin Augustine.

postgres

Mon 24 Apr 2023

Why I use Nix and make(1) to develop

After some more thinking about this subject, it was clear to me that using make the way it is presented here might give you the following advantages:

You have a standard way to build your system that Non-Nix users can leverage

It’s relatively easier to reason about the build steps

ON THE OTHER HAND, this also gave me a really interesting idea that got me excited to try. What are the advantages of using Nix as a make replacement?

You have a single tool to manage dependencies and build steps

Source: Why I use Nix and make(1) to develop, an article by Victor Freire.

Using Nix with Dockerfiles

I've been using Nix for many years and recently started building Docker images using a Dockerfile paired with Nix. This post will explain the benefits of this approach along with a basic example to show how it looks and feels.

Source: Using Nix with Dockerfiles, an article by Mitchell Hashimoto.

Leverage the richness of HTTP status codes

If you’re not a REST expert, you probably use the same HTTP codes over and over in your responses, mostly 200, 404, and 500. If using authentication, you might perhaps add 401 and 403; if using redirects, 301 and 302; and that might be all. But the range of possible status codes is much broader than that and can improve semantics a lot. While many discussions about REST focus on entities and methods, using the correct response status codes can make your API stand out.

Source: Leverage the richness of HTTP status codes, an article by Nicolas Fränkel.
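
As a small illustration of codes beyond 200, 404 and 500, here is a sketch using the HTTPStatus enum from Python's standard library. The outcome names and the status_line helper are hypothetical and do not come from the article.

```python
from http import HTTPStatus

# Hypothetical API outcomes mapped to more specific status codes.
OUTCOME_TO_STATUS = {
    "created": HTTPStatus.CREATED,                 # 201: a new resource exists
    "queued": HTTPStatus.ACCEPTED,                 # 202: work will happen later
    "deleted": HTTPStatus.NO_CONTENT,              # 204: success, empty body
    "duplicate": HTTPStatus.CONFLICT,              # 409: clashes with current state
    "removed_permanently": HTTPStatus.GONE,        # 410: existed once, never again
    "rate_limited": HTTPStatus.TOO_MANY_REQUESTS,  # 429: client should slow down
}


def status_line(outcome: str) -> str:
    """Return the code and reason phrase for an outcome, defaulting to 500."""
    status = OUTCOME_TO_STATUS.get(outcome, HTTPStatus.INTERNAL_SERVER_ERROR)
    return f"{status.value} {status.phrase}"
```

Calling status_line("duplicate") gives "409 Conflict", which tells a client more than a generic 400 or 500 would.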

networking