Plurrrr

week 17, 2023

LLM Sandboxing: Early Lessons Learned

About two weeks ago, we launched our research project and text-based AI (sandbox) escape game Doublespeak.chat. We give OpenAI’s Large Language Model (LLM, A.K.A. ChatGPT) a secret to keep: its name. The player’s goal is to extract that secret name. We believe we'll never win the cat-and-mouse game, but we can all have fun trying!

This post details some lessons learned in the first two weeks since Doublespeak.chat's release.

Source: LLM Sandboxing: Early Lessons Learned, an article by Matt Hamilton.

Modern perfect hashing for strings

Looking up in a static set of strings is a common problem we encounter when parsing any textual formats. Such sets are often keywords of a programming language or protocol.

Parsing HTTP verbs appeared to be fastest when we used a compile-time trie: a series of nested switch statements. I could not believe that a perfect hash function would not be better, and that led to a novel hashing approach based on the instruction PEXT (Parallel Bits Extract).

Source: Modern perfect hashing for strings, an article by Wojciech Muła.
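To make the PEXT idea concrete, here is a toy Python illustration of the general approach, not Muła's actual scheme. It takes the first four bytes of the input and keeps only a handful of bit positions, chosen so that every keyword lands in its own table slot. This is the job PEXT does in a single instruction; here it is emulated in software. A final string comparison then confirms the match. The greedy mask search, the four-byte key, and all function names are my own assumptions. In C or C++ the extraction step would be the BMI2 `_pext_u32` intrinsic.

```python
# Toy illustration of PEXT-style perfect hashing; not Muła's actual scheme.
HTTP_VERBS = ["GET", "PUT", "POST", "HEAD", "DELETE",
              "PATCH", "OPTIONS", "CONNECT", "TRACE"]


def pext(value: int, mask: int) -> int:
    """Software stand-in for x86 BMI2 PEXT: gather the bits of `value`
    at the positions set in `mask` into the low bits of the result."""
    result, out_bit = 0, 0
    while mask:
        lowest = mask & -mask          # lowest set bit of the mask
        if value & lowest:
            result |= 1 << out_bit
        out_bit += 1
        mask &= mask - 1               # clear that bit and continue
    return result


def key_of(word: str) -> int:
    """First four bytes of the word, zero-padded, as a little-endian integer."""
    return int.from_bytes(word.encode("utf-8")[:4].ljust(4, b"\0"), "little")


def distinct_slots(keys, mask):
    return len({pext(k, mask) for k in keys})


def find_mask(words) -> int:
    """Greedily add bit positions until every word extracts to a unique slot."""
    keys = [key_of(w) for w in words]
    mask = 0
    while distinct_slots(keys, mask) < len(keys):
        candidates = [1 << b for b in range(32) if not (mask & (1 << b))]
        best = max(candidates, key=lambda bit: distinct_slots(keys, mask | bit))
        if distinct_slots(keys, mask | best) == distinct_slots(keys, mask):
            raise ValueError("the first four bytes do not separate these words")
        mask |= best
    return mask


MASK = find_mask(HTTP_VERBS)
TABLE = [None] * (1 << bin(MASK).count("1"))   # one slot per extracted bit pattern
for verb in HTTP_VERBS:
    TABLE[pext(key_of(verb), MASK)] = verb


def is_http_verb(word: str) -> bool:
    # One bit extraction picks the only possible candidate; one comparison confirms it.
    return TABLE[pext(key_of(word), MASK)] == word
```

For example, `is_http_verb("PATCH")` is `True`, while `is_http_verb("PATCHES")` is `False`: it lands in the PATCH slot, but the final comparison rejects it. The mask is searched once at startup here. In a compiled implementation it would be a precomputed constant.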

Fijn Weekend (2023)

A close-knit group of friends gathers in Burgundy, France, to finally scatter the ashes of their late friend. When the widow unexpectedly brings her new boyfriend, the weekend takes a turn, and all friendships and relationships are put to the test.

In the evening Esme and I watched Fijn Weekend. I liked the movie somewhat and give it a 6 out of 10.

Some remarks on Large Language Models

I assume you have heard of ChatGPT, maybe played with it a little, and were impressed by it (or tried very hard not to be). And that you also heard that it is "a large language model". And maybe that it "solved natural language understanding". Here is my short, personal perspective on this model (and similar ones), and on where we stand with respect to language understanding.

Source: Some remarks on Large Language Models, an article by Yoav Goldberg.

Nope (2022)

The residents of a lonely gulch in inland California bear witness to an uncanny and chilling discovery.

In the evening Adam and I watched Nope. To me the movie had a slow start, but it got better, and in the end I liked it. I give it a 7 out of 10.

Beautiful Branchless Binary Search

I read a blog post by Alex Muscar, “Beautiful Binary Search in D”. It describes a binary search called “Shar’s algorithm”. I’d never heard of it and it’s impossible to google, but looking at the algorithm I couldn’t help but think “this is branchless.” And who knew that there could be a branchless binary search? So I did the work to translate it into an algorithm for C++ iterators, no longer requiring one-based indexing or fixed-size arrays.

Source: Beautiful Branchless Binary Search, an article by Malte Skarupke.
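Below is a Python transliteration of the idea, based on my reading of the approach rather than a copy of Skarupke's C++ code. The first probe sits at the largest power of two not exceeding n, and it may move the search window to the tail of the array. After that, every step halves. The comparison result is multiplied into the offset instead of being branched on. Python still branches internally, so this only shows the shape of the algorithm. The names `shar_lower_bound`, `bit_floor`, and `bit_ceil` are my own choices; C++20 provides `std::bit_floor` and `std::bit_ceil` in `<bit>`.

```python
def bit_floor(n: int) -> int:
    """Largest power of two <= n (n >= 1)."""
    return 1 << (n.bit_length() - 1)


def bit_ceil(n: int) -> int:
    """Smallest power of two >= n (n >= 1)."""
    return 1 << (n - 1).bit_length()


def shar_lower_bound(a, value) -> int:
    """Index of the first element of sorted `a` that is not less than `value`."""
    n = len(a)
    if n == 0:
        return 0
    begin = 0
    step = bit_floor(n)
    # One-off check: if everything up to index `step` is too small,
    # restart on a power-of-two window aligned to the end of the array.
    if step != n and a[step] < value:
        rest = n - step - 1
        if rest == 0:
            return n
        step = bit_ceil(rest)
        begin = n - step
    step //= 2
    while step != 0:
        # The comparison (0 or 1) scales the jump instead of choosing a branch.
        begin += step * (a[begin + step] < value)
        step //= 2
    return begin + (a[begin] < value)


print(shar_lower_bound([1, 3, 3, 5], 3))  # prints 1
```

Every iteration of the main loop does the same work no matter how the comparison turns out. That fixed shape is what lets a C++ compiler emit a conditional move instead of a jump.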

Pinging Locations

So how does an attacker find a system to attack? One approach is to send out a ping packet to every IPv4 address and listen for the reply (pong). If there's a reply, then there is a server and they can queue up the address for a subsequent attack. A single host with a gigabit connection can scan all of the IPv4 range in a few hours. (There are some scanning techniques that can go even faster, finishing in minutes.)

Source: Pinging Locations, an article by Neal Krawetz.
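A back-of-the-envelope check of the gigabit figure follows, under loudly stated assumptions. Each ping probe occupies a minimum-size Ethernet frame (64 bytes) plus 8 bytes of preamble and a 12-byte inter-frame gap. The link is full duplex, so replies do not compete with probes. There is no other overhead. Under these assumptions, line rate is a floor on the sweep time; any slower sending rate stretches it toward hours. The 300,000 probes/s figure is an arbitrary example rate, not a measurement.

```python
IPV4_ADDRESSES = 2 ** 32                 # whole address space, reserved ranges included
LINK_BITS_PER_SECOND = 1_000_000_000     # "a gigabit connection"

# Assumption: each probe is a minimum-size Ethernet frame (64 bytes)
# plus 8 bytes of preamble and a 12-byte inter-frame gap on the wire.
WIRE_BYTES_PER_PROBE = 64 + 8 + 12


def probes_per_second(link_bps=LINK_BITS_PER_SECOND, wire_bytes=WIRE_BYTES_PER_PROBE):
    return link_bps / (wire_bytes * 8)


def scan_seconds(rate_pps):
    return IPV4_ADDRESSES / rate_pps


line_rate = probes_per_second()          # about 1.49 million probes per second
print(f"{line_rate:,.0f} probes/s, full sweep in {scan_seconds(line_rate) / 60:.0f} minutes")
print(f"at 300,000 probes/s: {scan_seconds(300_000) / 3600:.1f} hours")
```

This prints `1,488,095 probes/s, full sweep in 48 minutes` and `at 300,000 probes/s: 4.0 hours`.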

Performance Excuses Debunked

Whenever I point out that a common software practice is bad for performance, arguments ensue. That’s good! People should argue about these things. It helps illuminate both sides of the issue. It’s productive, and it leads to a better understanding of how software performance fits into the priorities of our industry.

What's not good is that some segments of the developer community don’t even want to have discussions, let alone arguments, about software performance. Among certain developers, there is a pervasive attitude that software simply doesn't have performance concerns anymore. They believe we are past the point in software development history where anyone should still be thinking about performance.

Source: Performance Excuses Debunked, an article by Casey Muratori.

Tetris (2023)

The story of how one of the world's most popular video games found its way to players around the globe. Businessman Henk Rogers and Tetris inventor Alexey Pajitnov join forces in the USSR, risking it all to bring Tetris to the masses.

In the evening Adam, Alice, Esme, and I watched Tetris. I liked the movie a lot and give it an 8 out of 10.

Why is OAuth still hard in 2023?

You might conclude that, armed with a client library, you would be able to implement OAuth for any API in about 10 minutes. Or at least in an hour.

If you manage, please email us — we’d like to treat you to a delicious dinner and hear how you did it.

Source: Why is OAuth still hard in 2023?, an article by Robin Guldener.

More thoughts on a bootstrappable GHC

The bootstrappable builds project tries to find ways of building all our software from source, without relying on binary artifacts. A noble goal, and one that is often thwarted by languages with self-hosting compilers, like GHC: In order to build GHC, you need GHC. A Pull Request against nixpkgs, adding first steps of the bootstrapping pipeline, reminded me of the issue with GHC, about which I have noted down some thoughts before, so I played around with it a bit more.

Source: More thoughts on a bootstrappable GHC, an article by Joachim Breitner.

Etcd: The Unsung Hero of Kubernetes

I have worked on various projects that involve container orchestration using Kubernetes. While working on a managed Kubernetes service project, I got a chance to dive deep into etcd, one of the most critical components of Kubernetes and a big part of what makes it so powerful and reliable. In this article, I will take a deep dive into etcd and explore its key features, its role in Kubernetes, and best practices for implementing it.

Source: Etcd: The Unsung Hero of Kubernetes.

65 (2023)

An astronaut crash lands on a mysterious planet only to discover he's not alone.

In the evening Adam, Alice, Esme, and I watched 65. The movie was OK and I give it a 6 out of 10.

Nine ways to shoot yourself in the foot with PostgreSQL

The common thread linking most of these gotchas is scalability. They're things that won't affect you while your database is small. But if one day you want your database not to be small, it pays to think about them in advance. Otherwise they'll come back and bite you later, potentially when it's least convenient. Plus, in many cases it's less work to do the right thing from the start than it is to change a working system to do the right thing later on.

Source: Nine ways to shoot yourself in the foot with PostgreSQL, an article by Phil Booth.

PostgreSQL Indexes Can Hurt You

The summary in simple words: Indexes are not cheap. There is a cost, and the cost can be manifold. Indexes are not always good, and sequential scans are not always bad, either. My humble advice is to avoid starting with individual query improvements, because it is a slippery slope. A top-down approach to tuning the system yields better results: start with the host machine, then the operating system, PostgreSQL parameters, schema, etc. An objective “cost-benefit analysis” is important before creating an index.

Source: PostgreSQL Indexes Can Hurt You: Negative Effects and the Costs Involved, an article by Jobin Augustine.

Why I use Nix and make(1) to develop

After some more thinking about this subject, it was clear to me that using make the way it is presented here might give you the following advantages:

  1. You have a standard way to build your system that non-Nix users can leverage
  2. It’s relatively easier to reason about the build steps

ON THE OTHER HAND, this also gave me a really interesting idea that got me excited to try. What are the advantages of using Nix as a make replacement?

  1. You have a single tool to manage dependencies and build steps

Source: Why I use Nix and make(1) to develop, an article by Victor Freire.

Leverage the richness of HTTP status codes

If you’re not a REST expert, you probably use the same HTTP codes over and over in your responses, mostly 200, 404, and 500. If you use authentication, you might add 401 and 403; if you use redirects, 301 and 302; and that might be all. But the range of possible status codes is much broader than that, and using it can improve semantics a lot. While many discussions about REST focus on entities and methods, using the correct response status codes can make your API stand out.

Source: Leverage the richness of HTTP status codes, an article by Nicolas Fränkel.
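Here is a minimal sketch of returning more specific codes, built on Python's standard `http.HTTPStatus` enum. It assumes a hypothetical in-memory `ItemStore` that does not come from the article. The store returns 201 when a resource is created, 204 when an update or delete succeeds with nothing to return, 409 when an id is already taken, and 412 when the client's If-Match version is stale. Real If-Match headers carry ETag strings; a plain version number stands in for one here.

```python
from http import HTTPStatus


class ItemStore:
    """Hypothetical in-memory resource store; each item carries a version
    number that plays the role of an ETag."""

    def __init__(self):
        self.items = {}  # item_id -> (version, body)

    def create(self, item_id, body):
        if item_id in self.items:
            return HTTPStatus.CONFLICT             # 409: id already taken
        self.items[item_id] = (1, body)
        return HTTPStatus.CREATED                  # 201: a new resource exists

    def update(self, item_id, body, if_match=None):
        if item_id not in self.items:
            return HTTPStatus.NOT_FOUND            # 404
        version, _ = self.items[item_id]
        if if_match is not None and if_match != version:
            return HTTPStatus.PRECONDITION_FAILED  # 412: client edited a stale copy
        self.items[item_id] = (version + 1, body)
        return HTTPStatus.NO_CONTENT               # 204: success, nothing to return

    def delete(self, item_id):
        if self.items.pop(item_id, None) is None:
            return HTTPStatus.NOT_FOUND            # 404
        return HTTPStatus.NO_CONTENT               # 204
```

Because `HTTPStatus` is an `IntEnum`, each value compares equal to its number (`HTTPStatus.PRECONDITION_FAILED == 412`), so a web framework can pass it straight through as the response code.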