Code that performs side effects is difficult to test because we need to
figure out how to sandbox the effects so we can observe the state of
the sandbox before and after executing the effectful code. The
difficulty is increased when the side effectful code also depends on
specific OS configurations. Let us explore my solution to such a
predicament.
DockerSlim is a tool for developers that provides a set of commands
(build, xray, lint and others) to simplify and optimize your
developer experience with containers. It makes your containers
better, smaller and more secure.
Every so often I get an email from someone starting out in web
development who asks something along these lines: “What do you use
to create your website, benhoyt.com? Do you use a Content
Management System? What theme do you use?”
I generally reply with a brief response, saying how I like to keep
it simple: I use my text editor to write Markdown files, test
locally using the Jekyll static site generator, and then push them
live to GitHub Pages using a Git tool. I don’t use a fancy “theme”,
just a simple layout I created using a few dozen lines of HTML and
CSS.
The untold story of one twelve-year-old's dream to become the
world's greatest supervillain.
In the early evening Alice, Adam, my mother-in-law, and I went to
Delft to watch the 3D version of Minions: The Rise of
Gru. I liked the movie and
give it a 7 out of 10.
How can you determine whether a JavaScript native function was
overridden? You can’t — or at least not reliably. There are ways to
get close to it, but you can’t fully trust them.
A semantic network or net is a graph structure for representing
knowledge in patterns of interconnected nodes and arcs. Computer
implementations of semantic networks were first developed for
artificial intelligence and machine translation, but earlier
versions have long been used in philosophy, psychology, and
linguistics. The Giant Global Graph of the Semantic Web is a large
semantic network (Berners-Lee et al. 2001; Hendler & van Harmelen
2008).
I’ll reiterate what I wrote in the previous article’s introduction:
You should avoid these worst practices—and eliminate them when you
maintain or refactor existing code. And, of course, resolve them if
you see these issues during a code review.
There are lots of posts trying to show how simple it is to get
started with Kubernetes. But many of these posts use complicated
Kubernetes jargon for that, so even those with some prior
server-side knowledge might be bewildered. Let me try something
different here. Instead of explaining one unfamiliar matter (how to
run a web service in Kubernetes?) with another (you just need a
manifest, with three sidecars and a bunch of gobbledygook), I'll
try to reveal how Kubernetes is actually a natural development of
the good old deployment techniques.
In this article, I will introduce you to the concept of kinds. Then,
we’ll use our newfound knowledge to understand what
higher-kinded types are and what makes them useful.
You can use org-mode as a notebook, something like a Jupyter
notebook, but much simpler. An org file is a plain text file, and
you can execute embedded code right there in your editor. You don’t
need a browser, and there’s no hidden state.
I've long been an enthusiastic user of print based debugging,
although I did eventually realize that I reach for a debugger when
dealing with certain sorts of bugs. But
print based debugging is eternally controversial, with any number of
people ready to tell you that you should use a debugger instead and
that you're missing out by not doing so. Recently I had a thought
about that and how it interacts with how much programming people do.
This article is about how to filter unique items from heterogeneous
lists on the type level in Haskell. This example, without further
context, might look a bit esoteric by itself, but I learned a lot
writing it and wanted to share the experience.
Sometimes, while working on macOS, you may find the need to test
something quick on Linux, or use some utility that's only available
on this OS. But, of course, you don't want to go through the whole
process of creating a VM from scratch.
The good news is, you don't need to! Using krunvm you can create and
start a microVM from a regular container image (that is, an OCI
image), in just two commands and a couple of seconds.
I recently started using the Nix Package Manager on macOS and the
process has been painful. In
this post, I’m going to write down how I’m currently using Nix on
macOS with the Zsh shell.
This post introduces some patterns and tricks to better utilise
Rust's type system for clean and safe code.
This post is on the advanced side and in general there are no
absolutes - these patterns usually need to be evaluated on a
case-by-case basis to see if the cost / benefit trade-off is worth
it.
asdf is a tool version manager. All tool version definitions are
contained within one file (.tool-versions) which you can check in
to your project's Git repository to share with your team, ensuring
everyone is using the exact same versions of tools.
The old way of working required multiple CLI version managers, each
with their distinct API, configuration files and implementation
(e.g. $PATH manipulation, shims, environment variables,
etc...). asdf provides a single interface and configuration file
to simplify development workflows, and can be extended to all tools
and runtimes via a simple plugin interface.
macOS has a wonderful input mechanism where you press and hold a key
on your keyboard to display the accent menu. It's easy to
internalize: long press "a" if you want to input "á".
Fifteen years ago, writing Lisp code in Vim was an odd
adventure. There were no good plugins for Vim that assisted in
structured editing of Lisp s-expressions or allowed interactive
programming by embedding a Lisp Read-Eval-Print-Loop (REPL) or a
debugger within the editor. The situation is much better now. In
the last ten years, we have seen active development of two Vim
plugins named Slimv and
Vlime. Slimv is over 10 years
old now. Vlime is more recent and less than 3 years old right
now. Both support interactive programming in Lisp.
I am going to discuss and compare both Slimv and Vlime in this
article. I will show how to get started with both plugins and
introduce some of their basic features.
However, not every hash algorithm is appropriate in all of these
scenarios, and in fact, very few algorithms are usable in more than
a couple of situations. Even worse, using the wrong algorithm will
lead in the best case scenario to performance problems, but in the
worst case scenario to security issues and even financial
loss. Thus, knowing which algorithm to pick for which application is
crucial.
Therefore I'll try to summarize how I approach the topic of hashing,
including use-cases, recommended algorithms, and links to other
articles.
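To make the distinction concrete, here is a small Python sketch of two
use-cases that call for different algorithms. The choice of SHA-256 for
integrity checks and PBKDF2 for password storage is an assumption made
for illustration, using what ships in Python's hashlib; it is not taken
from the article's recommendations. The function names are hypothetical
and the iteration count is a placeholder that would need tuning.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def checksum(data: bytes) -> str:
    """Fast cryptographic hash: suitable for detecting changed content."""
    return hashlib.sha256(data).hexdigest()


def hash_password(
    password: str,
    salt: Optional[bytes] = None,
    iterations: int = 200_000,  # placeholder value, tune for your hardware
) -> Tuple[bytes, bytes]:
    """Deliberately slow, salted derivation: suitable for stored passwords."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, digest


def verify_password(
    password: str, salt: bytes, expected: bytes, iterations: int = 200_000
) -> bool:
    """Recompute the derivation and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)
```

Using the fast checksum function for passwords would make brute-force
guessing cheap, while using the slow derivation for file checksums would
only waste time; that is the kind of mismatch the paragraph above warns
about.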
So imagine you need to get multiple files and folders from an
API. One option for doing so is to get all the file names and
request them from whatever file server you are using. This is
terrible; don't do this. The optimal way is to bundle the entire
directory into a compressed format and distribute that one
file. Okay great, say you needed these files in an iOS/iPadOS or
macOS application. That means you will need to decompress the files
that you received in Swift.
Timsort is a sorting algorithm that is efficient for real-world data
and not created in an academic laboratory. Tim Peters created
Timsort for the Python programming language in 2001. Timsort first
analyses the list it is trying to sort and then chooses an approach
based on the analysis of the list.
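As a rough illustration of "analyse first, then choose an approach",
the Python sketch below detects stretches of the list that are already
ordered (runs) and then merges them. It is a heavily simplified
stand-in, not Tim Peters's implementation: real Timsort also enforces a
minimum run length using binary insertion sort, keeps a stack of
pending runs, and uses galloping merges. The names find_runs, merge and
run_merge_sort are made up for this example.

```python
def find_runs(items):
    """Split items into ordered runs; strictly descending runs are reversed."""
    runs = []
    i, n = 0, len(items)
    while i < n:
        j = i + 1
        if j < n and items[j] < items[i]:
            # Strictly descending stretch: reversing it keeps the sort stable.
            while j < n and items[j] < items[j - 1]:
                j += 1
            runs.append(items[i:j][::-1])
        else:
            # Non-decreasing stretch: already in order.
            while j < n and items[j] >= items[j - 1]:
                j += 1
            runs.append(items[i:j])
        i = j
    return runs


def merge(left, right):
    """Stable merge of two sorted lists."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def run_merge_sort(items):
    """Sort by merging the natural runs pairwise until one run remains."""
    runs = find_runs(list(items))
    if not runs:
        return []
    while len(runs) > 1:
        runs = [
            merge(runs[k], runs[k + 1]) if k + 1 < len(runs) else runs[k]
            for k in range(0, len(runs), 2)
        ]
    return runs[0]
```

On input that is already sorted, find_runs returns a single run and no
merging happens at all, which hints at why run detection pays off on
real-world data.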
If you’ve been running PostgreSQL for a while, you’ve heard about
autovacuum. Yes, autovacuum, the thing which everybody asks you not
to turn off, which is supposed to keep your database clean and
reduce bloat automatically.
And yet—imagine this: one fine day, you see that your database size
is larger than you expect, the I/O load on your database has
increased, and things have slowed down without much change in
workload. You begin looking into what might have happened. You run
the excellent Postgres bloat query and you notice you have a lot of
bloat. So you run the VACUUM command
manually to clear the bloat in your Postgres database. Good!
But then you have to address the elephant in the room: why didn’t
Postgres autovacuum clean up the bloat in the first place…? Does the
above story sound familiar? Well, you are not alone. 😊
Old Tom Bombadil. Possibly the least liked character in The Lord of
the Rings. A childish figure so disliked by fans of the book that
few object to his absence from all adaptations of the story. And
yet, there is another way of looking at Bombadil, based only on what
appears in the book itself, that paints a very different picture of
this figure of fun.
What do we know about Tom Bombadil? He is fat and jolly and smiles
all the time. He is friendly and gregarious and always ready to help
travellers in distress.
How many times have you been aware of text's different shapes and
sizes while browsing the web lately? Probably not many, unless you
found an extremely uncomfortable typography that pushed you to
quickly flee the website.
Typography is a silent tool that UX designers and developers can
sometimes take for granted. There is much noise around this
topic. Pixels? Are breakpoints enough to switch sizes across
devices? Do we even need breakpoints at all?
Let’s find out about a few key concepts to succeed at responsive
and accessible typography as a front-end developer or as a UX
designer.
In this document, we propose adding a new function called move to
the Swift standard library, which ends the lifetime of a specific
local let, local var, or consuming function parameter, and
which enforces this by causing the compiler to emit a diagnostic
upon any uses that are after the move function. This allows for code
that relies on forwarding ownership of values for performance or
correctness to communicate that requirement to the compiler and to
human readers.
How do you create a Python package? How do you set up automated
testing and code coverage? How do you publish the package? That's
what this article teaches you.
This constrained size means that SQLite doesn't include every bell
and whistle. It's careful to include the 95% of what you need in a
database—strong SQL support, transactions, windowing functions,
CTEs, etc—without cluttering the source with more esoteric
features. This limited feature set also means the structure of the
database can stay simple and makes it easy for anyone to understand.
I accidentally stumbled upon something yesterday that I felt like
sharing, which fell squarely into the "why the hell didn’t I know
about this before?" category. In this post, I’ll describe how to
manage the various configuration files in your GNU/Linux home
directory (aka "dotfiles" like .bashrc) using GNU Stow.
Vectorization in Python, as implemented by NumPy, can give you
faster operations by using fast, low-level code to operate on bulk
data. And Pandas builds on NumPy to provide similarly fast
functionality. But vectorization isn’t a magic bullet that will
solve all your problems: sometimes it will come at the cost of
higher memory usage, sometimes the operation you need isn’t
supported, and sometimes it’s just not relevant.
Y’know that situation where you tell the client, “Here’s your
website and you can edit those four (4) little homepage features in
the CMS” and the client says “Okay okay okay” and you check the site
a week later and it looks bad because the client —despite your
incredible documentation— put an odd number of items in the feature
grid? It’s a major minor problem that’s tough to explain to the
client, but it all comes down to…
Many organizations that adopt Docker or an adjacent containerization
technology find it increases efficiency and accelerates the development
process. Docker’s not something that magically improves every system
though. In this article, we’ll look at some scenarios where moving
to containers might be more of a hindrance than a help.
In order to effectively write applications that communicate via
sockets, there were some realizations I needed to make that weren't
explicitly told to me by any of the documentation I read.
If you have experience writing applications using sockets, all of
this information should be obvious to you. It wasn't obvious to me
as an absolute beginner, so I'm trying to make it more explicit in
the hopes of shortening another beginner's time getting their feet
wet with sockets.
Mathematicians are a peculiar people. We live in our own little
world, studying esoteric ideas that may or may not have much
connection to the real world. One might wonder whether the bulk of
what we study is actually useful. It’s true, after all, that we
often pursue ideas not because there’s an immediate application, but
simply because they’re interesting. By and large, it seems we aren’t
overly concerned about immediate real-world application of our
results.
Don’t start campaigning to cut math funding just yet, though. Math
that doesn’t have applications today may very quickly become
extremely important, even becoming integral to our way of life!
Software is written for people to understand; variable names
should be chosen accordingly. People need to comb through your code
and understand its intent in order to extend or fix it. Too often,
variable names waste space and hinder comprehension. Even
well-intentioned engineers often choose names that are, at best,
only superficially useful. This document is meant to help engineers
choose good variable names. It artificially focuses on code reviews
because they expose most of the issues with bad variable
names. There are, of course, other reasons to choose good variable
names (such as improving code maintenance).
In many projects, the approach to dates is quite nonchalant.
People do as they want. When on-premise systems were king, the
common problem was that it was hard to know precisely when something
happened. The consistency of the configuration depended on how
meticulous ops people were. It wasn’t shocking to find out that the
server had a different time zone, the application had a different
one, and the user had a different time zone. At one point, the
development community found a compromise that “maybe we would use
the same time zone everywhere, for instance UTC”.
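A minimal sketch of that compromise in Python, assuming timestamps are
converted to UTC at the edge of the system and converted back to a
local zone only for display. The offsets, the date and the helper
to_utc are invented for illustration.

```python
from datetime import datetime, timedelta, timezone


def to_utc(moment: datetime) -> datetime:
    """Normalize an aware datetime to UTC; refuse to guess for naive ones."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: the time zone is unknown")
    return moment.astimezone(timezone.utc)


# Hypothetical zones: a server at UTC+2 and a user at UTC-5.
server_tz = timezone(timedelta(hours=2))
user_tz = timezone(timedelta(hours=-5))

# The event is recorded in the server's local time...
event_on_server = datetime(2022, 7, 1, 14, 30, tzinfo=server_tz)

# ...stored once, in UTC...
stored = to_utc(event_on_server)

# ...and converted to the user's zone only when shown.
shown_to_user = stored.astimezone(user_tz)
```

Rejecting naive datetimes is a design choice in this sketch: it forces
the "which time zone is this?" question to be answered where the value
is created, rather than discovered later.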
In 2019, Netcraft found 74.2% of web-facing machines run
Linux. During an IPv4-wide census in 2016, an OpenSSH banner was
detected 75% of the time when there was a response on TCP
port 22. It's safe to say OpenSSH is probably the world's most
popular software for connecting to servers remotely. It's also one
of the most prized attack vectors given the functionality offered to
anyone able to connect.
Hardening the security aspects of an OpenSSH configuration is very
challenging. It's even worse for teams that aren't focused on
network security and can't justify the budget for consultants
setting up bespoke systems.
Source: Hardening SSH, an article by Mark Litwintschik.
When your Celery tasks are too slow, and you want them to run
faster, you usually want to find and then fix the performance
bottleneck. It’s true, you can architect a solution where slow tasks
don’t impact faster ones, and you may sometimes need to. But if you
can manage to make all your tasks fast, that is ideal.
In this post we’re going to see how we can stitch together a few
libraries to make a unit-aware queryable data frame from a CSV using
extensible records. By the end of this text, we’ll be able to parse
a CSV of data from the periodic table, complete with the correct
units, and able to quickly ask questions about our data set using
the generated indices.
In my time pretending to be an engineer and working with git at
Twitter, I’ve seen an interesting behavior pop up
intermittently. People start complaining about git-push being
slow. This particular issue becomes hard to diagnose, especially
since the pandemic because we can’t be certain of the quality of
connection being used, and optimizations to git-push have always
taken a back seat to all the other changes we’ve done to git
internally. But it has persisted long enough that it needed some
deeper diving into, and the intermittent nature always fascinated
me. Let’s talk about the problem a little more.
Yesterday, in the early evening, I noticed that the Chromatopelma
cyaneopubescens I keep had molted. And today, because I could guide
it carefully into a different position, I took a few photos.
Freshly molted Chromatopelma cyaneopubescens.
In the photo above you can see why this tarantula has the common name
green bottle blue tarantula or GBB for short.
Today we’ll look at ways of productionizing the toy program from the
previous post. Our primary goal here is allowing the user to select
various statistics, computing just what the user has selected to
compute. We’ll try to do this in a modular and composable way,
striving to isolate each statistic into its own unit of some sort.
I've been working with Python typing annotations for the last few
years as part of our main product at Flare Systems. I've found them to
be a wonderful tool to support refactoring and make the code more
readable. Lately, I explored how we can make APIs safer with the use
of types. I will specifically look at how we can use Python
typing annotations to make os.system foolproof.
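One way this could look, as a sketch rather than the article's actual
approach: a distinct type marks strings that are safe to hand to the
shell, so a type checker such as mypy flags any attempt to pass a plain
str. ShellSafe, quote, build_command and safe_system are hypothetical
names; only shlex.quote and os.system come from the standard library.
Note that NewType is enforced by the type checker only, not at runtime.

```python
import os
import shlex
from typing import NewType

# A str that has been vetted for the shell. Hypothetical type for this sketch.
ShellSafe = NewType("ShellSafe", str)


def quote(untrusted: str) -> ShellSafe:
    """Escape arbitrary input so the shell treats it as a single argument."""
    return ShellSafe(shlex.quote(untrusted))


def build_command(program: ShellSafe, *args: ShellSafe) -> ShellSafe:
    """Join already-safe pieces into one command line."""
    return ShellSafe(" ".join((program, *args)))


def safe_system(command: ShellSafe) -> int:
    """Wrapper around os.system that only accepts ShellSafe in its signature."""
    return os.system(command)


# Trusted literals are wrapped by hand; untrusted input must go through quote().
user_input = "report.txt; rm -rf /"
command = build_command(ShellSafe("ls"), ShellSafe("-l"), quote(user_input))

# A type checker would reject this call, because a plain str is not ShellSafe:
# safe_system("ls -l " + user_input)
```

The weak point of this design is the explicit ShellSafe(...) wrapper,
which anyone can call; it works as a convention that makes unsafe
strings stand out in review, not as a hard guarantee.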