IPinfo builds and sells IPv4 and IPv6 address
metadata, available via API, file download, or as a dataset. Given
an IP address, it'll offer that IP's physical location and ownership
information. You can also see whether it's used as a VPN or Tor
endpoint, whether it's owned by a hosting company, and which domain
names have been pointed at it.
Understanding and modeling uncertainty surrounding a machine
learning prediction is of critical importance to any production
model. It provides a handle to deal with cases where the model
strays too far away from its domain of applicability, into
territories where using the prediction would be inaccurate or
downright dangerous. Think medical diagnosis or self-driving cars.
10 years ago, systemd was announced and swiftly rose to become one
of the most persistently controversial and polarizing pieces of
software in recent history, and especially in the GNU/Linux
world. The quality and nature of debate has not improved in the
least from the major flame wars around 2012-2014, and systemd still
remains poorly understood and understudied from both a technical and
social level despite paradoxically having disproportionate levels of
attention focused on it.
If you're familiar with Python, you probably like Rust's ranges a
lot. They're generally tidy, are lots more concise than writing out
range(...) all the time, and are a ton better than magic syntax
for slicing (thanks for that one, Guido).
Unfortunately, the redeeming qualities of Rust's range types stop
there. Behind a friendly face lurks what is perhaps the single
biggest collection of infuriating design choices in Rust's entire
standard library.
One of the things that makes DNS difficult to understand is that
it’s decentralized. There are thousands (maybe hundreds of
thousands? I don’t know!) of authoritative nameservers, and they’re
running lots of different software! All of these servers running
different software means that there’s a lot of inconsistency
in how DNS works, which can cause all kinds of frustrating problems.
When faced with a situation where you're writing code that should
work across a few different kinds of values without knowing what
they are ahead of time, Rust asks slightly more of you than many
languages do. Dynamic languages will let you pass in anything, of
course, as long as the code works when it's run. Java/C# would ask
for an interface or a superclass. Duck-typed languages like Go or
TypeScript would want some structural type: an object type with a
particular set of properties, for instance.
Rust is different. In Rust there are three main approaches for
handling this situation, and each has its own advantages and
disadvantages.
Reflection in Swift allows us to use the Mirror API to inspect and
manipulate arbitrary values at runtime. Even though Swift puts a lot
of emphasis on static typing, reflection gives us more flexibility
and control over types than you might expect.
In order for one language to cooperate with another usefully via
embedded programs in this way, data of some sort needs to be passed
between them at runtime, and here there are a few traps with syntax
that may catch out unwary shell programmers. We’ll go through a
simple example showing the problems, and demonstrate a few potential
solutions.
It is clear that most of the world has decided that they want to use
JSON for their public-facing API endpoints. However, most of the
time you will need to deal with storage engines that don't deal with
JSON very well. This can be confusing to deal with because you need
to fit a square peg into a round hole.
However, SQLite added JSON functions to allow you to munge and
modify JSON data in whatever creative ways you want. You can use
these together with SQLite triggers to automatically massage JSON
into whatever kind of tables you want. Throw in upserts and you'll
be able to make things even more automatic.
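As a rough sketch of the trigger idea in Python's standard library (the `events_json`/`events` schema here is made up for illustration, and it assumes the bundled SQLite was built with the JSON functions, as modern builds are):

```python
import sqlite3

# Raw JSON lands in events_json; a trigger fans each insert out into
# a plain, queryable events table.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE events_json (body TEXT);
CREATE TABLE events (kind TEXT, user TEXT);
CREATE TRIGGER events_fanout AFTER INSERT ON events_json BEGIN
  INSERT INTO events (kind, user)
  VALUES (json_extract(NEW.body, '$.kind'),
          json_extract(NEW.body, '$.user'));
END;
""")
db.execute("INSERT INTO events_json VALUES (?)",
           ('{"kind": "login", "user": "mara"}',))
print(db.execute("SELECT kind, user FROM events").fetchall())
# → [('login', 'mara')]
```

The application only ever inserts JSON blobs; the tables stay in sync on their own.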
In this edition of Napkin Math, we'll invoke the spirit of the
Napkin Math series to establish a mental model for how a neural
network works by building one from scratch. In a future issue we
will do napkin math on performance, as establishing the
first-principles understanding is plenty of ground to cover for one
issue.
async was controversial from its inception; it’s still
controversial today; and in this post I am throwing my own 2 cents
into this controversy, in defense of the feature. I am only going to
try to counter one particular line of criticism here, and I don’t
anticipate I’ll cover all the nuance of it – this is a multifaceted
issue, and I have a day job. I am also going to assume for this post
that you have some understanding of how async works, but if you
don’t, or just want a refresher, I heartily recommend the Tokio
tutorial.
If you have ever written Go, the size of the resulting binaries can
hardly have escaped your attention. Of course, in the age of
gigabit links and terabyte drives, this shouldn’t be a big
problem. Still, there are situations when you want the size of the
binary to be as small as possible, and at the same time you do not
want to part with Go.
Profiling is an integral part of any code and performance
optimization. Any experience and skill in performance optimization
that you might already have will not be very useful if you don't
know where to apply it. Therefore, finding bottlenecks in your
applications can help you solve performance issues quickly with very
little overall effort.
In this article we will look at the tools and techniques that can
help us narrow down our focus and find bottlenecks both for CPU and
memory consumption, as well as how to implement easy (almost
zero-effort) solutions to performance issues in cases where even
well targeted code changes won't help anymore.
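To make the starting point concrete, here is a minimal sketch of first-pass CPU profiling with Python's built-in cProfile (`slow_sum` is just an illustrative stand-in for real application code, not something from the article):

```python
import cProfile
import io
import pstats

def slow_sum(n):
    # Deliberately wasteful: the int/str round-trip dominates the profile.
    total = 0
    for i in range(n):
        total += int(str(i))
    return total

profiler = cProfile.Profile()
profiler.enable()
slow_sum(100_000)
profiler.disable()

# Render the top 5 entries by cumulative time.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

The cumulative-time view is usually the fastest way to spot which call tree deserves attention first.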
The latest batch of language models can be much smaller yet achieve
GPT-3 like performance by being able to query a database or search
the web for information. A key implication is that building larger
and larger models is not the only way to improve performance.
We’re in a golden age of merging AI and neuroscience. No longer tied
to conventional publication venues with year-long turnaround times,
our field is moving at record speed. As 2021 draws to a close, I
wanted to take some time to zoom out and review a recent trend in
neuro-AI, the move toward unsupervised learning to explain
representations in different brain areas.
I recently came up with what I think is an intuitive way to explain
Bayes’ Theorem. I searched around for a while and could not find
any article that explains it in this particular way. Of course
there’s the Wikipedia page, that long article by Yudkowsky, and a
bunch of other explanations and tutorials. But none of them have any
pictures. So without further ado, and with all the chutzpah I can
gather, here goes my explanation.
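To pin the theorem down numerically first, here is the standard worked example with made-up numbers (a 1% base rate, a 99%-sensitive test with a 5% false-positive rate):

```python
# P(condition | positive) = P(positive | condition) * P(condition) / P(positive)
p_condition = 0.01                # base rate in the population
p_pos_given_condition = 0.99      # sensitivity
p_pos_given_healthy = 0.05        # false-positive rate

# Total probability of a positive test, summed over both groups.
p_pos = (p_pos_given_condition * p_condition
         + p_pos_given_healthy * (1 - p_condition))

p_condition_given_pos = p_pos_given_condition * p_condition / p_pos
print(round(p_condition_given_pos, 3))
# → 0.167: even after a positive result, the condition is still unlikely.
```

The surprise that 99% accuracy yields only a ~17% posterior is exactly what a good picture of Bayes' Theorem should make obvious.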
The need for signed integer arithmetic is often misplaced as most
integers never represent negative values within a program. The
indexing of arrays and iteration count of a loop reflects this
concept as well. There should be a propensity to use unsigned
integers more often than signed, yet despite this, most code
incorrectly chooses to use signed integers almost exclusively.
Fuzzing is a type of automated testing which continuously
manipulates inputs to a program to find bugs. Go fuzzing uses
coverage guidance to intelligently walk through the code being
fuzzed to find and report failures to the user. Since it can reach
edge cases which humans often miss, fuzz testing can be particularly
valuable for finding security exploits and vulnerabilities.
It's been a while since my last language
series on this blog, but I
figured I shouldn't let an entire calendar year go by without doing
some technical writing here. This time we'll be working on creating
a toy relational database in the vein of Tutorial D, as described
in Databases, Types, and the Relational Model: The Third Manifesto
by C.J. Date and Hugh Darwen. However, instead of creating a full
database language with its own syntax, we're going to embed the
database language in Haskell. In particular, we're going to try to
get GHC to ensure that queries are well typed, as opposed to writing
our own type checker.
I know what you are thinking: is this really another guide to OAuth 2.0?
Well, yes and no. This guide is different than most of the others
out there because it covers all of the ways that we actually use
OAuth. It also covers all of the details you need to be an OAuth
expert without reading all the specifications or writing your own
OAuth server. This document is based on hundreds of conversations
and client implementations as well as our experience building
FusionAuth, an OAuth server which has been downloaded over a million
times.
Infix expressions are a common pain point for recursive descent
parsers, but there is a common solution that mostly preserves this
direct style: Pratt parsing,
or “top-down operator precedence.” My goal here is to use your
familiarity with recursive descent and Pratt parsing to build an
intuition for the more general but often-mysterious LR parsing.
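Since the post leans on familiarity with Pratt parsing, a minimal Python sketch of the technique may serve as a refresher (binary `+` and `*` only, input pre-tokenized; the names are my own, not the article's):

```python
# Binding powers play the role of precedence levels: higher binds tighter.
BINDING_POWER = {"+": 1, "*": 2}

def parse(tokens, min_bp=0):
    # tokens is a list of strings, consumed destructively from the front.
    lhs = tokens.pop(0)  # a number token, used as a leaf
    # Keep absorbing operators as long as they bind tighter than our caller.
    while tokens and BINDING_POWER.get(tokens[0], 0) > min_bp:
        op = tokens.pop(0)
        rhs = parse(tokens, BINDING_POWER[op])  # recurse for the right side
        lhs = (op, lhs, rhs)  # fold into an AST node
    return lhs

print(parse(["1", "+", "2", "*", "3"]))
# → ('+', '1', ('*', '2', '3'))
```

Using `>` rather than `>=` against the caller's binding power is what makes the operators left-associative.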
These are my personal notes on installing, setting up, and using
OpenBSD on two Thinkpads (an X220 and a T400). They’re applicable
to OpenBSD-current as of 2020-09-05 (somewhere between OpenBSD
versions 6.7 and 6.8) - please bear in mind that some things may
have changed if you’re using a different version.
Brad Whitaker is a radio host trying to get his stepchildren to love
him and call him Dad. But his plans turn upside down when their
biological father, Dusty Mayron, returns.
In the evening Adam, Alice and I watched Daddy's
Home. I liked the movie and
give it a 7 out of 10. I also liked the soundtrack: Here Comes Your
Man (Pixies), Self Esteem (The Offspring), and Hate to Say I Told
You So (The Hives).
There are nearly infinite options available for hosting software
today, and more come out every day. However, many articles and
guides you'll find online for this kind of thing are either from
public cloud providers or companies with massive infrastructure,
complex application needs, and huge amounts of traffic.
I wanted to write this up mostly to share the decisions I made for
the architecture and why I've done things the way I have. Although
my needs are much smaller-scale and I don't currently charge any
money for anything I'm running, I still want to provide the best
possible experience for my sites' users and protect all the work
I've put into my projects.
Making backups is important. You don’t want to lose all your
information because of a broken device or a stolen account. One
proposed solution is the 3–2–1 rule
(3 copies, on at least 2 different devices, and 1 of them off-site),
and you should make at least one full backup every year (that could
match the World Backup
Day). What to back up is up to
you. You can backup your contacts, emails, messages, social networks
content… and your code.
Backing up code is a bit of a tricky question. Most people host
their code on their computer, probably with Git, and maybe on
GitHub. But having one copy is having no
copies. You don’t want to depend on GitHub exclusively for your
code, and it is wise to have at least one extra copy. The question
is then, how to make that extra copy.
This post is not about importing per se,
but instead about imports that are themselves optional. Libraries often
have optional dependencies, and the code should work whether or not
the import is there.
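The usual Python shape of that pattern, with `ujson` standing in as a typical optional accelerator:

```python
# Try the optional dependency at import time; fall back to the stdlib
# if it's absent. ujson is just one example of a drop-in accelerator.
try:
    import ujson as json
    HAVE_UJSON = True
except ImportError:
    import json
    HAVE_UJSON = False

def dumps(obj):
    # Callers never need to know which backend ended up being imported;
    # the module-level name `json` points at whichever one loaded.
    return json.dumps(obj)

print(dumps({"ok": True}))
```

The `HAVE_UJSON` flag lets tests and diagnostics report which path was taken without re-attempting the import.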
If you were creating a web app from scratch today, what database
would you use? Probably the most frequent answer I see to this is
Postgres, although there are a wide range of common answers: MySQL,
MariaDB, Microsoft SQL Server, MongoDB, etc. Today I want you to
consider: what if SQLite would do just fine?
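For a sense of how little ceremony that choice involves, this is a complete working setup in Python's standard library (an in-memory database here for brevity; an app would pass a file path instead):

```python
import sqlite3

# No server process, no credentials, no connection pool: the whole
# database is the standard library plus a single file (or memory).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
db.commit()

print(db.execute("SELECT id, name FROM users").fetchone())
# → (1, 'alice')
```

That zero-ops footprint is a large part of the appeal for small and medium web apps.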
In the past years there’s been a lot of talk about category theory
in the functional programming community. This is because category
theory provides a general setting to speak about computation and it
gives us concepts such as Functor and Monad which allow us to
abstract a lot of interesting patterns (error handling, among
others).
I remember a brief time in the mid-2000s insisting on so-called
Yoda conditions in my Perl. I would place constants to the left of
equality comparisons, so that in case I accidentally typed a single
= instead of ==, the compiler would catch it instead of blithely
treating it as an assignment.
Heap's thousands of customers can build queries in the Heap UI to
answer almost any question about how users are using their
product. Optimizing all of these queries across all our customers
presents special challenges you wouldn't typically encounter if you
were optimizing the performance of a small set of queries within a
single application.
This post is about why this scale requires us to conduct performance
experiments to optimize our SQL, and it details how we conduct those
experiments.
Zsh is particularly powerful when you need some good old
expansions. It supports the common ones used by Bash, and adds many
flags and modifiers on top. It comes in handy for quickly
manipulating your commands without writing boring shell scripts.
Because microprocessors are so fast, computer architecture design
has evolved towards adding various levels of caching between compute
units and the main memory, in order to hide the latency of bringing
the bits to the brains. However, the key insight here is that these
caches are partially shared among the CPUs, which means that perfect
performance isolation of co-hosted containers is not possible. If
the container running on the core next to your container suddenly
decides to fetch a lot of data from the RAM, it will inevitably
result in more cache misses for you (and hence a potential
performance hit).
In the last post of this series we set out to model fire
probability in
Northern California based on weather data. We showed how to use SQL
to do data shaping and preparation. We ended with a data set that
was ready with all the fire occurrences and weather data in a single
table almost prepped for logistic regression.
There is now one more step: sample the data. If you have worked with
logistic regression before you know you should try to balance the
number of occurrences (1) with absences (0). To do this, we are
going to sample from non_fire_weather a number of rows equal to the
count in fire_weather, and then combine them into one table.
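In Python terms (with made-up stand-ins for the two tables, since the original does this step in SQL), the balancing step looks like this:

```python
import random

random.seed(0)  # reproducible sketch

# Hypothetical stand-ins for the two tables: fire occurrences (label 1)
# and the much larger set of non-fire observations (label 0).
fire_weather = [{"label": 1} for _ in range(100)]
non_fire_weather = [{"label": 0} for _ in range(5000)]

# Downsample the majority class to the minority class's size, then
# combine and shuffle into one balanced table.
balanced = fire_weather + random.sample(non_fire_weather, len(fire_weather))
random.shuffle(balanced)

print(len(balanced), sum(row["label"] for row in balanced))
# → 200 100
```

Half the rows are now fire occurrences, which is the balance logistic regression wants to see.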
This is a curation of introductory materials aimed at an enthusiast
who wants to learn Category Theory. I have only recently gotten into
learning Category Theory and as a result this list is only a partial
sampling of the great wealth of materials out there.