We had an application at one of my previous companies that typically
ran with ~2GB in memory at any given time, but simply by changing the
order of some uint struct fields we managed to drop the memory usage
to less than 1.4GB. Let’s dive into how inefficient field ordering in
Go structs can have a huge impact on the memory footprint of a
program.
In a job interview years ago, the interviewer asked me to explain
the difference between encryption, encoding, and hashing. At the
time I was working for a company that specialized in encryption, so
I took knowing the difference for granted.
It wasn’t until much later that I understood how easily most folks
can confuse the three topics for one another. Let’s take a look at
each in turn.
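The distinction is easy to see in code: encoding is reversible by anyone, hashing is one-way, and encryption is reversible only with a key. A quick Python sketch (the XOR "cipher" at the end is a toy stand-in for a real algorithm like AES, there purely to show the shape of the operation):

```python
import base64
import hashlib

msg = b"secret message"

# Encoding: reversible by anyone, no key involved.
encoded = base64.b64encode(msg)
assert base64.b64decode(encoded) == msg

# Hashing: one-way, fixed-size digest; the input cannot be recovered.
digest = hashlib.sha256(msg).hexdigest()
assert len(digest) == 64

# Encryption: reversible, but only with the key. (Toy XOR cipher for
# illustration only -- never use this for real data.)
key = 42
ciphertext = bytes(b ^ key for b in msg)
assert ciphertext != msg
assert bytes(b ^ key for b in ciphertext) == msg
```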
IPinfo builds and sells IPv4 and IPv6 address
metadata. This is available via an API, as a file download, or as a
Snowflake dataset. Given an IP address, it'll return that IP's
physical location and ownership information. You can also see
whether it's used as a VPN or Tor endpoint, whether it's owned by a
hosting company, and which domain names point at it.
Understanding and modeling uncertainty surrounding a machine
learning prediction is of critical importance to any production
model. It provides a handle to deal with cases where the model
strays too far away from its domain of applicability, into
territories where using the prediction would be inaccurate or
downright dangerous. Think medical diagnosis or self-driving cars.
10 years ago, systemd was announced and swiftly rose to become one
of the most persistently controversial and polarizing pieces of
software in recent history, and especially in the GNU/Linux
world. The quality and nature of the debate has not improved in the
least since the major flame wars around 2012-2014, and systemd still
remains poorly understood and understudied at both a technical and a
social level, despite paradoxically having disproportionate levels of
attention focused on it.
If you’re coming from Linux, you may be familiar with the ptrace
family of commands — strace and ltrace. If you’re coming from
macOS, you may have had brief encounters with dtruss or dtrace,
instead.
If you haven’t heard of them before or haven’t had the chance to
play with them, this post is for you. I’m going to show you what
they do and why they are important tools to know.
If you're familiar with Python, you probably like Rust's ranges a
lot. They're generally tidy, are lots more concise than writing out
range(...) all the time, and are a ton better than magic syntax
for slicing (thanks for that one, Guido).
Unfortunately, the redeeming qualities of Rust's range types stop
there. Behind a friendly face lurks what is perhaps the single
biggest collection of infuriating design choices in Rust's entire
standard library.
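A couple of those quirks are easy to demonstrate. A Range is itself an iterator rather than merely iterable, so iterating consumes it, and it isn't Copy even when its element type is:

```rust
fn main() {
    let r = 0..5; // std::ops::Range<i32>

    // Ranges are iterators, so summing consumes the range; without the
    // clone(), `r` could not be used again below (Range is not Copy).
    let total: i32 = r.clone().sum();
    assert_eq!(total, 10);

    // The end bound is exclusive.
    assert!(r.contains(&4));
    assert!(!r.contains(&5));
}
```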
One of the things that makes DNS difficult to understand is that
it’s decentralized. There are thousands (maybe hundreds of
thousands? I don’t know!) of authoritative nameservers, and at least
10 million
resolvers. And
they’re running lots of different software! All these different
servers running different software mean that there’s a lot of
inconsistency in how DNS works, which can cause all kinds of
frustrating problems.
When faced with a situation where you're writing code that should
work across a few different kinds of values without knowing what
they are ahead of time, Rust asks slightly more of you than many
languages do. Dynamic languages will let you pass in anything, of
course, as long as the code works when it's run. Java/C# would ask
for an interface or a superclass. Duck-typed languages like Go or
TypeScript would want some structural type: an object type with a
particular set of properties, for instance.
Rust is different. In Rust there are three main approaches for
handling this situation, and each has its own advantages and
disadvantages.
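The three approaches aren't named in this excerpt, but the usual Rust toolkit for this problem is generics (static dispatch), trait objects (dynamic dispatch), and enums (a closed set of alternatives). A rough sketch of each:

```rust
use std::fmt::Display;

// Static dispatch: a separate copy is monomorphized per concrete type.
fn show_generic<T: Display>(x: T) -> String {
    format!("{x}")
}

// Dynamic dispatch: one function body, vtable lookup at runtime.
fn show_dyn(x: &dyn Display) -> String {
    format!("{x}")
}

// A closed, known-in-advance set of alternatives: an enum.
enum Value {
    Int(i64),
    Text(String),
}

fn show_enum(v: &Value) -> String {
    match v {
        Value::Int(n) => format!("{n}"),
        Value::Text(s) => s.clone(),
    }
}

fn main() {
    assert_eq!(show_generic(42), "42");
    assert_eq!(show_dyn(&3.5), "3.5");
    assert_eq!(show_enum(&Value::Text("hi".into())), "hi");
}
```

Roughly: generics cost compile time and binary size, trait objects cost a pointer indirection, and enums cost extensibility.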
Reflection in Swift allows us to use the Mirror API to inspect and
manipulate arbitrary values at runtime. Even though Swift puts a lot
of emphasis on static typing, it offers more flexibility and control
over types than you might expect.
In order for one language to cooperate with another usefully via
embedded programs in this way, data of some sort needs to be passed
between them at runtime, and here there are a few traps with syntax
that may catch out unwary shell programmers. We’ll go through a
simple example showing the problems, and demonstrate a few potential
solutions.
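The classic trap is splicing a shell variable directly into the embedded program's source text, where quotes or backslashes in the data break the syntax. Using awk as the embedded language (variable names here are illustrative), passing the value as data sidesteps it:

```shell
user='He said "hi"'   # data containing quotes

# Fragile: the value becomes part of the awk program text, so the
# embedded quotes produce a syntax error.
# awk "BEGIN { print \"$user\" }"

# Robust: pass the value as data with -v (note: -v does interpret
# backslash escapes in the value)...
awk -v name="$user" 'BEGIN { print name }'

# ...or via the environment, which leaves the bytes untouched.
NAME="$user" awk 'BEGIN { print ENVIRON["NAME"] }'
```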
It is clear that most of the world has decided that they want to use
JSON for their public-facing API endpoints. However, most of the
time you will need to work with storage engines that don't handle
JSON very well. This can be awkward because you need to fit a
square peg into a round hole.
However, SQLite added JSON
functions to allow you to munge
and modify JSON data in whatever creative ways you want. You can use
these and SQLite
triggers in order
to automatically massage JSON into whatever kind of tables you
want. Throw in upserts and you'll be able to make things even more
automated.
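Here's a small sketch of that pattern using Python's bundled SQLite (table and column names are made up): a trigger flattens each incoming JSON blob into a regular table, and an upsert keeps repeated ids current.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw (body TEXT);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);

-- Every JSON blob inserted into raw is flattened into users, with an
-- upsert so a repeated id updates the name in place.
CREATE TRIGGER raw_to_users AFTER INSERT ON raw BEGIN
  INSERT INTO users (id, name)
  VALUES (json_extract(NEW.body, '$.id'),
          json_extract(NEW.body, '$.name'))
  ON CONFLICT (id) DO UPDATE SET name = excluded.name;
END;
""")

conn.execute("INSERT INTO raw VALUES (?)", ('{"id": 1, "name": "Ada"}',))
conn.execute("INSERT INTO raw VALUES (?)",
             ('{"id": 1, "name": "Ada Lovelace"}',))
print(conn.execute("SELECT name FROM users WHERE id = 1").fetchone()[0])
```

Note the upsert syntax needs SQLite 3.24+, which any recent Python ships with.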
In this edition of Napkin Math, we'll invoke the spirit of the
Napkin Math series to establish a mental model for how a neural
network works by building one from scratch. In a future issue we
will do napkin math on performance, as establishing the
first-principle understanding is plenty of ground to cover for
today!
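To make "from scratch" concrete: the smallest possible network is a single neuron. The issue itself builds something richer, but a napkin version trained with the perceptron rule on the OR function (all numbers arbitrary) looks like this:

```python
import random

random.seed(0)

# One neuron: two weights and a bias, with a hard threshold activation.
w = [random.uniform(-1, 1) for _ in range(2)]
b = 0.0
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

def predict(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Perceptron rule: nudge each weight toward reducing the error.
for _ in range(20):
    for x, target in data:
        err = target - predict(x)
        w[0] += 0.1 * err * x[0]
        w[1] += 0.1 * err * x[1]
        b += 0.1 * err

assert all(predict(x) == t for x, t in data)
```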
It occurred to me afterward that there may be some confusion between
the warnings pragma and the related warn
function for reporting
arbitrary runtime warnings.
async was controversial from its inception; it’s still
controversial today; and in this post I am throwing my own 2 cents
into this controversy, in defense of the feature. I am only going to
try to counter one particular line of criticism here, and I don’t
anticipate I’ll cover all the nuance of it – this is a multifaceted
issue, and I have a day job. I am also going to assume for this post
that you have some understanding of how async works, but if you
don’t, or just want a refresher, I heartily recommend the Tokio
tutorial.
If you have ever written Go, you have probably noticed the size of
the resulting binaries. Of course, in the age of
gigabit links and terabyte drives, this shouldn’t be a big
problem. Still, there are situations when you want the size of the
binary to be as small as possible, and at the same time you do not
want to part with Go.
Profiling is integral to any performance optimization work. Any
experience and skill in performance optimization that you might
already have will not be very useful if you don't know where to
apply it. Therefore, finding bottlenecks in your applications can
help you solve performance issues quickly with very little overall
effort.
In this article we will look at the tools and techniques that can
help us narrow down our focus and find bottlenecks both for CPU and
memory consumption, as well as how to implement easy (almost
zero-effort) solutions to performance issues in cases where even
well targeted code changes won't help anymore.
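The excerpt doesn't say which language or profiler the article covers; as one concrete illustration of bottleneck-finding, here is Python's built-in cProfile ranking functions by cumulative time:

```python
import cProfile
import io
import pstats

def hot_loop():
    # Deliberately CPU-heavy so it shows up at the top of the profile.
    return sum(i * i for i in range(200_000))

profiler = cProfile.Profile()
profiler.enable()
hot_loop()
profiler.disable()

buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report.strip().splitlines()[0])  # e.g. "... function calls in ... seconds"
```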
The latest batch of language models can be much smaller yet achieve
GPT-3-like performance by being able to query a database or search
the web for information. This is a key indication that building
larger and larger models is not the only way to improve performance.
We’re in a golden age of merging AI and neuroscience. No longer tied
to conventional publication venues with year-long turnaround times,
our field is moving at record speed. As 2021 draws to a close, I
wanted to take some time to zoom out and review a recent trend in
neuro-AI, the move toward unsupervised learning to explain
representations in different brain areas.
I recently came up with what I think is an intuitive way to explain
Bayes’ Theorem. I searched Google for a while and could not find
any article that explains it in this particular way.
Of course there’s the Wikipedia page, that long
article by Yudkowsky, and a
bunch of other explanations and tutorials. But none of them have any
pictures. So without further ado, and with all the chutzpah I can
gather, here goes my explanation.
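For reference alongside any pictures, the arithmetic itself fits in a few lines. A worked example with made-up numbers (a 99%-sensitive, 95%-specific test for a condition with 1% prevalence):

```python
# P(disease), P(positive | disease), P(positive | no disease) --
# all made-up numbers for illustration.
p_d = 0.01
p_pos_given_d = 0.99
p_pos_given_not_d = 0.05

# Total probability of testing positive (law of total probability).
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' Theorem: P(disease | positive).
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(round(p_d_given_pos, 3))  # 0.167: most positives are false positives
```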