I am a big fan of Java memory management, and in this article I will
try to explain how to take and analyze a heap dump, with examples. But
first, let’s refresh our minds and remember what we know about this
domain. After some theoretical information, we will take a heap dump
and analyze it for a simple application.
The Python standard library
does not have “side effects” besides reading a stream of text
input. Because I assumed YAML was equivalent to JSON and had not
read the 23,000+ word spec, I
assumed that PyYAML’s yaml.load had the same properties. Last
June, I learned that this was incorrect.
In tip #7 of 10 Common Security Gotchas in
I learned that using yaml.load could run arbitrary code. While the
danger of this possibility is limited only by your imagination, the
article provided the very plausible example of having your passwords
emailed to a hacker.
Back in 1998, Rob Pike – of Go and Plan 9 fame – wrote a simple
regular expression matcher in C for The Practice of Programming, a
book he wrote with fellow Unix hacker Brian Kernighan. If you
haven’t read Kernighan’s
of this code, it’s definitely worth the 30-minute time investment it
takes to go through that slowly.
With Go’s C heritage (and Pike’s influence on the Go language), I
thought I’d see how well the C code would translate to Go, and
whether it was still elegant.
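For readers who have not seen the matcher, here is a rough sketch of the same three-function structure in Python. It is neither Pike's C nor the author's Go translation. The names match, match_here and match_star are chosen for this illustration, and the supported syntax (literal characters, '.', '*', '^' and '$') is assumed from the well-known description of the original.

```python
def match(regexp, text):
    """Return True if regexp matches anywhere in text."""
    if regexp.startswith("^"):
        return match_here(regexp[1:], text)
    # Try every starting position, including the empty suffix.
    for start in range(len(text) + 1):
        if match_here(regexp, text[start:]):
            return True
    return False


def match_here(regexp, text):
    """Return True if regexp matches at the beginning of text."""
    if not regexp:
        return True
    if len(regexp) >= 2 and regexp[1] == "*":
        return match_star(regexp[0], regexp[2:], text)
    if regexp == "$":
        return text == ""
    if text and regexp[0] in (".", text[0]):
        return match_here(regexp[1:], text[1:])
    return False


def match_star(char, regexp, text):
    """Match zero or more occurrences of char, then the rest of regexp."""
    position = 0
    while True:
        if match_here(regexp, text[position:]):
            return True
        if position < len(text) and (char == "." or text[position] == char):
            position += 1
        else:
            return False
```

Slicing copies the string on every call, so this sketch trades the pointer arithmetic of the C version for readability rather than speed.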
Lockdown Mode is a new Apple feature you should hope you’ll
never need to use. But for those who do, like journalists,
politicians, lawyers and human rights defenders, it’s a last line of
defense against nation-state spyware designed to punch through an
Running a Linux VM from my MacBook Pro is how I spend much of my
time during software development. In this post, I compare multiple
solutions to this problem, with a focus on how they perform with I/O
I recently acquired a copy of Programming
often referred to as "The Camel Book" in the Perl community. After
reading the first 4 chapters I thought I would share a few Perl
tidbits I found interesting.
Pretty much every Python programmer out there has broken down at one
point and used the
‘pickle’ module for
writing objects out to disk.
The advantage of using pickle is that it can serialize pretty much
any Python object, without having to add any extra code. It's also
smart in that it will only write out any single object once, making
it effective to store recursive structures like graphs. For these
reasons pickle is usually the default serialization mechanism in
Python, used in modules like
However, using pickle is still a terrible idea that should be
avoided whenever possible.
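A minimal sketch of both sides of that trade-off, using only the standard library. The dict-based graph and the Payload class are made up for this illustration, and the payload is deliberately harmless (it only evaluates 6 * 7). A hostile pickle could name any callable instead.

```python
import pickle

# A small cyclic "graph" built from plain dicts: a and b point at each other.
a = {"name": "a", "neighbors": []}
b = {"name": "b", "neighbors": []}
a["neighbors"].append(b)
b["neighbors"].append(a)

# pickle records each object once, so the cycle survives the round trip.
restored = pickle.loads(pickle.dumps(a))
cycle_preserved = restored["neighbors"][0]["neighbors"][0] is restored


class Payload:
    """Hypothetical attacker-controlled object (illustration only)."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call eval('6 * 7')".
        return (eval, ("6 * 7",))


untrusted_bytes = pickle.dumps(Payload())

# Loading the bytes calls eval; no Payload object is ever created.
result = pickle.loads(untrusted_bytes)
```

Here result is whatever the embedded call returned, not a Payload instance, which is why unpickling data from an untrusted source amounts to running that source's code.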
When we write tests, we've inevitably got to choose an interface to
write against - for a unit test, this is the interface of whatever
unit is under test, usually the type
signature of a
function or the public methods on a class. These unit interfaces
tend to change often in response to refactoring, optimisations, new
requirements and so on, and they should be able to change quickly
too, or we can't make any of these important improvements.
Instead what we want to do is write our tests against interfaces
that seldom change, and thus we should target public, external
interfaces, which are (generally) better designed and much slower to
change than the non-exposed interfaces on units. For example, in our
scenario above, we could treat our system as a black box and use its
HTTP API as the interface that we use to test it (e.g. using
supertest), use a mocked
HTTP API for the web service it calls (e.g. using
nock), and run it against a
real database that we reset after each test.
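The tools named above are JavaScript libraries. As a rough standard-library Python sketch of the same black-box idea, the test below starts a server and talks to it only over HTTP. The GreetingHandler service, its /greeting route and its response body are invented for the example, and the mocked downstream service and database reset are left out.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class GreetingHandler(BaseHTTPRequestHandler):
    """Hypothetical system under test: GET /greeting returns JSON."""

    def do_GET(self):
        if self.path == "/greeting":
            body = json.dumps({"message": "hello"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def get_json(base_url, path):
    """Fetch a JSON document over HTTP, ignoring any proxy settings."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(base_url + path, timeout=5) as response:
        return response.status, json.loads(response.read())


def run_black_box_test():
    """Start the service on a free port and exercise it via HTTP only."""
    server = HTTPServer(("127.0.0.1", 0), GreetingHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        base_url = f"http://127.0.0.1:{server.server_port}"
        return get_json(base_url, "/greeting")
    finally:
        server.shutdown()
        server.server_close()
```

The test never imports or calls the handler's internals, so the handler can be refactored freely as long as the HTTP contract stays the same.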
The JSON format remains one of the most popular text data formats
for data in transit. You can encounter JSON data at every level of
your application stack: from the database to the UI, from IoT sensor
data to the mobile app’s payload. And it is not a coincidence; the
format strikes a good balance between developer convenience and
decent payload density. In the Rust ecosystem, the de-facto standard
for dealing with JSON is Serde. Although it is
the best choice for most cases, there can be alternative approaches
that work better for your application. This article covers one of
these approaches.
By default Flask writes logs to the console in plain-text
format. This can be limiting if you intend to store your logs in a
text file and periodically send them to a central monitoring
service. For example, Kibana only
accepts JSON logs by default.
You might also want to enrich your logs with additional metadata,
e.g. timestamps, method names, log type (Warn, Debug, etc.). In this
post we will use the Python
logging library to
modify Flask's logging format and write the logs to a text file. In the
end we will see how to periodically send these logs to an external
service using Flume.
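As a rough idea of what a JSON log format can look like with the standard logging library alone, here is a minimal sketch. The JsonFormatter class, the field names and the "demo_app" logger name are assumptions made for this illustration, not the post's actual configuration. It writes to an in-memory stream, and logging.FileHandler could be swapped in to write to a text file.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "function": record.funcName,
            "message": record.getMessage(),
        }
        return json.dumps(entry)


def build_json_logger(name, stream):
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # avoid duplicate plain-text output
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


buffer = io.StringIO()
log = build_json_logger("demo_app", buffer)
log.warning("disk usage at %d%%", 91)
```

Flask's app.logger is a standard logging.Logger, so a formatter like this can be attached to its handlers in the same way.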
Haskell programs are infamous for having lots of space leaks. This
is the result of Haskell choosing the lazy evaluation model and not
designing the language around preventing such type of memory usage
Investigating and fixing space leaks has brought tons of frustration to
Haskell developers. Believe it or not, I’m not a fan of space leaks
either. However, instead of fighting the fire later, you can use
several techniques to prevent the catastrophe in the first place.
I realize not everybody’s going to ditch the Web and switch to
Gemini or Gopher today (that’ll take, like, at least a month
/s). Until that happens, here’s a non-exhaustive, highly-opinionated
list of best practices for websites that focus primarily on text. I
don’t expect anybody to fully agree with the list; nonetheless, the
article should have at least some useful information for any web
content author or front-end web developer.
Over the years I’ve worked out some basic principles for hash table
construction that aid in quick and efficient implementation. This
article covers the technique and philosophy behind what I’ve come to
call the “mask-step-index” (MSI) hash table, which is my standard
unblob is an accurate, fast, and easy-to-use extraction
suite. It parses unknown binary blobs for more than 30 different
archive, compression, and file-system formats, extracts their
content recursively, and carves out unknown chunks that
have not been accounted for.
unblob is free to use, licensed under the MIT license, it has a
and can be used as a Python library. This turns unblob into the
perfect companion for extracting, analyzing, and reverse
engineering firmware images.
I generally still hold that belief today. That belief is put into
practice in Rust’s standard library and in many core ecosystem
crates. (And that practice predates my blog post.) Yet, there still
seems to be widespread confusion about when it is and isn’t okay to
use unwrap(). This post will talk about that in more detail and
respond specifically to a number of positions I’ve seen expressed.
Principal component analysis (PCA) is probably the most magical linear method in data science. Unfortunately, while it's always good to have a sense of wonder about mathematics, if a method seems too magical it usually means that there is something left to understand. After years of almost, but not quite fully understanding PCA, here is my attempt to explain it fully, hopefully leaving some of the magic intact.