One of the most common simple techniques for generating text is a
Markov chain. The algorithm takes an input text or texts, divides it
into tokens (usually letters or words), and generates new text based
on the statistics of short sequences of those tokens. If you're
interested in the technical details, Allison Parrish has a good
tutorial on implementing Markov chains in
Python.
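As a rough illustration of the idea (my own sketch, not Parrish's code), a word-level, order-2 Markov chain fits in a dozen lines of Python:

```python
import random
from collections import defaultdict

def build_model(text, order=2):
    """Map each `order`-word sequence to the words that follow it."""
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        model[key].append(words[i + order])
    return model

def generate(model, length=20, seed=None):
    """Walk the chain, picking each next word from the observed followers."""
    rng = random.Random(seed)
    state = rng.choice(list(model))
    out = list(state)
    for _ in range(length - len(state)):
        choices = model.get(tuple(out[-len(state):]))
        if not choices:  # dead end: this sequence only appeared at the end
            break
        out.append(rng.choice(choices))
    return " ".join(out)
```

Letter-level chains work the same way, just with characters as tokens instead of words.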
This technique has been around for a very long time, but it's
unclear exactly when it was first developed. The following is a
slightly sketchy work-in-progress timeline of what I've been able to
work out.
Changelogs are important communication tools, and should be made for
people to enjoy reading. There should always be some level of
editing, and tailoring the message to who you’re talking to, and
what you're talking about. Relying on strict commit styles and
auto-generation tools limits that tailoring, even if the tool
allows you to customize the end-user output.
As you probably know, the GIL stands for the Global Interpreter
Lock, and its job is to make the CPython interpreter
thread-safe. The GIL allows only one OS thread to execute Python
bytecode at any given time, and the consequence of this is that it's
not possible to speed up CPU-intensive Python code by distributing
the work among multiple threads. This is, however, not the only
negative effect of the GIL. The GIL introduces overhead that makes
multi-threaded programs slower, and what is more surprising, it can
even have an impact on I/O-bound threads.
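The CPU-bound claim is easy to check for yourself. Here is a minimal sketch (my own, not from the post): split a pure-Python busy loop across two threads and compare the wall-clock time against running the same work sequentially. On a stock CPython build, the threaded version is no faster, because the GIL serializes the bytecode execution.

```python
import threading
import time

def busy_sum(n):
    # Pure-Python CPU-bound work: the GIL serializes this across threads.
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_sequential(n, workers=2):
    start = time.perf_counter()
    results = [busy_sum(n) for _ in range(workers)]
    return time.perf_counter() - start, results

def run_threaded(n, workers=2):
    results = [None] * workers
    def work(idx):
        results[idx] = busy_sum(n)
    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, results

if __name__ == "__main__":
    n = 2_000_000
    seq_time, _ = run_sequential(n)
    thr_time, _ = run_threaded(n)
    # On GIL-bound CPython, thr_time is roughly equal to (or worse than)
    # seq_time, even though the work is split across two OS threads.
    print(f"sequential: {seq_time:.2f}s  threaded: {thr_time:.2f}s")
```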
In this post I'd like to tell you more about non-obvious effects of
the GIL. Along the way, we'll discuss what the GIL really is, why it
exists, how it works, and how it's going to affect Python
concurrency in the future.
Hello! On Wednesday I was talking to a friend about how it would be
cool to have an nginx playground website where you can just paste in
an nginx config and test it out. And then I realized it might
actually be pretty easy to build, so I got excited, started coding,
and built it. It’s at
https://nginx-playground.wizardzines.com.
Python has a great ecosystem for machine learning, but deep learning
is computationally intensive and Python is
slow. In
this post, I will discuss different ways that helped make my code
run faster, more specifically in physics simulation and
reinforcement learning for character animations. Nevertheless, most
of the tips are applicable to all computationally intensive
programs.
With that, here are the 7 ways to make any Python code run faster:
Recently, I have been spending a lot of time working with Go. While
helping other developers, I noticed that many struggled with encoding
and decoding JSON. In this guide I cover a lot of the common issues
developers run into and how to solve them.
MySQL has a feature known as
replication
where data from one database server (referred to as the source or
primary) is copied to other database servers
(replicas). Conceptually, it’s pretty straightforward. All of your
data mutations (INSERT, UPDATE, DELETE, ALTER, etc.) are done
against the primary database. Those commands are then copied and
applied to the replicas.
Normally, when starting a Java project (or any other programming project, really), you don’t want to reinvent the wheel. You go with the de facto build system, folder structure, environment, etc.: the ones the rest of the world is using.
Yet, both
Skija and
JWM are built
using Python scripts instead of more traditional
Ant/Maven/Gradle/SBT. Why? Let’s find out!
An off-duty Coast Guardsman is fishing with his family when he calls
in some suspicious behavior from a nearby boat. It's a snazzy craft,
slick and outfitted with extra horsepower, and is zipping along
until it slows to pick up a surfaced diver . . . a diver who was
apparently alone, without his own boat, in the middle of the
ocean. None of it makes sense unless there's something hinky going
on, and his hunch is proved right when all three Guardsmen who come
out to investigate are shot and killed.
They're federal officers killed on the job, which means the case is
the FBI's turf. When the FBI's investigation stalls out, they call
in Lucas Davenport. And when his case turns lethal, Davenport will
need to bring in every asset he can claim, including a detective
with a fundamentally criminal mind: Virgil Flowers.
In the evening I started in Ocean
Prey,
a Lucas Davenport and Virgil Flowers novel by John Sandford.
In Rust, you often hear about state machines. Futures are state
machines! I thought it would be cool to read more about it. I came
across this blog
post
(funnily enough, by a friend and mentor of mine) which really helped
me! I highly recommend reading it. In this post, I'm just noting the
part I found relevant. The example here is from her blog post.
As part of a recent personal journey to better understand databases
and learn Rust, I took on the project of writing a simple key-value
storage engine. Crazy, right? Let’s get
started!
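The post itself is in Rust, but the core idea behind a simple log-structured key-value engine can be sketched in a few lines of Python: an append-only log file plus an in-memory index mapping each key to the file offset of its latest record. The class and the tab-separated record format below are my own illustration, not the post's code.

```python
import os

class KVStore:
    """Minimal append-only key-value store (sketch).

    Writes append "key\\tvalue\\n" records to a log file; an in-memory
    index maps each key to the offset of its most recent record, so a
    read is one seek plus one line. Keys/values must not contain tabs
    or newlines in this toy version.
    """

    def __init__(self, path):
        self.f = open(path, "a+b")  # created if missing; reads allowed
        self.index = {}
        self._rebuild()

    def _rebuild(self):
        # Replay the log; later records for a key overwrite earlier ones.
        self.f.seek(0)
        offset = 0
        for line in self.f:
            key, _, _ = line.partition(b"\t")
            self.index[key.decode()] = offset
            offset += len(line)

    def put(self, key, value):
        self.f.seek(0, os.SEEK_END)
        self.index[key] = self.f.tell()
        self.f.write(f"{key}\t{value}\n".encode())
        self.f.flush()

    def get(self, key):
        if key not in self.index:
            return None
        self.f.seek(self.index[key])
        line = self.f.readline().decode().rstrip("\n")
        return line.split("\t", 1)[1]
```

A real engine adds compaction (rewriting the log to drop stale records), crash-safety checks, and an on-disk index, but the append-then-index loop is the heart of it.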
A couple months ago, we faced a question many young startups
face. Should we rewrite our system in Rust?
At the time of the decision, we were a Go and Python shop. The tool
we’re building passively watches API traffic to provide “one-click,”
API-centric visibility, by analyzing the API traffic. Our users run
an agent that sends API traffic data to our cloud for analysis. Our
users were using us to watch more and more traffic in staging and
production—and they were starting to complain about the memory
usage.
This led me to spend 25 days in the depths of despair and the
details of Go memory management, trying to get our memory footprint
to an acceptable level. This was no easy feat, as Go is a
memory-managed language with limited ability to tune garbage
collection.
When I first saw the WWDC presentation about actors, I was thrilled
by what they are capable of and how they will change the way
we write asynchronous code in the near future. By using actors,
writing asynchronous code that is free from data
races and
deadlocks has never been easier.
All that aside, that doesn’t mean that actors are free from
threading issues. If we are not careful enough, we might
accidentally introduce a reentrancy problem when using actors.
The perlcritic tool is often your first
defense against “awkward, hard to read, error-prone, or
unconventional constructs in your code,” per its
description. It’s
part of a class of programs historically known as
linters, so-called
because, like a clothes dryer’s lint trap, they “detect small
errors with big effects.” (Another such linter is
perltidy, which I’ve
referenced in the past.)
A coredump is a snapshot of a process’s memory that is usually
created by the kernel when a crash happens. These can be fairly
helpful to find out which part of the code broke by looking at the
backtrace or finding any kind of corruption by introspecting the
memory itself. Unfortunately it can be a bit tedious to work with
these. This article aims to give an overview over helpful tools &
tricks to leverage the full power of coredumps on Nix-based systems.
Plotting is an essential component of data analysis. As a data
scientist, I spend a significant amount of my time making simple
plots to understand complex data sets (exploratory data analysis)
and help others understand them (presentations).
In particular, I make a lot of bar charts (including histograms),
line plots (including time series), scatter plots, and density plots
from data in Pandas data
frames. I
often want to facet these on various categorical variables and layer
them on a common grid.
I’ve long been a fan of filtering at the DNS level. The approach that
I’ll outline here is similar to what our devices did at Luma. It is
not perfect by any means, but if it is set up well then it provides a
chokepoint with a solid return on investment for the amount of
time and effort that it takes to stand up. It isn’t going to keep any
sophisticated actors at bay, but it does demonstrably improve
performance and has a good chance of reducing the headaches of
administering a network and the systems on it for a connected
family.
Python 3.10, which is due out in early October 2021, will include a
large new language feature called structural pattern
matching. This article is a critical but (hopefully) informative
presentation of the feature, with examples based on real-world code.
Arithmetic coding is a common algorithm used in both lossless and
lossy data compression algorithms.
It is an entropy encoding technique, in which the frequently seen
symbols are encoded with fewer bits than rarely seen symbols. It has
some advantages over well-known techniques such as Huffman
coding. This article will describe the
CACM87
implementation of arithmetic coding in detail, giving you a good
understanding of all the details needed to implement it.
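As a rough sketch of the core idea (my own toy version, not the CACM87 implementation the article describes), here is an arithmetic coder in Python that narrows an interval according to each symbol's probability; it uses exact fractions for clarity, where a real implementation works with fixed-precision integers and emits bits incrementally:

```python
from fractions import Fraction

def ranges(freqs):
    """Assign each symbol a sub-interval of [0, 1) proportional to its frequency."""
    total = sum(freqs.values())
    low = Fraction(0)
    out = {}
    for sym, f in sorted(freqs.items()):
        out[sym] = (low, low + Fraction(f, total))
        low += Fraction(f, total)
    return out

def encode(message, freqs):
    """Narrow [0, 1) by each symbol's sub-interval; return a number inside it."""
    r = ranges(freqs)
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        span = high - low
        s_low, s_high = r[sym]
        low, high = low + span * s_low, low + span * s_high
    return (low + high) / 2  # any number in [low, high) identifies the message

def decode(code, length, freqs):
    """Replay the narrowing: at each step, find the sub-interval containing `code`."""
    r = ranges(freqs)
    out = []
    low, high = Fraction(0), Fraction(1)
    for _ in range(length):
        span = high - low
        for sym, (s_low, s_high) in r.items():
            if low + span * s_low <= code < low + span * s_high:
                out.append(sym)
                low, high = low + span * s_low, low + span * s_high
                break
    return "".join(out)
```

Frequent symbols own wide sub-intervals, so encoding them narrows the range only slightly and costs few bits, which is exactly the entropy-coding property described above.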
In the evening I finished The Broken
Eye,
Lightbringer Book 3 by Brent Weeks. While not as good as the previous
book in the Lightbringer series, I liked The Broken Eye a lot. Recommended.
Behind the well-known U.S. security organizations—the FBI and CIA
among them—lies a heavily guarded, anonymous government agency
dedicated to intelligence surveillance and to a highly specialized
brand of citizen protection.
Shock waves of alarm ripple through the clandestine agency when
Washington, D.C., police detective Ryan Kessler inexplicably becomes
the target of Henry Loving, a seasoned, ruthless “lifter” hired to
obtain information using whatever means necessary. While Loving is
deft at torture, his expertise lies in getting an “edge” on his
victim—leverage—usually by kidnapping or threatening family until
the “primary” caves under pressure.
In the evening I started in
Edge
by Jeffery Deaver.