Colly is a web scraping framework for the Go
programming language. The feature set of Colly largely overlaps with
that of the Scrapy framework from the Python ecosystem:
Built-in concurrency.
Cookie handling.
Caching of HTTP response data.
Automatic heeding of robots.txt rules.
Automatic throttling of outgoing traffic.
Furthermore, Colly supports distributed scraping out-of-the-box
through a Redis-based task queue and can be integrated with Google
App Engine. This makes it a viable choice for large-scale web
scraping projects.
Design patterns are a great way to think about interactions among
classes. But the classic Singleton pattern is bad: you shouldn’t use
it and there are better options.
The classic Singleton pattern is a class which always gives you the
same object when you create an instance of the class. It’s used to
ensure that all users of a class are using the same object.
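A minimal Python sketch of the classic pattern as described, assuming it is implemented by overriding `__new__`. The class name `Logger` is hypothetical and only there for illustration.

```python
class Logger:
    """Hypothetical class; every instantiation returns the same object."""

    _instance = None  # the single shared instance, created lazily

    def __new__(cls):
        # Create the object only on the first call, then keep reusing it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance


first = Logger()
second = Logger()
print(first is second)  # True
```

Note that the call `Logger()` looks like ordinary instantiation, so a reader cannot tell from the call site that the object is shared.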
The einsum function is one of NumPy’s jewels. It can often
outperform familiar array functions in terms of speed and memory
efficiency, thanks to its expressive power and smart loops. On the
downside, it can take a little while to understand the notation and
sometimes a few attempts to apply it correctly to a tricky problem.
Many of us using Unix-derived systems today take for granted that
/some/path and /some/path/ are the same. Most shells will even
add a trailing slash for you when you press the Tab key after the
name of a directory or a symbolic link to one.
However, many programs treat these two paths as subtly different in
certain cases, which I outline below, as all three have tripped me
up in various ways.
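One place where the difference is easy to observe is Python's own path splitting. This is only an illustration and not necessarily one of the cases the author goes on to describe.

```python
import posixpath

# posixpath is used directly so the result is the same on any platform.
without_slash = posixpath.split("/some/path")
with_slash = posixpath.split("/some/path/")

print(without_slash)  # ('/some', 'path')
print(with_slash)     # ('/some/path', '')
```

With the trailing slash, the final component is the empty string, so code that takes "the last part of the path" gets a different answer for the two spellings.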
Perl has two operators, cmp and <=>, which are basically never
seen outside of sort blocks.
That doesn’t mean you can’t use them elsewhere, though. Certainly
sort and these operators were designed to work seamlessly together
but there isn’t anything sort-specific about the operators per se,
and in some contexts they can be the most appropriate solution.
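Python has no direct counterpart to these operators, so the following is only a rough analogue of the idea, not Perl semantics in full. The helper names `three_way` and `describe` are hypothetical.

```python
from functools import cmp_to_key


def three_way(a, b):
    """Return -1, 0, or 1, in the spirit of Perl's <=> and cmp."""
    return (a > b) - (a < b)


def describe(balance, limit):
    """Use the comparison result outside of any sort, as a lookup key."""
    labels = {-1: "under", 0: "at", 1: "over"}
    return labels[three_way(balance, limit)]


# The familiar use, inside a sort.
numbers = sorted([10, 9, 2], key=cmp_to_key(three_way))
print(numbers)            # [2, 9, 10]

# A use that has nothing to do with sorting.
print(describe(120, 100))  # over
```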
AsyncSSH is a Python package which provides an asynchronous client
and server implementation of the SSHv2 protocol on top of the Python
3.6+ asyncio framework.
When you type a web address or domain name into your address bar
(example: www.mozilla.org), your browser
sends a request over the Internet to look up the IP address for that
website. Traditionally, this request is sent to servers over a plain
text connection. This connection is not encrypted, making it easy
for third parties to see what website you’re about to
access. DNS-over-HTTPS
(DoH) works differently. It sends the domain name you typed to a
DoH-compatible DNS server using an encrypted HTTPS connection
instead of a plain text one. This prevents third parties from seeing
what websites you are trying to access.
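To make the mechanism a little more concrete, here is a sketch of how a lookup can be packaged for the GET form of DoH defined in RFC 8484: the ordinary DNS query is built in wire format, then base64url-encoded without padding into a `dns` parameter. The resolver URL below is a placeholder rather than a real service, the function names are hypothetical, and nothing is sent over the network.

```python
import base64
import struct


def build_dns_query(hostname: str) -> bytes:
    """Build a DNS wire-format query for the A record of hostname."""
    # Header: ID 0, flags 0x0100 (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length; a zero byte ends the name.
    labels = hostname.rstrip(".").split(".")
    name = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    name += b"\x00"
    # QTYPE 1 (A record), QCLASS 1 (Internet).
    return header + name + struct.pack("!HH", 1, 1)


def doh_get_url(resolver_url: str, hostname: str) -> str:
    """Return the HTTPS URL that carries the query in its 'dns' parameter."""
    encoded = base64.urlsafe_b64encode(build_dns_query(hostname)).rstrip(b"=")
    return f"{resolver_url}?dns={encoded.decode('ascii')}"


# Placeholder resolver; substitute a DoH-compatible server to use this for real.
print(doh_get_url("https://doh.example/dns-query", "www.mozilla.org"))
```

Because the whole query travels inside an HTTPS request, an observer on the network sees a connection to the resolver but not the domain name being looked up.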
Literate programming is an approach to programming in which
the code is explained using natural language alongside the source
code. This is distinct from related practices such as documentation
or code comments; there, the code is primary, with commentary and
explanation being secondary. In literate programming, however,
explanation has equal billing with the code itself.
Go is known for its first-class support for concurrency, or the
ability for a program to deal with multiple things at once. Running
code concurrently is becoming a more critical part of programming
as computers move from running a single code stream faster to
running more streams simultaneously.
Nix flakes allow you to expose NixOS modules. NixOS modules are
templates for system configuration and they are the basis of how you
configure NixOS. Today we're going to take our Nix flake from the
last article and
write a NixOS module for it so that we can deploy it to a container
running locally. In the next post we will deploy this to a server.
Python is often marketed as a batteries-included language because it
comes with almost everything you’d ever expect from a programming
language. This statement is mostly true, as the standard library and
the external modules cover a broad spectrum of programming
needs. However, Python lacks built-in support for the YAML data
format, commonly used for configuration and serialization, despite
clear similarities between the two languages.
In this tutorial, you’ll learn how to work with YAML in Python using
the available third-party libraries, with a focus on PyYAML. If
you’re new to YAML or haven’t used it in a while, then you’ll have a
chance to take a quick crash course before diving deeper into the
topic.
Let us not beat around the bush: Rust is not easy to learn.
I think it took me nearly 1 year of full-time programming in Rust to
become proficient and no longer have to read the documentation every
5 lines of code. It's a looong journey but absolutely worth it.
It requires you to re-think all the mental models you learned while
using other programming languages.
This is why I thought it could be interesting to share how I adapted
my programming habits when working with Rust over the years.
It’s that time again: there’s a new major version of Emacs and, with
it, a treasure trove of new features and changes.
Notable features include the formal inclusion of native
compilation, a technique that will greatly speed up your Emacs
experience.
A critical issue surrounding the use of ligatures is also fixed;
without the fix, you couldn’t use ligatures in Emacs 27 without
crashes. So that’s good news indeed.
Kernel modules are object files used to extend an operating system’s
kernel functionality at run time.
In this post, we’ll look at implementing a simple character device
driver as a kernel module in NetBSD. Once it is loaded, userspace
processes will be able to write an arbitrary byte string to the
device, and on every successive read expect a
cryptographically-secure pseudorandom permutation of the original
byte string.
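The behaviour described for reads can be sketched in userspace. This is not the driver's kernel code: it assumes a Fisher-Yates shuffle driven by the operating system's CSPRNG, and the function name `secure_permutation` is hypothetical.

```python
import secrets


def secure_permutation(data: bytes) -> bytes:
    """Return a random permutation of data using a Fisher-Yates shuffle."""
    shuffled = bytearray(data)
    # Walk from the last index down, swapping each byte with a randomly
    # chosen byte at or before it. secrets draws from the OS CSPRNG.
    for i in range(len(shuffled) - 1, 0, -1):
        j = secrets.randbelow(i + 1)
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return bytes(shuffled)


original = b"hello, world"
print(secure_permutation(original))  # the same bytes in a random order
```

Each call plays the role of one read from the device: the stored bytes stay the same, and only their order changes.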
At Cloudflare, we’re used to being the fastest in the
world. However, for approximately 30 minutes last December,
Cloudflare was slow. Between
20:10 and 20:40 UTC on December 16, 2021, web requests served by
Cloudflare were artificially delayed by up to five seconds before
being processed. This post tells the story of how a missing shell
option called “pipefail” slowed Cloudflare down.
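For readers unfamiliar with the option, the sketch below shows what pipefail changes in general; it does not reproduce the script involved in the incident. It assumes `bash` is installed, and the helper name `pipeline_status` is hypothetical.

```python
import shutil
import subprocess


def pipeline_status(script: str):
    """Run script under bash and return its exit status (None without bash)."""
    bash = shutil.which("bash")
    if bash is None:
        return None
    return subprocess.run([bash, "-c", script]).returncode


# By default a pipeline's status is that of its last command, so the
# failure of `false` is hidden by the success of `true`.
default_status = pipeline_status("false | true")

# With pipefail, the pipeline fails if any command in it fails.
pipefail_status = pipeline_status("set -o pipefail; false | true")

print(default_status, pipefail_status)
```

Without the option, a script that checks only the pipeline's status carries on as if every stage had succeeded.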
In this article I present a step-by-step walkthrough of my
photography workflow. I won't go through all the details of every
piece of software I mention, since they have their own manuals and
documentation for that; instead, I will highlight the operations I do.
Formatted string literals - also called f-strings - have been
around since Python 3.6, so we all know what they are and how to use
them. There are, however, some facts and handy features of f-strings
that you might not know about. So, let's take a tour of some awesome
f-string features that you'll want to use in your everyday coding.
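A few examples of the kind of feature in question. These are standard f-string capabilities, though not necessarily the ones the article goes on to cover, and the variable names are made up for the example.

```python
from datetime import date

value = 42
pi = 3.14159
name = "Ada"
day = date(2021, 12, 16)
width = 6

print(f"{value=}")          # self-documenting form (Python 3.8+): value=42
print(f"{pi:.2f}")          # format spec: 3.14
print(f"{name!r:>8}")       # repr conversion, right-aligned in 8 columns
print(f"{value:#x}")        # hexadecimal with prefix: 0x2a
print(f"{day:%Y-%m-%d}")    # date format codes: 2021-12-16
print(f"{pi:{width}.3f}")   # nested field supplies the width
```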
In 1832, Charles Darwin witnessed hundreds of ballooning spiders
landing on the HMS Beagle while some 60 miles offshore. Ballooning is a
phenomenon that's been known since at least the days of
Aristotle—and immortalized in E.B. White's children's classic
Charlotte's
Web—but
scientists have only recently made progress in gaining a better
understanding of its underlying physics.
I have been working on a project that needs a REST API with a SQL
database. The API creates, updates and retrieves objects from the
database, such as:
user information
transactions
user accounts
credit card details
other
This project has some specific data security requirements for
storage. We needed to encrypt certain fields on these data tables
using a different encryption key per user and per account. The
biggest challenge was building a Go library to support this sort of
complex per-field encryption. I wanted to make a nice way of
encrypting and decrypting a Go struct without adding verbose code in
the API. There needed to be a simple way of managing the many
different encryption keys that will be handled in each API request.
The obvious nice thing about Go's approach is that you're never in
doubt about whether an identifier is public or not when you read
code. If it starts with upper case, it's public; otherwise, it's
package private. This doesn't mean that a public identifier is
supposed to be used generally, but at least it's clear to everyone
that it could be. In other languages, you may have to consult the
definition of the identifier, or perhaps a section of code that
lists exported identifiers.
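The naming half of the rule is simple enough to state as a one-line check. The sketch below is written in Python purely for illustration and the function name `is_exported` is hypothetical; it relies on the Go specification's definition of an uppercase letter as Unicode category Lu, and it ignores the other condition in the spec about where the identifier is declared.

```python
import unicodedata


def is_exported(identifier: str) -> bool:
    """Apply Go's naming rule: exported if the first character is uppercase."""
    # The Go spec defines "uppercase" as Unicode category Lu.
    return bool(identifier) and unicodedata.category(identifier[0]) == "Lu"


for name in ["Println", "println", "_Helper", "Écrire"]:
    print(name, is_exported(name))
```

Only the first character matters, which is why the visibility of a name can be read straight off the call site.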
But there is a whole bunch of lesser-used attributes that I was sure
I’d forgotten about, and probably a whole bunch of attributes I
didn’t even know existed. This post is the result of my research,
and I hope you’ll find some of these useful to you, as you build
HTML pages in the coming months.