Week 06, 2020

Bot detection 101: How to detect web bots?

We can distinguish two main families of detection techniques:

  • Behavioral detection: this family of approaches leverages the user’s behavior, such as mouse movements or browsing speed, to predict whether a user is human or not.
  • Fingerprinting-based detection: this second family of approaches leverages information about the device and the browser, such as the browser version, the Operating System (OS) or the number of CPU cores.

Source: Bot detection 101: How to detect web bots?, an article by Antoine Vastel.
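As a rough illustration of the fingerprinting-based family described above, here is a minimal sketch in Python. It is not the article's implementation: the function name `fingerprint_signals` and the dictionary keys (`user_agent`, `webdriver`, `platform`) are assumptions chosen for this example, and a real detector would combine many more signals.

```python
def fingerprint_signals(fp: dict) -> list:
    """Return the names of suspicious signals found in a fingerprint.

    `fp` is a hypothetical dict of attributes collected from the browser;
    the keys used here are assumptions made for this sketch.
    """
    signals = []
    user_agent = fp.get("user_agent", "")
    platform = fp.get("platform", "")

    # Headless Chrome announces itself in its default user agent string.
    if "HeadlessChrome" in user_agent:
        signals.append("headless_user_agent")

    # Automation frameworks typically set navigator.webdriver to true.
    if fp.get("webdriver") is True:
        signals.append("webdriver_flag")

    # Inconsistency check: the user agent claims Windows,
    # but the reported platform does not.
    if "Windows" in user_agent and not platform.startswith("Win"):
        signals.append("os_mismatch")

    return signals


suspicious = fingerprint_signals({
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) HeadlessChrome/79.0",
    "webdriver": True,
    "platform": "Linux x86_64",
})
print(suspicious)
```

A behavioral detector would follow the same shape, but score events such as mouse movements or request timing instead of static device attributes.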

Overload Functions in Python

Function overloading is the ability to have multiple functions with the same name but with different signatures/implementations. When an overloaded function fn is called, the runtime first evaluates the arguments passed to the function call and, based on them, invokes the corresponding implementation.

Source: Overload Functions in Python, an article by Arpit Bhayani.
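Python has no built-in overloading by signature, but the standard library offers a limited form of it. The sketch below uses `functools.singledispatch`, which selects an implementation based on the type of the first argument only. This is a swapped-in technique for illustration, not necessarily the approach the article takes, and the function name `describe` is made up for this example.

```python
from functools import singledispatch


@singledispatch
def describe(value) -> str:
    # Fallback implementation, used when no registered type matches.
    return f"object: {value!r}"


@describe.register
def _(value: int) -> str:
    # Chosen when the first argument is an int.
    return f"int: {value}"


@describe.register
def _(value: list) -> str:
    # Chosen when the first argument is a list.
    return f"list of {len(value)} items"


print(describe(3))
print(describe([1, 2]))
print(describe("x"))
```

Because dispatch looks only at the first argument's type, overloading on the number of arguments or on later parameters would need a custom registry or a third-party library.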

Scaling to 100k Users

Many startups have been there - what feels like legions of new users are signing up for accounts every day and the engineering team is scrambling to keep things running.

It’s a good problem to have, but information on how to take a web app from 0 to hundreds of thousands of users can be scarce. Usually, solutions come either from massive fires popping up or from identifying bottlenecks (and oftentimes both).

With that said, I’ve noticed that many of the main patterns for taking a side project to something highly scalable are relatively formulaic.

This is an attempt to distill the basics around that formula into writing. We’re going to take our new photo sharing website, Graminsta, from 1 to 100k users.

Source: Scaling to 100k Users, an article by Alex Pareto.

An Introduction to Big Data: Clustering

Clustering is a Machine Learning technique that involves the grouping of data points. Given a set of data points, we can use a clustering algorithm to classify each data point into a specific group. In theory, data points that are in the same group should have similar properties and/or features, while data points in different groups should have highly dissimilar properties and/or features. Clustering is a method of unsupervised learning and is a common technique for statistical data analysis used in many fields.

Source: An Introduction to Big Data: Clustering, an article by James Le.
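To make the idea of grouping similar points concrete, here is a minimal sketch of k-means (Lloyd's algorithm), one common clustering method. The excerpt above does not name an algorithm, so this choice, the function names, and the sample data are assumptions made for illustration. It requires Python 3.8+ for `math.dist`.

```python
from math import dist


def nearest_index(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of k-means iterations.

    `centroids` holds the initial guesses; the caller chooses them,
    which keeps this sketch deterministic.
    """
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)

        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]

    labels = [nearest_index(p, centroids) for p in points]
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, centroids=[(0, 0), (10, 10)])
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

No labels are supplied to the algorithm: the two groups emerge purely from distances between points, which is what makes the method unsupervised.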

A Pythonista's Review of Haskell

Over the past three months, I've been reading through “Haskell Programming, From First Principles” by Chris Allen and Julie Moronuki, the 4th release candidate of the 1.0 edition (1.0-rc4). I'm pleased to say that I made it to the end of this 1,857-page (by the e-reader PDF version) monstrosity. Here are some of the things that I, as a software engineer who has used Python in production and Haskell only for book exercises, liked and didn't like about Haskell.

Source: A Pythonista's Review of Haskell, an article by Ying Wang.