How to Crawl, service isolation with VRFs, and Perl log analyzer

Tue 14 Sep 2021

How to Crawl the Web with Scrapy

Web scraping is the process of downloading data from a public website. For example, you could scrape ESPN for baseball players' stats and build a model that predicts a team's odds of winning from its players' stats and win rates. Below are a few use cases for web scraping.

Monitoring the prices of your competitors for price matching (competitive pricing).

Collecting statistics from various websites to create a dashboard, e.g., COVID-19 dashboards.

Monitoring financial forums and Twitter to calculate sentiment for specific assets.

One use case I will demonstrate is scraping the website indeed.com for job postings. Let's say you are looking for a job but are overwhelmed by the number of listings. You could set up a process to scrape Indeed every day, then write a script that automatically applies to the postings that meet certain criteria.

Source: How to Crawl the Web with Scrapy, an article by Matt Bass.
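
To make the "scrape, then filter by criteria" idea from the excerpt concrete, here is a minimal sketch. It is not taken from the linked article and does not use Scrapy: it relies only on Python's standard library so it runs without installing anything. The HTML layout, the class names (job, title, company), and the helper names are all hypothetical; a real job board's markup will differ, and the site's terms of service may restrict scraping.

```python
from dataclasses import dataclass
from html.parser import HTMLParser

# Hypothetical markup standing in for a downloaded listings page.
SAMPLE_HTML = """
<div class="job"><span class="title">Python Developer</span><span class="company">Acme</span></div>
<div class="job"><span class="title">Sales Associate</span><span class="company">Globex</span></div>
<div class="job"><span class="title">Senior Python Engineer</span><span class="company">Initech</span></div>
"""


@dataclass
class Posting:
    title: str
    company: str


class PostingParser(HTMLParser):
    """Collects postings from markup shaped like SAMPLE_HTML (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.postings = []
        self._field = None    # name of the field currently being read, if any
        self._current = {}    # fields collected for the posting in progress

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "job":
            self._current = {}
        elif tag == "span" and css_class in ("title", "company"):
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that appears inside a title or company span.
        if self._field:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current:
            self.postings.append(
                Posting(
                    title=self._current.get("title", "").strip(),
                    company=self._current.get("company", "").strip(),
                )
            )
            self._current = {}


def find_matching(html, keywords):
    """Return postings whose title contains any keyword (case-insensitive)."""
    parser = PostingParser()
    parser.feed(html)
    parser.close()
    wanted = [k.lower() for k in keywords]
    return [p for p in parser.postings if any(k in p.title.lower() for k in wanted)]


if __name__ == "__main__":
    for posting in find_matching(SAMPLE_HTML, ["python"]):
        print(f"{posting.title} at {posting.company}")
```

In a Scrapy project the parsing step would normally live in a spider's callback and use selectors instead of a hand-written parser; the filtering function would stay much the same.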

Efficient service isolation on Alpine with VRFs

Over the weekend, a reader of my blog contacted me basically asking about firewalls. Firewalls themselves are boring in my opinion, so let’s talk about something Alpine can do that, as far as I know, no other distribution can easily do out of the box yet: service isolation using the base networking stack itself instead of netfilter.

Source: Efficient service isolation on Alpine with VRFs, an article by Ariadne Conill.

A good old-fashioned Perl log analyzer

A recent Lobsters post lauding the virtues of AWK reminded me that although the language is powerful and lightning fast, I usually find myself exceeding its capabilities and reaching for Perl instead. One such application is analyzing voluminous log files such as the ones generated by this blog. Yes, WordPress has stats, but I’ve never let reinvention of the wheel get in the way of a good programming exercise.

Source: A good old-fashioned Perl log analyzer, an article by Mark Gardner.
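
For a sense of what such a log analyzer does, here is a minimal sketch written in Python rather than Perl, so it is not the linked article's code. It assumes access-log lines in the Common Log Format used by many web servers; the sample lines, the regular expression, and the function name are illustrative assumptions.

```python
import re
from collections import Counter

# Assumed line shape: host ident user [time] "METHOD path PROTOCOL" status size
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

# Made-up sample data using documentation-reserved IP addresses.
SAMPLE_LOG = [
    '203.0.113.5 - - [14/Sep/2021:10:00:00 +0000] "GET / HTTP/1.1" 200 5120',
    '203.0.113.9 - - [14/Sep/2021:10:00:02 +0000] "GET /feed/ HTTP/1.1" 200 2048',
    '198.51.100.7 - - [14/Sep/2021:10:00:05 +0000] "GET / HTTP/1.1" 200 5120',
    '198.51.100.7 - - [14/Sep/2021:10:00:09 +0000] "GET /missing HTTP/1.1" 404 -',
    'not a log line',
]


def summarize(lines):
    """Count requests per path and per status code, skipping unparseable lines."""
    paths = Counter()
    statuses = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # ignore lines that do not fit the assumed format
        paths[match["path"]] += 1
        statuses[match["status"]] += 1
    return paths, statuses


if __name__ == "__main__":
    paths, statuses = summarize(SAMPLE_LOG)
    for path, hits in paths.most_common():
        print(f"{hits:5d}  {path}")
    for status, hits in sorted(statuses.items()):
        print(f"{status}: {hits}")
```

To analyze a real file, pass an open file object to summarize instead of the sample list; it reads one line at a time, so large logs do not need to fit in memory.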
