Plurrrr

Tue 14 Sep 2021

How to Crawl the Web with Scrapy

Web scraping is the process of downloading data from a public website. For example, you could scrape ESPN for stats of baseball players and build a model to predict a team’s odds of winning based on its players’ stats and win rates. Below are a few use cases for web scraping.

  • Monitoring the prices of your competitors for price matching (competitive pricing).
  • Collecting statistics from various websites to create a dashboard, e.g. COVID-19 dashboards.
  • Monitoring financial forums and Twitter to calculate sentiment for specific assets.

One use case I will demonstrate is scraping indeed.com for job postings. Let’s say you are looking for a job but are overwhelmed by the number of listings. You could set up a process to scrape Indeed every day, and then write a script that automatically applies to the postings that meet certain criteria.
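
The excerpt stops short of code, so here is a minimal sketch of what such a Scrapy spider could look like. The spider name, start URL, and CSS selectors are placeholder assumptions, not taken from the article; Indeed’s markup changes often, so they would need to be verified against the live pages.

import scrapy


class JobsSpider(scrapy.Spider):
    """Hypothetical spider for job postings; names and selectors are guesses."""

    name = "jobs"
    # Placeholder search URL; adjust the q= parameter to your own query.
    start_urls = ["https://www.indeed.com/jobs?q=python+developer"]

    def parse(self, response):
        # The CSS classes here are assumptions about Indeed's markup,
        # which changes frequently; check them in the browser first.
        for card in response.css("div.job_seen_beacon"):
            yield {
                "title": card.css("h2 a::text").get(),
                "company": card.css("span.companyName::text").get(),
                "location": card.css("div.companyLocation::text").get(),
            }
        # Follow the pagination link, if present, and parse that page too.
        next_page = response.css('a[aria-label="Next"]::attr(href)').get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)

Saved as jobs_spider.py, the sketch runs without a full Scrapy project via "scrapy runspider jobs_spider.py -o jobs.json"; a daily cron job could invoke that command and hand the resulting JSON file to your apply-or-skip script.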

Source: How to Crawl the Web with Scrapy, an article by Matt Bass.

A good old-fashioned Perl log analyzer

A recent Lobsters post lauding the virtues of AWK reminded me that although the language is powerful and lightning fast, I usually find myself exceeding its capabilities and reaching for Perl instead. One such application is analyzing voluminous log files such as the ones generated by this blog. Yes, WordPress has stats, but I’ve never let reinvention of the wheel get in the way of a good programming exercise.
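
Gardner’s analyzer itself is written in Perl and is not reproduced in the excerpt. Purely to illustrate the flavor of the task, here is a small Python sketch that tallies the most-requested URLs in a Common Log Format access log; the regex and the script’s interface are my assumptions, not the article’s code.

import re
import sys
from collections import Counter

# Common Log Format: host ident user [time] "method path protocol" status bytes
LOG_LINE = re.compile(r'\S+ \S+ \S+ \[[^\]]+\] "(\S+) (\S+)[^"]*" (\d{3}) \S+')


def top_paths(log_path, n=10):
    """Tally requests per URL path and return the n most frequent."""
    counts = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            match = LOG_LINE.match(line)
            if match:  # silently skip lines that do not parse
                counts[match.group(2)] += 1  # group 2 is the request path
    return counts.most_common(n)


if __name__ == "__main__":
    # Usage: python log_report.py access.log
    for path, hits in top_paths(sys.argv[1]):
        print(f"{hits:8d}  {path}")

Real-world logs contain enough malformed lines that the "if match" guard matters: anything that does not fit the pattern is simply skipped rather than crashing the report.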

Source: A good old-fashioned Perl log analyzer, an article by Mark Gardner.