week 16, 2021

JavaScript for Data Science

David Beazley thought that “JavaScript versus Data Science” would be a better title for this book. While that one word sums up how many people view the language, we hope we can convince you that modern JavaScript is usable as well as useful. Scientists and engineers are who we were thinking of when we wrote this book but we hope that these lessons will also help librarians, digital humanists, and everyone else who uses computing in their research.

Source: JavaScript for Data Science, an article by Maya Gans, Toby Hodges, and Greg Wilson.

How we should be using Git

Git was created by Linus Torvalds out of a need. At the time the Linux Kernel team was using a proprietary Distributed Source Control Management (DSCM) system. However, due to licensing issues the Linux Kernel team could no longer use this proprietary DSCM system. Therefore, Linus decided to build Git as the DSCM system he always wished they had.

You might also know Linus as the creator of Linux. In fact, Linus still manages the Linux Kernel today. As of 2020, the Linux kernel had over 27.8 million lines of code spread across ~66 thousand files from ~21 thousand different contributors. It has continued to be successfully developed, maintained, and extended since it was publicly announced in 1991. It is also worth noting that Git itself, another large, successful open source project, is managed in the same way.

So there must be some useful insights around long term software development and maintenance practices we can glean by looking at how Linus and his team use Git for their development and peer review workflows.

Source: How we should be using Git, an article by Drew De Ponte.

You might as well timestamp it

Storing timestamps instead of booleans, however, is one of those things I can go out on a limb and say it doesn’t really depend all that much. You might as well timestamp it. There are plenty of times in my career when I’ve stored a boolean and later wished I’d had a timestamp. There are zero times when I’ve stored a timestamp and regretted that decision.

Source: You might as well timestamp it, an article by Jerod Santo.
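
To make the excerpt's idea concrete, here is a minimal sketch of storing a nullable timestamp where a boolean flag might otherwise go. The `Article` class and its `published_at` field are hypothetical names chosen for illustration and do not come from the linked article.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Article:
    """Hypothetical record; names are illustrative, not from the article."""
    title: str
    # None means "not published"; a datetime means "published, and when".
    published_at: Optional[datetime] = None

    @property
    def is_published(self) -> bool:
        # The boolean is derived from the timestamp, so no information is lost.
        return self.published_at is not None

    def publish(self, now: Optional[datetime] = None) -> None:
        # Record the moment of the state change instead of flipping a flag.
        self.published_at = now or datetime.now(timezone.utc)


draft = Article("Week 16 links")
unpublished = Article("Still a draft")
draft.publish(datetime(2021, 4, 19, tzinfo=timezone.utc))
```

The flag is still available through `is_published`, but the record now also answers "when?", which a stored boolean never could.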

Writing Good Unit Tests; Don’t Mock Database Connections

Unit tests are unbelievably important to us as developers because they allow us to demonstrate the correctness of the code we’ve written. More importantly, unit tests allow us to make updates to our code base with the confidence that we haven’t broken anything. In our eagerness to get 100% code coverage, however, we often write tests for logic that perhaps we have no business testing. I’m here to assert that creating mock database abstractions in order to write unit tests is a bad idea almost all of the time.

Source: Writing Good Unit Tests; Don't Mock Database Connections, an article by Lane Wagner.

Data ordering attacks

Most deep neural networks are trained by stochastic gradient descent. Now “stochastic” is a fancy Greek word for “random”; it means that the training data are fed into the model in random order.

So what happens if the bad guys can cause the order to be not random? You guessed it – all bets are off. Suppose for example a company or a country wanted to have a credit-scoring system that’s secretly sexist, but still be able to pretend that its training was actually fair. Well, they could assemble a set of financial data that was representative of the whole population, but start the model’s training on ten rich men and ten poor women drawn from that set – then let initialisation bias do the rest of the work.

Source: Data ordering attacks, an article by Ross Anderson.

Complete Guide to Generative Adversarial Networks (GANs)

The technological advancements and developments in machine learning, deep learning, and neural networks have led to a revolutionary era. Creating and replicating photos, texts, images, and pictures based on only a collection of examples can be considered shocking to some, and marvelous to others.

We are now at a point where technology is so advanced that deep learning and neural networks can even generate realistic human faces from scratch. The faces generated do not belong to any person, alive or dead, yet they are astoundingly realistic.

One special deep learning network we have to thank for these achievements is the Generative Adversarial Network (GAN), which is the topic of this article.

Source: Complete Guide to Generative Adversarial Networks (GANs).

Simple Python Profiling with the @profile Decorator

One way to analyze the performance of a function or program is to run it under a profiler. Profilers can help us understand timing, memory usage, and other pertinent information about code. The key to using profilers is to determine which portions of the code are slow or computationally expensive, so that optimization effort goes where it matters.

Source: Simple Python Profiling with the @profile Decorator, an article by Ryan Kuang.
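
As a rough illustration of the decorator approach, the sketch below builds a `profile` decorator on top of the standard library's `cProfile` and `pstats`. This is an assumption made for the sake of a self-contained example: the linked article may rely on a different profiler, and `sum_of_squares` is a made-up function to profile.

```python
import cProfile
import functools
import io
import pstats


def profile(func):
    """Hypothetical decorator: profile each call with cProfile and print a report."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        try:
            # runcall profiles just this call and returns the function's result.
            return profiler.runcall(func, *args, **kwargs)
        finally:
            # Print the five most expensive entries by cumulative time.
            stream = io.StringIO()
            stats = pstats.Stats(profiler, stream=stream)
            stats.sort_stats("cumulative").print_stats(5)
            print(stream.getvalue())
    return wrapper


@profile
def sum_of_squares(n):
    # Made-up workload so the profiler has something to measure.
    return sum(i * i for i in range(n))


result = sum_of_squares(10_000)
```

Calling the decorated function returns its normal result and prints a table of call counts and timings, which is usually enough to spot the slow part before reaching for heavier tools.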

“So what exactly is curl?”

You know that question you can get asked casually by a person you’ve never met before or even by someone you’ve known for a long time but haven’t really talked to about this before. Perhaps at a social event. Perhaps at a family dinner.

So what do you do?

Source: “So what exactly is curl?”, an article by Daniel Stenberg.

Git from the Bottom Up

Welcome to the world of Git. I hope this document will help to advance your understanding of this powerful content tracking system, and reveal a bit of the simplicity underlying it — however dizzying its array of options may seem from the outside.

Source: Git from the Bottom Up, an article by John Wiegley.

Continued Fractions in Haskell

In this article, we’ll develop a Haskell library for continued fractions. Continued fractions are a different representation for real numbers, besides the fractions and decimals we all learned about in grade school. In the process, we’ll build correct and performant software using ideas that are central to the Haskell programming language community: equational reasoning, property testing, and term rewriting.

Source: Continued Fractions: Haskell, Equational Reasoning, Property Testing, and Rewrite Rules in Action, an article by Chris Smith.
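
For readers who have not met the representation before, here is a small sketch of how a rational number maps to continued fraction terms and back. It is written in Python for consistency with the other examples in this digest, not in Haskell, and the function names `continued_fraction` and `from_terms` are illustrative rather than taken from the article's library.

```python
from fractions import Fraction
from typing import List


def continued_fraction(x: Fraction) -> List[int]:
    """Return the terms [a0; a1, a2, ...] of a rational number."""
    terms = []
    while True:
        whole = x.numerator // x.denominator  # floor of x
        terms.append(whole)
        remainder = x - whole
        if remainder == 0:
            return terms
        # Continue with the reciprocal of the fractional part.
        x = 1 / remainder


def from_terms(terms: List[int]) -> Fraction:
    """Rebuild the rational number from its continued fraction terms."""
    value = Fraction(terms[-1])
    for a in reversed(terms[:-1]):
        value = a + 1 / value
    return value


terms = continued_fraction(Fraction(415, 93))  # [4, 2, 6, 7]
```

Here 415/93 becomes 4 + 1/(2 + 1/(6 + 1/7)), and `from_terms` folds those terms back into the original fraction. Every rational number yields a finite list like this; irrational numbers would need an infinite one, which this sketch does not attempt.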

Animate extruded text

For better or worse, extruded text was a staple of the mid-90s desktop publishing design landscape. It was rare for a party invitation or gaming fanzine not to be blessed by this 3D text effect on its way out the printer.

As design was increasingly destined for screen rather than page, extrusion fell out of favour. But recently, I've noticed a quiet revival.

Source: Animate extruded text, an article by Matt Perry.

Internet Search Tips

Over time, I developed a certain google-fu and expertise in finding references, papers, and books online. Some of these tricks are not well-known, like checking the Internet Archive (IA) for books.

Source: Internet Search Tips.

My current HTML boilerplate

Usually when I start a new project, I either copy the HTML structure of the last site I built or I head over to HTML5 Boilerplate and copy their boilerplate. Recently I didn’t start a new project, but I had to document the structure we use at work for the sites we build. So, simply copying and pasting wasn’t an option, I had to understand the choices that have been made. Since I spent quite some time researching and putting the structure together, I decided to share it with you.

Source: My current HTML boilerplate, an article by Manuel Matuzović.