Patterns for Managing Source Code Branches, Pytest, and faster Pandas

Mon 20 Apr 2020

Patterns for Managing Source Code Branches

Modern source-control systems provide powerful tools that make it easy to create branches in source code. But eventually these branches have to be merged back together, and many teams spend an inordinate amount of time coping with their tangled thicket of branches. There are several patterns that can allow teams to use branching effectively, concentrating around integrating the work of multiple developers and organizing the path to production releases. The over-arching theme is that branches should be integrated frequently and efforts focused on a healthy mainline that can be deployed into production with minimal effort.

Source: Patterns for Managing Source Code Branches, an article by Martin Fowler.

Effective Python Testing With Pytest

Testing your code brings a wide variety of benefits. It increases your confidence that the code behaves as you expect and ensures that changes to your code won’t cause regressions. Writing and maintaining tests is hard work, so you should leverage all the tools at your disposal to make it as painless as possible. pytest is one of the best tools you can use to boost your testing productivity.

Source: Effective Python Testing With Pytest, an article by Dane Hillard.

From chunking to parallelism: faster Pandas with Dask

When data doesn’t fit in memory, you can use chunking: loading and then processing it in chunks, so that only a subset of the data needs to be in memory at any given time. But while chunking saves memory, it doesn’t address the other problem with large amounts of data: computation can also become a bottleneck.

How can you speed processing up?

One approach is to utilize multiple CPUs: pretty much every computer these days has more than one CPU. If you have two CPUs, you can often run your code (almost) twice as fast; four CPUs and you might approach a 4× speedup, and so on.

Even better, the chunking technique that helps reduce memory can also enable parallelism. Let’s see why, and then learn how the Dask library can easily enable parallelism of your Pandas processing code.

Source: From chunking to parallelism: faster Pandas with Dask, an article by Itamar Turner-Trauring.

big data