Plurrrr

Thu 02 Feb 2023

JVM Field Guide: Memory

A field guide is a book designed to help identify birds, spiders, or other animals while on a nature walk. Typically, these books are very concise, as you don’t want to bring a five-volume encyclopaedia on your bushwalk. They contain only the necessary details and leave out less important information.

This article is the first chapter of an attempt to create such a guide for running and supporting JVM applications: a guide that is concise, contains only the necessary information, and can be used to find a solution when encountering a problem in the field. As with animal kingdoms, there are five fundamental resources that can affect a JVM’s runtime: memory, CPU, disk IO, network, and thread synchronisation.

This article focuses on the first one – memory. Memory is an extensive topic. There could be books written on how JVM applications use memory, and it’s impossible for a single article to cover the whole story. Instead, the guide focuses on the most practical aspects of dealing with JVM applications, primarily server-side ones, and provides plenty of references for those who’d like to dive deeper.

Source: JVM Field Guide: Memory, an article by Sergey Tselovalnikov.

Don’t bother trying to estimate Pandas memory usage

You have a file with data you want to process with Pandas, and you want to make sure you won’t run out of memory. How do you estimate memory usage given the file size?

At times you may see estimates like these:

  • “Have 5 to 10 times as much RAM as the size of your dataset”, or
  • “several times the size of your dataset”, or
  • 2×-3× the size of the dataset.

All of these estimates can both under- and over-estimate memory usage, depending on the situation. In fact, I will go so far as to say that estimating memory usage is just not worth doing.

Source: Don’t bother trying to estimate Pandas memory usage, an article by Itamar Turner-Trauring.
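
To illustrate why a fixed multiplier of the file size can miss in either direction, here is a minimal sketch that is not taken from the article. It assumes numeric columns are held in memory as fixed-width 8-byte values (as with Pandas' default int64 and float64 dtypes) and uses Python's standard `array` module as a stand-in for a Pandas column. The helper name `csv_vs_memory` and the sample data are made up for this example.

```python
from array import array


def csv_vs_memory(values, typecode):
    """Return (bytes_on_disk, bytes_in_memory) for one column of numbers."""
    # On disk: one value per line, as a single-column CSV file would store it.
    csv_bytes = len("".join(f"{v}\n" for v in values).encode("utf-8"))
    # In memory: fixed-width storage, standing in for an int64/float64 column.
    column = array(typecode, values)
    memory_bytes = column.itemsize * len(column)
    return csv_bytes, memory_bytes


# Single-digit integers: 2 bytes of text each, 8 bytes each in memory.
small_ints = [i % 10 for i in range(1000)]
# Floats with many decimal digits: more bytes of text than the 8 bytes in memory.
long_floats = [i / 7 + 0.123456789 for i in range(1, 1001)]

int_disk, int_mem = csv_vs_memory(small_ints, "q")      # "q" = signed 64-bit int
float_disk, float_mem = csv_vs_memory(long_floats, "d")  # "d" = 64-bit float

int_ratio = int_mem / int_disk
float_ratio = float_mem / float_disk

print(f"small ints:  memory is {int_ratio:.2f}x the file size")
print(f"long floats: memory is {float_ratio:.2f}x the file size")
```

In this sketch the integer column takes four times its file size in memory, while the float column takes less than its file size, so no single multiplier fits both. Real Pandas usage varies further with string columns, chosen dtypes, and temporary copies made during processing, none of which this sketch models.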