Friday, February 3, 2012

Swap, the Annoying Cousin of HPC

Swap, paging, virtual memory, drunk relatives: whatever you call it, avoid it.  There has been a lot of confusion among users, so I hope to dispel some of the myths and ideas around swap for HPC applications.

Virtual memory allows modern operating systems to present more memory to an application than is actually installed in the platform. There is a lot more going on here, but for the sake of simplicity consider our case: what happens if my HPC job requires 4GB of RAM and the node I am allocated only has 3GB free?

When the operating system needs memory and there is no free physical memory left, the system starts swapping. This is the process of taking some data out of memory and writing it to a swap/page file on disk.  This sounds like a great idea, and I have talked to users who rely on swap space to run their applications, not expecting any impact on their performance.
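To make the 4GB-versus-3GB case concrete, here is a minimal sketch of checking whether a job would spill into swap. It assumes a Linux node, where `/proc/meminfo` reports a `MemAvailable` figure in kB; the sample text and the helper names `parse_meminfo` and `shortfall_kb` are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers (kB for sized fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])
    return values


def shortfall_kb(required_kb, meminfo):
    """Return how many kB of the request do not fit in available RAM (0 if it fits)."""
    return max(0, required_kb - meminfo["MemAvailable"])


# Hypothetical node: 3GB available, matching the example in the text.
SAMPLE = """\
MemTotal:        8388608 kB
MemAvailable:    3145728 kB
SwapTotal:       2097152 kB
SwapFree:        2097152 kB
"""

GB_IN_KB = 1024 * 1024
info = parse_meminfo(SAMPLE)
spill = shortfall_kb(4 * GB_IN_KB, info)  # a job that needs 4GB
print(f"job would push {spill / GB_IN_KB:.1f} GB into swap")
```

On a real node the same parser could be fed the contents of `/proc/meminfo` instead of the sample string. Anything above zero means part of the job's data ends up on disk.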

To start, think of how a hard drive works: there is a physical spinning platter with an arm moving over it like a needle, while RAM in a system is charge stored in capacitors. Which is faster: flying electrons or a 7200RPM record player?  When the operating system runs out of space and starts putting some of your data onto disk, it makes a best guess based on what was used least recently, in hopes that you will not access that data again soon. In most HPC applications this is not the case: most of our data is represented in a few large arrays that we walk up and down over and over again.
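A toy simulation shows why walking a large array over and over defeats a least-recently-used guess. This is only a sketch: it assumes strict LRU replacement and counts whole "pages", which is simpler than what a real kernel does, and the function name `count_page_faults` is invented for the example.

```python
from collections import OrderedDict


def count_page_faults(num_pages, num_frames, sweeps):
    """Walk pages 0..num_pages-1 front to back `sweeps` times under LRU replacement."""
    frames = OrderedDict()  # keys are resident pages, oldest first
    faults = 0
    for _ in range(sweeps):
        for page in range(num_pages):
            if page in frames:
                frames.move_to_end(page)  # mark as most recently used
            else:
                faults += 1  # page must come from "disk"
                if len(frames) >= num_frames:
                    frames.popitem(last=False)  # evict the least recently used page
                frames[page] = None
    return faults


# Array fits in RAM: only the first sweep touches disk.
fits = count_page_faults(num_pages=3, num_frames=3, sweeps=10)
# Array is one page too big: the page evicted is always the next one needed.
too_big = count_page_faults(num_pages=4, num_frames=3, sweeps=10)
print(fits, too_big)
```

With the array just one page larger than memory, every single access in every sweep goes to disk, not only the accesses to the part that did not fit.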

The result is the hard drive in the computer rushing back and forth, writing out some data from RAM to make space for the data just requested, then reading that requested data back into memory so your application can run on it.  So what is the speed of a hard drive vs. RAM?

On a modern system with 3 memory channels and two sockets (12 total cores), the STREAM benchmark reports memory bandwidth of 42,174MB/s.  Under the best of situations hard drives give 100MB/s; in the swapping case, where chunks of data are both read and written at the same time, this falls by 5-10X to about 10-20MB/s.
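The arithmetic behind those figures can be worked through in a few lines. The bandwidth numbers are the ones quoted above; the 4GB array size is an assumption chosen only to give the comparison a scale, and the model ignores seek time and caching.

```python
STREAM_MB_S = 42174.0          # RAM bandwidth quoted above
DISK_BEST_MB_S = 100.0         # best-case hard drive
DISK_SWAP_MB_S = (10.0, 20.0)  # hard drive while swapping


def seconds_to_move(megabytes, rate_mb_s):
    """Time to stream `megabytes` of data at a sustained rate."""
    return megabytes / rate_mb_s


def slowdown_vs_ram(rate_mb_s):
    """How many times slower than RAM a device with this rate is."""
    return STREAM_MB_S / rate_mb_s


ARRAY_MB = 4096.0  # assumption: one pass over a 4GB array

rates = [
    ("RAM", STREAM_MB_S),
    ("disk, best case", DISK_BEST_MB_S),
    ("disk, swapping (fast end)", DISK_SWAP_MB_S[1]),
    ("disk, swapping (slow end)", DISK_SWAP_MB_S[0]),
]
for label, rate in rates:
    print(f"{label:26s} {seconds_to_move(ARRAY_MB, rate):9.2f} s "
          f"({slowdown_vs_ram(rate):7.0f}x RAM)")
```

Under these assumptions one pass over the array takes about a tenth of a second from RAM and several minutes from a swapping disk.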

In these situations your application will crawl. If you also expected to use the hard drive for any of the application's own data storage, in addition to swap, your performance will be even worse because of all the demands placed on the drive.

What about SSDs?   For the cost/speed I would just buy extra RAM. If you still use an SSD, use one of the PCIe cards, not the ones that use the SATA bus.

Never would I ever, as my first recommendation, say 'use swap'.  I would in order say:

   * Buy more RAM
   * Get an XSEDE allocation on Blacklight
   * Write your application to use out-of-core methods with a pile of SSDs
   * Use out-of-core methods without SSDs
   * Partition your code to run in parallel on more nodes (more RAM in total)
   * Build a system with ScaleMP/vSMP
   * Fine, use swap
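
For anyone unfamiliar with the term, an out-of-core method keeps the data on disk and has the application itself pull in one piece at a time, instead of letting the operating system guess. The sketch below is a deliberately small illustration of the idea, not a recipe: it sums a file of doubles one chunk at a time, and the file name, chunk size, and function name `out_of_core_sum` are all made up.

```python
import os
import tempfile
from array import array


def out_of_core_sum(path, chunk_values=4096):
    """Sum the doubles stored in a binary file, holding only one chunk in RAM."""
    item_size = array("d").itemsize
    total = 0.0
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk_values * item_size)
            if not buf:
                break  # end of file
            chunk = array("d")
            chunk.frombytes(buf)  # decode this chunk only
            total += sum(chunk)
    return total


# Demo on a small temporary file; a real data set would be far larger than RAM.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "values.bin")
    with open(path, "wb") as f:
        array("d", [1.0] * 10000).tofile(f)
    result = out_of_core_sum(path, chunk_values=1024)
print(result)
```

The point of the pattern is that the disk is read in long sequential runs the application chose, which is the access pattern hard drives handle best.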

That is the story: total memory on the system is RAM+swap, but the usable memory on the system is RAM+0*swap=RAM. Do not use swap!


  1. Good stuff. I was going to be a smarty-pants and ask about SSD, but I see that you've anticipated this. I accidentally wandered into swap last week and it slowed my code down by about 150x. It's brutal.

    1. Tom, that is very true, a 100x slowdown is common. Every case is unique and may behave differently. SSDs will deal with the random access nature of swap better than spinning rust. The ones that sit in a PCIe slot will do even better.

      Something many don't know is that you can have more than one swap space or swap file active at a time. The system will use them in parallel, so if you must and you have absolute hardware control, spread swap around as many drives as you are able.
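
As a rough way to see how swap is spread out, on Linux the active swap areas are listed in `/proc/swaps`, and areas that share the same priority are used round-robin. The sketch below groups areas by priority; the sample listing and device names are hypothetical, and the helper names are invented for the example.

```python
from collections import defaultdict


def parse_swaps(text):
    """Parse /proc/swaps-style text into a list of swap-area records."""
    areas = []
    for line in text.splitlines()[1:]:  # first line is the column header
        if not line.strip():
            continue
        name, kind, size_kb, used_kb, priority = line.rsplit(None, 4)
        areas.append({
            "name": name.strip(),
            "type": kind,
            "size_kb": int(size_kb),
            "used_kb": int(used_kb),
            "priority": int(priority),
        })
    return areas


def areas_by_priority(areas):
    """Group swap-area names by priority; equal priority means shared use."""
    groups = defaultdict(list)
    for area in areas:
        groups[area["priority"]].append(area["name"])
    return dict(groups)


# Hypothetical listing: two drives at the same priority plus a fallback file.
SAMPLE = """\
Filename                                Type            Size    Used    Priority
/dev/sdb1                               partition       4194300 0       5
/dev/sdc1                               partition       4194300 0       5
/swapfile                               file            1048572 0       -2
"""

groups = areas_by_priority(parse_swaps(SAMPLE))
for priority in sorted(groups, reverse=True):
    print(priority, groups[priority])
```

In this made-up listing the two partitions would share the load, and the swap file would only be touched once both are full.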