Saturday, January 10, 2015

Using Infiniband with MATLAB Parallel Computing Toolbox

This is a cross post from the Day Job.

In High Performance Computing (HPC) several network types are in common use. Two of them are Ethernet, the network found on virtually all computer equipment, and Infiniband, a specialty high-performance, low-latency interconnect common on commodity clusters.  There are also several proprietary types and a few other less common ones, but I will focus on Ethernet and Infiniband.

Ethernet, or really its companion protocol TCP, is the most commonly supported MPI network.  Almost all computer platforms support it, and a setup can be as simple as your home network switch.  It is ubiquitous and easy to support.  Networks like Infiniband require special drivers and uncommon hardware, but the effort is normally worth it.

The MATLAB Parallel Computing Toolbox provides a collection of functions that allow MATLAB users to utilize multiple compute nodes to work on larger problems.  Many may not realize that MathWorks chose to use the standard MPI routines to implement this toolbox.  For ease of use, MathWorks also chose to ship MATLAB with the MPICH MPI library, and the version they use only supports Ethernet for communication between nodes.

Ethernet is about the slowest common network used in parallel applications. The question is how much this can impact performance.

Mmmmm Data:

The data was generated on 12 nodes with Xeon X5650 processors, 144 cores in total. The code was the stock MATLAB paralleldemo_backslash_bench(1.25) from MATLAB 2013b.  You can find my M-Code at Gist.

The data show two trends. The first is independent of the network type: many parallel algorithms do not scale unless the amount of data each core has to work on is sufficiently large. In this case, for Ethernet especially, the peak performance is never reached. The second trend is that the network really matters: without Infiniband, over half of the performance of the nodes is lost at many problem sizes.

How to have MATLAB use Infiniband?

MathWorks does not ship an MPI library with the Parallel Computing Toolbox that can use Infiniband by default. This is reasonable; I would be curious how large the average PCT cluster is, and how big the jobs run on the toolbox are.  Luckily for us, MathWorks provides a way to introduce your own MPI library.  Let me be the first to proclaim:
Thank you MathWorks for adding mpiLibConf.m  (Simple Example) as a feature. -- Brock Palen
In the above test we used Intel MPI for the Infiniband test and MPICH for the Ethernet test.  The choice of MPI is important.  The MPI standard enforces a shared API but not a shared ABI, so the MPI library you substitute needs to be binary-compatible with the one MATLAB is compiled against. Luckily for us MathWorks used MPICH, so any MPICH derivative should work: MVAPICH, Intel MPI, etc.
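
To make the mpiLibConf.m mechanism concrete, here is a minimal sketch of what such a file can look like. The install directory and the library file name are placeholders, not the values used for the test above; substitute whatever your MPICH-compatible MPI actually installs.

```matlab
function [primaryLib, extras] = mpiLibConf
%MPILIBCONF Point the Parallel Computing Toolbox at a substitute MPI library.
%   Sketch only: the path and file name below are assumptions.

% Hypothetical install location of an MPICH-compatible MPI built with
% Infiniband support. Replace with the real path on your cluster.
mpiRoot = '/opt/example-mpi/lib';

% The main MPI shared library. The file name differs between MPI
% distributions, so check what your installation provides.
primaryLib = fullfile(mpiRoot, 'libmpich.so');

% Any additional shared libraries that must be loaded alongside the
% primary one (for example interconnect driver libraries). Empty here.
extras = {};
end
```

The file needs to sit on the MATLAB path ahead of the mpiLibConf.m that ships with the toolbox, and the workers have to see it too, not only the client. Running `which mpiLibConf` shows which copy MATLAB picks up.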

If you are using the MATLAB Parallel Computing Toolbox on more than one node, and your cluster has a network other than Ethernet/TCP (there are non-TCP Ethernet networks that perform very well), I highly encourage putting in the effort to make sure you use that network.

For Flux users, we have this set up, but you have to do some configuration yourself before you see the benefit.  Please visit the ARC MATLAB documentation, or send us a question at hpc-support@umich.edu.
