Sunday, May 3, 2015

File Transfer Tool Performance (Globus vs SCP)

The following is something I wrote up at the day job to satisfy my curiosity.  These measurements are not robust but encouraging.  I added some additional thoughts at the end.

On the HPC systems at ARC-TS we have two primary tools for transferring data, scp (secure copy), and Globus (GridFTP).  Other tools like rsync and sftp operate over scp and thus will have performance comparable to that tool.

So which tool performs the best?  We are going to test two cases each moving data to the XSEDE machine Gordon at SDSC. One test will be for moving a single large file, the second will be many small files.

Large file case.

For the large file we are moving a single 5GB file from Flux's scratch directory to the Gordon scratch directory.  Both filesystems can move data at GB/s rates so the network or tool will be the bottleneck.

scp / gsiscp

[brockp@flux-login2 stripe]$ gsiscp -c arcfour all-out.txt.bz2
india-all-out.txt.bz2                     100% 5091MB  20.5MB/s  25.6MB/s   04:08   

Duration: 4m:08s


Request Time            : 2015-04-26 22:41:04Z
Completion Time         : 2015-04-26 22:42:44Z
Effective MBits/sec     : 427.089

Duration: 1m:40s  2.5x faster than SCP

Many File Case

In this case the same 5GB file was split into 5093 1MB files.  Many may not know that every file has overhead, and that it is well known that moving many small files of the same size is much slower than moving one larger file of the same total size.  How much impact and can Globus help with this impact read below.

scp / gsiscp

[brockp@flux-login2 stripe]$ time gsiscp -r -c arcfour iobench
real 28m9.179s

Duration: 28m:09s


Request Time            : 2015-04-27 00:18:40Z
Completion Time         : 2015-04-27 00:25:30Z
Effective MBits/sec     : 104.423

Duration: 7m:50s  3.6x faster than SCP


Globus provides significant speedup both for single large files and many smaller files over scp.  The result is even more significant the smaller the files because of the overhead in scp doing one file at a time.

Additional Thoughts
Here are some additional thoughts for admins of Globus Servers.  Specifically those using web client.  If you look at your endpoint settings in   using endpoint-list -v there are settings for both concurrency and parallelism.  Both have settings for preferred and maximum settings. Further work would be for servers with large network connections and large parallel filesystems to increase these default values and see if better network utilization and performance could be realized.

No comments:

Post a Comment