tag:blogger.com,1999:blog-39917573698761447352024-02-19T19:58:23.494-05:00FaaS - Failure as a ServiceBrock Palen the founder of ShepTech LLC, has worked in High Performance Computing (HPC) starting in 2004. In 2002 he founded MLDS-Networks.com a Joomla! and Linux consulting and hosting Provider. In 2009 he started along with Jeff Squyres RCE-cast.com a podcast focusing on HPC topics.Brock Palenhttp://www.blogger.com/profile/03992571343475028656noreply@blogger.comBlogger87125tag:blogger.com,1999:blog-3991757369876144735.post-9214177096400982022020-10-30T23:07:00.000-04:002020-10-30T23:07:21.255-04:00AMD Killed ARM in the Data Center<p> When I first started in HPC, we had a software title that came on physical media in a three ring binder. It covered many platforms: Solaris on SPARC, AIX on Power, Linux on Power, Linux on x86, Windows on x86, Linux on x86-64, etc. Today that same package only supports Windows and Linux on x86-64.</p><p>As recently as two years ago, ARM (and Power) for HPC and the data center was a major interest. So why did this happen? Looking back, it appears the market wanted a competitive alternative to Intel. Maybe Intel could have avoided this situation, and they did make some decisions that didn't help, but I think it's just the nature of the market.<br /></p><p>Many of these decisions make sense. If the market will carry the cost, sure, segment for extra margin (pay extra for >768GB or AVX-512). Sure, don't implement on-die support for NVLink, in the hope that your own now market-rejected (Phi) or upcoming GPU (Xe) competitor would be the platform solution. </p><p>So the market, starting with hyperscalers and extreme-scale HPC sites like DOE, started looking at other options. Power had some high-profile wins, but the attention quickly turned to ARM. All the major cloud providers have some ARM play today.<br /></p><p>So why do we see ARM leaving the data center as a general-purpose CPU outside of cloud? 
My opinion: <b>AMD is competitive</b>, leaving too little market to justify the investment of moving to a new platform. Maybe ARM was just used to negotiate with Intel.<br /></p><p>Once there was an x86-64 capable part at a good value, with a generally simple-to-understand portfolio, the market appetite for a non-x86 alternative dried up overnight. Going back to my three ring binder of OS+CPU options, the market doesn't like that complexity if it doesn't provide significant value. The industry has matured, the market has spoken. </p><p>So it's simple economics: easy to support and good value. We now have competition in the x86 market, which is good for us, but ARM for the data center is a casualty of that. The market just dried up.<br /></p><p>We have used ARM CPUs and they were very good for our use case, but I think the decision to use the IP for custom projects and not general purpose is the right decision for all the providers. If AMD loses its competitive edge again, I would expect the market to respond, and we could see a repeat of the last several years.</p><p><b>What about cloud ARM?</b></p><p>While ARM offerings from AWS and other cloud providers will keep things alive, I don't expect them to make many inroads for general-purpose use. If data center ARM remains cloud-provider-specific IP, it will struggle in the near future to get applications, and providers will want to keep their support envelope small. Looking at the instance types, it's not clear that the cloud providers even position these systems for performance use rather than as scale-out application servers. For anything other than very large single use cases these will remain niche offerings.<br /></p><p><b> What about the A64fx CPU?</b></p><p>I'm not holding my breath. We are seeing adoption of A64fx outside Japan, but like Power8/9 I'll wait and see if it's anything outside the largest systems, or if interest wanes as there is better-supported competition in the x86 space again. 
The one exception to this is in Europe (<i>speculative</i>), where they appear to be taking a <a href="https://www.european-processor-initiative.eu/">xenophobic approach</a> to CPUs and want to develop European CPU IP. ARM creates issues for them in continuing down that path. </p><p>The major performance benefits of A64fx could easily be matched by both Intel and AMD by offering an HBM front-ended CPU. So, short of national support, it's easy pickings for a niche market. </p><p><b>Summary</b></p><p>The market likes competition but also likes as much simplicity and compatibility as possible. Once a reputable, competitive x86 offering was on the market (AMD), interest in more complex options declined significantly. <b> </b><br /></p>Brock Palenhttp://www.blogger.com/profile/03992571343475028656noreply@blogger.com0tag:blogger.com,1999:blog-3991757369876144735.post-77872906988398061902020-10-18T10:51:00.005-04:002020-10-19T22:15:22.966-04:00Parallel Compressor Performance for Science - pigz, lbzip2, xz<p><b>UPDATE</b>: At the request of a friend I looked into zstd, and wow, it's a great option. As it becomes more ubiquitous it should likely replace most compressors. Compression is similar to xz, with speed approaching lz4, for a modest CPU increase.</p><p><b>Original Post</b><br /></p><p>As data volumes grow and single-core performance grows more slowly than core count, compressing large volumes of data quickly requires compressors that can use multiple cores to keep up with both the data volumes and the hardware investment.</p><p>Luckily, several compressors are available that remain compatible with existing formats, but how do they perform compared to classic <span style="font-family: courier;">gzip</span>? Also, how well do they work on scientific data? 
Scientific data often has a few very large files that are usually binary, plus thousands of small files that are compressible.</p><h3 style="text-align: left;">The Host</h3><div style="text-align: left;"><p style="text-align: left;">All tests were done on the <a href="https://arc-ts.umich.edu/greatlakes/">Great Lakes</a> login node. The properties of this node are:</p><ul style="text-align: left;"><li>36 cores, 36 threads, Intel Xeon 6154 </li><li>192 GB Memory</li><li>1.9PB GPFS File System </li><li>100Gbps HDR Network</li></ul><h3 style="text-align: left;">The Data</h3><p style="text-align: left;">The data set has the following properties:</p><ul style="text-align: left;"><li>6649 files</li><li>276 directories</li><li>221 GB total size</li></ul><p><span style="font-family: courier;"> Range Number<br />[ 0.000 B - 0.000 B ) 1<br />[ 0.000 B - 1.000 KB ) 560<br />[ 1.000 KB - 1.000 MB ) 4935<br />[ 1.000 MB - 10.000 MB ) 1175<br />[ 10.000 MB - 100.000 MB ) 116<br />[ 100.000 MB - 1.000 GB ) 94<br />[ 1.000 GB - 10.000 GB ) 43<br />[ 10.000 GB - 100.000 GB ) 1<br />[ 100.000 GB - 1.000 TB ) 0<br />[ 1.000 TB - MAX ) 0 </span></p><p><br /></p><h3 style="text-align: left;">Results</h3><p style="text-align: left;">This compares runtime and final archive size against serial <span style="font-family: courier;">gzip</span>. 
This was accomplished with </p><p style="text-align: left;"><span style="font-family: courier;">tar -I pigz -cf myarchive.tar.gz</span><br /></p><div align="left" dir="ltr" style="margin-left: 0pt;"><table style="border-collapse: collapse; border: medium none;"><colgroup><col width="120"></col><col width="120"></col><col width="114"></col><col width="96"></col><col width="99"></col><col width="129"></col></colgroup><tbody><tr style="height: 30pt;"><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre;">Command</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre;">Compatible</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" 
style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre;">Parallel</span></p><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre;">Compress</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre;">Parallel</span></p><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre;">Decom.</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; 
font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre;">Speed vs. Gzip</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre;">Gzip Size</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre;"><br /></span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 700; text-decoration: none; vertical-align: baseline; white-space: pre;">153 G</span></p></td></tr><tr style="height: 31pt;"><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">gzip</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: 
rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">gzip</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">No</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">No</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; 
overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">1x</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">153G</span></p></td></tr><tr style="height: 31pt;"><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><a href="https://zlib.net/pigz/" target="_blank"><span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">pigz</span></a></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; 
vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">gzip</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">Yes</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">No</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; 
font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">32x</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">153G</span></p></td></tr><tr style="height: 31pt;"><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><a href="https://lbzip2.org/" target="_blank"><span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">lbzip2</span></a></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; 
font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">bzip2</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">Yes</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">Yes</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">23x</span></p></td><td style="border-bottom: 
solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">151G</span></p></td></tr><tr style="height: 31pt;"><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">mpibzip2</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">bzip2</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid 
#9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">Yes</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">Yes</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">*</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: 
top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">151G</span></p></td></tr><tr style="height: 31pt;"><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;"><a href="https://tukaani.org/xz/" target="_blank">xz</a> -T0</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">xz/lzma</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; 
margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">Yes</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">No</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">5.5x</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; 
font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">137G</span></p></td></tr><tr style="height: 31pt;"><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><a href="https://github.com/vasi/pixz" target="_blank"><span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">pixz</span></a></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">xz/lzma</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; 
text-decoration: none; vertical-align: baseline; white-space: pre;">Yes</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">Yes</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">5.5x</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">137G </span></p></td></tr><tr style="height: 31pt;"><td style="border-bottom: solid 
#9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><a href="https://github.com/facebook/zstd" target="_blank"><span style="font-family: courier;">zstd -T0</span></a><br /></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">zst</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">Yes</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span 
style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">Yes</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">67x</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">155G </span></p></td></tr><tr style="height: 31pt;"><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;">lz4</td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; 
border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">lz4</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">No</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">No</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; 
margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">42.2x</span></p></td><td style="border-bottom: solid #9e9e9e 1pt; border-color: rgb(158, 158, 158); border-left: solid #9e9e9e 1pt; border-right: solid #9e9e9e 1pt; border-style: solid; border-top: solid #9e9e9e 1pt; border-width: 1pt; overflow-wrap: break-word; overflow: hidden; padding: 7pt; vertical-align: top;"><p dir="ltr" style="line-height: 1.2; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 14pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">171G </span></p></td></tr></tbody></table></div></div>
<h3 style="text-align: left;">Notes</h3><ul id="docs-internal-guid-ad11cdd4-7fff-fc13-3b95-9adad1c7d07c" style="margin-bottom: 0px; margin-top: 0px; text-align: left;"><li style="background-color: transparent; color: black; font-family: Arial; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline; white-space: pre;"><p role="presentation" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">pigz</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;"> can only compress in parallel with very minimal speedup on decompression</span></p></li><li style="background-color: transparent; color: black; font-family: Arial; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline; white-space: pre;"><p role="presentation" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">xz</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;"> requires </span><span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 11pt; font-style: normal; 
font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">-T0</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;"> option to use all cores in the system or will default to 1</span></p></li><li style="background-color: transparent; color: black; font-family: Arial; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline; white-space: pre;"><p role="presentation" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">xz</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;"> cannot decompress files in parallel but </span><span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">pixz</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;"> can</span></p></li><li style="background-color: transparent; color: black; font-family: Arial; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; list-style-type: disc; text-decoration: none; vertical-align: baseline; white-space: pre;"><p 
role="presentation" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">lbzip2</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;"> and </span><span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">mpibzip2</span><span style="background-color: transparent; color: black; font-family: Arial; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;"> can only decompress in parallel if the archive was compressed with a parallel aware compressor</span><span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;"> </span><span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;"> </span></p><p role="presentation" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: 'Courier New'; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;">lz4</span><span style="background-color: transparent; color: 
black; font-family: Arial; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;"> is not parallel aware but is by far the fastest compressor of all, but with the least space savings</span></p><p role="presentation" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 0pt;"><span style="background-color: transparent; color: black; font-family: Arial; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;"><span style="font-family: courier;">zstd</span> requires <span style="font-family: courier;">-T0</span> option to use all cores or will default to 1</span></p></li></ul><h3 style="text-align: left;"> Conclusion</h3><p style="text-align: left;">Overall using the drop in replacements for <span style="font-family: courier;">gzip</span> and <span style="font-family: courier;">bzip2</span> are obvious improvements on modern multi-core systems. While <span style="font-family: courier;">xz</span> and <span style="font-family: courier;">lz4</span> are available on almost all modern systems they are still less portable than <span style="font-family: courier;">gzip</span> and <span style="font-family: courier;">bz2</span> based compressors. </p><p style="text-align: left;">lz4 is very interesting as it's so fast it uses almost no CPU. If one was collecting data on a lower powered device using <span style="font-family: courier;">lz4</span> appears to be 'compression for free'. While not as effective as the other compressors there is almost no performance impact during tar/untar when using lz4. </p><p style="text-align: left;">One would hope over time the stock installs of gzip and bzip2 are replaced by the parallel versions. 
Xz is very stable but struggles to utilize the very high core counts of modern systems; it still returns the best compression ratio.<br /></p>Brock Palenhttp://www.blogger.com/profile/03992571343475028656noreply@blogger.com0tag:blogger.com,1999:blog-3991757369876144735.post-84824510302277837372020-09-17T22:52:00.006-04:002020-09-17T22:53:31.488-04:00Archivetar - A better tar for Big Data<h2 style="text-align: left;"> Challenge: Trade-offs of Cost/Bit vs Bits/File and Performance<br /></h2><p style="text-align: left;">Among tape, HDD, SSD, and NVMe, better small-file performance comes at a higher cost per unit of capacity. In HPC we would love to deploy petabytes of NVMe, but most budgets cannot support it.</p><p style="text-align: left;">Tape and AWS Glacier have low cost and great bandwidth, but long seek times before the first file appears. These technologies are therefore often targeted at archive use cases. It is left to the user, though, to organize their data in a way that does not make recalling it painfully slow.</p><h2 style="text-align: left;">80/20 Rule of Project Folders</h2><p style="text-align: left;">In a perfect world archived project folders would include data, source code, scripts to re-create the data, etc. This leads to a common 80/20 split, where 80% of the files have 20% of the data. 
The total data volume drives the budget for storing the data, but the file count, dominated by the small files that hold only 20% of the data, drives management complexity.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2YakV1zMoRbmrYWoTYGf1Rgu5REiq3JxjoDJsd9LT3OjWFRx_W0t6C2HfHkO1y5cNam2SvWPc5NmkokIpn8sFaecuRSv3IW4v2tVYIynDNbY-x6Y5q3MUxYFidYS_Ydy0QkCgNzU3scE/s2048/filesize-8-20.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1055" data-original-width="2048" height="206" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj2YakV1zMoRbmrYWoTYGf1Rgu5REiq3JxjoDJsd9LT3OjWFRx_W0t6C2HfHkO1y5cNam2SvWPc5NmkokIpn8sFaecuRSv3IW4v2tVYIynDNbY-x6Y5q3MUxYFidYS_Ydy0QkCgNzU3scE/w400-h206/filesize-8-20.png" width="400" /></a></div><br /><h2 style="text-align: left;">Current Practices, One Huge Tar</h2><p style="text-align: left;">Currently most researchers, not having better options, will tar an entire project and upload it to an archive. As projects get larger this introduces issues:</p><ul style="text-align: left;"><li>Tars are larger than the max object size</li><li>Compression is limited to a single core</li><li>To access subsets of data the entire archive must be retrieved and expanded. This requires 2x the storage space (Tar + Expanded Tar)</li><li>Opportunities for file-level parallelism are lost when transferring data</li><li>Large files, often binary, don't compress well and dominate compressor time for little benefit</li><li>Low utilization of CPU, Storage IO, and Networking<br /></li></ul><h2 style="text-align: left;">Desired Outcome, Sort and Split</h2><p style="text-align: left;">Ideally, files over a given size would be excluded from the tars. These will often be data files that are big enough to realize full archive performance. Files under this threshold could be sorted into lists, and assigned to tars of a target size. 
The end result is a folder of only large files and multiple tars of small files. Subsets of data can be recalled without needing to expand all archives.<br /></p><h2 style="text-align: left;">Archivetar - A better tar for Big Data</h2><p style="text-align: left;"><a href="https://github.com/brockpalen/archivetar/">Archivetar</a> aims to address exactly that workflow. <br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi1815Dhxvg2SmnZP0YV8h2X4X5d6ZLgUsHVm9eLOFawn5HX1lCi9RmVhqktc1KjRUIukcItIJsFv3Bvj4TKgouAVg6_YzK7EbqRp85EClQ5J7JKbaHSKTY_c7mLb_01uSmH18i-Uj_Btc/s2048/workflow.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1055" data-original-width="2048" height="206" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi1815Dhxvg2SmnZP0YV8h2X4X5d6ZLgUsHVm9eLOFawn5HX1lCi9RmVhqktc1KjRUIukcItIJsFv3Bvj4TKgouAVg6_YzK7EbqRp85EClQ5J7JKbaHSKTY_c7mLb_01uSmH18i-Uj_Btc/w400-h206/workflow.png" width="400" /></a></div><p style="text-align: left;"><br /></p><p style="text-align: left;">Archivetar benefits include:</p><ul style="text-align: left;"><li>Utilizes mpiFileUtils to quickly walk filesystems</li><li>Creates multiple tars simultaneously for higher performance on network filesystems</li><li>Auto-detects many parallel compressors for multi-core systems</li><li>Saves an index of files in each tar to find subsets of data without needing to recall and expand all archives</li><li>Archives are still standalone tars and can be expanded without installing archivetar <br /></li></ul><h3 style="text-align: left;">Example Archivetar</h3><p style="text-align: left;"><span style="font-family: courier;">#example data file count<br />[brockp@gl-login1 box-copy]$ find . 
| wc -l<br />6925</span></p><p style="text-align: left;"><span style="font-family: courier;"># create tars of all files smaller than 10M<br /># tars should be 200M before compression<br /># save purge list<br /># compress with pigz if installed<br />archivetar --prefix my-archive --size 10M --tar-size 200M --save-purge-list --gzip<br /><br /># delete small files and empty directories<br />archivepurge --purge-list my-archive-2020-09-17-22-35-20.under.cache<br /><br /># File count after<br />[brockp@gl-login1 box-copy]$ find . | wc -l<br />379<br /><br /># recreate<br />unarchivetar --prefix my-archive<br /><br />[brockp@gl-login1 box-copy]$ find . | wc -l<br />6925</span><br /></p>Brock Palenhttp://www.blogger.com/profile/03992571343475028656noreply@blogger.com0tag:blogger.com,1999:blog-3991757369876144735.post-82124724315918553702018-01-22T20:09:00.002-05:002018-01-22T20:09:43.554-05:00Automating Jetstream with Terraform<a href="http://jetstream-cloud.org/">Jetstream </a>is an OpenStack cluster for science that researchers can request access to via <a href="http://xsede.org/">XSEDE</a>, which has traditionally been known only as an HPC provider but has long provided other services. Jetstream provides many of the infrastructure as a service (IaaS) offerings for which many have turned to public cloud providers (Amazon, Google, and Azure), but many don't know that Jetstream exists.<br />
<br />
Another challenge is automation of Jetstream. AWS provides a service called CloudFormation that allows automating deployments, scaling, etc. without having to spend a lot of time in UIs, and it helps with predictability between deployments.<br />
<br />
Jetstream is, at its most fundamental level, just an implementation of OpenStack, and thus any tool that understands the OpenStack API can work with Jetstream. So I made a small example of how to bring up a CentOS 7 system on Jetstream, and create all the supporting networks and security groups, with <a href="http://terraform.io/">Terraform</a>, an open source tool for automated infrastructure.<br />
<br />
<a href="https://github.com/brockpalen/tf-jetstream">You can find this example and documentation on my Github site.</a><br />
<br />
Users should find it simple to extend the example to build complex, customized, scalable multi-network environments, just as they can on public cloud providers but without the extreme cost. Brock Palenhttp://www.blogger.com/profile/03992571343475028656noreply@blogger.com0tag:blogger.com,1999:blog-3991757369876144735.post-36399129890771504452017-05-31T15:48:00.000-04:002017-05-31T15:48:03.253-04:00Job PostingJoin our group! <br />
<br />
<a href="http://careers.umich.edu/job_detail/142372/research_cloud_administrator_intermediate">http://careers.umich.edu/job_detail/142372/research_cloud_administrator_intermediate</a><br />
<br />
We are looking for someone to work with containers (Docker) and an orchestration engine (Mesos, Kubernetes, Rancher, etc.) to increase the flexibility of deploying big data tools in a dynamic research environment. Brock Palenhttp://www.blogger.com/profile/03992571343475028656noreply@blogger.com0tag:blogger.com,1999:blog-3991757369876144735.post-37338985944171254092017-05-31T15:39:00.000-04:002017-05-31T15:56:40.562-04:00Opportunities for public cloud in researchIt comes down to $/performance and government regulations / sponsor requirements. I covered these in other posts. So what is my read of the tea leaves for cloud use in research computing?<br />
<br />
<ul>
<li><b>Campus level / modest projects with stock CUI / NIST 800-171 or similar regulations.</b> <br />These require purpose-built systems and heavy documentation. FedRAMP made this simpler; avoid doing all of that work yourself and save the time.</li>
<li><b>Logging/Administrative/Web/CI Systems/Disaster Recovery. </b><br />These systems are generally small and part of the administrative stack of just having a center. They benefit from the flexibility of the cloud the same way enterprise systems do. I personally love PaaS and Docker here: yes, I would like another Elasticsearch cluster please; no, I do not want to worry about building it.</li>
<li><b>High Availability Systems</b><br />IoT / sensor network ingest points, or any system where you need higher availability than normal research requires. Similar to the sensitive systems, if you have a 1 MW HPC data center you don't put the entire thing on generators and build a second data center for 20 kW of distributed message buses for sensor networks. If you are not investing a lot of capital in the computer systems themselves, don't invest it in the facilities either; piggyback on the cloud's built-in multi-site offerings. </li>
<li><b>High Throughput Computing / Grid Computing</b><br />New lower cost price models via AWS Spot, Google Preemptible, and Azure Low Priority make the cost of actual cycles very close to what you can buy bulk systems for. Every HPC center I know of is always running out of power and cooling. Move the workloads that tolerate interruption, have short run times, and don't require high IO or tightly coupled parallelism to the cloud, and keep your scarce power for unique HPC capability.</li>
<li> <b>Archive / HSM Replicas or the entire thing</b><br />Depending on how you use your tape today, the cloud sites make great replicas at similar costs. Some niche providers like Oracle have costs that are hard to beat, with one catch: <i>as long as you never access your data. </i>Cost predictability for faculty is a problem, and with cold storage costing as much as $10,000/retrieved PB in the cloud, if your HSM system is busy use the cloud only for a second copy for DR. That is, upload data (generally free), delete it (sometimes free), and never bring it back except on media error. This should help you limit your capital spend on a second system as well as on a second site to put it in.<br />If you are doing real archive, that is set it and forget it, ejected tape will forever be cheaper. But do you have a place to put it, and people to do the shuffle? There may be a lot of value in using the cloud for all of it.</li>
</ul>
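<br />
To make the archive cost-predictability point concrete, here is a minimal Python sketch of the retrieval arithmetic. The $10,000 per retrieved PB rate is the upper figure quoted above; the function names, the decimal unit convention (1 PB = 1,000 TB), and the example volumes are illustrative assumptions, not any provider's actual price list.<br />

```python
# Upper-end cold storage retrieval rate quoted in the post (USD per PB retrieved).
RETRIEVAL_USD_PER_PB = 10_000
# Assumption: decimal units, 1 PB = 1,000 TB.
TB_PER_PB = 1_000


def retrieval_cost_usd(retrieved_tb, usd_per_pb=RETRIEVAL_USD_PER_PB):
    """One-time fee for recalling `retrieved_tb` terabytes from cold storage."""
    return retrieved_tb * usd_per_pb / TB_PER_PB


def annual_recall_cost_usd(archive_pb, recall_percent, usd_per_pb=RETRIEVAL_USD_PER_PB):
    """Yearly fee if `recall_percent` of an archive of `archive_pb` petabytes is recalled each year."""
    return archive_pb * usd_per_pb * recall_percent / 100


if __name__ == "__main__":
    # Hypothetical volumes, chosen only to illustrate the scaling.
    print("Single 50 TB recall (USD):", retrieval_cost_usd(50))
    print("2 PB archive, 10% recalled per year (USD):", annual_recall_cost_usd(2, 10))
```

The fee scales linearly with the volume recalled, so the unpredictability comes from not knowing in advance how much data faculty will pull back in a given year. A DR-only second copy keeps that volume near zero.<br />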
This is my first (quick) set of thoughts. Other systems, such as analytics systems, should also be done in the cloud: the cloud offerings are already more mature than those at most research sites, and they make hosting things like notebooks, and splitting data across storage buckets for policy, much easier.<br />
<br />
I'm sure many of you will disagree with me, feel free to tweet me at @<a href="https://twitter.com/brockpalen">brockpalen</a>. Brock Palenhttp://www.blogger.com/profile/03992571343475028656noreply@blogger.com0tag:blogger.com,1999:blog-3991757369876144735.post-5290099804675260612017-05-31T15:14:00.000-04:002017-05-31T15:14:01.023-04:00Data Providers Need to Catch up to Cloud
<div>
In my recent project looking at whether we could migrate HPC to the cloud in this generation, another topic kept arising.
</div>
<div>
<br /></div>
<div>
We cannot yet take enough of our data or software off our owned systems and facilities.
</div>
<div>
<br /></div>
<div>
Beyond HIPAA and BAAs, there is a raft of other regulations under which data are provided to our researchers. Last I checked there were thousands of faculty with hundreds of data sources in a campus environment.
</div>
<div>
<br /></div>
<div>
Right now, because most campus projects are small, it is not worth the time, or the risk of upsetting the data provider, to get an agreement in place with a cloud provider to host said data. Many of these plans require revealing information about your physical security and practices that you generally cannot get from a cloud provider, or they refer to standards that predate the cloud (anyone who looked at FISMA training pre-FedRAMP, or at any agreement requiring physical isolation, will recognize this limitation). </div>
<div>
<br /></div>
<div>
Some data types (FISMA / NIST 800-171) come to mind that are actually easier to do in the major public clouds because you don’t need sign off from each of the data providers but just the agency who has already done the work with that public cloud provider. (NOTE: I am still early in looking into this, this is my current understanding, but I could be wrong). Thus after doing the last mile work (securing your host images, your staff policies, patch policies etc) you can actually respond to these needs faster in the cloud and get an ATO.
</div>
<div>
<br /></div>
<div>
So where does this leave the data providers that each have their own rules and require each project to have sign-off from the provider, making the fixed cost of each project high? As a community we should be educating them and moving them towards aligning with one of the federal standards. Very few of the projects I have seen are actually stricter than NIST 800-171. If these data providers would accept these standards, and an ATO (Authority to Operate) from the federal agencies, they would probably get better security and fewer under-the-desk 'air-gapped' servers, while increasing the impact of, and ease of access to, the data for the work they are trying to support.
</div>
<div>
<br /></div>
<div>
This would make funding go further and get technical staff and researchers back to what they do best, with less time spent looking at data use agreements.
</div>
Brock Palenhttp://www.blogger.com/profile/03992571343475028656noreply@blogger.com0tag:blogger.com,1999:blog-3991757369876144735.post-34331758392092205602017-05-31T15:08:00.000-04:002017-05-31T15:08:08.989-04:00Comparing the Cost of Public Cloud to On Prem for HPC
<div>
I was recently working on long-term planning for a modestly large HPC resource (20,000+ cores). The question posed was: why are we not doing this in the cloud?
</div>
<div>
<br /></div>
<div>
Personally I love the cloud for a lot of use cases. I would love to not worry about hardware and to have the ability to burst to any scale, but after I did the work with one major cloud provider the economics were just not there. Do I think they will get there? Probably, but not for at least 5-10 years for our shop without some discounting off list. Below I lay out the factors in my calculation, which might come out differently for another shop:
</div>
<div>
<br /></div>
<div>
<br /></div>
<ol>
<li><b>Data Center Reliability</b><br />Cloud data centers aim to provide enterprise availability, probably Tier 3 or better. In academic HPC, which draws the most MW from the data center infrastructure, we don't value this much, but that level of availability is expensive to provide.</li>
<li><b>Offerings designed for web content, enterprise, and analytics</b><br />Cloud offerings are almost all based on enterprise needs or web app delivery. HPC does not map to these workflows. Adding a few extra ms of network overhead is a small cost for human interaction on a website, but is awful for HPC MPI workloads. Yes, there are HPC specific providers out there, but most are not big enough to handle the scale we are looking at, and you sacrifice most of the flexibility of cloud.<br />NOTE: As analytics becomes more important for academics we should pay very close attention here. This might be the first path to large scale utilization of public cloud, as enterprise is currently ahead in this area. </li>
<li><b>Scaling</b><br />Cloud has way more scale in total cores than any HPC system out there, but as the adage goes, "<span style="font-style: italic;">there is no cloud, only other people's computers</span>". As many in the community have pointed out, if you have a decent sized consistent need, even reserved and pre-paid instances rack up costs quickly compared to building your own if you're north of 500 kW of constant HPC need. Someone is paying for all that unused capacity held ready to scale. In this case the massive scaling works against you in your marginal cost for additional long term need. </li>
<li><b>Staffing</b><br />Refer to #2. Because cloud never wants to let anyone down, providers staff at very high levels to keep all services running all the time to meet enterprise needs. This is great if you have a database that is key to your organization, and it is very cheap compared to staffing in house for that one database. For HPC, though, we as academics again don't value that availability enough to justify its cost.</li>
</ol>
<div>
In general, because there is no HPC specific cloud provider with scale that aims to provide "good enough" availability, public cloud economics won't work right now if you have significant need. HPC is capital heavy and staffing light compared to enterprise. Public cloud uses expensive capital for high availability (even if it is the lowest cost way to get it) that isn't valued by this community.
</div>
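The trade-off described above can be sketched as a small break-even calculation. This is only an illustration of the reasoning, not the model I actually used: the function names are made up for this example and every dollar figure in the usage section is a hypothetical placeholder, not a real quote or budget.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def on_prem_cost_per_core_hour(capex, years, annual_opex, cores, utilization):
    """Amortized cost per *delivered* core-hour for an owned cluster.

    capex and annual_opex are in the same currency; utilization is the
    fraction (0-1] of core-hours that actually run jobs.
    """
    delivered_hours = cores * HOURS_PER_YEAR * years * utilization
    return (capex + annual_opex * years) / delivered_hours


def break_even_utilization(capex, years, annual_opex, cores, cloud_price):
    """Utilization above which owning is cheaper than paying cloud_price
    per core-hour for the same delivered work."""
    total_cost = capex + annual_opex * years
    return total_cost / (cores * HOURS_PER_YEAR * years * cloud_price)


if __name__ == "__main__":
    # HYPOTHETICAL placeholder inputs, chosen only to exercise the functions.
    capex, years, opex, cores = 6_000_000, 5, 800_000, 20_000
    cloud_price = 0.03  # hypothetical reserved price per core-hour

    print(on_prem_cost_per_core_hour(capex, years, opex, cores, 0.80))
    print(break_even_utilization(capex, years, opex, cores, cloud_price))
```

A shop with bursty demand sits below the break-even utilization and cloud wins; a shop with constant demand sits above it, which is the situation described in the Scaling item.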
<div>
<br /></div>
<div>
Now if I was an enterprise IT person in the hardware / data center line of work, I would be worried and would retool my skills for deploying and monitoring HA architecture across public clouds. The small and medium enterprise will be all cloud; it's just lower cost with greater flexibility. Once the sunk cost of your data center goes away, small scale operators cannot compete with the investment being made in cloud. Your services should start shifting to running on cloud.
</div>
Brock Palenhttp://www.blogger.com/profile/03992571343475028656noreply@blogger.com0tag:blogger.com,1999:blog-3991757369876144735.post-1447620697878825792016-03-19T10:13:00.001-04:002016-03-19T10:13:47.294-04:00RCE 104: D-Wave Quantum Computing<iframe allowfullscreen="" frameborder="0" height="344" src="https://www.youtube.com/embed/faNO_iQi3iQ" width="459"></iframe>Brock Palenhttp://www.blogger.com/profile/03992571343475028656noreply@blogger.com0tag:blogger.com,1999:blog-3991757369876144735.post-63079966292361657542015-11-07T15:17:00.001-05:002015-11-07T15:17:09.236-05:00SSH Directly to XSEDE Resources with GSISSHMany know about the XSEDE <a href="https://portal.xsede.org/single-sign-on-hub">Single Sign On Login Hub</a>. Many don't know that you can make your own version of this on your local systems. To create the sign on hub, XSEDE uses the <a href="http://toolkit.globus.org/toolkit/downloads/latest-stable/">Globus Toolkit</a>.<br />
<br />
The steps include:<br />
<br />
<ul>
<li>Build Globus Toolkit with GSI enabled</li>
<li>Download XSEDE Certificates</li>
</ul>
<div>
<a href="https://gist.github.com/brockpalen/a02e8cd3866a3ce23007">Gist with example</a></div>
<div>
<br /></div>
<script src="https://gist.github.com/brockpalen/a02e8cd3866a3ce23007.js"></script>Brock Palenhttp://www.blogger.com/profile/03992571343475028656noreply@blogger.com0tag:blogger.com,1999:blog-3991757369876144735.post-46326403407432638132015-11-07T15:05:00.002-05:002015-11-07T15:05:14.430-05:00Big Data and Data Job Openings<span style="font-family: Helvetica; font-size: 12px; line-height: normal;">Advanced Research Computing - Technology Services ( </span><a href="http://arc-ts.umich.edu/" style="font-family: Helvetica; font-size: 12px; line-height: normal;">http://arc-ts.umich.edu/</a><span style="font-family: Helvetica; font-size: 12px; line-height: normal;"> ) at the University of Michigan has four new job openings as part of our Data Science Initiative ( </span><a href="http://record.umich.edu/articles/u-m-launching-100-million-data-science-initiative" style="font-family: Helvetica; font-size: 12px; line-height: normal;">http://record.umich.edu/articles/u-m-launching-100-million-data-science-initiative</a><span style="font-family: Helvetica; font-size: 12px; line-height: normal;"> ) and supporting our ongoing efforts in High Performance Computing. Available from entry level to senior.</span><br style="font-family: Helvetica; font-size: 12px; line-height: normal;" /><br style="font-family: Helvetica; font-size: 12px; line-height: normal;" /><span style="font-family: Helvetica; font-size: 12px; line-height: normal;">ARC-TS builds and operates research computing platforms. These platforms will contain High Performance Computing (HPC) Linux clusters, High Throughput Computing (HT-Condor), data intensive (Hadoop, SQL, and NoSQL) systems, and containerized/virtualized systems (OpenStack, Docker). </span><br />
<span style="font-family: Helvetica; font-size: 12px; line-height: normal;">Big Data System Administrator Senior</span><br style="font-family: Helvetica; font-size: 12px; line-height: normal;" /><a href="http://umjobs.org/job_detail/117063/big_data_system_administrator_seniorintermediate" style="font-family: Helvetica; font-size: 12px; line-height: normal;">http://umjobs.org/job_detail/117063/big_data_system_administrator_seniorintermediate</a><br style="font-family: Helvetica; font-size: 12px; line-height: normal;" /><br style="font-family: Helvetica; font-size: 12px; line-height: normal;" /><span style="font-family: Helvetica; font-size: 12px; line-height: normal;">This position will act as a senior technical resource and be the primary position responsible for creating and operating and expanding our Hadoop and Spark infrastructure.</span><br style="font-family: Helvetica; font-size: 12px; line-height: normal;" /><span style="font-family: Helvetica; font-size: 12px; line-height: normal;">----------</span><br style="font-family: Helvetica; font-size: 12px; line-height: normal;" /><br style="font-family: Helvetica; font-size: 12px; line-height: normal;" /><span style="font-family: Helvetica; font-size: 12px; line-height: normal;">Research Database Administrator Senior</span><br style="font-family: Helvetica; font-size: 12px; line-height: normal;" /><a href="http://umjobs.org/job_detail/117056/research_database_administrator_seniorintermediate" style="font-family: Helvetica; font-size: 12px; line-height: normal;">http://umjobs.org/job_detail/117056/research_database_administrator_seniorintermediate</a><br style="font-family: Helvetica; font-size: 12px; line-height: normal;" /><br style="font-family: Helvetica; font-size: 12px; line-height: normal;" /><span style="font-family: Helvetica; font-size: 12px; line-height: normal;">This position will act as a senior technical resource and be responsible for creating and operating our research database infrastructure and 
will be responsible for designing, building, operating, and supporting database platforms. These platforms will contain SQL, NoSQL, and columnar data stores.</span><br style="font-family: Helvetica; font-size: 12px; line-height: normal;" /><span style="font-family: Helvetica; font-size: 12px; line-height: normal;">----------</span><br style="font-family: Helvetica; font-size: 12px; line-height: normal;" /><br style="font-family: Helvetica; font-size: 12px; line-height: normal;" /><span style="font-family: Helvetica; font-size: 12px; line-height: normal;">Research Cloud Administrator Intermediate</span><br style="font-family: Helvetica; font-size: 12px; line-height: normal;" /><a href="http://umjobs.org/job_detail/117062/research_cloud_administrator_intermediate" style="font-family: Helvetica; font-size: 12px; line-height: normal;">http://umjobs.org/job_detail/117062/research_cloud_administrator_intermediate</a><br style="font-family: Helvetica; font-size: 12px; line-height: normal;" /><br style="font-family: Helvetica; font-size: 12px; line-height: normal;" /><span style="font-family: Helvetica; font-size: 12px; line-height: normal;">This position will act as a technical resource as part of a team that will create and operate our private cloud infrastructure, and will be responsible for designing, building, operating, and supporting a research private cloud. The private cloud will host administrative systems, databases, and other services.</span>Brock Palenhttp://www.blogger.com/profile/03992571343475028656noreply@blogger.com0tag:blogger.com,1999:blog-3991757369876144735.post-45880468387873253482015-09-11T09:48:00.003-04:002015-09-11T09:48:35.365-04:00New Job OpeningsOver at <a href="http://arc-ts.umich.edu/">ARC-TS</a> we have two new job openings:<br />
<br />
Advanced Research Computing - Technology Services, the HPC, BigData and all around research computing group, is expanding and we have two new job postings. While these postings refer to recent awards, the positions are backed by firm money.<br /><br /><b>HPC Storage Administrator Senior</b><br /><a href="http://umjobs.org/job_detail/115398/hpc_storage_administrator_senior">http://umjobs.org/job_detail/115398/hpc_storage_administrator_senior</a><br /><br />The position will be primarily responsible for procurement, testing, development, implementation and user integration of a Ceph based storage system for the National Science Foundation’s CC*DNI funded OSiRIS project. Open Storage Research InfraStructure, or OSiRIS, will provide computable storage to a geographically distributed set of science users via virtualization technologies including RedHat Enterprise Virtualization (RHEV), software defined networking, and Shibboleth. <br /><br /><br /><b>HPC System Administrator Associate</b><br /><a href="http://umjobs.org/job_detail/115386/hpc_systems_administrator_associate">http://umjobs.org/job_detail/115386/hpc_systems_administrator_associate</a><br /><br />As a member of a high-performing team, the selected candidate will be responsible for user support, performing systems analysis, implementation, and troubleshooting moderate to complex technical issues and projects.Brock Palenhttp://www.blogger.com/profile/03992571343475028656noreply@blogger.com0tag:blogger.com,1999:blog-3991757369876144735.post-8071529771654786292015-07-29T14:40:00.001-04:002015-07-29T14:40:19.128-04:00Should all large allocations come with ECCS support?Quick thought, please forgive its underdeveloped nature.<br />
<br />
I'm sitting in the XSEDE15 Champions Fellow panel and I'm noticing a trend: on each project they worked on, they were able to get huge speedups to codes.<br />
<br />
Given the size of some proposals, and the dollar value that translates into, if this behavior holds true (huge speedups for <a href="https://www.xsede.org/ecss">ECSS</a> efforts) for some class of requests*, should <a href="https://www.xsede.org/ecss">ECSS</a> review be required? It might cost less and then any changes to those codes will benefit anyone else using them on other systems.<br />
<br />
*I'm thinking that the large community codes that have already been heavily optimized probably won't see this benefit. The candidates would be large requests for privately developed codes with no prior relationship.<br />
<br />
Just thoughts; at 1 million CPU hours, labor starts looking cheap.Brock Palenhttp://www.blogger.com/profile/03992571343475028656noreply@blogger.com0tag:blogger.com,1999:blog-3991757369876144735.post-56387819024700872642015-06-19T19:44:00.000-04:002015-06-19T19:44:04.210-04:00RCE-Cast hits 100Jeff and I would like to thank all our listeners who have stuck with us all the way back to 2009!<br />
<br />
We released our <a href="http://www.rce-cast.com/Podcast/rce-100-fasterdata.html">100th episode</a> today with Eli Dart about <a href="https://fasterdata.es.net/">Fasterdata</a>, be sure to check it out.<br />
<br />
If you are new to www.RCE-Cast.com it is a podcast we host for all things scientific computing and/or nerdy. Be sure to get the <a href="http://www.rce-cast.com/Table/rce/Podcast/">back catalog</a>. The best kind of support we can get from you is if you leave a rating for us in iTunes and refer us to your friends, or send us requests for the show.<br />
<br />
Here's to 100 more!<br />
<br />
Thanks,<br />
Brock PalenBrock Palenhttp://www.blogger.com/profile/03992571343475028656noreply@blogger.com0tag:blogger.com,1999:blog-3991757369876144735.post-49064371516878987262015-06-01T21:59:00.001-04:002015-06-01T21:59:20.091-04:00GridFTP Log Analysis with Logstash and Kibana / Elastic SearchAs noted in my post about <a href="http://www.failureasaservice.com/2014/10/lustre-stats-with-graphite-and-logstash.html">Lustre Stats with Graphite and Logstash</a> we are huge fans of the ELK (<a href="https://www.elastic.co/">Elastic Search</a>, <a href="https://www.elastic.co/products/logstash">Logstash</a>, <a href="https://www.elastic.co/products/kibana">Kibana</a>) stack. In that last example we didn't use the full ELK stack, but in this example we are going to use ELK for what it was meant for: log parsing and dashboarding. <br />
<br />
We run a <a href="http://en.wikipedia.org/wiki/GridFTP">GridFTP</a> server using the <a href="http://globus.org/">Globus.org</a> packages. GridFTP, for those who don't know, is a <a href="http://www.failureasaservice.com/2015/05/file-transfer-tool-performance-globus.html">better performing</a> way to transfer data around. If you want to set up GridFTP please use the globus.org Globus Connect Server. It's much easier than setting up the certificate system, and it is quickly becoming the standard auth and identity provider for national research systems.<br />
<br />
GridFTP logs each transfer with your server. What I want to know is where, who, and how much is going through the server. I have been running this setup for a while now, but it could use some refining. You can find my full logstash config as of this <a href="https://gist.github.com/brockpalen/97312335c5119210e091#file-logstash-gridftp-conf">writing at Gist</a>. <br />
<br />
First the results:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4Dn7Ix_j33XNCITOgcxfifiiVsJjncS6nqkYe_G_rTOHPsX-Syd3Ulcghym8chRrKYs_CuGH-9JPQKXm_4He-gRKI318Sjn3v9i2L-ab227Rvuq7UmPBjRmbxXO8RYStEK40YGv_8oAM/s1600/kibana-gridftp.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="491" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj4Dn7Ix_j33XNCITOgcxfifiiVsJjncS6nqkYe_G_rTOHPsX-Syd3Ulcghym8chRrKYs_CuGH-9JPQKXm_4He-gRKI318Sjn3v9i2L-ab227Rvuq7UmPBjRmbxXO8RYStEK40YGv_8oAM/s640/kibana-gridftp.jpeg" width="640" /></a></div>
<br />
Logstash has a number of filters that make this easier. We use the regular grok filter to match the transfer stats lines from the GridFTP log. You could modify this to capture the entire log in Elastic Search for archive reasons. Then the kv (key value) filter works wonders on all of the log file's key=value entries, doing most of our work for us.<br />
<br />
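To make the key=value step concrete, here is a minimal Python sketch of what a kv-style filter does to one transfer line. It is not my logstash config (that is in the Gist linked above); the sample line is made up in the style of a GridFTP transfer entry, and the field names and the helper names parse_transfer and remote_ip are assumptions for illustration.

```python
import re

# MADE-UP sample in the key=value style of a GridFTP transfer log line.
SAMPLE = (
    "DATE=20150601215920.091000 HOST=gridftp.example.edu "
    "PROG=globus-gridftp-server NL.EVNT=FTP_INFO "
    "START=20150601215910.091000 USER=alice FILE=/scratch/data.h5 "
    "NBYTES=1048576000 STREAMS=4 DEST=[192.0.2.17] TYPE=RETR CODE=226"
)

# key is everything up to the first '=', value runs to the next whitespace.
KV_PATTERN = re.compile(r"(\S+?)=(\S+)")


def parse_transfer(line):
    """Split a transfer line into a dict of its key=value fields."""
    return dict(KV_PATTERN.findall(line))


def remote_ip(fields):
    """Isolate the remote address, dropping the surrounding brackets."""
    return fields["DEST"].strip("[]")


if __name__ == "__main__":
    fields = parse_transfer(SAMPLE)
    print(fields["USER"], fields["NBYTES"], remote_ip(fields))
```

This simple pattern would break on file names containing spaces, which is one reason to lean on the tested logstash filters rather than hand-rolled parsing.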
I have to use a few grok filters to get the IP of the remote server isolated, but once that is done logstash has a built in geoip filter that tags all the transfers with geolocation information, which lets the maps be created. Oh, and in the dashboard those maps are interactive, so you can narrow down to transfers from a single country by clicking on that country, or by adding a direct filter for the country code, zip code, etc. Really handy.<br />
<br />
Individual transfers are also mapped to the campus they come from when the source is a University address. Our subnets across the three campuses are known and published, so we use the cidr filter to add a tag for each campus, which lets us look at traffic from a specific campus. Again really handy, and I would love to get contributions that show what traffic comes from Internet2 / MiLR and the commodity internet. <br />
<br />
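The campus tagging idea can be sketched in a few lines of Python with the standard ipaddress module. This mirrors what the cidr filter does but is not the logstash configuration itself; the campus names and the subnets (reserved documentation ranges) are placeholders, not our published networks.

```python
import ipaddress

# PLACEHOLDER subnets from the reserved documentation ranges; substitute
# the published campus networks in a real deployment.
CAMPUS_NETS = {
    "campus_a": [ipaddress.ip_network("192.0.2.0/24")],
    "campus_b": [ipaddress.ip_network("198.51.100.0/24")],
    "campus_c": [ipaddress.ip_network("203.0.113.0/24")],
}


def campus_tag(ip):
    """Return the campus whose subnet contains ip, or 'off_campus'."""
    addr = ipaddress.ip_address(ip)
    for campus, networks in CAMPUS_NETS.items():
        if any(addr in net for net in networks):
            return campus
    return "off_campus"


if __name__ == "__main__":
    for ip in ("192.0.2.17", "198.51.100.9", "8.8.8.8"):
        print(ip, campus_tag(ip))
```

Adding Internet2 or commodity internet ranges would just be more entries in the same table.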
One warning: the bandwidth calculation is commented out for a reason. It works, but not all GridFTP log entries are complete enough to do the calculation; this makes Ruby get angry and makes logstash hang.<br />
<br />
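One way to avoid that failure mode is to guard the calculation so an incomplete entry yields no value instead of an exception. The sketch below shows the idea in Python rather than the Ruby used in the logstash config. It assumes START and DATE hold the transfer start and end timestamps and NBYTES the byte count; treat those field meanings and the timestamp layout as assumptions to check against your own logs.

```python
from datetime import datetime

# Assumed timestamp layout: YYYYMMDDhhmmss.ffffff
STAMP = "%Y%m%d%H%M%S.%f"


def bandwidth_bytes_per_sec(fields):
    """Return bytes/second for one parsed transfer, or None when the
    entry is missing fields, malformed, or has no measurable duration."""
    try:
        start = datetime.strptime(fields["START"], STAMP)
        end = datetime.strptime(fields["DATE"], STAMP)
        nbytes = int(fields["NBYTES"])
    except (KeyError, ValueError):
        return None  # incomplete entry: skip rather than crash
    seconds = (end - start).total_seconds()
    if seconds <= 0:
        return None
    return nbytes / seconds


if __name__ == "__main__":
    complete = {
        "START": "20150601215910.000000",
        "DATE": "20150601215920.000000",
        "NBYTES": "1000",
    }
    print(bandwidth_bytes_per_sec(complete))
    print(bandwidth_bytes_per_sec({"NBYTES": "1000"}))
```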
So it was very easy to use logstash to understand the GridFTP log files, then the rest of the ELK stack let us quickly make dashboards for our file transfers. <br />
<br />
I was inspired to write this after thinking there must be an easier way to handle GridFTP logs, following a presentation at XSEDE 14 where the classic scripts-plus-copied-log-files approach was employed. The solution here is near real-time, and we have found it to be very durable. Brock Palenhttp://www.blogger.com/profile/03992571343475028656noreply@blogger.com0tag:blogger.com,1999:blog-3991757369876144735.post-18226759847687569732015-05-18T20:50:00.000-04:002015-05-18T20:50:34.406-04:00Large-scale Visualization of Volumes from 2d Images This is a cross post from the day job at <a href="http://fluxhpc.blogspot.com/2015/05/large-scale-visualization-of-volumes.html">ARC-TS</a>.<br />
<br />
The <a href="http://www.nlm.nih.gov/research/visible/visible_gallery.html">Visible Human</a> project has a series of high resolution CT or MRI scans of human bodies. These images can be stitched together to make volume renderings of the original subject. First Images!<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgU3T77Pr5z76OSaAZOKTtdhXGcN1SHq88iBmtP2YXBdgtc-N95wkc-1xlj77NYcIdKv4UktXAkMmFTZ61Itluf3imJpdJz8hdLfpdBsUHApRD0RNMA3qZp4tC9cLLr5Bqy4Leosx_y0UFR/s1600/visit0000.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="188" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgU3T77Pr5z76OSaAZOKTtdhXGcN1SHq88iBmtP2YXBdgtc-N95wkc-1xlj77NYcIdKv4UktXAkMmFTZ61Itluf3imJpdJz8hdLfpdBsUHApRD0RNMA3qZp4tC9cLLr5Bqy4Leosx_y0UFR/s200/visit0000.png" width="200" /></a></div>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkR4c8ujYyDbItkby-tJCDXdK9SDAYAtjhUCq7a4kCpts9rFppMVQ_3fyTpKiuhIbJYbhT6MXw3KgRryYoQXz5KpDfeQIMTP9Rpi3R2ZLPhY-P0WsbLhKQHDMp1n7m8wX_56MoOiOoam4S/s1600/visit0002.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em; text-align: center;"><img border="0" height="188" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgkR4c8ujYyDbItkby-tJCDXdK9SDAYAtjhUCq7a4kCpts9rFppMVQ_3fyTpKiuhIbJYbhT6MXw3KgRryYoQXz5KpDfeQIMTP9Rpi3R2ZLPhY-P0WsbLhKQHDMp1n7m8wX_56MoOiOoam4S/s200/visit0002.png" width="200" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi12r0DL78rk8jKC12eD8O1zha7JxohtwgCpl7Ri5dU8rBqwa-imttuypW5OMAYlEjUP8HlmmtdSnimlTSxujTP8vKxs_r4ixk5Hltp8cpPmmQv34ySAFnVP1ek-4f5epKx_oiYDJaEAOUL/s1600/visit0001.png" imageanchor="1" style="clear: left; display: inline !important; margin-bottom: 1em; margin-right: 1em; text-align: center;"><img border="0" height="188" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi12r0DL78rk8jKC12eD8O1zha7JxohtwgCpl7Ri5dU8rBqwa-imttuypW5OMAYlEjUP8HlmmtdSnimlTSxujTP8vKxs_r4ixk5Hltp8cpPmmQv34ySAFnVP1ek-4f5epKx_oiYDJaEAOUL/s200/visit0001.png" width="200" /></a><br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrBgXHiqzAIC3XZdeUvEogu_UE5WQP4D_9F7jfnbMqxTRoEhhqt98Rd5EKylMIiGc7AnaqPHIZ5R11q-AMMj1YUWCjBRrgo8nqDhvyV2s2ejzalA0bS0ULoImH8wOpnevzUqM6TpaWpd0g/s1600/visit0003.png" imageanchor="1" style="clear: left; display: inline !important; margin-bottom: 1em; margin-right: 1em; text-align: center;"><img border="0" height="188" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrBgXHiqzAIC3XZdeUvEogu_UE5WQP4D_9F7jfnbMqxTRoEhhqt98Rd5EKylMIiGc7AnaqPHIZ5R11q-AMMj1YUWCjBRrgo8nqDhvyV2s2ejzalA0bS0ULoImH8wOpnevzUqM6TpaWpd0g/s200/visit0003.png" width="200" /></a><br />
<br />
<br />
<br />
These images were generated from high resolution <a href="http://sibko.med.umich.edu/av/vl24.html">CT scans available here at Michigan</a>. The data in this case is over 5000 2d slices in TIFF format for total data of around 34GB.<br />
<br />
On standard systems, working with input data of this size is difficult, let alone the derived 3d volume. Lucky for us, we can use the <a href="http://visitusers.org/index.php?title=Converting_Multiple_2D_Files_Into_One_3D_File">VisIt <span style="font-family: Courier New, Courier, monospace;">imgvol</span></a> format, which exists specifically for this case.<br />
<br />
In the above example, 32 cores with 25GB of memory each (800GB total) on the Flux Large Memory nodes were used, with my personal Apple laptop running the VisIt viewer over a home network connection (!!). Memory use in the creation of the above plots ranged from 3GB/core to 7.5GB/core. Rendering wasn't interactive; a plot change took 15-45 seconds to redraw.<br />
<br />
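For anyone sizing a similar job, the memory figures above work out as follows. This is plain arithmetic on the numbers quoted in this post; the variable names are just for illustration.

```python
cores = 32
allocated_gb_per_core = 25
used_gb_per_core = (3, 7.5)  # observed low and high from the post

# Total memory requested for the job.
allocated_total_gb = cores * allocated_gb_per_core

# Aggregate memory actually touched while creating the plots.
used_total_gb = tuple(cores * per_core for per_core in used_gb_per_core)

print(allocated_total_gb)  # 800
print(used_total_gb)       # (96, 240.0)
```

So the plots needed somewhere between roughly 96GB and 240GB in aggregate, well under the 800GB requested, but far beyond a typical laptop.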
The <span style="font-family: Courier New, Courier, monospace;">imgvol</span> format is very simple and allowed for us to create these sorts of plots very quickly. Most users don't have such huge data and can run this on their personal lab workstations. If your workstation isn't sufficient feel free to reach out to ARC-TS at hpc-support@umich.eduBrock Palenhttp://www.blogger.com/profile/03992571343475028656noreply@blogger.com0tag:blogger.com,1999:blog-3991757369876144735.post-47527102865347165822015-05-11T23:10:00.000-04:002015-05-11T23:10:28.118-04:00Keep RMA Processes SimpleLook at the picture below. Notice the quantity of paperwork included. What is going on here?<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEier3F5FOWNhneeDVyV8wZVNpfg71ey8-UC20lelu5Fgmc0Wp5Faw6BAw_GLE2P69Hdr8Jbp8VfcuBY4vV2oGjPVF5Pm0OMlZ3t8Zu_seA46kKz8Ql-g1vzoJVm4wN3p_PD2PnPX_B-ChI/s1600/Photo+Apr+28,+11+09+53+AM.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEier3F5FOWNhneeDVyV8wZVNpfg71ey8-UC20lelu5Fgmc0Wp5Faw6BAw_GLE2P69Hdr8Jbp8VfcuBY4vV2oGjPVF5Pm0OMlZ3t8Zu_seA46kKz8Ql-g1vzoJVm4wN3p_PD2PnPX_B-ChI/s400/Photo+Apr+28,+11+09+53+AM.jpg" width="400" /></a></div>
What does an RMA actually need? It needs the following:<br />
<ul>
<li>Which RMA is it</li>
<li>Where to return it</li>
</ul>
That is it, nothing else. If you are creating any RMA process, or any process at all, don't fall into the following traps.<br />
<ul>
<li>Requesting information that was provided when the RMA was opened. The RMA number was assigned when the case was created, so include it with the shipped package. Don't expect the customer to write it down again; this is error prone.</li>
<li>Requesting why the part failed. This was part of the RMA process; the case agent has already determined the cause, and the proof is on file.</li>
<li>Giving instructions in an order that doesn't make sense. The above case requests that the bad drive's serial number be recorded after the instruction to seal the drive in an anti-static bag. So open the bag back up, record the serial, and close it up a second time.</li>
</ul>
<br />
Really, does the vendor read all this paperwork? Why include DHL instructions and samples with a FedEx return tag? Why have a carbon copy worksheet with no instruction that you should keep a copy for your records? Why all the paper and info on that sheet when it's already in the case in the support system? <br />
<br />
Now who has an RMA system that works great? DDN has the best I have seen. When a part fails they ship the part with a single sticker on the outside of the box with the RMA number, and a single FedEx return tag in the box. Swap parts and return. The RMA number on the outside identifies the case, why the part failed, and what part is expected to be returned.<br />
<br />
Covers all that is needed, keeps paper simple, a very enjoyable RMA experience. Brock Palenhttp://www.blogger.com/profile/03992571343475028656noreply@blogger.com0tag:blogger.com,1999:blog-3991757369876144735.post-6975688582872757602015-05-07T22:53:00.000-04:002015-05-07T22:53:00.335-04:00Hive a high performance replacement for SQL databasesThis is a cross post from the day job. <br />
<br />
<a href="http://en.wikipedia.org/wiki/SQL">SQL</a> is gaining popularity as more researchers work with structured data. Rather than reimporting data every session, keeping the data persistent in a relational database (RDBMS) and querying it with SQL is a significant improvement.<br />
<br />
The problem with standard RDBMS systems is that their algorithms are often serial and hampered by the need to keep transactions (think keeping bank deposits and debits in order) consistent. These transactional guarantees are known as <a href="http://en.wikipedia.org/wiki/ACID">ACID</a>.<br />
<br />
In many research cases, though, researchers do not need transactions. They have data and they just want to query it, or their data is append only, such as new measurements. By relaxing the transaction requirements, researchers can use a whole host of new methods that are very scalable.<br />
<br />
Enter <a href="http://hive.apache.org/">Apache Hive</a>. Hive is a data warehouse tool that lets data on a <a href="http://hadoop.apache.org/">Hadoop</a> cluster (such as the <a href="http://caen.github.io/hadoop/user-hadoop.html">cluster at ARC-TS</a>) be queried using SQL syntax. Even for large tables, into the thousands of GBytes of data, performance is consistent.<br />
<br />
In this example I have data in CSV format from a database. It has 12 columns and 1,487,169,693 rows. Total data size is about 880GB of raw data. With Hive, though, once I have the data in Hadoop and <a href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/TruncateTable">create a table</a> out of it, I can use Hive to query it just like any other SQL table. <br />
<br />
<span style="font-family: Courier New, Courier, monospace;">SELECT COUNT(*) FROM sample_table;</span><br />
<div style="background-color: #dfdbc4; color: #4c2f2d; font-family: Courier; font-size: 12px;">
OK</div>
<div style="background-color: #dfdbc4; color: #4c2f2d; font-family: Courier; font-size: 12px;">
1487169693</div>
<br />
<div style="background-color: #dfdbc4; color: #4c2f2d; font-family: Courier; font-size: 12px;">
Time taken: 75.875 seconds, Fetched: 1 row(s)</div>
<div>
<br /></div>
<div>
It took 75.9 seconds to do a full table scan. Because Hive works on the raw text data and must read all of it for a query like this, the ARC-TS Hadoop cluster is scanning the data at about 11GB/s. Hive will maintain performance for more complex queries also.</div>
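The scan rate follows directly from the numbers above, as this small Python check shows. It uses only the figures already quoted in this post (about 880GB of raw data, 1,487,169,693 rows, 75.875 seconds), so the result is approximate to the same degree the data size is.

```python
raw_data_gb = 880          # approximate raw CSV size from the post
rows = 1_487_169_693       # row count returned by COUNT(*)
scan_seconds = 75.875      # time Hive reported for the full scan

gb_per_second = raw_data_gb / scan_seconds
rows_per_second = rows / scan_seconds

print(f"{gb_per_second:.1f} GB/s")
print(f"{rows_per_second / 1e6:.1f} million rows/s")
```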
<div>
<br /></div>
<div>
<span style="font-family: Courier New, Courier, monospace;">SELECT AVG(sample_column) FROM sample_table;</span></div>
<div>
<div style="background-color: #dfdbc4; color: #4c2f2d; font-family: Courier; font-size: 12px;">
OK</div>
<div style="background-color: #dfdbc4; color: #4c2f2d; font-family: Courier; font-size: 12px;">
0.011386917827452752</div>
<div style="background-color: #dfdbc4; color: #4c2f2d; font-family: Courier; font-size: 12px;">
Time taken: 81.488 seconds, Fetched: 1 row(s)</div>
</div>
<br />
Researchers who work with a lot of structured data will find SQL on Hive intuitive and very powerful. It effectively removes the limits on query performance and data size imposed by other solutions.<br />
<br />
For many researchers, working with SQL or Hadoop is new and daunting, but it is part of the new BigData ecosystem. Please contact ARC-TS at hpc-support@umich.edu and one of our staff can help you with your data.Brock Palenhttp://www.blogger.com/profile/03992571343475028656noreply@blogger.com0tag:blogger.com,1999:blog-3991757369876144735.post-84623748966162791572015-05-03T23:35:00.001-04:002015-05-03T23:37:49.722-04:00Metadata ease of use in traditional HPC platformsI was reading a co-worker's notes from <a href="http://msisflotsam.blogspot.com/2015/04/my-trip-to-bioit-world-2015.html">BioIT World 2015</a> and got thinking about usability of data, how object filesystems could be made more useful to the average researcher, etc. This got me thinking about <a href="http://www.failureasaservice.com/2015/05/thinking-about-data-lifecycle.html">data life cycle</a> and the need for better metadata management. <br />
<br />
Users love a regular POSIX filesystem with folders, and the folder tree is their own metadata structure:<br />
<br />
<span style="font-family: "Courier New", Courier, monospace;">experment1_v2/realresults/10subject_1alpha_.05deg_16f_0given/data.h5</span><br />
<br />
<span style="font-family: inherit;">People love working with this, our brains wrap around it. </span>The problem is data.h5 doesn't have any of the information from the directory structure the user has given it. Existing object store systems make it hard to navigate data like this.<br />
<br />
I propose two ideas. The first is a pseudo filesystem that looks like folders but can point to data in multiple ways depending on what metadata attribute you are interested in. The second is a 'search only' filesystem. Think of a search only filesystem as being like Apple Spotlight or Launch Bar. Most of the time it's close enough and it finds what you want based on metadata. These searchable systems should be extendable from user space (think bash completion add-ons) around different communities of use. <br />
<br />
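A toy Python sketch of both ideas follows. Nothing here is an existing object store API; the class name MetadataIndex, its methods, and the attribute names (guessed from the example path above) are all hypothetical. Objects are stored once under an opaque id, folder-like paths are generated on demand from whichever attributes you ask for, and search is an exact match on metadata.

```python
class MetadataIndex:
    """Hypothetical in-memory index: object id -> metadata attributes."""

    def __init__(self):
        self._objects = {}

    def add(self, object_id, **metadata):
        """Register an object with metadata attached at generation time."""
        self._objects[object_id] = dict(metadata)

    def search(self, **wanted):
        """'Search only' access: ids whose metadata matches every key."""
        return sorted(
            oid for oid, md in self._objects.items()
            if all(md.get(key) == value for key, value in wanted.items())
        )

    def view(self, *attributes):
        """Pseudo filesystem: folder-like paths ordered by attributes."""
        paths = {}
        for oid, md in self._objects.items():
            parts = [f"{attr}={md.get(attr, 'unknown')}" for attr in attributes]
            paths["/".join(parts + [oid])] = oid
        return paths


if __name__ == "__main__":
    index = MetadataIndex()
    index.add("obj-001", experiment="experiment1", subjects=10, alpha=1)
    index.add("obj-002", experiment="experiment1", subjects=20, alpha=1)

    # Same data, two different 'folder' layouts.
    print(sorted(index.view("experiment", "subjects")))
    print(sorted(index.view("subjects", "experiment")))
    print(index.search(subjects=10))
```

The point of the design is that the hierarchy becomes a view over the metadata rather than the only place the metadata lives.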
This will allow for a few results:<br />
<ul>
<li>Users will find it useful in their own day to day work to attach metadata at data generation time rather than leaving the data un-categorized. </li>
<li>It should allow for more robust metadata through the data's entire life cycle to archive, and thus be more useful to future users of the data.</li>
<li>Object filesystems holding the actual data can phase out traditional POSIX filesystems and hopefully help with many of the data scale problems we have had on the trail to Exascale.</li>
</ul>
I think this would be the best of both worlds, human friendly (still has 'folders'), computer friendly (Trillions of objects, no directory size issues) and data reuse (more and better metadata) and productivity (find my data with X attribute easily using search). Brock Palenhttp://www.blogger.com/profile/03992571343475028656noreply@blogger.com0tag:blogger.com,1999:blog-3991757369876144735.post-25362639681636914532015-05-03T23:16:00.001-04:002015-05-03T23:16:39.065-04:00Thinking about the Data LifecycleI'm sure many have thought about this before me. <br />
<br />
My thoughts on data life cycle:<br />
<ol>
<li>Data as an idea (A project I have in mind to collect or generate data)</li>
<li>Model / Methodology refining (Data is of assumed low quality)</li>
<li>Bulk Data Generation (Data is of assumed high quality)</li>
<li>Post Processing (Traditional journal article generation)</li>
<li>Archiving (Code / Method and Data)</li>
<li>Secondary Insights (Other investigators)</li>
</ol>
Steps 1 through 4 are all under the control of the original investigator; 5 and 6 are not. I see many problems arising from this workflow. <br />
<br />
The most significant issue I see is that the incentives for the original investigator end at 4. We keep talking about more data and code publication, encouraging reuse, and so on, but until there is an incentive to do so there will not be much of 5 and 6 going on. The current method of requiring data management plans will only be moderately useful because it is a stick, not a carrot. <br />
<br />
5 and 6 are more interesting to me from a technical standpoint. There is a lot of bit rot inherent in this process. Specifically, the only way data is useful to a second investigator is if there is metadata and documentation (probably the first papers) describing what the data are and how they were generated. <br />
<br />
Metadata suffers from two problems. The first is that archived data right now has minimal metadata attached, or the metadata is attached at the last minute rather than at data generation time. Expecting all needed metadata to be recalled at a later date is unreasonable. <br />
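<br />
One way to picture attaching metadata at generation time is to write a small sidecar record in the same step that writes the data. The sketch below is only an assumption about how that might look: the function name <code>write_with_metadata</code>, the <code>.meta.json</code> suffix, and the example field values are all hypothetical.<br />

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def write_with_metadata(data_path, payload, **metadata):
    """Write payload bytes and a JSON sidecar describing them in one step."""
    data_path = Path(data_path)
    data_path.write_bytes(payload)
    record = {
        "file": data_path.name,
        "size_bytes": len(payload),
        "created_utc": datetime.now(timezone.utc).isoformat(),
        **metadata,  # whatever the investigator knows right now
    }
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar


with tempfile.TemporaryDirectory() as tmp:
    sidecar = write_with_metadata(
        Path(tmp) / "run42.dat",
        b"\x00" * 1024,
        investigator="example-user",  # hypothetical values
        method="model-v2",
        quality="assumed low (methodology refinement)",
    )
    loaded = json.loads(sidecar.read_text())
    print(loaded["file"], loaded["size_bytes"], loaded["method"])
```

Because the record is produced by the same call that produces the data, nothing has to be recalled from memory when the data is archived later.<br />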
<br />
The second issue is more subtle, and I don't see how it can be addressed well: will the metadata that a second investigator needs actually be recorded and preserved? All data has an almost unlimited number of metadata characteristics, and many are of no interest to those who generated the data, who cannot foresee how the data might be useful to someone else in the future.<br />
<br />
<h2>
File Transfer Tool Performance (Globus vs SCP)</h2>
Posted 2015-05-03 by Brock Palen<br />
<br />
The following is something I <a href="http://fluxhpc.blogspot.com/2015/04/filetransfer-tool-performance.html">wrote up at the day job</a> to satisfy my curiosity. These measurements are not robust, but they are encouraging. I added some additional thoughts at the end.<br />
<br />
<br />
On the HPC systems at ARC-TS we have two primary tools for transferring data: <a href="http://arc-ts.umich.edu/flux/using-flux/login-nodes/">scp</a> (secure copy) and <a href="http://arc-ts.umich.edu/flux/using-flux/transferring-files-with-globus-gridftp/">Globus</a> (GridFTP). Other tools like rsync and sftp operate over SSH, as scp does, and thus will have performance comparable to scp.<br />
<br />
So which tool performs the best? We are going to test two cases, each moving data to the <a href="http://arc-ts.umich.edu/xsede/">XSEDE</a> machine <a href="https://portal.xsede.org/sdsc-gordon">Gordon</a> at SDSC. One test moves a single large file; the second moves many small files.<br />
<h3>
Large File Case</h3>
For the large file we are moving a single 5GB file from Flux's scratch directory to the Gordon scratch directory. Both filesystems can move data at GB/s rates so the network or tool will be the bottleneck.<br />
<h4>
scp / gsiscp</h4>
<div style="background-color: #dfdbc4; color: #4c2f2d; font-family: Courier; font-size: 12px;">
[brockp@flux-login2 stripe]$ gsiscp -c arcfour all-out.txt.bz2</div>
<div style="background-color: #dfdbc4; color: #4c2f2d; font-family: Courier; font-size: 12px;">
gordon.sdsc.xsede.org:/oasis/scratch/brockp/temp_project</div>
<div style="background-color: #dfdbc4; color: #4c2f2d; font-family: Courier; font-size: 12px;">
india-all-out.txt.bz2 100% 5091MB 20.5MB/s 25.6MB/s 04:08 </div>
<br />
Duration: 4m:08s<br />
<h4>
Globus </h4>
<div>
<span style="font-family: Helvetica; font-size: 12px;">Request Time : 2015-04-26 22:41:04Z</span></div>
<div>
<span style="font-family: Helvetica; font-size: 12px;">Completion Time : 2015-04-26 22:42:44Z</span></div>
<div>
<span style="font-family: Helvetica; font-size: 12px;">Effective MBits/sec : 427.089</span><br />
<span style="font-family: Helvetica; font-size: 12px;"><br /></span>
Duration: 1m:40s, 2.5x faster than scp<br />
<h3>
Many File Case</h3>
</div>
<div>
In this case the same 5GB file was split into 5093 1MB files. Every file transferred carries its own overhead, so moving many small files is much slower than moving one large file of the same total size. How large is the impact, and can Globus reduce it? Read on.</div>
<div>
<br /></div>
<div>
<h4>
scp / gsiscp</h4>
<div>
<div style="background-color: #dfdbc4; color: #4c2f2d; font-family: Courier; font-size: 12px;">
[brockp@flux-login2 stripe]$ time gsiscp -r -c arcfour iobench</div>
<div style="background-color: #dfdbc4; color: #4c2f2d; font-family: Courier; font-size: 12px;">
gordon.sdsc.xsede.org:/oasis/scratch/brockp/temp_project/</div>
<div style="background-color: #dfdbc4; color: #4c2f2d; font-family: Courier; font-size: 12px;">
real<span class="Apple-tab-span" style="white-space: pre;"> </span>28m9.179s</div>
</div>
</div>
<br />
Duration: 28m:09s<br />
<h4>
Globus</h4>
<div>
<span style="font-family: Helvetica; font-size: 12px;">Request Time : 2015-04-27 00:18:40Z</span></div>
<div>
<span style="font-family: Helvetica; font-size: 12px;">Completion Time : 2015-04-27 00:25:30Z</span></div>
<div>
<span style="font-family: Helvetica; font-size: 12px;">Effective MBits/sec : 104.423</span></div>
<div>
<span style="font-family: Helvetica; font-size: 12px;"><br /></span></div>
<div>
Duration: 6m:50s (from the request and completion times above), 4.1x faster than scp</div>
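The durations, speedups, and effective rates can be recomputed directly from the numbers reported above. This is a small Python sketch, not part of the original measurements; it assumes the sizes are binary megabytes (5091 MiB for the single file, taken from the gsiscp progress line, and 5093 files of 1 MiB each), and the helper names are my own.<br />

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%SZ"  # format of the Globus Request/Completion times


def elapsed_seconds(start, end):
    """Seconds between two Globus timestamps."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


def mbits_per_sec(mib, seconds):
    """Effective rate in megabits per second for a payload given in MiB (assumed unit)."""
    return mib * 1024 * 1024 * 8 / 1e6 / seconds


# Globus durations from the reported timestamps.
globus_large = elapsed_seconds("2015-04-26 22:41:04Z", "2015-04-26 22:42:44Z")
globus_many = elapsed_seconds("2015-04-27 00:18:40Z", "2015-04-27 00:25:30Z")

# scp durations from the gsiscp progress line (04:08) and `time` output (28m9.179s).
scp_large = 4 * 60 + 8
scp_many = 28 * 60 + 9.179

print(f"large file: Globus {globus_large:.0f}s, speedup {scp_large / globus_large:.2f}x, "
      f"{mbits_per_sec(5091, globus_large):.1f} Mbit/s")
print(f"many files: Globus {globus_many:.0f}s, speedup {scp_many / globus_many:.2f}x, "
      f"{mbits_per_sec(5093, globus_many):.1f} Mbit/s")
```

The computed rates land close to the "Effective MBits/sec" values Globus reported, which is a useful sanity check on the durations.<br />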
<div>
<h4>
Conclusion</h4>
</div>
<div>
Globus provides a significant speedup over scp, both for single large files and for many smaller files. The advantage grows as the files get smaller, because scp transfers one file at a time and pays the per-file overhead serially.<br />
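<br />
To put a rough number on that per-file overhead, one can fit a very simple model to the two scp runs: total time = bulk transfer time + number of files x a fixed per-file cost. This is only a back-of-the-envelope sketch under that linear assumption (it ignores caching, network variation, and everything else), and the function names are hypothetical.<br />

```python
def estimate_per_file_overhead(bulk_seconds, many_seconds, n_files):
    """Solve many_seconds = bulk_seconds + n_files * overhead for overhead."""
    return (many_seconds - bulk_seconds) / n_files


def predicted_seconds(total_mib, n_files, mib_per_sec, overhead):
    """Predicted scp time under the linear model: streaming time plus per-file cost."""
    return total_mib / mib_per_sec + n_files * overhead


scp_bulk = 4 * 60 + 8        # single 5091 MiB file, 04:08
scp_many = 28 * 60 + 9.179   # same data as 5093 files, 28m9.179s
n_files = 5093

overhead = estimate_per_file_overhead(scp_bulk, scp_many, n_files)
rate = 5091 / scp_bulk       # MiB/s observed for the single large file

print(f"estimated per-file overhead: {overhead:.3f} s")
for n in (1, 100, 5093, 50000):
    print(f"{n:>6} files -> about {predicted_seconds(5091, n, rate, overhead):.0f} s")
```

Under this model the same data split into ten times as many files would take several hours over scp, which is why tools that keep multiple files in flight matter for small-file workloads.<br />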
<br />
<b>Additional Thoughts</b><br />
Here are some additional thoughts for admins of Globus servers, specifically those using the Globus.org web client. If you look at your endpoint settings in <span style="font-family: 'Courier New', Courier, monospace;">cli.globusonline.org</span> using <span style="font-family: 'Courier New', Courier, monospace;">endpoint-list -v &lt;endpointname&gt;</span>, there are settings for both concurrency and parallelism, each with a preferred and a maximum value. Further work for servers with large network connections and large parallel filesystems would be to increase these default values and see whether better network utilization and performance can be realized.</div>
<h2>
Universities: Eat your own dog food</h2>
Posted 2015-04-16 by Brock Palen<br />
<br />
<a href="http://en.wikipedia.org/wiki/Eating_your_own_dog_food">Eating your own dog food</a> is the idea that you use what you sell. In universities we claim to have the best faculty training the next generation of students. We also claim we want to give our students more real-life collaborative work projects.<br />
<br />
So why is it that the universities with the top business schools bring in outside consultants for their own administrative processes? The best marketing schools bring in marketing firms? Engineering schools bring in design firms? <br />
<br />
Give major tasks to your faculty with student teams to mentor. Have a competition between teams. Think of the signal this would send to the public, the trust we place in our product, the pride the students would gain that they literally built their institution. I think this would be good for keeping alumni connected over time if they came to campus and saw the building they helped design the network for, or the art on campus.<br />
<br />
It takes faith, and there are reasons one wouldn't want to do this, but I think it deserves a strong look, perhaps through internal grants to faculty, who are the best at what they do, and to students, who would gain real experience and a connection to their school. <br />
<br />
Examples at Michigan, some have student and faculty involvement, some could be better:<br />
<ul>
<li>Art on campus (Art and Design)</li>
<li>Buildings (Architecture)</li>
<li>Campus layout and planning (Urban Planning)</li>
<li>Green energy initiatives (Architecture and Engineering)</li>
<li>Data center design (Architecture and Engineering)</li>
<li>WiFi network build out (Engineering) </li>
<li>Marketing and Development (Business and Video/Performing Arts)</li>
<li>Administrative Processes (Business)</li>
<li>Contracts and tech transfer (Engineering and Law)</li>
<li>Wellness programs (Medical, Nursing, and Public Health)</li>
</ul>
<h2>
We care about computing, not computers</h2>
Posted 2015-04-10 by Brock Palen<br />
<br />
The title comes from my first HPC supervisor, <a href="https://www.linkedin.com/profile/view?id=29040140&authType=NAME_SEARCH&authToken=h3_C&locale=en_US&srchid=423771461428669508034&srchindex=1&srchtotal=13&trk=vsrp_people_res_photo&trkInfo=VSRPsearchId%3A423771461428669508034%2CVSRPtargetId%3A29040140%2CVSRPcmpt%3Aprimary%2CVSRPnm%3Atrue">Andrew Caird</a>, but if you have heard me speak before, this is something I hammer on relentlessly. It is my mantra, my one-line mission statement that actually has meaning and doesn't come from <a href="http://cmorse.org/missiongen/">Dilbert</a>.<br />
<br />
Too many Research Computing resource providers get buried in hardware and get tunnel vision. Jonathan Dursi recently <a href="http://dursi.ca/hpc-is-dying-and-mpi-is-killing-it/">wrote an article</a> that I think starts to touch on the issues that arise from this provider tunnel vision on hardware.<br />
<br />
Here is the deal for most real science work being done on our platforms: researchers use an HPC cluster, Spark cluster, GPU, etc. <b>only because they have to be competitive and timely in their research.</b><br />
<br />
If cloud providers could provide performance and pricing competitive with our local resources, with reliable access for all workloads (sorry, spot prices), I would tell our staff to shut everything down and turn the data center into a racquetball court.<br />
<br />
Our team could do so much more for education of researchers in these new tools and moving up the value stack if we didn't have to spend our time with gear.<br />
<br />
We care about <strike>computing</strike> <b>research</b>, not computers.<br />
<br />
<h2>
HPC is dying?</h2>
Posted 2015-04-09 by Brock Palen<br />
<br />
Jonathan Dursi has a new post out that is causing a storm, <a href="http://dursi.ca/hpc-is-dying-and-mpi-is-killing-it/">HPC is dying, and MPI is killing it</a>. Jonathan makes a lot of good points, many of which I agree with, but I think many of those commenting on it have fallen into the common trap of looking at it as a technical problem.<br />
<br />
It's a social problem. It's a <a href="http://en.wikipedia.org/wiki/Local_knowledge_problem">Knowledge Problem.</a><br />
<br />
Find me a faculty member or a grad student who knows about <a href="http://spark.apache.org/">Spark</a> or <a href="http://chapel.cray.com/">Chapel</a> (<a href="http://www.rce-cast.com/Podcast/rce-80-chapel.html">RCE 80</a>). Heck, find me a faculty member or grad student who even uses BLAS or LAPACK. These have been around for decades, yet an understanding of their benefit and availability is rare among faculty and students.<br />
<br />
I am fully on board with Jonathan, and I am going to put words in his mouth, HPC needs to be a big tent, and for research we need to be open to all technologies that demonstrate value, and not cling to a single solution.<br />
<br />
So, getting back to the Knowledge Problem: where is the information? MapReduce and its successors Spark and Flink come from the data-intensive, internet-scale application world; honestly, they come from business and are now coming back to academia, where most of us MPI folk are. These are simply two different communities solving their own problems. Getting them to talk when they have no common goals other than scale and performance is like mixing oil and water.<br />
<br />
There is also a generational gap. I spent some time evaluating running Spark (really YARN containers) and HPC/MPI codes next to each other without any hacks, and I got pushback from both communities. Each saw the other as a plaything: a novelty that could be useful but is not where effort is being invested.<br />
<br />
As for Chapel and PGAS, most of this is also a matter of information dispersion: people don't know these languages exist. Chapel has the additional problem that funding for the base effort was limited, and the work was left to a community effort in a community that didn't see a driving need for it. Even in a world where simpler methods would be useful, adoption will stay low and never hit critical mass.<br />
<br />
An example of a simpler method would be MPI bindings for Python or another easy-to-bootstrap language. We don't see as much new code being written here as one would expect, either. Why deal with stdlib when you could have all the simplicity of Python, a mature, stable, easy language, and still use the MPI we are all so desperately clinging to?<br />
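<br />
For a sense of how little code this takes, here is a minimal sketch using the mpi4py bindings, which must be installed separately along with an MPI library. The serial fallback when mpi4py is missing, the function names, and the toy workload (a sum of squares) are my own assumptions for illustration.<br />

```python
try:
    from mpi4py import MPI  # Python bindings for MPI; requires an MPI library
except ImportError:
    MPI = None  # fall back to a single serial "rank" so the sketch still runs


def partial_sum(rank, size, n):
    """Sum of i*i over the indices this rank owns (round-robin split of range(n))."""
    return sum(i * i for i in range(rank, n, size))


def main(n=1000):
    if MPI is None:
        rank, size = 0, 1
        total = partial_sum(rank, size, n)
    else:
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
        # Every rank contributes its partial sum and receives the global total.
        total = comm.allreduce(partial_sum(rank, size, n), op=MPI.SUM)
    if rank == 0:
        print(f"sum of squares below {n} = {total} (computed on {size} rank(s))")
    return total


if __name__ == "__main__":
    main()
```

Saved under a name of your choosing (say, a hypothetical sum_squares.py), this would typically be launched with something like <code>mpiexec -n 4 python sum_squares.py</code>; the same file also runs as a plain serial script.<br />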
<br />
It will take a generation, and new domains entering the space of stodgy FORTRAN and C programmers. We see this in genomics, where many codes are written in Java, Perl, or Python, languages that a 'respectable' HPC programmer would never touch. <br />
<br />
This is how new things will happen. The old guard that made the last innovation will, on average, not bring you the next one. The ice company didn't bring us the refrigerator.<br />
<br />
<h2>
Come work with me!</h2>
Posted 2015-04-02 by Brock Palen<br />
<br />
<a href="http://arc-ts.umich.edu/">Advanced Research Computing - Technology Services</a> (ARC-TS), where I work, currently has two job openings. Please pass this along to anyone you think might be interested.<br />
<br />
<h1>
<span style="font-size: small;"><span style="font-family: inherit;"><a href="http://umjobs.org/job_detail/107992/hpc_systems_administrator_seniorintermediate">HPC Systems Administrator Senior/Intermediate</a></span></span></h1>
<h1>
<span style="font-weight: normal;"><span style="font-size: small;">This position will act as a senior technical resource and be responsible
for coordinating with other ARC-TS members with input from unit
support to create the next generation of research computing
infrastructure. The successful candidate will have the ability to lead
specific project initiatives and provide expertise, guidance, and strong
collaboration to various team members. This role will be dynamic to
meet the changing requirements for building and supporting new and
innovative systems to meet faculty needs.</span></span></h1>
<h1>
<span style="font-size: small;"><span style="font-family: inherit;"><a href="http://umjobs.org/job_detail/108135/research_storage_leadsenior">Research Storage Lead/Senior</a></span></span></h1>
<h1>
<span style="font-weight: normal;"><span style="font-size: small;">The Research Storage Lead role requires someone who works well without
supervision and who proactively anticipates and resolves problems. The
successful candidate will be able to rapidly and proactively respond to
researcher needs; both with technical solutions and support of faculty
and institute led proposal submissions. This role will require
collaborating with unit and ARC-TS staff to improve ARC-TS operational
depth.</span></span></h1>