Friday, October 30, 2020

AMD Killed ARM in the Data Center

When I first started in HPC, we had a software title that came on physical media in a three-ring binder. It covered many platforms: Solaris on SPARC, AIX on Power, Linux on Power, Linux on x86, Windows on x86, Linux on x86-64, etc. Today that same package only supports Windows and Linux on x86-64.

As recently as two years ago, ARM (and Power) for HPC and the data center was a major interest. So why did this happen? Looking back, it appears the market wanted a competitive alternative to Intel. Maybe Intel could have avoided this situation, and they did make some decisions that didn't help, but I think it's just the nature of the market.

Many of these decisions make sense in isolation: if the market will carry the cost, sure, segment for extra margin (pay extra for >768GB memory support or AVX-512). Sure, don't implement on-die support for NVLink in the hope that your now market-rejected (Phi) or upcoming GPU (Xe) competitor would be a platform solution.

So the market, starting with hyperscalers and extreme-scale HPC like the DOE, began looking at other options. Power had some high-profile wins, but the attention quickly turned to ARM. All the major cloud providers have some ARM play today.

So why do we see ARM leaving the data center as a general-purpose CPU outside of cloud? My opinion: AMD is competitive, leaving minimal market upside versus the investment to move to a new platform. Maybe ARM was just used to negotiate with Intel.

With an x86-64 capable part at a good value, and a generally simple-to-understand portfolio, the market appetite for a non-x86 alternative dried up overnight. Going back to my three-ring binder of OS+CPU options, the market doesn't like that complexity if it doesn't provide significant value. The industry has matured, and the market has spoken.

So it's simple economics: easy to support, good value. We now have x86 market competition, which is good for us, but ARM in the data center is a casualty of it. The market just dried up.

We have used ARM CPUs and they were very good for our use case, but I think the decision to use the IP for custom projects and not general-purpose parts is the right one for all the providers. If AMD loses competitive capacity again, I would expect the market to respond and we could have a repeat of the last several years.

What about cloud ARM?

While AWS and other cloud providers' ARM offerings will keep things alive, I don't expect them to make many inroads for general-purpose use. If data center ARM remains cloud-provider-specific IP, it will struggle in the near future to attract applications, and providers like us will want to keep our support envelope small. Looking at the instance types, it's not clear that the cloud providers even position these systems for performance use rather than scale-out application servers. For anything other than very large single use cases these will remain niche offerings.

What about the A64fx CPU?

I'm not holding my breath. We are seeing adoption of A64fx outside Japan, but as with Power8/9 I'll wait and see if it goes anywhere outside the largest systems, or if interest wanes now that there is better-supported competition in the x86 space again. The one exception (speculative) is Europe, where they appear to be taking a protectionist approach to CPUs and want to develop European CPU IP. ARM creates issues for them continuing down that path.

The major performance benefits of A64fx could easily be matched by Intel or AMD providing an HBM front-ended CPU. So short of national support, it's easy pickings in a niche market.

Summary

The market likes competition but also likes as much simplicity and compatibility as possible. Once a reputable, competitive x86 offering was on the market (AMD), interest in more complex options declined significantly.

Sunday, October 18, 2020

Parallel Compressor Performance for Science - pigz, lbzip2, xz

UPDATE: At the request of a friend I looked into zstd, and wow, it's a great option. As it becomes more ubiquitous it should likely replace most compressors: compression similar to xz with speed approaching lz4 for a modest CPU increase.

Original Post

As data volumes grow and single-core performance grows more slowly than core count, compressing large volumes of data quickly requires compressors that can use multiple cores to keep up with the data volumes and hardware investments.

Luckily there are several compatible parallel compressors out there, but how do they perform compared to classic gzip? And how well do they work on scientific data? Scientific data often consists of a few very large files, usually binary, and thousands of small files that compress well.

The Host

All tests were done on the Great Lakes login node.  The properties of this node are:

  • 36 core 36 thread Intel Xeon 6154 
  • 192 GB Memory
  • 1.9PB GPFS File System 
  • 100Gbps HDR Network

The Data

The data set has the following properties:

  • 6649 files
  • 276 directories
  • 221 GB total size

 Range                       Number
[   0.000  B -   0.000  B ) 1
[   0.000  B -   1.000 KB ) 560
[   1.000 KB -   1.000 MB ) 4935
[   1.000 MB -  10.000 MB ) 1175
[  10.000 MB - 100.000 MB ) 116
[ 100.000 MB -   1.000 GB ) 94
[   1.000 GB -  10.000 GB ) 43
[  10.000 GB - 100.000 GB ) 1
[ 100.000 GB -   1.000 TB ) 0
[   1.000 TB -        MAX ) 0


Results

This compares runtime and final archive size against serial gzip. For example, the pigz archive was created with:

tar -I pigz -cf myarchive.tar.gz <directory>
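The same pattern works for the other compressors. A hedged sketch, not necessarily the exact commands behind the numbers below (the <directory> placeholder is hypothetical, and passing arguments through -I assumes a reasonably recent GNU tar):

tar -I lbzip2     -cf myarchive.tar.bz2 <directory>   # parallel bzip2
tar -I 'xz -T0'   -cf myarchive.tar.xz  <directory>   # xz using all cores
tar -I 'zstd -T0' -cf myarchive.tar.zst <directory>   # zstd using all cores
tar -I lz4        -cf myarchive.tar.lz4 <directory>   # lz4, single core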

Command     Compatible   Parallel    Parallel     Speed      Size
                         Compress    Decompress   vs. gzip
----------  -----------  ----------  -----------  ---------  -----
gzip        gzip         No          No           1x         153G
pigz        gzip         Yes         No           32x        153G
lbzip2      bzip2        Yes         Yes          23x        151G
mpibzip2    bzip2        Yes         Yes          *          151G
xz -T0      xz/lzma      Yes         No           5.5x       137G
pixz        xz/lzma      Yes         Yes          5.5x       137G
zstd -T0    zstd         Yes         Yes          67x        155G
lz4         lz4          No          No           42.2x      171G

Notes

  • pigz can only compress in parallel; decompression sees only a very minimal speedup

  • xz requires the -T0 option to use all cores in the system; otherwise it defaults to 1

  • xz cannot decompress files in parallel, but pixz can

  • lbzip2 and mpibzip2 can only decompress in parallel if the archive was compressed with a parallel-aware compressor

  • lz4 is not parallel aware but is by far the fastest compressor of all, with the least space savings

  • zstd requires the -T0 option to use all cores; otherwise it defaults to 1
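To make the decompression notes concrete, a small hedged sketch (the archive names are hypothetical):

tar -I pigz   -xf myarchive.tar.gz    # gzip format: decompression is still essentially single threaded
tar -I lbzip2 -xf myarchive.tar.bz2   # parallel, provided a parallel bzip2 made the archive
tar -I pixz   -xf myarchive.tar.xz    # parallel xz-format decompression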

Conclusion

Overall, using the drop-in replacements for gzip and bzip2 is an obvious improvement on modern multi-core systems. While xz and lz4 are available on almost all modern systems, they are still less portable than gzip- and bzip2-based compressors.

lz4 is very interesting as it's so fast it uses almost no CPU. If one were collecting data on a lower-powered device, lz4 appears to be 'compression for free'. While not as effective as the other compressors, there is almost no performance impact during tar/untar when using lz4.

One would hope that over time the stock installs of gzip and bzip2 are replaced by the parallel versions. xz is very stable and still returns the best compression ratio, but it struggles to utilize the very high core counts of modern systems.

Thursday, September 17, 2020

Archivetar - A better tar for Big Data

Challenge: Trade-offs of Cost/Bit vs. Bits/File and Performance

Across the options of tape, HDD, SSD, and NVMe, there are significant trade-offs: better performance for small files comes at a higher cost per unit of capacity. In HPC we would love to deploy petabytes of NVMe, but most budgets cannot support it.

Tape and AWS Glacier have low cost and great bandwidth, but long seek times before the first file appears. Thus these technologies are often targeted at archive use cases. It is left to the user, though, to organize their data in a way that does not make recalling it painfully slow.

80/20 Rule of Project Folders

In a perfect world, archived project folders would include data, source code, scripts to re-create the data, etc. This leads to a common 80/20 split, where 80% of the files hold only 20% of the data. The total data volume drives the budget for storing the data, but the file count, concentrated in that 20% of the data, drives management complexity.


Current Practices, One, Huge Tar

Currently most researchers, not having better options, will tar an entire project and upload it to an archive. As projects get larger this introduces issues:

  • Tars are larger than the maximum object size
  • Compression is limited to a single core
  • To access a subset of the data, the entire archive must be retrieved and expanded, requiring 2x the storage space (tar + expanded tar)
  • Opportunities for parallelism are lost when transferring data at the file level
  • Large files, often binary, don't compress well yet dominate compressor time for little benefit
  • Low utilization of CPU, storage I/O, and networking

Desired Outcome, Sort and Split

It would be better if files over a given size could be excluded; these will often be data files that are big enough to realize full archive performance on their own. Files under this threshold could be sorted into lists and assigned to tars of a target size. The end result is a folder of only large files plus multiple tars of small files, and subsets of the data can be recalled without needing to expand every archive.
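A minimal shell sketch of that sort-and-split idea (the 10M cutoff, list names, and chunking by file count are hypothetical simplifications; archivetar, described below, targets a size per tar instead):

# leave files of 10M and larger alone; list everything smaller
find . -type f -size -10M > small-files.txt

# break the list into chunks and tar each chunk with a parallel compressor
split -l 5000 small-files.txt chunk.
for c in chunk.*; do
    tar -I pigz -cf "small-${c#chunk.}.tar.gz" -T "$c"
done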

Archivetar - A better tar for Big Data

Archivetar aims to address exactly that workflow.


Archivetar benefits include:

  • Utilizes mpiFileUtils to quickly walk filesystems
  • Creates multiple tars simultaneously for higher performance on network filesystems
  • Auto detects many parallel compressors for multi-core systems
  • Saves an index of files in each tar to find subsets of data without needing to recall and expand all archives
  • Archives are still standalone tars and can be expanded without an archivetar install

Example Archivetar

#example data file count
[brockp@gl-login1 box-copy]$ find . | wc -l
6925

# create tars of all files smaller than 10M
# tars should be 200M before compression
# save purge list
# compress with pigz if installed
archivetar --prefix my-archive --size 10M --tar-size 200M --save-purge-list --gzip

# delete small files and empty directories
archivepurge --purge-list my-archive-2020-09-17-22-35-20.under.cache

# File count after
[brockp@gl-login1 box-copy]$ find . | wc -l
379

# recreate
unarchivetar --prefix my-archive

[brockp@gl-login1 box-copy]$ find . | wc -l
6925

Monday, January 22, 2018

Automating Jetstream with Terraform

Jetstream is an OpenStack cluster for science that researchers can request access to via XSEDE, which has traditionally been known only as an HPC provider but has long provided other services. Jetstream provides many of the infrastructure-as-a-service (IaaS) offerings for which people have turned to public cloud providers (Amazon, Google, and Azure), but many don't know that Jetstream exists.

Another challenge is automation of Jetstream. AWS provides a service called CloudFormation that allows automating deployments, scaling, etc. without having to spend a lot of time in UIs, and it helps with predictability between deployments.

At its most fundamental level, Jetstream is just an implementation of OpenStack, and thus any tool that understands the OpenStack API can work with it. So I went out and made a small example of how to bring up a CentOS7 system on Jetstream, and create all the supporting networks and security groups, with Terraform, an open-source tool for automated infrastructure.

You can find this example and documentation on my Github site.
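Once that example is cloned, the day-to-day loop is short. A hedged sketch (the openrc filename is hypothetical; the credentials come from the OpenStack RC file for your Jetstream allocation):

source openrc.sh     # load the OS_* credentials for the Jetstream OpenStack API
terraform init       # fetch the OpenStack provider plugin
terraform plan       # preview the networks, security groups, and instance to be created
terraform apply      # create them; terraform destroy tears it all down again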

Users should find it simple to extend the example into very complex, multi-network, customized, scalable environments, the same as they can on the public cloud providers but without the extreme cost.

Wednesday, May 31, 2017

Job Posting

Join our group!

http://careers.umich.edu/job_detail/142372/research_cloud_administrator_intermediate

Looking for someone to work with containers (Docker) and an orchestration engine (Mesos, Kubernetes, Rancher, etc.) to increase the flexibility of deploying BigData tools in a dynamic research environment.

Opportunities for public cloud in research

It comes down to $/performance and government regulations / sponsor requirements. I covered these in other posts. So what is my read of the tea leaves for cloud use in research computing?

  • Campus-level / modest projects with stock CUI / NIST 800-171 or similar regulations.
    These require purpose-built systems and heavy documentation. FedRAMP has made this simpler in the cloud; avoid doing all the work required for this yourself and save the time.
  • Logging / Administrative / Web / CI Systems / Disaster Recovery.
    These systems are generally small and part of the administrative stack of just having a center. They benefit the same way enterprise systems do from the flexibility of the cloud. I personally love PaaS and Docker here: yes, I would like another Elasticsearch cluster please; no, I do not want to worry about building it, please.
  • High Availability Systems.
    IoT / sensor network ingest points, or any system where you need higher availability than normal research. Similar to the sensitive systems: if you have a 1MW HPC data center, you don't put the entire thing on generators and build a second data center for 20KW of distributed message buses for sensor networks. If you are not investing a lot of capital into those systems, don't do it anywhere else; piggyback on the cloud's built-in multi-site offerings.
  • High Throughput Computing / Grid Computing.
    New lower-cost pricing models via AWS Spot, Google Preemptible, and Azure Low Priority make the cost of actual cycles very close to what you can buy bulk systems for. Every HPC center I know of is always running out of power and cooling; take these workloads that are insensitive to interruption, have short run times, and don't require high IO or tightly coupled parallelism, and keep your scarce power for HPC's unique capability.
  • Archive / HSM Replicas, or the entire thing.
    Depending on your use of tape today, the cloud sites make great replicas at similar costs. Some niche providers like Oracle have costs that are hard to beat, with one catch: only as long as you never access your data. Cost predictability for faculty is a problem, and with cold storage costing as much as $10,000 per retrieved PB in the cloud, if your HSM system is busy, use the cloud only for a second copy for DR. That is: upload data (generally free), delete it (sometimes free), and never bring it back except on media error. This should help you limit your capital spend on a second system as well as the second site to put it in.
    If you are doing real archive, that is, set it and forget it, ejected tape will forever be cheaper; but do you have a place to put it, and people to do the shuffle? There is a lot of value in maybe using the cloud for all of it.
This is my first (quick) set of thoughts. Other systems, like analytics systems, should also be done in the cloud; the cloud offerings there are already more mature than most research sites, and they make hosting things like notebooks, and splitting data across storage buckets for policy reasons, much more useful.

I'm sure many of you will disagree with me, feel free to tweet me at @brockpalen.

Data Providers Need to Catch up to Cloud

In my recent project looking at whether we could migrate to the cloud in this generation of HPC, another topic kept arising.

We cannot yet take enough of our data or software off our owned systems and facilities.

Beyond HIPAA and BAAs, there is a raft of other data regulations under which data are provided to our researchers. Last I checked there were thousands of faculty with hundreds of data sources in a campus environment.

Right now, because most campus projects are small, it is not worth it, in either time or in upsetting the data provider, to get an agreement in place with a cloud provider to host said data. Many of these plans require revealing information about physical security and practices that you simply cannot get from a cloud provider. Or they refer to standards that existed before clouds existed (anyone who looked at FISMA training pre-FedRAMP, or at any agreement requiring physical isolation, will recognize this limitation).

Some data types (FISMA / NIST 800-171 come to mind) are actually easier to handle in the major public clouds because you don't need sign-off from each of the data providers, just the agency that has already done the work with that public cloud provider. (NOTE: I am still early in looking into this; this is my current understanding, but I could be wrong.) Thus, after doing the last-mile work (securing your host images, your staff policies, patch policies, etc.) you can actually respond to these needs faster in the cloud and get an ATO.

So where does this leave the data providers that each have their own rules and require sign-off from the provider for every project, making the fixed cost of each project high? As a community we should be educating them and moving them towards aligning with one of the federal standards. Very few of the projects I have seen are actually stricter than NIST 800-171; if these data providers would accept those standards, and an ATO (Authority to Operate) from the federal agencies, they would probably get better security and fewer under-the-desk 'air gapped' servers, while increasing the impact and ease of access to the data for the work they are trying to support.

This would make funding go further and get technical staff and researchers back to what they do best, with less time spent looking at data use agreements.