AMD Killed ARM in the Data Center

When I first started in HPC, we had a software title that came on physical media in a three-ring binder. It covered many platforms: Solaris on SPARC, AIX on Power, Linux on Power, Linux on x86, Windows on x86, Linux on x86-64, etc. Today that same package only supports Windows and Linux on x86-64.

As recently as two years ago, ARM (and Power) for HPC and the data center was a major interest. So why did this happen? Looking back, it appears the market wanted a competitive alternative to Intel. Maybe Intel could have avoided this situation, and they did make some decisions that didn't help, but I think it's just the nature of the market.

Many of these decisions make sense. If the market will carry the cost, sure, segment for extra margin (pay extra for >768GB of memory or for AVX-512). Sure, don't implement on-die support for NVLink, in the hope that your now market-rejected accelerator (Phi) or your upcoming GPU competitor (Xe) will be the platform solution instead.

So the market, starting with hyperscalers and extreme-scale HPC like DOE, started looking at other options. Power had some high-profile wins, but attention quickly turned to ARM. All the major cloud providers have some ARM play today.

So why do we see ARM leaving the data center as a general-purpose CPU outside of cloud? My opinion: AMD is competitive, which leaves too small a market to justify the investment of moving to a new platform. Maybe ARM was just used to negotiate with Intel.

Once AMD had an x86-64 capable part at a good value and a generally simple-to-understand portfolio, the market's appetite for a non-x86 alternative dried up overnight. Going back to my three-ring binder of OS+CPU options, the market doesn't like that complexity if it doesn't provide significant value. The industry has matured, and the market has spoken.

So it's simple economics: x86 is easy to support and a good value. We now have competition in the x86 market, which is good for us, but ARM in the data center is a casualty of that. The market just dried up.

We have used ARM CPUs and they were very good for our use case, but I think the decision to use the IP for custom projects rather than general-purpose parts is the right one for all the providers. If AMD loses its competitive capacity again, I would expect the market to respond, and we could see a repeat of the last several years.

What about cloud ARM?

While ARM offerings from AWS and other cloud providers will keep things alive, I don't expect them to make many inroads for general-purpose use. If data center ARM remains cloud-provider-specific IP, it will struggle in the near future to get application support, and application providers will want to keep their support envelope small. Looking at the instance types, it's not clear that the cloud providers even position these systems for performance use rather than as scale-out application servers. For anything other than very large single-use cases, these will remain niche offerings.

What about the A64FX CPU?

I'm not holding my breath. We are seeing adoption of A64FX outside Japan, but as with Power8/9, I'll wait and see whether it shows up in anything beyond the largest systems, or whether interest wanes now that there is better-supported competition in the x86 space again. The one exception to this is in Europe (speculative), where they appear to be taking a xenophobic approach to CPUs and want to develop European CPU IP. ARM creates issues for them to continue down that path.

The major performance benefits of A64FX could easily be provided by both Intel and AMD with an HBM front-ended CPU. So short of national support, it's easy pickings for a niche market.

Summary

The market likes competition but also likes as much simplicity and compatibility as possible. Once a reputable, competitive x86 offering (AMD) was on the market, interest in more complex options declined significantly.

FaaS - Failure as a Service

Friday, October 30, 2020
