Sunday, February 8, 2015

Will scrutiny of HPC resource allocations rise over time?

I am a huge fan of the XSEDE Startup Allocation, and I hope it always remains as easy to get as it is today. It allows researchers new to HPC to test the waters and find the value in what they are doing. But what is the dollar value of a startup allocation?
The grand total number of SUs in a Startup or Education request cannot exceed 200,000 SUs.
SUs, or Service Units, are CPU hours. Let's compare what it would cost to buy this many CPU hours on other platforms:
Platform                          Unit rate       Cost of 200,000 SUs
Flux (University of Michigan)     $0.0162/SU**    $3,240*
Flux On Demand                    $0.0420/SU**    $8,402
Amazon On Demand c4.8xlarge       $0.0516/SU      $10,320
Amazon Reserved 3yr c4.8xlarge    $0.0191/SU      $3,820
Penguin On Demand                 $0.12/SU        $24,000
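The cost column is simply each unit rate multiplied by the 200,000 SU startup cap. Here is a minimal Python sketch of that arithmetic, using the rates copied from the table. The names RATES, STARTUP_SUS and scaled_cost are illustrative only and do not come from any XSEDE tool. The sketch gives $8,400 for Flux On Demand rather than the table's $8,402. The table was probably computed from a rate before it was rounded to $0.0420.

```python
# Unit rates ($ per SU, i.e. per CPU hour) copied from the table above.
RATES = {
    "Flux (University of Michigan)": 0.0162,
    "Flux On Demand": 0.0420,
    "Amazon On Demand c4.8xlarge": 0.0516,
    "Amazon Reserved 3yr c4.8xlarge": 0.0191,
    "Penguin On Demand": 0.12,
}

# Cap on the total SUs in an XSEDE Startup or Education request.
STARTUP_SUS = 200_000


def scaled_cost(rate_per_su: float, sus: int = STARTUP_SUS) -> float:
    """Dollar cost of buying `sus` CPU hours at `rate_per_su` dollars each."""
    return rate_per_su * sus


if __name__ == "__main__":
    for platform, rate in RATES.items():
        print(f"{platform:32s} ${scaled_cost(rate):>10,.2f}")
```

You can pass a different `sus` value to price any grant of CPU hours. For example, `scaled_cost(RATES["Penguin On Demand"], 1_000_000)` prices a one-million-SU grant at the Penguin rate.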

The minimum value of a full startup, then, is $3,240, on (shameless plug) Flux at Michigan. Flux is a full HPC service: it includes bare metal, InfiniBand, storage, software, and support. Penguin is also a full service. AWS gives you just (virtualized) hardware, and many costs are missing before it becomes a full service. Penguin is probably the closest representation of what it really costs to run a full HPC service without in-kind support from a sponsoring institution. Penguin does have academic rates. I cannot share them here, but you can ask.

So if the real cost of a 200,000-hour startup is between $8,400* and $24,000 (and the same scaling applies to any HPC grant of CPU hours), ask yourself this: if you asked a funding agency for $8,400, would they give it to you as easily as XSEDE does?

I worry that in a resource-limited world, allocations can only move in one direction: toward being as hard to obtain as their dollar-value equivalents. XSEDE has been a great resource, and I love it. NSF, if you read this, please expand your funding for HPC and data-intensive platforms. Please keep the startup easy to get. Our most interesting new inroads into other subject areas come from startup grants in the sub-200K SU range.

I'm curious whether the research output per dollar of XSEDE hours is less than, equal to, or greater than the output per dollar of other NSF grants. If you know of work that tries to put a value on research output across different areas (really hard to do), please share it in the comments.

* The Standard Flux model works like the Amazon Reserved model. The equipment is promised for the full 30 days and billed as a unit, but hours are consumed whether or not you run code. Thus a large over-subscription is built into the rate.
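Under a reserved-style model, every purchased hour is billed whether or not you use it. The cost per hour you actually use is therefore the unit rate divided by your utilization. The sketch below assumes that simple model. The function name `effective_rate` and the 50% utilization figure are hypothetical and are only meant to illustrate the idea.

```python
def effective_rate(rate_per_su: float, utilization: float) -> float:
    """Cost per CPU hour actually used when every purchased hour is billed.

    Assumption: utilization is the fraction of purchased CPU hours that run
    jobs, in the range (0, 1].
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return rate_per_su / utilization


# Hypothetical example: Standard Flux at $0.0162/SU with half of the purchased hours used.
print(f"${effective_rate(0.0162, 0.5):.4f} per CPU hour used")
```

At 100% utilization the effective rate equals the table rate. At 50% utilization it doubles.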

** All Flux rates use the full standard published cost. As of early 2015, most faculty pay $6.60/core-month for Flux and $16.94/core-month for Flux On Demand, which lowers both Flux rates by 44%.
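You can check the 44% figure by converting each core-month price to a per-SU rate and comparing it with the published rate in the table. The sketch below assumes a 30-day billing month (720 CPU hours per core-month), following the "full 30 days" in the footnote above. The helper names are hypothetical.

```python
# Assumption: a core-month is a 30-day billing month of one core.
HOURS_PER_CORE_MONTH = 30 * 24


def per_su(core_month_price: float, hours: int = HOURS_PER_CORE_MONTH) -> float:
    """Convert a $/core-month price into $/SU (one CPU hour)."""
    return core_month_price / hours


def discount(published_per_su: float, paid_per_su: float) -> float:
    """Fractional reduction of the paid rate relative to the published rate."""
    return 1 - paid_per_su / published_per_su


# Published per-SU rates from the table vs. early-2015 faculty core-month prices.
for name, published, monthly in [
    ("Flux", 0.0162, 6.60),
    ("Flux On Demand", 0.0420, 16.94),
]:
    paid = per_su(monthly)
    print(f"{name}: ${paid:.4f}/SU, {discount(published, paid):.1%} below published")
```

With a 720-hour month, this gives a reduction of roughly 43% for Flux and 44% for Flux On Demand. If you assume a different number of hours per month, the percentages shift slightly.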

