Sunday, February 8, 2015

Leaning on Glenn Lockwood's NSF Thoughts

The following is wonkish; I point out problems but offer no solutions. I would love to hear others' thoughts on this.

I am a huge fan of the thought processes of Glenn Lockwood. His recent thoughts on the NSF Future Directions for NSF Advanced Computing Infrastructure to Support U.S. Science and Engineering in 2017-2020 report align with my own.

This point is something I have been kicking around for a few years in discussions with many people, including Glenn:
Stop treating NSF's cyberinfrastructure as a computer science research project and start treating it like research infrastructure operation.  Office of Cyberinfrastructure (OCI) does not belong in Computer & Information Science & Engineering (CISE).

As currently treated, this is wildly unstable. Applying the same review process used for novel science to something that should be infrastructure creates feature bloat. Each system proposal becomes a test of how many extra bells and whistles, novel and innovative ideas, you can add. Huge amounts of labor are going into unproven methods for engaging investigators in the use of resources. Like Glenn, I think these sorts of efforts should be independent of the large resources.

I think PSC Bridges is a great example of what one has to commit to for a resource that, if I had to estimate, is still mostly going to be used as a traditional HPC platform. It is an HPC platform, a Hadoop platform, a remote desktop / visualization platform, a database platform, and an accelerator platform. The one exception is portal use, which is common in some (emphasis on some) communities.

We have lost focused intensity. I think DoE still has some of this: systems are built with a single purpose and can do it well. Jacks of all trades are handy, but they are masters of none, and I see our main open systems infrastructure moving more toward do-it-all systems.

The uncertainty of funding exists at all levels of government funding, though. Government contractors have dealt with the ebbs and flows of funding from their sources. Face it: large HPC centers outnumber major funding sources. They share more in common with defense contractors than with, say, commercial cloud, where a few providers compete for thousands of funding sources/customers.

Large-scale NSF-funded centers should act more like highways. Highways are boring, and theirs is a lowest-cost business. It is honestly not innovative, and this is fine. There will be some adoption of test cases for new innovative platforms, and I think NSF is already doing a great job funding these smaller projects; that is where they should stay. What needs evaluation is the standard for when they should be scaled up and offered to everyone, like a highway system.

As an operator, worrying about where the next grant for a large system is going to come from, in order to make the case for keeping staff on hand, makes it really hard to retain quality staff. The public science HPC world has already lost quality people like Glenn to other, more stable efforts. Since when should a start-up be considered more stable?

This leads into two topics: how can we make things more stable, to keep quality people and increase efficiency in resource delivery; and how do we help on the supply side, that is, how do we create more professionals with the skill set that all such efforts, public and private, require.

The open secret: if you are good at operating these resources, there is a bidding war for your labor.
