Users love regular POSIX filesystems with folders, because the directory tree becomes their own metadata structure:
experment1_v2/realresults/10subject_1alpha_.05deg_16f_0given/data.h5
People love working this way; our brains wrap around it easily. The problem is that data.h5 itself carries none of the information encoded in the directory structure the user gave it, so if the file is moved or copied elsewhere, that metadata is lost. Existing object store systems also make it hard to navigate data like this.
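To make the problem concrete, here is a minimal Python sketch that recovers the metadata a user has encoded in a path like the one above. The naming convention it assumes (underscore-separated `<number><name>` tokens such as `10subject` or `.05deg`) and the function name `metadata_from_path` are hypothetical; real directory names vary from user to user and community to community.

```python
import re
from pathlib import PurePosixPath

# Hypothetical convention: "_"-separated <number><name> tokens,
# e.g. "10subject" -> subject=10.0 and ".05deg" -> deg=0.05.
TOKEN = re.compile(r"^([0-9]*\.?[0-9]+)([A-Za-z]+)$")


def metadata_from_path(path: str) -> dict:
    """Turn the metadata a user encoded in a directory path into attributes."""
    p = PurePosixPath(path)
    meta: dict = {"filename": p.name}
    for level, name in enumerate(p.parts[:-1]):
        # Keep each raw directory name so nothing the user typed is lost.
        meta[f"dir{level}"] = name
        for token in name.split("_"):
            m = TOKEN.match(token)
            if m:
                value, key = m.groups()
                meta[key] = float(value)
    return meta


print(metadata_from_path(
    "experment1_v2/realresults/10subject_1alpha_.05deg_16f_0given/data.h5"
))
```

Run on the example path, this yields attributes such as `subject=10.0` and `deg=0.05` alongside the raw directory names. Once that information lives as attributes on the object rather than only in its location, it survives the file being moved or copied.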
I propose two ideas. The first is a pseudo filesystem that looks like folders but can point to the same data in multiple ways, depending on which metadata attribute you are interested in. The second is a 'search only' filesystem. Think of a search-only filesystem as being like Apple Spotlight or LaunchBar: most of the time it's close enough, and it finds what you want based on metadata. These searchable systems should be extendable from user space (think of bash completion add-ons) so that different communities of use can tailor them.
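Both ideas can be sketched as views over one metadata index. The Python below is a toy, in-memory illustration, not a real filesystem or object-store API: `MetadataIndex`, its methods, the object ids, and the attribute names are all hypothetical. `view` builds a folder-like tree ordered by whichever attributes you pick, `search` does a crude Spotlight-style substring match, and `register` stands in for a user-space plug-in (like a bash completion add-on) that adds attributes at ingest time.

```python
from typing import Callable, Dict, List

# A user-space plug-in: given an object id, return extra attributes.
Extractor = Callable[[str], Dict[str, str]]


class MetadataIndex:
    """Toy in-memory index standing in for an object store's metadata layer."""

    def __init__(self) -> None:
        self.objects: Dict[str, Dict[str, str]] = {}
        self.extractors: List[Extractor] = []

    def register(self, extractor: Extractor) -> None:
        # Communities add extractors, much like bash completion add-ons.
        self.extractors.append(extractor)

    def put(self, object_id: str, **attrs: str) -> None:
        # Attach metadata at generation time, then let plug-ins enrich it.
        meta = dict(attrs)
        for extract in self.extractors:
            meta.update(extract(object_id))
        self.objects[object_id] = meta

    def view(self, *keys: str) -> dict:
        """Pseudo filesystem: nest objects into folders named key=value."""
        tree: dict = {}
        for oid, meta in self.objects.items():
            if not all(k in meta for k in keys):
                continue  # object lacks an attribute this view needs
            node = tree
            for k in keys:
                node = node.setdefault(f"{k}={meta[k]}", {})
            node.setdefault("_files", []).append(oid)
        return tree

    def search(self, query: str) -> List[str]:
        """Search-only filesystem: each term must occur in some key=value pair."""
        terms = query.lower().split()
        hits = []
        for oid, meta in self.objects.items():
            haystack = " ".join(f"{k}={v}" for k, v in meta.items()).lower()
            if all(t in haystack for t in terms):
                hits.append(oid)
        return sorted(hits)


idx = MetadataIndex()
# Hypothetical plug-in: tag HDF5 objects by their suffix.
idx.register(lambda oid: {"format": "hdf5"} if oid.endswith(".h5") else {})
idx.put("obj-001.h5", experiment="experiment1_v2", subject="10", alpha="1")
idx.put("obj-002.h5", experiment="experiment1_v2", subject="12", alpha="1")
idx.put("obj-003.csv", experiment="experiment2", subject="10", alpha="2")

print(idx.view("subject", "experiment"))
print(idx.search("subject=10 hdf5"))  # ['obj-001.h5']
```

Calling `idx.view("experiment", "subject")` or `idx.view("subject", "experiment")` gives two different "directory trees" over the same three objects, which is the point of the pseudo filesystem. A real system would presumably expose such views through something like FUSE and back `search` with a proper index rather than a linear scan.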
This should lead to a few results:
- Users will find it useful in their own day-to-day work to attach metadata at data generation time, rather than leaving the data uncategorized.
- It should allow for more robust metadata through the data's entire life cycle, all the way to archive, and thus make the data more useful to its future users.
- Object filesystems holding the actual data can phase out traditional POSIX filesystems and, hopefully, help with many of the data-scale problems we have had on the trail to Exascale.