On 9/24/2013 9:45 PM, Charlie Zender wrote:
> It is not my place to determine whether there is a consensus, or how
> close we are, but it's clear to me there is no consensus yet. Bryan
> Lawrence, Steve Hankin, Jonathan Gregory, Karl Taylor, and Philip
> Cameron-Smith are not "on board". I hope they will speak up and say if
> they concur that maintaining the status quo (flat files) is best
> (period), or whether they do wish to extend CF to hierarchies
> (starting now), or the additional information they would need to decide.
Hi Charlie et al.,
Since you have asked... I have heard two points that seem to
bolster Bryan's point of view that the multi-model use case is "great
but not compelling". (See a more positive spin at the end.)
1. file size. Model outputs today are typically too large for even a
single variable from a single model to be packaged in a single
file. Addressing a model ensemble multiplies the size barrier by
the ensemble size, N. Thus using groups to package a model
ensemble applies only in cases where the user is interested in
quite a small subset of the model domain, or perhaps in
pre-processed, data-reduced versions of the models. A gut estimate
is that single-file solutions, like netCDF4 groups, address 25% or
less of the stated use case. We could argue over that number, but
it seems likely to remain well below 50%. (Issues of
THREDDS-aggregating files bearing groups also deserve to be
discussed and understood. What works? What doesn't?)
2. The problems of the "suitcase packing" metaphor were invoked time
and again, further narrowing the applicability of the use case. The
sweet spot that was identified is the case of a single user desiring
a particular subset from a single data provider. Essentially a
multi-model ensemble encoded using netCDF4 groups would offer a
standardized "shopping basket" with advantages that will be enjoyed
by some high-powered analysis users.
For this narrower use case I couldn't help asking myself how the
cost/benefit found through the use of netCDF4 groups compares with
the cost/benefit of simply zip-packaging the individual CF model
files. There is almost no cost to this alternative. Tools to pack
and unpack zip files are universal, have UIs embedded into common
OSes, and offer APIs that permit ensemble analysis to be done on the
zip file as a unit with programming effort similar to that of
netCDF4 groups. User communities would likely understand and accept
the zip alternative instantly -- it would hardly even generate
discussion. Zip
files do not address more specialized use cases, like a desire to
view the ensemble as a 2-level hierarchy of models each providing
multiple scenarios, but the "suitcase" metaphor discussions have
pointed out the diminishing returns that accrue as the packing
strategy is made more complex.
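To make the zip-packaging claim concrete, here is a minimal sketch using only Python's standard `zipfile` module: it packs per-model files into one archive and then runs an analysis over every member as a unit. The function names, the `.nc` member naming, and the `analyze` callback are hypothetical illustrations; in real use `analyze` would hand each member's bytes to a netCDF reader that accepts in-memory buffers (an assumption about the reader, not shown here).

```python
import io
import zipfile
from typing import Callable, Dict


def pack_ensemble(members: Dict[str, bytes]) -> bytes:
    """Pack per-model files (hypothetical name -> raw bytes) into an in-memory zip."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in sorted(members.items()):
            zf.writestr(name, data)  # each CF model file stays an independent member
    return buf.getvalue()


def map_over_ensemble(
    archive: bytes, analyze: Callable[[str, bytes], object]
) -> Dict[str, object]:
    """Apply `analyze` to every .nc member, treating the archive as one ensemble."""
    results: Dict[str, object] = {}
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if info.filename.endswith(".nc"):  # skip non-data members, e.g. READMEs
                with zf.open(info) as fh:
                    results[info.filename] = analyze(info.filename, fh.read())
    return results


if __name__ == "__main__":
    # Placeholder bytes stand in for real CF-compliant model files.
    archive = pack_ensemble({"modelA_tas.nc": b"AAAA", "modelB_tas.nc": b"BB"})
    print(map_over_ensemble(archive, lambda name, data: len(data)))
```

The design point is that each model file remains a flat, unmodified CF file, so existing tools keep working once the archive is unpacked; the zip only supplies the "suitcase".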
The tipping point for me is not whether a particular group of users
would find value in a particular enhancement. It is whether the overall
cost/benefit considerations -- the expanded complexity, the need to
enhance applications, the loss of interoperability, etc., versus the
breadth of users and the benefits they will enjoy -- clearly motivate a
change. My personal vote is that thus far the arguments fall well
short of this tipping point. But maybe there are other use cases to be
explored. Perhaps in aggregate they would tip the cost/benefit analysis.
What about the "group of satellite swaths" scenario, a feature
collection use case? AFAIK CF remains weak at addressing this need.
(If we pursue this line of discussion we should add the
'cf_satellite' list onto the thread. That community may have new work
on this topic to discuss.)
- Steve
Received on Wed Sep 25 2013 - 13:33:32 BST