[CF-metadata] Are ensembles a compelling use case for "group-aware" metadata? (CZ) from Steve Hankin on 2013-09-25 (Archive of CF discussions from 2002 to 2019 on the cf-metadata mailing list)

From: Steve Hankin <steven.c.hankin>
Date: Wed, 25 Sep 2013 12:33:32 -0700

On 9/24/2013 9:45 PM, Charlie Zender wrote:
> It is not my place to determine whether there is a consensus, or how
> close we are, but it's clear to me there is no consensus yet. Bryan
> Lawrence, Steve Hankin, Jonathan Gregory, Karl Taylor, and Philip
> Cameron-Smith are not "on board". I hope they will speak-up and say if
> they concur that maintaining the status quo (flat files) is best
> (period), or whether they do wish to extend CF to hierarchies
> (starting now), or the additional information they would need to decide.

Hi Charlie et al.,

Since you have asked .... I have heard two points that seemed to
bolster Bryan's pov that the multi-model use case is "great but not
compelling". (See a more positive spin at the end.)

1. File size. Model outputs today are typically too large for even a
    single variable from a single model to be packaged in a single
    file. Addressing a model ensemble multiplies the size barrier by
    the ensemble size, N. Thus the use of groups to package a model
    ensemble applies only to cases where the user is interested in
    quite a small subset of the model domain, or perhaps in
    pre-processed, data-reduced versions of the models. A gut estimate
    is that single-file solutions, like netCDF4 groups, address 25% or
    less of the stated use case. We could argue over that number, but
    it seems likely to remain on the low side of 50%. (Issues of
    THREDDS-aggregating files bearing groups also deserve to be
    discussed and understood. What works? What doesn't?)

2. The problems of the "suitcase packing" metaphor were invoked time
    and again, further narrowing the applicability of the use case. The
    sweet spot that was identified is the case of a single user desiring
    a particular subset from a single data provider. Essentially a
    multi-model ensemble encoded using netCDF4 groups would offer a
    standardized "shopping basket" with advantages that will be enjoyed
    by some high-powered analysis users.

    For this narrower use case I couldn't help asking myself how the
    cost/benefit found through the use of netCDF4 groups compares with
    the cost/benefit of simply zip-packaging the individual CF model
    files. There is almost no cost to this alternative. Tools to pack
    and unpack zip files are universal, have UIs embedded into common
    OSes, and offer APIs that permit ensemble analysis to be done on the
    zip file as a unit at similar programming effort to the use of
    netCDF4 groups. Comprehension and acceptance of the zip
    alternative on the part of user communities would likely be
    instantaneous -- hardly even a point to generate discussion. Zip
    files do not address more specialized use cases, like a desire to
    view the ensemble as a 2-level hierarchy of models each providing
    multiple scenarios, but the "suitcase" metaphor discussions have
    pointed out the diminishing returns that accrue as the packing
    strategy is made more complex.
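
    To make the zip alternative concrete, here is a minimal sketch
    using only the Python standard library. The helper names
    (pack_ensemble, iter_ensemble), the member paths, and the
    placeholder payloads are all hypothetical; real members would be
    complete CF netCDF files, and decoding each one would need a
    netCDF library, which is outside what this sketch shows.

```python
import io
import zipfile


def pack_ensemble(members):
    """Pack {member name: file bytes} into one in-memory zip archive."""
    buf = io.BytesIO()
    # ZIP_STORED (no compression) is an assumption: netCDF4 files are
    # often already compressed internally, so deflating again may gain little.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, payload in members.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def iter_ensemble(archive_bytes):
    """Yield (member name, member bytes) for each file in the archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for name in sorted(zf.namelist()):
            # Each payload would be handed to a netCDF reader here.
            yield name, zf.read(name)


# Hypothetical placeholders standing in for individual CF model files.
members = {
    "model_a/tas.nc": b"placeholder bytes for model A",
    "model_b/tas.nc": b"placeholder bytes for model B",
}
archive = pack_ensemble(members)
unpacked = dict(iter_ensemble(archive))
```

    The loop in iter_ensemble is the whole of the "ensemble as a unit"
    access pattern: each member stays an ordinary flat CF file, so
    existing readers need no changes.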

The tipping point for me is not whether a particular group of users
would find value in a particular enhancement. It is whether the overall
cost/benefit considerations -- the expanded complexity, the need to
enhance applications, the loss of interoperability, etc., versus the
breadth of users and the benefits they will enjoy -- clearly motivate a
change. My personal vote is that thus far the arguments fall well
short of this tipping point. But maybe there are other use cases to be
explored. Perhaps in aggregate they may tip the cost/benefit analysis.
What about the "group of satellite swaths" scenario? -- a feature
collection use case. AFAIK CF remains weak at addressing this need thus
far. (If we pursue this line of discussion we should add the
'cf_satellite' list onto the thread. That community may have new work
on this topic to discuss.)

     - Steve
Received on Wed Sep 25 2013 - 13:33:32 BST
